
Free resources for data science and computational biology

Updated Feb 2020

Below are some resources that I have found useful related to data science, data visualization, and computational biology. These mostly pertain to the R environment.

Newsletters and Blogs

  1. Subscribe to Monday morning data science (MMDS) - weekly newsletter with updates on R and data science news and resources

  2. Subscribe to R-bloggers as a way to learn about new tools

  3. The Simply Statistics blog has updates from leading academic data scientists

Research workflow - best practices

  1. Cookiecutter Data Science - provides a template for a data science project file structure.

    • There are other opinions and examples on how to do this. It is probably best to pick one and organize all your projects the same way. Even better, use an R package that can create the project structure automatically, like workflowr.

  2. PeerJ - practical data science. This is a collection of articles that discuss best practices for data collection and analysis.

  3. A guide to getting RStudio set up on Amazon Web Services

  4. eLife’s computationally reproducible articles blend manuscript with editable code and interactive figures. They recommend the use of Stencila Desktop, a free online manuscript editor.
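
As a rough illustration of the project-template idea in item 1, here is a minimal Python sketch that creates a folder skeleton. The folder names and the `create_project` helper are assumptions chosen for illustration, not the layout that any particular template or package produces.

```python
from pathlib import Path
import tempfile

# Hypothetical layout for illustration only; rename or extend to suit your work.
PROJECT_DIRS = [
    "data/raw",         # original, untouched input data
    "data/processed",   # cleaned data derived from data/raw
    "notebooks",        # exploratory analyses
    "src",              # reusable analysis code
    "reports/figures",  # generated figures for write-ups
]


def create_project(root):
    """Create the folder skeleton under `root` and return the created paths."""
    root = Path(root)
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    # A top-level README documents what each folder is for.
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text("# Project\n\nDescribe the data, code, and outputs here.\n")
    return created


if __name__ == "__main__":
    # Demonstrate in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for p in create_project(Path(tmp) / "my_analysis"):
            print(p.relative_to(tmp))
```

Whatever layout you choose, generating it from a script rather than by hand is one way to keep every project organized identically.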

Courses

  1. Audit this series of courses: Data Science Specialization on Coursera

  2. Take the Stanford edX course on statistical learning. Very comprehensive - includes videos, a free book, and R code examples.

  3. An understanding of statistics (e.g. linear models, dimensionality reduction, permutations) and linear algebra helps with doing computational biology. This course covers the essential linear algebra and statistics. Knowing the basics of calculus is also helpful.

  4. A sequence of courses offered by Rafa Irizarry on edX in R for genomics

  5. R for Data Science. A comprehensive guide to doing data science. Covers everything from data import to statistical modeling, visualization, and communication of results. Provides both the big picture and details through worked examples. Explains “tidyverse” principles. The only downside is that it does not focus on the life sciences.

Genomics databases

  1. Genomics databases available online - there’s a wealth of knowledge out there. Wikipedia has a good list, and so does CSHL.

Twitter

  1. #hiddencurriculum hashtag on Twitter - discusses insights into graduate school that people wish they knew when they started.

  2. Follow Hadley Wickham, Jeff Leek, Roger Peng on Twitter.

Data visualization

  1. PolicyViz for data visualization help.

  2. Data-to-viz is a guide for picking the best graph for your data.

  3. Data viz course in D3 - combination of YouTube lectures and worked examples in VizHub (kind of like GitHub, but provides an online HTML editor for D3 and SVG). Some worked examples come from this excellent free online JavaScript book, Eloquent JavaScript.

  4. Fundamentals of Data Visualization - free online book by Claus O. Wilke.

  5. Data Visualization: A Practical Introduction - free online book by Kieran Healy.

Examples of data visualization

  1. Very nice visualization of cave rescue

  2. FlowingData is a collection of data visualization and data journalism.

  3. Pudding.cool has well-illustrated and interesting data essays

  4. Visual Cinnamon has excellent examples of data visualization

  5. Look at the winners from data visualization contests like Malofiej and the Information is Beautiful Awards

Other

  1. Sign up for a local hackathon. If there are none in your neighborhood, join one online.

  2. Contribute to an open-source project. Here’s a guide for how to get started.

  3. Cheatsheets offered by RStudio

  4. A cheatsheet for statistical tests and their equivalent linear models in R
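
To give a feel for what "statistical tests and their equivalent linear models" means, here is a small sketch of one such equivalence: the equal-variance two-sample t-test gives the same t statistic as the slope in a simple linear regression on a 0/1 group indicator. It is written in plain Python so it runs without extra packages; in R the same comparison would be between `t.test(y ~ group, var.equal = TRUE)` and `lm(y ~ group)`. The data and the function names `pooled_t` and `slope_t` are made up for illustration.

```python
from math import sqrt
from statistics import mean


def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances assumed)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    # Pooled within-group sum of squares.
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (na + nb - 2)
    return (mb - ma) / sqrt(sp2 * (1 / na + 1 / nb))


def slope_t(x, y):
    """t statistic for the slope of the least-squares fit y = b0 + b1 * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares, then the standard error of the slope.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return slope / se


# Made-up measurements for two groups.
control = [4.1, 5.0, 4.6, 5.3, 4.8]
treated = [5.9, 6.4, 5.6, 6.8, 6.1]

# Encode group membership as a 0/1 indicator variable.
x = [0] * len(control) + [1] * len(treated)
y = control + treated

print("t-test statistic:  ", pooled_t(control, treated))
print("regression slope t:", slope_t(x, y))
```

The two printed values agree up to floating-point error, because the regression slope is the difference in group means and its standard error reduces to the pooled standard error of that difference.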