These are my materials from “Introduction to Data Science” at Middlebury College in Spring 2023.

This course uses R and RStudio for computation. Both are free to download.

There is no official textbook for this course. Here are some resources that I use for inspiration and examples. They are also great references.

Course Materials

| Unit | Topic |
|------|-------|
| A | Introductions, Installation, Intro R, RMarkdown, File Paths |
| B | Naming Objects, Wrangling data with `dplyr`: `filter`, `select`, `arrange` |
| C | Aggregating data with `summarize`, `group_by` |
| D | Making pretty tables with `kableExtra` |
| E | Making plots with `ggplot2`: barplots, scatterplots |
| F | Making plots with `ggplot2`: line graphs, histograms & boxplots |
| G | Making plots with `ggplot2`: `scales`, jitter, labels |
| H | Working with categorical data using `forcats` |
|   | Midterm Project |
| I | Joining tables with `dplyr` |
| J | Pivoting data with `tidyr`: `pivot_longer`, `pivot_wider` |
| K | Maps: choropleth maps |
| L | Maps: plotting points |
| M | Simple Linear Regression, pretty output with `broom` |
| N | Multiple Regression |
| O | Web scraping tables with `rvest` |
| P | Web scraping text with `rvest` |
| Q | Working with text as data using `stringr` |
| R | Working with dates and times using `lubridate` |
|   | Final Project |
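
To give a flavor of the wrangling and aggregating verbs named in units B and C, here is a minimal sketch. It is not taken from the course notes: the built-in `mtcars` data set, the `fuel_summary` name, and the choice of columns are all assumptions made for illustration, and it assumes the `dplyr` package is installed.

```r
library(dplyr)

# Illustrative only: mtcars ships with R, so nothing needs to be downloaded.
# fuel_summary is a hypothetical name chosen for this sketch.
fuel_summary <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%   # keep only 4- and 6-cylinder cars (rows)
  select(mpg, cyl, wt) %>%       # keep only three columns
  arrange(desc(mpg)) %>%         # sort from highest to lowest mpg
  group_by(cyl) %>%              # one group per cylinder count
  summarize(
    mean_mpg = mean(mpg),        # average fuel economy within each group
    n_cars   = n()               # number of cars within each group
  )

fuel_summary
```

Each verb takes a data frame and returns a data frame, which is why the steps chain together with the pipe.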

Notes

What’s Next?

This course is meant to be an introduction to data science and is only the tip of the iceberg! If you are interested in further reading or next steps, I would recommend the following: