STAT 118

Introduction to Data Science

Emily Malcolm-White (she/her)

Middlebury College

Overview

  • What is Data Science?
  • What will we be doing in this class?
  • Intro to RStudio and Quarto

What is Data Science?

  • From RStudio Chief Scientist and R for Data Science co-author Hadley Wickham: “Data science is the process of turning data into understanding.”

We use data to understand something

  • Which customers are interested in my product?
  • Who will respond to this cancer treatment?
  • How do neighborhoods in Chicago differ?

We use data to do something

  • Facial recognition
  • Targeted Advertising
  • Predict Disease Spread
  • Fraud Detection

Who is a data scientist?

Data Science Process

Credit: @allisonhorst

Is this how you’ve been working with data?

Credit: Lumen Learning

Cool! But what about this might make it hard to reproduce?

Tools for this course

R

  • R is a free & open-source programming language
  • Think “big fancy calculator”
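
To make the “big fancy calculator” idea concrete, here is a minimal sketch of what typing into R might look like. The object name `scores` and its values are made up for illustration and are not course data.

```r
# Basic arithmetic, just like a calculator
2 + 3     # 5
10 / 4    # 2.5

# Store values in an object and reuse them (hypothetical numbers)
scores <- c(82, 91, 77)

# Built-in functions work on the stored values
mean(scores)
max(scores)
```

Running each line in the R console prints the result right below it, which is what makes R feel like a calculator with memory.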

RStudio

  • An Integrated Development Environment (IDE) for working with R
  • It makes it easier to write R code, work with objects, and prepare & publish documents, all in one place

Quarto

  • Quarto is a file format for making dynamic documents: documents that contain text and chunks of embedded R code
  • Think Word Document or Google Doc with code in it

Assessment in this course

  • Homework: Each class will include a lesson (30-40 minutes) and time to work on your daily homework assignment (30-40 minutes)
  • Midterm Project: can be completed individually or in pairs
  • Final Project: can be completed individually or in pairs

Late Work

  • When you become aware that you won’t be able to make a deadline, please email me (Professor Emily) to let me know which homework you won’t be submitting on time and the date you anticipate it will be done. You do not need to disclose why you are missing the deadline. So long as you communicate with me before the deadline, no late penalty will be applied.
  • If you do not communicate with me before the deadline, late submissions will be subject to a penalty of 20% per day.

Academic Integrity

  • Unless explicitly stated otherwise, you may make use of online resources (e.g. StackOverflow) for coding examples on assignments. If you directly use code from an outside source (or use it as inspiration), you must explicitly cite where you obtained the code. Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism.
  • On individual assignments, you may discuss the assignment with one another; however, you may not directly share code or write-ups with other students.

Let’s get to it!