Welcome to STA 371H!

Before the first day of class

The first thing to do is to install R and then RStudio on your own computer. Detailed instructions for installing these two programs can be found here. Both are free.

R is the underlying data-analysis program we’ll use in this course, while RStudio provides a nice front-end interface to R that makes certain repetitive steps (e.g. loading data, saving plots) very simple. I will use RStudio in class most days this semester, and you will use it most weeks for your homework. RStudio depends upon having R available behind the scenes, so make sure you install both, even though you won’t need to interact directly with R.

Please install these on your own computer before the first day of class.

In class

On the first day of class, we’ll start with an overview of the course (the course syllabus is here, intro slides here).

Then we’ll talk a bit about visualization. As a gentle introduction to the material, we will go over some basic principles that separate good data visualizations from bad ones (slides here).

Time permitting, we will also take a first look at R by loading the data for the case study we’ll work on in class next week: calorie counts for meals at Chipotle.

Video lectures

These follow Chapter 1 of the course packet exactly. Please watch these before coming to class on Monday, January 22.

Reading

All readings are accessible through the Resources tab, above. For this week, please read the introduction and Chapter 1 of the course packet. The key topics are:

  • continuous and categorical/grouping variables
  • contingency tables
  • simple summaries and graphics: histogram, boxplot, dotplot, scatter plot, lattice plot
  • variation between and within groups
  • variation among numerical variables
  • multivariate plots
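
If you’d like a preview of what these topics look like in R, here is a minimal sketch in base R. It uses R’s built-in mtcars data set purely as a stand-in (an assumption for illustration; it is not the course data), and the variable names cyl_by_am and group_means are made up for this example. The commands in the course packet and the walkthroughs may differ.

```r
# Built-in example data: 32 cars from the 1974 Motor Trend road tests.
# (Stand-in data for illustration only; not the data used in class.)
data(mtcars)

# Treat cylinders and transmission type as categorical (grouping) variables.
# In mtcars, am is coded 0 = automatic, 1 = manual.
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("automatic", "manual"))

# Contingency table: number of cars in each cylinder/transmission combination.
cyl_by_am <- table(mtcars$cyl, mtcars$am)
print(cyl_by_am)

# Histogram of a continuous variable (miles per gallon).
hist(mtcars$mpg, main = "Fuel economy", xlab = "Miles per gallon")

# Boxplot: variation in mpg between and within cylinder groups.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Scatter plot: how two numerical variables vary together.
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Group means: a simple numerical summary of between-group variation.
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(group_means)
```

If you run this in RStudio, each plot should appear in the Plots pane and the table and group means in the Console.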

Optionally, you can also consult Chapter 1 of Kaplan. This is mainly useful as an introduction to R. Feel free to move rapidly if you’re feeling comfortable with the software.

Software

Once you’ve installed R and RStudio, complete the following R walkthroughs. The first two are designed to get you off the ground, so if you’re familiar with R, you can safely skip these.

Supplemental reading

The material this week and next also coincides roughly with Chapter 1 of OpenIntro: Statistics. As with the Kaplan book, you should not feel obligated to read this, but it would be a good supplement if you are looking for additional study materials or a review of your first statistics class.

Additional practice with R

If you are feeling a little uncomfortable with the idea of R, do not worry. We will practice a lot in class. But if you’d like, you can get a jump start on things by following along with all the R commands in Chapters 1-3 of Kaplan. Just replicate in your own R session exactly what he does. Don’t just copy and paste; actually type the commands yourself! It’s the best way to learn them.

Exercises

The first set of exercises will cover this week’s and next week’s material. They are due Monday, Jan 29, and they are posted here.

Just for fun

Some examples of great data visualizations from the New York Times: