On Monday, we will take some time to practice exploratory data analysis and visualization skills in class. There are two short case studies. First, there’s just a single variable to work with: calorie counts for meals at Chipotle. Follow the link and answer the questions for discussion in class.
For the second case, we’ll work with the data set on cars from the course packet, which has many more variables. Here, you should:
1) Download the cars.csv data.
2) Use what you learned in last week’s R walkthroughs to replicate Figures 1.11 and 1.12 from the course packet.
Being able to do this will set you up for success on the first homework question this week.
We will then have a short lecture that introduces the idea of fitting simple regression models via ordinary least squares (OLS). This material covers Chapter 2 of the course packet, up to the section on “Beyond Straight Lines.” Key concepts:
- the parameters of a statistical model
- the least-squares criterion
- fitted values and residuals in a linear regression model
- interpretation of the intercept and slope
- goals of regression analysis.
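As a preview of these concepts, here is a minimal R sketch of a least-squares fit. The data are simulated purely for illustration (they are not from the course packet), and the variable names are made up. It computes the slope and intercept from the standard closed-form formulas, forms the fitted values and residuals, and compares the result with R's built-in lm().

```r
# Simulated data for illustration only (not from the course packet).
set.seed(1)
x <- runif(50, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(50, sd = 1.5)

# Parameters that minimize the sum of squared residuals,
# using the closed-form formulas for a straight line.
slope_hat <- cov(x, y) / var(x)
intercept_hat <- mean(y) - slope_hat * mean(x)

# Fitted values are the model's predictions; residuals are what is left over.
fitted_vals <- intercept_hat + slope_hat * x
resids <- y - fitted_vals

# The same fit using lm(), for comparison.
fit <- lm(y ~ x)
print(coef(fit))
print(c(intercept = intercept_hat, slope = slope_hat))

# The least-squares criterion: the quantity the fit makes as small as possible.
print(sum(resids^2))
```

The slope estimates how much y changes, on average, for a one-unit change in x, and the intercept is the fitted value at x = 0. With an intercept in the model, the least-squares residuals sum to zero up to rounding error.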
On Wednesday we will practice fitting linear models in class. Then we will introduce the idea of fitting nonlinear curves using the least-squares criterion. Most of this material is covered in the videos.
Videos
Please watch the following short videos for class next week:
- Fitting polynomial models. Covers polynomial regression models and overfitting.
- Exponential growth/decay curves
- Power laws and the log-log transformation
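To give a sense of how the last two topics connect to straight-line fitting, here is a rough R sketch on simulated data (the numbers and variable names are invented for illustration, not taken from the videos). An exponential curve y = a * exp(r * t) becomes a straight line after taking the log of y, and a power law y = a * x^b becomes a straight line after taking the log of both x and y, so lm() can fit either one.

```r
# Simulated data for illustration only.
set.seed(2)

# Exponential growth: log(z) is linear in time.
time_pts <- seq(0, 10, length.out = 100)
z <- 5 * exp(0.3 * time_pts) * exp(rnorm(100, sd = 0.1))
fit_exp <- lm(log(z) ~ time_pts)
rate_hat <- coef(fit_exp)[["time_pts"]]        # estimated growth rate

# Power law: log(y) is linear in log(x).
x <- runif(100, min = 1, max = 50)
y <- 4 * x^(-0.7) * exp(rnorm(100, sd = 0.1))
fit_power <- lm(log(y) ~ log(x))
b_hat <- coef(fit_power)[["log(x)"]]           # estimated exponent
a_hat <- exp(coef(fit_power)[["(Intercept)"]]) # estimated multiplier

print(c(rate = rate_hat, exponent = b_hat, multiplier = a_hat))
```

The simulated noise here is multiplicative, which is the setting where fitting on the log scale is most natural.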
Software practice
Complete the following software walkthroughs. For Wednesday of this week:
- Asking prices of pickup trucks: fitting straight lines
To prepare for next week:
- Utility bills versus temperature: adding polynomial terms to fit nonlinear curves
- Infant mortality and GDP: using log transformations to fit power laws via linear least squares
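If you want a feel for polynomial terms before starting the walkthroughs, the following R sketch may help. It uses simulated bill and temperature values (not the walkthrough's data; the names temp and bill are placeholders) and fits a line, a quadratic, and a degree-10 polynomial to the same points.

```r
# Simulated data for illustration only (not the walkthrough's data set).
set.seed(3)
temp <- runif(80, min = 10, max = 90)
bill <- 200 - 5 * temp + 0.05 * temp^2 + rnorm(80, sd = 8)

# A straight line, a quadratic, and a deliberately high-degree polynomial.
fit_line <- lm(bill ~ temp)
fit_quad <- lm(bill ~ temp + I(temp^2))   # I() protects ^ inside a formula
fit_high <- lm(bill ~ poly(temp, 10))     # degree-10 polynomial

# Residual sum of squares for each fit.
rss <- c(line     = sum(resid(fit_line)^2),
         quad     = sum(resid(fit_quad)^2),
         degree10 = sum(resid(fit_high)^2))
print(rss)
```

Because each model contains the previous one as a special case, the residual sum of squares can only go down as terms are added. A smaller value therefore does not by itself mean a better model, which is why overfitting is a concern with high-degree polynomials.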
Readings
From the course packet, please read all of Chapter 2. Suggested pacing would be:
- Sections on “Fitting Straight Lines” and “Goals of Regression Analysis” by class on Wednesday.
- The section on “Beyond Straight Lines” by the beginning of class next week.
Good optional readings are Chapter 2 and Chapter 3 of the Kaplan book. These chapters overlap, although not entirely, with Chapter 1 of the course packet.
Another good optional reading is Chapter 3 of Tufte’s book, pages 65-73. The examples on pages 74-107 are also good, but there are a lot of them. Feel free to consult these if you want to see examples of regression analysis being applied to real problems.
Exercises
The first set of exercises is due at the beginning of class on January 29. These will have you practice the data exploration and simple model-fitting skills you’ve learned in the walkthroughs from the first two weeks of class.