On Monday, we will finish up Chapter 4 of the course packet by talking about interactions between numerical and categorical variables. Topics include:

  • interpreting main effects and interactions in models with both a numerical and a categorical predictor (changing the intercept vs. changing the slope). See TenMileRace.R and LifeExpectancy.R from the “R Scripts” tab.
  • detecting interactions of numerical and categorical variables graphically (parallel lines, versus lines with different slopes)
  • correlation among the predictors in a regression model (collinearity)
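
As a rough illustration of the first two topics, here is a minimal base-R sketch. It uses simulated data, not the data from TenMileRace.R or LifeExpectancy.R; the variable names (x, group, y) and the true coefficients are assumptions made up for this example. The main-effects model lets the group shift only the intercept (parallel lines), while the interaction model also lets the slope differ by group.

```r
# Simulated data (hypothetical; not the course data sets).
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
group <- factor(sample(c("A", "B"), n, replace = TRUE))

# Assumed truth: group B has a higher intercept (+2) and a steeper slope (+0.5).
y <- 1 + 1.0 * x + 2 * (group == "B") + 0.5 * x * (group == "B") + rnorm(n, sd = 1)
dat <- data.frame(x, group, y)

# Main effects only: one common slope; group changes the intercept.
fit_main <- lm(y ~ x + group, data = dat)

# Main effects plus interaction: group changes both intercept and slope.
fit_int <- lm(y ~ x + group + x:group, data = dat)

print(coef(fit_main))
print(coef(fit_int))

# Graphical check: plot the points by group and overlay each group's fitted line.
b <- coef(fit_int)
plot(y ~ x, data = dat, col = as.integer(group), pch = 19)
abline(a = b["(Intercept)"], b = b["x"], col = 1)                           # group A
abline(a = b["(Intercept)"] + b["groupB"], b = b["x"] + b["x:groupB"], col = 2)  # group B
```

In the fitted interaction model, the groupB coefficient is the change in intercept and the x:groupB coefficient is the change in slope for group B relative to group A. Lines that are visibly non-parallel in the plot suggest an interaction.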

We will then turn our attention to Chapter 5 of the course packet. You will have covered much of this material in last week's videos. We’ll briefly review two key topics:

  • sampling distributions and standard errors
  • bootstrapping (using the TenMileRace.R script)

In class, we will practice bootstrapping in a regression model by revisiting two recent homework problems: finishing times in a ten-mile road race, and economic growth versus life expectancy in three different groups of countries.
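
For reference, bootstrapping a regression model can be sketched in base R roughly as follows. This is only an illustration on simulated data; the course scripts may use different helper functions, and the names here (dat, n_boot, boot_slopes) are made up for the example. The idea is to resample the rows of the data set with replacement, refit the model each time, and look at how much the estimated slope varies across resamples.

```r
# Simulated data (hypothetical; stands in for a real data set).
set.seed(42)
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
dat <- data.frame(x, y)

# Refit the model on many bootstrap resamples of the rows.
n_boot <- 1000
boot_slopes <- numeric(n_boot)
for (i in seq_len(n_boot)) {
  idx <- sample(nrow(dat), replace = TRUE)   # resample row indices with replacement
  boot_fit <- lm(y ~ x, data = dat[idx, ])
  boot_slopes[i] <- coef(boot_fit)["x"]
}

# Bootstrap standard error: the standard deviation of the resampled slopes.
boot_se <- sd(boot_slopes)
print(boot_se)

# A simple 95% interval from the percentiles of the bootstrap distribution.
boot_ci <- quantile(boot_slopes, c(0.025, 0.975))
print(boot_ci)
```

The histogram of boot_slopes (for example, hist(boot_slopes)) approximates the sampling distribution of the slope.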

Time allowing, we will then talk about two more advanced topics in Chapter 5:

  • confidence intervals
  • the normal linear regression model

On Wednesday, we will finish the material from Monday. We will then start Chapter 6 of the course packet on multiple regression. The key idea here is that of a partial relationship between two variables in a multiple-regression model. The lecture material here is all covered in the videos (see below). During class time, we will work on a case study on gas prices in Austin. This is your main homework problem this week.

Software

Outside of class, complete the following R walkthroughs.

For Wednesday of this week:

  • The wage gap: an introduction to multiple regression; quantifying uncertainty in a multiple regression model by bootstrapping.

For Monday of next week:

  • Current population survey: the effect of collinearity on the estimated coefficients and ANOVA table in a multiple regression model.

Readings

If you haven’t yet finished reading Chapter 5, then please do so. Then read Chapter 6 of the course packet: through page 135 by Wednesday (reading until the section on “Using multiple regression to address real-world questions”), and the rest of the chapter by next Monday. Note that in order to complete your homework, you’ll need some of the ideas from the latter half of Chapter 6 (i.e. the part you should have read by next Monday), particularly the use of the bootstrap to quantify uncertainty in a multiple regression model.

Videos

For Wednesday of this week (Week 5):

  • Partial relationships: using multiple regression to estimate a partial relationship between two variables, holding another variable constant.
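
A small base-R sketch may help fix the idea of a partial relationship before watching the video. The data are simulated and the variable names (x, z, y) and coefficients are assumptions for this example only. Because x and z are correlated, the slope on x in a one-variable regression mixes in the effect of z; adding z to the model estimates the relationship between y and x holding z constant.

```r
# Simulated data (hypothetical): x and z are correlated, and both affect y.
set.seed(7)
n <- 500
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)
y <- 1 + 0.5 * x + 2 * z + rnorm(n)   # assumed true partial slope on x is 0.5

# Overall relationship: regress y on x alone.
fit_simple <- lm(y ~ x)

# Partial relationship: regress y on x while holding z constant.
fit_multi <- lm(y ~ x + z)

slope_overall <- coef(fit_simple)["x"]
slope_partial <- coef(fit_multi)["x"]
print(slope_overall)
print(slope_partial)
```

In this simulation the overall slope is noticeably larger than the partial slope, because part of the apparent effect of x is really due to z.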

For Monday of next week (Week 6); this video is useful for thinking about the gas-prices case study:

  • Using multiple regression: a real-world application of multiple regression to answer some questions about the real-estate market in Saratoga, NY.

Exercises

This week's exercises (Exercises 4) cover quantifying uncertainty and multiple regression modeling. They are due in class on February 19.