Note: no evening help sessions this week – both David and Ellen need to cancel. To make up for this, I will hold extra office hours in GDC 7.516 on Wednesday of this week from 4:30 to 5:30 PM.

On Monday, we will discuss two advanced topics regarding prediction intervals:

  • how to form prediction intervals in models that involve a transformation of the y variable. We’ll learn this in the context of the homework problem on supply and demand for milk. Download “milk.R” from the R Scripts tab above.
  • diagnosing when “simple” prediction intervals break down (in the presence of heteroskedasticity, or non-constant variance), and fixing them using quantile regression. Download “hetpred.R” from the R Scripts tab above. For those who want to read a simple (and entirely optional) overview of quantile regression, this page is pretty accessible.
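For a rough sense of the first topic before class, here is a minimal sketch of a prediction interval in a model fit to log(y). It is written in Python rather than R, uses simulated data (not the milk data), and assumes the "simple" interval means the point prediction plus or minus two residual standard deviations; the helper name `fit_line` and all numbers are made up for illustration.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))
    return a, b, sd

# Simulated (made-up) demand data: log(sales) is linear in log(price).
random.seed(1)
price = [random.uniform(1.0, 5.0) for _ in range(200)]
sales = [math.exp(4.5 - 1.6 * math.log(p) + random.gauss(0, 0.25)) for p in price]

# Fit the model on the log-log scale.
a, b, sd = fit_line([math.log(p) for p in price], [math.log(s) for s in sales])

# Step 1: form the interval on the log scale, where the model was fit.
new_price = 3.0
log_center = a + b * math.log(new_price)
log_lo, log_hi = log_center - 2 * sd, log_center + 2 * sd

# Step 2: undo the transformation by exponentiating the endpoints.
lo, hi = math.exp(log_lo), math.exp(log_hi)
print(f"predicted sales at price {new_price}: {math.exp(log_center):.1f}")
print(f"approximate 95% prediction interval: ({lo:.1f}, {hi:.1f})")
```

Note that the interval is symmetric on the log scale but not on the original scale: after exponentiating, the upper endpoint sits farther from the point prediction than the lower one.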

We will also finish our discussion of R^2 and the decomposition of variance. We will then work on a case study anchored in the first half of Chapter 4 of the course packet, about grouping variables in regression models. The goal is to build a model to predict flaws in a manufacturing process for printed circuit boards.
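As a reminder of what the decomposition of variance says, the sketch below checks it numerically on simulated data. It is in Python rather than R, and the variable names (`total`, `predictable`, `unpredictable`) are my own labels for the three sums of squares, not names taken from the course scripts.

```python
import random
import statistics

# Simulated (made-up) data from a straight line plus noise.
random.seed(4)
x = [random.uniform(0, 10) for _ in range(100)]
y = [3.0 + 0.5 * xi + random.gauss(0, 1.0) for xi in x]

# Least-squares fit of y on x.
xbar, ybar = statistics.fmean(x), statistics.fmean(y)
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar
fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Three sums of squares: total, explained by the fit, and left over.
total = sum((yi - ybar) ** 2 for yi in y)
predictable = sum((fi - ybar) ** 2 for fi in fitted)
unpredictable = sum(r ** 2 for r in resid)

# R^2 is the share of total variation captured by the fitted values.
r_squared = predictable / total
print(f"total = {total:.2f}, predictable + unpredictable = {predictable + unpredictable:.2f}")
print(f"R^2 = {r_squared:.3f}")
```

For a least-squares fit with an intercept, the two printed totals agree up to rounding error, which is why R^2 can be read as a fraction between 0 and 1.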

On Wednesday, we will spend a lot of time in class talking about main effects and interactions. This will finish off the material from Chapter 4. See the “solder.R” and “TenMileRace.R” scripts from the R Scripts tab above. Topics include:

  • interpreting main effects and interactions in a model with only grouping variables as predictors
  • detecting interactions graphically (e.g. boxplots on combinations of features)
  • using an ANOVA table to reason about the practical significance of a main effect or interaction in improving the fit of a model
  • interpreting main effects and interactions in models with both a numerical and a categorical predictor (changing the intercept vs. changing the slope)
  • detecting interactions of numerical and categorical variables graphically (parallel lines, versus lines with different slopes)
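To make "changing the intercept vs. changing the slope" concrete, here is a minimal sketch on simulated data with one numerical predictor and one two-level grouping variable. It is in Python rather than R and is not taken from solder.R or TenMileRace.R. It fits a separate line in each group, which gives the same coefficients as a single model with a main effect for the group and a group-by-x interaction; the group labels and numbers are made up.

```python
import random

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b * xbar, b

# Simulated (made-up) data: group B has a higher intercept AND a steeper slope.
random.seed(2)
data = []
for _ in range(300):
    g = random.choice(["A", "B"])
    x = random.uniform(0, 10)
    mean = 2.0 + 1.0 * x if g == "A" else 5.0 + 1.8 * x
    data.append((g, x, mean + random.gauss(0, 1.0)))

# Fit one line per group.
fits = {}
for g in ("A", "B"):
    xs = [x for grp, x, _ in data if grp == g]
    ys = [y for grp, _, y in data if grp == g]
    fits[g] = fit_line(xs, ys)

# Main effect of group: how much the intercept shifts from A to B.
intercept_shift = fits["B"][0] - fits["A"][0]
# Interaction: how much the slope on x changes from A to B.
slope_shift = fits["B"][1] - fits["A"][1]

print(f"intercept shift (main effect of group): {intercept_shift:.2f}")
print(f"slope shift (interaction with x):       {slope_shift:.2f}")
```

A slope shift near zero would correspond to parallel lines in a plot; a clearly nonzero shift corresponds to lines with different slopes.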

Readings

For Wednesday, please finish Chapter 4 of the course packet, picking up with the section on “Numerical and grouping variables together.”

For Monday of next week, please read Chapter 5, on quantifying uncertainty using the bootstrap. We will cover the sections from “Confidence intervals and coverage” onward in class next Monday, so reading those sections beforehand is optional. You can skip the final section, “Bootstrapped Prediction Intervals,” which we will not cover in this course.

Videos

For Monday of next week, please watch the following videos:

Software

Outside of class, complete the following R walkthroughs.

For Wednesday of this week:

  • House prices: modeling numerical outcomes with both numerical and categorical predictors.

For next week:

  • Gone fishing: using the Monte Carlo method to simulate the sampling distributions of the sample mean and of the least-squares estimator of a regression line.
  • Creatinine, revisited: bootstrapping the sample mean and the OLS estimator; computing confidence intervals from bootstrapped samples; standard errors and confidence intervals from the normality assumption.
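As a preview of the bootstrap ideas in these walkthroughs, here is a minimal sketch of bootstrapping a sample mean. It is in Python rather than R, uses simulated data (not the creatinine data), and the function name `bootstrap_means` is made up for illustration. It computes two intervals: one from the percentiles of the bootstrapped means, and one from the normality assumption (estimate plus or minus two bootstrap standard errors).

```python
import random
import statistics

# Simulated (made-up) sample of 80 measurements.
random.seed(3)
sample = [random.gauss(100, 15) for _ in range(80)]

def bootstrap_means(data, n_boot=2000):
    """Resample the data with replacement n_boot times; return the mean of each resample."""
    n = len(data)
    return [statistics.fmean(random.choices(data, k=n)) for _ in range(n_boot)]

boot = sorted(bootstrap_means(sample))
center = statistics.fmean(sample)

# Bootstrap standard error: the spread of the resampled means.
se = statistics.stdev(boot)

# Interval 1: middle 95% of the bootstrapped means.
pct_ci = (boot[round(0.025 * len(boot))], boot[round(0.975 * len(boot)) - 1])

# Interval 2: normal approximation using the bootstrap standard error.
normal_ci = (center - 2 * se, center + 2 * se)

print(f"sample mean: {center:.2f}, bootstrap SE: {se:.2f}")
print(f"percentile interval: ({pct_ci[0]:.2f}, {pct_ci[1]:.2f})")
print(f"normal interval:     ({normal_ci[0]:.2f}, {normal_ci[1]:.2f})")
```

For a sample mean with this many observations, the two intervals should be close to each other; they can differ more for skewed statistics or small samples.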

Exercises

This week's exercises, Exercises 3, cover regression models that incorporate grouping variables. They are due in class on February 13.