Statistics and Data Science: A Modeling Approach
College / Advanced Statistics and Data Science (ABCD)
13.4 Inference for Targeted Model Comparisons
By using targeted model comparisons, we can compare a complex model (with two predictors) to a simpler one with just a single predictor. This allows us to see how one variable (e.g., HomeSizeK) in the multivariate model uniquely improves the fit of the model to the data, even after controlling for the effect of other predictors (e.g., Neighborhood).
But the mere fact that HomeSizeK reduces error in our data, compared with a model that doesn't include it, does not show that the complex model is a better model of the DGP. For that, we need to rule out the possibility that the simple model of the DGP could have produced an F (or PRE) for the HomeSizeK effect as large as the one we observed in the data.
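One way to make "could the simple model have produced this F?" concrete is to simulate it. The Python sketch below is not the course's R code. It invents a hypothetical DGP in which the HomeSizeK coefficient is exactly 0, fits both models to many simulated samples, and counts how often the F for HomeSizeK is at least as large as the 11.626 reported in the ANOVA table for this section. The hypothetical DGP has these features:

- 32 homes, split evenly between a reference neighborhood and Eastside.
- Made-up home sizes.
- An intercept of 150 and an Eastside effect of 50.
- Normally distributed error with a standard deviation of 60.

All of these numbers are assumptions, and the function names are hypothetical. With normal errors and fixed predictor values, the F for this comparison has the same distribution whatever intercept, Neighborhood effect, or error spread is chosen. The simulated proportion should therefore land close to the table's p-value. To avoid a matrix library, the sketch fits the complex model with a within-neighborhood shortcut. Adding a common HomeSizeK slope to the neighborhood means removes (Σ dx·dy)² / Σ dx² of error, where dx and dy are deviations from each neighborhood's means.

```python
import random

N_HOMES = 32          # 31 total df in the ANOVA table implies 32 homes
OBSERVED_F = 11.626   # F for HomeSizeK from the ANOVA table


def f_for_home_size(groups, home_size_k, price_k):
    """F comparing PriceK ~ Neighborhood + HomeSizeK to PriceK ~ Neighborhood."""
    sse_simple = 0.0  # error left by the neighborhood means (simple model)
    sxy = 0.0         # within-neighborhood cross-products of HomeSizeK and PriceK
    sxx = 0.0         # within-neighborhood squared deviations of HomeSizeK
    for idx in groups:
        y_bar = sum(price_k[i] for i in idx) / len(idx)
        x_bar = sum(home_size_k[i] for i in idx) / len(idx)
        for i in idx:
            dy = price_k[i] - y_bar
            dx = home_size_k[i] - x_bar
            sse_simple += dy * dy
            sxy += dx * dy
            sxx += dx * dx
    ss_home = sxy * sxy / sxx             # error reduced by adding HomeSizeK
    sse_complex = sse_simple - ss_home    # error left by the complex model
    df_error = len(price_k) - 3           # complex model estimates 3 parameters
    return (ss_home / 1) / (sse_complex / df_error)


def simulate_p(n_sims=20000, seed=2024):
    """Proportion of simulated samples (beta_2 = 0) with F >= OBSERVED_F."""
    rng = random.Random(seed)
    # Hypothetical design: 16 homes in the reference neighborhood (0) and
    # 16 in Eastside (1), with made-up sizes held fixed across samples.
    eastside = [0] * (N_HOMES // 2) + [1] * (N_HOMES // 2)
    home_size_k = [rng.uniform(1.0, 3.0) for _ in range(N_HOMES)]
    groups = [[i for i in range(N_HOMES) if eastside[i] == g] for g in (0, 1)]
    hits = 0
    for _ in range(n_sims):
        # Hypothetical DGP: the HomeSizeK coefficient is 0
        price_k = [150 + 50 * e + 0 * h + rng.gauss(0, 60)
                   for e, h in zip(eastside, home_size_k)]
        if f_for_home_size(groups, home_size_k, price_k) >= OBSERVED_F:
            hits += 1
    return hits / n_sims
```

Calling simulate_p() should return a small proportion close to the HomeSizeK p-value in the ANOVA table. The exact digits depend on the seed and the number of simulations.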
For the HomeSizeK effect, we are comparing these two models of the DGP (expressed in both R code and GLM notation):
Model     R Code                               GLM Notation
Complex   PriceK ~ Neighborhood + HomeSizeK    \(PriceK_i = \beta_0 + \beta_1 NeighborhoodEastside_i + \beta_2 HomeSizeK_i + \epsilon_i\)
Simple    PriceK ~ Neighborhood                \(PriceK_i = \beta_0 + \beta_1 NeighborhoodEastside_i + \colorbox{yellow}{(0)} HomeSizeK_i + \epsilon_i\)
The highlighted (0) shows a different way of describing the simple Neighborhood model: it is the complex model with the coefficient for HomeSizeK fixed at 0, so HomeSizeK has no additional effect. Could this simpler DGP produce an F as large as the one we observed in our data?
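Written in terms of error, the targeted comparison looks like the formulas below. Fixing \(\beta_2\) at 0 means the simple model estimates one fewer parameter, so it keeps one more error degree of freedom. With 32 homes (the 31 total df in the ANOVA table for this section), the complex model has 3 parameters and 29 error df. The simple model has 2 parameters and 30 error df. The difference \(SS_{Error}(\text{simple}) - SS_{Error}(\text{complex})\) is the SS reported on the HomeSizeK row of that table.

\[
PRE = \frac{SS_{Error}(\text{simple}) - SS_{Error}(\text{complex})}{SS_{Error}(\text{simple})}
\]

\[
F = \frac{\left[SS_{Error}(\text{simple}) - SS_{Error}(\text{complex})\right] / (30 - 29)}{SS_{Error}(\text{complex}) / 29}
\]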
F and p-value in the ANOVA Table
The answer to this question is summarized by the p-values in the ANOVA table below. The supernova() function uses a mathematical model of the F distribution that assumes the simpler of the two models being compared is the true model of the DGP. It then calculates how likely it would be to get an F at least as large as the observed one in a world in which the simpler model is true and any apparent effect of the additional predictor is due only to randomness.
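supernova() is an R function, and its internals are not shown here. As a rough Python sketch of the same kind of calculation, the code below computes the upper-tail probability of an F distribution. This is the chance of an F at least as large as the observed one when the simpler model is true. It relies on the standard identity P(F ≥ f) = I_x(df_error/2, df_effect/2), with x = df_error / (df_error + df_effect·f). Here I is the regularized incomplete beta function, evaluated with a continued-fraction expansion (modified Lentz's method). The function names are hypothetical; in practice a statistics library would supply this distribution.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function
    (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use whichever expansion converges quickly for this x
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail_p(f_value, df_effect, df_error):
    """P(F >= f_value) when F follows an F(df_effect, df_error) distribution."""
    if f_value <= 0:
        return 1.0
    x = df_error / (df_error + df_effect * f_value)
    return reg_inc_beta(df_error / 2.0, df_effect / 2.0, x)
```

Plugging in values from the ANOVA table, f_upper_tail_p(11.626, 1, 29) comes out close to the .0019 reported for HomeSizeK. f_upper_tail_p(17.216, 2, 29) falls well below .0001, which supernova displays as .0000.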
Analysis of Variance Table (Type III SS)
Model: PriceK ~ Neighborhood + HomeSizeK

                               SS  df         MS       F     PRE      p
Model (error reduced)  124402.900   2  62201.450  17.216  0.5428  .0000
Neighborhood            27758.138   1  27758.138   7.683  0.2094  .0096
HomeSizeK               42003.739   1  42003.739  11.626  0.2862  .0019
Error (from model)     104774.201  29   3612.903
Total (empty model)    229177.101  31   7392.810
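Every column of the table except p can be recomputed from the SS and df columns. The Python sketch below uses hypothetical names and values copied from the table above. It shows the arithmetic:

- Each MS is SS/df.
- Each F is that row's MS divided by the MS for Error.
- The Model row's PRE is its SS divided by the Total SS.
- The PRE for a single predictor is its SS divided by the error of the simpler model it is compared with. That error is the predictor's SS plus the Error SS.

```python
# SS and df copied from the supernova() ANOVA table (Type III SS)
ss_by_term = {
    "Model": 124402.900,
    "Neighborhood": 27758.138,
    "HomeSizeK": 42003.739,
    "Error": 104774.201,
    "Total": 229177.101,
}
df_by_term = {"Model": 2, "Neighborhood": 1, "HomeSizeK": 1, "Error": 29, "Total": 31}


def mean_square(term):
    return ss_by_term[term] / df_by_term[term]


def f_ratio(term):
    # Error reduced per df, relative to error left per df in the complex model
    return mean_square(term) / mean_square("Error")


def pre(term):
    if term == "Model":
        # Overall: error reduced relative to the empty model's total error
        return ss_by_term["Model"] / ss_by_term["Total"]
    # Targeted: error reduced relative to the simpler model's error,
    # which is the complex model's error plus what this term removed
    return ss_by_term[term] / (ss_by_term[term] + ss_by_term["Error"])


for term in ("Model", "Neighborhood", "HomeSizeK"):
    print(f"{term:<13} MS={mean_square(term):10.3f}  F={f_ratio(term):7.3f}  PRE={pre(term):.4f}")
print(f"{'Error':<13} MS={mean_square('Error'):10.3f}")
```

The Neighborhood and HomeSizeK rows do not add up to the Model row (27758.138 + 42003.739 is much less than 124402.900). Each Type III SS counts only the error a predictor reduces beyond what the other predictor already explains.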
The p-value on the Model row (.0000) means that there is less than a .0001 chance that the simple model (which, for this row, is the empty model) could generate an F as large as the overall F we observed (17.216). This small p-value indicates that we should reject the simple (empty) model.
The p-value for HomeSizeK (.0019) is also very small, so we should reject the simpler model in this comparison as well. This p-value means that if HomeSizeK adds no predictive value in the DGP, the probability of getting an F as large as 11.626 for HomeSizeK in the multivariate model is very low (.0019). Based on this, we would reject the simple model that includes only Neighborhood and go with the complex model that includes both Neighborhood and HomeSizeK.