College / Advanced Statistics and Data Science (ABCD)
13.4 Inference for Targeted Model Comparisons
By using targeted model comparisons, we can compare a complex model (with two predictors) to a simpler one with just a single predictor. This allows us to see how one variable (e.g., `HomeSizeK`) in the multivariate model uniquely improves the fit of the model to the data, even after controlling for the effect of other predictors (e.g., `Neighborhood`).
But the mere fact that adding `HomeSizeK` reduces error in our data, compared with a model that doesn't include it, does not show that the complex model is a better model of the DGP. For that, we need to rule out the possibility that the simple model of the DGP could have produced an F (or PRE) for the `HomeSizeK` effect as large as the one we observed in the data.
For the `HomeSizeK` effect, we are comparing these two models of the DGP (expressed in both R code and GLM notation):
| Model | R Code | GLM Notation |
|---|---|---|
| Complex | `PriceK ~ Neighborhood + HomeSizeK` | \(PriceK_i= \beta_0 + \beta_1NeighborhoodEastside_{i} + \beta_2HomeSizeK_{i} + \epsilon_i\) |
| Simple | `PriceK ~ Neighborhood` | \(PriceK_i= \beta_0 + \beta_1NeighborhoodEastside_{i} + \colorbox{yellow}{(0)}HomeSizeK_{i} + \epsilon_i\) |
We have highlighted a different way of describing the simple `Neighborhood` model: it is the complex model with the additional effect of `HomeSizeK` fixed at 0 (that is, \(\beta_2 = 0\)). Could this simpler DGP produce an F as large as the one we observed in our data?
F and p-value in the ANOVA Table
The answer to this question is summarized by the p-values in the ANOVA table below. The `supernova()` function uses a mathematical model of the F distribution, assuming that the simpler of the two models being compared is a true model of the DGP. It then calculates how likely an F as large as the observed one would be in a world in which the simpler model is true and any apparent effect of the additional predictor is due only to randomness.
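To make "a world in which the simpler model is true" a little more concrete, here is a minimal R sketch. It is not the code inside `supernova()`; it simply assumes that the F distribution with 1 and 29 degrees of freedom (the df the ANOVA table reports for `HomeSizeK` and for error) describes how F would vary from sample to sample if \(\beta_2 = 0\). It draws many Fs from that distribution with base R's `rf()` and counts how often one comes out at least as large as the observed F of 11.626. The seed and the number of draws are arbitrary choices.

```r
# Sketch only: approximate the sampling distribution of F under the simple
# model by drawing from an F distribution (assumed, not supernova's internals).
observed_f <- 11.626  # F for HomeSizeK, copied from the ANOVA table
df_model   <- 1       # one extra parameter (beta_2) in the complex model
df_error   <- 29      # error df of the complex model

set.seed(13)          # arbitrary seed so the simulation is reproducible
n_draws <- 200000     # arbitrary number of simulated samples

# Each draw stands in for the F from one sample generated by the simple model
simulated_f <- rf(n_draws, df1 = df_model, df2 = df_error)

# Proportion of simulated Fs at least as large as the observed F
sim_p <- mean(simulated_f >= observed_f)
print(sim_p)
```

The proportion is a simulation-based approximation of the p-value. The table's p-value comes from the mathematical distribution directly, so the two should be close but not identical.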
Analysis of Variance Table (Type III SS)
Model: PriceK ~ Neighborhood + HomeSizeK

| | SS | df | MS | F | PRE | p |
|---|---|---|---|---|---|---|
| Model (error reduced) | 124402.900 | 2 | 62201.450 | 17.216 | 0.5428 | .0000 |
| Neighborhood | 27758.138 | 1 | 27758.138 | 7.683 | 0.2094 | .0096 |
| HomeSizeK | 42003.739 | 1 | 42003.739 | 11.626 | 0.2862 | .0019 |
| Error (from model) | 104774.201 | 29 | 3612.903 | | | |
| Total (empty model) | 229177.101 | 31 | 7392.810 | | | |
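Every number in the `MS`, `F`, and `PRE` columns can be recomputed from the `SS` and `df` columns. The R sketch below does this arithmetic with values copied from the table: MS is SS divided by df, F is each row's MS divided by the error MS, and PRE is the row's SS divided by that SS plus the error SS. For the `Model` row, SS plus error SS is the total SS, so PRE compares the complex model to the empty model; for `Neighborhood` and `HomeSizeK`, it compares the complex model to the model that leaves out just that one predictor.

```r
# SS and df copied from the ANOVA table (rows: Model, Neighborhood, HomeSizeK)
ss       <- c(Model = 124402.900, Neighborhood = 27758.138, HomeSizeK = 42003.739)
df_row   <- c(Model = 2, Neighborhood = 1, HomeSizeK = 1)
ss_error <- 104774.201
df_error <- 29

ms       <- ss / df_row          # mean square for each row
ms_error <- ss_error / df_error  # mean square error of the complex model
f        <- ms / ms_error        # F = row MS / error MS
pre      <- ss / (ss + ss_error) # proportion of the simpler model's error reduced

print(round(cbind(MS = ms, F = f, PRE = pre), 4))
```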
The p-value on the `Model` row (.0000) means that there is less than a .0001 chance that an F as large as the overall F (17.216) could be generated by the simple model (which, for this row, is the empty model). This small p-value indicates that we should reject the empty model in favor of the complex model.
The p-value for `HomeSizeK` (.0019) is also very small. It means that, if `HomeSizeK` added no predictive value in the DGP, the probability of getting an F of 11.626 or larger for `HomeSizeK` in the multivariate model would be very low (.0019). Based on this, we would reject the simple model that includes only `Neighborhood` and go with the complex model that includes both `Neighborhood` and `HomeSizeK`.
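The p-values themselves come from the upper tail of the F distribution. As a rough check, this R sketch passes the F values and degrees of freedom from the table to base R's `pf()`. Because the F values in the table are rounded, the results may differ from the table in the last digit. Note that each row gets its own p-value: .0096 belongs to `Neighborhood` and .0019 to `HomeSizeK`.

```r
# F values and df copied from the ANOVA table (rows: Model, Neighborhood, HomeSizeK)
f_obs    <- c(Model = 17.216, Neighborhood = 7.683, HomeSizeK = 11.626)
df_row   <- c(Model = 2, Neighborhood = 1, HomeSizeK = 1)
df_error <- 29

# Upper-tail probability: the chance of an F at least this large if the
# simpler model for that row were the true DGP
p <- pf(f_obs, df1 = df_row, df2 = df_error, lower.tail = FALSE)
print(round(p, 4))  # compare with the p column of the table
```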