8.6 Assessing Model Fit with Sum of Squares
Finally, let’s examine the fit of our regression model by running the supernova() function on it. At the same time, let’s compare the table we get from the regression model (Height_model) with the one we produced before for the Height2Group_model.
supernova(Height2Group_model)
supernova(Height_model)
Height2Group Model
Analysis of Variance Table (Type III SS)
Model: Thumb ~ Height2Group
                               SS  df      MS      F    PRE     p
----- --------------- | --------- --- ------- ------ ------ -----
Model (error reduced) |   830.880   1 830.880 11.656 0.0699 .0008
Error (from model)    | 11049.331 155  71.286
----- --------------- | --------- --- ------- ------ ------ -----
Total (empty model)   | 11880.211 156  76.155
Height Model
Analysis of Variance Table (Type III SS)
Model: Thumb ~ Height
                               SS  df       MS      F    PRE     p
----- --------------- | --------- --- -------- ------ ------ -----
Model (error reduced) |  1816.862   1 1816.862 27.984 0.1529 .0000
Error (from model)    | 10063.349 155   64.925
----- --------------- | --------- --- -------- ------ ------ -----
Total (empty model)   | 11880.211 156   76.155
Remember, the total sum of squares is the sum of squared deviations (or more generally, residuals) from the empty model. Total sum of squares is all about the outcome variable and isn’t affected by the explanatory variable or variables. When we compare statistical models, as we are doing here, we are always modeling the same outcome variable, which is why SS Total is 11880.211 in both tables.
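In symbols, the definition can be sketched as follows. The notation is an assumption made for illustration: Y_i stands for each person's thumb length, \bar{Y} for the mean of Thumb (the empty model's prediction), and n for the number of people.

```latex
SS_{\text{Total}} = \sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2
```

No explanatory variable appears anywhere in this expression, so fitting a different model to the same outcome cannot change it.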
Partitioning Sums of Squares
If you want to try out the app Dr. Ji uses in the video, you can use the sum of squares applet at http://www.rossmanchance.com/applets/RegShuffle.htm. Copy and paste the data below into the little “sample data” box to reproduce Ji’s examples.
Height2Group | Thumb |
0 | 56 |
0 | 60 |
1 | 61 |
0 | 63 |
1 | 64 |
1 | 68 |
Height | Thumb |
62 | 56 |
66 | 60 |
67 | 61 |
63 | 63 |
68 | 64 |
71 | 68 |
For any model with an explanatory variable (what we have been calling “complex models”), the SS Total can be partitioned into the SS Error and the SS Model. The SS Model is the amount by which the error is reduced under the complex model (e.g., the Height model) compared with the empty model.
As we developed previously for the group models, SS Model is easily calculated by subtracting SS Error from SS Total. This is the same, regardless of whether you are fitting a group model or a regression model. Error from the model is defined in the former case as residuals from the group means, and in the latter, residuals from the regression line.
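As a quick check, the subtraction can be worked through with the values reported in the two supernova() tables. The last two lines use the fact that PRE in the table is SS Model divided by SS Total; the subscripted symbols are just shorthand for the table's row labels.

```latex
\begin{aligned}
SS_{\text{Model}} &= SS_{\text{Total}} - SS_{\text{Error}} \\
\text{Height2Group model:}\quad SS_{\text{Model}} &= 11880.211 - 11049.331 = 830.880 \\
\text{Height model:}\quad SS_{\text{Model}} &= 11880.211 - 10063.349 = 1816.862 \\
\text{Height2Group model:}\quad PRE &= 830.880 / 11880.211 \approx 0.0699 \\
\text{Height model:}\quad PRE &= 1816.862 / 11880.211 \approx 0.1529
\end{aligned}
```

Both results match the Model (error reduced) rows, and the Height model reduces more of the error from the empty model than the Height2Group model does.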
It is also possible to calculate the SS Model in the regression model directly, in much the same way we did for the group model. Recall that for the group model, SS Model was the sum of the squared deviations of each person’s predicted score (their group mean) from the Grand Mean. In the regression model, SS Model is calculated in exactly the same way, except that each person’s predicted score is a point on the regression line. The Grand Mean is the same in both cases.
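The direct calculation can be tried out on the six-row sample data from the applet tables above. This is a minimal sketch in base R, not the course's own code: it uses lm() and fitted() rather than supernova(), and the helper name ss_partition and the object names group_fit and height_fit are hypothetical, chosen only for this illustration.

```r
# Six-row sample data copied from the applet tables in this section
Thumb        <- c(56, 60, 61, 63, 64, 68)
Height       <- c(62, 66, 67, 63, 68, 71)
Height2Group <- factor(c(0, 0, 1, 0, 1, 1))

# Hypothetical helper: compute each sum of squares directly for a fitted model
ss_partition <- function(model, outcome) {
  grand_mean <- mean(outcome)   # prediction of the empty model
  predicted  <- fitted(model)   # group mean or point on the regression line
  c(
    SS_Model = sum((predicted - grand_mean)^2),  # predictions vs. Grand Mean
    SS_Error = sum((outcome - predicted)^2),     # residuals from the model
    SS_Total = sum((outcome - grand_mean)^2)     # residuals from the empty model
  )
}

group_fit  <- lm(Thumb ~ Height2Group)  # group model
height_fit <- lm(Thumb ~ Height)        # regression model

group_ss  <- ss_partition(group_fit, Thumb)
height_ss <- ss_partition(height_fit, Thumb)

# One row per model: SS Model + SS Error should equal SS Total in each row
print(round(rbind(Height2Group = group_ss, Height = height_ss), 3))
```

The same helper handles both models because the only thing that differs is where the predicted scores come from. SS Total comes out identical in the two rows, since it depends only on Thumb. If the supernova package is loaded, calling supernova() on either fitted model should report the same three sums of squares.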