Course Outline

Getting Started (Don't Skip This Part)

Introduction to Statistics: A Modeling Approach

PART I: EXPLORING VARIATION

Chapter 1  Welcome to Statistics: A Modeling Approach

Chapter 2  Understanding Data

Chapter 3  Examining Distributions

Chapter 4  Explaining Variation

PART II: MODELING VARIATION

Chapter 5  A Simple Model

Chapter 6  Quantifying Error

Chapter 7  Adding an Explanatory Variable to the Model

Chapter 8  Models with a Quantitative Explanatory Variable

PART III: EVALUATING MODELS

Chapter 9  Distributions of Estimates

Chapter 10  Confidence Intervals and Their Uses

Chapter 11  Model Comparison with the F Ratio

11.1 PRE and F Ratio Revisited

Chapter 12  What You Have Learned

Resources
PRE and F Ratio Revisited
Constructing confidence intervals around parameters is a perfectly fine way to test the difference between two models. But in this chapter we are going to explore a new method, the F test. The F test is probably the most widespread method for comparing statistical models, and by now you have all the concepts you need to understand how it works.
Let’s start by going back to the model we just fit, the one we saved in an object called Condition.model. In the DataCamp window below, add code to run supernova() to get the ANOVA table for Condition.model.
# load packages
require(ggformula)
require(mosaic)
require(supernova)
require(Lock5Data)
require(Lock5withR)
require(okcupiddata)

Condition.model <- lm(Tip ~ Condition, data = TipExperiment)

# add code to get the supernova table for this model
supernova(Condition.model)
In Chapters 7 and 8 we introduced supernova() and spent considerable time developing the concept of PRE, and a little less time developing the F ratio.
PRE, or Proportional Reduction in Error, indicates the proportion of total variation in the outcome variable that can be explained by the more complex model (in this case, the two-group model we called Condition.model). The F ratio is closely related to PRE, though it also takes into account the number of degrees of freedom used to fit the model. Both are ways of quantifying the strength of the relationship between the explanatory and outcome variables. You can think of F as a measure of the strength of a relationship (like PRE) per parameter added to the model.
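In symbols, using the sums of squares reported in the supernova table (where the "empty" model is the one with no explanatory variable), these two quantities can be written as:

\[
PRE = \frac{SS_{Error(empty)} - SS_{Error(model)}}{SS_{Error(empty)}}
\]

\[
F = \frac{PRE \,/\, df_{model}}{(1 - PRE) \,/\, df_{error}}
\]

The numerator of F is the reduction in error per parameter added; the denominator is the error still remaining per leftover degree of freedom.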
The fact that the two-group model has a positive PRE is pretty much a given. Our data show that adding Condition to the model explains .07 of the error that the empty model left unexplained. But is that .07 meaningful? Is it a lot? The only way PRE could equal 0 is if there were no mean difference at all between the two groups in the sample. In that case, knowing which group a party was in (Smiley Face or Control) would add no predictive value, and thus result in no reduction in error.
But just as finding a difference between the two sample means does not by itself rule out the possibility that the true difference in the DGP could be 0, the same is true of PRE. Just because the two-group model reduces error by .07 in our sample, that doesn’t necessarily rule out the possibility that the true PRE in the population might be 0.
When we looked just at the difference in means (\(b_1\)), we constructed a sampling distribution to help us put the observed difference in context. In particular, it allowed us to ask: could the mean difference observed by the researchers have occurred just by chance if the true mean difference in the DGP were 0?
Previously, we learned to use simulation and bootstrapping to implement various DGPs that we dreamed up. How would we implement a situation in which there was no relationship between Condition and Tip, that is, one in which \(\beta_1 = 0\)? One way to break any relationship in the data is to shuffle the values of Condition so that Tip would be randomly related to whether there was a smiley face, or nothing, on the check.
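As a sketch of this idea (assuming the mosaic and supernova packages are loaded as above, and that the shuffle(), do(), and PRE() helpers behave as introduced earlier in the course):

```r
# fit the model once with Condition randomly shuffled, which breaks
# any real relationship between Condition and Tip
shuffle.model <- lm(Tip ~ shuffle(Condition), data = TipExperiment)
supernova(shuffle.model)

# repeat the shuffle many times to see what PREs can arise
# just by chance when beta1 = 0 in the DGP
sdoPRE <- do(1000) * PRE(Tip ~ shuffle(Condition), data = TipExperiment)
gf_histogram(~ PRE, data = sdoPRE)
```

Each shuffled model's PRE is one that could occur even when the true PRE is 0; the histogram shows the whole distribution of such chance PREs, against which we can compare our observed .07.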