Course Outline

list Introduction to Statistics: A Modeling Approach

PRE and F Ratio Revisited

Constructing confidence intervals around parameters is a perfectly fine way to test the difference between two models. But in this chapter we are going to explore a new method, the F test. The F test is probably the most widespread method for comparing statistical models, and by now you have all the concepts you need to understand how it works.

Let’s start by going back to the model we just fit, the one we saved in an object called Condition.model. In the DataCamp window below, add code to run supernova() to get the ANOVA table for Condition.model.

#load packages require(ggformula) require(mosaic) require(supernova) require(Lock5Data) require(Lock5withR) require(okcupiddata) Condition.model <- lm(Tip ~ Condition, data = TipExperiment) # add code to get the supernova table for this model Condition.model <- lm(Tip ~ Condition, data = TipExperiment) # add code to get the supernova table for this model supernova(Condition.model) test_function("supernova") test_function_result("supernova") test_error() success_msg("Keep up the good work!")
DataCamp: ch11-5

In Chapters 7 and 8 we introduced supernova() and spent considerable time developing the concept of PRE, and a little less time developing the F ratio.

L_Ch11_PRE_1

PRE, or Proportional Reduction in Error, indicates the percentage of total variation in the outcome variable that can be explained by the more complex model (in this case, the two-group model we called Condition.model). The F ratio is closely related to PRE, though it takes into account the number of degrees of freedom used to fit the model. Both are ways of quantifying the strength of the relationship between the explanatory and outcome variables. You can think of F as a measure of the strength of a relationship (like PRE) per parameter included in the model.

The fact that the two-group model has a positive PRE is pretty much a given. Our data shows that adding Condition to the model helps explain .07 of the error that remained unexplained. But is that .07 meaningful? Is that a lot? The only way PRE would be equal to 0 is if there were no mean difference at all between the two groups in the sample. In that case, knowing which group a party was in (Smiley Face or Control) would add no predictive value, and thus result in 0 reduction in error.

But just as finding a difference between the two means does not by itself rule out the possibility that the true difference in the DGP could be 0, the same is true of PRE. Just because in the sample distribution the two-group model reduces error by .07, that doesn’t necessarily rule out the possibility that the true PRE in the population might be 0.

When just looking at the incremental difference of means (\(b_1\)), we constructed a sampling distribution to help us put the observed difference in means in context. In particular, it allowed us to ask: could the mean difference observed by the researchers have occurred just by chance if the true mean difference in the DGP was 0?

L_Ch11_PRE_2

Previously, we learned methods of simulation and bootstrapping to help us implement various DGPs that we dreamed up. How would we implement a situation where there was no relationship, that is, a \(\beta_1=0\), between Condition and Tip? One way to break any relationship in the data is to shuffle the values of Condition so that the Tip would be randomly related to whether there was a smiley face, or nothing, on the check.

L_Ch11_PRE_3

Responses