## Course Outline

• segmentGetting Started (Don't Skip This Part)
• segmentIntroduction to Statistics: A Modeling Approach
• segmentPART I: EXPLORING VARIATION
• segmentChapter 1 - Welcome to Statistics: A Modeling Approach
• segmentChapter 2 - Understanding Data
• segmentChapter 3 - Examining Distributions
• segmentChapter 4 - Explaining Variation
• segmentPART II: MODELING VARIATION
• segmentChapter 5 - A Simple Model
• segmentChapter 6 - Quantifying Error
• segmentChapter 7 - Adding an Explanatory Variable to the Model
• segmentChapter 8 - Models with a Quantitative Explanatory Variable
• segmentPART III: EVALUATING MODELS
• segmentChapter 9 - Distributions of Estimates
• segmentChapter 10 - Confidence Intervals and Their Uses
• segmentChapter 11 - Model Comparison with the F Ratio
• segmentChapter 12 - What You Have Learned
• segmentResources

## Using the F Ratio to Compare Multiple Groups

L_Ch11_UsingF_1

Back in Chapter 3 we developed a model we called the Height3Group.model to explain thumb length (Thumb). We created a categorical explanatory variable by divying up the students into three groups by their Height using the ntile() function. We then plotted Thumb as a function of Height3Group, pictured above.

We developed the idea at that time that Height3Group explains some of the variation of thumb length in our sample. We noted that just by visually inspecting the boxplots, you can see that knowing someone’s height (whether they are short, medium, or tall) will help you make a slightly better guess as to the length of their thumb than if you didn’t know which height category they were in.

L_Ch11_UsingF_2

From the boxplots, we can see a pretty clear relationship between height and thumb length in our sample. But this relationship could be the result of random sampling variation. Just because we have a relationship in our sample, it doesn’t mean the same relationship exists in the DGP.

We know a lot more now than we did in Chapter 3! Let’s see how we could re-cast this question in terms of model comparison using F. The two models we want to compare are about the three-group model and the empty model in the population. Here they are in GLM notation:

1. $$Y_i=\beta_{0}+\beta_{1}X_{1i}+\beta_{1}X_{2i}+\epsilon_i$$

2. $$Y_i=\beta_{0}+\epsilon_i$$

Fit these models (the Height3Group model and the empty model) in the DataCamp window below.

 #load packages require(ggformula) require(mosaic) require(supernova) require(Lock5Data) require(Lock5withR) require(okcupiddata) require(tidyverse) Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/fingers.csv", header=TRUE, sep=",") Fingers$Height3Group <- factor(ntile(Fingers$Height, 3), levels = c(1:3), labels = c("short","medium","tall"))   # fit and print the Height3Group model of Thumb length # fit and print the empty model of Thumb length   # fit and print the Height3Group model of Thumb length lm(Thumb ~ Height3Group, data = Fingers) # fit and print the empty model of Thumb length lm(Thumb ~ NULL, data = Fingers)   test_function("lm") test_output_contains("lm(Thumb ~ Height3Group, data = Fingers)") test_output_contains("lm(Thumb ~ NULL, data = Fingers)") test_error() success_msg("Keep up the hard work!") 
DataCamp: ch11-15b

L_Ch11_UsingF_3

In the next DataCamp window, run thesupernova() function on the Height3Group model to look at how it compares with the empty model.

 #load packages require(ggformula) require(mosaic) require(supernova) require(Lock5Data) require(Lock5withR) require(okcupiddata) Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/fingers.csv", header=TRUE, sep=",") Fingers$Height3Group <- factor(ntile(Fingers$Height, 3), levels = c(1:3), labels = c("short","medium","tall"))   Height3Group.model<-lm(Thumb ~ Height3Group,data = Fingers) # print the ANOVA table comparing Height3Group.model to the empty model   Height3Group.model<-lm(Thumb ~ Height3Group,data = Fingers) # print the ANOVA table comparing Height3Group.model to the empty model supernova(Height3Group.model)   test_object("Height3Group.model") test_function("supernova") test_error() success_msg("Great thinking!") 
DataCamp: ch11-16

L_Ch11_UsingF_4

Based on the F statistic, we can say that the complex model based on the group means of short, medium, and tall height does a significantly better job explaining thumb length than the simple model. It is highly unlikely that the F statistic we observed would have occurred just by chance. The effect is real.

On the other hand, although it makes sense for taller people to have longer thumbs, we wouldn’t want to say that being taller causes one to have longer thumbs. It seems likely that both are caused by other factors that, when working together, produce bigger people.

### Randomizing to Compare Models

The p-level reported in the ANOVA table above (.0000) is the probability of getting an F as extreme or more extreme as 12.77. This probability was calculated using the F distribution for 2 and 154 degrees of freedom. In the DataCamp window below, try running the xpf() function, putting in our sample F, and 2 and 154 as the respective degrees of freedoms.

The plot of the F distribution does not go out far enough to show you how extreme our sample F is. Notice that the x-axis only goes up to about 7 and our sample F was 12.77. The xpf() function also prints out the value of the coral red area in the console (.9999926). So the greenish-blue p-value would be .0000074 (or 1 - .9999926), too small to be depicted here or in the supernova table (which is limited to four decimal places).

What if you wanted to use randomization to get the sampling distribution of F instead of using the mathematical probability distribution for F? Remember, we’ll have to use our “If..” thinking and imagine a DGP where there is no real relationship between Thumb and Height3Group (any differences are just caused by chance).

Use the do(), fVal(), and shuffle() functions in the DataCamp window below to create a distribution of 10,000 F statistics from randomized samples. Save the randomized sampling distribution in a data frame called SDoF and print its first six rows.

 #load packages require(ggformula) require(mosaic) require(supernova) require(Lock5Data) require(Lock5withR) require(okcupiddata) Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/fingers.csv", header=TRUE, sep=",") Fingers$Height3Group <- factor(ntile(Fingers$Height, 3), levels = c(1:3), labels = c("short","medium","tall")) custom_seed(101)   # create a sampling distribution of F generated from the empty model # print the first 6 rows   # create a sampling distribution of F generated from the empty model SDoF <- do(10000) * fVal(Thumb ~ shuffle(Height3Group), data = Fingers) # print the first 6 rows head(SDoF)   test_object("SDoF") test_function("head") test_error() success_msg("Great thinking!") 
DataCamp: ch11-17

Next, make a histogram of your randomized sampling distribution.

 #load packages require(ggformula) require(mosaic) require(supernova) require(Lock5Data) require(Lock5withR) require(okcupiddata) Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/fingers.csv", header=TRUE, sep=",") Fingers$Height3Group <- factor(ntile(Fingers$Height, 3), levels = c(1:3), labels = c("short","medium","tall")) custom_seed(101)   # SDoF has been created for you SDoF <- do(10000) * fVal(Thumb ~ shuffle(Height3Group), data = Fingers) # plot these randomized F values in a histogram   # SDoF has been created for you SDoF <- do(10000) * fVal(Thumb ~ shuffle(Height3Group), data = Fingers) # plot these randomized F values in a histogram gf_histogram(~ fVal, data = SDoF, fill = "firebrick", alpha = .9)   test_function("gf_histogram") test_error() success_msg("Keep up the great work!") 
DataCamp: ch11-18

L_Ch11_Randomizing_1

Finally, use the DataCamp window below to tally the number of Fs out of the 10,000 randomized samples that were more extreme than the one observed in our study (sampleF).

 #load packages require(ggformula) require(mosaic) require(supernova) require(Lock5Data) require(Lock5withR) require(okcupiddata) Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/fingers.csv", header=TRUE, sep=",") Fingers$Height3Group <- factor(ntile(Fingers$Height, 3), levels = c(1:3), labels = c("short","medium","tall")) custom_seed(101)   SDoF <- do(10000) * fVal(Thumb ~ shuffle(Height3Group), data = Fingers) sampleF <- fVal(Thumb ~ Height3Group, data = Fingers) # how many of the randomized Fs are greater than the sampleF?   SDoF <- do(10000) * fVal(Thumb ~ shuffle(Height3Group), data = Fingers) sampleF <- fVal(Thumb ~ Height3Group, data = Fingers) # how many of the randomized Fs are greater than the sampleF? tally(~fVal > sampleF, data = SDoF)   test_object("sampleF") test_function("tally") test_error() success_msg("Great thinking!") 
DataCamp: ch11-19

Clearly, the F we observed in our sample would have been very unlikely to have occured if height (short, medium, or tall) was completely unrelated to thumb length. In terms of model comparison, this means that we should reject the simple (empty) model and adopt the complex model. The effect we observed could not be simply attributed to random sampling error. It’s a real relationship.

### Interpretation of the Multi-Group F Test

When we compare the two-group model to the empty model, as we did in the tipping study above, an extreme (as in large) F has a low likelihood of being generated by the empty model. That likelihood is called the p-value. When the p-value of the sample F is very low (smaller than .05), the difference between the two groups is significant at the .05 level. In other words, the difference between the two means is not just due to random sampling error, but to a real difference in means in the population. It allows us to say with 95% confidence that $$\beta_1\neq0$$.

The interpretation of the three-group model compared to the empty model is a little different. In the three group model, if we observe an F with a less than .05 probability of occuring by chance, it means that if the means of the three groups are all equal to each other in the population (and notice the importance of the word “if” here), the F we observed would occur in fewer than .05 of random samples from the population.

If the probability is greater than .05, we are going to stick with the empty model, and just assume that there is no real difference between the means of the three groups. But if p < .05, it means that we can be confident that the three group means are not equal to each other. It does not tell us, however, which of the three group means are likely to differ, and which might not differ. It only tells us that the three have a less than .05 chance of being exactly the same as each other.

### Simple Effects Tests

It might be good enough just to know that the complex model, the one that includes Height3Group, is significantly better than the empty model. But sometimes we want to know more than that. We want to know which of the three groups, specifically, differ from each other statistically, and which do not.

To figure this out we need to do three model comparisons, each designed to test the difference between pairs of the three groups. In the case of Height3Group, we would need to compare: 1) a model in which short versus medium people have different thumb lengths to one where they are the same (the empty model); 2) another in which medium versus tall people have different thumb lengths; and 3), a model in which tall versus short people have different thumb lengths.

The models being compared in all three cases would be represented the same way:

1. $$Y_{i}=b_{0}+e_{i}$$

2. $$Y_{i}=b_{0}+b_{1}X_{i}+e_{i}$$

In other words, we would be doing three separate two-group comparisons. Each comparison would yield a separate F statistic, which we could compare to the appropriate F distribution.