Introduction to Statistics: A Modeling Approach

11.0 Model Comparison with the F Ratio

You should be feeling your personal statistical power grow! You now know how to specify models for explaining variation, and how to fit those models to your data by estimating the parameters. You also know how to construct sampling distributions and build confidence intervals around parameter estimates, and how to use these confidence intervals to decide which model (simple or complex) you want to retain, and which you want to reject.

With all this under your belt, you can go far! However, we would be remiss if we did not give you one more superpower: We need to teach you how to use the F ratio for directly comparing models. We developed the F ratio back in Chapter 5. We will start there. Then we will show you how everything you have learned in the meantime can be used to transform F from an esoteric statistic into a highly effective tool for model comparison. But first, let’s go back long, long ago, to the Tipping Experiment from Chapter 4.

The Tipping Experiment Revisited

Back in Chapter 4 we learned about an experiment in which researchers sought to figure out whether putting hand-drawn smiley faces on the backs of checks would cause restaurant servers to receive higher tips (Rind, B., & Bordia, P. (1996). Effect on restaurant tipping of male and female servers drawing a happy, smiling face on the backs of customers' checks. Journal of Applied Social Psychology, 26(3), 218-225).

The study was a randomized experiment. A female server in a particular restaurant was asked to either draw a smiley face—or not—for each party she served following a predetermined random sequence. The outcome variable was the amount of tip left by each party. Distributions of tip amounts in the two groups (n=22 parties per condition) are shown below.

gf_histogram(..density.. ~ Tip, data = TipExperiment, fill = "yellowgreen", color = "black", alpha = 1) %>%
gf_facet_grid(Condition ~ .) %>%
gf_labs(title = "Tip Experiment")

We learned about this experiment in Chapter 4, but now we have a lot more tools in our toolbox! Back then, we didn't even calculate the means for the two groups. We can take a look at the favstats for Tip in the two conditions of the experiment (we'll also save them as Tip.stats).

Tip.stats <- favstats(Tip ~ Condition, data = TipExperiment)
Tip.stats

Below, we made a gf_point() plot of the two groups, and overlaid the mean of each group.

gf_point(Tip ~ 1, data = TipExperiment, size=4, alpha=.5, color = "firebrick") %>%
gf_facet_grid(. ~ Condition) %>%
gf_hline(yintercept = ~mean, data = Tip.stats)

In the plot, you can clearly see there is a mean difference between the two groups. In fact, the mean for the Control group is $27, whereas the mean for the Smiley Face group is about $33, which means there was a mean difference of about $6 in favor of the Smiley Face condition.
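If you want to verify that difference yourself, here is a quick sketch (assuming Tip.stats is the favstats data frame saved above, with one row per condition):

diff(Tip.stats$mean)     # difference between the two group means (Smiley Face minus Control)

b1(Tip ~ Condition, data = TipExperiment)     # the b1() function returns the same estimate directly from the data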

Specifying a Model Comparison

The research question in this study concerned the effect of smiley faces on tips: Did parties randomly assigned to get smiley faces on their checks give higher tips than the parties that got no smiley faces?

You now have the tools to state this question in terms of a comparison between two models: the empty model, in which the grand mean is used to predict tips, versus the two-group model, in which knowing which group a party is in improves the prediction.

Using GLM notation, the researchers who conducted the tipping study were seeking to compare these two models:

\(Y_{i}=b_{0}+b_{1}X_{i}+e_{i}\) (the two-group model; we also call this “complex”)

\(Y_{i}=b_{0}+e_{i}\) (the empty model; we also call this “simple”)
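Both of these models can be fit in R with lm(). Here is a sketch; writing NULL on the right-hand side of the formula is one way to fit the empty model:

# the complex (two-group) model
lm(Tip ~ Condition, data = TipExperiment)

# the simple (empty) model: the grand mean predicts every tip
lm(Tip ~ NULL, data = TipExperiment)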


In the two-group model, \(b_0\) represents the mean of one group (for example, it could be the mean of the Control group), and \(b_1\) the increment that would be added to get to the mean of the other group (e.g., the Smiley Face group).
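To make the predictions concrete: if \(X_i\) is coded 0 for the Control group and 1 for the Smiley Face group (R makes this coding choice automatically for a two-group variable), the model's predictions work out to:

\[\hat{Y}_i = b_0 \text{ for the Control group } (X_i = 0)\]

\[\hat{Y}_i = b_0 + b_1 \text{ for the Smiley Face group } (X_i = 1)\]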

Fitting the Models

Let’s find the best-fitting estimates for the two-group model.

# fit the two-group model and save it as Condition.model
Condition.model <- lm(Tip ~ Condition, data = TipExperiment)

# print out the best-fitting estimates
Condition.model

DataCamp: ch11-1

What does it mean for these to be the best-fitting estimates? Even though we do not know what the true population parameters are, this is the best model we can come up with based on our data:

\[Y_i = 27 + 6.05X_i + e_i\]
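One way to check this against the group means (a sketch; it assumes the two levels of Condition are labeled "Control" and "Smiley Face", as in the TipExperiment data) is to ask the fitted model for its predictions directly:

# the model's predictions are just the two group means:
# 27 for Control, and 27 + 6.05 = 33.05 for Smiley Face
predict(Condition.model, newdata = data.frame(Condition = c("Control", "Smiley Face")))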

This set of estimates is “best” because it minimizes the error measured in a specific way: Sum of Squares. Let’s take a look at the SS Error (as well as the SS Total and SS Model, and let’s also look at PRE while we are at it) from this model in the DataCamp window below. (Assume that Condition.model has already been saved for you.)

# take a look at the sums of squares from the Condition.model
supernova(Condition.model)

DataCamp: ch11-2
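Where do these numbers come from? Here is a minimal sketch of computing the sums of squares by hand (assuming Condition.model has been fit as above; the names SS_Total, SS_Error, and SS_Model are ours, not from the supernova package):

# total variation: squared deviations of each tip from the grand mean
SS_Total <- sum((TipExperiment$Tip - mean(TipExperiment$Tip))^2)

# leftover variation: squared residuals from the two-group model
SS_Error <- sum(resid(Condition.model)^2)

# variation explained by the model
SS_Model <- SS_Total - SS_Error

# PRE: the proportion of total variation explained by the model
SS_Model / SS_Total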

The two-group model will, of course, reduce error compared with the empty model in the sample data.


But we know, at this point, that the observed mean difference (\(b_1\)) between the two groups in this particular sample is only one of many possible mean differences that could have occurred if different random samples had been selected.

Our real question is: Will knowing which group someone is in (Smiley Face or Control) necessarily result in an improved prediction of tips in the future? Or, to ask this another way, does the smiley face actually produce a difference in the Data Generating Process (DGP)—the tip generating process? Which of these two models is a better model of the DGP?

Comparing the Models by Constructing a Confidence Interval

In the previous chapter we compared two models by constructing a confidence interval around the parameter that differentiates the two models, which, in this case, is \(b_1\), or the difference in means between the two groups. We can also think of \(b_1\) as the increment to go from one group to the other.

By constructing a confidence interval around this difference, we can determine whether, with 95% confidence, we could rule out the possibility that \(b_1\) is equal to 0, and thus adopt the two-group model over the empty model.

To construct a confidence interval, we first need to construct a sampling distribution to put our estimate of \(b_1\) in context. One simple way to do this for the current study is to use a bootstrapping (or resampling) technique via the resample() function in R.

SDob1 <- do(10000) * b1(Tip ~ Condition, data = resample(TipExperiment, 44))


Use the DataCamp window to run the code. Then add (and run) a command to output the first six rows of the resulting data frame.

# run this code
SDob1 <- do(10000) * b1(Tip ~ Condition, data = resample(TipExperiment, 44))

# show the first 6 lines of this data frame
head(SDob1)

DataCamp: ch11-3

The column labelled b1 contains the 10,000 mean differences from the resamples of TipExperiment.


SDob1 is the sampling distribution of 10,000 increments created by resampling the 44 parties in the sample 10,000 times. We used the following code to plot this simulated sampling distribution as a histogram, with the mean of the sampling distribution marked by a vertical line:

gf_histogram(~ b1, data = SDob1, fill = "orange") %>%
gf_vline(xintercept = mean(SDob1$b1), color = "purple")
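One way to connect this picture to a confidence interval (a sketch, not code from the book) is to mark off the middle 95% of the bootstrapped \(b_1\)s:

# cutoffs for the middle 95% of the 10,000 bootstrapped b1s
lower <- quantile(SDob1$b1, .025)
upper <- quantile(SDob1$b1, .975)

gf_histogram(~ b1, data = SDob1, fill = "orange") %>%
  gf_vline(xintercept = lower, color = "purple") %>%
  gf_vline(xintercept = upper, color = "purple")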


The sampling distribution gives us a way of judging whether the true difference in the DGP could have been 0. This is important to know, because if the difference we observed could have occurred with a true difference of 0, then we don't have very good evidence that smiley faces affect tips. If the true difference could have been 0, then we should leave Condition out of our model and just stick with using the empty model.

We can see just by looking at the sampling distribution that a difference of 0 is somewhat unlikely (out in the lower tail), but certainly not completely unexpected.
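To put a number on "somewhat unlikely," one option (a sketch using the tally() function from the mosaic package) is to count the proportion of bootstrapped \(b_1\)s that fall at or below 0:

# what proportion of the 10,000 bootstrapped mean differences are 0 or less?
tally(~ b1 <= 0, data = SDob1, format = "proportion")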

To get a more exact 95% confidence interval for the mean difference in tips between the two conditions, we can use the t distribution to calculate the confidence interval around the parameter estimates from our Condition.model. Add code that will calculate the 95% confidence interval and press Run.

# calculate the 95% confidence interval around our estimates
confint(Condition.model)

DataCamp: ch11-4
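Under the hood, confint() builds this interval from the t distribution. Here is a minimal sketch of the same calculation done by hand (the names b1_est, se_b1, and t_star are ours, for illustration):

# estimate and standard error of b1 from the fitted model
b1_est <- coef(Condition.model)[2]
se_b1 <- summary(Condition.model)$coefficients[2, 2]

# critical t value with n - 2 = 42 degrees of freedom
t_star <- qt(.975, df = 42)

# 95% confidence interval: estimate plus or minus the margin of error
b1_est + c(-1, 1) * t_star * se_b1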

Based on the data from the tipping experiment, the true mean difference in the DGP could be as high as 12.76. On the other hand, it could be as low as -0.67. That's a pretty big range. Importantly, it includes the possibility that the true difference in the DGP is 0. With 95% confidence, we can't rule out the possibility that the empty model is true, and so we will stick with the empty model for now.
