Course Outline

Getting Started (Don't Skip This Part)

Introduction to Statistics: A Modeling Approach

PART I: EXPLORING VARIATION

Chapter 1  Welcome to Statistics: A Modeling Approach

Chapter 2  Understanding Data

Chapter 3  Examining Distributions

Chapter 4  Explaining Variation

PART II: MODELING VARIATION

Chapter 5  A Simple Model

Chapter 6  Quantifying Error

Chapter 7  Adding an Explanatory Variable to the Model

Chapter 8  Models with a Quantitative Explanatory Variable

PART III: EVALUATING MODELS

Chapter 9  Distributions of Estimates

Chapter 10  Confidence Intervals and Their Uses

Chapter 11  Model Comparison with the F Ratio

11.0 The Tipping Experiment Revisited

Chapter 12  What You Have Learned

Resources
11.0 Model Comparison with the F Ratio
You should be feeling your personal statistical power grow! You now know how to specify models for explaining variation, and how to fit those models to your data by estimating the parameters. You also know how to construct sampling distributions, build confidence intervals around parameter estimates, and use these confidence intervals to decide which model (simple or complex) you want to retain, and which you want to reject.
With all this under your belt, you can go far! However, we would be remiss if we did not give you one more superpower: We need to teach you how to use the F ratio for directly comparing models. We developed the F ratio back in Chapter 5. We will start there. Then we will show you how everything you have learned in the meantime can be used to transform F from an esoteric statistic into a highly effective tool for model comparison. But first, let’s go back long, long ago, to the Tipping Experiment from Chapter 4.
The Tipping Experiment Revisited
Back in Chapter 4 we learned about an experiment in which researchers sought to figure out whether putting hand-drawn smiley faces on the back of the check would cause restaurant servers to receive higher tips (Rind, B., & Bordia, P. (1996). Effect on restaurant tipping of male and female servers drawing a happy, smiling face on the backs of customers' checks. Journal of Applied Social Psychology, 26(3), 218-225).
The study was a randomized experiment. A female server in a particular restaurant was asked to either draw a smiley face—or not—for each party she served following a predetermined random sequence. The outcome variable was the amount of tip left by each party. Distributions of tip amounts in the two groups (n=22 parties per condition) are shown below.
gf_histogram(..density.. ~ Tip, data = TipExperiment, fill = "yellowgreen", color = "black", alpha = 1) %>%
gf_facet_grid(Condition ~ .) %>%
gf_labs(title = "Tip Experiment")
We learned about this experiment in Chapter 4; but now we have a lot more tools in our toolbox! Back then, we didn’t even calculate the means for the two groups. We can take a look at the favstats for Tips from the two conditions in the experiment (we’ll also save them as Tip.stats).
Tip.stats <- favstats(Tip ~ Condition, data = TipExperiment)
Tip.stats
Below, we made a gf_point() plot of the two groups, and overlaid the mean of each group.
gf_point(Tip ~ 1, data = TipExperiment, size=4, alpha=.5, color = "firebrick") %>%
gf_facet_grid(. ~ Condition) %>%
gf_hline(yintercept = ~mean, data = Tip.stats)
In the plot, you can clearly see there is a mean difference between the two groups. In fact, the mean for the Control group is $27, whereas the mean for the Smiley Face group is about $33, which means there was a mean difference of about $6 in favor of the Smiley Face condition.
Specifying a Model Comparison
The research question in this study concerned the effect of smiley faces on tips: Did parties randomly assigned to get smiley faces on their checks give higher tips than the parties that got no smiley faces?
You now have the tools to state this question in terms of a comparison between two models: the empty model, in which the grand mean is used to predict tips, versus the two-group model, in which knowing which group a party is in improves the prediction.
Using GLM notation, the researchers who conducted the tipping study were seeking to compare these two models:
\(Y_{i}=b_{0}+b_{1}X_{i}+e_{i}\) (the two-group model; we also call this "complex")
\(Y_{i}=b_{0}+e_{i}\) (the empty model; we also call this "simple")
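To make the two candidate models concrete, here is a small base-R sketch (using made-up tip amounts, not the actual study data) of how each model generates predictions: the empty model predicts the grand mean for every party, while the two-group model predicts each party's own group mean.

```r
# made-up tips (not the actual study data) for 4 control
# and 4 smiley-face parties
tip <- c(20, 24, 28, 28, 26, 30, 34, 38)
condition <- rep(c("Control", "Smiley Face"), each = 4)

# empty model: b0 is the grand mean; every party gets the same prediction
b0_empty <- mean(tip)
pred_empty <- rep(b0_empty, length(tip))

# two-group model: b0 is the Control mean, b1 is the increment
# needed to get from the Control mean to the Smiley Face mean
group_means <- tapply(tip, condition, mean)
b0 <- group_means["Control"]
b1 <- group_means["Smiley Face"] - group_means["Control"]
pred_twogroup <- b0 + b1 * (condition == "Smiley Face")
```

In GLM terms, \(b_0\) here plays the role of the Control group mean and \(b_1\) the increment for the Smiley Face group, just as in the notation above.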
In the two-group model, \(b_0\) represents the mean of one group (for example, it could be the mean of the Control group), and \(b_1\) the increment that would be added to get to the mean of the other group (e.g., the Smiley Face group).
Fitting the Models
Let's find the best-fitting estimates for the two-group model.
# load packages
require(ggformula)
require(mosaic)
require(supernova)
require(Lock5Data)
require(Lock5withR)
require(okcupiddata)

# fit the two-group model and save it as Condition.model
Condition.model <- lm(Tip ~ Condition, data = TipExperiment)

# print out the best-fitting estimates
Condition.model
What does it mean for these to be the best-fitting estimates? Even though we do not know what the true population parameters are, this is the best model we can come up with based on our data:
\[Y_i = 27 + 6.05X_i + e_i\]
This set of estimates is “best” because it minimizes the error measured in a specific way: Sum of Squares. Let’s take a look at the SS Error (as well as the SS Total and SS Model, and let’s also look at PRE while we are at it) from this model in the DataCamp window below. (Assume that Condition.model has already been saved for you.)
# load packages
require(ggformula)
require(mosaic)
require(supernova)
require(Lock5Data)
require(Lock5withR)
require(okcupiddata)

Condition.model <- lm(Tip ~ Condition, data = TipExperiment)

# take a look at the sums of squares from the Condition.model
supernova(Condition.model)
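The sums of squares that supernova() reports can also be computed by hand. The following base-R sketch (again using made-up tip amounts, not the actual study data) shows where SS Total, SS Error, SS Model, and PRE come from.

```r
# made-up tips (not the actual study data)
tip <- c(20, 24, 28, 28, 26, 30, 34, 38)
condition <- rep(c("Control", "Smiley Face"), each = 4)

# SS Total: squared deviations of each tip from the grand mean
grand_mean <- mean(tip)
ss_total <- sum((tip - grand_mean)^2)

# two-group model predictions: each party's own group mean
pred <- ave(tip, condition)

# SS Error: squared deviations of each tip from its group mean
ss_error <- sum((tip - pred)^2)

# SS Model: the reduction in error achieved by the two-group model
ss_model <- ss_total - ss_error

# PRE: proportional reduction in error
pre <- ss_model / ss_total
```

PRE is simply the proportion of total variation the two-group model accounts for: SS Model divided by SS Total.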
The two-group model will, of course, reduce error compared with the empty model in our sample.
But we know, at this point, that the observed mean difference (\(b_1\)) between the two groups in this particular sample is only one of many possible mean differences that could have occurred if different random samples had been selected.
Our real question is: Will knowing which group someone is in (Smiley Face or Control) necessarily result in an improved prediction of tips in the future? Or, to ask this another way, does the smiley face actually produce a difference in the Data Generating Process (DGP)—the tip generating process? Which of these two models is a better model of the DGP?
Comparing the Models by Constructing a Confidence Interval
In the previous chapter we compared two models by constructing a confidence interval around the parameter that differentiates the two models, which, in this case, is \(b_1\), or the difference in means between the two groups. We can also think of \(b_1\) as the increment to go from one group to the other.
By constructing a confidence interval around this difference, we can determine whether, with 95% confidence, we could rule out the possibility that \(b_1\) is equal to 0, and thus adopt the two-group model over the empty model.
To construct a confidence interval, we first need to construct a sampling distribution to put our estimate of \(b_1\) in context. One simple way to do this, for the current study, is to use a bootstrapping (or resampling) technique, using the resample() function in R.
SDob1 <- do(10000) * b1(Tip ~ Condition, data = resample(TipExperiment, 44))
Use the DataCamp window to run the code. Then add (and run) a command to output the first six rows of the resulting data frame.
# load packages
require(ggformula)
require(mosaic)
require(supernova)
require(Lock5Data)
require(Lock5withR)
require(okcupiddata)
custom_seed(50)

# run this code
SDob1 <- do(10000) * b1(Tip ~ Condition, data = resample(TipExperiment, 44))

# show the first 6 lines of this data frame
head(SDob1)
The column labelled b1 contains the 10,000 mean differences from the resamples of TipExperiment.
SDob1 is the sampling distribution of 10,000 increments created by resampling the 44 parties in the sample 10,000 times. We used the following code to plot this simulated sampling distribution as a histogram, with the mean of the sampling distribution indicated by a vertical line:
gf_histogram(~ b1, data = SDob1, fill = "orange") %>%
gf_vline(xintercept = mean(SDob1$b1), color = "purple")
The sampling distribution gives us a way of judging whether the true difference in the DGP could have been 0. This is important to know, because if the difference we observed could have occurred with a true difference of 0, then we don't have very good evidence that smiley faces affect tips. If the true difference could have been 0, then we should leave Condition out of our model and just stick with using the empty model.
We can see just by looking at the sampling distribution that a difference of 0 is somewhat unlikely (out in the lower tail), but certainly not completely unexpected.
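One way to sharpen that eyeball judgment is to find the middle 95% of the bootstrapped \(b_1\) values directly. Here is a base-R sketch of that percentile approach, using made-up tip amounts (not the actual study data) and, for simplicity, resampling within each group rather than resampling whole rows the way do()/resample() does above:

```r
set.seed(50)

# made-up tips (not the actual study data) for two groups of 4 parties
ctrl <- c(20, 24, 28, 28)
smiley <- c(26, 30, 34, 38)

# bootstrap the difference in means (b1) 10,000 times
boot_b1 <- replicate(10000,
  mean(sample(smiley, replace = TRUE)) - mean(sample(ctrl, replace = TRUE)))

# the middle 95% of the bootstrapped b1 values
ci <- quantile(boot_b1, c(0.025, 0.975))

# if 0 falls inside this interval, we cannot rule out the empty model
zero_inside <- ci[1] <= 0 && 0 <= ci[2]
```

The bootstrap distribution is centered near the observed difference in means, and the two quantiles mark off the range of differences we would not find surprising.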
To get a more exact calculation of the 95% confidence interval for the mean difference in tips between the two conditions using the t distribution, we could calculate the confidence interval around the parameters estimated in our Condition.model. Add code that will calculate the 95% Confidence Interval and press Run.
# load packages
require(ggformula)
require(mosaic)
require(supernova)
require(Lock5Data)
require(Lock5withR)
require(okcupiddata)

Condition.model <- lm(Tip ~ Condition, data = TipExperiment)

# add code to calculate the 95% confidence interval around our estimates
confint(Condition.model)
Based on the data from the tipping experiment, the true mean difference in the DGP could be as high as 12.76. On the other hand, it could be as low as -0.67. That's a pretty big range. Importantly, it includes the possibility that the true difference in the DGP is 0. With 95% confidence, we can't rule out the possibility that the empty model is true, and so we will stick with the empty model for now.
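The interval confint() reports can also be reproduced from first principles: take the estimate of \(b_1\) and go out about t* standard errors in each direction. A base-R sketch (with made-up tip amounts, not the actual study data):

```r
# made-up tips (not the actual study data)
tip <- c(20, 24, 28, 28, 26, 30, 34, 38)
condition <- rep(c("Control", "Smiley Face"), each = 4)

model <- lm(tip ~ condition)
b1 <- coef(model)[2]
se_b1 <- summary(model)$coefficients[2, "Std. Error"]

# 95% CI: estimate plus or minus t* standard errors,
# with df = n - 2 (two parameters estimated)
t_star <- qt(0.975, df = df.residual(model))
ci_by_hand <- c(b1 - t_star * se_b1, b1 + t_star * se_b1)

# matches what confint() reports for the condition coefficient
confint(model)[2, ]
```

For a two-group model, this is exactly the calculation confint() performs on the coefficient for Condition.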