Introduction to Statistics: A Modeling Approach

Measures of Effect Size

So far we have talked a lot about a model that uses Sex as the explanatory variable. This kind of model is also called a two-group model because Sex takes two values in this data set and thus splits the data into two groups. We’ve been trying to answer the question “how good (or bad) is our model?”, but there is another way to think about this.

Given that an explanatory variable (e.g., sex) has an effect on an outcome variable (e.g., thumb length), how big is the effect? We call the answer to this question effect size. We haven’t used the term effect size up to now, but we have, in fact, presented two measures of effect size.

Mean Difference

The most straightforward measure of effect size in the context of the two-group model is simply the actual difference in means on the outcome variable between the two groups.

In our data set Fingers, the size of the sex effect is 6.447 mm: males, on average, have thumbs that are 6.447 mm longer than those of females.
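As a minimal sketch, the mean difference can be computed by hand in base R. The six thumb lengths below are made-up values for illustration (an assumption, not the Fingers data), and the names toy, group_means, and mean_diff are hypothetical.

```r
# Toy data: six hypothetical thumb lengths (mm), three per group.
# These values are assumed for illustration; they are not the Fingers data.
toy <- data.frame(
  Sex   = c("female", "female", "female", "male", "male", "male"),
  Thumb = c(56, 60, 61, 63, 64, 68)
)

# Mean thumb length within each group
group_means <- tapply(toy$Thumb, toy$Sex, mean)

# Effect size as a raw mean difference (male minus female), in mm
mean_diff <- unname(group_means["male"] - group_means["female"])
mean_diff
```

The result stays in the units of the outcome variable (mm here), which is what makes this measure easy to interpret.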

PRE

PRE is a second measure of effect size. As just discussed, it tells us the proportional reduction in error of the two-group model over the empty model. PRE is a nice measure of effect size because it is relative: it is a measure of improvement (reduction in error) that results from adding in the explanatory variable. But what counts as a good PRE?

Recall that the TinySex.model had a PRE of .66, while the Sex.model had a PRE of .11. Are PREs in general going to be as large as the one for TinySex.model? Probably not. In TinyFingers we stacked the deck for purposes of teaching, creating a data set in which all the females had smaller thumbs than all the males. This resulted in a large PRE.
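To see why a "stacked deck" produces a large PRE, here is a rough sketch that computes PRE from sums of squares in base R. The six values are assumed for illustration (every female value is below every male value) and are not read from the course's data files; the names toy, ss_empty, ss_group, and pre are hypothetical.

```r
# Toy data in which the two groups do not overlap at all.
# Values are assumed for illustration only.
toy <- data.frame(
  Sex   = c("female", "female", "female", "male", "male", "male"),
  Thumb = c(56, 60, 61, 63, 64, 68)
)

# Empty model: predict the grand mean for everyone
ss_empty <- sum((toy$Thumb - mean(toy$Thumb))^2)

# Two-group model: predict each person's own group mean
group_pred <- ave(toy$Thumb, toy$Sex)  # ave() uses mean by default
ss_group <- sum((toy$Thumb - group_pred)^2)

# Proportional reduction in error of the two-group model over the empty model
pre <- (ss_empty - ss_group) / ss_empty
round(pre, 2)
```

With these values the error drops from 82 to 28 squared millimeters, so PRE comes out to about .66. Real data, where the groups overlap, will usually give a smaller value.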

As with every other statistic, PRE will vary from model to model and situation to situation. Having more experience with making models will give you a sense of what counts as an impressive PRE in your research area.

In the social sciences, at least, there are some generally agreed-on ideas about what is considered a strong effect. A PRE of .25 is considered a pretty large effect, .09 is considered medium, and .01 is considered small. So according to these conventions, the PRE of .11 indicates a medium effect of sex on thumb length in the Fingers data set.
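One way to apply these benchmarks is sketched below. The helper label_pre is hypothetical, and treating each benchmark as the lower edge of its category is an assumption made for this example rather than part of the convention itself.

```r
# Hypothetical helper: map a PRE value onto the conventional labels,
# treating .01, .09, and .25 as lower cutoffs (an assumption of this sketch).
label_pre <- function(pre) {
  as.character(cut(
    pre,
    breaks = c(-Inf, .01, .09, .25, Inf),
    labels = c("below small", "small", "medium", "large"),
    right  = FALSE  # intervals are closed on the left: [.09, .25) is "medium"
  ))
}

label_pre(c(.11, .66))  # the two PRE values quoted in the text
```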

Take these conventions with a grain of salt, though, because whether an effect is large enough to matter ultimately depends on your purpose. For example, if an online retailer found a small effect of changing the color of their “buy” button (e.g., PRE = .01), they might want to make the change even though the effect is small. The change is free and easy to make, and it might result in a tiny increase in sales.

Cohen’s d

A third measure of effect size that applies especially to the two-group model (such as the Sex model) is Cohen’s d. Cohen’s d is related to the z score. Recall that z scores tell us how far an individual score is from the mean of a distribution in standard deviation units. Cohen’s d, similarly, indicates the size of a group difference in standard deviation units.

\[d=\frac{\bar{Y}_{1}-\bar{Y}_{2}}{s}\]
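The formula can be sketched by hand in base R. This sketch assumes that s is the pooled within-group standard deviation, which is a common choice for Cohen’s d. The thumb lengths are made-up values for illustration, and the variable names are hypothetical.

```r
# Hypothetical thumb lengths (mm) for two small groups; illustration only.
female <- c(56, 60, 61)
male   <- c(63, 64, 68)

n1 <- length(male)
n2 <- length(female)

# Pooled standard deviation: a weighted average of the two group variances
# (assumed here to be the s in the formula above)
s_pooled <- sqrt(((n1 - 1) * var(male) + (n2 - 1) * var(female)) / (n1 + n2 - 2))

# Cohen's d: the mean difference expressed in standard deviation units
d <- (mean(male) - mean(female)) / s_pooled
d
```

Because the difference is divided by a standard deviation measured in the same units, d itself has no units, which lets you compare effects across different outcome variables.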

As with everything else in this class, there is an R function for calculating Cohen’s d.

cohensD(Thumb ~ Sex, data = Fingers)

Try running this code in the DataCamp window.

# load the course packages; cohensD() comes from lsr
require(mosaic)
require(ggformula)
require(supernova)
require(lsr)

# read the Fingers data set
Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/fingers.csv", header=TRUE, sep=",")

# run this code
cohensD(Thumb ~ Sex, data = Fingers)

We know that there is a 6.447 mm difference between male and female thumb lengths on average. If you think about a standard deviation as a little ruler, that 6.447 mm difference is a little less than one of those rulers (.78 to be exact!).
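Rearranging the formula for d with the two figures quoted above gives a rough idea of how long that "ruler" is. Because .78 is a rounded value, the result is only approximate:

\[s=\frac{\bar{Y}_{1}-\bar{Y}_{2}}{d}\approx\frac{6.447}{.78}\approx 8.3\text{ mm}\]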

With something like thumb length, knowing that there is about a 6 mm difference is actually pretty meaningful. But for other variables, such as Kargle and Spargle scores, it may be less clear what a raw point difference implies.

Nevertheless, in both cases, it is somewhat illuminating to add the information from Cohen’s d to the mix. Male thumbs are, on average, .78 standard deviations longer than female thumbs.
