Consider the following data set of \(344\) penguins samples from the Palmer
Archipelago in Antarctica. There are three different species of penguins
who reside on three different islands in the archipelago. Note, this
data is only accessible after installing and loading the library
palmerpenguins
. [Source: https://allisonhorst.github.io/palmerpenguins/]
#install.packages("palmerpenguins")
library("palmerpenguins")
penguins %>% head
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
## <fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
## 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
## 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
## 4 Adelie Torgersen NA NA NA NA <NA> 2007
## 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
## 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
## # … with abbreviated variable names ¹flipper_length_mm, ²body_mass_g
Note, there are some missing values in the data so let’s re-define our data set to exclude these observations.
penguins_complete <- penguins %>% na.omit
penguins %>% dim
## [1] 344 8
penguins_complete %>% dim
## [1] 333 8
Let’s consider a one-way ANOVA model for the length of the penguins beaks as a function of the species type. First, let’s see what the different species types are.
penguins_complete$species %>% summary
## Adelie Chinstrap Gentoo
## 146 68 119
penguin_anova <- lm(bill_length_mm ~ species, penguins_complete)
penguin_anova %>% summary
##
## Call:
## lm(formula = bill_length_mm ~ species, data = penguins_complete)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.934 -2.234 0.076 2.066 12.032
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.8240 0.2459 157.88 <2e-16 ***
## speciesChinstrap 10.0099 0.4362 22.95 <2e-16 ***
## speciesGentoo 8.7441 0.3670 23.83 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.971 on 330 degrees of freedom
## Multiple R-squared: 0.7066, Adjusted R-squared: 0.7048
## F-statistic: 397.3 on 2 and 330 DF, p-value: < 2.2e-16
The summary above shows the output for the overall F-test of the hypothesis \(H_0: \alpha_{A} = \alpha_{C} = \alpha_{G} = 0\) vs \(H_A: \text{ At least one }\alpha_{k}\text{ is not}0\).
With such a small p-value for the overall F-test, there is
statistically detectable difference among at least one of the species in
terms of bill length. But in order to determine which species differ, we
have at least three different null hypotheses to consider: \[H_0: \alpha_{C} - \alpha_{G} = 0\] or
\[H_0: \alpha_{A} - \alpha_{C} = 0\]
or \[H_0: \alpha_{A} - \alpha_{G} =
0.\] Now that we are testing the same data three additional times
we need to consider that we risk increasing the change of a false
positive (or Type I) error. One way to adjust for the problem of
multiple comparisons is to use Fisher’s Least
Significant Difference method. This method is easy to apply in R but we
first need to install and load the package DescTools
.
#install.packages("DescTools")
library(DescTools)
Then, we need to make sure our ANOVA model is in a form that the
function PostHocTest()
will recognize. To do this we can
use the aov()
function which will make sure R understands
that our linear regression model penguin_anova
is a special
ANOVA model object. Take a look at the summary of this converted model
and compare it to the summary above. They mathematically represent the
same model, just in different forms.
penguin_aov <- penguin_anova %>% aov
penguin_aov %>% summary
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 7015 3508 397.3 <2e-16 ***
## Residuals 330 2914 9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Finally, we use the PostHotTest()
function and specify
method="lsd"
to apply Fisher’s Least Significant Difference
adjustment and get valid results for each of the three comparisons
listed above.
PostHocTest(penguin_aov, method="lsd")
##
## Posthoc multiple comparisons of means : Fisher LSD
## 95% family-wise confidence level
##
## $species
## diff lwr.ci upr.ci pval
## Chinstrap-Adelie 10.009851 9.151684 10.8680175 <2e-16 ***
## Gentoo-Adelie 8.744095 8.022209 9.4659807 <2e-16 ***
## Gentoo-Chinstrap -1.265756 -2.154320 -0.3771928 0.0054 **
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Work with your group to conduct a test for the difference in bill lengths between Gentoo and Chinstrap penguins without using a multiple comparisons adjustement. Recall that testing \(H_0: \alpha_{A} - \alpha_{G} = 0\) is equivalent to testing \(H_0: \mu_{A} - \mu_{G}=0\).
First, note that the Adelie species is the reference group, so \(\beta_0 = \mu_{A}\). Referencing the ANOVA
cheat sheet helps us realize that \(\beta_1 =
\mu_{A} - \mu_{G}\) and \(\beta_2 =
\mu_{A} - \mu_{C}\). Hence, \(\mu_{G} -
\mu_{C} = (\mu_{A} - \beta_1) - (\mu_{A}-\beta_2) = \beta_2 -
\beta_1\) and our null is
\[H_0: \beta_2 - \beta_1 = 0.\]
This is a two sided alternative that \[H_A: \beta_2 - \beta_1 \neq 0.\]
If we this view this as a test for a difference in group means we can calculate the upper and lower bounds of a 95% CI for the difference in means “by hand” and determine the conclusion of our test depending on whether or not this interval contains zero. Using \(\hat{\sigma} = \sqrt{MSE}\) and the formula \[(\bar{y}_{G} - \bar{y}_{C}) \pm \left( t^*\cdot SE(\bar{y}_{G} - \bar{y}_{C})\right), \quad \text{ where } \quad SE(\bar{y}_{G} - \bar{y}_{C}) = \sqrt{\hat{\sigma}\left(\frac{1}{n_G} + \frac{1}{n_C}\right)} \] and \(t^*\) is the \(alpha/2^{th}\) quantile from a t-distribution with \(n-k\) degrees of freedom. If the confidence interval contains zero, then we would fail to reject the null that \(H_0: \mu_G - \mu_C = 0\).
##By hand approach
penguins_G <- penguins_complete %>% filter(species=="Gentoo")
penguins_C <- penguins_complete %>% filter(species=="Chinstrap")
penguins_A <- penguins_complete %>% filter(species=="Adelie")
alpha = 0.05
MSE <- (summary(penguin_anova)$sigma)^2
t_star <- qt(alpha/2, df = (146+119+68)-3, lower.tail=TRUE)
SE_GC <- sqrt(MSE*((1/119)+(1/68)))
(mean(penguins_G$bill_length_mm) - mean(penguins_C$bill_length_mm)) - (t_star * SE_GC)
## [1] -0.3771928
(mean(penguins_G$bill_length_mm) - mean(penguins_C$bill_length_mm)) + (t_star * SE_GC)
## [1] -2.15432
SE_AC <- sqrt(MSE*((1/146)+(1/68)))
(mean(penguins_A$bill_length_mm) - mean(penguins_C$bill_length_mm)) - (t_star * SE_AC)
## [1] -9.151684
(mean(penguins_A$bill_length_mm) - mean(penguins_C$bill_length_mm)) + (t_star * SE_AC)
## [1] -10.86802
SE_AG <- sqrt(MSE*((1/146)+(1/119)))
(mean(penguins_A$bill_length_mm) - mean(penguins_G$bill_length_mm)) - (t_star * SE_AG)
## [1] -8.022209
(mean(penguins_A$bill_length_mm) - mean(penguins_G$bill_length_mm)) + (t_star * SE_AG)
## [1] -9.465981