Consider the following data set of \(344\) penguins samples from the Palmer
Archipelago in Antarctica. There are three different species of penguins
who reside on three different islands in the archipelago. Note, this
data is only accessible after installing and loading the library
palmerpenguins
. [Source: https://allisonhorst.github.io/palmerpenguins/]
#install.packages("palmerpenguins")
library("palmerpenguins")
penguins %>% head
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
## <fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
## 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
## 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
## 4 Adelie Torgersen NA NA NA NA <NA> 2007
## 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
## 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
## # … with abbreviated variable names ¹flipper_length_mm, ²body_mass_g
Let’s consider a one-way ANOVA model for the length of the penguins beaks as a function of the species type. First, let’s see what the different species types are.
penguins$species %>% summary
## Adelie Chinstrap Gentoo
## 152 68 124
penguin_anova <- lm(bill_length_mm ~ species, penguins)
penguin_anova %>% summary
##
## Call:
## lm(formula = bill_length_mm ~ species, data = penguins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.9338 -2.2049 0.0086 2.0662 12.0951
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.7914 0.2409 161.05 <2e-16 ***
## speciesChinstrap 10.0424 0.4323 23.23 <2e-16 ***
## speciesGentoo 8.7135 0.3595 24.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.96 on 339 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.7078, Adjusted R-squared: 0.7061
## F-statistic: 410.6 on 2 and 339 DF, p-value: < 2.2e-16
The summary above shows the output for the overall F-test of the hypothesis \(H_0: \alpha_{A} = \alpha_{C} = \alpha_{G} = 0\) vs \(H_A: \text{ At least one }\alpha_{k}\text{ is not}0\).
With such a small p-value for the overall F-test, there is
statistically detectable difference among at least one of the species in
terms of bill length. But in order to determine which species differ, we
have at least three different null hypotheses to consider: \[H_0: \alpha_{C} - \alpha_{G} = 0\] or
\[H_0: \alpha_{A} - \alpha_{C} = 0\]
or \[H_0: \alpha_{A} - \alpha_{G} =
0.\] Now that we are testing the same data three additional times
we need to consider that we risk increasing the change of a false
positive (or Type I) error. One way to adjust for the problem of
multiple comparisons is to use Fisher’s Least
Significant Difference method. This method is easy to apply in R but we
first need to install and load the package DescTools
.
#install.packages("DescTools")
library(DescTools)
Then, we need to make sure our ANOVA model is in a form that the
function PostHocTest()
will recognize. To do this we can
use the aov()
function which will make sure R understands
that our linear regression model penguin_anova
is a special
ANOVA model object. Take a look at the summary of this converted model
and compare it to the summary above. They mathematically represent the
same model, just in different forms.
penguin_aov <- penguin_anova %>% aov
penguin_aov %>% summary
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 7194 3597 410.6 <2e-16 ***
## Residuals 339 2970 9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness
Finally, we use the PostHotTest()
function and specify
method="lsd"
to apply Fisher’s Least Significant Difference
adjustment and get valid results for each of the three comparisons
listed above.
PostHocTest(penguin_aov, method="lsd")
##
## Posthoc multiple comparisons of means : Fisher LSD
## 95% family-wise confidence level
##
## $species
## diff lwr.ci upr.ci pval
## Chinstrap-Adelie 10.042433 9.192175 10.892691 <2e-16 ***
## Gentoo-Adelie 8.713487 8.006347 9.420628 <2e-16 ***
## Gentoo-Chinstrap -1.328945 -2.208740 -0.449151 0.0032 **
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Work with your group to conduct a test for the difference in bill lengths between Gentoo and Chinstrap penguins without using a multiple comparisons adjustement. Recall that testing \(H_0: \alpha_{A} - \alpha_{G} = 0\) is equivalent to testing \(H_0: \mu_{A} - \mu_{G}=0\).
\[H_0: ?\]
\[H_A: ?\]
## Use this space to code your answer to #3