Unit 3 topics

Example using the CLT to estimate a probability

(Note: This is the old version of Problem 1 from the in-class worksheet for week 13.)

The Centers for Disease Control and Prevention reports that, in 2015, \(9.3\%\) of surveyed high school students said they had smoked cigarettes in the past 30 days. A college has \(522\) students in its freshman class. How likely is it that more than \(10\%\) of them are smokers?

Step 1) Define the probability model.

Step 2) Determine the sampling distribution for the sample statistic and draw the density plot.

Step 3) Plot the observed sample statistic and shade the region of interest on the density plot.

Step 4) Use software or a table to find the answer.
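
The four steps above can be sketched in R for the smoking example. This is a minimal sketch, assuming the freshman class behaves like a random sample from a population in which \(p = 0.093\), so that the sample proportion is approximately normal with mean \(p\) and standard deviation \(\sqrt{p(1-p)/n}\). The variable names (`cutoff`, `conditions_ok`, `prob`) are illustrative choices, not part of the original worksheet.

```r
# Sketch of Steps 1-4 for the smoking example, using the normal approximation.
p <- 0.093      # population proportion reported by the CDC
n <- 522        # size of the freshman class
cutoff <- 0.10  # "more than 10%" threshold for the sample proportion

# Success/failure check: both expected counts should be at least 10.
conditions_ok <- (n * p >= 10) && (n * (1 - p) >= 10)

# Standard deviation of the sampling distribution of the sample proportion.
se <- sqrt(p * (1 - p) / n)

# z-score of the cutoff and the upper-tail area beyond it.
z <- (cutoff - p) / se
prob <- pnorm(cutoff, mean = p, sd = se, lower.tail = FALSE)

round(c(se = se, z = z, prob = prob), 4)
```

Under these assumptions the upper-tail probability comes out at roughly 0.29, so a class with more than \(10\%\) smokers would not be unusual.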

Sampling distributions and confidence intervals

Steps for finding a confidence interval

Step 1) Identify and define the population parameter and choose your confidence level.

Step 2) Calculate the sample estimate for the population parameter.

Step 3) Assess the required assumptions and conditions.

Step 4) Find the critical value corresponding to your confidence level.

Step 5) Calculate the standard error of your sample estimate.

Step 6) Calculate the lower and upper bounds of your confidence interval.
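
As an illustration of Steps 2 through 6, the sketch below builds a \(95\%\) confidence interval for a population mean by hand and then cross-checks it against `t.test()`. It borrows the lab-time sample from the student-requested example later in these notes; the \(95\%\) confidence level is an assumption made here for illustration. With only \(12\) observations and two extreme values (\(136\) and \(8\)), the nearly-normal condition in Step 3 is questionable, so treat the result as a demonstration of the mechanics rather than a trustworthy interval.

```r
# Lab times (minutes) for 12 students, copied from the example later in the notes.
lab_time <- c(52, 57, 54, 76, 62, 52, 74, 53, 136, 73, 8, 62)

conf_level <- 0.95  # assumed confidence level (Step 1)
n <- length(lab_time)

x_bar <- mean(lab_time)                             # Step 2: sample estimate
t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # Step 4: critical value
se <- sd(lab_time) / sqrt(n)                        # Step 5: standard error

# Step 6: lower and upper bounds of the interval.
ci <- c(lower = x_bar - t_star * se, upper = x_bar + t_star * se)
ci

# Cross-check against R's built-in one-sample t procedure.
t.test(lab_time, conf.level = conf_level)$conf.int
```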

Hypothesis tests

Hypotheses are typically designed so that the claim we hope to find evidence for is expressed in the alternative hypothesis. For all of the methods that we’ve covered thus far, the null hypothesis is always going to be of the form \[H_0: \text{<parameter> } = \text{ some number}\]

Types of conclusions

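Below is a table showing all possible conclusions we can make from a hypothesis test and all possible mistakes we could make. We will cover the different types of mistakes (or errors) next class.

\[\begin{array}{l|cc} & H_0 \text{ is true} & H_0 \text{ is false} \\ \hline \text{Reject } H_0 & \text{Type I error} & \text{Correct decision} \\ \text{Fail to reject } H_0 & \text{Correct decision} & \text{Type II error} \end{array}\]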

The only way to reduce both types of error at the same time is to collect more evidence or, in statistical terms, to collect more data.

  • \(\alpha = Pr(\text{Type I error})\): If \(H_0\) is true, this is the probability that we (incorrectly) reject it.

  • \(\beta = Pr(\text{Type II error})\): If \(H_0\) is false, this is the probability that we (incorrectly) fail to reject it.

  • \(1-\beta = \text{Power}\): If \(H_0\) is false, this is the probability that we (correctly) reject it.
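
To see how \(\alpha\), \(\beta\), and power relate, the sketch below computes the power of an upper-tailed one-sample z-test directly from the normal distribution. Everything in it is an assumption made for illustration: the helper `power_upper_z()` is hypothetical, the population standard deviation is treated as known, and the numbers (\(\mu_0 = 55\), a true mean of \(60\), \(\sigma = 10\)) are invented. It is not one of the procedures worked through in class.

```r
# Power of an upper-tailed one-sample z-test (H0: mu = mu0 vs HA: mu > mu0).
# Hypothetical helper; sigma is assumed known and all inputs below are invented.
power_upper_z <- function(mu0, mu1, sigma, n, alpha = 0.05) {
  se <- sigma / sqrt(n)
  # Reject H0 when the sample mean exceeds this cutoff.
  reject_above <- mu0 + qnorm(1 - alpha) * se
  # Probability of exceeding the cutoff when the true mean is mu1.
  pnorm(reject_above, mean = mu1, sd = se, lower.tail = FALSE)
}

# When H0 is actually true (mu1 = mu0), the rejection probability is alpha.
power_upper_z(mu0 = 55, mu1 = 55, sigma = 10, n = 25)

# When H0 is false, power grows (and beta shrinks) as n increases, alpha fixed.
sample_sizes <- c(10, 25, 50, 100)
power <- sapply(sample_sizes, function(n) power_upper_z(55, 60, 10, n))
data.frame(n = sample_sizes, power = round(power, 3), beta = round(1 - power, 3))
```

Holding \(\alpha\) at 0.05, the only thing that changes from row to row is the sample size, which is why collecting more data is the way to lower \(\beta\) without raising \(\alpha\).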

Steps for conducting a hypothesis test

Step 1) Identify and define the population parameter and choose your significance level.

Step 2) State the null and alternative hypotheses.

Step 3) Assess the required assumptions and conditions.

Step 4) Calculate the test statistic and plot it.

Step 5) Shade the area under the curve that corresponds to your p-value and then calculate it. State your conclusion.

Student Requested Examples

Two-tailed hypothesis test for an unknown mean

A technology committee wants to perform a test to see if the mean amount of time students are spending in the lab has changed from a historical average of 55 minutes. The data recorded below are the lab times (in minutes) for a sample of \(12\) students.

lab_time <- c(52, 57, 54, 76, 62, 52, 74, 53, 136, 73, 8, 62)
mean(lab_time)
## [1] 63.25
sd(lab_time)
## [1] 28.92663
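
The summary statistics above are enough to carry out the two-tailed test by following the hypothesis-test steps. The sketch below tests \(H_0: \mu = 55\) against \(H_A: \mu \neq 55\) by hand and then checks the result with `t.test()`. The significance level of \(0.05\) is an assumption, since the example does not state one, and with two extreme values in a sample of \(12\) the nearly-normal condition deserves a careful look before trusting the t procedure.

```r
# Lab times (minutes) for the 12 sampled students.
lab_time <- c(52, 57, 54, 76, 62, 52, 74, 53, 136, 73, 8, 62)

mu0 <- 55      # historical average under H0
alpha <- 0.05  # assumed significance level (not given in the example)
n <- length(lab_time)

# Test statistic: how many standard errors the sample mean is from mu0.
t_stat <- (mean(lab_time) - mu0) / (sd(lab_time) / sqrt(n))

# Two-tailed p-value: area in both tails beyond |t| with n - 1 degrees of freedom.
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
c(t = t_stat, p_value = p_value)

# Cross-check with the built-in test.
result <- t.test(lab_time, mu = mu0, alternative = "two.sided")
result$p.value

# TRUE would mean "reject H0" at the assumed alpha.
p_value < alpha
```

The test statistic is just under \(1\), so the p-value is well above \(0.05\) and we fail to reject \(H_0\): this sample does not give convincing evidence that the mean lab time has changed from 55 minutes.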

Question: What is the result of a type I error? What is the result of a type II error? Which error is more consequential?

Answer: A type I error would mean that the committee concludes the average lab time has changed from 55 minutes when in reality it has not. A type II error, however, would mean that the committee concludes the average lab time is not significantly different from 55 minutes when in fact it has increased or decreased. It isn’t clear which error is more consequential here. Perhaps a type I error is more consequential for students (as this could lead to procedural changes, in lab fees for example), but a type II error is probably more consequential for the school, as this erroneous conclusion may mean that necessary procedural changes aren’t implemented and that faculty and staff are doing more work with limited resources.

Upper-tailed hypothesis test for paired means

Having done poorly on their Biology final exams in June, six students repeat the course in summer school and take another exam in August. Here are the exam scores:

\[\begin{align*} \text{June }\quad& 54 \quad 49 \quad 68 \quad 66 \quad 62 \quad 62 \\ \text{Aug }\quad& 50 \quad 65 \quad 74 \quad 64 \quad 68 \quad 72 \\ \text{Aug - June }\quad& -4 \quad 16 \quad 6 \quad -2 \quad 6 \quad 10 \end{align*}\]

If we consider these students to be representative of all students who might attend this summer school in other years, do these results provide evidence that the program is worthwhile?

d <- c(-4, 16, 6, -2, 6, 10)
mean(d)
## [1] 5.333333
sd(d)
## [1] 7.447595
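
The same steps apply to the paired data. The sketch below tests \(H_0: \mu_d = 0\) against \(H_A: \mu_d > 0\), where \(\mu_d\) is the mean August-minus-June difference, first by hand and then with `t.test()`. The significance level of \(0.05\) is an assumption, and with only six differences the nearly-normal condition is hard to verify, so the conclusion should be read cautiously.

```r
# Exam scores for the six students, from the table above.
june <- c(54, 49, 68, 66, 62, 62)
aug  <- c(50, 65, 74, 64, 68, 72)
d <- aug - june  # paired differences (Aug - June)

alpha <- 0.05    # assumed significance level (not given in the example)
n <- length(d)

# Test statistic for the mean difference under H0: mu_d = 0.
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Upper-tailed p-value with n - 1 degrees of freedom.
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)
c(t = t_stat, p_value = p_value)

# Cross-check with the built-in paired test.
result <- t.test(aug, june, paired = TRUE, alternative = "greater")
result$p.value
```

The p-value falls between \(0.05\) and \(0.10\), so at the assumed \(0.05\) level we fail to reject \(H_0\): the improvement is suggestive, but six students do not provide strong evidence that the program raises scores.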

Question: What is the result of a type I error? What is the result of a type II error? Which error is more consequential?

Answer: A type I error means that we conclude the summer school program is worthwhile (and improves student learning) when it does not actually do so. A type II error means that we conclude the summer school program is not useful when in fact it is. In this case, a type I error is probably more consequential because it would affect future students, who may be forced to take a summer class that is actually a waste of their time. The consequence of a type II error, however, would probably be additional work done to improve the summer school program, which, although costly, would still benefit students who take the summer course.