1. Recap: One-sample tests

Data: \(x_1, x_2, \dots, x_n\) corresponds to a column of our data set

Random variable: \(X\) corresponds to the probability model for the variable we observe in our data set

Example: Week 11 Worksheet Problem 5

The National Perinatal Statistics Unit of the Sydney Children’s Hospital reports that the mean birth weight of all babies born in birth centers in Australia in a recent year was \(3564\) grams—about \(7.86\) pounds. A Missouri hospital reports that the average weight of \(112\) babies born there last year was \(7.68\) pounds, with a standard deviation of \(1.31\) pounds. We want to see if U.S. babies weigh the same, on average, as Australian babies.
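The computation behind this example can be sketched in Python. This is a minimal sketch that assumes a two-sided one-sample t-test of \(H_0: \mu = 7.86\) against \(H_A: \mu \neq 7.86\), using the rounded 7.86-pound figure. It also treats the 112 Missouri babies as if they were a random sample of U.S. babies, which the problem does not guarantee. The helper names `t_pdf` and `t_cdf` are hypothetical. They approximate the Student's t CDF with Simpson's rule so that the example needs only the standard library; in practice `scipy.stats.t.cdf` would do the same job.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


# Summary statistics from the problem (all in pounds)
mu_0 = 7.86                  # Australian mean, the value claimed by H0
n, xbar, s = 112, 7.68, 1.31

se = s / math.sqrt(n)        # standard error of the sample mean
t_stat = (xbar - mu_0) / se  # how many SEs the sample mean lies from mu_0
df = n - 1
p_value = 2 * (1 - t_cdf(abs(t_stat), df))  # two-sided: area in both tails

print(f"t = {t_stat:.3f} on {df} df, two-sided p-value = {p_value:.3f}")
```

Under these assumptions the sample mean sits about 1.45 standard errors below 7.86 pounds. The resulting p-value is roughly 0.15, so the sketch would not reject \(H_0\) at the 0.05 or 0.10 levels.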


Reality check

Q) Why do we bother with hypothesis tests?

Suppose we want to know if the probability of success is really \(0.5\), i.e. \(H_0: p = 0.5\). After collecting our data of observed successes and failures, why not just stop when we calculate \(\hat{p}=0.81\)? Since \(0.81 > 0.5\), why can’t we just conclude that no, the probability of success is probably not really \(0.5\)?


To answer this question, think back on some of the different examples we’ve solved. Does this approach work in the baby weight example above? Would it work in our experiment of randomly selecting \(20\) jelly beans and counting the number of green ones? Would you be satisfied with this reasoning in the drinking water contamination example?
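One way to see why \(\hat{p} = 0.81\) alone is not enough: how surprising that value is depends on how many trials produced it. The minimal sketch below assumes independent trials with success probability \(p = 0.5\) under \(H_0\). It uses the exact binomial distribution to compute \(P(\hat{p} \ge 0.81)\) for a few sample sizes. The function name `prob_phat_at_least` is hypothetical.

```python
import math


def prob_phat_at_least(threshold_pct: int, n: int, p0: float = 0.5) -> float:
    """Exact P(p_hat >= threshold_pct / 100) when X ~ Binomial(n, p0) and p_hat = X / n."""
    # Smallest count k with k / n >= threshold_pct / 100, via integer ceiling division
    k_min = -(-threshold_pct * n // 100)
    return sum(math.comb(n, k) * p0**k * (1 - p0) ** (n - k)
               for k in range(k_min, n + 1))


for n in (2, 10, 100):
    prob = prob_phat_at_least(81, n)
    print(f"n = {n:>3}: P(p_hat >= 0.81 | p = 0.5) = {prob:.3g}")
```

With only 2 trials, a sample proportion of at least 0.81 (2 successes out of 2) happens a quarter of the time even when \(p = 0.5\). With 10 trials it happens about 1% of the time, and with 100 trials it is vanishingly rare. A hypothesis test formalizes exactly this comparison between the observed statistic and its sampling variability.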


2. Recap: Two-sample tests

Data: \(\{x_1, x_2, \dots, x_{n_1}\}\) and \(\{y_1, y_2, \dots, y_{n_2}\}\)

Random variables: \(X\) and \(Y\)

Example: Hypothesis test for a difference in proportions

Are people who work for non-profits generally more satisfied with their jobs compared to those who work at for-profit companies?

Separate random samples were collected by a polling agency to investigate the difference. Data collected from \(422\) employees at non-profit organizations revealed that \(377\) of them were “highly satisfied”. From the for-profit companies, \(431\) out of \(518\) employees reported the same level of satisfaction.

Step 1) Identify and define the population parameter and choose your significance level.

Step 2) State the null and alternative hypotheses.

Step 3) Assess the required assumptions and conditions.

Step 4) Calculate the test statistic and plot it.

Step 5) Shade the area under the curve that corresponds to your p-value and then calculate it.
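The five steps above can be sketched in Python for the job-satisfaction data. This is a minimal sketch under stated assumptions. The parameter is \(p_N - p_F\), the non-profit minus the for-profit proportion of “highly satisfied” employees. The alternative is one-sided (\(H_A: p_N - p_F > 0\)) because the question asks whether non-profit employees are more satisfied. The significance level is \(\alpha = 0.05\), and the standard error uses the pooled proportion, as is usual when \(H_0\) says the two proportions are equal. Plotting and shading (Steps 4 and 5) are left out. `statistics.NormalDist` from the standard library supplies the standard normal CDF.

```python
import math
from statistics import NormalDist

# Survey counts: N = non-profit, F = for-profit
x_N, n_N = 377, 422
x_F, n_F = 431, 518
alpha = 0.05  # assumed significance level (Step 1)

# Step 2 (assumed): H0: p_N - p_F = 0  vs  HA: p_N - p_F > 0
p_hat_N = x_N / n_N
p_hat_F = x_F / n_F
p_pool = (x_N + x_F) / (n_N + n_F)  # common proportion if H0 is true

# Step 3: success/failure condition, at least 10 expected successes and failures per group
expected = [n * q for n in (n_N, n_F) for q in (p_pool, 1 - p_pool)]
conditions_ok = all(e >= 10 for e in expected)

# Step 4: pooled standard error and z test statistic
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_N + 1 / n_F))
z = (p_hat_N - p_hat_F) / se

# Step 5: one-sided p-value = upper-tail area beyond z
p_value = 1 - NormalDist().cdf(z)

print(f"conditions met: {conditions_ok}")
print(f"p_hat_N - p_hat_F = {p_hat_N - p_hat_F:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here \(z \approx 2.69\), well into the upper tail, so under these assumptions the sketch rejects \(H_0\).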


Example: Confidence interval for a difference in means

On average, how much more money do consumers spend at Target compared to Walmart?

Suppose researchers collected a systematic sample from \(85\) Walmart customers and \(80\) Target customers by asking them for their purchase amount as they left the stores. The data they collected is summarized in the table below. Suppose a computer already calculated the degrees of freedom to be \(162.75\).

                                  Walmart     Target
Sample size \(n\)                 \(85\)      \(80\)
Sample mean \(\bar{x}\)           \(\$45\)    \(\$53\)
Sample standard deviation \(s\)   \(\$21\)    \(\$19\)

Step 1) Identify and define the population parameter and choose your confidence level.

Step 2) Calculate the sample estimate for the population parameter.

Step 3) Assess the required assumptions and conditions.

Step 4) Find the critical value corresponding to your confidence level.

Step 5) Calculate the standard error of your sample estimate.

Step 6) Calculate the lower and upper bounds of your confidence interval.
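The six steps above can be sketched in Python for the Target and Walmart data. This is a minimal sketch that assumes a 95% confidence level (the example does not fix one) and the unpooled two-sample t interval for \(\mu_T - \mu_W\), the mean purchase amount at Target minus that at Walmart. It also recomputes the Welch-Satterthwaite degrees of freedom as a check on the stated \(162.75\). The helpers `t_pdf`, `t_cdf`, and `t_ppf` are hypothetical. They approximate the Student's t distribution with Simpson's rule and bisection using only the standard library, where `scipy.stats.t.ppf(0.975, df)` would normally be used.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(q: float, df: float) -> float:
    """Quantile of the t distribution, found by bisection on the increasing t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid  # quantile lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2


# Summary statistics from the table (T = Target, W = Walmart), in dollars
n_T, xbar_T, s_T = 80, 53.0, 19.0
n_W, xbar_W, s_W = 85, 45.0, 21.0

estimate = xbar_T - xbar_W                     # Step 2: point estimate of mu_T - mu_W
var_T, var_W = s_T**2 / n_T, s_W**2 / n_W
df = (var_T + var_W) ** 2 / (var_T**2 / (n_T - 1) + var_W**2 / (n_W - 1))  # Welch df
t_star = t_ppf(0.975, df)                      # Step 4: 95% two-sided critical value
se = math.sqrt(var_T + var_W)                  # Step 5: unpooled standard error
lower, upper = estimate - t_star * se, estimate + t_star * se  # Step 6

print(f"df = {df:.2f}, t* = {t_star:.3f}, SE = {se:.3f}")
print(f"95% CI for mu_T - mu_W: ({lower:.2f}, {upper:.2f}) dollars")
```

Under these assumptions the interval runs from roughly \(\$1.85\) to \(\$14.15\), so the whole interval lies above zero.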


Where to start?

If you feel like you’re starting to get lost with all the different statistical methods we’ve explored in this unit, don’t fret! There are plenty of flow-chart-like guides to using different tests. For example, here is a flow-chart created by a stats grad student. You could also make your own! The starting point for using any of these kinds of guides is to first assess the data that you have (or want to have). This means you need to be able to answer questions like:

  • What constitutes an observational unit?

  • What are the variables and what are their types?

  • Do you expect there to be any relationships among the variables given your knowledge of the subject?

  • Is your sample random? If not, is your sample conceivably representative of any larger population?
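As a starting point for making your own guide, the sketch below encodes only the procedures from this unit's recaps as a tiny decision function. It is hypothetical and deliberately incomplete. It assumes one variable of interest, one or two independent samples, and that the required conditions (including a random or plausibly representative sample) have already been checked. `suggest_method` is an illustrative name, not part of any library.

```python
def suggest_method(variable_type: str, n_samples: int, goal: str) -> str:
    """Map answers about the data to one of the procedures covered in this unit.

    variable_type: "categorical" or "quantitative"
    n_samples:     1 (one-sample) or 2 (two independent samples)
    goal:          "test" (hypothesis test) or "interval" (confidence interval)
    """
    if n_samples not in (1, 2):
        raise ValueError("this sketch only covers one or two independent samples")
    if variable_type == "categorical":
        statistic = "z"
        parameter = "proportion" if n_samples == 1 else "difference in proportions"
    elif variable_type == "quantitative":
        statistic = "t"
        parameter = "mean" if n_samples == 1 else "difference in means"
    else:
        raise ValueError("variable_type must be 'categorical' or 'quantitative'")
    procedures = {"test": "hypothesis test", "interval": "confidence interval"}
    if goal not in procedures:
        raise ValueError("goal must be 'test' or 'interval'")
    design = "one-sample" if n_samples == 1 else "two-sample"
    return f"{design} {statistic} {procedures[goal]} for a {parameter}"


# The three worked examples in this section, described by their data
print(suggest_method("quantitative", 1, "test"))      # baby weights
print(suggest_method("categorical", 2, "test"))       # job satisfaction
print(suggest_method("quantitative", 2, "interval"))  # Target vs. Walmart
```

A fuller guide would add branches for paired data, more than two groups, and relationships between variables. Those are exactly the questions the bullet list asks you to answer first.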

3. Looking ahead