Overview of confidence intervals (CIs)

Population parameter – these are typically unknown, fixed numbers representing the population proportion, mean, variance, and standard deviation. There are many other types of parameters but these are the ones we are most interested in for now.

Sample estimate – these are values that approximate the population parameters. We can view them as random variables and consider their sampling distributions, or, once we observe a data set, we can plug in the data and get a point estimate, which is an actual numerical guess.

Sampling distribution – the theoretical distribution of any sample estimate. Typically, if our sample size is large enough, we can use the Central Limit Theorem (CLT) to approximate this theoretical distribution with a Normal model.
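To make the idea of a sampling distribution concrete, here is a minimal simulation sketch. It assumes a skewed (exponential) population with mean 1 and standard deviation 1, a sample size of 40, and 5000 repeated samples. All of these choices are illustrative assumptions, not part of the notes. If the CLT approximation is reasonable, the simulated sample means should center near 1 with a spread near \(1/\sqrt{40}\).

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(1)   # fixed seed so the sketch is reproducible
n = 40                   # size of each sample (assumed for illustration)
num_samples = 5000       # number of repeated samples (assumed)

# Population: exponential with rate 1, so its mean and sd are both 1.
sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(mean(sample))

# The collection of sample means approximates the sampling distribution.
center = mean(sample_means)      # should be close to the population mean, 1
spread = stdev(sample_means)     # should be close to 1 / sqrt(n)
print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.3f} (theory: {1 / sqrt(n):.3f})")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.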

General formula for any confidence interval

\[\text{sample estimate} \pm [\text{critical value} \times SE(\text{sample estimate})]\]
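As a rough illustration of the general formula, the sketch below builds an interval from a sample estimate, its standard error, and a critical value taken from the standard Normal model. The function name `confidence_interval` and the numbers in the example (a sample proportion of 0.56 from 400 observations) are made up for illustration. The standard error formula \(\sqrt{\hat{p}(1-\hat{p})/n}\) is the usual one for a proportion and is assumed here rather than taken from the notes.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(estimate, se, confidence=0.95):
    """Return (lower, upper) = estimate -/+ critical value * SE."""
    # Two-sided critical value from the standard Normal model.
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = critical * se
    return estimate - margin, estimate + margin

# Hypothetical example: 224 successes out of 400 observations.
n = 400
p_hat = 0.56
se = sqrt(p_hat * (1 - p_hat) / n)   # SE of a sample proportion (assumed formula)

lower, upper = confidence_interval(p_hat, se, confidence=0.95)
print(f"95% CI for the proportion: ({lower:.3f}, {upper:.3f})")
```

The same function works for other estimates, such as a sample mean, as long as the appropriate standard error is supplied and the Normal approximation is reasonable.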

Confidence interval assumption

The data were collected without bias, each observation is independent of the others, and the sample is large enough for the CLT to apply.

When to use CIs

When the population parameters are unknown and we want a range of possible values for these unknown parameters.

Honest interpretation of an \((a\times 100)\%\) CI

(Note: Anywhere you see <>’s you should replace the inside with problem-specific terms.)

“We don’t know exactly what <the unknown parameter value> is, but the interval from <lower bound> to <upper bound> probably contains the true <parameter value>.”

OR

“We are \((a \times 100)\%\) confident that the true population <parameter> is between <lower bound> and <upper bound>.”

The key to these interpretations is that we admit that we are unsure about two things. First, we need an interval, not just a single value, to try to capture the true <parameter value>. Second, we aren’t even certain that the true <parameter value> is included in our interval, but we’re “pretty sure” that it is. By “pretty sure,” we mean in the hypothetical long-run frequency sense: if we could take all possible random samples of size \(n\) from this population and create a CI from each one, then \((a \times 100)\%\) of these confidence intervals would contain the true population <parameter value>.
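The long-run frequency idea can be checked with a small simulation. The sketch below assumes a Normal population with a known mean of 10 and standard deviation of 2 (values chosen only for illustration), draws many random samples of size 50, builds a 95% Normal-approximation CI for the mean from each, and counts how often the interval contains the true mean. It draws a large number of samples rather than literally all possible samples, so the resulting fraction is only close to the stated confidence level.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def coverage_rate(true_mean=10.0, true_sd=2.0, n=50, confidence=0.95,
                  num_samples=2000, seed=0):
    """Fraction of simulated CIs for the mean that contain true_mean."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(num_samples):
        # Draw one random sample of size n from the assumed population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        estimate = mean(sample)
        margin = critical * stdev(sample) / sqrt(n)
        # Check whether this interval captures the true parameter.
        if estimate - margin <= true_mean <= estimate + margin:
            hits += 1
    return hits / num_samples

rate = coverage_rate()
print(f"Fraction of intervals containing the true mean: {rate:.3f}")
```

Any single interval either contains the true mean or it does not. The confidence level describes the procedure across repeated samples, not one particular interval.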

\((a \times 100)\%\) is called the confidence level. It must be chosen before the data are analyzed.

Tips on how to interpret CIs

Confidence intervals are useful because they allow us to quantify our uncertainty. We can heuristically think about them as informing us on how much we’d be willing to bet on certain outcomes. What we really mean, however, has to do with the behavior of these intervals if we could calculate them for all possible samples of size \(n\) from the population of interest.