Probability and Conditional Probability

class: center, middle, inverse, title-slide

# Probability and Conditional Probability
### STAT 021 with Prof Suzy
### Swarthmore College

---

.scroll-output {
  height: 75%;
  overflow-y: scroll;
}

.scroll-small {
  height: 50%;
  overflow-y: scroll;
}
   
.red{color: #ce151e;}
.green{color: #26b421;}
.blue{color: #426EF0;}
</style>

## Probability example
### Tuberculosis testing

There is a TB screening test that is administered via a shot in the forearm. After waiting 48 hours, the TB test is declared negative if your arm looks normal (no redness or swelling). This test is about `$99\%$` effective. Suppose Bob takes this test but their arm has a severe reaction, indicative of a positive diagnosis.

**Q:** What is the chance that Bob has TB?

---
## Probability Review

Rules of Probability

1. Probabilities must be between zero and one (inclusive).
  
  2. The probability of an entire sample space is equal to one.
  
  3. The probability that an event, `$A$`, occurs is equal to one minus the probability that any other event besides `$A$` occurs.

The following is true for any events `$A$` and `$B$` within some sample space:

`$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B).$$`

---
## Conditional Probability

We can talk about the probability of one event, `$B$`, given that another event `$A$` is guaranteed to occur. That probability is defined as
`$$P(B \text{ given } A) = P(B | A) = \frac{P(A \text{ and } B)}{P(A)}.$$`

--
Sometimes we only know the conditional probabilities associate with an event. In such circumstances, the following facts (which are just consequences of the rules of probability) can be useful

- `$P(A \text{ and }B) = P(A)P(B|A);$`

- `$P(A) = P(A \text{ and } B^C) + P(A \text{ and } B).$`

---
## Independence

The following are all equivalent statements

1. Random events `$A$` and `$B$` are independent;

2. `$P(A \text{ and } B) = P(A)P(B)$`;

3. `$P(B \text{ given } A) = P(B)$`.

---
## Probability example
### Tuberculosis testing

Let's define some terms to answer the question: how likely it is that Bob has TB, given that they tested positive for TB?

Define the following events:

`$A =$` event that Bob has TB

`$A^C =$` event that Bob does not have TB

`$B =$` event that Bob tests positive for TB

`$B^C =$` event that Bob tests negative for TB

The probability we want to find to answer the question is: `$P(A|B)$`

---
## Probability example
### Tuberculosis testing

According to the CDC, the incidence rate of TB is 5 cases per 100,000 in the US. This means that `$P(A) = 0.00005$` and that `$P(A^C) = 1 - 0.00005=0.99995$`.

A false positive occurs when the result of the test is positive but the patient does not actually have TB. A `$99\%$` accuracy means that we can expect false-positives to occur at a rate of `$1\%$`.

`$$P(\text{False positive}) = P(\text{the test is positive given that the person does not have TB}) = P(B|A^C) = 0.01$$`

--
Similarly, a false negative occurs when the result of the test is negative but the patient actually does have TB. Suppose the TB test results in a false negative, `$0.1\%$` of the time.

`$$P(\text{False negative}) = P(\text{the test is negative given that the person does have TB}) = P(B^C \mid A) = 0.001$$`

---
## Probability example
### Tuberculosis testing

From the properties of conditional probability we know that to calculate `$P(A|B)$` we need `$P(A \text{ and } B)$` and `$P(B).$`

A tree diagram helps us think through how to find these two probabilities...

---
## Probability example
### Tuberculosis testing

Putting everything together, we can answer the question: what is the chance that Bob actually has TB given that Bob's TB test is positive?

`$$P(A|B) = \frac{P(A\text{ and }B)}{P(B)} = \frac{0.00004995}{0.00004995+0.0099995} = 0.00497$$`

Thus the chance that Bob has TB, after testing positive for TB, is less than `$0.5\%$`.

.footnote[**Disclaimer:** We took some liberties with deriving this calculation. First, we assumed that the probability that Bob actually had TB is equal to the incidence rate. In reality, this probability depends on many more factors, not the least of which is location and proximity to infected individuals. Second, we also assumed the false-negative rate.]

---
## Probability calculations

The table below summarizes all of the employees who work in a small business at three different levels based on their previous years experience when hired.

|      |<1 yr | at least 1 yr |
|------|------|---------------|
| Management | 6 | 7 |
| Supervision | 12 | 8 |
| Production | 72 | 45 |

**Questions:**

1. What is the probability that a worker selected at random is both a supervisor and has at least one year of prior experience?

2. What is the probability that a worker selected at random is a production worker, given that the worker has less than one year of previous experience?

3. Do the data in the table above suggest that whether a randomly chosen employee is in a supervisory position is independent of previous experience?