class: center, middle, inverse, title-slide

.title[
# Week 4 - Part 2
]
.subtitle[
## Estimating the Error Variance
]
.author[
### Suzanne Thornton
]
.institute[
### Swarthmore College
]
.date[
### For class in Week 9 (updated: 2023-07-31)
]

---

<style type="text/css">
pre {
  background: #FFBB33;
  max-width: 100%;
  overflow-x: scroll;
}
.scroll-output {
  height: 70%;
  overflow-y: scroll;
}
.scroll-small {
  height: 50%;
  overflow-y: scroll;
}
.red{color: #ce151e;}
.green{color: #26b421;}
.blue{color: #426EF0;}
</style>

## What does the rest of the R output mean?

### Residual standard error and its degrees of freedom

Recall that `\(\sigma\)` is also a model parameter that we don't know and must estimate.

`$$\hat{\sigma} = \text{Residual standard error} = \sqrt{\frac{SSres}{n-2}} = \sqrt{MSE}$$`

--

The *residual degrees of freedom* are `\(n-2\)`. We subtract `\(2\)` because *SSres* is computed after we have estimated two model parameters, `\(\beta_0\)` and `\(\beta_1\)`.

```r
# Residual degrees of freedom of the fitted model
df.residual(SLR_hc)
```

```
## [1] 49
```

```r
# Number of rows (n) and columns in the data
dim(hc_employer_2013)
```

```
## [1] 51  6
```

Here `\(n = 51\)`, so the residual degrees of freedom are `\(51 - 2 = 49\)`.

---

## What does the rest of the R output mean?

### Residual standard error and its degrees of freedom

To extract this estimate of `\(\sigma\)` from the model, use:

```r
# Store the model summary, then pull out the residual standard error
SLR_hc_summary <- SLR_hc %>% summary()
SLR_hc_summary$sigma
```

```
## [1] 0.02945572
```

---

## What does the rest of the R output mean?

### Residual standard error and its degrees of freedom

To see that `\(\hat{\sigma} = \text{Residual standard error} = \sqrt{\frac{SSres}{n-2}}\)`, look at the `Residuals` row of the following output: `Sum Sq` is *SSres*, `Df` is `\(n-2\)`, and `Mean Sq` is their ratio, *MSE*. The square root of that `Mean Sq` value is the residual standard error.

```r
anova(SLR_hc)
```

```
## Analysis of Variance Table
## 
## Response: prop_uninsured
##                 Df   Sum Sq   Mean Sq F value    Pr(>F)    
## spending_capita  1 0.023730 0.0237300   27.35 3.503e-06 ***
## Residuals       49 0.042514 0.0008676                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

.footnote[<a href="https://campus.datacamp.com/courses/correlation-and-regression/model-fit?ex=3">Here</a> is a useful R tutorial that helps to explain what *residual standard error* is.]

---

## What does the rest of the R output mean?

### Residual standard error and its degrees of freedom

.scroll-output[

```r
# Attach the residuals and fitted values to the original data
hc_resid_data <- hc_employer_2013 %>%
  mutate(residuals = SLR_hc$residuals,
         fitted_vals = SLR_hc$fitted.values)

# Residuals against fitted values, with a horizontal reference line at zero
ggplot(hc_resid_data) +
  geom_point(aes(x = fitted_vals, y = residuals)) +
  labs(title = "Residual plot",
       subtitle = "Cost of health care as a predictor of the proportion of people uninsured",
       x = "Fitted values",
       y = "Residuals") +
  geom_hline(yintercept = 0)
```

<img src="Figs/class14-2-1.png" style="display: block; margin: auto;" />

]
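
---

## What does the rest of the R output mean?

### Residual standard error and its degrees of freedom

As a rough check of the formula `\(\hat{\sigma} = \sqrt{SSres/(n-2)}\)`, the sketch below recomputes the residual standard error by hand and compares it with what `summary()` reports. It uses R's built-in `cars` data and a model object named `fit`; both are stand-ins assumed for illustration, not the health care data or `SLR_hc` from class. The last line plugs in the rounded *SSres* and degrees of freedom printed in the ANOVA table for `SLR_hc`.

```r
# Stand-in example: built-in `cars` data, NOT the class health care data
fit <- lm(dist ~ speed, data = cars)

n     <- nrow(cars)              # number of observations
SSres <- sum(residuals(fit)^2)   # residual sum of squares

# Residual standard error computed by hand: sqrt(SSres / (n - 2))
sigma_by_hand <- sqrt(SSres / (n - 2))

sigma_by_hand
summary(fit)$sigma

# Should be TRUE: the two values agree up to floating-point error
all.equal(sigma_by_hand, summary(fit)$sigma)

# Same formula with the rounded values from the anova(SLR_hc) table:
# SSres = 0.042514 on 49 residual degrees of freedom (roughly 0.0295)
sigma_from_table <- sqrt(0.042514 / 49)
sigma_from_table
```

Because the ANOVA table prints rounded values, `sigma_from_table` should agree with `SLR_hc_summary$sigma` only to the first few significant digits.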