class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[
<br><br><br>
# Simple Linear Regression: Diagnostics, Inference and Prediction

## .black[STAT3022 Applied Linear Models Lecture 4]

<br><br><br>

### .black[2020/02/16]
]]
.column.bg-brand-charcoal[.content.white[
## Today

1. Q-Q plots for residuals
1. Sampling distribution of least squares estimators
1. Hypothesis testing for `\(\beta_0\)` and `\(\beta_1\)`
1. Confidence intervals for `\(\beta_0\)` and `\(\beta_1\)`
1. Confidence interval for `\(\sigma^2\)`
]]

---

class: split-10
layout: true

.row.bg-brand-blue.white[.content.vmiddle[
# Model Diagnostics
]]
.row[
.split-two[
.column[
.split-two[
.row[.content[
Plot `\(Y_i\)` vs `\(x_i\)` to see if there is `\(\approx\)` a linear relationship between `\(Y\)` and `\(x\)`.

<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
]]
.row.bg-brand-gray[.content[
A boxplot of the residuals `\(R_i\)` to check for symmetry.

<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />
]]
]
]
.column[
.split-two[
.row.bg-brand-gray[.content[
To check the homoscedasticity assumption, plot `\(R_i\)` vs `\(x_i\)`. There should be no obvious patterns.

<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />
]]
.row[.content[
A normal Q-Q plot, i.e. a plot of the ordered residuals vs `\(\Phi^{-1}(\frac{i}{n+1})\)`.

<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />
]]
]
]
]]

---
class: hide-row2
---
count: false
---
layout: false
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Assessing (A1) `\(E[\epsilon_i] = 0\)` for `\(i=1,\ldots,n\)`
]]
.row[.content[

* It is a property of the least squares method that
$$\sum_{i=1}^n R_i = 0,\quad \text{so}\quad \bar R = 0$$
where `\(R_i=\tilde \epsilon_i =Y_i -\tilde Y_i\)`, and hence (A1) will always appear valid "overall".
* A trend in the residuals plotted against the fitted values or the covariate can indicate a "local" failure of (A1).
* What do you conclude from the following plots?

![](lecture04_2020JC_files/figure-html/unnamed-chunk-7-1.svg)<!-- -->
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Assessing (A2)-(A3)
]]
.row[.split-two[
.column.bg-brand-gray[.content[
### (A2) `\(\epsilon_1, \ldots ,\epsilon_n\)` are independent

* If (A2) is correct, then the residuals should appear randomly scattered about zero when plotted against the fitted values or the covariate.
* Long sequences of positive residuals followed by sequences of negative residuals in the `\(R_i\)` vs `\(x_i\)` plot suggest that the error terms are not independent.

![](lecture04_2020JC_files/figure-html/unnamed-chunk-8-1.svg)<!-- -->
]]
.column[.content[
### (A3) `\(Var(\epsilon_i) = \sigma^2\)` for `\(i=1,\ldots,n\)`

* If (A3) holds then the spread of the residuals should be roughly the same across the fitted values or the covariate, as in the sketch below.
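A quick way to eyeball this in R (a minimal sketch, using the simulated model `M1` defined on the next slide; `which = 3` is the scale-location plot, which should look roughly flat under (A3)):

```r
# Scale-location plot: sqrt(|standardised residuals|) vs fitted values
plot(M1, which = 3)
# or by hand: spread of the residuals across the fitted values
plot(fitted(M1), sqrt(abs(resid(M1))))
```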
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Examining the simulated data further
]]
.row[.split-two[
.column[.split-two[
.row[.content[
<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
]]
.row.bg-brand-gray[.content[
<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
]]]]
.column[.split-two[
.row.bg-brand-gray[.content[
{{content}}
]]
.row[.content[
<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
]]]]
]]
]]

--

Simulation scheme

```r
n <- 100
x <- seq(0, 1, length.out = n)
y1 <- x + rnorm(n) / 3                                          # Linear
y2 <- 3 * (x - 0.5) ^ 2 + c(rnorm(n / 2) / 3, rnorm(n / 2) / 6) # Quadratic, changing spread
y3 <- -0.25 * sin(20 * x - 0.2) + x + rnorm(n, 0, 2) / 3        # Non-linear
M1 <- lm(y1 ~ x); M2 <- lm(y2 ~ x); M3 <- lm(y3 ~ x)
```

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Assessing (A4) `\(\epsilon_1, \ldots ,\epsilon_n\)` are normally distributed
]]
.row[.split-two[
.column[.content[
### Q-Q Plots

* The function `qqnorm(x)` produces a Q-Q plot of the ordered vector `x` against quantiles of the normal distribution, i.e. plotting the ordered residuals (optionally standardised as `\(\frac{R_{(i)}}{\hat \sigma}\)`) against `\(\Phi^{-1}(\frac{i}{n+1})\)`.
* The `\(n\)` chosen normal quantiles `\(\Phi^{-1}(\frac{i}{n+1})\)` are easy to calculate but more sophisticated ways exist:
  * `\(\frac{i}{n+1} \mapsto \frac{i-3/8}{n+1/4}\)`, the default in `qqnorm` for `\(n \le 10\)`.
  * `\(\frac{i}{n+1} \mapsto \frac{i-1/3}{n+1/3}\)`, recommended by Hyndman and Fan (1996).
* Symmetry, heavy or light tails, location and spread can be "easily" seen in Q-Q plots. How?
]]
.column.bg-brand-gray[.content[
### In R

By "hand"

```r
plot(qnorm((1:n)/(n+1)), sort(resid(M1))) # or divide by hat(sigma) to standardise
```

By `base`

```r
qqnorm(resid(M1))
qqline(resid(M1))
```

By `ggplot2`

```r
data.frame(residual=resid(M1)) %>%
  ggplot(aes(sample=residual)) +
  stat_qq() + stat_qq_line(color="blue")
```
]]
.bottom_abs.width100[
<span style="font-size:16pt">Reference: Hyndman and Fan (1996). *Sample quantiles in statistical packages*, American Statistician, 50, 361--365.</span>
]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Does any Q-Q plot look unusual?
]]
.row[.content[
<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-16-1.png" width="720" style="display: block; margin: auto;" />
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# How to interpret a Q-Q plot
]]
.row[.content[
<img src="images/InterpretQQPlot.PNG" width="100%" height="100%">
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Making inferences based on a single data plot
]]
.row[.split-two[
.column[
<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-17-1.png" width="504" />
]
.column[.content[
* Did you think plot 20 was unusual?
* Plots 1--19 were generated as follows:
  1. Simulate a new response, `\(\log(\texttt{BrainWt})\)`, from `$$\hat{\beta}_0 + \hat{\beta}_1 \log(\texttt{BodyWt}) + \epsilon_i$$` <span class="brand-blue">where `\(\epsilon_i\)` is sampled independently from `\(N(0, \hat\sigma^2)\)`</span>.
  2. Fit the following linear model to the new response `$$\log(\texttt{BrainWt}) = \beta_0 + \beta_1 \log(\texttt{BodyWt}) + \epsilon_i$$` where `\(\epsilon_i\sim NID(0, \sigma^2)\)`.
  3. Make a Q-Q plot from the residuals of the above fit.
]]
]]
.bottom_abs.width100[
<span style="font-size:16pt">For more details see Wickham, Chowdhury, Cook and Hofmann (2018).
`nullabor`: Tools for Graphical Inference.</span> ] --- exclude: true class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Using .yellow[augment] from .yellow[broom] ]] .row[.content[ ```r dat <- read.csv("data/sleep.csv") fit <- lm(log(BrainWt) ~ log(BodyWt), data=dat) fit %>% broom::augment() ``` ``` log.BrainWt. log.BodyWt. .fitted .se.fit .resid .hat 1 8.6503245 8.80297346 8.75186002 0.23006114 -0.101535517 0.10979911 2 1.8870696 0.00000000 2.13478868 0.09604339 -0.247719028 0.01913583 3 3.7954892 1.21935391 3.05135986 0.08823966 0.744129330 0.01615251 4 1.7404662 -0.08338161 2.07211189 0.09700864 -0.331645719 0.01952240 5 8.4344635 7.84267147 8.03001453 0.20508257 0.404449017 0.08725087 6 5.1901752 2.35612586 3.90585535 0.09281968 1.284319858 0.01787279 7 -1.2039728 -3.77226106 -0.70076691 0.17008415 -0.503205892 0.06001225 8 5.1298987 5.07517382 5.94972546 0.13817737 -0.819826743 0.03960829 9 3.2425924 1.19392247 3.03224341 0.08827022 0.210348946 0.01616370 10 6.0867747 3.95431592 5.10719234 0.11542360 0.979582387 0.02763767 11 1.8562980 -0.85566611 1.49159650 0.10803694 0.364701495 0.02421346 12 6.0473722 6.14203741 6.75167181 0.16271552 -0.704299635 0.05492502 13 0.8754687 -0.59783700 1.68540301 0.10396911 -0.809934274 0.02242440 14 6.0378709 5.23164323 6.06734132 0.14163489 -0.029470399 0.04161528 15 0.1823216 -2.59026717 0.18772128 0.14238685 -0.005399721 0.04205833 16 3.2188758 1.09861229 2.96060008 0.08843739 0.258275741 0.01622498 17 1.2527630 -0.24207156 1.95282689 0.09897699 -0.700063920 0.02032267 18 1.6094379 -1.60943791 0.92499683 0.12170054 0.684441080 0.03072538 19 2.8622009 0.34358970 2.39306023 0.09260303 0.469140656 0.01778946 20 4.3944492 4.09434456 5.21244990 0.11803486 -0.818000748 0.02890233 21 6.5220928 6.27098843 6.84860249 0.16581222 -0.326509689 0.05703551 22 4.7449321 3.31998733 4.63037646 0.10468519 0.114555670 0.02273436 23 0.0000000 -2.12026354 0.54101640 0.13214268 -0.541016395 0.03622417 24 6.0063532 5.33271879 6.14331840 0.14389749 -0.136965236 0.04295549 25 5.7838252 4.44265126 5.47426715 0.12484549 0.309558036 0.03233389 26 4.7833164 3.59264385 4.83532853 0.10906448 -0.052012159 0.02467624 27 1.3862944 -2.29263476 0.41144737 0.13583641 0.974846992 0.03827759 28 1.7047481 0.03922071 2.16427034 0.09560636 -0.459522243 0.01896207 29 6.4846352 6.25575004 6.83714800 0.16544505 -0.352512768 0.05678319 30 5.0562458 4.60517019 5.59643034 0.12816196 -0.540184534 0.03407458 31 4.0253517 3.55534806 4.80729381 0.10844311 -0.781942122 0.02439586 32 -1.9661129 -5.29831737 -1.84788197 0.20844810 -0.118230883 0.09013805 33 -1.3862944 -4.60517019 -1.32685299 0.19075342 -0.059441375 0.07548435 34 7.1853870 4.12713439 5.23709755 0.11865737 1.948289465 0.02920799 35 1.0986123 -2.10373423 0.55344124 0.13179263 0.545171049 0.03603251 36 2.0918641 0.30010459 2.36037308 0.09298865 -0.268509017 0.01793792 37 -0.9162907 -3.77226106 -0.70076691 0.17008415 -0.215523820 0.06001225 38 -1.1086626 -3.03655427 -0.14774646 0.15256388 -0.960916163 0.04828538 39 1.8405496 0.53062825 2.53365447 0.09111767 -0.693104837 0.01722335 40 2.3795461 1.25276297 3.07647298 0.08820853 -0.696926847 0.01614111 41 6.1944054 5.52146092 6.28519320 0.14817958 -0.090787805 0.04555006 42 2.7408400 -0.73396918 1.58307437 0.10607308 1.157765654 0.02334117 43 4.7449321 2.30258509 3.86560951 0.09235501 0.879322620 0.01769429 44 2.4336134 0.48242615 2.49742163 0.09147314 -0.063808273 0.01735799 45 5.1929569 5.25749537 6.08677401 0.14221149 -0.893817157 0.04195480 46 2.4932055 0.91629073 2.82355153 
0.08898701 -0.330346081 0.01642727 47 3.6686767 1.45582042 3.22910842 0.08823977 0.439568332 0.01615255 48 0.6418539 -1.27296568 1.17791828 0.11530848 -0.536064395 0.02758257 49 3.9199912 1.44338333 3.21975963 0.08822697 0.700231550 0.01614786 50 5.1873858 1.91692261 3.57571245 0.08970444 1.611673361 0.01669322 51 2.5095993 -0.28768207 1.91854211 0.09957344 0.591057154 0.02056834 52 3.0445224 1.28093385 3.09764863 0.08819024 -0.053126196 0.01613442 53 4.5870062 2.69665216 4.16182418 0.09628846 0.425182038 0.01923361 54 5.1647860 4.01638302 5.15384731 0.11657143 0.010938666 0.02819009 55 2.5257286 0.33647224 2.38771012 0.09266513 0.138018519 0.01781332 56 0.0000000 -2.81341072 0.01998741 0.14742637 -0.019987408 0.04508817 57 0.9555114 -0.10536052 2.05559066 0.09727112 -1.100079214 0.01962818 58 2.5095993 0.69314718 2.65581766 0.09006298 -0.146218402 0.01682693 59 0.9162907 -2.26336438 0.43344950 0.13520374 0.482841228 0.03792186 60 4.0604430 1.43270073 3.21172967 0.08821711 0.848713341 0.01614425 61 1.3609766 1.25276297 3.07647298 0.08820853 -1.715496428 0.01614111 62 2.8332133 1.39871688 3.18618449 0.08819271 -0.352971141 0.01613532 .sigma .cooksd .std.resid 1 0.7000137 1.481634e-03 -0.154999431 2 0.6993962 1.265991e-03 -0.360255866 3 0.6933081 9.584369e-03 1.080539535 4 0.6987947 2.316801e-03 -0.482404874 5 0.6979813 1.776956e-02 0.609739319 6 0.6795215 3.170192e-02 1.866575053 7 0.6968857 1.783895e-02 -0.747550871 8 0.6916313 2.993752e-02 -1.204908341 9 0.6996093 7.664044e-04 0.305446427 10 0.6881054 2.909440e-02 1.430813525 11 0.6985021 3.508363e-03 0.531760730 12 0.6937719 3.163988e-02 -1.043471186 13 0.6919840 1.596622e-02 -1.179860691 14 0.7001429 4.081571e-05 -0.043358296 15 0.7001535 1.386113e-06 -0.007946171 16 0.6993327 1.159957e-03 0.375052316 17 0.6940724 1.076396e-02 -1.018714051 18 0.6942793 1.589125e-02 1.001310516 19 0.6974364 4.209610e-03 0.681799856 20 0.6917635 2.127143e-02 -1.195579266 21 0.6987841 7.092985e-03 -0.484288921 22 0.6999913 3.240208e-04 0.166903870 23 0.6964682 1.183994e-02 -0.793740504 24 0.6999166 9.125515e-04 -0.201651015 25 0.6989542 3.432206e-03 0.453247533 26 0.7001203 7.279028e-05 -0.075855418 27 0.6880895 4.079438e-02 1.431751738 28 0.6975437 4.315286e-03 -0.668220462 29 0.6985574 8.226758e-03 -0.522787527 30 0.6964878 1.105376e-02 -0.791637720 31 0.6925265 1.625550e-02 -1.140233831 32 0.6999679 1.578706e-03 -0.178525111 33 0.7001076 3.236595e-04 -0.089040552 34 0.6511095 1.220221e-01 2.848042829 35 0.6964120 1.195412e-02 0.799756396 36 0.6992647 1.390895e-03 -0.390252356 37 0.6995555 3.272415e-03 -0.320177132 38 0.6883104 5.105702e-02 -1.418692456 39 0.6942121 8.885625e-03 -1.006995766 40 0.6941528 8.400874e-03 -1.011991626 41 0.7000493 4.274841e-04 -0.133846516 42 0.6833399 3.402205e-02 1.687350683 43 0.6905607 1.470682e-02 1.277853428 44 0.7001037 7.591805e-05 -0.092711895 45 0.6899867 3.787819e-02 -1.315260618 46 0.6988096 1.922087e-03 -0.479757807 47 0.6977727 3.344418e-03 0.638290887 48 0.6965678 8.694517e-03 -0.782972891 49 0.6940954 8.484396e-03 1.016793797 50 0.6674150 4.651557e-02 2.340931152 51 0.6958232 7.769485e-03 0.860198221 52 0.7001191 4.879569e-05 -0.077143078 53 0.6979192 3.749399e-03 0.618369772 54 0.7001524 3.704630e-06 0.015981952 55 0.6999191 3.648496e-04 0.200584086 56 0.7001488 2.048950e-05 -0.029459880 57 0.6850499 2.563473e-02 -1.600238374 58 0.6998906 3.860390e-04 -0.212394456 59 0.6972146 9.907405e-03 0.709014804 60 0.6912354 1.246119e-02 1.232399309 61 0.6629608 5.090140e-02 -2.491033350 62 0.6986194 2.154109e-03 
-0.512539859
```
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Inference for the linear regression model
]]
.row[.content[

* Consider the simple linear regression model `$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i=1,\ldots,n$$` and `\(\epsilon_i \sim NID(0,\sigma^2)\)`.
* Let `\(p\)` be the number of regression parameters to be estimated. So for a simple linear regression model, `\(p=2\)`.
* The following is an important known result from distribution theory (use moment generating functions to prove it):

## Theorem

Let `\(Y_1,Y_2,\ldots,Y_n\)` be independent normal random variables with `\(E (Y_i) = \mu_i\)` and `\(Var(Y_i) = \sigma^2_i\)`. Then for real constants `\(c_1,\ldots,c_n\)` we have that
`\begin{equation}
\mathbf{c}^\top \boldsymbol{Y} = \sum_{i=1}^n c_iY_i \sim N\left(\sum_{i=1}^n c_i \mu_i, \sum_{i=1}^n c_i^2 \sigma_i^2\right).
\label{eqn:vecindnor2nor}
\end{equation}`
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Distribution of the regression parameter estimates
]]
.row[.split-two[
.column[.content[
* Write the LSEs as linear combinations of the `\(Y\)`'s:
`\begin{eqnarray*}
\hat{\beta}_1 &=& \frac{S_{xy}}{S_{xx}} = \frac{1}{S_{xx}} \sum_{i=1}^{n}(x_i-\bar{x})Y_i,\\
\hat{\beta}_0 &=& \bar{Y} - \hat{\beta}_1\bar{x},
\end{eqnarray*}`
where the second expression for `\(\hat{\beta}_1\)` holds since `\(\bar Y \sum_{i=1}^{n}(x_i-\bar{x})=0\)`.
* If `\(\epsilon_i \sim NID(0,\sigma^2)\)`, we have that `$$Y_i \sim N(\beta_0+\beta_1 x_i,\sigma^2).$$`
* From (A2) the errors are independent `\(\Rightarrow\)` the `\(Y_i\)`'s are independent. Hence we conclude that the .brand-blue[sampling distributions] of `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` are also normal.
]]
.column[.content[
* The distribution of any random variable `\(Y \sim N(\mu,\sigma^2)\)` is uniquely defined by its first two moments: location and spread.
* It can be shown that `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` are *unbiased*, i.e.
  * `\(E(\hat{\beta}_0) = \beta_0\)` and `\(E(\hat{\beta}_1) = \beta_1\)`.
* Further,
  * `\(Var(\hat{\beta}_0) = \sigma^2[\frac{1}{n}+\frac{\bar x^2}{S_{xx}}]=\sigma^2[\frac{S_{xx}+n\bar x^2}{nS_{xx}}]=\sigma^2\frac{\sum x_i^2}{nS_{xx}}\)` (depends on the location of the `\(x\)`'s; `\(S_{xx}=\sum x_i^2-n \bar x^2\)`),
  * `\(Var(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}.\)`

.content-box-blue[
`\begin{eqnarray*}
\hat\beta_0 &\sim& N\left(\beta_0, \sigma^2 \dfrac{(\sum x_i^2)}{nS_{xx}}\right)\\
\hat\beta_1 &\sim& N\left(\beta_1, \dfrac{\sigma^2}{S_{xx}}\right)\\
\end{eqnarray*}`
]
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Sampling distribution
]]
.row[.split-two[
.column[.content[
* Standardizing the normal sampling distribution of `\(\hat \beta_1\)` gives
`\begin{equation}
\frac{\hat \beta_1 - \beta_1}{\sigma/\sqrt{S_{xx}}} \sim N(0,1).
\end{equation}`
* Replacing `\(\sigma^2\)` with the estimate `\(s^2 = \text{RSS}/(n-p)\)` changes the sampling distribution of `\(\hat \beta_1\)`:
`\begin{equation}
\frac{\hat \beta_1 - \beta_1}{s/\sqrt{S_{xx}}} \sim t_{n-p}.
\label{eq:b1samp}
\end{equation}`
* This forms the basis for testing hypotheses and constructing confidence intervals for `\(\beta_1\)`.
]]
.column.content-box-yellow[.content[
## Hypothesis testing for LSEs

* For testing `\(H_0: \beta = \beta^0\)` against `\(H_1: \beta \ne \beta^0\)`, use the `\(t\)`-test statistic
`\begin{equation}
t_{obs} = \frac{\hat \beta - \beta^0}{\hat{SE}(\hat \beta)},
\label{eq:b1test}
\end{equation}`
where `\(\beta\)` is `\(\beta_0\)` or `\(\beta_1\)` and `\(\beta^0\)` is its hypothesised value; under `\(H_0\)`, `\(t_{obs} \sim t_{n-p}\)`.
* The `\(p\)`-value is given as `\(P(|t_{n-p}| > |t_{obs}|)\)`.
* As usual, a small `\(p\)`-value indicates evidence against `\(H_0\)`.
* You can obtain these `\(t\)`-test statistics and `\(p\)`-values from the `summary` or `broom::tidy` functions, as shown on the next slide; a by-hand check is sketched below.
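For instance (a minimal by-hand sketch using the `fit` object for the sleep data; the result should match the `p.value` column from `broom::tidy(fit)`):

```r
# t-statistic and two-sided p-value for H0: beta_1 = 0
est <- coef(summary(fit))["log(BodyWt)", "Estimate"]
se  <- coef(summary(fit))["log(BodyWt)", "Std. Error"]
t_obs <- est / se
2 * pt(abs(t_obs), df = fit$df.residual, lower.tail = FALSE)
```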
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Sampling distribution of `\(\hat \beta_0\)` and `\(\hat \beta_1\)`
]]
.row[
.split-two[
.column.bg-gray[.content[

```r
print(summary(fit), digits = 3)
```

```
Call:
lm(formula = log(BrainWt) ~ log(BodyWt), data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7155 -0.4923 -0.0616  0.4360  1.9483 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.1348     0.0960    22.2   <2e-16 ***
log(BodyWt)   0.7517     0.0285    26.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.694 on 60 degrees of freedom
Multiple R-squared:  0.921,	Adjusted R-squared:  0.919 
F-statistic: 697 on 1 and 60 DF,  p-value: <2e-16
```
]]
.column.bg-brand-gray[.content[

```r
print(broom::tidy(fit), digits = 3)
```

```
         term estimate std.error statistic  p.value
1 (Intercept)    2.135    0.0960      22.2 1.18e-30
2 log(BodyWt)    0.752    0.0285      26.4 9.84e-35
```

A 95% confidence interval for `\(\beta_0\)` and `\(\beta_1\)` from `fit`:

```r
confint(fit, level=0.95)
```

```
                2.5 %    97.5 %
(Intercept) 1.9426733 2.3269041
log(BodyWt) 0.6947503 0.8086215
```

and similarly a 98% confidence interval is given by

```r
confint(fit, level=0.98)
```

```
                  1 %      99 %
(Intercept) 1.9052335 2.3643438
log(BodyWt) 0.6836546 0.8197172
```
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Confidence intervals for regression parameters
]]
.row[.content[

A `\((1-\alpha)100\%\)` confidence interval for a regression parameter, `\(\beta\)`, is
`$$\hat\beta \pm t_{n-p}(1-\alpha/2) \times \hat{SE}(\hat \beta).$$`

A 95% confidence interval for the regression parameters from `out` (here `out` is the `broom::tidy(fit)` output shown earlier) can be found in R by

```r
alpha <- 0.05
out %>% mutate(
  lwr=estimate-qt(1-alpha/2,fit$df.residual)*std.error,
  upr=estimate+qt(1-alpha/2,fit$df.residual)*std.error) %>%
  select(term,lwr, upr)
```

```
         term       lwr       upr
1 (Intercept) 1.9426733 2.3269041
2 log(BodyWt) 0.6947503 0.8086215
```

```r
y <- log(dat$BrainWt); x <- log(dat$BodyWt); n <- length(y)
s2 <- deviance(fit)/(n-2); Sxx <- sum((x-mean(x))^2)
round(c(sqrt(s2*sum(x^2)/(n*Sxx)), sqrt(s2/Sxx)), 4) # checking the standard errors by hand
```

```
[1] 0.0960 0.0285
```
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Inference for the error variance
]]
.row[.content[

Recall that an estimator of the error variance is `\(\hat{\sigma}^2 = s^2 = \dfrac{\text{RSS}}{n-p}\)`. Because of the distributional result `\(\text{RSS} \sim \sigma^2 \chi^2_{n-p}\)` we know the sampling distribution of `\(\hat{\sigma}^2\)`,
`$$\hat{\sigma}^2 \sim \left(\frac{\sigma^2}{n-p}\right) \;\chi^2_{n-p}.$$`

## Confidence interval for `\(\sigma^2\)`

Choose `\(a\)` and `\(b\)` such that
`$$1-\alpha = P(a\leq \chi_{n-p}^2 \leq b).$$`

<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Confidence interval for `\(\sigma^2\)`
]]
.row[.content[

Then from `\(\text{RSS} \sim \sigma^2 \chi^2_{n-p}\)`,
`\begin{eqnarray*}
1-\alpha &=& P\left(a\leq \frac{\text{RSS}}{\sigma^2} \leq b\right) = P\left(\frac{\text{RSS}}{b}\leq \sigma^2 \leq \frac{\text{RSS}}{a}\right).
\end{eqnarray*}`

A `\((1 - \alpha)100\%\)` confidence interval for `\(\sigma^2\)` can be given as
`$$\left(\frac{\text{RSS}}{b}, \frac{\text{RSS}}{a}\right)$$`
where `\(a\)` and `\(b\)` satisfy `\(P(\chi_{n-p}^2 < a)=\alpha/2\)` and `\(P(\chi_{n-p}^2 <b)=1-\alpha/2\)`.

```r
RSS <- deviance(fit); alpha <- 0.05
n <- nrow(dat); p <- length(coef(fit))
a <- qchisq(alpha / 2, n - p); b <- qchisq(1 - alpha / 2, n - p)
c(RSS, RSS / b, RSS / a)
```

```
[1] 28.9227104  0.3472211  0.7144630
```
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Prediction and estimation in linear regression
]]
.row[.split-two[
.column[.content[
### Motivation

* Simple linear regression models can be used to predict the response at any given value of the predictor.
* .brand-red[Beware predicting far beyond the range of the data.]
* Point predictions should be accompanied by corresponding prediction intervals, providing a range of plausible values.
* Suppose that we want to predict `\(Y_0=Y|x_0\)` when the predictor `\(x\)` takes the value `\(x_0\)`.

.content-box-blue[
Note that predicting a response is about estimating the value of a random variable, say `\(Y_0\)`, and not the value of a parameter, say `\(\mu_0\)`.
]
]]
.column[.content[
### Prediction of `\(Y|x_0\)`

* Denote the prediction by `\(\tilde Y_0\)`.
* Our model for the corresponding response, `\(Y_0\)`, is `\(Y_0 = E(Y_0) + \epsilon_0.\)`
* It is natural to estimate the parameter `\(E(Y_0)=\mu_0\)` by `\(\hat \beta_0 + \hat \beta_1 x_0\)`.
* We have no information about the location of `\(\epsilon_0\)`, so we predict it by its expected value, `\(\tilde \epsilon_0 = E(\epsilon_0) = 0\)`.
* But we know that the error variance is `\(\sigma^2\)`, so to avoid understating the variability of the prediction we retain `\(Var(\tilde{\epsilon}_0) = \sigma^2\)`.
* Then for simple linear regression `$$\tilde Y_0 = \hat \beta_0 + \hat \beta_1 x_0 + \tilde \epsilon_0.$$`
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Sampling distribution for the mean response
]]
.row[.content[

For a simple linear regression:

* The standard deviation of the sampling distribution of `\(\hat{\mu}_0 = \hat\beta_0 + \hat \beta_1 x_0\)` is
`$$\sigma_{\hat\mu_0} = \sigma \sqrt{ \frac1n + \frac{ (x_0 - \bar x)^2} {S_{xx}} }$$`
* The standard error estimate for `\(\hat{\mu}_0\)` is
`$$\hat{SE}(\hat \mu_0) = s \sqrt{ \frac1n + \frac{(x_0 - \bar x)^2}{S_{xx}} }$$`
* It can be shown that the sampling distribution of `\(\hat{\mu}_0\)` satisfies
`$$\frac{ \hat \mu_0 - \mu_0 }{\hat{SE}(\hat \mu_0)} \sim t_{n-2}$$`
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Prediction versus estimation of mean response
]]
.row[.content[

* The prediction was `$$\tilde Y_0 = \color{blue}{\hat \beta_0 + \hat \beta_1 x_0} + \color{red}{\tilde \epsilon_0}.$$`
* But `\(\hat \beta_0 + \hat \beta_1 x_0\)` is also the natural estimate of `\(E(Y_0)=\mu_0\)`, and `\(E(\tilde \epsilon_0)=0\)`.
* Hence `\(\tilde{Y}_0\)`, the predicted value of `\(Y_0\)`, is numerically the same as `\(\widehat{E(Y_0)}=\hat\mu_0\)`, the estimated mean response.
* However, differences arise if we want to construct corresponding confidence intervals!

### Confidence interval for the mean response

* A `\((1-\alpha)100\%\)` confidence interval for `\(\mu_0\)` is given by `$$\hat\mu_0 \pm t_{n-p}(1-\alpha/2) \hat{SE}(\hat\mu_0).$$`
* This confidence interval provides a range of plausible values for the mean response `\(\mu_0\)`, i.e. plausible values for the location of the regression line; the coverage sketch below illustrates this.
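A quick simulation can verify the coverage of this interval (a minimal sketch; the "true" parameter values and `x0` below are arbitrary choices for illustration, not taken from the sleep data):

```r
set.seed(1)
b0 <- 2; b1 <- 0.75; sigma <- 0.7     # assumed "true" values
x <- seq(0, 5, length.out = 40); x0 <- 2.5
mu0 <- b0 + b1 * x0                   # true mean response at x0
covered <- replicate(2000, {
  y <- b0 + b1 * x + rnorm(length(x), 0, sigma)
  ci <- predict(lm(y ~ x), newdata = data.frame(x = x0),
                interval = "confidence", level = 0.95)
  ci[, "lwr"] <= mu0 && mu0 <= ci[, "upr"]
})
mean(covered)                         # should be close to 0.95
```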
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Prediction intervals for simple linear regression
]]
.row[.content[

* When it comes to finding the .brand-blue[prediction error] for `\(Y_0\)`, we must take into account the `\(\epsilon_0\)` term:
`$$Var(\tilde Y_0) = \color{blue}{Var(\hat \mu_0)} + \color{red}{Var(\tilde\epsilon_0)} = [SE(\hat \mu_0)]^2 + \sigma^2.$$`
* Replacing `\(\sigma^2\)` by its estimate `\(s^2\)`, we get the prediction error estimate of `\(\tilde Y_0\)` as
`$$\hat{PE}(\tilde Y_0) = s \sqrt{ \frac1n + \frac{(x_0 - \bar x)^2}{S_{xx} }+1}.$$`
* A `\(100(1-\alpha)\%\)` prediction interval for `\(Y_0\)` is then
`$$\tilde{Y}_0 \pm t_{n-p}(1-\alpha/2) \times \hat{PE}(\tilde Y_0),$$`
or equivalently
`$$\tilde{Y}_0 \pm t_{n-p}(1-\alpha/2) \sqrt{\hat{SE}^2(\hat\mu_0) + \hat\sigma^2}.$$`
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Confidence and Prediction Interval in R
]]
.row[.split-60[
.column[.content[
The long way in R

```r
uxnew <- c(27, 32); xnew <- log(uxnew)
xbar <- mean(log(dat$BodyWt))
mu_hat <- coef(fit)[1] + coef(fit)[2] * xnew
s <- sigma(fit); n <- nrow(dat); p <- length(coef(fit))
Sxx <- (n - 1) * var(log(dat$BodyWt))
SE_mu <- s * sqrt(1 / n + (xnew - xbar) ^ 2 / Sxx)
PE_mu <- s * sqrt(1 + 1 / n + (xnew - xbar) ^ 2 / Sxx)
cbind(fit=mu_hat,
      lwr=mu_hat - qt(0.975, n - p) * SE_mu,
      upr=mu_hat + qt(0.975, n - p) * SE_mu)
```

```
          fit      lwr      upr
[1,] 4.612223 4.403559 4.820887
[2,] 4.739934 4.525945 4.953922
```

```r
cbind(fit=mu_hat,
      lwr=mu_hat - qt(0.975, n - p) * PE_mu,
      upr=mu_hat + qt(0.975, n - p) * PE_mu)
```

```
          fit      lwr      upr
[1,] 4.612223 3.207839 6.016607
[2,] 4.739934 3.334748 6.145119
```
]]
.column.bg-brand-gray[.content[
The short way in R

```r
predict.at <- data.frame(BodyWt = uxnew)
predict(fit, newdata = predict.at,
        interval = "confidence", level = 0.95)
```

```
       fit      lwr      upr
1 4.612223 4.403559 4.820887
2 4.739934 4.525945 4.953922
```

```r
predict(fit, newdata = predict.at,
        interval = "prediction", level = 0.95)
```

```
       fit      lwr      upr
1 4.612223 3.207839 6.016607
2 4.739934 3.334748 6.145119
```
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Conf intervals for estimation `\(\mu_0\)` and prediction `\(Y_0\)`
]]
.row[.split-50[
.column[.content[
<img src="images/3DRegressionLine.PNG" width="80%" height="80%">

* `\(\color{blue}{\mu_0}=E(Y_0)=\beta_0+\beta_1 x_0\)` and `\(Y_0=\color{blue}{\beta_0+\beta_1 x_0} +\color{red}{\epsilon_0}\)`.
* `\(\color{blue}{\mu_0}\)` on the regression line is the fixed mean parameter of the normal curve when `\(x=x_0\)`.
* `\(Y_0\)` is a random variable distributed according to the normal curve with mean `\(\mu_0\)`, i.e. `\(Y_0 \sim N(\mu_0, \color{red}{\sigma^2})\)`, with variability due to `\(\sigma^2\)`.
]]
.column.bg-brand-gray[.content[
* `\(\mu_0\)` and `\(Y_0\)` are estimated from a sample.
* Different samples give different regression lines, and hence slightly different values of `\(\hat \mu_0\)`.
* That is, `\(\color{blue}{\hat \mu_0}\)` has variability due to its sampling distribution.
* Then `\(\tilde Y_0\)` has two sources of variability: the model distribution `\(N(\mu_0, \color{red}{\sigma^2})\)` and the sampling distribution of `\(\color{blue}{\hat \mu_0}\)`.
* This explains why the prediction interval for `\(Y_0\)` is wider than the confidence interval for the mean `\(\mu_0\)`, as shown in the next plot; a code sketch for producing such a plot follows.
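One way to draw a plot like the one on the next slide (a minimal sketch, not necessarily how the original figure was produced; `geom_smooth(method = "lm")` shades the pointwise CI for the mean response, and the red lines are the prediction limits):

```r
library(ggplot2)
# predict() on the original data returns fit/lwr/upr prediction limits
# (R warns these are in-sample predictions; that is fine for plotting)
pred <- cbind(dat, predict(fit, interval = "prediction"))
ggplot(pred, aes(x = log(BodyWt), y = log(BrainWt))) +
  geom_point() +
  geom_smooth(method = "lm") +               # CI band for the mean response
  geom_line(aes(y = lwr), colour = "red") +  # lower prediction limit
  geom_line(aes(y = upr), colour = "red")    # upper prediction limit
```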
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Confidence interval plot for simple linear regression
]]
.row[.content[

Confidence interval for the <span class="brand-blue">mean response</span> and <span class="brand-red">prediction</span>.

<img src="lecture04_2020JC_files/figure-html/unnamed-chunk-30-1.png" style="display: block; margin: auto;" />

Note that the CI for the mean response is narrower in the middle (why?) and is much narrower than the interval for prediction.
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Why are these CIs narrower in the middle?
]]
.row[.content[
<img src="images/CentreEstInterval.PNG" width="30%" height="30%">

* The error increases away from `\(\bar x\)`: if you wiggle the regression line, the effect grows the further you move from the mean `\(\bar x\)`.
]]

---
layout: true
class: split-60

.column.bg-brand-red.white[.content[
# Summary

* The sampling distribution of the least squares estimators is normal under assumptions (A1)-(A4).
* For `\(Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \epsilon_i\sim NID(0, \sigma^2)\)`
`$$\hat\beta_0 \sim N\left(\beta_0, \sigma^2 \dfrac{(\sum x_i^2)}{nS_{xx}}\right)\qquad \hat\beta_1 \sim N\left(\beta_1, \dfrac{\sigma^2}{S_{xx}}\right)$$`
* A `\((1-\alpha)100\%\)` confidence interval
  * for a regression parameter, `\(\beta\)`, is `\(\hat\beta \pm t_{n-p}(1-\alpha/2) \times \hat{SE}(\hat \beta)\)`;
  * for `\(\sigma^2\)` is `\(\left(\frac{\text{RSS}}{b}, \frac{\text{RSS}}{a}\right)\)` where `\(a\)` and `\(b\)` satisfy `\(1-\alpha = P(a\leq \chi_{n-p}^2 \leq b)\)`; and
  * for `\(\mu_0\)` is `\(\hat\mu_0 \pm t_{n-p}(1-\alpha/2) \hat{SE}(\hat\mu_0).\)`
* A `\((1-\alpha)100\%\)` prediction interval for `\(Y_0\)` is
`$$\tilde Y_0 \pm t_{n-p}(1-\alpha/2) \sqrt{\hat{SE}^2(\hat\mu_0) + \hat\sigma^2}.$$`
]]
.column[.content[
# Next lesson

* Matrix algebra for linear regression
]]

---
class: show-10
---
count: false