class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[
<br><br><br>
# One-way ANOVA Part I

## .black[STAT3022 Applied Linear Models Lecture 14]

<br><br><br>

### .black[2020/02/17]
]]
.column.bg-brand-charcoal[.content.white[
## Today

1. One-way ANOVA
1. Within and between sum of squares
1. Sum constraint
1. Treatment constraint
1. Contrasts
1. Confidence intervals for contrasts
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# One-way ANOVA
]]
.row[.split-two[
.column[.content[
* One-way ANOVA was already covered in DATA2002/STAT2012/STAT2912 and is therefore assumed knowledge.
* However, in this course we derive all underlying quantities more carefully, and you are expected to understand the various formulae.

### Scope

A one-way ANOVA can be used to compare `\(t\)` different treatments when it is reasonable to assume that the treatment affects only the mean response (not the entire distribution!).
]]
.column.bg-brand-gray[.content[
### Example - Stimulating effects of caffeine

Does caffeine stimulation affect the rate at which individuals can tap their fingers?

* `\(n=30\)` male students were randomly allocated to `\(t=3\)` groups of `\(10\)` students each:
  * Group 1: zero caffeine dose
  * Group 2: low caffeine dose
  * Group 3: high caffeine dose
* Allocation was blind (subjects did not know their caffeine dosage).
* Two hours after treatment, each subject tapped their fingers as quickly as possible for a minute. The number of taps was recorded.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Caffeine: Variable types and summaries
]]
.row[.split-two[
.column.bg-brand-gray[.content[

```r
caffeine <- read.table(file = "data/caffeine.txt", header = TRUE)
skimr::skim(caffeine)
```

```
Skim summary statistics
 n obs: 30 
 n variables: 2 

-- Variable type:factor ---------------------------------------
 variable missing  n n_unique ordered
     Dose       0 30        3   FALSE

-- Variable type:integer --------------------------------------
 variable missing  n  mean  sd  p0 p25   p50 p75 p100     hist
     Taps       0 30 246.5 2.6 242 245 246.5 248  252 ▃▃▃▇▇▁▃▁
```

```r
m <- tapply(caffeine$Taps, caffeine$Dose, mean); m
```

```
 High   Low  Zero 
248.3 246.4 244.8 
```
]]
.column.bg-white[.content[

```r
ggplot(caffeine, aes(Dose, Taps)) +
  geom_boxplot() + geom_jitter()
```

<img src="lecture14_2020JC_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# One-way ANOVA
]]
.row[.split-two[
.column[.content[
### Getting the ANOVA table in R

```r
anova(lm(Taps ~ Dose, data = caffeine))
```

```
Analysis of Variance Table

Response: Taps
          Df Sum Sq Mean Sq F value   Pr(>F)   
Dose       2   61.4 30.7000  6.1812 0.006163 **
Residuals 27  134.1  4.9667                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
]]
.column.bg-brand-gray[.content[

```r
summary(aov(Taps ~ Dose, data = caffeine))
```

```
            Df Sum Sq Mean Sq F value  Pr(>F)   
Dose         2   61.4  30.700   6.181 0.00616 **
Residuals   27  134.1   4.967                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
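
A quick sanity check (a sketch, assuming the objects above are loaded): the tabulated p-value is just the upper-tail probability of the `\(F_{2,27}\)` distribution at the observed statistic.

```r
# P(F >= 6.1812) with 2 and 27 df; matches Pr(>F) = 0.006163 above
pf(6.1812, df1 = 2, df2 = 27, lower.tail = FALSE)
```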
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Model equation
]]
.row[.split-two[
.column[.content[
* The **null model** (under `\(H_0\)`) is
`$$Y_{ij} = \mu + \epsilon_{ij}, \quad j=1,\ldots,n_{i},\;\; i=1,\ldots,t$$`
where `\(\epsilon_{ij}\sim NID(0,\sigma^2)\)`.
* The **full model** (under `\(H_1\)`) is
`$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad j=1,\ldots,n_{i},\;\; i=1,\ldots,t$$`
where `\(\epsilon_{ij}\sim NID(0,\sigma^2)\)`.
* There are `\(n_i\)` observations receiving treatment `\(i\)`, over `\(t\)` treatments in total, so the total number of observations is
`$$n = \sum_{i=1}^t n_i.$$`
* Here the covariate is the .brand-blue[categorical treatment type], with one regression parameter `\(\alpha_i\)` per level `\(i\)`.
]]
.column.bg-brand-gray[.content[
* `\(\boldsymbol{\beta} = (\mu, \alpha_1,\alpha_2,\ldots,\alpha_t)^\top\)`,
* `\(\boldsymbol{Y} = (Y_{11},\ldots,Y_{1n_1},Y_{21},\ldots,Y_{2n_2},\ldots,Y_{tn_t})^\top\)`

then,
`$$\mathbf{X}=\begin{pmatrix}
1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & 1 \\
\end{pmatrix}
\begin{matrix} \\ \Rightarrow n_1 \text{ rows} \\ \\ \Rightarrow n_2 \text{ rows} \\ \vdots \\ \\ \end{matrix}$$`

* `\(\mathbf{X}\)` is an `\(n\times(t+1)\)` matrix of rank `\(t\)` because there are only `\(\color{blue}{t}\)` .brand-blue[linearly independent columns] in `\(\mathbf{X}\)`: the intercept column is the sum of the `\(t\)` dummy columns.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# LS for one-way ANOVA
]]
.row[.split-two[
.column[.content[
* The least squares estimate `\(\hat{\boldsymbol{\beta}}\)` of `\(\boldsymbol{\beta}\)` satisfies the normal equations `\(\mathbf{X}^\top\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^\top \boldsymbol{Y}\)`. (The familiar solution `\((\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \boldsymbol{Y}\)` requires `\(\mathbf{X}^\top \mathbf{X}\)` to be invertible, which is not the case here.)
* Summing the estimated full model over `\(i\)` and `\(j\)` gives

  (1): `\(\hspace{5mm} n\hat{\mu} + n_1 \hat{\alpha}_1 + \ldots + n_t \hat{\alpha}_t = \sum_{i=1}^t \sum_{j=1}^{n_i} Y_{ij} = Y_{\bullet\bullet}\)`

  and summing over `\(j\)` for each `\(i=1,\ldots,t\)` gives

  `\((i+1)\)`: `\(n_i\hat{\mu} + n_i \hat{\alpha}_i = \sum_{j=1}^{n_i} Y_{ij} = Y_{i\bullet} = n_i \bar{Y}_{i\bullet}\)`
* These equations are
`$$\begin{pmatrix}
n & n_1 & \cdots & n_t \\
n_1 & n_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
n_t & 0 & \cdots & n_t \\
\end{pmatrix}
\begin{pmatrix} \hat \mu \\ \hat \alpha_1 \\ \vdots \\ \hat \alpha_t \end{pmatrix}
= \begin{pmatrix} Y_{\bullet\bullet} \\ Y_{1\bullet} \\ \vdots \\ Y_{t\bullet} \end{pmatrix}$$`
where the first matrix is `\(\mathbf{X}^\top\mathbf{X}\)`.
]]
.column.bg-brand-gray[.content[

```r
X <- model.matrix(~ Dose, data = caffeine,
  contrasts.arg = list(Dose = contrasts(caffeine$Dose, contrasts = FALSE)))
t(X) %*% X
```

```
            (Intercept) DoseHigh DoseLow DoseZero
(Intercept)          30       10      10       10
DoseHigh             10       10       0        0
DoseLow              10        0      10        0
DoseZero             10        0       0       10
```

```r
all.equal(det(t(X) %*% X), 0)
```

```
[1] TRUE
```

* The first column of `\(\mathbf{X}^\top \mathbf{X}\)` is the sum of the remaining columns. Hence `\(\mathbf{X}^\top \mathbf{X}\)` is .brand-blue[singular] and its .brand-blue[inverse does not exist].
* Hence equations (1)-(t+1), and therefore `\(\hat{\boldsymbol{\beta}}\)`, do **not** have a unique solution.
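
To see the non-uniqueness concretely (a sketch; `MASS::ginv` and the name `beta_g` are not part of the slides): any generalised inverse of `\(\mathbf{X}^\top\mathbf{X}\)` yields a valid least squares solution, and although `\(\hat{\boldsymbol{\beta}}\)` differs between choices, the fitted values do not.

```r
# One of infinitely many LS solutions, via the Moore-Penrose
# pseudoinverse of the singular matrix X'X
beta_g <- MASS::ginv(t(X) %*% X) %*% t(X) %*% caffeine$Taps
# Fitted values agree with lm()'s, whichever solution is used
all.equal(drop(X %*% beta_g),
          unname(fitted(lm(Taps ~ Dose, data = caffeine))))
```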
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Constraints
]]
.row[.split-40[
.column[.content[
To solve the least squares equations we need to impose a further condition.

* .brand-blue[Sum constraint]: (i) `\(\sum_{i=1}^t n_i \hat\alpha_i = 0\)`, or
* .brand-blue[Treatment constraint]: (ii) `\(\hat\alpha_1 = 0\)`

Parameter estimates are different under different constraints!

* From (i), Eqn (1) and (i+1): <br>
`\(n \hat \mu + 0 = Y_{\bullet\bullet} \Rightarrow \color{blue}{\hat{\mu} = \bar{Y}_{\bullet\bullet}}\)` and <br>
`\(n_i\bar{Y}_{\bullet\bullet} + n_i \hat \alpha_i = n_i \bar Y_{i \bullet} \Rightarrow \color{blue}{\hat{\alpha}_i = \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet}}\)`
* From (ii), Eqn (2) and (i+1): <br>
`\(n_1 \hat{\mu} + n_1 \cdot 0 = n_1 \bar{Y}_{1\bullet} \Rightarrow \color{blue}{\hat{\mu}=\bar{Y}_{1\bullet}}\)` and <br>
`\(n_i \bar{Y}_{1\bullet} + n_i\hat\alpha_i = n_i \bar{Y}_{i\bullet} \Rightarrow \color{blue}{\hat\alpha_i = \bar{Y}_{i\bullet} - \bar{Y}_{1\bullet}},\)` <br>
`\(i=2,\ldots,t.\)`

For both constraints, `\(\hat{\mu}+\hat\alpha_i = \bar{Y}_{i\bullet}\)`.
]]
.column.bg-brand-gray[.content[

```r
# treatment constraint (ii): DoseHigh is the baseline level
broom::tidy(lm(Taps ~ Dose, data = caffeine))
```

```
         term estimate std.error  statistic      p.value
1 (Intercept)    248.3 0.7047458 352.325610 5.456621e-51
2     DoseLow     -1.9 0.9966611  -1.906365 6.729867e-02
3    DoseZero     -3.5 0.9966611  -3.511725 1.584956e-03
```

```r
# cell-means parametrisation: one mean per group
broom::tidy(lm(Taps ~ Dose - 1, data = caffeine))
```

```
      term estimate std.error statistic      p.value
1 DoseHigh    248.3 0.7047458  352.3256 5.456621e-51
2  DoseLow    246.4 0.7047458  349.6296 6.713935e-51
3 DoseZero    244.8 0.7047458  347.3593 8.004821e-51
```

```r
c(m, m[2] - m[1], m[3] - m[1])  # group means; Low, Zero minus High
```

```
 High   Low  Zero   Low  Zero 
248.3 246.4 244.8  -1.9  -3.5 
```
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Contrasts
]]
.row[.split-90[
.column[.content[
* Contrasts are functions of the `\(\alpha_i\)` that have the same estimates .brand-blue[regardless of the constraint used].
* For example, to test `\(H_0: \alpha_1+\alpha_2 = 2\alpha_3\)`, i.e. `\(c = \alpha_1+\alpha_2-2\alpha_3 = 0\)` under `\(H_0\)`, we take `\((c_1,c_2,c_3)=(1,1,-2)\)`, whose coefficients sum to 0. In general, the .brand-blue[contrast function] is
`$$c = \sum_{i=1}^t c_i \alpha_i, \hspace{3mm} \mbox{where} \hspace{3mm} \color{blue}{\sum_{i=1}^t c_i = 0}.$$`
* Under the .brand-blue[sum] constraint (i), `\(\sum_{i=1}^t n_i \hat\alpha_i = 0\)`:
`$$\hat{c} = \sum_{i=1}^t c_i\hat{\alpha}_i = \sum_{i=1}^t c_i (\bar{Y}_{i\bullet}-\bar Y_{\bullet \bullet}) = \sum_{i=1}^t c_i \bar{Y}_{i\bullet}-\bar Y_{\bullet \bullet}\color{blue}{\sum_{i=1}^t c_i} = \sum_{i=1}^t c_i \bar{Y}_{i\bullet}.$$`
* Under the .brand-blue[treatment] constraint (ii), `\(\hat{\alpha}_1 = 0\)`:
`\begin{eqnarray*}
\hat{c} & = & \sum_{i=1}^t c_i\hat{\alpha}_i = c_1\cdot 0 + \sum_{i=2}^t c_i(\bar{Y}_{i\bullet} - \bar{Y}_{1\bullet}) = \sum_{i=2}^t c_i\bar{Y}_{i\bullet} - \bar{Y}_{1\bullet}\color{blue}{\underbrace{\sum_{i=2}^t c_i}_{= - c_1}} = \sum_{i=1}^t c_i \bar{Y}_{i\bullet}.
\end{eqnarray*}`
]]
.column[.content[
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# RSS for alternative model or within treatment SS
]]
.row[.split-90[
.column[.content[
* However, the .brand-blue[residual sum of squares] under `\(H_1\)` is well defined as
`\begin{eqnarray*}
\text{RSS}_{H_1} &=& \sum_{i=1}^{t}\sum_{j=1}^{n_i} \Big(Y_{ij} - \widehat Y_{ij}\Big)^2 = \sum_{i=1}^{t}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2
\end{eqnarray*}`
since `\(\widehat Y_{ij}=\hat \mu+\hat \alpha_i=\bar Y_{i \bullet}\)` for both constraints.
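
A quick check in R (a sketch; `M_trt` and `M_cell` are our names): both parametrisations from the previous slide give identical fitted values, hence the same `\(\text{RSS}_{H_1}\)`.

```r
M_trt  <- lm(Taps ~ Dose, data = caffeine)      # treatment constraint (ii)
M_cell <- lm(Taps ~ Dose - 1, data = caffeine)  # cell-means form
all.equal(fitted(M_trt), fitted(M_cell))        # identical Y-hat_ij
sum(resid(M_trt)^2)                             # RSS_H1 = 134.1, as in the ANOVA table
```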
Note that `\(\text{RSS}_{H_1}\)` is sometimes called the .brand-blue[within treatment sum of squares (WSS)] as it refers to the variability within each treatment that is not captured by the treatment effect.

* For the `\(i\)`th sample,
`$$\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2 \sim \sigma^2 \chi^2_{n_i -1},$$`
and the `\(t\)` samples are independent of one another.
* Since the sum of independent `\(\chi^2\)` random variables is `\(\chi^2\)` with df equal to the sum of the individual df's,
`$$\sum_{i=1}^t \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2 \sim \sigma^2 \chi^2_{n -t}.$$`
]]
.column[.content[
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Variance of contrasts
]]
.row[.split-two[
.column[.content[
`\begin{eqnarray*}
Var\Big(\sum_i c_i \hat\alpha_i\Big) &=& Var\Big(\sum_i c_i \bar{Y}_{i\bullet}\Big) \\
&=& \sum_i c^2_i \left(\frac{\sigma^2}{n_i}\right) \\
&=& \sigma^2 \sum_{i=1}^t (c_i^2/n_i).
\end{eqnarray*}`

This is a very important equation and you should make sure that you fully understand it!
]]
.column.bg-brand-gray[.content[
### Caffeine: Variance of pairwise contrast

Consider the contrast `\(\alpha_1-\alpha_2\)`, with `\(c_1=1\)`, `\(c_2=-1\)` and `\(c_3 = 0\)`. Since R orders the `Dose` levels alphabetically (High, Low, Zero), this contrast looks at the difference of the means of the high and low dose groups:
`$$Var(\bar{Y}_{1\bullet}- \bar{Y}_{2\bullet} ) = \frac{\sigma^2}{10} + \frac{\sigma^2}{10}.$$`
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Confidence interval for contrasts
]]
.row[.split-two[
.column[.content[
Construction of a `\(100(1-\alpha)\%\)` CI for `\(c\)`:
`$$\sum_{i=1}^t c_i \bar{Y}_{i\bullet} \pm t_{n-t}(1-\alpha/2)\hat{\sigma}\sqrt{\sum_{i=1}^t \frac{c_i^2}{n_i}},$$`
where `\(\hat{\sigma}^2 = \text{RSS}/(n-t)\)`.

The common contrasts relate to *pairwise comparisons*, e.g. Treatment 1 vs Treatment 2, or Treatment 1 vs the average of Treatments 2, 3 and 4.
]]
.column.bg-brand-gray[.content[
### Caffeine: CI for pairwise contrast

For the contrast `\(\alpha_1-\alpha_2\)` between High and Low, we get

* `\(\hat c=\sum_{i=1}^t c_i \bar{Y}_{i\bullet} = \bar{Y}_{1\bullet}- \bar{Y}_{2\bullet} = 248.3-246.4=1.9\)`,
* `\(t_{27}(0.975) = 2.05, \hspace{3mm} (n=30, p=3)\)`,
* `\(\sqrt{\sum_{i=1}^t(c_i^2/n_i)} = \sqrt{1/10+1/10}=1/\sqrt{5}\)` and
* `\(\hat\sigma = \sqrt{134.1/27} = 2.23\)`.

Thus, the 95% CI for `\(\alpha_1 - \alpha_2\)` becomes
$$ 1.9 \; \pm \; 2.05 \times 2.23/\sqrt{5} = 1.9 \; \pm \; 2.04. $$

* The difference from the two-sample CI is in the estimator `\(\hat \sigma = \sqrt{\text{RSS}/(n-p)}\)` of the residual standard error: .brand-blue[all samples], not only the samples from the pair, are used.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# RSS for null-model or between treatment SS
]]
.row[.split-two[
.column[.content[
* Typically the primary question of interest for a one-way ANOVA model is this: "Is the mean response dependent on the treatment (factor)?"
`$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_t ~~~~~\mbox{vs}~~~~~ H_1: \exists\, i_1\neq i_2 \text{ s.t. } \alpha_{i_1} \neq \alpha_{i_2}$$`
* Under `\(H_0\)` it follows that
`$$Y_{ij} \sim NID(\mu,\sigma^2)$$`
and the null-model is
`$$Y_{ij} = \mu +\epsilon_{ij} \;\Rightarrow\; \text{RSS}_{H_0} = \sum_{i=1}^t \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2$$`
since `\(\widehat Y_{ij}=\hat \mu=\bar{Y}_{\bullet\bullet}\)` for both constraints.
* `\(\text{RSS}_{H_0}\)` is called the .brand-blue[total sum of squares] (TotSS) for `\(Y\)` because the null model contains no treatment effects, only the overall mean.
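
As a numerical check (a sketch; `M0` and `M1` are our names), the two residual sums of squares reproduce the ANOVA table, and their difference is the between treatment sum of squares derived on the right:

```r
M0 <- lm(Taps ~ 1, data = caffeine)     # null model: intercept only
M1 <- lm(Taps ~ Dose, data = caffeine)  # full model
RSS0 <- sum(resid(M0)^2)                # TotSS = 195.5
RSS1 <- sum(resid(M1)^2)                # WSS   = 134.1
RSS0 - RSS1                             # BSS   =  61.4 (Dose Sum Sq)
```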
]]
.column.bg-brand-gray[.content[
### The between treatment sum of squares

`\begin{align*}
&\text{RSS}_{H_0} - \text{RSS}_{H_1} \\
&\quad= \sum_{i=1}^t \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2 - \sum_{i=1}^{t}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2 \\
&\quad= \sum_{i=1}^t n_i (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2,
\end{align*}`
where the cross term vanishes because `\(\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i\bullet}) = 0\)` for each `\(i\)`.

* The between treatment sum of squares measures how much larger the RSS is under `\(H_0\)` compared to `\(H_1\)`.
* In multi-way ANOVA the term .brand-blue[treatment sum of squares] (TSS) is more commonly used for the .brand-blue[between sum of squares] (BSS).
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# F-test for treatment effect
]]
.row[.split-two[
.column[.content[
To test `\(H_0\)` vs `\(H_1\)` use
`$$f_0 = \frac{(\text{RSS}_{H_0} - \text{RSS}_{H_1})/(t-1)}{\text{RSS}_{H_1}/(n-t)} = \frac{\text{BSS}/(t-1)}{\text{WSS}/(n-t)},$$`
`$$p = P(F_{t-1,n-t}\geq f_0).$$`

Numerator df = `\((n-1) - (n-t) = t-1\)`.

If, as is often the case, the data are not consistent with `\(H_0\)`, then we need to examine contrasts to try to understand how the treatments differ.
]]
.column.bg-brand-gray[.content[
<img src="images/Ftest2.png" width="100%" height="100%">

This tests the full model against the null model; with respect to the treatment parameters, the null model is empty.
]]
]]

---
layout: false
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Caffeine: Omnibus F-test
]]
.row[.content[
* The null hypothesis `\(H_0: \alpha_1 = \alpha_2 = \alpha_3\)` is rejected.
* Reason: `\(f_0 = 6.1812\)` exceeds `\(F_{2,27}(0.95) = 3.35\)`, so the `\(p\)`-value for this test is less than `\(0.05\)`.

```r
M1 <- lm(Taps ~ Dose, data = caffeine)
anova(M1)
```

```
Analysis of Variance Table

Response: Taps
          Df Sum Sq Mean Sq F value   Pr(>F)   
Dose       2   61.4 30.7000  6.1812 0.006163 **
Residuals 27  134.1  4.9667                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
]]
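
---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Caffeine: F-test quantities by hand
]]
.row[.content[
As a closing sketch (the names `bss`, `wss` and `f0` are ours, not from the output above), the quantities in the ANOVA table can be reproduced directly from the group means:

```r
m   <- tapply(caffeine$Taps, caffeine$Dose, mean)  # group means
n_i <- table(caffeine$Dose)                        # group sizes (10, 10, 10)
bss <- sum(n_i * (m - mean(caffeine$Taps))^2)      # BSS = 61.4
wss <- sum((caffeine$Taps - m[as.character(caffeine$Dose)])^2)  # WSS = 134.1
f0  <- (bss / 2) / (wss / 27)                      # 6.1812
f0 > qf(0.95, df1 = 2, df2 = 27)                   # TRUE: f0 exceeds the 3.35 cutoff
```
]]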