class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[
<br><br><br>

# Multiple Comparisons

## .black[STAT3022 Applied Linear Models Lecture 16]

<br><br><br>

### .black[2020/02/17]
]]
.column.bg-brand-charcoal[.content.white[
## Today

1. Multiple comparisons
1. Tukey, Scheffe and Bonferroni corrections
1. Data snooping
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Fertilizer
]]
.row[.split-30[
.column[.content[
<img src="lecture16_2020JC_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

* 5 treatments
* 6 replications each
]]
.column.bg-brand-gray[.content[

```r
skimr::skim(dat)
```

```
Skim summary statistics
 n obs: 30 
 n variables: 3 

-- Variable type:factor ----------------------------------------------------
  variable missing  n n_unique ordered
 Replicate       0 30        6   FALSE
       Trt       0 30        5   FALSE

-- Variable type:numeric ---------------------------------------------------
 variable missing complete  n mean   sd p0 p25 p50 p75 p100     hist
        Y       0       30 30   10 4.81  2   7 9.5  12   22 ▃▆▆▇▃▁▁▂
```

```r
round(c(tapply(Y, Trt, mean), mean(Y)), 3)
```

```
     0      A      B      C      D        
 5.833 11.500 15.500  7.000 10.167 10.000 
```

Treatments B and 0 have the largest difference, followed by B and C.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# All pairwise differences
]]
.row[.split-two[
.column.bg-brand-gray[.content[
* When no single group is "special" or notable, we can consider each pairwise difference as a contrast of interest.
* In this case,
  * a t-statistic can be constructed for each pairwise difference;
  * a t-based confidence interval can be constructed for each pairwise population difference (contrast).
* Let us focus on confidence intervals and review the definition.

.bottom_abs.width100.gray[
These lecture slides make grateful use of Michael Stewart's notes.
]
]]
.column[.content[
### Recall confidence interval

<center>
<img src="animation.gif" width="60%"/>
</center>

* A 95% CI for a population mean of 0 is constructed from each of 50 samples.
* We expect 95% of these CIs to contain 0; in this run, 48/50 (96%) do.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Confidence interval for pairwise differences
]]
.row[.split-two[
.column[.content[

```r
library(emmeans)
M1 <- lm(Y ~ Trt, data = dat)
plot(pairs(emmeans(M1, "Trt"), adjust = "none")) +
  theme_bw(base_size = 18) +
  geom_vline(xintercept = 0, size = 2, color = "red")
```

![](lecture16_2020JC_files/figure-html/unnamed-chunk-5-1.png)<!-- -->
]]
.column.bg-brand-gray[.content[
### Construct a 95% CI for the contrast C - D

```r
c(mean(dat$Y[dat$Trt == "C"]), mean(dat$Y[dat$Trt == "D"]))
```

```
[1]  7.00000 10.16667
```

```r
c(sigma(M1), qt(0.975, M1$df.residual), M1$df.residual)
```

```
[1]  3.570247  2.059539 25.000000
```

* So for the 95% CI we have
`$$\hat{\alpha}_C - \hat{\alpha}_D \pm t_{n - t, 0.975}\times SE(\hat{\alpha}_C - \hat{\alpha}_D)$$`
`$$(7 - 10.1667) \pm 2.0595 \times 3.5702 \times \sqrt{\frac{1}{6} + \frac{(-1)^2}{6}}$$`
$$(-7.41, 1.08) $$
* This interval includes 0, so the difference is not significant.
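As a sanity check, this interval can be reproduced by hand from the summaries already printed above (group means 7 and 10.1667, `sigma(M1)` of 3.5702, 25 residual degrees of freedom). The sketch below hard-codes those values rather than recomputing them from `dat`:

```r
# Reproduce the unadjusted 95% CI for C - D from the printed summaries
# (values hard-coded from the output above, not recomputed from dat)
diff_cd <- 7 - 10.16667                       # difference of group means
se_cd   <- 3.570247 * sqrt(1/6 + 1/6)         # SE of the contrast
tcrit   <- qt(0.975, df = 25)                 # unadjusted t critical value
round(diff_cd + c(-1, 1) * tcrit * se_cd, 2)  # matches (-7.41, 1.08)
```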
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# 95% CIs for pairwise differences (unadjusted)
]]
.row[.split-two[
.column[.content[

```r
confint(pairs(emmeans(M1, "Trt"), adjust = "none"))
```

```
 contrast estimate   SE df lower.CL upper.CL
 0 - A       -5.67 2.06 25   -9.912   -1.421
 0 - B       -9.67 2.06 25  -13.912   -5.421
 0 - C       -1.17 2.06 25   -5.412    3.079
 0 - D       -4.33 2.06 25   -8.579   -0.088
 A - B       -4.00 2.06 25   -8.245    0.245
 A - C        4.50 2.06 25    0.255    8.745
 A - D        1.33 2.06 25   -2.912    5.579
 B - C        8.50 2.06 25    4.255   12.745
 B - D        5.33 2.06 25    1.088    9.579
 C - D       -3.17 2.06 25   -7.412    1.079

Confidence level used: 0.95 
```

So individually the significantly different pairs are

.pull-left[
* B & D,
* B & C,
* A & C,
]
.pull-right[
* 0 & D,
* 0 & B,
* 0 & A.
]
]]
.column.bg-brand-gray[.content[
<img src="lecture16_2020JC_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

* However, each of these `\({5 \choose 2}=10\)` CIs is constructed .brand-blue[without any regard for the others].
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Controlling the family-wise error rate
]]
.row[.split-40[
.column[.content[
* Each interval is constructed using a procedure so that, when the model is correct, the "probability" that the true population contrast is covered is 0.95 ... individually.
* But what is the "probability" that all intervals cover their corresponding true values simultaneously?

### Notation

* Let `\(A_1, A_2, ..., A_{10}\)` denote the events that each of the 10 intervals above covers its corresponding "true" value.
* Then, under our model, we have
`$$P(A_1) = P(A_2) = ... = P(A_{10}) = 0.95.$$`
]]
.column.bg-brand-gray[.content[
* What is `\(P(A_1 \cap A_2 \cap ...\cap A_{10})\)`?
* This is hard to compute exactly, but we can derive a lower bound more easily using the relation
`$$(A_1 \cap A_2 \cap ...\cap A_{10})^c = A_1^c \cup A_2^c \cup ... \cup A_{10}^c$$`
* Recall that `\(P(A \cup B) \leq P(A) + P(B)\)`.
`\begin{eqnarray*}
1 - P(A_1 \cap A_2 \cap ... \cap A_{10}) &=& P\left((A_1 \cap A_2 \cap ... \cap A_{10})^c\right)\\
&=& P\left(A_1^c \cup A_2^c \cup ... \cup A_{10}^c\right)\\
&\leq & P(A_1^c) + P(A_2^c) + ... + P(A_{10}^c)\\
&=& 10\times 0.05 = 0.5 \\
\Rightarrow P(A_1 \cap A_2 \cap ... \cap A_{10}) & \geq & 1-10 \times 0.05 = 0.5
\end{eqnarray*}`

* Therefore `\(P(A_1 \cap A_2 \cap ...\cap A_{10}) \geq 0.5\)`: the simultaneous coverage probability of all 10 intervals is at least 50%.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Bonferroni Procedure
]]
.row[.split-50[
.column[.content[
* To ensure a higher simultaneous coverage, the individual coverage should also be higher, e.g. `\(1 - 0.05 / 10 = 0.995\)`. Then
`$$P(A_1 \cap A_2 \cap ...\cap A_{10}) \geq 1-10(1-0.995)=0.95!$$`
* For each individual `\(100(1-\alpha)\)`% CI, the critical value is `\(t_{n-t, 1 - \alpha/(2m)}\)` where `\(m\)` is the number of pairwise comparisons.
* So for the Bonferroni adjusted 95% CI we have
`$$\hat{\alpha}_C - \hat{\alpha}_D \pm \color{red}{t_{n - t, 0.9975}} \times \text{SE}(\hat{\alpha}_C - \hat{\alpha}_D)$$`
`$$(7 - 10.1667) \pm \color{red}{3.0782} \times 3.5702 \times \sqrt{\frac{1}{6} + \frac{(-1)^2}{6}}$$`
$$(-9.51, 3.18) $$
which is wider, and now only the B & C and 0 & B differences are significant.
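The Bonferroni interval can be sketched in R with the same hard-coded summaries used earlier (means 7 and 10.1667, residual SD 3.5702, 25 df); only the critical value changes:

```r
# Bonferroni-adjusted 95% CI for C - D: replace qt(0.975, 25) with
# qt(1 - 0.05/(2*m), 25), where m = choose(5, 2) = 10 pairwise comparisons
m     <- choose(5, 2)
tbonf <- qt(1 - 0.05 / (2 * m), df = 25)      # approx 3.0782
round((7 - 10.16667) + c(-1, 1) * tbonf * 3.570247 * sqrt(1/6 + 1/6), 2)
# matches (-9.51, 3.18)
```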
]]
.column.bg-brand-gray[.content[
.scroll-box-10[

```r
confint(pairs(emmeans(M1, "Trt"), adjust = "bonferroni"))
```

```
 contrast estimate   SE df lower.CL upper.CL
 0 - A       -5.67 2.06 25   -12.01    0.678
 0 - B       -9.67 2.06 25   -16.01   -3.322
 0 - C       -1.17 2.06 25    -7.51    5.178
 0 - D       -4.33 2.06 25   -10.68    2.012
 A - B       -4.00 2.06 25   -10.35    2.345
 A - C        4.50 2.06 25    -1.85   10.845
 A - D        1.33 2.06 25    -5.01    7.678
 B - C        8.50 2.06 25     2.15   14.845
 B - D        5.33 2.06 25    -1.01   11.678
 C - D       -3.17 2.06 25    -9.51    3.178

Confidence level used: 0.95 
Conf-level adjustment: bonferroni method for 10 estimates 
```
]

<img src="lecture16_2020JC_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Tukey's Procedure
]]
.row[.split-two[
.column[.content[
* The Bonferroni adjustment only guarantees a lower bound on the simultaneous coverage, so it is conservative rather than exact.
* John Tukey derived the **exact multiplier** needed for simultaneous confidence intervals for all pairwise comparisons when the .brand-blue[sample sizes] for each group are equal.
* It was later shown that when the sample sizes are unequal, Tukey's procedure is still .brand-blue[conservative], thus yielding valid simultaneous intervals that may be .brand-blue[narrower] than those from the Bonferroni method.
* Tukey named his method .brand-blue["Honest Significant Differences (HSD)"] and used this distribution to replace the t distribution in the CI.
]]
.column.bg-brand-gray[.content[
* Suppose `\(\bar Y_{1\bullet},\ldots,\bar Y_{t\bullet}\)` are distributed as `\(NID(\mu,\sigma^2/n)\)`, with `\(n\)` observations per group.
* `\(\hat{\sigma}^2\)` is estimated with `\(\nu\)` degrees of freedom.
* Then the ratio
`$$\left(\max_{1\leq i\leq t}\bar{Y}_{i\bullet} - \min_{1\leq i\leq t}\bar{Y}_{i\bullet}\right)\Big/\left(\hat{\sigma}/\sqrt{n}\right)$$`
is called the .brand-blue[studentized range].
* The `\((1-\alpha)\)` quantile of the distribution of the studentized range is denoted by `\(\text{HSD}_{\nu}(1-\alpha; t)\)`.
* This multiplier `\(\text{HSD}_{\nu}(1-\alpha; t)\)` is calculated in R as `qtukey(1 - alpha, t, df)`.
* A `\((1-\alpha)100\)`% Tukey confidence interval is given by
`$$(\hat{\alpha}_{i} - \hat{\alpha}_{i'}) \pm \color{red}{\frac{\text{HSD}_{n-t}(1-\alpha; t)}{\sqrt{2}}} \times \text{SE}(\hat{\alpha}_{i} - \hat{\alpha}_{i'})$$`
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Tukey's Honest Significant Differences
]]
.row[.split-two[
.column[.content[

```r
nlevels(dat$Trt)
```

```
[1] 5
```

```r
nrow(dat)
```

```
[1] 30
```

```r
qtukey(1 - 0.05, 5, 25)
```

```
[1] 4.153363
```

`$$(7 - 10.1667) \pm \color{red}{\frac{4.1534}{\sqrt{2}}} \times 3.5702 \times \sqrt{\frac{1}{6} + \frac{(-1)^2}{6}}$$`
`$$(-9.22, 2.89)$$`
which is narrower than the Bonferroni adjusted CI `\((-9.51, 3.18)\)`; again, only the B & C and 0 & B differences are significant.
]]
.column.bg-brand-gray[.content[
.scroll-box-10[

```r
confint(pairs(emmeans(M1, "Trt"), adjust = "tukey"))
```

```
 contrast estimate   SE df lower.CL upper.CL
 0 - A       -5.67 2.06 25   -11.72    0.387
 0 - B       -9.67 2.06 25   -15.72   -3.613
 0 - C       -1.17 2.06 25    -7.22    4.887
 0 - D       -4.33 2.06 25   -10.39    1.720
 A - B       -4.00 2.06 25   -10.05    2.054
 A - C        4.50 2.06 25    -1.55   10.554
 A - D        1.33 2.06 25    -4.72    7.387
 B - C        8.50 2.06 25     2.45   14.554
 B - D        5.33 2.06 25    -0.72   11.387
 C - D       -3.17 2.06 25    -9.22    2.887

Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 5 estimates 
```
]

<img src="lecture16_2020JC_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# The "most significant sample contrast"
]]
.row[.split-70[
.column[.content[
* It can be shown (using the Cauchy-Schwarz inequality) that the "most significant sample contrast" (which is data dependent) is given by choosing the coefficients according to
`$$c_i = n_i(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet}).$$`
* The corresponding contrast is given by (since
`\(\sum_{i=1}^t n_i(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})\bar{Y}_{\bullet\bullet}=0\)`)
`$$\hat{c} = \sum_{i=1}^t c_i \bar{Y}_{i\bullet} = \sum_{i=1}^t n_i(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})\bar{Y}_{i\bullet}=\sum_{i=1}^t n_i(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2.$$`
* An underestimate of `\(\text{SE}(\hat c)\)`, obtained by treating the `\(c_i\)` as .brand-blue[known (not random)], is
`$$\widehat{\text{S}}\text{E}(\hat{c})=\sqrt{\widehat{\text{V}}\text{ar}\left(\sum_{i=1}^t n_i(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})\bar{Y}_{i\bullet}\right)}=\sqrt{\sum_{i=1}^t n_i^2(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2\widehat{\text{V}}\text{ar}(\bar{Y}_{i\bullet})}$$`
`$$=\sqrt{\sum_{i=1}^t n_i^2(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2 \frac{\hat \sigma^2}{n_i}}=\hat{\sigma}\sqrt{\sum_{i=1}^t n_i(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2}\hspace{34mm}$$`
]]
.column.bg-brand-gray[.content[
So the test statistic for testing `\(H_0: c=0\)` is
`\begin{eqnarray*}
t &=& (\hat c-0)/\widehat{\text{S}}\text{E}(\hat{c}) \\
&=&\sqrt{\frac{\color{blue}{\sum_{i=1}^t n_i(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2}}{\color{red}{\hat{\sigma}^2}}}\\
&=&\sqrt{(t-1)\frac{\color{blue}{\text{TSS}}/(t-1)}{\color{red}{\text{RSS}/(n-t)}}}\\
&=&\sqrt{(t-1)F_{t-1, n-t}},
\end{eqnarray*}`
where TSS here denotes the .brand-blue[treatment] sum of squares and RSS the residual sum of squares. Hence the multiplier
`$$\sqrt{(t-1)F_{t-1, n-t}(1-\alpha)}$$`
is chosen to construct the Scheffe simultaneous CI (again replacing the t quantile).
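The identity `\(t = \sqrt{(t-1)F_{t-1, n-t}}\)` can be checked numerically. Since `dat` is not reproduced here, this sketch simulates a comparable design (5 groups, 6 replicates) and compares the two sides:

```r
# Numerical check of t = sqrt((t-1) * F): the "most significant contrast"
# statistic squared equals (t-1) times the one-way ANOVA F statistic
set.seed(1)
sim  <- data.frame(Trt = gl(5, 6), Y = rnorm(30, mean = rep(1:5, each = 6)))
fit  <- lm(Y ~ Trt, data = sim)
a    <- anova(fit)
TrSS <- a["Trt", "Sum Sq"]                    # treatment sum of squares
Fval <- a["Trt", "F value"]
c(sqrt(TrSS / sigma(fit)^2), sqrt(4 * Fval))  # the two sides agree
```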
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Scheffe adjustment
]]
.row[.split-two[
.column[.content[

```r
sqrt(4 * qf(0.95, 4, 25))
```

```
[1] 3.321873
```

* So for the Scheffe adjusted 95% CI we have
`$$\hat{\alpha}_C - \hat{\alpha}_D \pm {\color{red}{\sqrt{(t-1)F_{t- 1, n - t}(0.95)}}}\times \widehat{\text{S}}\text{E}(\hat{\alpha}_C - \hat{\alpha}_D)$$`
`$$(7 - 10.1667) \pm \color{red}{3.3219} \times 3.5702 \times \sqrt{\frac{1}{6} + \frac{(-1)^2}{6}}$$`
$$(-10.01, 3.68) $$
which is wider than both the Bonferroni adjusted CI `\((-9.51, 3.18)\)` and the Tukey adjusted CI `\((-9.22, 2.89)\)` (possibly due to the underestimation of `\(\text{SE}(\hat c)\)`).
]]
.column.bg-brand-gray[.content[
.scroll-box-10[

```r
confint(pairs(emmeans(M1, "Trt"), adjust = "scheffe"))
```

```
 contrast estimate   SE df lower.CL upper.CL
 0 - A       -5.67 2.06 25   -12.51     1.18
 0 - B       -9.67 2.06 25   -16.51    -2.82
 0 - C       -1.17 2.06 25    -8.01     5.68
 0 - D       -4.33 2.06 25   -11.18     2.51
 A - B       -4.00 2.06 25   -10.85     2.85
 A - C        4.50 2.06 25    -2.35    11.35
 A - D        1.33 2.06 25    -5.51     8.18
 B - C        8.50 2.06 25     1.65    15.35
 B - D        5.33 2.06 25    -1.51    12.18
 C - D       -3.17 2.06 25   -10.01     3.68

Confidence level used: 0.95 
Conf-level adjustment: scheffe method with dimensionality 4 
```
]

<img src="lecture16_2020JC_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" />
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Conclusion - Multiple Testing
]]
.row[.split-two[
.column[.content[
<img src="lecture16_2020JC_files/figure-html/unnamed-chunk-17-1.svg" style="display: block; margin: auto;" />
]]
.column.bg-brand-gray[.content[
* If all pairwise comparisons are of interest: Tukey is superior to Bonferroni.
* If not all pairwise comparisons are of interest: Bonferroni can be better than Tukey.
* Bonferroni is better than Scheffe when the number of contrasts of interest is about the same as the number of factor levels.
* In practice: choose the smallest of the three!
]]
]]
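---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Comparing the three multipliers
]]
.row[.content[
As a sketch of the "choose the smallest" advice, the three critical values for this design (5 groups, 6 replicates, 25 residual df, 10 pairwise contrasts) can be computed side by side using only base R quantile functions:

```r
# Multipliers that scale SE(contrast) in each 95% simultaneous CI
df_resid <- 25
c(unadjusted = qt(0.975, df_resid),
  bonferroni = qt(1 - 0.05 / (2 * 10), df_resid),
  tukey      = qtukey(0.95, 5, df_resid) / sqrt(2),
  scheffe    = sqrt(4 * qf(0.95, 4, df_resid)))
# Tukey gives the smallest adjusted multiplier here, hence the narrowest
# valid simultaneous intervals for all pairwise comparisons
```
]]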