Randomized complete block designs

class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[

# Randomized complete block designs
## .black[STAT3022 Applied Linear Models Lecture 24]

### .black[2020/02/17]

]]

.column.bg-brand-charcoal[.content.white[
## Today

1. More on randomized complete block designs

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Potato Yield
]]
.row[.split-60[
.column[.content[

* An experiement was devised to investigate the effects of four different types of fungicides (labelled A,B,C,D) on the yield of potatoes in field plots.
* An untreated control treatment (labelled 0) was also included to give a baseline comparison.
* In the field designated for the trial, heterogeneity was thought to be present at large scales, but suitable blocks of .brand-blue[five field plots] could be identified and so .brand-blue[four blocks] each of five plots could be used.

What are:

* Treatments: five type of fungicides including control,
* Experimental units: plot
* Observation units: plot

and how many observations are there?

]]

.column.bg-brand-gray[.content[
### A poor design

**(Non-randomised) Design 1**

|**Plot**|**1**|**2**|**3**|**4**|**5**|
|-----|-----|------|-----|-----|------|
|Block 1|A|A|A|A|B|
|Block 2|B|B|0|0|B|
|Block 3|C|C|0|0|C|
|Block 4|D|D|D|D|C|

* Design 1 is poor because difference between any fungicides A and D are .brand-blue[confounded with block difference].
* Indeed, it is .brand-blue[not a complete block design], since not all treatment occur in each block.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - A better design
]]
.row[.split-30[
.column[.content[

**(Non-randomised) Design 2**

|**Plot**|**1**|**2**|**3**|**4**|**5**|
|-----|-----|------|-----|-----|------|
|Block 1|A|A|A|A|0|
|Block 2|B|B|B|B|0|
|Block 3|C|C|C|C|0|
|Block 4|D|D|D|D|0|
 
**(Non-randomised) Design 3**

|**Plot**|**1**|**2**|**3**|**4**|**5**|
|-----|-----|------|-----|-----|------|
|Block 1|A|B|C|D|0|
|Block 2|A|B|C|D|0|
|Block 3|A|B|C|D|0|
|Block 4|A|B|C|D|0|

]]
.column.bg-brand-gray[.content[
* Design 3 is a complete block design as every treatment occurs exactly the same number of times in each block.

### Construction of RCBD's
* To each of the `$b$` blocks .brand-blue[randomly allocate] the `$t$` treatments to the `$t$` plots so there are `$(t!)^b$` possible designs.
* The randomisation offers some .brand-blue[protection against plot location bias].
* Use a random number generator to determine the allocation for each block separately.
* Randomisation offers protection in a probabilistic way in that there is a small chance that treatment will turn up in the same plot in all blocks.
* Over a number of repetitions of the trial every possible arrangement would occur the same number of times.
* It can be good to only allow .brand-blue[good] randomised designs.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Randomised complete block design (RCBD)
]]
.row[.split-50[
.column[.content[

```r
set.seed(1)
replicate(4,sample(c("A","B","C","D","0")))
```

```
       [,1] [,2] [,3] [,4]
  [1,] "A"  "0"  "C"  "B" 
  [2,] "D"  "C"  "0"  "0" 
  [3,] "C"  "D"  "A"  "D" 
  [4,] "0"  "B"  "D"  "C" 
  [5,] "B"  "A"  "B"  "A"
```
Of course, choosing a different seed will get a different RCBD.

```r
set.seed(71)
replicate(4,sample(c("A","B","C","D","0")))
```

```
       [,1] [,2] [,3] [,4]
  [1,] "C"  "A"  "B"  "B" 
  [2,] "D"  "C"  "D"  "C" 
  [3,] "0"  "B"  "A"  "0" 
  [4,] "B"  "D"  "C"  "D" 
  [5,] "A"  "0"  "0"  "A"
```
]]
.column.bg-brand-gray[.content[
### Two-way ANOVA for RCBD
* Assumptions
  * We observe `$t$` treatments in each of the `$b$` blocks.
  * The treatment factor does not interact with the block factor.
  * We model the data as a two-way main effects ANOVA
  `$$Y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk},\;  \epsilon_{ijk} \sim NID(0,\sigma^2)$$` 
  with `$i=1,\ldots,t$` and `$j=1,\ldots,b$`.
* Normality assumption needs to be checked.  
* There is no repeats in each treatment block combination; otherwise it can be called a .brand-blue[factorial design] in which .brand-blue[interaction effect] can be modelled.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Potato Yield Data 
]]
.row[.split-70[
.column[.content[

```r
skimr::skim(dat)
```

```r
dat <- dat %>%
 mutate(Block=factor(Block),
 Trt=factor(Trt))
```

]]
.column.bg-brand-gray[.content[

```r
dat
```

```
     Yield Trt Block
  1    375   0     1
  2    409   0     2
  3    500   0     3
  4    332   0     4
  5    528   A     1
  6    602   A     2
  7    604   A     3
  8    535   A     4
  9    636   B     1
  10   599   B     2
  11   650   B     3
  12   570   B     4
  13   645   C     1
  14   710   C     2
  15   665   C     3
  16   504   C     4
  17   626   D     1
  18   550   D     2
  19   561   D     3
  20   668   D     4
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Potato Yield Data 
]]
.row[.split-40[
.column[.content[

```r
dat %>%
 ggplot(aes(Trt,Block,
            fill=Yield,
            label=Trt)) +
 geom_tile(color="black") +
 geom_text(size=5) +
 scale_fill_distiller(palette=2,
                    direction=1)
```

]]
.column.bg-brand-gray[.content[

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Potato Yield Data
]]
.row[.split-50[
.column[.content[

```r
ggplot(dat, aes(Block, Yield)) +
 geom_boxplot() + geom_point(aes(color=Trt),size=3)
```

]]
.column.bg-brand-gray[.content[

```r
ggplot(dat, aes(Trt, Yield)) +
 geom_boxplot() + geom_point(aes(color=Block),size=3)
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Sum of Squares
]]
.row[.split-50[
.column[.content[

```r
dat %>% group_by(Block) %>%
 summarise(sum(Yield)) %>% as.data.frame()
```

```
    Block sum(Yield)
  1     1       2810
  2     2       2870
  3     3       2980
  4     4       2609
```

```r
dat %>% group_by(Trt) %>%
 summarise(sum(Yield)) %>% as.data.frame()
```

```
    Trt sum(Yield)
  1   0       1616
  2   A       2269
  3   B       2455
  4   C       2524
  5   D       2405
```

```r
c(sum(dat$Yield),sum(dat$Yield^2))
```

```
  [1]   11269 6542743
```
]]
.column.bg-brand-gray[.content[

* Total SS = `$S_{yy} = \sum_{i=1}^t \sum_{j=1}^b Y_{ij}^2-\frac{Y_{\bullet \bullet}^2}{n}$` with `$(n-1)$` df;

$$ S_{yy} = 6542743-\frac{11269^2}{20}=193225 $$
* `$\text{SS}_{\text{Block}} = \sum_{j=1}^b \frac{ Y_{\bullet j}^2}{t}-\frac{Y_{\bullet \bullet}^2}{n}$` with `$(b-1)$` df;

`$$\text{SS}_{\text{Block}} = \frac{ 2810^2+\dots+2609^2}{5}-\frac{11269^2}{20}=14538$$`
* `$\text{SS}_{\text{Trt}} = \sum_{i=1}^t \frac{ Y_{i\bullet}^2}{t}-\frac{Y_{\bullet \bullet}^2}{n}$` with `$(t-1)$` df;

$$\text{SS}_{\text{Trt}} = \frac{1616^2+\dots+2405^2}{4}-\frac{11269^2}{20}=135843 $$
]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Omnibus hypothesis
]]
.row[.split-50[
.column[.content[
* We want to test

`$$H_0: \alpha_A=\alpha_B=\alpha_C=\alpha_D=\alpha_0$$`

i.e. all treatment (including control) have the same effect.

* It is good statistical practice to show the block before the treatment factor in the ANOVA table:

Source  | Df | Sum Sq | Mean Sq  
------------- | ------------- | ------------- | ------------- 
Block | `$b-1$` | `$\text{SS}_{\text{Block}}$`  | `$\text{SS}_{\text{Block}}/(b-1)$`    
Treatment | `$t-1$` | `$\text{SS}_{\text{Trt}}$`  | `$\text{SS}_{\text{Trt}}/(t-1)$`    
Residual | `$(b-1)(t-1)$` | `$\text{RSS}$`  | `$\text{RSS}/\text{Df}_{Res}$`  
Total       | `$n-1$` | `$\text{S}_{yy}$` |

]]
.column.bg-brand-gray[.content[

```r
anova(lm(Yield ~ Block + Trt, data=dat)) #block 1st
```

```
  Analysis of Variance Table
  
  Response: Yield
            Df Sum Sq Mean Sq F value   Pr(>F)   
  Block      3  14538    4846  1.3573 0.302600   
  Trt        4 135843   33961  9.5119 0.001057 **
  Residuals 12  42844    3570                    
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
anova(lm(Yield ~ Trt + Block, data=dat)) #=, bal&com
```

```
  Analysis of Variance Table
  
  Response: Yield
            Df Sum Sq Mean Sq F value   Pr(>F)   
  Trt        4 135843   33961  9.5119 0.001057 **
  Block      3  14538    4846  1.3573 0.302600   
  Residuals 12  42844    3570                    
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Why not a one-way ANOVA
]]
.row[.split-50[
.column[.content[
* Testing for a treatment effect (having adjusted for the blocks), we have `$p$`-value `$<0.05$`.
* Thus there are significant differences in the mean treatment effect after adjusting for the block.
* If we ignore the blocking factor in the anaysis then the error variance estimate in the one-way ANOVA is

`$$\frac{\text{RSS}+\text{SS}_{\text{Block}}}{\text{df}_{\text{RSS}}+\text{df}_{\text{Block}}}=\frac{42844+14538}{12+3}=3825.5$$`
whereas in the two-way ANOVA `$\hat{\sigma}^2=3570$`.
* Adding the SS for block, the RSS for one-way ANOVA is higher though at higher df. So the differences between blocks would mask the treatment effect if we did not use a design incorporating the blocks.
 
]]
.column.bg-brand-gray[.content[

```r
anova(lm(Yield ~ Block + Trt, data=dat)) #2-way ANOVA
```

```r
anova(lm(Yield ~ Trt, data=dat)) #1-way ANOVA
```

```
  Analysis of Variance Table
  
  Response: Yield
            Df Sum Sq Mean Sq F value    Pr(>F)    
  Trt        4 135843   33961  8.8775 0.0006963 ***
  Residuals 15  57382    3825                      
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Pairwise differences: Calculating CI
]]
.row[.split-50[
.column[.content[
`\begin{eqnarray*}
\text{E}(\bar{Y}_{i\bullet}-\bar{Y}_{k\bullet})&=&\text{E}\left(\mu+\alpha_i+\frac{1}{b}\sum_{j=1}^b \beta_j+\bar{\epsilon}_{i\bullet}\right)-\\
& & \text{E}\left(\mu+\alpha_k+\frac{1}{b}\sum_{j=1}^b \beta_j+\bar{\epsilon}_{k\bullet}\right)\\
&=&\alpha_i-\alpha_k
\end{eqnarray*}`

* since `$\text{E}(\bar{\epsilon}_{i\bullet})=0$`. Hence `$\bar{Y}_{i\bullet}-\bar{Y}_{k\bullet}$` is an unbiased estimator for `$\alpha_i-\alpha_k$`.

* Also because `$\mu$`, `$\alpha_i$` and `$\beta_j$` are fixed, 
`\begin{eqnarray*}
\text{Var}(\bar{Y}_{i\bullet}-\bar{Y}_{k\bullet})&=&\text{Var}(\bar{\epsilon}_{i\bullet}-\bar{\epsilon}_{k\bullet})\\
&=& \frac{\sigma^2}{b}+\frac{\sigma^2}{b}=\frac{2\sigma^2}{b} 
\end{eqnarray*}`

* So `$(\bar{Y}_{i\bullet}-\bar{Y}_{k\bullet}) \sim N\left(\alpha_i-\alpha_k,\frac{2\sigma^2}{b}\right)$`.

]]
.column.bg-brand-gray[.content[

* We estimate `$\sigma^2$` using the residual mean squares.
* Thus a `$(1-\alpha)$` 100% CI for `$\alpha_i-\alpha_k$` is

`$$(\bar{Y}_{i\bullet}-\bar{Y}_{k\bullet})\pm \color{blue}{t_{(t-1)(b-1)}(1-\alpha/2)}\sqrt{\frac{2\hat{\sigma}^2}{b}}$$`
* Change the critical value to `$\color{blue}{t_{(t-1)(b-1)}(1-\alpha/2m)}$` where the number of pairs `$m={t \choose 2}$` if one corrects for multiple testinig as per Lecture 16.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Pairwise Confidence Interval
]]
.row[.split-50[
.column[.content[

```r
library(emmeans)
M1 <- lm(Yield ~ Block + Trt, data=dat)
EM1 <- emmeans(M1, "Trt")
confint(pairs(EM1, adjust="none"))
```

```
   contrast estimate   SE df lower.CL upper.CL
   0 - A      -163.2 42.3 12   -255.3    -71.2
   0 - B      -209.8 42.3 12   -301.8   -117.7
   0 - C      -227.0 42.3 12   -319.1   -134.9
   0 - D      -197.2 42.3 12   -289.3   -105.2
   A - B       -46.5 42.3 12   -138.6     45.6
   A - C       -63.8 42.3 12   -155.8     28.3
   A - D       -34.0 42.3 12   -126.1     58.1
   B - C       -17.2 42.3 12   -109.3     74.8
   B - D        12.5 42.3 12    -79.6    104.6
   C - D        29.8 42.3 12    -62.3    121.8
  
  Results are averaged over the levels of: Block 
  Confidence level used: 0.95
```

]]
.column.bg-brand-gray[.content[

```r
EM1 %>%
  pairs(adjust="none") %>%
  plot() + 
  geom_vline(xintercept=0, size=2, color="red")
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Pairwise CI with Multiple Correction
]]
.row[.split-three[
.column[.content[

```r
EM1 %>%
  pairs(adjust="scheffe") %>%
  plot() +
  geom_vline(xintercept=0, size=2,
             color="red") +
  xlim(-400, 200)
```

]]
.column.bg-brand-gray[.content[

```r
EM1 %>%
  pairs(adjust="tukey") %>%
  plot() +
  geom_vline(xintercept=0, size=2,
             color="red") +
  xlim(-400, 200)
```

<img src="lecture24_2020JC_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />
]]
.column[.content[

```r
EM1 %>%
  pairs(adjust="bonferroni") %>%
  plot() +
  geom_vline(xintercept=0, size=2,
             color="red") +
  xlim(-400, 200)
```

]]
]]