Nested Designs

class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[

# Nested Designs
## .black[STAT3022 Applied Linear Models Lecture 27]

### .black[2020/02/20]

]]

.column.bg-brand-charcoal[.content.white[
## Today

1. Pseudo-, false-, technical-, biological- replication
2. Skeleton Analysis of Variance
3. Split-Plot Design
4. Nested Designs

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# John Nelder
]]
.row[.split-70[
.column[.content[

* John Nelder (1924-2010) was the head of the statistics
department at Rothamsted Experimental Station from
1968-1984.

* He was the major developer of the statistical program
Genstat which is still in active development by VSN
(although R has largely become the preferred statistical
language)
]]
.column.bg-brand-gray[.content[

<img src="images/JohnNelder.gif" width="100%" height="100%">
]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Two factors: crossed vs nested/hierarchical factors
]]
.row[.split-50[
.column[.content[
#### .brand-blue[Complete block design (factor B as block)]

* In the pollen example, A is Depth, B is Sample and 2 replicates.
]]
.column.bg-brand-gray[.content[
#### .brand-blue[Two-stage nested design or hierarchical design]

* Levels of B are similar but not identical for different levels of A;
* .brand-blue[Balanced nested design:] an equal number of levels of B within
each level of A and an equal number of repeats.
* In the pollen example, A is Depth, B is Sample.id and 2 replicates but the 6 samples are not identical, so nested design. 
* Otherwise, CBD.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Recall: Sum of Squares decomposition and ANOVA
]]
.row[.split-50[
.column[.content[
`$$Y_{ijk} = \mu + \alpha_i + (\alpha\beta)_{j(i)} + \epsilon_{ijk},\quad \epsilon_{ijk} \sim NID(0,\sigma^2),$$`

* Absorb `$\text{SS}_{\text{B}}$` into `$\text{SS}_{\text{A:B}}$` to get pooled `$\text{SS}_{\text{A:B}}$` or `$\text{SS}_{\text{B(A)}}$`.

`$$(\alpha\beta)'_{ij}=\color{blue}{\beta_j}+(\alpha\beta)_{ij}\hspace{100mm}$$`
`$$=\color{blue}{\bar{Y}_{\bullet j \bullet}-\bar{Y}_{\bullet \bullet \bullet}}+\bar{Y}_{ij\bullet}-\bar{Y}_{i\bullet \bullet}-\bar{Y}_{\bullet j\bullet}+\bar{Y}_{\bullet \bullet \bullet}=\bar{Y}_{ij\bullet}-\bar{Y}_{i\bullet \bullet}$$`
In R, change all dots for `$i$`, `$j$` or `$ij$` subscripts to 1. 
]]
.column.bg-brand-gray[.content[

* Pooled B(A): df `$(b-1)+(a-1)(b-1)=a(b-1)$`

* Same 2-way ANOVA components but different merging or splitting.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Calf feeding
]]
.row[.split-70[
.column[.content[

* Three feed treatments (B,Y,G) are compared on 24 calves, kept in 6 pens (1 to 6) with four calves per pen. Each feed is applied to 2 whole pens. Each calf is weighed individually.

* Treatments: three feed type. 
 Experimental unit: pen. 
 Observational unit: calf.
* .brand-blue[4 by 6 data structure:] Feed B,Y,G each .brand-blue[nests] pen (1,6), (2,4), (3,5) and 4 repeats under each pen.

]]
.column.bg-brand-gray[.content[

|Source | df |
|:-------|----|
|**Pens Statum** | |
|Feed   | 2  |
|EU Residual | 3 |
|Calves = OU | 18|

* 6 pens have 5 df. So the residual df is 5-2=3 or 3(2-1) for B(A).
* 24 calves has 23 df. Residual df is 23-2-3=18 or 2(3)(4-1).
* .brand-blue[Replication is 2] (2 pen each).
* .brand-blue[Repetition is 2(4)=8].

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Theory: Pseudo-replication 
]]
.row[.content[

.brand-blue[Pseudo-replication] or .brand-blue[false replication] describes a situation in which multiple
measurements are taken from each experimental unit.

### Technical & biological replication

In life-sciences, it is often important to make a distinction between technical and biological replication.

* .brand-blue[Technical replication] occurs when several measurements are taken from the same biological material. Technical replication is always a pseudo-replication, eg `$k=4$` replicates in each pen.
* .brand-blue[Biological replication] occurs when measurements are taken from several independent biological subjects, eg `$j=2$` pen for each feed.

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# More examples of nested factor
]]
.row[.content[
### .brand-blue[Drug company to study product stability across sites]

* Two manufacturing sites ( `$t=2$` )
* Three batches from each site ( `$b=3$`; 6 batches from different sites)
* Ten tablets from each batch ( `$r=10$` )

* Clearly the 6 batches are not identical across sites.
* Technical replication is the 10 tablets from each batch (30 repetitions). 
* Biological replication is the 3 batches from each site (3 replications).
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Calf feeding (continued)

]]
.row[.split-80[
.column[.content[

### Hypothesis Testing

`$$Y_{ijk} = \mu + \alpha_i + (\alpha\beta)_{j(i)} + \epsilon_{ijk},\quad \epsilon_{ijk} \sim NID(0,\sigma^2), \ i=1,2,3, \ j=1,2, \ k=1,2,3,4$$`
where

`$\alpha_i$` is the `$i$`-th level main effect for feed treatment; 
`$(\alpha\beta)_{ij}=(\alpha\beta) _{j(i)}$` is the (random) pen effect, and 
`$\epsilon_{ijk}$` is the (OU) error

* To test `$H_0: \alpha_1=\alpha_2=\alpha_3$`, we can use the test statistic `$\text{MS(Feed)/MS(Residual)}$`. 
* Which `$\text{MS(Residual)}$`? Some people wrongly compare `$\text{MS(Feed)}$` to `$\text{MS(OU Residual)}$`
* Now we consider splitting of RSS into different components: 
.brand-blue[EU residuals (groups):] `$a(b-1)$` df out of `$ab-1$` df for EU groups and 
.brand-blue[OU residuals:] `$ab(r-1)$` df out of `$abr-1$` df for OU.

]]
.column.bg-brand-gray[.content[

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: grafting on animals
]]
.row[.split-60[
.column[.content[

* A surgeon was going to use .brand-blue[nine animals], of the same species, in an experiment.
* He wanted to compare .brand-blue[three methods of grafting skin].
* He intended to use .brand-blue[three animals for each method].
* After the graft was complete he would take a sample of new skin from each animal.
* He would then cut each sample into .brand-blue[20 (tiny) pieces] and use a precision instrument to measure the thickness of each piece.
* Treatments? Methods of grafting skin 
Experimental units? Animals 
Observational units? Each piece of skin
* .brand-blue[20 by 9 data structure:] 3 methods, with animals (1,2,3), (4,5,6) and (7,8,9) nested under each and 20 repeats under each animal.

]]
.column.bg-brand-gray[.content[

|Source | df |
|:-------|----|
|**Animal=EU Statum** | |
|  Grafting   | 2  |
|  EU Residual | 6 |
|**Sample Stratum** |  |
|OU Residual | 171 |

`$t=3$` methods are nested under 
`$b=9$` animals (like Sample.id) with `$3(3-1)=6$` or 8-2=6 df. 
`$n=9\times 20=180$` animals have residual df 3(3)(20-1)=171 or 179-2-6=171 df.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - grafting on animals: simulation I 
]]
.row[.content[

```r
set.seed(1)
graft <- factor(rep(LETTERS[1:3], each=3*20))
animal <- factor(rep(LETTERS[1:9], each=20))
anidev <- rnorm(9, 0, 20)
trt <- c(300, 320, 350)
y <- trt[as.numeric(graft)] + anidev[as.numeric(animal)] + rnorm(9*20, 0, 5)
anova(lm(y ~ graft+animal)) #if swap order, graft is absorbed into animal
```

```
 Analysis of Variance Table
 
 Response: y
 Df Sum Sq Mean Sq F value Pr(>F) 
 graft 2 145394 72697 3312.84 < 2.2e-16 ***
 animal 6 28346 4724 215.29 < 2.2e-16 ***
 Residuals 171 3752 22 
 ---
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
* Notice that F-value for graft is computed as `$\text{MS(graft)/ MS(OU Residual)}$` rather than `$\text{MS(graft)/ MS(animal)}$`!
* In this case the conclusion you draw is the same (small p-value) but what if the treatment effect is not signifcantly different?

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - grafting on animals: simulation II 
]]
.row[.content[

```r
set.seed(1); trt <- c(300, 300, 300)
y <- trt[as.numeric(graft)] + anidev[as.numeric(animal)] + rnorm(9*20, 0, 5)
anova(lm(y ~ graft + animal))
```

```
 Analysis of Variance Table
 
 Response: y
 Df Sum Sq Mean Sq F value Pr(>F) 
 graft 2 13168.6 6584.3 310.21 < 2.2e-16 ***
 animal 6 27857.8 4643.0 218.75 < 2.2e-16 ***
 Residuals 171 3629.5 21.2 
 ---
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
We get a signifcant p-value for graft here but if we compared with the EU Residual as opposed to the OU Residual we have

```r
1 - pf(6584.3 / 4643.0, 2, 6)
```

```
  [1] 0.3130785
```
which is not signifcant!

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - grafting on animals: the appropriate way 
]]
.row[.split-50[
.column[.content[

```r
summary(aov(y ~ graft + Error(animal)))
```

```
  
  Error: animal
            Df Sum Sq Mean Sq F value Pr(>F)
  graft      2  13169    6584   1.418  0.313
  Residuals  6  27858    4643               
  
  Error: Within
             Df Sum Sq Mean Sq F value Pr(>F)
  Residuals 171   3629   21.23
```
]]
.column.bg-brand-gray[.content[

Residuals are splitted into two sources:

* 9 animals with `$\text{SS}_{\text{B(A)}}=27858$` at `$3(3-1)=6$` df. 
* the usual `$\text{RSS}$` for the 20 pieces .brand-blue[within] each animals at 171 df.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Split-plot designs
]]
.row[.split-60[
.column[.content[
* A standard .brand-blue[split-plot design] consists of two crossed factors with one factor randomly applied to the large units (main plots) and the second factor randomly applied within large units to the small units (subplots). 
* Split-plot design in agriculture refers to fields being splitted into whole-plots and subplots.
* Split-plot design (or more generally nested designs) arise .brand-blue[due to practical constraint] rather than being .brand-blue[better] designs.
* Simplified, but sound advice is to avoid using split-plot designs unless they are essential for practical reasons. 
* Constructing a .brand-blue[skeleton analysis of variance] can help to identify issues with the design.

]]
.column.bg-brand-gray[.content[

<img src="images/SplitPlot.jpg" width="90%" height="90%">
]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Irrigation `$\times$` Fertiliser
]]
.row[.content[

Suppose you have 8 plots laid out in a 2 `$\times$` 4 rectangular layout. You want to test the effect of irrigation amount (2 levels, L in white and H in blue) and fertiliser type (A and B) on crop yield.

* .brand-blue[Design 1:] 8 plots. 
* .brand-blue[Design 2:] 4 whole plots (df=3) of different irrigation and each contains 2 subplots of 2 fertilisers. 
* .brand-blue[Design 3:] 2 whole plots (df=1) of 2 irrigations and each contains 4 subplots of different fertilisers.

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Skeleton Analysis of Variance
]]
.row[.split-three[
.column[.content[

|	Source | df | 
|:----|----|
|	Fertilizer=Irrigation | 1| 
| Residual | 6|

* 8 plots.

]]
.column.bg-brand-gray[.content[
<img src="images/Plot2.png" width="50%" height="50%">

|Source | df |
|:----|----|
|**Whole plot=EU Stratum** |  |
|    Irrigation  | 1 | 
|    EU residual | 2 |
|**Subplot=OU Stratum**| | 
|    Fertilizer | 1 | 
|    Fertilizer : Irrigation | 1 | 
|    OU residual| 2 |

* 4 wholeplots each 2 subplots.

]]
.column[.content[
<img src="images/Plot3.png" width="50%" height="50%">

|Source | df |
|:----|----|
|**Whole plot=EU Stratum**| | 
|    Irrigation  | 1| 
|    EU residual | 0| 
|**Subplot=OU Stratum**| | 
|	   Fertilizer | 1 | 
|	   Fertilizer : Irrigation | 1 | 
|	   OU residual | 4|

* 2 wholeplots each 4 subplots.

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example: setting wholeplot
]]
.row[.split-three[
.column[.content[

Fertiliser confounded with irrigation.
	
]]
.column.bg-brand-gray[.content[
<img src="images/Plot2.png" width="50%" height="50%">

* Similar to RCBD, there are 4 blocks/wholeplots with different irrigation and arrangements. 
* Each wholeplot has 2 subplots/ fertilisers at different irrigations repeated with AB or BA.
* .brand-blue[2 by 4 data structure:] Fertilisers A and B each .brand-blue[nests] irrigation L and H; the 4 col/wholeplots each has two repeats for AB or BA.

]]
.column[.content[
<img src="images/Plot3.png" width="50%" height="50%">

* Why not divide the whole plot into two with each with 2 fertilisers.

* This is similar to design 2.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Seed growth   
]]
.row[.split-70[
.column[.content[

* Each of 2 cabinets at 10 `$^o$`C and at 20 `$^o$`C, contains 8 seed trays, with 2 watering regimes (A and B), each applied to four trays chosen at random within each cabinet. The experiment was repeated with another 2 cabinets. The average growth of the seeds in each tray are measured.
* Treatment? Watering regimes (A and B) 
 Experimental unit? Cabinet (temp and experiment combinations) 
 Observational unit? Tray
* 2 factors: Water and Temp.
* .brand-blue[8 by 4 data structure:] water A and B each nests temp 10 and 20, each 8 rows, 4 repeats from expt 1 and another 4 repeats from expt 2.
* Source of errors for water (2 by 2 at 8 repeats): 
 for each expt: across cabinet, temp `$\text{SS}_{\text{B}}$` at 2-1=1df, 
 for each expt: across cabinet by tray, water `$\text{SS}_{\text{A}}$` at 2-1=1df, `$\text{SS}_{\text{A:B}}$` at 1df and `$\text{RSS}_1$` at `$ab(r'-1)$`=2(2)(4-1) or 15-1-1-1=12df (4 repeats), and 
 across expts `$\text{RSS}_2$` at 2(8)=16 df. The 2 RSS have `$ab(r-1)$` =2(2)7=28df. 
]]
.column.bg-brand-gray[.content[

<img src="images/Experiment1.png" width="80%" height="90%">
<img src="images/Experiment2.png" width="80%" height="90%">
]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
#  Example - Two-way ANOVA factorial design
]]
.row[.split-50[
.column[.content[

```r
dat
```

```
 Water Temp Experiment Cabinet Tray Length
 1 A 10 1 1 1 5.811440
 2 B 10 1 1 2 13.681228
 3 A 10 1 1 3 6.014650
 4 B 10 1 1 4 11.280688
 5 A 10 1 1 5 8.167450
 6 B 10 1 1 6 12.879721
 7 A 10 1 1 7 3.261500
 8 B 10 1 1 8 6.889271
 9 A 20 1 2 1 11.328180
 10 B 20 1 2 2 15.930219
 11 A 20 1 2 3 8.703601
 12 B 20 1 2 4 18.204390
 13 A 20 1 2 5 10.629647
 14 B 20 1 2 6 17.399540
 15 A 20 1 2 7 10.854487
 16 B 20 1 2 8 17.832480
 17 A 10 2 1 1 3.711876
 18 B 10 2 1 2 6.257596
 19 A 10 2 1 3 4.965976
 20 B 10 2 1 4 10.704009
 21 A 10 2 1 5 3.182047
 22 B 10 2 1 6 7.450374
 23 A 10 2 1 7 2.440632
 24 B 10 2 1 8 11.794371
 25 A 20 2 2 1 10.715825
 26 B 20 2 2 2 14.647179
 27 A 20 2 2 3 8.482507
 28 B 20 2 2 4 14.759839
 29 A 20 2 2 5 4.423625
 30 B 20 2 2 6 13.929103
 31 A 20 2 2 7 6.683996
 32 B 20 2 2 8 14.747170
```
]
 
* Temp is the same as Cabinet.
]]
.column.bg-brand-gray[.content[

### With interaction

```r
summary(aov(Length ~ Water*Temp + 
  Error(Cabinet/Tray/Experiment), data=dat))
```

```
  
  Error: Cabinet
       Df Sum Sq Mean Sq
  Temp  1  203.9   203.9
  
  Error: Cabinet:Tray
             Df Sum Sq Mean Sq F value  Pr(>F)    
  Water       1 306.34  306.34 133.380 7.4e-08 ***
  Water:Temp  1   4.69    4.69   2.041   0.179    
  Residuals  12  27.56    2.30                    
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Error: Cabinet:Tray:Experiment
            Df Sum Sq Mean Sq F value Pr(>F)
  Residuals 16  115.7   7.232
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Two-way ANOVA factorial design
]]
.row[.split-50[
.column[.content[

### Decomposition of df

|Source | df | df |
|:----|----|----|
|**Whole plot=EU Cabinet:Tray** |  |  |
|    Temp `$(b-1)$`        | 1  |    |
|    Water `$(a-1)$`       | 1  |    |
|    Water : Temp `$(a-1)(b-1)$`| 1  |    |
|    EU residual `$ab(r'-1)$`  | 12 |    |
| Sum `$abr'-1$`            |    | 15 |
|**Subplot=OU Cabinet:Tray:Experiment**  |    |  | 
|    OU residual `$abr-1-(abr'-1)=ab(r-r')$`       | 16 |  | 
| Sum                 |    | 16 |
| Total `$abr-1$`              |    | 31 |

Water: `$a=2$` levels; Temp: `$b=2$` levels; 
Repeat for EU: `$r'=4$`; Repeat for OU: `$r=8$` 
]]
.column.bg-brand-gray[.content[

### Only main effects

```r
summary(aov(Length ~ Water+Temp + 
  Error(Cabinet/Tray/Experiment), data=dat))
```

```
  
  Error: Cabinet
       Df Sum Sq Mean Sq
  Temp  1  203.9   203.9
  
  Error: Cabinet:Tray
            Df Sum Sq Mean Sq F value   Pr(>F)    
  Water      1 306.34  306.34   123.5 5.22e-08 ***
  Residuals 13  32.25    2.48                     
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Error: Cabinet:Tray:Experiment
            Df Sum Sq Mean Sq F value Pr(>F)
  Residuals 16  115.7   7.232
```

* Now the interaction term `$\text{SS}_{\text{A:B}}$` goes to `$\text{RSS}_1$` for the 16 Cabinet:Tray combinations within each experiment.

]]
]]