class: split-70 with-border hide-slide-number bg-brand-red background-image: url("images/USydLogo-black.svg") background-size: 200px background-position: 2% 90% .column.white[.content[ <br><br><br> # Randomized complete block designs ## .black[STAT3022 Applied Linear Models Lecture 24] <br><br><br> ### .black[2020/02/17] ]] .column.bg-brand-charcoal[.content.white[ ## Today 1. More on randomized complete block designs ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Potato Yield ]] .row[.split-60[ .column[.content[ * An experiement was devised to investigate the effects of four different types of fungicides (labelled A,B,C,D) on the yield of potatoes in field plots. * An untreated control treatment (labelled 0) was also included to give a baseline comparison. * In the field designated for the trial, heterogeneity was thought to be present at large scales, but suitable blocks of .brand-blue[five field plots] could be identified and so .brand-blue[four blocks] each of five plots could be used. What are: * Treatments: five type of fungicides including control, * Experimental units: plot * Observation units: plot and how many observations are there? ]] .column.bg-brand-gray[.content[ ### A poor design **(Non-randomised) Design 1** |**Plot**|**1**|**2**|**3**|**4**|**5**| |-----|-----|------|-----|-----|------| |Block 1|A|A|A|A|B| |Block 2|B|B|0|0|B| |Block 3|C|C|0|0|C| |Block 4|D|D|D|D|C| * Design 1 is poor because difference between any fungicides A and D are .brand-blue[confounded with block difference]. * Indeed, it is .brand-blue[not a complete block design], since not all treatment occur in each block. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - A better design ]] .row[.split-30[ .column[.content[ **(Non-randomised) Design 2** |**Plot**|**1**|**2**|**3**|**4**|**5**| |-----|-----|------|-----|-----|------| |Block 1|A|A|A|A|0| |Block 2|B|B|B|B|0| |Block 3|C|C|C|C|0| |Block 4|D|D|D|D|0| <br> **(Non-randomised) Design 3** |**Plot**|**1**|**2**|**3**|**4**|**5**| |-----|-----|------|-----|-----|------| |Block 1|A|B|C|D|0| |Block 2|A|B|C|D|0| |Block 3|A|B|C|D|0| |Block 4|A|B|C|D|0| ]] .column.bg-brand-gray[.content[ * Design 3 is a complete block design as every treatment occurs exactly the same number of times in each block. ### Construction of RCBD's * To each of the `\(b\)` blocks .brand-blue[randomly allocate] the `\(t\)` treatments to the `\(t\)` plots so there are `\((t!)^b\)` possible designs. * The randomisation offers some .brand-blue[protection against plot location bias]. * Use a random number generator to determine the allocation for each block separately. * Randomisation offers protection in a probabilistic way in that there is a small chance that treatment will turn up in the same plot in all blocks. * Over a number of repetitions of the trial every possible arrangement would occur the same number of times. * It can be good to only allow .brand-blue[good] randomised designs. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Randomised complete block design (RCBD) ]] .row[.split-50[ .column[.content[ ```r set.seed(1) replicate(4,sample(c("A","B","C","D","0"))) ``` ``` [,1] [,2] [,3] [,4] [1,] "A" "0" "C" "B" [2,] "D" "C" "0" "0" [3,] "C" "D" "A" "D" [4,] "0" "B" "D" "C" [5,] "B" "A" "B" "A" ``` Of course, choosing a different seed will get a different RCBD. ```r set.seed(71) replicate(4,sample(c("A","B","C","D","0"))) ``` ``` [,1] [,2] [,3] [,4] [1,] "C" "A" "B" "B" [2,] "D" "C" "D" "C" [3,] "0" "B" "A" "0" [4,] "B" "D" "C" "D" [5,] "A" "0" "0" "A" ``` ]] .column.bg-brand-gray[.content[ ### Two-way ANOVA for RCBD * Assumptions * We observe `\(t\)` treatments in each of the `\(b\)` blocks. * The treatment factor does not interact with the block factor. * We model the data as a two-way main effects ANOVA `$$Y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk},\; \epsilon_{ijk} \sim NID(0,\sigma^2)$$` with `\(i=1,\ldots,t\)` and `\(j=1,\ldots,b\)`. * Normality assumption needs to be checked. * There is no repeats in each treatment block combination; otherwise it can be called a .brand-blue[factorial design] in which .brand-blue[interaction effect] can be modelled. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example: Potato Yield Data ]] .row[.split-70[ .column[.content[ ```r skimr::skim(dat) ``` <img src="images/L24Skim.png" width="90%" height="90%"> ```r dat <- dat %>% mutate(Block=factor(Block), Trt=factor(Trt)) ``` ]] .column.bg-brand-gray[.content[ ```r dat ``` ``` Yield Trt Block 1 375 0 1 2 409 0 2 3 500 0 3 4 332 0 4 5 528 A 1 6 602 A 2 7 604 A 3 8 535 A 4 9 636 B 1 10 599 B 2 11 650 B 3 12 570 B 4 13 645 C 1 14 710 C 2 15 665 C 3 16 504 C 4 17 626 D 1 18 550 D 2 19 561 D 3 20 668 D 4 ``` ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example: Potato Yield Data ]] .row[.split-40[ .column[.content[ ```r dat %>% ggplot(aes(Trt,Block, fill=Yield, label=Trt)) + geom_tile(color="black") + geom_text(size=5) + scale_fill_distiller(palette=2, direction=1) ``` ]] .column.bg-brand-gray[.content[ <img src="lecture24_2020JC_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example: Potato Yield Data ]] .row[.split-50[ .column[.content[ ```r ggplot(dat, aes(Block, Yield)) + geom_boxplot() + geom_point(aes(color=Trt),size=3) ``` <img src="lecture24_2020JC_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> ]] .column.bg-brand-gray[.content[ ```r ggplot(dat, aes(Trt, Yield)) + geom_boxplot() + geom_point(aes(color=Block),size=3) ``` <img src="lecture24_2020JC_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Sum of Squares ]] .row[.split-50[ .column[.content[ ```r dat %>% group_by(Block) %>% summarise(sum(Yield)) %>% as.data.frame() ``` ``` Block sum(Yield) 1 1 2810 2 2 2870 3 3 2980 4 4 2609 ``` ```r dat %>% group_by(Trt) %>% summarise(sum(Yield)) %>% as.data.frame() ``` ``` Trt sum(Yield) 1 0 1616 2 A 2269 3 B 2455 4 C 2524 5 D 2405 ``` ```r c(sum(dat$Yield),sum(dat$Yield^2)) ``` ``` [1] 11269 6542743 ``` ]] .column.bg-brand-gray[.content[ * Total SS = `\(S_{yy} = \sum_{i=1}^t \sum_{j=1}^b Y_{ij}^2-\frac{Y_{\bullet \bullet}^2}{n}\)` with `\((n-1)\)` df; $$ S_{yy} = 6542743-\frac{11269^2}{20}=193225 $$ * `\(\text{SS}_{\text{Block}} = \sum_{j=1}^b \frac{ Y_{\bullet j}^2}{t}-\frac{Y_{\bullet \bullet}^2}{n}\)` with `\((b-1)\)` df; `$$\text{SS}_{\text{Block}} = \frac{ 2810^2+\dots+2609^2}{5}-\frac{11269^2}{20}=14538$$` * `\(\text{SS}_{\text{Trt}} = \sum_{i=1}^t \frac{ Y_{i\bullet}^2}{t}-\frac{Y_{\bullet \bullet}^2}{n}\)` with `\((t-1)\)` df; $$\text{SS}_{\text{Trt}} = \frac{1616^2+\dots+2405^2}{4}-\frac{11269^2}{20}=135843 $$ ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Omnibus hypothesis ]] .row[.split-50[ .column[.content[ * We want to test `$$H_0: \alpha_A=\alpha_B=\alpha_C=\alpha_D=\alpha_0$$` i.e. all treatment (including control) have the same effect. * It is good statistical practice to show the block before the treatment factor in the ANOVA table: Source | Df | Sum Sq | Mean Sq ------------- | ------------- | ------------- | ------------- Block | `\(b-1\)` | `\(\text{SS}_{\text{Block}}\)` | `\(\text{SS}_{\text{Block}}/(b-1)\)` Treatment | `\(t-1\)` | `\(\text{SS}_{\text{Trt}}\)` | `\(\text{SS}_{\text{Trt}}/(t-1)\)` Residual | `\((b-1)(t-1)\)` | `\(\text{RSS}\)` | `\(\text{RSS}/\text{Df}_{Res}\)` Total | `\(n-1\)` | `\(\text{S}_{yy}\)` | ]] .column.bg-brand-gray[.content[ ```r anova(lm(Yield ~ Block + Trt, data=dat)) #block 1st ``` ``` Analysis of Variance Table Response: Yield Df Sum Sq Mean Sq F value Pr(>F) Block 3 14538 4846 1.3573 0.302600 Trt 4 135843 33961 9.5119 0.001057 ** Residuals 12 42844 3570 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r anova(lm(Yield ~ Trt + Block, data=dat)) #=, bal&com ``` ``` Analysis of Variance Table Response: Yield Df Sum Sq Mean Sq F value Pr(>F) Trt 4 135843 33961 9.5119 0.001057 ** Block 3 14538 4846 1.3573 0.302600 Residuals 12 42844 3570 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Why not a one-way ANOVA ]] .row[.split-50[ .column[.content[ * Testing for a treatment effect (having adjusted for the blocks), we have `\(p\)`-value `\(<0.05\)`. * Thus there are significant differences in the mean treatment effect after adjusting for the block. * If we ignore the blocking factor in the anaysis then the error variance estimate in the one-way ANOVA is `$$\frac{\text{RSS}+\text{SS}_{\text{Block}}}{\text{df}_{\text{RSS}}+\text{df}_{\text{Block}}}=\frac{42844+14538}{12+3}=3825.5$$` whereas in the two-way ANOVA `\(\hat{\sigma}^2=3570\)`. * Adding the SS for block, the RSS for one-way ANOVA is higher though at higher df. So the differences between blocks would mask the treatment effect if we did not use a design incorporating the blocks. ]] .column.bg-brand-gray[.content[ ```r anova(lm(Yield ~ Block + Trt, data=dat)) #2-way ANOVA ``` ``` Analysis of Variance Table Response: Yield Df Sum Sq Mean Sq F value Pr(>F) Block 3 14538 4846 1.3573 0.302600 Trt 4 135843 33961 9.5119 0.001057 ** Residuals 12 42844 3570 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r anova(lm(Yield ~ Trt, data=dat)) #1-way ANOVA ``` ``` Analysis of Variance Table Response: Yield Df Sum Sq Mean Sq F value Pr(>F) Trt 4 135843 33961 8.8775 0.0006963 *** Residuals 15 57382 3825 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Pairwise differences: Calculating CI ]] .row[.split-50[ .column[.content[ `\begin{eqnarray*} \text{E}(\bar{Y}_{i\bullet}-\bar{Y}_{k\bullet})&=&\text{E}\left(\mu+\alpha_i+\frac{1}{b}\sum_{j=1}^b \beta_j+\bar{\epsilon}_{i\bullet}\right)-\\ & & \text{E}\left(\mu+\alpha_k+\frac{1}{b}\sum_{j=1}^b \beta_j+\bar{\epsilon}_{k\bullet}\right)\\ &=&\alpha_i-\alpha_k \end{eqnarray*}` * since `\(\text{E}(\bar{\epsilon}_{i\bullet})=0\)`. Hence `\(\bar{Y}_{i\bullet}-\bar{Y}_{k\bullet}\)` is an unbiased estimator for `\(\alpha_i-\alpha_k\)`. * Also because `\(\mu\)`, `\(\alpha_i\)` and `\(\beta_j\)` are fixed, `\begin{eqnarray*} \text{Var}(\bar{Y}_{i\bullet}-\bar{Y}_{k\bullet})&=&\text{Var}(\bar{\epsilon}_{i\bullet}-\bar{\epsilon}_{k\bullet})\\ &=& \frac{\sigma^2}{b}+\frac{\sigma^2}{b}=\frac{2\sigma^2}{b} \end{eqnarray*}` * So `\((\bar{Y}_{i\bullet}-\bar{Y}_{k\bullet}) \sim N\left(\alpha_i-\alpha_k,\frac{2\sigma^2}{b}\right)\)`. ]] .column.bg-brand-gray[.content[ * We estimate `\(\sigma^2\)` using the residual mean squares. * Thus a `\((1-\alpha)\)` 100% CI for `\(\alpha_i-\alpha_k\)` is `$$(\bar{Y}_{i\bullet}-\bar{Y}_{k\bullet})\pm \color{blue}{t_{(t-1)(b-1)}(1-\alpha/2)}\sqrt{\frac{2\hat{\sigma}^2}{b}}$$` * Change the critical value to `\(\color{blue}{t_{(t-1)(b-1)}(1-\alpha/2m)}\)` where the number of pairs `\(m={t \choose 2}\)` if one corrects for multiple testinig as per Lecture 16. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Pairwise Confidence Interval ]] .row[.split-50[ .column[.content[ ```r library(emmeans) M1 <- lm(Yield ~ Block + Trt, data=dat) EM1 <- emmeans(M1, "Trt") confint(pairs(EM1, adjust="none")) ``` ``` contrast estimate SE df lower.CL upper.CL 0 - A -163.2 42.3 12 -255.3 -71.2 0 - B -209.8 42.3 12 -301.8 -117.7 0 - C -227.0 42.3 12 -319.1 -134.9 0 - D -197.2 42.3 12 -289.3 -105.2 A - B -46.5 42.3 12 -138.6 45.6 A - C -63.8 42.3 12 -155.8 28.3 A - D -34.0 42.3 12 -126.1 58.1 B - C -17.2 42.3 12 -109.3 74.8 B - D 12.5 42.3 12 -79.6 104.6 C - D 29.8 42.3 12 -62.3 121.8 Results are averaged over the levels of: Block Confidence level used: 0.95 ``` ]] .column.bg-brand-gray[.content[ ```r EM1 %>% pairs(adjust="none") %>% plot() + geom_vline(xintercept=0, size=2, color="red") ``` <img src="lecture24_2020JC_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Pairwise CI with Multiple Correction ]] .row[.split-three[ .column[.content[ ```r EM1 %>% pairs(adjust="scheffe") %>% plot() + geom_vline(xintercept=0, size=2, color="red") + xlim(-400, 200) ``` <img src="lecture24_2020JC_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> ]] .column.bg-brand-gray[.content[ ```r EM1 %>% pairs(adjust="tukey") %>% plot() + geom_vline(xintercept=0, size=2, color="red") + xlim(-400, 200) ``` <img src="lecture24_2020JC_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /> ]] .column[.content[ ```r EM1 %>% pairs(adjust="bonferroni") %>% plot() + geom_vline(xintercept=0, size=2, color="red") + xlim(-400, 200) ``` <img src="lecture24_2020JC_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> ]] ]]