class: split-70 with-border hide-slide-number bg-brand-red background-image: url("images/USydLogo-black.svg") background-size: 200px background-position: 2% 90% .column.white[.content[ <br><br><br> # Nested Designs ## .black[STAT3022 Applied Linear Models Lecture 27] <br><br><br> ### .black[2020/02/20] ]] .column.bg-brand-charcoal[.content.white[ ## Today 1. Pseudo-, false-, technical-, biological- replication 2. Skeleton Analysis of Variance 3. Split-Plot Design 4. Nested Designs ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # John Nelder ]] .row[.split-70[ .column[.content[ * John Nelder (1924-2010) was the head of the statistics department at Rothamsted Experimental Station from 1968-1984. * He was the major developer of the statistical program Genstat which is still in active development by VSN (although R has largely become the preferred statistical language) ]] .column.bg-brand-gray[.content[ <img src="images/JohnNelder.gif" width="100%" height="100%"> ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Two factors: crossed vs nested/hierarchical factors ]] .row[.split-50[ .column[.content[ #### .brand-blue[Complete block design (factor B as block)] <img src="images/CrossFactor.png" width="100%" height="100%"> * In the pollen example, A is Depth, B is Sample and 2 replicates. ]] .column.bg-brand-gray[.content[ #### .brand-blue[Two-stage nested design or hierarchical design] <img src="images/NestFactor.png" width="100%" height="100%"> * Levels of B are similar but not identical for different levels of A; * .brand-blue[Balanced nested design:] an equal number of levels of B within each level of A and an equal number of repeats. * In the pollen example, A is Depth, B is Sample.id and 2 replicates but the 6 samples are not identical, so nested design. * Otherwise, CBD. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Recall: Sum of Squares decomposition and ANOVA ]] .row[.split-50[ .column[.content[ `$$Y_{ijk} = \mu + \alpha_i + (\alpha\beta)_{j(i)} + \epsilon_{ijk},\quad \epsilon_{ijk} \sim NID(0,\sigma^2),$$` <img src="images/NestSS.png" width="100%" height="100%"> * Absorb `\(\text{SS}_{\text{B}}\)` into `\(\text{SS}_{\text{A:B}}\)` to get pooled `\(\text{SS}_{\text{A:B}}\)` or `\(\text{SS}_{\text{B(A)}}\)`. `$$(\alpha\beta)'_{ij}=\color{blue}{\beta_j}+(\alpha\beta)_{ij}\hspace{100mm}$$` `$$=\color{blue}{\bar{Y}_{\bullet j \bullet}-\bar{Y}_{\bullet \bullet \bullet}}+\bar{Y}_{ij\bullet}-\bar{Y}_{i\bullet \bullet}-\bar{Y}_{\bullet j\bullet}+\bar{Y}_{\bullet \bullet \bullet}=\bar{Y}_{ij\bullet}-\bar{Y}_{i\bullet \bullet}$$` In R, change all dots for `\(i\)`, `\(j\)` or `\(ij\)` subscripts to 1. ]] .column.bg-brand-gray[.content[ <img src="images/NestANOVA.png" width="100%" height="100%"> * Pooled B(A): df `\((b-1)+(a-1)(b-1)=a(b-1)\)` * Same 2-way ANOVA components but different merging or splitting. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example: Calf feeding ]] .row[.split-70[ .column[.content[ * Three feed treatments (B,Y,G) are compared on 24 calves, kept in 6 pens (1 to 6) with four calves per pen. Each feed is applied to 2 whole pens. Each calf is weighed individually. <img src="images/CalfFeeding.png" width="60%" height="60%"> * Treatments: three feed type. <br> Experimental unit: pen. <br> Observational unit: calf. * .brand-blue[4 by 6 data structure:] Feed B,Y,G each .brand-blue[nests] pen (1,6), (2,4), (3,5) and 4 repeats under each pen. ]] .column.bg-brand-gray[.content[ .brand-blue[Is the replication of the treatment 2 or 8?] |Source | df | |:-------|----| |**Pens Statum** | | |Feed | 2 | |EU Residual | 3 | |Calves = OU | 18| * 6 pens have 5 df. So the residual df is 5-2=3 or 3(2-1) for B(A). * 24 calves has 23 df. Residual df is 23-2-3=18 or 2(3)(4-1). * .brand-blue[Replication is 2] (2 pen each). * .brand-blue[Repetition is 2(4)=8]. ]] ]] --- layout: false class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Theory: Pseudo-replication ]] .row[.content[ .brand-blue[Pseudo-replication] or .brand-blue[false replication] describes a situation in which multiple measurements are taken from each experimental unit. ### Technical & biological replication In life-sciences, it is often important to make a distinction between technical and biological replication. * .brand-blue[Technical replication] occurs when several measurements are taken from the same biological material. Technical replication is always a pseudo-replication, eg `\(k=4\)` replicates in each pen. * .brand-blue[Biological replication] occurs when measurements are taken from several independent biological subjects, eg `\(j=2\)` pen for each feed. ]] --- layout: false class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # More examples of nested factor ]] .row[.content[ ### .brand-blue[Drug company to study product stability across sites] * Two manufacturing sites ( `\(t=2\)` ) * Three batches from each site ( `\(b=3\)`; 6 batches from different sites) * Ten tablets from each batch ( `\(r=10\)` ) <br> * Clearly the 6 batches are not identical across sites. * Technical replication is the 10 tablets from each batch (30 repetitions). * Biological replication is the 3 batches from each site (3 replications). ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example: Calf feeding (continued) ]] .row[.split-80[ .column[.content[ ### Hypothesis Testing `$$Y_{ijk} = \mu + \alpha_i + (\alpha\beta)_{j(i)} + \epsilon_{ijk},\quad \epsilon_{ijk} \sim NID(0,\sigma^2), \ i=1,2,3, \ j=1,2, \ k=1,2,3,4$$` where <br> `\(\alpha_i\)` is the `\(i\)`-th level main effect for feed treatment; <br> `\((\alpha\beta)_{ij}=(\alpha\beta) _{j(i)}\)` is the (random) pen effect, and <br> `\(\epsilon_{ijk}\)` is the (OU) error * To test `\(H_0: \alpha_1=\alpha_2=\alpha_3\)`, we can use the test statistic `\(\text{MS(Feed)/MS(Residual)}\)`. * Which `\(\text{MS(Residual)}\)`? <br>Some people wrongly compare `\(\text{MS(Feed)}\)` to `\(\text{MS(OU Residual)}\)` * Now we consider splitting of RSS into different components: <br> .brand-blue[EU residuals (groups):] `\(a(b-1)\)` df out of `\(ab-1\)` df for EU groups and <br> .brand-blue[OU residuals:] `\(ab(r-1)\)` df out of `\(abr-1\)` df for OU. ]] .column.bg-brand-gray[.content[ ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example: grafting on animals ]] .row[.split-60[ .column[.content[ * A surgeon was going to use .brand-blue[nine animals], of the same species, in an experiment. * He wanted to compare .brand-blue[three methods of grafting skin]. * He intended to use .brand-blue[three animals for each method]. * After the graft was complete he would take a sample of new skin from each animal. * He would then cut each sample into .brand-blue[20 (tiny) pieces] and use a precision instrument to measure the thickness of each piece. * Treatments? Methods of grafting skin<br> Experimental units? Animals <br> Observational units? Each piece of skin * .brand-blue[20 by 9 data structure:] 3 methods, with animals (1,2,3), (4,5,6) and (7,8,9) nested under each and 20 repeats under each animal. ]] .column.bg-brand-gray[.content[ <img src="images/grafting.png" width="100%" height="100%"> |Source | df | |:-------|----| |**Animal=EU Statum** | | | Grafting | 2 | | EU Residual | 6 | |**Sample Stratum** | | |OU Residual | 171 | `\(t=3\)` methods are nested under <br> `\(b=9\)` animals (like Sample.id) with `\(3(3-1)=6\)` or 8-2=6 df. <br> `\(n=9\times 20=180\)` animals have residual df 3(3)(20-1)=171 or 179-2-6=171 df. ]] ]] --- layout: false class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - grafting on animals: simulation I ]] .row[.content[ ```r set.seed(1) graft <- factor(rep(LETTERS[1:3], each=3*20)) animal <- factor(rep(LETTERS[1:9], each=20)) anidev <- rnorm(9, 0, 20) trt <- c(300, 320, 350) y <- trt[as.numeric(graft)] + anidev[as.numeric(animal)] + rnorm(9*20, 0, 5) anova(lm(y ~ graft+animal)) #if swap order, graft is absorbed into animal ``` ``` Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) graft 2 145394 72697 3312.84 < 2.2e-16 *** animal 6 28346 4724 215.29 < 2.2e-16 *** Residuals 171 3752 22 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` * Notice that F-value for graft is computed as `\(\text{MS(graft)/ MS(OU Residual)}\)` rather than `\(\text{MS(graft)/ MS(animal)}\)`! * In this case the conclusion you draw is the same (small p-value) but what if the treatment effect is not signifcantly different? ]] --- layout: false class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - grafting on animals: simulation II ]] .row[.content[ ```r set.seed(1); trt <- c(300, 300, 300) y <- trt[as.numeric(graft)] + anidev[as.numeric(animal)] + rnorm(9*20, 0, 5) anova(lm(y ~ graft + animal)) ``` ``` Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) graft 2 13168.6 6584.3 310.21 < 2.2e-16 *** animal 6 27857.8 4643.0 218.75 < 2.2e-16 *** Residuals 171 3629.5 21.2 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` We get a signifcant p-value for graft here but if we compared with the EU Residual as opposed to the OU Residual we have ```r 1 - pf(6584.3 / 4643.0, 2, 6) ``` ``` [1] 0.3130785 ``` which is not signifcant! ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - grafting on animals: the appropriate way ]] .row[.split-50[ .column[.content[ ```r summary(aov(y ~ graft + Error(animal))) ``` ``` Error: animal Df Sum Sq Mean Sq F value Pr(>F) graft 2 13169 6584 1.418 0.313 Residuals 6 27858 4643 Error: Within Df Sum Sq Mean Sq F value Pr(>F) Residuals 171 3629 21.23 ``` ]] .column.bg-brand-gray[.content[ Residuals are splitted into two sources: * 9 animals with `\(\text{SS}_{\text{B(A)}}=27858\)` at `\(3(3-1)=6\)` df. * the usual `\(\text{RSS}\)` for the 20 pieces .brand-blue[within] each animals at 171 df. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Split-plot designs ]] .row[.split-60[ .column[.content[ * A standard .brand-blue[split-plot design] consists of two crossed factors with one factor randomly applied to the large units (main plots) and the second factor randomly applied within large units to the small units (subplots). * Split-plot design in agriculture refers to fields being splitted into whole-plots and subplots. * Split-plot design (or more generally nested designs) arise .brand-blue[due to practical constraint] rather than being .brand-blue[better] designs. * Simplified, but sound advice is to avoid using split-plot designs unless they are essential for practical reasons. * Constructing a .brand-blue[skeleton analysis of variance] can help to identify issues with the design. ]] .column.bg-brand-gray[.content[ <img src="images/SplitPlot.jpg" width="90%" height="90%"> ]] ]] --- layout: false class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example: Irrigation `\(\times\)` Fertiliser ]] .row[.content[ Suppose you have 8 plots laid out in a 2 `\(\times\)` 4 rectangular layout. You want to test the effect of irrigation amount (2 levels, L in white and H in blue) and fertiliser type (A and B) on crop yield. <img src="images/Irrigation.png" width="80%" height="80%"> * .brand-blue[Design 1:] 8 plots. * .brand-blue[Design 2:] 4 whole plots (df=3) of different irrigation and each contains 2 subplots of 2 fertilisers. * .brand-blue[Design 3:] 2 whole plots (df=1) of 2 irrigations and each contains 4 subplots of different fertilisers. ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example: Skeleton Analysis of Variance ]] .row[.split-three[ .column[.content[ <img src="images/Plot1.png" width="50%" height="50%"> | Source | df | |:----|----| | Fertilizer=Irrigation | 1| | Residual | 6| * 8 plots. ]] .column.bg-brand-gray[.content[ <img src="images/Plot2.png" width="50%" height="50%"> |Source | df | |:----|----| |**Whole plot=EU Stratum** | | | Irrigation | 1 | | EU residual | 2 | |**Subplot=OU Stratum**| | | Fertilizer | 1 | | Fertilizer : Irrigation | 1 | | OU residual| 2 | * 4 wholeplots each 2 subplots. ]] .column[.content[ <img src="images/Plot3.png" width="50%" height="50%"> |Source | df | |:----|----| |**Whole plot=EU Stratum**| | | Irrigation | 1| | EU residual | 0| |**Subplot=OU Stratum**| | | Fertilizer | 1 | | Fertilizer : Irrigation | 1 | | OU residual | 4| * 2 wholeplots each 4 subplots. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example: setting wholeplot ]] .row[.split-three[ .column[.content[ <img src="images/Plot1.png" width="50%" height="50%"> Fertiliser confounded with irrigation. ]] .column.bg-brand-gray[.content[ <img src="images/Plot2.png" width="50%" height="50%"> * Similar to RCBD, there are 4 blocks/wholeplots with different irrigation and arrangements. * Each wholeplot has 2 subplots/ fertilisers at different irrigations repeated with AB or BA. * .brand-blue[2 by 4 data structure:] Fertilisers A and B each .brand-blue[nests] irrigation L and H; the 4 col/wholeplots each has two repeats for AB or BA. ]] .column[.content[ <img src="images/Plot3.png" width="50%" height="50%"> * Why not divide the whole plot into two with each with 2 fertilisers. <img src="images/Plot4.png" width="50%" height="50%"> * This is similar to design 2. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Seed growth ]] .row[.split-70[ .column[.content[ * Each of 2 cabinets at 10 `\(^o\)`C and at 20 `\(^o\)`C, contains 8 seed trays, with 2 watering regimes (A and B), each applied to four trays chosen at random within each cabinet. The experiment was repeated with another 2 cabinets. The average growth of the seeds in each tray are measured. * Treatment? Watering regimes (A and B) <br> Experimental unit? Cabinet (temp and experiment combinations) <br> Observational unit? Tray * 2 factors: Water and Temp. * .brand-blue[8 by 4 data structure:] water A and B each nests temp 10 and 20, each 8 rows, 4 repeats from expt 1 and another 4 repeats from expt 2. * Source of errors for water (2 by 2 at 8 repeats): <br> for each expt: across cabinet, temp `\(\text{SS}_{\text{B}}\)` at 2-1=1df, <br> for each expt: across cabinet by tray, water `\(\text{SS}_{\text{A}}\)` at 2-1=1df, `\(\text{SS}_{\text{A:B}}\)` at 1df and `\(\text{RSS}_1\)` at `\(ab(r'-1)\)`=2(2)(4-1) or 15-1-1-1=12df (4 repeats), and <br> across expts `\(\text{RSS}_2\)` at 2(8)=16 df. The 2 RSS have `\(ab(r-1)\)` =2(2)7=28df. ]] .column.bg-brand-gray[.content[ <img src="images/Experiment1.png" width="80%" height="90%"> <img src="images/Experiment2.png" width="80%" height="90%"> ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Two-way ANOVA factorial design ]] .row[.split-50[ .column[.content[ .scroll-box-20[ ```r dat ``` ``` Water Temp Experiment Cabinet Tray Length 1 A 10 1 1 1 5.811440 2 B 10 1 1 2 13.681228 3 A 10 1 1 3 6.014650 4 B 10 1 1 4 11.280688 5 A 10 1 1 5 8.167450 6 B 10 1 1 6 12.879721 7 A 10 1 1 7 3.261500 8 B 10 1 1 8 6.889271 9 A 20 1 2 1 11.328180 10 B 20 1 2 2 15.930219 11 A 20 1 2 3 8.703601 12 B 20 1 2 4 18.204390 13 A 20 1 2 5 10.629647 14 B 20 1 2 6 17.399540 15 A 20 1 2 7 10.854487 16 B 20 1 2 8 17.832480 17 A 10 2 1 1 3.711876 18 B 10 2 1 2 6.257596 19 A 10 2 1 3 4.965976 20 B 10 2 1 4 10.704009 21 A 10 2 1 5 3.182047 22 B 10 2 1 6 7.450374 23 A 10 2 1 7 2.440632 24 B 10 2 1 8 11.794371 25 A 20 2 2 1 10.715825 26 B 20 2 2 2 14.647179 27 A 20 2 2 3 8.482507 28 B 20 2 2 4 14.759839 29 A 20 2 2 5 4.423625 30 B 20 2 2 6 13.929103 31 A 20 2 2 7 6.683996 32 B 20 2 2 8 14.747170 ``` ] <br> * Temp is the same as Cabinet. ]] .column.bg-brand-gray[.content[ ### With interaction ```r summary(aov(Length ~ Water*Temp + Error(Cabinet/Tray/Experiment), data=dat)) ``` ``` Error: Cabinet Df Sum Sq Mean Sq Temp 1 203.9 203.9 Error: Cabinet:Tray Df Sum Sq Mean Sq F value Pr(>F) Water 1 306.34 306.34 133.380 7.4e-08 *** Water:Temp 1 4.69 4.69 2.041 0.179 Residuals 12 27.56 2.30 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Error: Cabinet:Tray:Experiment Df Sum Sq Mean Sq F value Pr(>F) Residuals 16 115.7 7.232 ``` ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Two-way ANOVA factorial design ]] .row[.split-50[ .column[.content[ ### Decomposition of df |Source | df | df | |:----|----|----| |**Whole plot=EU Cabinet:Tray** | | | | Temp `\((b-1)\)` | 1 | | | Water `\((a-1)\)` | 1 | | | Water : Temp `\((a-1)(b-1)\)`| 1 | | | EU residual `\(ab(r'-1)\)` | 12 | | | Sum `\(abr'-1\)` | | 15 | |**Subplot=OU Cabinet:Tray:Experiment** | | | | OU residual `\(abr-1-(abr'-1)=ab(r-r')\)` | 16 | | | Sum | | 16 | | Total `\(abr-1\)` | | 31 | Water: `\(a=2\)` levels; Temp: `\(b=2\)` levels; <br> Repeat for EU: `\(r'=4\)`; Repeat for OU: `\(r=8\)` ]] .column.bg-brand-gray[.content[ ### Only main effects ```r summary(aov(Length ~ Water+Temp + Error(Cabinet/Tray/Experiment), data=dat)) ``` ``` Error: Cabinet Df Sum Sq Mean Sq Temp 1 203.9 203.9 Error: Cabinet:Tray Df Sum Sq Mean Sq F value Pr(>F) Water 1 306.34 306.34 123.5 5.22e-08 *** Residuals 13 32.25 2.48 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Error: Cabinet:Tray:Experiment Df Sum Sq Mean Sq F value Pr(>F) Residuals 16 115.7 7.232 ``` * Now the interaction term `\(\text{SS}_{\text{A:B}}\)` goes to `\(\text{RSS}_1\)` for the 16 Cabinet:Tray combinations within each experiment. ]] ]]