class: split-70 with-border hide-slide-number bg-brand-red background-image: url("images/USydLogo-black.svg") background-size: 200px background-position: 2% 90% .column.white[.content[ <br><br><br> # Two-way ANOVA Part II ## .black[STAT3022 Applied Linear Models Lecture 19] <br><br><br> ### .black[2020/02/20] ]] .column.bg-brand-charcoal[.content.white[ ## Today * Testing interaction and main effects * Interaction plots * Balanced / non balanced designs ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Test for interaction effects ]] .row[.split-two[ .column[.content[ ### Decomposing TSS given `\(n_{ij} \equiv r\)` For the interaction model `$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$$` we have seen that the TSS can be decomposed into three components, `$$\text{TSS} = \text{SS}_A + \text{SS}_B +\text{SS}_{AB}.$$` Thus, `$$\text{SS}_{AB} = \text{TSS} - \text{SS}_{A} - \text{SS}_B,$$` and `\begin{eqnarray} \text{df}_{AB} &=& \text{df}_{\text{Trt}} - \text{df}_A - \text{df}_B \\ &=& (ab - 1) - (a - 1) - (b - 1)\\ &=& ab - a - b + 1 \\ &=& (a - 1)(b - 1). \end{eqnarray}` ]] .column.bg-brand-gray[.content[ ### Test for interaction effects `$$H_0\,:\, (\alpha\beta)_{ij} = 0 \;\;\forall i,j\quad\text{vs.}\quad H_1: \text{not }H_0$$` * Recall `$$\text{df}_{\text{RSS}_{H_1}} = n - ab = abr - ab.$$` * Hence, using the general F-test we have `$$f =\frac{\text{SS}_{AB} / (a-1)(b-1)}{\text{RSS}_{H_1}/(ab(r-1))} \sim F_{(a-1)(b-1),ab(r-1)}$$` under `\(H_0\)`. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Frozen beans ]] .row[.split-two[ .column[.content[ * Data from a study on the ascorbic acid losses in snap beans * stored at 3 temperatures * for 4 periods, each two weeks longer than the preceding. * The beans were all harvested and processed *under uniform Temps*. * `\(n_{ij} =3\)` packages were assigned at random to each of the `\(t = ab=12\)` treatments. * All packages were randomly allocated to positions in the freezers. (This is a `\(3\times4\)` balanced factorial design with quantitative factors.) ]] .column.bg-brand-gray[.content[ The .brand-blue[sum of 3] ascorbic acid determinations (mg/100g) for snap beans by treatment combination is Temp `\((^{\circ}F)\)` | Wk 2 | Wk 4 | Wk 6 | Wk 8 | Total ------------- | ------------- | ------------- | ------------- | ------------- | ------------- 0 | 45 | 47 | 46 | 46 | 184 10 | 45 | 43 | 41 | 37 | 166 20 | 34 | 28 | 21 | 16 | 99 **Totals** | **124** | **118** | **108** | **99** | **449** `\((^{\circ}\)`C = `\(\frac{5}{9}(^{\circ}\text{F} - 32)\)` hence, -17.8, -12.2 and -6.7 `\(^{\circ}\)`C, respectively) ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Main effects and interaction terms ]] .row[.split-two[ .column[.content[ If we consider it as a one-way ANOVA we have `$$\text{TSS} = \sum_{i=1}^a\sum_{j=1}^b n_{ij} (\bar{Y}_{ij\bullet} - \bar{Y}_{\bullet\bullet\bullet})^2 = \sum_{i=1}^a\sum_{j=1}^b \frac{Y^2_{ij\bullet}}{n_{ij}} - \frac{Y^2_{\bullet\bullet\bullet}}{n}.$$` * `\(a=3\)`, `\(b=4\)`, `\(n_{ij} = n_k \equiv r =3\)` and `\(n=abr = 36\)`, $$\text{TSS} = \left( \frac{45^2}{3} + \ldots + \frac{21^2}{3} + \frac{16^2}{3} - \frac{449^2}{36}\right) = 408.98\quad(11 \text{ df}) $$ * Assume we know that the `\(\text{RSS} = 16.94\)` with `\(n-ab=36-12=24\)` df. * `\(f^* = (408.98/11)/(16.94/24) = 52.67\)` * `\(p=(F_{11,24}\geq 52.7) < 0.001\)` We have strong evidence that not all .brand-blue[combinations] have the same effects on average. ]] .column.bg-brand-gray[.content[ ### Decomposing TSS given `\(n_{ij} \equiv r\)` If `\(n_{ij} \equiv r\)` then the treatment SS consists of the following three independent components: `\begin{eqnarray*} \text{Factor A} \text{ SS} &=& \text{SS}_A = \sum_{i=1}^a \frac{Y^2_{i\bullet\bullet}}{br} - \frac{Y^2_{\bullet\bullet\bullet}}{n},\\ \text{Factor B} \text{ SS} &=& \text{SS}_B =\sum_{j=1}^b \frac{Y^2_{\bullet j \bullet}}{ar} - \frac{Y^2_{\bullet\bullet\bullet}}{n} \end{eqnarray*}` The third component is the AB Interaction SS: `$$\text{SS}_{AB} = \text{TSS} - \text{SS}_{A} - \text{SS}_B .$$` The `\(\text{RSS}\)` for `\(H_0: \ M_{A+B}\)` vs `\(H_1: \ M_{AB}\)` is `\begin{eqnarray*} \text{RSS}_{H_0} &=& \underbrace{S_{yy}}_{\text{total SS} } - \text{SS}_A - \text{SS}_B \\ &=& (\text{RSS}_{H_1} + \color{blue}{\text{TSS}}) \color{blue}{ - \text{SS}_A - \text{SS}_B}=\text{RSS}_{H_1} + \color{blue}{\text{SS}_{AB}} \end{eqnarray*}` ]] ]] --- layout: false class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Decomposition of total variation in `\(Y\)` ]] .row[.content[ <img src="images/FTest4.png" width="80%" height="60%"> ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Beans: Testing for interaction effects ]] .row[.split-two[ .column[.content[ * The main effect sum of squares for Temperature (A) and Period (B) are `\begin{eqnarray*} \text{SS}_A &=& \frac{184^2}{12} + \frac{166^2}{12} + \frac{99^2}{12} - \frac{449^2}{36} = 334.39\\ \text{SS}_B &=& \frac{124^2}{9} + \frac{118^2}{9} + \frac{108^2}{9} + \frac{99^2}{9} - \frac{449^2}{36} = 40.53 \\ \text{SS}_{AB} &=& 408.98 - 334.39 - 40.53 = 34.06 \ \text{with 2(3)=6 df}.\\ \\ f &=&\frac{\text{SS}_{AB} / 6}{\text{RSS}_{H_1}/24} = \frac{34.06 / 6}{16.94/24} =8.0 \\ \\ & &\Rightarrow p = P(F_{6,24}\geq 8) = 0.00008 \end{eqnarray*}` * The interaction terms must be kept in the model. * There is little point in testing main effects as both A and B factors must be maintained in the model. ]] .column.bg-brand-gray[.content[ * .brand-blue[Interaction plots] are a straightforward visualization of interaction effects. * Plot the average response of factor A at each level of factor B (or vice versa). .brand-blue[Example: Interaction plot by "hand"] ```r y00 <- c(45,47,46,46)/3 y10 <- c(45,43,41,37)/3 y20 <- c(34,28,21,16)/3 x <- c(2,4,6,8) beans <- data.frame( Means=c(y00, y10, y20), Week=rep(x, times=3), Temp=rep(c("0F", "10F", "20F"), each=4)) head(beans,3) ``` ``` Means Week Temp 1 15.00000 2 0F 2 15.66667 4 0F 3 15.33333 6 0F ``` ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Interaction plots ]] .row[.split-two[ .column[.content[ ```r ggplot(beans, aes(Week, Means,color=Temp)) + geom_point(size=3) + geom_line() ``` <img src="lecture19_2020JC_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> ]] .column.bg-brand-gray[.content[ ### Beans: Remarks * We see that the average acid level decreases with storage period. * But the *rate of decrease* varies with temperature. * At `\(0^{\circ}\)` there is effectively no decrease. ### No interaction effects If there are no interaction effects, the lines for each level of factor A (Temp) has (up to random noise) the .brand-blue[*same trend*] across levels of B and should look .brand-blue[parallel]. .brand-blue[Cross-over] of lines indicates potential interaction effect. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Example - Dressing percentages in swine ]] .row[.split-30[ .column[.content[ * Dressing percentages (less 70%) is the percentage of the live animal that ends up as carcass * `\(n = 24\)` swine. * Swine are classified by * sex (2 levels), i.e. `\(a=2\)` * breed (4 levels), i.e. `\(b=4\)`. ```r head(dat) ``` ``` dress sex breed 1 13.3 1 1 2 18.2 2 1 3 10.9 1 2 4 14.3 2 2 5 10.3 1 3 6 12.8 2 3 ``` ]] .column.bg-brand-gray[.content[ ```r dat <- read.table("data/dressing.txt",header=TRUE) %>% mutate(sex=factor(sex),breed=factor(breed)) skimr::skim(dat) ``` <img src="images/L19Skim.png" width="90%" height="90%"> ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Test for interaction terms ]] .row[.split-50[ .column[.content[ ```r anova(lm(dress ~ breed*sex, data=dat)) ``` ``` Analysis of Variance Table Response: dress Df Sum Sq Mean Sq F value Pr(>F) breed 3 31.751 10.5838 1.6203 0.2241 sex 1 3.920 3.9204 0.6002 0.4498 breed:sex 3 13.605 4.5349 0.6942 0.5689 Residuals 16 104.513 6.5321 ``` * When making interaction plots, the factor on the `\(x\)`-axis is interchangeable. * However, it is generally better to put the factor with more levels on .brand-blue[x-axis] and the other factor for .brand-blue[color]. * Notice here the line for each `sex` factor does not have the same .brand-blue[trend] due in particular to `breed` 4, but note that the variability of the response for `breed` 4 male is large. ]] .column.bg-brand-gray[.content[ ### Interaction plot with `ggplot2` overlayed with boxplot ```r ggplot(dat, aes(breed, dress, color=sex)) + geom_boxplot() + stat_summary(fun.y=mean, geom="point", aes(group=sex), size=3) + stat_summary(fun.y=mean, geom="line", aes(group=sex)) ``` <img src="lecture19_2020JC_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Interaction plots using a simple command ]] .row[.split-50[ .column[.content[ ```r interaction.plot(sex,breed,dress) title("Interaction plot across sex") ``` <img src="lecture19_2020JC_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> ]] .column.bg-brand-gray[.content[ ```r interaction.plot(breed,sex,dress) title("Interaction plot across breed") ``` <img src="lecture19_2020JC_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" /> ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Test for main effects ]] .row[.split-two[ .column[.content[ ```r anova(lm(dress ~ breed*sex, data=dat)) ``` ``` Analysis of Variance Table Response: dress Df Sum Sq Mean Sq F value Pr(>F) breed 3 31.751 10.5838 1.6203 0.2241 sex 1 3.920 3.9204 0.6002 0.4498 breed:sex 3 13.605 4.5349 0.6942 0.5689 Residuals 16 104.513 6.5321 ``` ```r anova(lm(dress ~ breed + sex, data=dat)) ``` ``` Analysis of Variance Table Response: dress Df Sum Sq Mean Sq F value Pr(>F) breed 3 31.751 10.5838 1.7025 0.2003 sex 1 3.920 3.9204 0.6306 0.4369 Residuals 19 118.118 6.2167 ``` ]] .column.bg-brand-gray[.content[ * If all interaction terms can be set to 0 we get the main effect model. * In this case we can then test for A and B main effects. E.g. `\(H_0 : \alpha_1 = \ldots = \alpha_a\)` use `\begin{eqnarray*} \frac{\text{SS}_A/(a-1)}{\color{red}{\text{RSS}_{H_1} / (ab(r-1))}} \sim F_{a-1, \color{red}{ab(r-1)}} \quad\text{or}\\ \\ \frac{\text{SS}_A/(a-1)}{\color{red}{\text{RSS}_{H_0} / (n-a-b +1)}} \sim F_{a-1, \color{red}{n-a-b+1}} \end{eqnarray*}` * Both tests are formally correct and the first one is used before. * The .brand-blue[*latter*] one is .brand-blue[*preferred*] if we believe that there is .brand-blue[*no interaction term*] in the model because it provides more df for the estimate of the error variance `\(\sigma^2\)`. ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Swine: Main effects models ]] .row[.split-two[ .column[.content[ * A design is called .brand-blue[complete] if all treatments is observed (within each block/strata). * A design is called .brand-blue[balanced] if the precision of all treatment comparisons is equal, in this course, this is the equivalent as the **same number of experimental units are observed at each treatment**. * A complete and balanced design is referred to as an .brand-blue[orthogonal] design. ```r with(dat, table(sex, breed)) # ``` ``` breed sex 1 2 3 4 1 3 3 3 3 2 3 3 3 3 ``` * There is no empty cell and all cells have the same count. The design is balanced and complete! ]] .column.bg-brand-gray[.content[ Order of the main effects do not change the result: ```r anova(lm(dress ~ sex + breed, dat)) ``` ``` Analysis of Variance Table Response: dress Df Sum Sq Mean Sq F value Pr(>F) sex 1 3.920 3.9204 0.6306 0.4369 breed 3 31.751 10.5838 1.7025 0.2003 Residuals 19 118.118 6.2167 ``` ```r anova(lm(dress ~ breed + sex, dat)) ``` ``` Analysis of Variance Table Response: dress Df Sum Sq Mean Sq F value Pr(>F) breed 3 31.751 10.5838 1.7025 0.2003 sex 1 3.920 3.9204 0.6306 0.4369 Residuals 19 118.118 6.2167 ``` ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Swine: Reduced data set ]] .row[.split-two[ .column[.content[ ```r dat2 <- dat[-c(14, 24), ] #drop the 14 & 24th obs anova(lm(dress ~ sex + breed, dat2)) #order make diff ``` ``` Analysis of Variance Table Response: dress Df Sum Sq Mean Sq F value Pr(>F) sex 1 21.348 21.3480 8.7692 0.008747 ** breed 3 20.225 6.7416 2.7692 0.073447 . Residuals 17 41.385 2.4344 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r *anova(lm(dress ~ breed + sex, dat2)) ``` ``` Analysis of Variance Table Response: dress Df Sum Sq Mean Sq F value Pr(>F) breed 3 22.382 7.4605 3.0646 0.05623 . sex 1 19.191 19.1912 7.8832 0.01211 * Residuals 17 41.385 2.4344 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ]] .column.bg-brand-gray[.content[ ### Non-orthogonal designs or missing values * If the design is .brand-blue[non-orthogonal] or certain observations are missing, then be careful about the order in which the factors are declared! * `y ~ A + B` produces the B main effect SS adjusted for A, i.e. `\(\text{SS}_{B|A}\)` and the A main effects SS ignoring B, i.e. `\(\text{SS}_{A}\)` as in a one-way ANOVA on A treatments alone. ```r *anova(lm(dress ~ breed, dat2)) #same SS for breed ``` ``` Analysis of Variance Table Response: dress Df Sum Sq Mean Sq F value Pr(>F) breed 3 22.382 7.4605 2.2168 0.1213 Residuals 18 60.577 3.3654 ``` ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Two-way ANOVA with missing values ]] .row[.split-two[ .column[.content[ ### Adjusted sum of squares ```r M1 <- lm(dress ~ sex, dat2) anova(M1) ``` ``` Analysis of Variance Table Response: dress Df Sum Sq Mean Sq F value Pr(>F) sex 1 21.348 21.3480 6.93 0.01596 * Residuals 20 61.610 3.0805 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` * `\(\text{SS}_A=21.348\)` due to A but ignoring B should **not** be used to test if the A main effects can be dropped from the model because it does not allow for the effect of B. ]] .column.bg-brand-gray[.content[ ```r M2 <- lm(dress ~ sex + breed, dat2) anova(M2) ``` ``` Analysis of Variance Table Response: dress Df Sum Sq Mean Sq F value Pr(>F) sex 1 21.348 21.3480 8.7692 0.008747 ** breed 3 20.225 6.7416 2.7692 0.073447 . Residuals 17 41.385 2.4344 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` Then `\(\text{SS}_{B|A}\)` (breed|sex) is given by ```r deviance(M1) - deviance(M2) # 61.610-41.385 ``` ``` [1] 20.22469 ``` ]] ]] --- class: split-10 .row.bg-brand-blue.white[.content.vmiddle[ # Why "orthogonal" design? ]] .row[.split-30[ .column[.content[ * Hence, in general the importance of one factor can depend on whether or not we have adjusted for another. * In essence, the pattern of factor levels is orthogonal if the SS attributable to one factor is the same whether or not the other factor has been included in the model. * This means that `\(\text{SS}_A\)` will be the same whether A appears first or second in ANOVA table or R code, respectively. ]] .column.bg-brand-gray[.content[ * Recall from lecture 18 on vector spaces for this data: ```r X0 <- model.matrix(~ 1, data=dat) XA <- model.matrix(~ -1 + breed, data=dat) XB <- model.matrix(~ -1 + sex, data=dat) PV0 <-X0 %*% solve(t(X0) %*% X0) %*% t(X0) PWA <- XA %*% solve(t(XA) %*% XA) %*% t(XA) - PV0 PWB <- XB %*% solve(t(XB) %*% XB) %*% t(XB) - PV0 ``` * The data is .brand-blue[orthogonal] because `\(W_A \perp W_B\)` or `\(\mathbf{P}_{W_A}\mathbf{P}_{W_B} = \mathbf{0}\)` .scroll-box-10[ ```r PWA %*% PWB ``` ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ``` ]]] ]]