class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[

<br><br><br>

# Higher order models

## .black[STAT3022 Applied Linear Models Lecture 30]

<br><br><br>

### .black[2020/02/20]

]]
.column.bg-brand-charcoal[.content.white[

## Today

1. 3 factor models
2. Model formulae in R
3. `\(2^n\)` factorial design
4. Confounding

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Theory - three factor models
]]

.row[.split-40[
.column[.content[

* Latin square designs allow us to test for three factors: treatment, block, and position.
* Assume we have factor A at `\(a\)` levels, factor B at `\(b\)` levels, and factor C at `\(c\)` levels, with `\(r\)` replicates of each (A,B,C) combination.
* Then the treatment SS for the `\(abc\)` treatment combinations has `\(\text{df}_{\text{TSS}} = (abc-1)\)`.
* This can be decomposed into
    * `\((a-1) + (b-1) + (c-1)\)` df for main effects and
    * the remaining df for interaction terms.

]]
.column.bg-brand-gray[.content[

### The interaction model

* Let `\(Y_{ijkl}\)` be the `\(l\)`-th observation of factor level combination `\(ijk, \ l = 1, \dots, n_{ijk}\)`.

`$$\text{M}_{\text{ABC}}:\,Y_{ijkl}=\mu_{ijk}+\epsilon_{ijkl},\,i=1,\dots,a,j=1,\dots,b,k=1,\dots,c,$$`

* where `\(\mu_{ijk}\)` denotes the mean at the `\(i\)`th level of A, `\(j\)`th level of B and `\(k\)`th level of C,
* `\(\mu_{ijk}=\mu+\alpha_i+\beta_j+\gamma_k+(\alpha\beta)_{ij}+(\alpha\gamma)_{ik}+(\beta\gamma)_{jk}+(\alpha\beta\gamma)_{ijk}.\)`
* For such a model we can calculate
    * the TSS with `\((abc-1)\)` df
    * the RSS with `\(n-abc\)` df, where `\(n=\sum_{ijk} n_{ijk}\)`
    * the F-test for the hypothesis of .brand-blue[overall] treatment effects, i.e.
`$$H_0\,:\,\mu_{111}=\mu_{112}=\ldots=\mu_{abc}$$`

]]
]]

---
layout: false
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# SS in the ANOVA table with `\(n_{ijk}=r\)` replicates
]]

.row[.content[

| Source of variation | df | Sum of squares |
|----|----|:----|
| A | `\(a-1\)` | `\(\text{SS}_{\text{A}}=\sum_{i=1}^a\frac{Y^2_{i\bullet\bullet\bullet}}{bcr}-\frac{Y^2_{\bullet\bullet\bullet\bullet}}{n}\)` |
| B | `\(b-1\)` | `\(\text{SS}_{\text{B}}=\sum_{j=1}^b \frac{Y^2_{\bullet j\bullet\bullet}}{acr}-\frac{Y^2_{\bullet\bullet\bullet\bullet}}{n}\)` |
| C | `\(c-1\)` | `\(\text{SS}_{\text{C}}=\sum_{k=1}^c \frac{Y^2_{\bullet\bullet k\bullet}}{abr}-\frac{Y^2_{\bullet\bullet\bullet\bullet}}{n}\)` |
| A `\(\times\)` B | `\((a-1)(b-1)\)` | `\(\text{SS}_{\text{A:B}}=\sum_{i=1}^a\sum_{j=1}^b\frac{Y^2_{ij\bullet\bullet}}{cr}-\frac{Y^2_{\bullet\bullet\bullet\bullet}}{n}-\text{SS}_{\text{A}}-\text{SS}_{\text{B}}\)` |
| A `\(\times\)` C | `\((a-1)(c-1)\)` | `\(\text{SS}_{\text{A:C}}=\sum_{i=1}^a\sum_{k=1}^c\frac{Y^2_{i\bullet k\bullet}}{br}-\frac{Y^2_{\bullet\bullet\bullet\bullet}}{n}-\text{SS}_{\text{A}}-\text{SS}_{\text{C}}\)` |
| B `\(\times\)` C | `\((b-1)(c-1)\)` | `\(\text{SS}_{\text{B:C}}=\sum_{j=1}^b\sum_{k=1}^c\frac{Y^2_{\bullet jk\bullet}}{ar}-\frac{Y^2_{\bullet\bullet\bullet\bullet}}{n}-\text{SS}_{\text{B}}-\text{SS}_{\text{C}}\)` |
| A `\(\times\)` B `\(\times\)` C | `\((a-1)(b-1)(c-1)\)` | `\(\text{SS}_{\text{A:B:C}}=\text{TSS}-\text{SS}_{\text{A}}-\text{SS}_{\text{B}}-\text{SS}_{\text{C}}-\text{SS}_{\text{A:B}}-\text{SS}_{\text{A:C}}-\text{SS}_{\text{B:C}}\)` |
| TSS | `\(abc -1\)` | `\(\text{TSS}=\sum_{ijk}\frac{Y^2_{ijk\bullet}}{r}-\frac{Y^2_{\bullet\bullet\bullet\bullet}}{n}\)` |
| RSS | `\(abc(r-1)\)` | `\(\text{RSS}=\sum_{ijkl} (Y_{ijkl}-\bar{Y}_{ijk \bullet})^2=\sum_{ijk} (n_{ijk}-1) s_{ijk}^2\)` or `\(\text{TotSS} - \text{TSS}\)` |
| Total SS | `\(abcr-1\)` | `\(\text{TotSS}=\sum_{ijkl} Y_{ijkl}^2 -\frac{Y^2_{\bullet\bullet\bullet\bullet}}{n}\)` |

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Theory - Calculation of the effects
]]

.row[.split-50[
.column[.content[

* Effects are combinations of .brand-blue[cell averages], where a cell represents one of the `\(abc\)` factor level combinations.
* Under the sum constraint we have

`$$\alpha_i=\bar{\mu}_{i\bullet\bullet}-\bar\mu_{\bullet\bullet\bullet}=\frac{1}{bc}\sum_{j=1}^b\sum_{k=1}^c\mu_{ijk}-\bar\mu_{\bullet\bullet\bullet}\hspace{30mm}$$`

`$$(\alpha\beta)_{ij}=\bar{\mu}_{ij\bullet}-\bar{\mu}_{i\bullet\bullet}-\bar{\mu}_{\bullet j \bullet}+\bar{\mu}_{\bullet\bullet\bullet}\hspace{100mm}$$`

`$$(\alpha\beta\gamma)_{ijk}\hspace{-1mm}=\hspace{-1mm} {\mu}_{ijk}\hspace{-1mm}-\hspace{-1mm}\bar{\mu}_{ij\bullet}\hspace{-1mm}-\hspace{-1mm}\bar{\mu}_{i\bullet k}\hspace{-1mm}-\hspace{-1mm}\bar{\mu}_{\bullet jk}\hspace{-1mm}+\hspace{-1mm}\bar{\mu}_{i\bullet\bullet}\hspace{-1mm}+\hspace{-1mm}\bar{\mu}_{\bullet j\bullet}\hspace{-1mm}+\hspace{-1mm}\bar{\mu}_{\bullet\bullet k}-\bar{\mu}_{\bullet\bullet\bullet}.$$`

* Note the alternating signs!
* Under the treatment constraint with baseline `\(i,j,k=1\)`, replace every dot (an averaged-over index) by the baseline level 1, e.g. `\(\alpha_i=\mu_{i11}-\mu_{111}\)`.

]]
.column.bg-brand-gray[.content[

In practice, we may not have many replicates, often 1 or 2, or even some empty cells.

* So to test main effects and low order interactions we may have to assume certain high order interactions are negligible.
* Combine those SS with the residual SS to obtain the residual SS for an initial model.
* The df are adjusted accordingly.

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Paper planes
]]

.row[.split-40[
.column[.content[

* An experiment was conducted to determine the effect of 3 factors on the travel distance of differently designed paper planes. To eliminate the effect of wind, it was run in a hallway of QUT. See <a href="http://www.statsci.org/data/oz/planes.html">here</a>.
* .brand-blue[Paper] weight at 2 levels (50gms, 80gms),
* .brand-blue[Angle] at 2 levels (Horizontal = 0, 45 degrees = 45),
* .brand-blue[Design] at 2 levels (High-performance = A, Incredibly simple = S).

]]
.column.bg-brand-gray[.content[

<br>

| Paper | 50gms | 80gms |
|----|----|----|
| Angle | 0 `\(\hspace{35mm}\)` 45 | 0 `\(\hspace{35mm}\)` 45 |
| Design | A `\(\hspace{12mm}\)` S `\(\hspace{13mm}\)` A `\(\hspace{13mm}\)` S | A `\(\hspace{12mm}\)` S `\(\hspace{12mm}\)` A `\(\hspace{12mm}\)` S |
| | 6520 `\(\hspace{2mm}\)` 2130 `\(\hspace{3mm}\)` 6348 `\(\hspace{3mm}\)` 2730 | 2160 `\(\hspace{2mm}\)` 4596 `\(\hspace{2mm}\)` 3854 `\(\hspace{2mm}\)` 5088 |
| | 4091 `\(\hspace{2mm}\)` 3150 `\(\hspace{3mm}\)` 4550 `\(\hspace{3mm}\)` 2585 | 1511 `\(\hspace{2mm}\)` 3706 `\(\hspace{2mm}\)` 1690 `\(\hspace{2mm}\)` 4255 |
| Total | 10611 `\(\hspace{1mm}\)` 5280 `\(\hspace{1mm}\)` 10898 `\(\hspace{1mm}\)` 5315 | 3671 `\(\hspace{2mm}\)` 8302 `\(\hspace{2mm}\)` 5544 `\(\hspace{2mm}\)` 9343 |

* Two observations are taken for each treatment combination.
* All 16 observations were made in a completely randomised order.

.brand-blue[Remark:] This is a `\(2\times2\times 2=2^3\)` .brand-blue[balanced factorial design] with `\(r=2\)` replications.
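
The R code on the following slides works with a data frame `dat` holding the columns `Distance`, `Paper`, `Angle` and `Design`, but the slides do not show how it was created. The following is a minimal sketch of one way to rebuild it from the table above. The construction via `expand.grid()` and the helper names `design` and `cell_totals` are assumptions made for illustration, not the original data-loading code.

```r
# Sketch: rebuild the planes data from the table (construction is assumed).
# Table columns run left to right as Paper (50, 80) > Angle (0, 45) > Design (A, S),
# so Design varies fastest, which is the order expand.grid() produces.
design <- expand.grid(Design = c("A", "S"),
                      Angle  = c("0", "45"),
                      Paper  = c("50", "80"),
                      stringsAsFactors = TRUE)

# Two replicates per cell: first data row of the table, then the second.
dat <- rbind(design, design)
dat$Distance <- c(6520, 2130, 6348, 2730, 2160, 4596, 3854, 5088,
                  4091, 3150, 4550, 2585, 1511, 3706, 1690, 4255)

# Cell totals; these should reproduce the "Total" row of the table.
cell_totals <- tapply(dat$Distance,
                      list(dat$Design, dat$Angle, dat$Paper), sum)
cell_totals
```

Comparing `cell_totals` with the "Total" row is a quick way to confirm that the observations were assigned to the intended factor levels.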
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Planes auxiliary quantities, `\(Y_{1\bullet\bullet\bullet}\)` to `\(Y_{\bullet22\bullet}\)`
]]

.row[.split-50[
.column[.content[

```r
tapply(dat$Distance,list(dat$Paper,dat$Angle),sum)
```

```
       0    45
50 15891 16213
80 11973 14887
```

```r
tapply(dat$Distance,list(dat$Paper,dat$Design),sum)
```

```
       A     S
50 21509 10595
80  9215 17645
```

```r
tapply(dat$Distance,list(dat$Angle,dat$Design),sum)
```

```
       A     S
0  14282 13582
45 16442 14658
```

]]
.column.bg-brand-gray[.content[

```r
tapply(dat$Distance,dat$Paper,sum)
```

```
   50    80 
32104 26860 
```

```r
tapply(dat$Distance,dat$Angle,sum)
```

```
    0    45 
27864 31100 
```

```r
tapply(dat$Distance,dat$Design,sum)
```

```
    A     S 
30724 28240 
```

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Planes: ANOVA table
]]

.row[.split-80[
.column[.content[

```r
M1 <- aov(Distance ~ Paper*Angle*Design, data=dat)
summary(M1)
```

```
                   Df   Sum Sq  Mean Sq F value Pr(>F)   
Paper               1  1718721  1718721   1.638 0.2364   
Angle               1   654481   654481   0.624 0.4524   
Design              1   385641   385641   0.368 0.5611   
Paper:Angle         1   419904   419904   0.400 0.5446   
Paper:Design        1 23386896 23386896  22.294 0.0015 **
Angle:Design        1    73441    73441   0.070 0.7980   
Paper:Angle:Design  1    21025    21025   0.020 0.8909   
Residuals           8  8392178  1049022                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Only .brand-blue[Paper:Design] is significant.

]]
.column.bg-brand-gray[.content[

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Planes: Simplifying the model
]]

.row[.split-50[
.column[.content[

```r
summary(update(M1, .~.
- Paper:Angle:Design))
```

```
             Df   Sum Sq  Mean Sq F value   Pr(>F)    
Paper         1  1718721  1718721   1.839 0.208157    
Angle         1   654481   654481   0.700 0.424400    
Design        1   385641   385641   0.413 0.536700    
Paper:Angle   1   419904   419904   0.449 0.519544    
Paper:Design  1 23386896 23386896  25.018 0.000737 ***
Angle:Design  1    73441    73441   0.079 0.785590    
Residuals     9  8413203   934800                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
M2 <- aov(Distance ~ Paper*Design, data=dat)
summary(M2)
```

```
             Df   Sum Sq  Mean Sq F value   Pr(>F)    
Paper         1  1718721  1718721   2.157 0.167628    
Design        1   385641   385641   0.484 0.499861    
Paper:Design  1 23386896 23386896  29.353 0.000156 ***
Residuals    12  9561029   796752                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]]
.column.bg-brand-gray[.content[

### R formula and model equations

* Variables multiplied together in a formula stand for their interaction together with all lower order terms.<br> .brand-blue[A] `\(\color{blue}{\ast}\)` .brand-blue[B] `\(\color{blue}{\ast}\)` .brand-blue[C = A + B + C + A:B + B:C + C:A + A:B:C]
* Brackets can be expanded in a model formula in the expected manner.<br> .brand-blue[(A + B)] `\(\color{blue}{\ast}\)` .brand-blue[C = A + B + C + A:C + B:C]
* Consider the model with (mathematical) equation

`$$Y_{ijklm}\hspace{-1mm}=\hspace{-1mm}\mu+\alpha_i+\beta_j+\gamma_k+\delta_l+(\alpha\gamma)_{ik}+(\beta\gamma)_{jk}+\epsilon_{ijklm}$$`

The corresponding R formula is .brand-blue[y] `\(\color{blue}{\sim}\)` .brand-blue[(A + B)] `\(\color{blue}{\ast}\)` .brand-blue[C + D]

]]
]]

---
layout: false
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Planes: Diagnostics for full model
]]

.row[.content[

<img src="images/planesdiag1.png" width="65%" height="65%">

]]

---
layout: false
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Planes: Diagnostics for simplified model
]]

.row[.content[

<img src="images/planesdiag2.png" width="65%" height="65%">

]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# `\(2^n\)` factorial design
]]

.row[.split-50[
.column[.content[

* Factorial designs with `\(2^n\)` treatment combinations arise when we have `\(n\)` factors, each at 2 levels.
* These designs are often used in exploratory experiments (pilot studies) since many potential factors can be considered.
* Each factor is studied at two levels only, so that an effect, if one exists, is not masked.
* Denote the factors by A, B, C, `\(\ldots\)` and use `\(a\)`, `\(b\)`, `\(c, \ldots\)` for the treatment combinations.
* The presence of `\(a\)` denotes the high level of A and the absence of `\(a\)` denotes the low level of A.
* `\(1\)` denotes the low level of all factors.
* To list the combinations in order, introduce the factors one at a time: each new letter is added on its own and then combined with every label already in the list, before the next letter is introduced.

]]
.column.bg-brand-gray[.content[

### Example: `\(2^5\)` design and standard order

* Factors: A, B, C, D, E with high levels denoted by `\(a\)`, `\(b\)`, `\(c\)`, `\(d\)`, `\(e\)`
* `\(ade \equiv\)` high level of A, D and E combined with the low level of B and C
* `\(1\)` denotes that all five factors are at the low level
* In a `\(2^1\)` experiment the possible treatment combinations are: `\(\color{blue}{1,\, a}\)` (L, H).
* In a `\(2^2\)` experiment the possible treatment combinations are: `\(\color{blue}{1,\, a,\, b,\, ab}\)` (LL, HL, LH, HH).
* In a `\(2^3\)` experiment the possible treatment combinations are: `\(\color{blue}{1,\, a,\, b,\, ab,\, c,\, ac,\, bc,\, abc}\)` <br> (LLL, HLL, LHL, HHL, LLH, HLH, LHH, HHH) and this order is called .brand-blue[standard order].

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Yield of isatin derivative
]]

.row[.split-50[
.column[.content[

* A `\(2^4\)` single replicate design was used for a laboratory investigation of the yield of an isatin derivative prepared from a standard base material.
* The factors and their levels were:

| Acid strength | (A): | 87%, 93% |
|:----|----|:----|
| **Reaction time** | **(B):** | **15mins, 30mins** |
| **Amount of acid** | **(C):** | **35ml, 45ml** |
| **Reaction temperature** | **(D):** | `\(\boldsymbol{60}^\circ\)`C, `\(\boldsymbol{70}^\circ\)`C |

* The tests were made in random order.
* The yields, listed in standard order, were:

$$ 6.08\; 6.04\; 6.53\; 6.43\; 6.31\; 6.09\; 6.12\; 6.36 $$

$$ 6.79\; 6.68\; 6.73\; 6.08\; 6.77\; 6.38\; 6.49\; 6.23 $$

]]
.column.bg-brand-gray[.content[

```r
y=c(6.08,6.04,6.53,6.43,6.31,6.09,6.12,6.36,
    6.79,6.68,6.73,6.08,6.77,6.38,6.49,6.23)
# factor levels in standard order; each gl() call in the comments is equivalent
A=factor(c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)) #gl(2,1,16,labels=0:1)
B=factor(c(0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1)) #gl(2,2,16,labels=0:1)
C=factor(c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1)) #gl(2,4,16,labels=0:1)
D=factor(c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1)) #gl(2,8,16,labels=0:1)
M1=aov(y~A*B*C*D); summary(M1) #saturated model
```

```
        Df  Sum Sq Mean Sq
A        1 0.14631 0.14631
B        1 0.00181 0.00181
C        1 0.02326 0.02326
D        1 0.29976 0.29976
A:B      1 0.00001 0.00001
A:C      1 0.00456 0.00456
B:C      1 0.01756 0.01756
A:D      1 0.10401 0.10401
B:D      1 0.25251 0.25251
C:D      1 0.00276 0.00276
A:B:C    1 0.08851 0.08851
A:B:D    1 0.04101 0.04101
A:C:D    1 0.00016 0.00016
B:C:D    1 0.06126 0.06126
A:B:C:D  1 0.00141 0.00141
```

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Yield of isatin derivative
]]

.row[.split-50[
.column[.content[

```r
coef(summary(lm(M1)))
```

```
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     6.08        NaN     NaN      NaN
A1             -0.04        NaN     NaN      NaN
B1              0.45        NaN     NaN      NaN
C1              0.23        NaN     NaN      NaN
D1              0.71        NaN     NaN      NaN
A1:B1          -0.06        NaN     NaN      NaN
A1:C1          -0.18        NaN     NaN      NaN
B1:C1          -0.64        NaN     NaN      NaN
A1:D1          -0.07        NaN     NaN      NaN
B1:D1          -0.51        NaN     NaN      NaN
C1:D1          -0.25        NaN     NaN      NaN
A1:B1:C1        0.52        NaN     NaN      NaN
A1:B1:D1       -0.48        NaN     NaN      NaN
A1:C1:D1       -0.10        NaN     NaN      NaN
B1:C1:D1        0.42        NaN     NaN      NaN
A1:B1:C1:D1     0.15        NaN     NaN      NaN
```

Why `NaN`? The model is saturated: the intercept and the 15 treatment effects use all 16 df, so 0 df remain for the RSS and no standard errors can be estimated.
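
The df argument can be checked directly in R. The following is a minimal sketch that refits the saturated model with `lm()` and counts observations, coefficients and residual df; the object names `fit`, `n_obs`, `n_coef`, `df_res` and `max_abs_resid` are chosen here for illustration and do not appear in the original analysis.

```r
# Sketch: parameter count versus observation count in the saturated 2^4 fit.
y <- c(6.08, 6.04, 6.53, 6.43, 6.31, 6.09, 6.12, 6.36,
       6.79, 6.68, 6.73, 6.08, 6.77, 6.38, 6.49, 6.23)
A <- gl(2, 1, 16, labels = 0:1)   # alternates every run
B <- gl(2, 2, 16, labels = 0:1)   # alternates every 2 runs
C <- gl(2, 4, 16, labels = 0:1)   # alternates every 4 runs
D <- gl(2, 8, 16, labels = 0:1)   # alternates every 8 runs

fit <- lm(y ~ A * B * C * D)      # same model as M1, fitted with lm()

n_obs  <- length(y)               # number of runs
n_coef <- length(coef(fit))       # intercept plus all effects
df_res <- df.residual(fit)        # n_obs - n_coef
max_abs_resid <- max(abs(residuals(fit)))  # fitted values reproduce y

c(n_obs = n_obs, n_coef = n_coef, df_res = df_res)
```

With as many coefficients as observations, the fit interpolates the data, so the residuals vanish (up to rounding) and `\(\hat\sigma^2=\text{RSS}/0\)` is undefined.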
]]
.column.bg-brand-gray[.content[

### Neglecting higher order interactions

* For high order factorial designs we may not be able to obtain many replicates because of the large number of plots required.
* In such situations we often assume that high order interactions are negligible and pool the SS of these interaction terms to form the residual SS.

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Yield of isatin derivative
]]

.row[.split-50[
.column[.content[

```r
M2=aov(y~A*B*C*D-A:B:C:D)
summary(M2)
```

```
            Df  Sum Sq Mean Sq F value Pr(>F)  
A            1 0.14631 0.14631 104.040 0.0622 .
B            1 0.00181 0.00181   1.284 0.4603  
C            1 0.02326 0.02326  16.538 0.1535  
D            1 0.29976 0.29976 213.160 0.0435 *
A:B          1 0.00001 0.00001   0.004 0.9576  
A:C          1 0.00456 0.00456   3.240 0.3228  
B:C          1 0.01756 0.01756  12.484 0.1756  
A:D          1 0.10401 0.10401  73.960 0.0737 .
B:D          1 0.25251 0.25251 179.560 0.0474 *
C:D          1 0.00276 0.00276   1.960 0.3949  
A:B:C        1 0.08851 0.08851  62.938 0.0798 .
A:B:D        1 0.04101 0.04101  29.160 0.1166  
A:C:D        1 0.00016 0.00016   0.111 0.7952  
B:C:D        1 0.06126 0.06126  43.560 0.0957 .
Residuals    1 0.00141 0.00141                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Dropping the four-factor interaction A:B:C:D leaves the RSS with 1 df, which gives access to `\(p\)`-values.

]]
.column.bg-brand-gray[.content[

* Since the sums of squares do not depend on which terms are pooled, we know that the most important effects are D and B:D, followed by A (where can you see that?).

#### .brand-blue[Example - Single] `\(\color{blue}{2^6}\)` .brand-blue[vs] `\(\color{blue}{2^5}\)` .brand-blue[with 2 replicates]

* .brand-blue[1 replicate of] `\(\color{blue}{2^6}\)` .brand-blue[experiment:] <br> pool the interactions of order 4 and above to get residual df = `\({6 \choose 6}+{6 \choose 5}+{6 \choose 4}=1+6+15 = 22\)`.
* .brand-blue[2 replicates of] `\(\color{blue}{2^5}\)` .brand-blue[experiment:] each of the `\(2^5\)` cells holds 2 observations and contributes `\(2-1=1\)` df to the RSS. So the residual df = `\(2^5(2-1)=32\)`.
* The residual df are comparable, but in the first study we obtain information on the main effect and interactions of an extra factor.

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Confounding in two `\(2^3\)` experiments in 2 blocks
]]

.row[.split-50[
.column[.content[

* As `\(n\)` increases, the number of plots required in each block becomes large very quickly.
* In practice it is often not possible to find `\(2^n\)` homogeneous plots.
* We allocate treatments to blocks with fewer plots than treatments, in such a way that no information is lost on main effects and, if possible, low order interactions.

<img src="images/confblock.jpg" width="100%" height="100%">

]]
.column.bg-brand-gray[.content[

* .brand-blue[Design I:] each block has 2 observations with A at the high level and 2 observations with A at the low level, so the block effect cancels out.
* .brand-blue[Design II:] all high level observations are in Block I.
* If the SS for the A main effect is large, this may be due to differences across levels of A or to differences across blocks.
* So the block effect is .brand-blue[confounded] with the main effect of A, which makes this a poor design.
* It cannot estimate higher order interactions either, as a main effect is already confounded!
* Thus use Design I.

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Pilot training
]]

.row[.split-70[
.column[.content[

* An unreplicated `\(2^5\)` full factorial design was used to investigate the effects of five factors on the learning rates of flight trainees.
* .brand-blue[Display:] `\(X_2 = -1\)` (symbolic), `\(X_2 = +1\)` (pictorial)
* .brand-blue[Display orientation:] `\(X_3 = -1\)` (outside in), `\(X_3 = +1\)` (inside out)
* .brand-blue[Crosswind:] `\(X_5 = -1\)` (no), `\(X_5 = +1\)` (yes)
* .brand-blue[Command guidance:] `\(X_9\)`
* .brand-blue[Flight path prediction:] `\(X_{17}\)`
* Why do the main effects have those subscripts? They are the columns in the design matrix in .brand-blue[standard order].
So the subscripts 2, 3, 5, 9, 17, i.e. `\(2^i+1, \ i=0,1,2,3,4\)`, correspond to the positions of **a b c d e** in <br> .red[1 a b ab c] .blue[ac bc abc d ad] .red[bd abd cd acd bcd] .blue[abcd e ...]

Coding the design matrix with -1 and 1 (instead of 0 and 1) gives sum constraints.

]]
.column.bg-brand-gray[.content[

<img src="images/flight.jpg" width="100%" height="100%">

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Pilot training
]]

.row[.split-50[
.column[.content[

```r
Y=c(8.69,7.71,9.03,10.82,8.04,11.54,6.61,10.71,
    6.60,7.84,5.45,6.62,6.01,9.53,7.14,8.72,
    1.24,6.52,2.73,4.74,6.34,13.73,6.59,13.02,
    0.69,5.19,1.12,4.53,4.89,6.67,2.78,7.45)
# -1/+1 coding of the five factors in standard order
A=rep(c(-1,1),16)
B=rep(c(-1,-1,1,1),8)
C=rep(c(-1,-1,-1,-1,1,1,1,1),4)
D=rep(c(-1,1,-1,1), each=8)
E=rep(c(-1,1), each=16)
dat=data.frame(Y,A,B,C,D,E)
# side-by-side boxplots of Y at the low (l) and high (h) level of each factor
boxplot(Y~A,at=1:2,xlim=c(0,11),col=2,names=c("lA","hA"),
        main="Boxplots for main effects")
boxplot(Y~B,at=3:4,names=c("lB","hB"),col=3,add=TRUE)
boxplot(Y~C,at=5:6,names=c("lC","hC"),col=4,add=TRUE)
boxplot(Y~D,at=7:8,names=c("lD","hD"),col=5,add=TRUE)
boxplot(Y~E,at=9:10,names=c("lE","hE"),col=6,add=TRUE)
```

]]
.column.bg-brand-gray[.content[

<img src="lecture30_2020JC_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Pilot training full model
]]

.row[.split-50[
.column[.content[
.scroll-box-20[

```r
M1 <- lm(Y ~ A*B*C*D*E, dat); coef(summary(M1))
```

```
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.8528125        NaN     NaN      NaN
A            1.6059375        NaN     NaN      NaN
B           -0.0990625        NaN     NaN      NaN
C            1.2578125        NaN     NaN      NaN
D           -1.1509375        NaN     NaN      NaN
E           -1.3384375        NaN     NaN      NaN
A:B         -0.0334375        NaN     NaN      NaN
A:C          0.4546875        NaN     NaN      NaN
B:C         -0.1340625        NaN     NaN      NaN
A:D         -0.2390625        NaN     NaN      NaN
B:D         -0.1265625        NaN     NaN      NaN
C:D         -0.3109375        NaN     NaN      NaN
A:E          0.6109375        NaN     NaN      NaN
B:E         -0.0453125        NaN     NaN      NaN
C:E          0.9115625        NaN     NaN      NaN
D:E         -0.1984375        NaN     NaN      NaN
A:B:C        0.0703125        NaN     NaN      NaN
A:B:D        0.0203125        NaN     NaN      NaN
A:C:D       -0.3778125        NaN     NaN      NaN
B:C:D        0.2334375        NaN     NaN      NaN
A:B:E       -0.1184375        NaN     NaN      NaN
A:C:E       -0.1378125        NaN     NaN      NaN
B:C:E        0.0546875        NaN     NaN      NaN
A:D:E       -0.1828125        NaN     NaN      NaN
B:D:E        0.0759375        NaN     NaN      NaN
C:D:E       -0.5759375        NaN     NaN      NaN
A:B:C:D      0.0615625        NaN     NaN      NaN
A:B:C:E      0.3228125        NaN     NaN      NaN
A:B:D:E      0.3565625        NaN     NaN      NaN
A:C:D:E     -0.1215625        NaN     NaN      NaN
B:C:D:E     -0.2915625        NaN     NaN      NaN
A:B:C:D:E    0.0428125        NaN     NaN      NaN
```

]
]]
.column.bg-brand-gray[.content[

```r
mean(Y) #intercept estimate
```

```
[1] 6.852812
```

```r
# note: this overwrites the vector C; the column dat$C is unaffected
C <- M1$coefficients[-1] #take coefficients except the intercept
summary(C) #all residuals are 0, so only the coefficients can be examined
```

```
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-1.33844 -0.19062 -0.04531  0.01926  0.15469  1.60594 
```

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Pilot training simpler model
]]

.row[.split-50[
.column[.content[

<img src="lecture30_2020JC_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" />

```

	Shapiro-Wilk normality test

data:  C
W = 0.9063, p-value = 0.01039
```

So normality of the estimated coefficients is rejected at the 5% level.

]]
.column.bg-brand-gray[.content[

```r
M2 <- lm(Y~ A+C+D+E,data=dat) #main effect model
summary(M2)
```

```

Call:
lm(formula = Y ~ A + C + D + E, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5616 -0.9834 -0.3769  0.5775  4.2009 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6.8528     0.3102  22.095  < 2e-16 ***
A             1.6059     0.3102   5.178 1.89e-05 ***
C             1.2578     0.3102   4.055 0.000382 ***
D            -1.1509     0.3102  -3.711 0.000946 ***
E            -1.3384     0.3102  -4.315 0.000191 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.755 on 27 degrees of freedom
Multiple R-squared:  0.737,	Adjusted R-squared:  0.698 
F-statistic: 18.91 on 4 and 27 DF,  p-value: 1.62e-07
```

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Pilot training simpler model
]]

.row[.split-70[
.column[.content[

<img src="lecture30_2020JC_files/figure-html/unnamed-chunk-18-1.svg" style="display: block; margin: auto;" />

]]
.column.bg-brand-gray[.content[

* The normality assumption is better satisfied.
* The leverages `\(h_{ii} = \boldsymbol{x}_i^\top (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{x}_i\)` are all the same.
* The standard errors, obtained from `\(\text{Cov}(\widehat{\boldsymbol{\beta}})=\hat{\sigma}^2 (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\)`, are all the same.

]]
]]
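
---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example - Pilot training: constant leverages and SEs
]]

.row[.content[

The constant leverages and standard errors of the main effect model can be verified numerically. The following is a minimal sketch, not part of the original analysis: it rebuilds the -1/+1 design columns used for the main effect model and forms `\(\boldsymbol{X}^\top\boldsymbol{X}\)` by hand. The names `X`, `XtX` and `h` are chosen here for illustration.

```r
# Sketch: leverages and X'X for the main effect model Y ~ A + C + D + E.
A <- rep(c(-1, 1), 16)
C <- rep(c(-1, -1, -1, -1, 1, 1, 1, 1), 4)
D <- rep(c(-1, 1, -1, 1), each = 8)
E <- rep(c(-1, 1), each = 16)

X   <- cbind(Intercept = 1, A, C, D, E)   # 32 x 5 design matrix
XtX <- crossprod(X)                       # X'X; diagonal for an orthogonal design
h   <- diag(X %*% solve(XtX) %*% t(X))    # leverages h_ii

XtX        # 32 on the diagonal, 0 elsewhere
range(h)   # both ends equal p/n = 5/32
```

Because the columns are orthogonal with entries `\(\pm 1\)`, `\(\boldsymbol{X}^\top\boldsymbol{X}=32\,\boldsymbol{I}\)`, so every leverage equals `\(5/32\)` and every standard error equals `\(\hat\sigma/\sqrt{32}\)`, in line with `\(1.755/\sqrt{32}\approx 0.310\)` in the summary of the main effect model.

]]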