class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[
<br><br><br>
# One-way ANOVA Part I

## .black[STAT3022 Applied Linear Models Lecture 14]

<br><br><br>

### .black[2020/02/17]
]]
.column.bg-brand-charcoal[.content.white[
## Today

1. One-way ANOVA
1. Within and between sum of squares
1. Sum constraint
1. Treatment constraint
1. Contrasts
1. Confidence intervals for contrasts
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# One-way ANOVA
]]
.row[.split-two[
.column[.content[
* One-way ANOVA was already covered in DATA2002/STAT2012/STAT2912 and is therefore assumed knowledge.
* However, in this course we derive all underlying quantities more carefully, and you are expected to understand the various formulae.

### Scope

A one-way ANOVA can be used to compare `\(t\)` different treatments when it is reasonable to assume that the treatment affects only the mean response (not the entire distribution!).
]]
.column.bg-brand-gray[.content[
### Example - Stimulating effects of caffeine

Does caffeine stimulation affect the rate at which individuals can tap their fingers?

* `\(n=30\)` male students were randomly allocated to `\(t=3\)` groups of `\(10\)` students each:
  * Group 1: zero caffeine dose
  * Group 2: low caffeine dose
  * Group 3: high caffeine dose
* Allocation was blind (subjects did not know their caffeine dosage).
* Two hours after treatment, each subject tapped their fingers as quickly as possible for a minute. The number of taps was recorded.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Caffeine: Variable types and summaries
]]
.row[.split-two[
.column.bg-brand-gray[.content[

```r
caffeine <- read.table(file = "data/caffeine.txt", header = TRUE)
skimr::skim(caffeine)
```

```
Skim summary statistics
 n obs: 30 
 n variables: 2 

-- Variable type:factor ---------------------------------------
 variable missing  n n_unique ordered
     Dose       0 30        3   FALSE

-- Variable type:integer --------------------------------------
 variable missing  n  mean  sd  p0 p25   p50 p75 p100     hist
     Taps       0 30 246.5 2.6 242 245 246.5 248  252 ▃▃▃▇▇▁▃▁
```

```r
m <- tapply(caffeine$Taps, caffeine$Dose, mean); m
```

```
 High   Low  Zero 
248.3 246.4 244.8 
```
]]
.column.bg-white[.content[

```r
ggplot(caffeine, aes(Dose, Taps)) +
  geom_boxplot() + geom_jitter()
```

<img src="lecture14_2020JC_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# One-way ANOVA
]]
.row[.split-two[
.column[.content[
### Getting the ANOVA table in R

```r
anova(lm(Taps ~ Dose, data = caffeine))
```

```
Analysis of Variance Table

Response: Taps
          Df Sum Sq Mean Sq F value   Pr(>F)   
Dose       2   61.4 30.7000  6.1812 0.006163 **
Residuals 27  134.1  4.9667                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
]]
.column.bg-brand-gray[.content[

```r
summary(aov(Taps ~ Dose, data = caffeine))
```

```
            Df Sum Sq Mean Sq F value  Pr(>F)   
Dose         2   61.4  30.700   6.181 0.00616 **
Residuals   27  134.1   4.967                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
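
A quick sanity check (a sketch, assuming the objects above are loaded): the tabulated p-value is just the upper-tail probability of the `\(F_{2,27}\)` distribution at the observed statistic.

```r
# P(F >= 6.1812) with 2 and 27 df; matches Pr(>F) = 0.006163 above
pf(6.1812, df1 = 2, df2 = 27, lower.tail = FALSE)
```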
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Model equation
]]
.row[.split-two[
.column[.content[
* The **null model** (under `\(H_0\)`) is
`$$Y_{ij} = \mu + \epsilon_{ij}, \quad j=1,\ldots,n_{i},\;\; i=1,\ldots,t$$`
where `\(\epsilon_{ij}\sim NID(0,\sigma^2)\)`.
* The **full model** (under `\(H_1\)`) is
`$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad j=1,\ldots,n_{i},\;\; i=1,\ldots,t$$`
where `\(\epsilon_{ij}\sim NID(0,\sigma^2)\)`.
* There are `\(n_i\)` observations receiving treatment `\(i\)`, over `\(t\)` treatments in total, so the total number of observations is
`$$n = \sum_{i=1}^t n_i.$$`
* Here the covariate is the .brand-blue[categorical treatment type], with one regression parameter `\(\alpha_i\)` per level `\(i\)`.
]]
.column.bg-brand-gray[.content[
* `\(\boldsymbol{\beta} = (\mu, \alpha_1,\alpha_2,\ldots,\alpha_t)^\top\)`,
* `\(\boldsymbol{Y} = (Y_{11},\ldots,Y_{1n_1},Y_{21},\ldots,Y_{2n_2},\ldots,Y_{tn_t})^\top\)`

then,
`$$\mathbf{X}=\begin{pmatrix}
1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & 1 \\
\end{pmatrix}
\begin{matrix} \\ \Rightarrow n_1 \text{ rows} \\ \\ \Rightarrow n_2 \text{ rows} \\ \vdots \\ \\ \end{matrix}$$`

* `\(\mathbf{X}\)` is an `\(n\times(t+1)\)` matrix of rank `\(t\)` because there are only `\(\color{blue}{t}\)` .brand-blue[linearly independent columns] in `\(\mathbf{X}\)`: the intercept column is the sum of the `\(t\)` dummy columns.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# LS for one-way ANOVA
]]
.row[.split-two[
.column[.content[
* The least squares estimate `\(\hat{\boldsymbol{\beta}}\)` of `\(\boldsymbol{\beta}\)` satisfies the normal equations `\(\mathbf{X}^\top\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^\top \boldsymbol{Y}\)`. (The familiar solution `\((\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \boldsymbol{Y}\)` requires `\(\mathbf{X}^\top \mathbf{X}\)` to be invertible, which is not the case here.)
* Summing the estimated full model over `\(i\)` and `\(j\)` gives

  (1): `\(\hspace{5mm} n\hat{\mu} + n_1 \hat{\alpha}_1 + \ldots + n_t \hat{\alpha}_t = \sum_{i=1}^t \sum_{j=1}^{n_i} Y_{ij} = Y_{\bullet\bullet}\)`

  and summing over `\(j\)` for each `\(i=1,\ldots,t\)` gives

  `\((i+1)\)`: `\(n_i\hat{\mu} + n_i \hat{\alpha}_i = \sum_{j=1}^{n_i} Y_{ij} = Y_{i\bullet} = n_i \bar{Y}_{i\bullet}\)`
* These equations are
`$$\begin{pmatrix}
n & n_1 & \cdots & n_t \\
n_1 & n_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
n_t & 0 & \cdots & n_t \\
\end{pmatrix}
\begin{pmatrix} \hat \mu \\ \hat \alpha_1 \\ \vdots \\ \hat \alpha_t \end{pmatrix}
= \begin{pmatrix} Y_{\bullet\bullet} \\ Y_{1\bullet} \\ \vdots \\ Y_{t\bullet} \end{pmatrix}$$`
where the first matrix is `\(\mathbf{X}^\top\mathbf{X}\)`.
]]
.column.bg-brand-gray[.content[

```r
X <- model.matrix(~ Dose, data = caffeine,
  contrasts.arg = list(Dose = contrasts(caffeine$Dose, contrasts = FALSE)))
t(X) %*% X
```

```
            (Intercept) DoseHigh DoseLow DoseZero
(Intercept)          30       10      10       10
DoseHigh             10       10       0        0
DoseLow              10        0      10        0
DoseZero             10        0       0       10
```

```r
all.equal(det(t(X) %*% X), 0)
```

```
[1] TRUE
```

* The first column of `\(\mathbf{X}^\top \mathbf{X}\)` is the sum of the remaining columns. Hence `\(\mathbf{X}^\top \mathbf{X}\)` is .brand-blue[singular] and its .brand-blue[inverse does not exist].
* Hence equations (1)-(t+1), and therefore `\(\hat{\boldsymbol{\beta}}\)`, do **not** have a unique solution.
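
To see the non-uniqueness concretely (a sketch; `MASS::ginv` and the name `beta_g` are not part of the slides): any generalised inverse of `\(\mathbf{X}^\top\mathbf{X}\)` yields a valid least squares solution, and although `\(\hat{\boldsymbol{\beta}}\)` differs between choices, the fitted values do not.

```r
# One of infinitely many LS solutions, via the Moore-Penrose
# pseudoinverse of the singular matrix X'X
beta_g <- MASS::ginv(t(X) %*% X) %*% t(X) %*% caffeine$Taps
# Fitted values agree with lm()'s, whichever solution is used
all.equal(drop(X %*% beta_g),
          unname(fitted(lm(Taps ~ Dose, data = caffeine))))
```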
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Constraints
]]
.row[.split-40[
.column[.content[
To solve the least squares equations we need to impose a further condition.

* .brand-blue[Sum constraint]: (i) `\(\sum_{i=1}^t n_i \hat\alpha_i = 0\)`, or
* .brand-blue[Treatment constraint]: (ii) `\(\hat\alpha_1 = 0\)`

Parameter estimates are different under different constraints!

* From (i), Eqn (1) and (i+1): <br>
`\(n \hat \mu + 0 = Y_{\bullet\bullet} \Rightarrow \color{blue}{\hat{\mu} = \bar{Y}_{\bullet\bullet}}\)` and <br>
`\(n_i\bar{Y}_{\bullet\bullet} + n_i \hat \alpha_i = n_i \bar Y_{i \bullet} \Rightarrow \color{blue}{\hat{\alpha}_i = \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet}}\)`
* From (ii), Eqn (2) and (i+1): <br>
`\(n_1 \hat{\mu} + n_1 \cdot 0 = n_1 \bar{Y}_{1\bullet} \Rightarrow \color{blue}{\hat{\mu}=\bar{Y}_{1\bullet}}\)` and <br>
`\(n_i \bar{Y}_{1\bullet} + n_i\hat\alpha_i = n_i \bar{Y}_{i\bullet} \Rightarrow \color{blue}{\hat\alpha_i = \bar{Y}_{i\bullet} - \bar{Y}_{1\bullet}},\)` <br>
`\(i=2,\ldots,t.\)`

For both constraints, `\(\hat{\mu}+\hat\alpha_i = \bar{Y}_{i\bullet}\)`.
]]
.column.bg-brand-gray[.content[

```r
# treatment constraint (ii): DoseHigh is the baseline level
broom::tidy(lm(Taps ~ Dose, data = caffeine))
```

```
         term estimate std.error  statistic      p.value
1 (Intercept)    248.3 0.7047458 352.325610 5.456621e-51
2     DoseLow     -1.9 0.9966611  -1.906365 6.729867e-02
3    DoseZero     -3.5 0.9966611  -3.511725 1.584956e-03
```

```r
# cell-means parametrisation: one mean per group
broom::tidy(lm(Taps ~ Dose - 1, data = caffeine))
```

```
      term estimate std.error statistic      p.value
1 DoseHigh    248.3 0.7047458  352.3256 5.456621e-51
2  DoseLow    246.4 0.7047458  349.6296 6.713935e-51
3 DoseZero    244.8 0.7047458  347.3593 8.004821e-51
```

```r
c(m, m[2] - m[1], m[3] - m[1])  # group means; Low, Zero minus High
```

```
 High   Low  Zero   Low  Zero 
248.3 246.4 244.8  -1.9  -3.5 
```
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Contrasts
]]
.row[.split-90[
.column[.content[
* Contrasts are functions of the `\(\alpha_i\)` that have the same estimates .brand-blue[regardless of the constraint used].
* For example, to test `\(H_0: \alpha_1+\alpha_2 = 2\alpha_3\)`, i.e. `\(c = \alpha_1+\alpha_2-2\alpha_3 = 0\)` under `\(H_0\)`, we take `\((c_1,c_2,c_3)=(1,1,-2)\)`, whose coefficients sum to 0. In general, the .brand-blue[contrast function] is
`$$c = \sum_{i=1}^t c_i \alpha_i, \hspace{3mm} \mbox{where} \hspace{3mm} \color{blue}{\sum_{i=1}^t c_i = 0}.$$`
* Under the .brand-blue[sum] constraint (i), `\(\sum_{i=1}^t n_i \hat\alpha_i = 0\)`:
`$$\hat{c} = \sum_{i=1}^t c_i\hat{\alpha}_i = \sum_{i=1}^t c_i (\bar{Y}_{i\bullet}-\bar Y_{\bullet \bullet}) = \sum_{i=1}^t c_i \bar{Y}_{i\bullet}-\bar Y_{\bullet \bullet}\color{blue}{\sum_{i=1}^t c_i} = \sum_{i=1}^t c_i \bar{Y}_{i\bullet}.$$`
* Under the .brand-blue[treatment] constraint (ii), `\(\hat{\alpha}_1 = 0\)`:
`\begin{eqnarray*}
\hat{c} & = & \sum_{i=1}^t c_i\hat{\alpha}_i = c_1\cdot 0 + \sum_{i=2}^t c_i(\bar{Y}_{i\bullet} - \bar{Y}_{1\bullet}) = \sum_{i=2}^t c_i\bar{Y}_{i\bullet} - \bar{Y}_{1\bullet}\color{blue}{\underbrace{\sum_{i=2}^t c_i}_{= - c_1}} = \sum_{i=1}^t c_i \bar{Y}_{i\bullet}.
\end{eqnarray*}`
]]
.column[.content[
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# RSS for alternative model or within treatment SS
]]
.row[.split-90[
.column[.content[
* However, the .brand-blue[residual sum of squares] under `\(H_1\)` is well defined as
`\begin{eqnarray*}
\text{RSS}_{H_1} &=& \sum_{i=1}^{t}\sum_{j=1}^{n_i} \Big(Y_{ij} - \widehat Y_{ij}\Big)^2 = \sum_{i=1}^{t}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2
\end{eqnarray*}`
since `\(\widehat Y_{ij}=\hat \mu+\hat \alpha_i=\bar Y_{i \bullet}\)` for both constraints.
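
A quick check in R (a sketch; `M_trt` and `M_cell` are our names): both parametrisations from the previous slide give identical fitted values, hence the same `\(\text{RSS}_{H_1}\)`.

```r
M_trt  <- lm(Taps ~ Dose, data = caffeine)      # treatment constraint (ii)
M_cell <- lm(Taps ~ Dose - 1, data = caffeine)  # cell-means form
all.equal(fitted(M_trt), fitted(M_cell))        # identical Y-hat_ij
sum(resid(M_trt)^2)                             # RSS_H1 = 134.1, as in the ANOVA table
```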
Note that `\(\text{RSS}_{H_1}\)` is sometimes called the .brand-blue[within treatment sum of squares (WSS)] as it refers to the variability within each treatment that is not captured by the treatment effect.

* For the `\(i\)`th sample,
`$$\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2 \sim \sigma^2 \chi^2_{n_i -1},$$`
and the `\(t\)` samples are independent of one another.
* Since the sum of independent `\(\chi^2\)` random variables is `\(\chi^2\)` with df equal to the sum of the individual df's,
`$$\sum_{i=1}^t \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2 \sim \sigma^2 \chi^2_{n -t}.$$`
]]
.column[.content[
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Variance of contrasts
]]
.row[.split-two[
.column[.content[
`\begin{eqnarray*}
Var\Big(\sum_i c_i \hat\alpha_i\Big) &=& Var\Big(\sum_i c_i \bar{Y}_{i\bullet}\Big) \\
&=& \sum_i c^2_i \left(\frac{\sigma^2}{n_i}\right) \\
&=& \sigma^2 \sum_{i=1}^t (c_i^2/n_i).
\end{eqnarray*}`

This is a very important equation and you should make sure that you fully understand it!
]]
.column.bg-brand-gray[.content[
### Caffeine: Variance of pairwise contrast

Consider the contrast `\(\alpha_1-\alpha_2\)`, with `\(c_1=1\)`, `\(c_2=-1\)` and `\(c_3 = 0\)`. Since R orders the `Dose` levels alphabetically (High, Low, Zero), this contrast looks at the difference of the means of the high and low dose groups:
`$$Var(\bar{Y}_{1\bullet}- \bar{Y}_{2\bullet} ) = \frac{\sigma^2}{10} + \frac{\sigma^2}{10}.$$`
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Confidence interval for contrasts
]]
.row[.split-two[
.column[.content[
Construction of a `\(100(1-\alpha)\%\)` CI for `\(c\)`:
`$$\sum_{i=1}^t c_i \bar{Y}_{i\bullet} \pm t_{n-t}(1-\alpha/2)\hat{\sigma}\sqrt{\sum_{i=1}^t \frac{c_i^2}{n_i}},$$`
where `\(\hat{\sigma}^2 = \text{RSS}/(n-t)\)`.

The common contrasts relate to *pairwise comparisons*, e.g. Treatment 1 vs Treatment 2, or Treatment 1 vs the average of Treatments 2, 3 and 4.
]]
.column.bg-brand-gray[.content[
### Caffeine: CI for pairwise contrast

For the contrast `\(\alpha_1-\alpha_2\)` between High and Low, we get

* `\(\hat c=\sum_{i=1}^t c_i \bar{Y}_{i\bullet} = \bar{Y}_{1\bullet}- \bar{Y}_{2\bullet} = 248.3-246.4=1.9\)`,
* `\(t_{27}(0.975) = 2.05, \hspace{3mm} (n=30, p=3)\)`,
* `\(\sqrt{\sum_{i=1}^t(c_i^2/n_i)} = \sqrt{1/10+1/10}=1/\sqrt{5}\)` and
* `\(\hat\sigma = \sqrt{134.1/27} = 2.23\)`.

Thus, the 95% CI for `\(\alpha_1 - \alpha_2\)` becomes
$$ 1.9 \; \pm \; 2.05 \times 2.23/\sqrt{5} = 1.9 \; \pm \; 2.04. $$

* The difference from the two-sample CI is in the estimator `\(\hat \sigma = \sqrt{\text{RSS}/(n-p)}\)` of the residual standard error: .brand-blue[all samples], not only the samples from the pair, are used.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# RSS for null-model or between treatment SS
]]
.row[.split-two[
.column[.content[
* Typically the primary question of interest for a one-way ANOVA model is this: "Is the mean response dependent on the treatment (factor)?"
`$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_t ~~~~~\mbox{vs}~~~~~ H_1: \exists\, i_1\neq i_2 \text{ s.t. } \alpha_{i_1} \neq \alpha_{i_2}$$`
* Under `\(H_0\)` it follows that
`$$Y_{ij} \sim NID(\mu,\sigma^2)$$`
and the null-model is
`$$Y_{ij} = \mu +\epsilon_{ij} \;\Rightarrow\; \text{RSS}_{H_0} = \sum_{i=1}^t \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2$$`
since `\(\widehat Y_{ij}=\hat \mu=\bar{Y}_{\bullet\bullet}\)` for both constraints.
* `\(\text{RSS}_{H_0}\)` is called the .brand-blue[total sum of squares] (TotSS) for `\(Y\)` because the null model contains no treatment effects, only the overall mean.
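
As a numerical check (a sketch; `M0` and `M1` are our names), the two residual sums of squares reproduce the ANOVA table, and their difference is the between treatment sum of squares derived on the right:

```r
M0 <- lm(Taps ~ 1, data = caffeine)     # null model: intercept only
M1 <- lm(Taps ~ Dose, data = caffeine)  # full model
RSS0 <- sum(resid(M0)^2)                # TotSS = 195.5
RSS1 <- sum(resid(M1)^2)                # WSS   = 134.1
RSS0 - RSS1                             # BSS   =  61.4 (Dose Sum Sq)
```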
]]
.column.bg-brand-gray[.content[
### The between treatment sum of squares

`\begin{align*}
&\text{RSS}_{H_0} - \text{RSS}_{H_1} \\
&\quad= \sum_{i=1}^t \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2 - \sum_{i=1}^{t}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2 \\
&\quad= \sum_{i=1}^t n_i (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2,
\end{align*}`
where the cross term vanishes because `\(\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i\bullet}) = 0\)` for each `\(i\)`.

* The between treatment sum of squares measures how much larger the RSS is under `\(H_0\)` compared to `\(H_1\)`.
* In multi-way ANOVA the term .brand-blue[treatment sum of squares] (TSS) is more commonly used for the .brand-blue[between sum of squares] (BSS).
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# F-test for treatment effect
]]
.row[.split-two[
.column[.content[
To test `\(H_0\)` vs `\(H_1\)` use
`$$f_0 = \frac{(\text{RSS}_{H_0} - \text{RSS}_{H_1})/(t-1)}{\text{RSS}_{H_1}/(n-t)} = \frac{\text{BSS}/(t-1)}{\text{WSS}/(n-t)},$$`
`$$p = P(F_{t-1,n-t}\geq f_0).$$`

Numerator df = `\((n-1) - (n-t) = t-1\)`.

If, as is often the case, the data are not consistent with `\(H_0\)`, then we need to examine contrasts to try to understand how the treatments differ.
]]
.column.bg-brand-gray[.content[
<img src="images/Ftest2.png" width="100%" height="100%">

This tests the full model against the null model; with respect to the treatment parameters, the null model is empty.
]]
]]

---
layout: false
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Caffeine: Omnibus F-test
]]
.row[.content[
* The null hypothesis `\(H_0: \alpha_1 = \alpha_2 = \alpha_3\)` is rejected.
* Reason: `\(f_0 = 6.1812\)` exceeds `\(F_{2,27}(0.95) = 3.35\)`, so the `\(p\)`-value for this test is less than `\(0.05\)`.

```r
M1 <- lm(Taps ~ Dose, data = caffeine)
anova(M1)
```

```
Analysis of Variance Table

Response: Taps
          Df Sum Sq Mean Sq F value   Pr(>F)   
Dose       2   61.4 30.7000  6.1812 0.006163 **
Residuals 27  134.1  4.9667                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
]]
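
---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Caffeine: F-test quantities by hand
]]
.row[.content[
As a closing sketch (the names `bss`, `wss` and `f0` are ours, not from the output above), the quantities in the ANOVA table can be reproduced directly from the group means:

```r
m   <- tapply(caffeine$Taps, caffeine$Dose, mean)  # group means
n_i <- table(caffeine$Dose)                        # group sizes (10, 10, 10)
bss <- sum(n_i * (m - mean(caffeine$Taps))^2)      # BSS = 61.4
wss <- sum((caffeine$Taps - m[as.character(caffeine$Dose)])^2)  # WSS = 134.1
f0  <- (bss / 2) / (wss / 27)                      # 6.1812
f0 > qf(0.95, df1 = 2, df2 = 27)                   # TRUE: f0 exceeds the 3.35 cutoff
```
]]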