class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[
<br><br><br>
# Variance component estimation

## .black[STAT3022 Applied Linear Models Lecture 33]

<br><br><br>

### .black[2020/02/20]

]]
.column.bg-brand-charcoal[.content.white[

## Today

1. Estimation of variance components

]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Fun fact
]]
.row[.split-60[
.column[.content[

* Shayle R. Searle (1928-2013) was a New Zealand-born Professor of Biological Statistics at Cornell University, a prominent leader in variance component estimation and a proponent of the use of matrix algebra in statistics.

* H. Desmond Patterson and Robin Thompson worked on their seminal paper (1971) at the University of Edinburgh.

* Before arriving at Edinburgh, Patterson worked at Rothamsted Experimental Station while Frank Yates was its Head. Thompson later moved to Rothamsted Experimental Station.

.bottom_abs.width100[
Patterson, H. D. and Thompson, R. (1971) Recovery of inter-block information when block sizes are unequal. *Biometrika* **58** (3), 545-554.
]

]]
.column.bg-brand-gray[.content[
<img src="images/Searle.jpg" width="30%" height="30%">
<img src="images/DesmondPatterson.png" width="50%" height="50%">
]]
]]

---

layout: false
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Moments estimate of variance components
]]
.row[.content[

* Recall that for the one-way ANOVA model II we model `\(Y_{ij}\)` as
`$$Y_{ij} = \color{blue}{\mu} + \color{red}{A_i} + \epsilon_{ij},\quad j=1,\ldots,r,\;\; i=1,\ldots,t,\quad \color{red}{A_i} \sim NID(0,\color{blue}{\sigma^2_A}), \quad \epsilon_{ij} \sim NID(0,\color{blue}{\sigma^2})$$`
where `\(\color{red}{A_i}\)` is a random variable independent of `\(\epsilon_{ij}\)`.

* Now `\(\displaystyle \hspace{5mm} E\left(\text{MS}_A\right) = E\left(\frac{\text{SS}_A}{t - 1}\right) = \color{blue}{\sigma^2} + r \color{blue}{\sigma^2_A}\ \text{and}\ E(\text{MSR}) = E\left(\frac{\text{RSS}}{rt - t}\right) = \color{blue}{\sigma^2}\)` where `\(\text{SS}_A=\sum\limits_{i=1}^t r(\bar{Y}_{i\bullet}-\bar{Y}_{\bullet \bullet})^2.\)`

* By the .brand-blue[method of moments] (MM), we have
`$$\color{blue}{\hat{\sigma}^2} = \text{MSR} \qquad\text{and}\qquad \color{blue}{\hat{\sigma}^2_A} = \frac{1}{r}(\text{MS}_A - \text{MSR}).$$`

* MM estimates of variance components cannot be readily obtained for incomplete designs.

]]

---

layout: false
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# ML estimate of variance components
]]
.row[.content[

* Rewriting the model in matrix notation, we have `\(\hspace{5mm}\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\color{blue}{\textbf{Z}\boldsymbol{u}}+\boldsymbol{\epsilon}\)` <br> where `\(\begin{bmatrix}\boldsymbol{u}\\ \boldsymbol{\epsilon}\end{bmatrix}\)` `\(\sim\)` `\(N\left(\begin{bmatrix}\boldsymbol{0}\\ \boldsymbol{0}\end{bmatrix} \right.\)`, `\(\left.\color{red}{\begin{bmatrix}\sigma^2_A\boldsymbol{I}_t&\boldsymbol{0}\\ \boldsymbol{0} & \sigma^2\boldsymbol{I}_{rt}\end{bmatrix}}\right)\)`, `\(\textbf{X}=\boldsymbol{1}_{rt}\)`, `\(\boldsymbol{\beta}=\mu\)`, `\(\color{blue}{\textbf{Z}}=\color{blue}{\boldsymbol{I}_t \otimes \boldsymbol{1}_r}\)` and `\(\color{blue}{\boldsymbol{u}}=\color{blue}{(A_1, \ldots, A_t)^\top}\)`.
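A minimal R sketch of these design matrices (hard-coding `t = 6` groups and `r = 8` replicates to match the beer example on later slides; `kronecker()` is base R's `\(\otimes\)`):

```r
# Design matrices for the one-way random effects model:
# t groups with r replicates each, so n = rt observations.
t <- 6; r <- 8                                          # matches the beer data
X <- matrix(1, nrow = r * t, ncol = 1)                  # X = 1_rt (intercept only)
Z <- kronecker(diag(t), matrix(1, nrow = r, ncol = 1))  # Z = I_t (x) 1_r
dim(Z)                                                  # 48 x 6
```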
* Then marginally (integrating out the random effects), we have `\(\boldsymbol{y} \sim N(\textbf{X}\boldsymbol{\beta}, \color{red}{\textbf{V}})\)` where `\(\color{red}{\textbf{V}}=\color{red}{\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}}\)` `\(=\color{red}{\sigma^2_A\textbf{Z}\textbf{Z}^\top+\sigma^2\textbf{I}_{rt}}= \color{red}{\sigma^2_A\boldsymbol{I}_t\otimes\boldsymbol{J}_r+\sigma^2\textbf{I}_{rt}}\)` and `\(\boldsymbol{J}_r\)` is an `\(r\times r\)` matrix of ones.

* This `\(\textbf{V}\)` is a block diagonal matrix in which each block along the diagonal has diagonal entries `\(\sigma_A^2+\sigma^2\)` and off-diagonal entries `\(\sigma_A^2\)`.

* The .brand-blue[profile] (with the fixed effects substituted by their estimates) marginal log-likelihood is given as
`$$\ell_p=-\frac{rt}{2}\log(2\pi)-\frac{1}{2}\log\mid \hspace{-1mm}\textbf{V}\hspace{-1mm}\mid -\frac{1}{2}(\boldsymbol{y}-\textbf{X}\hat{\boldsymbol{\beta}})^\top \textbf{V}^{-1}(\boldsymbol{y}-\textbf{X}\hat{\boldsymbol{\beta}})$$`
versus the conditional (on `\(\boldsymbol{u}\)`) likelihood `\(\ell=-\frac{rt}{2}\log(2\pi)-\frac{1}{2}\log |\textbf{R}|-\frac{1}{2}(\boldsymbol{y}- \textbf{X}\boldsymbol{\beta}-\textbf{Z}\boldsymbol{u})^\top\textbf{R}^{-1}(\boldsymbol{y}-\textbf{X}\boldsymbol{\beta}-\textbf{Z}\boldsymbol{u})\)`.

* The maximum likelihood (ML) estimates of the variance components are found from `\(\text{argmax}_{\sigma^2_A,\sigma^2} \ \ell_p.\)`

]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Residual (Restricted) likelihood
]]
.row[.split-90[
.column[.content[

* Assume that the design matrix `\(\textbf{X}\)` is a full-rank `\(n\times p\)` matrix. Suppose that `\(\textbf{L}=[\textbf{L}_1 \ \textbf{L}_2]\)` is an `\(n\times n\)` matrix, where `\(\textbf{L}_1\)` and `\(\textbf{L}_2\)` are `\(n\times p\)` and `\(n\times (n-p)\)` matrices, respectively, such that `\(\textbf{L}_2^\top\textbf{X}=\boldsymbol{0}\)`.

* `\(\displaystyle\textbf{L}^\top\boldsymbol{Y}=\begin{bmatrix}\textbf{L}_1^\top\boldsymbol{Y}\\ \textbf{L}_2^\top\boldsymbol{Y}\end{bmatrix}=\begin{bmatrix}\boldsymbol{Y}_1\\ \boldsymbol{Y}_2 \end{bmatrix} \sim N\left(\begin{bmatrix}\textbf{L}_1^\top\textbf{X}\boldsymbol{\beta}\\ \boldsymbol{0}\end{bmatrix},\begin{bmatrix}\textbf{L}_1^\top \color{red}{\textbf{V}}\textbf{L}_1&\textbf{L}_1^\top\color{red}{\textbf{V}}\textbf{L}_2\\ \textbf{L}_2^\top\color{red}{\textbf{V}}\textbf{L}_1&\textbf{L}_2^\top\color{red}{\textbf{V}}\textbf{L}_2\end{bmatrix}\right).\)`

* The marginal likelihood based on `\(\boldsymbol{Y}_2 \sim N(\boldsymbol{0},\textbf{L}_2^\top\color{red}{\textbf{V}}\textbf{L}_2)\)` is called the .brand-blue[residual likelihood].

* The residual log-likelihood is given as
`$$\ell_R=-\frac{n-p}{2}\log(2\pi)-\frac{1}{2}\log |\textbf{L}_2^\top\color{red}{\textbf{V}}\textbf{L}_2|- \frac{1}{2}\boldsymbol{y}_2^\top(\textbf{L}_2^\top\color{red}{\textbf{V}}\textbf{L}_2)^{-1}\boldsymbol{y}_2$$`
which can also be expressed as (proof out of scope for this course)
`$$\ell_R=-\frac{1}{2}\log |\color{red}{\textbf{V}}|-\frac{1}{2}(\boldsymbol{y}- \textbf{X}\hat{\boldsymbol{\beta}})^\top\color{red}{\textbf{V}}^{-1}(\boldsymbol{y}-\textbf{X}\hat{\boldsymbol{\beta}})-\frac{1}{2}\log|\textbf{X}^\top\color{red}{\textbf{V}}^{-1}\textbf{X}|+\text{constant}$$`

* So, up to an additive constant, the difference between the residual and profile log-likelihoods is `\(\ell_R(\textbf{V})=\ell_p(\textbf{V})-\frac{1}{2}\log|\textbf{X}^\top\textbf{V}^{-1}\textbf{X}|\)`.

* The REML estimate is found from `\(\text{argmax}_{\sigma^2_A,\sigma^2}\ \ell_R\)`.
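Both criteria can be maximised numerically. Below is a minimal sketch (not how `lmer` computes things internally); it assumes a response vector `y` together with the `X` and `Z` constructed earlier, parameterises the variances on the log scale, and drops additive constants:

```r
# Negative profile (ML) / residual (REML) log-likelihood, constants dropped.
# theta holds log(sigma_A^2) and log(sigma^2) so the search is unconstrained.
negll <- function(theta, y, X, Z, reml = FALSE) {
  s2A <- exp(theta[1]); s2 <- exp(theta[2])
  n <- length(y)
  V    <- s2A * tcrossprod(Z) + s2 * diag(n)  # V = sigma_A^2 ZZ' + sigma^2 I
  Vinv <- solve(V)
  bhat <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)  # GLS beta-hat
  e    <- y - X %*% bhat
  ll   <- -0.5 * determinant(V)$modulus - 0.5 * t(e) %*% Vinv %*% e
  if (reml)  # the extra term that turns the profile into the residual likelihood
    ll <- ll - 0.5 * determinant(t(X) %*% Vinv %*% X)$modulus
  -as.numeric(ll)
}
# ML:   exp(optim(c(0, 0), negll, y = y, X = X, Z = Z)$par)
# REML: exp(optim(c(0, 0), negll, y = y, X = X, Z = Z, reml = TRUE)$par)
```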
]]
.column.bg-brand-gray[.content[
]]
]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sodium content in beer
]]
.row[.split-50[
.column[.content[

In the beer data with 6 brands and 8 replicates each,

* `\(\boldsymbol{y}\in\Bbb{R}^{48}\)`, `\(\mathbf{X}=\mathbf{1}_{48}\)` and `\(\boldsymbol{\beta}=\mu\)`.

* `\(\mathbf{Z}=\begin{bmatrix}\mathbf{1}_8&\mathbf{0}&\mathbf{0}&\mathbf{0}& \mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{1}_8&\mathbf{0}&\mathbf{0}& \mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{1}_8&\mathbf{0}& \mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{1}_8& \mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}& \mathbf{1}_8&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}& \mathbf{0}&\mathbf{1}_8\end{bmatrix}\)` `\(\in\Bbb{R}^{48\times 6}\)`.

* `\(\boldsymbol{V}=\mathbf{Z}(\sigma^2_A\mathbf{I}_6)\mathbf{Z}^\top+\sigma^2\mathbf{I}_{48}\)` is a block diagonal matrix in which each block along the diagonal has diagonal entries `\(\sigma_A^2+\sigma^2\)` and off-diagonal entries `\(\sigma_A^2\)`.

* The two variance parameters modelling the random effects are `\(\sigma_A^2\)` and `\(\sigma^2\)`; the single fixed effect modelling the mean of the responses is `\(\mu\)`.

]]
.column.bg-brand-gray[.content[
.scroll-box-12[

```r
Z
```

```
   [,1] [,2] [,3] [,4] [,5] [,6]
 1    1    0    0    0    0    0
 2    1    0    0    0    0    0
 3    1    0    0    0    0    0
 4    1    0    0    0    0    0
 5    1    0    0    0    0    0
 6    1    0    0    0    0    0
 7    1    0    0    0    0    0
 8    1    0    0    0    0    0
 9    0    1    0    0    0    0
10    0    1    0    0    0    0
11    0    1    0    0    0    0
12    0    1    0    0    0    0
13    0    1    0    0    0    0
14    0    1    0    0    0    0
15    0    1    0    0    0    0
16    0    1    0    0    0    0
17    0    0    1    0    0    0
18    0    0    1    0    0    0
19    0    0    1    0    0    0
20    0    0    1    0    0    0
21    0    0    1    0    0    0
22    0    0    1    0    0    0
23    0    0    1    0    0    0
24    0    0    1    0    0    0
25    0    0    0    1    0    0
26    0    0    0    1    0    0
27    0    0    0    1    0    0
28    0    0    0    1    0    0
29    0    0    0    1    0    0
30    0    0    0    1    0    0
31    0    0    0    1    0    0
32    0    0    0    1    0    0
33    0    0    0    0    1    0
34    0    0    0    0    1    0
35    0    0    0    0    1    0
36    0    0    0    0    1    0
37    0    0    0    0    1    0
38    0    0    0    0    1    0
39    0    0    0    0    1    0
40    0    0    0    0    1    0
41    0    0    0    0    0    1
42    0    0    0    0    0    1
43    0    0    0    0    0    1
44    0    0    0    0    0    1
45    0    0    0    0    0    1
46    0    0    0    0    0    1
47    0    0    0    0    0    1
48    0    0    0    0    0    1
attr(,"assign")
[1] 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$brand
[1] "contr.treatment"
```
]
]]
]]

---

layout: false
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sodium content in beer `\(\mathbf{V}\)` matrix
]]
.row[.content[

```r
V[1:20, 1:19] # take sigma_A^2 = 0.1 and sigma^2 = 1 as an example
```

```
     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
 1 1.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 2 0.1 1.1 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 3 0.1 0.1 1.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 4 0.1 0.1 0.1 1.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 5 0.1 0.1 0.1 0.1 1.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 6 0.1 0.1 0.1 0.1 0.1 1.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 7 0.1 0.1 0.1 0.1 0.1 0.1 1.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 8 0.1 0.1 0.1 0.1 0.1 0.1 0.1 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0
10 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 1.1 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0
11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 1.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0
12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 1.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0
13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 1.1 0.1 0.1 0.1 0.0 0.0 0.0
14 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.1 1.1 0.1 0.1 0.0 0.0 0.0
15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.1 0.1 1.1 0.1 0.0 0.0 0.0
16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.1 0.1 0.1 1.1 0.0 0.0 0.0
17 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.1 0.1 0.1
18 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 1.1 0.1
19 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 1.1
20 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1
```
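The slide prints a ready-made `V`; a minimal sketch of one way to construct such a matrix, using the form `\(\textbf{V}=\sigma^2_A\,\boldsymbol{I}_t\otimes\boldsymbol{J}_r+\sigma^2\textbf{I}_{rt}\)` from earlier (the sketch omits the row/column names shown in the printout):

```r
# V = sigma_A^2 * (I_6 (x) J_8) + sigma^2 * I_48 for the beer layout,
# with the illustrative values sigma_A^2 = 0.1 and sigma^2 = 1.
sigma2_A <- 0.1; sigma2 <- 1
V <- sigma2_A * kronecker(diag(6), matrix(1, 8, 8)) + sigma2 * diag(48)
V[1:3, 1:3]  # 1.1 on the diagonal, 0.1 elsewhere within the first block
```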
]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sodium content in beer, ML vs REML
]]
.row[.split-50[
.column[.content[

```r
M1 = lmer(sodium ~ 1 + (1 | brand), data = dat, REML = FALSE)
summary(M1)
```

```
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: sodium ~ 1 + (1 | brand)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
   157.9    163.6    -76.0    151.9       45 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.89956 -0.68858  0.08401  0.79244  1.64595 

Random effects:
 Groups   Name        Variance Std.Dev.
 brand    (Intercept) 17.713   4.2087  
 Residual              0.716   0.8461  
Number of obs: 48, groups:  brand, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   17.629      1.723   10.23
```
]]
.column.bg-brand-gray[.content[

```r
M2 = lmer(sodium ~ 1 + (1 | brand), data = dat, REML = TRUE)
summary(M2)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: sodium ~ 1 + (1 | brand)
   Data: dat

REML criterion at convergence: 148.9

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.90551 -0.68338  0.08232  0.79247  1.64969 

Random effects:
 Groups   Name        Variance Std.Dev.
 brand    (Intercept) 21.274   4.6123  
 Residual              0.716   0.8461  
Number of obs: 48, groups:  brand, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   17.629      1.887   9.343
```
]]
]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sodium content in beer
]]
.row[.split-50[
.column[.content[

Fixed effect model

```r
M3 = lm(sodium ~ brand, data = dat); anova(M3)
```

```
Analysis of Variance Table

Response: sodium
          Df Sum Sq Mean Sq F value    Pr(>F)    
brand      5 854.53 170.906  238.71 < 2.2e-16 ***
Residuals 42  30.07   0.716                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

* The F-test of `\(H_0: \sigma_A^2=0\)` in the random effect model is the same as the F-test of `\(H_0: \alpha_i=0\)` for all `\(i\)` in the fixed effect model. Since the p-value `\(\approx 0\)`, the random effects are significant.

* For this balanced design the MM and REML estimates of `\(\sigma^2_A\)` coincide: `\(\frac{1}{r}(\text{MS}_A - \text{MSR})=(170.906-0.716)/8= 21.274\)`.

* The ML and REML estimates of `\(\sigma^2\)` and `\(\mu\)` are the same, while `\(\hat{\sigma}^2_A=17.713\)` (ML) and `\(21.274\)` (REML).

]]
.column.bg-brand-gray[.content[

```r
ranef(M1); ranef(M2) # predicted A_i; REML predictions slightly larger
```

```
$brand
  (Intercept)
1   5.9781296
2  -6.9192082
3   1.6997456
4  -0.1285173
5  -3.3994911
6   2.7693416

with conditional variances for "brand" 
```

```
$brand
  (Intercept)
1   5.9831634
2  -6.9250345
3   1.7011768
4  -0.1286256
5  -3.4023537
6   2.7716735

with conditional variances for "brand" 
```
]]
]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# ML vs. REML for linear fixed effect model
]]
.row[.split-50[
.column[.content[

* We will illustrate the difference between ML and REML using `\(\color{red}{\sigma^2}\)` in a linear fixed effect model
`$$\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon}\sim N(\boldsymbol{0},\color{red}{\sigma^2\boldsymbol{I}_n})$$`
where `\(\textbf{X}\)` is an `\(n\times p\)` matrix of full rank.
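The next bullet compares the divisors `\(n-p\)` and `\(n\)`; a minimal numerical sketch of that comparison, reusing `M3` (the fixed effect fit to the beer data from the previous slide):

```r
# ML divides the RSS by n; REML divides by n - p (giving the unbiased MSR).
rss <- sum(resid(M3)^2)                        # 30.07 from the ANOVA table
n <- length(resid(M3)); p <- length(coef(M3))  # n = 48, p = 6
rss / n        # ML-type estimate of sigma^2, approx 0.627 (biased downwards)
rss / (n - p)  # REML-type estimate = MSR = 0.716
```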
* Roughly, the difference between the REML and ML estimates of the variance component `\(\sigma_A^2\)` is comparable to estimating `\(\sigma^2\)` in a fixed-effects regression by `\(\text{RSS}/(n-p)\)` versus `\(\text{RSS}/n\)`.

* So the REML estimate of `\(\sigma_A^2\)` is larger, as seen in the beer data.

* For the REML estimate, we will additionally assume that `\(\textbf{L}_2^\top\textbf{L}_2=\boldsymbol{I}_{n-p}\)` and `\(\textbf{L}_2^\top\textbf{L}_1=\boldsymbol{0}\)` to make the algebra simpler.

]]
.column.bg-brand-gray[.content[

* The default estimation method for linear mixed models in `lmer` is restricted (or residual) maximum likelihood (REML).

* ML estimates can be requested by specifying `REML = FALSE` in the call to `lmer`.

* Generally, REML estimates of variance components are preferred. ML estimates are known to be biased. Although REML estimates are not guaranteed to be unbiased, they are usually less biased than ML estimates.

* .brand-blue[Warning]: the residual likelihood is not unique (it is only defined up to constants that depend on the choice of `\(\textbf{L}_2\)`), so its value should not be compared between different programs!

* For a balanced one-way classification like the beer data, the REML and ML estimates of the fixed effects are identical.

]]
]]