class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[
<br><br><br>
# Variance component estimation

## .black[STAT3022 Applied Linear Models Lecture 33]

<br><br><br>

### .black[2020/02/20]

]]
.column.bg-brand-charcoal[.content.white[

## Today

1. Estimation of variance components

]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Fun fact
]]
.row[.split-60[
.column[.content[

* Shayle R. Searle (1928-2013) was a New Zealand-born Professor of Biological Statistics at Cornell University, a prominent leader in variance component estimation and a proponent of the use of matrix algebra in statistics.

* H. Desmond Patterson and Robin Thompson worked on their seminal paper (1971) at the University of Edinburgh.

* Before arriving at Edinburgh, Patterson worked at Rothamsted Experimental Station while Frank Yates was its Head. Thompson later moved to Rothamsted Experimental Station.

.bottom_abs.width100[
Patterson, H. D. and Thompson, R. (1971) Recovery of inter-block information when block sizes are unequal. *Biometrika* **58** (3), 545-554.
]

]]
.column.bg-brand-gray[.content[
<img src="images/Searle.jpg" width="30%" height="30%">
<img src="images/DesmondPatterson.png" width="50%" height="50%">
]]
]]

---

layout: false
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Moments estimate of variance components
]]
.row[.content[

* Recall that for the one-way ANOVA model II we model `\(Y_{ij}\)` as
`$$Y_{ij} = \color{blue}{\mu} + \color{red}{A_i} + \epsilon_{ij},\quad j=1,\ldots,r,\;\; i=1,\ldots,t,\quad \color{red}{A_i} \sim NID(0,\color{blue}{\sigma^2_A}), \quad \epsilon_{ij} \sim NID(0,\color{blue}{\sigma^2})$$`
where `\(\color{red}{A_i}\)` is a random variable independent of `\(\epsilon_{ij}\)`.

* Now `\(\displaystyle \hspace{5mm} E\left(\text{MS}_A\right) = E\left(\frac{\text{SS}_A}{t - 1}\right) = \color{blue}{\sigma^2} + r \color{blue}{\sigma^2_A}\ \text{and}\ E(\text{MSR}) = E\left(\frac{\text{RSS}}{rt - t}\right) = \color{blue}{\sigma^2}\)` where `\(\text{SS}_A=\sum\limits_{i=1}^t r(\bar{Y}_{i\bullet}-\bar{Y}_{\bullet \bullet})^2.\)`

* By the .brand-blue[method of moments] (MM), we have
`$$\color{blue}{\hat{\sigma}^2} = \text{MSR} \qquad\text{and}\qquad \color{blue}{\hat{\sigma}^2_A} = \frac{1}{r}(\text{MS}_A - \text{MSR}).$$`

* MM estimates of variance components cannot be readily obtained for incomplete designs.

]]

---

layout: false
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# ML estimate of variance components
]]
.row[.content[

* Rewriting the model in matrix notation, we have `\(\hspace{5mm}\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\color{blue}{\textbf{Z}\boldsymbol{u}}+\boldsymbol{\epsilon}\)` <br> where `\(\begin{bmatrix}\boldsymbol{u}\\ \boldsymbol{\epsilon}\end{bmatrix}\)` `\(\sim\)` `\(N\left(\begin{bmatrix}\boldsymbol{0}\\ \boldsymbol{0}\end{bmatrix} \right.\)`, `\(\left.\color{red}{\begin{bmatrix}\sigma^2_A\boldsymbol{I}_t&\boldsymbol{0}\\ \boldsymbol{0} & \sigma^2\boldsymbol{I}_{rt}\end{bmatrix}}\right)\)`, `\(\textbf{X}=\boldsymbol{1}_{rt}\)`, `\(\boldsymbol{\beta}=\mu\)`, `\(\color{blue}{\textbf{Z}}=\color{blue}{\boldsymbol{I}_t \otimes \boldsymbol{1}_r}\)` and `\(\color{blue}{\boldsymbol{u}}=\color{blue}{(A_1, \ldots, A_t)^\top}\)`.
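A minimal R sketch of these design matrices (hard-coding `t = 6` groups and `r = 8` replicates to match the beer example on later slides; `kronecker()` is base R's `\(\otimes\)`):

```r
# Design matrices for the one-way random effects model:
# t groups with r replicates each, so n = rt observations.
t <- 6; r <- 8                                          # matches the beer data
X <- matrix(1, nrow = r * t, ncol = 1)                  # X = 1_rt (intercept only)
Z <- kronecker(diag(t), matrix(1, nrow = r, ncol = 1))  # Z = I_t (x) 1_r
dim(Z)                                                  # 48 x 6
```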
* Then marginally (integrating out the random effects), we have `\(\boldsymbol{y} \sim N(\textbf{X}\boldsymbol{\beta}, \color{red}{\textbf{V}})\)` where `\(\color{red}{\textbf{V}}=\color{red}{\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}}\)` `\(=\color{red}{\sigma^2_A\textbf{Z}\textbf{Z}^\top+\sigma^2\textbf{I}_{rt}}= \color{red}{\sigma^2_A\boldsymbol{I}_t\otimes\boldsymbol{J}_r+\sigma^2\textbf{I}_{rt}}\)` and `\(\boldsymbol{J}_r\)` is an `\(r\times r\)` matrix of ones.

* This `\(\textbf{V}\)` is a block diagonal matrix in which each block along the diagonal has diagonal entries `\(\sigma_A^2+\sigma^2\)` and off-diagonal entries `\(\sigma_A^2\)`.

* The .brand-blue[profile] (with the fixed effects substituted by their estimates) marginal log-likelihood is given as
`$$\ell_p=-\frac{rt}{2}\log(2\pi)-\frac{1}{2}\log\mid \hspace{-1mm}\textbf{V}\hspace{-1mm}\mid -\frac{1}{2}(\boldsymbol{y}-\textbf{X}\hat{\boldsymbol{\beta}})^\top \textbf{V}^{-1}(\boldsymbol{y}-\textbf{X}\hat{\boldsymbol{\beta}})$$`
versus the conditional (on `\(\boldsymbol{u}\)`) likelihood `\(\ell=-\frac{rt}{2}\log(2\pi)-\frac{1}{2}\log |\textbf{R}|-\frac{1}{2}(\boldsymbol{y}- \textbf{X}\boldsymbol{\beta}-\textbf{Z}\boldsymbol{u})^\top\textbf{R}^{-1}(\boldsymbol{y}-\textbf{X}\boldsymbol{\beta}-\textbf{Z}\boldsymbol{u})\)`.

* The maximum likelihood (ML) estimates of the variance components are found from `\(\text{argmax}_{\sigma^2_A,\sigma^2} \ \ell_p.\)`

]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Residual (Restricted) likelihood
]]
.row[.split-90[
.column[.content[

* Assume that the design matrix `\(\textbf{X}\)` is a full-rank `\(n\times p\)` matrix. Suppose that `\(\textbf{L}=[\textbf{L}_1 \ \textbf{L}_2]\)` is an `\(n\times n\)` matrix, where `\(\textbf{L}_1\)` and `\(\textbf{L}_2\)` are `\(n\times p\)` and `\(n\times (n-p)\)` matrices, respectively, such that `\(\textbf{L}_2^\top\textbf{X}=\boldsymbol{0}\)`.

* `\(\displaystyle\textbf{L}^\top\boldsymbol{Y}=\begin{bmatrix}\textbf{L}_1^\top\boldsymbol{Y}\\ \textbf{L}_2^\top\boldsymbol{Y}\end{bmatrix}=\begin{bmatrix}\boldsymbol{Y}_1\\ \boldsymbol{Y}_2 \end{bmatrix} \sim N\left(\begin{bmatrix}\textbf{L}_1^\top\textbf{X}\boldsymbol{\beta}\\ \boldsymbol{0}\end{bmatrix},\begin{bmatrix}\textbf{L}_1^\top \color{red}{\textbf{V}}\textbf{L}_1&\textbf{L}_1^\top\color{red}{\textbf{V}}\textbf{L}_2\\ \textbf{L}_2^\top\color{red}{\textbf{V}}\textbf{L}_1&\textbf{L}_2^\top\color{red}{\textbf{V}}\textbf{L}_2\end{bmatrix}\right).\)`

* The marginal likelihood based on `\(\boldsymbol{Y}_2 \sim N(\boldsymbol{0},\textbf{L}_2^\top\color{red}{\textbf{V}}\textbf{L}_2)\)` is called the .brand-blue[residual likelihood].

* The residual log-likelihood is given as
`$$\ell_R=-\frac{n-p}{2}\log(2\pi)-\frac{1}{2}\log |\textbf{L}_2^\top\color{red}{\textbf{V}}\textbf{L}_2|- \frac{1}{2}\boldsymbol{y}_2^\top(\textbf{L}_2^\top\color{red}{\textbf{V}}\textbf{L}_2)^{-1}\boldsymbol{y}_2$$`
which can also be expressed as (proof out of scope for this course)
`$$\ell_R=-\frac{1}{2}\log |\color{red}{\textbf{V}}|-\frac{1}{2}(\boldsymbol{y}- \textbf{X}\hat{\boldsymbol{\beta}})^\top\color{red}{\textbf{V}}^{-1}(\boldsymbol{y}-\textbf{X}\hat{\boldsymbol{\beta}})-\frac{1}{2}\log|\textbf{X}^\top\color{red}{\textbf{V}}^{-1}\textbf{X}|+\text{constant}$$`

* So, up to an additive constant, the difference between the residual and profile log-likelihoods is `\(\ell_R(\textbf{V})=\ell_p(\textbf{V})-\frac{1}{2}\log|\textbf{X}^\top\textbf{V}^{-1}\textbf{X}|\)`.

* The REML estimate is found from `\(\text{argmax}_{\sigma^2_A,\sigma^2}\ \ell_R\)`.
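Both criteria can be maximised numerically. Below is a minimal sketch (not how `lmer` computes things internally); it assumes a response vector `y` together with the `X` and `Z` constructed earlier, parameterises the variances on the log scale, and drops additive constants:

```r
# Negative profile (ML) / residual (REML) log-likelihood, constants dropped.
# theta holds log(sigma_A^2) and log(sigma^2) so the search is unconstrained.
negll <- function(theta, y, X, Z, reml = FALSE) {
  s2A <- exp(theta[1]); s2 <- exp(theta[2])
  n <- length(y)
  V    <- s2A * tcrossprod(Z) + s2 * diag(n)  # V = sigma_A^2 ZZ' + sigma^2 I
  Vinv <- solve(V)
  bhat <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)  # GLS beta-hat
  e    <- y - X %*% bhat
  ll   <- -0.5 * determinant(V)$modulus - 0.5 * t(e) %*% Vinv %*% e
  if (reml)  # the extra term that turns the profile into the residual likelihood
    ll <- ll - 0.5 * determinant(t(X) %*% Vinv %*% X)$modulus
  -as.numeric(ll)
}
# ML:   exp(optim(c(0, 0), negll, y = y, X = X, Z = Z)$par)
# REML: exp(optim(c(0, 0), negll, y = y, X = X, Z = Z, reml = TRUE)$par)
```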
]]
.column.bg-brand-gray[.content[
]]
]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sodium content in beer
]]
.row[.split-50[
.column[.content[

In the beer data with 6 brands and 8 replicates each,

* `\(\boldsymbol{y}\in\Bbb{R}^{48}\)`, `\(\mathbf{X}=\mathbf{1}_{48}\)` and `\(\boldsymbol{\beta}=\mu\)`.

* `\(\mathbf{Z}=\begin{bmatrix}\mathbf{1}_8&\mathbf{0}&\mathbf{0}&\mathbf{0}& \mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{1}_8&\mathbf{0}&\mathbf{0}& \mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{1}_8&\mathbf{0}& \mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{1}_8& \mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}& \mathbf{1}_8&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}& \mathbf{0}&\mathbf{1}_8\end{bmatrix}\)` `\(\in\Bbb{R}^{48\times 6}\)`.

* `\(\boldsymbol{V}=\mathbf{Z}(\sigma^2_A\mathbf{I}_6)\mathbf{Z}^\top+\sigma^2\mathbf{I}_{48}\)` is a block diagonal matrix in which each block along the diagonal has diagonal entries `\(\sigma_A^2+\sigma^2\)` and off-diagonal entries `\(\sigma_A^2\)`.

* The two variance parameters modelling the random effects are `\(\sigma_A^2\)` and `\(\sigma^2\)`; the single fixed effect modelling the mean of the responses is `\(\mu\)`.

]]
.column.bg-brand-gray[.content[
.scroll-box-12[

```r
Z
```

```
   [,1] [,2] [,3] [,4] [,5] [,6]
 1    1    0    0    0    0    0
 2    1    0    0    0    0    0
 3    1    0    0    0    0    0
 4    1    0    0    0    0    0
 5    1    0    0    0    0    0
 6    1    0    0    0    0    0
 7    1    0    0    0    0    0
 8    1    0    0    0    0    0
 9    0    1    0    0    0    0
10    0    1    0    0    0    0
11    0    1    0    0    0    0
12    0    1    0    0    0    0
13    0    1    0    0    0    0
14    0    1    0    0    0    0
15    0    1    0    0    0    0
16    0    1    0    0    0    0
17    0    0    1    0    0    0
18    0    0    1    0    0    0
19    0    0    1    0    0    0
20    0    0    1    0    0    0
21    0    0    1    0    0    0
22    0    0    1    0    0    0
23    0    0    1    0    0    0
24    0    0    1    0    0    0
25    0    0    0    1    0    0
26    0    0    0    1    0    0
27    0    0    0    1    0    0
28    0    0    0    1    0    0
29    0    0    0    1    0    0
30    0    0    0    1    0    0
31    0    0    0    1    0    0
32    0    0    0    1    0    0
33    0    0    0    0    1    0
34    0    0    0    0    1    0
35    0    0    0    0    1    0
36    0    0    0    0    1    0
37    0    0    0    0    1    0
38    0    0    0    0    1    0
39    0    0    0    0    1    0
40    0    0    0    0    1    0
41    0    0    0    0    0    1
42    0    0    0    0    0    1
43    0    0    0    0    0    1
44    0    0    0    0    0    1
45    0    0    0    0    0    1
46    0    0    0    0    0    1
47    0    0    0    0    0    1
48    0    0    0    0    0    1
attr(,"assign")
[1] 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$brand
[1] "contr.treatment"
```
]
]]
]]

---

layout: false
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sodium content in beer `\(\mathbf{V}\)` matrix
]]
.row[.content[

```r
V[1:20, 1:19] # take sigma_A^2 = 0.1 and sigma^2 = 1 as an example
```

```
     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
 1 1.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 2 0.1 1.1 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 3 0.1 0.1 1.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 4 0.1 0.1 0.1 1.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 5 0.1 0.1 0.1 0.1 1.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 6 0.1 0.1 0.1 0.1 0.1 1.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 7 0.1 0.1 0.1 0.1 0.1 0.1 1.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 8 0.1 0.1 0.1 0.1 0.1 0.1 0.1 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0
10 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 1.1 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0
11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 1.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0
12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 1.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0
13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 1.1 0.1 0.1 0.1 0.0 0.0 0.0
14 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.1 1.1 0.1 0.1 0.0 0.0 0.0
15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.1 0.1 1.1 0.1 0.0 0.0 0.0
16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.1 0.1 0.1 1.1 0.0 0.0 0.0
17 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.1 0.1 0.1
18 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 1.1 0.1
19 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 1.1
20 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1
```
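The slide prints a ready-made `V`; a minimal sketch of one way to construct such a matrix, using the form `\(\textbf{V}=\sigma^2_A\,\boldsymbol{I}_t\otimes\boldsymbol{J}_r+\sigma^2\textbf{I}_{rt}\)` from earlier (the sketch omits the row/column names shown in the printout):

```r
# V = sigma_A^2 * (I_6 (x) J_8) + sigma^2 * I_48 for the beer layout,
# with the illustrative values sigma_A^2 = 0.1 and sigma^2 = 1.
sigma2_A <- 0.1; sigma2 <- 1
V <- sigma2_A * kronecker(diag(6), matrix(1, 8, 8)) + sigma2 * diag(48)
V[1:3, 1:3]  # 1.1 on the diagonal, 0.1 elsewhere within the first block
```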
]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sodium content in beer, ML vs REML
]]
.row[.split-50[
.column[.content[

```r
M1 = lmer(sodium ~ 1 + (1 | brand), data = dat, REML = FALSE)
summary(M1)
```

```
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: sodium ~ 1 + (1 | brand)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
   157.9    163.6    -76.0    151.9       45 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.89956 -0.68858  0.08401  0.79244  1.64595 

Random effects:
 Groups   Name        Variance Std.Dev.
 brand    (Intercept) 17.713   4.2087  
 Residual              0.716   0.8461  
Number of obs: 48, groups:  brand, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   17.629      1.723   10.23
```
]]
.column.bg-brand-gray[.content[

```r
M2 = lmer(sodium ~ 1 + (1 | brand), data = dat, REML = TRUE)
summary(M2)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: sodium ~ 1 + (1 | brand)
   Data: dat

REML criterion at convergence: 148.9

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.90551 -0.68338  0.08232  0.79247  1.64969 

Random effects:
 Groups   Name        Variance Std.Dev.
 brand    (Intercept) 21.274   4.6123  
 Residual              0.716   0.8461  
Number of obs: 48, groups:  brand, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   17.629      1.887   9.343
```
]]
]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sodium content in beer
]]
.row[.split-50[
.column[.content[

Fixed effect model

```r
M3 = lm(sodium ~ brand, data = dat); anova(M3)
```

```
Analysis of Variance Table

Response: sodium
          Df Sum Sq Mean Sq F value    Pr(>F)    
brand      5 854.53 170.906  238.71 < 2.2e-16 ***
Residuals 42  30.07   0.716                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

* The F-test of `\(H_0: \sigma_A^2=0\)` in the random effect model is the same as the F-test of `\(H_0: \alpha_i=0\)` for all `\(i\)` in the fixed effect model. Since the p-value `\(\approx 0\)`, the random effects are significant.

* For this balanced design the MM and REML estimates of `\(\sigma^2_A\)` coincide: `\(\frac{1}{r}(\text{MS}_A - \text{MSR})=(170.906-0.716)/8= 21.274\)`.

* The ML and REML estimates of `\(\sigma^2\)` and `\(\mu\)` are the same, while `\(\hat{\sigma}^2_A=17.713\)` (ML) and `\(21.274\)` (REML).

]]
.column.bg-brand-gray[.content[

```r
ranef(M1); ranef(M2) # predicted A_i; REML predictions slightly larger
```

```
$brand
  (Intercept)
1   5.9781296
2  -6.9192082
3   1.6997456
4  -0.1285173
5  -3.3994911
6   2.7693416

with conditional variances for "brand" 
```

```
$brand
  (Intercept)
1   5.9831634
2  -6.9250345
3   1.7011768
4  -0.1286256
5  -3.4023537
6   2.7716735

with conditional variances for "brand" 
```
]]
]]

---

class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# ML vs. REML for linear fixed effect model
]]
.row[.split-50[
.column[.content[

* We will illustrate the difference between ML and REML using `\(\color{red}{\sigma^2}\)` in a linear fixed effect model
`$$\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon}\sim N(\boldsymbol{0},\color{red}{\sigma^2\boldsymbol{I}_n})$$`
where `\(\textbf{X}\)` is an `\(n\times p\)` matrix of full rank.
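The next bullet compares the divisors `\(n-p\)` and `\(n\)`; a minimal numerical sketch of that comparison, reusing `M3` (the fixed effect fit to the beer data from the previous slide):

```r
# ML divides the RSS by n; REML divides by n - p (giving the unbiased MSR).
rss <- sum(resid(M3)^2)                        # 30.07 from the ANOVA table
n <- length(resid(M3)); p <- length(coef(M3))  # n = 48, p = 6
rss / n        # ML-type estimate of sigma^2, approx 0.627 (biased downwards)
rss / (n - p)  # REML-type estimate = MSR = 0.716
```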
* Roughly, the difference between the REML and ML estimates of the variance component `\(\sigma_A^2\)` is comparable to estimating `\(\sigma^2\)` in a fixed-effects regression by `\(\text{RSS}/(n-p)\)` versus `\(\text{RSS}/n\)`.

* So the REML estimate of `\(\sigma_A^2\)` is larger, as seen in the beer data.

* For the REML estimate, we will additionally assume that `\(\textbf{L}_2^\top\textbf{L}_2=\boldsymbol{I}_{n-p}\)` and `\(\textbf{L}_2^\top\textbf{L}_1=\boldsymbol{0}\)` to make the algebra simpler.

]]
.column.bg-brand-gray[.content[

* The default estimation method for linear mixed models in `lmer` is restricted (or residual) maximum likelihood (REML).

* ML estimates can be requested by specifying `REML = FALSE` in the call to `lmer`.

* Generally, REML estimates of variance components are preferred. ML estimates are known to be biased. Although REML estimates are not guaranteed to be unbiased, they are usually less biased than ML estimates.

* .brand-blue[Warning]: the residual likelihood is not unique (it is only defined up to constants that depend on the choice of `\(\textbf{L}_2\)`), so its value should not be compared between different programs!

* For a balanced one-way classification like the beer data, the REML and ML estimates of the fixed effects are identical.

]]
]]