class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[
<br><br><br>
# Linear Mixed Models

## .black[STAT3022 Applied Linear Models Lecture 32]

<br><br><br>
### .black[2020/02/20]
]]
.column.bg-brand-charcoal[.content.white[
## Today

1. Linear mixed models
2. More lme4
3. More random vs. fixed effects
4. Borrowing strength
5. Mixed model equations
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Fun fact
]]
.row[.split-60[
.column[.content[
* Charles Roy Henderson (1911-1989) applied quantitative methods to the genetic evaluation of domestic livestock.
* He worked in the Department of Animal Science at Cornell University from 1948 until his retirement in 1976 and was the PhD supervisor of Shayle Searle.
* He originated the idea of mixed model equations.
* Douglas Bates, Emeritus Professor of Statistics at the University of Wisconsin, is the author and maintainer of the R package `lme4`.
]]
.column.bg-brand-gray[.content[
<img src="images/PackageDevelopers2.png" width="70%" height="70%">
]]
]]

---
layout: false
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Linear mixed model
]]
.row[.content[
* The response `\(n\times 1\)` vector `\(\boldsymbol{Y}\)` is modelled as
`$$\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\textbf{Z}\boldsymbol{u}+\boldsymbol{\epsilon},\quad\text{where}\quad E\begin{bmatrix}\boldsymbol{u} \\ \boldsymbol{\epsilon} \end{bmatrix}=\begin{bmatrix}\mathbf{0} \\ \mathbf{0} \end{bmatrix}\quad\text{and}\quad\text{var} \begin{bmatrix}\boldsymbol{u} \\ \boldsymbol{\epsilon} \end{bmatrix}=\begin{bmatrix}\textbf{G} & \mathbf{0} \\ \mathbf{0} & \textbf{R} \end{bmatrix},$$`
`\(\textbf{X}\)` is the `\(n\times p\)` design matrix for the fixed effects `\(\boldsymbol{\beta}\)`, <br>
`\(\textbf{Z}\)` is the `\(n\times q\)` design matrix for the random effects `\(\boldsymbol{u}\)`, <br>
`\(\boldsymbol{\epsilon}\)` is the `\(n\times 1\)` vector of errors.
* The above model is referred to as a .brand-blue[linear mixed model]. Some also refer to it as a .brand-blue[mixed linear model], .brand-blue[mixed-effects model], .brand-blue[linear mixed-effects model], hierarchical model, multi-level model, nested model (the latter three usually refer to a specific structure in the data) `\(\ldots\)`
* `$$\text{We assume that }\begin{bmatrix} \boldsymbol{u}\\ \boldsymbol{\epsilon} \end{bmatrix}\sim N\left(\begin{bmatrix} \boldsymbol{0}\\ \boldsymbol{0}\end{bmatrix}, \begin{bmatrix}\textbf{G} & \mathbf{0}\\ \mathbf{0} & \textbf{R} \end{bmatrix}\right), \ \text{hence marginally } \boldsymbol{Y}\sim N(\textbf{X}\boldsymbol{\beta},\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}). \hspace{100mm}$$`
* Note that marginally, `\(\text{Var}(\boldsymbol{Y})=\textbf{V}=\text{Var}(\textbf{Z}\boldsymbol{u})+\text{Var}(\boldsymbol{\epsilon})=\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}\)`, where we often assume `\(\textbf{R}= \sigma^2\boldsymbol{I}_n\)`.
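A minimal base-R simulation sketch can confirm the marginal covariance formula (the group structure, `\(\boldsymbol{\beta}\)`, and variance values below are illustrative assumptions, not from these slides): simulate many replicates of `\(\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\textbf{Z}\boldsymbol{u}+\boldsymbol{\epsilon}\)` with one random intercept per group and compare the empirical covariance of `\(\boldsymbol{Y}\)` with `\(\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}\)`.

```r
set.seed(1)
n <- 6; q <- 3
grp <- factor(rep(1:3, each = 2))       # 3 groups of 2 observations (assumed)
X <- cbind(1, c(0, 0, 1, 1, 0, 1))      # n x p fixed-effects design (assumed)
Z <- model.matrix(~ grp - 1)            # n x q random-effects design
beta <- c(10, 2)                        # fixed effects (assumed)
G <- diag(0.5, q)                       # Var(u) = 0.5 I_3 (assumed)
R <- diag(1, n)                         # Var(eps) = I_n
V <- Z %*% G %*% t(Z) + R               # theoretical Var(Y)

nsim <- 20000
Ysim <- replicate(nsim, {
  u <- rnorm(q, sd = sqrt(0.5))
  eps <- rnorm(n)
  as.vector(X %*% beta + Z %*% u + eps)
})
Vhat <- cov(t(Ysim))                    # empirical Var(Y)
max(abs(Vhat - V))                      # close to 0 (sampling error only)
```

The diagonal of `V` is `1 + 0.5` (two sources of variability) and observations sharing a group have covariance `0.5`, exactly the `\(\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}\)` pattern.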
]]

---
layout: false
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Mixed model equations
]]
.row[.content[
* The log-density function of the joint (conditional) distribution of `\(\boldsymbol{y}\)` and `\(\boldsymbol{u}\)` is given by
`$$\ell=-\frac{1}{2}\log |\textbf{R}|-\frac{1}{2}(\boldsymbol{y}- \textbf{X}\boldsymbol{\beta}-\textbf{Z}\boldsymbol{u})^\top\textbf{R}^{-1}(\boldsymbol{y}-\textbf{X}\boldsymbol{\beta}-\textbf{Z}\boldsymbol{u})-\frac{1}{2}\log|\textbf{G}|-\frac{1}{2}\boldsymbol{u}^\top\textbf{G}^{-1}\boldsymbol{u}+\text{constant}.$$`
* The `\(( \hat{\boldsymbol{\beta}}, \tilde{\boldsymbol{u}} )\)` that jointly maximises `\(\ell\)` (assuming `\(\textbf{R}\)` and `\(\textbf{G}\)` are known) leads to the .brand-blue[mixed model equations] (MME), sometimes referred to as Henderson's equations:
`$$\begin{bmatrix} \textbf{X}^\top\textbf{R}^{-1}\textbf{X} & \textbf{X}^\top\textbf{R}^{-1}\textbf{Z} \\ \textbf{Z}^\top\textbf{R}^{-1}\textbf{X} & \textbf{Z}^\top\textbf{R}^{-1}\textbf{Z} + \textbf{G}^{-1}\\ \end{bmatrix} \begin{bmatrix} \hat{\boldsymbol{\beta}}\\ \tilde{\boldsymbol{u}}\\ \end{bmatrix} = \begin{bmatrix} \textbf{X}^\top\textbf{R}^{-1}\boldsymbol{y} \\ \textbf{Z}^\top\textbf{R}^{-1}\boldsymbol{y} \\ \end{bmatrix}. $$`
* This gives the .brand-blue[best linear unbiased estimate] (BLUE) `\(\hat{\boldsymbol{\beta}}\)` and the .brand-blue[best linear unbiased predictor] (BLUP) `\(\tilde{\boldsymbol{u}}\)` (assuming that `\(\textbf{X}\)` is full rank)
`\begin{eqnarray*} \hat{\boldsymbol{\beta}}&=&(\textbf{X}^\top\textbf{V}^{-1}\textbf{X})^{-1}\textbf{X}^\top\textbf{V}^{-1}\boldsymbol{y}\\ \tilde{\boldsymbol{u}}&=&\textbf{G}\textbf{Z}^\top\textbf{V}^{-1}(\boldsymbol{y}-\textbf{X}\hat{\boldsymbol{\beta}}). \end{eqnarray*}`
* When the estimated variance parameters are .brand-blue[plugged into] the above solutions, we refer to them as the empirical BLUE (E-BLUE) and empirical BLUP (E-BLUP).
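A quick base-R check (using the dairy-cow data from the example that follows, where `\(\textbf{G} = 0.1\boldsymbol{I}_4\)` and `\(\textbf{R} = \boldsymbol{I}_9\)` are taken as known) that solving the MME gives the same answer as the GLS/BLUP formulas:

```r
Y <- c(100, 110, 110, 100, 100, 110, 110, 100, 100)
Herd <- factor(c(1, 1, 2, 2, 2, 3, 3, 3, 3))
Sire <- factor(c("A", "D", "B", "A", "A", "C", "C", "A", "A"))
X <- model.matrix(~ Herd - 1)
Z <- model.matrix(~ Sire - 1)
G <- diag(0.1, 4); R <- diag(1, 9)
RI <- solve(R); GI <- solve(G)

# Coefficient matrix and right-hand side of Henderson's equations
LHS <- rbind(cbind(t(X) %*% RI %*% X, t(X) %*% RI %*% Z),
             cbind(t(Z) %*% RI %*% X, t(Z) %*% RI %*% Z + GI))
RHS <- rbind(t(X) %*% RI %*% Y, t(Z) %*% RI %*% Y)
sol <- solve(LHS, RHS)            # stacked (beta-hat, u-tilde)

# Same answer via the GLS/BLUP formulas with V = ZGZ' + R
V <- Z %*% G %*% t(Z) + R; VI <- solve(V)
beta <- solve(t(X) %*% VI %*% X, t(X) %*% VI %*% Y)
u <- G %*% t(Z) %*% VI %*% (Y - X %*% beta)
max(abs(sol - rbind(beta, u)))    # numerically zero
```

The advantage of the MME is that they avoid inverting the `\(n\times n\)` matrix `\(\textbf{V}\)`; only `\(\textbf{R}^{-1}\)` and `\(\textbf{G}^{-1}\)` (often diagonal) are needed.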
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Dairy cow - BLUP is a good estimator
]]
.row[.split-70[
.column[.content[
The first lactation yields of the daughters of four sires were recorded.
* To measure the sire additive genetic merits, we model the yield `\(\boldsymbol{Y}\)` as
`$$\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\textbf{Z}\boldsymbol{u}+\boldsymbol{\epsilon}$$`
where
* `\(\boldsymbol{\beta} = (\beta_1, \beta_2, \beta_3)^\top\)` are the (fixed) herd effects; and
* `\(\boldsymbol{u} = (u_A, u_B, u_C, u_D)^\top\)` are the (random) sire effects.
* For simplicity we assume that the variance components are known:
`$$\begin{bmatrix} \boldsymbol{u} \\ \boldsymbol{\epsilon} \end{bmatrix} \sim N\left(\begin{bmatrix} \boldsymbol{0}\\ \boldsymbol{0}\\ \end{bmatrix}, \begin{bmatrix} 0.1\boldsymbol{I}_4 & \boldsymbol{0}\\ \boldsymbol{0} & \boldsymbol{I}_9 \end{bmatrix} \right).$$`
]]
.column.bg-brand-gray[.content[
<img src="images/DairyCows.jpg" width="50%" height="50%">

|Herd | Sire | Yield |
|---|---|---|
| 1 | A | 100 |
| 1 | D | 110 |
| 2 | B | 110 |
| 2 | A | 100 |
| 2 | A | 100 |
| 3 | C | 110 |
| 3 | C | 110 |
| 3 | A | 100 |
| 3 | A | 100 |
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Dairy cow - BLUP is a good estimator
]]
.row[.split-70[
.column[.content[
* That is, we have `\(\textbf{G} = 0.1\boldsymbol{I}_4\)`, `\(\textbf{R} = \boldsymbol{I}_9\)` where `\(\sigma_A^2=0.1\)` and `\(\sigma^2=1\)`,
`$$\textbf{X}=\begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 1\\ 0 & 0 & 1\\ 0 & 0 & 1\\ \end{bmatrix},\quad\text{and}\quad\textbf{Z}=\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ \end{bmatrix}.$$`
* The BLUE and BLUP are then given as
`\begin{eqnarray*} \boldsymbol{\beta}&=&(105.64, 104.28, 105.46)^\top,\\
\boldsymbol{u}&=&(-1.67,0.52,0.76,0.40)^\top. \end{eqnarray*}`
]]
.column.bg-brand-gray[.content[
<br>

|Herd | Sire | Yield |
|---|---|---|
| 1 | A | 100 |
| 1 | D | 110 |
| 2 | B | 110 |
| 2 | A | 100 |
| 2 | A | 100 |
| 3 | C | 110 |
| 3 | C | 110 |
| 3 | A | 100 |
| 3 | A | 100 |
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: BLUE and BLUP for the random effects model
]]
.row[.split-50[
.column[.content[
```r
Yield=c(100,110,110,100,100,110,110,100,100); Y=Yield
Herd=c(1,1,2,2,2,3,3,3,3)
Sire=c("A","D","B","A","A","C","C","A","A")
X=model.matrix(~factor(Herd)-1)
Z=model.matrix(~Sire-1)
G=diag(0.1,4,4); R=diag(1,9,9)
V=Z%*%G%*%t(Z)+R; VI=solve(V)
beta=solve(t(X)%*%VI%*%X)%*%(t(X)%*%VI%*%Y); beta
```

```
                  [,1]
factor(Herd)1 105.6387
factor(Herd)2 104.2757
factor(Herd)3 105.4584
```

```r
u=G%*%t(Z)%*%VI%*%(Y-X%*%beta); u
```

```
           [,1]
[1,] -1.6738004
[2,]  0.5203875
[3,]  0.7569272
[4,]  0.3964857
```
]]
.column.bg-brand-gray[.content[
```r
V # Var(Y)
```

```
    1   2   3   4   5   6   7   8   9
1 1.1 0.0 0.0 0.1 0.1 0.0 0.0 0.1 0.1
2 0.0 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 1.1 0.0 0.0 0.0 0.0 0.0 0.0
4 0.1 0.0 0.0 1.1 0.1 0.0 0.0 0.1 0.1
5 0.1 0.0 0.0 0.1 1.1 0.0 0.0 0.1 0.1
6 0.0 0.0 0.0 0.0 0.0 1.1 0.1 0.0 0.0
7 0.0 0.0 0.0 0.0 0.0 0.1 1.1 0.0 0.0
8 0.1 0.0 0.0 0.1 0.1 0.0 0.0 1.1 0.1
9 0.1 0.0 0.0 0.1 0.1 0.0 0.0 0.1 1.1
```

* Observations 1, 4, 5, 8 and 9 are from sire A, so the off-diagonal entries in these rows and columns are nonzero covariances (0.1), showing their dependence.
* The diagonal entries are the variances 1 + 0.1, reflecting the two sources of variability.
* Similarly for observations 6 and 7, which are from sire C.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Dairy cow - BLUP is a good estimator
]]
.row[.split-70[
.column[.content[
* Now assume that the sire effects are .brand-blue[fixed].
* As the fixed design matrix is no longer full rank, we arbitrarily constrain one sire effect to zero, say `\(u_A=0\)`.
* Then the solution is
`\begin{eqnarray*} \boldsymbol{\beta}&=&(100,100,100)^\top, \\ \boldsymbol{u}&=&(0,10,10,10)^\top. \end{eqnarray*}`
* The BLUP takes into account the fact that the sire effects vary less (`\(\sigma_A^2=0.1\)`) than the lactation yields of the daughters of a single sire (`\(\sigma^2=1\)`).

```r
coef(lm(Yield~factor(Herd)+Sire-1))
```

```
factor(Herd)1 factor(Herd)2 factor(Herd)3         SireB         SireC
          100           100           100            10            10
        SireD
           10
```
]]
.column.bg-brand-gray[.content[
<br>

|Herd | Sire | Yield |
|---|---|---|
| 1 | A | 100 |
| 1 | D | 110 |
| 2 | B | 110 |
| 2 | A | 100 |
| 2 | A | 100 |
| 3 | C | 110 |
| 3 | C | 110 |
| 3 | A | 100 |
| 3 | A | 100 |
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Dairy cow - BLUP is a good estimator
]]
.row[.split-70[
.column[.content[
* The extent to which a sire effect is regressed to the mean (called .brand-blue[shrinkage]) depends on the amount of information available concerning the sire.
* E.g. sire C's effect is shrunk less (0.76) than sire B's (0.52) and sire D's (0.40), even though the lactation yields of their daughters are all the same (110): sire C has two daughter records, while sires B and D have one each.
* Consequently, it may be better to model sire effects as random rather than fixed.
* In animal and plant breeding, genotype effects are often fitted as random when the aim is selection.
]]
.column.bg-brand-gray[.content[
<br>

|Herd | Sire | Yield |
|---|---|---|
| 1 | A | 100 |
| 1 | D | 110 |
| 2 | B | 110 |
| 2 | A | 100 |
| 2 | A | 100 |
| 3 | C | 110 |
| 3 | C | 110 |
| 3 | A | 100 |
| 3 | A | 100 |
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Performance test
]]
.row[.split-50[
.column[.content[
The half-marathon completion times of seven men were recorded when they were 18 years old and again when they were 30 years old.
* The completion time appears to depend largely on the subject and their age.
* To measure the performance of each subject by age, we could model the completion time `\(Y_{ijk}\)` as
`$$Y_{ijk} =(\alpha\beta)_{ij} + \epsilon_{ijk}$$`
where `\((\alpha\beta)_{ij}\)` is the combined subject-age effect; `\(i=1,\ldots,7\)`; `\(j=1,2\)` and `\(\epsilon_{ijk}\sim N(0, \sigma^2)\)`. Note: this model is equivalent to a one-way ANOVA with all subject-age group combinations.
* Sample size `\(n=23\)`.
]]
.column.bg-brand-gray[.content[
<br>

| | 18 years | 18 years | 30 years | 30 years |
|---|---|---|---|---|
| **Subject** | **Race 1** | **Race 2** | **Race 1** | **Race 2**|
|1 | 376 | 364 | 434| 414 |
|2 | 227 | 220 | -| 244 |
|3 | 252 | 254 | -| 320 |
|4 | - | 335| -| 368 |
|5 | 341 | 334 | -| 373 |
|6 | 130 | 108 | 150| 183 |
|7 | 229 | 238 | 247| 282 |
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Performance test: Fixed and random effects models
]]
.row[.split-50[
.column[.content[
The .brand-blue[fixed effects] model estimates are simply the raw average completion times by subject and age.

```r
Time=c(376,227,252,341,130,229,364,220,254,335,334,
       108,238,434,150,247,414,244,320,368,373,183,282)
Sub=factor(paste0("S",c(1:3,5:7,1:7,1,6,7,1:7)))
Age=factor(c(rep("Y18", 13),rep("Y30", 10)))
Race=factor(paste0("R",c(rep(1, 6),rep(2, 7),
                         rep(1,3),rep(2, 7))))
M1=lm(Time ~ Age:Sub - 1); M=matrix(coef(M1),2,7)
rownames(M)=c("Y18","Y30")
colnames(M)=paste0("S",1:7); M
```

```
     S1    S2  S3  S4    S5    S6    S7
Y18 370 223.5 253 335 337.5 119.0 233.5
Y30 424 244.0 320 368 373.0 166.5 264.5
```

```r
Mm=tapply(Time,list(Age,Sub),mean); Mm
```

```
     S1    S2  S3  S4    S5    S6    S7
Y18 370 223.5 253 335 337.5 119.0 233.5
Y30 424 244.0 320 368 373.0 166.5 264.5
```
]]
.column.bg-brand-gray[.content[
Prediction with .brand-blue[random effects] shrinks toward the mean.
`$$Y_{ijk}=\mu+(AB)_{ij}+\epsilon_{ijk},\ (AB)_{ij} \sim N(0,\sigma_A^2),$$`
`$$\epsilon_{ijk}\sim N(0,\sigma^2),\ i=1, \dots,7,\ j=1,2,\ k=1,2.$$`

```r
mean(Time)
```

```
[1] 279.2609
```

```r
library(lme4)
M2=lmer(Time ~ 1 + (1|Sub:Age))
M=matrix(round(coef(M2)[[1]][,1], 1), 2, 7)
rownames(M)=c("Y18","Y30")
colnames(M)=paste0("S",1:7); M
```

```
       S1    S2    S3    S4    S5    S6    S7
Y18 368.9 224.3 253.4 333.8 336.9 121.2 234.2
Y30 422.2 245.1 319.2 366.0 370.8 168.1 264.8
```

The random effects predictions are closer to the overall mean than the fixed effects estimates `\((\alpha\beta)_{ij}\)`.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Performance test - Borrowing strength
]]
.row[.split-50[
.column[.content[
```r
M2=lmer(Time ~ 1 + (1|Sub:Age))
summary(M2)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: Time ~ 1 + (1 | Sub:Age)

REML criterion at convergence: 235

Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.29254 -0.33170  0.05856  0.28418  1.23104

Random effects:
 Groups  Name        Variance Std.Dev.
 Sub:Age (Intercept) 7492.2   86.56
 Residual             195.2   13.97
Number of obs: 23, groups:  Sub:Age, 14

Fixed effects:
            Estimate Std. Error t value
(Intercept)   287.78      23.34   12.33
```

`\(\hat{\mu}=287.78\)`, `\(\hat \sigma_A^2=7492.2\)`, `\(\hat \sigma^2=195.2\)`
]]
.column.bg-brand-gray[.content[
```r
U=matrix(ranef(M2)[[1]][,1],2,7)
rownames(U)=c("Y18","Y30")
colnames(U)=paste0("S",1:7); round(U,1) #random eff
```

```
       S1    S2    S3   S4   S5     S6    S7
Y18  81.2 -63.5 -34.3 46.0 49.1 -166.6 -53.6
Y30 134.5 -42.7  31.4 78.2 83.1 -119.7 -23.0
```

```r
round(sum(U),4) #random effects sum to 0
```

```
[1] 0
```

```r
round(U+287.78,1) #random effect+mu = coeff
```

```
       S1    S2    S3    S4    S5    S6    S7
Y18 368.9 224.3 253.4 333.8 336.9 121.2 234.2
Y30 422.2 245.1 319.2 366.0 370.8 168.1 264.8
```

* The estimates of `\((AB)_{ij}\)`, which use a common `\(\sigma_A^2\)` for all random effects, borrow strength from the other random effects, whereas the `\((\alpha\beta)_{ij}\)` are based on group means only.
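The shrinkage can be reproduced by hand: for this one-random-effect model, the BLUP of `\((AB)_{ij}\)` shrinks the cell-mean deviation `\(\bar{y}_{ij\cdot}-\hat\mu\)` toward zero by the factor `\(n_{ij}\hat\sigma_A^2/(n_{ij}\hat\sigma_A^2+\hat\sigma^2)\)`, where `\(n_{ij}\)` is the number of races in the cell. A base-R sketch using the rounded REML estimates from the output above:

```r
# REML estimates read off summary(M2) above (rounded)
sigA2 <- 7492.2; sig2 <- 195.2; mu <- 287.78

# Cell means and race counts from the half-marathon table
means  <- c(370, 223.5, 253, 335, 337.5, 119, 233.5,   # Y18, S1..S7
            424, 244, 320, 368, 373, 166.5, 264.5)     # Y30, S1..S7
counts <- c(2, 2, 2, 1, 2, 2, 2,
            2, 1, 1, 1, 1, 2, 2)

# BLUP shrinks each cell-mean deviation by n*sigA2 / (n*sigA2 + sig2)
shrink <- counts * sigA2 / (counts * sigA2 + sig2)
blup <- shrink * (means - mu)
round(blup, 1)   # matches ranef(M2): 81.2, -63.5, ..., -23.0
```

Cells with only one race (e.g. S4 at 18, S2 at 30) are shrunk slightly more than cells with two races, which is exactly how the model borrows strength where information is scarce.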
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Performance test - Borrowing strength
]]
.row[.split-60[
.column[.content[
* The .brand-blue[random effects] model
`$$Y_{ijk}=\mu+(AB)_{ij}+\epsilon_{ijk}$$`
can be written in matrix form with the vector of responses `\(\boldsymbol{Y}\)` (ordered by age within subject) as
`$$^{23}\boldsymbol{Y}^1=\mu\boldsymbol{1}_{23}+ \ ^{23}\textbf{Z}^{2\times 7}\boldsymbol{u}^1+ \ ^{23}\boldsymbol{\epsilon}^1$$`
where `\(\boldsymbol{u}\)` is the vector of subject-age combination (random) effects (ordered by age within subject) and
`$$\begin{bmatrix} \boldsymbol{u} \\ \boldsymbol{\epsilon}\end{bmatrix} \sim N\left(\begin{bmatrix} \boldsymbol{0} \\ \boldsymbol{0} \end{bmatrix},\begin{bmatrix} \textbf{G} & \boldsymbol{0} \\ \boldsymbol{0} & \sigma^2 \boldsymbol{I}_{23} \end{bmatrix} \right).$$`
The `\(14 \times 14\)` matrix `\(\textbf{G}\)` is `\(\sigma^2_A \mathbf{I}_{14}\)`.
]]
.column.bg-brand-gray[.content[
<img src="images/RandomPeopleEffect.png" width="50%" height="50%">
]]
]]

---
layout: false
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Performance test - Further borrowing strength with `\(\rho\)`
]]
.row[.content[
We assume
`$$\scriptsize\textbf{G}=\begin{bmatrix}
\sigma^2_{1,1} & \sigma_{12,1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\sigma_{12,1} & \sigma^2_{2,1}& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma^2_{1,2} & \sigma_{12,2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma_{12,2} & \sigma^2_{2,2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma^2_{1,3} & \sigma_{12,3} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_{12,3} & \sigma^2_{2,3}& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \sigma^2_{1,4} & \sigma_{12,4} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0& \sigma_{12,4} & \sigma^2_{2,4} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2_{1,5} & \sigma_{12,5} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 &
0 & 0 & 0 & 0 & 0 & \sigma_{12,5} & \sigma^2_{2,5} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2_{1,6} & \sigma_{12,6} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma_{12,6} & \sigma^2_{2,6} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2_{1,7} & \sigma_{12,7} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &\sigma_{12,7} & \sigma^2_{2,7} \\
\end{bmatrix}$$`
If this matrix looks complicated, consider the following alternative expression.
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Performance test - Borrowing strength
]]
.row[.split-60[
.column[.content[
* Assuming the same variances and correlation for every subject (`\(\sigma^2_{1,i}=\sigma^2_1\)`, `\(\sigma^2_{2,i}=\sigma^2_2\)` and `\(\sigma_{12,i}=\rho\sigma_1\sigma_2\)` for all `\(i\)`), the matrix `\(\textbf{G}\)` can be written as
`$$\textbf{G}=\boldsymbol{I}_7\otimes\begin{bmatrix} \sigma^2_1 & \rho\sigma_{1}\sigma_{2} \\ \rho\sigma_{1}\sigma_{2} & \sigma^2_{2} \end{bmatrix}$$`
* The operator `\(\otimes\)` is called a Kronecker product.
* If `\(\textbf{A}\)` is an `\(m\times n\)` matrix and `\(\textbf{B}\)` is a `\(p\times q\)` matrix, then `\(\textbf{A}\otimes \textbf{B}\)` is an `\(mp\times nq\)` matrix:
`$$\textbf{A}\otimes\textbf{B}=\begin{bmatrix} a_{11} \textbf{B} & \cdots & a_{1n}\textbf{B} \\ \vdots & \ddots & \vdots\\ a_{m1}\textbf{B} & \cdots & a_{mn} \textbf{B} \end{bmatrix}.$$`
]]
.column.bg-brand-gray[.content[
<img src="images/RandomPeopleEffect.png" width="50%" height="50%">
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# So which model?
]]
.row[.split-40[
.column[.content[
* This depends on the true data generating process.
* This is often a hard question to answer.
* .brand-blue[All models are wrong but some are useful] - George Box
* Everyone likes him

<img src="images/ModelWrong.png" width="60%" height="60%">
]]
.column.bg-brand-gray[.content[
* Suppose `\(\boldsymbol{Y}\sim N(\boldsymbol{\mu},\mathbf{\Sigma})\)`.
The probability density function of the .brand-blue[multivariate Normal distribution] is
`$$f(\boldsymbol{y};\boldsymbol{\mu},\mathbf{\Sigma})= \frac{1}{\sqrt{(2\pi)^N|\mathbf{\Sigma}|}}\text{exp}\left( -\frac{1}{2}(\boldsymbol{y}-\boldsymbol{\mu})^\top\mathbf{\Sigma}^{-1}(\boldsymbol{y}-\boldsymbol{\mu})\right)$$`
* The conditional distribution is `\(\displaystyle f_{Y}(\boldsymbol{y}|\boldsymbol{X}=\boldsymbol{x})=\frac{f_{X,Y}(\boldsymbol{x},\boldsymbol{y})}{f_X(\boldsymbol{x})}.\)`
* If `\(\textbf{A}\)` is some known matrix, `\(\text{Var}(\textbf{A}\boldsymbol{Y})=\textbf{A}\mathbf{\Sigma}\textbf{A}^\top\)`.
* If `\(\textbf{A}\)` is some known matrix and `\(\boldsymbol{u}=\boldsymbol{u}(\boldsymbol{x})\)` then
  * `\(\displaystyle\frac{\partial \boldsymbol{u}^\top\textbf{A}\boldsymbol{u}}{\partial\boldsymbol{x}}=\frac{\partial\boldsymbol{u}}{\partial\boldsymbol{x}}\textbf{A}\boldsymbol{u}+\frac{\partial\boldsymbol{u}}{\partial\boldsymbol{x}}\textbf{A}^\top\boldsymbol{u},\)`
  * `\(\displaystyle \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}}=\boldsymbol{I},\)`
  * `\(\displaystyle\frac{\partial\textbf{A}\boldsymbol{u}}{\partial\boldsymbol{x}}=\frac{\partial \boldsymbol{u}}{\partial \boldsymbol{x}}\textbf{A}^\top.\)`
]]
]]
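Returning to the `\(\textbf{G}=\boldsymbol{I}_7\otimes(\cdot)\)` form from the performance-test slides: base R provides the Kronecker product as the `%x%` operator (or `kronecker()`). A small sketch with illustrative values `\(\sigma_1=2\)`, `\(\sigma_2=3\)`, `\(\rho=0.5\)` (assumed for the demonstration, not estimated from the data):

```r
# Illustrative (assumed) values for the 2x2 within-subject block
s1 <- 2; s2 <- 3; rho <- 0.5
B <- matrix(c(s1^2,          rho * s1 * s2,
              rho * s1 * s2, s2^2), 2, 2)

# G = I_7 (x) B: block diagonal with one 2x2 block per subject
G <- diag(7) %x% B     # %x% is base R's Kronecker product
dim(G)                 # 14 x 14
G[1:2, 1:2]            # subject 1's block, equal to B
G[1:2, 3:4]            # zero block: different subjects are independent
```

This makes the structure explicit: the two ages within a subject are correlated through `\(\rho\)`, while effects for different subjects remain independent.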