class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[
<br><br><br>
# Linear Mixed Models

## .black[STAT3022 Applied Linear Models Lecture 32]

<br><br><br>
### .black[2020/02/20]
]]
.column.bg-brand-charcoal[.content.white[
## Today

1. Linear mixed models
2. More lme4
3. More random vs. fixed effects
4. Borrowing strength
5. Mixed model equations
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Fun fact
]]
.row[.split-60[
.column[.content[
* Charles Roy Henderson (1911-1989) applied quantitative methods to the genetic evaluation of domestic livestock.
* He worked in the Department of Animal Science at Cornell University from 1948 until his retirement in 1976 and was the PhD supervisor of Shayle Searle.
* He originated the idea of mixed model equations.
* Douglas Bates, Emeritus Professor of Statistics at the University of Wisconsin, is the author and maintainer of the R package `lme4`.
]]
.column.bg-brand-gray[.content[
<img src="images/PackageDevelopers2.png" width="70%" height="70%">
]]
]]

---
layout: false
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Linear mixed model
]]
.row[.content[
* The response `\(n\times 1\)` vector `\(\boldsymbol{Y}\)` is modelled as
`$$\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\textbf{Z}\boldsymbol{u}+\boldsymbol{\epsilon},\quad\text{where}\quad E\begin{bmatrix}\boldsymbol{u} \\ \boldsymbol{\epsilon} \end{bmatrix}=\begin{bmatrix}\mathbf{0} \\ \mathbf{0} \end{bmatrix}\quad\text{and}\quad\text{var} \begin{bmatrix}\boldsymbol{u} \\ \boldsymbol{\epsilon} \end{bmatrix}=\begin{bmatrix}\textbf{G} & \mathbf{0} \\ \mathbf{0} & \textbf{R} \end{bmatrix},$$`
`\(\textbf{X}\)` is the `\(n\times p\)` design matrix for the fixed effects `\(\boldsymbol{\beta}\)`, <br>
`\(\textbf{Z}\)` is the `\(n\times q\)` design matrix for the random effects `\(\boldsymbol{u}\)`, <br>
`\(\boldsymbol{\epsilon}\)` is the `\(n\times 1\)` vector of errors.
* The above model is referred to as a .brand-blue[linear mixed model]. Some also refer to it as a .brand-blue[mixed linear model], .brand-blue[mixed-effects model], .brand-blue[linear mixed-effects model], hierarchical model, multi-level model, nested model (the latter three usually refer to a specific structure in the data) `\(\ldots\)`
* `$$\text{We assume that }\begin{bmatrix} \boldsymbol{u}\\ \boldsymbol{\epsilon} \end{bmatrix}\sim N\left(\begin{bmatrix} \boldsymbol{0}\\ \boldsymbol{0}\end{bmatrix}, \begin{bmatrix}\textbf{G} & \mathbf{0}\\ \mathbf{0} & \textbf{R} \end{bmatrix}\right), \ \text{hence marginally } \boldsymbol{Y}\sim N(\textbf{X}\boldsymbol{\beta},\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}). \hspace{100mm}$$`
* Note that marginally, `\(\text{Var}(\boldsymbol{Y})=\textbf{V}=\text{Var}(\textbf{Z}\boldsymbol{u})+\text{Var}(\boldsymbol{\epsilon})=\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}\)`, where we often assume `\(\textbf{R}= \sigma^2\boldsymbol{I}_n\)`.
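A minimal base-R simulation sketch can confirm the marginal covariance formula (the group structure, `\(\boldsymbol{\beta}\)`, and variance values below are illustrative assumptions, not from these slides): simulate many replicates of `\(\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\textbf{Z}\boldsymbol{u}+\boldsymbol{\epsilon}\)` with one random intercept per group and compare the empirical covariance of `\(\boldsymbol{Y}\)` with `\(\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}\)`.

```r
set.seed(1)
n <- 6; q <- 3
grp <- factor(rep(1:3, each = 2))       # 3 groups of 2 observations (assumed)
X <- cbind(1, c(0, 0, 1, 1, 0, 1))      # n x p fixed-effects design (assumed)
Z <- model.matrix(~ grp - 1)            # n x q random-effects design
beta <- c(10, 2)                        # fixed effects (assumed)
G <- diag(0.5, q)                       # Var(u) = 0.5 I_3 (assumed)
R <- diag(1, n)                         # Var(eps) = I_n
V <- Z %*% G %*% t(Z) + R               # theoretical Var(Y)

nsim <- 20000
Ysim <- replicate(nsim, {
  u <- rnorm(q, sd = sqrt(0.5))
  eps <- rnorm(n)
  as.vector(X %*% beta + Z %*% u + eps)
})
Vhat <- cov(t(Ysim))                    # empirical Var(Y)
max(abs(Vhat - V))                      # close to 0 (sampling error only)
```

The diagonal of `V` is `1 + 0.5` (two sources of variability) and observations sharing a group have covariance `0.5`, exactly the `\(\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}\)` pattern.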
]]

---
layout: false
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Mixed model equations
]]
.row[.content[
* The log-density function of the joint (conditional) distribution of `\(\boldsymbol{y}\)` and `\(\boldsymbol{u}\)` is given by
`$$\ell=-\frac{1}{2}\log |\textbf{R}|-\frac{1}{2}(\boldsymbol{y}- \textbf{X}\boldsymbol{\beta}-\textbf{Z}\boldsymbol{u})^\top\textbf{R}^{-1}(\boldsymbol{y}-\textbf{X}\boldsymbol{\beta}-\textbf{Z}\boldsymbol{u})-\frac{1}{2}\log|\textbf{G}|-\frac{1}{2}\boldsymbol{u}^\top\textbf{G}^{-1}\boldsymbol{u}+\text{constant}.$$`
* The `\(( \hat{\boldsymbol{\beta}}, \tilde{\boldsymbol{u}} )\)` that jointly maximises `\(\ell\)` (assuming `\(\textbf{R}\)` and `\(\textbf{G}\)` are known) leads to the .brand-blue[mixed model equations] (MME), sometimes referred to as Henderson's equations:
`$$\begin{bmatrix} \textbf{X}^\top\textbf{R}^{-1}\textbf{X} & \textbf{X}^\top\textbf{R}^{-1}\textbf{Z} \\ \textbf{Z}^\top\textbf{R}^{-1}\textbf{X} & \textbf{Z}^\top\textbf{R}^{-1}\textbf{Z} + \textbf{G}^{-1}\\ \end{bmatrix} \begin{bmatrix} \hat{\boldsymbol{\beta}}\\ \tilde{\boldsymbol{u}}\\ \end{bmatrix} = \begin{bmatrix} \textbf{X}^\top\textbf{R}^{-1}\boldsymbol{y} \\ \textbf{Z}^\top\textbf{R}^{-1}\boldsymbol{y} \\ \end{bmatrix}. $$`
* This gives the .brand-blue[best linear unbiased estimate] (BLUE) `\(\hat{\boldsymbol{\beta}}\)` and the .brand-blue[best linear unbiased predictor] (BLUP) `\(\tilde{\boldsymbol{u}}\)` (assuming that `\(\textbf{X}\)` is full rank)
`\begin{eqnarray*} \hat{\boldsymbol{\beta}}&=&(\textbf{X}^\top\textbf{V}^{-1}\textbf{X})^{-1}\textbf{X}^\top\textbf{V}^{-1}\boldsymbol{y}\\ \tilde{\boldsymbol{u}}&=&\textbf{G}\textbf{Z}^\top\textbf{V}^{-1}(\boldsymbol{y}-\textbf{X}\hat{\boldsymbol{\beta}}). \end{eqnarray*}`
* When the estimated variance parameters are .brand-blue[plugged into] the above solutions, we refer to them as the empirical BLUE (E-BLUE) and empirical BLUP (E-BLUP).
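A quick base-R check (using the dairy-cow data from the example that follows, where `\(\textbf{G} = 0.1\boldsymbol{I}_4\)` and `\(\textbf{R} = \boldsymbol{I}_9\)` are taken as known) that solving the MME gives the same answer as the GLS/BLUP formulas:

```r
Y <- c(100, 110, 110, 100, 100, 110, 110, 100, 100)
Herd <- factor(c(1, 1, 2, 2, 2, 3, 3, 3, 3))
Sire <- factor(c("A", "D", "B", "A", "A", "C", "C", "A", "A"))
X <- model.matrix(~ Herd - 1)
Z <- model.matrix(~ Sire - 1)
G <- diag(0.1, 4); R <- diag(1, 9)
RI <- solve(R); GI <- solve(G)

# Coefficient matrix and right-hand side of Henderson's equations
LHS <- rbind(cbind(t(X) %*% RI %*% X, t(X) %*% RI %*% Z),
             cbind(t(Z) %*% RI %*% X, t(Z) %*% RI %*% Z + GI))
RHS <- rbind(t(X) %*% RI %*% Y, t(Z) %*% RI %*% Y)
sol <- solve(LHS, RHS)            # stacked (beta-hat, u-tilde)

# Same answer via the GLS/BLUP formulas with V = ZGZ' + R
V <- Z %*% G %*% t(Z) + R; VI <- solve(V)
beta <- solve(t(X) %*% VI %*% X, t(X) %*% VI %*% Y)
u <- G %*% t(Z) %*% VI %*% (Y - X %*% beta)
max(abs(sol - rbind(beta, u)))    # numerically zero
```

The advantage of the MME is that they avoid inverting the `\(n\times n\)` matrix `\(\textbf{V}\)`; only `\(\textbf{R}^{-1}\)` and `\(\textbf{G}^{-1}\)` (often diagonal) are needed.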
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Dairy cow - BLUP is a good estimator
]]
.row[.split-70[
.column[.content[
The first lactation yields of the daughters of four sires were recorded.
* To measure the sire additive genetic merits, we model the yield `\(\boldsymbol{Y}\)` as
`$$\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\textbf{Z}\boldsymbol{u}+\boldsymbol{\epsilon}$$`
where
* `\(\boldsymbol{\beta} = (\beta_1, \beta_2, \beta_3)^\top\)` are the (fixed) herd effects; and
* `\(\boldsymbol{u} = (u_A, u_B, u_C, u_D)^\top\)` are the (random) sire effects.
* For simplicity we assume that the variance components are known:
`$$\begin{bmatrix} \boldsymbol{u} \\ \boldsymbol{\epsilon} \end{bmatrix} \sim N\left(\begin{bmatrix} \boldsymbol{0}\\ \boldsymbol{0}\\ \end{bmatrix}, \begin{bmatrix} 0.1\boldsymbol{I}_4 & \boldsymbol{0}\\ \boldsymbol{0} & \boldsymbol{I}_9 \end{bmatrix} \right).$$`
]]
.column.bg-brand-gray[.content[
<img src="images/DairyCows.jpg" width="50%" height="50%">

|Herd | Sire | Yield |
|---|---|---|
| 1 | A | 100 |
| 1 | D | 110 |
| 2 | B | 110 |
| 2 | A | 100 |
| 2 | A | 100 |
| 3 | C | 110 |
| 3 | C | 110 |
| 3 | A | 100 |
| 3 | A | 100 |
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Dairy cow - BLUP is a good estimator
]]
.row[.split-70[
.column[.content[
* That is, we have `\(\textbf{G} = 0.1\boldsymbol{I}_4\)`, `\(\textbf{R} = \boldsymbol{I}_9\)` where `\(\sigma_A^2=0.1\)` and `\(\sigma^2=1\)`,
`$$\textbf{X}=\begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 1\\ 0 & 0 & 1\\ 0 & 0 & 1\\ \end{bmatrix},\quad\text{and}\quad\textbf{Z}=\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ \end{bmatrix}.$$`
* The BLUE and BLUP are then given as
`\begin{eqnarray*} \boldsymbol{\beta}&=&(105.64, 104.28, 105.46)^\top,\\
\boldsymbol{u}&=&(-1.67,0.52,0.76,0.40)^\top. \end{eqnarray*}`
]]
.column.bg-brand-gray[.content[
<br>

|Herd | Sire | Yield |
|---|---|---|
| 1 | A | 100 |
| 1 | D | 110 |
| 2 | B | 110 |
| 2 | A | 100 |
| 2 | A | 100 |
| 3 | C | 110 |
| 3 | C | 110 |
| 3 | A | 100 |
| 3 | A | 100 |
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: BLUE and BLUP for the random effects model
]]
.row[.split-50[
.column[.content[
```r
Yield=c(100,110,110,100,100,110,110,100,100); Y=Yield
Herd=c(1,1,2,2,2,3,3,3,3)
Sire=c("A","D","B","A","A","C","C","A","A")
X=model.matrix(~factor(Herd)-1)
Z=model.matrix(~Sire-1)
G=diag(0.1,4,4); R=diag(1,9,9)
V=Z%*%G%*%t(Z)+R; VI=solve(V)
beta=solve(t(X)%*%VI%*%X)%*%(t(X)%*%VI%*%Y); beta
```

```
                  [,1]
factor(Herd)1 105.6387
factor(Herd)2 104.2757
factor(Herd)3 105.4584
```

```r
u=G%*%t(Z)%*%VI%*%(Y-X%*%beta); u
```

```
           [,1]
[1,] -1.6738004
[2,]  0.5203875
[3,]  0.7569272
[4,]  0.3964857
```
]]
.column.bg-brand-gray[.content[
```r
V # Var(Y)
```

```
    1   2   3   4   5   6   7   8   9
1 1.1 0.0 0.0 0.1 0.1 0.0 0.0 0.1 0.1
2 0.0 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 1.1 0.0 0.0 0.0 0.0 0.0 0.0
4 0.1 0.0 0.0 1.1 0.1 0.0 0.0 0.1 0.1
5 0.1 0.0 0.0 0.1 1.1 0.0 0.0 0.1 0.1
6 0.0 0.0 0.0 0.0 0.0 1.1 0.1 0.0 0.0
7 0.0 0.0 0.0 0.0 0.0 0.1 1.1 0.0 0.0
8 0.1 0.0 0.0 0.1 0.1 0.0 0.0 1.1 0.1
9 0.1 0.0 0.0 0.1 0.1 0.0 0.0 0.1 1.1
```

* Observations 1, 4, 5, 8 and 9 are from sire A, so the off-diagonal entries in these rows and columns are nonzero covariances (0.1), showing their dependence.
* The diagonal entries are the variances 1 + 0.1, reflecting the two sources of variability.
* Similarly for observations 6 and 7, which are from sire C.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Dairy cow - BLUP is a good estimator
]]
.row[.split-70[
.column[.content[
* Now assume that the sire effects are .brand-blue[fixed].
* As the fixed design matrix is no longer full rank, we arbitrarily constrain one sire effect to zero, say `\(u_A=0\)`.
* Then the solution is
`\begin{eqnarray*} \boldsymbol{\beta}&=&(100,100,100)^\top, \\ \boldsymbol{u}&=&(0,10,10,10)^\top. \end{eqnarray*}`
* The BLUP takes into account the fact that the sire effects vary less (`\(\sigma_A^2=0.1\)`) than the lactation yields of the daughters of a single sire (`\(\sigma^2=1\)`).

```r
coef(lm(Yield~factor(Herd)+Sire-1))
```

```
factor(Herd)1 factor(Herd)2 factor(Herd)3         SireB         SireC
          100           100           100            10            10
        SireD
           10
```
]]
.column.bg-brand-gray[.content[
<br>

|Herd | Sire | Yield |
|---|---|---|
| 1 | A | 100 |
| 1 | D | 110 |
| 2 | B | 110 |
| 2 | A | 100 |
| 2 | A | 100 |
| 3 | C | 110 |
| 3 | C | 110 |
| 3 | A | 100 |
| 3 | A | 100 |
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Dairy cow - BLUP is a good estimator
]]
.row[.split-70[
.column[.content[
* The extent to which a sire effect is regressed to the mean (called .brand-blue[shrinkage]) depends on the amount of information available concerning the sire.
* E.g. sire C's effect is shrunk less (0.76) than sire B's (0.52) and sire D's (0.40), even though the lactation yields of their daughters are all the same (110): sire C has two daughter records, while sires B and D have one each.
* Consequently, it may be better to model sire effects as random rather than fixed.
* In animal and plant breeding, genotype effects are often fitted as random when the aim is selection.
]]
.column.bg-brand-gray[.content[
<br>

|Herd | Sire | Yield |
|---|---|---|
| 1 | A | 100 |
| 1 | D | 110 |
| 2 | B | 110 |
| 2 | A | 100 |
| 2 | A | 100 |
| 3 | C | 110 |
| 3 | C | 110 |
| 3 | A | 100 |
| 3 | A | 100 |
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Performance test
]]
.row[.split-50[
.column[.content[
The half-marathon completion times of seven men were recorded when they were 18 years old and again when they were 30 years old.
* The completion time appears to depend largely on the subject and their age.
* To measure the performance of each subject by age, we could model the completion time `\(Y_{ijk}\)` as
`$$Y_{ijk} =(\alpha\beta)_{ij} + \epsilon_{ijk}$$`
where `\((\alpha\beta)_{ij}\)` is the combined subject-age effect; `\(i=1,\ldots,7\)`; `\(j=1,2\)` and `\(\epsilon_{ijk}\sim N(0, \sigma^2)\)`. Note: this model is equivalent to a one-way ANOVA with all subject-age group combinations.
* Sample size `\(n=23\)`.
]]
.column.bg-brand-gray[.content[
<br>

| | 18 years | 18 years | 30 years | 30 years |
|---|---|---|---|---|
| **Subject** | **Race 1** | **Race 2** | **Race 1** | **Race 2**|
|1 | 376 | 364 | 434| 414 |
|2 | 227 | 220 | -| 244 |
|3 | 252 | 254 | -| 320 |
|4 | - | 335| -| 368 |
|5 | 341 | 334 | -| 373 |
|6 | 130 | 108 | 150| 183 |
|7 | 229 | 238 | 247| 282 |
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Performance test: Fixed and random effects models
]]
.row[.split-50[
.column[.content[
The .brand-blue[fixed effects] model estimates are simply the raw average completion times by subject and age.

```r
Time=c(376,227,252,341,130,229,364,220,254,335,334,
       108,238,434,150,247,414,244,320,368,373,183,282)
Sub=factor(paste0("S",c(1:3,5:7,1:7,1,6,7,1:7)))
Age=factor(c(rep("Y18", 13),rep("Y30", 10)))
Race=factor(paste0("R",c(rep(1, 6),rep(2, 7),
                         rep(1,3),rep(2, 7))))
M1=lm(Time ~ Age:Sub - 1); M=matrix(coef(M1),2,7)
rownames(M)=c("Y18","Y30")
colnames(M)=paste0("S",1:7); M
```

```
     S1    S2  S3  S4    S5    S6    S7
Y18 370 223.5 253 335 337.5 119.0 233.5
Y30 424 244.0 320 368 373.0 166.5 264.5
```

```r
Mm=tapply(Time,list(Age,Sub),mean); Mm
```

```
     S1    S2  S3  S4    S5    S6    S7
Y18 370 223.5 253 335 337.5 119.0 233.5
Y30 424 244.0 320 368 373.0 166.5 264.5
```
]]
.column.bg-brand-gray[.content[
Prediction with .brand-blue[random effects] shrinks toward the mean.
`$$Y_{ijk}=\mu+(AB)_{ij}+\epsilon_{ijk},\ (AB)_{ij} \sim N(0,\sigma_A^2),$$`
`$$\epsilon_{ijk}\sim N(0,\sigma^2),\ i=1, \dots,7,\ j=1,2,\ k=1,2.$$`

```r
mean(Time)
```

```
[1] 279.2609
```

```r
library(lme4)
M2=lmer(Time ~ 1 + (1|Sub:Age))
M=matrix(round(coef(M2)[[1]][,1], 1), 2, 7)
rownames(M)=c("Y18","Y30")
colnames(M)=paste0("S",1:7); M
```

```
       S1    S2    S3    S4    S5    S6    S7
Y18 368.9 224.3 253.4 333.8 336.9 121.2 234.2
Y30 422.2 245.1 319.2 366.0 370.8 168.1 264.8
```

The random effects predictions are closer to the overall mean than the fixed effects estimates `\((\alpha\beta)_{ij}\)`.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Performance test - Borrowing strength
]]
.row[.split-50[
.column[.content[
```r
M2=lmer(Time ~ 1 + (1|Sub:Age))
summary(M2)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: Time ~ 1 + (1 | Sub:Age)

REML criterion at convergence: 235

Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.29254 -0.33170  0.05856  0.28418  1.23104

Random effects:
 Groups  Name        Variance Std.Dev.
 Sub:Age (Intercept) 7492.2   86.56
 Residual             195.2   13.97
Number of obs: 23, groups:  Sub:Age, 14

Fixed effects:
            Estimate Std. Error t value
(Intercept)   287.78      23.34   12.33
```

`\(\hat{\mu}=287.78\)`, `\(\hat \sigma_A^2=7492.2\)`, `\(\hat \sigma^2=195.2\)`
]]
.column.bg-brand-gray[.content[
```r
U=matrix(ranef(M2)[[1]][,1],2,7)
rownames(U)=c("Y18","Y30")
colnames(U)=paste0("S",1:7); round(U,1) #random eff
```

```
       S1    S2    S3   S4   S5     S6    S7
Y18  81.2 -63.5 -34.3 46.0 49.1 -166.6 -53.6
Y30 134.5 -42.7  31.4 78.2 83.1 -119.7 -23.0
```

```r
round(sum(U),4) #random effects sum to 0
```

```
[1] 0
```

```r
round(U+287.78,1) #random effect+mu = coeff
```

```
       S1    S2    S3    S4    S5    S6    S7
Y18 368.9 224.3 253.4 333.8 336.9 121.2 234.2
Y30 422.2 245.1 319.2 366.0 370.8 168.1 264.8
```

* The estimates of `\((AB)_{ij}\)`, which use a common `\(\sigma_A^2\)` for all random effects, borrow strength from the other random effects, whereas the `\((\alpha\beta)_{ij}\)` are based on group means only.
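The shrinkage can be reproduced by hand: for this one-random-effect model, the BLUP of `\((AB)_{ij}\)` shrinks the cell-mean deviation `\(\bar{y}_{ij\cdot}-\hat\mu\)` toward zero by the factor `\(n_{ij}\hat\sigma_A^2/(n_{ij}\hat\sigma_A^2+\hat\sigma^2)\)`, where `\(n_{ij}\)` is the number of races in the cell. A base-R sketch using the rounded REML estimates from the output above:

```r
# REML estimates read off summary(M2) above (rounded)
sigA2 <- 7492.2; sig2 <- 195.2; mu <- 287.78

# Cell means and race counts from the half-marathon table
means  <- c(370, 223.5, 253, 335, 337.5, 119, 233.5,   # Y18, S1..S7
            424, 244, 320, 368, 373, 166.5, 264.5)     # Y30, S1..S7
counts <- c(2, 2, 2, 1, 2, 2, 2,
            2, 1, 1, 1, 1, 2, 2)

# BLUP shrinks each cell-mean deviation by n*sigA2 / (n*sigA2 + sig2)
shrink <- counts * sigA2 / (counts * sigA2 + sig2)
blup <- shrink * (means - mu)
round(blup, 1)   # matches ranef(M2): 81.2, -63.5, ..., -23.0
```

Cells with only one race (e.g. S4 at 18, S2 at 30) are shrunk slightly more than cells with two races, which is exactly how the model borrows strength where information is scarce.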
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Performance test - Borrowing strength
]]
.row[.split-60[
.column[.content[
* The .brand-blue[random effects] model
`$$Y_{ijk}=\mu+(AB)_{ij}+\epsilon_{ijk}$$`
can be written in matrix form with the vector of responses `\(\boldsymbol{Y}\)` (ordered by age within subject) as
`$$^{23}\boldsymbol{Y}^1=\mu\boldsymbol{1}_{23}+ \ ^{23}\textbf{Z}^{2\times 7}\boldsymbol{u}^1+ \ ^{23}\boldsymbol{\epsilon}^1$$`
where `\(\boldsymbol{u}\)` is the vector of subject-age combination (random) effects (ordered by age within subject) and
`$$\begin{bmatrix} \boldsymbol{u} \\ \boldsymbol{\epsilon}\end{bmatrix} \sim N\left(\begin{bmatrix} \boldsymbol{0} \\ \boldsymbol{0} \end{bmatrix},\begin{bmatrix} \textbf{G} & \boldsymbol{0} \\ \boldsymbol{0} & \sigma^2 \boldsymbol{I}_{23} \end{bmatrix} \right).$$`
The `\(14 \times 14\)` matrix `\(\textbf{G}\)` is `\(\sigma^2_A \mathbf{I}_{14}\)`.
]]
.column.bg-brand-gray[.content[
<img src="images/RandomPeopleEffect.png" width="50%" height="50%">
]]
]]

---
layout: false
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Performance test - Further borrowing strength with `\(\rho\)`
]]
.row[.content[
We assume
`$$\scriptsize\textbf{G}=\begin{bmatrix}
\sigma^2_{1,1} & \sigma_{12,1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\sigma_{12,1} & \sigma^2_{2,1}& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma^2_{1,2} & \sigma_{12,2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma_{12,2} & \sigma^2_{2,2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma^2_{1,3} & \sigma_{12,3} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_{12,3} & \sigma^2_{2,3}& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \sigma^2_{1,4} & \sigma_{12,4} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0& \sigma_{12,4} & \sigma^2_{2,4} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2_{1,5} & \sigma_{12,5} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 &
0 & 0 & 0 & 0 & 0 & \sigma_{12,5} & \sigma^2_{2,5} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2_{1,6} & \sigma_{12,6} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma_{12,6} & \sigma^2_{2,6} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2_{1,7} & \sigma_{12,7} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &\sigma_{12,7} & \sigma^2_{2,7} \\
\end{bmatrix}$$`
If this matrix looks complicated, consider the following alternative expression.
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Performance test - Borrowing strength
]]
.row[.split-60[
.column[.content[
* Assuming the same variances and correlation for every subject (`\(\sigma^2_{1,i}=\sigma^2_1\)`, `\(\sigma^2_{2,i}=\sigma^2_2\)` and `\(\sigma_{12,i}=\rho\sigma_1\sigma_2\)` for all `\(i\)`), the matrix `\(\textbf{G}\)` can be written as
`$$\textbf{G}=\boldsymbol{I}_7\otimes\begin{bmatrix} \sigma^2_1 & \rho\sigma_{1}\sigma_{2} \\ \rho\sigma_{1}\sigma_{2} & \sigma^2_{2} \end{bmatrix}$$`
* The operator `\(\otimes\)` is called a Kronecker product.
* If `\(\textbf{A}\)` is an `\(m\times n\)` matrix and `\(\textbf{B}\)` is a `\(p\times q\)` matrix, then `\(\textbf{A}\otimes \textbf{B}\)` is an `\(mp\times nq\)` matrix:
`$$\textbf{A}\otimes\textbf{B}=\begin{bmatrix} a_{11} \textbf{B} & \cdots & a_{1n}\textbf{B} \\ \vdots & \ddots & \vdots\\ a_{m1}\textbf{B} & \cdots & a_{mn} \textbf{B} \end{bmatrix}.$$`
]]
.column.bg-brand-gray[.content[
<img src="images/RandomPeopleEffect.png" width="50%" height="50%">
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# So which model?
]]
.row[.split-40[
.column[.content[
* This depends on the true data generating process.
* This is often a hard question to answer.
* .brand-blue[All models are wrong but some are useful] - George Box
* Everyone likes him

<img src="images/ModelWrong.png" width="60%" height="60%">
]]
.column.bg-brand-gray[.content[
* Suppose `\(\boldsymbol{Y}\sim N(\boldsymbol{\mu},\mathbf{\Sigma})\)`.
The probability density function of the .brand-blue[multivariate Normal distribution] is
`$$f(\boldsymbol{y};\boldsymbol{\mu},\mathbf{\Sigma})= \frac{1}{\sqrt{(2\pi)^N|\mathbf{\Sigma}|}}\text{exp}\left( -\frac{1}{2}(\boldsymbol{y}-\boldsymbol{\mu})^\top\mathbf{\Sigma}^{-1}(\boldsymbol{y}-\boldsymbol{\mu})\right)$$`
* The conditional distribution is `\(\displaystyle f_{Y}(\boldsymbol{y}|\boldsymbol{X}=\boldsymbol{x})=\frac{f_{X,Y}(\boldsymbol{x},\boldsymbol{y})}{f_X(\boldsymbol{x})}.\)`
* If `\(\textbf{A}\)` is some known matrix, `\(\text{Var}(\textbf{A}\boldsymbol{Y})=\textbf{A}\mathbf{\Sigma}\textbf{A}^\top\)`.
* If `\(\textbf{A}\)` is some known matrix and `\(\boldsymbol{u}=\boldsymbol{u}(\boldsymbol{x})\)` then
  * `\(\displaystyle\frac{\partial \boldsymbol{u}^\top\textbf{A}\boldsymbol{u}}{\partial\boldsymbol{x}}=\frac{\partial\boldsymbol{u}}{\partial\boldsymbol{x}}\textbf{A}\boldsymbol{u}+\frac{\partial\boldsymbol{u}}{\partial\boldsymbol{x}}\textbf{A}^\top\boldsymbol{u},\)`
  * `\(\displaystyle \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}}=\boldsymbol{I},\)`
  * `\(\displaystyle\frac{\partial\textbf{A}\boldsymbol{u}}{\partial\boldsymbol{x}}=\frac{\partial \boldsymbol{u}}{\partial \boldsymbol{x}}\textbf{A}^\top.\)`
]]
]]
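Returning to the `\(\textbf{G}=\boldsymbol{I}_7\otimes(\cdot)\)` form from the performance-test slides: base R provides the Kronecker product as the `%x%` operator (or `kronecker()`). A small sketch with illustrative values `\(\sigma_1=2\)`, `\(\sigma_2=3\)`, `\(\rho=0.5\)` (assumed for the demonstration, not estimated from the data):

```r
# Illustrative (assumed) values for the 2x2 within-subject block
s1 <- 2; s2 <- 3; rho <- 0.5
B <- matrix(c(s1^2,          rho * s1 * s2,
              rho * s1 * s2, s2^2), 2, 2)

# G = I_7 (x) B: block diagonal with one 2x2 block per subject
G <- diag(7) %x% B     # %x% is base R's Kronecker product
dim(G)                 # 14 x 14
G[1:2, 1:2]            # subject 1's block, equal to B
G[1:2, 3:4]            # zero block: different subjects are independent
```

This makes the structure explicit: the two ages within a subject are correlated through `\(\rho\)`, while effects for different subjects remain independent.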