Longitudinal Data

class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[

# Longitudinal Data
## .black[STAT3022 Applied Linear Models Lecture 34]

### .black[2020/02/20]

]]

.column.bg-brand-charcoal[.content.white[
## Today

1. Longitudinal and repeated measures data
2. Mixed models for longitudinal data
3. Random intercept and random slope models

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Employability fact
]]
.row[.split-70[
.column[.content[

* Psychology and medical fields are full of longitudinal data. 
* An example of where you can find a job is in National Centre for Longitudinal Data (NCLD) in the Department of Social Services in the Australian Government.
* The NCLD:

- develop a national framework (or architecture) for government investment in longitudinal data;
	- produce and manage quality longitudinal data sets and encourage their use;
  - foster collaboration between longitudinal survey developers, researchers and policy makers;
  - facilitate broader use of longitudinal data.

]]
.column.bg-brand-gray[.content[

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Repeated measures and longitudinal dat
]]
.row[.split-70[
.column[.content[

* .brand-blue[Repeated measures] data consist of measurements of a response (and perhaps some covariates) on several experimental (or observational) units usually over a .brand-blue[shorter time frame]. 
* When the measurement time is considered to be (nearly) at the same time point for different experimental units, the data are said to be .brand-blue[cross-sectional]. 
* Frequently the experimental (observational) unit is the .brand-blue[Subject]. This is not restricted to human subjects.
* .brand-blue[Longitudinal] data are repeated measures data in which
	the observations are taken over a .brand-blue[longer period of time].
* We wish to characterise the response over time within subjects and
	the variation in the time trends between subjects.
* Frequently we are not as interested in comparing the
	particular subjects in the study as much as we are interested in
	modeling the .brand-blue[variability in the population] from which the subjects
	were chosen.
	
]]
.column.bg-brand-gray[.content[

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sleep data
]]
.row[.split-60[
.column[.content[

* This laboratory experiment measured the effect of sleep deprivation
	on cognitive performance.
* There were .brand-blue[18 subjects], chosen from the population of interest
	(long-distance truck drivers), in the .brand-blue[10 day trial]. These subjects were
	restricted to 3 hours sleep per night during the trial.
* On each day of the trial each subject's reaction time was
	measured.  The reaction time shown here is the average of several
	measurements.
* This data are .brand-blue[balanced] in that each subject is
	measured the same number of times and on the same occasions.

]]
.column.bg-brand-gray[.content[

```r
head(sleepstudy,20)
```

```
     Reaction Days Subject
  1  249.5600    0     308
  2  258.7047    1     308
  3  250.8006    2     308
  4  321.4398    3     308
  5  356.8519    4     308
  6  414.6901    5     308
  7  382.2038    6     308
  8  290.1486    7     308
  9  430.5853    8     308
  10 466.3535    9     308
  11 222.7339    0     309
  12 205.2658    1     309
  13 202.9778    2     309
  14 204.7070    3     309
  15 207.7161    4     309
  16 215.9618    5     309
  17 213.6303    6     309
  18 217.7272    7     309
  19 224.2957    8     309
  20 237.3142    9     309
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sleep data
]]
.row[.content[

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# M1: Linear fixed effect model to each subject
]]
.row[.split-50[
.column[.content[

```r
M1 <- lmList(Reaction ~ Days | Subject, sleepstudy)
Sigma=summary(M1)$sigma; cbind(coef(M1),Sigma)
```

```
      (Intercept)      Days     Sigma
  310    203.4842  6.114899 12.320320
  309    205.0549  2.261785  8.874744
  370    210.4491 18.056151 24.118155
  349    215.1118 13.493933 14.012891
  350    225.8346 19.504017 24.360551
  334    240.1629 12.253141 20.556388
  308    244.1927 21.764702 47.779687
  371    253.6360  9.188445 25.072182
  369    254.9681 11.348109 15.832082
  351    261.1470  6.433498 22.757629
  335    263.0347 -2.881034 11.388590
  332    264.2516  9.566768 60.896273
  372    267.0448 11.298073 11.283986
  333    275.0191  9.142045 12.458792
  352    276.3721 13.566549 25.518745
  331    285.7390  5.266019 23.429771
  330    289.6851  3.008073 22.296248
  337    290.1041 19.025974 16.320930
```
Same as fitting a simple linear reg to each subject
]]
.column.bg-brand-gray[.content[

For each Subject `$i$`, the linear fixed effect model is  
`$$Y_{ij}  = \beta_{i0} + \beta_{i1} x_{j} + \epsilon_{ij} \quad i=1, \ldots 18, \quad j=1, \ldots, 10,$$`
`$\epsilon_{ij} \sim N(0, \sigma^2_i)$` where `$x_{j}$` is the number of days.

`$$\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}\quad \text{where}$$`
`$\textbf{X}=\boldsymbol{I}_{18}\otimes$` `$\begin{bmatrix}1& x_1\\1&x_2\\ \vdots&\vdots\\1& x_{10}\end{bmatrix}$`, `$\boldsymbol{\beta}=\begin{bmatrix}\beta_{10}\\ \beta_{11}\\ \vdots\\ \beta_{18,1}\end{bmatrix}$` and 
`$\boldsymbol{\epsilon}=\begin{bmatrix}\epsilon_{11}\\ \epsilon_{12}\\ \vdots\\ \epsilon_{18,10}\end{bmatrix}$` 
assuming `$\boldsymbol{\epsilon}\sim N\left(\boldsymbol{0},\begin{bmatrix}\sigma^2_1&0&\cdots &0\\0&\sigma^2_2& \ddots&\vdots \\ \vdots&\ddots&\ddots&0\\0&\cdots&0&\sigma^2_{18}\\ \end{bmatrix}\otimes\boldsymbol{I}_{10}\right)$`.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Design matrices for M1 and M2
]]
.row[.split-50[
.column[.content[

```r
X1[1:20,1:6] #180 by 36 matrix X in M1 and Z in M2 
```

```
        [,1] [,2] [,3] [,4] [,5] [,6]
   [1,]    1    0    0    0    0    0
   [2,]    1    1    0    0    0    0
   [3,]    1    2    0    0    0    0
   [4,]    1    3    0    0    0    0
   [5,]    1    4    0    0    0    0
   [6,]    1    5    0    0    0    0
   [7,]    1    6    0    0    0    0
   [8,]    1    7    0    0    0    0
   [9,]    1    8    0    0    0    0
  [10,]    1    9    0    0    0    0
  [11,]    0    0    1    0    0    0
  [12,]    0    0    1    1    0    0
  [13,]    0    0    1    2    0    0
  [14,]    0    0    1    3    0    0
  [15,]    0    0    1    4    0    0
  [16,]    0    0    1    5    0    0
  [17,]    0    0    1    6    0    0
  [18,]    0    0    1    7    0    0
  [19,]    0    0    1    8    0    0
  [20,]    0    0    1    9    0    0
```

]]
.column.bg-brand-gray[.content[

```r
X2[1:20,] #180 by 2 matrix X in M2
```

```
        [,1] [,2]
   [1,]    1    0
   [2,]    1    1
   [3,]    1    2
   [4,]    1    3
   [5,]    1    4
   [6,]    1    5
   [7,]    1    6
   [8,]    1    7
   [9,]    1    8
  [10,]    1    9
  [11,]    1    0
  [12,]    1    1
  [13,]    1    2
  [14,]    1    3
  [15,]    1    4
  [16,]    1    5
  [17,]    1    6
  [18,]    1    7
  [19,]    1    8
  [20,]    1    9
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: simple linear regression per Subject 
]]
.row[.content[

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# M2: Correlated rand. intercept & rand. slope for all
]]
.row[.split-50[
.column[.content[

We model the reaction time in .brand-blue[one model] as
`$$\boldsymbol{Y}=\textbf{X}\boldsymbol{\beta}+\textbf{Z}\boldsymbol{u}+\boldsymbol{\epsilon}$$`
where `$\textbf{X}=\boldsymbol{1}_{18}\otimes\begin{bmatrix}1&x_1\\1&x_2\\ \vdots&\vdots\\1&x_{10}\end{bmatrix}$`, `$\textbf{Z}=\boldsymbol{I}_{18}\otimes \begin{bmatrix}1& x_1\\1&x_2\\ \vdots&\vdots\\1&x_{10}\end{bmatrix}$`, `$\boldsymbol{\beta}=\begin{bmatrix}\beta_{0}\\ \beta_{1}\\ \end{bmatrix}$`, `$\boldsymbol{u}=\begin{bmatrix}B_{10}\\ B_{11}\\ \vdots\\ B_{18,1}\end{bmatrix}$` and `$\boldsymbol{\epsilon}=\begin{bmatrix}\epsilon_{11} \\ \epsilon_{12}\\ \vdots\\ \epsilon_{18,10}\end{bmatrix}$`,
`$\boldsymbol{u}\sim N\left(\boldsymbol{0},\boldsymbol{I}_{18}\otimes\begin{bmatrix}\sigma^2_0&\rho \sigma_0\sigma_1\\ \rho\sigma_0\sigma_1&\sigma^2_1\end{bmatrix}\right), \boldsymbol{\epsilon}\sim N(\boldsymbol{0},\sigma^2\boldsymbol{I}_{180})$` and `$\boldsymbol{u}$` and `$\boldsymbol{\epsilon}$` are independent.

So `$\sigma_0=24.737$`, `$\sigma_1=5.923$`, `$\rho=0.07$`, `$\sigma=25.592$`, `$\beta_0=251.41$` and `$\beta_1=10.47$`.
]]
.column.bg-brand-gray[.content[

```r
M2=lmer(Reaction~Days+(Days|Subject),data=sleepstudy)
M2 #Days|Subject nest intc and slope with correlation
```

```
  Linear mixed model fit by REML ['lmerMod']
  Formula: Reaction ~ Days + (Days | Subject)
     Data: sleepstudy
  REML criterion at convergence: 1743.628
  Random effects:
   Groups   Name        Std.Dev. Corr
   Subject  (Intercept) 24.737       
            Days         5.923   0.07
   Residual             25.592       
  Number of obs: 180, groups:  Subject, 18
  Fixed Effects:
  (Intercept)         Days  
       251.41        10.47
```

```r
VarCorr(M2) 
```

```
   Groups   Name        Std.Dev. Corr 
   Subject  (Intercept) 24.7366       
            Days         5.9229  0.066
   Residual             25.5918
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# M2: Variance matrix of u, `$\textbf{G}$`  
]]
.row[.content[

```r
G[1:20,1:13] #36 by 36; take simple values sig1^2=0.1, sig2^2=0.2, rho=0.5 as an example only
```

```
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
   [1,]  0.1  0.1  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0
   [2,]  0.1  0.2  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0
   [3,]  0.0  0.0  0.1  0.1  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0
   [4,]  0.0  0.0  0.1  0.2  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0
   [5,]  0.0  0.0  0.0  0.0  0.1  0.1  0.0  0.0  0.0   0.0   0.0   0.0   0.0
   [6,]  0.0  0.0  0.0  0.0  0.1  0.2  0.0  0.0  0.0   0.0   0.0   0.0   0.0
   [7,]  0.0  0.0  0.0  0.0  0.0  0.0  0.1  0.1  0.0   0.0   0.0   0.0   0.0
   [8,]  0.0  0.0  0.0  0.0  0.0  0.0  0.1  0.2  0.0   0.0   0.0   0.0   0.0
   [9,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.1   0.1   0.0   0.0   0.0
  [10,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.1   0.2   0.0   0.0   0.0
  [11,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.1   0.1   0.0
  [12,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.1   0.2   0.0
  [13,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.1
  [14,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.1
  [15,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0
  [16,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0
  [17,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0
  [18,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0
  [19,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0
  [20,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0
```

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# M2: Variance matrix of Y,  `$\textbf{V}=\textbf{Z}\textbf{G}\textbf{Z}^\top+\textbf{R}$`   
]]
.row[.content[

```r
V[1:20,1:13] #180 by 180; further sig^2=1; var & cov increase with x over time
```

```
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
   [1,]  1.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9   1.0   0.0   0.0   0.0
   [2,]  0.2  1.5  0.8  1.1  1.4  1.7  2.0  2.3  2.6   2.9   0.0   0.0   0.0
   [3,]  0.3  0.8  2.3  1.8  2.3  2.8  3.3  3.8  4.3   4.8   0.0   0.0   0.0
   [4,]  0.4  1.1  1.8  3.5  3.2  3.9  4.6  5.3  6.0   6.7   0.0   0.0   0.0
   [5,]  0.5  1.4  2.3  3.2  5.1  5.0  5.9  6.8  7.7   8.6   0.0   0.0   0.0
   [6,]  0.6  1.7  2.8  3.9  5.0  7.1  7.2  8.3  9.4  10.5   0.0   0.0   0.0
   [7,]  0.7  2.0  3.3  4.6  5.9  7.2  9.5  9.8 11.1  12.4   0.0   0.0   0.0
   [8,]  0.8  2.3  3.8  5.3  6.8  8.3  9.8 12.3 12.8  14.3   0.0   0.0   0.0
   [9,]  0.9  2.6  4.3  6.0  7.7  9.4 11.1 12.8 15.5  16.2   0.0   0.0   0.0
  [10,]  1.0  2.9  4.8  6.7  8.6 10.5 12.4 14.3 16.2  19.1   0.0   0.0   0.0
  [11,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   1.1   0.2   0.3
  [12,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.2   1.5   0.8
  [13,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.3   0.8   2.3
  [14,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.4   1.1   1.8
  [15,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.5   1.4   2.3
  [16,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.6   1.7   2.8
  [17,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.7   2.0   3.3
  [18,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.8   2.3   3.8
  [19,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.9   2.6   4.3
  [20,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   1.0   2.9   4.8
```

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# M2: Correlated rand. intercept & rand. slope for all
]]
.row[.split-50[
.column[.content[

The random intercepts `$B_{i0}$` and slopes `$B_{i1}$` are

```r
ranef(M2)$Subject
```

```
      (Intercept)        Days
  310 -38.9563542  -5.4495796
  309 -40.3942719  -8.6205161
  370 -25.6110745   4.8222518
  349 -25.2080191   1.1730997
  350 -13.0694180   6.6142185
  334  -7.2320462   1.0745075
  308   2.2575329   9.1992737
  371   0.8070591  -0.9881730
  369   3.2750882   0.8722876
  351   4.5777099  -3.0152825
  335  -0.3326901 -10.7524799
  332   9.0387625  -0.2720535
  372  12.3133491   1.2842380
  333  16.8389833  -0.2233978
  352  20.8614523   3.5364062
  331  22.2585409  -3.0696766
  330  23.6888704  -4.8141448
  337  34.8865253   8.6290208
```

]]
.column.bg-brand-gray[.content[

The intercepts `$\beta_0+B_{i0}$` and slopes `$\beta_1+B_{i1}$` are

```r
coef(M2)$Subject
```

```
      (Intercept)       Days
  310    212.4488  5.0177063
  309    211.0108  1.8467699
  370    225.7940 15.2895377
  349    226.1971 11.6403856
  350    238.3357 17.0815045
  334    244.1731 11.5417935
  308    253.6626 19.6665597
  371    252.2122  9.4791130
  369    254.6802 11.3395736
  351    255.9828  7.4520035
  335    251.0724 -0.2851939
  332    260.4439 10.1952325
  372    263.7185 11.7515240
  333    268.2441 10.2438881
  352    272.2666 14.0036922
  331    273.6636  7.3976093
  330    275.0940  5.6531411
  337    286.2916 19.0963068
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Correlated random intercept and random slope for all
]]
.row[.content[

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# M3: Uncorrelated rand. intercept & rand. slope for all
]]
.row[.split-50[
.column[.content[

Assume instead `$\boldsymbol{u}\sim N\left(\boldsymbol{0},\boldsymbol{I}_{18}\otimes\begin{bmatrix}\sigma^2_0&0\\0&\sigma^2_1\\ \end{bmatrix}\right)$` and `$\boldsymbol{\epsilon}\sim N(\boldsymbol{0},\sigma^2\boldsymbol{I}_{180})$` as `$\hat{\rho}=0.07 \approx 0$` in M2.

```r
M3=lmer(Reaction~Days+(1|Subject)+(0+Days|Subject), 
data=sleepstudy) #separate rand effect
M3 #not correlated, no rho
```

```
  Linear mixed model fit by REML ['lmerMod']
  Formula: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
     Data: sleepstudy
  REML criterion at convergence: 1743.669
  Random effects:
   Groups    Name        Std.Dev.
   Subject   (Intercept) 25.050  
   Subject.1 Days         5.989  
   Residual              25.565  
  Number of obs: 180, groups:  Subject, 18
  Fixed Effects:
  (Intercept)         Days  
       251.41        10.47
```
So `$\beta_0=251.41$` and `$\beta_1=10.47$` have no change.
]]
.column.bg-brand-gray[.content[

```r
cbind(ranef(M2)$Subject,coef(M2)$Subject)
```

```
      (Intercept)        Days (Intercept)       Days
  310 -38.9563542  -5.4495796    212.4488  5.0177063
  309 -40.3942719  -8.6205161    211.0108  1.8467699
  370 -25.6110745   4.8222518    225.7940 15.2895377
  349 -25.2080191   1.1730997    226.1971 11.6403856
  350 -13.0694180   6.6142185    238.3357 17.0815045
  334  -7.2320462   1.0745075    244.1731 11.5417935
  308   2.2575329   9.1992737    253.6626 19.6665597
  371   0.8070591  -0.9881730    252.2122  9.4791130
  369   3.2750882   0.8722876    254.6802 11.3395736
  351   4.5777099  -3.0152825    255.9828  7.4520035
  335  -0.3326901 -10.7524799    251.0724 -0.2851939
  332   9.0387625  -0.2720535    260.4439 10.1952325
  372  12.3133491   1.2842380    263.7185 11.7515240
  333  16.8389833  -0.2233978    268.2441 10.2438881
  352  20.8614523   3.5364062    272.2666 14.0036922
  331  22.2585409  -3.0696766    273.6636  7.3976093
  330  23.6888704  -4.8141448    275.0940  5.6531411
  337  34.8865253   8.6290208    286.2916 19.0963068
```
So `$\sigma_0=25.050$`, `$\sigma_1=5.989$`, `$\sigma=25.565$` are very close to M2 because `$\rho\approx 0$` in M2.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# M3: Uncorrelated rand. intercept & rand. slope for all 
]]
.row[.content[

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Likelihood ratio tests on variance components 
]]
.row[.split-50[
.column[.content[

* For model `M2`, if `$\rho=0$` then it is model `M3`.
* We can use a .brand-blue[likelihood ratio test] with a reference distribution of a `$\chi^2_1$`  under the `$H_0: \rho=0$`.

```r
anova(M3, M2)
```

```
  refitting model(s) with ML (instead of REML)
```

```
  Data: sleepstudy
  Models:
  M3: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
  M2: Reaction ~ Days + (Days | Subject)
     Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
  M3  5 1762.0 1778.0 -876.00   1752.0                         
  M2  6 1763.9 1783.1 -875.97   1751.9 0.0639      1     0.8004
```
]]
.column.bg-brand-gray[.content[

So the difference between M2 and M3 is insignificant or `$\rho=0$`.

In the next plot which compares M1, M2 and M3, there is no obvious difference between M2 and M3.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sleep data all together 
]]
.row[.content[

]]