class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[
<br><br><br>
# Random Effect Models
## .black[STAT3022 Applied Linear Models Lecture 31]
<br><br><br>
### .black[2020/02/20]
]]
.column.bg-brand-charcoal[.content.white[
## Today
1. Random effects model
2. Intraclass correlation
3. Use of the lmer function
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Fun fact
]]
.row[.split-60[
.column[.content[
* Churchill Eisenhart (1913-1994) was chief of the Statistical Engineering Laboratory, National Bureau of Standards, USA.
* Eisenhart (1947) made explicit the distinction between fixed and random interpretations of ANOVA and introduced this terminology.
* The random effects model was routinely referred to as .brand-blue[Eisenhart's Model II].
.bottom_abs.width100[
Eisenhart C. (1947) The assumptions underlying the analysis of variance. *Biometrics* **3** 1-21
]
]]
.column.bg-brand-gray[.content[
<img src="images/Churchill-Eisenhart.png" width="50%" height="50%">
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sodium content in beer
]]
.row[.split-50[
.column[.content[
* .brand-blue[Six] brands were selected at random from the large number of North American beer brands.
* From each brand, .brand-blue[eight] 12-ounce ( `\(\approx\)` 3.5 dl) bottles were sampled and the sodium content (in mg) of each bottle was measured.
* What is of interest?
* What is the expected value of sodium content for a given beer brand?
* Does the sodium content differ significantly between brands?
* What is the expected value of sodium content in a beer?
* What is the variability of the sodium content between different beer brands?
]]
.column.bg-brand-gray[.content[
.scroll-box-20[

```r
dat
```

```
   sodium brand rep
1    24.4     1   1
2    22.6     1   2
3    23.8     1   3
4    22.0     1   4
5    24.5     1   5
6    22.3     1   6
7    25.0     1   7
8    24.5     1   8
9    10.2     2   1
10   12.1     2   2
11   10.3     2   3
12   10.2     2   4
13    9.9     2   5
14   11.2     2   6
15   12.0     2   7
16    9.5     2   8
17   19.2     3   1
18   19.4     3   2
19   19.8     3   3
20   19.0     3   4
21   19.6     3   5
22   18.3     3   6
23   20.0     3   7
24   19.4     3   8
25   17.4     4   1
26   18.1     4   2
27   16.7     4   3
28   18.3     4   4
29   17.6     4   5
30   17.5     4   6
31   18.0     4   7
32   16.4     4   8
33   13.4     5   1
34   15.0     5   2
35   14.1     5   3
36   13.1     5   4
37   14.9     5   5
38   15.0     5   6
39   13.4     5   7
40   14.8     5   8
41   21.3     6   1
42   20.2     6   2
43   20.7     6   3
44   20.8     6   4
45   20.1     6   5
46   18.8     6   6
47   21.1     6   7
48   20.3     6   8
```
]
.bottom_abs.width100[
Data from Kutner et al. (2005) *Applied Linear Statistical Models* p. 1078.
]
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Sodium content in beer
]]
.row[.split-50[
.column[.content[

```r
ggplot(dat, aes(brand, sodium, fill=brand)) + geom_boxplot() + theme_classic() +
  xlab("Brand") + ylab("Sodium") +
  theme(axis.text=element_text(size=16), axis.title=element_text(size=20)) +
  guides(fill=FALSE) + geom_hline(yintercept=mean(sodium), linetype="dotted")
```
]]
.column.bg-brand-gray[.content[
<img src="lecture31_2020JC_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Example: Beer - one-way ANOVA model
]]
.row[.split-50[
.column[.content[
* What is the expected value of the sodium content for given brands?

```r
anova(lm(sodium ~ brand - 1))
```

```
Analysis of Variance Table

Response: sodium
          Df  Sum Sq Mean Sq F value    Pr(>F)    
brand      6 15772.3 2628.72  3671.6 < 2.2e-16 ***
Residuals 42    30.1    0.72                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
round(coef(M1 <- lm(sodium ~ brand - 1)), 1) # brand means
```

```
brand1 brand2 brand3 brand4 brand5 brand6 
  23.6   10.7   19.3   17.5   14.2   20.4 
```
* Does the sodium content differ significantly between brands?
]]
.column.bg-brand-gray[.content[

```r
summary(aov(sodium ~ brand))
```

```
            Df Sum Sq Mean Sq F value Pr(>F)    
brand        5  854.5  170.91   238.7 <2e-16 ***
Residuals   42   30.1    0.72                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
* What is the expected value of sodium content?
* What is the variability of sodium content between brands?

```r
mi = tapply(sodium, brand, mean); var(mi) # estimates sigA^2 + sig2/8
```

```
[1] 21.36323
```

```r
c(mean(sodium), deviance(M1)/42) # intercept and MSR
```

```
[1] 17.6291667  0.7159524
```
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# One-way ANOVA model II
]]
.row[.split-50[
.column[.content[
* Let `\(Y_{ij}\)` denote the `\(j\)`-th observation on factor level `\(i\)`.
* We have so far modelled `\(Y_{ij}\)` as
`$$Y_{ij}=\mu+\alpha_i+\epsilon_{ij}, \ j=1,\ldots,n_i, \ i=1,\ldots,t,$$`
where `\(\epsilon_{ij}\sim NID(0,\sigma^2)\)` and `\(\alpha_i\)` is a .brand-blue[fixed effect]. This is at times referred to as one-way ANOVA model .brand-blue[I].
* For one-way ANOVA model .brand-blue[II], we model `\(Y_{ij}\)` as
`$$Y_{ij}=\mu+A_i+\epsilon_{ij}, \ j=1,\ldots,n_i, \ i=1,\ldots,t,$$`
where `\(\color{blue}{A_i\sim NID(0,\sigma^2_A)}\)`, `\(\epsilon_{ij}\sim NID(0,\sigma^2)\)` and `\(A_i\)` is a .brand-blue[random variable] independent of `\(\epsilon_{ij}\)`. This is more commonly referred to as the .brand-blue[random effects model].
* Note: by convention italic upper-case letters denote random variables (although we violate this at times for convenience).
]]
.column.bg-brand-gray[.content[

```r
library(lme4)
M0 <- lmer(sodium ~ 1+(1|brand), data=dat)
summary(M0)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: sodium ~ 1 + (1 | brand)
   Data: dat

REML criterion at convergence: 148.9

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.90551 -0.68338  0.08232  0.79247  1.64969 

Random effects:
 Groups   Name        Variance Std.Dev.
 brand    (Intercept) 21.274   4.6123  
 Residual              0.716   0.8461  
Number of obs: 48, groups:  brand, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   17.629      1.887   9.343
```
So `\(\hat\mu=17.629\)`, `\(\hat\sigma^2=0.716\)` and `\(\hat{\sigma}^2_A=21.274\)`.
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Prediction of random effects
]]
.row[.split-50[
.column[.content[
The deviation from the average sodium content for each brand is `\(A_i\)`, predicted by

```r
ranef(M0)
```

```
$brand
  (Intercept)
1   5.9831634
2  -6.9250345
3   1.7011768
4  -0.1286256
5  -3.4023537
6   2.7716735

with conditional variances for "brand" 
```

```r
round(tapply(sodium, brand, mean) - mean(sodium), 3)
```

```
     1      2      3      4      5      6 
 6.008 -6.954  1.708 -0.129 -3.417  2.783 
```
]]
.column.bg-brand-gray[.content[
* The estimation method REML refers to .brand-blue[restricted maximum likelihood].
* Note that we often say we .brand-blue[predict] random effects and .brand-blue[estimate] fixed effects and variance parameters.
* That is, we .brand-blue[predict a realization of a random variable] while we .brand-blue[estimate a parameter of its distribution].
* The model parameters of ANOVA II as a random effects model are `\(\mu\)`, `\(\sigma_A^2\)` and `\(\sigma^2\)`. We do not treat the `\(A_i\)` as model parameters but as random realisations from the population.
* Hence this model has fewer parameters and is preferred when there is a large number of brands with few replicates each, in which case model I would have many `\(\alpha_i\)` parameters without sufficient replicates to estimate them.
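The values from `ranef(M0)` are not the raw brand-mean deviations but shrunken versions of them. A minimal numeric sketch of this shrinkage, using the rounded REML variance components and raw deviations printed above (so the reproduction is only approximate):

```r
# BLUP shrinkage sketch: Ahat_i = lambda * (ybar_i - ybar), with
# lambda = sigA2 / (sigA2 + sig2/n_i) for a balanced one-way design
sigA2 <- 21.274                                   # REML brand variance (rounded)
sig2  <- 0.716                                    # REML residual variance (rounded)
ni    <- 8                                        # bottles per brand
raw   <- c(6.008, -6.954, 1.708, -0.129, -3.417, 2.783)  # ybar_i - ybar
lambda <- sigA2 / (sigA2 + sig2/ni)
round(lambda * raw, 3)  # close to the ranef(M0) values above
```

Because `\(\sigma^2/n_i\)` is tiny relative to `\(\sigma^2_A\)` here, the shrinkage factor is about 0.996 and the predictions are barely distinguishable from the raw deviations.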
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Differences between one-way ANOVA model I and II
]]
.row[.split-50[
.column[.content[
Recall under .brand-blue[one-way ANOVA model I], <br>
`\(E(Y_{ij}) = \mu + \alpha_i\)`, <br>
`\(\text{Cov}(Y_{ij},Y_{ik})=\text{Cov}(\mu+\alpha_i+\epsilon_{ij},\mu+\alpha_i+\epsilon_{ik})\)` <br>
`\(\hspace{40mm} =\text{Cov}(\epsilon_{ij},\epsilon_{ik})=0\)` for `\(j\neq k\)`, <br>
and so responses are independent.<br>
.brand-blue[Note:] zero covariance does not imply independence unless the random variables are jointly Normally distributed.

For .brand-blue[one-way ANOVA model II], <br>
the model is
`$$Y_{ij}=\mu+A_i+\epsilon_{ij}, \ j=1,\ldots,n_i, \ i=1,\ldots,t.$$`
* `\(E(Y_{ij}) = \mu\)`, i.e., all observations have the same expected value since `\(E(A_{i}) = 0\)`;
* `\(\text{Var}(Y_{ij})=\sigma^2_A+\sigma^2\)`, where `\(\sigma^2\)` and `\(\sigma^2_A\)` are called the .brand-blue[variance components];
]]
.column.bg-brand-gray[.content[
* `\(\text{Cov}(Y_{ij},Y_{ik})=\text{Cov}(\mu+A_i+\epsilon_{ij},\mu+ A_i+\epsilon_{ik})\)` <br>
`\(\hspace{40mm}=\text{Var}(A_i) = \sigma^2_A\)` for `\(j\neq k\)`, <br>
which means that the observations within groups are not statistically independent!
* This naturally leads to the so-called .brand-blue[intraclass correlation] defined as
`$$\color{blue}{\rho=\frac{\text{Cov}(Y_{ij},Y_{ik})}{\sqrt{\text{Var}(Y_{ij})\text{Var}(Y_{ik})}}=\frac{\sigma^2_A}{\sigma^2_A+\sigma^2}.}$$`
* So far we have assumed that all factor levels of interest are observed.
* In many situations the factor levels are a sample from a larger population of factor levels.
* This situation should be modelled via a .brand-blue[random effects model], also known as .brand-blue[ANOVA model II].
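Plugging the REML variance components reported by `summary(M0)` earlier into this formula gives the estimated intraclass correlation for the beer data (a quick sketch with the rounded values, not a general-purpose function):

```r
# Estimated intraclass correlation: rho = sigA2 / (sigA2 + sig2)
sigA2 <- 21.274   # brand variance component (REML, rounded)
sig2  <- 0.716    # residual variance component (REML, rounded)
round(sigA2 / (sigA2 + sig2), 3)  # 0.967
```

So about 97% of the variability in sodium content is between brands; bottles of the same brand are very strongly correlated.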
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Hypothesis test in a random-effects model
]]
.row[.split-50[
.column[.content[
* In the random effects situation the hypothesis of interest may be
`$$\color{blue}{H_0:\sigma^2_A=0\quad \text{vs.}\quad H_1: \sigma^2_A>0}$$`
`\(H_0:\sigma^2_A=0\)` implies .blue[no variability in the factor levels], so `\(A_i=0, \ \forall i\)` (since `\(E(A_i)=0\)` and `\(\text{Var}(A_i)=0\)`) and every observation has mean `\(\mu\)`.
* To test this we use the classical `\(F\)`-test statistic. Other tests such as the likelihood ratio, score and Wald tests exist but they are out of scope for this course.
`$$f=\frac{\text{SS}_A /(t-1)}{\text{RSS}/(n-t)}=\frac{\text{MS}_A}{\text{MSR}}.$$`
* So `\(f\sim F_{t-1,n-t}\)` under `\(H_0\)` as before.
* Thus it is the same test as the one-way ANOVA `\(F\)`-test but the interpretation changes.
]]
.column.bg-brand-gray[.content[
### Inference for the overall mean estimate
* We have `\(\text{E}(\bar{Y}_{\bullet\bullet})=\mu\)`, so `\(\bar{Y}_{\bullet\bullet}\)` is unbiased for `\(\mu\)`.
* And `\(\text{var}\left(\bar{Y}_{\bullet\bullet}\right)=\frac{1}{n}\sigma^2+\left(\frac{\sum_{i=1}^t {n_i^2}}{n^2}\right)\sigma^2_A\)`, which (for a balanced design) is estimated by `\(\frac{\text{SS}_A/(t-1)}{n}\)`.
* Then a 100 `\((1-\alpha)\)` % CI for `\(\mu\)` is:
`$$\hat{\mu} \pm t_{\color{blue}{(t-1)}}(1-\alpha/2) \sqrt{\frac{\color{blue}{\text{SS}_A/(t-1)}}{n}}$$`
where `\(\hat{\mu}=\bar{Y}_{\bullet\bullet}\)`.
* Note that `\(MSR=SSR/(n-t)\)`, which estimates `\(\sigma^2\)`, is replaced by `\(\text{MS}_A=SS_A/(t-1)\)`, which estimates `\(n\,\text{var}(\bar{Y}_{\bullet\bullet})=\sigma^2 + n_i \sigma_A^2\)` in the balanced case, so the df also changes from `\(n-t\)` to `\(t-1\)`.
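Both sides of this variance formula can be checked numerically for the beer data. A sketch assuming the rounded REML variance components and the `\(\text{MS}_A\)` value from the ANOVA table (6 brands, 8 bottles each):

```r
# Check var(Ybar..) = sig2/n + (sum(ni^2)/n^2) * sigA2 against MSA/n
sig2 <- 0.716; sigA2 <- 21.274   # REML variance components (rounded)
ni <- rep(8, 6); n <- sum(ni)    # balanced design: 6 brands x 8 bottles
direct <- sig2/n + sum(ni^2)/n^2 * sigA2
MSA <- 170.906                   # MS_A from anova(lm(sodium ~ brand))
round(c(direct, MSA/n), 3)       # both about 3.561
```

The agreement is approximate only because the plugged-in values are rounded.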
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Hypothesis Testing
]]
.row[.split-50[
.column[.content[

```r
alpha=0.01; t=6; n=6*8; mu.hat=mean(sodium)
MSA <- anova(lm(sodium ~ brand))[1, 3]
c(mu.hat-qt(1-alpha/2, t-1)*sqrt(MSA/n),
  mu.hat+qt(1-alpha/2, t-1)*sqrt(MSA/n))
```

```
[1] 10.02076 25.23757
```
which is wider than the one-sample 99% CI of (15.94813, 19.31020) because there is an extra source of variability from the random effects.

```r
anova(lm(sodium ~ brand))
```

```
Analysis of Variance Table

Response: sodium
          Df Sum Sq Mean Sq F value    Pr(>F)    
brand      5 854.53 170.906  238.71 < 2.2e-16 ***
Residuals 42  30.07   0.716                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
]]
.column.bg-brand-gray[.content[

```r
t.test(sodium, conf.level=1 - alpha) # test mu = 0
```

```
	One Sample t-test

data:  sodium
t = 28.153, df = 47, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
99 percent confidence interval:
 15.94813 19.31020
sample estimates:
mean of x 
 17.62917 
```
]]
]]

---
class: split-10
.row.bg-brand-blue.white[.content.vmiddle[
# Fixed vs. random effects
]]
.row[.split-50[
.column[.content[
### Theoretical considerations
* Fixed effects are constant values but random effects follow a distribution.
* Effects are fixed if the individual effects (the fixed `\(\alpha_i\)`) are themselves of interest; they are random if interest lies in the underlying population, summarised by the variance estimate `\(\sigma_A^2\)`.
* When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random.
* If an effect is assumed to be a realised value of a random variable, it is called a random effect.
* Fixed effects are estimated by least squares, while random effects are predicted with shrinkage (e.g. as BLUPs after REML estimation of the variance components).
]]
.column.bg-brand-gray[.content[
### Question to ask
How does inference change when a set of effects is changed from fixed to random?
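The change shows up directly in the standard error of `\(\hat\mu\)`: model I uses MSR, model II uses `\(\text{MS}_A\)`. A numeric sketch with the rounded mean squares from the beer ANOVA shown earlier:

```r
# SE of the grand mean under the two models (balanced design, n = 48)
MSR <- 0.716; MSA <- 170.906; n <- 48
se_fixed  <- sqrt(MSR/n)  # model I: brand effects treated as fixed
se_random <- sqrt(MSA/n)  # model II: matches the lmer SE of 1.887
round(c(se_fixed, se_random), 3)  # 0.122 and 1.887
```

Treating brands as random makes inference on `\(\mu\)` far less precise, which is why the 99% CI on the previous slide is so much wider than the one-sample interval.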
.bottom_abs.width100[
Gelman (2005) Analysis of variance - why it is more important than ever. *The Annals of Statistics*. **33** (1) 1-53.
]
]]
]]