Exam Revision

class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[

# Exam Revision
## .black[STAT3022 Applied Linear Models]

### .black[2020/05/23]

]]

.column.bg-brand-charcoal[.content.white[

1. Important information
2. Revision

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Exam Information
]]
.row[.content[

* The exam is 2 hours with 10 minutes reading time and 15 minutes document upload time.
* Read exam information sheet. 
* There are Three questions NOT of equal value.
* All materials from week 1 to 13 are covered. 
* Exam consultation hours to be announced a little later. Do complete the course survey!

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# What to study
]]
.row[.split-two[
.column[.content[
## Focus will be on applied
* Go through tutorial and assignment questions.
* Know well in particular (not exclusive): 
  * Multiple linear regression 
  * Outliers and high leverage points
  * Variable selection
  * General F-test
  * Contrasts
  * ANOVA
  * Experimental design
  * Nested design
  * Random effect model
  
]]
.column.bg-brand-gray[.content[
## Don't put too much focus 
* The derivation of REML
* Robust regression (only how to interpret and use it)

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Averages by block of corn yields, for treatment 111
]]
.row[.split-50[
.column[.content[

Lets read a data and play around.

```r
dat <- read.csv(file="data/ant111br.csv",header=T)
dat[7:10,]; dim(dat)
```

```
     site parcel    id ears harvwt
  7  WEAN      I 115.0 43.5  5.040
  8  WLAN      I 151.0 50.0  2.020
  9  DBAN     II  15.5 46.0  4.805
  10 LFAN     II  51.0 46.5  4.770
```

```
  [1] 32  5
```

* site: a factor with levels DBAN LFAN NSAN ORAN OVAN TEAN WEAN WLAN

* parcel: a factor with levels I II III IV

* parcel & site are 2 crossed factors, each with equal size. Others are numeric.

* take harvwt as outcomes. .brand-blue[Q0. What design is it?]

]]
.column.bg-brand-gray[.content[

Full model

```r
M0=lm(harvwt ~ 1,data=dat)
Mf=lm(harvwt ~ . ,data=dat)
anova(Mf)
```

```
  Analysis of Variance Table
  
  Response: harvwt
            Df Sum Sq Mean Sq F value  Pr(>F)    
  site       7 70.339 10.0485 30.9528 4.3e-09 ***
  parcel     3  4.406  1.4686  4.5238 0.01481 *  
  id         1  0.411  0.4114  1.2673 0.27429    
  ears       1  2.876  2.8756  8.8578 0.00776 ** 
  Residuals 19  6.168  0.3246                    
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Andrews DF; Herzberg AM, 1985. Data. A Collection of Problems from Many Fields for the Student and Research Worker. Springer-Verlag. (pp. 339-353)

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Visualisation of the data
]]
.row[.split-50[
.column[.content[

```r
par(mfrow=c(1,2))
boxplot(harvwt~site)
boxplot(harvwt~parcel)
```

]]
.column.bg-brand-gray[.content[

```r
plot(dat)
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Stepwise variable selection for M1 and general F test
]]
.row[.split-50[
.column[.content[

```r
M1=step(Mf,scope=list(lower=harvwt~1,upper=harvwt~.),
  direction="both",k=log(32)) #AIC should be BIC
```

```
 Start: AIC=-7.63
 harvwt ~ site + parcel + id + ears
 
 Df Sum of Sq RSS AIC
 - parcel 3 0.173 6.342 -17.138
 - id 1 0.076 6.244 -10.702
 <none> 6.168 -7.628
 - ears 1 2.876 9.044 1.151
 - site 7 50.230 56.399 38.929
 
 Step: AIC=-17.14
 harvwt ~ site + id + ears
 
 Df Sum of Sq RSS AIC
 <none> 6.342 -17.1384
 - id 1 1.903 8.244 -12.2067
 + parcel 3 0.173 6.168 -7.6283
 - ears 1 3.331 9.672 -7.0952
 - site 7 51.289 57.631 29.2235
```

]]
.column.bg-brand-gray[.content[

```r
anova(M1)
```

```
  Analysis of Variance Table
  
  Response: harvwt
            Df Sum Sq Mean Sq F value    Pr(>F)    
  site       7 70.339 10.0485  34.860 1.769e-10 ***
  id         1  4.189  4.1885  14.531 0.0009531 ***
  ears       1  3.331  3.3308  11.555 0.0025755 ** 
  Residuals 22  6.342  0.2883                      
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
c(var(harvwt),var(harvwt)*(length(harvwt)-1))
```

```
  [1]  2.716135 84.200180
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# General F test and adjusted `$R^2$`
]]
.row[.split-50[
.column[.content[

```r
anova(M0,M1,Mf) #check
```

```
  Analysis of Variance Table
  
  Model 1: harvwt ~ 1
  Model 2: harvwt ~ site + id + ears
  Model 3: harvwt ~ site + parcel + id + ears
    Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
  1     31 84.200                                  
  2     22  6.342  9    77.859 26.648 6.229e-09 ***
  3     19  6.168  3     0.173  0.178      0.91    
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
summary(M1)$adj.r.squared #check
```

```
  [1] 0.8938746
```

]]
.column.bg-brand-gray[.content[

```r
M2=lm(harvwt~id+parcel,data=dat)
round(c(sigma(M2),augment(M2)$.hat[1]),4)
```

```
  [1] 1.6838 0.1997
```

```r
unname(summary(M2)$res[1])
```

```
  [1] 1.955654
```

```r
c(augment(M2)$.cooksd[1],qf(0.5,5,32-5))#check
```

```
  [1] 0.08414524 0.89244181
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Prediction
]]
.row[.split-60[
.column[.content[

```r
round(coef(M2),6)
```

```
  (Intercept)          id    parcelII   parcelIII    parcelIV 
     3.192797    0.003850    0.621552    0.631907    0.917547
```

```r
round(vcov(M2),6)
```

```
              (Intercept)        id  parcelII parcelIII  parcelIV
  (Intercept)    0.576330 -0.001695 -0.338289 -0.324411 -0.308732
  id            -0.001695  0.000013 -0.000123 -0.000229 -0.000349
  parcelII      -0.338289 -0.000123  0.709951  0.356566  0.357704
  parcelIII     -0.324411 -0.000229  0.356566  0.712832  0.360559
  parcelIV      -0.308732 -0.000349  0.357704  0.360559  0.718176
```

```r
c(qt(0.975,32-5))
```

```
  [1] 2.051831
```

.brand-blue[Q5. What is the predicted harvwt for parcel=I (baseline) when id=10 and 95% PI using M2?]

]]
.column.bg-brand-gray[.content[

```r
pred0=data.frame(id=10,parcel="I")#check
predict(M2,newdata=pred0,
  interval="prediction",level=0.95)
```

```
         fit        lwr      upr
  1 3.231294 -0.5403101 7.002899
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Constrast and multiple comparison
]]
.row[.split-50[
.column[.content[

```r
M1=lm(harvwt~site + id + ears,data=dat)
M2=lm(harvwt~id + parcel,data=dat)
```
.brand-blue[Q6. What are the model equations for M1 and M2?]

```r
M2a=lm(harvwt~id*parcel)
round(coef(M2a)[1:4],6); round(coef(M2a)[5:8],6); 
```

```
  (Intercept)          id    parcelII   parcelIII 
     3.396007    0.002298    0.319258    0.591695
```

```
      parcelIV  id:parcelII id:parcelIII  id:parcelIV 
      0.410399     0.002257     0.000455     0.003477
```

.brand-blue[Q7. What is the design matrix] `$\color{blue}{\mathbf{X}}$` .brand-blue[of obs 7-10 in P.4 for M2a? What are the regression lines of harvwt against id for parcel=I & III using M2a?]

]]
.column.bg-brand-gray[.content[

```r
M3=lm(harvwt~parcel,data=dat); sigma(M3)
```

```
  [1] 1.688135
```

```r
tapply(harvwt,parcel,length)
```

```
    I  II III  IV 
    8   8   8   8
```

```r
tapply(harvwt,parcel,mean)
```

```
         I       II      III       IV 
  3.696875 4.355000 4.396875 4.718125
```

.brand-blue[Q8. What is the CI for the diff between parcel 1 and others using M3? Is the diff significant? Is this diff orthogonal to diff of parcel 1 & 2?]

.brand-blue[Q9. What if the diff is corrected for all possible pairwise tests using Bonferroni, Tukey and Scheffe?]

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Contrast in ANCOVA model
]]
.row[.split-60[
.column[.content[

```r
M2=lm(harvwt~id + parcel,data=dat); round(coef(M2),6)
```

```
  (Intercept)          id    parcelII   parcelIII    parcelIV 
     3.192797    0.003850    0.621552    0.631907    0.917547
```

```r
S0xx=deviance(lm(id~parcel,data=dat)); S0xx
```

```
  [1] 219012.2
```

```r
round(tapply(harvwt,parcel,mean),4)
```

```
       I     II    III     IV 
  3.6969 4.3550 4.3969 4.7181
```

```r
tapply(id,parcel,mean)
```

```
         I       II      III       IV 
  130.9375 140.4375 148.6250 157.8750
```

]]
.column.bg-brand-gray[.content[

```r
sigma(M2)
```

```
  [1] 1.683784
```

```r
qt(0.975,27)
```

```
  [1] 2.051831
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Quantitative factors
]]
.row[.split-40[
.column[.content[

Suppose the four parcels have 1.1, 1.4, 1.8, 2.1 kg of nitrogen respectively.

```r
x=rep(c(1.1, 1.4, 1.8, 2.1),each=8) 
M3a=lm(harvwt~x)
```

.brand-blue[Q11. What is the linear component of the SS that measures the linear effect of nitrogen on harvwt? How to select the suitable higher order?]

```r
anova(M3a) #check
```

```
  Analysis of Variance Table
  
  Response: harvwt
            Df Sum Sq Mean Sq F value Pr(>F)
  x          1  3.715  3.7153  1.3849 0.2485
  Residuals 30 80.485  2.6828
```

]]
.column.bg-brand-gray[.content[

```r
MOP <- lm(harvwt ~ poly(x, degree = 3)) #higer deg is singular
print(broom::tidy(MOP), digits=2)
```

```
 # A tibble: 4 x 5
 term estimate std.error statistic p.value
 <chr> <dbl> <dbl> <dbl> <dbl>
 1 (Intercept) 4.29 0.298 14.4 1.86e-14
 2 poly(x, degree = 3)1 1.93 1.69 1.14 2.63e- 1
 3 poly(x, degree = 3)2 -0.476 1.69 -0.282 7.80e- 1
 4 poly(x, degree = 3)3 0.681 1.69 0.403 6.90e- 1
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Two-way `$8 \times 4$` factorial design
]]
.row[.split-50[
.column[.content[

```r
M4=lm(harvwt~site*parcel,data=dat)
anova(M4) #why RSS=0?
```

```
  Analysis of Variance Table
  
  Response: harvwt
              Df Sum Sq Mean Sq F value Pr(>F)
  site         7 70.339 10.0485               
  parcel       3  4.406  1.4686               
  site:parcel 21  9.455  0.4502               
  Residuals    0  0.000
```
Suppose 2 similar sites can be combined to create 2 repeats.

```r
siteC=rep(c("A","B","B","A","C","D","C","D"),4)
```

So parcel and sitC are two .brand-blue[crossed] factors with 2 repeats for each combination.

]]
.column.bg-brand-gray[.content[

```r
M4a=lm(harvwt~siteC*parcel,data=dat)
sigma(M4a)
```

```
  [1] 1.151487
```

```r
tapply(harvwt,parcel,sum)
```

```
       I     II    III     IV 
  29.575 34.840 35.175 37.745
```

```r
tapply(harvwt,siteC,sum)
```

```
       A      B      C      D 
  47.200 25.190 41.435 23.510
```

.brand-blue[Q12. What is the ANOVA table for M4a given SS_SC=52.091? Is the parcel effect significant?]

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Two-way `$4^2$` factorial design and ANOVA table
]]
.row[.split-50[
.column[.content[

```r
anova(M4a) #check
```

```
  Analysis of Variance Table
  
  Response: harvwt
               Df Sum Sq Mean Sq F value    Pr(>F)    
  siteC         3 52.091 17.3637 13.0956 0.0001415 ***
  parcel        3  4.406  1.4686  1.1076 0.3750266    
  siteC:parcel  9  6.488  0.7209  0.5437 0.8220754    
  Residuals    16 21.215  1.3259                      
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Remark: We can also have `$2^4$` factorial design by dichotomise the 4 variables. Remember standard order.

]]
.column.bg-brand-gray[.content[

```r
anova(lm(harvwt~parcel*siteC,data=dat))#Same as M4a?
```

```
  Analysis of Variance Table
  
  Response: harvwt
               Df Sum Sq Mean Sq F value    Pr(>F)    
  parcel        3  4.406  1.4686  1.1076 0.3750266    
  siteC         3 52.091 17.3637 13.0956 0.0001415 ***
  parcel:siteC  9  6.488  0.7209  0.5437 0.8220754    
  Residuals    16 21.215  1.3259                      
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

.brand-blue[Q13. Why the SS remains unchanged if the two categorial variables swap order in M4a? What if we have ancova model with one categorial and one continuous variables?]

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# ANCOVA and nested factor
]]
.row[.split-40[
.column[.content[

```r
anova(lm(harvwt~parcel*id,data=dat))#ANCOVA
```

```
  Analysis of Variance Table
  
  Response: harvwt
            Df Sum Sq Mean Sq F value Pr(>F)
  parcel     3  4.406  1.4686  0.4630 0.7107
  id         1  3.246  3.2459  1.0234 0.3218
  parcel:id  3  0.430  0.1434  0.0452 0.9869
  Residuals 24 76.118  3.1716
```

```r
anova(lm(harvwt~id*parcel,data=dat))#change
```

```
  Analysis of Variance Table
  
  Response: harvwt
            Df Sum Sq Mean Sq F value Pr(>F)
  id         1  4.102  4.1025  1.2935 0.2666
  parcel     3  3.549  1.1831  0.3730 0.7732
  id:parcel  3  0.430  0.1434  0.0452 0.9869
  Residuals 24 76.118  3.1716
```

]]
.column.bg-brand-gray[.content[

Create also nested factor siteP under parcel (I to IV) with **16 levels**. 2 repeats each.

|I|II|III|IV|
|-----|-----|------|-----|
|AI BI CI DI| AII BII CII DII | AIII BIII CIII DIII | AIV BIV CIV DIV|

Note that crossed factor siteC under parcel (I to IV) has **4 levels**. 2 repeats each.

|I|II|III|IV|
|-----|-----|------|-----|
|A B C D|A B C D|A B C D|A B C D|

```r
siteP=paste(siteC,parcel)
```

```r
M4b=lm(harvwt~parcel/siteP,data=dat)
M4c=lm(harvwt~parcel/siteC,data=dat)
```

.brand-blue[Q14. If nested factor siteP (or crossed factor siteC) are to be hidden in the output, what is the SS for parcel:siteP?]

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Nested factor
]]
.row[.split-50[
.column[.content[

```r
anova(M4b)#check with M4c. Same siteP & siteC.
```

```
  Analysis of Variance Table
  
  Response: harvwt
               Df Sum Sq Mean Sq F value  Pr(>F)   
  parcel        3  4.406  1.4686  1.1076 0.37503   
  parcel:siteP 12 58.580  4.8816  3.6817 0.00845 **
  Residuals    16 21.215  1.3259                   
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
anova(M4a)#compare with nested design
```

]]
.column.bg-brand-gray[.content[

```r
anova(M4c) 
```

```
  Analysis of Variance Table
  
  Response: harvwt
               Df Sum Sq Mean Sq F value  Pr(>F)   
  parcel        3  4.406  1.4686  1.1076 0.37503   
  parcel:siteC 12 58.580  4.8816  3.6817 0.00845 **
  Residuals    16 21.215  1.3259                   
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

.brand-blue[Q15. If siteP (repeat your answer with siteC), which are nested under parcel forms the EU and the OU are each repeats, how is RSS splitted for the EU and OU under the main effect model (repeat your answer with the interaction model with siteC as a nested factor)?]

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Nested design, main, crossed & nested factors 
]]
.row[.split-50[
.column[.content[

```r
summary(aov(harvwt~parcel+Error(siteP),data=dat))#main
```

```
  
  Error: siteP
            Df Sum Sq Mean Sq F value Pr(>F)
  parcel     3   4.41   1.469   0.301  0.824
  Residuals 12  58.58   4.882               
  
  Error: Within
            Df Sum Sq Mean Sq F value Pr(>F)
  Residuals 16  21.21   1.326
```

EU: parcel with 4-1 df is within siteP with 16-1 df. .brand-blue[RSS is SS for siteC & siteC:parcel with 3+9=12 df.]

OU: .brand-blue[Remaining RSS with 31-15=16 df.]

RSS and its 28 df is splitted into part in EU with SS for siteC & .brand-blue[siteC:parcel] (12 df) and in OU with remaining RSS (16 df).

]]
.column.bg-brand-gray[.content[

```r
summary(aov(harvwt~parcel+Error(siteC),data=dat))#main
```

```
  
  Error: siteC
            Df Sum Sq Mean Sq F value Pr(>F)
  Residuals  3  52.09   17.36               
  
  Error: Within
            Df Sum Sq Mean Sq F value Pr(>F)
  parcel     3  4.406   1.469   1.325  0.288
  Residuals 25 27.703   1.108
```

EU: siteC has 4-1 df. .brand-blue[RSS is SS for siteC.]

OU: parcel is not nested but crossed with siteC. So parcel with 4-1 df goes outside siteC (EU) with 32-1 df in total. .brand-blue[RSS has SS for siteC:parcel & RSS with 6.49+21.22=27.71 at 9+16=25 df.]

RSS and its 28 df is splitted into part in EU with SS for siteC (3 df) and in OU with SS for .brand-blue[siteC:parcel] and remaining RSS (9+16 df).

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Nested design, interaction, crossed & nested factors
]]
.row[.split-50[
.column[.content[

```r
summary(aov(harvwt~parcel/siteC+Error(siteP),
            data=dat)) #interaction
```

```
  
  Error: siteP
               Df Sum Sq Mean Sq
  parcel        3   4.41   1.469
  parcel:siteC 12  58.58   4.882
  
  Error: Within
            Df Sum Sq Mean Sq F value Pr(>F)
  Residuals 16  21.21   1.326
```

EU: parcel and parcel:site (RSS in main effect) are within siteP.

OU: remaining RSS same as main effect model.

Diff is on treatment SS. RSS are the same with 16 df. The 9+3 df of RSS in main model goes to parcel:siteC as treatment SS.

]]
.column.bg-brand-gray[.content[

```r
summary(aov(harvwt~parcel/siteC+Error(siteC),
            data=dat)) #interaction
```

```
  
  Error: siteC
               Df Sum Sq Mean Sq
  parcel:siteC  3  52.09   17.36
  
  Error: Within
               Df Sum Sq Mean Sq F value Pr(>F)
  parcel        3  4.406  1.4686   1.108  0.375
  parcel:siteC  9  6.488  0.7209   0.544  0.822
  Residuals    16 21.215  1.3259
```

EU: SS for parcel:siteC is SS for siteC. The nested model parcel/siteC does not allow siteC to appear.

OU: parcel & parcel:siteC are within OU, outside siteC. RSS is the remaining RSS with 16 df.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Latin square & balance incomplete block design
]]
.row[.split-50[
.column[.content[

No LSD is possible for this data with 2 categorical variables as LSD has 2 block factors and 1 treatment factor. For a 4 level treatment say, since `$n=16$`, only main effect model (no repeats for interaction) is possible.

To create BICBD, take parcel as `$k=4$` treatments, set `$r=6$` (instead of `$r=8$`), `$b=8$` and `$k=3$`. Total `$n=24$` and so drop 8 observations. BICBD is

```r
datr=dat[-c(7,8,13,14,19,20,25,26),] 
```

|.gray[A]|.gray[A]|.gray[A]|.gray[A]|.gray[A]|.gray[A]|.red[B]|.red[B]|
|-----|-----|------|-----|-----|-----|-----|-----|
|.red[B]|.red[B]|.red[B]|.red[B]|.blue[C]|.blue[C]|.blue[C]|.blue[C]|
|.blue[C]|.blue[C]|.green[D]|.green[D]|.green[D]|.green[D]|.green[D]|.green[D]|

]]
.column.bg-brand-gray[.content[

Randomisation: col (1,2,3,4,5,6,7,8) to (3,6,1,7,4,8,2,5)

|.gray[A]|.gray[A]|.gray[A]|.red[B]|.gray[A]|.red[B]|.gray[A]|.gray[A]|
|-----|-----|------|-----|-----|-----|-----|-----|
|.red[B]|.blue[C]|.red[B]|.blue[C]|.red[B]|.blue[C]|.red[B]|.blue[C]|
|.green[D]|.green[D]|.blue[C]|.green[D]|.green[D]|.green[D]|.blue[C]|.green[D]|

Randomisation: row (2,3,1),(3,1,2),(1,3,2),(1,2,3),(3,1,2),(3,2,1),(2,3,1),(2,1,3)

|.red[B]|.green[D]|.gray[A]|.red[B]|.green[D]|.green[D]|.red[B]|.blue[C]|
|-----|-----|------|-----|-----|-----|-----|-----|
|.green[D]|.gray[A]|.blue[C]|.blue[C]|.gray[A]|.blue[C]|.blue[C]|.gray[A]|
|.gray[A]|.blue[C]|.red[B]|.green[D]|.red[B]|.red[B]|.gray[A]|.green[D]|

Data is

|.red[3.17]|.green[3.515]|.gray[5.16]|.red[4.6]|.green[7.07]|.green[2.65]|.red[4.77]|.blue[5.565]|
|-----|-----|------|-----|-----|-----|-----|-----|
|.green[1.97]|.gray[2.65]|.blue[5.065]|.blue[6.34]|.gray[6.79]|.blue[3.16]|.blue[4.33]|.gray[3.255]|
|.gray[1.73]|.blue[2.79]|.red[4.805]|.green[6.125]|.red[7.365]|.red[2.665]|.gray[2.93]|.green[6.235]|

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Balance incomplete block design
]]
.row[.split-50[
.column[.content[

```r
M5=lm(harvwtr~block+parcelr)
round(tapply(harvwtr,parcelr,mean),4)
```

```
       I     II    III     IV 
  3.7525 4.5625 4.5417 4.7392
```

```r
round(tapply(harvwtr,block,mean)[1:4],4)
```

```
       1      2      3      4 
  2.2900 2.9850 5.0100 5.6883
```

```r
round(tapply(harvwtr,block,mean)[5:8],4)
```

```
       5      6      7      8 
  7.0750 3.1150 4.0100 5.0183
```

]]
.column.bg-brand-gray[.content[

```r
anova(M5)
```

```
  Analysis of Variance Table
  
  Response: harvwtr
            Df Sum Sq Mean Sq F value    Pr(>F)    
  block      7 53.483  7.6404 15.0542 2.642e-05 ***
  parcelr    3  4.160  1.3866  2.7322   0.08646 .  
  Residuals 13  6.598  0.5075                      
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Random effect model
]]
.row[.split-50[
.column[.content[

```r
library(lme4)
M6=lmer(harvwt~parcel+(1|site))
summary(M6); ranef(M6)
```

```
  $site
       (Intercept)
  DBAN  0.56668029
  LFAN -0.08044263
  NSAN -2.10300025
  ORAN  2.50566115
  OVAN  0.51653423
  TEAN -1.19917728
  WEAN  1.17917855
  WLAN -1.38543406
  
  with conditional variances for "site"
```

.brand-blue[Q18. What is model equation in terms of the matrics and vectors] `$\color{blue}{\mathbf{X}}$`, `$\color{blue}{\boldsymbol{\beta}}$`,  `$\color{blue}{\mathbf{Z}}$`, `$\color{blue}{\boldsymbol{u}}$` .brand-blue[for M5? What are their estimates and the matrix ] `$\color{blue}{\mathbf{X}^\top\mathbf{X}}$`.brand-blue[? When should random effect model be used?]

]]
.column.bg-brand-gray[.content[

```
  Linear mixed model fit by REML ['lmerMod']
  Formula: harvwt ~ parcel + (1 | site)
  
  REML criterion at convergence: 87.2
  
  Scaled residuals: 
      Min      1Q  Median      3Q     Max 
  -1.4282 -0.6298  0.1011  0.4391  1.4907 
  
  Random effects:
   Groups   Name        Variance Std.Dev.
   site     (Intercept) 2.3983   1.5486  
   Residual             0.4503   0.6711  
  Number of obs: 32, groups:  site, 8
  
  Fixed effects:
              Estimate Std. Error t value
  (Intercept)   3.6969     0.5967   6.195
  parcelII      0.6581     0.3355   1.961
  parcelIII     0.7000     0.3355   2.086
  parcelIV      1.0212     0.3355   3.044
  
  Correlation of Fixed Effects:
            (Intr) prclII prcIII
  parcelII  -0.281              
  parcelIII -0.281  0.500       
  parcelIV  -0.281  0.500  0.500
```

]]
]]