Variable Selection: Stepwise, AIC and BIC

class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[

# Variable Selection: Stepwise, AIC and BIC
## .black[STAT3022 Applied Linear Models Lecture 11]

### .black[2020/02/17]

]]

.column.bg-brand-charcoal[.content.white[
## Today

1. Stepwise variable selection
2. The step command
3. The AIC, `$C_p$` and BIC variable selection criterion

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Stepwise variable selection
]]
.row[.split-two[
.column[.content[

1. Start with some model, typically null model (with no explanatory variables) or full model (with all variables).
2. For each variable in the current model, investigate effect of removing it.
3. Remove the least informative variable, unless this variable is nonetheless supplying significant information about the response.
4. For each variable not in the current model, investigate effect of including it.
5. Include the most statistically significant variable not currently in model (unless no significant variable exists).
6. Go to step 2. Stop only if no change in steps 2-5.

]]
.column.bg-brand-gray[.content[
### Cheese data: Stepwise
.brand-blue[First pass through algorithm]

We will perform stepwise variable selection starting from the "null model" (i.e. the model with no explanatory variables) using the following exclusion/inclusion level of significance: 
`$$\text{Exclude: p-value}>p_\text{out} = 0.20\quad\text{and}$$`
`$$\text{Include: p-value} < p_\text{in} = 0.10.$$`
Note: `$p_\text{out}>p_\text{in}$` or some variables will be in and out repeatedly.

```r
dat <- read.table(file = "data/cheese.txt", 
 header = TRUE, row.names = NULL)
M1 <- lm(taste ~ 1, data = dat) # null model
Mf <- lm(taste ~ ., data = dat) # full model
```

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Stepwise variable selection
]]
.row[.split-two[
.column.bg-brand-gray[.content[
.brand-blue[First pass through algorithm (step 4 - 5)]

There are no variables to drop from `M1`. Hence, the algorithm starts at step 4.

```r
add1(M1, scope = Mf, data = dat, test = "F")
```

```
 Single term additions
 
 Model:
 taste ~ 1
 Df Sum of Sq RSS AIC F value Pr(>F) 
 <none> 7662.9 168.29 
 Acetic 1 2314.1 5348.7 159.50 12.114 0.001658 ** 
 H2S 1 4376.7 3286.1 144.89 37.293 1.374e-06 ***
 Lactic 1 3800.4 3862.5 149.74 27.550 1.405e-05 ***
 ---
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
M2 <- update(M1, . ~ . + H2S, data = dat)
```
Add .brand-blue[H2S] with the lowest p-value < `$p_\text{in}=0.10$`.
]]
.column.bg-white[.content[

```r
autoplot(M2, which=1, ncol=1)
```

![](lecture11_2020JC_files/figure-html/unnamed-chunk-5-1.png)

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Stepwise variable selection
]]
.row[.split-two[
.column.bg-brand-gray[.content[
.brand-blue[ Second pass through algorithm (step 2 - 3)]

```r
drop1(M2, test = "F")
```

```
 Single term deletions
 
 Model:
 taste ~ H2S
 Df Sum of Sq RSS AIC F value Pr(>F) 
 <none> 3286.1 144.89 
 H2S 1 4376.7 7662.9 168.29 37.293 1.374e-06 ***
 ---
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Do not drop .brand-blue[H2S] with p-value not > `$p_\text{out}=0.20$`.

]]
.column.bg-white[.content.vmiddle.center[

Nothing gets dropped here.

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Stepwise variable selection
]]
.row[.split-two[
.column.bg-brand-gray[.content[

```r
add1(M2, scope = Mf, data = dat, test = "F")
```

```
 Single term additions
 
 Model:
 taste ~ H2S
 Df Sum of Sq RSS AIC F value Pr(>F) 
 <none> 3286.1 144.89 
 Acetic 1 84.41 3201.7 146.11 0.7118 0.40625 
 Lactic 1 617.18 2669.0 140.65 6.2435 0.01885 *
 ---
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
M3 <- update(M2, . ~ . + Lactic, data = dat)
```
Add .brand-blue[Lactic] with the lowest p-value < `$p_\text{in}=0.10$`.
]]
.column[.content[

```r
autoplot(M3, which=1, ncol=1)
```

![](lecture11_2020JC_files/figure-html/unnamed-chunk-8-1.png)
]]

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Stepwise variable selection
]]
.row[.split-two[
.column.bg-brand-gray[.content[
.brand-blue[Third pass through algorithm (step 2 - 3)]

```r
drop1(M3, test = "F")
```

```
 Single term deletions
 
 Model:
 taste ~ H2S + Lactic
 Df Sum of Sq RSS AIC F value Pr(>F) 
 <none> 2669.0 140.65 
 H2S 1 1193.52 3862.5 149.74 12.0740 0.001743 **
 Lactic 1 617.18 3286.1 144.89 6.2435 0.018850 * 
 ---
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Do not drop .brand-blue[H2S] or .brand-blue[Lactic] with p-values not > `$p_\text{out}=0.20$`.
]]
.column.bg-white[.content.vmiddle.center[

Nothing gets dropped here.

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Stepwise variable selection
]]
.row[.split-two[
.column[.content[
.brand-blue[Third pass through algorithm (step 4 - 5)]

```r
add1(M3, scope = Mf, data = dat, test = "F")
```

```
 Single term additions
 
 Model:
 taste ~ H2S + Lactic
 Df Sum of Sq RSS AIC F value Pr(>F)
 <none> 2669.0 140.65 
 Acetic 1 0.55427 2668.4 142.64 0.0054 0.942
```

* There is no change in the model from steps 2 - 5.
* The "best" model as selected by the stepwise procedure
is

`$$E[{\tt taste}] = -27.59 + 3.95 \cdot {\tt H2S} + 19.89 \cdot {\tt Lactic}.$$`

* (Coincidentally) this is the same model as was selected by backward and forward variable selections.

]]
.column.bg-brand-gray[.content[

"Best" model with `$p_\text{out}=0.2$` and `$p_\text{in}=0.1$` as cutoffs.

```r
summary(M3)
```

```
  
  Call:
  lm(formula = taste ~ H2S + Lactic, data = dat)
  
  Residuals:
      Min      1Q  Median      3Q     Max 
  -17.343  -6.530  -1.164   4.844  25.618 
  
  Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
  (Intercept)  -27.592      8.982  -3.072  0.00481 **
  H2S            3.946      1.136   3.475  0.00174 **
  Lactic        19.887      7.959   2.499  0.01885 * 
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Residual standard error: 9.942 on 27 degrees of freedom
  Multiple R-squared:  0.6517,	Adjusted R-squared:  0.6259 
  F-statistic: 25.26 on 2 and 27 DF,  p-value: 6.551e-07
```
]]

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Criticism of stepwise procedures
]]
.row[.split-two[
.column[.content[

* Stepwise, forward and backward variable selection procedures will sometimes generate the same model, but this will not always be the case.

]]
.column.bg-brand-gray[.content[

* Never run automated stepwise procedures on their own!
* Wilkinson (1984, p196, SYSTAT)

"Stepwise regression is probably the most abused computerized statistical technique ever devised. If you think you need stepwise regression to solve a particular problem you have, it is almost certain you do not. Professional statisticians rarely use automated stepwise regression."

* The main issue is that stepwise procedures potentially identify models that are only locally optimal.

.bottom_abs.width100.font_small[
If you want to know more, now, google for Thayer (1990), Implementing Variable Selection Techniques in Regression.
]]]

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# More goodness of fit criteria
]]
.row[.content[

* Recall from Lecture 6 and for a linear regression model `$\mathbf{m}\in\mathcal{M}$`

`\begin{equation*}
R^2(\mathbf{m}) = \frac{S_{yy} - \text{RSS}(\mathbf{m})}{S_{yy}}
\end{equation*}`

`\begin{equation*}
R_a^2(\mathbf{m}) = 1- \frac{\text{RSS}(\mathbf{m})/(n-p_{\mathbf{m}})}{S_{yy}/(n-1)}
\end{equation*}`

* This implies the estimated best model `$\hat{\mathbf{m}} = \operatorname{argmax}_{\mathbf{m}\in\mathcal{M}} R_a^2(\mathbf{m})$` has the largest `$R_a^2(\mathbf{m})$` across all models `$\mathbf{m}$`.
* `$S_{yy}$` is independent of `$\mathbf{m}$` and `$\text{RSS}(\mathbf{m}) = \sum_{i=1}^n R^2_{i \mathbf{m}}$`.
*  Why not `$\sum_{i=1}^n |R_{i \mathbf{m}}|$` or some other loss function (metric)? Yes, it is possible.
* In general, any criterion has the structure

`\begin{equation}
\sum_{i=1}\rho(R_i(\mathbf{m})) + \lambda(p_{\mathbf{m}},n), \label{eq:inf}
\end{equation}`
where `$\rho(z)$` is a nonnegative loss function (e.g. `$z^2$` or `$|z|$`) & `$\lambda$` is a penalty term.

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Akaike information criterion
]]
.row[.split-30[
.column.bg-brand-gray[.content.vmiddle[

`$$\sum_{i=1}\rho(R_i(\mathbf{m})) + \lambda(p_{\mathbf{m}},n),$$`

where `$\rho(z)$` is a nonnegative loss function (e.g. `$z^2$`) & `$\lambda$` is a penalty term.

]]
.column[.content[
* The .brand-blue[Akaike information criterion] (AIC) is defined by

`$$\text{AIC}(\mathbf{m})=-2\,\text{loglikelihood} + 2p_{\mathbf{m}} \stackrel{\epsilon_{\mathbf{m} i} \sim NID,LS}{=} n\log\bigg(\frac{\text{RSS}(\mathbf{m})}{n}\bigg) + 2p_{\mathbf{m}}$$`
*  with LS means dropping constants. Since AIC contains -2 loglikelihood, the lower AIC, the better is the model. The "best" model using AIC is

`$$\hat{\mathbf{m}}_{\text{AIC}} = \text{argmin}_{\mathbf{m} \in \mathcal{A}}\text{AIC}(\mathbf{m}).$$`

* `$\hat{\mathbf{m}}_{\text{AIC}}$` is invariant under any strictly non-negative monotone transformation.
* To return the AIC value of a regression model in R use `extractAIC`.

```r
extractAIC(M3, k = 2) 
```

```
  [1]   3.0000 140.6475
```
* `k = 2` is the constant in `$2p_{\mathbf{m}}$` which reflects penalty severity. `$p=3$` for regression parameter but the model parameter is 4 here, including `$\hat \sigma^2$`.

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Mallow's `$C_p$`
]]
.row[.content[

*  Mallows `$C_p$` `$\approx \text{AIC} + \text{constant}$`:

`$$C_p(\mathbf{m}) = \frac{\text{RSS}(\mathbf{m})}{\hat{\sigma}^2} - (n - 2p_{\mathbf{m}}), \;\; \hat{\sigma}^2 = \underbrace{\sum_{i=1}^{n}\frac{(Y_i - \boldsymbol{x}^\top_i\hat{\boldsymbol{\beta}}_{LS})^2}{n-p}}_{\text{error var estimate for full model}}.$$`

* For a reasonable model,
   * `$\hat{\sigma}^2 \approx \sigma^2$`,
   * `$\text{RSS}(p_{\mathbf{m}}) \approx (n-p_{\mathbf{m}})\sigma^2$`.

`$\Rightarrow C_p(\mathbf{m}) \approx (n-p_{\mathbf{m}})- (n - 2p_{\mathbf{m}})=p_{\mathbf{m}}$` which implies that a poor model has `$C_p(\mathbf{m}) \gg p_{\mathbf{m}}$`.
* We search models with small `$p_{\mathbf{m}}$` and `$C_p(\mathbf{m}) \leq p_{\mathbf{m}}$`.

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# US crime rate data
]]
.row[.split-two[
.column[.content[

* This example concerns famous crime data from 47 states of the USA in 1960. 
* Our aim is to model the crime rate as a function of up to 15 potential
explanatory variables.
* We will use stepwise variable selection based on AIC to choose a model.
* Data source:
Ehrlich, I. (1973) 'Participation in illegitimate activities: a theoretical and empirical 
investigation' *Journal of Political Economy*, **81**, 521-565.

]]
.column.bg-brand-gray[.content[

Variable | Description
--- | --- 
`M`  |  percentage of males aged 14-24 
`So` | indicator variable for a southern state 
`Ed` | mean years of schooling 
`Po1`| police expenditure in 1960 
`Po2`| police expenditure in 1959  
`LF` | labour force participation rate 
`M.F`| number of males per 1000 females 
`Pop`| state population
`NW` | number of nonwhites per 1000 people  
`U1` | unemployment rate of urban males 14-24 
`U2` | unemployment rate of urban males 35-39  
`GDP`| gross domestic product per head 
`Ineq`| income inequality 
`Prob` | probability of imprisonment 
`Time` | average time served in state prisons 
`Crime` | rate of crimes in a particular category per head of population

]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Crime data
]]
.row[.content[

```r
dat <- read.table(file = "data/uscrime.txt", 
 header = TRUE, row.names = NULL)
skimr::skim(dat)
```

```
 Skim summary statistics
 n obs: 47 
 n variables: 16 
 
 -- Variable type:integer ---------------------------------------------------------------------------------------------------
 variable missing n mean sd p0 p25 p50 p75 p100 hist
 Crime 0 47 905.09 386.76 342 658.5 831 1057.5 1993 <U+2586><U+2587><U+2587><U+2583><U+2583><U+2581><U+2581><U+2581>
 Ed 0 47 105.64 11.19 87 97.5 108 114.5 122 <U+2587><U+2581><U+2582><U+2585><U+2583><U+2587><U+2583><U+2586>
 GDP 0 47 525.38 96.49 288 459.5 537 591.5 689 <U+2582><U+2581><U+2586><U+2586><U+2586><U+2587><U+2587><U+2583>
 Ineq 0 47 194 39.9 126 165.5 176 227.5 276 <U+2582><U+2585><U+2587><U+2583><U+2582><U+2583><U+2583><U+2582>
 LF 0 47 561.19 40.41 480 530.5 560 593 641 <U+2581><U+2582><U+2587><U+2583><U+2585><U+2585><U+2581><U+2583>
 M 0 47 138.57 12.57 119 130 136 146 177 <U+2587><U+2587><U+2587><U+2587><U+2583><U+2582><U+2581><U+2581>
 M.F 0 47 983.02 29.47 934 964.5 977 992 1071 <U+2582><U+2586><U+2587><U+2583><U+2582><U+2581><U+2582><U+2581>
 NW 0 47 101.13 102.83 2 24 76 132.5 423 <U+2587><U+2585><U+2581><U+2582><U+2581><U+2581><U+2581><U+2581>
 Po1 0 47 85 29.72 45 62.5 78 104.5 166 <U+2586><U+2587><U+2585><U+2582><U+2583><U+2582><U+2581><U+2581>
 Po2 0 47 80.23 27.96 41 58.5 73 97 157 <U+2586><U+2587><U+2586><U+2585><U+2582><U+2582><U+2581><U+2581>
 Pop 0 47 36.62 38.07 3 10 25 41.5 168 <U+2587><U+2585><U+2582><U+2581><U+2581><U+2581><U+2581><U+2581>
 So 0 47 0.34 0.48 0 0 0 1 1 <U+2587><U+2581><U+2581><U+2581><U+2581><U+2581><U+2581><U+2585>
 U1 0 47 95.47 18.03 70 80.5 92 104 142 <U+2587><U+2586><U+2586><U+2586><U+2583><U+2582><U+2581><U+2582>
 U2 0 47 33.98 8.45 20 27.5 34 38.5 58 <U+2583><U+2587><U+2585><U+2587><U+2585><U+2581><U+2581><U+2581>
 
 -- Variable type:numeric ---------------------------------------------------------------------------------------------------
 variable missing complete n mean sd p0 p25 p50 p75 p100
 Prob 0 47 47 0.047 0.023 0.0069 0.033 0.042 0.054 0.12
 Time 0 47 47 26.6 7.09 12.2 21.6 25.8 30.45 44 
 hist
 <U+2582><U+2585><U+2587><U+2583><U+2582><U+2582><U+2581><U+2581>
 <U+2582><U+2585><U+2586><U+2587><U+2587><U+2582><U+2583><U+2582>
```
]

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Crime data
]]
.row[.content[
.scroll-box-20[

![](lecture11_2020JC_files/figure-html/unnamed-chunk-14-1.png)

]

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Crime data
]]
.row[.content[
.scroll-box-20[

```r
M0 <- lm(Crime ~ 1, data = dat) # null model
M1 <- lm(Crime ~ ., data = dat) # full model
summary(M1)
```

```
  
  Call:
  lm(formula = Crime ~ ., data = dat)
  
  Residuals:
      Min      1Q  Median      3Q     Max 
  -395.74  -98.09   -6.69  112.99  512.67 
  
  Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
  (Intercept) -5984.2876  1628.3184  -3.675 0.000893 ***
  M               8.7830     4.1714   2.106 0.043443 *  
  So             -3.8035   148.7551  -0.026 0.979765    
  Ed             18.8324     6.2088   3.033 0.004861 ** 
  Po1            19.2804    10.6110   1.817 0.078892 .  
  Po2           -10.9422    11.7478  -0.931 0.358830    
  LF             -0.6638     1.4697  -0.452 0.654654    
  M.F             1.7407     2.0354   0.855 0.398995    
  Pop            -0.7330     1.2896  -0.568 0.573845    
  NW              0.4204     0.6481   0.649 0.521279    
  U1             -5.8271     4.2103  -1.384 0.176238    
  U2             16.7800     8.2336   2.038 0.050161 .  
  GDP             0.9617     1.0367   0.928 0.360754    
  Ineq            7.0672     2.2717   3.111 0.003983 ** 
  Prob        -4855.2658  2272.3746  -2.137 0.040627 *  
  Time           -3.4790     7.1653  -0.486 0.630708    
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Residual standard error: 209.1 on 31 degrees of freedom
  Multiple R-squared:  0.8031,	Adjusted R-squared:  0.7078 
  F-statistic: 8.429 on 15 and 31 DF,  p-value: 3.539e-07
```
]]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Crime data: Forward using AIC
]]
.row[.split-two[
.column[.content[
.scroll-box-20[

```r
M1.forw <- step(M0, scope = list(lower = M0, 
 upper = M1),direction = "forward", k = 2)
```

```
 Start: AIC=561.02
 Crime ~ 1
 
 Df Sum of Sq RSS AIC
 + Po1 1 3253302 3627626 532.94
 + Po2 1 3058626 3822302 535.39
 + GDP 1 1340152 5540775 552.84
 + Prob 1 1257075 5623853 553.54
 + Pop 1 783660 6097267 557.34
 + Ed 1 717146 6163781 557.85
 + M.F 1 314867 6566061 560.82
 <none> 6880928 561.02
 + LF 1 245446 6635482 561.32
 + Ineq 1 220530 6660397 561.49
 + U2 1 216354 6664573 561.52
 + Time 1 154545 6726383 561.96
 + So 1 56527 6824400 562.64
 + M 1 55084 6825844 562.65
 + U1 1 17533 6863395 562.90
 + NW 1 7312 6873615 562.97
 
 Step: AIC=532.94
 Crime ~ Po1
 
 Df Sum of Sq RSS AIC
 + Ineq 1 739819 2887807 524.22
 + M 1 616741 3010885 526.18
 + M.F 1 250522 3377104 531.57
 + NW 1 232434 3395192 531.82
 + So 1 219098 3408528 532.01
 + GDP 1 180872 3446754 532.53
 <none> 3627626 532.94
 + Po2 1 146167 3481459 533.00
 + Prob 1 92278 3535348 533.72
 + LF 1 77479 3550147 533.92
 + Time 1 43185 3584441 534.37
 + U2 1 17848 3609778 534.70
 + Pop 1 5666 3621959 534.86
 + U1 1 2878 3624748 534.90
 + Ed 1 767 3626859 534.93
 
 Step: AIC=524.22
 Crime ~ Po1 + Ineq
 
 Df Sum of Sq RSS AIC
 + Ed 1 587050 2300757 515.53
 + M.F 1 454545 2433262 518.17
 + Prob 1 280690 2607117 521.41
 + LF 1 260571 2627236 521.77
 + GDP 1 213937 2673871 522.60
 + M 1 181236 2706571 523.17
 + Pop 1 130377 2757430 524.04
 <none> 2887807 524.22
 + NW 1 36439 2851369 525.62
 + So 1 33738 2854069 525.66
 + Po2 1 30673 2857134 525.71
 + U1 1 2309 2885498 526.18
 + Time 1 497 2887310 526.21
 + U2 1 253 2887554 526.21
 
 Step: AIC=515.53
 Crime ~ Po1 + Ineq + Ed
 
 Df Sum of Sq RSS AIC
 + M 1 239405 2061353 512.37
 + Prob 1 234981 2065776 512.47
 + M.F 1 117026 2183731 515.08
 <none> 2300757 515.53
 + GDP 1 79540 2221218 515.88
 + U2 1 62112 2238646 516.25
 + Time 1 61770 2238987 516.26
 + Po2 1 42584 2258174 516.66
 + Pop 1 39319 2261438 516.72
 + U1 1 7365 2293392 517.38
 + LF 1 7254 2293503 517.39
 + NW 1 4210 2296547 517.45
 + So 1 4135 2296622 517.45
 
 Step: AIC=512.37
 Crime ~ Po1 + Ineq + Ed + M
 
 Df Sum of Sq RSS AIC
 + Prob 1 258063 1803290 508.08
 + U2 1 200988 1860365 509.55
 + GDP 1 163378 1897975 510.49
 <none> 2061353 512.37
 + M.F 1 74398 1986955 512.64
 + U1 1 50835 2010518 513.20
 + Po2 1 45392 2015961 513.32
 + Time 1 42746 2018607 513.39
 + NW 1 16488 2044865 513.99
 + Pop 1 8101 2053251 514.19
 + So 1 3189 2058164 514.30
 + LF 1 2988 2058365 514.30
 
 Step: AIC=508.08
 Crime ~ Po1 + Ineq + Ed + M + Prob
 
 Df Sum of Sq RSS AIC
 + U2 1 192233 1611057 504.79
 + GDP 1 86490 1716801 507.77
 + M.F 1 84509 1718781 507.83
 <none> 1803290 508.08
 + U1 1 52313 1750977 508.70
 + Pop 1 47719 1755571 508.82
 + Po2 1 37967 1765323 509.08
 + So 1 21971 1781320 509.51
 + Time 1 10194 1793096 509.82
 + LF 1 990 1802301 510.06
 + NW 1 797 1802493 510.06
 
 Step: AIC=504.79
 Crime ~ Po1 + Ineq + Ed + M + Prob + U2
 
 Df Sum of Sq RSS AIC
 <none> 1611057 504.79
 + GDP 1 59910 1551147 505.00
 + U1 1 54830 1556227 505.16
 + Pop 1 51320 1559737 505.26
 + M.F 1 30945 1580112 505.87
 + Po2 1 25017 1586040 506.05
 + So 1 17958 1593098 506.26
 + LF 1 13179 1597878 506.40
 + Time 1 7159 1603898 506.58
 + NW 1 359 1610698 506.78
```
]]]
.column.bg-brand-gray[.content[

```r
summary(M1.forw)
```

```
  
  Call:
  lm(formula = Crime ~ Po1 + Ineq + Ed + M + Prob + U2, data = dat)
  
  Residuals:
      Min      1Q  Median      3Q     Max 
  -470.68  -78.41  -19.68  133.12  556.23 
  
  Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
  (Intercept) -5040.505    899.843  -5.602 1.72e-06 ***
  Po1            11.502      1.375   8.363 2.56e-10 ***
  Ineq            6.765      1.394   4.855 1.88e-05 ***
  Ed             19.647      4.475   4.390 8.07e-05 ***
  M              10.502      3.330   3.154  0.00305 ** 
  Prob        -3801.836   1528.097  -2.488  0.01711 *  
  U2              8.937      4.091   2.185  0.03483 *  
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Residual standard error: 200.7 on 40 degrees of freedom
  Multiple R-squared:  0.7659,	Adjusted R-squared:  0.7307 
  F-statistic: 21.81 on 6 and 40 DF,  p-value: 3.418e-11
```

```r
extractAIC(M1.forw)
```

```
  [1]   7.0000 504.7859
```

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
## Crime data: Stepwise using AIC starting with full model
]]
.row[.split-two[
.column[.content[
.scroll-box-20[

```r
M1 <- lm(Crime ~ ., data = dat) # full model
M1.step <- step(M1, scope = list(lower = Crime ~ 1, 
 upper = Crime ~ .),direction = "both", k = 2)
```

```
 Start: AIC=514.65
 Crime ~ M + So + Ed + Po1 + Po2 + LF + M.F + Pop + NW + U1 + 
 U2 + GDP + Ineq + Prob + Time
 
 Df Sum of Sq RSS AIC
 - So 1 29 1354974 512.65
 - LF 1 8917 1363862 512.96
 - Time 1 10304 1365250 513.00
 - Pop 1 14122 1369068 513.14
 - NW 1 18395 1373341 513.28
 - M.F 1 31967 1386913 513.74
 - GDP 1 37613 1392558 513.94
 - Po2 1 37919 1392865 513.95
 <none> 1354946 514.65
 - U1 1 83722 1438668 515.47
 - Po1 1 144306 1499252 517.41
 - U2 1 181536 1536482 518.56
 - M 1 193770 1548716 518.93
 - Prob 1 199538 1554484 519.11
 - Ed 1 402117 1757063 524.86
 - Ineq 1 423031 1777977 525.42
 
 Step: AIC=512.65
 Crime ~ M + Ed + Po1 + Po2 + LF + M.F + Pop + NW + U1 + U2 + 
 GDP + Ineq + Prob + Time
 
 Df Sum of Sq RSS AIC
 - Time 1 10341 1365315 511.01
 - LF 1 10878 1365852 511.03
 - Pop 1 14127 1369101 511.14
 - NW 1 21626 1376600 511.39
 - M.F 1 32449 1387423 511.76
 - Po2 1 37954 1392929 511.95
 - GDP 1 39223 1394197 511.99
 <none> 1354974 512.65
 - U1 1 96420 1451395 513.88
 + So 1 29 1354946 514.65
 - Po1 1 144302 1499277 515.41
 - U2 1 189859 1544834 516.81
 - M 1 195084 1550059 516.97
 - Prob 1 204463 1559437 517.26
 - Ed 1 403140 1758114 522.89
 - Ineq 1 488834 1843808 525.13
 
 Step: AIC=511.01
 Crime ~ M + Ed + Po1 + Po2 + LF + M.F + Pop + NW + U1 + U2 + 
 GDP + Ineq + Prob
 
 Df Sum of Sq RSS AIC
 - LF 1 10533 1375848 509.37
 - NW 1 15482 1380797 509.54
 - Pop 1 21846 1387161 509.75
 - Po2 1 28932 1394247 509.99
 - GDP 1 36070 1401385 510.23
 - M.F 1 41784 1407099 510.42
 <none> 1365315 511.01
 - U1 1 91420 1456735 512.05
 + Time 1 10341 1354974 512.65
 + So 1 65 1365250 513.00
 - Po1 1 134137 1499452 513.41
 - U2 1 184143 1549458 514.95
 - M 1 186110 1551425 515.01
 - Prob 1 237493 1602808 516.54
 - Ed 1 409448 1774763 521.33
 - Ineq 1 502909 1868224 523.75
 
 Step: AIC=509.37
 Crime ~ M + Ed + Po1 + Po2 + M.F + Pop + NW + U1 + U2 + GDP + 
 Ineq + Prob
 
 Df Sum of Sq RSS AIC
 - NW 1 11675 1387523 507.77
 - Po2 1 21418 1397266 508.09
 - Pop 1 27803 1403651 508.31
 - M.F 1 31252 1407100 508.42
 - GDP 1 35035 1410883 508.55
 <none> 1375848 509.37
 - U1 1 80954 1456802 510.06
 + LF 1 10533 1365315 511.01
 + Time 1 9996 1365852 511.03
 + So 1 3046 1372802 511.26
 - Po1 1 123896 1499744 511.42
 - U2 1 190746 1566594 513.47
 - M 1 217716 1593564 514.27
 - Prob 1 226971 1602819 514.54
 - Ed 1 413254 1789103 519.71
 - Ineq 1 500944 1876792 521.96
 
 Step: AIC=507.77
 Crime ~ M + Ed + Po1 + Po2 + M.F + Pop + U1 + U2 + GDP + Ineq + 
 Prob
 
 Df Sum of Sq RSS AIC
 - Po2 1 16706 1404229 506.33
 - Pop 1 25793 1413315 506.63
 - M.F 1 26785 1414308 506.66
 - GDP 1 31551 1419073 506.82
 <none> 1387523 507.77
 - U1 1 83881 1471404 508.52
 + NW 1 11675 1375848 509.37
 + So 1 7207 1380316 509.52
 + LF 1 6726 1380797 509.54
 + Time 1 4534 1382989 509.61
 - Po1 1 118348 1505871 509.61
 - U2 1 201453 1588976 512.14
 - Prob 1 216760 1604282 512.59
 - M 1 309214 1696737 515.22
 - Ed 1 402754 1790276 517.74
 - Ineq 1 589736 1977259 522.41
 
 Step: AIC=506.33
 Crime ~ M + Ed + Po1 + M.F + Pop + U1 + U2 + GDP + Ineq + Prob
 
 Df Sum of Sq RSS AIC
 - Pop 1 22345 1426575 505.07
 - GDP 1 32142 1436371 505.39
 - M.F 1 36808 1441037 505.54
 <none> 1404229 506.33
 - U1 1 86373 1490602 507.13
 + Po2 1 16706 1387523 507.77
 + NW 1 6963 1397266 508.09
 + So 1 3807 1400422 508.20
 + LF 1 1986 1402243 508.26
 + Time 1 575 1403654 508.31
 - U2 1 205814 1610043 510.76
 - Prob 1 218607 1622836 511.13
 - M 1 307001 1711230 513.62
 - Ed 1 389502 1793731 515.83
 - Ineq 1 608627 2012856 521.25
 - Po1 1 1050202 2454432 530.57
 
 Step: AIC=505.07
 Crime ~ M + Ed + Po1 + M.F + U1 + U2 + GDP + Ineq + Prob
 
 Df Sum of Sq RSS AIC
 - GDP 1 26493 1453068 503.93
 <none> 1426575 505.07
 - M.F 1 84491 1511065 505.77
 - U1 1 99463 1526037 506.24
 + Pop 1 22345 1404229 506.33
 + Po2 1 13259 1413315 506.63
 + NW 1 5927 1420648 506.87
 + So 1 5724 1420851 506.88
 + LF 1 5176 1421398 506.90
 + Time 1 3913 1422661 506.94
 - Prob 1 198571 1625145 509.20
 - U2 1 208880 1635455 509.49
 - M 1 320926 1747501 512.61
 - Ed 1 386773 1813348 514.35
 - Ineq 1 594779 2021354 519.45
 - Po1 1 1127277 2553852 530.44
 
 Step: AIC=503.93
 Crime ~ M + Ed + Po1 + M.F + U1 + U2 + Ineq + Prob
 
 Df Sum of Sq RSS AIC
 <none> 1453068 503.93
 + GDP 1 26493 1426575 505.07
 - M.F 1 103159 1556227 505.16
 + Pop 1 16697 1436371 505.39
 + Po2 1 14148 1438919 505.47
 + So 1 9329 1443739 505.63
 + LF 1 4374 1448694 505.79
 + NW 1 3799 1449269 505.81
 + Time 1 2293 1450775 505.86
 - U1 1 127044 1580112 505.87
 - Prob 1 247978 1701046 509.34
 - U2 1 255443 1708511 509.55
 - M 1 296790 1749858 510.67
 - Ed 1 445788 1898855 514.51
 - Ineq 1 738244 2191312 521.24
 - Po1 1 1672038 3125105 537.93
```

]]]
.column.bg-brand-gray[.content[

```r
summary(M1.step)
```

```
  
  Call:
  lm(formula = Crime ~ M + Ed + Po1 + M.F + U1 + U2 + Ineq + Prob, 
      data = dat)
  
  Residuals:
      Min      1Q  Median      3Q     Max 
  -444.70 -111.07    3.03  122.15  483.30 
  
  Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
  (Intercept) -6426.101   1194.611  -5.379 4.04e-06 ***
  M               9.332      3.350   2.786  0.00828 ** 
  Ed             18.012      5.275   3.414  0.00153 ** 
  Po1            10.265      1.552   6.613 8.26e-08 ***
  M.F             2.234      1.360   1.642  0.10874    
  U1             -6.087      3.339  -1.823  0.07622 .  
  U2             18.735      7.248   2.585  0.01371 *  
  Ineq            6.133      1.396   4.394 8.63e-05 ***
  Prob        -3796.032   1490.646  -2.547  0.01505 *  
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Residual standard error: 195.5 on 38 degrees of freedom
  Multiple R-squared:  0.7888,	Adjusted R-squared:  0.7444 
  F-statistic: 17.74 on 8 and 38 DF,  p-value: 1.159e-10
```

```r
extractAIC(M1.step)
```

```
  [1]   9.0000 503.9349
```
]]

]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# Remarks
]]
.row[.split-50[
.column[.content[
## Forward selection
* Usually start from the null model.
* Every variable outside the current model `<none>` can be added (+) one at time at each step until AIC is no better (`<none>` has the lowest AIC in the top of the list). 
* When AIC is used, no `$p_\text{out},p_\text{in}$` are needed. This is the same for forward, backforward and stepwise selections.
]]
.column.bg-brand-gray[.content[
## Stepwise selection
* Start from any model such as the full model. 
* Every variable outside the current model `<none>` can be added (+) and those inside can be dropped (-) one at time at each step until AIC is no better.
* This final model is bigger and contains all variables of forward selection and .brand-blue[M.F, U1] with p-values 0.108 and 0.076.
]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
# BIC has tougher penalties
]]
.row[.split-two[
.column[.content[

* AIC and `$C_p$` have a tendency to include too many variables. 
* Solution: Penalize more!
* Bayes information criterion (BIC):

`\begin{eqnarray*}
BIC(\mathbf{m}) & =& -2\,\text{loglikelihood} + \log(n)\cdot p_{\mathbf{m}} \\
& =& n \log \frac{\text{RSS}(\mathbf{m})}{n} + \log(n)\cdot p_{\mathbf{m}}
\end{eqnarray*}`

* BIC in R with additional option `k=log(n)` in function `step()`.
* `$n=47$` in the crime data and now it uses log(47)=3.85 instead of 2 in the penalty.
* Now the best model using stepwise with BIC is the same as using forward with AIC.

]]
.column.bg-brand-gray[.content[

```r
M1 <- lm(Crime ~ ., data = dat) 
M1.step <- step(M1, 
 scope = list(lower = Crime ~ 1, 
 upper = Crime ~ .),
 direction = "both", k = log(47))
```

```
 Start: AIC=544.25
 Crime ~ M + So + Ed + Po1 + Po2 + LF + M.F + Pop + NW + U1 + 
 U2 + GDP + Ineq + Prob + Time
 
 Df Sum of Sq RSS AIC
 - So 1 29 1354974 540.40
 - LF 1 8917 1363862 540.71
 - Time 1 10304 1365250 540.76
 - Pop 1 14122 1369068 540.89
 - NW 1 18395 1373341 541.03
 - M.F 1 31967 1386913 541.50
 - GDP 1 37613 1392558 541.69
 - Po2 1 37919 1392865 541.70
 - U1 1 83722 1438668 543.22
 <none> 1354946 544.25
 - Po1 1 144306 1499252 545.16
 - U2 1 181536 1536482 546.31
 - M 1 193770 1548716 546.68
 - Prob 1 199538 1554484 546.86
 - Ed 1 402117 1757063 552.62
 - Ineq 1 423031 1777977 553.17
 
 Step: AIC=540.4
 Crime ~ M + Ed + Po1 + Po2 + LF + M.F + Pop + NW + U1 + U2 + 
 GDP + Ineq + Prob + Time
 
 Df Sum of Sq RSS AIC
 - Time 1 10341 1365315 536.91
 - LF 1 10878 1365852 536.93
 - Pop 1 14127 1369101 537.04
 - NW 1 21626 1376600 537.30
 - M.F 1 32449 1387423 537.66
 - Po2 1 37954 1392929 537.85
 - GDP 1 39223 1394197 537.89
 - U1 1 96420 1451395 539.78
 <none> 1354974 540.40
 - Po1 1 144302 1499277 541.31
 - U2 1 189859 1544834 542.72
 - M 1 195084 1550059 542.87
 - Prob 1 204463 1559437 543.16
 + So 1 29 1354946 544.25
 - Ed 1 403140 1758114 548.79
 - Ineq 1 488834 1843808 551.03
 
 Step: AIC=536.91
 Crime ~ M + Ed + Po1 + Po2 + LF + M.F + Pop + NW + U1 + U2 + 
 GDP + Ineq + Prob
 
 Df Sum of Sq RSS AIC
 - LF 1 10533 1375848 533.42
 - NW 1 15482 1380797 533.59
 - Pop 1 21846 1387161 533.81
 - Po2 1 28932 1394247 534.04
 - GDP 1 36070 1401385 534.28
 - M.F 1 41784 1407099 534.48
 - U1 1 91420 1456735 536.11
 <none> 1365315 536.91
 - Po1 1 134137 1499452 537.46
 - U2 1 184143 1549458 539.01
 - M 1 186110 1551425 539.07
 + Time 1 10341 1354974 540.40
 - Prob 1 237493 1602808 540.60
 + So 1 65 1365250 540.76
 - Ed 1 409448 1774763 545.39
 - Ineq 1 502909 1868224 547.80
 
 Step: AIC=533.42
 Crime ~ M + Ed + Po1 + Po2 + M.F + Pop + NW + U1 + U2 + GDP + 
 Ineq + Prob
 
 Df Sum of Sq RSS AIC
 - NW 1 11675 1387523 529.97
 - Po2 1 21418 1397266 530.30
 - Pop 1 27803 1403651 530.51
 - M.F 1 31252 1407100 530.63
 - GDP 1 35035 1410883 530.75
 - U1 1 80954 1456802 532.26
 <none> 1375848 533.42
 - Po1 1 123896 1499744 533.62
 - U2 1 190746 1566594 535.67
 - M 1 217716 1593564 536.47
 - Prob 1 226971 1602819 536.75
 + LF 1 10533 1365315 536.91
 + Time 1 9996 1365852 536.93
 + So 1 3046 1372802 537.17
 - Ed 1 413254 1789103 541.91
 - Ineq 1 500944 1876792 544.16
 
 Step: AIC=529.97
 Crime ~ M + Ed + Po1 + Po2 + M.F + Pop + U1 + U2 + GDP + Ineq + 
 Prob
 
 Df Sum of Sq RSS AIC
 - Po2 1 16706 1404229 526.68
 - Pop 1 25793 1413315 526.98
 - M.F 1 26785 1414308 527.02
 - GDP 1 31551 1419073 527.17
 - U1 1 83881 1471404 528.88
 - Po1 1 118348 1505871 529.96
 <none> 1387523 529.97
 - U2 1 201453 1588976 532.49
 - Prob 1 216760 1604282 532.94
 + NW 1 11675 1375848 533.42
 + So 1 7207 1380316 533.57
 + LF 1 6726 1380797 533.59
 + Time 1 4534 1382989 533.66
 - M 1 309214 1696737 535.57
 - Ed 1 402754 1790276 538.10
 - Ineq 1 589736 1977259 542.76
 
 Step: AIC=526.68
 Crime ~ M + Ed + Po1 + M.F + Pop + U1 + U2 + GDP + Ineq + Prob
 
 Df Sum of Sq RSS AIC
 - Pop 1 22345 1426575 523.57
 - GDP 1 32142 1436371 523.89
 - M.F 1 36808 1441037 524.05
 - U1 1 86373 1490602 525.64
 <none> 1404229 526.68
 - U2 1 205814 1610043 529.26
 - Prob 1 218607 1622836 529.63
 + Po2 1 16706 1387523 529.97
 + NW 1 6963 1397266 530.30
 + So 1 3807 1400422 530.40
 + LF 1 1986 1402243 530.46
 + Time 1 575 1403654 530.51
 - M 1 307001 1711230 532.12
 - Ed 1 389502 1793731 534.34
 - Ineq 1 608627 2012856 539.75
 - Po1 1 1050202 2454432 549.07
 
 Step: AIC=523.57
 Crime ~ M + Ed + Po1 + M.F + U1 + U2 + GDP + Ineq + Prob
 
 Df Sum of Sq RSS AIC
 - GDP 1 26493 1453068 520.59
 - M.F 1 84491 1511065 522.43
 - U1 1 99463 1526037 522.89
 <none> 1426575 523.57
 - Prob 1 198571 1625145 525.85
 - U2 1 208880 1635455 526.14
 + Pop 1 22345 1404229 526.68
 + Po2 1 13259 1413315 526.98
 + NW 1 5927 1420648 527.23
 + So 1 5724 1420851 527.23
 + LF 1 5176 1421398 527.25
 + Time 1 3913 1422661 527.29
 - M 1 320926 1747501 529.26
 - Ed 1 386773 1813348 531.00
 - Ineq 1 594779 2021354 536.10
 - Po1 1 1127277 2553852 547.09
 
 Step: AIC=520.59
 Crime ~ M + Ed + Po1 + M.F + U1 + U2 + Ineq + Prob
 
 Df Sum of Sq RSS AIC
 - M.F 1 103159 1556227 519.96
 <none> 1453068 520.59
 - U1 1 127044 1580112 520.68
 + GDP 1 26493 1426575 523.57
 + Pop 1 16697 1436371 523.89
 + Po2 1 14148 1438919 523.98
 + So 1 9329 1443739 524.13
 - Prob 1 247978 1701046 524.14
 + LF 1 4374 1448694 524.29
 + NW 1 3799 1449269 524.31
 - U2 1 255443 1708511 524.35
 + Time 1 2293 1450775 524.36
 - M 1 296790 1749858 525.47
 - Ed 1 445788 1898855 529.31
 - Ineq 1 738244 2191312 536.04
 - Po1 1 1672038 3125105 552.73
 
 Step: AIC=519.96
 Crime ~ M + Ed + Po1 + U1 + U2 + Ineq + Prob
 
 Df Sum of Sq RSS AIC
 - U1 1 54830 1611057 517.74
 <none> 1556227 519.96
 + M.F 1 103159 1453068 520.59
 - U2 1 194750 1750977 521.65
 + Pop 1 66223 1490004 521.77
 + GDP 1 45162 1511065 522.43
 - Prob 1 239705 1795931 522.84
 + Po2 1 29979 1526248 522.90
 + Time 1 22501 1533726 523.13
 + LF 1 10865 1545361 523.48
 + So 1 3867 1552360 523.69
 + NW 1 147 1556080 523.81
 - M 1 413318 1969545 527.18
 - Ed 1 815182 2371408 535.91
 - Ineq 1 906629 2462856 537.69
 - Po1 1 1811722 3367949 552.40
 
 Step: AIC=517.74
 Crime ~ M + Ed + Po1 + U2 + Ineq + Prob
 
 Df Sum of Sq RSS AIC
 <none> 1611057 517.74
 - U2 1 192233 1803290 519.18
 + GDP 1 59910 1551147 519.81
 + U1 1 54830 1556227 519.96
 + Pop 1 51320 1559737 520.07
 - Prob 1 249308 1860365 520.65
 + M.F 1 30945 1580112 520.68
 + Po2 1 25017 1586040 520.85
 + So 1 17958 1593098 521.06
 + LF 1 13179 1597878 521.20
 + Time 1 7159 1603898 521.38
 + NW 1 359 1610698 521.58
 - M 1 400611 2011667 524.32
 - Ed 1 776207 2387264 532.37
 - Ineq 1 949221 2560278 535.66
 - Po1 1 2817067 4428124 561.41
```
]
]]
]]

---

.row.bg-brand-blue.white[.content.vmiddle[
## Crime data: Stepwise using BIC starting with full model
]]
.row[.split-two[
.column.bg-brand-gray[.content[

```r
summary(M1.step)
```

```
  
  Call:
  lm(formula = Crime ~ M + Ed + Po1 + U2 + Ineq + Prob, data = dat)
  
  Residuals:
      Min      1Q  Median      3Q     Max 
  -470.68  -78.41  -19.68  133.12  556.23 
  
  Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
  (Intercept) -5040.505    899.843  -5.602 1.72e-06 ***
  M              10.502      3.330   3.154  0.00305 ** 
  Ed             19.647      4.475   4.390 8.07e-05 ***
  Po1            11.502      1.375   8.363 2.56e-10 ***
  U2              8.937      4.091   2.185  0.03483 *  
  Ineq            6.765      1.394   4.855 1.88e-05 ***
  Prob        -3801.836   1528.097  -2.488  0.01711 *  
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Residual standard error: 200.7 on 40 degrees of freedom
  Multiple R-squared:  0.7659,	Adjusted R-squared:  0.7307 
  F-statistic: 21.81 on 6 and 40 DF,  p-value: 3.418e-11
```
]]
.column.bg-white[.content[

```r
autoplot(M1.step)
```

![](lecture11_2020JC_files/figure-html/unnamed-chunk-22-1.png)

]]
]]