Example 1: Eight Schools

We now continue the “Eight Schools” example introduced in the “StanWorkshopBackground.pdf” document.

This is the example in Section 5.5 of Gelman et al. (2003), which studies coaching effects in eight schools. For simplicity, we call this example “eight schools.”

We enter the data as a list into R using:

schools_dat <- list(
  J = 8, 
  y = c(28,  8, -3,  7, -1,  1, 18, 12),
  sigma = c(15, 10, 16, 11,  9, 11, 10, 18)
)

Loading the package into R

The package name is rstan (all lowercase), so we can use library("rstan") to load the package.

library("rstan") # observe startup messages
## Warning: package 'rstan' was built under R version 3.4.3
## Loading required package: ggplot2
## Loading required package: StanHeaders
## Warning: package 'StanHeaders' was built under R version 3.4.3
## rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## rstan_options(auto_write = TRUE)
## options(mc.cores = parallel::detectCores())

As the startup message says, if you are using rstan locally on a multicore machine and have plenty of RAM to estimate your model in parallel, at this point execute

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

These options, respectively, save a bare version of a compiled Stan program to the hard disk so that it does not need to be recompiled, and execute multiple Markov chains in parallel.

Model 1 – No pooling

Let us try the first model where \[ y_j \sim N(\theta_j,\sigma_j^2) \]

where each school has its own mean \(\theta_j\) and the variances \(\sigma_j^2\) are known.

school_model1 <- "
data {
    int<lower=0> J;         // # of schools
    real y[J];              // estimated treatment
    real<lower=0> sigma[J]; // std err of effect
}
parameters {
    real theta[J]; // school effect
}
model {
    y ~ normal(theta, sigma);
}
"
cat(school_model1,file="school_model_1.rstan")

We can also pass the Stan model as a character string via the model_code argument of the stan function instead. However, this is not recommended: using a file lets you use print statements inside the Stan program and makes the line numbers in Stan's error messages meaningful.
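
For completeness, a call using model_code would look like the following sketch (the object name fit1_alt is just illustrative); we use the file-based approach throughout:

fit1_alt <- stan(model_code = school_model1, data = schools_dat, iter = 1000, chains = 4)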

Fit the model:

fit1 <- stan(file="school_model_1.rstan", data = schools_dat, iter = 1000, chains = 4)

The object fit1, returned from the function stan, is an S4 object of class “stanfit”. Methods such as print, plot, and pairs are associated with the fitted result, so we can use the following code to inspect the results in fit1. print provides a summary for the parameters of the model as well as the log-posterior, named lp__ (see the following example output). For more methods and details, see the help page for the stanfit class.

In particular, we can use the extract function on stanfit objects to obtain the samples. extract returns the samples either as a list of arrays, one per parameter of interest, or as a single array. In addition, the S3 functions as.array and as.matrix are defined for stanfit objects (use help("as.array.stanfit") to see the help document in R).
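
For example (a quick sketch using the fit1 object created above), extract and as.matrix give the draws in different layouts:

theta_draws <- extract(fit1)$theta  # array of posterior draws of theta, iterations x 8
dim(theta_draws)
draws_mat <- as.matrix(fit1)        # one column per parameter, including lp__
colnames(draws_mat)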

Print the summary statistics of the parameter posterior distributions and plot the fitted distributions.

print(fit1)
## Inference for Stan model: school_model_1.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##           mean se_mean    sd   2.5%    25%   50%   75% 97.5% n_eff Rhat
## theta[1] 27.48    0.34 15.20  -3.31  17.50 27.44 37.65 58.03  2000    1
## theta[2]  7.90    0.23 10.11 -11.69   0.71  7.97 14.78 27.39  2000    1
## theta[3] -3.28    0.36 16.07 -34.86 -13.89 -3.03  7.71 27.86  2000    1
## theta[4]  7.30    0.25 11.00 -14.41  -0.26  7.20 14.86 29.01  2000    1
## theta[5] -1.10    0.19  8.67 -18.20  -6.91 -0.94  4.81 15.49  2000    1
## theta[6]  0.92    0.26 11.45 -21.55  -6.54  0.89  8.70 23.20  2000    1
## theta[7] 17.80    0.22  9.91  -1.30  11.13 17.72 24.44 37.62  2000    1
## theta[8] 12.38    0.41 18.32 -23.28  -0.13 12.16 25.15 49.00  2000    1
## lp__     -4.04    0.07  2.00  -8.82  -5.19 -3.70 -2.56 -1.15   932    1
## 
## Samples were drawn using NUTS(diag_e) at Thu Dec 07 07:05:07 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
plot(fit1)
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)

pairs(fit1,pars=c("theta"))

Since each \(\theta_j\) is essentially estimated from a single observation, each posterior mean is centered close to the corresponding observed \(y_j\) value.
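
We can check this directly; a quick sketch comparing the posterior means with the data:

round(colMeans(extract(fit1)$theta), 1)  # posterior means of theta[1..8]
schools_dat$y                            # observed effects for comparison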

Model 2 – Complete pooling

For this next model we change the likelihood to \[ y_j \sim N(\theta,\sigma_j^2) \] where \(\theta\) is a single parameter common to all schools and again each \(\sigma_j^2\) is a known constant. We now specify the model.

school_model2 <- "
data {
    int<lower=0> J;         // # of schools
    real y[J];              // estimated treatment
    real<lower=0> sigma[J]; // std err of effect
}
parameters {
    real theta; // school effect
}
model {
    y ~ normal(theta, sigma);
}
"
cat(school_model2,file="school_model_2.rstan")

Fit the model:

fit2 <- stan(file="school_model_2.rstan", data = schools_dat, iter = 1000, chains = 4)

Print the summary statistics of the parameter posterior distributions and plot the fitted distributions.

print(fit2)
## Inference for Stan model: school_model_2.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##        mean se_mean  sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## theta  7.75    0.16 3.9  0.39  5.18  7.51 10.30 15.84   618    1
## lp__  -2.81    0.02 0.7 -4.72 -2.92 -2.55 -2.39 -2.35   947    1
## 
## Samples were drawn using NUTS(diag_e) at Thu Dec 07 07:05:49 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
plot(fit2)
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
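
Under complete pooling, and with Stan's default flat prior on \(\theta\), the posterior mean of \(\theta\) should sit close to the inverse-variance-weighted average of the observed effects. A quick sketch of that check:

weighted.mean(schools_dat$y, w = 1 / schools_dat$sigma^2)
# compare with the posterior mean of theta (about 7.75) in the print(fit2) output above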

Model 3 – Hierarchical Model

For this final model we return to the likelihood \[ y_j \sim N(\theta_j,\sigma_j^2) \] where \(\theta_j\) is a separate parameter for each school and again each \(\sigma_j^2\) is a known constant. We now suppose that the \(\theta_j\)’s are themselves drawn from a common distribution \[ \theta_j \sim N(\mu,\tau^2) \] where \(\mu\) and \(\tau\) are parameters to be estimated from the data. We use the priors \[ \mu \sim N(0,5^2) \qquad \mbox{and} \qquad \tau \sim \mbox{half-Cauchy}(0,5), \] which are relatively uninformative.

We now specify the model.

school_model3 <- "
data {
  int<lower=0> J;
  real y[J];
  real<lower=0> sigma[J];
}

parameters {
  real mu;
  real<lower=0> tau;
  real theta[J];
}

model {
  mu ~ normal(0, 5);
  tau ~ cauchy(0, 5);
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}
"
cat(school_model3,file="school_model_3.rstan")

Fit the model:

fit3 <- stan(file="school_model_3.rstan", data = schools_dat, iter = 5000, chains = 4)

Print the summary statistics of the parameter posterior distributions and plot the fitted distributions.

plot(fit3)
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)

pairs(fit3, pars = c("mu", "tau", "lp__"))

We can extract the results into either a list of arrays or a three-dimensional array using the following code:

la <- extract(fit3, permuted = TRUE) # return a list of arrays 
mu <- la$mu 
tau <- la$tau 
summary( mu )
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -12.073   2.202   4.464   4.454   6.779  17.388
### return an array of three dimensions: iterations, chains, parameters 
a <- extract(fit3, permuted = FALSE)
summary( a[,,1] )
##     chain:1          chain:2          chain:3          chain:4       
##  Min.   :-7.575   Min.   :-8.259   Min.   :-8.478   Min.   :-12.073  
##  1st Qu.: 1.952   1st Qu.: 2.335   1st Qu.: 2.422   1st Qu.:  2.057  
##  Median : 4.429   Median : 4.686   Median : 4.613   Median :  4.348  
##  Mean   : 4.227   Mean   : 4.609   Mean   : 4.695   Mean   :  4.284  
##  3rd Qu.: 6.524   3rd Qu.: 7.142   3rd Qu.: 7.022   3rd Qu.:  6.492  
##  Max.   :15.227   Max.   :17.388   Max.   :15.273   Max.   : 15.157

We can also summarise the posterior distributions for each parameter using:

### use S3 functions as.array (or as.matrix) on stanfit objects
a2 <- as.array(fit3)
m <- as.matrix(fit3)
print(fit3, digits = 1)
## Inference for Stan model: school_model_3.
## 4 chains, each with iter=5000; warmup=2500; thin=1; 
## post-warmup draws per chain=2500, total post-warmup draws=10000.
## 
##           mean se_mean  sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## mu         4.5     0.1 3.3  -2.2   2.2   4.5   6.8  10.8   811    1
## tau        4.1     0.2 3.2   0.6   1.7   3.2   5.4  12.3   448    1
## theta[1]   6.5     0.1 5.7  -3.1   2.9   6.1   9.4  19.5  2156    1
## theta[2]   5.1     0.1 4.8  -4.5   2.0   5.0   8.0  14.9  2030    1
## theta[3]   4.0     0.1 5.5  -8.3   1.1   4.3   7.3  14.1  2154    1
## theta[4]   4.8     0.1 5.0  -5.2   1.8   4.9   7.8  14.8  1614    1
## theta[5]   3.6     0.1 4.8  -6.8   0.9   3.9   6.8  12.2  1637    1
## theta[6]   4.1     0.1 4.9  -6.1   1.3   4.3   7.3  13.4  1971    1
## theta[7]   6.6     0.1 5.2  -2.5   3.2   6.3   9.5  18.9  1769    1
## theta[8]   5.0     0.1 5.5  -6.1   1.8   4.9   8.1  16.3  2332    1
## lp__     -15.5     0.4 6.1 -27.0 -19.8 -15.9 -11.3  -3.4   208    1
## 
## Samples were drawn using NUTS(diag_e) at Thu Dec 07 07:06:05 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Lastly, we can look at the trace plot to diagnose potential difficulties with convergence of the Markov chain:

traceplot(fit3, pars = c("mu", "tau"), inc_warmup = TRUE, nrow = 2)
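
Alongside the trace plot, a numeric convergence check can be pulled from the fit summary; a quick sketch:

fit3_summary <- summary(fit3)$summary          # matrix of posterior summaries
round(fit3_summary[, c("n_eff", "Rhat")], 2)   # effective sample sizes and split-chain Rhat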