---
title: "Workshop on statistical challenges in astronomy -- Hierarchical models in Stan"
author: "John Ormerod"
output:
  html_document: null
  pdf_document: default
  fontsize: 12pt
  geometry: margin=1in
  toc: yes
 
---

 
# Example 1: Eight Schools

We now continue the "Eight Schools" example introduced in the "StanWorkshopBackground.pdf"
document.

This is an example in Section 5.5 of Gelman et al (2003), which studied coaching effects from eight schools. 
For simplicity, we call this example "eight schools."

We enter the data as a list into R using:

```{r,echo=TRUE}
schools_dat <- list(
  J = 8, 
  y = c(28,  8, -3,  7, -1,  1, 18, 12),
  sigma = c(15, 10, 16, 11,  9, 11, 10, 18)
)
```

## Loading the package into R

The package name is rstan (all lowercase), so we can use library("rstan") to load the package.

```{r,echo=TRUE}
library("rstan") # observe startup messages
```

As the startup message says, if you are using rstan locally on a multicore machine and have plenty of 
RAM to estimate your model in parallel, at this point execute

```{r,echo=TRUE}
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
```

These options respectively allow you to automatically save a bare version of a compiled Stan program 
to the hard disk so that it does not need to be recompiled and to execute multiple Markov chains in parallel.

# Model 1 -- No pooling

Let us try the first model where
$$
y_j \sim N(\theta_j,\sigma_j^2)
$$

where each school has a different mean $\theta_j$ and $\sigma_j^2$ are known.

```{r,echo=TRUE}
school_model1 <- "
data {
    int<lower=0> J;         // # of schools
    real y[J];              // estimated treatment
    real<lower=0> sigma[J]; // std err of effect
}
parameters {
    real theta[J]; // school effect
}
model {
    y ~ normal(theta, sigma);
}
"
cat(school_model1,file="school_model_1.rstan")
```

We can also specify a Stan model using a character string by using argument model_code of 
function stan instead. However, this is not recommended, as using a file allows you to use 
print statements and to dereference line numbers in error messages.

```{r,echo=FALSE, warning=FALSE}
# Redfine cat function to suppress stan compiler warnings
oldcat <- cat()
cat <- function(...) return(invisible(NULL))
```

Fit the model:

```{r,echo=TRUE, warning=FALSE}
fit1 <- stan(file="school_model_1.rstan", data = schools_dat, iter = 1000, chains = 4)
```

```{r,echo=FALSE, warning=FALSE}
cat <- oldcat
```

The object fit1, returned from function stan is an "S4 object" of class "stanfit". Methods such 
as print, plot, and pairs are associated with the fitted result so we can use the following 
code to check out the results in fit. print provides a summary for the parameter of the model 
as well as the log-posterior with name lp__ (see the following example output). For more 
methods and details of class stanfit, see the help of class stanfit.

In particular, we can use extract function on stanfit objects to obtain the samples. extract 
extracts samples from the stanfit object as a list of arrays for parameters of interest, or 
just an array. In addition, S3 functions as.array and as.matrix are defined for stanfit object 
(using help("as.array.stanfit") to check out the help document in R).

Print the summary statistics of the parameter posterior distributions and plot
the fitted distributions.

```{r,echo=TRUE}
print(fit1)
plot(fit1)
pairs(fit1,pars=c("theta"))
```

Since each $\theta_j$ is essentially estimated by one observation each of the
posterior means is centered around the observed $y_j$ values.

# Model 2 -- Complte pooling

For this next model we will change the model to:
$$
y_j \sim N(\theta,\sigma_j^2)
$$
where $\theta$ is a common parameter for each school and again $\sigma_j^2$
is a known constant for each school. We now specify the model.

```{r,echo=TRUE}
school_model2 <- "
data {
    int<lower=0> J;         // # of schools
    real y[J];              // estimated treatment
    real<lower=0> sigma[J]; // std err of effect
}
parameters {
    real theta; // school effect
}
model {
    y ~ normal(theta, sigma);
}
"
cat(school_model2,file="school_model_2.rstan")
```


```{r,echo=FALSE, warning=FALSE}
# Redfine cat function to suppress stan compiler warnings
oldcat <- cat()
cat <- function(...) return(invisible(NULL))
```

Fit the model:

```{r,echo=TRUE, warning=FALSE}
fit2 <- stan(file="school_model_2.rstan", data = schools_dat, iter = 1000, chains = 4)
```

```{r,echo=FALSE, warning=FALSE}
cat <- oldcat
```

Print the summary statistics of the parameter posterior distributions and plot
the fitted distributions.

```{r,echo=TRUE}
print(fit2)
plot(fit2)
```




# Model 3 -- Hierarchical Model

For this next model we will change the model to:
$$
y_j \sim N(\theta_j,\sigma_j^2)
$$
where $\theta_j$ is a parameter for each school and again $\sigma_j^2$
is a known constant for each school. We now suppose that the $\theta_j$'s
are themselves drawn from a common distribution
$$
\theta_j \sim N(\mu,\tau)
$$
where $\mu$ and $\tau^2$ are parameters to be fit from the data. We use the
priors
$$
\mu \sim N(0,5^2) \qquad \mbox{and} \qquad \tau \sim \mbox{Cauchy}(0,5^2)
$$
which are relatively uniformative priors.

We now specify the model.

```{r,echo=TRUE}
school_model3 <- "
data {
  int<lower=0> J;
  real y[J];
  real<lower=0> sigma[J];
}

parameters {
  real mu;
  real<lower=0> tau;
  real theta[J];
}

model {
  mu ~ normal(0, 5);
  tau ~ cauchy(0, 5);
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}
"
cat(school_model3,file="school_model_3.rstan")
```


Fit the model:

```{r,echo=FALSE, warning=FALSE}
# Redfine cat function to suppress stan compiler warnings
oldcat <- cat()
cat <- function(...) return(invisible(NULL))

# Fit the model using stan
fit3 <- stan(file="school_model_3.rstan", data = schools_dat, iter = 5000, chains = 4)

cat <- oldcat
```

Print the summary statistics of the parameter posterior distributions and plot
the fitted distributions.

```{r,echo=TRUE}
plot(fit3)
pairs(fit3, pars = c("mu", "tau", "lp__"))
```

We can extract the results into either a list of arrays, or a three dimensional 
array using the following code:

```{r,echo=TRUE}
la <- extract(fit3, permuted = TRUE) # return a list of arrays 
mu <- la$mu 
tau <- la$tau 
summary( mu )

### return an array of three dimensions: iterations, chains, parameters 
a <- extract(fit3, permuted = FALSE)
summary( a[,,1] )
```

We can also summarise the posterior distributions for each parameter using:

```{r,echo=TRUE}
### use S3 functions as.array (or as.matrix) on stanfit objects
a2 <- as.array(fit3)
m <- as.matrix(fit3)
print(fit3, digits = 1)

```

Lastly, we can look at the trace plot to diagnose potential difficulties
with convergence of the Markov chain:

```{r,echo=TRUE}
traceplot(fit3, pars = c("mu", "tau"), inc_warmup = TRUE, nrow = 2)
```