---
title: "Workshop on statistical challenges in astronomy -- Hierarchical models in Stan"
author: "John Ormerod"
output:
  html_document: null
  pdf_document: default
  fontsize: 12pt
  geometry: margin=1in
  toc: yes

  
---

# Example 2: Normal linear model for accessing the relationship between central black hole mass and bulge velocity dispersion



Our first practical application illustrates the use of a Bayesian Gaussian model for 
describing the relation between the mass $M_\bullet$ ofthe supermassive black hole 
present in the center of a galaxy and the velocity dispersion $\sigma_\epsilon$ of 
the stars in its bulge.

The study of the global properties of individual galaxies has attracted attention in the
astronomical community.  
Merritt (2000) the so-called $M_\bullet-\sigma_\epsilon$ relation and took the
form             
$$
\frac{M_\bullet}{10^8 M_{\odot}} \approx 3.1 \times \left( \frac{\sigma_\epsilon}{200 km s^{-1}} \right)^4
$$
originally known as the Faber-Jackson law for black holes. Here $M_{\odot}$ is the mass of 
the Sun. In this example we follow Harris et al (2013) who model the relation as:
$$
\log \left( \frac{M_\bullet}{M_{\odot}} \right) = \alpha + \beta \log\left( \frac{\sigma_\epsilon}{\sigma_0} \right)
$$
where $\alpha_0$ is a reference value, usually chosen to be 200 km/s,
and $\alpha$ and $\beta$ are the linear predictor coefficients to be determined.


The data set used in this example was presented by Harris et al (2013). 
This is a compilation of literature data from a variety of sources obtained with the Hubble
Space Telescope as well as with a wide range of other ground-based facilities. The original
data set was composed of data from 422 galaxies, 46 of which have available measurements
of $M_\bullet$ and $\sigma_\epsilon$.

The logarithm of the observed central black hole mass, $M \equiv \log(M_\bullet/M_\odot)$, represents
our response variable and the logarithm of the observed velocity dispersion is our explanatory
variable, $\sigma \equiv \log(\sigma_\epsilon/\sigma_0)$. 
Their true values are denoted by $M^{\mbox{true}} \equiv \log(M_\bullet^{\mbox{true}}/M_\cdot)$
and $\sigma^{\mbox{true}} \equiv \log(\sigma_\epsilon^{\mbox{true}}/\sigma_0)$ 
respectively. 

We want to take into account the measurement
errors in both variables $(\varepsilon_M, \varepsilon_\sigma)$. 
In the following statistical model, $N = 46$ is the total number of galaxies, 
$\eta = \mu = \alpha + \beta \sigma$
is the linear predictor, $\alpha$ is the intercept, $\beta$ is the slope, and 
$\varepsilon$ is the intrinsic scatter
around $\mu$. Our model is

$$
\begin{array}{rl}
M_i               & \sim N(M_i^{\mbox{true}},\varepsilon_{M,i}^2) \\
M_i^{\mbox{true}} & \sim N(\mu_i,\varepsilon^2) \\
\mu_i             & = \alpha + \beta \dot \sigma_i \\
\sigma_i          & \sim N(\sigma_i^{\mbox{true}},\varepsilon_{\sigma,i}^2) \\
\end{array}
$$
for $i = 1,\ldots, N$. We assign non-informative priors to $\alpha$, $\beta$,
$\sigma$, and $\varepsilon$ with
$$
\begin{array}{rl}
\alpha & \sim N(0,10^3) \\
\beta  & \sim N(0,10^3) \\
\varepsilon^2 & \sim \mbox{Gamma}(10^{-3},10^{-3}) 
\end{array}
$$

Load rstan and set options.
```{r,echo=TRUE}
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
```

Read and extract the variables into R.
```{r,echo=TRUE}
# Data
path_to_data = "./M_sigma.csv"

# Read data
MS<-read.csv(path_to_data,header = T)

# Identify variables
N <- nrow(MS) # number of data points
obsx <- MS$obsx # log black hole mass
errx <- MS$errx # error on log black hole mass
obsy <- MS$obsy # log velocity dispersion
erry <- MS$erry # error on log velocity dispersion

# Prepare data  
MS_data <- list(
  obsx = obsx,
  obsy = obsy,
  errx = errx,
  erry = erry,
  N = N
)
```

Specify the rstan model.
```{r,echo=TRUE}
# Stan Gaussian model
stan_code="
data {
    int<lower=0> N;          // number of data points
    vector[N] obsx;          // obs velocity dispersion
    vector<lower=0>[N] errx; // errors in velocity dispersion measurements
    vector[N] obsy;          // obs black hole mass
    vector<lower=0>[N] erry; // errors in black hole mass measurements
}
parameters {
    real alpha;            // intercept
    real beta;             // angular coefficient
    real<lower=0> epsilon; // scatter around true black hole mass
    vector[N] x;           // true velocity dispersion
    vector[N] y;           //  true black hole mass
}
model {
    // likelihood and priors
    alpha ~ normal(0, 1000);
    beta ~ normal(0, 1000);
    epsilon ~ gamma(0.001, 0.001);
    for (i in 1:N){
        x[i] ~ normal(0, 1000);
        y[i] ~ normal(0, 1000);
    }
    obsx ~ normal(x, errx);
    y ~ normal(alpha + beta*x, epsilon);
    obsy ~ normal(y, erry);
}
"
```

Fit the model using rstan
```{r,echo=TRUE,warning=FALSE}
oldcat <- cat()
cat <- function(...) return(invisible(NULL))

# Run mcmc
fit = stan(model_code=stan_code, data=MS_data, iter=15000, chains=4, warmup=5000)


```

```{r,echo=TRUE,warning=FALSE}
cat <- oldcat
```



Print the summary statistics of the parameter posterior distributions and plot
the fitted distributions.

```{r,echo=TRUE}
print(fit, pars = c("alpha", "beta", "epsilon", "lp__"))
plot(fit, pars = c("alpha", "beta", "epsilon", "lp__"))
pairs(fit, pars = c("alpha", "beta", "epsilon", "lp__"))
```

Using another method Harris et al (2013) reported parameter values are 
$\alpha = 8.412 \pm 0.067$ and $\beta = 4.610\pm 0.403$.

# References

From: Bayesian Models for Astrophysical Data, Cambridge Univ. Press (c) 2017, 
Joseph M. Hilbe, Rafael S. de Souza and Emille E. O. Ishida 
 
Code 10.1 - Normal linear model in R using JAGS for accessing the relationship between
             central black hole mass and bulge velocity dispersion

* Harris, G.L.H. et al (2013). 
"Globular clusters and supermassive black holes in galaxies: further analysis and a larger sample."
Monthly Notices of the Royal Astronomical Society, 438, 2117-2130.

* Merritt, D. (2000). "Black holes and galaxy evolution. Dynamics of Galaxies: from
the Early Universe to the Present", eds. F. Combes, G. A. Mamon, and V. Charmandaris
Vol. 197. Astronomical Society of the Pacific Conference Series, p. 221.