R / R News / Statistics

Blocked Gibbs Sampling in R for Bayesian Multiple Linear Regression

by AO · September 6, 2017

This article is originally published at https://stablemarkets.wordpress.com

In a previous post, I derived and coded a Gibbs sampler in R for estimating a simple linear regression.

In this post, I will do the same for multivariate linear regression. I will derive the conditional posterior distributions necessary for the blocked Gibbs sampler. I will then code the sampler and test it using simulated data.

R code for simulating data and implementing the blocked Gibbs is in by GitHub repo.

A Bayesian Model

Suppose we have a sample size of $n$ subjects. We observe an $n \times 1$ outcome vector $y$ . The Bayesian multivariate regression assumes that this vector is drawn from a multivariate normal distribution where the mean vector is $X\beta$ and covariance matrix $\phi I$ . Here, $X$ is an observed $n \times p$ matrix of covariates. Note, the first column of this matrix is identity. The parameter vector $\beta$ is $p \times 1$ , $\phi$ is a common variance parameter, and $I$ is the $n \times n$ identity matrix. By using the identity matrix, we are assuming independent observations. Formally,

$y \sim N_n(X\beta, \phi I)$

So far, this is identical to the multivariate normal regression seen in the frequentist setting. Assuming $X$ is full column rank, maximizing the likelihood yields the solution:

$\hat{\beta} = (X'X)^{-1}X'y$

A Bayesian model is obtained by specifying a prior distribution for $\beta$ and $\phi$ . For this example, I will use a flat, improper prior for $\beta$ and a inverse gamma prior for $\phi$ :

$\beta \sim f(\beta) \propto 1$

$\phi \sim IG(\alpha, \gamma)$

We assume the hyperparamters are known for simplicity.

Joint Posterior Distribution

The joint posterior distribution is proportional to

$p(\beta, \phi | y, X) \propto p(y | \beta, \phi, X) p(\beta|X) p(\phi| X)$

We can write this because we assumed prior independence. That is,

$p(\beta,\phi|X) = p(\beta|X) p(\phi|X)$ .

Substituting the distributions,

$p(\beta, \phi | y, X) \propto \phi^{-n/2} e^{-\frac{1}{2\phi}(y - X\beta)'(y - X\beta) } \phi^{-(\alpha + 1)}e^{-\frac{\gamma}{\phi}}$

Blocked Gibbs Sampler

Before coding the sampler, we need to derive the components of the Gibbs sampler – the posterior conditional distributions of each parameter.

The conditional posterior of $\phi$ is found by dropping factors independent of $\phi$ from the joint posterior and rearranging. This case is easy since there’s nothing to drop:

$\begin{aligned} p(\phi | \beta, \alpha, \gamma, y, X) & \propto \phi^{-n/2} e^{-\frac{1}{2\phi}(y - X\beta)'(y - X\beta) } \phi^{-(\alpha + 1)}e^{-\frac{\gamma}{\phi}} \\ & \propto \phi^{-(\alpha + \frac{n}{2} + 1)} e^{-\frac{1}{\phi}\Big[ \frac{1}{2}(y - X\beta)'(y - X\beta) + \gamma \Big] } \\ & = IG(shape=\alpha + \frac{n}{2}, rate=\frac{1}{2}(y - X\beta)'(y - X\beta) + \gamma) \end{aligned}$

The conditional posterior of $\beta$ takes some more linear algebra.

$\begin{aligned} p(\beta | \phi, \alpha, \gamma, y, X ) & \propto e^{-\frac{1}{2\phi}(y - X\beta)'(y - X\beta) } \\ & \propto e^{-\frac{1}{2\phi}( y'y - 2\beta'X'y + \beta'X'X\beta ) } \\ & \propto e^{-\frac{1}{2\phi}(\beta'X'X\beta - 2\beta'X'y ) } \\ & \propto e^{-\frac{1}{2\phi}(\beta'X'X\beta - 2\beta'(X'X)(X'X)^{-1}X'y ) } \\ & \propto e^{-\frac{1}{2\phi}(\beta - (X'X)^{-1}X'y )'(X'X)(\beta - (X'X)^{-1}X'y )} \\ & = N_p((X'X)^{-1}X'y, \phi (X'X)^{-1} )\end{aligned}$

This is a really nifty and intuitive result. Because we use a flat prior on the parameter vector, the conditional posterior of the parameter vector is centered around the maximum likelihood estimate $\hat{\beta} = (X'X)^{-1}X'y$ . The covariance matrix of the conditional posterior is the frequentist estimate of the covariance matrix, $Cov(\hat \beta) = \phi (X'X)^{-1}$

Also note that the conditional posterior is a multivariate distribution since $\beta$ is a vector. So in each iteration of the Gibbs sampler, we draw a whole vector, or “block”, from the posterior. This is much more efficient than drawing from the conditional distribution of each parameter conditional on the others, one at a time. This is why the procedure is called a “blocked” sampler.

Simulation

All code referencing in the following can be found in my GitHub repo.

I simulate a $50 \times 1$ outcome vector from $y \sim N_{50}(X\beta, 10000 \cdot I)$ . Here $I$ is the $n \times n$ identity matrix and $X$ is a $50 \times 4$ model matrix. The true parameter vector, $\beta$ is

$\beta = (1000, 50, -50, 10) \ '$

Running the blocked Gibbs sampler (the block_gibbs() function) produces estimates of the true coefficient and variance parameters. 500,000 iterations were run. A burn-in period of 100,000 with a trimming of 10 iterations.

Below are the plots of the MCMC chains with true values indicated with a red line.

Here are the posterior distributions of the parameters after applying burn-in and trimming:

Seems like we’re able to get reasonable posterior estimates of these parameters. The distributions are not exactly centered around the truth because our data set is only one realization of the truth. To make sure the Bayesian estimator is working properly, I repeat this exercise for 1,000 simulated datasets.

This will yield a 1,000 sets of posterior means and 1,000 sets of 95% credible intervals. On average, these 1,000 posterior means should be centered around the truth. And on average the true parameter values should be within the credible interval 95% of the time.

Below is a summary of these evaluations.

The “Estimator Means” column is the average posterior mean across all 1,000 simulations. Pretty good. The percent bias is all less than 5%. The coverage of the 95% CI is around 95% for all parameters.

Extensions

There are many extensions we can make to this model. For example, other distributions aside from the Normal could be used in order to accommodate different types of outcomes. For example, if we had binary data, we could model it as:

$y \sim Ber(p)$

$logit(p) = X\beta$

And then place a prior distribution on $\beta$ . This idea generalizes Bayesian linear regression to Bayesian GLM.

In the linear case outlined in this post, it’s possible to have modeled the covariance matrix more flexibly. Instead, it is assumed that the covariance matrix is diagonal with a single common variance. This is the homoskedasticity assumption made in multiple linear regression. If the data is clustered (e.g., multiple observations per subject), we could use an inverse-Wishart distribution to model the whole covariance matrix.

Thanks for visiting r-craft.org
This article is originally published at https://stablemarkets.wordpress.com
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Blocked Gibbs Sampling in R for Bayesian Multiple Linear Regression

You may also like...

Categories

Blocked Gibbs Sampling in R for Bayesian Multiple Linear Regression

A Bayesian Model

Joint Posterior Distribution

Blocked Gibbs Sampler

Simulation

Extensions

You may also like...

Will the next German parliament have a gigantic size? A law and coding challenge…

RTutor: Water Pollution and Cancer

Registration open for rstudio::conf 2018!

Categories