BayesBag, and how to approximate it
Hi all,
This post describes how unbiased MCMC can help in approximating expectations with respect to “BayesBag”, an alternative to standard posterior distributions mentioned in Peter Bühlmann’s discussion of Big Bayes Stories (which was a special issue of Statistical Science). Essentially BayesBag is the result of “bagging” applied to “Bayesian inference”. In passing, here is an R script implementing this on a model written in the Stan language (as in this previous post), namely a Negative Binomial regression, and using a pure R implementation of unbiased HMC (joint work with Jeremy Heng). The script produces the following figure:
[Figure: posterior CDFs for two parameters of the model, under standard Bayes and under BayesBag]

The figure shows, for two parameters of the model, the cumulative distribution function (CDF) under standard Bayes (thin blue line) and under BayesBag (wider red line). BayesBag results in distributions on the parameter space that are more “spread out” than standard Bayes.
So what is BayesBag? Let’s quote from Bühlmann’s discussion:
We can stabilize the posterior distribution by using a bootstrap and aggregation scheme, in the spirit of bagging (Breiman, 1996b). In a nutshell, denote by $Z^\star$ a bootstrap- or subsample of the data $Z_{1:n}$. The posterior of the random parameters $\theta$ given the data $Z^\star$ has c.d.f. $F_{\theta|Z^\star}$, and we can stabilize this using

$F_{\text{BayesBag}}(t) = E^\star\left[F_{\theta|Z^\star}(t)\right]$,

where $E^\star$ is with respect to the bootstrap- or subsampling scheme. We call it the BayesBag estimator. It can be approximated by averaging over $B$ posterior computations for bootstrap- or subsamples, which might be a rather demanding task (although say $B = 10$ would already stabilize to a certain extent).
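To make the averaging concrete, here is a minimal, self-contained R sketch of this $B$-fold approximation. A conjugate Normal model with known variance (my choice for illustration, not the Negative Binomial model of the post) stands in for a model that would normally require MCMC, so that exact posterior draws are available; with an MCMC sampler the structure would be identical.

```r
# Sketch of Buhlmann's B-fold approximation of the BayesBag CDF,
# using exact posterior draws from a conjugate Normal model
# (unit variance, N(0, 10^2) prior on the mean) in place of MCMC.
set.seed(1)
z <- rnorm(50, mean = 2)   # observed data Z_1, ..., Z_n
n <- length(z)
B <- 10                    # number of bootstrap replicates
nsamples <- 1e4            # posterior draws per replicate

posterior_draws <- function(data) {
  # exact posterior of the mean under the conjugate model
  post_var <- 1 / (length(data) + 1 / 100)
  post_mean <- post_var * sum(data)
  rnorm(nsamples, post_mean, sqrt(post_var))
}

# one matrix column of posterior draws per bootstrapped data set
draws <- sapply(1:B, function(b) posterior_draws(sample(z, n, replace = TRUE)))

# BayesBag CDF at t = average of the B posterior CDFs at t
F_bayesbag <- function(t) mean(draws <= t)
F_bayes <- ecdf(posterior_draws(z))  # standard posterior CDF, for comparison

curve(F_bayes(x), from = 1, to = 3, col = "blue", xlab = "t", ylab = "CDF")
curve(Vectorize(F_bayesbag)(x), add = TRUE, col = "red", lwd = 2)
```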
Indeed, in the usual MCMC approach, we would have to run an MCMC algorithm on each bootstrapped data set $Z^\star$, and repeat $B$ times. Each MCMC run is only asymptotically consistent in its number of iterations. So to approximate BayesBag consistently, one would need both $B$ and the number of iterations per data set to go to infinity. This is awkward: suppose that you chose some $B$ and some number of iterations per chain, and obtained some result. To be sure of it, you would next like a more precise result. What do you do, increase $B$ or increase the number of iterations per chain? This seems like a difficult choice.
This is where unbiased MCMC might be handy: since BayesBag is defined as an average over posterior distributions, it is very simple to obtain unbiased estimators with respect to BayesBag itself by
- sampling a data set $Z^\star$ by bootstrapping from $Z_{1:n}$,
- obtaining an unbiased approximation of the posterior given $Z^\star$ with unbiased MCMC.
See the R script for an implementation. By the law of total expectation, this produces unbiased approximations of BayesBag; we can then average over $B$ independent replicates and let $B$ go to infinity. So if we want to refine our results, we just increase $B$. The same idea works for the “cut distribution”, as illustrated in Section 5.5 of the unbiased MCMC paper (see this previous post).
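In code, the recipe is only a few lines. The sketch below assumes a function unbiased_posterior_estimator, a hypothetical stand-in for a coupled-chains estimator (such as the unbiased HMC estimator used in the linked script) that returns an unbiased estimate of a posterior expectation $E[h(\theta)\mid Z^\star]$ for a given data set.

```r
# Sketch of the proposed unbiased approximation of a BayesBag expectation.
# 'unbiased_posterior_estimator' is a hypothetical interface: given a data
# set, it returns an unbiased estimate of E[h(theta) | data], e.g. from a
# pair of coupled MCMC chains.
bayesbag_estimate <- function(z, B, unbiased_posterior_estimator) {
  n <- length(z)
  estimates <- sapply(1:B, function(b) {
    zstar <- sample(z, n, replace = TRUE)  # bootstrap the data
    unbiased_posterior_estimator(zstar)    # unbiased given Z*
  })
  # each term is unbiased for the BayesBag expectation by the law of
  # total expectation, so the average converges as B grows
  list(estimate = mean(estimates), std_error = sd(estimates) / sqrt(B))
}
```

Because the $B$ terms are independent and identically distributed, refining the result only means increasing $B$, and the replicates provide a standard error essentially for free.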
It thus appears that unbiased MCMC for BayesBag costs about the same computational effort as unbiased MCMC for standard posteriors: the only difference is that the data are bootstrapped before each pair of chains is generated. This would of course work with other ways of obtaining unbiased estimators or even perfect samples.