R News / Statistics

Budget constrained simulations

by Pierre Jacob · February 5, 2019

This article is originally published at https://statisfaction.wordpress.com

You’ll have to read the post to get what these lines are about.

Hi all,

This post is about some results from “Bias Properties of Budget Constrained Simulations“, by Glynn & Heidelberger and published in Operations Research in 1990. I have found these results extremely useful, and our latest manuscript on unbiased MCMC recalls them in detail. Below I go through some of the results and describe the simulations that lead to the above figure.

Consider the following setting: you have a time budget (say, one day), you have access to a number of machines, and you have Monte Carlo simulations to run. Each simulation produces a random variable $X$ with expectation $\mu = \mathbb{E}[X]$ , in a random time denoted by $\tau$ . The variable $X$ could, for instance, be the output of a rejection sampler, or of some Markov chain-based algorithm such as coupling from the past, some unbiased MCMC procedure, or some sequential Monte Carlo sampler for normalizing constant estimation. In any case you’re interested in $\mu$ .

You could first ask each machine to produce precisely $N$ estimators. But you would not be sure how long it would take for the calculations to finish. Also, it could be wasteful: suppose that machine $A$ finishes its calculations faster than machine $B$ : you would prefer machine $A$ to keep on producing estimators while waiting for $B$ to finish.

So instead, you prefer to ask each machine to produce as many estimators as possible within the given time budget, starting a new estimator whenever the previous one completes. But what exactly do you do when the budget expires? Do you interrupt the machines? Or do you wait for them to complete their on-going calculations? What else? These are the types of question that Glynn & Heidelberger help to answer.

Let’s introduce some notation (exactly that of Glynn & Heidelberger). Suppose that a machine produces $X_1,X_2,X_3,\ldots$ in sequence, in times denoted by $\tau_1, \tau_2, \tau_3,\ldots$ . Let us denote by $N(t)\geq 0$ the number of completed estimators at time $t$ . Define $\bar{X}_0 = 0$ and $\bar{X}_{n} = n^{-1}\sum_{k=1}^n X_k$ for $n \geq 1$ . Thus $\bar{X}_{N(t)}$ denotes the estimator obtained if we interrupt the machine at time $t$ and average over the available estimators; if no estimator is available yet, then $\bar{X}_{N(t)} = 0$ .

Equation (1) in Glynn & Heidelberger states (assuming $\mathbb{E}[|X_1|]<\infty$ ):

$\mathbb{E}[\bar{X}_{N(t)}] = \mathbb{E}[X_{1}] - \mathbb{E}[X_{1} 1(\tau_1 > t)].$

Thus the estimator $\bar{X}_{N(t)}$ is biased for the object of interest $\mathbb{E}[X_1] = \mu$ , and the bias diminishes with $t$ . The proof of that result starts by noting the conditional exchangeability of the pairs $(X_k,\tau_k)$ given $\{N(t)=k\}$ . That conditional exchangeability is used to compute, for any $k\geq 1$ , $\mathbb{E}[\bar{X}_{N(t)}|N(t)=k]$ . Finally, a multiplication by $\mathbb{P}(N(t) = k)$ , a sum over $k$ and the application of Fubini’s theorem conclude the proof.

An alternative estimator could be $\bar{X}_{N(t)+1}$ , obtained if we wait for on-going calculations to complete. That estimator is biased too, and Theorem 2 in Glynn & Heidelberger gives explicitly:

$\mathbb{E}[\bar{X}_{N(t)+1}] = \mathbb{E}[X_{1}] + \mathbb{E}[X_{N(t)+1}/(N(t)+1)] - \mathbb{E}[X_1 / (N(t) + 1)]$ .

The bias of that estimator goes to zero as $t\to \infty$ , as indeed stated in Corollary 10 of the paper.

It turns out that one can obtain an unbiased estimator of $\mathbb{E}[X_1]=\mu$ with a mild modification of the above estimators. Define $\tilde{X}_{N(t)} = \bar{X}_{N(t)}$ if $N(t) \geq 1$ and $\tilde{X}_{N(t)} = X_1$ if $N(t) = 0$ . That is, if at least one estimator is already available, interrupt the on-going calculation at time $t$ ; otherwise wait for the first estimator to complete. Then $\mathbb{E}[\tilde{X}_{N(t)}] = \mathbb{E}[X_1]$ . This is stated in Corollary 7 of Glynn & Heidelberger, and is a consequence of the first equation mentioned above.

Now, what about the figure above? It represents the bias and the expected completion time of the three estimators: $\bar{X}_{N(t)}$ , $\bar{X}_{N(t)+1}$ , and $\tilde{X}_{N(t)}$ . The plots show bias (left) and expected completion time (right) as a function of the budget. In this example $X \sim \text{Gamma}(2,2)$ , with a mean $\mu = 1$ , and $\tau = X + \text{Exp}(1)$ , thus creating dependencies between estimators and compute times. The three estimators are represented by different colours; I’ll let you guess which one is which!

Thanks for visiting r-craft.org
This article is originally published at https://statisfaction.wordpress.com
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Budget constrained simulations

You may also like...

Categories

Budget constrained simulations

You may also like...

An absolute beginner’s guide to creating data frames for a Stack Overflow [r] question

Streamtubes in Plotly with Python

Advent of 2021, Day 16 – Dataframe operations for Spark streaming

Categories