R News / Statistics

Dempster’s analysis and donkeys

by Pierre Jacob · March 24, 2021

This article is originally published at https://statisfaction.wordpress.com

This post is about estimating the parameter of a Bernoulli distribution from observations, in the “Dempster” or “Dempster–Shafer” way, which is a generalization of Bayesian inference. I’ll recall what this approach is about, and describe a Gibbs sampler to perform the computation. Intriguingly the associated Markov chain happens to be equivalent to the so-called “donkey walk” (not this one), as pointed out by Guanyang Wang and Persi Diaconis.

Denote the observations, or “coin flips”, by $(x_1,\ldots,x_N)\in\{0,1\}^N$ . The model stipulates that $x_n = 1(u_n < \theta)$ , where $u_n$ are independent Uniform(0,1) variables, and $\theta \in (0,1)$ is the parameter to be estimated. That is, $x_n = 1$ if some uniform lands below $\theta$ , which indeed occurs with probability $\theta$ , otherwise $x_n = 0$ . We’ll call the uniform variables “auxiliary”, and denote by $N_0,N_1$ the counts of “0” and “1”, with $N_0 + N_1 = N$ .

In a Bayesian approach, we would specify a prior distribution on the parameter; for example a Beta prior would lead to a Beta posterior on $\theta$ . The auxiliary variables would play no role; apart perhaps in Approximate Bayesian Computation. In Dempster’s approach, we can avoid the specification of a prior, and instead, and “transfer” the randomness from the auxiliary variables to a distribution of subsets of parameters; see ref [1] below. Let’s see how this works.

Given observations $(x_1,\ldots,x_N)\in\{0,1\}^N$ , there are auxiliary variables $(u_1,\ldots,u_N)$ that are compatible with the observations, in the sense that there exists some $\theta$ such that $\forall n\in\{1,\ldots,n\} \; x_n = 1(u_n < \theta)$ . And there are other configurations of $(u_1,\ldots,u_N)$ that are not compatible. If we denote by $\mathcal{I}_0$ the indices $n\in \{1,\ldots,N\}$ corresponding to an observed $x_n = 0$ , and likewise for $\mathcal{I}_1$ , we can see that there exists some “feasible” $\theta$ only when $\max_{n\in\mathcal{I}_1} u_n < \min_{n\in\mathcal{I}_0} u_n$ . In that case the feasible $\theta$ are in the interval $(\max_{n\in\mathcal{I}_1} u_n, \min_{n\in\mathcal{I}_0} u_n)$ . The following diagram illustrates this with $N_0 = 2, N_1 = 3$ .

How do we obtain the distribution of these sets $\mathcal{F}(u)$ , under the Uniform distribution of $u\in [0,1]^N$ and conditioning on $\mathcal{F}(u)\neq \emptyset$ ? We could draw $N$ uniforms, sorted in increasing order, and report the interval between the $N_1$ -th and the $(N_1+1)$ -th values (Section 4 in [1]). But that would be no fun, so let us consider a Gibbs sampler instead (taken from [4]). We will sample the auxiliary variables uniformly, conditional upon $\mathcal{F}(u)\neq \emptyset$ , and we will proceed by sampling the variables $u_n$ indexed by $\mathcal{I}_0$ given the variables indexed by $\mathcal{I}_1$ , and vice versa. The joint distribution of all the variables has density proportional to

$1(\forall n \; u_n \in (0,1) \quad \text{and} \quad \max_{n\in\mathcal{I}_1} u_n < \min_{n\in\mathcal{I}_0} u_n).$

From this joint density we can work out the conditionals. We can then express the Gibbs updates in terms of the endpoints of the interval $\mathcal{F}(u)$ . Specifically, writing the endpoints at iteration $t$ as $(Y^{(t)},Z^{(t)})$ , the Gibbs sampler is equivalent to:

Sampling $Y^{(t)} = Z^{(t-1)} \times \text{Beta}(N_1,1)$ .
Sampling $Z^{(t)} = Y^{(t)} + (1-Y^{(t)}) \times \text{Beta}(1,N_2)$ .

This is exactly the model of Buridan’s donkey in refs [2,3] below. The idea is that the donkey, being both hungry and thirsty but not being able to choose between the water and the hay, takes a step in either direction alternatively.

The donkey walk has been generalized to higher dimensions in [3], and in a sense our Gibbs sampler in [4] is also a generalization to higher dimensions… it’s not clear whether these two generalizations are the same or not. So I’ll leave that discussion for another day.

A few remarks to wrap up.

It’s a feature of Dempster’s approach that it yields random subsets of parameters rather than singletons as standard Bayesian analysis. Dempster’s approach is a generalization of Bayes: if we specify a standard prior and apply “Dempster’s rule of combination” we retrieve standard Bayes.
What do we do with these random intervals $\mathcal{F}(u)$ , once we obtain them? We can compute the proportion of them that intersects/is contained in a set of interest, for example the set $\{\theta > 1/2\}$ , and these proportions are transformed into measures of agreement, disagreement or indeterminacy regarding the set of interest, as opposed to posterior probabilities in standard Bayes.
Dempster’s estimates depend on the choice of sampling mechanism and associated auxiliary variables, which is topic of many discussions in that literature.
In a previous post I described an equivalence between the sampling mechanism considered in [1] when there are more than two categories, and the Gumbel-max trick… it seems that the Dempster’s approach has various intriguing connections.

References:

[1] Arthur P. Dempster, New Methods for Reasoning Towards Posterior Distributions Based on Smple Data, 1966. [link]
[2] Jordan Stoyanov & Christo Pirinsky, Random motions, classes of ergodic Markov chains and beta distributions, 2000. [link]
[3] Gérard Letac, Donkey walk and Dirichlet distributions, 2002. [link]
[4] Pierre E Jacob, Ruobin Gong, Paul T. Edlefsen & Arthur P. Dempster, A Gibbs sampler for a class of random convex polytopes, 2021. [link]

Thanks for visiting r-craft.org
This article is originally published at https://statisfaction.wordpress.com
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Dempster’s analysis and donkeys

You may also like...

Categories

Dempster’s analysis and donkeys

You may also like...

Slides on practical Topic Modeling: Preparation, evaluation, visualization

Lies, damned lies, and rankings: the problem with Bloomberg’s COVID resilience ranking

Online Tests for TestVision with R/exams

Categories