mixture models [book review]
Strangely enough, I became aware of this new book on mixtures through one of these annoying emails “Your work has been cited n times this week”… Mixture Models (Parametric, Semiparametric, and New Directions) by Weixin Yao and Sijia Wang was published by CRC Press earlier this year, within the Monographs on Statistics and Applied Probability green series (#175). It covers, across 380 pages, most aspects of mixture (and hidden Markov) estimation, if with a strong emphasis on maximum likelihood estimation, while the new directions are unsurprisingly those pursued by the authors, namely robust and semi-parametric estimation, as well as model selection by testing.
An early warning about this book review is that I co-edited a Handbook of Mixture Analysis with my friends Sylvia Frühwirth-Schnatter and Gilles Celeux a few years ago. I am therefore biased as to what I would have included in a new book on the topic, all the more because I find the available literature already plentiful, even though the early (1985) book of Titterington et al. that was my entry to the field may have become a historical reference. For instance, Finite Mixture Models by McLachlan and Peel (2000) remains relevant, with a similar emphasis on maximum likelihood and the EM algorithm, while Sylvia’s Finite Mixture and Markov Switching Models is still a reference to this day.
And an additional warning that I am not a massive fan of semi- and non-parametric estimation in this setting…
These preliminaries may explain my limited enthusiasm about the book and its limited originality. Not that I found significant errors there (even though “improper priors [do not always] yield improper posteriors” [p.145], as we demonstrated in several papers), but I had trouble with the uneven pace adopted by the authors, who often skim over topics of importance while spending an inordinate amount of space on less relevant ones. Some items get many bibliographical references, while others do not. For instance, EM receives the lion’s share (see, e.g., Sections 6.6 and 6.7). Or the 12 pages of proof in Chapter 10. The replication of sections across mixtures, mixtures of regressions, multivariate mixtures, hidden Markov models, and so on feels somewhat repetitive. This is particularly the case for the “mixture regression models” chapter.
The book also contains Bayesian entries, with a first introduction (p.105) in the discrete data chapter that precedes the short Bayesian chapter #4 (p.145), the same issue arising for related algorithms like Gibbs sampling (p.107), said to “estimate properties of the joint posterior”, and MCMC (p.112). Which sort of erases the specificity of a Bayesian approach by reducing it to one item in the toolbox (with a misplaced stress on MAP estimates). In this Bayesian chapter, MCMC validation is only handled for discrete state spaces, while the algorithms are applied in general spaces. The focus is mostly on relabelling, in preparation for the following label switching chapter, albeit a large collection of methods are mentioned if not compared.
Handling an unknown number of components by hypothesis testing is supported in the next short chapter, although very little is said about reversible jump MCMC. And there is no general discussion on the consistency of these tests, in particular with the bootstrap. Or at least on the regularity conditions they require. A puzzling paradox (p.191) is the existence of an unbounded Fisher information for an exponential mixture when the weight π is the parameter (and close to 1).
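To sketch the phenomenon (in my own notation, taking unit-rate and θ-rate exponential components as an assumed illustration, which may differ from the book’s exact example), consider

\[
p_\pi(x) = \pi\, e^{-x} + (1-\pi)\,\theta e^{-\theta x}, \qquad
I(\pi) = \int_0^\infty \frac{\left\{e^{-x} - \theta e^{-\theta x}\right\}^2}{p_\pi(x)}\,\mathrm{d}x .
\]

As π approaches 1, the denominator behaves like e^{-x}, so the integrand picks up a θ² e^{(1-2θ)x} term from the slower-decaying component, which fails to be integrable when θ < 1/2, and I(π) thus diverges near π = 1 while remaining finite for π bounded away from 1.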
High-dimensional mixtures in Chapter 8 are mostly handled by linear projections onto smaller subspaces, which is natural given that these preserve the mixture structure, but this opens a Pandora’s box of proposed methods, again with little comparison available. Except in the final R section, which contrasts several R functions on the same dataset (if inconclusively).
The semi-parametric chapters mention Dirichlet process priors, albeit briefly, but fail to relate to the recent works on using these priors when inferring about the number of components, or on their failing to do so. A connection with machine learning is also pointed out, but little can be gathered from the three-page presentation (pp.308-310). These chapters also have significant overlap with the review paper of Xiang et al. (2019) in Statistical Science.
Most chapters end with an R section, which usually reads as a quick demo of a related R package, like BayesLCA or our own mixtools. Hence not massively helpful beyond pointers to these packages. The numerical illustrations are also unevenly distributed between chapters, ranging from nothing at all to four pages of small-font tables on an MSE comparison between more or less robust approaches undertaken by Yu et al. (2020).
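For a sense of what such demos amount to, here is a minimal sketch of my own (a toy example, not one taken from the book) calling the mixtools EM fitter on simulated two-component Gaussian data:

# toy illustration with the mixtools package (not from the book)
library(mixtools)
set.seed(1)
x <- c(rnorm(200, 0, 1), rnorm(100, 4, 0.5))  # simulated two-component sample
fit <- normalmixEM(x, k = 2)                  # EM for a 2-component Gaussian mixture
fit$lambda                                    # estimated weights
fit$mu                                        # estimated component means
fit$sigma                                     # estimated component standard deviations

Which is about the level of detail one gets from the book’s R sections as well.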
The above thus explains why I am not particularly excited about this bibliographical addition to the analysis of mixtures. It does offer a reference for researchers in the field by adding recent references and approaches to the existing books mentioned above, but I could not recommend it as a textbook (as suggested on p.xiii).
[Disclaimer about potential self-plagiarism: this post or an edited version may eventually appear in my Books Review section in CHANCE.]