Mixture models: building a parameter space

Statistics and Computing

Abstract

Despite the flexibility and popularity of mixture models, their associated parameter spaces are often difficult to represent due to fundamental identification problems. This paper presents a novel way of representing such a space for general mixtures of exponential families, in which the parameters are identifiable and interpretable and whose tractable geometric structure allows fast computational algorithms to be constructed.

References

  • Anaya-Izquierdo, K., Marriott, P.: Local mixture models of exponential families. Bernoulli 13, 623–640 (2007)

  • Celeux, G.: Mixture models for classification. In: Decker, R., Lenz, H.-J. (eds.) Advances in Data Analysis, pp. 3–14. Springer, Berlin (2007)

  • Chen, J., Kalbfleisch, J.: Penalized minimum-distance estimates in finite mixture models. Can. J. Stat. 24(2), 167–175 (1996)

  • Cutler, A., Windham, M.P.: Information-based validity functionals for mixture analysis. In: Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, pp. 149–170. Kluwer, Amsterdam (1994)

  • Donoho, D.L.: One-sided inference about functionals of a density. Ann. Stat. 16, 1390–1420 (1988)

  • Everitt, B.S.: An introduction to finite mixture distributions. Stat. Methods Med. Res. 5(2), 107–127 (1996)

  • Gan, L., Jiang, J.: A test for global maximum. J. Am. Stat. Assoc. 94(447), 847–854 (1999)

  • Hall, P., Stewart, M.: Theoretical analysis of power in a two-component normal mixture model. J. Stat. Plan. Inference 134, 158–179 (2005)

  • Leroux, B.G., et al.: Consistent estimation of a mixing distribution. Ann. Stat. 20(3), 1350–1360 (1992)

  • Li, P., Chen, J.: Testing the order of a finite mixture. J. Am. Stat. Assoc. 105(491), 1084–1092 (2010)

  • Li, P., Chen, J., Marriott, P.: Non-finite Fisher information and homogeneity: an EM approach. Biometrika, 1–16 (2008)

  • Lindsay, B.G.: Mixture Models: Theory, Geometry and Applications. Institute of Mathematical Statistics (1995)

  • Lindsay, B.G., Roeder, K.: Uniqueness of estimation and identifiability in mixture models. Can. J. Stat. 21(2), 139–147 (1993)

  • Maciejowska, K.: Assessing the number of components in a normal mixture: an alternative approach. University Library of Munich (No. 50303) (2013)

  • Maroufy, V., Marriott, P.: Generalizing the frailty assumptions in survival analysis. arXiv:1510.02425 (2015)

  • Marriott, P.: On the local geometry of mixture models. Biometrika 89, 77–93 (2002)

  • Marriott, P.: Extending local mixture models. AISM 59, 95–110 (2006)

  • McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2000)

  • Morris, C.: Natural exponential families with quadratic variance functions. Ann. Stat. 10(1), 65–80 (1982)

  • Richardson, S., Green, P.J.: On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. R. Stat. Soc. B 59, 731–792 (1997)

  • Schlattmann, P.: Medical Applications of Finite Mixture Models. Springer, Berlin (2009)

  • Shun, Z., McCullagh, P.: Laplace approximation of high dimensional integrals. J. R. Stat. Soc. Ser. B (Methodological) 57(4), 749–760 (1995)

  • Struik, D.J.: Lectures on Classical Differential Geometry. Dover Publications, Mineola (1988)

  • Tallis, G.: The identifiability of mixtures of distributions. J. Appl. Prob. 6(2), 389–398 (1969)

Author information

Corresponding author

Correspondence to Vahed Maroufy.

Appendix

Following the methodology in Sect. 2.1, select suitably separated grid points \(\varvec{\mu }=(\mu _1,\ldots ,\mu _L)\), which are fixed throughout. Select initial proportions \(\varvec{\rho }^{(0)}=(\rho _1^{(0)},\ldots ,\rho _L^{(0)})\) and local mixture parameters \(\underline{\varvec{\lambda }}^{(0)} =(\varvec{\lambda }^{1,(0)},\ldots ,\varvec{\lambda }^{L,(0)})\). Suppose we have \(\varvec{\rho }^{(r)}\) and \(\underline{\varvec{\lambda }}^{(r)}\) at step \(r\), where \(L_r \le L\) is the number of non-zero proportions. To obtain the estimates at step \(r+1\), run the following steps.

  1. Calculate \(\rho _l^{(r+1)}=\frac{n_l}{n}\), where \(n_l= \sum \nolimits _{i=1}^{n}w^{(r+1)}_{il}\) and, for \(i=1,\ldots ,n;\,\, l=1,\ldots ,L_r\),

    $$\begin{aligned} w^{(r+1)}_{il}=\frac{\rho _l^{(r)} g_{\mu _l}\left( x_i,\varvec{\lambda }^{l,(r)}\right) }{\sum \nolimits _{l'=1}^{L_r} \rho _{l'}^{(r)} g_{\mu _{l'}}\left( x_i,\varvec{\lambda }^{l',(r)}\right) }. \end{aligned}$$

  2. Choose a value \(0 < \gamma < 1\), and check whether there is any \(l\) such that \(\rho _l^{(r+1)} < \gamma \).

    (a) If yes: exclude the components with \(\rho _l^{(r+1)} < \gamma \), update \(L_r\rightarrow L_{r+1}\) and go back to step 1.

    (b) If no: go to step 3.

  3. Classify the data set into \(\varvec{x}^1,\ldots ,\varvec{x}^{L_{r+1}}\) by assigning each \(x_i\) to exactly one local mixture component. For each \(l=1,\ldots ,L_{r+1}\), update \(\varvec{\lambda }^{l,(r)}\) by

    $$\begin{aligned} \varvec{\lambda }^{l,(r+1)}= \operatorname*{arg\,max}_{\varvec{\lambda } \in \Lambda _{\mu _l}} l_{\mu _l}(\varvec{x}^l,\varvec{\lambda }), \end{aligned}$$

    where \( l_{\mu _l}(\varvec{x}^l,\cdot )\) is the log-likelihood function for component \(l\) as defined in Marriott (2002).
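Steps 1 and 2, together with the classification part of step 3, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `normal_pdf`, `update_proportions` and `hard_classify` are hypothetical. A unit-variance normal density stands in for the local mixture density \(g_{\mu _l}\), whose form is not reproduced in this appendix. Restarting step 1 from the updated proportions of the surviving components, and classifying each observation by its largest weight, are also assumptions, since the text does not fix either choice. The likelihood maximisation over \(\varvec{\lambda }\) in step 3 is omitted.

```python
import numpy as np


def normal_pdf(x, mu, sigma=1.0):
    """Stand-in component density (ASSUMPTION): N(mu, sigma^2), not the paper's g_mu."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def update_proportions(x, mu, rho, gamma, density=normal_pdf):
    """Steps 1-2: update the proportions and drop components with rho_l < gamma.

    Returns (mu, rho, w) restricted to the surviving components, where
    w[i, l] is the weight of observation i on component l.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    rho = np.asarray(rho, dtype=float)
    while True:
        # Step 1: w[i, l] = rho_l g_l(x_i) / sum_l' rho_l' g_l'(x_i)
        num = rho[None, :] * density(x[:, None], mu[None, :])  # shape (n, L_r)
        w = num / num.sum(axis=1, keepdims=True)
        new_rho = w.mean(axis=0)  # equals n_l / n
        # Step 2: look for proportions below the threshold gamma
        keep = new_rho >= gamma
        if keep.all():
            return mu, new_rho, w  # step 2(b): nothing to exclude
        if not keep.any():
            raise ValueError("gamma is too large: every component would be excluded")
        # Step 2(a): exclude the small components and go back to step 1.
        # ASSUMPTION: restart from the updated proportions of the survivors;
        # the weights are invariant to rescaling rho, so no renormalisation is needed.
        mu, rho = mu[keep], new_rho[keep]


def hard_classify(w):
    """Classification in step 3 (ASSUMPTION): assign x_i to its largest-weight component."""
    return w.argmax(axis=1)
```

With two well-separated clusters and a third grid point far from all the data, the distant component receives a negligible proportion, is excluded in step 2(a), and the remaining proportions are recomputed over the two survivors.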

Remark 1

Step 2 restricts the number of components used to fit a data set so that each retained local mixture component has enough information to support inference. The value of \(\gamma \) influences the final result of the algorithm in much the same way that an initial value affects the convergence of a general EM algorithm (Table 1).
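To make the role of the threshold concrete, note that because \(\rho _l^{(r+1)}=n_l/n\), the exclusion rule in Step 2 is equivalent to a lower bound on the effective number of observations \(n_l\) attached to a component. The numbers below are hypothetical and chosen only for illustration; they are not taken from the paper.

$$\begin{aligned} \rho _l^{(r+1)}< \gamma \iff n_l < \gamma n, \qquad \text {e.g. } n=200,\ \gamma =0.05 \ \Rightarrow \ \text {components with } n_l < 10 \text { are excluded.} \end{aligned}$$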


Cite this article

Maroufy, V., Marriott, P. Mixture models: building a parameter space. Stat Comput 27, 591–597 (2017). https://doi.org/10.1007/s11222-016-9641-6
