Distribution mixtures with product components have been applied repeatedly to identify clusters in multivariate data. Unfortunately, for categorical variables the mixture parameters are not uniquely identifiable, and therefore the result of cluster analysis may become questionable. We give a simple proof that any non-degenerate discrete product mixture can be equivalently described by infinitely many different parameter sets. Nevertheless, a unique result of cluster analysis can be guaranteed by additional constraints. We propose a heuristic method of sequential estimation of components to guarantee a unique identification of clusters by means of the EM algorithm. The application of the method is illustrated by a numerical example.
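The non-identifiability claim can be checked numerically on a small case. The sketch below is an illustration under stated assumptions, not necessarily the construction used in the paper's proof: it takes a hypothetical two-component mixture of Bernoulli product distributions over three binary variables and splits the first component into two components that differ only in the first variable. The helper names `joint` and `split_first_component`, the parameter values, and the choice of binary variables are all assumptions made for the example. Every admissible value of `eps` gives a different parameter set with the same joint distribution.

```python
import itertools
import numpy as np

def joint(weights, thetas):
    """Joint probabilities of a Bernoulli product mixture over all binary vectors."""
    weights = np.asarray(weights, dtype=float)
    thetas = np.asarray(thetas, dtype=float)  # shape (M, N): P(x_n = 1 | component m)
    n_vars = thetas.shape[1]
    xs = np.array(list(itertools.product([0, 1], repeat=n_vars)))  # shape (2^N, N)
    # Component likelihoods prod_n P(x_n | m), shape (2^N, M).
    comp = np.prod(
        np.where(xs[:, None, :] == 1, thetas[None], 1.0 - thetas[None]), axis=2
    )
    return comp @ weights

def split_first_component(weights, thetas, eps):
    """Replace component 0 by two equally weighted components differing in variable 0.

    The average of (theta + eps) and (theta - eps) is theta, so the mixture
    distribution is unchanged. eps must keep both values inside [0, 1].
    """
    weights = np.asarray(weights, dtype=float)
    thetas = np.asarray(thetas, dtype=float)
    upper, lower = thetas[0].copy(), thetas[0].copy()
    upper[0] += eps
    lower[0] -= eps
    new_weights = np.concatenate(([weights[0] / 2, weights[0] / 2], weights[1:]))
    new_thetas = np.vstack([upper, lower, thetas[1:]])
    return new_weights, new_thetas

# Hypothetical parameters chosen only for illustration.
weights = [0.4, 0.6]
thetas = [[0.3, 0.8, 0.5],
          [0.7, 0.2, 0.9]]
reference = joint(weights, thetas)

for eps in (0.05, 0.1, 0.2):
    w_split, t_split = split_first_component(weights, thetas, eps)
    # Different parameters, identical distribution over all 8 binary vectors.
    assert np.allclose(joint(w_split, t_split), reference)
```

Because `eps` can vary continuously, this single construction already yields infinitely many equivalent parameter sets for the same distribution, which is why cluster assignments derived from the components need additional constraints to be unique.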



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jiří Grim (1)
  1. Institute of Information Theory and Automation of the Czech Academy of Sciences, Prague 8, Czech Republic
