Incremental Mixture Learning for Clustering Discrete Data

  • Konstantinos Blekas
  • Aristidis Likas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3025)

Abstract

This paper presents an efficient approach for clustering discrete data by incrementally building multinomial mixture models through likelihood maximization with the Expectation-Maximization (EM) algorithm. At each step the method adds a new multinomial component to the mixture using a combined scheme of global and local search, thereby addressing the initialization problem of the EM algorithm. In the global search phase, several initial values for the parameters of the new multinomial component are examined; these values are selected from an appropriately defined set of initialization candidates. Two methods are proposed for specifying the elements of this set, based on the agglomerative and the kd-tree clustering algorithms. We investigate the performance of the incremental learning technique on a synthetic and a real dataset, and also provide comparative results against the standard EM-based multinomial mixture model.
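The greedy procedure described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: in particular, the candidate initializations for each new component are drawn here from smoothed random data points, standing in for the paper's agglomerative and kd-tree candidate sets, and the function names (`em_multinomial_mixture`, `incremental_mixture`) are hypothetical.

```python
import numpy as np

def em_multinomial_mixture(X, weights, probs, n_iter=50, eps=1e-10):
    """EM for a multinomial mixture on count data X (n_samples x n_features).

    weights: (k,) mixing coefficients; probs: (k, n_features) event
    probabilities per component. Returns (weights, probs, log_likelihood).
    The constant multinomial coefficient is omitted from the likelihood,
    as it does not depend on the parameters.
    """
    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability.
        log_p = np.log(weights + eps) + X @ np.log(probs + eps).T   # (n, k)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: reweight components and re-estimate event probabilities
        # from the responsibility-weighted counts.
        weights = resp.mean(axis=0)
        counts = resp.T @ X                                         # (k, d)
        probs = counts / (counts.sum(axis=1, keepdims=True) + eps)
    ll = np.logaddexp.reduce(
        np.log(weights + eps) + X @ np.log(probs + eps).T, axis=1).sum()
    return weights, probs, ll

def incremental_mixture(X, max_components=3, n_candidates=5, rng=None):
    """Greedily grow a multinomial mixture one component at a time.

    Global search phase: several candidate initializations for the new
    component are tried (here: Laplace-smoothed parameters from random data
    points); the candidate yielding the highest likelihood after EM is kept.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Start with a single component fitted to the pooled counts.
    weights = np.array([1.0])
    probs = (X.sum(axis=0) / X.sum())[None, :]
    weights, probs, ll = em_multinomial_mixture(X, weights, probs)
    for k in range(2, max_components + 1):
        best = None
        for _ in range(n_candidates):
            x = X[rng.integers(len(X))]
            new_p = (x + 1.0) / (x.sum() + d)   # smoothed candidate parameters
            w = np.concatenate([weights * (1.0 - 1.0 / k), [1.0 / k]])
            p = np.vstack([probs, new_p])
            cand = em_multinomial_mixture(X, w, p)
            if best is None or cand[2] > best[2]:
                best = cand
        weights, probs, ll = best
    return weights, probs, ll
```

On clearly multimodal count data, the two-component fit attains a strictly higher log-likelihood than the single multinomial, which is the signal the greedy scheme uses to justify each added component.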

Keywords

Local Search · Mixture Model · Leaf Node · Synthetic Dataset · Discrete Data



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Konstantinos Blekas¹
  • Aristidis Likas¹
  1. Department of Computer Science, University of Ioannina, Ioannina, Greece
