Estimation of Mixture Models Using Co-EM

  • Steffen Bickel
  • Tobias Scheffer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)

Abstract

We study estimation of mixture models for problems in which multiple views of the instances are available. Examples of this setting include clustering web pages or research papers that have intrinsic (text) and extrinsic (references) attributes. Our optimization criterion quantifies the likelihood and the consensus among models in the individual views; maximizing this consensus minimizes a bound on the risk of assigning an instance to an incorrect mixture component. We derive an algorithm that maximizes this criterion. Empirically, we observe that the resulting clustering method incurs a lower cluster entropy than regular EM for web pages, research papers, and many text collections.
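
The paper's full derivation is not reproduced on this page, but the co-EM scheme the abstract refers to can be illustrated with a minimal sketch: two mixture models, one per view, are trained by alternating E- and M-steps so that each view's M-step consumes the posteriors (responsibilities) computed under the *other* view's model. The sketch below assumes spherical Gaussian components with a shared per-view variance, and a product-of-posteriors rule for the final assignment; the function names (`estep`, `mstep`, `co_em`) and these modeling choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estep(X, means, weights, var):
    """Responsibilities P(component | x) under a spherical Gaussian mixture."""
    d = X.shape[1]
    logp = -0.5 * ((X[:, None, :] - means[None]) ** 2).sum(-1) / var
    logp += np.log(weights) - 0.5 * d * np.log(2 * np.pi * var)
    logp -= logp.max(axis=1, keepdims=True)        # stabilize before exp
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def mstep(X, resp):
    """Weighted ML estimates of means, mixing weights, shared variance."""
    nk = resp.sum(axis=0) + 1e-10
    means = (resp.T @ X) / nk[:, None]
    weights = nk / nk.sum()
    var = (resp * ((X[:, None, :] - means[None]) ** 2).sum(-1)).sum()
    var /= nk.sum() * X.shape[1]
    return means, weights, max(var, 1e-6)

def co_em(X1, X2, k=2, iters=50, seed=0):
    """Two-view co-EM sketch: each view's M-step uses the other view's E-step."""
    rng = np.random.default_rng(seed)
    # seed view 2 from random responsibilities (illustrative initialization)
    resp = rng.dirichlet(np.ones(k), size=len(X1))
    m2, w2, v2 = mstep(X2, resp)
    for _ in range(iters):
        # E-step under the view-2 model, M-step in view 1
        resp = estep(X2, m2, w2, v2)
        m1, w1, v1 = mstep(X1, resp)
        # E-step under the view-1 model, M-step in view 2
        resp = estep(X1, m1, w1, v1)
        m2, w2, v2 = mstep(X2, resp)
    # consensus assignment: product of the two views' posteriors
    final = estep(X1, m1, w1, v1) * estep(X2, m2, w2, v2)
    return final.argmax(axis=1)

if __name__ == "__main__":
    # toy two-view data: two well-separated components, 100 points each
    rng = np.random.default_rng(1)
    X1 = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(3, 1, (100, 5))])
    X2 = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(3, 1, (100, 8))])
    labels = co_em(X1, X2, k=2)
    print(labels[:10], labels[-10:])
```

The defining step is the exchange of responsibilities between views: because each model is refit against posteriors produced by the other, the two models are pushed toward agreement, which is the consensus the abstract's optimization criterion rewards.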

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Steffen Bickel (1)
  • Tobias Scheffer (1)
  1. School of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany
