A Comparative Study on the Use of Labeled and Unlabeled Data for Large Margin Classifiers

  • Hiroya Takamura
  • Manabu Okumura
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)


We propose to use both labeled and unlabeled data with the Expectation-Maximization (EM) algorithm to estimate a generative model, and then use this model to construct a Fisher kernel. Documents are modeled with the Naive Bayes generative probability. Through text categorization experiments, we empirically show that (a) given a sufficient amount of labeled data, the Fisher kernel with labeled and unlabeled data outperforms Naive Bayes classifiers with EM and other methods, (b) the value of additional unlabeled data diminishes once the labeled data is large enough to estimate a reliable model, (c) using categories as latent variables is effective, and (d) larger unlabeled training datasets yield better results.
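To make the construction concrete, the sketch below computes Fisher score vectors for a multinomial Naive Bayes mixture and a linear kernel over them, approximating the Fisher information matrix by the identity (a common simplification). The parameters `class_prior` and `word_prob` stand in for the model estimated by EM from labeled and unlabeled data; all function and variable names here are illustrative, not taken from the paper.

```python
import numpy as np

def fisher_scores(X, class_prior, word_prob):
    """Fisher score vectors for a multinomial Naive Bayes mixture.

    X           : (n_docs, n_words) word-count matrix.
    class_prior : (n_classes,) mixing proportions P(c).
    word_prob   : (n_classes, n_words) word multinomials P(w|c), all > 0.

    Returns an (n_docs, n_classes * n_words) matrix of partial derivatives
    of log P(x) with respect to the P(w|c) parameters, with the Fisher
    information approximated by the identity matrix.
    """
    # log P(x, c) up to an x-dependent constant: log P(c) + sum_w x_w log P(w|c)
    log_joint = np.log(class_prior) + X @ np.log(word_prob).T
    log_joint -= log_joint.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)             # posterior P(c|x)

    # d log P(x) / d P(w|c) = P(c|x) * x_w / P(w|c)
    scores = post[:, :, None] * (X[:, None, :] / word_prob[None, :, :])
    return scores.reshape(X.shape[0], -1)

def fisher_kernel(X1, X2, class_prior, word_prob):
    """Linear kernel between Fisher score vectors of two document sets."""
    U1 = fisher_scores(X1, class_prior, word_prob)
    U2 = fisher_scores(X2, class_prior, word_prob)
    return U1 @ U2.T
```

The resulting kernel matrix can be handed to any large margin classifier that accepts a precomputed Gram matrix; when the categories themselves are used as the latent variables c, the model matches finding (c) above.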





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Hiroya Takamura (1)
  • Manabu Okumura (1)

  1. Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Japan
