A PAC-Style Model for Learning from Labeled and Unlabeled Data

  • Maria-Florina Balcan
  • Avrim Blum
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3559)

Abstract

There has been growing interest in practice in using unlabeled data together with labeled data in machine learning, and a number of different approaches have been developed. However, the assumptions these methods are based on are often quite distinct and not captured by standard theoretical models. In this paper we describe a PAC-style framework that can be used to model many of these assumptions, and we analyze sample-complexity issues in this setting: that is, how much of each type of data one should expect to need in order to learn well, and what the basic quantities are that these numbers depend on. Our model can be viewed as an extension of the standard PAC model in which, in addition to a concept class C, one also proposes a type of compatibility that one believes the target concept should have with the underlying distribution. In this view, unlabeled data can be helpful because it allows one to estimate compatibility over the space of hypotheses and to reduce the size of the search space to those hypotheses that, according to one's assumptions, are a priori reasonable with respect to the distribution. We discuss a number of technical issues that arise in this context, and provide sample-complexity bounds both for uniform-convergence and ε-cover-based algorithms. We also consider algorithmic issues, and give an efficient algorithm for a special case of co-training.
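To make the filtering idea in the abstract concrete, the following is a minimal sketch, not taken from the paper: the hypothesis list, the compatibility function `compatibility(h, x)`, and the tolerance `tau` are illustrative assumptions. It shows how unlabeled data can be used to estimate compatibility and prune a finite hypothesis space before ordinary empirical risk minimization on the labeled sample.

```python
import numpy as np

def semi_supervised_erm(hypotheses, compatibility, X_unlabeled,
                        X_labeled, y_labeled, tau=0.1):
    """Illustrative sketch of compatibility filtering (names are assumptions).

    hypotheses    : finite list of candidate classifiers h(x) -> {0, 1}
    compatibility : function chi(h, x) in [0, 1]; 1 means h "fits" x
                    (e.g., 1 if x lies far from h's decision boundary)
    tau           : assumed tolerance on the estimated unlabeled error rate
    """
    # 1. Use the unlabeled data to estimate each hypothesis' incompatibility
    #    (its "unlabeled error rate") with the underlying distribution.
    def est_unlabeled_error(h):
        return np.mean([1.0 - compatibility(h, x) for x in X_unlabeled])

    # 2. Keep only hypotheses that look a priori reasonable, i.e. whose
    #    estimated incompatibility is at most tau.
    plausible = [h for h in hypotheses if est_unlabeled_error(h) <= tau]
    if not plausible:
        plausible = hypotheses  # fall back if the filter removes everything

    # 3. Ordinary ERM over the (hopefully much smaller) filtered class:
    #    return the surviving hypothesis with the fewest labeled mistakes.
    def labeled_error(h):
        return np.mean([h(x) != y for x, y in zip(X_labeled, y_labeled)])

    return min(plausible, key=labeled_error)
```

The labeled-sample requirement of such a procedure scales with the size (or capacity) of the filtered set rather than of all of C, which is the intuition behind the sample-complexity bounds discussed in the paper.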



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Maria-Florina Balcan, Computer Science Department, Carnegie Mellon University
  • Avrim Blum, Computer Science Department, Carnegie Mellon University
