A PAC-Style Model for Learning from Labeled and Unlabeled Data
There has been growing practical interest in using unlabeled data together with labeled data in machine learning, and a number of different approaches have been developed. However, the assumptions these methods rest on are often quite distinct and are not captured by standard theoretical models. In this paper we describe a PAC-style framework that can be used to model many of these assumptions, and we analyze sample-complexity issues in this setting: that is, how much of each type of data one should expect to need in order to learn well, and which basic quantities these numbers depend on. Our model can be viewed as an extension of the standard PAC model in which, in addition to a concept class C, one also proposes a type of compatibility that one believes the target concept should have with the underlying distribution. In this view, unlabeled data can be helpful because it allows one to estimate compatibility over the space of hypotheses and to reduce the search space to those hypotheses that, according to one's assumptions, are a priori reasonable with respect to the distribution. We discuss a number of technical issues that arise in this context, and provide sample-complexity bounds for both uniform-convergence and ε-cover based algorithms. We also consider algorithmic issues, and give an efficient algorithm for a special case of co-training.
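As an informal illustration of the two-stage view described above, the following minimal sketch first uses unlabeled data to estimate how compatible each hypothesis is with the distribution, then searches only the surviving hypotheses using labeled data. Everything specific here is an assumption made for illustration and is not taken from the abstract: the hypothesis class (one-dimensional thresholds), the margin-style notion of compatibility (a threshold is penalized for every unlabeled point closer than `gamma` to it), the tolerance `tau`, and all function names.

```python
from typing import List, Sequence, Tuple


def predict(t: float, x: float) -> int:
    """Threshold hypothesis h_t: label 1 if x >= t, else 0."""
    return 1 if x >= t else 0


def estimated_incompatibility(t: float, unlabeled: Sequence[float], gamma: float) -> float:
    """Fraction of unlabeled points within distance gamma of the threshold t.

    Assumed compatibility notion: a good threshold passes through a
    low-density region, so few unlabeled points lie close to it.
    """
    close = sum(1 for x in unlabeled if abs(x - t) < gamma)
    return close / len(unlabeled)


def labeled_error(t: float, labeled: Sequence[Tuple[float, int]]) -> float:
    """Empirical error of h_t on the labeled sample."""
    mistakes = sum(1 for x, y in labeled if predict(t, x) != y)
    return mistakes / len(labeled)


def learn(
    thresholds: Sequence[float],
    unlabeled: Sequence[float],
    labeled: Sequence[Tuple[float, int]],
    gamma: float,
    tau: float,
) -> Tuple[List[float], float]:
    """Prune with unlabeled data, then minimize labeled error over survivors."""
    # Stage 1: keep only hypotheses whose estimated incompatibility is at most tau.
    compatible = [
        t for t in thresholds
        if estimated_incompatibility(t, unlabeled, gamma) <= tau
    ]
    if not compatible:
        raise ValueError("no hypothesis satisfies the compatibility tolerance")
    # Stage 2: empirical risk minimization restricted to the pruned set.
    best = min(compatible, key=lambda t: labeled_error(t, labeled))
    return compatible, best


if __name__ == "__main__":
    # Two well-separated clusters of unlabeled points and only two labeled points.
    unlabeled = [0.0, 0.1, 0.2, 0.3, 2.0, 2.1, 2.2, 2.3]
    labeled = [(0.1, 0), (2.2, 1)]
    candidates = [0.15, 0.25, 1.0, 1.2, 2.15]
    survivors, chosen = learn(candidates, unlabeled, labeled, gamma=0.2, tau=0.0)
    print("compatible thresholds:", survivors)
    print("chosen threshold:", chosen)
```

In this toy setting the thresholds that cut through a cluster are discarded before any label is consulted, so the labeled sample only has to distinguish among the few hypotheses that remain. This is meant only to convey the intuition behind the framework, not to reproduce the paper's algorithms or bounds.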
Keywords: Target Function, Concept Class, Label Data, Unlabeled Data, Hypothesis Space