Supervised Learning and Co-training
Co-training under the Conditional Independence Assumption is among the models that demonstrate how radically the need for labeled data can be reduced when a huge amount of unlabeled data is available. In this paper, we explore how much of this saving must be credited solely to the extra assumptions underlying the Co-training model. To this end, we compute general (almost tight) upper and lower bounds on the sample size needed to achieve the success criterion of PAC-learning within the model of Co-training under the Conditional Independence Assumption in a purely supervised setting. The upper bounds lie significantly below the lower bounds for PAC-learning without Co-training. Thus, Co-training saves labeled data even when it is not combined with unlabeled data. On the other hand, the saving is much less radical than the known savings in the semi-supervised setting.
Keywords: Concept Class, Labeled Data, Unlabeled Data, Success Criterion, Target Concept
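The two-view data model behind Co-training under the Conditional Independence Assumption can be illustrated with a small sketch. The following toy example (our own illustration, not code from the paper; all names and the particular distributions are hypothetical) draws instances whose two views are independent given the hidden label, and where each view alone suffices to determine the label via its own target concept:

```python
import random

random.seed(0)

def draw_example():
    """Draw (x1, x2, y): the two views are independent given the label y."""
    y = random.randint(0, 1)
    # Each view is drawn from a label-dependent distribution,
    # independently of the other view (Conditional Independence Assumption).
    x1 = random.choice([0, 1, 2] if y == 0 else [3, 4, 5])
    x2 = random.choice(['a', 'b'] if y == 0 else ['c', 'd'])
    return x1, x2, y

# Per-view target concepts: each view alone predicts the label.
h1 = lambda x1: 0 if x1 <= 2 else 1
h2 = lambda x2: 0 if x2 in ('a', 'b') else 1

sample = [draw_example() for _ in range(1000)]
# By construction, both per-view concepts agree with the label everywhere.
assert all(h1(x1) == y and h2(x2) == y for x1, x2, y in sample)
```

A supervised learner in this model receives labeled pairs (x1, x2, y) and must find a pair of consistent per-view hypotheses; the paper's bounds quantify how many such labeled pairs are needed.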