Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Semi-supervised Learning

  • Sugato Basu
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_609

Synonyms

Semi-supervised classification

Definition

In machine learning and data mining, supervised algorithms (e.g., classification) typically learn a model for predicting an output variable (e.g., the class label in classification) from supervised training data (e.g., data instances annotated with both features and class labels). These algorithms use various techniques to increase the accuracy with which they predict the training data labels, typically by minimizing a loss function that measures the prediction error on the training data. They also use regularization methods to keep the model from overfitting the training data, so that it also predicts well on unseen test data.
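As a minimal sketch of the "minimize a loss, plus regularize" recipe, the example below trains L2-regularized logistic regression by batch gradient descent. The objective is the mean log-loss over the n training instances plus (lam/2)·||w||². The model, the loss, and the names `train_logistic`, `predict`, `lam`, `lr`, and `epochs` are illustrative assumptions and are not taken from this entry. The example uses only the Python standard library.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,   # regularization strength (illustrative default)
    lr: float = 0.5,     # gradient-descent step size (illustrative default)
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    X holds feature vectors and y holds labels in {0, 1}. The bias b is not regularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error on one training instance (gradient of the log-loss w.r.t. z).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The extra lam * wj term is the gradient of the L2 regularizer.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    # Class 1 if the predicted probability is at least 0.5, else class 0.
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)
```

If you train on the one-dimensional points 0 and 1 (class 0) and 3 and 4 (class 1), a larger `lam` yields a smaller weight. The regularizer gives up some fit on the training data in exchange for a simpler model, which is the trade-off described above.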

In semi-supervised learning, unlabeled data (i.e., data instances with only features) are used along with the labeled training data, in an effort to improve the accuracy of the models on the training data and to provide better generalization performance on unseen data. This paradigm is...
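Self-training is one classic way to put unlabeled data to work. A model is first fit on the labeled data. It then labels the unlabeled instances it is most confident about, those instances join the training set, and the model is refit. The sketch below is illustrative only. The nearest-centroid base classifier, the distance-margin confidence score, and the helper names (`centroids`, `predict_with_margin`, `self_train`) are assumptions chosen to keep the example short. They are not a method prescribed by this entry.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroids(X: Sequence[Point], y: Sequence[int]) -> Dict[int, Point]:
    # The "model" of a nearest-centroid classifier: the mean feature vector of each class.
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: tuple(v / counts[c] for v in s) for c, s in sums.items()}


def predict_with_margin(model: Dict[int, Point], x: Point) -> Tuple[int, float]:
    # Predict the nearest centroid's class; confidence is the gap between
    # the distances to the nearest and second-nearest centroids.
    dists = sorted((math.dist(x, mu), c) for c, mu in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else math.inf
    return dists[0][1], margin


def self_train(
    X_lab: Sequence[Point],
    y_lab: Sequence[int],
    X_unlab: Sequence[Point],
    per_round: int = 1,  # must be >= 1 so the unlabeled pool shrinks every round
) -> Dict[int, Point]:
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    while pool:
        model = centroids(X, y)
        scored = [(predict_with_margin(model, x), x) for x in pool]
        # Most confident predictions first (the sort is stable, so ties keep pool order).
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _), x in scored[:per_round]:
            X.append(x)
            y.append(label)
        pool = [x for _, x in scored[per_round:]]
    return centroids(X, y)
```

Consider labeled points at 0 (class 0) and 6 (class 1). With only these, the boundary sits at 3, so the point 4.0 is assigned to class 1. Self-training then absorbs the unlabeled points 1, 7, 8, 9, and 10. The centroids move to 0.5 and 8.0, and the same point is now assigned to class 0. Self-training can also reinforce its own early mistakes. That risk is why this sketch adds only the most confident predictions in each round.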


Recommended Reading

  1. Belkin M, Niyogi P. Semi-supervised learning on manifolds. Technical Report TR-2002-12, The University of Chicago; 2002.
  2. Blum A, Mitchell T. Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theory; 1998. p. 92–100.
  3. Chapelle O, Schölkopf B, Zien A, editors. Semi-supervised learning. Cambridge: MIT Press; 2006.
  4. Collins M, Singer Y. Unsupervised models for named entity classification. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and Very Large Corpora; 1999.
  5. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B. 1977;39(1):1–38.
  6. Hosmer Jr DW. A comparison of iterative maximum likelihood estimates of the parameters of a mixture of two normal distributions under three different types of sample. Biometrics. 1973;29(4):761–70.
  7. Joachims T. Transductive inference for text classification using support vector machines. In: Proceedings of the 16th International Conference on Machine Learning; 1999. p. 200–9.
  8. Nigam K, McCallum A, Thrun S, Mitchell T. Learning to classify text from labeled and unlabeled documents. In: Proceedings of the 15th National Conference on AI; 1998. p. 792–9.
  9. Ratsaby J, Venkatesh SS. Learning from a mixture of labeled and unlabeled examples with parametric side information. In: Proceedings of the 8th Annual Conference on Computational Learning Theory; 1995. p. 412–7.
  10. Scudder HJ. Probability of error of some adaptive pattern-recognition machines. IEEE Trans Inf Theory. 1965;11(3):363–71.
  11. Seeger M. Learning with labeled and unlabeled data. Technical Report, Edinburgh University; 2001.
  12. Vapnik VN, Chervonenkis A. Theory of pattern recognition [in Russian]. Moscow: Nauka; 1974.
  13. Yarowsky D. Unsupervised word sense disambiguation rivaling supervised methods. In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics; 1995. p. 189–96.
  14. Zhu X. Semi-supervised learning literature survey. Computer Sciences Technical Report TR 1530, University of Wisconsin Madison; 2006.
  15. Zhu X, Ghahramani Z, Lafferty J. Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning; 2003.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Google Inc., Mountain View, USA

Section editors and affiliations

  • Dimitrios Gunopulos (1)
  1. Department of Computer Science and Engineering, Bourns College of Engineering, The University of California at Riverside, Riverside, USA