TEST

pp 1–24

On semi-supervised learning

  • A. Cholaquidis
  • R. Fraiman
  • M. Sued
Original Paper

Abstract

Major efforts have been made, mostly in the machine learning literature, to construct good predictors that combine unlabelled and labelled data. These methods are known as semi-supervised. They address the problem of how to take advantage, when possible, of a huge amount of unlabelled data to perform classification in situations where few labelled data are available. This is not always feasible: it depends on whether the labels can be inferred from the distribution of the unlabelled data. Nevertheless, several algorithms have been proposed recently. In this work, we present a new method that, under almost necessary conditions, asymptotically attains the performance of the best theoretical rule as the size of the unlabelled sample goes to infinity, even if the size of the labelled sample remains fixed. Its performance and computational time are assessed through simulations and on the well-known “Isolet” real dataset of phonemes, where a strong dependence on the choice of the initial training sample is shown. The main focus of this work is to elucidate when and why semi-supervised learning works in the asymptotic regime described above. The set of necessary assumptions, although reasonable, shows that semi-supervised methods attain consistency only for very well-conditioned problems.
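The setting described above — a few labelled points plus many unlabelled ones whose geometry carries information about the labels — can be illustrated with a toy self-training loop. This sketch is a generic illustration of the semi-supervised idea, not the algorithm proposed in the paper: it repeatedly labels the unlabelled point closest to the current labelled set using a 1-nearest-neighbour rule, so labels propagate outward through each cluster.

```python
def nearest_label(x, labelled):
    # Predict the label of x by its nearest labelled point (1-NN rule).
    return min(labelled, key=lambda p: abs(p[0] - x))[1]

def self_train(labelled, unlabelled, rounds=10):
    # Toy self-training: in each round, move the unlabelled point nearest
    # to the labelled set into it, tagged with its 1-NN predicted label.
    # Illustrative only; this is NOT the paper's method.
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        if not pool:
            break
        x = min(pool, key=lambda u: min(abs(u - p[0]) for p in labelled))
        pool.remove(x)
        labelled.append((x, nearest_label(x, labelled)))
    return labelled

# Two well-separated clusters, one labelled point per cluster.
labelled = [(0.0, "A"), (10.0, "B")]
unlabelled = [0.5, 1.0, 1.5, 9.5, 9.0, 8.5]
pred = dict(self_train(labelled, unlabelled, rounds=6))
```

Because the clusters are well separated, the propagated labels agree with the cluster structure (`pred[1.5] == "A"`, `pred[8.5] == "B"`); when the clusters overlap, early mistakes are reinforced in later rounds, which mirrors the paper's point that consistency requires well-conditioned problems.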

Keywords

Semi-supervised learning · Small training sample · Consistency

Mathematics Subject Classification

62G08 · 68T05 · 68Q32

Acknowledgements

We thank two referees and an associate editor for their constructive comments and insightful suggestions, which have improved the presentation of the present manuscript. We also thank Damián Scherlis for helpful suggestions.

Copyright information

© Sociedad de Estadística e Investigación Operativa 2019

Authors and Affiliations

  1. Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
  2. Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Buenos Aires, Argentina