## Abstract

Major efforts have been made, mostly in the machine learning literature, to construct good predictors that combine unlabelled and labelled data. These methods are known as semi-supervised. They address the problem of how to take advantage, when possible, of a huge amount of unlabelled data to perform classification in situations where few labelled data are available. This is not always feasible: it depends on whether the labels can be inferred from the distribution of the unlabelled data. Nevertheless, several algorithms have been proposed recently. In this work, we present a new method that, under almost necessary conditions, asymptotically attains the performance of the best theoretical rule as the size of the unlabelled sample goes to infinity, even if the size of the labelled sample remains fixed. Its performance and computational time are assessed through simulations and on the well-known “Isolet” real data set of phonemes, where a strong dependence on the choice of the initial training sample is shown. The main focus of this work is to elucidate when and why semi-supervised learning works in the asymptotic regime described above. The set of necessary assumptions, although reasonable, shows that semi-supervised methods attain consistency only for very well-conditioned problems.
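As a point of reference for the general idea of exploiting unlabelled data (and not as a description of the method proposed in this paper), the classic self-training scheme in the spirit of Scudder (1965) and Fralick (1967), cited below, can be sketched in a few lines: a base rule trained on the labelled sample repeatedly pseudo-labels the "most confident" unlabelled point and absorbs it into the training set. The 1-nearest-neighbour base rule, the one-dimensional data, and all names here are illustrative assumptions.

```python
# Minimal illustration (NOT the paper's method): classic self-training
# with a 1-nearest-neighbour base rule on synthetic one-dimensional data.

def nn_label(x, labelled):
    """Label x by the class of its nearest labelled point."""
    return min(labelled, key=lambda p: abs(p[0] - x))[1]

def self_train(labelled, unlabelled, rounds):
    """Iteratively pseudo-label the unlabelled point closest to the
    current labelled set and move it into that set."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        if not pool:
            break
        # "Most confident" point: the one nearest an already-labelled point.
        x = min(pool, key=lambda u: min(abs(u - p[0]) for p in labelled))
        labelled.append((x, nn_label(x, labelled)))
        pool.remove(x)
    return labelled

# Two well-separated clusters; only one labelled point per class.
labelled = [(0.0, "A"), (10.0, "B")]
unlabelled = [0.5, 1.0, 9.0, 9.5, 10.5]
model = self_train(labelled, unlabelled, rounds=len(unlabelled))
print(nn_label(1.2, model))  # → A
```

With well-separated clusters, the pseudo-labels propagate correctly even from a single labelled point per class; when the clusters overlap, early pseudo-labelling mistakes propagate instead, which is the kind of ill-conditioning the abstract alludes to.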

## Keywords

Semi-supervised learning · Small training sample · Consistency

## Mathematics Subject Classification

62G08 · 68T05 · 68Q32

## Notes

### Acknowledgements

We thank two referees and an associate editor for their constructive comments and insightful suggestions, which have improved the presentation of this manuscript. We also thank Damián Scherlis for helpful suggestions.

## References

- Aaron C, Cholaquidis A, Cuevas A (2017) Stochastic detection of some topological and geometric features. Electron J Stat 11(2):4596–4628. https://doi.org/10.1214/17-EJS1370
- Abdous B, Theodorescu R (1989) On the strong uniform consistency of a new kernel density estimator. Metrika 11:177–194
- Agrawala AK (1970) Learning with a probabilistic teacher. IEEE Trans Autom Control 19:716–723
- Arnold A, Nallapati R, Cohen W (2007) A comparative study of methods for transductive transfer learning. In: Seventh IEEE international conference on data mining workshops (ICDMW)
- Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html
- Azizyan M, Singh A, Wasserman L (2013) Density-sensitive semisupervised inference. Ann Stat 41(2):751–771
- Belkin M, Niyogi P (2004) Semi-supervised learning on Riemannian manifolds. Mach Learn 56:209–239
- Ben-David S, Lu T, Pal D (2008) Does unlabelled data provably help? Worst-case analysis of the sample complexity of semi-supervised learning. In: 21st annual conference on learning theory (COLT). Available at http://www.informatik.uni-trier.de/~ley/db/conf/colt/colt2008.html
- Castelli V, Cover TM (1995) On the exponential value of labeled samples. Pattern Recognit Lett 16(1):105–111
- Castelli V, Cover TM (1996) The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter. IEEE Trans Inf Theory 42(6):2102–2117
- Chapelle O, Schölkopf B, Zien A (eds) (2006) Semi-supervised learning. MIT Press, Cambridge
- Chapelle O, Zien A (2005) Semi-supervised classification by low density separation. In: AISTATS 2005, pp 57–64
- Cholaquidis A, Cuevas A, Fraiman R (2014) On Poincaré cone property. Ann Stat 42:255–284
- Cuevas A, Fraiman R (1997) A plug-in approach to support estimation. Ann Stat 25:2300–2312
- Cuevas A, Rodríguez-Casal A (2004) On boundary estimation. Adv Appl Probab 36:340–354
- Cuevas A, Fraiman R, Pateiro-López B (2012) On statistical properties of sets fulfilling rolling-type conditions. Adv Appl Probab 44:311–329
- Erdös P (1945) Some remarks on the measurability of certain sets. Bull Am Math Soc 51:728–731
- Fanty M, Cole R (1991) Spoken letter recognition. In: Lippman RP, Moody J, Touretzky DS (eds) Advances in neural information processing systems, 3. Morgan Kaufmann, San Mateo
- Federer H (1959) Curvature measures. Trans Am Math Soc 93:418–491
- Fralick SC (1967) Learning to recognize patterns without a teacher. IEEE Trans Inf Theory 13:57–64
- Haffari G, Sarkar A (2007) Analysis of semi-supervised learning with the Yarowsky algorithm. In: Proceedings of the 23rd conference on uncertainty in artificial intelligence (UAI 2007), July 19–22, 2007, Vancouver, BC
- Joachims T (1999) Transductive inference for text classification using support vector machines. In: ICML 16
- Joachims T (2003) Transductive learning via spectral graph partitioning. In: ICML
- Lafferty J, Wasserman L (2008) Statistical analysis of semi-supervised regression. In: Advances in neural information processing systems, pp 801–808
- Nadler B, Srebro N, Zhou X (2009) Statistical analysis of semi-supervised learning: the limit of infinite unlabelled data. In: Bengio Y, Schuurmans D, Lafferty JD, Williams CKI, Culotta A (eds) Advances in neural information processing systems, vol 22. Curran Associates, Inc., pp 1330–1338. http://papers.nips.cc/paper/3652-statistical-analysis-of-semi-supervised-learning-the-limit-of-infinite-unlabelled-data.pdf
- Niyogi P (2008) Manifold regularization and semi-supervised learning: some theoretical analyses. Technical Report TR-2008-01, Computer Science Dept., Univ. of Chicago. Available at http://people.cs.uchicago.edu/~niyogi/papersps/ssminimax2.pdf
- Rigollet P (2007) Generalized error bound in semi-supervised classification under the cluster assumption. J Mach Learn Res 8:1369–1392
- Scudder HJ (1965) Probability of error of some adaptive pattern-recognition machines. IEEE Trans Inf Theory 11:363–371
- Singh A, Nowak RD, Zhu X (2008) Unlabeled data: now it helps, now it doesn’t. Technical report, ECE Dept., Univ. Wisconsin-Madison. Available at www.cs.cmu.edu/~aarti/pubs/SSL_TR.pdf
- Sinha K, Belkin M (2009) Semi-supervised learning using sparse eigenfunction bases. In: Bengio Y, Schuurmans D, Lafferty J, Williams CKI, Culotta A (eds) Advances in neural information processing systems, vol 22. MIT Press, Cambridge, pp 1687–1695
- Vapnik V (1998) Statistical learning theory. Wiley, Hoboken
- Wang J, Shen X, Pan W (2007) On transductive support vector machines. Contemp Math 443:7–20
- Zhu X (2008) Semi-supervised learning literature survey. http://pages.cs.wisc.edu/~jerryzhu/research/ssl/semireview.html