Safe semi-supervised learning: a brief introduction

  • Review Article
Frontiers of Computer Science

Abstract

Semi-supervised learning constructs a predictive model by learning from a few labeled training examples and a large pool of unlabeled ones. It has a wide range of application scenarios and has attracted much attention over the past decades. However, although exploiting unlabeled data is expected to improve learning performance, some empirical studies show that there are situations where using unlabeled data may degrade it. It is therefore desirable to exploit unlabeled data safely. This article reviews research progress in safe semi-supervised learning, focusing on three types of safeness issues: data quality, where the training data is risky or of low quality; model uncertainty, where the learning algorithm fails to handle the uncertainty during training; and measure diversity, where the safe performance could be adapted to diverse measures.
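
As a rough illustration of what "safe" can mean in practice, the sketch below falls back to a purely supervised baseline whenever a semi-supervised model does not match it on held-out labeled data. This is a simple validation-based rule chosen for illustration and is not a method from the article; the function names `accuracy` and `choose_safe_predictions` and the `margin` parameter are hypothetical, and the rule assumes a small labeled validation set is available.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one labeled example is required")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def choose_safe_predictions(
    y_val: Sequence,
    baseline_val_pred: Sequence,
    ssl_val_pred: Sequence,
    baseline_test_pred: Sequence,
    ssl_test_pred: Sequence,
    margin: float = 0.0,
) -> Tuple[str, List]:
    """Hypothetical fallback rule (an assumption, not the article's method).

    Use the semi-supervised predictions only if their validation accuracy
    is at least the supervised baseline's accuracy plus `margin`;
    otherwise return the supervised baseline's predictions.
    """
    base_acc = accuracy(y_val, baseline_val_pred)
    ssl_acc = accuracy(y_val, ssl_val_pred)
    if ssl_acc >= base_acc + margin:
        return "semi-supervised", list(ssl_test_pred)
    return "supervised", list(baseline_test_pred)


if __name__ == "__main__":
    # Toy labels: the semi-supervised model is worse on validation data,
    # so the rule keeps the supervised baseline's test predictions.
    chosen, preds = choose_safe_predictions(
        y_val=[1, 0, 1, 1],
        baseline_val_pred=[1, 0, 0, 1],
        ssl_val_pred=[0, 0, 0, 1],
        baseline_test_pred=[1, 1, 0],
        ssl_test_pred=[0, 1, 0],
    )
    print(chosen, preds)
```

A rule like this only guards against degradation as far as the validation set reflects the test distribution; with very few labeled examples the comparison itself is noisy, which is one reason dedicated safe semi-supervised methods are studied.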

Acknowledgements

This research was supported by the National Key R&D Program of China (2017YFB1001903), the National Natural Science Foundation of China (Grant No. 61772262) and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Yu-Feng Li.

Additional information

Yu-Feng Li received the BSc and PhD degrees in computer science from Nanjing University, China in 2006 and 2013, respectively. He joined the Department of Computer Science & Technology at Nanjing University as an Assistant Researcher in 2013, and is currently an associate professor of the National Key Laboratory for Novel Software Technology, China. He is a member of the LAMDA group. His research interests are mainly in machine learning. In particular, he is interested in semi-supervised learning, statistical learning and optimization. He has published over 30 papers in top-tier journals and conferences such as JMLR, TPAMI, AIJ, ICML, NIPS, AAAI, etc. He has served as a senior program committee member of top-tier AI conferences such as IJCAI15, IJCAI17, AAAI19, and as an editorial board member of machine learning journal special issues. He has received the outstanding doctoral dissertation award from the China Computer Federation (CCF), the outstanding doctoral dissertation award from Jiangsu Province and the Microsoft Fellowship Award.

De-Ming Liang received the BSc degree in 2017. He is currently a master's student in the Department of Computer Science and Technology at Nanjing University, China. His research interests are mainly in machine learning. In particular, he is interested in weakly supervised learning.

Cite this article

Li, YF., Liang, DM. Safe semi-supervised learning: a brief introduction. Front. Comput. Sci. 13, 669–676 (2019). https://doi.org/10.1007/s11704-019-8452-2

  • DOI: https://doi.org/10.1007/s11704-019-8452-2
