Abstract
Semi-supervised learning builds a predictive model from a few labeled training examples together with a large pool of unlabeled ones. It has a wide range of application scenarios and has attracted much attention in the past decades. Although exploiting unlabeled data is expected to improve learning performance, empirical studies show that in some situations the use of unlabeled data may actually degrade performance. It is therefore advisable to exploit unlabeled data safely. This article reviews research progress on safe semi-supervised learning, focusing on three types of safeness issues: data quality, where the training data is risky or of low quality; model uncertainty, where the learning algorithm fails to handle the uncertainty that arises during training; and measure diversity, where safe performance should adapt to diverse performance measures.
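As context for the setting the abstract describes, the following is a minimal self-training sketch (pure NumPy on synthetic data; all names and the nearest-centroid model are illustrative, not the article's method). It shows the step at which unlabeled data enters training, namely assigning pseudo-labels to the unlabeled pool and refitting, which is also the step where low-quality pseudo-labels can degrade performance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes: a few labeled points, many unlabeled.
X_lab = np.vstack([rng.normal(-2, 1, (3, 2)), rng.normal(2, 1, (3, 2))])
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unl = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])

def centroids(X, y):
    # One centroid per class: the mean of the points assigned to it.
    return np.vstack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(C, X):
    # Assign each point to its nearest class centroid.
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return d.argmin(axis=1)

# Self-training loop: pseudo-label the unlabeled pool, refit on everything.
C = centroids(X_lab, y_lab)
for _ in range(5):
    y_pseudo = predict(C, X_unl)
    X_all = np.vstack([X_lab, X_unl])
    y_all = np.concatenate([y_lab, y_pseudo])
    C = centroids(X_all, y_all)

print(predict(C, X_lab))
```

When the model's assumptions match the data, as in this synthetic example, the pseudo-labels sharpen the centroids; when they do not (e.g., misspecified clusters), the same loop propagates its own mistakes, which is the failure mode that motivates safe semi-supervised learning.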
Acknowledgements
This research was supported by the National Key R&D Program of China (2017YFB1001903), the National Natural Science Foundation of China (Grant No. 61772262) and the Fundamental Research Funds for the Central Universities.
Yu-Feng Li received the BSc and PhD degrees in computer science from Nanjing University, China in 2006 and 2013, respectively. He joined the Department of Computer Science & Technology at Nanjing University as an Assistant Researcher in 2013, and is currently an associate professor in the National Key Laboratory for Novel Software Technology, China. He is a member of the LAMDA group. His research interests are mainly in machine learning; in particular, he is interested in semi-supervised learning, statistical learning, and optimization. He has published over 30 papers in top-tier journals and conferences such as JMLR, TPAMI, AIJ, ICML, NIPS, and AAAI. He is a senior program committee member of top-tier AI conferences such as IJCAI'15, IJCAI'17, and AAAI'19, and an editorial board member of Machine Learning journal special issues. He has received the outstanding doctoral dissertation awards of the China Computer Federation (CCF) and of Jiangsu Province, as well as a Microsoft Fellowship Award.
De-Ming Liang received the BSc degree in 2017. He is currently a master student in the Department of Computer Science and Technology at Nanjing University, China. His research interests are mainly in machine learning. Particularly, he is interested in weakly supervised learning.
Li, YF., Liang, DM. Safe semi-supervised learning: a brief introduction. Front. Comput. Sci. 13, 669–676 (2019). https://doi.org/10.1007/s11704-019-8452-2