Abstract
Semi-supervised learning builds a predictive model from a few labeled training examples together with a large pool of unlabeled ones. It has a wide range of application scenarios and has attracted much attention in the past decades. Although exploiting unlabeled data is expected to improve learning performance, empirical studies show that in some situations the use of unlabeled data may actually degrade performance. It is therefore advisable to exploit unlabeled data safely. This article reviews research progress on safe semi-supervised learning, focusing on three types of safeness issues: data quality, where the training data is risky or of low quality; model uncertainty, where the learning algorithm fails to handle the uncertainty that arises during training; and measure diversity, where safe performance should adapt to diverse performance measures.
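As context for the setting the abstract describes, the following is a minimal self-training sketch (pure NumPy on synthetic data; all names and the nearest-centroid model are illustrative, not the article's method). It shows the step at which unlabeled data enters training, namely assigning pseudo-labels to the unlabeled pool and refitting, which is also the step where low-quality pseudo-labels can degrade performance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes: a few labeled points, many unlabeled.
X_lab = np.vstack([rng.normal(-2, 1, (3, 2)), rng.normal(2, 1, (3, 2))])
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unl = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])

def centroids(X, y):
    # One centroid per class: the mean of the points assigned to it.
    return np.vstack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(C, X):
    # Assign each point to its nearest class centroid.
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return d.argmin(axis=1)

# Self-training loop: pseudo-label the unlabeled pool, refit on everything.
C = centroids(X_lab, y_lab)
for _ in range(5):
    y_pseudo = predict(C, X_unl)
    X_all = np.vstack([X_lab, X_unl])
    y_all = np.concatenate([y_lab, y_pseudo])
    C = centroids(X_all, y_all)

print(predict(C, X_lab))
```

When the model's assumptions match the data, as in this synthetic example, the pseudo-labels sharpen the centroids; when they do not (e.g., misspecified clusters), the same loop propagates its own mistakes, which is the failure mode that motivates safe semi-supervised learning.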
Acknowledgements
This research was supported by the National Key R&D Program of China (2017YFB1001903), the National Natural Science Foundation of China (Grant No. 61772262) and the Fundamental Research Funds for the Central Universities.
Yu-Feng Li received the BSc and PhD degrees in computer science from Nanjing University, China in 2006 and 2013, respectively. He joined the Department of Computer Science & Technology at Nanjing University as an Assistant Researcher in 2013, and is currently an associate professor in the National Key Laboratory for Novel Software Technology, China. He is a member of the LAMDA group. His research interests are mainly in machine learning; in particular, he is interested in semi-supervised learning, statistical learning, and optimization. He has published over 30 papers in top-tier journals and conferences such as JMLR, TPAMI, AIJ, ICML, NIPS, and AAAI. He is a senior program committee member of top-tier AI conferences such as IJCAI'15, IJCAI'17, and AAAI'19, and an editorial board member of Machine Learning journal special issues. He has received the outstanding doctoral dissertation awards of the China Computer Federation (CCF) and of Jiangsu Province, as well as a Microsoft Fellowship Award.
De-Ming Liang received the BSc degree in 2017. He is currently a master student in the Department of Computer Science and Technology at Nanjing University, China. His research interests are mainly in machine learning. Particularly, he is interested in weakly supervised learning.
Li, YF., Liang, DM. Safe semi-supervised learning: a brief introduction. Front. Comput. Sci. 13, 669–676 (2019). https://doi.org/10.1007/s11704-019-8452-2