Frontiers of Computer Science, Volume 12, Issue 4, pp 725–735

Instance selection method for improving graph-based semi-supervised learning

  • Hai Wang
  • Shao-Bo Wang
  • Yu-Feng Li
Research Article

Abstract

Graph-based semi-supervised learning is an important semi-supervised learning paradigm. Although graph-based semi-supervised learning methods have been shown to be helpful in various situations, their use of unlabeled data may sometimes hurt performance. In this paper, we propose a new graph-based semi-supervised learning method based on instance selection in order to reduce the chances of performance degeneration. Our basic idea is that, given a set of unlabeled instances, exploiting all of them is not necessarily the best approach; instead, we should exploit the unlabeled instances that are highly likely to help improve the performance, while not taking into account the ones with high risk. We develop both transductive and inductive variants of our method. Experiments on a broad range of data sets show that the chances of performance degeneration of our proposed method are much smaller than those of many state-of-the-art graph-based semi-supervised learning methods.
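The abstract does not spell out the selection rule, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes binary labels in {-1, +1}, uses the harmonic-function solution on an RBF graph as a stand-in graph-based learner, and falls back to a 1-nearest-neighbour prediction for unlabeled instances whose graph score is not confident. All function names, the `gamma` bandwidth, and the `threshold` value are hypothetical choices made for this example.

```python
import numpy as np

def rbf_affinity(X, gamma=1.0):
    """Dense RBF affinity matrix with a zero diagonal (assumed graph construction)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-gamma * sq_dist)
    np.fill_diagonal(W, 0.0)
    return W

def harmonic_scores(W, y_labeled):
    """Harmonic-function scores for the unlabeled nodes.

    Assumes the first len(y_labeled) rows of W are the labeled instances.
    Solves (D_uu - W_uu) f_u = W_ul y_l.
    """
    n_l = len(y_labeled)
    laplacian = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(laplacian[n_l:, n_l:], W[n_l:, :n_l] @ y_labeled)

def nearest_labeled(X_l, y_l, X_u):
    """Baseline that ignores unlabeled data: label of the nearest labeled point."""
    sq_dist = np.sum((X_u[:, None, :] - X_l[None, :, :]) ** 2, axis=-1)
    return y_l[sq_dist.argmin(axis=1)]

def select_and_predict(X_l, y_l, X_u, gamma=1.0, threshold=0.5):
    """Use the graph prediction only where it is confident (hypothetical rule).

    Returns the predicted labels and a boolean mask of the selected
    (confident) unlabeled instances.
    """
    W = rbf_affinity(np.vstack([X_l, X_u]), gamma)
    f_u = harmonic_scores(W, y_l.astype(float))
    baseline = nearest_labeled(X_l, y_l, X_u)
    confident = np.abs(f_u) >= threshold
    pred = np.where(confident, np.sign(f_u), baseline).astype(int)
    return pred, confident
```

Calling `select_and_predict(X_l, y_l, X_u)` returns the labels together with the selection mask, so the high-risk instances that were handed to the baseline can be inspected separately. The confidence threshold is the simplest possible risk criterion and is used here only to make the "select, do not exploit everything" idea concrete.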

Keywords

graph-based semi-supervised learning, performance degeneration, instance selection

Notes

Acknowledgements

The authors want to thank the associate editors and reviewers for helpful comments and suggestions. This research was partially supported by the National Natural Science Foundation of China (Grant No. 61403186), Jiangsu Science Foundation (BK20140613) and MSRA research fund.

Supplementary material

11704_2017_6543_MOESM1_ESM.ppt (971 kb)
Instance selection method for improving graph-based semi-supervised learning

Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  2. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, China
