A Refinement Approach to Handling Model Misfit in Semi-supervised Learning

Su, Hanjing; Chen, Ling; Ye, Yunming; Sun, Zhaocai; Wu, Qingyao

doi:10.1007/978-3-642-17313-4_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6441)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Semi-supervised learning has been a focus of machine learning and data mining research in the past few years. Various algorithms and techniques have been proposed, from generative models to graph-based algorithms. In this work, we focus on cluster-and-label approaches to semi-supervised classification. Existing cluster-and-label algorithms rely on underlying models and/or assumptions: when the data fits the model well, classification accuracy is high; otherwise, it is low. In this paper, we propose a refinement approach to address this model misfit problem in semi-supervised classification. We show that the cluster-and-label technique itself need not be changed to make it more flexible. Instead, we propose successive refinement clustering of the dataset to correct the model misfit. A series of experiments on UCI benchmark data sets shows that the proposed approach outperforms existing cluster-and-label algorithms, as well as traditional semi-supervised classification techniques including Self-training and Tri-training.
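
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "cluster-and-label with successive refinement", not the authors' method. It assumes k-means as the clusterer, a purity test on the labeled points as the sign of model misfit, and a depth limit as the stopping rule. The names `kmeans`, `refine_and_label`, `k`, and `max_depth` are hypothetical and chosen for illustration.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation.

    Returns a list with the cluster index of each point.
    """
    k = min(k, len(points))
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest centroid.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign


def refine_and_label(points, labels, k=2, max_depth=3):
    """Cluster-and-label, re-clustering any cluster whose labels disagree.

    labels[i] is a class label, or None for an unlabeled point.
    Returns a list with a predicted label for every point.
    """
    out = list(labels)

    def visit(idx, depth, fallback):
        seen = Counter(labels[i] for i in idx if labels[i] is not None)
        # A cluster without labeled points inherits its parent's majority.
        majority = seen.most_common(1)[0][0] if seen else fallback
        if len(seen) <= 1 or depth == max_depth or len(idx) < k:
            # Assumed stopping rule: pure cluster, depth limit, or too small.
            for i in idx:
                if labels[i] is None:
                    out[i] = majority
            return
        # Mixed labels are treated as model misfit: refine by re-clustering.
        assign = kmeans([points[i] for i in idx], k)
        for j in range(k):
            sub = [i for i, a in zip(idx, assign) if a == j]
            if sub:
                visit(sub, depth + 1, majority)

    visit(list(range(len(points))), 0, None)
    return out
```

The point of the sketch is that the clusterer is left untouched: a class that occupies two separate regions does not fit a two-cluster model, and the mismatch is handled by splitting only the clusters whose labeled members disagree.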

Author information

Authors and Affiliations

Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China
Hanjing Su, Yunming Ye, Zhaocai Sun & Qingyao Wu
QCIS, Faculty of Engineering and Information Technology, University of Technology, Sydney, 2007, Australia
Ling Chen

Editor information

Editors and Affiliations

Faculty of Engineering and Information Technology, University of Technology Sydney, 2007, Sydney, NSW, Australia
Longbing Cao
College of Computer Science, Chongqing University, 400030, Chongqing, China
Jiang Zhong & Yong Feng

Cite this paper

Su, H., Chen, L., Ye, Y., Sun, Z., Wu, Q. (2010). A Refinement Approach to Handling Model Misfit in Semi-supervised Learning. In: Cao, L., Zhong, J., Feng, Y. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17313-4_8

DOI: https://doi.org/10.1007/978-3-642-17313-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17312-7
Online ISBN: 978-3-642-17313-4
eBook Packages: Computer Science, Computer Science (R0)
