SETRED: Self-training with Editing

Li, Ming; Zhou, Zhi-Hua

doi:10.1007/11430919_71

Ming Li²¹ &
Zhi-Hua Zhou²¹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3518))

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

2843 Accesses
70 Citations

Abstract

Self-training is a semi-supervised learning algorithm in which a learner keeps on labeling unlabeled examples and retraining itself on an enlarged labeled training set. Since the self-training process may erroneously label some unlabeled examples, sometimes the learned hypothesis does not perform well. In this paper, a new algorithm named Setred is proposed, which utilizes a specific data editing method to identify and remove the mislabeled examples from the self-labeled data. In detail, in each iteration of the self-training process, the local cut edge weight statistic is used to help estimate whether a newly labeled example is reliable or not, and only the reliable self-labeled examples are used to enlarge the labeled training set. Experiments show that the introduction of data editing is beneficial, and the learned hypotheses of Setred outperform those learned by the standard self-training algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theory, New York, NY, pp. 92–100 (1998)
Google Scholar
Cohen, I., Cozman, F.G., Sebe, N., Cirelo, M.C., Huang, T.S.: Semisupervised learning of classifier: theory, algorithms, and their application to human-computer interaction. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 1553–1567 (2004)
Article Google Scholar
Jiang, Y., Zhou, Z.-H.: Editing training data for kNN classifiers with neural network ensemble. In: Yin, F.-L., Wang, J., Guo, C. (eds.) ISNN 2004. LNCS, vol. 3173, pp. 356–361. Springer, Heidelberg (2004)
Chapter Google Scholar
Joachims, T.: Transductive inference for text classification using support vector machines. In: Proceedings of the 16th International Conference on Machine Learning, San Francisco, CA, pp. 200–209 (1999)
Google Scholar
Lewis, D., Gale, W.: A sequential algorithm for training text classifiers. In: Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval, Dublin, Ireland, pp. 3–12 (1994)
Google Scholar
McCallum, A., Nigam, K.: Employing EM in pool-based active learning for text classification. In: Proceedings of the 15th International Conference on Machine Learning, Madison, WI, pp. 359–367 (1998)
Google Scholar
Muhlenbach, F., Lallich, S., Zighed, D.A.: Identifying and handling mislabelled instances. Journal of Intelligent Information Systems 39, 89–109 (2004)
Article Google Scholar
Muslea, I., Minton, S., Knoblock, C.A.: Selective sampling with redundant views. In: Proceeding of the 17th International Conference on Machine Learning, Stanford, CA, pp. 621–626 (2000)
Google Scholar
Muslea, I., Minton, S., Knoblock, C.A.: Active + semi-supervised learning = robust multi-view learning. In: Proceeding of the 19th International Conference on Machine Learning, Sydney, Australia, pp. 435–442 (2002)
Google Scholar
Nigam, K., Ghani, R.: Analyzing the effectiveness and applicabilbity of co-training. In: Proceedings of the 9th International Conference on Information and Knowledge Management, Washington, DC, pp. 86–93 (2000)
Google Scholar
Nigam, K., McCallum, A., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Machine Learning 39, 103–134 (2000)
Article MATH Google Scholar
Sarkar, A.: Applying co-training methods to statistical parsing. In: Proceedings of the 2nd Annual Meeting of the North American Chapter of the Association of Computational Linguistics, Pittsburgh, PA, pp. 95–102 (2001)
Google Scholar
Seeger, M.: Learning with labeled and unlabeled data. Technical Report, University of Edinburgh, Edinburgh, UK (2001)
Google Scholar
Seuong, H., Opper, M., Sompolinski, H.: Query by committee. In: Proceedings of the 5th ACM Workshop on Computational Learning Theory, Pittsburgh, PA, pp. 287–294 (1992)
Google Scholar
Wilson, D.R.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man and Cybernetics 2, 408–421 (1972)
Article MATH Google Scholar
Zhou, Z.-H., Chen, K.-J., Jiang, Y.: Exploiting unlabeled data in content-based image retrieval. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 525–536. Springer, Heidelberg (2004)
Chapter Google Scholar
Zighed, D.A., Lallich, S., Muhlenbach, F.: Separability index in supervised learning. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, pp. 475–487. Springer, Heidelberg (2002)
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

National Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
Ming Li & Zhi-Hua Zhou

Authors

Ming Li
View author publications
You can also search for this author in PubMed Google Scholar
Zhi-Hua Zhou
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Asahidai 1-1, 923-12292, Nomi, Japan
Tu Bao Ho
University of Hong Kong, Pokfulam Road, Hong Kong, China
David Cheung
Department of Computer Science and Engineering, Arizona State University, Tempe, Arizona, USA
Huan Liu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Li, M., Zhou, ZH. (2005). SETRED: Self-training with Editing. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science(), vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_71

Download citation

DOI: https://doi.org/10.1007/11430919_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics