SETRED: Self-training with Editing
Self-training is a semi-supervised learning algorithm in which a learner repeatedly labels unlabeled examples and retrains itself on the enlarged labeled training set. Since the self-training process may erroneously label some unlabeled examples, the learned hypothesis sometimes performs poorly. In this paper, a new algorithm named Setred is proposed, which incorporates a data editing method to identify and remove mislabeled examples from the self-labeled data. Specifically, in each iteration of the self-training process, the local cut edge weight statistic is used to estimate whether a newly labeled example is reliable, and only the reliable self-labeled examples are used to enlarge the labeled training set. Experiments show that the introduction of data editing is beneficial, and that the hypotheses learned by Setred outperform those learned by the standard self-training algorithm.
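To make the procedure concrete, below is a minimal Python sketch of the self-training-with-editing loop described above. It assumes a scikit-learn-style base learner exposing `fit`, `predict_proba`, and `classes_`; the function name `setred_fit` and the parameters `k`, `n_candidates`, `max_iter`, and `alpha` are illustrative choices, and an unweighted cut-edge count tested against a binomial null stands in for the paper's weighted local cut edge statistic. This is a sketch of the idea, not the authors' exact implementation.

```python
# Sketch of Setred-style self-training with a cut-edge editing step.
# Assumptions (not from the paper): a sklearn-compatible base learner,
# a binomial test on unweighted cut edges, and the parameter defaults.
import numpy as np
from scipy.stats import binom
from sklearn.neighbors import NearestNeighbors

def setred_fit(base_learner, X_l, y_l, X_u,
               k=3, n_candidates=10, max_iter=10, alpha=0.05):
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        base_learner.fit(X_l, y_l)
        # 1. Self-labeling: pick the unlabeled examples the current
        #    hypothesis is most confident about.
        proba = base_learner.predict_proba(X_u)
        picked = np.argsort(proba.max(axis=1))[-n_candidates:]
        X_cand = X_u[picked]
        y_cand = base_learner.classes_[proba[picked].argmax(axis=1)]
        # 2. Editing: count "cut edges", i.e. labeled neighbors whose
        #    label disagrees with the candidate's self-assigned label.
        nn = NearestNeighbors(n_neighbors=k).fit(X_l)
        _, idx = nn.kneighbors(X_cand)
        keep = []
        for i, y in enumerate(y_cand):
            cut = int(np.sum(y_l[idx[i]] != y))
            # Under the null that the self-assigned label is correct,
            # a neighbor disagrees with probability equal to the
            # complement of class y's prior in the labeled set.
            p0 = 1.0 - np.mean(y_l == y)
            # Right-tail p-value: discard candidates whose neighborhood
            # disagrees more often than the priors can explain.
            if binom.sf(cut - 1, k, p0) >= alpha:
                keep.append(i)
        # 3. Enlarge the labeled set with reliable candidates only.
        X_l = np.vstack([X_l, X_cand[keep]])
        y_l = np.concatenate([y_l, y_cand[keep]])
        mask = np.ones(len(X_u), dtype=bool)
        mask[picked] = False
        X_u = X_u[mask]
    base_learner.fit(X_l, y_l)
    return base_learner
```

In this sketch the editing step is deliberately conservative: a candidate is rejected whenever the number of disagreeing neighbors is significantly larger than the class priors would predict at level `alpha`, which mirrors the paper's idea of filtering out self-labeled examples whose local cut edges carry suspiciously heavy weight.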
Keywords: Generalization Ability · Unlabeled Data · Base Learner · Data Editing · Computational Learning Theory