ICCPOL 2006: Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead, pp 31-41
Using Semi-supervised Learning for Question Classification
Abstract
This paper combines unlabelled questions with labelled ones in semi-supervised learning to improve the performance of question classification. We also propose two modifications to Tri-training, a simple but efficient co-training-style algorithm, to make it better suited to question data. Both avoid bootstrap-sampling the training set to obtain different training sets for the three classifiers: the first uses multiple algorithms for the classifiers in Tri-training, and the second uses multiple algorithms for the classifiers in combination with multiple views. These modifications prevent the error rate at the initial step from increasing, and our experiments show promising results.
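The first proposal can be illustrated with a minimal sketch, under several assumptions. The abstract does not name the three algorithms, so the toy classifiers below (`NaiveBayes`, `NearestNeighbour`, `KeywordVote`), the function names `tri_train` and `classify`, and the example questions and labels are all hypothetical. The sketch also omits the error-rate checks that the full Tri-training algorithm applies before accepting newly labelled examples, and it does not cover the multiple-view variant. It only shows the core idea: the three classifiers differ by algorithm rather than by bootstrap sample, and each is retrained on unlabelled questions that the other two label identically.

```python
from collections import Counter
import math

def tokens(q):
    return q.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes over words, with add-one smoothing."""
    def fit(self, data):
        self.words, self.labels, self.vocab = {}, Counter(), set()
        for q, y in data:
            self.labels[y] += 1
            self.words.setdefault(y, Counter()).update(tokens(q))
            self.vocab.update(tokens(q))
        return self
    def predict(self, q):
        def score(y):
            total = sum(self.words[y].values()) + len(self.vocab)
            return math.log(self.labels[y]) + sum(
                math.log((self.words[y][w] + 1) / total) for w in tokens(q))
        return max(sorted(self.labels), key=score)

class NearestNeighbour:
    """1-nearest neighbour using Jaccard similarity of word sets."""
    def fit(self, data):
        self.data = [(set(tokens(q)), y) for q, y in data]
        return self
    def predict(self, q):
        t = set(tokens(q))
        return max(self.data, key=lambda d: len(t & d[0]) / len(t | d[0]))[1]

class KeywordVote:
    """Each known word votes for the label it appeared with most often."""
    def fit(self, data):
        self.counts = {}
        for q, y in data:
            for w in set(tokens(q)):
                self.counts.setdefault(w, Counter())[y] += 1
        self.default = Counter(y for _, y in data).most_common(1)[0][0]
        return self
    def predict(self, q):
        votes = Counter(self.counts[w].most_common(1)[0][0]
                        for w in tokens(q) if w in self.counts)
        return votes.most_common(1)[0][0] if votes else self.default

def tri_train(labelled, unlabelled, rounds=3):
    makers = [NaiveBayes, NearestNeighbour, KeywordVote]
    # All three classifiers start from the full labelled set: diversity comes
    # from the different algorithms, not from bootstrap sampling.
    models = [make().fit(labelled) for make in makers]
    for _ in range(rounds):
        retrained = []
        for i, make in enumerate(makers):
            j, k = [n for n in range(3) if n != i]
            # Classifier i learns from questions the other two agree on.
            extra = [(q, models[j].predict(q)) for q in unlabelled
                     if models[j].predict(q) == models[k].predict(q)]
            retrained.append(make().fit(labelled + extra))
        models = retrained
    return models

def classify(models, q):
    # Final label is the majority vote of the three classifiers.
    return Counter(m.predict(q) for m in models).most_common(1)[0][0]

# Hypothetical toy data, for illustration only.
labelled = [
    ("who wrote hamlet", "HUMAN"),
    ("who discovered penicillin", "HUMAN"),
    ("where is mount fuji", "LOCATION"),
    ("where was mozart born", "LOCATION"),
    ("how many planets are there", "NUMERIC"),
    ("how many legs does a spider have", "NUMERIC"),
]
unlabelled = [
    "who painted the mona lisa",
    "where is the eiffel tower",
    "how many moons does mars have",
]
models = tri_train(labelled, unlabelled)
print(classify(models, "who invented the telephone"))
```

Because every classifier sees the whole labelled set from the start, none of them begins with the reduced training data that a bootstrap sample would give it, which is the motivation the abstract gives for the change.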
Keywords
Support Vector Machine, Unlabelled Instance, Multiple Algorithm, Dimension Reduction Step, Initial Error Rate