Using Semi-supervised Learning for Question Classification

  • Nguyen Thanh Tri
  • Nguyen Minh Le
  • Akira Shimazu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4285)


This paper investigates using unlabelled questions in combination with labelled ones for semi-supervised learning, with the aim of improving performance on the question classification task. We also propose two modifications to Tri-training, a simple but efficient co-training-style algorithm, to make it better suited to question data. Standard Tri-training bootstrap-samples the training set to obtain a different training set for each of its three classifiers. To avoid this, the first proposal uses a different learning algorithm for each classifier, and the second combines multiple algorithms with multiple views. These modifications keep the error rate at the initial step from increasing, and our experiments show promising results.
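The abstract only outlines the modification, so the following is a minimal, self-contained Python sketch of the idea under stated assumptions, not the authors' implementation. It follows the Tri-training loop of Zhou and Li (2005): each classifier h_i is retrained on the labelled set plus the unlabelled questions on which the other two classifiers agree, subject to their error-based update test. Zhou and Li's subsampling branch is omitted. In line with the first proposal, the three classifiers come from three different algorithms, each trained on the full labelled set instead of a bootstrap sample. Giving each classifier its own feature view approximates the second proposal. The three toy learners (naive Bayes, nearest centroid, 1-nearest-neighbour), the hypothetical `unigrams`/`bigrams` views, the TREC-style coarse labels (HUM, LOC, NUM), and the example questions are all illustrative assumptions. The paper's actual classifiers, features, and data are not shown here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical question "views"; the paper's actual feature sets are not given here.
def unigrams(q):
    return q.lower().rstrip("?").split()

def bigrams(q):
    w = unigrams(q)
    return w + [a + "_" + b for a, b in zip(w, w[1:])]

# Three deliberately different toy learners (stand-ins, not the paper's classifiers).
class NaiveBayes:
    def fit(self, X, y):
        self.prior, self.counts, self.n = Counter(y), defaultdict(Counter), len(y)
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab = {f for c in self.counts.values() for f in c}
        return self
    def predict(self, feats):
        def score(label):  # log prior + Laplace-smoothed log likelihood
            c = self.counts[label]
            total = sum(c.values()) + len(self.vocab)
            return math.log(self.prior[label] / self.n) + sum(
                math.log((c[f] + 1) / total) for f in feats)
        return max(sorted(self.prior), key=score)

class NearestCentroid:
    def fit(self, X, y):
        self.centroids = defaultdict(Counter)
        for feats, label in zip(X, y):
            self.centroids[label].update(feats)
        return self
    def predict(self, feats):
        q = Counter(feats)
        def cosine(c):  # the query norm is the same for every label, so it is omitted
            norm = math.sqrt(sum(v * v for v in c.values())) or 1.0
            return sum(q[f] * c[f] for f in q) / norm
        return max(sorted(self.centroids), key=lambda lab: cosine(self.centroids[lab]))

class OneNN:
    def fit(self, X, y):
        self.data = [(set(f), label) for f, label in zip(X, y)]
        return self
    def predict(self, feats):  # label of the most Jaccard-similar training question
        s = set(feats)
        return max(self.data, key=lambda d: len(s & d[0]) / (len(s | d[0]) or 1))[1]

def train(algo, view, data):
    return algo().fit([view(q) for q, _ in data], [y for _, y in data])

def agreed_error(mj, vj, mk, vk, labeled):
    """Error of h_j and h_k on labeled questions where they agree (1.0 if they never agree)."""
    pairs = [(mj.predict(vj(q)), y) for q, y in labeled
             if mj.predict(vj(q)) == mk.predict(vk(q))]
    return sum(p != y for p, y in pairs) / len(pairs) if pairs else 1.0

def tri_train(learners, labeled, unlabeled, max_rounds=10):
    """learners: three (algorithm_class, view_fn) pairs, each trained on the FULL labeled
    set (no bootstrap sampling). Zhou and Li's subsampling branch is omitted."""
    models = [train(a, v, labeled) for a, v in learners]
    prev_err, prev_size = [0.5] * 3, [0] * 3
    for _ in range(max_rounds):
        updates = [None] * 3
        for i in range(3):
            j, k = [x for x in range(3) if x != i]
            vj, vk = learners[j][1], learners[k][1]
            e = agreed_error(models[j], vj, models[k], vk, labeled)
            if e >= prev_err[i]:
                continue
            # Pseudo-label the unlabeled questions on which h_j and h_k agree.
            Li = [(q, models[j].predict(vj(q))) for q in unlabeled
                  if models[j].predict(vj(q)) == models[k].predict(vk(q))]
            if prev_size[i] == 0:
                prev_size[i] = math.floor(e / (prev_err[i] - e) + 1)
            if prev_size[i] < len(Li) and e * len(Li) < prev_err[i] * prev_size[i]:
                updates[i] = (Li, e)
        if all(u is None for u in updates):
            break  # no classifier was updated in this round
        for i, u in enumerate(updates):
            if u is not None:
                models[i] = train(*learners[i], labeled + u[0])
                prev_err[i], prev_size[i] = u[1], len(u[0])
    return models

def vote(models, learners, q):  # final label by majority vote of the three classifiers
    return Counter(m.predict(v(q)) for m, (_, v) in zip(models, learners)).most_common(1)[0][0]

labeled = [("Who wrote Hamlet?", "HUM"), ("Who invented the telephone?", "HUM"),
           ("Where is the Eiffel Tower?", "LOC"), ("Where is Mount Fuji?", "LOC"),
           ("When did World War II end?", "NUM"), ("How many legs does a spider have?", "NUM")]
unlabeled = ["Who painted the Mona Lisa?", "Where is the Louvre?",
             "How many moons does Mars have?", "When was the telephone invented?"]
learners = [(NaiveBayes, unigrams), (NearestCentroid, unigrams), (OneNN, unigrams)]  # first
models = tri_train(learners, labeled, unlabeled)
print(vote(models, learners, "Who discovered penicillin?"))  # prints HUM
mv_learners = [(NaiveBayes, unigrams), (NearestCentroid, bigrams), (OneNN, bigrams)]  # second
mv_models = tri_train(mv_learners, labeled, unlabeled)
print(vote(mv_models, mv_learners, "Who discovered penicillin?"))
```

In this sketch, diversity among the three classifiers comes from the algorithms, and optionally the views, rather than from resampling. Every classifier therefore starts from the whole labelled set instead of a bootstrap replicate. That is the intuition behind the abstract's point about the initial error rate. The toy data is far too small to say anything about accuracy.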







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nguyen Thanh Tri (1)
  • Nguyen Minh Le (1)
  • Akira Shimazu (1)

  1. School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
