
Automatic Feature Extraction for Question Classification Based on Dissimilarity of Probability Distributions

  • David Tomás
  • José L. Vicedo
  • Empar Bisbal
  • Lidia Moreno
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)

Abstract

Question classification is one of the first tasks carried out in a Question Answering system. In this paper we present a multilingual question classification system based on machine learning techniques. We use Support Vector Machines to classify the questions. All the features needed to train and test this method are extracted automatically and without supervision from statistical information, by comparing the Poisson distributions of single words in two plain-text corpora, one of questions and one of documents. Thus, the system needs nothing but plain text for training, which yields a flexible approach that is easy to adapt to new languages and domains. We have tested it on a bilingual corpus of questions in English and Spanish.
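
The abstract does not say which dissimilarity measure is used, so the following Python sketch is only an illustration of the general idea and not the authors' method. It assumes each word's count in a fixed-size window of text follows a Poisson distribution, estimates one rate per word from a question corpus and one from a document corpus, and scores the word by the Kullback-Leibler divergence between the two Poisson distributions. The function names, the smoothing constant, the window size, and the choice to keep only words that are more frequent in questions are all hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(corpus):
    """Count word occurrences over a list of plain-text strings."""
    counts = Counter()
    for text in corpus:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def poisson_kl(lam_p, lam_q):
    """KL divergence between Poisson(lam_p) and Poisson(lam_q)."""
    return lam_p * math.log(lam_p / lam_q) + lam_q - lam_p


def rank_question_words(questions, documents, window=100, smoothing=0.5):
    """Rank words by how differently they behave in questions and documents.

    The rate of a word is its (smoothed) expected number of occurrences
    per `window` tokens. Only words that are relatively more frequent in
    the question corpus are kept (an assumption of this sketch).
    """
    q_counts = word_counts(questions)
    d_counts = word_counts(documents)
    q_total = sum(q_counts.values())
    d_total = sum(d_counts.values())

    scores = {}
    for word, count in q_counts.items():
        lam_q = (count + smoothing) / q_total * window
        lam_d = (d_counts[word] + smoothing) / d_total * window
        if lam_q > lam_d:
            scores[word] = poisson_kl(lam_q, lam_d)
    # Highest dissimilarity first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In a pipeline of the kind the abstract describes, the top-ranked words could then serve as features for a Support Vector Machine classifier; that step is omitted here because the abstract gives no details of the feature representation.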

Keywords

Support Vector Machine; Statistical Information; Plain Text; Question Answering; Linguistic Knowledge
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • David Tomás (1)
  • José L. Vicedo (1)
  • Empar Bisbal (2)
  • Lidia Moreno (2)
  1. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain
  2. Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain
