Automatic Feature Extraction for Question Classification Based on Dissimilarity of Probability Distributions

  • Conference paper
Advances in Natural Language Processing (FinTAL 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4139)

Abstract

Question classification is one of the first tasks carried out in a Question Answering system. In this paper we present a multilingual question classification system based on machine learning techniques, using Support Vector Machines (SVMs) to classify the questions. All the features needed to train and test the classifier are extracted automatically and without supervision from statistical information, by comparing the Poisson distributions of single words in two plain-text corpora: one of questions and one of documents. As a result, the system needs nothing but plain text for training, which makes the approach flexible and easy to adapt to new languages and domains. We have tested it on a bilingual corpus of questions in English and Spanish.

This work was developed within the framework of the CICYT project R2D2 (TIC2003-07158-C04).
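The abstract does not specify how the Poisson distributions are compared, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes a word's count per text (per question or per document) follows a Poisson distribution whose rate λ is the mean count per text. It also assumes the Kullback-Leibler divergence as the dissimilarity measure, which for two Poisson distributions has the closed form D(λq ‖ λd) = λq·log(λq/λd) + λd − λq. Words whose question-corpus rate departs most from their document-corpus rate are kept as binary features for the SVM. The toy corpora, the function names and the smoothing constant `epsilon` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpora: one of questions, one of ordinary document text.
QUESTIONS = [
    "What is the capital of France?",
    "Who wrote Hamlet?",
    "What is the population of Spain?",
    "Who is the president of Finland?",
]
DOCUMENTS = [
    "The capital of France is Paris and the city is large.",
    "Hamlet is a play written by Shakespeare.",
    "Spain has a population of about 47 million people.",
    "The president of Finland is elected by the people.",
]


def tokenize(text):
    """Lower-case the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def poisson_rates(texts, vocabulary, epsilon=1e-3):
    """Estimate each word's Poisson rate as its mean count per text.

    `epsilon` is an assumed smoothing constant that keeps every rate
    positive, since the divergence below takes log(rate).
    """
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    n = len(texts)
    return {w: counts[w] / n + epsilon for w in vocabulary}


def poisson_kl(lam_p, lam_q):
    """Closed-form KL divergence D(Poisson(lam_p) || Poisson(lam_q))."""
    return lam_p * math.log(lam_p / lam_q) + lam_q - lam_p


def rank_question_words(questions, documents, top_k=10):
    """Rank question words by how far their question-corpus distribution
    departs from their document-corpus distribution (highest first)."""
    vocabulary = {w for q in questions for w in tokenize(q)}
    rate_q = poisson_rates(questions, vocabulary)
    rate_d = poisson_rates(documents, vocabulary)
    scores = {w: poisson_kl(rate_q[w], rate_d[w]) for w in vocabulary}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(vocabulary, key=lambda w: (-scores[w], w))[:top_k]


def to_features(question, feature_words):
    """Binary bag-of-words vector over the selected words (SVM input)."""
    tokens = set(tokenize(question))
    return [1 if w in tokens else 0 for w in feature_words]


if __name__ == "__main__":
    features = rank_question_words(QUESTIONS, DOCUMENTS, top_k=3)
    print(features)  # ['what', 'who', 'wrote']
    print(to_features("Who discovered penicillin?", features))  # [0, 1, 0]
```

With such a tiny corpus the ranking is noisy (`wrote` scores highly only because the documents happen to use `written`), but interrogatives such as `what` and `who`, frequent in questions and absent from running text, rise to the top. The resulting binary vectors could then be passed to any SVM implementation; the SVM step itself is not shown here.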

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tomás, D., Vicedo, J.L., Bisbal, E., Moreno, L. (2006). Automatic Feature Extraction for Question Classification Based on Dissimilarity of Probability Distributions. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_15

  • DOI: https://doi.org/10.1007/11816508_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37334-6

  • Online ISBN: 978-3-540-37336-0

  • eBook Packages: Computer Science, Computer Science (R0)
