Discovering Links Between Lexical and Surface Features in Questions and Answers

  • Soumen Chakrabarti
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3932)


Information retrieval systems based on keyword match are evolving into question answering systems that return short passages or direct answers to questions, rather than URLs pointing to whole pages. Most open-domain question answering systems depend on manually designed hierarchies of question types. A question is first classified into a fixed type, and then hand-engineered rules associated with that type yield keywords and/or predictive annotations that are likely to match indexed answer passages. Here we seek a more data-driven approach, assisted by machine learning. We propose a simple log-linear model over a pair of feature vectors, one derived from the question and the other derived from a candidate passage. Features are extracted using a lexical network and surface context, as in named entity extraction, except that no direct supervision is available in the form of fixed entity types and their examples. Using the log-linear model, we filter candidate passages and see substantial improvement in the mean rank at which the first answer is found. The model parameters distill and reveal linguistic artifacts coupling questions and their answers, which can be used for better annotation and indexing.
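
The abstract does not spell out the exact form of the model, so the following is only a minimal sketch of one plausible reading: a logistic (log-linear) score with one weight per (question feature, passage feature) pair, used to reorder candidate passages. The feature names, the weights, and the helper functions `score`, `rerank`, and `first_answer_rank` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

def score(weights, q_feats, p_feats, bias=0.0):
    """Logistic score summing one weight per (question feature, passage feature) pair.

    Assumption: the pairwise form is an illustrative guess, not the paper's stated model.
    """
    z = bias
    for qf in q_feats:
        for pf in p_feats:
            z += weights.get((qf, pf), 0.0)  # unseen pairs contribute nothing
    return 1.0 / (1.0 + math.exp(-z))

def rerank(weights, q_feats, passages):
    """Return candidate passages sorted by descending model score."""
    return sorted(passages, key=lambda p: -score(weights, q_feats, p["feats"]))

def first_answer_rank(ranked, is_answer):
    """1-based rank of the first passage accepted by is_answer, or None."""
    for i, p in enumerate(ranked, start=1):
        if is_answer(p):
            return i
    return None

# Hypothetical toy weights: a "when" question pairs well with a four-digit
# surface pattern and poorly with a person-like lexical feature.
weights = {
    ("qword=when", "surface=4digit"): 2.0,
    ("qword=when", "hypernym=person"): -1.0,
}
q = {"qword=when"}
passages = [
    {"id": "p1", "feats": {"hypernym=person"}},
    {"id": "p2", "feats": {"surface=4digit"}},
]
ranked = rerank(weights, q, passages)
```

In this toy setup the answer-bearing passage `p2` moves from second to first position after reranking, which is the kind of change the mean-rank-of-first-answer measure would capture. In a trained model, large weights on particular feature pairs would be the "linguistic artifacts" linking question features to answer features.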


Keywords: Information Retrieval; Question Type; Information Retrieval System; Question Answering; Uppercase Letter
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Soumen Chakrabarti (IIT Bombay, India)
