A Statistical Approach For Open Domain Question Answering

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 32))

This chapter investigates a statistical approach to open domain question answering. Although the work presented here centers on maximum entropy models, the required models can be built with any machine learning approach. As discussed in previous chapters, question answering proceeds in three stages. First, the question is analyzed and a prediction is made about the type of answer the user expects. Second, a fast search of the text database retrieves the top documents relevant to the query; these documents have been annotated automatically by a named entity tagger. Finally, the answer tag prediction and the annotated documents are passed to the answer selection stage. Results obtained from a trainable answer selection algorithm are reported.
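
To make the first stage concrete, the following is a minimal sketch of answer tag prediction with a maximum entropy (multinomial logistic) model. It is an illustration under stated assumptions, not the chapter's system: the training questions, the tag set (PERSON, LOCATION, DATE), the word-based features, and all function names are hypothetical, and the weights are fit by plain stochastic gradient ascent rather than by an iterative scaling procedure.

```python
import math
from collections import defaultdict

# Hypothetical toy data (not from the chapter): question -> expected answer tag.
TRAIN = [
    ("who wrote hamlet", "PERSON"),
    ("who invented the telephone", "PERSON"),
    ("where is the eiffel tower", "LOCATION"),
    ("where was mozart born", "LOCATION"),
    ("when did the war end", "DATE"),
    ("when was the telephone invented", "DATE"),
]
LABELS = sorted({tag for _, tag in TRAIN})


def features(question):
    """Binary features: every question word, plus the first word as its own cue."""
    words = question.lower().split()
    return {f"w={w}" for w in words} | {f"first={words[0]}"}


def predict_proba(weights, question):
    """p(tag | question) = exp(sum of active feature weights) / Z(question)."""
    feats = features(question)
    scores = {y: sum(weights[(f, y)] for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train(data, epochs=200, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood.

    For each active feature the update is (observed count - expected count).
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for question, gold in data:
            probs = predict_proba(weights, question)
            for f in features(question):
                for y in LABELS:
                    observed = 1.0 if y == gold else 0.0
                    weights[(f, y)] += lr * (observed - probs[y])
    return weights


def predict_answer_tag(weights, question):
    """Return the most probable answer tag for a question."""
    probs = predict_proba(weights, question)
    return max(probs, key=probs.get)


WEIGHTS = train(TRAIN)
print(predict_answer_tag(WEIGHTS, "who discovered penicillin"))
```

In a full pipeline of the kind the abstract describes, the predicted tag would then be matched against the named entity annotations in the retrieved documents during answer selection; that step is omitted here.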



Copyright information

© 2008 Springer

About this chapter

Cite this chapter

Ittycheriah, A. (2008). A Statistical Approach For Open Domain Question Answering. In: Strzalkowski, T., Harabagiu, S.M. (eds) Advances in Open Domain Question Answering. Text, Speech and Language Technology, vol 32. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-4746-6_2
