A Statistical Approach For Open Domain Question Answering

Ittycheriah, Abraham

doi:10.1007/978-1-4020-4746-6_2

Part of the book series: Text, Speech and Language Technology (TLTB, volume 32)

This chapter investigates a statistical approach to open domain question answering. Although the work presented here centers on maximum entropy models, the required models can be built with any machine learning approach. As discussed in previous chapters, question answering proceeds in three stages. First, the question is analyzed and a prediction is made as to what type of answer the user expects. Second, a fast search of the text database retrieves the top documents relevant to the query; these documents have been annotated automatically using a named entity tagger. Finally, the answer tag prediction and the annotated documents are input to the answer selection stage. Results obtained from a trainable answer selection algorithm are reported.
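
The stages named in the abstract (answer type prediction, document retrieval over entity-annotated text, and answer selection) can be illustrated with a minimal Python sketch. This is not the chapter's system: the rule-based type predictor, the term-overlap retrieval, the gazetteer lookup, and the first-match selection are simple stand-ins for the trained maximum entropy components, and every name and data item below (`GAZETTEER`, `DOCUMENTS`, `predict_answer_type`, `retrieve`, `tag_entities`, `select_answer`, `answer`) is hypothetical.

```python
import re

# Hypothetical gazetteer standing in for an automatic named entity tagger.
GAZETTEER = {
    "mozart": "PERSON",
    "salzburg": "LOCATION",
    "vienna": "LOCATION",
    "1756": "DATE",
    "1791": "DATE",
}

# Hypothetical toy text database.
DOCUMENTS = [
    "Mozart was born in Salzburg in 1756",
    "Mozart died in Vienna in 1791",
    "Salzburg is a city in Austria",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def predict_answer_type(question):
    """Stage 1: predict the expected answer tag from the question word.

    A lookup table is used here as a stand-in for a trained classifier.
    """
    first_word = tokenize(question)[0]
    rules = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}
    return rules.get(first_word, "UNKNOWN")


def retrieve(question, documents, top_k=2):
    """Stage 2: rank documents by how many distinct query terms they contain."""
    query = set(tokenize(question))
    scored = [(len(query & set(tokenize(doc))), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:top_k] if score > 0]


def tag_entities(document):
    """Annotate each token with an entity tag, or None if it has no tag."""
    return [(token, GAZETTEER.get(token)) for token in tokenize(document)]


def select_answer(question, answer_type, documents):
    """Stage 3: pick the first entity of the predicted type not in the question."""
    query = set(tokenize(question))
    for doc in documents:
        for token, tag in tag_entities(doc):
            if tag == answer_type and token not in query:
                return token
    return None


def answer(question):
    """Run the three stages in sequence on the toy database."""
    answer_type = predict_answer_type(question)
    top_documents = retrieve(question, DOCUMENTS)
    return select_answer(question, answer_type, top_documents)
```

With this toy data, `answer("Where was Mozart born?")` returns `"salzburg"` and `answer("Who died in Vienna?")` returns `"mozart"`; answers come back lowercased because the tokenizer lowercases all text. A trainable system would replace the lookup table and the first-match rule with learned models that score candidates.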

Author information

Authors and Affiliations

IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA
Abraham Ittycheriah

Editor information

Editors and Affiliations

State University of New York at Albany, 1400 Washington Avenue, Albany, NY 12222, USA
Tomek Strzalkowski
University of Texas at Dallas, Richardson, TX 75083, USA
Sanda M. Harabagiu

About this chapter

Cite this chapter

Ittycheriah, A. (2008). A Statistical Approach For Open Domain Question Answering. In: Strzalkowski, T., Harabagiu, S.M. (eds) Advances in Open Domain Question Answering. Text, Speech and Language Technology, vol 32. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-4746-6_2

DOI: https://doi.org/10.1007/978-1-4020-4746-6_2
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-4744-2
Online ISBN: 978-1-4020-4746-6
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)
