This chapter investigates a statistical approach to open-domain question answering. Although the work presented here centers on maximum entropy models, the required models can be built with any machine learning approach. As discussed in previous chapters, question answering proceeds in three stages. First, the question is analyzed and a prediction is made as to what type of answer the user expects. Second, a fast search of the text database retrieves the top documents relevant to the query; these documents have been annotated automatically using a named entity tagger. Finally, the answer tag prediction and the annotated documents are input to the answer selection stage. Results obtained from a trainable answer selection algorithm are reported.
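The three stages described above can be illustrated with a minimal sketch. This is not the chapter's implementation: the cue-word lookup stands in for a trained answer type classifier (for example a maximum entropy model), word overlap stands in for the fast text search, and the entity annotations are written by hand rather than produced by a named entity tagger. All function names, tag names, and sample documents are hypothetical.

```python
import re
from collections import Counter

# Hypothetical cue words mapped to hypothetical answer tags. A trained
# classifier would replace this lookup in a real system.
ANSWER_TAG_CUES = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def predict_answer_tag(question):
    """Stage 1: guess the expected answer type from the question."""
    for token in tokenize(question):
        if token in ANSWER_TAG_CUES:
            return ANSWER_TAG_CUES[token]
    return "UNKNOWN"


def retrieve(question, documents, top_k=2):
    """Stage 2: rank documents by word overlap with the question."""
    query = Counter(tokenize(question))

    def overlap(doc):
        # Counter intersection keeps the minimum count of each shared word.
        return sum((query & Counter(tokenize(doc["text"]))).values())

    return sorted(documents, key=overlap, reverse=True)[:top_k]


def select_answer(answer_tag, documents):
    """Stage 3: return the first annotated entity with the predicted tag."""
    for doc in documents:
        for entity, tag in doc["entities"]:
            if tag == answer_tag:
                return entity
    return None


# Hand-annotated documents standing in for named entity tagger output.
docs = [
    {"text": "The Eiffel Tower is located in Paris.",
     "entities": [("Paris", "LOCATION")]},
    {"text": "Alexander Graham Bell invented the telephone in 1876.",
     "entities": [("Alexander Graham Bell", "PERSON"), ("1876", "DATE")]},
]

question = "Who invented the telephone?"
tag = predict_answer_tag(question)
answer = select_answer(tag, retrieve(question, docs, top_k=1))
print(tag, "->", answer)
```

In a trainable system of the kind the abstract describes, the first-match rule in `select_answer` would be replaced by a learned scoring function over candidate answers.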
Copyright information
© 2008 Springer
About this chapter
Cite this chapter
Ittycheriah, A. (2008). A Statistical Approach For Open Domain Question Answering. In: Strzalkowski, T., Harabagiu, S.M. (eds) Advances in Open Domain Question Answering. Text, Speech and Language Technology, vol 32. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-4746-6_2
DOI: https://doi.org/10.1007/978-1-4020-4746-6_2
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-4744-2
Online ISBN: 978-1-4020-4746-6
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)