The Web is becoming one of the largest repositories of information and knowledge. Many large-scale search engines (Google, Fast, Northern Light, etc.) have emerged to help users find information. In this paper, we study how to use these existing search engines effectively to mine the Web and discover the “correct” answers to factual natural language questions. We propose a probabilistic algorithm called QASM (Question Answering using Statistical Models) that learns the best query paraphrase of a natural language question. We validate our approach on both local and web search engines using questions from the TREC evaluation.
© 2008 Springer
Radev, D.R. et al. (2008). Query Modulation For Web-Based Question Answering. In: Strzalkowski, T., Harabagiu, S.M. (eds) Advances in Open Domain Question Answering. Text, Speech and Language Technology, vol 32. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-4746-6_9
DOI: https://doi.org/10.1007/978-1-4020-4746-6_9
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-4744-2
Online ISBN: 978-1-4020-4746-6