This paper describes our approach to, and experience with, the question-answering tasks of TREC-9 and TREC-2001. Our approach employed techniques from IR, pattern matching and meta-keyword detection, with little linguistic analysis and no natural language understanding. It involved the following four steps: 1) retrieval of top-ranked subdocuments using the content keywords of a question as the query; 2) weighting of sentences from the retrieved subdocuments based on heuristic rules and matching with question keywords; 3) refined weighting and ranking of sentences using agreement with the expected answer type suggested by question analysis; and 4) extraction of answer strings from top-ranked sentences based on the expected answer type and sentence word-pattern rules. The blind experiments in TREC showed that the approach returned reasonably good results, excluding questions with a NIL answer. It works because the questions are mainly of the factoid and definitional types. Analysis shows that our system improves when more subdocuments are retrieved, and when answer candidates from two different retrieval lists are combined by confirmation. It identifies sentences containing answers quite well, but often fails to extract the answer correctly within the 50-byte output limit. These experiments may serve as examples of how far one can go in open-domain question answering without using external resources (e.g. the web) to find answers, and without deeper natural language analysis.
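The four steps can be illustrated with a minimal sketch. It is not the authors' system: plain keyword overlap stands in for their retrieval engine and heuristic rules, and every name here (the stopword list, `ANSWER_TYPE_CUES`, `ANSWER_PATTERNS`, and all functions) is a hypothetical placeholder chosen for illustration.

```python
import re

# Assumption: a tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "was", "did", "do", "does",
             "what", "who", "when", "where", "why", "how", "many", "to", "on"}

# Assumption: question-opening cues mapped to an expected answer type.
ANSWER_TYPE_CUES = [("when", "DATE"), ("how many", "NUMBER")]

# Assumption: one crude surface pattern per answer type.
ANSWER_PATTERNS = {
    "DATE": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),
    "NUMBER": re.compile(r"\b\d+(?:,\d{3})*\b"),
}


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_keywords(question):
    """Question tokens that are not stopwords."""
    return [t for t in tokenize(question) if t not in STOPWORDS]


def expected_answer_type(question):
    """Guess the answer type from the way the question opens."""
    q = question.lower()
    for cue, answer_type in ANSWER_TYPE_CUES:
        if q.startswith(cue):
            return answer_type
    return None


def retrieve(subdocs, keywords, top_n=2):
    """Step 1: rank subdocuments by keyword overlap, keep the top_n."""
    kw = set(keywords)
    ranked = sorted(subdocs, key=lambda d: -len(kw & set(tokenize(d))))
    return ranked[:top_n]


def split_sentences(text):
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def rank_sentences(question, subdocs, top_n=2):
    """Steps 2 and 3: keyword-match weight plus an answer-type bonus."""
    keywords = set(content_keywords(question))
    pattern = ANSWER_PATTERNS.get(expected_answer_type(question))
    scored = []
    for doc in retrieve(subdocs, keywords, top_n):
        for sent in split_sentences(doc):
            score = float(len(keywords & set(tokenize(sent))))  # step 2
            if pattern is not None and pattern.search(sent):    # step 3
                score += 1.0
            scored.append((score, sent))
    scored.sort(key=lambda pair: -pair[0])
    return scored, pattern


def answer(question, subdocs, max_bytes=50):
    """Step 4: extract a short answer string from the best sentence."""
    scored, pattern = rank_sentences(question, subdocs)
    if pattern is None:
        return "NIL"
    for _score, sent in scored:
        match = pattern.search(sent)
        if match:
            raw = match.group(0).encode("utf-8")[:max_bytes]
            return raw.decode("utf-8", "ignore")
    return "NIL"


subdocs = [
    "The Eiffel Tower was completed in 1889. It stands in Paris.",
    "Paris is the capital of France. Many tourists visit each year.",
    "The Brooklyn Bridge opened in 1883.",
]
print(answer("When was the Eiffel Tower completed?", subdocs))
```

In this toy setting the answer-type bonus only breaks ties between sentences with equal keyword overlap. The sketch also makes the abstract's observation concrete: finding the right sentence (steps 1 to 3) is separable from, and easier than, cutting a correct short string out of it (step 4).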
© 2008 Springer
About this chapter
Cite this chapter
Grunfeld, L., Kwok, KL. (2008). Sentence Ranking Using Keywords And Meta-Keywords. In: Strzalkowski, T., Harabagiu, S.M. (eds) Advances in Open Domain Question Answering. Text, Speech and Language Technology, vol 32. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-4746-6_7
DOI: https://doi.org/10.1007/978-1-4020-4746-6_7
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-4744-2
Online ISBN: 978-1-4020-4746-6