Sentence Ranking Using Keywords And Meta-Keywords

  • Chapter
Advances in Open Domain Question Answering

Part of the book series: Text, Speech and Language Technology (TLTB, volume 32)

This paper describes our approach to, and experience with, the question-answering tasks of TREC-9 and TREC-2001. Our approach employed techniques from information retrieval (IR), pattern matching, and meta-keyword detection, with little linguistic analysis and no natural language understanding. It involved the following four steps: 1) retrieval of top-ranked subdocuments using the content keywords of a question as the query; 2) weighting of sentences from the retrieved subdocuments based on heuristic rules and matching with question keywords; 3) refined weighting and ranking of sentences using agreement with the expected answer type suggested by question analysis; and 4) extraction of answer strings from top-ranked sentences based on the expected answer type and sentence word-pattern rules. The blind experiments in TREC showed that the approach returned reasonably good results, excluding questions whose answer is NIL. It works because the questions are mainly of the factoid and definitional types. Analysis shows that our system improves when more subdocuments are retrieved and when answer candidates from two different retrieval lists are combined by confirmation. It can identify sentences containing answers quite well, but it often fails when answers must be extracted correctly within a 50-byte output. These experiments may serve as examples of how far one can get in open-domain question answering without using external resources (e.g. the web) to find answers and without deeper natural language analysis.
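The four steps above can be illustrated with a minimal Python sketch. It is a toy illustration under stated assumptions, not the system described in the chapter: the stopword list, the cue-word table `ANSWER_TYPE_CUES`, the regular expressions in `TYPE_PATTERNS`, the scoring weights, and all function names are hypothetical, and plain keyword overlap stands in for both the retrieval engine and the heuristic sentence-weighting rules.

```python
import re

# Hypothetical stopword list; the rest of a question is treated as content keywords.
STOPWORDS = {"what", "who", "when", "where", "which", "how", "many",
             "is", "was", "did", "the", "of", "a", "an", "in"}

# Hypothetical meta-keyword table: question cue word -> expected answer type.
ANSWER_TYPE_CUES = {"who": "PERSON", "when": "DATE"}

# Toy word-pattern rules per answer type (assumptions for illustration only).
TYPE_PATTERNS = {
    "DATE": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze_question(question):
    """Return (content keywords, expected answer type or None)."""
    tokens = tokenize(question)
    keywords = {t for t in tokens if t not in STOPWORDS}
    answer_type = next(
        (ANSWER_TYPE_CUES[t] for t in tokens if t in ANSWER_TYPE_CUES), None)
    return keywords, answer_type


def retrieve(subdocs, keywords, top_k=2):
    """Step 1: keep the top_k subdocuments by keyword overlap."""
    ranked = sorted(subdocs,
                    key=lambda d: len(keywords & set(tokenize(d))),
                    reverse=True)
    return ranked[:top_k]


def rank_sentences(subdocs, keywords, answer_type):
    """Steps 2 and 3: keyword weight, then a bonus for answer-type agreement."""
    pattern = TYPE_PATTERNS.get(answer_type)
    scored = []
    for doc in subdocs:
        for sent in re.split(r"(?<=[.!?])\s+", doc):
            score = len(keywords & set(tokenize(sent)))  # step 2
            if pattern and pattern.search(sent):         # step 3
                score += 1
            scored.append((score, sent))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def extract_answer(sentence, answer_type, max_bytes=50):
    """Step 4: pull a typed string from the sentence, capped at max_bytes."""
    pattern = TYPE_PATTERNS.get(answer_type)
    match = pattern.search(sentence) if pattern else None
    text = match.group(0) if match else sentence
    # Truncate on a byte budget; drop any partially cut multi-byte character.
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


def answer(question, subdocs):
    """Run the four steps; return "NIL" when no sentence is available."""
    keywords, answer_type = analyze_question(question)
    ranked = rank_sentences(retrieve(subdocs, keywords), keywords, answer_type)
    return extract_answer(ranked[0][1], answer_type) if ranked else "NIL"
```

For example, given the two toy subdocuments "The Eiffel Tower stands in Paris. It was completed in 1889 by the firm of Gustave Eiffel." and "Paris is the capital of France.", the question "When was the Eiffel Tower completed?" ranks the second sentence of the first subdocument highest, because it matches two keywords and contains a year, and the extraction step returns that year. The 50-byte cap mirrors the output limit mentioned in the abstract; a real system would need far richer rules to extract answers reliably within it.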

Copyright information

© 2008 Springer

About this chapter

Cite this chapter

Grunfeld, L., Kwok, KL. (2008). Sentence Ranking Using Keywords And Meta-Keywords. In: Strzalkowski, T., Harabagiu, S.M. (eds) Advances in Open Domain Question Answering. Text, Speech and Language Technology, vol 32. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-4746-6_7
