Sentence Ranking Using Keywords And Meta-Keywords

  • Laszlo Grunfeld
  • Kui-Lam Kwok
Part of the Text, Speech and Language Technology book series (TLTB, volume 32)

This paper describes our approach to, and experience with, the question-answering tasks of TREC-9 and TREC-2001. The approach combines techniques from information retrieval (IR), pattern matching and meta-keyword detection, with little linguistic analysis and no natural language understanding. It involves four steps: 1) retrieval of top-ranked subdocuments using the content keywords of a question as the query; 2) weighting of sentences from the retrieved subdocuments based on heuristic rules and matches with question keywords; 3) refined weighting and ranking of sentences according to their agreement with the expected answer type suggested by question analysis; and 4) extraction of answer strings from the top-ranked sentences based on the expected answer type and sentence word-pattern rules. The blind experiments in TREC showed that the approach returned reasonably good results, excluding questions with a NIL answer. It works because the questions are mainly of the factoid and definitional types. Analysis shows that our system improves when more subdocuments are retrieved, and when answer candidates from two different retrieval lists are combined by confirmation. It identifies sentences containing answers quite well, but it often fails when the answer must be extracted correctly within a 50-byte output. These experiments may serve as examples of how far one can go in open-domain question answering without using external resources (e.g. the web) to find answers, and without deeper natural language analysis.
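As an illustration only, steps 2 to 4 of the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' system: the stopword list, the two answer-type patterns, the 0.5 type bonus and all function names are hypothetical choices made for the example, and step 1 (subdocument retrieval) is assumed to have already produced the candidate sentences.

```python
import re

# Hypothetical stopword list; the chapter's actual list is not given here.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "by",
             "what", "who", "when", "where", "did", "how", "many"}

# Hypothetical surface patterns for two expected answer types.
ANSWER_PATTERNS = {
    "DATE": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_keywords(question):
    """Keep the question tokens that are not stopwords."""
    return {t for t in tokenize(question) if t not in STOPWORDS}


def expected_answer_type(question):
    """Guess an answer type from the question word (assumed mapping)."""
    tokens = tokenize(question)
    first = tokens[0] if tokens else ""
    return {"when": "DATE", "who": "PERSON"}.get(first)


def score_sentence(sentence, keywords, answer_type, type_bonus=0.5):
    """Step 2: keyword overlap. Step 3: bonus if the answer type is present."""
    tokens = set(tokenize(sentence))
    score = len(keywords & tokens) / len(keywords) if keywords else 0.0
    pattern = ANSWER_PATTERNS.get(answer_type)
    if pattern is not None and pattern.search(sentence):
        score += type_bonus
    return score


def rank_sentences(question, sentences):
    """Return (score, sentence) pairs, best first."""
    keywords = content_keywords(question)
    answer_type = expected_answer_type(question)
    scored = [(score_sentence(s, keywords, answer_type), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def extract_answer(sentence, answer_type, max_bytes=50):
    """Step 4: pull the first type-matching string, capped at max_bytes."""
    pattern = ANSWER_PATTERNS.get(answer_type)
    match = pattern.search(sentence) if pattern is not None else None
    if match is None:
        return None
    return match.group(0).encode("utf-8")[:max_bytes].decode("utf-8", "ignore")


if __name__ == "__main__":
    question = "When was the telephone invented?"
    candidates = [
        "The telephone changed communication.",
        "The telephone was invented in 1876 by Alexander Bell.",
        "Paris hosted the fair in 1889.",
    ]
    best_score, best_sentence = rank_sentences(question, candidates)[0]
    print(best_score, best_sentence)
    print(extract_answer(best_sentence, expected_answer_type(question)))
```

The sketch also hints at the failure mode the abstract reports: ranking the right sentence first is comparatively easy, while choosing the right short string inside it depends entirely on how good the extraction patterns are.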

Keywords

Anorexia Nervosa; Information Retrieval; Question Type; Query Word; Mean Reciprocal Rank
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer 2008

Authors and Affiliations

  • Laszlo Grunfeld (1)
  • Kui-Lam Kwok (1)
  1. Queens College of CUNY, Flushing, USA
