Sentence Ranking Using Keywords And Meta-Keywords

Grunfeld, Laszlo; Kwok, Kui-Lam

doi:10.1007/978-1-4020-4746-6_7

Chapter

Part of the book series: Text, Speech and Language Technology (TLTB, volume 32)

This paper describes our approach to, and experience with, the question-answering tasks of TREC-9 and TREC-2001. Our approach employed techniques from IR, pattern matching and meta-keyword detection, with little linguistic analysis and no natural language understanding. It involved the following four steps: 1) retrieval of top-ranked subdocuments using the content keywords of a question as the query; 2) weighting of sentences from the retrieved subdocuments based on heuristic rules and matching with question keywords; 3) refined weighting and ranking of sentences using agreement with the expected answer type suggested by question analysis; and 4) extraction of answer strings from top-ranked sentences based on the expected answer type and sentence word pattern rules. The blind experiments in TREC showed that the approach returned reasonably good results, excluding questions with NIL answers. It works because the questions are mainly of the factoid and definitional types. Analysis shows that our system improves when more subdocuments are retrieved, and when answer candidates from two different retrieval lists are combined by confirmation. It can identify sentences containing answers quite well, but it often fails when answers must be extracted correctly within a 50-byte output. These experiments may serve as examples of how far one can get in open-domain question answering without using external resources (e.g. the web) to find answers, and without deeper natural language analysis.
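
Steps 2 and 3 of the approach (keyword-based sentence weighting, then re-weighting by agreement with the expected answer type) can be illustrated with a minimal Python sketch. This is not the authors' system: the stopword list, the mapping from question words to answer types, the regular expressions and the `type_bonus` value are all hypothetical choices made only for illustration.

```python
import re

# Hypothetical stopword list; not taken from the chapter.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "was", "what", "who",
             "when", "where", "did", "does", "how", "many"}

# Meta-keywords: question words that hint at an expected answer type (illustrative only).
ANSWER_TYPE_CUES = {"who": "PERSON", "when": "DATE", "where": "PLACE"}

# Crude surface patterns standing in for answer-type detection in a sentence.
TYPE_PATTERNS = {
    "DATE": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
    "PLACE": re.compile(r"\b(?:in|at|near) [A-Z][a-z]+\b"),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_keywords(question):
    """Return the question tokens that are not stopwords."""
    return {t for t in tokenize(question) if t not in STOPWORDS}


def expected_answer_type(question):
    """Return the answer type suggested by the first cue word, or None."""
    for tok in tokenize(question):
        if tok in ANSWER_TYPE_CUES:
            return ANSWER_TYPE_CUES[tok]
    return None


def score_sentence(sentence, keywords, answer_type, type_bonus=0.5):
    """Keyword-overlap fraction, plus a bonus if the sentence fits the answer type."""
    tokens = set(tokenize(sentence))
    overlap = len(keywords & tokens) / len(keywords) if keywords else 0.0
    bonus = 0.0
    if answer_type and TYPE_PATTERNS[answer_type].search(sentence):
        bonus = type_bonus
    return overlap + bonus


def rank_sentences(question, sentences):
    """Return (score, sentence) pairs sorted from highest to lowest score."""
    keywords = content_keywords(question)
    answer_type = expected_answer_type(question)
    scored = [(score_sentence(s, keywords, answer_type), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    candidates = [
        "The telephone is a useful device.",
        "The telephone was invented in 1876 by Alexander Bell.",
        "Many things were invented in 1900.",
    ]
    for score, sentence in rank_sentences("When was the telephone invented?", candidates):
        print(f"{score:.2f}  {sentence}")
```

In this toy setup a sentence that matches both question keywords and contains a year-like token outranks sentences that match only one keyword, which mirrors the idea of refining keyword scores with answer-type agreement. Answer extraction within a byte limit (step 4) is not covered by the sketch.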

Author information

Authors and Affiliations

Queens College of CUNY, 11367, Flushing, NY, USA
Laszlo Grunfeld & Kui-Lam Kwok

Editor information

Editors and Affiliations

State University of New York at Albany, 1400 Washington Avenue, 12222, Albany, NY, USA
Tomek Strzalkowski
University of Texas at Dallas, 75083, Richardson, TX, USA
Sanda M. Harabagiu

About this chapter

Cite this chapter

Grunfeld, L., Kwok, KL. (2008). Sentence Ranking Using Keywords And Meta-Keywords. In: Strzalkowski, T., Harabagiu, S.M. (eds) Advances in Open Domain Question Answering. Text, Speech and Language Technology, vol 32. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-4746-6_7

DOI: https://doi.org/10.1007/978-1-4020-4746-6_7
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-4744-2
Online ISBN: 978-1-4020-4746-6
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)