Combining Sources of Evidence for Recognition of Relevant Passages in Texts

  • Alexander Gelbukh
  • NamO Kang
  • SangYong Han
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3563)

Abstract

Automatically recognizing, within large electronic texts, short self-contained passages relevant to a user query is necessary for fast and accurate access to large text archives. Surprisingly, most search engines provide practically no help with this tedious task; they simply present a list of whole documents that supposedly contain the requested information. We show how different sources of evidence can be combined to assess the quality of the candidate passages in a document and to present the highest-ranked ones to the user. Specifically, we take into account the relevance of a passage to the user query, its structural integrity with respect to the paragraphs and sections of the document, and its topic integrity with respect to topic changes and topic threads in the text. Our experiments show promising results.
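The abstract names three sources of evidence but does not say how they are measured or combined. The following is a minimal sketch, not the authors' method, assuming a weighted linear combination with hypothetical weights and deliberately crude proxies: relevance as the fraction of query terms a passage covers, structural integrity as whether the passage starts and ends on paragraph boundaries, and topic integrity as the share of adjacent sentence pairs that share a content word. All names (`rank_passages`, `WEIGHTS`, `STOPWORDS`, and the scoring helpers) are illustrative.

```python
import re

# Hypothetical weights: the abstract names three evidence sources but not
# how they are combined; a weighted linear sum is an assumption here.
WEIGHTS = {"relevance": 0.6, "structure": 0.2, "topic": 0.2}
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "was"}


def tokens(text: str) -> set[str]:
    """Lower-cased content words (no stemming, tiny stop-word list)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def relevance(span: list[str], query: set[str]) -> float:
    """Fraction of query terms that occur anywhere in the passage."""
    words = set().union(*(tokens(s) for s in span))
    return len(query & words) / len(query) if query else 0.0


def structure(start: int, end: int, para_starts: set[int], n: int) -> float:
    """1.0 if the passage starts and ends on paragraph boundaries, 0.5 if only one does."""
    starts_ok = start in para_starts
    ends_ok = end == n or end in para_starts
    return (int(starts_ok) + int(ends_ok)) / 2


def topic(span: list[str]) -> float:
    """Share of adjacent sentence pairs with a common content word.

    A crude stand-in for topic-thread tracking; a single sentence counts as coherent.
    """
    pairs = list(zip(span, span[1:]))
    if not pairs:
        return 1.0
    linked = sum(1 for a, b in pairs if tokens(a) & tokens(b))
    return linked / len(pairs)


def rank_passages(paragraphs: list[list[str]], query: str,
                  max_len: int = 4, top_k: int = 3) -> list[tuple[float, int, int]]:
    """Score every run of 1..max_len consecutive sentences and return the top_k
    as (score, start, end), end exclusive, indexing the flattened sentence list."""
    sentences = [s for p in paragraphs for s in p]
    para_starts, offset = set(), 0
    for p in paragraphs:
        para_starts.add(offset)
        offset += len(p)
    q, n = tokens(query), len(sentences)
    scored = []
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            span = sentences[start:end]
            score = (WEIGHTS["relevance"] * relevance(span, q)
                     + WEIGHTS["structure"] * structure(start, end, para_starts, n)
                     + WEIGHTS["topic"] * topic(span))
            scored.append((score, start, end))
    # Highest score first; ties broken by earlier, then shorter, passage.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:top_k]


if __name__ == "__main__":
    doc = [
        ["Passage retrieval finds short passages.", "Such passages answer the user query."],
        ["The weather was sunny.", "Birds sang outside."],
    ]
    for score, start, end in rank_passages(doc, "passage retrieval for user query"):
        print(f"{score:.2f} sentences {start}..{end - 1}")
```

On this toy document the whole first paragraph (sentences 0..1) ranks first: it covers every query term, aligns with paragraph boundaries, and its two sentences share a content word. Longer spans that cover the same terms are penalized because they cross into an unrelated paragraph and break the topic thread. In a real system, each proxy would likely be replaced by something stronger, such as vector-space relevance or lexical-chain cohesion.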

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Alexander Gelbukh (1, 2)
  • NamO Kang (1)
  • SangYong Han (1)

  1. Chung-Ang University, Korea
  2. National Polytechnic Institute, Mexico