Combining Sources of Evidence for Recognition of Relevant Passages in Texts

Gelbukh, Alexander; Kang, NamO; Han, SangYong

doi:10.1007/11533962_25

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 3563)

Included in the following conference series:

International Symposium and School on Advanced Distributed Systems

Abstract

Automatically recognizing short self-contained passages relevant to a user query in large electronic texts is necessary for fast and accurate access to information in large text archives. Surprisingly, most search engines provide practically no help to the user in this tedious task; they simply present a list of whole documents supposedly containing the requested information. We show how different sources of evidence can be combined to assess the quality of different passages in a document and present the highest-ranked ones to the user. Specifically, we take into account the relevance of a passage to the user query, the structural integrity of the passage with respect to paragraphs and sections of the document, and its topic integrity with respect to topic changes and topic threads in the text. Our experiments show that the results are promising.
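
The idea of combining the three sources of evidence named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract gives no formulas, so every scoring function, the weighted linear combination, the weights, and all names below (`Passage`, `rank_passages`, `section_starts`, and so on) are assumptions made for illustration only. Candidate passages are runs of whole paragraphs, query relevance is plain term coverage, structural integrity penalizes passages that cross a section boundary, and topic integrity is word overlap between adjacent paragraphs.

```python
import re
from dataclasses import dataclass


def tokenize(text):
    """Lowercase word tokens; a stand-in for real stemming and stop-word removal."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Passage:
    start: int    # index of the first paragraph (inclusive)
    end: int      # index past the last paragraph (exclusive)
    score: float  # combined score, higher is better


def query_relevance(passage_tokens, query_terms):
    """Assumed measure: fraction of query terms that occur in the passage."""
    if not query_terms:
        return 0.0
    present = set(passage_tokens)
    return sum(1 for t in query_terms if t in present) / len(query_terms)


def structural_integrity(start, end, section_starts):
    """Assumed measure: 1.0 if no section begins strictly inside the passage."""
    crosses = any(start < s < end for s in section_starts)
    return 0.0 if crosses else 1.0


def topic_integrity(paragraph_tokens):
    """Assumed measure: mean Jaccard overlap of adjacent paragraphs.

    A single-paragraph passage scores 1.0, which favors short passages.
    """
    if len(paragraph_tokens) < 2:
        return 1.0
    overlaps = []
    for a, b in zip(paragraph_tokens, paragraph_tokens[1:]):
        sa, sb = set(a), set(b)
        union = sa | sb
        overlaps.append(len(sa & sb) / len(union) if union else 0.0)
    return sum(overlaps) / len(overlaps)


def rank_passages(paragraphs, section_starts, query, max_len=2,
                  weights=(0.6, 0.2, 0.2)):
    """Score every run of 1..max_len paragraphs and sort best first.

    The weights (relevance, structure, topic) are hypothetical.
    """
    w_rel, w_struct, w_topic = weights
    query_terms = tokenize(query)
    tokens = [tokenize(p) for p in paragraphs]
    candidates = []
    for start in range(len(paragraphs)):
        for end in range(start + 1, min(start + max_len, len(paragraphs)) + 1):
            span = tokens[start:end]
            flat = [t for para in span for t in para]
            score = (w_rel * query_relevance(flat, query_terms)
                     + w_struct * structural_integrity(start, end, section_starts)
                     + w_topic * topic_integrity(span))
            candidates.append(Passage(start, end, score))
    return sorted(candidates, key=lambda p: p.score, reverse=True)


if __name__ == "__main__":
    doc = [
        "Cats are small domestic animals.",
        "Passage retrieval ranks short passages for a user query.",
        "Ranking passages for a query uses several sources of evidence.",
        "Bread is baked from flour and water.",
    ]
    for p in rank_passages(doc, {0, 1, 3}, "passage retrieval query")[:3]:
        print(p)
```

A linear combination is only one possible design; the weights would need tuning against relevance judgments, and a real system would likely use stemming and a term-weighting scheme rather than raw term coverage.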

Work done under partial support of the ITRI of Chung-Ang University, Korea, and, for the first author, the Korean Government (KIPA) and the Mexican Government (SNI, CONACyT). The first author is currently on sabbatical leave at Chung-Ang University.

Author information

Authors and Affiliations

Chung-Ang University, Korea
Alexander Gelbukh, NamO Kang & SangYong Han
National Polytechnic Institute, Mexico
Alexander Gelbukh

Editor information

Editors and Affiliations

Multi-Agent Systems Development Group, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Prolongación López Mateos Sur No. 590, Guadalajara, Jalisco, México
Félix F. Ramos
Department "Sistemas de Informacion", Universidad de Guadalajara, CUCEA, 799, Periferico Norte, Ed. L308, 45100, Zapopan, Jal., Mexico
Victor Larios Rosillo
Computer Science Dept., University of Rostock, 18051, Rostock, Germany
Herwig Unger

About this paper

Cite this paper

Gelbukh, A., Kang, N., Han, S. (2005). Combining Sources of Evidence for Recognition of Relevant Passages in Texts. In: Ramos, F.F., Larios Rosillo, V., Unger, H. (eds) Advanced Distributed Systems. ISSADS 2005. Lecture Notes in Computer Science, vol 3563. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11533962_25

DOI: https://doi.org/10.1007/11533962_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28063-7
Online ISBN: 978-3-540-31674-9
eBook Packages: Computer Science, Computer Science (R0)