Thematic Segment Retrieval Revisited

Lamprier, Sylvain; Amghar, Tassadit; Levrat, Bernard; Saubion, Frédéric

doi:10.1007/978-3-540-85776-1_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5253)

Included in the following conference series:

International Conference on Artificial Intelligence: Methodology, Systems, and Applications

Abstract

Documents, especially long ones, may contain very diverse passages related to different topics. Passage retrieval approaches have shown that, in most cases, there is great potential benefit in considering these passages independently when computing the similarity of a document to a user’s query. Experiments have been conducted to identify the kinds of passages best suited to such a process. Contrary to what might have been expected, working with thematic segments, each of which is likely to cover a single topic, has led to markedly lower effectiveness than using arbitrary sequences of words. In this paper, we show that this paradoxical observation is mainly due to biases induced by the wide variation in length among thematic passages. We therefore propose to cope with these biases by using a more powerful text length normalization technique. Experiments show that, once length biases are set aside, thematic passages are better suited than arbitrary sequences of words for retrieving information relevant to a user’s query.
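
The length bias described in the abstract can be illustrated with a minimal Python sketch. It is not the normalization technique proposed in the paper, whose details are not given here. As a stand-in it uses a simple pivoted length factor, and it ranks a document by its best-scoring passage. The function names, the `slope` parameter, and the toy passages are all assumptions made for illustration.

```python
from collections import Counter

def raw_score(query_terms, passage_tokens):
    """Sum of query-term frequencies in the passage (no length normalization)."""
    counts = Counter(passage_tokens)
    return sum(counts[t] for t in query_terms)

def pivoted_score(query_terms, passage_tokens, avg_len, slope=0.75):
    """Raw score divided by a pivoted length factor (illustrative stand-in).

    slope=0 disables normalization; slope=1 divides by relative length.
    """
    norm = (1.0 - slope) + slope * len(passage_tokens) / avg_len
    return raw_score(query_terms, passage_tokens) / norm

def best_passage(query_terms, passages, normalize):
    """Return (index, score) of the highest-scoring passage."""
    avg_len = sum(len(p) for p in passages) / len(passages)
    if normalize:
        scores = [pivoted_score(query_terms, p, avg_len) for p in passages]
    else:
        scores = [raw_score(query_terms, p) for p in passages]
    best = max(range(len(passages)), key=scores.__getitem__)
    return best, scores[best]

# Toy data (hypothetical): a short focused passage and a long diluted one.
query = ["passage", "retrieval"]
short_passage = ["passage", "retrieval", "ranks", "passage", "text"]
long_passage = ["filler"] * 36 + ["passage", "retrieval", "passage", "retrieval"]
passages = [short_passage, long_passage]

print(best_passage(query, passages, normalize=False))  # long passage wins on raw counts
print(best_passage(query, passages, normalize=True))   # short passage wins once length is accounted for
```

Under these assumptions, the long passage wins on raw counts simply because it has more room for query terms, while the short, focused passage wins once scores are divided by the length factor. This is the kind of effect that wide length variation among thematic segments can produce.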

Author information

Authors and Affiliations

LERIA - University of Angers, 2 Bd Lavoisier, 49000, Angers, France
Sylvain Lamprier, Tassadit Amghar, Bernard Levrat & Frédéric Saubion

Editor information

Danail Dochev Marco Pistore Paolo Traverso

Cite this paper

Lamprier, S., Amghar, T., Levrat, B., Saubion, F. (2008). Thematic Segment Retrieval Revisited. In: Dochev, D., Pistore, M., Traverso, P. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2008. Lecture Notes in Computer Science, vol 5253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85776-1_14

DOI: https://doi.org/10.1007/978-3-540-85776-1_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85775-4
Online ISBN: 978-3-540-85776-1
eBook Packages: Computer Science, Computer Science (R0)