Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5253)

Abstract

Documents, especially long ones, may contain very diverse passages related to different topics. Passage retrieval approaches have shown that, in most cases, there is great potential benefit in considering these passages independently when computing the similarity of a document to a user’s query. Experiments have been conducted to identify the kinds of passages best suited to such a process. Contrary to what might have been expected, working with thematic segments, which are likely to represent only one topic each, has led to markedly lower effectiveness than the use of arbitrary sequences of words. In this paper, we show that this paradoxical observation is mainly due to biases induced by the great length diversity of thematic passages. We therefore propose to cope with these biases by using a more powerful text length normalization technique. Experiments show that, when length biases are set aside, thematic passages are better suited than arbitrary sequences of words for retrieving relevant information in response to a user’s query.
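
To make the length bias concrete, the following minimal Python sketch scores a document by its best passage and divides each raw passage score by a pivoted length normalizer. Pivoted normalization is used here only as a familiar stand-in; it is not the normalization technique proposed in the paper. The function names, the `slope` parameter, the use of the average passage length as pivot, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def passage_score(query_terms, passage_terms, pivot, slope):
    """Raw query-term count divided by a pivoted length normalizer.

    Illustrative only: a real system would use tf-idf or BM25 weights.
    Assumes pivot > 0 and a non-empty passage.
    """
    counts = Counter(passage_terms)
    raw = sum(counts[t] for t in query_terms)
    # slope = 0 ignores passage length; slope = 1 divides by the full length.
    norm = (1.0 - slope) * pivot + slope * len(passage_terms)
    return raw / norm


def best_passage(query_terms, passages, slope=0.25):
    """Return (score, index) of the highest-scoring passage of a document."""
    # Hypothetical choice: the pivot is the average passage length.
    pivot = sum(len(p) for p in passages) / len(passages)
    scored = [
        (passage_score(query_terms, p, pivot, slope), i)
        for i, p in enumerate(passages)
    ]
    return max(scored)


if __name__ == "__main__":
    # Two toy "thematic" passages of very different lengths.
    passages = [["a", "b"], ["a", "a", "c", "d", "e", "f"]]
    for slope in (0.0, 1.0):
        print(slope, best_passage(["a"], passages, slope=slope))
```

With `slope=0.0` the longer passage wins simply because it has more room for query terms, while `slope=1.0` favors the short, dense one. Passages of widely varying length are therefore sensitive to how the normalizer is chosen, which is the kind of bias the abstract refers to.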

Editor information

Danail Dochev, Marco Pistore, Paolo Traverso

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lamprier, S., Amghar, T., Levrat, B., Saubion, F. (2008). Thematic Segment Retrieval Revisited. In: Dochev, D., Pistore, M., Traverso, P. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2008. Lecture Notes in Computer Science, vol 5253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85776-1_14

  • DOI: https://doi.org/10.1007/978-3-540-85776-1_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85775-4

  • Online ISBN: 978-3-540-85776-1

  • eBook Packages: Computer Science, Computer Science (R0)
