Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Processing Overlaps in Structured Text Retrieval

  • Georgina Ramírez
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_279

Synonyms

Controlling overlap; Removing overlap

Definition

In semi-structured text retrieval, processing overlap techniques are used to reduce the amount of overlapping (and thus redundant) information returned to the user. Redundant information appears in result lists because of the nested structure of semi-structured documents, where the same text fragment may be contained in several of the marked-up elements (see Fig. 1). Consequently, when retrieval systems perform a focused search on this type of document and use the marked-up elements as retrieval objects, result lists very often contain overlapping elements. In retrieval applications where the user is assumed not to want to see the same information twice, it may be necessary to reduce or completely remove this overlap and return a ranked list of non-overlapping elements. Thus, depending on the underlying user model and retrieval application, different processing overlap techniques are used in order to decide, given a...
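As a rough illustration of what removing overlap from a ranked list can look like, the sketch below applies one simple strategy: walk the list in score order and keep an element only if it neither contains nor is contained in an element already kept. This is an assumed, simplified strategy, not necessarily one this entry describes. The path notation, the scores, and the names `overlaps` and `remove_overlap` are hypothetical and chosen only for the example.

```python
def overlaps(a: str, b: str) -> bool:
    """Return True if the elements at paths a and b are identical or nested.

    Paths are assumed to be XPath-like strings such as "/article[1]/sec[2]";
    an element overlaps every one of its ancestors and descendants.
    """
    return a == b or b.startswith(a + "/") or a.startswith(b + "/")


def remove_overlap(ranked: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Keep an element only if no higher-scored kept element overlaps it."""
    kept: list[tuple[str, float]] = []
    # Visit elements from highest to lowest score (hypothetical scores).
    for path, score in sorted(ranked, key=lambda r: r[1], reverse=True):
        if not any(overlaps(path, kept_path) for kept_path, _ in kept):
            kept.append((path, score))
    return kept


# Hypothetical result list: a section, one of its paragraphs, the whole
# article, and a paragraph from a different section.
results = [
    ("/article[1]/sec[2]", 0.9),
    ("/article[1]/sec[2]/p[1]", 0.8),
    ("/article[1]", 0.7),
    ("/article[1]/sec[3]/p[2]", 0.6),
]
filtered = remove_overlap(results)
for path, score in filtered:
    print(path, score)
```

In this example the paragraph inside `sec[2]` and the enclosing article are both dropped because they overlap the top-ranked section, while the paragraph in `sec[3]` is kept. Other user models might instead prefer the most specific or the most exhaustive element on each path.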

Recommended Reading

  1. Clarke CLA. Controlling overlap in content-oriented XML retrieval. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2008. p. 314–21.
  2. Geva S. GPX – gardens point XML IR at INEX 2005. In: Proceedings of the 4th International Workshop of the Initiative for the Evaluation of XML Retrieval; 2006. p. 240–53.
  3. Kazai G, Lalmas M, de Vries AP. The overlap problem in content-oriented XML retrieval evaluation. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2007. p. 72–9.
  4. Mass Y, Mandelbrod M. Using the INEX environment as a test bed for various user models for XML retrieval. In: Proceedings of the 4th International Workshop of the Initiative for the Evaluation of XML Retrieval; 2006. p. 187–95.
  5. Mihajlović V, Ramírez G, Westerveld T, Hiemstra D, Blok HE, de Vries AP. TIJAH scratches INEX 2005: vague element selection, image search, overlap and relevance feedback. 2006. p. 72–87.
  6. Sauvagnat K, Hlaoua L, Boughanem M. XFIRM at INEX 2005: ad-hoc and relevance feedback tracks. In: Proceedings of the 4th International Workshop of the Initiative for the Evaluation of XML Retrieval; 2006. p. 88–103.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Yahoo! Research Barcelona, Barcelona, Spain

Section editors and affiliations

  • Jaap Kamps (1)
  1. University of Amsterdam, Amsterdam, The Netherlands