Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Indexing Units of Structured Text Retrieval

  • Jaap Kamps
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_202

Synonyms

Indexing granularity

Definition

The term indexing units refers to the granularity of information in the retrieval system's index. In principle, an indexing unit can be any part of a structured document, and the choice of indexing units determines the possible units of retrieval. There are three basic approaches. The first is to index every potentially retrievable unit as a whole, the so-called element-based approach [13]; because elements nest, the same text is then indexed once for every element that contains it. The second is to index only disjoint nodes and to rely on aggregation or score propagation methods to score higher-level nodes [e.g., 1, 12]. The third is to index only selected elements, for example by keeping particular element types in separate indexes [10]. Various combinations of these approaches have also been applied.
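The three approaches can be contrasted with a minimal Python sketch using only the standard library. Everything in it is an illustrative assumption rather than the method of any cited system: the toy document, the XPath-like path format, the helper names (walk, element_index, disjoint_index, propagate, selective_index), the naive whitespace tokenizer, the chosen element types, and the plain summation of term counts that stands in for real aggregation or score propagation models.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical toy article; tag names and text are illustrative only.
DOC = """<article>
  <title>xml retrieval</title>
  <sec><p>element retrieval of xml</p><p>score propagation</p></sec>
</article>"""


def walk(elem, path=None):
    """Yield (path, element) pairs in document order with XPath-like paths."""
    path = path or f"/{elem.tag}[1]"
    yield path, elem
    seen = Counter()
    for child in elem:
        seen[child.tag] += 1
        yield from walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")


def tokens(text):
    """Naive lowercase whitespace tokenizer (an assumption: no stemming)."""
    return text.lower().split()


def element_index(root):
    """Approach 1: index every element as a whole. Nested elements overlap,
    so a token is stored once per element that contains it."""
    return {p: Counter(tokens(" ".join(e.itertext()))) for p, e in walk(root)}


def own_text(elem):
    """Text directly inside elem: its .text plus the .tail of each child."""
    return " ".join([elem.text or ""] + [c.tail or "" for c in elem])


def disjoint_index(root):
    """Approach 2: index only each node's own text, so every token is stored
    exactly once; nodes without own text get no entry."""
    index = {}
    for p, e in walk(root):
        counts = Counter(tokens(own_text(e)))
        if counts:
            index[p] = counts
    return index


def propagate(index, query, paths):
    """Score a node by summing query-term counts over its own text and all
    its descendants (hypothetical plain-sum propagation)."""
    local = {p: sum(c[t] for t in query) for p, c in index.items()}
    return {p: sum(s for q, s in local.items() if q == p or q.startswith(p + "/"))
            for p in paths}


def selective_index(root, types=("sec", "p")):
    """Approach 3: index only selected element types, one index per type."""
    indexes = {t: {} for t in types}
    for p, e in walk(root):
        if e.tag in indexes:
            indexes[e.tag][p] = Counter(tokens(" ".join(e.itertext())))
    return indexes


root = ET.fromstring(DOC)
elements = element_index(root)
disjoint = disjoint_index(root)
scores = propagate(disjoint, ["xml"], [p for p, _ in walk(root)])
by_type = selective_index(root)

print(sum(sum(c.values()) for c in elements.values()))  # 22 stored tokens
print(sum(sum(c.values()) for c in disjoint.values()))  # 8 stored tokens
print(scores)  # article 2, title 1, sec 1, p[1] 1, p[2] 0
```

On this toy document the element-based index stores 22 tokens against 8 for the disjoint index, while summing the disjoint counts upward reproduces the element-level counts for the query term; in this sketch the trade-off is index size versus work at query time. The selective index has no entry for title, so a title can never be returned, which illustrates how the indexed units determine the possible units of retrieval.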

All approaches make implicit or explicit assumptions about the (most likely) unit of retrieval. Although there may be no designated retrieval unit (such as the document or the root node of the structured document), this also does not mean that...


Recommended Reading

  1. Abolhassani M, Fuhr N, Malik S. HyREX at INEX 2003. In: Fuhr N, Lalmas M, Malik S, editors. Proceedings of the 2nd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2003. p. 27–32.
  2. Burkowski FJ. Retrieval activities in a database consisting of heterogeneous collections of structured text. In: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1992. p. 112–25.
  3. Chiaramella Y. Browsing and querying: two complementary approaches for multimedia information retrieval. In: Proceedings of the Hypertext, Information Retrieval and Multimedia; 1997. p. 9–26.
  4. Cutler M, Shih Y, Meng W. Using the structure of HTML documents to improve retrieval. In: Proceedings of the 1st USENIX Symposium on Internet Technologies and Systems; 1997.
  5. Fuhr N, Gövert N, Rölleke T. DOLORES: a system for logic-based retrieval of multimedia objects. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1996. p. 257–65.
  6. Geva S. GPX – Gardens Point XML IR at INEX 2004. In: Proceedings of the 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2004. p. 211–23.
  7. INEX. INitiative for the Evaluation of XML Retrieval. 2007. http://inex.is.informatik.uni-duisburg.de/.
  8. Kamps J, de Rijke M, Sigurbjörnsson B. Length normalization in XML retrieval. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2007. p. 80–7.
  9. Kazai G, Lalmas M, Reid M. Construction of a test collection for the focussed retrieval of structured documents. In: Proceedings of the 25th European Conference on IR Research; 2003. p. 88–103.
  10. Mass Y, Mandelbrod M. Retrieving the most relevant XML components. In: Proceedings of the 2nd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2003. p. 53–8.
  11. Myaeng SH, Jang DH, Kim MS, Zhoo ZC. A flexible model for retrieval of SGML documents. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1998. p. 138–45.
  12. Ogilvie P, Callan J. Using language models for flat text queries in XML retrieval. In: Proceedings of the 2nd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2003. p. 12–8.
  13. Sigurbjörnsson B, Kamps J, de Rijke M. An element-based approach to XML retrieval. In: Proceedings of the 2nd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2003. p. 19–26.
  14. Trotman A, Sigurbjörnsson B. Narrowed Extended XPath I (NEXI). In: Proceedings of the 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2004. p. 16–40.
  15. Wilkinson R. Effective retrieval of structured documents. In: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1994. p. 311–7.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Amsterdam, Amsterdam, The Netherlands

Section editors and affiliations

  • Jaap Kamps, University of Amsterdam, Amsterdam, The Netherlands