Synonyms
Indexing granularity
Definition
The term indexing units refers to the granularity of information in the retrieval system's index. In principle, any part of a structured text document can serve as an indexing unit, and the choice of indexing units determines the possible units of retrieval. There are three basic approaches. The first is to index every potentially retrievable unit as a whole, the so-called element-based approach [13]. The second is to index disjoint nodes and rely on aggregation or score propagation methods to score higher-level nodes [e.g., 1, 12]. The third is to index only selected elements, for example by indexing particular element types in separate indexes [10]. Various mixtures of these approaches have also been applied.
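The three approaches can be illustrated with a minimal Python sketch over a toy XML document. Everything here is an assumption made for illustration and is not taken from the cited systems: the sample document, the helper names (`paths`, `element_index`, `disjoint_index`, `propagate`, `selective_index`, `tf_score`), the raw term-frequency score, and the decay-weighted sum used for propagation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document, used only for illustration.
DOC = """<article>
  <title>XML retrieval</title>
  <sec>
    <title>Indexing</title>
    <p>element index granularity</p>
    <p>retrieval units</p>
  </sec>
</article>"""

def tokens(text):
    """Lowercase whitespace tokenizer; None is treated as empty text."""
    return (text or "").lower().split()

def paths(root):
    """Yield (path, element) for every element, in document order."""
    def walk(elem, prefix):
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            path = f"{prefix}/{child.tag}[{seen[child.tag]}]"
            yield path, child
            yield from walk(child, path)
    root_path = f"/{root.tag}[1]"
    yield root_path, root
    yield from walk(root, root_path)

def element_index(root):
    """Approach 1: index every element with all the text it contains,
    so the indexed units overlap (a section includes its paragraphs)."""
    return {p: Counter(tokens(" ".join(e.itertext()))) for p, e in paths(root)}

def disjoint_index(root):
    """Approach 2: index each element with its own direct text only,
    so no term occurrence is stored in more than one unit."""
    index = {}
    for path, elem in paths(root):
        own = tokens(elem.text) + [t for c in elem for t in tokens(c.tail)]
        if own:
            index[path] = Counter(own)
    return index

def tf_score(index, query):
    """Toy score: summed raw term frequency of the query terms."""
    terms = tokens(query)
    return {path: sum(bag[t] for t in terms) for path, bag in index.items()}

def propagate(node_scores, decay=0.5):
    """Score each node and its ancestors by a decay-weighted sum of the
    scores of indexed nodes (an assumed scheme, one of many possible)."""
    scores = Counter()
    for path, score in node_scores.items():
        parts = path.split("/")[1:]
        for depth in range(len(parts), 0, -1):
            ancestor = "/" + "/".join(parts[:depth])
            scores[ancestor] += score * decay ** (len(parts) - depth)
    return dict(scores)

def selective_index(root, tags):
    """Approach 3: keep a separate index per selected element type."""
    full = element_index(root)
    return {
        tag: {p: bag for p, bag in full.items()
              if p.rsplit("/", 1)[-1].startswith(tag + "[")}
        for tag in tags
    }

root = ET.fromstring(DOC)
overlapping = element_index(root)            # six units, one per element
disjoint = disjoint_index(root)              # four units with direct text
scores = propagate(tf_score(disjoint, "retrieval"))
by_type = selective_index(root, ["sec", "p"])
```

In this sketch the element-based index stores the same text several times (once per enclosing element), whereas the disjoint index stores it once and recovers scores for `sec` and `article` only at query time. The selective index restricts the retrievable units to the chosen element types.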
All approaches make implicit or explicit assumptions about the (most likely) unit of retrieval. Although there may be no designated retrieval unit (such as the document or root node of the structured document), this also does not mean that...
Recommended Reading
Abolhassani M, Fuhr N, Malik S. HyREX at INEX 2003. In: Fuhr N, Lalmas M, Malik S, editors. Proceedings of the 2nd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2003. p. 27–32.
Burkowski FJ. Retrieval activities in a database consisting of heterogeneous collections of structured text. In: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1992. p. 112–25.
Chiaramella Y. Browsing and querying: two complementary approaches for multimedia information retrieval. In: Proceedings of the Hypertext, Information Retrieval and Multimedia; 1997. p. 9–26.
Cutler M, Shih Y, Meng W. Using the structure of HTML documents to improve retrieval. In: Proceedings of the 1st USENIX Symposium on Internet Technologies and Systems; 1997.
Fuhr N, Gövert N, Rölleke T. DOLORES: a system for logic-based retrieval of multimedia objects. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1996. p. 257–65.
Geva S. GPX – Gardens Point XML IR at INEX 2004. In: Proceedings of the 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2004. p. 211–23.
INEX. INitiative for the Evaluation of XML Retrieval. 2007. http://inex.is.informatik.uni-duisburg.de/.
Kamps J, de Rijke M, Sigurbjörnsson B. Length normalization in XML retrieval. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2007. p. 80–7.
Kazai G, Lalmas M, Reid M. Construction of a test collection for the focussed retrieval of structured documents. In: Proceedings of the 25th European Conference on IR Research; 2003. p. 88–103.
Mass Y, Mandelbrod M. Retrieving the most relevant XML components. In: Proceedings of the 2nd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2003. p. 53–8.
Myaeng SH, Jang DH, Kim MS, Zhoo ZC. A flexible model for retrieval of SGML documents. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1998. p. 138–45.
Ogilvie P, Callan J. Using language models for flat text queries in XML retrieval. In: Proceedings of the 2nd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2003. p. 12–8.
Sigurbjörnsson B, Kamps J, de Rijke M. An element-based approach to XML retrieval. In: Proceedings of the 2nd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2003. p. 19–26.
Trotman A, Sigurbjörnsson B. Narrowed Extended XPath I (NEXI). In: Proceedings of the 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2004. p. 16–40.
Wilkinson R. Effective retrieval of structured documents. In: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1994. p. 311–7.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Kamps, J. (2018). Indexing Units of Structured Text Retrieval. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_202
DOI: https://doi.org/10.1007/978-1-4614-8265-9_202
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering