Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Structured Text Retrieval Models

  • Djoerd Hiemstra
  • Ricardo Baeza-Yates
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_379

Synonyms

Retrieval Models for Text Databases

Definition

Structured text retrieval models provide a formal definition or mathematical framework for querying semi-structured textual databases. A textual database contains both content and structure. The content is the text itself, and the structure divides the database into separate textual parts and relates those textual parts by some criterion. Often, textual databases can be represented as marked-up text, for instance, as XML, where the XML elements define the structure on the text content. Retrieval models for textual databases should comprise three parts: (i) a model of the text, (ii) a model of the structure, and (iii) a query language [4]. The model of the text defines a tokenization into words or other semantic units, as well as stop words, stemming, synonyms, etc. The model of the structure defines parts of the text, typically a contiguous portion of the text called an element, region, or segment, which is defined on top of the...
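
The three parts named above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not a model taken from the cited literature: the stop-word list, the `Region` tuple, and the functions `tokenize`, `index`, and `containing` are hypothetical names introduced here, stemming and synonyms are left out, and the "query language" is reduced to a single containment operator over regions.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Tuple

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"the", "a", "of", "and"}


class Region(NamedTuple):
    """A contiguous span of tokens, named after the XML element it came from."""
    name: str
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)


def tokenize(text: str) -> List[str]:
    """Model of the text: lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


def index(xml_source: str) -> Tuple[List[str], List[Region]]:
    """Model of the structure: one Region per element, defined on token offsets."""
    tokens: List[str] = []
    regions: List[Region] = []

    def visit(elem: ET.Element) -> None:
        start = len(tokens)
        tokens.extend(tokenize(elem.text or ""))
        for child in elem:
            visit(child)
            # Text following a child still belongs to the parent element.
            tokens.extend(tokenize(child.tail or ""))
        regions.append(Region(elem.tag, start, len(tokens)))

    visit(ET.fromstring(xml_source))
    return tokens, regions


def containing(regions: List[Region], tokens: List[str],
               name: str, word: str) -> List[Region]:
    """Query operator: regions with the given name whose span contains the word."""
    return [r for r in regions
            if r.name == name and word in tokens[r.start:r.end]]


doc = ("<article><title>Structured text retrieval</title>"
       "<sec>Models of the text and the structure</sec></article>")
tokens, regions = index(doc)
hits = containing(regions, tokens, "sec", "structure")
```

Representing each region as a pair of token offsets keeps the structure separate from the content while still letting a query combine both, which is the kind of interplay the definition describes.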


Recommended Reading

  1. Alink W. XIRAF: an XML information retrieval approach to digital forensics. Master’s thesis, University of Twente. 2005.
  2. Amer-Yahia S, Botev C, Shanmugasundaram J. TeXQuery: a full-text search extension to XQuery. In: Proceedings of the 12th International World Wide Web Conference; 2004.
  3. Amer-Yahia S, Lakshmanan LVS, Pandit S. FleXPath: flexible structure and full-text querying for XML. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2004.
  4. Baeza-Yates RA, Navarro G. Integrating contents and structure in text retrieval. ACM SIGMOD Rec. 1996;25(1):67–79.
  5. Burkowski FJ. Retrieval activities in a database consisting of heterogeneous collections of structured text. In: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1992. p. 112–24.
  6. Carmel D, Maarek YS, Mandelbrod M, Mass Y, Soffer A. Searching XML documents via XML fragments. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2003. p. 151–8.
  7. Clarke CLA, Cormack GV, Burkowski FJ. An algebra for structured text search and a framework for its implementation. Comput J. 1995;38(1):43–56.
  8. Fuhr N, Gövert N, Kazai G, Lalmas M, editors. In: Proceedings of the 1st International Workshop of the Initiative for the Evaluation of XML Retrieval; 2002.
  9. Fuhr N, Grossjohann K. XIRQL: a query language for information retrieval in XML. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2001. p. 172–80.
  10. Gonnet GH, Tompa FW. Mind your grammar: a new approach to modelling text. In: Proceedings of the 13th International Conference on Very Large Data Bases; 1987. p. 339–46.
  11. Jaakkola J, Kilpeläinen P. Nested text-region algebra. Technical report. University of Helsinki. 1999.
  12. Mihajlovic V, Blok HE, Hiemstra D, Apers PMG. Score region algebra: building a transparent XML-IR database. In: Proceedings of the International Conference on Information and Knowledge Management; 2005. p. 12–9.
  13. Navarro G, Baeza-Yates RA. Proximal nodes: a model to query document databases by content and structure. ACM Trans Inf Syst. 1997;15(4):400–35.
  14. Ogilvie P, Callan J. Hierarchical language models for XML component retrieval. In: Advances in XML information retrieval. Lecture notes in computer science 3493. Springer; 2005. p. 224–37.
  15. Salminen A, Tompa FW. PAT expressions: an algebra for text search. Proc Complex. 1992;92:309–32.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Twente, Enschede, The Netherlands
  2. NTENT, USA; Univ. Pompeu Fabra, Spain; Univ. de Chile, Chile

Section editors and affiliations

  • Jaap Kamps (1)
  1. University of Amsterdam, Amsterdam, The Netherlands