Structured Text Retrieval Models

  • Reference work entry
Encyclopedia of Database Systems

Synonyms

Retrieval Models for Text Databases

Definition

Structured text retrieval models provide a formal definition or mathematical framework for querying semi-structured textual databases. A textual database contains both content and structure. The content is the text itself, and the structure divides the database into separate textual parts and relates those parts by some criterion. Textual databases can often be represented as marked-up text, for instance as XML, where the XML elements define the structure on top of the text content. Retrieval models for textual databases should comprise three parts: (i) a model of the text, (ii) a model of the structure, and (iii) a query language [4]. The model of the text defines a tokenization into words or other semantic units, as well as stop words, stemming, synonyms, etc. The model of the structure defines parts of the text, typically a contiguous portion of the text called an element, region, or segment, which is defined on top of the...
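
The three-part model described above can be illustrated with a minimal Python sketch. It assumes a toy input of well-nested, XML-like markup without attributes, a small hypothetical stop-word list, and a crude suffix-stripping function standing in for a real stemmer. The text model (`STOP_WORDS`, `stem`) turns words into indexed terms. The structure model (`Region`) records each element as a contiguous span of term positions. The query language is a single operator, `containing`, which returns the elements with a given name that contain a term, ranked by a naive occurrence count. All names are hypothetical and are not taken from any of the cited systems.

```python
import re
from dataclasses import dataclass

# Hypothetical toy text model: a tiny stop-word list and a crude suffix-stripping
# "stemmer" (a stand-in for a real stemmer such as Porter's).
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the"}


def stem(word: str) -> str:
    """Strip one common English suffix; illustrative only."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


@dataclass(frozen=True)
class Region:
    """Structure model: an element as a contiguous span of term positions."""
    name: str   # element (tag) name, e.g. "title"
    start: int  # position of the first indexed term inside the element
    end: int    # position of the last indexed term inside the element (inclusive)


# Matches either a simple open/close tag such as <sec> or </sec>, or a word.
TOKEN_RE = re.compile(r"<(/?)(\w+)>|(\w+)")


def parse(marked_up: str):
    """Split well-nested, attribute-free markup into indexed terms and regions."""
    terms = []    # (position, stemmed term); stop words get no position
    regions = []  # Region objects, appended as their closing tags are seen
    stack = []    # currently open elements: (name, start position)
    pos = 0
    for m in TOKEN_RE.finditer(marked_up):
        closing, tag, word = m.groups()
        if tag:
            if not closing:
                stack.append((tag, pos))
                continue
            name, start = stack.pop()
            if name != tag:
                raise ValueError(f"expected </{name}>, found </{tag}>")
            regions.append(Region(name, start, pos - 1))
        else:
            term = word.lower()
            if term in STOP_WORDS:
                continue
            terms.append((pos, stem(term)))
            pos += 1
    return terms, regions


def containing(terms, regions, name: str, query_word: str):
    """Query: elements called `name` that contain the term, ranked by a naive count."""
    q = stem(query_word.lower())
    hits = []
    for r in regions:
        if r.name != name:
            continue
        count = sum(1 for p, t in terms if t == q and r.start <= p <= r.end)
        if count:
            hits.append((r, count))
    # sorted() is stable, so equally scored elements keep document order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


DOC = (
    "<article><title>Searching structured documents</title>"
    "<sec>Structured text retrieval models query the structure "
    "and the content of documents.</sec>"
    "<sec>Indexing of words</sec></article>"
)

if __name__ == "__main__":
    terms, regions = parse(DOC)
    for region in regions:
        print(region)
    print(containing(terms, regions, "sec", "documents"))
```

Because every element is reduced to a start and end position over the same term sequence, containment becomes a simple interval comparison. This is the idea behind region-based models such as the region algebras in [7] and [11], although the sketch implements none of them. A practical system would answer `containing` from inverted lists of term and element positions rather than by scanning all terms.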

Recommended Reading

  1. Alink W. XIRAF: an XML information retrieval approach to digital forensics. Master’s thesis, University of Twente. 2005.

  2. Amer-Yahia S, Botev C, Shanmugasundaram J. TeXQuery: a full-text search extension to XQuery. In: Proceedings of the 12th International World Wide Web Conference; 2004.

  3. Amer-Yahia S, Lakshmanan LVS, Pandit S. FleXPath: flexible structure and full-text querying for XML. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2004.

  4. Baeza-Yates RA, Navarro G. Integrating contents and structure in text retrieval. ACM SIGMOD Rec. 1996;25(1):67–79.

  5. Burkowski FJ. Retrieval activities in a database consisting of heterogeneous collections of structured text. In: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1992. p. 112–24.

  6. Carmel D, Maarek YS, Mandelbrod M, Mass Y, Soffer A. Searching XML documents via XML fragments. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2003. p. 151–8.

  7. Clarke CLA, Cormack GV, Burkowski FJ. An algebra for structured text search and a framework for its implementation. Comput J. 1995;38(1):43–56.

  8. Fuhr N, Gövert N, Kazai G, Lalmas M, editors. In: Proceedings of the 1st International Workshop of the Initiative for the Evaluation of XML Retrieval; 2002.

  9. Fuhr N, Grossjohann K. XIRQL: a query language for information retrieval in XML. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2001. p. 172–80.

  10. Gonnet GH, Tompa FW. Mind your grammar: a new approach to modelling text. In: Proceedings of the 13th International Conference on Very Large Data Bases; 1987. p. 339–46.

  11. Jaakkola J, Kilpeläinen P. Nested text-region algebra. Technical report. University of Helsinki. 1999.

  12. Mihajlovic V, Blok HE, Hiemstra D, Apers PMG. Score region algebra: building a transparent XML-IR database. In: Proceedings of the International Conference on Information and Knowledge Management; 2005. p. 12–9.

  13. Navarro G, Baeza-Yates RA. Proximal nodes: a model to query document databases by content and structure. ACM Trans Inf Syst. 1997;15(4):400–35.

  14. Ogilvie P, Callan J. Hierarchical language models for XML component retrieval. In: Advances in XML information retrieval. Lecture notes in computer science 3493. Springer; 2005. p. 224–37.

  15. Salminen A, Tompa FW. PAT expressions: an algebra for text search. Proc Complex. 1992;92:309–32.

Author information

Correspondence to Djoerd Hiemstra.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Hiemstra, D., Baeza-Yates, R. (2018). Structured Text Retrieval Models. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_379
