Information Retrieval, Volume 8, Issue 4, pp 547–570

TIJAH: Embracing IR Methods in XML Databases

  • Johan List
  • Vojkan Mihajlović
  • Georgina Ramírez
  • Arjen P. de Vries
  • Djoerd Hiemstra
  • Henk Ernst Blok
Abstract

This paper discusses our participation in INEX (the Initiative for the Evaluation of XML Retrieval) using the TIJAH XML-IR system. TIJAH’s system design follows a ‘standard’ layered database architecture, carefully separating the conceptual, logical and physical levels. At the conceptual level, we classify the INEX XPath-based query expressions into three different query patterns. For each pattern, we present its mapping into a query execution strategy. The logical layer exploits score region algebra (SRA) as the basis for query processing. We discuss the region operators used to select and manipulate XML document components. The logical algebra expressions are mapped into efficient relational algebra expressions over a physical representation of the XML document collection using the ‘pre-post numbering scheme’. The paper concludes with an analysis of experiments performed with the INEX test collection.
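The pre-post numbering idea mentioned above can be illustrated with a minimal sketch. It is not TIJAH's physical representation: the function names (`number_nodes`, `is_descendant`, `descendants`) and the sample document are assumptions made for illustration only. Each element gets a preorder and a postorder rank from one depth-first walk, after which a descendant test reduces to two integer comparisons, which is the kind of predicate a relational engine can evaluate as a range selection.

```python
import xml.etree.ElementTree as ET


def number_nodes(root):
    """Assign (pre, post) ranks to every element in one depth-first walk."""
    ranks = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        # Preorder rank: assigned when the element is first entered.
        pre = counters["pre"]
        counters["pre"] += 1
        for child in node:
            visit(child)
        # Postorder rank: assigned after all children have been left.
        ranks[node] = (pre, counters["post"])
        counters["post"] += 1

    visit(root)
    return ranks


def is_descendant(ranks, node, context):
    """True if node lies strictly below context in the document tree."""
    pre_n, post_n = ranks[node]
    pre_c, post_c = ranks[context]
    return pre_c < pre_n and post_n < post_c


def descendants(ranks, context):
    """All elements that are descendants of context (hypothetical helper)."""
    return [n for n in ranks if is_descendant(ranks, n, context)]


# Hypothetical sample document, not taken from the INEX collection.
doc = ET.fromstring("<article><sec><p>xml</p><p>ir</p></sec><bib/></article>")
ranks = number_nodes(doc)
sec = doc.find("sec")
tags = sorted(n.tag for n in descendants(ranks, sec))
```

Here `descendants` scans every element for clarity; a system storing the ranks in relational tables would instead answer the same predicate with an indexed range query or a specialised join.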

Keywords

structured document retrieval 

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Johan List (1, corresponding author)
  • Vojkan Mihajlović (2)
  • Georgina Ramírez (3)
  • Arjen P. de Vries (3)
  • Djoerd Hiemstra (4)
  • Henk Ernst Blok (4)

  1. CWI, Amsterdam, The Netherlands
  2. University of Twente, Enschede, The Netherlands
  3. CWI, Amsterdam, The Netherlands
  4. University of Twente, Enschede, The Netherlands
