HOPI: An Efficient Connection Index for Complex XML Document Collections

  • Ralf Schenkel
  • Anja Theobald
  • Gerhard Weikum
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2992)

Abstract

In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2-hop cover of a directed graph introduced by Cohen et al. In contrast to most prior work on XML indexing, we consider not only paths with child or parent relationships between nodes, but also provide space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in our XXL search engine. We improve on the theoretical concept of a 2-hop cover by developing scalable methods for index creation on very large XML data collections with long paths and extensive cross-linkage. Our experiments show that the HOPI index substantially improves query performance over previously proposed index structures while keeping space requirements low.
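
To make the idea of a 2-hop cover concrete, the following is a minimal sketch, not the HOPI construction itself and not the algorithm of Cohen et al. Each node u gets two label sets, Lout(u) and Lin(u), chosen so that u reaches v exactly when Lout(u) and Lin(v) share a node (the "hop" center). The labels here are built with a simple pruned breadth-first sweep, which is a swapped-in technique assumed for illustration; the function names, the degree-based ordering, and the toy element graph are all hypothetical.

```python
from collections import deque


def build_two_hop_labels(nodes, edges):
    """Return (l_in, l_out) with: u reaches v  iff  l_out[u] & l_in[v] != {}.

    Illustrative pruned-BFS construction (an assumption of this sketch),
    not the index-building method described in the paper.
    """
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    l_in = {n: set() for n in nodes}
    l_out = {n: set() for n in nodes}

    def sweep(center, adjacency, labels, is_covered):
        # BFS from `center`; skip nodes whose connection to `center`
        # is already certified by labels of earlier centers.
        seen = {center}
        queue = deque([center])
        while queue:
            node = queue.popleft()
            if node != center and is_covered(node):
                continue
            labels[node].add(center)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)

    # Heuristic (assumed): use high-degree nodes as centers first.
    order = sorted(nodes, key=lambda n: len(succ[n]) + len(pred[n]), reverse=True)
    for c in order:
        # Forward sweep: c becomes an "in" hop for nodes it reaches.
        sweep(c, succ, l_in, lambda x: not l_out[c].isdisjoint(l_in[x]))
        # Backward sweep: c becomes an "out" hop for nodes that reach it.
        sweep(c, pred, l_out, lambda x: not l_out[x].isdisjoint(l_in[c]))
    return l_in, l_out


def reaches(l_in, l_out, u, v):
    """Reachability test: one set intersection, no graph traversal."""
    return not l_out[u].isdisjoint(l_in[v])


# Toy element graph: child edges inside documents plus one link edge
# (cite -> doc2) that crosses document boundaries. Names are made up.
nodes = ["doc1", "section", "cite", "doc2", "title", "doc3"]
edges = [
    ("doc1", "section"),   # child
    ("section", "cite"),   # child
    ("cite", "doc2"),      # link
    ("doc2", "title"),     # child
]
l_in, l_out = build_two_hop_labels(nodes, edges)
print(reaches(l_in, l_out, "doc1", "title"))
print(reaches(l_in, l_out, "title", "doc1"))
```

The point of the sketch is the query side: a descendant-or-link test such as "is `title` reachable from `doc1`?" reduces to intersecting two small sets, which is what makes wildcard path expressions cheap to evaluate once the labels exist. Building compact labels for very large, heavily linked collections is the hard part that the paper addresses.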

References

  1. Abiteboul, S., et al.: Compact labeling schemes for ancestor queries. In: SODA 2001, pp. 547–556 (2001)
  2. Alstrup, S., Rauhe, T.: Improved labeling scheme for ancestor queries. In: SODA 2002, pp. 947–953 (2002)
  3. Bancilhon, F., Ramakrishnan, R.: An amateur’s introduction to recursive query processing strategies. In: SIGMOD 1986, pp. 16–52 (1986)
  4. Blanken, H., Grabs, T., Schek, H.-J., Schenkel, R., Weikum, G. (eds.): Intelligent Search on XML Data. LNCS, vol. 2818. Springer, Heidelberg (2003)
  5. Böhme, T., Rahm, E.: Multi-user evaluation of XML data management systems with XMach-1. In: EEXTT 2002, pp. 148–158 (2003)
  6. Chung, C.-W., Min, J.-K., Shim, K.: APEX: An adaptive path index for XML data. In: SIGMOD 2002, pp. 121–132 (2002)
  7. Ciarlet Jr, P., Lamour, F.: On the validity of a front oriented approach to partitioning large sparse graphs with a connectivity constraint. Numerical Algorithms 12(1,2), 193–214 (1996)
  8. Cohen, E., et al.: Labeling dynamic XML trees. In: PODS 2002, pp. 271–281 (2002)
  9. Cohen, E., et al.: Reachability and distance queries via 2-hop labels. In: SODA 2002, pp. 937–946 (2002)
  10. Cooper, B., et al.: A fast index for semistructured data. In: VLDB 2001, pp. 341–350 (2001)
  11. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms, 1st edn. MIT Press, Cambridge (1990)
  12. DeRose, S., et al.: XML linking language (XLink), version 1.0. W3C recommendation (2001)
  13. Farhat, C.: A simple and efficient automatic FEM domain decomposer. Computers and Structures 28(5), 579–602 (1988)
  14. Goldman, R., Widom, J.: DataGuides: Enabling query formulation and optimization in semistructured databases. In: VLDB 1997, pp. 436–445 (1997)
  15. Grust, T.: Accelerating XPath location steps. In: SIGMOD 2002, pp. 109–120 (2002)
  16. Grust, T., van Keulen, M.: Tree awareness for relational DBMS kernels: Staircase join. In: Blanken et al. [4]
  17. Kaplan, H., et al.: A comparison of labeling schemes for ancestor queries. In: SODA 2002, pp. 954–963 (2002)
  18. Kaplan, H., Milo, T.: Short and simple labels for small distances and other functions. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 246–257. Springer, Heidelberg (2001)
  19. Kaushik, R., et al.: Covering indexes for branching path queries. In: SIGMOD 2002, pp. 133–144 (2002)
  20. Ley, M.: DBLP XML Records. Downloaded September 1 (2003)
  21. Milo, T., Suciu, D.: Index structures for path expressions. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 277–295. Springer, Heidelberg (1998)
  22. Qun, C., et al.: D(k)-index: An adaptive structural summary for graph-structured data. In: SIGMOD 2003, pp. 134–144 (2003)
  23. Schenkel, R., Theobald, A., Weikum, G.: Ontology-enabled XML search. In: Blanken et al. [4]
  24. Theobald, A., Weikum, G.: The index-based XXL search engine for querying XML data with relevance ranking. In: Jensen, C.S., Jeffery, K., Pokorný, J., Šaltenis, S., Bertino, E., Böhm, K., Jarke, M. (eds.) EDBT 2002. LNCS, vol. 2287, pp. 477–495. Springer, Heidelberg (2002)
  25. Theobald, A., Weikum, G.: The XXL search engine: Ranked retrieval of XML data using indexes and ontologies. In: SIGMOD 2002 (2002)
  26. Zezula, P., Amato, G., Rabitti, F.: Processing XML queries with tree signatures. In: Blanken et al. [4]
  27. Zezula, P., et al.: Tree signatures for XML querying and navigation. In: 1st Int. XML Database Symposium, pp. 149–163 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Ralf Schenkel (1)
  • Anja Theobald (1)
  • Gerhard Weikum (1)

  1. Max Planck Institut für Informatik, Saarbrücken, Germany