Labeling RDF Graphs for Linear Time and Space Querying

  • Tim Furche
  • Antonius Weinzierl
  • François Bry
Chapter

Abstract

Indices and data structures for web querying have mostly considered tree-shaped data, reflecting the view of XML documents as trees. However, for RDF (and when querying ID/IDREF constraints in XML), data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph data, with a focus on support for efficient adjacency and reachability queries. For XML, labeling schemes are an important part of the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes: the constant time (and per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.
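
To make the property mentioned in the abstract concrete, the following is a minimal sketch of the classic interval labeling for trees, the kind of XML labeling scheme whose constant time child and descendant tests the chapter aims to retain. It is not the chapter's labeling scheme for RDF graphs. The names `Label`, `label_tree`, `is_descendant`, and `is_child`, as well as the dictionary-based tree encoding, are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    start: int  # pre-order number of the node
    end: int    # largest pre-order number inside the node's subtree
    depth: int  # distance from the root


def label_tree(children, root):
    """Label every node of a tree in one iterative depth-first traversal.

    `children` maps a node to the ordered list of its child nodes
    (hypothetical encoding chosen for this sketch).
    """
    labels = {}
    start_of = {}
    counter = 0
    stack = [(root, 0, False)]
    while stack:
        node, depth, finished = stack.pop()
        if finished:
            # All descendants are numbered; close the interval.
            labels[node] = Label(start_of[node], counter - 1, depth)
            continue
        start_of[node] = counter
        counter += 1
        stack.append((node, depth, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, depth + 1, False))
    return labels


def is_descendant(labels, ancestor, node):
    """Reachability test: two integer comparisons, independent of tree size."""
    a, n = labels[ancestor], labels[node]
    return a.start < n.start <= a.end


def is_child(labels, parent, node):
    """Adjacency test: a descendant exactly one level below the parent."""
    return (is_descendant(labels, parent, node)
            and labels[node].depth == labels[parent].depth + 1)


tree = {"a": ["b", "c"], "b": ["d"]}
labels = label_tree(tree, "a")
print(is_descendant(labels, "a", "d"), is_child(labels, "a", "d"))
```

Each node stores three integers, so the labels need constant space per node, and both tests inspect only the labels of the two nodes involved. On graph-shaped data a node's descendants no longer form a single contiguous interval in general, which is why this simple scheme does not carry over directly.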

Keywords

Label Scheme, Edge Label, SPARQL Query, Arbitrary Graph, Triple Pattern
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Institute for Informatics, University of Munich, Munich, Germany
  2. Knowledge-based Systems Group, Technische Universität Wien, Wien, Austria
