Query Processing for RDF Databases

  • Zoi Kaoudi
  • Anastasios Kementsietsidis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8714)

Abstract

RDF has recently become a very popular data model, used in a variety of applications and use cases in both academia and industry. Query processing and evaluation is a central component of data management in general and is thus, unsurprisingly, one of the most active areas of research in RDF data management. In this chapter, we provide an overview of query processing techniques for the RDF data model across different system architectures. We survey techniques for both centralized and distributed RDF stores, including peer-to-peer, federated, and cloud-based systems.

Keywords

Query Processing; Resource Description Framework; Query Evaluation; SPARQL Query; Query Planning

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Zoi Kaoudi (1)
  • Anastasios Kementsietsidis (2)
  1. IMIS, Athena Research Center, Athens, Greece
  2. Google Research, Mountain View, USA
