Storing and Querying Semantic Data in the Cloud

  • Daniel Janke
  • Steffen Staab
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11078)

Abstract

In recent years, huge RDF graphs with trillions of triples have been created. To process this amount of data, scalable RDF stores are used, in which graph data is distributed over compute and storage nodes so that query processing effort and memory requirements can scale. The main challenges in developing such RDF stores in the cloud are: (i) strategies for data placement over compute and storage nodes, (ii) strategies for distributed query processing, and (iii) strategies for handling failures of compute and storage nodes. In this manuscript, we give an overview of how scalable RDF stores in the cloud address these challenges.

References

  1. 1.
    Largetriplestores. https://www.w3.org/wiki/LargeTripleStores. Accessed 10 July 2018
  2. 2.
    The bigdata\(\textregistered \) RDF Database. http://www.bigdata.com/whitepapers/bigdata_architecture_whitepaper.pdf. Accessed 29 Oct 2014
  3. 3.
    Abadi, D.J., Marcus, A., Madden, S.R., Hollenbach, K.: Scalable semantic web data management using vertical partitioning. In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB 2007, pp. 411–422. VLDB Endowment (2007). http://dl.acm.org/citation.cfm?id=1325851.1325900
  4. 4.
    Abbassi, S., Faiz, R.: RDF-4X: a scalable solution for RDF quads store in the cloud. In: Proceedings of the 8th International Conference on Management of Digital EcoSystems, MEDES, pp. 231–236. ACM, New York (2016).  https://doi.org/10.1145/3012071.3012104
  5. 5.
    Abdelaziz, I., Harbi, R., Salihoglu, S., Kalnis, P.: Combining vertex-centric graph processing with SPARQL for large-scale RDF data analytics. IEEE Trans. Parallel Distrib. Syst. 28(12), 3374–3388 (2017).  https://doi.org/10.1109/TPDS.2017.2720174CrossRefGoogle Scholar
  6. 6.
    Aberer, K., Cudré-Mauroux, P., Hauswirth, M., Van Pelt, T.: GridVine: building internet-scale semantic overlay networks. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 107–121. Springer, Heidelberg (2004).  https://doi.org/10.1007/978-3-540-30475-3_9CrossRefGoogle Scholar
  7. 7.
    Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., Ruckhaus, E.: ANAPSID: an adaptive query processing engine for SPARQL endpoints. In: Aroyo, L. (ed.) ISWC 2011. LNCS, vol. 7031, pp. 18–34. Springer, Heidelberg (2011).  https://doi.org/10.1007/978-3-642-25073-6_2CrossRefGoogle Scholar
  8. 8.
    Akar, Z., Halaç, T.G., Ekinci, E.E., Dikenelli, O.: Querying the web of interlinked datasets using VOID descriptions. In: WWW 2012 Workshop on Linked Data on the Web, Lyon, France, 16 April 2012. http://ceur-ws.org/Vol-937/ldow2012-paper-06.pdf
  9. 9.
    Al-Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N., Ebrahim, Y., Sahli, M.: Adaptive partitioning for very large RDF data. CoRR abs/1505.0 (2015). http://arxiv.org/abs/1505.02728
  10. 10.
    Al-Harbi, R., Ebrahim, Y., Kalnis, P.: PHD-store: an adaptive SPARQL engine with dynamic partitioning for distributed RDF repositories. CoRR abs/1405.4 (2014). http://arxiv.org/abs/1405.4979
  11. 11.
    Alexander, K., Cyganiak, R., Hausenblas, M., Zhao, J.: Describing linked datasets with the VoID vocabulary. W3C Interest Group Note, W3C (2011). http://www.w3.org/TR/2011/NOTE-void-20110303/
  12. 12.
    Ali, L., Janson, T., Lausen, G.: 3rdf: storing and querying RDF data on top of the 3nuts overlay network. In: 2011 22nd International Workshop on Database and Expert Systems Applications, pp. 257–261 (2011).  https://doi.org/10.1109/DEXA.2011.1
  13. 13.
    Ali, L., Janson, T., Schindelhauer, C.: Towards load balancing and parallelizing of RDF query processing in P2P based distributed RDF data stores. In: 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 307–311 (2014).  https://doi.org/10.1109/PDP.2014.79
  14. 14.
    Ali, L., Janson, T., Lausen, G., Schindelhauer, C.: Effects of network structure improvement on distributed RDF querying. In: Hameurlain, A., Rahayu, W., Taniar, D. (eds.) Globe 2013. LNCS, vol. 8059, pp. 63–74. Springer, Heidelberg (2013).  https://doi.org/10.1007/978-3-642-40053-7_6CrossRefGoogle Scholar
  15. 15.
    Aluç, G., Hartig, O., Özsu, M.T., Daudjee, K.: Diversified stress testing of RDF data management systems. In: Mika, P. (ed.) ISWC 2014. LNCS, vol. 8796, pp. 197–212. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-11964-9_13CrossRefGoogle Scholar
  16. 16.
    Arenas, M., Pérez, J.: Federation and navigation in SPARQL 1.1. In: Eiter, T., Krennwallner, T. (eds.) Reasoning Web 2012. LNCS, vol. 7487, pp. 78–111. Springer, Heidelberg (2012).  https://doi.org/10.1007/978-3-642-33158-9_3CrossRefGoogle Scholar
  17. 17.
    Basca, C., Bernstein, A.: Distributed SPARQL throughput increase: on the effectiveness of workload-driven RDF partitioning. In: ISWC 2013 (2013)Google Scholar
  18. 18.
    Basca, C., Bernstein, A.: Querying a messy web of data with AVALANCHE. Web Semant.: Sci. Serv. Agents World Wide Web 26 (2014). http://www.websemanticsjournal.org/index.php/ps/article/view/361
  19. 19.
    Battré, D., Heine, F., Höing, A., Kao, O.: On triple dissemination, forward-chaining, and load balancing in DHT based RDF stores. In: Moro, G., Bergamaschi, S., Joseph, S., Morin, J.-H., Ouksel, A.M. (eds.) DBISP2P 2005–2006. LNCS, vol. 4125, pp. 343–354. Springer, Heidelberg (2007).  https://doi.org/10.1007/978-3-540-71661-7_33CrossRefGoogle Scholar
  20. 20.
    Beame, P., Koutris, P., Suciu, D.: Skew in parallel query processing. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 212–223. ACM, New York (2014).  https://doi.org/10.1145/2594538.2594558
  21. 21.
    Bizer, C., Schultz, A.: The Berlin SPARQL benchmark. Int. J. Semant. Web Inf. Syst. 5(2), 1–24 (2009).  https://doi.org/10.4018/jswis.2009040101CrossRefGoogle Scholar
  22. 22.
    Böhm, C., Hefenbrock, D., Naumann, F.: Scalable peer-to-peer-based RDF management. In: Proceedings of the 8th International Conference on Semantic Systems, I-SEMANTICS 2012, pp. 165–168. ACM, New York (2012).  https://doi.org/10.1145/2362499.2362523
  23. 23.
    Bröcheler, M., Pugliese, A., Subrahmanian, V.S.: COSI: cloud oriented subgraph identification in massive social networks. In: Advances in Social Networks Analysis and Mining (ASONAM), pp. 248–255 (2010).  https://doi.org/10.1109/ASONAM.2010.80
  24. 24.
    Bugiotti, F., Camacho-Rodríguez, J., Goasdoué, F., Kaoudi, Z., Manolescu, I., Zampetakis, S.: SPARQL query processing in the cloud. In: Harth, A., Hose, K., Schenkel, R. (eds.) Linked Data Management. Emerging Directions in Database Systems and Applications. Chapman and Hall/CRC (2014)Google Scholar
  25. 25.
    Cai, M., Frank, M.: RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In: Proceedings of the 13th International Conference on World Wide Web, pp. 650–657 (2004). http://dl.acm.org/citation.cfm?id=988760
  26. 26.
    Charalambidis, A., Troumpoukis, A., Konstantopoulos, S.: SemaGrow: optimizing federated SPARQL queries. In: Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS 2015, pp. 121–128. ACM, New York (2015).  https://doi.org/10.1145/2814864.2814886
  27. 27.
    Cheng, L., Kotoulas, S.: Scale-out processing of large RDF datasets. IEEE Trans. Big Data 1(4), 138–150 (2015).  https://doi.org/10.1109/TBDATA.2015.2505719CrossRefGoogle Scholar
  28. 28.
    Chu, S., Balazinska, M., Suciu, D.: From theory to practice: efficient join query evaluation in a parallel database system. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD 2015, pp. 63–78. ACM, New York (2015).  https://doi.org/10.1145/2723372.2750545
  29. 29.
    Cossu, M., Färber, M., Lausen, G.: PRoST: distributed execution of SPARQL queries using mixed partitioning strategies. In: Proceedings of the 21st International Conference on Extending Database Technology, EDBT 2018, Vienna, Austria, 26–29 March 2018, pp. 469–472 (2018).  https://doi.org/10.5441/002/edbt.2018.49
  30. 30.
    Crespo, A., Garcia-Molina, H.: Semantic overlay networks for P2P systems. In: Moro, G., Bergamaschi, S., Aberer, K. (eds.) AP2PC 2004. LNCS (LNAI), vol. 3601, pp. 1–13. Springer, Heidelberg (2005).  https://doi.org/10.1007/11574781_1CrossRefGoogle Scholar
  31. 31.
    Cudre-Mauroux, P., Agarwal, S., Aberer, K.: GridVine: an infrastructure for peer information management. IEEE Internet Comput. 11(5), 36–44 (2007).  https://doi.org/10.1109/MIC.2007.108CrossRefGoogle Scholar
  32. 32.
    Cudré-Mauroux, P., et al.: NoSQL databases for RDF: an empirical evaluation. In: Alani, H. (ed.) ISWC 2013. LNCS, vol. 8219, pp. 310–325. Springer, Heidelberg (2013).  https://doi.org/10.1007/978-3-642-41338-4_20CrossRefGoogle Scholar
  33. 33.
    Curé, O., Naacke, H., Baazizi, M.A., Amann, B.: On the evaluation of RDF distribution algorithms implemented over apache spark. In: Proceedings of the 11th International Workshop on Scalable Semantic Web Knowledge Base Systems (ISWC 2015), pp. 16–31 (2015)Google Scholar
  34. 34.
    DeCandia, G., et al.: Dynamo: Amazon’s highly available key-value store. In: Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, SOSP 2007, pp. 205–220. ACM, New York (2007).  https://doi.org/10.1145/1294261.1294281
  35. 35.
    Della Valle, E., Turati, A., Ghioni, A.: PAGE: a distributed infrastructure for fostering rdf-based interoperability. In: Eliassen, F., Montresor, A. (eds.) DAIS 2006. LNCS, vol. 4025, pp. 347–353. Springer, Heidelberg (2006).  https://doi.org/10.1007/11773887_27CrossRefGoogle Scholar
  36. 36.
    DeWitt, D.J., Katz, R.H., Olken, F., Shapiro, L.D., Stonebraker, M.R., Wood, D.A.: Implementation techniques for main memory database systems. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, SIGMOD 1984, pp. 1–8. ACM, New York (1984).  https://doi.org/10.1145/602259.602261
  37. 37.
    Dhraief, H., Kemper, A., Nejdl, W., Wiesner, C.: Processing and optimization of complex queries in schema-based P2P-networks. In: Ng, W.S., Ooi, B.-C., Ouksel, A.M., Sartori, C. (eds.) DBISP2P 2004. LNCS, vol. 3367, pp. 31–45. Springer, Heidelberg (2005).  https://doi.org/10.1007/978-3-540-31838-5_3CrossRefGoogle Scholar
  38. 38.
    Ding, L., Peng, Y., da Silva, P.P., McGuinness, D.L.: Tracking RDF graph provenance using RDF molecules. Technical report, UMBC (2005). https://ebiquity.umbc.edu/paper/html/id/240/Tracking-RDF-Graph-Provenance-using-RDF-Molecules
  39. 39.
    Du, F., Bian, H., Chen, Y., Du, X.: Efficient SPARQL query evaluation in a database cluster. In: IEEE International Congress on Big Data, pp. 165–172 (2013).  https://doi.org/10.1109/BigData.Congress.2013.30
  40. 40.
    Erling, O., Mikhailov, I.: Towards web scale RDF. In: 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2008) (2008)Google Scholar
  41. 41.
    Erling, O., Mikhailov, I.: Virtuoso: RDF support in a native RDBMS. In: de Virgilio, R., Giunchiglia, F., Tanca, L. (eds.) Semantic Web Information Management, pp. 501–519. Springer, Heidelberg (2010).  https://doi.org/10.1007/978-3-642-04329-1_21CrossRefGoogle Scholar
  42. 42.
    Farhan Husain, M., McGlothlin, J., Masud, M.M., Khan, L., Thuraisingham, B.: Heuristics-based query processing for large RDF graphs using cloud computing. IEEE Trans. Knowl. Data Eng. 23(9), 1312–1327 (2011).  https://doi.org/10.1109/TKDE.2011.103CrossRefGoogle Scholar
  43. 43.
    Galarraga, L., Hose, K., Schenkel, R.: Partout: a distributed engine for efficient RDF processing. CoRR abs/1212.5 (2012). http://arxiv.org/abs/1212.5636
  44. 44.
    Goasdoué, F., Kaoudi, Z., Manolescu, I., Quiané-Ruiz, J.A., Zampetakis, S.: CliqueSquare: flat plans for massively parallel RDF queries. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 771–782 (2015).  https://doi.org/10.1109/ICDE.2015.7113332
  45. 45.
    Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI 2014, pp. 599–613. USENIX Association, Berkeley (2014). http://dl.acm.org/citation.cfm?id=2685048.2685096
  46. 46.
    Goodman, E.L., Grunwald, D.: Using vertex-centric programming platforms to implement SPARQL queries on large graphs. In: Proceedings of the 4th Workshop on Irregular Applications: Architectures and Algorithms, \(\text{IA}^3\) 2014, pp. 25–32. IEEE Press, Piscataway (2014).  https://doi.org/10.1109/IA3.2014.10
  47. 47.
    Görlitz, O., Thimm, M., Staab, S.: SPLODGE: Systematic generation of SPARQL benchmark queries for linked open data. Semant. Web-ISWC 2012, 116–132 (2012).  https://doi.org/10.1007/978-3-642-35176-1_8CrossRefGoogle Scholar
  48. 48.
    Görlitz, O., Staab, S.: SPLENDID: SPARQL endpoint federation exploiting VOID descriptions. In: Proceedings of the Second International Conference on Consuming Linked Data, COLD 2011, vol. 782, pp. 13–24. CEUR-WS.org, Aachen (2010). http://dl.acm.org/citation.cfm?id=2887352.2887354
  49. 49.
    Graux, D., Jachiet, L., Genevès, P., Layaïda, N.: A Multi-Criteria Experimental Ranking of Distributed SPARQL Evaluators (2016). https://hal.inria.fr/hal-01381781
  50. 50.
    Graux, D., Jachiet, L., Genevès, P., Layaïda, N.: SPARQLGX: efficient distributed evaluation of SPARQL with apache spark. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9982, pp. 80–87. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46547-0_9CrossRefGoogle Scholar
  51. 51.
    Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for owl knowledge base systems. Web Semant.: Sci. Serv. Agents World Wide Web, 3(2–3) (2005). http://www.websemanticsjournal.org/index.php/ps/article/view/70
  52. 52.
    Gurajada, S., Seufert, S., Miliaraki, I., Theobald, M.: TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing. In: SIGMOD, pp. 289–300 (2014).  https://doi.org/10.1145/2588555.2610511
  53. 53.
    Gutierrez, C., Hurtado, C., Mendelzon, A.O.: Foundations of semantic web databases. In: PODS, pp. 95–106. ACM (2004).  https://doi.org/10.1145/1055558.1055573
  54. 54.
    Haas, L.M., Kossmann, D., Wimmers, E.L., Yang, J.: Optimizing queries across diverse data sources. In: VLDB 1997, Athens, Greece, pp. 276–285. Morgan Kaufmann Publishers Inc., San Francisco (1997)Google Scholar
  55. 55.
    Hammoud, M., Rabbou, D.A., Nouri, R., Beheshti, S.M.R., Sakr, S.: DREAM: distributed RDF engine with adaptive query planner and minimal communication. Proc. VLDB Endow. 8(6), 654–665 (2015).  https://doi.org/10.14778/2735703.2735705CrossRefGoogle Scholar
  56. 56.
    Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N.: Evaluating SPARQL queries on massive RDF datasets. PVLDB, 8(12), 1848–1851 (2015). http://www.vldb.org/pvldb/vol8/p1848-harbi.pdf
  57. 57.
    Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N., Ebrahim, Y., Sahli, M.: Accelerating SPARQL queries by exploiting hash-based locality and adaptive partitioning. VLDB J. 25(3), 355–380 (2016).  https://doi.org/10.1007/s00778-016-0420-yCrossRefGoogle Scholar
  58. 58.
    Harris, S., Lamb, N., Shadbolt, N.: 4store: the design and implementation of a clustered RDF store. In: Scalable Semantic Web Knowledge Base Systems - SSWS 2009, pp. 94–109 (2009)Google Scholar
  59. 59.
    Harth, A., Decker, S.: Optimized index structures for querying RDF from the web. In: Proceedings of LA-WEB 2005, p. 71. IEEE (2005).  https://doi.org/10.1109/LAWEB.2005.25
  60. 60.
    Harth, A., Umbrich, J., Hogan, A., Decker, S.: YARS2: a federated repository for querying graph structured data from the web. In: Aberer, K. (ed.) ASWC/ISWC -2007. LNCS, vol. 4825, pp. 211–224. Springer, Heidelberg (2007).  https://doi.org/10.1007/978-3-540-76298-0_16CrossRefGoogle Scholar
  61. 61.
    Hong, S., Depner, S., Manhardt, T., Van Der Lugt, J., Verstraaten, M., Chafi, H.: PGX.D: a fast distributed graph processing engine. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, pp. 58:1–58:12. ACM, New York (2015).  https://doi.org/10.1145/2807591.2807620
  62. 62.
    Hose, K., Schenkel, R.: WARP: workload-aware replication and partitioning for RDF. In: Data Engineering Workshops (ICDEW), pp. 1–6 (2013).  https://doi.org/10.1109/ICDEW.2013.6547414
  63. 63.
    Huang, J., Abadi, D.J., Ren, K.: Scalable SPARQL querying of large RDF graphs. PVLDB 4(11), 1123–1134 (2011)Google Scholar
  64. 64.
    Janke, D., Staab, S., Thimm, M.: Impact analysis of data placement strategies on query efforts in distributed RDF stores. J. Web Semant. (2018).  https://doi.org/10.1016/j.websem.2018.02.002, http://www.websemanticsjournal.org/index.php/ps/article/view/516
  65. 65.
    Jones, N.D.: An introduction to partial evaluation. ACM Comput. Surv. 28(3), 480–503 (1996).  https://doi.org/10.1145/243439.243447CrossRefGoogle Scholar
  66. 66.
    Kang, U., Tsourakakis, C.E., Faloutsos, C.: PEGASUS: a peta-scale graph mining system implementation and observations. In: 2009 Ninth IEEE International Conference on Data Mining, pp. 229–238 (2009).  https://doi.org/10.1109/ICDM.2009.14
  67. 67.
    Kaoudi, Z., Koubarakis, M., Kyzirakos, K., Miliaraki, I., Magiridou, M., Papadakis-Pesaresi, A.: Atlas: storing, updating and querying RDF(S) data on top of DHTs. Web Semant.: Sci. Serv. Agents World Wide Web 8(4) (2010). http://www.websemanticsjournal.org/index.php/ps/article/view/250
  68. 68.
    Karnstedt, M., et al.: UniStore: querying a DHT-based universal storage. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 1503–1504 (2007).  https://doi.org/10.1109/ICDE.2007.369054
  69. 69.
    Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998).  https://doi.org/10.1137/S1064827595287997MathSciNetCrossRefzbMATHGoogle Scholar
  70. 70.
    Khadilkar, V., Kantarcioglu, M., Thuraisingham, B.M., Castagna, P.: Jena-HBase: a distributed, scalable and efficient RDF triple store. Technical report, Department of Computer Science at the University of Texas at Dallas (2012)Google Scholar
  71. 71.
    Khadilkar, V., Kantarcioglu, M., Thuraisingham, B.M., Castagna, P.: Jena-HBase: a distributed, scalable and efficient RDF triple store. In: Proceedings of the ISWC 2012 Posters & Demonstrations Track, Boston, USA, 11–15 November 2012. http://ceur-ws.org/Vol-914/paper_14.pdf
  72. 72.
    Kim, H., Ravindra, P., Anyanwu, K.: From SPARQL to MapReduce: the journey using a nested TripleGroup Algebra. PVLDB 4(12), 1426–1429 (2011). http://www.vldb.org/pvldb/vol4/p1426-kim.pdf
  73. 73.
    Kokkinidis, G., Christophides, V.: Semantic query routing and processing in P2P database systems: the ICS-FORTH SQPeer middleware. In: Lindner, W., Mesiti, M., Türker, C., Tzitzikas, Y., Vakali, A.I. (eds.) EDBT 2004. LNCS, vol. 3268, pp. 486–495. Springer, Heidelberg (2005).  https://doi.org/10.1007/978-3-540-30192-9_48CrossRefGoogle Scholar
  74. 74.
    Kotsev, V., Kiryakov, A., Fundulaki, I., Alexiev, V.: LDBC semantic publishing benchmark (SPB) - v2.0 first public draft release. Technical report, The Linked Data Benchmark Council (2014). https://github.com/ldbc/ldbc_spb_bm_2.0/blob/master/doc/LDBC_SPB_v2.0.docx?raw=true
  75. 75.
    Ladwig, G., Harth, A.: CumulusRDF: linked data management on nested key-value stores. In: Proceedings of the 7th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2011) at the 10th International Semantic Web Conference (ISWC 2011) (2011)Google Scholar
  76. 76.
    Ladwig, G., Tran, T.: SIHJoin: querying remote and local linked data. In: Antoniou, G., et al. (eds.) ESWC 2011. LNCS, vol. 6643, pp. 139–153. Springer, Heidelberg (2011).  https://doi.org/10.1007/978-3-642-21034-1_10CrossRefGoogle Scholar
  77. 77.
    Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010).  https://doi.org/10.1145/1773912.1773922CrossRefGoogle Scholar
  78. 78.
    Le-Phuoc, D., Nguyen Mau Quoc, H., Le Van, C., Hauswirth, M.: Elastic and scalable processing of linked stream data in the cloud. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 280–297. Springer, Heidelberg (2013).  https://doi.org/10.1007/978-3-642-41335-3_18CrossRefGoogle Scholar
  79. 79.
    Lee, K., Liu, L.: Efficient data partitioning model for heterogeneous graphs in the cloud. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. pp. 46:1–46:12. ACM (2013).  https://doi.org/10.1145/2503210.2503302
  80. 80.
    Lee, K., Liu, L.: Scaling queries over Big RDF graphs with semantic hash partitioning. PVLDB 6(14), 1894–1905 (2013).  https://doi.org/10.14778/2556549.2556571CrossRefGoogle Scholar
  81. 81.
    Lee, K., Liu, L., Tang, Y., Zhang, Q., Zhou, Y.: Efficient and customizable data partitioning framework for distributed big RDF data processing in the cloud. In: IEEE CLOUD 2013, pp. 327–334 (2013).  https://doi.org/10.1109/CLOUD.2013.63
  82. 82.
    Liarou, E., Idreos, S., Koubarakis, M.: Evaluating conjunctive triple pattern queries over large structured overlay networks. In: Cruz, I., et al. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 399–413. Springer, Heidelberg (2006).  https://doi.org/10.1007/11926078_29CrossRefGoogle Scholar
  83. 83.
    Lynden, S., Kojima, I., Matono, A., Tanimura, Y.: ADERIS: an adaptive query processor for joining federated SPARQL endpoints. In: Meersman, R. (ed.) OTM 2011. LNCS, vol. 7045, pp. 808–817. Springer, Heidelberg (2011).  https://doi.org/10.1007/978-3-642-25106-1_28CrossRefGoogle Scholar
  84. 84.
    Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, pp. 135–146. ACM, New York (2010).  https://doi.org/10.1145/1807167.1807184
  85. 85.
    Malliaros, F.D., Vazirgiannis, M.: Clustering and community detection in directed networks: a survey. Phys. Rep. 533(4), 95–142 (2013).  https://doi.org/10.1016/j.physrep.2013.08.002MathSciNetCrossRefzbMATHGoogle Scholar
  86. 86.
    Mansour, E., Abdelaziz, I., Ouzzani, M., Aboulnaga, A., Kalnis, P.: A demonstration of Lusail: querying linked data at scale. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD 2017, pp. 1603–1606. ACM, New York (2017).  https://doi.org/10.1145/3035918.3058731
  87. 87.
    Matono, A., Pahlevi, S.M., Kojima, I.: RDFCube: a P2P-based three-dimensional index for structural joins on distributed triple stores. In: Moro, G., Bergamaschi, S., Joseph, S., Morin, J.-H., Ouksel, A.M. (eds.) DBISP2P 2005-2006. LNCS, vol. 4125, pp. 323–330. Springer, Heidelberg (2007).  https://doi.org/10.1007/978-3-540-71661-7_31CrossRefGoogle Scholar
  88. 88.
    McMurry, J., et al.: Report on the scalability of semantic web integration in biomedbridges (2015).  https://doi.org/10.5281/zenodo.14071
  89. 89.
    Mishra, P., Eich, M.H.: Join processing in relational databases. ACM Comput. Surv. 24(1), 63–113 (1992).  https://doi.org/10.1145/128762.128764CrossRefGoogle Scholar
  90. 90.
    Montoya, G., Skaf-Molli, H., Hose, K.: The Odyssey approach for optimizing federated SPARQL queries. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 471–489. Springer, Cham (2017).  https://doi.org/10.1007/978-3-319-68288-4_28CrossRefGoogle Scholar
  91. 91.
    Montoya, G., Skaf-Molli, H., Molli, P., Vidal, M.-E.: Federated SPARQL queries processing with replicated fragments. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 36–51. Springer, Cham (2015).  https://doi.org/10.1007/978-3-319-25007-6_3CrossRefGoogle Scholar
  92. 92.
    Montoya, G., Skaf-Molli, H., Molli, P., Vidal, M.E.: Decomposing federated queries in presence of replicated fragments. Web Semant.: Sci. Serv. Agents World Wide Web 42(1) (2017). http://www.websemanticsjournal.org/index.php/ps/article/view/486
  93. 93.
    Montoya, G., Vidal, M.E., Acosta, M.: A heuristic-based approach for planning federated SPARQL queries. In: Proceedings of the Third International Conference on Consuming Linked Data, COLD 2012, vol. 905, pp. 63–74. CEUR-WS.org, Aachen (2012). http://dl.acm.org/citation.cfm?id=2887367.2887373
  94. 94.
    Morsey, M., Lehmann, J., Auer, S., Ngonga Ngomo, A.-C.: DBpedia SPARQL benchmark – performance assessment with real queries on real data. In: Aroyo, L., et al. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 454–469. Springer, Heidelberg (2011).  https://doi.org/10.1007/978-3-642-25073-6_29CrossRefGoogle Scholar
  95. 95.
    Mutharaju, R., Sakr, S., Sala, A., Hitzler, P.: D-SPARQ: distributed, scalable and efficient RDF query engine. In: ISWC (Posters & Demos) 2013, pp. 261–264 (2013)Google Scholar
  96. 96.
    Naacke, H., Amann, B., Curé, O.: SPARQL graph pattern processing with apache spark. In: Proceedings of the Fifth International Workshop on Graph Data-Management Experiences and Systems, GRADES 2017, pp. 1:1–1:7. ACM, New York (2017).  https://doi.org/10.1145/3078447.3078448
  97. 97.
    Nejdl, W., et al.: Super-peer-based routing and clustering strategies for RDF-based peer-to-peer networks. In: Proceedings of the 12th International Conference on World Wide Web, WWW 2003, pp. 536–543. ACM, New York (2003).  https://doi.org/10.1145/775152.775229
  98. 98.
    Norvig, P.: The semantic web and the semantics of the web: where does meaning come from? In: Proceedings of the 25th International Conference on World Wide Web, WWW 2016, p. 1. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva (2016)Google Scholar
  99. 99.
    Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig Latin: a not-so-foreign language for data processing. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, pp. 1099–1110. ACM, New York (2008).  https://doi.org/10.1145/1376616.1376726
  100. 100.
    Oren, E., Kotoulas, S., Anadiotis, G., Siebes, R., ten Teije, A., van Harmelen, F.: Marvin: distributed reasoning over large-scale Semantic Web data. Web Semant.: Sci. Serv. Agents World Wide Web 7(4) (2009). http://www.websemanticsjournal.org/index.php/ps/article/view/173
  101. 101.
    Osorio, M., Aranda, C.B.: Storage balancing in P2P based distributed RDF data stores. In: Proceedings of the Workshop on Decentralizing the Semantic Web 2017, Co-located with 16th International Semantic Web Conference (ISWC 2017) (2017). http://ceur-ws.org/Vol-1934/contribution-04.pdf
  102. 102.
    Owens, A., Seaborne, A., Gibbins, N., schraefel, M.: Clustered TDB: A Clustered Triple Store for Jena (2008). http://eprints.soton.ac.uk/266974/
  103. 103.
    Papailiou, N., Tsoumakos, D., Konstantinou, I., Karras, P., Koziris, N.: H2RDF+: an efficient data management system for big RDF graphs. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014, pp. 909–912. ACM, New York (2014).  https://doi.org/10.1145/2588555.2594535
  104. 104.
    Peng, P., Zou, L., Chen, L., Zhao, D.: Query workload-based RDF graph fragmentation and allocation. In: Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, France, 15–16 March 2016, pp. 377–388 (2016).  https://doi.org/10.5441/002/edbt.2016.35
  105. 105.
    Peng, P., Zou, L., Özsu, M.T., Chen, L., Zhao, D.: Processing SPARQL queries over distributed RDF graphs. VLDB J. 25(2), 243–268 (2016).  https://doi.org/10.1007/s00778-015-0415-0CrossRefGoogle Scholar
  106. 106.
    Penteado, R.R.M., Scroeder, R., Hara, C.S.: Exploring controlled RDF distribution. In: 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), pp. 160–167 (2016).  https://doi.org/10.1109/CloudCom.2016.0038
  107. 107.
    Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3), 16:1–16:45 (2009).  https://doi.org/10.1145/1567274.1567278CrossRefGoogle Scholar
  108. 108.
    Potter, A., Motik, B., Horrocks, I.: Querying distributed RDF graphs: the effects of partitioning. In: Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2014), pp. 29–44 (2014)Google Scholar
  109. 109.
    Potter, A., Motik, B., Nenov, Y., Horrocks, I.: Distributed RDF query answering with dynamic data exchange. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 480–497. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46523-4_29CrossRefGoogle Scholar
  110. 110.
    Prud’hommeaux, E., Harris, S., Seaborne, A.: SPARQL 1.1 Query Language. W3C Recommendation, W3C (2013). http://www.w3.org/TR/sparql11-query/
  111. 111.
    Przyjaciel-Zablocki, M., Schätzle, A., Lausen, G.: TriAL-QL: distributed processing of navigational queries. In: Proceedings of the 18th International Workshop on Web and Databases, WebDB 2015, pp. 48–54, ACM, New York (2015).  https://doi.org/10.1145/2767109.2767115
  112. 112.
    Przyjaciel-Zablocki, M., Schätzle, A., Lausen, G.: Querying semantic knowledge bases with SQL-on-Hadoop. In: Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, BeyondMR 2017, pp. 4:1–4:10. ACM, New York (2017).  https://doi.org/10.1145/3070607.3070610
  113. 113.
    Pujol, J.M., Erramilli, V., Rodriguez, P.: Divide and conquer: partitioning online social networks. CoRR abs/0905.4 (2009). http://arxiv.org/abs/0905.4918
  114. 114.
    Punnoose, R., Crainiceanu, A., Rapp, D.: Rya: a scalable RDF triple store for the clouds. In: 1st International Workshop on Cloud Intelligence, pp. 4:1–4:8. ACM (2012).  https://doi.org/10.1145/2347673.2347677
  115. 115.
    Quilitz, B., Leser, U.: Querying distributed RDF data sources with SPARQL. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 524–538. Springer, Heidelberg (2008).  https://doi.org/10.1007/978-3-540-68234-9_39CrossRefGoogle Scholar
  116. 116.
    Rohloff, K., Schantz, R.E.: High-performance, massively scalable distributed systems using the MapReduce software framework: the SHARD triple-store. In: Programming Support Innovations for Emerging Distributed Applications, PSI EtA 2010, pp. 4:1–4:5. ACM, New York (2010).  https://doi.org/10.1145/1940747.1940751
  117. 117.
    Russell, J.: Getting Started with Impala: Interactive SQL for Apache Hadoop. O’Reilly Media (2014). http://shop.oreilly.com/product/0636920033936.do
  118. 118.
    Sakr, S., Wylot, M., Mutharaju, R., Le Phuoc, D., Fundulaki, I., I.: Linked Data: Storing, Querying, and Reasoning. Springer, Cham (2018).  https://doi.org/10.1007/978-3-319-73515-3CrossRefGoogle Scholar
  119. 119.
    Saleem, M., Mehmood, Q., Ngonga Ngomo, A.-C.: FEASIBLE: a feature-based SPARQL benchmark generation framework. In: Arenas, M. (ed.) ISWC 2015. LNCS, vol. 9366, pp. 52–69. Springer, Cham (2015).  https://doi.org/10.1007/978-3-319-25007-6_4CrossRefGoogle Scholar
  120. 120.
    Saleem, M., Ngonga Ngomo, A.-C., Xavier Parreira, J., Deus, H.F., Hauswirth, M.: DAW: duplicate-aware federated query processing over the web of data. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 574–590. Springer, Heidelberg (2013).  https://doi.org/10.1007/978-3-642-41335-3_36CrossRefGoogle Scholar
  121.
    Schätzle, A., Przyjaciel-Zablocki, M., Berberich, T., Lausen, G.: S2X: graph-parallel querying of RDF with GraphX. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds.) Big-O(Q)/DMAH 2015. LNCS, vol. 9579, pp. 155–168. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-41576-5_12
  122.
    Schätzle, A., Przyjaciel-Zablocki, M., Lausen, G.: PigSPARQL: mapping SPARQL to Pig Latin. In: Proceedings of the International Workshop on Semantic Web Information Management, SWIM 2011, pp. 4:1–4:8. ACM, New York (2011).  https://doi.org/10.1145/1999299.1999303
  123.
    Schätzle, A., Przyjaciel-Zablocki, M., Neu, A., Lausen, G.: Sempala: interactive SPARQL query processing on Hadoop. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 164–179. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-11964-9_11
  124.
    Schätzle, A., Przyjaciel-Zablocki, M., Skilevic, S., Lausen, G.: S2RDF: RDF querying with SPARQL on Spark. PVLDB 9(10), 804–815 (2016). http://www.vldb.org/pvldb/vol9/p804-schaetzle.pdf
  125.
    Schmidt, M., Görlitz, O., Haase, P., Ladwig, G., Schwarte, A., Tran, T.: FedBench: a benchmark suite for federated semantic data query processing. In: Aroyo, L. (ed.) ISWC 2011. LNCS, vol. 7031, pp. 585–600. Springer, Heidelberg (2011).  https://doi.org/10.1007/978-3-642-25073-6_37
  126.
    Schmidt, M., Hornung, T., Meier, M., Pinkel, C., Lausen, G.: SP\(^2\)Bench: a SPARQL performance benchmark. In: de Virgilio, R., Giunchiglia, F., Tanca, L. (eds.) Semantic Web Information Management: A Model-Based Perspective, pp. 371–393. Springer, Heidelberg (2010).  https://doi.org/10.1007/978-3-642-04329-1_16
  127.
    Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: optimization techniques for federated query processing on linked data. In: Aroyo, L., et al. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011).  https://doi.org/10.1007/978-3-642-25073-6_38
  128.
    Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10 (2010).  https://doi.org/10.1109/MSST.2010.5496972
  129.
    Stein, R., Zacharias, V.: RDF on cloud number nine. In: Ceri, S., Valle, E.D., Hendler, J., Huang, Z. (eds.) Proceedings of the 4th Workshop on New Forms of Reasoning for the Semantic Web: Scalable & Dynamic. CEUR Workshop Proceedings (2010)
  130.
    Stutz, P., Verman, M., Fischer, L., Bernstein, A.: TripleRush: a fast and scalable triple store. In: 9th International Workshop on Scalable Semantic Web Knowledge Base Systems. CEUR Workshop Proceedings, Aachen (2013). http://ceur-ws.org
  131.
    Stutz, P., Bernstein, A., Cohen, W.: Signal/collect: graph algorithms for the (semantic) web. In: ISWC 2010. LNCS, vol. 6496, pp. 764–780. Springer, Heidelberg (2010).  https://doi.org/10.1007/978-3-642-17746-0_48
  132.
    Stutz, P., Paudel, B., Verman, M., Bernstein, A.: Random walk TripleRush: asynchronous graph querying and sampling. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, pp. 1034–1044. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva (2015).  https://doi.org/10.1145/2736277.2741687
  133.
    Wang, R., Chiu, K.: Optimizing distributed RDF triplestores via a locally indexed graph partitioning. In: 2012 41st International Conference on Parallel Processing (ICPP), pp. 259–268 (2012).  https://doi.org/10.1109/ICPP.2012.47
  134.
    Wang, X., Tiropanis, T., Davis, H.C.: LHD: optimising linked data query processing using parallelisation. In: Proceedings of the WWW 2013 Workshop on Linked Data on the Web, Rio de Janeiro, Brazil, 14 May 2013. http://ceur-ws.org/Vol-996/papers/ldow2013-paper-06.pdf
  135.
    White, T.: Hadoop: The Definitive Guide, 4th edn. O’Reilly, Beijing (2015). https://www.safaribooksonline.com/library/view/hadoop-the-definitive/9781491901687/
  136.
    Wilschut, A.N., Apers, P.M.G.: Dataflow query execution in a parallel main-memory environment. Distrib. Parallel Databases 1(1), 103–128 (1993).  https://doi.org/10.1007/BF01277522
  137.
    Wu, B., Zhou, Y., Yuan, P., Liu, L., Jin, H.: Scalable SPARQL querying using path partitioning. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 795–806 (2015).  https://doi.org/10.1109/ICDE.2015.7113334
  138.
    Wu, B., Zhou, Y., Yuan, P., Jin, H., Liu, L.: SemStore: a semantic-preserving distributed RDF triple store. In: CIKM 2014 (2014)
  139.
    Wylot, M., Cudré-Mauroux, P.: DiploCloud: efficient and scalable management of RDF data in the cloud. IEEE Trans. Knowl. Data Eng. 28(3), 659–674 (2016).  https://doi.org/10.1109/TKDE.2015.2499202
  140.
    Xu, Z., Chen, W., Gai, L., Wang, T.: SparkRDF: in-memory distributed RDF management framework for large-scale social data. In: Dong, X.L., Yu, X., Li, J., Sun, Y. (eds.) WAIM 2015. LNCS, vol. 9098, pp. 337–349. Springer, Cham (2015).  https://doi.org/10.1007/978-3-319-21042-1_27
  141.
    Yang, S., Yan, X., Zong, B., Khan, A.: Towards effective partition management for large graphs. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, pp. 517–528. ACM, New York (2012).  https://doi.org/10.1145/2213836.2213895
  142.
    Yang, T., Chen, J., Wang, X., Chen, Y., Du, X.: Efficient SPARQL query evaluation via automatic data partitioning. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds.) DASFAA 2013. LNCS, vol. 7826, pp. 244–258. Springer, Heidelberg (2013).  https://doi.org/10.1007/978-3-642-37450-0_18
  143.
    Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud 2010, p. 10. USENIX Association, Berkeley (2010). http://dl.acm.org/citation.cfm?id=1863103.1863113
  144.
    Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. PVLDB 6(4), 265–276 (2013).  https://doi.org/10.14778/2535570.2488333
  145.
    Zhang, X., Chen, L., Tong, Y., Wang, M.: EAGRE: towards scalable I/O efficient SPARQL query evaluation on the cloud. In: ICDE 2013, pp. 565–576 (2013).  https://doi.org/10.1109/ICDE.2013.6544856
  146.
    Zhang, X., Chen, L., Wang, M.: Towards efficient join processing over large RDF graph using MapReduce. In: Ailamaki, A., Bowers, S. (eds.) SSDBM 2012. LNCS, vol. 7338, pp. 250–259. Springer, Heidelberg (2012).  https://doi.org/10.1007/978-3-642-31235-9_16

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Institute for Web Science and Technologies, Universität Koblenz-Landau, Koblenz, Germany
  2. Web and Internet Science Group, University of Southampton, Southampton, UK
