Cloud Computing, pp. 173–210

Part of the Computer Communications and Networks book series (CCN)

High-Performance Graph Data Management and Mining in Cloud Environments with X10

Chapter

Abstract

Large-scale graph data management and mining in cloud environments has been widely discussed in recent years. This chapter discusses how X10, a Partitioned Global Address Space (PGAS) language, has been applied to programming data-intensive systems. Specifically, we focus on the problem of building scalable systems for storing and processing large-scale graph data on HPC clouds with X10. The chapter first discusses large-scale graph processing with X10. Next, it describes our experience designing and implementing a distributed graph database engine called Acacia with X10, with particular focus on Acacia's RDF extension. Finally, it describes how XGDBench, a graph database benchmarking framework, was developed to analyze the performance of graph database servers. Overall, the chapter reports our experience implementing such graph-based systems and frameworks with X10.
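
In a PGAS model such as X10's, memory is split across places (X10's unit of locality), and a large graph is typically stored so that each place owns a slice of the vertices and their edges. As a rough illustration only, the Python sketch below models places as in-process partitions. The class name `PartitionedGraph`, the modulo placement rule, and the level-synchronous BFS are assumptions chosen for illustration; they are not taken from Acacia or any other system described in this chapter.

```python
from collections import defaultdict


class PartitionedGraph:
    """Toy model of a PGAS-style graph (hypothetical): each place owns a disjoint set of vertices."""

    def __init__(self, num_places):
        self.num_places = num_places
        # One adjacency map per place; a place stores only edges whose source vertex it owns.
        self.places = [defaultdict(list) for _ in range(num_places)]

    def owner(self, vertex):
        # Assumed placement rule: modulo hashing of integer vertex IDs.
        return vertex % self.num_places

    def add_edge(self, src, dst):
        self.places[self.owner(src)][src].append(dst)

    def out_degree(self, vertex):
        # A lookup touches only the owning place, analogous to a remote access in X10.
        return len(self.places[self.owner(vertex)].get(vertex, []))

    def local_edge_counts(self):
        # Each place counts its own edges independently (the parallelizable phase).
        return [sum(len(adj) for adj in place.values()) for place in self.places]

    def total_edges(self):
        # Partial per-place results are combined in a final reduction.
        return sum(self.local_edge_counts())

    def bfs(self, source):
        """Level-synchronous BFS: each level, every place expands the frontier vertices it owns."""
        dist = {source: 0}
        frontier = [source]
        level = 0
        while frontier:
            level += 1
            # Group frontier vertices by owning place, like one message batch per place.
            buckets = [[] for _ in range(self.num_places)]
            for v in frontier:
                buckets[self.owner(v)].append(v)
            next_frontier = []
            for p, bucket in enumerate(buckets):
                for v in bucket:
                    for w in self.places[p].get(v, []):
                        if w not in dist:
                            dist[w] = level
                            next_frontier.append(w)
            frontier = next_frontier
        return dist
```

For example, with four places and the edges (0,1), (0,2), (1,3), (2,3), (3,4), (5,6), `bfs(0)` reaches vertices 0 through 4, while each place holds only the edges of the vertices it owns. Real PGAS systems run the per-place loops in parallel and exchange frontier batches between places; this sketch runs them sequentially.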

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. WSO2, Inc., Mountain View, USA
  2. University of Moratuwa, Moratuwa, Sri Lanka
  3. T.J. Watson Research Center, IBM, New York, USA
  4. Barcelona Supercomputing Center, Barcelona, Spain
  5. University of Tokyo, Tokyo, Japan
