The VLDB Journal, Volume 25, Issue 2, pp 125–150

NScale: neighborhood-centric large-scale graph analytics in the cloud

Regular Paper

Abstract

There is an increasing interest in executing complex analyses over large graphs, many of which require processing a large number of multi-hop neighborhoods or subgraphs. Examples include ego network analysis, motif counting, finding social circles, personalized recommendations, link prediction, anomaly detection, analyzing influence cascades, and others. These tasks are not well served by existing vertex-centric graph processing frameworks, where user programs are only able to directly access the state of a single vertex at a time, resulting in high communication, scheduling, and memory overheads in executing such tasks. Further, most existing graph processing frameworks ignore the challenges in extracting the relevant portions of the graph that an analysis task is interested in, and loading those onto distributed memory. This paper introduces NScale, a novel end-to-end graph processing framework that enables the distributed execution of complex subgraph-centric analytics over large-scale graphs in the cloud. NScale enables users to write programs at the level of subgraphs rather than at the level of vertices. Unlike most previous graph processing frameworks, which apply the user program to the entire graph, NScale allows users to declaratively specify subgraphs of interest. Our framework includes a novel graph extraction and packing (GEP) module that utilizes a cost-based optimizer to partition and pack the subgraphs of interest into memory on as few machines as possible. The distributed execution engine then takes over and runs the user program in parallel on those subgraphs, restricting the scope of the execution appropriately, and utilizes novel techniques to minimize memory consumption by exploiting overlaps among the subgraphs. We present a comprehensive empirical evaluation comparing against three state-of-the-art systems, namely Giraph, GraphLab, and GraphX, on several real-world datasets and a variety of analysis tasks. 
Our experimental results show orders-of-magnitude improvements in performance and drastic reductions in the cost of analytics compared to vertex-centric approaches.
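The abstract describes graph extraction and packing (GEP) only at a high level. The following Python sketch illustrates the general idea under stated assumptions. It assumes the subgraphs of interest are k-hop neighborhoods, that each machine can hold a fixed number of vertices, and that vertices shared by subgraphs placed on the same machine are stored only once. Under those assumptions, it extracts each neighborhood by breadth-first search and packs the neighborhoods with a simple greedy set-bin-packing heuristic. The function names, the capacity model, and the greedy rule are all hypothetical. This is not NScale's cost-based optimizer or its actual packing algorithm.

```python
from collections import deque


def k_hop_neighborhood(adj, center, k):
    """Return the set of vertices within k hops of `center`, found by BFS.

    `adj` maps each vertex to an iterable of its neighbors (hypothetical input format).
    """
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        v, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for u in adj.get(v, ()):
            if u not in seen:
                seen.add(u)
                frontier.append((u, depth + 1))
    return seen


def pack_subgraphs(subgraphs, capacity):
    """Greedy set bin packing (hypothetical heuristic, not NScale's optimizer).

    `subgraphs` maps a subgraph id to its vertex set. Each bin models one machine
    and stores the union of the vertex sets assigned to it, so overlapping vertices
    cost nothing extra. Returns a list of bins, each [vertex_set, [subgraph ids]].
    """
    bins = []
    # Place larger subgraphs first, a common ordering for bin-packing heuristics.
    order = sorted(subgraphs, key=lambda sid: len(subgraphs[sid]), reverse=True)
    for sid in order:
        verts = subgraphs[sid]
        if len(verts) > capacity:
            raise ValueError(f"subgraph {sid!r} does not fit on a single machine")
        best, best_growth = None, None
        for b in bins:
            growth = len(verts - b[0])  # new vertices this bin would have to store
            fits = len(b[0]) + growth <= capacity
            if fits and (best_growth is None or growth < best_growth):
                best, best_growth = b, growth
        if best is None:  # no open bin can absorb it: open a new machine
            best = [set(), []]
            bins.append(best)
        best[0].update(verts)
        best[1].append(sid)
    return bins


# Toy undirected graph: a small cluster {1..5} and a separate pair {6, 7}.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4], 6: [7], 7: [6]}
ego_nets = {v: k_hop_neighborhood(adj, v, 1) for v in adj}
for vertex_set, members in pack_subgraphs(ego_nets, capacity=5):
    print(sorted(vertex_set), members)
```

On this toy graph the seven 1-hop ego networks contain 19 vertex copies in total. The greedy packing stores only 7 vertices across 2 machines, because overlapping neighborhoods share storage on the same machine. This is the kind of saving the abstract attributes to exploiting overlaps among subgraphs.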

Keywords

Graph analytics; Cloud computing; Egocentric analysis; Subgraph extraction; Set bin packing; Data co-location; Social networks

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. University of Maryland, College Park, United States