Views and Transactional Storage for Large Graphs

  • Michael M. Lee
  • Indrajit Roy
  • Alvin AuYoung
  • Vanish Talwar
  • K. R. Jayaram
  • Yuanyuan Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8275)

Abstract

A growing number of applications store and analyze graph-structured data. These applications impose challenging infrastructure demands due to a need for scalable, high-throughput, and low-latency graph processing. Existing state-of-the-art storage systems and data processing systems are limited in at least one of these dimensions, and simply layering these technologies is inadequate.

We present Concerto, a graph store based on distributed, in-memory data structures. In addition to enabling efficient graph traversals by co-locating graph nodes and associated edges where possible, Concerto provides transactional updates while scaling to hundreds of nodes. Concerto introduces graph views to denote sub-graphs on which user-defined functions can be invoked. Using graph views, programmers can perform event-driven analysis and dynamically optimize application performance. Our results show that Concerto is significantly faster than in-memory MySQL, in-memory Neo4j, and GemFire for graph insertions as well as graph queries. We demonstrate the utility of Concerto’s features in the design of two real-world applications: real-time incident impact analysis on a road network and targeted advertising in a social network.
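As a rough illustration of the ideas named above, the following single-process Python sketch keeps each node's outgoing edges with the node, applies a batch of edge insertions all-or-nothing, and invokes user-defined callbacks registered on a view (a chosen set of nodes) whenever an update touches that view. It is not Concerto's API: `TinyGraphStore`, `GraphView`, `create_view`, `on_update`, and `add_edges` are hypothetical names, and distribution, concurrency control, and persistence are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class GraphView:
    """Hypothetical view: a set of nodes plus callbacks run when it changes."""
    nodes: Set[str]
    callbacks: List[Callable[[Edge], None]] = field(default_factory=list)

    def on_update(self, fn: Callable[[Edge], None]) -> None:
        self.callbacks.append(fn)


class TinyGraphStore:
    """Single-process stand-in; each node is stored with its outgoing edges."""

    def __init__(self) -> None:
        self.adj: Dict[str, Set[str]] = {}
        self.views: List[GraphView] = []

    def create_view(self, nodes: Iterable[str]) -> GraphView:
        view = GraphView(set(nodes))
        self.views.append(view)
        return view

    def add_edges(self, edges: List[Edge]) -> None:
        """Apply every edge or none, then notify the views that were touched."""
        # Stage changes on a copy so a failure leaves the store untouched.
        staged = {node: set(nbrs) for node, nbrs in self.adj.items()}
        for src, dst in edges:
            if src == dst:
                raise ValueError("self-loops are rejected in this sketch")
            staged.setdefault(src, set()).add(dst)
            staged.setdefault(dst, set())
        self.adj = staged  # commit point
        # Event-driven step: run callbacks only for views containing an endpoint.
        for edge in edges:
            for view in self.views:
                if edge[0] in view.nodes or edge[1] in view.nodes:
                    for fn in view.callbacks:
                        fn(edge)

    def neighbors(self, node: str) -> List[str]:
        return sorted(self.adj.get(node, ()))


store = TinyGraphStore()
view = store.create_view({"a", "b"})
seen: List[Edge] = []
view.on_update(seen.append)          # user-defined function attached to the view
store.add_edges([("a", "c"), ("x", "y")])
print(seen)                          # only the edge touching the view is reported
```

Copying the whole adjacency map per batch is only reasonable at toy scale; it is used here because it makes the all-or-nothing behaviour easy to see.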

Keywords

Graphs, transactions, views, event-driven analysis

Copyright information

© IFIP International Federation for Information Processing 2013

Authors and Affiliations

  • Michael M. Lee (1)
  • Indrajit Roy (2)
  • Alvin AuYoung (2)
  • Vanish Talwar (2)
  • K. R. Jayaram (2)
  • Yuanyuan Zhou (1)

  1. University of California, San Diego, USA
  2. HP Labs, USA
