Compiled Plans for In-Memory Path-Counting Queries

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8921)

Abstract

Dissatisfaction with relational databases for large-scale graph processing has motivated a new class of graph databases that offer fast graph processing but sacrifice the ability to express basic relational idioms. However, we hypothesize that these performance benefits stem from implementation details rather than from a fundamental limitation of the relational model. To evaluate this hypothesis, we are exploring code generation to produce fast in-memory algorithms and data structures for graph patterns that are inaccessible to conventional relational optimizers.

In this paper, we present preliminary results for this approach on path-counting queries, which include triangle counting as a special case. We compile Datalog queries into main-memory pipelined hash-join plans in C++, and show that the resulting programs easily outperform PostgreSQL on real graphs with different degrees of skew. We then produce analogous parallel programs for Grappa, a runtime system for distributed-memory architectures. Grappa is a good target for building a parallel query system because its shared-memory programming model and communication mechanisms provide productivity and performance when building communication-intensive applications. Our experiments suggest that Grappa programs using hash joins perform competitively with queries executed on a commercial parallel database. We find preliminary evidence that a code-generation approach simplifies the design of a query engine for graph analysis and improves performance over conventional relational databases.
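To make the idea of a pipelined hash-join plan concrete, the following is a minimal hand-written sketch of the kind of C++ program such a compiler might emit for two path-counting queries. It is not the authors' generated code: the names `Edge`, `AdjacencyIndex`, `build_index`, `edge_key`, `count_two_paths`, and `count_triangles` are hypothetical, the Datalog rules in the comments are illustrative assumptions, and the input is assumed to be a duplicate-free list of directed edges.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// A directed edge (src, dst). Assumption: the edge list contains no duplicates.
using Edge = std::pair<std::uint32_t, std::uint32_t>;

// Hash index on the source column of E: src -> all dst values.
using AdjacencyIndex =
    std::unordered_map<std::uint32_t, std::vector<std::uint32_t>>;

// Build side of the hash join: one pass over the edge relation.
AdjacencyIndex build_index(const std::vector<Edge>& edges) {
    AdjacencyIndex index;
    for (const Edge& e : edges) index[e.first].push_back(e.second);
    return index;
}

// Pack an edge into one 64-bit key for membership tests.
std::uint64_t edge_key(std::uint32_t src, std::uint32_t dst) {
    return (static_cast<std::uint64_t>(src) << 32) | dst;
}

// Illustrative rule: TwoPath(x, z) :- E(x, y), E(y, z).
// Scan E, probe the index with y, and count matches without
// materializing the join result.
std::uint64_t count_two_paths(const std::vector<Edge>& edges) {
    const AdjacencyIndex index = build_index(edges);
    std::uint64_t count = 0;
    for (const Edge& e : edges) {
        auto it = index.find(e.second);
        if (it != index.end()) count += it->second.size();
    }
    return count;
}

// Illustrative rule:
//   Triangle(x, y, z) :- E(x, y), E(y, z), E(z, x), x < y, x < z, y != z.
// The inequalities make each directed 3-cycle count exactly once.
std::uint64_t count_triangles(const std::vector<Edge>& edges) {
    const AdjacencyIndex index = build_index(edges);
    std::unordered_set<std::uint64_t> edge_set;
    for (const Edge& e : edges) edge_set.insert(edge_key(e.first, e.second));

    std::uint64_t count = 0;
    for (const Edge& e : edges) {                  // scan E(x, y)
        const std::uint32_t x = e.first, y = e.second;
        if (!(x < y)) continue;                    // selection pushed into the scan
        auto it = index.find(y);                   // first join: probe E(y, z)
        if (it == index.end()) continue;
        for (std::uint32_t z : it->second) {
            if (!(x < z) || y == z) continue;
            if (edge_set.count(edge_key(z, x))) ++count;  // second join: E(z, x)
        }
    }
    return count;
}
```

Each scanned tuple flows through both probes inside one loop nest, so no intermediate relation is materialized between the joins; that is the sense in which the plan is pipelined. Calling `count_triangles` on an edge list such as `{{1, 2}, {2, 3}, {3, 1}}` walks the single cycle once, starting from its smallest vertex.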

Notes

  1. All queries considered in this paper can be expressed with a single Datalog rule.

Author information

Corresponding author

Correspondence to Brandon Myers.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Myers, B., Hyrkas, J., Halperin, D., Howe, B. (2015). Compiled Plans for In-Memory Path-Counting Queries. In: Jagatheesan, A., Levandoski, J., Neumann, T., Pavlo, A. (eds) In Memory Data Management and Analysis. IMDM 2013, IMDM 2014. Lecture Notes in Computer Science, vol 8921. Springer, Cham. https://doi.org/10.1007/978-3-319-13960-9_3

  • DOI: https://doi.org/10.1007/978-3-319-13960-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13959-3

  • Online ISBN: 978-3-319-13960-9

  • eBook Packages: Computer Science, Computer Science (R0)