Compiled Plans for In-Memory Path-Counting Queries

Myers, Brandon; Hyrkas, Jeremy; Halperin, Daniel; Howe, Bill

doi:10.1007/978-3-319-13960-9_3

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8921)

Abstract

Dissatisfaction with relational databases for large-scale graph processing has motivated a new class of graph databases that offer fast graph processing but sacrifice the ability to express basic relational idioms. However, we hypothesize that the performance benefits stem from implementation details rather than from a fundamental limitation of the relational model. To evaluate this hypothesis, we are exploring code generation to produce fast in-memory algorithms and data structures for graph patterns that are inaccessible to conventional relational optimizers.

In this paper, we present preliminary results for this approach on path-counting queries, which include triangle counting as a special case. We compile Datalog queries into main-memory pipelined hash-join plans in C++, and show that the resulting programs easily outperform PostgreSQL on real graphs with different degrees of skew. We then produce analogous parallel programs for Grappa, a runtime system for distributed memory architectures. Grappa is a good target for building a parallel query system because its shared-memory programming model and communication mechanisms provide productivity and performance when building communication-intensive applications. Our experiments suggest that Grappa programs using hash joins are competitive in performance with queries executed on a commercial parallel database. We find preliminary evidence that a code generation approach simplifies the design of a query engine for graph analysis and improves performance over conventional relational databases.
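To make the idea of a pipelined hash-join plan concrete, the following is a minimal single-threaded sketch of what such a plan could look like in C++ for two path-counting queries. It is not the code the paper's compiler generates. The Datalog rules in the comments, the names `Edge`, `HashIndex`, `build_index`, `count_two_paths`, and `count_triangles`, and the choice of `std::unordered_map` as the hash table are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical representation: a directed edge (src, dst).
using Edge = std::pair<std::int64_t, std::int64_t>;
// Hypothetical hash index: src -> list of dst values.
using HashIndex = std::unordered_map<std::int64_t, std::vector<std::int64_t>>;

// Build phase of the hash join: index the edge relation on its source column.
HashIndex build_index(const std::vector<Edge>& edges) {
    HashIndex index;
    for (const Edge& e : edges) {
        index[e.first].push_back(e.second);
    }
    return index;
}

// Assumed rule: TwoPath(x, z) :- E(x, y), E(y, z).
// Counts directed two-hop paths with one scan and one probe.
std::uint64_t count_two_paths(const std::vector<Edge>& edges) {
    const HashIndex index = build_index(edges);
    std::uint64_t count = 0;
    for (const Edge& e : edges) {                  // scan E(x, y)
        const auto yz = index.find(e.second);      // probe E(y, z)
        if (yz != index.end()) {
            count += yz->second.size();
        }
    }
    return count;
}

// Assumed rule: Triangle(x, y, z) :- E(x, y), E(y, z), E(z, x).
// The two joins are pipelined: each partial match flows straight into the
// next probe, so no intermediate relation is materialized.
std::uint64_t count_triangles(const std::vector<Edge>& edges) {
    const HashIndex index = build_index(edges);
    std::uint64_t count = 0;
    for (const Edge& e : edges) {                  // scan E(x, y)
        const auto yz = index.find(e.second);      // probe E(y, z)
        if (yz == index.end()) continue;
        for (const std::int64_t z : yz->second) {
            const auto zw = index.find(z);         // probe E(z, w)
            if (zw == index.end()) continue;
            for (const std::int64_t w : zw->second) {
                if (w == e.first) ++count;         // keep only w == x
            }
        }
    }
    return count;
}
```

Under these rules a directed 3-cycle is counted once per rotation, so the edge list {(1,2), (2,3), (3,1), (3,4)} yields 3 triangles and 4 two-hop paths. A real plan would likely deduplicate rotations and choose join orders based on skew; this sketch does neither.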

Notes

1. All queries considered in this paper can be expressed with a single Datalog rule.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
Brandon Myers, Jeremy Hyrkas, Daniel Halperin & Bill Howe

Corresponding author

Correspondence to Brandon Myers.

Editor information

Editors and Affiliations

Samsung Corporation, San Jose, California, USA
Arun Jagatheesan
Microsoft Corporation, Redmond, Washington, USA
Justin Levandoski
Technische Universität München, Garching, Germany
Thomas Neumann
Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Andrew Pavlo

About this paper

Cite this paper

Myers, B., Hyrkas, J., Halperin, D., Howe, B. (2015). Compiled Plans for In-Memory Path-Counting Queries. In: Jagatheesan, A., Levandoski, J., Neumann, T., Pavlo, A. (eds) In Memory Data Management and Analysis. IMDM 2013, IMDM 2014. Lecture Notes in Computer Science, vol 8921. Springer, Cham. https://doi.org/10.1007/978-3-319-13960-9_3

DOI: https://doi.org/10.1007/978-3-319-13960-9_3
Published: 14 January 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13959-3
Online ISBN: 978-3-319-13960-9
eBook Packages: Computer Science, Computer Science (R0)
