Efficient Relational Techniques for Processing Graph Queries

Sakr, Sherif; Al-Naymat, Ghazi

doi:10.1007/s11390-010-9402-5

Regular Paper
Published: 03 November 2010

Volume 25, pages 1237–1255 (2010)

Journal of Computer Science and Technology

Sherif Sakr¹,² &
Ghazi Al-Naymat¹

Abstract

Graphs are widely used to model complex data such as social networks, chemical compounds, protein interactions and the semantic web. To understand and use any collection of graphs effectively, a graph database that efficiently supports elementary querying mechanisms is essential. For example, subgraph and supergraph queries are important types of graph queries with many practical applications. A primary challenge in answering graph queries is that pairwise comparisons of graphs are usually hard problems. Relational database management systems (RDBMSs) have repeatedly been shown to host different types of data, such as complex objects and XML data, efficiently. RDBMSs derive much of their performance from sophisticated optimizer components that exploit physical properties specific to the relational model, such as sortedness, proper join ordering and powerful indexing mechanisms. In this article, we study the problem of indexing and querying graph databases using the relational infrastructure. We present a purely relational framework for processing graph queries. The framework relies on building a layer of graph-feature knowledge that captures metadata and summary features of the underlying graph database. We describe different querying mechanisms that use this layer to achieve scalable performance for processing graph queries. Finally, we conduct an extensive set of experiments on real and synthetic datasets to demonstrate the efficiency and scalability of our techniques.
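
The general idea of answering a subgraph query on top of a relational engine can be illustrated with a small sketch. It is not the authors' framework: the table names (`vertices`, `edges`, `edge_features`), the choice of labeled-edge counts as the summary feature, and the helper functions are all assumptions made for illustration. The sketch stores a collection of small undirected graphs in SQLite, uses a feature table to filter candidate graphs, and then verifies a three-vertex path query with a self-join.

```python
import sqlite3
from collections import Counter


def build_db(graphs):
    """Load {gid: ({vid: label}, [(u, v), ...])} into in-memory tables.

    Hypothetical schema: vertices, edges, and a summary table that counts
    edges per (sorted) pair of endpoint labels in each graph.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vertices(gid INTEGER, vid INTEGER, label TEXT);
        CREATE TABLE edges(gid INTEGER, src INTEGER, dst INTEGER);
        CREATE TABLE edge_features(gid INTEGER, l1 TEXT, l2 TEXT, cnt INTEGER);
    """)
    for gid, (labels, edges) in graphs.items():
        conn.executemany("INSERT INTO vertices VALUES (?, ?, ?)",
                         [(gid, v, lab) for v, lab in labels.items()])
        # Store each undirected edge in both directions so joins are symmetric.
        both = [(gid, u, v) for u, v in edges] + [(gid, v, u) for u, v in edges]
        conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", both)
        feats = Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)
        conn.executemany("INSERT INTO edge_features VALUES (?, ?, ?, ?)",
                         [(gid, a, b, c) for (a, b), c in feats.items()])
    return conn


def candidates(conn, q_labels, q_edges):
    """Filter step: keep graphs whose feature counts cover the query's."""
    need = Counter(tuple(sorted((q_labels[u], q_labels[v]))) for u, v in q_edges)
    result = None
    for (a, b), c in need.items():
        rows = conn.execute(
            "SELECT gid FROM edge_features WHERE l1 = ? AND l2 = ? AND cnt >= ?",
            (a, b, c))
        gids = {r[0] for r in rows}
        result = gids if result is None else result & gids
    return result if result is not None else set()


def verify_path(conn, gid, la, lb, lc):
    """Verification step: does graph gid contain a path labeled la-lb-lc?"""
    row = conn.execute("""
        SELECT 1
        FROM edges e1
        JOIN edges e2 ON e2.gid = e1.gid AND e2.src = e1.dst
                     AND e2.dst <> e1.src
        JOIN vertices a ON a.gid = e1.gid AND a.vid = e1.src
        JOIN vertices b ON b.gid = e1.gid AND b.vid = e1.dst
        JOIN vertices c ON c.gid = e1.gid AND c.vid = e2.dst
        WHERE e1.gid = ? AND a.label = ? AND b.label = ? AND c.label = ?
        LIMIT 1""", (gid, la, lb, lc)).fetchone()
    return row is not None


if __name__ == "__main__":
    graphs = {
        1: ({1: "C", 2: "O", 3: "N"}, [(1, 2), (2, 3)]),          # path C-O-N
        2: ({1: "C", 2: "O", 3: "O", 4: "N"}, [(1, 2), (3, 4)]),  # two separate edges
        3: ({1: "C", 2: "C"}, [(1, 2)]),
    }
    conn = build_db(graphs)
    cand = candidates(conn, {0: "C", 1: "O", 2: "N"}, [(0, 1), (1, 2)])
    print(sorted(cand))                                                   # [1, 2]
    print([g for g in sorted(cand) if verify_path(conn, g, "C", "O", "N")])  # [1]
```

In this toy data, graph 2 passes the feature filter because it has the right labeled edges, but fails verification because those edges do not share a vertex. The filter only prunes graphs that cannot match; a join-based check is still needed to confirm the survivors.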

Author information

Authors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Sherif Sakr (Member, ACM) & Ghazi Al-Naymat (Member, ACM)
Managing Complexity Group, National ICT Australia (NICTA), ATP, Sydney, Australia
Sherif Sakr (Member, ACM)

Corresponding author

Correspondence to Sherif Sakr.

About this article

Cite this article

Sakr, S., Al-Naymat, G. Efficient Relational Techniques for Processing Graph Queries. J. Comput. Sci. Technol. 25, 1237–1255 (2010). https://doi.org/10.1007/s11390-010-9402-5

Received: 22 February 2010
Revised: 19 August 2010
Published: 03 November 2010
Issue Date: November 2010
DOI: https://doi.org/10.1007/s11390-010-9402-5
