Abstract
Graphs are widely used for modeling complicated data such as social networks, chemical compounds, protein interactions, and the semantic web. To effectively understand and utilize any collection of graphs, a graph database that efficiently supports elementary querying mechanisms is essential. For example, subgraph and supergraph queries are important types of graph queries with many practical applications. A primary challenge in computing the answers to graph queries is that pairwise comparisons of graphs are usually computationally hard problems. Relational database management systems (RDBMSs) have repeatedly been shown to efficiently host different types of data, such as complex objects and XML data. RDBMSs derive much of their performance from sophisticated optimizer components that exploit physical properties specific to the relational model, such as sortedness, proper join ordering, and powerful indexing mechanisms. In this article, we study the problem of indexing and querying graph databases using the relational infrastructure. We present a purely relational framework for processing graph queries. This framework relies on building a layer of graph features knowledge that captures metadata and summary features of the underlying graph database. We describe different querying mechanisms that use this layer to achieve scalable performance for processing graph queries. Finally, we conduct an extensive set of experiments on real and synthetic datasets to demonstrate the efficiency and scalability of our techniques.
Cite this article
Sakr, S., Al-Naymat, G. Efficient Relational Techniques for Processing Graph Queries. J. Comput. Sci. Technol. 25, 1237–1255 (2010). https://doi.org/10.1007/s11390-010-9402-5
DOI: https://doi.org/10.1007/s11390-010-9402-5