Efficient Relational Techniques for Processing Graph Queries

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Graphs are widely used for modeling complex data such as social networks, chemical compounds, protein interactions, and the semantic web. To understand and use any collection of graphs effectively, a graph database that efficiently supports elementary querying mechanisms is essential. For example, subgraph and supergraph queries are important types of graph queries with many practical applications. A primary challenge in answering graph queries is that pairwise comparisons of graphs are usually computationally hard problems. Relational database management systems (RDBMSs) have repeatedly been shown to host different types of data efficiently, including complex objects and XML data. RDBMSs derive much of their performance from sophisticated optimizer components that exploit physical properties specific to the relational model, such as sortedness, proper join ordering, and powerful indexing mechanisms. In this article, we study the problem of indexing and querying graph databases using the relational infrastructure. We present a purely relational framework for processing graph queries. This framework relies on building a layer of graph features knowledge that captures metadata and summary features of the underlying graph database. We describe different querying mechanisms that use this layer to achieve scalable performance for processing graph queries. Finally, we conduct an extensive set of experiments on real and synthetic datasets to demonstrate the efficiency and scalability of our techniques.
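
To make the idea of a relational feature layer concrete, the following is a minimal sketch and not the framework described in the article. It assumes a hypothetical schema (`vertices`, `edges`, `edge_features`, `query_features`) and one hypothetical summary feature: the number of edges per pair of endpoint labels in each graph. A subgraph query can then be pre-filtered in plain SQL, since any graph that contains the query must have at least as many edges of each label pair as the query does. Graphs that pass the filter are only candidates and would still need an exact containment check.

```python
import sqlite3
from collections import Counter

# Hypothetical schema: one row per vertex and per edge, keyed by graph id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (graph_id INTEGER, vertex_id INTEGER, label TEXT);
    CREATE TABLE edges (graph_id INTEGER, src INTEGER, dst INTEGER);
""")

# Toy data: graph id -> (labelled vertices, undirected edges).
toy_graphs = {
    1: ([(1, "C"), (2, "C"), (3, "O")], [(1, 2), (2, 3)]),
    2: ([(1, "C"), (2, "N")], [(1, 2)]),
    3: ([(1, "C"), (2, "O"), (3, "O")], [(1, 2), (1, 3)]),
}
for gid, (vertices, edges) in toy_graphs.items():
    conn.executemany("INSERT INTO vertices VALUES (?, ?, ?)",
                     [(gid, v, label) for v, label in vertices])
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                     [(gid, s, d) for s, d in edges])

# Feature layer (an assumption for illustration): per graph, count the edges
# for each unordered pair of endpoint labels. Two-argument MIN/MAX are SQLite
# scalar functions, used here to normalise the label order.
conn.executescript("""
    CREATE TABLE edge_features AS
    SELECT e.graph_id AS graph_id,
           MIN(a.label, b.label) AS label_lo,
           MAX(a.label, b.label) AS label_hi,
           COUNT(*) AS cnt
    FROM edges e
    JOIN vertices a ON a.graph_id = e.graph_id AND a.vertex_id = e.src
    JOIN vertices b ON b.graph_id = e.graph_id AND b.vertex_id = e.dst
    GROUP BY e.graph_id, label_lo, label_hi;
    CREATE TEMP TABLE query_features (label_lo TEXT, label_hi TEXT, cnt INTEGER);
""")


def candidate_graphs(conn, query_features):
    """Return ids of graphs whose features dominate those of the query.

    query_features maps (label, label) pairs to the number of such edges
    in the query graph. The result is a candidate set only.
    """
    # Normalise label order so ("O", "C") and ("C", "O") are one feature.
    merged = Counter()
    for (a, b), n in query_features.items():
        merged[(min(a, b), max(a, b))] += n

    conn.execute("DELETE FROM query_features")
    conn.executemany("INSERT INTO query_features VALUES (?, ?, ?)",
                     [(lo, hi, n) for (lo, hi), n in merged.items()])

    # Keep a graph only if it satisfies every feature required by the query.
    rows = conn.execute("""
        SELECT f.graph_id
        FROM edge_features f
        JOIN query_features q
          ON q.label_lo = f.label_lo
         AND q.label_hi = f.label_hi
         AND f.cnt >= q.cnt
        GROUP BY f.graph_id
        HAVING COUNT(*) = (SELECT COUNT(*) FROM query_features)
        ORDER BY f.graph_id
    """).fetchall()
    return [row[0] for row in rows]


# Query graph: a path C - C - O, i.e. one C-C edge and one C-O edge.
print(candidate_graphs(conn, {("C", "C"): 1, ("C", "O"): 1}))
```

Because the filter is an ordinary join with grouping, the database optimizer is free to choose join order and indexes for it, which is the kind of relational machinery the abstract refers to. The feature chosen here is deliberately simple; richer summaries would prune more graphs at the cost of a larger feature layer.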


Corresponding author

Correspondence to Sherif Sakr.

About this article

Cite this article

Sakr, S., Al-Naymat, G. Efficient Relational Techniques for Processing Graph Queries. J. Comput. Sci. Technol. 25, 1237–1255 (2010). https://doi.org/10.1007/s11390-010-9402-5

  • DOI: https://doi.org/10.1007/s11390-010-9402-5