Journal of Computer Science and Technology, Volume 25, Issue 6, pp 1237–1255

Efficient Relational Techniques for Processing Graph Queries

Regular Paper

Abstract

Graphs are widely used for modeling complicated data such as social networks, chemical compounds, protein interactions and the semantic web. To effectively understand and utilize any collection of graphs, a graph database that efficiently supports elementary querying mechanisms is essential. For example, subgraph and supergraph queries are important types of graph queries which have many applications in practice. A primary challenge in computing the answers of graph queries is that pair-wise comparisons of graphs are usually computationally hard problems. Relational database management systems (RDBMSs) have repeatedly been shown to be able to efficiently host different types of data such as complex objects and XML data. RDBMSs derive much of their performance from sophisticated optimizer components which make use of physical properties that are specific to the relational model such as sortedness, proper join ordering and powerful indexing mechanisms. In this article, we study the problem of indexing and querying graph databases using the relational infrastructure. We present a purely relational framework for processing graph queries. This framework relies on building a layer of graph feature knowledge which captures metadata and summary features of the underlying graph database. We describe different querying mechanisms which make use of this layer to achieve scalable performance for processing graph queries. Finally, we conduct an extensive set of experiments on real and synthetic datasets to demonstrate the efficiency and scalability of our techniques.

Keywords

graph database; graph query; subgraph query; supergraph query

Copyright information

© Springer 2010

Authors and Affiliations

  1. School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
  2. Managing Complexity Group, National ICT Australia (NICTA), ATP, Sydney, Australia
