Advertisement

Frontiers of Computer Science

, Volume 12, Issue 5, pp 966–983 | Cite as

Using partial evaluation in holistic subgraph search

  • Peng Peng
  • Lei Zou
  • Zhenqin Du
  • Dongyan Zhao
Research Article

Abstract

Because of its wide application, the subgraph matching problem has been studied extensively during the past decade. However, most existing solutions assume that a data graph is a vertex/edge-labeled graph (i.e., each vertex/edge has a simple label). These solutions build structural indices by considering the vertex labels. However, some real graphs contain rich-content vertices such as user profiles in social networks and HTML pages on the World Wide Web. In this study, we consider the problem of subgraph matching using a more general scenario. We build a structural index that does not depend on any vertex content. Based on the index, we design a holistic subgraph matching algorithm that considers the query graph as a whole and finds one match at a time. In order to further improve efficiency, we propose a “partial evaluation and assembly” framework to find subgraph matches over large graphs. Last but not least, our index has light maintenance overhead. Therefore, our method can work well on dynamic graphs. Extensive experiments on real graphs show that our method outperforms the state-of-the-art algorithms.

Keywords

subgraph search holistic approach partial evaluation and assembly 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Notes

Acknowledgements

This work was partially supported by the National Key Research and Development Program of China (2016YFB1000603), Fundamental Research Funds for the Central Universities, the National Natural Science Foundation of China (Grant Nos. 61622201, 61472131, and 61272546), and Science and Technology Key Projects of Hunan Province (2015 TP1004).

Supplementary material

11704_2016_5522_MOESM1_ESM.ppt (202 kb)
Using partial evaluation in holistic subgraph search

References

  1. 1.
    Zhang S J, Li S R, Yang J. GADDI: distance index based subgraph matching in biological networks. In: Proceedings of the 12th International Conference on Extending Database Technology. 2009, 192–203CrossRefGoogle Scholar
  2. 2.
    Watts D J, Dodds P S, Newman M E J. Identity and search in social networks. Science, 2002, 296(5571): 1302–1305CrossRefGoogle Scholar
  3. 3.
    Stocker M, Seaborne A, Bernstein A, Kiefer C, Reynolds D. SPARQL basic graph pattern optimization using selectivity estimation. In: Proceedings of the 17th International Conference on World Wide Web. 2008, 595–604Google Scholar
  4. 4.
    Cohen E, Halperin E, Kaplan H, Zwick U. Reachability and distance queries via 2-hop labels. SIAM Journal on Computing, 2003, 32(5): 1338–1355MathSciNetCrossRefzbMATHGoogle Scholar
  5. 5.
    Chan E P F, Lim H. Optimization and evaluation of shortest path queries. The VLDB Journal, 2007, 16(3): 343–369CrossRefGoogle Scholar
  6. 6.
    Jing N, Huang Y W, Rundensteiner E A. Hierarchical encoded path views for path query processing: an optimal model and its performance evaluation. IEEE Transactions on Knowledge and Data Engineering, 1998, 10(3): 409–432CrossRefGoogle Scholar
  7. 7.
    Cheng J F, Yu J X. On-line exact shortest distance query processing. In: Proceedings of the 12th International Conference on Extending Database Technology. 2009, 481–492CrossRefGoogle Scholar
  8. 8.
    Wang H X, He H, Yang J, Yu P S, Yu J X. Dual labeling: answering graph reachability queries in constant time. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 75Google Scholar
  9. 9.
    Trißl S, Leser U. Fast and practical indexing and querying of very large graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2007, 845–856Google Scholar
  10. 10.
    Chen Y J, Chen Y B. An efficient algorithm for answering graph reachability queries. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 893–902Google Scholar
  11. 11.
    Shasha D, Wang J T L, Giugno R. Algorithmics and applications of tree and graph searching. In: Proceedings of the 21st ACM SIGACTSIGMOD-SIGART Symposium on Principles of Database Systems. 2002, 39–52Google Scholar
  12. 12.
    Yan X F, Yu P S, Han J W. Graph indexing: a frequent structure-based approach. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2004, 335–346Google Scholar
  13. 13.
    He H H, Singh A K. Graphs-at-a-time: query language and access methods for graph databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 405–418Google Scholar
  14. 14.
    Zhang S J, Hu M, Yang J. Tree Pi: a novel graph indexing method. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 966–975Google Scholar
  15. 15.
    Zhao P X, Yu J X, Yu P S. Graph indexing: tree + delta >= graph. In: Proceedings of the 33rd International Conference on Very Large Data Bases. 2007, 938–949Google Scholar
  16. 16.
    He H H, Singh A K. Closure-tree: an index structure for graph queries. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 38Google Scholar
  17. 17.
    Tian Y Y, Patel J M. TALE: a tool for approximate large graph matching. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 963–972Google Scholar
  18. 18.
    Zhao P X, Han J W. On graph query optimization in large networks. Proceedings of the VLDB Endowment, 2010, 3(1-2): 340–351CrossRefGoogle Scholar
  19. 19.
    Peng P, Zou L, Chen L, Lin X M, Zhao D Y. Subgraph search over massive disk resident graphs. In: Proceedings of the 23rd International Conference on Scientific and Statistical Database Management. 2011, 312–321CrossRefGoogle Scholar
  20. 20.
    Sakr S, Elnikety S, He Y X. G-SPARQL: a hybrid engine for querying large attributed graphs. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 335–344Google Scholar
  21. 21.
    Sun Z, Wang H Z, Wang H X, Shao B, Li J Z. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment, 2012, 5(9): 788–799CrossRefGoogle Scholar
  22. 22.
    Han W S, Lee J, Lee J H. Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2013, 337–348Google Scholar
  23. 23.
    Zhu K, Zhang Y, Lin X M, Zhu G P, Wang W. NOVA: a novel and efficient framework for finding subgraph isomorphism mappings in large graphs. In: Proceedings of the 15th International Conference on Database Systems for Advanced Applications. 2010, 140–154CrossRefGoogle Scholar
  24. 24.
    Zou L, Chen L, Özsu M T. Distance Join: pattern match query in a large graph database. Proceedings of the VLDB Endowment, 2009, 2(1): 886–897CrossRefGoogle Scholar
  25. 25.
    Yu J X, Zeng X G, Cheng J F. Top-k graph pattern matching over large graphs. In: Proceedings of IEEE International Conference on Data Engineering. 2013, 1033–1044Google Scholar
  26. 26.
    Bruno N, Koudas N, Srivastava D. Holistic twig joins: optimal XML pattern matching. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2002, 310–321Google Scholar
  27. 27.
    Jiang H F, Wang W, Lu H J, Yu J X. Holistic twig joins on indexed XML documents. In: Proceedings of the 29th International Conference on Very Large Data Bases. 2003, 273–284Google Scholar
  28. 28.
    Karypis G, Kumar V. Analysis of multilevel graph partitioning. In: Proceedings of ACM/IEEE Conference on Supercomputing. 1995, 29Google Scholar
  29. 29.
    Wang L, Xiao Y H, Shao B, Wang H X. How to partition a billion-node graph. In: Proceedings of the 30th International Conference on Data Engineering. 2014, 568–579Google Scholar
  30. 30.
    Tretyakov K, Armas-Cervantes A, García-Banuelos L, Vilo J, Dumas M. Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs. In: Proceedings of the 20th ACM Conference on Information and Knowledge Management. 2011, 1785–1794Google Scholar
  31. 31.
    Hoffart J, Suchanek F M, Berberich K, Weikum G. YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence, 2013, 194: 28–61MathSciNetCrossRefzbMATHGoogle Scholar
  32. 32.
    Kim J, Shin H, Han W S, Hong S, Chafi H. Taming subgraph isomorphism for RDF query processing. Proceedings of the VLDB Endowment, 2015, 8(11): 1238–1249CrossRefGoogle Scholar
  33. 33.
    Ullmann J R. An algorithm for subgraph isomorphism. Journal of the ACM, 1976, 23(1): 31–42MathSciNetCrossRefGoogle Scholar
  34. 34.
    Cordella L P, Foggia P, Sansone C, Vento M. A (sub) graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(10): 1367–1372CrossRefGoogle Scholar
  35. 35.
    Peng P, Zou L, Chen L, Lin X M, Zhao D Y. Answering subgraph queries over massive disk resident graphs. World Wide Web: Internet and Web Information Systems, 2016, 19(3): 417–448CrossRefGoogle Scholar
  36. 36.
    Lee J, Han W S, Kasperovics R, Lee J H. An in-depth comparison of subgraph isomorphism algorithms in graph databases. Proceedings of the VLDB Endowment, 2012, 6(2): 133–144CrossRefGoogle Scholar
  37. 37.
    Ren X G, Wang J H. Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proceedings of the VLDB Endowment, 2015, 8(5): 617–628CrossRefGoogle Scholar
  38. 38.
    Afrati F N, Fotakis D, Ullman J D. Enumerating subgraph instances using Map-Reduce. In: Proceedings of the 29th International Conference on Data Engineering. 2013, 62–73Google Scholar
  39. 39.
    Fan W F, Wang X, Wu Y H, Deng D. Distributed graph simulation: impossibility and possibility. Proceedings of the VLDB Endowment, 2014, 7(12): 1083–1094CrossRefGoogle Scholar
  40. 40.
    Fan W F, Li J Z, Ma S, Tang N, Wu Y H, Wu Y P. Graph pattern matching: from intractable to polynomial time. Proceedings of the VLDB Endowment, 2010, 3(1): 264–275CrossRefGoogle Scholar
  41. 41.
    Fan W F, Wang X, Wu Y H. Diversified top-k graph pattern matching. Proceedings of the VLDB Endowment, 2013, 6(13): 1510–1521CrossRefGoogle Scholar
  42. 42.
    Ma S, Cao Y, Huai J P, Wo T Y. Distributed graph pattern matching. In: Proceedings of the 21st World Wide Web Conference. 2012, 949–958Google Scholar
  43. 43.
    Khan A, Li N, Yan X F, Guan Z Y, Chakraborty S, Tao S. Neighborhood based fast graph search in large networks. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2011, 901–912Google Scholar
  44. 44.
    Buneman P, Cong G, Fan W F, Kementsietsidis A. Using partial evaluation in distributed query evaluation. In: Proceedings of the 32nd International Conference on Very Large Data Bases. 2006, 211–222Google Scholar
  45. 45.
    Cong G, Fan W F, Kementsietsidis A. Distributed query evaluation with performance guarantees. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2007, 509–520Google Scholar
  46. 46.
    Cong G, Fan W F, Kementsietsidis A, Li J Z, Liu X M. Partial evaluation for distributed XPath query processing and beyond. ACM Transactions on Database Systems, 2012, 37(4): 32CrossRefGoogle Scholar
  47. 47.
    Fan W F, Wang X, Wu Y H. Performance guarantees for distributed reachability queries. Proceedings of the VLDB Endowment, 2012, 5(11): 1304–1315CrossRefGoogle Scholar

Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  1. 1.Big Data Provincial Key LaboratoryHunan UniversityChangshaChina
  2. 2.Institute of Computer Science and TechnologyPeking UniversityBeijingChina

Personalised recommendations