Frontiers of Computer Science, Volume 6, Issue 3, pp 313–338

Adding regular expressions to graph reachability and pattern queries

  • Wenfei Fan
  • Jianzhong Li
  • Shuai Ma
  • Nan Tang
  • Yinghui Wu
Research Article

Abstract

It is increasingly common to find graphs in which edges are of different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns in which an edge is specified with a regular expression of a certain form, expressing the connectivity of a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries in quadratic time, as for their traditional counterparts. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) We verify the effectiveness and efficiency of these algorithms experimentally, using both real-life and synthetic data.
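As a rough illustration of what a reachability query with an edge-type regular expression asks, the following Python sketch checks whether some path from a source node to a target node matches a sequence of typed segments. It assumes a hypothetical encoding of the restricted expressions as a list of `(edge_type, bound)` pairs. Here `bound=k` is read as one to k consecutive edges of that type, and `bound=None` as one or more (c+). The names `reachable`, `Graph` and `Segment` are illustrative. The breadth-first search over (node, segment, count) states is a generic product-construction approach, not the quadratic-time algorithm developed in the paper.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Edge-typed digraph: node -> list of (edge_type, successor).
Graph = Dict[str, List[Tuple[str, str]]]
# Hypothetical segment encoding: (edge_type, k) means 1..k edges of that type,
# (edge_type, None) means one or more edges of that type (c+).
Segment = Tuple[str, Optional[int]]


def reachable(graph: Graph, src: str, dst: str, expr: List[Segment]) -> bool:
    """BFS over (node, segment index, edges used in the current segment)."""
    if not expr:
        return src == dst
    start = (src, 0, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        node, i, used = queue.popleft()
        label, bound = expr[i]
        for edge_type, nxt in graph.get(node, []):
            if edge_type != label:
                continue
            count = used + 1
            if bound is not None and count > bound:
                continue  # would exceed c^{<=k}
            if bound is None:
                count = 1  # for c+ only "at least one edge" matters
            # Segment i now has at least one edge, so it is satisfied.
            if i == len(expr) - 1 and nxt == dst:
                return True
            successors = [(nxt, i, count)]  # keep extending segment i
            if i + 1 < len(expr):
                successors.append((nxt, i + 1, 0))  # start the next segment
            for state in successors:
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return False


if __name__ == "__main__":
    g = {
        "alice": [("friend", "bob")],
        "bob": [("friend", "carol")],
        "carol": [("colleague", "dave")],
    }
    print(reachable(g, "alice", "dave", [("friend", None), ("colleague", 1)]))  # True
    print(reachable(g, "alice", "dave", [("friend", 1), ("colleague", 1)]))  # False
```

Capping the counter at 1 for unbounded segments keeps the state space finite: at most |V| states per unbounded segment and |V|·(k+1) per bounded one.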
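Simulation-based pattern matching can likewise be sketched under simplifying assumptions. In this hypothetical Python example, each pattern edge carries a single edge type and a bound k, and is matched by a data path of one to k edges of that type. The match relation starts from all label-compatible candidates. It then repeatedly discards any data node that cannot reach a candidate for the pattern edge's target. This naive fixpoint computes the maximum such relation, but it makes no attempt at the cubic-time bound stated for the paper's algorithms. `bounded_reach`, `match` and the pattern-edge tuple layout are illustrative names, not the authors' definitions.

```python
from typing import Dict, List, Set, Tuple

# Data graph: node -> list of (edge_type, successor); node labels kept separately.
DataGraph = Dict[str, List[Tuple[str, str]]]
# Hypothetical pattern-edge encoding: (source, target, edge_type, k).
PatternEdge = Tuple[str, str, str, int]


def bounded_reach(graph: DataGraph, src: str, edge_type: str, k: int) -> Set[str]:
    """Nodes reachable from src by a path of 1..k edges, all of edge_type."""
    found: Set[str] = set()
    frontier = {src}
    for _ in range(k):
        step: Set[str] = set()
        for node in frontier:
            for t, succ in graph.get(node, []):
                if t == edge_type and succ not in found:
                    step.add(succ)
        if not step:
            break
        found |= step
        frontier = step
    return found


def match(pattern_nodes: Dict[str, str], pattern_edges: List[PatternEdge],
          graph: DataGraph, labels: Dict[str, str]) -> Dict[str, Set[str]]:
    """Maximum simulation-style match relation, or {} if a pattern node has no match."""
    # Start from every data node whose label equals the pattern node's label.
    sim = {u: {v for v, lab in labels.items() if lab == want}
           for u, want in pattern_nodes.items()}
    changed = True
    while changed:
        changed = False
        for u, u2, edge_type, k in pattern_edges:
            # Keep v only if a bounded path of edge_type leads to a candidate of u2.
            keep = {v for v in sim[u] if bounded_reach(graph, v, edge_type, k) & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    if any(not cands for cands in sim.values()):
        return {}
    return sim


if __name__ == "__main__":
    labels = {"a": "dev", "b": "tester", "c": "mgr", "d": "dev"}
    graph = {"a": [("friend", "b")], "b": [("friend", "c")], "d": [("colleague", "c")]}
    pattern = {"D": "dev", "M": "mgr"}
    print(match(pattern, [("D", "M", "friend", 2)], graph, labels))  # {'D': {'a'}, 'M': {'c'}}
    print(match(pattern, [("D", "M", "friend", 1)], graph, labels))  # {}
```

Unlike subgraph isomorphism, this relation lets one pattern node match several data nodes and maps pattern edges to paths rather than single edges. That relaxation is what makes polynomial-time evaluation possible.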

Keywords

graph reachability, graph pattern queries, regular expressions, containment, equivalence, minimization

Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Wenfei Fan (1, 2)
  • Jianzhong Li (2)
  • Shuai Ma (3)
  • Nan Tang (4)
  • Yinghui Wu (1)
  1. School of Informatics, University of Edinburgh, Edinburgh, UK
  2. Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, China
  3. NLSDE Lab, Beihang University, Beijing 100919, China
  4. Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar