Graph Data Management and Mining: A Survey of Algorithms and Applications

Managing and Mining Graph Data

Part of the book series: Advances in Database Systems (ADBS, volume 40)

Abstract

Graph mining and management has become a popular area of research in recent years because of its numerous applications in a wide variety of practical fields, including computational biology, software bug localization, and computer networking. Different applications result in graphs of different sizes and complexities. Correspondingly, the applications have different requirements for the underlying mining algorithms. In this chapter, we will provide a survey of different kinds of graph mining and management algorithms. We will also discuss a number of applications that depend on graph representations, and how the different graph mining algorithms can be adapted for different applications. Finally, we will discuss important avenues of future research in the area.


References

  1. Chemaxon. Screen, Chemaxon Inc., 2005.

    Google Scholar 

  2. Daylight. Daylight Toolkit, Daylight Inc, Mission Viejo, CA, USA, 2008.

    Google Scholar 

  3. Oracle Spatial Topology and Network Data Models 10g Release 1 (10.1) URL: http://www.oracle.com/technology/products/spatial/pdf/10g_network_model_twp.pdf

  4. Semantic Web Challenge. URL: http://challenge.semanticweb.org/

  5. J. Abello, M. G. Resende, S. Sudarsky, Massive quasi-clique detection. Proceedings of the 5th Latin American Symposium on Theoretical Informatics (LATIN) (Cancun, Mexico). 598–612, 2002.

    Google Scholar 

  6. S. Abiteboul, P. Buneman, D. Suciu. Data on the web: from relations to semistructured data and XML. Morgan Kaufmann Publishers, Los Altos, CA 94022, USA, 1999.

    Google Scholar 

  7. C. Aggarwal, Y. Xie, P. Yu. GConnect: A Connectivity Index for Massive Disk-Resident Graphs, VLDB Conference, 2009.

    Google Scholar 

  8. C. Aggarwal, N. Ta, J. Feng, J. Wang, M. J. Zaki. XProj: A Framework for Projected Structural Clustering of XML Documents, KDD Conference, 2007.

    Google Scholar 

  9. C. Aggarwal, P. Yu. Online Analysis of Community Evolution in Data Streams. SIAM Conference on Data Mining, 2005.

    Google Scholar 

  10. R. Agrawal, A. Borgida, H.V. Jagadish. Efficient Maintenance of Transitive Relationships in Large Data and Knowledge Bases, ACM SIGMOD Conference, 1989.

    Google Scholar 

  11. R. Agrawal, R. Srikant. Fast algorithms for mining association rules in large databases, VLDB Conference, 1994.

    Google Scholar 

  12. S. Agrawal, S. Chaudhuri, G. Das. DBXplorer: A system for keyword-based search over relational databases. ICDE Conference, 2002.

    Google Scholar 

  13. R. Ahuja, J. Orlin, T. Magnanti. Network Flows: Theory, Algorithms, and Applications, Prentice Hall, Englewood Cliffs, NJ, 1992.

    Google Scholar 

  14. S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis. On Storing Voluminous RDF Description Bases. In WebDB, 2001.

    Google Scholar 

  15. S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. In SemWeb, 2001.

    Google Scholar 

  16. S. Asur, S. Parthasarathy, and D. Ucar. An event-based framework for characterizing the evolutionary behavior of interaction graphs. ACM KDD Conference, 2007.

    Google Scholar 

  17. R. Baeza-Yates, A Tiberi. Extracting semantic relations from query logs. ACM KDD Conference, 2007.

    Google Scholar 

  18. Z. Bar-Yossef, R. Kumar, D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. ACM SODA Conference, 2002.

    Google Scholar 

  19. D. Beckett. The Design and Implementation of the Redland RDF Application Framework. WWW Conference, 2001.

    Google Scholar 

  20. P. Berkhin. A survey on pagerank computing. Internet Mathematics, 2(1), 2005.

    Google Scholar 

  21. P. Berkhin. Bookmark-coloring approach to personalized pagerank computing. Internet Mathematics, 3(1), 2006.

    Google Scholar 

  22. M. Berlingerio, F. Bonchi, B. Bringmann, A. Gionis. Mining Graph-Evolution Rules, PKDD Conference, 2009.

    Google Scholar 

  23. S. Bhagat, G. Cormode, I. Rozenbaum. Applying link-based classification to label blogs. WebKDD/SNA-KDD, pages 97–117, 2007.

    Google Scholar 

  24. G. Bhalotia, C. Nakhe, A. Hulgeri, S. Chakrabarti, S. Sudarshan. Keyword searching and browsing in databases using BANKS. ICDE Conference, 2002.

    Google Scholar 

  25. M. Bilgic, L. Getoor. Effective label acquisition for collective classification. ACM KDD Conference, pages 43–51, 2008.

    Google Scholar 

  26. S. Boag, D. Chamberlin, M. F. Fernandez, D. Florescu, J. Robie, J. Simeon. XQuery 1.0: An XML query language. URL: W3C, http://www.w3.org/TR/xquery/,2007.

  27. I. Bordino, D. Donato, A. Gionis, S. Leonardi. Mining Large Networks with Subgraph Counting. IEEE ICDM Conference, 2008.

    Google Scholar 

  28. C. Borgelt, M. R. Berthold. Mining molecular fragments: Find- ing Relevant Substructures of Molecules. ICDM Conference, 2002.

    Google Scholar 

  29. S. Brin, L. Page. The Anatomy of a Large Scale Hypertextual Search Engine, WWW Conference, 1998.

    Google Scholar 

  30. H.J. Bohm, G. Schneider. Virtual Screening for Bioactive Molecules. Wiley-VCH, 2000.

    Google Scholar 

  31. B. Bringmann, S. Nijssen. What is frequent in a single graph? PAKDD Conference, 2008.

    Google Scholar 

  32. A. Z. Broder, M. Charikar, A. Frieze, M. Mitzenmacher. Syntactic clustering of the web, WWW Conference, Computer Networks, 29(8–13):1157–1166, 1997.

    Google Scholar 

  33. J. Broekstra, A. Kampman, F. V. Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In ISWC Conference, 2002.

    Google Scholar 

  34. H. Bunke. On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters, 18: pp. 689–694, 1997.

    Article  MathSciNet  Google Scholar 

  35. H. Bunke, G. Allermann. Inexact graph matching for structural pattern recognition. Pattern Recognition Letters, 1: pp. 245–253, 1983.

    Article  MATH  Google Scholar 

  36. H. Bunke, X. Jiang, A. Kandel. On the minimum common supergraph of two graphs. Computing, 65(1): pp. 13–25, 2000.

    MATH  MathSciNet  Google Scholar 

  37. H. Bunke, K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(3): pp. 255–259, 1998.

    Article  MATH  Google Scholar 

  38. J. J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, K. Wilkinson. Jena: implementing the Semantic Web recommendations. In WWW Conference, 2004.

    Google Scholar 

  39. V. R. de Carvalho, W. W. Cohen. On the collective classification of email “speech acts”. ACM SIGIR Conference, pages 345–352, 2005.

    Google Scholar 

  40. D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, C. Faloutsos. Epidemic thresholds in real networks. ACM Transactions on Information Systems and Security, 10(4), 2008.

    Google Scholar 

  41. D. Chakrabarti, Y. Zhan, C. Faloutsos R-MAT: A Recursive Model for Graph Mining. SDM Conference, 2004.

    Google Scholar 

  42. S. Chakrabarti. Dynamic Personalized Pagerank in Entity-Relation Graphs, WWW Conference, 2007.

    Google Scholar 

  43. R.-Y. Chang, A. Podgurski, J. Yang. Discovering Neglected Conditions in Software by Mining Dependence Graphs. IEEE Transactions on Software Engineering, 34(5):579–596, 2008.

    Article  Google Scholar 

  44. O. Chapelle, A. Zien, B. Scholkopf, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.

    Google Scholar 

  45. S. S. Chawathe. Comparing Hierachical data in external memory. Very Large Data Bases Conference, 1999.

    Google Scholar 

  46. C. Chen, C. Lin, M. Fredrikson, M. Christodorescu, X. Yan, J. Han, Mining Graph Patterns Efficiently via Randomized Summaries, VLDB Conference, 2009.

    Google Scholar 

  47. L. Chen, A. Gupta, M. E. Kurul. Stack-based algorithms for pattern matching on dags. VLDB Conference, 2005.

    Google Scholar 

  48. J. Cheng, J. Xu Yu, X. Lin, H. Wang, P. S. Yu. Fast Computing of Reachability Labelings for Large Graphs with High Compression Rate, EDBT Conference, 2008.

    Google Scholar 

  49. J. Cheng, J. Xu Yu, X. Lin, H. Wang, P. S. Yu. Fast Computation of Reachability Labelings in Large Graphs, EDBT Conference, 2006.

    Google Scholar 

  50. Y. Chi, X. Song, D. Zhou, K. Hino, B. L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. KDD Conference, 2007.

    Google Scholar 

  51. C. Chung, J. Min, K. Shim. APEX: An adaptive path index for XML data. In SIGMOD Conference, 2002.

    Google Scholar 

  52. J. Clark, S. DeRose. XML Path Language (XPath). URL: W3C, http://www.w3.org/TR/xpath/,1999.

  53. E. Cohen. Size-estimation Framework with Applications to Transitive Closure and Reachability, Journal of Computer and System Sciences, v.55 n.3, p.441–453, Dec. 1997.

    Article  MATH  MathSciNet  Google Scholar 

  54. E. Cohen, E. Halperin, H. Kaplan, U. Zwick. Reachability and Distance Queries via 2-hop Labels, ACM Symposium on Discrete Algorithms, 2002.

    Google Scholar 

  55. S. Cohen, J. Mamou, Y. Kanza, Y. Sagiv. XSEarch: A semantic search engine for XML. VLDB Conference, 2003.

    Google Scholar 

  56. M. P. Consens, A. O. Mendelzon. GraphLog: a visual formalism for real life recursion. In PODS Conference, 1990.

    Google Scholar 

  57. D. Conte, P. Foggia, C. Sansone, M. Vento. Thirty Years of Graph Matching in Pattern Recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(3): pp. 265–298, 2004.

    Article  Google Scholar 

  58. D. Cook, L. Holder. Mining Graph Data, John Wiley & Sons Inc, 2007.

    Google Scholar 

  59. B. F. Cooper, N. Sample, M. Franklin, G. Hjaltason, M. Shadmon. A fast index for semistructured data. In VLDB Conference, pages 341–350, 2001.

    Google Scholar 

  60. L.P. Cordella, P. Foggia, C. Sansone, M. Vento. A (Sub)graph Isomorphism Algorithm for Matching Large Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(20): pp. 1367–1372, 2004.

    Article  Google Scholar 

  61. G. Cormode, S. Muthukrishnan. Space efficient mining of multigraph streams. ACM PODS Conference, 2005.

    Google Scholar 

  62. K. Crammer Y. Singer. A new family of online algorithms for category ranking. Journal of Machine Learning Research., 3:1025–1058, 2003.

    Article  MATH  MathSciNet  Google Scholar 

  63. T. Dalamagas, T. Cheng, K. Winkel, T. Sellis. Clustering XML Documents Using Structural Summaries. Information Systems, Elsevier, January 2005.

    Google Scholar 

  64. V. Dallmeier, C. Lindig, A. Zeller. Lightweight Defect Localization for Java. In Proc. of the 19th European Conf. on Object-Oriented Programming (ECOOP), 2005.

    Google Scholar 

  65. M. Deshpande, M. Kuramochi, N. Wale, G. Karypis. Frequent Substructure-based Approaches for Classifying Chemical Compounds. IEEE Transactions on Knowledge and Data Engineering, 17: pp. 1036–1050, 2005.

    Article  Google Scholar 

  66. E. W. Dijkstra. A note on two problems in connection with graphs. Numerische Mathematik, 1 (1959), S. 269–271.

    Article  MATH  MathSciNet  Google Scholar 

  67. F. Eichinger, K. Bohm, M. Huber. Improved Software Fault Detection with Graph Mining. Workshop on Mining and Learning with Graphs, 2008.

    Google Scholar 

  68. F. Eichinger, K. Bohm, M. Huber. Mining Edge-Weighted Call Graphs to Localize Software Bugs. PKDD Conference, 2008.

    Google Scholar 

  69. T. Falkowski, J. Bartelheimer, M. Spilopoulou. Mining and Visualizing the Evolution of Subgroups in Social Networks, ACM International Conference on Web Intelligence, 2006.

    Google Scholar 

  70. M. Faloutsos, P. Faloutsos, C. Faloutsos. On Power Law Relationships of the Internet Topology. SIGCOMM Conference, 1999.

    Google Scholar 

  71. W. Fan, K. Zhang, H. Cheng, J. Gao. X. Yan, J. Han, P. S. Yu O. Verscheure. Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree. ACM KDD Conference, 2008.

    Google Scholar 

  72. G. Di Fatta, S. Leue, E. Stegantova. Discriminative Pattern Mining in Software Fault Detection. Workshop on Software Quality Assurance, 2006.

    Google Scholar 

  73. J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, J. Zhang. Graph Distances in the Data-Stream Model. SIAM Journal on Computing, 38(5): pp. 1709–1727, 2008.

    Article  MATH  MathSciNet  Google Scholar 

  74. J. Ferlez, C. Faloutsos, J. Leskovec, D. Mladenic, M. Grobelnik. Monitoring Network Evolution using MDL. IEEE ICDE Conference, 2008.

    Google Scholar 

  75. M. Fiedler, C. Borgelt. Support computation for mining frequent subgraphs in a single graph. Workshop on Mining and Learning with Graphs (MLG’07), 2007.

    Google Scholar 

  76. M.A. Fischler, R.A. Elschlager. The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1): pp 67–92, 1973.

    Article  Google Scholar 

  77. P.-O. Fjallstrom. Algorithms for Graph Partitioning: A Survey, Linkoping Electronic Articles in Computer and Information Science, Vol 3, no 10, 1998.

    Google Scholar 

  78. G. Flake, R. Tarjan, M. Tsioutsiouliklis. Graph Clustering and Minimum Cut Trees, Internet Mathematics, 1(4), 385–408, 2003.

    MathSciNet  Google Scholar 

  79. D. Fogaras, B. Racz, K. Csalogany, T. Sarlos. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3), 2005.

    Google Scholar 

  80. M. S. Garey, D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-completeness, W. H. Freeman, 1979.

    Google Scholar 

  81. T. Gartner, P. Flach, S. Wrobel. On graph kernels: Hardness results and efficient alternatives. 16th Annual Conf. on Learning Theory, pp. 129–143, 2003.

    Google Scholar 

  82. D. Gibson, R. Kumar, A. Tomkins, Discovering Large Dense Subgraphs in Massive Graphs, VLDB Conference, 2005.

    Google Scholar 

  83. R. Giugno, D. Shasha, GraphGrep: A Fast and Universal Method for Querying Graphs. International Conference in Pattern recognition (ICPR), 2002.

    Google Scholar 

  84. S. Godbole, S. Sarawagi. Discriminative methods for multi-labeled classification. PAKDD Conference, pages 22–30, 2004.

    Google Scholar 

  85. R. Goldman, J. Widom. DataGuides: Enable query formulation and optimization in semistructured databases. VLDB Conference, pages 436–445, 1997.

    Google Scholar 

  86. L. Guo, F. Shao, C. Botev, J. Shanmugasundaram. XRANK: ranked keyword search over XML documents. ACM SIGMOD Conference, pages 16–27, 2003.

    Google Scholar 

  87. M. S. Gupta, A. Pathak, S. Chakrabarti. Fast algorithms for top-k personalized pagerank queries. WWW Conference, 2008.

    Google Scholar 

  88. R. H. Guting. GraphDB: Modeling and querying graphs in databases. In VLDB Conference, pages 297–308, 1994.

    Google Scholar 

  89. M. Gyssens, J. Paredaens, D. van Gucht. A graph-oriented object database model. In PODS Conference, pages 417–424, 1990.

    Google Scholar 

  90. J. Han, J. Pei, Y. Yin. Mining Frequent Patterns without Candidate Generation. SIGMOD Conference, 2000.

    Google Scholar 

  91. S. Harris, N. Gibbins. 3store: Efficient bulk RDF storage. In PSSS Conference, 2003.

    Google Scholar 

  92. S. Harris, N. Shadbolt. SPARQL query processing with conventional relational database systems. In SSWS Conference, 2005.

    Google Scholar 

  93. M. Al Hasan, V. Chaoji, S. Salem, J. Besson, M. J. Zaki. ORIGAMI: Mining Representative Orthogonal Graph Patterns. ICDM Conference, 2007.

    Google Scholar 

  94. D. Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California, Santa Cruz, 1999.

    Google Scholar 

  95. T. Haveliwala. Topic-Sensitive Page Rank, World Wide Web Conference, 2002.

    Google Scholar 

  96. H. He, A. K. Singh. Query Language and Access Methods for Graph Databases, appears as a chapter in Managing and Mining Graph Data, ed. Charu Aggarwal, Springer, 2010.

    Google Scholar 

  97. H. He, Querying and mining graph databases. Ph.D. Thesis, UCSB, 2007.

    Google Scholar 

  98. H. He, A. K. Singh. Efficient Algorithms for Mining Significant Substructures from Graphs with Quality Guarantees. ICDM Conference, 2007.

    Google Scholar 

  99. H. He, H. Wang, J. Yang, P. S. Yu. BLINKS: Ranked keyword searches on graphs. SIGMOD Conference, 2007.

    Google Scholar 

  100. [100] J. Huan, W. Wang, J. Prins, J. Yang. Spin: Mining Maximal Frequent Subgraphs from Graph Databases. KDD Conference, 2004.

    Google Scholar 

  101. J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, A. Tropsha. Mining Spatial Motifs from Protein Structure Graphs. Research in Computational Molecular Biology (RECOMB), pp. 308–315, 2004.

    Google Scholar 

  102. V. Hristidis, N. Koudas, Y. Papakonstantinou, D. Srivastava. Keyword proximity search in XML trees. IEEE Transactions on Knowledge and Data Engineering, 18(4):525–539, 2006.

    Article  Google Scholar 

  103. V. Hristidis, Y. Papakonstantinou. Discover: Keyword search in relational databases. VLDB Conference, 2002.

    Google Scholar 

  104. A. Inokuchi, T. Washio, H. Motoda. An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data. PKDD Conference, pages 13–23, 2000.

    Google Scholar 

  105. H. V. Jagadish. A compression technique to materialize transitive closure. ACM Trans. Database Syst., 15(4):558–598, 1990.

    Article  MathSciNet  Google Scholar 

  106. H. V. Jagadish, S. Al-Khalifa, A. Chapman, L. V. S. Lakshmanan, A. Nierman, S. Paparizos, J. M. Patel, D. Srivastava, N. Wiwatwattana, Y. Wu, C. Yu. TIMBER: A native XML database. In VLDB Journal, 11(4):274–291, 2002.

    Article  MATH  Google Scholar 

  107. H. V. Jagadish, L. V. S. Lakshmanan, D. Srivastava, K. Thompson. TAX: A tree algebra for XML. DBPL Conference, 2001.

    Google Scholar 

  108. G. Jeh, J. Widom. Scaling personalized web search. In WWW, pages 271–279, 2003.

    Google Scholar 

  109. J. L. Jenkins, A. Bender, J. W. Davies. In silico target fishing: Predicting biological targets from chemical structure. Drug Discovery Today, 3(4):413–421, 2006.

    Article  Google Scholar 

  110. R. Jin, C. Wang, D. Polshakov, S. Parthasarathy, G. Agrawal. Discovering Frequent Topological Structures from Graph Datasets. ACM KDD Conference, 2005.

    Google Scholar 

  111. R. Jin, H. Hong, H. Wang, Y. Xiang, N. Ruan. Computing Label-Constraint Reachability in Graph Databases. Under submission, 2009.

    Google Scholar 

  112. R. Jin, Y. Xiang, N. Ruan, D. Fuhry. 3-HOP: A high-compression indexing scheme for reachability query. SIGMOD Conference, 2009.

    Google Scholar 

  113. V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, H. Karambelkar. Bidirectional expansion for keyword search on graph databases. VLDB Conference, 2005.

    Google Scholar 

  114. H. Kashima, K. Tsuda, A. Inokuchi. Marginalized Kernels between Labeled Graphs, ICML, 2003.

    Google Scholar 

  115. R. Kaushik, P. Bohannon, J. Naughton, H. Korth. Covering indexes for branching path queries. In SIGMOD Conference, June 2002.

    Google Scholar 

  116. B.W. Kernighan, S. Lin. An efficient heuristic procedure for partitioning graphs, Bell System Tech. Journal, vol. 49, Feb. 1970, pp. 291–307.

    Google Scholar 

  117. M.-S. Kim, J. Han. A Particle-and-Density Based Evolutionary Clustering Method for Dynamic Networks, VLDB Conference, 2009.

    Google Scholar 

  118. J. M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Journal of the ACM, 46(5):pp. 604–632, 1999.

    Article  MATH  MathSciNet  Google Scholar 

  119. R.I. Kondor, J. Lafferty. Diffusion kernels on graphs and other discrete input spaces. ICML Conference, pp. 315–322, 2002.

    Google Scholar 

  120. M. Koyuturk, A. Grama, W. Szpankowski. An Efficient Algorithm for Detecting Frequent Subgraphs in Biological Networks. Bioinformatics, 20:1200–207, 2004.

    Article  Google Scholar 

  121. T. Kudo, E. Maeda, Y. Matsumoto. An Application of Boosting to Graph Classification, NIPS Conf. 2004.

    Google Scholar 

  122. R. Kumar, P Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, E. Upfal. The Web as a Graph. ACM PODS Conference, 2000.

    Google Scholar 

  123. M. Kuramochi, G. Karypis. Frequent subgraph discovery. ICDM Conference, pp. 313–320, Nov. 2001.

    Google Scholar 

  124. M. Kuramochi, G. Karypis. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3): pp. 243–271, 2005.

    Article  MathSciNet  Google Scholar 

  125. J. Larrosa, G. Valiente. Constraint satisfaction algorithms for graph pattern matching. Mathematical Structures in Computer Science, 12(4): pp. 403–422, 2002.

    Article  MATH  MathSciNet  Google Scholar 

  126. M. Lee, W. Hsu, L. Yang, X. Yang. XClust: Clustering XML Schemas for Effective Integration. CIKM Conference, 2002.

    Google Scholar 

  127. J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. S. Glance. Cost-effective outbreak detection in networks. KDD Conference, pp. 420–429, 2007.

    Google Scholar 

  128. J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs, SDM Conference, 2007.

    Google Scholar 

  129. J. Leskovec, J. Kleinberg, C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. ACM KDD Conference, 2005.

    Google Scholar 

  130. J. Leskovec, E. Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network, WWW Conference, 2008.

    Google Scholar 

  131. J. Leskovec, L. Backstrom, R. Kumar, A. Tomkins. Microscopic Evolution of Social Networks, ACM KDD Conference, 2008.

    Google Scholar 

  132. Q. Li, B. Moon. Indexing and querying XML data for regular path expressions. In VLDB Conference, pages 361–370, September 2001.

    Google Scholar 

  133. W. Lian, D.W. Cheung, N. Mamoulis, S. Yiu. An Efficient and Scalable Algorithm for Clustering XML Documents by Structure, IEEE Transactions on Knowledge and Data Engineering, Vol 16, No. 1, 2004.

    Google Scholar 

  134. L. Lim, H. Wang, M. Wang. Semantic Queries in Databases: Problems and Challenges. CIKM Conference, 2009.

    Google Scholar 

  135. Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, B. L. Tseng. FacetNet: A framework for analyzing communities and their evolutions in dynamic networks. WWW Conference, 2008.

    Google Scholar 

  136. C. Liu, X. Yan, H. Yu, J. Han, P. S. Yu. Mining Behavior Graphs for “Backtrace” of Noncrashing Bugs. SDM Conference, 2005.

    Google Scholar 

  137. C. Liu, X. Yan, L. Fei, J. Han, S. P. Midkiff. SOBER: Statistical Model-Based Bug Localization. SIGSOFT Software Engineering Notes, 30(5):286–295, 2005.

    Article  Google Scholar 

  138. Q. Lu, L. Getoor. Link-based classification. ICML Conference, pages 496–503, 2003.

    Google Scholar 

  139. F. Manola, E. Miller. RDF Primer. W3C, http://www.w3.org/TR/rdf-primer/,2004.

  140. A. McGregor. Finding Graph Matchings in Data Streams. APPROX-RANDOM, pp. 170–181, 2005.

    Google Scholar 

  141. T. Milo and D. Suciu. Index structures for path expression. In ICDT Conference, pages 277–295, 1999.

    Google Scholar 

  142. S. Navlakha, R. Rastogi, N. Shrivastava. Graph Summarization with Bounded Error. ACMSIGMOD Conference, pp. 419–432, 2008.

    Google Scholar 

  143. M. Neuhaus, H. Bunke. Self-organizing maps for learning the edit costs in graph matching. IEEE Transactions on Systems, Man, and Cybernetics, 35(3) pp. 503–514, 2005.

    Article  Google Scholar 

  144. M. Neuhaus, H. Bunke. Automatic learning of cost functions for graph edit distance. Information Sciences, 177(1), pp 239–247, 2007.

    Article  MATH  MathSciNet  Google Scholar 

  145. M. Neuhaus, H. Bunke. Bridging the Gap Between Graph Edit Distance and Kernel Machines. World Scientific, 2007.

    Google Scholar 

  146. M. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 2006.

    Google Scholar 

  147. M. E. J. Newman. The spread of epidemic disease on networks, Phys. Rev. E 66, 016128, 2002.

    Google Scholar 

  148. J. Pei, D. Jiang, A. Zhang. On Mining Cross-Graph Quasi-Cliques, ACM KDD Conference, 2005.

    Google Scholar 

  149. Nidhi, M. Glick, J. Davies, J. Jenkins. Prediction of biological targets for compounds using multiple-category bayesian models trained on chemogenomics databases. J Chem Inf Model, 46:1124–1133, 2006.

    Article  Google Scholar 

  150. S. Nijssen, J. Kok. A quickstart in frequent structure mining can make a difference. Proceedings of SIGKDD, pages 647–652, 2004.

    Google Scholar 

  151. L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998.

    Google Scholar 

  152. Z. Pan, J. Heflin. DLDB: Extending relational databases to support Semantic Web queries. In PSSS Conference, 2003.

    Google Scholar 

  153. J. Pei, D. Jiang, A. Zhang. Mining Cross-Graph Quasi-Cliques in Gene Expression and Protein Interaction Data, ICDE Conference, 2005.

    Google Scholar 

  154. E. Prud’hommeaux and A. Seaborne. SPARQL query language for RDF. W3C, URL: http://www.w3.org/TR/rdf-sparql-query/,2007.

  155. L. Qin, J.-X. Yu, L. Chang. Keyword search in databases: The power of RDBMS. SIGMOD Conference, 2009.

    Google Scholar 

  156. S. Raghavan, H. Garcia-Molina. Representing web graphs. ICDE Conference, pages 405–416, 2003.

    Google Scholar 

  157. S. Ranu, A. K. Singh. GraphSig: A scalable approach to mining significant subgraphs in large graph databases. ICDE Conference, 2009.

    Google Scholar 

  158. M. Rattigan, M. Maier, D. Jensen. Graph Clustering with Network Sructure Indices. ICML, 2007.

    Google Scholar 

  159. P. R. Raw, B. Moon. PRIX: Indexing and querying XML using prufer sequences. ICDE Conference, 2004.

    Google Scholar 

  160. J. W. Raymond, P. Willett. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. J. Comp. Aided Mol. Des., 16(7):521–533, 2002.

    Article  Google Scholar 

  161. K. Riesen, X. Jiang, H. Bunke. Exact and Inexact Graph Matching: Methodology and Applications, appears as a chapter in Managing and Mining Graph Data, ed. Charu Aggarwal, Springer, 2010.

    Google Scholar 

  162. H. Saigo, S. Nowozin, T. Kadowaki, T. Kudo, and K. Tsuda. GBoost: A mathematical programming approach to graph classification and regression. Machine Learning, 2008.

    Google Scholar 

  163. F. Sams-Dodd. Target-based drug discovery: is something wrong? Drug Discov Today, 10(2):139–147, Jan 2005.

    Article  Google Scholar 

  164. P. Sarkar, A. Moore, A. Prakash. Fast Incremental Proximity Search in Large Graphs, ICML Conference, 2008.

    Google Scholar 

  165. P. Sarkar, A. Moore. Fast Dynamic Re-ranking of Large Graphs, WWW Conference, 2009.

    Google Scholar 

  166. A. D. Sarma, S. Gollapudi, R. Panigrahy. Estimating PageRank in Graph Streams, ACM PODS Conference, 2008.

    Google Scholar 

  167. V. Satuluri, S. Parthasarathy. Scalable Graph Clustering Using Stochastic Flows: Applications to Community Discovery, ACM KDD Conference, 2009.

    Google Scholar 

  168. R. Schenkel, A. Theobald, G. Weikum. Hopi: An efficient connection index for complex XML document collections. EDBT Conference, 2004.

    Google Scholar 

  169. J. Shanmugasundaram, K. Tufte, C. Zhang, G. He, D. J. DeWitt, J. F. Naughton. Relational databases for querying XML documents: Limitations and opportunities. VLDB Conference, 1999.

    Google Scholar 

  170. N. Stiefl, I. A. Watson, K. Baumann, A. Zaliani. Erg: 2d pharmacophore descriptor for scaffold hopping. J. Chem. Info. Model., 46:208–220, 2006.

    Article  Google Scholar 

  171. J. Sun, S. Papadimitriou, C. Faloutsos, P. Yu. GraphScope: Parameter Free Mining of Large Time-Evolving Graphs, ACM KDD Conference, 2007.

    Google Scholar 

  172. S. J. Swamidass, J. Chen, J. Bruand, P. Phung, L. Ralaivola, P. Baldi. Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics, 21(1):359–368, 2005.

    Article  Google Scholar 

  173. L. Tang, H. Liu, J. Zhang, Z. Nazeri. Community evolution in dynamic multi-mode networks. ACM KDD Conference, 2008.

    Google Scholar 

  174. B. Taskar, P. Abbeel, D. Koller. Discriminative probabilistic models for relational data. In UAI, pages 485–492, 2002.

    Google Scholar 

  175. H. Tong, C. Faloutsos, J.-Y. Pan. Fast random walk with restart and its applications. In ICDM, pages 613–622, 2006.

    Google Scholar 

  176. S. TrißI, U. Leser. Fast and practical indexing and querying of very large graphs. SIGMOD Conference, 2007.

    Google Scholar 

  177. A. A. Tsay, W. S. Lovejoy, D. R. Karger. Random Sampling in Cut, Flow, and Network Design Problems, Mathematics of Operations Research, 24(2):383–413, 1999.

    Article  MathSciNet  Google Scholar 

  178. K. Tsuda, W. S. Noble. Learning kernels from biological networks by maximizing entropy. Bioinformatics, 20(Suppl. 1):i326–i333, 2004.

    Article  Google Scholar 

  179. K. Tsuda, H. Saigo. Graph Classification, appears as a chapter in Managing and Mining Graph Data, Springer, 2010.

    Google Scholar 

  180. J.R. Ullmann. An Algorithm for Subgraph Isomorphism. Journal of the Association for Computing Machinery, 23(1): pp. 31–42, 1976.

    MathSciNet  Google Scholar 

  181. N. Vanetik, E. Gudes, S. E. Shimony. Computing Frequent Graph Patterns from Semi-structured Data. IEEE ICDM Conference, 2002.

    Google Scholar 

  182. R. Volz, D. Oberle, S. Staab, and B. Motik. KAON SERVER: A Semantic Web Management System. In WWW Conference, 2003.

    Google Scholar 

  183. H. Wang, C. Aggarwal. A Survey of Algorithms for Keyword Search on Graph Data. appears as a chapter in Managing and Mining Graph Data, Springer, 2010.

    Google Scholar 

  184. H. Wang, H. He, J. Yang, J. Xu-Yu, P. Yu. Dual Labeling: Answering Graph Reachability Queries in Constant Time. ICDE Conference, 2006.

    Google Scholar 

  185. H. Wang, S. Park, W. Fan, P. S. Yu. ViST: A Dynamic Index Method for Querying XML Data by Tree Structures. In SIGMOD Conference, 2003.

    Google Scholar 

  186. H. Wang, X. Meng. On the Sequencing of Tree Structures for XML Indexing. In ICDE Conference, 2005.

    Google Scholar 

  187. Y. Wang, D. Chakrabarti, C. Wang, C. Faloutsos. Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint, SRDS, pp. 25–34, 2003.

    Google Scholar 

  188. N. Wale, G. Karypis. Target identification for chemical compounds using target-ligand activity data and ranking based methods. Technical Report TR-08-035, University of Minnesota, 2008.

  189. N. Wale, G. Karypis, I. A. Watson. Method for effective virtual screening and scaffold-hopping in chemical compounds. Comput Syst Bioinformatics Conf, 6:403–414, 2007.

  190. N. Wale, X. Ning, G. Karypis. Trends in Chemical Graph Data Mining, appears as a chapter in Managing and Mining Graph Data, Springer, 2010.

  191. N. Wale, I. A. Watson, G. Karypis. Indirect similarity based methods for effective scaffold-hopping in chemical compounds. J. Chem. Info. Model., 48(4):730–741, 2008.

  192. N. Wale, I. A. Watson, G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowledge and Information Systems, 14:347–375, 2008.

  193. C. Weiss, P. Karras, A. Bernstein. Hexastore: Sextuple Indexing for Semantic Web Data Management. In VLDB Conference, 2008.

  194. K. Wilkinson. Jena property table implementation. In SSWS Conference, 2006.

  195. K. Wilkinson, C. Sayers, H. A. Kuno, and D. Reynolds. Efficient RDF storage and retrieval in Jena2. In SWDB Conference, 2003.

  196. Y. Xu, Y. Papakonstantinou. Efficient LCA based keyword search in XML data. EDBT Conference, 2008.

  197. Y. Xu, Y. Papakonstantinou. Efficient keyword search for smallest LCAs in XML databases. ACM SIGMOD Conference, 2005.

  198. X. Yan, J. Han. CloseGraph: Mining Closed Frequent Graph Patterns, ACM KDD Conference, 2003.

  199. X. Yan, H. Cheng, J. Han, P. S. Yu. Mining Significant Graph Patterns by Scalable Leap Search, SIGMOD Conference, 2008.

  200. X. Yan, J. Han. gSpan: Graph-based Substructure Pattern Mining. ICDM Conference, 2002.

  201. X. Yan, P. S. Yu, J. Han. Graph indexing: A frequent structure-based approach. SIGMOD Conference, 2004.

  202. X. Yan, P. S. Yu, J. Han. Substructure similarity search in graph databases. SIGMOD Conference, 2005.

  203. X. Yan, B. He, F. Zhu, J. Han. Top-K Aggregation Queries Over Large Networks, IEEE ICDE Conference, 2010.

  204. J. X. Yu, J. Cheng. Graph Reachability Queries: A Survey, appears as a chapter in Managing and Mining Graph Data, Springer, 2010.

  205. M. J. Zaki, C. C. Aggarwal. XRules: An Effective Structural Classifier for XML Data, KDD Conference, 2003.

  206. T. Zhang, A. Popescul, B. Dom. Linear prediction models with graph regularization for web-page categorization. ACM KDD Conference, pages 821–826, 2006.

  207. Q. Zhang, I. Muegge. Scaffold hopping through virtual screening using 2D and 3D similarity descriptors: Ranking, voting and consensus scoring. J. Med. Chem., 49:1536–1548, 2006.

  208. P. Zhao, J. Yu, P. Yu. Graph indexing: tree + delta >= graph. VLDB Conference, 2007.

  209. D. Zhou, J. Huang, B. Schölkopf. Learning from labeled and unlabeled data on a directed graph. ICML Conference, pages 1036–1043, 2005.

  210. D. Zhou, O. Bousquet, J. Weston, B. Schölkopf. Learning with local and global consistency. Advances in Neural Information Processing Systems (NIPS) 16, pages 321–328. MIT Press, 2004.

  211. X. Zhu, Z. Ghahramani, J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. ICML Conference, pages 912–919, 2003.

Author information

Corresponding author

Correspondence to Charu C. Aggarwal.

Copyright information

© 2010 Springer-Verlag US

About this chapter

Cite this chapter

Aggarwal, C.C., Wang, H. (2010). Graph Data Management and Mining: A Survey of Algorithms and Applications. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_2

  • DOI: https://doi.org/10.1007/978-1-4419-6045-0_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-6044-3

  • Online ISBN: 978-1-4419-6045-0

  • eBook Packages: Computer Science, Computer Science (R0)
