Graph Data Management and Mining: A Survey of Algorithms and Applications

Aggarwal, Charu C.; Wang, Haixun

doi:10.1007/978-1-4419-6045-0_2

Part of the book series: Advances in Database Systems (ADBS, volume 40)

Abstract

Graph mining and management has become a popular area of research in recent years because of its numerous applications in a wide variety of practical fields, including computational biology, software bug localization and computer networking. Different applications result in graphs of different sizes and complexities. Correspondingly, the applications have different requirements for the underlying mining algorithms. In this chapter, we will provide a survey of different kinds of graph mining and management algorithms. We will also discuss a number of applications, which are dependent upon graph representations. We will discuss how the different graph mining algorithms can be adapted for different applications. Finally, we will discuss important avenues of future research in the area.

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, Hawthorne, NY, 10532, USA
Charu C. Aggarwal
Microsoft Research Asia, Beijing, China, 100190
Haixun Wang

Corresponding author

Correspondence to Charu C. Aggarwal.

Editor information

Editors and Affiliations

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, U.S.A.
Charu C. Aggarwal
Microsoft Research Asia, Zhichun Road 49, Beijing, 100080, People's Republic of China
Haixun Wang

About this chapter

Cite this chapter

Aggarwal, C.C., Wang, H. (2010). Graph Data Management and Mining: A Survey of Algorithms and Applications. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_2

DOI: https://doi.org/10.1007/978-1-4419-6045-0_2
Published: 18 January 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer Science, Computer Science (R0)