Abstract
Databases and data warehouse systems have been evolving from handling normalized spreadsheets stored in relational databases to managing and analyzing diverse application-oriented data with complex interconnecting structures. In response to this trend, graph data has been growing rapidly and has become critically important in many applications, such as the analysis of XML, social networks, the Web, biological data, multimedia data and spatiotemporal data. Can we extend useful functions of databases and data warehouse systems to handle graph-structured data? In particular, OLAP (On-Line Analytical Processing) has been a popular tool for fast and user-friendly multi-dimensional analysis of data warehouses. Can we OLAP graphs? Unfortunately, to the best of our knowledge, there are no OLAP tools available that can interactively view and analyze graph data from different perspectives and at multiple granularities. In this paper, we argue that it is critically important to OLAP graph-structured data and propose a novel Graph OLAP framework. According to this framework, given a graph dataset whose nodes and edges are associated with attributes, a multi-dimensional model can be built to enable efficient on-line analytical processing, so that any portion of the graphs can be generalized or specialized dynamically, offering multiple, versatile views of the data. The contributions of this work are three-fold. First, starting from basic definitions, i.e., what dimensions and measures are in the Graph OLAP scenario, we develop a conceptual framework for data cubes on graphs. We also look into different semantics of OLAP operations, and classify the framework into two major subcases: informational OLAP and topological OLAP. Second, we show how a graph cube can be materialized by calculating a special kind of measure called an aggregated graph, and how to implement it efficiently. This includes both full materialization and partial materialization, where constraints are enforced to obtain an iceberg cube. Because of the increased structural complexity of the data, aggregated graphs that depend on the underlying "network" properties of the graph dataset are much harder to compute than their traditional OLAP counterparts. Third, to provide more flexible, interesting and informative OLAP of graphs, we further propose a discovery-driven multi-dimensional analysis model to ensure that OLAP is performed in an intelligent manner, guided by expert rules and knowledge discovery processes. We outline such a framework and discuss some challenging research issues for discovery-driven Graph OLAP.
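The two subcases named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the toy snapshots, the attribute names (`venue`, `year`), the node grouping, the function names and the choice of summed edge weights as the aggregated-graph measure are all assumptions made for illustration. The informational roll-up overlays graph snapshots that share a dimension value while keeping the nodes as they are; the topological roll-up merges nodes into coarser generalized nodes.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each snapshot is a weighted edge list tagged with
# informational attributes. All names and values here are made up.
snapshots = [
    ({"venue": "SIGMOD", "year": 2004}, {("a", "b"): 1, ("b", "c"): 2}),
    ({"venue": "VLDB", "year": 2004}, {("a", "b"): 3}),
    ({"venue": "SIGMOD", "year": 2005}, {("a", "c"): 1}),
]


def informational_rollup(snapshots, keep_dims):
    """Group snapshots by the dimensions in keep_dims and overlay each group
    into one aggregated graph by summing edge weights. Nodes are unchanged."""
    cube = defaultdict(Counter)
    for attrs, edges in snapshots:
        cell = tuple(attrs[d] for d in keep_dims)
        cube[cell].update(edges)  # Counter.update adds the weights
    return {cell: dict(agg) for cell, agg in cube.items()}


def topological_rollup(edges, group_of):
    """Merge nodes into generalized nodes via group_of and sum the weights of
    edges that collapse onto the same pair. Edges inside one group become
    self-loops on the generalized node."""
    merged = Counter()
    for (u, v), w in edges.items():
        pair = tuple(sorted((group_of[u], group_of[v])))
        merged[pair] += w
    return dict(merged)


# Roll up the venue dimension, keeping only year.
by_year = informational_rollup(snapshots, ["year"])

# Generalize the 2004 aggregated graph to an assumed coarser node level.
groups = {"a": "G1", "b": "G1", "c": "G2"}
coarse_2004 = topological_rollup(by_year[(2004,)], groups)
```

With this toy input, `by_year[(2004,)]` holds the overlay of the two 2004 snapshots, and `coarse_2004` collapses nodes `a` and `b` into `G1`. Summed weights are the simplest possible measure; measures that depend on network properties of the graph would need more than a per-edge sum.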
Acknowledgments
We thank Dr. Raghu Ramakrishnan for useful comments and discussions on this article and related research. The work was supported in part by the U.S. National Science Foundation grants IIS-08-42769 and BDI-05-15813, Office of Naval Research (ONR) grant N00014-08-1-0565, and NASA grant NNX08AC35A.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Chen, C., Yan, X., Zhu, F. et al. Graph OLAP: a multi-dimensional framework for graph data analysis. Knowl Inf Syst 21, 41–63 (2009). https://doi.org/10.1007/s10115-009-0228-9
DOI: https://doi.org/10.1007/s10115-009-0228-9