Abstract
Analysis of big graphs has become possible through the convergence of several technologies: server farms, paradigms for processing data in massively parallel ways (e.g., Map/Reduce and Bulk Synchronous Parallel processing), and the ability to process unstructured data. This convergence makes it feasible to solve problems that were previously intractable or extremely time-consuming. Many algorithms are being mapped to these paradigms so that larger problem instances can be handled with a meaningful response time.
This paper analyses a few related problems in graph analysis. Our goal is to analyse the challenges of adapting and extending graph analysis algorithms (graph mining, graph matching, and graph search) so that they exploit the Map/Reduce paradigm and scale to very large graphs. We also intend to explore alternative paradigms that may be better suited for a class of applications.
As an example, finding interesting and repetitive patterns in graphs forms the crux of graph mining. For processing large social networks and other large graphs, extant (main-memory and disk-based) approaches are not appropriate. Hence, it is imperative that we explore massively parallel paradigms to process them at scale. The same is true for other problems, such as inexact or approximate matching of graphs and answering queries on large data sets represented as graphs (e.g., Freebase). In this paper, we identify the challenges of harmonizing the existing approaches with Map/Reduce and present our preliminary approaches to solving these problems using the new paradigm.
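To make the idea of mapping pattern counting onto Map/Reduce concrete, the following is a minimal single-process sketch, not the authors' method. It assumes a hypothetical representation in which each edge is a (source label, edge label, target label) triple and the graph has already been split into partitions; the function names, the in-memory `shuffle` step, and the `min_support` threshold are all illustrative assumptions. It counts only single-edge patterns, which is far simpler than the substructure mining the paper is concerned with.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical representation: an edge is (source_label, edge_label, target_label).
Edge = Tuple[str, str, str]
Pattern = Tuple[str, str, str]


def map_phase(partition: Iterable[Edge]) -> Iterator[Tuple[Pattern, int]]:
    """Emit (pattern, 1) for every edge in one graph partition."""
    for src_label, edge_label, dst_label in partition:
        yield (src_label, edge_label, dst_label), 1


def shuffle(pairs: Iterable[Tuple[Pattern, int]]) -> Dict[Pattern, List[int]]:
    """Group emitted values by key; a real framework does this between phases."""
    grouped: Dict[Pattern, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[Pattern, List[int]], min_support: int) -> Dict[Pattern, int]:
    """Sum the counts per pattern and keep patterns seen at least min_support times."""
    totals = {pattern: sum(counts) for pattern, counts in grouped.items()}
    return {pattern: n for pattern, n in totals.items() if n >= min_support}


def frequent_edge_patterns(partitions: Iterable[Iterable[Edge]], min_support: int) -> Dict[Pattern, int]:
    """Run map, shuffle, and reduce over all partitions in a single process."""
    emitted = (pair for part in partitions for pair in map_phase(part))
    return reduce_phase(shuffle(emitted), min_support)


if __name__ == "__main__":
    # Two made-up partitions of a small labeled graph.
    partitions = [
        [("Person", "knows", "Person"), ("Person", "likes", "Movie")],
        [("Person", "knows", "Person"), ("Movie", "directedBy", "Person")],
    ]
    print(frequent_edge_patterns(partitions, min_support=2))
```

In this sketch each partition can be mapped independently, which is where the parallelism would come from. Patterns that span partition boundaries, which the sketch ignores, are one source of the harmonization challenges the abstract refers to.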
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Das, S., Chakravarthy, S. (2013). Challenges and Approaches for Large Graph Analysis Using Map/Reduce Paradigm. In: Bhatnagar, V., Srinivasa, S. (eds) Big Data Analytics. BDA 2013. Lecture Notes in Computer Science, vol 8302. Springer, Cham. https://doi.org/10.1007/978-3-319-03689-2_8
DOI: https://doi.org/10.1007/978-3-319-03689-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03688-5
Online ISBN: 978-3-319-03689-2
eBook Packages: Computer Science, Computer Science (R0)