Challenges and Approaches for Large Graph Analysis Using Map/Reduce Paradigm

Das, Soumyava; Chakravarthy, Sharma

doi:10.1007/978-3-319-03689-2_8


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8302)

Abstract

Analysis of big graphs has become possible due to the convergence of many technologies such as server farms, new paradigms for processing data in massively parallel ways (e.g., Map/Reduce, Bulk Synchronous Parallel), as well as the ability to process unstructured data. This has made it possible to solve problems that were previously infeasible (or extremely time-consuming). Many algorithms are being mapped to these new paradigms so that larger instances of problems can be solved within a meaningful response time.
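
To make the Map/Reduce model concrete, the following is a minimal single-process sketch of its three phases (map, shuffle/group-by-key, reduce) applied to a graph stored as an edge list, computing the degree of every vertex. The names `map_reduce`, `degree_mapper` and `sum_reducer` are hypothetical and chosen for illustration only; a real deployment would rely on a framework such as Hadoop, which runs the same phases across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

KeyValue = Tuple[Hashable, object]


def map_reduce(records: Iterable,
               mapper: Callable[[object], Iterable[KeyValue]],
               reducer: Callable[[Hashable, List[object]], Iterable[KeyValue]]) -> Dict:
    """Single-process simulation of the map, shuffle and reduce phases."""
    # Map phase: every input record is processed independently and may
    # emit any number of (key, value) pairs.
    intermediate: List[KeyValue] = []
    for record in records:
        intermediate.extend(mapper(record))

    # Shuffle phase: group values by key. A real framework partitions the
    # keys across reducer machines; a dictionary stands in for that here.
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: each key group is processed independently.
    output: Dict = {}
    for key, values in groups.items():
        for out_key, out_value in reducer(key, values):
            output[out_key] = out_value
    return output


def degree_mapper(edge: Tuple[str, str]) -> Iterable[KeyValue]:
    # An undirected edge adds one to the degree of each endpoint.
    u, v = edge
    yield u, 1
    yield v, 1


def sum_reducer(vertex: Hashable, counts: List[object]) -> Iterable[KeyValue]:
    # All partial counts for one vertex arrive at the same reducer call.
    yield vertex, sum(counts)


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
print(map_reduce(edges, degree_mapper, sum_reducer))
# {'a': 2, 'b': 2, 'c': 3, 'd': 1}
```

Because the mapper sees one edge at a time and the reducer sees one key at a time, both can be parallelized without coordination; only the shuffle moves data between machines.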

This paper analyses a few related problems in the context of graph analysis. Our goal is to analyse the challenges of adapting/extending algorithms for graph analysis (graph mining, graph matching and graph search) to exploit the Map/Reduce paradigm for very large graph analysis in order to achieve scalability. We intend to explore alternative paradigms that may be better suited for a class of applications.
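
One alternative paradigm named in the abstract is Bulk Synchronous Parallel (BSP), which Pregel-style systems apply to graphs in a vertex-centric way: computation proceeds in supersteps separated by a global barrier, and vertices communicate only by messages delivered in the next superstep. The sketch below illustrates that model by computing connected components through minimum-label propagation. It is an assumption-laden, single-process illustration (the function name `bsp_connected_components` is hypothetical, vertex ids are assumed to be comparable integers, and the adjacency map is assumed to be symmetric and to list every vertex as a key), not the authors' method.

```python
from typing import Dict, List, Set


def bsp_connected_components(adj: Dict[int, Set[int]]) -> Dict[int, int]:
    # Vertex state: the smallest vertex id seen so far in its component.
    label = {v: v for v in adj}

    # Superstep 0: every vertex is active and sends its label to its neighbours.
    inbox: Dict[int, List[int]] = {v: [] for v in adj}
    for v in adj:
        for w in adj[v]:
            inbox[w].append(label[v])

    # Later supersteps: a vertex that receives a smaller label adopts it and
    # forwards it; vertices that learn nothing new stay silent ("vote to halt").
    while any(inbox.values()):
        outbox: Dict[int, List[int]] = {v: [] for v in adj}
        for v, messages in inbox.items():
            if messages and min(messages) < label[v]:
                label[v] = min(messages)
                for w in adj[v]:
                    outbox[w].append(label[v])
        # Barrier: messages sent in this superstep are delivered in the next one.
        inbox = outbox
    return label


graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
print(bsp_connected_components(graph))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Unlike a chain of Map/Reduce jobs, the vertex state stays in place between supersteps, so only the messages move; this is one reason BSP is often considered better suited to iterative graph algorithms.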

As an example, finding interesting and repetitive patterns in graphs forms the crux of graph mining. For processing large social network and other graphs, extant (main memory and disk-based) approaches are not appropriate. Hence, it is imperative that we explore massively parallel paradigms for their processing to achieve scalability. This is also true for other problems such as inexact or approximate matching of graphs, and answering queries on large data sets represented as graphs (e.g., Freebase). In this paper, we identify the challenges of harmonizing existing approaches with Map/Reduce and present our preliminary approaches to solving these problems using the new paradigm.
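
Level-wise frequent-subgraph miners typically begin by counting small patterns, such as single labelled edges, before growing larger candidates. That first step maps naturally onto Map/Reduce, as the hypothetical sketch below shows: the map phase turns each edge into a canonical pattern key, and the reduce phase sums the counts and applies a minimum support. The input format, the names `edge_pattern_mapper` and `frequent_edge_patterns`, and the choice to count every edge instance (overlaps allowed) as support are all assumptions made for illustration; this is not the paper's algorithm.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input format: vertex id -> vertex label, plus
# (source id, target id, edge label) triples for undirected edges.
VertexLabels = Dict[int, str]
Edge = Tuple[int, int, str]
Pattern = Tuple[str, str, str]


def edge_pattern_mapper(edge: Edge, labels: VertexLabels) -> Iterable[Tuple[Pattern, int]]:
    u, v, edge_label = edge
    # Canonical key: sort the endpoint labels so A-x-B and B-x-A match.
    a, b = sorted((labels[u], labels[v]))
    yield (a, edge_label, b), 1


def frequent_edge_patterns(edges: List[Edge], labels: VertexLabels,
                           min_support: int) -> Dict[Pattern, int]:
    # Map phase, with the shuffle simulated by grouping values per pattern key.
    groups: Dict[Pattern, List[int]] = defaultdict(list)
    for edge in edges:
        for key, value in edge_pattern_mapper(edge, labels):
            groups[key].append(value)

    # Reduce phase: sum the counts and keep patterns meeting min_support.
    result: Dict[Pattern, int] = {}
    for pattern, counts in groups.items():
        support = sum(counts)
        if support >= min_support:
            result[pattern] = support
    return result


labels = {1: "C", 2: "C", 3: "O", 4: "H", 5: "O"}
edges = [(1, 2, "single"), (2, 3, "double"), (5, 1, "double"), (1, 4, "single")]
print(frequent_edge_patterns(edges, labels, min_support=2))
# {('C', 'double', 'O'): 2}
```

The hard part that this sketch sidesteps is the next level: extending patterns across partition boundaries and detecting isomorphic duplicates, which is where adapting graph mining to Map/Reduce becomes challenging.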


References

http://hadoop.apache.org/
http://hama.apache.org/
http://www.freebase.com
Afrati, F.N., Fotakis, D., Ullman, J.D.: Enumerating subgraph instances using map-reduce. Technical report, Stanford University (December 2011)
Alexaki, S., Christophides, V., Karvounarakis, G., Plexousakis, D.: On Storing Voluminous RDF Descriptions: The Case of Web Portal Catalogs. In: International Workshop on the Web and Databases, pp. 43–48 (2001)
Atre, M., Chaoji, V., Zaki, M.J., Hendler, J.A.: Matrix “Bit” loaded: a scalable lightweight join query processor for RDF data. In: World Wide Web Conference Series, pp. 41–50 (2010)
Bayati, M., Gleich, D.F., Saberi, A., Wang, Y.: Message-passing algorithms for sparse network alignment, vol. 7, p. 3 (2013)
Bollobás, B., Chung, F.R.K.: The Diameter of a Cycle Plus a Random Matching. SIAM Journal on Discrete Mathematics 1, 328–333 (1988)
Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition. Pattern Recognition Letters 1, 245–253 (1983)
Bunke, H., Shearer, K.: A graph distance metric based on the maximal common subgraph, vol. 19, pp. 255–259 (1998)
Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. In: Operating Systems Design and Implementation, pp. 137–150 (2004)
Deshpande, M., Kuramochi, M., Karypis, G.: Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds. In: IEEE International Conference on Data Mining, pp. 35–42 (2003)
Holder, L.B., Cook, D.J., Djoko, S.: Substructure Discovery in the SUBDUE System. In: Knowledge Discovery and Data Mining, pp. 169–180 (1994)
Inokuchi, A., Washio, T., Motoda, H.: An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 13–23. Springer, Heidelberg (2000)
Jin, C., Bhowmick, S.S., Xiao, X., Cheng, J., Choi, B.: GBLENDER: towards blending visual query formulation and query processing in graph databases. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, pp. 111–122. ACM, New York (2010)
Jin, C., Bhowmick, S.S., Xiao, X., Choi, B., Zhou, S.: GBLENDER: visual subgraph query formulation meets query processing. In: SIGMOD Conference, pp. 1327–1330 (2011)
Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., Upfal, E.: Stochastic models for the Web graph. In: IEEE Symposium on Foundations of Computer Science, pp. 57–65 (2000)
Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, pp. 135–146. ACM, New York (2010)
Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity flooding: A versatile graph matching algorithm and its application to schema matching. In: ICDE, pp. 117–128 (2002)
Mongiovì, M., Natale, R.D., Giugno, R., Pulvirenti, A., Ferro, A., Sharan, R.: Sigma: a set-cover-based inexact graph matching algorithm. J. Bioinformatics and Computational Biology 8(2), 199–218 (2010)
Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. The VLDB Journal 19, 91–113 (2010)
Padmanabhan, S., Chakravarthy, S.: HDB-Subdue: A Scalable Approach to Graph Mining. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds.) DaWaK 2009. LNCS, vol. 5691, pp. 325–338. Springer, Heidelberg (2009)
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: International Conference on Data Engineering, pp. 215–224 (2001)
Pelillo, M., Mestre, V.: Replicator Equations, Maximal Cliques, and Graph Isomorphism, vol. 11, pp. 1933–1955 (1999)
Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–799 (2012)
Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: World Wide Web Conference Series, pp. 607–614 (2011)
Tian, Y., Patel, J.M.: Tale: A tool for approximate large graph matching. In: ICDE, pp. 963–972 (2008)
Tong, H., Faloutsos, C., Gallagher, B., Eliassi-Rad, T.: Fast best-effort pattern matching in large attributed graphs. In: KDD, pp. 737–746 (2007)
Umeyama, S.: An Eigendecomposition Approach to Weighted Graph Matching Problems. IEEE Transactions on Pattern Analysis and Machine Intelligence 10, 695–703 (1988)
Valiant, L.G.: A bridging model for parallel computation, vol. 33, pp. 103–111. ACM, New York (August 1990)
Yan, X., Han, J.: gSpan: Graph-Based Substructure Pattern Mining. In: IEEE International Conference on Data Mining, pp. 721–724 (2002)
Yan, X., Yu, P.S., Han, J.: Graph indexing: A frequent structure-based approach. In: SIGMOD Conference, pp. 335–346 (2004)
Zager, L.A., Verghese, G.C.: Graph similarity scoring and matching. In: Applied Mathematics Letters, vol. 21, pp. 86–94 (2008)


Author information

Authors and Affiliations

University of Texas at Arlington, USA
Soumyava Das & Sharma Chakravarthy


Editor information

Editors and Affiliations

Department of Computer Science, South Asian University, Akbar Bhavan, Chanakyapuri, New Delhi 110021, India
Vasudha Bhatnagar
International Institute of Information Technology, Bangalore, India
Srinath Srinivasa


About this paper

Cite this paper

Das, S., Chakravarthy, S. (2013). Challenges and Approaches for Large Graph Analysis Using Map/Reduce Paradigm. In: Bhatnagar, V., Srinivasa, S. (eds) Big Data Analytics. BDA 2013. Lecture Notes in Computer Science, vol 8302. Springer, Cham. https://doi.org/10.1007/978-3-319-03689-2_8


DOI: https://doi.org/10.1007/978-3-319-03689-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03688-5
Online ISBN: 978-3-319-03689-2
eBook Packages: Computer Science, Computer Science (R0)
