Challenges and Approaches for Large Graph Analysis Using Map/Reduce Paradigm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8302)

Abstract

Analysis of big graphs has become possible due to the convergence of many technologies, such as server farms, new paradigms for processing data in massively parallel ways (e.g., Map/Reduce, Bulk Synchronous Parallel), and the ability to process unstructured data. This convergence makes it possible to solve problems that were previously infeasible or extremely time-consuming. Many algorithms are being mapped to these new paradigms so that larger problem instances can be handled within a meaningful response time.

This paper analyses a few related problems in the context of graph analysis. Our goal is to analyse the challenges of adapting/extending algorithms for graph analysis (graph mining, graph matching and graph search) to exploit the Map/Reduce paradigm for very large graph analysis in order to achieve scalability. We intend to explore alternative paradigms that may be better suited for a class of applications.

As an example, finding interesting and repetitive patterns in graphs forms the crux of graph mining. For processing large social networks and other large graphs, extant (main-memory and disk-based) approaches are not appropriate. Hence, it is imperative to explore massively parallel paradigms for processing them in order to achieve scalability. This is also true for other problems, such as inexact or approximate matching of graphs and answering queries on large data sets represented as graphs (e.g., Freebase). In this paper, we identify the challenges of harmonizing the existing approaches with Map/Reduce and present our preliminary approaches to solving these problems using the new paradigm.
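
As an illustration only, and not the method of the paper, the simplest case of finding repetitive patterns can be phrased in Map/Reduce terms: each edge is mapped to its single-edge pattern, and a reducer sums the occurrences per pattern. The sketch below simulates the map, shuffle and reduce phases in one Python process. The toy graph `EDGES`, the labels, the function names and the `min_support` threshold are all assumptions made for this example; a real deployment would run the same two functions on a framework such as Hadoop over a partitioned edge list.

```python
from collections import defaultdict

# Hypothetical toy graph: one (source_label, edge_label, target_label) per edge.
EDGES = [
    ("person", "knows", "person"),
    ("person", "works_at", "company"),
    ("person", "knows", "person"),
    ("company", "located_in", "city"),
    ("person", "works_at", "company"),
    ("person", "knows", "person"),
]


def map_edge(edge):
    """Map phase: emit (pattern, 1), where the pattern is the labelled edge."""
    yield edge, 1


def shuffle(pairs):
    """Shuffle phase (simulated locally): group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_pattern(pattern, counts):
    """Reduce phase: total number of occurrences of one pattern."""
    return pattern, sum(counts)


def frequent_edge_patterns(edges, min_support):
    """Return single-edge patterns that occur at least min_support times."""
    mapped = (pair for edge in edges for pair in map_edge(edge))
    grouped = shuffle(mapped)
    reduced = (reduce_pattern(p, c) for p, c in grouped.items())
    return {pattern: n for pattern, n in reduced if n >= min_support}


if __name__ == "__main__":
    print(frequent_edge_patterns(EDGES, min_support=2))
```

Counting single edges parallelises trivially because each edge is processed independently. Larger patterns span several edges that may live on different partitions, which is one reason the adaptation of existing graph mining algorithms to Map/Reduce is not straightforward.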

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Das, S., Chakravarthy, S. (2013). Challenges and Approaches for Large Graph Analysis Using Map/Reduce Paradigm. In: Bhatnagar, V., Srinivasa, S. (eds) Big Data Analytics. BDA 2013. Lecture Notes in Computer Science, vol 8302. Springer, Cham. https://doi.org/10.1007/978-3-319-03689-2_8

  • DOI: https://doi.org/10.1007/978-3-319-03689-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03688-5

  • Online ISBN: 978-3-319-03689-2

  • eBook Packages: Computer Science, Computer Science (R0)
