CC-MR – Finding Connected Components in Huge Graphs with MapReduce

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7523)


Detecting connected components in graphs is a well-known problem that arises in many applications, including data mining, social network analysis, and image analysis. Although very efficient serial algorithms exist, the problem remains a subject of research because modern information systems produce ever larger amounts of data that a single workstation cannot handle. Only highly parallelized approaches on multi-core servers or computer clusters can deal with these large-scale data sets. In this work we present a solution to this problem for distributed-memory architectures and provide an implementation for the well-known MapReduce framework developed by Google. As our experimental evaluation on synthetic and real-world data shows, our algorithm CC-MR significantly outperforms the existing approaches for the MapReduce framework in terms of the number of necessary iterations, communication costs, and execution runtime. Furthermore, we present a technique for accelerating our implementation on data sets with very heterogeneous component sizes, which often appear in real data.
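
For readers unfamiliar with the problem, the following is a minimal sketch of a serial connected-components computation using union-find, the kind of single-machine baseline the abstract contrasts with distributed approaches. It is not the CC-MR algorithm and is not taken from the paper; the function name `connected_components` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def connected_components(edges):
    """Serial union-find sketch (illustrative only, not CC-MR).

    `edges` is an iterable of (u, v) pairs of comparable vertex ids.
    Returns the components as sorted lists, ordered by smallest vertex.
    """
    parent = {}

    def find(v):
        # Register unseen vertices as their own root.
        parent.setdefault(v, v)
        # Walk up to the root of v's tree.
        root = v
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every vertex on the path at the root.
        while parent[v] != root:
            nxt = parent[v]
            parent[v] = root
            v = nxt
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as representative for deterministic output.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for u, v in edges:
        union(u, v)

    # Group every vertex under its final representative.
    groups = defaultdict(set)
    for v in list(parent):
        groups[find(v)].add(v)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

This sketch keeps the whole vertex map in the memory of one process, which is exactly the limitation that motivates distributed solutions for graphs too large for a single workstation.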


  • Parallel Algorithm
  • Component Size
  • Star Graph
  • MapReduce Framework
  • Forward Edge

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Seidl, T., Boden, B., Fries, S. (2012). CC-MR – Finding Connected Components in Huge Graphs with MapReduce. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7523. Springer, Berlin, Heidelberg.

  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33459-7

  • Online ISBN: 978-3-642-33460-3

  • eBook Packages: Computer Science, Computer Science (R0)