SIMPLE: a simplifying-ensembling framework for parallel community detection from large networks

Wu, Zhiang; Gao, Guangliang; Bu, Zhan; Cao, Jie

doi:10.1007/s10586-015-0504-2

Published: 09 November 2015

Cluster Computing, Volume 19, pages 211–221 (2016)

Zhiang Wu¹, Guangliang Gao², Zhan Bu¹ (ORCID: orcid.org/0000-0002-7582-8203) & Jie Cao¹

Abstract

Community detection is a classic and very difficult task in complex network analysis. With the explosive growth of social media, scaling community detection methods to large networks has attracted considerable recent interest. In this paper, we propose a novel SIMPLifying and Ensembling (SIMPLE) framework for parallel community detection. It employs random link sampling to simplify the network and obtains a basic partitioning on every sampled graph. Then, K-means-based Consensus Clustering is used to ensemble a number of basic partitionings into high-quality community structures. All phases of SIMPLE, including random sampling, sampled graph partitioning, and consensus clustering, are encapsulated into MapReduce for parallel execution. Experiments on six real-world social networks analyze key parameters and factors inside SIMPLE, and demonstrate both the effectiveness and the efficiency of SIMPLE.
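
The three phases named in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names, the sampling rate, the use of connected components as a stand-in base partitioner, and the plain squared-Euclidean K-means over concatenated one-hot labels are all assumptions made for illustration. The MapReduce encapsulation is omitted; since each sampled graph is processed independently, that loop is the natural place to parallelize.

```python
import random


def sample_links(edges, rate, rng):
    """Random link sampling: keep each link independently with probability `rate`."""
    return [e for e in edges if rng.random() < rate]


def connected_components(nodes, edges):
    """Stand-in base partitioner (an assumption): label nodes by connected component."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}


def binary_features(nodes, partitionings):
    """Concatenate the one-hot encodings of every basic partitioning, per node."""
    columns = [(i, label)
               for i, part in enumerate(partitionings)
               for label in sorted(set(part.values()))]
    return {v: [1.0 if partitionings[i][v] == label else 0.0
                for i, label in columns]
            for v in nodes}


def kmeans(points, k, rng, iters=50):
    """Plain K-means with squared Euclidean distance (a hypothetical choice of distance)."""
    keys = sorted(points)
    centers = [points[v][:] for v in rng.sample(keys, k)]
    assign = {}
    for _ in range(iters):
        new_assign = {
            v: min(range(k),
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(points[v], centers[c])))
            for v in keys
        }
        if new_assign == assign:
            break  # converged
        assign = new_assign
        for c in range(k):
            members = [points[v] for v in keys if assign[v] == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def simple_sketch(edges, k, n_samples=10, rate=0.7, seed=0):
    """Sample links, partition each sampled graph, then ensemble with K-means."""
    rng = random.Random(seed)
    nodes = sorted({v for e in edges for v in e})
    partitionings = [
        connected_components(nodes, sample_links(edges, rate, rng))
        for _ in range(n_samples)  # independent runs; parallelizable
    ]
    return kmeans(binary_features(nodes, partitionings), k, rng)


# Two triangles joined by a single bridge link (toy data, not from the paper).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities = simple_sketch(toy_edges, k=2)
```

Here `communities` maps each node to a consensus cluster index in `range(k)`. On real networks, a proper community detection method would presumably replace the connected-components stand-in for the basic partitionings.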

Acknowledgments

This research was partially supported by the National Natural Science Foundation of China under Grants 71571093, 71372188 and 61502222, the National Center for International Joint Research on E-Business Information Processing under Grant 2013B01035, the National Key Technologies R&D Program of China under Grants 2013BAH16F01 and 2013BAH16F04, Industry Projects in the Jiangsu S&T Pillar Program under Grant BE2014141, the Natural Science Foundation of Jiangsu Province of China under Grant SBK2015042593, and Key/Surface Projects of Natural Science Research in Jiangsu Provincial Colleges and Universities under Grants 12KJA520001, 14KJA520001, 14KJB520015, 15KJB520012 and 15KJB520011.

Author information

Authors and Affiliations

Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, China
Zhiang Wu, Zhan Bu & Jie Cao
College of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Guangliang Gao

Corresponding author

Correspondence to Zhan Bu.

About this article

Cite this article

Wu, Z., Gao, G., Bu, Z. et al. SIMPLE: a simplifying-ensembling framework for parallel community detection from large networks. Cluster Comput 19, 211–221 (2016). https://doi.org/10.1007/s10586-015-0504-2

Received: 12 March 2015
Revised: 24 October 2015
Accepted: 26 October 2015
Published: 09 November 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10586-015-0504-2
