Abstract
Graph clustering has been applied successfully in various applications to find similar patterns. Recently, deep learning-based autoencoders have been used effectively to detect disjoint clusters. In real-world graphs, however, a vertex may belong to multiple clusters, so it is necessary to analyze the degree of membership of each vertex in each cluster. Furthermore, existing approaches are centralized and handle large graphs inefficiently. In this paper, a deep learning-based model, ‘DFuzzy’, is proposed for finding fuzzy clusters in large graphs in a distributed environment. It performs clustering in three phases. In the first phase, pre-training is performed by initializing the candidate cluster centers. Then, fine-tuning is performed to learn the latent representations by mining the local information and capturing the structure using PageRank. Further, modularity is used to redefine the clusters. In the last phase, the reconstruction error is minimized and the final cluster centers are updated. Experiments are performed on real-life graph data, and the performance of DFuzzy is compared with that of four state-of-the-art clustering algorithms. The results show that DFuzzy scales up linearly to handle large graphs and produces clusters of better quality than the state-of-the-art clustering algorithms. It is also observed that deep structures can help in obtaining better graph representations and provide improved clustering performance.
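Two of the building blocks named in the abstract, PageRank and fuzzy membership of a vertex toward cluster centers, can be illustrated with a minimal sketch. This is not the DFuzzy implementation: it is centralized, contains no autoencoder, and assumes the standard power-iteration form of PageRank and the standard fuzzy c-means membership formula. The function names, the damping factor of 0.85, the fuzzifier m = 2, and the 2-D points standing in for latent vertex representations are all assumptions made for illustration.

```python
from math import dist


def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an adjacency dict {node: [out-neighbors]}.

    Illustrative only; the damping factor and iteration count are assumptions.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node receives the uniform "teleport" share first.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                # Split this node's damped rank evenly among its out-neighbors.
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Dangling node: spread its damped rank uniformly over all nodes.
                share = damping * rank[v] / n
                for w in nodes:
                    new[w] += share
        rank = new
    return rank


def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x toward each center (sums to 1).

    x is a hypothetical latent representation of a vertex; m > 1 is the fuzzifier.
    """
    d = [dist(x, c) for c in centers]
    # If x coincides with one or more centers, share full membership among them.
    if any(di == 0.0 for di in d):
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]
```

For example, `fuzzy_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])` assigns roughly 0.9 of the membership to the nearer center and 0.1 to the farther one, which is the kind of graded, overlapping assignment that distinguishes fuzzy clusters from disjoint ones.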
Cite this article
Bhatia, V., Rani, R. DFuzzy: a deep learning-based fuzzy clustering model for large graphs. Knowl Inf Syst 57, 159–181 (2018). https://doi.org/10.1007/s10115-018-1156-3
DOI: https://doi.org/10.1007/s10115-018-1156-3