DFuzzy: a deep learning-based fuzzy clustering model for large graphs

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Graph clustering has been applied successfully in various applications to find similar patterns. Recently, deep learning-based autoencoders have been used efficiently to detect disjoint clusters. However, in real-world graphs, a vertex may belong to multiple clusters, so it is necessary to analyze the degree to which each vertex belongs to each cluster. Furthermore, existing approaches are centralized and are inefficient at handling large graphs. In this paper, a deep learning-based model, ‘DFuzzy’, is proposed for finding fuzzy clusters in large graphs in a distributed environment. It performs clustering in three phases. In the first phase, pre-training is performed by initializing the candidate cluster centers. Then, fine-tuning is performed to learn latent representations by mining local information and capturing the graph structure using PageRank. Next, modularity is used to redefine the clusters. In the last phase, the reconstruction error is minimized and the final cluster centers are updated. Experiments are performed on real-world graph data, and the performance of DFuzzy is compared with four state-of-the-art clustering algorithms. Results show that DFuzzy scales linearly to handle large graphs and produces higher-quality clusters than the state-of-the-art clustering algorithms. It is also observed that deep structures help obtain better graph representations and improve clustering performance.
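The abstract names PageRank, modularity and fuzzy (soft) cluster membership without giving their formulas. As an illustration only, the plain-Python sketch below shows textbook versions of these three ingredients on a toy graph: PageRank by power iteration, the fuzzy c-means membership formula, and Newman's modularity for a hard partition. It is not the DFuzzy model: the autoencoder, the distributed implementation, and the way DFuzzy combines these pieces are not reproduced. The function names, the damping factor 0.85 and the fuzzifier m = 2 are assumptions chosen for the example.

```python
"""Illustrative sketch (not DFuzzy): textbook PageRank, fuzzy c-means
memberships, and Newman modularity on a small undirected graph."""
from typing import Dict, List, Sequence


def pagerank(adj: Dict[int, List[int]], d: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> Dict[int, float]:
    """Power-iteration PageRank with uniform teleportation.

    Rank mass of dangling vertices (no out-edges) is spread uniformly.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = d * rank[v] / len(adj[v])  # split rank over out-edges
                for u in adj[v]:
                    new[u] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


def fcm_memberships(points: Sequence[Sequence[float]],
                    centers: Sequence[Sequence[float]],
                    m: float = 2.0) -> List[List[float]]:
    """Fuzzy c-means membership of each point in each center (m > 1).

    u_ij = 1 / sum_k (||x_i - c_j|| / ||x_i - c_k||) ** (2 / (m - 1));
    each row sums to 1.
    """
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    out = []
    for p in points:
        ds = [dist(p, c) for c in centers]
        if any(x == 0.0 for x in ds):
            # Point lies exactly on one or more centers: split membership among them.
            row = [1.0 if x == 0.0 else 0.0 for x in ds]
            s = sum(row)
            row = [r / s for r in row]
        else:
            exp = 2.0 / (m - 1.0)
            row = [1.0 / sum((dj / dk) ** exp for dk in ds) for dj in ds]
        out.append(row)
    return out


def modularity(adj: Dict[int, List[int]], labels: Dict[int, int]) -> float:
    """Newman modularity Q of a hard partition of an undirected graph.

    Q = (1 / 2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j),
    where every edge appears in both endpoints' adjacency lists.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # each edge counted twice
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    q = 0.0
    for v, nbrs in adj.items():  # observed intra-cluster edge endpoints
        for u in nbrs:
            if labels[u] == labels[v]:
                q += 1.0
    for v in adj:  # expected intra-cluster edges under the configuration model
        for u in adj:
            if labels[u] == labels[v]:
                q -= deg[v] * deg[u] / two_m
    return q / two_m


if __name__ == "__main__":
    # Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
           3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    print(pagerank(adj))
    print(round(modularity(adj, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}), 4))  # 0.3571
    # Hypothetical 1-D "representations" scored against two centers.
    print(fcm_memberships([[0.0], [0.25], [0.5]], [[0.0], [1.0]]))
```

In a pipeline of this kind the points passed to `fcm_memberships` would be learned vertex representations; here they are made-up one-dimensional coordinates, and the bridge vertices 2 and 3 receive the highest PageRank scores because they have the highest degree.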

Figs. 1–10 (images not reproduced here)

Author information

Correspondence to Vandana Bhatia.

About this article

Cite this article

Bhatia, V., Rani, R. DFuzzy: a deep learning-based fuzzy clustering model for large graphs. Knowl Inf Syst 57, 159–181 (2018). https://doi.org/10.1007/s10115-018-1156-3

  • DOI: https://doi.org/10.1007/s10115-018-1156-3