  • Lei Meng
  • Ah-Hwee Tan
  • Donald C. Wunsch II
Part of the Advanced Information and Knowledge Processing book series (AI&KP)


The last decade has witnessed how social media in the era of Web 2.0 has reshaped the way people communicate, interact, and entertain themselves in daily life, and has fostered a variety of user-centric platforms, such as social networking, question answering, massive open online course (MOOC), and e-commerce platforms. The rich user-generated multimedia data available on the web has changed traditional approaches to multimedia research and has led to numerous emerging topics in human-centric analytics and services, such as user profiling, social network mining, crowd behavior analysis, and personalized recommendation. Clustering, as an important tool for mining information groups and the characteristics shared within them, has been widely investigated for knowledge discovery and data mining tasks in social media analytics. However, social media data has numerous characteristics that raise challenges for traditional clustering techniques, such as its massive volume, diverse content, heterogeneous media sources, noisy user-generated content, and generation in a streaming manner. As a result, the clustering algorithms used in the literature on social media applications are usually variants of a few traditional algorithms, such as K-means, non-negative matrix factorization (NMF), and graph clustering. Developing a fast and robust clustering algorithm for social media analytics is still an open problem. This chapter gives a bird’s-eye view of clustering in social media analytics, in terms of data characteristics, challenges and issues, and a class of novel approaches based on adaptive resonance theory (ART).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. NTU-UBC Research Center of Excellence in Active Living for the Elderly (LILY), Nanyang Technological University, Singapore, Singapore
  2. School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
  3. Applied Computational Intelligence Laboratory, Missouri University of Science and Technology, Rolla, USA
