Chapter

From Social Data Mining and Analysis to Prediction and Community Detection

Part of the series Lecture Notes in Social Networks pp 1-29

An Offline–Online Visual Framework for Clustering Memes in Social Media

  • Anh Dang, Faculty of Computer Science, Dalhousie University
  • Abidalrahman Moh’d, Faculty of Computer Science, Dalhousie University
  • Anatoliy Gruzd, Ted Rogers School of Management, Ryerson University
  • Evangelos Milios, Faculty of Computer Science, Dalhousie University
  • Rosane Minghim, University of São Paulo-USP, ICMC

Abstract

The volume of data generated in Online Social Networks (OSNs) grows every day. Extracting and understanding trending topics and events from this data is an important area of OSN research. This paper proposes a novel clustering framework to detect the spread of memes in OSNs in real time. The Offline–Online meme clustering framework exploits similarity scores between different elements of Reddit submissions, computed from Wikipedia concepts used as external knowledge, text semantic similarity, and a modified version of the Jaccard coefficient, along with two strategies for combining those scores. The two combination strategies are: (1) automatically computing the similarity score weighting factors for the five elements of a submission, and (2) allowing users to take part in the clustering process through a visualization prototype, where they can filter out outlier submissions, modify submission class labels, or assign different similarity score weighting factors to the elements of a submission. In the offline step, the framework performs a one-pass clustering of existing OSN data, calculating and summarizing each cluster's statistics using Wikipedia concepts. In the online step, it assigns new streaming data points to the appropriate clusters using a modified version of online k-means. The experimental results show that using Wikipedia as external knowledge together with text semantic similarity improves both the speed and the accuracy of meme clustering compared with the baselines. For the online clustering process, a damped window model is well suited to streaming environments: it not only requires low prediction and training costs, but also assigns more weight to recent data and popular topics.
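The online step can be illustrated with a minimal sketch of online k-means under a damped window. The abstract does not specify the authors' modifications, so everything here is an assumption made for illustration: the class name `DampedOnlineKMeans`, the `decay` parameter, the exponential fading function 2^(-decay · Δt), and the use of squared Euclidean distance on plain numeric vectors in place of the paper's combined similarity scores.

```python
class DampedOnlineKMeans:
    """Illustrative online k-means with exponentially fading cluster weights.

    Not the authors' algorithm: the fading function and the distance
    measure are assumptions chosen to keep the sketch small.
    """

    def __init__(self, centroids, decay=0.1):
        self.centroids = [list(c) for c in centroids]
        self.weights = [1.0] * len(self.centroids)  # one initial point each
        self.decay = decay        # hypothetical decay rate of the window
        self.last_time = 0.0      # timestamp of the previous update

    def _fade(self, now):
        # Damped window: every cluster's accumulated weight shrinks
        # as time passes, so old points count less than recent ones.
        factor = 2.0 ** (-self.decay * (now - self.last_time))
        self.weights = [w * factor for w in self.weights]
        self.last_time = now

    def assign(self, point, now):
        """Assign one streaming point to its nearest cluster and update it."""
        self._fade(now)
        dists = [
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in self.centroids
        ]
        k = dists.index(min(dists))
        w = self.weights[k]
        # Weighted running mean: the faded history counts for w,
        # the new point counts for 1.
        self.centroids[k] = [
            (w * c + p) / (w + 1.0) for c, p in zip(self.centroids[k], point)
        ]
        self.weights[k] = w + 1.0
        return k


model = DampedOnlineKMeans([[0.0, 0.0], [10.0, 10.0]], decay=1.0)
first = model.assign([1.0, 1.0], now=1.0)
second = model.assign([9.0, 9.0], now=1.0)
```

Each update touches only one centroid and one weight per cluster, which is consistent with the low prediction and training costs the abstract attributes to the damped window approach. A larger `decay` makes clusters follow recent submissions more closely.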

Keywords

Online algorithms, Clustering memes, Social media, Semantic Jaccard index, Wikipedia entity linking, Visual analysis