Language agnostic meme-filtering for hashtag-based social network analysis

  • Dimitrios Kotsakos
  • Panos Sakkos
  • Ioannis Katakis
  • Dimitrios Gunopulos
Original Article


Users in social networks use hashtags for a variety of reasons. In many cases, hashtags serve retrieval purposes by labeling the content they accompany. More often than not, however, hashtags are used to promote content, ideas, or conversations, producing viral memes. This paper addresses a specific case of hashtag classification: meme-filtering. We argue that hashtags correlated with memes can hinder valuable social media algorithms such as trend detection and event identification. We propose and evaluate a set of language-agnostic features that help separate two classes of hashtags: meme-hashtags and event-hashtags. The proposed approach is evaluated on two large datasets of Twitter messages written in English and German. We also present a proof-of-concept application of meme-filtering to the problem of event detection.
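
To make "language-agnostic" concrete, here is a minimal sketch of per-hashtag features that never inspect tweet text, so they apply unchanged to English and German data. It is not the authors' feature set, which the abstract does not enumerate. The `Tweet` record, the function `hashtag_features`, and all five features (tweet count, unique-user ratio, retweet fraction, URL fraction, and peak-hour share as a crude burstiness signal) are hypothetical choices made for illustration. A classifier could then be trained on such vectors to label hashtags as meme or event.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical minimal tweet record: metadata only, no text."""
    user_id: str
    timestamp: int  # seconds since the Unix epoch
    is_retweet: bool
    has_url: bool


def hashtag_features(tweets):
    """Compute hypothetical language-agnostic features for one hashtag.

    None of these features reads the words of a tweet, which is what
    makes them independent of the language the tweets are written in.
    """
    n = len(tweets)
    if n == 0:
        raise ValueError("need at least one tweet per hashtag")
    users = {t.user_id for t in tweets}
    # Bucket tweets by hour to measure how concentrated activity is.
    hours = Counter(t.timestamp // 3600 for t in tweets)
    return {
        "tweet_count": n,
        "unique_user_ratio": len(users) / n,
        "retweet_fraction": sum(t.is_retweet for t in tweets) / n,
        "url_fraction": sum(t.has_url for t in tweets) / n,
        # Share of tweets falling in the busiest hour: a crude burstiness signal.
        "peak_hour_share": max(hours.values()) / n,
    }


if __name__ == "__main__":
    sample = [
        Tweet("a", 0, False, True),
        Tweet("b", 100, True, False),
        Tweet("a", 4000, False, False),
        Tweet("c", 7300, True, True),
    ]
    print(hashtag_features(sample))
```

Because every feature is derived from metadata (users, timestamps, retweet and URL flags), the same function runs on any corpus; which features actually discriminate memes from events is an empirical question the paper addresses and this sketch does not answer.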





The authors would like to thank the data annotators. This work has been co-financed by the EU and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Programs: Heraclitus II fellowship, THALIS - GeomComp, THALIS - DISFER, ARISTEIA - MMD, and the EU-funded project INSIGHT.


Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  • Dimitrios Kotsakos (1)
  • Panos Sakkos (1)
  • Ioannis Katakis (1)
  • Dimitrios Gunopulos (1)

  1. Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Ilissia, Greece
