Detecting anomalies in social network data consumption

  • Cuneyt Gurcan Akcora
  • Barbara Carminati
  • Elena Ferrari
  • Murat Kantarcioglu
Original Article


As the popularity and usage of social media have exploded over the years, understanding how social network users’ interests evolve has gained importance in diverse fields, ranging from sociological studies to marketing. In this paper, we use two snapshots of the Twitter network and analyze users’ data interest patterns over time to understand individual and collective user behavior on social networks. Building topical profiles of users, we propose novel metrics to identify anomalous friendships and validate our results with Amazon Mechanical Turk experiments. We show that although more than 80% of all friendships on Twitter are created due to data interests, 83% of all users have at least one friendship that can be explained neither by the user’s past interests nor by the collective behavior of other similar users.


Anomaly Detection, Topic Model, Latent Dirichlet Allocation, Similar User, Twitter User



This work is partially funded by the National Science Foundation (NSF) under Grants Career CNS-0845803, CNS-0964350, CNS-1016343, CNS-1111529, and CNS-1228198.



Copyright information

© Springer-Verlag Wien 2014

Authors and Affiliations

  • Cuneyt Gurcan Akcora (1), corresponding author
  • Barbara Carminati (1)
  • Elena Ferrari (1)
  • Murat Kantarcioglu (2)

  1. DISTA, Università degli Studi dell’Insubria, Varese, Italy
  2. Data Security and Privacy Laboratory, University of Texas at Dallas, Richardson, USA
