Detecting anomalies in social network data consumption

Abstract

As the popularity and usage of social media have exploded over the years, understanding how social network users’ interests evolve has gained importance in diverse fields, ranging from sociological studies to marketing. In this paper, we use two snapshots of the Twitter network and analyze how users’ data interest patterns change over time to understand individual and collective user behavior on social networks. Building topical profiles of users, we propose novel metrics to identify anomalous friendships and validate our results with Amazon Mechanical Turk experiments. We show that although more than 80% of all friendships on Twitter are created due to data interests, 83% of all users have at least one friendship that can be explained neither by the users’ past interests nor by the collective behavior of other similar users.

Notes

  1. In this paper, we use the term “anomaly” to represent such significant changes in user behavior.

  2. https://dev.twitter.com/.

  3. In Twitter API, friends of a user are the accounts followed by the user.

  4. Two senators are excluded in bioLDA because of short or blank bios.

  5. Other words from the topic include words such as green, water, power, wind, oil and gas.

  6. The number of new friendships is greater than the total number of queried Twitter users because we have queried Twitter breadth first, and many new friendships are shared by seed users.

  7. http://sight.dicom.uninsubria.it/anomaly/.

  8. Approved by the Office of Research Compliance-University of Texas at Dallas, human experiment IRB MR 13-231.

  9. For Fleiss’ Kappa, a value above 0.20 indicates fair agreement, above 0.40 moderate agreement, and above 0.60 substantial agreement.
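
The agreement bands in note 9 can be made concrete with a small sketch. The code below computes Fleiss' kappa from a table of rating counts using the standard textbook formula and maps the result to the bands listed in the note. It is only an illustration: the function names, the toy `ratings` table, and the "below fair" label for values at or under 0.20 are assumptions made here and are not taken from the paper or its Mechanical Turk data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who put item i into category j. Every item needs the same rater count."""
    if not counts:
        raise ValueError("at least one rated item is required")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")

    n_categories = len(counts[0])
    # Share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:
        # Only one category was ever used; treated here as full agreement.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


def agreement_label(kappa: float) -> str:
    """Map kappa to the bands from note 9; 'below fair' is an assumed label."""
    if kappa > 0.6:
        return "substantial"
    if kappa > 0.4:
        return "moderate"
    if kappa > 0.2:
        return "fair"
    return "below fair"


# Hypothetical toy table: 4 friendships, 3 raters, categories
# (anomalous, not anomalous). Not data from the paper.
ratings = [[3, 0], [2, 1], [0, 3], [3, 0]]
kappa = fleiss_kappa(ratings)
print(f"kappa={kappa:.3f} -> {agreement_label(kappa)} agreement")
```

Working from per-item category counts rather than raw rater identities matches how the statistic is defined, so the same function applies when different workers rate different items, provided each item receives the same number of ratings.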

Acknowledgments

This work is partially funded by the National Science Foundation (NSF) under grants CAREER CNS-0845803, CNS-0964350, CNS-1016343, CNS-1111529, and CNS-1228198.

Author information

Corresponding author

Correspondence to Cuneyt Gurcan Akcora.

About this article

Cite this article

Akcora, C.G., Carminati, B., Ferrari, E. et al. Detecting anomalies in social network data consumption. Soc. Netw. Anal. Min. 4, 231 (2014). https://doi.org/10.1007/s13278-014-0231-3

Keywords

  • Anomaly Detection
  • Topic Model
  • Latent Dirichlet Allocation
  • Similar User
  • Twitter User