Exploring characteristics of suspended users and network stability on Twitter

  • Wei Wei
  • Kenneth Joseph
  • Huan Liu
  • Kathleen M. Carley
Original Article

Abstract

Social media is rapidly becoming a medium of choice for understanding the cultural pulse of a region; e.g. for identifying what the population is concerned with and what kind of help is needed in a crisis. To assess this cultural pulse, it is critical to have an accurate assessment of who is saying what. Unfortunately, social media is also the home of users who engage in disruptive, disingenuous, and potentially illegal activity. A range of users, both human and non-human, carry out such social cyber-attacks. We ask, to what extent does the presence or absence of such users influence our ability to assess the cultural pulse of a region? Our prior research on this topic showed that Twitter-based network structures and content are unstable and can be highly impacted by the removal of suspended users. Because of this, statistical techniques can be established to differentiate potential types of suspended and non-suspended users. In this extended paper, we develop additional experiments to explore the spatial patterns of suspended users, and we further consider how these users affect structural and content concentrations via the development of new metrics and new analyses. We find significant evidence that suspended users exist on the periphery of social networks on Twitter and consequently that removing them has little impact on network structure. We also improve prior attempts to distinguish among different types of suspended users by using a much larger dataset. Finally, we conduct a temporal sentiment analysis to illustrate differences between suspended users and non-suspended users on this dimension.

References

  1. Amleshwaram AA, Reddy N, Yadav S, Gu G, Yang C (2013) CATS: characterizing automation of twitter spammers. In: Communication systems and networks (COMSNETS), 2013 fifth international conference on, IEEE, pp 1–10Google Scholar
  2. Anthonisse JM (1971) The rush in a directed graph. Stichting Mathematisch Centrum Mathematische Besliskunde (BN 9/71):1–10Google Scholar
  3. Bíró I, Szabó J, Benczúr AA (2008) Latent dirichlet allocation in web spam filtering. In: Proceedings of the 4th international workshop on adversarial information retrieval on the web, ACM, pp 29–32Google Scholar
  4. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022MATHGoogle Scholar
  5. Bolton RJ, Hand DJ (2002) Statistical fraud detection: a review. Stat Sci 17:235–249Google Scholar
  6. Borgatti SP, Carley KM, Krackhardt D (2006) On the robustness of centrality measures under conditions of imperfect data. Soc Netw 28(2):124–136CrossRefGoogle Scholar
  7. Bosagh Zadeh R, Goel A, Munagala K, Sharma A (2013) On the precision of social and information networks. In: Proceedings of the first ACM conference on Online social networks, pp 63–74Google Scholar
  8. Carley KM, Pfeffer J, Morstatter F, Liu H (2014) Embassies burning: toward a near-real-time assessment of social media using geo-temporal dynamic network analytics. Soci Netw Anal Min 4(1):1–23Google Scholar
  9. De Lathauwer L, De Moor B, Vandewalle J, by Higher-Order BSS (1994) Singular value decomposition. In: Proceedings of the EUSIPCO-94, Edinburgh, Scotland, UK, vol 1, pp 175–178Google Scholar
  10. Diao Q, Qiu M, Wu CY, Smola AJ, Jiang J, Wang C (2014) Jointly modeling aspects, ratings and sentiments for movie recommendation (jmars). In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 193–202Google Scholar
  11. Dumais ST (2004) Latent semantic analysis. Ann Rev Inf Sci Technol 38(1):188–230CrossRefGoogle Scholar
  12. Esuli A, Sebastiani F (2006) Sentiwordnet: a publicly available lexical resource for opinion mining. In: Proceedings of LREC, Citeseer, vol 6, pp 417–422Google Scholar
  13. Frantz TL, Cataldo M, Carley KM (2009) Robustness of centrality measures under uncertainty: examining the role of network topology. Comput Math Organ Theory 15(4):303–328CrossRefGoogle Scholar
  14. Freeman LC (1979) Centrality in social networks conceptual clarification. Soc Netw 1(3):215–239CrossRefGoogle Scholar
  15. Gelman A (2008) Scaling regression inputs by dividing by two standard deviations. Stat Med 27(15):2865–2873MathSciNetCrossRefGoogle Scholar
  16. Golder SA, Macy MW (2011) Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science 333(6051):1878–1881. doi:10.1126/science.1202775, http://www.sciencemag.org/content/333/6051/1878
  17. Griffiths TL, Steyvers M (2004) Finding scientific topics. Proc Natl Acad Sci 101(suppl 1):5228–5235CrossRefGoogle Scholar
  18. Heise DR (1987) Affect control theory: concepts and model. J Math Sociol 13(1–2):1–33MathSciNetCrossRefMATHGoogle Scholar
  19. Hern A (2015) Twitter CEO: we suck at dealing with trolls and abuse. http://www.theguardian.com/technology/2015/feb/05/twitter-ceo-we-suck-dealing-with-trolls-abuse
  20. Hong L, Ahmed A, Gurumurthy S, Smola AJ, Tsioutsiouliklis K (2012) Discovering geographical topics in the twitter stream. In: Proceedings of the 21st international conference on world wide web, ACM, pp 769–778Google Scholar
  21. Hong L, Davison BD (2010) Empirical study of topic modeling in twitter. In: Proceedings of the First Workshop on Social Media Analytics, ACM, pp 80–88Google Scholar
  22. Hutto C, Gilbert E (2014) Vader: a parsimonious rule-based model for sentiment analysis of social media text. In: Eighth international AAAI conference on weblogs and social mediaGoogle Scholar
  23. Jordan MI (1998) Learning in Graphical Models: [proceedings of the NATO Advanced Study Institute...: Ettore Mairona Center, Erice, Italy, September 27-October 7, 1996], vol 89. Springer Science & Business MediaGoogle Scholar
  24. Joseph K, Carley KM (2015) Culture, networks, twitter and foursquare: testing a model of cultural conversion with social media data. In: Proceedings of the 7th international AAAI conference on weblogs and social media (ICWSM)Google Scholar
  25. Joseph K, Tan CH, Carley KM (2012) Beyond local, categories and friends: clustering foursquare users with latent topics. In: Proceedings of the 2012 ACM conference on ubiquitous computing, ACM, pp 919–926Google Scholar
  26. Le QV, Mikolov T (2014) Distributed representations of sentences and documents. arXiv preprint arXiv:1405.4053
  27. Lim KH, Datta A (2013) A topological approach for detecting twitter communities with common interests. In: Atzmueller M, Chin A, Helic D, Hotho A (eds) Ubiquitous social media analysis. Springer, Berlin Heidelberg, pp 23–43Google Scholar
  28. Lin C, He Y (2009) Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 18th ACM conference on information and knowledge management, ACM, pp 375–384Google Scholar
  29. Liu B (2012) Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol 5(1):1–167CrossRefGoogle Scholar
  30. Luxton DD, June JD, Fairall JM (2012) Social media and suicide: a public health perspective. Am J Public Health 102(S2):S195–S200. doi:10.2105/AJPH.2011.300608 CrossRefGoogle Scholar
  31. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
  32. Miller Z, Dickinson B, Deitrick W, Hu W, Wang AH (2014) Twitter spammer detection using data stream clustering. Inf Sci 260:64–73CrossRefGoogle Scholar
  33. Moh TS, Murmann AJ (2010) Can you judge a man by his friends?-enhancing spammer detection on the twitter microblogging platform using friends and followers. In: Information systems, technology and management. Springer, pp 210–220Google Scholar
  34. Monmarché N, Slimane M, Venturini G (1999) Antclass: discovery of clusters in numeric data by an hybridization of an ant colony with the kmeans algorithmGoogle Scholar
  35. Newman ME (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577–8582CrossRefGoogle Scholar
  36. Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In: LREC, vol 10, pp 1320–1326Google Scholar
  37. Pang B, Lee L, Vaithyanathan S (2002) Thumbs up?: Sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 conference on empirical methods in natural language processing-volume 10, association for computational linguistics, pp 79–86Google Scholar
  38. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830MathSciNetMATHGoogle Scholar
  39. Pennebaker JW, Booth RJ, Francis ME (2007) Linguistic inquiry and word count: Liwc. Liwc net, AustinGoogle Scholar
  40. Ratkiewicz J, Conover M, Meiss M, Gonçalves B, Flammini A, Menczer F (2011) Detecting and tracking political abuse in social media. In: ICWSMGoogle Scholar
  41. Reynolds D (2009) Gaussian mixture models. In: Encyclopedia of biometrics. Springer, pp 659–663Google Scholar
  42. Romero DM, Tan C, Kleinberg J (2013) On the interplay between social and topical structure. In: Proceedings of the 7th International AAAI Conference on weblogs and social media (ICWSM)Google Scholar
  43. Santos I, Miambres-Marcos I, Laorden C, Galn-Garca P, Santamara-Ibirika A, Bringas PG (2014) Twitter content-based spam filtering. In: International joint conference SOCO13-CISIS13-ICEUTE13. Springer, pp 449–458Google Scholar
  44. Thomas K, Grier C, Song D, Paxson V (2011) Suspended accounts in retrospect: an analysis of twitter spam. In: Proceedings of the 2011 ACM SIGCOMM conference on internet measurement conference, ACM, pp 243–258Google Scholar
  45. Thomas K, McCoy D, Grier C, Kolcz A, Paxson V (2013) Trafficking fraudulent accounts: the role of the underground market in twitter spam and abuse. Presented as part of the 22nd USENIX security symposium (USENIX Security 13). USENIX, Washington, D.C., pp 195–210Google Scholar
  46. Titov I, McDonald RT (2008) A joint model of text and aspect ratings for sentiment summarization. In: ACL, Citeseer, vol. 8, pp 308–316Google Scholar
  47. Wang AH (2010) Don’t follow me: spam detection in twitter. In: Security and cryptography (SECRYPT), proceedings of the 2010 international conference on, IEEE, pp 1–10Google Scholar
  48. Wang C, Blei DM (2011) Collaborative topic modeling for recommending scientific articles. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 448–456Google Scholar
  49. Wei W, Carley K (2014) Real time closeness and betweenness centrality calculations on streaming network data.Google Scholar
  50. Wei W, Carley KM (2015) Measuring temporal patterns in dynamic social networks. ACM Trans Knowl Discov Data (TKDD) 10(1):1–27. doi:10.1145/2749465
  51. Wei W, Joseph K, Liu H, Carley KM (2015a) The fragility of twitter social networks against suspended users. In: Proceedings of the 2015 IEEE/ACM international conference on advances in social networks analysis and mining 2015, ACM, pp 9–16Google Scholar
  52. Wei W, Joseph K, Lo W, Carley KM (2015b) A bayesian graphical model to discover latent events from twitter. In: Ninth international AAAI conference on web and social mediaGoogle Scholar
  53. Wei W, Pfeffer J, Reminga J, Carley KM (2011) Handling weighted, asymmetric, self-looped, and disconnected networks in ora. Tech. rep., DTIC DocumentGoogle Scholar
  54. Xia P, Jiang H, Wang X, Chen C, Liu B (2014) Predicting user replying behavior on a large online dating site. In: Proceedings of 8th international AAAI conference on weblogs and social mediaGoogle Scholar
  55. Xia P, Liu B, Sun Y, Chen C (2015) Reciprocal recommendation system for online dating. arXiv preprint arXiv:150106247
  56. Xie Y, Yu F, Achan K, Panigrahy R, Hulten G, Osipkov I (2008) Spamming botnets: signatures and characteristics. In: ACM SIGCOMM computer communication review, ACM 38:171–182Google Scholar
  57. Xu R, Wunsch D et al (2005) Survey of clustering algorithms. Neural Netw IEEE Trans 16(3):645–678CrossRefGoogle Scholar
  58. Yin J, Ho Q, Xing EP (2013) A scalable approach to probabilistic latent space inference of large-scale networks. In: Advances in neural information processing systems, pp 422–430Google Scholar
  59. Yuan J, Zheng Y, Xie X (2012) Discovering regions of different functions in a city using human mobility and pois. In: Proceedings of the 18th ACM SIGKDD international conference on kowledge discovery and data mining, ACM, pp 186–194Google Scholar

Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  • Wei Wei
    • 1
  • Kenneth Joseph
    • 1
  • Huan Liu
    • 2
  • Kathleen M. Carley
    • 1
  1. 1.School of Computer ScienceCarnegie Mellon UniversityPittsburghUSA
  2. 2.School of Computing, Informatics, and Decision Systems EngineeringArizona State UniversityTempeUSA

Personalised recommendations