In this paper, we propose a characterization of social media users based on their language usage over time, with the aim of making the notions of organic and inorganic online behavior more rigorous. This characterization describes the extent to which a user’s word usage within a particular time period subverts expectations based on preceding time periods. To do this, we adapt an information-theoretic measure of cognitive surprise and apply it to a set of behaviorally diverse Twitter users. We then compare language-production dynamics across users based on term frequencies at multiple levels of granularity, and illustrate the intuition behind the characterization through case studies of salient users identified by this method. Through these case studies, we find that the characterization can be linked to the degree to which a user’s word usage is organic, inorganic, or a mixture of both.
We do so to ensure that each group of tweets represents a roughly equal amount of linguistic activity, regardless of the time spanned by each group. If we instead grouped tweets by a fixed unit of time (e.g., all tweets from the same month), we would run into a serious problem whenever a user has periods of reduced tweet activity. Periods with fewer tweets would yield sparser word distributions, which would make the calculated surprise (detailed in the following section) arbitrarily high. Put more simply, if we want to compare a user’s tweet to the tweet before it, the time lag between the two tweets is irrelevant: the previous tweet is simply the user’s most recent action and therefore all that we have available for the comparison.
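The grouping idea can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: groups hold a fixed number of tweets, word distributions use add-one smoothing, and surprise is stood in for by the Kullback-Leibler divergence of each group's word distribution from that of the preceding group. The paper defines its own measure in a later section, so the function names and parameters here (`chunk_by_count`, `word_distribution`, `surprise_series`, `alpha`) are hypothetical.

```python
from collections import Counter
import math


def chunk_by_count(tweets, size):
    """Split chronologically ordered, tokenized tweets into groups of `size` tweets.

    A trailing group with fewer than `size` tweets is dropped so that every
    group represents the same amount of activity.
    """
    return [tweets[i:i + size] for i in range(0, len(tweets) - size + 1, size)]


def word_distribution(group, vocab, alpha=1.0):
    """Smoothed relative word frequencies over a shared vocabulary.

    Add-`alpha` smoothing is an assumption made here so that every
    probability is non-zero and the divergence below stays finite.
    """
    counts = Counter(token for tweet in group for token in tweet)
    total = sum(counts.values()) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / total for word in vocab}


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; p and q share the same keys."""
    return sum(p[word] * math.log2(p[word] / q[word]) for word in p)


def surprise_series(tweets, size):
    """Assumed surprise of each group relative to the group just before it."""
    vocab = sorted({token for tweet in tweets for token in tweet})
    groups = chunk_by_count(tweets, size)
    dists = [word_distribution(group, vocab) for group in groups]
    return [kl_divergence(dists[i], dists[i - 1]) for i in range(1, len(dists))]


# Toy data: six tokenized tweets in chronological order (timestamps are ignored).
tweets = [
    ["good", "morning"], ["good", "coffee"],
    ["good", "morning"], ["good", "coffee"],
    ["buy", "now"], ["buy", "now"],
]
series = surprise_series(tweets, size=2)
print(series)  # one value per group after the first
```

Because every group contains the same number of tweets, a quiet month cannot produce an unusually sparse distribution on its own. In the toy data, the second group repeats the first and the third switches vocabulary, so only the second value in `series` is non-zero.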
The authors wish to thank Stefano Cresci and colleagues for generously making the dataset available to us. This research is funded in part by the U.S. National Science Foundation (IIS-1636933, ACI-1429160, and IIS-1110868), U.S. Office of Naval Research (N00014-10-1-0091, N00014-14-1-0489, N00014-15-P-1187, N00014-16-1-2016, N00014-16-1-2412, N00014-17-1-2605, N00014-17-1-2675, N00014-19-1-2336), U.S. Air Force Research Lab, U.S. Army Research Office (W911NF-16-1-0189), U.S. Defense Advanced Research Projects Agency (W31P4Q-17-C-0059), Arkansas Research Alliance, and the Jerry L. Maulden/Entergy Endowment at the University of Arkansas at Little Rock. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations. The researchers gratefully acknowledge the support.
Cite this article
Stine, Z.K., Agarwal, N. Characterizing the language-production dynamics of social media users. Soc. Netw. Anal. Min. 9, 60 (2019). https://doi.org/10.1007/s13278-019-0605-7
- Social media
- Human–computer interaction
- Natural language processing
- Computer-mediated communication
- Information theory