Characterizing the language-production dynamics of social media users


In this paper, we propose a characterization of social media users based on language usage over time, with the aim of making the notions of organic and inorganic online behavior more rigorous. This characterization describes the extent to which a user’s word usage within a particular time period subverts expectations based on preceding time periods. To do this, we adapt an information-theoretic measure of cognitive surprise and apply it to a set of behaviorally diverse Twitter users. We compare the language-production dynamics across users based on term frequencies at multiple levels of granularity, and then illustrate the intuition behind this characterization through case studies of salient users identified by this method. Through these case studies, we find that this characterization can be linked to the degree to which a user’s word usage is organic, inorganic, or a mixture of both.



  1. The reason for doing so is to ensure that each group of tweets represents a roughly equal amount of linguistic activity, regardless of the time spanned by each group. If we instead grouped tweets by a fixed unit of time (e.g., all tweets from the same month), we would run into a serious problem whenever a user has periods of reduced tweet activity. Such periods would yield sparser word distributions, which would make the calculated surprise (detailed in the following section) arbitrarily high. Put more simply, if we want to compare a user’s tweet to the tweet before it, the time lag between the two tweets is irrelevant: the previous tweet is simply the user’s most recent action and thus all that we have available for the comparison.
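The grouping argument in this note can be illustrated with a short Python sketch. It rests on several assumptions that this excerpt does not confirm: surprise is taken to be the Kullback-Leibler divergence between the word distributions of consecutive groups, tokens are obtained by whitespace splitting, and additive smoothing keeps every probability nonzero. All function and parameter names (`group_by_count`, `word_distribution`, `surprise_series`, `group_size`, `alpha`) are hypothetical and are not taken from the authors' implementation.

```python
import math
from collections import Counter


def group_by_count(tweets, group_size):
    """Split chronologically ordered tweets into consecutive groups of
    exactly `group_size` tweets; a trailing incomplete group is dropped."""
    n_full = len(tweets) // group_size
    return [tweets[i * group_size:(i + 1) * group_size] for i in range(n_full)]


def word_distribution(group, vocab, alpha=1.0):
    """Smoothed word distribution of one group over a shared vocabulary.
    Additive smoothing (alpha) is an assumption of this sketch."""
    counts = Counter(w for tweet in group for w in tweet.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; p and q share the same keys."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def surprise_series(tweets, group_size, alpha=1.0):
    """Surprise of each group relative to the group immediately before it
    (assumed here to be KL divergence; the excerpt does not specify)."""
    groups = group_by_count(tweets, group_size)
    vocab = sorted({w for t in tweets for w in t.lower().split()})
    dists = [word_distribution(g, vocab, alpha) for g in groups]
    return [kl_divergence(dists[i], dists[i - 1]) for i in range(1, len(dists))]
```

Because every group holds the same number of tweets, each distribution is estimated from a comparable amount of text, so a quiet month does not produce a sparse distribution and an inflated divergence on its own.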




The authors wish to thank Stefano Cresci and colleagues for generously making the dataset available to us. This research is funded in part by the U.S. National Science Foundation (IIS-1636933, ACI-1429160, and IIS-1110868), U.S. Office of Naval Research (N00014-10-1-0091, N00014-14-1-0489, N00014-15-P-1187, N00014-16-1-2016, N00014-16-1-2412, N00014-17-1-2605, N00014-17-1-2675, N00014-19-1-2336), U.S. Air Force Research Lab, U.S. Army Research Office (W911NF-16-1-0189), U.S. Defense Advanced Research Projects Agency (W31P4Q-17-C-0059), Arkansas Research Alliance, and the Jerry L. Maulden/Entergy Endowment at the University of Arkansas at Little Rock. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations. The researchers gratefully acknowledge the support.

Author information



Corresponding author

Correspondence to Zachary K. Stine.


About this article


Cite this article

Stine, Z.K., Agarwal, N. Characterizing the language-production dynamics of social media users. Soc. Netw. Anal. Min. 9, 60 (2019).



  • Social media
  • Human–computer interaction
  • Natural language processing
  • Computer-mediated communication
  • Information theory