
Characterizing the language-production dynamics of social media users

Abstract

In this paper, we propose a characterization of social media users based on their language usage over time, in order to make the notions of organic and inorganic online behavior more rigorous. This characterization describes the extent to which a user’s word usage within a particular time period subverts expectations formed from preceding time periods. To do this, we adapt an information-theoretic measure of cognitive surprise and apply it to a set of behaviorally diverse Twitter users. Next, we compare language-production dynamics across users based on term frequencies at multiple levels of granularity. Finally, we illustrate the intuition behind this characterization through case studies of salient users identified by this method. Through these case studies, we find that this characterization can be linked to the degree to which a user’s word usage is organic, inorganic, or a mixture of both.
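
The abstract does not give the exact surprise formula. As a hedged illustration only, the sketch below assumes one common information-theoretic formulation: the Kullback–Leibler divergence, in bits, of a window's word distribution from the mean distribution of the `w` windows preceding it, with additive smoothing `alpha` so that a word unseen in the baseline does not make the score infinite. The function names, `w`, and `alpha` are hypothetical choices for this example, not the authors' implementation; varying `w` is one possible way to look at word usage at different granularities.

```python
import math
from collections import Counter


def word_distribution(tokens, vocab, alpha):
    """Smoothed relative frequency of each vocabulary word in one window."""
    counts = Counter(tokens)
    total = sum(counts[v] for v in vocab) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}


def kl_divergence(p, q):
    """KL(p || q) in bits; infinite if p puts mass where q has none."""
    total = 0.0
    for v, pv in p.items():
        if pv == 0.0:
            continue
        if q[v] == 0.0:
            return math.inf
        total += pv * math.log2(pv / q[v])
    return total


def surprise_series(windows, w=1, alpha=0.5):
    """Surprise of each window relative to the mean of the `w` windows before it.

    ASSUMPTION: surprise is modeled here as smoothed KL divergence;
    `w` and `alpha` are hypothetical parameters, not taken from the paper.
    """
    vocab = sorted({tok for win in windows for tok in win})
    dists = [word_distribution(win, vocab, alpha) for win in windows]
    scores = []
    for j in range(w, len(dists)):
        past = dists[j - w:j]
        # Baseline expectation: average word distribution of the preceding windows.
        baseline = {v: sum(d[v] for d in past) / w for v in vocab}
        scores.append(kl_divergence(dists[j], baseline))
    return scores


if __name__ == "__main__":
    windows = [
        "the game was great".split(),
        "the game was great".split(),
        "buy cheap followers now".split(),
    ]
    print(surprise_series(windows, w=1))
```

In this toy run, the repeated window scores 0.0, while the switch to entirely different vocabulary scores 0.5 · log2(3), about 0.79 bits. Setting `alpha=0` would instead make that second score infinite, because the new window uses words the baseline never saw.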

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Notes

  1.

    The reason for doing so is to ensure that each group of tweets represents a roughly equal amount of linguistic activity, regardless of the time spanned by each grouping. If we were instead to group tweets by a fixed unit of time (e.g., grouping all tweets from the same month), we would run into a serious problem whenever a user has periods of reduced tweeting activity. Such periods would yield sparser word distributions, which could make the calculated surprise (detailed in the following section) arbitrarily high. Put more simply, when comparing a user’s tweet to the one before it, the time lag between the two tweets is irrelevant: the preceding tweet is simply the user’s most recent action and thus all we have available for the comparison.
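
    The contrast drawn in this note can be sketched as follows, assuming each tweet is a `(timestamp, tokens)` pair and that groups are formed from a fixed number of consecutive tweets. The note does not state the group size or how a leftover partial group is handled, so `size` and the choice to drop the trailing partial group are hypothetical. Grouping by calendar month lets a quiet month produce a group with very few tokens, while fixed-count grouping keeps the amount of text per group comparable.

```python
from collections import Counter
from datetime import datetime


def group_by_count(tweets, size):
    """Split tweets into consecutive groups of exactly `size` tweets.

    ASSUMPTION: tweets are (timestamp, tokens) pairs; a trailing partial
    group is dropped so every group carries a comparable amount of text.
    """
    ordered = sorted(tweets, key=lambda t: t[0])
    n_full = len(ordered) // size
    return [ordered[i * size:(i + 1) * size] for i in range(n_full)]


def group_by_month(tweets):
    """Alternative grouping by calendar month (the approach the note argues against)."""
    groups = {}
    for ts, tokens in sorted(tweets, key=lambda t: t[0]):
        groups.setdefault((ts.year, ts.month), []).append((ts, tokens))
    return [groups[key] for key in sorted(groups)]


def token_counts(group):
    """Term frequencies for one group of tweets."""
    return Counter(tok for _, tokens in group for tok in tokens)


if __name__ == "__main__":
    # Hypothetical user: six tweets in January, then a single tweet in March.
    tweets = [(datetime(2019, 1, day), ["a", "b"]) for day in range(1, 7)]
    tweets.append((datetime(2019, 3, 2), ["c", "d"]))
    print([len(g) for g in group_by_month(tweets)])     # [6, 1]
    print([len(g) for g in group_by_count(tweets, 3)])  # [3, 3]
```

    Here the monthly grouping leaves March with a single two-word tweet, exactly the kind of sparse distribution the note warns about, whereas groups of three tweets each contain six tokens.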


Acknowledgements

The authors wish to thank Stefano Cresci and colleagues for generously making the dataset available to us. This research is funded in part by the U.S. National Science Foundation (IIS-1636933, ACI-1429160, and IIS-1110868), U.S. Office of Naval Research (N00014-10-1-0091, N00014-14-1-0489, N00014-15-P-1187, N00014-16-1-2016, N00014-16-1-2412, N00014-17-1-2605, N00014-17-1-2675, N00014-19-1-2336), U.S. Air Force Research Lab, U.S. Army Research Office (W911NF-16-1-0189), U.S. Defense Advanced Research Projects Agency (W31P4Q-17-C-0059), Arkansas Research Alliance, and the Jerry L. Maulden/Entergy Endowment at the University of Arkansas at Little Rock. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations. The researchers gratefully acknowledge the support.

Author information

Correspondence to Zachary K. Stine.

About this article

Cite this article

Stine, Z.K., Agarwal, N. Characterizing the language-production dynamics of social media users. Soc. Netw. Anal. Min. 9, 60 (2019) doi:10.1007/s13278-019-0605-7

Keywords

  • Social media
  • Human–computer interaction
  • Natural language processing
  • Computer-mediated communication
  • Information theory