Reading Between the Lines: A Prototype Model for Detecting Twitter Sockpuppet Accounts Using Language-Agnostic Processes

  • Erin Smith Crabb
  • Alan Mishler
  • Susannah Paletz
  • Brook Hefright
  • Ewa Golonka
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 528)


Sockpuppets are online identities controlled by a user or group of users to manipulate the dissemination of information in digital environments. This manipulation can distort computational assessments of public opinion in social media. Using Russian-language Twitter data from the Ukrainian crisis in 2014, we present a proof-of-concept model employing character n-gram methods to detect sockpuppets. Previous research has demonstrated that n-gram authorship attribution methods can capture lexical preferences, including grammatical and orthographic preferences, while also being less computationally intensive than grammatical or compression language models. Additionally, they can be applied to any language data irrespective of orthography. In this study, a Naïve Bayes classifier was constructed using normalized frequencies of parsed character bigrams to contrast author bigram use. As predicted, the resulting model classified suspected sockpuppet accounts correctly less often than other accounts, with lower precision, recall, and F-measure.
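The approach summarized in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation (the reference list suggests WEKA was used); it is a hypothetical, standard-library-only multinomial Naïve Bayes over character bigram counts with add-one smoothing, plus a helper for per-author precision, recall, and F-measure. The names `char_bigrams`, `BigramNaiveBayes`, and `per_author_scores` are assumptions introduced here for illustration, as are the lowercasing step and the smoothing scheme.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Return overlapping character bigrams; no tokenizer or language model needed."""
    text = text.lower()  # assumption: case folding, not stated in the abstract
    return [text[i:i + 2] for i in range(len(text) - 1)]


class BigramNaiveBayes:
    """Illustrative multinomial Naive Bayes over character bigram counts."""

    def fit(self, texts, authors):
        self.counts = defaultdict(Counter)  # author -> bigram counts
        self.docs = Counter(authors)        # author -> number of training texts
        for text, author in zip(texts, authors):
            self.counts[author].update(char_bigrams(text))
        self.vocab = {bg for c in self.counts.values() for bg in c}
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_author, best_score = None, -math.inf
        for author, counts in self.counts.items():
            total = sum(counts.values())
            # log prior plus add-one-smoothed log likelihood of each bigram
            score = math.log(self.docs[author] / total_docs)
            for bg in char_bigrams(text):
                score += math.log((counts[bg] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_author, best_score = author, score
        return best_author


def per_author_scores(true, pred):
    """Return {author: (precision, recall, f_measure)} from parallel label lists."""
    scores = {}
    for a in set(true) | set(pred):
        tp = sum(t == a and p == a for t, p in zip(true, pred))
        fp = sum(t != a and p == a for t, p in zip(true, pred))
        fn = sum(t == a and p != a for t, p in zip(true, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[a] = (precision, recall, f)
    return scores
```

Under this sketch, an account whose held-out tweets are frequently attributed to other authors would show low per-author precision, recall, and F-measure, which is the signal the abstract associates with suspected sockpuppets.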


Sockpuppetry, Authorship attribution, Character n-grams, Public opinion measurement, Social media



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Erin Smith Crabb (1)
  • Alan Mishler (1)
  • Susannah Paletz (1)
  • Brook Hefright (1)
  • Ewa Golonka (1)

  1. University of Maryland, College Park, USA
