Automated Classification of Potentially Insulting Speech Acts on Social Network Sites

  • Conference paper
Digital Transformation and Global Society (DTGS 2021)

Abstract

Insulting speech acts have become a subject of public discussion in the media and on social media, a basis for speculation in political communication, and a working concept in the legal environment. This article explores insulting speech acts on the social network site “VKontakte” with the aim of developing an algorithm for the automatic classification of text data. We conducted a semantic analysis of the text of Article 5.61 of the Code of Administrative Offenses of the Russian Federation, which made it possible to formulate inclusion criteria for formal classification. We applied three common word embedding models (BERT, ELMo, and fastText) to an original Russian-language dataset of 4596 annotated messages perceived as insulting speech acts. The findings show that, even in a specialized dataset, the share of messages that meet the inclusion criteria is negligible. This indicates a low probability of court proceedings for an administrative offense under Article 5.61 arising from speech communication on social network sites, even though such communication is public in nature and is automatically recorded in writing. The machine learning text classifier based on the BERT model showed the best performance.
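Because messages that meet the inclusion criteria are reported to be rare, plain accuracy can look high even for a classifier that never flags anything. The following is a minimal sketch of that point, not the authors' evaluation code: the function name `evaluate_binary`, the `BinaryReport` type, and the toy labels are all illustrative assumptions, and the abstract does not state which metrics the paper actually used.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryReport:
    positive_share: float  # fraction of gold labels that are positive
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate_binary(gold: Sequence[int], predicted: Sequence[int]) -> BinaryReport:
    """Score a binary classifier; label 1 means 'meets the inclusion criteria'."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false alarms
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # missed positives
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)  # true negatives

    # Guard against division by zero when nothing is predicted or nothing is positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return BinaryReport(
        positive_share=(tp + fn) / len(gold),
        accuracy=(tp + tn) / len(gold),
        precision=precision,
        recall=recall,
        f1=f1,
    )


# Made-up toy labels (NOT the paper's data): 2 positives among 20 messages.
toy_gold = [1, 1] + [0] * 18
always_negative = [0] * 20                      # never flags any message
one_hit_one_false_alarm = [1, 0, 1] + [0] * 17  # finds one positive, raises one false alarm

baseline = evaluate_binary(toy_gold, always_negative)
candidate = evaluate_binary(toy_gold, one_hit_one_false_alarm)
```

On these toy labels both classifiers reach the same accuracy, while only the positive-class F1 separates them, which is why a minority-class metric is a reasonable choice when the share of qualifying messages is small.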

Notes

  1. https://huggingface.co/DeepPavlov/rubert-base-cased-conversational

  2. https://huggingface.co/sismetanin/rubert_conversational-ru-sentiment-rusentiment

Funding

This research was supported by the 1st Workshop at the Mathematical Center in Akademgorodok (project No. 26, “Mathematical support for linguistic expertise”, 13 July–14 August 2020), http://mca.nsu.ru/workshopen/. The authors sincerely thank the students of the Engineering School of Novosibirsk State University, especially M.V. Fedorova and E.V. Timofeeva, as well as M.O. Maslova, a student of the Higher School of Economics, all of whom made an invaluable contribution to collecting the dataset and acted as annotators.

Corresponding author

Correspondence to Liliya Komalova.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Komalova, L., Glazkova, A., Morozov, D., Epifanov, R., Motovskikh, L., Mayorova, E. (2022). Automated Classification of Potentially Insulting Speech Acts on Social Network Sites. In: Alexandrov, D.A., et al. Digital Transformation and Global Society. DTGS 2021. Communications in Computer and Information Science, vol 1503. Springer, Cham. https://doi.org/10.1007/978-3-030-93715-7_26

  • DOI: https://doi.org/10.1007/978-3-030-93715-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93714-0

  • Online ISBN: 978-3-030-93715-7

  • eBook Packages: Computer Science (R0)
