Automated Classification of Potentially Insulting Speech Acts on Social Network Sites

Komalova, Liliya; Glazkova, Anna; Morozov, Dmitry; Epifanov, Rostislav; Motovskikh, Leonid; Mayorova, Ekaterina

doi:10.1007/978-3-030-93715-7_26

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1503)

Included in the following conference series:

International Conference on Digital Transformation and Global Society

Abstract

Insulting speech acts have become a subject of public discussion in the media and on social media, a basis for speculation in political communication, and a working concept in the legal environment. This article explores insulting speech acts on the social network site “VKontakte” with the aim of developing an algorithm for the automatic classification of text data. We conducted a semantic analysis of the text of Article 5.61 of the Code of Administrative Offenses of the Russian Federation, which made it possible to formulate inclusion criteria for formal classification. We applied three common word embedding models (BERT, ELMo, and fastText) to an original Russian-language dataset of 4596 annotated messages perceived as insulting speech acts. The overall findings indicate that even in a specialized dataset the share of messages that meet the inclusion criteria is negligible. This suggests a low probability of court proceedings for an administrative offense under Article 5.61 on the basis of speech communication on social network sites, even though such communication is public in nature and is automatically recorded in writing. Among the machine learning text classifiers, the one based on the BERT model showed the best performance.

Funding

This research was supported by the 1st Workshop at the Mathematical Center in Akademgorodok (project No. 26, “Mathematical support for linguistic expertise”, 13 July–14 August 2020), http://mca.nsu.ru/workshopen/. The authors express their sincere gratitude to the students of the Engineering School of Novosibirsk State University, especially M.V. Fedorova and E.V. Timofeeva, as well as to M.O. Maslova, a student of the Higher School of Economics, all of whom made an invaluable contribution to collecting the dataset and acted as annotators.

Author information

Authors and Affiliations

Institute of Scientific Information for Social Sciences of the Russian Academy of Sciences, 51/21 Nakhimovsky Prospect, Moscow, 117418, Russia
Liliya Komalova & Ekaterina Mayorova
Moscow State Linguistic University, 38 Ostozhenka Str., Moscow, 119034, Russia
Liliya Komalova & Leonid Motovskikh
University of Tyumen, 6 Volodarskogo Str., Tyumen, 625003, Russia
Anna Glazkova
Novosibirsk State University, 1 Pirogova Str., Novosibirsk, 630090, Russia
Dmitry Morozov & Rostislav Epifanov

Corresponding author

Correspondence to Liliya Komalova.

Editor information

Editors and Affiliations

National Research University Higher School of Economics, St. Petersburg, Russia
Daniel A. Alexandrov
ITMO University, St. Petersburg, Russia
Alexander V. Boukhanovsky
ITMO University, St. Petersburg, Russia
Andrei V. Chugunov
National Research University Higher School of Economics, St. Petersburg, Russia
Yury Kabanov
National Research University Higher School of Economics, St. Petersburg, Russia
Olessia Koltsova
National Research University Higher School of Economics, St. Petersburg, Russia
Ilya Musabirov
National Research University Higher School of Economics, St. Petersburg, Russia
Sergei Pashakhin

About this paper

Cite this paper

Komalova, L., Glazkova, A., Morozov, D., Epifanov, R., Motovskikh, L., Mayorova, E. (2022). Automated Classification of Potentially Insulting Speech Acts on Social Network Sites. In: Alexandrov, D.A., et al. Digital Transformation and Global Society. DTGS 2021. Communications in Computer and Information Science, vol 1503. Springer, Cham. https://doi.org/10.1007/978-3-030-93715-7_26

DOI: https://doi.org/10.1007/978-3-030-93715-7_26
Published: 25 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93714-0
Online ISBN: 978-3-030-93715-7
eBook Packages: Computer Science, Computer Science (R0)
