Detecting Interethnic Relations with the Data from Social Media

  • Olessia Koltsova (corresponding author)
  • Sergey Nikolenko
  • Svetlana Alexeeva
  • Oleg Nagornyy
  • Sergei Koltcov
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 745)


The ability of social media to rapidly disseminate judgements on ethnicity and to influence offline ethnic relations creates demand for methods of automatically monitoring ethnicity-related online content. In this study we seek to measure the overall volume of ethnicity-related discussion in Russian-language social media and to develop an approach that automatically detects various aspects of attitudes toward ethnic groups. We develop a comprehensive list of ethnonyms and related bigrams covering 97 post-Soviet ethnic groups and collect all messages containing at least one of those terms over a two-year period from all Russian-language social media (N = 2,660,222 texts). We hand-code 7,181 messages in which rare ethnicities are overrepresented and train a number of classifiers to recognize different aspects of authors’ attitudes and other text features. Using a number of standard quality metrics, we find that the classifiers reach good quality in detecting intergroup conflict, positive intergroup contact, and overall negative and positive sentiment. Relevance to the topic of ethnicity and general attitude to an ethnic group are predicted least well, while some aspects, such as calls for violence against an ethnic group, are not sufficiently present in the data to be predicted.
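As a rough illustration of two steps mentioned in the abstract (selecting messages by a list of ethnonyms, and scoring predictions with standard quality metrics), the following minimal sketch may help. It is not the authors' implementation: the tiny English lexicon, the sample messages, and the function names `mentions_ethnonym` and `precision_recall_f1` are all hypothetical, the matching covers single words only (not the bigrams the paper also uses), and precision, recall, and F1 are assumed here as typical examples of "standard quality metrics".

```python
import re

# Hypothetical toy lexicon; the paper's actual list covers 97 groups
# (plus related bigrams) and is not reproduced here.
LEXICON = {"armenian", "armenians", "kazakh", "kazakhs", "tatar", "tatars"}

TOKEN_RE = re.compile(r"\w+")


def mentions_ethnonym(text, lexicon=LEXICON):
    """Return True if any lowercased word token of `text` is in `lexicon`."""
    return any(tok.lower() in lexicon for tok in TOKEN_RE.findall(text))


def precision_recall_f1(gold, predicted):
    """Binary precision, recall, and F1 for parallel lists of 0/1 labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical messages: keep only those that mention a listed ethnonym.
messages = [
    "The Tatars held a spring festival in the city.",
    "Traffic was terrible this morning.",
    "My Kazakh neighbour helped me move.",
]
selected = [m for m in messages if mentions_ethnonym(m)]

# Hypothetical hand-coded labels versus classifier output for one aspect.
gold = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

In practice a lexicon filter for Russian would likely need lemmatization or stemming to cover inflected forms, which this sketch leaves out.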


Keywords: Interethnic relations, Ethnic attitudes, Mapping, Social media, Classification, Lexicon



This work was done at the Laboratory for Internet Studies, National Research University Higher School of Economics (NRU HSE), Russia. It was supported by the Russian Research Foundation grant no. 15-18-00091.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Olessia Koltsova (1), corresponding author
  • Sergey Nikolenko (1, 2)
  • Svetlana Alexeeva (1, 3)
  • Oleg Nagornyy (1)
  • Sergei Koltcov (1)
  1. National Research University Higher School of Economics, Moscow, Russia
  2. Steklov Mathematical Institute, St. Petersburg, Russia
  3. St. Petersburg State University, St. Petersburg, Russia
