Like Trainer, Like Bot? Inheritance of Bias in Algorithmic Content Moderation

  • Reuben Binns
  • Michael Veale
  • Max Van Kleek
  • Nigel Shadbolt
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10540)


The internet has become a central medium through which ‘networked publics’ express their opinions and engage in debate. Offensive comments and personal attacks can inhibit participation in these spaces. Automated content moderation aims to overcome this problem using machine learning classifiers trained on large corpora of texts manually annotated for offence. While such systems could help encourage more civil debate, they must navigate inherently normatively contestable boundaries, and are subject to the idiosyncratic norms of the human raters who provide the training data. An important objective for platforms implementing such measures might be to ensure that they are not unduly biased towards or against particular norms of offence. This paper provides some exploratory methods by which the normative biases of algorithmic content moderation systems can be measured, by way of a case study using an existing dataset of comments labelled for offence. We train classifiers on comments labelled by different demographic subsets (men and women) to understand how differences in conceptions of offence between these groups might affect the performance of the resulting models on various test sets. We conclude by discussing some of the ethical choices facing the implementers of algorithmic moderation systems, given various desired levels of diversity of viewpoints amongst discussion participants.
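The train-and-compare procedure the abstract describes — fitting a separate classifier to each annotator group's labels, then evaluating each model against both groups' labels — can be sketched roughly as follows. This is a minimal illustration on invented toy data, using a small stdlib Naive Bayes stand-in; it is not the paper's actual pipeline, dataset, or model, and the comment texts and group labels are hypothetical:

```python
import math
from collections import Counter

# Hypothetical toy data: (comment, (group_A_label, group_B_label)).
# 1 = offensive, 0 = not; the last two rows are borderline cases on
# which the two annotator groups disagree.
comments = [
    ("you are an idiot",        (1, 1)),
    ("great point thanks",      (0, 0)),
    ("what a stupid take",      (1, 1)),
    ("i disagree respectfully", (0, 0)),
    ("this joke is crude",      (1, 0)),
    ("nice crude humour there", (1, 0)),
]

def train(data, group):
    """Fit a tiny multinomial Naive Bayes model on one group's labels."""
    counts = {0: Counter(), 1: Counter()}
    docs = {0: 0, 1: 0}
    for text, labels in data:
        y = labels[group]
        counts[y].update(text.split())
        docs[y] += 1
    vocab = {w for c in counts.values() for w in c}
    return counts, docs, vocab

def predict(model, text):
    """Pick the class with the highest smoothed log-likelihood."""
    counts, docs, vocab = model
    def log_prob(c):
        total = sum(counts[c].values())
        lp = math.log(docs[c] / sum(docs.values()))  # class prior
        for w in text.split():
            # Add-one (Laplace) smoothing over the shared vocabulary.
            lp += math.log((counts[c][w] + 1) / (total + len(vocab)))
        return lp
    return max((0, 1), key=log_prob)

def accuracy(model, data, group):
    """Agreement between a model's predictions and one group's labels."""
    return sum(predict(model, t) == labels[group]
               for t, labels in data) / len(data)

model_a = train(comments, 0)           # trained on group A's labels
print(accuracy(model_a, comments, 0))  # in-group agreement
print(accuracy(model_a, comments, 1))  # drops on the borderline comments
```

On this toy data the model trained on group A's labels agrees perfectly with group A but less well with group B, since the two groups label the borderline comments differently; the paper performs the analogous comparison at scale using demographic subsets (men and women) of the real annotators.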


Keywords: Algorithmic accountability · Machine learning · Online abuse · Discussion platforms · Freedom of speech



Acknowledgements

Authors at the University of Oxford were supported under SOCIAM: The Theory and Practice of Social Machines, funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/J017728/2. Michael Veale was supported by EPSRC grant number EP/M507970/1. The UCL Legion High Performance Computing Facility (Legion@UCL) supported part of the analysis. Thanks additionally go to three anonymous reviewers for their helpful comments.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Reuben Binns (1)
  • Michael Veale (2)
  • Max Van Kleek (1)
  • Nigel Shadbolt (1)
  1. Department of Computer Science, University of Oxford, Oxford, UK
  2. Department of Science, Technology, Engineering and Public Policy (STEaPP), University College London, London, UK
