Reduced-Bias Co-trained Ensembles for Weakly Supervised Cyberbullying Detection

  • Elaheh Raisi
  • Bert Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11917)


Social media reflects many aspects of society, including social biases against individuals based on sensitive characteristics such as gender, race, religion, physical ability, and sexual orientation. Machine learning algorithms trained on social media data may therefore perpetuate or amplify discriminatory attitudes against various demographic groups, causing unfair decision-making. One important application for machine learning is the automatic detection of cyberbullying. Biases in this context could take the form of bullying detectors that make false detections more frequently on messages by or about certain identity groups. In this paper, we present an approach for training bullying detectors from weak supervision while reducing the degree to which learned models reflect or amplify discriminatory biases in the data. Our goal is to decrease the sensitivity of models to language describing particular social groups. An ideal, fair language-based detector should treat language describing subpopulations of particular social groups equitably. Building on a previously proposed weakly supervised learning algorithm, we penalize the model when discrimination is observed. By penalizing unfairness, we encourage the learning algorithm to avoid unfair behavior in its predictions and achieve equitable treatment for protected subpopulations. We introduce two unfairness penalty terms: one aimed at removal fairness and another at substitutional fairness. We quantitatively and qualitatively evaluate the resulting models’ fairness on a synthetic benchmark and on data from Twitter, comparing against crowdsourced annotations.
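The substitutional-fairness idea described above can be made concrete with a small sketch: a fair detector should assign (near-)identical scores to messages that differ only in which identity group they mention, so a penalty can be charged whenever substituting one identity term for another changes the score. The following Python sketch is illustrative only, not the paper's implementation; the toy bag-of-words scorer, the weight dictionaries, and the identity-term list are all assumptions for demonstration.

```python
# Illustrative sketch of a substitutional-fairness penalty (not the
# paper's implementation). The identity-term list and toy scorer are
# assumptions made for this example.

IDENTITY_TERMS = ["women", "men", "muslims", "christians"]  # assumed example list

def toy_score(message, weights):
    """Toy bag-of-words bullying score: sum of per-word weights."""
    return sum(weights.get(word, 0.0) for word in message.lower().split())

def substitutional_penalty(message, weights, terms=IDENTITY_TERMS):
    """Penalty = largest score gap over all identity-term substitutions.

    A fair scorer gives the same score no matter which social group is
    mentioned, so the gap (and hence the penalty) is zero.
    """
    words = message.lower().split()
    if not any(w in terms for w in words):
        return 0.0  # no identity term present; nothing to substitute
    scores = []
    for t in terms:
        swapped = " ".join(t if w in terms else w for w in words)
        scores.append(toy_score(swapped, weights))
    return max(scores) - min(scores)

# A biased scorer weights one identity term as if it were itself abusive;
# a fair scorer attributes the score only to the abusive word.
biased = {"stupid": 1.0, "women": 0.8}
fair = {"stupid": 1.0}

print(substitutional_penalty("you stupid women", biased))  # large gap: penalized
print(substitutional_penalty("you stupid women", fair))    # zero gap: no penalty
```

During training, a term like this would be added to the detector's loss so that gradient updates push the model toward group-invariant scores; the removal-fairness penalty is analogous but compares the score with identity terms deleted rather than substituted.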


Keywords: Cyberbullying detection · Social media · Weakly supervised machine learning · Co-trained ensemble · Fairness in machine learning · Embedding models



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Virginia Tech, Blacksburg, USA
