The Other Side of Compression: Measuring Bias in Pruned Transformers

  • Conference paper
Advances in Intelligent Data Analysis XXI (IDA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13876)


Social media platforms have become popular worldwide. Online discussion forums attract users because they are easy to access, allow free speech, and make communication simple. Such communication also has negative aspects, including hostile and hateful language. While fast and effective solutions for detecting inappropriate language online are constantly being developed, little research has examined bias in the compressed language models that are now in common use. In this work, we evaluate bias in compressed models trained on Gab and Twitter speech data and estimate to what extent these pruned models capture the relevant context when classifying input text as hateful, offensive, or neutral. Our experiments show that transformer-based encoders with 70% or fewer preserved weights are prone to gender, racial, and religious identity-based bias, even when the performance loss is insignificant. To counter bias amplification, we propose a supervised attention mechanism that uses ground-truth per-token hate speech annotations. The proposed method allows pruning BERT, RoBERTa, and their distilled versions up to 50% while preserving 90% of their initial performance according to bias and plausibility scores.

  1. The implementation of the experiments can be found at

  2. In our work, we use token-wise and word-level supervision interchangeably.

  3. That token is used for classification in Transformer LMs.



This work was funded by the ANR project Dikè (grant number ANR-21-CE23-0026-02).

Author information

Corresponding author

Correspondence to Irina Proskurina.

Fig. 3. Community-wise Subgroup AUC scores on the HateXplain test set. \(r*\) = set of bottom removed layers.
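
As a rough illustration of the metric named in this caption, the sketch below computes a Subgroup AUC in plain Python: the ROC AUC restricted to the examples that mention a given community. It is not the authors' evaluation code. The function names, the binary toxic/normal labelling, and the toy data are all assumptions made for the example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auc(
    labels: Sequence[int],
    scores: Sequence[float],
    mentions_group: Sequence[bool],
) -> float:
    """AUC computed only on the examples that mention the identity group."""
    kept = [i for i, m in enumerate(mentions_group) if m]
    return roc_auc([labels[i] for i in kept], [scores[i] for i in kept])


# Toy data, made up for illustration: 1 = toxic, 0 = normal (assumed binarization).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1]
mentions_group = [True, True, True, True, False, False]

print(subgroup_auc(labels, scores, mentions_group))  # 0.75
print(roc_auc(labels, scores))
```

In this toy case the score on the subgroup is lower than the score on the full set, which is the kind of gap a community-wise comparison is meant to expose.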

Table 3. Performance and fairness scores (Subgroup AUC) of models trained with word-level supervision. The numbers in parentheses represent the ratio of the layers preserved when pruning bottom layers. \(\lambda =0\) stands for non-supervised attention learning.
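
One way to read the role of \(\lambda\) in this caption is as the weight of an attention term added to the classification loss. The sketch below assumes that form: a cross-entropy between the attention that the classification token assigns to each token and the normalized per-token annotation, scaled by `lam`. The exact loss used in the paper is not given on this page, so the function name, its arguments, and the choice of cross-entropy are assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_attention_loss(
    class_logits: Sequence[float],
    label: int,
    cls_attention: Sequence[float],
    token_rationale: Sequence[int],
    lam: float,
    eps: float = 1e-12,
) -> float:
    """Classification loss plus lam times an attention-supervision term (assumed form)."""
    cls_loss = -math.log(softmax(class_logits)[label])
    marked = sum(token_rationale)
    # lam = 0 corresponds to non-supervised attention learning;
    # examples without annotated tokens contribute no attention term.
    if lam == 0 or marked == 0:
        return cls_loss
    target = [r / marked for r in token_rationale]
    att_loss = -sum(t * math.log(a + eps) for t, a in zip(target, cls_attention))
    return cls_loss + lam * att_loss


# Toy example, made up for illustration: tokens 1 and 2 are annotated as hateful.
logits = [2.0, 0.5, -1.0]
rationale = [0, 1, 1, 0]
focused = [0.05, 0.45, 0.45, 0.05]   # attention mostly on the annotated tokens
diffuse = [0.40, 0.10, 0.10, 0.40]   # attention mostly elsewhere

print(supervised_attention_loss(logits, 0, focused, rationale, lam=0.0))
print(supervised_attention_loss(logits, 0, focused, rationale, lam=0.5))
print(supervised_attention_loss(logits, 0, diffuse, rationale, lam=0.5))
```

Under these assumptions, attention that ignores the annotated tokens is penalized more heavily than attention that covers them, and setting `lam=0` recovers the plain classification loss.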

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Proskurina, I., Metzler, G., Velcin, J. (2023). The Other Side of Compression: Measuring Bias in Pruned Transformers. In: Crémilleux, B., Hess, S., Nijssen, S. (eds) Advances in Intelligent Data Analysis XXI. IDA 2023. Lecture Notes in Computer Science, vol 13876. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30046-2

  • Online ISBN: 978-3-031-30047-9

  • eBook Packages: Computer Science, Computer Science (R0)
