Impact of Gender Debiased Word Embeddings in Language Modeling

Basta, Christine; Costa-jussà, Marta R.

doi:10.1007/978-3-031-24337-0_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13451)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

Gender, race and social biases have recently been detected as evident examples of unfairness in applications of Natural Language Processing. A key path towards fairness is to understand, analyse and interpret our data and algorithms. Recent studies have shown that the human-generated data used in training is an apparent source of these biases. In addition, current algorithms have been shown to amplify the biases present in the data.

To further address these concerns, in this paper we study how a state-of-the-art recurrent neural language model behaves when trained on data that under-represents females, using pre-trained standard and debiased word embeddings. Results show that language models trained on unbalanced data inherit higher bias when using pre-trained embeddings than when using embeddings trained within the task. Moreover, results show that, on the same data, language models inherit lower bias when using debiased pre-trained embeddings than when using standard pre-trained embeddings.
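As a rough illustration of what "debiased" embeddings can mean, the sketch below removes the component of gender-neutral word vectors along a gender direction, in the spirit of the neutralize step of hard debiasing (Bolukbasi et al., 2016). The abstract does not state which debiasing procedure was used, so this is an assumption made for illustration only. The toy vectors, the word list, and the helper names `neutralize` and `bias_score` are hypothetical, and the gender direction is taken from a single word pair rather than estimated from several pairs.

```python
import numpy as np


def neutralize(vectors, direction):
    """Remove from each row of `vectors` its component along `direction`."""
    g = direction / np.linalg.norm(direction)  # unit gender direction
    return vectors - np.outer(vectors @ g, g)


def bias_score(vector, direction):
    """Signed length of the projection of `vector` onto `direction`."""
    g = direction / np.linalg.norm(direction)
    return float(vector @ g)


# Toy 3-dimensional embeddings with made-up values (not from the paper).
emb = {
    "he": np.array([1.0, 0.2, 0.0]),
    "she": np.array([-1.0, 0.2, 0.0]),
    "nurse": np.array([-0.4, 0.5, 0.7]),
    "engineer": np.array([0.5, 0.6, 0.6]),
}

# Simplification: a single definitional pair defines the gender direction.
gender_direction = emb["he"] - emb["she"]

# Words assumed (for this sketch) to be gender-neutral.
neutral_words = ["nurse", "engineer"]
matrix = np.stack([emb[w] for w in neutral_words])
debiased = neutralize(matrix, gender_direction)

for word, before, after in zip(neutral_words, matrix, debiased):
    print(word,
          round(bias_score(before, gender_direction), 3),
          round(bias_score(after, gender_direction), 3))
```

After the projection, the neutral words have no component along the chosen direction, while their remaining coordinates are unchanged. A matrix like `debiased` could then be used to initialise the embedding layer of a language model in place of the standard pre-trained vectors.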

Acknowledgments

This work is supported in part by the AGAUR through the FI PhD Scholarship; the Spanish Ministerio de Economía y Competitividad, the European Regional Development Fund and the Agencia Estatal de Investigación, through the postdoctoral senior grant Ramón y Cajal, the contract TEC2015-69266-P (MINECO/FEDER,EU) and the contract PCIN-2017-079 (AEI/MINECO).

Author information

Authors and Affiliations

TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
Christine Basta & Marta R. Costa-jussà
Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt
Christine Basta

Corresponding author

Correspondence to Christine Basta.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Basta, C., Costa-jussà, M.R. (2023). Impact of Gender Debiased Word Embeddings in Language Modeling. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_25

DOI: https://doi.org/10.1007/978-3-031-24337-0_25
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science, Computer Science (R0)
