1 Introduction

Language models are used for a variety of applications, such as CV parsing for a job position or document ranking for web search [5, 33]. Recently, a big step forward in the field of natural language processing (NLP) was the introduction of language models based on word embeddings, i.e. representations of words as vectors in a multi-dimensional space. These models translate the semantics of words into geometric properties, so that terms with similar meanings tend to have their vectors close to each other, and the difference between two embeddings represents the relationship between their respective words [40]. For instance, it is possible to retrieve the analogy \(man: king = woman: queen\) because the difference vectors \(\overrightarrow{queen} - \overrightarrow{king}\) and \(\overrightarrow{woman} - \overrightarrow{man}\) share approximately the same direction.

Word embeddings boosted results in many NLP tasks, like sentiment analysis and question answering. However, despite the growing hype around them, these models have been shown to reflect the stereotypes of our society, even when the training phase is performed over text corpora written by professionals, such as news articles. For instance, they return sexist analogies like \(man: programmer = woman: homemaker\) [7]. The social bias in the geometry of the model is then inevitably reflected in downstream applications like web search, CV parsing or hate speech detection [3, 7, 43]. In turn, this phenomenon favours the spread of prejudice towards social categories that are already frequently penalised, such as women or African Americans.

Lately, sentence embeddings—vector representations of sentences based on word embeddings—have been increasing in popularity, achieving exceptional results in many language understanding tasks, such as semantic similarity or sentiment prediction [17, 49]. Training language models on large corpora, which often encapsulate historical bias in the form of social stereotypes, risks reinforcing the bias originally present in our society; as a result, training datasets should be adjusted to remove bias [54]. Therefore, it is of the utmost importance to expand research on how sentence embedding encoders internalise the semantics of natural languages. An important step in this direction is to define metrics that are able to reflect and quantify social bias in sentence encoders. Furthermore, studying and limiting the causes and consequences of bias in language models is an extremely important task [4, 6].

This work expands research on social bias in embedding-based models, focusing specifically on gender bias in sentence representations. First, we propose a method to estimate gender bias in sentence embeddings, highlighting the correlation between bias and stereotypical concepts in the sentence. Our solution, named bias score, is highly flexible and designed to be easily adapted to both different kinds of social biases (e.g. ethnic, religious) and various sentence encoders. Moreover, since gender bias is determined by the internalisation of stereotypical associations in language models, bias score makes it possible to identify the stereotyped sentences that are responsible for increasing gender bias in the output embeddings encoded by the model. Therefore, in the second part of the paper, we leverage bias score to retrieve the most stereotyped sentences from the Stanford Natural Language Inference corpus (SNLI) [9], a large text corpus suitable for training general-purpose sentence encoders, such as those proposed by [17] and [13]. We then outline two approaches to make SNLI fairer: removing entries associated with the highest bias scores, and performing data augmentation by compensating stereotyped sentences with their gender-swapped counterparts. Finally, we retrain a BiLSTM sentence encoder [17] on different, fairer versions of SNLI, testing and comparing it with its original counterpart from both the fairness and accuracy viewpoints in downstream tasks.

Our contributions in this work include: (a) a novel metric to estimate gender bias in sentence embeddings, leveraging the semantic importance of words and previous research on bias in word embeddings; (b) two methods to mitigate gender bias in sentence encoders by improving training data, performing data subtraction and data augmentation, respectively; (c) an analysis of the effect of such mitigation actions when retraining a BiLSTM sentence embedding encoder, with a comparison with traditional methods for gender bias mitigation; (d) a demonstration of the flexibility of our approach to be adapted to other language models, such as those based on transformer architectures.

The rest of the paper is structured as follows. Section 2 explores the state of the art on bias identification and reduction in language models, focusing on word and sentence embeddings. Section 3 introduces and defines bias score, our new metric for estimating gender bias in sentence representations. At the end of the section, we provide some examples of gender bias estimation via bias score. Section 4 first describes how to leverage bias score to make text corpora fairer, then explores a new approach to reduce bias in sentence encoders by retraining them on improved versions of their training data. Section 5 shows the results of our bias reduction methodology, discussing the benefits of the procedure from both the perspectives of quality and fairness. Section 6 describes how to extend our solution to transformer-based sentence encoders. Finally, Sect. 7 concludes the paper and outlines future work.

2 Related Work

Although language models are successfully used in a variety of applications, bias and fairness in NLP have received relatively little consideration until recent times, running the risk of favouring prejudice and strengthening stereotypes [14].

Static word embeddings were the first to be analysed. In 2016, they were shown to exhibit the so-called gender bias, defined as the cosine of the angle between the word embedding of a gender-neutral word and a one-dimensional subspace representing gender [7]. This approach was later adapted for non-binary social biases such as racial and religious bias [34]. A debiasing algorithm was also proposed to mitigate gender bias in word embeddings [7]; however, it was later shown that it fails to entirely capture and remove bias [24]. The Word Embedding Association Test (WEAT) [11] was created to measure bias in word embeddings following the pattern of the implicit-association test for humans. WEAT demonstrated the presence of harmful associations in GloVe [46] and word2vec [38, 39] embeddings.

Recently, a number of different approaches have extended the research field. A new debiasing procedure was proposed to reduce gender bias by introducing a term into the loss function used during the training phase of the model [48]. Additionally, [8] presented a regularisation procedure that aims at debiasing a language model by minimising the projection of encoder-trained embeddings onto a subspace that encodes gender. Similarly, [59] used model compression techniques, a type of regularisation technique, to reduce toxicity and bias originally present in generative language models. The system proposed by [32] mitigates bias by employing counterfactual data augmentation, proving that modifying the training data works better than changing the actual geometry of the embeddings. On a similar note, the approach described by [10] perturbs the original embedding training data to reduce the overall bias present in them. [27] presented a method to preserve gender-related information in feminine and masculine words while removing bias from stereotypical words. Still using GloVe as the language model, [56] described an innovative procedure called Double-Hard Debias to cope with changes in word frequency statistics that commonly have an undesirable impact on standard debiasing methods. Finally, [60] described a novel method exploiting causal inference to reduce not only gender bias associated with a gender direction, but also gender bias stemming from word embedding relations.

More recently, contextualised word embeddings like BERT [18] proved to be very accurate language models. However, despite the literature suggesting that they are generally less biased compared to their static counterparts [2], they still display a significant amount of social bias [36]. WEAT was extended to measure bias in sentence embedding encoders: the Sentence Encoder Association Test (SEAT) is again based on the evaluation of implicit associations and shows that modern sentence embeddings also exhibit social bias [36]. Meanwhile, attempts at debiasing sentence embeddings faced the issue of not being able to recognise neutral sentences, thus debiasing every representation regardless of the gender attributes in the original natural language sentence, leading to a loss of correct semantics [30].

Recently, [61] suggested the generation of implicit gender bias samples at sentence-level, which, along with a novel metric, can be used to accurately measure gender bias on contextualised embeddings. [28] proposed a fine-tuning method for debiasing word embeddings that can be applied to any pre-trained language model. Additionally, researchers have started working on generative transformer models. For instance, [25] proposed to mitigate gender disparity in text generation by learning a fair model with knowledge distillation. Last but not least, two comprehensive survey papers highlighted the latest advances on this front: [45] presents an overview of the most common debiasing methods in the context of vision and language research, while [37] proposes a deep empirical analysis of several bias mitigation techniques with different language models.

3 Gender Bias Estimation

Gender bias in word embeddings is typically estimated by computing the cosine similarity between word vectors and a gender direction identified in the vector space [7]. Cosine similarity is a popular metric to establish the semantic similarity of words based on the angle \(\theta\) between their embedding vectors \(\vec{u}\) and \(\vec{v}\): \(\cos(\theta) = \frac{\vec{u} \cdot \vec{v}}{\Vert \vec{u} \Vert \, \Vert \vec{v} \Vert}\). The closer \(\cos(\theta)\) is to 1, the higher the similarity between \(\vec{u}\) and \(\vec{v}\). In word embedding models, semantic similarity with respect to a gender direction (typically computed with PCA from multiple gender words) means that a word vector contains information about gender. Since only gender-neutral words can be biased, gender words like man or woman are assumed to contain correct gender information.
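As a minimal sketch (not the authors' code), assuming GloVe vectors are available as a dictionary of NumPy arrays and the gender direction \(\overrightarrow{g}\) of Sect. 3.2 has been computed, the word-level measurement can be written as follows:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def word_gender_bias(word: str, embeddings: dict, g: np.ndarray) -> float:
    """Signed gender bias of a gender-neutral word: positive values lean
    towards the female end of the gender direction, negative values
    towards the male end (sign convention of Sect. 3.2)."""
    return cosine(embeddings[word], g)
```

Under this sign convention, a word like nurse is expected to yield a positive score and a word like programmer a negative one, consistent with the stereotypes discussed above.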

Recently, sentence representations have become increasingly popular, but the approach used for measuring gender bias in word-level representations cannot be easily adopted for them, demanding a new methodology. The main problem is that gender-neutral sentences cannot be identified and listed: unlike words, sentences are infinite in number. Moreover, a sentence may contain gender bias even when it carries explicit gender information. Consider the sentence my mother is a nurse: the word mother contains correct gender semantics, but the word nurse is female-stereotyped. Table 1 shows that representations of gender-neutral sentences like someone is a nurse still contain a lot of “false” gender information due to the bias associated with the word nurse.

Table 1 Gender information from cosine similarity for sentence embeddings encoded by InferSent [17] and SBERT [49]

Therefore, it is important to distinguish between the amount of encoded gender information coming from gender words, and the amount coming from biased words. For this reason, we adopt a more dynamic approach: we start at the word level, using the cosine similarity between neutral word representations and the gender direction to estimate word-level gender bias. Then, we sum the bias of all the words in the sentence, normalising it with respect to the length of the sentence and to the contextualised semantic importance of each word. This decision is grounded on two observations: first, the semantics of a sentence largely depends on the semantics of the words contained in it; second, sentence embedding encoders are generally based on predefined word embedding models [17, 49]. In Sect. 3.5, we test our metric on sentence representations produced by InferSent [17], a sentence encoder by Facebook AI that achieved great results in a variety of natural language understanding tasks [16]. Since InferSent is based on GloVe [46] word vectors, we adopt GloVe representations to first quantify gender bias at word level.

3.1 Bias Score

An overview of the approach adopted to calculate bias score is illustrated in Fig. 1. We consider as inputs a sentence in natural language, n pairs of gender words, and a list of words with explicit gender connotation. The outputs are two estimates corresponding to the amount of female-related and male-related gender bias at sentence level. In particular, we consider four fundamental elements for gender bias estimation, representing the inputs to the bottom-right block in Fig. 1: (a) the gender direction \(\overrightarrow{g}\) previously identified in the vector space; (b) a list L of gender words in the same language as the encoder (here, English); (c) the semantic importance \(I_w\) of each word in the sentence according to the encoder; (d) the word-level embeddings of the input sentence.

Fig. 1 Overview of the framework to compute bias score

The amount of female-related and male-related gender bias is represented by a positive and a negative value, respectively, obtained from the sum of the gender bias associated with each word, estimated from the cosine similarity with respect to the gender direction. Since gender bias is a characteristic of gender-neutral words, gendered terms are excluded from the computation and their bias is always set to zero. In fact, if the bias score of gender words were considered, it would create an additive term, i.e. an offset in the final score, which might hide the real bias in the geometry of the sentence representation. In detail, for each neutral word w in the sentence, we compute its gender bias as the cosine similarity between its word vector \(vec_w\) and the gender direction \(\overrightarrow{g}\), and then we multiply it by the word importance \(I_w\). In particular, for a given sentence s:

$$\begin{aligned} BiasScore_{F}(s) \, = \, \sum _{\begin{array}{c} w \in s\\ w \notin L \end{array}} \; \underbrace{\cos (vec_w, \overrightarrow{g})}_{> 0} \; \times I_w \end{aligned}$$
(1)
$$\begin{aligned} BiasScore_{M}(s) \, = \, \sum _{\begin{array}{c} w \in s\\ w \notin L \end{array}} \; \underbrace{\cos (vec_w, \overrightarrow{g})}_{< 0} \; \times I_w \end{aligned}$$
(2)

Notice that, for each word w that is gender-neutral, \(w \notin L\). Also, word importance \(I_w\) is always a positive number, and the cosine similarity can be either positive or negative. Therefore, bias score keeps the estimations of gender bias towards the male and female directions separated. A slightly different approach allows us to derive a single value estimation of gender bias at sentence level, by computing the absolute value of each word-level bias:

$$\begin{aligned} Abs\text {-}BiasScore(s) \, = \, \sum _{\begin{array}{c} w \in s \\ w \notin L \end{array}} \, \mid \, \underbrace{\cos (vec_w, \overrightarrow{g}) \times I_w}_{\textit{word-level bias}} \, \mid \, \end{aligned}$$
(3)

This proves useful in certain situations, such as when sorting multiple sentences according to the total amount of associated bias score. In the following sections, we go into more detail by describing how we derive \(\overrightarrow{g}\), L and \(I_w\).
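A sketch of Eqs. 1–3 in code (ours, not the authors'), assuming the word vectors, the gender direction \(\overrightarrow{g}\), the gender word list L and the percentage importance \(I_w\) are available as derived in the following subsections:

```python
import numpy as np

def bias_scores(tokens, word_vectors, importance, L, g):
    """Return (BiasScore_F, BiasScore_M, Abs-BiasScore) for a tokenised sentence.

    tokens:       words of the sentence
    word_vectors: dict word -> embedding vector
    importance:   dict word -> percentage importance I_w (Sect. 3.4)
    L:            set of gender words (Sect. 3.3), whose bias is fixed to zero
    g:            gender direction (Sect. 3.2)
    """
    female, male, absolute = 0.0, 0.0, 0.0
    for w in tokens:
        if w in L or w not in word_vectors:
            continue                                  # skip gender words and OOV tokens
        v = word_vectors[w]
        cos = float(np.dot(v, g) / (np.linalg.norm(v) * np.linalg.norm(g)))
        word_bias = cos * importance.get(w, 0.0)      # word-level bias
        if word_bias > 0:
            female += word_bias                       # Eq. 1: female direction
        else:
            male += word_bias                         # Eq. 2: male direction
        absolute += abs(word_bias)                    # Eq. 3: absolute bias score
    return female, male, absolute
```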

3.2 Gender Direction

The first step of our method is to identify in the vector space a single dimension comprising the majority of the gender semantics of the model. The resulting dimension \(\overrightarrow{g}\), named gender direction, serves as the second term in the cosine similarity function, to establish the amount of gender semantics encoded in a vector for a given word, according to the model being analysed. We mainly test bias score on the sentence encoder InferSent [17], which is based on GloVe [46], a word embedding model with a vector space of 300 dimensions. In general, it is important to adopt word embedding models matched to the encoder under analysis. For instance, SBERT produces sentence representations based on word-level BERT embeddings [49]. See Sect. 6 for an example of bias score with SBERT.

Inside the vector space, the difference between two embeddings returns the direction that connects them. In the case of the embeddings \(\overrightarrow{she}\) and \(\overrightarrow{he}\), their difference vector \(\overrightarrow{she} - \overrightarrow{he}\) represents a one-dimensional subspace that identifies gender in GloVe. However, the difference vector \(\overrightarrow{woman} - \overrightarrow{man}\) also identifies gender, yet it represents a slightly different subspace than \(\overrightarrow{she} - \overrightarrow{he}\). Therefore, to avoid inconsistency, we take into consideration several pairs of gender words and perform a Principal Component Analysis (PCA) to reduce their dimensionality to one. In particular, we select the following ten pairs of gender words: woman–man, girl–boy, she–he, mother–father, daughter–son, gal–guy, female–male, her–his, herself–himself, Mary–John.
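A sketch of this PCA step with scikit-learn; it operates on the pair difference vectors, which is one common variant (the original hard-debiasing work instead centres each pair before PCA, yielding a very similar direction):

```python
import numpy as np
from sklearn.decomposition import PCA

GENDER_PAIRS = [("woman", "man"), ("girl", "boy"), ("she", "he"),
                ("mother", "father"), ("daughter", "son"), ("gal", "guy"),
                ("female", "male"), ("her", "his"),
                ("herself", "himself"), ("Mary", "John")]

def gender_direction(word_vectors):
    """Top principal component of the gender-pair difference vectors."""
    diffs = np.stack([word_vectors[f] - word_vectors[m] for f, m in GENDER_PAIRS])
    pca = PCA(n_components=1).fit(diffs)
    print("explained variance:", pca.explained_variance_ratio_[0])  # ~0.58 reported in Fig. 2a
    g = pca.components_[0]
    return g / np.linalg.norm(g)
```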

As shown in Fig. 2a, the top component resulting from the analysis is significantly more important than the other components, explaining 58% of the variance. We use this top component as gender direction, and we observe that embeddings of female words have a positive cosine with respect to it, whereas for male words we have a negative cosine.

Following the advice from [21], and to assess the quality of the gender direction obtained, we further perform PCA starting from an extended list of 50 pairs of gender words taken from [7], and compare the result with \(\overrightarrow{g}\). From the full list of pairs available in the authors' repository,Footnote 1 we select only those consisting of words present in GloVe. Figure 2b shows that again the top component is clearly the most important, explaining 41% of the variance. Moreover, its cosine similarity with respect to \(\overrightarrow{g}\) is above 93%, demonstrating the quality of the gender direction selected.

Fig. 2 Top ten components in PCA from using 10 and 50 pairs of gender words. The top component explains 58% and 41% of the variance, respectively

3.3 Gender Words

A list L of gender words is fundamental to estimate gender bias, because only gender-neutral entities can be biased. Since the subset \({\mathcal {N}}\) of gender-neutral words in the vocabulary of a language is very large, while the subset \({\mathcal {G}}\) of gender words is relatively small (especially in the case of the English language), we derive \({\mathcal {N}}\) as the difference between the complete vocabulary of the language \({\mathcal {V}}\) and the subset \({\mathcal {G}}\) of gender words: \({\mathcal {N}} = {\mathcal {V}} \setminus {\mathcal {G}}\). To achieve this, we define a list L of words containing as many of the elements of the subset \({\mathcal {G}}\) as possible. Therefore, gender bias is estimated for all elements \(w_n\) in the subset \({\mathcal {N}}\) (neutral words), whereas for all elements \(w_g\) in the subset \({\mathcal {G}}\) (gender words) the gender bias is always set to zero:

$$\begin{aligned} \forall \, w_n \in {\mathcal {N}}, \; bias(w_n) \ne 0, \\ \forall \, w_g \in {\mathcal {G}}, \; bias(w_g) = 0. \end{aligned}$$

For this reason, the elements of L are never considered when estimating gender bias. As a matter of fact, we consider the gender information encoded in their word embeddings to always be appropriately expressed. Examples of gender words include he, she, sister, father, councilman, heroine, princess.

Our list L contains a total of 6562 gender words, of which 409 and 388 are common nouns selected starting from the works of [7] and [63], respectively. All words are listed in their lower-cased and capitalised versions, in both singular and plural forms. Additionally, we added 5765 unique given names taken from Social Security card applications in the USA.Footnote 2
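A simplified sketch of how such a list could be assembled; the inputs are placeholders for the resources cited above, and irregular plurals would need dedicated handling:

```python
def build_gender_word_list(common_nouns, given_names):
    """Assemble the list L of gender words.

    common_nouns: singular gendered common nouns, e.g. "sister", "councilman"
    given_names:  given names, e.g. from US Social Security card applications
    """
    L = set()
    for noun in common_nouns:
        for form in (noun, noun + "s"):        # naive pluralisation, for illustration only
            L.add(form.lower())
            L.add(form.capitalize())
    for name in given_names:
        L.add(name.lower())
        L.add(name.capitalize())
    return L
```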

3.4 Word Importance

Following the approach described by [17], word importance is estimated based on the max-pooling operation commonly performed by sentence encoders to reduce a variable number of word representations to a fixed-size sentence embedding. Our approach consists of counting how many times a word representation is selected to be part of the sentence embedding during the max-pooling phase. In the case of InferSent, this is equivalent to counting the number of times that the max-pooling procedure selects the hidden state \(h_t\), for each time step t in the neural network underlying the language model, with \(t \in [0, \dots , T]\) and T equal to the number of words in the sentence. Note that \(h_t\) can be seen as a sentence representation centred on the word \(w_t\), i.e. the word at position t in the sentence.

We consider both the absolute importance of each word, and the percentage with respect to the total absolute importance of all the words in the sentence. For instance, in the example of Fig. 3, the absolute importance of the word saxophone is 1106, meaning that its vector representation is selected by the max-pooling procedure for 1106 dimensions out of the total 4096 dimensions of the sentence embeddings computed by InferSent. The percentage importance is \(\frac{1106}{4096} \approx 0.27\), meaning that the word counts for around 27% of the semantics of the sentence. In particular, the percentage importance is also independent of the length of the sentence, despite the fact that very long sentences generally have a more distributed semantics. For this reason, we choose to adopt the percentage importance for computing bias score.
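A sketch of the counting step, assuming access to the T word-level hidden states that the encoder max-pools into the final embedding (for InferSent, a T x 4096 matrix):

```python
import numpy as np

def word_importance(hidden_states, words):
    """Percentage importance of each word of a sentence.

    hidden_states: array of shape (T, D) with one BiLSTM hidden state per word;
                   the sentence embedding is the dimension-wise max over rows.
    words:         the T words of the sentence, in order.
    """
    winners = hidden_states.argmax(axis=0)                 # word selected for each dimension
    counts = np.bincount(winners, minlength=len(words))    # absolute importance per word
    return [(w, counts[t] / hidden_states.shape[1])        # percentage importance
            for t, w in enumerate(words)]
```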

Fig. 3 Word importance for the input sentence A man is playing the saxophone

3.5 Bias Score Examples

Table 2 illustrates a detailed example of gender bias estimation via bias score for the sentence She likes the pink dress. The example shows how gender stereotypes, such as the association between women and pink dresses, are heavily internalised in the final sentence representation: in fact, they account for the majority of the gender bias in the embedding according to bias score.

Table 2 Bias score example for the sentence She likes the pink dress

Additionally, we use bias score to estimate gender bias for sentences contained in SNLI, a large text corpus for training language models [9]. According to the experiments, the sentences with the highest bias score towards the male direction, estimated by applying Eq. 2, describe situations from popular sports like baseball and football, which are frequently associated with men and rarely with women. Similarly, the sentences with the highest bias score in the female direction, estimated via Eq. 1, illustrate female stereotypes, like participating in beauty pageants, applying make-up or working as a nurse. Table 3 displays the most-biased sentences in the SNLI corpus according to our metric, in both the male and female directions.

Results are similar when estimating the absolute bias score using the more general Eq. 3: top entries include sentences with a high score in either the female or male direction, like football players scoring touchdowns or the bikini is pink. Additionally, sexualised sentences like the pregnant sexy volleyball player is hitting the ball are also present.

Table 3 Highest bias scores for sentences in SNLI, towards the female and male directions

4 Gender Bias Reduction

Embedding-based language models learn stereotypical associations during the training phase, even if data are seemingly verified and safe [7]. Therefore, to mitigate gender bias and limit the internalisation of stereotypical conceptions, our approach aims to detect stereotyped entries in text corpora used for training language models, namely SNLI [9]. We explore two directions: removing stereotyped entries from the corpus, or compensating by adding counterfactual entries regarding gender. In this section, we describe in detail how to improve the fairness of a training corpus, and then test our intuition by retraining a sentence encoder on the new corpus obtained. Our goal is to improve the degree of fairness in the encoder, without losing accuracy in downstream tasks, by retraining it on a fairer and less stereotyped corpus of text. To evaluate both properties (quality and fairness), we test the newly retrained models with SentEval [16] and SEAT [36], respectively, as described in Sect. 5. An overview of the adopted methodology is illustrated in Fig. 4.

Fig. 4 Overview of the proposed methodology for gender bias reduction in sentence encoders

4.1 Training Corpus

The Stanford Natural Language Inference corpus (SNLI) is a large collection of English sentence pairs written by humans for textual inference tasks [9]. It is one of the largest resources of its kind, listing more than 570k pairs of sentences and more than 600k unique sentences in the train set alone. Each entry is composed of a premise sentence, a hypothesis sentence, and a gold label with one of three possible values: entailment, contradiction, neutral. The general goal of inference tasks is to predict whether the hypothesis sentence logically follows the premise sentence (entailment), contradicts it (contradiction), or is unrelated to it (neutral). According to the authors, the size and diversity of the dataset make it possible to train language models for sentence meaning representation. Table 4 illustrates some examples of SNLI entries.

Despite its widespread use, the literature has shown that SNLI contains gender and ethnic stereotypes, alongside harmful or pejorative language associated with social categories like women, Muslims and African Americans [51]. For this reason, our intuition is that by improving the fairness of SNLI, i.e. by getting rid of stereotypical concepts, we can effectively prevent natural language models trained on this corpus from internalising them.

Table 4 Sample entries from the SNLI corpus. Each entry contains a premise sentence, a hypothesis sentence and a gold label describing their relationship. C = contradiction, N = neutral, E = entailment

4.2 Improving SNLI Fairness

To improve the fairness of SNLI, we follow two approaches: data subtraction and data augmentation. The first reduces the number of entries in the corpus by removing stereotyped sentence pairs, identified from the bias score associated with their embeddings. The second adds entries to the corpus, in order to balance stereotyped sentence pairs and reach a higher degree of gender equality.

In both cases, the first step is to apply our metric to all the sentences contained in SNLI in order to find the entries associated with the highest bias score. This methodology is grounded on the observation that a high bias score is correlated with stereotyped or sexist sentences, as illustrated by the examples in Sect. 3.5. Moreover, SNLI has been shown to contain many stereotypical associations [51], proving to be a good candidate for our methodology.

4.2.1 Data Subtraction

Following this approach, we want to exclude sentences whose embedding exhibits a large amount of gender bias, without diminishing the size of the corpus too drastically. First, we compute bias score for both the premise and the hypothesis sentence of each sentence pair in the corpus, using the absolute bias score formula described in Eq. 3. Then, for each pair we only consider the higher of the two scores, i.e. the one associated with the premise embedding or the one associated with the hypothesis embedding, so that entries for which at least one of the two scores is too large are discarded.

We define four additional corpora derived from SNLI by subtraction of the 3%, 5%, 7% and 10% of entries with the highest associated bias score, respectively. Therefore, we set a threshold at the 97th, 95th, 93rd and 90th percentiles and discard entries with a bias score above the threshold. The distribution of bias score for all entries in SNLI and the four selected thresholds is illustrated in Fig. 5. It is evident from the chart that discarding 5% of the entries already halves the highest absolute bias score in the corpus, from 0.1498 to 0.0756. After removing the selected entries, we randomise and split each of the four resulting corpora (Sub90, Sub93, Sub95 and Sub97) into training, development and testing sets. Following the split in the original version of the corpus, we place 10k pairs each in the development and testing sets, and use the remaining pairs for training.
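A sketch of the subtraction step with pandas, assuming the per-sentence absolute bias scores have already been computed and stored in two columns (the column names are ours):

```python
import numpy as np
import pandas as pd

def subtract_biased_entries(snli: pd.DataFrame, percentile: float) -> pd.DataFrame:
    """Drop the entries whose worst-case bias score exceeds the given percentile.

    snli: one row per premise/hypothesis pair, with columns
          'premise_bias' and 'hypothesis_bias' (Abs-BiasScore, Eq. 3).
    percentile: e.g. 95 keeps entries below the 95th percentile (corpus Sub95).
    """
    worst = snli[["premise_bias", "hypothesis_bias"]].max(axis=1)
    threshold = np.percentile(worst, percentile)
    return snli[worst <= threshold].reset_index(drop=True)
```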

Fig. 5 Bias score distribution for entries in SNLI. The red lines correspond to the 90th, 93rd, 95th and 97th percentiles

4.2.2 Data Augmentation

To increase the number of entries in SNLI, we adopt an approach of counterfactual data augmentation based on duplicating sentence pairs by converting all female words to their corresponding male words, and vice versa. A similar approach for swapping gender entities is described by [62] and [32], but neither of the two works considers given names in the procedure. First, similarly to the approach used for data subtraction, we set a threshold at the 90th, 93rd, 95th and 97th percentiles, then perform the duplication for all sentence pairs in the corpus associated with a bias score higher than the threshold. If neither the premise nor the hypothesis contains gender words, the entry is not duplicated. On average, duplication affects around 60% of the considered entries. To avoid overfitting, each entry is only duplicated once.

To perform the duplication, we first tokenise the sentence with NLTK, a text processing library.Footnote 3 Each token represents either a word or a punctuation mark. Then, we consider only gender words, and specifically those for which there exists a female/male counterpart. After swapping them with their gender counterpart, we obtain a sentence with subjects of the opposite gender. For instance, he is a young boy is converted to she is a young girl, and my father is a singer becomes my mother is a singer. A total of 122 gender words are considered, mostly nouns regarding family members or occupations (e.g. uncle, aunt, chairman, chairwoman). Appendix A contains the full list of gender words involved in this procedure.

Moreover, we consider the 2500 most popular female and male given names in the USA, according to the Social Security AdministrationFootnote 4; they are used to convert female names to equally popular male names, and vice versa. For instance, the sentence Patrick is going to the supermarket becomes Rachel is going to the supermarket. Some examples of sentence pair duplication by gender-swapping are provided in Table 5. Finally, we randomise the four resulting corpora (Aug90, Aug93, Aug95 and Aug97) and split them into training, development and testing sets, again with 10k pairs in development and testing sets, and the rest reserved for training.
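A sketch of the swapping step with NLTK; SWAP is a hypothetical lookup table standing in for the 122 gender-word pairs of Appendix A and the popularity-matched given names:

```python
from nltk.tokenize import word_tokenize                      # requires nltk.download("punkt")
from nltk.tokenize.treebank import TreebankWordDetokenizer

# Hypothetical excerpt of the swap table, stored in both directions.
SWAP = {"he": "she", "she": "he", "boy": "girl", "girl": "boy",
        "father": "mother", "mother": "father",
        "patrick": "rachel", "rachel": "patrick"}

def gender_swap(sentence: str) -> str:
    """Return the gender-swapped counterpart of a sentence."""
    swapped = []
    for tok in word_tokenize(sentence):
        counterpart = SWAP.get(tok.lower())
        if counterpart is None:
            swapped.append(tok)                              # not a gender word: keep as is
        elif tok[0].isupper():
            swapped.append(counterpart.capitalize())         # preserve capitalisation
        else:
            swapped.append(counterpart)
    return TreebankWordDetokenizer().detokenize(swapped)

# gender_swap("My father is a singer")  ->  "My mother is a singer"
```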

Table 5 Sentence pair duplication by gender-swapping in SNLI

4.3 Language Model and Parameters

We retrain a sentence encoder language model based on a bidirectional LSTM architecture (BiLSTM) developed by Facebook AI [17]. The network features a forward and a backward LSTM that read the input sentence in two opposite directions. The final output is a 4096-dimensional vector representation of a natural language sentence, obtained with a max-pooling discretisation technique. The authors train the network on SNLI with a supervised approach, hence our choice to focus on this specific corpus in our experiments.

The parameters used for retraining the BiLSTM network are those suggested by the original authors: batch size 64, SGD optimiser with a learning rate of 0.1 and weight decay of 0.99. Training is stopped when the learning rate falls below the threshold of \(10^{-5}\): since the network converges fairly quickly, as pointed out by [17], we set the maximum number of training epochs to 8. Finally, as underlying word-level representations, we select the largest and most powerful GloVe model available, trained on Common Crawl 840B.Footnote 5 For the retraining, we used a Linux machine with Ubuntu 18.04, 78 GB of RAM and a GeForce GTX 1060 GPU.
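For reference, the retraining configuration above can be summarised as a plain record (a convenience snapshot of the listed hyper-parameters, not the authors' training script):

```python
RETRAIN_CONFIG = {
    "encoder": "BiLSTM with max pooling (InferSent)",
    "sentence_dim": 4096,
    "word_vectors": "GloVe Common Crawl 840B (300d)",
    "batch_size": 64,
    "optimizer": "SGD",
    "learning_rate": 0.1,
    "weight_decay": 0.99,
    "lr_stop_threshold": 1e-5,   # training stops once the learning rate falls below this
    "max_epochs": 8,
}
```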

5 Experimental Results

The goal of our experimental setting is to confirm a reduction in the gender bias exhibited by the language model retrained by either data augmentation or subtraction. At the same time, it is fundamental to avoid any degradation in the semantic power of the resulting sentence embeddings.

For this reason, after separately retraining the model on the eight SNLI-derived corpora obtained with data subtraction and augmentation, we test them on both fairness and accuracy, comparing them with the original encoder trained on the full unedited SNLI corpus.

5.1 Fairness and Accuracy Metrics

Since our goal is to improve the fairness of sentence encoders, without losing accuracy in downstream tasks, we need to check both qualities. Therefore, the retrained models are evaluated using SEAT [36], a fairness test for sentence encoders, and SentEval [16], a toolkit for assessing the accuracy of sentence embedding models in a variety of natural language tasks.

5.1.1 SEAT

The Sentence Encoder Association Test, or in short SEAT [36], is a fairness test that adapts the well-known Word Embedding Association Test (WEAT) [11] to sentence encoders. Like WEAT, SEAT measures the stereotypical association between two sets of target concepts X, Y (e.g. sentences with male/female subjects) and two sets of attributes A, B (e.g. sentences related to career/family). Targets and attributes are built from simple semantically bleached templates like This is \(<term>\) or \(<term>\) is here, e.g. This is John. The magnitude of the association is measured by the so-called effect size, defined as:

$$\begin{aligned} d = \frac{\text {mean}_{x\in X} \;s(x,A,B) - \text {mean}_{y\in Y} \;s(y,A,B)}{\text {std}_{z\in X\cup Y}\; s(z,A,B)} \;, \end{aligned}$$

where \(s(w, A, B)\) is the difference of the mean cosine similarities between the embedding of the target concept w and the embeddings of the attributes in the sets A and B. A higher effect size reflects a stronger correlation between target concepts and attributes, thus a more severe association bias.
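A sketch of the computation from precomputed sentence embeddings (NumPy arrays), directly following the formula above:

```python
import numpy as np

def _cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): difference of mean cosine similarities of w with A and with B."""
    return np.mean([_cos(w, a) for a in A]) - np.mean([_cos(w, b) for b in B])

def effect_size(X, Y, A, B):
    """SEAT effect size d for target sets X, Y and attribute sets A, B."""
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y)
```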

SEAT contains ten tests adapted from the work of [11]. We focus only on those tests that contain gender target concepts or attributes, namely C6, C7, C8, and their alternative versions C6b, C7b and C8b, generated by replacing given names (e.g. John, Sarah) with group terms (e.g. male, women). In SEAT, positive scores describe stereotypical associations, like \(male \rightarrow career\) and \(female \rightarrow family\). Thus, the lower the score, the higher the fairness of the model. Additionally, we consider tests C1 and C2, to assess the retention of correct and ethical associations, such as \(instruments \rightarrow pleasant\) and \(weapons \rightarrow unpleasant\). In this case, a reduction in the score indicates a loss of correct associations, therefore it is desirable to maintain fairly high positive scores.

5.1.2 SentEval

We employ SentEval [16] to assess the quality of our models in terms of sentence representations. SentEval is a toolkit featuring a variety of natural language downstream tasks; to test our models we select twelve of them, namely MR [42] and SST [53] for sentiment analysis, CR on product reviews [26], SUBJ on subjectivity/objectivity [41], MPQA on opinion polarity [57], TREC for question answering [55], STS-Benchmark for semantic relatedness [12], SICK-E and SICK-R for semantic entailment and relatedness [35], STS14 for semantic similarity [1] and MRPC for paraphrase detection [19]. All tasks except STS14 require the model to be further trained on their respective corpus. For all the training phases, we use the default parameters suggested by the authors: 10-fold cross-validation, Adam optimisation, batch size of 64, tenacity of 5, and epoch size of 4.
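A hedged sketch of how a retrained encoder is plugged into the SentEval toolkit with the parameters above; `encoder.encode` stands in for whatever encoding call the evaluated model exposes:

```python
import numpy as np
import senteval

encoder = ...                                         # the retrained sentence encoder under evaluation

def prepare(params, samples):
    return                                            # nothing to precompute in this sketch

def batcher(params, batch):
    # batch is a list of tokenised sentences; return one embedding per sentence
    sentences = [" ".join(tokens) if tokens else "." for tokens in batch]
    return np.vstack([encoder.encode(s) for s in sentences])

params = {"task_path": "path/to/senteval/data", "usepytorch": True, "kfold": 10,
          "classifier": {"nhid": 0, "optim": "adam", "batch_size": 64,
                         "tenacity": 5, "epoch_size": 4}}

tasks = ["MR", "CR", "SUBJ", "MPQA", "SST2", "SST5", "TREC", "MRPC",
         "SICKEntailment", "SICKRelatedness", "STSBenchmark", "STS14"]
results = senteval.engine.SE(params, batcher, prepare).eval(tasks)
```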

5.2 Retrained Models Evaluation

Tables 6 and 7 show the effect size for all gender-related tests in SEAT and for the two association retention tests C1 and C2, for both subtraction and augmentation approaches, respectively.

Table 6 SEAT results for gender-related tests C6–C8b and retention tests C1–C2 for the data subtraction approach; the average effect size is computed on C6–C8b
Table 7 SEAT results for gender-related tests C6–C8b and retention tests C1–C2 for the data augmentation approach; the average effect size is computed on C6–C8b

Concerning the gender-related tests from C6 to C8b, there is a marked improvement for all our models in almost every test. C7 and C8 in particular see an improvement in the association score for every single model, with reductions of up to 0.635 and 0.703, respectively, both achieved by Aug90, i.e. the most augmented model tested. The same trend is evident in the remaining tests, with only C6b not showing significant improvements, probably due to the already very low association score in the original model. Furthermore, the average score from tests C6 to C8b again shows an improvement for all our models compared to the original, with the best scores achieved by Sub95, Sub90 and Aug90, suggesting that addressing a larger percentage of sentences in the training data reduces the amount of bias. Figure 6 depicts the trend of the average SEAT scores for both the data subtraction and data augmentation approaches: for augmentation, the trend clearly decreases as the percentage of sentence pairs subject to gender-swapping augmentation increases; on the other hand, models obtained from data subtraction fluctuate in their results, yet still perform better than the original model. Finally, C1 and C2 both highlight a retention of correct associations in all our models, with C2 also showing a reinforcement of the ethical correlations \(instruments \rightarrow pleasant\) and \(weapons \rightarrow unpleasant\).

Fig. 6 Average SEAT score across models for tests C6–C8b. The lower the score, the higher the degree of fairness

Tables 8 and 9 show the results from testing our models on the twelve downstream tasks provided by SentEval. In all tasks, the performance is very similar to that of the original model, with only slight deviations. These results are confirmed by the overall mean score across all tasks, illustrated in Table 10.

Table 8 SentEval results for all encoders trained on SNLI corpus adjusted by data subtraction and augmentation (above), and a comparison with encoders trained on hard-debiased (HD) embeddings (below)
Table 9 SentEval results for all encoders trained on SNLI corpus adjusted by data subtraction and augmentation (above), and a comparison with encoders trained on hard-debiased (HD) embeddings (below)
Table 10 Overall SentEval score across all tests, and difference compared to the original model

5.3 Analysis and Discussion

The results of our experiments show that bias score can be used to find stereotyped sentences, allowing the identification of stereotypes in text corpora used for training language models. Additionally, retraining encoders on fairer corpora of sentences, such as an augmented or size-reduced version of SNLI, proves to be an effective way to achieve more ethical and equally powerful language models. More specifically, the results from SEAT suggest that models obtained from both data augmentation and data subtraction can override unethical and gender-stereotypical associations, leading to better association scores. At the same time, correct associations are maintained, if not reinforced, meaning that the basic semantics of the language is retained. This is confirmed by the tests performed with SentEval, which show that all our models achieve results comparable to the original one, trained without any augmentation or subtraction of the training data. Additionally, the trend of the average SEAT scores depicted in Fig. 6 suggests that considering a higher percentage of sentence pairs from SNLI increasingly improves results, at least for the augmentation approach. Although the SentEval results suggest that accuracy remains steady regardless of the percentage of entries removed from the corpus or added to it, we expect a limit to exist beyond which the accuracy of the model starts decreasing, despite a continuous improvement from the fairness point of view.

In general, results confirm that gender bias in sentence encoders can be ascribed to the internalisation of stereotypical concepts encountered during the training phase. Therefore, removing or compensating for stereotyped entries in the training data improves the fairness of the final model.

5.4 Comparison with Hard-Debiasing

While our approach optimises the training procedure of language models by focusing on the training data, an alternative option is so-called vector-space manipulation [22, 45].

In particular, we compare our approach with hard-debiasing, a vector-space manipulation technique that reduces bias in word embeddings by forcibly removing gender semantics from all vectors associated with gender-neutral words [7]. We first hard-debias GloVe embeddings, then retrain the sentence encoder starting from the debiased embeddings and the full original SNLI corpus. We test the resulting model on both SentEval and SEAT. Results are presented in Tables 8, 9 and 10, and in Table 11, respectively. While we witness a noticeable improvement in the SEAT tests, the overall SentEval score sees an average degradation of 1.73 points. Moreover, performance in specific tasks like CR and TREC drops drastically, by up to 3.4 and 6.8 points, respectively. Similar results are obtained when adopting word2vec [38, 39] and the hard-debiased word2vec from [7] in place of GloVe: the SEAT tests improve considerably, at the price of a major reduction in many SentEval tasks, particularly in classification tasks such as CR, TREC and SST-5.

In summary, hard-debiasing can effectively improve the fairness of sentence encoders, but at the cost of a substantial loss of accuracy in downstream tasks. On the contrary, our approach maintains the accuracy and quality of the resulting sentence embeddings in all tasks, while still considerably improving the average SEAT scores for gender-related tests, by up to 19%. Since the drop in model accuracy is significant when using hard-debiased GloVe, we do not consider it advisable to combine our approach with hard-debiasing, as the quality of the final sentence representations would still be lower.

Table 11 SEAT results comparison between encoders trained on original GloVe or word2vec (w2v) and their hard-debiased (HD) version

6 Extension to Transformer-Based Models

In this section, we briefly describe how to adapt bias score to transformer-based sentence encoders. Additionally, we present some examples focused on the widespread BERT family of language models, particularly on the sentence encoder SBERT [49] based on BERT-Base for NLI with max pooling discretisation. We adopt SentenceTransformer,Footnote 6 a Python framework that makes it easy to switch from one language model to another without installing additional tools.

The main difference with respect to the methodology described in Sect. 3 is how to compute the gender direction \(\overrightarrow{g}\). In fact, contextualised encoders need pairs of sentences instead of pairs of words to fully capture the semantics of gender. To do so, we take more than 100 sentences randomly extracted from the following three datasets: POM [44], MELD [47], SST [53]. Then, we swap all female words to male words, and vice versa. Following the approach described by [30], the difference vector between the embedding of the original sentence and the embedding of its gender-swapped counterpart represents gender. Again, to solidify the methodology, we perform a PCA of the resulting difference vectors to find a single direction \(\overrightarrow{g}\). This approach proves to be extremely effective, resulting in a top component explaining 78% of the variance, as shown in Fig. 7a. Moreover, repeating the procedure with different pairs of gender sentences does not produce much difference in the resulting direction, with a similarity higher than 99%.
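A sketch of this step with the SentenceTransformer framework and scikit-learn; the checkpoint name is an assumption for a BERT-Base NLI model with max pooling, and `swap_fn` is a gender-swapping helper such as the one sketched in Sect. 4.2.2:

```python
import numpy as np
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-max-tokens")   # assumed BERT-Base NLI, max pooling

def sbert_gender_direction(sentences, swap_fn):
    """Gender direction for a contextualised encoder: top principal component of
    the differences between sentence embeddings and their gender-swapped versions."""
    originals = model.encode(sentences)
    swapped = model.encode([swap_fn(s) for s in sentences])
    pca = PCA(n_components=1).fit(originals - swapped)
    g = pca.components_[0]
    return g / np.linalg.norm(g)
```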

Concerning word importance, since our approach is based on max pooling, it is best to choose this discretisation option for the encoder as well. Moreover, it is worth noting that BERT-based models split sentences into tokens, which represent either entire words or sub-words. For this reason, importance is estimated at token level. An example is provided in Fig. 7b.

Fig. 7 On the left, top ten components in PCA to retrieve the gender direction for SBERT with BERT-Base vectors. On the right, token-level importance for the input sentence A man is playing the saxophone

Table 12 provides a detailed example of bias score, showing that, unlike in sentence encoders based on static word vectors like GloVe, in contextualised representations every word inherits gender information from the context of the sentence, in this case from the female subject. This feature makes it more difficult to separate the amount of correct gender information from the gender-biased information. However, the results in Table 13 show that BERT-based encoders still encapsulate a lot of gender bias, which is especially noticeable in gender-neutral sentences. In fact, in the first set of sentences, the amount of female bias with a female subject is much higher than the male bias with a male subject. Accordingly, with a neutral subject the same sentence still exhibits female bias, due to the stereotypical profession described. The second set of sentences again shows that female stereotypical associations are internalised more than the male ones; when the context of the sentence contains both a female and a male stereotype (pink dresses and playing football), the amount of bias in the female direction is higher, even when the subject is explicitly male. Moreover, with a neutral subject, the female bias is still very large regardless of the order in which the two concepts are presented. We could not find any explanation for this behaviour in the literature and we believe that it is due to the higher presence of female stereotypes in training corpora.

Table 12 Detailed bias score estimation for SBERT encoder with BERT-Base vectors for the sentence She likes the new pink dress
Table 13 Examples of male and female bias score with BERT-based sentence encoders, considering different gender stereotyped sentences

Finally, to adapt the methodology described in Sect. 4 to BERT-based encoders, we first note that training them from scratch is extremely time- and resource-consuming. However, they can be fine-tuned and adjusted with far less effort. To adapt our methodology to this scenario, we first identify text corpora commonly used to semantically fine-tune pre-trained language models, such as STS-Benchmark [12] and MultiNLI [58]. As in the methodology described in this work, the first step is to identify the most stereotypical entries in these corpora, improve their degree of fairness, and then use them to fine-tune a pre-trained transformer-based model. Appendix B illustrates preliminary results from the identification of stereotypical entries in the two aforementioned training corpora. Similar approaches based on fine-tuning for bias mitigation have shown promising results in the recent literature [15, 20, 23].

7 Conclusions and Future Work

In this paper, we proposed both an algorithm to estimate gender bias in sentence embeddings, based on a novel metric named bias score, and a method to mitigate gender bias in sentence encoders by retraining them on training data improved by performing either data subtraction or gender-swapping data augmentation.

Bias score discerns between gender bias and correct gender information encoded in a sentence embedding, and quantifies the presence of bias on the basis of the semantic importance of each word. We tested our solution on InferSent [17], searching for the most gender-biased representations in a corpus of natural language sentences. Moreover, bias estimation is also crucial for adapting procedures like debiasing to sentence embeddings, since they require biased sentence representations to be effectively identified [7]. Since bias score is proportional to the amount of stereotypical conceptions encapsulated in sentence representations, it allows the retrieval of stereotyped entries from text corpora used for training language models. In the second part of this work, we define fairer versions of the SNLI corpus by data subtraction and data augmentation of its most stereotyped entries: sentence encoders retrained on them prove to be less prone to making stereotypical associations than their original counterpart, while maintaining the same accuracy in a variety of natural language understanding tasks. This is crucial to maintain quality while reducing discrimination in a variety of web-related tasks, such as document search and ranking or hate speech detection.

Future work includes adapting bias score to different kinds of social bias (e.g. ethnic, religious) and further testing it on other sentence encoders such as SBERT [49]. Additionally, considering a higher percentage of SNLI for data subtraction and data augmentation may result in additional improvements in SEAT scores, either maintaining the same accuracy for the retrained encoder or confirming the hypothesis that beyond a certain threshold the performance of the model starts decreasing. Moreover, combining the two approaches of subtraction and augmentation may prove even more effective at reducing gender bias: this means removing sentence pairs associated with a high bias score and substituting them with their gender-swapped equivalents. Finally, additional comparisons with debiasing techniques such as the one proposed by [63] can be useful to highlight the strengths and weaknesses of both approaches.