1 Introduction

Biometric systems are being designed and implemented by public and private organisations for law enforcement, migration control and security purposes. This technology identifies or verifies the unique identity of a person through physical, physiological or behavioural characteristics, such as fingerprints, face, voice, gait or finger veins. Individual bodily characteristics are transformed into biometric data and processed by algorithmic systems whose output is taken as the truthful identity of an individual. Biometric systems are part of our daily lives: we use them to unlock our mobile phones or to ‘efficiently’ cross borders at airports.Footnote 1 However, this technology has different social and political consequences depending on a subject’s migration status [59, 71, 73]. In Europe, the Dublin Regulation is the legal framework that establishes which Member State within the European Union (EU) is responsible for examining an asylum application. This law provides that asylum seekers should seek asylum in the first country of arrival.Footnote 2 Under this law, asylum seekers are forced to remain in this first country, where their asylum procedure will be examined, generally without the possibility of travelling or moving to another Member State. In order to enforce this law, the EU implemented a biometric database (EURODAC) to make it easier for Member States to determine the country responsible for an asylum application [17, 77, 92]. This large-scale database stores the fingerprints of asylum seekers to identify the first country of arrival, where they should have sought asylum. In the case of a border crossing, border authorities can determine the country responsible for the asylum case by extracting the asylum seeker’s fingerprint and comparing it to the fingerprints stored in EURODAC. In the case of a biometric match, the database will show the Member State responsible for the asylum case and the asylum seeker will be pushed back. In other words, biometrics are implemented to immobilise, control and obstruct migrants’ movements within Member States through fingerprint identification [82, 87]. Meanwhile, European citizens enjoy freedom of movement across Member States. Moreover, the World Food Programme (WFP), in partnership with the United Nations High Commissioner for Refugees (UNHCR), implemented iris recognition to register migrants and provide cash assistance in Jordan’s Za’atari refugee camp [5]. More recently, Privacy International has denounced the plans of the UK Home Office to use facial recognition smartwatches to monitor migrants with criminal records [76]. The use of biometrics in these scenarios has been widely criticised by academics, activists and human and digital rights organisations, who have argued that it ‘undermines democracy, freedom and justice’ [30, 77]. In the specific context of migration, biometric systems are used to criminalise migrants and illegalise border crossings between Member States through the rule of law, such as the Dublin Regulation, infringing a fundamental right: the freedom of movement. Yet, algorithmic fairness has entered biometrics as a new research area which aims at debiasing these systems, without considering the historical, political, legal and social context in which this technology is embedded.

This paper starts by questioning the ‘emergent challenge’ of fairness in biometrics. Since the publication of Gender Shades by Buolamwini and Gebru in 2018, the field of biometrics has experienced a surge in studies on bias and disparate impact in algorithmic systems, such as facial, fingerprint, finger vein and iris recognition, among others [26]. By examining the recent literature on fairness in biometrics, we observe a general lack of engagement with the political and social contexts in which these systems are implemented. The literature examined also neglects current debates and critiques of algorithmic fairness, such as the impossibility of fairness and the need to analyse the politics of algorithms. Notably, a recent work on finger vein recognition systems has suggested a lack of bias for different biometric algorithms [27]. However, critical scholars face limitations when trying to reproduce biometric systems and compare results. Data and code are not usually made open and publicly available. Biometric and demographic data are sensitive because subjects’ identities can be put at stake if the anonymisation process is not carried out properly. Moreover, obtaining access to biometric code is also a constraint, as engineers and even academics are sometimes reluctant to share it. Thus, challenging the biometric fairness research field from a quantitative and technical perspective becomes a very arduous task for critical scholars and digital rights organisations.

In this paper, we propose to critically analyse the field of fairness in biometrics. Whilst the impossibility of fairness in machine learning has been widely investigated [22, 40, 45, 58], this result has not yet been proved in the field of biometrics. Building on these previous works in machine learning, the first contribution of this work is to show the impossibility of fair biometric systems from a mathematical perspective. The second contribution is to empirically demonstrate this theoretical framework by implementing three fairness metrics in four different biometric systems. These biometric systems are reproduced from [27], where the authors claimed that these systems are not biased by analysing statistical differences. Although this might seem a pragmatic way to evaluate fairness, we outline some technical limitations. First, the decision threshold,Footnote 3 which sets the cut-off between a biometric match and a non-match, plays a key role in the assessment of fairness in biometric systems [36]. Second, intersectional demographic evaluation of gender and age must be assessed. Third, following our theoretical framework that demonstrates the impossibility of fairness in biometrics, a lack of bias is almost technically impossible in any biometric system. Moreover, we highlight that the dataset used to train these systems, to which we had access, reproduces the racialisation of subjects, proposing race and gender categories that are colonial, archaic and offensive.

Yet, we move the argument forward by contending that a critical questioning of the use of fairness in biometric systems should consider the historical, political and social contexts in which biometrics are deployed, rather than narrowing the analysis to error differences between demographic groups. The third contribution of this paper is to expose how biometric systems are unfair considering the political context in which these algorithms are deployed. Moving our discourse towards a ‘critical biometric consciousness’ [16], we analyse the case of an asylum appeal in which the migrant’s credibility was challenged in part due to an inconsistency between his testimony and a biometric trace stored in a database and matched by a biometric system. Given this fact and other inconsistencies, the UK Deputy Upper Tribunal Judge denied his asylum claim, dismissing any other evidence provided by the asylum seeker. This case, we argue, unveils the elephant in the room of biometrics: the fact that is intentionally ignored or left unaddressed, showing how biometrics cannot be fair if they are deployed in scenarios that are intended to discriminate, even in the hypothetical case of a lack of bias. Whilst academics, private companies and biometric engineers are focusing their efforts on building more accurate, explainable and fairer biometrics, little attention is paid to the political implications of biometrics and how these systems are used to jeopardise rights that have been recognised by institutions such as the United Nations (UN) and require a high degree of protection, such as the freedom of movement [48]. Importantly, whilst the European Commission has proposed the first legal framework for artificial intelligence and biometrics, the Artificial Intelligence Act (AI Act hereinafter), to protect European citizens’ digital and fundamental rights [33], biometric systems used at the border for migration control will be explicitly exempt from such regulation. Therefore, migrants’ digital and fundamental rights will not obtain the same lawful protection. Since fairness in biometrics risks becoming more prominent in the coming years, we urge a critical, political and radical examination of this field. We suggest that we must also situate current debates in biometrics within a broader historical context of struggles against discrimination, moving beyond technological dilemmas about ethics and demographic bias. To the best of our knowledge, this is the first academic work that investigates fairness in biometrics from a critical and political perspective, and it pushes the argument further by showing how debates about the core function of borders and the use of biometrics to criminalise migration are undermined by the focus on ethics and more equitable biometric systems.

2 Fairness in biometrics: An emergent challenge?

In 2018, Buolamwini and Gebru published Gender Shades, an academic work that assessed bias in gender classification algorithms applied to facial images. They analysed several commercial gender classification models and found significant disparities based on individuals’ characteristics: lighter-skinned individuals obtained better results than darker-skinned individuals, whilst males obtained better results than females. This intersectional benchmark opened up a new avenue of critical analysis of bias in racialised technologies. It influenced public and academic debates on the use of facial recognition [13, 54, 69], creating awareness about the risks of algorithms that encode and propagate historical, political and social disparities. Consequently, it has also disrupted the field of biometrics. Since the publication of Gender Shades [18], fairness has emerged as a major challenge within biometrics [26].

The main goal of fairness in the field of biometrics is to identify and understand bias in systems that determine or validate the identity or other characteristics of individuals, such as gender [80]. In some cases, bias mitigation is also part of this goal. Researchers have analysed demographic biases in facial recognition, fingerprints, palm prints, iris and even finger veins. Face recognition has notably been the most scrutinised system in the last decade, with gender and race as the demographic features examined. In general, males and white-skinned individuals obtain higher biometric performance [3, 36, 60, 85]. The annual reports published by the National Institute of Standards and Technology (NIST) [46] also found error disparities based on gender and race in more than 189 commercial facial recognition systems that are used for border control. Similarly, studies of other biometric systems such as iris recognition find noteworthy differences between females and males, with the former having higher error rates [34]. A recent study has analysed demographic bias in fingerprint recognition [44], concluding that bias depends on external factors such as image quality rather than the gender or race of individuals. Yet, age is the demographic feature that has the greatest impact on the performance of fingerprint systems. For instance, recent works conclude that fingerprint verification systems obtain higher error rates on minors [63, 75]. Indeed, most studies about fairness in biometrics focus on analysing bias with respect to race and gender rather than age. Given the lack of biometric samples of children, these biometric systems perform worse on minors when compared to adult samples. Despite the complexity of mitigating demographic bias in biometric systems, researchers have suggested that ‘statistically significant bias’ on age and gender ‘have not been detected on five finger vein recognition algorithms tested on four datasets’ [27].

Despite the numerous articles on fairness in biometrics that have recently been published, there is a lack of critical analysis of how these systems are implemented in real scenarios and jeopardise fundamental rights, such as freedom of movement. According to Guild and Groenendijk: ‘[F]reedom of movement did not only amount to the right to travel freely, to take up residence and to work, but also involved the enjoyment of a legal status characterised by security of residence, the right to family reunification and the right to be treated equally with nationals’ [48, p. 206]. What is the purpose of developing biometric systems that perform equally well across different demographic groups if they are implemented to push back migrants at the border, limit border crossings and deny asylum applications? On the other hand, part of these studies also neglects current debates in the field of fairness and critical artificial intelligence studies. First, these works analyse algorithmic bias without considering the more than 20 definitions of fairness in machine learning that have been proposed in the last decade [12, 28, 95]. For instance, in [27] demographic bias is calculated by analysing differences in score distributions by group (males vs. females or children vs. adults). However, differences in error rates, which are a standard approach to evaluating fairness, are not considered. Second, despite the popularity of Gender Shades [18] and its proposed benchmark, intersectionality is not considered in most of the studies analysed. In general, error disparity is analysed taking only a single demographic feature into account (gender, race or age). Third, some studies also suggest a lack of error disparities in biometric systems, such as [27]. However, different fairness works have demonstrated the impossibility of fairness, proving mathematically and empirically that fairness definitions are mutually exclusive [22, 40, 45, 58]. As a result, it has been shown that despite efforts to mitigate disparities, error differences among demographic groups still persist [46].

As we have previously stated, part of the literature on fairness in biometrics that we have examined does not take into account the politics of these systems and how they impact fundamental rights [20]. In contrast, Buolamwini and Gebru clearly exposed that ‘all evaluated companies provide a “gender classification” feature that uses the binary sex labels of female and male. However, this view on gender does not adequately capture its complexities or address transgender identities’ [18, p. 6]. In fact, the mere idea of gender classification algorithms infringes fundamental rights by design, targeting trans people, who are more likely to be ‘misclassified’. Examining the performance of classification models with respect to gender will likely improve these models. Yet, we ought to consider whether that is even a desirable goal. What is the politics of a gender classification system? What benefit do gender classification algorithms bring to our societies? How could communities such as LGBTQIA+ benefit from a fair and transparent gender facial recognition system?

In this paper, we focus on the critique of the use of biometrics for migration control, which impacts asylum seekers’ rights. Since 2000, the EU has implemented biometric systems to identify travellers for migration control [10, 64, 82]. Judges, migration officials and border guards, among others, currently make use of biometric technology in asylum cases or visa applications [43, 74] to make ‘unequivocal’ and ‘efficient’ decisions. The output of biometric systems is considered more reliable than migrants or asylum seekers, who are perceived as deceptive subjects [9].

Fairness and the study of bias have become the elephant in the room of biometrics. Whilst effort is focused on solving the challenge of addressing demographic biases and designing ‘fair’ biometric systems, less attention is paid by the biometric community to the contexts in which this technology is used. These systems nowadays reproduce political injustices linked to a colonial legacy that is ignored in contemporary biometrics [16]. As Maguire argued [62], fingerprints were used by a British officer in the Indian Civil Service to prevent fraud by colonial subjects. At the same time, Galton investigated ‘the heritable characteristics of race’ on racialised individuals in British prisons. In fact, biometrics ‘offered 19th century innovators more than the prospect of identifying criminals: early biometrics promised a utopia of bio-governmentality in which individual identity verification was at the heart of population control’ [62]. In the 21st century, this approach to biometrics has evolved with the use of algorithms that automatically find patterns and calculate matches. Yet, its colonial legacy remains visible. At the border, these systems are used to identify racialised subjects. There, fairness has emerged to make the identification of asylum seekers ‘fairer’.

3 A critique of fairness in biometrics from a technological perspective

Fairness is a contested concept that has historically been discussed by social scientists, legal experts and philosophers [41, 79]. Yet, there is no consensus on what this concept means or implies. The surge of socio-technical systems deployed in our society has revealed algorithmic disparate impacts [8, 32, 68]. As a response, the concept of fairness has been translated into mathematical definitions which quantify the fairness of algorithmic models [52]. Different formulations [12, 28, 95] have been proposed to calculate and mitigate algorithmic biases. In recent years, the biometrics research field has also evaluated the fairness of these systems and developed novel methods for bias mitigation. Most of these studies do not take into account the most relevant state-of-the-art definitions of fairness in machine learning and do not engage with critical debates [27, 34, 63, 84, 85]. Moreover, one of the studies analysed suggests a lack of bias in score distributions [27], which is technically impossible, as studies in the field of algorithmic fairness in machine learning have shown [22, 40, 45, 58]. As there are multiple definitions for evaluating demographic bias, the fairness field has proved that it is generally impossible to satisfy several fairness criteria simultaneously. Therefore, algorithmic systems, including biometrics as we will prove, will generally be biased.

3.1 Formulation of a biometric verification system

In the discussion herein, we focus on biometric verification systems. These systems confirm (or reject) whether a biometric sample belongs to a specific individual based on its similarity to that individual’s learned biometric representation. Formally, we use the following notation:

  • \(x^{(i)}\): learned representation of the biometric ith-sample within the dataset \(\mathcal {D}\),

  • \(y^{(i,j)}\): element (i, j) of the binary outcome vector y,

  • \(\theta\): set of parameters of the biometric system,

  • \({\tilde{y}}^{(i,j)} = f ((x^{(i)}, x^{(j)}) | \theta )\): element (i, j) of the binary prediction vector \({\tilde{y}}\),

  • \(\tau\): decision threshold,

  • \(s^{(i,j)}\): similarity score between samples i and j.

Usually, these systems employ learning algorithms which transform images into features and numerical representations. They are driven by an optimisation function that minimises the error between the prediction for a pair of samples (\(x^{(i)}\) and \(x^{(j)}\)) and the outcome (\(y^{(i,j)}\)), i.e. whether the pair belongs to the same individual or not:

$$\begin{aligned} \theta ^*= \mathop {\textrm{arg}\,\textrm{min}}\limits _{\theta } \sum _{x^{(i)},x^{(j)} \in \mathcal {D}} \mathcal {L} \Big ( f \left( (x^{(i)}, x^{(j)}) \big | \theta \right) , y^{(i,j)} \Big ) \end{aligned}$$
(1)

The biometric model compares two learned representation samples (\(x^{(i)}\) and \(x^{(j)}\)) and obtains a similarity score (\(s^{(i,j)}\)). This score is then translated into a binary prediction (\({\tilde{y}}^{(i,j)}\)) given a threshold \(\tau\):

$$\begin{aligned} {\tilde{y}}^{(i,j)} = {\left\{ \begin{array}{ll} 0 &{} \text {if } s^{(i,j)} \le \tau \\ 1 &{} \text {if } s^{(i,j)} > \tau \end{array}\right. } \end{aligned}$$
(2)

The model learns the set of parameters (\(\theta\)) that achieves the minimum number of ‘misidentifications’. This means that the binary prediction (\({\tilde{y}}^{(i,j)}\)) should be as similar as possible to the binary outcome (\(y^{(i,j)}\)). In the case of biometric verification systems, a pair of biometric samples is labelled as genuine (\(y^{(i,j)}=1\)) if the samples correspond to the same individual and as impostor (\(y^{(i,j)}=0\)) otherwise.Footnote 4 Therefore, a true genuine (\(TG = Pr({\tilde{y}}^{(i,j)} = 1 | y^{(i,j)} = 1)\)) refers to samples that correspond to the same individual and are correctly matched by the biometric system; a true impostor (\(TI = Pr({\tilde{y}}^{(i,j)} = 0 | y^{(i,j)} = 0)\)) refers to samples that correspond to different individuals and are rejected; a false genuine (\(FG = Pr({\tilde{y}}^{(i,j)} = 1 | y^{(i,j)} = 0)\)) refers to different individuals whose samples are matched; and a false impostor (\(FI = Pr({\tilde{y}}^{(i,j)} = 0 | y^{(i,j)} = 1)\)) refers to samples of the same individual that are mismatched.Footnote 5 Thus, the errors that biometric systems commit are: (1) incorrectly associating two different subjects or (2) failing to associate two samples of the same subject.
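To make this notation concrete, the following minimal Python sketch (our own illustration, not the implementation of any of the systems studied here) computes the four rates from a vector of similarity scores, their ground-truth genuine/impostor labels and a decision threshold \(\tau\):

```python
import numpy as np

def verification_rates(scores, labels, tau):
    """Confusion-matrix rates of a biometric verification system.

    scores : similarity scores s^(i,j) for the compared sample pairs
    labels : ground-truth outcomes y^(i,j) (1 = genuine pair, 0 = impostor pair)
    tau    : decision threshold
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    preds = (scores > tau).astype(int)         # Eq. (2): predict a match when s > tau

    genuine, impostor = labels == 1, labels == 0
    return {
        "TGR": (preds[genuine] == 1).mean(),   # genuine pairs correctly matched
        "FIR": (preds[genuine] == 0).mean(),   # genuine pairs wrongly rejected
        "TIR": (preds[impostor] == 0).mean(),  # impostor pairs correctly rejected
        "FGR": (preds[impostor] == 1).mean(),  # impostor pairs wrongly matched
    }
```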

Fig. 1

Relationship between the similarity score (s), the decision threshold (\(\tau\)) and the genuine (g(s)) and impostor (i(s)) score distributions in a biometric system. The figure visually shows how these quantities relate to the confusion matrix elements: true genuine (TG), true impostor (TI), false genuine (FG) and false impostor (FI)

In contrast to typical machine learning evaluation, biometric systems are evaluated at different decision thresholds (\(\tau\)), which clearly affect the distribution of errors (see Fig. 1). The Equal Error Rate (EER) is the value at which the false genuine and false impostor rate curves intersect (\(FGR = FIR\)). \(FGR_{1000}\) is the false impostor rate measured at the threshold where the false genuine rate is 0.001 (\(FGR = 0.001\)), and vice versa. Systems are also evaluated at operating points where one of these rates is 0 (ZFGR, where \(FGR=0\), or ZFIR, where \(FIR=0\)). Thus, the decision threshold is set by targeting different error values.
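As an illustration of how these operating points can be located in practice, the sketch below (our own code with hypothetical function and variable names, not the evaluation code of [27]) sweeps the threshold over the observed scores and reports the EER and \(FGR_{1000}\) operating points:

```python
import numpy as np

def operating_points(genuine_scores, impostor_scores):
    """Sweep the decision threshold and locate the EER and FGR_1000 operating points."""
    genuine_scores = np.asarray(genuine_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))

    fgr = np.array([(impostor_scores > t).mean() for t in thresholds])   # false genuine rate
    fir = np.array([(genuine_scores <= t).mean() for t in thresholds])   # false impostor rate

    i_eer = np.argmin(np.abs(fgr - fir))          # threshold where the two curves meet
    below = np.where(fgr <= 0.001)[0]             # thresholds where FGR <= 0.001
    i_1000 = below[0] if below.size else None     # lowest such threshold

    return {
        "EER": (fgr[i_eer] + fir[i_eer]) / 2,
        "tau_EER": thresholds[i_eer],
        "FIR_at_FGR_1000": fir[i_1000] if i_1000 is not None else None,
        "tau_FGR_1000": thresholds[i_1000] if i_1000 is not None else None,
    }
```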

3.2 Formulation of three fairness definitions for biometrics

A fairness measure is a mathematical function that quantifies bias and algorithmic discrimination. These measures aim at evaluating model performance across demographic groups (\(\mathcal {C} = \{C_1, C_2, \ldots , C_n\}\)) and ensuring that there is no disparate impact among them. In this setting, a demographic feature such as gender, age, class or race is typically used to define the ‘advantaged’ and ‘disadvantaged’ group.

We identify that part of the literature on fairness in biometrics disengages from the established fairness definitions in machine learning. In some of the works analysed [27, 34, 88], the authors assessed biases in biometric systems by proposing their own fairness definitions without considering previous work on algorithmic discrimination. However, recent studies have adopted these definitions into biometrics [36, 44]. Therefore, we translate three well-known definitions of fairness into biometric notation in order to demonstrate the impossibility of fairness in these systems.

3.2.1 Equalised odds (also disparate mistreatment or error rate balance) [50]

A biometric system satisfies equalised odds if the TGR and FGR are similar across demographic groups:

$$\begin{aligned} | TGR_{C_a} - TGR_{C_b} | < \epsilon \end{aligned}$$
(3)

and,

$$\begin{aligned} | FGR_{C_a} - FGR_{C_b} | < \epsilon , \forall \,C_a, C_b \subset \mathcal {C} \end{aligned}$$
(4)

3.2.2 Statistical parity (also group fairness or demographic parity) [29]

A biometric system satisfies statistical parity if the probability of a predicted genuine is similar across demographic groups. This definition is based only on the predicted outcome. Mathematically, it is expressed as follows:

$$\begin{aligned} |Pr({\tilde{y}} = 1 | C_a ) - Pr({\tilde{y}} = 1 | C_b )| < \epsilon , \forall \,C_a, C_b \subset \mathcal {C} \end{aligned}$$
(5)

3.2.3 Predictive parity (also outcome test) [22]

A biometric system satisfies predictive parity if the probability that a predicted genuine is an actual genuine is similar across demographic groups. More formally:

$$\begin{aligned} |Pr(y = 1 | {\tilde{y}} = 1, C_a ) - Pr(y = 1 | {\tilde{y}} = 1, C_b )| < \epsilon , \quad \forall \,C_a, C_b \subset \mathcal {C} \end{aligned}$$
(6)
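The three definitions can be evaluated side by side on the outputs of a verification system. The following self-contained Python sketch (our own helper, not code from the works cited; it assumes each compared pair is tagged with a binary demographic attribute `groups`) returns the gaps that Eqs. (3)–(6) bound by \(\epsilon\):

```python
import numpy as np

def fairness_gaps(scores, labels, groups, tau):
    """Gaps bounded by epsilon in equalised odds (Eqs. 3-4), statistical parity (Eq. 5)
    and predictive parity (Eq. 6) for a binary demographic attribute."""
    scores, labels, groups = (np.asarray(a) for a in (scores, labels, groups))
    preds = (scores > tau).astype(int)

    def rates(mask):
        y, p = labels[mask], preds[mask]
        tgr = p[y == 1].mean() if (y == 1).any() else np.nan    # Pr(pred=1 | y=1)
        fgr = p[y == 0].mean() if (y == 0).any() else np.nan    # Pr(pred=1 | y=0)
        pos = p.mean()                                          # Pr(pred=1)
        ppv = y[p == 1].mean() if (p == 1).any() else np.nan    # Pr(y=1 | pred=1)
        return tgr, fgr, pos, ppv

    tgr_a, fgr_a, pos_a, ppv_a = rates(groups == 0)
    tgr_b, fgr_b, pos_b, ppv_b = rates(groups == 1)
    return {
        "equalised_odds": max(abs(tgr_a - tgr_b), abs(fgr_a - fgr_b)),
        "statistical_parity": abs(pos_a - pos_b),
        "predictive_parity": abs(ppv_a - ppv_b),
    }
```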

3.3 The impossibility of unbiased biometric systems

In this section, we provide a theoretical framework to demonstrate that the previous fairness criteria are mutually exclusive. The impossibility of fairness in machine learning has been widely studied [22, 40, 45, 58]. However, there is a gap in the literature regarding the impossibility of fairness in biometrics, which this paper attempts to address. We mathematically show that only under very unrealistic conditions (equal ratios among demographic groups, or a trivial or perfect biometric system)Footnote 6 can these three definitions be simultaneously satisfied. Thus, we prove the impossibility of an unbiased biometric system.

To simplify the notation, we assume that the demographic feature is categorised into two groups: \(\mathcal {C} = \{C_1, C_2\}\).

Proposition 3.3.1

Given a non-trivial biometric system with unequal ratios among groups that satisfies equalised odds and statistical parity, predictive parity cannot hold.

Proof

Suppose, for contradiction, that predictive parity holds alongside equalised odds and statistical parity; we show that this requires a trivial system or equal ratios among groups.

Given Bayes’ theorem, we obtain that:

$$\begin{aligned} Pr(y=1 | {\tilde{y}}=1, \mathcal {C}) = \frac{Pr({\tilde{y}}=1 | y=1, \mathcal {C}) Pr(y=1 | \mathcal {C})}{Pr({\tilde{y}}=1 | \mathcal {C})} \end{aligned}$$
(7)

If predictive parity is satisfied, then:

$$\begin{aligned}&|Pr(y = 1 | {\tilde{y}} = 1, C_1 ) - Pr(y = 1 | {\tilde{y}} = 1, C_2 )| < \epsilon \overset{\text {Equation}\,7}{\implies } \\&\quad \Big | \frac{Pr({\tilde{y}} = 1 | y = 1, C_1)\, Pr(y=1 | C_1) }{Pr({\tilde{y}} = 1 |C_1)} - \frac{Pr({\tilde{y}} = 1 | y = 1, C_2)\, Pr(y=1 | C_2) }{Pr({\tilde{y}} = 1 | C_2)} \Big | < \epsilon \end{aligned}$$

Assuming that equalised odds on the TGR (\(| TGR_{C_1} - TGR_{C_2} | < \epsilon\)) and statistical parity (\(|Pr({\tilde{y}} = 1 | C_1 ) - Pr({\tilde{y}} = 1 | C_2 )| < \epsilon\)) hold, the previous expression becomes:

$$\begin{aligned} \Big | \frac{Pr({\tilde{y}} = 1 | y = 1) [Pr(y=1 | C_1)-Pr(y=1 | C_2)]}{Pr({\tilde{y}} = 1)} \Big | < \epsilon \end{aligned}$$

Given that \(Pr({\tilde{y}} | y ), Pr(y| \mathcal {C}), Pr({\tilde{y}}) \in [0, 1]\), predictive parity is only satisfied when:

$$\begin{aligned} | Pr({\tilde{y}} = 1 | y = 1) [Pr(y=1 | C_1)-Pr(y=1 | C_2)] | < \epsilon \end{aligned}$$

which implies:

$$\begin{aligned} Pr({\tilde{y}} = 1 | y = 1) < \epsilon \end{aligned}$$

or

$$\begin{aligned} | Pr(y=1 | C_1)-Pr(y=1 | C_2) | < \epsilon . \end{aligned}$$

On the one hand, if \(Pr({\tilde{y}} = 1 | y = 1) < \epsilon\), the condition is only met when \(TGR \approx 0\), which means that the system fails to correctly classify any genuine instance or is a trivial classifier, e.g. there are only impostor instances. On the other hand, \(| Pr(y=1 | C_1)-Pr(y=1 | C_2) | < \epsilon \implies Pr(y=1 | C_1) \approx Pr(y=1 | C_2)\), which means the three criteria can only hold simultaneously under equal ratios among groups. Both cases contradict the assumptions of the proposition. \(\square\)

Thus, we prove that the impossibility of fairness also holds for biometric systems.
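As a quick numerical illustration of Proposition 3.3.1 (with made-up base rates, not data from our experiments): if two groups share the same TGR and FGR, so that equalised odds holds, but have different genuine ratios, Bayes’ theorem forces their positive prediction rates and precisions apart, so statistical parity and predictive parity cannot both be met.

```python
# Synthetic illustration of Proposition 3.3.1 (made-up numbers, not experimental data).
# Both groups share TGR = 0.95 and FGR = 0.05, so equalised odds holds by construction,
# but the genuine base rates differ: 60% vs. 20%.
tgr, fgr = 0.95, 0.05
base_rates = {"C1": 0.60, "C2": 0.20}

for group, p_genuine in base_rates.items():
    p_pred_genuine = tgr * p_genuine + fgr * (1 - p_genuine)   # Pr(pred = 1)
    precision = tgr * p_genuine / p_pred_genuine               # Pr(y = 1 | pred = 1), Bayes
    print(group, round(p_pred_genuine, 3), round(precision, 3))

# Prints: C1 0.59 0.966  /  C2 0.23 0.826
# With unequal base rates, the positive prediction rates (statistical parity) and the
# precisions (predictive parity) cannot both be close while equalised odds holds.
```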

4 A biometric experiment: Why is this system biased?

Getting access to off-the-shelf biometric algorithms together with biometric and demographic data is challenging. Reasonably, biometric data are not available in the public domain due to privacy and consent policies. However, we obtained access to a finger vein dataset, together with its demographic data, developed by scholars at the University of Salzburg (Austria), which allows us to empirically prove the impossibility of fairness in biometrics. We examine four publicly available finger vein biometric systems to empirically demonstrate the impossibility of fairness in biometric systems (see Proposition 3.3.1).

These biometric algorithms aim to identify subjects through vascular patterns of the human body, i.e. finger, palm or eye veins [90]. The four systems are evaluated in [27] to assess demographic bias through differences in the statistics (mean and standard deviation) of genuine and impostor score distributions. The conclusion reached is that statistically significant biases in score distributions do not exist, and the authors proposed to evaluate this framework in the future with more individuals, given that the number of subjects in each of the databases is very low. Rather than reproducing their experiments with larger databases, we empirically demonstrate that this framework is biased from three different perspectives: (1) ratios, (2) fairness criteria, and (3) intersectionality.

4.1 Ratios

Broken links and APIs hampered access to three of the four publicly available datasets [61, 89, 93]. Through an online request, we obtained access to PLUSVein-FV3 [56].Footnote 7, Footnote 8 This database contains 1440 finger vein images from the hands and fingers of 60 individuals. Figure 2 shows the number of individuals by age, raceFootnote 9 and gender.Footnote 10 We observe that the mean age within this dataset is 37.9 years, whilst \(Q_3\)Footnote 11 is 46.5 years, which implies that the database is not representative of older adults. Analysing the gender feature, we observe that the rates are unbalanced: 60% males and 40% females. Moreover, this feature does not consider gender expressions other than binary ones. Europeans are the most represented race in PLUSVein-FV3: 90% European, 5% East Asian, 1.6% Central Asian, 1.6% ‘Mulatto’, 1.6% African. The proportion of non-European individuals is significantly low (see Fig. 2).
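These summary statistics can be recomputed from the dataset’s subject metadata. A minimal pandas sketch, assuming a hypothetical `metadata.csv` with one row per subject and `age`, `gender` and `race` columns (the actual PLUSVein-FV3 file layout may differ):

```python
import pandas as pd

# Hypothetical metadata file; the column names are assumptions, not the PLUSVein-FV3 schema.
meta = pd.read_csv("metadata.csv")   # one row per subject: age, gender, race

print(meta["age"].mean(), meta["age"].quantile(0.75))      # mean age and Q3
print(meta["gender"].value_counts(normalize=True))         # gender ratios
print(meta["race"].value_counts(normalize=True))           # race ratios
age_group = pd.cut(meta["age"], bins=[0, 45, 120], labels=["young", "old"])
print(meta.groupby(["gender", age_group]).size())          # intersectional counts
```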

Remarkably, ‘Mulatto’ is a label proposed for this category which is not linked to any continent. Whilst European, East Asian and Central Asian are race labels related to geographical regions, ‘Mulatto’ was a label used during the Spanish colonial period to mark the slave status of children born to Spaniards and enslaved African women. This category exposes the colonial and racial legacy of biometrics through a conceptualisation of racialised and colonial bodies. ‘Mulatto’ is a clear expression of the making of bodies through their qualities of ‘colour’ and colonial slavery [16, 42]. In this paper, we contend that the literature in biometrics creates race categories to calculate demographic biases without critically analysing how these systems are embedded in structural mechanisms such as borders, the intention of which is to discriminate [2]. As other critical scholars have pointed out: ‘[t]reating race as an attribute, rather than a structural, institutional, and relational phenomenon, can serve to minimise the structural aspects of algorithmic unfairness’ [49].

Fig. 2

Intersectional ratios of age, race and gender of individuals in PLUSVein-FV3 [56]. There are large disparities among groups: more males (M) than females (F), more young than old adults, and a majority of Europeans. The proposed race labels lack diversity, and one of them (‘Mulatto’) is archaic and offensive

Bias in the demographic groups of PLUSVein-FV3 is also evident (see Fig. 2). Any biometric system trained on this dataset will perform better on individuals whose demographic characteristics are widely represented in the database, i.e. young male Europeans [18]. The performance of the biometric system will be poorer on individuals who are under-represented: non-Europeans, older adults and females. Consequently, given that the majority of samples are taken from European males, the rates of genuine and impostor outcomes will be significantly unequal across demographic groups. Thus, as previously demonstrated (Proposition 3.3.1), equalised odds, statistical parity and predictive parity cannot hold together.

4.2 Fairness criteria

The experiments are run using the four biometric systems from [27] and the PLUSVein-FV3 dataset. These systems are designed using different types of vein recognition schemes: LBP [35], MC [67], PC [21], and SIFT [55, 96]. These methods follow different mechanisms of finger vein recognition. Whilst LBP is a pattern-based method that identifies vein textures using filters, MC and PC extract patterns by analysing curvatures. SIFT identifies keypoints in the image that guide the algorithm to recognise veins.

Table 1 shows the results of these four biometric systems on the PLUSVein-FV3 dataset. In general, the error rates of these systems are very low (see MC, PC and SIFT). However, we observe that LBP has the worst performance, obtaining FIR values of 0.89 and 0.78 at \(FGR_{1000}\) and \(FGR_{100}\) respectively. In this case, when the false match rate is very low (\(\downarrow FGR\)), the false non-match rate is extremely high (\(\uparrow FIR\)). Nevertheless, the aim of this section is to empirically demonstrate the impossibility of fairness in biometric systems (see Proposition 3.3.1). To do so, we calculate three fairness criteria (equalised odds, statistical parity and predictive parity) on three demographic attributes (age, gender, and race). Demographic groups are categorised as: young (\(\le 45\)) and old (\(>45\)), male and female, and European and non-European.

Table 1 Error rates of biometric methods

As de Freitas Pereira and Marcel observed, several works on fairness in biometrics set a different decision threshold \(\tau\) for each demographic group. However, they argue that this is a ‘serious flaw’ and ‘can give a false impression that a biometric verification system is fair’ [36, p. 3]. They conclude that ‘[f]air biometric recognition systems are fair if a decision threshold \(\tau\) is “fair” for all demographic groups with respect to FGR(\(\tau\)) and FIR(\(\tau\))’ [36, p. 10]. Following this suggestion, the fairness metrics are calculated after setting the same decision threshold for all demographic groups, at two operating points: \(FGR_{1000}\) and near ZFIR.
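Concretely, this evaluation protocol can be sketched as follows (an illustrative re-implementation with hypothetical names, not the exact code behind Tables 2 and 3; it assumes each comparison is tagged with the demographic attribute of the subject it involves): the threshold is fixed once on the pooled scores at the target operating point, and the per-group error rates are then read off at that shared threshold.

```python
import numpy as np

def per_group_rates_at_shared_threshold(scores, labels, groups, target_fgr=0.001):
    """Fix a single threshold for all groups (as argued in [36]) and
    report each group's FGR and FIR at that shared operating point."""
    scores, labels, groups = (np.asarray(a) for a in (scores, labels, groups))

    # Shared threshold: the (1 - target_fgr) quantile of the pooled impostor scores,
    # so that roughly a fraction target_fgr of impostor comparisons exceed it.
    tau = np.quantile(scores[labels == 0], 1 - target_fgr)

    report = {}
    for g in np.unique(groups):
        m = groups == g
        preds = (scores[m] > tau).astype(int)
        y = labels[m]
        report[g] = {
            "FGR": preds[y == 0].mean() if (y == 0).any() else np.nan,        # false matches
            "FIR": (preds[y == 1] == 0).mean() if (y == 1).any() else np.nan,  # missed matches
        }
    return tau, report
```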

Table 2 Biometric recognition performance as measured by fairness criteria differences among three demographic groups at \(FGR_{1000}\)
Table 3 Biometric recognition performance as measured by fairness criteria differences among three demographic groups at \(\sim ZFIR\)

The overall recognition performance results show that the four biometric systems are unfair under the three fairness definitions across the three demographic attributes assessed.Footnote 12 These results contradict the claim that these systems lack demographic bias. Moreover, the results in Table 3 support our theoretical framework on the impossibility of fairness in biometric systems (Proposition 3.3.1). We observe how the decision threshold value (\(\tau\)) determines which fairness criteria are satisfied. On the one hand, setting \(\tau\) at \(FGR_{1000}\), equalised odds and statistical parity are not satisfied, yet predictive parity is achieved (Table 2). On the other hand, when \(\tau\) is set at \(\sim ZFIR\), predictive parity is not held but equalised odds and statistical parity are achieved (Table 3).

The empirical results at \(FGR_{1000}\) clearly support Chouldechova’s observation in [22, p. 157] about predictive parity, FGR and FIR. When a system satisfies predictive parity but ratios differ across demographic groups, the system cannot achieve equal FGR and FIR across those groups. Thus, if FGRs are not similar across the categories of the demographic attribute scrutinised, equalised odds cannot be satisfied. Analysing disparities among groups, older adults obtain worse results than young adults, which could be a consequence of the low proportion of older adults in the dataset. Gender also shows significantly unfair results. For instance, MC and PC obtain higher FGR on females than on males (421.18% and 198.8% higher, respectively). The fairness criteria are also worse for non-Europeans than for Europeans.

On the other hand, setting the threshold near ZFIR (\(\sim ZFIR\)) implies that predictive parity is not satisfied. These results clearly demonstrate Proposition 3.3.1, which states that the three fairness definitions cannot hold simultaneously.Footnote 13 We observe that age and gender, rather than race, exhibit the widest differences. In this case, old people and female individuals obtain larger error rates than young people and males respectively.

4.3 Intersectionality

Fig. 3

Intersectional fairness disparities in four biometric recognition systems based on age and gender. The decision threshold (\(\tau\)) is set at \(\sim ZFIR\) (first dashed line) and \(FGR_{1000}\) (second dashed line). The distributions of genuine (orange) and impostor (purple) scores differ among groups, and classification disparities across intersectional demographic groups are apparent. Overall, young males obtain better performance than the other groups

Following the intersectional benchmark proposed in Gender Shades [18], we evaluate the results of the four biometric recognition systems on intersectional groups. The results show that disparities are more prominent for age and gender than for race. Unlike facial recognition, finger vein systems do not take skin tone into account and their performance is more affected by other attributes such as finger size. Moreover, the wide differences in ratios between Europeans and non-Europeans (see Fig. 2) also impact these results.

Figure 3 shows the distribution of genuine and impostor scores across four groups (young females, old females, young males and old males) at two decision thresholds. In all four biometric systems, young males obtain better scores than the other groups. Their distribution of genuine scores (orange box) is shifted further to the right, which results in a higher number of correct matches (\(\uparrow TGR\)). In contrast, their impostor scores (purple box) are shifted less to the left, which implies higher rates of false matches (\(\uparrow FGR\)). The genuine distributions of young males obtain the highest lower quartile (\(Q_1\)) in all four systems. Young females obtain higher genuine scores than old females and old males. Overall, all biometric systems perform worst on elderly males. For instance, the \(Q_1\) of PC’s genuine distribution for old males is the only one below \(\tau\) at \(FGR_{1000}\), which implies that the FIR of this group is substantially higher than that of the other groups. The distribution of impostor scores is similar across the four demographic groups and biometric systems. Impostor distributions are narrower than genuine distributions, although the number of outliers is considerable (see LBP’s distributions). Setting \(\tau\) at \(\sim ZFIR\), young males obtain a higher TIR than females and older adults.
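For readers who wish to reproduce this kind of inspection, the sketch below (our own illustration; the grouping variables and thresholds are assumptions, with each comparison tagged by the demographics of the subject it involves) computes the quartiles of genuine and impostor scores per intersectional (gender, age) cell, which is essentially what the box-style distributions in Fig. 3 summarise:

```python
import numpy as np

def intersectional_quartiles(scores, labels, gender, age, young_max=45):
    """Quartiles (Q1, median, Q3) of genuine and impostor scores per (gender, age) cell."""
    scores, labels, gender, age = (np.asarray(a) for a in (scores, labels, gender, age))
    age_group = np.where(age <= young_max, "young", "old")

    stats = {}
    for g in np.unique(gender):
        for a in ("young", "old"):
            cell = (gender == g) & (age_group == a)
            for name, label in (("genuine", 1), ("impostor", 0)):
                s = scores[cell & (labels == label)]
                if s.size:
                    stats[(g, a, name)] = np.percentile(s, [25, 50, 75])
    return stats
```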

5 The politics beyond fairness in biometric systems

Biometric systems will generally be biased, as we have previously demonstrated. Yet these systems are becoming more accurate and fairer.Footnote 14 The percentage of errors and biases has been decreasing consistently over the years. Private companies and research centres are training their systems with better image quality and more sophisticated algorithmic architectures, which outperform previous versions. For instance, the latest publication released by NIST [47] reported that the best vendor’s facial recognition system has negligible errors with VISA border photos (\(FGR_{10^6} = 0.0023\)). Moreover, its analysis of demographic disparities shows no significant differences. Moving beyond myths about the technical details of algorithmic systems, we argue that a critical approach to fairness in biometrics, coined ‘critical biometric consciousness’ by the scholar Simone Browne [16, p. 116], should also shift attention towards the political contexts and racialised mechanisms in which biometrics are embedded [7, 20].

Since 2001, the EU has established a digital border infrastructure for migration control [17, 74]. The European Asylum Dactyloscopy Database (EURODAC), the Visa Information System (VIS), the Schengen Information System (SIS II) and the new Entry–Exit System (EES) are the four main databases that Member States use to determine responsibility for examining an asylum application and to register visa applications or border crossings. These databases are provided with biometric systems in order to register, identify and criminalise migrants [64, 86, 87]. Through these databases, migrants who have entered the Schengen area are registered and identified through their fingerprints. The storage of biometric samples in these databases is used to verify the identity of asylum seekers in countries which ‘are not responsible’ for their asylum petition (EURODAC) or to detect people whose visa has expired (EES and VIS). However, biometric traces provided by these systems are also used in other contexts, such as asylum tribunal decisions. In May 2019, the First-tier Tribunal in the UK refused an asylum petition from an Iraqi national of Kurdish origin due to biometric evidence provided by one of these systems [91], alongside other ‘discrepancies’ in his narrative.Footnote 15 The asylum seeker claimed that he was at risk of serious harm, stating that one family member had been killed and his own house had been intentionally set on fire. He appealed against the decision, but the Deputy Upper Tribunal Judge did not set aside the previous decision:

A further document was relied upon by the Respondent at that hearing, being a EURODAC search result, demonstrating that a person in the Appellant’s identity was fingerprinted in Dresden in Germany on 22 March 2016. The Appellant’s account as given in his Statement of Evidence (SEF) interview and confirmed in oral evidence before the judge was that he only left Iraq in December 2017. The Appellant denied before the judge that the person identified in the EURODAC search was him but the judge stated that she was satisfied by the details contained within the document, and looking at the clear photograph on the EURODAC match, that the person fingerprinted in Germany was indeed the Appellant. [...] The evidence provided by the EURODAC document is unequivocal. The photograph contained within the document is clearly the Appellant who appeared before me and I have no reason to doubt that the document relates to him. Therefore, this document upon which I am satisfied I can place significant weight puts him in Germany on 22 March 2016. Consequently, I find that this evidence undermines the credibility of his entire account and his credibility as a witness in his own cause and that it renders his entire account unreliable. [91, p. 2]

This case unveils how biometric systems are used nowadays as algorithmic evidence at the border to refuse and fail asylum seekers. As Browne has argued, biometrics are a technology ‘to make the mute body disclose the truth of its racial identities’ and one ‘that can be employed to do the work of alienating the subject by producing a truth about the racial body and one’s identity (or identities) despite the subject’s claims’ [16, pp. 108–110]. The evidence given by EURODAC’s fingerprint system was placed above the person’s narrative in the hierarchy of truthfulness [9]. The asylum seeker was portrayed as a deceptive subject given that the biometric system unveiled the ‘truth’ of his whereabouts. The inconsistency encountered did not call into question the credibility of the technology, but rather the asylum seeker’s credibility. Moreover, this inconsistency became decisive in the UK judiciary’s refusal of his asylum petition.

Perhaps less intuitively, this asylum appeal also exposes the elephant in the room of biometrics. Whilst engineers are centring their efforts on training better performing, fairer and more equitable biometrics, the same systems are implemented at the border to deny asylum and push migrants back. As previously shown, we are witnessing a growing trend of fairness research in biometrics [26]. In the previous sections, we demystified the possibility of fairness in biometric systems, taking into account different definitions of algorithmic fairness. However, in this paper, we argue that part of the literature on fairness in biometrics neglects how these systems are implemented in real life for migration control. We also observe this trend beyond biometrics, with approaches proposed to better distribute asylum seekers within a country [4, 11, 25, 57]. However, algorithmic fairness cannot distribute justice in scenarios whose intended purpose is to discriminate and which consistently jeopardise fundamental rights. As Tendayi Achiume has argued: ‘[T]here can be no technological solution to the inequities of digital racial borders’ [2, p. 337]. Fairer biometrics and algorithmic solutions implemented at racialised borders conceal the injustices that these infrastructures reproduce. These social, political and historical injustices are the elephant in the room of biometrics, the controversial issue that is obvious but remains ignored and unmentioned in debates around borders, biometrics and fairness.

In April 2021, the European Commission published the AI Act, the first legal framework proposal to regulate artificial intelligence [33]. The scope of the AI Act is to address the risks associated with the use of such technology and to protect safety and fundamental rights. Within this document, biometrics is considered high-risk in the following areas: (i) biometric identification and categorisation of natural persons, (ii) migration, asylum and border control management, (iii) law enforcement and (iv) emotion recognition. The recent campaigns against mass surveillance using facial recognition organised by several organisations have probably played a key role in the regulation of biometric systems [14, 30]. Indeed, the proposal opts to ban the use of ‘real-time’ facial recognition in public spaces [94]. However, certain exceptions are considered regarding the use of biometric systems, such as the targeted search for specific potential victims of crime, a threat to the life or physical safety of natural persons or of a terrorist attack, or the perpetrator or suspect of a criminal offence. Interestingly, Article 83 exempts large-scale biometric systems used for migration control in the EU from this regulation:

This Regulation shall not apply to the AI systems which are components of the large-scale IT systems established by the legal acts listed in Annex IX that have been placed on the market or put into service before [12 months after the date of application of this Regulation referred to in Article 85(2)], unless the replacement or amendment of those legal acts leads to a significant change in the design or intended purpose of the AI system or AI systems concerned. [33, p. 88]

Despite the fact that the legal document categorises as high-risk the biometric systems used in the context of migration, border control management and law enforcement, the text above exposes that this same regulation does not apply to the four biometric databases (EURODAC, VIS, SIS II, and EES), which are used specifically for these purposes. The regulation will only apply when there is a ‘significant change’ within these systems, but the proposed document does not specify what a ‘significant change’ means. Thus, the AI Act will not entail any substantial legal change for migrants and asylum seekers who are screened by biometric systems upon arrival in Europe.

Whilst facial recognition technologies used by police authorities in public spaces have been banned by the European Parliament [70, 81], the use of fingerprints to immobilise migrants will remain legal and in the shadows of any public or political debate. As the EU Rapporteur, Petar Vitanov, said after the resolution was approved by the European institution: ‘This is a huge win for all European citizens’. Yet, fundamental rights for non-Europeans are relegated to the background. Asymmetries in legal, political and social rights have historically been confronted. In her classic Women, Race & Class, Angela Davis narrates the frictions between the (white) feminist movement and the enslavement of Black people. Describing the women’s rights campaign in the US, she highlights the advanced political position of an American abolitionist and women’s rights advocate: ‘But Angelina Grimke proposed a principled defence of the unity between Black Liberation and Women’s Liberation: “I want to be identified with the Negro”, she insisted. “Until he gets his rights, we shall never have ours.” ’ [24, p. 59]. Bringing Grimke’s political consciousness to our discussion about the regulation of biometrics, we suggest that until migrants get their digital and fundamental rights, we shall never have ours.

6 Conclusion

Fairness has emerged as a new research area in the context of biometrics. It aims at addressing and mitigating demographic biases in systems designed to identify or authenticate subjects based on bodily features (eyes, face, finger veins, among others). Yet, fairness has overshadowed the elephant in the room of the use of biometrics, the controversial issue which is obviously present but avoided as a subject of discussion. Biometrics has a long-standing colonial and racial legacy which is usually ignored by the biometric industry and research field. This heritage is still latent today in the implementation of biometric systems for migration control and law enforcement. Whilst the study of fairness revolves around ‘debiasing’ biometric systems, migrants’ fundamental rights are jeopardised by the use of this technology at the border.

In this paper, we argued that biometric systems are and will always be biased. Building on the literature on fairness in machine learning, we demonstrated theoretically that biometric systems cannot simultaneously satisfy different fairness definitions. Then, we empirically proved the impossibility of fair biometric systems. We also observed that the biometric dataset used to train the biometric systems reproduces the racialisation of bodies, under-representing non-Western subjects and using race categories that are colonial, archaic and offensive. Although our experiment is tested on only one dataset, due to limitations in accessing biometric and demographic data, the results clearly expose that biometric systems show differences in three fairness criteria at different decision thresholds. Yet, this paper has pushed the argument further, showing how the focus on the fairness of biometrics disregards the political discourse about the use of biometrics at the border. As we have shown, biometric systems are used nowadays by border and judicial authorities for migration control. The algorithmic decision is used to assess and challenge the narrative of the migrant or asylum seeker, and in the case of an inconsistency, the biometric output is positioned as the ground truth. Moreover, the proposed AI regulation by the EU that bans and categorises certain biometric systems will not apply to the large-scale databases that are used to immobilise and criminalise migrants. As J. Khadijah Abdurahman argues: ‘[I]t is not just that classification systems are inaccurate or biased, it is who has the power to classify, to determine the repercussions / policies associated thereof and their relation to historical and accumulated injustice?’ [1].

In conclusion, the use of fairness in algorithmic systems installed in social and political contexts whose principal and intended function is to discriminate displaces the breach of fundamental rights because the algorithm is ‘fair’. Fairer biometric systems embedded at the border will legitimise denials of asylum, push-backs or the policing of secondary movements. Moreover, the current debates around demographic biases and the ethics of artificial intelligence overshadow political, social and historical discrimination that was there before the technology. As Browne argues: ‘a critical biometric consciousness must acknowledge the connexions between contemporary biometric information technologies and their historical antecedents’ [16, p. 118]. As this trend becomes more prominent in the coming years, there is an urgent need to shift the debates towards the historical and political context in which most of these systems are embedded in the present.