There is an elephant in the room: towards a critique on the use of fairness in biometrics

The proliferation of biometric systems in our societies is shaping public debates around their political, social and ethical implications. Yet, whilst concerns about the racialised use of this technology have been on the rise, the field of biometrics remains unperturbed by these debates. Despite the lack of critical analysis, algorithmic fairness has recently been adopted by biometrics. Different studies have been published to understand and mitigate demographic bias in biometric systems, without analysing the political consequences. In this paper, we offer a critical reading of recent debates about biometric fairness and show its detachment from political debates. Building on previous fairness demonstrations, we prove that biometrics will always be biased. Yet, we claim, algorithmic fairness cannot distribute justice in scenarios which are broken or whose intended purpose is to discriminate. By focusing on demographic biases rather than examining how these systems reproduce historical and political injustices, fairness has overshadowed the elephant in the room of biometrics.


1 Introduction
Biometric systems are being designed and implemented by public and private organisations for law enforcement, migration control and security purposes. This technology is used to identify a person, or verify their identity, through physical, physiological or behavioural characteristics such as fingerprint, face, voice, gait or finger veins. Individual bodily characteristics are transformed into biometric data and used by the authorities as the truthful identity of an individual. Biometric systems are part of our daily lives: we use them to unlock our mobile phones or to 'efficiently' cross borders at airports. However, this technology has different consequences depending on a subject's citizenship status. For instance, under the Dublin Regulation, the EU extracts and stores the fingerprints of asylum seekers in large-scale databases to control border crossings and determine the country responsible for the asylum petition Queiroz (2019); Broeders (2007); Van der Ploeg (1999). In other words, biometrics are implemented to immobilise, control and obstruct migrants' movements within Member States through fingerprint identification Scheel (2013); Tazzioli (2019). The World Food Programme (WFP), in partnership with the United Nations High Commissioner for Refugees (UNHCR), implemented iris recognition to register migrants and provide cash assistance in refugee camps Aloudat et al. (2016). The use of biometrics in these scenarios has been widely criticised by academics, activists and human and digital rights organisations, who have argued that it 'undermines democracy, freedom and justice' EDRi (2020); Queiroz (2019). In the specific context of migration, biometric systems are used to illegalise freedom of movement through the Dublin Regulation, infringing a fundamental right. Yet fairness has entered biometrics as a new research area which aims at debiasing these systems, ignoring the historical, political and social context in which this technology is embedded.
This paper starts by questioning the 'emergent challenge' of fairness in biometrics. Since the publication of Gender Shades Buolamwini and Gebru (2018) in 2018, the field of biometrics has experienced a surge in studies on bias and disparate impact in algorithmic systems such as facial recognition, fingerprints, finger veins and iris recognition, among others Drozdowski et al. (2020). By examining the recent literature on fairness in biometrics, we observe a lack of engagement with the state of the art of fairness. Notably, a recent work on finger vein recognition systems has suggested a lack of bias for the tested biometric algorithms Drozdowski et al. (2021). Although finger vein systems have not been implemented for migration control, we propose to empirically demonstrate the impossibility of fairness by examining the systems used in this work. Rather than evaluating fairness using at least one of the multiple definitions proposed in recent years by fairness scholars Hutchinson and Mitchell (2019); Verma and Rubin (2018); Dunkelau and Leuschel (2019); Barocas et al. (2019), the authors analysed algorithmic discrimination through statistical differences in descriptive statistics of the score distributions based on gender, age, fingers and hands. Although this might seem a sensible way to evaluate demographic biases, we outline serious limitations. First, the decision threshold plays a key role in the assessment of fairness in biometric systems de Freitas Pereira and Marcel (2020). Second, intersectional demographic evaluation of gender and age must be assessed. Third, it has been proved that fairness definitions are mutually exclusive, so the lack of bias is technically impossible in any algorithmic system Garg et al. (2020); Chouldechova (2017); Zhao and Gordon (2019); Kleinberg (2018). In order to show these flaws, this paper proceeds by developing a theoretical framework, translating fairness definitions into biometrics. Building on previous works in fairness and machine learning, we theoretically prove the impossibility of unbiased biometric systems. Then, we empirically demonstrate that biometric systems are unfair. Moreover, we highlight that the dataset used to train these systems reproduces the racialisation of subjects, proposing race categories that are archaic and offensive and gender categories that are binary.
We argue that a critical questioning of the use of fairness in biometric systems should also focus on the historical, political and social contexts in which biometrics are deployed. Moving our discourse towards a 'critical biometric consciousness' Browne (2015), we analyse the case of an asylum appeal where the migrant's credibility was challenged in part due to an inconsistency between his testimony and a biometric trace stored in a database. Given this fact and other inconsistencies, the UK Deputy Upper Tribunal Judge denied his asylum claim, dismissing any other evidence provided by the asylum seeker. This case, we argue, unveils the elephant in the room of biometrics: the obvious fact that is being intentionally ignored or left unaddressed, showing how biometrics implemented at the border cannot be fair given that borders' intentional function is to discriminate. While academics, private companies and biometric engineers are centred on building more accurate and fairer biometrics, little attention is paid to how biometrics are jeopardising fundamental rights. Importantly, whilst the European Commission has proposed the first legal framework for artificial intelligence and biometrics, the Artificial Intelligence Act (AI Act hereinafter), to protect European citizens' digital and fundamental rights European Commission (2021), some biometric systems used at the border will be explicitly exempt from such regulation. Therefore, migrants' digital and fundamental rights will not obtain the same lawful protection. Since fairness in biometrics risks becoming more prominent in the coming years, we urge a critical and radical examination of this field. We suggest that we must also engage with and acknowledge the politics of current debates in biometrics, situated within a broader historical context of struggles against discrimination at the border, moving beyond technological dilemmas about ethics and biases. To the best of our knowledge, this is the first academic work that investigates fairness in biometrics from a critical perspective, and pushes the argument further by showing how debates about the core function of borders and the use of biometrics to criminalise migration are undermined by the focus on ethics and more equitable biometric systems.
2 Fairness in biometrics: An emergent challenge?
In 2018, Buolamwini and Gebru Buolamwini and Gebru (2018) published Gender Shades, an academic work that assessed bias in gender classification algorithms through facial recognition. They analysed several commercial gender classification models and found significant disparities based on individuals' characteristics: white skin had better results than dark skin, whilst males obtained better results than females. This intersectional benchmark opened up a new avenue of critical discourse towards biases in racialised technologies. It influenced public and academic debates towards the use of facial recognition Kantayya (2020); Benjamin (2019); O'Flaherty (2020), creating awareness about the risks of algorithms that encode and propagate historical, political and social biases. Consequently, it also disrupted the field of biometrics. Since the publication of Gender Shades Buolamwini and Gebru (2018), fairness has emerged as a major challenge within biometrics Drozdowski et al. (2020).
The main goal of fairness in the field of biometrics is to estimate and mitigate bias in systems that determine the identity or other characteristics, like the gender, of individuals Ross et al. (2019). Researchers have analysed demographic biases in facial recognition, fingerprints, palmprints, iris and even finger veins. Face recognition has notably been the most analysed system in the last decade, with gender and race the features most analysed. In general, males and white skins obtain higher biometric performance Lohr (2018); Acien et al. (2018); Serna et al. (2021); de Freitas Pereira and Marcel (2020). The annual reports published by the National Institute of Standards and Technology (NIST) Grother et al. (2019) also found error disparities based on gender and race in more than 189 commercial facial recognition systems that are used for border control. Similarly, other biometric systems such as iris recognition show noteworthy differences between females and males, with the former having higher error rates Fang et al. (2021). Biases also persist in fingerprint technologies, but rather than gender and race, age is the demographic feature most affected: fingerprint verification systems obtain higher error rates on children Marasco (2019); Preciozzi et al. (2020). Indeed, most of the studies focus on analysing race and gender rather than age. In this case, the performance of some biometric systems such as fingerprints, iris or finger vein might be more affected by age. More interestingly, researchers have suggested that 'statistically significant bias' on age and gender 'have not been detected' on five finger vein recognition algorithms tested on four datasets Drozdowski et al. (2020).
In spite of the numerous articles on fairness in biometrics that have been published recently, there is a notable disengagement from relevant previous works in fairness. First, these works analyse algorithmic bias without considering the more than 20 definitions of fairness in machine learning that have been proposed in the last decade Verma and Rubin (2018); Dunkelau and Leuschel (2019); Barocas et al. (2019). For instance, in Drozdowski et al. (2020) demographic bias is calculated by analysing differences in score distributions by groups (males vs. females or children vs. adults). However, differences in error rates, which are a standard approach to evaluating fairness, are not considered. Second, despite the popularity of Gender Shades Buolamwini and Gebru (2018) and its proposed benchmark, intersectionality is not considered in any study analysed. In general, bias is analysed taking only a single demographic feature into account (gender, race or age). Third, some studies also suggest a lack of bias in biometric systems, such as Drozdowski et al. (2020). However, different fairness works have demonstrated the impossibility of fairness, proving mathematically and empirically that fairness definitions are mutually exclusive Garg et al. (2020); Chouldechova (2017); Zhao and Gordon (2019); Kleinberg (2018). As a result, it has recently been shown that despite the efforts at debiasing biometric systems, bias still persists Grother et al. (2019).
Part of the literature on fairness in biometrics that we have examined ignores that these systems are nowadays impacting fundamental rights Castelvecchi (2020). For instance, the authors of Gender Shades clearly exposed that 'all evaluated companies provide a "gender classification" feature that uses the binary sex labels of female and male. This reductionist view of gender does not adequately capture the complexities of gender or address transgender identities' Buolamwini and Gebru (2018). In fact, the mere idea of gender classification algorithms infringes fundamental rights by designing for and targeting trans people, who are more likely to be 'misclassified'. Examining the performance of classification models with respect to gender will likely improve these models. Yet we ought to consider whether that is even a desirable goal. What benefit do gender classification algorithms bring to our societies? In this paper we focus the critique on the use of biometrics for migration control, which impacts asylum seekers' rights. Since 2000, the EU has implemented biometric systems to identify travellers for migration control Scheel (2013); Metcalfe and Dencik (2019); Amnesty International (2016). Judges, officials in migration administration and border guards, among others, currently make use of biometric technology in asylum cases or visa applications Glouftsios and Scheel (2021); Jones (2019) to make 'unequivocal' decisions. The output of the biometric systems is considered more reliable than migrants or asylum seekers, who are perceived as deceptive subjects. Thus, the algorithmic performance and error rates are never scrutinised.
Fairness and the study of bias have become the elephant in the room of biometrics. While effort is focused on solving the challenge of addressing demographic biases and designing 'fair' biometric systems, little attention is paid to the context in which they are used. These systems are nowadays reproducing political injustices which are linked to a colonial legacy that is ignored in contemporary biometrics Browne (2015). As Maguire argued Maguire (2009), fingerprints were used by a British officer at the Indian Civil Service to avoid fraud by colonial subjects. At the same time, Galton investigated 'the heritable characteristics of race' on racialised individuals in British prisons. In fact, biometrics 'offered 19th-century innovators more than the prospect of identifying criminals: early biometrics promised a utopia of bio-governmentality in which individual identity verification was at the heart of population control' Maguire (2009). In the 21st century, this approach to biometrics has evolved with the use of algorithms that automatically calculate matches. Yet its colonial legacy remains visible. At the border, these systems are used to identify racialised subjects. There, fairness has emerged to make the identification of asylum seekers 'fairer'.

3 A theoretical approach towards fairness in biometrics
Fairness is a contested concept that has been historically discussed by social scientists, legal experts and philosophers. Yet there is no consensus on what this concept means. This idea has been translated into mathematical definitions which quantify the fairness of algorithm-based models Hutchinson and Mitchell (2019), resulting in different formulations Verma and Rubin (2018); Dunkelau and Leuschel (2019); Barocas et al. (2019). In recent years, the research field of biometrics has also evaluated the bias and fairness of these systems, as well as developing novel methods for bias mitigation. Most of the studies that we have analysed do not take into account the most relevant state-of-the-art definitions of fairness in machine learning, and some of them argue that their experimental evaluation suggests a lack of bias in score distributions Drozdowski et al. (2021). Moreover, recent studies of fairness in machine learning have proved that it is generally impossible to satisfy several fairness criteria simultaneously Garg et al. (2020); Chouldechova (2017); Zhao and Gordon (2019); Kleinberg (2018).
In this section, we mathematically demonstrate the impossibility of bias-free biometric systems, showing that several fairness criteria can only be satisfied under very limited conditions.We introduce a formulation for biometric verification systems and translate fairness criteria into this formulation to finally demonstrate the incompatibility among them.

3.1 Formulation of a biometric verification system
In the discussion herein we focus on biometric verification systems. These systems confirm (or reject) whether a biometric sample belongs to a specific individual based on similarity to their learned representation. Formally, we use the following notation:
• x^(i): learned representation of the i-th biometric sample within the dataset D,
• y^(i,j): element (i, j) of the binary outcome vector y,
• θ: set of parameters of the biometric system,
• ỹ^(i,j) = f((x^(i), x^(j))|θ): element (i, j) of the binary prediction vector ỹ,
• τ: decision threshold,
• s^(i,j): score of similarity between samples i and j.
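As an illustration, the scoring and thresholding steps in this notation can be sketched as follows. This is a minimal sketch: cosine similarity is one common choice of score function and is our assumption here, not a detail taken from the systems studied in this paper.

```python
import numpy as np

def similarity(x_i, x_j):
    """Similarity score s^(i,j) between two learned representations."""
    return float(np.dot(x_i, x_j) /
                 (np.linalg.norm(x_i) * np.linalg.norm(x_j)))

def verify(x_i, x_j, tau=0.8):
    """Binary prediction ỹ^(i,j): 1 (genuine) iff s^(i,j) >= tau."""
    return int(similarity(x_i, x_j) >= tau)

# Identical representations verify as genuine; orthogonal ones do not.
match = verify(np.array([1.0, 0.0]), np.array([1.0, 0.0]))       # 1
non_match = verify(np.array([1.0, 0.0]), np.array([0.0, 1.0]))   # 0
```

The decision threshold τ, discussed below, is the free parameter that operators tune; everything in the fairness analysis hinges on where it is set.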
These systems employ deep learning algorithms which transform images into numerical representations. They are driven by an optimisation function that minimises the error between the prediction for two samples (x^(i) and x^(j)) and the outcome (y^(i,j)), which encodes whether they belong to the same individual or not:

θ* = argmin_θ Σ_(i,j) ℓ(f((x^(i), x^(j))|θ), y^(i,j)).

The biometric model compares two learned representations (x^(i) and x^(j)) and obtains a similarity score (s^(i,j)). This score is then translated into a binary prediction (ỹ^(i,j)) given a threshold τ:

ỹ^(i,j) = 1 if s^(i,j) ≥ τ, and ỹ^(i,j) = 0 otherwise.

The model learns the set of parameters (θ) that achieves the minimum number of 'miss-identifications'. This means that the binary prediction (ỹ^(i,j)) should be as similar as possible to the binary outcome (y^(i,j)). In the case of biometric verification systems, a pair of biometric samples is labelled as genuine (y^(i,j) = 1) if they correspond to the same individual and as impostor (y^(i,j) = 0) otherwise. Therefore, a true genuine (TG = Pr(ỹ^(i,j) = 1 | y^(i,j) = 1)) refers to samples that correspond to the same individual and are correctly matched; a true impostor (TI = Pr(ỹ^(i,j) = 0 | y^(i,j) = 0)) refers to samples that correspond to different individuals and are rejected; a false genuine (FG = Pr(ỹ^(i,j) = 1 | y^(i,j) = 0)) refers to different individuals whose samples are matched; and a false impostor (FI = Pr(ỹ^(i,j) = 0 | y^(i,j) = 1)) refers to samples of the same individual that are mismatched. Thus, incorrectly associating two subjects or failing to associate one subject are the errors that biometric systems commit. Formally, the true genuine rate (TGR), true impostor rate (TIR), false genuine rate (FGR) and false impostor rate (FIR) are formulated as:

TGR = Pr(ỹ = 1 | y = 1),  TIR = Pr(ỹ = 0 | y = 0),
FGR = Pr(ỹ = 1 | y = 0),  FIR = Pr(ỹ = 0 | y = 1),

so that TGR = 1 − FIR and TIR = 1 − FGR. In contrast to machine learning, biometric systems are evaluated by setting different decision thresholds (τ), which clearly affect the distribution of errors (see Figure 1). The Equal Error Rate (EER) is the value where the false genuine and false impostor rate curves intersect (FGR = FIR). The false genuine rate at 0.001 (FGR_1000) is the value of the false impostor rate when the false genuine rate is 0.001 (FGR = 0.001), and vice versa. Systems are also evaluated when one of these rates is 0 (ZFGR when FIR = 0, or ZFIR when FGR = 0). Thus, the decision threshold is set targeting different values of error.
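These threshold-dependent error rates, and the EER operating point where they intersect, can be computed directly from genuine and impostor score samples. The sketch below uses synthetic Gaussian scores as an assumption for illustration; real score distributions come from an actual biometric system.

```python
import numpy as np

def error_rates(genuine, impostor, tau):
    """FGR and FIR at decision threshold tau."""
    fgr = float(np.mean(impostor >= tau))  # impostors wrongly matched
    fir = float(np.mean(genuine < tau))    # genuines wrongly rejected
    return fgr, fir

def equal_error_rate(genuine, impostor):
    """Sweep candidate thresholds; return the tau where FGR ~ FIR."""
    taus = np.quantile(np.concatenate([genuine, impostor]),
                       np.linspace(0.0, 1.0, 1001))
    tau = min(taus, key=lambda t: abs(
        np.subtract(*error_rates(genuine, impostor, t))))
    fgr, fir = error_rates(genuine, impostor, tau)
    return tau, (fgr + fir) / 2

# Synthetic scores: genuine pairs score higher on average than impostors.
rng = np.random.default_rng(0)
gen = rng.normal(0.70, 0.10, 5000)
imp = rng.normal(0.40, 0.10, 5000)
tau, eer = equal_error_rate(gen, imp)  # tau falls between the two means
```

Moving τ up from the EER lowers FGR at the cost of FIR, and vice versa, which is exactly why the choice of operating point matters for the fairness evaluation that follows.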

3.2 Formulation of three fairness definitions for biometrics
A fairness measure is a mathematical function that quantifies and assesses biased systems and algorithmic discrimination. These measures aim at evaluating model performance across demographic groups (C = {C_1, C_2, . . ., C_n}) and ensuring that there is no disparate impact.

3.2.1 Equalised odds
A biometric system satisfies equalised odds if TGR and FGR are similar across demographic groups:

Pr(ỹ = 1 | y, C_1) = Pr(ỹ = 1 | y, C_2), for y ∈ {0, 1}.

3.2.2 Statistical parity (also group fairness or demographic parity) Dwork et al. (2012)
A biometric system satisfies statistical parity if the probability of being predicted genuine is similar across demographic groups. This definition is based on the predicted outcome. Mathematically, it is expressed as follows:

Pr(ỹ = 1 | C_1) = Pr(ỹ = 1 | C_2).

3.2.3 Predictive parity (also outcome test) Chouldechova (2017)
A biometric system satisfies predictive parity if the probability of being actually genuine, given a genuine prediction, is similar across demographic groups. More formally:

Pr(y = 1 | ỹ = 1, C_1) = Pr(y = 1 | ỹ = 1, C_2).
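The three criteria can be evaluated empirically as between-group gaps. The sketch below reduces the ε-style comparisons to absolute differences; the function names and the toy labels are ours, not taken from the works cited.

```python
import numpy as np

def rates(y_true, y_pred):
    """TGR, FGR, Pr(predicted genuine), and predictive parity value."""
    tgr = float(np.mean(y_pred[y_true == 1] == 1))  # Pr(ỹ=1 | y=1)
    fgr = float(np.mean(y_pred[y_true == 0] == 1))  # Pr(ỹ=1 | y=0)
    sp = float(np.mean(y_pred == 1))                # Pr(ỹ=1)
    pp = float(np.mean(y_true[y_pred == 1] == 1))   # Pr(y=1 | ỹ=1)
    return tgr, fgr, sp, pp

def fairness_gaps(y_true, y_pred, group):
    """Absolute gaps between two demographic groups for the criteria."""
    a = rates(y_true[group == 0], y_pred[group == 0])
    b = rates(y_true[group == 1], y_pred[group == 1])
    return {
        "equalised_odds": max(abs(a[0] - b[0]), abs(a[1] - b[1])),
        "statistical_parity": abs(a[2] - b[2]),
        "predictive_parity": abs(a[3] - b[3]),
    }

# Toy check: two groups with identical behaviour show zero gaps.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
gaps = fairness_gaps(y_true, y_pred, group)
```

A gap of zero for every criterion only occurs in degenerate situations like this toy example; the next subsection shows why all three cannot hold together in realistic conditions.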

3.3 The impossibility of unbiased biometric systems
In this section, we provide a theoretical framework to demonstrate that the previous fairness criteria are mutually exclusive. Building on previous proofs Garg et al. (2020); Chouldechova (2017); Zhao and Gordon (2019); Kleinberg (2018), we mathematically show that these three definitions can be simultaneously satisfied only under very unrealistic conditions (equal ratios among demographic groups, or a trivial or perfect biometric system). Thus, we prove the impossibility of any unbiased biometric system.
To simplify the notation, we assume that C = {C 1 , C 2 }.
Proposition 3.3.1. Given a non-trivial biometric system with unequal ratios among groups that satisfies equalised odds and statistical parity, predictive parity cannot hold.
Proof. By Bayes' theorem, we obtain that:

Pr(y = 1 | ỹ = 1, C_i) = Pr(ỹ = 1 | y = 1, C_i) · Pr(y = 1 | C_i) / Pr(ỹ = 1 | C_i).

If predictive parity is satisfied, then:

|Pr(ỹ = 1 | y = 1, C_1) · Pr(y = 1 | C_1) / Pr(ỹ = 1 | C_1) − Pr(ỹ = 1 | y = 1, C_2) · Pr(y = 1 | C_2) / Pr(ỹ = 1 | C_2)| < ε.

By equalised odds, Pr(ỹ = 1 | y = 1, C_1) ∼ Pr(ỹ = 1 | y = 1, C_2) = TGR, and by statistical parity, Pr(ỹ = 1 | C_1) ∼ Pr(ỹ = 1 | C_2), so the condition reduces to TGR · |Pr(y = 1 | C_1) − Pr(y = 1 | C_2)| < ε (up to a common factor). On the one hand, if Pr(ỹ = 1 | y = 1) < ε, we obtain that the condition is only satisfied when TGR = 0, which means that the system fails to correctly classify any genuine instance or the system is a trivial classifier, e.g. there are only impostor instances. On the other hand, |Pr(y = 1 | C_1) − Pr(y = 1 | C_2)| < ε implies Pr(y = 1 | C_1) ∼ Pr(y = 1 | C_2), which means that the condition is satisfied only under equal ratios. Both cases contradict the assumptions of Proposition 3.3.1, so predictive parity cannot hold.
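A quick numerical check of this argument: holding TGR and FGR fixed across two groups (so equalised odds holds) while their genuine/impostor ratios differ forces different predictive parity values via Bayes' theorem. The numbers below are illustrative only, not taken from any of the systems discussed.

```python
def ppv(tgr, fgr, base_rate):
    """Pr(y=1 | ỹ=1) via Bayes' theorem, given group error rates."""
    p = base_rate
    return tgr * p / (tgr * p + fgr * (1 - p))

# Equalised odds holds: identical TGR and FGR for both groups...
tgr, fgr = 0.95, 0.01
# ...but the genuine/impostor ratios (base rates) differ across groups.
p1, p2 = 0.50, 0.05
ppv1, ppv2 = ppv(tgr, fgr, p1), ppv(tgr, fgr, p2)
# ppv1 is close to 0.99 while ppv2 drops to about 0.83:
# predictive parity fails despite equal error profiles.
```

Only when the base rates coincide (equal ratios) or TGR = 0 (a trivial system) do the two values agree, which is exactly the content of Proposition 3.3.1.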
4 A biometric experiment: why this system is and always will be biased
We examine four biometric systems (finger veins) to empirically demonstrate the impossibility of fairness (see Proposition 3.3.1). These biometric algorithms identify subjects through vascular patterns of the human body, i.e. finger, palm or eye veins Uhl et al. (2020). These four systems are proposed in Drozdowski et al. (2021) to assess demographic bias, where differences in score distribution statistics (mean and standard deviation) of genuine and impostor attempts are evaluated. The conclusion reached is that statistically significant biases in score distributions do not exist, and the authors proposed to evaluate this framework in the future with more individuals, given that the number of subjects in each of the databases is very small. Rather than reproduce their experiments with larger databases, we empirically demonstrate that this framework is biased from three different perspectives: (1) ratios, (2) fairness criteria, and (3) intersectionality analysis.

4.1 Ratios
Broken links and APIs hampered access to three of the four publicly available datasets Lu et al. (2013); Ton and Veldhuis (2013); Vanoni et al. (2014). Through an online petition, we gained access to PLUSVein-FV3 Kauba et al. (2018). This database contains 1440 finger vein images of 60 individuals, from different hands and fingers. Figure 2 shows the number of individuals based on age, race and gender. We observe that the mean age within this dataset is 37.9 years, whilst Q_3 is 46.5 years, which implies that the database is not representative of older people. Europeans are the most represented race in PLUSVein-FV3: 90% European, 5% East Asian, 1.6% Central Asian, 1.6% 'Mulatto', 1.6% African. The proportion of non-European individuals is significantly low (see Figure 2). Remarkably, 'Mulatto' is a label proposed for this demographic which is not linked to any continent. Whilst European, East Asian and Central Asian correspond to race labels related to geographical expressions, 'Mulatto' was a label used during the Spanish colonial period to mark the slave status of children born to Spaniards and enslaved African women. This category exposes the colonial and racial legacy of biometrics through a conceptualisation of racialised and colonial bodies. 'Mulatto' is a clear expression of the making of bodies through their qualities of 'colour' and colonial slavery Browne (2015); Gilroy (2000). Analysing the gender feature, we observe that the dataset is more balanced: 60% males and 40% females. Yet this feature does not consider gender expressions other than binary ones. Bias across demographic groups in PLUSVein-FV3 becomes evident in Figure 2. Any biometric system trained on this dataset will perform better on individuals whose demographic characteristics are widely represented in the database, i.e. young male Europeans. The performance of the biometric system will be poor on individuals who are under-represented: non-Europeans, the elderly and females. Consequently, given that the majority of samples are taken from European males, the ratios of genuine and impostor outcomes will be significantly unequal across demographic groups. Thus, as previously demonstrated (Proposition 3.3.1), equalised odds, statistical parity and predictive parity cannot hold together.
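The unequal genuine/impostor ratios can be made concrete by counting comparison pairs per group. The sketch below assumes, for illustration only, that the 1440 images split evenly into 24 samples per subject, and it counts only within-group impostor comparisons; neither assumption is stated in the dataset description above.

```python
# Hypothetical split mirroring the proportions reported for
# PLUSVein-FV3 (60 subjects, 90% European).
subjects = {"European": 54, "non-European": 6}
samples_per_subject = 24  # assumed: 1440 images / 60 subjects

def pair_counts(n_subjects, k):
    """Genuine and (within-group) impostor comparison counts."""
    genuine = n_subjects * k * (k - 1) // 2            # same-subject pairs
    total = n_subjects * k * (n_subjects * k - 1) // 2
    return genuine, total - genuine

ratios = {}
for name, n in subjects.items():
    g, i = pair_counts(n, samples_per_subject)
    ratios[name] = g / (g + i)  # Pr(y=1) within the group
# The minority group's genuine/impostor ratio is roughly nine times
# larger, so by Proposition 3.3.1 the criteria cannot all be satisfied.
```

The point is structural rather than numerical: any imbalance in group sizes produces unequal base rates of genuine attempts, which is precisely the "unequal ratios" condition of the impossibility result.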

4.2 Fairness criteria
The experiments are run using four finger vein recognition systems, designed using different types of vein recognition schemes: LBP, MC, PC and SIFT. Table 1 shows the results on PLUSVein-FV3. In general, the error rates of these systems are very low (see MC, PC and SIFT). However, we observe that LBP has the worst performance, obtaining an FIR of 0.89 and 0.78 at FGR_1000 and FGR_100 respectively. In this case, if the rate of false matches is very low (↓ FGR), the false non-match rate is extremely high (↑ FIR). Nevertheless, the aim of this section is to demonstrate that biometric systems are and always will be biased. To do so, we calculate three fairness criteria (equalised odds, statistical parity and predictive parity) on three different demographics (age, gender, and ethnicity). Demographic groups are categorised as: young (≤ 45) and old (> 45), male and female, and European and non-European.

Table 2: Biometric recognition performance as measured by fairness criteria differences among three demographic groups at FGR_1000. All systems are consistently unfair, showing significant equalised odds and statistical parity differences for age, gender and race.

Table 1: Overall recognition performance (EER and FGR-based operating points) of the four finger vein systems on PLUSVein-FV3.
As de Freitas Pereira and Marcel observed de Freitas Pereira and Marcel (2020), several works on fairness in biometrics set a single τ for every demographic group. However, they argue that this is a 'serious flaw' and 'can give a false impression that a biometric verification system is fair' (de Freitas Pereira and Marcel, 2020, p. 3). They conclude that '[f]air biometric recognition systems are fair if a decision threshold τ is "fair" for all demographic groups with respect to FGR(τ) and FIR(τ)' (de Freitas Pereira and Marcel, 2020, p. 10). Following this suggestion, fairness metrics are calculated after setting the same decision threshold for the three demographic groups: at FGR_1000 and near ZFIR.
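Setting a single shared τ at a target operating point and then reading off per-group error rates can be sketched as follows. The scores are synthetic, and the quantile-based threshold choice is one simple way, assumed here, to hit a target overall FGR.

```python
import numpy as np

def threshold_at_fgr(impostor_scores, target_fgr=0.001):
    """Tau keeping the overall FGR at about the target
    (target_fgr=0.001 corresponds to the FGR_1000 operating point)."""
    return float(np.quantile(impostor_scores, 1.0 - target_fgr))

def per_group_rates(scores, labels, groups, tau):
    """FGR and FIR per demographic group at a single shared tau."""
    out = {}
    for g in np.unique(groups):
        m = groups == g
        gen = scores[m & (labels == 1)]
        imp = scores[m & (labels == 0)]
        out[g] = {"FGR": float(np.mean(imp >= tau)),
                  "FIR": float(np.mean(gen < tau))}
    return out

# Synthetic scores: group B's genuine scores are systematically lower.
rng = np.random.default_rng(1)
n = 4000
scores = np.concatenate([
    rng.normal(0.80, 0.05, n),  # group A, genuine
    rng.normal(0.40, 0.05, n),  # group A, impostor
    rng.normal(0.65, 0.05, n),  # group B, genuine
    rng.normal(0.40, 0.05, n),  # group B, impostor
])
labels = np.concatenate([np.ones(n), np.zeros(n), np.ones(n), np.zeros(n)])
groups = np.array(["A"] * (2 * n) + ["B"] * (2 * n))
tau = threshold_at_fgr(scores[labels == 0])
by_group = per_group_rates(scores, labels, groups, tau)
# One shared tau, unequal burden: group B's FIR exceeds group A's.
```

This is the flaw the quoted authors warn about in miniature: the overall FGR sits at the target, yet the rejection burden falls unevenly on the group whose genuine scores are lower.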
The overall recognition performance results show that the four biometric systems are unfair considering the three fairness definitions on the three demographic groups assessed. These results contradict the idea that these systems lack demographic bias. Moreover, the results in Table 3 support our theoretical framework on the impossibility of unbiased biometric systems (Proposition 3.3.1). We observe how the decision threshold value (τ) affects which fairness criteria are satisfied. On the one hand, setting τ at FGR_1000, both equalised odds and statistical parity are not satisfied, yet predictive parity is achieved (Table 2). On the other hand, when τ is set at ∼ZFIR, predictive parity no longer holds, and equalised odds and statistical parity are still not achieved (Table 3).
The empirical results at FGR_1000 clearly support Chouldechova's observation in (Chouldechova, 2017, p. 157) about predictive parity, FGR, and FIR. When a system satisfies predictive parity but ratios differ across demographic groups, the system cannot achieve equal FGR and FIR across those groups. Thus, if the FGRs are not similar, equalised odds cannot be satisfied. Analysing disparities among groups, old adults obtain worse results than young adults, which could be a consequence of the low proportion of old adults in the dataset. Gender also shows significantly unfair results. For instance, MC and PC obtain higher FGR on females than on males (421.18% and 198.8% higher, respectively). Fairness criteria are also worse for non-Europeans than for Europeans.
On the other hand, setting thresholds near ZFIR (∼ZFIR) implies that predictive parity is not satisfied. These results clearly demonstrate Proposition 3.3.1, which states that the three fairness definitions cannot hold simultaneously. We also observe that age and gender obtain wider differences: old people and female individuals obtain larger error rates than young people and males, respectively. Given the intersectional benchmark proposed in Gender Shades Buolamwini and Gebru (2018), we evaluate the results of the four biometric recognition systems based on intersectional groups. The results show that disparities are more prominent for age and gender than for ethnicity. Unlike facial recognition, finger vein systems do not take skin tones into account, and their performance is more affected by other attributes such as the size of fingers. Moreover, the wide differences in ratios between Europeans and non-Europeans (Figure 2) also impact these results.
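An intersectional evaluation in the spirit of Gender Shades crosses attributes instead of analysing them one at a time. A small sketch with hypothetical helper names (the age cut-off of 45 matches the grouping used above; everything else is illustrative):

```python
import numpy as np

def intersectional_groups(age, gender, age_cut=45):
    """Cross age band and gender into labels like 'young-female'."""
    band = np.where(np.asarray(age) <= age_cut, "young", "old")
    return np.array([f"{b}-{g}" for b, g in zip(band, gender)])

def fir_by_group(genuine_scores, group_labels, tau):
    """False impostor rate of genuine attempts, per group."""
    s = np.asarray(genuine_scores)
    labels = np.asarray(group_labels)
    return {g: float(np.mean(s[labels == g] < tau))
            for g in np.unique(labels)}

groups = intersectional_groups([30, 50, 40, 60],
                               ["female", "female", "male", "male"])
# groups: young-female, old-female, young-male, old-male
```

Reporting error rates over these crossed groups can expose disparities, e.g. for old females, that averaging over age or gender alone would mask.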
Figure 3 shows the distribution of genuine and impostor scores across four groups (young females, old females, young males and old males) at two decision thresholds. On the four biometric systems, young male scores obtain better results than the other groups. Their distribution of genuine scores (orange box) is more shifted to the right, which results in a higher number of correct matches (↑ TGR). In contrast, their impostor scores (purple box) are less shifted to the left, which implies higher rates of false matches (↑ FGR). Genuine distributions of young males obtained the highest lower quartile (Q1) on the four systems. Young females obtain higher genuine scores than old females and old males. Overall, all biometric systems perform worst on elderly males. For instance, Q1 of PC's genuine distribution for old males is the only one below τ at FGR_1000. This implies that the FIR of this group is substantially higher than that of other groups. The distribution of impostor scores is similar across the four demographic groups and biometric systems. Impostor distributions are narrower than genuine distributions. However, the number of outliers is considerable (see LBP's distributions). Setting τ at ∼ZFIR, young males obtain a higher TIR than females and elderly adults.

5 Politics beyond fairness in biometric systems
Biometric systems will always be biased, as we have previously demonstrated. Yet these systems are getting more accurate and fairer. The percentage of errors and biases has decreased consistently over the years. Private companies and research centres are training their systems with better image quality and more sophisticated algorithmic architectures which outperform previous versions. For instance, the latest publication released by NIST Grother et al. (2021) reported that the best vendor's facial recognition has negligible errors with VISA Border Photos (FGR_{10^6} = 0.0023). Moreover, the analysis of demographic disparities shows no significant differences. In addition to dispelling myths about the technical details of algorithmic systems, we argue that a critical approach to fairness in biometrics, coined 'critical biometric consciousness' by Simone Browne (Browne, 2015, p. 116), should also shift attention towards the political context and racialised mechanisms in which biometrics are embedded Amoore (2021); Castelvecchi (2020).
Since 2001, the EU has established a digital border infrastructure for migration control Broeders (2007); Jones (2019). The European Asylum Dactyloscopy Database (EURODAC), the Visa Information System (VIS), the Schengen Information System (SIS II), and the new Entry-Exit System (EES) are the four main databases that EU countries use to determine responsibility for examining an asylum application, register visa applications or record border crossings. These systems are equipped with biometric systems in order to register, identify and criminalise migrants Tazzioli (2019); Stenum (2017); Metcalfe and Dencik (2019). Through these databases, migrants who have entered an EU country are identified through their fingerprints. The biometric samples stored in these databases are used to identify asylum seekers in countries which 'are not responsible' for their asylum petition (EURODAC) or to detect people whose visas have expired (EES and VIS). However, biometric traces provided by these systems are also used in other contexts, such as asylum tribunal decisions. In May 2019, the First-tier Tribunal in the UK refused the asylum petition of an Iraqi national of Kurdish origin due to biometric evidence given by one of these systems (Immigration and Chamber), alongside other discrepancies. The asylum seeker claimed that he was at risk of serious harm, stating that one family member had been killed and his own house had been intentionally set on fire. When he appealed against the decision, the Deputy Upper Tribunal Judge did not set aside the previous decision: A further document was relied upon by the Respondent at that hearing, being a EURODAC search result, demonstrating that a person in the Appellant's identity was fingerprinted in Dresden in Germany on 22 March 2016. The Appellant's account as given in his Statement of Evidence (SEF) interview and confirmed in oral evidence before the judge was that he only left Iraq in December 2017. The Appellant denied before the judge
that the person identified in the EURODAC search was him but the judge stated that she was satisfied by the details contained within the document, and looking at the clear photograph on the EURODAC match, that the person fingerprinted in Germany was indeed the Appellant. [...] The evidence provided by the EURODAC document is unequivocal. The photograph contained within the document is clearly the Appellant who appeared before me and I have no reason to doubt that the document relates to him. Therefore this document upon which I am satisfied I can place significant weight puts him in Germany on 22 March 2016. Consequently, I find that this evidence undermines the credibility of his entire account and his credibility as a witness in his own cause and that it renders his entire account unreliable. (Immigration and Chamber, p. 2) This case unveils how biometric systems are used nowadays at the border to refuse and fail asylum seekers. As Browne has argued, biometrics are a technology 'to make the mute body disclose the truth of its racial identities' and one 'that can be employed to do the work of alienating the subject by producing a truth about the racial body and one's identity (or identities) despite the subject's claims' (Browne, 2015, pp. 108-110). The evidence given by EURODAC's fingerprint system was placed above the person's narrative within the hierarchy of truthfulness Aradau and Perret (2022). The asylum seeker was portrayed as a deceptive subject given that the biometric system unveiled the 'truth' of his whereabouts. The inconsistency encountered did not call into question the credibility of the technology, but rather affected the asylum seeker's credibility. Moreover, this inconsistency became decisive in the UK judiciary's refusal of his asylum petition.
Perhaps less intuitively, this asylum appeal also exposes the elephant in the room of biometrics. While engineers are centring their efforts on training better-performing, fairer, and more equitable biometrics, the same systems are implemented at the border to deny asylum and push migrants back. As previously shown, we are witnessing a growing trend of fairness research in biometrics Drozdowski et al. (2020). We also observe this trend in algorithmic fairness for migration and asylum contexts, with approaches proposed to better distribute asylum seekers within a country Bansak and Martén (2021); Ahmad (2020); Kinchin (2021). However, algorithmic fairness cannot distribute justice in scenarios whose intended purpose is to discriminate and that consistently jeopardise fundamental rights. As Tendayi Achiume has argued: '[T]here can be no technological solution to the inequities of digital racial borders' (Achiume, 2021, p. 337). Fairer biometrics and algorithmic solutions implemented at racialised borders conceal the injustices that these infrastructures reproduce. These social, political and historical injustices are the elephant in the room of biometrics: the controversial issue that is obvious but remains ignored and unmentioned in debates around borders, biometrics and fairness.
In April 2021, the European Commission published the AI Act, the first legal framework proposal to regulate artificial intelligence European Commission (2021). The scope of the AI Act is to address the risks associated with the use of such technology and to protect safety and fundamental rights. Within this document, biometrics is considered a high-risk system in the following areas: (i) biometric identification and categorisation of natural persons, (ii) migration, asylum and border control management, (iii) law enforcement and (iv) emotion recognition. The recent campaigns against mass surveillance using facial recognition organised by several organisations have probably played a key role in the regulation of biometric systems Watch (2020); EDRi (2020). Indeed, the proposal opts to ban the use of 'real-time' facial recognition in public spaces Veale and Borgesius (2021). However, certain exceptions are considered regarding the use of biometric systems, such as the targeted search for specific potential victims of crime, a threat to the life or physical safety of natural persons or of a terrorist attack, or the perpetrator or suspect of a criminal offence. Interestingly, the exception announced in Article 83 has gone completely unnoticed in the public debate on the AI Act: This Regulation shall not apply to the AI systems which are components of the large-scale IT systems established by the legal acts listed in Annex IX that have been placed on the market or put into service before [12 months after the date of application of this Regulation referred to in Article 85(2)], unless the replacement or amendment of those legal acts leads to a significant change in the design or intended purpose of the AI system or AI systems concerned. (European Commission, 2021, p.
88) Despite the fact that the legal document categorises as high-risk the biometric systems used in the context of migration, border control management and law enforcement, the text above exposes that this same regulation does not apply to the four biometric databases (EURODAC, VIS, SIS II, and EES) which are used specifically for these purposes. The regulation will only apply when there is a 'significant change' within these systems, but the proposed document does not define what a 'significant change' means. Thus, the AI Act will not entail any substantial legal change for migrants and asylum seekers who are screened by biometric systems upon arrival in Europe.
Whilst facial recognition technologies used by police authorities in public spaces have been banned by the European Parliament Ojamo (2021); Sánchez Nicolás (2021), the use of fingerprints to immobilise migrants will remain legal and in the shadows of any public or political debate. As the EU Rapporteur, Petar Vitanov, said after the resolution approved by the European institution: 'This is a huge win for all European citizens'. Yet fundamental rights for non-Europeans will be pushed into the background. Asymmetries in legal, political and social rights have historically been confronted. In her classic Women, Race & Class, Angela Davis narrates the frictions between the (white) feminist movement and the enslavement of Black people. During the women's rights campaign in the US, she explains the advanced political position of an American abolitionist and women's rights advocate: 'But Angelina Grimke proposed a principled defense of the unity between Black Liberation and Women's Liberation: "I want to be identified with the Negro," she insisted. "Until he gets his rights, we shall never have ours."' (Davis, 2019, p. 59). Bringing Grimke's political consciousness to our discussion about the regulation of biometrics, we suggest that until migrants get their digital and fundamental rights, we shall never have ours.

Discussion
Fairness has emerged as a new research area in the context of biometrics. It aims to address and mitigate demographic biases in systems designed to identify or authenticate subjects based on bodily features (eyes, face, finger veins, among others). Yet fairness has overshadowed the elephant in the room of the use of biometrics: the controversial issue which is obviously present but is avoided as a subject for discussion. Biometrics has a long-standing colonial and racial legacy which is usually ignored by the biometric industry and research field. This heritage is still latent today in the implementation of biometric systems for the purposes of migration control and law enforcement. Whilst the study of fairness revolves around 'debiasing' biometric systems, migrants' fundamental rights are jeopardised by the use of this technology at the border.
In this paper we argued that biometrics are and will always be biased. Building on the literature on fairness in machine learning, we demonstrated theoretically that biometric systems cannot mutually satisfy different fairness definitions. We then demonstrated empirically the impossibility of fair biometric systems. We observed that the biometric dataset proposed to train the biometric systems reproduces the racialisation of bodies, underrepresenting non-Western subjects and using race categories that are archaic and offensive. The results clearly support the theoretical framework, showing that biometric systems exhibit differences on three fairness criteria based on age, gender and race groups at different decision thresholds. Yet this paper has pushed the argument further, showing how the focus on the fairness of biometrics undermines the political discourse about the use of biometrics at the border. As we have shown, biometric systems are used nowadays by border and judicial authorities for migration control. The algorithmic decision is used to assess the narrative of the migrant or asylum seeker, and in the case of an inconsistency, the biometric output is positioned as the real truth. Moreover, the recently proposed AI regulation by the EU that bans and categorises certain biometric systems will not apply to the large-scale databases that are used to immobilise and criminalise migrants.
In conclusion, the use of fairness in algorithmic systems installed in social and political contexts whose principal and intended function is to discriminate displaces the breach of fundamental rights because the algorithm is 'fair'. Fairer biometric systems embedded at the border will legitimise denials of asylum, push-backs or secondary movements. Moreover, the current debates around demographic biases and the ethics of artificial intelligence overshadow the political, social and historical discrimination that was there before the technology. As
Browne argues: 'a critical biometric consciousness must acknowledge the connections between contemporary biometric information technologies and their historical antecedents' (Browne, 2015, p. 118). As this trend becomes more prominent in the coming years, there is an urgent need to shift the debates towards the colonial and racial context in which most of these systems are embedded.

Figure 1: Genuine (g(s)) and impostor (i(s)) score distributions for a biometric verification system. The figure shows the relation between the score distributions and the confusion matrix elements (true genuine (TG), true impostor (TI), false genuine (FG), and false impostor (FI)).

Figure 2: Intersectional ratios on age, ethnicity and gender of individuals in PLUSVein-FV3 Kauba et al. (2018). There are large disparities among groups: more males (M) than females (F), more young than old adults, and a majority of Europeans. The proposed ethnicity labels lack diversity, and one of them ('Mulatto') is rather archaic and offensive.

Figure 3: Intersectional fairness disparities in four biometric recognition systems based on age and gender. The decision threshold (τ) is set at ∼ZFIR (first dashed line) and FGR_1000 (second dashed line). The distributions of genuine (orange) and impostor (purple) scores differ among groups. Classification disparities across intersectional demographic groups are apparent. Overall, young males obtain better performance than the other groups.

Table 1: Error rates of biometric methods at decision thresholds FGR_1000, FGR_100, FGR_10 and ZFIR. LBP is the method with the poorest performance. MC, PC, and SIFT obtain low and similar error rates.

Table 3: Biometric recognition performance as measured by fairness criteria differences among three demographic groups at ∼ZFIR. All systems are consistently unfair, showing significant predictive parity differences for age, gender and ethnicity. These results empirically demonstrate Proposition 3.3.1.
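The predictive parity differences the table refers to can be made concrete with a short sketch. Below is a minimal, hedged example (not the authors' implementation; all function and variable names are hypothetical) that computes the largest pairwise gap in positive predictive value (PPV) across demographic groups at a fixed decision threshold:

```python
import numpy as np

def predictive_parity_gap(scores_by_group, labels_by_group, tau):
    """Largest pairwise difference in positive predictive value (PPV)
    across demographic groups at threshold tau.

    labels: 1 = genuine (mated) pair, 0 = impostor (non-mated) pair.
    Predictive parity holds when this gap is (close to) zero.
    """
    ppvs = {}
    for group, scores in scores_by_group.items():
        s = np.asarray(scores, dtype=float)
        y = np.asarray(labels_by_group[group], dtype=int)
        accepted = s >= tau  # pairs the system predicts as matches
        if accepted.any():
            # PPV: fraction of accepted pairs that are truly genuine
            ppvs[group] = float(y[accepted].mean())
    vals = list(ppvs.values())
    return max(vals) - min(vals)
```

On real comparison scores, evaluating this gap per age, gender and ethnicity group at τ ≈ ZFIR would surface the kind of disparity the table summarises: identical thresholds, unequal reliability of a 'match' across groups.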