1 Introduction

Facial recognition (FR) is a technology based on artificial intelligence (AI) that analyses human faces for several purposes (Raposo 2022a; The Alan Turing Institute 2019). It operates by measuring human facial features and using these measurements to create a so-called ‘biometric template’, which is essentially a mathematical representation of the face (The Alan Turing Institute 2019).

Like many other AI-based tools—robotic agents (Sharkey 2017), chatbots (Parviainen and Rantala 2022), and dark patterns (Neuwirth 2022)—it raises legal and ethical concerns, mostly due to the opacity of AI technology, which has led to its classification as a ‘black box’, and to the lack of a clear legal framework to address its many challenges.

This paper analyses cases in which the use of FR to identify individuals leads to mistaken identifications, exploring their causes and legal consequences. The final aim is to provide the users of this technology with adequate knowledge about what can go wrong, why, and what legal risks they incur, so that they can protect themselves and their businesses.

Reports of darker-skinned faces not being detected as faces by Hewlett-Packard’s face-tracking software or being labelled as gorillas by Google Photos (Umoja 2018), and of Asian people being asked whether they were blinking when their photos were taken by Nikon cameras, or being told that their eyes were closed by FR software at airports (Borgesius 2018), triggered this discussion.

FR serves several purposes in addition to identification, such as the analysis of human emotions, the detection of genetic diseases and the profiling of individuals based on their characteristics (Raposo 2022a), and inaccuracies can occur in all these domains. These other uses raise legal and ethical issues that should not be underestimated, such as risks of discrimination (Castelvecchi 2020), invasion of privacy (Leong 2019), undue data processing (Raposo 2022a) and manipulation of our thoughts and actions (Neuwirth 2022). However, this paper deals exclusively with inaccuracies in FR identification.

Furthermore, there are multiple legal risks and concerns involved in using FR, specifically those related to privacy and data protection (Raposo 2022a), but also to other fundamental rights (European Union Agency for Fundamental Rights 2019) and to the rule of law itself in liberal democracies (Smith and Miller 2022). This paper, however, focuses solely on FR accuracy, the possible harms arising from an erroneous identification and the associated liabilities.

2 FR for identification purposes

The process of FR for identification purposes involves several stages. First, a face is detected in an image, which is then analysed by AI software to measure the geometry of the face. Subsequently, these measurements are converted into data; that is, the picture of a face (analogue information) is converted into a template (digital information). Finally, this template is matched against one or more other templates (Fu 2021).
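For illustration only, the stages just described can be sketched in code. The following Python fragment is a minimal sketch in which detect_face and compute_template are hypothetical placeholders (real systems use trained detection and embedding models); only the overall structure of the pipeline is meant to be informative.

```python
import numpy as np

def detect_face(image: np.ndarray) -> np.ndarray:
    # Stage 1 (placeholder): a real system would run a dedicated face-detection model;
    # here a central crop stands in for the detected face region.
    h, w = image.shape[:2]
    return image[h // 4: 3 * h // 4, w // 4: 3 * w // 4]

def compute_template(face: np.ndarray) -> np.ndarray:
    # Stages 2-3 (placeholder): a real system measures facial geometry with a trained
    # model and outputs an embedding vector; here a crude down-sampled, normalised
    # vector stands in for the 'biometric template' (digital information).
    vec = face[::8, ::8].astype(float).flatten()
    return vec / (np.linalg.norm(vec) + 1e-9)

def similarity(t1: np.ndarray, t2: np.ndarray) -> float:
    # Stage 4: compare two templates; note that the output is a score, not a yes/no answer.
    return float(np.dot(t1, t2))

# End-to-end: two photographs in (random arrays as stand-ins), a similarity score out.
photo_a = np.random.rand(256, 256)
photo_b = np.random.rand(256, 256)
score = similarity(compute_template(detect_face(photo_a)),
                   compute_template(detect_face(photo_b)))
print(f"similarity score: {score:.3f}")
```

The key point the sketch illustrates is that the final comparison yields a score rather than a categorical identification, a point developed in Section 2.2 below.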

This FR process can differ, depending on the particular form of identification applied (Raposo 2022a). In authentication/authorisation,Footnote 1 the template created from a photo of a given individual is compared with the template of an earlier photo of that same person stored in the system (usually voluntarily submitted by that person as a precondition to receiving a benefit such as a product or service). If there is a positive match, the person is granted access to that benefit; for example, their iPhone is unlocked (Apple 2021) or they are granted access to a restricted area (Maciel et al. 2016). The aim of such a comparison is to establish whether a person is who he/she claims to be. In contrast, when FR is used for verification/recognition, the template is compared with others saved in a database to establish whether the person is who the system operator believes him/her to be. This form of FR is widely used for immigration purposes in airports (Glusac 2022) and in law enforcement (Raposo 2022b).

FR operates using facial features, which are a type of biometric data (Kindt 2013). A relevant definition of biometric data results from Article 4(14) of the General Data Protection Regulation (GDPR)Footnote 2 and Article 3(33) of the AI Draft Act (European Commission 2021a, b).Footnote 3 These wide definitions cover two types of personal characteristics: those related to behaviour (actions, gait, handwriting, voice, mannerisms, habits) and those related to physical characteristics (facial features, but also DNA, iris patterns, fingerprints, palm prints). In theory, more than one method of biometric identification should be used to avoid mistaken identifications.

Biometric data are not only personal data but a special type of personal data, so-called sensitive data (as per Article 9 of the GDPR, which as a rule forbids the processing of such data, although some exceptions are admitted in its paragraph 2), due to their intrinsic connection to the human person, their special capacity to identify the individual and the consequent risks for the latter’s rights and fundamental freedoms (United Nations Development Group 2017). The special ‘sensitivity’ of biometric data is likewise recognised in paragraph 74 of UNESCO’s Recommendation on the Ethics of Artificial Intelligence (UNESCO 2021).Footnote 4

2.1 (In)accuracy of FR

Despite FR’s increasing accuracy, even minute inaccuracies can lead to erroneous results (European Digital Rights 2019; European Union Agency for Fundamental Rights 2019; Office of the High Commissioner for Human Rights 2021). A 2016 study reported that specially patterned eyeglass frames are enough to deceive FR technology (Sharif et al. 2016). Moreover, of all possible forms of biometric identification (fingerprints, palm prints, iris, DNA, voice), FR is considered to be the least accurate (Thakkar 2017).

Assessments of FR accuracy differ. Some studies have claimed that the rate of inaccuracy is extremely high. One study analysed eight FR systems on the market and concluded that the error rate ranged between 48 and 62% (Dupré et al. 2020), and other studies report false positive rates as high as 98% (Sharman 2018). In contrast, in a study carried out in 2018, the US National Institute of Standards and Technology (NIST) analysed 127 FR algorithms developed by 45 research and development laboratories (Grother et al. 2018). The study concluded that the best algorithms (according to 2018 standards) had an error rate of less than 0.2%.Footnote 5 A NIST report from 2020 confirmed FR’s accuracy, concluding that ‘the best facial recognition algorithms in the world are highly accurate and have vanishingly small differences in their rates of false-positive or false-negative readings across demographic groups’ (McLaughlin and Castro 2020).

Regardless of these different results, the fact remains that when the volume of people being ‘scanned’ is very high, even a narrow margin of error produces a concerning number of misidentifications. ‘A system deployed on the scale of the population of the European Union would have to achieve an error rate of 0.00000224% (i.e., an accuracy rate of 99.99999776%) to commit less than 10 errors for a total of 446 million individuals. We are still a long way from such performances’ (Renaissance Numérique 2020, 23). In absolute terms, the number of errors grows as the number of identifications grows; moreover, the size of the target population also affects accuracy: the more individuals the system must ‘scan’, the higher the likelihood of error.

For instance, suppose FR is being used at the UK’s Heathrow Airport, where an average of 219,458 people pass through daily (Heathrow Airport n.d.). Assuming a false positive rate of 0.01% (the case of extremely precise technology), this equates to around 22 people per day being erroneously tagged and stopped. This is clearly problematic. However, considering that modern FR software has an accuracy rate of 99% (McLaughlin and Castro 2020) and that the human eye reaches, at best, an accuracy of 85% (White et al. 2015), a properly developed FR system is more effective than human operators.
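The arithmetic behind these orders of magnitude is straightforward. The short Python fragment below merely reproduces the figures cited above (the passenger count, the assumed false positive rate and the EU population figure are taken from the sources already cited; nothing else is added).

```python
# Expected number of daily false alerts = people screened x false positive rate
heathrow_daily_passengers = 219_458      # average daily passengers (Heathrow Airport n.d.)
false_positive_rate = 0.0001             # 0.01%, i.e. an extremely precise system
print(heathrow_daily_passengers * false_positive_rate)   # ~21.9, i.e. about 22 people per day

# Error rate required to keep errors below 10 across the EU population
eu_population = 446_000_000
max_errors = 10
print(max_errors / eu_population)        # ~2.24e-08, i.e. 0.00000224%
```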

Several factors may hamper the accuracy of an FR identification, such as the passage of time or the use of medication or recreational drugs (Renaissance Numérique 2020). If a person being scanned has a disability that affects their bone structure or involves facial paralysis, this may undermine the system’s precision (European Union Agency for Fundamental Rights 2019). Age is another obstructive factor, as the level of FR accuracy diminishes when current images are compared with photos taken a long time ago (Boussaad and Boucetta 2020). Facial resemblance remains a problem, as many algorithms cannot distinguish between twins (Noroozi and Toygar 2017). The use of face masks, widespread during the pandemic, also detracts from accuracy (Hariri 2022). Deliberate attempts to trick the system are a further source of error, a clear example being data poisoning.Footnote 6

The main challenges addressed in this paper are as follows: (i) the nature of FR, which operates through probabilities, and the consequent need to define a threshold for a positive result; (ii) AI unintelligibility; and (iii) AI biases.

2.2 FR as a matter of probability

As stated by the European Union Agency for Fundamental Rights (2019, 34), ‘Facial recognition technology algorithms never provide a definitive result, but only probabilities’. FR as identification operates as a face-comparison model. The system compares pictures, either one-to-one (authorisation) or one-to-many (recognition), to identify a match. The final result is not expressed as a ‘yes’ or ‘no’ answer but as a probability (European Union Agency for Fundamental Rights 2019). The so-called ‘confidence level’ or ‘confidence score’ shows the probability, as a percentage, of the image being correctly detected by the algorithm (European Data Protection Board 2022). The software cannot determine whether two templates belong to the same person (an exact match) but only how likely it is that they belong to the same person (Commission Nationale de l’Informatique et des Libertés 2019; Institute and International Association of Chiefs of Police 2019). Such probabilities depend on how accurate the software is (although, as noted above, FR is more accurate than eyewitnesses (Bambauer 2021)).

In a one-to-one comparison, such as when FR is used to unlock an iPhone, the level of accuracy is higher, because the matching involves only two photos instead of comparing one photo with an entire database of potentially millions of photos (Thompson 2020). Apple claims, ‘The probability that a random person in the population could look at your iPhone or iPad Pro and unlock it using Face ID is less than 1 in 1,000,000 with a single enrolled appearance whether or not you’re wearing a mask’ (Apple 2022). If that is the case, the system is extremely reliable.

In a one-to-many comparison, however, the percentage of error increases, as the chance of error rises with the number of photos compared. This is, therefore, the context that raises the most concerns.

When comparing two templates, FR software indicates the probability that the templates match each other. The system declares a match when the probability meets an established threshold (Commission Nationale de l’Informatique et des Libertés 2019). The higher the threshold is set, the higher the number of matches the system misses. For instance, if the algorithm is set to provide only results with 99% confidence, it may miss many positive matches, which are reported as (false) negatives. However, the matches the system does provide have a very high confidence level, minimising false positives (Crumpler 2020). It is especially important to set a high threshold when the process is conducted without human oversight (European Data Protection Board 2022), as human intervention is necessary to diminish the number of false positives. Although setting a high threshold necessarily increases the number of false negatives, this result is deemed to be less legally problematic than the alternative (Crumpler 2020).
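The trade-off described here can be illustrated with a small numerical sketch. In the Python fragment below, the score distributions for genuine and impostor pairs are invented purely for illustration and do not describe any real FR system; the fragment only shows how raising the threshold lowers false positives while raising false negatives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented similarity scores: genuine pairs (same person) tend to score higher
# than impostor pairs (different people). The distributions are purely illustrative.
genuine = rng.normal(0.80, 0.10, 10_000).clip(0, 1)
impostor = rng.normal(0.40, 0.15, 10_000).clip(0, 1)

def error_rates(threshold: float) -> tuple[float, float]:
    false_positive_rate = float(np.mean(impostor >= threshold))  # wrong matches accepted
    false_negative_rate = float(np.mean(genuine < threshold))    # true matches missed
    return false_positive_rate, false_negative_rate

for threshold in (0.5, 0.7, 0.9):
    fpr, fnr = error_rates(threshold)
    print(f"threshold {threshold:.1f}: false positives {fpr:.2%}, false negatives {fnr:.2%}")
# Raising the threshold drives false positives down and false negatives up.
```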

Therefore, the determination of the threshold will depend on the risks arising from false positives and false negatives. For instance, a false positive that grants someone access to another individual’s iPhone is less problematic than an erroneous match in a criminal case, which risks a wrongful conviction. Thus, in the former, the threshold for a match may be lowered, whereas the latter requires a very high threshold (Umoja 2018). An error in the identification of a criminal is particularly concerning, as certain ramifications, such as detention (which, in the case of erroneous identification, constitutes unlawful detention) and public disclosure of the person’s identity, may damage the individual’s reputation (European Union Agency for Fundamental Rights 2019).

2.3 AI unintelligibility

There are two types of AI: black box and white box (Loyola-González 2019). ‘Black box’ (Smith 2020) is a term used to describe the opaque nature of this technology. In its 2020 White Paper on Artificial Intelligence, the European Commission (2020, 12) noted AI’s ‘opacity (“black box-effect”), complexity, unpredictability and partially autonomous behaviour’. Complex AI applications involve significant opacity in terms of the visibility of and justifications for the algorithms and data they use.

FR has been described as ‘black box’ AI. Researchers cannot fully explain how FR algorithms reach their results. The fact that modern FR is based on deep neural networks contributes to this phenomenon: in this form of AI, not even the programmers know how the model reaches its decisions or which elements it uses to form its determinations (Bathaee 2018). This opacity makes it difficult to correct an FR system’s biases and malfunctions.

The nebulous nature of FR is exacerbated by the secrecy surrounding the underlying technology, protected by laws governing intellectual property rights and commercial secrets (European Digital Rights 2019). The novelty of this technology, which strongly relies on business secrecy, probably discourages developers and manufacturers from disclosing relevant information. This is not a hypothetical situation. In a British case related to the use of FR for law enforcement purposes, an audit of the FR system was required to establish potential biases, but the provider of the AI refused to disclose information, invoking ‘business secrecy’, and the court upheld this argument (R [Bridges] v Chief Constable of South Wales Police [2020] EWCA Civ 1058, §199).

2.4 Bias and discrimination

FR algorithms differ in their accuracy rates across genders and ethnic groups. The 2018 Gender Shades project analysed three gender classification algorithms and concluded that they all performed poorly with regard to dark-skinned females, with error rates up to 34 percentage points higher than for light-skinned males (Buolamwini and Gebru 2018). A 2019 NIST report analysed 189 FR algorithms (covering almost the entire market) and concluded that most of them had some type of bias, especially against Black and Asian faces (for these groups, the frequency of error was about 10–100 times higher than for white faces). Grother et al. (2019, 2) stated, ‘Our main result is that false positive differentials are much larger than those related to false negatives and exist broadly, across many, but not all, algorithms tested. Across demographics, false positive rates often vary by factors of 10 to beyond 100 times’. They also revealed discrepancies regarding gender, stating, ‘We found false positives to be higher in women than men, and this is consistent across algorithms and datasets. This effect is smaller than that due to race’ (Grother et al. 2019, 2). A recent report from the US General Services Administration (US General Services Administration 2022) also pointed out racial biases against African Americans. Overall, FR has proven to be discriminatory towards certain groups, namely females (Buolamwini and Gebru 2018) and Black people (Najibi 2020).
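A false positive differential of the kind reported by NIST can be made concrete with a small, purely synthetic example. In the Python sketch below, the per-group score distributions are invented; the point is only to show how per-group false positive rates are computed and compared.

```python
import numpy as np

rng = np.random.default_rng(1)
decision_threshold = 0.7

# Invented impostor-pair scores for two demographic groups; in a biased system the
# scores for one group sit closer to the decision threshold than for the other.
impostor_scores = {
    "group_A": rng.normal(0.35, 0.10, 50_000),
    "group_B": rng.normal(0.55, 0.10, 50_000),
}

false_positive_rates = {}
for group, scores in impostor_scores.items():
    false_positive_rates[group] = float(np.mean(scores >= decision_threshold))
    print(f"{group}: false positive rate {false_positive_rates[group]:.4%}")

ratio = false_positive_rates["group_B"] / max(false_positive_rates["group_A"], 1e-12)
print(f"false positive differential: about {ratio:.0f}x")
# Differentials of this magnitude, from 10x to beyond 100x, are what Grother et al. (2019) report.
```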

Two main causes of such biases have been proposed: biased databases and biased programmers (Cowgill et al. 2020).

Databases are biased when they predominantly include photos from a specific group of people, resulting in algorithms trained to recognise mostly individuals from that group, usually white males (Umoja 2018). The algorithm is not able to identify other types of individuals with the same accuracy, as its training lacks adequate representation of these types.

Programmers are also said to be biased because, like all humans, they have conscious and unconscious preconceptions that they transfer to the code they write, for instance, by selecting the variables to be used or ignored (Donald 2019). So-called ‘algorithm bias’ refers to cases in which algorithms ‘inherit’ the prejudices of their creators, namely towards ethnic minorities and females, as most programmers are white males (Borgesius 2018). The algorithms on which AI is based tend to emulate the social biases of the human operators who created them and, voluntarily or inadvertently, transposed such biases to the algorithms (Chattopadhyay et al. 2020; Cofone 2019; Donald 2019; European Digital Rights 2019; European Union Agency for Fundamental Rights 2018). These two factors are probably interrelated, as studies show that ‘engineers exert greater effort and are more responsive to incentives when given better training data’ (Cowgill et al. 2020).

These flaws are, however, gradually being overcome. Recent studies show advances in this regard (US Department of Homeland Security 2019) as compared with studies from a decade ago (Klare et al. 2012), not only due to technological improvements but also because developers realised that richer and more diverse datasets produce less biased results. A 2020 NIST report concluded that the best FR software does not show any significant difference in recognising people of different genders and ethnic groups: false-negative rates of 0.49% or less for Black females and no more than 0.85% for white males (McLaughlin and Castro 2020). Modern FR software has an accuracy rate of 99% (McLaughlin and Castro 2020), and studies continue to propose ways to prevent discriminatory results (Serna et al. 2022).

3 Legal consequences: liability related to FR inaccuracies

3.1 Harm and compensation for moral damages

Public or private entities using FR may be held liable for damage caused by the use of this technology, either under the rules of contract law or, most commonly, tort law. Errors can, at best, lead to awkward or compromising situations; however, they can also inflict severe harm on the individuals involved, as in the case of a man arrested and detained for six days because of an FR mismatch (Infobae 2019), for which the authorities can be held accountable. Moreover, an erroneous identification may result in discrimination and harm to the individual’s dignity. ‘This misrecognition on the basis of rights is ethically problematic not just as a violation of moral principles regarding equal treatment, but also because it can have long-lasting damaging effects on a person’s self-development, in particular on their sense of self-respect’ (Waelen 2022).

Discrimination—banned by Article 21 of the European Charter of Fundamental Rights, Article 14 of the European Convention on Human Rights and Protocol 12 to that Convention (European Digital Rights 2019)—mostly affects non-white individuals (Najibi 2020) and females (Buolamwini and Gebru 2018). Other vulnerable individuals are also at great risk, namely people with facial deformities or diseases capable of altering facial traits. In a sense, every individual is vulnerable in the face of this technology, as even minor changes in appearance (bigger glasses, a different haircut) can lead to erroneous identifications. However, some individuals—those whose ethnic features or other distinctive traits are less familiar to the machine—are especially at risk.

It is at least theoretically conceivable that a person mistakenly identified—and thus subject to public embarrassment and humiliation—could file a compensation claim for moral damages against the manufacturer and/or the user of the FR system.Footnote 7 This possibility, however, might be more theoretical than real. The reason is that in common law jurisdictions compensation for purely moral damages is not usually awarded unless the harm was intentionally inflicted or is accompanied by physical or patrimonial losses (Ben-Shahar and Porat 2018), requirements that will rarely be met in our scenario. In contrast, in European continental law, compensation for purely moral damages remains more flexible (Basenko, Avanesian and Strilko 2022). Based on the Roman tradition, civil law distinguishes between patrimonial damages, which cover harm to both property and the body, and moral damages. Recovery for moral damages exclusively (i.e., independently of the existence of any other damage) in tort law is fully recognised in civil law jurisdictions, and can potentially cover the distress and anguish caused by an erroneous identification by an FR system. The problem, however, is that not every moral harm is compensable. It is common in civil law jurisdictions to restrict compensation for moral damages, a limitation sometimes established by statute.Footnote 8 If that is the case, it remains at the discretion of national courts to decide whether the moral harm claimed by a victim of an erroneous FR identification is serious enough to receive compensation under tort law rules.

3.2 Liability of various stakeholders

3.2.1 Manufacturer and user liability

The developers of FR algorithms and the manufacturers of AI systems may be held accountable, based on laws governing defective products. In the context of such a complex device, several players may be involved in its development and manufacture, raising the question of the proper distribution of liability.

Liability may also fall on the AI user, that is, the natural or legal person using the FR system. As FR algorithms provide only probabilities in the form of percentages and not definitive matches or identifications, it is up to the natural or legal person using the algorithm to decide what to do with such percentages. A rushed decision, taken without sufficient evidence, may lead to errors and thus to liability. For instance, in 2021, a man sued the Detroit police department that had arrested him for an alleged robbery based on algorithmic identification. FR was used on images collected by camera surveillance, and the system matched the man in the images with the plaintiff’s driving licence photo, taken from the driving licence database. Subsequently, the match was proven to be wrong (Williams v. City of Detroit, Michigan, A Municipal Corporation et al. (2:21-cv-10827), Michigan Eastern District Court). However, the user’s liability is complicated, as users may not be able to identify, much less monitor, the training data used to develop the algorithm, because developers of FR software are generally unwilling to disclose the data used, for reasons of copyright and trade secret protection (European Union Agency for Fundamental Rights 2019).

In certain cases, the user’s liability is relatively clear. Consider an FR system whose components are not initially defective but become so through misuse. For instance, suppose the software is sold with a high-resolution camera, which is damaged by the user and replaced with a lower-quality camera that provides images in which small details of people’s faces cannot be distinguished. Erroneous identifications cannot, in this scenario, be blamed on the manufacturer. Similarly, consider an FR system intended to be placed and used in a brightly lit area (and sold with clear statements of this information). If, instead, the user installs the camera in a dimly lit area, this may result in low-quality images, causing erroneous identifications. Such an FR system cannot be considered defective, even if the percentage of errors is above the established threshold, as the malfunction is due to the contravention of the manufacturers’ instructions.

3.2.2 The new norms on manufacturer’s liability

The Product Liability Directive (PLD) (Council of the European Communities 1985) is a European law that provides for strict liability for defective products. Its application exempts plaintiffs from the duty to demonstrate in court the culpability of the manufacturer.

The PLD applies to a very restrictive set of ‘defects’, all related to malfunctions that jeopardise the user’s safety. As stated in Article 6(1) of the PLD, ‘A product is defective when it does not provide the safety which a person is entitled to expect’. A defect, under the PLD, is thus associated with a lack of safety. As such, a misidentification by an FR system would fall outside its scope, as it is a non–safety-related malfunction. A safety issue may exist regarding FR hardware, such as when an FR camera burns a person’s face when he/she comes too close. However, it is difficult to envisage such harm resulting from FR software. Even the moral harm suffered by a misidentified individual (especially likely in identification activities carried out by police forces) does not equate to a lack of safety under the PLD, whose recitals state that its aim is ‘to protect the physical well-being and property of the consumer’. Under an extremely flexible interpretation of this concept, it could be said that in some circumstances an erroneous identification can put a person at risk (for example, in cases of detention or in which the person is physically attacked); however, this is far-fetched in terms of practical application.

In any case, even assuming a defect as described in the PLD, in its current (not yet revised) version it is not clear whether the PLD applies to FR technology and, if so, on what terms. It has long been accepted that the PLD applies only to products, not to services, as recently underlined in the Krone case (VI v Krone – Verlag Gesellschaft mbH & Co KG, 10 June 2021, Case C-65/20, ECLI:EU:C:2021:471). However, the Advocate General’s opinion in this case went one step further, stating that the PLD applies ‘only [to] the physical properties of the product’ (Advocate General 2021). The European Commission itself did not go that far; it simply underlined the difficulties of applying the PLD to AI products (European Commission 2021a).

However, a draft revised version of the PLD was recently disclosed, in which AI software, including FR software, is firmly placed within the scope of the PLD (European Commission 2022b). Therefore, not only those harmed by the hardware components of FR systems but also those harmed by the software used can sue the manufacturer under the strict liability regime established therein (though, in FR, harm arising from a lack of safety in the software is hardly foreseeable).

Moreover, when it comes to a software defect, an additional set of norms will intervene: the AI Liability Directive (European Commission 2022a), recently proposed by the EU. According to its current draft, people harmed by the AI technology underlying an FR system will be able to require the AI provider to disclose information (‘evidence’) relevant to grounding a lawsuit for damages,Footnote 9 which in this case could include information about the quantity and quality of the images used to train and test the FR system or about the code used. If such evidence is not provided, a presumption of fault (i.e., a presumption of non-compliance with a relevant duty of care) will operate, treating as demonstrated the very facts that the missing evidence was meant to prove, as set forth in Article 3(5) of the AI Liability Directive. Moreover, a presumption of causation between the violation of the duty of care and the output might also apply (Article 4 of the AI Liability Directive).

Despite all these additional tools to address liability, the fact that this particular AI technology works with probabilities will very likely complicate the assessment of malfunctioning. As a correct result cannot be guaranteed in FR, everything comes down to the definition of the threshold for accepting its results and the putting in place of adequate confirmation mechanisms (e.g., other forms of biometric identification) and human oversight. These measures give substance to the standard of care, as defined in Article 2(9) of the AI Liability Directive, both for AI developers and AI users (as this Directive applies to both) with regard to this technology.

4 Measures to avoid erroneous results (and exclude liability)

4.1 State of the art technique

Accuracy strongly relates to the technique used, which must be state of the art (European Union Agency for Fundamental Rights 2019), a typical requirement in domains characterised by extreme technological complexity and huge investment (Boyd and Ingberman 1995). This observation, which seems obvious, immediately faces obstacles in its implementation. First, there is the difficulty of defining the concept of ‘state of the art’ and determining which technology can be classified as such. Secondly, even assuming that the state-of-the-art criterion is well defined, it is difficult to gain access to such technologies, which are usually protected by strong IP rights. Thirdly, there is the concern that the costs of using highly developed technologies might exceed the expected profit.

The first concern is based on two difficulties, one legal and the other technical. Legally, there is no consensus about the exact content of the concept ‘state of the art’: does it refer to compliance with common (standard) practices in a given industry, or does it imply that no safer/better product exists on the market (Boyd and Ingberman 1995)? Technically, it is difficult to reach agreement about which technology should be considered ‘state of the art’ at any given moment (Boyd and Ingberman 1995), mostly because FR technology is continuously developing. Within FR, it is commonly agreed that substantial computational power and advanced algorithms (such as deep convolutional neural networks) should be used to achieve more accurate results. In recent years, a significant improvement in FR was achieved with the introduction of neural networks, a form of machine learning able to recognise individuals from a large dataset of images (Welinder and Palmer 2018), in particular convolutional neural networks (Wright 2019), currently considered the state of the art in this technology (Hancock et al. 2020), though it is not known for how long. Moreover, the more developed the technology, the more skills are required of its operators; thus, technological development must be accompanied by proper training (Sarabdeen 2022).

Hardware also plays an important role (Roth 2009). For instance, many of the cameras used in FR do not adequately capture darker skin, resulting in lower-quality images for persons of colour (The Alan Turing Institute 2019); a switch to high-resolution cameras may thus help to increase accuracy.

The fact that AI is able to learn can also contribute to improvements in its level of accuracy. ‘The algorithm then adjusts and fits itself for accuracy, enabling it to make predictions about a new photograph with increased precision’ (Office for Product Safety and Standards 2021, 15). For instance, in an FR authentication mechanism, the system learns how to detect small changes in a person’s face, such as those resulting from the use of makeup or from ageing (Jesdanun 2017; Apple 2022). In the context of the Face ID used to unlock the iPhone, Apple states that ‘[t]his data [the mathematical representations of the face] will be refined and updated as you use Face ID to improve your experience, including when you successfully authenticate’ (Apple 2022).
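One way to picture the kind of refinement Apple describes is a running update of the stored template after each successful authentication. The Python sketch below is a hypothetical illustration under that assumption; it is not Apple’s actual (undisclosed) mechanism, and the update rule and weight are invented.

```python
import numpy as np

def refine_template(stored: np.ndarray, new_sample: np.ndarray, weight: float = 0.05) -> np.ndarray:
    # Blend a small fraction of the newly authenticated sample into the stored template,
    # so that gradual changes in appearance (ageing, make-up, glasses) are tracked over time.
    # The update rule and the weight are invented for illustration.
    updated = (1 - weight) * stored + weight * new_sample
    return updated / np.linalg.norm(updated)

rng = np.random.default_rng(2)
stored_template = rng.random(128)
stored_template /= np.linalg.norm(stored_template)

new_scan = stored_template + 0.1 * rng.random(128)   # the same face, slightly changed
new_scan /= np.linalg.norm(new_scan)

stored_template = refine_template(stored_template, new_scan)
print(f"updated template norm: {np.linalg.norm(stored_template):.3f}")   # stays normalised
```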

The obstacle to the use of state-of-the-art technology resulting from IP rights must also be considered, as novel technologies will most likely be protected by patents. Even though the protection conferred by patents is not absolute—the protected AI system can still be used for interoperability purposes, scientific research, private use or under a compulsory licence (Dent et al. 2006)—the most viable solution is a licence agreement.

However, we might eventually have to rethink the balance between the protection of IP rights and the diffusion of knowledge. More than a decade ago, it was already acknowledged that ‘[f]rom an economic perspective, it is to be considered that IP monopolies, while spurring investment in new creations, also impede follow-on innovation requiring the use of pre-existing, protected material. Hence, there is a delicate balance inherent in all IP protection regimes’ (Senftleben 2011, 4). Nowadays, considering the pace of technological development and the frequent inability of the law to keep up with it, this statement makes even more sense. The legal regime applicable to patents must accommodate the growing demand for knowledge in digital societies. That demand might force greater flexibility in patent law and patent rights.

Finally, there are the costs. Innovation has a cost, and the more complex and innovative (and thus the more precise) the FR system is, the higher the price. Unless the company manages to absorb the cost, the financial burden might be unbearable, and some companies might prefer to take the risk of using a ‘not so good’ technology, hoping that no negative outcome will occur. A possible solution could be to establish a technological partnership with the developing institution (a university or a private company) in order to access such technology at a lower price, an option especially suitable for public bodies or other non-profit entities. In exchange, the user (i.e., the receiver) of this technology could make itself available for trials to identify drawbacks of the technology or for marketing actions. Traditionally, this kind of partnership is formed to develop new technologies, pooling the resources (human, capital) of two or more companies (Santangelo 2000), but it can also be employed to facilitate access to new technologies (Water 2022).

From a policy perspective, economic incentives could be provided to companies willing to acquire highly developed (and thus expensive) technology, such as tax benefits and/or other financial incentives. Such policies are already in place in several countries (OECD 2018). It remains to be seen, however, whether national governments will regard investment in more secure and precise FR systems as a type of R&D investment able to justify the use of such incentives.

4.2 Ethical dimension

The technical dimension of FR should also incorporate ethical concerns, following the principles of good governance (Mökander et al. 2022), which embrace ethics, policy and law (Raposo 2023). This is crucial to building systems that are ethical by design, that is, whose source code takes ethical considerations into account (Renaissance Numérique 2020). In this regard, Mutale Nkonde (2020, 33) refers to the ‘design justice framework’, a theory first developed by Sasha Costanza-Chock (2020). The design justice framework centres AI design on the groups that suffer particular negative effects from its use—in this case, as a result of the higher percentage of error—and develops around that focus.

Racial biases are a major ethical concern. Despite improvements in this domain, there are still some disparities in results. Therefore, the FR system must be based on data that matches end-user characteristics (World Economic Forum 2020), meaning that different parts of the world might have to resort to different databases to train their FR systems, according to the ethnic composition of the respective population (see next section). A plan to identify potential biases during the testing phase and correct them must be in place. In a 2020 White Paper on FR, the World Economic Forum (2020) recommended that entities using this technology carry out an impact assessment of possible biases and their effects on the targeted individuals. The assessment would involve, among other measures, the identification of possible discrimination risks and of the way the organisation deals with them and mitigates their effects.

Transparency is also among the ethical (and legal) measures to consider. Information should be provided to the team using this technology and to those affected by it, covering the way the FR system operates, the principles of good governance that have been adopted, the possible negative outcomes, and the expected rates of false positives and false negatives. This information should be communicated in language accessible to laypeople to guarantee comprehension (World Economic Forum 2020).

Data protection is another pressing concern.Footnote 10 It may seem to raise a different set of issues from those related to inaccurate recognitions, but the principle of data accuracy (Article 5(1)(d) GDPR) connects the two. Personal data must be accurate, and when an incorrect identification is attached to a biometric template this principle is compromised, leading to a violation of the GDPR.

4.3 Quantity and quality of images used

One of the causes of ‘algorithm bias’ (Borgesius 2018) relates to the relevant datasets; therefore, its resolution demands an improvement in the quality and quantity of the images used (Veale and Binns 2017). FR training requires a large amount of data in the form of photos. Different types of AI require different amounts of data, and ‘deep learning’ (the kind of AI that now forms the basis of FR) requires more data than other types of AI (Office for Product Safety and Standards 2021). As mentioned above, a persistent problem in FR is the predominance of white male images in such data, leading FR to present very high inaccuracy rates for females and people of other ethnic groups (European Union Agency for Fundamental Rights 2019). Solving the problem of the lack of diversity in the data would solve a substantial part of the FR accuracy problem.

It is worth noting that FR is not intended to discriminate against ethnic minorities (e.g., Asians and African Americans in the West). However, when groups are underrepresented in the community, they are likely to be underrepresented in the databases as well, exactly as other minorities are in other geographical areas. For instance, there is evidence that an FR system developed in Asia is less able to recognise white people than Asians (Lunter 2020) and is therefore biased against whites.

The quality of the images (both those used for training and testing and those stored in the database) also affects the accuracy of the technique. Several factors must be considered: differences in hair and skin colourFootnote 11; differences between compared images, including ageing, other individual transformations and emotions; lighting conditions; camera distance; background; head orientation; and the size of the face in the image (European Union Agency for Fundamental Rights 2019). The images used to train and test the system should be collected under conditions similar to those in which the FR cameras will operate. Another mitigation measure is cropping photos to disregard people’s hair, which improves accuracy for individuals whose heads are covered (by a turban or a hijab, for example) (World Economic Forum 2020).
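A simple operational safeguard along these lines is to reject enrolment or probe images that fail minimum quality checks before any matching is attempted. The Python sketch below uses the Pillow imaging library; the resolution and brightness thresholds are illustrative assumptions, not values drawn from any standard.

```python
from PIL import Image, ImageStat

MIN_WIDTH, MIN_HEIGHT = 300, 300          # illustrative minimum resolution
MIN_BRIGHTNESS, MAX_BRIGHTNESS = 60, 200  # illustrative mean-luminance bounds (0-255 scale)

def acceptable_quality(path: str) -> bool:
    # Reject images that are too small, too dark or too bright to yield a reliable
    # template; the thresholds above are illustrative, not drawn from any standard.
    image = Image.open(path)
    width, height = image.size
    if width < MIN_WIDTH or height < MIN_HEIGHT:
        return False
    mean_brightness = ImageStat.Stat(image.convert("L")).mean[0]
    return MIN_BRIGHTNESS <= mean_brightness <= MAX_BRIGHTNESS

# Usage: gate every enrolment or probe image before it is matched, e.g.
# if not acceptable_quality("probe.jpg"):
#     print("image rejected: retake under better conditions")
```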

Moreover, the use of FR differs significantly between a controlled environment, such as police stations and airports, in which the lighting and orientation of the subject are controlled, and a non-controlled environment, such as random images from CCTV cameras, especially live footage of people passing in a street (Crumpler 2020; European Union Agency for Fundamental Rights 2019; Harwell 2019). The law must distinguish between these two types of matches, for instance, by demanding additional methods of identification for non-controlled environments, such as other forms of biometric identification (e.g., fingerprints, DNA), before taking any action (Renaissance Numérique 2020). Preferably, images should be taken in controlled environments, where the accuracy of FR is very high (European Union Agency for Fundamental Rights 2019). A typical example of a controlled environment is that of the airport boarding gate, for which a NIST report from 2021 confirmed an accuracy rate of 99.5% (Grother et al. 2021).

The criteria regarding the quantity and quality of images used for FR should cover not only the training images but also the testing images. The reason that some tests fail to detect bias is that the data used to test the system are usually as biased as those used for training, because data scientists generally divide a single set of data into two parts and use one part for training and the other for testing. When the test data are drawn from the same (biased) dataset as the training data, testing cannot detect the problem of bias (Hao 2019).
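The point can be illustrated numerically. In the Python sketch below, the score distributions and group proportions are invented: a test split carved out of the same skewed dataset reports a reassuring aggregate error rate, whereas an independent, demographically balanced benchmark exposes the problem.

```python
import numpy as np

rng = np.random.default_rng(3)
decision_threshold = 0.7

def false_positive_rate(impostor_scores: np.ndarray) -> float:
    # Share of impostor pairs wrongly accepted at the decision threshold.
    return float(np.mean(impostor_scores >= decision_threshold))

# A test split carved out of the same skewed dataset inherits its composition
# (here, 90% of pairs from the over-represented group) and so understates the problem.
skewed_internal_split = np.concatenate([
    rng.normal(0.35, 0.10, 9_000),   # over-represented group (low error)
    rng.normal(0.55, 0.10, 1_000),   # under-represented group (high error)
])

# An independent, demographically balanced benchmark weights both groups equally.
balanced_benchmark = np.concatenate([
    rng.normal(0.35, 0.10, 5_000),
    rng.normal(0.55, 0.10, 5_000),
])

print(f"internal split:      {false_positive_rate(skewed_internal_split):.3%}")
print(f"balanced benchmark:  {false_positive_rate(balanced_benchmark):.3%}")
# The balanced benchmark surfaces a disparity that the internal split largely hides.
```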

If FR technology manages to solve the problem of misidentifying females and people from less common ethnicities, FR may become less discriminatory than human-operated methods of identification. Humans (e.g., law enforcement agents) are frequently accused of discrimination and even racism (Schwartz 2020), whereas properly working FR software should be free of such biases.

The data used to train FR algorithms must be accurate and up to date to avoid not only liability due to misidentification but also liabilities related to the violation of data processing norms (Information Commissioner’s Office 2019; European Data Protection Board 2022). Although this perspective is not developed in this paper, it should be noted that the principle of accuracyFootnote 12 is stated in Article 5(1)(d) of the Law Enforcement Directive (European Parliament and Council 2016a), Article 5(1)(d) of the General Data Protection Regulation (European Parliament and Council 2016b) and Article 5(3)(d) of Convention 108 (Council of Europe 1981).

4.4 Constant monitoring

FR must be subject to constant monitoring and sporadic auditing to analyse the performance of its algorithms, immediately detect biases and other failures, and amend them (Renaissance Numérique 2020). Towards this aim, the law should require that records be kept of the programming of the algorithm, the training methodologies applied to the AI system and the data used to train that system.

In the US, the 2022 Algorithmic Accountability Act, in its Section 3(1), imposes an obligation on companies to regularly assess the accuracy of their FR algorithms by performing an impact assessment (Senate and House of Representatives 2019). In the European Union, the Draft Proposal on AI (European Commission 2021b) proposes a post-market monitoring mechanism, which involves two main dimensions: providers of AI systems are required to report serious incidents and malfunctions of their systems (Article 62 of the AI Draft Regulation), and market surveillance authorities are asked to monitor the AI systems already placed on the market, presumably also with regard to their accuracy (Article 63 of the AI Draft Regulation).

It has also been suggested that vendors should make their datasets and algorithms available to the general public to permit independent auditing (Ho et al. 2020). However, the feasibility of this proposal is questionable for reasons related to IP rights. The novelty of this technology will most likely hamper public disclosure of this material, as vendors seek to prevent business rivals from developing competing products.

Likewise, in its 2022 guidelines on FR in law enforcement, the European Data Protection Board (2022) recommended certain measures to prevent inaccurate results, which, although specifically addressing data protection issues, are generalisable to other sources of legal liability. Specifically, the European Data Protection Board recommended the implementation of procedures to supervise algorithmic accuracy, such as logging and reporting mistakes (European Data Protection Board 2022). When a problem is identified (e.g., data poisoning, spoofing), measures should be taken immediately to address it. If necessary, the algorithm must be retrained, using an enriched version of the same dataset or an entirely new one (European Data Protection Board 2022).

Monitoring is only effective if a maximum limit of error is established, above which the FR system must be withdrawn or at least revised. A problem that remains to be settled is how to determine a reasonable level of accuracy on which to base such an error limit. An intuitive answer is that FR should only be allowed when it causes the same number of errors as, or fewer errors than, human recognisers in the same setting (Sarabdeen 2022). However, it must be clarified whether this refers to average humans or to so-called ‘super-recognisers’, who are particularly good at remembering faces and recognising them in crowds and who account for only one to two per cent of the population. The latter would obviously constitute a much more demanding comparative criterion for establishing the limit of error.
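In operational terms, such a limit could be enforced by logging every reviewed identification outcome and flagging the system once the observed error rate exceeds the agreed cap. The Python sketch below is a minimal illustration of that idea; the class, the error cap and the window size are hypothetical choices, not requirements drawn from any instrument cited here.

```python
from collections import deque
from datetime import datetime

MAX_ERROR_RATE = 0.01   # illustrative cap, to be set by the operator or the regulator
WINDOW_SIZE = 10_000    # number of most recent identifications considered
MIN_SAMPLE = 1_000      # do not raise alarms before enough outcomes have been reviewed

class FRMonitor:
    """Keep a rolling log of reviewed identification outcomes and flag the system
    for suspension or retraining when the error rate exceeds the agreed maximum."""

    def __init__(self) -> None:
        self.outcomes = deque(maxlen=WINDOW_SIZE)   # True = identification found erroneous

    def log(self, was_error: bool) -> None:
        self.outcomes.append(was_error)
        if len(self.outcomes) < MIN_SAMPLE:
            return
        error_rate = sum(self.outcomes) / len(self.outcomes)
        if error_rate > MAX_ERROR_RATE:
            print(f"[{datetime.now().isoformat()}] error rate {error_rate:.2%} exceeds "
                  f"the {MAX_ERROR_RATE:.2%} cap: suspend or revise the system")

monitor = FRMonitor()
monitor.log(False)   # a reviewed identification found to be correct
monitor.log(True)    # a reviewed identification found to be erroneous
```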

5 Preliminary conclusions

Despite recent advances in technology, FR still presents flaws in its identification abilities. This is a weakness that manufacturers and potential users of FR must consider and address, as the most common types of FR misidentification can amount to unlawful discrimination and result in lawsuits against these actors.

Most of the flaws associated with FR, however, can be controlled and even corrected. Recent studies analysing the development of this technology reveal improvements in FR. When modern software trained on a rich and diverse set of images is used, FR accuracy reaches 99% (McLaughlin and Castro 2020). In the future, FR may even become the standard of care for identification. At present, however, prudence is recommended, and appropriate measures and safeguards should be put in place.