Artificial intelligence (AI) and computer algorithms play an increasingly pervasive role in daily life (Binns, 2018; Bjerring & Busch, 2021; Crawford & Calo, 2016; de Laat, 2018; Helbing et al., 2017). AI has become ever more influential in decisions made in fields as diverse as healthcare (Esteva et al., 2017; Martinez-Martin, 2019), employment (Chamorro-Premuzic et al., 2017), money lending (Prince et al., 2019), education (Holstein et al., 2018), and the judicial system, including law enforcement (Angwin et al., 2016; Buolamwini & Gebru, 2018; Chouldechova, 2017; Garvie, 2019; Garvie et al., 2016; Lum & Isaac, 2016; Veale et al., 2018). Face recognition technology (FRT) is a type of AI whose use brings both societal benefits and moral pitfalls. On the one hand, FRT can help physicians diagnose diseases and monitor patients in the healthcare setting (Martinez-Martin, 2019), find missing and lost persons (Darsham Balar et al., 2019), and help law enforcement apprehend dangerous criminals (Eddine Lahlali et al., 2015; Garvie, 2019). On the other hand, these benefits come with associated moral risks. One such risk arises from American law enforcement’s use of FRT that exhibits bias along racial and gender lines (Allyn, 2020; Angileri et al., 2019; Buolamwini & Gebru, 2018; Furl et al., 2002; Fussell, 2020; Garvie et al., 2016; Klare et al., 2012; Rhue, 2018, 2019; Wang et al., 2019). According to Garvie et al.’s (2016) landmark report, the FBI and an estimated one out of every four American law enforcement agencies (state and local) use or have access to FRT programs and databases. This large-scale use of FRT by law enforcement makes the moral problem of biased FRT algorithms especially concerning, since many people could be adversely affected.
This paper addresses the moral issues that arise when law enforcement uses biased FRT as a means of apprehending criminal suspects. More specifically, it presents an in-depth philosophical analysis, from the liberal tradition, of the moral and political problems posed by law enforcement’s use of biased FRT. The author develops and defends “A Liberal Argument Against Biased Face Recognition Technology,” which concludes that biased FRT used by law enforcement is incompatible with liberal democracy because it violates the classical liberal value that all individuals deserve equal treatment before the law. While the argument of this paper does not prove that the use of this technology is immoral per se, it provides insight into how and why the current use of such technology by law enforcement is incompatible with core principles of Western liberal democracy.
To be sure, the ethical use of FRT and related AI has received an array of philosophical, legal, and public-media attention in recent years. de Laat (2018) has recently argued for greater transparency in the oversight and regulation of machine learning algorithms, recognizing that biased outcomes can result from their widespread use. Brey (2004) has provided a broad philosophical overview of the ethical issues and associated policy proposals surrounding the use of FRT in public places; despite the broad scope of Brey’s analysis, the topic of racial bias in FRT was not included. Hale (2005) has argued that the use of FRT by law enforcement threatens self-determination by conflicting with a conception of free will that depends on social interactions. Even though Hale’s astute analysis focuses on law enforcement’s use of FRT, it does not address the problem of bias. In a similar vein, Selinger and Hartzog have argued from a legal perspective that the risks associated with FRT surveillance by governments and companies make genuine consent to its use impossible (Selinger & Hartzog, 2019). While Selinger and Hartzog discuss the legal aspects of FRT at length, racial and gender bias fall outside the scope of their paper. Additionally, Selinger and Hartzog have published popular articles in the New York Daily News. In their most recent piece, they argue that FRT ought not to be used in the fight against COVID-19 due to privacy concerns, but they do not consider the racial and gender bias associated with FRT (Selinger & Hartzog, 2020a, 2020b). In an earlier piece in the same newspaper, they note that while public mask-wearing could complicate governments’ use of FRT to track patients infected with COVID-19, technology firms are working on ways to improve FRT’s ability to “guess” the identity of mask-wearing individuals. While the burdens on people of color are mentioned, how and why people of color will be burdened is not thoroughly explored (Selinger & Hartzog, 2020a, 2020b).

In 2020, legal scholars Katelyn Ringrose and Divya Ramjee published an article in the California Law Review exploring the use of FRT by law enforcement to identify individuals who attend large protests. Ringrose and Ramjee focus not only on the privacy concerns of protestors and the machine bias of FRT, but also on the fact that law enforcement agencies had been working with a private company’s (Clearview AI) facial image repository of more than 3 billion images (Ringrose & Ramjee, 2020). Their analysis raises two problems related to machine bias that call for further philosophical and legal analysis. First, an ethical and legal analysis is needed of the controversial partnerships between private technology companies and the law enforcement arms of government. Second, an important topic for future ethical and legal analysis is the development of a regulatory framework to constrain the ubiquitous use of cameras and the associated expansion of large repositories/databases of facial images that are exploited to train the FRT programs used by law enforcement.
This paper will proceed by first reviewing the current state of technological development of FRT, including its use by law enforcement and the evidence of racial and gender bias. This section will rely heavily on Garvie et al.’s report “The Perpetual Line-up: Unregulated Police Face Recognition in America”. Next, the main argument of this paper, “A Liberal Argument Against Biased Face Recognition Technology,” will be presented. The subsequent sections are devoted to defending the premises of the argument, including responding to objections to the most contentious premise. The final sections of the paper will outline the policy implications of the argument and suggest future directions for philosophical analysis.
First, a definition of FRT is necessary. FRT is a type of AI that incorporates machine learning algorithms which identify patterns of facial features and match a face to pictures of other faces from a large database. Following Garvie et al., “Face recognition is the automated process of comparing two images of faces to determine whether they represent the same individual” (Garvie et al., 2016, p. 9). The report provides an excellent summary of how FRT identifies faces: “Before face recognition can identify someone, an algorithm must first find that person’s face within the photo. This is called face detection. Once detected, a face is “normalized”—scaled, rotated, and aligned so that every face that the algorithm processes is in the same position. This makes it easier to compare the faces. Next, the algorithm extracts features from the face—characteristics that can be numerically quantified, like eye position or skin texture. Finally, the algorithm examines pairs of faces and issues a numerical score reflecting the similarity of their features” (Garvie et al., 2016, p. 9). There exist many types of machine learning algorithms that can detect (Zafeiriou et al., 2015) and recognize faces (Klare et al., 2012), and their accuracy and overall performance continue to improve (Kong et al., 2006). Some FRT programs contain algorithms that go beyond merely matching a face to an image. These algorithms attempt to detect subtle facial changes that are associated with specific emotions and truth-telling/lying (Bittle, 2020; Rhue, 2018, 2019).
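The four-stage pipeline that Garvie et al. describe can be made concrete with a deliberately simplified sketch. The following Python code is illustrative only: each stage is a toy stand-in (a fixed crop, a fixed-size patch, raw pixel features), not any vendor’s actual algorithm, but the structure of detect, normalize, extract features, and score similarity is the one described above.

```python
import numpy as np

def detect_face(photo: np.ndarray) -> np.ndarray:
    # Face detection: find the face within the photo.
    # Toy stand-in: assume the face occupies the central region.
    h, w = photo.shape
    return photo[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]

def normalize(face: np.ndarray) -> np.ndarray:
    # Normalization: scale and align so that every face the algorithm
    # processes is in the same position. Toy stand-in: crop a fixed-size
    # patch and rescale pixel intensities to [0, 1].
    patch = face[:32, :32].astype(float)
    return (patch - patch.min()) / (np.ptp(patch) + 1e-9)

def extract_features(face: np.ndarray) -> np.ndarray:
    # Feature extraction: characteristics that can be numerically
    # quantified. Toy stand-in: the flattened pixel vector.
    return face.flatten()

def similarity_score(photo_a: np.ndarray, photo_b: np.ndarray) -> float:
    # Final stage: a numerical score reflecting the similarity of the
    # two faces' features (cosine similarity here).
    va = extract_features(normalize(detect_face(photo_a)))
    vb = extract_features(normalize(detect_face(photo_b)))
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9))
```

In deployed systems each stage is a trained model rather than a fixed rule, but the pipeline has the same shape, which is why skew in the data used to train any stage can propagate to the final similarity scores.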
Many American law enforcement agencies now use or have access to FRT to aid in combating crime (Allyn, 2020; Buolamwini & Gebru, 2018; Fussell, 2020; Garvie et al., 2016; Garvie, 2019; Holstein et al., 2018). According to the 2016 Garvie report, 117 million American adults are affected by law enforcement’s use of FRT, which includes the FBI’s Next Generation Identification Interstate Photo System (NGIIPS) and the FRT programs accessible to one out of four state and local law enforcement agencies (Garvie et al., 2016). One way law enforcement uses FRT is to help identify and arrest a suspect by running an image of the suspect’s face through an FRT program, which attempts to match that image to other images of faces in a large database. Facial images of crime suspects can be obtained by police thanks to the nearly ubiquitous use of public and private cameras: ATMs, traffic cameras, and private security cameras around homes and businesses all provide opportunities for police to obtain digital images of suspects during or after the commission of a crime. A crime suspect’s image, once obtained by police, can be entered into an FRT computer program. The program then searches through a large database of faces and returns “matches” of varying probabilities for the police to consider for further investigation. Law enforcement investigators can then decide which suspects should be considered for questioning, detainment, or arrest. While law enforcement’s decision about which individual to arrest may not be based solely on the FRT’s matches, those matches can play a crucial role in the chain of causation that determines which citizens are ultimately investigated and arrested.
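This investigative workflow can be sketched in the same toy Python idiom as above. The gallery names and sizes below are invented for illustration; the point is that a probe image’s feature vector is scored against every enrolled identity and a short ranked candidate list is handed to investigators.

```python
import numpy as np

def candidate_list(probe_vec: np.ndarray, gallery: dict, k: int = 5):
    # Score the probe's feature vector against every enrolled identity
    # and return the k best-scoring candidates for investigators.
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    scored = sorted(
        ((name, cosine(probe_vec, vec)) for name, vec in gallery.items()),
        key=lambda pair: pair[1], reverse=True,
    )
    # The top k are returned no matter how weak the scores are.
    return scored[:k]

# Toy usage: random vectors stand in for extracted face features.
rng = np.random.default_rng(1)
gallery = {f"license_photo_{i}": rng.normal(size=64) for i in range(100)}
probe = gallery["license_photo_42"] + rng.normal(scale=0.3, size=64)
print(candidate_list(probe, gallery))  # "license_photo_42" should rank first
```

Note that the top candidates are returned regardless of how weak the underlying scores are, a design feature that matters for the ranking effects discussed below.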
Computer scientists Joy Buolamwini of the Massachusetts Institute of Technology (MIT) and Timnit Gebru (2018) have recently expressed concern about law enforcement’s use of FRT to identify crime suspects, writing, “…it is very likely that such software is used to identify suspects. Thus, an error in the output of a face recognition algorithm used as input for other tasks can have serious consequences. For example, someone could be wrongfully accused of a crime based on erroneous but confident misidentification of the perpetrator from security video footage analysis”. In the important study led by Buolamwini, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” the researchers found that FRT algorithms are biased based on race and gender. The study examined popular commercially available FRT programs by analyzing their data sets, the information used to train the FRT algorithms. The study found that the data sets consisted mostly of light-skinned subjects (79.6% and 89.2%). Because the data sets incorporated predominantly light-skinned subjects, the FRT algorithms that classify faces by gender were most accurate when the subject being identified was a white male (only a 0.8% error rate) and least accurate when the subject being identified was a dark-skinned female (up to a 34.7% error rate). The FRT algorithms performed 11.8–19.2% worse on darker-skinned images than on their lighter-skinned counterparts (Buolamwini & Gebru, 2018). Buolamwini and Gebru conclude that “urgent attention” is needed on the part of companies that produce FRT in order to maintain fairness and accountability.
While Buolamwini and her team have demonstrated in detail how and along what intersectional lines AI algorithms are biased, other researchers in the field have been aware that such built-in bias could occur (Angileri et al., 2019; Angwin et al., 2016; Chouldechova, 2017; Garvie et al., 2016; Gong et al., 2019; Klare et al., 2012; Serna et al., 2019). A 2019 study conducted by the National Institute of Standards and Technology (NIST) in the USA reached the same conclusion as Buolamwini and Gebru, finding that FRT software varies in its accuracy depending on the race and gender of the person pictured. The NIST study additionally found biased results for East Asian, Native American, American Indian, Alaskan Indian, and Pacific Islander faces (NIST, 2019). In the United Kingdom (UK), where FRT is being used in public places, a recent report from the University of Essex led by Professor Pete Fussey corroborates both Garvie’s and Buolamwini’s concerns about biased FRT being deployed by the London Metropolitan Police (Fussey & Murray, 2019). Interestingly, Furl et al. (2002) found that FRT algorithms developed and used in Western countries were more accurate for Caucasian facial images, whereas in East Asian countries, the FRT algorithms were more accurate for East Asian facial images.
Additional research suggests that, due to the skewed data sets described by Buolamwini and Gebru, some FRT programs designed to interpret emotions by facial analysis perform differently according to the race of the subject (Rhue, 2018, 2019). According to Rhue, images of black faces are more likely than white faces to be interpreted by algorithms as expressing negative emotions (anger or contempt). Since these programs could be used by law enforcement on large crowds to identify individuals deemed more likely to be threats (by associating angry or contemptuous looking faces with potential threats), any bias in these FRT programs could result in higher error rates for non-whites (Rhue, 2018, 2019). The same type of error could occur with FRT programs designed to analyze small changes in a person’s face that have been associated with lying. The use of these “lie-detecting” FRT programs has been proposed for courtroom settings (Zhe et al., 2017), for questioning crime suspects in the UK (Randell, 2019), and for law enforcement use in managing border crossings between countries (Bittle, 2020). Even though some of these newer programs have been tested for racial and cultural bias (Bittle, 2020), the potential for disparate effects along intersectional lines has been established by Buolamwini and Gebru. Additionally, a more basic problem with emotion and lie-detecting algorithms comes from the underlying concept: that subtle changes in facial features can reliably predict emotions and whether someone is telling the truth. Some commentators have pointed out that such algorithms resemble the pseudo-scientific claims made by the now-debunked theories of phrenology (the claim that skull shape predicts character) and physiognomy (the claim that facial features predict character) (Chinoy, 2019; Spichak, 2021). Future philosophical analysis is needed to explore the validity and moral status of such technology within the context of its pseudo-scientific predecessors. Although there are many types of FRT programs that could be used by law enforcement in diverse contexts, the argument of this manuscript applies in equal measure, mutatis mutandis, to any of these uses, should it turn out that skewed data sets give rise to biased outcomes.
At this point, two separate claims have been established. First, American law enforcement uses FRT on a large scale. Second, AI, and more specifically, FRT algorithms and data sets demonstrate racial bias. These two claims on their own do not yet establish that the specific FRT algorithms in use by law enforcement are themselves biased. A review of the current research suggests that law enforcement FRT is indeed biased along racial and gender lines. The Garvie report cites Klare et al.’s 2012 study, which was co-authored by an FBI expert. This study evaluated three different commercially available FRT algorithms, which were in use by the Los Angeles County Sheriff, the Maryland Department of Public Safety, the Michigan State Police, the Pennsylvania Justice Network, and the San Diego Association of Governments (SANDAG), which runs a system used by 28 law enforcement agencies within San Diego County. The FRT algorithms were 5–10% less accurate for African Americans than for Caucasians. According to Garvie et al. (2016, p. 54), “… this effect could lead the police to misidentify the suspect and investigate the wrong person. Many systems return the top few matches for a given suspect no matter how bad the matches themselves are. If the suspect is African American rather than Caucasian, the system is more likely to erroneously fail to identify the right person, potentially causing innocent people to be bumped up the list—and possibly even investigated. Even if the suspect is simply knocked a few spots lower on the list, it means that, according to the face recognition system, innocent people will look like better matches”. The Garvie report also states that its authors interviewed engineers from two of the leading FRT vendors in 2016, both of which have contracts with law enforcement agencies. The engineers confirmed that their respective companies did not explicitly test their FRT algorithms for racial bias (Garvie et al., 2016, p. 55).
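The ranking mechanism Garvie et al. describe can be illustrated with a small, hypothetical simulation. The sketch below (all numbers invented for illustration) models lower accuracy for a demographic group as extra noise in the extracted features and shows that, as the noise grows, the true match’s average rank in the candidate list worsens, meaning innocent people outrank it more often.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_rank_of_true_match(noise_sd: float, n_gallery: int = 1000,
                            dim: int = 128, trials: int = 500) -> float:
    # Average rank of the true identity in the candidate list when the
    # matcher's feature noise is noise_sd (a toy proxy for accuracy).
    gallery = rng.normal(size=(n_gallery, dim))
    gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
    ranks = []
    for _ in range(trials):
        # The probe photo is a degraded copy of enrolled face 0.
        probe = gallery[0] + rng.normal(scale=noise_sd, size=dim)
        scores = gallery @ probe                       # similarity to every face
        ranks.append(int((scores > scores[0]).sum()))  # 0 = top of the list
    return float(np.mean(ranks))

# Hypothetical accuracy gap: the same matcher is "noisier" for one group.
print(mean_rank_of_true_match(noise_sd=0.3))  # more accurate group: true
                                              # match stays near the top
print(mean_rank_of_true_match(noise_sd=0.6))  # less accurate group: many more
                                              # innocent faces outrank the
                                              # true match
```

The simulation makes vivid why a fixed-length candidate list is the crux: the system always returns its top matches, so degraded accuracy for one group silently converts into more innocent members of that group appearing as “better matches.”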
Finally, in June 2020, Wired.com, NPR, and other mainstream news outlets reported on the arrest and detainment of Robert Williams, the first man in the USA known to be mistakenly accused and detained by police due to a racially biased FRT program used in a criminal investigation (Allyn, 2020; Fussell, 2020). According to those news reports, Williams’ arrest was prompted when the Michigan State Police crime lab ran security footage of a theft at a Michigan retail store through an FRT program. The FRT program mistakenly matched the image of the theft suspect in the security footage to Williams’ driver’s license photo. Williams was detained for 30 hours and released on bail, and the case was eventually dropped by the Wayne County prosecutor’s office due to insufficient evidence (Allyn, 2020). This recent event provides a real-life look at the unfortunate effects of law enforcement’s use of biased FRT programs, and it has prompted some state and local governments in the USA to take legislative actions that include policy debates, regulations, and moratoriums (DeCosta-Kilp, 2020; Ryan-Mosely, 2020; Stein, 2020). Despite these state and local legislative actions, as of 2020, no federal law in the USA directly regulates the use of biometric technology, including FRT (Ringrose & Ramjee, 2020).
Given the Garvie report and Buolamwini and Gebru’s recent study on racial and gender bias in FRT algorithms, along with the Robert Williams case, the following propositions are true:
1. American law enforcement agencies have used and currently use FRT on a large scale to fight crime.
2. The specific FRT programs in use by American law enforcement agencies demonstrate racial and gender bias.