Audit studies are a type of field experiment used to test for discriminatory behavior. In such experiments, one characteristic of individuals distinguishes the treatment and control groups, and the experiment is designed to gauge whether the groups are treated differently because of this characteristic. For instance, in one of the most common applications in criminology, two otherwise similar fictitious job applicants apply for the same jobs, with the applicants distinguished by the presence of a criminal record (see, e.g., Pager 2003). Researchers then measure differential responses to the fictitious job applications—e.g., an invitation to interview for the job—as an indicator of discrimination based on the mark of a criminal record. Through the creation of equivalent applicant backgrounds across the fictitious job applicants, audit methods reduce the problems of omitted variables and selection bias often found in studies of discrimination.
Whereas much of the application of audit methods in criminology focuses on the stigma of a criminal record, here we turn the lens to the potential stigma of being a police officer. With widespread social unrest from police violence, police as a whole may become stigmatized. Affiliation with the police may, to some individuals, signal a blemish of individual character (Goffman 1963), for instance that the individual is racist, overly aggressive, and unable to treat people with fairness and respect. Such qualities may be viewed as undesirable in the labor market. Accordingly, using a matched pairs design, we compare the likelihood of getting an affirmative response from a prospective employer to a job application from a fictitious former police officer (the treatment condition) to the response to one of two control conditions: a former firefighter or a former code enforcement officer. We signal the nature of prior employment through the job experience listed on the applicant’s resume and job application. We chose firefighting and code enforcement as control conditions given their similarity in requisite professional skillsets, a choice we return to in our concluding section.
We conducted our audit test at two separate time points: first in a period absent social unrest from police violence, and second during the heightened unrest following the killing of George Floyd on May 25, 2020. Under normal circumstances, this before-after design might allow for a natural experiment to examine whether incidents of publicized police brutality produced a change in the amount of differential response to former police officers in the labor market. However, the second time period of our data collection coincided with the fallout of the COVID-19 pandemic and the corresponding damage to the labor market, so the results should be interpreted cautiously.
We pre-registered the design of our study on Open Science Framework (OSF), where our data and code are archived (https://osf.io/p8je9/).
Sampled cities
We sampled employers in two Northeastern cities: Boston and Philadelphia. Resource constraints and the COVID-19 lockdown prevented us from extending the study to the Midwest, South, and West. To select these two cities, our sampling frame included the primary city in each metropolitan area in the Northeast that had a population size greater than 1 million. We only sampled cities in large metropolitan areas in order to ensure a sufficiently large number of potential jobs for which to apply, and to avoid diluting the job market with fictitious applications. Using information from the Fatal Encounters data repository (http://www.fatalencounters.org/), we computed the rate of deaths by the police in each large city. We then ranked and split the cities into two groups at the median: a “low” police violence group and a “high” police violence group. We then randomly selected one city from the “low” group (Boston) and one from the “high” group (Philadelphia).
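The selection procedure described above (rank cities by rate, split at the median, draw one city at random from each half) can be sketched as follows. This is a minimal illustration, not the authors' code: the city labels and rates are placeholders rather than Fatal Encounters figures, and assigning a city exactly at the median to the "low" group is an assumption.

```python
import random
from statistics import median

def select_cities(rates, seed=None):
    """Split cities at the median rate and draw one at random from each half.

    `rates` maps a city label to its rate of deaths by police.
    Cities at or below the median go to the "low" group (an assumption).
    """
    cutoff = median(rates.values())
    low = sorted(city for city, rate in rates.items() if rate <= cutoff)
    high = sorted(city for city, rate in rates.items() if rate > cutoff)
    rng = random.Random(seed)
    return rng.choice(low), rng.choice(high)

# Placeholder values for illustration only; not real data.
rates = {"City A": 0.8, "City B": 1.5, "City C": 2.1, "City D": 3.4}
low_pick, high_pick = select_cities(rates, seed=0)
```

Passing a seed makes the draw reproducible, which helps when a selection step is documented in a pre-registration.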
Characteristics of fictitious applicants
In developing the fictitious resumes for the experiment, we created 12 distinct profiles that varied by profession (police, firefighter, or code enforcement), gender (male or female), and race (Black or White). We signaled race and gender through first and last names (see Footnote 1). When applying for a given job, race and gender of the two applicants were identical, and we randomly determined which applicant race, gender, and control condition to use. For all job openings, one of the applicants had most recently been employed as a police officer, with the other applicant most recently a firefighter or a code enforcement officer (see Footnote 2).
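As a rough sketch of this design, the 12 profiles are the full crossing of three professions, two genders, and two races, and each job receives a police applicant paired with a control applicant of the same race and gender. The function name `draw_pair` and the uniform random draws are assumptions made for illustration.

```python
import itertools
import random

PROFESSIONS = ("police", "firefighter", "code enforcement")
GENDERS = ("male", "female")
RACES = ("Black", "White")

# 3 professions x 2 genders x 2 races = 12 profiles
profiles = list(itertools.product(PROFESSIONS, GENDERS, RACES))

def draw_pair(rng):
    """Return (treatment, control) profiles for one job opening.

    Race and gender are shared within the pair; the control profession
    is drawn at random from the two control conditions.
    """
    gender = rng.choice(GENDERS)
    race = rng.choice(RACES)
    control = rng.choice(("firefighter", "code enforcement"))
    return ("police", gender, race), (control, gender, race)

treatment, control = draw_pair(random.Random(42))
```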
We meticulously developed resumes in order to ensure similarity across background characteristics, but without exactly matching resume information and style. For residential addresses, we used real street names but fictitious street numbers in two different neighborhoods in the given city with similar race-ethnic compositions and poverty levels. For educational background, we selected public high schools characterized by similar race-ethnic composition and a similar percentage of students qualifying for reduced-price or free school meals. All applicants graduated from high school in May or June 2013. Given an approximate birth year of 1995, our experiment necessarily focuses on individuals who had relatively brief stints as police officers, roughly 3 years, before separating from the profession.
Prior to becoming a police officer, firefighter, or code enforcement officer, our fictitious applicants worked as delivery drivers and as sales clerks/cashiers in retail businesses. Specific elements of the resumes (e.g., addresses, high school attended, graduation date, prior jobs, and skills) were randomized for each job opening.
Characteristics of targeted jobs and employers
Our design is a version of an audit study known as an online correspondence test, in which we applied to online job advertisements. We submitted fictitious applications to job openings advertised in the Boston and Philadelphia metropolitan areas on Indeed.com and Craigslist.org. We applied to four different types of jobs: (1) skilled trades, such as electricians or painters; (2) drivers; (3) retail sales; and (4) office and customer support. We excluded jobs related to security and law enforcement. We expected that for these four occupations, former police, firefighters, and code enforcement officers would be similarly qualified. Hence, if there is preferential treatment of one group over another, it may be due to the stigma of their prior profession rather than their actual qualifications.
We targeted newly listed openings occurring within the preceding week. We did not apply to jobs requiring a social security number. We only applied to one position per employer. For companies with multiple branches and sites, we only applied to one position per branch. Based on these criteria, we selected jobs by convenience, as opposed to randomly sampling from a listing of all job openings matching our eligibility criteria.
When applying for jobs, we randomized the order of the applications, i.e., whether the first application was sent from the treatment condition (police) or the control condition (firefighter or code enforcement). We submitted the second application approximately 3 hours after the first one. For openings found on Craigslist, we submitted applications via email, including a brief cover email and an attached resume. We designed the content of the emails to be similar but not identical across the treatment and control cases, and randomized which version was sent by the treatment vs. control condition. For Indeed.com openings, we submitted job applications either through its platform or through a linked company site. These submissions typically required completion of an online form that duplicated the content of our attached resume. If a cover letter was required, we replicated the contents of the emails.
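The order randomization and the roughly 3-hour gap between the two submissions can be illustrated with a short sketch. The function name `schedule_pair`, the fixed 3-hour offset, and the example timestamp are all hypothetical; the text says only that the second application followed the first by approximately 3 hours.

```python
import random
from datetime import datetime, timedelta

def schedule_pair(start, rng, gap_hours=3):
    """Randomize which condition applies first and time the second submission.

    Returns two (condition, send_time) tuples in sending order.
    """
    order = ["treatment", "control"]
    rng.shuffle(order)  # random order of the two conditions
    second_time = start + timedelta(hours=gap_hours)
    return [(order[0], start), (order[1], second_time)]

# Hypothetical start time, for illustration only.
first, second = schedule_pair(datetime(2019, 6, 3, 9, 0), random.Random(1))
```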
Outcome variable
Each of the 12 different applicant profiles used in the study had an email account and a unique phone number, which could receive texts and voicemails. Our outcome variable measures whether the fictitious applicant received an affirmative response to the application, either an invitation to set up an interview or a more informal request to discuss the job. We excluded immediate auto-generated responses acknowledging receipt of the application. If an applicant received an affirmative response, we set the value of the outcome variable equal to one. For non-responses or negative responses, we set our outcome variable equal to zero.
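The coding rule for the outcome variable can be written as a small function. The response labels below ("auto_ack", "affirmative", "negative") are hypothetical categories introduced for illustration; the text does not describe how raw responses were labeled.

```python
def code_outcome(responses):
    """Return 1 if any substantive response is affirmative, else 0.

    `responses` is a list of labels for everything an employer sent back.
    Auto-generated acknowledgements are dropped before coding, so an
    applicant with no responses, or only negative ones, is coded 0.
    """
    substantive = [r for r in responses if r != "auto_ack"]
    return 1 if "affirmative" in substantive else 0
```

For example, `code_outcome(["auto_ack"])` is coded 0 because the only response is an automated acknowledgement.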
Implementation
As noted, we implemented our experiment at two time points, prior to and following the killing of George Floyd on May 25, 2020. Data from the Crowd Counting Consortium (https://sites.google.com/view/crowdcountingconsortium/home), a collaborative crowd-sourcing effort led by political scientists Erica Chenoweth and Jeremy Pressman, reveal that more than 5000 different anti-racism and anti-police-brutality protest events took place nationwide in the first 6 weeks after Mr. Floyd’s death, with between 15 and 26 million individuals participating in these protests, including large numbers in Philadelphia and Boston (Buchanan, Bui, and Patel 2020; Putnam, Pressman, and Chenoweth 2020). Scholars and experts have characterized the combination of the size, intensity, and frequency of the protests as “unprecedented” (Putnam, Chenoweth, and Pressman 2020), underscoring the potential for differential treatment of the police across the two time periods of our study.
Our first implementation period took place between late May 2019 and March 2020, and was discontinued because of the COVID-19 pandemic. Termination of the data collection occurred on March 11, 2020, roughly 1 week prior to the first stay-at-home order in the USA, in the San Francisco area, and about 2 weeks prior to widespread issuance of stay-at-home orders. Our sample size of jobs in this period equals 605, with 1,210 applications.
We re-launched our data collection on June 4, 2020, 10 days after the killing of George Floyd. The data collection ended on July 16, 2020. In May of that year, both Massachusetts and Pennsylvania began reopening their respective economies after the initial lockdown from COVID-19. However, the necessity of social distancing and minimizing the spread of the virus meant that the state of the economies across our two time periods differed substantially. Our sample size of jobs from this period equals 212, with 424 applications.
Analysis plan
In the analysis to follow, we first present descriptive results of the proportion of applications for which police, firefighter, and code enforcement applicants received an affirmative response, separately for our two time periods. We then use McNemar’s (1947) test to draw statistical inferences. In an online Technical Appendix, we supplement these analyses by estimating linear probability models of the likelihood of an affirmative employer response, to facilitate examination of possible heterogeneous effects by applicant race and gender.
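The descriptive step amounts to computing the share of affirmative responses within each period and profession. A minimal sketch follows, assuming each application is stored as a `(period, profession, outcome)` tuple; that record layout and the toy records are illustrative and are not the study's data.

```python
from collections import defaultdict

def response_rates(applications):
    """Proportion of affirmative responses by (period, profession)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [affirmative count, n]
    for period, profession, outcome in applications:
        totals[(period, profession)][0] += outcome
        totals[(period, profession)][1] += 1
    return {key: yes / n for key, (yes, n) in totals.items()}

# Toy records for illustration only.
toy = [
    ("pre", "police", 1),
    ("pre", "police", 0),
    ("pre", "firefighter", 1),
    ("post", "police", 0),
]
rates_by_group = response_rates(toy)
```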
McNemar’s test, which is applied to a 2 × 2 contingency table, is a Chi-square test of goodness of fit for paired nominal data that can be used to compare the distribution of employer responses expected under the null hypothesis to the actual observed responses. Table 1 illustrates the test. In this table, $n_{ab}$ represents the number of cases in a given cell, and the corresponding proportion is $p_{ab} = n_{ab}/n$, where $n$ is the total number of matched pairs. The first subscript represents the outcome for the control condition in the rows (i.e., prior employment as a firefighter or code enforcement officer). The second subscript denotes the outcome for the treatment condition (i.e., prior employment as a police officer). An affirmative response equals 1 and a negative response equals 0.
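To make the cell notation concrete, the sketch below tabulates matched pairs into the four cells, keyed by (control outcome, treatment outcome), and converts the counts to proportions. The function name `paired_table` and the toy pairs are assumptions for illustration.

```python
from collections import Counter

def paired_table(pairs):
    """Tabulate (control, treatment) outcome pairs into a 2 x 2 table.

    Keys are (a, b): a is the control outcome, b the treatment outcome,
    with 1 = affirmative response and 0 = negative or no response.
    Returns the cell counts n_ab and the proportions p_ab = n_ab / n.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one matched pair is required")
    counts = Counter(pairs)
    table = {(a, b): counts.get((a, b), 0) for a in (1, 0) for b in (1, 0)}
    props = {cell: count / n for cell, count in table.items()}
    return table, props

# Toy pairs for illustration only.
toy_pairs = [(1, 1), (1, 0), (1, 0), (0, 1), (0, 0), (0, 0)]
table, props = paired_table(toy_pairs)
```

Here `table[(1, 0)]` counts jobs where the control applicant received an affirmative response and the police applicant did not.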
Table 1. 2 × 2 contingency table for McNemar’s test
Our focus with McNemar’s test is on the discordant pairs, $n_{10}$ and $n_{01}$. More specifically, McNemar’s test assesses the null hypothesis of marginal homogeneity, $\pi_{1+} = \pi_{+1}$. Because $\pi_{1+} = \pi_{11} + \pi_{10}$ and $\pi_{+1} = \pi_{11} + \pi_{01}$, this is equivalent to testing whether the difference between the two discordant proportions equals 0, i.e., $\pi_{10} = \pi_{01}$ (Vuolo, Uggen, and Lageson 2016). Accordingly, we are assessing whether the proportion of job applications for which a former police officer received an affirmative response from the employer but not the former firefighter is the same as the proportion of job applications for which a former firefighter received an affirmative response from the employer but not the former police officer. The test statistic is:
$$ {\chi}^2=\frac{{\left({n}_{10}-{n}_{01}\right)}^2}{\left({n}_{10}+{n}_{01}\right)} $$
A statistically significant Chi-square test would provide evidence to reject the null hypothesis, interpreted to mean that the likelihood of an affirmative response to a job application differs between the treatment (i.e., former police officer) and the control condition (former firefighter or code enforcement officer).
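A minimal sketch of the computation follows, using the uncorrected statistic given above and a Chi-square reference distribution with 1 degree of freedom. The counts passed in are illustrative and are not results from the study; the p-value uses the identity that, for 1 degree of freedom, the upper tail probability equals `erfc(sqrt(chi2 / 2))`.

```python
import math

def mcnemar(n10, n01):
    """McNemar's test from the two discordant counts (no continuity correction).

    Returns the Chi-square statistic and its p-value on 1 degree of freedom.
    """
    discordant = n10 + n01
    if discordant == 0:
        raise ValueError("the test is undefined with no discordant pairs")
    chi2 = (n10 - n01) ** 2 / discordant
    # Upper tail of a Chi-square(1) variable: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Illustrative counts only.
chi2, p_value = mcnemar(30, 18)
```

With few discordant pairs, an exact binomial version of the test is a common alternative to the Chi-square approximation.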