1 Introduction

In this paper, we examine whether the prevalence of colorism in India can be linked to discrimination in hiring for people with darker skin shades. Colorism, the preference for lighter skin tones even among non-white majority populations, has a long and contested history in India. In the more recent past, this preference for lighter skin became amplified by the rapidly growing skin whitening product industries’ efforts to extend colorism beyond a beauty ideal and link it to economic success, specifically labor market success. However, the existence of such a link is yet to be explored given the lack of skin tone–specific data in India. In this study, we implemented an experimental survey design to overcome this lack of data. Such experimental studies have yet to be used in the context of colorism in India. Our study included 275 participants who were asked to evaluate job candidates on the basis of unchanging resumes paired with photographs manipulated to vary skin tones. We did not find a statistically significant bias in favor of resumes paired with lighter-skinned photographs. Overall, participants tended to evaluate both lighter-skinned and darker-skinned candidates similarly. Our findings provide an important counter-narrative to the skin whitening industries’ prolonged efforts to expand their consumer base by linking lighter skin to economic success.

The relationship between skin shade and economic outcomes has been investigated in the USA, where colorism co-exists with racism and long-standing practices of racial discrimination. Empirical evidence (Goldsmith, Darity and Hamilton 2007; Monk 2014; Painter, Holmes and Bateman 2016; Hersch 2008) has shown that there is a link between skin shade and outcomes in a range of economic and social contexts in the USA. However, empirical evidence has yet to be examined in the case of outcomes of colorism globally and specifically in India, where there is not a concurrent connection between colorism and historically overt racially discriminatory laws. Examining this link now is important given the global skin whitening industry’s efforts to intensify the focus on the perceived socio-economic benefits of lighter skin. Such efforts, if continued unchecked and unchallenged, might in fact create the feedback loops that generate unconscious biases and group-based discriminatory outcomes. As such, the efforts of the industry represent an attempt to open a new global frontier in stratification based on skin color.

Our findings suggest that colorism in India currently cannot be easily linked to direct instances of discrimination in the case of hiring. This presents a direct challenge to the skin whitening industry’s efforts to emphasize such economic discrimination. Differential outcomes due to preference for skin color though might still operate in other social contexts like marriage and family or health outcomes and in situations where beauty ideals are more relevant.

These findings also add context to the discussion about the linkages between colorism and the caste system in India. Caste privilege in India is often described as being akin to racism in that it is the source of pervasive differences in nearly all socio-economic outcomes. However, unlike the linkages between racism and colorism, the racial or skin color–based origins of caste are contested (Misra 2015). Many have argued that the caste system predates the racial stratifications derived from colonialism and slavery. Moreover, geographical variations in skin color in India cut across class lines, and differences in skin shades are not necessarily the defining feature of caste identity. Our findings present new data to support the assertion that skin shade in itself is not a stand-in for caste and that it might not yet have the kind of structural economic consequences that it does when combined with a history of racial discrimination. Such clarifications can add important context to stratification economics studies (Deshpande 2011) that seek to isolate the dynamics of caste and other such specific intergroup disparities.

Section 2 presents an overview of the history of colorism in India and the new role of the skin whitening products industry in intensifying global colorism. This is followed by a discussion of the literature on discrimination and the measurement of discrimination in Section 3, including a literature review of empirical evidence for skin tone discrimination in the USA. In Section 4, we describe the experimental survey and data collection method for this study, followed by a discussion of the results in Section 5.

2 Colorism and the modern skin whitening industry

The renewed global focus on racism following the death of George Floyd and the Black Lives Matter protest movements in the USA in May 2020 generated fresh scrutiny of the vast skin whitening products industry in India. Under pressure, the Anglo-Dutch multinational Unilever announced that it would change the name of its iconic “Fair and Lovely” face cream in India (Footnote 1), dropping the term “fair,” a common Indian euphemism for lighter skin. There was, however, much skepticism about whether a mere change of name would actually change entrenched industry practices of promoting light skin, given similar cosmetic responses to past advocacy pressures. Skin whitening products are big business in India. According to a 2011 World Health Organization report (WHO 2011), 61 percent of the dermatological market in India consists of skin whitening products. The range of products has only become larger since. According to marketing reports, the skincare products market grew by an annual rate of about 20 percent between 2012 and 2016, nearly double the growth rate for China, the largest market in the region. Valued at close to 2 billion dollars in 2016, the market is dominated by facial care products and, as the WHO report indicates, a majority of these products claim skin whitening benefits. The trends in India were part of a rapidly expanding global market. Before the renewed conversations around racism, a few different market research reports had indicated an even more rapid growth in the near future. For example, one market forecast projects that the global market for skin whitening will reach about $24 billion (US) by 2027 (Future Market Insight, 2017); another puts the figure at $31.2 billion (US) by the year 2024 (Global Industry Analysts 2018).

Paralleling the rise of this industry, there has also been an expanding conversation on colorism (Sadat 2015). Colorism represents a new context to racism, where even among non-white majority populations, power and privilege become associated with whiteness (Norwood 2014) in a perpetuating cycle. In India, the origins of light skin preference are much debated. Color preference co-exists within the complex heterogeneities of region, caste, and religion in Indian society. The caste hierarchy, originally thought to be derived from occupational categories, is particularly the defining feature of group identities in India. The literal meaning of the Sanskrit language word for the original caste classification “varna” is categories. However, as is common in Sanskrit, words can have multiple interpretations and varna is sometimes also interpreted as color. This has led to discussions about the links between caste and skin color. The varna system, in some interpretations, is said to have been introduced by lighter-skinned Aryan tribes from Central Asia who made incursions into the Indian subcontinent. However, there is considerable disagreement about this supposed link between caste and skin color.

Misra (2015), Deshpande (2011), and Parameswaran and Cardoza (2009) review much of the anthropological and historical literature that affirms that the Indus Valley Civilization predates Aryan arrival and that there is no clear-cut evidence of a wholesale replacement of pre-existing cultural configurations by the Aryans. This historical evidence therefore does not rule out the possibility that caste hierarchies might have predated the arrival of the supposedly lighter-skinned Aryans. There are also no references to skin color in the original texts that first delineated the varna or caste system. Moreover, in the contemporary expanded social stratification of “jatis,” which are well over 3000 groups derived from the original varna classification, there is a varied and intricate hierarchy that cannot be neatly linked to a linear progression of skin tones. Deshpande (2011) also argues that there is considerable variation in skin color across geographical regions in India and these geographical variations in skin shade transcend caste lines.

Rather than the caste system, the aspiration for white skin can be more directly traced to colonialism much in the way that racism originates with slavery and colonialism. It is with the arrival of the British colonialists that we see specific codified color lines. Unlike previous waves of incursions, the British, with their distinct whiteness, specifically emphasized the separation between themselves and the Indians. A large body of historical and socio-cultural literature has documented the British emphasis on whiteness as a form of racial superiority and their justification of colonization as a mission to civilize the non-white Asian and African populations (Alatas 1977; Said 1994). Specific demarcation of access to educational institutions, private clubs and restaurants between Whites and Indians, the labeling of British versus Indian living spaces as white and black towns, and the exclusion of Indians from the colonial power structure solidified the association of white with power, privilege, and overall social superiority. The elevation of whiteness in India became overt and firmly established during the 200 years of British colonialism. In post-Independence India, this transformed into a more generalized aspiration for lighter skin tones.

This aligns with much of the literature where the origin of colorism is traced to the same myths of Caucasian, White superiority that came about as a result of slavery and European colonialism. Washington (1990) describes colorism as a legacy of the continued economic and cultural hegemony of the West in the post-colonial world, where non-white, brown individuals internalize the Western world view of superiority of the Caucasian/White race. They therefore experience a psychological need to distance themselves from the blacks as they aspire to the higher status accorded to the “lighter-skinned” individuals in the hierarchy. Similarly, Hall (2018) adds that even as modern laws have deemed discrimination on the basis of race illegal, colonial ideology about racial superiority persists through the globalization of lighter skin preference. The lighter skin tone in different populations is seen as a proxy for the Caucasian race. Hall (2018) therefore calls for critical skin theory (CST) instead of critical race theory (CRT) to spotlight the continued global impact of racial hierarchies stemming from colonialism. Skin shade variation therefore represents an emerging global frontier for examining intergroup inequalities as emphasized in stratification economics (Darity 2005).

The end of colonial rule dissolved the segregation of spaces based on color lines in India, unlike in the USA where the legal structures of racial segregation persisted till the civil rights movement and became embedded in the institutional structure of society. Though such legal and institutional racism ended with the disbanding of colonialism, the aspiration for lighter skin persisted in India. The original market for skin whitening products included various bleaching products. In 1975, Unilever introduced the “Fair and Lovely” fairness cream, touting its melanin-blocking properties as a first-of-its-kind safe alternative to bleaching (Footnote 2). Fair and Lovely quickly became a mainstay of the Indian cosmetic market. The initial marketing was primarily aimed at women and propagated “fairness” as a beauty ideal for women, particularly younger women on the cusp of entering the marriage market (Dhillon-Jamerson 2019).

Beginning in the 1990s, new economic forces of open markets, relaxation of government controls, and greater global integration drove an explosive growth in the variety and range of skin whitening products led primarily by multinational corporations like Unilever and the French multinational L’Oreal (Vijaya 2019). This expanded market was also defined by aggressive marketing efforts to move beyond the beauty ideal aspect of skin shade. Beginning with a controversial Unilever TV commercial in the early 2000s, the use of skin whitening products began to be tied explicitly to outcomes of socio-economic success such as getting jobs, acquiring greater confidence and social acceptance. The newer marketing strategies not only suggested that whiteness equaled success and social mobility but also explicitly linked darker skin to a lesser status and socio-economic stagnation. Unilever withdrew its first ad linking lighter skin to employment due to protests by women’s organizations about its racist imagery (BBC 2003). However, this did not change the general trajectory in the new marketing tactics. Skincare commercials continued to portray lighter skin as a requisite for socio-economic mobility and success. This trend was accompanied by an increased use of harmful bleaching and steroid products, causing periodic public health alerts by medical professionals (Hogade and Fatima 2017).

Such aggressive marketing of skin whitening as the key to success is not unique to India. Parallels exist in Asia and Africa, where the skin whitening market has expanded greatly in recent years. In East and West Africa, for example, multinational corporations have used the same marketing tactics. The German multinational Beiersdorf’s Nigerian ad for its Nivea fairness cream generated criticism for its blatant portrayal of black skin turning white and its association of white skin with social success (Vijaya 2019). Public health concerns about widespread use of skin whitening products have led the East African Legislative Assembly to recommend a regional ban on cosmetics with skin bleaching agents. In response, multinational corporations have sought to portray their products as “natural” and therefore safer melanin-blocking alternatives to bleaching (Vijaya 2019). Referring to melanin as an abnormality that needs to be blocked or controlled has added a new dimension to the normalization of the global color hierarchy.

However, even as these perceptions of success associated with lighter skin have been propagated, empirical evaluation of such links has been scarce. It is not clear to what extent preference for white skin translates into skin shade discrimination against those with darker skin tones in specific economic realms. While it is important to highlight such discrimination where it exists, it is equally important to challenge the suggestion of such disadvantage propagated by the industry, if such disadvantage does not exist.

Though skin preference may not have derived from the caste system and did not originally represent a group identity among the non-white Indian population, did the colonial era associations of light skin with power and prestige evolve, over time, into group privileges for lighter-skinned individuals and discrimination against darker-skinned individuals? In the absence of ingrained legal and institutional segregation based on skin color, the mechanisms through which lighter skin could translate into specific economic outcomes of advantage and discrimination need to be examined and evaluated. In the following section, we explore the literature on group identity–based discrimination and the measurement of such discrimination in other contexts.

3 The mechanism of discrimination and measurement of colorism

Theoretical models of discrimination in the economics literature center broadly on two modes originally described in the work of Gary Becker—a taste for discrimination and statistical discrimination (Neumark 2018; Small and Pager 2020). In the first case, employers are willing to pay a price to reduce association with a group against whom they are prejudiced. In this case, discrimination is conscious and overt and results in inferior outcomes such as reduced employment options for the groups that are discriminated against. It also results in a cost for the discriminating employer, who will have to pay more to hire persons exclusively from the preferred or privileged group. So, if employers have a preference for lighter-skinned individuals, we will see higher wages and employment rates for this group in comparison to similarly qualified darker-skinned individuals. Such discrimination based on overt taste or preferences is thought to be short-lived since in the long run competitive employers who take advantage of lower labor costs of the discriminated group are expected to outcompete employers with a taste for discrimination. Such overt discrimination is also less likely and harder to measure in the context of anti-discrimination laws and in cases where group identities are less binary (black and white or male, female) and more on a continuum of skin tones.

In the case of statistical discrimination, employers or others in positions of power make judgements based on perceptions about group characteristics when they have limited information about individuals. For example, in hiring decisions, employers often lack full information about a candidate’s productive capacity. They might therefore resort to making decisions based on their perceptions about group characteristics. In this case, statistical discrimination is viewed as a way to hedge the higher cost of hiring a potentially less productive employee. In the case of white skin preference, skin tone might act as a proxy for unobserved data about potential employees if lighter skin is generally associated with superior capabilities and higher productivity.

More recently, Bertrand et al. (2005) discussed a third model of discrimination—implicit discrimination. In implicit discrimination, discrimination is unintentional and occurs outside of the conscious awareness of the discriminator. It often occurs in situations where there is ambiguity about a decision, for example, in the case of deciding who among equally qualified candidates will make a good employee. Bertrand et al. (2005) described the use of the Implicit Association Test (IAT) that measures implicit attitudes about social groups. IAT test scores can in turn predict discriminatory choices that result from unconscious or implicit biases against social groups based on for example race or gender.

Goldsmith et al. (2006) linked discrimination based on white skin preference to social categorizations of in- and out-groups. According to social identity theory (Tajfel & Turner, 1979), individuals tend to mentally categorize people as belonging to an in-group or an out-group, based on their similarities or differences with them, respectively. Individuals then tend to identify with their in-group members and conform with this identity. This is followed by social comparison, where people compare their in-group to the out-groups and, to maintain and boost their self-esteem, favor their own group and discriminate against the out-group. However, when a low-status group accepts the superiority of a high-status group, the members of the low-status group may in fact show a preference for the high-status group, even when it is an out-group. While in-group privilege might generally be distributed across society, since most people will belong to both in-groups and out-groups, particular in-groups might have more power and resources than other groups. The long colonial associations of light skin with higher status therefore could lead even those in non-white populations to view lighter skin as higher status and more trustworthy or deserving. There is evidence to show that members of different groups (whites, blacks, and Latinos) judge darker-skinned members of both out-group and in-group more stereotypically (Uhlmann et al. 2002). More negative traits were associated with the darker-skinned individuals by both black and white participants compared to the positive and counter-stereotypic traits of those with lighter skin (Maddox & Gray, 2002).

Beyond these associations of traits, measuring the specific economic outcomes of the various models of discrimination has a long tradition in economic research. The key to such empirical estimations of outcomes is distinguishing between outcomes that result from discrimination and outcomes that come about as a result of other kinds of differences among individuals. Traditional methods have focused on using large observational datasets to perform regression decomposition analysis. In this method, observable factors that could influence individual productivity, such as level of education and years of experience, are controlled for using a regression equation. This allows the estimation of a residual gap, for example a gap in wages that cannot be explained by differences in observed productivity factors and can therefore be interpreted as evidence of discrimination based on group identity. Such residual gap analysis is common in the gender and racial employment and wage gap literature since such primarily self-reported group identity information is readily available in large-scale datasets (Neumark 2018). In the case of colorism, where differences in skin tone lie along a spectrum rather than forming categorical group identities, data is less readily available.
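The residual gap logic described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the workers, the wage rule, and the built-in penalty of 1.5 are synthetic, and the helper names (`solve_linear_system`, `ols`) are hypothetical. It is not the estimation procedure of any study cited here, only a small instance of regressing wages on observed productivity factors plus a group indicator.

```python
# Synthetic illustration only: none of these numbers come from a real survey.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def mean(values):
    return sum(values) / len(values)


# Hypothetical workers: (years of education, years of experience, group).
# Group 1 is given two fewer years of schooling, so part of the raw gap
# is explained by an observed productivity factor.
workers = []
for educ in (10, 12, 14, 16):
    for exper in (1, 3, 5, 7):
        workers.append((educ, exper, 0))
        workers.append((educ - 2, exper, 1))

# Assumed wage rule with a built-in penalty of 1.5 for group 1.
wages = [10 + 2.0 * e + 0.5 * x - 1.5 * g for e, x, g in workers]

# Raw gap: difference in mean wages, ignoring productivity factors.
raw_gap = (mean([w for w, (_, _, g) in zip(wages, workers) if g == 1])
           - mean([w for w, (_, _, g) in zip(wages, workers) if g == 0]))

# Regress wages on a constant, education, experience, and the group dummy.
design = [(1.0, e, x, g) for e, x, g in workers]
coefficients = ols(design, wages)
residual_gap = coefficients[3]  # gap left after controlling for observables

print(f"raw gap: {raw_gap:.2f}, residual gap: {residual_gap:.2f}")
```

In this synthetic setup the raw gap mixes the schooling difference with the built-in penalty, while the coefficient on the group indicator recovers only the penalty. With real survey data, unmeasured differences between respondents would also load onto that coefficient.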

Due to this data limitation, empirical estimation of discrimination based on colorism comes primarily from the USA, where a few survey datasets have noted skin shade differences among respondents. To study the impact of white preference on wages, Goldsmith et al. (2006) used data from a multicity urban inequality survey in which interviewers noted the skin shade of respondents on a graded Likert scale. They found that the interracial or black-white wage gap was narrower for lighter-skinned African American men in comparison to darker-skinned African American men. Darker-skinned African American men suffered a wage penalty in comparison to both white men and lighter-skinned African American men. Other studies have also found that lighter-skinned African Americans, Asians, and Latin Americans in the USA have greater access to education, employment, and wealth creation opportunities in comparison to darker-skinned individuals from the same ethnic group (Monk 2014, Painter et al. 2016, Hersch 2008). In these studies, observational data from national-level surveys which recorded skin tone variations of respondents were the key to the empirical analysis.

Such survey data, though, are rare in the Global South. There are a few notable exceptions, such as research carried out in Brazil. For example, some researchers have used the 1991 Brazilian census, which asks individuals to choose from five skin-color options for the skin-color or race question (Rangel 2015). Additionally, the Latin American Public Opinion Project provided a nationally representative sample survey in which interviewers rated the skin color of respondents (Monk 2016). In the case of India, there are no publicly available survey datasets which record skin tones of respondents. Mishra (2015) examined the impact of colorism on social acceptability among Indians by conducting a survey with male and female college students and a focus-group interview with women from different regions of the country. Irrespective of their own skin tone, 74% of those surveyed agreed that lighter-skinned individuals were more acceptable in society. A majority of these individuals also perceived lighter-skinned people to have a higher status in society. A larger proportion of women than men wanted to have a lighter skin tone, and a larger proportion of men wanted to go out with lighter-skinned women. Interestingly, lighter skin tone was not perceived to be directly related to education or caste. This study, though, focuses primarily on self-reported perceptions and therefore does not evaluate outcomes of discrimination based on skin color preference.

In a more qualitative approach, Sims and Hirudayaraj (2015) interviewed six Indian women to explore the impact of colorism on their career aspirations and opportunities. They used phenomenological inquiry to examine the lived experiences of Indian women who had experienced colorism in their lives. The women reported a stifling of career aspirations, especially in professions involving interaction with customers. Colorism affected the self-confidence of the dark-skinned individuals, who reported feeling inferior to their light-skinned peers. The authors believe that these women may avoid venturing into certain professions if they feel they would not be hired because of their complexion or skin tone. This self-selection would fall within the purview of pre-market discrimination and is not a direct measure of whether there is discrimination once individuals enter a particular labor market. It is therefore not comparable to studies that evaluate wage gap or employment outcomes from survey data.

Even if traditional survey data were available in the Global South context, evidence of discrimination from such data also suffers from the potential problem of unmeasurable differences in characteristics between respondents. Even though regression analysis controls for the measured differences, the residual and therefore the measure of discrimination is skewed by the unmeasured differences. As an alternative, newer experimental approaches aim to create a synthetic pool of identical labor market participants distinguished only by the group identity being tested for discrimination. Outcomes observed as a reaction to such a pool therefore can potentially be attributed solely to discrimination since all other potential heterogeneities normally observed among candidates have been erased. Neumark (2018) reviews several experimental research projects evaluating discriminatory outcomes based on a range of group identities such as gender, motherhood, age, and obesity status. Participants in these experimental data collection projects are generally presented with hypothetical options about selecting candidates for hiring, promotions, and other labor market outcomes where group status (age, sex, skin color) is manipulated for otherwise identical job candidates to explore latent biases.
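As a rough illustration of why a synthetic pool of identical candidates simplifies inference, the sketch below compares ratings for two versions of an otherwise identical profile with an exact permutation test. This is an assumed analysis chosen for illustration, not the method of any study reviewed here; the function name `exact_permutation_test` and the toy ratings are hypothetical.

```python
from itertools import combinations


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test for a difference in mean ratings.

    Returns the observed difference (mean of a minus mean of b) and the
    share of all relabelings whose absolute difference is at least as large.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    extreme = 0
    relabelings = 0
    # Enumerate every way of assigning n_a of the pooled ratings to group a.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float rounding
            extreme += 1
        relabelings += 1
    return observed, extreme / relabelings


# Toy ratings on a five-point scale (made up for illustration).
similar_diff, similar_p = exact_permutation_test([4, 5, 3, 4], [4, 3, 5, 4])
distinct_diff, distinct_p = exact_permutation_test([5, 5, 5], [1, 1, 1])
print(similar_diff, similar_p)
print(distinct_diff, distinct_p)
```

Because the only thing that differs between the two groups is the manipulated attribute, relabeling the ratings at random is a natural null model. Exhaustive enumeration is only practical for very small groups; larger samples would need random sampling of relabelings or a standard parametric test.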

Harrison and Thomas (2009) used such experimental methods to specifically test the impact of colorism on job selection in the USA. Participants in that study were undergraduate students who were presented with a resume for a person in the marketing field with one of six photos attached at a time. While the resume remained the same throughout, the six possible photos consisted of three photos of the same man with skin tone manipulated to be dark, medium, or light and, similarly, three photos of a woman with the same three skin tone manipulations. Participants were asked to rate the resume on the overall presentation and their perception of the skill, knowledge, and experience levels of the candidate. The study found that for both the male and female photos, the lighter skin–toned photo-resume combinations received statistically significantly higher ratings for overall resume quality, experience level, and hiring decision, in comparison to the darker skin–toned photo-resume combinations.

In the absence of survey data that identifies skin tone variations in India, such experimental research methods are most suited to examine the impact of white skin preference on labor market outcomes. In this study, we follow the example of Harrison and Thomas (2009) in creating a pool of identical resumes matched with photographs manipulated for skin tones. To the best of our knowledge, this is the first experimental study designed to evaluate employment outcomes of skin tone preference in India. In the next section, we describe the details of the study design.

4 Experimental design for light skin preference and hiring bias in India

In this experimental study, we aimed to empirically test the impact of white skin preference on hiring outcomes. Guided by the discussion of implicit bias and its impact on discriminatory behavior, the primary objective was to see whether perceptions of white skin being linked to success lead to a greater propensity to implicitly or unconsciously view lighter-skinned job candidates favorably in comparison to equally qualified darker-skinned candidates. We also consider white skin preference in terms of the in-group privilege explained by Goldsmith et al. (2006). That is, did the colonial era associations and subsequent marketing of skin whitening reinforce the status of lighter-skinned individuals as an in-group with high status such that they are viewed as being more capable and suitable for employment relative to darker-skinned individuals?

Following the Harrison and Thomas (2009) research design, we asked primarily student participants to evaluate resume-photo combinations where the photos have been manipulated for three different skin tones. The research participants were drawn from the student cohort of a highly selective graduate-level Business School in Bangalore in Southern India. Given the level of selectivity, students in the program generally have a few years of work experience beyond their undergraduate education before they start the program and are likely to have participated in hiring processes.

For this study, we developed two resumes: one for an early career marketing professional and one for an early career information technology (IT) professional (Appendix). Since Bangalore is the IT hub of India, often referred to as the Silicon Valley of India, an IT resume was an appropriate choice for our study. We included a marketing resume for comparison with Harrison and Thomas (2009) and also to include a profession which is more client-facing and might therefore place a greater or different emphasis on presentability than IT. The resumes were developed in consultation with a marketing and an IT professional in Bangalore who provided us with several prototype resumes. The resumes we developed are entirely anonymous. They did not include any names or information traceable to any one person. We decided against the use of names since names can have caste associations in India. The dates and work experience information were created by us, though the company names are real to keep the resumes believable. We then developed three versions of two photographs, one male and one female, to be included in the photo-resume combinations. The original photographs were given to us by similarly aged volunteers who were fully informed of the intent and goals of this study and had an academic interest in it. Using the original photo as the medium skin tone, two additional versions, one with a darker skin tone and one with a lighter skin tone, were generated using the photo editor Pixlr. Each photograph therefore had three skin tone variations: dark, medium, and light.

We first ran a pilot study to test reactions to the resumes and to test if the manipulated photos were of comparable quality. For the pilot study, we drew student participants from the same academic institution as our main study. However, these participants were from a Ph.D. program within the same institution whereas our main study consisted of participants from a graduate program in business administration. The two student bodies are relatively separate and therefore the pilot participants are unlikely to reveal details of the study to the main study participants.

4.1 Pilot test

In the pilot study, each participant was shown both the IT and marketing resumes separately without photo attachments. They were then shown one male and one female photograph independent of the resumes. The male and female photographs were paired according to similar skin tones. That is, those shown a darker-toned male photograph were also shown a darker-toned female photograph and so on. Respondents were asked to rate the quality of both the photos on a five-point scale ranging from quite clear to quite unclear. They were asked to rate the attractiveness of both the male and female photos on a five-point scale ranging from very attractive to quite unattractive. Finally, they were asked to estimate the age of the individuals in the two photos. In the first round of the pilot, each pair of male and female photographs were shown to 11 individuals for a total of 33 participants. For the female photographs, we found that the photo quality ratings for dark and medium skin tones were comparable. The dark skin tone photo was rated higher than average on the five-point photo quality scale by 92 percent of the respondents. Similarly, the medium skin tone photo was rated higher than average by 91 percent of respondents. However, the light skin tone female photograph received substantially different quality ratings. Only 64 percent of the respondents rated the light skin tone photo as being above average in terms of quality.

This indicated a difference in the photo quality itself and suggested that the three photographs were not of comparable quality. To address this, we created a different version of the light skin tone photo in Pixlr. This new photograph was used in the second round of the pilot. An additional 10 participants were shown the female light skin photo paired with the male light-skinned photo. With this new version, 100 percent of the participants rated the female light skin photo as being above average in quality. These results gave us confidence that we had been able to fix the photo quality issue. All three male photographs were comparably rated in terms of quality in the initial round of the pilot and therefore did not require any further manipulation.

The participants rated the two resumes, independent of the photographs on the overall quality, the perceived skill, knowledge, and experience highlighted in the resume. There were no specific trends in these ratings. The IT and marketing resumes received very similar and favorable ratings.

4.2 Main study design and implementation

Having addressed the photo quality issue based on the pilot studies, we then moved on to the main study. For the main study, we paired each of the three skin tone variations of the male and female photographs with the marketing and IT resumes for a total of 12 different resume and photo combinations. A total of 273 students participated in the rating of these resumes. Participants were told that the research was intended to understand reactions to resumes and provide feedback about resume best practices. No specific mention was made about biases or implicit biases so that participants were not particularly attuned to the goals of the study. Each participant viewed and rated only one resume-photo combination. This study design was chosen intentionally to ensure that participants would not be consciously thinking about skin color bias while rating the resumes.

Showing each participant more than one photo-resume combination, particularly a mix of lighter or darker-skinned photos, presented a few different challenges. If participants were to be shown the same photograph but in two different skin shades, they would likely be alerted to the idea that the study might be focused on skin shade bias. This awareness might then influence their rating. If the photos were of two different individuals, one with a darker skin tone and one with a lighter skin tone, we would then need to modify the resumes. Having the exact same resume with two different photographs would once again draw attention to the skin tone differences in the photos.

Modifying the resumes, however, would have introduced new elements of difference between the two photo-resume combinations. Since the resumes would no longer be identical, some of the differences in ratings might be based on perceived differences in the resumes. It would be nearly impossible to create two different resumes that would always be perceived as equal. Similarly, it would not be feasible to distinguish the differences in ratings that are based on the resume differences from those that might be due to the differences in the skin shade of the photographs.

Using photographs of two different individuals also introduces other elements of difference in the physical appearance besides skin shade. We therefore followed the study design of Harrison and Thomas (2009) in showing each participant only one photo-resume combination. Doing so allowed us to follow the general principle of experimental studies, where all other potential differences are minimized so that only one particular group characteristic that is being tested for remains different. In our case, viewing only one photo-resume combination at a time allowed us to use the same resume and the photographs of the same individual with only the skin shade being varied across participants. Similar to Harrison and Thomas (2009), this study design allowed us to explore the prevalence of implicit light skin preference in the community at large. Rather than having individuals choose between lighter and darker skin shades more consciously, we focused on evaluating whether lighter-skinned photo-resume combinations received more positive responses from the community overall.

The surveys were implemented through Qualtrics beginning in December 2019. In the initial phase, participants were asked to come to the Behavioral Sciences Lab and take the survey on one of the lab computers. This plan, however, was disrupted by the COVID-19 pandemic shutdowns that began in March 2020. Thereafter, participants were sent the Qualtrics link and could take the survey remotely. The Qualtrics link was enabled to implement a random rotation through the 12 different resumes to ensure that there would be similar numbers of ratings per resume. After completing an informed consent declaration, participants were able to view one resume-photo combination as explained above. They then moved on to the questionnaire where they were asked to rate how favorably they viewed a candidate based on three different criteria: educational background, work experience, and overall content and design of the resume.

In each case, participants rated the candidates on a 5-point Likert scale, with 1 indicating highly unlikely to view favorably and 5 indicating highly likely to view favorably. Furthermore, participants were also asked to specify the likelihood of their recommending the candidate be invited for an interview on the same 5-point Likert scale. Finally, participants were also asked to rate the overall presentability of the candidate on a 5-point scale ranging from poor to excellent. At the end of the ratings, participants were asked to offer suggestions for improving the resume and provide a description of the photo accompanying the resume. Outside of the rating, participants also had to fill out a survey providing some demographic information about themselves including their age, gender, employment experience, and prior experience with a hiring process. We also asked participants to rate their own skin color on a scale ranging from extremely fair to dark-skinned.
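The rotation through the 12 resume-photo combinations described above can be sketched as follows. This is a minimal illustration, not the Qualtrics configuration used in the study: the function names, the block-shuffling scheme, and the seed are assumptions introduced here. The sketch builds the 3 x 2 x 2 design and assigns each of 273 participants to exactly one combination so that group sizes differ by at most one.

```python
import random
from collections import Counter
from itertools import product

SKIN_TONES = ["dark", "medium", "light"]
PHOTO_GENDERS = ["male", "female"]
FIELDS = ["marketing", "IT"]

def build_conditions():
    """All skin tone x photo gender x resume field combinations (3 * 2 * 2 = 12)."""
    return list(product(SKIN_TONES, PHOTO_GENDERS, FIELDS))

def assign_conditions(n_participants, seed=0):
    """Assign one condition per participant by cycling through shuffled blocks.

    Assumed scheme (hypothetical): each block contains every condition once,
    in random order, which keeps the group sizes within one of each other.
    """
    conditions = build_conditions()
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        block = conditions[:]
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_participants]

assignments = assign_conditions(273)
counts = Counter(assignments)
for condition, n in sorted(counts.items()):
    print(condition, n)
```

With 273 participants and 12 combinations, a strictly balanced rotation of this kind gives every combination 22 or 23 raters. The group sizes reported in the study are close to, but not exactly, this pattern, so the actual rotation was presumably less rigid than the sketch.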

Given our sample size, each resume-photo combination was evaluated by about 22 to 23 participants. Table 1 presents a summary of the participant demographics. The average age of participants is about 26 years, with a wide range between the youngest (20) and the oldest (51) participants. A majority of the participants (66%) were men. This is reflective of the student body of the institution where a majority of the incoming class tends to be male (see Footnote 3). As discussed before, given the selective nature of the school, many participants had some prior work experience. The average years of experience was 2.4 with the actual number ranging from 0 to 24 years. Given this background, about 31% of the respondents had participated in a hiring process in their prior working lives.

Table 1 Participant demographics

5 Analysis and results

We began the analysis by looking at some descriptive statistics. We first looked at the broad trends in the average ratings when the resumes are grouped according to skin tones. In Table 2, we compared the mean scores for the different rating criteria by grouping the resumes according to skin tone. We found that the average scores did not vary considerably across the three skin tones. There is no discernable preference for resumes paired with light skin–toned photographs or a distinct lack of preference for resumes paired with the two dark skin–toned photographs. As seen in the p-values, there were no statistically significant differences (at the 99 percent or 95 percent confidence level) in average ratings for resumes with dark, light, or medium skin tones across any of the five different evaluation criteria. The average scores tended to fall broadly in the middle of the 5-point Likert scale rating, ranging between 3 and 4. Respondents therefore seemed to have tended to a safe or neutral position in their ratings. The only exception was the ratings for experience, which were uniformly 4 or slightly above, which indicated that respondents were likely to view the experience of candidates favorably. Even in this case though, there is no significant difference in average rating scores across the different skin tones.

Table 2 Mean comparison by skin tone; all resumes

Next, we compared outcomes for different groupings of resumes. First, we separated the resumes based on the field of work—marketing or IT. Given the more direct client-facing aspect of marketing positions, perceptions and therefore ratings for presentability or other aspects might be different for marketing resumes compared to the IT resumes. We might also expect gendered expectations about appearances and the past associations of whiteness with a beauty ideal to reflect in differences in the ratings of male versus female job candidates. We therefore also compared ratings by grouping the male and female resumes separately. For these comparisons, we went beyond the means and looked at the distribution of the actual scores for the different rating categories. To compare variations in the distribution of scores by skin shade for each of the resume groupings, we used histograms presented in the following graphs.

In Graphs 1 and 2, we compared histograms for the marketing resumes grouped by the three skin tone variations. There was a total of 143 marketing resumes. Out of these, 45 resumes were paired with a dark skin tone photograph, 49 with a medium skin tone photo, and another 49 with a light skin tone photo. The green bars represent the histogram for the darker skin tone resumes. The blue bars represent the rating distribution for the medium skin tone resumes and the red bars represent the light skin tone resumes. In Graph 1, we found that the score of 4 (likely to view favorably) was most often chosen by participants when asked to rate the overall resume, for all three skin tones. None of the resumes had the lowest score of 1 on the Likert scale (highly unlikely to view favorably). The dark skin tone (green bar) resumes received a slightly higher proportion of 4s and a lower proportion of 5s compared to the other two groups. However, these differences were not statistically significant. The chi-square test of independence indicated a p-value of 0.62 which is not significant at the 95 percent confidence level.
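The chi-square test of independence used for these comparisons can be sketched in a few lines. The sketch below is illustrative only: the cell counts are hypothetical (only the row totals of 45, 49, and 49 come from the study), and the function names are introduced here. It also assumes the usual degrees of freedom, (rows - 1) x (columns - 1), which with three skin tones is always even, so the upper-tail probability has a simple closed form.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# HYPOTHETICAL counts. Rows: dark, medium, light. Columns: scores 2, 3, 4, 5
# (no score of 1 was given for the overall marketing resume rating).
# Row totals match the reported group sizes; the individual cells are made up.
hypothetical = [
    [4, 10, 25, 6],
    [5, 11, 22, 11],
    [4, 12, 21, 12],
]
stat, df = chi_square_independence(hypothetical)
p_value = chi2_sf_even_df(stat, df)
print(f"hypothetical table: chi2 = {stat:.2f}, df = {df}, p = {p_value:.2f}")

# Consistency check on the reported statistic for Graph 1, assuming df = 6
# (three skin tones and four observed score levels).
reported_p = chi2_sf_even_df(4.4, 6)
print(f"reported chi2 = 4.4 with df = 6 gives p = {reported_p:.2f}")
```

Under the stated assumption about degrees of freedom, the reported chi-square value of 4.4 reproduces the reported p-value of 0.62 to two decimal places.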

Graph 1 Resume rating for marketing resumes grouped by skin tone. Chi(2) 4.4, p-value 0.62. No significant differences at the 95 percent confidence level. Total resumes 143 (dark 45, medium 49, light 49)

Graph 2 Presentability rating for marketing resumes grouped by skin tone. Chi(2) 9.3, p-value 0.30. No significant differences at the 95 percent confidence level. Total resumes 143 (dark 45, medium 49, light 49)

In Graph 2, we compared the distribution of scores for the presentability rating. We again found that across the three skin tone groupings, the highest proportion of respondents chose a score of 4. Here, the medium skin tone resumes had the highest proportion of 4 s. There was only one rating of 1 for a light skin tone resume. However, once again as seen from the p-value for the chi(2) statistic, the differences here were not statistically significant.

In Graphs 3 and 4, we compared the spread of scores for the IT resumes. There were a total of 130 IT resume-photo combinations. For the overall resume ratings, we see more variability in scores for IT resumes in Graph 3, compared to the marketing resumes. There is a higher concentration of scores at the lower end of the 5-point scale indicating a more critical evaluation of the IT resumes compared to the marketing resumes. However, the scores across the three skin tones are similar. Though there seems to be a slightly larger concentration of scores of 2 (unlikely to view favorably) and 3 (neutral), for the light skin tone resumes, these differences were not statistically significant.

Graph 3 Resume rating for IT resumes grouped by skin tone. Chi(2) 7.3, p-value 0.50. No significant differences at the 95 percent confidence level. Total resumes 130 (dark 44, medium 43, light 43)

Graph 4 Presentability rating for IT resumes grouped by skin tone. Chi(2) 6.5, p-value 0.59. No significant differences at the 95 percent confidence level. Total resumes 129 (dark 43, medium 43, light 43)

In Graph 4, the spread of scores for the presentability of the candidates for the IT resumes is also more varied than the marketing ratings. However, once again the ratings across the different skin tones are not systematically different. While the medium skin tone (blue bar) rating seems to be relatively more concentrated at the score of 4, these differences are not statistically significant.

In these initial descriptive analyses, we did not find much of a pattern to indicate that implicit bias against darker skin influenced the rating of these resumes. We also did not see a consistent pattern of in-group privilege working in favor of resumes paired with the lighter skin tone photos. There are no statistically significant differences in the overall ratings of the resumes based on the skin tone of the photos paired with the resumes. Though we did see differences in the rating of the IT resumes compared to the marketing resumes, within these categories, skin tone variations did not lead to significant differences in the overall ratings of the resumes.

To explore further, we next looked at the trends in ratings across the three specific evaluative criteria of education, experience, and the likelihood of recommending for an interview. There were no statistically significant differences in the spread of scores for ratings for experience or likelihood of interview selection. However, for the education rating, we did find the first statistically significant difference. In Graph 5, we see the histograms for education ratings for the marketing resumes, grouped by skin tone. We find that for darker skin tone photo-resume combinations, there is considerably less concentration at the favorable score of 4 (green bar) compared to the other two skin tones. There is instead a higher concentration at the score of 2 (unlikely to view favorably). When we look at the chi-square and p-value, we do find that these differences are statistically significant at the 95 percent confidence level.

Graph 5 Education rating for marketing resumes grouped by skin tone. Chi(2) 18.3, p-value 0.02* (*significant at the 95 percent confidence level). Total resumes 143 (dark 45, medium 49, light 49)

Therefore, in the case of the marketing resumes, we find that darker skin tone photo-resume combinations have been rated lower in their educational background compared to the medium or light skin tone resume combinations. This finding is unique to the education ratings for marketing resumes. The education ratings for the IT resumes do not have statistically significant differences across skin tones as seen in Graph 6.

Graph 6 Education rating for IT resumes grouped by skin tone. Chi(2) 9.8, p-value 0.28. No significant differences at the 95 percent confidence level. Total resumes 130 (dark 44, medium 43, light 43)

Finally, we looked at the trends in ratings when resumes are grouped by gender. Since white skin preference has in the past been seen as a beauty ideal, particularly for women, we might expect that a stronger implicit bias might emerge when individuals view female resume-photo combinations. However, in Graphs 7 and 8, we found that the distribution of scores for resume and presentability rating did not vary significantly across the different skin tones for the female resumes. Resume rating tended to have the highest concentration at the score of 4 (likely to view favorably) for all three groups in Graph 7. Presentability ratings had more of a spread between scores of 3 and 4, though here again there were no statistically significant differences.

Graph 7 Resume rating for female resumes grouped by skin tone. Chi(2) 10.2, p-value 0.26. No significant differences at the 95 percent confidence level. Total resumes 130 (dark 43, medium 45, light 47)

Graph 8 Presentability rating for female resumes grouped by skin tone. Chi(2) 11.4, p-value 0.18. No significant differences at the 95 percent confidence level. Total resumes 130 (dark 43, medium 45, light 47)

To summarize the descriptive analysis so far, after comparing differences in ratings across various categories, we did not find a consistent pattern of bias against resumes paired with darker skin tone photos. In only one instance, the education ratings for marketing resumes, were there statistically significantly lower ratings for resumes with darker skin photos compared to the medium or lighter skin tone resume-photo pairings. To explore this further, we next turn to a multivariate analysis.

Colorism or skin shade bias might not be uniform across all individuals but might vary according to certain group characteristics. A multivariate analysis would allow us to see if a more consistent pattern of bias might emerge if we controlled for some of the demographic characteristics of study participants. In this study, we were able to obtain information about the gender and age of the study participants. Given the gendered association of lighter skin with women’s ideals, we might expect the gender of the respondents to have some influence on the way skin color bias operates. We also asked participants to specify their years of work experience and whether they had previous experience with hiring. Using these demographic variables, we were interested in estimating the following regression model.

$$S=X\beta +\varepsilon$$

where S, the Likert scale score chosen by a participant to rate a particular criterion for a resume, is dependent on the independent predictors X. These independent predictors include the participant’s gender, years of experience, and hiring experience. Since age and years of work experience were very closely related, with older individuals having more years of work experience, we included only the work experience variable in the regression model. In addition, we also included dummy variables for the skin shade of the photo attached to the resume being rated. Including these independent variables led us to the following estimating equation.

$${S}_{i}={\beta }_{0}+{\beta }_{1} {Gender}_{i}+{\beta }_{2} {Work\;Experience}_{i}+{\beta }_{3} {Hiring\;Experience}_{i}+{\beta }_{4} {Medium}_{i}+{\beta }_{5} {Light}_{i}+{\varepsilon }_{i}$$

Medium and Light represent the dummy variables for the skin shade of the photos. Since we have three skin shades, we include two dummies here with the primary comparison being with the dark skin tone photo category. If resumes with medium and light skin tone photos are rated more favorably than those with darker skin tone photos, we would expect to see positive and significant coefficients for the medium and light dummy variables. The Gender variable is represented by a binary dummy where male respondents are coded as 1 and female respondents as 0. The Hiring Experience variable is similarly a binary dummy variable where having prior experience with hiring is coded as 1. Since the dependent variable is based on the Likert scale with 5 representing the most favorable rating and 1 representing the least favorable rating, the equation is estimated using an ordered probit regression model.
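To make the interpretation of the dummy coefficients concrete, the sketch below shows how an ordered probit model turns a linear index into probabilities for each of the five Likert scores. It is an illustration under stated assumptions, not the estimation itself: all coefficient and cutpoint values are hypothetical and are not the estimates reported in the tables, and the function names are introduced here. In an ordered probit, the intercept is not separately identified from the cutpoints, so the sketch absorbs it into the cutpoints.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(index, cutpoints):
    """P(S = 1), ..., P(S = K) given the linear index and K - 1 ordered cutpoints."""
    bounds = [-math.inf, *cutpoints, math.inf]
    cdf = [norm_cdf(b - index) for b in bounds]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]

# HYPOTHETICAL values for illustration only; these are not estimates from the study.
BETA = {"gender": 0.10, "work_experience": 0.05, "hiring_experience": -0.10,
        "medium": 0.30, "light": 0.20}
CUTPOINTS = [-1.5, -0.5, 0.5, 1.5]  # four thresholds for a 5-point scale

def linear_index(gender, work_experience, hiring_experience, skin_tone):
    """Dark is the omitted reference category, so both dummies are 0 for it."""
    return (BETA["gender"] * gender
            + BETA["work_experience"] * work_experience
            + BETA["hiring_experience"] * hiring_experience
            + BETA["medium"] * (skin_tone == "medium")
            + BETA["light"] * (skin_tone == "light"))

# A male rater with 2.4 years of work experience and no hiring experience.
rater = dict(gender=1, work_experience=2.4, hiring_experience=0)
probs_dark = ordered_probit_probs(linear_index(skin_tone="dark", **rater), CUTPOINTS)
probs_medium = ordered_probit_probs(linear_index(skin_tone="medium", **rater), CUTPOINTS)

print("P(score >= 4 | dark)   =", round(probs_dark[3] + probs_dark[4], 3))
print("P(score >= 4 | medium) =", round(probs_medium[3] + probs_medium[4], 3))
```

A positive coefficient on a skin shade dummy shifts probability mass toward the higher scores relative to the dark skin tone category, which is the pattern the regression is designed to detect.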

In Table 3, we present the results of four different ordered probit regressions, each with a different rating criterion as the dependent variable. In the first column, we have the regression coefficients with the overall resume rating score as the dependent variable. We found that only the coefficient for the years of work experience variable is statistically significant. The likelihood of a positive (higher score) rating increases with each additional year of work experience the respondent has. The other demographic variables, gender and hiring experience, did not have statistically significant coefficients. We also found that the skin tone of the photo paired with the resume did not have a significant influence on the rating score. The coefficients for the Medium and Light skin shade dummy variables were not statistically significant. A similar pattern was observed in the next three columns where the presentability, experience, and education rating scores were used as the dependent variables. The only statistically significant coefficient was associated with the work experience variable across all these regressions. The dummy variables for skin shade of the photos did not have a statistically significant impact on the ratings in any of the estimations.

Table 3 Ordered probit model all resumes combined

Since the descriptive analysis above indicated that skin shade differences might be associated with differences in the education ratings for marketing resumes, we estimated regression equations separately for just the marketing resumes. In Table 4, we see the results from four regressions for the ratings of the marketing resumes. In the education rating regression in column 1, we do see the impact of skin shade on the ratings. The coefficients for both the medium and light skin shade dummy variables are positive, indicating that photos with both these skin shades have a greater likelihood of a positive rating compared to the dark skin photo, though only the coefficient for the medium skin shade variable is statistically significant. Here, we do see confirmation of the trend we found in the descriptive analysis. Marketing resumes paired with darker skin photos tend to have lower scores for education ratings. This bias against the dark skin photos, though, is not present for any of the other marketing rating criteria. The regressions for overall resume rating scores, presentability, and experience ratings do not have statistically significant coefficients for the skin tone dummies. As before, only the work experience variable is found to have a consistently positive and statistically significant impact on ratings. The results from the regression analysis therefore did not offer consistent evidence for bias against darker-skinned photo-resume combinations.

Table 4 Ordered probit model for marketing resumes

6 Conclusion

In this study, we used an experimental design to evaluate the impact of light skin preference on labor market outcomes in India. We specifically focused on whether lighter skin preference might contribute to discrimination in the hiring process against those with darker skin tones. Lack of survey data with recorded skin color variations has limited the ability to study specific discriminatory outcomes based on colorism, particularly in India. In this study, we attempted to overcome the data barriers through our experimental study design. Participants in the study rated equivalent resumes matched with photographs that were manipulated to show three different skin tones. Each participant viewed and rated one photo-resume combination. The results do not provide evidence for the kind of impact that colorism or skin color preference has been shown to have on labor market outcomes in the USA. While colorism is prevalent in India in the social context, this study offered no consistent evidence for linking light skin preference to instances of implicit bias in the specific economic context of hiring. These findings offer some new directions for thinking about colorism in the global context. Without the historical context of racism and the associated overt legal and institutional segregation and discrimination, colorism might operate more indirectly through social norms and outcomes rather than direct economic outcomes. Though the skin whitening industry has put considerable resources into marketing the notion of a link between lighter skin and economic success, the fact that this link might not exist is worth emphasizing.

It is also likely that unlike in the USA where the impact of racism and colorism transcends economic class, in India, colorism might be overridden by other more powerful hierarchies of class and caste. The class hierarchy has direct implications for this study conducted at an elite, highly selective graduate-level academic institution. The resumes we developed also suggested candidates with graduate degrees in the fields of marketing and IT, both of which have considerable class privilege in India. It is possible that this class privilege overrides the barriers to entry that might exist due to colorism outside of this elite setting or fields of work. It is also worth emphasizing that this study is also not intended to measure success in a field or even earnings but only the implicit barriers to entry that might exist at the point of hiring. It would be useful to replicate the study in a different, relatively non-elite setting with different choices of fields to explore this relationship between class and colorism.

Our findings once again emphasize the distinction between caste and colorism in India. As mentioned before, many scholars have argued that the two cannot be equated. Colorism is derived from colonial era associations, whereas caste predates such associations. It is also not possible to link the many layers of caste categories with a linear progression of skin tone. The substantial geographical variation in skin tone across India also makes the association of caste with color untenable (Deshpande 2011). This geographical variation is particularly relevant for this study. Darker skin shades tend to be more prevalent in Southern India and this is also the common perception about skin tone variation in India. Our study was conducted in Bangalore, the IT hub in Southern India. It is therefore likely that though there might be a generalized preference for lighter skin, prejudice against darker skin tones might be less pronounced here relative to the north, at least in the economic sphere. Indication of this geographical distinction also emerged in the qualitative comments from the survey. After the Likert scale rating of the resumes, respondents were also asked to provide a description of the individual in the photograph. While we found no consistent pattern in the description of the photographs or differences between descriptions of the darker and lighter skin tone photos, the most often used descriptor was that the photograph was that of a “South Indian” man or woman. The term “South Indian” was used 5 times across the male and female photographs. This association with a geographical region with more dark skin prevalence might be reflected in the outcome of this study. It would therefore also be useful to replicate the study in a different geographical setting to compare the regional differences in skin shade bias.

Finally, though participants in the survey did not exhibit implicit bias against darker skin–toned applicants, a majority perceived themselves as not being dark-skinned. Asked to report their skin tone on a 5-point scale ranging from extremely fair/light-skinned to dark-skinned, only 9 individuals or less than 3 percent of respondents indicated that they were dark-skinned. A majority, about 65 percent, indicated medium skin tone. About 29 percent chose to describe themselves as fair or light-skinned. So, while the implicit bias against dark skin tone may not manifest, at least in Southern India, there is an indication that white skin is at least the aspirational preference for individuals themselves, mistakenly associating lightness with hiring success, as promoted by the skin whitening industry. As a counter to this narrative, we have demonstrated the tenuous nature of the relationship between lighter skin and economic success specific to hiring.

The following informed consent option was administered to all participants.

You are invited to complete a survey for a study of resume effectiveness. The motivation for the research is to study different unconscious biases in the hiring processes and evaluation of job candidates. We are also looking to examine whether reactions to resumes vary across different demographics. This study aims to inform job candidates on effective self-representation and hiring managers and human resource professionals of potential biases in the evaluation of candidates. This survey should take about 12–15 min to complete.

This study is being conducted by Ramya Vijaya, Professor of Economics, Stockton University and Naureen Bhullar, Manager and Research Coordinator, IIMB Behavioural Sciences Lab. Participation in this research poses no risk to participants. Participation in this study is voluntary. Your decision whether or not to participate will not affect your current or future relations with the Indian Institute of Management Bangalore or Stockton University. If you choose to participate, you are free to not provide any information you do not wish to, choose not to answer any question you do not wish to, or withdraw at any time. The records of this study will be kept private on computers accessible only to the two principal researchers. No names are collected in this survey so no one can link responses to any individual. In any sort of reports we might publish, we will not include any information that will make it possible to identify any individuals.

If you have any additional questions at any time, please do contact Ramya Vijaya email ramya.vijaya@stockton.edu or Naureen Bhullar email naureenbhullar@gmail.com.

By choosing yes below, you are indicating that you have read the information provided above and have decided to participate. You may withdraw at any time without prejudice should you choose to discontinue participation in this study.