Abstract
Several different measures of skin color are popular in social science surveys, yet we have little evidence to suggest which method is the most valid or reliable when we design new studies. In this experiment, we compare three different ways of asking raters to evaluate skin tone, testing whether common methods designed to reduce variation across raters from different social groups are effective. We compare two popular scales: a simple text-based 5-point skin tone scale (which asks raters to classify pictures on a scale from very light to very dark) and a newer 10-point palette-based skin tone scale (which asks raters to choose a number from 1 to 10, with pictures associated with each number). We also ask raters to use a more complex two-axis color grid that we created, in order to test whether addressing common criticisms of the palette-based scales improves rating reliability. Experiment participants rated a randomly selected subset of pictures with a wide range of skin tones. We find that demographic characteristics of the raters such as gender, race, their amount of contact with diverse racial groups, and immigration status affect skin tone ratings that observers assign, no matter what type of measure is used, and the three measures have reliability ratings that are statistically similar. We discuss the implications of the differences between the measures for designing social science surveys and interview studies.
Notes
Latinxs are one of the groups most often perceived as members of a different racial category by observers (Campbell and Troyer 2011; Vargas and Kingsbury 2016; Vargas and Stainback 2016). In order to stay true to the models’ self-concept and explore the full range of appearances associated with Latinx identification, we used self-identification as the sole requirement of Latinx identity.
We did not recruit any graduate students who had taught classes, in order to ensure that undergraduate participants would not be likely to recognize the graduate students from class. We asked after the experiment was completed if the rater recognized anyone from the pictures. No participants reported having recognized anyone from the photographs used.
https://web.archive.org/web/20200417054253/https://www.loreal.com/research-and-innovation/when-the-diversity-of-types-of-beauty-inspires-science/expert-in-skin-and-hair-types-around-the-world, supplemented with shades from https://web.archive.org/web/20200107024915/https://www.lorealparisusa.com/products/makeup/face/foundation-makeup/true-match-super-blendable-makeup.aspx?shade=w0-5-cream-ivory
We tested the results without combining columns 11 and 12; results did not differ.
Seven respondents did not enter their zip code but did enter the name of their high school and city at age 15, so we used the zip code of their high school.
We asked raters to think about the “person or people who raised you” and select the highest level of education attained by these individuals.
A Hausman specification test showed that a fixed-effects model was not appropriate for these data. Quadrature checks (with the Stata command quadchk) showed that the results were not sensitive to the number of integration points used in the estimation. Although random-effects models require larger samples than a simple OLS model, our sample of 60 level-2 units (photos) is large enough to limit the risk of bias in our estimates and standard errors, especially given the high ICC for our dependent variables. Studies of similar model types that tested the impact of sample size support this conclusion (Ali et al. 2016, 2019).
Krippendorff’s alpha has significant advantages for this comparison over other measures of agreement. For example, it generalizes across different levels of measurement, and it discounts the amount of agreement across raters that occurs simply by chance, which is greater in scales that have few options (Hayes and Krippendorff 2007).
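To make the chance correction concrete, the following is a minimal sketch of Krippendorff's alpha, not the code used in the study. It assumes the squared-difference (interval) distance, which the text above does not specify, and the function name and the example ratings are hypothetical. Alpha is one minus the ratio of observed to expected disagreement, and photos with fewer than two ratings are dropped because they cannot be paired.

```python
from itertools import permutations


def krippendorff_alpha(units, distance=lambda a, b: (a - b) ** 2):
    """Krippendorff's alpha for ratings grouped by unit (here, by photo).

    units: list of lists; each inner list holds the ratings one photo received.
    distance: disagreement between two values (squared difference is an
    assumption; swap in another metric for nominal or ordinal data).
    Raises ZeroDivisionError if all pairable ratings are identical.
    """
    # Only photos rated at least twice contribute pairs of ratings.
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)

    # Observed disagreement: pairs of ratings within the same photo.
    observed = sum(
        sum(distance(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: pairs of ratings drawn from the pooled ratings.
    expected = sum(
        distance(a, b) for a, b in permutations(values, 2)
    ) / (n * (n - 1))

    return 1.0 - observed / expected


# Hypothetical ratings on the 5-point scale; the last photo has one rating
# and is therefore ignored.
ratings_by_photo = [[2, 2, 3], [4, 5, 4], [1, 1], [3]]
print(round(krippendorff_alpha(ratings_by_photo), 3))
```

Because the expected disagreement is computed from the pooled ratings, a scale with few options yields a smaller denominator, so the same raw agreement earns less credit.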
We used a one-way random-effects model to measure reliability for a single rater, because photos are not all rated by the same group of raters (Koo and Li 2016).
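As an illustration only, a one-way random-effects, single-rater ICC can be sketched from the between-photo and within-photo mean squares. The function name and example data are hypothetical, and the adjusted group size k0 used for unequal numbers of ratings per photo is a standard textbook correction rather than a detail taken from the study.

```python
def icc_oneway_single(groups):
    """ICC(1,1): one-way random effects, reliability of a single rater.

    groups: list of lists; each inner list holds the ratings for one photo.
    Photos may have different numbers of ratings.
    """
    g = len(groups)
    sizes = [len(x) for x in groups]
    total = sum(sizes)
    grand_mean = sum(sum(x) for x in groups) / total
    means = [sum(x) / len(x) for x in groups]

    # Between-photo and within-photo mean squares from a one-way ANOVA.
    msb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)) / (g - 1)
    msw = sum(
        (v - m) ** 2 for x, m in zip(groups, means) for v in x
    ) / (total - g)

    # Adjusted average number of ratings per photo for unbalanced data.
    k0 = (total - sum(n * n for n in sizes) / total) / (g - 1)

    return (msb - msw) / (msb + (k0 - 1) * msw)


# Hypothetical ratings on the 10-point scale for three photos.
print(round(icc_oneway_single([[2, 3, 2], [7, 8], [5, 5, 6, 5]]), 3))
```

The one-way form treats raters as interchangeable draws, which fits a design in which each photo is rated by a different subset of raters.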
We assigned the photographs to darkness tertiles using the L* score from the pixel sampled from the forehead of each picture in Photoshop.
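A tertile assignment of this kind can be sketched as follows. This is an illustration under stated assumptions: the photo identifiers and L* values are invented, the function name is hypothetical, and ties are broken by input order, which the note above does not address. Higher L* values indicate lighter skin tones.

```python
def assign_tertiles(lstar_by_photo):
    """Split photos into three equal-sized groups by L* (lightness) score.

    lstar_by_photo: dict mapping a photo identifier to its L* value.
    Returns a dict mapping each photo to "lightest", "middle", or "darkest".
    """
    # Rank from lightest (highest L*) to darkest (lowest L*).
    ranked = sorted(lstar_by_photo, key=lstar_by_photo.get, reverse=True)
    n = len(ranked)
    labels = ("lightest", "middle", "darkest")
    # Integer arithmetic maps rank 0..n-1 onto group index 0, 1, or 2.
    return {photo: labels[rank * 3 // n] for rank, photo in enumerate(ranked)}


# Hypothetical L* scores for six photos.
scores = {"p1": 80.0, "p2": 70.0, "p3": 60.0, "p4": 50.0, "p5": 40.0, "p6": 30.0}
print(assign_tertiles(scores))
```

With 60 photos, this rule would place 20 photos in each tertile.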
References
Abascal, M. (2020). Contraction as a response to group threat: Demographic decline and whites’ classification of people who are ambiguously white. American Sociological Review, 85(2), 298–322.
Ali, A., Ali, S., Khan, S. A., Khan, D. M., Abbas, K., Khalil, A., et al. (2019). Sample size issues in multilevel logistic regression models. PLoS ONE, 14(11), e0225427.
Ali, S., Ali, A., Khan, S. A., & Hussain, S. (2016). Sufficient sample size and power in multilevel ordinal logistic regression models. Computational and Mathematical Methods in Medicine. https://doi.org/10.1155/2016/7329158.
Allen, W., Telles, E., & Hunter, M. (2000). Skin color, income and education: A comparison of African Americans and Mexican Americans. National Journal of Sociology, 12(1), 129–180.
Bailey, S., Saperstein, A., & Penner, A. (2014). Race, color, and income inequality across the Americas. Demographic Research, 31, 735–756.
Bond, S., & Cash, T. F. (1992). Black beauty: Skin color and body images among African-American College Women. Journal of Applied Social Psychology, 22(11), 874–888.
Borrell, L. N., Kiefe, C. I., Diez-Roux, A. V., Williams, D. R., & Gordon-Larsen, P. (2013). Racial discrimination, racial/ethnic segregation, and health behaviors in the CARDIA Study. Ethnicity & Health, 18(3), 227–243.
Campbell, M. E., Bratter, J. L., & Roth, W. D. (2016). Measuring the diverging components of race: An introduction. American Behavioral Scientist, 60(4), 381–389.
Campbell, M. E., & Troyer, L. (2011). Further data on misclassification: A reply to Cheng and Powell. American Sociological Review, 76(2), 356–364.
Caruso, E. M., Mead, N. L., & Balcetis, E. (2009). Political partisanship influences perception of biracial candidates’ skin tone. Proceedings of the National Academy of Sciences, 106(48), 20168–20173.
Chavez-Dueñas, N. Y., Adames, H. Y., & Organista, K. C. (2014). Skin-color prejudice and within-group racial discrimination: Historical and current impact on Latino/a populations. Hispanic Journal of Behavioral Sciences, 36(1), 3–26.
Dixon, T. L., & Maddox, K. B. (2005). Skin Tone, Crime News, and Social Reality Judgments: Priming the Stereotype of the Dark and Dangerous Black Criminal. Journal of Applied Social Psychology, 35(8), 1555–1570.
Dressler, W. W. (1991). Social class, skin color, and arterial blood pressure in two societies. Ethnicity & Disease, 1(1), 60–77.
Dressler, W. W. (1993). Health in the African American Community: Accounting for health inequalities. Medical Anthropology Quarterly, 7(4), 325–345.
Espino, R., & Franz, M. M. (2002). Latino phenotypic discrimination revisited: The impact of skin color on occupational status. Social Science Quarterly, 83(2), 612–623.
Feliciano, C. (2016). Shades of race: How phenotype and observer characteristics shape racial classification. American Behavioral Scientist, 60(4), 390–419.
Frank, R., Akresh, I. R., & Lu, B. (2010). Latino immigrants and the U.S. racial order: How and where do they fit in? American Sociological Review, 75(3), 378–401.
Freeman, J. B., Penner, A. M., Saperstein, A., Scheutz, M., & Ambady, N. (2011). Looking the part: Social status cues shape race perception. PLoS ONE, 6(9), e25107.
Garcia, D., & Abascal, M. (2015). Colored perceptions: Racially distinctive names and assessments of skin color. American Behavioral Scientist, 60(4), 420–441.
Goldsmith, A. H., Hamilton, D., & Darity, W. (2006). Shades of discrimination: Skin tone and wages. The American Economic Review, 96(2), 242–245.
Goldsmith, A. H., Hamilton, D., & Darity, W. (2007). From dark to light: Skin color and wages among African-Americans. Journal of Human Resources, 42(4), 701–738.
Gravlee, C. C., Dressler, W. W., & Bernard, H. R. (2005). Skin color, social classification, and blood pressure in Southeastern Puerto Rico. American Journal of Public Health, 95(12), 2191–2197.
Hagiwara, N., Kashy, D. A., & Cesario, J. (2012). The independent effects of skin tone and facial features on Whites’ affective reactions to blacks. Journal of Experimental Social Psychology, 48(4), 892–898.
Hamilton, D., Goldsmith, A. H., & Darity, W. (2009). Shedding ‘light’ on marriage: The influence of skin shade on marriage for Black females. Journal of Economic Behavior and Organization, 72(1), 30–50.
Hannon, L. (2014). Hispanic respondent intelligence level and skin tone. Hispanic Journal of Behavioral Sciences, 36(3), 265–283.
Hannon, L., & DeFina, R. (2014). Just Skin Deep? The impact of interviewer race on the assessment of African American Respondent Skin Tone. Race and Social Problems, 6(4), 356–364.
Hannon, L., & DeFina, R. (2016). Reliability concerns in measuring respondent skin tone by interviewer observation. Public Opinion Quarterly, 80(2), 534–541.
Harris, K. M., Halpern, C. T., Whitsel, E., Hussey, J., Tabor, J., Entzel, P., & Udry, J. R. (2009). The National Longitudinal Study of Adolescent to Adult Health: Research Design [WWW Document]. Retrieved March 17, 2020, from https://www.cpc.unc.edu/projects/addhealth/design.
Hayes, A. F., & Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1), 77–89.
Hebl, M. R., Williams, M. J., Sundermann, J. M., Kell, H. J., & Davies, P. G. (2012). Selectively friending: Racial stereotypicality and social rejection. Journal of Experimental Social Psychology, 48(6), 1329–1335.
Herman, M. R. (2010). Do you see what I am? How observers’ backgrounds affect their perceptions of multiracial faces. Social Psychology Quarterly, 73(1), 58–78.
Hersch, J. (2006). Skin-tone effects among African Americans: Perceptions and reality. American Economic Review, 96(2), 251–255.
Hersch, J. (2008). Profiling the new immigrant worker: The effects of skin color and height. Journal of Labor Economics, 26(2), 345–386.
Hersch, J. (2011). The persistence of skin color discrimination for immigrants. Social Science Research, 40(5), 1337–1349.
Hill, M. E. (2002a). Race of the interviewer and perception of skin color: Evidence from the multi-city study of urban inequality. American Sociological Review, 67(1), 99–108.
Hill, M. E. (2002b). Skin color and the perception of attractiveness among African Americans: Does gender make a difference? Social Psychology Quarterly, 65(1), 77–91.
Hughes, M., Kiecolt, K. J., Keith, V. M., & Demo, D. H. (2015). Racial identity and well-being among African Americans. Social Psychology Quarterly, 78(1), 25–48.
Hunter, M. L. (2002). ‘If You’re Light You’re Alright’ Light Skin Color as Social Capital for Women of Color. Gender & Society, 16(2), 175–193.
Hunter, M. L. (2005). Race, gender, and the politics of skin tone. New York: Routledge.
Hunter, M. L. (2007). The persistent problem of colorism: Skin tone, status, and inequality. Sociology Compass, 1(1), 237–254.
Jackson, J. S., Torres, M., Caldwell, C. H., Neighbors, H. W., Nesse, R. M., Taylor, R. J., et al. (2004). The National Survey of American Life: A study of racial, ethnic and cultural influences on mental disorders and mental health. International Journal of Methods in Psychiatric Research, 13(4), 196–207.
Keith, V. M., & Campbell, M. E. (2015). Texas diversity survey.
Keith, V. M., & Herring, C. (1991). Skin tone and stratification in the black community. American Journal of Sociology, 97(3), 760–778.
Keith, V. M., & Thompson, M. S. (2003). Color matters: The importance of skin tone for African American Women’s Self-Concept in Black and White America. In D. R. Brown & V. M. Keith (Eds.), In and out of our right minds: The mental health of African American Women (pp. 116–135). New York: Columbia University Press.
Klonoff, E. A., & Landrine, H. (2000). Is skin color a marker for racial discrimination? Explaining the skin color-hypertension relationship. Journal of Behavioral Medicine, 23(4), 329–338.
Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163.
Krieger, N., Sidney, S., & Coakley, E. (1998). Racial discrimination and skin color in the CARDIA Study: Implications for Public Health Research. Coronary Artery Risk Development in Young Adults. American Journal of Public Health, 88(9), 1308–1313.
López, N., Vargas, E., Juarez, M., Cacari-Stone, L., & Bettez, S. (2018). What’s your ‘Street Race’? Leveraging multidimensional measures of race and intersectionality for examining physical and mental health status among Latinxs. Sociology of Race and Ethnicity, 4(1), 49–66.
Maddox, K. B. (2004). Perspectives on racial phenotypicality bias. Personality and Social Psychology Review, 8(4), 383–401.
Martin, L. L., Horton, H. D., Herring, C., Keith, V., & Thomas, M. (2017). Color Struck: How Race and Complexion Matter in the “Color-Blind” Era. Boston, MA: Sense.
Massey, D. S., & Martin, J. A. (2003). The NIS Skin Color Scale. Retrieved August 3, 2015, from https://nis.princeton.edu/downloads/NIS-Skin-Color-Scale.pdf.
Monk, E. P. (2014). Skin tone stratification among Black Americans, 2001–2003. Social Forces, 92(4), 1313–1337.
Monk, E. P. (2015). The cost of color: Skin color, discrimination, and health among African-Americans. American Journal of Sociology, 121(2), 396–444.
Penner, A. M., & Saperstein, A. (2013). Engendering Racial Perceptions: An intersectional analysis of how social status shapes race. Gender & Society, 27(3), 319–344.
Rondilla, J. L., & Spickard, P. (2007). Is lighter better? Skin-tone discrimination among Asian Americans. Toronto: Rowman & Littlefield Publishers Inc.
Ross, L. E. (1997). Mate selection preferences among African American college students. Journal of Black Studies, 27(4), 554–569.
Roth, W. D. (2016). The multiple dimensions of race. Ethnic and Racial Studies, 39(8), 1310–1338.
Ryabov, I. (2016a). Colorism and educational outcomes of Asian Americans: Evidence from the National Longitudinal Study of Adolescent Health. Social Psychology of Education, 19(2), 303–324.
Ryabov, I. (2016b). Educational outcomes of Asian and Hispanic Americans: The significance of skin color. Research in Social Stratification and Mobility, 44, 1–9.
Saperstein, A. (2012). Capturing complexity in the United States: Which aspects of race matter and when? Ethnic and Racial Studies, 35(8), 1484–1502.
Saperstein, A., Kizer, J. M., & Penner, A. M. (2015). Making the most of multiple measures: Disentangling the effects of different dimensions of race in survey research. American Behavioral Scientist, 60(4), 519–537.
Saperstein, A., & Penner, A. M. (2016). Still Searching for a True Race? Reply to Kramer et al. and Alba et al. American Journal of Sociology, 122(1), 263–285.
Stepanova, E. V., & Strube, M. J. (2012). The role of skin color and facial physiognomy in racial categorization: Moderation by implicit racial attitudes. Journal of Experimental Social Psychology, 48(4), 867–878.
Stewart, Q. T., Cobb, R. Y., & Keith, V. M. (2018). The color of death: Race, observed skin tone, and all-cause mortality in the United States. Ethnicity & Health. https://doi.org/10.1080/13557858.2018.1469735.
Telles, E., Flores, R. D., & Urrea-Giraldo, F. (2015). Pigmentocracies: Educational inequality, skin color and census ethnoracial identification in eight Latin American Countries. Research in Social Stratification and Mobility, 40, 39–58.
Uzogara, E. E., Lee, H., Abdou, C. M., & Jackson, J. S. (2014). A comparison of skin tone discrimination among African American Men: 1995 and 2003. Psychology of Men & Masculinity, 15(2), 201–212.
Vargas, N. (2015). Latina/o Whitening? Which Latina/Os Self-Classify as White and Report Being Perceived as White by Other Americans? Du Bois Review: Social Science Research on Race, 12(1), 119–136.
Vargas, N., & Kingsbury, J. (2016). Racial identity contestation: Mapping and measuring racial boundaries. Sociology Compass, 10(8), 718–729.
Vargas, N., & Stainback, K. (2016). Documenting contested racial identities among self-identified Latina/Os, Asians, Blacks, and Whites. American Behavioral Scientist, 60(4), 442–464.
Villarreal, A. (2010). Stratification by skin color in contemporary Mexico. American Sociological Review, 75(5), 652–678.
Weatherall, I. L., & Coombs, B. D. (1992). Skin color measurements in terms of CIELAB color space values. Journal of Investigative Dermatology, 99(4), 468–473.
Weaver, V. M. (2012). The electoral consequences of skin color: The ‘hidden’ side of race in politics. Political Behavior, 34, 159–192.
Young, D. M., Sanchez, D. T., & Wilton, L. S. (2017). Biracial perception in Black and White: How Black and White perceivers respond to phenotype and racial identity cues. Cultural Diversity and Ethnic Minority Psychology, 23(1), 154–164.
Zopf, B. J. (2018). A different kind of Brown: Arabs and Middle Easterners as Anti-American Muslims. Sociology of Race and Ethnicity, 4(2), 178–191.
Acknowledgements
The authors would like to thank the College of Liberal Arts at Texas A&M University for funding the data collection, Lance Hannon for the loan of the spectrometer, Phia Salter for her help with the project development, Aline Piacun for photo manipulation, Mary K. Campbell for her help with photo skin tone measures, and Katie Constantin, Emily Knox, Gabe Miller, David Orta, and Jesus Smith for their help with data collection or project discussions.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Skin tone measures
Text-Based 5-Point Scale
Probably the most commonly used scale in the social sciences is the text-based scale, which asks interviewers to rate the respondent’s skin tone on a scale from very light to very dark. We asked participants to rate the photographs on this scale:
The subject’s skin color is:
- Very Light
- Light
- Medium
- Dark
- Very Dark
Graphic-Based 10-Point Scale
Another widely used scale is the 10-point Massey and Martin (2003) scale, which asks interviewers to classify the skin tone of every respondent using this graphic (which was to be memorized and never to be shown to the people they were classifying) as a guide:
Graphic-Based Grid (69-Point) Scale
This scale asked the respondents to classify each person based on a graphic of skin tones. This measure was adapted from L’Oréal’s grid of skin tones, supplemented with more shades in order to broaden the range of darker skin tones available (see footnote 3). The lettered rows indicate the individual’s undertone, which varies from red (A) to yellow (F). The numbered columns vary in skin tone darkness, with the lightest tones on the left (1) and the darkest (12) on the right. Respondents were instructed to choose the cell they believe best matches the skin color of the individuals photographed (e.g., “E5”).
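For analysis, a cell label such as “E5” has to be split into its undertone row and darkness column. The sketch below shows one way to do this and is not the study's own code. The function name is hypothetical, and the option to merge column 12 into column 11 reflects our reading of the note on combining columns 11 and 12. The sketch validates only the 6-by-12 label range; it does not know which of those cells are actually populated in the 69-point grid.

```python
import re

UNDERTONE_ROWS = "ABCDEF"  # A = red undertone ... F = yellow undertone


def parse_grid_cell(label, merge_last_columns=True):
    """Split a grid label such as "E5" into (undertone, darkness) numbers.

    Returns a tuple (undertone 1-6, darkness 1-12). When merge_last_columns
    is True, column 12 is recoded to 11 (an assumption for illustration).
    """
    match = re.fullmatch(r"([A-F])(\d{1,2})", label.strip().upper())
    if match is None:
        raise ValueError(f"not a grid cell label: {label!r}")
    undertone = UNDERTONE_ROWS.index(match.group(1)) + 1
    darkness = int(match.group(2))
    if not 1 <= darkness <= 12:
        raise ValueError(f"column out of range: {label!r}")
    if merge_last_columns and darkness == 12:
        darkness = 11
    return undertone, darkness


print(parse_grid_cell("E5"))
```

Keeping the two axes as separate numbers lets darkness be analyzed on its own or alongside undertone.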
Appendix B: Photograph Selection
We began with photographs from 16 different Latinx models with a range of skin tones. For each photograph, we asked the artist to produce a set of lighter and darker pictures. Then a panel of five researchers (Mary Campbell, Verna Keith, Vanessa Gonlin, Emily Knox, and David Orta) examined each of the pictures in turn, and if more than one of the five researchers felt the picture was not convincing (that is, if they felt the picture looked like it had been digitally altered), then the picture was discarded. The researchers viewed each picture on its own (because often the digital alteration is very clear if the pictures of a single model are viewed all together, as they are displayed below, but not obvious when viewed in isolation).
One model had only two pictures that were chosen (the original, and one that was lightened). Two models had five pictures selected (the original, as well as some that were lightened and some that were darkened). Most models had three or four pictures that we selected. The final number of total photos was 60.
Each respondent only saw each model one time—meaning each respondent saw 16 pictures. The picture they saw was randomly selected from all of the photos of that model. So, for example, there were three pictures of model A, so each respondent was randomly assigned to see one of those photographs, and then one of the five pictures of model B, and so on.
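The assignment procedure just described can be sketched as one independent random draw per model. This is a minimal illustration rather than the survey software's actual logic; the function name, the model labels, and the photo identifiers are hypothetical.

```python
import random


def draw_photo_set(photos_by_model, rng=None):
    """Pick one photo per model, each chosen uniformly at random.

    photos_by_model: dict mapping a model label to the list of retained
    photos of that model (original, lightened, and darkened versions).
    rng: optional random.Random instance, useful for reproducible draws.
    """
    if rng is None:
        rng = random.Random()
    return {model: rng.choice(photos) for model, photos in photos_by_model.items()}


# Hypothetical inventory for two of the 16 models.
inventory = {
    "A": ["A_original", "A_light", "A_dark"],
    "B": ["B_original", "B_light1", "B_light2", "B_dark1", "B_dark2"],
}
print(draw_photo_set(inventory, random.Random(2020)))
```

Because every respondent receives exactly one photo per model, each rating set contains 16 photos, and no rater can compare two versions of the same face.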
Below is one example of a set of model photographs (included with permission from the model):
Cite this article
Campbell, M.E., Keith, V.M., Gonlin, V. et al. Is a Picture Worth A Thousand Words? An Experiment Comparing Observer-Based Skin Tone Measures. Race Soc Probl 12, 266–278 (2020). https://doi.org/10.1007/s12552-020-09294-0
DOI: https://doi.org/10.1007/s12552-020-09294-0