Abstract
Accurate estimation of public opinion in diverse countries requires survey questions that operate similarly across languages. We leverage 3,026 bilingual Latinos across four studies to investigate how language-of-administration affects the measurement properties of political science scales. We randomly assign bilinguals to English and Spanish survey forms and uncover item bias, also known as differential item functioning (DIF), across items measuring identity, attitudes, and political knowledge. We examine whether translation errors are the culprit but find that perceived translation quality does not predict the magnitude of DIF and that even questions with minimal text exhibit item bias. Our findings suggest otherwise similar Latinos may be estimated to possess different levels of knowledge, attitude strength, and group identification due to survey language selection. We discuss the implications of our findings for the design of multilingual surveys and the study of racial and ethnic politics more broadly.
Notes
TRAPD is an acronym for translation, review, adjudication, pre-testing, and documentation. This method has been used in developing cross-national surveys such as the European Social Survey (Jowell et al., 2007).
To our knowledge, political scientists have yet to use the bilingual experimental design to assess if multi-item scales measuring political attitudes and knowledge function similarly across languages (i.e., have similar item properties).
Students were recruited to participate in a study on Latino/a politics via instructor e-mails. A $50 Amazon.com gift card lottery was used to incentivize participation.
CloudResearch has been shown to recover treatment effects observed in other studies involving Latino samples (Velez et al., 2022).
Pre-analysis plans for these two studies can be found at https://osf.io/39j2d. In “Appendix G”, we reproduce the pre-analysis plans and describe deviations.
Upon being assigned to a language form, respondents were also randomly assigned to different prompts asking them to write about the importance of their ethnic identity. These treatments had no discernible effects on attitudes. We use the complete sample to preserve statistical power.
The survey block measuring ideology in Study 3 had a programming error that presented participants with an uneven number of items across both languages. No other scales were affected. We corrected the error in Study 4.
The final study performed randomization at the scale level, such that for each scale, participants could be assigned to one of four conditions (English items; Spanish items; English items first, Spanish second; Spanish items first, English second). This was done to counterbalance possible order effects.
The first study randomized the order of languages. Given the large number of scales in the second and third studies, respondents in the hybrid condition always responded to English items first before Spanish items. The final study randomized the order of languages.
Study 4 used successful passage of the two-question language quiz as an inclusion criterion.
In “Appendix D”, we estimate one-parameter versions of each IRT model presented in the paper, and still detect evidence of DIF for most scales.
For the HSI study, we estimate simpler one-parameter models to avoid over-fitting, given the smaller sample size (N = 194). These models estimate a difficulty parameter for each binary item and item step difficulties for each polytomous item. Item step difficulties capture the point on the latent scale where probability curves for two adjacent response categories intersect (e.g., the point on the latent scale at which the probability of “somewhat agree” is equal to “strongly agree”) (Nering & Ostini, 2011).
Difficulty and discrimination have analogs in factor analysis; discrimination corresponds to factor loadings and difficulty corresponds to factor intercepts.
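As an illustrative sketch (not the authors' estimation code), the two-parameter logistic (2PL) item response function below shows how difficulty and discrimination jointly govern the probability of endorsing a binary item; the parameter values are hypothetical, chosen to illustrate how uniform DIF in difficulty across language forms would shift response probabilities for otherwise identical respondents.

```python
import math

def p_endorse(theta, a, b):
    """2PL item response function: probability that a respondent with
    latent trait theta endorses an item with discrimination a and
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item calibrated separately in each language form:
# identical discrimination, but the Spanish version is "harder"
# (uniform DIF in the difficulty parameter).
theta = 0.0
p_english = p_endorse(theta, a=1.2, b=-0.3)
p_spanish = p_endorse(theta, a=1.2, b=0.4)
```

At the same latent trait level, the respondent is less likely to endorse the harder (higher-difficulty) version, which is exactly the pattern a difficulty-based DIF analysis detects.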
Given the randomization of participants to language forms, we view this as a plausible assumption.
This scale was identical to the HSI version, except for the inclusion of visual items.
We present item discrimination differences in “Appendix E”. We focus on item difficulties because they are easier to interpret and more commonly used in substantive applications of IRT.
These “mixed” cases can be observed in the anti-trans sentiment, panethnic identity, and immigration opinion scales, where the selection of lower response options is easier in English but the selection of higher response options is easier in Spanish, or vice versa.
These translators are affiliated with the Department of Latin American and Iberian Cultures at the principal investigator’s institution, and comprise the entire set of translators listed on the Spanish translation webpage.
We thank an anonymous reviewer for this suggestion.
For binary items, the number of steps is one (i.e., the probability of moving from 0 to 1), whereas for ordinal items, the number of steps is equal to K − 1, where K represents the number of response categories. For example, four step difficulties are estimated when modeling a five-point scale. These step difficulties capture the probability of selecting the second response category versus the first, the third versus the second, and so on.
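The step-difficulty logic above can be sketched with a minimal partial credit model (one common polytomous IRT parameterization; the paper does not specify this exact form, and the step values below are hypothetical). Each of the K − 1 step difficulties marks the point on the latent scale where two adjacent category curves intersect.

```python
import math

def pcm_probs(theta, steps):
    """Partial credit model: category probabilities for a polytomous
    item with K - 1 step difficulties (K = len(steps) + 1 categories).
    Adjacent categories k - 1 and k are equally likely when theta
    equals the k-th step difficulty."""
    # Category 0 gets numerator exp(0); category k gets
    # exp(sum_{j<=k} (theta - steps[j-1])).
    cum = 0.0
    numerators = [1.0]
    for d in steps:
        cum += theta - d
        numerators.append(math.exp(cum))
    total = sum(numerators)
    return [n / total for n in numerators]

# A five-point item has four step difficulties (illustrative values).
probs = pcm_probs(theta=0.0, steps=[-1.5, -0.5, 0.5, 1.5])
```

Evaluating the model at theta equal to the first step difficulty yields equal probabilities for the first two categories, matching the intersection-point interpretation of step difficulties.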
Peytcheva (2020) describes a theory of language-driven survey response that enumerates possible mechanisms underlying language effects. In this study, we are unable to empirically distinguish between these mechanisms, but future research could assess the psychological processes underlying linguistic DIF when translation errors are not responsible.
Notably, DIF is only detected for non-Latino public officials. However, in Study 3, question formats varied across Latino and non-Latino public officials, so it is unclear whether the DIF is due to question format or legislator ethnicity. In Study 4, we addressed this by presenting participants with Latino and non-Latino public officials within each question type. Consistent with Study 3, we tend to detect evidence of linguistic DIF for non-Latino officials (see Political Knowledge CR2 in Fig. 1). However, we fail to reject the null hypothesis of zero difference in mean absolute DIF between items measuring knowledge of Latino and non-Latino officials (β = −.06; SE = .20; p = .77).
We thank an anonymous reviewer for suggesting this analysis.
References
Abrajano, M. (2015). Reexamining the “racial gap” in political knowledge. The Journal of Politics, 77(1), 44–54.
Ankenmann, R. D., Witt, E. A., & Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36(4), 277–300.
Atar, B., & Kamata, A. (2011). Comparison of IRT likelihood ratio test and logistic regression DIF detection procedures. Hacettepe University Journal of Education, 41, 36–47.
Awad, G. H., Hashem, H., & Nguyen, H. (2021). Identity and ethnic/racial self-labeling among Americans of Arab or Middle Eastern and North African descent. Identity, 21(2), 115–130.
Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conceptions of time. Cognitive Psychology, 43(1), 1–22.
Brislin, R. W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1(3), 185–216.
Ervin, S., & Bower, R. T. (1952). Translation problems in international surveys. Public Opinion Quarterly, 16(4), 595–604.
Flores, A., & Coppock, A. (2018). Do bilinguals respond more favorably to candidate advertisements in English or in Spanish? Political Communication, 35(4), 612–633.
Gomez-Aguinaga, B. (2021). Messaging “en Español”: The impact of Spanish language on linked fate among bilingual Latinos. The International Journal of Press/Politics. https://doi.org/10.1177/19401612211050889.
Harkness, J. A., Van de Vijver, F. J. R., Mohler, P. P., & Wiley, J. (2003). Cross-cultural survey methods (Vol. 325). Wiley-Interscience.
Harkness, J., Stange, M., Cibelli, K. L., Mohler, P., & Pennell, B.-E. (2014). Surveying cultural and linguistic minorities. In Hard-to-survey populations. Cambridge University Press.
Hidalgo-Montesinos, M. D., & Gómez-Benito, J. (2003). Test Purification and the evaluation of differential item functioning with multinomial logistic regression. European Journal of Psychological Assessment, 19(1), 1.
Hill, K. A., & Moreno, D. V. (2001). Language as a variable: English, Spanish, ethnicity, and political opinion polling in South Florida. Hispanic Journal of Behavioral Sciences, 23(2), 208–228.
Huddy, L., Mason, L., & Aarøe, L. (2015). Expressive partisanship: Campaign involvement, political emotion, and partisan identity. American Political Science Review, 109(1), 1–17.
Iyengar, S. (1993). Assessing linguistic equivalence in multilingual surveys. In Social research in developing countries: Surveys and censuses in the Third World (pp. 173–182). London: Wiley.
Jones-Correa, M., Al-Faham, H., & Cortez, D. (2018). Political (mis) behavior: Attention and lacunae in the study of Latino politics. Annual Review of Sociology, 44(1), 213–235.
Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring attitudes cross-nationally: Lessons from the European Social Survey. SAGE.
King, G., & Wand, J. (2007). Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis, 15(1), 46–66.
Lee, S., & Grant, D. (2009). The effect of question order on self-rated general health status in a multilingual survey context. American Journal of Epidemiology, 169(12), 1525–1530.
Lee, T., & Pérez, E. O. (2014). The persistent connection between language-of-interview and Latino political opinion. Political Behavior, 36(2), 401–425.
Lien, P., Conway, M. M., & Wong, J. (2003). The contours and sources of ethnic identity choices among Asian Americans. Social Science Quarterly, 84(2), 461–481.
Marian, V., & Kaushanskaya, M. (2004). Self-construal and emotion in bicultural bilinguals. Journal of Memory and Language, 51(2), 190–201.
Nering, M. L., & Ostini, R. (2011). Handbook of polytomous item response theory models. Taylor & Francis.
Pérez, E., & Tavits, M. (2022). Voicing politics. Princeton University Press.
Pérez, E. O. (2009). Lost in translation? Item validity in bilingual political surveys. The Journal of Politics, 71(4), 1530–1548.
Pérez, E. O. (2011). The origins and implications of language effects in multilingual surveys: A MIMIC approach with application to Latino political attitudes. Political Analysis, 19(4), 434–454.
Pérez, E. O., & Tavits, M. (2016). Language shapes public attitudes toward gender equality. The Journal of Politics. https://doi.org/10.1086/700004.
Pérez, E. O., & Tavits, M. (2017). Language shapes people’s time perspective and support for future-oriented policies. American Journal of Political Science, 61(3), 715–727.
Pérez, E. O., & Tavits, M. (2019). Language influences public attitudes toward gender equality. The Journal of Politics, 81(1), 81–93.
Peytcheva, E. (2020). The effect of language of survey administration on the response formation process. In The essential role of language in survey research (p. 1). RTI International.
Pietryka, M. T., & Macintosh, R. C. (2017). ANES scales often don’t measure what you think they measure—An ERPC2016 analysis.
Prior, M. (2014). Visual political knowledge: A different road to competence? The Journal of Politics, 76(1), 41–57.
Ramakrishnan, K., & Ahmad, F. Z. (2014). Language diversity and English proficiency (p. 27). Center for American Progress.
Ramirez, C. M., Abrajano, M. A., & Alvarez, R. M. (2019). Using machine learning to uncover hidden heterogeneities in survey data. Scientific Reports, 9(1), 1–11.
Saavedra Cisneros, A., Carey, T. E., Jr., Rogers, D. L., & Johnson, J. M. (2022). One size does not fit all: Core political values and principles across race, ethnicity, and gender. Politics, Groups, and Identities, 1–20.
Sireci, S. G. (1997). Problems and issues in linking assessments across languages. Educational Measurement: Issues and Practice, 16(1), 12–19.
Sireci, S. G., & Berberoglu, G. (2000). Using bilingual respondents to evaluate translated-adapted items. Applied Measurement in Education, 13(3), 229–248.
Slobin, D. I. (1996). From “thought and language” to “thinking for speaking”. In Rethinking linguistic relativity (pp. 70–96). Cambridge University Press.
Velez, Y. R., Porter, E., & Wood, T. (2022). Latino-targeted misinformation and the power of factual corrections. Journal of Politics. https://doi.org/10.1086/722345.
Welch, S., Comer, J., & Steinman, M. (1973). Interviewing in a Mexican–American community: An investigation of some potential sources of response bias. The Public Opinion Quarterly, 37(1), 115–126.
Willis, G. B. (2015). The practice of cross-cultural cognitive interviewing. Public Opinion Quarterly, 79(S1), 359–395.
Wong, J. S., Ramakrishnan, S. K., Lee, T., & Junn, J. (2011). Asian American political participation: Emerging constituents and their political identities. Russell Sage Foundation.
Zavala-Rojas, D. (2018). Exploring language effects in cross-cultural survey research: Does the language of administration affect answers about politics? Methods, Data, Analyses: A Journal for Quantitative Methods and Survey Methodology (MDA), 12(1), 127–150.
Acknowledgements
We thank Devin Caughey, Stephen Sireci, Chris Tausanovitch, Nazita Lajevardi, Michael Peress, Brad Jones, Melissa Michelson, Efren Perez, anonymous reviewers, and the editors for thoughtful comments and suggestions. We also appreciate the helpful feedback we received at Michigan State University’s Minority Politics Online Seminar Series (MPOSS). We received funding from the Columbia Experimental Laboratory for Social Sciences.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Material for this article is available in the “Appendix” in the online edition. Replication files are available in the Political Behavior Dataverse (https://doi.org/10.7910/DVN/MPWVJ7).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Velez, Y.R., Saavedra Cisneros, Á. & Gomez, J. Language Barriers: Causal Evidence of Linguistic Item Bias in Multilingual Surveys. Polit Behav (2023). https://doi.org/10.1007/s11109-023-09869-8