
Language Barriers: Causal Evidence of Linguistic Item Bias in Multilingual Surveys

  • Original Paper
  • Published in: Political Behavior

Abstract

Accurate estimation of public opinion in diverse countries requires survey questions that operate similarly across languages. We leverage 3,026 bilingual Latinos across four studies to investigate how language of administration affects the measurement properties of political science scales. We randomly assign bilinguals to English and Spanish survey forms and uncover item bias, also known as differential item functioning (DIF), across items measuring identity, attitudes, and political knowledge. We examine whether translation errors are the culprit, yet find that perceived translation quality does not predict the magnitude of DIF and that even questions with minimal text exhibit item bias. Our findings suggest that otherwise similar Latinos may be estimated to possess different levels of knowledge, attitude strength, and group identification due to survey language selection. We discuss the implications of our findings for the design of multilingual surveys and the study of racial and ethnic politics more broadly.
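
For readers who want a concrete sense of what a DIF test involves, here is a minimal sketch of the logistic-regression screen discussed in the DIF-detection literature cited below (e.g., Atar & Kamata, 2011). It is not the paper's own estimation code: the data are simulated, the variable names are illustrative, and applied work would proxy the latent trait with a rest score rather than observe it directly.

    # Minimal logistic-regression DIF screen on simulated data (illustration only).
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 1000
    theta = rng.normal(size=n)         # latent trait (known here only because data are simulated)
    lang = rng.integers(0, 2, size=n)  # 0 = English form, 1 = Spanish form

    # One binary item whose difficulty shifts with survey language (uniform DIF).
    p = 1 / (1 + np.exp(-(theta - 0.3 - 0.5 * lang)))
    y = rng.binomial(1, p)

    # Restricted model: trait only. Augmented model: trait + language.
    # (Adding a theta * lang interaction term would test non-uniform DIF.)
    m0 = sm.Logit(y, sm.add_constant(theta)).fit(disp=0)
    m1 = sm.Logit(y, sm.add_constant(np.column_stack([theta, lang]))).fit(disp=0)

    # Likelihood-ratio test for uniform DIF on this item.
    lr = 2 * (m1.llf - m0.llf)
    print(f"LR = {lr:.2f}, p = {stats.chi2.sf(lr, df=1):.4f}")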


Notes

  1. TRAPD is an acronym for translation, review, adjudication, pre-testing, and documentation. This method has been used in developing cross-national surveys such as the European Social Survey (Jowell et al., 2007).

  2. To our knowledge, political scientists have yet to use the bilingual experimental design to assess if multi-item scales measuring political attitudes and knowledge function similarly across languages (i.e., have similar item properties).

  3. Students were recruited to participate in a study on Latino/a politics via instructor e-mails. A $50 Amazon.com gift card lottery was used to incentivize participation.

  4. CloudResearch has been shown to recover treatment effects observed in other studies involving Latino samples (Velez et al., 2022).

  5. Pre-analysis plans for these two studies can be found at https://osf.io/39j2d. In “Appendix G”, we reproduce the pre-analysis plans and describe deviations.

  6. Upon being assigned to a language form, respondents were also randomly assigned to different prompts asking them to write about the importance of their ethnic identity. These treatments had no discernible effects on attitudes. We use the complete sample to preserve statistical power.

  7. The survey block measuring ideology in Study 3 had a programming error that presented participants with unequal numbers of items across the two language forms. No other scales were affected. We corrected the error in Study 4.

  8. The final study performed randomization at the scale level, such that for each scale, participants could be assigned to one of four conditions (English items; Spanish items; English items first, Spanish second; Spanish items first, English second). This was done to counterbalance possible order effects.

  9. The first study randomized the order of languages. Given the large number of scales in the second and third studies, respondents in the hybrid condition always responded to English items first before Spanish items. The final study randomized the order of languages.

  10. Study 4 used successful passage of the two-question language quiz as an inclusion criterion.

  11. In “Appendix D”, we estimate one-parameter versions of each IRT model presented in the paper, and still detect evidence of DIF for most scales.

  12. For the HSI study, we estimate simpler one-parameter models to avoid over-fitting, given the smaller sample size (N = 194). These models estimate a difficulty parameter for each binary item and item step difficulties for each polytomous item. Item step difficulties capture the point on the latent scale where probability curves for two adjacent response categories intersect (e.g., the point on the latent scale at which the probability of “somewhat agree” equals that of “strongly agree”) (Nering & Ostini, 2011); see the partial credit sketch following these notes.

  13. Difficulty and discrimination have analogs in factor analysis: discrimination corresponds to factor loadings, and difficulty corresponds to factor intercepts (see the factor-analysis sketch following these notes).

  14. Given the randomization of participants to language forms, we view this as a plausible assumption.

  15. This scale was identical to the HSI version, except for the inclusion of visual items.

  16. We present item discrimination differences in “Appendix E”. We focus on item difficulties because they are easier to interpret and more commonly used in substantive applications of IRT.

  17. These “mixed” cases can be observed in the anti-trans sentiment, panethnic identity, and immigration opinion scales, where lower response options are easier to select in one language while higher response options are easier to select in the other.

  18. These translators are affiliated with the Department of Latin American and Iberian Cultures at the principal investigator’s institution, and comprise the entire set of translators listed on the Spanish translation webpage.

  19. We thank an anonymous reviewer for this suggestion.

  20. For binary items, the number of steps is one (i.e., the probability of moving from 0 to 1), whereas for ordinal items, the number of steps is equal to \(K - 1\), where K represents the number of response categories. For example, four step difficulties are estimated when one is modeling a five-point scale. These step difficulties capture the probability of selecting the second response category versus the first, the third versus the second, and so on (see the partial credit sketch following these notes).

  21. Peytcheva (2020) describes a theory of language-driven survey response that enumerates possible mechanisms underlying language effects. In this study, we are unable to empirically distinguish between these mechanisms, but future research could assess the psychological processes underlying linguistic DIF when translation errors are not responsible.

  22. Notably, DIF is only detected for non-Latino public officials. However, in Study 3, question formats varied across Latino and non-Latino public officials, so it is unclear whether the DIF is due to question format or legislator ethnicity. In Study 4, we addressed this by presenting participants with Latino and non-Latino public officials within each question type. Consistent with Study 3, we tend to detect evidence of linguistic DIF for non-Latino officials (see Political Knowledge CR2 in Fig. 1). However, we fail to reject the null hypothesis of zero difference in mean absolute DIF between items measuring knowledge of Latino and non-Latino officials (\(\beta = -.06\); SE = .20; p = .77; see the regression sketch following these notes).

  23. We thank an anonymous reviewer for suggesting this analysis.
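
Sketch: step difficulties in a partial credit parameterization (re Notes 12 and 20). The display below assumes a standard partial credit model as the general form of the one-parameter polytomous models described above; the paper's exact specification may differ. For an item with \(K\) response categories \(k = 0, \ldots, K-1\) and step difficulties \(\delta_1, \ldots, \delta_{K-1}\),

\[ \Pr(X = k \mid \theta) = \frac{\exp\left(\sum_{m=1}^{k}(\theta - \delta_m)\right)}{\sum_{l=0}^{K-1} \exp\left(\sum_{m=1}^{l}(\theta - \delta_m)\right)}, \qquad \text{where } \sum_{m=1}^{0}(\cdot) \equiv 0. \]

Because \(\Pr(X = k \mid \theta)/\Pr(X = k-1 \mid \theta) = \exp(\theta - \delta_k)\), the probability curves for adjacent categories intersect exactly at \(\theta = \delta_k\) (the property described in Note 12), and a five-point item yields \(K - 1 = 4\) step difficulties (as in Note 20).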
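
Sketch: factor-analytic analogs of IRT parameters (re Note 13). Under the usual normal-ogive correspondence, stated here as a standard identity rather than the paper's own derivation, a two-parameter item with discrimination \(a_j\) and difficulty \(b_j\) maps to a standardized factor loading \(\lambda_j\) and intercept/threshold \(\tau_j\) via

\[ a_j = \frac{\lambda_j}{\sqrt{1 - \lambda_j^{2}}}, \qquad b_j = \frac{\tau_j}{\lambda_j}, \]

so more discriminating items carry larger loadings, and difficulty rescales the item intercept by the loading.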
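
Sketch: comparing mean absolute DIF across item types (re Note 22). The code below illustrates the form of that regression with made-up numbers; the values are not the study's estimates, and the variable names are hypothetical.

    # Hypothetical sketch of the Note 22 comparison (illustration only).
    import numpy as np
    import statsmodels.api as sm

    # Made-up |DIF| estimates for eight knowledge items, and an indicator for
    # whether each item asks about a Latino public official.
    abs_dif = np.array([0.42, 0.15, 0.38, 0.51, 0.09, 0.22, 0.31, 0.18])
    latino_official = np.array([0, 1, 0, 0, 1, 1, 0, 1])

    # The slope is the difference in mean |DIF| between the two item types.
    fit = sm.OLS(abs_dif, sm.add_constant(latino_official)).fit()
    print(fit.params[1], fit.bse[1], fit.pvalues[1])  # slope, SE, p-value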

References

  • Abrajano, M. (2015). Reexamining the “racial gap” in political knowledge. The Journal of Politics, 77(1), 44–54.

  • Ankenmann, R. D., Witt, E. A., & Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36(4), 277–300.

  • Atar, B., & Kamata, A. (2011). Comparison of IRT likelihood ratio test and logistic regression DIF detection procedures. Hacettepe University Journal of Education, 41, 36–47.

  • Awad, G. H., Hashem, H., & Nguyen, H. (2021). Identity and ethnic/racial self-labeling among Americans of Arab or Middle Eastern and North African descent. Identity, 21(2), 115–130.

  • Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conceptions of time. Cognitive Psychology, 43(1), 1–22.

  • Brislin, R. W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1(3), 185–216.

  • Ervin, S., & Bower, R. T. (1952). Translation problems in international surveys. Public Opinion Quarterly, 16(4), 595–604.

  • Flores, A., & Coppock, A. (2018). Do bilinguals respond more favorably to candidate advertisements in English or in Spanish? Political Communication, 35(4), 612–633.

  • Gomez-Aguinaga, B. (2021). Messaging “en Español”: The impact of Spanish language on linked fate among bilingual Latinos. The International Journal of Press/Politics. https://doi.org/10.1177/19401612211050889

  • Harkness, J. A., Van de Vijver, F. J. R., & Mohler, P. P. (2003). Cross-cultural survey methods (Vol. 325). Wiley-Interscience.

  • Harkness, J., Stange, M., Cibelli, K. L., Mohler, P., & Pennell, B.-E. (2014). Surveying cultural and linguistic minorities. In Hard-to-survey populations. Cambridge University Press.

  • Hidalgo-Montesinos, M. D., & Gómez-Benito, J. (2003). Test purification and the evaluation of differential item functioning with multinomial logistic regression. European Journal of Psychological Assessment, 19(1), 1.

  • Hill, K. A., & Moreno, D. V. (2001). Language as a variable: English, Spanish, ethnicity, and political opinion polling in South Florida. Hispanic Journal of Behavioral Sciences, 23(2), 208–228.

  • Huddy, L., Mason, L., & Aarøe, L. (2015). Expressive partisanship: Campaign involvement, political emotion, and partisan identity. American Political Science Review, 109(1), 1–17.

  • Iyengar, S. (1993). Assessing linguistic equivalence in multilingual surveys. In Social research in developing countries: Surveys and censuses in the Third World (pp. 173–182). Wiley.

  • Jones-Correa, M., Al-Faham, H., & Cortez, D. (2018). Political (mis)behavior: Attention and lacunae in the study of Latino politics. Annual Review of Sociology, 44(1), 213–235.

  • Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring attitudes cross-nationally: Lessons from the European Social Survey. SAGE.

  • King, G., & Wand, J. (2007). Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis, 15(1), 46–66.

  • Lee, S., & Grant, D. (2009). The effect of question order on self-rated general health status in a multilingual survey context. American Journal of Epidemiology, 169(12), 1525–1530.

  • Lee, T., & Pérez, E. O. (2014). The persistent connection between language-of-interview and Latino political opinion. Political Behavior, 36(2), 401–425.

  • Lien, P., Conway, M. M., & Wong, J. (2003). The contours and sources of ethnic identity choices among Asian Americans. Social Science Quarterly, 84(2), 461–481.

  • Marian, V., & Kaushanskaya, M. (2004). Self-construal and emotion in bicultural bilinguals. Journal of Memory and Language, 51(2), 190–201.

  • Nering, M. L., & Ostini, R. (2011). Handbook of polytomous item response theory models. Taylor & Francis.

  • Pérez, E., & Tavits, M. (2022). Voicing politics. Princeton University Press.

  • Pérez, E. O. (2009). Lost in translation? Item validity in bilingual political surveys. The Journal of Politics, 71(4), 1530–1548.

  • Pérez, E. O. (2011). The origins and implications of language effects in multilingual surveys: A MIMIC approach with application to Latino political attitudes. Political Analysis, 19(4), 434–454.

  • Pérez, E. O., & Tavits, M. (2016). Language shapes public attitudes toward gender equality. The Journal of Politics. https://doi.org/10.1086/700004

  • Pérez, E. O., & Tavits, M. (2017). Language shapes people’s time perspective and support for future-oriented policies. American Journal of Political Science, 61(3), 715–727.

  • Pérez, E. O., & Tavits, M. (2019). Language influences public attitudes toward gender equality. The Journal of Politics, 81(1), 81–93.

  • Peytcheva, E. (2020). The effect of language of survey administration on the response formation process. In The essential role of language in survey research (p. 1). RTI International.

  • Pietryka, M. T., & Macintosh, R. C. (2017). ANES scales often don’t measure what you think they measure—An ERPC2016 analysis.

  • Prior, M. (2014). Visual political knowledge: A different road to competence? The Journal of Politics, 76(1), 41–57.

  • Ramakrishnan, K., & Ahmad, F. Z. (2014). Language diversity and English proficiency (p. 27). Center for American Progress.

  • Ramirez, C. M., Abrajano, M. A., & Alvarez, R. M. (2019). Using machine learning to uncover hidden heterogeneities in survey data. Scientific Reports, 9(1), 1–11.

  • Saavedra Cisneros, A., Carey, T. E., Jr., Rogers, D. L., & Johnson, J. M. (2022). One size does not fit all: Core political values and principles across race, ethnicity, and gender. Politics, Groups, and Identities, 1–20.

  • Sireci, S. G. (1997). Problems and issues in linking assessments across languages. Educational Measurement: Issues and Practice, 16(1), 12–19.

  • Sireci, S. G., & Berberoglu, G. (2000). Using bilingual respondents to evaluate translated-adapted items. Applied Measurement in Education, 13(3), 229–248.

  • Slobin, D. I. (1996). From “thought and language” to “thinking for speaking.” In Rethinking linguistic relativity (pp. 70–96). Cambridge University Press.

  • Velez, Y. R., Porter, E., & Wood, T. (2022). Latino-targeted misinformation and the power of factual corrections. The Journal of Politics. https://doi.org/10.1086/722345

  • Welch, S., Comer, J., & Steinman, M. (1973). Interviewing in a Mexican–American community: An investigation of some potential sources of response bias. The Public Opinion Quarterly, 37(1), 115–126.

  • Willis, G. B. (2015). The practice of cross-cultural cognitive interviewing. Public Opinion Quarterly, 79(S1), 359–395.

  • Wong, J. S., Ramakrishnan, S. K., Lee, T., & Junn, J. (2011). Asian American political participation: Emerging constituents and their political identities. Russell Sage Foundation.

  • Zavala-Rojas, D. (2018). Exploring language effects in cross-cultural survey research: Does the language of administration affect answers about politics? Methods, Data, Analyses: A Journal for Quantitative Methods and Survey Methodology (MDA), 12(1), 127–150.


Acknowledgements

We thank Devin Caughey, Stephen Sireci, Chris Tausanovitch, Nazita Lajevardi, Michael Peress, Brad Jones, Melissa Michelson, Efren Perez, anonymous reviewers, and the editors for thoughtful comments and suggestions. We also appreciate the helpful feedback we received at Michigan State University’s Minority Politics Online Seminar Series (MPOSS). We received funding from the Columbia Experimental Laboratory for Social Sciences.

Author information

Corresponding author

Correspondence to Yamil Ricardo Velez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Material for this article is available in the “Appendix” in the online edition. Replication files are available in the Political Behavior Dataverse (https://doi.org/10.7910/DVN/MPWVJ7).

Electronic supplementary material

Supplementary material (PDF 2928 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Velez, Y.R., Saavedra Cisneros, Á. & Gomez, J. Language Barriers: Causal Evidence of Linguistic Item Bias in Multilingual Surveys. Polit Behav (2023). https://doi.org/10.1007/s11109-023-09869-8

