Abstract
Accurate estimation of public opinion in diverse countries requires survey questions that operate similarly across languages. We leverage 3,026 bilingual Latinos across four studies to investigate how language-of-administration affects the measurement properties of political science scales. We randomly assign bilinguals to English and Spanish survey forms and uncover item bias, also known as differential item functioning (DIF), across items measuring identity, attitudes, and political knowledge. We examine whether translation errors are the culprit but find that perceived translation quality does not predict the magnitude of DIF and that even questions with minimal text exhibit item bias. Our findings suggest otherwise similar Latinos may be estimated to possess different levels of knowledge, attitude strength, and group identification due to survey language selection. We discuss the implications of our findings for the design of multilingual surveys and the study of racial and ethnic politics more broadly.
Notes
TRAPD is an acronym for translation, review, adjudication, pre-testing, and documentation. This method has been used in developing cross-national surveys such as the European Social Survey (Jowell et al., 2007).
To our knowledge, political scientists have yet to use the bilingual experimental design to assess if multi-item scales measuring political attitudes and knowledge function similarly across languages (i.e., have similar item properties).
Students were recruited to participate in a study on Latino/a politics via instructor e-mails. A $50 Amazon.com gift card lottery was used to incentivize participation.
CloudResearch has been shown to recover treatment effects observed in other studies involving Latino samples (Velez et al., 2022).
Pre-analysis plans for these two studies can be found at https://osf.io/39j2d. In “Appendix G”, we reproduce the pre-analysis plans and describe deviations.
Upon being assigned to a language form, respondents were also randomly assigned to different prompts asking them to write about the importance of their ethnic identity. These treatments had no discernible effects on attitudes. We use the complete sample to preserve statistical power.
The survey block measuring ideology in Study 3 had a programming error that presented participants with an uneven number of items across both languages. No other scales were affected. We corrected the error in Study 4.
The final study performed randomization at the scale level, such that for each scale, participants could be assigned to one of four conditions (English items; Spanish items; English items first, Spanish second; Spanish items first, English second). This was done to counterbalance possible order effects.
The first study randomized the order of languages. Given the large number of scales in the second and third studies, respondents in the hybrid condition always responded to English items first before Spanish items. The final study randomized the order of languages.
Study 4 used successful passage of the two-question language quiz as an inclusion criterion.
In “Appendix D”, we estimate one-parameter versions of each IRT model presented in the paper, and still detect evidence of DIF for most scales.
For the HSI study, we estimate simpler one-parameter models to avoid over-fitting, given the smaller sample size (N = 194). These models estimate a difficulty parameter for each binary item and item step difficulties for each polytomous item. Item step difficulties capture the point on the latent scale where probability curves for two adjacent response categories intersect (e.g., the point on the latent scale at which the probability of “somewhat agree” is equal to “strongly agree”) (Nering & Ostini, 2011).
Difficulty and discrimination have analogs in factor analysis; discrimination corresponds to factor loadings and difficulty corresponds to factor intercepts.
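As an illustrative sketch (not the authors' estimation code), the two-parameter logistic (2PL) item response function below shows how difficulty and discrimination jointly govern the probability of endorsing a binary item; the parameter values are hypothetical, chosen to illustrate how uniform DIF in difficulty across language forms would shift response probabilities for otherwise identical respondents.

```python
import math

def p_endorse(theta, a, b):
    """2PL item response function: probability that a respondent with
    latent trait theta endorses an item with discrimination a and
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item calibrated separately in each language form:
# identical discrimination, but the Spanish version is "harder"
# (uniform DIF in the difficulty parameter).
theta = 0.0
p_english = p_endorse(theta, a=1.2, b=-0.3)
p_spanish = p_endorse(theta, a=1.2, b=0.4)
```

At the same latent trait level, the respondent is less likely to endorse the harder (higher-difficulty) version, which is exactly the pattern a difficulty-based DIF analysis detects.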
Given the randomization of participants to language forms, we view this as a plausible assumption.
This scale was identical to the HSI version, except for the inclusion of visual items.
We present item discrimination differences in “Appendix E”. We focus on item difficulties because they are easier to interpret and more commonly used in substantive applications of IRT.
These “mixed” cases can be observed in the anti-trans sentiment, panethnic identity, and immigration opinion scales, where the selection of lower response options is easier in English but the selection of higher response options is easier in Spanish, or vice versa.
These translators are affiliated with the Department of Latin American and Iberian Cultures at the principal investigator’s institution, and comprise the entire set of translators listed on the Spanish translation webpage.
We thank an anonymous reviewer for this suggestion.
For binary items, the number of steps is one (i.e., the probability of moving from 0 to 1), whereas for ordinal items, the number of steps is equal to K − 1, where K represents the number of response categories. For example, four step difficulties are estimated when modeling a five-point scale. These step difficulties capture the probability of selecting the second response category versus the first, the third versus the second, and so on.
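The step-difficulty logic above can be sketched with a minimal partial credit model (one common polytomous IRT parameterization; the paper does not specify this exact form, and the step values below are hypothetical). Each of the K − 1 step difficulties marks the point on the latent scale where two adjacent category curves intersect.

```python
import math

def pcm_probs(theta, steps):
    """Partial credit model: category probabilities for a polytomous
    item with K - 1 step difficulties (K = len(steps) + 1 categories).
    Adjacent categories k - 1 and k are equally likely when theta
    equals the k-th step difficulty."""
    # Category 0 gets numerator exp(0); category k gets
    # exp(sum_{j<=k} (theta - steps[j-1])).
    cum = 0.0
    numerators = [1.0]
    for d in steps:
        cum += theta - d
        numerators.append(math.exp(cum))
    total = sum(numerators)
    return [n / total for n in numerators]

# A five-point item has four step difficulties (illustrative values).
probs = pcm_probs(theta=0.0, steps=[-1.5, -0.5, 0.5, 1.5])
```

Evaluating the model at theta equal to the first step difficulty yields equal probabilities for the first two categories, matching the intersection-point interpretation of step difficulties.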
Peytcheva (2020) describes a theory of language-driven survey response that enumerates possible mechanisms underlying language effects. In this study, we are unable to empirically distinguish between these mechanisms, but future research could assess the psychological processes underlying linguistic DIF when translation errors are not responsible.
Notably, DIF is only detected for non-Latino public officials. However, in Study 3, question formats varied across Latino and non-Latino public officials, so it is unclear whether the DIF is due to question format or legislator ethnicity. In Study 4, we addressed this by presenting participants with Latino and non-Latino public officials within each question type. Consistent with Study 3, we tend to detect evidence of linguistic DIF for non-Latino officials (see Political Knowledge CR2 in Fig. 1). However, we fail to reject the null hypothesis of zero difference in mean absolute DIF between items measuring knowledge of Latino and non-Latino officials (β = −.06; SE = .20; p = .77).
We thank an anonymous reviewer for suggesting this analysis.
References
Abrajano, M. (2015). Reexamining the “racial gap” in political knowledge. The Journal of Politics, 77(1), 44–54.
Ankenmann, R. D., Witt, E. A., & Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36(4), 277–300.
Atar, B., & Kamata, A. (2011). Comparison of IRT likelihood ratio test and logistic regression DIF detection procedures. Hacettepe University Journal of Education, 41, 36–47.
Awad, G. H., Hashem, H., & Nguyen, H. (2021). Identity and ethnic/racial self-labeling among Americans of Arab or Middle Eastern and North African descent. Identity, 21(2), 115–130.
Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conceptions of time. Cognitive Psychology, 43(1), 1–22.
Brislin, R. W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1(3), 185–216.
Ervin, S., & Bower, R. T. (1952). Translation problems in international surveys. Public Opinion Quarterly, 16(4), 595–604.
Flores, A., & Coppock, A. (2018). Do bilinguals respond more favorably to candidate advertisements in English or in Spanish? Political Communication, 35(4), 612–633.
Gomez-Aguinaga, B. (2021). Messaging “en Español”: The impact of Spanish language on linked fate among bilingual Latinos. The International Journal of Press/Politics. https://doi.org/10.1177/19401612211050889.
Harkness, J. A., Van de Vijver, F. J. R., Mohler, P. P., & Wiley, J. (2003). Cross-cultural survey methods (Vol. 325). Wiley-Interscience.
Harkness, J., Stange, M., Cibelli, K. L., Mohler, P., & Pennell, B.-E. (2014). Surveying cultural and linguistic minorities. In Hard-to-survey populations. Cambridge University Press.
Hidalgo-Montesinos, M. D., & Gómez-Benito, J. (2003). Test Purification and the evaluation of differential item functioning with multinomial logistic regression. European Journal of Psychological Assessment, 19(1), 1.
Hill, K. A., & Moreno, D. V. (2001). Language as a variable: English, Spanish, ethnicity, and political opinion polling in South Florida. Hispanic Journal of Behavioral Sciences, 23(2), 208–228.
Huddy, L., Mason, L., & Aarøe, L. (2015). Expressive partisanship: Campaign involvement, political emotion, and partisan identity. American Political Science Review, 109(1), 1–17.
Iyengar, S. (1993). Assessing linguistic equivalence in multilingual surveys. In Social research in developing countries: Surveys and censuses in the Third World (pp. 173–182). London: Wiley.
Jones-Correa, M., Al-Faham, H., & Cortez, D. (2018). Political (mis) behavior: Attention and lacunae in the study of Latino politics. Annual Review of Sociology, 44(1), 213–235.
Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring attitudes cross-nationally: Lessons from the European Social Survey. SAGE.
King, G., & Wand, J. (2007). Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis, 15(1), 46–66.
Lee, S., & Grant, D. (2009). The effect of question order on self-rated general health status in a multilingual survey context. American Journal of Epidemiology, 169(12), 1525–1530.
Lee, T., & Pérez, E. O. (2014). The persistent connection between language-of-interview and Latino political opinion. Political Behavior, 36(2), 401–425.
Lien, P., Conway, M. M., & Wong, J. (2003). The contours and sources of ethnic identity choices among Asian Americans. Social Science Quarterly, 84(2), 461–481.
Marian, V., & Kaushanskaya, M. (2004). Self-construal and emotion in bicultural bilinguals. Journal of Memory and Language, 51(2), 190–201.
Nering, M. L., & Ostini, R. (2011). Handbook of polytomous item response theory models. Taylor & Francis.
Pérez, E., & Tavits, M. (2022). Voicing politics. Princeton University Press.
Pérez, E. O. (2009). Lost in translation? Item validity in bilingual political surveys. The Journal of Politics, 71(4), 1530–1548.
Pérez, E. O. (2011). The origins and implications of language effects in multilingual surveys: A MIMIC approach with application to Latino political attitudes. Political Analysis, 19(4), 434–454.
Pérez, E. O., & Tavits, M. (2016). Language shapes public attitudes toward gender equality. The Journal of Politics. https://doi.org/10.1086/700004.
Pérez, E. O., & Tavits, M. (2017). Language shapes people’s time perspective and support for future-oriented policies. American Journal of Political Science, 61(3), 715–727.
Pérez, E. O., & Tavits, M. (2019). Language influences public attitudes toward gender equality. The Journal of Politics, 81(1), 81–93.
Peytcheva, E. (2020). The effect of language of survey administration on the response formation process. In The essential role of language in survey research (p. 1). RTI International.
Pietryka, M. T., & Macintosh, R. C. (2017). ANES scales often don’t measure what you think they measure—An ERPC2016 analysis.
Prior, M. (2014). Visual political knowledge: A different road to competence? The Journal of Politics, 76(1), 41–57.
Ramakrishnan, K., & Ahmad, F. Z. (2014). Language diversity and English proficiency (p. 27). Center for American Progress.
Ramirez, C. M., Abrajano, M. A., & Alvarez, R. M. (2019). Using machine learning to uncover hidden heterogeneities in survey data. Scientific Reports, 9(1), 1–11.
Saavedra Cisneros, A., Carey, T. E., Jr., Rogers, D. L., & Johnson, J. M. (2022). One size does not fit all: Core political values and principles across race, ethnicity, and gender. Politics, Groups, and Identities, 1–20.
Sireci, S. G. (1997). Problems and issues in linking assessments across languages. Educational Measurement: Issues and Practice, 16(1), 12–19.
Sireci, S. G., & Berberoglu, G. (2000). Using bilingual respondents to evaluate translated-adapted items. Applied Measurement in Education, 13(3), 229–248.
Slobin, D. I. (1996). From “thought and language” to “thinking for speaking”. In Rethinking linguistic relativity (pp. 70–96). Cambridge University Press.
Velez, Y. R., Porter, E., & Wood, T. (2022). Latino-targeted misinformation and the power of factual corrections. Journal of Politics. https://doi.org/10.1086/722345.
Welch, S., Comer, J., & Steinman, M. (1973). Interviewing in a Mexican–American community: An investigation of some potential sources of response bias. The Public Opinion Quarterly, 37(1), 115–126.
Willis, G. B. (2015). The practice of cross-cultural cognitive interviewing. Public Opinion Quarterly, 79(S1), 359–395.
Wong, J. S., Ramakrishnan, S. K., Lee, T., & Junn, J. (2011). Asian American political participation: Emerging constituents and their political identities. Russell Sage Foundation.
Zavala-Rojas, D. (2018). Exploring language effects in cross-cultural survey research: Does the language of administration affect answers about politics? Methods, Data, Analyses: A Journal for Quantitative Methods and Survey Methodology (MDA), 12(1), 127–150.
Acknowledgements
We thank Devin Caughey, Stephen Sireci, Chris Tausanovitch, Nazita Lajevardi, Michael Peress, Brad Jones, Melissa Michelson, Efren Perez, anonymous reviewers, and the editors for thoughtful comments and suggestions. We also appreciate the helpful feedback we received at Michigan State University’s Minority Politics Online Seminar Series (MPOSS). We received funding from the Columbia Experimental Laboratory for Social Sciences.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Material for this article is available in the “Appendix” in the online edition. Replication files are available in the Political Behavior Dataverse (https://doi.org/10.7910/DVN/MPWVJ7).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Velez, Y.R., Saavedra Cisneros, Á. & Gomez, J. Language Barriers: Causal Evidence of Linguistic Item Bias in Multilingual Surveys. Polit Behav (2023). https://doi.org/10.1007/s11109-023-09869-8