Showcasing the usefulness of web probing: Do subtle variations in questionnaire translation lead to different survey responding?

Behr, Dorothée; Braun, Michael; Aiglstorfer, Luisa

doi:10.1007/s11135-024-01843-8

Open access
Published: 05 March 2024


Quality & Quantity

Abstract

It is generally taken for granted that comparability in comparative research hinges, among others, on the quality of questionnaire translations. However, what do slight differences in translation mean for respondents’ answers? In this article, we look at a combination of quantitative evidence from split-ballot experiments and qualitative evidence from additional probing questions for three items that were translated according to different translation methods, resulting in different translations, e.g., for “our national way of life.” Two of the three items do not show any quantitative differences between translation versions when implemented in split-ballot experiments. However, using open-ended probing questions we delved deeper into the effects of different translation versions. This allowed us to show that different translations do indeed change respondent understanding. We suggest mechanisms that may lead to different translations (not) having an impact on the data, and we also try to align the results to the notion of equivalence/comparability in translation. Ultimately, we showcase the usefulness of web probing for exploring different translation understandings.

1 Introduction

Cross-cultural comparative research requires equivalent measurement in different cultures to draw valid conclusions. A crucial element in establishing equivalence is comparable translation. Questionnaire translation has been on the research agenda in cross-national social sciences since the late 1990s (Harkness and Schoua-Glusberg 1998). Quite some research since then has been focusing on translation and translation assessment methodology, bringing about, amongst others, the team translation model TRAPD (Harkness 2003) as a countermodel to (simple) back translation (Brislin 1970). The methodology debate continues, including a revived discussion around the (non-)potential of the back translation method (Colina et al. 2017; Epstein et al. 2015), or technology-driven advances in terms of machine translation (Zavala-Rojas et al. forthcoming). However, oftentimes, the actual impact of different translation choices (incl. deviations and errors) on the resulting data remains unclear. Thus, the AAPOR/WAPOR Task Force Report on Quality in Comparative Surveys suggests as future direction for the field of questionnaire translation to learn more about the impact of different translation options on the resulting data to be in a better position to guide translation activities: “[…] The future will need to see more qualitative and quantitative (experimental and evaluation) studies focusing on translation quality assessment.” (Lyberg et al. 2021, p. 64) (see also Smith 2020, who calls for more quantitative evidence). Already in 2008, Harkness et al. had uncovered many differences and errors in translations of the World Mental Health Survey Initiative, but they refrained from clearly attributing survey error or diminished interview experience to these—further research beyond expert assessments would be needed to indicate which differences or errors do indeed matter and which questions are robust despite differences between translations and the source instrument. 
In medical research, a discipline that operates quite independently from cross-cultural survey methodology, translation versions of several instruments produced through different translation methods (e.g., including or not back translation) were compared to each other, leading to no or only minor psychometric differences, even though—at least partly—accuracy and preference among the target population differed (Epstein et al. 2015; Hagell et al. 2010; Perneger et al. 1999). The potential robustness of an instrument regarding structure and item content was put forward as a potential reason, as was the cancelling out of imperfections in both translations (Perneger et al. 1999). Roberts et al. (2020) assessed a subjective well-being measure across scale formats, modes, as well as linguistic and cultural contexts. They found that translation versions and cultural differences contributed more to non-equivalence than scale format or mode. Repke and Dorer (2021) fielded a web survey with different Estonian and Slovene translation versions that were situated on a continuum between close and adaptive translation. Not considering conceptual challenges of the experiment here, they found items being sensitive to translation wording (i.e., small linguistic changes in translations had an impact on the data) and items being more robust (i.e., the meaning of a concept remained the same despite differences in translation wordings). Behr and Braun (2023) implemented split-ballot experiments comparing German translations stemming either from a team translation approach or from a simple back translation approach. Despite obvious faults—at least at face value—in the version resulting from the back translation approach, there was in most cases no statistical difference between the two tested translation versions. Overall, these few published studies are comforting in the sense that not all differences or even errors matter. 
This study builds on the quantitative results by Behr and Braun (2023) but adds qualitative evidence for the same respondents, thus making the study an innovative mixed-method study. The qualitative evidence comes from probing questions implemented in the same web survey. Using quantitative and qualitative evidence, we aim to explore the following question: What do slight differences in translation mean for respondents’ answers? And what can we say about comparability to the English source?

While answering these questions, we will also showcase the usefulness of web probing for exploring the meaning of different translation versions. Web probing, understood as the implementation in web surveys of cognitive probes that are typically asked in cognitive interviews, has been developed and refined over the past 10 years (Behr et al. 2020). It has been used in post-hoc studies to understand suspicious data in main surveys (e.g., Meitinger 2017) but also in pretesting studies (e.g., Hadler et al. 2022).

2 Methods and data

The source items and the two German translation versions for each item, as used in this study, come from a research project in which the team approach towards translation (TRAPD) was compared against a simple back translation approach (described and analysed in detail in Behr & Braun 2023). In the present paper, we focus on different translation versions of the same item, regardless of which translation approach they are based on. Both translation versions represent more or less “plausible” solutions of the translation task, but we see differences on the purely semantic level, which trigger this research. The original British English items come from the ISSP (International Social Survey Program) modules on “Social Inequality” (2019) and the “Environment” (2020).

2.1 Item selection

For the present case study, we selected the following three items, which were followed by a probe allowing us to exploit the usefulness of open-ended probing questions.

The first example is the item “[COUNTRY] should limit immigration in order to protect our national way of life.” The item was rendered by two different German language versions: In one of them, “way of life” had been translated by “Kultur” [culture] and in the other version by “Lebensweise” [way of living]. Even though both versions seem plausible, we see a (slight) meaning difference between these terms and are thus curious to learn about potential effects on the data.

The second example is the question “Thinking about your neighbourhood, to what extent, if at all, was it affected by the following things over the last twelve months? Air pollution/Water pollution/Extreme weather events.” The item text was supplemented with a translation note saying: “By ‘neighbourhood’ we mean the part of the town/city the respondent lives in. If he/she lives in a village, this can be taken as his/her ‘neighbourhood.’ ‘Affected’ refers to the impact on the neighbourhood.” In one version, “neighbourhood” was translated into German in a way that typically indicates the neighbours close to one’s home and/or proximity (“Nachbarschaft”) rather than the area one lives in more in general; the latter meaning, more in line with the translation note, was covered in the second version, which used “Wohngegend” [area where you live]. We were particularly interested in whether the narrower notion of “Nachbarschaft” would manifest itself in the data.

The third example is the question “Some people feel angry about differences in wealth between the rich and the poor, while others do not. How do you feel when you think about differences in wealth between the rich and the poor in [COUNTRY]? […]”. “Differences in wealth” was translated by “Wohlstandsunterschiede” [translation alluding to differences in economic welfare] in the first and by “Vermögensunterschiede” [translation alluding to differences in assets/property] in the second translation. Again, on the semantic level, there is a difference between these two German terms, but would this matter? “Wohlstand” alludes more to a high standard of living, a situation that grants economic security (Duden 2022), while “Vermögen” refers to property having a material value (Duden 2022) and as such is more concrete.

2.2 Split-ballot web study

The web survey was implemented both in German (Germany) and in English (Britain), but the English study is only ancillary to this article, as here only one version was implemented and probed. For the data collection, we commissioned the online access panel provider respondi (Footnote 1). Respondents were recruited using a quota sample balanced according to age, gender, and education. Data were collected in November 2020. In the German questionnaire, our three items were among a total of 15 questions for which we randomized the question text (independent randomizations at each of these questions). For the German survey, 1422 panellists clicked on the link to the survey. Of those, 361 were rejected due to a full quota, 37 were screened out, and 49 broke off. The break-off rate was 5.7%. For the British survey, 1438 panellists clicked on the link to the survey, 730 were rejected due to a full quota, 152 were screened out, and 69 dropped out. The break-off rate was 12%. Table 1 provides a brief overview of both the German survey and the British survey.
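The independent randomization of question texts can be sketched as follows. This is only an illustration under stated assumptions, not the survey software actually used: the function name `assign_versions`, the question labels, and the equal 50/50 split between the two versions are hypothetical.

```python
import random


def assign_versions(question_ids, rng):
    """Draw translation version 1 or 2 independently for every question.

    Assumption: both versions are equally likely; the article only states
    that randomizations were independent at each question.
    """
    return {qid: rng.choice((1, 2)) for qid in question_ids}


# Hypothetical labels for the 15 randomized questions.
QUESTIONS = [f"q{i:02d}" for i in range(1, 16)]

# A fixed seed is used only to make this sketch reproducible.
rng = random.Random(2020)
respondent_assignment = assign_versions(QUESTIONS, rng)
print(respondent_assignment)
```

Because each draw is independent, a respondent can receive Version 1 of one item and Version 2 of another, so the two versions of any single item are compared across groups that are mixed with respect to all other items.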

Table 1 Respondent characteristics in the web survey

2.3 Probing

We followed each of the questions with open-ended probes to learn about the cognitive processes of respondents with the two different translation versions (Behr et al. 2020). Tables 2, 3 and 4 summarize the English source wording for items and probes alongside the two German translations. The Version 1 translation stems from the back translation approach; the Version 2 translation stems from the team translation approach.

Table 2 Item “way of life” and probe

Table 3 Item “neighbourhood” and probe

Table 4 Item “wealth” and probe

2.4 Coding schemes for probe answers

The coding scheme development was based on an inductive approach and considered answers from both German and British respondents to consider potentially country-specific answer patterns (Behr 2015). The answers were coded in their original languages, i.e., German and English. The open-ended answers offered by the respondents were coded by either the second or the third author. 15% of the answers were double coded by either the second or third author. Intercoder reliability (Holsti’s coefficient) for “way of life” reached 85% (German answers) and 76% (English answers), respectively; for “neighbourhood” it reached 92% (German answers) and 85% (English answers), respectively; and for “wealth” it reached 80% (German answers) and 69% (English answers), respectively. Differences were discussed and reconciled for the final dataset. Tables 5, 6 and 7 show the categories, a short description, and an example for the categories for each of the items.
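Holsti’s coefficient is commonly defined as 2M / (N1 + N2), where M is the number of coding decisions on which both coders agree and N1 and N2 are the numbers of decisions made by each coder. The following minimal sketch illustrates the calculation. It assumes that one answer counts as one decision and that two coders agree only if they assign exactly the same set of codes; the article does not specify how agreement was counted, and the answer IDs and codes are invented.

```python
def holsti(coder_a, coder_b):
    """Holsti's coefficient 2M / (N1 + N2) for two coders.

    Each argument maps an answer ID to the set of codes assigned to it.
    Assumption: agreement means identical code sets for the same answer.
    """
    shared_ids = coder_a.keys() & coder_b.keys()
    agreements = sum(1 for uid in shared_ids if coder_a[uid] == coder_b[uid])
    return 2 * agreements / (len(coder_a) + len(coder_b))


# Invented double-coded answers (not data from the study).
coder_a = {
    101: {"language"},
    102: {"religion", "history"},
    103: {"mismatch"},
    104: {"(fundamental) values"},
}
coder_b = {
    101: {"language"},
    102: {"religion", "history"},
    103: {"other"},
    104: {"(fundamental) values"},
}

reliability = holsti(coder_a, coder_b)
print(f"Holsti's coefficient: {reliability:.2f}")
```

Note that this coefficient does not correct for chance agreement, which is one reason to read the reported percentages together with the number and distribution of categories.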

Table 5 Coding schema for “way of life”

Table 6 Coding schema for “neighbourhood”

Table 7 Coding schema for “wealth”

2.4.1 Categories present in all coding schemes

There are a few categories present in all coding schemes. “Non-response” means that respondents leave the text box blank; “don’t know” that they explicitly state they do not want or are unable to answer; “mismatch” that they answer but not the actual question at stake (e.g., they give reasons for their choice of an answer category with the closed item and do not communicate their understanding of the keywords in question); “unproductive answer” that they give responses without a substantive meaning; and “other” that the answer is substantive but cannot be accommodated in the category scheme. These are mainly codes that do not help very much in illuminating which concepts respondents had in mind. However, they can be useful in pointing to comprehension problems or difficulties of understanding specific concepts. The non-substantive codes and the “other” code are exclusive, that is, only one of these codes is coded. This does not apply to the substantive codes, which can be combined with one another.
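The exclusivity rule can be expressed as a simple consistency check on coded answers. The sketch below is hypothetical: the function name `is_valid_coding` is invented, and it assumes the strict reading that an exclusive code cannot be combined with any other code, including a substantive one.

```python
# Codes shared by all coding schemes, as named in the text.
NON_SUBSTANTIVE = {"non-response", "don't know", "mismatch", "unproductive answer"}
EXCLUSIVE = NON_SUBSTANTIVE | {"other"}


def is_valid_coding(codes):
    """Return True if a set of codes respects the exclusivity rule.

    Assumption: an exclusive code must stand alone, while substantive
    codes may be freely combined with one another.
    """
    codes = set(codes)
    if not codes:
        return False  # every answer receives at least one code
    if codes & EXCLUSIVE:
        return len(codes) == 1
    return True


print(is_valid_coding({"language", "religion"}))  # substantive codes combined
print(is_valid_coding({"mismatch", "language"}))  # exclusive code combined
```

A check of this kind could be run before computing category frequencies, since an answer carrying both an exclusive and a substantive code would otherwise be counted in two incompatible ways.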

The substantive categories are question-specific and we are going to address them now in turn.

2.4.2 Substantive codes for “way of life”

Most of the substantive codes, as presented in Table 5, are self-explanatory: “traditions & cultural practices”, “language”, “(fundamental) values”, “religion”, “history”, as well as “law and system of justice”. “Social system” concerns (elements of) the social system of a country, such as education and health care. “Environment” refers to everything related to environment and nature including the corresponding behavior of individuals. “Diversity of population” is coded if respondents mention cultural diversity as a positive trait of a society, “stereotypes” if negative stereotypes on the side of the majority population are criticized and “concept does not exist” if the concept of a national “way of life” is regarded as meaningless.

2.4.3 Substantive codes for “neighbourhood”

The substantive codes for “neighbourhood”, presented in Table 6, mostly refer to the closeness versus distance of the area with regard to the respondent. There is a clear gradation from “house/estate/immediate area”, via “village/rural area” and “city/community/county”, to “region/country.” In addition, there is an extra code for “unclear area codes.” An exception to this coding principle is “neighbours”, which is used when respondents mention neighbours as people and not as an area category.

2.4.4 Substantive codes for “wealth”

The substantive codes for “wealth” are presented in Table 7. “Salary/wages from work” and “assets/property” are self-explanatory. “Monetary wealth” is coded if, in particular, the source of wealth is left unspecified. “Lifestyle positive” and “lifestyle negative” refer to the consequences of wealth and poverty, respectively, and to those living in these two conditions. “Access to resources” is coded if people mention access to mostly public resources, such as healthcare and education, or the lack thereof.

3 Results

Firstly, we will present results quantitatively by looking at test statistics (t test) for split-ballot experiments for our three items. Since these items were not part of multi-item batteries, equivalence tests could not be implemented. Secondly, based on the open-ended probes, we will investigate the associations that the different translation wordings trigger among respondents. Lastly, we will regress the dependent variables on respondents’ associations (that is, categories in a coding scheme) in order to identify to what extent different understandings may lead to different survey results.
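A means comparison between two split-ballot groups can be sketched as follows. The article does not say which variant of the t test was used, so this example assumes Welch’s unequal-variance version, and it approximates the p-value with the standard normal distribution, which is only reasonable for large groups such as those in a web survey. The response values are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and an approximate two-sided p-value.

    Assumption: the p-value uses a normal approximation instead of the
    t distribution, which is acceptable only for large samples.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Invented answers on a 5-point scale for two translation versions.
version_1 = [3, 4, 2, 5, 4, 3, 4, 2]
version_2 = [2, 3, 2, 4, 3, 3, 2, 1]

t_stat, p_value = welch_t(version_1, version_2)
print(f"t = {t_stat:.2f}, approximate p = {p_value:.3f}")
```

With real data, the two samples would be the closed-item responses of respondents randomly assigned to Version 1 and Version 2 of the same item.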

3.1 Quantitative testing: results from the web survey

The three sets of items included in the web study are presented in Table 8 along with the results. The complete results from the web survey are presented in Behr and Braun (2023).

Table 8 Quantitative results of web survey (n = 972)

Two of these items do not show any quantitative differences (regarding the means) between translation versions: the different translations of “way of life” and “neighbourhood.” Quantitative differences could be found for “wealth” only. The following paragraphs address these items in turn by adding the qualitative evidence.

3.2 Qualitative evidence from probing

3.2.1 “Way of life”

While the quantitative assessment did not show any differences, the probing question reveals significant differences between the two translation versions in the aspects that come to respondents’ minds (Table 9). On the one hand, “culture” produces fewer mismatching responses and fewer responses evoking elements of the “social system” than “way of living.” On the other hand, it triggers more responses referring to “language” and “(fundamental) values” than “way of living.” Roughly comparing these results to the British figures, the German translation “Lebensweise” seems closer to the English “way of life” in this particular context, but the means comparison shows that slight meaning shifts are not detrimental for this item.
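Differences in how often a category is mentioned in two versions can be checked with a test on proportions. The article does not name the test behind its significance statements for category frequencies, so the following is only one plausible sketch: a pooled two-proportion z-test with invented counts and a hypothetical function name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns z and a two-sided p-value."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        raise ValueError("category is mentioned by nobody or by everybody")
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (hits_a / n_a - hits_b / n_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented counts: respondents mentioning one category in each version.
z, p_value = two_proportion_z(hits_a=60, n_a=230, hits_b=30, n_b=230)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

For categories mentioned by very few respondents, an exact test would be more appropriate than this normal approximation.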

Table 9 Frequencies of the categories for “way of life” in different subsamples

A regression allows us to delve deeper into the impact of the significantly different associations that respondents have in mind when being presented with different renderings of the English term “way of life” (Table 10). Both the “mismatch” category (strong in the “way of living” version) and the “(fundamental) values” category (strong in the “culture” version) have an impact on the response: both are linked to more hostile reactions to immigration.

Table 10 Impact of respondents’ associations on the closed item “way of life”

3.2.2 “Neighbourhood”

If “neighbourhood” is translated in a close way as “Nachbarschaft”, it is neighbours who come to mind in 8% of the cases in Germany, while with a rendering as “Wohngegend” [area of living] no one in our German sample mentions the neighbours. In Britain, 1% of the respondents think of neighbours—which comes close to the German figure in the case where “Wohngegend” is used. Other than that, there is also some difference with the smallest distance area “house/estate/immediate area” which is more frequently mentioned by the Germans who received the closely translated “Nachbarschaft” version. The British mention this category still a bit more frequently. There are no significant differences between the two German-language versions for the other categories (Table 11). In sum, both German translations share similarities and differences with the English source version. As the means comparison shows, these shifts are not detrimental.

Table 11 Frequencies of the categories for “neighbourhood” in different subsamples

With a regression we examine the impact of the significantly different associations of respondents (Table 12). Only code 4 (“house/estate/immediate area”) shows a small significant negative effect on air and water pollution as well as on extreme weather events, which is likely due to these events or types of pollution not necessarily affecting the smallest area around one’s home. Only the results for one of the items (“air pollution”) are shown in Table 12.

Table 12 Impact of respondents’ associations on the closed item “air pollution”

3.2.3 “Wealth”

The third example was based on the question “Some people feel angry about differences in wealth between the rich and the poor, while others do not. How do you feel when you think about differences in wealth between the rich and the poor in [COUNTRY]?” “Differences in wealth” was translated as “Wohlstandsunterschiede” [alluding to differences in economic welfare] in one version and “Vermögensunterschiede” [alluding to differences in assets/property] in the other. Here we uncovered significant differences in the means. However, in addition to the above-mentioned translation difference, there was another one regarding the rendering of “feeling angry” and its integration into the response scale: Version 1 was more stilted (“Ärger verspüren”) and Version 2 more colloquial (“sich ärgern”); Version 1 expressed the end point of extreme anger in a potentially more extreme way (“äußerst großer Ärger”) than Version 2 (“sehr stark darüber geärgert”).

As for the qualitative results from probing, there are several significant differences between the two language versions, as Table 13 shows: Mismatching answers are less frequent in the “differences in assets/property” version. This might simply mean that it is easier for respondents to figure out what “differences in assets/property” means than what “differences in economic welfare” means. On the substantive side, “salary/wages from work” as well as “assets/property” come to mind more easily when “differences in assets/property” is asked for. By contrast, (negative) lifestyle aspects come to mind more easily with the “differences in economic welfare” version. In sum, both translations share differences and similarities with the British understanding: While “Vermögensunterschiede” triggers “salary/wages from work” as well as “assets/property” in a similar way to the British term “differences in wealth,” the German “Wohlstandsunterschiede” comes closest to the British term when it comes to triggering lifestyle aspects.

Table 13 Frequencies of the categories for “wealth” in different subsamples

A regression for this item reveals that associations of respondents with “assets/property” significantly reduce anger, while the opposite is true for associations of respondents with a “negative lifestyle” (Table 14). The lack of significant results in the base model when looking at the comparison between translation versions (split variable) contrasts with the results in Table 8. This is due to the regression in Table 14 being based on a smaller n, as only a random sample of respondents received a probe question.

Table 14 Impact of respondents’ associations on the closed item “wealth”

4 Discussion

For the first item, the means comparison was not significant, which is, first of all, good news. While “way of living” is a bit closer in meaning to the English source based on respondents’ open-ended associations, “culture” is not too far off. Overall, both German translation versions share core meanings, in particular related to “traditions and cultural practices”, “(fundamental) values,” “religion,” and “language.” Exceptions are related to the categories of “environment” and “history”: responses related to these latter categories are only mentioned in relation to one translation but are not central to respondents and may be regarded as “fuzzy edges” of the concepts (cf. “prototype semantics”, Kussmaul 1994, 2007). The regression provides a potential argument as to why certain translation versions may not lead to different results. Two answer patterns (“(fundamental) values” and “mismatch”), the first stronger in the version with “culture,” the second stronger in the version with “way of living,” led to more xenophobic reactions among respondents. Since one such pattern is present in each translated version, this may be the reason why we do not see differences in the data between translation versions. Why may “(fundamental) values” and “mismatch” responses lead to more xenophobic reactions? “(Fundamental) values” seem to be particularly sensitive and worthy of protection. For understanding the role of “mismatch” responses, a closer look into the German item wording seems useful. Literally back translated, the translations read: “Germany should limit immigration to maintain our own way of living.” “Our own” may signal to some respondents that their personal way of living may be at risk, which is why we might see the increased number of mismatches. Typical “mismatch” responses are, for the present item, xenophobic answers, such as “we already have too many, partly dangerous cultures, with us” (ID 326).
In the end, which translation version is better and more comparable to the source? While “Lebensweise” is a bit closer to “way of life” than “Kultur,” at least when based on the open-ended probe responses, ultimately both versions function comparably; no translation can be ruled out as inadequate.

For the second example, the means comparison shows that the wording “Nachbarschaft” (alluding to neighbours/proximity) was not as problematic as it had seemed at the beginning. While a certain percentage of respondents thought of their neighbours, the other associations testify that, overall, the term “Nachbarschaft” covers the meaning of “neighbourhood” in an equivalent way. The same holds true for “Wohngegend.” Slight differences compared to the English associations for both translations show that there is not necessarily full equivalence between a source word and its translation(s); translation strives to be an approximation at best (Gile 1995). A certain loss or gain needs to be accepted (Munday 2016). Besides, with this item, we are likely to cover country-specific patterns of dwelling, which is why full comparability to associations in the English source is not likely anyway. We would also like to emphasise that the item context may have helped respondents to understand the notion of “Nachbarschaft” in the meaning as intended by the item. Based on the listed types of pollution or extreme weather events, it may be that most respondents deduced that larger areas, beyond the closest proximity of neighbours, are meant anyway (Behr & Braun 2023). In this context, we want to refer to Harkness et al. (2010), who stressed the importance of context both in questionnaire design and translation, calling for a theoretical framework that fully accommodates context. Even though the regression indicated that associations of “house/estate/immediate area” can affect the closed survey responses, this finding was, overall, not decisive. Which version is better and more comparable? In the end, both versions function in a comparable way; no translation can be ruled out as inadequate.

For the third item, the means comparison signalled that something is going on here. Even though we see that core meanings are covered in both translations, two meaning dimensions were distributed quite differently across the translation versions: “Assets/property” was particularly strong with “Vermögen”; this is also in line with the dictionary definition of “Vermögen”, which stresses material property (“gesamter Besitz, der einen materiellen Wert darstellt”, Duden 2022). Typical answers refer to “real estate” [“Immobilien”], “property” [“Eigentum”], “shares” [“Aktien”], and “inheritance” [“Erbe”]. The association of “negative lifestyle” was particularly strong with “Wohlstand”; this equally fits the dictionary definition of “Wohlstand”, which stresses the notion of living standard or economic security (“Maß an Wohlhabenheit, die jemandem wirtschaftliche Sicherheit gibt; hoher Lebensstandard”, Duden 2022). The following is an example answer: “I thought about the fact that there are people who can throw money out of the window, while others would starve to death without the food banks and struggle every month not to lose their homes.” For these two associations, the effects in the regression go in different directions; these results, however, do not allow us to explain the results provided in Table 8 when the full sample is used for the t-test. The version with “Vermögensunterschiede” had triggered significantly more anger in the larger sample; this does not fit the finding that respondent associations of “assets/property” reduce anger. As noted above, the translation of “feel angry” and its integration into the scale also differed between the two translation versions. In the end, this may have been the main driver of the differences, superseding all other differences in wording.
With the above results in mind, we dare to suggest that the more colloquial translation of “feeling angry,” coupled with a potentially less extreme endpoint label, paved the way for more easily selecting more extreme response options. Different “feeling” and different scale translations are certainly worth pursuing in future research (Perneger et al. 1999 list examples of different “feeling” translations and their impact; Villar 2009 shows the effects of different translation-related changes in response labels). Returning to the translation of “wealth,” which translation is better and more comparable to the source? Looking at the distribution of the open-ended responses and their comparison with the English responses, no final decision can be taken. In the end, we see that languages cut up reality differently and that we need to take decisions, even if these are trade-off decisions. On the semantic level, the term “Vermögen” is certainly closer in meaning to “wealth,” but the wider associations regarding positive and negative lifestyle, likely triggered in the context of the entire item, bring “Wohlstand” closer to “wealth.” Either translation can be chosen in this context, but neither translation seems to be a fully equivalent match here.

5 Conclusions

The study has shown that different translation versions do not necessarily impact on the resulting data, even though they can, as testified by the third item. If different translation versions trigger different associations, these associations may still affect responses in the same direction, so that their effects cancel each other out in the comparison between versions. If this happens, either translation version works well. However, translation versions can also lead to associations resulting in opposite response behaviour, which may then impact on survey data. The latter we could not prove with our data, though, since additional translation differences blurred the results. What does this mean for translation? We are still at the beginning when it comes to understanding what different translation versions mean to respondents and more research is certainly needed. For the time being, we can only emphasise best practice and recommend bringing together a team of skilled persons. Together, these should ideally have high proficiency in the respective languages; translation know-how, including research skills (using dictionaries, the web, etc.); substantive knowledge on the concepts to be measured; and questionnaire design/field experience. They should be encouraged to think of the core meanings of the English source item and how these could be covered by a translation. This may lead to different translation options on the table, all of which may even fit in the context of a given item. And where subjective knowledge ends, empirical testing, such as web probing used in this study, can help to shed light on respondents’ interpretation and to take final decisions that bring the translation as close as possible to the meaning of the source text. This study provides a blueprint for such empirical testing.

Further research should attempt to systematically link translation decisions and reasoning to data outcomes. This may be achieved by drawing on written documentation on particular decisions (cf. Behr & Zabal 2020) or by accessing recordings of think aloud translations or team discussions (e.g., Behr 2009, Dorer 2020). Thus, we could learn more about successful translation strategies that lead to the desired outcome in the data.

6 Limitations

For our web survey, we recruited respondents from a non-representative online access panel. Hence, the population we reached is not representative of the general population in Germany. How the illiterate population, for instance, or those less active in the digital world would understand these items can therefore not be answered.

Web probing suffers from non-response; to some extent, we compensated for this by having a rather high number of web probing respondents (more than 220 per probe and version).^{Footnote 2} Moreover, most types of non-response did not differ between translated versions; but if they differed, we tried to suggest reasons related to specific translation wordings.

Testing three items and their translations, and this in only one language, does not allow for generalization of findings, but the study provides an insightful glimpse into what different translations may mean to respondents. The results invite further research into translation decision-making and the impact of translations.

The regressions are only suggestive: in each case, we start from non-significant results, some of them based on small n. These analyses are therefore meant to encourage researchers to take up the resulting suggestions and integrate them into more nuanced and targeted research in different languages.

Data availability

The quantitative data is available here: https://search.gesis.org/research_data/SDN-10.7802-2318?doi=https://doi.org/10.7802/2318. The qualitative data can be obtained from the authors upon request.

Notes

1. (https://www.respondi.com/)
2. The GESIS pretest lab, for instance, conducts web probing studies with 120 to 240 respondents (https://www.gesis.org/en/services/planning-studies-and-collecting-data/cognitive-pretesting).

References

Behr, D., Braun, M.: How does back translation fare against team translation?: An experimental case study in the language combination English-German. J. Surv. Stat. Methodol. 11(2), 285–315 (2023). https://doi.org/10.1093/jssam/smac005
Behr, D., Meitinger, M., Braun, M., Kaczmirek, L.: Cross-national web probing: an overview of its methodology and its use in cross-national studies. In: Beatty, P.C., Collins, D., Kaye, L., Padilla, J.L., Willis, G., Wilmot, A. (eds.) Advances in Questionnaire Design, Development, Evaluation and Testing, pp. 521–544. Wiley, Hoboken (2020)
Behr, D.: Translating answers to open-ended survey questions in cross-cultural research: a case study on the interplay between translation, coding, and analysis. Field Methods 27(3), 248–299 (2015). https://doi.org/10.1177/1525822X14553175
Behr, D., Zabal, A.: Documenting survey translation. GESIS-Survey Guidelines. GESIS–Leibniz-Institute for the Social Sciences, Mannheim (2020). https://doi.org/10.15465/gesis-sg_en_035
Behr, D.: Translationswissenschaft und international vergleichende Umfrageforschung: Qualitätssicherung bei Fragebogenübersetzungen als Gegenstand einer Prozessanalyse. GESIS, Bonn (2009). http://nbn-resolving.de/urn:nbn:de:0168-ssoar-261259
Brislin, R.W.: Back-translation for cross-cultural research. J. Cross Cult. Psychol. 3, 185–216 (1970). https://doi.org/10.1177/135910457000100301
Colina, S., Marrone, N., Ingram, M., Sánchez, D.: Translation quality assessment in health research: a functionalist alternative to back-translation. Eval. Health Prof. 40(3), 267–293 (2017). https://doi.org/10.1177/0163278716648191
Dorer, B.: Advance Translation as a Means of Improving Source Questionnaire Translatability?: Findings from a Think-Aloud Study for French and German. Frank & Timme, Berlin (2020)
Epstein, J., Osborne, R.H., Elsworth, G.R., Beaton, D.E., Guillemin, F.: Cross-cultural adaptation of the health education impact questionnaire: Experimental study showed expert committee, not back-translation, added value. J. Clin. Epidemiol. 68(4), 360–369 (2015). https://doi.org/10.1016/j.jclinepi.2013.07.013
Gile, D.: Basic Concepts and Models for Interpreter and Translator Training. John Benjamins, Amsterdam (1995)
Hadler, P., Lenzner, T., Schick, L., Neuert, C.: European Working Conditions Survey 2024: Preparation and cognitive testing of the online questionnaire. Eurofound Working Paper WPEF22035 (2022). https://www.eurofound.europa.eu/sites/default/files/wpef22035.pdf. Accessed 19 May 2023
Hagell, P., Hedin, P.J., Meads, D.M., Nyberg, L., McKenna, S.P.: Effects of method of translation of patient reported health outcome questionnaires: a randomized study of the translation of the rheumatoid arthritis quality of life (RAQoL) Instrument for Sweden. Value Health 13, 424–430 (2010). https://doi.org/10.1111/j.1524-4733.2009.00677.x
Harkness, J.: Questionnaire translation. In: Harkness, J., van de Vijver, F.J.R., Mohler, PPh. (eds.) Cross-Cultural Survey Methods, pp. 35–56. Wiley, Hoboken (2003)
Harkness, J., Pennell, B.E., Villar, A., Gebler, N., Aguilar-Gaxiola, S., Bilgen, I.: Translation procedures and translation assessment in the World Mental Health Survey Initiative. In: Kessler, R.C., Bedirhan Ustun, T. (eds.) The WHO World Mental Health Surveys: Global Perspectives on the Epidemiology of Mental Disorders, pp. 91–113. Cambridge University Press, Cambridge (2008)
Harkness, J.A., Villar, A., Edwards, B.: Translation, adaptation, and design. In: Harkness, J.A., Braun, M., Edwards, B., Johnson, T.P., Lyberg, L., Mohler, PPh., Pennell, B.-E., Smith, T.W. (eds.) Survey Methods in Multinational, Multiregional, and Multicultural Contexts, pp. 117–140. Wiley, Hoboken (2010)
Harkness, J. A., Schoua-Glusberg, A.: Questionnaires in translation. In: Harkness, J. (ed.) ZUMA-Nachrichten Spezial 3: Cross-Cultural Survey Equivalence, pp. 87–127. ZUMA, Mannheim (1998)
Kussmaul, P.: Semantic models and translating. Target 6(1), 1–13 (1994)
Kußmaul, P.: Verstehen und Übersetzen: ein Lehr- und Arbeitsbuch. Narr, Tübingen (2007)
Lyberg, L., Pennell, B.-E., Cibelli Hibben, K., de Jong, J., Behr, D., Burnett, J., Fitzgerald, R., Granda, P., Guerrero, L. L., Gyuzalyan, H., Johnson, T., Kim, J., Mneimneh, Z., Moynihan, P., Robbins, M., Schoua-Glusberg, A., Sha, M., Smith, T.W., Stoop, I., Tomescu-Dubrow, I., Zavala-Rojas, D., Zechmeister, E.J.: AAPOR/WAPOR task force report in comparative surveys. (2021) https://www.aapor.org/AAPOR_Main/media/MainSiteFiles/images/AAPOR-WAPOR-Task-Force-Report-on-Quality-in-Comparative-Surveys_Full-Report.pdf. Accessed 1 January 2022
Meitinger, K.: Necessary but Insufficient. Why measurement invariance tests need online probing as a complementary tool. Public Opin. Q. 81(2), 447–472 (2017). https://doi.org/10.1093/poq/nfx009
Munday, J.: Introducing Translation Studies: Theories and Applications. Routledge, London (2016)
Perneger, T.V., Leplège, A., Etter, J.F.: Cross-cultural adaptation of a psychometric instrument: two methods compared. J. Clin. Epidemiol. 52(11), 1037–1046 (1999)
Repke, L., Dorer, B.: Translate wisely! An evaluation of close and adaptive translation procedures in an experiment involving questionnaire translation. Int. J. Sociol. 51(2), 135–162 (2021). https://doi.org/10.1080/00207659.2020.1856541
Roberts, C., Sarrasin, O., Stähli, M.E.: Investigating the relative impact of different sources of measurement non-equivalence in comparative surveys. Surv. Res. Methods 14(4), 399–415 (2020). https://doi.org/10.18148/srm/2020.v14i4.7416
Villar, A.: Agreement answer scale design for multilingual surveys: effects of translation-related changes in verbal labels on response styles and response distributions. The University of Nebraska-Lincoln. https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1002&context=sramdiss (2009). Accessed 19 May 2023
Zavala-Rojas, D., Behr, D., Dorer, B., Sorato, D., Keck, V.: Using machine translation and postediting in the TRAPD approach: effects on the quality of translated survey texts. Public Opinion Quarterly (forthcoming)

Funding

Open Access funding enabled and organized by Projekt DEAL. The authors received funding for the data collection by their home institution.

Author information

Luisa Aiglstorfer
Present address: Sirona Dental Systems GmbH, Charlotte, USA

Authors and Affiliations

GESIS - Leibniz Institute for the Social Sciences, Mannheim, Germany
Dorothée Behr, Michael Braun & Luisa Aiglstorfer

Authors

Dorothée Behr
Michael Braun
Luisa Aiglstorfer

Contributions

DB and MB were responsible for the study conception, design, and data collection. LA provided the first draft of the coding schemes, to which all subsequently contributed. All authors contributed to coding the open-ended data. DB was responsible for data processing and analysis, supported by MB. The first draft of the manuscript was written by DB and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Dorothée Behr.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Behr, D., Braun, M. & Aiglstorfer, L. Showcasing the usefulness of web probing: Do subtle variations in questionnaire translation lead to different survey responding?. Qual Quant (2024). https://doi.org/10.1007/s11135-024-01843-8

Accepted: 14 January 2024
Published: 05 March 2024
DOI: https://doi.org/10.1007/s11135-024-01843-8
