5.1 Introduction

In the digital age, surveys are often conducted in a computer-assisted mode: they can be administered by an interviewer, self-administered in the presence of an interviewer, or completed online without any interviewer present. In the following, I will use the term “computerized surveys” for all these scenarios. Among the great advantages of computerized surveys is that the collected data are immediately digitized and do not require further processing. On the design side, computerization brings an increased interplay and interconnectivity between questionnaire and survey software (e.g., through the use of fills or placeholders, complex routing, or automatic pop-up error messages), which has to be carefully considered when implementing computerized surveys.

The trend towards computerization also applies to migration research. While computerized surveys facilitate certain processes, such as reaching the target population (Pötzschke & Braun, 2017), the typically multilingual and multicultural character of migration surveys adds a layer of complexity to computerized implementation. This chapter aims to tackle the topic of questionnaire translation in migration research mainly from this technical perspective of multilingual survey implementation. General translation approaches and requirements can be found in various publications on questionnaire translation (e.g., Behr, 2018a; Behr & Shishido, 2016; Harkness et al., 2010b; Mohler et al., 2016). However, technical aspects related to translation often remain in the background, even though they can become quite crucial for data quality (Wang et al., 2017). It is only recently that challenges resulting from the interplay between translation and survey software have been brought to the fore (Pan et al., 2020; Wang et al., 2017). Although highly informative, these recent works do not attempt to build on and connect to related research in other disciplines, such as translation studies. A notable exception is Upsing et al. (2011), who describe processes for large-scale competency assessment with reference to standards in software translation and adaptation – typically called “software localization” – using the OECD Programme for the International Assessment of Adult Competencies (PIAAC) as an example. Localization hereby refers to the processes of “adapting [digital] content linguistically, culturally, and technically” (Gambier, 2016, p. 891). It is the explicit aim of this chapter to draw on frameworks from software localization to foster knowledge transfer. I should add that the content of this chapter is applicable to multilingual computerized survey research in general, whether it is migration, cross-cultural, or cross-national research.

In the following, I will provide a brief overview of good practices in questionnaire translation in general (Sect. 5.2). I will then introduce frameworks from software localization and transfer these to the survey research field (Sect. 5.3). With the help of these frameworks, I will outline major decisions in multilingual survey design and challenges that can arise when translating questionnaires for computerized surveys (Sects. 5.4, 5.6, and 5.7). Major players in multilingual research (e.g., Upsing et al., 2011; Dept et al., 2017; the consortia implementing the OECD studies PISA and PIAAC) already operate according to the principles referred to in this chapter. Implementing computerized studies with a high level of comparability across diverse countries and cultures would otherwise not be possible. I am not aware of computerized migration research that systematically addresses technical challenges right from the start. This may, however, also be due to a general scarcity of documentation on translation procedures and challenges in migration research.

5.2 Questionnaire Translation: What Aspects Lead to High Quality?

Migration research frequently requires the translation of a source questionnaire into various target languages in order to reach the intended respondent population. The term translation, as understood in this chapter, also includes different levels of cultural adaptation (e.g., substantial modifications to items) to make an item work better in a new context (Behr & Shishido, 2016; van de Vijver & Leung, 2011). Unfortunately, questionnaire translation is still too often seen as a step outside of the scientific process (Smith, 2004). Moreover, it is often regarded as a mere “word-by-word substitution, a problem of dictionaries” by those not familiar with translation (Gambier, 2016, p. 887). Good translation, however, differs significantly from such a view. It draws on interacting competencies, including linguistic and textual competencies in both the source and target language, as well as cultural, substantive, information acquisition, tools, and general translation know-how (Behr, 2018a). Seen from this angle, it becomes clear that a simplistic view of translation leads to little consideration of translation needs, to small budgets for translation teams, and/or to unfeasible timelines. Reduced quality of the translation, and ultimately of the research output, is the likely result.

Much has been written in the cross-cultural survey methodology literature on good practice translation procedures. Seminal work is included in the edited volumes by Harkness et al. (2003), Harkness et al. (2010a), and Johnson et al. (2018), as well as in the Cross-Cultural Survey Guidelines (Survey Research Center, 2016). The main guidance in cross-cultural psychology is provided by the ITC Guidelines for Translating and Adapting Tests (International Test Commission, 2017). The health research field is quite diverse in the guidance it offers; Acquadro et al. (2008), Wild et al. (2005), Wild et al. (2009), and Eremenco et al. (2018) represent useful literature to get started in this field. Practices and challenges specific to migration research are summarized in an edited volume by Behr (2018b).

The different disciplines coincide in emphasizing that a multi-step process is needed to ensure comparability and quality of the newly produced instrument. Using the example of the TRAPD model (Harkness, 2003), good practice calls for double translation of the questionnaire by two independently working translators (T), team reviews to arrive at a final version (R = review, A = adjudication), pre-testing (P) among the target population, and a thorough documentation (D) of all these steps for both internal and external quality control. The various steps should include input from experts from different fields, since a combination of expertise (in translation, questionnaire/survey design, and the substantive topic) is deemed crucial for producing a high-quality translation that fulfills both the needs of a good translation and those of a properly functioning measurement instrument (see Behr & Shishido, 2018a, b; Harkness, 2003).

Furthermore, translation teams need to be briefed on the task at hand. That is, they need to be given concrete information on the study and the translation goal (e.g., are cultural adaptations allowed, or should the translation adhere closely to the source text?) so that they can make appropriate decisions in line with the overall objective of the study. Behr et al. (2018) also speak of “input documentation” in this context, as opposed to “output documentation.” The latter includes the translated questionnaires, comments on these (e.g., in case of difficult decisions or needed cultural adaptations), and a description of the overall process implemented. At a minimum, the briefing – or “input documentation” – should include information on the study, the translation goal, the target group, the survey mode, the employed translation and assessment processes, and the expectations linked to each of these steps. It can – and even should – be expanded by information on key terms, on the questionnaire structure in the case of complex instruments, reference materials, etc. Information can be conveyed in written form and additionally through (web) trainings (Behr, 2018a; Behr et al., 2018; Dept et al., 2017). Input documentation intended for translators and output documentation intended for research teams, along with open communication channels for all types of queries and issues, are particularly important in situations where the research teams responsible for a study do not themselves speak the languages of the study and thus need to rely on external translators.

Another crucial cornerstone when it comes to ensuring translation quality and comparability is the source questionnaire itself. It is already during the development phase of a source questionnaire that the way is paved for comparability. Cross-national or cross-cultural research collaboration at the development stage of a questionnaire is vital to ensure that different cultural and linguistic realities are considered and sufficiently taken on board. The wording of source questions should be kept as simple as possible and allow a “relatively” easy transfer from one language to the other. Questions can also be annotated for translation or specifically earmarked for adaptation (Behr & Scholz, 2011). Furthermore, pretesting and translatability assessments – or alternatively advance translations – help to assess the questionnaire’s suitability for a multilingual and multicultural study before the source questionnaire is finalized (Acquadro et al., 2018; Dept et al., 2017; Dorer, 2020; Smith, 2004). The translatability criteria summarized in Acquadro et al. (2018) or the advance translation scheme by Dorer (2011; later updated in Dorer, 2020) highlight what can be considered when reviewing a source questionnaire for translatability (e.g., issues pertaining to culture, language, or item construction).

Challenges that need to be considered when implementing a computerized survey in more than one language and/or cultural group can partly be deduced from these schemes, but they are not explicitly mentioned there. Since the technical set-up of a computerized survey – in particular, the way it is programmed – will affect translation and may lead to problems with the translation later on, this topic receives heightened attention in this chapter. Readers should bear in mind, however, that these more technical aspects always need to be considered alongside the cultural, linguistic, and design issues that can impact translation and comparability.

Against this backdrop, I now want to introduce key frameworks and approaches from software localization. The ultimate goal is to transfer these to the survey research field. The software localization field has distilled the technical challenges with multilingual software encountered over decades into best practice frameworks. Transferring this knowledge across disciplines – and adapting it where needed – helps avoid repeating the same mistakes.

5.3 Software Localization: Frameworks and Approaches

With the advent of the personal computer in the 1980s, the localization industry began to develop, tasked with “adapting [digital] content linguistically, culturally, and technically” for new regional markets (Gambier, 2016, p. 891). The industry started with the localization of software and websites and has since moved on to also include the localization of mobile phones and video games (Gambier, 2016). The activities related to providing such products for new linguistic and cultural markets largely exceeded the requirements of translation as exercised and known before; after all, extensive project management, software or graphics engineering, content management systems, etc. are all needed in this endeavor. Hence, a new term was coined: localization. In parallel with these developments, computer-aided translation (CAT) tools rose to prominence. These tools help, for instance, to separate programming code from content for translation and to use recurring text elements consistently. CAT tools are nowadays an integral part of the translation environment of professional translators (Sin-wai, 2016).

In a more fine-grained and process-driven view, the localization industry operates according to the GILT framework, which subsumes the four interdependent activities of globalization, internationalization, localization, and translation. Globalization stands for all activities related to marketing a product in various regional markets. Internationalization stands for preparing a product at the technical level for localization. As such, it is the “process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for redesign” (LISA, cited by Esselink, 2000, p. 2). Localization stands for the process of modifying a product for a specific market. Translation is in fact already part of localization, because localization includes both adaptation and translation. The better a product is internationalized, the more cost- and time-effectively localization can be carried out (Gambier, 2016; Sandrini, 2008; Valli, 2019). Overall, the GILT framework highlights that technology and its requirements are one decisive pillar besides language and culture.

The internationalization and localization processes can further be broken down into five core elements (Schäler, 2010): (1) Analysis refers to a set of key questions that need to be asked prior to localization: Does it make sense to localize the content at all? Is all the text that needs to be translated accessible to translators, or is it hidden in program code that cannot be modified? (2) Preparation refers to preparing a so-called localization kit for everyone working on the project, including source materials, reference materials, guidelines, milestones, etc. (3) Translation takes place in a highly computerized environment, which is nowadays standard for many translators. Translators work with CAT tools that include translation memories (TMs), terminology databases, machine translation (MT) functionalities, automatic checking routines, etc. Sometimes, preview functions allow viewing the translated text in the actual software environment, which is particularly important when text strings would otherwise have to be translated out of context. (4) Engineering and testing involves assessing the content in terms of linguistic correctness, interface layout, and functionality. (5) A review closes the localization project. Lessons learned are collected for future projects.

In the following, these frameworks and approaches from the localization field are merged to outline steps that should be taken when preparing for and implementing translations in multilingual and multicultural computerized migration surveys (see Table 5.1).

Table 5.1 Merging frameworks and approaches across survey methodology and localization

The chapter will focus on technical issues that can affect or interact with translation. For more comprehensive guidelines on technical questionnaire design, Hansen et al. (2016) should be consulted. In the subsequent descriptions, I will assume that survey programming is implemented centrally and that the source questionnaire, as programmed, can serve as a template for the intended target languages. I will not go into the details of programming but rather point to general issues to consider.

5.4 Design and Implementation of the Source Questionnaire

This section is dedicated to the design phase of a survey questionnaire. It highlights ‘internationalization’ decisions that need to be made prior to and during technical questionnaire design. In line with a key insight of multilingual and multicultural research – namely, that the design process is decisive for translation quality later on (Behr & Zabal, 2019) – this preparatory process should receive special attention in a survey.

5.4.1 Software Fit

Migration research is oftentimes multilingual. The languages chosen depend on the target population. The German IAB-BAMF-SOEP Survey of Refugees, for instance, is implemented in German and (alongside it in a bilingual fashion) in English, Arabic, Farsi, Pashto, Urdu, and Kurmanji (Jacobsen, 2018). In such multilingual surveys, the first necessary technical clarification concerns the survey software. It needs to support the scripts and character sets, language directions (i.e., left-to-right, right-to-left, vertical, bi-directional; see Hansen et al., 2016), fonts, etc. that are needed for a specific survey. This means that the decision about the required target language(s) should precede any decisions about survey software, or at least go hand in hand with them. The right-to-left implementation needed for Arabic is particularly important nowadays, given that major refugee studies around the world field their surveys in Arabic, amongst other languages (e.g., Jacobsen, 2018; AIFS, 2018). Sometimes, audio computer-assisted self-interviewing (ACASI) is meant to compensate for the lack of multilingual interviewers – this should then also be supported by the software (see the above-mentioned studies for examples).

In the following, I will look into more specific aspects that need to be considered when designing – and translating – multilingual and multicultural computerized surveys.

5.4.2 Culture-Driven Response Formats with Technical Implications

Some survey features are culture-dependent. Hence, the survey should be designed in such a way that it allows for cultural adjustments. These adjustments could affect the following (Hansen et al., 2016; Valli, 2019; Maroto & De Bortoli, 2001; Pym, 2011); a short code sketch after this list illustrates the idea:

  • Date formats: e.g., different positions regarding day, month, and year such as mm/dd/yyyy in the US and dd/mm/yyyy in many European countries;

  • Time formats: e.g., 12-hour vs. 24-hour clock;

  • Name formats: e.g., two surnames in Spanish-speaking countries;

  • Address formats: e.g., different sequence of information or type of information required (state, province, etc.);

  • Telephone number formats: e.g., including or excluding local prefixes;

  • Number formats: e.g., different decimal, thousand, etc. separators, such as 20,5 in German (DE) vs. 20.5 in English (US);

  • Currency formats: e.g., currency symbol after or before the relevant currency entry;

  • Measurement units: e.g., metric vs. imperial units for distances, Celsius vs. Fahrenheit, different clothing units, etc.

These formats are not only relevant when it comes to programming individual questions but also when information from these questions is automatically inserted into follow-up questions in a survey (fills). The way fills are programmed needs to accommodate cultural particularities, such as addressing a person with the appropriate order of names (Wang et al., 2017) or “piping in” the date in the culturally appropriate way (see Sect. 5.4.4 for more information on fills).

These or similar formats also play a role when defining out-of-scope answers that automatically trigger error messages on the screen. For instance, the allowable ranges for body height will differ across cultural groups using metric vs. imperial units (e.g., meters vs. feet; km vs. miles for distances), and whether commas or full stops are accepted as decimal separators varies by language. Wang et al. (2017) share their experiences from the Chinese 2016 Census Test Internet Instrument: Respondents had to set up security questions, and a valid answer had to contain at least three characters. In Chinese, however, the names of people, locations, or items often consist of only two characters. Hence, respondents entering those names were confused by automatic validation checks that were appropriate for the English source language but not for Chinese.
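A sketch of how such checks can be made culture-sensitive follows (a minimal Python illustration; the rules and names are hypothetical, with the minimum answer length mirroring the Wang et al. example):

```python
# Minimal sketch: validation limits are stored per language/culture version
# instead of inheriting the source-language rules. All rules are hypothetical.
VALIDATION_RULES = {
    "en_US": {"height_range": (1.0, 9.0), "height_unit": "ft", "min_answer_chars": 3},
    "de_DE": {"height_range": (0.5, 2.5), "height_unit": "m", "min_answer_chars": 3},
    "zh_CN": {"height_range": (0.5, 2.5), "height_unit": "m", "min_answer_chars": 2},
}

def validate_security_answer(answer: str, locale: str) -> bool:
    """Check a free-text answer against the locale-specific minimum length."""
    return len(answer) >= VALIDATION_RULES[locale]["min_answer_chars"]

assert validate_security_answer("王伟", "zh_CN")      # two characters suffice in Chinese
assert not validate_security_answer("王伟", "en_US")  # too short under the source rule
```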

Moreover, if pre-coded response lists are provided, these will need to be adapted to the respective needs. For instance, pre-coded response lists of time will look different in different languages and cultures.

Valli (2019), speaking for software localization, argues that cultural assumptions should be removed from software design during internationalization. In particular, software should not include hard-coded culture-specific formats (e.g., date formats) that cannot be changed. Hard-coded refers to text that is directly embedded in the source code and typically not accessible to translators. Transferred to the survey world, we can say that anything that could require a cultural accommodation should be soft-coded and/or be made editable in some other way.

5.4.3 Non-linguistic Adaptations with Technical Implications

Graphics, icons, or images can be embedded in a survey for different reasons. They may serve a measurement purpose when they are an integral part of questions. Additionally, they may represent the survey sponsor, survey agency, or the study itself, for instance in the header of an online survey. For measurement-related graphics, icons, or images, their cultural suitability for the target population should be assessed to ensure that cultural norms are not transgressed. For instance, Hansen et al. (2016) show, using the 2007 International Social Survey Programme (ISSP), that body shapes can be presented with figures wearing only boxer shorts or bikinis in Austria, whereas in the Philippines they wear clothes covering larger parts of the body. In technical terms, such images should be soft-coded so that they can be replaced if needed.

Similar considerations apply to icons representing a sponsor, agency, or the study itself: For instance, when conducting multilingual web surveys within the context of cross-cultural web probing studies (Behr et al., 2019), the survey icon representing our institute, GESIS – Leibniz Institute for the Social Sciences, was uploaded and integrated into the survey with the German institute name for the study conducted in Germany and with the English institute name for all other surveys. At a minimum, graphics, icons, or images should contain editable text in case translation teams need to implement a change (Valli, 2019).

Also, links to external websites (e.g., on the survey introduction page linking to further information) may need to be replaced so that they directly link to a website in the respective target language. Similarly, Sha et al. (2018) describe how entry pages (websites) to a multilingual survey should best be designed and also adapted in order to ensure participation across multilingual groups in a society. In their case study, the authors were interested in limited English speakers’ entry to U.S. Federal Government internet surveys.

In technical terms, all of the information referred to in this section should be accessible for translation teams so that the content can be adapted, if needed.

I should add here that colors (e.g., background colors of a survey or of a logo) should also be thoroughly checked in terms of cultural meaning and associations (Hansen et al., 2016).

5.4.4 Linguistic Differences with Technical Implications

Computerized surveys have certain features that make them unique compared to paper-and-pencil surveys. One of these features is the possibility to use responses given earlier in the survey to adapt survey text in later questions (fills). For instance, questions can be tailored to refer to a previously mentioned male or female partner; or questions can be asked in present or past tense depending on whether a situation currently applies or whether it applied in the past. With survey software taking on such adjustments, interviewers in interviewer-administered surveys can focus on the interviewing task itself and do not have to adapt text to a given respondent (Latour et al., 2013). In self-administered surveys, the respondents can focus on relevant text for their situation without being distracted by irrelevant information.

The multi-country Survey of Health, Ageing and Retirement in Europe (SHARE) (Das et al., 2005) used fills, such as the automatic insertion of ‘he’ or ‘she’ depending on the gender of a partner as indicated earlier. The multilingual implementation of the English source version proved challenging, however:

At first sight this seemed to be straightforward, but because of country specific [sic!] grammar and syntax it became complicated. In later versions of the CAPI instrument generic fill texts used in multiple question texts were no longer used. Instead, each question had its own fills, using question-specific fill names. (p. 17)

Fills – or dynamic text, as it was called there – were also used in the Programme for the International Assessment of Adult Competencies (PIAAC) to accommodate different respondent situations. In this study, too, the researchers experienced challenges across languages. For instance, for the source question “In your ^JobLastjob, how often ^DoDid you usually … read directions or instructions?”, inserting the words “current job” and “do” at the positions marked with ^ yields a perfectly grammatical present-tense sentence in English. When “last job” and “did” are inserted, the resulting sentence successfully captures past respondent activities. In many languages, however, a close translation of the question, including a literal translation of the fills, did not work, because past and present tense are not formed in the same manner as in English. Oftentimes, translators had to resort to other solutions, including translating the entire question (or larger parts thereof) separately for each respondent condition (Latour et al., 2013).
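The following sketch (Python; the function names and German renderings are hypothetical illustrations, not the PIAAC implementation) shows the underlying problem: an auxiliary-swapping fill works for English, whereas German marks tense on the verb itself and changes word order, so each respondent condition needs its own full question text:

```python
# Minimal sketch: English tolerates a single template with word-level fills,
# while German requires condition-specific full texts (fills cannot simply be
# swapped in, because the past tense wraps around the whole sentence).
TEMPLATE_EN = "In your {job}, how often {aux} you usually read directions or instructions?"
FILLS_EN = {
    "current": {"job": "current job", "aux": "do"},
    "last": {"job": "last job", "aux": "did"},
}
QUESTIONS_DE = {  # hypothetical German renderings, one per condition
    "current": "Wie oft lesen Sie in Ihrer derzeitigen Tätigkeit üblicherweise Anleitungen?",
    "last": "Wie oft haben Sie in Ihrer letzten Tätigkeit üblicherweise Anleitungen gelesen?",
}

def render_question(lang: str, condition: str) -> str:
    if lang == "en":
        return TEMPLATE_EN.format(**FILLS_EN[condition])
    return QUESTIONS_DE[condition]  # fall back to condition-specific full texts

print(render_question("en", "last"))  # one template serves both conditions
print(render_question("de", "last"))  # one full text per condition
```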

Fills are not only difficult for these linguistic-technical reasons, however. It can also be difficult to convey to translators unfamiliar with questionnaires how they are supposed to understand and translate these fills. On the other hand, if questions are completely written out for different respondent conditions, special attention needs to be directed to the briefing of translators so that they understand the difference between similarly worded sentences, their respective role in the survey, and consistency needs. For instance, in a GESIS study with tuberculosis patients from Somalia and Ethiopia, many sentences were partially replicated. One question, for instance, asked: ‘Did the doctor voice suspicion that you may have tuberculosis?’ The subsequent question read: ‘Did one of these doctors voice suspicion that you may have tuberculosis?’ The second question applied to situations where the respondent had several doctors taking care of him/her. The translations were supposed to be identical, except for the difference between ‘the doctor’ vs. ‘one of these doctors’ (and any language adjustments needed in Somali and Tigrinya because of this difference). The translators did not always translate such sentences consistently, which may have been because the set-up of the questionnaire (e.g., who gets which question) was difficult to understand and general survey principles (e.g., standardization in surveys) were not known to them. CAT tools with translation memories would have helped the translators translate consistently in any case.

The software localization field, too, knows the challenges that come with fills, in particular if these are based on a source language that is structurally rather simple. English, for instance, has a simple morphology, with word endings that do not undergo many changes from one sentence to another. This characteristic does not necessarily apply to other languages (Valli, 2019). De la Cova (2016) observes that English word order and lack of grammatical gender may not replicate well in other languages. Valli (2019) recommends that the number and nature of fills be well considered in advance. The same applies to surveys. Moreover, those knowledgeable about translation and the different linguistic needs of the survey languages should have a say in the set-up of fills in a source questionnaire so that problems can be prevented. “Writing for translation” (De la Cova, 2016, p. 253) – or even programming for translation – could be the main message here. If ease of programming in the source language prevails, the resulting translations may be suboptimal, possibly even artificial, with detrimental effects on the validity and comparability of data.

Another known challenge with technical implications is text expansion. Compared to English, other languages are often longer (Dept et al., 2017). Microsoft (2018) states that text strings, when translated into German or Dutch, often expand by 40% (see also Valli, 2019). This needs to be taken into account when designing buttons, menus, or dialogue boxes in software. Transferred to the survey context, developers should ensure that buttons (e.g., ‘forward’ or ‘backward’) are sufficiently large to cater for different languages, or that pop-up windows contain all relevant text – without incorrect hyphenation or truncation. If the survey software offers default sizes for certain elements, the various testing scenarios should establish whether these are sufficient (see Sect. 5.7). The needed or required text length also plays a role when designing open-ended text boxes, where the size of the text box is known to influence the response length (Dillman et al., 2014). Thus, these text boxes should fit the expected or desired response length – and they might even be enlarged in general to cater to the response length in different languages (see also Meitinger et al., 2019, on response patterns for open-ended questions in different languages).

Directly related to this, respondents should be able to type characters from different scripts into open-ended text boxes; there should be no system restriction on the type of data that can be entered. This challenge is exacerbated when migrants are supposed to enter text into open-ended text boxes in self-administered surveys where laptops, tablets, etc. are handed out to respondents by interviewers. It needs to be ensured that both the software and the keyboard support text entries in different scripts.
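A simple illustration of the difference between a source-language input restriction and a script-agnostic one (a minimal Python sketch; the validation rule is hypothetical):

```python
import re

# An ASCII-only rule carried over from the English source version silently
# rejects non-Latin scripts; a Unicode-aware check accepts them.
ascii_only = re.compile(r"^[A-Za-z ]+$")

print(bool(ascii_only.match("Ahmed")))  # True
print(bool(ascii_only.match("أحمد")))   # False: Arabic input rejected
print("أحمد".isalpha())                 # True: script-agnostic check accepts it
```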

5.4.5 Content Differences with Technical Implications

Often, the default situation in multilingual computerized surveys is that a generic source questionnaire serves as a blueprint for all other language versions. Additional design solutions, however, should be possible to allow a cultural group to adapt content, such as adding relevant questions or response categories or changing routing based on culture-specific needs. How this can be achieved depends on the survey software and on overall design decisions. In a purposefully designed comparative study, however, all adaptations to the source questionnaire should be signed off by a central organization and documented to ensure comparability. Asking about the highest educational attainment based on country-specific response categories may serve as an example of such an adaptation.

5.4.6 Preparing Source Questionnaires for Computer-Aided Translation

Sometimes the output format of the survey software requires dedicated translation tools (CAT tools) to read the file; CAT tools do, however, also support the common text processing formats Word and Excel. CAT tools can only show their strengths if the source text – here: the source questionnaire – is optimally prepared. This includes avoiding manual hyphenation, manual hard returns, and multiple blank spaces (instead of tabs); in addition, key terminology and repetitive elements should be worded and spelled consistently. These simple style and formatting rules allow translation memories to correctly display identical or similar translations as stored in the translation memory, and term databases to reliably show pre-defined terminology (Esselink, 2000; Valli, 2019).
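Such rules can even be checked automatically before handing the source questionnaire over to translation. The sketch below (Python; the patterns are hypothetical examples) flags formatting that would break CAT-tool segmentation and translation-memory matching:

```python
import re

# Minimal sketch: flag source-text formatting that interferes with CAT tools.
CHECKS = {
    "multiple blank spaces": re.compile(r" {2,}"),
    "manual hyphenation": re.compile(r"\w-\n\w"),
    "hard return inside sentence": re.compile(r"[a-z],?\n[a-z]"),
}

def lint_segment(text: str) -> list[str]:
    """Return the names of all formatting problems found in a text segment."""
    return [name for name, pattern in CHECKS.items() if pattern.search(text)]

print(lint_segment("How often do you  read in-\nstructions?"))
# ['multiple blank spaces', 'manual hyphenation']
```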

5.4.7 Internationalization Testing

The localization industry calls for internationalization testing before a software product can be localized and passed on to the next step in the workflow (Esselink, 2000). This includes checking whether the software is ready for localization. Essentially, the testing covers the issues and challenges addressed above. Key questions to ask are: Does the software support all needed characters and scripts? Does it support different regional (date, time, etc.) formats? Does it allow for text expansion? Does it run on the required operating systems? Is all text that needs to be translated or adapted accessible – and likewise all icons, images, or graphics?

To support internationalization testing, so-called pseudo translation (e.g., replacing text with longer strings or with accented characters) can be carried out as an easy, low-cost way to identify issues that will arise in other languages, such as spacing issues, truncated text, or problems with scripts. Based on this exercise and its analysis, recommendations can be made on how to proceed (Esselink, 2000; Lerum et al., 2014; Schäler, 2010).
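A pseudo translation can be as simple as the following sketch (Python; the expansion factor follows the 40% figure cited above): accented characters reveal encoding and font problems, the added padding reveals truncation, and the brackets make cut-off strings easy to spot:

```python
# Minimal sketch of pseudo translation: accent the text, expand it by ~40%,
# and bracket it so that truncated or hard-coded strings become visible.
ACCENT_MAP = str.maketrans("aeiouAEIOU", "àéîöüÀÉÎÖÜ")

def pseudo_translate(text: str, expansion: float = 0.4) -> str:
    padding = "~" * max(1, int(len(text) * expansion))  # simulate text growth
    return "[" + text.translate(ACCENT_MAP) + padding + "]"

print(pseudo_translate("Next"))                 # [Néxt~]
print(pseudo_translate("Mark all that apply"))  # [Màrk àll thàt àpply~~~~~~~]
```

Any source string that still appears unaccented and unbracketed in the pseudo-translated survey is, by definition, hard-coded and thus inaccessible to translators.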

These testing procedures from localization can equally be implemented for survey translation. Moreover, I want to stress that the source questionnaire itself should have been thoroughly tested in terms of wording, routing, and overall design before proceeding to internationalization testing and then translation. Implementing changes to the source version – and consequently to all language versions – once translation has started is cost-, time-, and work-intensive, and there is the risk that not all source improvements are consistently implemented in all language versions.

5.5 Prior to Translation: Preparing Translation Teams

Once the source material is ready, translation can begin. In the localization industry, translators receive a so-called localization kit that contains not only the source material to be translated but also reference materials, including translation memories, terminology databases, style guides, milestones, etc. (Esselink, 2000; Schäler, 2010; Valli, 2019). The importance of additional project information has already been discussed in Sect. 5.2 under the notion of briefing. For questionnaires, annotations for translators will be helpful, especially if the normal flow of text is interrupted by fills or if text strings (e.g., for buttons or error messages) are not understandable without context information.

5.6 Translation, Including Adaptation

Having received the localization kit, translators in the localization industry start translating. Their task will always involve the use of specialized localization software. For survey translation, depending on the output formats of source questionnaires from the survey software, translation files could be XLIFF files, which require dedicated CAT tools for the translation, or Excel files, which can be translated with CAT tools but also with normal text processing programs. Translation may also take place within the survey software itself in specifically designed language editors. CAT tools can aid translation by supporting consistency of terms and of recurring text elements through the use of term databases and translation memories, or by allowing the use of several automated checking routines (e.g., on spelling, punctuation, figures, or formatting).
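For illustration, the sketch below shows a survey question as a simplified XLIFF 1.2-style translation unit (namespace declarations and further metadata omitted for brevity; the IDs are hypothetical), together with one of the automated checks mentioned above, here a scan for units that still lack a translation:

```python
import xml.etree.ElementTree as ET

# Simplified XLIFF 1.2-style export of one survey question.
XLIFF = """<xliff version="1.2"><file source-language="en" target-language="ar"><body>
  <trans-unit id="q07">
    <source>How many hours per week do you usually work?</source>
    <target></target>
  </trans-unit>
</body></file></xliff>"""

root = ET.fromstring(XLIFF)
for unit in root.iter("trans-unit"):
    target = unit.find("target")
    if target is None or not (target.text or "").strip():
        print(f"Untranslated unit: {unit.get('id')}")  # -> Untranslated unit: q07
```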

A number of translation decisions must be taken in view of the survey interface as well as with respondent activities in mind (see also Pan et al., 2020). If interviewer or respondent instructions refer to buttons on the screen (e.g., to the ‘Help’ button or the ‘Next’ button), the same translation of key terms should be used for the buttons themselves so that ease of navigation on the screen is ensured. This essentially means that the translation of the survey interface and the translation of the questionnaire should be coordinated in one way or another.

Interviewer or respondent instructions such as ‘Mark all that apply’ or ‘Tick only one box’ should be translated with the ultimate layout and the concrete interviewer or respondent activity on the screen in mind. The translation of ‘mark’, ‘tick’ or ‘box’ could vary depending on these features or activities.

For questions that are asked in an open-ended fashion (e.g., the number of hours that a respondent spends on a given activity), it is important to consider the design of the survey and the position of the open-ended text box. Depending on whether it comes before the unit (here: ‘hours’) or after, the translation may need to be linguistically adapted to this position.

Words or phrases helping to structure a questionnaire or interview (such as: ‘In the following …’ or ‘To what extent do you agree or disagree with the following statements?’) always need to be translated in view of the visual survey design. The “following” could be translated in the sense of ‘as follows below’ if, and only if, the respondents themselves see the questions below. Otherwise, ‘following’ will need to be translated in a temporal sense.

If fills are used in a questionnaire, translators should be trained on how to understand these features and what they need to consider during translation. Wang et al. (2017) provide an example based on Chinese where strict adherence to the English fill structure resulted in a defective text in Chinese; possibly, the translators were not sufficiently informed about the use of fills. If fills do not work in a target language, this should be openly communicated to the research teams.

5.7 Technical Pretesting

In the localization industry, testing is an integral part of software localization. It can start once the software is compiled in the target language. Testing is always based on the real application. The localization field differentiates between (a) linguistic testing, (b) interface testing, and (c) functionality testing (Esselink, 2000).

The linguistic test targets all aspects related to language. Key questions that also apply to survey translation are: Has all text been translated, including error messages? Do the different scripts display correctly? Does the text hyphenate correctly? Do fills display correctly (e.g., do they appear in the correct position in the sentence, and is a fill’s capitalization or lower case appropriate at its given position in the sentence)? Is all text translated in the intended sense, including interface elements, button labels, etc. (Pan et al., 2020)?

The interface test focuses on visual aspects. Questions that should be addressed here are: Is the text in dialog boxes or error messages displayed completely, i.e., without truncation? Are dialog boxes or error messages adequately (re)sized? Does the text fit on the screen in different resolutions? Is the localized version aesthetically acceptable? Do drop-down boxes display all response options? Are the different format conventions (e.g., date or time formats) correctly implemented?

Finally, the focus turns to the actual functionality of the software. Esselink (2000) – speaking for the software localization sector – holds that functionality testing usually mirrors the processes that have been implemented on the source product; moreover, the more thoroughly the source product has been prepared and tested prior to localization, the fewer problems will be found during testing of the localized product. Transferred to the survey context, the key question to ask during this final testing is whether the entire questionnaire works as intended, or whether problems were introduced through translation. For survey translation, functionality testing should involve various testing scenarios that cover respondent groups receiving different parts of the questionnaire. Such a full-blown test can also identify whether the translation works in the context of more extensive routing and in different types of “paths”. During the actual translation, questions are translated in a linear fashion, one after the other. In a concrete survey context, however, this linear order may not apply, since routing based on a given answer may send a respondent to questions much later in the questionnaire. Hence, this real-life testing is extremely important for ensuring that the questionnaire is intelligible in the different “paths” that a respondent could take through the survey.
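A sketch of what such path coverage can look like (Python; the questionnaire structure and answer codes are hypothetical): every route through the routing table is walked so that each language version can be inspected question by question in its real sequence:

```python
# Minimal sketch: walk a routing table to enumerate the questions a respondent
# with given answers would actually see, one list per testing scenario.
ROUTING = {
    "Q1": {"yes": "Q2", "no": "Q5"},  # answer-dependent branching
    "Q2": {"*": "Q3"},                # "*" = unconditional next question
    "Q3": {"*": "Q5"},
    "Q5": {"*": "END"},
}

def walk(start: str, answers: dict) -> list[str]:
    path, question = [], start
    while question != "END":
        path.append(question)
        options = ROUTING[question]
        question = options.get(answers.get(question, "*"), options.get("*"))
    return path

print(walk("Q1", {"Q1": "yes"}))  # ['Q1', 'Q2', 'Q3', 'Q5']
print(walk("Q1", {"Q1": "no"}))   # ['Q1', 'Q5']
```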

In the localization industry, localized software often undergoes compatibility testing, which checks how compatible a new product is with other products available in the target language (e.g., platforms, devices, web browsers). This type of testing can be crucial for translated surveys as well. For instance, Dillman et al. (2014, p. 345) discovered in one of their web studies that toggling back and forth between an English and a Spanish questionnaire only worked in some browser versions, but not in others.

Last but not least, a remark seems appropriate on how this technical pretesting step relates to usability testing, which has gained momentum in survey research due to the rise of computer-assisted surveys. The main goals of usability testing are to improve data quality by reducing errors and to prevent item or unit non-response by reducing respondent burden. Usability testing should not be confused with best practice questionnaire design; rather, it should build upon best practice design and provide the ultimate test ensuring that interviewers and respondents can record answers easily and accurately (Geisen & Romano Bergstrom, 2017). Transferred to multilingual surveys, usability testing should first and foremost target the questionnaire in general, and as such it should be part of the aforementioned testing of the source questionnaire. However, surveys that undergo larger cultural adjustments (e.g., a change in the writing direction) may need additional usability testing once the translated version is available. This testing may still be similar to the interface and functionality testing described above.

5.8 Discussion and Recommendations

The aforementioned observations have hopefully shed some light on the complexities that can come with implementing a multilingual and multicultural survey questionnaire in a computerized survey environment. The good news is that these complexities can be mastered. What is important, however, is that technical aspects are considered early in the development process of a multilingual survey. Just as early cross-cultural collaboration, translatability assessments, or advance translations are important to ensure ease of translation and cultural relevance, so are early checks to ensure that the software and programming work in a multilingual context. That is: When developing and programming a source questionnaire for a multilingual study, the diversity of study languages and their respective needs should always be considered. Ultimately, a survey that is well designed on the linguistic, cultural, and technical level is the pre-condition for sound data.

We require respondents to invest time and effort into replying to our questions, even though their (intrinsic) motivation may be low. We should, on our side, invest time and effort into providing questionnaires that are linguistically and culturally appropriate and function as intended. The best way to achieve this is early cooperation between survey developers, linguists, and translation technologists (see Lupsa, 2018, cited in Behr & Zabal, 2019). Issues or challenges for other languages can thus be identified, and remedies found, before the design is finalized in the source language. In the same vein, Valli (2019) states for the field of software localization: “At minimum, the localization team should be involved in those product development phases in order to raise awareness about the future linguistic pitfalls.” Checklists such as the one provided in the appendix – or the work by Esselink (2000) or Microsoft (2018) – may additionally help to inform multilingual technical survey design. Furthermore, planning and workflows will need to cater to these additional layers of cooperation.

Finally, testing of translations in the computerized survey environment will need to be factored in both time- and budget-wise, and so will another loop of adjustments following the feedback from a first round of multilingual linguistic, interface, and functionality testing. In particular for languages that a research team does not speak (e.g., languages of refugees in a country), additional resources are required for this testing.

To conclude, the translation approach is rarely described (in detail) in multilingual migration research. It would be helpful if researchers documented their procedures and included challenges encountered and lessons learned, both in translation and in technical survey implementation. This way, survey projects can raise awareness of challenges, learn from each other, and build on each other’s experiences. We already see this transfer of lessons learned for other types of study challenges in migration research, for instance on sampling, fieldwork, or item understanding (Formea et al., 2014; Haug et al., 2019; Röder, 2018).