Machine translation (MT) tools like Google Translate can overcome language barriers and increase access to information. These tools also carry risks, and their societal role remains understudied. This article investigates typical uses and perceptions of MT based on a survey of 1200 United Kingdom residents who were representative of the national population in terms of age, sex, and ethnicity. We highlight three main findings from our analysis. First, participants often used MT for non-essential purposes that rarely justified professional human translations. Second, while they were highly satisfied with MT, they also expressed desires for higher MT quality. These desires were usually motivated by expectations of perfection rather than fitness for purpose. Third, participants’ future vision for MT involved increasingly blurred boundaries between text and speech. The article calls for more MT research on the interface between written and spoken communication and on the ethical implications of rare but significant high-risk uses of the technology.
Online machine translation (MT) tools have a huge user base. By March 2021, the Google Translate app alone had been installed a billion times (Pitman, 2021). MT can support communication by allowing users to understand or convey information across languages they might not speak or know well. In its current form, however, MT also has considerable limitations. Its success depends on factors including the system’s architecture, the availability of linguistic data, the specific source and target languages (Johnson et al., 2017), and the use context and purpose. In some cases, MT errors are inconsequential. In others, they can affect livelihoods and reputations (Vieira et al., 2020, p. 1516). To mitigate the risks of this technology, and understand its potential roles in society, it is increasingly important to understand how MT tools are used in ‘everyday’ life.
A vivid description of what everyday MT use—by an American in France—looked like in the 1990s appears in Lawson and Vasconcellos (1994, p. 86):
I’ve used it (French Assistant) a little in translate mode, like the day there was no hot water in the apartment I’m renting and I had to go check with la gardienne. I created a file with the basic questions I wanted to ask, each one expressed two or three ways with lots of complete clauses, simple sentences, etc. I was able to get some half decent sentences with a little tweaking and patience. I practiced pronouncing the sentences a little bit and went down to knock on the office door. Normally I would have printed out the results and carried a page along as a ‘cheat-sheet’ but my printer was out of order. I put my notebook on battery power and carried the PC along with me. […] ‘La Gardienne’ was out and her high-school age daughter came to the door. I guess I should have spent more minutes on the pronunciation practice because the noises I was uttering left la fille de la gardienne looking perplexed. At that point I flipped up the display on the PC and held it so she could see the screen as I scrolled through my questions. […] I said ‘merci’ and returned to my apartment. Mission accomplished.
This anecdote, later quoted by Nurminen and Papula (2018), illustrates not only how much has changed, technologically, in the last three decades, but also how early on users were experimenting with MT-mediated communication.
In this article we probe into what members of a general cohort of users do with MT. We carried out a survey of United Kingdom (UK) residents who were representative of the UK population in terms of age, sex and ethnic background (see Sect. 3.2). The survey provides insights into participants’ self-identified reasons for using MT; the use context, location and device; as well as their assessment of the technology and of the type of risk it posed, if any. The focus of the article is on users’ open-ended comments about their perceptions and expectations of MT. We qualitatively code these comments and discuss the content in relation to three common themes of their responses that should be considered in the public deployment of MT tools, namely users’ understanding of translation quality, their conception of translation itself and their ways of interacting with MT systems. The full survey is available as an open dataset together with the code we used to process the responses we present here (see Data Availability Statement). In the remainder of the article, we briefly review previous research (Sect. 2) and present the survey method (Sect. 3), our analysis (Sect. 4) and a discussion of the results with our main conclusions and directions for future research (Sect. 5).
2 Literature review
2.1 MT user research
MT research to date has tended to frame ‘the user’ from a language industry perspective. In an article that explores different MT use cases, Way (2013) refers to raw (i.e., unedited) MT as one of three service levels, but only as regards specialised industrial sectors in which translation services are used: technology, manufacturing, finance/legal, marketing and e-commerce (p. 5). Similarly, a recent foreword to the ‘User Track’ of the Machine Translation Summit states that: ‘[t]he range of topics covered [by the User Track of the conference] overall reflects the fact that, now more so than ever, machine translation is in wide commercial use’ (Tinsley & Shterionov, 2019, emphasis added). Perhaps because even free-of-charge tools are used commercially, as framed above, uses of MT that are outside strictly defined business sectors are often overlooked.
We highlight two studies where, by contrast, MT use is framed exceptionally broadly. The most recent of these is a survey of 119 frequent MT users (Robertson et al., 2021). Robertson et al. conclude their article by proposing strategies for MT use and design. They call for more MT interactivity with system interventions that can alert users to errors, help them make the input more MT-friendly or allow them to adapt the output, for example by choosing between different registers (p. 5). The other study is the abovementioned one by Nurminen and Papula (2018), who obtained 1579 responses to a brief survey aiming to establish ‘who is using MT, where these users are, how they are using it, when they are using it, and in what areas of life’ (p. 200). They found that most users sought to understand texts for themselves—i.e., for their internal use (assimilation) rather than for external distribution (dissemination)—and often seemed ‘to translate texts that are in languages in which they already have some proficiency’ (p. 206). These two studies have a similar participant profile. Robertson et al.’s (2021) respondents were a subset of frequent MT users (p. 3). Many of them were students (p. 11). Nurminen and Papula’s (2018) participants used MT most often for study (p. 205), and most used it frequently (p. 204). Both these studies, therefore, are based on users who were already quite familiar with MT and who were often students or used it predominantly for study (for a survey where all respondents were students, see also Gaspari, 2007).
From a sociological perspective, Asscher and Glikson (2021) looked at end users’ potential bias in MT evaluation. Based on an online experiment with 284 participants, this study found that users tended to evaluate identical translations more favourably when the translations were attributed to a human rather than a machine. The study argues that this bias against MT must be considered in relation to how the technology can influence the power dynamics of communication, especially if MT is used in ethically charged contexts. Similarly, a review of medical and legal MT use cases (Vieira et al., 2020) highlighted the importance of raising awareness of some of the technology’s strengths as well as its limitations, including its potential to exacerbate social and linguistic inequalities by producing results that vary in quality between languages.
Despite these caveats, the long-term prospects of MT are positive. As a common medium of communication, MT has been described as potentially more effective than lingua franca projects such as Basic English or Esperanto (Ramati & Pinchevski, 2018, p. 2562). A previous study that looked at this question empirically suggests that, subject to a selective use of MT and to participants’ linguistic profile, MT-mediated multilingual communication outperforms the use of English (Pituxcoosuvarn & Ishida, 2018).
2.2 Conceptualising MT use and users
As mentioned, an important factor in MT use is users’ awareness of the technology’s risks and limitations, and how to mitigate them. Gaining this type of awareness has been treated as a matter of developing ‘MT literacy’. Literacy in an MT context ranges from having a basic understanding of how MT works and the ability to judge when MT systems can be used to knowing what content should be translated with them and how their output can be improved (Bowker & Buitrago Ciro, 2019, p. 88). Although MT literacy has been promoted and studied especially in relation to education (Bowker, 2020a), it serves as an important concept for the present analysis. It involves, we argue, knowledge and skills that are relevant to all members of society, who are increasingly likely to use or come across MT systems.
To our knowledge, uses of unedited MT for everyday communicative purposes have not to date been systematically conceptualised in relation to a specific theoretical framework. Previous discussions of translation tools (e.g., Olohan, 2011; Olohan, 2017) have, however, often drawn on the social construction of technology (SCOT) (Pinch & Bijker, 1987), a framework that involves concepts and assumptions that are particularly relevant to this article. A detailed review of SCOT is outside the scope of what we aim to provide here, but there are two tenets of this framework that are worth highlighting. The first is the assumption that technological tools have agency—i.e., they ‘do things’ (Pickering, 2010) and thereby have an effect on users and their social circles. Users, on the other hand, exert their own agency and, through their uses and interpretations of different tools, influence how these tools develop and mature. The underlying assumption in the SCOT paradigm, therefore, is that technologies are socially co-constructed rather than unilaterally determined by the designers or another specific group (Oudshoorn & Pinch, 2003). In relation to MT, this assumption underlines the importance of examining how MT tools are used, how they might change users’ perceptions of language and translation, and how they might affect different groups of users, which brings us to the second tenet of SCOT we wish to emphasise, namely the concept of relevant social groups.
In the early SCOT literature, relevant social groups are defined as ‘institutions and organizations […] as well as organized or unorganized groups of individuals’ who all ‘share the same set of meanings, attached to a specific artifact’ (Pinch & Bijker, 1987, p. 30). These groups are the different constituencies whose experiences and needs interact to shape how a technology evolves. While Pinch and Bijker warned readers that ‘more research [was] needed to develop operationalizations of the notion of “relevant social group”’ (1987, p. 50), their treatment of this concept has been a target of criticism. Due to power imbalances or rigid social structures, some groups may lack access to the relevant platforms that would give them a voice in how technologies may change and the different purposes they may serve (Klein & Kleinman, 2002). There can be potential issues of ‘unrecognised and missing participants’ (Klein & Kleinman, 2002, p. 32) or ‘individuals sharing common meanings [but who are] unable to unite into a group’ (p. 36), so in research about how technologies can improve or the effects they might have on society, there is often the risk that certain groups or individuals might go unnoticed or be underrepresented.
As we allude to in Sect. 2.1, MT research has at least in part fallen victim to this problem. This is especially the case in relation to MT users who may have casual encounters with the technology but who are not students, commercial users or translators—in other words, the user categories on which MT research has traditionally focused. While we do not solve the issue of identifying MT’s relevant social groups—let alone representing all of them—our intention with this article and its underlying data is to consider users and uses of MT as widely as possible in the hope of documenting any thus far potentially ‘unrecognised’ or ‘missing’ perspectives. This means that apart from our national focus, which provides a useful parameter for controlling the sample, we do not attempt to find users who belong to a preliminarily defined social group. Rather, we cast a wide net over the population to allow typical MT uses to emerge. We outline our sampling method below in Sect. 3 after specifying the survey scope and design.
3.1 Survey scope and design
Following ethical approval, we designed the survey by conducting a series of iterative pilot studies aimed at improving the survey structure and the wording of the questions. We hosted the survey on Typeform.com. The final version had a total of 34 questions. We used logic jumps that skipped over questions that did not apply, so not all participants saw all questions.
The survey started by defining what was regarded as MT:
This survey is about computer programs like Google Translate. These programs convert texts or speech from one language to another automatically. In this study, we call these programs automatic translators, but they are also known as ‘machine translation systems’ or ‘online translators’. In addition to Google Translate, other popular examples of these programs are Bing Microsoft Translator or Babel Fish. You don’t need to have used automatic translators or heard of them before to take part in the study.
After this definition, we adopted the term ‘automatic translators’ throughout the survey. To avoid predetermining the types of MT-mediated communication on which participants could report, we did not distinguish a priori between uses of MT involving written texts and those that also involved speech synthesis, recognition or both (i.e., machine interpreting). The survey itself asked for details of this nature.
Once participants acknowledged the above definition and completed a short demographic section, the survey was split into two possible routes based on a question that asked whether participants had used MT before. Those who had used it were asked for evaluations of their experience and details of their MT use(s). Those who had not used it were asked about any second-hand perceptions of the technology or how they handled situations where they might have needed to communicate across languages.
3.2 Data collection
We distributed the survey through the data collection platform Prolific.co (henceforth, ‘Prolific’). Prolific has a large database of users who are by default asked to register their demographic details on the platform. This online method of data collection by its very nature does not include Internet non-users as potential respondents. This was not a significant concern since most free-of-charge MT systems are available online, so use of MT in most cases presupposes use of the Internet. The number of UK households with Internet access is also high (nine in ten in 2019; Ofcom, 2019, p. 2), so Prolific’s database is unlikely to be unrepresentative of the population in this respect.
The only requirement for participating in the study was being 18 or over and a resident of the UK. We used Prolific’s UK representative samples feature. This means that response rates were controlled to ensure that the resulting sample mirrored the national census data in terms of age, sex and ethnicity. While representativeness in relation to other parameters (e.g., education) was not available, using these three demographic variables is expected to improve the generalisability of the findings. Our sample accuracy was calculated at 99.6%, which reflects the extent to which stratified response targets for each specific demographic (i.e., in terms of age, sex and ethnicity) were met (Prolific Team, 2019).
The study had a target of 1200 responses in total, the maximum allowed by the representative samples feature at the point we collected the data. The responses were collected anonymously. They were linked to participants’ unique Prolific accounts, which are verified according to a series of data quality checks (Bradley, 2018). The validity of responses was further checked with two additional measures. First, participants had a time limit to make their submissions. Based on the pilot studies, we estimated the survey would take between five and seven minutes to complete. Using this estimate, the maximum time for completion was automatically set by Prolific at 36 minutes. This allowed for participants who wanted to give full and considered answers to open questions, while avoiding counting responses from participants who became inactive during the survey. Second, to test participants’ attention, we added a question to the survey that simply instructed them to select the number 5 on a 1–5 scale. Those who were timed out (n = 7) or who failed this attention check (n = 4) did not count as valid responses and were therefore excluded. The survey was published on 8 July 2019 and the target of 1200 valid responses was reached on 30 July 2019. Based on the estimated UK adult population in mid-2019 (52,673,433; ONS, 2020a), at a confidence level of 95%, the full sample provides a margin of error of plus or minus 3%.
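The reported margin of error can be approximated with the standard formula for a proportion. The sketch below is illustrative only and is not taken from the study's published code: it assumes simple random sampling, the most conservative proportion (p = 0.5), a normal-approximation z value of 1.96 for 95% confidence, and a finite population correction. The function name and its parameters are hypothetical. Because the actual sample was stratified by age, sex and ethnicity, the result should be read as a rough check rather than an exact reproduction.

```python
import math


def margin_of_error(n, population=None, z=1.96, p=0.5):
    """Approximate margin of error for a sample proportion.

    Assumptions (not stated in the article): simple random sampling,
    normal approximation, worst-case proportion p = 0.5.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction; negligible when n << population.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


# Figures from the text: 1200 valid responses, 52,673,433 UK adults.
moe = margin_of_error(1200, population=52_673_433)
print(f"Margin of error: {moe:.2%}")
```

Under these assumptions the value comes out slightly below three percentage points, which rounds to the plus or minus 3% reported above. The population size has almost no effect here because the sample is a very small fraction of it.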
Participants’ demographic details were available through Prolific by default. We did not define the structure of this data or its nomenclature, for example in relation to sex and ethnicity categories, which are based on the UK census. We complemented Prolific’s demographic information with questions of our own about participants’ levels of education and their linguistic profile.
A minority of participants were students (11.8%; missing: 0.5%). Their mean age was 44.98 (range 18–86); 51.2% of them were female and 48.8% male. Most of them had selected White as their ethnicity (84.5%). The other ethnicity categories were Asian (7.7%), Black (3.8%), Mixed (2.2%) and Other (1.8%).
Figure 1 provides details of participants’ educational background and employment status. Over 40% of them were in full-time employment, and the majority had an undergraduate degree or higher.
Most participants were native speakers of English (94.3%; missing: 0.7%). When asked if there were non-native languages they spoke at least enough to read a restaurant menu, 60.3% said yes (missing: 0.7%). The non-native language of which most of the sample had some knowledge was French (38.1%). Participants’ linguistic profile is an inherent limitation of this data, since it does not allow us to draw conclusions about how non-native speakers of English use MT. This is at the same time an interesting feature of the sample, however, since it reflects how a population with relatively low uptake of additional languages (Campbell-Cree, 2017) interacts with the technology, a perspective that is largely missing from previous research.
3.4 Qualitative coding
Respondents who had used MT before saw two substantive open-ended questions: “Please say a few words about what would make you prefer automatic translators over professional human translators, if anything” (Question A) and “How would you describe the ideal automatic translator of the future?” (Question B). These questions were intended to probe into participants’ reasons to use MT, and their conceptions of it, in particular vis-à-vis its relationship to translations carried out by human professionals.
We analysed responses to these questions in three stages. First, two members of the team discussed the responses and inductively generated a list of themes that recurred in the answers, which were refined into a set of 13 coding categories. Second, the four authors coded approximately a quarter of the data each and set aside 100 randomly selected survey submissions (200 in total across questions A and B). Lastly, all authors independently coded all the responses that had been set aside. We used the overlapping codes assigned to these responses to calculate inter-coder agreement.
Table 1 presents the coding categories and their defining concepts, which were drawn from the responses themselves. These codes were used for both questions. More than one code could often describe a single response, so we selected up to three codes per response in these cases. If the response could correspond to more than three codes, we retained the three codes that were apparent earlier in the text of the response.
To calculate inter-coder agreement, we disregarded 56 blank responses (by non-MT users who skipped the questions or those who did not answer them) and 4 responses coded and shared by accident between the authors prior to the independent coding. A resulting sample of 140 responses across the two questions was therefore available for agreement checking. The Fuzzy Kappa measure of agreement between two coders, which is fit for many-to-one classifications (Kirilenko & Stepchenkova, 2016), ranged between 0.65 and 0.75. If only the first code provided for each response is considered, Cohen’s Kappa ranged between 0.66 and 0.77. In the final dataset used for analysis, we kept the codes provided by the first author for the responses used to calculate agreement (i.e., where overlapping codes by all authors were available).
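For readers unfamiliar with the statistic, the first-code comparison can be illustrated with a minimal implementation of Cohen's Kappa for two coders. This is a sketch, not the authors' code: the function name is hypothetical, the toy labels are invented for illustration (they borrow category names from the coding scheme but are not survey data), and the Fuzzy Kappa variant used for multi-code responses is not implemented here.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's Kappa for two coders who each assign one code per response."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of responses.")
    n = len(codes_a)
    # Observed agreement: share of responses given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: based on each coder's marginal code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # Both coders used a single identical code throughout.
    return (observed - expected) / (1 - expected)


# Invented toy data: the first code assigned by two coders to eight responses.
coder_1 = ["Quality", "Quality", "Usability", "Cost",
           "Usability", "Quality", "Speed", "Usability"]
coder_2 = ["Quality", "Usability", "Usability", "Cost",
           "Usability", "Quality", "Speed", "Quality"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"Cohen's Kappa: {kappa:.2f}")
```

Kappa discounts the agreement that two coders would reach by chance given how often each uses each code, which is why it is lower than the raw proportion of matching codes.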
We report the qualitative coding results in Sect. 4.2.
Of the full sample of 1200 participants, 911 (75.9%) had used MT before taking part in the study. This question did not distinguish between frequent or infrequent MT use. In subsequent questions, participants were often asked to consider their overall experience. Those who answered ‘no’ to the question on previous MT use were asked whether they had heard about the technology at all, if they had been in a situation where they had to read, write or communicate something in a language they did not know and, if so, how they handled that situation. While we do not analyse these answers here in detail, we note that of the 289 participants who answered ‘no’ to the MT use question, eight later mentioned using Google Translate and a further one mentioned using the Facebook translation system. These participants had therefore used MT even though they are not included in the counts provided below. Although we explicitly mentioned Google Translate as an example in our initial survey scope definition, misinterpretations of this nature are difficult to eliminate. Their incidence was in any case small.
4.1 Multiple-choice questions
This section summarises answers to multiple-choice questions of the survey. Unless otherwise specified, the base for all figures below is 911, the number of those who confirmed previous MT use.
Regarding use contexts, of which participants could select as many as applicable out of ‘Leisure’, ‘Work’, ‘Study’ or the free-text option ‘Other’, the results indicate that MT was most used for leisure (80.1%), followed by work (27.8%) and study (22.8%) (missing: 1.4%). This is unlike the survey by Nurminen and Papula (2018) discussed above, and is consistent with the small number of students in the present sample. The most common use location was the UK (91.3%) though some participants had also or instead used MT abroad (32.4%) (missing: 0.5%). The most common devices for using MT were desktop or laptop computers (79.1%), followed by mobile phones (60.9%). Tablets (20%), smart speakers/home devices (1.5%) and smartwatches (0.5%) were selected less often (other: 0.1%; missing: 0.4%). Google Translate was by far the most common system, selected by 95.2% of users.
Table 2 presents more details of the situations in which users called on MT systems. These situations were intended to capture the specific nature of what participants were attempting to do, unlike what we call ‘contexts’ above, which are broader. We highlight three points about these results. First, they confirm assimilation of written information online as the most common MT use situation (68.1%). Second, uses of MT involving spoken communication were selected relatively infrequently. Communicating by typing messages in the same physical space (12.2%) was, for instance, more common than using MT when speaking out loud either in the same physical space (8%) or online (4.2%). Lastly, a minority of users had resorted to MT in situations considered to be serious, for example while in hospital or at a police station (2.1%). High-stakes MT use was therefore relatively rare.
In Table 3, we present participants’ levels of satisfaction with MT and their perceptions of MT quality.
Table 3 shows that a substantial majority of participants were either satisfied (62.8%) or very satisfied (30.1%) with the systems they had used, and most rated the technology as accurate (67.7%). These results are somewhat inconsistent with findings reported by Robertson et al. (2021), whose participants tended to be critical of MT quality. The use contexts and situations presented above are most likely a factor in these assessments. That is, as a tool used mostly in informal contexts unrelated to study or work, MT was considered effective. Furthermore, in response to a question about their motivations for using it, most participants chose the option “Because it served my purpose well and I wanted to use it” (76.6%) rather than “For lack of a better alternative” (22.8%) (missing: 0.5%). This suggests that, in the above contexts, MT is not normally a make-do solution but rather a technology of choice.
The survey also probed into users’ assessment of risk. Online MT use may pose a wide range of risks linked to translation quality or simply to the act of resorting to the technology in the first place, for example in relation to information security and confidentiality (Canfora & Ottmann, 2020). Most survey respondents felt that using MT did not represent a risk (76.8%), however. The type and level of risk was deliberately unspecified at this point in the survey, so these answers reflect participants’ own judgment. Those who did think there were risks involved (n = 211, the base for the following percentages) rated the risk level more often as either low (25.6%) or very low (4.7%) than as high (16.1%) or very high (1.9%). The majority chose the medium risk category (51.7%). In addition, a risk to personal reputation was mentioned most often as the nature of the risk (40.8%) followed by a risk to their studies (32.7%) and by more specific possibilities specified under ‘Other’ (25.6%). Professional (19%), financial (8.5%), medical (5.2%) and legal (4.7%) risks were less common. On the one hand, participants’ low level of concern about MT’s potential risks again reflects the ways in which they reported using it—i.e., for leisure or indeed ‘out of curiosity’ (40.5%—see Table 2). On the other hand, their assessments of MT and of its risks may speak to important aspects of their understanding or perception of some of the technology’s key use implications, which we return to below.
4.2 Open-ended questions
Figure 2 shows the extent to which we used each code to classify responses to the survey’s open-ended questions: (A) what, if anything, would make users prefer MT to human translators and (B) how they would describe the ideal MT systems of the future. We realised after the fact that the prompt for Question A (unintentionally) allows for responses from two perspectives: participants could refer to actual aspects of current use contexts that might make them favour MT, or potential factors (e.g., technological improvements) that would, in a hypothetical future, make them prefer the technology. Both possibilities serve our purpose of examining participants’ own description of what might play a role in their decision to use MT and in their conception of differences between human and machine translation.
Figure 2 shows that Usability was the most frequent code used in the analysis of Question A whereas Quality was the most frequent one for Question B. In responses to Question A, aspects of Speed and Cost were often interconnected with Usability. In responses to Question B, the Quality code dominates more markedly. We examine these results below by concentrating on three common threads of the comments: conceptions of human and machine translation (Sect. 4.2.1), MT evaluations (Sect. 4.2.2), and human-MT interaction (Sect. 4.2.3). These themes were linked to several codes used to analyse the responses, though they particularly concern the codes Usability, Cost, Quality, Procedure, Human v Machine and Use contexts.
4.2.1 Conceptions of human and machine translation
In responses to Question A, convenience was unsurprisingly an important element of Usability and therefore a key driver of MT use. Descriptions of how or why MT was convenient often involved portability and ease of access: ‘They are easier to use and are on me at all times’; ‘[…] can take a tablet with you’. One participant mentioned using MT: ‘Simply because it’s there […] I wouldn’t have paid to find out the information.’ Indeed, although Question A explicitly mentioned professional human translators, some respondents seemed to equate this with ‘any nearby human’. In the words of one respondent, using MT was ‘[…] easier than finding someone who is bilingual’. Other participants mentioned: ‘[…] There isn’t always a native speaker of the language you need translated available at that moment in time […]’; ‘[…] People here are busy don’t want to disturb’. There was also uncertainty about whether human translators would expect to be paid: ‘[…] It is […] more likely than not that a human translator would want to be paid, whereas automatic translators are generally free to use’. While in translation studies the meaning of ‘professional human translators’ is narrowly defined and in most cases unambiguous, human translation is characterised in these comments as any human assistance rather than as a professional service. This characterisation is largely coloured by the informal contexts in which MT was used. In many cases, these contexts would not have warranted professional intervention in the first place so, in participants’ understanding, MT had little overlap with professional language services.
Some responses suggested hesitancy about any human intervention at all. Embarrassment was mentioned as a reason for this: ‘Less embarrassment for things that may seem simple […]’; ‘You don’t need to worry about the situation you are asking about - it’s not judgmental’. There were also participants who pointed to the satisfaction of being able to take control of the situation themselves: ‘I prefer the sense of doing something myself rather than relying on others […]’. Others explicitly regarded MT as a more private solution than speaking to a human: ‘Privacy is retained with an automatic translator […]’; ‘It is easy to use and gives you privacy’. Given the poor record of big technology companies on privacy issues (Esteve, 2017), these comments reflect potentially problematic assumptions. At the same time, they underline the status of MT as a personal technology that can be used without involving others for purposes that are not only informal, but may also be idiosyncratic, if not embarrassing. Indeed, the benefits of the technology in these cases were associated precisely with the fact that it did not require relationship-building or human involvement.
4.2.2 Evaluations of MT
Participants’ perceptions of human and machine translations are also influenced by their understanding of concepts such as grammatical correctness, fitness for purpose and translation quality. In Quality responses to Question A, higher quality was often mentioned as a pre-condition for participants to favour the technology: ‘If you could guarantee accuracy’; ‘For it to be completely accurate’; ‘If I knew that it would be 100% accurate’. These responses were closely connected to those provided to Question B, where higher translation quality was the main request for the future. Specifically, some responses to Question B asked for more idiomatic and more culturally appropriate translations: ‘More sensitive to idioms and cultural quirks’; ‘[…] Able to translate nuances and to recognise more idiomatic phrases’. Others asked for ‘perfect’ MT output: ‘Flawless’; ‘One that gives perfect translations’. Many framed accuracy in terms of grammatical correctness: ‘One that can accurately translate grammar […]’; ‘completely grammatically accurate’; ‘take grammar in consideration’.
On first impression, these results seem contradictory. Most participants were satisfied with MT technology and used it out of choice. When given the chance, however, most also wanted MT to offer better quality. Translation quality is a complex concept that involves several factors and sub-concepts. In MT research, target-language stylistic and grammatical acceptability are often treated as matters of ‘fluency’ whereas semantic precision is treated as a matter of ‘adequacy’ or ‘accuracy’ (Way, 2018, p. 164). Fluency and adequacy are in turn underpinned by fitness for purpose, which is an important concept in MT evaluation (Bowker, 2020b) though also more broadly in functional theories of translation that consider quality to be strictly purpose-dependent (Nord, 2014). The apparent contradiction in participants’ responses stems most likely from the complexity of these factors. This is particularly the case in relation to fitness for purpose. When participants rated MT accuracy (here understood holistically) and their satisfaction with the technology, they did so based on their experience, which often involved low-stakes and therefore relatively undemanding purposes. In their open-ended responses, by contrast, they were asked to imagine an idealised scenario. Here the understanding of quality is absolute rather than relative. Participants tended to focus on grammatical correctness and unqualified perfection.
These idealised, and indeed unrealistic, quality descriptions underscore the importance of promoting MT literacy. They also expose inherent limitations of MT as a general communication technology. In most cases, users are unlikely to know both source and target languages well enough to evaluate the translation themselves. Certain types of errors, especially those involving meaning, will therefore usually be beyond users’ ability to detect, so it is unsurprising that their focus may be on matters of fluency. Robertson et al. (2021) suggest technological solutions to this problem by leveraging quality estimation (i.e., automatic predictions of MT quality; Specia et al., 2018) and more system interactivity to guide users in making decisions. Even as technological safeguards improve, however, they will still be prone to error. In general communication, MT’s measure of success lies more in knowing when to use it than in attempting to directly evaluate its output.
Some participants were aware of the importance of considering MT’s use purpose, which was particularly clear in Use contexts responses provided to Question A:
I can use them when it is not important […].
[…] as I’m only being nosy as to what some of my Facebook friends have posted on the site, I only really need a rough translation to use and enjoy their posts
[…] If I needed a translation for legal, business or study purposes I would employ a professional human translator.
As reviewed by previous research (Vieira et al., 2020), however, there are cases where MT is used in high-stakes contexts, and the present data also provides examples of this:
In my job as a border force officer it [MT] is useful as it is quick to access.
[…] I had a Polish patient who did not speak English. There was no one else on the ward who spoke Polish. […] I had to ask her permission to check her [b]lood pressure. I had to explain the medication [I] was giving her. I had to ask about her pain levels. I had to ask if she needed help to mobilise to the toilet or help into bed. Without google translate on my phone [I] wouldn’t have accurately given the information or got the information from her.
I use them to explain patients rights who are detained under the mental health act. It is important they have accurate information and an interpreter told me once that the written inform[a]tion I had been given by google translate was almost unintelligible.
Alarmingly, while the last participant provided this comment as an example of the risks posed by MT, the first two answered ‘no’ to the question on whether there were potential risks involved.
Although cases of high-risk use of the technology were reported rarely, these cases are significant for multiple reasons. First, their circumstances are common, and MT is often at hand. As the use rate of the technology increases—for example, with increased portability and wider availability across languages—the probability of MT misuse also increases. Second, in any single high-stakes case where MT’s fitness for purpose is misjudged, the consequences can be significant. Although so far high-risk uses of the technology are infrequent, they are not negligible.
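The compounding effect described above can be illustrated with a simple calculation. As a sketch only, assume that each use of MT independently has a small probability p of being a high-stakes use in which fitness for purpose is misjudged. Both p and the independence assumption are introduced here purely for illustration and are not estimated from the survey data. The probability of at least one such case across n uses is then:

```latex
P(\text{at least one misjudged high-stakes use}) = 1 - (1 - p)^{n}
```

For any fixed p > 0, this quantity approaches 1 as n grows, which is consistent with the point that rare cases cannot be treated as negligible once use becomes widespread.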
4.2.3 Human-MT interaction
The second most frequent code used in the analysis of open-ended question B was Procedure. Multimodal methods of interacting with MT systems involving speech and sometimes images were a salient theme in these responses. The MT system of the future was described as:
One that can listen to a conversation and translate it
[Y]ou speak into it and it can conversate to others in another language for you
Speaking the words and having both verbal and visual translation
Live written/verbal translation
Many of these requests involved speech recognition and synthesis functionalities that are provided by online MT systems already. The fact that speech did not feature prominently in participants’ responses on how they used the technology (see Sect. 4.1) but did in their future development wishes may indicate that users are not widely aware of these existing implementations or that their level of quality is not yet satisfactory.
Participants’ vision for the future of MT was nevertheless clearly influenced by the use of speech, and by science fiction: ‘Voice in voice out. Like the real bab[el]fish from Hitchhikers Guide to the Galaxy’. There were requests for more seamless MT functionality, which sometimes meant not having to activate or prompt the technology into use at all: ‘One that automatically translates one foreign language into another without needing to instruct a program’; ‘without need to start “program” automatically recognises the need for translation […]’. These imaginary developments bring MT even further into daily life. Here MT is envisaged as a ubiquitous extension of human ability, which is liable to involve more informal uses and more spoken language.
The responses involved considerable overlap between text and speech: ‘one you could speak to and get automatic voice and text response’. Some users also expressed a wish to pronounce the output themselves: ‘it [ideal MT systems of the future] would have an option to break down words showing you how to pronounce them’. Integration of text, sound and images is one of the elements that characterises new media (van Dijk, 2012, p. 8). To date, the role of online technologies in the relationship between written and spoken language has been examined mostly monolingually, however (Sindoni, 2013). When different languages are involved, they tend to be studied in isolation (e.g. Pérez-Sabater et al., 2008). Text-speech integration in cross-linguistic, casual communication is therefore a relatively recent phenomenon both in practice and in terms of conceptual understanding. Although translation researchers distinguish between translation (text) and interpreting (speech), we argue that translation technologies are making this boundary more porous.
From a practical perspective, it is also noteworthy that although speech functionalities are widely offered by online MT systems, language coverage is still limited. Notably, dialects with particularly distinctive spoken forms or that are more commonly spoken such as Swiss German, Cantonese or Levantine Arabic are not consistently mentioned as separate language options on the web versions of MT tools. While Microsoft Translator supports Cantonese and Levantine Arabic (Microsoft, 2021), at the time of writing these are not separate entries on Google Translate’s list of languages for web (Google, n.d.). Some of these languages are available for mobile apps, but even then they are not consistently mentioned on the list of languages in the apps’ description (e.g., Apple, 2021a; b), so it is not clear to potential users if speech in these languages would be recognised and, if so, to what level of quality. Despite recent improvements in speech technologies, as the naming of specific tools and of the technology itself suggests, MT is still dominated by the conventions and expectations of written communication. In technological terms, this is unsurprising. Even if MT is used with speech recognition and synthesis, the core of the technology in most cases still works with written input and output (see footnote 6). Our research reveals that the focus on written texts runs counter to participants’ vision, however. It is also only partially consistent with the contexts in which the technology is used, which are informal and thereby potentially conducive to spoken communication even if that is not currently common. We therefore highlight the interface between text and speech as a clear area of interest for future MT use research.
5 Discussion and conclusion
In relation to the first finding, participants’ satisfaction with, or sometimes preference for, MT was heavily motivated by the often inconsequential contexts in which they used MT tools. MT’s widespread availability allows language translation to play a role in areas of life where professional human intervention would have otherwise been unwarranted. MT therefore expands the range of contexts in which individuals use and interact with translations.
This expansion can also push MT into contexts where it carries more risk, however. Although we are unable to comment on whether these results can be extrapolated to countries other than the UK, responses to our survey included rare but noteworthy cases where participants resorted to MT in situations that had more overlap with professional translation and where the ethics of MT use come into play more prominently. When used by healthcare workers or immigration officers, the speed and convenience of MT may be valuable but need to be judged against the risks of translation errors, not to mention potential risks to privacy and data protection. While our survey suggests MT use in these contexts is currently not frequent, this is likely to become more common if we consider respondents’ desire for the ‘perfect’ translating machine, which is ironically reminiscent of the abandoned goal of early MT initiatives widely known as FAHQMT (fully automatic, high-quality MT) (Gottschalk & Thompson, 1959). Working out the limits of what MT can do, and when it should and should not be used, is therefore one of the great challenges this technology poses to users. Raising awareness of these limits and of the importance of MT literacy is in turn a great challenge for society. We call for robust ethical frameworks that consider not only the risks of the technology in high-stakes contexts, but also the wider set of factors that may drive its misuse, including public sector funding pressures and low public recognition of translation as a profession.
As for the third finding, the increasingly blurred line between text and speech in MT use represents possibilities for multilingual communication that are different from the practices which translation and interpreting have most often examined. MT users may want to type the source content and synthesise the output in speech. When translating into a less well-known language, they may also want the tool to help them vocalise the output themselves. Human-MT interaction can vary, therefore, in relation to the preferred mode of communication (i.e., written or spoken) and the extent of human performance in the communicative act. Input and output modes can both change between text and speech as well as between human and machine. Moreover, this variation is dynamic—mode switching can occur mid-conversation and affect just the input or output.
These relatively new affordances of MT technology have at least two implications for future research. First, they call for more dynamic conceptions of translation and interpreting that link rather than segregate these two fields as strictly written or spoken, respectively. In practice, this may take the shape of empirical studies that compare the efficacy and ethical implications of uses of text and speech—or combinations thereof—in different MT use contexts. There is also room for more robust conceptualisations of MT-mediated communication that draw on theories of both translation and interpreting to cover the wide set of circumstances in which MT may play a social role.
Second, fluid uses of text and speech call for risk assessment procedures to consider factors that are less relevant for MT research focused strictly on written content. Regarding the risks posed by MT as a professional tool to be used by translators, for instance, the content’s life span and size of its readership have traditionally been key factors to consider in deciding whether MT use is appropriate (Nitzke et al., 2019, p. 246). As MT moves further into personal life, human-MT interaction may involve just two interlocutors, but the communication could still be of consequence. In such cases, content life span and exposure are less effective as parameters for assessing risk, which underlines the complex ethics of MT as an everyday technology. Although previous research has looked at a wide range of ways in which language and cross-cultural communication are essential factors of life in society—for example in relation to migration (Polezzi, 2012), community interpreting (Hale, 2007) or user-generated translation of multimedia content (O’Hagan, 2009)—these fields have not so far provided a systematic characterisation of the type of translation/interpreting that takes place when any individual with access to the internet calls on MT technology. Practical directions for future research in this respect may involve efforts to identify more comprehensive taxonomies of MT use and to reformulate previous approaches to MT risk with a view to updating policy or public service guidelines. In any future work in this area, we argue that everyday use of free-of-charge, online MT systems should feature more prominently in the translation and interpreting ecosystem. Understanding this type of MT use is an important component of understanding the transformative role of MT in everyday life, where technology is being constantly co-constructed by users.
The dataset generated and analysed during the current study is available at the University of Bristol data repository at https://doi.org/10.5523/bris.qu7ahpvuyr0j2ioh74rjhx2a0.
Informal discussions held during the study’s planning phases with members of the university community suggested ‘machine translation’ is not a term widely recognised by everyday users, so we avoided it in the wording of the questions.
We also asked participants to describe their occupation, but we decided not to publish this question in our dataset to prevent data deanonymization. For the same reason we have replaced exact ages with ranges in the dataset and we also exclude sex and ethnicity details from the open data. Summary statistics of these details are provided in Sect. 3.3.
Student status and employment status are two separate variables in the Prolific data, so those with a value of ‘yes’ for the student variable (11.8%) will also have selected one of the employment categories shown in Fig. 1 (e.g., ‘part-time’, ‘other’ or ‘unemployed’). As mentioned, population representativeness cannot be guaranteed based on these parameters. In relation to education, for instance, national data from 2019 suggests that the present sample has higher levels of qualification on average compared to the UK population (ONS, 2020b).
Although Question A might be considered to lead participants, we do not see this as a reason for concern since the focus of the question is indeed on potential reasons to prefer MT.
On occasion we slightly edited the responses to correct typos. Quotes between separate quotation marks or divided by a blank line were provided by different respondents.
Given the lack of standard written forms for sign languages, one area where this does not work in the same way is text-to-sign translation (see Wolfe, 2021).
Apple. (2021a). App store preview—Google Translate. Retrieved December 7, 2021, from https://apps.apple.com/gb/app/google-translate/id414706506.
Apple. (2021b). App store preview—Microsoft Translator. Retrieved February 1, 2022, from https://apps.apple.com/gb/app/microsoft-translator/id1018949559.
Asscher, O., & Glikson, E. (2021). Human evaluations of machine translation in an ethically charged situation. New Media & Society. https://doi.org/10.1177/14614448211018833
Bowker, L. (2020a). Machine translation literacy instruction for international business students and business English instructors. Journal of Business & Finance Librarianship, 25(1–2), 25–43. https://doi.org/10.1080/08963568.2020.1794739
Bowker, L. (2020b). Fit-for-purpose translation. In M. O’Hagan (Ed.), The Routledge handbook of translation technology (pp. 453–468). Routledge.
Bowker, L., & Buitrago Ciro, J. (2019). Machine translation and global research: Towards improved machine translation literacy in the scholarly community. Emerald Publishing Limited.
Bradley, P. (2018). Bots and data quality on crowdsourcing platforms. Prolific Blog. https://blog.prolific.co/bots-and-data-quality-on-crowdsourcing-platforms/
Campbell-Cree, A. (2017). Which foreign languages will be most important for the UK post-Brexit? British Council. Retrieved August 6, 2021, from https://www.britishcouncil.org/research-policy-insight/insight-articles/which-foreign-language.
Canfora, C., & Ottmann, A. (2020). Risks in neural machine translation. Translation Spaces, 9(1), 58–77. https://doi.org/10.1075/ts.00021.can
Danet, B., & Herring, S. C. (2017). Introduction: The multilingual Internet. Journal of Computer-Mediated Communication, 9(1), JCMC9110. https://doi.org/10.1111/j.1083-6101.2003.tb00354.x
Esteve, A. (2017). The business of personal data: Google, Facebook, and privacy issues in the EU and the USA. International Data Privacy Law, 7(1), 36–47.
Gaspari, F. (2007). The role of online MT in webpage translation [PhD Thesis, University of Manchester]. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.840&rep=rep1&type=pdf
Google. (n.d.). Google Translate: About. Google. Retrieved July 19, 2021, from https://translate.google.com/intl/en/about/.
Gottschalk, C. M., & Thompson, M. S. (1959). Report on the state of machine translation in the United States. Yehoshua Bar Hillel. Technical Report No. 1. Prepared for the U.S. Office of Naval Research, Information Systems Branch, Jerusalem, Israel, 1959 (available as PB151746 from Office of Technical Services, U.S. Dept. of Commerce, Washington 25, D.C.). 48 pp. + appendixes. $2.25. Science, 130(3383), 1185–1185. https://doi.org/10.1126/science.130.3383.1185-b
Hale, S. (2007). Community interpreting. Palgrave Macmillan.
Hancock, J. T., Naaman, M., & Levy, K. (2020). AI-mediated communication: Definition, research agenda, and ethical considerations. Journal of Computer-Mediated Communication, 25(1), 89–100. https://doi.org/10.1093/jcmc/zmz022
Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., Hughes, M., & Dean, J. (2017). Google’s multilingual neural machine translation system: Enabling zero-shot translation. arXiv:1611.04558.
Kirilenko, A. P., & Stepchenkova, S. (2016). Inter-coder agreement in one-to-many classification: Fuzzy Kappa. PLoS ONE, 11(3), e0149787. https://doi.org/10.1371/journal.pone.0149787.
Klein, H. K., & Kleinman, D. L. (2002). The social construction of technology: Structural considerations. Science, Technology, & Human Values, 27(1), 28–52. http://www.jstor.org/stable/690274
Lawson, V., & Vasconcellos, M. (1994). Forty ways to skin a cat: Users report on machine translation. Aslib Proceedings, 46(3), 83–87. https://doi.org/10.1108/eb051348.
Microsoft. (2021). Microsoft Translator. Retrieved July 20, 2021, from https://www.microsoft.com/en-us/translator/languages/.
Nitzke, J., Hansen-Schirra, S., & Canfora, C. (2019). Risk management and post-editing competence. The Journal of Specialised Translation, 31, 239–259.
Nord, C. (2014). Translating as a purposeful activity: Functionalist approaches explained. Routledge.
Nurminen, M., & Papula, N. (2018). Gist MT users: A snapshot of the use and users of one online MT tool. In J. A. Pérez-Ortiz, F. Sánchez-Martínez, M. Esplà-Gomis, M. Popović, C. Rico, A. Martins, J. Van den Bogaert, & M. L. Forcada (Eds.), Proceedings of the 21st annual conference of the European Association for machine translation, Alicante, Spain, May 2018 (pp. 199–208). European Association for Machine Translation. http://eamt2018.dlsi.ua.es/proceedings-eamt2018.pdf
Ofcom. (2019). Online nation. 2019 report. The office of communications. Retrieved from https://www.ofcom.org.uk/__data/assets/pdf_file/0024/149253/online-nation-summary.pdf.
O’Hagan, M. (2009). Evolution of user-generated translation: Fansubs, translation hacking and crowdsourcing. The Journal of Internationalization and Localization, 1(1), 94–121.
Olohan, M. (2011). Translators and translation technology: The dance of agency. Translation Studies, 4(3), 342–357. https://doi.org/10.1080/14781700.2011.589656
Olohan, M. (2017). Technology, translation and society: A constructivist, critical theory approach. Target, 29(2), 264–283.
ONS (2020a). Estimates of the population for the UK, England and Wales, Scotland and Northern Ireland (MYE2—Persons, mid-2019: April 2020 local authority district codes edition). Office for National Statistics. Retrieved February 1, 2022, from https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatesforukenglandandwalesscotlandandnorthernireland.
ONS (2020b). Education and training statistics for the UK. Office for National Statistics. Retrieved February 1, 2022, from https://explore-education-statistics.service.gov.uk/find-statistics/education-and-training-statistics-for-the-uk/2020.
Oudshoorn, N., & Pinch, T. (2003). Introduction. In N. Oudshoorn & T. Pinch (Eds.), How users matter: The co-construction of users and technology. The MIT Press. https://doi.org/10.7551/mitpress/3592.003.0002.
Pérez-Sabater, C., Peña-Martínez, G., Turney, E., & Montero-Fleta, B. (2008). A spoken genre gets written: Online football commentaries in English, French, and Spanish. Written Communication, 25(2), 235–261. https://doi.org/10.1177/0741088307313174
Pickering, A. (2010). Material culture and the dance of agency. In D. Hicks & M. C. Beaudry (Eds.), The Oxford handbook of material culture studies (pp. 191–208). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199218714.013.0007.
Pinch, T. J., & Bijker, W. E. (1987). The social construction of facts and artifacts: Or how the sociology of science and the sociology of technology might benefit each other. In W. E. Bijker, T. P. Hughes, & T. J. Pinch (Eds.), The social construction of technological systems: New directions in the sociology and history of technology. MIT Press.
Pitman, J. (2021). Google Translate: One billion installs, one billion stories. The Keyword. https://blog.google/products/translate/one-billion-installs/.
Pituxcoosuvarn, M., & Ishida, T. (2018). Multilingual communication via best-balanced machine translation. New Generation Computing, 36(4), 349–364. https://doi.org/10.1007/s00354-018-0041-7
Polezzi, L. (2012). Translation and migration. Translation Studies, 5(3), 345–356.
Prolific Team. (2019). Representative samples FAQ. Prolific.co. Retrieved July 2, 2021, from https://researcher-help.prolific.co/hc/en-gb/articles/360019238413-Representative-Samples-FAQ.
Ramati, I., & Pinchevski, A. (2018). Uniform multilingualism: A media genealogy of Google Translate. New Media & Society, 20(7), 2550–2565. https://doi.org/10.1177/1461444817726951
Robertson, S., Deng, W. H., Gebru, T., Mitchell, M., Liebling, D. J., Lahav, M., Heller, K., Díaz, M., Bengio, S., & Salehi, N. (2021). Three directions for the design of human-centered machine translation. Google Research. https://storage.googleapis.com/pub-tools-public-publicationdata/pdf/1bb66ff36a5eb4650a76a3d05ea57e09c0203366.pdf
Sindoni, M. G. (2013). Spoken and written discourse in online interactions: A multimodal approach. Routledge.
Specia, L., Blain, F., Logacheva, V., Astudillo, R., & Martins, A. (2018). Findings of the WMT 2018 Shared Task on Quality Estimation. In Proceedings of the Third Conference on Machine Translation (WMT) (Vol. 2, pp. 689–709). Association for Computational Linguistics. http://aclweb.org/anthology/W18-6451
Tinsley, J., & Shterionov, D. (2019). Foreword by the user track program chairs. In B. Haddow, R. Sennrich, J. Tinsley, D. Shterionov, C. Rico, F. Gaspari, & M. L. Forcada (Eds.), Proceedings of machine translation summit XVII, Volume 2: Translator, project and user tracks (pp. n.p.). European Association for Machine Translation.
van Dijk, J. (2012). The network society (3rd ed.). Sage.
Vieira, L. N., O’Hagan, M., & O’Sullivan, C. (2020). Understanding the societal impacts of machine translation: A critical review of the literature on medical and legal use cases. Information, Communication & Society, 24(11), 1515–1532. https://doi.org/10.1080/1369118X.2020.1776370
Way, A. (2013). Traditional and emerging use-cases for machine translation. Paper presented at Translating and the Computer 35. https://www.computing.dcu.ie/~away/PUBS/2013/Way_ASLIB_2013.pdf
Way, A. (2018). Quality expectations of machine translation. In J. Moorkens, S. Castilho, F. Gaspari, & S. Doherty (Eds.), Translation quality assessment (pp. 159–178). Springer.
Wolfe, R. (2021). Special issue: Sign language translation and avatar technology. Machine Translation, 35(3), 301–304. https://doi.org/10.1007/s10590-021-09270-4
Yang, J., & Lange, E. (2003). Going live on the internet. In H. Somers (Ed.), Computers and Translation: A Translator’s Guide (pp. 191–210). John Benjamins.
This work was funded internally by the Faculty of Arts at the University of Bristol. The authors have no relevant financial or non-financial interests to disclose.
Cite this article
Vieira, L.N., O’Sullivan, C., Zhang, X. et al. Machine translation in society: insights from UK users. Lang Resources & Evaluation 57, 893–914 (2023). https://doi.org/10.1007/s10579-022-09589-1