Introduction

Translation is a multifaceted, complex process involving various cognitive, social, crosslinguistic, and cross-cultural variables (Al-Qinai, 2000; Angelelli, 2009; House, 2015; Korol, 2020). Zehnalová (2013, p. 43) stated that ‘translation is a complex form of communication, engaging not only the subjects of the text producer and recipient but also the subject of the communication mediator—the translator’. The demand for highly qualified translators has increased owing to the need for multilingual communication (Williams, 2004; Xie and Wu, 2022; Zehnalová, 2013). Thus, translation education is widely practised and researched (Campbell and Hale, 2003; House, 2015; Manipuspika, 2021; Salamah, 2021). The development of translation education and training programmes that enable graduates to acquire standard translation competencies has therefore become indispensable (Manipuspika, 2021).

However, translation education cannot be successfully improved without enhancing Translation Quality Assessment (TQA) methodologies and practices (Williams, 2004). TQA is an essential component of any translation education and training programme. It is pivotal to the articulation of a sound translation theory and definition, and crucial to the development of effective translation curricula and teaching methods (Amini, 2018; Hönig, 1997; House, 2015; Williams, 2004). The increased demand for the services of highly qualified translators and interpreters has created the need for the meticulous examination of translation education and training (Korol, 2020; Salamah, 2021; Williams, 2004), including the actual TQA practices used by translation instructors. As Hönig (1997, p. 15) notes, TQA is a ‘central issue in university training courses. The way it is taught and conducted influences all aspects of the practice and theory of translation’.

Previous studies have reported mixed results for different TQA methods, specifically regarding their degree of reliability and efficiency, as well as their merits and demerits (Al-Qinai, 2000; Amini, 2018; Phelan, 2017; Waddington, 2001). To the best of our knowledge, one under-investigated area in TQA research is instructors’ TQA practices in different translation education and training programmes, particularly in Arab countries. Therefore, the present study aimed to document these practices to facilitate an understanding of the state of assessment in translation education classrooms. In doing so, this study contributes to the identification of the theoretical foundations of translation on which instructors base their TQA practices. The findings contribute to improved translation training and more effective curricula and instruction.

Literature review

Assessment plays a central role in any educational context: teachers should possess sufficient knowledge of its different facets to develop their teaching effectively, support their students, respond to their needs, and meet stakeholder expectations (Herrera and Macías, 2015). Assessment literacy is therefore crucial in TQA. The intricacy of translation as a cognitive procedure that occurs ‘in the translator’s head’ and as ‘a social, crosslinguistic, and cross-cultural practice’ (House, 2015, p. 1) affects TQA, rendering it a problematic area for several reasons (Bowker, 2000; Gile, 1995; Kelly, 2005; Orlando, 2011; Williams, 2004; Zehnalová, 2013). A major reason is the vague concept of quality, whose definition has become controversial (Amini, 2018; Campbell and Hale, 2003) because of its ‘very fuzzy and shifting boundaries’ (Bowker, 2000, p. 183). The definition of quality in translation assessment is a long-debated issue in translation studies because ‘different ideals, expectations, and conceptions of quality are at stake’ (Martinez-Mateo, 2016, p. 38). The assessment of translation quality presupposes a theory of translation (House, 2015); consequently, different views of translation yield different concepts of quality. However, as Amini (2018, p. 2) reports, ‘there is no common ground when it comes to defining quality in translation either from a practical or theoretical viewpoint’.

In addition, TQA is highly problematic because of the subjective nature of assessment practices (Bowker, 2000; Martinez-Mateo, 2016). It is a delicate practice ‘performed in a subjective, undisciplined/ad hoc fashion’ (Melis and Albir, 2001, p. 272). TQA research has prioritised the substitution of objectivity for subjectivity in translation assessment (Al-Qinai, 2000; Amini, 2018; Baker, 1992; Hatim and Mason, 1990; Horton, 1998; House, 2015; Wilss, 1982). The drive to de-subjectify translation assessment and embrace objectively oriented assessment methods is central to TQA research. According to Xie and Wu (2022, p. 1), in the absence of a relatively objective and practical TQA model, ‘translation researchers have focused on subjective criticism and error analyses of translations’.

Certain translation studies researchers contend that efforts should be devoted to developing a methodology for translation assessment that enables evaluators to provide objective and constructive feedback to their students (Bowker, 2000; Xie and Wu, 2022). However, Zehnalová (2013, p. 43) asks, pertinently, ‘why should we then expect that a necessarily subjective evaluation of something subjective by its very nature will be perfectly objective?’ Overall, TQA researchers have been dissatisfied with the results of the various methods proposed for reducing the subjectivity of translation assessment (Amini, 2018). For them, subjectivity is inevitable and fundamental to TQA (Hutchins and Somers, 1992).

Thus, critical issues related to TQA, such as the divergent views regarding the definition of quality, the lack of generally accepted and recognised translation competence structures, and how objective the assessment of translation should be, have not yet been resolved. Arango-Keeth and Koby (2003) addressed two other problems associated with translation assessment: the lack of standardised terminology for evaluating translations, and the existence of various assessment procedures resulting from different theoretical approaches to translation.

A major concern in TQA studies is the development of adequate assessment criteria and quality scales, which cannot be achieved without examining the TQA practices used across various programmes. Only a few studies have examined in depth the assessment procedures adopted by translation instructors. Firoozkoohi et al. (2012) asked 100 male and female university students majoring in translation across 20 Iranian universities about the assessment criteria used by their instructors, and found that ‘translation instructors were not unanimous in terms of the criteria used in assessing students’ translations. Furthermore, in most cases, the students were unaware of the criteria used’ (Firoozkoohi et al., 2012, p. 1).

Waddington (2001) administered a questionnaire to Canadian and European translation teachers to examine their assessment strategies. Unexpectedly, 38.5% of the respondents employed a holistic (subjective) method to evaluate student translations, basing their judgement on the requirements expected of professional translators. In addition, 36.5% of the teachers employed a method based on error analysis, and 23% employed a method combining error analysis with holistic evaluation.

The limited number of studies conducted in this area and the mixed results obtained indicate the need for further investigation into TQA practices. Translation educators are expected to be aware of what is being assessed, why they are assessing it, the most effective assessment methods, and how to develop proper assessment practices (Chappuis et al., 2012). The understanding of TQA practices has been hampered by inadequate data on the practices of translation educators. Therefore, the present empirical study intended to fill this gap by focusing on the general assessment (GA) and TQA practices adopted by instructors at different colleges of Languages and Translation in certain Arab countries. It employed a quantitative approach to elucidate current practices and identify the requirements of translation assessment literacy, and proposed a tailored inventory for the investigation of TQA practices based on the Approaches to Classroom Assessment Inventory (ACAI), a two-part survey developed by DeLuca et al. (2016) to examine teachers’ approaches to classroom assessment. To our knowledge, this is the first large-scale investigation of TQA procedures in the Arab context. The study aimed to answer the following research questions:

  1. What are the GA practices adopted by the faculty members of the colleges of Languages and Translation in Arab countries?

  2. What specialised TQA practices are adopted by the faculty members of these colleges?

  3. Is there a correlation between GA and TQA practices?

Methodology

We administered a three-part online self-explanatory survey to obtain targeted information on the GA and TQA practices of faculty members teaching translation at different colleges of Languages and Translation. Part 1 of the survey elicited demographic information on gender, job title, academic field, language field, affiliated university, years of experience, and GA and translation assessment education and training. Parts 2 and 3 comprised closed-ended items rated on a five-point Likert scale (1 = strongly disagree to 5 = strongly agree). Part 2 was a modified version of the ACAI, comprising 25 statements on the practices that the target population believed should be applied in GA, covering assessment purposes, processes, fairness, and theory. Table 1 summarises the ACAI assessment dimensions and sets of priorities.

Table 1 ACAI assessment dimensions and sets of priorities.

Part 3 comprised a TQA Practices Inventory (TQAPI), designed as part of the present study. To the best of our knowledge, this is the first comprehensive TQA inventory that examines different practices employed by translation instructors. The TQAPI comprised 31 TQA-related items on the following general themes: variation by course, rubric, role of errors, focus of assessment, fairness, assessment methods, and objectivity (Table 2).

Table 2 TQAPI themes and items.

The survey items were then piloted with three translation faculty members from three Saudi universities and one translation faculty member from Kuwait to test the items and determine whether the survey required amendment before the main data collection. The final version of the survey was administered online to a large group of faculty members teaching translation at different colleges of Languages and Translation. We employed different channels to invite them to participate, including formal and informal communication through email, Twitter, WhatsApp groups, and personal contact. Additionally, prior to the survey, we explained the research objectives to the participants and obtained their consent to participate.

Participant characteristics

A total of 98 faculty members from 16 universities participated. Table 3 lists the demographic characteristics of the participants, collected in Part 1 of the survey: gender, university affiliation, job title, educational degree, years of experience, field of specialisation, and linguistic specialisation.

Table 3 Demographic characteristics of participants.

The most common field of specialisation was translation/translation studies, with 36 participants (36.7%), followed by applied linguistics (25.5%). Other fields of specialisation included English language and literature, language education, architecture and planning, and business management. The most common linguistic specialisation was English (58.2%), followed by Arabic (17.3%). Some participants had multiple specialisations, such as Arabic, English, Spanish, German, and French.

Data analysis

The relatively small sample size was crucial in determining which statistical tests to use. We employed reliability tests, independent samples t-tests, and one-way analysis of variance (ANOVA) to analyse the results. Cronbach’s alpha was used to confirm the internal consistency and reliability of the instruments, and the Kolmogorov–Smirnov test was used to check the normality of the data distribution. We conducted descriptive analysis (frequencies, percentages, means, and standard deviations) to investigate the GA and TQA practices applied by faculty members of these colleges, as illustrated in the sketch below. Additionally, we employed one-way ANOVA, independent samples t-tests, the least significant difference (LSD) test, and the Chi-square test to compare groups across variables and determine whether any significant relationships or trends existed in the data.
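To make the descriptive step concrete, the following is a minimal sketch in Python, assuming the Likert responses are stored in a pandas DataFrame with one column per item; the file name and the acai_/tqapi_ column prefixes are hypothetical, not taken from the actual study data.

```python
import pandas as pd

# Hypothetical file and column naming, for illustration only.
df = pd.read_csv("survey_responses.csv")
acai_cols = [c for c in df.columns if c.startswith("acai_")]    # 25 GA items
tqapi_cols = [c for c in df.columns if c.startswith("tqapi_")]  # 31 TQA items

# Per-item descriptives: frequencies, percentages, mean, and standard deviation.
for col in acai_cols + tqapi_cols:
    freq = df[col].value_counts().sort_index()   # counts per scale point 1..5
    pct = (freq / len(df) * 100).round(1)        # percentages
    print(f"{col}: M = {df[col].mean():.2f}, SD = {df[col].std(ddof=1):.2f}")
    print(pd.DataFrame({"n": freq, "%": pct}))
```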

Regarding internal consistency and reliability, the correlation coefficients indicated a statistically significant correlation between each item and the overall score of its axis, implying that the tool had a high degree of internal consistency. Table 4 lists the reliability statistics for the two survey inventories (ACAI and TQAPI), represented by Cronbach’s alpha and Cronbach’s alpha based on standardised items. The latter is interpreted like the regular Cronbach’s alpha but is calculated after standardising the item scores, that is, from the average inter-item correlation. Table 4 shows that for the ACAI items measuring GA practices, Cronbach’s alpha was 0.910, whereas the standardised value was 0.915, indicating that the items measured the same construct and were reliable. For the TQAPI items, Cronbach’s alpha was 0.876 and the standardised value was 0.880, likewise indicating that the items measured the same construct and were reliable. Overall, these high values suggest that the survey items for the GA and TQA practices were reliable.

Table 4 Cronbach’s alpha reliability.
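For readers who wish to reproduce the two reliability coefficients in Table 4, the sketch below computes both the raw Cronbach’s alpha and the standardised variant from the item-level DataFrame of the previous sketch; the function names are ours, not from any particular library.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Raw alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

def cronbach_alpha_std(items: pd.DataFrame) -> float:
    """Standardised alpha, computed from the mean inter-item correlation r_bar."""
    corr = items.corr()
    k = corr.shape[0]
    r_bar = (corr.values.sum() - k) / (k * (k - 1))  # mean off-diagonal correlation
    return (k * r_bar) / (1 + (k - 1) * r_bar)

print(cronbach_alpha(df[acai_cols]), cronbach_alpha_std(df[acai_cols]))    # study reports 0.910, 0.915
print(cronbach_alpha(df[tqapi_cols]), cronbach_alpha_std(df[tqapi_cols]))  # study reports 0.876, 0.880
```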

Table 5 lists the Kolmogorov–Smirnov coefficients for testing the data distribution. The Kolmogorov–Smirnov coefficients were not significant at the 0.05 level, indicating that the data followed a normal distribution.

Table 5 Kolmogorov–Smirnov coefficients for testing the data distribution.
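A sketch of the normality check, continuing the same hypothetical data layout. Strictly speaking, estimating the normal parameters from the sample calls for the Lilliefors correction, so the plain one-sample test below should be read as mirroring the commonly reported procedure rather than the exact computation used in this study.

```python
from scipy import stats

def ks_normality(scores):
    """One-sample Kolmogorov-Smirnov test against a normal distribution whose
    mean and SD are estimated from the data; p > 0.05 means normality is not rejected."""
    return stats.kstest(scores, "norm", args=(scores.mean(), scores.std(ddof=1)))

print(ks_normality(df[acai_cols].mean(axis=1)))   # GA scale score
print(ks_normality(df[tqapi_cols].mean(axis=1)))  # TQA scale score
```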

Results

The main objective of this study was to investigate the GA and TQA practices of the faculty members of selected colleges of Languages and Translation in Arab countries. Specifically, the study aimed to identify the common GA practices, evaluation criteria, and guidelines employed by faculty members in TQA. First, we enquired about the participants’ educational background (pre- and in-service) to determine whether differences existed in their practices according to their education and professional development opportunities. In total, 45.9% of the participants had taken a university course in GA and 60.2% had taken a professional development course in GA. This is noteworthy, considering that almost 75% of the participants already held a Ph.D. in linguistics (general and applied) or translation; given such a high level of education, one would expect a correspondingly high degree of assessment literacy.

Regarding TQA-specific preparation, 35.7% of the participants had taken a university course, whereas only 27.6% had taken a training course. This is an important finding that indicates the need for a thorough analysis of what is required of translation instructors. A possible explanation is the variation in the participants’ majors, as reported in Table 3: about 39.8% majored in translation, and 50% in linguistics or applied linguistics, which explains why only 35.7% had studied a university course related to TQA. A further variable that must be considered when investigating TQA practices is that such practices might be influenced by teaching experience rather than educational background. Thus, TQA training is vital for faculty members of the colleges of Languages and Translation.

The first question this study sought to answer was ‘What are the GA practices of faculty members of certain Arab colleges of Languages and Translation?’ The mean scores of the 25 items related to GA practices ranged between 3.80 and 4.54, reflecting the main trend: the majority of the participants valued, and were committed to, sound assessment purposes, processes, fairness, and measurement theory. Table 6 lists the distribution of responses on GA practices. The items representing the assessment process (communication) had the highest mean scores. The statement, ‘I provide timely feedback to students to improve their learning’, had the highest mean (4.54). Assessment purpose (assessment for learning), represented by the statement, ‘I use a variety of formative assessment techniques’, ranked second (4.47), followed by assessment theory (contextual): ‘I link my assessment tasks/questions to learning objectives’ (4.45). Assessment purpose (assessment of learning), represented by the statement, ‘I use various summative assessment types, such as multiple-choice type tests, essays, and performance-based assessments’, came next with a mean of 4.40.

Table 6 Distribution of responses on GA practices.

The items related to the themes of assessment purpose (assessment of learning) and assessment fairness (equitable and standardised) had the lowest mean scores. The statement, ‘My summative assessment (e.g. quizzes) grades provide a meaningful representation of individual student learning as related to curriculum expectations’, representing assessment purpose (assessment of learning), had the lowest mean (3.81). For assessment fairness (standardised), the statement, ‘In my class, all students complete the same assignments, quizzes, and tests’, had the second lowest mean (3.85). Both assessment fairness (equitable) items, ‘I provide adequate resources, time, and accommodation to prepare students with special needs/exceptionalities for assessment’ and ‘I spend adequate time ensuring that my assessments are responsive to and respectful of the cultural and linguistic diversity of students’, had relatively low mean scores of 4.07.

Regarding the second research question, which focused on the TQA practices of the faculty members, Table 7 presents the distribution of responses. The responses were generally high, with mean scores ranging between 2.98 and 4.45. The most endorsed items were related to the role of errors, assessment methods, and objectivity. The role of errors, represented by the item, ‘I classify students’ translation errors based on their types, and I explain the most important solutions to overcome them’, was the most endorsed practice (4.46). Assessment methods, represented by ‘I use the question of translating a full text from the source language (e.g. English, Japanese, or other languages) into Arabic’, was the second most endorsed item (4.33). The statement, ‘Students’ TQA should be objective and based on specific and clear criteria’, in the objectivity category was the third most endorsed item (4.26).

Table 7 Distribution of responses on TQA practices.

The least endorsed items were related to the categories of rubric, subjectivity, and assessment methods. Assessment methods constituted the least endorsed category in the TQA practices: the statement, ‘I use fill-in-the-blank questions in midterm and final translation exams’, had the lowest mean (2.99). The second least endorsed statement was ‘Students’ TQA should be comprehensive and subjective (non-objective)’ in the subjectivity category (3.06).

The two statements, ‘The rubric should be the only and sufficient means for assessing students’ translation quality’ and ‘A standardised rubric should be employed for all translation courses, regardless of whether they are general or specialised’, had mean scores of 3.29 and 3.07, respectively. Thus, the rubric category was the third least endorsed category.

Another major issue explored in this study was the likely correlation between GA and TQA practices among the study population. Table 8 lists the Pearson correlation coefficient and significance level (p value) for the correlation between the two sets of categories. The Pearson correlation coefficient between the GA and TQA practices was 0.555, with a p value below 0.01 (reported as 0.000), indicating a significant positive correlation between the two sets of practices at the 0.01 level (2-tailed).

Table 8 Correlation between GA and TQA practices.
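As a sketch, the correlation can be computed as follows, again under the hypothetical layout above, taking each participant’s GA and TQA scores as the mean of the corresponding inventory items:

```python
from scipy import stats

ga = df[acai_cols].mean(axis=1)    # per-participant GA practice score
tqa = df[tqapi_cols].mean(axis=1)  # per-participant TQA practice score
r, p = stats.pearsonr(ga, tqa)
print(f"r = {r:.3f}, p = {p:.3f}")  # the study reports r = 0.555, p < 0.01
```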

Finally, the independent samples t-test was employed to test the hypothesis that GA and TQA practices vary across gender, linguistic specialisation, and major. Regarding gender, Table 9 shows that the t-value for GA practices was 0.257, which was not significant at the 0.05 level, indicating that gender does not lead to significant differences in GA practices. By contrast, for TQA practices, the t-value of 2.77 was statistically significant at the 0.05 level, indicating that gender affects TQA practices in favour of male faculty members.

Table 9 Independent samples t-test to identify differences in GA and TQA practices due to gender.
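A sketch of the gender comparison; the gender column and its labels are hypothetical. scipy’s ttest_ind defaults to the equal-variance (Student) t-test, matching the classical independent samples procedure reported here.

```python
from scipy import stats

male_tqa = tqa[df["gender"] == "male"]
female_tqa = tqa[df["gender"] == "female"]
t, p = stats.ttest_ind(male_tqa, female_tqa)  # equal-variance Student t-test
print(f"t = {t:.2f}, p = {p:.3f}")            # the study reports t = 2.77, p < 0.05
```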

To investigate the possible effect of job title on GA and TQA practices, we conducted a one-way ANOVA to test the statistical hypothesis that there are no differences in GA and TQA practices due to job title. Table 10 indicates that the F-values for GA and TQA practices (0.564 and 0.108, respectively) were not statistically significant. Thus, job title does not affect GA or TQA practices.

Table 10 One-way ANOVA test to identify differences in GA and TQA practices due to job title.
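The one-way ANOVA comparisons can be sketched as follows (the job_title column is hypothetical); the same pattern applies to the comparisons by major (Table 11) and linguistic specialisation (Table 12).

```python
from scipy import stats

# Split per-participant GA scores into one array per job title, then compare.
groups = [scores.values for _, scores in ga.groupby(df["job_title"])]
f_stat, p = stats.f_oneway(*groups)
print(f"GA by job title: F = {f_stat:.3f}, p = {p:.3f}")
```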

Another variable of interest was the faculty members’ major. The statistical hypothesis assumed no differences in GA and TQA practices due to major. To verify this hypothesis, we conducted a one-way ANOVA (Table 11). The F-value for GA practices was 0.485, which was not significant at the 0.05 level. Therefore, it is safe to state that major had no significant effect on GA practices. The results also revealed that the F-value for TQA practices was 0.445 and not significant, indicating no differences in TQA practices as a result of faculty members’ majors.

Table 11 One-way ANOVA test to identify differences in GA and TQA practices due to major.

Finally, the statistical hypothesis assumes that no differences exist in GA and TQA practices due to linguistic specialisation. A one-way ANOVA was employed to test this hypothesis. Table 12 shows that the F-value for GA practices was 0.263 and not significant at the 0.05 level. Therefore, there are no differences in GA practices due to linguistic specialisation.

Table 12 One-way ANOVA test to identify differences in GA and TQA practices due to linguistic specialisation.

The results also show that the F-value for TQA practices was 0.758, which was not significant at the 0.05 level. Thus, it is safe to state that there are no differences in TQA practices due to linguistic specialisation.

Discussion and conclusion

Translation assessment remains controversial because of the nature of translation as a human product guided by various human and social factors. The study findings showed that the participants highly endorsed GA and TQA practices. The most endorsed item in GA practices related to communicating assessment results as part of the assessment process, whereas the least endorsed items related to assessment fairness (equitable and standardised). The GA findings were consistent with those of Almossa and Alzahrani (2022), who applied the same ACAI version to faculty members teaching various subjects at universities in Saudi Arabia. In both studies, the two most endorsed items related to communication in assessment and contextualised assessment, although the mean scores in the present study were higher, suggesting that the translation instructors endorsed these assessment practices particularly strongly. Assessment fairness was the least endorsed category in both studies, and the findings related to mapping learning assessment onto the curriculum were the lowest in both. The most endorsed items in TQA practices related to the role of errors, objectivity, and assessment methods, such as exams and text translation to and from target languages (the most frequently used assessment methods and question types). The least endorsed items related to the rubric, subjectivity in assessment, and assessment methods (the least used types of assessment questions, e.g. true–false questions and pop quizzes). Notably, Albir and Pavani (2018) called for a multidimensional assessment method, although the traditional summative method of assigning a text for students to translate during exams remains more popular.

An important aspect of the available literature on TQA is the use of rubrics and their pivotal role in translation assessment. This study shows that the participants believed in the importance of rubrics as a reliable tool for assessing their students’ translation quality, though not the only one. In the same context, the participants expressed strong support for translation instructors’ freedom to design and use rubrics that suit the content and objectives of the courses they teach. Such findings indicate how critical it is to design translation instructors’ training based on the actual needs and perceptions of the trainees.

Another controversial issue closely related to TQA practices is the call for wholly objective measures when evaluating students’ translations. Translation teachers, as represented by the study participants, believe that assessment measures should be as objective as possible, with minimal subjectivity. This belief is further supported by their perception of rubrics as an objective, criterion-referenced tool for assessing their students’ translation quality.

Interestingly, apart from the gender difference in TQA practices noted above, no statistical differences were observed in GA and TQA practices across gender, job title, academic field, and linguistic specialisation. The lack of significant differences can be attributed to two factors. First, the participants were a homogenous group with similar educational backgrounds and experience in teaching and translation assessment. Second, the assessment practices across these colleges appear largely unified.

The findings show that the number of participants who had taken a university course or in-service professional development course in GA was higher than the number who had taken a university or professional development course in specialised TQA. This variation between the general and specialised assessment streams arose because not all the participants who taught translation courses had majored in translation at the university level. This finding highlights the importance of training non-translation-major instructors to equip them with the necessary assessment tools. Translation majors might also benefit from pre- and in-service training and professional development in TQA, whether as professional translators or as instructors adopting a pedagogical approach. The reported lack of a sufficient TQA-focused curriculum within translation training programmes is in line with Salamah’s (2021) observation that there is a pedagogical gap in translator training that includes programme or course objectives, curricula, and translation methods, ‘as well as the absence of solid pedagogical and methodological criteria for teaching translation and designing translation courses’ (p. 277). Efforts should be directed towards revising the existing components of translation training programmes and implementing the required modifications for the benefit of professional translation instructors.

The present study is expected to benefit TQA researchers and instructors interested in exploring and developing the pedagogical approach in their courses, especially through the data it provides on the GA and TQA practices of translation faculty. The study provides insight for translation faculty, colleges, and other decision-makers into how translation students are being assessed and what assessment experiences they are exposed to. To our knowledge, this study is the first step towards documenting and reflecting on current assessment practices in translation courses.

The study has several implications for the development of translation instructors’ assessment practices. First, policymakers should identify the professional development requirements of instructors and the methods for meeting them, providing faculty who teach translation courses with access to specialised translation assessment courses and professional development resources, such as books, workshops, seminars, and conferences. In addition, those responsible for developing training courses for translation instructors should identify the actual needs to which training should be directed; constructing sound rubrics and using varied evaluation methods and techniques are among the needs that such training can address.

This study has certain limitations that provide directions for future research. First, the sample size was small because the participants were purposely selected to focus on instructors of translation, which is not a main major in certain universities. Second, the data collected were based on the participants’ self-reports of their assessment practices and beliefs. Third, although the ACAI is an established instrument, the TQAPI was piloted for the first time in this study. While we observed the potential and limitations of the TQAPI items, more work is required to develop the instrument as a self-report tool for researchers and colleagues interested in reflecting on their practices. The TQAPI is also a useful training tool for translator-training courses; we invite researchers to experiment with it and adapt it in ways relevant to various contexts.

This study is an initial step in proposing a TQA instrument; further investigations of translation tests, tasks, and instructor evaluations are needed. We call on researchers to focus on translation assessment methods across assessment courses, in other contexts, and in comparative studies.