Scoring Fairness in Large-Scale High-Stakes English Language Testing: An Examination of the National Matriculation English Test

  • Yi Mei
  • Liying Cheng


Empirical research exploring test fairness in scoring written performance has mostly been conducted in the North American context, and little has been conducted in Asian countries such as China. Considering the extremely high stakes of large-scale testing in this context, this study examines which features of writing, whether intended or unintended to be measured, affected raters’ scoring decisions in the National Matriculation English Test (NMET) in China, and how they did so. The study further explores whether rating behaviours differed between novice and experienced NMET raters. The results highlight the extent to which raters attended to the NMET rating scale. This leads to a deeper understanding of scoring fairness in large-scale high-stakes tests within the Chinese context and has implications for scoring fairness in similar contexts internationally.


China’s National Matriculation English Test; Novice and experienced raters; Rating behaviour; Test fairness



The study was supported by a SEED research grant (Liying Cheng: Principal Investigator) from the Faculty of Education, Queen’s University, Kingston, Ontario, Canada.



Copyright information

© Springer Science+Business Media Singapore 2014

Authors and Affiliations

  1. Faculty of Education, Queen’s University, Kingston, Canada
