Fairness and Social Justice in English Language Assessment

  • Bart Deygers
Reference work entry
Part of the Springer International Handbooks of Education book series (SIHE)


This chapter offers a critical historical overview of the research into and conceptions of fairness and justice in the language assessment literature. The focus is primarily on high-stakes assessment practices, since this is the area in which most of the relevant research is conducted. In order to clarify the meaning and origin of the current conceptions of fairness and justice, this chapter opens by going back to the original works in moral and political philosophy. Afterward, fairness, justice, and their relationship to validity are discussed.


Fairness; Justice; Social justice; Validity; High-stakes language testing; Language assessment



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. FWO Vlaanderen and Centrum voor Taal and Onderwijs, KU Leuven, Leuven, Belgium
