Language Testing and Assessment

  • April Ginther
  • Kyle McIntosh


We begin by examining the history of language testing and assessment as parallel to the development of large-scale, high-stakes language proficiency tests (e.g., TOEFL) used primarily for admission into institutions of higher learning. We then discuss core concepts in the field and provide an overview of the most commonly used research methods. Lastly, we address a number of challenges and concerns arising from tensions between those who see the growing emphasis on testing as a way to ensure fairness and accountability and those who believe it results in bias and inequality. Consequential validity, assessment literacy, and world Englishes/English as a lingua franca are discussed in relation to language tests and assessments as used for decision-making purposes in various domains.


Assessment literacy; Language proficiency; Language testing; Reliability; Validity



Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Department of English, Purdue University, West Lafayette, USA
  2. Department of English and Writing, University of Tampa, Tampa, USA
