Abstract
This chapter reviews recent trends in the conceptualizations and formats of tests used to determine whether non-native speakers of English have sufficient proficiency in English to study at English-medium universities in English-dominant countries. The review focuses on published research informing a new version of the Test of English as a Foreign Language (TOEFL), but a range of similar tests internationally is also considered. Prominent among the issues guiding research and development on these tests are the following: construct validation, particularly refinements in the description of testing purposes, evaluations of the discourse produced in the contexts of testing, and surveys of relevant domains and score users; consistency, including fairness in opportunities for test performance across differing populations, reliability through field-testing and equating of test forms, and sampling of multiple, comparable performances from examinees; and innovations in the media of test administration, including various forms of computer and other technological adaptations.
© 2007 Springer Science+Business Media, LLC.
Cite this chapter
Cumming, A. (2007). New Directions in Testing English Language Proficiency for University Entrance. In: Cummins, J., Davison, C. (eds) International Handbook of English Language Teaching. Springer International Handbooks of Education, vol 15. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-46301-8_34
DOI: https://doi.org/10.1007/978-0-387-46301-8_34
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-46300-1
Online ISBN: 978-0-387-46301-8