Abstract
Lexical recognition tests are frequently used to assess vocabulary knowledge. In such tests, learners need to differentiate between words and artificial nonwords that look much like real words. Our ultimate goal is to create high-quality lexical recognition tests automatically, which would enable repeated automated testing in different languages. This task involves both a simple subtask (word selection) and a complex one (nonword generation). Our main goal here is to automatically generate word-like nonwords. We compare different ranking strategies and find that our best strategy (a specialized higher-order character-based language model) creates word-like nonwords. We evaluate our nonwords in a user study and find that our automatically generated test yields scores that are highly correlated with those of a well-established, manually created lexical recognition test.
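To make the ranking idea concrete, the following is a minimal sketch of how candidate nonwords could be ordered by a character-based language model. It is not the specialized higher-order model evaluated in the chapter, whose details are not given in this abstract. It assumes a plain character trigram model with add-one smoothing, and the names `CharNgramModel`, `rank_nonwords`, and the toy lexicon are hypothetical.

```python
import math
from collections import defaultdict


class CharNgramModel:
    """Character n-gram model with add-one smoothing (illustrative assumption)."""

    def __init__(self, words, order=3):
        self.order = order
        # counts[context][next_char] = how often next_char followed context
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = {"$"}  # "$" marks the end of a word
        for word in words:
            padded = "^" * (order - 1) + word + "$"  # "^" pads the word start
            self.alphabet.update(word)
            for i in range(order - 1, len(padded)):
                context = padded[i - order + 1:i]
                self.counts[context][padded[i]] += 1

    def score(self, candidate):
        """Average log-probability per predicted character; higher is more word-like."""
        padded = "^" * (self.order - 1) + candidate + "$"
        vocab_size = len(self.alphabet)
        total = 0.0
        for i in range(self.order - 1, len(padded)):
            context = padded[i - self.order + 1:i]
            seen = self.counts.get(context, {})
            # Add-one smoothing. Characters never seen in training also get a
            # pseudo-count, so this is a ranking score, not a normalized model.
            numerator = seen.get(padded[i], 0) + 1
            denominator = sum(seen.values()) + vocab_size
            total += math.log(numerator / denominator)
        return total / (len(candidate) + 1)


def rank_nonwords(model, candidates, lexicon):
    """Drop real words, then sort the remaining candidates from most to least word-like."""
    nonwords = [c for c in candidates if c not in lexicon]
    return sorted(nonwords, key=model.score, reverse=True)


# Toy training lexicon (hypothetical; a real test would use a large word list).
lexicon = ["plate", "slate", "planet", "later", "plane", "stale", "table", "stable"]
model = CharNgramModel(lexicon, order=3)
ranking = rank_nonwords(model, ["plate", "slane", "xqzvk"], set(lexicon))
```

In this toy setup, the real word `plate` is filtered out, and `slane` ranks above `xqzvk` because its letter sequences resemble those in the training words. A higher `order` corresponds to the "higher-order" models mentioned above, at the cost of needing more training data.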
Notes
- 1. It means generous in English.
- 2. This is a different size compared to nonwords in LexTALE, which are 4 to 11 letters long. To ensure comparability with LexTALE, we follow those length constraints, but newly generated tests should use the same constraints for words and nonwords.
References
Baayen, R.H., Piepenbrock, R., Gulikers, L.: The CELEX Lexical Database (Release 2). Linguistic Data Consortium, Philadelphia (1995)
Balota, D.A., Yap, M.J., Hutchison, K.A., Cortese, M.J., Kessler, B., Loftis, B., Neely, J.H., Nelson, D.L., Simpson, G.B., Treiman, R.: The English lexicon project. Behav. Res. Methods 39(3), 445–459 (2007)
Brysbaert, M.: LexTALE_FR: a fast, free, and efficient test to measure language proficiency in French. Psychol. Belg. 53(1), 23–37 (2013)
Cavnar, W.B., Trenkle, J.M.: N-gram-based text categorization. In: Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161–175 (1994)
Duyck, W., Desmet, T., Verbeke, L.P., Brysbaert, M.: WordGen: a tool for word selection and nonword generation in Dutch, English, German, and French. Behav. Res. Methods Instrum. Comput. 36(3), 488–499 (2004)
Francis, W.N., Kučera, H.: Manual of Information to Accompany a Standard Corpus of Present-day Edited American English, for use with Digital Computers. Brown University, Providence (1964)
Greenberg, J.H.: Some generalizations concerning initial and final consonant sequences. Linguistics 3(18), 5–34 (1965)
Huibregtse, I., Admiraal, W., Meara, P.: Scores on a yes-no vocabulary test: correction for guessing and response style. Lang. Test. 19(3), 227–245 (2002)
Izura, C., Cuetos, F., Brysbaert, M.: Lextale-Esp: a test to rapidly and efficiently assess the Spanish vocabulary size. Psicol. Int. J. Methodol. Exp. Psychol. 35(1), 49–66 (2014)
Johnson, R.L., Eisler, M.E.: The importance of the first and last letter in words during sentence reading. Acta Psychol. 141(3), 336–351 (2012)
Keuleers, E., Brysbaert, M.: Wuggy: a multilingual pseudoword generator. Behav. Res. Methods 42(3), 627–633 (2010)
Lemhöfer, K., Broersma, M.: Introducing LexTALE: a quick and valid lexical test for advanced learners of English. Behav. Res. Methods 44(2), 325–343 (2012)
Meara, P., Jones, G.: Tests of vocabulary size in English as a foreign language. Polyglot 8(1), 1–40 (1987)
Nation, P.: Teaching and Learning Vocabulary. Newbury House, Rowley (1990)
Rastle, K., Harrington, J., Coltheart, M.: 358,534 nonwords: the ARC nonword database. Q. J. Exp. Psychol. Sect. A 55(4), 1339–1362 (2002)
Schmitt, N.: Vocabulary in Language Teaching. Ernst Klett Sprachen, Stuttgart (2000)
Vatanen, T., Väyrynen, J.J., Virpioja, S.: Language identification of short text segments with n-gram models. In: LREC. Citeseer (2010)
Wang, T.H.: What strategies are effective for formative assessment in an e-learning environment? J. Comput. Assist. Learn. 23(3), 171–186 (2007)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Hamed, O., Zesch, T. (2018). The Automatic Generation of Nonwords for Lexical Recognition Tests. In: Vetulani, Z., Mariani, J., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2015. Lecture Notes in Computer Science, vol 10930. Springer, Cham. https://doi.org/10.1007/978-3-319-93782-3_23
DOI: https://doi.org/10.1007/978-3-319-93782-3_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93781-6
Online ISBN: 978-3-319-93782-3
eBook Packages: Computer Science, Computer Science (R0)