The Automatic Generation of Nonwords for Lexical Recognition Tests

Hamed, Osama; Zesch, Torsten

doi:10.1007/978-3-319-93782-3_23


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10930)

Included in the following conference series:

Language and Technology Conference

Abstract

Lexical recognition tests are frequently used to assess vocabulary knowledge. In such tests, learners need to differentiate between words and artificial nonwords that look much like real words. Our ultimate goal is to create high-quality lexical recognition tests automatically, which would enable repeated automated testing for different languages. This task involves both a simple subtask (word selection) and a complex one (nonword generation). Our main goal here is to automatically generate word-like nonwords. We compare different ranking strategies and find that our best strategy (a specialized higher-order character-based language model) creates word-like nonwords. We evaluate our nonwords in a user study and find that our automatically generated test yields scores that are highly correlated with those of a well-established, manually created lexical recognition test.
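The ranking idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the trigram order, the add-one smoothing, the `#` boundary symbol, the toy lexicon, and all function names are assumptions made for illustration. The sketch trains a character n-gram model on known words, filters real words out of a candidate list, and ranks the remaining candidates by length-normalised log-probability, so that more word-like strings come first.

```python
import math
from collections import Counter

BOUNDARY = "#"  # assumed padding symbol marking word start and end


def train_char_ngram_model(words, n=3):
    """Count character n-grams and their (n-1)-character contexts."""
    ngram_counts = Counter()
    context_counts = Counter()
    alphabet = {BOUNDARY}
    for word in words:
        alphabet.update(word)
        padded = BOUNDARY * (n - 1) + word + BOUNDARY
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return ngram_counts, context_counts, len(alphabet), n


def avg_log_prob(candidate, model):
    """Length-normalised log-probability with add-one smoothing (an assumption)."""
    ngram_counts, context_counts, vocab_size, n = model
    padded = BOUNDARY * (n - 1) + candidate + BOUNDARY
    total = 0.0
    steps = 0
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        prob = (ngram_counts[gram] + 1) / (context_counts[gram[:-1]] + vocab_size)
        total += math.log(prob)
        steps += 1
    return total / steps


def rank_nonwords(candidates, lexicon, model):
    """Drop real words, then sort the rest from most to least word-like."""
    known = set(lexicon)
    nonwords = [c for c in candidates if c not in known]
    return sorted(nonwords, key=lambda c: avg_log_prob(c, model), reverse=True)


# Toy data, purely illustrative; a real model would be trained on a large word list.
lexicon = ["generous", "general", "generate", "nervous", "numerous", "genuine"]
model = train_char_ngram_model(lexicon, n=3)
candidates = ["generous", "general", "genervous", "xqzkvw", "numerate"]
ranked = rank_nonwords(candidates, lexicon, model)
print(ranked)
```

With this toy lexicon, the candidates that are real words are filtered out, and the string with no familiar character sequences ends up at the bottom of the ranking. A higher n-gram order and a larger training lexicon would be needed before the scores say anything about how word-like a candidate looks to learners.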

Notes

1. It means generous in English.
2. This is a different size compared to nonwords in LexTALE that are 4 to 11 letters long. In order to ensure comparability with LexTALE, we follow those length constraints, but newly generated tests should use the same constraints for words and nonwords.
3. https://moodle.org.
4. http://www.englishprofile.org/index.php/the-cef.
5. http://elexicon.wustl.edu.

Author information

Authors and Affiliations

Language Technology Lab, Department of Computer Science and Applied Cognitive Science, University of Duisburg-Essen, Forsthausweg 2, 47057, Duisburg, Germany
Osama Hamed & Torsten Zesch

Corresponding author

Correspondence to Osama Hamed.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
LIMSI-CNRS, Orsay Cedex, France
Joseph Mariani
Adam Mickiewicz University, Poznań, Poland
Marek Kubis

About this paper

Cite this paper

Hamed, O., Zesch, T. (2018). The Automatic Generation of Nonwords for Lexical Recognition Tests. In: Vetulani, Z., Mariani, J., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2015. Lecture Notes in Computer Science, vol 10930. Springer, Cham. https://doi.org/10.1007/978-3-319-93782-3_23

DOI: https://doi.org/10.1007/978-3-319-93782-3_23
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93781-6
Online ISBN: 978-3-319-93782-3
eBook Packages: Computer Science, Computer Science (R0)