Multimedia Tools and Applications

Volume 75, Issue 21, pp 13015–13022

Finding an appropriate lexical diversity measurement for a small-sized corpus and its application to a comparative study of L2 learners’ writings

  • Woonho Choi
  • HwaYoung Jeong


The present study compares four lexical diversity measures. A computational experiment combining corpus processing and statistical testing was conducted to identify the measure best suited to evaluating a small-sized corpus of 350 to 550 words. The results show that the D-estimate is the most appropriate of the four measures compared, and that it gives more stable results than the others when the number of words varies between texts. The D-estimate was then applied to measure the morphological and grammatical diversity of writings by L2 learners of Korean, and a statistical test was conducted on whether the learners' mother tongues affect the degree of acquisition of grammatical morphemes. The test indicates that the native languages of L2 learners of Korean did not have a significant impact.
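As a rough illustration of the kinds of measures compared, the sketch below computes TTR, Guiraud's R, Yule's K and a D-estimate for a list of tokens. It assumes the four measures are the ones named in the keywords and uses their standard textbook definitions; it is not the authors' implementation. The function names, the sample sizes of 35 to 50 tokens, the 100 trials per size and the grid search over D (standing in for a proper least-squares fit) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def ttr(tokens):
    """Type-token ratio: distinct word types divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def guiraud_r(tokens):
    """Guiraud's R: distinct types divided by the square root of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def yule_k(tokens):
    """Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2,
    where V_i is the number of types that occur exactly i times."""
    n = len(tokens)
    freq_of_freq = Counter(Counter(tokens).values())
    s2 = sum(i * i * v for i, v in freq_of_freq.items())
    return 1e4 * (s2 - n) / (n * n)


def model_ttr(d, n):
    """TTR predicted for parameter D at sample size n:
    TTR = (D / n) * (sqrt(1 + 2n / D) - 1)."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)


def d_estimate(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Estimate D by fitting model_ttr to mean TTRs of random samples.

    Assumption: samples are drawn without replacement, and D is found
    by a coarse grid search over 1.0 .. 200.0 in steps of 0.1.
    """
    rng = random.Random(seed)
    observed = []
    for n in sizes:
        mean = sum(ttr(rng.sample(tokens, n)) for _ in range(trials)) / trials
        observed.append((n, mean))

    def squared_error(d):
        return sum((model_ttr(d, n) - t) ** 2 for n, t in observed)

    candidates = [x / 10 for x in range(10, 2001)]
    return min(candidates, key=squared_error)
```

With these definitions, a text needs at least as many tokens as the largest sample size (50 here) for `d_estimate` to run. Because D is fitted to TTR values taken at fixed sample sizes rather than to the whole text, it is less tied to total text length than plain TTR, which is consistent with the stability reported in the abstract.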


Keywords: L2 learning; TTR (Type-Token Ratio); D-estimate; Yule’s K; Guiraud’s R; Lexical diversity



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Korean Language and Literature, Mokpo National University, Muan-gun, South Korea
  2. Humanitas College, Kyung Hee University, Seoul, South Korea
