Behavior Research Methods, Volume 42, Issue 2, pp 381–392

MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment

  • Philip M. McCarthy
  • Scott Jarvis
Articles From the SCiP Conference


The main purpose of this study was to examine the validity of the approach to lexical diversity assessment known as the measure of textual lexical diversity (MTLD). The index for this approach is calculated as the mean length of word strings that maintain a criterion level of lexical variation. To validate the MTLD approach, we compared it against the performances of the primary competing indices in the field, which include vocd-D, TTR, Maas, Yule’s K, and an HD-D index derived directly from the hypergeometric distribution function. The comparisons involved assessments of convergent validity, divergent validity, internal validity, and incremental validity. The results of our assessments of these indices across two separate corpora suggest three major findings. First, MTLD performs well with respect to all four types of validity and is, in fact, the only index not found to vary as a function of text length. Second, HD-D is a viable alternative to the vocd-D standard. And third, three of the indices—MTLD, vocd-D (or HD-D), and Maas—appear to capture unique lexical information. We conclude by advising researchers to consider using MTLD, vocd-D (or HD-D), and Maas in their studies, rather than any single index, noting that lexical diversity can be assessed in many ways and each approach may be informative as to the construct under investigation.
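As a rough illustration of the calculation described above, the mean length of word strings that maintain a criterion level of lexical variation, the following Python sketch counts how many "factors" a text contains and divides the token count by that number. It is not the authors' implementation. The 0.72 type-token ratio cutoff, the partial credit given to the trailing segment, the averaging of a forward and a backward pass, and the function names are all assumptions not stated in this abstract.

```python
from typing import Sequence


def _mtld_one_direction(tokens: Sequence[str], threshold: float) -> float:
    """Mean factor length for a single left-to-right pass over the tokens."""
    factors = 0.0
    types = set()   # distinct words seen in the current segment
    count = 0       # tokens seen in the current segment
    for token in tokens:
        count += 1
        types.add(token)
        # A factor is complete once the segment's TTR falls to the criterion.
        if len(types) / count <= threshold:
            factors += 1.0
            types, count = set(), 0
    if count > 0:
        # Assumed rule: the unfinished segment earns a partial factor,
        # proportional to how far its TTR has dropped toward the criterion.
        leftover_ttr = len(types) / count
        factors += (1.0 - leftover_ttr) / (1.0 - threshold)
    if factors == 0:
        # No repetition at all: fall back to the text length (a sketch choice).
        return float(len(tokens))
    return len(tokens) / factors


def mtld(tokens: Sequence[str], threshold: float = 0.72) -> float:
    """Average of the forward and backward passes (assumed convention)."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(list(reversed(tokens)), threshold)
    return (forward + backward) / 2.0
```

Calling `mtld(text.lower().split())` on a whitespace-tokenized text returns a single score. Higher values mean the text sustains the criterion level of variation over longer stretches. Tokenization and lemmatization choices are left to the caller and will affect the result.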





Copyright information

© Psychonomic Society, Inc. 2010

Authors and Affiliations

  1. Department of English, University of Memphis, Memphis
  2. Ohio University, Athens
