Coh-Metrix: Analysis of text on cohesion and language

  • Arthur C. Graesser
  • Danielle S. McNamara
  • Max M. Louwerse
  • Zhiqiang Cai


Advances in computational linguistics and discourse processing have made it possible to automate many language- and text-processing mechanisms. We have developed a computer tool called Coh-Metrix, which analyzes texts on over 200 measures of cohesion, language, and readability. Its modules use lexicons, part-of-speech classifiers, syntactic parsers, templates, corpora, latent semantic analysis, and other components that are widely used in computational linguistics. After the user enters an English text, Coh-Metrix returns measures requested by the user. In addition, a facility allows the user to store the results of these analyses in data files (such as Text, Excel, and SPSS). Standard text readability formulas scale texts on difficulty by relying on word length and sentence length, whereas Coh-Metrix is sensitive to cohesion relations, world knowledge, and language and discourse characteristics.
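
The contrast between length-based readability and cohesion can be made concrete with a small sketch. The code below is not Coh-Metrix and does not reproduce any of its measures; the function names (`surface_difficulty`, `adjacent_overlap`), the tiny stopword list, and the regex-based sentence splitter are all assumptions chosen for illustration. It shows two short texts that are identical in sentence length and word length, so a length-based formula cannot tell them apart, yet they differ in whether adjacent sentences share a content word.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", sentence.lower())


def surface_difficulty(text):
    """Return (mean words per sentence, mean letters per word).

    These are the two surface features that length-based readability
    formulas rely on.
    """
    sentences = split_sentences(text)
    words = [w for s in sentences for w in tokenize(s)]
    return len(words) / len(sentences), sum(len(w) for w in words) / len(words)


def adjacent_overlap(text):
    """Proportion of adjacent sentence pairs sharing at least one content word."""
    content = [set(tokenize(s)) - STOPWORDS for s in split_sentences(text)]
    pairs = list(zip(content, content[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a & b) / len(pairs)


if __name__ == "__main__":
    cohesive = "The cell divides. The cell grows."
    disjoint = "The cell divides. The bird sings."
    for label, text in (("cohesive", cohesive), ("disjoint", disjoint)):
        print(label, surface_difficulty(text), adjacent_overlap(text))
```

Both example texts receive the same surface scores, while the overlap score separates them. A real system would replace the regex splitter and stopword list with a proper sentence segmenter and part-of-speech tagger.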


Latent Semantic Analysis; Content Word; World Knowledge; Discourse Process; Sentence Pair



Copyright information

© Psychonomic Society, Inc. 2004

Authors and Affiliations

  1. Department of Psychology, University of Memphis, Memphis
