Coh-Metrix: Analysis of text on cohesion and language

Abstract

Advances in computational linguistics and discourse processing have made it possible to automate many language- and text-processing mechanisms. We have developed a computer tool called Coh-Metrix, which analyzes texts on over 200 measures of cohesion, language, and readability. Its modules use lexicons, part-of-speech classifiers, syntactic parsers, templates, corpora, latent semantic analysis, and other components that are widely used in computational linguistics. After the user enters an English text, Coh-Metrix returns the measures the user has requested. In addition, a facility allows the user to store the results of these analyses in data files (such as text, Excel, and SPSS files). Standard readability formulas scale texts on difficulty by relying on word length and sentence length, whereas Coh-Metrix is sensitive to cohesion relations, world knowledge, and language and discourse characteristics.
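To make the abstract's contrast between length-based readability formulas and cohesion-sensitive measures concrete, here is a minimal Python sketch; it is not the Coh-Metrix implementation. It computes the standard Flesch-Kincaid Grade Level, 0.39 × (words/sentences) + 11.8 × (syllables/words) - 15.59, which depends only on sentence and word length, alongside a hypothetical cohesion proxy, `adjacent_overlap`: the share of adjacent sentence pairs that repeat at least one content word. The naive sentence splitter, the vowel-group syllable counter, and the small stopword list are illustrative assumptions standing in for the lexicons, part-of-speech classifiers, and parsers the abstract mentions.

```python
import re

# ASSUMPTION: a tiny, made-up function-word list; real tools use full lexicons
# and part-of-speech tagging to identify content words.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at", "by",
    "for", "with", "is", "are", "was", "were", "be", "it", "its", "this",
    "that", "he", "she", "they", "we", "you", "i", "his", "her", "their",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence: str) -> list[str]:
    """Lowercased alphabetic tokens; punctuation is ignored."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, drop a silent final 'e', minimum 1."""
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Standard length-based formula: sees only sentence and word length."""
    sents = split_sentences(text)
    toks = [w for s in sents for w in words(s)]
    words_per_sentence = len(toks) / len(sents)
    syllables_per_word = sum(syllables(w) for w in toks) / len(toks)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def adjacent_overlap(text: str) -> float:
    """HYPOTHETICAL cohesion proxy: share of adjacent sentence pairs that
    repeat at least one content word (a token not in STOPWORDS)."""
    content = [set(words(s)) - STOPWORDS for s in split_sentences(text)]
    pairs = list(zip(content, content[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a & b) / len(pairs)


# Same four sentences, two orders: lengths are unchanged, cohesion is not.
ordered = ("The cell has a membrane. The membrane controls what enters the cell. "
           "A storm hit the coast. The storm flooded the coast.")
shuffled = ("The cell has a membrane. A storm hit the coast. "
            "The membrane controls what enters the cell. The storm flooded the coast.")

print(flesch_kincaid_grade(ordered) == flesch_kincaid_grade(shuffled))  # True
print(adjacent_overlap(ordered))   # 0.6666666666666666
print(adjacent_overlap(shuffled))  # 0.0
```

Reordering the sentences leaves every sentence and word length unchanged, so the grade level is identical for both texts, while the overlap proxy falls from 2/3 to 0. That blindness of length-based formulas to cohesion is the gap the abstract says Coh-Metrix is designed to address.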

Author information

Correspondence to Arthur C. Graesser or Danielle S. McNamara.

Additional information

The research was supported by Institute of Education Sciences (IES) Grant IES R3056020018-02 and National Science Foundation (NSF) Grant SES 9977969. Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the IES or the NSF.

About this article

Cite this article

Graesser, A.C., McNamara, D.S., Louwerse, M.M. et al. Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers 36, 193–202 (2004). https://doi.org/10.3758/BF03195564

Keywords

  • Latent Semantic Analysis
  • Content Word
  • World Knowledge
  • Discourse Process
  • Sentence Pair