Textual Signatures: Identifying Text-Types Using Latent Semantic Analysis to Measure the Cohesion of Text Structures

  • Philip M. McCarthy
  • Stephen W. Briner
  • Vasile Rus
  • Danielle S. McNamara


Just as a sentence is far more than a mere concatenation of words, a text is far more than a mere concatenation of sentences. Texts contain pertinent information that co-refers across sentences and paragraphs [30]; texts contain relations between phrases, clauses, and sentences that are often causally linked [21], [51], [56]; and texts that depend on relating a series of chronological events contain temporal features that help the reader to build a coherent representation of the text [19], [55]. We refer to textual features such as these as cohesive elements. They occur within paragraphs (locally) and across paragraphs (globally), and they take referential, causal, temporal, and structural forms [18], [22], [36]. But cohesive elements, and by consequence cohesion, do not simply feature in a text as dialogues tend to feature in narratives, or as cartoons tend to feature in newspapers. That is, cohesion is not present or absent in a binary or optional sense. Instead, cohesion in text exists on a continuum of presence, which is sometimes indicative of the text-type in question [12], [37], [41] and sometimes indicative of the audience for which the text was written [44], [47]. In this chapter, we discuss the nature and importance of cohesion; we demonstrate a computational tool that measures cohesion; and, most importantly, we demonstrate a novel approach to identifying text-types by incorporating contrasting rates of cohesion.


Latent Semantic Analysis; Prototypical Model; Cohesive Element; Computational Linguistics; Expository Text




  1. Best, R.M., Floyd, R.G., & McNamara, D.S. (2004). Understanding the fourth-grade slump: Comprehension difficulties as a function of reader aptitudes and text genre. Paper presented at the 85th Annual Meeting of the American Educational Research Association.
  2. Biber, D. (1987). A textual comparison of British and American writing. American Speech, 62, 99–119.
  3. Biber, D. (1988). Linguistic features: algorithms and functions in variation across speech and writing. Cambridge: Cambridge University Press.
  4. Brill, E. (1995). Unsupervised learning of disambiguation rules for part of speech tagging. In Proceedings of the Third Workshop on Very Large Corpora, Cambridge, MA.
  5. Britton, B. K., & Gülgöz, S. (1991). Using Kintsch's computational model to improve instructional text: Effects of inference calls on recall and cognitive structures. Journal of Educational Psychology, 83, 329–345.
  6. Burrows, J. (1987). Word-patterns and story-shapes: The statistical analysis of narrative style. Literary and Linguistic Computing, 2, 61–70.
  7. Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. Proceedings of the Fourteenth National Conference on Artificial Intelligence. Menlo Park: AAAI/MIT Press.
  8. Charniak, E. (2000). A maximum-entropy-inspired parser. Proceedings of the North-American Chapter of Association for Computational Linguistics, Seattle, WA.
  9. Charniak, E., & Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (pp. 173–180). Ann Arbor, MI.
  10. Collins, M. (1996). A new statistical parser based on bigram lexical dependencies. Proceedings of the 34th Annual Meeting of the ACL, Santa Cruz, CA.
  11. Collins, M. (1997). Three generative, lexicalised models for statistical parsing. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, Madrid, Spain.
  12. Crossley, S., Louwerse, M.M., McCarthy, P.M., & McNamara, D.S. (forthcoming 2007). A linguistic analysis of simplified and authentic texts. Modern Language Journal, 91(2).
  13. Dennis, S., Landauer, T., Kintsch, W., & Quesada, J. (2003). Introduction to Latent Semantic Analysis. Slides from the tutorial given at the 25th Annual Meeting of the Cognitive Science Society, Boston.
  14. Duran, N., McCarthy, P.M., Graesser, A.C., & McNamara, D.S. (2006). An empirical study of temporal indices. Proceedings of the 28th Annual Conference of the Cognitive Science Society.
  15. Foltz, P. W., Britt, M. A., & Perfetti, C. A. (1996). Reasoning from multiple texts: An automatic analysis of readers’ situation models. In G. W. Cottrell (Ed.), Proceedings of the 18th Annual Cognitive Science Conference (pp. 110–115). Lawrence Erlbaum, NJ.
  16. Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). The measurement of textual coherence with Latent Semantic Analysis. Discourse Processes, 25, 285–307.
  17. Foltz, P. W., Gilliam, S., & Kendall, S. (2000). Supporting content-based feedback in on-line writing evaluation with LSA. Interactive Learning Environments, 8, 111–127.
  18. Gernsbacher, M.A. (1990). Language comprehension as structure building. Hillsdale, NJ: Erlbaum.
  19. Givón, T. (1995). Coherence in the text and coherence in the mind. In M.A. Gernsbacher & T. Givón (Eds.), Coherence in spontaneous text (pp. 59–115). Amsterdam/Philadelphia: John Benjamins.
  20. Graesser, A.C. (1993). Inference generation during text comprehension. Discourse Processes, 16, 1–2.
  21. Graesser, A.C., Singer, M., & Trabasso, T. (1994). Constructing inferences during narrative text comprehension. Psychological Review, 101, 371–395.
  22. Graesser, A.C., McNamara, D., Louwerse, M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36, 193–202.
  23. Hearst, M.A. (1994). Multi-paragraph segmentation of expository text. Proceedings of the Association of Computational Linguistics, Las Cruces, NM.
  24. Hobbs, J.R. (1985). On the coherence and structure of discourse. CSLI Technical Report, 85–37. Stanford, CA.
  25. Hovy, E. (1990). Parsimonious and profligate approaches to the question of discourse structure relations. Proceedings of the Fifth International Workshop on Natural Language Generation. East Stroudsburg, PA: Association for Computational Linguistics.
  26. Karlgren, J., & Cutting, D. (1994). Recognizing text genres with simple metrics using discriminant analysis. Proceedings of the 15th Conference on Computational Linguistics, Volume 2 (pp. 1071–1075). Kyoto, Japan.
  27. Kessler, B., Nunberg, G., & Schütze, H. (1997). Automatic detection of text genre. In Proceedings of 35th Annual Meeting of Association for Computational Linguistics, and in 8th Conference of European Chapter of Association for Computational Linguistics (pp. 32–38). Madrid, Spain.
  28. Kintsch, W., & Bowles, A. (2002). Metaphor comprehension: What makes a metaphor difficult to understand? Metaphor and Symbol, 17, 249–262.
  29. Kintsch, E., Steinhart, D., Stahl, G., LSA Research Group, Matthews, C., & Lamb, R. (2000). Developing summarization skills through the use of LSA-based feedback. Interactive Learning Environments, 8, 87–109.
  30. Kintsch, W., & van Dijk, T.A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
  31. Labov, W. (1972). The transformation of experience in narrative syntax. In W. Labov (Ed.), Language in the Inner City. Philadelphia: University of Pennsylvania Press.
  32. Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.
  33. Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259–284.
  34. Lehman, S., & Schraw, G. (2002). Effects of coherence and relevance on shallow and deep text processing. Journal of Educational Psychology, 94, 738–750.
  35. Linderholm, T., Everson, M.G., van den Broek, P., Mischinski, M., Crittenden, A., & Samuels, J. (2000). Effects of causal text revisions on more and less skilled readers' comprehension of easy and difficult text. Cognition and Instruction, 18, 525–556.
  36. Louwerse, M.M. (2002). Computational retrieval of themes. In M.M. Louwerse & W. van Peer (Eds.), Thematics: Interdisciplinary Studies (pp. 189–212). Amsterdam/Philadelphia: John Benjamins.
  37. Louwerse, M. M., McCarthy, P. M., McNamara, D. S., & Graesser, A. C. (2004). Variation in language and cohesion across written and spoken registers. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 843–848). Mahwah, NJ: Erlbaum.
  38. Loxterman, J.A., Beck, I. L., & McKeown, M.G. (1994). The effects of thinking aloud during reading on students’ comprehension of more or less coherent text. Reading Research Quarterly, 29, 353–367.
  39. Mani, I., & Pustejovsky, J. (2004). Temporal discourse markers for narrative structures. ACL Workshop on Discourse Annotation, Barcelona, Spain. East Stroudsburg, PA: Association for Computational Linguistics.
  40. Mann, W. C., & Thompson, S. A. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3), 243–281.
  41. McCarthy, P.M., Lightman, E.J., Dufty, D.F., & McNamara, D.S. (in press). Using Coh-Metrix to assess distributions of cohesion and difficulty in high-school textbooks. Proceedings of the 28th Annual Conference of the Cognitive Science Society.
  42. McCarthy, P.M., Lewis, G.A., Dufty, D.F., & McNamara, D.S. (2006). Analyzing writing styles with Coh-Metrix. 19th International FLAIRS Conference 2006.
  43. McNamara, D.S., Kintsch, E., Songer, N.B., & Kintsch, W. (1996). Are good texts always better? Text coherence, background knowledge, and levels of understanding in learning from text. Cognition and Instruction, 14, 1–43.
  44. McNamara, D. S. (2001). Reading both high and low coherence texts: Effects of text sequence and prior knowledge. Canadian Journal of Experimental Psychology, 55, 51–62.
  45. Moldovan, D., Harabagiu, S., Pasca, M., Mihalcea, R., Girju, R., Goodrum, R., & Rus, V. (2000). The structure and performance of an open-domain question answering system. In Proceedings of ACL 2000, Hong Kong, October.
  46. Morris, J., & Hirst, G. (1991). Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17, 21–48.
  47. Ozuru, Y., Dempsey, K., Sayroo, J., & McNamara, D. S. (2005). Effects of text cohesion on comprehension of biology texts. Proceedings of the 27th Annual Meeting of the Cognitive Science Society (pp. 1696–1701). Hillsdale, NJ: Erlbaum.
  48. Propp, V. (1968). Morphology of the folk tale. Baltimore: Port City Press, pp. 19–65.
  49. Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. Proceedings of Conference on Empirical Methods in Natural Language Processing, University of Pennsylvania.
  50. Stamatatos, E., Fakotakis, N., & Kokkinakis, G. (2001). Computer-based authorship attribution without lexical measures. Computers and the Humanities, 35, 193–214.
  51. Trabasso, T., & van den Broek, P. (1985). Causal thinking and the representation of narrative events. Journal of Memory and Language, 24, 612–630.
  52. Voorhees, E. M., & Tice, D.M. (2000). Building a question answering test collection. Proceedings of the Twenty-Third Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
  53. Wolfe, M. B., Schreiner, M. E., Rehder, B., Laham, D., Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). Learning from text: Matching readers and text by Latent Semantic Analysis. Discourse Processes, 25, 309–336.
  54. Wolfe, M. B.W., & Goldman, S.R. (2003). Use of latent semantic analysis for predicting psychological phenomena: Two issues and proposed solutions. Behavior Research Methods, Instruments, & Computers, 35, 22–31.
  55. Zwaan, R.A. (1996). Processing narrative time shifts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1196–1207.
  56. Zwaan, R.A., & Radvansky, G.A. (1998). Situation models in language comprehension and memory. Psychological Bulletin, 123, 162–185.

Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  • Philip M. McCarthy (1)
  • Stephen W. Briner (1)
  • Vasile Rus (1)
  • Danielle S. McNamara (1)

  1. Department of Psychology, Institute for Intelligent Systems, University of Memphis, Memphis, USA
