Language Resources and Evaluation, Volume 40, Issue 2, pp. 109–126

Reader-based exploration of lexical cohesion

  • Beata Beigman Klebanov
  • Eli Shamir
Original Paper


Lexical cohesion refers to the reader-perceived unity of text achieved by the author’s usage of words with related meanings (Halliday and Hasan, 1976). This article reports on an experiment with 22 readers aimed at finding lexical cohesive patterns in 10 texts. Although there was much diversity in people’s answers, we identified a common core of the phenomenon, using statistical analysis of agreement patterns and a validation experiment. The core data may now be used as a minimal test set for models of lexical cohesion; we present an example suggesting that models based on mutually exclusive lexical chains will not suffice. In addition, we believe that the procedures described here for revealing and analyzing sub-group patterns of agreement may be applied to data collected in other studies of comparable size.


Keywords: Lexical cohesion · Inter-annotator agreement · Cohesion


  1. Al-Halimi, R., & Kazman, R. (1998). Temporal indexing through lexical chaining. In C. Fellbaum (Ed.), WordNet: An electronic lexical database (pp. 333–351). Cambridge, MA: MIT Press.
  2. Artstein, R., & Poesio, M. (2005). Kappa³ = Alpha (or Beta). University of Essex CS Technical Report CSM-437.
  3. Barzilay, R., & Elhadad, M. (1997). Using lexical chains for text summarization. In Proceedings of the ACL Intelligent Scalable Text Summarization Workshop (pp. 86–90). Madrid, Spain.
  4. Barzilay, R., & Lapata, M. (2005). Modeling local coherence: An entity-based approach. In Proceedings of ACL-05. Ann Arbor, USA.
  5. Bednarek, M. A. (2005). Frames revisited: The coherence-inducing function of frames. Journal of Pragmatics, 37, 685–705.
  6. Beigman Klebanov, B., & Shamir, E. (2005). Guidelines for annotation of concept mention patterns. Technical Report 2005-8, Leibniz Center for Research in Computer Science, The Hebrew University of Jerusalem, Israel.
  7. Camus, A. (1962). The stranger (S. Gilbert, Trans.). New York: Vintage Books. (Original work published 1942.)
  8. Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2), 249–254.
  9. Craggs, R., & McGee Wood, M. (2005). Evaluating discourse and dialogue coding schemes. Computational Linguistics, 31(3), 289–295.
  10. Di Eugenio, B., & Glass, M. (2004). The kappa statistic: A second look. Computational Linguistics, 30(1), 95–101.
  11. Green, S. (1998). Automated link generation: Can we do better than term repetition? Computer Networks, 30(1–7), 75–84.
  12. Grosz, B., Joshi, A., & Weinstein, S. (1995). Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2), 203–225.
  13. Halliday, M., & Hasan, R. (1976). Cohesion in English. London: Longman Group Ltd.
  14. Hasan, R. (1984). Coherence and cohesive harmony. In J. Flood (Ed.), Understanding reading comprehension (pp. 181–219). Delaware: International Reading Association.
  15. Hearst, M. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1), 33–64.
  16. Hirschman, L., Robinson, P., Burger, J. D., & Vilain, M. (1998). Automating coreference: The role of annotated training data. CoRR, cmp-lg/9803001.
  17. Hirst, G., & Budanitsky, A. (2005). Correcting real-word spelling errors by restoring lexical cohesion. Natural Language Engineering, 11(1), 87–111.
  18. Hoey, M. (1991). Patterns of lexis in text. Hong Kong: Oxford University Press.
  19. Karamanis, N., Poesio, M., Mellish, C., & Oberlander, J. (2004). Evaluating centering-based metrics of coherence for text structuring using a reliably annotated corpus. In Proceedings of ACL-04 (pp. 391–398). Barcelona, Spain.
  20. Krippendorff, K. (1980). Content analysis. Beverly Hills, CA: Sage Publications.
  21. Mann, W., & Thompson, S. (1988). Rhetorical structure theory. Text, 8(3), 243–281.
  22. Marcu, D. (2000). The theory and practice of discourse parsing and summarization. Cambridge, MA: MIT Press.
  23. Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
  24. Morris, J., & Hirst, G. (1991). Lexical cohesion, the thesaurus, and the structure of text. Computational Linguistics, 17(1), 21–48.
  25. Morris, J., & Hirst, G. (2004). Non-classical lexical semantic relations. In Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics (pp. 46–51). Boston, MA, USA.
  26. Morris, J., & Hirst, G. (2005). The subjectivity of lexical cohesion in text. In J. G. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing attitude and affect in text. Dordrecht, The Netherlands: Springer.
  27. Passonneau, R. (2004). Computing reliability for coreference annotation. In Proceedings of LREC (Vol. 4, pp. 1503–1506).
  28. Poesio, M., & Vieira, R. (1998). A corpus-based investigation of definite description use. Computational Linguistics, 24(2), 183–216.
  29. Schank, R., & Abelson, R. (1977). Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. Hillsdale, NJ: Lawrence Erlbaum.
  30. Siddharthan, A., & Copestake, A. (2004). Generating referring expressions in open domains. In Proceedings of ACL-04 (pp. 407–414). Barcelona, Spain.
  31. Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences. Boston, MA: McGraw Hill.
  32. Silber, G., & McCoy, K. (2002). Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Computational Linguistics, 28(4), 487–496.
  33. Stairmand, M. A. (1997). Textual context analysis for information retrieval. In Proceedings of ACM SIGIR (pp. 140–147). Philadelphia, PA, USA.
  34. Stokes, N., Carthy, J., & Smeaton, A. F. (2004). SeLeCT: A lexical cohesion based news story segmentation system. Journal of AI Communications, 17(1), 3–12.
  35. Vieira, R., & Poesio, M. (2000). An empirically-based system for processing definite descriptions. Computational Linguistics, 26(4), 539–593.
  36. Webber, B., & Byron, D. K. (Eds.). (2004). Proceedings of the ACL-2004 Workshop on Discourse Annotation.

Copyright information

© Springer Science+Business Media 2006

Authors and Affiliations

  1. School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
