Lexical Chains Using Distributional Measures of Concept Distance

  • Meghana Marathe
  • Graeme Hirst
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6008)

Abstract

In practice, lexical chains are typically built using term reiteration or resource-based measures of semantic distance. The former approach misses much of the semantic information inherent in a text, while the latter is limited by the linguistic resource on which it depends.

In this paper, chains are constructed using the framework of distributional measures of concept distance, which combines the advantages of resource-based and distributional measures of semantic distance. These chains were evaluated by applying them to the task of text segmentation, where they performed as well as or better than state-of-the-art methods.
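
As a rough illustration of the general idea of lexical chaining, and not the procedure used in the paper, the sketch below greedily groups words into chains using a pluggable distance function. The names `build_chains` and `reiteration_distance`, the threshold value, and the greedy attachment rule are all assumptions made for this example. The default distance models plain term reiteration; a resource-based or distributional measure could be passed in its place.

```python
from typing import Callable, List


def reiteration_distance(a: str, b: str) -> float:
    """Toy distance (an assumption for this sketch): 0.0 if the two
    words are the same ignoring case, 1.0 otherwise."""
    return 0.0 if a.lower() == b.lower() else 1.0


def build_chains(
    words: List[str],
    distance: Callable[[str, str], float] = reiteration_distance,
    threshold: float = 0.5,
) -> List[List[str]]:
    """Greedy chaining: attach each word to the closest existing chain
    whose nearest member is within `threshold`; otherwise start a new chain."""
    chains: List[List[str]] = []
    for word in words:
        best_chain = None
        best_dist = threshold
        for chain in chains:
            # Distance from the word to a chain = distance to its nearest member.
            d = min(distance(word, member) for member in chain)
            if d <= threshold and (best_chain is None or d < best_dist):
                best_chain, best_dist = chain, d
        if best_chain is None:
            chains.append([word])
        else:
            best_chain.append(word)
    return chains


if __name__ == "__main__":
    print(build_chains(["bank", "river", "bank", "money", "river"]))
```

With the default reiteration distance, only repeated words end up in the same chain, which is the limitation the abstract attributes to that approach. Swapping in a graded distance lets related but non-identical words share a chain.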

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Meghana Marathe (1)
  • Graeme Hirst (1)
  1. University of Toronto, Toronto
