Exploring Resources for Lexical Chaining: A Comparison of Automated Semantic Relatedness Measures and Human Judgments

Modeling, Learning, and Processing of Text Technological Data Structures

Part of the book series: Studies in Computational Intelligence (SCI, volume 370)

Abstract

In the past decade, various semantic relatedness, similarity, and distance measures have been proposed, and they play a crucial role in many NLP applications. Researchers compete for better algorithms (and for resources to base the algorithms on), and often only a few percentage points seem to suffice to prove a new measure (or resource) more accurate than an older one. However, it is still unclear which of them performs best under which conditions. In this work we therefore present a study comparing various relatedness measures. We evaluate them on the basis of a human judgment experiment and also examine several practical issues, such as run time and coverage. We show that the performance of all measures, as compared to human estimates, is still mediocre, and we argue that the definition of a shared task might bring us considerably closer to results of high quality.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Cramer, I., Wandmacher, T., Waltinger, U. (2011). Exploring Resources for Lexical Chaining: A Comparison of Automated Semantic Relatedness Measures and Human Judgments. In: Mehler, A., Kühnberger, K.-U., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_18

  • DOI: https://doi.org/10.1007/978-3-642-22613-7_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22612-0

  • Online ISBN: 978-3-642-22613-7

  • eBook Packages: Engineering, Engineering (R0)
