Exploring Resources for Lexical Chaining: A Comparison of Automated Semantic Relatedness Measures and Human Judgments

Cramer, Irene; Wandmacher, Tonio; Waltinger, Ulli

doi:10.1007/978-3-642-22613-7_18


Part of the book series: Studies in Computational Intelligence (SCI, volume 370)


Abstract

In the past decade, various semantic relatedness, similarity, and distance measures have been proposed, and they play a crucial role in many NLP applications. Researchers compete for better algorithms (and for better resources on which to base them), and a few percentage points often seem sufficient to demonstrate that a new measure (or resource) is more accurate than an older one. However, it is still unclear which measure performs best under which conditions. In this work, we therefore present a study comparing various relatedness measures. We evaluate them against the results of a human judgment experiment and also examine several practical issues, such as run time and coverage. We show that the performance of all measures, as compared to human estimates, is still mediocre, and we argue that defining a shared task might bring us considerably closer to high-quality results.
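The abstract does not say how agreement with human estimates is scored. A common choice in this line of work is a rank correlation between a measure's scores and mean human ratings, computed only over the word pairs the measure covers, with coverage reported alongside. The sketch below assumes that setup; the helper names (`average_ranks`, `pearson`, `spearman`) and all ratings and scores are invented for illustration and are not taken from the chapter.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: (word pair, mean human rating, measure score or None).
# None marks a pair the measure cannot score (e.g. a word missing from its resource).
pairs = [
    (("car", "automobile"), 3.9, 0.81),
    (("coast", "shore"), 3.6, 0.95),
    (("noon", "string"), 0.1, 0.05),
    (("cord", "smile"), 0.0, None),
    (("forest", "graveyard"), 0.8, 0.30),
]

covered = [(human, score) for _, human, score in pairs if score is not None]
coverage = len(covered) / len(pairs)
rho = spearman([h for h, _ in covered], [s for _, s in covered])
print(f"coverage={coverage:.2f} rho={rho:.2f}")
```

Reporting coverage next to the correlation matters because a measure that skips difficult pairs can reach a higher correlation on a smaller, easier subset.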



Author information

Authors and Affiliations

Institute for German Language and Literature, Technische Universität Dortmund, Emil-Figge-Straße 50, D-44221, Dortmund, Germany
Irene Cramer
Systran S.A., Paris, France
Tonio Wandmacher
Faculty of Technology, Bielefeld University, Universitätsstraße 25, D-33615, Bielefeld, Germany
Ulli Waltinger


Editor information

Editors and Affiliations

Faculty of Linguistics and Literature, Bielefeld University, Universitätsstraße 25, 33615, Bielefeld, Germany
Alexander Mehler
Institute of Cognitive Science, University of Osnabrück, Albrechtstr. 28, 49076, Osnabrück, Germany
Kai-Uwe Kühnberger
Angewandte Sprachwissenschaft und Computerlinguistik, Justus-Liebig-Universität Gießen, Otto-Behaghel-Straße 10D, 35394, Gießen, Germany
Henning Lobin & Harald Lüngen
Institut für deutsche Sprache und Literatur, Technical University Dortmund, Emil-Figge-Straße 50, 44227, Dortmund, Germany
Angelika Storrer
SFB 441 Linguistic Data Structures, Eberhard Karls Universität Tübingen, Nauklerstraße 35, 72074, Tübingen, Germany
Andreas Witt


About this chapter

Cite this chapter

Cramer, I., Wandmacher, T., Waltinger, U. (2011). Exploring Resources for Lexical Chaining: A Comparison of Automated Semantic Relatedness Measures and Human Judgments. In: Mehler, A., Kühnberger, KU., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_18


DOI: https://doi.org/10.1007/978-3-642-22613-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22612-0
Online ISBN: 978-3-642-22613-7
eBook Packages: Engineering, Engineering (R0)
