Exploring Extensive Linguistic Feature Sets in Near-Synonym Lexical Choice

Paukkeri, Mari-Sanna; Väyrynen, Jaakko; Arppe, Antti

doi:10.1007/978-3-642-28601-8_1


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7182)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

In the near-synonym lexical choice task, the best alternative out of a set of near-synonyms is selected to fill a lexical gap in a text. We experiment with an extensive set of over 650 linguistic features to represent the context of a word, and with a range of machine learning approaches to the lexical choice task. We extend previous work by experimenting with unsupervised and semi-supervised methods, and we use automatic feature selection to cope with the problems arising from the rich feature set. It is natural to think that linguistic analysis of the word context would yield almost perfect performance in the task, but we show that too many features, even linguistic ones, introduce noise and make the task difficult for unsupervised and semi-supervised methods. We also show that purely syntactic features play the biggest role in the performance, but that certain semantic and morphological features are also needed.
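The task setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the information-gain filter and the nearest-neighbour vote are stand-ins chosen for brevity, and the toy contexts, the feature names (such as `obj:abstract`) and the synonym pair are hypothetical. The sketch only shows the general shape of the problem: each context is a set of linguistic features, uninformative features are filtered out automatically, and the near-synonym is chosen from the remaining ones.

```python
from collections import Counter
from math import log2

# Hypothetical toy data (not from the paper): each context is a set of
# active binary features, labelled with the near-synonym that filled the gap.
TRAIN = [
    ({"subj:human", "obj:abstract", "tense:past"}, "ponder"),
    ({"subj:human", "obj:abstract", "tense:pres"}, "ponder"),
    ({"subj:human", "obj:abstract", "noise:a"}, "ponder"),
    ({"subj:human", "obj:clause", "tense:past"}, "think"),
    ({"subj:human", "obj:clause", "tense:pres"}, "think"),
    ({"subj:human", "obj:clause", "noise:a"}, "think"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(feature, data):
    """Reduction in label entropy from splitting on one binary feature."""
    labels = [label for _, label in data]
    with_f = [label for ctx, label in data if feature in ctx]
    without_f = [label for ctx, label in data if feature not in ctx]
    n = len(data)
    remainder = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - remainder


def select_features(data, top_n):
    """Filter-style selection: keep the top_n features by information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = set().union(*(ctx for ctx, _ in data))
    ranked = sorted(vocabulary, key=lambda f: (-information_gain(f, data), f))
    return ranked[:top_n]


def choose_synonym(context, data, features, k=3):
    """Majority vote among the k nearest training contexts.

    Distance is the size of the symmetric difference between feature sets,
    computed over the selected features only.
    """
    keep = set(features)
    target = context & keep
    ranked = sorted(data, key=lambda ex: len((ex[0] & keep) ^ target))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    selected = select_features(TRAIN, 2)
    gap_context = {"subj:human", "obj:clause", "tense:past", "noise:a"}
    print(selected, choose_synonym(gap_context, TRAIN, selected))
```

In this toy data, features such as `subj:human` or `tense:past` occur equally often with both synonyms, so the filter discards them. This mirrors, in a very simplified way, the abstract's point that a large feature set contains noise that selection has to remove.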



Author information

Authors and Affiliations

Aalto University School of Science, P.O. Box 15400, FI-00076, Aalto, Finland
Mari-Sanna Paukkeri & Jaakko Väyrynen
University of Helsinki, Unioninkatu 40 A, FI-00014, Finland
Antti Arppe


Editor information

Editors and Affiliations

Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
Alexander Gelbukh


About this paper

Cite this paper

Paukkeri, MS., Väyrynen, J., Arppe, A. (2012). Exploring Extensive Linguistic Feature Sets in Near-Synonym Lexical Choice. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_1


DOI: https://doi.org/10.1007/978-3-642-28601-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28600-1
Online ISBN: 978-3-642-28601-8
eBook Packages: Computer Science, Computer Science (R0)
