Exploring Extensive Linguistic Feature Sets in Near-Synonym Lexical Choice

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7182)

Abstract

In the near-synonym lexical choice task, the best alternative out of a set of near-synonyms is selected to fill a lexical gap in a text. We experiment with an extensive set of over 650 linguistic features to represent the context of a word, and with a range of machine learning approaches to the lexical choice task. We extend previous work by experimenting with unsupervised and semi-supervised methods, and we use automatic feature selection to cope with the problems arising from the rich feature set. It is natural to think that linguistic analysis of the word context would yield almost perfect performance in the task, but we show that too many features, even linguistic ones, introduce noise and make the task difficult for unsupervised and semi-supervised methods. We also show that purely syntactic features play the biggest role in performance, but that certain semantic and morphological features are also needed.
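
To make the task setup concrete, the following is a minimal sketch of lexical choice as classification with a feature-selection step in front. It is not the authors' method: the near-synonym pair, the feature names, the toy data, the ranking criterion in `select_features`, and the supervised Bernoulli naive Bayes scorer are all assumptions chosen for brevity, whereas the paper itself works with over 650 features and also covers unsupervised and semi-supervised methods.

```python
import math
from collections import Counter, defaultdict

# Toy training data, invented for illustration: each context is a set of
# hypothetical feature names, labelled with the near-synonym that filled the gap.
TRAIN = [
    ({"subj:human", "obj:abstract", "tense:past"}, "ponder"),
    ({"subj:human", "obj:abstract", "tense:present"}, "ponder"),
    ({"subj:human", "obj:clause", "tense:past"}, "think"),
    ({"subj:human", "obj:clause", "tense:present"}, "think"),
]


def select_features(data, k):
    """Keep the k features whose relative frequency differs most across labels.

    This simple spread criterion is an assumption, not the paper's selector.
    """
    label_counts = Counter(label for _, label in data)
    feat_counts = defaultdict(Counter)
    for feats, label in data:
        for f in feats:
            feat_counts[f][label] += 1

    def spread(f):
        rates = [feat_counts[f][lab] / label_counts[lab] for lab in label_counts]
        return max(rates) - min(rates)

    # Sort by descending spread, breaking ties alphabetically for determinism.
    ranked = sorted(feat_counts, key=lambda f: (-spread(f), f))
    return set(ranked[:k])


def train(data, selected):
    """Count, per label, how often each selected feature occurs."""
    label_counts = Counter(label for _, label in data)
    feat_counts = defaultdict(Counter)
    for feats, label in data:
        for f in feats & selected:
            feat_counts[label][f] += 1
    return label_counts, feat_counts


def choose(context, model, selected):
    """Return the near-synonym with the highest naive Bayes log-score."""
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)
        for f in selected:
            # Add-one smoothed probability that feature f is present.
            p = (feat_counts[label][f] + 1) / (n + 2)
            score += math.log(p if f in context else 1.0 - p)
        scores[label] = score
    return max(sorted(scores), key=scores.get)


selected = select_features(TRAIN, k=2)
model = train(TRAIN, selected)
print(choose({"subj:human", "obj:clause", "tense:past"}, model, selected))
```

On this toy data, features shared equally by both candidates (such as `subj:human`) are discarded by the selection step, which illustrates on a small scale why pruning uninformative features can matter when the feature set is large.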

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Paukkeri, MS., Väyrynen, J., Arppe, A. (2012). Exploring Extensive Linguistic Feature Sets in Near-Synonym Lexical Choice. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_1

  • DOI: https://doi.org/10.1007/978-3-642-28601-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28600-1

  • Online ISBN: 978-3-642-28601-8

  • eBook Packages: Computer Science, Computer Science (R0)
