Vector Space Models in Detection of Semantically Non-compositional Word Combinations in Turkish

  • Levent Tolga Eren
  • Senem Kumova Metin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11179)


Semantic compositionality describes the relation between the meaning of a word combination and the meanings of its components. Put simply, in non-compositional expressions the words combine to produce a meaning that differs from the meanings of the individual words. For this reason, the identification of non-compositional expressions (e.g. idioms) is important in natural language processing tasks such as machine translation and word sense disambiguation.

In this study, we explored the performance of vector space models in the detection of non-compositional expressions in Turkish. We used a data set of 2229 uninterrupted two-word combinations built from six different Turkish corpora. Three sets of five different vector space models were employed in the experiments. The models were evaluated using the well-known accuracy and F-measure metrics. The experimental results showed that the model that measures the similarity between the vector of the word combination and the vector of its second component word produced higher average F-scores for all testing corpora.
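To make the best-performing idea concrete, the following is a minimal sketch of comparing the vector of a two-word combination with the vector of its second word. It is not the authors' implementation: the abstract does not specify how the vectors are built, so the co-occurrence counting, the window size, the cosine measure, and the function names (`context_vector`, `cosine`, `second_word_similarity`) are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_vector(target, sentences, window=2):
    """Count the words that occur within `window` tokens of `target`.

    `target` is a tuple of tokens (one token for a word, two for a
    two-word combination). This counting scheme is an assumption.
    """
    n = len(target)
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                counts.update(left + right)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no context observed for one of the targets
    return dot / (norm_u * norm_v)


def second_word_similarity(bigram, sentences, window=2):
    """Similarity between the combination's vector and its second word's vector."""
    combo_vec = context_vector(tuple(bigram), sentences, window)
    second_vec = context_vector((bigram[1],), sentences, window)
    return cosine(combo_vec, second_vec)


if __name__ == "__main__":
    # Toy tokenized corpus with placeholder tokens, not data from the paper.
    corpus = [
        ["x", "hot", "dog", "y"],
        ["z", "dog", "y"],
    ]
    print(second_word_similarity(("hot", "dog"), corpus))
```

Under these assumptions, a low score would suggest that the combination appears in contexts unlike those of its second word, which is one possible signal of non-compositionality. How the score is turned into a class label is not stated in the abstract and is left out of the sketch.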


Semantic compositionality, Vector space model, Turkish



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Faculty of Engineering, İzmir University of Economics, İzmir, Turkey
