Language Resources and Evaluation, Volume 51, Issue 3, pp 727–743

Comparing explicit and predictive distributional semantic models endowed with syntactic contexts

Pablo Gamallo

Original Paper


In this article, we introduce an explicit count-based strategy for building word space models with syntactic contexts (dependencies). A filtering method is defined to reduce the size of the explicit word-context vectors. This traditional strategy is compared with a neural embedding (predictive) model that is also based on syntactic dependencies, using the same parsed corpus for both models. In addition, the dependency-based methods are compared with bag-of-words strategies, both count-based and predictive. The results show that our traditional count-based model with syntactic dependencies outperforms the other strategies, including dependency-based embeddings, but only on tasks focused on discovering similarity between words with the same syntactic function (i.e., near-synonyms).
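The abstract does not spell out the exact construction, but an explicit count-based dependency model of the kind it describes can be sketched minimally as follows: each word is represented by a sparse vector over (relation, context-word) features extracted from parsed dependency triples, features are weighted here by positive PMI (a common choice, assumed for illustration rather than taken from the paper), and words are compared by cosine similarity. All function names and the toy triples are illustrative assumptions.

```python
from collections import Counter
from math import log, sqrt

def ppmi_vectors(triples):
    """Build explicit sparse word vectors from (word, relation, context)
    dependency triples, weighting each (relation, context) feature by
    positive PMI. This is an illustrative sketch, not the paper's method."""
    word_ctx = Counter()   # co-occurrence counts for (word, feature) pairs
    word_tot = Counter()   # marginal counts per word
    ctx_tot = Counter()    # marginal counts per (relation, context) feature
    total = 0
    for word, rel, ctx in triples:
        feat = (rel, ctx)
        word_ctx[(word, feat)] += 1
        word_tot[word] += 1
        ctx_tot[feat] += 1
        total += 1
    vectors = {}
    for (word, feat), n in word_ctx.items():
        pmi = log((n * total) / (word_tot[word] * ctx_tot[feat]))
        if pmi > 0:  # keep only positively associated contexts (PPMI)
            vectors.setdefault(word, {})[feat] = pmi
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Toy parsed corpus: "dog" and "cat" share a dependency context, "car" does not.
triples = [("dog", "dobj", "feed"), ("cat", "dobj", "feed"),
           ("car", "dobj", "drive"), ("car", "nsubj", "crash")]
vecs = ppmi_vectors(triples)
print(cosine(vecs["dog"], vecs["cat"]))  # higher than dog–car similarity
```

Because contexts here are (relation, word) pairs rather than plain co-occurring words, two words only score as similar when they fill the same syntactic slots, which is why such models favor near-synonyms over merely related words.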


Keywords: Word similarity · Word embeddings · Count-based models · Dependency-based semantic models



This research has been partially funded by the Spanish Ministry of Economy and Competitiveness through project FFI2014-51978-C2-1-R. We are very grateful to Omer Levy and Yoav Goldberg for sending us the parsed corpus used to build their embeddings, and to the reviewers for their useful comments and suggestions.



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  1. Centro de Investigación en Tecnoloxías da Información (CITIUS), Campus Vida, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
