Distributional Models for Lexical Semantics: An Investigation of Different Representations for Natural Language Learning

  • Danilo Croce
  • Simone Filice
  • Roberto Basili
Part of the Studies in Computational Intelligence book series (SCI, volume 589)


Language learning systems usually generalize linguistic observations into rules and patterns that are statistical models of higher-level semantic inferences. When training data are scarce, lexical information is limited by data sparseness, and generalization is therefore needed. Distributional models represent lexical semantic information in terms of the basic co-occurrences between words in large-scale text collections. As recent works have already noted, defining proper distributional models, as well as methods that express the meaning of phrases or sentences as operations on lexical representations, is a complex and still largely open problem. In this paper, a perspective centered on Convolution Kernels is discussed, and a formulation of the Partial Tree Kernel that integrates syntactic information and lexical generalization is studied. Moreover, a large-scale investigation of different representation spaces, each capturing a different linguistic relation, is provided.
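To make the idea of a distributional model concrete, here is a minimal sketch, assuming a toy whitespace-tokenized corpus, raw co-occurrence counts within a symmetric window of two tokens, and cosine similarity between the resulting word vectors. The corpus, the function names `cooccurrence_space` and `cosine`, and the window size are hypothetical choices for illustration, not the authors' implementation; the spaces studied in the chapter are built from large-scale text collections.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_space(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    space = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target word itself
                    space[target][tokens[j]] += 1
    return space


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


# Hypothetical toy corpus: "cat" and "dog" share contexts, "car" does not.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the cat".split(),
    "a car needs fuel".split(),
]
space = cooccurrence_space(corpus, window=2)
print(round(cosine(space["cat"], space["dog"]), 3))  # 0.923
print(round(cosine(space["cat"], space["car"]), 3))  # 0.0
```

Words that occur in similar contexts ("cat", "dog") receive a high similarity, while words with no shared context ("cat", "car") score zero. In practice, raw counts are typically reweighted (for example with pointwise mutual information) and reduced in dimensionality before such similarities are used for generalization.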


Keywords: Distributional lexical semantics, Kernel methods, Question classification



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computer Science, Systems and Production, University of Roma Tor Vergata, Rome, Italy
