Abstract
Language learning systems usually generalize linguistic observations into rules and patterns that act as statistical models of higher-level semantic inferences. When training data are scarce, lexical information suffers from data sparseness, so generalization is needed. Distributional models represent lexical semantic information in terms of the basic co-occurrences between words in large-scale text collections. As recent work has shown, defining proper distributional models, together with methods that express the meaning of phrases or sentences as operations on lexical representations, is a complex problem and still a largely open issue. This paper discusses a perspective centered on Convolution Kernels and studies the formulation of a Partial Tree Kernel that integrates syntactic information and lexical generalization. It also provides a large-scale investigation of different representation spaces, each capturing a different linguistic relation.
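As a rough illustration of the idea that distributional models represent words through their co-occurrences, the following minimal sketch builds window-based co-occurrence vectors and compares them with cosine similarity. The toy corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are assumptions made for this example; they are not taken from the chapter, whose actual representation spaces may be built differently.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]
vectors = cooccurrence_vectors(corpus, window=2)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
```

In this toy setting, words that occur in similar contexts ("cat" and "dog") should score higher than words that merely occur together ("cat" and "milk"). Real distributional models typically add weighting and dimensionality reduction on top of raw counts.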
Notes
- 1.
Note that SVD emphasizes the directions of maximal covariance in \(M\), i.e. the term clusters for which the difference between contexts (short syntagmatic patterns) is maximal.
- 2.
When \(n_1\) and \(n_2\) are not lexical nodes, \(\sigma\) is 0 whenever \(n_1 \ne n_2\).
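The behaviour of \(\sigma\) stated in note 2 can be sketched as a node similarity function. This is only a hedged illustration: the `Node` type, the `word_space` dictionary, the use of cosine similarity for lexical nodes, and the value 1 for identical non-lexical nodes are all assumptions of this example. The notes confirm only that \(\sigma\) is 0 for different non-lexical nodes.

```python
import math
from typing import Dict, NamedTuple


class Node(NamedTuple):
    """Hypothetical tree node: a label plus a flag marking lexical (word) nodes."""
    label: str
    is_lexical: bool


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sigma(n1: Node, n2: Node, word_space: Dict[str, Dict[str, float]]) -> float:
    """Node similarity sketch.

    Lexical nodes: cosine similarity in `word_space` (assumption of this sketch).
    Other nodes: 0 when the nodes differ (as in note 2), 1 when equal (assumption).
    """
    if n1.is_lexical and n2.is_lexical:
        if n1.label in word_space and n2.label in word_space:
            return cosine(word_space[n1.label], word_space[n2.label])
        # Fall back to exact matching for words missing from the space.
        return 1.0 if n1.label == n2.label else 0.0
    return 1.0 if n1 == n2 else 0.0


# Toy word space, invented for illustration only.
word_space = {
    "cat": {"pet": 1.0, "animal": 1.0},
    "dog": {"pet": 1.0, "animal": 1.0, "bark": 1.0},
}
np_vp = sigma(Node("NP", False), Node("VP", False), word_space)
np_np = sigma(Node("NP", False), Node("NP", False), word_space)
cat_dog = sigma(Node("cat", True), Node("dog", True), word_space)
```

Under these assumptions, syntactic labels match only exactly, while lexical nodes receive a graded score, which is one way a tree kernel can combine syntactic information with lexical generalization.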
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Croce, D., Filice, S., Basili, R. (2015). Distributional Models for Lexical Semantics: An Investigation of Different Representations for Natural Language Learning. In: Basili, R., Bosco, C., Delmonte, R., Moschitti, A., Simi, M. (eds) Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project. Studies in Computational Intelligence, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-319-14206-7_6
DOI: https://doi.org/10.1007/978-3-319-14206-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14205-0
Online ISBN: 978-3-319-14206-7
eBook Packages: Engineering, Engineering (R0)