Semantic Tree Kernels for Statistical Natural Language Learning

Croce, Danilo; Basili, Roberto; Moschitti, Alessandro

doi:10.1007/978-3-319-14206-7_5

Part of the book series: Studies in Computational Intelligence (SCI, volume 589)

Abstract

A central topic in Natural Language Processing (NLP) is the design of effective linguistic processors suited to their target applications. In this scenario, Convolution Kernels provide a powerful way to apply Machine Learning algorithms directly to complex structures that represent linguistic information. This work defines the semantically Smoothed Partial Tree Kernel (SPTK), which generalizes one of the best-performing Convolution Kernels, the Tree Kernel (TK), by extending the similarity between tree structures with similarities between nodes. The main characteristic of SPTK is its ability to measure the similarity between syntactic tree structures that are only partially similar and whose nodes may differ but are nevertheless semantically related. An important consequence is that external lexical information can be embedded in the kernel function solely through a similarity function among lexical nodes. SPTK has been evaluated on three complex automatic Semantic Processing tasks: Question Classification in Question Answering, Verb Classification, and Semantic Role Labeling. Although these tasks address different problems, state-of-the-art results were achieved in every evaluation.
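
The idea of smoothing a tree kernel with node similarities can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the exact definition in the chapter may differ. It assumes a recursion of the Partial Tree Kernel kind in which every matched node pair is weighted by a similarity score. The names node, sigma, RELATED, delta and kernel, the decay parameters lam and mu with their default values, and the 0.8 score for the pair car/automobile are all hypothetical choices made for illustration. Child subsequences are enumerated by brute force, which is only practical for very small trees.

```python
from itertools import combinations

# A tree node is a (label, children) pair; leaves have an empty child list.
def node(label, *children):
    return (label, list(children))

# Hypothetical toy lexical similarity: 1.0 for identical labels,
# a hand-set score for a few related word pairs, 0.0 otherwise.
RELATED = {frozenset(("car", "automobile")): 0.8}

def sigma(a, b):
    if a == b:
        return 1.0
    return RELATED.get(frozenset((a, b)), 0.0)

def delta(n1, n2, lam=0.4, mu=0.4):
    """Smoothed contribution of the node pair (n1, n2) (assumed recursion)."""
    (l1, c1), (l2, c2) = n1, n2
    s = sigma(l1, l2)
    if s == 0.0:
        return 0.0                      # unrelated nodes contribute nothing
    if not c1 and not c2:
        return mu * lam * s             # two leaves: decayed node similarity
    total = lam ** 2
    # Sum over all pairs of equal-length child subsequences (gaps allowed).
    for k in range(1, min(len(c1), len(c2)) + 1):
        for i1 in combinations(range(len(c1)), k):
            for i2 in combinations(range(len(c2)), k):
                # Width spanned by each subsequence; gaps are penalised by lam.
                span = (i1[-1] - i1[0] + 1) + (i2[-1] - i2[0] + 1)
                prod = 1.0
                for a, b in zip(i1, i2):
                    prod *= delta(c1[a], c2[b], lam, mu)
                    if prod == 0.0:
                        break
                total += (lam ** span) * prod
    return mu * s * total

def nodes(tree):
    """Yield every node of the tree, root first."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def kernel(t1, t2, lam=0.4, mu=0.4):
    """Sum the smoothed contributions over all node pairs of the two trees."""
    return sum(delta(a, b, lam, mu) for a in nodes(t1) for b in nodes(t2))

t_car = node("drive", node("car"))
t_auto = node("drive", node("automobile"))
t_banana = node("drive", node("banana"))
print(kernel(t_car, t_auto), kernel(t_car, t_banana))
```

With these toy values, the tree containing "car" scores higher against the tree containing "automobile" than against the one containing "banana", even though neither pair shares the leaf word. All lexical knowledge enters through sigma alone, which is the property the abstract highlights.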

Notes

1. It implies that \(n_c(n_{1})=n_c(n_{2})\).
2. http://disi.unitn.it/moschitti/Tree-Kernel.htm.
3. http://cogcomp.cs.illinois.edu/Data/QA/QC/.
4. Note that in [37], higher accuracy values for the smoothed STK are reported for different parameters, but the best setting according to a validation set is not highlighted.
5. The average running time of the SK is much higher than that of PTK. When a tree consists of only one level, PTK collapses to SK.
6. Using one of the 8 processors of an Intel(R) Xeon(R) CPU E5430 @ 2.66 GHz machine with 32 GB of RAM.

References

1. Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Boston (1999)
2. Kwok, C.C., Etzioni, O., Weld, D.S.: Scaling question answering to the web. In: World Wide Web, pp. 150–161 (2001)
3. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1–2), 1–135 (2008)
4. Jelinek, F.: Statistical Methods for Speech Recognition. The MIT Press, Cambridge (1998)
5. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
6. Vapnik, V.N.: Statistical Learning Theory. Wiley-Interscience, New York (1998)
7. Collins, M., Duffy, N.: Convolution kernels for natural language. In: Proceedings of Neural Information Processing Systems (NIPS’2001), pp. 625–632 (2001)
8. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press (2004)
9. Johansson, R., Nugues, P.: The effect of syntactic representation on semantic role labeling. In: Proceedings of COLING, Manchester, 18–22 Aug 2008
10. Padó, S., Lapata, M.: Dependency-based construction of semantic space models. Comput. Linguist. 33(2) (2007)
11. Sahlgren, M.: The Word-space model. PhD thesis, Stockholm University (2006)
12. Schütze, H.: Word space. In: Advances in Neural Information Processing Systems 5, pp. 895–902. Morgan Kaufmann (1993)
13. Li, X., Roth, D.: Learning question classifiers. In: Proceedings of ACL’02 (2002)
14. Brown, S.W., Dligach, D., Palmer, M.: VerbNet class assignment as a WSD task. In: Proceedings of the Ninth International Conference on Computational Semantics, IWCS’11, pp. 85–94. Association for Computational Linguistics, Stroudsburg (2011)
15. Gildea, D., Palmer, M.: The necessity of parsing for predicate argument recognition. In: Proceedings of the 40th Annual Conference of the Association for Computational Linguistics (ACL-02), Philadelphia (2002)
16. Gildea, D., Jurafsky, D.: Automatic labeling of semantic roles. Comput. Linguist. 28(3), 245–288 (2002)
17. Fillmore, C.J.: Frames and the semantics of understanding. Quaderni di Semantica 4(2), 222–254 (1985)
18. Palmer, M., Kingsbury, P., Gildea, D.: The proposition bank: an annotated corpus of semantic roles. Comput. Linguist. 31(1), 71–106 (2005)
19. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of COLING-ACL, Montreal, Canada (1998)
20. Carreras, X., Màrquez, L.: Introduction to the CoNLL-2005 shared task: semantic role labeling. In: Proceedings of CoNLL-2005, Ann Arbor, Michigan, June 2005, pp. 152–164
21. Pradhan, S., Hacioglu, K., Krugler, V., Ward, W., Martin, J.H.: Support vector learning for semantic argument classification. Mach. Learn. J. 60(1–3), 11–39 (2005)
22. Coppola, B., Moschitti, A., Riccardi, G.: Shallow semantic parsing for spoken language understanding. In: Proceedings of NAACL’09, pp. 85–88. Morristown, NJ (2009)
23. Moschitti, A., Pighin, D., Basili, R.: Tree kernels for semantic role labeling. Comput. Linguist. 34(2), 193–224 (2008)
24. Firth, J.: A synopsis of linguistic theory 1930–1955. In: Studies in Linguistic Analysis. Philological Society, Oxford (1957) reprinted in Palmer, F. (ed.) Selected Papers of J. R. Firth, Longman, Harlow (1968)
25. Wittgenstein, L.: Philosophical Investigations. Blackwell, Oxford (1953)
26. Pantel, P., Bhagat, R., Coppola, B., Chklovski, T., Hovy, E.: ISP: Learning inferential selectional preferences. In: Proceedings of HLT/NAACL (2007)
27. Turney, P.D., Pantel, P.: From frequency to meaning: Vector space models of semantics. J. Artif. Intell. Res. 37, 141–188 (2010)
28. Landauer, T., Dumais, S.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychol. Rev. 104 (1997)
29. Golub, G., Kahan, W.: Calculating the singular values and pseudo-inverse of a matrix. J. Soc. Ind. Appl. Math.: Ser. B, Numer. Anal. 2(2), 205–224 (1965)
30. Cristianini, N., Shawe-Taylor, J., Lodhi, H.: Latent semantic kernels. In: Brodley, C., Danyluk, A. (eds.) Proceedings of the 18th International Conference on Machine Learning (ICML-01), pp. 66–73. Morgan Kaufmann Publishers, San Francisco, Williams College (2001)
31. Mitchell, J., Lapata, M.: Composition in distributional models of semantics. Cogn. Sci. 34, 1388–1429 (2010)
32. Annesi, P., Storch, V., Basili, R.: Space projections as distributional models for semantic composition. In: Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing’12. Springer (2012)
33. Baroni, M., Lenci, A.: One distributional memory, many semantic spaces. In: Proceedings of the GEMS 2009 Workshop. GEMS’09, pp. 1–8. Stroudsburg (2009)
34. Clark, S., Pulman, S.: Combining symbolic and distributional models of meaning. In: Proceedings of the AAAI Spring Symposium on Quantum Interaction, pp. 52–55 (2007)
35. Grefenstette, E., Sadrzadeh, M.: Experimental support for a categorical compositional distributional model of meaning. In: Proceedings of EMNLP 2011, Edinburgh, Scotland, UK (2011)
36. Zanzotto, F.M., Korkontzelos, I., Fallucchi, F., Manandhar, S.: Estimating linear models for compositional distributional semantics. In: Proceedings of the 23rd International Conference on Computational Linguistics (COLING’10), pp. 1263–1271. Association for Computational Linguistics, Stroudsburg (2010)
37. Bloehdorn, S., Moschitti, A.: Structure and semantics for expressive text kernels. In: Proceedings of CIKM (2007)
38. Mehdad, Y., Moschitti, A., Zanzotto, F.M.: Syntactic/semantic structures for textual entailment recognition. In: HLT-NAACL, pp. 1020–1028 (2010)
39. Moschitti, A.: Efficient convolution kernels for dependency and constituent syntactic trees. In: 17th European Conference on Machine Learning, Proceedings, Machine Learning: ECML 2006, pp. 318–329. ECML, Berlin, Germany, Sept 2006
40. Cancedda, N., Gaussier, E., Goutte, C., Renders, J.M.: Word sequence kernels. J. Mach. Learn. Res. 3, 1059–1082 (2003)
41. Joachims, T.: Estimating the generalization performance of a SVM efficiently. In: Proceedings of ICML’00 (2000)
42. Charniak, E.: A maximum-entropy-inspired parser. In: Proceedings of NAACL’00 (2000)
43. Johansson, R., Nugues, P.: Dependency-based syntactic-semantic analysis with PropBank and NomBank. In: Proceedings of the Twelfth Conference on Natural Language Learning (CoNLL 2008), pp. 183–187. Manchester (2008)
44. Baroni, M., Bernardini, S., Ferraresi, A., Zanchetta, E.: The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Lang. Resour. Eval. 43(3), 209–226 (2009)
45. Yeh, A.S.: More accurate tests for the statistical significance of result differences. In: COLING, pp. 947–953 (2000)
46. Padó, S.: User’s guide to sigf: significance testing by approximate randomisation (2006)
47. Zhang, D., Lee, W.S.: Question classification using support vector machines. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 26–32. ACM Press (2003)
48. Schuler, K.K.: VerbNet: A broad-coverage, comprehensive verb lexicon. PhD thesis, University of Pennsylvania (2005)
49. Loper, E., Yi, S.-T., Palmer, M.: Combining lexical resources: mapping between PropBank and VerbNet. In: Proceedings of the 7th International Workshop on Computational Linguistics (2007)
50. Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004)

Author information

Authors and Affiliations

Department of Computer Science, Systems and Production, University of Rome Tor Vergata, Rome, Italy
Danilo Croce & Roberto Basili
Department of Computer Science and Information Engineering, University of Trento, Povo (TN), Italy
Alessandro Moschitti

Corresponding author

Correspondence to Danilo Croce.

Editor information

Editors and Affiliations

Department of Computer Science, Systems and Production, University of Rome Tor Vergata, Rome, Italy
Roberto Basili
Department of Computer Science, University of Turin, Turin, Italy
Cristina Bosco
Department of Language and Cultural Studies, Department of Computer Science, Ca’ Foscari University of Venice, Venezia, Italy
Rodolfo Delmonte
Department of Computer Science and Information Engineering, University of Trento, Trento, Italy
Alessandro Moschitti
Department of Computer Science, University of Pisa, Pisa, Italy
Maria Simi

About this chapter

Cite this chapter

Croce, D., Basili, R., Moschitti, A. (2015). Semantic Tree Kernels for Statistical Natural Language Learning. In: Basili, R., Bosco, C., Delmonte, R., Moschitti, A., Simi, M. (eds) Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project. Studies in Computational Intelligence, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-319-14206-7_5

DOI: https://doi.org/10.1007/978-3-319-14206-7_5
Published: 15 January 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14205-0
Online ISBN: 978-3-319-14206-7
eBook Packages: Engineering, Engineering (R0)
