Verb Class Discovery from Rich Syntactic Data

Sun, Lin; Korhonen, Anna; Krymolowski, Yuval

doi:10.1007/978-3-540-78135-6_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Previous research has shown that syntactic features are the most informative features in automatic verb classification. We investigate their optimal characteristics by comparing a range of feature sets extracted from data where the proportion of verbal arguments and adjuncts is controlled. The data are obtained from different versions of VALEX [1], a large subcategorization frame (SCF) lexicon for English which was acquired automatically from several corpora and the Web. We evaluate the feature sets thoroughly using four supervised classifiers and one unsupervised method. The best performing feature set includes rich syntactic information about both arguments and adjuncts of verbs. When combined with our best performing classifier (a novel Gaussian classifier), it yields the promising accuracy of 64.2% in classifying 204 verbs into 17 Levin (1993) classes. We discuss the impact of our results on the state of the art and propose avenues for future work.
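To make the setup in the abstract concrete, the following is a minimal sketch of supervised verb classification from SCF frequency distributions. It is not the authors' novel Gaussian classifier, whose details are not given here; it swaps in a generic per-class Gaussian with diagonal covariance. The frame inventory (`FRAMES`), the class labels (`class_A`, `class_B`), the verb counts, and the `var_floor` parameter are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical SCF inventory; a real lexicon distinguishes many more frames.
FRAMES = ["NP", "NP_PP", "INTRANS", "S_COMP"]


def scf_distribution(counts, frames=FRAMES):
    """Turn raw SCF counts for one verb into relative frequencies."""
    total = sum(counts.get(f, 0) for f in frames)
    return [counts.get(f, 0) / total if total else 0.0 for f in frames]


class DiagonalGaussianClassifier:
    """Generic per-class Gaussian with diagonal covariance (illustrative baseline)."""

    def __init__(self, var_floor=1e-3):
        self.var_floor = var_floor  # keeps variances away from zero on tiny samples
        self.params = {}

    def fit(self, vectors, labels):
        by_class = defaultdict(list)
        for vec, label in zip(vectors, labels):
            by_class[label].append(vec)
        for label, rows in by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [
                max(sum((x - m) ** 2 for x in col) / n, self.var_floor)
                for col, m in zip(columns, means)
            ]
            self.params[label] = (means, variances, n / len(vectors))
        return self

    def log_score(self, vec, label):
        """Log prior plus the sum of per-feature Gaussian log densities."""
        means, variances, prior = self.params[label]
        score = math.log(prior)
        for x, m, v in zip(vec, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    def predict(self, vec):
        return max(self.params, key=lambda label: self.log_score(vec, label))


# Toy training data: invented counts for four unnamed verbs in two classes.
train_counts = [
    ({"NP": 5, "NP_PP": 90, "INTRANS": 5}, "class_A"),
    ({"NP": 10, "NP_PP": 80, "INTRANS": 10}, "class_A"),
    ({"NP": 20, "S_COMP": 75, "INTRANS": 5}, "class_B"),
    ({"NP": 15, "S_COMP": 80, "INTRANS": 5}, "class_B"),
]
train_vectors = [scf_distribution(c) for c, _ in train_counts]
train_labels = [label for _, label in train_counts]

clf = DiagonalGaussianClassifier().fit(train_vectors, train_labels)

# An unseen verb that mostly takes the NP_PP frame.
unseen = scf_distribution({"NP": 8, "NP_PP": 85, "INTRANS": 7})
print(clf.predict(unseen))
```

Relative frequencies are used rather than raw counts so that frequent and rare verbs are comparable; the variance floor is a pragmatic guard for very small classes and is an assumption of this sketch, not something taken from the paper.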

References

[1] Korhonen, A., Krymolowski, Y., Briscoe, T.: A large subcategorization lexicon for natural language processing applications. In: Proceedings of LREC (2006)
[2] Merlo, P., Stevenson, S.: Automatic verb classification based on statistical distributions of argument structure. Computational Linguistics 27, 373–408 (2001)
[3] Korhonen, A., Krymolowski, Y., Collier, N.: Automatic classification of verbs in biomedical texts. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL, pp. 345–352 (2006)
[4] Schulte im Walde, S.: Experiments on the automatic induction of German semantic verb classes. Computational Linguistics 32, 159–194 (2006)
[5] Joanis, E., Stevenson, S., James, D.: A general feature space for automatic verb classification. Natural Language Engineering (forthcoming, 2007)
[6] Dorr, B.J.: Large-scale dictionary construction for foreign language tutoring and interlingual machine translation. Machine Translation 12, 271–322 (1997)
[7] Prescher, D., Riezler, S., Rooth, M.: Using a probabilistic class-based lexicon for lexical ambiguity resolution. In: 18th International Conference on Computational Linguistics, Saarbrücken, Germany, pp. 649–655 (2000)
[8] Swier, R., Stevenson, S.: Unsupervised semantic role labelling. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, pp. 95–102 (2004)
[9] Dang, H.T.: Investigations into the Role of Lexical Semantics in Word Sense Disambiguation. PhD thesis, CIS, University of Pennsylvania (2004)
[10] Shi, L., Mihalcea, R.: Putting pieces together: Combining FrameNet, VerbNet and WordNet for robust semantic parsing. In: Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, Mexico (2005)
[11] Jackendoff, R.: Semantic Structures. MIT Press, Cambridge (1990)
[12] Levin, B.: English Verb Classes and Alternations. Chicago University Press, Chicago (1993)
[13] Miller, G.A.: WordNet: An on-line lexical database. International Journal of Lexicography 3, 235–312 (1990)
[14] Schulte im Walde, S.: Clustering verbs semantically according to their alternation behaviour. In: Proceedings of COLING, Saarbrücken, Germany, pp. 747–753 (2000)
[15] Kipper, K., Dang, H.T., Palmer, M.: Class-based construction of a verb lexicon. In: AAAI/IAAI, pp. 691–696 (2000)
[16] Briscoe, E.J., Carroll, J.: Automatic extraction of subcategorization from corpora. In: Proceedings of the 5th ACL Conference on Applied Natural Language Processing, Washington DC, pp. 356–363 (1997)
[17] Briscoe, E.J., Carroll, J.: Robust accurate statistical annotation of general text. In: Proceedings of the 3rd LREC, Las Palmas, Gran Canaria, pp. 1499–1504 (2002)
[18] Boguraev, B., Briscoe, T.: Large lexicons for natural language processing: utilising the grammar coding system of LDOCE. Computational Linguistics 13, 203–218 (1987)
[19] Grishman, R., Macleod, C., Meyers, A.: Comlex syntax: building a computational lexicon. In: Proceedings of the 15th Conference on Computational Linguistics, Morristown, NJ, USA, Association for Computational Linguistics, pp. 268–272 (1994)
[20] Vapnik, V.N.: The nature of statistical learning theory. Springer, New York (1995)
[21] Chang, C., Lin, J.: LIBSVM: a library for support vector machines (2001)
[22] Hsu, W., Chang, C., Lin, J.: A practical guide to support vector classification (2003)
[23] Pietra, S.D., Pietra, J.D., Lafferty, J.D.: Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 380–393 (1997)
[24] Zhang, L.: Maximum Entropy Modeling Toolkit for Python and C++ (2004)
[25] Puzicha, J., Hofmann, T., Buhmann, J.M.: A theory of proximity-based clustering: structure detection by optimization. Pattern Recognition 33, 617–634 (2000)
[26] Ando, R.K., Zhang, T.: A high-performance semi-supervised learning method for text chunking. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 1–9 (2005)

Author information

Authors and Affiliations

Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK
Lin Sun & Anna Korhonen
Department of Computer Science, University of Haifa, 31905, Haifa, Israel
Yuval Krymolowski

Editor information

Alexander Gelbukh

About this paper

Cite this paper

Sun, L., Korhonen, A., Krymolowski, Y. (2008). Verb Class Discovery from Rich Syntactic Data. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_2

DOI: https://doi.org/10.1007/978-3-540-78135-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)