Verb Class Discovery from Rich Syntactic Data

  • Lin Sun
  • Anna Korhonen
  • Yuval Krymolowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4919)

Abstract

Previous research has shown that syntactic features are the most informative features in automatic verb classification. We investigate their optimal characteristics by comparing a range of feature sets extracted from data where the proportion of verbal arguments and adjuncts is controlled. The data are obtained from different versions of VALEX [1], a large subcategorization frame (SCF) lexicon for English which was acquired automatically from several corpora and the Web. We evaluate the feature sets thoroughly using four supervised classifiers and one unsupervised method. The best-performing feature set includes rich syntactic information about both arguments and adjuncts of verbs. When combined with our best-performing classifier (a novel Gaussian classifier), it yields the promising accuracy of 64.2% in classifying 204 verbs into 17 Levin (1993) classes. We discuss the impact of our results on the state of the art and propose avenues for future work.
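
The abstract does not define the Gaussian classifier or the exact feature sets, so the following is only a rough illustration of the general setup it describes: each verb is represented by the relative frequencies of its SCFs, and a test verb is assigned to the class whose Gaussian gives it the highest likelihood. The frame labels, verb names, counts, class names, the diagonal-covariance simplification and the variance floor are all assumptions made for this sketch, not details taken from the paper.

```python
import math
from collections import defaultdict

# Hypothetical SCF inventory; real lexicons such as VALEX use far more frames.
FRAMES = ["NP", "NP_PP", "PP", "SCOMP"]


def to_relative_frequencies(scf_counts, frames=FRAMES):
    """Turn raw SCF counts for one verb into a relative-frequency vector."""
    total = sum(scf_counts.get(f, 0) for f in frames)
    if total == 0:
        return [0.0] * len(frames)
    return [scf_counts.get(f, 0) / total for f in frames]


def fit_gaussians(vectors, labels, var_floor=1e-3):
    """Estimate a mean and a diagonal variance per class (assumed simplification)."""
    by_class = defaultdict(list)
    for vec, label in zip(vectors, labels):
        by_class[label].append(vec)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The floor keeps variances positive when a feature is constant in a class.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (means, variances)
    return model


def log_likelihood(vec, means, variances):
    """Log density of vec under an axis-aligned Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for x, m, v in zip(vec, means, variances)
    )


def classify(model, vec):
    """Return the class whose Gaussian assigns vec the highest log likelihood."""
    return max(model, key=lambda label: log_likelihood(vec, *model[label]))


# Invented toy training data: (SCF counts, class label) per verb.
TRAIN = [
    ({"NP": 80, "NP_PP": 15, "PP": 5}, "class_A"),
    ({"NP": 70, "NP_PP": 20, "PP": 10}, "class_A"),
    ({"NP": 10, "PP": 60, "SCOMP": 30}, "class_B"),
    ({"NP": 20, "PP": 50, "SCOMP": 30}, "class_B"),
]

vectors = [to_relative_frequencies(counts) for counts, _ in TRAIN]
labels = [label for _, label in TRAIN]
model = fit_gaussians(vectors, labels)

unseen = to_relative_frequencies({"NP": 75, "NP_PP": 20, "PP": 5})
print(classify(model, unseen))
```

Normalising counts to relative frequencies keeps frequent and rare verbs comparable; a full-covariance Gaussian or a different smoothing scheme could be substituted without changing the overall shape of the pipeline.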

References

  1. Korhonen, A., Krymolowski, Y., Briscoe, T.: A large subcategorization lexicon for natural language processing applications. In: Proceedings of LREC (2006)
  2. Merlo, P., Stevenson, S.: Automatic verb classification based on statistical distributions of argument structure. Computational Linguistics 27, 373–408 (2001)
  3. Korhonen, A., Krymolowski, Y., Collier, N.: Automatic classification of verbs in biomedical texts. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL, pp. 345–352 (2006)
  4. Schulte im Walde, S.: Experiments on the automatic induction of German semantic verb classes. Computational Linguistics 32, 159–194 (2006)
  5. Joanis, E., Stevenson, S., James, D.: A general feature space for automatic verb classification. Natural Language Engineering (forthcoming, 2007)
  6. Dorr, B.J.: Large-scale dictionary construction for foreign language tutoring and interlingual machine translation. Machine Translation 12, 271–322 (1997)
  7. Prescher, D., Riezler, S., Rooth, M.: Using a probabilistic class-based lexicon for lexical ambiguity resolution. In: 18th International Conference on Computational Linguistics, Saarbrücken, Germany, pp. 649–655 (2000)
  8. Swier, R., Stevenson, S.: Unsupervised semantic role labelling. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, pp. 95–102 (2004)
  9. Dang, H.T.: Investigations into the Role of Lexical Semantics in Word Sense Disambiguation. PhD thesis, CIS, University of Pennsylvania (2004)
  10. Shi, L., Mihalcea, R.: Putting pieces together: Combining FrameNet, VerbNet and WordNet for robust semantic parsing. In: Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, Mexico (2005)
  11. Jackendoff, R.: Semantic Structures. MIT Press, Cambridge (1990)
  12. Levin, B.: English Verb Classes and Alternations. Chicago University Press, Chicago (1993)
  13. Miller, G.A.: WordNet: An on-line lexical database. International Journal of Lexicography 3, 235–312 (1990)
  14. Schulte im Walde, S.: Clustering verbs semantically according to their alternation behaviour. In: Proceedings of COLING, Saarbrücken, Germany, pp. 747–753 (2000)
  15. Kipper, K., Dang, H.T., Palmer, M.: Class-based construction of a verb lexicon. In: AAAI/IAAI, pp. 691–696 (2000)
  16. Briscoe, E.J., Carroll, J.: Automatic extraction of subcategorization from corpora. In: Proceedings of the 5th ACL Conference on Applied Natural Language Processing, Washington DC, pp. 356–363 (1997)
  17. Briscoe, E.J., Carroll, J.: Robust accurate statistical annotation of general text. In: Proceedings of the 3rd LREC, Las Palmas, Gran Canaria, pp. 1499–1504 (2002)
  18. Boguraev, B., Briscoe, T.: Large lexicons for natural language processing: utilising the grammar coding system of LDOCE. Computational Linguistics 13, 203–218 (1987)
  19. Grishman, R., Macleod, C., Meyers, A.: Comlex syntax: building a computational lexicon. In: Proceedings of the 15th Conference on Computational Linguistics, Morristown, NJ, USA, Association for Computational Linguistics, pp. 268–272 (1994)
  20. Vapnik, V.N.: The nature of statistical learning theory. Springer, New York (1995)
  21. Chang, C., Lin, J.: LIBSVM: a library for support vector machines (2001)
  22. Hsu, W., Chang, C., Lin, J.: A practical guide to support vector classification (2003)
  23. Pietra, S.D., Pietra, J.D., Lafferty, J.D.: Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 380–393 (1997)
  24. Zhang, L.: Maximum Entropy Modeling Toolkit for Python and C++ (2004)
  25. Puzicha, J., Hofmann, T., Buhmann, J.M.: A theory of proximity-based clustering: structure detection by optimization. Pattern Recognition 33, 617–634 (2000)
  26. Ando, R.K., Zhang, T.: A high-performance semi-supervised learning method for text chunking. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 1–9 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Lin Sun (1)
  • Anna Korhonen (1)
  • Yuval Krymolowski (2)

  1. Computer Laboratory, University of Cambridge, Cambridge, UK
  2. Department of Computer Science, University of Haifa, Haifa, Israel
