Verb Clustering for Brazilian Portuguese

  • Carolina Scarton
  • Lin Sun
  • Karin Kipper-Schuler
  • Magali Sanches Duran
  • Martha Palmer
  • Anna Korhonen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8403)

Abstract

Levin-style classes, which capture the shared syntax and semantics of verbs, have proven useful for many Natural Language Processing (NLP) tasks and applications. However, lexical resources that provide information about such classes are available for only a handful of the world's languages. Because manual development of such resources is extremely time-consuming and cannot reliably capture domain variation in classification, methods for automatic induction of verb classes from texts have gained popularity. To date, however, such methods have been applied only to English and a handful of other, mainly resource-rich languages. In this paper, we apply the methods to Brazilian Portuguese, a language for which no VerbNet or automatic class induction work exists yet. Since Levin-style classification is said to have a strong cross-linguistic component, we use unsupervised clustering techniques similar to those developed for English, without language-specific feature engineering. This yields interesting results which line up well with those obtained for other languages, demonstrating the cross-linguistic nature of this type of classification. However, we also discover and discuss issues which require specific consideration when aiming to optimise the performance of verb clustering for Brazilian Portuguese and other less-resourced languages.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Carolina Scarton
    • 1
    • 4
  • Lin Sun
    • 2
  • Karin Kipper-Schuler
    • 3
  • Magali Sanches Duran
    • 4
  • Martha Palmer
    • 3
  • Anna Korhonen
    • 2
  1. Department of Computer Science, University of Sheffield, Portobello, UK
  2. Computer Laboratory, University of Cambridge, Cambridge, UK
  3. Department of Linguistics, University of Colorado at Boulder, Colorado, USA
  4. Interinstitutional Center for Computational Linguistics, ICMC, University of São Paulo, São Carlos, Brazil
