Automatic acquisition of syntactic verb classes with basic resources

  • Original Paper
Language Resources and Evaluation

Abstract

This paper describes a methodology for grouping Catalan verbs according to their syntactic behavior. Our goal is to acquire a small number of basic classes with a high level of accuracy, using minimal resources. Information on syntactic class, which is expensive and slow to compile by hand, is useful for any NLP task requiring specific lexical information. We show that it is possible to acquire this kind of information using only a POS-tagged corpus. We perform two clustering experiments. The first aims at classifying verbs into transitive verbs, intransitive verbs, and verbs alternating with a se-construction. Our system achieves an average F-score of 0.84 on a task with a baseline of 0.33. The second experiment aims at further distinguishing between pure intransitives and verbs bearing a prepositional object. The baseline for this task is 0.51 and the upper bound is 0.98. The system achieves an average F-score of 0.88.
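As a reading aid for the F-score figures reported in the abstract, the following minimal sketch shows how a per-class F-score and its unweighted average can be computed from gold and predicted class labels. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `f_scores` and `average_f` are hypothetical, the toy labels are invented, and the sketch assumes each cluster has already been mapped to one class label. The abstract does not say how the paper performs that mapping or how its average is weighted.

```python
from collections import Counter


def f_scores(gold, predicted):
    """Return {class_label: F-score} for two parallel lists of labels."""
    classes = sorted(set(gold) | set(predicted))
    # Count items whose predicted label matches the gold label, per class.
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    scores = {}
    for c in classes:
        precision = true_pos[c] / pred_counts[c] if pred_counts[c] else 0.0
        recall = true_pos[c] / gold_counts[c] if gold_counts[c] else 0.0
        denom = precision + recall
        # Harmonic mean of precision and recall; 0.0 when both are zero.
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def average_f(gold, predicted):
    """Unweighted mean of the per-class F-scores (an assumed averaging scheme)."""
    scores = f_scores(gold, predicted)
    return sum(scores.values()) / len(scores)


# Toy labels, invented for illustration only (not data from the paper).
gold = ["transitive", "transitive", "intransitive", "intransitive", "VASE", "VASE"]
predicted = ["transitive", "transitive", "intransitive", "VASE", "VASE", "VASE"]

if __name__ == "__main__":
    for label, score in f_scores(gold, predicted).items():
        print(f"{label}: {score:.2f}")
    print(f"average: {average_f(gold, predicted):.2f}")
```

In this toy case one intransitive verb is wrongly assigned to the VASE class, which lowers recall for the intransitive class and precision for the VASE class while leaving the transitive class unaffected.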

Abbreviations

inf: infinitive

fut: future tense

OBJcli: object clitic

VASE: verbs alternating with se

Acknowledgements

Many thanks to Toni Martí and Enric Vallduví and all the colleagues from the GLiCom for their useful comments. Special thanks are due to the Institut d’Estudis Catalans for lending us the research corpus, and to Nadjet Bouayad and Sebastian Padó for a critical revision of a previous version of this paper. Also thanks to Tom Rozario for language revision. This work is supported by the Departament d’Universitats, Recerca i Societat de la Informació (grants 2003FI-00867 and 2001FI-00582), and by the Fundación Caja Madrid.

Author information

Corresponding author

Correspondence to L. Mayol.

About this article

Cite this article

Mayol, L., Boleda, G. & Badia, T. Automatic acquisition of syntactic verb classes with basic resources. Lang Resources & Evaluation 39, 295–312 (2005). https://doi.org/10.1007/s10579-006-9000-x

  • DOI: https://doi.org/10.1007/s10579-006-9000-x
