A Corpus Study of Verbal Multiword Expressions in Brazilian Portuguese

Part of the Lecture Notes in Computer Science book series (LNAI, volume 11122)


Verbal multiword expressions (VMWEs) such as "to make ends meet" require special attention in NLP and linguistic research, and annotated corpora are valuable resources for studying them. Corpora annotated with VMWEs in several languages, including Brazilian Portuguese, were made freely available in the PARSEME shared task. The goal of this paper is to describe and analyze the Brazilian Portuguese corpus in terms of the characteristics of its annotated VMWEs. First, we summarize and exemplify the criteria used to annotate VMWEs. Then, we analyze their frequency, average length, discontinuities and variability. We further discuss challenging constructions and borderline cases. We believe that this analysis can help improve the annotated corpus and that its results can be used to develop systems for automatic VMWE identification.


  • Multiword expressions
  • Annotation
  • Corpus linguistics

  • DOI: 10.1007/978-3-319-99722-3_3
  • Chapter length: 11 pages

  1. Editions 1.0 (2017) and 1.1 (2018):

  3. Boldface indicates lexicalized components for all examples throughout this paper.

  4. A flexibility test verifies to what extent a change usually allowed by a language’s grammar also applies to the candidate to annotate.

  5. A word that does not co-occur with any other word outside the VMWE.

  7. In number of intervening tokens.

  8. The normalized form of a VMWE is its sequence of lemmatized lexicalized components in lexicographic order, whereas its surface form is the textual sequence [8].
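
The definitions in notes 7 and 8 can be illustrated with a minimal sketch. The `Token` type, the function names and the example sentence below are hypothetical and are not taken from the corpus or from the PARSEME tooling. The sketch also assumes that the surface form keeps only the lexicalized components, in textual order, and that gap length counts the non-lexicalized tokens between the first and last lexicalized component.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str          # word as it appears in the text
    lemma: str         # lemmatized form
    lexicalized: bool  # True if the token is a lexicalized VMWE component


def surface_form(tokens: List[Token]) -> List[str]:
    """Lexicalized components as they occur in the text, in textual order."""
    return [t.form for t in tokens if t.lexicalized]


def normalized_form(tokens: List[Token]) -> List[str]:
    """Lemmas of the lexicalized components, sorted lexicographically."""
    return sorted(t.lemma for t in tokens if t.lexicalized)


def gap_length(tokens: List[Token]) -> int:
    """Number of non-lexicalized tokens between the first and last component."""
    positions = [i for i, t in enumerate(tokens) if t.lexicalized]
    if len(positions) < 2:
        return 0
    span = positions[-1] - positions[0] + 1
    return span - len(positions)


# Hypothetical example: "Ela tomou uma decisão" ("She made a decision"),
# assuming "tomou" and "decisão" are the lexicalized components.
sentence = [
    Token("Ela", "ela", False),
    Token("tomou", "tomar", True),
    Token("uma", "um", False),
    Token("decisão", "decisão", True),
]

print(surface_form(sentence))
print(normalized_form(sentence))
print(gap_length(sentence))
```

Under these assumptions, two occurrences with different inflections or word orders share the same normalized form, which is what makes it possible to group surface variants of a single expression.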


  1. Baldwin, T., Kim, S.N.: Multiword expressions. In: Indurkhya, N., Damerau, F.J. (eds.) Handbook of Natural Language Processing, 2nd edn, pp. 267–292. CRC Press, Boca Raton (2010)

  2. Bocorny Finatto, M.J., Scarton, C.E., Rocha, A., Aluísio, S.M.: Características do jornalismo popular: avaliação da inteligibilidade e auxílio à descrição do gênero. In: VIII Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana, pp. 30–39. Sociedade Brasileira de Computação, Cuiabá, MT, Brazil (2011)

  3. Constant, M., et al.: Multiword expression processing: a survey. Comput. Linguist. 43(4), 837–892 (2017)

  4. Constant, M., Nivre, J.: A transition-based system for joint lexical and syntactic analysis. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 161–171. Association for Computational Linguistics (2016)

  5. Fotopoulou, A., Markantonatou, S., Giouli, V.: Encoding MWEs in a conceptual lexicon. In: Proceedings of the 10th Workshop on Multiword Expressions, MWE 2014, pp. 43–47. Association for Computational Linguistics (2014)

  6. Nissim, M., Zaninello, A.: Modeling the internal variability of multiword expressions through a pattern-based method. ACM TSLP Special Issue MWEs 10(2) (2013)

  7. Nivre, J., et al.: Universal Dependencies v1: a multilingual treebank collection. In: Calzolari, N., et al. (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation, LREC 2016, pp. 1659–1666. European Language Resources Association (ELRA) (2016)

  8. Pasquer, C.: Expressions polylexicales verbales: étude de la variabilité en corpus. In: Actes de la 18e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (TALN-RÉCITAL 2017) (2017)

  9. Riedl, M., Biemann, C.: Impact of MWE resources on multiword recognition. In: Proceedings of the 12th Workshop on Multiword Expressions, MWE 2016, pp. 107–111. Association for Computational Linguistics (2016)

  10. Rosén, V., et al.: A survey of multiword expressions in treebanks. In: Proceedings of the 14th International Workshop on Treebanks & Linguistic Theories Conference (2015)

  11. Sag, I.A., Baldwin, T., Bond, F., Copestake, A., Flickinger, D.: Multiword expressions: a pain in the neck for NLP. In: Gelbukh, A. (ed.) CICLing 2002. LNCS, vol. 2276, pp. 1–15. Springer, Heidelberg (2002)

  12. Sanches Duran, M., Scarton, C.E., Aluísio, S.M., Ramisch, C.: Identifying pronominal verbs: towards automatic disambiguation of the clitic ‘se’ in Portuguese. In: Proceedings of the 9th Workshop on Multiword Expressions, pp. 93–100. Association for Computational Linguistics, Atlanta (2013)

  13. Savary, A., Cordeiro, S.R.: Literal readings of multiword expressions: as scarce as hen’s teeth. In: Proceedings of the 16th Workshop on Treebanks and Linguistic Theories (TLT 2016), Prague, Czech Republic (2018)

  14. Savary, A., Jacquemin, C.: Reducing information variation in text. In: Renals, S., Grefenstette, G. (eds.) Text- and Speech-Triggered Information Access. LNCS (LNAI), vol. 2705, pp. 145–181. Springer, Heidelberg (2003)

  15. Savary, A., et al.: The PARSEME Shared Task on automatic identification of verbal multiword expressions. In: Proceedings of the 13th Workshop on Multiword Expressions, MWE 2017, pp. 31–47. Association for Computational Linguistics (2017)

  16. Straka, M., Straková, J.: Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 88–99. Association for Computational Linguistics, Vancouver (2017)

  17. Tutin, A.: Comparing morphological and syntactic variations of support verb constructions and verbal full phrasemes in French: a corpus based study. In: PARSEME COST Action. Relieving the Pain in the Neck in Natural Language Processing: 7th Final General Meeting, Dubrovnik, Croatia (2016)

  18. van Gompel, M., van der Sloot, K., Reynaert, M., van den Bosch, A.: FoLiA in practice: the infrastructure of a linguistic annotation format, pp. 71–81 (2017)

  19. Zeman, D., et al.: CoNLL 2017 shared task: multilingual parsing from raw text to universal dependencies. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 1–19. Association for Computational Linguistics, Vancouver, Canada (2017)


We would like to thank Helena Caseli for her participation as an annotator. We would also like to thank the PARSEME shared task organizers, especially Agata Savary and Veronika Vincze. This work was supported by the IC1207 PARSEME COST action and by the PARSEME-FR project (ANR-14-CERA-0001).

Correspondence to Carlos Ramisch.

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this paper

Ramisch, C., Ramisch, R., Zilio, L., Villavicencio, A., Cordeiro, S. (2018). A Corpus Study of Verbal Multiword Expressions in Brazilian Portuguese. In: , et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham.

  • DOI: 10.1007/978-3-319-99722-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99721-6

  • Online ISBN: 978-3-319-99722-3

  • eBook Packages: Computer Science, Computer Science (R0)