
A Lexical Database of Portuguese Multiword Expressions

Conference paper in Computational Processing of the Portuguese Language (PROPOR 2006)

Abstract

This presentation describes an ongoing project that aims to create a large lexical database of Portuguese multiword (MW) units. The units are extracted automatically from a balanced 50-million-word corpus, interpreted statistically with lexical association measures, and then validated manually. The database covers different types of MW units, such as named entities, as well as lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. The new resource has a two-fold objective: to be an important research tool that supports the development of MW unit typologies, and to be of major help in developing and evaluating language processing tools able to deal with MW expressions.
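The abstract does not say which lexical association measures the project used. As a hedged illustration only, the sketch below assumes two classic ones, pointwise mutual information (PMI) and Dunning's log-likelihood ratio (G²), and scores adjacent word pairs from a 2×2 contingency table of observed versus expected frequencies. The function name `association_scores` and the toy sentence are hypothetical; a real extraction pipeline would also handle longer n-grams, lemmatisation and frequency cut-offs before manual validation.

```python
import math
from collections import Counter


def association_scores(tokens):
    """Return {(w1, w2): (pmi, g2)} for every adjacent word pair in `tokens`.

    Illustrative sketch: the choice of measures (PMI, G2) is an assumption,
    not the project's documented method.
    """
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)                            # total number of bigram tokens
    joint = Counter(pairs)                    # f(w1, w2)
    first = Counter(w1 for w1, _ in pairs)    # f(w1 in first position)
    second = Counter(w2 for _, w2 in pairs)   # f(w2 in second position)

    scores = {}
    for (w1, w2), o11 in joint.items():
        r1, c1 = first[w1], second[w2]
        # Observed 2x2 table: (w1,w2), (w1,not w2), (not w1,w2), (not w1,not w2)
        observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
        # Row and column marginals matching each observed cell
        rows = [r1, r1, n - r1, n - r1]
        cols = [c1, n - c1, c1, n - c1]
        expected = [r * c / n for r, c in zip(rows, cols)]
        # PMI: log2 of observed over expected frequency for the pair itself
        pmi = math.log2(o11 * n / (r1 * c1))
        # G2 = 2 * sum O * ln(O / E); empty cells contribute nothing
        g2 = 2 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)
        scores[(w1, w2)] = (pmi, g2)
    return scores


if __name__ == "__main__":
    toks = "a casa branca e a casa azul e a casa branca".split()
    ranked = sorted(association_scores(toks).items(), key=lambda kv: -kv[1][1])
    for pair, (pmi, g2) in ranked:
        print(pair, round(pmi, 2), round(g2, 2))
```

On this toy input, *a casa* and *casa azul* receive the same PMI, but G² ranks the more frequent *a casa* higher. This illustrates that PMI ignores raw frequency while G² rewards it, which matters when ranking candidates for hand validation.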

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Antunes, S., do Nascimento, M.F.B., Casteleiro, J.M., Mendes, A., Pereira, L., Sá, T. (2006). A Lexical Database of Portuguese Multiword Expressions. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science, vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_30

  • DOI: https://doi.org/10.1007/11751984_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34045-4

  • Online ISBN: 978-3-540-34046-1

  • eBook Packages: Computer Science; Computer Science (R0)
