
A Lexical Database of Portuguese Multiword Expressions

  • Sandra Antunes
  • Maria Fernanda Bacelar do Nascimento
  • João Miguel Casteleiro
  • Amália Mendes
  • Luísa Pereira
  • Tiago Sá
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)

Abstract

This presentation focuses on an ongoing project that aims to create a large lexical database of Portuguese multiword (MW) units, automatically extracted through the analysis of a balanced 50-million-word corpus, statistically interpreted with lexical association measures, and validated by hand. The database covers different types of MW units, such as named entities, and lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. This new resource has a two-fold objective: to be an important research tool that supports the development of typologies of MW units, and to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions.
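
The abstract does not say which lexical association measures the project uses; mutual information is listed among the keywords, so the following is only a minimal sketch of how one such measure, pointwise mutual information (PMI), could rank adjacent word pairs as MW candidates. The function name `pmi_bigrams`, the `min_count` threshold, and the toy sentence are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Illustrative sketch only: the names and the frequency threshold are
    assumptions, not the project's actual extraction procedure.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            # Rare pairs get unreliable PMI scores, so they are skipped.
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first: these form the candidate list.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus (hypothetical); a real run would use a tokenized corpus.
toy = "o primeiro ministro falou e o primeiro ministro saiu e o governo caiu".split()
for pair, score in pmi_bigrams(toy):
    print(pair, round(score, 2))
```

In a pipeline like the one the abstract describes, a ranked list of this kind would only be a set of candidates; the manual validation step is what decides which pairs enter the database.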

Keywords

Mutual Information; Word Association; Candidate List; Association Measure; Language Resource
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sandra Antunes (1)
  • Maria Fernanda Bacelar do Nascimento (1)
  • João Miguel Casteleiro (1)
  • Amália Mendes (1)
  • Luísa Pereira (1)
  • Tiago Sá (1)

  1. Centro de Linguística da Universidade de Lisboa (CLUL), Lisboa, Portugal
