Abstract
This presentation reports on an ongoing project to build a large lexical database of Portuguese multiword (MW) units. The units are extracted automatically from a balanced 50-million-word corpus, statistically interpreted with lexical association measures, and validated by hand. The database covers different types of MW units, such as named entities, and lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. The resource has a twofold objective: to serve as a research tool that supports the development of typologies of MW units, and to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions.
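The abstract does not say which lexical association measures the project uses. As an illustration only, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), one widely used association measure. The function name `pmi_scores`, the `min_count` threshold, and the toy Portuguese sentence are assumptions made for this example and are not taken from the project.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not the project's own pipeline.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = max(n_tokens - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # PMI is unreliable for rare pairs, so skip low-frequency candidates.
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (assumed data); a real run would use the tokenized corpus.
tokens = (
    "o primeiro ministro falou e o primeiro ministro saiu "
    "e o governo falou"
).split()

scores = pmi_scores(tokens)
# Candidates ranked from strongest to weakest association.
candidates = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for pair, score in candidates:
    print(" ".join(pair), round(score, 3))
```

In this toy run the pair "primeiro ministro" ranks first. A ranked list of this kind would only supply candidates; in the workflow the abstract describes, the candidates are then validated by hand.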
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Antunes, S., do Nascimento, M.F.B., Casteleiro, J.M., Mendes, A., Pereira, L., Sá, T. (2006). A Lexical Database of Portuguese Multiword Expressions. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science, vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_30
DOI: https://doi.org/10.1007/11751984_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34045-4
Online ISBN: 978-3-540-34046-1
eBook Packages: Computer Science, Computer Science (R0)