Multiword expressions: hard going or plain sailing?
Over the past two decades or so, Multi-Word Expressions (MWEs; also called Multi-word Units) have become an increasingly important concern for Computational Linguistics and Natural Language Processing (NLP). The term MWE has been used to refer to various types of linguistic units and expressions, including idioms, noun compounds, phrasal verbs, light verbs and other habitual collocations. Although there is as yet no universally agreed definition of MWE, most researchers use the term to refer to frequently occurring phrasal units that are subject to a certain level of semantic opaqueness, or non-compositionality. Non-compositional MWEs pose tough challenges for automatic analysis because their interpretation cannot be derived by directly combining the semantics of their constituents, which is why they have been called a “pain in the neck for NLP” (Sag et al. 2001).
In fact, MWEs have been studied for decades in Phraseology under the term phraseological unit. But in the early 1990s, MWEs...
Keywords: Natural Language Processing; Natural Language Processing Application; Multilingual Context; Multiword Expression; Natural Language Processing Community
- Dagan, I., & Church, K. (1994). Termight: Identifying and translating technical terminology. In Proceedings of the 4th conference on applied natural language processing (pp. 34–40). Stuttgart, Germany.
- Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. Technical paper, UCREL, Lancaster University.
- Granger, S., & Meunier, F. (Eds.). (2008). Phraseology: An interdisciplinary perspective. Amsterdam, The Netherlands: John Benjamins.
- Löfberg, L., Piao, S. L., Nykanen, A., Varantola, K., Rayson, P., & Juntunen, J.-P. (2005). A semantic tagger for the Finnish language. In The Proceedings of the corpus linguistics conference 2005. Birmingham, UK (14–17 July).
- McEnery, T., Langé, J.-M., Oakes, M., & Véronis, J. (1997). The exploitation of multilingual annotated corpora for term extraction. In R. Garside, G. Leech, & A. McEnery (Eds.), Corpus annotation: Linguistic information from computer text corpora (pp. 220–230). London: Longman.
- McNamee, P., & Mayfield, J. (2006). Translation of multiword expressions using parallel suffix arrays. In Proceedings of the 7th conference of the association for machine translation in the Americas (pp. 100–109). Cambridge, Massachusetts, USA.
- Michiels, A., & Dufour, N. (1998). DEFI, a tool for automatic multi-word unit recognition, meaning assignment and translation selection. In Proceedings of the first international conference on language resources & evaluation (pp. 1179–1186). Granada, Spain.
- Mudraya, O. V., Babych, B. V., Piao, S., Rayson, P., & Wilson, A. (2006). Developing a Russian semantic tagger for automatic semantic annotation. In Proceedings of the international conference “Corpus Linguistics—2006” (pp. 290–297). St.-Petersburg, Russia.
- Piao, S. L., Archer, D., Mudraya, O., Rayson, P., Garside, R., McEnery, A. M., et al. (2006). A large semantic lexicon for corpus annotation. In Proceedings from the corpus linguistics conference series (on-line e-journal 1(1)).
- Rayson, P., Archer, D., Piao, S. L., & McEnery, T. (2004). The UCREL semantic analysis system. In Proceedings of the workshop on beyond named entity recognition semantic labelling for NLP tasks in association with 4th international conference on language resources and evaluation (LREC 2004), 25th May 2004 (pp. 7–12). Lisbon, Portugal.
- Sag, I., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2001). Multiword expressions: A pain in the neck for NLP. LinGO Working Paper No. 2001-03, Stanford University, CA.
- Smadja, F. (1993). Retrieving collocations from text: Xtract. Computational Linguistics, 19(1), 143–177.
- Wermter, S., & Chen, J. (1997). Cautious steps towards hybrid connectionist bilingual phrase alignment. In Proceedings of the conference on recent advances in natural language processing (pp. 364–368). Sofia, Bulgaria.
- Wu, D. (1997). Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3), 377–401.