Language Resources and Evaluation, Volume 44, Issue 1–2, pp 1–5

Multiword expressions: hard going or plain sailing?

  • Paul Rayson
  • Scott Piao
  • Serge Sharoff
  • Stefan Evert
  • Begoña Villada Moirón

Over the past two decades or so, Multi-Word Expressions (MWEs; also called Multi-word Units) have become an increasingly important concern for Computational Linguistics and Natural Language Processing (NLP). The term MWE has been used to refer to various types of linguistic units and expressions, including idioms, noun compounds, phrasal verbs, light verbs and other habitual collocations. Although there is as yet no universally agreed definition of MWE, most researchers use the term to refer to frequently occurring phrasal units that are subject to a certain level of semantic opaqueness, or non-compositionality. Non-compositional MWEs pose tough challenges for automatic analysis because their interpretation cannot be obtained by directly combining the semantics of their constituents, which is why they have been described as a “pain in the neck for NLP” (Sag et al. 2001).

In fact, MWEs have been studied for decades in Phraseology under the term phraseological unit. But in the early 1990s, MWEs...




  1. Dagan, I., & Church, K. (1994). Termight: Identifying and translating technical terminology. In Proceedings of the 4th conference on applied natural language processing (pp. 34–40). Stuttgart, Germany.
  2. Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. Technical paper, UCREL, Lancaster University.
  3. Granger, S., & Meunier, F. (Eds.). (2008). Phraseology: An interdisciplinary perspective. Amsterdam, The Netherlands: John Benjamins.
  4. Löfberg, L., Piao, S. L., Nykanen, A., Varantola, K., Rayson, P., & Juntunen, J.-P. (2005). A semantic tagger for the Finnish language. In The Proceedings of the corpus linguistics conference 2005. Birmingham, UK (14–17 July).
  5. McEnery, T., Langé, J.-M., Oakes, M., & Véronis, J. (1997). The exploitation of multilingual annotated corpora for term extraction. In R. Garside, G. Leech, & A. McEnery (Eds.), Corpus annotation: Linguistic information from computer text corpora (pp. 220–230). London: Longman.
  6. McNamee, P., & Mayfield, J. (2006). Translation of multiword expressions using parallel suffix arrays. In Proceedings of the 7th conference of the association for machine translation in the Americas (pp. 100–109). Cambridge, Massachusetts, USA.
  7. Michiels, A., & Dufour, N. (1998). DEFI, a tool for automatic multi-word unit recognition, meaning assignment and translation selection. In Proceedings of the first international conference on language resources & evaluation (pp. 1179–1186). Granada, Spain.
  8. Mudraya, O. V., Babych, B. V., Piao, S., Rayson, P., & Wilson, A. (2006). Developing a Russian semantic tagger for automatic semantic annotation. In Proceedings of the international conference “Corpus Linguistics—2006” (pp. 290–297). St.-Petersburg, Russia.
  9. Piao, S. L., Archer, D., Mudraya, O., Rayson, P., Garside, R., McEnery, A. M., et al. (2006). A large semantic lexicon for corpus annotation. In Proceedings from the corpus linguistics conference series (on-line e-journal 1(1)).
  10. Rayson, P., Archer, D., Piao, S. L., & McEnery, T. (2004). The UCREL semantic analysis system. In Proceedings of the workshop on beyond named entity recognition semantic labelling for NLP tasks in association with 4th international conference on language resources and evaluation (LREC 2004), 25th May 2004 (pp. 7–12). Lisbon, Portugal.
  11. Sag, I., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2001). Multiword expressions: A pain in the neck for NLP. LinGO Working Paper No. 2001-03, Stanford University, CA.
  12. Smadja, F. (1993). Retrieving collocations from text: Xtract. Computational Linguistics, 19(1), 143–177.
  13. Wermter, S., & Chen, J. (1997). Cautious steps towards hybrid connectionist bilingual phrase alignment. In Proceedings of the conference on recent advances in natural language processing (pp. 364–368). Sofia, Bulgaria.
  14. Wu, D. (1997). Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3), 377–401.

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Paul Rayson (1, corresponding author)
  • Scott Piao (1)
  • Serge Sharoff (2)
  • Stefan Evert (3)
  • Begoña Villada Moirón (4)

  1. Lancaster University, Lancaster, UK
  2. University of Leeds, Leeds, UK
  3. University of Osnabrueck, Osnabrueck, Germany
  4. University of Groningen, Groningen, The Netherlands
