Abstract
This book is about multiword expressions (MWEs) and their treatment in natural language processing (NLP) applications. Building computer systems capable of dealing with MWEs is a hard and open problem, due to the complex and pervasive nature of these constructions in language. This chapter is a general introduction to this exciting research topic. We motivate and illustrate the importance of MWEs through many examples in several human languages. Then, we discuss the goals and scope of the computational framework for MWE acquisition presented in this book.
Notes
- 1. Actually, many of these expressions are ambiguous and also accept literal interpretations. For example, "by the way" can denote a place, as in "she waits by the way". We will discuss this in Sect. 2.2.
- 2.
- 3.
- 4. Expert MT systems are also sometimes called rule-based MT systems.
- 5. Also sometimes called empirical MT systems.
- 6. A list of similar expressions in other languages is available at http://en.wikipedia.org/wiki/Raining_animals
- 7. Extracted from the extended list of translation examples; see Appendix A.
- 8. However, we do not deal with languages whose writing systems do not use spaces to separate words.
- 9.
- 10.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ramisch, C. (2015). Introduction. In: Multiword Expressions Acquisition. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-09207-2_1
DOI: https://doi.org/10.1007/978-3-319-09207-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09206-5
Online ISBN: 978-3-319-09207-2
eBook Packages: Computer Science, Computer Science (R0)