Introduction

Multiword Expressions Acquisition

Abstract

This book is about multiword expressions (MWEs) and their treatment in natural language processing (NLP) applications. Building computer systems capable of dealing with MWEs is a hard and open problem, due to the complex and pervasive nature of these constructions in language. This chapter is a general introduction to this exciting research topic. We motivate and illustrate the importance of MWEs through many examples in several human languages. Then, we discuss the goals and scope of the computational framework for MWE acquisition presented in this book.

Notes

  1. Actually, many of these expressions are ambiguous, and also accept literal interpretations. For example, by the way can denote a place, like in she waits by the way. We will discuss this in Sect. 2.2.

  2. There are no clear rules as to whether an English compound should be spelled as a single word, with a hyphen or as two words (Procter 1995). For instance, both data set and dataset are acceptable forms. An overview of noun compound processing is provided in Szpakowicz et al. (2013).

  3. http://duolingo.com

  4. Expert MT systems are also sometimes called rule-based MT systems.

  5. Also sometimes called empirical MT systems.

  6. A list of similar expressions in other languages is available at http://en.wikipedia.org/wiki/Raining_animals

  7. Extracted from the extended list of translation examples, see Appendix A.

  8. However, we do not deal with languages whose writing systems do not use spaces to separate words.

  9. http://multiword.sf.net

  10. http://mwetoolkit.sf.net

References

  • Biber D, Johansson S, Leech G, Conrad S, Finegan E (1999) Longman grammar of spoken and written English, 1st edn. Pearson Education, Harlow, 1204p

  • Church K, Hanks P (1990) Word association norms, mutual information, and lexicography. Comput Linguist 16(1):22–29

  • Constant M, Roux JL, Sigogne A (2013) Combining compound recognition and PCFG-LA parsing with word lattices and conditional random fields. ACM Trans Speech Lang Process Spec Issue Multiword Expr Theory Pract Use Part 2 (TSLP) 10(3):1–24

  • Evert S (2004) The statistics of word cooccurrences: word pairs and collocations. PhD thesis, Institut für maschinelle Sprachverarbeitung, University of Stuttgart, Stuttgart, 353p

  • Evert S, Krenn B (2005) Using small random samples for the manual evaluation of statistical association measures. Comput Speech Lang Spec Issue MWEs 19(4):450–466

  • Ferraro G, Nazar R, Ramos MA, Wanner L (2014) Towards advanced collocation error correction in Spanish learner corpora. Lang Resour Eval Spec Issue Resour Lang Learn 48(1):45–64. doi:10.1007/s10579-013-9242-3, http://dx.doi.org/10.1007/s10579-013-9242-3

  • Finlayson M, Kulkarni N (2011) Detecting multi-word expressions improves word sense disambiguation. In: Kordoni V, Ramisch C, Villavicencio A (eds) Proceedings of the ACL workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, pp 20–24. http://www.aclweb.org/anthology/W/W11/W11-0805

  • Firth JR (1957) Papers in linguistics 1934–1951. Oxford University Press, Oxford, 233p

  • Gala N, Zock M (eds) (2013) Ressources Lexicales : Contenu, construction, utilisation, évaluation. No. 30 in Lingvisticæ Investigationes Supplementa, John Benjamins Publishing Company, Amsterdam/Philadelphia, 364p

  • Green S, de Marneffe MC, Bauer J, Manning CD (2011) Multiword expression identification with tree substitution grammars: a parsing tour de force with French. In: Barzilay R, Johnson M (eds) Proceedings of the 2011 conference on empirical methods in natural language processing (EMNLP 2011), Edinburgh. Association for Computational Linguistics, pp 725–735. http://www.aclweb.org/anthology/D11-1067

  • Grégoire N, Evert S, Krenn B (eds) (2008) Proceedings of the LREC workshop towards a shared task for multiword expressions (MWE 2008), Marrakech, 57p. http://www.lrec-conf.org/proceedings/lrec2008/workshops/W20_Proceedings.pdf

  • Jackendoff R (1997) Twistin’ the night away. Language 73:534–559

  • Klebanov BB, Burstein J, Madnani N (2013) Sentiment profiles of multiword expressions in test-taker essays: the case of noun-noun compounds. ACM Trans Speech Lang Process Spec Issue Multiword Expr Theory Pract Use Part 2 (TSLP) 10(3):1–15

  • Kordoni V, Ramisch C, Villavicencio A (eds) (2011) Proceedings of the ACL workshop on multiword expressions: from parsing and generation to the real world (MWE 2011), Portland. Association for Computational Linguistics, 144p. http://www.aclweb.org/anthology/W/W11/W11-08

  • Kordoni V, Ramisch C, Villavicencio A (eds) (2013) Proceedings of the 9th workshop on multiword expressions (MWE 2013), Atlanta. Association for Computational Linguistics, 144p. http://www.aclweb.org/anthology/W13-10

  • Kordoni V, Savary A, Egg M, Wehrli E, Evert S (eds) (2014) Proceedings of the 10th workshop on multiword expressions (MWE 2014), Gothenburg. Association for Computational Linguistics, 133p. http://www.aclweb.org/anthology/W14-08

  • Messiant C, Poibeau T, Korhonen A (2008) Lexschem: a large subcategorization lexicon for French verbs. In: Proceedings of the sixth international conference on language resources and evaluation (LREC 2008), Marrakech. European Language Resources Association, pp 533–538

  • Mirroshandel SA, Nasr A, Roux JL (2012) Semi-supervised dependency parsing using lexical affinities. In: Proceedings of the 50th annual meeting of the association for computational linguistics (volume 1: long papers), Jeju Island. Association for Computational Linguistics, pp 777–785. http://www.aclweb.org/anthology/P12-1082

  • Mitkov R, Monti J, Pastor GC, Seretan V (eds) (2013) Proceedings of the MT summit 2013 workshop on multi-word units in machine translation and translation technology (MUMTTT 2013), Nice. European Association for Machine Translation, 71p. http://www.mtsummit2013.info/workshop4.asp

  • Pearce D (2002) A comparative evaluation of collocation extraction techniques. In: Proceedings of the third international conference on language resources and evaluation (LREC 2002), Las Palmas. European Language Resources Association, pp 1530–1536

  • Pecina P (2010) Lexical association measures and collocation extraction. Lang Resour Eval Spec Issue Multiword Expr Hard Going Plain Sail 44(1–2):137–158. doi:10.1007/s10579-009-9101-4, http://www.springerlink.com/content/DRH83N312U658331

  • Preiss J, Briscoe T, Korhonen A (2007) A system for large-scale acquisition of verbal, nominal and adjectival subcategorization frames from corpora. In: Proceedings of the 45th annual meeting of the association for computational linguistics (ACL 2007), Prague. Association for Computational Linguistics, pp 912–919

  • Procter P (ed) (1995) Cambridge international dictionary of English. Cambridge University Press, Cambridge

  • Ramisch C, Besacier L, Kobzar O (2013a) How hard is it to automatically translate phrasal verbs from English to French? In: Mitkov R, Monti J, Pastor GC, Seretan V (eds) Proceedings of the MT summit 2013 workshop on multi-word units in machine translation and translation technology (MUMTTT 2013), Nice, pp 53–61. http://www.mtsummit2013.info/workshop4.asp

  • Ramisch C, Villavicencio A, Kordoni V (2013b) Introduction to the special issue on multiword expressions: from theory to practice and use. ACM Trans Speech Lang Process Spec Issue Multiword Expr Theory Pract Use Part 1 (TSLP) 10(2):1–10

  • Schone P, Jurafsky D (2001) Is knowledge-free induction of multiword unit dictionary headwords a solved problem? In: Lee L, Harman D (eds) Proceedings of the 2001 conference on empirical methods in natural language processing (EMNLP 2001), Pittsburgh. Association for Computational Linguistics, pp 100–108

  • Seretan V (2008) Collocation extraction based on syntactic parsing. PhD thesis, University of Geneva, Geneva, 249p

  • Seretan V (2011) Syntax-based collocation extraction, text, speech and language technology, vol 44, 1st edn. Springer, Dordrecht, 212p

  • Seretan V (2013) On translating syntactically-flexible expressions. In: Mitkov R, Monti J, Pastor GC, Seretan V (eds) Proceedings of the MT summit 2013 workshop on multi-word units in machine translation and translation technology (MUMTTT 2013), Nice, pp 11–11

  • Sinclair J (ed) (1989) Collins COBUILD dictionary of phrasal verbs. Collins COBUILD, London, 512p

  • Smadja FA (1993) Retrieving collocations from text: Xtract. Comput Linguist 19(1):143–177

  • Steedman M (2008) On becoming a discipline. Comput Linguist 34(1):137–144

  • Szpakowicz S, Bond F, Nakov P, Kim SN (2013) On the semantics of noun compounds. Nat Lang Eng Spec Issue Noun Compd 19(3):289–290. doi:10.1017/S1351324913000090, http://journals.cambridge.org/article_S1351324913000090

  • Termignoni S (2009) Mil expressões idiomáticas e coloqualismos Italiano-Português. Editora da PUCRS, Porto Alegre, 172p

  • Villavicencio A, Idiart M, Ramisch C, Araujo VD, Yankama B, Berwick R (2012) Get out but don’t fall down: verb-particle constructions in child language. In: Berwick R, Korhonen A, Poibeau T, Villavicencio A (eds) Proceedings of the EACL 2012 workshop on computational models of language acquisition and loss, Avignon. Association for Computational Linguistics, pp 43–50

  • Walter E (ed) (2006) Cambridge idioms dictionary, 2nd edn. Cambridge University Press, Cambridge, 519p

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Ramisch, C. (2015). Introduction. In: Multiword Expressions Acquisition. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-09207-2_1

  • DOI: https://doi.org/10.1007/978-3-319-09207-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09206-5

  • Online ISBN: 978-3-319-09207-2

  • eBook Packages: Computer Science, Computer Science (R0)
