Abstract
In the concluding chapter of the book, we summarize the main contributions of our work and point to directions for future investigation. We begin by reviewing the theoretical and empirical arguments in favour of syntax-based collocation extraction. Initially hampered by the absence of appropriate tools, extraction methods based on syntactic parsing are still less popular than syntactically-uninformed methods despite the dramatic advances achieved in parsing technologies. In our work, we have proposed an extraction methodology based on deep syntactic parsing, which we applied to multiple languages and evaluated in a series of experiments. The results showed that syntax-based collocation extraction is feasible, efficient, and particularly desirable as it enables the proper subsequent processing of extraction results.
Notes
- 1.
- 2. “The critical problem for the lexicographer has been, heretofore, the treatment of collocations. It has been far more difficult to identify them than idioms or even compounds” (Benson et al., 1986b, 256).
- 3. Semantic annotations have been used for multi-word expression extraction, for instance, in Piao et al. (2005).
References
Baldwin T, Kim SN (2010) Multiword expressions. In: Indurkhya N, Damerau FJ (eds) Handbook of Natural Language Processing, Second Edition, CRC Press, Taylor and Francis Group, Boca Raton, FL
Bannard C (2005) Learning about the meaning of verb-particle constructions from corpora. Computer Speech and Language 19(4):467–478
Baroni M, Evert S (2008) Statistical methods for corpus exploitation. In: Lüdeling A, Kytö M (eds) Corpus Linguistics. An International Handbook, Mouton de Gruyter, Berlin, pp 777–803
Benson M, Benson E, Ilson R (1986b) Lexicographic Description of English. John Benjamins, Amsterdam/Philadelphia
de Caseli HM, Ramisch C, das Graças Volpe Nunes M, Villavicencio A (2010) Alignment-based extraction of multiword expressions. Language Resources and Evaluation Special Issue on Multiword Expressions: Hard Going or Plain Sailing 44(1–2):59–77
Cook P, Fazly A, Stevenson S (2008) The VNC-tokens dataset. In: Proceedings of the LREC Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco, pp 19–22
Evert S (2004b) The statistics of word cooccurrences: Word pairs and collocations. PhD thesis, University of Stuttgart
Evert S (2008a) Corpora and collocations. In: Lüdeling A, Kytö M (eds) Corpus Linguistics. An International Handbook, Mouton de Gruyter, Berlin
Fazly A, Stevenson S (2007) Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures. In: Proceedings of the Workshop on A Broader Perspective on Multiword Expressions, Prague, Czech Republic, pp 9–16
Gildea D, Palmer M (2002) The necessity of parsing for predicate argument recognition. In: Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, pp 239–246
Grossmann F, Tutin A (eds) (2003) Les Collocations. Analyse et traitement. Éditions De Werelt, Amsterdam
Heid U, Weller M (2008) Tools for collocation extraction: Preferences for active vs. passive. In: Proceedings of the 6th International Language Resources and Evaluation (LREC’08), Marrakech, Morocco
Keller F, Lapata M (2003) Using the web to obtain frequencies for unseen bigrams. Computational Linguistics 29(3):459–484
Kilgarriff A, Rychly P, Smrz P, Tugwell D (2004) The Sketch Engine. In: Proceedings of the 11th EURALEX International Congress, Lorient, France, pp 105–116
Krenn B (2000a) Collocation mining: Exploiting corpora for collocation identification and representation. In: Proceedings of KONVENS 2000, Ilmenau, Germany, pp 209–214
Kurz D, Xu F (2002) Text mining for the extraction of domain relevant terms and term collocations. In: Proceedings of the International Workshop on Computational Approaches to Collocations, Vienna, Austria
Leoni de Leon JA (2008) Modèle d’analyse lexico-syntaxique des locutions espagnoles. PhD thesis, University of Geneva
L’Homme MC (2003) Combinaisons lexicales spécialisées (CLS) : Description lexicographique et intégration aux banques de terminologie. In: Grossmann F, Tutin A (eds) Les collocations: analyse et traitement, Editions De Werelt, Amsterdam, pp 89–103
Maynard D, Ananiadou S (1999) A linguistic approach to terminological context clustering. In: Proceedings of Natural Language Pacific Rim Symposium 99, Beijing, China
McCarthy D, Venkatapathy S, Joshi A (2007) Detecting compositionality of verb-object combinations using selectional preferences. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp 369–379
McKeown KR, Radev DR (2000) Collocations. In: Dale R, Moisl H, Somers H (eds) A Handbook of Natural Language Processing, Marcel Dekker, New York, NY, pp 507–523
Michelbacher L, Evert S, Schütze H (2007) Asymmetric association measures. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2007), Borovetz, Bulgaria
Nivre J (2006) Inductive Dependency Parsing (Text, Speech and Language Technology). Springer-Verlag New York, Inc., Secaucus, NJ
Pearce D (2001a) Synonymy in collocation extraction. In: Proceedings of the NAACL Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, Pittsburgh, PA, USA, pp 41–46
Pearce D (2002) A comparative evaluation of collocation extraction techniques. In: Proceedings of the 3rd International Conference on Language Resources and Evaluation, Las Palmas, Spain, pp 1530–1536
Piao SS, Rayson P, Archer D, McEnery T (2005) Comparing and combining a semantic tagger and a statistical tool for MWE extraction. Computer Speech and Language Special Issue on Multiword Expressions 19(4):378–397
Rajman M, Besançon R (1998) Text mining — Knowledge extraction from unstructured textual data. In: Proceedings of the 6th Conference of International Federation of Classification Societies (IFCS-98), Roma, Italy, pp 473–480
Rayson P, Piao S, Sharoff S, Evert S, Moirón BV (2010) Multiword expressions: hard going or plain sailing? Language Resources and Evaluation Special Issue on Multiword Expressions: Hard Going or Plain Sailing 44(1–2):1–25
Ritz J (2006) Collocation extraction: Needs, feeds and results of an extraction system for German. In: Proceedings of the Workshop on Multi-Word-Expressions in a Multilingual Context at the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy, pp 41–48
Sinclair J (1991) Corpus, Concordance, Collocation. Oxford University Press, Oxford
Smadja F (1993) Retrieving collocations from text: Xtract. Computational Linguistics 19(1):143–177
Tutin A (2004) Pour une modélisation dynamique des collocations dans les textes. In: Proceedings of the 11th EURALEX International Congress, Lorient, France, pp 207–219
Tutin A (2005) Annotating lexical functions in corpora: Showing collocations in context. In: Proceedings of the 2nd International Conference on the Meaning-Text Theory, Moscow, Russia
Villada Moirón B, Tiedemann J (2006) Identifying idiomatic expressions using automatic word-alignment. In: Proceedings of the Workshop on Multi-Word-Expressions in a Multilingual Context, Trento, Italy, pp 33–40
Villada Moirón MB (2005) Data-driven identification of fixed expressions and their modifiability. PhD thesis, University of Groningen
Villavicencio A, Bond F, Korhonen A, McCarthy D (2005) Introduction to the special issue on multiword expressions: Having a crack at a hard nut. Computer Speech and Language Special Issue on Multiword Expressions 19(4):365–377
Wanner L, Bohnet B, Giereth M (2006) Making sense of collocations. Computer Speech & Language 20(4):609–624
Weller M, Heid U (2010) Extraction of German multiword expressions from parsed corpora using context features. In: Proceedings of the 7th Conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta
Yang S (2003) Machine Learning for collocation identification. In: International Conference on Natural Language Processing and Knowledge Engineering Proceedings (NLPKE), Beijing, China
Zaiu Inkpen D, Hirst G (2002) Acquiring collocations for lexical choice between near-synonyms. In: Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition, Philadelphia, PA, USA, pp 67–76
Copyright information
© 2011 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Seretan, V. (2011). Conclusion. In: Syntax-Based Collocation Extraction. Text, Speech and Language Technology, vol 44. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0134-2_6
DOI: https://doi.org/10.1007/978-94-007-0134-2_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-0133-5
Online ISBN: 978-94-007-0134-2
eBook Packages: Computer Science, Computer Science (R0)