Conclusion

Syntax-Based Collocation Extraction

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 44))

Abstract

In the concluding chapter of the book, we summarize the main contributions of our work and point to directions for future investigation. We begin by reviewing the theoretical and empirical arguments in favour of syntax-based collocation extraction. Initially hampered by the absence of appropriate tools, extraction methods based on syntactic parsing are still less popular than syntactically uninformed methods, despite the dramatic advances achieved in parsing technologies. In our work, we have proposed an extraction methodology based on deep syntactic parsing, which we applied to multiple languages and evaluated in a series of experiments. The results showed that syntax-based collocation extraction is feasible, efficient, and particularly desirable, as it enables the proper subsequent processing of extraction results.

Notes

  1. URL: http://www.latl.unige.ch/personal/vseretan/data/annot/Exp1.html and http://www.latl.unige.ch/personal/vseretan/data/annot/Exp2.html.

  2. “The critical problem for the lexicographer has been, heretofore, the treatment of collocations. It has been far more difficult to identify them than idioms or even compounds” (Benson et al., 1986b, 256).

  3. Semantic annotations have been used for multi-word expression extraction, for instance, in Piao et al. (2005).

References

  • Baldwin T, Kim SN (2010) Multiword expressions. In: Indurkhya N, Damerau FJ (eds) Handbook of Natural Language Processing, Second Edition, CRC Press, Taylor and Francis Group, Boca Raton, FL

  • Bannard C (2005) Learning about the meaning of verb-particle constructions from corpora. Computer Speech and Language 19(4):467–478

  • Baroni M, Evert S (2008) Statistical methods for corpus exploitation. In: Lüdeling A, Kytö M (eds) Corpus Linguistics. An International Handbook, Mouton de Gruyter, Berlin, pp 777–803

  • Benson M, Benson E, Ilson R (1986b) Lexicographic Description of English. John Benjamins, Amsterdam/Philadelphia

  • de Caseli HM, Ramisch C, das Graças Volpe Nunes M, Villavicencio A (2010) Alignment-based extraction of multiword expressions. Language Resources and Evaluation Special Issue on Multiword Expressions: Hard Going or Plain Sailing 44(1–2):59–77

  • Cook P, Fazly A, Stevenson S (2008) The VNC-tokens dataset. In: Proceedings of the LREC Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco, pp 19–22

  • Evert S (2004b) The statistics of word cooccurrences: Word pairs and collocations. PhD thesis, University of Stuttgart

  • Evert S (2008a) Corpora and collocations. In: Lüdeling A, Kytö M (eds) Corpus Linguistics. An International Handbook, Mouton de Gruyter, Berlin

  • Fazly A, Stevenson S (2007) Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures. In: Proceedings of the Workshop on A Broader Perspective on Multiword Expressions, Prague, Czech Republic, pp 9–16

  • Gildea D, Palmer M (2002) The necessity of parsing for predicate argument recognition. In: Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, pp 239–246

  • Grossmann F, Tutin A (eds) (2003) Les Collocations. Analyse et traitement. Éditions De Werelt, Amsterdam

  • Heid U, Weller M (2008) Tools for collocation extraction: Preferences for active vs. passive. In: Proceedings of the 6th International Language Resources and Evaluation (LREC’08), Marrakech, Morocco

  • Keller F, Lapata M (2003) Using the web to obtain frequencies for unseen bigrams. Computational Linguistics 29(3):459–484

  • Kilgarriff A, Rychly P, Smrz P, Tugwell D (2004) The Sketch Engine. In: Proceedings of the 11th EURALEX International Congress, Lorient, France, pp 105–116

  • Krenn B (2000a) Collocation mining: Exploiting corpora for collocation identification and representation. In: Proceedings of KONVENS 2000, Ilmenau, Germany, pp 209–214

  • Kurz D, Xu F (2002) Text mining for the extraction of domain relevant terms and term collocations. In: Proceedings of the International Workshop on Computational Approaches to Collocations, Vienna, Austria

  • Leoni de Leon JA (2008) Modèle d’analyse lexico-syntaxique des locutions espagnoles. PhD thesis, University of Geneva

  • L’Homme MC (2003) Combinaisons lexicales spécialisées (CLS) : Description lexicographique et intégration aux banques de terminologie. In: Grossmann F, Tutin A (eds) Les collocations: analyse et traitement, Editions De Werelt, Amsterdam, pp 89–103

  • Maynard D, Ananiadou S (1999) A linguistic approach to terminological context clustering. In: Proceedings of Natural Language Pacific Rim Symposium 99, Beijing, China

  • McCarthy D, Venkatapathy S, Joshi A (2007) Detecting compositionality of verb-object combinations using selectional preferences. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp 369–379

  • McKeown KR, Radev DR (2000) Collocations. In: Dale R, Moisl H, Somers H (eds) A Handbook of Natural Language Processing, Marcel Dekker, New York, NY, pp 507–523

  • Michelbacher L, Evert S, Schütze H (2007) Asymmetric association measures. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2007), Borovetz, Bulgaria

  • Nivre J (2006) Inductive Dependency Parsing (Text, Speech and Language Technology). Springer-Verlag New York, Inc., Secaucus, NJ

  • Pearce D (2001a) Synonymy in collocation extraction. In: Proceedings of the NAACL Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, Pittsburgh, PA, USA, pp 41–46

  • Pearce D (2002) A comparative evaluation of collocation extraction techniques. In: Proceedings of the 3rd International Conference on Language Resources and Evaluation, Las Palmas, Spain, pp 1530–1536

  • Piao SS, Rayson P, Archer D, McEnery T (2005) Comparing and combining a semantic tagger and a statistical tool for MWE extraction. Computer Speech and Language Special Issue on Multiword Expressions 19(4):378–397

  • Rajman M, Besançon R (1998) Text mining — Knowledge extraction from unstructured textual data. In: Proceedings of the 6th Conference of International Federation of Classification Societies (IFCS-98), Roma, Italy, pp 473–480

  • Rayson P, Piao S, Sharoff S, Evert S, Moirón BV (2010) Multiword expressions: hard going or plain sailing? Language Resources and Evaluation Special Issue on Multiword Expressions: Hard Going or Plain Sailing 44(1–2):1–25

  • Ritz J (2006) Collocation extraction: Needs, feeds and results of an extraction system for German. In: Proceedings of the Workshop on Multi-Word-Expressions in a Multilingual Context at the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy, pp 41–48

  • Sinclair J (1991) Corpus, Concordance, Collocation. Oxford University Press, Oxford

  • Smadja F (1993) Retrieving collocations from text: Xtract. Computational Linguistics 19(1):143–177

  • Tutin A (2004) Pour une modélisation dynamique des collocations dans les textes. In: Proceedings of the 11th EURALEX International Congress, Lorient, France, pp 207–219

  • Tutin A (2005) Annotating lexical functions in corpora: Showing collocations in context. In: Proceedings of the 2nd International Conference on the Meaning-Text Theory, Moscow, Russia

  • Villada Moirón B, Tiedemann J (2006) Identifying idiomatic expressions using automatic word-alignment. In: Proceedings of the Workshop on Multi-Word-Expressions in a Multilingual Context, Trento, Italy, pp 33–40

  • Villada Moirón MB (2005) Data-driven identification of fixed expressions and their modifiability. PhD thesis, University of Groningen

  • Villavicencio A, Bond F, Korhonen A, McCarthy D (2005) Introduction to the special issue on multiword expressions: Having a crack at a hard nut. Computer Speech and Language Special Issue on Multiword Expressions 19(4):365–377

  • Wanner L, Bohnet B, Giereth M (2006) Making sense of collocations. Computer Speech & Language 20(4):609–624

  • Weller M, Heid U (2010) Extraction of German multiword expressions from parsed corpora using context features. In: Proceedings of the 7th Conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta

  • Yang S (2003) Machine Learning for collocation identification. In: International Conference on Natural Language Processing and Knowledge Engineering Proceedings (NLP-KE), Beijing, China

  • Zaiu Inkpen D, Hirst G (2002) Acquiring collocations for lexical choice between near-synonyms. In: Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition, Philadelphia, PA, USA, pp 67–76

Author information

Corresponding author

Correspondence to Violeta Seretan.

Copyright information

© 2011 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Seretan, V. (2011). Conclusion. In: Syntax-Based Collocation Extraction. Text, Speech and Language Technology, vol 44. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0134-2_6

  • DOI: https://doi.org/10.1007/978-94-007-0134-2_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-0133-5

  • Online ISBN: 978-94-007-0134-2

  • eBook Packages: Computer Science, Computer Science (R0)
