Bridging Collocational and Syntactic Analysis

  • Violeta Seretan
Part of the Quantitative Methods in the Humanities and Social Sciences book series (QMHSS)


The advent of the computer era, which enabled the development of large text corpora and of sophisticated corpus processing tools, led to unprecedented advances in the area of collocational analysis. These advances were paralleled by significant achievements in the area of syntactic analysis, with parsing technologies becoming available for an increasing number of languages. More often than not, however, these developments have taken place independently. The coupling of collocational and syntactic analyses has seldom been considered, even though each type of analysis could benefit the other. In this chapter, we focus on the integration of syntactic parsing and collocational analysis. First, we review the literature describing syntactically informed approaches to collocation extraction. Second, we survey the work devoted to exploiting collocational resources for syntactic parsing. Finally, we discuss more recent work that proposes a joint approach to collocational and syntactic analysis, arguing that the two analyses are interdependent to such a degree that only a simultaneous process, one in which structure decoding and pattern identification go hand in hand, can provide a solid bridge between them.
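To make the idea of syntactically informed collocation extraction more concrete, the Python sketch below treats a word pair as a candidate only when a parser links the two words by a given grammatical relation (here, verb-object), however far apart they are in the surface text, and then ranks the candidates with Dunning's (1993) log-likelihood ratio. It is a minimal illustration under stated assumptions, not the author's own system: the parsed triples are invented toy data standing in for real parser output, and the names `PARSED`, `log_likelihood` and `rank_candidates` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy input: each sentence is a list of (head, relation, dependent)
# lemma triples, standing in for the output of a dependency parser.
PARSED = [
    # "She made a difficult decision."
    [("make", "obj", "decision"), ("decision", "mod", "difficult")],
    # "The decision that they finally made ...": the parser still links
    # the verb and its object, although they are far apart in the text.
    [("make", "obj", "decision")],
    [("make", "obj", "decision"), ("take", "obj", "walk")],
    [("take", "obj", "walk"), ("eat", "obj", "cake")],
    [("eat", "obj", "cake"), ("make", "obj", "cake")],
    [("eat", "obj", "apple")],
]


def xlogx(x):
    # x * ln(x), using the convention 0 * ln(0) = 0.
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood(a, b, c, d):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    a = f(u, v), b = f(u, not v), c = f(not u, v), d = f(not u, not v).
    """
    n = a + b + c + d
    return 2 * (xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
                - xlogx(a + b) - xlogx(c + d)
                - xlogx(a + c) - xlogx(b + d)
                + xlogx(n))


def rank_candidates(parsed, relation):
    # Only pairs connected by `relation` become candidates; other relations
    # (e.g. "mod") and unrelated co-occurring words are ignored.
    pairs = Counter((h, d) for sent in parsed for (h, r, d) in sent if r == relation)
    heads, deps = Counter(), Counter()
    for (h, d), f in pairs.items():
        heads[h] += f
        deps[d] += f
    n = sum(pairs.values())
    scored = []
    for (h, d), a in pairs.items():
        b = heads[h] - a   # head with another dependent
        c = deps[d] - a    # dependent with another head
        scored.append(((h, d), log_likelihood(a, b, c, n - a - b - c)))
    # Highest association first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


for pair, score in rank_candidates(PARSED, "obj"):
    print(pair, round(score, 3))
```

Because G^2 measures deviation from independence in either direction, a weakly repelled pair (observed frequency below its expected frequency) still receives a positive score; practical extractors often also check that the observed count exceeds the expected one before accepting a pair as a collocation candidate.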



I am grateful to the anonymous reviewers, whose comments and suggestions allowed me to improve the chapter.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. University of Geneva, Geneva, Switzerland
  2. University of Lausanne, Lausanne, Switzerland
