Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes

Lexical Collocation Analysis

Abstract

Collocation candidate extraction from dependency-annotated corpora has become increasingly mainstream in collocation research in recent years. In most studies, however, the results of a single parser are compared only to those of relatively “dumb” window-based approaches. To the best of our knowledge, the impact of the parser and its parsing scheme has not yet been studied systematically. This chapter evaluates a total of 8 parsers on 2 corpora with 20 different association measures plus several frequency thresholds for 6 different types of collocations against the Oxford Collocations Dictionary for Students of English (2nd edition; 2009). We find that both the parser and the parsing scheme affect the quality of collocation candidate extraction. The performance of different parsers can differ substantially across collocation types. The filters used to extract different types of collocations from the corpora also play an important role in the observed trade-off between precision and recall. Furthermore, we find that carefully sampled and balanced corpora (such as the BNC) seem to have considerable advantages in precision, whereas larger, less balanced corpora (such as the web corpus used in this study) take the lead in total coverage. Overall, log-likelihood is the best association measure, but for some specific types of collocation (such as adjective-noun or verb-adverb), other measures perform even better.
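To make the evaluation setup in the abstract more concrete, the following minimal sketch scores candidate pairs with the log-likelihood association measure and then checks an n-best list against a gold standard. It assumes the standard G² formulation over a 2×2 contingency table of co-occurrence frequencies; the exact variant, the frequency counts, the word pairs, and the function names `log_likelihood` and `precision_recall` are illustrative assumptions, not taken from the chapter.

```python
import math


def log_likelihood(o11, o12, o21, o22):
    """G2 statistic for a 2x2 contingency table of observed frequencies.

    o11: pairs with both words; o12: word 1 without word 2;
    o21: word 2 without word 1; o22: pairs with neither word.
    """
    n = o11 + o12 + o21 + o22
    if n == 0:
        raise ValueError("contingency table is empty")
    r1, r2 = o11 + o12, o21 + o22  # row totals
    c1, c2 = o11 + o21, o12 + o22  # column totals
    observed = (o11, o12, o21, o22)
    expected = (r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)
    # Cells with an observed frequency of 0 contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def precision_recall(ranked, gold, n_best):
    """Precision and recall of the n_best top-ranked candidates against a gold set."""
    top = ranked[:n_best]
    hits = sum(1 for pair in top if pair in gold)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical candidate pairs with made-up contingency tables.
candidates = {
    ("make", "decision"): (30, 70, 70, 9830),
    ("strong", "tea"): (12, 88, 38, 9862),
    ("big", "decision"): (2, 98, 98, 9802),
}
# Hypothetical gold standard standing in for dictionary entries.
gold = {("make", "decision"), ("strong", "tea"), ("heavy", "rain")}

# Rank candidates by descending association score.
ranked = sorted(candidates, key=lambda pair: log_likelihood(*candidates[pair]), reverse=True)
print(ranked)
print(precision_recall(ranked, gold, n_best=2))
```

Note that plain G² is two-sided: it measures deviation from independence in either direction, so a practical ranking may additionally want to check that the observed co-occurrence frequency exceeds the expected one.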

Notes

  1. See Bartsch (2004: 27–39, 58–78) for a detailed overview.

  2. http://www.cl.cam.ac.uk/~sc609/candc-1.00.html

  3. http://www.maltparser.org/

  4. http://www.maltparser.org/mco/mco.html

  5. https://emorynlp.github.io/nlp4j/

  6. https://spacy.io/

  7. http://universaldependencies.org

  8. https://stanfordnlp.github.io/CoreNLP/

  9. https://github.com/tensorflow/models/tree/master/syntaxnet

  10. To date, the following revisions have been released: 1.0, 1.1, 1.2, 1.3, 1.4, 2.0.

  11. https://opennlp.apache.org/

  12. Unfiltered data can be used to maximize recall, since parsers are generally better at predicting that two items should be connected by a dependency relation than at predicting which type of dependency relation connects them. In the technical terms of parser evaluation, this is the difference between unlabelled and labelled attachment.

  13. http://www.collocations.de/software.html

  14. There are relatively few candidate pairs for verb-adjective and adverb-adjective collocations; the largest numbers of pairs are found for noun-verb (both subjects and objects) and noun-adjective collocations.

  15. That is, of course, if the definition of the collocation type is regarded as a lexical phenomenon with the terminology based on the canonical active-declarative structure.

  16. Except for graphs where the high frequency threshold leads to a coverage of less than 50%.

  17. CoreNLP produces a parsing error on this sentence, so that Grapeshot stores is wrongly analysed as a nominal compound.

  18. The same is true of the alternative form “peace be upon him”, which occurs more than 10,000 times but does not propel peace + be into the top 1,000 collocation candidates.

  19. The list for CoreNLP enhanced++ only contains four of them.
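The distinction between unlabelled and labelled attachment drawn in note 12 can be illustrated with a small sketch. It computes the two standard parser evaluation scores for a single sentence; the function name `attachment_scores`, the example sentence, and the relation labels are illustrative assumptions rather than material from the chapter.

```python
def attachment_scores(gold, predicted):
    """Return (unlabelled, labelled) attachment scores for one sentence.

    Both arguments are lists with one (head_index, relation_label) tuple
    per token; head index 0 stands for the artificial root node.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted parses differ in length")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # Unlabelled: only the head has to match.
    head_hits = sum(1 for (g_head, _), (p_head, _) in zip(gold, predicted) if g_head == p_head)
    # Labelled: head and relation label both have to match.
    labelled_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return head_hits / len(gold), labelled_hits / len(gold)


# "She makes a decision": tokens 1 = She, 2 = makes, 3 = a, 4 = decision.
gold_parse = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
# Hypothetical parser output: "decision" is attached correctly but mislabelled.
predicted_parse = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "nmod")]

uas, las = attachment_scores(gold_parse, predicted_parse)
print(uas, las)
```

In this toy case every head is correct, so an extraction that ignores relation labels would still recover the pair make + decision, whereas a filter that requires an object relation would miss it. This is the recall advantage of unfiltered data that the note describes.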

References

  • Ambati, B. R., Reddy, S., & Kilgarriff, A. (2012). Word sketches for Turkish. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the eighth international conference on language resources and evaluation (LREC’12) (pp. 2945–2950). Istanbul: European Language Resources Association http://www.lrec-conf.org/proceedings/lrec2012/pdf/585_Paper.pdf.

  • Andor, D., Alberti, C., Weiss, D., Severyn, A., Presta, A., Ganchev, K., Petrov, S., & Collins, M. (2016). Globally normalized transition-based neural networks. In Proceedings of the 54th annual meeting of the Association for Computational Linguistics (ACL'16) (pp. 2442–2452). Berlin: Association for Computational Linguistics http://aclweb.org/anthology/P16-1231.

  • Bartsch, S. (2004). Structural and functional properties of collocations in English. A corpus study of lexical and pragmatic constraints on lexical co-occurrence. Tübingen: Narr.

  • Bartsch, S., & Evert, S. (2014). Towards a Firthian notion of collocation. OPAL – Online publizierte Arbeiten zur Linguistik, 2(2014), 48–61 http://pub.ids-mannheim.de/laufend/opal/pdf/opal2014-2.pdf.

  • Basili, R., Pazienza, M. T., & Velardi, P. (1994). A ‘not-so-shallow’ parser for collocational analysis. In Proceedings of the 15th conference on computational linguistics (COLING’94) (pp. 447–453). Tokyo: Association for Computational Linguistics http://aclweb.org/anthology/C94-1074.

  • Blaheta, D., & Johnson, M. (2001). Unsupervised learning of multi-word verbs. In Proceedings of the ACL workshop on collocation: Computational extraction, analysis and exploitation (pp. 54–60). Toulouse. http://web.science.mq.edu.au/~mjohnson/papers/2001/dpb-colloc01.pdf.

  • Chen, D., & Manning, C. D. (2014). A fast and accurate dependency parser using neural networks. In A. Moschitti, B. Pang, & W. Daelemans (Eds.), Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP'14) (pp. 740–750). Doha: Association for Computational Linguistics http://aclweb.org/anthology/D14-1082.

  • Choi, J. D., & McCallum, A. (2013). Transition-based dependency parsing with selectional branching. In Proceedings of the 51st annual meeting of the Association for Computational Linguistics (ACL'13) (pp. 1052–1062). Sofia: Association for Computational Linguistics http://aclweb.org/anthology/P13-1104.

  • Choi, J. D., & Palmer, M. (2011). Getting the most out of transition-based dependency parsing. In Proceedings of the 49th annual meeting of the Association for Computational Linguistics: Human language technologies (ACL'11) (pp. 687–692). Portland: Association for Computational Linguistics http://aclweb.org/anthology/P11-2121.

  • Choi, J. D., & Palmer, M. (2012). Guidelines for the Clear Style Constituent to Dependency Conversion. Institute of Cognitive Science Technical Report 01-12, University of Colorado Boulder.

  • Church, K., Gale, W., Hanks, P., & Hindle, D. (1989). Parsing, word associations and typical predicate-argument relations. In Speech and natural language: Proceedings of a workshop held at Cape Cod, Massachusetts, October 15–18, 1989 (pp. 75–81). Cape Cod. http://aclweb.org/anthology/H89-2012.

  • Clark, S., & Curran, J. R. (2007). Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4), 493–556 http://aclweb.org/anthology/J07-4004.

  • Evert, S. (2004). The statistics of word cooccurrences. Word pairs and collocations. Ph.D. thesis, Institut für maschinelle Sprachverarbeitung, Universität Stuttgart. Published in 2005 http://elib.uni-stuttgart.de/opus/volltexte/2005/2371/.

  • Evert, S., & Krenn, B. (2001). Methods for the qualitative evaluation of lexical association measures. In Proceedings of the 39th annual meeting of the Association for Computational Linguistics (ACL’01) (pp. 188–195). Toulouse: Association for Computational Linguistics http://www.aclweb.org/anthology/P01-1025.

  • Evert, S., Uhrig, P., Bartsch, S., & Proisl, T. (2017). E-VIEW-alation – A large-scale evaluation study of association measures for collocation identification. In Proceedings of eLex 2017 – Electronic lexicography in the 21st century: Lexicography from Scratch (pp. 531–549). Leiden: Lexical Computing https://elex.link/elex2017/wp-content/uploads/2017/09/paper32.pdf.

  • Farahmand, M., & Henderson, J. (2016). Modeling the non-substitutability of multiword expressions with distributional semantics and a log-linear model. In Proceedings of the 12th workshop on multiword expressions (pp. 61–66). Berlin: Association for Computational Linguistics https://aclweb.org/anthology/W16-1809.

  • Gries, S. T. (2013). 50-something years of work on collocations: What is or should be next …. International Journal of Corpus Linguistics, 18(1), 137–165.

  • Gries, S. T., & Stefanowitsch, A. (2004). Covarying collexemes in the into-causative. In M. Achard & S. Kemmer (Eds.), Language, culture, and mind (pp. 225–236). Stanford, CA: CSLI.

  • Heid, U., Fritzinger, F., Hauptmann, S., Weidenkaff, J., & Weller, M. (2008). Providing corpus data for a dictionary for German juridical phraseology. In A. Storrer, A. Geyken, A. Siebert, & K.-M. Würzner (Eds.), Text resources and lexical knowledge. Selected papers from the 9th conference on natural language processing, KONVENS 2008, Berlin, Germany (pp. 131–144). Berlin/Boston: Mouton de Gruyter. https://doi.org/10.1515/9783110211818.2.131

  • Herbst, T. (1996). What are collocations: Sandy beaches or false teeth? In English studies (Vol. 1996/4, pp. 379–393).

  • Ivanova, K., Heid, U., Schulte im Walde, S., Kilgarriff, A., & Pomikalek, J. (2008). Evaluating a German sketch grammar: A case study on noun phrase case. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the sixth international conference on language resources and evaluation (LREC’08) (pp. 2101–2107). Marrakech: European Language Resources Association http://www.lrec-conf.org/proceedings/lrec2008/pdf/537_paper.pdf.

  • Johansson, R., & Nugues, P. (2007). Extended constituent-to-dependency conversion for English. In Proceedings of NODALIDA 2007 (pp. 105–112). Tartu. http://dspace.ut.ee/bitstream/handle/10062/2560/reg-Johansson-10.pdf.

  • Johnson, M. (1999). Confidence intervals on likelihood estimates for estimating association strengths. Unpublished technical report.

  • Katz, G., & Giesbrecht, E. (2006). Automatic identification of non-compositional multi-word expressions using latent semantic analysis. In Proceedings of the workshop on multiword expressions: Identifying and exploiting underlying properties (MWE’06) (pp. 12–19). Sydney: Association for Computational Linguistics http://aclweb.org/anthology/W06-1203.

  • Kermes, H., & Heid, U. (2003). Using chunked corpora for the acquisition of collocations and idiomatic expressions. In F. Kiefer & J. Pajzs (Eds.), Proceedings of 7th conference on computational lexicography and corpus research. Budapest: Research Institute for Linguistics, Hungarian Academy of Sciences.

  • Kiela, D., & Clark, S. (2013). Detecting compositionality of multi-word expressions using nearest neighbours in vector space models. In Proceedings of the 2013 conference on empirical methods in natural language processing (EMNLP’13) (pp. 1427–1432). Seattle: Association for Computational Linguistics http://www.aclweb.org/anthology/D13-1147.

  • Kilgarriff, A., Rychlý, P., Smrz, P., & Tugwell, D. (2004). The sketch engine. In G. Williams & S. Vessier (Eds.), Proceedings of the 11th EURALEX international congress (pp. 105–115). Lorient: Université de Bretagne-Sud, Faculté des lettres et des sciences humaines http://euralex.org/wp-content/themes/euralex/proceedings/Euralex%202004/011_2004_V1_Adam%20KILGARRIFF,%20Pavel%20RYCHLY,%20Pavel%20SMRZ,%20David%20TUGWELL_The%20%20Sketch%20Engine.pdf.

  • Kilgarriff, A., Rychlý, P., Jakubicek, M., Kovář, V., Baisa, V., & Kocincová, L. (2014). Extrinsic corpus evaluation with a collocation dictionary task. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the ninth international conference on language resources and evaluation (LREC’14). Reykjavik: European Language Resources Association http://www.lrec-conf.org/proceedings/lrec2014/pdf/52_Paper.pdf.

  • Klotz, M., & Herbst, T. (2016). English dictionaries: A linguistic introduction. Berlin: Erich Schmidt.

  • Lin, D. (1998). Extracting collocations from text corpora. In Proceedings of the first workshop on computational terminology (pp. 57–63). Montreal.

  • Lin, D. (1999). Automatic identification of non-compositional phrases. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics (ACL’99) (pp. 317–324). Morristown: Association for Computational Linguistics http://aclweb.org/anthology/P99-1041.

  • Lü, Y., & Zhou, M. (2004). Collocation translation acquisition using monolingual corpora. In Proceedings of the 42nd meeting of the Association for Computational Linguistics (ACL’04) (pp. 167–174). Barcelona: Association for Computational Linguistics http://aclweb.org/anthology/P04-1022.

  • Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (ACL'14) (pp. 55–60). Baltimore: Association for Computational Linguistics http://aclweb.org/anthology/P14-5010.

  • Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330 http://aclweb.org/anthology/J93-2004.

  • Marneffe, M.-C. de & Manning, C. D. (2008). Stanford dependencies manual. https://nlp.stanford.edu/software/dependencies_manual.pd

  • Nerima, L., Seretan, V., & Wehrli, E. (2003). Creating a multilingual collocations dictionary from large text corpora. In Companion volume to the proceedings of the 10th conference of the European chapter of the Association for Computational Linguistics (EACL’03) (pp. 131–134). Budapest: Association for Computational Linguistics http://aclweb.org/anthology/E03-1022.

  • Nissim, M., & Zaninello, A. (2013). Modeling the internal variability of multi-word expressions through a pattern-based method. ACM Transactions on Speech and Language Processing (TSLP), 10(2), 7:1–7:26. https://doi.org/10.1145/2483691.2483696

  • Nivre, J. (2009). Non-projective dependency parsing in expected linear time. In Proceedings of the 47th annual meeting of the Association for Computational Linguistics and the 4th international joint conference on natural language processing of the AFNLP (ACL'09) (pp. 351–359). Singapore: Association for Computational Linguistics http://www.aclweb.org/anthology/P09-1040.

  • Pearce, D. (2001). Synonymy in collocation extraction. In Proceedings of the NAACL workshop on WordNet and other lexical resources: Applications, extensions and customizations (pp. 41–46). Pittsburgh: Association for Computational Linguistics.

  • Pearce, D. (2002). A comparative evaluation of collocation extraction techniques. In Proceedings of the third international conference on language resources and evaluation (LREC’02) (pp. 1530–1536). Las Palmas: European Language Resources Association http://www.lrec-conf.org/proceedings/lrec2002/pdf/169.pdf.

  • Pecina, P. (2005). An extensive empirical study of collocation extraction methods. In Proceedings of the ACL student research workshop (pp. 13–18). Ann Arbor: Association for Computational Linguistics http://aclweb.org/anthology/P05-2003.

  • Pecina, P. (2010). Lexical association measures and collocation extraction. Language Resources and Evaluation, 44, 137–158 https://doi.org/10.1007/s10579-009-9101-4.

  • Pecina, P., & Schlesinger, P. (2006). Combining association measures for collocation extraction. In Proceedings of the COLING/ACL 2006 main conference poster sessions (pp. 651–658). Sydney: Association for Computational Linguistics http://aclweb.org/anthology/P06-2084.

  • Rodríguez-Fernández, S., Anke, L. E., Carlini, R., & Wanner, L. (2016). Semantics-driven recognition of collocations using word embeddings. In Proceedings of the 54th annual meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 499–505). Berlin: Association for Computational Linguistics https://doi.org/10.18653/v1/P16-2081.

  • Sangati, F., & van Cranenburgh, A. (2015). Multiword expression identification with recurring tree fragments and association measures. In Proceedings of the 11th workshop on multiword expressions (pp. 10–18). Denver: Association for Computational Linguistics https://doi.org/10.3115/v1/W15-0902.

  • Schäfer, R. (2015). Processing and querying large web corpora with the COW14 architecture. In P. Bański, H. Biber, E. Breiteneder, M. Kupietz, H. Lüngen, & A. Witt (Eds.), Proceedings of the 3rd workshop on challenges in the Management of Large Corpora (CMLC-3) (pp. 28–34). Mannheim: IDS Publication Server https://ids-pub.bsz-bw.de/files/3826/Schaefer_Processing_and_querying_large_web_corpora_2015.pdf.

  • Schäfer, R., & Bildhauer, F. (2012). Building large corpora from the web using a new efficient tool chain. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the eighth international conference on language resources and evaluation (LREC’12) (pp. 486–493). Istanbul: European Language Resources Association http://www.lrec-conf.org/proceedings/lrec2012/pdf/834_Paper.pdf.

  • Schulte im Walde, S. (2003). A collocation database for German verbs and nouns. In Proceedings of the 7th conference on computational lexicography and text research (COMPLEX’03) (pp. 73–81). Budapest. http://www.ims.uni-stuttgart.de/institut/mitarbeiter/schulte/publications/workshop/complex-03.pdf.

  • Schuster, S., & Manning, C. D. (2016). Enhanced English universal dependencies: An improved representation for natural language understanding tasks. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the tenth international conference on language resources and evaluation (LREC’16) (pp. 2371–2378). Portorož: European Language Resources Association http://www.lrec-conf.org/proceedings/lrec2016/pdf/779_Paper.pdf.

  • Seretan, V. (2008). Collocation extraction based on syntactic parsing. Ph.D. thesis, Faculté des lettres, Université de Genève http://www.issco.unige.ch/en/staff/seretan/publ/PhDThesis-VioletaSeretan.pdf.

  • Seretan, V., & Wehrli, E. (2006). Accurate collocation extraction using a multilingual parser. In Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the Association for Computational Linguistics (pp. 953–960). Sydney: Association for Computational Linguistics http://aclweb.org/anthology/P06-1120.

  • Seretan, V., Nerima, L., & Wehrli, E. (2003). Extraction of multi-word collocations using syntactic bigram composition. In Proceedings of the fourth international conference on recent advances in NLP (RANLP-2003) (pp. 424–431). https://archive-ouverte.unige.ch/unige:17034.

  • Seretan, V., Nerima, L., & Wehrli, E. (2004). Multi-word collocation extraction by syntactic composition of collocation bigrams. In N. Nicolov, K. Bontcheva, G. Angelova, & R. Mitkov (Eds.), Recent advances in natural language processing III. Selected papers from RANLP 2003 (pp. 91–100). Amsterdam/Philadelphia: John Benjamins https://doi.org/10.1075/cilt.260.10ser.

  • Squillante, L. (2014). Towards an empirical subcategorization of multiword expressions. In Proceedings of the 10th workshop on multiword expressions (MWE 2014) (pp. 77–81). Gothenburg: Association for Computational Linguistics http://www.aclweb.org/anthology/W14-0813.

  • Steedman, M. (2000). The syntactic process. Cambridge, MA: The MIT Press.

  • Stefanowitsch, A., & Gries, S. T. (2005). Covarying collexemes. Corpus Linguistics and Linguistic Theory, 1(1), 1–43. https://doi.org/10.1515/cllt.2005.1.1.1.

  • Stefanowitsch, A., & Gries, S. T. (2009). Corpora and grammar. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 2, pp. 933–952). Berlin, DE/New York, NY: Walter de Gruyter.

  • Teufel, S., & Grefenstette, G. (1995). Corpus-based method for automatic identification of support verbs for nominalizations. In Proceedings of the seventh conference of the European chapter of the Association for Computational Linguistics (EACL’95) (pp. 98–103). Dublin: Association for Computational Linguistics http://aclweb.org/anthology/E95-1014.

  • Tsvetkov, Y., & Wintner, S. (2014). Identification of multiword expressions by combining multiple linguistic information sources. Computational Linguistics, 40(2), 449–468 https://doi.org/10.1162/COLI_a_00177.

  • Uhrig, P., & Proisl, T. (2012). Less hay, more needles – Using dependency-annotated corpora to provide lexicographers with more accurate lists of collocation candidates. Lexicographica, 28, 141–180 https://doi.org/10.1515/lexi.2012-0009.

  • Villada Moirón, M. B. (2005). Data-driven identification of fixed expressions and their modifiability. Ph.D. thesis, University of Groningen http://www.rug.nl/research/portal/files/9790774/thesis.pdf.

  • Weller, M., & Heid, U. (2010). Extraction of German multiword expressions from parsed corpora using context features. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the seventh international conference on language resources and evaluation (LREC’10) (pp. 3195–3201). Valletta: European Language Resources Association http://lrec-conf.org/proceedings/lrec2010/pdf/428_Paper.pdf.

  • Wermter, J., & Hahn, U. (2006). You can’t beat frequency (unless you use linguistic knowledge) – A qualitative evaluation of association measures for collocation and term extraction. In Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the ACL (ACL’06) (pp. 785–792). Sydney: Association for Computational Linguistics http://aclweb.org/anthology/P06-1099.

  • Wiechmann, D. (2008). On the computation of collostruction strength: Testing measures of association as expressions of lexical bias. Corpus Linguistics and Linguistic Theory, 4(2), 253–290 https://doi.org/10.1515/CLLT.2008.011.

  • Yazdani, M., Farahmand, M., & Henderson, J. (2015). Learning semantic composition to detect non-compositionality of multiword expressions. In Proceedings of the 2015 conference on empirical methods in natural language processing (EMNLP’15) (pp. 1733–1742). Lisbon: Association for Computational Linguistics http://www.aclweb.org/anthology/D15-1201.

  • Zinsmeister, H., & Heid, U. (2003). Significant triples: Adjective+noun+verb combinations. In Proceedings of the 7th conference on computational lexicography and text research (COMPLEX 2003). Budapest. http://www.ims.uni-stuttgart.de/%7Ezinsmeis/pubs/SigColl-paper.pdf.

  • Zinsmeister, H., & Heid, U. (2004). Collocations of complex nouns: Evidence for lexicalisation. In Proceedings of KONVENS 2004. Vienna. https://pdfs.semanticscholar.org/3e5d/d62cbe41b8aa4bbdf37231b85b9b7ef7d94e.pdf.

Dictionaries

  • OALD8 = Oxford Advanced Learner’s Dictionary of Current English, 8th edition (2010). Edited by Joanna Turnbull. Oxford: Oxford University Press.

  • OCD2 = Oxford Collocations Dictionary for Students of English, 2nd edition (2009). Edited by Colin McIntosh. Oxford: Oxford University Press.

Author information

Corresponding author

Correspondence to Peter Uhrig.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this chapter

Uhrig, P., Evert, S., Proisl, T. (2018). Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes. In: Cantos-Gómez, P., Almela-Sánchez, M. (eds) Lexical Collocation Analysis. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-92582-0_6
