Comparative Evaluation and Integration of Collocation Extraction Metrics

  • Conference paper
Text, Speech, and Dialogue (TSD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Abstract

The paper deals with collocation extraction from corpus data. A number of formulae have been proposed to integrate the different factors that determine the association between the components of a collocation. We describe experiments whose objective was to study collocation extraction based on statistical association measures. The work focuses on bigram collocations. The precision data obtained for the measures show, to some degree, that some measures are more precise than others. No measure is ideal, which is why various ways of integrating them are desirable and useful. We propose a number of parameters for ranking collocates in a combined list, namely an average rank, a normalized rank, and an optimized rank.
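
The idea of combining several association measures into one ranked list can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: the three measures used here (MI, t-score, logDice) are common textbook choices rather than the paper's confirmed set, the function names `bigram_scores` and `average_rank` are hypothetical, and "average rank" is taken to mean the mean of a bigram's rank positions across measures. The normalized and optimized ranks are not sketched because their definitions are not given here.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score every adjacent bigram with three common association measures.

    Assumption: MI, t-score and logDice are illustrative choices only.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), f12 in bigrams.items():
        f1, f2 = unigrams[w1], unigrams[w2]
        expected = f1 * f2 / n  # expected co-occurrence under independence
        scores[(w1, w2)] = {
            "mi": math.log2(f12 / expected),
            "t_score": (f12 - expected) / math.sqrt(f12),
            "log_dice": 14 + math.log2(2 * f12 / (f1 + f2)),
        }
    return scores


def average_rank(scores):
    """Rank bigrams by the mean of their per-measure rank positions.

    Assumption: rank 1 is the highest score; ties are broken by the
    stable sort order rather than by shared ranks.
    """
    measures = sorted({m for s in scores.values() for m in s})
    rank_sums = {bg: 0 for bg in scores}
    for m in measures:
        ordered = sorted(scores, key=lambda bg: scores[bg][m], reverse=True)
        for rank, bg in enumerate(ordered, start=1):
            rank_sums[bg] += rank
    averaged = {bg: total / len(measures) for bg, total in rank_sums.items()}
    # Lower average rank means the measures agree the bigram is strong.
    return sorted(averaged.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    toy = "strong tea and strong coffee and weak tea".split()
    for bigram, rank in average_rank(bigram_scores(toy)):
        print(bigram, round(rank, 2))
```

Averaging rank positions instead of raw scores avoids comparing measures that live on different scales, which is one plausible motivation for rank-based integration.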

Acknowledgments

This work was partly supported by a grant from the Russian Foundation for Humanities (research project No. 16-04-12019).

Author information

Corresponding author

Correspondence to Victor Zakharov.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zakharov, V. (2017). Comparative Evaluation and Integration of Collocation Extraction Metrics. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_29

  • DOI: https://doi.org/10.1007/978-3-319-64206-2_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64205-5

  • Online ISBN: 978-3-319-64206-2

  • eBook Packages: Computer Science, Computer Science (R0)
