TermeX: A Tool for Collocation Extraction

  • Davor Delač
  • Zoran Krleža
  • Jan Šnajder
  • Bojana Dalbelo Bašić
  • Frane Šarić
Conference paper

DOI: 10.1007/978-3-642-00382-0_12

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5449)
Cite this paper as:
Delač D., Krleža Z., Šnajder J., Dalbelo Bašić B., Šarić F. (2009) TermeX: A Tool for Collocation Extraction. In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg

Abstract

Collocations, word combinations that occur together more often than chance would predict, have a wide range of NLP applications. Many approaches to automating collocation extraction based on lexical association measures have been proposed in the literature. This paper presents TermeX, a tool for efficient extraction of collocations based on a variety of association measures. TermeX implements POS filtering and lemmatization, and can extract collocations of up to four words. We address the trade-off between memory consumption and processing speed and propose an efficient implementation whose processing time is linear in corpus size and whose memory consumption is linear in the number of word types.
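
To make the idea of a lexical association measure concrete, the following is a minimal sketch that ranks adjacent word pairs by pointwise mutual information (PMI), one commonly used measure. It is not TermeX's implementation: the function name `bigram_pmi`, the `min_count` threshold, and the toy token list are illustrative assumptions, and POS filtering, lemmatization, and n-grams longer than two words are omitted. The counting passes are linear in the number of tokens, but because the sketch stores one counter entry per distinct bigram, it does not reproduce the memory behaviour described in the abstract.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only; not part of TermeX.
    """
    unigrams = Counter(tokens)                  # single pass over the corpus
    bigrams = Counter(zip(tokens, tokens[1:]))  # second pass, adjacent pairs
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            # Rare pairs get unreliable PMI estimates, so skip them.
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus, assumed here purely for demonstration.
tokens = "new york is big and new york is old and the city is big".split()
ranked = bigram_pmi(tokens)
for pair, score in ranked:
    print(pair, round(score, 3))
```

On this toy input, only the pairs that occur at least twice are scored, and the pair whose words rarely appear apart ("new york") ranks first. Other association measures can be swapped in by changing the scoring expression while keeping the same counts.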

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Davor Delač
    • 1
  • Zoran Krleža
    • 1
  • Jan Šnajder
    • 1
  • Bojana Dalbelo Bašić
    • 1
  • Frane Šarić
    • 1
  1. Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia
