Automatic Term Extraction Using Log-Likelihood Based Comparison with General Reference Corpus

  • Alexander Gelbukh
  • Grigori Sidorov
  • Eduardo Lavin-Villa
  • Liliana Chanona-Hernandez
Conference paper

DOI: 10.1007/978-3-642-13881-2_26

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6177)
Cite this paper as:
Gelbukh A., Sidorov G., Lavin-Villa E., Chanona-Hernandez L. (2010) Automatic Term Extraction Using Log-Likelihood Based Comparison with General Reference Corpus. In: Hopfe C.J., Rezgui Y., Métais E., Preece A., Li H. (eds) Natural Language Processing and Information Systems. NLDB 2010. Lecture Notes in Computer Science, vol 6177. Springer, Berlin, Heidelberg

Abstract

This paper presents a method for extracting single-word terms for a specific domain. At the next stage, these terms can serve as candidates for multi-word term extraction. The proposed method is based on comparison with a general reference corpus using log-likelihood similarity. We also cluster the extracted terms using the k-means algorithm with the cosine similarity measure. We conducted experiments on texts from the computer science domain and analyze the resulting term list in detail.
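
The comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes the widely used two-corpus log-likelihood ratio (G²) for word frequencies; it is not necessarily the authors' formulation. The function names, the toy token lists, and the threshold of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom) are illustrative assumptions, not values taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(freq_domain, freq_ref, size_domain, size_ref):
    """Two-corpus log-likelihood ratio (G^2) for a single word.

    Assumed formulation: observed frequencies are compared with the
    frequencies expected if the word were equally likely in both corpora.
    """
    total = size_domain + size_ref
    combined = freq_domain + freq_ref
    expected_domain = size_domain * combined / total
    expected_ref = size_ref * combined / total

    score = 0.0
    # A zero observed frequency contributes nothing (limit of x * ln x is 0).
    if freq_domain > 0:
        score += freq_domain * math.log(freq_domain / expected_domain)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * score


def extract_term_candidates(domain_tokens, reference_tokens, threshold=3.84):
    """Rank words that are over-represented in the domain corpus.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    domain_counts = Counter(domain_tokens)
    ref_counts = Counter(reference_tokens)
    n_dom = len(domain_tokens)
    n_ref = len(reference_tokens)

    candidates = []
    for word, f_dom in domain_counts.items():
        f_ref = ref_counts.get(word, 0)
        # Skip words that are not relatively more frequent in the domain corpus.
        if f_dom / n_dom <= f_ref / n_ref:
            continue
        score = log_likelihood(f_dom, f_ref, n_dom, n_ref)
        if score >= threshold:
            candidates.append((word, score))

    # Highest-scoring (most domain-specific) words first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


# Toy data, invented purely for illustration.
domain = ["compiler"] * 5 + ["the"] * 5
reference = ["the"] * 50 + ["house"] * 50
candidates = extract_term_candidates(domain, reference)
```

In this toy example only "compiler" survives: "the" has the same relative frequency in both corpora and is filtered out, which is the behavior one would expect from a comparison against a general reference corpus.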

Keywords

Single-word term extraction, log-likelihood, reference corpus, term clustering

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Alexander Gelbukh (1)
  • Grigori Sidorov (1)
  • Eduardo Lavin-Villa (1)
  • Liliana Chanona-Hernandez (2)

  1. Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico, Mexico
  2. Engineering faculty (ESIME), National Polytechnic Institute (IPN), Mexico, Mexico
