Gelbukh A., Sidorov G., Lavin-Villa E., Chanona-Hernandez L. (2010) Automatic Term Extraction Using Log-Likelihood Based Comparison with General Reference Corpus. In: Hopfe C.J., Rezgui Y., Métais E., Preece A., Li H. (eds) Natural Language Processing and Information Systems. NLDB 2010. Lecture Notes in Computer Science, vol 6177. Springer, Berlin, Heidelberg
We present a method for extracting single-word terms for a specific domain. These terms can subsequently serve as candidates for multi-word term extraction. The method is based on comparing domain texts with a general reference corpus using the log-likelihood similarity measure. We also cluster the extracted terms using the k-means algorithm with the cosine similarity measure. We conducted experiments on texts from the computer science domain and analyze the resulting term list in detail.
Keywords: single-word term extraction, log-likelihood, reference corpus, term clustering
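
The comparison with a reference corpus described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes the commonly used two-term log-likelihood (G2) statistic for corpus comparison. The function names `log_likelihood` and `extract_terms`, the whitespace-tokenized input, and the threshold of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom) are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood (G2) for one word.

    a: frequency of the word in the domain corpus
    b: frequency of the word in the reference corpus
    c: total number of tokens in the domain corpus
    d: total number of tokens in the reference corpus
    (Assumed formula; the paper may use a different variant.)
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def extract_terms(domain_tokens, reference_tokens, threshold=3.84):
    """Return (word, score) pairs for words overrepresented in the domain corpus.

    The threshold is a hypothetical default chosen for illustration.
    """
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c = sum(dom.values())
    d = sum(ref.values())
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively more frequent in the domain.
        if a / c <= b / d:
            continue
        score = log_likelihood(a, b, c, d)
        if score >= threshold:
            scored.append((word, score))
    # Highest-scoring candidates first.
    return sorted(scored, key=lambda pair: -pair[1])
```

Calling `extract_terms(domain_tokens, reference_tokens)` on two token lists yields term candidates ranked by score. Words whose relative frequency is the same in both corpora score zero and are filtered out, which is why common function words tend not to appear in the result.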