Automatic Term Extraction Using Log-Likelihood Based Comparison with General Reference Corpus

  • Alexander Gelbukh
  • Grigori Sidorov
  • Eduardo Lavin-Villa
  • Liliana Chanona-Hernandez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6177)


In this paper we present a method for extracting single-word terms for a specific domain. At the next stage, these terms can serve as candidates for multi-word term extraction. The proposed method is based on comparison with a general reference corpus using log-likelihood similarity. We also cluster the extracted terms using the k-means algorithm with the cosine similarity measure. We conducted experiments on texts from the domain of computer science. The resulting term list is analyzed in detail.
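The abstract does not state the exact formula, so the following is only a minimal sketch of how a log-likelihood comparison against a reference corpus is commonly computed. It assumes the widely used two-corpus form of the statistic, G2 = 2 (a ln(a/E1) + b ln(b/E2)), where a and b are the word's counts in the domain and reference corpora and E1, E2 are the counts expected if the word were equally likely in both. The function names, the token-list input, and the rule of keeping only words that are relatively more frequent in the domain corpus are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood (G2) score for one word.

    a: count of the word in the domain corpus (size c tokens)
    b: count of the word in the reference corpus (size d tokens)
    Assumes c + d > 0 and a + b > 0.
    """
    total = a + b
    # Expected counts if the word had the same relative frequency in both corpora.
    e1 = c * total / (c + d)
    e2 = d * total / (c + d)
    score = 0.0
    # A zero count contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        score += a * math.log(a / e1)
    if b > 0:
        score += b * math.log(b / e2)
    return 2.0 * score


def extract_candidates(domain_tokens, reference_tokens, top_n=10):
    """Rank domain words by log-likelihood against the reference corpus.

    Hypothetical helper: both arguments are non-empty lists of word tokens.
    """
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c = sum(dom.values())
    d = sum(ref.values())
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        # Assumption: keep only words over-represented in the domain corpus,
        # since the statistic itself does not indicate the direction.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

A word with the same relative frequency in both corpora scores 0, and the score grows as the domain frequency departs from the reference frequency. The ranked list could then be cut off at a threshold or at a fixed number of candidates before any clustering step.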


Single-word term extraction, log-likelihood, reference corpus, term clustering





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Alexander Gelbukh (1)
  • Grigori Sidorov (1)
  • Eduardo Lavin-Villa (1)
  • Liliana Chanona-Hernandez (2)
  1. Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico, Mexico
  2. Engineering Faculty (ESIME), National Polytechnic Institute (IPN), Mexico, Mexico
