Research on Domain Term Extraction Based on Conditional Random Fields

  • Dequan Zheng
  • Tiejun Zhao
  • Jing Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5459)


Domain term extraction is an important task in natural language processing, and it is widely applied in information retrieval, information extraction, data mining, machine translation, and other information processing fields. In this paper, we propose an automatic domain term extraction method based on conditional random fields (CRF). We treat domain term extraction as a sequence labeling problem and use the distribution characteristics of terms as features of the CRF model. We then use a CRF tool to train a template for term extraction. Experimental results show that the method is simple, applies to common domains, and achieves good results. In the open test, precision was 79.63%, recall was 73.54%, and F-measure was 76.46%.
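As a minimal sketch of the sequence labeling formulation, the Python snippet below assigns one label per token and recomputes the F-measure from the reported precision and recall. The B/I/O label set, the function names, and the example sentence are assumptions made for illustration; the abstract does not state which label scheme or CRF toolkit was used.

```python
from typing import List, Tuple


def bio_labels(tokens: List[str], term_spans: List[Tuple[int, int]]) -> List[str]:
    """Label each token B (term start), I (inside a term), or O (outside).

    term_spans holds half-open (start, end) token index ranges. The BIO
    scheme is an assumption; the abstract does not specify the label set.
    """
    labels = ["O"] * len(tokens)
    for start, end in term_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence in which tokens 2..4 form one domain term.
tokens = ["we", "train", "conditional", "random", "fields", "models"]
labels = bio_labels(tokens, [(2, 5)])
print(list(zip(tokens, labels)))

# Precision and recall (in percent) as reported for the open test.
print(round(f_measure(79.63, 73.54), 2))
```

With labels of this kind, a CRF is trained to predict the label sequence from token-level features, and extracted terms are read off as each B followed by its run of I labels. The second printed value is the harmonic mean of the reported precision and recall, which agrees with the reported F-measure of 76.46%.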


Term Extraction, CRF Model, Unithood, Termhood





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Dequan Zheng (1)
  • Tiejun Zhao (1)
  • Jing Yang (1)
  1. MOE-MS Key Laboratory of NLP and Speech, Harbin Institute of Technology, Harbin, China
