Chinese Terminology Extraction Using Window-Based Contextual Information

  • Luning Ji
  • Mantai Sum
  • Qin Lu
  • Wenjie Li
  • Yirong Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4394)

Abstract

Terminology extraction is an important task for automatically updating domain-specific knowledge. Contextual information helps to decide whether newly extracted terms are terminology or not. Because extraction based on fixed patterns is of very limited use in handling natural language text, both syntactic and semantic information in the context of a term is needed to determine its termhood. In this paper, we investigate two window-based context word extraction methods that take syntactic and semantic information into account. Based on the performance of each method individually, we propose a hybrid method that combines both syntactic and semantic information. Experiments show that the hybrid method achieves a significant improvement.
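
As a rough illustration of what window-based context word extraction might look like, the sketch below collects the words that fall within a fixed window on either side of each occurrence of a candidate term. The function name `context_words`, the window size, and the toy token list are assumptions made for illustration only; the two methods studied in the paper, which also draw on syntactic and semantic information, are not reproduced here.

```python
from collections import Counter


def context_words(tokens, term, window=2):
    """Count words within `window` positions to the left and right of
    each occurrence of `term` in a segmented token list.

    Hypothetical helper: the abstract does not specify the window size
    or how context words are weighted.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        left = tokens[max(0, i - window):i]       # up to `window` words before the term
        right = tokens[i + 1:i + 1 + window]      # up to `window` words after the term
        counts.update(left + right)
    return counts


# Toy input standing in for the output of a Chinese word segmenter.
tokens = ["we", "build", "a", "database", "index", "quickly"]
ctx = context_words(tokens, "database", window=2)
```

Here `ctx` holds the frequency of each word seen near the candidate term "database". A termhood score could then be derived from such counts, but how that is done is left open in this sketch.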

Keywords

Chinese terminology; terminology extraction; window-based contextual word; termhood; unithood

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Luning Ji (1)
  • Mantai Sum (1)
  • Qin Lu (1)
  • Wenjie Li (1)
  • Yirong Chen (1)

  1. The Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China