Chinese Abbreviation Identification Using Abbreviation-Template Features and Context Information

  • Xu Sun
  • Houfeng Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4285)


Chinese abbreviations are frequently used without being defined, which poses considerable difficulty for NLP. In this study, the definition-independent abbreviation identification problem is proposed and treated as a classification task in which abbreviation candidates are classified as either ‘abbreviation’ or ‘non-abbreviation’ according to the posterior probability. To identify new abbreviations from existing ones, we add generalization capability to the abbreviation lexicon by replacing words with word classes, thereby creating abbreviation-templates. An SVM model that uses abbreviation-template features as well as context information is employed as the classifier. Evaluation on a raw Chinese corpus yields encouraging performance. Our experiments further demonstrate improvements after integrating morphological analysis, substring analysis and person name identification.
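
The template idea can be illustrated with a minimal sketch. It is not the authors' implementation: the class table `UNIT_CLASS`, the class names, the toy lexicon, and the helper names `to_template`, `build_templates` and `features` are all assumptions made for illustration, and the abstract does not specify how word classes or context features are actually defined. The sketch only shows how replacing surface units with classes lets a lexicon entry match unseen candidates, and how a template-match flag can be combined with context indicators into a feature vector that a classifier such as an SVM could consume.

```python
from typing import Dict, List, Set, Tuple

# Toy class table: an assumption for illustration, not taken from the paper.
UNIT_CLASS: Dict[str, str] = {
    "北": "DIRECTION", "南": "DIRECTION", "东": "DIRECTION",
    "大": "SCHOOL", "师": "SCHOOL",
}


def to_template(units: str) -> Tuple[str, ...]:
    """Replace each unit with its class; unknown units keep their surface form."""
    return tuple(UNIT_CLASS.get(u, u) for u in units)


def build_templates(lexicon: List[str]) -> Set[Tuple[str, ...]]:
    """Generalize every known abbreviation in the lexicon into a template."""
    return {to_template(abbr) for abbr in lexicon}


def features(candidate: str, left: str, right: str,
             templates: Set[Tuple[str, ...]],
             context_vocab: List[str]) -> List[int]:
    """Binary vector: template-match flag, then left and right context indicators."""
    vec = [1 if to_template(candidate) in templates else 0]
    vec += [1 if left == w else 0 for w in context_vocab]
    vec += [1 if right == w else 0 for w in context_vocab]
    return vec


# Hypothetical lexicon with a single known abbreviation.
templates = build_templates(["北大"])
context_vocab = ["在", "的"]

# "南大" is not in the lexicon but shares the template of "北大".
print(features("南大", "在", "的", templates, context_vocab))
```

In this toy setup the candidate "南大" receives a positive template-match feature even though only "北大" is in the lexicon, while a reordered string such as "大北" does not. The classifier itself is left out; any model that accepts fixed-length numeric vectors could be trained on such features.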





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xu Sun
    • 1
  • Houfeng Wang
    • 1
  1. Department of Computer Science and Technology, School of Electronic Engineering and Computer Science, Peking University, Beijing, China
