A Hybrid Approach to Chinese Abbreviation Expansion
This paper presents a hybrid approach to Chinese abbreviation expansion. Each short form in Chinese text is assumed to be created either by reduction or by elimination or generalization. Accordingly, a mapping table between short words and long words is used to generate expansion candidates for reduced short forms, and a dictionary of non-reduced short-form/full-form pairs is used to generate candidates for the others. A hidden Markov model (HMM) based disambiguation step then ranks these candidates and selects a proper expansion for each ambiguous abbreviation. To improve expansion accuracy, linguistic knowledge such as discourse information and abbreviation patterns is further used to double-check the expanded results and revise any erroneous expansions. The proposed approach was evaluated on an abbreviation-expanded corpus built from the Peking University Corpus. The results show that, averaged over different types of Chinese abbreviations, a recall of 83.8% and a precision of 86.3% can be achieved.
Keywords: Chinese abbreviation expansion; hidden Markov models (HMMs); abbreviation disambiguation
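The two-stage idea in the abstract (generate candidates from a mapping table and a dictionary, then rank them with an HMM) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy tables `SHORT_TO_LONG` and `FULL_FORM_DICT`, the function names, the probability values, and the choice of Viterbi decoding with a fixed probability floor for unseen events.

```python
import math
from itertools import product

# Hypothetical toy resources; the paper's actual tables are not reproduced here.
SHORT_TO_LONG = {"北": ["北京", "北方"], "大": ["大学", "大会"]}  # reduction
FULL_FORM_DICT = {"清华": ["清华大学"]}  # non-reduced short-form/full-form pairs


def generate_candidates(token):
    """Return expansion candidates; a token with no known expansion maps to itself."""
    candidates = list(FULL_FORM_DICT.get(token, []))
    if token and all(ch in SHORT_TO_LONG for ch in token):
        # Expand each short word and concatenate every combination.
        for parts in product(*(SHORT_TO_LONG[ch] for ch in token)):
            candidates.append("".join(parts))
    return candidates or [token]


def viterbi_expand(tokens, transition, emission, floor=1e-6):
    """Choose the most probable expansion sequence with Viterbi decoding.

    transition[(prev, cur)] approximates P(cur | prev);
    emission[(cur, token)] approximates P(token | cur).
    Unseen events get the probability `floor` (an assumed smoothing choice).
    """
    if not tokens:
        return []
    lattice = [generate_candidates(t) for t in tokens]
    # best[i][c] = (log-score of the best path ending in c, backpointer)
    best = [{}]
    for c in lattice[0]:
        score = math.log(transition.get(("<s>", c), floor))
        score += math.log(emission.get((c, tokens[0]), floor))
        best[0][c] = (score, None)
    for i in range(1, len(tokens)):
        best.append({})
        for c in lattice[i]:
            emit = math.log(emission.get((c, tokens[i]), floor))
            score, prev = max(
                (best[i - 1][p][0] + math.log(transition.get((p, c), floor)), p)
                for p in lattice[i - 1]
            )
            best[i][c] = (score + emit, prev)
    # Follow backpointers from the best final state.
    last = max(best[-1], key=lambda c: best[-1][c][0])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Made-up probabilities, for illustration only.
TRANSITION = {("在", "北京大学"): 0.2}
EMISSION = {("北京大学", "北大"): 0.9}
expanded = viterbi_expand(["他", "在", "北大", "学习"], TRANSITION, EMISSION)
```

In this toy setup the ambiguous short form 北大 has four candidates, and the context word 在 favours 北京大学. The knowledge-based double-checking step mentioned in the abstract (discourse information and abbreviation patterns) is not modelled in this sketch.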