Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression
In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an empirical learning approach to Chinese abbreviation prediction. Each abbreviation is treated as a reduced form of its definition (expanded form), and abbreviation prediction is formalized as a scoring and ranking problem over abbreviation candidates that are generated automatically from the definition. Using Support Vector Regression (SVR) for scoring yields multiple abbreviation candidates together with their SVR values, which are then used to rank the candidates. Experimental results show that the SVR method performs better than the popular heuristic rule for abbreviation prediction and also outperforms the hidden Markov model (HMM) on this task.
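The generate-then-rank formulation can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: candidates are taken to be order-preserving character subsequences of the definition, and `toy_score` is a hypothetical stand-in for the value a trained SVR model would predict. The function names and the example definition 北京大学 are illustrative only.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def generate_candidates(definition: str, min_len: int = 2) -> List[str]:
    """Enumerate order-preserving character subsequences of the definition.

    Assumption: an abbreviation keeps a subset of the definition's
    characters in their original order and is strictly shorter than it.
    """
    seen = set()
    candidates = []
    for k in range(min_len, len(definition)):
        for idx in combinations(range(len(definition)), k):
            cand = "".join(definition[i] for i in idx)
            if cand not in seen:  # skip duplicates from repeated characters
                seen.add(cand)
                candidates.append(cand)
    return candidates


def rank_candidates(
    definition: str,
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the top_n, highest score first."""
    scored = [(c, score(definition, c)) for c in generate_candidates(definition)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_score(definition: str, candidate: str) -> float:
    """Hypothetical scorer standing in for a trained regressor's output.

    It rewards keeping the first character and prefers two-character
    candidates; a real system would compute features and query the model.
    """
    value = 0.0
    if candidate[0] == definition[0]:
        value += 1.0
    value -= 0.5 * abs(len(candidate) - 2)
    return value


if __name__ == "__main__":
    for cand, value in rank_candidates("北京大学", toy_score):
        print(cand, value)
```

In a real setting, the `score` argument would wrap a regression model trained on definition and abbreviation pairs, so that the ranking reflects learned values rather than the hand-written rule used here.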
Keywords: statistical natural language processing, abbreviation prediction, support vector regression, word clustering