Abstract
People usually type Chinese on a computer with input method software, which works in three steps: (1) it receives keystrokes from a standard English keyboard; (2) it converts them into Chinese words; and (3) it outputs those words. Conventional Chinese spelling correction algorithms focus on errors in the output Chinese text but ignore errors introduced in the original keyboard input. These algorithms do not work well here, because the errors in the output are usually not the typographical errors they are designed to handle. In this paper, we propose a novel Chinese spelling correction model that directly targets the original keyboard input. We integrate this model into an online Chinese input method to improve its spelling suggestion feature. Experiments on real-world data show that the model enables the spelling suggestion feature to reach 93.3% accuracy.
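The abstract does not describe the model itself, so the following is only a minimal Python sketch of the general idea of correcting at the keyboard-input (pinyin) level, before conversion to characters. It assumes a Norvig-style single-edit candidate generator (deletions, transpositions, substitutions, insertions) and ranks candidates by a term-frequency table. `SYLLABLES`, `TERM_FREQ`, `is_valid_pinyin`, `edits1` and `suggest` are hypothetical names with toy data; this is not the authors' implementation.

```python
from functools import lru_cache

# Hypothetical toy syllable inventory; a real input method would load the
# full set of roughly 400 legal Mandarin pinyin syllables.
SYLLABLES = {"zhong", "guo", "ren", "min", "shu", "ru", "fa", "pin", "yin"}

# Hypothetical frequencies of whole topic terms (in pinyin), standing in for
# counts that a real system might mine from input or query logs.
TERM_FREQ = {"zhongguo": 120, "renmin": 80, "shurufa": 60, "pinyin": 50}

LETTERS = "abcdefghijklmnopqrstuvwxyz"
MAX_SYLLABLE_LEN = 6  # the longest pinyin syllables, e.g. "zhuang", have 6 letters


def is_valid_pinyin(s: str) -> bool:
    """Detection step: True if s segments entirely into known syllables."""

    @lru_cache(maxsize=None)
    def can_split(i: int) -> bool:
        if i == len(s):
            return True
        end = min(len(s), i + MAX_SYLLABLE_LEN)
        # Try every syllable that could start at position i.
        return any(s[i:j] in SYLLABLES and can_split(j) for j in range(i + 1, end + 1))

    return can_split(0)


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def suggest(typed: str) -> str:
    """Correction step: most frequent known term within one edit, else typed."""
    if typed in TERM_FREQ:
        return typed
    candidates = [w for w in edits1(typed) if w in TERM_FREQ]
    return max(candidates, key=TERM_FREQ.__getitem__, default=typed)
```

For example, the mistyped input `zhognguo` fails `is_valid_pinyin`, and `suggest("zhognguo")` returns `zhongguo` through a single transposition. Working on the raw letters in this way catches keystroke-level typos that would otherwise be hidden once the input method has already converted them into plausible-looking Chinese characters.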
Acknowledgment
The research was supported in part by the National Science Foundation of China under Grant Nos. 60825202, 60921003, 60803079, and 61070072; the National Science and Technology Major Project (2010ZX01045-001-005); the Program for New Century Excellent Talents in University of China under Grant No. NECT-08-0433; the Doctoral Fund of Ministry of Education of China under Grant No. 20090201110060; the Cheung Kong Scholar's Program; Key Projects in the National Science & Technology Pillar Program during the 11th 5-Year Plan Period, Grant No. 2009BAH51B02; and the IBM CRL Research Program: Research on BlueSky Storage for Cloud Computing Platform.
Copyright information
© 2011 Springer Science+Business Media B.V.
About this paper
Cite this paper
Sha, S., Jun, L., Qinghua, Z., Wei, Z. (2011). Automatic Chinese Topic Term Spelling Correction in Online Pinyin Input. In: Park, J., Jin, H., Liao, X., Zheng, R. (eds) Proceedings of the International Conference on Human-centric Computing 2011 and Embedded and Multimedia Computing 2011. Lecture Notes in Electrical Engineering, vol 102. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2105-0_5
DOI: https://doi.org/10.1007/978-94-007-2105-0_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-2104-3
Online ISBN: 978-94-007-2105-0
eBook Packages: Engineering, Engineering (R0)