
Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 102)

Abstract

Chinese text is usually typed on a computer with input method software, which works in three steps: (1) it receives keystrokes from an English keyboard; (2) it converts them into Chinese words; (3) it outputs the words. Traditional Chinese spelling correction algorithms focus on errors in the Chinese output but ignore errors introduced in the original keyboard input. These algorithms do not work well, because the errors in the output are usually not the kind of typographical errors they handle well. In this paper, we propose a novel Chinese spelling correction model that directly targets the original keyboard input. We integrate this model into an online Chinese input method to improve its spelling suggestion feature. Experiments on real-world data show that the model helps the spelling suggestion feature achieve 93.3% accuracy.
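The abstract does not spell out the correction model itself. As a rough illustration of what correcting at the keyboard-input level (rather than on the Chinese output) can look like, the sketch below applies a generic, well-known technique to raw pinyin keystrokes before conversion: generate every string one Damerau-Levenshtein edit away (delete, transpose, replace, insert) and keep the candidates found in a lexicon, ranked by term frequency. This is not the authors' model. The lexicon, its frequencies, and the helpers `edits1` and `suggest` are hypothetical, and ranking by frequency alone assumes every single-edit typo is equally likely.

```python
import string
from typing import Optional, Set

# Hypothetical topic-term lexicon: pinyin key (no spaces, no tones) -> (Chinese term, frequency).
# Entries and counts are illustrative assumptions, not data from the paper.
LEXICON = {
    "shujujiegou": ("数据结构", 120),
    "caozuoxitong": ("操作系统", 95),
    "jisuanjiwangluo": ("计算机网络", 80),
}

LETTERS = string.ascii_lowercase


def edits1(word: str) -> Set[str]:
    """Return all strings one Damerau-style edit away: delete, transpose, replace, insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def suggest(keys: str) -> Optional[str]:
    """Correct raw pinyin keystrokes, then convert them to a lexicon term (or None)."""
    keys = keys.lower().replace(" ", "")
    # Prefer an exact match; otherwise fall back to candidates within one edit.
    for candidates in ({keys}, edits1(keys)):
        known = [c for c in candidates if c in LEXICON]
        if known:
            # Uniform error model assumed: pick the most frequent known candidate.
            best = max(known, key=lambda c: LEXICON[c][1])
            return LEXICON[best][0]
    return None
```

For example, `suggest("shujujeigou")` repairs the swapped letters in the pinyin string before conversion and yields 数据结构, which is the kind of keyboard-level typo the abstract says output-side correctors tend to miss. A practical system would need a much larger lexicon and an error model learned from real input logs instead of the uniform assumption used here.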

Acknowledgment

The research was supported in part by the National Science Foundation of China under Grant Nos. 60825202, 60921003, 60803079, and 61070072; the National Science and Technology Major Project (2010ZX01045-001-005); the Program for New Century Excellent Talents in University of China under Grant No. NECT-08-0433; the Doctoral Fund of Ministry of Education of China under Grant No. 20090201110060; the Cheung Kong Scholar's Program; Key Projects in the National Science & Technology Pillar Program during the 11th 5-Year Plan Period under Grant No. 2009BAH51B02; and the IBM CRL Research Program: Research on BlueSky Storage for Cloud Computing Platform.

Author information

Correspondence to Sha Sha.

Copyright information

© 2011 Springer Science+Business Media B.V.

About this paper

Cite this paper

Sha, S., Jun, L., Qinghua, Z., Wei, Z. (2011). Automatic Chinese Topic Term Spelling Correction in Online Pinyin Input. In: Park, J., Jin, H., Liao, X., Zheng, R. (eds) Proceedings of the International Conference on Human-centric Computing 2011 and Embedded and Multimedia Computing 2011. Lecture Notes in Electrical Engineering, vol 102. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2105-0_5

  • DOI: https://doi.org/10.1007/978-94-007-2105-0_5

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-2104-3

  • Online ISBN: 978-94-007-2105-0

  • eBook Packages: Engineering, Engineering (R0)
