The Application of Kalman Filter Based Human-Computer Learning Model to Chinese Word Segmentation

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Abstract

This paper presents a human-computer interaction learning model for segmenting Chinese texts that depends on neither a lexicon nor an annotated corpus. It lets users add language knowledge to the system by intervening directly in the segmentation process. After a limited number of user interventions, the system returns a segmentation result that fully matches the user's intent (that is, an accuracy of 100% by manual judgement). A Kalman filter based model is adopted to learn and estimate users' intentions quickly and precisely from their interventions, so as to reduce subsequent system prediction error. Experiments show encouraging performance in saving human effort, and the segmenter with knowledge learned from users outperforms the baseline model by about 10% in segmenting homogeneous texts.
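
The abstract does not give the model's equations, so the following is only a generic illustration of how a Kalman filter can refine an estimate from repeated corrections; it is not the authors' model. It is a minimal scalar sketch that assumes a random-walk state and treats each user intervention as a noisy observation `z` of a hypothetical "boundary preference" score. The class name `ScalarKalman`, the helper `run`, and the noise variances `q` and `r` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk state model (illustrative)."""
    x: float = 0.0   # current estimate of the hypothetical preference score
    p: float = 1.0   # variance (uncertainty) of that estimate
    q: float = 0.01  # process noise variance (assumed value)
    r: float = 0.25  # measurement noise variance (assumed value)

    def update(self, z: float) -> float:
        # Predict: the state is assumed unchanged, but uncertainty grows by q.
        p_pred = self.p + self.q
        # Kalman gain: how much to trust the new observation over the prediction.
        k = p_pred / (p_pred + self.r)
        # Correct: move the estimate toward the observation and shrink the variance.
        self.x = self.x + k * (z - self.x)
        self.p = (1.0 - k) * p_pred
        return self.x


def run(observations):
    """Feed a sequence of observations to a fresh filter and return the estimates."""
    kf = ScalarKalman()
    return [kf.update(z) for z in observations]


if __name__ == "__main__":
    # Five identical corrections (z = 1.0) pull the estimate from 0 toward 1.
    for step, estimate in enumerate(run([1.0] * 5), start=1):
        print(step, round(estimate, 4))
```

In this sketch the gain is large while the filter is uncertain and shrinks as evidence accumulates, which is one way to picture an estimate settling after only a few interventions.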

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhu, W., Sun, N., Zou, X., Hu, J. (2013). The Application of Kalman Filter Based Human-Computer Learning Model to Chinese Word Segmentation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_18

  • DOI: https://doi.org/10.1007/978-3-642-37247-6_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37246-9

  • Online ISBN: 978-3-642-37247-6

  • eBook Packages: Computer Science, Computer Science (R0)
