Character Tagging-Based Word Segmentation for Uyghur

  • Yating Yang
  • Chenggang Mi
  • Bo Ma
  • Rui Dong
  • Lei Wang
  • Xiao Li
Part of the Communications in Computer and Information Science book series (CCIS, volume 493)

Abstract

To extract information from Uyghur words effectively, we present a novel method for Uyghur word segmentation based on character tagging. We propose five labels for the characters in a Uyghur word: Su, Bu, Iu, Eu and Au. With these labels, we treat Uyghur word segmentation as a sequence labeling task and use Conditional Random Fields (CRFs) as the underlying labeling model. Experiments show that our method captures more features of Uyghur words and therefore significantly outperforms several traditionally used word segmentation models.
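As a rough illustration of how character tagging turns segmentation into a sequence labeling problem, the sketch below converts an already segmented word into per-character labels and back. The abstract does not define the five labels, so this is an assumption: Su is taken to mark a single-character unit, and Bu, Iu and Eu the beginning, inside and end of a multi-character unit. The meaning of Au is not stated here, so it is left out. The function names and the Latin-script example split are hypothetical, and no CRF is trained; a labeling model would predict the labels that `segment_from_tags` consumes.

```python
from typing import List, Tuple


def tag_characters(units: List[str]) -> List[Tuple[str, str]]:
    """Turn a list of sub-word units into (character, label) pairs.

    Assumed scheme (not confirmed by the abstract): Su = single-character
    unit, Bu / Iu / Eu = begin / inside / end of a longer unit.
    """
    tagged: List[Tuple[str, str]] = []
    for unit in units:
        if not unit:
            continue  # ignore empty units
        if len(unit) == 1:
            tagged.append((unit, "Su"))
            continue
        tagged.append((unit[0], "Bu"))
        for ch in unit[1:-1]:
            tagged.append((ch, "Iu"))
        tagged.append((unit[-1], "Eu"))
    return tagged


def segment_from_tags(tagged: List[Tuple[str, str]]) -> List[str]:
    """Rebuild units from (character, label) pairs.

    A unit is closed whenever a Su or Eu label is seen.
    """
    units: List[str] = []
    current = ""
    for ch, label in tagged:
        current += ch
        if label in ("Su", "Eu"):
            units.append(current)
            current = ""
    if current:
        units.append(current)  # keep any trailing, unclosed characters
    return units


# Hypothetical example: a transliterated word split into two units.
pairs = tag_characters(["kitab", "lar"])
print(pairs)
print(segment_from_tags(pairs))
```

Under this view, segmenting a word amounts to predicting one label per character, which is the kind of task a CRF is designed for.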

Keywords

Word segmentation, Uyghur, Conditional Random Fields, Character Tagging

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Yating Yang (1)
  • Chenggang Mi (1, 2)
  • Bo Ma (1)
  • Rui Dong (1, 2)
  • Lei Wang (1)
  • Xiao Li (1)

  1. Xinjiang Technical Institute of Physics & Chemistry of Chinese Academy of Sciences, Urumqi, China
  2. University of Chinese Academy of Sciences, Beijing, China
