HANS: A Service-Oriented Framework for Chinese Language Processing

  • Lung-Hao Lee (Email author)
  • Kuei-Ching Lee
  • Yuen-Hsien Tseng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10761)

Abstract

A service-oriented architecture called HANS is proposed to facilitate Chinese natural language processing. This unified framework integrates fundamental NLP tasks, including word segmentation, part-of-speech tagging, named entity recognition, chunking, parsing, and semantic role labeling, to enhance Chinese language processing functionality. A basic Chinese word segmentation task is used to illustrate how the proposed architecture functions and to demonstrate its effects. Evaluation benchmarks are taken from the SIGHAN 2005 bakeoff and the NLPCC 2016 shared task. We implement publicly released toolkits, including Stanford CoreNLP, FudanNLP, and CKIP, as services in our HANS framework for performance comparison. Experimental results confirm the feasibility of the proposed architecture. We also discuss the findings and point to potential future developments.
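To make the idea of wrapping several segmenters behind one service interface more concrete, the following is a minimal sketch in Python. It is not the HANS implementation: the `SegmentationHub` class, the two toy segmenters, and the sample sentence are all hypothetical and stand in for real toolkits. The scoring function computes word-level precision, recall, and F1 by comparing character spans, which is the usual way segmentation output is compared against a gold standard.

```python
from typing import Callable, Dict, List, Set, Tuple

# A segmenter is any callable mapping raw text to a list of words.
Segmenter = Callable[[str], List[str]]


class SegmentationHub:
    """Hypothetical registry that exposes segmenters under one interface."""

    def __init__(self) -> None:
        self._services: Dict[str, Segmenter] = {}

    def register(self, name: str, segmenter: Segmenter) -> None:
        self._services[name] = segmenter

    def segment(self, name: str, text: str) -> List[str]:
        return self._services[name](text)

    def names(self) -> List[str]:
        return sorted(self._services)


def char_segmenter(text: str) -> List[str]:
    """Toy baseline: every character is its own word."""
    return list(text)


def make_max_match(lexicon: Set[str], max_len: int = 4) -> Segmenter:
    """Toy forward maximum matching over a small lexicon (illustrative only)."""

    def segment(text: str) -> List[str]:
        words: List[str] = []
        i = 0
        while i < len(text):
            # Try the longest candidate first; fall back to a single character.
            for size in range(min(max_len, len(text) - i), 0, -1):
                candidate = text[i:i + size]
                if size == 1 or candidate in lexicon:
                    words.append(candidate)
                    i += size
                    break
        return words

    return segment


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a word list into a set of (start, end) character offsets."""
    result: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        result.add((start, start + len(word)))
        start += len(word)
    return result


def prf(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall, and F1 based on matching spans."""
    gold_spans, pred_spans = spans(gold), spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical usage: register two services and score them on one sentence.
hub = SegmentationHub()
hub.register("chars", char_segmenter)
hub.register("max_match", make_max_match({"喜欢", "自然语言", "处理"}))

text = "我喜欢自然语言处理"
gold = ["我", "喜欢", "自然语言", "处理"]

for name in hub.names():
    p, r, f = prf(gold, hub.segment(name, text))
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Because every service shares the same call signature, a new toolkit can be compared against the others by registering one more callable; the evaluation loop does not change.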

Keywords

Service-oriented architecture · Chinese word segmentation · Chinese natural language processing

Notes

Acknowledgments

This study was partially supported by the Ministry of Science and Technology, under the grant MOST 105-2221-E-003-020-MY2 and the “Aim for the Top University Project” and “Center of Language Technology for Chinese” of National Taiwan Normal University, sponsored by the Ministry of Education, Taiwan.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Lung-Hao Lee (1), Email author
  • Kuei-Ching Lee (2)
  • Yuen-Hsien Tseng (1)

  1. Graduate Institute of Library and Information Studies, National Taiwan Normal University, Taipei, Taiwan
  2. China Development Lab, IBM, Taipei, Taiwan