Overview of the NLPCC 2015 Shared Task: Chinese Word Segmentation and POS Tagging for Micro-blog Texts

Qiu, Xipeng; Qian, Peng; Yin, Liusong; Wu, Shiyu; Huang, Xuanjing

doi:10.1007/978-3-319-25207-0_50

Xipeng Qiu²³,
Peng Qian²³,
Liusong Yin²³,
Shiyu Wu²³ &
…
Xuanjing Huang²³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9362))

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

2371 Accesses
13 Citations

Abstract

In this paper, we give an overview for the shared task at the 4th CCF Conference on Natural Language Processing & Chinese Computing (NLPCC 2015): Chinese word segmentation and part-of-speech (POS) tagging for micro-blog texts. Different with the popular used newswire datasets, the dataset of this shared task consists of the relatively informal micro-texts. The shared task has two sub-tasks: (1) individual Chinese word segmentation and (2) joint Chinese word segmentation and POS Tagging. Each subtask has three tracks to distinguish the systems with different resources. We first introduce the dataset and task, then we characterize the different approaches of the participating systems, report the test results, and provide a overview analysis of these results. An online system is available for open registration and evaluation at http://nlp.fudan.edu.cn/nlpcc2015.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Berger, A.L., Della, V.J.: Pietra, and S.A. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics 22(1), 39–71 (1996)
Google Scholar
Collins, M.: Discriminative training methods for hidden markov models: theory and experiments with perceptron algorithms. In: Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (2002)
Google Scholar
Feng, H., Chen, K., Deng, X., Zheng, W.: Accessor variety criteria for chinese word extraction. Computational Linguistics 30(1), 75–93 (2004)
Article Google Scholar
Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning (2001)
Google Scholar
Ng, H.T., Low, J.K.: Chinese part-of-speech tagging: one-at-a-time or all-at-once? word-based or character-based. In: Proceedings of EMNLP, vol. 4 (2004)
Google Scholar
Peng, F., Feng, F., McCallum, A.: Chinese segmentation and new word detection using conditional random fields. In: Proceedings of the 20th International Conference on Computational Linguistics (2004)
Google Scholar
Qiu, X., Zhang, Q., Huang, X.: FudanNLP: a toolkit for Chinese natural language processing. In: Proceedings of Annual Meeting of the Association for Computational Linguistics (2013)
Google Scholar
Xue, N.: Chinese word segmentation as character tagging. Computational Linguistics and Chinese Language Processing 8(1), 29–48 (2003)
Google Scholar
Xue, N., Xia, F., Chiou, F.-D., Palmer, M.: The Penn Chinese TreeBank: Phrase structure annotation of a large corpus. Natural language engineering 11(2), 207–238 (2005)
Article Google Scholar
Yang, Y., Xue, N.: Chinese comma disambiguation for discourse analysis. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, vol. 1, pp. 786–794. Association for Computational Linguistics (2012)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Computer Science, Fudan University, 825 Zhangheng Road, Shanghai, China
Xipeng Qiu, Peng Qian, Liusong Yin, Shiyu Wu & Xuanjing Huang

Authors

Xipeng Qiu
View author publications
You can also search for this author in PubMed Google Scholar
Peng Qian
View author publications
You can also search for this author in PubMed Google Scholar
Liusong Yin
View author publications
You can also search for this author in PubMed Google Scholar
Shiyu Wu
View author publications
You can also search for this author in PubMed Google Scholar
Xuanjing Huang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xipeng Qiu .

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Juanzi Li
Rensselaer Polytechnic Institute, Troy, NY, USA
Heng Ji
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Yansong Feng

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Qiu, X., Qian, P., Yin, L., Wu, S., Huang, X. (2015). Overview of the NLPCC 2015 Shared Task: Chinese Word Segmentation and POS Tagging for Micro-blog Texts. In: Li, J., Ji, H., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2015. Lecture Notes in Computer Science(), vol 9362. Springer, Cham. https://doi.org/10.1007/978-3-319-25207-0_50

Download citation

DOI: https://doi.org/10.1007/978-3-319-25207-0_50
Published: 20 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25206-3
Online ISBN: 978-3-319-25207-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics