Abstract
This chapter addresses Chinese noun phrase chunking, with special reference to nominalizations, using a semi-supervised approach. It uses YamCha, a support vector machine (SVM) toolkit, to train the model. In addition to IOB tags and a context window of two words before and after the target word, we experimented with new features and exploited unlabeled data from web pages to improve the model's performance. Our experiments showed that the proposed semi-supervised method is effective in tackling a variety of complex Chinese noun phrases that have been largely unexplored in previous research. An important by-product of our approach is the identification of Chinese nominalized verbs, which are indistinguishable from verbs in terms of their morphology and part-of-speech tags. The findings of this research may shed light on more recent approaches to similar problems.
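As a rough illustration of the baseline representation mentioned above, the following Python sketch encodes noun phrase spans as IOB tags and collects the words and part-of-speech tags in a window of two positions on either side of a target word. It is a minimal sketch under stated assumptions, not the chapter's actual feature set or YamCha's input format: the function names, the `<PAD>` symbol, the IOB2-style convention (every chunk starts with `B-NP`), the coarse `N`/`V` tags, and the four-word example sentence are all chosen here for illustration only.

```python
from typing import Dict, List, Tuple


def spans_to_iob(n_tokens: int, np_spans: List[Tuple[int, int]]) -> List[str]:
    """Encode half-open noun-phrase spans (start, end) as IOB2-style tags."""
    tags = ["O"] * n_tokens
    for start, end in np_spans:
        tags[start] = "B-NP"          # first word of the chunk
        for i in range(start + 1, end):
            tags[i] = "I-NP"          # remaining words inside the chunk
    return tags


def window_features(tokens: List[str], pos_tags: List[str],
                    i: int, size: int = 2) -> Dict[str, str]:
    """Collect words and POS tags from position i-size to i+size."""
    feats = {}
    for offset in range(-size, size + 1):
        j = i + offset
        in_range = 0 <= j < len(tokens)
        # Positions outside the sentence get a padding symbol (an assumption).
        feats[f"w[{offset}]"] = tokens[j] if in_range else "<PAD>"
        feats[f"p[{offset}]"] = pos_tags[j] if in_range else "<PAD>"
    return feats


# Hypothetical example: "the government values environmental protection".
# The last word carries a verb tag but sits inside a noun phrase,
# which is the kind of nominalization the abstract refers to.
tokens = ["政府", "重視", "環境", "保護"]
pos_tags = ["N", "V", "N", "V"]       # coarse placeholder tags, not CKIP tags
np_spans = [(0, 1), (2, 4)]           # illustrative gold spans

iob_tags = spans_to_iob(len(tokens), np_spans)
features = window_features(tokens, pos_tags, 3)
```

In this toy sentence the final word is tagged `V` yet labeled `I-NP`, so the part-of-speech tag alone cannot separate the nominalized use from the verbal one; the surrounding window is what gives a classifier something to work with.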
© 2023 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Gao, ZM., Lin, YH., Tsui, R.G. (2023). A Semi-supervised Approach for Chinese Noun Phrase Chunking. In: Huang, CR., Hsieh, SK., Jin, P. (eds) Chinese Language Resources. Text, Speech and Language Technology, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-031-38913-9_26
DOI: https://doi.org/10.1007/978-3-031-38913-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38912-2
Online ISBN: 978-3-031-38913-9
eBook Packages: Education, Education (R0)