A Semi-supervised Approach for Chinese Noun Phrase Chunking

Gao, Zhao-Ming; Lin, Yen-Hsi; Tsui, Ruben G.

doi:10.1007/978-3-031-38913-9_26

Part of the book series: Text, Speech and Language Technology (TLTB, volume 49)

Abstract

This chapter addresses Chinese noun phrase chunking, with special reference to nominalizations, using a semi-supervised approach. The model is trained with YamCha, a support vector machine (SVM) toolkit. In addition to the IOB scheme and the two words before and after the target word, we experimented with new features and exploited unlabeled data from web pages to enhance the model's performance. Our experiments showed that the proposed semi-supervised method is effective in tackling a variety of complex Chinese noun phrases that have been largely unexplored in previous research. An important by-product of our approach is the identification of Chinese nominalized verbs, which are indistinguishable from verbs in terms of their morphology and part-of-speech tags. The findings of this research may shed light on more recent approaches to similar problems.
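
As a rough illustration of the representation the abstract mentions, the sketch below encodes noun phrase spans as IOB tags and collects the words and part-of-speech tags in a window of two positions on either side of a target word. It is not the chapter's implementation: the function names, the `<PAD>` symbol, the simplified POS labels, and the choice of the IOB2 variant (every chunk starts with `B-NP`) are assumptions made for this example, and the example sentence is invented.

```python
from typing import Dict, List, Tuple


def spans_to_iob(n_tokens: int, np_spans: List[Tuple[int, int]]) -> List[str]:
    """Convert NP spans (start inclusive, end exclusive) into IOB2 tags."""
    tags = ["O"] * n_tokens
    for start, end in np_spans:
        tags[start] = "B-NP"              # first word of the chunk
        for i in range(start + 1, end):
            tags[i] = "I-NP"              # remaining words inside the chunk
    return tags


def window_features(words: List[str], pos_tags: List[str],
                    i: int, size: int = 2) -> Dict[str, str]:
    """Collect words and POS tags from position i-size to i+size."""
    feats: Dict[str, str] = {}
    for offset in range(-size, size + 1):
        j = i + offset
        inside = 0 <= j < len(words)
        # Positions outside the sentence get a padding symbol (assumed name).
        feats[f"w[{offset:+d}]"] = words[j] if inside else "<PAD>"
        feats[f"p[{offset:+d}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


# Invented example: "the government's decision affects the market".
# The third word is a nominalized verb that still carries a verb tag.
words = ["政府", "的", "決定", "影響", "市場"]
pos_tags = ["N", "DE", "V", "V", "N"]     # simplified, hypothetical tag set
iob_tags = spans_to_iob(len(words), [(0, 3), (4, 5)])

for word, pos, tag in zip(words, pos_tags, iob_tags):
    print(word, pos, tag)
```

In this toy sentence the nominalized verb 決定 receives `I-NP` while the main verb 影響 receives `O`, even though both carry the same POS label, which is the kind of distinction the abstract says the chunker has to learn from context features.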

Author information

Authors and Affiliations

Department of Foreign Languages and Literatures, National Taiwan University, Taipei, Taiwan
Zhao-Ming Gao
Delta Electronics, Inc., Taipei, Taiwan
Yen-Hsi Lin
Graduate Program in Translation and Interpretation, College of Liberal Arts, National Taiwan University, Taipei, Taiwan
Ruben G. Tsui

Corresponding author

Correspondence to Zhao-Ming Gao.

Editor information

Editors and Affiliations

Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Chu-Ren Huang
Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan
Shu-Kai Hsieh
School of Electronic Information and Artificial Intelligence, Leshan Normal University, Leshan City, Sichuan, China
Peng Jin

About this chapter

Cite this chapter

Gao, ZM., Lin, YH., Tsui, R.G. (2023). A Semi-supervised Approach for Chinese Noun Phrase Chunking. In: Huang, CR., Hsieh, SK., Jin, P. (eds) Chinese Language Resources. Text, Speech and Language Technology, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-031-38913-9_26

DOI: https://doi.org/10.1007/978-3-031-38913-9_26
Published: 19 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38912-2
Online ISBN: 978-3-031-38913-9
eBook Packages: Education, Education (R0)
