A Semi-supervised Approach for Chinese Noun Phrase Chunking

  • Chapter
Chinese Language Resources

Part of the book series: Text, Speech and Language Technology (TLTB, volume 49)

Abstract

This chapter addresses Chinese noun phrase chunking, with special reference to nominalizations, using a semi-supervised approach. The model is trained with YamCha, a support vector machine (SVM) toolkit. In addition to the IOB scheme and the two words before and after the target word, we experimented with new features and exploited unlabeled data from web pages to improve the model's performance. Our experiments showed that the proposed semi-supervised method is effective in handling a variety of complex Chinese noun phrases that have been largely unexplored in previous research. An important by-product of our approach is the identification of Chinese nominalized verbs, which are indistinguishable from verbs in terms of morphology and part-of-speech tags. The findings of this research may shed light on more recent approaches to similar problems.
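To make the baseline representation concrete, the following is a minimal Python sketch of the two ingredients named in the abstract: IOB chunk labels and a context window of two words on either side of the target word. It is not the chapter's implementation. The B-/I-/O label variant, the `<PAD>` symbol, the feature names, the function names, and the example sentence with its part-of-speech tags are all assumptions made for illustration. The additional features and the use of unlabeled web data described in the chapter are not shown.

```python
from typing import Dict, List, Tuple


def spans_to_iob(n_tokens: int, np_spans: List[Tuple[int, int]]) -> List[str]:
    """Encode NP spans (start inclusive, end exclusive) as IOB labels.

    Assumption: the variant in which every chunk starts with B-NP.
    """
    tags = ["O"] * n_tokens
    for start, end in np_spans:
        tags[start] = "B-NP"
        for i in range(start + 1, end):
            tags[i] = "I-NP"
    return tags


def window_features(
    tokens: List[str], pos_tags: List[str], i: int, size: int = 2
) -> Dict[str, str]:
    """Collect words and POS tags from `size` positions before and after token i."""
    feats: Dict[str, str] = {}
    for offset in range(-size, size + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        # Positions beyond the sentence boundary get a padding symbol (an assumption).
        feats[f"w[{offset:+d}]"] = tokens[j] if inside else "<PAD>"
        feats[f"p[{offset:+d}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


# Hypothetical segmented sentence with illustrative POS tags (not from the chapter).
tokens = ["我們", "研究", "中文", "名詞", "組"]
pos_tags = ["Nh", "VC", "Na", "Na", "Na"]

# Two hypothetical noun phrases: "我們" and "中文 名詞 組".
iob = spans_to_iob(len(tokens), [(0, 1), (2, 5)])
feats = window_features(tokens, pos_tags, 2)

for token, tag in zip(tokens, iob):
    print(token, tag)
print(feats)
```

Each token then becomes one training instance whose label is its IOB tag and whose features come from the window around it, which is the general shape of input that a sequence chunker of this kind consumes.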

Author information

Correspondence to Zhao-Ming Gao.

Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Gao, ZM., Lin, YH., Tsui, R.G. (2023). A Semi-supervised Approach for Chinese Noun Phrase Chunking. In: Huang, CR., Hsieh, SK., Jin, P. (eds) Chinese Language Resources. Text, Speech and Language Technology, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-031-38913-9_26

  • DOI: https://doi.org/10.1007/978-3-031-38913-9_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-38912-2

  • Online ISBN: 978-3-031-38913-9

  • eBook Packages: Education, Education (R0)
