Multi-perspective Embeddings for Chinese Chunking

  • Conference paper
Chinese Lexical Semantics (CLSW 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11173)

Abstract

Chunking is a crucial step in natural language processing (NLP) that divides a text into syntactically related, non-overlapping chunks. The task is typically modeled as a sequence labeling problem. Machine learning algorithms such as Conditional Random Fields (CRFs) and Support Vector Machines (SVMs) have been applied to it successfully. However, these state-of-the-art chunking systems depend heavily on appropriate hand-crafted features. In this paper, we present a recurrent neural network (RNN) framework based on multi-perspective embeddings for Chinese chunking. The framework takes character representations, part-of-speech (POS) embeddings, and word embeddings as the input features of the RNN layer. On top of the RNN, a CRF layer jointly decodes the labels for the whole sentence. Experimental results show that the various embeddings can improve the performance of the RNN model. Although our model uses these embeddings as its only features, it can be applied successfully to Chinese chunking without any feature engineering.
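To make the sequence labeling formulation concrete, the sketch below converts between non-overlapping chunks and per-token labels using the common BIO scheme. This is an illustration only: the abstract does not state which tagging scheme or chunk inventory the paper uses, so the BIO scheme, the chunk types `NP` and `VP`, the example sentence, and the function names `chunks_to_bio` and `bio_to_chunks` are all assumptions.

```python
from typing import List, Tuple

# A chunk is (chunk_type, tokens). The type "O" marks a token outside any chunk.
# The BIO scheme and the type names used here are assumptions for illustration.
Chunk = Tuple[str, List[str]]


def chunks_to_bio(chunks: List[Chunk]) -> Tuple[List[str], List[str]]:
    """Flatten non-overlapping chunks into parallel token and label lists."""
    tokens: List[str] = []
    labels: List[str] = []
    for chunk_type, words in chunks:
        for i, word in enumerate(words):
            tokens.append(word)
            if chunk_type == "O":
                labels.append("O")
            else:
                # First token of a chunk gets B-, the rest get I-.
                labels.append(("B-" if i == 0 else "I-") + chunk_type)
    return tokens, labels


def bio_to_chunks(tokens: List[str], labels: List[str]) -> List[Chunk]:
    """Rebuild chunks from BIO labels; a stray I- label opens a new chunk."""
    chunks: List[Chunk] = []
    for token, label in zip(tokens, labels):
        if label == "O":
            chunks.append(("O", [token]))
        elif label.startswith("B-") or not chunks or chunks[-1][0] != label[2:]:
            chunks.append((label[2:], [token]))
        else:
            chunks[-1][1].append(token)
    return chunks


# Hypothetical segmented sentence, not taken from the paper's data.
example: List[Chunk] = [
    ("NP", ["我们"]),
    ("VP", ["提出"]),
    ("NP", ["一", "个", "模型"]),
    ("O", ["。"]),
]
example_tokens, example_labels = chunks_to_bio(example)
```

Under this encoding a chunker predicts one label per token, and the chunk boundaries are recovered from the label sequence. A label such as `I-NP` is only meaningful relative to the label before it, which is one reason a layer that scores the whole label sequence jointly can help.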

Acknowledgments

We thank all reviewers for their detailed comments. This work is supported by the Science and Technology Project of Guangzhou (No. 201704030002), the National Natural Science Foundation of China (No. 61772378, 61702121) and Humanities and Social Science Foundation of Ministry of Education of China (16YJCZH004).

Corresponding author

Correspondence to Bo Chen.

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this paper

Lyu, C., Chen, B., Ji, D. (2018). Multi-perspective Embeddings for Chinese Chunking. In: Hong, JF., Su, Q., Wu, JS. (eds) Chinese Lexical Semantics. CLSW 2018. Lecture Notes in Computer Science, vol 11173. Springer, Cham. https://doi.org/10.1007/978-3-030-04015-4_49

  • DOI: https://doi.org/10.1007/978-3-030-04015-4_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04014-7

  • Online ISBN: 978-3-030-04015-4

  • eBook Packages: Computer Science, Computer Science (R0)
