Chinese Predicate Chunk Knowledge Base Construction and Internal Boundary Recognition

  • Conference paper
Chinese Lexical Semantics (CLSW 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13250)

Abstract

Current research on Chinese usually chunks a sentence into components that all sit at the same level. In actual Chinese text, however, the predicate block, together with the subject and object blocks before and after it, forms the skeleton of the event representation, while the sub-blocks inside the predicate block modify the core predicate from different perspectives and serve as related components of the event. It is therefore necessary to treat the predicate block and the components inside it as block components at different levels. Separating primary from secondary components helps in understanding the main semantics of a sentence, because it abstracts the sentence outline and facilitates event reasoning and computation. This paper first defines the predicate as the core of the predicate block and then defines the sub-blocks inside the predicate block. Predicate-centered predicate blocks are annotated in an encyclopedia corpus with relatively high sentence complexity, and the components inside each predicate block are divided according to the sub-block types defined in this paper. The knowledge base currently contains 36,360 predicate blocks. Based on this knowledge base, the paper sets up an internal boundary recognition task for predicate blocks and tests it with a sequence labeling model, which provides a baseline for subsequent research.

Notes

  1. Use “[]” to identify the internal chunk boundaries of the modifiers, and “()” to identify the core predicate.

  2. https://github.com/google-research/bert
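
The bracket convention in Note 1 can be turned into the kind of per-character tags a sequence labeling model consumes. The following is only a minimal sketch under stated assumptions, not the paper's preprocessing: the function name `brackets_to_bio`, the tag names `MOD` and `PRED`, the character-level granularity, and the restriction to non-nested ASCII brackets are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical tag names; the paper's actual label set is not given here.
OPENERS = {"[": "MOD", "(": "PRED"}   # "[" opens a modifier chunk, "(" the core predicate
CLOSERS = {"]": "[", ")": "("}        # each closing bracket and the opener it must match


def brackets_to_bio(annotated: str) -> List[Tuple[str, str]]:
    """Convert a bracket-annotated predicate block into (character, BIO tag) pairs."""
    pairs: List[Tuple[str, str]] = []
    current: Optional[str] = None   # chunk type currently open, if any
    opener: Optional[str] = None    # bracket that opened the current chunk
    at_start = False                # True until the chunk's first character is emitted

    for ch in annotated:
        if ch in OPENERS:
            if current is not None:
                raise ValueError("nested brackets are not handled in this sketch")
            current, opener, at_start = OPENERS[ch], ch, True
        elif ch in CLOSERS:
            if opener != CLOSERS[ch]:
                raise ValueError(f"unmatched closing bracket {ch!r}")
            current, opener = None, None
        elif current is None:
            # Characters outside any bracket get the "outside" tag.
            pairs.append((ch, "O"))
        else:
            pairs.append((ch, ("B-" if at_start else "I-") + current))
            at_start = False

    if current is not None:
        raise ValueError("unclosed bracket")
    return pairs


if __name__ == "__main__":
    # Made-up example string, not taken from the knowledge base.
    for char, tag in brackets_to_bio("[已经](举行)"):
        print(char, tag)
```

For the made-up input above, the sketch tags 已 and 经 as `B-MOD` and `I-MOD`, and 举 and 行 as `B-PRED` and `I-PRED`. Internal boundaries then correspond to the positions where a `B-` tag begins.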

Acknowledgement

This work is supported by the National Key Research and Development Program of China (2020AAA0106700), NSFC project U19A2065, and NSFC project 62076038.

Author information

Corresponding author

Correspondence to Zhifang Sui.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, C., Liu, X., Rao, G., Xun, E., Sui, Z. (2022). Chinese Predicate Chunk Knowledge Base Construction and Internal Boundary Recognition. In: Dong, M., Gu, Y., Hong, JF. (eds) Chinese Lexical Semantics. CLSW 2021. Lecture Notes in Computer Science, vol 13250. Springer, Cham. https://doi.org/10.1007/978-3-031-06547-7_10

  • DOI: https://doi.org/10.1007/978-3-031-06547-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06546-0

  • Online ISBN: 978-3-031-06547-7

  • eBook Packages: Computer Science, Computer Science (R0)
