Chinese Predicate Chunk Knowledge Base Construction and Internal Boundary Recognition

Wang, Chengwen; Liu, Xiang; Rao, Gaoqi; Xun, Endong; Sui, Zhifang

doi:10.1007/978-3-031-06547-7_10


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13250)

Included in the following conference series:

Workshop on Chinese Lexical Semantics


Abstract

In current research on Chinese, sentences are usually chunked into components at a single level. In actual Chinese usage, however, the predicate chunk of a sentence, together with the subject and object chunks before and after it, forms the skeleton of the event representation, while the sub-chunks inside the predicate chunk modify the core predicate from different perspectives and serve as related components of the event. It is therefore necessary to treat the predicate chunk and the components inside it as chunks at different levels. Separating primary from secondary components helps in understanding the main semantics of a sentence, because it abstracts the sentence outline and facilitates event reasoning and computation. This paper first defines the predicate as the core of the predicate chunk and then defines the sub-chunks inside the predicate chunk. Predicate-centered predicate chunks are annotated in an encyclopedia corpus with relatively high sentence complexity, and the components inside each predicate chunk are segmented according to the sub-chunk types defined in this paper. The knowledge base currently contains 36,360 predicate chunks. Based on this knowledge base, the paper introduces an internal boundary recognition task for predicate chunks and evaluates a sequence labeling model on it, providing a baseline for future research.
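The internal boundary recognition task is framed here as sequence labeling. As a minimal sketch, assuming character-level BIO tags and the hypothetical role labels `MOD` (modifier sub-chunk) and `PRED` (core predicate), neither of which is taken from the paper, a segmented predicate chunk can be encoded as one tag per character and decoded back into sub-chunks. The example chunk 已经在北京工作 is illustrative, not drawn from the knowledge base.

```python
from typing import List, Tuple

# A segment is (text, role). "MOD" (modifier sub-chunk) and "PRED" (core
# predicate) are hypothetical role names used only for illustration.
Segment = Tuple[str, str]


def segments_to_bio(segments: List[Segment]) -> Tuple[List[str], List[str]]:
    """Encode a segmented predicate chunk as one BIO tag per character."""
    chars: List[str] = []
    tags: List[str] = []
    for text, role in segments:
        for i, ch in enumerate(text):
            chars.append(ch)
            # The first character of each sub-chunk opens a new span.
            tags.append(("B-" if i == 0 else "I-") + role)
    return chars, tags


def bio_to_segments(chars: List[str], tags: List[str]) -> List[Segment]:
    """Decode BIO tags back into segments.

    An I- tag whose role differs from the current segment (or that appears
    first) is treated as starting a new segment, a common lenient decoding.
    """
    segments: List[Segment] = []
    for ch, tag in zip(chars, tags):
        prefix, role = tag.split("-", 1)
        if prefix == "B" or not segments or segments[-1][1] != role:
            segments.append((ch, role))
        else:
            text, prev_role = segments[-1]
            segments[-1] = (text + ch, prev_role)
    return segments


if __name__ == "__main__":
    segs = [("已经", "MOD"), ("在北京", "MOD"), ("工作", "PRED")]
    chars, tags = segments_to_bio(segs)
    print(tags)
    print(bio_to_segments(chars, tags) == segs)
```

Because adjacent modifier sub-chunks share the same role, it is the `B-` tag that carries the internal boundary between 已经 and 在北京; a tagger trained on such labels predicts boundaries and roles jointly.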


Notes

1. “[]” marks the internal chunk boundaries of modifiers, and “()” marks the core predicate.
2. https://github.com/google-research/bert.
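The bracket notation of Note 1 can be read mechanically. The sketch below is a hypothetical helper, `parse_predicate_chunk`, not part of the authors' release, and it rests on three assumptions: brackets are never nested, each chunk contains exactly one core predicate in “()”, and any text outside brackets (for example a trailing aspect particle) receives a hypothetical `OUT` role.

```python
import re
from typing import List, Tuple

# One token is either "[modifier]", "(core predicate)", or a run of
# unbracketed text. Nested or empty brackets do not match (assumption).
_TOKEN = re.compile(r"\[([^\[\]()]+)\]|\(([^\[\]()]+)\)|([^\[\]()]+)")


def parse_predicate_chunk(annotated: str) -> List[Tuple[str, str]]:
    """Split an annotated predicate chunk into (text, role) segments."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(annotated):
        m = _TOKEN.match(annotated, pos)
        if m is None:
            raise ValueError(f"unbalanced, nested, or empty bracket at index {pos}")
        modifier, core, outside = m.groups()
        if modifier is not None:
            segments.append((modifier, "MOD"))
        elif core is not None:
            segments.append((core, "PRED"))
        else:
            segments.append((outside, "OUT"))
        pos = m.end()
    # Assumption: exactly one core predicate per predicate chunk.
    if sum(role == "PRED" for _, role in segments) != 1:
        raise ValueError("expected exactly one core predicate in ( )")
    return segments


if __name__ == "__main__":
    print(parse_predicate_chunk("[已经][在北京](工作)"))
```

Rejecting malformed strings early, rather than guessing at a repair, keeps annotation errors visible when converting such strings into training data for a sequence labeling model.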


Acknowledgement

This work is supported by the National Key Research and Development Program of China (2020AAA0106700), NSFC project U19A2065, and NSFC project 62076038.

Author information

Authors and Affiliations

The MOE Key Laboratory of Computational Linguistics, Peking University, Beijing, China
Chengwen Wang & Zhifang Sui
Institute of Big Data and Language Education, Beijing Language and Culture University, Beijing, China
Xiang Liu, Gaoqi Rao & Endong Xun


Corresponding author

Correspondence to Zhifang Sui.

Editor information

Editors and Affiliations

Institute for Infocomm Research, Singapore, Singapore
Minghui Dong
Nanjing Normal University, Nanjing, China
Yanhui Gu
National Taiwan Normal University, Taipei, Taiwan
Jia-Fei Hong


About this paper

Cite this paper

Wang, C., Liu, X., Rao, G., Xun, E., Sui, Z. (2022). Chinese Predicate Chunk Knowledge Base Construction and Internal Boundary Recognition. In: Dong, M., Gu, Y., Hong, JF. (eds) Chinese Lexical Semantics. CLSW 2021. Lecture Notes in Computer Science, vol 13250. Springer, Cham. https://doi.org/10.1007/978-3-031-06547-7_10


DOI: https://doi.org/10.1007/978-3-031-06547-7_10
Published: 16 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06546-0
Online ISBN: 978-3-031-06547-7
eBook Packages: Computer Science, Computer Science (R0)
