Abstract
In current research on Chinese, sentences are usually chunked into components at a single level. In actual Chinese text, however, the predicate block of a sentence, together with the subject and object blocks before and after it, forms the skeleton of the event representation, while the sub-blocks inside the predicate block modify the core predicate from different perspectives and serve as related components of the event. The predicate block and the components inside it should therefore be treated as chunks at different levels. Separating primary from secondary components helps capture the main semantics of a sentence by abstracting its outline, and it facilitates event reasoning and computation. This paper first defines the predicate as the core of the predicate block and then defines the sub-blocks inside the predicate block. Predicate-centered predicate blocks are annotated on an encyclopedia corpus with relatively high sentence complexity, and the components inside each predicate block are divided according to the sub-block types defined in this paper. The knowledge base currently contains 36,360 predicate blocks. Based on this knowledge base, the paper sets up an internal boundary recognition task for predicate blocks and tests it with a sequence labeling model, providing a baseline for future research.
Notes
- 1. Use “[]” to mark the internal chunk boundaries of the modifiers, and “()” to mark the core predicate.
- 2.
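The bracket convention in Note 1 can be turned into training labels for a sequence labeling model. The following is a minimal sketch, not the authors' implementation. It assumes ASCII brackets, no nesting, and character-level tokens. The role names `MOD` and `PRED`, the function names, and the example string are hypothetical stand-ins; the paper's actual sub-block types are not listed in this section.

```python
from typing import List, Tuple

# Hypothetical role names: "MOD" for a bracketed modifier sub-block,
# "PRED" for the parenthesised core predicate.
CLOSERS = {"[": ("]", "MOD"), "(": (")", "PRED")}


def parse_predicate_block(annotated: str) -> List[Tuple[str, str]]:
    """Split an annotated predicate block into (text, role) pairs.

    Assumes every character belongs to exactly one "[...]" or "(...)"
    span and that spans are not nested.
    """
    chunks: List[Tuple[str, str]] = []
    i = 0
    while i < len(annotated):
        ch = annotated[i]
        if ch.isspace():
            i += 1
            continue
        if ch not in CLOSERS:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
        closer, role = CLOSERS[ch]
        end = annotated.find(closer, i + 1)
        if end == -1:
            raise ValueError(f"unclosed {ch!r} at position {i}")
        chunks.append((annotated[i + 1:end], role))
        i = end + 1
    return chunks


def to_bio(chunks: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Emit one B-/I- label per character, the usual input for a tagger."""
    tokens: List[str] = []
    labels: List[str] = []
    for text, role in chunks:
        for k, char in enumerate(text):
            tokens.append(char)
            labels.append(("B-" if k == 0 else "I-") + role)
    return tokens, labels


# Invented example (not taken from the corpus): "already / in Beijing / hold".
example = "[已经][在北京](举行)"
chunks = parse_predicate_block(example)
tokens, labels = to_bio(chunks)
for token, label in zip(tokens, labels):
    print(token, label)
```

Each `B-` label marks an internal boundary inside the predicate block, so recognizing boundaries reduces to predicting this label sequence from the unannotated characters.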
Acknowledgement
This work was supported by the National Key Research and Development Program of China (2020AAA0106700), NSFC project U19A2065, and NSFC project 62076038.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, C., Liu, X., Rao, G., Xun, E., Sui, Z. (2022). Chinese Predicate Chunk Knowledge Base Construction and Internal Boundary Recognition. In: Dong, M., Gu, Y., Hong, JF. (eds) Chinese Lexical Semantics. CLSW 2021. Lecture Notes in Computer Science, vol 13250. Springer, Cham. https://doi.org/10.1007/978-3-031-06547-7_10
DOI: https://doi.org/10.1007/978-3-031-06547-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06546-0
Online ISBN: 978-3-031-06547-7
eBook Packages: Computer Science, Computer Science (R0)