Abstract
Correctly identifying the types of propositions helps in understanding the logical relationships between sentences and is of great significance to natural language understanding, reasoning, and generation. However, previous studies have three limitations: 1) they consider only explicit propositions, while most propositions in text are implicit; 2) they only detect whether a sentence is a proposition, whereas it is more meaningful to identify which proposition type it belongs to; 3) they cover only the encyclopedia domain, whereas propositions exist widely across domains. We present ProPC, a dataset for in-domain and cross-domain proposition classification. It consists of 15,000 sentences covering 4 proposition types in 5 domains. We define two new tasks: 1) in-domain proposition classification, which identifies the proposition type of a given sentence (not limited to explicit propositions); 2) cross-domain proposition classification, which takes encyclopedia as the source domain and the other 4 domains as target domains. We use Matching, BERT, and RoBERTa as our baseline methods and run experiments on each task. The results show that machines can indeed learn the characteristics of the various proposition types from explicit propositions and classify implicit ones, but their domain generalization ability still needs to be strengthened. Our dataset, ProPC, is publicly available at https://github.com/NLUSoCo/ProPC.
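To make the BERT baseline concrete, here is a minimal sketch of fine-tuning a pretrained Chinese BERT for 4-way sentence classification of the kind the paper describes. It assumes the Hugging Face transformers library and the bert-base-chinese checkpoint; the label names are hypothetical placeholders, since the abstract does not enumerate the 4 proposition types. This is an illustration, not the authors' exact setup.

```python
# Minimal sketch of a BERT baseline for 4-way proposition classification.
# Assumptions (not from the paper): checkpoint name, label names, max length.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

LABELS = ["type_a", "type_b", "type_c", "type_d"]  # hypothetical proposition types

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
# The classification head is randomly initialized here; it must be
# fine-tuned on ProPC before its predictions are meaningful.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(LABELS)
)
model.eval()

def classify(sentence: str) -> str:
    """Predict a proposition type for a single sentence."""
    inputs = tokenizer(sentence, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify("如果明天下雨，比赛就取消。"))  # a conditional ("if..., then...") sentence
```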
Notes
- 1.
Logical keywords, like “all...are...”, “both...and...”, “if..., then...”, “either...or...”, etc.
- 2.
For example, in “if you don’t fight, you fail”, the logical keyword pair should be “if..., then...”, but the “then” is missing, so it is an implicit proposition.
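In the spirit of the Matching baseline mentioned in the abstract, here is a minimal keyword-matching sketch: a complete logical keyword pair suggests an explicit proposition, while a lone cue (e.g. an “if” without a “then”, as in the note above) suggests an implicit one. The patterns are illustrative assumptions, not the paper's actual rule set.

```python
# Sketch of keyword matching over logical keyword pairs (illustrative patterns).
import re

# (pattern for the complete pair, pattern for a partial cue) -- assumptions
PATTERNS = [
    (re.compile(r"\bif\b.+\bthen\b"), re.compile(r"\bif\b")),
    (re.compile(r"\beither\b.+\bor\b"), re.compile(r"\bor\b")),
    (re.compile(r"\bboth\b.+\band\b"), re.compile(r"\band\b")),
]

def keyword_match(sentence: str) -> str:
    s = sentence.lower()
    for full, partial in PATTERNS:
        if full.search(s):
            return "explicit"   # complete keyword pair found
        if partial.search(s):
            return "implicit"   # a cue is present but the pair is incomplete
    return "non-proposition"

print(keyword_match("If you don't fight, you fail."))  # -> "implicit": no "then"
```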
Acknowledgements
Supported by the Beijing Natural Science Foundation (4192057) and the Science Foundation of Beijing Language and Culture University (the Fundamental Research Funds for the Central Universities: 21YJ040005).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, M., Liu, P., Bo, L., Mao, Y., Xu, K., Su, W. (2021). ProPC: A Dataset for In-Domain and Cross-Domain Proposition Classification Tasks. In: Wang, L., Feng, Y., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2021. Lecture Notes in Computer Science, vol 13028. Springer, Cham. https://doi.org/10.1007/978-3-030-88480-2_5
DOI: https://doi.org/10.1007/978-3-030-88480-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88479-6
Online ISBN: 978-3-030-88480-2
eBook Packages: Computer Science, Computer Science (R0)