Abstract
Factual consistency is a key requirement when producing or editing high-quality documents, and it matters for automatic text generation systems such as summarization, question answering, dialog modeling, and language modeling. Still, automated factual inconsistency detection remains under-studied. Existing work has focused on (a) detecting fake news given a knowledge base, or (b) detecting broad contradictions (as part of the natural language inference literature). However, there has been no work on detecting and explaining types of factual inconsistencies in text without a knowledge base in context. In this paper, we leverage existing work in linguistics to formally define five types of factual inconsistencies. Based on this categorization, we contribute a novel dataset, FICLE (Factual Inconsistency CLassification with Explanation), with \(\sim \)8K samples, where each sample consists of two sentences (claim and context) annotated with the type and span of the inconsistency. When the inconsistency relates to an entity type, it is also labeled at two levels (coarse and fine-grained). Further, we leverage this dataset to train a pipeline of four neural models that, given a (claim, context) sentence pair, predicts the inconsistency type along with explanations: the inconsistent claim fact triple, the inconsistent context span, the inconsistent claim component, and the coarse and fine-grained inconsistent entity types. The proposed system first predicts inconsistent spans from the claim and context, and then uses them to predict the inconsistency type and the inconsistent entity types (when the inconsistency is due to entities). We experiment with multiple Transformer-based natural language classification as well as generative models, and find that DeBERTa performs best. Our proposed methods provide a weighted F1 of \(\sim \)87% for inconsistency type classification across the five classes.
We make the code and dataset publicly available (https://github.com/blitzprecision/FICLE).
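The staged pipeline described in the abstract (spans first, then inconsistency type, then entity types only for entity-driven cases) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the label names, the entity-driven condition, and the three model callables are hypothetical placeholders standing in for the trained neural models.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical label inventory for illustration only; the paper's own
# five inconsistency type names may differ.
INCONSISTENCY_TYPES = ["simple", "gradable", "set-based",
                       "negation", "taxonomic"]

@dataclass
class Explanation:
    claim_span: str                         # inconsistent span in the claim
    context_span: str                       # inconsistent span in the context
    inconsistency_type: str                 # one of the five classes
    coarse_entity_type: Optional[str] = None  # only for entity-driven cases
    fine_entity_type: Optional[str] = None

def classify_inconsistency(
    claim: str,
    context: str,
    span_model: Callable[[str, str], Tuple[str, str]],
    type_model: Callable[[str, str, str, str], str],
    entity_model: Callable[[str, str], Tuple[str, str]],
) -> Explanation:
    """Orchestrate the staged pipeline: predict inconsistent spans first,
    then use them to predict the inconsistency type, and predict entity
    types only when the inconsistency is entity-driven (assumed here to
    be the 'taxonomic' class)."""
    claim_span, context_span = span_model(claim, context)
    inc_type = type_model(claim, context, claim_span, context_span)
    coarse = fine = None
    if inc_type == "taxonomic":  # entity-driven case (assumption)
        coarse, fine = entity_model(claim_span, context_span)
    return Explanation(claim_span, context_span, inc_type, coarse, fine)
```

In practice each callable would wrap a fine-tuned Transformer (e.g. a DeBERTa classifier head for the type model); the sketch only shows how the predictions feed into one another.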
Ethical Statement
In this work, we derived a dataset from the FEVER dataset. Data annotations in FEVER incorporate material from Wikipedia, which is licensed pursuant to the Wikipedia Copyright Policy. These annotations are made available under the license terms described on the applicable Wikipedia article pages or, where Wikipedia license terms are unavailable, under the Creative Commons Attribution-ShareAlike License (version 3.0), available at http://creativecommons.org/licenses/by-sa/3.0/. Thus, we made use of the dataset in accordance with its appropriate usage terms. The FICLE dataset does not contain any personally identifiable information. Details of the manual annotations are explained in Sect. 4 as well as in annotationGuidelines.pdf at https://github.com/blitzprecision/FICLE.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Raha, T. et al. (2023). Neural Models for Factual Inconsistency Classification with Explanations. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol. 14171. Springer, Cham. https://doi.org/10.1007/978-3-031-43418-1_25
DOI: https://doi.org/10.1007/978-3-031-43418-1_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43417-4
Online ISBN: 978-3-031-43418-1