Abstract
Abstractive summarization models must identify the important parts of a source document and generate a natural summary accordingly. Recent studies that incorporate the important parts of the source document during training have shown good performance; however, these approaches are effective on explicit datasets but not on implicit datasets, which are comparatively more abstractive. This study addresses the challenge of summarizing implicit datasets, in which the deviation in the significance of important sentences is lower than in explicit datasets. We propose a multi-task learning approach that reflects information about the salient and incidental parts of a document during training, achieved by adding a contrastive objective to the fine-tuning of an encoder-decoder language model. The salient and incidental parts are selected based on the ROUGE-L F1 score, and their relationships are learned through a triplet loss. The proposed method was evaluated on five benchmark summarization datasets, two explicit and three implicit. The experimental results show a greater improvement on the implicit datasets, particularly the highly abstractive XSum dataset, over vanilla fine-tuning with both the BART-base and T5-small models.
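To make the objective concrete, the following is a minimal sketch of a triplet-style contrastive term combined with the generation loss, not the paper's exact implementation. The weighting coefficient `lam`, the margin, and the use of Euclidean distance are assumptions; the anchor, salient, and incidental embeddings are taken to be pooled encoder representations.

```python
import torch.nn.functional as F

def triplet_loss(anchor, salient, incidental, margin=1.0):
    # Pull the anchor embedding toward the salient part and push it
    # away from the incidental part (Euclidean distances, as in a
    # standard triplet objective). `margin` is a hypothetical value.
    pos = F.pairwise_distance(anchor, salient)
    neg = F.pairwise_distance(anchor, incidental)
    return F.relu(pos - neg + margin).mean()

def multitask_loss(ce_loss, anchor, salient, incidental, lam=0.5):
    # Multi-task objective: the usual sequence-to-sequence cross-entropy
    # plus the contrastive triplet term, weighted by `lam` (assumed).
    return ce_loss + lam * triplet_loss(anchor, salient, incidental)
```

In this sketch, `ce_loss` would be the standard cross-entropy returned by the summarization model, and the salient and incidental sentences themselves would be chosen by scoring each source sentence against the reference summary with ROUGE-L F1, as described in the abstract.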
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp. 379–389
Nallapati R, Zhou B, Santos C, Gulçehre Ç, Xiang B (2016) Abstractive text summarization using sequence-to-sequence RNNs and beyond. In: Proceedings of the 20th SIGNLL conference on computational natural language learning, pp. 280–290
See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 1073–1083
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Devlin J, Chang M-W, Lee K, Toutanova K (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), pp. 4171–4186
Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2020) Bart: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 7871–7880
Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(1):5485–5551
Zhang J, Zhao Y, Saleh M, Liu P (2020) Pegasus: pre-training with extracted gap-sentences for abstractive summarization. In: International conference on machine learning, pp. 11328–11339. PMLR
Zheng C, Zhang K, Wang HJ, Fan L, Wang Z (2021) Enhanced seq2seq autoencoder via contrastive learning for abstractive text summarization. In: 2021 IEEE international conference on big data (Big Data), pp. 1764–1771. IEEE
Xu S, Zhang X, Wu Y, Wei F (2022) Sequence level contrastive learning for text summarization. In: Proceedings of the AAAI conference on artificial intelligence, pp. 11556–11565
Wang F, Song K, Zhang H, Jin L, Cho S, Yao W, Wang X, Chen M, Yu D (2022) Salience allocation as guidance for abstractive summarization. In: Proceedings of the 2022 conference on empirical methods in natural language processing, pp. 6094–6106
Dou Z-Y, Liu P, Hayashi H, Jiang Z, Neubig G (2021) Gsum: a general framework for guided neural abstractive summarization. In: Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 4830–4842
Ranzato M, Chopra S, Auli M, Zaremba W (2015) Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732
Liu Y, Liu P (2021) Simcls: a simple framework for contrastive learning of abstractive summarization. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (Volume 2: Short Papers), pp. 1065–1072
Ravaut M, Joty S, Chen N (2022) Summareranker: a multi-task mixture-of-experts re-ranking framework for abstractive summarization. In: Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 4504–4524
Zhao Y, Khalman M, Joshi R, Narayan S, Saleh M, Liu PJ (2023) Calibrating sequence likelihood improves conditional language generation. In: The eleventh international conference on learning representations
Liu Y, Liu P, Radev D, Neubig G (2022) Brio: bringing order to abstractive summarization. In: Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 2890–2903
Zhang X, Liu Y, Wang X, He P, Yu Y, Chen S-Q, Xiong W, Wei F (2022) Momentum calibration for text generation. arXiv preprint arXiv:2212.04257
Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol. 2, pp. 1735–1742. IEEE
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, pp. 1597–1607. PMLR
He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9729–9738
Grill J-B, Strub F, Altché F, Tallec C, Richemond P, Buchatskaya E, Doersch C, Avila Pires B, Guo Z, Gheshlaghi Azar M et al (2020) Bootstrap your own latent: a new approach to self-supervised learning. Adv Neural Inf Process Syst 33:21271–21284
Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. Adv Neural Inf Process Syst 33:18661–18673
Gunel B, Du J, Conneau A, Stoyanov V (2021) Supervised contrastive learning for pre-trained language model fine-tuning. In: International conference on learning representations
Lee S, Lee DB, Hwang SJ (2021) Contrastive learning with adversarial perturbations for conditional text generation. In: International conference on learning representations
Liu W, Wu H, Mu W, Li Z, Chen T, Nie D (2021) Co2sum: contrastive learning for factual-consistent abstractive summarization. arXiv preprint arXiv:2112.01147
An C, Zhong M, Wu Z, Zhu Q, Huang X-J, Qiu X (2022) Colo: a contrastive learning-based re-ranking framework for one-stage summarization. In: Proceedings of the 29th international conference on computational linguistics, pp. 5783–5793
Cao S, Wang L (2021) Cliff: contrastive learning for improving faithfulness and factuality in abstractive summarization. In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp. 6633–6649
Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815–823
Zhong M, Liu P, Chen Y, Wang D, Qiu X, Huang X-J (2020) Extractive summarization as text matching. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 6197–6208
Hsu W-T, Lin C-K, Lee M-Y, Min K, Tang J, Sun M (2018) A unified model for extractive and abstractive summarization using inconsistency loss. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 132–141
Gehrmann S, Deng Y, Rush A (2018) Bottom-up abstractive summarization. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 4098–4109
Zhang T, Kishore V, Wu F, Weinberger KQ, Artzi Y (2020) Bertscore: evaluating text generation with bert. In: International conference on learning representations
Lin C-Y (2004) Rouge: a package for automatic evaluation of summaries. In: Text summarization branches Out, pp. 74–81
Hermann KM, Kocisky T, Grefenstette E, Espeholt L, Kay W, Suleyman M, Blunsom P (2015) Teaching machines to read and comprehend. Adv Neural Inf Process Syst 28
Sandhaus E (2008) The New York Times annotated corpus. Linguistic Data Consortium, Philadelphia 6(12):26752
Deutsch D, Roth D (2020) SacreROUGE: an open-source library for using and developing summarization evaluation metrics. In: Proceedings of the second workshop for NLP open source software (NLP-OSS), pp. 120–125. Association for Computational Linguistics, Online. https://www.aclweb.org/anthology/2020.nlposs-1.17
Xu J, Gan Z, Cheng Y, Liu J (2020) Discourse-aware neural extractive text summarization. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 5021–5031
Narayan S, Cohen SB, Lapata M (2018) Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 1797–1807
Kim B, Kim H, Kim G (2019) Abstractive summarization of reddit posts with multi-level memory networks. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), pp. 2519–2531
Gliwa B, Mochol I, Biesek M, Wawer A (2019) SAMSum corpus: a human-annotated dialogue dataset for abstractive summarization. In: Proceedings of the 2nd workshop on new frontiers in summarization, pp. 70–79
Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: International conference on learning representations
Acknowledgements
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. RS-2023-00244789).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Packages and hyperparameters
Our implementation was based on PyTorch. We used the pre-trained BART ("facebook/bart-base") and T5 ("t5-small") checkpoints from the Hugging Face Transformers library. The datasets were downloaded via the Datasets library, except for the NYT dataset, which was obtained from [37]. We used the Datasets library for the ROUGE metric and the officially distributed bert-score package for BERTScore. Training of the BART-base model was conducted on 8 RTX 2080 Ti GPUs and took around 1.8 h per epoch on the CNNDM dataset and 1.2 h on the XSum dataset (Tables 13 and 14).
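For orientation, a minimal sketch of this environment is given below. The checkpoint names are those stated above; the Hugging Face hub identifier "xsum" and the example strings are illustrative assumptions, and none of the training hyperparameters from Tables 13 and 14 are reproduced here.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from datasets import load_dataset
from bert_score import score

# Checkpoints as given above; swap in "t5-small" for the T5 runs.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

# XSum via the Datasets library (the hub id "xsum" is assumed here).
xsum = load_dataset("xsum")

# BERTScore via the officially distributed bert-score package.
P, R, F1 = score(["a generated summary"], ["a reference summary"], lang="en")
```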
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kwon, S., Lee, Y. Enhancing abstractive summarization of implicit datasets with contrastive attention. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09864-y