
Enhancing abstractive summarization of implicit datasets with contrastive attention

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

It is important for abstractive summarization models to identify the important parts of the original document and to generate a natural summary accordingly. Recent studies have incorporated the important parts of the original document during training and have shown good performance. However, these approaches are effective for explicit datasets but not for implicit datasets, which are relatively more abstract. This study addresses the challenge of summarizing implicit datasets, in which the significance of important sentences deviates less than in explicit datasets. A multi-task learning approach is proposed that reflects information about the salient and incidental parts of the document during training; this is achieved by adding a contrastive objective to the fine-tuning of an encoder-decoder language model. The salient and incidental parts are selected based on the ROUGE-L F1 score, and their relationship is learned through a triplet loss. The proposed method was evaluated on five benchmark summarization datasets, two explicit and three implicit. The experimental results show a larger improvement on the implicit datasets, particularly the highly abstractive XSum dataset, compared with vanilla fine-tuning of both the BART-base and T5-small models.
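
To make the training objective above concrete, the following is a minimal sketch, not the authors' released code, of how a contrastive triplet objective could be added to encoder-decoder fine-tuning: source sentences are scored with ROUGE-L F1 against the reference summary, the highest- and lowest-scoring sentences are taken as the salient and incidental parts, and a triplet loss over mean-pooled encoder representations is added to the usual cross-entropy loss. The checkpoint name comes from the appendix; the choice of anchor, the pooling, the margin, and the loss weight are illustrative assumptions.

import torch.nn.functional as F
from transformers import BartTokenizer, BartForConditionalGeneration
from rouge_score import rouge_scorer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def select_parts(sentences, reference):
    # Rank source sentences by ROUGE-L F1 against the reference summary:
    # the top-scoring sentence is treated as salient, the lowest as incidental.
    ranked = sorted(sentences, key=lambda s: scorer.score(reference, s)["rougeL"].fmeasure)
    return ranked[-1], ranked[0]

def encode_mean(text):
    # Mean-pooled encoder representation of a text span, shape (1, hidden_dim).
    ids = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = model.get_encoder()(**ids).last_hidden_state
    return hidden.mean(dim=1)

def training_loss(document_sentences, reference, margin=1.0, lam=0.1):
    # Standard seq2seq cross-entropy on the (document, reference) pair.
    inputs = tokenizer(" ".join(document_sentences), return_tensors="pt", truncation=True)
    labels = tokenizer(reference, return_tensors="pt", truncation=True).input_ids
    ce_loss = model(**inputs, labels=labels).loss

    # Triplet loss: anchor = reference summary, positive = salient part,
    # negative = incidental part, all in the encoder representation space.
    salient, incidental = select_parts(document_sentences, reference)
    anchor, positive, negative = encode_mean(reference), encode_mean(salient), encode_mean(incidental)
    contrastive = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    return ce_loss + lam * contrastive

In multi-task form, the cross-entropy and triplet terms would be optimized jointly during fine-tuning, which is the spirit of the method described in the abstract.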


Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. https://pytorch.org/.

  2. The pre-trained checkpoint is "facebook/bart-base".

  3. The pre-trained checkpoint is "t5-small".

  4. https://huggingface.co/docs/transformers/index.

  5. https://huggingface.co/docs/datasets/v1.0.1/index.html.

  6. https://github.com/danieldeutsch/sacrerouge/blob/master/doc/datasets/nytimes.md.

  7. https://github.com/Tiiiger/bert_score.

References

  1. Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp. 379–389

  2. Nallapati R, Zhou B, Santos C, Gulçehre Ç, Xiang B (2016) Abstractive text summarization using sequence-to-sequence RNNs and beyond. In: Proceedings of the 20th SIGNLL conference on computational natural language learning, pp. 280–290

  3. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 1073–1083

  4. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30

  5. Devlin J, Chang M-W, Lee K, Toutanova K (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), pp. 4171–4186

  6. Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2020) Bart: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 7871–7880

  7. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(1):5485–5551

  8. Zhang J, Zhao Y, Saleh M, Liu P (2020) Pegasus: pre-training with extracted gap-sentences for abstractive summarization. In: International conference on machine learning, pp. 11328–11339. PMLR

  9. Zheng C, Zhang K, Wang HJ, Fan L, Wang Z (2021) Enhanced seq2seq autoencoder via contrastive learning for abstractive text summarization. In: 2021 IEEE international conference on big data (Big Data), pp. 1764–1771. IEEE

  10. Xu S, Zhang X, Wu Y, Wei F (2022) Sequence level contrastive learning for text summarization. In: Proceedings of the AAAI conference on artificial intelligence, pp. 11556–11565

  11. Wang F, Song K, Zhang H, Jin L, Cho S, Yao W, Wang X, Chen M, Yu D (2022) Salience allocation as guidance for abstractive summarization. In: Proceedings of the 2022 conference on empirical methods in natural language processing, pp. 6094–6106

  12. Dou Z-Y, Liu P, Hayashi H, Jiang Z, Neubig G (2021) Gsum: a general framework for guided neural abstractive summarization. In: Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 4830–4842

  13. Ranzato M, Chopra S, Auli M, Zaremba W (2015) Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732

  14. Liu Y, Liu P (2021) Simcls: a simple framework for contrastive learning of abstractive summarization. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (Volume 2: Short Papers), pp. 1065–1072

  15. Ravaut M, Joty S, Chen N (2022) Summareranker: a multi-task mixture-of-experts re-ranking framework for abstractive summarization. In: Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 4504–4524

  16. Zhao Y, Khalman M, Joshi R, Narayan S, Saleh M, Liu PJ (2023) Calibrating sequence likelihood improves conditional language generation. In: The eleventh international conference on learning representations

  17. Liu Y, Liu P, Radev D, Neubig G (2022) Brio: bringing order to abstractive summarization. In: Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 2890–2903

  18. Zhang X, Liu Y, Wang X, He P, Yu Y, Chen S-Q, Xiong W, Wei F (2022) Momentum calibration for text generation. arXiv preprint arXiv:2212.04257

  19. Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol. 2, pp. 1735–1742. IEEE

  20. Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, pp. 1597–1607. PMLR

  21. He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9729–9738

  22. Grill J-B, Strub F, Altché F, Tallec C, Richemond P, Buchatskaya E, Doersch C, Avila Pires B, Guo Z, Gheshlaghi Azar M et al (2020) Bootstrap your own latent-a new approach to self-supervised learning. Adv Neural Inf Process Syst 33:21271–21284

  23. Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. Adv Neural Inf Process Syst 33:18661–18673

  24. Gunel B, Du J, Conneau A, Stoyanov V (2021) Supervised contrastive learning for pre-trained language model fine-tuning. In: International conference on learning representations

  25. Lee S, Lee DB, Hwang SJ (2021) Contrastive learning with adversarial perturbations for conditional text generation. In: International conference on learning representations

  26. Liu W, Wu H, Mu W, Li Z, Chen T, Nie D (2021) Co2sum: contrastive learning for factual-consistent abstractive summarization. arXiv preprint arXiv:2112.01147

  27. An C, Zhong M, Wu Z, Zhu Q, Huang X-J, Qiu X (2022) Colo: a contrastive learning based re-ranking framework for one-stage summarization. In: Proceedings of the 29th international conference on computational linguistics, pp. 5783–5793

  28. Cao S, Wang L (2021) Cliff: contrastive learning for improving faithfulness and factuality in abstractive summarization. In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp. 6633–6649

  29. Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815–823

  30. Zhong M, Liu P, Chen Y, Wang D, Qiu X, Huang X-J (2020) Extractive summarization as text matching. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 6197–6208

  31. Hsu W-T, Lin C-K, Lee M-Y, Min K, Tang J, Sun M (2018) A unified model for extractive and abstractive summarization using inconsistency loss. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 132–141

  32. Gehrmann S, Deng Y, Rush A (2018) Bottom-up abstractive summarization. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 4098–4109

  33. Zhang T, Kishore V, Wu F, Weinberger KQ, Artzi Y (2020) Bertscore: evaluating text generation with bert. In: International conference on learning representations

  34. Lin C-Y (2004) Rouge: a package for automatic evaluation of summaries. In: Text summarization branches Out, pp. 74–81

  35. Hermann KM, Kocisky T, Grefenstette E, Espeholt L, Kay W, Suleyman M, Blunsom P (2015) Teaching machines to read and comprehend. Adv Neural Inf Process Syst 28

  36. Sandhaus E (2008) The New York Times annotated corpus. Linguistic Data Consortium, Philadelphia 6(12):26752

  37. Deutsch D, Roth D (2020) SacreROUGE: an open-source library for using and developing summarization evaluation metrics. In: Proceedings of the second workshop for NLP open source software (NLP-OSS), pp. 120–125. Association for Computational Linguistics, Online. https://www.aclweb.org/anthology/2020.nlposs-1.17

  38. Xu J, Gan Z, Cheng Y, Liu J (2020) Discourse-aware neural extractive text summarization. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 5021–5031

  39. Narayan S, Cohen SB, Lapata M (2018) Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 1797–1807

  40. Kim B, Kim H, Kim G (2019) Abstractive summarization of reddit posts with multi-level memory networks. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), pp. 2519–2531

  41. Gliwa B, Mochol I, Biesek M, Wawer A (2019) SAMSum corpus: a human-annotated dialogue dataset for abstractive summarization. In: Proceedings of the 2nd workshop on new frontiers in summarization, pp. 70–79

  42. Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: International conference on learning representations

Acknowledgements

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. RS-2023-00244789).

Author information

Corresponding author

Correspondence to Younghoon Lee.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Packages and hyperparameters

Our implementation was based on PyTorch (Note 1). We used the pre-trained BART (Note 2) and T5 (Note 3) models from the Huggingface Transformers (Note 4) library. We downloaded the datasets from the Datasets library (Note 5), and the NYT dataset was obtained following [37] (Note 6). We used the Datasets library for the ROUGE metric and the officially distributed bert-score package (Note 7) for BERTScore; a minimal sketch of this setup is given after the table listings below. Training of the BART-base model was conducted on 8 RTX 2080 Ti GPUs, taking around 1.8 h per epoch for the CNNDM dataset and 1.2 h for the XSum dataset (Tables 13 and 14).

Table 13 Hyperparameters for fine-tuning the models
Table 14 Hyperparameters for the summary generation
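
As a rough illustration of the package setup described above, the sketch below loads the cited pre-trained checkpoints and computes ROUGE and BERTScore with the Datasets and bert-score libraries. It is a minimal sketch assuming the library APIs of the cited versions; the example texts, metric arguments, and variable names are illustrative rather than taken from the authors' scripts.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from datasets import load_metric            # ROUGE via the Datasets library (Note 5)
from bert_score import score as bert_score  # officially distributed bert-score package (Note 7)

# Pre-trained checkpoints cited in Notes 2 and 3.
bart = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

predictions = ["the model produced this candidate summary."]
references = ["the human-written reference summary."]

# Aggregated ROUGE; fmeasure gives the ROUGE F1 values.
rouge = load_metric("rouge")
rouge_out = rouge.compute(predictions=predictions, references=references)
print({name: agg.mid.fmeasure for name, agg in rouge_out.items()})

# BERTScore precision/recall/F1 from the official package.
P, R, F1 = bert_score(predictions, references, lang="en")
print("BERTScore F1:", F1.mean().item())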

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kwon, S., Lee, Y. Enhancing abstractive summarization of implicit datasets with contrastive attention. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09864-y
