
Enhancing abstractive summarization of implicit datasets with contrastive attention

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

It is important for abstractive summarization models to identify the important parts of the original document and to generate a natural summary accordingly. Recent studies have incorporated the important parts of the original document during training and have shown good performance. However, these approaches are effective for explicit datasets but not for implicit datasets, which are relatively more abstract. This study addresses the challenge of summarizing implicit datasets, in which the significance of important sentences deviates less than in explicit datasets. A multi-task learning approach is proposed that reflects information about the salient and incidental parts of the document during training; this is achieved by adding a contrastive objective to the fine-tuning of an encoder-decoder language model. The salient and incidental parts are selected based on the ROUGE-L F1 score, and their relationship is learned through a triplet loss. The proposed method was evaluated on five benchmark summarization datasets, two explicit and three implicit. The experimental results show a larger improvement on the implicit datasets, particularly the highly abstractive XSum dataset, compared with vanilla fine-tuning of both the BART-base and T5-small models.
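
To make the training objective above concrete, the following is a minimal sketch, not the authors' released code, of how a contrastive triplet objective could be added to encoder-decoder fine-tuning: source sentences are scored with ROUGE-L F1 against the reference summary, the highest- and lowest-scoring sentences are taken as the salient and incidental parts, and a triplet loss over mean-pooled encoder representations is added to the usual cross-entropy loss. The checkpoint name comes from the appendix; the choice of anchor, the pooling, the margin, and the loss weight are illustrative assumptions.

import torch.nn.functional as F
from transformers import BartTokenizer, BartForConditionalGeneration
from rouge_score import rouge_scorer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def select_parts(sentences, reference):
    # Rank source sentences by ROUGE-L F1 against the reference summary:
    # the top-scoring sentence is treated as salient, the lowest as incidental.
    ranked = sorted(sentences, key=lambda s: scorer.score(reference, s)["rougeL"].fmeasure)
    return ranked[-1], ranked[0]

def encode_mean(text):
    # Mean-pooled encoder representation of a text span, shape (1, hidden_dim).
    ids = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = model.get_encoder()(**ids).last_hidden_state
    return hidden.mean(dim=1)

def training_loss(document_sentences, reference, margin=1.0, lam=0.1):
    # Standard seq2seq cross-entropy on the (document, reference) pair.
    inputs = tokenizer(" ".join(document_sentences), return_tensors="pt", truncation=True)
    labels = tokenizer(reference, return_tensors="pt", truncation=True).input_ids
    ce_loss = model(**inputs, labels=labels).loss

    # Triplet loss: anchor = reference summary, positive = salient part,
    # negative = incidental part, all in the encoder representation space.
    salient, incidental = select_parts(document_sentences, reference)
    anchor, positive, negative = encode_mean(reference), encode_mean(salient), encode_mean(incidental)
    contrastive = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    return ce_loss + lam * contrastive

In multi-task form, the cross-entropy and triplet terms would be optimized jointly during fine-tuning, which is the spirit of the method described in the abstract.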


Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. https://pytorch.org/.

  2. The pre-trained checkpoint is "facebook/bart-base".

  3. The pre-trained checkpoint is "t5-small".

  4. https://huggingface.co/docs/transformers/index.

  5. https://huggingface.co/docs/datasets/v1.0.1/index.html.

  6. https://github.com/danieldeutsch/sacrerouge/blob/master/doc/datasets/nytimes.md.

  7. https://github.com/Tiiiger/bert_score.

References

  1. Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp. 379–389

  2. Nallapati R, Zhou B, Santos C, Gulçehre Ç, Xiang B (2016) Abstractive text summarization using sequence-to-sequence RNNs and beyond. In: Proceedings of the 20th SIGNLL conference on computational natural language learning, pp. 280–290

  3. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 1073–1083

  4. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30

  5. Devlin J, Chang M-W, Lee K, Toutanova K (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), pp. 4171–4186

  6. Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2020) Bart: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 7871–7880

  7. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(1):5485–5551

  8. Zhang J, Zhao Y, Saleh M, Liu P (2020) Pegasus: pre-training with extracted gap-sentences for abstractive summarization. In: International conference on machine learning, pp. 11328–11339. PMLR

  9. Zheng C, Zhang K, Wang HJ, Fan L, Wang Z (2021) Enhanced seq2seq autoencoder via contrastive learning for abstractive text summarization. In: 2021 IEEE international conference on big data (Big Data), pp. 1764–1771. IEEE

  10. Xu S, Zhang X, Wu Y, Wei F (2022) Sequence level contrastive learning for text summarization. In: Proceedings of the AAAI conference on artificial intelligence, pp. 11556–11565

  11. Wang F, Song K, Zhang H, Jin L, Cho S, Yao W, Wang X, Chen M, Yu D (2022) Salience allocation as guidance for abstractive summarization. In: Proceedings of the 2022 conference on empirical methods in natural language processing, pp. 6094–6106

  12. Dou Z-Y, Liu P, Hayashi H, Jiang Z, Neubig G (2021) Gsum: a general framework for guided neural abstractive summarization. In: Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 4830–4842

  13. Ranzato M, Chopra S, Auli M, Zaremba W (2015) Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732

  14. Liu Y, Liu P (2021) Simcls: a simple framework for contrastive learning of abstractive summarization. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (Volume 2: Short Papers), pp. 1065–1072

  15. Ravaut M, Joty S, Chen N (2022) Summareranker: a multi-task mixture-of-experts re-ranking framework for abstractive summarization. In: Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 4504–4524

  16. Zhao Y, Khalman M, Joshi R, Narayan S, Saleh M, Liu PJ (2023) Calibrating sequence likelihood improves conditional language generation. In: The eleventh international conference on learning representations

  17. Liu Y, Liu P, Radev D, Neubig G (2022) Brio: bringing order to abstractive summarization. In: Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 2890–2903

  18. Zhang X, Liu Y, Wang X, He P, Yu Y, Chen S-Q, Xiong W, Wei F (2022) Momentum calibration for text generation. arXiv preprint arXiv:2212.04257

  19. Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol. 2, pp. 1735–1742. IEEE

  20. Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, pp. 1597–1607. PMLR

  21. He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9729–9738

  22. Grill J-B, Strub F, Altché F, Tallec C, Richemond P, Buchatskaya E, Doersch C, Avila Pires B, Guo Z, Gheshlaghi Azar M et al (2020) Bootstrap your own latent-a new approach to self-supervised learning. Adv Neural Inf Process Syst 33:21271–21284

  23. Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. Adv Neural Inf Process Syst 33:18661–18673

  24. Gunel B, Du J, Conneau A, Stoyanov V (2021) Supervised contrastive learning for pre-trained language model fine-tuning. In: International conference on learning representations

  25. Lee S, Lee DB, Hwang SJ (2021) Contrastive learning with adversarial perturbations for conditional text generation. In: International conference on learning representations

  26. Liu W, Wu H, Mu W, Li Z, Chen T, Nie D (2021) Co2sum: contrastive learning for factual-consistent abstractive summarization. arXiv preprint arXiv:2112.01147

  27. An C, Zhong M, Wu Z, Zhu Q, Huang X-J, Qiu X (2022) Colo: a contrastive learning based re-ranking framework for one-stage summarization. In: Proceedings of the 29th international conference on computational linguistics, pp. 5783–5793

  28. Cao S, Wang L (2021) Cliff: contrastive learning for improving faithfulness and factuality in abstractive summarization. In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp. 6633–6649

  29. Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815–823

  30. Zhong M, Liu P, Chen Y, Wang D, Qiu X, Huang X-J (2020) Extractive summarization as text matching. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 6197–6208

  31. Hsu W-T, Lin C-K, Lee M-Y, Min K, Tang J, Sun M (2018) A unified model for extractive and abstractive summarization using inconsistency loss. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp. 132–141

  32. Gehrmann S, Deng Y, Rush A (2018) Bottom-up abstractive summarization. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 4098–4109

  33. Zhang T, Kishore V, Wu F, Weinberger KQ, Artzi Y (2020) Bertscore: evaluating text generation with bert. In: International conference on learning representations

  34. Lin C-Y (2004) Rouge: a package for automatic evaluation of summaries. In: Text summarization branches Out, pp. 74–81

  35. Hermann KM, Kocisky T, Grefenstette E, Espeholt L, Kay W, Suleyman M, Blunsom P (2015) Teaching machines to read and comprehend. Adv Neural Inf Process Syst 28

  36. Sandhaus E (2008) The New York Times annotated corpus. Linguistic Data Consortium, Philadelphia 6(12):26752

  37. Deutsch D, Roth D (2020) SacreROUGE: an open-source library for using and developing summarization evaluation metrics. In: Proceedings of the second workshop for NLP open source software (NLP-OSS), pp. 120–125. Association for Computational Linguistics, Online. https://www.aclweb.org/anthology/2020.nlposs-1.17

  38. Xu J, Gan Z, Cheng Y, Liu J (2020) Discourse-aware neural extractive text summarization. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 5021–5031

  39. Narayan S, Cohen SB, Lapata M (2018) Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 1797–1807

  40. Kim B, Kim H, Kim G (2019) Abstractive summarization of reddit posts with multi-level memory networks. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), pp. 2519–2531

  41. Gliwa B, Mochol I, Biesek M, Wawer A (2019) SAMSum corpus: a human-annotated dialogue dataset for abstractive summarization. In: Proceedings of the 2nd workshop on new frontiers in summarization, pp. 70–79

  42. Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: International conference on learning representations

Acknowledgements

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. RS-2023-00244789).

Author information

Corresponding author

Correspondence to Younghoon Lee.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Packages and hyperparameters

Our implementation was based on PyTorch (Note 1). We used the pre-trained BART (Note 2) and T5 (Note 3) models from the Huggingface Transformers (Note 4) library. We downloaded the datasets from the Datasets library (Note 5), and the NYT dataset was obtained following [37] (Note 6). We used the Datasets library for the ROUGE metric and the officially distributed bert-score package (Note 7) for BERTScore; a minimal sketch of this setup is given after the table listings below. Training of the BART-base model was conducted on 8 RTX 2080 Ti GPUs, taking around 1.8 h per epoch for the CNNDM dataset and 1.2 h for the XSum dataset (Tables 13 and 14).

Table 13 Hyperparameters for fine-tuning the models
Table 14 Hyperparameters for the summary generation
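
As a rough illustration of the package setup described above, the sketch below loads the cited pre-trained checkpoints and computes ROUGE and BERTScore with the Datasets and bert-score libraries. It is a minimal sketch assuming the library APIs of the cited versions; the example texts, metric arguments, and variable names are illustrative rather than taken from the authors' scripts.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from datasets import load_metric            # ROUGE via the Datasets library (Note 5)
from bert_score import score as bert_score  # officially distributed bert-score package (Note 7)

# Pre-trained checkpoints cited in Notes 2 and 3.
bart = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

predictions = ["the model produced this candidate summary."]
references = ["the human-written reference summary."]

# Aggregated ROUGE; fmeasure gives the ROUGE F1 values.
rouge = load_metric("rouge")
rouge_out = rouge.compute(predictions=predictions, references=references)
print({name: agg.mid.fmeasure for name, agg in rouge_out.items()})

# BERTScore precision/recall/F1 from the official package.
P, R, F1 = bert_score(predictions, references, lang="en")
print("BERTScore F1:", F1.mean().item())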

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kwon, S., Lee, Y. Enhancing abstractive summarization of implicit datasets with contrastive attention. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09864-y
