MOPRD: A multidisciplinary open peer review dataset

Lin, Jialiang; Song, Jiaxin; Zhou, Zhangping; Chen, Yidong; Shi, Xiaodong

doi:10.1007/s00521-023-08891-5

Original Article
Published: 23 September 2023

Neural Computing and Applications
Volume 35, pages 24191–24206 (2023)

Abstract

Open peer review is a growing trend in academic publishing. Public access to peer review data can benefit both the academic and publishing communities. It also strongly supports studies on review comment generation and, ultimately, the realization of automated scholarly paper review. However, most existing peer review datasets do not provide data covering the whole peer review process. In addition, their data are not sufficiently diverse, as they are collected mainly from the field of computer science. These two drawbacks of currently available peer review datasets need to be addressed to unlock more opportunities for related studies. In response, we construct MOPRD, a multidisciplinary open peer review dataset. This dataset consists of paper metadata, multiple manuscript versions, review comments, meta-reviews, authors' rebuttal letters, and editorial decisions. Moreover, we propose a modular guided review comment generation method based on MOPRD. Experiments show that our method delivers better performance on both automatic metrics and human evaluation. We also explore other potential applications of MOPRD, including meta-review generation, editorial decision prediction, author rebuttal generation, and scientometric analysis. MOPRD provides strong support for further peer review-related research and other applications.

Data availability

The method for obtaining our dataset is described in the paper.

Notes

https://proceedings.neurips.cc/.
https://openreview.net/about.
https://poppler.freedesktop.org/.
https://www.libreoffice.org/discover/writer/.
http://www.linjialiang.net/publications/moprd.
The data were collected on Aug 16, 2022.
Apart from the review comments, the content of the manuscript itself is also used by some researchers for this task.

Acknowledgements

This work is partly funded by the 13th Five-Year Plan project Artificial Intelligence and Language of the State Language Commission of China (Grant No. WT135-38). We thank Fangzhi Chen, Guantian Ding, Hongkun Fang, Jiabin Xue, Jingjing Wang, Jintao Guo, Li Lei, Ning Zhang, Zhou Xu, and Zhu Lin for their work in evaluating the review comments. Special and heartfelt gratitude goes to the first author's wife, Fenmei Zhou, for her understanding and love. Her unwavering support and continuous encouragement made this research possible.

Author information

Authors and Affiliations

School of Informatics, Xiamen University, Xiamen, China
Jialiang Lin, Jiaxin Song, Yidong Chen & Xiaodong Shi
College of Foreign Languages and Cultures, Xiamen University, Xiamen, China
Zhangping Zhou
Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan, Ministry of Culture and Tourism, Xiamen, China
Jialiang Lin, Jiaxin Song, Yidong Chen & Xiaodong Shi

Corresponding author

Correspondence to Xiaodong Shi.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lin, J., Song, J., Zhou, Z. et al. MOPRD: A multidisciplinary open peer review dataset. Neural Comput & Applic 35, 24191–24206 (2023). https://doi.org/10.1007/s00521-023-08891-5

Received: 02 December 2022
Accepted: 12 July 2023
Published: 23 September 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00521-023-08891-5