
PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

Although peer review is a central aspect of scientific communication, research shows that the process involves a power imbalance: the reviewer's position allows them to be harsh and even intentionally offensive without being held accountable. This casts doubt on the integrity of the peer-review process and can turn it into an unpleasant, even traumatic, experience for authors. More effort should therefore go into providing feedback that is critical yet constructive, and the growing rudeness and lack of professionalism in the review system need to be remedied by analyzing the tone of review comments and classifying their level of politeness. To this end, we develop PolitePEER, the first annotated dataset covering five levels of politeness: (1) highly impolite, (2) impolite, (3) neutral, (4) polite, and (5) highly polite. The review sentences were collected from multiple venues, viz. ICLR, NeurIPS, Publons, and ShitMyReviewersSay. We formulated annotation guidelines and conducted a thorough analysis of the PolitePEER dataset, ensuring its quality with an inter-annotator agreement of 93%. Additionally, we benchmarked PolitePEER for multiclass classification and provide an extensive analysis of the proposed baseline. PolitePEER can thus aid in developing a politeness indicator that notifies reviewers and editors so that a review can be amended and formalized accordingly. Our dataset and code are available at https://github.com/PrabhatkrBharti/PolitePEER.git for the community to explore further.
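The abstract frames PolitePEER as a five-way, sentence-level politeness classification benchmark. As a rough illustrative sketch of that setup (not the authors' released pipeline; see the GitHub repository linked above for the actual code), the Python snippet below wires a generic Hugging Face encoder to a five-class head over the PolitePEER label set. The checkpoint name bert-base-uncased, the label ordering, and the predict_politeness helper are assumptions made here for illustration; the baseline checkpoints listed in the notes below (SciBERT-NLI, HateBERT, toxic-bert) could be substituted, and the classification head would need fine-tuning on the annotated sentences before its predictions are meaningful.

# Minimal sketch (not the authors' released code): a five-level politeness
# classifier over single review sentences, built on a generic pretrained
# encoder from Hugging Face. Checkpoint, label order, and helper function are
# illustrative assumptions, not the paper's exact configuration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["highly impolite", "impolite", "neutral", "polite", "highly polite"]

# Any encoder checkpoint can be swapped in here, e.g. the baselines named in
# the notes: gsarti/scibert-nli, GroNLP/hateBERT, unitary/toxic-bert.
MODEL_NAME = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

def predict_politeness(sentence: str) -> str:
    """Return the predicted politeness level for one review sentence."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

if __name__ == "__main__":
    # The classification head is randomly initialized here, so the output is
    # arbitrary until the model is fine-tuned on the PolitePEER sentences.
    print(predict_politeness("The paper is well written, but the evaluation is rather limited."))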


Notes

  1. https://blog.f1000.com/2020/01/31/a-brief-history-of-peer-review/.

  2. https://mitcommlab.mit.edu/broad/commkit/peer-review-a-historical-perspective/

  3. https://www.humanities.hk/news/this-paper-is-absolutely-ridiculous-ken-hyland.

  4. https://iclr.cc/.

  5. https://neurips.cc.

  6. http://shitmyreviewerssay.tumblr.com.

  7. https://publons.com/wos-op/.

  8. https://huggingface.co/gsarti/scibert-nli.

  9. https://huggingface.co/GroNLP/hateBERT.

  10. https://huggingface.co/unitary/toxic-bert.


Acknowledgements

The third author, Asif Ekbal, has received the Visvesvaraya Young Faculty Award and gratefully acknowledges the support of the Government of India and the Ministry of Electronics and Information Technology.

Author information


Contributions

PKB: Conceptualization, data curation, investigation, methodology, experiments, writing - original draft and review & editing. MN: Data curation, data cleaning. MA: Supervision, reviewing & editing. AE: Supervision, reviewing & editing.

Corresponding authors

Correspondence to Prabhat Kumar Bharti or Asif Ekbal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest concerning the publication of this article.

Ethical approval

We do not intend to attack specific individuals. Our purpose is to draw attention to the negative cultural zeitgeist in peer review, hoping that a discussion will stimulate improvement.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Bharti, P.K., Navlakha, M., Agarwal, M. et al. PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews. Lang Resources & Evaluation (2023). https://doi.org/10.1007/s10579-023-09662-3


  • DOI: https://doi.org/10.1007/s10579-023-09662-3
