Comparing published scientific journal articles to their pre-print versions

Abstract

Academic publishers claim that they add value to scholarly communications by coordinating reviews and by contributing to and enhancing text during publication. These contributions come at a considerable cost: US academic libraries paid $1.7 billion for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price inflation. We have investigated the publishers’ value proposition by conducting a comparative study of pre-print papers from two distinct science, technology, and medicine corpora and their final published counterparts. This comparison had two working assumptions: (1) if the publishers’ argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and (2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added value of commercial publishers and therefore should influence libraries’ economic decisions regarding access to scholarly publications.
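The abstract does not say which "standard similarity measures" were applied, so the following is only a minimal sketch of how such a comparison could be set up. It assumes three common choices: Jaccard and Sørensen-Dice similarity over word sets, and a word-level edit distance normalized by the longer text. The function names, the tokenizer, and the two sample sentences are illustrative assumptions, not the authors' pipeline or data.

```python
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of two token collections: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def sorensen_dice(a, b):
    """Sorensen-Dice similarity: 2 * |A & B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Map edit distance to [0, 1]; 1.0 means the sequences are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical pre-print and published sentences (not taken from the study).
preprint = "We study the the growth of open access journals."
published = "We study the growth of open-access journals."

p, q = tokens(preprint), tokens(published)
print("Jaccard:        ", jaccard(p, q))
print("Sorensen-Dice:  ", sorensen_dice(p, q))
print("Edit similarity:", round(edit_similarity(p, q), 3))
```

In this toy pair the set-based measures ignore the duplicated word, while the sequence-based measure registers it as one deletion. That difference is one reason to report several measures side by side rather than relying on a single score.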

Notes

  1. https://confluence.cornell.edu/display/arxivpub/arXiv+Public+Wiki
  2. https://arxiv.org/help/robots
  3. http://export.arxiv.org/oai2?verb=Identify
  4. https://github.com/CrossRef/rest-api-doc/blob/master/rest_api.md
  5. https://arxiv.org/help/bulk_data_s3
  6. https://arxiv.org/help/stats/2016_by_area/index
  7. https://github.com/kermitt2/grobid
  8. http://aye.comp.nus.edu.sg/parsCit/
  9. https://github.com/ropensci/fulltext
  10. https://pypi.python.org/pypi/python-Levenshtein/0.11.2
  11. https://pypi.python.org/pypi/Distance/
  12. http://scikit-learn.org/stable/

Corresponding author

Correspondence to Martin Klein.

Cite this article

Klein, M., Broadwell, P., Farb, S.E. et al. Comparing published scientific journal articles to their pre-print versions. Int J Digit Libr 20, 335–350 (2019). https://doi.org/10.1007/s00799-018-0234-1

Keywords

  • Open access
  • Pre-print
  • Scholarly publishing
  • Text similarity