Introduction

Articles published in printed academic journals are the backbone of scientific communication. With the emergence of the internet, a second channel has become increasingly important: so-called preprints, working papers (WP) or other kinds of preliminary articles, mostly deposited as PDF files. These kinds of papers have in almost all cases not been subject to a formal peer review process. There are several reasons for this type of publishing. Before submitting to a journal, authors want to gather input from other scientists. It can also be a way of signaling priority in the scientific competition to be first, especially when potentially many scientists work on the same or a similar topic (Footnote 1). Preprints are published in a WP series, on a preprint server or in a private repository. WPs are quite common in the social sciences, especially in economics. One of the first WP series is the Cowles Foundation Discussion Papers, founded in 1955. Three well-known preprint repositories are arXiv, SSRN (Social Science Research Network) and RePEc (Research Papers in Economics), each of which lists a huge number of articles. As of December 2019, more than 1.6 million articles were listed on arXiv, about 770,000 on SSRN and over 800,000 working papers on RePEc. Li et al. (2015) provide an overview of various repositories and their role in scholarly communication.

A natural question is whether an article published as a preprint or WP is eventually published in a scientific journal, i.e., passes the peer review process. Only a few articles have dealt with this issue. Brown and Zimmermann (2017) check which articles from the Journal of Population Economics (275 in total) were previously published as a WP. McCabe and Snyder (2015) investigated for about 900 economics articles whether they had been deposited in a publicly available archive such as RePEc or on a university web page. Larivière et al. (2014) analyzed whether almost 750,000 papers posted on arXiv were eventually published in a journal listed in Web of Science. Abdill and Blekhman (2019) did the same for 37,600 articles from bioRxiv. Tsunoda et al. (2019) also analyzed bioRxiv, using a smaller sample of 17,800 articles. Finally, Anderson (2020) shows that over a period of five years 30% of preprints uploaded to bioRxiv never get published in a journal.

We contribute to this literature using data covering more than 28,000 articles published in four major economics working paper series. This is the largest data set of this kind in the social sciences so far. Our analysis is based on RePEc data. This article investigates where all these WPs have gone: into a journal or a book chapter. Furthermore, evidence is provided that a growing number of articles are published in several working paper series.

Data

In the socio-economic sciences, RePEc has become an essential source for the dissemination of knowledge and the ranking of individual authors as well as academic institutions. The RePEc network is growing continuously: as of December 2019 it comprised 2.8 million pieces of research from 3200 journals and 5000 working paper series. Additionally, more than 55,000 authors and 14,000 institutions from 101 countries are listed on the website (Footnote 2). Our study is based on articles from four major working paper series in economics:

  • NBER (National Bureau of Economic Research) Working Papers

  • CEPR (Centre for Economic Policy Research) Discussion Papers

  • IZA (Forschungsinstitut zur Zukunft der Arbeit) Discussion Papers

  • CESifo (Center for Economic Studies) Working Papers.

This choice is driven by the number of papers published per year, reputation and influence. These four are among the most cited (Footnote 3) and most downloaded (Footnote 4) working paper series on RePEc. The four series are published by networks of economists. Table 1 states the approximate number of network members, retrieved from their websites as of December 2019. The largest network is NBER with about 1600 members and the smallest is CEPR with 1300. Only members of the corresponding network may submit papers to a series, and joining a network is possible only by invitation. Once an author is a member of a network, they are free to submit any working paper. With this procedure the networks want to ensure a certain level of quality of the submitted papers, as invitations are only issued to established or promising researchers.

Many working paper series are associated with an organisation, faculty or university, often indicated by the name of the series. Thus, at least the submitter of a working paper should be affiliated with the issuing organisation. There are only a few series to which anyone can submit a paper. A prominent example is the Munich Personal RePEc Archive (https://mpra.ub.uni-muenchen.de/).

We extracted meta-information for these four major working paper series in summer 2019 for the time period 2000–2012. WPs published later than 2012 are not considered, as there can be a substantial delay before the final journal publication shows up in the data. This can be attributed to three potential reasons. First, the economics publishing process has slowed down (Ellison 2002). Second, once a paper has been accepted by a journal, it sometimes takes a long time before it finally appears in an issue. Third, the provision of the metadata in RePEc is often delayed.

In total we collected information for 28,877 WPs (Table 1). The largest share of WPs (\(\sim\) 11,000) was published by the NBER network. Although the CESifo network is older (founded in 1991), it has published fewer WPs (3820) than the younger IZA network (established 1998; 7018 WPs). Figure 1 plots the annual number of WPs. In every year, NBER publishes the most WPs, with more than 1000 in 2011, and CESifo the fewest, with about 400 in 2012. There seems to be a general upward trend in the number of WPs published each year, especially in the early 2000s, with CEPR being an exception. CEPR seems to have reached a plateau of about 500 WPs per year (Footnote 5).

RePEc automatically links papers with corresponding titles. On the website it is stated “As long as two works have the same titles and are both listed in an author’s profile, we will link them automatically. Just give us some time. However, if the titles differ, you can create the links yourself by using this online form”. Therefore, if RePEc is unable to link an article, a registered author can create the link manually. We randomly checked the automated linkages for various working papers and found that RePEc matches not only records with exactly identical titles, but also those with similar titles.
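The kind of title-based linking described above can be illustrated with a minimal sketch. It is not RePEc's actual algorithm, which is not documented here. The function names, the normalization rules and the similarity threshold of 0.9 are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase the title, replace punctuation by spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def titles_match(wp_title: str, article_title: str, threshold: float = 0.9) -> bool:
    """Return True if the two normalized titles are identical or very similar.

    The threshold is a hypothetical tuning parameter, not a value used by RePEc.
    """
    a = normalize_title(wp_title)
    b = normalize_title(article_title)
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

A matcher of this type links titles that differ only in capitalization or punctuation, but it fails once the title is substantially rewritten during the revision process.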

We manually inspected the RePEc page of every WP and checked whether the paper was published either as a journal article or as a book chapter. If it was published in a journal, the corresponding metadata was collected. Additionally, we checked whether a paper has also been published in other working paper series.

Table 1 Summary statistics for the working paper series

Fig. 1 Quantitative development of working papers over time

Results

Overlap of publications in working paper series

In how many series has a WP been published? The answer is shown in Fig. 2. The left panel plots the distribution across WP series (Footnote 6). The majority of WPs are released exclusively in one series. This is especially pronounced for the NBER series, with about 60%. For the CEPR series, the shares of articles published in one series and in two series are equal (31% each). The right panel of Fig. 2 shows the average number of series an article has been issued in. There is a clear upward tendency, i.e., papers are published in more and more series simultaneously. Papers released in the NBER series have the lowest rate of additional publications in other working paper series and papers published in CEPR have the highest rate. Table 2 shows the overlap between the series. For example, 17% of all CEPR discussion papers have been simultaneously published as an NBER WP, which is the highest overlap among all series. From the NBER perspective, this corresponds to 10.5% of its WPs. The share is smaller because the NBER series contains more WPs than the CEPR series (see Table 1). The smallest overlap (\(<3\)%) is between the CESifo and the NBER working paper series.
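The asymmetry of the overlap shares (the same set of shared papers is a larger percentage of the smaller series) can be made concrete with a small sketch. The paper identifiers and series sizes below are toy values invented for illustration and are not taken from the data set.

```python
def overlap_share(series_a: set, series_b: set) -> float:
    """Percentage of the papers in series_a that also appear in series_b."""
    return 100 * len(series_a & series_b) / len(series_a)


# Toy data (hypothetical): a small series with 10 papers and a large one with 20,
# sharing the five papers numbered 6 to 10.
small_series = set(range(1, 11))
large_series = set(range(6, 26))

share_of_small = overlap_share(small_series, large_series)  # 5 of 10 papers
share_of_large = overlap_share(large_series, small_series)  # 5 of 20 papers
```

In this toy case the five shared papers make up 50% of the small series but only 25% of the large one, which mirrors the difference between the CEPR and NBER perspectives.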

Fig. 2 Number of publications as working paper

Table 2 Overlap between the four WP series in percent

Where have the working papers gone?

Basic estimates

We start answering the question of where the working papers have gone by using the links between WPs and articles provided on the RePEc website. Our results are stated in Table 3. The share of WPs that have been published as a book chapter ranges between 0.8% for CESifo and 7.5% for NBER papers. Overall, the share is about 4%. With respect to journal publications we find that almost 50% of all WPs in our sample have been issued in a scientific journal. This number varies only marginally across WP series. For the IZA series we detect a slightly lower value of about 47%. The WPs were published in 622 different journals. The largest number of WPs (862) were finally published in the American Economic Review, followed by the Journal of Public Economics (456) and the Review of Economics and Statistics. Thirty-eight journals each published more than 100 of the WPs. In contrast, for 189 journals we found only one corresponding WP.

At this stage, we have no evidence on whether or where the remaining 46.5% of WPs have finally been published. Figure 3 plots the development over time. It shows that the publication rate in journals is quite stable, with no obvious trend.

How do our results compare to the existing literature? In Brown and Zimmermann (2017), 55% of the articles in the Journal of Population Economics had previously been published as a WP. McCabe and Snyder (2015) found the same share for articles published in 2005. For articles published before this date the share is somewhat lower. Larivière et al. (2014) found that 64% of articles posted on arXiv were finally issued in a journal listed in Web of Science. In Tsunoda et al. (2019) this share amounts to about 40%. To summarize, our results are of a similar magnitude.

Table 3 Working papers published in journals or as chapters across series

Fig. 3 Publication of WP in journals over time

Additional evidence based on a random sample

How robust or reliable are our results? Although our results are comparable to others in the literature, the share seems rather moderate. We already mentioned that RePEc automatically matches working paper titles with journal article titles. Furthermore, authors can link working papers to journal publications via their author account in RePEc. However, there can be several reasons why our figures underestimate the true value. First, and in our opinion most importantly, the title might have been changed during the revision process. Second, the paper may have been published in a journal or book that is not listed in RePEc. This might apply especially to journals outside economics or statistics. In order to investigate these issues, we drew a random sample from the non-matched articles, specifically 100 per WP series. For each of these 400 articles we searched the authors' webpages and CVs (if available), looking for papers with similar titles or themes. Moreover, by reading and comparing abstracts, we identified (possible) links between a working paper and a corresponding journal article.
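The sampling step (100 non-matched papers per series, 400 in total) could be reproduced along the following lines. This is a minimal sketch: the data structure, the identifiers and the seed are hypothetical, and the source does not state how the draw was actually implemented.

```python
import random


def draw_stratified_sample(unmatched_by_series, per_series=100, seed=2019):
    """Draw `per_series` unmatched papers at random, without replacement, from every series."""
    rng = random.Random(seed)  # fixed seed so that the draw can be reproduced
    return {
        series: rng.sample(paper_ids, per_series)
        for series, paper_ids in unmatched_by_series.items()
    }


# Hypothetical identifiers of non-matched papers, 500 per series, for illustration only.
unmatched = {
    name: [f"{name}-{i}" for i in range(500)]
    for name in ("NBER", "CEPR", "IZA", "CESifo")
}
sample = draw_stratified_sample(unmatched)
total = sum(len(ids) for ids in sample.values())  # 4 series x 100 papers
```

Drawing the same number of papers from each series keeps the four series comparable, at the cost of over-representing the smaller series relative to their share of all non-matched WPs.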

Table 4 shows the results of these efforts. In our random sample we were able to match 36% of the working papers to a journal article. The share is similar across series. Looking at the journals, we find that almost all articles were published in (economics) journals that are also listed in RePEc and that many article titles changed (substantially). Therefore, RePEc was not able to match them automatically. We also find that 9% were published as a book chapter. Here the NBER working paper series stands out, because 21% of the 100 investigated papers were published in a book. In the case of book chapters, our investigations show that the corresponding books are often not listed in RePEc. For the remaining 55% of articles in our random sample, we found no evidence that they were finally published in a journal or as a book chapter.

Combining the results from the RePEc matching (Table 3) and the random sample with individual matching (Table 4), we provide estimates for both journal publications and book chapters. For journal publications we arrive at 66.46% (49.56% + 36% \(\cdot\) 46.53%), and for book chapters at 7.83% (3.88% + 9% \(\cdot\) 46.53%). It follows that for approximately 25% of the WPs we find no record or evidence of an additional publication outlet besides the original WP.
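The combination rule used above (share matched by RePEc plus sample hit rate times the unmatched share) can be written out as a short sketch. The function name is hypothetical. Note that plugging in the rounded percentages as printed in the text gives roughly 66.3% and 8.1%, slightly different from the reported 66.46% and 7.83%. One possible explanation, which is an assumption and not stated in the source, is that the reported values rely on unrounded sample shares.

```python
def combined_share(matched_pct: float, unmatched_pct: float, sample_hit_rate: float) -> float:
    """Share matched by RePEc plus the estimated share hidden among unmatched WPs."""
    return matched_pct + sample_hit_rate * unmatched_pct


# Inputs as printed in the text (rounded): 46.53% of WPs are unmatched in RePEc.
journal_estimate = combined_share(49.56, 46.53, 0.36)  # journals: 36% hit rate in the sample
book_estimate = combined_share(3.88, 46.53, 0.09)      # book chapters: 9% hit rate in the sample
no_record = 100 - journal_estimate - book_estimate     # WPs without any known outlet
```

With these inputs, roughly a quarter of the WPs remain without a recorded publication outlet, in line with the approximate figure given in the text.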

Our new estimates are higher than the others in the literature mentioned in the previous subsection. Changes in the title seem to be a serious issue in the matching process. This also shows that RePEc is not able to link all working papers to their published versions in a journal. However, this would be difficult for any matching algorithm when the title changes substantially.

Table 4 Further matching evidence based on a random sample

Conclusion

This article analyzed whether, when and where a working paper has been published, either in a journal or as a book chapter. Based on RePEc matching and a random sample, we found that approximately 66.5% of about 28,000 investigated working papers were released in a journal (Footnote 7). Additionally, about 8% were issued as a book chapter. We have no record of what happened to the remaining 25.5% of WPs. Some caveats of our analysis should be mentioned:

  1. The title of a WP could have changed, so that the paper was finally published under a different name in a journal.

  2. A WP might be published in a journal not covered by RePEc.

  3. A WP has not been connected to the journal article in RePEc.

  4. A WP has been completely revised, with respect to both the title and the content, and is therefore impossible to match.

We are not able to quantify how large (or small) these effects are in our sample. However, a certain share of WPs seems to have been abandoned, i.e., they have not passed the formal peer review of a journal.