Abstract
Two major movements are actively at work to change the way research is done, shared, and reproduced. The first is the reproducible research (RR) approach, which has never been easier to implement given the current availability of tools and DIY manuals. The second is the FAIR (Findable, Accessible, Interoperable, and Reusable) approach, which aims to support the availability and sharing of research materials. We show here that despite the efforts made by researchers to improve the reproducibility of their research, the initial goals of RR remain mostly unmet. There is great demand, both within the scientific community and from the general public, for greater transparency and for trusted published results. As a scientific community, we need to reorganize the diffusion of all materials used in a study and to rethink the publication process. Researchers and journal reviewers should be able to easily use research materials for reproducibility, replicability, or reusability purposes, or for the exploration of new research paths. Here we present how the research process, from data collection to paper publication, could be reorganized, and we introduce some already available tools and initiatives. We show that even in cases in which data are confidential, journals and institutions can organize and promote “FAIR-like RR” solutions in which not only the published paper but also all related materials can be used by any researcher.
Notes
- 1.
We consider here that the research process starts once the data are collected and in possession of the researcher. We do not address here the issue of reproducibility for data collection in experimental economics or field experiments (Bowers et al. 2017).
- 2.
We will not discuss here the question of the precise meaning of “same results”.
- 3.
At the European level, one should mention OpenAIRE and in France the “Plan national pour la science ouverte” (https://www.ouvrirlascience.fr/).
- 4.
See also Table 2 in Appendix 2, for a synthesis of the cases presented throughout the paper.
- 5.
- 6.
In these figures, for clarity, we do not illustrate the fact that researchers may also share their materials themselves.
- 7.
In 2003, H. Pesaran announced the creation of a new section of the Journal of Applied Econometrics dedicated to the replication of published empirical papers (Pesaran 2003). Since then, some journals have followed suit, leading to an increase in the number of replication papers in economics (Mueller-Langer et al. 2019). The site PubPeer (https://pubpeer.com/) also allows users to discuss and review published scientific research.
- 8.
Some useful resources facilitate the process (see https://social-science-data-editors.github.io/guidance/Verification_guidance.html). The Transparency and Openness Promotion (TOP) guidelines also propose varying levels of replication policies for journals (Nosek et al. 2015).
- 9.
Jacoby et al. (2017) analyzed the AJPS verification policy and reported an average of 8 person-hours per manuscript to curate and replicate the analyses. The publication workflow, which involves more rounds and resubmissions, is also much longer.
- 10.
A complete list of solutions is detailed in the Registry of Research Data Repositories (http://re3data.org), a service of DataCite. In addition, CoreTrustSeal provides certification to repositories and lists the certified ones.
- 11.
For datasets, the FAIR interoperability principle suggests the use of open formats such as CSV files instead of proprietary formats (e.g., .xls). For code, open-source software should be preferred to avoid exclusive access (Vilhuber 2019). The metadata should also follow standards (e.g., Dublin Core or DDI). References and links to related data should also be provided (Jones and Grootveld 2017).
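As a minimal sketch of these interoperability recommendations, a dataset can be exported to an open CSV format together with a metadata sidecar using Dublin Core element names. All filenames, field values, and the related-data link below are hypothetical, purely for illustration.

```python
import csv
import json

# Hypothetical analysis dataset, exported to an open CSV format
rows = [
    {"region": "A", "year": 2019, "output": 12.4},
    {"region": "B", "year": 2019, "output": 9.7},
]
with open("dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["region", "year", "output"])
    writer.writeheader()
    writer.writerows(rows)

# Minimal metadata sidecar using Dublin Core element names,
# including a link to related data as recommended above
metadata = {
    "dc:title": "Regional output dataset (illustrative)",
    "dc:creator": "Jane Doe",
    "dc:date": "2019",
    "dc:format": "text/csv",
    "dc:relation": "https://doi.org/10.xxxx/related-data",
}
with open("dataset-metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

Any researcher (or journal reviewer) can then open both files with free tools, with no dependence on proprietary software.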
- 12.
The DataCite project (Brase 2009) is a popular resource for locating and precisely identifying data through a unique DOI.
- 13.
- 14.
In France, the CASD (https://www.casd.eu/) is a single-access portal to many public data providers (INSEE, ministries, etc.). Researchers are not allowed to copy the materials locally onto their machines, and only certain types of outputs can be extracted.
- 15.
The code may also contain some confidential elements. In particular, the code used for the initial data curation may contain, e.g., brand or city names and addresses.
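A common way to share curation code without its confidential elements is to replace identifying names (brands, cities, addresses) with stable pseudonymous codes before release. The sketch below is a hypothetical illustration, not the authors' method; the record values, salt, and function name are invented.

```python
import hashlib

# Hypothetical raw records containing identifying city names
raw_records = [
    {"city": "Toulouse", "sales": 125},
    {"city": "Lyon", "sales": 98},
]

def pseudonymize(name: str, salt: str = "project-secret-salt") -> str:
    """Replace an identifying name with a stable, non-reversible code.

    The salt is kept out of the shared materials, so the mapping
    cannot be inverted by readers of the published code.
    """
    digest = hashlib.sha256((salt + name).encode()).hexdigest()
    return "city_" + digest[:8]

# Shareable version of the data: confidential names removed
shared_records = [
    {"city_id": pseudonymize(r["city"]), "sales": r["sales"]}
    for r in raw_records
]
```

Because the hash is salted and truncated, the shared records support reproduction of the analysis while the original names stay with the data provider.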
- 16.
Some data providers, in particular NSOs, already perform RR on their confidential data, controlling output files and code, to check for confidentiality restrictions (Lagoze and Vilhuber 2017).
- 17.
Alter and Gonzalez (2018) suggested that, to “protect” researchers who want to use their data first (before sharing them), journals can offer an “embargo” period.
- 18.
A recent lawsuit involving the popular training program CrossFit showed that a paper by Smith et al. (2013) erroneously reported an increased risk of injury for its users. Although the paper was later retracted, the impact on the researchers’ careers was severe (for details, see https://retractionwatch.com/).
- 19.
The European Research Council (ERC) recommends “to all its funded researchers that they follow best practice by retaining files of all the research data they have used during the course of their work and that they be prepared to share this data with other researchers”.
References
Akers, K. G., & Doty, J. (2013). Disciplinary differences in faculty research data management practices and perspectives. International Journal of Digital Curation, 8(2), 5–26.
Alter, G., & Gonzalez, R. (2018). Responsible practices for data sharing. American Psychologist, 73(2), 146–156.
Baiocchi, G. (2007). Reproducible research in computational economics: Guidelines, integrated approaches, and open source software. Computational Economics, 30(1), 19–40.
Baker, M. (2016). Why scientists must share their research code. Nature News.
Barba, L. A. (2018). Terminologies for reproducible research. arXiv preprint arXiv:1802.03311.
Benureau, F. C. Y., & Rougier, N. P. (2018). Re-run, repeat, reproduce, reuse, replicate: Transforming code into scientific contributions. Frontiers in Neuroinformatics, 11, 69.
Boker, S. M., Brick, T. R., Pritikin, J. N., Wang, Y., von Oertzen, T., Brown, D., et al. (2015). Maintained individual data distributed likelihood estimation (middle). Multivariate Behavioral Research, 50(6), 706–720.
Bowers, J., Higgins, N., Karlan, D., Tulman, S., & Zinman, J. (2017). Challenges to replication and iteration in field experiments: Evidence from two direct mail shots. American Economic Review, 107(5), 462–65.
Brase, J. (2009). DataCite - A global registration agency for research data. In 2009 4th International Conference on Cooperation and Promotion of Information Resources in Science and Technology (pp. 257–261).
Chang, A. C., & Li, P. (2017). A preanalysis plan to replicate sixty economics research papers that worked half of the time. American Economic Review, 107(5), 60–64.
Christensen, G., & Miguel, E. (2018). Transparency, reproducibility, and the credibility of economics research. Journal of Economic Literature, 56(3), 920–80.
Christensen, G., Freese, J., & Miguel, E. (2019). Transparent and reproducible social science research: How to do open science. Berkeley: University of California Press.
Christian, T.-M., Lafferty-Hess, S., Jacoby, W., & Carsey, T. (2018). Operationalizing the replication standard: A case study of the data curation and verification workflow for scholarly journals. International Journal of Digital Curation, 13(1), 114–124.
Claerbout, J. (1990). Active documents and reproducible results. SEP, 67, 139–144.
Crabtree, J. D. (2011). Odum institute user study: Exploring the applicability of the dataverse network.
Crosas, M., King, G., Honaker, J., & Sweeney, L. (2015). Automating open science for big data. ANNALS of the American Academy of Political and Social Science, 659(1), 260–273.
de Leeuw, J. (2001). Reproducible research. The bottom line.
Dewald, W. G., Thursby, J. G., & Anderson, R. G. (1988). Replication in empirical economics: The journal of money, credit and banking project: Reply. American Economic Review, 78(5), 1162–1163.
Di Cosmo, R., & Zacchiroli, S. (2017). Software heritage: Why and how to preserve software source code.
Dunn, C. S., & Austin, E. W. (1998). Protecting confidentiality in archival data resources. IASSIST Quarterly, 22(2), 16–16.
Duvendack, M., Palmer-Jones, R., & Reed, W. R. (2017). What is meant by “replication” and why does it encounter resistance in economics? American Economic Review, 107(5), 46–51.
Dwork, C., Naor, M., Reingold, O., Rothblum, G. N., & Vadhan, S. (2009). On the complexity of differentially private data release: Efficient algorithms and hardness results. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (pp. 381–390).
Fenner, M., Crosas, M., Grethe, J., Kennedy, D., Hermjakob, H., Rocca-Serra, P., et al. (2017). A data citation roadmap for scholarly data repositories. bioRxiv.
Fuentes, M. (2016). Reproducible research in JASA. AMSTAT News: The Membership Magazine of the American Statistical Association, 17.
Gentleman, R., & Temple Lang, D. (2007). Statistical analyses and reproducible research. Journal of Computational and Graphical Statistics, 16(1), 1–23.
Gentzkow, M., & Shapiro, J. (2013). Nuts and bolts: Computing with large data. In Summer Institute 2013 Econometric Methods for High-Dimensional Data.
Van Gorp, P., & Mazanek, S. (2011). SHARE: A web portal for creating and sharing executable research papers. Procedia Computer Science, 4, 589–597.
Gouëzel, S., & Shchur, V. (2019). A corrected quantitative version of the Morse lemma. Journal of Functional Analysis, 277(4), 1258–1268.
Hurlin, C., Pérignon, C., & Stodden, V. (2014). RunMyCode.org: A novel dissemination and collaboration platform for executing published computational results. Open Science Framework.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Jacoby, W. G., Lafferty-Hess, S., & Christian, T.-M. (2017). Should journals be responsible for reproducibility?
Jones, S., & Grootveld, M. (2017). How FAIR are your data?
King, G. (2007). An introduction to the dataverse network as an infrastructure for data sharing. Sociological Methods & Research, 36(2), 173–199.
Knuth, D. E. (1984). Literate programming. The Computer Journal, 27, 97–111.
Knuth, D. E. (1992). Literate programming. Center for the Study of Language and Information.
Lagoze, C., & Vilhuber, L. (2017). O privacy, where art thou? Making confidential data part of reproducible research. CHANCE, 30(3), 68–72.
Leeper, T. J. (2014). Archiving reproducible research with R and dataverse. R Journal, 6(1).
LeVeque, R. J. (2009). Python tools for reproducible research on hyperbolic problems. Computing in Science and Engineering (CiSE), 19–27. Special issue on Reproducible Research.
McCullough, B. D. (2009). Open access economics journals and the market for reproducible economic research. Economic Analysis and Policy, 39(1), 117–126.
Miyakawa, T. (2020). No raw data, no science: Another possible source of the reproducibility crisis.
Mueller-Langer, F., Fecher, B., Harhoff, D., & Wagner, G. G. (2019). Replication studies in economics–How many and which papers are chosen for replication, and why? Research Policy, 48(1), 62–83.
Nature, Editor. (2013). Reducing our irreproducibility. Nature, 496, 398.
Nosek, B. A., et al. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.
Orozco, V., Bontemps, C., Maigne, E., Piguet, V., Hofstetter, A., Lacroix, A., et al. (2020). How to make a pie: Reproducible research for empirical economics & econometrics. Journal of Economic Surveys, 34(5), 1134–1169.
Pérignon, C., Gadouche, K., Hurlin, C., Silberman, R., & Debonnel, E. (2019). Certify reproducibility with confidential data. Science, 365(6449), 127–128.
Pesaran, H. (2003). Introducing a replication section. Journal of Applied Econometrics, 18(1), 111.
Reinhart, C. M., & Rogoff, K. S. (2010). Growth in a time of debt. American Economic Review, 100(2), 573–78.
Rowhani-Farid, A., & Barnett, A. G. (2018). Badges for sharing data and code at biostatistics: An observational study [version 2; peer review: 2 approved]. F1000Research, 7(90).
Sansone, S.-A., McQuilton, P., Rocca-Serra, P., Gonzalez-Beltran, A., Izzo, M., Lister, A. L., et al. (2019). FAIRsharing as a community approach to standards, repositories and policies. Nature Biotechnology, 37(4), 358–367.
Science, Editors. (2011). Challenges and opportunities. Science, 331(6018), 692–693.
Smith, M. M., Sommer, A. J., Starkoff, B. E., & Devor, S. T. (2013). Crossfit-based high-intensity power training improves maximal aerobic fitness and body composition. The Journal of Strength and Conditioning Research, 27(11), 3159–3172.
Spencer, H. (1854). The art of education.
Sweeney, L, Crosas, M., & Bar-Sinai, M. (2015). Sharing sensitive data with confidence: The datatags system. Technology Science.
Vilhuber, L. (2019). Report by the AEA data editor. AEA Papers and Proceedings, 109, 718–729.
Vlaeminck, S., & Herrmann, L.-K. (2015). Data policies and data archives: A new paradigm for academic publishing in economic sciences? In B. Schmidt, & M. Dobreva (Eds.), New avenues for electronic publishing in the age of infinite collections and citizen science (pp. 145–155). Amsterdam: IOS Press.
Wilkinson, M., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., et al. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018.
Acknowledgements
Christine Thomas-Agnan is our former teacher and a great colleague who has been available for helpful and interesting discussions for many years. We were very happy to write this paper as a token of our gratitude. The authors also wish to thank the participants of the Banco de Portugal Reproducible Research Workshop in Porto (2019) for the stimulating discussions that are at the origin of this paper. We are grateful to Virginie Piguet as well as the two anonymous referees for their careful reading and inspiring comments and suggestions.
Appendix 1: Synthesis of All Situations Illustrated on Figs. 2–7
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Bontemps, C., Orozco, V. (2021). Toward a FAIR Reproducible Research. In: Daouia, A., Ruiz-Gazen, A. (eds) Advances in Contemporary Statistics and Econometrics. Springer, Cham. https://doi.org/10.1007/978-3-030-73249-3_30
DOI: https://doi.org/10.1007/978-3-030-73249-3_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73248-6
Online ISBN: 978-3-030-73249-3
eBook Packages: Mathematics and Statistics (R0)