Toward a FAIR Reproducible Research

Chapter in Advances in Contemporary Statistics and Econometrics

Abstract

Two major movements are actively at work to change the way research is done, shared, and reproduced. The first is the reproducible research (RR) approach, which has never been easier to implement given the current availability of tools and DIY manuals. The second is the FAIR (Findable, Accessible, Interoperable, and Reusable) approach, which aims to support the availability and sharing of research materials. We show here that despite the efforts made by researchers to improve the reproducibility of their research, the initial goals of RR remain mostly unmet. There is great demand, both within the scientific community and from the general public, for greater transparency and for trusted published results. As a scientific community, we need to reorganize the diffusion of all materials used in a study and to rethink the publication process. Researchers and journal reviewers should be able to easily use research materials for reproducibility, replicability, or reusability purposes or for exploration of new research paths. Here we present how the research process, from data collection to paper publication, could be reorganized and introduce some already available tools and initiatives. We show that even in cases in which data are confidential, journals and institutions can organize and promote “FAIR-like RR” solutions where not only the published paper but also all related materials can be used by any researcher.

Notes

  1. We consider here that the research process starts once the data are collected and in the possession of the researcher. We do not address the issue of reproducibility for data collection in experimental economics or field experiments (Bowers et al. 2017).

  2. We will not discuss here the question of the precise meaning of “same results”.

  3. At the European level, one should mention OpenAIRE, and in France the “Plan national pour la science ouverte” (https://www.ouvrirlascience.fr/).

  4. See also Table 2 in Appendix 2 for a synthesis of the cases presented throughout the paper.

  5. Other issues that we do not address directly here include the digital preservation of research data (Akers and Doty 2013) and the preservation of software (Di Cosmo and Zacchiroli 2017).

  6. In these figures, for clarity, we do not illustrate the fact that researchers may share their materials themselves.

  7. In 2003, H. Pesaran announced the creation of a new section of the Journal of Applied Econometrics dedicated to the replication of published empirical papers (Pesaran 2003). Since then, some journals have followed this idea, leading to an increase in the number of replication papers in economics (Mueller-Langer et al. 2019). The site PubPeer (https://pubpeer.com/) also allows users to discuss and review scientific research.

  8. Some useful resources facilitate the process (see https://social-science-data-editors.github.io/guidance/Verification_guidance.html). The Transparency and Openness Promotion (TOP) guidelines also propose varying levels of replication policies for journals (Nosek et al. 2015).

  9. Jacoby et al. (2017) analyzed the AJPS verification policy and reported an average of 8 person-hours per manuscript to curate and replicate the analyses. The publication workflow, involving more rounds and resubmissions, is also much longer.

  10. A complete list of solutions is available in the Registry of Research Data Repositories (http://re3data.org), a service of DataCite. In addition, CoreTrustSeal provides certification to repositories and lists the certified ones.

  11. For datasets, the FAIR interoperability principle suggests the use of open formats such as CSV instead of proprietary formats (e.g., .xls). For code, open-source software should be preferred to avoid exclusive access (Vilhuber 2019). Metadata should also follow standards (Dublin Core or DDI), and references and links to related data should be provided (Jones and Grootveld 2017).
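    As an illustrative sketch of this recommendation (the dataset, values, author, DOI, and URLs are all hypothetical placeholders), the Python standard library is enough to export data in an open format (CSV) together with a Dublin Core-style metadata sidecar file:

    ```python
    import csv
    import json

    # Hypothetical toy dataset; in practice this would come from the study.
    rows = [
        {"region": "Occitanie", "year": 2019, "output": 12.4},
        {"region": "Bretagne", "year": 2019, "output": 9.7},
    ]

    # 1. Store the data in an open, non-proprietary format (CSV, not .xls).
    with open("dataset.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["region", "year", "output"])
        writer.writeheader()
        writer.writerows(rows)

    # 2. Describe the data with machine-readable metadata using Dublin Core
    #    element names; the identifier and relation values are placeholders
    #    (a real DOI would be minted via a service such as DataCite).
    metadata = {
        "dc:title": "Regional output, 2019 (illustrative)",
        "dc:creator": "Jane Doe",
        "dc:date": "2019",
        "dc:format": "text/csv",
        "dc:identifier": "doi:10.xxxx/xxxxx",
        "dc:relation": "https://example.org/related-dataset",
    }
    with open("dataset-metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)
    ```

    Because both files are plain text, any researcher can open them without proprietary software, and the sidecar makes the dataset indexable by metadata-aware repositories.
    
    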

  12. The DataCite project (Brase 2009) is a popular resource for locating and precisely identifying data through a unique DOI.

  13. There are many sources of confidential and nonshareable data (Christensen and Miguel 2018; Lagoze and Vilhuber 2017).

  14. In France, the CASD (https://www.casd.eu/) is a single-access portal to many public data providers (INSEE, ministries, etc.). Researchers are not allowed to copy the materials locally to their machines, and only some types of output can be extracted.

  15. The code may also contain confidential elements. In particular, the code used for the initial data curation may contain, e.g., brand or city names and addresses.

  16. Some data providers, in particular NSOs, already perform RR on their confidential data, controlling output files and code to check for confidentiality restrictions (Lagoze and Vilhuber 2017).

  17. Alter and Gonzalez (2018) suggested that, to “protect” researchers who want to use their data first (before sharing), journals can propose an “embargo”.

  18. A recent lawsuit involving the popular training program CrossFit revealed that a paper by Smith et al. (2013) erroneously reported an increased risk of injuries for its users. Although the paper was later retracted, the impact on the researcher’s career was severe (for details, see https://retractionwatch.com/).

  19. The European Research Council (ERC) recommends “to all its funded researchers that they follow best practice by retaining files of all the research data they have used during the course of their work and that they be prepared to share this data with other researchers”.

References

  • Akers, K. G., & Doty, J. (2013). Disciplinary differences in faculty research data management practices and perspectives. International Journal of Digital Curation, 8(2), 5–26.

  • Alter, G., & Gonzalez, R. (2018). Responsible practices for data sharing. American Psychologist, 73(2), 146–156.

  • Baiocchi, G. (2007). Reproducible research in computational economics: Guidelines, integrated approaches, and open source software. Computational Economics, 30(1), 19–40.

  • Baker, M. (2016). Why scientists must share their research code. Nature News.

  • Barba, L. A. (2018). Terminologies for reproducible research. arXiv preprint arXiv:1802.03311.

  • Benureau, F. C. Y., & Rougier, N. P. (2018). Re-run, repeat, reproduce, reuse, replicate: Transforming code into scientific contributions. Frontiers in Neuroinformatics, 11, 69.

  • Boker, S. M., Brick, T. R., Pritikin, J. N., Wang, Y., von Oertzen, T., Brown, D., et al. (2015). Maintained individual data distributed likelihood estimation (MIDDLE). Multivariate Behavioral Research, 50(6), 706–720.

  • Bowers, J., Higgins, N., Karlan, D., Tulman, S., & Zinman, J. (2017). Challenges to replication and iteration in field experiments: Evidence from two direct mail shots. American Economic Review, 107(5), 462–465.

  • Brase, J. (2009). DataCite – A global registration agency for research data. In 2009 4th International Conference on Cooperation and Promotion of Information Resources in Science and Technology (pp. 257–261).

  • Chang, A. C., & Li, P. (2017). A preanalysis plan to replicate sixty economics research papers that worked half of the time. American Economic Review, 107(5), 60–64.

  • Christensen, G., & Miguel, E. (2018). Transparency, reproducibility, and the credibility of economics research. Journal of Economic Literature, 56(3), 920–980.

  • Christensen, G., Freese, J., & Miguel, E. (2019). Transparent and reproducible social science research: How to do open science. Berkeley: University of California Press.

  • Christian, T.-M., Lafferty-Hess, S., Jacoby, W., & Carsey, T. (2018). Operationalizing the replication standard: A case study of the data curation and verification workflow for scholarly journals. International Journal of Digital Curation, 13(1), 114–124.

  • Claerbout, J. (1990). Active documents and reproducible results. SEP, 67, 139–144.

  • Crabtree, J. D. (2011). Odum Institute user study: Exploring the applicability of the Dataverse Network.

  • Crosas, M., King, G., Honaker, J., & Sweeney, L. (2015). Automating open science for big data. ANNALS of the American Academy of Political and Social Science, 659(1), 260–273.

  • de Leeuw, J. (2001). Reproducible research: The bottom line.

  • Dewald, W. G., Thursby, J. G., & Anderson, R. G. (1988). Replication in empirical economics: The Journal of Money, Credit and Banking project: Reply. American Economic Review, 78(5), 1162–1163.

  • Di Cosmo, R., & Zacchiroli, S. (2017). Software Heritage: Why and how to preserve software source code.

  • Dunn, C. S., & Austin, E. W. (1998). Protecting confidentiality in archival data resources. IASSIST Quarterly, 22(2), 16.

  • Duvendack, M., Palmer-Jones, R., & Reed, W. R. (2017). What is meant by “replication” and why does it encounter resistance in economics? American Economic Review, 107(5), 46–51.

  • Dwork, C., Naor, M., Reingold, O., Rothblum, G. N., & Vadhan, S. (2009). On the complexity of differentially private data release: Efficient algorithms and hardness results. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (pp. 381–390).

  • Fenner, M., Crosas, M., Grethe, J., Kennedy, D., Hermjakob, H., Rocca-Serra, P., et al. (2017). A data citation roadmap for scholarly data repositories. bioRxiv.

  • Fuentes, M. (2016). Reproducible research in JASA. AMSTAT News: The Membership Magazine of the American Statistical Association, 17.

  • Gentleman, R., & Temple Lang, D. (2007). Statistical analyses and reproducible research. Journal of Computational and Graphical Statistics, 16(1), 1–23.

  • Gentzkow, M., & Shapiro, J. (2013). Nuts and bolts: Computing with large data. In Summer Institute 2013 Econometric Methods for High-Dimensional Data.

  • Van Gorp, P., & Mazanek, S. (2011). SHARE: A web portal for creating and sharing executable research papers. Procedia Computer Science, 4, 589–597.

  • Gouëzel, S., & Shchur, V. (2019). A corrected quantitative version of the Morse lemma. Journal of Functional Analysis, 277(4), 1258–1268.

  • Hurlin, C., Pérignon, C., & Stodden, V. (2014). RunMyCode.org: A novel dissemination and collaboration platform for executing published computational results. Open Science Framework.

  • Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

  • Jacoby, W. G., Lafferty-Hess, S., & Christian, T.-M. (2017). Should journals be responsible for reproducibility?

  • Jones, S., & Grootveld, M. (2017). How FAIR are your data?

  • King, G. (2007). An introduction to the Dataverse Network as an infrastructure for data sharing. Sociological Methods & Research, 36(2), 173–199.

  • Knuth, D. E. (1984). Literate programming. The Computer Journal, 27, 97–111.

  • Knuth, D. E. (1992). Literate programming. Center for the Study of Language and Information.

  • Lagoze, C., & Vilhuber, L. (2017). O privacy, where art thou? Making confidential data part of reproducible research. CHANCE, 30(3), 68–72.

  • Leeper, T. J. (2014). Archiving reproducible research with R and Dataverse. The R Journal, 6(1).

  • LeVeque, R. J. (2009). Python tools for reproducible research on hyperbolic problems. Computing in Science and Engineering (CiSE), 19–27. Special issue on reproducible research.

  • McCullough, B. D. (2009). Open access economics journals and the market for reproducible economic research. Economic Analysis and Policy, 39(1), 117–126.

  • Miyakawa, T. (2020). No raw data, no science: Another possible source of the reproducibility crisis.

  • Mueller-Langer, F., Fecher, B., Harhoff, D., & Wagner, G. G. (2019). Replication studies in economics – How many and which papers are chosen for replication, and why? Research Policy, 48(1), 62–83.

  • Nature Editorial. (2013). Reducing our irreproducibility. Nature, 496, 398.

  • Nosek, B. A., et al. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.

  • Orozco, V., Bontemps, C., Maigne, E., Piguet, V., Hofstetter, A., Lacroix, A., et al. (2020). How to make a pie: Reproducible research for empirical economics & econometrics. Journal of Economic Surveys, 34(5), 1134–1169.

  • Pérignon, C., Gadouche, K., Hurlin, C., Silberman, R., & Debonnel, E. (2019). Certify reproducibility with confidential data. Science, 365(6449), 127–128.

  • Pesaran, H. (2003). Introducing a replication section. Journal of Applied Econometrics, 18(1), 111.

  • Reinhart, C. M., & Rogoff, K. S. (2010). Growth in a time of debt. American Economic Review, 100(2), 573–578.

  • Rowhani-Farid, A., & Barnett, A. G. (2018). Badges for sharing data and code at Biostatistics: An observational study [version 2; peer review: 2 approved]. F1000Research, 7, 90.

  • Sansone, S.-A., McQuilton, P., Rocca-Serra, P., Gonzalez-Beltran, A., Izzo, M., Lister, A. L., et al. (2019). FAIRsharing as a community approach to standards, repositories and policies. Nature Biotechnology, 37(4), 358–367.

  • Science. (2011). Challenges and opportunities. Science, 331(6018), 692–693.

  • Smith, M. M., Sommer, A. J., Starkoff, B. E., & Devor, S. T. (2013). Crossfit-based high-intensity power training improves maximal aerobic fitness and body composition. The Journal of Strength and Conditioning Research, 27(11), 3159–3172.

  • Spencer, H. (1854). The art of education.

  • Sweeney, L., Crosas, M., & Bar-Sinai, M. (2015). Sharing sensitive data with confidence: The DataTags system. Technology Science.

  • Vilhuber, L. (2019). Report by the AEA data editor. AEA Papers and Proceedings, 109, 718–729.

  • Vlaeminck, S., & Herrmann, L.-K. (2015). Data policies and data archives: A new paradigm for academic publishing in economic sciences? In B. Schmidt & M. Dobreva (Eds.), New avenues for electronic publishing in the age of infinite collections and citizen science (pp. 145–155). Amsterdam: IOS Press.

  • Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., et al. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018.

Acknowledgements

Christine Thomas-Agnan is our former teacher and a great colleague who has been available for helpful and interesting discussions for several years. We were very happy to write this paper as a token of our gratitude. The authors also wish to thank the participants of the Banco de Portugal Reproducible Research Workshop in Porto (2019) for the stimulating discussions that are at the origin of this paper. We are grateful to Virginie Piguet as well as the two anonymous referees for their careful reading and inspiring comments and suggestions.

Author information

Correspondence to Valérie Orozco.

Appendix 1: Synthesis of All Situations Illustrated on Figs. 2–7

Table 2 Comparison of the various situations presented in the paper

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bontemps, C., Orozco, V. (2021). Toward a FAIR Reproducible Research. In: Daouia, A., Ruiz-Gazen, A. (eds) Advances in Contemporary Statistics and Econometrics. Springer, Cham. https://doi.org/10.1007/978-3-030-73249-3_30
