Sharing Scientific Data: Moving Toward “Open Data”

  • Pali U. K. De Silva
  • Candace K. Vance
Part of the Fascinating Life Sciences book series (FLS)


As the advantages of data sharing are increasingly recognized, the issues surrounding sharing and accessibility of scientific data are being widely discussed. Meanwhile, an “open data” revolution is taking shape in parallel to the open access movement for scientific publishing. This chapter discusses the developments and the contributions of a variety of stakeholders in shaping the dialog concerning scientific data sharing. Data sharing issues and challenges are unique to each scientific discipline; to highlight these dissimilarities, two distinctly different disciplines, ecology and genomics, are examined. In addition, challenges associated with openly sharing genomic data are discussed in detail.


Open data; Scientific data sharing; Genetic data; Ecological data; Data sharing initiatives; Data publication; Data citation



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Murray State University, Murray, USA
