Repositories for Sharing Human Data in Stem Cell Research

  • Pilar N. Ossorio


High-throughput biology is data intensive, and stem cell research is no exception to this trend. Funders often require scientists to share the data generated by high-throughput methods, because sharing speeds scientific discovery and increases the benefit of public investments in science. When human data are involved, the benefits of sharing must be balanced against the risks of inappropriately disclosing sensitive personal information. Historically, scientists anonymized data to protect the interests of people whose data were shared. However, the recent development of computational methods for re-identifying people from anonymized data, together with empirical demonstrations of re-identification, has led many commentators to question whether anonymization still adequately protects people whose data are included in shared databases. Because of these disclosure concerns, data-sharing repositories control who can access sensitive human data and what approved users may do with those data. Stem cell researchers who create such repositories must develop governance mechanisms that prevent harm to the individuals whose data they share.
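The re-identification concern raised above can be illustrated with a minimal linkage sketch in the spirit of Sweeney's work on the uniqueness of simple demographics. Records stripped of names can still be joined to a named public list when a combination of ordinary attributes (here ZIP code, birth date, and sex) is unique in the dataset. The records, roster, field names, and functions below are hypothetical illustrations, not the author's method or any repository's actual tooling.

```python
from collections import Counter

# Hypothetical quasi-identifiers: fields that are not names but that,
# in combination, can single out an individual.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Hypothetical "anonymized" research records: names removed, quasi-identifiers kept.
records = [
    {"zip": "53703", "birth_date": "1970-03-14", "sex": "F", "diagnosis": "A"},
    {"zip": "53703", "birth_date": "1970-03-14", "sex": "F", "diagnosis": "B"},
    {"zip": "53705", "birth_date": "1982-11-02", "sex": "M", "diagnosis": "C"},
    {"zip": "53711", "birth_date": "1965-07-30", "sex": "F", "diagnosis": "D"},
]

# Hypothetical public dataset that includes names (for example, a voter roll).
public_roster = [
    {"name": "Person X", "zip": "53705", "birth_date": "1982-11-02", "sex": "M"},
    {"name": "Person Y", "zip": "53711", "birth_date": "1965-07-30", "sex": "F"},
]


def quasi_key(row, fields=QUASI_IDENTIFIERS):
    """Project a row onto its quasi-identifier values."""
    return tuple(row[f] for f in fields)


def unique_records(rows, fields=QUASI_IDENTIFIERS):
    """Return rows whose quasi-identifier combination appears exactly once (k = 1)."""
    counts = Counter(quasi_key(r, fields) for r in rows)
    return [r for r in rows if counts[quasi_key(r, fields)] == 1]


def link(rows, roster, fields=QUASI_IDENTIFIERS):
    """Join unique research rows to a named roster; each unambiguous match re-identifies someone."""
    names_by_key = {}
    for person in roster:
        names_by_key.setdefault(quasi_key(person, fields), []).append(person["name"])
    matches = []
    for r in unique_records(rows, fields):
        names = names_by_key.get(quasi_key(r, fields), [])
        if len(names) == 1:  # exactly one named person shares this combination
            matches.append((names[0], r["diagnosis"]))
    return matches


print(link(records, public_roster))  # [('Person X', 'C'), ('Person Y', 'D')]
```

The two records that share a ZIP code, birth date, and sex remain ambiguous (k = 2), while the two unique records are linked to names and their diagnoses disclosed. Coarsening fields, such as truncating ZIP codes, can raise k, but it only protects against the attributes the data holder thought to check.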



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. University of Wisconsin, School of Law and School of Medicine and Public Health, Morgridge Institute for Research, Madison, USA
