
Optimizing Access Policies for Big Data Repositories: Latency Variables and the Genome Commons

Part of the Studies in Big Data book series (SBD, volume 18)


The design of access policies for large aggregations of scientific data has become increasingly important in today’s data-rich research environment. Planners routinely consider and weigh different policy variables when deciding how and when to release data to the public. This chapter proposes a methodology in which the timing of data release can be used to balance policy variables and thereby optimize data release policies. The global aggregation of publicly available genomic data, or the “genome commons,” is used as an illustration of this methodology.


  • Commons
  • Genome
  • Data sharing
  • Latency

  • DOI: 10.1007/978-3-319-30265-2_9
  • Chapter length: 15 pages


  1. By “materially encumbered” I mean that one or more material restrictions on the use of the data exist. These might include a contractual or policy embargo on presentation or publication of further results based on that data. At the extreme end of the spectrum, patent rights that wholly prevent use of the data can be viewed as another variety of encumbrance.

  2. Knowledge latency in a given information commons may be expressed either as a mandated value (derived from policy requirements) or as an actual value. The actual value may deviate from the mandated value for a number of reasons, including technical variations in data deposit practices and intentional or inadvertent non-compliance by data generators. As with any set of policy-imposed timing requirements (e.g., time periods for making filings with governmental agencies), it is important to consider the mandated time delay for the deposit of data to an information commons. Because a mandated value is also, theoretically, the maximum amount of time that should elapse before a datum is deposited in the commons, knowledge latency is expressed in this chapter in terms of its maximum value.

  3. As with knowledge latency, this term may be applied to an individual datum (i.e., representing the time before a particular datum becomes freely usable) or to the commons as a whole (i.e., representing the maximum time that it will take for data within the commons to become freely usable).

  4. In the U.S. and many other countries, the patent term lasts for twenty years from the date of filing.

  5. Prior work had focused on simple model organisms and technology development.

  6. Creative Commons is a non-profit organization that makes available a suite of open access licenses intended to facilitate the contribution of content and data to the public. See

  7. A “click-wrap” agreement (alternatively referred to as a “click-through” or “click-to-accept” agreement or license) is “an electronic form agreement to which [a] party may assent by clicking an icon or a button or by typing in a set of specified words” [24].

  8. The Bayh-Dole Act of 1980, P.L. 96-517, codified at 35 U.S.C. §§ 200–12, rationalized the previously chaotic rules governing federally-sponsored inventions and strongly encourages researchers to obtain patents on inventions arising from federally-funded research.

  9. The GDS policy refers specifically to the U.S. Supreme Court’s decision in Assn. for Molecular Pathology v. Myriad Genetics, 133 S.Ct. 2107 (2013).
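
The distinction drawn in notes 2 and 3 between a mandated (maximum) latency and the actual latency observed for individual data can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the chapter: the names `LatencySummary` and `summarize_latency`, the use of months as the unit, and the sample figures are all assumptions.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LatencySummary:
    """Hypothetical container; not a structure defined in the chapter."""
    mandated_max: float      # policy-mandated maximum latency (assumed unit: months)
    actual_max: float        # largest latency actually observed across the data
    late: Tuple[int, ...]    # positions of data whose latency exceeded the mandate


def summarize_latency(actual: Sequence[float], mandated_max: float) -> LatencySummary:
    """Compare observed per-datum latencies with a mandated maximum.

    The commons-level actual latency is taken as the maximum over the
    individual data, mirroring the "maximum value" convention in the notes.
    """
    if not actual:
        raise ValueError("at least one observed latency is required")
    late = tuple(i for i, months in enumerate(actual) if months > mandated_max)
    return LatencySummary(mandated_max, max(actual), late)


# Hypothetical figures: three data with observed latencies of 3, 6 and 14
# months, under a policy that mandates a maximum latency of 12 months.
summary = summarize_latency([3, 6, 14], mandated_max=12)
print(summary.actual_max, summary.late)
```

Treating the commons-level value as the maximum over individual data follows the convention stated in the notes; a planner interested in typical rather than worst-case delay might prefer a median, which this sketch does not attempt.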


  1. Contreras, J.L.: Prepublication data release, latency and genome commons. Science 329, 393–394 (2010a)

  2. Contreras, J.L.: Data sharing, latency variables and science commons. Berkeley Tech. L.J. 25, 1601–1672 (2010b)

  3. Ostrom, E., Hess, C.: A framework for analyzing the knowledge commons. In: Hess, C., Ostrom, E. (eds.) Understanding Knowledge as a Commons: From Theory to Practice. MIT Press, Cambridge, Mass (2007)

  4. Benson, D.A., Clark, K., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Sayers, E.W.: GenBank. Nucleic Acids Res. 42, D32–D37 (2014). doi:10.1093/nar/gkt1030

  5. Natl. Ctr. Biotechnology Info. (NCBI): Growth of GenBank and WGS. (2015). Accessed 14 June 2015

  6. Pennisi, E.: Will computers crash genomics? Science 331, 666–667 (2011)

  7. Natl. Res. Council (NRC): Mapping and Sequencing the Human Genome. Natl. Acad. Press, Washington (1988)

  8. Oak Ridge Natl. Lab. (ORNL): NIH, DOE guidelines encourage sharing of data, resources. Hum. Genome News 4, 4 (1993)

  9. Contreras, J.L.: Bermuda’s legacy: policy, patents, and the design of the genome commons. Minn. J.L. Sci. Tech. 12, 61–125 (2011)

  10. Natl. Res. Council (NRC): Bits of Power—Issues in Global Access to Scientific Data. Natl. Acad. Press, Washington (1997)

  11. Reichman, J.H., Uhlir, P.F.: A contractually reconstructed research commons for scientific data in a highly protectionist intellectual property environment. Law Contemp. Probs. 66, 315–462 (2003)

  12. Intl. Human Genome Sequencing Consortium (IHGSC): Initial sequencing and analysis of the human genome. Nature 409, 860–914 (2001)

  13. Bermuda Principles: Summary of principles agreed at the first international strategy meeting on human genome sequencing. (2006)

  14. Kaye, J., et al.: Data sharing in genomics—re-shaping scientific practice. Nat. Rev. Genet. 10, 331–335 (2009)

  15. Wellcome Trust: Sharing Data from Large-Scale Biological Research Projects: A System of Tripartite Responsibility: Report of meeting organized by the Wellcome Trust and held on 14–15 January 2003 at Fort Lauderdale, USA. (2003)

  16. Natl. Inst. Health (NIH): Policy for sharing of data obtained in NIH supported or conducted Genome-Wide Association Studies (GWAS). Fed. Reg. 72, 49,290 (2007)

  17. Merck & Co., Inc.: First installment of Merck Gene Index data released to public databases: cooperative effort promises to speed scientific understanding of the human genome. (1995)

  18. Marshall, E.: Bermuda rules: community spirit, with teeth. Science 291, 1192–1193 (2001)

  19. Holden, A.L.: The SNP consortium: summary of a private consortium effort to develop an applied map of the human genome. Biotechniques 32, 22–26 (2002)

  20. Contreras, J.L., Floratos, A., Holden, A.L.: The international serious adverse events consortium’s data sharing model. Nat. Biotech. 31, 17–19 (2013)

  21. Personal Genome Project (PGP): About the PGP. (2014). Accessed 25 June 2014

  22. Natl. Inst. Health (NIH): Final NIH genomic data sharing policy. Fed. Reg. 79, 51345–51354 (2014)

  23. Contreras, J.L.: NIH’s genomic data sharing policy: timing and tradeoffs. Trends Genet. 31, 55–57 (2015)

  24. Kunz, C.L., et al.: Click-through agreements: strategies for avoiding disputes on validity of assent. Bus. Lawyer 57, 401–429 (2001)

  25. Delta, G.B., Matsuura, J.H. (eds.): Law of the Internet, 3rd edn. Aspen, New York (2014)

  26. Rai, A.K., Eisenberg, R.S.: Bayh-Dole reform and the progress of biomedicine. Law Contemp. Probs. 66, 289–314 (2003)

  27. GAIN Collaborative Research Group: New models of collaboration in genome-wide association studies: the genetic association information network. Nat. Genet. 39, 1045–1051 (2007)


Corresponding author

Correspondence to Jorge L. Contreras.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Contreras, J.L. (2016). Optimizing Access Policies for Big Data Repositories: Latency Variables and the Genome Commons. In: Emrouznejad, A. (ed.) Big Data Optimization: Recent Developments and Challenges. Studies in Big Data, vol 18. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30263-8

  • Online ISBN: 978-3-319-30265-2

  • eBook Packages: Engineering, Engineering (R0)