
A simple model illustrating the virtue of replication for long-term information preservation


Each year, destructive events may cause data loss at the member archives of an archival federation. This paper provides a ‘back-of-the-envelope’ model for the fraction of the federated data collection that survives after a given number of years. It also discusses some simple parameterizations of the factors that contribute to the trade-offs between cost and the survival of information.
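As a rough illustration of the kind of estimate the abstract describes, the sketch below computes the expected surviving fraction of a collection under assumptions that are ours, not necessarily the paper's: each replica is lost independently with a fixed annual probability, and lost copies are never restored from the surviving ones. The function name `surviving_fraction` and its parameters are hypothetical.

```python
def surviving_fraction(p_loss: float, replicas: int, years: int) -> float:
    """Expected fraction of items that still have at least one copy.

    Illustrative assumptions (not taken from the paper):
      * each replica is lost independently with annual probability p_loss;
      * a lost replica is never repaired from the remaining copies.
    """
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must lie in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if years < 0:
        raise ValueError("years must be non-negative")

    # Probability that one given replica is still intact after `years` years.
    copy_survives = (1.0 - p_loss) ** years

    # An item is lost only if every one of its replicas has been lost.
    return 1.0 - (1.0 - copy_survives) ** replicas


if __name__ == "__main__":
    # Compare one, two, and three replicas over a century at 1% annual loss.
    for n in (1, 2, 3):
        print(n, round(surviving_fraction(0.01, n, 100), 3))
```

Under these assumptions, a 1% annual loss rate over 100 years leaves roughly 37% of a single-copy collection and roughly 75% of a three-copy collection, which shows why replication matters even before any repair between archives is considered.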





Support for Dr. Mattmann’s effort was provided by the Jet Propulsion Laboratory, California Institute of Technology under contract to the National Aeronautics and Space Administration.

Author information


Corresponding author

Correspondence to Bruce R. Barkstrom.

Additional information

Communicated by: Hassan A. Babaie

About this article

Cite this article

Barkstrom, B.R., Mattmann, C.A. A simple model illustrating the virtue of replication for long-term information preservation. Earth Sci Inform 5, 105–109 (2012).


  • Information loss rates
  • Parameterizations of rates of storage volume increases and costs