Storage Hierarchies for Big Data
Big data applications usually have to rely on a combination of storage media to achieve an economic balance between the capabilities of different media types and application needs.
Application requirements vary significantly in data lifetime, number of concurrent data producers and consumers, ratio of active to passive data volume, sharing between parallel processing units, and the relative balance between CPU and I/O requirements.
Storage media properties vary by several orders of magnitude in price per capacity, latency for sequential and random access patterns, aggregate and single-stream bandwidth, power requirements, endurance, and reliability. Several methods exist to further adapt these capabilities by combining multiple storage devices of the same type, but larger, economically efficient setups are constructed by combining several different storage technologies.
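The economic argument for combining technologies can be illustrated with a small calculation. The following is a minimal sketch, not a sizing method from this entry: the `Tier` class, the two tier names, and every number (capacity, cost per TB, latency, and the fraction of accesses each tier serves) are hypothetical placeholders chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One level of a storage hierarchy (all values are illustrative)."""
    name: str
    capacity_tb: float    # usable capacity in TB
    cost_per_tb: float    # placeholder price, arbitrary currency units
    latency_ms: float     # placeholder access latency
    hit_fraction: float   # assumed fraction of accesses served by this tier


def blended_cost_per_tb(tiers):
    """Capacity-weighted average cost per TB across all tiers."""
    total_capacity = sum(t.capacity_tb for t in tiers)
    total_cost = sum(t.capacity_tb * t.cost_per_tb for t in tiers)
    return total_cost / total_capacity


def mean_access_latency_ms(tiers):
    """Access-weighted average latency; hit fractions must sum to 1."""
    if abs(sum(t.hit_fraction for t in tiers) - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(t.hit_fraction * t.latency_ms for t in tiers)


# Hypothetical two-tier setup: a small fast tier in front of a large slow one.
tiers = [
    Tier("fast", capacity_tb=10, cost_per_tb=100.0, latency_ms=0.1, hit_fraction=0.9),
    Tier("slow", capacity_tb=90, cost_per_tb=20.0, latency_ms=10.0, hit_fraction=0.1),
]

cost = blended_cost_per_tb(tiers)
latency = mean_access_latency_ms(tiers)
print(f"blended cost per TB: {cost:.2f}")
print(f"mean access latency: {latency:.2f} ms")
```

Under these assumed numbers the combined setup costs far less per TB than the fast tier alone while its mean latency stays much closer to the fast tier than to the slow one. The result depends entirely on the assumed hit fraction, which in practice follows from the application's ratio of active to passive data.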
In addition, larger storage deployments typically provide services to more than one application and hence aim to...