Scalable Storage for Data-Intensive Computing

  • Abhishek Verma
  • Shivaram Venkataraman
  • Matthew Caesar
  • Roy H. Campbell


Persistent storage is a fundamental abstraction in computing. It consists of a named set of data items that come into existence through explicit creation and persist through temporary failures of the system until they are explicitly deleted. Sharing of data in distributed systems has become pervasive as these systems have grown in scale, both in the number of machines and in the amount of data stored. The phenomenal growth of web services in the past decade has resulted in many Internet companies needing to perform large-scale data analysis, such as indexing the contents of billions of websites or analyzing terabytes of traffic logs to mine usage patterns. A study of the economics of distributed computing [1], published in 2008, revealed that the cost of transferring data across the network is relatively high. Hence, moving computation near the data is a more efficient computing model, and several large-scale, data-intensive application frameworks [2, 3] exemplify this model.


Keywords: Fault Tolerance, File System, Distributed Hash Table, Virtual Node, Hadoop Distributed File System



This work was funded in part by NSF IIS grant 0841765 and in part by NSF CCF grant 0964471. The views expressed are those of the authors only.


  1. J. Gray, “Distributed computing economics,” Queue, vol. 6, no. 3, pp. 63–68, 2008.
  2. J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Commun. ACM, vol. 51, no. 1, pp. 107–113, 2008.
  3. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: Distributed data-parallel programs from sequential building blocks,” in EuroSys ’07: Proc. of the 2nd ACM SIGOPS, New York, NY, USA, 2007, pp. 59–72.
  4. J. Dean, “Large-Scale Distributed Systems at Google: Current Systems and Future Directions,” 2009.
  5. J. Gantz and D. Reinsel, “As the economy contracts, the Digital Universe expands,” IDC Multimedia White Paper, 2009.
  6. M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the Clouds: A Berkeley View of Cloud Computing,” EECS Department, University of California, Berkeley, Tech. Rep., 2009.
  7. S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 29–43, 2003.
  8. M. K. McKusick and S. Quinlan, “GFS: Evolution on Fast-forward,” Queue, vol. 7, no. 7, pp. 10–20, 2009.
  9. D. Roselli, J. Lorch, and T. Anderson, “A comparison of file system workloads,” in Proceedings of the annual conference on USENIX Annual Technical Conference. USENIX Association, 2000.
  10. J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao, “Oceanstore: An architecture for global-scale persistent storage,” in Proc. of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, 2000.
  11. J. Ledlie, J. Shneidman, M. Seltzer, and J. Huth, “Scooped, again,” Lecture notes in computer science, pp. 129–138, 2003.
  12. R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon, “Design and implementation of the sun network filesystem,” in Proceedings of the Summer 1985 USENIX Conference, 1985, pp. 119–130.
  13. J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, and M. West, “Scale and performance in a distributed file system,” ACM Transactions on Computer Systems (TOCS), vol. 6, no. 1, pp. 51–81, 1988.
  14. D. Eastlake and P. Jones, “US secure hash algorithm 1 (SHA1),” RFC 3174, September, Tech. Rep., 2001.
  15. B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. Kubiatowicz, “Tapestry: a resilient global-scale overlay for service deployment,” in IEEE J. Selected Areas in Communications, January 2003.
  16. A. Rowstron and P. Druschel, “Pastry: scalable, decentralized object location and routing for large-scale peer-to-peer systems,” in ACM Middleware, November 2001.
  17. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: Amazon’s highly available key-value store,” ACM SIGOPS Operating Systems Review, vol. 41, no. 6, p. 220, 2007.
  18. B. Chun, F. Dabek, A. Haeberlen, E. Sit, H. Weatherspoon, M. Kaashoek, J. Kubiatowicz, and R. Morris, “Efficient replica maintenance for distributed storage systems,” in Proc. of NSDI, vol. 6, 2006.
  19. P. Corbett and D. Feitelson, “The Vesta parallel file system,” ACM Transactions on Computer Systems (TOCS), vol. 14, no. 3, pp. 225–264, 1996.
  20. P. Schwan, “Lustre: Building a file system for 1000-node clusters,” in Proceedings of the 2003 Linux Symposium, 2003.
  21. S. Weil, S. Brandt, E. Miller, D. Long, and C. Maltzahn, “Ceph: A scalable, high-performance distributed file system,” in Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI), 2006.
  22. P. Druschel and A. Rowstron, “PAST: A large-scale, persistent peer-to-peer storage utility,” in Proc. HotOS VIII, 2001, pp. 75–80.
  23. F. Dabek, M. Kaashoek, D. Karger, R. Morris, and I. Stoica, “Wide-area cooperative storage with CFS,” ACM SIGOPS Operating Systems Review, vol. 35, no. 5, pp. 202–215, 2001.
  24. I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan, “Chord: a scalable peer-to-peer lookup service for Internet applications,” in ACM SIGCOMM, August 2001.
  25. A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen, “Ivy: A Read/Write Peer-to-Peer File System,” in OSDI, December 2002.
  26. R. Hasan, Z. Anwar, W. Yurcik, L. Brumbaugh, and R. Campbell, “A survey of peer-to-peer storage techniques for distributed file systems,” in ITCC, vol. 5, pp. 205–213.
  27. A. Lakshman and P. Malik, “Cassandra: structured storage system on a P2P network,” in Proc. of the 28th ACM symposium on Principles of distributed computing, 2009.
  28. F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber, “Bigtable: A distributed storage system for structured data,” in Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI06), 2006.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Abhishek Verma (1)
  • Shivaram Venkataraman (1)
  • Matthew Caesar (1)
  • Roy H. Campbell (1)

  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
