Journal of Grid Computing, Volume 7, Issue 1, pp. 51–72

Chirp: a practical global filesystem for cluster and Grid computing

  • Douglas Thain
  • Christopher Moretti
  • Jeffrey Hemmes

Abstract

Traditional distributed filesystem technologies designed for local and campus area networks do not adapt well to wide area Grid computing environments. To address this problem, we have designed the Chirp distributed filesystem from the ground up to meet the needs of Grid computing. Chirp can be deployed easily without special privileges, and it provides strong and flexible security mechanisms, tunable consistency semantics, and clustering to increase capacity and throughput. We demonstrate that many of these features also yield order-of-magnitude performance improvements over wide area networks. We describe three applications, in bioinformatics, biometrics, and gamma ray physics, that each employ Chirp to attack large-scale, data-intensive problems.

Keywords

Filesystem, Grid computing, Cluster computing

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Douglas Thain (1)
  • Christopher Moretti (1)
  • Jeffrey Hemmes (1)
  1. Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, USA
