Flexibility and performance of parallel file systems

  • David Kotz
  • Nils Nieuwejaar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1127)


As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions makes applications difficult to port.

We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (APIs).

We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.
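To make the idea of a strided interface concrete, the following is a minimal sketch, not the authors' API. It assumes a hypothetical in-memory `CoreFile` that offers only primitive byte-range operations, with hypothetical helpers `strided_read` and `strided_write` layered on top. A strided request names a starting offset, a record size, a stride between record starts, and a record count, so one call describes many small, regularly spaced accesses.

```python
class CoreFile:
    """Hypothetical core: only primitive byte-range reads and writes (in memory)."""

    def __init__(self, data: bytes = b"") -> None:
        self._buf = bytearray(data)

    def read_at(self, offset: int, size: int) -> bytes:
        # Return up to `size` bytes starting at `offset`.
        return bytes(self._buf[offset:offset + size])

    def write_at(self, offset: int, data: bytes) -> None:
        # Grow the file with zero bytes if the write extends past the end.
        end = offset + len(data)
        if end > len(self._buf):
            self._buf.extend(b"\0" * (end - len(self._buf)))
        self._buf[offset:end] = data


def _check(record_size: int, stride: int) -> None:
    # Assumption of this sketch: records are non-empty and do not overlap.
    if record_size <= 0 or stride < record_size:
        raise ValueError("need record_size > 0 and stride >= record_size")


def strided_read(f: CoreFile, offset: int, record_size: int,
                 stride: int, count: int) -> bytes:
    """Read `count` records of `record_size` bytes whose starts are `stride` bytes apart."""
    _check(record_size, stride)
    return b"".join(f.read_at(offset + i * stride, record_size)
                    for i in range(count))


def strided_write(f: CoreFile, offset: int, record_size: int,
                  stride: int, data: bytes) -> None:
    """Scatter `data`, split into `record_size`-byte records, `stride` bytes apart."""
    _check(record_size, stride)
    if len(data) % record_size != 0:
        raise ValueError("data length must be a multiple of record_size")
    for i in range(len(data) // record_size):
        chunk = data[i * record_size:(i + 1) * record_size]
        f.write_at(offset + i * stride, chunk)


if __name__ == "__main__":
    f = CoreFile(b"abcdefghijkl")
    print(strided_read(f, offset=0, record_size=2, stride=4, count=3))
    strided_write(f, offset=1, record_size=1, stride=4, data=b"XYZ")
    print(f.read_at(0, 12))
```

The sketch loops over primitive reads purely to show the access pattern. A real parallel file system would presumably receive the whole strided request at once so it could be serviced efficiently; where the authors' systems draw that line is not stated in this abstract.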





Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • David Kotz (1)
  • Nils Nieuwejaar (1)
  1. Department of Computer Science, Dartmouth College, Hanover, USA
