Performance Evaluation of a Massively Parallel I/O Subsystem

  • Sandra Johnson Baylor
  • Caroline Benveniste
  • Yarsun Hsu
Part of The Kluwer International Series in Engineering and Computer Science (SECS, volume 362)


This chapter presents trace-driven simulation results from a study evaluating the performance of the internal parallel I/O subsystem of the Vulcan massively parallel processor (MPP) architecture, for system sizes ranging from 16 to 512 nodes. The results show that a compute-node-to-I/O-node ratio of four is the most cost-effective across all system sizes, suggesting high scalability. Processor-to-processor communication effects are negligible for small message sizes, and the larger the fraction of I/O reads, the better the I/O performance. Worst-case I/O node placement performs within 13% of more efficient placement strategies. Introducing parallelism into the internal I/O subsystem improves I/O performance significantly.
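The methodology can be illustrated with a toy model. Below is a minimal sketch, in Python, of a trace-driven simulation in the spirit of the study; it is not the authors' Vulcan simulator. It assumes a synthetic request trace, a static compute-node-to-I/O-node mapping, and first-come-first-served service at each I/O node, and it sweeps the compute-to-I/O ratio for a fixed machine size. All rates, the mapping, and the queueing model are illustrative assumptions.

    import random

    def simulate(num_io, trace, service_time=1.0):
        """FCFS service at each I/O node; returns the mean response time.
        trace: list of (arrival_time, compute_node_id), sorted by arrival."""
        free_at = [0.0] * num_io           # time at which each I/O node is next free
        total_response = 0.0
        for arrival, src in trace:
            io = src % num_io              # static compute-to-I/O mapping (an assumption)
            start = max(arrival, free_at[io])
            free_at[io] = start + service_time
            total_response += free_at[io] - arrival
        return total_response / len(trace)

    # Sweep compute-to-I/O ratios for a fixed machine size, as the study does.
    random.seed(0)
    total_nodes = 64
    for ratio in (2, 4, 8, 16):
        num_io = max(1, total_nodes // (ratio + 1))
        num_compute = total_nodes - num_io
        t, trace = 0.0, []
        for _ in range(5000):
            t += random.expovariate(num_compute * 0.05)   # aggregate request rate (illustrative)
            trace.append((t, random.randrange(num_compute)))
        print(f"ratio {ratio:2d}: mean response time {simulate(num_io, trace):6.2f}")

Under these toy parameters, response times stay flat until the I/O nodes saturate at high ratios, mirroring the cost-effectiveness trade-off described above.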







Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Sandra Johnson Baylor (1)
  • Caroline Benveniste (2)
  • Yarsun Hsu (1)
  1. IBM T.J. Watson Research Center, New York, USA
  2. Department of Electrical Engineering, Columbia University, New York, USA
