Performance Evaluation of a Massively Parallel I/O Subsystem

  • Sandra Johnson Baylor
  • Caroline Benveniste
  • Yarsun Hsu
Part of The Kluwer International Series in Engineering and Computer Science book series (SECS, volume 362)


This chapter presents trace-driven simulation results from a study of the performance of the internal parallel I/O subsystem of the Vulcan massively parallel processor (MPP) architecture. The system sizes evaluated range from 16 to 512 nodes. The results show that a ratio of four compute nodes to one I/O node is the most cost-effective for all system sizes, suggesting high scalability. In addition, processor-to-processor communication effects are negligible for small message sizes, and the greater the fraction of I/O reads, the better the I/O performance. Worst-case I/O node placement is within 13% of more efficient placement strategies. Introducing parallelism into the internal I/O subsystem improves I/O performance significantly.


Node Ratio, Request Rate, Node Placement, Read Request, Node Blocking
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Sandra Johnson Baylor (1)
  • Caroline Benveniste (2)
  • Yarsun Hsu (1)
  1. IBM T.J. Watson Research Center, New York, USA
  2. Department of Electrical Engineering, Columbia University, New York, USA
