Benchmarking and optimization of scientific codes on the CRAY X-MP, CRAY-2, and SCS-40 vector computers
Various scientific codes were benchmarked on three vector computers: the CRAY X-MP/48 and CRAY-2 supercomputers and the SCS-40/XM minisupercomputer. On the X-MP, two Fortran compilers were also compared. The benchmarks, which were initially all in Fortran, consisted of six research codes from Caltech, the 24 Livermore loops, and two cases from the LINPACK benchmark. As a corollary effort, the effect of manual optimization on the Caltech codes was also considered, including the selected use of assembly-language math routines.
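Benchmark speeds of this kind are typically expressed in millions of floating-point operations per second (MFLOPS). The sketch below assumes the conventional LINPACK operation count of 2/3·n^3 + 2·n^2 for solving a dense n × n system. It shows how a measured time converts to a rate. The function name `linpack_mflops` and the timing value are hypothetical and do not come from the paper.

```python
def linpack_mflops(n: int, seconds: float) -> float:
    """Return the MFLOPS rate for an n x n LINPACK solve timed at `seconds`.

    Uses the conventional operation count 2/3*n**3 (LU factorization)
    plus 2*n**2 (triangular solves); the operations actually executed
    by a given code may differ slightly from this nominal count.
    """
    if n <= 0 or seconds <= 0.0:
        raise ValueError("n and seconds must be positive")
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / (seconds * 1.0e6)


# Hypothetical timing (not a result from the study):
# a 100 x 100 solve taking 0.01 s corresponds to about 68.7 MFLOPS.
print(round(linpack_mflops(100, 0.01), 1))  # 68.7
```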
On each machine, the ratio of the maximum to the minimum speed across the benchmarks exceeded 50, even though the study was restricted to unitasked (i.e., single-CPU) runs. The maximum speed for all-Fortran codes was more than 80% of peak on the X-MP and SCS, but less than 40% of peak on the CRAY-2.
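The two figures of merit in this paragraph are simple ratios. The first is the spread between the fastest and slowest benchmark, and the second is the best rate as a fraction of peak. The following minimal sketch uses a hypothetical helper, `speed_summary`. Its input rates and peak are made up purely to show the arithmetic.

```python
from typing import Sequence, Tuple


def speed_summary(rates_mflops: Sequence[float], peak_mflops: float) -> Tuple[float, float]:
    """Return (max/min spread, best rate as a fraction of peak)."""
    if not rates_mflops or min(rates_mflops) <= 0.0 or peak_mflops <= 0.0:
        raise ValueError("rates and peak must be positive")
    fastest, slowest = max(rates_mflops), min(rates_mflops)
    return fastest / slowest, fastest / peak_mflops


# Hypothetical single-CPU rates and peak, chosen only for illustration.
spread, frac = speed_summary([2.5, 40.0, 170.0], peak_mflops=200.0)
print(spread, frac)  # 68.0 0.85
```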
Despite having a clock that is 2.3 times faster, the CRAY-2 generally runs slower than the X-MP, typically by a factor of 1.3 for scalar code and by a larger factor for moderately vectorized code. Only for highly vectorized codes does the CRAY-2 marginally outperform the X-MP, at least for in-core benchmarks. The poorer performance of the CRAY-2 is due to its slower scalar speed, its lack of chaining, its single port between each CPU and memory, and its relatively slow memory.
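The clock and speed ratios can be combined to estimate how much work each machine accomplishes per clock period. This short sketch assumes that run time is simply the number of clock periods times the period length. The function name is illustrative. The CRAY-2's clock is 2.3 times faster, but its scalar runs take 1.3 times as long. It therefore uses about 2.3 × 1.3 ≈ 3.0 times as many clock periods, which means it does roughly one third as much scalar work per clock as the X-MP.

```python
def work_per_clock(clock_speed_ratio: float, time_ratio: float) -> float:
    """Work per clock period of machine B relative to machine A.

    clock_speed_ratio: how many times faster B's clock is than A's.
    time_ratio: B's run time divided by A's run time on the same code.
    B uses clock_speed_ratio * time_ratio times as many clock periods,
    so its work per clock is the reciprocal of that product.
    """
    return 1.0 / (clock_speed_ratio * time_ratio)


# CRAY-2 (B) versus X-MP (A), scalar code, using the ratios quoted above.
print(round(work_per_clock(2.3, 1.3), 2))  # 0.33
```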
When the same CFT compiler is used on both machines, the SCS runs slower than the X-MP by a factor of 2.6 in the scalar limit and by a factor of 4.7 (the clock ratio) in the vector limit. When the newer CFT77 compiler is used on the X-MP, however, the SCS loses this relative advantage in scalar performance.
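The same per-clock reasoning applies when CFT is used on both machines. In the vector limit, the SCS needs the same number of clock periods as the X-MP, because its slowdown equals the clock ratio. In scalar code, it needs only about 55% as many. The minimal sketch below treats the SCS clock period as 4.7 times the X-MP's. The function name is hypothetical.

```python
def clock_periods_ratio(time_ratio: float, clock_period_ratio: float) -> float:
    """Clock periods used by machine B relative to machine A.

    time_ratio: B's run time divided by A's.
    clock_period_ratio: B's clock period divided by A's.
    """
    return time_ratio / clock_period_ratio


# SCS (B) versus X-MP (A), both compiled with CFT, using the quoted ratios.
scalar = clock_periods_ratio(2.6, 4.7)  # about 0.55
vector = clock_periods_ratio(4.7, 4.7)  # exactly 1.0
print(round(scalar, 2), vector)  # 0.55 1.0
```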
On the X-MP, the CFT77 3.0 compiler produces significantly faster code than CFT 1.14, typically by a factor of 1.4. This gain comes, however, at the cost of compilation times that are three to five times longer. Regardless of the compiler, manual optimization is still worthwhile. For three of the six Caltech codes compiled with CFT77, run-time speedups of 2, 4, and 16 were achieved through Fortran-level optimization alone.
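Whether the longer CFT77 compile pays off depends on how long the program runs. This rough sketch assumes one compilation per run, a uniform run-time speedup s (1.4 here), and a compile-time slowdown k (3 to 5). Under those assumptions, CFT77 gives the shorter total time when kC + R/s < C + R. Here C and R are the CFT compile and run times. Rearranging gives R/C > (k − 1)s/(s − 1). The helper name is hypothetical.

```python
def breakeven_run_to_compile(speedup: float, compile_slowdown: float) -> float:
    """Minimum ratio of run time to compile time (both measured with the
    older compiler) above which the newer compiler reduces total
    compile-plus-run time, assuming one compile per run.

    Solves k*C + R/s < C + R for R/C, giving R/C > (k - 1) * s / (s - 1).
    """
    s, k = speedup, compile_slowdown
    if s <= 1.0:
        raise ValueError("speedup must exceed 1")
    return (k - 1.0) * s / (s - 1.0)


for k in (3.0, 5.0):
    print(k, round(breakeven_run_to_compile(1.4, k), 1))
# 3.0 7.0
# 5.0 14.0
```

With these figures, CFT77 pays off once a single run lasts more than roughly 7 to 14 times its CFT compilation time. Reusing one executable for several runs lowers that threshold proportionally.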
Keywords: Compilation Time, Single Port, Relative Enhancement, Minimum Speed, Vector Computer