A very high performance algorithm for NAS EP Benchmark
The NAS (Numerical Aerodynamic Simulation) Parallel Benchmarks were developed at NASA Ames Research Center to study the performance of parallel supercomputers. We describe major algorithmic improvements to the Embarrassingly Parallel (EP) Benchmark. Using IBM RS/6000 workstations and IBM SP-1 scalable parallel machines as examples, we also describe tuning techniques for obtaining very high performance on this benchmark. Compared with the generic EP code, the combined algorithmic and tuning techniques improve performance by nearly a factor of 18. The techniques described are generally applicable to many numerical algorithms on most RISC machines.
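For orientation, the generic EP kernel used as the baseline can be sketched roughly as follows. The sketch follows the published NAS Parallel Benchmarks definition as we understand it: a linear congruential generator with multiplier 5^13 modulo 2^46, Gaussian pairs produced by the polar acceptance-rejection method, and a tally of the pairs into ten square annuli. The function name `ep_kernel` and its parameters are illustrative assumptions, the problem size is far below any benchmark class, and none of the algorithmic or tuning improvements described in the paper are represented.

```python
import math

A = 5 ** 13        # LCG multiplier from the NPB definition
MOD = 2 ** 46      # LCG modulus
SEED = 271828183   # default seed from the NPB definition


def ep_kernel(n_pairs, seed=SEED):
    """Plain, untuned sketch of the EP kernel (illustrative only).

    Returns (counts, sum_x, sum_y), where counts[l] is the number of
    Gaussian pairs (X, Y) with l <= max(|X|, |Y|) < l + 1.
    """
    x = seed
    counts = [0] * 10
    sum_x = 0.0
    sum_y = 0.0
    for _ in range(n_pairs):
        # Two successive uniform deviates, mapped from (0, 1) to (-1, 1).
        # Python integers are exact, so no split-word arithmetic is needed.
        x = (A * x) % MOD
        u = 2.0 * (x / MOD) - 1.0
        x = (A * x) % MOD
        v = 2.0 * (x / MOD) - 1.0

        # Polar method: keep only points inside the unit circle.
        t = u * u + v * v
        if 0.0 < t <= 1.0:
            f = math.sqrt(-2.0 * math.log(t) / t)
            gx = u * f
            gy = v * f
            l = int(max(abs(gx), abs(gy)))
            if l < 10:
                counts[l] += 1
            sum_x += gx
            sum_y += gy
    return counts, sum_x, sum_y


if __name__ == "__main__":
    counts, sx, sy = ep_kernel(100_000)
    print(counts, sx, sy)
```

Each pair is generated independently of every other pair, so the loop can be split across processors with essentially no communication, which is what makes the benchmark "embarrassingly parallel". The cost is dominated by the random number generation and by the `sqrt` and `log` evaluations.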