Mapping and Optimizing 2-D Scientific Applications on a Stream Processor
Stream processors, together with the stream programming model, have demonstrated significant performance advantages in signal processing, multimedia, and graphics applications, and are now being extended to scientific applications. In this paper we examine the applicability of a stream processor to 2-D stencil scientific applications, an important and widely used class of scientific applications that compute values using neighboring array elements in a fixed stencil pattern. We first map the FORTRAN versions of 2-D stencil scientific applications to the stream processor in a straightforward way. In a stream processor system, managing system resources is the programmer's responsibility. We therefore present several optimizations that let the stream programs for 2-D stencil scientific applications take advantage of various aspects of the stream processor architecture. Finally, we analyze the performance of the optimized 2-D stencil scientific stream applications. The final stream programs run 2.56 to 7.62 times faster than the corresponding FORTRAN programs on a Xeon processor, with the optimizations playing an important role in realizing the performance improvement.
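To make the notion of a fixed stencil pattern concrete, the following is a minimal sketch of one sweep of a 2-D stencil computation. It is an illustration only: the five-point averaging pattern, the function name `stencil_sweep`, and the sample grid are assumptions chosen for brevity, not the applications or the stream code evaluated in the paper.

```python
def stencil_sweep(grid):
    """Apply one five-point averaging sweep to the interior of a 2-D grid.

    Hypothetical example: each interior element is replaced by the mean of
    its north, south, west, and east neighbors. Boundary elements are
    copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    # Write into a copy so every update reads only the old values.
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j]
                + grid[i][j - 1] + grid[i][j + 1]
            )
    return out


# Small sample grid (assumed data) with two interior elements.
grid = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 8.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
result = stencil_sweep(grid)
```

Every output element depends only on a small, fixed neighborhood of the input, so the same access pattern repeats across the whole array. This regularity is what makes such computations candidates for being expressed as streams of array rows or strips.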
Keywords: Input Stream, Memory Transfer, Stream Processor, Stream Application, Basic Stream
This work was supported by NSFC (61003075, 61103193, 61103011, 61103014).