Matrix-Based Programming Optimization for Improving Memory Hierarchy Performance on Imagine

  • Xuejun Yang
  • Jing Du
  • Xiaobo Yan
  • Yu Deng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4330)


Although Imagine provides an efficient memory hierarchy, straightforward programming of scientific applications does not match this hierarchy and thereby constrains the performance of stream applications. In this paper, we explore a novel matrix-based programming optimization that improves memory hierarchy performance so that it can supply the operands needed for highly parallel computation. Specifically, we formulate the problem on the Data&Computation Matrix (D&C Matrix), which we propose to abstract the relationship between streams and kernels, and we present key techniques based on this matrix for improving bandwidth utilization at multiple levels of the hierarchy. Experimental evaluation on five representative scientific applications shows that the stream programs produced by our optimization effectively enhance locality in the local register files (LRF) and the stream register file (SRF), improve the capacity utilization of the LRF and SRF, make full use of the scratchpads (SPs) and stream buffers (SBs), and avoid index stream overhead.
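The abstract does not spell out the layout of the D&C Matrix. As a hedged illustration only, the sketch below assumes one row per kernel (in program order) and one column per stream, with a bit-flag entry recording whether the kernel reads and/or writes that stream. The class `DCMatrix`, its methods `fusion_candidates` and `shared_inputs`, and the entry encoding are hypothetical and are not the authors' definitions. Under these assumptions, a stream written by one kernel and read only by the next kernel is a candidate for fusing the two kernels so the intermediate data can stay in the LRF, and a stream read by several kernels is a candidate for reuse in the SRF.

```python
from dataclasses import dataclass

# Hypothetical entry encoding (bit flags); not taken from the paper.
NONE, READ, WRITE = 0, 1, 2  # READ | WRITE == 3 means the kernel both reads and writes


@dataclass
class DCMatrix:
    kernels: list[str]        # rows, assumed to be in program order
    streams: list[str]        # columns
    entries: list[list[int]]  # entries[k][s] holds NONE, READ, WRITE, or READ | WRITE

    def readers(self, s: int) -> list[int]:
        """Indices of kernels that read stream s."""
        return [k for k in range(len(self.kernels)) if self.entries[k][s] & READ]

    def writers(self, s: int) -> list[int]:
        """Indices of kernels that write stream s."""
        return [k for k in range(len(self.kernels)) if self.entries[k][s] & WRITE]

    def fusion_candidates(self) -> list[tuple[str, str, str]]:
        """(producer, consumer, stream) triples where exactly one kernel writes
        the stream and exactly the next kernel reads it, so fusing the pair
        could keep that stream out of the SRF (an assumed heuristic)."""
        pairs = []
        for s, name in enumerate(self.streams):
            w, r = self.writers(s), self.readers(s)
            if len(w) == 1 and len(r) == 1 and r[0] == w[0] + 1:
                pairs.append((self.kernels[w[0]], self.kernels[r[0]], name))
        return pairs

    def shared_inputs(self) -> list[str]:
        """Streams read by more than one kernel: candidates for SRF reuse."""
        return [name for s, name in enumerate(self.streams)
                if len(self.readers(s)) > 1]


# Hypothetical three-kernel pipeline over four streams a, b, c, d.
m = DCMatrix(
    kernels=["k1", "k2", "k3"],
    streams=["a", "b", "c", "d"],
    entries=[
        [READ, WRITE, NONE, NONE],   # k1: reads a, writes b
        [READ, READ, WRITE, NONE],   # k2: reads a and b, writes c
        [NONE, NONE, READ, WRITE],   # k3: reads c, writes d
    ],
)

print(m.fusion_candidates())  # [('k1', 'k2', 'b'), ('k2', 'k3', 'c')]
print(m.shared_inputs())      # ['a']
```

A dense kernel-by-stream table is used here only because it makes producer/consumer relationships a simple column scan; a real compiler pass would also need dependence and capacity information that this sketch omits.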


Keywords: Access Pattern · Bandwidth Utilization · Memory Hierarchy · Spatial Reuse · Loop Fusion





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xuejun Yang¹
  • Jing Du¹
  • Xiaobo Yan¹
  • Yu Deng¹
  1. School of Computer, National University of Defense Technology, Changsha, China
