Tiled Multi-Core Stream Architecture

  • Nan Wu
  • Qianming Yang
  • Mei Wen
  • Yi He
  • Ju Ren
  • Maolin Guan
  • Chunyuan Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6760)


Conventional stream architectures focus on exploiting instruction-level parallelism (ILP) and data-level parallelism (DLP) in applications, although the stream model also exposes abundant thread-level parallelism (TLP) at kernel granularity. On the other hand, with the development of modern VLSI technology, increasing application demands and the need for scalability challenge conventional stream architectures. In this paper, we present TiSA, a novel Tiled Multi-Core Stream Architecture. TiSA introduces the tile, consisting of multiple stream cores, as a new category of architectural resource, and provides an on-chip network that supports stream transfers among tiles. TiSA exploits multiple levels of parallelism at different granularities of processing elements. Besides the hardware modules, this paper discusses other key issues of the TiSA architecture, including the programming model, various execution patterns, and resource allocation. We then evaluate the hardware scalability of TiSA by scaling it from tens to thousands of ALUs and estimating the area and delay costs. We also evaluate the software scalability of TiSA by simulating six stream applications and comparing the sustained performance with that of other stream processors, general-purpose processors, and different configurations of TiSA. A 256-ALU TiSA with 4 tiles and 4 stream cores per tile is shown to be feasible in 45 nm technology, sustaining 100 to 350 GFLOP/s on most stream benchmarks and providing a speedup of about 10x over a 16-ALU TiSA with a 5% degradation in area per ALU. The results show that TiSA is a VLSI- and performance-efficient architecture for the billion-transistor era.
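The configuration figures quoted in the abstract can be cross-checked with simple arithmetic. This is a reader's sketch, not a result reported by the authors: it assumes the 256 ALUs are divided evenly among the stream cores, and it treats the quoted ~10x speedup as exact when estimating scaling efficiency relative to the 16-ALU baseline.

```latex
\begin{aligned}
N_{\text{cores}}        &= 4\ \text{tiles} \times 4\ \text{cores/tile} = 16 \\
N_{\text{ALU/core}}     &= \frac{256}{N_{\text{cores}}} = \frac{256}{16} = 16 \\
\text{ALU scale-up}     &= \frac{256\ \text{ALUs}}{16\ \text{ALUs}} = 16\times \\
\text{scaling efficiency} &\approx \frac{10\times}{16\times} \approx 0.63
\end{aligned}
```

Under these assumptions, each stream core in the 256-ALU configuration holds as many ALUs as the whole 16-ALU baseline, and the reported speedup corresponds to roughly 60% of linear scaling in ALU count.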







Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Nan Wu, Qianming Yang, Mei Wen, Yi He, Ju Ren, Maolin Guan, Chunyuan Zhang
    • Computer School, National University of Defense Technology, Changsha, P.R. of China
