A High Performance Heterogeneous Architecture and Its Optimization Design

  • Jianjun Guo
  • Kui Dai
  • Zhiying Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4208)


The wide adoption of media processing applications poses great challenges for high-performance embedded processor design. This paper studies a Data Parallel Coprocessor architecture based on SDTA, and architecture decisions are made for the best performance/cost ratio. Experimental results on a prototype show that SDTA achieves high performance on many embedded media processing applications. The simplicity and flexibility of SDTA encourage further development of its reconfigurable functionality.


Keywords: Data Parallel, SDTA, ASIP





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jianjun Guo (1)
  • Kui Dai (1)
  • Zhiying Wang (1)
  1. School of Computer, National University of Defense Technology, Changsha, Hunan, China
