Parallel Memory Architecture for TTA Processor

  • Jarno K. Tanskanen
  • Teemu Pitkänen
  • Risto Mäkinen
  • Jarmo Takala
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4599)

Abstract

A conflict-resolving parallel data memory system for the Transport Triggered Architecture (TTA) is described. The architecture is generic and reusable, so it can support a variety of application-specific designs. With parallel memory, a multi-port memory, which consumes more area and power, can be replaced with single-port memory modules. The number of ports can also be increased beyond what a design library offers for multi-port memories. In an FFT TTA example, the dual-port data memory was replaced by the proposed architecture. To avoid memory conflicts, the original code was rescheduled and the TTA core was regenerated for the new schedule. The original dual-port memory required 3.38 times the area and 1.70 times the energy of the proposed parallel memory. In this case, the energy consumption of the processor core increased, so the total system energy consumption remained about the same. The original system nevertheless required 1.89 times the area.
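The idea of resolving conflicts among single-port modules can be illustrated with a minimal sketch. It is not the architecture described in the paper: the module assignment function (`module_of`, low-order interleaving) and the resolution policy (serialising conflicting accesses into later cycles) are assumptions chosen for illustration, and all names are hypothetical.

```python
from collections import defaultdict


def module_of(address, num_modules):
    """Assumed module assignment: low-order interleaving (address mod N).

    The abstract does not state which access scheme is used, so this is only
    an illustrative choice.
    """
    return address % num_modules


def resolve_conflicts(addresses, num_modules):
    """Split one group of simultaneous accesses into conflict-free cycles.

    Each single-port module can serve at most one access per cycle, so
    accesses that map to the same module are pushed to later cycles.
    Returns a list of cycles, each a list of addresses served in that cycle.
    """
    queues = defaultdict(list)
    for addr in addresses:
        queues[module_of(addr, num_modules)].append(addr)

    # The busiest module determines how many cycles the group needs.
    num_cycles = max((len(q) for q in queues.values()), default=0)

    cycles = []
    for c in range(num_cycles):
        # Take the c-th pending access from every module that still has one.
        cycles.append([q[c] for _, q in sorted(queues.items()) if c < len(q)])
    return cycles


if __name__ == "__main__":
    # Two accesses to different modules complete in one cycle.
    print(resolve_conflicts([0, 1], num_modules=2))
    # Two accesses to the same module must be serialised over two cycles.
    print(resolve_conflicts([0, 2], num_modules=2))
```

Under these assumptions, a schedule in which simultaneous accesses always fall into different modules needs no extra cycles. That is the motivation for rescheduling the code to avoid memory conflicts.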



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Jarno K. Tanskanen (1)
  • Teemu Pitkänen (1)
  • Risto Mäkinen (2)
  • Jarmo Takala (1)
  1. Tampere University of Technology, P.O. Box 553, FIN-33101 Tampere, Finland
  2. Plenware Oy, P.O. Box 13, FIN-33201 Tampere, Finland
