Multimedia Tools and Applications

Volume 76, Issue 4, pp 5951–5963

Improving memory system performance for multimedia applications



The cost and performance of embedded systems depend heavily on the performance of the memories they use. Memory access latency is one of the major bottlenecks in system performance. In software compilation, memory access latency is known to vary widely depending on how variables in the code are stored to and retrieved from memory. Reducing this latency requires a technique that maximizes the use of memory bandwidth. Burst transfer is a well-known technique for fully utilizing memory bandwidth: the burst transfer capability reduces average access time by more than 65% for an eight-word sequential transfer. However, the problem of utilizing such burst transfers has not been generally addressed, and unfortunately, it is not tractable. In this work, we present a new technique that identifies sequences of single load and store instructions that can be combined into burst transfers. The proposed technique provides an optimal data placement of non-array variables to achieve the maximum utilization of burst data transfers. The major contributions of our work are: 1) we prove that the problem is NP-hard, and 2) we propose an exact formulation of the problem and an efficient data placement algorithm. From experiments with a set of multimedia benchmarks, we confirm that our proposed technique uses on average 7 times more burst accesses than the code generated by the ARM commercial compiler.
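To make the link between data placement and burst transfers concrete, the following is a minimal Python sketch and not the algorithm from the paper. It assumes a simplified model in which a burst is any run of consecutive accesses of the same kind (load or store) to ascending, adjacent word addresses. The names `group_accesses` and `best_placement`, the example access sequence, and the two placements are all hypothetical and chosen only for illustration. The exhaustive search is usable only for a handful of variables, which is in line with the abstract's statement that the problem is NP-hard.

```python
from itertools import permutations


def group_accesses(accesses, placement):
    """Merge consecutive accesses into bursts under a simplified model.

    accesses  : list of (kind, variable) pairs, kind is "load" or "store"
    placement : dict mapping variable name -> word address
    Two neighbouring accesses are merged when they have the same kind and
    the second address is exactly one word above the first (assumption).
    Returns a list of (kind, [variables]) groups, one per memory transaction.
    """
    groups = []
    for kind, var in accesses:
        if groups:
            last_kind, last_vars = groups[-1]
            if last_kind == kind and placement[var] == placement[last_vars[-1]] + 1:
                last_vars.append(var)  # extend the current burst
                continue
        groups.append((kind, [var]))  # start a new transaction
    return groups


def best_placement(accesses):
    """Try every ordering of the variables and keep the one that needs the
    fewest transactions. Factorial cost: for illustration only."""
    variables = sorted({var for _, var in accesses})
    best = None
    for order in permutations(variables):
        placement = {var: addr for addr, var in enumerate(order)}
        cost = len(group_accesses(accesses, placement))
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


# Hypothetical access sequence: three loads followed by two stores.
accesses = [("load", "a"), ("load", "b"), ("load", "c"),
            ("store", "d"), ("store", "e")]

# A placement that ignores access order: no two neighbours are adjacent.
scattered = {"a": 0, "c": 1, "b": 2, "e": 3, "d": 4}
# A placement that follows access order: loads and stores each form a burst.
ordered = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}

print(len(group_accesses(accesses, scattered)))  # 5 single transfers
print(group_accesses(accesses, ordered))         # one load burst, one store burst
print(best_placement(accesses)[0])               # 2 transactions at best
```

Under this model the same five accesses cost five separate transfers with the scattered placement and only two burst transfers with the ordered one, which is the effect a placement optimizer tries to obtain across a whole program.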


Memory system; Energy consumption; Compiler optimization; Restructuring burst mode



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Computer Engineering, Yeungnam University, Gyeongsan, Korea
  2. Department of Electrical & Electronic Engineering, Sunchon National University, Suncheon, Korea
