Application Specific Memory Access, Reuse and Reordering for SDRAM

  • Samuel Bayliss
  • George A. Constantinides
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6578)

Abstract

The efficient use of the bandwidth available on an external SDRAM interface depends strongly on the sequence of addresses requested. On-chip memory buffers enable data reuse and request reordering, which together ensure that the bandwidth of an SDRAM interface is used efficiently. This paper outlines an automated procedure for generating an application-specific memory hierarchy that exploits reuse and reordering, and quantifies the impact this has on memory bandwidth over a range of representative benchmarks. Considering a range of parameterized designs, we observe up to a 50× reduction in the quantity of data fetched from external memory. Combined with reordering of the transactions, this allows up to a 128× reduction in the memory access time of certain memory-intensive benchmarks.
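To make the dependence on address order concrete, the following is a minimal sketch of a toy single-bank, open-row SDRAM cost model. It is not the procedure described in the paper: the row size, the cycle counts, the unbounded reuse buffer, and the function names (`access_cycles`, `apply_reuse`, `reorder_by_row`) are all assumptions chosen for illustration.

```python
from collections import OrderedDict

# Hypothetical geometry and timing, chosen only for illustration.
WORDS_PER_ROW = 8      # words held in one SDRAM row (assumed)
ROW_HIT_CYCLES = 1     # access to the row that is already open (assumed)
ROW_MISS_CYCLES = 10   # precharge + activate + access (assumed)


def access_cycles(addresses):
    """Cycles needed to serve the addresses in the given order."""
    open_row = None
    cycles = 0
    for addr in addresses:
        row = addr // WORDS_PER_ROW
        if row == open_row:
            cycles += ROW_HIT_CYCLES
        else:
            cycles += ROW_MISS_CYCLES
            open_row = row
    return cycles


def apply_reuse(addresses):
    """Drop repeated addresses, assuming an on-chip buffer large enough
    to keep every word that has already been fetched."""
    return list(OrderedDict.fromkeys(addresses))


def reorder_by_row(addresses):
    """Group requests by row (stable sort) so each row is opened once."""
    return sorted(addresses, key=lambda a: a // WORDS_PER_ROW)


# Column-wise walk over a 4x8 array stored row-major, performed twice.
trace = [r * 8 + c for _ in range(2) for c in range(8) for r in range(4)]

raw = access_cycles(trace)                                   # every access changes row
reused = apply_reuse(trace)                                  # second pass served on-chip
optimized = access_cycles(reorder_by_row(reused))            # one row activation per row
print(raw, len(trace), len(reused), optimized)
```

With these assumed parameters the raw trace issues 64 requests costing 640 cycles, reuse cuts the external requests to 32, and reordering the remaining requests by row brings the cost to 68 cycles. The ratios are artifacts of the toy numbers and say nothing about the paper's measured results.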

Keywords

Memory Access, Loop Nest, External Memory, Memory Bandwidth, Memory Address
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Samuel Bayliss (1)
  • George A. Constantinides (1)
  1. Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom
