PPMC: A Programmable Pattern Based Memory Controller

  • Tassadaq Hussain
  • Muhammad Shafiq
  • Miquel Pericàs
  • Nacho Navarro
  • Eduard Ayguadé
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7199)


One of the main challenges in the design of hardware accelerators is efficient access to data in external memory. Improving and optimizing the functionality of the memory controller between the external memory and the accelerators is therefore critical. In this paper, we advance toward this goal by proposing PPMC, the Programmable Pattern-based Memory Controller. This controller supports scatter-gather and strided 1D, 2D and 3D accesses with programmable tiling. Compared to existing solutions, the proposed system provides better performance, simplifies the programming of access patterns and eases software integration by interfacing to high-level programming languages. In addition, the controller offers an interface for automating domain decomposition via tiling. We implemented and tested PPMC on a Xilinx ML505 evaluation board using a MicroBlaze soft-core as the host processor. The evaluation uses six memory-intensive application kernels: Laplacian solver, FIR, FFT, Thresholding, Matrix Multiplication, and 3D-Stencil. The results show that the PPMC-enhanced system achieves speed-ups of at least 10x for 1D, 2D and 3D memory accesses compared to a non-PPMC setup.
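
To make the access patterns named in the abstract concrete, the following is a minimal software sketch of strided 1D/2D/3D address generation and 2D tiling. It is not PPMC's interface or hardware design: the function names `strided_addresses` and `tile_2d`, their parameters, the row-major layout and the 4-byte element size are all assumptions chosen for illustration.

```python
from itertools import product


def strided_addresses(base, counts, strides, elem_size=4):
    """Yield byte addresses for a strided pattern of any dimensionality.

    Hypothetical helper, not part of PPMC. `counts` and `strides` are
    listed outermost dimension first; strides are in elements.
    """
    if len(counts) != len(strides):
        raise ValueError("counts and strides must have the same length")
    # Walk the index space in row-major order (innermost index fastest).
    for idx in product(*(range(n) for n in counts)):
        offset = sum(i * s for i, s in zip(idx, strides))
        yield base + offset * elem_size


def tile_2d(base, rows, cols, tile_rows, tile_cols, elem_size=4):
    """Split a row-major rows x cols array into tiles.

    Hypothetical helper. Yields one list of byte addresses per tile;
    edge tiles are clipped to the array bounds.
    """
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            h = min(tile_rows, rows - r0)
            w = min(tile_cols, cols - c0)
            tile_base = base + (r0 * cols + c0) * elem_size
            # Inside a tile, stepping one row skips a full array row.
            yield list(strided_addresses(tile_base, (h, w), (cols, 1), elem_size))


if __name__ == "__main__":
    # 1D: every second element of a 4-byte array.
    print(list(strided_addresses(0, (4,), (2,))))
    # 2D: a 4x4 array decomposed into 2x2 tiles.
    for tile in tile_2d(0, 4, 4, 2, 2):
        print(tile)
```

In this sketch a pattern is fully described by a base address plus per-dimension counts and strides, which is the kind of compact descriptor a host processor could hand to a pattern-based controller instead of issuing each address itself.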


Keywords: Clock Cycle; Access Pattern; Direct Memory Access; Memory Controller; Physical Memory





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tassadaq Hussain (1)
  • Muhammad Shafiq (1)
  • Miquel Pericàs (1)
  • Nacho Navarro (2)
  • Eduard Ayguadé (1, 2)
  1. Barcelona Supercomputing Center, Spain
  2. Universitat Politècnica de Catalunya, Spain
