
Journal of Signal Processing Systems, Volume 90, Issue 11, pp 1533–1549

Memory Controller for Vector Processor

  • Tassadaq Hussain
  • Oscar Palomar
  • Osman S. Ünsal
  • Adrian Cristal
  • Eduard Ayguadé

Abstract

To manage the effects of the power and memory walls, the HPC industry supports FPGA reconfigurable accelerators and vector processing cores for data-intensive scientific applications. FPGA-based vector accelerators are used to increase the performance of high-performance application kernels. Adding more vector lanes does not improve performance if the processor/memory performance gap dominates. In addition, if on/off-chip communication time becomes more critical than computation time, performance degrades. The system incurs multiple delays due to the application's irregular data arrangement and complex scheduling scheme. Therefore, just like generic scalar processors, all classes of vector machines, from vector supercomputers to vector microprocessors, are required to have data management and access units that improve the on/off-chip bandwidth and hide main memory latency. In this work, we propose an Advanced Programmable Vector Memory Controller (PVMC), which boosts noncontiguous vector data accesses by integrating descriptors of memory patterns, a specialized on-chip memory, a memory manager in hardware, and multiple DRAM controllers. We implemented and validated the proposed system on an Altera DE4 FPGA board. The PVMC is also integrated with an ARM Cortex-A9 processor on the Xilinx Zynq All-Programmable System on Chip architecture. We compare its performance against systems with vector and scalar processors without PVMC. When compared with a baseline vector system, the results show that the PVMC system transfers data sets 1.40x to 2.12x faster, achieves speedups of 2.01x to 4.53x for 10 applications, and consumes 2.56 to 4.04 times less energy.
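
As an illustration of what a descriptor of a memory pattern might look like, the following is a minimal software sketch of a strided (noncontiguous) access described by a single record. It is not the PVMC hardware design: the class name `PatternDescriptor`, its fields `base`, `stride`, and `count`, and the helper functions `expand` and `gather` are all hypothetical names chosen for this example, and main memory is modeled as a flat Python list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PatternDescriptor:
    """Hypothetical strided-access descriptor (field names are assumptions)."""
    base: int    # index of the first element in the flat memory
    stride: int  # distance, in elements, between consecutive accesses
    count: int   # number of elements to transfer


def expand(desc: PatternDescriptor) -> List[int]:
    """Expand a descriptor into the list of element indices it covers."""
    return [desc.base + i * desc.stride for i in range(desc.count)]


def gather(memory: List[int], desc: PatternDescriptor) -> List[int]:
    """Collect the noncontiguous elements into a contiguous vector."""
    return [memory[index] for index in expand(desc)]


# Read column 1 of a 4x4 row-major matrix stored in a flat list.
memory = list(range(16))
column = gather(memory, PatternDescriptor(base=1, stride=4, count=4))
print(column)
```

In this sketch one compact record stands in for a whole sequence of addresses, which is the general idea behind describing an access pattern once rather than issuing each noncontiguous access separately.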

Keywords

Vector processor, Scalar core, SDRAM controller

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Tassadaq Hussain (1, 2, 3, 4)
  • Oscar Palomar (3, 4)
  • Osman S. Ünsal (3)
  • Adrian Cristal (3, 4, 5)
  • Eduard Ayguadé (3, 4)

  1. Riphah International University, Islamabad, Pakistan
  2. Unal Color of Education Research and Development, Islamabad, Pakistan
  3. Computer Sciences, Barcelona Supercomputing Center, Barcelona, Spain
  4. Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
  5. Artificial Intelligence Research Institute (IIIA), Centro Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
