High-Bandwidth Low-Latency Interfacing with FPGA Accelerators Using PCI Express

  • Mohammadsadegh Sadri
  • Christian De Schryver
  • Norbert Wehn


The need for high-performance computing places constraints on the acceptable bandwidth of data transfer between processing units and memory. Consequently, it is crucial to build high-performance, scalable, and energy-efficient architectures capable of completing data transfer requests at satisfactory rates. By using high-speed serial links instead of traditional parallel ones, PCI Express achieves higher transfer rates and offers a promising solution to the connectivity problem in today's complex heterogeneous architectures. In this chapter, we first cover the principles of interfacing using PCI Express. To illustrate a practical situation, we select the Xilinx Zynq device and develop an example architecture that allows the x86 CPU cores of the host system, the ARM cores of the Zynq device, and the hardware accelerators realized directly on the FPGA fabric of the Zynq to share the available DRAM, enabling efficient data exchange. We provide estimates of the achievable data transfer bandwidths in our architecture.


Keywords: Field Programmable Gate Array, Central Processing Unit, Central Processing Unit Core, Direct Memory Access, Hardware Accelerator
These keywords were added by machine and not by the authors.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Mohammadsadegh Sadri (1)
  • Christian De Schryver (1)
  • Norbert Wehn (1)
  1. Microelectronic Systems Design Research Group, University of Kaiserslautern, Kaiserslautern, Germany
