
Accelerate Communication, not Computation!

Chapter in: High-Performance Computing Using FPGAs

Abstract

Computer systems show a continuously increasing degree of parallelism in all areas. Stagnating single-thread performance as well as power constraints prevent a reversal of this trend; on the contrary, current projections show that the trend towards parallelism will accelerate. In cluster computing, scalability, and therefore the degree of parallelism, is limited by the network interconnect and its characteristics, such as latency, message rate, overlap, and bandwidth. While most interconnection networks focus on improving bandwidth, many applications are also highly sensitive to latency, message rate, and overlap. We present an interconnection network called EXTOLL that is specifically designed to improve characteristics such as latency, message rate, and overlap, rather than focusing solely on bandwidth. The key techniques to achieve this are designing EXTOLL as an integral part of the HPC system, providing dedicated support for multi-core environments, and designing and optimizing EXTOLL from scratch for the needs of high-performance computing. The most important parts of EXTOLL are the network interface and the network switch, which is a crucial resource when scaling the network. EXTOLL's network interface provides dedicated support for small messages in the form of eager communication and for bulk transfers in the form of rendezvous communication. Support for small messages is optimized mainly for high message rates and low latencies, whereas for bulk transfers the achievable overlap between communication and computation is maximized. EXTOLL is based entirely on FPGA technology, both for the network interface and for the switching. In this work we make a case for accelerated communication: instead of using FPGAs to speed up computational processes, we employ them to speed up communication. We will show that, in spite of the inferior performance characteristics of FPGAs compared to ASIC solutions, we can dramatically accelerate communication tasks and thus reduce the overall execution time.
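The two-protocol split described above can be sketched as a simple model. The following is an illustrative sketch only, not EXTOLL's actual API: the protocol threshold and the timing figures are hypothetical values chosen for the example, and the overlap model is a simplification in the spirit of LogP-style cost models, where a fraction of the communication time is hidden behind computation.

```python
# Hypothetical sketch of the eager/rendezvous split and the overlap benefit.
# EAGER_THRESHOLD is an assumed cutoff, not a measured EXTOLL parameter.
EAGER_THRESHOLD = 4096  # bytes


def choose_protocol(size: int) -> str:
    """Small messages go eager (optimized for low latency and high message
    rate); bulk transfers go rendezvous (optimized for overlap)."""
    return "eager" if size <= EAGER_THRESHOLD else "rendezvous"


def total_time(comm: float, compute: float, overlap: float) -> float:
    """Toy cost model: a fraction `overlap` (0..1) of the communication
    time is hidden behind independent computation."""
    hidden = min(comm * overlap, compute)
    return comm + compute - hidden


print(choose_protocol(256))      # small message -> eager
print(choose_protocol(1 << 20))  # 1 MiB bulk transfer -> rendezvous

# With full overlap, a 2 ms transfer behind 5 ms of computation adds no
# extra time; with no overlap, the two costs add up.
print(total_time(comm=2.0, compute=5.0, overlap=1.0))  # 5.0
print(total_time(comm=2.0, compute=5.0, overlap=0.0))  # 7.0
```

The model makes the chapter's point concrete: for bulk transfers, increasing achievable overlap reduces total execution time even when raw communication time is unchanged.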




Author information


Correspondence to Mondrian Nüssle.


Copyright information

© 2013 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Nüssle, M., Fröning, H., Kapferer, S., Brüning, U. (2013). Accelerate Communication, not Computation!. In: Vanderbauwhede, W., Benkrid, K. (eds) High-Performance Computing Using FPGAs. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1791-0_17

  • DOI: https://doi.org/10.1007/978-1-4614-1791-0_17

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1790-3

  • Online ISBN: 978-1-4614-1791-0

  • eBook Packages: Engineering (R0)
