Abstract
Computer systems show a continuously increasing degree of parallelism in all areas. Stagnating single-thread performance and power constraints prevent a reversal of this trend; on the contrary, current projections indicate that the trend towards parallelism will accelerate. In cluster computing, scalability, and therefore the achievable degree of parallelism, is limited by the network interconnect and its characteristics: latency, message rate, overlap, and bandwidth. While most interconnection networks focus on improving bandwidth, many applications are also highly sensitive to latency, message rate, and overlap. We present an interconnection network called EXTOLL, which is specifically designed to improve these characteristics rather than focusing solely on bandwidth. The key techniques to achieve this are designing EXTOLL as an integral part of the HPC system, providing dedicated support for multi-core environments, and designing and optimizing EXTOLL from scratch for the needs of high-performance computing. The most important parts of EXTOLL are the network interface and the network switch, a crucial resource when scaling the network. EXTOLL's network interface provides dedicated support for small messages in the form of eager communication and for bulk transfers in the form of rendezvous communication. Support for small messages is optimized mainly for high message rates and low latencies, while bulk transfers are optimized for the amount of overlap achievable between communication and computation. EXTOLL is based entirely on FPGA technology, both for the network interface and for switching. In this work we present a case for accelerated communication: instead of using FPGAs to speed up computational processes, we employ them to speed up communication.
We show that, despite the inferior performance characteristics of FPGAs compared to ASIC solutions, communication tasks can be dramatically accelerated, thus reducing overall execution time.
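The eager/rendezvous split described in the abstract can be made concrete with a toy LogGP-style cost model. The parameters below (one-way latency, bandwidth, and a three-latency rendezvous handshake) are illustrative assumptions for the sketch, not EXTOLL measurements:

```python
# Illustrative cost model (invented parameters, not EXTOLL data) for the two
# protocols named above: eager for small messages, rendezvous for bulk
# transfers whose data movement can overlap with computation.

L = 1.0e-6      # assumed one-way latency in seconds
G = 1.0 / 1e9   # assumed seconds per byte (i.e. 1 GB/s bandwidth)

def eager_time(nbytes):
    # Eager: the sender pushes the payload immediately.
    # Cost is one latency plus serialization of the payload.
    return L + nbytes * G

def rendezvous_time(nbytes, compute=0.0):
    # Rendezvous: a request/ack handshake costs extra latencies up front,
    # but the bulk transfer can then proceed while the host computes, so
    # only the portion of the transfer not hidden by computation is visible.
    handshake = 3 * L
    transfer = nbytes * G
    visible = max(transfer - compute, 0.0)
    return handshake + visible

small, big = 64, 1_000_000
# Eager wins for small messages (no handshake overhead):
assert eager_time(small) < rendezvous_time(small)
# With enough overlappable computation, rendezvous hides the bulk transfer:
assert rendezvous_time(big, compute=1e-3) < eager_time(big)
```

The crossover behavior is what motivates optimizing the small-message path for latency and message rate while optimizing the bulk path for overlap.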
© 2013 Springer Science+Business Media, LLC
Nüssle, M., Fröning, H., Kapferer, S., Brüning, U. (2013). Accelerate Communication, not Computation!. In: Vanderbauwhede, W., Benkrid, K. (eds) High-Performance Computing Using FPGAs. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1791-0_17
Print ISBN: 978-1-4614-1790-3
Online ISBN: 978-1-4614-1791-0