A Tightly Coupled Accelerator Infrastructure for Exact Arithmetics

  • Fabian Nowak
  • Rainer Buchty
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5974)


Processor speed and available computing power are constantly increasing, enabling the computation of ever more complex problems such as numerical simulations of physical processes. In this domain, however, accuracy becomes a problem because intermediate results are rounded. One solution is to avoid intermediate rounding by using exact arithmetic. Using FPGAs as application-specific accelerators can speed up such operations compared to their software implementations.
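To make the rounding problem concrete, the following minimal Python sketch contrasts a conventional floating-point dot product with one that accumulates exactly and rounds only once at the end. It is an illustration only, not the hardware design described in the paper; the function names `naive_dot` and `exact_dot` and the sample vectors are assumptions chosen for the example.

```python
from fractions import Fraction


def naive_dot(xs, ys):
    """Dot product in double precision: every product and sum is rounded."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def exact_dot(xs, ys):
    """Dot product with exact intermediate results and one final rounding."""
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        # Fraction(float) converts the double exactly, so no information is lost.
        acc += Fraction(x) * Fraction(y)
    return float(acc)  # the only rounding step


# Hypothetical sample data: the small term is absorbed by the large one
# when intermediate sums are rounded.
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]

print(naive_dot(xs, ys))  # 0.0
print(exact_dot(xs, ys))  # 1.0
```

The mathematically correct result is 1; the rounded accumulation loses it entirely, while the exact accumulation recovers it.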

In this paper, we present a system approach employing state-of-the-art FPGA and interconnection technology for exact arithmetic with double-precision operands. It delivers up to 400M exact multiply-accumulate operations (MACs) per second in total and provides a speedup of up to 88 times over competing software implementations in the case of matrix multiplication.
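As a rough software analogue of an exact MAC applied to matrix multiplication, the sketch below maps each double-precision operand to a wide fixed-point integer, accumulates the products without rounding, and rounds each output element once. This is an assumption-laden illustration, not the authors' FPGA architecture: the names `SCALE_BITS`, `to_fixed`, and `exact_matmul` are hypothetical, and Python's arbitrary-precision integers stand in for a wide hardware accumulator.

```python
from fractions import Fraction

# Every finite double is an integer multiple of 2**-1074 (the smallest
# subnormal), so scaling by 2**1074 yields an exact integer.
SCALE_BITS = 1074


def to_fixed(x: float) -> int:
    """Convert a finite double to an exact fixed-point integer."""
    n, d = x.as_integer_ratio()  # exact; d is a power of two <= 2**1074
    return n * ((1 << SCALE_BITS) // d)


def exact_matmul(a, b):
    """Multiply matrices (lists of rows) with exact accumulation."""
    rows, inner, cols = len(a), len(b), len(b[0])
    fa = [[to_fixed(v) for v in row] for row in a]
    fb = [[to_fixed(v) for v in row] for row in b]
    result = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = 0  # arbitrary-precision integer accumulator
            for k in range(inner):
                acc += fa[i][k] * fb[k][j]  # exact multiply-accumulate
            # Products carry a scale of 2**(2 * SCALE_BITS); round only here.
            out_row.append(float(Fraction(acc, 1 << (2 * SCALE_BITS))))
        result.append(out_row)
    return result


# Hypothetical sample data: a 1x3 by 3x1 product with heavy cancellation.
a = [[1e16, 1.0, -1e16]]
b = [[1.0], [1.0], [1.0]]
print(exact_matmul(a, b))  # [[1.0]]
```

Each output element is rounded exactly once, which is the property that distinguishes exact MACs from ordinary floating-point MACs.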


Keywords (machine-generated, not provided by the authors): Matrix Multiplication, Pipeline Stage, Accumulation Unit, Exact Arithmetic, Pipeline Register





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Fabian Nowak (1)
  • Rainer Buchty (1)
  1. Chair for Computer Architecture, Karlsruhe Institute of Technology, Karlsruhe, Germany
