Power and Thermal Efficient Numerical Processing



Numerical processing is at the core of applications in many areas ranging from scientific and engineering calculations to financial computing. These applications are usually executed on large servers or supercomputers to exploit their high speed, high level of parallelism and high bandwidth to memory.


Keywords: Power Dissipation, Clock Cycle, Division Algorithm, Quotient Digit, Average Power Dissipation



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Oticon A/S, Smørum, Denmark
  2. DTU Compute, Technical University of Denmark, Kongens Lyngby, Denmark
