Low Precision Processing for High Order Stencil Computations

  • Gagandeep Singh
  • Dionysios Diamantopoulos
  • Sander Stuijk
  • Christoph Hagleitner
  • Henk Corporaal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11733)


Modern scientific workloads have demonstrated the inefficiency of using high-precision formats. Moving to a lower bit width, or even to a different number system, can provide tremendous gains in performance and energy efficiency. In this article, we explore the applicability of different number formats and exhaustively search for the appropriate bit width for 3D complex stencil kernels, which are among the most widely used scientific kernels. Further, we demonstrate the achievable performance of these kernels on state-of-the-art hardware, including a CPU and an FPGA, the latter being the only hardware that supports arbitrary fixed-point precision. This work thus fills the gap between current hardware capabilities and future systems for stencil-based scientific applications.
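The bit-width exploration described above can be illustrated with a minimal sketch: run a reference stencil in double precision, rerun it while re-quantizing every intermediate result to a given number of fractional fixed-point bits, and compare the results. The 7-point 3D Jacobi stencil, the grid size, and the `quantize`/`jacobi_3d` helpers below are illustrative choices, not the paper's actual kernels or methodology.

```python
import numpy as np

def quantize(x, frac_bits):
    # Round to the nearest value representable with `frac_bits` fractional
    # bits (signed fixed point; saturation is omitted for brevity).
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

def jacobi_3d(grid, steps, frac_bits=None):
    # 7-point 3D Jacobi stencil. If frac_bits is given, every intermediate
    # grid is re-quantized, mimicking fixed-point arithmetic on an FPGA.
    a = grid.copy()
    for _ in range(steps):
        b = a.copy()
        b[1:-1, 1:-1, 1:-1] = (a[:-2, 1:-1, 1:-1] + a[2:, 1:-1, 1:-1] +
                               a[1:-1, :-2, 1:-1] + a[1:-1, 2:, 1:-1] +
                               a[1:-1, 1:-1, :-2] + a[1:-1, 1:-1, 2:]) / 6.0
        if frac_bits is not None:
            b = quantize(b, frac_bits)
        a = b
    return a

rng = np.random.default_rng(0)
g = rng.random((16, 16, 16))
ref = jacobi_3d(g, steps=10)  # float64 reference
for bits in (8, 12, 16, 20):
    err = np.max(np.abs(jacobi_3d(g, steps=10, frac_bits=bits) - ref))
    print(f"frac_bits={bits:2d}  max_abs_error={err:.2e}")
```

Sweeping `frac_bits` in this fashion reveals the smallest bit width that keeps the error within the application's tolerance, which is the kind of trade-off an arbitrary-precision FPGA datapath can then exploit.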



This work was performed in the framework of the Horizon 2020 program for the project “Near-Memory Computing (NeMeCo)”. It is funded by the European Commission under Marie Sklodowska-Curie Innovative Training Networks European Industrial Doctorate (Project ID: 676240). We would also like to thank Martino Dazzi for his valuable remarks. This work was partially supported by the H2020 research and innovation programme under grant agreement No 732631, project OPRECOMP.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Gagandeep Singh (1, 2)
  • Dionysios Diamantopoulos (2)
  • Sander Stuijk (1)
  • Christoph Hagleitner (2)
  • Henk Corporaal (1)

  1. Eindhoven University of Technology, Eindhoven, Netherlands
  2. IBM Research - Zurich, Rüschlikon, Switzerland
