GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications

  • Ignacio Laguna (email author)
  • Paul C. Wood
  • Ranvijay Singh
  • Saurabh Bagchi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11501)


We present GPUMixer, a tool that performs mixed-precision floating-point tuning on scientific GPU applications. Existing precision tuning techniques are designed for serial programs and are accuracy-driven: they consider configurations that satisfy accuracy constraints, but these configurations may degrade performance. GPUMixer, in contrast, takes a performance-driven approach to tuning. We introduce a novel static analysis that finds Fast Imprecise Sets (FISets), sets of low-precision operations that minimize type conversions and therefore often yield performance speedups. To estimate the relative error introduced by GPU mixed precision, we propose a shadow computations analysis for GPUs, the first of its class for multi-threaded applications. GPUMixer obtains performance improvements of up to \(46.4\%\) of the ideal speedup, compared with only \(20.7\%\) found by state-of-the-art methods.
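The idea behind shadow computation can be illustrated with a small CPU-side sketch: each low-precision value carries a higher-precision "shadow" that undergoes the same operations, and the relative error is read off by comparing the two. The code below is only an illustration under stated assumptions, not GPUMixer's implementation. The names `to_f32`, `Shadowed`, and `accumulate` are hypothetical, and binary32 arithmetic is emulated on the host by rounding binary64 results rather than being run on a GPU.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


class Shadowed:
    """A binary32 value paired with a binary64 shadow (hypothetical helper)."""

    def __init__(self, low: float, shadow: float = None):
        # The low-precision value is always stored rounded to binary32.
        self.low = to_f32(low)
        # The shadow keeps the full binary64 input unless one is supplied.
        self.shadow = float(low) if shadow is None else shadow

    def __add__(self, other: "Shadowed") -> "Shadowed":
        # For binary32 operands, computing in binary64 and rounding once
        # to binary32 gives the same result as a native binary32 addition.
        return Shadowed(to_f32(self.low + other.low), self.shadow + other.shadow)

    def __mul__(self, other: "Shadowed") -> "Shadowed":
        return Shadowed(to_f32(self.low * other.low), self.shadow * other.shadow)

    def rel_error(self) -> float:
        """Relative error of the low-precision value against its shadow."""
        if self.shadow == 0.0:
            return abs(self.low)  # fall back to absolute error at zero
        return abs(self.low - self.shadow) / abs(self.shadow)


def accumulate(n: int, step: float) -> Shadowed:
    """Add `step` to an accumulator `n` times, tracking both precisions."""
    acc = Shadowed(0.0)
    inc = Shadowed(step)
    for _ in range(n):
        acc = acc + inc
    return acc


if __name__ == "__main__":
    result = accumulate(10000, 0.1)
    print("low precision:", result.low)
    print("shadow       :", result.shadow)
    print("relative err :", result.rel_error())
```

A tuner could compare such an error estimate against a user-supplied threshold to decide whether a candidate low-precision configuration is acceptable; how GPUMixer makes that decision is not described in this abstract.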



We thank the anonymous reviewers for their suggestions and comments on the paper. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344 (LLNL-CONF-748618).



Copyright information

© 2019. This is a U.S. government work and not under copyright protection in the United States; foreign copyright protection may apply.

Authors and Affiliations

  • Ignacio Laguna (1), email author
  • Paul C. Wood (2)
  • Ranvijay Singh (3)
  • Saurabh Bagchi (4)
  1. Lawrence Livermore National Laboratory, Livermore, USA
  2. Johns Hopkins Applied Physics Lab, Laurel, USA
  3. NVIDIA Corporation, Santa Clara, USA
  4. Purdue University, West Lafayette, USA
