
Expression Isolation of Compiler-Induced Numerical Inconsistencies in Heterogeneous Code

  • Conference paper
High Performance Computing (ISC High Performance 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13948)

Abstract

As demand grows for developing and porting numerical applications to heterogeneous computing platforms, these applications may exhibit numerical inconsistencies caused by architectural differences and aggressive compiler optimizations. Such inconsistencies can hinder reproducibility and debugging. This paper presents Ciel, a tool designed to identify the root causes of compiler-induced numerical inconsistencies in heterogeneous programs. Ciel uses a floating-point precision enhancement strategy, guided by a recursive bisection search algorithm with increasing search granularity, to identify the program expressions that induce numerical inconsistencies due to compiler optimizations. Ciel achieves 99.4% precision in isolating numerical inconsistencies in both CPU and GPU programs, including 330 synthetic GPU programs, benchmark applications such as the NAS Parallel Benchmarks and Rodinia, and real-world scientific applications such as CLOUDSC, a cloud microphysics parameterization mini-app for the ECMWF IFS. Furthermore, compared with the state of the art, which isolates only lines of code in CPU programs, Ciel runs 24.5% fewer searches for statement isolation and produces more precise results for 84.9% of the programs. Finally, manual inspection of hundreds of compiler-induced numerical inconsistencies in heterogeneous programs reveals common characteristics.
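The abstract describes the search only at a high level. The Python sketch below shows one way a recursive bisection with increasing granularity could work; it is an illustration under stated assumptions, not Ciel's implementation. It assumes (hypothetically) that the program is pre-partitioned into a tree of regions (blocks, then statements, then expressions), that an oracle `fixes` reruns the program with the given leaf expressions evaluated in higher precision and reports whether, for example, the optimized and unoptimized builds then agree, and that this oracle is monotone (enhancing more expressions never brings the inconsistency back). The names `Region`, `bisect`, `isolate`, and the example tree are made up for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical region ids: (block,), (block, stmt), (block, stmt, expr).
Region = Tuple[str, ...]
# Hypothetical oracle: True if enhancing the precision of exactly these leaf
# expressions removes the inconsistency (assumed monotone).
Oracle = Callable[[FrozenSet[Region]], bool]


def bisect(cands: Sequence[Region],
           ok: Callable[[FrozenSet[Region]], bool],
           keep: FrozenSet[Region] = frozenset()) -> List[Region]:
    """Return R, a subset of cands, with ok(keep | R) True.

    Precondition: ok(keep | set(cands)) is True.
    """
    if len(cands) <= 1:
        # Drop the lone candidate if the regions already kept suffice.
        return [] if ok(keep) else list(cands)
    mid = len(cands) // 2
    left, right = list(cands[:mid]), list(cands[mid:])
    if ok(keep | frozenset(left)):
        return bisect(left, ok, keep)
    if ok(keep | frozenset(right)):
        return bisect(right, ok, keep)
    # Both halves contribute: isolate each half while the other stays enhanced.
    in_left = bisect(left, ok, keep | frozenset(right))
    in_right = bisect(right, ok, keep | frozenset(in_left))
    return in_left + in_right


def isolate(roots: List[Region],
            children: Dict[Region, List[Region]],
            fixes: Oracle) -> List[Region]:
    """Bisect over coarse regions, then refine each culprit one level at a time."""

    def leaves(r: Region) -> List[Region]:
        kids = children.get(r, [])
        return [r] if not kids else [x for k in kids for x in leaves(k)]

    def ok(regions: FrozenSet[Region]) -> bool:
        # Enhancing a region means enhancing every expression inside it.
        return fixes(frozenset(x for r in regions for x in leaves(r)))

    if not ok(frozenset(roots)):
        raise ValueError("enhancing every region does not remove the inconsistency")
    culprits = bisect(roots, ok)
    refined = True
    while refined:
        refined = False
        for i, c in enumerate(culprits):
            kids = children.get(c)
            if kids:
                # Replace culprit c by the subset of its children that is still
                # needed, keeping all other culprits enhanced during the search.
                others = culprits[:i] + culprits[i + 1:]
                culprits = others[:i] + bisect(kids, ok, frozenset(others)) + others[i:]
                refined = True
                break
    return culprits


if __name__ == "__main__":
    # Hypothetical program: three blocks, each split into statements and expressions.
    tree = {
        ("k1",): [("k1", "s1"), ("k1", "s2")],
        ("k1", "s1"): [("k1", "s1", "e1"), ("k1", "s1", "e2")],
        ("k1", "s2"): [("k1", "s2", "e1")],
        ("k2",): [("k2", "s1")],
        ("k2", "s1"): [("k2", "s1", "e1"), ("k2", "s1", "e2"), ("k2", "s1", "e3")],
        ("k3",): [("k3", "s1")],
        ("k3", "s1"): [("k3", "s1", "e1")],
    }
    needed = frozenset({("k1", "s1", "e2"), ("k2", "s1", "e1")})
    print(isolate([("k1",), ("k2",), ("k3",)], tree, lambda s: needed <= s))
    # -> [('k1', 's1', 'e2'), ('k2', 's1', 'e1')]
```

Searching coarse regions first means the finer levels are explored only inside regions already known to matter, which keeps the number of oracle calls (each a full program run in practice) small when most of the program is unrelated to the inconsistency.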

Notes

  1. Statements and blocks with no floating-point operations are recorded but excluded from precision enhancement.

  2. Pointers and array references are not categorized; their dereferences are directly cast.

  3. https://github.com/LLNL/pLiner/commit/ef94b40.
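Notes 1 and 2 concern precision enhancement, which the abstract names as Ciel's mechanism for isolating culprit expressions. As a toy illustration of why raising one expression's precision can make a compiler-induced discrepancy disappear, the Python sketch below assumes the discrepancy comes from fused multiply-add (FMA) contraction, one common optimization that a compiler may or may not apply, and emulates binary32 arithmetic with the standard `struct` module. The helpers `f32`, `mul_add_f32`, and `mul_add_enhanced` are hypothetical and are not part of Ciel.

```python
import struct


def f32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mul_add_f32(a: float, b: float, c: float, contract: bool) -> float:
    """Evaluate a*b + c in binary32, with or without FMA contraction."""
    if contract:
        # FMA: a single rounding of a*b + c. Emulated by computing in binary64
        # and rounding once; exact for the inputs below, but only approximate
        # in general because of double rounding.
        return f32(a * b + c)
    # No contraction: the product is rounded to binary32 before the add.
    return f32(f32(a * b) + c)


def mul_add_enhanced(a: float, b: float, c: float) -> float:
    """Evaluate a*b + c with its precision enhanced to binary64.

    For binary32 inputs the binary64 product is exact (at most 48 significand
    bits fit in 53), so rounding the product separately or fusing it into the
    add yields the same binary64 value: contraction no longer matters.
    """
    return a * b + c


a = b = 1 + 2**-12   # exactly representable in binary32
c = -(1 + 2**-11)    # exactly representable in binary32
assert f32(a) == a and f32(c) == c

plain = mul_add_f32(a, b, c, contract=False)  # product ties and rounds to 1 + 2**-11
fused = mul_add_f32(a, b, c, contract=True)
enhanced = mul_add_enhanced(a, b, c)
print(plain, fused, enhanced)
```

With these inputs the binary32 result is `0.0` without contraction and `2**-24` with it, while the binary64 version returns `2**-24` regardless of whether the product is contracted. In this toy model, that is the sense in which enhancing a single expression's precision removes the inconsistency.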

Acknowledgments

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-846081), the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under awards DE-SC0022182 and DE-SC0020286, and the National Science Foundation under award CCF-1750983.

Author information

Corresponding author

Correspondence to Dolores Miao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Miao, D., Laguna, I., Rubio-González, C. (2023). Expression Isolation of Compiler-Induced Numerical Inconsistencies in Heterogeneous Code. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_20

  • DOI: https://doi.org/10.1007/978-3-031-32041-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-32040-8

  • Online ISBN: 978-3-031-32041-5

  • eBook Packages: Computer Science, Computer Science (R0)
