Parallel computations are intrinsically non-reproducible because of the combined effect of non-deterministic parallel reductions and non-associative floating-point operations. Several strategies have been proposed in the literature to alleviate or eliminate this issue; however, there is currently no study of the performance impact of associative floating-point operations on large-scale applications. In this work, we implement associative operations using binned doubles in MiniFE and run performance tests on Cirrus and Fulhame, two state-of-the-art HPC systems.
Keywords
- Reproducibility
- Binned doubles
- Performance
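
To illustrate the reproducibility problem described in the abstract, the following minimal Python sketch sums the same four values in two different orders, as a non-deterministic parallel reduction might. It is not the binned-double implementation used in this work: the input values and the helper `left_to_right_sum` are hypothetical, and `math.fsum` (the standard library's accurately rounded summation) serves only as a stand-in for an order-independent sum.

```python
import math

def left_to_right_sum(xs):
    """Plain floating-point accumulation in the order given (hypothetical helper)."""
    total = 0.0
    for x in xs:
        total += x
    return total

# The same four values in two orders, mimicking two different
# reduction orders chosen by a parallel runtime.
order_a = [1.0, 1e100, 1.0, -1e100]
order_b = [1e100, -1e100, 1.0, 1.0]

sum_a = left_to_right_sum(order_a)  # the 1.0 terms are absorbed by 1e100
sum_b = left_to_right_sum(order_b)  # the large terms cancel first

# Stand-in for an order-independent summation (NOT binned doubles).
exact_a = math.fsum(order_a)
exact_b = math.fsum(order_b)

print("naive sums differ:", sum_a != sum_b)
print("fsum results agree:", exact_a == exact_b)
```

The naive sums differ only because of the order of operations, whereas an order-independent summation returns the same value for both orderings. That is the property associative floating-point operations are meant to provide.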