On the Performance of Parallel Tasking Runtimes for an Irregular Fast Multipole Method Application

  • Patrick Atkinson
  • Simon McIntosh-Smith
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10468)


This paper presents our work on optimising and comparing the performance of an irregular, task-based algorithm for the increasingly important fast multipole method. We aim to provide insight into how different synchronisation methods affect the performance of tree-based particle methods, and find that performance can be improved by 21% on some platforms. We also compare the performance of the chosen application across different OpenMP implementations and against other task-parallel programming models, and observe significant performance differences on both NUMA and Many Integrated Core architectures.


Keywords: OpenMP, Tasks, Mini-apps, Locks, Atomics



The authors would like to thank EPSRC for funding this work, as well as Bristol’s Intel Parallel Computing Centre (IPCC) for access to the KNL platform. We would also like to thank GW4 for access to the Isambard supercomputer for Broadwell results.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Merchant Venturers Building, Bristol, UK
