A Proposal for Supporting Speculation in the OpenMP taskloop Construct

  • Juan Salamanca
  • Alexandro Baldassin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11718)

Abstract

Parallelization constructs in OpenMP, such as parallel for or taskloop, are typically restricted to loops that have no loop-carried dependencies (DOALL) or that contain well-known structured dependence patterns (e.g. reduction). These restrictions prevent the parallelization of many computationally intensive may-DOACROSS loops. In such loops, the compiler cannot prove that the loop is free of loop-carried dependencies, although they may not exist at runtime. This paper proposes a new clause for taskloop that enables speculative parallelization of may-DOACROSS loops: the tls clause. We also present an initial evaluation that reveals that: (a) for certain loops, slowdowns using DOACROSS techniques can be transformed into speed-ups of up to \(2.14\times \) by applying speculative parallelization of tasks; and (b) the task scheduling implemented in the Intel OpenMP runtime exacerbates the rate of order-inversion aborts after applying the taskloop-tls parallelization to a loop.

Keywords

taskloop · DOACROSS · Thread-Level Speculation
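To make the proposal concrete, the sketch below shows how the tls clause might be written on a may-DOACROSS loop. The clause syntax is illustrative only: tls is the extension proposed in this paper and is not part of any OpenMP specification, so this code does not compile with a standard OpenMP compiler; the arrays and the function f are hypothetical placeholders.

```
/* Hypothetical use of the proposed tls clause (illustrative syntax,
 * not standard OpenMP).  The loop is may-DOACROSS: whether iteration i
 * touches a location used by a later iteration depends on the runtime
 * contents of index[], so the compiler cannot prove independence. */
#pragma omp parallel
#pragma omp single
#pragma omp taskloop tls grainsize(8)   /* proposed: execute task chunks speculatively */
for (int i = 0; i < n; i++) {
    int idx = index[i];          /* target known only at runtime      */
    a[idx] = a[idx] + f(a[i]);   /* possible loop-carried dependence  */
}
```

Under thread-level speculation, each chunk of iterations would run as a transaction and commit in loop order; a conflict or an out-of-order commit attempt (an order-inversion abort) would roll the chunk back and re-execute it.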

Notes

Acknowledgments

The authors would like to thank the anonymous reviewers for the insightful comments. This work is supported by FAPESP (grants 18/07446-8 and 18/15519-5).


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. São Paulo State University, São Paulo, Brazil
