Toward a Standard Interface for User-Defined Scheduling in OpenMP

  • Vivek Kale
  • Christian Iwainsky
  • Michael Klemm
  • Jonas H. Müller Korndörfer
  • Florina M. Ciorba
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11718)


Parallel loops are an important part of OpenMP programs, and scheduling them efficiently can improve program performance. The current OpenMP specification offers only three options for loop scheduling, which are insufficient in certain instances. Given the large number of other possible scheduling strategies, standardizing each of them is infeasible. A more viable approach is to extend the OpenMP standard so that a user can define loop scheduling strategies within her application, which enables standard-compliant, application-specific scheduling. This work analyzes the principal components required by user-defined scheduling and proposes two competing interfaces as candidates for the OpenMP standard. We conceptually compare the two proposed interfaces with respect to the three host languages of OpenMP, i.e., C, C++, and Fortran. These interfaces give the OpenMP community a basis for discussion and for prototype implementations that support user-defined scheduling in an OpenMP library.
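To make the idea of application-defined scheduling concrete, the following is a minimal sketch of a hand-written self-scheduling loop that uses only standard OpenMP constructs. It is not either of the interfaces proposed in the paper, which are not reproduced on this page. The names `self_scheduled_for` and `square_element` are hypothetical and introduced here for illustration only.

```c
#include <assert.h>

/* Hypothetical helper (not from the paper): fixed-size-chunk self-scheduling.
   Each thread repeatedly claims the next chunk of iterations from a shared
   counter until the iteration space [0, n) is exhausted. */
static void self_scheduled_for(long n, long chunk,
                               void (*body)(long i, void *arg), void *arg)
{
    assert(n >= 0 && chunk > 0);
    long next = 0; /* first iteration not yet claimed; shared by all threads */

    #pragma omp parallel shared(next)
    {
        for (;;) {
            long start;
            /* Atomically read the counter and advance it by one chunk. */
            #pragma omp atomic capture
            { start = next; next += chunk; }

            if (start >= n)
                break; /* no iterations left for this thread */

            long end = (start + chunk < n) ? start + chunk : n;
            for (long i = start; i < end; ++i)
                body(i, arg);
        }
    }
}

/* Hypothetical loop body: squares one array element in place. */
static void square_element(long i, void *arg)
{
    double *a = arg;
    a[i] = a[i] * a[i];
}
```

A call such as `self_scheduled_for(10, 3, square_element, a)` applies the body to every index of a 10-element array `a` exactly once. In this sketch the scheduling policy lives in how `start` and `end` are computed, so a different strategy would replace only that chunk calculation. If the code is compiled without OpenMP support, the pragmas are ignored and the loop runs serially with the same result.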


OpenMP, Multithreaded applications, Shared-memory programming, Multicore, Loop scheduling, Self-scheduling, User-defined loop scheduling, Dynamic load balancing, High performance computing



We thank Alice Koniges from Maui HPCC for providing us with NERSC’s cluster Cori for experimenting with machine learning applications using OpenMP, which helped us consider a relevant platform for user-defined scheduling. This work is partly funded by the Hessian State Ministry of Higher Education by granting the “Hessian Competence Center for High Performance Computing” and by the Swiss National Science Foundation in the context of the “Multi-level Scheduling in Large Scale High Performance Computers” (MLS) grant, number 169123.



Copyright information

© This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2019

Authors and Affiliations

  • Vivek Kale (1)
  • Christian Iwainsky (2)
  • Michael Klemm (3)
  • Jonas H. Müller Korndörfer (4)
  • Florina M. Ciorba (4)

  1. Brookhaven National Laboratory, Upton, USA
  2. Technische Universität Darmstadt, Darmstadt, Germany
  3. Intel Deutschland GmbH, Feldkirchen, Germany
  4. University of Basel, Basel, Switzerland
