Evaluating OpenMP 3.0 Run Time Systems on Unbalanced Task Graphs

  • Stephen L. Olivier
  • Jan F. Prins
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5568)


The UTS benchmark is used to evaluate task parallelism in OpenMP 3.0 as implemented in a number of recently released compilers and run-time systems. UTS performs parallel search of an irregular and unpredictable search space, as arises e.g. in combinatorial optimization problems. As such UTS presents a highly unbalanced task graph that challenges scheduling, load balancing, termination detection, and task coarsening strategies. Scalability and overheads are compared for OpenMP 3.0, Cilk, and an OpenMP implementation of the benchmark without tasks that performs all scheduling, load balancing, and termination detection explicitly. Current OpenMP 3.0 implementations generally exhibit poor behavior on the UTS benchmark.
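To make the shape of the problem concrete, the following is a minimal sketch of an unbalanced tree traversal written with OpenMP 3.0 tasks. It is not the authors' benchmark code. The names `mix`, `num_children`, `count_subtree`, `count_tree`, `count_serial`, and the constants `MAX_DEPTH` and `MAX_CHILDREN` are assumptions made for illustration, and the small integer mixer stands in for the cryptographic hash that the real benchmark uses to derive child nodes. The sketch only shows why the task graph is irregular: the number of children of a node is unknown until that node is visited.

```c
#include <stdint.h>

#define MAX_DEPTH    10  /* hypothetical cap that keeps the tree finite */
#define MAX_CHILDREN 3   /* hypothetical upper bound on children per node */

/* Illustrative stand-in for a node hash: a 64-bit integer mixer. */
static uint64_t mix(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* The child count depends only on the node id, so subtree sizes
   cannot be predicted without exploring them. */
static int num_children(uint64_t id, int depth) {
    if (depth >= MAX_DEPTH) return 0;
    return (int)(mix(id) % (MAX_CHILDREN + 1));  /* 0..MAX_CHILDREN */
}

/* Count the nodes below and including `id`, one task per child. */
static long count_subtree(uint64_t id, int depth) {
    int n = num_children(id, depth);
    long counts[MAX_CHILDREN] = {0};

    for (int i = 0; i < n; i++) {
        uint64_t child = mix(id ^ (uint64_t)(i + 1));
        /* counts must be shared explicitly; it would otherwise be
           copied into each task as firstprivate. */
        #pragma omp task shared(counts) firstprivate(i, child, depth)
        counts[i] = count_subtree(child, depth + 1);
    }
    /* Wait for the child tasks before reading their results. */
    #pragma omp taskwait

    long total = 1;  /* this node */
    for (int i = 0; i < n; i++) total += counts[i];
    return total;
}

/* Parallel entry point: one thread creates the root of the task graph,
   the rest of the team executes tasks as the run-time schedules them. */
long count_tree(uint64_t root_id) {
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = count_subtree(root_id, 0);
    return total;
}

/* Sequential reference traversal of the same tree. */
long count_serial(uint64_t root_id, int depth) {
    int n = num_children(root_id, depth);
    long total = 1;
    for (int i = 0; i < n; i++)
        total += count_serial(mix(root_id ^ (uint64_t)(i + 1)), depth + 1);
    return total;
}
```

Calling `count_tree(42)` and `count_serial(42, 0)` should give the same node count, since the tree is fully determined by the root id. Built without OpenMP support, the pragmas are ignored and the traversal runs sequentially. The sketch creates one task per node with no coarsening, which is the kind of fine-grained workload on which scheduling and task creation overheads become visible.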


Keywords: Load Balance, Schedule Strategy, Task Graph, Overhead Cost, Load Imbalance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Stephen L. Olivier (1)
  • Jan F. Prins (1)
  1. University of North Carolina at Chapel Hill, Chapel Hill, USA
