Performance Driven Cooperation between Kernel and Auto-tuning Multi-threaded Interval B&B Applications

  • Juan Francisco Sanjuan-Estrada
  • Leocadio Gonzalez Casado
  • Immaculada García
  • Eligius M. T. Hendrix
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7333)


Dynamically determining the number of threads for a multi-threaded application may yield higher efficiency than fixing that number in advance. Interval branch-and-bound (B&B) global optimization algorithms are typically irregular, so they may benefit from a dynamic number of threads. The question is how to obtain the online information needed to decide on the number of threads. We experiment with a scheme that follows an SPMD (Single Program, Multiple Data) and AMP (Asynchronous Multiple Pool) model. In this model all threads execute the same code, and they are consequently affected by the same types of blocked time.
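To illustrate why interval B&B workloads are irregular, the following is a minimal sequential sketch, not the authors' algorithm. The objective f(x) = x**2 - 2*x, the best-first selection rule, and all function names are assumptions chosen for illustration. The number of boxes processed depends on how quickly the upper bound improves and how tight the interval bounds are, which is hard to predict before the run.

```python
import heapq


def f_point(x):
    """Illustrative objective (an assumption, not from the paper)."""
    return x * x - 2.0 * x


def f_interval(lo, hi):
    """Natural interval extension of f on [lo, hi]: returns (lower, upper)."""
    # Range of x**2 on [lo, hi]
    sq_lo = 0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi)
    sq_hi = max(lo * lo, hi * hi)
    # Range of -2*x on [lo, hi] is [-2*hi, -2*lo]
    return sq_lo - 2.0 * hi, sq_hi - 2.0 * lo


def interval_bb(lo, hi, tol=1e-6):
    """Best-first interval B&B; returns (best value found, boxes processed)."""
    best = f_point(0.5 * (lo + hi))            # upper bound on the minimum
    work = [(f_interval(lo, hi)[0], lo, hi)]   # heap ordered by lower bound
    processed = 0
    while work:
        lower, a, b = heapq.heappop(work)
        processed += 1
        if lower > best:
            continue                           # box cannot improve on best
        mid = 0.5 * (a + b)
        best = min(best, f_point(mid))
        if b - a <= tol:
            continue                           # box is small enough
        for c, d in ((a, mid), (mid, b)):      # bisect and keep promising halves
            sub_lower = f_interval(c, d)[0]
            if sub_lower <= best:
                heapq.heappush(work, (sub_lower, c, d))
    return best, processed
```

Calling `interval_bb(-3.0, 4.0)` returns a value close to the true minimum of -1 together with the number of boxes that had to be examined. In a multi-threaded AMP variant each thread would hold its own pool of boxes, so the pending work per thread would vary in the same unpredictable way.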

There are several methods to measure the blocked time of an application. Our starting point is the per-task information provided by the Linux operating system (OS): the task_interruptible and task_uninterruptible blocked time. We build on this information to define new metrics that allow the kernel and applications to collaborate through system calls in order to decide on the number of threads for an application.
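As a rough user-space approximation of the idea, the sketch below samples the state of each thread from `/proc/<pid>/task/<tid>/stat` on Linux, where the state letter `S` denotes interruptible sleep and `D` uninterruptible sleep. This is a different technique from the kernel-level metrics and system calls the abstract describes: it samples instantaneous states rather than measuring blocked time. The function names, the `threshold` parameter, and the decision rule in `suggest_threads` are hypothetical.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state field of a /proc/.../stat line."""
    # The command name is enclosed in parentheses and may itself contain
    # spaces or parentheses, so the state is the first token after the last ')'.
    return stat_line[stat_line.rfind(")") + 1:].split()[0]


def sample_thread_states(pid):
    """Count the current states of all threads of process `pid` (Linux only)."""
    counts = Counter()
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as fh:
                counts[parse_state(fh.read())] += 1
        except OSError:
            continue  # the thread exited between listdir() and open()
    return counts


def blocked_fraction(counts):
    """Fraction of sampled threads in interruptible (S) or uninterruptible (D) sleep."""
    total = sum(counts.values())
    return (counts["S"] + counts["D"]) / total if total else 0.0


def suggest_threads(current, fraction, max_threads, threshold=0.25):
    """Hypothetical rule: shrink when many threads are blocked, otherwise grow."""
    if fraction > threshold:
        return max(1, current - 1)
    return min(max_threads, current + 1)
```

An application could call `sample_thread_states(os.getpid())` periodically and feed the resulting fraction into `suggest_threads`. A single sample is noisy, so a real controller would likely average several samples before changing the thread count.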


Keywords: Execution Time; Global Optimization Algorithm; Task Parallelism; Minimum Execution Time; Kernel Level
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Juan Francisco Sanjuan-Estrada (1)
  • Leocadio Gonzalez Casado (1)
  • Immaculada García (2)
  • Eligius M. T. Hendrix (2)
  1. Department of Computer Architecture and Electronics, University of Almería, Spain
  2. Department of Computer Architecture, University of Málaga, Spain
