Abstract
Determining the number of threads of a multi-threaded application dynamically may yield higher efficiency than fixing that number beforehand. Interval branch-and-bound (B&B) global optimization algorithms are typically irregular and may therefore benefit from a dynamic number of threads. The question is how to obtain the online information needed to decide on the number of threads. We experiment with a scheme that follows an SPMD (Single Program, Multiple Data) and AMP (Asynchronous Multiple Pool) model. All threads therefore execute the same code and are consequently affected by the same types of blocked time.
There exist several methods to measure the blocked time of an application. Our data are based on the information that the Linux operating system (OS) provides for tasks: the task_interruptible and task_uninterruptible blocked time. We build on this information to define new metrics that allow the kernel and applications to collaborate through system calls in order to decide on the number of threads for an application.
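As a rough illustration of the kind of per-task information Linux exposes, the sketch below estimates how often threads are observed in an interruptible or uninterruptible sleep state. It is a user-space sampling approximation and not the kernel-level metrics of this work: it assumes the one-letter state field of `/proc/[pid]/task/[tid]/stat` lines (`S` for interruptible sleep, `D` for uninterruptible sleep, as documented in proc(5)), and the function names `parse_state` and `blocked_fraction` are hypothetical.

```python
from collections import Counter

# One-letter state codes documented in proc(5) for /proc/[pid]/stat.
INTERRUPTIBLE = "S"    # sleeping in an interruptible wait
UNINTERRUPTIBLE = "D"  # waiting in an uninterruptible (usually disk) sleep


def parse_state(stat_line: str) -> str:
    """Return the state field of one /proc/[pid]/task/[tid]/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so the line is split after the last ')'.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def blocked_fraction(samples) -> float:
    """Fraction of sampled stat lines whose state is S or D.

    `samples` is any iterable of stat lines, e.g. collected by reading
    every thread's stat file at regular intervals (hypothetical usage).
    """
    counts = Counter(parse_state(line) for line in samples)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts[INTERRUPTIBLE] + counts[UNINTERRUPTIBLE]) / total
```

On a Linux machine the samples could be gathered by periodically reading the files matched by `/proc/self/task/*/stat`. A high fraction would suggest that adding threads is unlikely to help, but the threshold to act on is an assumption left to the reader.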
This work has been funded by grants from the Spanish Ministry of Science and Innovation (TIN2008-01117), and Junta de Andalucía (P11-TIC-7176), in part financed by the European Regional Development Fund (ERDF). Eligius M. T. Hendrix is a fellow of the Spanish “Ramón y Cajal” contract program, co-financed by the European Social Fund.
© 2012 Springer-Verlag Berlin Heidelberg
Sanjuan-Estrada, J.F., Casado, L.G., García, I., Hendrix, E.M.T. (2012). Performance Driven Cooperation between Kernel and Auto-tuning Multi-threaded Interval B&B Applications. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2012. ICCSA 2012. Lecture Notes in Computer Science, vol 7333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31125-3_5
DOI: https://doi.org/10.1007/978-3-642-31125-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31124-6
Online ISBN: 978-3-642-31125-3
eBook Packages: Computer Science, Computer Science (R0)