Abstract
With the wide adoption of Chip Multiprocessors (CMPs), software developers need to switch to parallel programming to reach the performance potential of CMPs and maximize their energy efficiency. Management overheads due to parallelization can cause sub-linear speedups and increase the energy consumption of parallel programs. In this paper, we investigate the parallelization overheads of Intel TBB with a particular focus on its victim selection policy. We implement an “all-knowing” oracle victim selection scheme as well as a pseudo-random scheme and compare them against TBB’s default random selection policy. We also break down TBB’s parallelization overheads and report how basic operations like task spawning, task stealing, and task dequeuing impact the energy footprint. Our experiments show that failed task stealing is by far the largest energy consumer; indeed, the oracle victim selection policy can reduce the application energy footprint by 13.6% compared to TBB’s default policy.
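To make the policy comparison concrete, the sketch below models victim selection in a toy work-stealing scheduler. It is illustrative only: the types and function names (Worker, pick_victim_random, pick_victim_oracle) are hypothetical and do not reflect Intel TBB’s internal implementation; the oracle here simply inspects every queue, standing in for the idealized “all-knowing” scheme evaluated in the paper.

```cpp
// Toy model of victim selection in a work-stealing scheduler.
// Hypothetical names; not Intel TBB's actual internals.
#include <cstddef>
#include <deque>
#include <random>
#include <vector>

struct Task { int id; };

struct Worker {
    std::deque<Task> local_queue;              // owner works on the back, thieves steal from the front
    std::mt19937 rng{std::random_device{}()};  // per-worker generator for victim selection
};

// Random policy (TBB-default style): pick a victim uniformly at random among
// the other workers. The chosen queue may be empty, in which case the steal
// attempt fails and must be retried.
std::size_t pick_victim_random(Worker& thief, std::size_t self, std::size_t num_workers) {
    std::uniform_int_distribution<std::size_t> dist(0, num_workers - 2);
    std::size_t v = dist(thief.rng);
    return (v >= self) ? v + 1 : v;            // skip the thief itself
}

// Idealized "oracle" policy: pick the worker with the most queued tasks, so a
// steal attempt only fails when no work exists anywhere.
std::size_t pick_victim_oracle(const std::vector<Worker>& workers, std::size_t self) {
    std::size_t best = self, best_size = 0;
    for (std::size_t i = 0; i < workers.size(); ++i) {
        if (i != self && workers[i].local_queue.size() > best_size) {
            best = i;
            best_size = workers[i].local_queue.size();
        }
    }
    return best;                               // best == self means nothing to steal
}

int main() {
    std::vector<Worker> workers(4);
    workers[2].local_queue.push_back({42});    // only worker 2 has work
    std::size_t r = pick_victim_random(workers[0], 0, workers.size());
    std::size_t o = pick_victim_oracle(workers, 0);
    // r may point at an empty queue (a failed steal); o is always worker 2 here.
    return (o == 2 && r != 0 && r < workers.size()) ? 0 : 1;
}
```

A pseudo-random scheme could be modeled in the same skeleton by replacing the uniform draw with a cheaper deterministic sequence; the paper’s energy measurements compare such policies against the default and the oracle.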
Keywords
- Intel TBB
- victim selection
- energy efficiency
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Iordan, A.C., Jahre, M., Natvig, L. (2014). Victim Selection Policies for Intel TBB: Overheads and Energy Footprint. In: Maehle, E., Römer, K., Karl, W., Tovar, E. (eds) Architecture of Computing Systems – ARCS 2014. ARCS 2014. Lecture Notes in Computer Science, vol 8350. Springer, Cham. https://doi.org/10.1007/978-3-319-04891-8_2
DOI: https://doi.org/10.1007/978-3-319-04891-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04890-1
Online ISBN: 978-3-319-04891-8