Characterizing the Impact of Using Spare-Cores on Application Performance
Increased parallelism on a single processor is driving improvements in peak performance at both the node and system levels. However, achievable performance, in particular for production scientific applications, is not always directly proportional to the core count. Performance is often limited by constraints in the memory hierarchy and by node interconnectivity. Even on state-of-the-art processors, containing between four and eight cores, many applications cannot take full advantage of the compute performance of all cores. This trend is expected to grow on future processors as the core count per processor increases. In this work we characterize the use of spare-cores, that is, cores that do not provide any improvement in application performance, on current multi-core processors. Using a pulse-width modulation method, we examine the possible performance profile of work run on a spare-core and quantify the situations in which its use does not impact application performance. We show that, for current AMD and Intel multi-core processors, spare-cores can be used for substantial computational tasks, but that their use can impact application performance when caches are shared or when main memory is accessed heavily.
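The abstract does not spell out how the pulse-width modulation method is implemented, so the following is only a minimal sketch of the general idea, under the assumption that the spare-core load alternates between a busy phase and an idle phase within a fixed period. The names `pwm_intervals` and `run_pwm_load`, the default work unit, and all parameters are hypothetical and not taken from the paper. Varying `duty_cycle` from 0 to 1 while timing the main application would give one point per setting on a performance profile.

```python
import time


def pwm_intervals(period_s, duty_cycle):
    """Split one period into (busy, idle) seconds for a duty cycle in [0, 1]."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in [0, 1]")
    busy_s = period_s * duty_cycle
    return busy_s, period_s - busy_s


def _default_work():
    # Hypothetical compute-only work unit (assumption): small enough to
    # stay in cache, so it stresses the core rather than main memory.
    return sum(i * i for i in range(1000))


def run_pwm_load(period_s, duty_cycle, cycles, work=_default_work):
    """Run `cycles` periods of busy/idle load; return work units completed."""
    busy_s, idle_s = pwm_intervals(period_s, duty_cycle)
    units = 0
    for _ in range(cycles):
        # Busy phase: repeat the work unit until the busy interval ends.
        deadline = time.perf_counter() + busy_s
        while time.perf_counter() < deadline:
            work()
            units += 1
        # Idle phase: yield the core for the rest of the period.
        if idle_s > 0:
            time.sleep(idle_s)
    return units


if __name__ == "__main__":
    # Sweep the duty cycle; in a real experiment the main application
    # would be timed alongside each setting.
    for duty in (0.0, 0.25, 0.5, 0.75, 1.0):
        done = run_pwm_load(period_s=0.01, duty_cycle=duty, cycles=20)
        print(f"duty={duty:.2f} work_units={done}")
```

In an actual measurement the load process would also need to be pinned to the spare core (on Linux, for example, with `os.sched_setaffinity`), and swapping the work unit for one that streams through a large array would probe the shared-cache and main-memory effects the abstract mentions. Both of those are assumptions about how such an experiment could be set up, not a description of the authors' setup.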