Abstract
In multi- and many-core processors, a shared Last Level Cache (LLC) is used to alleviate the performance problems caused by long-latency memory instructions. However, an unmanaged LLC may become nearly useless when the running threads have conflicting interests. At one extreme, a thread may benefit from every portion of the cache, whereas at the other, a thread may simply thrash the whole LLC. Recently, a variety of way-partitioning mechanisms have been introduced to improve cache performance. Today, almost all of these studies use the Utility-based Cache Partitioning (UCP) algorithm as their allocation policy. However, the UCP look-ahead algorithm, although it provides a better utility measure than its greedy counterpart, requires very complex hardware circuitry and dissipates a considerable amount of energy at the end of each decision period. In this study, we propose an offline supervised machine learning algorithm that replaces the UCP look-ahead circuitry with circuitry that has almost negligible hardware and energy cost. Our thorough analysis and simulation results show that, depending on the cache and processor configuration, the proposed mechanism reduces the overall transistor count by up to 5 % and the overall processor energy by up to 5 % without introducing any performance penalty.
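To make the look-ahead idea concrete, the following is a minimal software sketch of a look-ahead style way allocation. It is an illustration under stated assumptions, not the circuitry evaluated in this paper. It assumes each thread's miss curve (misses as a function of allocated ways) is already known; the names `marginal_utility`, `lookahead_partition`, and `min_ways`, as well as the miss-curve values, are hypothetical. The fallback for leftover ways when no thread benefits further is also an assumption of this sketch.

```python
def marginal_utility(misses, a, b):
    """Misses avoided per additional way when growing from a to b ways."""
    return (misses[a] - misses[b]) / (b - a)


def lookahead_partition(miss_curves, total_ways, min_ways=1):
    """Assign ways to threads using a look-ahead style search.

    miss_curves[t][w] is the (assumed known) miss count of thread t
    when it owns w ways, for w in 0..total_ways.
    """
    n = len(miss_curves)
    alloc = [min_ways] * n
    balance = total_ways - min_ways * n

    while balance > 0:
        best = None  # (utility per way, thread, extra ways)
        for t, misses in enumerate(miss_curves):
            # Look ahead over every feasible increment, not just +1 way.
            for extra in range(1, balance + 1):
                mu = marginal_utility(misses, alloc[t], alloc[t] + extra)
                if best is None or mu > best[0]:
                    best = (mu, t, extra)

        if best[0] <= 0:
            # Assumption of this sketch: when no thread gains anything,
            # hand out the leftover ways round-robin and stop.
            for i in range(balance):
                alloc[i % n] += 1
            break

        _, winner, extra = best
        alloc[winner] += extra
        balance -= extra

    return alloc


# Hypothetical miss curves for an 8-way cache (index = number of ways).
step_curve = [100, 100, 100, 100, 20, 20, 20, 20, 20]   # gains only at 4 ways
linear_curve = [100, 90, 80, 70, 60, 50, 40, 30, 20]    # steady gains
streaming_curve = [100] * 9                             # never benefits

print(lookahead_partition([step_curve, linear_curve], 8))
print(lookahead_partition([linear_curve, streaming_curve], 8))
```

In the first call, a purely greedy one-way-at-a-time policy would see no benefit in giving the step-shaped thread a second way, whereas the look-ahead search notices the drop at four ways. The nested search over threads and candidate increments at every step gives a rough sense of why a hardware realization of this search can be costly.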
Acknowledgement
This work is supported by the Scientific and Technical Research Council of Turkey (TUBITAK) for the Wise-Cache Project under Grant No. 114E119.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Guney, I.A., Yildiz, A., Bayindir, I.U., Serdaroglu, K.C., Bayik, U., Kucuk, G. (2015). A Machine Learning Approach for a Scalable, Energy-Efficient Utility-Based Cache Partitioning. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_29
DOI: https://doi.org/10.1007/978-3-319-20119-1_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20118-4
Online ISBN: 978-3-319-20119-1
eBook Packages: Computer Science, Computer Science (R0)