
Lowering the volatility: a practical cache allocation prediction and stability-oriented co-runner scheduling algorithms

The Journal of Supercomputing

Abstract

Accurate, quantitative analysis of cache behavior on a chip multiprocessor (CMP) machine has long been challenging. So far there has been no practical way to predict the cache allocation, i.e., the allocated cache size, of a running program. For many applications, especially those with heavy user interaction, cache allocation should be estimated with high accuracy, since its variation is closely tied to the stability of system performance, which is important for the efficient operation of servers and strongly influences user experience. To this end, this paper proposes an accurate prediction model for the last-level cache (LLC) allocation of co-runners. Building on the predicted cache allocation, we further implemented a performance-stability-oriented co-runner scheduling algorithm that aims to maximize the number of co-runners running in a performance-stable state and to minimize the performance variation of the unstable ones. We demonstrate that the proposed prediction algorithm achieves high accuracy, with an average error of 5.7%, and that the co-runner scheduling algorithm finds the optimal solution under the specified target with a time complexity of O(n).
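The abstract's scheduling goal, pairing programs so that as many co-runners as possible stay performance-stable, can be illustrated with a common contention-aware heuristic: pair the most cache-sensitive program with the least sensitive one. This is only a minimal sketch of the general idea, not the paper's algorithm; the `sensitivity` scores and program names are hypothetical placeholders for whatever metric a predictor would supply.

```python
def pair_corunners(sensitivity):
    """Pair programs for dual-core co-scheduling.

    sensitivity: dict mapping program name -> cache-sensitivity score
    (a hypothetical metric; higher means performance varies more when
    the shared LLC shrinks). Pairing high- with low-sensitivity programs
    is a classic heuristic for reducing contention-induced variation.
    """
    # Rank programs from least to most cache-sensitive.
    ranked = sorted(sensitivity, key=sensitivity.get)
    pairs = []
    lo, hi = 0, len(ranked) - 1
    while lo < hi:
        # Co-schedule the least sensitive program with the most sensitive.
        pairs.append((ranked[lo], ranked[hi]))
        lo += 1
        hi -= 1
    if lo == hi:
        # Odd program out runs alone.
        pairs.append((ranked[lo],))
    return pairs

# Hypothetical SPEC-like workload names with made-up sensitivity scores.
print(pair_corunners({"mcf": 0.9, "libquantum": 0.8, "gcc": 0.3, "milc": 0.1}))
```

After sorting once, the two-pointer sweep visits each program a constant number of times, which is how a pairing pass can stay linear beyond the initial sort; the paper's O(n) claim refers to its own algorithm, not this sketch.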




Acknowledgments

This work was supported by the Huawei Innovation Research Program (HIPRO, Grant No. YB2015080028).


Corresponding author

Correspondence to Xiaofeng Gao.


Cite this article

Wang, F., Gao, X. & Chen, G. Lowering the volatility: a practical cache allocation prediction and stability-oriented co-runner scheduling algorithms. J Supercomput 72, 1126–1151 (2016). https://doi.org/10.1007/s11227-016-1645-7
