HSCS: a hybrid shared cache scheduling scheme for multiprogrammed workloads

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

The traditional dynamic random-access memory (DRAM) storage medium can now be integrated on chip via emerging 3D-stacking technology to architect a DRAM shared cache in multicore systems. Compared with static random-access memory (SRAM), DRAM offers larger capacity but higher latency. Much existing work improves workload performance by using SRAM and stacked DRAM together in shared cache systems, ranging from improving the SRAM structure to optimizing cache-tag and data accesses. However, little attention has been paid to designing a shared cache scheduling scheme for multiprogrammed workloads with different memory footprints in multicore systems. Motivated by this, we propose HSCS, a hybrid shared cache scheduling scheme that allows a multicore system to utilize SRAM and 3D-stacked DRAM efficiently, thus achieving better workload performance. The scheme employs (1) a cache monitor, which collects cache statistics; (2) a cache evaluator, which evaluates the collected cache information while programs execute; and (3) a cache switcher, which self-adaptively selects the SRAM or DRAM shared cache module. A cache data migration policy is also developed to guarantee that the scheduling scheme works correctly. Extensive experiments are conducted to evaluate the workload performance of the proposed scheme. The results show that our method improves multiprogrammed workload performance by up to 25% compared with state-of-the-art methods (including conventional and DRAM cache systems).
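
The abstract describes a monitor/evaluator/switcher pipeline coupled with a data migration policy. As a rough illustration of how such a pipeline could be wired together, the following Python sketch runs the three components over one scheduling interval. The class names, the miss-rate threshold, and the toy access trace are our own assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from enum import Enum


class CacheModule(Enum):
    SRAM = "sram"   # smaller capacity, lower latency
    DRAM = "dram"   # larger 3D-stacked capacity, higher latency


@dataclass
class CacheStats:
    accesses: int = 0
    misses: int = 0

    @property
    def miss_rate(self) -> float:
        return self.misses / self.accesses if self.accesses else 0.0


class CacheMonitor:
    """Collects shared-cache statistics for the current scheduling interval."""

    def __init__(self) -> None:
        self.stats = CacheStats()

    def record_access(self, hit: bool) -> None:
        self.stats.accesses += 1
        if not hit:
            self.stats.misses += 1

    def drain(self) -> CacheStats:
        # Hand this interval's statistics to the evaluator and start a new interval.
        stats, self.stats = self.stats, CacheStats()
        return stats


class CacheEvaluator:
    """Decides whether the observed footprint favors the small, fast SRAM
    module or the large, slower stacked-DRAM module (hypothetical metric)."""

    def __init__(self, miss_rate_threshold: float = 0.3) -> None:
        self.threshold = miss_rate_threshold  # assumed tuning knob

    def preferred_module(self, stats: CacheStats) -> CacheModule:
        return CacheModule.DRAM if stats.miss_rate > self.threshold else CacheModule.SRAM


class CacheSwitcher:
    """Activates the chosen module; a real switch would also run the data
    migration policy so cached lines follow the workload."""

    def __init__(self) -> None:
        self.active = CacheModule.SRAM

    def switch_if_needed(self, target: CacheModule) -> None:
        if target != self.active:
            print(f"migrating cached data: {self.active.value} -> {target.value}")
            self.active = target


# One scheduling interval: monitor -> evaluate -> switch.
monitor, evaluator, switcher = CacheMonitor(), CacheEvaluator(), CacheSwitcher()
for hit in [True, False, False, True, False]:  # toy access trace
    monitor.record_access(hit)
switcher.switch_if_needed(evaluator.preferred_module(monitor.drain()))
```

Here a single miss-rate threshold stands in for whatever footprint or benefit metric the paper actually evaluates; the sketch only shows how monitoring, evaluation, switching, and migration can be sequenced each interval.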

Acknowledgements

We would like to acknowledge the editors and anonymous reviewers for their careful work and instructive suggestions. We also thank Dr. Zhi-Jie Wang for his warm help and advice. This work was supported by the National Basic Research Program of China (2015CB352403), the National Natural Science Foundation of China (Grant Nos. 61261160502, 61272099, 61303012, 61572323, and 61628208), the Scientific Innovation Act of STCSM (13511504200), the EU FP7 CLIMBER project (PIRSES-GA-2012-318939), and the CCF-Tencent Open Fund.

Author information

Correspondence to Dingyu Yang or Minyi Guo.

Additional information

Jingyu Zhang received the BE degree in communication engineering from Hunan Normal University, China in 2008, and the ME degree in computer science from Chongqing Jiaotong University, China in 2010. He is currently a PhD candidate at the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. He was a visiting PhD student at the Ohio State University, USA, from 2014 to 2016. His research interests include computer architecture, mobile networks, and energy optimization.

Chentao Wu received the PhD degree in electrical and computer engineering from Virginia Commonwealth University, USA in 2012. He received the ME degree in software engineering in 2006 and the BE degree in computer science and technology in 2004, both from Huazhong University of Science and Technology, China. He is currently an associate professor in the Department of Computer Science and Engineering at Shanghai Jiao Tong University, China. His research interests include computer architecture and data storage systems.

Dingyu Yang received the BE and ME degrees from Kunming University of Science and Technology, China, and the PhD degree from Shanghai Jiao Tong University, China, all in computer science. He is currently an assistant professor at Shanghai Dian Ji University, China. His research interests include resource prediction and anomaly detection in cloud computing and big data.

Yuanyi Chen received the BS degree from Sichuan University, China in 2010, and the ME degree from Zhejiang University, China in 2013. From September 2014 to September 2015, he was a jointly supervised PhD candidate at Hong Kong Polytechnic University, China. He is currently a PhD candidate at the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. His research interests include the Internet of Things and mobile computing.

Xiaodong Meng is a PhD student in computer science at Shanghai Jiao Tong University, China. He received his bachelor’s degree in electronic information science and technology from Wuhan University, China in 2007, and his master’s degree in information technology from Monash University, Australia in 2010. His main interests are parallel and distributed computing, social graph processing, and storage systems.

Liting Xu received the BS and Master’s degrees from Jiangsu University, China in 2005 and 2007, respectively, both in computer science and applications. From 2008 to 2011, she was a Software R&D Engineer at TERAOKA Weigh-System Pte. Ltd., Singapore. From 2011 to 2012, she was with ASM Technology Singapore Pte. Ltd. as a Software R&D Engineer. After that, she was a Software Modelling Engineer at Continental Automotive Singapore Pte. Ltd. In May 2013, she joined Shanghai Jiao Tong University, China as a Research Engineer. Her present research interests include computer supported cooperative work (CSCW), applications of distributed computing, and Web visualization on the Grid.

Minyi Guo received the BS and ME degrees in computer science from Nanjing University, China in 1982 and 1986, respectively, and the PhD degree in information science from the University of Tsukuba, Japan in 1998. From 1998 to 2000, he was a research associate at NEC Soft, Ltd., Japan. He was a visiting professor at the Department of Computer Science, Georgia Institute of Technology, USA. He was a full professor at the University of Aizu, Japan, and is now the head of the Department of Computer Science and Engineering at Shanghai Jiao Tong University, China. He has published more than 150 papers in well-known conferences and journals. His research interests include automatic parallelization and data-parallel languages, bioinformatics, compiler optimization, high-performance computing, and pervasive computing.

About this article

Cite this article

Zhang, J., Wu, C., Yang, D. et al. HSCS: a hybrid shared cache scheduling scheme for multiprogrammed workloads. Front. Comput. Sci. 12, 1090–1104 (2018). https://doi.org/10.1007/s11704-017-6349-5
