Adaptive prefetching using global history buffer in multicore processors

The Journal of Supercomputing

Abstract

Data prefetching is a well-known technique for hiding memory latency at the last-level cache (LLC). Among the many prefetching methods proposed in recent years, the Global History Buffer (GHB) has proven efficient in terms of both cost and speedup. In this paper, we show that using one fixed value both for pattern detection (the pattern length) and for the prefetch degree makes GHB (1) conservative when there are more opportunities to generate new addresses and (2) prone to generating wrong addresses in the presence of constant strides. To resolve these problems, we separate the pattern length from the prefetch degree. The result is an aggressive prefetcher that can generate more addresses for a given pattern length. Furthermore, with a variable pattern length mechanism, constant strides are grouped so that more accurate patterns are detected. Because this prefetcher is relatively aggressive, we further propose an efficient throttling procedure, based on a new measure of cache pollution, to reduce the negative effects of wrong prefetches. This adaptive method is suitable for chip multiprocessors (CMPs), where the prefetcher resides in the shared LLC. Simulation results with a mixed suite of integer and floating-point benchmarks from SPEC CPU2006 show that, on a single-core processor, the aggressive and adaptive methods outperform existing prefetchers by 48 % and 28 %, respectively, while increasing memory traffic by 20 % and 14 %, respectively. Furthermore, on an 8-core CMP running a mix of multiprogrammed workloads, the adaptive method outperforms state-of-the-art throttling methods by 8 % in speedup while reducing memory traffic by 3 %.
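
Because the abstract hinges on how GHB with global delta correlation (G/DC) [11] turns a history of misses into prefetch addresses, a small software model may help. The Python sketch below is hypothetical and simplified: it assumes miss addresses at cache-line granularity, and it replaces the hardware index table and linked-list pointers of a real GHB with a linear backward scan over a bounded history. The names `GHBDeltaPrefetcher`, `pattern_len`, `degree`, and `history_size` are illustrative. In the original G/DC scheme the correlation key is a pair of consecutive deltas (here `pattern_len=2`); the sketch keeps `pattern_len` and `degree` as separate parameters only to illustrate the decoupling discussed above, and it is not the authors' aggressive or adaptive prefetcher.

```python
from collections import deque


class GHBDeltaPrefetcher:
    """Simplified software model of GHB-style global delta correlation (G/DC).

    Hypothetical sketch: a linear scan over a bounded miss history stands in
    for the hardware index table and linked-list pointers of a real GHB.
    """

    def __init__(self, history_size=256, pattern_len=2, degree=4):
        # FIFO of recent miss addresses (oldest entries fall off when full).
        self.history = deque(maxlen=history_size)
        # Number of most recent deltas used as the correlation key.
        self.pattern_len = pattern_len
        # Maximum number of prefetch addresses issued per miss.
        self.degree = degree

    def on_miss(self, addr):
        """Record a miss address and return the addresses to prefetch."""
        self.history.append(addr)
        hist = list(self.history)
        # Deltas between consecutive miss addresses in global (not per-PC) order.
        deltas = [b - a for a, b in zip(hist, hist[1:])]
        k = self.pattern_len
        # Need the key plus at least one earlier delta to find a match.
        if len(deltas) < k + 1:
            return []
        key = deltas[-k:]
        # Scan backwards for the most recent earlier occurrence of the key.
        for start in range(len(deltas) - k - 1, -1, -1):
            if deltas[start:start + k] == key:
                # Replay up to `degree` deltas that followed the match.
                follow = deltas[start + k:start + k + self.degree]
                prefetches = []
                next_addr = addr
                for d in follow:
                    next_addr += d
                    prefetches.append(next_addr)
                return prefetches
        return []


if __name__ == "__main__":
    # Miss addresses in cache-line units with a repeating delta pattern 1, 2, 3.
    pf = GHBDeltaPrefetcher(pattern_len=2, degree=4)
    for line in [0, 1, 3, 6, 7, 9, 12]:
        candidates = pf.on_miss(line)
    print(candidates)  # [13, 15, 18]
```

With strided misses at 0, 64, 128 and 192 and `degree=4`, this sketch returns only [256], because just one recorded delta follows the matched pair; a prefetcher that only replays recorded history can therefore issue fewer addresses than its degree allows.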

Notes

  1. Throughout this paper, by GHB we mean a GHB with global delta correlation (G/DC) [11].

References

  1. International Technology Roadmap for Semiconductors (ITRS). http://www.itrs.net/links/2010itrs

  2. Palacharla S, Jouppi NP, Smith JE (1997) Complexity-effective superscalar processors. In: Proceedings of international symposium on computer architecture, pp 206–218

  3. Reinman G, Austin T, Calder B (1999) A scalable front-end architecture for fast instruction delivery. In: Proceedings of international symposium on computer architecture, pp 234–245

  4. Camacho ON, Villa VLA, Espinosa SO (2007) High performance cache. In: Proceedings of the international conference on computer design, pp 181–187

  5. Bellas NE, Hajj IN, Polychronopoulos CD (2000) Using dynamic cache management techniques to reduce energy in general purpose processors. IEEE Trans Very Large Scale Integr Syst 8:693–708

  6. Ku JC, Ozdemir S, Ismail Y (2006) Power density minimization for highly-associative caches in embedded processors. In: Proceedings of the ACM Great Lakes symposium on VLSI, pp 100–104

  7. Gove D (2007) CPU2006 working set size. ACM SIGARCH Comput Archit News 35:90–96

  8. Prakash TK, Peng L (2008) Performance characterization of SPEC CPU2006 benchmarks on Intel Core 2 Duo processor. ISAST Trans Comput Softw Eng 2:36–41

  9. Wang Z, Burger D, McKinley KS, Reinhardt SK, Weems CC (2003) Guided region prefetching: a cooperative hardware/software approach. In: Proceedings of international symposium on computer architecture, pp 388–398

  10. Spracklen L, Chou Y, Abraham SG (2005) Effective instruction prefetching in chip multiprocessors for modern commercial applications. In: Proceedings of international symposium on high performance computer architecture, pp 225–236

  11. Nesbit KJ, Smith JE (2004) Data cache prefetching using a global history buffer. In: Proceedings of international symposium on high performance computer architecture, pp 96–105

  12. Sair S, Sherwood T, Calder B (2003) A decoupled predictor-directed stream prefetching architecture. IEEE Trans Comput 52:260–276

  13. Liu G, Huang Z, Peir J-K, Shi X, Peng L (2011) Enhancements for accurate and timely streaming prefetcher. J Instr Level Parallelism 13

  14. Smith AJ (1982) Cache memories. ACM Comput Surv 14:473–530

  15. Chen T, Baer J (1995) Effective hardware-based data prefetching for high-performance processors. IEEE Trans Comput 44:609–623

  16. Wang K, Franklin M (1997) Highly accurate data value prediction using hybrid predictors. In: Proceedings of international symposium on microarchitecture, pp 281–290

  17. Charney M, Reeves A (1995) Generalized correlation based hardware prefetching. Technical Report EE-CEG-95-1, Cornell University

  18. Joseph D, Grunwald D (1997) Prefetching using Markov predictors. In: Proceedings of international symposium on computer architecture, pp 252–263

  19. Perez DG, Mouchard G, Temam O (2004) Microlib: a case for the quantitative comparison of micro-architecture mechanisms. In: Proceedings of the international symposium on microarchitecture, pp 43–54

  20. Srinath S, Mutlu O, Kim H, Patt YN (2007) Feedback directed prefetching: improving the performance and bandwidth-efficiency of hardware prefetchers. In: Proceedings of international symposium on high performance computer architecture, pp 63–74

  21. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, Sen R, Sewell K, Shoaib M, Vaish N, Hill MD, Wood DA (2011) The gem5 simulator. SIGARCH Comput Archit News 39:1–7

  22. Standard Performance Evaluation Corporation (SPEC) CPU2006 benchmark suite. http://www.spec.org/cpu2006

  23. Verma S, Koppelman DM, Peng L (2011) Efficient prefetching with hybrid schemes and use of program feedback to adjust prefetcher aggressiveness. J Instr Level Parallelism 13

  24. Dahlgren F, Stenstrom P (1995) Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors. In: Proceedings of symposium on high-performance computer architecture, pp 68–77

  25. Dimitrov M, Zhou H (2011) Combining local and global history for high performance data prefetching. J Instr Level Parallelism 13

  26. Sharif A, Lee HS (2011) Data prefetching by exploiting global and local access patterns. J Instr Level Parallelism 13

  27. Nesbit KJ, Smith JE (2004) AC/DC: an adaptive data cache prefetcher. In: Proceedings of international conference on parallel architectures and compilation techniques, pp 135–145

  28. Diaz P, Cintra M (2009) Stream chaining: exploiting multiple levels of correlation in data prefetching. In: Proceedings of international symposium on computer architecture, pp 81–92

  29. Grannaes M, Jahre M, Natvig L (2011) Storage efficient hardware prefetching using delta-correlating prediction tables. J Instr Level Parallelism 13

  30. Somogyi S, Wenisch TF, Ailamaki A, Falsafi B, Moshovos A (2006) Spatial memory streaming. In: Proceedings of international symposium on computer architecture, pp 252–263

  31. Ebrahimi E, Mutlu O, Lee CJ, Patt YN (2009) Coordinated control of multiple prefetchers in multi-core systems. In: Proceedings of the international symposium on microarchitecture, pp 316–326

  32. Dang X, Wang X, Tong D, Lu J, Yi J, Wang K (2012) S/DC: a storage and energy efficient data prefetcher. In: Proceedings of the international conference on design, automation and test in Europe, pp 461–466

Author information

Correspondence to Mahmood Naderan-Tahan.

About this article

Cite this article

Naderan-Tahan, M., Sarbazi-Azad, H. Adaptive prefetching using global history buffer in multicore processors. J Supercomput 68, 1302–1320 (2014). https://doi.org/10.1007/s11227-014-1088-y

  • DOI: https://doi.org/10.1007/s11227-014-1088-y