Adaptive prefetching using global history buffer in multicore processors

Naderan-Tahan, Mahmood; Sarbazi-Azad, Hamid

doi:10.1007/s11227-014-1088-y

Published: 17 January 2014

Volume 68, pages 1302–1320, (2014)

The Journal of Supercomputing

Mahmood Naderan-Tahan¹ &
Hamid Sarbazi-Azad¹

Abstract

Data prefetching is a well-known technique to hide the memory latency in the last-level cache (LLC). Among the many prefetching methods proposed in recent years, the Global History Buffer (GHB) has proved efficient in terms of cost and speedup. In this paper, we show that using a fixed value for both pattern detection and prefetch degree makes GHB (1) conservative when there are more opportunities to create new addresses and (2) prone to generating wrong addresses in the presence of constant strides. To resolve these problems, we separate the pattern length from the prefetching degree. The result is an aggressive prefetcher that can generate more addresses with a given pattern length. Furthermore, with a variable pattern length mechanism, constant strides are grouped so that more accurate patterns are detected. As the aggressiveness of this prefetcher is relatively high, we further propose an efficient throttling procedure that reduces the negative effects of wrong prefetches using a new measure of cache pollution. This adaptive method is suitable for chip multiprocessors (CMPs) where the prefetcher resides in the shared LLC. Simulation results with a mixed suite of integer and floating-point benchmarks from SPEC CPU2006 show that on a single-core processor the aggressive and adaptive methods outperform existing prefetchers by 48% and 28%, respectively, while increasing memory traffic by 20% and 14%, respectively. On an 8-core CMP with a mix of multiprogrammed workloads, the adaptive method outperforms the state-of-the-art throttling methods by 8% in speedup while reducing memory traffic by 3%.
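As a rough illustration of the delta-correlation idea behind a GHB prefetcher, the following Python sketch keeps the pattern length and the prefetch degree as two separate parameters. It is not the authors' design: the function name `gdc_prefetch`, its parameters, and the list-based miss history are hypothetical simplifications, and a hardware GHB uses a FIFO with an index table rather than a linear search.

```python
from typing import List


def gdc_prefetch(history: List[int], pattern_length: int = 2, degree: int = 4) -> List[int]:
    """Predict prefetch addresses from a global miss-address history.

    history        -- miss addresses, oldest first (assumed input format)
    pattern_length -- number of most recent deltas used as the search key
    degree         -- maximum number of prefetch addresses to generate
    """
    # Deltas between consecutive miss addresses, oldest first.
    deltas = [b - a for a, b in zip(history, history[1:])]
    if len(deltas) <= pattern_length:
        return []  # not enough history to hold the key plus an earlier match

    key = deltas[-pattern_length:]

    # Search backwards for the most recent earlier occurrence of the key.
    for start in range(len(deltas) - pattern_length - 1, -1, -1):
        if deltas[start:start + pattern_length] == key:
            # Replay the deltas that followed the earlier occurrence.
            following = deltas[start + pattern_length:start + pattern_length + degree]
            addr = history[-1]
            prefetches = []
            for d in following:
                addr += d
                prefetches.append(addr)
            return prefetches
    return []


if __name__ == "__main__":
    # Alternating deltas +1, +2: the key (2, 1) is found earlier in the history.
    print(gdc_prefetch([100, 101, 103, 104, 106, 107], pattern_length=2, degree=4))
    # Constant stride of 64 bytes.
    print(gdc_prefetch([0, 64, 128, 192, 256], pattern_length=2, degree=4))
```

In this simplified model the number of generated addresses is bounded both by `degree` and by how many deltas were recorded after the matched pattern, so changing `pattern_length` and `degree` independently makes it easy to see how each one affects the result.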

Notes

Throughout this paper, by GHB we mean a GHB with global delta correlation (G/DC) [11].

References

1. International technology roadmap for semiconductors (ITRS). http://www.itrs.net/links/2010itrs
2. Palacharla S, Jouppi NP, Smith JE (1997) Complexity-effective superscalar processors. In: Proceedings of international symposium on computer architecture, pp 206–218
3. Reinman G, Austin T, Calder B (1999) A scalable front-end architecture for fast instruction delivery. In: Proceedings of international symposium on computer architecture, pp 234–245
4. Camacho ON, Villa VLA, Espinosa SO (2007) High performance cache. In: Proceedings of the international conference on computer design, pp 181–187
5. Bellas NE, Hajj IN, Polychronopoulos CD (2000) Using dynamic cache management techniques to reduce energy in general purpose processors. IEEE Trans Very Large Scale Integr Syst 8:693–708
6. Ku JC, Ozdemir S, Ismail Y (2006) Power density minimization for highly-associative caches in embedded processors. In: Proceedings of the ACM Great Lakes symposium on VLSI, pp 100–104
7. Gove D (2007) CPU2006 working set size. ACM SIGARCH Comput Archit News 35:90–96
8. Prakash TK, Peng L (2008) Performance characterization of SPEC CPU2006 benchmarks on Intel Core 2 Duo processor. ISAST Trans Comput Softw Eng 2:36–41
9. Wang Z, Burger D, McKinley KS, Reinhardt SK, Weems CC (2003) Guided region prefetching: a cooperative hardware/software approach. In: Proceedings of international symposium on computer architecture, pp 388–398
10. Spracklen L, Chou Y, Abraham SG (2005) Effective instruction prefetching in chip multiprocessors for modern commercial applications. In: Proceedings of international symposium on high performance computer architecture, pp 225–236
11. Nesbit KJ, Smith JE (2004) Data cache prefetching using a global history buffer. In: Proceedings of international symposium on high performance computer architecture, pp 96–105
12. Sair S, Sherwood T, Calder B (2003) A decoupled predictor-directed stream prefetching architecture. IEEE Trans Comput 52:260–276
13. Liu G, Huang Z, Peir J-K, Shi X, Peng L (2011) Enhancements for accurate and timely streaming prefetcher. J Instr Level Parallelism 13
14. Smith AJ (1982) Cache memories. ACM Comput Surv 14:473–530
15. Chen T, Baer J (1995) Effective hardware-based data prefetching for high-performance processors. IEEE Trans Comput 44:609–623
16. Wang K, Franklin M (1997) Highly accurate data value prediction using hybrid predictors. In: Proceedings of international symposium on microarchitecture, pp 281–290
17. Charney M, Reeves A (1995) Generalized correlation based hardware prefetching. Technical Report EE-CEG-95-1, Cornell University
18. Joseph D, Grunwald D (1997) Prefetching using Markov predictors. In: Proceedings of international symposium on computer architecture, pp 252–263
19. Perez DG, Mouchard G, Temam O (2004) Microlib: a case for the quantitative comparison of micro-architecture mechanisms. In: Proceedings of the international symposium on microarchitecture, pp 43–54
20. Srinath S, Mutlu O, Kim H, Patt YN (2007) Feedback directed prefetching: improving the performance and bandwidth-efficiency of hardware prefetchers. In: Proceedings of international symposium on high performance computer architecture, pp 63–74
21. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, Sen R, Sewell K, Shoaib M, Vaish N, Hill MD, Wood DA (2011) The gem5 simulator. SIGARCH Comput Archit News 39:1–7
22. Standard performance evaluation corporation (SPEC) CPU2006 benchmark suite. http://www.spec.org/cpu2006
23. Verma S, Koppelman DM, Peng L (2011) Efficient prefetching with hybrid schemes and use of program feedback to adjust prefetcher aggressiveness. J Instr Level Parallelism 13
24. Dahlgren F, Stenstrom P (1995) Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors. In: Proceedings of symposium on high-performance computer architecture, pp 68–77
25. Dimitrov M, Zhou H (2011) Combining local and global history for high performance data prefetching. J Instr Level Parallelism 13
26. Sharif A, Lee HS (2011) Data prefetching by exploiting global and local access patterns. J Instr Level Parallelism 13
27. Nesbit KJ, Smith JE (2004) AC/DC: an adaptive data cache prefetcher. In: Proceedings of international conference on parallel architecture and compilation techniques, pp 135–145
28. Diaz P, Cintra M (2009) Stream chaining: exploiting multiple levels of correlation in data prefetching. In: Proceedings of international symposium on computer architecture, pp 81–92
29. Grannaes M, Jahre M, Natvig L (2011) Storage efficient hardware prefetching using delta-correlating prediction tables. J Instr Level Parallelism 13
30. Somogyi S, Wenisch TF, Ailamaki A, Falsafi B, Moshovos A (2006) Spatial memory streaming. In: Proceedings of international symposium on computer architecture, pp 252–263
31. Ebrahimi E, Mutlu O, Lee CJ, Patt YN (2009) Coordinated control of multiple prefetchers in multi-core systems. In: Proceedings of the international symposium on microarchitecture, pp 316–326
32. Dang X, Wang X, Tong D, Lu J, Yi J, Wang K (2012) S/DC: a storage and energy efficient data prefetcher. In: Proceedings of the international conference on design, automation and test in Europe, pp 461–466

Author information

Authors and Affiliations

Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Mahmood Naderan-Tahan & Hamid Sarbazi-Azad

Authors

Mahmood Naderan-Tahan
Hamid Sarbazi-Azad

Corresponding author

Correspondence to Mahmood Naderan-Tahan.

About this article

Cite this article

Naderan-Tahan, M., Sarbazi-Azad, H. Adaptive prefetching using global history buffer in multicore processors. J Supercomput 68, 1302–1320 (2014). https://doi.org/10.1007/s11227-014-1088-y

Published: 17 January 2014
Issue Date: June 2014
DOI: https://doi.org/10.1007/s11227-014-1088-y
