The Journal of Supercomputing, Volume 68, Issue 3, pp 1302–1320

Adaptive prefetching using global history buffer in multicore processors

  • Mahmood Naderan-Tahan
  • Hamid Sarbazi-Azad

Abstract

Data prefetching is a well-known technique for hiding memory latency at the last-level cache (LLC). Among the many prefetching methods proposed in recent years, the Global History Buffer (GHB) has proved efficient in terms of both cost and speedup. In this paper, we show that using a single fixed value for both the pattern-detection length and the prefetch degree causes GHB to (1) be conservative even when there are more opportunities to generate new addresses and (2) generate wrong addresses in the presence of constant strides. To resolve these problems, we separate the pattern length from the prefetching degree. The result is an aggressive prefetcher that can generate more addresses for a given pattern length. Furthermore, with a variable pattern length mechanism, constant strides are grouped so that more accurate patterns are detected. Because the aggressiveness of this prefetcher is relatively high, we further propose an efficient throttling procedure that reduces the negative effects of wrong prefetches using a new measure of cache pollution. This adaptive method is suitable for chip multiprocessors (CMPs) in which the prefetcher resides in the shared LLC. Simulation results with a mixed suite of integer and floating-point benchmarks from SPEC CPU2006 show that, on a single-core processor, the aggressive and adaptive methods outperform existing prefetchers by 48 and 28 %, respectively, while increasing memory traffic by 20 and 14 %, respectively. Furthermore, on an 8-core CMP running a mix of multiprogrammed workloads, the adaptive method outperforms state-of-the-art throttling methods by 8 % in speedup while reducing memory traffic by 3 %.
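
To make the distinction between pattern length and prefetch degree concrete, the following is a minimal sketch of delta-correlation prefetching over a global history of miss addresses, with the two values exposed as independent parameters. It is an illustration under stated assumptions, not the mechanism evaluated in the paper: the function name `delta_correlation_prefetch`, the parameters `pattern_len` and `degree`, and the choice to repeat the replayed deltas cyclically when more addresses are requested are all hypothetical, and the variable pattern length and throttling mechanisms are not modelled.

```python
from itertools import cycle, islice


def delta_correlation_prefetch(addresses, pattern_len=2, degree=4):
    """Predict up to `degree` addresses from a list of past miss addresses.

    Hypothetical sketch: `pattern_len` is the number of most recent deltas
    used as the search key, and `degree` is the number of addresses to
    generate. Returns [] when the key has not been seen earlier.
    """
    if pattern_len < 1 or degree < 0:
        raise ValueError("pattern_len must be >= 1 and degree must be >= 0")

    # Differences between consecutive addresses in the global history.
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if len(deltas) <= pattern_len:
        return []

    key = deltas[-pattern_len:]

    # Search backwards for the most recent earlier occurrence of the key.
    for start in range(len(deltas) - pattern_len - 1, -1, -1):
        if deltas[start:start + pattern_len] == key:
            # Deltas that followed the earlier occurrence of the key.
            following = deltas[start + pattern_len:]
            predictions = []
            addr = addresses[-1]
            # Assumption: repeat the replayed deltas if degree exceeds them.
            for d in islice(cycle(following), degree):
                addr += d
                predictions.append(addr)
            return predictions
    return []


if __name__ == "__main__":
    history = [100, 101, 103, 104, 106, 107]  # deltas: 1, 2, 1, 2, 1
    print(delta_correlation_prefetch(history, pattern_len=2, degree=4))
```

In this sketch, raising `degree` while holding `pattern_len` fixed yields more predicted addresses from the same matched pattern, which is the kind of decoupling the abstract describes at a high level.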

Keywords

Computer architecture · Cache · Prefetching · Multicore processor

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
