
Journal of Computer Science and Technology, Volume 24, Issue 3, pp. 405–417

Taxonomy of Data Prefetching for Multicore Processors

  • Surendra Byna
  • Yong Chen
  • Xian-He Sun
Survey

Abstract

Data prefetching is an effective latency-hiding technique that masks CPU stalls caused by cache misses and bridges the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to the processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into the mainstream. While some single-core prefetching techniques apply directly to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper provides a comprehensive review of state-of-the-art prefetching techniques and proposes a taxonomy that classifies the various design concerns in developing a prefetching strategy, especially for multicore processors. We also compare existing methods through analysis.
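To make the core idea concrete: one classic hardware scheme the survey's taxonomy covers is stride prefetching, in which a reference prediction table tracks each load instruction's last address and stride, and issues a prefetch once the stride repeats. The sketch below is purely illustrative; the class name, table layout, and confirmation policy are assumptions for exposition, not details taken from the paper.

```python
# Minimal sketch of a stride-based prefetch predictor (a software model
# of a hardware reference prediction table). All names and policies here
# are illustrative assumptions, not the paper's design.

class StridePrefetcher:
    def __init__(self):
        # Maps a load instruction's PC to (last_addr, stride) state.
        self.table = {}

    def access(self, pc, addr):
        """Record a memory access by the load at `pc`.

        Returns the predicted next address to prefetch once the same
        nonzero stride has been observed twice in a row, else None.
        """
        entry = self.table.get(pc)
        if entry is None:
            # First time we see this load: remember the address only.
            self.table[pc] = (addr, 0)
            return None
        last_addr, stride = entry
        new_stride = addr - last_addr
        if new_stride == stride and stride != 0:
            # Stride confirmed: predict one stride ahead of this access.
            self.table[pc] = (addr, stride)
            return addr + stride
        # Stride changed (or not yet established): update, no prefetch.
        self.table[pc] = (addr, new_stride)
        return None
```

For a load walking an array of 8-byte elements, the first two accesses train the predictor and the third triggers a prefetch one stride ahead; an access pattern that breaks the stride simply retrains the entry instead of issuing a wrong prefetch.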

Keywords

taxonomy of prefetching strategies; multicore processors; data prefetching; memory hierarchy



Copyright information

© Springer 2009

Authors and Affiliations

  1. Department of Computer Science, Illinois Institute of Technology, Chicago, U.S.A.
