
Taxonomy of Data Prefetching for Multicore Processors

  • Survey
  • Published in: Journal of Computer Science and Technology

Abstract

Data prefetching is an effective latency-hiding technique that masks the CPU stalls caused by cache misses and bridges the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to the processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into the mainstream. While some single-core prefetching techniques apply directly to multicore processors, numerous novel strategies have been proposed in recent years to take advantage of multiple cores. This paper provides a comprehensive review of state-of-the-art prefetching techniques and proposes a taxonomy that classifies the design concerns in developing a prefetching strategy, especially for multicore processors. We also compare existing methods through analysis.
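To make the basic idea concrete, the following C sketch illustrates software-controlled prefetching of the kind the survey classifies; it is not code from the paper. It uses GCC's __builtin_prefetch to request data a fixed number of iterations ahead of its use, and the prefetch distance of 16 elements is an arbitrary assumption that real implementations tune to memory latency and loop cost.

    #include <stddef.h>

    double sum_with_prefetch(const double *a, size_t n)
    {
        const size_t dist = 16;  /* assumed prefetch distance, in elements */
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* Issue a non-binding prefetch for the element needed `dist`
               iterations from now: 0 = read access, 3 = high temporal locality. */
            if (i + dist < n)
                __builtin_prefetch(&a[i + dist], 0, 3);
            sum += a[i];
        }
        return sum;
    }

This is the pattern that compiler-inserted prefetching automates for regular array accesses; hardware prefetchers achieve a similar effect transparently by detecting the sequential stride.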



Author information

Corresponding author

Correspondence to Surendra Byna.

Additional information

This research was supported in part by the U.S. National Science Foundation under Grant Nos. EIA-0224377, CNS-0406328, CNS-0509118, and CCF-0621435.


About this article

Cite this article

Byna, S., Chen, Y. & Sun, XH. Taxonomy of Data Prefetching for Multicore Processors. J. Comput. Sci. Technol. 24, 405–417 (2009). https://doi.org/10.1007/s11390-009-9233-4

