A Hybrid Hardware/Software Generated Prefetching Thread Mechanism on Chip Multiprocessors

Rui, Hou; Zhang, Longbing; Hu, Weiwu

doi:10.1007/11823285_52

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4128)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

This paper proposes a hybrid hardware/software-generated prefetching thread mechanism for chip multiprocessors (CMPs). The hybrid mechanism uses two kinds of prefetching threads. Most are Dynamic Prefetching Threads, which are automatically generated, triggered, spawned, and managed by hardware. The rest are Static Prefetching Threads, which target the critical delinquent loads that Dynamic Prefetching Threads cannot predict accurately or in time. Static Prefetching Threads are generated statically by a binary-level optimization tool guided by profiling information. Several aggressive thread construction policies are also proposed, and the hardware infrastructure a CMP needs to support the hybrid mechanism is described. For a set of memory-limited benchmarks with complicated access patterns, basic hardware-generated prefetching threads achieve an average speedup of 3.1% on a dual-core CMP, and the gain grows to 31% with the hybrid mechanism.
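The division of labor described in the abstract can be illustrated with a small sketch. It is not the authors' tool or hardware: the abstract gives no selection criteria, so the `LoadProfile` fields, the `choose_thread_kind` and `partition` helpers, the thresholds, and the sample addresses are all hypothetical. The sketch only shows the idea that a load defaults to a hardware-generated (dynamic) thread and is handed to a profile-guided (static) thread when it is both critical and poorly covered by dynamic prefetching.

```python
from dataclasses import dataclass


@dataclass
class LoadProfile:
    """Hypothetical per-load profiling record (fields are assumptions)."""
    pc: int                # address of the delinquent load instruction
    miss_count: int        # cache misses attributed to this load
    dyn_accuracy: float    # fraction of dynamic prefetches that were useful
    dyn_timeliness: float  # fraction of useful prefetches that arrived in time


def choose_thread_kind(load, total_misses,
                       critical_share=0.05,   # assumed threshold
                       min_accuracy=0.8,      # assumed threshold
                       min_timeliness=0.8):   # assumed threshold
    """Return "static" or "dynamic" for one profiled load."""
    # A load is treated as critical if it causes a large share of all misses.
    is_critical = load.miss_count / total_misses >= critical_share
    # Dynamic prefetching covers it well if it is both accurate and timely.
    well_covered = (load.dyn_accuracy >= min_accuracy
                    and load.dyn_timeliness >= min_timeliness)
    if is_critical and not well_covered:
        return "static"
    # Default: leave the load to hardware-generated threads.
    return "dynamic"


def partition(profiles):
    """Map each load PC to the kind of prefetching thread that serves it."""
    total = sum(p.miss_count for p in profiles) or 1  # avoid division by zero
    return {p.pc: choose_thread_kind(p, total) for p in profiles}


# Made-up profile data, for illustration only.
profiles = [
    LoadProfile(pc=0x400A10, miss_count=900, dyn_accuracy=0.35, dyn_timeliness=0.50),
    LoadProfile(pc=0x400B24, miss_count=80, dyn_accuracy=0.95, dyn_timeliness=0.90),
    LoadProfile(pc=0x400C38, miss_count=20, dyn_accuracy=0.20, dyn_timeliness=0.30),
]
plan = partition(profiles)
for pc, kind in plan.items():
    print(hex(pc), kind)
```

With these made-up numbers, only the first load is both critical and poorly predicted, so it alone is assigned a static thread; the other two remain with the dynamic mechanism, matching the abstract's statement that most threads are dynamic.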

Author information

Authors and Affiliations

Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, 100080, Beijing, China
Hou Rui, Longbing Zhang & Weiwu Hu

Editor information

Editors and Affiliations

ZIH, TU Dresden, Germany
Wolfgang E. Nagel
Fakultät Mathematik, Institut für wissenschaftliches Rechnen, TU Dresden, 01062, Dresden, Germany
Wolfgang V. Walter
Database Technology Group, Technische Universität Dresden, Germany
Wolfgang Lehner

About this paper

Cite this paper

Rui, H., Zhang, L., Hu, W. (2006). A Hybrid Hardware/Software Generated Prefetching Thread Mechanism on Chip Multiprocessors. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds) Euro-Par 2006 Parallel Processing. Euro-Par 2006. Lecture Notes in Computer Science, vol 4128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823285_52

DOI: https://doi.org/10.1007/11823285_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37783-2
Online ISBN: 978-3-540-37784-9
eBook Packages: Computer Science, Computer Science (R0)
