Abstract
This paper proposes a hybrid hardware/software mechanism for generating prefetching threads on Chip Multiprocessors (CMPs). The mechanism uses two kinds of prefetching threads. Most are Dynamic Prefetching Threads, which are automatically generated, triggered, spawned, and managed by hardware. The rest are Static Prefetching Threads. These target the critical delinquent loads that Dynamic Prefetching Threads cannot predict accurately or in time, and they are generated statically by a binary-level optimization tool guided by profiling information. We also propose several aggressive thread-construction policies and describe the hardware infrastructure a CMP needs to support this hybrid mechanism. On a set of memory-limited benchmarks with complicated access patterns, basic hardware-generated prefetching threads achieve an average speedup of 3.1% on a dual-core CMP. With our hybrid mechanism, the average speedup grows to 31%.
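A toy timing model can illustrate why a prefetching thread that executes only the address-generating part of a loop (here, a pointer chase) helps memory-limited code. This is not the authors' simulator or their thread-construction algorithm. The names `Params` and `main_thread_cycles`, the latency values, and the assumption that a line fetched by the helper core is immediately visible to the main core through a shared cache are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    n_nodes: int       # length of the linked list being traversed
    mem_latency: int   # cycles for a load that misses to memory
    hit_latency: int   # cycles for a load that hits in the shared cache
    work: int          # non-memory work the main thread does per node


def main_thread_cycles(p: Params, use_helper: bool) -> int:
    """Cycles for the main thread to traverse a pointer-chasing list.

    Hypothetical toy model: every node sits on its own line in a cold cache,
    so without help each load is a miss. The optional helper thread runs on
    the other core and executes only the pointer chase (no `work`), leaving
    each line in a cache shared with the main thread.
    """
    t = 0  # main thread clock
    for i in range(p.n_nodes):
        if use_helper:
            # The helper issues load i only when load i-1 returns, so line i
            # arrives at (i + 1) * mem_latency. In this model the main thread
            # never reaches node i before the helper has issued load i, so it
            # either waits for the in-flight line or hits in the cache.
            line_ready = (i + 1) * p.mem_latency
            t = max(t + p.hit_latency, line_ready)
        else:
            t += p.mem_latency  # every node is a cold miss
        t += p.work
    return t


if __name__ == "__main__":
    memory_bound = Params(n_nodes=100, mem_latency=200, hit_latency=2, work=50)
    compute_heavy = Params(n_nodes=100, mem_latency=200, hit_latency=2, work=300)
    for p in (memory_bound, compute_heavy):
        base = main_thread_cycles(p, use_helper=False)
        helped = main_thread_cycles(p, use_helper=True)
        print(f"work={p.work}: {base} -> {helped} cycles, speedup {base / helped:.2f}")
```

When per-node work is small, the main thread is paced by the helper's own chase, one miss latency per node, so the gain is limited. When per-node work exceeds the miss latency, the helper stays ahead and almost every main-thread load after the first becomes a hit. The model shows only the timeliness side of the problem. It does not show prediction accuracy or the cost of spawning threads, which the abstract also identifies as concerns.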
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rui, H., Zhang, L., Hu, W. (2006). A Hybrid Hardware/Software Generated Prefetching Thread Mechanism on Chip Multiprocessors. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds) Euro-Par 2006 Parallel Processing. Euro-Par 2006. Lecture Notes in Computer Science, vol 4128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823285_52
DOI: https://doi.org/10.1007/11823285_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37783-2
Online ISBN: 978-3-540-37784-9
eBook Packages: Computer Science, Computer Science (R0)