Abstract
This paper presents a low-complexity, table-based approach to delta correlation prefetching. Our approach uses a table indexed by the load address that stores the most recent deltas observed. Storing deltas rather than full miss addresses saves considerable space and makes pattern matching easier. Using delta correlation, the delta history can predict repeating patterns with long periods. In addition, we propose L1 hoisting, a technique for moving data from the L2 to the L1 using the same underlying table structure, and partial matching, which reduces the spatial resolution in the delta stream to expose more patterns.
We evaluate our prefetching technique using the simulator framework from the Data Prefetching Championship. This allows us to use the original code submitted to the contest to evaluate several alternative prefetching techniques fairly. Our prefetching technique increases performance by 87% on average (6.6X max) on SPEC2006.
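The delta correlation idea summarized in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design: the names `DeltaTable`, `correlate`, `history`, and `mask_bits` are hypothetical, the table is assumed to be keyed by the address of the load instruction, the match is assumed to use the two most recent deltas, and partial matching is modeled simply as dropping low-order delta bits before comparison.

```python
from collections import deque


def correlate(last_address, deltas, mask_bits=0):
    """Replay the deltas that followed an earlier occurrence of the newest delta pair.

    mask_bits > 0 models partial matching (an assumption of this sketch):
    low-order bits of each delta are ignored when comparing pairs.
    """
    if len(deltas) < 3:
        return []

    def coarse(delta):
        return delta >> mask_bits

    key = (coarse(deltas[-2]), coarse(deltas[-1]))
    # Search backwards for the most recent earlier occurrence of the pair.
    for i in range(len(deltas) - 3, 0, -1):
        if (coarse(deltas[i - 1]), coarse(deltas[i])) == key:
            candidates = []
            address = last_address
            # Replay every delta recorded after the matched pair.
            for delta in deltas[i + 1:]:
                address += delta
                candidates.append(address)
            return candidates
    return []


class DeltaTable:
    """Hypothetical table: load instruction address -> recent miss deltas."""

    def __init__(self, history=8):
        self.history = history
        self.entries = {}  # load_pc -> (last miss address, deque of deltas)

    def access(self, load_pc, address, mask_bits=0):
        """Record a miss for load_pc and return prefetch candidate addresses."""
        if load_pc not in self.entries:
            self.entries[load_pc] = (address, deque(maxlen=self.history))
            return []
        last_address, deltas = self.entries[load_pc]
        delta = address - last_address
        if delta != 0:
            deltas.append(delta)
        self.entries[load_pc] = (address, deltas)
        return correlate(address, list(deltas), mask_bits)


table = DeltaTable()
prefetches = []
for miss in [100, 101, 103, 113, 114, 116]:
    prefetches = table.access(0x400, miss)
print(prefetches)  # [126, 127, 129]
```

In this example the miss stream produces the deltas 1, 2, 10, 1, 2. The newest pair (1, 2) also appears at the start of the history, so the deltas that followed it (10, 1, 2) are replayed from the latest address. Storing small deltas instead of full addresses is what keeps each entry compact, and a nonzero `mask_bits` lets nearly identical pairs such as (8, 16) and (9, 17) match.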
This work was supported by the Norwegian Metacenter for Computational Science (Notur).
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grannaes, M., Jahre, M., Natvig, L. (2010). Multi-level Hardware Prefetching Using Low Complexity Delta Correlating Prediction Tables with Partial Matching. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_19
DOI: https://doi.org/10.1007/978-3-642-11515-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11514-1
Online ISBN: 978-3-642-11515-8
eBook Packages: Computer Science, Computer Science (R0)