Abstract
Cache misses and bus traffic are key obstacles to achieving high performance in bus-based shared-memory multiprocessors that use invalidation-based snooping caches. Software-controlled techniques for tolerating memory latency, such as cache prefetching and data forwarding, can be used to overcome these problems. However, some previous studies have shown that cache prefetching is not very effective in bus-based shared-memory multiprocessors, while data forwarding is not easy to implement in this environment. In this paper, we propose a novel technique called cache injection, which combines consumer-initiated and producer-initiated approaches and exploits the broadcast nature of the bus. Performance evaluation based on program-driven simulation and a set of eight parallel benchmark programs shows that cache injection is highly effective in reducing coherence misses and bus traffic.
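The idea summarized above can be illustrated with a toy model. The sketch below is not the mechanism evaluated in the paper: the abstract gives no implementation details, so the class names, the per-cache `interest` set (a hypothetical "injection table"), and the `register` and `publish` operations are all assumptions made for illustration. It only shows how consumer-declared interest, a producer-initiated bus transfer, and bus broadcast could together remove read misses after an invalidating write.

```python
class Cache:
    def __init__(self):
        self.lines = {}        # address -> value, valid lines only
        self.interest = set()  # hypothetical injection table (assumption)
        self.misses = 0        # read misses seen by this cache


class Bus:
    """Toy snooping bus; counts data transfers only (invalidations are free)."""

    def __init__(self, n_cpus, inject):
        self.memory = {}
        self.caches = [Cache() for _ in range(n_cpus)]
        self.inject = inject   # enable or disable the injection behaviour
        self.transfers = 0

    def _broadcast(self, addr, value, source):
        # One data transfer on the bus; every snooping cache sees it.
        self.transfers += 1
        if not self.inject:
            return
        for i, cache in enumerate(self.caches):
            if i != source and addr in cache.interest:
                cache.lines[addr] = value  # inject the passing data

    def register(self, cpu, addr):
        # Consumer-initiated part: declare interest in a future value.
        self.caches[cpu].interest.add(addr)

    def write(self, cpu, addr, value):
        # Invalidation-based protocol: remove all other copies.
        for i, cache in enumerate(self.caches):
            if i != cpu:
                cache.lines.pop(addr, None)
        self.caches[cpu].lines[addr] = value
        self.memory[addr] = value  # simplification: memory kept up to date

    def publish(self, cpu, addr):
        # Producer-initiated part: place the finished line on the bus.
        self._broadcast(addr, self.caches[cpu].lines[addr], cpu)

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache.lines:
            cache.misses += 1
            value = self.memory[addr]
            cache.lines[addr] = value
            self._broadcast(addr, value, cpu)  # the fill is visible to snoopers
        return cache.lines[addr]


def run(inject, consumers=3, addr=0x40):
    """CPU 0 produces one value; CPUs 1..consumers then read it."""
    bus = Bus(consumers + 1, inject)
    if inject:
        for cpu in range(1, consumers + 1):
            bus.register(cpu, addr)
    bus.write(0, addr, 123)
    if inject:
        bus.publish(0, addr)
    values = [bus.read(cpu, addr) for cpu in range(1, consumers + 1)]
    misses = sum(cache.misses for cache in bus.caches)
    return values, misses, bus.transfers


if __name__ == "__main__":
    print("baseline :", run(inject=False))
    print("injection:", run(inject=True))
```

In this toy setting the baseline incurs one read miss and one bus transfer per consumer, while the injection run needs a single broadcast transfer and no consumer misses. These counts describe the sketch only and say nothing about the results reported in the paper.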
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Milenkovic, A., Milutinovic, V. (2000). Cache Injection: A Novel Technique for Tolerating Memory Latency in Bus-Based SMPs. In: Bode, A., Ludwig, T., Karl, W., Wismüller, R. (eds) Euro-Par 2000 Parallel Processing. Euro-Par 2000. Lecture Notes in Computer Science, vol 1900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44520-X_76
DOI: https://doi.org/10.1007/3-540-44520-X_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67956-1
Online ISBN: 978-3-540-44520-3
eBook Packages: Springer Book Archive