Abstract
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit software prefetching instructions. Unfortunately, each mechanism has potential drawbacks. Non-blocking loads can significantly increase register pressure by extending the lifetimes of loads. Software prefetching increases the number of memory instructions in the loop body. For a loop whose execution time is bound by the number of loads/stores that can be issued per cycle, software prefetching exacerbates this problem and increases the number of idle computational cycles in loops.
In this paper, we show how compiler and architecture support for combining a load and a prefetch into one instruction, called a prefetching load, can give lower register pressure like software prefetching and lower load/store-unit requirements like non-blocking loads. On a set of 106 Fortran loops, we show that prefetching loads obtain a speedup of 1.07–1.53 over using just non-blocking loads and a speedup of 1.04–1.08 over using software prefetching. In addition, prefetching loads reduced floating-point register pressure by as much as a factor of 0.4 and integer register pressure by as much as a factor of 0.8 relative to non-blocking loads. Integer register pressure was also reduced by a factor of 0.97 relative to software prefetching, while floating-point register pressure increased by a factor of 1.02 relative to software prefetching in the worst case.
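The load/store-unit argument above can be illustrated with a small back-of-the-envelope sketch. It computes only the lower bound on a software-pipelined loop's initiation interval that the memory units impose, that is, memory operations per iteration divided by the number of load/store units, rounded up. The loop shape (3 loads, 1 store), the machine (2 load/store units), and the assumption of one prefetch per load per iteration are all hypothetical and chosen for illustration; they are not taken from the chapter's experiments.

```python
from math import ceil


def memory_bound_ii(loads, stores, prefetches, mem_units):
    """Lower bound on the initiation interval imposed by load/store units alone.

    Every load, store, and explicit prefetch occupies one issue slot on a
    load/store unit, so the bound is ceil(memory ops / units).
    """
    return ceil((loads + stores + prefetches) / mem_units)


# Hypothetical loop body and machine (illustrative values only).
loads, stores, units = 3, 1, 2

# Non-blocking loads: no extra memory instructions.
nonblocking = memory_bound_ii(loads, stores, 0, units)

# Software prefetching: assume one explicit prefetch per load per iteration.
software_prefetch = memory_bound_ii(loads, stores, loads, units)

# Prefetching loads: the prefetch is folded into the load instruction,
# so the memory-operation count matches the non-blocking case.
prefetching_load = memory_bound_ii(loads, stores, 0, units)

print(nonblocking, software_prefetch, prefetching_load)
```

Under these assumptions the explicit prefetches double the memory-imposed bound, while the combined instruction leaves it unchanged. The sketch ignores register pressure, recurrence constraints, and cache behavior, all of which also affect the achieved schedule.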
Keywords
- Cache
- Software Prefetching
- Nonblocking Loads
Copyright information
© 2001 Springer Science+Business Media New York
About this chapter
Cite this chapter
Bedy, M., Carr, S., Önder, S., Sweany, P. (2001). Improving Software Pipelining by Hiding Memory Latency with Combined Loads and Prefetches. In: Lee, G., Yew, PC. (eds) Interaction between Compilers and Computer Architectures. The Springer International Series in Engineering and Computer Science, vol 613. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3337-2_4
DOI: https://doi.org/10.1007/978-1-4757-3337-2_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4896-0
Online ISBN: 978-1-4757-3337-2
eBook Packages: Springer Book Archive
