Abstract
Loop fusion is recognized as an effective program transformation for improving memory hierarchy performance. However, unconstrained loop fusion can lead to poor performance because of increased register pressure and cache conflict misses. The complex interaction between the levels of the memory hierarchy and the input program makes it very difficult to always make the right choice when fusing loops. In this paper, we present a cache-conscious analytical model for profitable loop fusion, to be used with a constrained weighted fusion algorithm. We then extend the model to show its effectiveness in the context of an empirical tuning framework. A preliminary evaluation of the model is presented using hand experiments on four applications.
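To make the transformation concrete, the following is a minimal sketch of loop fusion itself, not of the paper's profitability model or its weighted fusion algorithm. The function names `unfused` and `fused` and the arrays `a`, `b`, `c`, `d` are hypothetical and chosen only for illustration. Python is used here to show the shape of the transformation; the memory-hierarchy effects the abstract discusses arise in compiled loop nests.

```python
def unfused(a, b):
    """Two separate loops: b and c are each traversed twice."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    for i in range(n):
        d[i] = c[i] * b[i]
    return c, d


def fused(a, b):
    """One fused loop: b[i] and c[i] are reused within the same iteration."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
        d[i] = c[i] * b[i]
    return c, d
```

Both versions compute the same results. The fused form shortens the distance between the definition and use of `c[i]` and between the two uses of `b[i]`, which is the source of the locality benefit, but it also places more values live in a single loop body, which is how fusing too aggressively can raise register pressure and cache conflicts.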
This material is based on work supported by the Department of Energy under Contract Nos. 03891-001-99-4G, 74837-001-03 49, 86192-001-04 49, and 12783-001-05 49 from the Los Alamos National Laboratory.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qasem, A., Kennedy, K. (2006). A Cache-Conscious Profitability Model for Empirical Tuning of Loop Fusion. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_8
DOI: https://doi.org/10.1007/978-3-540-69330-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69329-1
Online ISBN: 978-3-540-69330-7
eBook Packages: Computer Science, Computer Science (R0)