Abstract
This paper focuses on cache-conscious data layout optimizations. Although industrial compilers have already adopted these optimizations, they have been shown to be inefficient for multi-process applications on multi-core platforms. Factors such as the asymmetric distribution of processes over hardware resources (cores, CPUs, or hardware threads), along with their migration over time, influence optimization results unpredictably. We present a new methodology that extends classical data layout optimizations to support multi-core architectures. Based on the collection of data traces that reflect the actual interleaving of data accesses, the method aims to improve the spatial locality of data while mitigating potential false sharing. Introducing architectural characteristics into the analysis phase further increases the accuracy of data affinity estimation. A feasibility study of the method, applied to the multi-process web server lighttpd on a POWER5 machine, not only showed a performance improvement but also demonstrated its suitability for incorporation into an industrial compiler.
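To make the idea of trace-based affinity estimation more concrete, the following is a minimal sketch and not the method of the paper. It assumes a hypothetical trace format of `(cpu_id, field_name, is_write)` records; the names `TRACE`, `affinity`, and `false_sharing_candidates`, and the `window` parameter, are all illustrative assumptions. Fields that the same CPU accesses close together in time are counted as affine (candidates for sharing a cache line), while fields written by one CPU and used by another are flagged as potential false-sharing pairs (candidates for separation).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical trace: (cpu_id, field_name, is_write). This record format is an
# assumption for illustration, not the trace format used in the paper.
TRACE = [
    (0, "hits", True), (0, "last_url", False),
    (1, "errors", True), (0, "hits", True),
    (0, "last_url", False), (1, "errors", True),
    (0, "hits", True), (1, "errors", True),
]


def affinity(trace, window=2):
    """Count field pairs accessed by the same CPU within `window`
    consecutive accesses of that CPU."""
    per_cpu = defaultdict(list)
    for cpu, field, _ in trace:
        per_cpu[cpu].append(field)

    counts = defaultdict(int)
    for accesses in per_cpu.values():
        for i in range(len(accesses) - window + 1):
            fields = sorted(set(accesses[i:i + window]))
            for a, b in combinations(fields, 2):
                counts[(a, b)] += 1
    return dict(counts)


def false_sharing_candidates(trace):
    """Return pairs of distinct fields where one field is written by a CPU
    and the other is used by a different CPU."""
    users = defaultdict(set)    # field -> CPUs that read or write it
    writers = defaultdict(set)  # field -> CPUs that write it
    for cpu, field, is_write in trace:
        users[field].add(cpu)
        if is_write:
            writers[field].add(cpu)

    def conflicts(x, y):
        # True if some CPU writes x while a different CPU uses y.
        return any(w != u for w in writers[x] for u in users[y])

    return {
        (a, b)
        for a, b in combinations(sorted(users), 2)
        if conflicts(a, b) or conflicts(b, a)
    }


if __name__ == "__main__":
    print(affinity(TRACE))
    print(false_sharing_candidates(TRACE))
```

On this toy trace, `hits` and `last_url` come out as affine, and `errors` conflicts with both of them, which would suggest grouping the first two fields together and placing `errors` on a separate cache line. A real analysis would also need the architectural details the abstract mentions, such as cache line size and which hardware threads share a cache.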
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Golovanevsky, O., Dayan, A., Zaks, A., Edelsohn, D. (2010). Trace-Based Data Layout Optimizations for Multi-core Processors. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_8
DOI: https://doi.org/10.1007/978-3-642-11515-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11514-1
Online ISBN: 978-3-642-11515-8
eBook Packages: Computer Science, Computer Science (R0)