Trace-Based Data Layout Optimizations for Multi-core Processors

Golovanevsky, Olga; Dayan, Alon; Zaks, Ayal; Edelsohn, David

doi:10.1007/978-3-642-11515-8_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5952)

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers

Abstract

This paper focuses on cache-conscious data layout optimizations. Although industrial compilers have already adopted these optimizations, they have been shown to be inefficient for multi-process applications on multi-core platforms. Factors such as the asymmetric distribution of processes over hardware resources (cores, CPUs, or hardware threads), along with the migration of processes over time, influence optimization results unpredictably. We present a new methodology that extends classical data layout optimizations to support multi-core architectures. Based on data trace collection that reflects the actual interleaving of data accesses, the method aims to improve the spatial locality of data while mitigating potential false sharing events. Introducing architectural characteristics into the analysis phase further increases the accuracy of data affinity estimation. A feasibility study of this method, applied to the multi-process web server lighttpd on a POWER5 machine, not only showed a performance improvement but also demonstrated the method's suitability for incorporation into an industrial compiler.
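The following is a minimal sketch of the general idea summarized in the abstract, not the authors' algorithm, whose details are not given here. It assumes a hypothetical trace format of `(process_id, field_name, is_write)` records in observed interleaved order, and the names `field_affinity`, `false_sharing_risk`, and `group_fields` are illustrative only. Fields that one process touches close together in time gain affinity (a spatial locality hint), while a pair of fields is flagged as a false sharing risk when one process writes one field and a different process accesses the other.

```python
from collections import Counter, defaultdict
from itertools import combinations


def field_affinity(trace, window=2):
    """Count co-accesses: two distinct fields touched by the same process
    within `window` consecutive accesses of that process."""
    recent = defaultdict(list)  # per-process history of accessed fields
    affinity = Counter()
    for pid, field, _is_write in trace:
        history = recent[pid]
        for other in set(history[-window:]):
            if other != field:
                affinity[frozenset((field, other))] += 1
        history.append(field)
    return affinity


def false_sharing_risk(trace):
    """Return field pairs where one process writes one field and a
    different process accesses the other (risky to place together)."""
    writers = defaultdict(set)
    users = defaultdict(set)
    for pid, field, is_write in trace:
        users[field].add(pid)
        if is_write:
            writers[field].add(pid)
    risky = set()
    for a, b in combinations(sorted(users), 2):
        a_conflicts = any(w != u for w in writers[a] for u in users[b])
        b_conflicts = any(w != u for w in writers[b] for u in users[a])
        if a_conflicts or b_conflicts:
            risky.add(frozenset((a, b)))
    return risky


def group_fields(trace, window=2):
    """Greedy grouping (an assumed heuristic): merge the highest-affinity
    pairs first, skipping merges that would co-locate a risky pair."""
    affinity = field_affinity(trace, window)
    risky = false_sharing_risk(trace)
    group_of = {field: {field} for _pid, field, _w in trace}
    for pair, _count in affinity.most_common():
        a, b = tuple(pair)
        ga, gb = group_of[a], group_of[b]
        if ga is gb:
            continue
        if any(frozenset((x, y)) in risky for x in ga for y in gb):
            continue
        merged = ga | gb
        for field in merged:
            group_of[field] = merged
    return {frozenset(group) for group in group_of.values()}


# Process 0 reads `len` and `buf` together; process 1 updates `hits`.
trace = [
    (0, "len", False), (0, "buf", False), (1, "hits", True),
    (0, "len", False), (0, "buf", False), (1, "hits", True),
]
groups = group_fields(trace)
```

On this toy trace, `len` and `buf` end up in one group and `hits` is kept apart, which corresponds to placing the counter written by another process on a separate cache line. A real implementation would also have to account for field sizes, cache line length, and the process-to-core mapping that the abstract mentions.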

Author information

Authors and Affiliations

IBM Haifa Research Laboratory
Olga Golovanevsky, Alon Dayan & Ayal Zaks
IBM Watson Research Center
David Edelsohn

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, The University of Texas at Austin, 1 University Station C0803, TX 78712-0240, Austin, USA
Yale N. Patt
Dipartimento di Ingegneria della Informazione, Università di Pisa, Via Diotisalvi 2, 56100, Pisa, Italy
Pierfrancesco Foglia
IBM T.J.Watson Research Center, 19 Skyline Drive, NY 10532, Hawthorne, USA
Evelyn Duesterwald
Hewlett-Packard, Cami de Can Graells 1-21, Sant Cugat del Vallés, 08174, Barcelona, Spain
Paolo Faraboschi
Computer Architecture Department, Technical University of Catalunya (UPC), c/Jordi Girona 1-3, 08034, Barcelona, Spain
Xavier Martorell

About this paper

Cite this paper

Golovanevsky, O., Dayan, A., Zaks, A., Edelsohn, D. (2010). Trace-Based Data Layout Optimizations for Multi-core Processors. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_8

DOI: https://doi.org/10.1007/978-3-642-11515-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11514-1
Online ISBN: 978-3-642-11515-8
eBook Packages: Computer Science (R0)
