Trace-Based Data Layout Optimizations for Multi-core Processors

  • Conference paper
High Performance Embedded Architectures and Compilers (HiPEAC 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5952)

Abstract

This paper focuses on cache-conscious data layout optimizations. Although these optimizations have already been adopted by industrial compilers, they have been shown to be inefficient for multi-process applications on multi-core platforms. Factors such as the asymmetric distribution of processes over hardware resources (cores, CPUs, or hardware threads) and the migration of processes over time influence optimization results unpredictably. We present a new methodology that extends classical data layout optimizations to support multi-core architectures. Based on collected data traces that reflect the actual interleaving of data accesses, the method aims to improve the spatial locality of data while mitigating potential false sharing. Introducing architectural characteristics into the analysis phase further increases the accuracy of data affinity estimation. A feasibility study of the method, applied to the multi-process web server lighttpd on a POWER5 machine, not only showed a performance improvement but also demonstrated its suitability for incorporation into an industrial compiler.
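To make the idea of trace-based affinity concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a hypothetical trace format of `(pid, field, is_write)` tuples recorded in their actual interleaved order, a hypothetical sliding `window` of recent accesses, and two illustrative helpers, `pair_scores` and `group_fields`. A pair of fields gains affinity when one process touches both close together in time, and loses it when different processes touch them close together and at least one access is a write, which would be a potential false-sharing event if the fields shared a cache line.

```python
from collections import Counter, deque


def pair_scores(trace, window=2):
    """Score field pairs from an interleaved trace of (pid, field, is_write).

    Hypothetical scoring, for illustration only:
      +1 if the same process touched both fields within `window` accesses
      -1 if different processes touched them within `window` accesses and
         at least one of the two accesses was a write
    """
    scores = Counter()
    recent = deque(maxlen=window)  # the `window` accesses before the current one
    for pid, field, is_write in trace:
        for prev_pid, prev_field, prev_write in recent:
            if prev_field == field:
                continue
            pair = tuple(sorted((field, prev_field)))
            if prev_pid == pid:
                scores[pair] += 1
            elif is_write or prev_write:
                scores[pair] -= 1
        recent.append((pid, field, is_write))
    return scores


def group_fields(fields, scores):
    """Greedily co-locate fields, best pairs first, avoiding negative pairs."""
    groups = {f: {f} for f in fields}
    for (a, b), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score <= 0:
            break  # remaining pairs show no positive affinity
        ga, gb = groups[a], groups[b]
        if ga is gb:
            continue
        # Refuse the merge if any cross pair looks like false sharing.
        if any(scores[tuple(sorted((x, y)))] < 0 for x in ga for y in gb):
            continue
        merged = ga | gb
        for f in merged:
            groups[f] = merged
    seen, layout = set(), []
    for f in fields:
        if id(groups[f]) not in seen:
            seen.add(id(groups[f]))
            layout.append(sorted(groups[f]))
    return layout


# Process 0 uses `head` and `tail` together; process 1 keeps writing `stats`.
trace = [
    (0, "head", False), (0, "tail", False), (1, "stats", True),
    (0, "head", False), (0, "tail", True), (1, "stats", True),
]
scores = pair_scores(trace, window=2)
print(group_fields(["head", "tail", "stats"], scores))
# prints [['head', 'tail'], ['stats']]
```

In this toy trace, `head` and `tail` end up in one group, while `stats` is kept apart because another process writes it in between. A real analysis would also have to account for cache line size, field sizes, and process placement, which this sketch ignores.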




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Golovanevsky, O., Dayan, A., Zaks, A., Edelsohn, D. (2010). Trace-Based Data Layout Optimizations for Multi-core Processors. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_8

  • DOI: https://doi.org/10.1007/978-3-642-11515-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11514-1

  • Online ISBN: 978-3-642-11515-8

  • eBook Packages: Computer Science (R0)
