Scalable Implementation of Efficient Locality Approximation

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5335)

Abstract

As the memory hierarchy becomes deeper and is shared by more processors, locality increasingly determines system performance. As a rigorous and precise locality model, reuse distance has been used in program optimization, performance prediction, memory disambiguation, and locality phase prediction. However, the high cost of measuring it has severely impeded its use in scenarios that require high efficiency, such as product compilers, performance debugging, and run-time optimization.

We recently discovered a statistical connection between time and reuse distance, which led to an efficient way to approximate reuse distance using time. That work, however, did not expose several algorithmic and implementation techniques that are vital to the efficiency and scalability of the approximation model. This paper presents these techniques. It describes an algorithm that approximates reuse distance on arbitrary scales; it explains a portable scheme that employs the memory controller to accelerate the measurement of time distance; and it presents the algorithm and proof of a trace generator that can facilitate various locality studies.
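
For readers unfamiliar with the two quantities the abstract contrasts, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes the common convention that, for an access to a datum, time distance counts all accesses strictly between it and the previous access to the same datum, while reuse distance counts only the distinct data among them. The function name `distances` and the quadratic brute-force approach are hypothetical choices made for clarity; the point of the approximation model is precisely to avoid this kind of expensive measurement.

```python
def distances(trace):
    """Return one entry per access: (time_distance, reuse_distance),
    or None when the datum has not been accessed before.

    Assumed convention: both distances are measured over the accesses
    strictly between the current access and the previous access to the
    same datum.
    """
    last_seen = {}   # datum -> index of its most recent access
    result = []
    for i, datum in enumerate(trace):
        if datum in last_seen:
            between = trace[last_seen[datum] + 1:i]
            time_distance = len(between)        # all intervening accesses
            reuse_distance = len(set(between))  # distinct intervening data
            result.append((time_distance, reuse_distance))
        else:
            result.append(None)                 # first access: no reuse
        last_seen[datum] = i
    return result


if __name__ == "__main__":
    # In the trace a b b c a, the second access to "a" has three accesses
    # in between (b, b, c) but only two distinct data (b and c).
    print(distances(list("abbca")))
```

Time distance needs only the index of the previous access, whereas reuse distance needs the set of intervening data, which is why exact measurement of the latter is the costly part.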

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shen, X., Shaw, J. (2008). Scalable Implementation of Efficient Locality Approximation. In: Amaral, J.N. (eds) Languages and Compilers for Parallel Computing. LCPC 2008. Lecture Notes in Computer Science, vol 5335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89740-8_14

  • DOI: https://doi.org/10.1007/978-3-540-89740-8_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89739-2

  • Online ISBN: 978-3-540-89740-8

  • eBook Packages: Computer Science (R0)
