Computer Memory: Why We Should Care What Is under the Hood

  • Vlastimil Babka
  • Petr Tůma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7119)


The memory subsystems of contemporary computer architectures are increasingly complex, so much so that it has become difficult to estimate the performance impact of many coding constructs, and some long-known coding patterns have even turned out to be fundamentally wrong. In contrast, many researchers still reason about algorithmic complexity in simple terms, assuming that memory operations are sequential and of equal cost. The goal of this talk is to give an overview of some memory subsystem features that significantly violate this assumption, with the ambition of motivating the development of algorithms tailored to contemporary computer architectures.
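As a rough illustration of why the "equal cost" assumption can mislead, the sketch below counts how often consecutive accesses land on a different cache line when the same array is traversed in two orders. It is a toy model, not a measurement and not taken from the talk: the 64-byte line size, the 8-byte element size, the line-aligned row-major layout, and the helper name `line_switches` are all assumptions made for the example.

```python
# Toy model under stated assumptions (none of them come from the source):
# 64-byte cache lines, 8-byte elements, and a row-major array that starts
# at a line-aligned address.
LINE_BYTES = 64
ELEM_BYTES = 8


def line_switches(rows, cols, row_major):
    """Count accesses that fall on a different cache line than the previous one."""
    if row_major:
        order = ((r, c) for r in range(rows) for c in range(cols))
    else:
        order = ((r, c) for c in range(cols) for r in range(rows))

    switches = 0
    previous_line = None
    for r, c in order:
        offset = (r * cols + c) * ELEM_BYTES  # byte offset in row-major layout
        line = offset // LINE_BYTES           # index of the cache line touched
        if line != previous_line:
            switches += 1
            previous_line = line
    return switches


if __name__ == "__main__":
    rows, cols = 1024, 1024
    print("element accesses:          ", rows * cols)
    print("row-major line switches:   ", line_switches(rows, cols, True))
    print("column-major line switches:", line_switches(rows, cols, False))
```

Both traversals perform the same number of element accesses, so a cost model that charges one unit per access cannot tell them apart, yet in this model the column-major order moves to a new cache line on every access. Real hardware adds prefetching, associativity, and address translation effects that this sketch deliberately ignores.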


Memory Access, Cache Size, Cache Line, Physical Address, Virtual Address
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Vlastimil Babka (1)
  • Petr Tůma (1)
  1. Department of Distributed and Dependable Systems, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
