Online Phase-Adaptive Data Layout Selection

  • Chengliang Zhang
  • Martin Hirzel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5142)

Abstract

Good data layouts improve cache and TLB performance of object-oriented software, but unfortunately, selecting an optimal data layout a priori is NP-hard. This paper introduces layout auditing, a technique that selects the best among a set of layouts online (while the program is running). Layout auditing randomly applies different layouts over time and observes their performance. As it becomes confident about which layout performs best, it selects that layout with higher probability. But if a phase shift causes a different layout to perform better, layout auditing learns the new best layout. We implemented our technique in a product Java virtual machine, using copying generational garbage collection to produce different layouts, and tested it on 20 long-running benchmarks and 4 hardware platforms. Given any combination of benchmark and platform, layout auditing consistently performs close to the best layout for that combination, without requiring offline training.
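The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard epsilon-greedy rule with an exponentially weighted cost estimate, which is one common way to keep adapting when the best choice changes. The class `LayoutAuditor`, the function `simulate`, the parameters `explore` and `step`, and the layout names used with them are all hypothetical.

```python
import random


class LayoutAuditor:
    """Hypothetical sketch: epsilon-greedy choice among candidate layouts."""

    def __init__(self, layouts, explore=0.1, step=0.3, seed=None):
        self.layouts = list(layouts)
        self.explore = explore  # assumed probability of trying a random layout
        self.step = step        # assumed weight given to the newest measurement
        self.estimate = {name: None for name in self.layouts}  # None = unmeasured
        self.rng = random.Random(seed)

    def choose(self):
        # Measure every layout once before trusting any estimate.
        untried = [n for n in self.layouts if self.estimate[n] is None]
        if untried:
            return self.rng.choice(untried)
        # Occasionally re-audit a random layout so a phase shift can be noticed.
        if self.rng.random() < self.explore:
            return self.rng.choice(self.layouts)
        return self.best()

    def best(self):
        # Lowest estimated cost among the layouts measured so far.
        measured = {n: c for n, c in self.estimate.items() if c is not None}
        return min(measured, key=measured.get)

    def observe(self, layout, cost):
        # Recency-weighted average: old measurements fade, so stale data
        # from an earlier phase does not dominate the estimate.
        old = self.estimate[layout]
        self.estimate[layout] = cost if old is None else old + self.step * (cost - old)


def simulate(costs_by_phase, intervals_per_phase, seed=0):
    """Run the auditor on synthetic phases; return its preferred layout after each."""
    auditor = LayoutAuditor(list(costs_by_phase[0]), seed=seed)
    noise = random.Random(seed + 1)
    preferred = []
    for costs in costs_by_phase:
        for _ in range(intervals_per_phase):
            layout = auditor.choose()
            # Synthetic measurement: true cost plus small noise.
            auditor.observe(layout, costs[layout] + noise.uniform(-0.05, 0.05))
        preferred.append(auditor.best())
    return preferred
```

Calling `simulate` with two synthetic phases in which different layouts are cheapest should show the preferred layout switching after the phase shift. In a real virtual machine, the cost passed to `observe` would come from measurements of the running program, and applying a layout would happen during garbage collection; both are outside this sketch.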

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Chengliang Zhang (1)
  • Martin Hirzel (2)

  1. Microsoft in Redmond, WA (C. Zhang was a student at the U. of Rochester when doing this work.)
  2. IBM in Hawthorne, NY
