Automatic Data Locality Optimization Through Self-optimization

  • Rainer Buchty
  • Jie Tao
  • Wolfgang Karl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4124)


Data locality optimization in parallel systems is a non-trivial task. It is typically done by the programmer: based on an exhaustive analysis of an application's run-time behavior, data access and distribution are re-modeled manually. Once the system, the application, or just the input data set changes, this effort has to be repeated.

Ideally, this task is automated, which requires introducing Self-X qualities into the system. We developed an architecture concept for self-organizing parallel computer systems. The architecture rests on two main principles: flexible monitoring to instantiate self-awareness, and adaptive components for all aspects of self-configuration. It is completed by a self-awareness mechanism, autonomic planning. These Self-X properties pervade all system layers.

Based on this architecture concept, we implemented an autonomic data locality optimization system. The results presented in this paper demonstrate the suitability and applicability of the architecture concept and highlight the benefits of autonomic data locality optimization.


Keywords: System Layer, Autonomic Computing, Remote Node, Query Interface, Architecture Concept





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Rainer Buchty (1)
  • Jie Tao (1)
  • Wolfgang Karl (1)

  1. Institut für Technische Informatik, Universität Karlsruhe (TH), Karlsruhe, Germany
