Compiler Technology for Blue Gene Systems

  • Stefan Kral
  • Markus Triska
  • Christoph W. Ueberhuber
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4128)


Standard compilers are incapable of fully harnessing the enormous performance potential of Blue Gene systems. To reach the leading position in the Top500 supercomputing list, IBM had to put considerable effort into hand-coding and hand-tuning a limited range of low-level numerical kernel routines. This paper presents the Vienna MAP compiler, which specifically targets the signal transform codes that are ubiquitous in compute-intensive scientific applications. When compiling FFTW code, MAP reaches as much as 80% of the optimum performance of Blue Gene systems. In an application code, MAP enabled a sustained performance of 60 Tflop/s on BlueGene/L.


Basic Block; Register Allocation; Target Processor; Instruction Count; SIMD Instruction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Stefan Kral (1)
  • Markus Triska (1)
  • Christoph W. Ueberhuber (1)
  1. Institute for Analysis and Scientific Computing, Vienna University of Technology, Wien, Austria
