Automatic Generation of the HPC Challenge’s Global FFT Benchmark for BlueGene/P

  • Conference paper
High Performance Computing for Computational Science - VECPAR 2012 (VECPAR 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7851)

Abstract

We present the automatic synthesis of the HPC Challenge’s Global FFT, a large 1D FFT computed across a whole supercomputer system. We extend the Spiral system to synthesize specialized single-node FFT libraries that combine a data layout transformation with the on-node FFT computation, which improves network performance by enabling all-to-all collectives. We ran our optimized Global FFT benchmark on up to 128k cores (32 racks) of ANL’s BlueGene/P “Intrepid” and achieved 6.4 Tflop/s, outperforming ANL’s 2008 HPC Challenge Class I Global FFT run (5 Tflop/s). Our code was part of IBM’s winning 2010 HPC Challenge Class II submission. Further, we show first single-thread results on BlueGene/Q.
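The idea of pairing local FFTs with a data layout change can be illustrated with the standard transpose-based (Cooley-Tukey) decomposition of a 1D DFT of size N = n1 · n2. The sketch below is not the Spiral-generated code from the paper; it is a minimal single-process illustration in which a matrix transpose stands in for the all-to-all exchange between nodes and a naive DFT stands in for the tuned on-node kernel. All function names (`dft`, `transpose`, `global_fft`, `fft_flops`) are hypothetical, and `fft_flops` assumes the conventional 5 N log2 N operation count commonly used when reporting FFT flop rates.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT; a stand-in for a tuned on-node FFT kernel."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(rows):
    """Matrix transpose; a stand-in for the all-to-all exchange between nodes."""
    return [list(col) for col in zip(*rows)]


def global_fft(x, n1, n2):
    """1D DFT of size n1*n2 built from local DFTs, twiddles, and transposes."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n1 x n2 row-major matrix: a[j1][j2] = x[j1*n2 + j2].
    a = [x[j1 * n2:(j1 + 1) * n2] for j1 in range(n1)]
    # Exchange 1: make each row hold one column (fixed j2, all j1).
    a = transpose(a)
    # Local DFTs of size n1, followed by twiddle scaling w_N^(j2*k1).
    b = []
    for j2, row in enumerate(a):
        y = dft(row)
        b.append([y[k1] * cmath.exp(-2j * cmath.pi * j2 * k1 / n)
                  for k1 in range(n1)])
    # Exchange 2: rows now hold fixed k1, all j2.
    c = transpose(b)
    # Local DFTs of size n2 give d[k1][k2] = X[k1 + n1*k2].
    d = [dft(row) for row in c]
    # Exchange 3: restore natural output order (index k2*n1 + k1).
    e = transpose(d)
    return [v for row in e for v in row]


def fft_flops(n):
    """Nominal operation count 5*N*log2(N) (assumed reporting convention)."""
    return 5 * n * math.log2(n)


if __name__ == "__main__":
    x = [complex(i, -0.5 * i) for i in range(32)]
    err = max(abs(u - v) for u, v in zip(global_fft(x, 4, 8), dft(x)))
    print("max abs error vs. direct DFT:", err)
```

In a distributed setting, each of the three transposes would correspond to a collective exchange, and fusing the layout change into the local FFT is what reduces the communication to plain all-to-all calls.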

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Franchetti, F., Voronenko, Y., Almasi, G. (2013). Automatic Generation of the HPC Challenge’s Global FFT Benchmark for BlueGene/P. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science - VECPAR 2012. VECPAR 2012. Lecture Notes in Computer Science, vol 7851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38718-0_20

  • DOI: https://doi.org/10.1007/978-3-642-38718-0_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38717-3

  • Online ISBN: 978-3-642-38718-0

  • eBook Packages: Computer Science, Computer Science (R0)
