International Journal of Parallel Programming, Volume 37, Issue 5, pp 488–507

The Bottom-Up Implementation of One MILC Lattice QCD Application on the Cell Blade

  • Guochun Shi
  • Volodymyr Kindratenko (email author)
  • Steven Gottlieb


We report the results of the bottom-up implementation of one MILC lattice quantum chromodynamics (QCD) application on the Cell Broadband Engine™ processor. In our implementation, we preserve MILC’s framework for scaling the application to run on a large number of compute nodes and accelerate computationally intensive kernels on the Cell’s synergistic processor elements. Speedups of 3.4 × for the 8 × 8 × 16 × 16 lattice and 5.7 × for the 16 × 16 × 16 × 16 lattice are obtained when comparing our implementation of the MILC application executed on a 3.2 GHz Cell processor to the standard MILC code executed on a quad-core 2.33 GHz Intel Xeon processor. We provide an empirical model to predict application performance for a given lattice size. We also show that performance of the compute-intensive part of the application on the Cell processor is limited by the bandwidth between main memory and the Cell’s synergistic processor elements, whereas performance of the application’s parallel execution framework is limited by the bandwidth between main memory and the Cell’s power processor element.
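To give a sense of scale for the two lattice sizes quoted above, the following is a minimal sketch that counts lattice sites and estimates the size of the gauge field alone. It is not taken from the MILC code or from the paper's performance model. It assumes the standard layout of four SU(3) link matrices per site (3 × 3 complex entries each) and, as a labeled assumption, single-precision storage; the function names `site_count` and `gauge_field_bytes` are hypothetical.

```python
from math import prod

# Lattice dimensions quoted in the abstract.
LATTICES = {
    "8x8x16x16": (8, 8, 16, 16),
    "16x16x16x16": (16, 16, 16, 16),
}

LINKS_PER_SITE = 4         # one gauge link per space-time direction
REALS_PER_SU3_MATRIX = 18  # 3x3 complex entries = 9 complex = 18 reals
BYTES_PER_REAL = 4         # assumption: single-precision storage


def site_count(dims):
    """Total number of sites in a 4D lattice with the given extents."""
    return prod(dims)


def gauge_field_bytes(dims, bytes_per_real=BYTES_PER_REAL):
    """Approximate size in bytes of the gauge links only (no fermion fields)."""
    return site_count(dims) * LINKS_PER_SITE * REALS_PER_SU3_MATRIX * bytes_per_real


if __name__ == "__main__":
    for name, dims in LATTICES.items():
        sites = site_count(dims)
        mib = gauge_field_bytes(dims) / 2**20
        print(f"{name}: {sites} sites, gauge links ~{mib:.1f} MiB")
```

Under these assumptions the larger lattice has four times as many sites as the smaller one, and in both cases the gauge links alone are far larger than the 256 KB local store of a single synergistic processor element, so the data has to be streamed from main memory. This is consistent with the abstract's statement that kernel performance is limited by memory bandwidth, though the estimate itself is only illustrative.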


Keywords: Cell Broadband Engine, Quantum chromodynamics, MILC





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Guochun Shi (1)
  • Volodymyr Kindratenko (1), email author
  • Steven Gottlieb (2)

  1. National Center for Supercomputing Applications, University of Illinois, Urbana, USA
  2. Department of Physics, Indiana University, Bloomington, USA
