Abstract
We report the results of the bottom-up implementation of one MILC lattice quantum chromodynamics (QCD) application on the Cell Broadband Engine™ processor. In our implementation, we preserve MILC’s framework for scaling the application to run on a large number of compute nodes and accelerate computationally intensive kernels on the Cell’s synergistic processor elements. Speedups of 3.4 × for the 8 × 8 × 16 × 16 lattice and 5.7 × for the 16 × 16 × 16 × 16 lattice are obtained when comparing our implementation of the MILC application executed on a 3.2 GHz Cell processor to the standard MILC code executed on a quad-core 2.33 GHz Intel Xeon processor. We provide an empirical model to predict application performance for a given lattice size. We also show that performance of the compute-intensive part of the application on the Cell processor is limited by the bandwidth between main memory and the Cell’s synergistic processor elements, whereas performance of the application’s parallel execution framework is limited by the bandwidth between main memory and the Cell’s power processor element.
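The abstract says the compute-intensive part is limited by main-memory-to-SPE bandwidth. The sketch below is a rough back-of-the-envelope check of that claim. It is not the authors' empirical performance model. It assumes the kernel is dominated by SU(3) matrix times color-vector products, which are the basic building block of lattice QCD Dirac-operator kernels. It also assumes single precision and no data reuse between products. The names `su3_matrix_f`, `mat_vec`, `bandwidth_bound_gflops`, and `lattice_sites` are hypothetical illustrations and are not MILC's own API.

```c
#include <assert.h>

/* Hypothetical single-precision types for illustration; MILC defines its own. */
typedef struct { float re, im; } complex_f;
typedef struct { complex_f e[3][3]; } su3_matrix_f;
typedef struct { complex_f c[3]; } su3_vector_f;

/* Nominal cost of one SU(3) matrix times color-vector product:
   per output component, 3 complex multiplies (6 flops each)
   plus 2 complex adds (2 flops each) = 22 flops; 3 components = 66. */
#define FLOPS_PER_MATVEC 66.0

/* Assumed memory traffic with no reuse: read the matrix (18 floats) and the
   vector (6 floats), write the result (6 floats) = 30 floats = 120 bytes. */
#define BYTES_PER_MATVEC ((18 + 6 + 6) * sizeof(float))

/* c = a * b for a 3x3 complex matrix a and a 3-component complex vector b.
   c must not alias b, because c is written while b is still being read. */
void mat_vec(const su3_matrix_f *a, const su3_vector_f *b, su3_vector_f *c) {
    for (int i = 0; i < 3; i++) {
        float re = 0.0f, im = 0.0f;
        for (int j = 0; j < 3; j++) {
            /* (x + iy)(u + iv) = (xu - yv) + i(xv + yu) */
            re += a->e[i][j].re * b->c[j].re - a->e[i][j].im * b->c[j].im;
            im += a->e[i][j].re * b->c[j].im + a->e[i][j].im * b->c[j].re;
        }
        c->c[i].re = re;
        c->c[i].im = im;
    }
}

/* Upper bound on sustained GFLOP/s when every operand streams from memory:
   bandwidth (GB/s) times arithmetic intensity (flops per byte). */
double bandwidth_bound_gflops(double bandwidth_gb_per_s) {
    assert(bandwidth_gb_per_s > 0.0);
    return bandwidth_gb_per_s * FLOPS_PER_MATVEC / (double)BYTES_PER_MATVEC;
}

/* Number of sites in an nx x ny x nz x nt lattice. */
long lattice_sites(int nx, int ny, int nz, int nt) {
    return (long)nx * ny * nz * nt;
}
```

The arithmetic intensity is 66/120 = 0.55 flop/byte. With the commonly quoted 25.6 GB/s peak of the Cell's XDR memory interface, `bandwidth_bound_gflops(25.6)` gives about 14.1 GFLOP/s for this kernel. That is more than an order of magnitude below the roughly 200 GFLOP/s single-precision peak of the eight SPEs, which is consistent with a bandwidth-limited kernel. `lattice_sites` gives 16,384 sites for the 8 × 8 × 16 × 16 lattice and 65,536 sites for the 16 × 16 × 16 × 16 lattice, so the larger lattice moves four times as much data per sweep.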
Cite this article
Shi, G., Kindratenko, V. & Gottlieb, S. The Bottom-Up Implementation of One MILC Lattice QCD Application on the Cell Blade. Int J Parallel Prog 37, 488–507 (2009). https://doi.org/10.1007/s10766-009-0102-0