The Potential of On-Chip Multiprocessing for QCD Machines

  • Gianfranco Bilardi
  • Andrea Pietracaprina
  • Geppino Pucci
  • Fabio Schifano
  • Raffaele Tripiccione
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3769)


We explore the opportunities that current and forthcoming VLSI technologies offer for on-chip multiprocessing in Quantum Chromo Dynamics (QCD), a computational grand challenge for which over half a dozen specialized machines have been developed over the last two decades. Based on a careful study of the information exchange requirements of QCD, both across the network and within the memory system, we derive the optimal partition of die area between storage and functional units. We show that a scalable chip organization holds the promise of delivering from hundreds to thousands of flops per cycle as VLSI feature size scales down from 90 nm to 20 nm over the next dozen years.


Keywords: Functional Unit; Dirac Operator; Quantum Chromo Dynamics; Memory Hierarchy; VLSI Technology





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Gianfranco Bilardi (1)
  • Andrea Pietracaprina (1)
  • Geppino Pucci (1)
  • Fabio Schifano (2)
  • Raffaele Tripiccione (2)
  1. Dipartimento di Ingegneria dell’Informazione, Università di Padova, Padova, Italy
  2. Dipartimento di Fisica, Università di Ferrara, and INFN, Ferrara, Italy
