Optimal grain size computation for pipelined algorithms

  • Frédéric Desprez
  • Pierre Ramet
  • Jean Roman
Workshop 01: Programming Environment and Tools
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1123)


In this paper, we present a method for overlapping communication with computation on parallel computers for pipelined algorithms. We first introduce a general theoretical model that leads to a generic scheme for computing the optimal packet size. We then apply the OPIUM library, which provides an easy-to-use and efficient way to compute this optimal packet size in the general case, to column LU factorization; the implementation and performance measurements are carried out on an Intel Paragon.
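The tradeoff behind the optimal packet size can be illustrated with a simplified textbook pipeline model (this is an illustration only, not the paper's OPIUM formula): sending n elements through a p-stage pipeline in packets of b elements, where each packet costs a startup α plus β per element, takes roughly (p - 1 + n/b)(α + βb). Small packets pay the startup α too often; large packets delay pipeline fill. Minimizing the continuous relaxation gives b* = sqrt(nα / ((p - 1)β)). All symbols and function names here are assumptions for the sketch.

```python
import math

def pipeline_time(n, p, alpha, beta, b):
    """Time to push n elements through a p-stage pipeline in packets of size b.

    Each packet costs a startup alpha plus beta per element; the pipeline
    fills in (p - 1) packet steps, after which one packet drains per step.
    Illustrative model only.
    """
    return (p - 1 + math.ceil(n / b)) * (alpha + beta * b)

def optimal_packet_size(n, p, alpha, beta):
    """Closed-form minimizer of the continuous relaxation of pipeline_time:
    d/db [(p - 1)(alpha + beta*b) + n*alpha/b + n*beta] = 0
    => b* = sqrt(n*alpha / ((p - 1)*beta)).
    """
    return math.sqrt(n * alpha / ((p - 1) * beta))
```

With a high startup cost relative to per-element cost, b* grows as sqrt(α/β): the model recovers the intuition that expensive message startups favor coarser packets, while a deep pipeline (large p) favors finer ones.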


Keywords: Communication overlap · Pipelined algorithms · Optimal packet size computation




  1. [BD96]
    T. Brandes and F. Desprez. Implementing Pipelined Computation and Communication in an HPF Compiler. Submitted to Europar'96, 1996.
  2. [CDD+95]
    J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. LAPACK Working Note 95: ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers — Design Issues and Performance. Technical report, Department of Computer Science, University of Tennessee, 1995.
  3. [Des94a]
    F. Desprez. A Library for Coarse Grain Macro-Pipelining in Distributed Memory Architectures. In IFIP 10.3 Conference on Programming Environments for Massively Parallel Distributed Systems, pages 365–371. Birkhäuser Verlag AG, Basel, Switzerland, 1994.
  4. [Des94b]
    F. Desprez. Procédures de Base pour le Calcul Scientifique sur Machines Parallèles à Mémoire Distribuée. PhD thesis, Institut National Polytechnique de Grenoble, January 1994. LIP, ENS-Lyon.
  5. [DRR96]
    F. Desprez, P. Ramet, and J. Roman. Optimal Grain Size Computation for Pipelined Algorithms. Technical report, Laboratoire Bordelais de Recherche en Informatique, 1996.
  6. [DT94]
    F. Desprez and B. Tourancheau. LOCCS: Low Overhead Communication and Computation Subroutines. Future Generation Computer Systems, 10(2&3):279–284, June 1994.
  7. [MP92]
    M. Pourzandi and B. Tourancheau. Recouvrement Calcul/Communication dans l'Élimination de Gauss sur un iPSC/860. Technical report, École Normale Supérieure de Lyon, 1992.
  8. [OSKO95]
    H. Ohta, Y. Saito, M. Kainaga, and H. Ono. Optimal Tile Size Adjustment in Compiling General DOACROSS Loop Nests. In International Conference on Supercomputing, pages 270–279, Barcelona, Spain, July 1995. ACM SIGARCH.
  9. [Saa86]
    Y. Saad. Communication Complexity of the Gaussian Elimination Algorithm on Multiprocessors. Linear Algebra and its Applications, 77:315–340, 1986.
  10. [SS95]
    B. S. Siegel and P. A. Steenkiste. Controlling Application Grain Size on a Network of Workstations. In Supercomputing'95, 1995.
  11. [Tse93]
    C. W. Tseng. An Optimizing Fortran D Compiler for MIMD Distributed-Memory Machines. PhD thesis, Rice University, January 1993.

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Frédéric Desprez (1)
  • Pierre Ramet (2)
  • Jean Roman (2)

  1. LIP (URA CNRS 1398), INRIA Rhône-Alpes and École Normale Supérieure de Lyon, Lyon Cedex, France
  2. LaBRI (URA CNRS 1304), ENSERB and Université Bordeaux I, Talence Cedex, France
