Efficient use of parallel & distributed systems: From theory to practice

  • Burkhard Monien
  • Ralf Diekmann
  • Rainer Feldmann
  • Ralf Klasing
  • Reinhard Lüling
  • Knut Menzel
  • Thomas Römke
  • Ulf-Peter Schroeder
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1000)


This article focuses on principles for the design of efficient parallel algorithms for distributed-memory computing systems. We describe the general trend in the development of architectural properties and evaluate the state of the art in a number of basic primitives, such as graph embedding, partitioning, dynamic load distribution, and communication, which are used to some extent within all parallel applications. We discuss possible directions for future work on the design of universal basic primitives that perform efficiently on a broad range of parallel systems and applications, and we give examples of specific applications that demand specialized basic primitives in order to obtain efficient parallel implementations. Finally, we show that programming frames offer a convenient way to encapsulate algorithmic know-how on applications and basic primitives and to make this knowledge available effectively to nonspecialist users.


  • Load Balance
  • Load Distribution
  • Distribute Hash Table
  • Parallel Computing System
  • Tree Search Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Burkhard Monien (1)
  • Ralf Diekmann (1)
  • Rainer Feldmann (1)
  • Ralf Klasing (1)
  • Reinhard Lüling (1)
  • Knut Menzel (1)
  • Thomas Römke (2)
  • Ulf-Peter Schroeder (1)
  1. Department of Computer Science, University of Paderborn, Germany
  2. Paderborn Center for Parallel Computing (PC2), University of Paderborn, Germany
