Exploiting Multiple Levels of Parallelism in Scientific Computing

  • Thomas Rauber
  • Gudula Rünger
Part of the IFIP — The International Federation for Information Processing book series (IFIPAICT, volume 172)


Parallelism is still one of the most prominent techniques for improving the performance of large application programs. Parallelism can be detected and exploited on several different levels, including instruction-level parallelism, data parallelism, functional parallelism, and loop parallelism. A suitable mixture of different levels of parallelism can often improve performance significantly, and the task of parallel programming is to find and code the corresponding programs.

We discuss the potential of using multiple levels of parallelism in applications from scientific computing and specifically consider programming with hierarchically structured multiprocessor tasks. A multiprocessor task can be mapped onto a group of processors and can be executed concurrently with other independent tasks. Internally, a multiprocessor task can consist of a hierarchical composition of smaller tasks or can incorporate any kind of data, thread, or SPMD parallelism. Such a programming model is suitable for applications with an inherent modular structure. Examples are environmental models combining atmospheric, surface water, and ground water models, or aircraft simulations combining models for fluid dynamics, structural mechanics, and surface heating. Methods such as specific ODE solvers or hierarchical matrix computations also benefit from multiple levels of parallelism. Examples from both areas are discussed.
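The hierarchical structure described above can be sketched in a few lines. The following is a minimal conceptual model, not the chapter's actual library: an M-task is either a leaf executed data-parallel on its processor group, or a parallel composition whose independent subtasks run concurrently, each on a disjoint share of the group. The names `run_mtask`, `'leaf'`, and `'par'` are illustrative assumptions; real implementations would map groups onto MPI communicators rather than thread pools.

```python
# Conceptual sketch of hierarchical multiprocessor tasks (M-tasks) using
# thread pools as stand-ins for processor groups. Illustrative only.
from concurrent.futures import ThreadPoolExecutor

def run_mtask(task, procs):
    """Execute a hierarchical M-task on a group of `procs` workers.

    `task` is either ('leaf', fn, data), executed data-parallel over the
    whole group, or ('par', [subtask, ...]), whose independent subtasks
    run concurrently, each on an equal share of the group.
    """
    kind = task[0]
    if kind == 'leaf':
        _, fn, data = task
        # Leaf task: exploit data parallelism inside the processor group.
        with ThreadPoolExecutor(max_workers=procs) as pool:
            return list(pool.map(fn, data))
    elif kind == 'par':
        _, subtasks = task
        # Internal task: split the group among independent subtasks.
        share = max(1, procs // len(subtasks))
        with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
            futures = [pool.submit(run_mtask, st, share) for st in subtasks]
            return [f.result() for f in futures]
    raise ValueError(f"unknown task kind: {kind!r}")

# Two independent M-tasks, each internally data-parallel, mapped onto
# disjoint halves of an 8-worker group.
tree = ('par', [
    ('leaf', lambda x: x * x, [1, 2, 3, 4]),
    ('leaf', lambda x: x + 10, [1, 2, 3, 4]),
])
print(run_mtask(tree, 8))  # → [[1, 4, 9, 16], [11, 12, 13, 14]]
```

The recursion mirrors the model described in the abstract: task parallelism between groups at the upper levels, data parallelism within a group at the leaves.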


Keywords: Task parallelism, multiprocessor tasks, orthogonal processor groups, scientific computing


References

  1. Banicescu, I. and Velusamy, V. (2002). Load balancing highly irregular computations with adaptive factoring. In Proc. of the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2002), Heterogeneous Computing Workshop. IEEE Computer Society Press, Fort Lauderdale.
  2. Banicescu, I., Velusamy, V., and Devaprasad, J. (2003). On the Scalability of Dynamic Scheduling Scientific Applications with Adaptive Weighted Factoring. Cluster Computing, The Journal of Networks, Software Tools and Applications, 6(3):215–226.
  3. Deuflhard, P. (1985). Recent progress in extrapolation methods for ordinary differential equations. SIAM Review, 27:505–535.
  4. High Performance Fortran Forum (1993). High Performance Fortran Language Specification. Scientific Programming, 2(1).
  5. Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R., and Sunderam, V. (1996). PVM Parallel Virtual Machine: A User's Guide and Tutorial for Networked Parallel Computing. MIT Press, Cambridge, MA.
  6. Hairer, E., Nørsett, S., and Wanner, G. (1993). Solving Ordinary Differential Equations I: Nonstiff Problems. Springer-Verlag, Berlin.
  7. Hennessy, J. and Patterson, D. (2003). Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 3rd edition.
  8. Hippold, J. and Rünger, G. (2003). Task Pool Teams for Implementing Irregular Algorithms on Clusters of SMPs. In Proc. of the International Parallel and Distributed Processing Symposium (IPDPS), Nice, France. IEEE.
  9. Hoffmann, R., Korch, M., and Rauber, T. (2004). Using Hardware Operations to Reduce the Synchronization Overhead of Task Pools. In Proc. of the Int. Conference on Parallel Processing (ICPP), pages 241–249.
  10. Hunold, S., Rauber, T., and Rünger, G. (2004a). Hierarchical Matrix-Matrix Multiplication based on Multiprocessor Tasks. In Bubak, M., van Albada, G., Sloot, P. M., and Dongarra, J. J., editors, Proc. of the International Conference on Computational Science (ICCS 2004), Part II, LNCS 3037, pages 1–8. Springer.
  11. Hunold, S., Rauber, T., and Rünger, G. (2004b). Multilevel Hierarchical Matrix Multiplication on Clusters. In Proc. of the 18th Annual ACM International Conference on Supercomputing (ICS'04), pages 136–145.
  12. Polychronopoulos, C. and Kuck, D. (1987). Guided self-scheduling: A practical scheduling scheme for parallel supercomputers. IEEE Transactions on Computers, C-36(12):1425–1439.
  13. Rauber, T. and Rünger, G. (2000). A Transformation Approach to Derive Efficient Parallel Implementations. IEEE Transactions on Software Engineering, 26(4):315–339.
  14. Rauber, T. and Rünger, G. (2002). Library Support for Hierarchical Multi-Processor Tasks. In Proc. of Supercomputing 2002, Baltimore, USA. ACM/IEEE.
  15. Singh, J. (1993). Parallel Hierarchical N-Body Methods and their Implications for Multiprocessors. PhD thesis, Stanford University.
  16. Snir, M., Otto, S., Huss-Lederman, S., Walker, D., and Dongarra, J. (1998). MPI: The Complete Reference, Vol. 1: The MPI Core. MIT Press, Cambridge, MA.
  17. van der Houwen, P. and Sommeijer, B. (1990a). Parallel Iteration of High-order Runge-Kutta Methods with Stepsize Control. Journal of Computational and Applied Mathematics, 29:111–127.
  18. van der Houwen, P. and Sommeijer, B. (1990b). Parallel ODE Solvers. In Proc. of the ACM Int. Conf. on Supercomputing, pages 71–81.
  19. Whaley, R. C. and Dongarra, J. J. (1997). Automatically Tuned Linear Algebra Software. Technical Report UT-CS-97-366, University of Tennessee.
  20. Wolfe, M. (1996). High Performance Compilers for Parallel Computing. Addison-Wesley.

Copyright information

© International Federation for Information Processing 2005

Authors and Affiliations

  • Thomas Rauber, Computer Science Department, University of Bayreuth, Germany
  • Gudula Rünger, Computer Science Department, Chemnitz University of Technology, Germany
