Efficient decomposition and performance of parallel PDE, FFT, Monte Carlo simulations, simplex, and Sparse solvers

The Journal of Supercomputing

Abstract

In this paper, we describe the decomposition of six algorithms: two partial differential equation (PDE) solvers (successive over-relaxation [SOR] and alternating direction implicit [ADI]), the fast Fourier transform (FFT), Monte Carlo simulations, simplex linear programming, and sparse solvers. The algorithms were selected not only for their importance in scientific applications, but also because they span a range of computational (structured to irregular) and communication (low to high) requirements. We present performance results for these algorithms on two shared-memory VAX/VMS multiprocessor prototypes: the VAX 6300 series with up to 8 processors and the M31 with up to 22 processors. We demonstrate that efficient decomposition makes it possible to achieve high performance for all six algorithms on both prototypes, and we describe the decomposition techniques used to optimize the parallel algorithms. We also discuss the performance implications of the different cache designs of the two multiprocessors.
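
To make the low-communication end of that range concrete, here is a minimal sketch of how Monte Carlo trials might be split evenly across workers. It is not the authors' implementation: the pi-estimation workload, the function names (`split_trials`, `count_hits`, `estimate_pi`), the seeding scheme, and the use of Python threads are all assumptions chosen for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def split_trials(total, workers):
    """Divide `total` trials into `workers` near-equal chunks (hypothetical helper)."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def count_hits(trials, seed):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # private generator per worker, so no shared state
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total=200_000, workers=4, base_seed=1234):
    """Estimate pi by decomposing the trials across `workers` independent tasks."""
    chunks = split_trials(total, workers)
    # Assumption: consecutive integer seeds are good enough for a sketch;
    # production code would use properly independent random streams.
    seeds = [base_seed + i for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(count_hits, chunks, seeds))
    # The only communication is this final reduction of the partial counts.
    return 4.0 * sum(partial) / total


if __name__ == "__main__":
    print(estimate_pi())
```

The workers never exchange data until the final sum, which is why this kind of workload tolerates almost any decomposition. Threads are used here only to show the structure; in standard CPython they do not run this loop in parallel, so a process pool would be the usual choice when actual speedup is wanted.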

Author information

At the time of writing, all three authors were with Digital Equipment Corporation, VMS Systems and Servers Group.

About this article

Cite this article

Cvetanovic, Z., Freedman, E.G. & Nofsinger, C. Efficient decomposition and performance of parallel PDE, FFT, Monte Carlo simulations, simplex, and Sparse solvers. J Supercomput 5, 219–238 (1991). https://doi.org/10.1007/BF00127844

  • DOI: https://doi.org/10.1007/BF00127844
