A review of parallel processing for statistical computation

Statistics and Computing

Abstract

Parallel computers differ from conventional serial computers in that they can, in a variety of ways, perform more than one operation at a time. Parallel processing, the application of parallel computers, has been successfully utilized in many fields of science and technology. The purpose of this paper is to review efforts to use parallel processing for statistical computing. We present some technical background, followed by a review of the literature that relates parallel computing to statistics. The review material focuses explicitly on statistical methods and applications, rather than on conventional mathematical techniques. Thus, most of the review material is drawn from statistics publications. We conclude by discussing the nature of the review material and considering some possibilities for the future.

Cite this article

Adams, N.M., Kirby, S.P.J., Harris, P. et al. A review of parallel processing for statistical computation. Stat Comput 6, 37–49 (1996). https://doi.org/10.1007/BF00161572

  • DOI: https://doi.org/10.1007/BF00161572
