Statistics and Computing, Volume 6, Issue 1, pp 37–49

A review of parallel processing for statistical computation

  • N. M. Adams
  • S. P. J. Kirby
  • P. Harris
  • D. B. Clegg

Abstract

Parallel computers differ from conventional serial computers in that they can, in a variety of ways, perform more than one operation at a time. Parallel processing, the application of parallel computers, has been successfully utilized in many fields of science and technology. The purpose of this paper is to review efforts to use parallel processing for statistical computing. We present some technical background, followed by a review of the literature that relates parallel computing to statistics. The review material focuses explicitly on statistical methods and applications, rather than on conventional mathematical techniques. Thus, most of the review material is drawn from statistics publications. We conclude by discussing the nature of the review material and considering some possibilities for the future.

Keywords

Parallel processing; statistical computing; Flynn's taxonomy; parallel software tools; parallel performance

Copyright information

© Chapman & Hall 1996

Authors and Affiliations

  • N. M. Adams (1)
  • S. P. J. Kirby (1)
  • P. Harris (1)
  • D. B. Clegg (1)
  1. Department of Statistics, Faculty of Mathematics and Computing, The Open University, Walton Hall, Milton Keynes, UK

Personalised recommendations