Journal of Statistical Theory and Practice, Volume 6, Issue 3, pp 536–565

Parallel Statistical Computing for Statistical Inference

Guangbao Guo


Parallel statistical computing is an interesting and topical problem, driven by recent growth in the size of statistical data sets and the availability of network computing. This article reviews parallel statistical computing in regression analysis, nonparametric inference, and stochastic processes. In particular, we describe a range of methods including parallel multisplitting and the parallel QR method for least squares estimation in linear regression, parallel computing methods for nonlinear regression, the theoretical framework of the parallel bootstrap in nonparametric inference, preconditioner methods for Markov chains, and parallel Markov-chain Monte Carlo methods. We conclude that there is a need for further research in parallel statistical computing, and describe some of the important unsolved problems.
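The abstract mentions the parallel QR method for least-squares estimation in linear regression. The sketch below shows one common way to organize such a method, assuming the rows of the design matrix are partitioned across workers and reduced with a two-stage ("tall-skinny") block QR. It is an illustration under that assumption, not the specific algorithm reviewed in the article. It uses NumPy, with a thread pool standing in for the separate processors of a cluster. The function names `local_qr` and `parallel_qr_ols` and the parameter `n_blocks` are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def local_qr(X_block, y_block):
    """Reduce one row block [X_k | y_k] to its upper-triangular R factor."""
    A = np.column_stack([X_block, y_block])
    return np.linalg.qr(A, mode="r")  # mode="r" returns only R


def parallel_qr_ols(X, y, n_blocks=4):
    """Least-squares coefficients via a two-stage (TSQR-style) block QR.

    Stage 1: each worker QR-factorizes its own block of rows.
    Stage 2: the small stacked R factors are factorized once more.
    Because sum_k R_k^T R_k = [X | y]^T [X | y], the final R equals the
    R factor of the full augmented matrix up to row signs, so ordinary
    back-substitution recovers the least-squares estimate.
    Assumes X has full column rank.
    """
    p = X.shape[1]
    row_blocks = np.array_split(np.arange(X.shape[0]), n_blocks)
    # A thread pool stands in for separate processors (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        R_blocks = list(pool.map(lambda idx: local_qr(X[idx], y[idx]), row_blocks))
    R = np.linalg.qr(np.vstack(R_blocks), mode="r")
    # Partition R = [[R11, z], [0, rho]]; the estimate solves R11 beta = z.
    R11, z = R[:p, :p], R[:p, p]
    return np.linalg.solve(R11, z)


# Illustrative check against NumPy's direct least-squares solver.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
beta_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=1000)

beta_hat = parallel_qr_ols(X, y, n_blocks=4)
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_ref))
```

In this scheme only the small (p+1)×(p+1) triangular factors are exchanged between workers, so the communication cost does not grow with the number of observations. On an actual cluster the thread pool would typically be replaced by MPI or a similar message-passing layer.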

AMS Subject Classification

62G07 62G09 62J02 62J05 58J65 


Keywords

Nonparametric inference; Regression; Statistical computing; Stochastic processes


Copyright information

© Grace Scientific Publishing 2012

Authors and Affiliations

  1. Department of Statistics, Shandong University of Technology, Zibo, Shandong, China
