Computational Economics, Volume 40, Issue 2, pp 151–182

Massively Parallel Computation Using Graphics Processors with Application to Optimal Experimentation in Dynamic Control

  • Sergei Morozov
  • Sudhanshu Mathur


Abstract

The rapid growth in the performance of graphics hardware, coupled with recent improvements in its programmability, has led to its adoption in many non-graphics applications, including a wide variety of scientific computing fields. At the same time, a number of important dynamic optimal policy problems in economics are hungry for computing power to help overcome the dual curses of complexity and dimensionality. We investigate whether computational economics may benefit from these new tools through a case study of an imperfect-information dynamic programming problem with a learning and experimentation trade-off, that is, a choice between controlling the policy target and learning system parameters. Specifically, we use a model of active learning and control of a linear autoregression with unknown slope, a model that has appeared in a variety of macroeconomic policy and other contexts. The endogeneity of posterior beliefs makes the problem difficult in that the value function need not be convex and the policy function need not be continuous. This complication makes the problem a suitable target for massively parallel computation using graphics processors (GPUs). Our findings are cautiously optimistic in that the new tools let us easily achieve a factor-of-15 performance gain relative to an implementation targeting single-core processors. Further gains up to a factor of 26 are also achievable but lie behind a learning and experimentation barrier of their own. Our experience with the CUDA programming architecture and GPUs provides general lessons on how best to exploit future trends in parallel computation in economics.
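To make concrete why this class of problems parallelizes well, the following minimal Python sketch shows the structure of synchronous value function iteration on a discretized state grid. The quadratic loss, linear transition, and grid sizes here are hypothetical illustrations, not the paper's model; the point is that each grid point's Bellman maximization depends only on the previous value function, so on a GPU the loop over states maps naturally to one thread per grid point.

```python
import numpy as np

def bellman_update(V, states, actions, beta):
    # One synchronous Bellman sweep. Each pass through the loop over states
    # is independent given the incoming value function V, which is what
    # lets a GPU assign one thread per grid point.
    V_new = np.empty_like(V)
    for i, s in enumerate(states):
        # Illustrative linear transition with quadratic loss (hypothetical):
        s_next = np.clip(0.9 * s + actions, states[0], states[-1])
        # Crude grid lookup of the continuation value (no interpolation)
        idx = np.clip(np.searchsorted(states, s_next), 0, len(states) - 1)
        V_new[i] = np.max(-(s ** 2 + 0.1 * actions ** 2) + beta * V[idx])
    return V_new

states = np.linspace(-1.0, 1.0, 201)   # discretized state grid
actions = np.linspace(-0.5, 0.5, 101)  # discretized control grid
V = np.zeros_like(states)
for _ in range(500):
    V_new = bellman_update(V, states, actions, beta=0.95)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
```

Because the per-state maximizations share no mutable data within a sweep, the only synchronization point is the hand-off of `V_new` between iterations, which is the same pattern a CUDA kernel launch per Bellman sweep would follow.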


Keywords

Graphics processing units · CUDA programming · Dynamic programming · Learning · Experimentation

JEL Classification

C630 C800 





Copyright information

© Springer Science+Business Media, LLC. 2011

Authors and Affiliations

  1. Morgan Stanley, New York, USA
  2. Deutsche Bank, Mumbai, India
