A Method to Pre-compile Numerical Integrals When Solving Stochastic Dynamic Problems

We show how the interpolation step of numerical integration can be pre-compiled in partial equilibrium stochastic dynamic problems. We present the pre-compilation's sufficient conditions and document its speed gains using a consumption-savings model with a discrete labour choice, wage uncertainty and stochastic non-labour income.

compiling numerical integrals and the cost of accessing the pre-compiled objects. An underlying assumption is that linear interpolation is used. Under this assumption, (the output of) search algorithms and the relative weights do not depend on the integrand.
An advantage of this procedure is that it can supplement widely used solution algorithms such as Value Function Iteration (VFI) and the Endogenous Grid Method extended to account for non-convexities (G²EGM). The procedure's relative speed gains will be largest for computationally expensive, estimated, high-dimensional problems that feature multiple continuous state variables with complex laws of motion. Such models are common in the applied literature (see French & Jones, 2011; Blundell et al., 2016 for estimated non-convex partial equilibrium models that feature multiple dimensions).
To assess the pre-compilation procedure, we consider a non-convex² consumption-savings model with a discrete labour choice, wage uncertainty and stochastic non-labour income. We solve the model using G²EGM and VFI combined with linear interpolation and Gauss-Hermite quadrature integration. We consider both solution algorithms as G²EGM is not always applicable.³ G²EGM supplemented with the pre-compilation procedure is approximately 3 times faster than standard G²EGM. When, in addition to the pre-compilation procedure, the value and policy functions are parametrically approximated by a piecewise multilinear polynomial, the procedure reduces the solution time by an additional 40%. This paper is related to the literature that aims to reduce the cost of numerical integration via pre-compilation. Druedahl (2020) reduces the cost of numerical integration using a vectorized interpolation scheme that mitigates the use of search algorithms, without requiring a pre-compilation step. Judd et al. (2017) utilize parametric approximations and propose a method that pre-computes conditional expectations, subject to the functional form of the approximating function.⁴ The procedure in this paper pre-compiles the interpolation step of numerical integration, without requiring functional form assumptions.
The paper is structured as follows. Section 2 presents a simple application of the proposed procedure. Section 3 presents the non-convex model. Section 4 presents results on performance and accuracy. Section 5 presents the procedure's sufficient conditions and discusses its limitations. Section 6 extends the procedure to piecewise multilinear polynomial parametric approximations. Section 7 concludes.
² As the problem considered is non-convex, a process that eliminates non-global optima is required. To address this, the EGM solution algorithm of Carroll (2006) is complemented with the upper envelope of Druedahl and Jørgensen (2017). See Fella (2014) and Iskhakov et al. (2017) for alternative upper envelopes.
³ See Druedahl and Jørgensen (2017) for the necessary and sufficient conditions.
⁴ For example, if a function V(k') is approximated by a linear polynomial, its expectation is E[V(k')] ≈ E[a_0 + a_1 k'] = a_0 + a_1 E[k'] = a_0 + a_1 μ_k, where μ_k does not depend on V and is computed up front. When using local curve fitting however (e.g. a collection of piecewise polynomial approximations), search algorithms might still be required. Section 6 discusses how these search algorithms can be avoided.
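The pre-computation described in footnote 4 can be sketched in a few lines. The snippet below is a minimal Python illustration (the paper's own code is in Julia); the coefficients and shock parameters are illustrative. The mean μ_k is computed once up front and reused for any linear approximation of V, with no re-integration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Draws of next-period capital k'; the log-normal parameters are illustrative.
kp_draws = rng.lognormal(mean=0.0, sigma=0.1, size=100_000)
mu_k = kp_draws.mean()  # pre-computed once, independent of the integrand V

def expected_linear_v(a0, a1):
    # E[a0 + a1*k'] = a0 + a1*E[k'] = a0 + a1*mu_k: no re-integration needed.
    return a0 + a1 * mu_k

# Check against direct integration for one set of coefficients.
a0, a1 = 0.3, 1.7
direct = np.mean(a0 + a1 * kp_draws)
assert abs(expected_linear_v(a0, a1) - direct) < 1e-9
```

Because expectation is linear, the two routes agree exactly (up to floating-point error) for any coefficients.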

2 A Simple Univariate-Continuous Example
To display the intuition behind the pre-compilation procedure, consider the example of a researcher who wants to evaluate the continuation value E[V(m')] = E[V(Ra' + y')] for N different values of savings {a'_i}_{i=1}^{N}. In this example R is the exogenous gross return and y' is a standard log-normal iid income shock. The function V potentially has no closed form and the researcher only knows the values of V on a finite set of points {m'_j}_{j=1}^{n}, where m'_1 < ··· < m'_n. Conditional on a value a' ∈ {a'_i}_{i=1}^{N}, he can evaluate each expectation by numerical integration as a weighted sum of the interpolated function V.
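This standard evaluation can be sketched as follows, in Python rather than the paper's Julia, with illustrative grids and parameter values. For each Gauss-Hermite node, a binary search (here `np.searchsorted`) brackets the point Ra' + y_j and the two bracketing values of V are combined linearly; extrapolation is avoided by clipping, as in the example.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

R, mu_y, sig_y = 1.03, 0.0, 0.1          # illustrative parameter values
a_grid = np.linspace(0.0, 10.0, 100)     # savings grid {a'_i}
m_grid = np.linspace(0.05, 15.0, 200)    # grid {m'_j} on which V is known

# Gauss-Hermite nodes/weights, with the change of variables for a normal shock.
x, w = hermgauss(8)
y_nodes = np.exp(mu_y + np.sqrt(2.0) * sig_y * x)   # log-normal income nodes
q_weights = w / np.sqrt(np.pi)                      # quadrature weights (sum to 1)

def expectation(v):
    """E[V(R a' + y')] for each a', via search + linear interpolation."""
    ev = np.zeros_like(a_grid)
    for y, wq in zip(y_nodes, q_weights):
        mp = np.clip(R * a_grid + y, m_grid[0], m_grid[-1])  # ignore extrapolation
        k = np.searchsorted(m_grid, mp) - 1      # binary search for bracketing index
        k = np.clip(k, 0, len(m_grid) - 2)
        t = (mp - m_grid[k]) / (m_grid[k + 1] - m_grid[k])   # relative weight
        ev += wq * ((1 - t) * v[k] + t * v[k + 1])
    return ev

v = -1.0 / m_grid          # an example integrand V evaluated on the m'-grid
ev = expectation(v)
```

Note that the searches and relative weights inside `expectation` are recomputed on every call even though they never depend on `v` — the waste the pre-compilation procedure removes.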
Equation (1) displays algebraically how numerical integration works. The integrand is evaluated at finitely many integration nodes and a weighted sum of these values is used to approximate the true integral. In this example {w_j}_{j=0}^{J} and {Ra' + y_j}_{j=0}^{J} correspond to the integration weights and nodes respectively.⁵ Interpolated values are denoted by V̂ and define v = (v_1, ..., v_n), where v_j = V(m'_j). To obtain V̂(Ra' + y_j), it is standard practice to use a search algorithm (e.g. binary search) to locate an index k_j such that m'_{k_j} ≤ Ra' + y_j < m'_{k_j + 1}, for each discretised node j. Then, linear interpolation can be used to evaluate V̂(Ra' + y_j) as a convex combination of v_{k_j} and v_{k_j + 1}:

    E[V(Ra' + y')] ≈ Σ_{j=0}^{J} w_j V̂(Ra' + y_j) = Σ_{j=0}^{J} w_j (w*_{j,0} v_{k_j} + w*_{j,1} v_{k_j + 1}).    (1)

Notice however that neither the search algorithm nor the calculation of the relative weights {w*_{j,0}, w*_{j,1}} requires any information on the function V. They only require knowledge of the finite sets {a'_i}_{i=1}^{N}, {y_j}_{j=0}^{J}, {m'_j}_{j=1}^{n} and the law of motion of the state variable (m' = Ra' + y'). Thus, both can be pre-compiled (as a fixed cost paid up front) and expectations can then be computed for any function V, circumventing law of motion evaluations, the use of search algorithms and relative weight calculations.⁶
To assess the speed gains the researcher obtains by pre-compiling numerical integrals in this example, suppose that he now wants to evaluate a sequence of functions {V_s}_{s=1}^{T}. Algorithm 1 presents the standard expectation evaluation method.⁷ Algorithm 2 presents a shortened version, once search algorithms and relative weights have been pre-compiled, and Algorithm 3 displays the pre-compilation step. The researcher can either use Algorithm 1 T times, or Algorithm 3 once and then Algorithm 2 T times.
⁵ {w_j}_{j=0}^{J} and {y_j}_{j=0}^{J} are the corresponding Gauss-Hermite quadrature weights and nodes, see Judd (1998). Extrapolations are ignored in this example to facilitate exposition.
The difference between the two approaches is that the latter trades the cost of evaluating the law of motion, performing search algorithms and relative weight calculations (lines 4-6 in Algorithm 1) for the cost of calling elements from the pre-compiled matrices (line 4 in Algorithm 2), the memory requirement of storing these matrices and the cost of performing Algorithm 3 once up front. Also, note that the expectation evaluation steps (lines 4-7 in Algorithm 1 vs lines 5-7 in Algorithm 3 and line 5 in Algorithm 2) are identical and thus results and approximation errors will be identical as well.
Figure 1 shows the solution time needed to evaluate the sequence of integrands {V_s}_{s=1}^{T} with Algorithm 1 (without pre-compilation) and with the combination of Algorithms 3 and 2 (with pre-compilation), for different values of the parameters T, N, n, J. The time difference is not affected by the grid size of the state space n, but it is increasing with respect to the parameters T, N, J that affect the total number of law of motion evaluations, search algorithms and relative weight calculations. Thus, in computationally expensive multi-dimensional problems that require the evaluation of a large number of expectations and have state variables with complex laws of motion, the proposed procedure will provide large gains.
⁶ In linear algebra form, define w* = (w*_{1,0}, ..., w*_{J,0}, w*_{1,1}, ..., w*_{J,1}), κ = (k_1, ..., k_J, k_1 + 1, ..., k_J + 1) and B (2J × n) as a binary matrix where B(i, j) = 1 if κ_i = j and zero otherwise. Then, the interpolated values can be obtained from the elementwise product of w* with Bv, so that V̂(Ra' + y_j) = w*_{j,0} (Bv)_j + w*_{j,1} (Bv)_{J+j}.
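The pre-compilation and re-use steps can be sketched as follows (a self-contained Python illustration with made-up grids and parameters, not the paper's Julia code). The first part mirrors Algorithm 3: the bracketing indices and relative weights are stored once. The second part mirrors Algorithm 2: any integrand is then integrated by pure array lookups, with no law of motion evaluations and no searches.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

R, mu_y, sig_y = 1.03, 0.0, 0.1          # illustrative parameter values
a_grid = np.linspace(0.0, 10.0, 100)     # savings grid {a'_i}
m_grid = np.linspace(0.05, 15.0, 200)    # grid {m'_j} on which V is known
x, w = hermgauss(8)
y_nodes = np.exp(mu_y + np.sqrt(2.0) * sig_y * x)
q_weights = w / np.sqrt(np.pi)

# --- Pre-compilation (Algorithm 3): paid once, independent of the integrand ---
mp = np.clip(R * a_grid[:, None] + y_nodes[None, :], m_grid[0], m_grid[-1])
K = np.clip(np.searchsorted(m_grid, mp) - 1, 0, len(m_grid) - 2)  # (N, J) indices
T = (mp - m_grid[K]) / (m_grid[K + 1] - m_grid[K])                # (N, J) weights

# --- Evaluation (Algorithm 2): array lookups only, valid for any integrand ---
def expectation_precompiled(v):
    interp = (1 - T) * v[K] + T * v[K + 1]   # interpolated values at all nodes
    return interp @ q_weights                # weighted sum over shock nodes

v = -1.0 / m_grid           # an example integrand V on the m'-grid
ev = expectation_precompiled(v)
```

Apart from summation order, the result matches the un-precompiled evaluation for every integrand, since the same indices and weights enter the same weighted sum.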

3 The Non-convex Model
To assess the performance of the proposed procedure in a stochastic dynamic problem, consider the following non-convex model. An infinitely lived agent maximizes his expected lifetime utility by making a consumption (c) / savings (a') decision and a discrete labour supply (h) decision every period. When making these decisions, he takes into account his non-labour cash on hand (m) and his productivity (e), which follows an AR(1) process in its logarithm. The recursive formulation of the problem is

    V(m, e) = max_{c, a', h} { u(c, h) + βE[V(m', e')] }
    s.t. c + a' = m + w·e·h,  a' ≥ 0,
    m' = Ra' + y',  e' = e^ρ · exp(u'_e),
    u'_e ~ N(μ_e, σ_e²),  ln(y') ~ N(μ_y, σ_y²),

where R is the exogenous return, w is the exogenous wage rate, ρ is a correlation coefficient and y is a log-normal iid non-labour income shock. Next period's variables are denoted with a prime. The grid sizes are #(m) = 300, #(e) = 150, #(h) = 3, #(a') = 300, where the function #(x) returns the number of nodes in the array x. The random variables for productivity and non-labour income are integrated using a Gauss-Hermite quadrature with 10 nodes. Interpolations are performed by multidimensional linear interpolation and, to increase accuracy, (1) the value function is transformed by f(x) = -1/x (to avoid infinities and reduce the function's curvature near the origin) and (2) next period's functions are interpolated for every choice of labour supply and the highest-utility option is selected. The convergence tolerance level is set to 10^-8, measured by the Euclidean distance of the continuation value matrix (see Appendix, Part C). Detailed information on the grid construction is presented in the Appendix.
(Figure 1 notes: The sequence of integrands {V_s}_{s=1}^{T} is generated by V_s(x) = x^(-3-α_s)/(-3-α_s), where α_s ~ U(0, 1) for all s ∈ {1, ..., T}. Solution times (medians) were obtained using Julia's BenchmarkTools package. In each subplot, the parameters that do not vary are held fixed at T = 200, n = 100, N = 100, J = 8.)
The utility function (2) is additively separable between consumption and leisure and the parameter values are set to β = 0.98, γ = 4.0, σ = (2.5, 1.5), ρ = 0.977, σ_y = 0.1, σ_e = 0.01, R = 1.03, w = 1.1, v = 0.5, μ_e = 0, μ_y = ln(0.9). The pre-compilation Algorithm and a brief outline of the solution Algorithm are located in the Appendix.

4 Performance and Accuracy
This subsection presents performance and accuracy results for a stochastic simulation of N = 100,000 individuals over T = 30 time periods.⁸ To measure accuracy, following Judd (1992), Santos (1999) and Santos (2000), the common logarithm of the relative absolute Euler errors (E) is considered. As the Euler equation need not hold on the constrained region, I impose a lower bound on savings, a̲ = 0.002. G²EGM requires 747 iterations to converge, with relative absolute Euler errors of E = -5.56, for σ = 2.5, and 736 iterations, with E = -5.81, for σ = 1.5. VFI requires 822 iterations to converge, with E = -1.68, for σ = 2.5 and 762 iterations, with E = -1.72, for σ = 1.5. Table 1 displays the speed and accuracy results for three different approaches (using both G²EGM and VFI). The first is the standard solution algorithm, the second is the algorithm supplemented with the vectorized interpolation scheme of Druedahl (2020) and the third is the algorithm supplemented with the pre-compilation procedure. Standard G²EGM solves the problem in 325.30 s for σ = 2.5 and 318.44 s for σ = 1.5, while standard VFI solves the problem in 951.09 s for σ = 2.5 and 607.54 s for σ = 1.5. Using the pre-compilation procedure makes the solution algorithm roughly 2.8 times faster for G²EGM (and 1.5 times faster for VFI). Note that the model of Sect. 3 is low-dimensional and each state variable has a simple law of motion. We expect that the pre-compilation procedure will provide larger relative speed gains in high-dimensional problems, where the laws of motion for the state variables feature complex functions. Examples of such problems are retirement problems with endogenous labour supply that model the institutional features of social security.
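For concreteness, a relative absolute Euler error at a single simulated state can be computed along the following lines. This is a minimal Python sketch, not the paper's code: it assumes CRRA marginal utility u'(c) = c^(-σ) with the Section 3 values β = 0.98, R = 1.03, and conditions on the chosen labour supply; the function name and interface are illustrative.

```python
import numpy as np

beta, R, sigma = 0.98, 1.03, 2.5   # parameter values from Section 3

def log10_rel_euler_error(c_today, c_next_nodes, q_weights):
    """Common logarithm of the relative absolute Euler error at one state.

    c_next_nodes holds next-period consumption at each quadrature node.
    Inverting u'(c) = c^(-sigma) in the Euler equation
    u'(c) = beta*R*E[u'(c')] gives the implied consumption level."""
    emuc = q_weights @ c_next_nodes ** (-sigma)       # E[u'(c')]
    c_implied = (beta * R * emuc) ** (-1.0 / sigma)
    return np.log10(abs(c_implied / c_today - 1.0))

# Example: a policy that consumes 1 today and 1 at every node tomorrow.
err = log10_rel_euler_error(1.0, np.ones(5), np.full(5, 0.2))
```

Averaging (or taking the maximum of) these errors over the simulated states, excluding states at the borrowing constraint, yields the statistics reported in Table 1.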
These problems often include state variables with laws of motion that mimic aspects of the tax and benefit system and can take days to estimate. Thus, the relative speed gains reported in this Section should be treated as a lower bound for larger and more complex models.
Using the vectorized interpolation scheme of Druedahl (2020) makes the solution algorithm roughly 2.6 times faster for G²EGM (and 1.4 times faster for VFI). This method is more generally applicable (partial and general equilibrium) and does not impose a memory requirement. We expect that this method will likely outperform the pre-compilation procedure in problems without multiple continuous dimensions and where the laws of motion for the state variables are not computationally expensive to evaluate. All three approaches generate the same results and thus the same values for the relative absolute Euler errors. Figure 2 shows how the relative absolute Euler errors evolve as a function of the grid size.
(Table 1 notes: Results were computed using an AMD Ryzen 9 3950X 16-Core Processor (single-threaded implementation). The code was written and executed using Julia (JuliaPro-1.5.1-1). Solution times (medians) were obtained using Julia's BenchmarkTools package.)

5 Discussion
The proposed procedure can be applied to any finite-dimensional, discrete-continuous stochastic dynamic problem, as long as the four conditions of Proposition 1 are satisfied. In practice, multiple solution algorithms, such as standard VFI and G²EGM,⁹ satisfy these conditions in partial equilibrium problems.
Proposition 1 If (i) the state space grid, (ii) the choice grid, (iii) the shock grid and (iv) the law of motion of the state variables are all exogenous, then numerical integrals can be pre-compiled.

Proof See Appendix. □
The intuition is that we need sufficient information to determine which points will be interpolated, in order to be able to perform these operations up front. An underlying assumption behind Proposition 1 is that multidimensional linear interpolation is used. A weaker version of this proposition can be extended to alternative interpolation schemes; however, the relative weights may then depend on the integrand and will have to be evaluated during the model's solution.
(Figure 2 caption: Common logarithm of the relative absolute Euler errors (E). Notes: This figure shows the accuracy of G²EGM and VFI for various grid sizes. A grid size scale of x implies that the grid sizes are x · #(z), for z ∈ {m, a, e}.)
It is important to emphasize that the procedure pre-compiles the interpolation step of the numerical integrals associated with computing continuation values. It will not pre-compile every interpolation that occurs during a solution algorithm (e.g. interpolations that take place in the upper envelope step to eliminate non-global maxima).
When a numerical integral is pre-compiled, objects that are required for its evaluation, such as indices and relative weights, have already been calculated and are stored in arrays. Using these arrays imposes two limitations: the memory access cost and the RAM they occupy, both of which are increasing in the size of the arrays. An approximate upper bound (in megabytes) for the RAM requirement of these arrays can be evaluated by

    RAM ≤ 2 · 8 · Σ_{k=1}^{K} #(I_k) / 1024²,

where the function #(x) returns the number of cells in the array x and {I_k}_{k=1}^{K} is a set that contains the arrays that store the indices. For every array, we count the number of cells and then adjust for 8 bytes per cell,¹⁰ times two (one array for the indices and one for the relative weights). The denominator changes the units from bytes to megabytes. In problems with multiple continuous state variables, and where multiple choice, state and shock variables enter their laws of motion, the memory requirement will be larger. In the Appendix, we consider a six state-variable life-cycle retirement problem and show that the pre-compiled arrays require at most 9.1 MB (0.0091 GB), a small cost compared to the memory requirement of the arrays that store the value and policy functions. Judd et al. (2017) utilize parametric approximations and display how to pre-compute conditional expectations, subject to the functional form of an approximating polynomial function. When using local curve fitting however (e.g. a collection of piecewise polynomial approximations), in the general case, law of motion evaluations and search algorithms are still required to locate the hypercube that contains the point to be evaluated, and the surrounding vertices are used to calculate the approximating function's coefficients.
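The bound is straightforward to evaluate. The short Python sketch below uses hypothetical index-array dimensions (a savings grid times a state grid times a number of quadrature nodes); they are illustrative, not those of the Appendix problem.

```python
# Cells in each pre-compiled index array I_k (hypothetical dimensions:
# savings grid x state grid x quadrature nodes).
index_array_cells = [300 * 300 * 10,
                     300 * 150 * 10]

total_cells = sum(index_array_cells)
# 8 bytes per cell, times two (one array of indices, one of relative weights),
# divided by 1024^2 to convert bytes to megabytes.
ram_mb = 2 * 8 * total_cells / 1024**2
print(f"approximate upper bound: {ram_mb:.1f} MB")
```

For these illustrative dimensions the bound is on the order of 20 MB, small relative to the value and policy function arrays of a comparably sized problem.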

6 Extension to Parametric Approximations
This Section presents how the known link between multilinear polynomials and multidimensional linear interpolation can be utilized to pre-compile conditional expectations and further increase the speed of a solution algorithm. Parametrically approximating the value function of the model in Sect. 3, using a piecewise multilinear polynomial on the rectangular mesh [m_1, ..., m_#(m)] × [e_1, ..., e_#(e)], yields

    E[V(m', e')] ≈ E[V̂_{[m_z, m_{z+1}] × [e_s, e_{s+1}]}(m', e'; a)],

where {a_j}_{j=0}^{3} are the coefficients of the polynomial. Notice however that the resulting function is a deterministic multilinear polynomial, and it is known that it will deliver the same value as bilinear interpolation at the point (Ra' + E[y'], e^ρ E[exp(u'_e)]). Thus, instead of calculating the {a_j}_{j=0}^{3} coefficients during the model's iterative solution cycle,¹¹ we can use the pre-compiled bilinear interpolation and obtain the same results.
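The equivalence rests on multilinearity plus independence of the shocks: for a bilinear piece a_0 + a_1 m + a_2 e + a_3 m e, independence gives E[m'e'] = E[m']E[e'], so the expectation of the polynomial equals the polynomial evaluated at (E[m'], E[e']). A small Python check with illustrative coefficients and discrete shock distributions (assumed to put all their mass inside one rectangle of the mesh):

```python
import numpy as np

a = np.array([0.2, 1.1, -0.7, 0.3])        # illustrative bilinear coefficients

def bilinear(mv, ev):
    return a[0] + a[1] * mv + a[2] * ev + a[3] * mv * ev

# Independent discrete shock distributions (e.g. quadrature nodes/weights).
m_nodes, m_w = np.array([1.1, 1.5, 1.9]), np.array([0.25, 0.5, 0.25])
e_nodes, e_w = np.array([0.6, 1.0, 1.4]), np.array([0.3, 0.4, 0.3])

# Expectation of the bilinear piece computed node by node ...
lhs = sum(wm * we * bilinear(mn, en)
          for mn, wm in zip(m_nodes, m_w)
          for en, we in zip(e_nodes, e_w))
# ... equals a single evaluation at the expected point.
rhs = bilinear(m_w @ m_nodes, e_w @ e_nodes)
assert abs(lhs - rhs) < 1e-12
```

This is why a single pre-compiled bilinear interpolation at the expected point can replace the coefficient calculations inside the iterative solution cycle.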
Approximating the value and consumption functions of the problem in Sect. 3 using a piecewise multilinear polynomial, G²EGM requires 748 iterations to converge, with relative absolute Euler errors of E = -4.10, for σ = 2.5, and 734 iterations, with E = -4.13, for σ = 1.5. Table 2 displays the speed and accuracy¹² results for the two approaches.
Using the pre-compilation procedure reduces the running time by about 40%. The relative time difference is smaller than the one reported in Sect. 4 because, in this case, the loops occur over savings and productivity (not over the shocks), and thus there are fewer law of motion evaluations, search algorithms and coefficient (instead of relative weight) evaluations. Figure 3 shows a comparison of the solutions with and without parametric approximations.
For further error analysis when using parametric approximations, the reader is advised to see Judd et al. (2017), where this topic is assessed extensively. Unlike this paper, they use an alternative procedure where inferences about the problem's accuracy are not contaminated by numerical integration error during the accuracy evaluation algorithm.

7 Conclusions
This paper presents a parsimonious pre-compilation procedure for partial equilibrium stochastic dynamic problems. The pre-compilation's speed gains were assessed using a consumption-savings model with a discrete labour choice, wage uncertainty and stochastic non-labour income. The procedure's relative speed gains will be larger for computationally expensive, estimated, high-dimensional problems that feature multiple continuous state variables with complex laws of motion. This procedure should assist researchers in estimating and performing counterfactual analysis with high-dimensional models.

Part E. Life-cycle retirement problem RAM requirement
Consider a six state-variable retirement problem. This problem is a simplified version of French and Jones (2011). The problem has multiple continuous state variables, with fairly complex laws of motion. The agent works from age 21 to age 65, retires at age 66 and dies at age 100. The variable m is non-labour cash on hand, g is health status, s is medical expenditures, w is the wage rate, AIME is average indexed monthly earnings and t is age. There are four continuous variables (m, s, w, AIME), the variable t ∈ {21, ..., 100} is discrete and g ∈ {good, bad} is binary. For simplicity, I assume that the social security system has a replacement rate of 30% and abstract from specifying model restrictions that are not relevant to the memory calculation. The recursive formulations for the worker (t ≤ 65) and retiree problems take the form

    V(m, g, s, w, AIME, t) = max_{c, a', h} { u(c, h, g) + βE[V(m', g', s', w', AIME', t + 1)] }.

The resulting RAM requirement is at most 9.10 MB, a small cost compared to the RAM requirement for the worker's savings policy function (582.4 MB).