Solving High-Dimensional Dynamic Portfolio Choice Models with Hierarchical B-Splines on Sparse Grids

Discrete time dynamic programming to solve dynamic portfolio choice models has three immanent issues: firstly, the curse of dimensionality prohibits more than a handful of continuous states. Secondly, in higher dimensions, even regular sparse grid discretizations need too many grid points for sufficiently accurate approximations of the value function. Thirdly, the models usually require continuous control variables, and hence gradient-based optimization with smooth approximations of the value function is necessary to obtain accurate solutions to the optimization problem. For the first time, we enable accurate and fast numerical solutions with gradient-based optimization while still allowing for spatial adaptivity using hierarchical B-splines on sparse grids. When compared to the standard linear bases on sparse grids or finite difference approximations of the gradient, our approach saves an order of magnitude in total computational complexity for a representative dynamic portfolio choice model with varying state space dimensionality, stochastic sample space, and choice variables.


3 Introduction
A common approach to solve dynamic portfolio choice models in discrete time is dynamic programming, iterating over the value function backwards in time. Starting from the known value function at final time T, the value function is approximated on a state space grid, assuming that the state space is continuous. To determine the next iterate of the value function at each grid point at time T − 1 we have to solve an optimization problem that depends on the value function of the previous iterate at time T. When a tensor product approximation is used, this approach suffers from the curse of dimensionality as the number of grid points of the approximation grows exponentially with the dimensionality of the state space. In addition, solving for the current value function iterate at a grid point relies on an accurate solution of the underlying optimization problem. When the portfolio choice is continuous, e.g., choosing the investment amount in stocks, bonds, etc., the computation of the optimal solution can be greatly accelerated by gradient-based optimization routines if the gradient of the objective function is available.
Recently, sparse grids have been successfully employed to break the curse of dimensionality in high-dimensional dynamic models (Brumm and Scheidegger 2017;Judd et al. 2014;Schober 2018;Winschel and Krätzig 2010). 1 A standard d-dimensional tensor product grid on the unit hypercube [0, 1] d with mesh size 2 −n , n ∈ ℕ , and no points on the boundary contains 2 n − 1 grid points per coordinate direction and thus O(2 nd ) points in total, growing exponentially with the dimensionality d. In contrast, a regular sparse grid with the same mesh size contains only O(2 n n d−1 ) points. The error of the sparse grid approximation of a function with homogeneous boundary conditions using piecewise linear basis functions is O(2 −2n n d−1 ) with respect to the L 2 and L ∞ norm if the approximated function has bounded mixed second derivatives (Bungartz and Griebel 2004;Zenger 1991). This is only slightly worse than the corresponding error O(2 −2n ) for the case of full tensor product grids.
In higher dimensions, even regular sparse grids need too many grid points for a sufficiently accurate approximation when solving high-dimensional dynamic models (Brumm and Scheidegger 2017). Fortunately, for approximations in the standard piecewise linear basis, the hierarchical structure of sparse grids allows for spatially adaptive refinement of the grid by inserting the 2d children of only certain leaves in the hierarchical structure. Spatially adaptive refinement has successfully been employed to solve high-dimensional dynamic models by Brumm and Scheidegger (2017) and Schober (2018). Unfortunately, approximations of the value function using the standard piecewise linear basis are not continuously differentiable and, hence, have discontinuous gradients. This poses a problem to gradient-based optimization techniques, which rely on a twice continuously differentiable approximation of the objective function to ensure convergence (Schober 2018).

3
Solving High-Dimensional Dynamic Portfolio Choice Models… Global polynomial approximations have shown to work well with value function iteration and continuous choices for solving dynamic economic models Judd et al. 2014) as they are globally smooth. Smolyak's formula can be used to construct sparse grid approximations (Barthelmann et al. 2000) on global polynomial bases, which can be refined adaptively with regard to specific dimensions of the state space (Judd et al. 2014) and with regard to the hierarchical surpluses, i.e., locally adaptively (Stoyanov 2017). Value function iteration with the use of gradient information to approximate the value function more accurately with global polynomials is also possible .
However, B-splines are much more flexible than global polynomials (Valentin and Pflüger 2016;Valentin 2019). While global polynomial approximations are bound to certain grid structures to avoid Runge's phenomenon or similar issues, B-spline basis functions can be employed on any nested spatially adaptive grid hierarchy. They allow for simultaneous local-and degree-adaptive refinement (hpadaptivity), implying that one could use a smaller or larger mesh size and/or degree of the B-spline basis functions in certain regions of the state space, e.g., to resolve kinks. In addition, the local basis functions are faster to evaluate than the conventional global polynomial basis functions. Approximations with B-splines of cubic degree (or higher) are twice continuously differentiable, and readily supply smooth and explicit approximations of both, the value function and the gradient. Compared to approximating the derivatives with finite differences, the optimization is not only more accurate but also significantly faster, especially when the number of optimization variables is large (Valentin 2019). B-splines have thus proven useful for computing numerical solutions to numerous dynamic models when finding the root of the gradient is required (Chu et al. 2013;Habermann and Kindermann 2007;Judd and Solnick 1994;Philbrick and Kitanidis 2001).
In total, three issues with discrete time dynamic programming for dynamic portfolio choice models with continuous choices emerge: the curse of dimensionality, the lack of spatial adaptivity, and the lack of continuous gradients. It is apparent that current economic literature deals with these issues only in isolation, e.g., by combining sparse grids with global polynomial basis functions, or using sparse grids with non-smooth local linear basis functions to allow for spatial adaptivity. These approaches are hence computationally inefficient in accurately solving high-dimensional dynamic portfolio choice models or any high-dimensional dynamic economic model that requires smooth approximations or gradient-based optimization.
This paper is the first to address all of these issues at once by combining hierarchical B-splines with sparse grids to approximate the value function and its gradient. Thus, we enable accurate and fast numerical solutions using gradient-based optimization while still allowing for spatial adaptivity (Pflüger 2010;Valentin and Pflüger 2016). The hierarchical grid structure allows us to develop an algorithm that uses the local adaptivity similar to Brumm and Scheidegger (2017) and Schober (2018), but interpolates the value function and its gradient with a B-spline basis. Therefore, we create a sparse grid for the value function, for which we interpolate the value function in the piecewise linear basis. We then refine the grid using the standard hierarchical surplus-volume-based refinement criterion. Finally, we interpolate the value function with hierarchical B-spline basis functions on the spatially adaptively refined sparse grid.
We focus our study on the numerical accuracy of our approach. Therefore, we choose a dynamic portfolio choice model with multiple stocks, one bond, and consumption. For buying and selling the stocks, linear transaction costs have to be deducted. The resulting optimization problem is high-dimensional in terms of the state space, stochastic sample space, and choice variables. Hence, this problem is especially suited for a complexity analysis. At the same time, this model is similar to models from a vast strand of state-of-the-art literature on dynamic portfolio choice, e.g., Barberis andHuang (2009), Cocco et al. (2005), De Giorgi and Legg (2012), Horneff et al. (2010), Horneff et al. (2008), Hubener et al. (2016, Hubener et al. (2014), Inkmann et al. (2011). Consequently, our approach can be generalized to a broad class of dynamic portfolio choice models with only minor modifications.
Dynamic portfolio choice models with transaction costs have been studied economically, e.g., by Abrams and Karmarkar (1980), Kamin (1975), Liu and Loewenstein (2002), Magill and Constantinides (1976), and extensively numerically by Cai (2009), Cai andJudd (2010), , Cai et al. (2020). The latter report computational times and economic solutions for higher-dimensional transaction costs problems. They employ polynomial interpolation with only few polynomial nodes and parallelization to solve these problems with and without consumption using value function iteration in discrete time. Cai et al. (2020) present convergence results and computational times for the three-dimensional transaction costs problem with consumption and numerical errors for the four-dimensional problem using complete Chebyshev polynomials to approximate the value function. However, only we have employed spatially adaptive sparse grids to the transaction costs problem, which induces optimization on continuous choices (Schober 2018). We also apply local adaptivity to compute the optimal policies from the solution to the underlying optimization problem as earlier suggested by us (Schober 2018) and Brumm and Grill (2014).
A complexity analysis reveals that cubic B-splines save more than one order of magnitude in computational effort compared to the state of the art with the linear basis (Brumm and Scheidegger 2017) and/or finite difference approximations of the gradient on regular sparse grids. Using spatially adaptive refinement of the optimal policy, we solve the problem for up to five dimensions where highly accurate solutions on regular sparse grids would require hundreds of thousands of grid points, and are thus no longer suitable. Here, spatially adaptive refinement allows a comparably low base resolution in the solution process of the value function, and, in a second step, adds grid points in the optimal policies where required. By this, we obtain low reported unit-free Euler equation errors for the transaction costs problem.
The rest of this article is structured as follows: In Sect. 2, we define the general class of dynamic portfolio choice models for which our approach is applicable. Section 3 introduces hierarchical B-splines on spatially adaptive sparse grids, leading to the definition of hierarchical weakly fundamental not-a-knot splines. Algorithms for solving dynamic portfolio choice models with B-splines on spatially adaptive sparse grids are discussed in Sect. 4. We analyze the complexity and demonstrate the numerical accuracy of our approach solving the transaction costs problem in Sect. 5 before concluding in Sect. 6.

Discrete Time Dynamic Portfolio Choice Models
We consider discrete time dynamic portfolio choice models with finite time horizon T in which the investor seeks to maximize additive expected life-time utility u from consumption c t : Here, she has m p continuous choices p t ∈ ⊂ ℝ m p (e.g., investment amounts in stocks and bonds) with respect to d continuous states x t ∈ ⊂ ℝ d (e.g., current financial wealth or labor income) she can reside in at time t. In addition, the transition from state x t to x t+1 does not only depend on her choices and her state, but may also be subject to m random shocks t ∈ Z (such as stock returns or labor income shocks), which are drawn from the sample space Z ⊂ ℝ m . The random variable , then describes the continuous state dynamics between t and t + 1 . We denote by < 1 the subjective time discount factor, and we assume the utility function to be of Constant Relative Risk Aversion type with risk aversion > 1: It is also straightforward to include discrete choices (compare also, e.g., Brumm and Scheidegger 2017) in the trivial way and discrete states (Schober 2018) in this model setup. Furthermore, the model can be generalized to further utility functions, e.g., to Epstein and Zin (1989) utility and to utility functions with a narrow framing component (Barberis and Huang 2009). In this paper, we disregard these modeling choices purely for simplicity.
By the Bellman principle (Bellman 1954), this utility maximization problem can be reformulated in terms of the value function j t for t = 0, … , T with known terminal utility v subject to the m g possibly non-linear inequality constraints g t ∶ × → ℝ m g : For two vectors a, b , we define a ≥ b if a i ≥ b i for all i (and " ≤ " analogously). The corresponding expected value of j t+1 is Here, t (⋅ | x t ) denotes the conditional distribution of t . Note that we only need the value function at time t + 1 to determine the value function at time t via maximization.
To numerically solve the optimization problem (3), discrete-time dynamic programming iterating over the value function is common (Judd 1998;Rust 2008). Therefore, the value function j t is restricted to a finite grid on the (truncated) state space with N t grid points x (k) t , k = 1, … , N t . The value function values in between grid points are interpolated by with basis functions k and coefficients k , which are chosen in such a way that the interpolant j S t fits the known function values at all grid points x (k) t . Beginning with the known final solution at time T, the Bellman equation is solved backwards in time until the value function is computed for each grid point at each t = T − 1, … , 0.
For t < T , let us define the interpolant of the objective function of the maximization in Eq. (3a) for grid point x (k) t by: The maximization of this target function (6) with respect to p t can then be performed using Sequential quadratic programming (SQP) routines, see "Appendix A.1".
To compute the expectation with respect to d , numerical integration can be used if the conditional distributions (⋅ | x (k) t ) are known. 2

Hierarchical B-Splines on Sparse Grids
As discussed in Sect. 1, hierarchical B-splines on sparse grids provide numerous advantages over other basis choices, especially in the context of optimization (Valentin 2019; Valentin and Pflüger 2016). In addition, they allow for spatially adaptive refinement by applying the standard surplus-based refinement criterion. To compute the coefficients of the B-spline approximation, usually a computationally expensive linear system has to be solved. Therefore, we determine the underlying grid structure by applying the surplus-based refinement criterion on the piecewise linear basis and interpolating with B-splines on the resulting grid. As proven in our previous work (Valentin 2019), the computational effort needed for the computation of the coefficients can be further reduced by using the unidirectional principle. This is facilitated by weakly fundamental not-a-knot splines and the insertion of some additional grid points.

Not-A-Knot B-Spline Basis
Let p ∈ ℕ 0 , m ∈ ℕ , and = ( 0 , … , m+p ) be an increasing sequence of real numbers. The B-spline b p k, of degree p for the knot sequence is defined via the Cox-de Boor recurrence (Cox 1972;de Boor 1972;Höllig and Hörner 2013) where k = 0, … , m − 1.
It can be shown that for m > p , the B-splines b For simplicity, we restrict the considerations and results in this paper to cubic B-splines (i.e., p = 3 ) although it is important to note that our method can be generalized to arbitrary odd B-spline degrees. A common special case is the case of linear B-splines ( p = 1 , so called hat functions), which are commonly used as basis functions for sparse grids.
We consider equidistant grid points x ,i ∶= ih on the unit interval [0, 1] where ∈ ℕ 0 is the level, i = 0, … , 2 is the index, and h ∶= 2 − is the mesh size. We want to find basis functions ,i ∶ [0, 1] → ℝ such that we can interpolate a given objective function f ∶ [0, 1] → ℝ on the equidistant grid of level by a linear combination of the basis functions: The most straightforward choice of B-splines for ,i are uniform B-splines that are scaled and translated versions of the cardinal B-spline b 3 ∶= b 3 0,(0,1,2,3,4) (Pflüger 2010; Valentin 2019): However, the interpolation domain on which the B-splines span the spline space would only be This interval does not contain the two boundary grid points x ,0 = 0 and x ,2 = 1 . This leads to interpolation problems since the spline space on [0, 1] is not contained in the spanned space of the basis functions unif ,i . Even simple polynomials such as f (x) = 4(x − 0.5) 2 cannot be represented exactly with the B-spline basis on the whole domain [0, 1] as shown by Fig. 1a and Valentin and Pflüger (2016). Consequently, the approximation quality for more complex functions like the value functions we interpolate in this paper deteriorates unnecessarily, which means that the economic results are not as conclusive as they could be.
As a remedy, we impose so-called not-a-knot boundary conditions by removing the left-most and right-most inner grid points x ,1 and x ,2 −1 from the knot sequence (Höllig and Hörner 2013;Valentin 2019). To keep the number m = 2 + 1 of B-splines the same, we have to insert two additional knots outside the domain: is now the whole unit interval [0, 1], containing all grid points at which we interpolate. As a result, the not-a-knot B-spline functions

Hierarchical Not-A-Knot B-Splines
The construction of sparse grids needs a hierarchical splitting of the nodal basis. Therefore, we define the so-called nodal subspaces V nak and hierarchical subspaces W nak by where The bases of the hierarchical subspaces are shown in Fig. 2. It can be shown that the basis functions of W nak for = 0, … , n and n ∈ ℕ 0 are linearly independent on [0, 1] (Valentin 2019). This means that the sum span{ nak ,i | = 0, … , n , i ∈ I } of the subspaces W nak 0 , … , W nak n is direct (i.e., W nak ∩ W nak � = {0} for ≠ ′ ) and can be written as ⨁ n =0 W nak . Due to dimensional arguments, the direct sum coincides with the nodal space, Both sides are equal to the not-a-knot spline space described before.

Sparse Grids
We generalize the univariate hierarchical basis to d-variate functions with a tensor product approach: where level and index are multi The corresponding grid points are given by and nodal and hierarchical subspaces are defined by where 0 ≤ i ≤ 2 n is to be read component-wise ( 0 ≤ i t ≤ 2 n t for all t = 1, … , d ) and I = I 1 × ⋯ × I d with the Cartesian product × . The nodal subspace of level n can be split into hierarchical subspaces by the d-dimensional generalization of Eq. (15): where n ∈ ℕ d 0 . In the following, we assume that the level is equal for every dimension: n ∶= (n, … , n) = n ⋅ 1.
Sparse grids provide a method for the interpolation of objective functions f ∶ [0, 1] d → ℝ . The common approach is to use the nodal space V nak n for interpolation. However, the corresponding full grid of level n, contains (2 n + 1) d = (2 nd ) grid points. If we interpolated with V nak n , we would have to evaluate the objective function (2 nd ) times, a number that grows exponentially with the dimensionality d. This fact is known as the curse of dimensionality (Bellman 1961). If evaluations of the objective function are computationally expensive, then dimensionalities of d ≥ 4 usually prohibit full grid approaches. For dynamic portfolio choice models, this means that only very coarse full grids may be employed in the state space if the number d of state variables is large.
Sparse grids exploit the splitting (19) to select only some hierarchical subspaces such that the number of necessary evaluations for interpolation is drastically reduced, but the interpolation error deteriorates only slightly. The subspace selection can be formulated as an optimization problem for the piecewise linear case ( p = 1 ) on the L 2 and L ∞ interpolation error as detailed by Bungartz and Griebel (2004). The basic idea is to select those hierarchical subspaces that contribute most to the interpolation, assuming the objective function is sufficiently smooth. The optimal a priori selection for hat functions is given by the regular sparse grid of level n: where the 1-norm ‖⋅‖ 1 is given by This definition is illustrated in Fig. 3. It can be seen as an analogue to Eq. (19), which we obtain by replacing the 1-norm on the right-hand side of Eq. (21) with the ∞-norm Although the definition is motivated by the piecewise linear case ( p = 1 ), using other basis functions such as higher-order B-splines has proven useful in various . In contrast, the sparse grid L 2 interpolation error is O(2 −2n n d−1 ) and therefore only slightly worse (by a factor which is polynomial in n) while requiring O(2 n n d−1 ) grid points (see the proof in Bungartz and Griebel 2004). The number of grid points does not depend on 2nd anymore, which means that significantly less grid points than in the full grid case are required. For B-splines of degree p, the interpolation error has been proven by Sickel and Ullrich (2011) to be in the order of O(2 −(p+1)n n d−1 ) , which differs from the corresponding full grid error O(2 −(p+1)n ) (see Höllig and Hörner 2013 for a proof) by the same polynomial factor n d−1 . These a priori estimates are based on the assumption that the interpolated function has continuous mixed second derivatives. If this is not the case or if it contains oscillations with high frequencies, then spatial adaptivity must be employed. Therefore, grid points are refined a posteriori according to suitable refinement criteria, see Fig. 4. This is of particular importance for the scope of this paper as spatial adaptivity enables us to increase the accuracy in regions of interest while simultaneously keeping the number of grid points at an acceptable level. The idea of the common surplus-based refinement criterion is that in the piecewise linear basis, the hierarchical surpluses correspond to the integral of the mixed second derivative of the interpolated function (Bungartz and Griebel 2004). If the absolute value | ,i | of the hierarchical surplus of a grid point x ,i

Fig. 4
Spatially adaptive refinement of the hierarchical subspace W (1,2) for the regular sparse grid of level n = 3 in 2D. A refinable grid point (blue) is refined by adding its 2d = 4 children (red). The resulting grid is shown on the right Solving High-Dimensional Dynamic Portfolio Choice Models… is larger than a certain tolerance , then the 2d children of x ,i are inserted to improve the accuracy of the interpolation in the proximity of x ,i (Pflüger 2012). This criterion is only motivated for piecewise linear basis functions. Therefore, and for reasons of complexity, we use the piecewise linear basis to determine the grid points to be refined, and interpolate with the B-spline basis on the refined grid.

Weakly Fundamental Not-A-Knot Splines
Let S ⊂ [0, 1] d be the set of grid points of the sparse grid, for example S = {x ,i | ‖ ‖ 1 ≤ n , i ∈ I } for the regular sparse grid of level n (but dimensionally or spatially adaptive sparse grids are possible as well). In this setting, the task of interpolation is usually called hierarchization for the basis functions ,i . The resulting coefficients ,i for Eq. (8) are the hierarchical surpluses.
Conventional B-spline bases, such as the not-a-knot B-splines described before, share the drawback that the hierarchization is in general computationally expensive. In the case of the common piecewise linear basis ( p = 1 ), the hierarchical surpluses can be calculated in O(| S | ⋅ d) time with the so-called unidirectional principle (Pflüger 2010). For B-splines, usually the solution of a linear system with | S | unknowns is required, which is generally much slower as this needs O(| S | 3 ) time (where, e.g., | S | = O(2 n n d−1 ) for regular sparse grids of level n if we omit points on the boundary).
In one dimension, the reason is the additional couplings between the basis functions of different levels introduced by the wider support of the cubic B-splines compared to the piecewise linear functions. To mitigate this issue, we linearly combine as few neighboring nodal not-a-knot B-splines as possible such that the resulting combination wfnak l,i satisfies which we call the weakly fundamental property. The resulting basis functions are plotted in Fig. 5. This enables the efficient unidirectional principle for the hierarchization with the resulting weakly fundamental not-a-knot spline basis if specific points are inserted beforehand (for details, see Valentin 2019). Therefore, we use weakly fundamental not-a-knot splines on sparse grids for the rest of the paper.

B-Splines on Spatially Adaptive Sparse Grids for Dynamic Portfolio Choice Models
To solve the Bellman problem (3) numerically, we compute the value function interpolants (5) by solving at all grid points x (k) t ( k = 1, … , N t ), using higher-order B-splines on sparse grids for j S t+1 in the right-hand side of target function (6). This basis choice readily provides the gradient of the target function (6) at each x (k) t , such that we can supply it to any SQP routine. As a result of the SQP optimization, we obtain the values of the interpolant j S t (x (k) t ) and the optimal policies p opt,S t (x (k) t ) at these grid points for all t < T and k = 1, … , N t : In general, the shapes of the value function and the optimal policies have different characteristics. The sufficiently accurate resolution of the optimal policy is already relevant to achieve plausible economic results if full grid solutions are computed (Brumm and Grill 2014). On sparse grids, this is even more important as kinks in the optimal policies can deteriorate the numerical error drastically (Schober 2018). Hence, as proposed by Schober (2018), in a subsequent step, optimal policy interpolants p opt,S t are computed by adaptively refining the respective policy grids and re-optimizing for the refined grid points if the added grid point is not yet part of the solution from the first step.
We track two interpolants j S,1 t and j S,p t for each t = 0, … , T . The former interpolates the value function data at the grid points x (k) t ( k = 1, … , N t ) with the hierarchical piecewise linear basis (used for the surplus-based grid generation) while the latter interpolates the data with cubic hierarchical weakly fundamental nota-knot splines of degree p = 3 . Each j S, * t ( * ∈ {1, p} ) additionally stores the grid points x (k) t and the optimal policies p opt t (x (k) t ) at the grid points. For simplicity, we do not pass them explicitly to the algorithms.
In the following Sects. 4.1-4.3, we describe the algorithmic details of the generation of the value function interpolant (23). The generation of the optimal policy

Solution for the Value Function
Algorithm 1 shows solveValueFunction, generating the value function interpolants j S,1 t and j S,p t ( t = 0, … , T ). The algorithm follows a simple optimize-refine-interpolate scheme, which is presented in Fig. 6: First, Eq. (23) is solved on an initial sparse grid (optimize). Then, we refine the grid spatially adaptively. Finally, the resulting grid data are interpolated with hierarchical higher-order B-splines.
At the beginning of every iteration t, the grid of the piecewise linear interpolant is reset to an initial, possibly regular sparse grid. It would also be possible to reuse the grid from the previous iteration t + 1 . However, the results we then obtain become worse, likely due to the different characteristics of j S,1 t for different t (e.g., kinks).

Optimization
The optimize step can be seen in Algorithm 2. This algorithm accepts in j S,1 t a spatially adaptive sparse grid S t = {x (k) t | k = 1, … , N t } where the function values j S,1 t (x (k) t ) may already be known for some grid points x (k) t if optimize is called from within refine. The function optimize computes the missing value function values. For t = T , we assume that the terminal solution j T can be computed by some function computeKnownTerminalSolution. 3 Otherwise, for t < T , we solve the maximization problem (23) by using the higher-order B-spline interpolant j S,p t+1 of the previous iteration t + 1 (optimizeSinglePoint). The computations for the different x (k) t are independent of each other, which means that they can be computed in parallel Horneff et al. 2016). 4 After generating all missing data, we update the hierarchical surpluses of the piecewise linear interpolant j S,1 t to interpolate the new data at all grid points of S t .

Refinement
For adaptive refinement, the criterion is the common surplus-volume (Pflüger 2010). We use the piecewise linear interpolant for the surplus-based grid generation as the surpluses are easier to compute in the piecewise linear case, and as they are more meaningful due to the integral representation formula (Bungartz and Griebel 4 Such a problem is usually referred to as embarrassingly parallel. 3 In any case, the terminal solution may be computed as the solution of the corresponding single-time optimization problem, e.g., j T (x (k) T ) = max p T u(c T (x (k) T , p T )) .

3
Solving High-Dimensional Dynamic Portfolio Choice Models… 2004). Algorithm 3 shows how to generate the spatially adaptive sparse grid in solveValueFunction (Algorithm 1). Parameters are the tolerance ∈ ℝ ≥0 by which the set of grid points to be refined is determined and the number q ∈ ℕ 0 of refinement iterations.

Solution for the Optimal Policies
To construct the optimal policies, we use the higher-order B-spline interpolant j S,p t and the optimal policies p opt t (x (k) t ) at the grid points x (k) t ( k = 1, … , N t ) obtained from Algorithm 1. We then spatially adaptively refine the grid for each policy to construct a policy interpolant of degree one, p opt,S,1 t , for each t = 1, … , T . The corresponding Algorithm 1 is similar to solveValueFunction (Algorithm 4), except that it operates on the policy interpolants instead of the value function interpolant. The functions optimize and refine have been replaced by corresponding policy versions optimizePolicy and refinePolicy that work very much like their value function counterpart. In the optimization step, optimizePolicy only has to generate new values if the initial regular sparse grid for the policies is not contained in the grid of j S,p t . The policy grid is then refined independently of the value function grid. The iterations are independent of each other, which means that they can be parallelized. 5 1 3

Application: Transaction Costs Problem
First, we introduce the dynamic portfolio choice model with transaction costs (Sect. 5.1). We then describe its numerical solution in Sect. 5.2 and derive our error measure (unit-free Euler equation errors) in Sect. 5.3. We verify our solution economically on a two-dimensional full grid (Sect. 5.4). Then, we analyze the time complexity of the solution approach and show the impact of the choice of basis functions on the computational complexity in Sect. 5.5. Therefore, we solve the problem with B-splines of cubic degree on a regular sparse grid and compare them to the linear approach as used by Brumm and Scheidegger (2017). We find that we save approximately one order of magnitude in computational complexity with cubic B-splines already for three dimensions. Furthermore, we compare the results of our approach based on analytical gradients with the results we obtain with finite differences. For d > 3 , solutions on regular sparse grids are no longer feasible to compute in suitable numerical accuracy. In Sect. 5.6, we illustrate how spatial adaptivity allows us to solve the transaction costs problem up to d = 5 accurately by showing pointwise error decay and convergence of our approach. Finally, we present in Sect. 5.7 economic results for the transaction costs problem in higher dimensions. The results in these sections are also contained in the recently submitted Ph.D. thesis of Valentin (2019).

Transaction Costs Problem
As the transaction costs problem is easiest described in vector notation, let us denote the unit vector with 1 . For two vectors a , b , we define the Hadamard product as In the transaction costs problem the investor maximizes expected utility from consumption (1). Therefore, at time t, she tracks her wealth W t ∈ ℝ ≥0 and fractions of wealth x t ∈ [0, 1] d invested in stocks. Her choices are how much to buy of the d where > 0 is a cost factor. Additionally, she can invest in a transaction-costfree money market account B t , yielding a risk-free return r f ∈ ℝ ≥0 . We assume the returns r t ∈ ℝ ≥0 that the d stocks earn from t to t + 1 are independent and identically 1 3 Solving High-Dimensional Dynamic Portfolio Choice Models… lognormally distributed with mean and covariance matrix : r t ∼ LN( , ) (Cai 2009;Cai and Judd 2010). The investor's consumption C t in period t is the residual of her wealth that is not invested in stocks or bonds, reduced by the transaction costs for rearranging her portfolio in this period: The state dynamics from t to t + 1 are thus given by: The investor faces the optimization problem with utility function u from Eq. (2) subject to the constraint for all t = 0, … , T, Here, a minimum consumption level C min must be maintained, and the final stock holdings x T W T are assumed to be sold before they can be consumed. In addition, at no point in time t the investor can sell more of the stocks than her current holdings x t W t .
The problem can be simplified by normalizing the value function j t = J t ∕W t , consumption c t = C t ∕W t , and investment choices b t = B t ∕W t , + t = + t ∕W t , − t = − t ∕W t with respect to wealth W t for each t. The investor's normalized consumption c t in period t is then The state dynamics from t to t + 1 can be expressed in terms of the portfolio value in t + 1 : With this normalization, the solution to problem (27a-27c) can be expressed as where the minimum consumption level c min = C min ∕W t is also normalized with respect to wealth (see "Appendix A.2"). Now the investor's optimization problem no longer depends on W t , and hence one state variable can be eliminated. The nonnormalized optimal choices can be obtained by multiplication with a given wealth W t for any t and state x t .

Numerical Solution
To compute the solution to the transaction costs problem, we use the certainty equivalent transformation ̂j t of the normalized value function j t , which reduces the curvature of the value function when the utility is of Constant Relative Risk Aversion type, Eq.
(2) (Garlappi and Skoulakis 2009). Since this transform is strictly monotone, any maximizer of ̂j t also maximizes j t . The optimization problem then reads
The optimization problem for a given state x t is solved with the SQP solver SNOPT (Gill et al. 2005), see "Appendix A.4" for the specific objective function and its gradient. Since the distribution of the returns r t is multivariate lognormal and state-independent, we can compute the expectation in Eq. (6) using Gauss-Hermite quadrature. For this, we also use a sparse grid quadrature rule, thus breaking the curse of dimensionality when including stochastic risk factors (see the appendix of Horneff et al. 2016 for details).
The constraint (31h) constrains the state space. The resulting eligible subspace , not a rectangular domain as needed for the sparse grid approximation. We solve this problem by assuming that any state attained that is not eligible is cropped to an eligible state by selling all stock holdings pro rata until all constraints are satisfied. That is, money is transferred from stocks to wealth, for which the proportionate transaction costs are deducted (see "Appendix A.5"). The approximation of the value function is then evaluated at this eligible state.
The optimization ran over an investment horizon of T = 6 years and had a fixed period length of 1 year. The risk aversion = 3.5 , the risk-free rate r f = 4% , and the transaction costs factor = 1% were taken from Cai and Judd (2010). We extended the return distribution parametrization of Cai and Judd (2010) to five dimensions: and set the time discount factor to = 0.97 and c min = 0.001 to ensure that minimal consumption was taking place. For d stocks we used the first d entries of and the elements i,j , i, j ≤ d , as the return distribution parametrization. As initial grids, we used regular sparse grids S n,d of level n and dimension d. The code was written in MATLAB where the interpolation on sparse and full grids was implemented by a MEX file interface to the sparse grids C++ toolbox SG ++ (sgpp.spars egrid s.org, Pflüger 2010). The quadrature routine was implemented by a MEX file interface to the TASMANIAN sparse grids C++ toolbox (Stoyanov 2017) as TASMANIAN allows us to integrate a real valued function over a Gaussian density using Hermite polynomials on sparse grids (see the appendix of Horneff et al. 2016). We used the SNOPT implementation of the Numerical Algorithms Group (www.nag.co.uk). If convergence of the optimizer was not observed, we stopped the optimization after 100 iterations. To avoid being stuck in local minima, we repeated the optimization process for a varying number of initial multi-start points (in the range of a few dozens). All computations were performed on the compute cluster LOEWE-CSC (csc.uni-frank furt.de) where we exclusively allocated three compute nodes with two Intel Xeon 1 3 E5-2670 v2 CPUs (ten cores at 2.5 GHz, 20 threads) each, i.e., 120 threads in total, and 4000 MB RAM per thread.

Error Measurement
Any optimal policy p opt t ∶= (b opt t , +,opt t , −,opt t ) ⊤ must satisfy the first order conditions of the Lagrangian at any given state x t for each t < T . Specifically, for the transaction costs problem-when neglecting binding constraints-we obtain from the first-order condition with regard to the optimal bond policy b However, the state space cropping distorts unit-free Euler equation errors. This is due to three sources: Firstly, the cropping already occurs for large stock holdings 1 ⊤ ⋅ x t that are less than one as stocks have to be sold to maintain minimum consumption c min . Secondly, transaction costs for selling the stocks are deducted. Thirdly, even if neither minimum consumption is required, nor transaction costs are incurred the error at the hyperplane 1 ⊤ ⋅ x t = 1 does not vanish even for full grid solutions. Only in the limit, as the resolution of the grid goes to infinity, the error will vanish. Economically, the region near this hyperplane is not significant as such large stock fractions are unusual, which is confirmed by Monte Carlo simulations. We therefore use the weighted Euler equation error instead of t . Alternatives would be restricting the state domain in which the error is computed or weighting the error with the probability that a given state occurs in a Monte Carlo simulation. 6 We then choose the same N = 1000 points x (k) ∈ Simplex ( k = 1, … , N ) for all times t = 0, … , T − 1 and compute the errors w t (x (k) ) for each t. 7 We report the L 2 norm scaled by √ d! and the L ∞ norm for each t: For details on the error derivation see "Appendix A.6". In principle, we could also compare the solutions ̂j S t and optimal policies p opt,S t obtained on sparse grids with the full grid solution, e.g., in a point-wise way. However, full grid solutions with acceptable resolutions are computationally infeasible already in d > 2 . In addition, the Euler equation error does not compare numerical solutions with each other, but rather measures the accuracy of any solution, regardless of whether it is obtained numerically or analytically.

Economical Verification
We show in Fig. 7 a full grid solution for the case of d = 2 stocks, i.e., {x (k) t | k = 1, … , N t } = {0, 2 −n , … , 1} d for some fixed level n ∈ ℕ (here, n = 7 and N t = (2 7 + 1) 2 = 16641 ) and for all t = 0, … , T . The red dot (x t,1 , x t,2 ) = (0.1509, 0.1831) shows the so-called Merton point for which Merton (1969) derives that in the case of = 0 the optimal stock fractions x opt t are constant over time and wealth. When faced with transaction costs, Magill and Constantinides (1976) find that the investor must weigh up the benefits of improved diversification against the associated transaction costs for rebalancing the portfolio. This leads to the no-trade region (red outline). If x opt t lies within this region, the investor does not alter her portfolio. In discrete time consumption and portfolio choice, the no-trade region is known to be a convex set, and, if the current stock fraction is outside this region, the optimal policy is to move to the convex hull of the set (Abrams and Karmarkar 1980;Constantinides 1979). Naturally, the Merton point lies inside the no-trade region. We can confirm this result for our optimal policies, i.e., if we choose any point outside the no-trade region (but within the eligible domain Simplex ), computing x t + +,opt t − −,opt t then results in a boundary point of the no-trade region. Figure 7 also shows the impact of the state space cropping as the eligible subspace Simplex ⊂ [0, 1] d is not a rectangular domain as needed for the sparse grid interpolation. This is the reason why the certainty equivalent value function ̂j S t is zero in [0, 1] d ⧵ Simplex and the optimal sell policies −,opt,S t contain a diagonal kink at the hyperplane 1 ⊤ ⋅ x t .
Obviously, computing full grid solutions is only computationally feasible for low dimensionalities d due to the curse of dimensionality. The two-dimensional solution of level n = 7 took over nine hours to compute on the LOEWE-CSC cluster with 120 threads. The solution of the next level is estimated to already take one week. Hence, full grid solutions can only be computed up to d = 3 due to prohibitively long computational times for d ≥ 4 . This underlines the need for sophisticated discretization techniques such as sparse grids.

Savings in Complexity Using B-Splines
A complexity analysis reveals that the difficulty of solving dynamic portfolio choice models quickly grows with the dimensionality d: The number of necessary arithmetic operations grows like (see Fig. 6) where for the transaction costs problem m p = 2d + 1 and Q t , N t , N t+1 ∈ (2 n n d−1 ) if regular sparse grids of level n without boundary points would be used for state and stochastic grids (due to m = d ). In addition, the number of optimizer iterations is likely superlinear in d as this depends on the dimensionality m p of the search space as well as on the number of multi-start points (which also grows with m p ). This means that the complexity is at least cubic in d, quadratic in the average number N t of employed state grid points, and linear in the number Q t of quadrature points. Figure 8 confirms these observations with experimental data using regular sparse grids without spatially adaptive refinement. For fixed d, the total time required by

Comparison to Piecewise Linear Functions
Hierarchical B-splines introduce two major benefits to the solution of dynamic portfolio choice models. The first benefit are the smooth objective functions: When repeating the computations with piecewise linear functions (i.e., p = 1 ), one obtains almost the same weighted Euler equation errors as in the cubic case (except for the case of d = 1 where the error is one order of magnitude larger than in the cubic case). However, as we see in Fig. 8, the total computational time is several times larger for piecewise linear functions although evaluations are cheaper than for B-splines. The main reason is that the number of required optimizer iterations for piecewise linear basis functions is almost seven times as high as in the cubic case since the optimizer has to deal with kinks in the objective function. Our experiments show that beginning with d = 4 , the total optimization time required to solve the transaction costs problem will be one whole order of magnitude shorter for cubic B-splines than for piecewise linear functions.

Comparing Exact Gradients to Finite Differences
The second benefit is the availability of exact gradients: Figure 8 also contains computational times of the solution process if we artificially do not use exact gradients of the objective functions, but rather approximate them with finite differences. For each evaluation of the objective gradient, at least m p additional evaluations of the objective function have to be performed to compute the finite differences ( 2m p if central differences are used). Consequently, while the resulting weighted Euler equation errors are similar, the total optimization time increases by a factor of up to five if we do not use exact gradients. Figure 9 shows the convergence of the scaled L 2 norm w,L 2 t and the L ∞ norm w,L ∞ t of the weighted Euler equation error for t = 0 for regular sparse grids and spatially adaptive sparse grids for the cases of d = 1, … , 4 stocks. We look at the error for t = 0 throughout the remaining sections as numerical inaccuracies accrue from time t to time t − 1 by the dynamic programming nature of the problem (3). For Fig. 9 and the following plots, the value function grid is left unchanged while the average number N t of policy grid points increases with decreasing refinement threshold . This is because the value function grid does not seem to have a great influence on the convergence of the Euler equation errors. Compared to regular sparse grids, the spatial adaptivity decreases the error by two orders of magnitude in one dimension. The gain is smaller for higher dimensionalities d, but spatially adaptive sparse grids still outperform regular sparse grids. For d = 2 , we observe that the error saturates at N 0 ≈ 4000 points. This is most likely due to floating-point rounding errors that are not influenced by sparse grid interpolation. In addition, convergence significantly decelerates starting with d = 4 . For d = 4 , spatially adaptive sparse grids are able to achieve a weighted Euler equation error of w,L 2 0 ≈ 1.99e−02 and w,L ∞ 0 ≈ 5.76e−02 (with an average number N 0 = 4252 of policy grid points). For d = 5 , we are still able to achieve a small error of w,L 2 0 ≈ 2.67e−02 and w,L ∞ 0 ≈ 6.37e−02 , respectively, with spatially adaptive sparse grids with an average number N 0 = 12572 of policy grid points. While we cannot detect any convergence for this dimensionality yet, this is still a major result as such high-dimensional models could not be solved that accurately up to now with conventional methods.

Accuracy Through Spatial Adaptivity
Pointwise plots of the weighted Euler equation error as in Fig. 10 for two stocks reveal that there are two types of regions where the error is large: The first type of region is the neighborhood of the aforementioned diagonal boundary 1 ⊤ ⋅ x t = 1 of the uncropped region where the cropping distorts the error despite the weights. The second type of region are kinks of the optimal policy functions, which is most visible for coarse grids (e.g., Fig. 10a). When increasing the number of grid points (e.g., Fig. 10b, c), the error decreases quickly in the whole domain.
All in all, Figs. 9 and 10 show that there are two necessary conditions to compute accurate solutions in higher dimensions: Firstly, reliable optimization enabled through B-spline interpolants of the value function and, secondly, spatial (a) (b) (c) (d) Fig. 9 Convergence of the scaled L 2 norm w,L 2 t (solid) and L ∞ norm w,L ∞ t (dashed) of the weighted Euler equation error for t = 0 for regular sparse grids (blue) and spatially adaptive sparse grids (red). The number N t is the average number 1∕m p ∑m p j=1 N t,j of grid points over all policy grids for t = 0 where N t,j is the number of grid points of the j-th policy entry adaptive refinement of the policy grids. The latter condition was originally proposed in our previous work (Schober 2018). Figures 11 and 12 each display the value function and the optimal policies corresponding to sparse grid solutions for d = 2 stocks with N 0 = 879 policy grid points or d = 5 stocks with N 0 = 12572 policy grid points. Obviously, most grid points are placed along the various kinks in the policies. Interestingly, experiments show that the surplus-based refinement criterion does not place more grid points along the perfectly diagonal kink caused by the cropping of the state space (i.e., along 1 ⊤ ⋅ x t = 1 ). It is possible to circumvent this issue by either rotating the domain or directly incorporating the distance to the diagonal into the refinement criterion for the value function. However, we refrain from doing so here as this does not seem to drastically improve results. Again, this might be due to the domination of the overall error by general floating-point rounding errors.

Solutions in Higher Dimensions
For higher dimensions, economic results for our transaction costs problem (31a-31h) scarcely exist. In continuous time, analytical solutions for special cases (Liu 2004;Liu and Loewenstein 2002) and numerical solutions with finite  (Muthuraman and Kumar 2006) have been discussed. Dynamic programming solutions with value function iteration in discrete time have been studied by Cai (2009), Cai and Judd (2010), Cai et al. (2020) for up to six stocks and one bond without consumption choice.
The purpose of this paper is to show the numerical accuracy of our approach. Rather than analyzing the economic implications of the solution to the higherdimensional transaction costs problem, we limit ourselves to assessing the resulting optimal policy interpolants (p We plot the resulting average state and policies in Fig. 13 for d = 2 , 3, 4, and 5 stocks for m MC = 10 5 individuals. In addition, this figure contains the evolution of the weighted Euler equation error w,L 2 t over time. We perform a two-part assessment of the simulation results: First, consumption is slightly increasing over time, which is plausible given the solution, e.g., by Merton (1969) in the finite-horizon case. Second, we compare the stock fractions implied by the Merton points with the simulated stock fractions Table 1. We observe that the simulated stock fractions deviate from the Merton points' stock fractions as expected. However, after computing the simulated stock fractions for all times, we see that they do not change much over time. This is in line with the buy-and-hold characteristics of solutions to portfolio choice models with transaction costs (e.g., Liu and Loewenstein 2002). Finally, we present in Table 2 the computational times and numerical errors of the sparse grid solutions underlying Fig. 13 and Table 1. (c) (d) Fig. 13 Average values of wealth W t (blue), unnormalized optimal bonds B t (purple), unnormalized optimal consumption C t (green), and unnormalized stock holdings x t ∶= (x t + + t − − t )W t after buying and selling (red) in a Monte Carlo simulation of 10 5 individuals where we assume that W 0 = $1 for all individuals. In addition, the plots show the evolution of the scaled L 2 error w,L 2 t over time t (gray, right axes)

Conclusion
In this paper, we are the first to develop an approach to accurately solve high-dimensional dynamic portfolio choice models in discrete time that require smooth approximations or gradient-based optimization. With our approach, we have addressed the three key issues of solving these models by means of value function iteration: the curse of dimensionality, the lack of spatial adaptivity, and the lack of continuous gradients all at once by using B-splines on sparse grids with spatially adaptive refinement. We have solved a dynamic portfolio and consumption choice model with transaction costs to study the numerical accuracy of our approach. Solutions to the transaction costs problem with value function iteration have achieved economically acceptable results already for lower resolutions of the interpolation grid than in our presented example. Our approach, however, can easily be applied to other dynamic portfolio choice models or any high-dimensional economic model that require such a high resolution. We have solved the transaction costs problem with up to five stocks and one risk-free bond, i.e., a five-dimensional interpolation and an eleven-dimensional optimization problem per time step. Using spatially adaptive refinement of the optimal policies, we have obtained maximum unit-free Euler equation errors around 5% for the five-dimensional problem and even lower maximum errors for lower-dimensional problems. This showcases the high accuracy of the proposed spatially adaptive solution scheme for the optimization of continuous choices, which relies on smooth approximations of the value function and the gradient. We have shown convergence of our approach in up to four dimensions. Here, spatially adaptive refinement of the optimal policies decreased the maximum Table 2 Number of stocks d, base level n, refine tolerance , grid points of the base grid | S n,d | for d stocks and level n, grid points N t of the refined grid, added points N t , computational time and weighted Euler equation errors w,L 2 Euler equation error by nearly two orders of magnitude in the four-dimensional case compared to regular sparse grids without spatially adaptive refinement. This has shown that only with spatial adaptivity, high-dimensional problems can be solved accurately. Finally, we have given a rigorous analysis of the complexity of our approach for dynamic portfolio choice models in general, not only for the transaction costs problem for which we have verified our analysis of complexity with measurements of computational time. We have found that the sole availability of the gradient for the optimization process has saved nearly one order of magnitude in computational complexity in the three-stock case. We expect even larger reductions of the computational complexity for higher-dimensional problems. Compared to finite differences with interpolation on hat functions as used by Brumm and Scheidegger (2017), we have saved considerably more than one order of magnitude in computational complexity and one order of magnitude in total computational time in three dimensions.
There are certain limitations to the applicability of spatially adaptive sparse grids to solve high-dimensional dynamic economic models: Firstly, sparse grid approximations are not shape-preserving, which is especially of importance for value function iteration with interpolation (Cai and Judd 2012). Secondly, the calculation of the coefficients of the B-spline interpolant is time-consuming and not trivial to parallelize since the solution to a system of linear equations has to be computed in every time step. Thirdly, the exact choice of the refinement tolerance for value function and policy interpolants is subject to trial-and-error. Choosing a refinement tolerance that is too low will lead to too many points that are inserted and may cause instability of the entire scheme if the optimizer does not give perfect results.
Future improvements of our approach may lie in the use of problem-tailored adaptivity criteria (Brumm and Scheidegger 2017;Pflüger 2012) instead of the simple surplus-based refinement criterion.