Solving High-Dimensional Dynamic Portfolio Choice Models with Hierarchical B-Splines on Sparse Grids

Schober, Peter; Valentin, Julian; Pflüger, Dirk

doi:10.1007/s10614-020-10061-x

Solving High-Dimensional Dynamic Portfolio Choice Models with Hierarchical B-Splines on Sparse Grids

Open access
Published: 04 January 2021

Volume 59, pages 185–224, (2022)
Cite this article

Download PDF

You have full access to this open access article

Computational Economics Aims and scope Submit manuscript

Solving High-Dimensional Dynamic Portfolio Choice Models with Hierarchical B-Splines on Sparse Grids

Download PDF

2033 Accesses
1 Citation
Explore all metrics

A Correction to this article was published on 13 March 2021

This article has been updated

Abstract

Discrete time dynamic programming to solve dynamic portfolio choice models has three immanent issues: firstly, the curse of dimensionality prohibits more than a handful of continuous states. Secondly, in higher dimensions, even regular sparse grid discretizations need too many grid points for sufficiently accurate approximations of the value function. Thirdly, the models usually require continuous control variables, and hence gradient-based optimization with smooth approximations of the value function is necessary to obtain accurate solutions to the optimization problem. For the first time, we enable accurate and fast numerical solutions with gradient-based optimization while still allowing for spatial adaptivity using hierarchical B-splines on sparse grids. When compared to the standard linear bases on sparse grids or finite difference approximations of the gradient, our approach saves an order of magnitude in total computational complexity for a representative dynamic portfolio choice model with varying state space dimensionality, stochastic sample space, and choice variables.

Solving Dynamic Portfolio Choice Models in Discrete Time Using Spatially Adaptive Sparse Grids

Dynamic portfolio choice: a simulation-and-regression approach

Article 11 March 2017

Valuation of Structured Financial Products by Adaptive Multiwavelet Methods in High Dimensions

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

A common approach to solve dynamic portfolio choice models in discrete time is dynamic programming, iterating over the value function backwards in time. Starting from the known value function at final time T, the value function is approximated on a state space grid, assuming that the state space is continuous. To determine the next iterate of the value function at each grid point at time $T- 1$ we have to solve an optimization problem that depends on the value function of the previous iterate at time T. When a tensor product approximation is used, this approach suffers from the curse of dimensionality as the number of grid points of the approximation grows exponentially with the dimensionality of the state space. In addition, solving for the current value function iterate at a grid point relies on an accurate solution of the underlying optimization problem. When the portfolio choice is continuous, e.g., choosing the investment amount in stocks, bonds, etc., the computation of the optimal solution can be greatly accelerated by gradient-based optimization routines if the gradient of the objective function is available.

Recently, sparse grids have been successfully employed to break the curse of dimensionality in high-dimensional dynamic models (Brumm and Scheidegger 2017; Judd et al. 2014; Schober 2018; Winschel and Krätzig 2010).^{Footnote 1} A standard d-dimensional tensor product grid on the unit hypercube $[0,1]^d$ with mesh size $2^{-n}$, $n \in \mathbb {N}$, and no points on the boundary contains $2^{n} - 1$ grid points per coordinate direction and thus $\mathscr {O}(2^{nd})$ points in total, growing exponentially with the dimensionality d. In contrast, a regular sparse grid with the same mesh size contains only $\mathscr {O}{(2^{n} n^{d-1})}$ points. The error of the sparse grid approximation of a function with homogeneous boundary conditions using piecewise linear basis functions is $\mathscr {O}(2^{-2n} n^{d-1})$ with respect to the $L^2$ and $L^\infty $ norm if the approximated function has bounded mixed second derivatives (Bungartz and Griebel 2004; Zenger 1991). This is only slightly worse than the corresponding error $\mathscr {O}(2^{-2n})$ for the case of full tensor product grids.

In higher dimensions, even regular sparse grids need too many grid points for a sufficiently accurate approximation when solving high-dimensional dynamic models (Brumm and Scheidegger 2017). Fortunately, for approximations in the standard piecewise linear basis, the hierarchical structure of sparse grids allows for spatially adaptive refinement of the grid by inserting the 2d children of only certain leaves in the hierarchical structure. Spatially adaptive refinement has successfully been employed to solve high-dimensional dynamic models by Brumm and Scheidegger (2017) and Schober (2018). Unfortunately, approximations of the value function using the standard piecewise linear basis are not continuously differentiable and, hence, have discontinuous gradients. This poses a problem to gradient-based optimization techniques, which rely on a twice continuously differentiable approximation of the objective function to ensure convergence (Schober 2018).

Global polynomial approximations have shown to work well with value function iteration and continuous choices for solving dynamic economic models (Cai and Judd 2015; Judd et al. 2014) as they are globally smooth. Smolyak’s formula can be used to construct sparse grid approximations (Barthelmann et al. 2000) on global polynomial bases, which can be refined adaptively with regard to specific dimensions of the state space (Judd et al. 2014) and with regard to the hierarchical surpluses, i.e., locally adaptively (Stoyanov 2017). Value function iteration with the use of gradient information to approximate the value function more accurately with global polynomials is also possible (Cai and Judd 2015).

However, B-splines are much more flexible than global polynomials (Valentin and Pflüger 2016; Valentin 2019). While global polynomial approximations are bound to certain grid structures to avoid Runge’s phenomenon or similar issues, B-spline basis functions can be employed on any nested spatially adaptive grid hierarchy. They allow for simultaneous local- and degree-adaptive refinement (hp-adaptivity), implying that one could use a smaller or larger mesh size and/or degree of the B-spline basis functions in certain regions of the state space, e.g., to resolve kinks. In addition, the local basis functions are faster to evaluate than the conventional global polynomial basis functions. Approximations with B-splines of cubic degree (or higher) are twice continuously differentiable, and readily supply smooth and explicit approximations of both, the value function and the gradient. Compared to approximating the derivatives with finite differences, the optimization is not only more accurate but also significantly faster, especially when the number of optimization variables is large (Valentin 2019). B-splines have thus proven useful for computing numerical solutions to numerous dynamic models when finding the root of the gradient is required (Chu et al. 2013; Habermann and Kindermann 2007; Judd and Solnick 1994; Philbrick and Kitanidis 2001).

In total, three issues with discrete time dynamic programming for dynamic portfolio choice models with continuous choices emerge: the curse of dimensionality, the lack of spatial adaptivity, and the lack of continuous gradients. It is apparent that current economic literature deals with these issues only in isolation, e.g., by combining sparse grids with global polynomial basis functions, or using sparse grids with non-smooth local linear basis functions to allow for spatial adaptivity. These approaches are hence computationally inefficient in accurately solving high-dimensional dynamic portfolio choice models or any high-dimensional dynamic economic model that requires smooth approximations or gradient-based optimization.

This paper is the first to address all of these issues at once by combining hierarchical B-splines with sparse grids to approximate the value function and its gradient. Thus, we enable accurate and fast numerical solutions using gradient-based optimization while still allowing for spatial adaptivity (Pflüger 2010; Valentin and Pflüger 2016). The hierarchical grid structure allows us to develop an algorithm that uses the local adaptivity similar to Brumm and Scheidegger (2017) and Schober (2018), but interpolates the value function and its gradient with a B-spline basis. Therefore, we create a sparse grid for the value function, for which we interpolate the value function in the piecewise linear basis. We then refine the grid using the standard hierarchical surplus-volume-based refinement criterion. Finally, we interpolate the value function with hierarchical B-spline basis functions on the spatially adaptively refined sparse grid.

We focus our study on the numerical accuracy of our approach. Therefore, we choose a dynamic portfolio choice model with multiple stocks, one bond, and consumption. For buying and selling the stocks, linear transaction costs have to be deducted. The resulting optimization problem is high-dimensional in terms of the state space, stochastic sample space, and choice variables. Hence, this problem is especially suited for a complexity analysis. At the same time, this model is similar to models from a vast strand of state-of-the-art literature on dynamic portfolio choice, e.g., Barberis and Huang (2009), Cocco et al. (2005), De Giorgi and Legg (2012), Horneff et al. (2010), Horneff et al. (2008), Hubener et al. (2016), Hubener et al. (2014), Inkmann et al. (2011). Consequently, our approach can be generalized to a broad class of dynamic portfolio choice models with only minor modifications.

Dynamic portfolio choice models with transaction costs have been studied economically, e.g., by Abrams and Karmarkar (1980), Kamin (1975), Liu and Loewenstein (2002), Magill and Constantinides (1976), and extensively numerically by Cai (2009), Cai and Judd (2010), Cai et al. (2015), Cai et al. (2020). The latter report computational times and economic solutions for higher-dimensional transaction costs problems. They employ polynomial interpolation with only few polynomial nodes and parallelization to solve these problems with and without consumption using value function iteration in discrete time. Cai et al. (2020) present convergence results and computational times for the three-dimensional transaction costs problem with consumption and numerical errors for the four-dimensional problem using complete Chebyshev polynomials to approximate the value function. However, only we have employed spatially adaptive sparse grids to the transaction costs problem, which induces optimization on continuous choices (Schober 2018). We also apply local adaptivity to compute the optimal policies from the solution to the underlying optimization problem as earlier suggested by us (Schober 2018) and Brumm and Grill (2014).

A complexity analysis reveals that cubic B-splines save more than one order of magnitude in computational effort compared to the state of the art with the linear basis (Brumm and Scheidegger 2017) and/or finite difference approximations of the gradient on regular sparse grids. Using spatially adaptive refinement of the optimal policy, we solve the problem for up to five dimensions where highly accurate solutions on regular sparse grids would require hundreds of thousands of grid points, and are thus no longer suitable. Here, spatially adaptive refinement allows a comparably low base resolution in the solution process of the value function, and, in a second step, adds grid points in the optimal policies where required. By this, we obtain low reported unit-free Euler equation errors for the transaction costs problem.

The rest of this article is structured as follows: In Sect. 2, we define the general class of dynamic portfolio choice models for which our approach is applicable. Section 3 introduces hierarchical B-splines on spatially adaptive sparse grids, leading to the definition of hierarchical weakly fundamental not-a-knot splines. Algorithms for solving dynamic portfolio choice models with B-splines on spatially adaptive sparse grids are discussed in Sect. 4. We analyze the complexity and demonstrate the numerical accuracy of our approach solving the transaction costs problem in Sect. 5 before concluding in Sect. 6.

2 Discrete Time Dynamic Portfolio Choice Models

We consider discrete time dynamic portfolio choice models with finite time horizon T in which the investor seeks to maximize additive expected life-time utility $u$ from consumption $c_t$:

$$\begin{aligned} \mathbb {E}_0 \left[ \sum _{t=0}^T \rho ^{t} u(c_t({{\varvec{p}}}_t, {{\varvec{x}}}_t))\right] \, . \end{aligned}$$

(1)

Here, she has $m_{{{\varvec{p}}}}$ continuous choices ${{\varvec{p}}}_t \in \varPsi \subset \mathbb {R}^{m_{{{\varvec{p}}}}}$ (e.g., investment amounts in stocks and bonds) with respect to d continuous states ${{\varvec{x}}}_t \in \varOmega \subset \mathbb {R}^d$ (e.g., current financial wealth or labor income) she can reside in at time t. In addition, the transition from state ${{\varvec{x}}}_t$ to ${{\varvec{x}}}_{t+1}$ does not only depend on her choices and her state, but may also be subject to $m_{{\varvec{\zeta }}}$ random shocks ${\varvec{\zeta }}_t \in Z$ (such as stock returns or labor income shocks), which are drawn from the sample space $Z\subset \mathbb {R}^{m_{{\varvec{\zeta }}}}$. The random variable ${\varvec{f}}_t :\varPsi \times \varOmega \times Z\rightarrow \varOmega $, $({{\varvec{p}}}_t, {{\varvec{x}}}_t, {\varvec{\zeta }}_t) \mapsto {{\varvec{x}}}_{t+1}$, then describes the continuous state dynamics between t and $t+1$. We denote by $\rho < 1$ the subjective time discount factor, and we assume the utility function to be of Constant Relative Risk Aversion type with risk aversion $\gamma > 1$:

$$\begin{aligned} u(c_t) = \frac{1}{1-\gamma } c_t^{1-\gamma } \, . \end{aligned}$$

(2)

It is also straightforward to include discrete choices (compare also, e.g., Brumm and Scheidegger 2017) in the trivial way and discrete states (Schober 2018) in this model setup. Furthermore, the model can be generalized to further utility functions, e.g., to Epstein and Zin (1989) utility and to utility functions with a narrow framing component (Barberis and Huang 2009). In this paper, we disregard these modeling choices purely for simplicity.

By the Bellman principle (Bellman 1954), this utility maximization problem can be reformulated in terms of the value function $j_t$ for $t = 0, \dotsc , T$ with known terminal utility $v$

$$\begin{aligned} j_t({{\varvec{x}}}_{t})&= \max \limits _{{{\varvec{p}}}_t} \bigl \{ u(c_t({{\varvec{p}}}_t, {{\varvec{x}}}_t)) + \rho \mathbb {E}_t \left[ j_{t+1} ({\varvec{f}}_t({{\varvec{p}}}_t, {{\varvec{x}}}_t, {\varvec{\zeta }}_t) ) \right] \bigr \} \, , \;\; t < T \, , \end{aligned}$$

(3a)

$$\begin{aligned} j_T({{\varvec{x}}}_{T})&= v({{\varvec{x}}}_{T}) \, , \end{aligned}$$

(3b)

subject to the $m_{{\varvec{g}}}$ possibly non-linear inequality constraints ${\varvec{g}}_t :\varPsi \times \varOmega \rightarrow \mathbb {R}^{m_{{\varvec{g}}}}$:

$$\begin{aligned} {\varvec{g}}_t ({{\varvec{p}}}_t, {{\varvec{x}}}_t )&\ge {\varvec{0}} \, . \end{aligned}$$

(3c)

For two vectors ${\varvec{a}}, {\varvec{b}}$, we define ${\varvec{a}} \ge {\varvec{b}}$ if $a_i \ge b_i$ for all i (and “$\le $” analogously). The corresponding expected value of $j_{t+1}$ is

$$\begin{aligned} \mathbb {E}_t \left[ j_{t+1}({\varvec{f}}_t({{\varvec{p}}}_t, {{\varvec{x}}}_t, {\varvec{\zeta }}_t)) \right] :=\int \limits _{Z} j_{t+1}({\varvec{f}}_t({{\varvec{p}}}_t, {{\varvec{x}}}_t, {\varvec{\zeta }}_t)) \mathop {}\!\mathrm {d}\varPhi _t({\varvec{\zeta }}_t\,|\,{{\varvec{x}}}_t) \, . \end{aligned}$$

(4)

Here, $\varPhi _t(\cdot \,|\,{{\varvec{x}}}_t)$ denotes the conditional distribution of ${\varvec{\zeta }}_t$. Note that we only need the value function at time $t+1$ to determine the value function at time t via maximization.

To numerically solve the optimization problem (3), discrete-time dynamic programming iterating over the value function is common (Judd 1998; Rust 2008). Therefore, the value function $j_t$ is restricted to a finite grid on the (truncated) state space with $N_t$ grid points ${{\varvec{x}}}_t^{(k)}$, $k = 1, \dotsc , N_t$. The value function values in between grid points are interpolated by

$$\begin{aligned} j^{\mathrm {S}}_t ({{\varvec{x}}}_t) :=\sum _{k=1}^{N_t} \alpha _k \varphi _k ({{\varvec{x}}}_t)\, , \end{aligned}$$

(5)

with basis functions $\varphi _k$ and coefficients $\alpha _k$, which are chosen in such a way that the interpolant $j^{\mathrm {S}}_t$ fits the known function values at all grid points ${{\varvec{x}}}_t^{(k)}$. Beginning with the known final solution at time T, the Bellman equation is solved backwards in time until the value function is computed for each grid point at each $t = T -1, \dotsc , 0$.

For $t < T$, let us define the interpolant of the objective function of the maximization in Eq. (3a) for grid point ${{\varvec{x}}}_t^{(k)}$ by:

$$\begin{aligned} \tilde{j}^{\mathrm {S}}_t ({{\varvec{p}}}_t, {{\varvec{x}}}_t^{(k)} ) :=u(c_t({{\varvec{p}}}_t, {{\varvec{x}}}_t^{(k)} )) + \rho \mathbb {E}_t \left[ j^{\mathrm {S}}_{t+1} ({\varvec{f}}_t({{\varvec{p}}}_t, {{\varvec{x}}}_t^{(k)}\!\!, {\varvec{\zeta }}_t))\right] \, . \end{aligned}$$

(6)

The maximization of this target function (6) with respect to ${{\varvec{p}}}_t$ can then be performed using Sequential quadratic programming (SQP) routines, see "Appendix A.1".

To compute the expectation with respect to $\mathop {}\!\mathrm {d}\varPhi $, numerical integration can be used if the conditional distributions $\varPhi (\cdot \,|\,{{\varvec{x}}}_t^{(k)})$ are known.^{Footnote 2}

3 Hierarchical B-Splines on Sparse Grids

As discussed in Sect. 1, hierarchical B-splines on sparse grids provide numerous advantages over other basis choices, especially in the context of optimization (Valentin 2019; Valentin and Pflüger 2016). In addition, they allow for spatially adaptive refinement by applying the standard surplus-based refinement criterion. To compute the coefficients of the B-spline approximation, usually a computationally expensive linear system has to be solved. Therefore, we determine the underlying grid structure by applying the surplus-based refinement criterion on the piecewise linear basis and interpolating with B-splines on the resulting grid. As proven in our previous work (Valentin 2019), the computational effort needed for the computation of the coefficients can be further reduced by using the unidirectional principle. This is facilitated by weakly fundamental not-a-knot splines and the insertion of some additional grid points.

3.1 Not-A-Knot B-Spline Basis

Let $p \in \mathbb {N}_0$, $m \in \mathbb {N}$, and ${\varvec{\xi }}= (\xi _0, \dotsc , \xi _{m+p})$ be an increasing sequence of real numbers. The B-spline $b_{k,{\varvec{\xi }}}^p$ of degree p for the knot sequence ${\varvec{\xi }}$ is defined via the Cox–de Boor recurrence (Cox 1972; de Boor 1972; Höllig and Hörner 2013)

$$\begin{aligned} \begin{aligned} b_{k,{\varvec{\xi }}}^p(x)&:=\frac{x - \xi _k}{\xi _{k+p} - \xi _k} b_{k,{\varvec{\xi }}}^{p-1}(x) + \frac{\xi _{k+p+1} - x}{\xi _{k+p+1} - \xi _{k+1}} b_{k+1,{\varvec{\xi }}}^{p-1}(x),\\ b_{k,{\varvec{\xi }}}^0(x)&:={\left\{ \begin{array}{ll}1 &{}\quad \text {if }x \in [\xi _k, \xi _{k+1}),\\ 0 &{} \quad \text {otherwise},\end{array}\right. } \end{aligned} \end{aligned}$$

(7)

where $k = 0, \dotsc , m - 1$.

It can be shown that for $m > p$, the B-splines $b_{0,{\varvec{\xi }}}^p, \dotsc , b_{m-1,{\varvec{\xi }}}^p$ form a basis of the spline space $S_{\varvec{\xi }}^p :=\mathrm{span}\{b_{k,{\varvec{\xi }}}^p \mid k = 0, \dotsc , m - 1\}$ on $D_{\varvec{\xi }}^p :=[\xi _p, \xi _m]$ (Höllig and Hörner 2013). The space $S_{\varvec{\xi }}^p$ contains exactly those functions $s:D_{\varvec{\xi }}^p \rightarrow \mathbb {R}$ which are piecewise polynomial of degree smaller or equal to p on every knot interval $[\xi _k, \xi _{k+1}]$ in the interior of $D_{\varvec{\xi }}^p$ ($k = p, \dotsc , m - 1$) and at least $p - 1$ times continuously differentiable at every knot $\xi _k$ in the interior of $D_{\varvec{\xi }}^p$, $k = p + 1, \dotsc , m - 1$ (Höllig and Hörner 2013).

For simplicity, we restrict the considerations and results in this paper to cubic B-splines (i.e., $p = 3$) although it is important to note that our method can be generalized to arbitrary odd B-spline degrees. A common special case is the case of linear B-splines ($p = 1$, so called hat functions), which are commonly used as basis functions for sparse grids.

We consider equidistant grid points $x_{\ell ,i} :=i h_\ell $ on the unit interval [0, 1] where $\ell \in \mathbb {N}_0$ is the level, $i = 0, \dotsc , 2^\ell $ is the index, and $h_\ell :=2^{-\ell }$ is the mesh size. We want to find basis functions $\varphi _{\ell ,i} :[0, 1] \rightarrow \mathbb {R}$ such that we can interpolate a given objective function $f:[0, 1] \rightarrow \mathbb {R}$ on the equidistant grid of level $\ell $ by a linear combination of the basis functions:

$$\begin{aligned} \tilde{f}(x_{\ell ,j}) = f(x_{\ell ,j}) \quad \text {for all }j = 0, \dotsc , 2^\ell \,,\quad \text {where}\quad \tilde{f} :=\sum _{i=0}^{2^\ell } \alpha _{\ell ,i} \varphi _{\ell ,i} \end{aligned}$$

(8)

with some $\alpha _{\ell ,i} \in \mathbb {R}$.

The most straightforward choice of B-splines for $\varphi _{\ell ,i}$ are uniform B-splines that are scaled and translated versions of the cardinal B-spline $b^3 :=b_{0,(0,1,2,3,4)}^3$ (Pflüger 2010; Valentin 2019):

$$\begin{aligned} \varphi _{\ell ,i}^\mathrm {unif}(x) :=b^3(2^\ell x + 2 - i)\,. \end{aligned}$$

(9)

The resulting uniform B-splines $\varphi _{\ell ,i}^\mathrm {unif}$, $i = 0, \dotsc , 2^\ell $, are exactly the B-splines $b_{k,{\varvec{\xi }}^\mathrm {unif}}^3$, $k = 0, \dotsc , m - 1$, that arise from Eq. (7) when choosing the uniform knot sequence ${\varvec{\xi }}^\mathrm {unif} :=(x_{\ell ,-2}, x_{\ell ,-1}, \dotsc , x_{\ell ,2^\ell +2})$ and $m :=2^\ell + 1$.

However, the interpolation domain on which the B-splines span the spline space would only be $D_{{\varvec{\xi }}^\mathrm {unif}}^3 = [\xi _p, \xi _m] = [x_{\ell ,1}, x_{\ell ,2^\ell -1}] = [2^{-\ell }, 1 - 2^{-\ell }]$. This interval does not contain the two boundary grid points $x_{\ell ,0} = 0$ and $x_{\ell ,2^\ell } = 1$. This leads to interpolation problems since the spline space on [0, 1] is not contained in the spanned space of the basis functions $\varphi _{\ell ,i}^\mathrm {unif}$. Even simple polynomials such as $f(x) = 4(x - 0.5)^2$ cannot be represented exactly with the B-spline basis on the whole domain [0, 1] as shown by Fig. 1a and Valentin and Pflüger (2016). Consequently, the approximation quality for more complex functions like the value functions we interpolate in this paper deteriorates unnecessarily, which means that the economic results are not as conclusive as they could be.

As a remedy, we impose so-called not-a-knot boundary conditions by removing the left-most and right-most inner grid points $x_{\ell ,1}$ and $x_{\ell ,2^\ell -1}$ from the knot sequence (Höllig and Hörner 2013; Valentin 2019). To keep the number $m = 2^\ell + 1$ of B-splines the same, we have to insert two additional knots outside the domain:

$$\begin{aligned} {\varvec{\xi }}^\mathrm {nak}:=(x_{\ell ,-3},\; \dotsc ,\; x_{\ell ,0},\; x_{\ell ,2},\; \dotsc ,\; x_{\ell ,2^\ell -2},\; x_{\ell ,2^\ell },\; \dotsc ,\; x_{\ell ,2^\ell +3})\,. \end{aligned}$$

(10)

The new interpolation domain $D_{{\varvec{\xi }}^\mathrm {nak}}^3 = [x_{\ell ,0}, x_{\ell ,2^\ell }]$ is now the whole unit interval [0, 1], containing all grid points at which we interpolate. As a result, the not-a-knot B-spline functions

$$\begin{aligned} \varphi _{\ell ,i}^\mathrm {nak}:=b_{i,{\varvec{\xi }}^\mathrm {nak}}^3\,,\quad i = 0, \dotsc , 2^\ell \,, \end{aligned}$$

(11)

form a basis of the spline space on [0, 1] corresponding to the grid $\{x_{\ell ,i} \mid i = 0, \dotsc , 2^\ell \} {\setminus } \{x_{\ell ,1}, x_{\ell ,2^\ell -1}\}$ and, consequently, the not-a-knot basis is able to reproduce all polynomials of degree smaller or equal to p on [0, 1], see Fig. 1b, Höllig and Hörner (2013), and Valentin (2019). Note that the removal of the two grid points $x_{\ell ,1}$ and $x_{\ell ,2^\ell -1}$ is only possible if $\ell \ge 2$. If $\ell = 0$ or $\ell = 1$, we define $\varphi _{\ell ,i}^\mathrm {nak}$ as the Lagrange polynomial corresponding to the data $\{(i', \delta _{i,i'}) \mid i' = 0, \dotsc , 2^\ell \}$ where $\delta _{i,i'}$ is the Kronecker delta, e.g.,

$$\begin{aligned} \varphi _{0,0}^\mathrm {nak}(x) :=1-x\,,\quad \varphi _{0,1}^\mathrm {nak}(x) :=x\,,\quad \varphi _{1,1}^\mathrm {nak}(x) :=4x(x-1)\,. \end{aligned}$$

(12)

3.2 Hierarchical Not-A-Knot B-Splines

The construction of sparse grids needs a hierarchical splitting of the nodal basis. Therefore, we define the so-called nodal subspaces $V_\ell ^\mathrm {nak}$ and hierarchical subspaces $W_\ell ^\mathrm {nak}$ by

$$\begin{aligned} V_\ell ^\mathrm {nak}:=\mathrm{span}\{\varphi _{\ell ,i}^\mathrm {nak}\mid i = 0, \dotsc , 2^\ell \}\,,\quad W_\ell ^\mathrm {nak}:=\mathrm{span}\{\varphi _{\ell ,i}^\mathrm {nak}\mid i \in I_\ell \}\,, \end{aligned}$$

(13)

where

$$\begin{aligned} I_\ell :={\left\{ \begin{array}{ll}\{i = 1, \dotsc , 2^\ell - 1 \mid i\text { odd}\} &{} \quad \ell > 0,\\ \{0, 1\} &{} \quad \ell = 0.\end{array}\right. } \end{aligned}$$

(14)

The bases of the hierarchical subspaces are shown in Fig. 2.

It can be shown that the basis functions of $W_\ell ^\mathrm {nak}$ for $\ell = 0, \dotsc , n$ and $n \in \mathbb {N}_0$ are linearly independent on [0, 1] (Valentin 2019). This means that the sum $\mathrm{span}\{\varphi _{\ell ,i}^\mathrm {nak}\mid \ell = 0, \dotsc , n\,,\; i \in I_\ell \}$ of the subspaces $W_0^\mathrm {nak}, \dotsc , W_n^\mathrm {nak}$ is direct (i.e., $W_\ell ^\mathrm {nak}\cap W_{\ell '}^\mathrm {nak}= \{0\}$ for $\ell \not = \ell '$) and can be written as $\bigoplus _{\ell =0}^n W_\ell ^\mathrm {nak}$. Due to dimensional arguments, the direct sum coincides with the nodal space,

$$\begin{aligned} \bigoplus _{\ell =0}^n W_\ell ^\mathrm {nak}= V_n^\mathrm {nak}\,. \end{aligned}$$

(15)

Both sides are equal to the not-a-knot spline space described before.

3.3 Sparse Grids

We generalize the univariate hierarchical basis to d-variate functions with a tensor product approach:

$$\begin{aligned} \varphi _{{\varvec{\ell }},{\varvec{i}}}^\mathrm {nak}:[0, 1]^d \rightarrow \mathbb {R}\,,\quad \varphi _{{\varvec{\ell }},{\varvec{i}}}^\mathrm {nak}({\varvec{x}}) :=\prod _{t=1}^d \varphi _{\ell _t,i_t}^\mathrm {nak}(x_t)\,, \end{aligned}$$

(16)

where level and index are multi-indices ${\varvec{\ell }}= (\ell _1, \dotsc , \ell _d) \in \mathbb {N}_0^d$ and ${\varvec{i}}= (i_1, \dotsc , i_d)$ with $i_t \in \{0, \dotsc , 2^{\ell _t}\}$ for $t = 1, \dotsc , d$. The corresponding grid points are given by

$$\begin{aligned} {\varvec{x}}_{{\varvec{\ell }},{\varvec{i}}} :=(x_{\ell _1,i_1}, \dotsc , x_{\ell _d,i_d}) \in [0, 1]^d\,, \end{aligned}$$

(17)

and nodal and hierarchical subspaces are defined by

$$\begin{aligned} V_{\varvec{\ell }}^\mathrm {nak}:=\mathrm{span}\{\varphi _{{\varvec{\ell }},{\varvec{i}}}^\mathrm {nak}\mid {\varvec{0}} \le {\varvec{i}}\le {\varvec{2}}^{\varvec{n}}\}\,,\quad W_{\varvec{\ell }}^\mathrm {nak}:=\mathrm{span}\{\varphi _{{\varvec{\ell }},{\varvec{i}}}^\mathrm {nak}\mid {\varvec{i}}\in I_{\varvec{\ell }}\}\,, \end{aligned}$$

(18)

where ${\varvec{0}} \le {\varvec{i}}\le {\varvec{2}}^{\varvec{n}}$ is to be read component-wise ($0 \le i_t \le 2^{n_t}$ for all $t = 1, \dotsc , d$) and $I_{\varvec{\ell }}= I_{\ell _1} \times \cdots \times I_{\ell _d}$ with the Cartesian product $\times $. The nodal subspace of level ${\varvec{n}}$ can be split into hierarchical subspaces by the d-dimensional generalization of Eq. (15):

$$\begin{aligned} \bigoplus _{{\varvec{\ell }}\le {\varvec{n}}} W_{\varvec{\ell }}^\mathrm {nak}= V_{\varvec{n}}^\mathrm {nak}\,, \end{aligned}$$

(19)

where ${\varvec{n}}\in \mathbb {N}_0^d$. In the following, we assume that the level is equal for every dimension: ${\varvec{n}}:=(n, \dotsc , n) = n \cdot {\varvec{1}}$.

Sparse grids provide a method for the interpolation of objective functions $f:[0, 1]^d \rightarrow \mathbb {R}$. The common approach is to use the nodal space $V_{\varvec{n}}^\mathrm {nak}$ for interpolation. However, the corresponding full grid of level n,

$$\begin{aligned} \{{\varvec{x}}_{{\varvec{n}},{\varvec{i}}} \mid {\varvec{0}} \le {\varvec{i}}\le {\varvec{2}}^{\varvec{n}}\}\,, \end{aligned}$$

(20)

contains $(2^n + 1)^d = \varOmega (2^{nd})$ grid points. If we interpolated with $V_{\varvec{n}}^\mathrm {nak}$, we would have to evaluate the objective function $\varOmega (2^{nd})$ times, a number that grows exponentially with the dimensionality d. This fact is known as the curse of dimensionality (Bellman 1961). If evaluations of the objective function are computationally expensive, then dimensionalities of $d \ge 4$ usually prohibit full grid approaches. For dynamic portfolio choice models, this means that only very coarse full grids may be employed in the state space if the number d of state variables is large.

Sparse grids exploit the splitting (19) to select only some hierarchical subspaces such that the number of necessary evaluations for interpolation is drastically reduced, but the interpolation error deteriorates only slightly. The subspace selection can be formulated as an optimization problem for the piecewise linear case ($p = 1$) on the $L^2$ and $L^\infty $ interpolation error as detailed by Bungartz and Griebel (2004). The basic idea is to select those hierarchical subspaces that contribute most to the interpolation, assuming the objective function is sufficiently smooth. The optimal a priori selection for hat functions is given by the regular sparse grid of level n:

$$\begin{aligned} V_n^{\mathrm {S},\mathrm {nak}} :=\bigoplus _{\Vert {{\varvec{\ell }}}\Vert _1 \le n} W_{\varvec{\ell }}^\mathrm {nak}\,, \end{aligned}$$

(21)

where the 1-norm $\Vert {{\varvec{\cdot }}}\Vert _1$ is given by $\Vert {{\varvec{\ell }}}\Vert _1 = \sum _{t=1}^d |\ell _t|$. This definition is illustrated in Fig. 3. It can be seen as an analogue to Eq. (19), which we obtain by replacing the 1-norm on the right-hand side of Eq. (21) with the $\infty $-norm $\Vert {{\varvec{\ell }}}\Vert _\infty :=\max \{|\ell _t| \mid t=1,\dotsc ,d\}$.

Although the definition is motivated by the piecewise linear case ($p = 1$), using other basis functions such as higher-order B-splines has proven useful in various applications (Pflüger 2010; Valentin and Pflüger 2016; Valentin et al. 2018). For $p = 1$ and homogeneous boundary conditions, the $L^2$ interpolation error on the full grid of level n is given by $\mathscr {O}(2^{-2n})$ and the number of grid points (i.e., the required function evaluations) is $\mathscr {O}(2^{nd})$. In contrast, the sparse grid $L^2$ interpolation error is $\mathscr {O}(2^{-2n} n^{d-1})$ and therefore only slightly worse (by a factor which is polynomial in n) while requiring $\mathscr {O}(2^n n^{d-1})$ grid points (see the proof in Bungartz and Griebel 2004). The number of grid points does not depend on 2nd anymore, which means that significantly less grid points than in the full grid case are required. For B-splines of degree p, the interpolation error has been proven by Sickel and Ullrich (2011) to be in the order of $\mathscr {O}(2^{-(p+1)n} n^{d-1})$, which differs from the corresponding full grid error $\mathscr {O}(2^{-(p+1)n})$ (see Höllig and Hörner 2013 for a proof) by the same polynomial factor $n^{d-1}$.

These a priori estimates are based on the assumption that the interpolated function has continuous mixed second derivatives. If this is not the case or if it contains oscillations with high frequencies, then spatial adaptivity must be employed. Therefore, grid points are refined a posteriori according to suitable refinement criteria, see Fig. 4. This is of particular importance for the scope of this paper as spatial adaptivity enables us to increase the accuracy in regions of interest while simultaneously keeping the number of grid points at an acceptable level. The idea of the common surplus-based refinement criterion is that in the piecewise linear basis, the hierarchical surpluses correspond to the integral of the mixed second derivative of the interpolated function (Bungartz and Griebel 2004). If the absolute value $|\alpha _{{\varvec{\ell }},{\varvec{i}}}|$ of the hierarchical surplus of a grid point ${\varvec{x}}_{{\varvec{\ell }},{\varvec{i}}}$ is larger than a certain tolerance $\varepsilon $, then the 2d children of ${\varvec{x}}_{{\varvec{\ell }},{\varvec{i}}}$ are inserted to improve the accuracy of the interpolation in the proximity of ${\varvec{x}}_{{\varvec{\ell }},{\varvec{i}}}$ (Pflüger 2012). This criterion is only motivated for piecewise linear basis functions. Therefore, and for reasons of complexity, we use the piecewise linear basis to determine the grid points to be refined, and interpolate with the B-spline basis on the refined grid.

3.4 Weakly Fundamental Not-A-Knot Splines

Let $\varOmega ^{\mathrm {S}}\subset [0, 1]^d$ be the set of grid points of the sparse grid, for example $\varOmega ^{\mathrm {S}}= \{{\varvec{x}}_{{\varvec{\ell }}, {\varvec{i}}} \mid \Vert {{\varvec{\ell }}}\Vert _1 \le n\,,\; {\varvec{i}}\in I_{\varvec{\ell }}\}$ for the regular sparse grid of level n (but dimensionally or spatially adaptive sparse grids are possible as well). In this setting, the task of interpolation is usually called hierarchization for the basis functions $\varphi _{{\varvec{\ell }},{\varvec{i}}}$. The resulting coefficients $\alpha _{{\varvec{\ell }},{\varvec{i}}}$ for Eq. (8) are the hierarchical surpluses.

Conventional B-spline bases, such as the not-a-knot B-splines described before, share the drawback that the hierarchization is in general computationally expensive. In the case of the common piecewise linear basis ($p = 1$), the hierarchical surpluses can be calculated in $\mathscr {O}(|\varOmega ^{\mathrm {S}}| \cdot d)$ time with the so-called unidirectional principle (Pflüger 2010). For B-splines, usually the solution of a linear system with $|\varOmega ^{\mathrm {S}}|$ unknowns is required, which is generally much slower as this needs $\mathscr {O}(|\varOmega ^{\mathrm {S}}|^3)$ time (where, e.g., $|\varOmega ^{\mathrm {S}}| = \mathscr {O}(2^n n^{d-1})$ for regular sparse grids of level n if we omit points on the boundary).

In one dimension, the reason is the additional couplings between the basis functions of different levels introduced by the wider support of the cubic B-splines compared to the piecewise linear functions. To mitigate this issue, we linearly combine as few neighboring nodal not-a-knot B-splines as possible such that the resulting combination $\varphi _{l,i}^\mathrm {wfnak}$ satisfies

$$\begin{aligned} \varphi _{l,i}^\mathrm {wfnak}(x_{k,j}) = 0 \quad \text {for all }k < l\text { and } j \in I_k\,, \end{aligned}$$

(22)

which we call the weakly fundamental property. The resulting basis functions are plotted in Fig. 5. This enables the efficient unidirectional principle for the hierarchization with the resulting weakly fundamental not-a-knot spline basis if specific points are inserted beforehand (for details, see Valentin 2019). Therefore, we use weakly fundamental not-a-knot splines on sparse grids for the rest of the paper.

4 B-Splines on Spatially Adaptive Sparse Grids for Dynamic Portfolio Choice Models

To solve the Bellman problem (3) numerically, we compute the value function interpolants (5) by solving

$$\begin{aligned} j^{\mathrm {S}}_t({{\varvec{x}}}_t^{(k)}) = \max _{{{\varvec{p}}}_t} \bigl \{ \tilde{j}^{\mathrm {S}}_t({{\varvec{p}}}_t, {{\varvec{x}}}_t^{(k)}) \bigr \} \end{aligned}$$

(23)

at all grid points ${{\varvec{x}}}_t^{(k)}$ ($k = 1, \dotsc , N_t$), using higher-order B-splines on sparse grids for $j^{\mathrm {S}}_{t + 1}$ in the right-hand side of target function (6). This basis choice readily provides the gradient of the target function (6) at each ${{\varvec{x}}}_t^{(k)}$, such that we can supply it to any SQP routine. As a result of the SQP optimization, we obtain the values of the interpolant $j^{\mathrm {S}}_t({{\varvec{x}}}_t^{(k)})$ and the optimal policies ${{\varvec{p}}}^{\mathrm {opt},\mathrm {S}}_t({{\varvec{x}}}_t^{(k)})$ at these grid points for all $t < T$ and $k = 1, \dotsc , N_t$:

$$\begin{aligned} {{\varvec{p}}}^{\mathrm {opt},\mathrm {S}}_t({{\varvec{x}}}_t^{(k)}) = \mathop {\mathbf {arg max}}\limits _{{{\varvec{p}}}_t} \bigl \{ \tilde{j}^{\mathrm {S}}_t ({{\varvec{p}}}_t, {{\varvec{x}}}_t^{(k)}) \bigr \} \, . \end{aligned}$$

(24)

In general, the shapes of the value function and the optimal policies have different characteristics. The sufficiently accurate resolution of the optimal policy is already relevant to achieve plausible economic results if full grid solutions are computed (Brumm and Grill 2014). On sparse grids, this is even more important as kinks in the optimal policies can deteriorate the numerical error drastically (Schober 2018). Hence, as proposed by Schober (2018), in a subsequent step, optimal policy interpolants ${{\varvec{p}}}^{\mathrm {opt},\mathrm {S}}_t$ are computed by adaptively refining the respective policy grids and re-optimizing for the refined grid points if the added grid point is not yet part of the solution from the first step.

We track two interpolants $j^{\mathrm {S}{,1}}_t$ and $j^{\mathrm {S}{,p}}_t$ for each $t = 0, \dotsc , T$. The former interpolates the value function data at the grid points ${{\varvec{x}}}_t^{(k)}$ ($k = 1, \dotsc , N_t$) with the hierarchical piecewise linear basis (used for the surplus-based grid generation) while the latter interpolates the data with cubic hierarchical weakly fundamental not-a-knot splines of degree $p = 3$. Each $j^{\mathrm {S}{,*}}_t$ ($*\in \{1, p\}$) additionally stores the grid points ${{\varvec{x}}}_t^{(k)}$ and the optimal policies ${{\varvec{p}}}^{\mathrm {opt}}_t({{\varvec{x}}}_t^{(k)})$ at the grid points. For simplicity, we do not pass them explicitly to the algorithms.

In the following Sects. 4.1–4.3, we describe the algorithmic details of the generation of the value function interpolant (23). The generation of the optimal policy interpolant (24) follows in Sect. 4.4. Major parts from the Sects. 4.1–4.4 are taken from our previous work in the recently submitted Ph.D. thesis of Valentin (2019).

4.1 Solution for the Value Function

Algorithm 1 shows solveValueFunction, generating the value function interpolants $j^{\mathrm {S}{,1}}_t$ and $j^{\mathrm {S}, p}_t$ ($t = 0, \dotsc , T$). The algorithm follows a simple optimize–refine–interpolate scheme, which is presented in Fig. 6: First, Eq. (23) is solved on an initial sparse grid (optimize). Then, we refine the grid spatially adaptively. Finally, the resulting grid data are interpolated with hierarchical higher-order B-splines.

At the beginning of every iteration t, the grid of the piecewise linear interpolant is reset to an initial, possibly regular sparse grid. It would also be possible to reuse the grid from the previous iteration $t + 1$. However, the results we then obtain become worse, likely due to the different characteristics of $j^{\mathrm {S}{,1}}_t$ for different t (e.g., kinks).

4.2 Optimization

The optimize step can be seen in Algorithm 2. This algorithm accepts in $j^{\mathrm {S}{,1}}_t$ a spatially adaptive sparse grid $\varOmega ^{\mathrm {S}}_{t} = \{{{\varvec{x}}}_t^{(k)} \mid k = 1, \dotsc , N_t\}$ where the function values $j^{\mathrm {S}{,1}}_t({{\varvec{x}}}_t^{(k)})$ may already be known for some grid points ${{\varvec{x}}}_t^{(k)}$ if optimize is called from within refine. The function optimize computes the missing value function values. For $t = T$, we assume that the terminal solution $j_T$ can be computed by some function computeKnownTerminalSolution.^{Footnote 3} Otherwise, for $t < T$, we solve the maximization problem (23) by using the higher-order B-spline interpolant $j^{\mathrm {S}{,p}}_{t+1}$ of the previous iteration $t + 1$ (optimizeSinglePoint). The computations for the different ${{\varvec{x}}}_t^{(k)}$ are independent of each other, which means that they can be computed in parallel (Cai et al. 2015; Horneff et al. 2016).^{Footnote 4} After generating all missing data, we update the hierarchical surpluses of the piecewise linear interpolant $j^{\mathrm {S}{,1}}_t$ to interpolate the new data at all grid points of $\varOmega ^{\mathrm {S}}_{t}$.

4.3 Refinement

For adaptive refinement, the criterion is the common surplus-volume (Pflüger 2010). We use the piecewise linear interpolant for the surplus-based grid generation as the surpluses are easier to compute in the piecewise linear case, and as they are more meaningful due to the integral representation formula (Bungartz and Griebel 2004). Algorithm 3 shows how to generate the spatially adaptive sparse grid in solveValueFunction (Algorithm 1). Parameters are the tolerance $\varepsilon \in \mathbb {R}_{\ge 0}$ by which the set of grid points to be refined is determined and the number $q\in \mathbb {N}_0$ of refinement iterations.

4.4 Solution for the Optimal Policies

To construct the optimal policies, we use the higher-order B-spline interpolant $j^{\mathrm {S}{,p}}_t$ and the optimal policies ${{\varvec{p}}}^{\mathrm {opt}}_t({{\varvec{x}}}_t^{(k)})$ at the grid points ${{\varvec{x}}}_t^{(k)}$ ($k = 1, \dotsc , N_t$) obtained from Algorithm 1. We then spatially adaptively refine the grid for each policy to construct a policy interpolant of degree one, ${{\varvec{p}}}^{\mathrm {opt},\mathrm {S},1}_t$, for each $t = 1, \dotsc , T$. The corresponding Algorithm 1 is similar to solveValueFunction (Algorithm 4), except that it operates on the policy interpolants instead of the value function interpolant. The functions optimize and refine have been replaced by corresponding policy versions optimizePolicy and refinePolicy that work very much like their value function counterpart. In the optimization step, optimizePolicy only has to generate new values if the initial regular sparse grid for the policies is not contained in the grid of $j^{\mathrm {S}{,p}}_t$. The policy grid is then refined independently of the value function grid. The iterations are independent of each other, which means that they can be parallelized.^{Footnote 5}

5 Application: Transaction Costs Problem

First, we introduce the dynamic portfolio choice model with transaction costs (Sect. 5.1). We then describe its numerical solution in Sect. 5.2 and derive our error measure (unit-free Euler equation errors) in Sect. 5.3. We verify our solution economically on a two-dimensional full grid (Sect. 5.4). Then, we analyze the time complexity of the solution approach and show the impact of the choice of basis functions on the computational complexity in Sect. 5.5. Therefore, we solve the problem with B-splines of cubic degree on a regular sparse grid and compare them to the linear approach as used by Brumm and Scheidegger (2017). We find that we save approximately one order of magnitude in computational complexity with cubic B-splines already for three dimensions. Furthermore, we compare the results of our approach based on analytical gradients with the results we obtain with finite differences. For $d>3$, solutions on regular sparse grids are no longer feasible to compute in suitable numerical accuracy. In Sect. 5.6, we illustrate how spatial adaptivity allows us to solve the transaction costs problem up to $d=5$ accurately by showing pointwise error decay and convergence of our approach. Finally, we present in Sect. 5.7 economic results for the transaction costs problem in higher dimensions. The results in these sections are also contained in the recently submitted Ph.D. thesis of Valentin (2019).

5.1 Transaction Costs Problem

As the transaction costs problem is easiest described in vector notation, let us denote the unit vector with ${\varvec{1}}$. For two vectors ${\varvec{a}}$, ${\varvec{b}}$, we define the Hadamard product as $({\varvec{a}} \odot {\varvec{b}})_i :=a_i b_i$ for all i.

In the transaction costs problem the investor maximizes expected utility from consumption (1). Therefore, at time t, she tracks her wealth $W_t \in \mathbb {R}_{\ge 0}$ and fractions of wealth ${\varvec{x}}_t \in [0, 1]^d$ invested in stocks. Her choices are how much to buy of the d stocks ${\varvec{\varDelta }}^+_t \in \mathbb {R}_{\ge 0}^d$ with transaction costs $\tau {\varvec{\varDelta }}^+_t$ or sell ${\varvec{\varDelta }}^-_t \in \mathbb {R}_{\ge 0}^d$ with transaction costs $\tau {\varvec{\varDelta }}^-_t$ where $\tau > 0$ is a cost factor. Additionally, she can invest in a transaction-cost-free money market account $B_t$, yielding a risk-free return $r_f\in \mathbb {R}_{\ge 0}$. We assume the returns ${\varvec{r}}_t \in \mathbb {R}_{\ge 0}$ that the d stocks earn from t to $t+1$ are independent and identically lognormally distributed with mean ${\varvec{\mu }}$ and covariance matrix ${\varvec{\varSigma }}$: ${\varvec{r}}_t \sim LN({\varvec{\mu }}, {\varvec{\varSigma }})$ (Cai 2009; Cai and Judd 2010). The investor’s consumption $C_t$ in period t is the residual of her wealth that is not invested in stocks or bonds, reduced by the transaction costs for rearranging her portfolio in this period:

$$\begin{aligned} C_t = \left( 1 - {\varvec{1}}^\top \cdot {\varvec{x}}_t \right) W_t - B_t - (1 + \tau ) {\varvec{1}}^\top \cdot {\varvec{\varDelta }}^+_t- (\tau - 1) {\varvec{1}}^\top \cdot {\varvec{\varDelta }}^-_t\, . \end{aligned}$$

(25)

The state dynamics from t to $t+1$ are thus given by:

$$\begin{aligned} W_{t+1}&= B_t r_f+ \left( {\varvec{x}}_t W_t + {\varvec{\varDelta }}^+_t - {\varvec{\varDelta }}^-_t\right) ^\top \cdot {\varvec{r}}_t \, , \end{aligned}$$

(26a)

$$\begin{aligned} {\varvec{x}}_{t+1}&= \frac{\left( {\varvec{x}}_t W_t + {\varvec{\varDelta }}^+_t - {\varvec{\varDelta }}^-_t\right) \odot {\varvec{r}}_t}{W_{t+1}} \, . \end{aligned}$$

(26b)

The investor faces the optimization problem

$$\begin{aligned} J_t(W_t, {\varvec{x}}_t)&= \max \limits _{B_t, {\varvec{\varDelta }}^+_t, {\varvec{\varDelta }}^-_t} \bigl \{ u(C_t) + \rho \mathbb {E}_t \left[ J_{t + 1} \left( W_{t + 1},{\varvec{x}}_{t+1}\right) \right] \bigr \} \, , \quad t < T \, , \end{aligned}$$

(27a)

$$\begin{aligned} J_{T}(W_T, {\varvec{x}}_T)&= u\left( \left( 1 - \tau {\varvec{1}}^\top \cdot {\varvec{x}}_T \right) W_T\right) \, , \end{aligned}$$

(27b)

with utility function u from Eq. (2) subject to the constraint for all $t = 0, \dotsc , T$,

$$\begin{aligned} B_t + (1 + \tau ) {\varvec{1}}^\top \cdot {\varvec{\varDelta }}^+_t + (\tau - 1) {\varvec{1}}^\top \cdot {\varvec{\varDelta }}^-_t \le \left( 1 - {\varvec{1}}^\top \cdot {\varvec{x}}_t \right) W_t - C_{\text {min}}\, , \end{aligned}$$

(27c)

where ${\varvec{\varDelta }}^+_t \ge {\varvec{0}}$, ${\varvec{\varDelta }}^-_t \in [{\varvec{0}}, {\varvec{x}}_t W_t]$, $B_t \ge 0$, and ${\varvec{1}}^\top \cdot {\varvec{x}}_t \le 1$. Here, a minimum consumption level $C_{\text {min}}$ must be maintained, and the final stock holdings ${\varvec{x}}_T W_T$ are assumed to be sold before they can be consumed. In addition, at no point in time t the investor can sell more of the stocks than her current holdings ${\varvec{x}}_t W_t$.

The problem can be simplified by normalizing the value function $j_t = J_t / W_t$, consumption $c_t = C_t / W_t$, and investment choices $b_t = B_t / W_t$, ${\varvec{\delta }}^{+{}}_t = {\varvec{\varDelta }}^+_t / W_t$, ${\varvec{\delta }}^{-{}}_t = {\varvec{\varDelta }}^-_t / W_t$ with respect to wealth $W_t$ for each t. The investor’s normalized consumption $c_t$ in period t is then

$$\begin{aligned} c_t = 1 - {\varvec{1}}^\top \cdot {\varvec{x}}_t - b_t - (1 + \tau ) {\varvec{1}}^\top \cdot {\varvec{\delta }}^{+{}}_t - (\tau - 1) {\varvec{1}}^\top \cdot {\varvec{\delta }}^{-{}}_t \, . \end{aligned}$$

(28)

The state dynamics from t to $t+1$ can be expressed in terms of the portfolio value

$$\begin{aligned} \pi _{t+1} :=b_t r_f+ ({\varvec{x}}_t + {\varvec{\delta }}^{+{}}_t - {\varvec{\delta }}^{-{}}_t)^\top \cdot {\varvec{r}}_t \end{aligned}$$

(29)

in $t+1$:

$$\begin{aligned} W_{t+1}&= W_t \pi _{t+1} \, , \end{aligned}$$

(30a)

$$\begin{aligned} {\varvec{x}}_{t+1}&= \frac{\left( {\varvec{x}}_t + {\varvec{\delta }}^{+{}}_t - {\varvec{\delta }}^{-{}}_t\right) \odot {\varvec{r}}_t}{\pi _{t+1}} \, . \end{aligned}$$

(30b)

With this normalization, the solution to problem (27a–27c) can be expressed as $J_t = W_t^{1 - \gamma } j_t$ for each $t = 0, \dotsc , T$ with

$$\begin{aligned} j_t({\varvec{x}}_t)&= \max \limits _{b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t} \bigl \{ u(c_t) + \rho \mathbb {E}_t \left[ \pi _{t+1}^{1 - \gamma } j_{t + 1} ({\varvec{x}}_{t+1}) \right] \bigr \} \, , \quad t < T \, , \end{aligned}$$

(31a)

$$\begin{aligned} j_{T}({\varvec{x}}_T)&= u\left( 1 - \tau {\varvec{1}}^\top \cdot {\varvec{x}}_T \right) \, , \end{aligned}$$

(31b)

subject to the constraints for all $t = 0, \dotsc , T$,

$$\begin{aligned} b_t + (1 + \tau ) {\varvec{1}}^\top \cdot {\varvec{\delta }}^{+{}}_t + (\tau - 1) {\varvec{1}}^\top \cdot {\varvec{\delta }}^{-{}}_t&\le 1 - {\varvec{1}}^\top \cdot {\varvec{x}}_t - c_{\text {min}}\, , \end{aligned}$$

(31c)

$$\begin{aligned} {\varvec{\delta }}^{+{}}_t&\ge {\varvec{0}} \, , \end{aligned}$$

(31d)

$$\begin{aligned} {\varvec{\delta }}^{-{}}_t&\ge {\varvec{0}} \, , \end{aligned}$$

(31e)

$$\begin{aligned} {\varvec{\delta }}^{-{}}_t&\le {\varvec{x}}_t \, , \end{aligned}$$

(31f)

$$\begin{aligned} b_t&\ge 0 \, , \end{aligned}$$

(31g)

$$\begin{aligned} {\varvec{1}}^\top \cdot {\varvec{x}}_t&\le 1 \, , \end{aligned}$$

(31h)

where the minimum consumption level $c_{\text {min}}= C_{\text {min}}/ W_t$ is also normalized with respect to wealth (see "Appendix A.2"). Now the investor’s optimization problem no longer depends on $W_t$, and hence one state variable can be eliminated. The non-normalized optimal choices can be obtained by multiplication with a given wealth $W_t$ for any t and state ${\varvec{x}}_t$.

5.2 Numerical Solution

To compute the solution to the transaction costs problem, we use the certainty equivalent transformation $\hat{j}_t$ of the normalized value function $j_t$,

$$\begin{aligned} \hat{j}_t({\varvec{x}}_t) = \left( (1 - \gamma ) j_t({\varvec{x}}_t)\right) ^{\frac{1}{1 - \gamma }} \, , \end{aligned}$$

(32)

which reduces the curvature of the value function when the utility is of Constant Relative Risk Aversion type, Eq. (2) (Garlappi and Skoulakis 2009). Since this transform is strictly monotone, any maximizer of $\hat{j}_t$ also maximizes $j_t$. The optimization problem then reads

$$\begin{aligned} \hat{j}_t({\varvec{x}}_t)&= \max \limits _{b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t} \Biggl \{ \left( c_t^{1 - \gamma } + \rho \mathbb {E}_t \left[ \left( \pi _{t+1} \hat{j}_{t + 1} ({\varvec{x}}_{t+1})\right) ^{1 - \gamma } \right] \right) ^{\frac{1}{1 - \gamma }} \Biggr \} \, , \end{aligned}$$

(33a)

$$\begin{aligned} \hat{j}_{T}({\varvec{x}}_T)&= 1 - \tau {\varvec{1}}^\top \cdot {\varvec{x}}_T \, , \end{aligned}$$

(33b)

with constraints Eq. (31c) to (31h) (see "Appendix A.3").

The optimization problem for a given state ${\varvec{x}}_t$ is solved with the SQP solver SNOPT (Gill et al. 2005), see "Appendix A.4" for the specific objective function and its gradient. Since the distribution of the returns ${\varvec{r}}_t$ is multivariate lognormal and state-independent, we can compute the expectation in Eq. (6) using Gauss–Hermite quadrature. For this, we also use a sparse grid quadrature rule, thus breaking the curse of dimensionality when including stochastic risk factors (see the appendix of Horneff et al. 2016 for details).

The constraint (31h) constrains the state space. The resulting eligible subspace $\varOmega _{\mathrm{Simplex}}= \{{\varvec{x}}_t \in [0, 1]^d \mid {\varvec{1}}^\top \cdot {\varvec{x}}_t \le 1\} \subset [0, 1]^d$ is a d-dimensional simplex, not a rectangular domain as needed for the sparse grid approximation. We solve this problem by assuming that any state attained that is not eligible is cropped to an eligible state by selling all stock holdings pro rata until all constraints are satisfied. That is, money is transferred from stocks to wealth, for which the proportionate transaction costs are deducted (see "Appendix A.5"). The approximation of the value function is then evaluated at this eligible state.

The optimization ran over an investment horizon of $T = 6$ years and had a fixed period length of 1 year. The risk aversion $\gamma = 3.5$, the risk-free rate $r_f= {4} {\%}$, and the transaction costs factor $\tau = {1} {\%}$ were taken from Cai and Judd (2010). We extended the return distribution parametrization of Cai and Judd (2010) to five dimensions:

(34)

and set the time discount factor to $\rho = 0.97$ and $c_{\text {min}}= 0.001$ to ensure that minimal consumption was taking place. For d stocks we used the first d entries of ${\varvec{\mu }}$ and the elements ${\varvec{\varSigma }}_{i,j}$, $i, j \le d$, as the return distribution parametrization. As initial grids, we used regular sparse grids $\varOmega ^{\mathrm {S}}_{n,d}$ of level n and dimension d.

The code was written in MATLAB where the interpolation on sparse and full grids was implemented by a MEX file interface to the sparse grids C++ toolbox SG⁺⁺ (sgpp.sparsegrids.org, Pflüger 2010). The quadrature routine was implemented by a MEX file interface to the TASMANIAN sparse grids C++ toolbox (Stoyanov 2017) as TASMANIAN allows us to integrate a real valued function over a Gaussian density using Hermite polynomials on sparse grids (see the appendix of Horneff et al. 2016). We used the SNOPT implementation of the Numerical Algorithms Group (www.nag.co.uk). If convergence of the optimizer was not observed, we stopped the optimization after 100 iterations. To avoid being stuck in local minima, we repeated the optimization process for a varying number of initial multi-start points (in the range of a few dozens). All computations were performed on the compute cluster LOEWE-CSC (csc.uni-frankfurt.de) where we exclusively allocated three compute nodes with two Intel Xeon E5-2670 v2 CPUs (ten cores at 2.5 GHz, 20 threads) each, i.e., 120 threads in total, and 4000 MB RAM per thread.

5.3 Error Measurement

Any optimal policy ${{\varvec{p}}}^{\mathrm {opt}}_t :=(b^{\mathrm {opt}}_t, {\varvec{\delta }}^{+,{\mathrm {opt}}}_t, {\varvec{\delta }}^{-,{\mathrm {opt}}}_t)^\top $ must satisfy the first order conditions of the Lagrangian at any given state ${\varvec{x}}_t$ for each $t < T$. Specifically, for the transaction costs problem—when neglecting binding constraints—we obtain from the first-order condition with regard to the optimal bond policy $b^{\mathrm {opt}}_t$:

$$\begin{aligned} -{c^{\mathrm {opt}}_t}^{-\gamma } + \rho \mathbb {E}_t\left[ \left( \pi ^{\mathrm {opt}}_{t + 1} \hat{j}^{\mathrm {S}}_{t+1}\right) ^{-\gamma } r_f\left( \hat{j}^{\mathrm {S}}_{t+1} - \left( \mathop {{\varvec{\nabla }}_{{\varvec{x}}_{t+1}}} \hat{j}^{\mathrm {S}}_{t+1}\right) ^\top \cdot {\varvec{x}}^{\mathrm {opt}}_{t+1}\right) \right] = 0\, , \end{aligned}$$

(35)

where

$$\begin{aligned} c^{\mathrm {opt}}_t&= 1 -{\varvec{1}}^\top \cdot {\varvec{x}}_t - b^{\mathrm {opt}}_t - (1 + \tau ) {\varvec{1}}^\top \cdot {\varvec{\delta }}^{+,{\mathrm {opt}}}_t - (\tau - 1) {\varvec{1}}^\top \cdot {\varvec{\delta }}^{-,{\mathrm {opt}}}_t \, , \end{aligned}$$

(36a)

$$\begin{aligned} \pi ^{\mathrm {opt}}_{t+1}&= b^{\mathrm {opt}}_t r_f+ ({\varvec{x}}_t + {\varvec{\delta }}^{+,{\mathrm {opt}}}_t - {\varvec{\delta }}^{-,{\mathrm {opt}}}_t)^\top \cdot {\varvec{r}}_t \, , \end{aligned}$$

(36b)

$$\begin{aligned} {\varvec{x}}^{\mathrm {opt}}_{t+1}&= \frac{\left( {\varvec{x}}_t + {\varvec{\delta }}^{+,{\mathrm {opt}}}_t - {\varvec{\delta }}^{-,{\mathrm {opt}}}_t\right) \odot {\varvec{r}}_t}{\pi ^{\mathrm {opt}}_{t+1}} \, . \end{aligned}$$

(36c)

Rearranging Eq. (35) and taking the root $(\cdot )^{-1 / \gamma }$ yields the unit-free Euler equation error,

$$\begin{aligned} \varepsilon ^{{}}_t({\varvec{x}}_t) = \left( \rho \mathbb {E}_t\left[ \left( \pi ^{\mathrm {opt}}_{t+1} \hat{j}^{\mathrm {S}}_{t+1}\right) ^{-\gamma } r_f\left( \hat{j}^{\mathrm {S}}_{t+1} - \left( \mathop {{\varvec{\nabla }}_{{\varvec{x}}_{t+1}}} \hat{j}^{\mathrm {S}}_{t+1}\right) ^\top \cdot {\varvec{x}}^{\mathrm {opt}}_{t+1}\right) {c^{\mathrm {opt}}_t}^{\gamma }\right] \right) ^{-\frac{1}{\gamma }} - 1 \, , \end{aligned}$$

(37)

which should be 0 for any given state ${\varvec{x}}_t$ in the eligible domain $\varOmega _{\mathrm{Simplex}}$.

However, the state space cropping distorts unit-free Euler equation errors. This is due to three sources: Firstly, the cropping already occurs for large stock holdings ${\varvec{1}}^\top \cdot {\varvec{x}}_t$ that are less than one as stocks have to be sold to maintain minimum consumption $c_{\text {min}}$. Secondly, transaction costs for selling the stocks are deducted. Thirdly, even if neither minimum consumption is required, nor transaction costs are incurred the error at the hyperplane ${\varvec{1}}^\top \cdot {\varvec{x}}_t = 1$ does not vanish even for full grid solutions. Only in the limit, as the resolution of the grid goes to infinity, the error will vanish. Economically, the region near this hyperplane is not significant as such large stock fractions are unusual, which is confirmed by Monte Carlo simulations. We therefore use the weighted Euler equation error

$$\begin{aligned} \varepsilon ^{{\mathrm {w}}}_t({\varvec{x}}_t) :=\bigl (1 - {\varvec{1}}^\top \cdot {\varvec{x}}_t\bigr ) \varepsilon ^{{}}_t({\varvec{x}}_t) \end{aligned}$$

(38)

instead of $\varepsilon ^{{}}_t$. Alternatives would be restricting the state domain in which the error is computed or weighting the error with the probability that a given state occurs in a Monte Carlo simulation.^{Footnote 6}

We then choose the same $N = 1000$ points ${\varvec{x}}^{(k)} \in \varOmega _{\mathrm{Simplex}}$ ($k = 1, \dotsc , N$) for all times $t = 0 ,\dotsc , T - 1$ and compute the errors $\varepsilon ^{{\mathrm {w}}}_t({\varvec{x}}^{(k)})$ for each t.^{Footnote 7} We report the $L^2$ norm scaled by $\sqrt{d!}$ and the $L^\infty $ norm for each t:

$$\begin{aligned} \varepsilon ^{{\mathrm {w},L^2}}_t&:=\sqrt{\frac{1}{N} \sum _{k=1}^{N} |\varepsilon ^{{\mathrm {w}}}_t({\varvec{x}}^{(k)})|^2} \, , \end{aligned}$$

(39a)

$$\begin{aligned} \varepsilon ^{{\mathrm {w},L^\infty }}_t&:=\max \{|\varepsilon ^{{\mathrm {w}}}_t({\varvec{x}}^{(k)})| \mid k=1, \dotsc , N\} \, . \end{aligned}$$

(39b)

For details on the error derivation see "Appendix A.6".

In principle, we could also compare the solutions $\hat{j}^{\mathrm {S}}_t$ and optimal policies ${{\varvec{p}}}^{\mathrm {opt},\mathrm {S}}_t$ obtained on sparse grids with the full grid solution, e.g., in a point-wise way. However, full grid solutions with acceptable resolutions are computationally infeasible already in $d > 2$. In addition, the Euler equation error does not compare numerical solutions with each other, but rather measures the accuracy of any solution, regardless of whether it is obtained numerically or analytically.

5.4 Economical Verification

We show in Fig. 7 a full grid solution for the case of $d = 2$ stocks, i.e., $\{{\varvec{x}}_t^{(k)} \mid k = 1, \dotsc , N_t\} = \{0, 2^{-n}, \dotsc , 1\}^d$ for some fixed level $n \in \mathbb {N}$ (here, $n = 7$ and $N_t = (2^7 + 1)^2 = {16641}$) and for all $t = 0, \dotsc , T$. The red dot $(x_{t,1}, x_{t,2}) = (0.1509, 0.1831)$ shows the so-called Merton point

$$\begin{aligned} {\varvec{x}}^{\mathrm {opt}}_t :=\frac{{\varvec{\varSigma }}^{-1} ({\varvec{\mu }}- {\varvec{1}}r_f)}{\gamma } \, , \end{aligned}$$

(40)

for which Merton (1969) derives that in the case of $\tau = 0$ the optimal stock fractions ${\varvec{x}}^{\mathrm {opt}}_t$ are constant over time and wealth. When faced with transaction costs, Magill and Constantinides (1976) find that the investor must weigh up the benefits of improved diversification against the associated transaction costs for rebalancing the portfolio. This leads to the no-trade region (red outline). If ${\varvec{x}}^{\mathrm {opt}}_t$ lies within this region, the investor does not alter her portfolio. In discrete time consumption and portfolio choice, the no-trade region is known to be a convex set, and, if the current stock fraction is outside this region, the optimal policy is to move to the convex hull of the set (Abrams and Karmarkar 1980; Constantinides 1979). Naturally, the Merton point lies inside the no-trade region. We can confirm this result for our optimal policies, i.e., if we choose any point outside the no-trade region (but within the eligible domain $\varOmega _{\mathrm{Simplex}}$), computing ${\varvec{x}}_t + {\varvec{\delta }}^{+,{\mathrm {opt}}}_t - {\varvec{\delta }}^{-,{\mathrm {opt}}}_t$ then results in a boundary point of the no-trade region. Figure 7 also shows the impact of the state space cropping as the eligible subspace $\varOmega _{\mathrm{Simplex}}\subset [0, 1]^d$ is not a rectangular domain as needed for the sparse grid interpolation. This is the reason why the certainty equivalent value function $\hat{j}^{\mathrm {S}}_t$ is zero in $[0, 1]^d {\setminus } \varOmega _{\mathrm{Simplex}}$ and the optimal sell policies ${\varvec{\delta }}^{-,{\mathrm {opt},\mathrm {S}}}_t$ contain a diagonal kink at the hyperplane ${\varvec{1}}^\top \cdot {\varvec{x}}_t$.

Obviously, computing full grid solutions is only computationally feasible for low dimensionalities d due to the curse of dimensionality. The two-dimensional solution of level $n = 7$ took over nine hours to compute on the LOEWE-CSC cluster with 120 threads. The solution of the next level is estimated to already take one week. Hence, full grid solutions can only be computed up to $d = 3$ due to prohibitively long computational times for $d \ge 4$. This underlines the need for sophisticated discretization techniques such as sparse grids.

5.5 Savings in Complexity Using B-Splines

A complexity analysis reveals that the difficulty of solving dynamic portfolio choice models quickly grows with the dimensionality d: The number of necessary arithmetic operations grows like (see Fig. 6)

$$\begin{aligned} \varTheta \biggl ( T \cdot N_t \cdot \#\text { optimizer iterations} \cdot \underbrace{ Q_t \cdot \overbrace{ m_{{{\varvec{p}}}}\cdot N_{t+1} \cdot d \cdot p }^{{\text {one evaluation of interpolant}}} }_{{\text {one evaluation of objective gradient}}}\, \biggr ), \end{aligned}$$

(41)

where for the transaction costs problem $m_{{{\varvec{p}}}}= 2d + 1$ and $Q_t, N_t, N_{t+1} \in \varTheta (2^n n^{d-1})$ if regular sparse grids of level n without boundary points would be used for state and stochastic grids (due to $m_{{\varvec{\zeta }}}= d$). In addition, the number of optimizer iterations is likely superlinear in d as this depends on the dimensionality $m_{{{\varvec{p}}}}$ of the search space as well as on the number of multi-start points (which also grows with $m_{{{\varvec{p}}}}$). This means that the complexity is at least cubic in d, quadratic in the average number $N_t$ of employed state grid points, and linear in the number $Q_t$ of quadrature points. Figure 8 confirms these observations with experimental data using regular sparse grids without spatially adaptive refinement. For fixed d, the total time required by the optimization process grows quadratically with the number $N$ of grid points. The time for one solution of the Bellman equation, the time for one optimizer iteration, and the time for one evaluation of the interpolant are all linear in $N$ as the number of optimizer iterations is constant for fixed d. If d increases, then the number of interpolant evaluations per optimizer iteration (i.e., the number of quadrature points) increases as well. Surprisingly, the number of optimizer iterations per grid point and the time per evaluation are not monotonously increasing. The latter observation might be due to code optimization effects such as vectorization.

5.5.1 Comparison to Piecewise Linear Functions

Hierarchical B-splines introduce two major benefits to the solution of dynamic portfolio choice models. The first benefit are the smooth objective functions: When repeating the computations with piecewise linear functions (i.e., $p = 1$), one obtains almost the same weighted Euler equation errors as in the cubic case (except for the case of $d = 1$ where the error is one order of magnitude larger than in the cubic case). However, as we see in Fig. 8, the total computational time is several times larger for piecewise linear functions although evaluations are cheaper than for B-splines. The main reason is that the number of required optimizer iterations for piecewise linear basis functions is almost seven times as high as in the cubic case since the optimizer has to deal with kinks in the objective function. Our experiments show that beginning with $d = 4$, the total optimization time required to solve the transaction costs problem will be one whole order of magnitude shorter for cubic B-splines than for piecewise linear functions.

5.5.2 Comparing Exact Gradients to Finite Differences

The second benefit is the availability of exact gradients: Figure 8 also contains computational times of the solution process if we artificially do not use exact gradients of the objective functions, but rather approximate them with finite differences. For each evaluation of the objective gradient, at least $m_{{{\varvec{p}}}}$ additional evaluations of the objective function have to be performed to compute the finite differences ($2m_{{{\varvec{p}}}}$ if central differences are used). Consequently, while the resulting weighted Euler equation errors are similar, the total optimization time increases by a factor of up to five if we do not use exact gradients.

5.6 Accuracy Through Spatial Adaptivity

Figure 9 shows the convergence of the scaled $L^2$ norm $\varepsilon ^{{\mathrm {w},L^2}}_t$ and the $L^\infty $ norm $\varepsilon ^{{\mathrm {w},L^\infty }}_t$ of the weighted Euler equation error for $t = 0$ for regular sparse grids and spatially adaptive sparse grids for the cases of $d = 1, \dotsc , 4$ stocks. We look at the error for $t = 0$ throughout the remaining sections as numerical inaccuracies accrue from time t to time $t - 1$ by the dynamic programming nature of the problem (3). For Fig. 9 and the following plots, the value function grid is left unchanged while the average number $N_t$ of policy grid points increases with decreasing refinement threshold $\varepsilon $. This is because the value function grid does not seem to have a great influence on the convergence of the Euler equation errors. Compared to regular sparse grids, the spatial adaptivity decreases the error by two orders of magnitude in one dimension. The gain is smaller for higher dimensionalities d, but spatially adaptive sparse grids still outperform regular sparse grids. For $d = 2$, we observe that the error saturates at $N_0 \approx {4000}$ points. This is most likely due to floating-point rounding errors that are not influenced by sparse grid interpolation. In addition, convergence significantly decelerates starting with $d = 4$. For $d = 4$, spatially adaptive sparse grids are able to achieve a weighted Euler equation error of $\varepsilon ^{{\mathrm {w},L^2}}_0 \approx {1.99e{-}02}$ and $\varepsilon ^{{\mathrm {w},L^\infty }}_0 \approx {5.76e{-}02}$ (with an average number $N_0 = {4252}$ of policy grid points). For $d = 5$, we are still able to achieve a small error of $\varepsilon ^{{\mathrm {w},L^2}}_0 \approx {2.67e{-}02}$ and $\varepsilon ^{{\mathrm {w},L^\infty }}_0 \approx {6.37e{-}02}$, respectively, with spatially adaptive sparse grids with an average number $N_0 = {12572}$ of policy grid points. While we cannot detect any convergence for this dimensionality yet, this is still a major result as such high-dimensional models could not be solved that accurately up to now with conventional methods.

Pointwise plots of the weighted Euler equation error as in Fig. 10 for two stocks reveal that there are two types of regions where the error is large: The first type of region is the neighborhood of the aforementioned diagonal boundary ${\varvec{1}}^\top \cdot {\varvec{x}}_t = 1$ of the uncropped region where the cropping distorts the error despite the weights. The second type of region are kinks of the optimal policy functions, which is most visible for coarse grids (e.g., Fig. 10a). When increasing the number of grid points (e.g., Fig. 10b, c), the error decreases quickly in the whole domain.

All in all, Figs. 9 and 10 show that there are two necessary conditions to compute accurate solutions in higher dimensions: Firstly, reliable optimization enabled through B-spline interpolants of the value function and, secondly, spatial adaptive refinement of the policy grids. The latter condition was originally proposed in our previous work (Schober 2018).

Figures 11 and 12 each display the value function and the optimal policies corresponding to sparse grid solutions for $d = 2$ stocks with $N_0 = {879}$ policy grid points or $d = 5$ stocks with $N_0 = {12572}$ policy grid points. Obviously, most grid points are placed along the various kinks in the policies. Interestingly, experiments show that the surplus-based refinement criterion does not place more grid points along the perfectly diagonal kink caused by the cropping of the state space (i.e., along ${\varvec{1}}^\top \cdot {\varvec{x}}_t = 1$). It is possible to circumvent this issue by either rotating the domain or directly incorporating the distance to the diagonal into the refinement criterion for the value function. However, we refrain from doing so here as this does not seem to drastically improve results. Again, this might be due to the domination of the overall error by general floating-point rounding errors.

5.7 Solutions in Higher Dimensions

For higher dimensions, economic results for our transaction costs problem (31a–31h) scarcely exist. In continuous time, analytical solutions for special cases (Liu 2004; Liu and Loewenstein 2002) and numerical solutions with finite element methods (Muthuraman and Kumar 2006) have been discussed. Dynamic programming solutions with value function iteration in discrete time have been studied by Cai (2009), Cai and Judd (2010), Cai et al. (2020) for up to six stocks and one bond without consumption choice.

The purpose of this paper is to show the numerical accuracy of our approach. Rather than analyzing the economic implications of the solution to the higher-dimensional transaction costs problem, we limit ourselves to assessing the resulting optimal policy interpolants $({{\varvec{p}}}^{\mathrm {opt},\mathrm {S},1}_t)_{t=0,\dotsc ,T}$ in a Monte Carlo simulation setup. We calculate the average optimal policy

$$\begin{aligned} \bar{{{\varvec{p}}}}^{\mathrm {opt}}_t :=\frac{1}{m_\mathrm {MC}} \sum _{j=1}^{m_\mathrm {MC}} {{\varvec{p}}}^{\mathrm {opt}}_{t,(j)} \end{aligned}$$

(42)

for $m_\mathrm {MC} \in \mathbb {N}$ individuals where ${{\varvec{p}}}^{\mathrm {opt}}_{t,(j)} = (b^{\mathrm {opt}}_{t,(j)}, {\varvec{\delta }}^{+,{\mathrm {opt}}}_{t,(j)}, {\varvec{\delta }}^{-,{\mathrm {opt}}}_{t,(j)})^\top $ denotes the optimal policies of the individuals ($t = 0, \dotsc , T$ and $j = 1, \dotsc , m_\mathrm {MC}$). They are determined by

$$\begin{aligned} b^{\mathrm {opt}}_{t,(j)}&:=b^{\mathrm {opt},\mathrm {S},1}_t({{\varvec{x}}}_{t,(j)}) \, , \end{aligned}$$

(43a)

$$\begin{aligned} {\varvec{\delta }}^{+,{\mathrm {opt}}}_{t,(j)}&:={\varvec{\delta }}^{+,{\mathrm {opt},\mathrm {S},1}}_t({{\varvec{x}}}_{t,(j)}) \, , \end{aligned}$$

(43b)

$$\begin{aligned} {\varvec{\delta }}^{-,{\mathrm {opt}}}_{t,(j)}&:={\varvec{\delta }}^{-,{\mathrm {opt},\mathrm {S},1}}_t({{\varvec{x}}}_{t,(j)}) \, , \end{aligned}$$

(43c)

$$\begin{aligned} \pi ^{\mathrm {opt}}_{t,(j)}&:=b^{\mathrm {opt}}_{t - 1,(j)} r_f+ ({\varvec{x}}_{t - 1,(j)} + {\varvec{\delta }}^{+,{\mathrm {opt}}}_{t - 1,(j)} - {\varvec{\delta }}^{-,{\mathrm {opt}}}_{t - 1,(j)})^\top \cdot {\varvec{r}}_{t - 1,(j)} \, , \;\; t > 0 \, , \end{aligned}$$

(43d)

$$\begin{aligned} {\varvec{x}}_{t,(j)}&:=\frac{\left( {\varvec{x}}_{t - 1,(j)} + {\varvec{\delta }}^{+,{\mathrm {opt}}}_{t - 1,(j)} - {\varvec{\delta }}^{-,{\mathrm {opt}}}_{t - 1,(j)}\right) \odot {\varvec{r}}_{t - 1,(j)}}{\pi ^{\mathrm {opt}}_{t - 1,(j)}} \, , \;\; t > 0 \, , \;\; {{\varvec{x}}}_{0,(j)} = {\varvec{1}}\, , \end{aligned}$$

(43e)

$$\begin{aligned} {\varvec{r}}_{t,(j)}&\sim LN({\varvec{\mu }}, {\varvec{\varSigma }}) \, . \end{aligned}$$

(43f)

We plot the resulting average state and policies in Fig. 13 for $d = 2$, 3, 4, and 5 stocks for $m_\mathrm {MC} = 10^5$ individuals. In addition, this figure contains the evolution of the weighted Euler equation error $\varepsilon ^{{\mathrm {w},L^2}}_t$ over time. We perform a two-part assessment of the simulation results: First, consumption is slightly increasing over time, which is plausible given the solution, e.g., by Merton (1969) in the finite-horizon case. Second, we compare the stock fractions implied by the Merton points with the simulated stock fractions $\acute{x}_{t,o}/({\varvec{1}}^\top \cdot \acute{{\varvec{x}}}_t)$ ($o=1,\dotsc ,d$) for $t = 0$ in Table 1. We observe that the simulated stock fractions deviate from the Merton points’ stock fractions as expected. However, after computing the simulated stock fractions for all times, we see that they do not change much over time. This is in line with the buy-and-hold characteristics of solutions to portfolio choice models with transaction costs (e.g., Liu and Loewenstein 2002).

Table 1 Simulated stock fractions $\acute{x}_{t,o}/({\varvec{1}}^\top \cdot \acute{{\varvec{x}}}_t)$ for $t = 0$ and stock fractions implied by the Merton points $x^{\mathrm {opt}}_{t,o}/({\varvec{1}}^\top \cdot {\varvec{x}}^{\mathrm {opt}}_t)$ ($o=1,\dotsc ,d$) for the Monte Carlo simulations obtained by evaluating the optimal policy interpolants computed on spatially adaptive sparse grids

Full size table

Finally, we present in Table 2 the computational times and numerical errors of the sparse grid solutions underlying Fig. 13 and Table 1.

6 Conclusion

In this paper, we are the first to develop an approach to accurately solve high-dimensional dynamic portfolio choice models in discrete time that require smooth approximations or gradient-based optimization. With our approach, we have addressed the three key issues of solving these models by means of value function iteration: the curse of dimensionality, the lack of spatial adaptivity, and the lack of continuous gradients all at once by using B-splines on sparse grids with spatially adaptive refinement. We have solved a dynamic portfolio and consumption choice model with transaction costs to study the numerical accuracy of our approach. Solutions to the transaction costs problem with value function iteration have achieved economically acceptable results already for lower resolutions of the interpolation grid than in our presented example. Our approach, however, can easily be applied to other dynamic portfolio choice models or any high-dimensional economic model that require such a high resolution.

Table 2 Number of stocks d, base level n, refine tolerance $\varepsilon $, grid points of the base grid $|\varOmega ^{\mathrm {S}}_{n,d}|$ for d stocks and level n, grid points $N_t$ of the refined grid, added points $\varDelta _{N_t}$, computational time and weighted Euler equation errors $\varepsilon ^{{\mathrm {w},L^2}}_t$, $\varepsilon ^{{\mathrm {w},L^\infty }}_t$, at $t=0$. For the optimal policy ${{\varvec{p}}}^{\mathrm {opt},\mathrm {S},1}_t$ rows, the number $N_t$ is the average number $1/m_{{{\varvec{p}}}}\sum _{j=1}^{m_{{{\varvec{p}}}}} N_{t,j}$ of grid points over all policy grids for $t = 0$ where $N_{t,j}$ is the number of grid points of the j-th policy entry

Full size table

We have solved the transaction costs problem with up to five stocks and one risk-free bond, i.e., a five-dimensional interpolation and an eleven-dimensional optimization problem per time step. Using spatially adaptive refinement of the optimal policies, we have obtained maximum unit-free Euler equation errors around 5% for the five-dimensional problem and even lower maximum errors for lower-dimensional problems. This showcases the high accuracy of the proposed spatially adaptive solution scheme for the optimization of continuous choices, which relies on smooth approximations of the value function and the gradient. We have shown convergence of our approach in up to four dimensions. Here, spatially adaptive refinement of the optimal policies decreased the maximum Euler equation error by nearly two orders of magnitude in the four-dimensional case compared to regular sparse grids without spatially adaptive refinement. This has shown that only with spatial adaptivity, high-dimensional problems can be solved accurately. Finally, we have given a rigorous analysis of the complexity of our approach for dynamic portfolio choice models in general, not only for the transaction costs problem for which we have verified our analysis of complexity with measurements of computational time. We have found that the sole availability of the gradient for the optimization process has saved nearly one order of magnitude in computational complexity in the three-stock case. We expect even larger reductions of the computational complexity for higher-dimensional problems. Compared to finite differences with interpolation on hat functions as used by Brumm and Scheidegger (2017), we have saved considerably more than one order of magnitude in computational complexity and one order of magnitude in total computational time in three dimensions.

There are certain limitations to the applicability of spatially adaptive sparse grids to solve high-dimensional dynamic economic models: Firstly, sparse grid approximations are not shape-preserving, which is especially of importance for value function iteration with interpolation (Cai and Judd 2012). Secondly, the calculation of the coefficients of the B-spline interpolant is time-consuming and not trivial to parallelize since the solution to a system of linear equations has to be computed in every time step. Thirdly, the exact choice of the refinement tolerance for value function and policy interpolants is subject to trial-and-error. Choosing a refinement tolerance that is too low will lead to too many points that are inserted and may cause instability of the entire scheme if the optimizer does not give perfect results.

Future improvements of our approach may lie in the use of problem-tailored adaptivity criteria (Brumm and Scheidegger 2017; Pflüger 2012) instead of the simple surplus-based refinement criterion.

Availability of data and materials/Code availability

The code used to generate the results presented in this paper and all input data is publicly available at trix0r/BBSG.

Change history

13 March 2021
A Correction to this paper has been published: https://doi.org/10.1007/s10614-021-10105-w

Notes

See Cai (2019) for a discussion on alternative approximation methods for dynamic economic models that also do not suffer from the curse of dimensionality.
In this paper, we assume that numerical quadrature is possible. The applicability of the algorithms does not change if the distribution is generated by a different method, e.g., a Monte Carlo simulation. However, due to their slow convergence, Monte Carlo integration methods need a large simulated sample to obtain the accuracy required for the numerical optimization routine, see, e.g., the discussion by Cai (2019).
In any case, the terminal solution may be computed as the solution of the corresponding single-time optimization problem, e.g., $j_T({{\varvec{x}}}_T^{(k)}) = \max _{{{\varvec{p}}}_T} \bigl \{ u(c_T({{\varvec{x}}}_T^{(k)}, {{\varvec{p}}}_T)) \bigr \}$.
Such a problem is usually referred to as embarrassingly parallel.
In principle, one could generate policy interpolants of degree p, ${{\varvec{p}}}^{\mathrm {opt},\mathrm {S},p}_t$, by adding an extra interpolation step after refinePolicy (Valentin 2019).
Another possibility is to transform the non-rectangular domain to the unit hypercube, e.g., by analyzing the principal components of the ergodic distribution (Judd et al. 2014).
We choose $1000 \times d!$ points from $[0, 1]^d$ and discard all points that are not in $\varOmega _{\mathrm{Simplex}}$. Here, the d! corrects for the volume of the d-dimensional simplex obtained by cropping. Thus, there are only $N \approx 1000$ points in $\varOmega _{\mathrm{Simplex}}$.
We define $\mathop {{\varvec{\nabla }}_{{\varvec{x}}}} {\varvec{f}} :=(\partial f_j / \partial x_i)_{i,j}$ (i.e., the transposed Jacobian) and “$\cdot $” denotes the matrix-vector product.

References

Abrams, R. A., & Karmarkar, U. S. (1980). Optimal multiperiod investment-consumption policies. Econometrica, 48(2), 333–353.
Google Scholar
Barberis, N., & Huang, M. (2009). Preferences with frames: A new utility specification that allows for the framing of risks. Journal of Economic Dynamics and Control, 33(8), 1555–1576.
Google Scholar
Barthelmann, V., Novak, E., & Ritter, K. (2000). High dimensional polynomial interpolation on sparse grids. Advances in Computational Mathematics, 12(4), 273–288.
Google Scholar
Bellman, R. (1954). The theory of dynamic programming. Bulletin of the American Mathematical Society, 60(6), 503–515.
Google Scholar
Bellman, R. (1961). Adaptive control processes: A guided tour. Princeton: Princeton University Press. https://doi.org/10.1515/9781400874668.
Book Google Scholar
Brumm, J., & Grill, M. (2014). Computing equilibria in dynamic models with occasionally binding constraints. Journal of Economic Dynamics and Control, 38, 142–160.
Google Scholar
Brumm, J., & Scheidegger, S. (2017). Using adaptive sparse grids to solve high-dimensional dynamic models. Econometrica, 85(5), 1575–1612.
Google Scholar
Bungartz, H.-J., & Griebel, M. (2004). Sparse grids. Acta Numerica, 13, 147–269.
Google Scholar
Cai, Y. (2009). Dynamic programming and its application in economics and finance. Ph.D. thesis, Stanford University. https://purl.stanford.edu/zd335yg6884.
Cai, Y. (2019). Computational methods in environmental and resource economics. Annual Review of Resource Economics, 11(1), 59–82.
Google Scholar
Cai, Y., & Judd, K. L. (2010). Stable and efficient computational methods for dynamic programming. Journal of the European Economic Association, 8(2–3), 626–634.
Google Scholar
Cai, Y., & Judd, K. L. (2012). Shape-preserving dynamic programming. Mathematical Methods of Operations Research, 77(3), 407–421.
Google Scholar
Cai, Y., & Judd, K. L. (2015). Dynamic programming with Hermite approximation. Mathematical Methods of Operations Research, 81(3), 245–267.
Google Scholar
Cai, Y., Judd, K. L., Thain, G., & Wright, S. J. (2015). Solving dynamic programming problems on a computational grid. Computational Economics, 45(2), 261–284.
Google Scholar
Cai, Y., Judd, K. L., & Xu, R. (2020). Numerical solution of dynamic portfolio optimization with transaction costs. Working paper. EID arXiv:2003.01809.
Chu, M. T., Kuo, C.-H., & Lin, M. M. (2013). Tensor spline approximation in economic dynamics with uncertainties. Computational Economics, 42(2), 175–198.
Google Scholar
Cocco, J. F., Gomes, F. J., & Maenhout, P. J. (2005). Consumption and portfolio choice over the life cycle. Review of Financial Studies, 18(2), 491–533.
Google Scholar
Constantinides, G. M. (1979). Multiperiod consumption and investment behavior with convex transactions costs. Management Science, 25(11), 1127–1137.
Google Scholar
Cox, M. G. (1972). The numerical evaluation of B-splines. IMA Journal of Applied Mathematics, 10(2), 134–149.
Google Scholar
de Boor, C. (1972). On calculating with B-splines. Journal of Approximation Theory, 6(1), 50–62.
Google Scholar
De Giorgi, E. G., & Legg, S. (2012). Dynamic portfolio choice and asset pricing with narrow framing and probability weighting. Journal of Economic Dynamics and Control, 36(7), 951–972.
Google Scholar
Epstein, L. G., & Zin, S. E. (1989). Substitution, risk aversion, and the temporal behavior of consumption and asset returns: A theoretical framework. Econometrica, 57(4), 937–969.
Google Scholar
Fletcher, R. (2013). Practical methods of optimization. New York: Wiley. https://doi.org/10.1002/9781118723203.
Book Google Scholar
Garlappi, L., & Skoulakis, G. (2009). Numerical solutions to dynamic portfolio problems: The case for value function iteration using Taylor approximation. Computational Economics, 33(2), 193–207.
Google Scholar
Gill, P. E., Murray, W., & Saunders, M. A. (2005). SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Review, 47(1), 99–131.
Google Scholar
Habermann, C., & Kindermann, F. (2007). Multidimensional spline interpolation: Theory and applications. Computational Economics, 30(2), 153–169.
Google Scholar
Horneff, V., Maurer, R., & Schober, P. (2016). Efficient parallel solution methods for dynamic portfolio choice models in discrete time. Working paper 2665031. Available at SSRN. Goethe University Frankfurt. https://doi.org/10.2139/ssrn.2665031.
Horneff, W. J., Maurer, R., & Rogalla, R. (2010). Dynamic portfolio choice with deferred annuities. Journal of Banking & Finance, 34(11), 2652–2664.
Google Scholar
Horneff, W. J., Maurer, R., & Stamos, M. Z. (2008). Life-cycle asset allocation with annuity markets. Journal of Economic Dynamics and Control, 32(11), 3590–3612.
Google Scholar
Hubener, A., Maurer, R., & Mitchell, O. S. (2016). How family status and social security claiming options shape optimal life cycle portfolios. Review of Financial Studies, 29(4), 937–978.
Google Scholar
Hubener, A., Maurer, R., & Rogalla, R. (2014). Optimal portfolio choice with annuities and life insurance for retired couples. Review of Finance, 18(1), 147–188.
Google Scholar
Höllig, K., & Hörner, J. (2013). Approximation and modeling with B-splines. SIAM. http://bookstore.siam.org/ot132/.
Inkmann, J., Lopes, P., & Michaelides, A. (2011). How deep is the annuity market participation puzzle? Review of Financial Studies, 24(1), 279–319.
Google Scholar
Judd, K. L. (1998). Numerical methods in economics. Cambridge: MIT Press. https://mitpress.mit.edu/books/numericalmethods-economics.
Google Scholar
Judd, K. L., Maliar, L., Maliar, S., & Valero, R. (2014). Smolyak method for solving dynamic economic models: Lagrange interpolation, anisotropic grid and adaptive domain. Journal of Economic Dynamics and Control, 44, 92–123.
Google Scholar
Judd, K. L., & Solnick, A. (1994). Numerical dynamic programming with shape-preserving splines. Unpublished manuscript, Hoover Institution. https://web.stanford.edu/~judd/papers/dpshape.pdf.
Kamin, J. H. (1975). Optimal portfolio revision with a proportional transaction cost. Management Science, 21(11), 1263–1271.
Google Scholar
Liu, H. (2004). Optimal consumption and investment with transaction costs and multiple risky assets. Journal of Finance, 59(1), 289–338.
Google Scholar
Liu, H., & Loewenstein, M. (2002). Optimal portfolio selection with transaction costs and finite horizons. Review of Financial Studies, 15(3), 805–835.
Google Scholar
Magill, M. J., & Constantinides, G. M. (1976). Portfolio selection with transactions costs. Journal of Economic Theory, 13(2), 245–263.
Google Scholar
Merton, R. C. (1969). Lifetime portfolio selection under uncertainty: The continuous-time case. Review of Economics and Statistics, 51(3), 247–257.
Google Scholar
Muthuraman, K., & Kumar, S. (2006). Multidimensional portfolio optimization with proportional transaction costs. Mathematical Finance, 16(2), 301–335.
Google Scholar
Pflüger, D. (2010). Spatially adaptive sparse grids for high-dimensional problems. Verlag Dr. Hut. https://www5.in.tum.de/pub/pflueger10spatially.pdf.
Pflüger, D. (2012). Spatially adaptive refinement. In J. Garcke & M. Griebel (Eds.), Sparse grids and applications. Lecture Notes in Computational Science and Engineering (Vol. 88, pp. 243–262). Berlin: Springer.
Philbrick, C. R, Jr., & Kitanidis, P. K. (2001). Improved dynamic programming methods for optimal control of lumped-parameter stochastic systems. Operations Research, 49(3), 398–412.
Google Scholar
Rust, J. (2008). Dynamic programming. In S. N. Durlauf & L. E. Blume (Eds.), The new Palgrave dictionary of economics (Vol. 1-8, pp. 1471–1489). London: Palgrave Macmillan.
Google Scholar
Schober, P. (2018). Solving dynamic portfolio choice models in discrete time using spatially adaptive sparse grids. In J. Garcke, D. Pflüger, C. Webster, & G. Zhang (Eds.), Sparse grids and applications-Miami 2016. Lecture Notes in Computational Science and Engineering (Vol. 123, pp. 135–173). Berlin: Springer.
Google Scholar
Sickel, W., & Ullrich, T. (2011). Spline interpolation on sparse grids. Applicable Analysis, 90(3–4), 337–383.
Google Scholar
Stoyanov, M. (2017). User manual: TASMANIAN sparse grids v4.0. Technical report, Oak Ridge National Laboratory. https://tasmanian.ornl.gov/documents/UserManual.pdf.
Valentin, J. (2019). B-splines for sparse grids: Algorithms and application to higher-dimensional optimization. Ph.D. thesis. University of Stuttgart.
Valentin, J., & Pflüger, D. (2016). Hierarchical gradient-based optimization with B-splines on sparse grids. In J. Garcke & D. Pflüger (Eds.), Sparse grids and applications-Stuttgart 2014. Lecture Notes in Computational Science and Engineering (Vol. 109, pp. 315–336). Berlin: Springer.
Google Scholar
Valentin, J., Sprenger, M., Pflüger, D., & Röhrle, O. (2018). Gradient-based optimization with B-splines on sparse grids for solving forward-dynamics simulations of three-dimensional, continuum-mechanical musculoskeletal system models. International Journal for Numerical Methods in Biomedical Engineering, 34(5), 1–21.
Google Scholar
Winschel, V., & Krätzig, M. (2010). Solving, estimating, and selecting nonlinear dynamic models without the curse of dimensionality. Econometrica, 78(2), 803–821.
Google Scholar
Zenger, C. (1991). Sparse grids. In W. Hackbusch (Ed.), Parallel algorithms for partial differential equations. Notes on Numerical Fluid Mechanics (Vol. 31, pp. 241-251). Braunschweig: Vieweg. http://www5.in.tum.de/pub/zenger91sg.pdf.

Download references

Acknowledgements

We thank the initiative High Performance Computing in Hessen for granting us computing time at the LOEWE-CSC cluster and the Lichtenberg High Performance Computer. We appreciate valuable remarks to improve this paper from two anonymous referees. Helpful comments have been provided by Johannes Brumm, Yannick Dillschneider, Kenneth Judd, and Alexander Ludwig. We thank Stefan Zimmer for helping us develop the weakly fundamental not-a-knot splines. Finally, Peter Schober thanks Raimond Maurer for supporting this research in every way possible.

Funding

Open Access funding enabled and organized by Projekt DEAL. This work was supported by funding from the German Investment and Asset Management Association (BVI), the Juniorprofessurenprogramm of the Landesstiftung Baden-Württemberg, and the DFG (Cluster of Excellence SimTech EXC310/EXC2075).

Author information

Authors and Affiliations

Finance Department, Goethe University Frankfurt, Theodor-W.-Adorno-Platz 3, 60323, Frankfurt am Main, Germany
Peter Schober
Institute for Parallel and Distributed Systems, University of Stuttgart, Universitätsstraße 38, 70569, Stuttgart, Germany
Julian Valentin & Dirk Pflüger

Authors

Peter Schober
View author publications
You can also search for this author in PubMed Google Scholar
Julian Valentin
View author publications
You can also search for this author in PubMed Google Scholar
Dirk Pflüger
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Peter Schober.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

1.1 A.1 Sequential Quadratic Programming (SQP)

SQP methods are well-suited to compute the solution to problem (3). These methods use the linearization of the Lagrangian

$$\begin{aligned} \mathscr {L}_t( {{\varvec{p}}}_t, {{\varvec{\lambda }}}_t, {{\varvec{x}}}_t^{(k)} ) :=\tilde{j}^{\mathrm {S}}_t ({{\varvec{p}}}_t, {{\varvec{x}}}_t^{(k)} ) + {{\varvec{\lambda }}}_t^\top \cdot {\varvec{g}}_t({{\varvec{p}}}_t, {{\varvec{x}}}_t^{(k)}) \, , \end{aligned}$$

(44)

with Kuhn-Tucker multipliers ${{\varvec{\lambda }}}_t \in \mathbb {R}^{m_{{\varvec{g}}}}$ to set up a quadratic programming problem

$$\begin{aligned} \max \limits _{{\varvec{d}}_t^{(i)}} \Bigl \{ \mathop {{\varvec{\nabla }}_{{{\varvec{p}}}_t}} \tilde{j}^{\mathrm {S}}_t({{\varvec{p}}}_t^{(i)}\!\!, {{\varvec{x}}}_t^{(k)} )^\top \cdot {\varvec{d}}_t^{(i)} + \frac{1}{2} {{\varvec{d}}_t^{(i)}}^\top \cdot \mathop {{\varvec{\nabla }}_{{{\varvec{p}}}_t}^2} \tilde{j}^{\mathrm {S}}_t({{\varvec{p}}}_t^{(i)}\!\!, {{\varvec{x}}}_t^{(k)} ) \cdot {\varvec{d}}_t^{(i)} \Bigr \} \ , \end{aligned}$$

(45)

which finds the search direction ${\varvec{d}}_t^{(i)}$ for the current iterate ${{\varvec{p}}}_t^{(i)}$ starting from an initial guess ${{\varvec{p}}}_t^{(0)} \in \varPsi \subset \mathbb {R}^{m_{{{\varvec{p}}}}}$.^{Footnote 8} This problem can hence be solved by means of standard quadratic programming where frequently the Hessian $\mathop {{\varvec{\nabla }}_{{{\varvec{p}}}_t}^2} \tilde{j}^{\mathrm {S}}_t$ is approximated by the BFGS method (Fletcher 2013). Finally, the next iterate ${{\varvec{p}}}_t^{(i+1)}$ is chosen by an appropriate line search procedure over the step length $\sigma _i$:

$$\begin{aligned} {{\varvec{p}}}_t^{(i+1)} :={{\varvec{p}}}_t^{(i)} + \sigma _i {\varvec{d}}_t^{(i)} \, . \end{aligned}$$

(46)

If the gradient $\mathop {{\varvec{\nabla }}_{{{\varvec{p}}}_t}} \tilde{j}^{\mathrm {S}}_t$ is not available, implementations of SQP methods approximate it by finite differences. This leads to $m_{{{\varvec{p}}}}$ additional evaluations of $\tilde{j}^{\mathrm {S}}_t$ per quadratic programming iteration and to $2m_{{{\varvec{p}}}}$ additional evaluations if central finite differences are used. Some SQP routines allow the user to choose finite difference approximations, some automatically use central finite differences in certain situations to achieve higher accuracy when needed. However, almost all SQP routines allow the user to provide the gradient $\mathop {{\varvec{\nabla }}_{{{\varvec{p}}}_t}} \tilde{j}^{\mathrm {S}}_t$ (and many also the gradients of the constraints) to save computing time and to increase accuracy. For details see Fletcher (2013), Gill et al. (2005).

The gradient $\mathop {{\varvec{\nabla }}_{{{\varvec{p}}}_t}} \tilde{j}^{\mathrm {S}}_t$ of the target function (6) at the grid point ${{\varvec{x}}}_t^{(k)}$,

$$\begin{aligned} \mathop {{\varvec{\nabla }}_{{{\varvec{p}}}_t}} \tilde{j}^{\mathrm {S}}_t = \mathop {{\varvec{\nabla }}_{{{\varvec{p}}}_t}} u(c_t({{\varvec{p}}}_t, {{\varvec{x}}}_t^{(k)} )) + \rho \mathbb {E}_t \left[ \mathop {{\varvec{\nabla }}_{{\varvec{f}}_t}} j^{\mathrm {S}}_{t+1}({\varvec{f}}_t({{\varvec{p}}}_t, {{\varvec{x}}}_t^{(k)}\!\!, {\varvec{\zeta }}_t))^\top \cdot \mathop {{\varvec{\nabla }}_{{{\varvec{p}}}_t}} {\varvec{f}}_t({{\varvec{p}}}_t, {{\varvec{x}}}_t^{(k)}\!\!, {\varvec{\zeta }}_t) \right] \, , \end{aligned}$$

(47)

can be supplied to the SQP routine by evaluating the approximation of the gradient $\mathop {{\varvec{\nabla }}_{{\varvec{f}}_t}} j^{\mathrm {S}}_{t+1}$ rather than using finite differences.

1.2 A.2 Proof of the Normalization

Theorem 1

For all $t = 0, \dotsc , T, $

$$\begin{aligned} J_t(W_t, {\varvec{x}}_t) = W_t^{1 - \gamma } j_t({\varvec{x}}_t) \, , \end{aligned}$$

(48)

where $J_t$ is the solution of problem (27a–27c) and $j_t$ the solution of problem (31a–31h).

Proof

We have $u\left( C_t\right) = u\left( W_t c_t\right) = W_t^{1-\gamma } u\left( c_t\right) $ for all t due to the choice of the utility function (2).

Base case $t = T$: Because of Eqs. (27b) and (31b) it is

$$\begin{aligned} \begin{aligned} J_T(W_T, {\varvec{x}}_T)&= u\left( \left( 1 - \tau {\varvec{1}}^\top \cdot {\varvec{x}}_T \right) W_T\right) = W_T^{1 - \gamma } u\left( 1 - \tau {\varvec{1}}^\top \cdot {\varvec{x}}_T \right) \\&= W_T^{1 - \gamma } j_T({\varvec{x}}_t) \, . \end{aligned} \end{aligned}$$

(49)

Inductive hypothesis: For $t + 1$ it is $J_{t + 1} \left( W_{t + 1}, {\varvec{x}}_{t+1}\right) = W_{t + 1}^{1 - \gamma } j_{t + 1}({\varvec{x}}_{t+1})$.

Inductive step $t + 1 \rightarrow t$: By the inductive hypothesis and Eq. (30a) it is

$$\begin{aligned} \begin{aligned} J_t(W_t, {\varvec{x}}_t)&= \max \limits _{B_t, {\varvec{\varDelta }}^+_t, {\varvec{\varDelta }}^-_t} \bigl \{ u(C_t) + \rho \mathbb {E}_t \left[ W_{t + 1}^{1 - \gamma } j_{t + 1} ({\varvec{x}}_{t+1}) \right] \bigr \} \\&= W_t^{1 - \gamma } \max \limits _{b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t} \bigl \{ u(c_t) + \rho \mathbb {E}_t \left[ \pi _{t+1}^{1 - \gamma } j_{t + 1} ({\varvec{x}}_{t+1}) \right] \big \} \\&= W_t^{1 - \gamma } j_t({\varvec{x}}_t) \, . \end{aligned} \end{aligned}$$

(50)

$\square $

1.3 A.3 Proof of the Certainty Equivalent Transformation

Theorem 2

For all $t = 0, \dotsc , T, $

$$\begin{aligned} j_t({\varvec{x}}_t) = \frac{1}{1 - \gamma } \hat{j}_t({\varvec{x}}_t)^{1 - \gamma } \, , \end{aligned}$$

(51)

where $j_t$ the solution of problem (31a–31h) and $\hat{j}_t$ the solution of problem (33a, 33b).

Proof

Base case $t = T$: Because of Eqs. (31b) and (33b) it is

$$\begin{aligned} \begin{aligned} j_T({\varvec{x}}_T)&= u\left( 1 - \tau {\varvec{1}}^\top \cdot {\varvec{x}}_T \right) = \frac{1}{1 - \gamma } \left( 1 - \tau {\varvec{1}}^\top \cdot {\varvec{x}}_T\right) ^{1 - \gamma } \\&= \frac{1}{1 - \gamma } \hat{j}_T({\varvec{x}}_t)^{1 - \gamma } \, . \end{aligned} \end{aligned}$$

(52)

Inductive hypothesis: For $t + 1$ it is $j_{t + 1} ({\varvec{x}}_{t+1}) = 1/(1 - \gamma ) \hat{j}_{t+1} ({\varvec{x}}_{t+1})^{1 - \gamma }$.

Inductive step $t + 1 \rightarrow t$: By the inductive hypothesis and Eq. (33a) it is

$$\begin{aligned} \begin{aligned} j_t({\varvec{x}}_t)&= \max \limits _{b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t} \bigl \{ u(c_t) + \rho \mathbb {E}_t \left[ \pi _{t+1}^{1 - \gamma } j_{t + 1} ({\varvec{x}}_{t+1}) \right] \bigr \} \\&= \max \limits _{b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t} \biggl \{ \frac{1}{1 - \gamma }c_t^{1 - \gamma } + \rho \mathbb {E}_t \left[ \pi _{t+1}^{1 - \gamma } \frac{1}{1 - \gamma } \hat{j}_{t+1} ({\varvec{x}}_{t+1})^{1 - \gamma } \right] \biggr \} \\&= \frac{1}{1 - \gamma } \min \limits _{b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t} \biggl \{ c_t^{1 - \gamma } + \rho \mathbb {E}_t \left[ \left( \pi _{t+1} \hat{j}_{t+1} ({\varvec{x}}_{t+1})\right) ^{1 - \gamma } \right] \biggr \} \\&= \frac{1}{1 - \gamma } \left( \max \limits _{b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t} \Biggl \{ \left( c_t^{1 - \gamma } + \rho \mathbb {E}_t \left[ \left( \pi _{t+1} \hat{j}_{t+1} ({\varvec{x}}_{t+1})\right) ^{1 - \gamma } \right] \right) ^{\frac{1}{1-\gamma }} \Biggr \} \right) ^{1-\gamma } \\&= \frac{1}{1 - \gamma } \hat{j}_t ({\varvec{x}}_t)^{1 - \gamma } \, , \end{aligned} \end{aligned}$$

(53)

where we used that $(\cdot )^{1/(1 - \gamma )}$ is a strictly monotonously decreasing function and $1 - \gamma < 0$. $\square $

1.4 A.4 Analytical Gradients

For $t < T$ at state ${\varvec{x}}_t$, let us define the objective function (6) for the certainty equivalent formulation of the transaction costs problem (33a, 33b) by

$$\begin{aligned} \tilde{\hat{j}}^{\mathrm {S}{}}_t (b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t, {\varvec{x}}_t) :=\left\{ c_t^{1 - \gamma } + \rho \mathbb {E}_t \left[ \left( \pi _{t+1} \hat{j}^{\mathrm {S}}_{t + 1}\right) ^{1 - \gamma } \right] \right\} ^{\frac{1}{1-\gamma }}\, , \end{aligned}$$

(54)

where $\hat{j}^{\mathrm {S}}_t$ denotes the sparse grid B-spline approximation of Eq. (33a) evaluated at ${\varvec{x}}_t$. With the sparse grid B-spline approximation of the gradient with respect to ${\varvec{x}}_t$ evaluated at ${\varvec{x}}_t$, $\mathop {{\varvec{\nabla }}_{{\varvec{x}}_t}} \hat{j}^{\mathrm {S}}_t$, the gradient of the objective function (47) with respect to the policies $b_t$, ${\varvec{\delta }}^{+{}}_t$, and ${\varvec{\delta }}^{-{}}_t$ is:

$$\begin{aligned} \mathop {{\varvec{\nabla }}_{b_t}} \tilde{\hat{j}}^{\mathrm {S}{}}_t&= {\tilde{\hat{j}}^{\mathrm {S}{}}_t}^{\gamma } \left( -c_t^{-\gamma } + \rho \mathbb {E}_t\left[ \left( \pi _{t+1} \hat{j}^{\mathrm {S}}_{t+1}\right) ^{-\gamma } r_f\left( \hat{j}^{\mathrm {S}}_{t+1} - \left( \mathop {{\varvec{\nabla }}_{{\varvec{x}}_{t+1}}} \hat{j}^{\mathrm {S}}_{t+1}\right) ^\top \cdot {\varvec{x}}_{t+1}\right) \right] \right) \, , \end{aligned}$$

(55a)

$$\begin{aligned} \mathop {{\varvec{\nabla }}_{{\varvec{\delta }}^{+{}}_t}} \tilde{\hat{j}}^{\mathrm {S}{}}_t&= {\tilde{\hat{j}}^{\mathrm {S}{}}_t}^{\gamma } \left( -(1+\tau ) {\varvec{1}}c_t^{-\gamma } + \rho \mathbb {E}_t\left[ \left( \pi _{t+1} \hat{j}^{\mathrm {S}}_{t+1}\right) ^{-\gamma } {\varvec{r}}_t \odot \left( {\varvec{1}}\hat{j}^{\mathrm {S}}_{t+1} + \mathop {{\varvec{\nabla }}_{{\varvec{x}}_{t+1}}} \hat{j}^{\mathrm {S}}_{t+1} - {\varvec{1}}\left( \mathop {{\varvec{\nabla }}_{{\varvec{x}}_{t+1}}} \hat{j}^{\mathrm {S}}_{t+1}\right) ^\top \cdot {\varvec{x}}_{t+1}\right) \right] \right) \, , \end{aligned}$$

(55b)

$$\begin{aligned} \mathop {{\varvec{\nabla }}_{{\varvec{\delta }}^{-{}}_t}} \tilde{\hat{j}}^{\mathrm {S}{}}_t&= {\tilde{\hat{j}}^{\mathrm {S}{}}_t}^{\gamma } \left( -(\tau - 1){\varvec{1}}c_t^{-\gamma } - \rho \mathbb {E}_t\left[ \left( \pi _{t+1} \hat{j}^{\mathrm {S}}_{t+1} \right) ^{-\gamma } {\varvec{r}}_t \odot \left( {\varvec{1}}\hat{j}^{\mathrm {S}}_{t+1} + \mathop {{\varvec{\nabla }}_{{\varvec{x}}_{t+1}}} \hat{j}^{\mathrm {S}}_{t+1} - {\varvec{1}}\left( \mathop {{\varvec{\nabla }}_{{\varvec{x}}_{t+1}}} \hat{j}^{\mathrm {S}}_{t+1}\right) ^\top \cdot {\varvec{x}}_{t+1}\right) \right] \right) \, . \end{aligned}$$

(55c)

1.5 A.5 State Space Cropping

To obtain function values outside the feasible state space we virtually sell, if ${\varvec{1}}^\top \cdot {\varvec{x}}_t > 1$ as many stocks as needed to meet the constraint ${\varvec{1}}^\top \cdot {\varvec{x}}_t \le 1$. We already might need to sell stocks even if ${\varvec{1}}^\top \cdot {\varvec{x}}_t$ is smaller but close to one in order to satisfy the minimum consumption requirement (31c). In detail, we replace ${\varvec{x}}_t$ by $\hat{\beta }{\varvec{x}}_t$ whenever $\hat{\beta }< 1$ where $\hat{\beta }> 0$ is a cropping factor that is determined by

$$\begin{aligned} \Bigl [ 1 - \tau \, \bigl ( {\varvec{1}}^\top \cdot {\varvec{x}}_t - {\varvec{1}}^\top \cdot (\hat{\beta }{\varvec{x}}_t) \bigr ) \Bigr ] \cdot \bigl (1 - {\varvec{1}}^\top \cdot (\hat{\beta }{\varvec{x}}_t)\bigr ) = c_{\min }. \end{aligned}$$

(56)

Here, $\bigl ({\varvec{1}}^\top \cdot {\varvec{x}}_t - {\varvec{1}}^\top \cdot (\hat{\beta }{\varvec{x}}_t)\bigr )$ is the amount of virtually sold stocks. Hence, the term in square brackets is the fraction of wealth that is still available after deducting the induced transaction costs. The product of this term with $\bigl (1 - {\varvec{1}}^\top \cdot (\hat{\beta }{\varvec{x}}_t)\bigr )$ is the fraction of wealth that can be consumed after the virtual selling, which needs to be at least $c_{\min }$. Solving Eq. (56) for $\hat{\beta }$ and choosing the positive solution, we finally obtain

$$\begin{aligned} \hat{\beta }= \frac{ \tau \, \bigl (1 + {\varvec{1}}^\top \cdot {\varvec{x}}_t\bigr ) - 1 + \sqrt{ \tau ^2\, \bigl (1 - {\varvec{1}}^\top \cdot {\varvec{x}}_t\bigr )^2 - 2 \tau \, \bigl (2 c_{\text {min}}- 1 + {\varvec{1}}^\top \cdot {\varvec{x}}_t\bigr ) + 1 } }{ 2 \tau {\varvec{1}}^\top \cdot {\varvec{x}}_t} \, . \end{aligned}$$

(57)

1.6 A.6 Euler Equation Error Derivation

The Lagrangian (44) for the transaction costs problem in certainty equivalent formulation (33a, 33b) is given by:

$$\begin{aligned} \mathscr {L}_t(b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t, {{\varvec{\lambda }}}_t, {\varvec{x}}_t) :=\tilde{\hat{j}}^{\mathrm {S}{}}_t(b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t, {\varvec{x}}_t) + {{\varvec{\lambda }}}_t^\top \cdot {\varvec{g}}_t(b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t, {\varvec{x}}_t) \, , \end{aligned}$$

(58)

with $\tilde{\hat{j}}^{\mathrm {S}{}}_t$ from Eq. (54), ${{\varvec{\lambda }}}\in \mathbb {R}^{3d + 2}$ and ${\varvec{g}}_t(b_t, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t, {\varvec{x}}_t) = (c_t - c_{\text {min}}, {\varvec{\delta }}^{+{}}_t, {\varvec{\delta }}^{-{}}_t, {\varvec{x}}_t - {\varvec{\delta }}^{-{}}_t, b_t)^\top $ from constraints Eqs. (31c) to (31g).

The first order condition with regard to $b_t$ is

$$\begin{aligned} \mathop {{\varvec{\nabla }}_{b_t}} \mathscr {L}_t = \mathop {{\varvec{\nabla }}_{b_t}} \tilde{\hat{j}}^{\mathrm {S}{}}_t - \lambda ^{1}_t \mathop {{\varvec{\nabla }}_{b_t}} c^{\mathrm {opt}}_t - \lambda ^{3d+2}_t = 0 \, . \end{aligned}$$

(59)

We neglect binding constraints, i.e., we assume $\lambda ^{1}_t = \lambda ^{3d+2}_t = 0$, and set the error $\varepsilon ^{{\mathrm {w}}}_t({\varvec{x}}_t) = \text {NaN}$ whenever $\lambda ^{1}_t \ne 0$ or $\lambda ^{3d+2}_t \ne 0$ for any ${\varvec{x}}_t$. Thus, we take the $\text {NaN}$-mean in Eq. (39a) and the $\text {NaN}$-maximum in Eq. (39b).

Assuming $\lambda ^{1}_t = \lambda ^{3d+2}_t = 0$ in Eq. (59), plugging in Eq. (55a) for $\mathop {{\varvec{\nabla }}_{b_t}} \tilde{\hat{j}}^{\mathrm {S}{}}_t$ at the optimum $(b^{\mathrm {opt}}_t, {\varvec{\delta }}^{+,{\mathrm {opt}}}_t, {\varvec{\delta }}^{-{\mathrm {opt}}}_t)^\top $ and dividing both sides by ${\tilde{\hat{j}}^{\mathrm {S}{}}_t}^{\gamma }$ yields Eq. (35).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Schober, P., Valentin, J. & Pflüger, D. Solving High-Dimensional Dynamic Portfolio Choice Models with Hierarchical B-Splines on Sparse Grids. Comput Econ 59, 185–224 (2022). https://doi.org/10.1007/s10614-020-10061-x

Download citation

Accepted: 15 October 2020
Published: 04 January 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s10614-020-10061-x

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Solving High-Dimensional Dynamic Portfolio Choice Models with Hierarchical B-Splines on Sparse Grids

Abstract

Similar content being viewed by others

Solving Dynamic Portfolio Choice Models in Discrete Time Using Spatially Adaptive Sparse Grids

Dynamic portfolio choice: a simulation-and-regression approach

Valuation of Structured Financial Products by Adaptive Multiwavelet Methods in High Dimensions

1 Introduction

2 Discrete Time Dynamic Portfolio Choice Models

3 Hierarchical B-Splines on Sparse Grids

3.1 Not-A-Knot B-Spline Basis

3.2 Hierarchical Not-A-Knot B-Splines

3.3 Sparse Grids

3.4 Weakly Fundamental Not-A-Knot Splines

4 B-Splines on Spatially Adaptive Sparse Grids for Dynamic Portfolio Choice Models

4.1 Solution for the Value Function

4.2 Optimization

4.3 Refinement

4.4 Solution for the Optimal Policies

5 Application: Transaction Costs Problem

5.1 Transaction Costs Problem

5.2 Numerical Solution

5.3 Error Measurement

5.4 Economical Verification

5.5 Savings in Complexity Using B-Splines

5.5.1 Comparison to Piecewise Linear Functions

5.5.2 Comparing Exact Gradients to Finite Differences

5.6 Accuracy Through Spatial Adaptivity

5.7 Solutions in Higher Dimensions

6 Conclusion

Availability of data and materials/Code availability

Change history

13 March 2021

Notes

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendix A

Appendix A

1.1 A.1 Sequential Quadratic Programming (SQP)

1.2 A.2 Proof of the Normalization

Theorem 1

Proof

1.3 A.3 Proof of the Certainty Equivalent Transformation

Theorem 2

Proof

1.4 A.4 Analytical Gradients

1.5 A.5 State Space Cropping

1.6 A.6 Euler Equation Error Derivation

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation