1 Introduction

Solving the dynamic portfolio management problem has become an interesting topic ever since empirical findings in financial research suggested that asset returns were predictable. When the distributions of the asset returns are time-invariant, Merton (1969) and Samuelson (1969) have shown that an investor using a power utility function, who re-balances his portfolio optimally, should choose the same asset allocation at every time point, regardless of the investment horizon.

However, if the distributions of the asset returns are time-dependent, for example, when the asset returns follow a vector auto-regression (VAR) model, the optimal asset allocations at intermediate time points are usually not identical. In this case, an investor who aims to find an optimal asset allocation at the initial time has to first consider all possible asset allocations at subsequent time points, which in turn are also influenced by the initial investment decision. Mathematically, an investor must solve a multivariate optimization problem regarding asset allocations at all portfolio re-balancing times.

Generally, it is difficult to solve this multivariate optimization problem directly, and therefore this problem is usually solved by a backward recursion process, where at each time step the investor considers a simplified problem. Basically, there are two stages for solving this step-wise optimization problem. First, we determine how to formulate this optimization problem. We can either focus on the optimization problem directly or try to solve its corresponding first-order conditions, which usually depend on a preparatory approximation of the optimization problem. Then, the rest of the problem can be treated as a mathematical problem of computing conditional expectations. In Brandt et al. (2005), the authors work on the first-order conditions and compute the conditional expectations by simulation and cross-path regression. We call this algorithm the BGSS algorithm. In van Binsbergen and Brandt (2007b), the authors propose an alternative algorithm, the vBB algorithm, where they work on the optimization problem directly via grid-searching but still utilize simulation and cross-path regression to compute the conditional expectations. They state that the vBB algorithm is more stable than the BGSS algorithm, since the BGSS algorithm essentially relies on an approximation of the utility function. Many other numerical approaches for computing conditional expectations have been considered, for example, in Barberis (2000), Jondeau and Rockinger (2006) and Garlappi and Skoulakis (2009).

In this paper, we propose improvements for the BGSS and the vBB algorithms, which, respectively, rely on solving the first-order conditions and grid-searching to tackle the optimization problem.

In the original BGSS algorithm, cross-path standard regression is employed for solving first-order conditions, which are derived from a Taylor series expansion of the utility function. Within this framework, we contribute in two respects. First, we replace the standard regression method by the (local) regression combined with bundling of simulation paths, as employed in the stochastic grid bundling method (SGBM), from Jain and Oosterlee (2015). According to our tests, this modification makes the algorithm more stable and robust, and therefore our algorithm performs highly satisfactorily compared to the BGSS and the vBB algorithms, particularly when the investment horizon is long and risk aversion is high. Second, in the process of approximating the utility function, we consider an alternative Taylor expansion to the expansion employed in the original BGSS algorithm. This Taylor expansion was introduced in Garlappi and Skoulakis (2009). This expansion is, however, not directly compatible with regression-based approaches. With a specific choice of the Taylor expansion center, we can equip our SGBM regression-based portfolio algorithm with this improved Taylor expansion, making the approximations less biased. In short, our enhanced algorithm still constitutes an algorithm based on simulation and cross-path regression. It thus remains possible to extend this algorithm to high-dimensional scenarios without increasing the computational complexity dramatically.

Based on grid-searching, which is the basic idea of the vBB algorithm, we utilize a Fourier cosine series technique (Fang and Oosterlee 2008, 2009) to compute the conditional expectations and come up with a benchmark algorithm, the COS portfolio management method. Because this method is not based on simulation, the corresponding numerical results are free of simulation error. In the test cases to follow, reference solutions can therefore be generated via this COS-based algorithm.

The paper is organized as follows. Section 2 gives the mathematical formulation of the investor’s problem. In Sect. 2.1, we introduce a special case of the investor’s problem where the simulation- and regression-based methods can be applied. In Sect. 3 the SGBM algorithm is briefly described and the alternative Taylor expansion is also discussed. The benchmark algorithm based on the COS method is presented in Sect. 4. Following that, we display results of the numerical tests in Sect. 5. A brief discussion of the errors of the simulation-based methods is performed in Sect. 5.6. We conclude in the last section.

2 Problem Formulation: The Investor’s Problem

This section defines the dynamic portfolio optimization problem, or, in other words, “the investor’s problem”. We assume that the financial market is defined on a complete filtered probability space \((\varOmega ,{\mathscr {F}},\{{\mathscr {F}}_t\}_{0\le t\le T},{\mathbb {P}})\) with finite time horizon [0, T]. The state space \(\varOmega \) is the set of all realizations of the financial market within the time horizon [0, T], \({\mathscr {F}}\) is the sigma algebra of events at time T, i.e. \({\mathscr {F}} = {\mathscr {F}}_T\). We assume that the filtration \(\{{\mathscr {F}}_t\}_{0\le t\le T}\) is generated by the price processes of the financial market and augmented with the null sets of \({\mathscr {F}}\). The probability measure \({\mathbb {P}}\) is defined on \({\mathscr {F}}\).

We consider a portfolio consisting of one risk-free asset and d risky assets, which can be traded at discrete time points, \(t\in \{0,1,\ldots ,T-1\}\), before terminal time T. At each trading time t, an investor decides his trading strategy to maximize the expected value of the utility of his terminal wealth \(W_T\). Formally, the investor’s problem is given by

$$\begin{aligned} V_t(W_t,\mathbf {Z}_t) = \max _{\{\mathbf {x}_s\}^{T-1}_{s=t}}{\mathbb {E}}[U(W_T)\big |W_t,\mathbf {Z}_t], \end{aligned}$$
(1)

subject to the constraints:

$$\begin{aligned} W_{s+1} = W_s \cdot \left( {\mathbf {x}^{\prime }_s} \mathbf {R}^e_{s+1}+R^f\right) , \quad s = t,\ldots ,T-1. \end{aligned}$$

Here \(\mathbf {x}_s\) denotes the asset allocation of the investor’s wealth in risky assets. Vector transposition is denoted by the prime sign. \(R^f\) is the return of the risk-free asset, which is assumed to be constant for simplicity, and \(\mathbf {R}^e_{s+1} = [R^{e,1}_{s+1},\ldots ,R^{e,d}_{s+1}]\) are the excess returns of the risky assets at time \(s+1\). The function \(U(W_T)\) denotes the utility of the investor’s terminal wealth. \(V_t(W_t,\mathbf {Z}_t)\) is termed the value function, which measures the investor’s investment opportunities at time t with wealth \(W_t\) and market state \(\mathbf {Z}_t\). We assume that \(\{\mathbf {Z}_t\}^T_{t=0}\) is an \({\mathscr {F}}_t\)-adapted Markov process.

Mathematically, an investor decides his asset allocations \(\{\mathbf {x}_s\}^{T-1}_{s=0}\) at all time steps to maximize \(V_0(W_0,\mathbf {Z}_0)\) or, equivalently, \({\mathbb {E}}[U(W_0 \prod ^{T-1}_{s=0}({\mathbf {x}^{\prime }_s}\mathbf {R}^e_{s+1}+R^f))\big |W_0,\mathbf {Z}_0]\).
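As a minimal sketch of the budget constraint above, the following Python snippet compounds wealth along a single path and evaluates a power utility. The function names and the array layout (one row per period) are hypothetical, and the utility form \(U(W) = W^{1-\gamma }/(1-\gamma )\) is an assumption chosen to be consistent with the terminal condition \(v_T = 1/(1-\gamma )\) used later in this section.

```python
import numpy as np


def terminal_wealth(w0, allocations, excess_returns, rf):
    """Compound W_{s+1} = W_s * (x_s' R^e_{s+1} + R^f) along one path.

    allocations and excess_returns both have shape (T, d): row s holds
    x_s and R^e_{s+1}. rf is the constant gross risk-free return R^f.
    """
    x = np.asarray(allocations, dtype=float)
    r = np.asarray(excess_returns, dtype=float)
    growth = np.sum(x * r, axis=1) + rf  # one gross portfolio return per period
    return w0 * np.prod(growth)


def power_utility(wealth, gamma):
    """Assumed CRRA utility U(W) = W^(1-gamma) / (1-gamma), gamma != 1."""
    return wealth ** (1.0 - gamma) / (1.0 - gamma)
```

Averaging `power_utility(terminal_wealth(...), gamma)` over many simulated paths would give a Monte Carlo estimate of the objective for one fixed allocation rule.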

2.1 Numerical Approaches to the Investor’s Problem

From Eq. (1), we see that at time t it is impossible for the investor to determine the optimal asset allocation \(\mathbf {x}_t\) without knowing optimal asset allocations \(\{\mathbf {x}_s\}^{T-1}_{s=t+1}\) at future time points. A multivariate optimization problem with respect to all asset allocations \(\{\mathbf {x}_s\}^{T-1}_{s=t}\) may be considered, but due to the complexity of the dynamics of \(\mathbf {Z}_t\) it is usually not feasible to solve this problem.

A special case, discussed in Barberis (2000), Brandt et al. (2005) and van Binsbergen and Brandt (2007b), arises when the investor has constant relative risk aversion (CRRA)Footnote 1, in which case his optimal asset allocation \(\mathbf {x}_t\) is independent of his wealth \(W_t\). With this utility function, the optimization problem with respect to the original value function, \(V_t(W_t,\mathbf {Z}_t)\), which depends on two variables \(W_t\) and \(\mathbf {Z}_t\), reduces to an optimization problem with respect to a simplified value function, \(v_t(\mathbf {Z}_t)\):

$$\begin{aligned} v_t(\mathbf {Z}_t):= & {} V_t(1,\mathbf {Z}_t) = \max _{\{\mathbf {x}_s\}^{T-1}_{s=t}}{\mathbb {E}}\left[ U\left( \prod ^{T-1}_{s=t}\left( {\mathbf {x}^{\prime }_s}\mathbf {R}^e_{s+1} + R^f\right) \right) \Bigg |\mathbf {Z}_t\right] . \end{aligned}$$
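
The reduction to \(v_t\) rests on the homogeneity of the power utility. As a short worked check, assume \(U(W) = W^{1-\gamma }/(1-\gamma )\) with \(\gamma \ne 1\) (an assumption, chosen to match the terminal condition \(v_T = 1/(1-\gamma )\) stated further on). For any \(a > 0\) and \(W_T = W_t \prod ^{T-1}_{s=t}({\mathbf {x}^{\prime }_s}\mathbf {R}^e_{s+1}+R^f)\), with \(W_t > 0\) known at time t:

$$\begin{aligned} U(a\,W) = a^{1-\gamma }\,U(W) \quad \Longrightarrow \quad V_t(W_t,\mathbf {Z}_t) = W_t^{1-\gamma }\,V_t(1,\mathbf {Z}_t) = W_t^{1-\gamma }\,v_t(\mathbf {Z}_t). \end{aligned}$$

Since the positive factor \(W_t^{1-\gamma }\) does not depend on the allocations, the maximizing \(\mathbf {x}_t\) is the same for every wealth level.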

Value function \(v_t(\mathbf {Z}_t)\) can be written as a recursive procedure:

$$\begin{aligned} v_t(\mathbf {Z}_t)&= \max _{\{\mathbf {x}_s\}^{T-1}_{s=t}}{\mathbb {E}}\left[ U\left( \prod ^{T-1}_{s=t}\left( {\mathbf {x}^{\prime }_s}\mathbf {R}^e_{s+1} + R^f\right) \right) \Bigg |\mathbf {Z}_t\right] \nonumber \\&= \max _{\{\mathbf {x}_s\}^{T-1}_{s=t}}{\mathbb {E}}\left[ {\mathbb {E}}\left[ U\left( \prod ^{T-1}_{s=t}\left( {\mathbf {x}^{\prime }_s}\mathbf {R}^e_{s+1} + R^f\right) \right) \Bigg |\mathbf {Z}_{t+1}\right] \Bigg |\mathbf {Z}_t\right] \nonumber \\&= \max _{\mathbf {x}_t}{\mathbb {E}}\left[ \max _{\{\mathbf {x}_s\}^{T-1}_{s=t+1}}{\mathbb {E}} \left[ U\left( \prod ^{T-1}_{s=t}({\mathbf {x}^{\prime }_s}\mathbf {R}^e_{s+1} + R^f)\right) \Bigg |\mathbf {Z}_{t+1}\right] \Bigg |\mathbf {Z}_t\right] \nonumber \\&= \max _{\mathbf {x}_t}{\mathbb {E}}\left[ V_{t+1}\left( \left( {\mathbf {x}^{\prime }_t} \mathbf {R}^e_{t+1} + R^f\right) ,\mathbf {Z}_{t+1}\right) \Bigg |\mathbf {Z}_t\right] . \end{aligned}$$
(4)

Equation (4) is based on the Bellman principle of optimality and dynamic programming (Bellman 1957), which forms the basis for any recursive solution of the dynamic portfolio problem. The principle can be applied since the state vector is assumed to follow a Markov process and, therefore, the optimal asset allocation \({\mathbf {x}_t}\) only depends upon time and the current state \(\mathbf {Z}_t\).

Using the simplified value function and the power utility function with parameter \(\gamma \), we can solve the investor’s problem, in a backward recursion processFootnote 2, as follows:

  • At time T, we determine the value function as:

    $$\begin{aligned} v_T(\mathbf {Z}_T) = \frac{1}{1-\gamma }, \quad \gamma \not = 1; \end{aligned}$$
  • At time \(T-1\), the investor considers the optimization problem:

    $$\begin{aligned} v_{T-1}(\mathbf {Z}_{T-1})= & {} \max _{\mathbf {x}_{T-1}} {\mathbb {E}}\left[ U\left( {\mathbf {x}^{\prime }_{T-1}}\mathbf {R}^e_{T}+R^f\right) \Bigg |\mathbf {Z}_{T-1}\right] \\= & {} \max _{\mathbf {x}_{T-1}} {\mathbb {E}}\left[ \left( {\mathbf {x}^{\prime }_{T-1}}\mathbf {R}^e_{T}+R^f\right) ^{1-\gamma }v_T(\mathbf {Z}_T)\Bigg |\mathbf {Z}_{T-1}\right] . \end{aligned}$$

    We denote the optimal asset allocation by \({\hat{\mathbf {x}}}_{T-1}\), so that:

    $$\begin{aligned} \max _{\mathbf {x}_{T-1}} {\mathbb {E}}\left[ U\left( {\mathbf {x}^{\prime }_{T-1}}\mathbf {R}^e_{T}+R^f\right) \Bigg |\mathbf {Z}_{T-1}\right] := {\mathbb {E}}\left[ U\left( {{\hat{\mathbf {x}}}^{\prime }_{T-1}}\mathbf {R}^e_{T}+R^f\right) \Bigg |\mathbf {Z}_{T-1}\right] . \end{aligned}$$

    Recursively, moving backward in time, the following steps are subsequently performed at times t, \(t = T-2, T-3,\ldots , 1, 0\).

  • When the investor’s optimal asset allocations, \(\{{\hat{\mathbf {x}}_s}\}^{T-1}_{s=t+1}\), are determined, we can calculate the value function \(v_{t+1}(\mathbf {Z}_{t+1})\) as:

    $$\begin{aligned} v_{t+1}(\mathbf {Z}_{t+1})= & {} {\mathbb {E}} \left[ U\left( \prod ^{T-1}_{s=t+1}\left( {{\hat{\mathbf {x}}}^{\prime }_s}\mathbf {R}^e_{s+1}+R^f\right) \right) \Bigg |\mathbf {Z}_{t+1}\right] . \end{aligned}$$

    Then, the value function \(v_t(\mathbf {Z}_t)\) reads

    $$\begin{aligned} v_t(\mathbf {Z}_t)&= \max _{\{\mathbf {x}_s\}^{T-1}_{s=t}} {\mathbb {E}}\left[ U\left( \left( {\mathbf {x}^{\prime }_t} \mathbf {R}^e_{t+1}+R^f\right) \prod ^{T-1}_{s=t+1}\left( {\mathbf {x}^{\prime }_s} \mathbf {R}^e_{s+1}+R^f\right) \right) \Bigg |\mathbf {Z}_t\right] \nonumber \\&= \max _{\mathbf {x}_t} {\mathbb {E}}\left[ \left( {\mathbf {x}^{\prime }_t} \mathbf {R}^e_{t+1}+R^f\right) ^{1-\gamma }\max _{\{\mathbf {x}_s\}^{T-1}_{s=t+1}}{\mathbb {E}}\left[ U\left( \prod ^{T-1}_{s=t+1}({\mathbf {x}^{\prime }_s} \mathbf {R}^e_{s+1}+R^f)\right) \Bigg |\mathbf {Z}_{t+1}\right] \Bigg |\mathbf {Z}_t\right] \nonumber \\&= \max _{\mathbf {x}_t} {\mathbb {E}}\left[ \left( {\mathbf {x}^{\prime }_t} \mathbf {R}^e_{t+1}+R^f\right) ^{1-\gamma } v_{t+1}(\mathbf {Z}_{t+1})\Bigg |\mathbf {Z}_t\right] , \end{aligned}$$
    (5)

    where the last equality is valid by using the definition of \(v_{t+1}(\mathbf {Z}_{t+1})\). Value function \(v_t(\mathbf {Z}_t)\) can also be written as:

    $$\begin{aligned} v_t(\mathbf {Z}_t)&= \max _{\mathbf {x}_t} {\mathbb {E}}\left[ \left( {\mathbf {x}^{\prime }_t} \mathbf {R}^e_{t+1}+R^f\right) ^{1-\gamma }{\mathbb {E}}\left[ U\left( \prod ^{T-1}_{s=t+1}\left( {{\hat{\mathbf {x}}}^{\prime }_s} \mathbf {R}^e_{s+1}+R^f\right) \right) \Bigg |\mathbf {Z}_{t+1}\right] \Bigg |\mathbf {Z}_t\right] \nonumber \\&= \max _{\mathbf {x}_t} {\mathbb {E}}\left[ \left( {\mathbf {x}^{\prime }_t} \mathbf {R}^e_{t+1}+R^f\right) ^{1-\gamma }U\left( \prod ^{T-1}_{s=t+1}\left( {{\hat{\mathbf {x}}}^{\prime }_s} \mathbf {R}^e_{s+1}+R^f\right) \right) \Bigg |\mathbf {Z}_t\right] , \end{aligned}$$
    (6)

    where the last equality follows from the law of iterated expectations.

Either Eq. (5) or Eq. (6) can be employed to evolve the information in the backward recursion. They correspond to the “value function iteration” and the “portfolio weight iteration”, respectively, to be discussed in the following subsection. In either case, the optimization problem with respect to \(\mathbf {x}_t\) can be solved via numerical techniques.

As mentioned before, there are basically two numerical approaches available for dealing with this problem, one is by grid-searching and the other is by solving the first-order conditions. These techniques are discussed in subsequent sections.

2.1.1 Portfolio Weight Iteration or Value Function Iteration

In the backward recursion process, after either the optimal asset allocations \(\{\mathbf {x}_s\}^{T-1}_{s=t+1}\) or \(\mathbf {x}_{t+1}\) and \(v_{t+1}(\mathbf {Z}_{t+1})\) have been determined, we need to evolve the information from time step \(t+1\) to time step t to continue the recursive computation. We can consider either Eq. (5) or Eq. (6) for this purpose. The former, which carries the value function \(v_{t+1}(\mathbf {Z}_{t+1})\) backward, is termed “value function iteration” and the latter, which carries the optimal asset allocations backward, “portfolio weight iteration”. In van Binsbergen and Brandt (2007b) the authors show that more stable results can be obtained by the portfolio weight iteration. They explain their results as follows. In the value function iteration, the value function is a conditional expectation approximated by cross-path regression and approximation errors may accumulate in the backward recursion process. In the portfolio weight iteration, since the portfolio weights are bounded by borrowing and short-sale constraints, the approximation error remains bounded throughout the whole valuation process.

However, if the value function at each intermediate time step can be approximated accurately, the value function iteration should yield results similar to those of the portfolio weight iteration. In the numerical tests to follow, we will see that our enhanced numerical methods perform highly satisfactorily and that, in most cases, the value function iteration produces results comparable to those of the portfolio weight iteration.

3 Solving First-order Conditions

When the value function \(v_{t+1}(\mathbf {Z}_{t+1})\) is known, we consider the optimization problem displayed in Eq. (5).

One approach to obtain the optimal asset allocation \(\mathbf {x}_t\) in Eq. (5) is to solve the first-order conditions for an optimum, i.e.

$$\begin{aligned} {\mathbb {E}}\left[ \frac{\partial }{\partial \mathbf {x}_t}\left( \left( {\mathbf {x}^{\prime }_t} \mathbf {R}^e_{t+1}+R^f\right) ^{1-\gamma } v_{t+1}\left( \mathbf {Z}_{t+1}\right) \right) \Bigg |\mathbf {Z}_t\right] = 0. \end{aligned}$$
(7)

Since Eq. (7) is not directly solvable with respect to \(\mathbf {x}_t\), in Brandt et al. (2005) the authors proposed an approach to first approximate the value function \(v_t(\mathbf {Z}_t)\) via a Taylor series expansion and then solve the first-order conditions corresponding to the approximated function. Second-order Taylor expansion of the value function is written asFootnote 3:

$$\begin{aligned} v_t(\mathbf {Z}_t)\approx & {} \max _{\mathbf {x}_t} \Big \{ {\mathbb {E}}[(R^f)^{1-\gamma }v_{t+1}(\mathbf {Z}_{t+1}) \big |\mathbf {Z}_t] + {\mathbb {E}}[(1-\gamma )(R^f)^{-\gamma }{\mathbf {x}^{\prime }_t}\mathbf {R}^e_{t+1}v_{t+1}(\mathbf {Z}_{t+1}) \big |\mathbf {Z}_t] \\&+\,\,{\mathbb {E}}[\frac{1}{2}(1-\gamma )(-\gamma )(R^f)^{-1-\gamma }({\mathbf {x}^{\prime }_t}\mathbf {R}^e_{t+1})^2v_{t+1}(\mathbf {Z}_{t+1}) \big |\mathbf {Z}_t]\Big \}. \end{aligned}$$

The corresponding first-order conditions read:

$$\begin{aligned}&{\mathbb {E}}\left[ (1-\gamma )(R^f)^{-\gamma }\mathbf {R}^e_{t+1}v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] \\&\quad +\,{\mathbb {E}}\left[ (1-\gamma )(-\gamma )(R^f)^{-1-\gamma }(\mathbf {R}^e_{t+1}{\mathbf {R}^{e\prime }_{t+1}})v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] \mathbf {x}_t = 0, \end{aligned}$$

and the optimal asset allocation \(\hat{\mathbf {x}}_t\), which is assumed to be \(\mathbf {Z_t}\)-measurable, is given by:

$$\begin{aligned} \hat{\mathbf {x}}_t = \left[ {\mathbb {E}}\left[ \gamma \cdot (\mathbf {R}^e_{t+1}{\mathbf {R}^{e\prime }_{t+1}})v_{t+1}(\mathbf {Z}_{t+1})\Bigg |\mathbf {Z}_t\right] \right] ^{-1} \cdot {\mathbb {E}}\left[ R^f\mathbf {R}^e_{t+1}v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] . \end{aligned}$$
(8)

Here the conditional expectations can be approximated via simulation and cross-path regression, as done in Brandt et al. (2005), Longstaff and Schwartz (2001) and Tsitsiklis and Roy (2001).
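To make Eq. (8) concrete, the sketch below evaluates its sample analogue. It is only an illustration under a simplifying assumption: the conditional expectations are replaced by plain averages over simulated paths, which is appropriate only when the returns do not depend on \(\mathbf {Z}_t\); the algorithms discussed here use cross-path regression instead. The function and argument names are hypothetical.

```python
import numpy as np


def second_order_allocation(excess_returns, v_next, gamma, rf):
    """Sample analogue of Eq. (8) with unconditional expectations.

    excess_returns: shape (N, d), simulated R^e_{t+1}
    v_next:         shape (N,),  v_{t+1}(Z_{t+1}) on each path
    """
    r = np.asarray(excess_returns, dtype=float)
    v = np.asarray(v_next, dtype=float)
    n = r.shape[0]
    # E[gamma * R R' v]: weighted second-moment matrix, shape (d, d)
    second_moment = gamma * (r * v[:, None]).T @ r / n
    # E[R^f * R v]: weighted first-moment vector, shape (d,)
    first_moment = rf * (r * v[:, None]).mean(axis=0)
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(second_moment, first_moment)
```

Solving the linear system avoids an explicit matrix inverse, which tends to be better conditioned when the risky assets are strongly correlated.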

It is mentioned in Brandt et al. (2005) that solving first-order conditions is quite sensitive to the order of the Taylor expansion of the value function and the results from second-order and fourth-order expansions can be different. If we consider the fourth-order Taylor expansion of the value function \(v_t(\mathbf {Z}_t)\), i.e.

$$\begin{aligned} v_t(\mathbf {Z}_t)\approx & {} \max _{\mathbf {x}_t}\Bigg \{{\mathbb {E}}\left[ (R^f)^{1-\gamma }v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] \\&+\,\,{\mathbb {E}}\left[ (1-\gamma )(R^f)^{-\gamma }{\mathbf {x}^{\prime }_t}\mathbf {R}^e_{t+1}v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] \\&+\,\,{\mathbb {E}}\left[ \frac{1}{2}(1-\gamma )(-\gamma )(R^f)^{-1-\gamma }({\mathbf {x}^{\prime }_t}\mathbf {R}^e_{t+1})^2v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] \\&+\,\,{\mathbb {E}}\left[ \frac{1}{6}(1-\gamma )(-\gamma )(-1-\gamma )(R^f)^{-2-\gamma }({\mathbf {x}^{\prime }_t}\mathbf {R}^e_{t+1})^3v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] \\&+\,\,{\mathbb {E}}\left[ \frac{1}{24}(1-\gamma )(-\gamma )(-1-\gamma )(-2-\gamma )(R^f)^{-3-\gamma }({\mathbf {x}^{\prime }_t}\mathbf {R}^e_{t+1})^4v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] \Bigg \}, \end{aligned}$$

the optimal asset allocation \(\hat{\mathbf {x}}_t\) is defined as an implicit solution of the following equation:

$$\begin{aligned} \hat{\mathbf {x}}_t&\approx \left[ {\mathbb {E}}\left[ \gamma \cdot (\mathbf {R}^e_{t+1}{\mathbf {R}^{e\prime }_{t+1}})v_{t+1}(\mathbf {Z}_{t+1})\Bigg |\mathbf {Z}_t\right] \right] ^{-1} \cdot \Bigg \{{\mathbb {E}}\left[ R^f\mathbf {R}^e_{t+1}v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] \nonumber \\&\quad \, + \frac{1}{2} {\mathbb {E}} \left[ \frac{(-\gamma )(-1-\gamma )}{R^f} (\hat{\mathbf {x}}^{\prime }_t\mathbf {R}^e_{t+1})^2 \mathbf {R}^e_{t+1} v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] \nonumber \\&\quad \, + \frac{1}{6} {\mathbb {E}}\left[ \frac{(-\gamma )(-1-\gamma )(-2-\gamma )}{(R^f)^{2}} (\hat{\mathbf {x}}^{\prime }_t\mathbf {R}^e_{t+1})^3 \mathbf {R}^e_{t+1} v_{t+1}(\mathbf {Z}_{t+1})\Bigg |\mathbf {Z}_t\right] \Bigg \}. \end{aligned}$$
(9)

This equation can be treated as a fixed point problem, \(\mathbf {x} = h(\mathbf {x})\) with \(h(\cdot )\) denoting the right-hand side in Eq. (9). This can be solved by an iterative method. To start the iteration, we need an initial guess of the optimal asset allocation. Following the discussion in Brandt et al. (2005), we can take the solution from the second-order Taylor expansion of the value function as the initial guess \(\mathbf {x}^0_t\).

The iteration can be conducted by Newton’s method for \(h(\mathbf {x})-\mathbf {x} =0\):

$$\begin{aligned} \mathbf {x}^{l+1}_{t} = \mathbf {x}^l_{t} - \frac{h(\mathbf {x}^l_t)-\mathbf {x}^l_t}{h^{\prime }(\mathbf {x}^l_t) - 1},\quad l=0,1,2,\ldots . \end{aligned}$$

We stop the iteration if either the 2-norm of the distance between two consecutive approximations \(\mathbf {x}^l_t\) and \(\mathbf {x}^{l+1}_t\) is smaller than a tolerance value \(\epsilon _{\mathrm {TOL}}\) or the number of iterations reaches a predetermined value \(l_{\mathrm {max}}\). We take the last iterate \(\mathbf {x}^{l+1}_t\) as the final solution of Eq. (9). In the numerical tests, we choose \(\epsilon _{\mathrm {TOL}}=0.0001\) and \(l_{\mathrm {max}} = 30\). Unless stated otherwise, the tolerance \(\epsilon _{\mathrm {TOL}}\) is always reached.
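The fixed-point iteration can be sketched for a single risky asset as follows. This is an illustrative sketch only: expectations are again replaced by unconditional sample averages (an assumption, in place of the regression used in the algorithms), the derivative \(h^{\prime }\) is written out analytically for the scalar case, and all names are hypothetical. The defaults mirror \(\epsilon _{\mathrm {TOL}}=0.0001\) and \(l_{\mathrm {max}}=30\).

```python
import numpy as np


def fourth_order_allocation(excess_returns, v_next, gamma, rf,
                            tol=1e-4, max_iter=30):
    """Solve x = h(x) from Eq. (9) for one risky asset by Newton's method."""
    r = np.asarray(excess_returns, dtype=float)
    v = np.asarray(v_next, dtype=float)
    # Sample moments m_k = E[R^k v], k = 1..4 (unconditional, for illustration).
    m1, m2, m3, m4 = (np.mean(r ** k * v) for k in range(1, 5))
    # Factor of the x^2 term: (1/2) * (-gamma)(-1-gamma) / R^f
    c2 = 0.5 * gamma * (1.0 + gamma) / rf
    # Factor of the x^3 term: (1/6) * (-gamma)(-1-gamma)(-2-gamma) / (R^f)^2
    c3 = -gamma * (1.0 + gamma) * (2.0 + gamma) / (6.0 * rf ** 2)

    def h(x):
        return (rf * m1 + c2 * x ** 2 * m3 + c3 * x ** 3 * m4) / (gamma * m2)

    def h_prime(x):
        return (2.0 * c2 * x * m3 + 3.0 * c3 * x ** 2 * m4) / (gamma * m2)

    x = rf * m1 / (gamma * m2)  # second-order solution, Eq. (8), as initial guess
    for _ in range(max_iter):
        x_new = x - (h(x) - x) / (h_prime(x) - 1.0)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, h(x) - x  # allocation and remaining fixed-point residual
```

Returning the residual \(h(x)-x\) alongside the allocation makes it easy to detect paths on which the iteration stopped at `max_iter` without converging.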

3.1 Stochastic Grid Bundling Method

The Stochastic Grid Bundling Method (SGBM), introduced in Jain and Oosterlee (2015), is a powerful regression-based method for calculating conditional expectations in Eqs. (8) and (9).

It is shown in Jain and Oosterlee (2015) that applying SGBM is highly efficient for obtaining the early-exercise boundary when pricing American-style options and the estimated path-wise option value is so accurate that the Greeks can be generated directly. In Cong and Oosterlee (2016), SGBM is implemented for solving the dynamic mean-variance portfolio management problem in a robust and efficient way. In this paper, we implement SGBM for the dynamic utility-based portfolio management problem. Similar to Brandt et al. (2005), we take the second-order Taylor expansion in the description of the algorithm for expositional ease. However, in our numerical experiments, we always employ the fourth-order Taylor expansion. Extension to fourth-order expansion can be achieved with the formulas in Eq. (9). Our algorithm can be formally described as follows:

Step I: Simulation.

Simulate N paths \([\mathbf {R}^e_t(i),\mathbf {Z}_t(i)]^N_{i=1},\; t = 0,1,\ldots ,T\), and set the value function at terminal time T as:

$$\begin{aligned} v_T(\mathbf {Z}_T(i)) = \frac{1}{1-\gamma },\quad i=1,\ldots ,N. \end{aligned}$$

The following steps are subsequently performed at times t, \(t\le T-1\).

Step II: Bundling.

We bundle the paths at time t into B non-overlapping partitions, \({\mathscr {B}}_{t}(1),\ldots ,{\mathscr {B}}_{t}(B)\). Let each bundle cover a similar number of paths.

Step III: Regression.

Assume that there are N(b) paths in bundle \({\mathscr {B}}_{t}(b)\) and that their value functions \(\{v_{t+1}(\mathbf {Z}_{t+1}(i))\}^{N(b)}_{i=1}\) at time \(t+1\) (or, equivalently, their optimal asset allocations \(\{{\hat{\mathbf {x}}}_s(i)\}^{N(b)}_{i=1},s=t+1,\ldots ,T-1\)) and their excess returns \(\{\mathbf {R}^e_{t+1}(i)\}^{N(b)}_{i=1}\) are known. For these paths, we determine bundle-wise regression parameters \(\{\alpha _k(b)\}^K_{k=1}\) by regressing the values \(\{\gamma \cdot (\mathbf {R}^e_{t+1}(i)\mathbf {R}^{e\prime }_{t+1}(i))v_{t+1}(\mathbf {Z}_{t+1}(i))\}^{N(b)}_{i=1}\) on basis functionsFootnote 4 \([\phi _1(\mathbf {R}^e_{t+1}(i),\mathbf {Z}_{t+1}(i)),\ldots ,\phi _K(\mathbf {R}^e_{t+1}(i),\mathbf {Z}_{t+1}(i))]^{N(b)}_{i=1}\), which are constructed using the information at time \(t+1\). For any path whose state \(\mathbf {Z}_{t}\) is covered by bundle \({\mathscr {B}}_{t}(b)\), \({\mathbb {E}}[\gamma \cdot (\mathbf {R}^e_{t+1}\mathbf {R}^{e\prime }_{t+1})v_{t+1}(\mathbf {Z}_{t+1})\big |\mathbf {Z}_t]\), the denominator of the right-hand side part in Eq. (8), can be approximated by:

$$\begin{aligned} {\mathbb {E}}\left[ \gamma \cdot (\mathbf {R}^e_{t+1}\mathbf {R}^{e\prime }_{t+1})v_{t+1}(\mathbf {Z}_{t+1})\Bigg |\mathbf {Z}_t\right] \approx \sum ^K_{k=1}\alpha _k(b) {\mathbb {E}}\left[ \phi _k(\mathbf {R}^e_{t+1},\mathbf {Z}_{t+1})\Bigg |\mathbf {Z}_t\right] . \end{aligned}$$

Similarly, \({\mathbb {E}}[R^f\mathbf {R}^e_{t+1}v_{t+1}(\mathbf {Z}_{t+1}) \big |\mathbf {Z}_t]\), the numerator of the right-hand side part in Eq. (8), can be approximated by:

$$\begin{aligned} {\mathbb {E}}\left[ R^f\mathbf {R}^e_{t+1}v_{t+1}(\mathbf {Z}_{t+1}) \Bigg |\mathbf {Z}_t\right] \approx \sum ^K_{k=1}\beta _k(b) {\mathbb {E}}\left[ \phi _k(\mathbf {R}^e_{t+1},\mathbf {Z}_{t+1})\Bigg |\mathbf {Z}_t\right] , \end{aligned}$$

where the regression parameters \(\{\beta _k(b)\}^K_{k=1}\) are obtained by regressing \(\{R^f\mathbf {R}^e_{t+1}(i)v_{t+1}(\mathbf {Z}_{t+1}(i))\}^{N(b)}_{i=1}\) on the same basis functions \([\phi _1(\mathbf {R}^e_{t+1}(i),\mathbf {Z}_{t+1}(i)),{\ldots },\phi _K(\mathbf {R}^e_{t+1}(i),\mathbf {Z}_{t+1}(i))]^{N(b)}_{i=1}\).

For any path whose state \(\mathbf {Z}_{t}\) is covered by bundle \({\mathscr {B}}_{t}(b)\), the optimal asset allocation is approximated by:

$$\begin{aligned} {\hat{\mathbf {x}}}_t \approx \left[ \sum ^K_{k=1}\alpha _k(b) {\mathbb {E}}\left[ \phi _k(\mathbf {R}^e_{t+1},\mathbf {Z}_{t+1})\Bigg |\mathbf {Z}_t\right] \right] ^{-1} \cdot \left[ \sum ^K_{k=1}\beta _k(b) {\mathbb {E}}\left[ \phi _k(\mathbf {R}^e_{t+1},\mathbf {Z}_{t+1})\Bigg |\mathbf {Z}_t\right] \right] . \end{aligned}$$

The regression step is repeated for all bundles at each time step, so for each path we find the corresponding optimal asset allocation.

Step IV: Transition.

For the i-th path in bundle \({\mathscr {B}}_{t}(b)\), we can apply either portfolio weight iteration or value function iteration to transfer the information of the optimal investment strategy from time t to time \(t-1\).

When using the portfolio weight iteration, we just store the optimal asset allocations \(\{\mathbf {x}_s\}^{T-1}_{s=t}\) and write \(v_{t}(\mathbf {Z}_{t})\) as \(\prod ^{T-1}_{s=t}(\mathbf {x}^{\prime }_s\mathbf {R}^e_{s+1}+R^f)^{1-\gamma }/(1-\gamma )\) in the regression step at time \(t-1\).

If we use the value function iteration, the process is slightly more involved. For all paths in bundle \({\mathscr {B}}_{t}(b)\), we regress \(\{(\mathbf {x}^{\prime }_t(i)\mathbf {R}^e_{t+1}(i)+R^f)^{1-\gamma }v_{t+1}(\mathbf {Z}_{t+1}(i))\}^{N(b)}_{i=1}\) on the following polynomial basis functions \([\hat{\phi }_1(\mathbf {Z}_t(i)),\ldots ,\hat{\phi }_K(\mathbf {Z}_t(i))]^{N(b)}_{i=1}\) formed by \(\{\mathbf {Z}_t(i)\}^{N(b)}_{i=1}\) and obtain regression parameters \([\varphi _1(b),\ldots ,\varphi _K(b)]\). The value function is then approximated byFootnote 5:

$$\begin{aligned} v_t(\mathbf {Z}_t) \approx \sum ^K_{k=1} \varphi _k(b) \hat{\phi }_k(\mathbf {Z}_t). \end{aligned}$$
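
The bundling of Step II and the bundle-wise regression of the value function iteration can be sketched as follows. The sketch assumes a scalar state and forms bundles of nearly equal size by sorting the paths on \(Z_t\); this sorting rule, the monomial basis and all names are assumptions made for illustration, not a specification taken from the method's original description.

```python
import numpy as np


def make_bundles(z, n_bundles):
    """Partition path indices into non-overlapping bundles of similar size.

    Paths are sorted by their scalar state z and cut into consecutive chunks.
    """
    order = np.argsort(z)
    return np.array_split(order, n_bundles)


def bundled_regression(z, y, n_bundles=4, degree=2):
    """Regress y on the monomials 1, z, ..., z^degree separately in each bundle.

    Returns the fitted values, i.e. a local approximation of E[y | z].
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty_like(y, dtype=float)
    for idx in make_bundles(z, n_bundles):
        basis = np.vander(z[idx], degree + 1, increasing=True)
        coef, *_ = np.linalg.lstsq(basis, y[idx], rcond=None)
        fitted[idx] = basis @ coef
    return fitted
```

Here `y` would hold the path-wise values \((\mathbf {x}^{\prime }_t(i)\mathbf {R}^e_{t+1}(i)+R^f)^{1-\gamma }v_{t+1}(\mathbf {Z}_{t+1}(i))\), and the fitted values play the role of \(v_t(\mathbf {Z}_t(i))\). Because each regression only has to be accurate locally, a low polynomial degree per bundle is usually sufficient.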

3.2 Taylor Expansion Based on a Nonlinear Decomposition

In both the BGSS and SGBMFootnote 6 algorithms, an essential step before solving the equations for the first-order conditions is to rewrite the value function, \(v_t(\mathbf {Z}_t)\), in a Taylor series expansion in which the asset allocation \(\mathbf {x}_t\) is separated from the conditional expectations of \(\mathbf {R}^e_{t+1}\). A Kth-order Taylor expansion in SGBM can be written asFootnote 7

$$\begin{aligned} \left( x_t R^e_{t+1} + R^f\right) ^{1-\gamma } \approx \sum ^K_{k=0} \frac{g^{(k)}_1(0)}{k!}(x_{t} R^e_{t+1})^k, \end{aligned}$$
(10)

where \(g^{(k)}_1(0)\) denotes the kth derivative of the function \(g_1(y) = (y + R^f)^{1-\gamma }\) evaluated at \(y=0\), so that \(g_1(x_t R^e_{t+1})= (x_t R^e_{t+1} + R^f)^{1-\gamma }\).

Since the excess return \(R^e_{t+1}\) is a nonlinear transformation of the log excess return \(r^e_{t+1}\), i.e.

$$\begin{aligned} R^e_{t+1} = \exp (r^e_{t+1}) R^f - R^f, \end{aligned}$$

an alternative way to perform a Taylor expansion for \((x_t R^e_{t+1} + R^f)^{1-\gamma }\) is given by:

$$\begin{aligned} \left( x_t R^e_{t+1} + R^f\right) ^{1-\gamma } \approx \sum ^K_{k=0} \frac{g^{(k)}_2(0)}{k!}(r^e_{t+1})^k, \end{aligned}$$
(11)

where function \(g_2(z)\) is defined by:

$$\begin{aligned} g_2(z) = \left( R^f + x_t \left( \exp (z) R^f-R^f\right) \right) ^{1-\gamma }. \end{aligned}$$

Functions \(g_1(x_t R^e_{t+1})\) and \(g_2(r^e_{t+1})\) are both identical to \((x_t R^e_{t+1} + R^f)^{1-\gamma }\), but different ways of choosing the underlying variable yield different Taylor expansion formulas.

In Garlappi and Skoulakis (2011), the authors call the expansion described in Eq. (10) a “Taylor expansion based on a linear decomposition” and the expansion described in Eq. (11) a “Taylor expansion based on a nonlinear decomposition”. They show that when the centers of Taylor expansions are carefully chosen, the “Taylor expansion based on a nonlinear decomposition” is more accurate than the “Taylor expansion based on a linear decomposition” when approximating the function \((x_t R^e_{t+1} + R^f)^{1-\gamma }\). We will call these expansions “original Taylor” and “log Taylor” expansions, respectively, in the rest of this paper. Although the log Taylor expansion has been implemented in Garlappi and Skoulakis (2009) for dynamic portfolio management, their choice of expansion center is not compatible with the algorithm discussed here. We deal with this problem by performing a log Taylor expansion around center 0, as displayed in Eq. (11).Footnote 8 According to the numerical tests in Sect. 5, we find that the log Taylor expansion is indeed a superior choice even when the expansion center is chosen to be 0. The reasoning is that the log excess return \(r^e_t\) usually exhibits a distribution similar to the normal distribution. Therefore, a Taylor expansion with respect to this variable, i.e. the so-called “log Taylor” expansion, can yield accurate results with a limited number of expansion terms. By contrast, the excess return \(R^e_t\) usually exhibits a fat-tailed distribution, so that more terms are required in the original Taylor expansion to approximate its moments.
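
As an illustrative worked example (our own derivation for one risky asset, not taken from the cited papers), the first derivatives of \(g_2\) at the expansion center 0 can be computed by hand:

$$\begin{aligned} g_2(0)&= (R^f)^{1-\gamma }, \\ g^{(1)}_2(0)&= (1-\gamma )(R^f)^{1-\gamma }\, x_t, \\ g^{(2)}_2(0)&= (1-\gamma )(R^f)^{1-\gamma }\left( x_t - \gamma x_t^2\right) . \end{aligned}$$

Inserting the resulting second-order expansion into Eq. (5) and setting the derivative with respect to \(x_t\) to zero gives

$$\begin{aligned} \hat{x}_t \approx \frac{{\mathbb {E}}\left[ r^e_{t+1}v_{t+1}(\mathbf {Z}_{t+1})\big |\mathbf {Z}_t\right] + \frac{1}{2}{\mathbb {E}}\left[ (r^e_{t+1})^2v_{t+1}(\mathbf {Z}_{t+1})\big |\mathbf {Z}_t\right] }{\gamma \, {\mathbb {E}}\left[ (r^e_{t+1})^2v_{t+1}(\mathbf {Z}_{t+1})\big |\mathbf {Z}_t\right] }, \end{aligned}$$

so the allocation is still separated from the conditional expectations, which is the property a regression-based algorithm needs.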

4 Grid-Searching Methods

An alternative to solving first-order conditions is grid-searching, which is an intuitive idea for solving the optimization problem described in Eq. (5). In grid-searching, we reduce the optimization problem on the continuous domain to a problem on a discrete domain. For example, if we consider the allocation, \(x_t\), of one risky asset, the original optimization problem is solved on a domain [0, 1]. By grid-searching, we construct \(M+1\) equidistant grid points \(\{\frac{m}{M}\}^M_{m=0}\) and consider the optimization problem on the discrete domain \(D_M = \{\frac{m}{M}|\; m=0,1,\ldots ,M\}\). To solve this discrete optimization problem, we test each possible choice of the allocation \(x^{(m)}_t = \frac{m}{M},m=0,\ldots ,M\) and calculate the corresponding value functions:

$$\begin{aligned} v^{(m)}_t(\mathbf {Z}_t) = {\mathbb {E}}\left[ (x^{(m)}_t R^e_{t+1}+R^f)^{1-\gamma } v_{t+1}(\mathbf {Z}_{t+1})\Bigg |\mathbf {Z}_t\right] . \end{aligned}$$
(12)

We determine the maximum, \(v^{\mathrm {max}}_t(\mathbf {Z}_t)\), from \(\{v^{(m)}_t(\mathbf {Z}_t)\}^M_{m=0}\) and refer to the corresponding asset allocation as “the optimal asset allocation”.
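As an illustration, the grid search can be sketched in Python for a single period with \(v_{t+1} \equiv 1/(1-\gamma)\). This is a minimal sketch rather than the implementation used in the paper: the function name `grid_search_allocation`, the use of Gauss-Hermite quadrature for the conditional expectation, and the assumption of a normally distributed log excess return are all illustrative choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def grid_search_allocation(mu, sigma, rf, gamma, M=200, n_nodes=64):
    """Maximize E[(x*Re + Rf)^(1-gamma)] / (1-gamma) over x in {m/M : m = 0..M}.

    Assumption: the log excess return r is N(mu, sigma^2) and Re = Rf*(exp(r) - 1).
    """
    # Gauss-Hermite rule (weight exp(-z^2/2)) for expectations of a standard normal
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)
    excess = rf * (np.exp(mu + sigma * nodes) - 1.0)
    grid = np.arange(M + 1) / M                      # the M + 1 candidate allocations
    wealth = grid[:, None] * excess[None, :] + rf    # shape (M + 1, n_nodes)
    # Value of each candidate, cf. Eq. (12) with v_{t+1} = 1/(1 - gamma)
    values = (wealth ** (1.0 - gamma)) @ weights / (1.0 - gamma)
    best = int(np.argmax(values))
    return grid[best], values[best], values
```

Calling `grid_search_allocation(mu, sigma, rf, gamma)` returns the best grid point, its value, and the values of all candidates; the resolution of the answer is limited to \(1/M\).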

Although it is mentioned in van Binsbergen and Brandt (2007a, b) that the grid-searching method is robust and avoids a number of numerical issues regarding convergence that occur when solving first-order conditions, it should be noted that grid-searching is an expensive numerical approach. The workload of grid-searching grows exponentially as the dimensionality of the problem increases. Moreover, according to our numerical tests in the low-dimensional cases, the vBB algorithm, which employs grid-searching together with simulation, yields more “uncertain” results (larger variance) compared to the other simulation-based algorithms.

However, if we wish to find an accurate reference solution to the dynamic portfolio management problem, grid-searching seems to be our only choice, since solving first-order conditions essentially relies on Taylor approximations of the utility function, whereas grid-searching does not. In the next subsection we present our benchmark approach, which is based on the idea of grid-searching.

4.1 COS Portfolio Management Method

In this section, we present a benchmark method that uses the Fourier cosine series expansion (COS) method to calculate the conditional expectations. This method was introduced in Fang and Oosterlee (2008) for pricing one-dimensional European options and later in Fang and Oosterlee (2009) for pricing one-dimensional Bermudan and barrier options. In Ruijter and Oosterlee (2012), this method was extended to the two-dimensional case. Because the COS method is not based on simulation, it can yield benchmark solutions to the investor’s problem, especially in the basic case with one risky asset and one risk-free asset. Following the previous discussions, this basic investor’s problem with power utility function is given by:

$$\begin{aligned} v_{t-1}(Z_{t-1}) = \max _{x_{t-1}} {\mathbb {E}}\left[ ({x_{t-1}} R^e_{t}+R^f)^{1-\gamma }v_t(Z_t)\Bigg |Z_{t-1}\right] \quad t = 1,\ldots ,T-1, \end{aligned}$$
(13)

where the terminal condition reads:

$$\begin{aligned} v_T(Z_T) = \frac{1}{1-\gamma }, \quad \gamma \not = 1. \end{aligned}$$

If we denote the conditional transition density function from state \(Z_{t-1}\) to \((R^e_t,Z_t)\) as \(f(R^e_t,Z_t|Z_{t-1})\), the investor’s problem reads:

$$\begin{aligned} v_{t-1}(Z_{t-1}) = \max _{x_{t-1}} \iint _{{\mathbb {R}}^2}({x_{t-1}} R^e_{t}+R^f)^{1-\gamma }v_t(Z_t) f(R^e_t,Z_t|Z_{t-1}) \mathrm {d}R^e_t \mathrm {d}Z_t, \quad t = 1,\ldots ,T-1. \end{aligned}$$
(14)

The COS algorithm for calculating conditional expectations can be described in five steps:

Step I: Truncate the integration range in Eq. (14).

If we assume that the integrand is integrable, we can truncate the integration range from \({\mathbb {R}}^2\) to \([a_R,b_R]\times [a_Z,b_Z]\) without losing significant accuracy. The approximated value function \(\hat{v}_{t-1}(Z_{t-1})\) reads:

$$\begin{aligned} \hat{v}_{t-1}(Z_{t-1})= & {} \max _{x_{t-1}} \int ^{b_Z}_{a_Z} \int ^{b_R}_{a_R}({x_{t-1}} R^e_{t}+R^f)^{1-\gamma }v_t(Z_t) f(R^e_t,Z_t|Z_{t-1}) \mathrm {d}R^e_t \mathrm {d}Z_t. \end{aligned}$$

Remark 4.1

For one variable, for example \(Z_t\), the suggested integration range \([a_Z,b_Z]\) in Fang and Oosterlee (2009) and Ruijter and Oosterlee (2012) is \([\xi ^Z_1 - L\xi ^Z_2, \xi ^Z_1 + L\xi ^Z_2]\), where \(\xi ^Z_1\) is the mean of \(Z_t\) and \(\xi ^Z_2\) the standard deviation of \(Z_t\). L should be large enough to make the truncation error acceptably low.
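The rule in Remark 4.1 is simple enough to state in a few lines of Python. The sketch below uses a hypothetical helper `truncation_range` and hypothetical inputs; the tail-mass check assumes a normal distribution and only indicates why \(L = 8\) is a comfortable choice in that case.

```python
import math

def truncation_range(mean, std, L=8.0):
    """Return [mean - L*std, mean + L*std], the range suggested in Remark 4.1."""
    return mean - L * std, mean + L * std

# Hypothetical inputs: mean 0.0 and standard deviation 0.07
a, b = truncation_range(0.0, 0.07, L=8.0)

# Probability mass of a normal variable outside +/- 8 standard deviations
tail_mass = math.erfc(8.0 / math.sqrt(2.0))
```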

Step II: Expand the integrand in Fourier cosines.

If we denote the Fourier cosine coefficients of \(f(R^e_t,Z_t|Z_{t-1})\) on \([a_R,b_R]\times [a_Z,b_Z]\) by:

$$\begin{aligned} A_{k_1,k_2}(Z_{t-1}):= & {} \frac{2}{b_Z-a_Z} \frac{2}{b_R-a_R} \int ^{b_Z}_{a_Z} \int ^{b_R}_{a_R} f(R^e_t,Z_t\big |Z_{t-1}) \\&\times \cos \left( k_1 \pi \frac{R^e_t-a_R}{b_R-a_R}\right) \cos \left( k_2 \pi \frac{Z_t-a_Z}{b_Z-a_Z}\right) \mathrm {d}R^e_t \mathrm {d}Z_t, \end{aligned}$$

and similarly define the utility coefficients as:

$$\begin{aligned} V_{k_1,k_2}(t,x_{t-1}):= & {} \frac{2}{b_Z-a_Z} \frac{2}{b_R-a_R} \int ^{b_Z}_{a_Z} \int ^{b_R}_{a_R} ({x_{t-1}} R^e_{t}+R^f)^{1-\gamma }v_t(Z_t) \\&\times \cos \left( k_1 \pi \frac{R^e_t-a_R}{b_R-a_R}\right) \cos \left( k_2 \pi \frac{Z_t-a_Z}{b_Z-a_Z}\right) \mathrm {d}R^e_t \mathrm {d}Z_t, \end{aligned}$$

then the value function \({v}_{t-1}(Z_{t-1})\) can be approximated by:

$$\begin{aligned} \hat{v}_{t-1}(Z_{t-1}) = \max _{x_{t-1}} \left\{ \frac{b_Z-a_Z}{2} \frac{b_R-a_R}{2} \sum ^{\infty }_{k_1 = 0}{}'\sum ^{\infty }_{k_2 = 0}{}' A_{k_1,k_2}(Z_{t-1}) V_{k_1,k_2}(t,x_{t-1}) \right\} . \end{aligned}$$

The primed sum \(\sum {}'\) means that the first term of the summation has half weight.

Step III: Truncate the infinite series.

We truncate the infinite series, as follows:

$$\begin{aligned} \bar{v}_{t-1}(Z_{t-1}) = \max _{x_{t-1}} \left\{ \frac{b_Z-a_Z}{2} \frac{b_R-a_R}{2} \sum ^{N_1-1}_{k_1 = 0}{}'\sum ^{N_2-1}_{k_2 = 0}{}' A_{k_1,k_2}(Z_{t-1}) V_{k_1,k_2}(t,x_{t-1}) \right\} . \end{aligned}$$
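Steps I to III can be checked numerically in a one-dimensional analogue, where the expectation of a function \(g(X)\) is approximated by \(\frac{b-a}{2}\sum{}'_{k<N} A_k V_k\). The sketch below is an assumption-laden illustration: both coefficient sets are computed by a midpoint rule directly from their definitions (no characteristic function is used), and the names `cosine_coefficients` and `cos_expectation` are hypothetical.

```python
import numpy as np

def cosine_coefficients(fun, a, b, N, Q=2000):
    """Midpoint-rule value of (2/(b-a)) * int_a^b fun(y) cos(k*pi*(y-a)/(b-a)) dy, k < N."""
    y = a + (np.arange(Q) + 0.5) * (b - a) / Q
    k = np.arange(N)
    basis = np.cos(np.pi * np.outer(k, (y - a) / (b - a)))   # shape (N, Q)
    return (2.0 / Q) * basis @ fun(y)

def cos_expectation(density, payoff, a, b, N):
    """Truncated primed sum (b-a)/2 * sum' A_k V_k approximating int density*payoff."""
    A = cosine_coefficients(density, a, b, N)
    V = cosine_coefficients(payoff, a, b, N)
    weights = np.ones(N)
    weights[0] = 0.5            # primed sum: the first term has half weight
    return 0.5 * (b - a) * np.sum(weights * A * V)
```

For a normal density with mean \(\mu\) and standard deviation \(\sigma\) and payoff \(e^{y}\), the result can be compared with the exact value \(e^{\mu + \sigma^2/2}\).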

Step IV: Calculate the coefficients \(A_{k_1,k_2}(Z_{t-1})\).

The coefficients \(A_{k_1,k_2}(Z_{t-1})\) can be approximated by \(F_{k_1,k_2}(Z_{t-1})\), as follows:

$$\begin{aligned} F_{k_1,k_2}(Z_{t-1}):= & {} \frac{2}{b_Z-a_Z} \frac{2}{b_R-a_R} \iint _{{\mathbb {R}}^2} f(R^e_t,Z_t\big |Z_{t-1}) \\&\times \cos \left( k_1 \pi \frac{R^e_t-a_R}{b_R-a_R}\right) \cos \left( k_2 \pi \frac{Z_t-a_Z}{b_Z-a_Z}\right) \mathrm {d}R^e_t \mathrm {d}Z_t. \end{aligned}$$

Using the following property of cosines: \(2\cos (\alpha ) \cos (\beta ) = \cos (\alpha +\beta ) + \cos (\alpha - \beta )\), we can calculate \(F_{k_1,k_2}(Z_{t-1})\) by:

$$\begin{aligned} F_{k_1,k_2}(Z_{t-1}) = \frac{F^+_{k_1,k_2}(Z_{t-1}) + F^-_{k_1,k_2}(Z_{t-1})}{2}, \end{aligned}$$

where

$$\begin{aligned}&F^{\pm }_{k_1,k_2}(Z_{t-1}) \\ =&\frac{2}{b_Z-a_Z} \frac{2}{b_R-a_R} \iint _{{\mathbb {R}}^2} f(R^e_t,Z_t|Z_{t-1}) \cos \left( k_1 \pi \frac{R^e_t-a_R}{b_R-a_R} \pm k_2 \pi \frac{Z_t-a_Z}{b_Z-a_Z}\right) \mathrm {d}R^e_t \mathrm {d}Z_t\\ =&\frac{2}{b_Z-a_Z} \frac{2}{b_R-a_R} \mathfrak {R}\Bigg (\iint _{{\mathbb {R}}^2} f(R^e_t,Z_t|Z_{t-1}) \exp \left( i k_1 \pi \frac{R^e_t}{b_R-a_R} \pm i k_2 \pi \frac{Z_t}{b_Z-a_Z}\right) \mathrm {d}R^e_t \mathrm {d}Z_t \\&\times \exp \left( -i k_1 \pi \frac{a_R}{b_R-a_R} \mp i k_2 \pi \frac{a_Z}{b_Z-a_Z}\right) \Bigg )\\ =&\frac{2}{b_Z-a_Z} \frac{2}{b_R-a_R} \mathfrak {R}\Bigg (\psi \left( \frac{k_1 \pi }{b_R-a_R},\pm \frac{k_2 \pi }{b_Z - a_Z}\Bigg |Z_{t-1}\right) \\&\times \exp \left( -i k_1 \pi \frac{a_R}{b_R-a_R} \mp i k_2 \pi \frac{a_Z}{b_Z-a_Z}\right) \Bigg ). \end{aligned}$$

Here, \(\mathfrak {R}(\cdot )\) denotes the real part of its argument, and \(\psi (u_R,u_Z|Z_{t-1})\) is the bivariate conditional characteristic function of \((R^e_t,Z_t)\) given state \(Z_{t-1}\):

$$\begin{aligned} \psi (u_R,u_Z|Z_{t-1}) = \iint _{{\mathbb {R}}^2} \exp \left( i [u_R,u_Z] \cdot [R^e_t,Z_t]'\right) f(R^e_t,Z_t|Z_{t-1})\mathrm {d}R^e_t \mathrm {d}Z_t. \end{aligned}$$

For many asset dynamics models this bivariate characteristic function is known in closed form.
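The computation in Step IV can be sketched as follows. The bivariate normal characteristic function `gaussian_cf` is only an illustrative stand-in for a model with a closed-form characteristic function, and the function names and argument layout are assumptions made for this example.

```python
import numpy as np

def gaussian_cf(u1, u2, mean, cov):
    """Characteristic function of a bivariate normal vector (illustrative model)."""
    lin = u1 * mean[0] + u2 * mean[1]
    quad = cov[0][0] * u1 ** 2 + 2.0 * cov[0][1] * u1 * u2 + cov[1][1] * u2 ** 2
    return np.exp(1j * lin - 0.5 * quad)

def cos_density_coefficients(cf, a1, b1, a2, b2, N1, N2):
    """F_{k1,k2} = (F^+ + F^-)/2 on [a1,b1] x [a2,b2], following Step IV."""
    u1 = (np.arange(N1) * np.pi / (b1 - a1))[:, None]   # k1*pi/(b_R - a_R)
    u2 = (np.arange(N2) * np.pi / (b2 - a2))[None, :]   # k2*pi/(b_Z - a_Z)
    scale = 2.0 / (b1 - a1) * 2.0 / (b2 - a2)
    f_plus = np.real(cf(u1, u2) * np.exp(-1j * (u1 * a1 + u2 * a2)))
    f_minus = np.real(cf(u1, -u2) * np.exp(-1j * (u1 * a1 - u2 * a2)))
    return scale * 0.5 * (f_plus + f_minus)
```

A simple sanity check is that, for independent components, the resulting matrix factorizes into the outer product of the two one-dimensional coefficient vectors.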

Step V: Calculate the coefficients \(V_{k_1,k_2}(t,x_{t-1})\).

The coefficients \(V_{k_1,k_2}(t,x_{t-1})\) are not directly related to any closed-form expression. However, we can apply numerical integration and the discrete cosine transform (DCT) to approximate \(V_{k_1,k_2}(t,x_{t-1})\). To do this, we take \(Q \ge \max [N_1,N_2]\) grid points in each spatial dimension and define:

$$\begin{aligned} R^{n_1}_t:= & {} a_R + (n_1 + \frac{1}{2})\varDelta R_t \quad n_1 = 0,\ldots ,Q-1\\ Z^{n_2}_t:= & {} a_Z + (n_2 + \frac{1}{2})\varDelta Z_t \quad n_2 = 0,\ldots ,Q-1\\ \varDelta R_t:= & {} \frac{b_R - a_R}{Q}, \quad \varDelta Z_t := \frac{b_Z - a_Z}{Q}. \end{aligned}$$

The midpoint-rule integration gives us

$$\begin{aligned} V_{k_1,k_2}(t,x_{t-1})\approx & {} \sum ^{Q-1}_{n_1 = 0} \sum ^{Q-1}_{n_2 = 0} \frac{2}{b_R-a_R} \frac{2}{b_Z-a_Z} \left( {x_{t-1}} R^{n_1}_{t}+R^f\right) ^{1-\gamma }v_t(Z^{n_2}_t)\\&\times \, \cos \left( k_1 \pi \frac{R^{n_1}_t-a_R}{b_R - a_R}\right) \cos \left( k_2 \pi \frac{Z^{n_2}_t-a_Z}{b_Z- a_Z}\right) \varDelta R_t \varDelta Z_t\\= & {} \sum ^{Q-1}_{n_1 = 0} \sum ^{Q-1}_{n_2 = 0} \frac{2}{Q} \frac{2}{Q} \left( {x_{t-1}} R^{n_1}_{t}+R^f\right) ^{1-\gamma }v_t(Z^{n_2}_t) \cos \left( k_1 \pi \frac{R^{n_1}_t-a_R}{b_R - a_R}\right) \\&\times \,\cos \left( k_2 \pi \frac{Z^{n_2}_t-a_Z}{b_Z- a_Z}\right) . \end{aligned}$$

The equation above can be calculated efficiently via a two-dimensional DCT, for example, with the function dct2 of MATLAB. Moreover, we can rewrite the sum of products as a product of sums, that is:

$$\begin{aligned} V_{k_1,k_2}(t,x_{t-1})\approx & {} \frac{\displaystyle 2}{\displaystyle Q} \frac{\displaystyle 2}{\displaystyle Q} \left( \displaystyle \sum ^{Q-1}_{n_1 = 0} \left( {x_{t-1}} R^{n_1}_{t}+R^f\right) ^{1-\gamma } \cos \left( k_1 \pi \frac{R^{n_1}_t-a_R}{b_R - a_R}\right) \right) \\&\qquad \qquad \times \left( \displaystyle \sum ^{Q-1}_{n_2 = 0} v_t\left( Z^{n_2}_t\right) \cos \left( k_2 \pi \frac{Z^{n_2}_t-a_Z}{b_Z- a_Z}\right) \right) . \end{aligned}$$

Then, the two-dimensional DCT can be replaced by two separate one-dimensional DCTs, which helps to reduce the computational time.
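The factorized form can be sketched in Python with SciPy's `scipy.fft.dct` in place of MATLAB's dct2. The helper name `utility_coefficients` and the callable `v_next` (standing for \(v_t\) evaluated on the grid) are assumptions of this sketch; it also assumes that \(x_{t-1} R + R^f > 0\) on the whole truncated range.

```python
import numpy as np
from scipy.fft import dct

def utility_coefficients(x, v_next, a_R, b_R, a_Z, b_Z, rf, gamma, N1, N2, Q):
    """Approximate V_{k1,k2}(t, x) as the outer product of two one-dimensional DCTs."""
    n = np.arange(Q)
    R = a_R + (n + 0.5) * (b_R - a_R) / Q        # midpoints R^{n_1}, n_1 = 0..Q-1
    Z = a_Z + (n + 0.5) * (b_Z - a_Z) / Q        # midpoints Z^{n_2}, n_2 = 0..Q-1
    g_R = (x * R + rf) ** (1.0 - gamma)
    g_Z = v_next(Z)
    # SciPy's unnormalized DCT-II is 2*sum_n g_n cos(pi*k*(2n+1)/(2Q)),
    # so dividing by Q gives (2/Q)*sum_n g_n cos(k*pi*(n + 1/2)/Q).
    c_R = dct(g_R, type=2)[:N1] / Q
    c_Z = dct(g_Z, type=2)[:N2] / Q
    return np.outer(c_R, c_Z)
```

Because only the first factor depends on the allocation, the transform of \(v_t\) can be computed once and reused for every candidate \(x_{t-1}\) on the allocation grid.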

For state \(Z_{t-1}\) and asset allocation \(x_{t-1}\), we can calculate the conditional expectation shown in Eq. (13) by the COS method. To solve the optimization problem with respect to \(x_{t-1}\), we employ grid-searching: we evaluate discretized values of \(x_{t-1} \in \{\frac{m}{M}|m=0,\ldots ,M\}\) and find the largest conditional expectation. The backward recursion process can be performed from time \(T-1\) to the initial time.

Within the COS method, we have five parameters to adjust the truncation and discretization errors. These are \(N_1\), \(N_2\), L, Q and M. Generally, larger values of these parameters lead to more accurate approximations but also to higher computational load. We use the following default parameter setting:

$$\begin{aligned} N_1 = 50,\quad N_2 = 100,\quad L = 8,\quad Q = 100,\quad M = 200. \end{aligned}$$
(15)

According to our experiments, the COS method provides highly accurate results under this setting. However, when the admissible asset allocation can be chosen from a very wide range of values, the COS approach, which is based on discrete grid search, may lose its accuracy. In that case, the SGBM method equipped with the log Taylor expansion and a large number of paths will still generate satisfactory solutions and appears favorable.

Remark 4.2

The COS method suffers from the curse of dimensionality. However, this is a problem for any method involving discretization of the state space and grid-searching. To mitigate this issue in high-dimensional cases, adaptive discretization or sparse grids can be combined with grid-searching.

Remark 4.3

The computational load of the COS method for a dynamic portfolio management problem is mainly related to the DCT computations, for which the computational complexity at each time step is \(O(N_2 \cdot M \cdot Q \cdot \log (Q))\). Computations at each time step are performed sequentially, but the computations for the value function at each state point are independent, so it should be possible to accelerate the COS method by parallel processing.

5 Numerical Experiments

In this section, we test the performance of five methods for generating the optimal dynamic portfolio management strategy. These are:

  • “BGSS”: the method introduced in Brandt et al. (2005);

  • “vBB”: the method introduced in van Binsbergen and Brandt (2007b);

  • “SGBM”: SGBM with the original Taylor expansion;

  • “SGBM-LT”: SGBM with the log Taylor expansion;

  • “COS”: the COS method.

We impose borrowing and short-sale constraints on the asset allocations, which are therefore restricted to lie between 0 and 1. When we implement the simulation-based algorithms, we always generate \(2^{14}\) paths. For “SGBM” and “SGBM-LT”, which require bundling, we employ 32 bundles at each time step. We approximate the utility function by Taylor expansions, up to 4th-order for both the log Taylor expansion and the original Taylor expansion. For “BGSS” and “vBB”, we use polynomials of the state variable up to second-order as the basis functions for the cross-path regression. For “SGBM” and “SGBM-LT”, the polynomials are also second-order, but in both the state variable and the return variable.

To measure the performance of a dynamic portfolio management strategy, we consider the “annualized certainty equivalent rate” (CER). It is the annualized return rate of a risk-free asset which, at the terminal time Y (in years), yields the same utility of wealth as the dynamic portfolio management strategy. Equivalently, the CER is the risk-free rate that an investor is willing to accept rather than adopting a particular risky portfolio management strategy. Formally the CER is defined by:

$$\begin{aligned} U\left( W_0 \cdot (1+\mathrm {CER})^Y\right) = V_0(W_0,\mathbf {Z}_0), \end{aligned}$$
(16)

where the value function \(V_0(W_0,\mathbf {Z}_0)\) is defined by Eq. (1). Generally, a portfolio management strategy with high CER is close to the optimal strategy and can thus be regarded as an accurate solution to the dynamic portfolio management problem.
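Equation (16) can be inverted explicitly once the utility function is fixed. The sketch below assumes the power utility \(U(W) = W^{1-\gamma}/(1-\gamma)\) with \(\gamma \ne 1\), consistent with the terminal condition used in Sect. 4.1; the function name and argument names are hypothetical.

```python
def certainty_equivalent_rate(V0, W0, gamma, years):
    """Solve U(W0*(1+CER)^years) = V0 for CER, assuming U(W) = W^(1-gamma)/(1-gamma)."""
    # Invert the power utility: the wealth level whose utility equals V0
    terminal_wealth = ((1.0 - gamma) * V0) ** (1.0 / (1.0 - gamma))
    return (terminal_wealth / W0) ** (1.0 / years) - 1.0
```

As a consistency check, an investor who holds only a risk-free asset earning 6 % per year for five years has \(V_0 = U(W_0 \cdot 1.06^5)\), for which the function returns a CER of 0.06 up to rounding.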

We perform numerical tests here for a basic test case where the portfolio contains one risky asset and one risk-free asset. We consider the vector auto-regression (VAR) model to describe the dynamics of the log excess return \(r^e_t\) of the risky asset and its log dividend yield \(d_t\), which are chosen as the state variables. Quarterly data are generated with the following process, as in Brandt et al. (2005), van Binsbergen and Brandt (2007b) and Garlappi and Skoulakis (2009):

$$\begin{aligned} \begin{bmatrix} r^e_{t+1}\\ d_{t+1} \end{bmatrix} = \begin{bmatrix} 0.227\\ -0.155 \end{bmatrix} + \begin{bmatrix} 0.060\\ 0.958 \end{bmatrix} d_t + \begin{bmatrix} \epsilon ^r_{t+1}\\ \epsilon ^d_{t+1} \end{bmatrix}, \end{aligned}$$

where

$$\begin{aligned} \begin{bmatrix} \epsilon ^r_{t+1}\\ \epsilon ^d_{t+1} \end{bmatrix} \sim N(\mathbf {\mu }_{\epsilon },\varSigma _\epsilon ),\quad \mathbf {\mu }_{\epsilon } = \begin{bmatrix} 0\\0 \end{bmatrix} \quad \text { and }\quad \varSigma _\epsilon = \begin{bmatrix} 0.0060&-0.0051\\ -0.0051&0.0049 \end{bmatrix}. \end{aligned}$$

In most of the tests, the initial state, \(d_0\), is chosen as the unconditional mean, i.e., \(d_0 = -0.155/(1-0.958) = -3.6905\). Only in Sect. 5.4 will we consider three quantiles, the 25, 50 and 75 % quantiles, of the unconditional distribution of the state variable as the initial state. The gross return of the risk-free asset is chosen as \(R^f = 1.06^{0.25}\) and the excess return \(R^e_t\) of the risky asset is \(R^e_t = R^f (\exp (r^e_t)-1)\).
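A path simulation of this VAR model might look as follows. This is a minimal sketch with a hypothetical function name `simulate_var`; the random number generator and seeding are implementation choices not specified in the text.

```python
import numpy as np

def simulate_var(d0, T, n_paths, seed=0):
    """Simulate quarterly paths of (r^e_t, d_t) and the excess return R^e_t."""
    rng = np.random.default_rng(seed)
    cov = np.array([[0.0060, -0.0051], [-0.0051, 0.0049]])
    eps = rng.multivariate_normal(np.zeros(2), cov, size=(n_paths, T))
    r = np.empty((n_paths, T))
    d = np.empty((n_paths, T + 1))
    d[:, 0] = d0
    for t in range(T):
        r[:, t] = 0.227 + 0.060 * d[:, t] + eps[:, t, 0]
        d[:, t + 1] = -0.155 + 0.958 * d[:, t] + eps[:, t, 1]
    rf = 1.06 ** 0.25
    excess = rf * (np.exp(r) - 1.0)      # R^e_t = R^f (exp(r^e_t) - 1)
    return r, d, excess

# Initial state at the unconditional mean of d
d0 = -0.155 / (1.0 - 0.958)
r, d, excess = simulate_var(d0, T=8, n_paths=2 ** 14)
```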

For the 1D-VAR model, the characteristic function, which is essential for the COS portfolio management method, can be formulated as:

$$\begin{aligned} \psi (u_r,u_Z|Z_{t-1})= & {} \exp \left( i u_r (0.227 + 0.060\cdot Z_{t-1}) + i u_Z (-0.155+0.958\cdot Z_{t-1})\right) \\&\times \exp \left( i \mathbf {\mu _\epsilon }'[u_r,u_Z]' - \frac{1}{2}[u_r,u_Z]\varSigma _\epsilon [u_r,u_Z]'\right) . \end{aligned}$$

In the COS method, it is evaluated at \(u_r = \frac{k_1 \pi }{b_r-a_r}\) and \(u_Z = \pm \frac{k_2 \pi }{b_Z-a_Z}\).
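Written for generic arguments \((u_r,u_Z)\), the conditional characteristic function of the VAR model is a short function. The sketch below hard-codes the coefficients quoted above and uses \(\mathbf{\mu}_\epsilon = 0\); the name `var_cf` is hypothetical.

```python
import numpy as np

def var_cf(u_r, u_Z, z_prev):
    """Conditional characteristic function of (r^e_t, Z_t) given Z_{t-1} = z_prev."""
    m_r = 0.227 + 0.060 * z_prev          # conditional mean of the log excess return
    m_Z = -0.155 + 0.958 * z_prev         # conditional mean of the state variable
    # Quadratic form [u_r, u_Z] Sigma_eps [u_r, u_Z]' with the covariance matrix above
    quad = 0.0060 * u_r ** 2 + 2.0 * (-0.0051) * u_r * u_Z + 0.0049 * u_Z ** 2
    return np.exp(1j * (u_r * m_r + u_Z * m_Z) - 0.5 * quad)
```

The function accepts NumPy arrays, so it can be evaluated on the whole grid of \((k_1,k_2)\) frequencies at once.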

5.1 Quality of the COS Portfolio Management Method

We first check the validity and quality of the COS portfolio management method. For the dynamic portfolio management problem with the 1D-VAR model, we calculate the optimal asset allocations and the corresponding annualized certainty equivalent rates and compare them with the reference values from Garlappi and Skoulakis (2009). As we can see in Table 1, in case of different investment horizons and risk aversions, the COS method always provides accurate approximations of the annualized certainty equivalent rates and also highly satisfactory approximations of the optimal initial asset allocations.

Table 1 Initial optimal asset allocations and the corresponding annualized certainty equivalent rates of the COS portfolio management method, based on the 1D-VAR model, with reference values from Garlappi and Skoulakis (2009)

As the COS method with the parameter settings in (15) and the reference method involve some approximation errors, it is difficult to say whose optimal initial asset allocation is superior. However, since it is known that first-order deviations in the portfolio policy have only second-order welfare effects (Cochrane 1989) and the COS method and the reference method yield similar annualized certainty equivalent rates, we consider these as the optimal solutions when comparing with simulation-based methods.

Remark 5.1

We have also tested the performance of the COS portfolio management method with different initial states \(d_0\). For any initial state tested, it generates results very similar to the reference values in Garlappi and Skoulakis (2009).

5.2 Portfolio Management with the Buy-and-Hold Strategy

In this section, instead of the dynamic portfolio management problem, in which an investor decides his optimal asset allocations at intermediate times \(t=0,1,\ldots ,T-1\), we consider a case where the investor decides his optimal asset allocation at time \(t=0\) and holds a fixed amount of assets until terminal time \(t=T\). The corresponding value function reads

$$\begin{aligned} v_0(Z_0) = \max _{x_0} {\mathbb {E}} \left[ \frac{1}{1-\gamma } \left( {x_0} R^e_{0\rightarrow T}+R^f_{0\rightarrow T}\right) ^{1-\gamma }\Bigg |Z_0\right] , \end{aligned}$$

where \(R^f_{0\rightarrow T} = (R^f)^T, R^e_{0\rightarrow T} = R^f_{0\rightarrow T} \cdot e^{\sum ^T_{t=1} r^e_t} - R^f_{0 \rightarrow T}\).
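The buy-and-hold problem can be illustrated with plain Monte Carlo and a grid search over \(x_0\). This sketch is not one of the regression-based methods compared in this paper: it estimates the expectation by a sample average over simulated VAR paths, and the function name and defaults are hypothetical.

```python
import numpy as np

def buy_and_hold_allocation(d0, T, gamma, M=200, n_paths=2 ** 14, seed=0):
    """Grid search for x_0 with the expectation estimated by a sample average."""
    rng = np.random.default_rng(seed)
    cov = np.array([[0.0060, -0.0051], [-0.0051, 0.0049]])
    eps = rng.multivariate_normal(np.zeros(2), cov, size=(n_paths, T))
    d = np.full(n_paths, float(d0))
    cum_r = np.zeros(n_paths)                  # running sum of r^e_t, t = 1..T
    for t in range(T):
        cum_r += 0.227 + 0.060 * d + eps[:, t, 0]
        d = -0.155 + 0.958 * d + eps[:, t, 1]
    rf_T = (1.06 ** 0.25) ** T                 # R^f_{0->T}
    excess_T = rf_T * np.exp(cum_r) - rf_T     # R^e_{0->T}
    grid = np.arange(M + 1) / M
    wealth = grid[:, None] * excess_T[None, :] + rf_T
    values = np.mean(wealth ** (1.0 - gamma), axis=1) / (1.0 - gamma)
    best = int(np.argmax(values))
    return grid[best], values[best]
```

Since \(x_0 = 0\) is on the grid, the returned value is never below the utility of holding only the risk-free asset.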

This type of problem can be viewed as a static portfolio management problem, for which the aforementioned four simulation-based methods (“SGBM-LT”, “SGBM”, “BGSS” and “vBB”) can be applied. The COS method is utilized to generate benchmarks for the optimal asset allocations and the corresponding annualized certainty equivalent rates.

Figure 1 shows that “vBB” provides identical results to the optimal ones, since it does not involve Taylor expansion errors. For the other three candidates, in which Taylor expansions are involved, “SGBM-LT” provides the best approximation of the initial asset allocations. When the investment horizon is long, although the asset allocations of “SGBM-LT” are not close to the optimal solutions, their corresponding certainty equivalent rates are similar to the optimal ones. For the other two methods, “SGBM” and “BGSS”, the estimates of asset allocations and certainty equivalent rates are acceptable only when the investment horizon is shorter than 10 quarters.

This test indicates that the log Taylor expansion (\(2^{14}\) paths, 32 bundles) outperforms the original Taylor expansion for approximating the utility functions. The advantage of using the log Taylor expansion is obvious when the distribution of the accumulated excess return, \(R^e_{0\rightarrow T}\), exhibits a fat tail.

Fig. 1 For the simulation-based methods, we report the point estimate of the initial asset allocations from 100 runs. The optimal values are generated with the COS method. a Optimal allocation, \(\gamma = 5\); b certainty equivalent rate, \(\gamma = 5\); c optimal allocation, \(\gamma = 15\); d certainty equivalent rate, \(\gamma = 15\)

5.3 Dynamic Portfolio Management with Different Investment Horizons and Risk Aversion Parameters

Following the discussion in van Binsbergen and Brandt (2007b), we consider for the dynamic optimization problem the portfolio weight iteration in the transfer step and compare the four simulation-based methods.

In Table 2, we observe that “SGBM-LT”, among the four methods, always provides the highest certainty equivalent rates, which implies that the portfolio management strategy generated by “SGBM-LT” is most similar to the optimal one. However, when the investment horizon is long and risk aversion is high, even the results of “SGBM-LT” are not highly satisfactory. In that case, we prefer to solve the dynamic portfolio management problem by the COS portfolio management method. Regarding the simulation-based methods, “SGBM” and “SGBM-LT” are superior to “BGSS” and “vBB”, since their corresponding CERs have larger means and smaller standard errors.

Table 2 Mean and standard deviations of the CER from 100 runs, comparing 4 simulation-based methods for dynamic portfolio management for different investment horizons and risk aversion parameters
Table 3 Mean and standard deviations of the CER from 100 runs, comparing “SGBM” and “SGBM-LT” with the portfolio weight and the value function iteration; different investment horizons and risk aversion parameters

In contrast to the finding in van Binsbergen and Brandt (2007b) that value function iteration results in low certainty equivalent rates, Table 3 shows that when using “SGBM” or “SGBM-LT”, we can also obtain satisfactory results by the value function iteration in most test cases. Nevertheless, portfolio weight iteration is significantly better than value function iteration when the risk aversion is large and the investment horizon is long.

5.4 Influence of Varying Initial State

We consider three different initial values, \(d_0\), of the state variable. Each value corresponds to the p-th quantile of the unconditional distribution of d, where p takes values 25, 50 and 75.
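These initial states can be computed directly if one assumes that \(d_t\) follows a stationary Gaussian AR(1) process, so that its unconditional distribution is normal with mean \(-0.155/(1-0.958)\) and variance \(0.0049/(1-0.958^2)\). The snippet below is a sketch under that assumption, using only the Python standard library.

```python
from statistics import NormalDist

# Unconditional (stationary) moments of d_t = -0.155 + 0.958 d_{t-1} + eps^d_t
mean_d = -0.155 / (1.0 - 0.958)
std_d = (0.0049 / (1.0 - 0.958 ** 2)) ** 0.5

stationary = NormalDist(mean_d, std_d)
d0_values = [stationary.inv_cdf(p) for p in (0.25, 0.50, 0.75)]
```

Under this assumption the three quantiles agree, to about four decimals, with the values \(-3.8551\), \(-3.6905\) and \(-3.5258\) quoted in the caption of Fig. 2.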

Figure 2 shows that, for any initial state, “SGBM-LT” performs better than the other three simulation-based algorithms. The intermediate asset allocations generated by “SGBM-LT” are most similar to the optimal ones. At the initial recursion steps, “vBB” also generates similar asset allocations. However, as the backward recursion progresses, the uncertainty in the “vBB” estimates grows and hence the accuracy of “vBB” gets worse.

In any case, “SGBM” and “SGBM-LT” yield estimates with low uncertainties. Moreover, we see that “SGBM-VFI” and “SGBM-LT-VFI”, in which the value function iteration is considered in the recursion step, respectively, generate very similar results to those of “SGBM” and “SGBM-LT”. These are advantages of the new method to calculate conditional expectations.

Fig. 2 Comparison of simulation-based algorithms for estimating the optimal intermediate asset allocations for different initial states. At each time step, the average asset allocation is computed. For the simulation-based algorithms, the mean and the standard deviation of the average asset allocations are generated from 100 runs. The optimal values are generated by the COS method. a Mean of the average asset allocations, \(d_0=-3.8551\); b confidence interval of the average asset allocations, \(d_0=-3.8551\); c mean of the average asset allocations, \(d_0=-3.6905\); d confidence interval of the average asset allocations, \(d_0=-3.6905\); e mean of the average asset allocations, \(d_0=-3.5258\); f confidence interval of the average asset allocations, \(d_0 =-3.5258\)

5.5 Influence of Varying Model Uncertainty

If we consider higher model uncertainty in the 1D-VAR model, the aforementioned methods perform differently. The model uncertainty can be modified by introducing a multiplier \(M^2\) to the original covariance matrix \(\varSigma _\epsilon \) of the white noise vector \([\epsilon ^r_{t+1},\epsilon ^d_{t+1}]'\), so that the covariance matrix of the error term will be:

$$\begin{aligned} \varSigma ^M_{\epsilon } = M^2 \cdot \varSigma _{\epsilon }. \end{aligned}$$

In this test, with a fixed risk aversion parameter \(\gamma = 10\), we change the multiplier M and the investment horizon and report the certainty equivalent rates corresponding to the different algorithms.

Table 4 Comparing four methods with various model uncertainties.

As shown in Table 4, when the model uncertainty increases, “vBB” is the most impacted algorithm. “BGSS” performs somewhat better than “vBB” but worse than “SGBM” and “SGBM-LT” as the corresponding certainty equivalent rate is smaller and with higher uncertainty. “SGBM-LT” outperforms “SGBM”. The differences are obvious when the model uncertainty is high and the investment horizon is long. The “SGBM-LT” values in the table are obtained with sample size \(2^{14}\). In any case, “COS” yields the reference results, which are verified by using “SGBM-LT” with a large sample size \(2^{18}\). In that case, we find, for example, the certainty equivalent rate of “SGBM-LT” has mean value 7.71 and standard error 0.02 when \(T=20\) and \(M=4\).

5.6 Errors of the Four Simulation-Based Methods

In this subsection, we would like to briefly summarize the errors encountered within the methods analyzed. If we do not consider errors in the simulation part, the errors of the four simulation-based methods, “vBB”, “BGSS”, “SGBM” and “SGBM-LT”, can be subdivided into three categories:

  • approximation error which occurs when we approximate the true value functions by the Taylor series expansion.

  • projection error which occurs when we use low-order polynomials to approximate the conditional expectations of the value functions or of the approximated value functions.

  • regression bias which occurs when we use cross-path regression to approximate the conditional expectations.

The approximation error does not occur when Taylor series expansions are not involved, for example, in “vBB”. However, as we have seen in the numerical tests, “BGSS” and “SGBM” suffer from this source of error in a similar fashion, while “SGBM-LT” appears to suffer less.

The projection error is the main source of error in “vBB”, where low-order polynomials are implemented to approximate the value functions, which may be high-order functions when the risk aversion is high, see Eq. (12). For “BGSS”, “SGBM” and “SGBM-LT”, this is generally not a problem since the object functions, as in Eq. (9), are at most of fourth-order.

The regression bias, which has been discussed in Cong and Oosterlee (2015), can be controlled effectively by bundling. The regression bias is high in “vBB” and “BGSS” but relatively low in “SGBM” and “SGBM-LT”, which benefit from their bundling technique.

Table 5 Errors of the four simulation-based methods

A general description of the error components of the four simulation-based methods is listed in Table 5. “SGBM-LT” exhibits a highly satisfactory performance in our tests, since it has relatively small-sized errors in all three aspects. We expect however that when the risk aversion parameter is high and the model volatility is large, even “SGBM-LT” may fail to converge in some cases. In those cases, we propose either to use a large number of paths in the simulation together with more bundles or to implement a variance reduction technique.

6 Conclusion

In this paper, we enhance a popular dynamic portfolio management algorithm, the BGSS algorithm, in two respects. First, for the computation of the conditional expectations, we replace the standard regression method by the techniques from SGBM, so that the variances of the approximated asset allocations and the corresponding certainty equivalent rates can be reduced. Second, a log Taylor expansion, based on a nonlinear decomposition, is employed in our algorithm. This expansion gives rise to improved results compared to the original Taylor expansion when approximating the utility function. The resulting SGBM-based portfolio management algorithm yields a less biased approximation of the optimal asset allocations.

Based on the COS method and the grid-searching technique, we have developed the COS portfolio management method for generating reference values. These values are quite comparable to the reference values in Garlappi and Skoulakis (2009) and serve as the “optimal” solutions in our numerical tests.

In our tests, combining SGBM and the log Taylor expansion yields superior results to those of other simulation-based algorithms. In all test cases, “SGBM-LT” shows the highest certainty equivalent rates. When we merely consider introducing the SGBM components in the regression step, the benefits are obvious: the value function iteration and the portfolio weight iteration associated with both “SGBM” and “SGBM-LT” generate quite similar results, which indicates that the approximation errors at each recursion step are small.

Our simulation- and regression-based algorithm “SGBM-LT” can be generalized to higher-dimensional dynamic portfolio management problems. Besides, since our algorithm is robust even in scenarios with high volatility dynamics, it is also possible to focus on models with more complicated dynamics, for example, models with jump components or other time series models. In those cases, we may need some effective bundling technique as proposed in Jain and Oosterlee (2015) and Cong and Oosterlee (2015) but in each local domain we may still use low-order polynomials as the basis functions. This helps to retain the robustness of our algorithm. In this paper, our benchmark method is only employed for the case of one risky asset. It can be extended at least to solving portfolio management problems with two or three risky assets. For all algorithms, it is promising to adopt parallel computation.

One potential future research direction is to combine the grid-searching approach with SGBM. This may be useful for utility-based optimization problems, where the utility function cannot be approximated accurately by its Taylor expansion. In that case, an adaptive grid-search combined with SGBM may constitute a generic solution.