Multi-period power utility optimization under stock return predictability

In this paper, we derive an analytical solution to the dynamic optimal portfolio choice problem in the case of an investor equipped with a power utility function of wealth. The results are established by solving the Bellman backward recursion under the assumption that the vector of asset returns follows a vector-autoregressive process with predictable variables. In an empirical study, the performance of the derived solution is compared with the one obtained by applying the numerical method. The comparison is performed in terms of the final wealth and its expected utility. It is documented that the application of the analytical solution to the multi-period portfolio choice problem leads to higher values of both the final wealth and the expected utility.


Introduction
With the development of the monetary system and the rise of market trading capacity, portfolio theory has appeared to be an important approach in explaining the behaviour of stock prices and has provided a quantitative tool for strategic investment.Introduced in 1952 by Harry M. Markowitz, the optimal portfolio choice problem has become a popular topic in finance and applied statistics (see, e.g., Bodnar and Okhrin 2013;Bonaccolto et al. 2018;Consigli et al. 2019;Muhinyuza et al. 2020;Bodnar et al. 2021;Mariani et al. 2022).It aims to minimize the variance of 4 Page 2 of 27 the portfolio given some predefined level of the expected portfolio return called the mean-variance paradigm (see, Markowitz 1952;Merton 1972).
The mean-variance approach of Markowitz provides a simple way how to diversify the investment portfolio that brings a trade-off between the expected return and the variance manually settable by an investor.On the other hand, there are some challenges that the Markowitz optimization problem meets.Namely, it can be only presented as an expected utility maximization of the quadratic or exponential utility functions and it may be considered as a second-order approximation for an arbitrary utility.Besides that, it provides a single-period solution, whereas most investment strategies have a multi-period setup with a portfolio rebalancing during the investment horizon (see, Brandt 2009;Meucci and Nicolosi 2016;Zhao et al. 2022).
In this paper, we concentrate on finding the multi-period portfolio strategy for the expected utility optimization problem.The concept of the expected utility of wealth has become and remains an intensively discussed branch of portfolio theory (see Brandt 2009;Pennacchi 2008;Campbell and Viceira 2002;Bodnar et al. 2018).The utility is defined as an increasing concave function, which fulfills four axioms and implies an investor who benefits from the maximization of its expected value (see, Pennacchi 2008;Campbell and Viceira 2002).As such, several approaches can be used to specify the utility function and the investor has to choose one of them.To classify and understand different utility functions, one can employ the concept of "risk premium" which is usually presented by absolute risk aversion (ARA) and relative risk aversion (RRA).For instance, as discussed by J. Pratt (see, Pratt (1964)), the constant absolute risk aversion represents an individual willing to pay some fixed amount of money to avoid risk.On the other side, a willingness to sacrifice a percentage of the investor wealth characterizes the concept of relative risk aversion (Pennacchi 2008;Campbell and Viceira 2002).The exponential utility is a common example of the utility function with constant absolute risk aversion (CARA), while the power utility represents the constant relative risk aversion (CRRA) utility (see, Brandt 2009;Pennacchi 2008).Chronopoulos et al. (2011) developed a utility-based framework for optimal investment and made use of the CRRA utility function to determine the optimal time of investment.Finally, the other popular utility function used in the financial literature is the quadratic utility which is neither CARA nor CRRA (see, Li and Ng 2000).
The application of the power utility in portfolio selection problems has some advantages over other utility functions.For example, it obeys a decreasing ARA, which implies that with increasing wealth the investor would increase the number of dollars in the risky assets held in the portfolio.This is not the case, however, for the quadratic utility, which exhibits an increasing ARA, causing an unrealistic behaviour of investors on the market.Although the exponential utility obeys a constant ARA, which is more plausible than the increasing ARA in the case of the quadratic utility, it still lies behind the power utility because of the experimental and empirical evidences, which mostly justify a decreasing ARA (see, Friend and Blume (1975)).Moreover, the power utility investor would have an optimal strategy which is independent of the initial value of the wealth.This constitutes another very appealing property of the power utility.There are also some criticisms on the power utility, e.g., it seems to have an impact on the equity premium puzzle (see, Mehra and Prescott (1985)) and it is unbounded, while the exponential utility does not possess such issues (see, Alpanda and Woglom (2007) for more details).Even though we have focused solely on the power utility, we believe that a proper combination of both exponential and power utilities can potentially lead to a "better" utility function.Moreover, the methodology developed in the paper can be generalized to a more general class of utility functions, like hyperbolic absolute risk aversion (HARA) utility.This important and challenging problem has been left for future research.
Most of the results derived in the literature about the dynamic optimal portfolio choice problem are obtained in the continuous time scenario by using a Markov decision process with the asset prices that follow a geometric Brownian motion (see Vigna 2009;Çanakoğlu and Özekici 2012;Chronopoulos et al. 2011;Lioui and Poncet 2013;Moreno-Bromberg et al. 2013).Besides that, the quadratic utility function has appeared to be the most popular utility (see, Li and Ng (2000), Bodnar et al. (2015a)), while the exponential utility is another popular choice which is usually accompanied by the assumption that the asset returns are normally distributed (cf., Liu (2004), Soyer and Tanyeri (2006), Bodnar et al. (2015b)).On the other side, Li et al. (2001) argued that solving a dynamic optimal portfolio choice problem in the discrete-time setup is a considerably more difficult task.In the current paper, we present the solution for the power utility in the discrete-time case.Our findings complement the existent analytical solutions for multi-period portfolio choice problems derived when the expected quadratic utility function (cf., Li and Ng (2000), Bodnar et al. (2015a)) or the exponential utility (see, e.g., Bodnar et al. (2015b)) are used.Also, it generalizes the numerical solution which is obtained by applying the Taylor series expansion to the utility or value function (see, Brandt et al. 2005;Garlappi and Skoulakis 2011).
In the derivation of the theoretical results, it is assumed that the asset returns follow a vector auto-regressive model of order one with Gaussian error terms.The latter is a popular approach for modelling the behaviour of stock returns in financial literature (Brandt et al. 2005;Campbell et al. 2003).Applying the Bellman equation we obtain the dynamic solution of the multi-period maximization problem.In an empirical illustration, the new approach is compared to the well-known numerical solution suggested in Brandt et al. (2005), where the weights were obtained by the Taylor series expansion of the value function.The predictable model is fitted to weekly stock returns, while we consider an investor with an investment horizon of 1 month (4 weeks), 2 months, 3 months and 4 months who rebalances the holding portfolio on a weekly basis.The results of the simulation study show that the new solution provides a higher average utility compared to the numerical one.These results hold independently of the value of the risk-aversion coefficient and the investment horizon.Moreover, when the investment horizon increases, our findings become more pronounced.In particular, for the investment horizon of 3 and 4 months, we find that with a probability of about 80% the expected utility of the final wealth is larger than −0.1 when the closed-form solution is used, and the prob- ability of such an event is close to zero for the numeric approximation.As such, the dynamic investment strategy based on the closed-form solution outperforms the investment strategy based on the numeric approximation.On the other side, it 4 Page 4 of 27 should be noted that the closed-form investment strategy is obtained under some model assumptions imposed on the data-generating process.The violation of these assumptions may lead to the strategy, which is not optimal any longer.Thus, an investor should first validate the model before applying the derived optimal portfolio strategy in practice.To this end, we also point out that both the closed-form strategy and the one based on the numeric approximation are explicitly derived for the investor who maximizes the power utility function.The solutions of dynamic optimal portfolio choice problems under other utility functions, like the quadratic utility and exponential utility, can be found in Çanakoğlu and Özekici (2010), Li and Ng (2000), Bodnar et al. (2015a), Bodnar et al. (2015b), among others.
The rest of the paper is structured as follows.Section 2 presents a framework and a problem formulation considered in the paper.The analytical solution of the utility maximization problem based on the power utility function is deduced in Theorem 1 of Sect. 3. Section 4 provides an empirical study investigating the practical relevance of the derived results.Section 5 summarises the theoretical and empirical findings obtained in the paper, while the proof of Theorem 1 can be found in the "Appendix" (Sect.6).

Framework
Let r t denotes a k-dimensional random vector of simple returns on risky assets and let r f t be the non-random simple return on a risk-free asset at time t.We also consider predictable variables t as a n-dimensional vector and assume that the combined vector X t = r � t , � t � follows a vector autoregressive process (VAR) of order one given by It is assumed that { εt } are independent random vectors with εt ∼ N( , ̃ t ) where ̃ t defines (k + n) × (k + n)-dimensional positive definite deterministic covariance matrix possibly dependent on t.In practice, this matrix can be fitted by employing the results from Xu and Phillips (2008) and Bodnar and Zabolotskyy (2011), who considered autoregressive models with time-varying unconditional variances.Moreover, Xu and Phillips (2008) pointed out that the time series models with time-depending variances are more flexible and they usually provide a better fit in practice.The predictable variable is often interpreted as the most correlated element with r t (see, Brandt and Santa-Clara 2006;Campbell et al. 2003;Bodnar et al. 2015b).It could be, for example, dividend yield, term spread, or even another asset return.Here we assume that the whole wealth is solely invested in the risky assets r t and in the risk-free asset, but not in the vector of predictable variables t .
To recover the vector of returns one can multiply X t by the dimension-reduction matrix = [ k k,n ] which consists of the k-dimensional identity matrix k and the k × n-dimensional zero matrix k,n .Let  = φ , Φ = Φ and  t = εt .Then, it holds that (1) We denote the conditional mean vector of r t by and its conditional covariance matrix by given the information set F t−1 available at a time point t − 1 .Hence, we get r t |F t−1 ∼ N( t , t ) .Note that t is assumed to be deterministic, while t can be ran- dom as a measurable function of F t−1 .
By t we denote the vector of portfolio weights at time t.Let r P,t denote the port- folio simple return at time t and let rP,t be the portfolio log return at time t, i.e., rP,t = ln(1 + r P,t ) .Then the wealth evolution process of the investor is given by or using the approximation of the portfolio log return by the simple asset returns we get for t = 1, … , T .Consequently, where W 0 stands for the initial wealth.
Remark 1 It has to be noted that (3) holds only approximately and it is based on the approximation of the portfolio log return by the corresponding linear combination of the asset log returns.This approximation works usually very well in practice and it is a standard approach in financial literature (see, Tsay 2005, p.5).
The classical multi-period portfolio choice problem aims to maximize the expected utility U(⋅) of the final wealth W T received after the last investment period T. Namely,

Let
Then it holds (2) In the literature, V t is often called the value function.
According to Brandt (2009) and Pennacchi (2008) problem ( 4) can be solved using the Bellman equation expressed as Equation ( 8) determines a backward recursive scheme where the optimal weights are calculated by starting from the last investment period and ending up with the first one.
In this paper, we consider the utility of wealth given by a power function, also known as constant relative risk aversion (CRRA) utility, which belongs to the HARA family (see, LiCalzi and Sorato 2006).The power utility is defined by It should also be noted that in the limiting case, i.e., when = 1 , then U(W T ) becomes the logarithmic utility function given by The CRRA utility implies the behaviour of an investor such that his/her decisions are independent of the initial wealth, which appears to be a very attractive property.
The multi-period optimization problem for the power utility is solved in the continuous-time setting by assuming the geometric Brownian motion for the stock prices and the Markov processes for the market states (see, Çanakoğlu and Özekici 2010, 2012).Moreover, in the case of short time intervals and in the case of the single-period investment the log-normal distribution is used to model the portfolio return (see, Campbell and Viceira 2002;Bodnar et al. 2020).In our framework we consider the discrete-time case assuming that the asset returns are modelled by the vector autoregressive process (1) and provide the analytic solution of the multiperiod optimal portfolio choice problem (8).

Multi-period optimal portfolio: closed-form solution
Under the power utility function ( 9) and the assumption of VAR process (1) imposed on the asset returns, the multi-period optimal portfolio selection problem has a solution presented in Theorem 1. Henceforward, we will call it the "closed-form" solution. (6) Theorem 1 Let the join vector of asset returns and predictable variables follow a VAR(1) process as defined in (1).Then for  > 1 the optimal portfolio weights, which solve (4), are given by and The proof of Theorem 1 is given in the "Appendix" (see, Sect.6).The following two corollaries formulate the results obtained under several important special cases of model (1).
Corollary 1 If no predictable variable is present in (1), the optimal weights in Theorem 1 can be simplified to for t ∈ 2, … , T , and Corollary 1 corresponds the statement of Theorem 1 with n = 0 , i.e., when the matrix L is replaced by the identity matrix and, thus, ̃ t = t , φ =  , and Φ = Φ.
Corollary 2 Let the vectors of asset returns r t be stochastically independent over time.Then, the optimal weights for the multi-period portfolio choice problem (4) with  > 1 are given by for t ∈ 1, … , T.
The statement of Corollary 2 follows from Corollary 1 by setting Φ = . (10) As expected, we receive that all the formulas for the optimal portfolio weights do not depend on the wealth component, implying that the optimal portfolio weights of the investor equipped with the power utility function are independent of the amount of initial wealth (see Campbell and Viceira 2002;Pennacchi 2008).One should also point out that the result obtained in Corollary 2 has a similar form to the optimal portfolio rule for a power utility derived by Marín-Solano and Navas (2010), where the asset prices are assumed to follow a geometric Brownian motion.On the other side, under a weaker assumption of the vector auto-regressive process, an additional term is appeared caused by the model-specific dependence structure.

Empirical illustration
In order to present the relevance of the obtained results in practice, we use the VAR(1) model estimated by the weekly returns of five stock market indices, i.e., Belgium, Germany, Japan, the UK and the USA, from January 4, 2002, to December 4, 2009 (see, Bodnar et al. 2015b).
The parameters of the model (1) for the considered data are estimated in (see Bodnar et al. 2015b, p.532) and they are given by The fitted autoregression matrix Φ displays larger values of the coefficients which correspond to the lagged values of the U.S. market return in comparison to the lagged values of the return on each capital market itself.As such, the return of the U.S. stock market is considered to be a predictable variable in our model and we use the returns of the other four capital markets to construct a dynamic four-dimensional optimal portfolio.Without loss of generality, we set the initial wealth to be equal one, i.e., W 0 = 1 , and the weekly return of a hypothetical risk-free asset r f t is assumed to be 0.06%, corresponding to the annual return of around 3%.
The performance of the closed-form solution derived in Theorem 1 is compared to the one obtained by the numerical solution presented in Brandt et al. (2005).The numerical approximation of the weights of the dynamic optimal portfolio is obtained by the 4th order Taylor series expansion of the value function (see, Ivasiuk 2019).The comparison is done by simulating the return path and calculating the weights corresponding to the numerical and closed-form solutions, which then are used in the computation of the final utility.All values reported in the table and figures are based on 10000 independent repetitions.The simulations are performed for several combinations of the relative risk aversion parameter ∈ {4, 6, 9, 12} and time hori- zon T ∈ {4, 8, 12, 16} .The choice of the risk aversion coefficient is motivated by the findings documented in Barsky et al. (1997), Elminejad et al. (2022), andPennacchi (2008, p.21).In particular, Barsky et al. (1997) and Pennacchi (2008) pointed out that the median relative risk aversion is approximately equal to 7, while Elminejad et al. ( 2022) argued that the most common values of the risk aversion coefficient lie between 2 and 7 in the financial applications.Motivated by these results, we set = 4 and = 6 , and also add two further values corresponding to a more risk-averse investor.
The results are depicted in Table 1 and in Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12. Table 1 demonstrates the values of the sample mean and sample median of the power utility of the final wealth together with the mean absolute deviation and the median absolute deviation, respectively, for several values of and T described above.Both the sample median and the sample median absolute deviations are robust estimators for the location and scale parameters of a probability distribution.On the other side, the sample mean and the mean absolute deviation can be affected by extreme observations.As such, to robustify these two measures, their 95% trimmed versions are used.
We observe that all the sample means and the sample medians of the final utility corresponding to the numerical solution are always smaller than those obtained by using the analytical solution derived in Theorem 1.The differences between the values increase as the investment horizon becomes larger.Very pronounced results are documented when the sample median is used as the centrality measure.In this case, for T = 4 the application of the analytical solution leads to the values of the final utility which are three times as larger as the corresponding values computed by applying the numerical solution, while for T = 16 they are more than 200 times larger indicating a considerable improvement when the analytical solution is used.The values of the median absolute deviation are also significantly smaller in the case when the closed-form solution is used with the exception for T ∈ {12, 16} .For the sample means, the differences are not so large, although even in this case the closed-form solution outperforms the strategy based on the numerical solution.On the other side, the mean standard deviations computed for the closed-form solution are considerably larger than the ones obtained for the numerical solution.Such a result is also observable in Figs. 1, 2, 3, and 4, where the histograms of final wealth are plotted for the considered values of and T. In all plots, the histograms created for the closed-form strategies are right skewed with the range of final wealth that includes values considerably larger than one, the point where the histograms for the numerical solution are concentrated.As such, large values of the final wealth have a strong impact on the computed mean absolute deviation in the case of the closedform solution.To this end, we also compute the p-values of the two sample t-test for comparing the means and the medians of the two samples.Interestingly, all the 4 Page 10 of 27 p-values appear to be close to zero leading to the conclusion that the differences in the means and the medians are statistically significant in all of the considered cases.Finally, the increase of the investment horizon leads to opposite effects on the two dynamic trading strategies.While the numerical solution becomes more volatile, the application of the closed-form solution considerably reduces the uncertainty in the stochastic behaviour of the final utility.
The point is further investigated in Figs. 5,6,7,8,9,10,11 and 12, where the empirical cumulative distribution functions (ecdf) of the final wealth obtained by the  precisely, the intersection points on the curves of two ecdfs reveal that the probability of receiving larger final wealth is larger when the closed-form solution is used.The impact becomes more pronounced when both T and increase.
A similar relationship between the performance of the two considered dynamic trading strategies is observed in Figs. 9, 10, 11 and 12 with ecdfs constructed for the final utility.This result holds uniformly over the considered values for the risk aversion coefficient and the investment horizon T, with higher impact when T and increase.In contrast to the figures with the final wealth, we observe more volatile behavior for the utility of the final wealth computed by using the numerical solution.Also, the ecdfs  the closed-form trading strategy is used.For the trading strategy based on the numerical approximation the probability of such an event is less than 30% when  < 12.
Next, we investigate the computational time of both the numerical and analytical solutions in Fig. 13.The analytical solution shows a clear advantage over the numerical one even in the case of small time horizons.The logarithmic computational time of the analytical solution always stays negative, which indicates that the computational cost is still low for time horizon T = 60 .This is not the case for the numerical strategy where the logarithmic computational time is always positive.Moreover, it is greater than one and reaches four in the case of T = 60 .This justifies that the numerical method is prob- ably not the best choice for increasing time horizon.Using more summands of the Taylor series expansion in the case of the numerical solution would increase the accuracy, however, the computational time will increase considerably as well.
At last, we compare the proposed analytical solution obtained for the power utility function with the one of Bodnar et al. (2015b) which was derived under the exponential utility expressed as Although the comparison is not completely fair because both utility functions represent different risk behaviour of the investors, the solutions seem to be quite similar.So, a question arises which of the two multi-period optimal portfolio strategies can generate a higher amount of the final wealth.This question is answered in Fig. 14.To keep the comparison fair we distinguish between low-and high-risk situations.The low-risk scenario will correspond to = 9 and = 2 for the power and expo- nential utilities, respectively.Similarly, the high-risk one will arise for = 4 and U(W T ) = −e −W T with  > 0. = 0.8 .In both cases, we consider the investment horizon of T = 52 and T = 104 .We observe a striking property of the analytic solution derived under the power utility function.Namely, it can generate considerably higher wealth levels in comparison to the solution derived under the exponential utility (stochastic dominance of the first order).The only case, where the situation is a bit unclear is the low-risk situation with T = 52 .However, one can still observe the stochastic dominance of the second order for the power utility solution in this case (the difference in the areas between the curves).To conclude, one can notice a clear dominance of the power utility over the exponential one in terms of final wealth when the time horizon is large enough.

Summary
In this paper, we present the solution to the multi-period portfolio selection problem for the investor who makes the investment decision based on maximizing the expected power utility.Using the Bellman back-propagation method, the closedform expression of the dynamic optimal portfolio weights is derived under the assumption that the vector of asset returns follows a vector-autoregressive process of order one with Gaussian errors.In the derivation of the theoretical results the relationship between the portfolio log returns and the simple asset returns is employed which is widely used in finance and is known to provide a very good approximation in practice (cf., Tsay (2005, p.5)).The solution of the multi-period optimization problem is deduced for the investor whose relative risk aversion is assumed to Final wealth, WT Distribution function, F(WT) exponential utility power utility Fig. 14 Empirical distribution function of the final wealth for the closed-form solutions of the power and exponential utility functions be larger than 1 which corresponds to the recent survey study documented that the median relative risk aversion of investor is 7 (see, e.g., Pennacchi (2008)).Moreover, ∈ (0, 1) indicates an extremely risky behaviour of an investor that is a rare case in practice.
The obtained theoretical findings are implemented in the empirical study, where the performance of the derived closed-form solution is compared to the numerical approximation suggested in Brandt et al. (2005).The results of the simulation study show that the new closed-form solution of the dynamic portfolio choice problem under the power utility considerably outperforms the numerical approach by providing significantly larger values of the final wealth and utility.This observation holds uniformly for all the considered values of the relative risk aversion coefficient and the investment horizons.Moreover, it is shown that the application of the derived closed-form solution leads to larger both the final wealth and its utility.In particular, when the investment horizon is set to be equal to 12 or 16 weeks, then with a probability of about 80% the closed-form strategy leads to the value of the util- ity which is larger than −0.1 .In contrast, if the numerical approximation is used, then the probability of receiving the utility larger than −0.1 is close to zero.To this end, we note that the performance of the numerical approximation can be improved when more than four summands are used in the Taylor series expansion of the value function.However, this will increase the computational time considerably, which is always dominated by the analytic solution.At last, one can observe that the proposed strategy dominates the solution given by the exponential utility for increasing time horizon.
The analytical results derived in the paper do not include the transaction costs which appear when the holding portfolio is rebalanced.The inclusion of the transaction costs in the derivation is a very challenging task, which is left for future research.

Appendix
In this section the proof of Theorem 1 is given.

Proof of Theorem 1
The Bellman equation is solved by deriving the expressions of the value functions starting at time T − 1 , which is expressed as , the conditional expectation is equal to the value of the moment-generating function defined for the multivariate normal distribution at (1 − ) T−1 .Thus, for  > 1 we get (16) Finally, the application of Theorem 3.2a.1 in Mathai and Provost (1992) leads to which is maximized when Substituting a T−1 and solving with respect to T−2 gives us the optimal weights for the value function at time T − 2 expressed as Moreover, where The value function at time T − 3 is given by 4 Page 24 of 27 with Using that the conditional expectation in ( 19) is given by (see, Mathai and Provost 1992, Theorem 3.2a.1) the maximum in ( 19) is attained at Finally, the value function at time T − 3 is equal to (19) with Since the value function at time T − 3 has the same structure as the value function at time T − 2 , the rest of the steps in the Bellman backward recursions are performed in the same way leading to the expressions of the optimal weights presented in the statement of the theorem.◻

Fig. 1 4
Fig. 1 Histogram of the final wealth for the closed-form solution and the numerical solution for = 4 and T ∈ {4, 8, 12, 16}

Fig.
Fig. Histogram of the final wealth for the closed-form solution and the numerical solution for = and T ∈ {4, 8, 12, 16}

Fig. 12
Fig. 12 Empirical distribution function of the power utility for the closed-form solution and the numerical solution for = 12 and T ∈ {4, 8, 12, 16}

Fig. 13 4
Fig. 13 Logarithmic computational time in log(seconds) as a function of the horizon T , the value function at time T − 1 is given by where c T = exp((1 − )r f T ).The application of (16) leads to the value function at time T − 2 expressed as Let Xt = X t − r f t � .Then taking into account that the conditional expectation in (17) can be rewritten in the following way with

Table 1
The sample mean, the sample median, the mean absolute deviation and the median absolute deviation of the power utility of the final wealth for the numerical solution and the closed-form strategy for ∈ {4, 6, 9, 12} and T ∈ {4, 8, 12, 16}The largest value of the sample mean and sample median is highlighted bold in each box 6, 7 and 8, while the ecdfs of the corresponding utilities are shown inFigs.9, 10, 11  and 12.In Figs. 5, 6, 7and 8, we observe that the final wealth obtained by applying the numerical solution is tightly spread around one, whereas the closed-form portfolio has its wealth mostly larger with some amount of values being smaller.More