Optimal portfolio choice: a minimum expected loss approach

The mainstream in finance tackles portfolio selection with a plug-in approach that does not consider the main objective of the inferential situation. We propose minimum expected loss (MELO) estimators for portfolio selection that explicitly consider the trading rule of interest. The asymptotic properties of our MELO proposal are similar to those of the plug-in approach. Nevertheless, simulation exercises show that our proposal exhibits better finite sample properties than the competing alternatives, especially when the tangency portfolio is taken as the asset allocation strategy. We have also developed a graphical user interface to help practitioners use our MELO proposal.


Introduction
The mainstream method to estimate the optimal weights in the portfolio allocation problem is the plug-in approach; that is, individual location and scale estimates are simply plugged into the objective expression without explicit consideration of the main goal of the inferential situation. However, this approach has some shortcomings: it ignores parameter uncertainty [1][2][3][4][5][6], has infinite mean in some cases (tangency and Treynor-Black portfolios) [7], and has unbounded risk relative to quadratic loss functions [8].
To mitigate these issues, we follow a decision theory framework based on a Bayesian [...]. Kan and Zhou [6] proposed optimal portfolio allocation estimators that minimize a risk function that depends on the out-of-sample performance of the investor's expected utility function. Kan and Zhou [6] focused on admissible trading strategies (admissibility is a minimum requirement on decision rules), proposing a "three-fund" portfolio rule composed of the risk-free asset, the tangency portfolio, and the global minimum variance portfolio. Tu and Zhou [27] proposed combining the naive strategy with one that comes from an optimization problem, using the naive portfolio as a shrinkage target. Other strategies to mitigate estimation error are based on robust portfolios [28,29] and on transforming the optimal weight estimation problem into linear regressions [30,31]. In the former approach, parameter uncertainty is taken into account in the optimization procedure. In the latter approach, Li [30] proposed a sparse and stable methodology based on lasso and ridge regressions with statistical characteristics similar to those of shrinkage estimators. Klimenka and Wolter [31] also proposed a regression framework that uses the focused information criterion [32], which is based on the trading strategy, and model averaging to take model uncertainty into account.
Our proposal shares some characteristics with previous proposals because it is based on a Bayesian setting [2,3,16,17] under a decision theory framework focused on the final inferential goal [6,31]. However, our proposal minimizes the posterior expected loss function rather than the frequentist risk function, and our loss function is based on the trading strategy rather than the utility function. In particular, we exploit the specific structure (rational functions) of the main objective of estimation in three well-known portfolio optimization problems, and propose the MELO approach. It obtains the same asymptotic results as the plug-in approach, but shows better statistical properties in finite samples than the competing alternatives, especially when the optimal trading rule is the tangency portfolio. To the best of our knowledge, this is the first time that the MELO estimator is used for these three optimal portfolio strategies.
The rest of this paper is structured as follows. Section 2 presents the theoretical framework of the different competing alternatives. Section 3 develops the MELO estimates for the global minimum variance portfolio, the tangency portfolio and the Treynor-Black model. Section 4 presents the outcomes of the simulation exercises. In Sect. 5, we develop an empirical study. Finally, we draw some conclusions.

Theoretical framework
Suppose that the investment universe consists of $N$ assets. Let $R_t = (r_{1t}, r_{2t}, \ldots, r_{Nt})'$ denote the excess returns of the $N$ assets at time $t$. It is assumed that the excess returns have a multivariate normal distribution, $R_t \sim N(\mu, \Sigma)$. The portfolio weights are the proportions of wealth invested in each of the $N$ assets, $w = (w_1, w_2, \ldots, w_N)'$. We suppose that the investor has a portfolio holding period of length $\kappa$ and wants to maximize their wealth at the end of the investment horizon, $T + \kappa$, where $T$ is the last period for which return data are available (sample size).

Global minimum variance portfolio
The global minimum variance (GMV) portfolio is the portfolio whose weights give the minimum variance among all possible portfolios. It is defined as the solution of the minimization problem

$$\min_{w} \; w'\Sigma w \quad \text{subject to} \quad w'1 = 1,$$

where $1$ denotes a vector of ones. Because $\Sigma$ is positive definite, the GMV portfolio is unique, and the solution of the minimization problem is

$$w = \frac{\Sigma^{-1} 1}{1'\Sigma^{-1} 1}. \tag{1}$$
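As an illustration of the GMV solution, the following is a minimal sketch in Python with NumPy. The function name `gmv_weights` and the 3-asset covariance matrix are hypothetical and chosen only for the example; they are not taken from the paper.

```python
import numpy as np

def gmv_weights(sigma: np.ndarray) -> np.ndarray:
    """Global minimum variance weights: Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(sigma.shape[0])
    # Solve Sigma x = 1 rather than forming the inverse explicitly.
    x = np.linalg.solve(sigma, ones)
    # x.sum() equals 1' Sigma^{-1} 1, so the weights add up to one.
    return x / x.sum()

# Hypothetical covariance matrix (illustrative values only).
sigma_example = np.array([[0.04, 0.01, 0.00],
                          [0.01, 0.09, 0.02],
                          [0.00, 0.02, 0.16]])

w_gmv = gmv_weights(sigma_example)
gmv_variance = w_gmv @ sigma_example @ w_gmv

# Variance of the equally weighted (naive) portfolio, for comparison.
w_naive = np.full(3, 1.0 / 3.0)
naive_variance = w_naive @ sigma_example @ w_naive
```

The GMV variance equals $1/(1'\Sigma^{-1}1)$, so it cannot exceed the variance of any other fully invested portfolio, including the naive one.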

Tangency portfolio
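The tangency portfolio is defined as the portfolio that has the highest Sharpe ratio. The tangency portfolio solves the constrained maximization problem

$$\max_{w} \; \frac{w'\mu}{\sqrt{w'\Sigma w}} \quad \text{subject to} \quad w'1 = 1;$$

thus, the solution has the expression

$$w = \frac{\Sigma^{-1}\mu}{1'\Sigma^{-1}\mu}. \tag{2}$$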
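The tangency solution can be sketched in the same way. The names `tangency_weights` and `sharpe_ratio` and the numerical inputs below are hypothetical illustrations, and the sketch assumes $1'\Sigma^{-1}\mu > 0$ so that the normalization is well behaved.

```python
import numpy as np

def tangency_weights(mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Tangency weights: Sigma^{-1} mu / (1' Sigma^{-1} mu)."""
    x = np.linalg.solve(sigma, mu)   # Sigma^{-1} mu
    return x / x.sum()               # x.sum() is 1' Sigma^{-1} mu

def sharpe_ratio(w: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> float:
    """Sharpe ratio of a portfolio of excess returns."""
    return float((w @ mu) / np.sqrt(w @ sigma @ w))

# Hypothetical mean excess returns and covariance matrix.
mu_example = np.array([0.05, 0.08, 0.10])
sigma_example = np.array([[0.04, 0.01, 0.00],
                          [0.01, 0.09, 0.02],
                          [0.00, 0.02, 0.16]])

w_tan = tangency_weights(mu_example, sigma_example)
sr_tan = sharpe_ratio(w_tan, mu_example, sigma_example)
sr_naive = sharpe_ratio(np.full(3, 1.0 / 3.0), mu_example, sigma_example)
```

Because the denominator $1'\Sigma^{-1}\mu$ can be close to zero once $\mu$ and $\Sigma$ are replaced by estimates, the estimated weights can be very unstable, which is the behavior the introduction attributes to the plug-in tangency rule.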

Treynor-Black Model
Active management searches for sources of abnormal returns (alpha) to outperform a passive benchmark portfolio. The Treynor-Black model, which was proposed by Treynor and Black [33], tackles this problem by assuming an investor who considers that most securities are mispriced with respect to an asset pricing model but who believes that they have information that can be used to predict the abnormal returns of a few of the securities. Consider the following regression model,

$$R_t = \alpha + \beta r_{Mt} + e_t,$$

where $r_{Mt}$ is the excess return of the benchmark portfolio, and $e_t \sim N(0, H)$. This strategy consists of investing in an active portfolio (A), containing the assets for which the investor has made a prediction about abnormal returns, and a passive portfolio (B, benchmark), containing all assets in proportion to their market value. Let $w^*$ denote the weights of the active portfolio that maximize the information ratio, where $\alpha_{T+\kappa} = (\alpha_{1,T+\kappa}, \alpha_{2,T+\kappa}, \ldots, \alpha_{N,T+\kappa})'$. The solution is given by

$$w^* = \frac{H_{T+\kappa}^{-1}\alpha_{T+\kappa}}{1'H_{T+\kappa}^{-1}\alpha_{T+\kappa}}. \tag{3}$$

The second stage is to construct an optimal mix of A and B to form a risky portfolio P. This is a standard portfolio problem with two risky assets. Here, $w_A$ and $1 - w_A$ denote the shares of wealth invested in A and B, respectively. Observe that all of the optimal weights depend on future expected returns at $T + \kappa$. As a consequence, they depend on parameter estimates.
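The first stage of the Treynor-Black model has the same algebraic form as the tangency rule, with $\alpha$ and $H$ in place of $\mu$ and $\Sigma$. A minimal sketch follows; the function names and the numerical values of `alpha_example` and `h_example` are hypothetical, and the second-stage mix $w_A$ is not covered.

```python
import numpy as np

def treynor_black_active_weights(alpha: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Active portfolio weights: H^{-1} alpha / (1' H^{-1} alpha)."""
    x = np.linalg.solve(h, alpha)
    return x / x.sum()

def information_ratio(w: np.ndarray, alpha: np.ndarray, h: np.ndarray) -> float:
    """Portfolio alpha divided by the residual standard deviation."""
    return float((w @ alpha) / np.sqrt(w @ h @ w))

# Hypothetical abnormal returns and (diagonal) residual covariance matrix.
alpha_example = np.array([0.010, 0.020, 0.005])
h_example = np.diag([0.02, 0.05, 0.01])

w_active = treynor_black_active_weights(alpha_example, h_example)
ir_active = information_ratio(w_active, alpha_example, h_example)
ir_naive = information_ratio(np.full(3, 1.0 / 3.0), alpha_example, h_example)
```

With a diagonal $H$ the weights are proportional to $\alpha_i/\sigma^2_{e_i}$, so assets with a larger alpha per unit of residual variance receive more weight.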

Plug-in approach
The classical approach estimates the parameters using the available sample information and then plugs these estimates into the optimal solutions, omitting parameter uncertainty. In particular, the estimates are computed from $R$, a $T \times N$ matrix of excess returns, and $X = [1 \; r_M]$, a $T \times 2$ design matrix, with $B = [\alpha \; \beta]'$.
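A minimal sketch of plug-in estimation with NumPy is given below. It assumes maximum likelihood type estimators with divisor $T$ for the covariance matrices, which is an assumption of this example since the exact normalization is not stated here. The function name `plug_in_estimates` and the simulated data are hypothetical.

```python
import numpy as np

def plug_in_estimates(r: np.ndarray, r_m: np.ndarray):
    """Sample moments and market-model OLS estimates from T x N excess returns."""
    t = r.shape[0]
    mu_hat = r.mean(axis=0)
    centered = r - mu_hat
    sigma_hat = centered.T @ centered / t        # divisor T is an assumption
    x = np.column_stack([np.ones(t), r_m])       # T x 2 design matrix [1 r_M]
    b_hat = np.linalg.solve(x.T @ x, x.T @ r)    # 2 x N, rows are alpha' and beta'
    resid = r - x @ b_hat
    h_hat = resid.T @ resid / t                  # residual covariance, divisor T
    return mu_hat, sigma_hat, b_hat[0], b_hat[1], h_hat

# Simulated data with hypothetical parameter values.
rng = np.random.default_rng(0)
t_obs = 500
alpha_true = np.array([0.002, 0.000, -0.001])
beta_true = np.array([0.8, 1.0, 1.2])
r_m_sim = rng.normal(0.01, 0.05, size=t_obs)
r_sim = alpha_true + np.outer(r_m_sim, beta_true) + rng.normal(0.0, 0.02, size=(t_obs, 3))

mu_hat, sigma_hat, alpha_hat, beta_hat, h_hat = plug_in_estimates(r_sim, r_m_sim)
```

These estimates would then be substituted for $\mu$, $\Sigma$, $\alpha$ and $H$ in the optimal weight expressions, which is exactly the step that ignores parameter uncertainty.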

Shrinkage approach
A shrinkage estimator is a weighted average of the sample estimator and the so-called Bayes-Stein estimator of the mean. Under this approach, the sample mean is shrunk toward a shrinkage target $\mu_0$ with shrinkage intensity $\lambda$. Jorion [19] proposed as the shrinkage target the return on the global minimum variance portfolio.
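The shrinkage idea can be sketched as follows. The expressions for the shrinkage intensity and for the target are the ones commonly attributed to Jorion's Bayes-Stein estimator; they are an assumption of this example rather than a reproduction of the formulas of this section, and the function name and inputs are hypothetical.

```python
import numpy as np

def bayes_stein_mean(mu_hat: np.ndarray, sigma: np.ndarray, t: int):
    """Shrink the sample mean toward the GMV portfolio return (assumed form)."""
    n = mu_hat.size
    ones = np.ones(n)
    sigma_inv_ones = np.linalg.solve(sigma, ones)
    # Shrinkage target: mean return of the global minimum variance portfolio.
    mu_0 = (sigma_inv_ones @ mu_hat) / (sigma_inv_ones @ ones)
    diff = mu_hat - mu_0 * ones
    # Assumed shrinkage intensity: (N + 2) / (N + 2 + T d' Sigma^{-1} d).
    lam = (n + 2) / ((n + 2) + t * (diff @ np.linalg.solve(sigma, diff)))
    mu_bs = (1.0 - lam) * mu_hat + lam * mu_0 * ones
    return mu_bs, lam, mu_0

# Hypothetical sample estimates from T = 120 observations.
mu_hat_example = np.array([0.05, 0.08, 0.10])
sigma_example = np.array([[0.04, 0.01, 0.00],
                          [0.01, 0.09, 0.02],
                          [0.00, 0.02, 0.16]])
mu_bs, lam, mu_0 = bayes_stein_mean(mu_hat_example, sigma_example, t=120)
```

Each shrunk mean lies between the sample mean of the asset and the common target, and the intensity falls as the sample size grows or as the sample means become more dispersed.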

Bayesian approach
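The Bayesian approach accounts for parameter uncertainty. In particular, it expresses the investor's problem in terms of the predictive distribution of the future excess returns. Denoting the unobserved excess return data of the next $\kappa$ periods by $R_{T+\kappa}$, the predictive return density is

$$p(R_{T+\kappa} \mid R) = \int\!\!\int p(R_{T+\kappa} \mid \mu, \Sigma)\, p(\mu, \Sigma \mid R)\, d\mu\, d\Sigma,$$

where $p(\mu, \Sigma \mid R) \propto p(R \mid \mu, \Sigma)\, p(\mu, \Sigma)$ is the joint posterior density, $p(R_{T+\kappa} \mid \mu, \Sigma)$ is a multivariate normal density, $p(R \mid \mu, \Sigma)$ is the likelihood function, and $p(\mu, \Sigma)$ is the prior density.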
In the following, we show the Bayesian solution under two situations: non-informative and informative priors (see supplementary material section 1).
Non-informative priors In this case, the investor is uncertain about the distribution of the parameters μ and Σ, and has no particular prior knowledge. This situation can be represented by a flat prior, which is typically taken to be the Jeffreys' prior (see supplementary material subsection 1.2).
The estimates for $\mu_{T+\kappa}$, $\Sigma_{T+\kappa}$, $\alpha_{T+\kappa}$, $\beta_{T+\kappa}$ and $H_{T+\kappa}$ are expressed in terms of Mκ, a forecast of future benchmark portfolio returns, and $c^2 = C_{\kappa,\kappa} - C_{\kappa,1:\kappa-1} C^{-1}_{1:\kappa-1,1:\kappa-1} C_{1:\kappa-1,\kappa}$.
Informative priors Now we suppose that the investor has information about the parameters in the investment period. We get the results using conjugate family priors (see supplementary material subsection 1.1), where $\eta$ is an $N$-dimensional vector of prior mean returns, $\tau$ is a hyperparameter that defines prior precision, and $\Omega$ and $H_0$ are prior scale matrices associated with the covariance matrices. More precise prior information implies more weight associated with this source.

Minimum expected loss for trading strategies
Taking into account that the financial trading strategies (Eqs. 1, 2 and 3) are rational functions of the parameters and that these are the final objective of estimation, we propose the following framework. Suppose that the main concern of estimation is $g(\theta) = (g_1(\theta), g_2(\theta), \ldots, g_N(\theta))'$, where each $g_i(\theta)$ is a ratio of polynomial functions in $\theta$ with denominator $m(\theta) \neq 0$, such that $g_i(\theta)$ is a continuously differentiable constant order transformation. Setting $\omega = (w_1, w_2, \ldots, w_N)'$, the optimal portfolio weights, we propose to focus our inferential problem directly on our final objective; that is, the trading strategies. Therefore, we select as an estimator the Bayesian action that minimizes the posterior expected value of a generalized quadratic loss function focused on the optimal portfolio rules, let us say $g(\theta)$,

$$L(g(\theta), \hat{g}) = h(\theta)\,(g(\theta) - \hat{g})'(g(\theta) - \hat{g}),$$

where $h(\theta) > 0$ is a case-specific weighting function.

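Proposition 1 The minimum expected loss estimate of $g(\theta)$ is

$$\hat{g}^{*} = \frac{\mathbb{E}[h(\theta)\, g(\theta) \mid R]}{\mathbb{E}[h(\theta) \mid R]},$$

provided the previous assumptions on $g(\theta)$ and $h(\theta)$ hold, and integration and differentiation can be interchanged (see assumptions E and F in supplementary material subsection 2.1 for details).
See the supplementary material for a proof (subsection 2.2).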
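Observe that Proposition 1 implies that the MELO is a kernel weighted average of $g(\theta)$. These weights implicitly depend on the probability associated with each $\theta$ in the parameter space, as well as their magnitude. When $h$ does not depend on $\theta$, which implies equal weight for each $\theta$, the minimum expected loss estimate is the posterior mean; that is, $\hat{g}^{*} = \mathbb{E}[g(\theta) \mid R]$.
If our problem is to estimate the weights for the global minimum variance portfolio, $\omega = \Sigma^{-1}1/(1'\Sigma^{-1}1)$, where $\Sigma$ is the covariance matrix of the excess returns, then $\epsilon = (1'\Sigma^{-1}1)\hat{\omega} - \Sigma^{-1}1$ is the estimation error introduced by the estimate $\hat{\omega}$. Then, the posterior expected loss function is

$$\mathbb{E}[\epsilon'\epsilon \mid R] = \mathbb{E}\left[(1'\Sigma^{-1}1)^2\,(\hat{\omega} - \omega)'(\hat{\omega} - \omega) \mid R\right].$$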
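Reading Proposition 1 as a kernel weighted average of $g(\theta)$ with weights $h(\theta)$, the MELO can be approximated by Monte Carlo once posterior draws are available. The sketch below makes that reading explicit; the function name `melo_from_draws` and the scalar ratio example, with a hypothetical posterior for $(\theta_1, \theta_2)$, are assumptions made for illustration.

```python
import numpy as np

def melo_from_draws(g_draws, h_draws) -> np.ndarray:
    """Monte Carlo version of E[h(theta) g(theta)] / E[h(theta)].

    g_draws: array of shape (D,) or (D, N) with g evaluated at each draw.
    h_draws: array of shape (D,) with the positive weights h at each draw.
    """
    g_draws = np.asarray(g_draws, dtype=float)
    h_draws = np.asarray(h_draws, dtype=float)
    if g_draws.ndim == 1:
        g_draws = g_draws[:, None]
    return (h_draws[:, None] * g_draws).sum(axis=0) / h_draws.sum()

# Scalar illustration: g = theta1 / theta2 with h = theta2^2.
rng = np.random.default_rng(123)
theta1 = rng.normal(1.0, 0.1, size=5000)   # hypothetical posterior draws
theta2 = rng.normal(2.0, 0.5, size=5000)   # hypothetical posterior draws

ratio_melo = melo_from_draws(theta1 / theta2, theta2 ** 2)
ratio_post_mean = melo_from_draws(theta1 / theta2, np.ones_like(theta2))
```

With $h = \theta_2^2$ the estimate reduces to $\mathbb{E}[\theta_1\theta_2]/\mathbb{E}[\theta_2^2]$, whose denominator does not vanish, whereas a constant $h$ returns the posterior mean of the ratio.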

Corollary 1 The MELO estimate for the weights associated with the minimum variance portfolio is given by

$$\hat{\omega}^{*} = \frac{\mathbb{E}\left[(1'\Sigma^{-1}1)\,\Sigma^{-1}1 \mid R\right]}{\mathbb{E}\left[(1'\Sigma^{-1}1)^2 \mid R\right]}.$$

Proof This is an immediate consequence of Proposition 1 taking $g(\theta) = \Sigma^{-1}1/(1'\Sigma^{-1}1)$ and $h(\theta) = (1'\Sigma^{-1}1)^2$.
We can see from Corollary 1 that the MELO estimate for the weights of the minimum variance portfolio is a weighted average, where the weights depend on the updated belief regarding the variance of the minimum variance portfolio. In particular, covariance matrices that imply a larger portfolio variance receive smaller weights in the calculation of the MELO estimate. This is consistent with the logic of the optimization problem from a financial theory perspective, whose concern is to minimize the variance of the portfolio.
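A Monte Carlo sketch of the MELO estimate for the minimum variance portfolio is shown below. It assumes that, under the Jeffreys prior, the posterior of $\Sigma^{-1}$ is Wishart with $T-1$ degrees of freedom and scale $S^{-1}$, where $S$ is the sum of squared deviations; this is a standard result used here as an assumption rather than a statement of the supplementary material. It also reads the estimate as the ratio $\mathbb{E}[(1'\Sigma^{-1}1)\Sigma^{-1}1]/\mathbb{E}[(1'\Sigma^{-1}1)^2]$. All names and numerical values are hypothetical.

```python
import numpy as np

def draw_precisions(r: np.ndarray, n_draws: int, rng) -> np.ndarray:
    """Draw Sigma^{-1} ~ Wishart(T - 1, S^{-1}) (assumed Jeffreys-prior posterior)."""
    t, n = r.shape
    centered = r - r.mean(axis=0)
    s = centered.T @ centered
    chol = np.linalg.cholesky(np.linalg.inv(s))             # S^{-1} = chol chol'
    z = rng.standard_normal((n_draws, t - 1, n)) @ chol.T   # rows ~ N(0, S^{-1})
    return np.einsum("dti,dtj->dij", z, z)                  # sum of outer products

def melo_gmv(precisions: np.ndarray) -> np.ndarray:
    """Ratio of Monte Carlo averages: E[(1'P1) P1] / E[(1'P1)^2], P = Sigma^{-1}."""
    ones = np.ones(precisions.shape[1])
    x = precisions @ ones        # draws of Sigma^{-1} 1, shape (D, N)
    a = x @ ones                 # draws of 1' Sigma^{-1} 1, shape (D,)
    return (a[:, None] * x).sum(axis=0) / (a ** 2).sum()

# Simulated returns from a hypothetical covariance matrix.
rng = np.random.default_rng(1)
sigma_true = np.array([[0.04, 0.01, 0.00],
                       [0.01, 0.09, 0.02],
                       [0.00, 0.02, 0.16]])
r_sim = rng.multivariate_normal(np.zeros(3), sigma_true, size=240)

w_melo_gmv = melo_gmv(draw_precisions(r_sim, 2000, rng))

# Plug-in GMV weights from the sample covariance, for comparison.
x_plug = np.linalg.solve(np.cov(r_sim, rowvar=False), np.ones(3))
w_plug_gmv = x_plug / x_plug.sum()
```

By construction the estimated weights add up to one, because the numerator summed over assets equals the denominator.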
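If the main concern is an estimate of the weights associated with the tangency portfolio, $\omega = \Sigma^{-1}\mu/(1'\Sigma^{-1}\mu)$, where $\mu$ and $\Sigma$ are the mean and covariance matrix of the excess returns, then we set $\epsilon = (1'\Sigma^{-1}\mu)\hat{\omega} - \Sigma^{-1}\mu$. Then, the loss function is $L(\Sigma, \mu, \hat{\omega}) = \epsilon'\epsilon$, and the posterior expected loss is

$$\mathbb{E}\left[\left(\frac{\mu_T}{\sigma_T^2}\right)^2 (\hat{\omega} - \omega)'(\hat{\omega} - \omega) \mid R\right],$$

where $\mu_T$ and $\sigma_T^2$ are the mean and variance of the tangency portfolio, so that $\mu_T/\sigma_T^2 = 1'\Sigma^{-1}\mu$.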

Corollary 2 The MELO estimate for the weights associated with the tangency portfolio is given by

$$\hat{\omega}^{*} = \frac{\mathbb{E}\left[(1'\Sigma^{-1}\mu)\,\Sigma^{-1}\mu \mid R\right]}{\mathbb{E}\left[(1'\Sigma^{-1}\mu)^2 \mid R\right]}.$$

Proof This is a consequence of Proposition 1 taking $g(\theta) = \Sigma^{-1}\mu/(1'\Sigma^{-1}\mu)$ and $h(\theta) = (1'\Sigma^{-1}\mu)^2$.
We can see from Corollary 2 that the MELO estimate for the weights of the tangency portfolio is a weighted average, where the weights depend on the updated belief regarding the ratio between the mean and the variance of the tangency portfolio. In particular, combinations of the mean and covariance matrices that imply larger ratios receive larger weights in the calculation of the MELO estimate. This is consistent with the logic of the optimization problem from a financial theory perspective, whose concern is to maximize the Sharpe ratio.
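The tangency case can be sketched in the same way. The example assumes the standard Jeffreys-prior posterior, $\Sigma^{-1} \mid R \sim$ Wishart$(T-1, S^{-1})$ and $\mu \mid \Sigma, R \sim N(\hat{\mu}, \Sigma/T)$, and reads the estimate as $\mathbb{E}[(1'\Sigma^{-1}\mu)\Sigma^{-1}\mu]/\mathbb{E}[(1'\Sigma^{-1}\mu)^2]$. Function names and parameter values are hypothetical.

```python
import numpy as np

def posterior_draws(r: np.ndarray, n_draws: int, rng):
    """Draw (mu, Sigma^{-1}) from the assumed Jeffreys-prior posterior."""
    t, n = r.shape
    mu_hat = r.mean(axis=0)
    centered = r - mu_hat
    s = centered.T @ centered
    chol_prec = np.linalg.cholesky(np.linalg.inv(s))
    z = rng.standard_normal((n_draws, t - 1, n)) @ chol_prec.T
    precisions = np.einsum("dti,dtj->dij", z, z)      # Wishart(T-1, S^{-1}) draws
    chol_sigma = np.linalg.cholesky(np.linalg.inv(precisions))
    e = rng.standard_normal((n_draws, n, 1))
    mus = mu_hat + (chol_sigma @ e)[..., 0] / np.sqrt(t)   # mu | Sigma ~ N(mu_hat, Sigma/T)
    return mus, precisions

def melo_tangency(mus: np.ndarray, precisions: np.ndarray) -> np.ndarray:
    """Ratio of Monte Carlo averages: E[(1'P mu) P mu] / E[(1'P mu)^2]."""
    x = np.einsum("dij,dj->di", precisions, mus)   # draws of Sigma^{-1} mu
    b = x.sum(axis=1)                              # draws of 1' Sigma^{-1} mu
    return (b[:, None] * x).sum(axis=0) / (b ** 2).sum()

# Simulated returns from hypothetical population parameters.
rng = np.random.default_rng(7)
mu_true = np.array([0.05, 0.08, 0.10])
sigma_true = np.array([[0.04, 0.01, 0.00],
                       [0.01, 0.09, 0.02],
                       [0.00, 0.02, 0.16]])
r_sim = rng.multivariate_normal(mu_true, sigma_true, size=120)

mus, precisions = posterior_draws(r_sim, 2000, rng)
w_melo_tan = melo_tangency(mus, precisions)
```

Draws for which $1'\Sigma^{-1}\mu$ is close to zero would produce extreme tangency weights, but they receive almost no weight in this average, which gives some intuition for the stability reported for the MELO estimator.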
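In addition, we propose MELO estimates for the weights of the Treynor-Black model, whose optimal solution is $\omega = H^{-1}\alpha/(1'H^{-1}\alpha)$, where $\alpha$ and $H$ are the intercept and the covariance matrix of the stochastic errors in the model $r_{it} = \alpha_i + \beta_i r_{Mt} + e_{it}$. In this framework, we set $\epsilon = (1'H^{-1}\alpha)\hat{\omega} - H^{-1}\alpha = (1'H^{-1}\alpha)(\hat{\omega} - \omega)$, and the posterior expected loss is

$$\mathbb{E}\left[\left(\frac{\alpha_A}{\sigma_A^2}\right)^2 (\hat{\omega} - \omega)'(\hat{\omega} - \omega) \mid R\right],$$

where $\alpha_A$ and $\sigma_A^2$ are the weighted alpha and weighted stochastic error variance of the Treynor-Black portfolio, so that $\alpha_A/\sigma_A^2 = 1'H^{-1}\alpha$.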

Corollary 3 The MELO estimate for the weights associated with the Treynor-Black portfolio is given by

$$\hat{\omega}^{*} = \frac{\mathbb{E}\left[(1'H^{-1}\alpha)\,H^{-1}\alpha \mid R\right]}{\mathbb{E}\left[(1'H^{-1}\alpha)^2 \mid R\right]}.$$

Proof This is a consequence of Proposition 1 taking $g(\theta) = H^{-1}\alpha/(1'H^{-1}\alpha)$ and $h(\theta) = (1'H^{-1}\alpha)^2$.
We observe in Corollary 3 that the MELO estimate for the weights of the Treynor-Black portfolio is a weighted average, where the weights depend on the updated belief regarding variables directly associated with the information ratio. This is consistent with the logic of maximizing the information ratio.
For the asymptotic results, we find that our MELO proposal has the same properties as the plug-in (ML) estimator.

Proposition 2
Assume that $g(\theta)$ and $h(\theta)$ are continuous constant order functions having nonzero first order, that the density function, $f(R \mid \theta)$, satisfies common assumptions of the maximum likelihood estimator [34], and that $\pi(\theta)$ satisfies the conditions of the Bernstein-von Mises theorem [35] (see Assumptions in supplementary material, subsection 2.1 for details). Then the MELO estimate is asymptotically equivalent to $g(\hat{\theta})$, where $\hat{\theta}$ and $\theta_0$ are the maximum likelihood estimator and "true" parameter, respectively. See supplementary material for a proof (subsection 2.3).
Consequently, the MELO estimates are asymptotically equivalent to the plug-in estimates $\hat{\Sigma}^{-1}1/(1'\hat{\Sigma}^{-1}1)$, $\hat{\Sigma}^{-1}\hat{\mu}/(1'\hat{\Sigma}^{-1}\hat{\mu})$ and $\hat{H}^{-1}\hat{\alpha}/(1'\hat{H}^{-1}\hat{\alpha})$ in the cases of the minimum variance portfolio, the tangency portfolio, and the Treynor-Black portfolio estimators, respectively. We find the same results for asymptotic distributions; that is, the MELO and plug-in estimators share the same limiting distribution around $g(\theta_0)$. However, we should take into account that in the case of the tangency portfolio, and probably in the Treynor-Black case, the moments of the exact distribution do not exist [7].

Simulation exercises
In particular, we calculate squared errors, where $\omega_i$ are the optimal trading weights using population parameters and $\hat{\omega}_i^s$ are the optimal trading weight estimates for each simulation using the different statistical approaches.
Table 1 shows descriptive statistics of the squared error for the global minimum variance portfolios. As we can see, the results are almost the same for each methodology, except for the naive weights, which give the lowest mean squared errors. The fact that we obtain almost the same results with most of the strategies is in agreement with the literature, given that the global minimum variance portfolio depends only on the covariance matrix, and this does not introduce excessive estimation error [15]. Additionally, we cannot forget that the main concern of the global minimum variance portfolio is to minimize portfolio variance. Table 2 shows the descriptive statistics of the portfolio variance for each methodology. We can observe that there are no meaningful differences between them.
The results change drastically when using the tangency portfolio because the expected return must also be estimated.
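The squared-error criterion used in the simulation exercises can be sketched as follows for the plug-in global minimum variance weights. The sketch assumes that the squared error of one simulation is the sum over assets of the squared differences between population and estimated weights; the number of simulations, the sample size and the population covariance matrix are hypothetical and do not reproduce the paper's design.

```python
import numpy as np

def gmv_weights(sigma: np.ndarray) -> np.ndarray:
    """Global minimum variance weights: Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    x = np.linalg.solve(sigma, np.ones(sigma.shape[0]))
    return x / x.sum()

def squared_error(w_true: np.ndarray, w_hat: np.ndarray) -> float:
    """Sum over assets of squared weight differences (assumed definition)."""
    return float(np.sum((w_true - w_hat) ** 2))

rng = np.random.default_rng(42)
sigma_true = np.array([[0.04, 0.01, 0.00],
                       [0.01, 0.09, 0.02],
                       [0.00, 0.02, 0.16]])
w_true = gmv_weights(sigma_true)        # weights under population parameters

n_sim, t_obs = 200, 120                 # hypothetical simulation settings
errors = []
for _ in range(n_sim):
    r = rng.multivariate_normal(np.zeros(3), sigma_true, size=t_obs)
    w_hat = gmv_weights(np.cov(r, rowvar=False))   # plug-in estimate
    errors.append(squared_error(w_true, w_hat))

mse = float(np.mean(errors))            # mean squared error across simulations
```

Replacing the plug-in step with another estimator and comparing the resulting `errors` arrays gives the kind of descriptive statistics reported in the tables.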
We can see the outcomes of our simulation exercises in Table 3. In particular, the mean squared error (MSE) and the range of variability associated with MELO are lower than those of the other methodologies. As expected, the plug-in and the non-informative Bayesian approaches obtain the same results. They are also the worst estimators in these settings. Table 4 shows the Sharpe ratio for each methodology, where we observe that the two MELO methodologies [...].
Table 5 shows descriptive statistics of the squared error associated with the Treynor-Black trading strategy. We observe that the plug-in approach has the highest MSE, followed by the non-informative Bayesian approach. The informative MELO presents the smallest MSE. Table 6 shows descriptive statistics of the information ratios. The informative Bayesian and MELO approaches have, on average, the best outcomes.
We also performed out-of-sample simulation exercises for the tangency portfolio and the Treynor-Black model, taking 12 periods as the investment horizon and holding the portfolios until the end of these periods. For this experiment, we use the average of the portfolio returns in the out-of-sample period as the hyperparameter for the informative prior on the expected return, and the average of the difference between the returns and the benchmark returns as the hyperparameter for the informative prior on the abnormal portfolio return in the Treynor-Black model. Information about the covariance matrix is not considered.
We calculate the mean Sharpe ratio using 1,000 simulations for each of the sample periods. Figure 1 shows the results. The MELO and the Bayesian approaches using informative priors always have the highest Sharpe ratios. However, the benefit of using informative priors depends on how well these priors are defined. For instance, in our experiments, we have the best possible priors because we use population parameters (in sample) and mean future returns (out of sample) as hyperparameters. Observe that the non-informative MELO is in third place when the sample size is 120, whereas the shrinkage estimator takes this position when the sample size is 240. Meanwhile, the non-informative Bayesian approach, which gives the same results as the plug-in approach, obtains the second worst results, followed by the naive approach, which obtains the worst performance.
Figure 2 shows the mean of the information ratios using 1,000 simulations for each of the sample periods. We observe that the MELO and the Bayesian approaches using informative priors almost always have the highest information ratios; the exception is the case of 50 assets with a sample size of 120, where only the MELO using informative priors has the highest information ratio. The MELO and the Bayesian approaches using non-informative priors have almost the same mean information ratios when using 10 assets. The MELO using non-informative priors has the second and third best information ratio when using 50 assets and a sample size of 120 and 240, respectively. Meanwhile, the plug-in and naive approaches obtain the worst results.
In subsection 2.4.2 of the supplementary material, we show robustness checks of our previous results regarding distributional assumptions.

Empirical study
We use weekly historical returns of 21 MSCI international equity indices: Canada, United States, Austria, Belgium, Denmark, Finland, France, Germany, Israel, Ireland, Italy, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, United Kingdom, Australia, Japan, and Singapore. The indices are adjusted for dividends and splits. We use weekly closing prices from June 2009 to June 2017. We calculate the excess returns with respect to the interest rate of the 3-month US treasury bill, which gives 417 weekly excess returns.
We use a rolling window with a one-year bandwidth to estimate all trading and statistical strategies, and we rebalance the trading strategies every three months. The first portfolio is set in June 2010 and is held constant up to September 2010; out-of-sample returns are calculated during this period. The second estimation is done in September 2010, the portfolio is held constant up to December 2010, the out-of-sample returns are calculated during this period, and so on. Therefore, we obtain 365 (417 − 52) out-of-sample returns for each strategy. Then, we calculate the number of times that each strategy gets the highest out-of-sample return; the resulting relative frequencies indicate how often each strategy is the most profitable. We repeat this process 100 times, so we have 100 sets of relative frequencies. In each iteration, we randomly draw equity indices to form three portfolio sizes (5, 10, and 15 assets).
We can see in Table 7 that, for the global minimum variance portfolio, the non-informative MELO got on average the highest out-of-sample return 45.50%, 47.09%, and 47.05% of the time using portfolio sizes equal to 5, 10, and 15 assets, respectively. On the other hand, the naive weights had on average the worst out-of-sample performance.
Table 8 shows the results for the tangency portfolio. We can observe that, on average, the non-informative MELO got the highest out-of-sample return 24.85%, 26.04%, and 27.73% of the time using 5, 10, and 15 assets, respectively. On the other hand, the naive weights had the worst performance on average with 5 and 10 assets, and the informative MELO was the worst using 15 assets.
In the Treynor-Black empirical study (see Table 9), the naive approach had the best out-of-sample performance, 22.59%, 24.39% and 27.48% of the time using 5, 10, and 15 assets, respectively. The non-informative Bayesian approach had the second best performance using 5 assets, and the non-informative MELO took this position using 10 and 15 assets. The informative Bayesian approach had the worst out-of-sample performance on average.
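The rolling-window scheme of the empirical study can be sketched as an index schedule. The sketch assumes a 52-week estimation window and a 13-week holding period as the weekly counterparts of "one year" and "three months"; the function name `rolling_schedule` is hypothetical.

```python
def rolling_schedule(n_obs: int, window: int, rebalance: int):
    """Return (est_start, est_end, hold_end) index triples; end indices are exclusive."""
    schedule = []
    start = window                      # first out-of-sample observation
    while start < n_obs:
        hold_end = min(start + rebalance, n_obs)
        # Estimate on the `window` observations that precede the holding period.
        schedule.append((start - window, start, hold_end))
        start = hold_end
    return schedule

# 417 weekly excess returns, one-year window, quarterly rebalancing (assumed 13 weeks).
schedule = rolling_schedule(n_obs=417, window=52, rebalance=13)
n_out_of_sample = sum(hold_end - est_end for _, est_end, hold_end in schedule)
```

Under these assumptions the schedule reproduces the 365 (417 − 52) out-of-sample returns mentioned in the text, with the last holding period shorter than a full quarter.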

Conclusions
In this paper, we proposed a decision theory framework to mitigate estimation risk. Our proposal has the same asymptotic statistical properties as the delta method (maximum likelihood) estimator. However, our simulation exercises suggest that our non-informative MELO proposal has better finite sample properties than the competing alternatives. The degree of estimation improvement depends on the trading strategy. In particular, it seems that the tangency portfolio can be better estimated using our approach. The non-informative MELO is the most realistic scenario in our simulation exercises, and it shows a lower degree of estimation variability and the lowest error. Our results are robust to heavy-tailed and serially correlated error distributions. Our empirical study suggests that the non-informative MELO is the best statistical strategy when the global minimum variance or tangency portfolios are used as the trading strategy. Meanwhile, the naive approach is the best in the case of the Treynor-Black trading strategy. However, the naive weights have the worst performance in the other trading strategies. It seems that the non-informative MELO is robust across these three trading strategies and to the implicit data generating process of the returns [Student's t and autoregressive process, AR(1)].
We should note at this point that real world applications are surrounded by a lot of noise, which invalidates many of the implicit assumptions in portfolio selection methodologies at the financial and statistical levels. Consequently, our recommendation is to implement all of these methodologies, identify which generates the best outcomes in a cross-validation dataset, and finally make decisions based on these results. To this end, we have developed a graphical user interface that helps to apply traditional approaches, as well as our proposal, which can be downloaded at https://besmarter-team.shinyapps.io/meloportfolio/. See supplementary material, section 3.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.