Mean–variance and mean–semivariance portfolio selection: a multivariate nonparametric approach

While univariate nonparametric estimation methods have been developed for estimating returns in mean-downside risk portfolio optimization, the problem of handling possible cross-correlations in a vector of asset returns has not been addressed in portfolio selection. We present a novel multivariate nonparametric portfolio optimization procedure using kernel-based estimators of the conditional mean and the conditional median. The method accounts for the covariance structure information from the full set of returns. We also provide two computational algorithms to implement the estimators. Via the analysis of 24 French stock market returns, we evaluate the in-sample and out-of-sample performance of both portfolio selection algorithms against optimal portfolios selected by classical and univariate nonparametric methods for three highly different time periods and different levels of expected return. By allowing for cross-correlations among returns, our results suggest that the proposed multivariate nonparametric method is a useful extension of standard univariate nonparametric portfolio selection approaches.


Introduction
Modern portfolio theory (MPT) is one of the most widely applied and recognized investment approaches used by investors today. Its theoretical basis is not very complicated and it is easy to apply, which is one of the reasons for its success. Rather than analyzing each investment individually, the key idea is to look at the portfolio as a whole and hence take into account the correlation structure between assets. In fact, MPT originates from the so-called mean-variance (MV) portfolio model of Markowitz (1952); see also Markowitz (1959, 1987). Optimal asset allocation is simply defined as the process of mixing asset weights of a portfolio within the constraints of an investor's capital resources to yield the most favorable risk-return trade-off. For typical risk-averse investors, an optimal combination of investment assets that gives a lower risk and a higher return is always preferred.
Corresponding author: Jan G. De Gooijer, j.g.degooijer@uva.nl
The variance, which measures deviations both above and below the mean return, is used as a risk measure in portfolio optimization to find the trade-off between risk and return. In order to compare investment options, Markowitz developed a mathematical framework to describe each investment or each asset class using unsystematic risk statistics. Indeed, by quantifying investment risk in the form of the mean, variance, and covariance of returns, Markowitz gave investors a mathematical approach to asset selection and portfolio management. He used these statistics to derive a so-called efficient frontier, or risk-reward equation, where every portfolio maximizes the expected return for a given variance, or equivalently minimizes variance for a given expected return. The efficient frontier flattens as it goes higher because there is a limit to the return an investor can expect. In a complete market without riskless lending and borrowing, a whole range of efficient asset portfolios with stochastic dominant features can be determined, which collectively delineates a convex MV frontier. However, variance is a questionable measure of risk for at least two reasons: (i) it makes no distinction between gains and losses; (ii) it is an appropriate measure of risk only when the underlying distribution of returns is symmetric with moments of order two. Markowitz (1959) recognized the asymmetrical inefficiencies inherent in the traditional MV model. To overcome this drawback, he suggested a downside risk (DSR) measured by the semivariance, which takes into consideration the asymmetry and the risk perception of investors. In fact, symmetry of asset return distributions has been widely rejected in practice; see, for example, Eftekhari and Satchell (1996). This fact justifies the use of semivariance when skewness or any other form of asymmetry is observed. The semivariance is often considered a more plausible risk measure than the variance.
However, mean-semivariance optimal portfolios cannot be easily derived as the semicovariance matrix is endogenous and not symmetric (see, e.g., Estrada 2004, 2008), and the classical Lagrangian method is not applicable to resolve the optimization problem. de Athayde (2001) has developed an algorithm to construct a mean-DSR portfolio frontier. Although this frontier is continuous and convex, it has several kinks due to the fact that asset returns are not identically distributed. Specifically, the frontier is made up of segments of parabolas (it is a piecewise quadratic function), each one becoming steeper and steeper as we move toward the extremes, in either direction. The segments are connected to each other, producing the successive kinks. The more observations we have, the more parabolas will appear and the smaller the segment of each will become. In other words, the number of convexity kinks increases with the number of observations, and the kinks get closer and closer to each other until, in the asymptotic limit, they no longer qualify as kinks and the whole portfolio frontier has a smooth shape comparable to the one obtained with the mean-variance portfolio model of Markowitz (1952).
Since, from a practitioner's point of view, we always have a finite number of observations, the DSR portfolio frontier will always exhibit such convexity kinks. In order to overcome this problem, de Athayde (2003) used nonparametric techniques to estimate a smooth continuous distribution of the portfolio in question. The major contribution is to replace returns by their mean kernel estimates (nonparametric mean regression). The advantage of this technique is that it provides an effect similar to the case in which observations are continuous, yielding a smoother portfolio frontier. Although this contribution is innovative, the paper is unstructured, with no simulations and no applications. Another neglected aspect which deserves serious attention concerns the theory: the way the estimators are written is confusing. Ben Salah et al. (2018a) revisited de Athayde's work by making it more rigorous. The nonparametric mean estimator is clarified, and its parameters are exhibited together with their practical choices. The corresponding optimization algorithms were coded in the R programming language and empirically validated on real data. Secondly, taking advantage of the robustness of the median, de Athayde's work is improved by proposing another method to optimize a portfolio. This new method is based on kernel-based nonparametric estimation of the conditional median. It is well known that the median is more robust than the mean and less sensitive to outliers. Returns are replaced by their nonparametric median estimators. Nevertheless, each computing step, and hence the convergence of the algorithm, takes a long time due to the construction of the estimators: the asset return estimates are derived from the estimates of the portfolio returns and they change at each computing step. Convergence is assumed to occur when the portfolio weights stay within some fixed tolerance value across successive iterations of the optimization stage.
Contrary to de Athayde (2003), Ben Salah et al. (2018a) proposed a new strategy to speed up the convergence of the algorithm: the idea is to start by estimating all the returns of each asset using kernel mean or median estimates. The portfolio return estimates are then obtained as a linear combination of the different asset return estimates, and the overall CPU load is drastically reduced.
Although some progress has been made, all methods introduced above are univariate and do not take into account any possible correlation between assets. In this paper, an alternative nonparametric method to derive portfolio frontiers is proposed. The proposed approach is multivariate and based on vectorial nonparametric estimation of returns using the multivariate mean and median. It has the advantage of taking into consideration the possible correlation between asset returns without specifying any particular dependence structure.
The paper is organized as follows. Section 2 presents standard models of portfolio optimization. Section 3 introduces the univariate and multivariate kernel-based estimators for the conditional mean and median. We also discuss their optimal parameter values. These estimators give an estimator of the DSR. Section 4, based on the previous estimators, exhibits two DSR optimization algorithms to get an optimal portfolio and the corresponding efficient frontier. Based on real data, Sect. 5 provides empirical support for the proposed multivariate nonparametric portfolio selection method and compares its efficiency with classical and univariate nonparametric portfolio selection methods. Section 6 contains some concluding remarks.

The M-V model
A mean-variance analysis is the process of weighing risk (variance) against expected return. By looking at the expected return and variance of an asset, investors attempt to make more efficient investment choices: seeking the lowest variance for a given expected return or seeking the highest expected return for a given variance level. More precisely, the classical MV portfolio optimization model aims at determining the proportions (weights) $\omega_i$ of a given capital to be invested in each asset $i$ belonging to a predetermined set or market, so as to minimize the risk of the return of the whole portfolio for a specified expected return $E^*$.
Let $m$ denote the number of assets, $\{R_i\}_{i=1}^m$ a set of random returns, and $\{R_{i,t}\}_{t=1}^T$ the set of observed returns of size $T > m \ge 2$. In addition, let $\mu_i$ be the expected return of asset $i$, and $\sigma_{ij}$ the $(i,j)$th element of the $m \times m$ variance-covariance matrix $M$ of asset returns. Then, for a required level $E^*$ of the portfolio return $R_{p,t} = \sum_{i=1}^m \omega_i R_{i,t}$, the MV model can be written as a convex quadratic optimization problem of the following form:
$$\min_{\omega}\ \omega' M \omega \quad \text{subject to} \quad \omega'\mu = E^*, \quad \omega'\mathbf{1} = 1, \tag{1}$$
where $\omega = (\omega_1, \ldots, \omega_m)'$, $\mu = (\mu_1, \ldots, \mu_m)'$, and $\mathbf{1}$ is an $m$-dimensional vector whose elements are all one. The optimization problem can be solved by a number of efficient algorithms with moderate computational effort, even for large values of $m$. Moreover, (1) can be solved for a specific value of $E^*$ or, alternatively, for several values of $E^*$, thus generating the minimum variance set. Using the Lagrange multiplier method, the $m \times 1$ vector of optimal (op) weights is given by
$$\omega_{op} = \frac{\alpha E^* - \lambda}{\alpha\theta - \lambda^2}\, M^{-1}\mu + \frac{\theta - \lambda E^*}{\alpha\theta - \lambda^2}\, M^{-1}\mathbf{1}, \tag{2}$$
where $\alpha = \mathbf{1}' M^{-1} \mathbf{1}$, $\lambda = \mu' M^{-1} \mathbf{1}$, and $\theta = \mu' M^{-1} \mu$. In this case, the efficient frontier curve is continuously convex. It is a parabola in mean-variance space (and a hyperbola in mean-standard deviation space). Either way, it is important to notice that the risk of the portfolio can be expressed as a function of the risk of the individual assets in the portfolio. Moreover, all the variances, covariances, and expected returns of the individual assets are exogenous variables.
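As an illustration of the Lagrangian solution of the MV problem, the following minimal Python sketch computes the closed-form weights for a target return, assuming the standard solution of a quadratic objective under the two equality constraints (target return and full investment). The input values `mu` and `M` are hypothetical and chosen only for illustration; NumPy is an assumption, since the paper's own code is written in R.

```python
import numpy as np

def mv_weights(mu, M, target):
    """Closed-form minimum-variance weights for expected return `target`,
    subject to w'mu = target and w'1 = 1 (short selling allowed)."""
    mu = np.asarray(mu, dtype=float)
    ones = np.ones_like(mu)
    Minv_mu = np.linalg.solve(M, mu)      # M^{-1} mu
    Minv_one = np.linalg.solve(M, ones)   # M^{-1} 1
    a = ones @ Minv_one                   # 1' M^{-1} 1
    b = mu @ Minv_one                     # mu' M^{-1} 1
    c = mu @ Minv_mu                      # mu' M^{-1} mu
    d = a * c - b ** 2                    # positive unless mu is proportional to 1
    return ((a * target - b) * Minv_mu + (c - b * target) * Minv_one) / d

# Hypothetical expected returns and covariance matrix (illustration only).
mu = np.array([0.05, 0.08, 0.12])
M = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
w = mv_weights(mu, M, target=0.09)
print(w, w.sum(), w @ mu)
```

The two printed checks confirm that both constraints hold for the computed weights.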

The DSR model
However, the variance is a questionable measure of risk for at least two reasons: (i) it makes no distinction between gains and losses; and (ii) it is an appropriate measure of risk only when the underlying distribution of returns is symmetric. Markowitz (1959) recognized the asymmetrical inefficiencies inherent in the traditional MV model. To overcome this drawback, he suggested using a downside risk (DSR) measure defined by
$$\mathrm{DSR} = \frac{1}{T}\sum_{t=1}^{T}\big[\min(R_{p,t} - B,\, 0)\big]^2, \tag{4}$$
where $B$ is any benchmark return chosen by the investor. The benchmark can be equal to 0, the risk-free rate $R_f$, any stock market index, or the mean $\mu_p$ of the portfolio return $R_{p,t}$. When $B = \mu_p$, the DSR is a downside risk measure called the semivariance.
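To make the downside risk concrete, here is a small Python sketch. It assumes the DSR is the average of the squared shortfalls of the portfolio return below the benchmark B (returns above B contribute zero); the return series is hypothetical.

```python
import numpy as np

def downside_risk(portfolio_returns, benchmark):
    """Average squared shortfall of the returns below `benchmark`."""
    r = np.asarray(portfolio_returns, dtype=float)
    shortfall = np.minimum(r - benchmark, 0.0)   # zero whenever r >= benchmark
    return np.mean(shortfall ** 2)

# Hypothetical portfolio returns (illustration only).
r = np.array([0.02, -0.01, 0.03, -0.02])
dsr_zero = downside_risk(r, 0.0)          # benchmark B = 0
semivariance = downside_risk(r, r.mean()) # benchmark B = mean return
print(dsr_zero, semivariance)
```

With B equal to the sample mean, the function returns the semivariance; only the two negative deviations from the mean enter the sum.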
The DSR is a more robust measure of asset risk that focuses only on the risks below a target rate of return. This measure of risk is more plausible for several reasons. First, investors obviously do not dislike upside volatility; they only dislike downside volatility. Second, the DSR measure is more useful than the variance when the underlying distribution of returns is asymmetric and just as useful when the underlying distribution is symmetric. In other words, the DSR is at least as useful a measure of risk as the variance. Finally, the DSR combines the information provided by two statistics, variance and skewness, hence making it possible to use a one-factor model to estimate required returns; see, e.g., Nawrocki (1999) and Estrada (2006). The corresponding optimization problem can be written as follows:
$$\min_{\omega}\ \omega' M_{SR}\, \omega \quad \text{subject to} \quad \omega'\mu = E^*, \quad \omega'\mathbf{1} = 1, \tag{5}$$
where the elements of the $m \times m$ matrix $M_{SR}$ are given by
$$(M_{SR})_{i,j} = \frac{1}{T}\sum_{t \in T_0} (R_{i,t} - B)(R_{j,t} - B),$$
and where $T_0$ is the set of periods in which the portfolio underperforms the target return $B$.
The above framework provides an exact estimate of the portfolio semivariance. However, finding the portfolio with a minimum DSR is not an easy task. The major obstacle is that the semicovariance matrix $M_{SR}$ is endogenous; that is, a change in weights affects the periods in which the portfolio underperforms the target rate of return, which in turn affects the elements of $M_{SR}$. To get an approximate solution of (5), Hogan and Warren (1972) define the sample semicovariance between assets $i$ and $j$ as
$$M^{HW}_{i,j}(R_f) = \frac{1}{T}\sum_{t=1}^{T} (R_{i,t} - R_f)\,\min(R_{j,t} - R_f,\, 0).$$
This definition has two drawbacks. The benchmark return is limited to the risk-free rate $R_f$ and cannot be tailored to any desired benchmark. Moreover, it is usually the case that $M^{HW}_{i,j}(\cdot) \neq M^{HW}_{j,i}(\cdot)$. This second characteristic is particularly limiting both formally, i.e., the semicovariance matrix is usually asymmetric, and intuitively, i.e., it is not clear how to interpret the contribution of assets $i$ and $j$ to the risk of the portfolio. Further, the optimization problem (1) is not quadratic anymore, which may cause optimization difficulties.
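The asymmetry of the Hogan and Warren semicovariance can be checked numerically. The sketch below assumes the sample version in which entry (i, j) averages the product of the excess return of asset i and the negative part of the excess return of asset j; the return matrix `R` is hypothetical.

```python
import numpy as np

def hw_semicov(R, rf):
    """Hogan-Warren style sample semicovariance matrix (assumed sample form).
    Entry (i, j) = mean over t of (R[t, i] - rf) * min(R[t, j] - rf, 0)."""
    X = np.asarray(R, dtype=float) - rf
    return X.T @ np.minimum(X, 0.0) / X.shape[0]

# Hypothetical T x m return matrix with T = 3 periods and m = 2 assets.
R = np.array([[ 0.02, -0.01],
              [-0.03,  0.01],
              [ 0.01, -0.02]])
H = hw_semicov(R, rf=0.0)
print(H[0, 1], H[1, 0])   # the two off-diagonal entries differ
```

Swapping the roles of the two assets changes which asset's downside is being conditioned on, which is why the off-diagonal entries do not match.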
In order to overcome these drawbacks, Estrada (2004, 2008) defines the sample semicovariance between assets $i$ and $j$ with respect to a benchmark $B$ as
$$M_{i,j}(B) = \frac{1}{T}\sum_{t=1}^{T} \min(R_{i,t} - B,\, 0)\,\min(R_{j,t} - B,\, 0).$$
This definition can be tailored to any desired $B$ and generates a symmetric ($M_{i,j} = M_{j,i}$), nonnegative definite semicovariance matrix. Next, the solution of the MV problem follows directly.
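A short Python sketch of the Estrada-type semicovariance matrix follows, assuming entry (i, j) averages the product of the negative parts of the two assets' deviations from B. The data are hypothetical; the checks illustrate the symmetry and nonnegative definiteness mentioned in the text.

```python
import numpy as np

def estrada_semicov(R, benchmark):
    """Semicovariance matrix built from the negative parts of R - benchmark."""
    D = np.minimum(np.asarray(R, dtype=float) - benchmark, 0.0)
    return D.T @ D / D.shape[0]

# Hypothetical T x m return matrix (illustration only).
R = np.array([[ 0.02, -0.01],
              [-0.03,  0.01],
              [ 0.01, -0.02]])
S = estrada_semicov(R, benchmark=0.0)
eigenvalues = np.linalg.eigvalsh(S)   # all >= 0 up to rounding
print(S, eigenvalues)
```

Because the matrix is a scaled cross-product D'D, it is symmetric and nonnegative definite by construction, and it does not depend on the portfolio weights.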
To get a direct solution of (5), a simple and iterative optimization algorithm was developed by de Athayde (2001) that ensures convergence to the optimal solution. However, due to some properties of the frontier, when only a finite number of observations is available, the portfolio frontier exhibits convexity kinks. To address this issue, de Athayde (2003) generalizes his algorithm by introducing univariate kernel-based mean estimators of the returns. This idea provides an effect similar to the case in which observations are continuous and, moreover, it establishes a smooth portfolio frontier comparable to that obtained by the MV optimization method. Although de Athayde's contribution is innovative, his two papers are unstructured, with no simulations and no applications. Another neglected aspect which deserves serious attention concerns the theory: the setup of the estimators is confusing. Motivated by these observations, Ben Salah et al. (2018a) revisited de Athayde's work by making it more rigorous. In particular, they clarify the kernel-based mean estimator, exhibit its tuning parameters, and discuss implementation issues. Then, by taking advantage of the robustness of the median, these authors improved on de Athayde's results by replacing the kernel-based mean return estimators by their kernel-based median counterparts.
Valuable as the univariate kernel-based estimation methods can sometimes be in portfolio selection, they do not take into account the possible cross-correlations between asset returns. Indeed, it is well known that returns are not mutually independent. It is therefore worth proposing multivariate, or vector, nonparametric techniques to estimate smooth continuous distributions of a set of portfolios under study. This is the topic of the current paper. In particular, we focus on both univariate and multivariate kernel-based mean and median estimators of {R i,t }. Next, we use the smoothed continuous distributions of the returns to optimize their corresponding DSR. Finally, given an optimal DSR, we discuss the construction of a new and smooth portfolio frontier, thus providing a more flexible framework for portfolio selection.

Nonparametric approaches
In this section, we first introduce the univariate nonparametric estimators of returns of de Athayde (2001, 2003) and Ben Salah et al. (2018a). Then we propose two multivariate nonparametric, kernel-based, estimators. Throughout the paper we assume that the vector of random returns $R_t = (R_{1,t}, \ldots, R_{m,t})'$ ($t \in \mathbb{Z}$) consists of strictly stationary and ergodic time series processes taking values in $\mathbb{R}^m$.

Univariate nonparametric return estimation
Let $K(\cdot)$ be a probability density function satisfying some regularity conditions, and $h_T > 0$ the bandwidth or smoothing parameter. Here, $K : \mathbb{R} \to \mathbb{R}$ is a so-called kernel function. Common assumptions on the bandwidth are $h \equiv h_T \to 0$ and $T h_T \to \infty$ as $T \to \infty$. In addition, let $\{R_{i,t}\}_{t=1}^T$ be a sequence of observations on the process $\{R_{i,t}, t \in \mathbb{Z}\}$ ($i = 1, \ldots, m$), with each $i$th subprocess having a continuous distribution function and a proper density. Then, at time $t$, a kernel smoother of the mean (Mn) of $R_{i,t}$ is given by
$$\hat{R}^{Mn}_{i,t} = \frac{\sum_{\ell=1}^{T} R_{i,\ell}\, K\big((R_{i,t} - R_{i,\ell})/h\big)}{\sum_{\ell=1}^{T} K\big((R_{i,t} - R_{i,\ell})/h\big)}. \tag{9}$$
Note that (9) is essentially a weighted average of $\{R_{i,\ell}\}_{\ell=1}^T$ in which the weight given to each $R_{i,\ell}$ decreases with its distance from the observation in question.
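The kernel smoother of the mean can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes a Gaussian kernel and a fixed, user-supplied bandwidth `h` (the paper uses the Sheather-Jones bandwidth, which is not implemented here), and the return series is hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_mean(returns, h):
    """Kernel-weighted average of the series, evaluated at each observation.
    weights[t, l] = K((r_t - r_l) / h), so nearby returns get more weight."""
    r = np.asarray(returns, dtype=float)
    weights = gaussian_kernel((r[:, None] - r[None, :]) / h)
    return weights @ r / weights.sum(axis=1)

# Hypothetical return series (illustration only).
r = np.array([-0.02, -0.01, 0.0, 0.01, 0.03])
smoothed = kernel_mean(r, h=0.01)
print(smoothed)
```

A very small bandwidth reproduces the observations themselves, while a very large bandwidth shrinks every smoothed value toward the sample mean; practical choices lie in between.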
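The disadvantage of (9) is that it is sensitive to outliers and may be inappropriate in some cases, such as when the distribution of $\{R_{i,t}, t \in \mathbb{Z}\}$ is heavy-tailed or asymmetric. In those cases, it may be sensible to use a univariate kernel-based estimator of the median (Mdn) rather than an estimator of the mean. In particular, at time $t$, and given an $L_1$-loss function, the estimator is defined as
$$\hat{R}^{Mdn}_{i,t} = \arg\min_{z \in \mathbb{R}} \sum_{\ell=1}^{T} |R_{i,\ell} - z|\, K\big((R_{i,t} - R_{i,\ell})/h\big). \tag{10}$$
Alternatively, one can obtain $\hat{R}^{Mdn}_{i,t}$ by solving the equation
$$F_T(z \mid R_{i,t}) = \frac{\sum_{\ell=1}^{T} I(R_{i,\ell} \le z)\, K\big((R_{i,t} - R_{i,\ell})/h\big)}{\sum_{\ell=1}^{T} K\big((R_{i,t} - R_{i,\ell})/h\big)} = \frac{1}{2},$$
where $F_T(\cdot \mid \cdot)$ is an estimator of the conditional distribution function of $R_{i,\ell}$ given $R_{i,t}$, and $I(\cdot)$ is the indicator function.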
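A kernel-based median can be computed as a weighted median: the smallest observed return at which the kernel-weighted empirical distribution function reaches one half. The Python sketch below assumes a Gaussian kernel (up to a constant, which cancels) and a fixed bandwidth `h`; the series, including its single large outlier, is hypothetical.

```python
import numpy as np

def kernel_median(returns, h):
    """Kernel-weighted median of the series, evaluated at each observation."""
    r = np.asarray(returns, dtype=float)
    r_sorted = np.sort(r)
    out = np.empty_like(r)
    for t, r_t in enumerate(r):
        # Gaussian kernel weights centred at r_t (normalising constant cancels).
        w = np.exp(-0.5 * ((r_t - r_sorted) / h) ** 2)
        cdf = np.cumsum(w) / w.sum()          # weighted empirical CDF
        # First sorted return whose cumulative weight reaches one half.
        out[t] = r_sorted[np.searchsorted(cdf, 0.5)]
    return out

# Hypothetical returns with one outlier (0.30).
r = np.array([-0.05, 0.01, 0.02, 0.03, 0.30])
wide = kernel_median(r, h=1e6)     # nearly equal weights: ordinary median
narrow = kernel_median(r, h=1e-4)  # weights concentrate on each point itself
print(wide, narrow)
```

With nearly equal weights the estimate is the ordinary sample median, which is unaffected by the outlier, whereas the sample mean of this series is pulled up to 0.062.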

Remark 1
The kernel $K(\cdot)$ determines the shape of the weighting function. The use of symmetric and unimodal kernels is standard in nonparametric estimation. Throughout the empirical analysis we adopt the Gaussian kernel. For practical problems the choice of the kernel is not as crucial as the choice of the bandwidth. For all univariate and multivariate kernel-based methods, we use the so-called Sheather-Jones bandwidth, which is generally regarded as a satisfactory choice. Under certain mixing conditions on the process $\{R_{i,t}, t \in \mathbb{Z}\}$, uniform convergence rates and asymptotic normality of the estimators (9) and (10) can be proved; see, e.g., De Gooijer (2017) and Gannoun et al. (2003) and the references therein.

Multivariate nonparametric return estimation
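Multivariate kernel-based mean and median estimation is a straightforward extension of plain univariate estimation. Let $\mathbf{K}(\cdot) : \mathbb{R}^m \to \mathbb{R}$ be a multivariate kernel density function with a symmetric positive definite $m \times m$ matrix $H$ known as the bandwidth matrix. Then, at time $t$, the multivariate kernel-based mean estimator (M-Mn) and the multivariate kernel-based median estimator (M-Mdn) of the $m$-dimensional process $\{R_t, t \in \mathbb{Z}\}$ are respectively defined as
$$\hat{R}^{M\text{-}Mn}_{t} = \frac{\sum_{\ell=1}^{T} R_\ell\, \mathbf{K}\big(H^{-1}(R_t - R_\ell)\big)}{\sum_{\ell=1}^{T} \mathbf{K}\big(H^{-1}(R_t - R_\ell)\big)}, \qquad \hat{R}^{M\text{-}Mdn}_{t} = \arg\min_{z \in \mathbb{R}^m} \sum_{\ell=1}^{T} \|R_\ell - z\|\, \mathbf{K}\big(H^{-1}(R_t - R_\ell)\big),$$
where $\|\cdot\|$ is a matrix or vector norm. In this paper, we adopt the Euclidean norm. Computing the above estimators requires a numerical procedure, which becomes increasingly difficult to implement as the dimension $m$ increases. As a simplification, the matrix $H$ is often taken to be a diagonal matrix with values $\{h_i \equiv h_{i,T}\}_{i=1}^m$ such that $h_i > 0$, $h_i \to 0$ and $T h_i \to \infty$ as $T \to \infty$. In addition, it is common to consider a product of $m$ univariate kernel functions, i.e., $\mathbf{K}(u) = \prod_{i=1}^m K(u_i)$. Then we can write the above estimators as follows:
$$\hat{R}^{M\text{-}Mn}_{t} = \frac{\sum_{\ell=1}^{T} R_\ell \prod_{i=1}^{m} K\big((R_{i,t} - R_{i,\ell})/h_i\big)}{\sum_{\ell=1}^{T} \prod_{i=1}^{m} K\big((R_{i,t} - R_{i,\ell})/h_i\big)}, \qquad \hat{R}^{M\text{-}Mdn}_{t} = \arg\min_{z \in \mathbb{R}^m} \sum_{\ell=1}^{T} \|R_\ell - z\| \prod_{i=1}^{m} K\big((R_{i,t} - R_{i,\ell})/h_i\big).$$
A kernel-based estimate of $R_{p,t}$ is given by
$$\hat{R}_{p,t} = \sum_{i=1}^{m} \omega_i \hat{R}_{i,t},$$
with $\hat{R}_{i,t}$ the $i$th component of the chosen estimator. Given this estimate, a kernel-based estimate of the DSR follows by replacing $R_{p,t}$ in (4) by $\hat{R}_{p,t}$.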
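The product-kernel versions of the multivariate estimators can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: a Gaussian product kernel, fixed bandwidths `h` (one per asset), and Weiszfeld iterations as the numerical procedure for the weighted spatial (Euclidean) median. The return matrix is simulated.

```python
import numpy as np

def product_kernel_weights(R, h):
    """W[t, l] = prod_i K((R[t, i] - R[l, i]) / h[i]) for a Gaussian K
    (normalising constants cancel in the estimators below)."""
    Z = (R[:, None, :] - R[None, :, :]) / h          # shape T x T x m
    return np.exp(-0.5 * np.sum(Z ** 2, axis=2))

def multivariate_kernel_mean(R, h):
    W = product_kernel_weights(R, h)
    return W @ R / W.sum(axis=1, keepdims=True)

def multivariate_kernel_median(R, h, n_iter=200, eps=1e-12):
    """Weighted spatial median via Weiszfeld iterations (assumed solver)."""
    W = product_kernel_weights(R, h)
    Z = W @ R / W.sum(axis=1, keepdims=True)         # start from the kernel mean
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - R[None, :, :], axis=2)
        A = W / np.maximum(dist, eps)                # guard against zero distance
        Z = A @ R / A.sum(axis=1, keepdims=True)
    return Z

# Simulated T x m returns (illustration only).
rng = np.random.default_rng(1)
R = rng.normal(0.0, 0.01, size=(50, 2))
h = np.array([0.01, 0.01])
R_mean = multivariate_kernel_mean(R, h)
R_median = multivariate_kernel_median(R, h)
print(R_mean[:2], R_median[:2])
```

Each row of the output is a smoothed version of the full return vector at time t, so the weights depend on the distance between whole vectors rather than on one asset at a time.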

Computational algorithms
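Recall the optimization problem in (5). Given the set of kernel-based estimates $\{\hat{R}_{i,t}\}$, the matrix $M_{SR}$ has the following elements:
$$(M_{SR})_{i,j} = \frac{1}{T}\sum_{t \in T_0} (\hat{R}_{i,t} - B)(\hat{R}_{j,t} - B).$$
Using the above framework, we propose two algorithms for the optimization of (5).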
Algorithm I
(i) Given an initial weight vector $\omega_0$, compute the portfolio return at time $t$, i.e., $R^{(0)}_{p,t} = \omega_0' R_t$. At each time point $t$, compute the two univariate nonparametric (mean and median) estimators. In addition, compute the multivariate nonparametric estimators, where $R_\ell = (R_{1,\ell}, \ldots, R_{m,\ell})'$ and $\mathbf{K}(\cdot)$ is the $m$-variate kernel. Let $S_0 = \{t : (\hat{R}^{(0)}_{p,t} - B) < 0\}$. Next, similar to (2), compute the $m \times 1$ portfolio weight vector $\omega_1$.
(ii) Given $\omega_1$, compute the portfolio return at time $t$, i.e., $R^{(1)}_{p,t} = \omega_1' R_t$. Next, compute the univariate nonparametric estimators. Also, compute the multivariate nonparametric estimators. Next, similarly to $S_0$ in step (i), construct the set $S_1 = \{t : (\hat{R}^{(1)}_{p,t} - B) < 0\}$ and let $|S_1|$ denote its cardinality. Finally, compute the $m \times 1$ portfolio weight vector $\omega_2$, using the $m \times 1$ vector $\mu^{(1)} = (\mu^{(1)}_1, \ldots, \mu^{(1)}_m)'$ of the mean returns associated with the set $S_1$.
(iii) Continue with step (ii) until, at iteration step $u + 1$, the matrix $M_{u+1}$ is the same as $M_u$ or, alternatively, $\|\omega_{u+1}\| \approx \|\omega_u\|$. In that case, the weight vector of the minimum DSR portfolio, with expected return $E^*$, is $\omega_{u+1}$, computed as in (2) with $M$ and $\mu$ replaced by $M_u$ and $\mu^{(u)}$, where $\mu^{(u)} = (\mu^{(u)}_1, \ldots, \mu^{(u)}_m)'$ is an $m \times 1$ vector of the mean returns associated with the set $S_u$.
(iv) Finally, employ the quantities in step (iii) to approximate the DSR, calling the resulting value $\mathrm{DSR}^{(I)}$.
Clearly, with Algorithm I the estimation of the portfolio returns changes at each iteration step $u$. This may be time-consuming, in particular when the set of potential asset returns contains many variables. Algorithm II avoids this problem.
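The iterative idea behind steps (i) to (iv) can be sketched in Python as follows. This is not the authors' implementation. It is a simplified sketch under loudly stated assumptions: the matrix `R_hat` of (possibly kernel-smoothed) asset returns is computed once and held fixed, which is closer in spirit to the faster strategy described in the Introduction than to the per-iteration re-estimation of Algorithm I; the mean vector is the full-sample mean; the start is an equal-weight portfolio; and the function names are hypothetical. The data are simulated.

```python
import numpy as np

def constrained_weights(mu, M, target):
    """Weights minimising w'Mw subject to w'mu = target and w'1 = 1."""
    ones = np.ones_like(mu)
    x, y = np.linalg.solve(M, mu), np.linalg.solve(M, ones)
    a, b, c = ones @ y, mu @ y, mu @ x
    return ((a * target - b) * x + (c - b * target) * y) / (a * c - b ** 2)

def min_dsr_weights(R_hat, target, benchmark=0.0, max_iter=100, tol=1e-10):
    """Iterate: find underperforming periods, rebuild the semicovariance
    matrix from them, and re-solve for the weights until they stabilise."""
    T, m = R_hat.shape
    mu = R_hat.mean(axis=0)              # assumption: full-sample mean returns
    w = np.full(m, 1.0 / m)              # assumption: equal-weight start
    for _ in range(max_iter):
        below = (R_hat @ w - benchmark) < 0      # periods in the current set S_u
        D = R_hat[below] - benchmark
        M_u = D.T @ D / T                # needs at least m periods in S_u
        w_new = constrained_weights(mu, M_u, target)
        if np.linalg.norm(w_new - w) < tol:
            return w_new, M_u
        w = w_new
    return w, M_u                        # may not have converged (can oscillate)

# Simulated returns standing in for kernel-smoothed estimates.
rng = np.random.default_rng(0)
R_hat = rng.normal(0.0005, 0.01, size=(250, 4))
w, M_u = min_dsr_weights(R_hat, target=0.001)
print(w, w.sum(), w @ R_hat.mean(axis=0))
```

The iteration cap matters in practice: the set of underperforming periods depends on the weights, so the scheme can cycle between two sets instead of settling.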

Efficient frontier construction
In order to build the portfolio frontier, there is a need to select some other points in the efficient set. As the DSR is a function of the expected return $E^*$, by varying this value one can obtain a new value of $\mathrm{DSR}^{(\cdot)}(\cdot)$ and, hence, the efficient frontier. The equation for $\mathrm{DSR}^{(\cdot)}(\cdot)$ shows that, while the final matrix $M_u$ does not change, $\mathrm{DSR}^{(\cdot)}(\cdot)$ is a quadratic function of $E^*$. However, if $E^*$ changes considerably, both algorithms end up with a new matrix $M_u$, and therefore a new quadratic function. Thus, the portfolio frontier will be described by a sequence of segments of different quadratic functions. The more assets are used, the smoother the portfolio frontier in question will be, creating a similar effect as if one were adding more observations to the return series. In addition, it is known that the nonparametric, kernel-based, technique creates a continuous distribution of the returns, and hence gives rise to a new portfolio frontier with a smoother shape.
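The claim that the minimized risk is quadratic in the target return while the matrix stays fixed can be verified numerically. The sketch below uses a fixed, hypothetical matrix `M` standing in for a final semicovariance matrix, and compares the risk obtained from the optimal weights with the quadratic expression that follows from the Lagrangian solution of the equality-constrained problem.

```python
import numpy as np

# Hypothetical mean vector and fixed risk matrix (illustration only).
mu = np.array([0.05, 0.08, 0.12])
M = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
ones = np.ones(3)
x, y = np.linalg.solve(M, mu), np.linalg.solve(M, ones)
a, b, c = ones @ y, mu @ y, mu @ x
d = a * c - b ** 2

targets = np.linspace(0.05, 0.12, 8)               # grid of target returns
# Quadratic form of the minimised risk as a function of the target.
risk_closed = (a * targets ** 2 - 2 * b * targets + c) / d
# Same quantity computed directly from the optimal weights.
risk_direct = np.array([
    w @ M @ w
    for w in (((a * E - b) * x + (c - b * E) * y) / d for E in targets)
])
print(np.column_stack([targets, risk_closed]))
```

Each pair (target, risk) is one point of the frontier segment associated with this particular matrix; a different matrix yields a different parabola.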
Remark 3 Algorithms I and II both allow short selling, i.e., there are no constraints on the weights $\omega_i$ ($i = 1, \ldots, m$). With short-selling constraints the additional condition $\omega_i \ge 0$ is needed. There are many available software packages to solve the corresponding optimization problem; see, for instance, the R package quadprog.

Data and methods
To illustrate the proposed portfolio selection methods, we present results for a set of French stock prices taken from Thomson Reuters. The data set consists of $m = 24$ daily stock closing prices $\{P_{i,t}\}$, covering the time period 07/01/2000-06/01/2017. These assets belong to different sectors: banks, insurance, industry, energy, technology, and observations. For all series, the returns are computed as $R_{i,t} = (P_{i,t} - P_{i,t-1})/P_{i,t-1}$ with $P_{i,t}$ adjusted for dividend payments and stock splits. Figure 1 shows a graph of the CAC 40 index for the complete time period. We see a strong decrease of the index in mid 2008 during the financial crisis. It is also evident that after 2012 a slow recovery of the French economy sets in, as indicated by an upward trend. The grey-shaded areas depict the following three subperiods under investigation: "calm" (I), covering the year 2004 ($T = 259$), "crisis" (II), covering the year 2008 ($T = 256$), and "good" (III), covering the year 2013 ($T = 255$). The number of contemporaneous cross-correlations having a p value smaller than 0.05, out of a total of 276 p values for each subperiod, is 222 (calm), 276 (crisis), and 274 (good). Indeed, there are strong correlations between the individual stocks, which in all cases are positive. These results suggest the use of portfolio selection methods that explicitly take account of cross-correlations.
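For concreteness, the return definition and the count of asset pairs can be reproduced with a few lines of Python. The price matrix below is hypothetical (three days, two assets); only the formula and the value m = 24 come from the text.

```python
import numpy as np

def simple_returns(prices):
    """Simple returns (P_t - P_{t-1}) / P_{t-1}, computed column by column."""
    P = np.asarray(prices, dtype=float)
    return (P[1:] - P[:-1]) / P[:-1]

# Hypothetical closing prices: rows are days, columns are assets.
prices = np.array([[100.0, 50.0],
                   [110.0, 45.0],
                   [ 99.0, 54.0]])
R = simple_returns(prices)

m = 24
n_pairs = m * (m - 1) // 2   # number of distinct contemporaneous cross-correlations
print(R, n_pairs)
```

The pair count equals the 276 p values reported for each subperiod.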
With respect to the portfolio optimization, we employ the following seven methods: (1) Naive; (3) Univariate nonparametric mean DSR, that is, using (4) with $R_{p,t}$ replaced by $\hat{R}_{p,t} = \sum_{i=1}^m \omega_i \hat{R}^{Mn}_{i,t}$, where $\hat{R}^{Mn}_{i,t}$ is defined by (9); (4) Univariate nonparametric median DSR, that is, using (4) with $R_{p,t}$ replaced by $\hat{R}_{p,t} = \sum_{i=1}^m \omega_i \hat{R}^{Mdn}_{i,t}$, where $\hat{R}^{Mdn}_{i,t}$ is defined by (10). The variant based on Algorithm I is not included in our study. The reason is that for this particular data set Algorithm I does not converge. That is, it oscillates between two different optimal portfolios, albeit equivalent in terms of returns, for the data under study. However, it should be said that theoretically both Algorithms I and II converge. Figure 2 displays the efficient frontier curves of the multivariate nonparametric median DSR method 6(ii) with no constraints on the portfolio weights. All other nonparametric portfolio optimization methods produced similar curves which are not distinguishable from the ones shown in Fig. 2, and hence they are not shown here. Compared to the classical MV portfolio optimization, the multivariate nonparametric efficient frontier based on median estimation dominates the other curves for any expected return. Long-short positions can be based on the sign of the portfolio weights. Across all nonparametric methods, 8 assets have negative signs for period I, 12 have negative signs for period II, and 11 assets have negative signs for period III. Given these results, one may profit from the evolution of the stock market by taking short positions on assets with negative weights.

Efficient frontiers
Next, we investigate optimization methods 2-6 under the short-selling constraint ($\omega_i \ge 0$). Short selling is a strategy for speculating on a decline in the market value of an asset. It can also be used to hedge long positions. To be more specific, the strategy involves borrowing a stock from a broker and then selling it in the market. The stock is bought back and returned to the broker at a later date, which is called covering the short. If the stock price drops, the short seller buys back at a lower price and makes a profit.
Under the short-selling constraint, the results of the portfolio optimization methods differ for each subperiod and for different levels of expected return. For period I, the optimal portfolio has nine nonzero weights corresponding to the following stocks: ArcelorMittal, Klépierre, Pernod Ricard, Safran, Technip, Total, Unibail-Rodamco, Vinci, and Danone. In contrast, for period II the following four stocks have nonzero weights: Air Liquide, Orange, Sodexo, and Danone. Finally, for period III there are eight stocks with nonzero weights: Air Liquide, Klépierre, Pernod Ricard, Publicis Groupe, Safran, Sanofi, Sodexo, and Danone. Interestingly, for all three time periods, Danone is part of the optimal portfolio for all levels of expected return.

Forecasting evaluation
As a robustness check, this section reports the out-of-sample forecasting performance of portfolio selection methods 1-6 against the CAC 40 index returns in terms of mean forecast error (MFE), mean squared forecast error (MSFE), and mean absolute forecast error (MAFE). We adopt a rolling-window scheme with a forecast horizon of one quarter. In particular, all optimal portfolios are trained on one quarter and predictions are made for the next quarter. For instance, for period I, the first quarter of 2004 is used to test the performance of an optimal portfolio composed of individual securities selected in the 4th quarter of 2003. The length of the in-sample estimation period balances the need for enough observations to obtain reasonably accurate nonparametric estimates against our desire for a relatively long out-of-sample period for forecast evaluation. With such a design, a total of 252 forecasts are generated.
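The three performance measures can be computed as in the following Python sketch. The sign convention (actual minus forecast) is an assumption, since the text does not state it, and the two short series are hypothetical.

```python
import numpy as np

def forecast_errors(actual, forecast):
    """MFE, MSFE and MAFE, with errors defined as actual minus forecast
    (assumed convention)."""
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return {
        "MFE": e.mean(),            # mean forecast error (bias)
        "MSFE": (e ** 2).mean(),    # mean squared forecast error
        "MAFE": np.abs(e).mean(),   # mean absolute forecast error
    }

# Hypothetical index returns and portfolio-based forecasts.
actual = np.array([0.01, -0.02, 0.03])
forecast = np.array([0.00, -0.01, 0.01])
metrics = forecast_errors(actual, forecast)
print(metrics)
```

MFE can be close to zero even when individual errors are large, because positive and negative errors cancel; MSFE and MAFE do not have this property.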
The results are summarized in Table 1. There are some observations worth noting. For Algorithm II the forecasts obtained from the nonparametric mean and median DSR methods are equivalent in terms of the lowest values of the three performance measures, and across all three time periods. For period II, all multivariate methods perform considerably better than their univariate counterparts. This indicates that there is some gain in using multivariate portfolio selection methods that account for possible cross-correlations between asset returns. On the other hand, the forecasting results for the univariate and multivariate methods are qualitatively similar for periods I and III. Finally, we see that it is easy to outperform the naive benchmark method.

Conclusion
In this paper, we propose multivariate nonparametric estimators of the conditional mean and conditional median for mean-DSR optimization. In particular, the estimators account for possible interrelationships between asset returns, as for instance quantified by cross-correlations. To implement the proposed estimators, we provide two computational algorithms for efficient portfolio selection. Via the analysis of 24 French stock market returns, we evaluate the in-sample performance of classical and nonparametric portfolio selection methods with and without restrictions on the portfolio weights. In addition, we compare the out-of-sample performance of seven portfolio selection methods in forecasting the CAC 40 index returns during three highly different time periods. From a theoretical point of view, it is clear that the proposed nonparametric multivariate methods are more natural than their univariate counterparts when asset returns are correlated. This particular extension has not been considered in the current literature. Algorithm II provides an efficient and simple tool for this purpose. Moreover, the algorithm allows for heavy-tailed or asymmetric distributions of portfolio returns by considering univariate and multivariate kernel-based median estimators. Finally, it is worth mentioning that the computational burden of Algorithm II is minimal.

Supplementary material
Data and R codes, as supplementary material, are available at: http://www.jandegooijer.nl.