1 Introduction

A strand of literature suggests that investors do not treat gains and losses in the same way. As a consequence, a distinction between “good” and “bad” target deviations is helpful for volatility forecasts, risk management, and asset pricing (for an exhaustive overview see Bollerslev, 2021). This distinction is crucial in financial markets to explain stylized facts such as fat tails, negative asymmetry, and non-normality in asset returns, and to construct more robust portfolio selection strategies. By recognizing the empirical regularities in asset returns, especially those implying negative asymmetry and aggregational Gaussianity over longer time horizons, loss-averse investors can use alternative portfolio selection rules like minimum semivariance optimization (see Cont, 2001, 2010; Ratliff-Crain et al., 2023).

Shortly after introducing the mean-variance framework, Markowitz (1959) formally describes the concept of semivariance of a portfolio and illustrates its theoretical advantages over the variance as a measure of risk. Some decades later, Sortino and Prince (1994) introduce the Sortino ratio, which is similar to the Sharpe ratio but relies on the downside risk, i.e., the square root of the semivariance, to quantify risk.

So far, research has mostly focused on mean-variance efficient portfolios and minimum-variance portfolios (see Zopounidis et al., 2014, and the references in the special issue on Markowitz’s contributions to portfolio theory), although targeting the downside volatility allows the investor to focus on loss minimization without treating upside volatility as risk. Downside risk measures are better suited for assessing the investment risk of investors who consider potential losses relative to their target returns. For further discussion of downside risk measures, see Harlow (1991) and Nawrocki (1991), and more recently Pla-Santamaria and Bravo (2013) and Klebaner et al. (2016).

The standard deviation is more widely used as a risk measure because optimal semivariance portfolios are harder to compute than mean-variance portfolios, which can be obtained by quadratic programming (see Ballestero, 2005; Estrada, 2008; Cumova & Nawrocki, 2011). Minimizing the portfolio semivariance requires a semicovariance matrix. Unlike the covariance matrix, the semicovariance matrix only considers periods in which the portfolio underperforms the benchmark. As a consequence, the semicovariance matrix is endogenous: changing the portfolio weights changes the set of periods in which the portfolio underperforms the benchmark, which in turn determines the value of the semivariance. Hence, the semivariances and semicovariances contained in the semicovariance matrix depend not only on the data but also on the chosen portfolio weights.

From a mathematical viewpoint, the semivariance relies on a non-smooth indicator function to identify periods where the portfolio is below a certain threshold. As a consequence, analytical solutions do not exist, and naïve numerical optimization procedures, such as grid search, are only feasible for a relatively small number of assets. Previous attempts to estimate the semicovariance matrix exist in the literature (see Hogan & Warren, 1972; Nawrocki, 1991; Ballestero, 2005; Estrada, 2008; Cumova & Nawrocki, 2011). However, all available approaches suffer from serious drawbacks, including lack of matrix symmetry, simplifying assumptions on the semivariance structure or on the preferences of investors, restrictions on the choice of the benchmark value, and poor performance in terms of approximation error (see Estrada, 2008; Cheremushkin, 2009; Rigamonti, 2020).

In this work, we introduce an estimator for the portfolio semivariance, which we refer to as the smoothed semivariance (SSV) estimator in the rest of the paper. The main idea is to approximate the indicator function in the classical definition of the empirical semivariance by a continuous function, with the accuracy of the approximation controlled by a smoothing constant. The resulting estimator has the advantage of being a smooth function of the portfolio weights, while it can be made arbitrarily close to the usual semivariance estimator. We explore the minimization of the SSV estimator as a rule for portfolio selection. Depending on the value of the smoothing constant, the resulting minimum SSV portfolios span an entire set of solutions, ranging from the minimum semivariance portfolio to the portfolio obtained by equally weighting all the observations in the estimation set (i.e., without discriminating between upside and downside volatility). When the benchmark is equal to the sample mean, the latter case coincides with the minimum variance portfolio. To illustrate this additional theoretical property, in this paper we focus on the special case in which the benchmark is equal to the mean. However, our methodology can be used with any benchmark (Footnote 1). To compute the minimum SSV portfolios, we introduce an easy-to-implement iteratively reweighted scheme, which can handle large problems with many assets.

To corroborate the precision of our SSV algorithm, for a small problem we show that our results are almost identical to those obtained via grid search, which can be considered as the global optimum. Moreover, we illustrate that the estimated semicovariance matrix might suffer from greater parameter uncertainty than the estimated covariance matrix. This suggests that an estimator that perfectly separates positive and negative data points might not be optimal in terms of out-of-sample performance, and that intermediate levels of the SSV tuning parameter might lead to better out-of-sample properties. This is confirmed on simulated and on empirical data when comparing the SSV approach with benchmarks such as the global minimum-variance portfolio and the minimum semivariance portfolios obtained using the techniques proposed by Ballestero (2005), Estrada (2008) and Cumova and Nawrocki (2011). Furthermore, to address parameter uncertainty in the out-of-sample exercise, we show that the proposed SSV approach can be easily combined with well-known shrinkage techniques.

The remainder of this paper is structured as follows. Section 2 describes the SSV estimator and its theoretical properties. Section 3 addresses the issue of portfolio selection through the optimization of the SSV and describes the algorithm. Sections 4 and 5 illustrate the properties of the new method using simulated and real data, respectively. Section 6 concludes and discusses future research. Proofs of the propositions are deferred to Appendix A, and robustness results are provided in Appendix B.

2 Methodology

2.1 Background and setup

Let \(X=(X_1,\dots , X_N)^\top \) be a random vector representing the returns for N assets with mean vector \(E(X) = \eta \in {\mathbb {R}}^N\) and covariance matrix \(cov(X)= \Sigma \in {\mathbb {R}}^{N \times N}\). The portfolio return is defined by the weighted average \(Y(w)=w^\top X\), where \(w = (w_1, \dots , w_N)^\top \) is the vector of portfolio weights to be determined and satisfying the budget constraint \(\sum _{j=1}^N w_j = 1\). The expected return and variance of the portfolio are given by

$$\begin{aligned} \mu (w)&= E[Y(w)] = w^\top \eta , \end{aligned}$$
(1)
$$\begin{aligned} \sigma ^2(w)&= {var}[Y(w)] = w^\top \Sigma w. \end{aligned}$$
(2)

The portfolio semivariance is defined as

$$\begin{aligned} {\sigma }_s^2(w)&= E \left[ \text {min}\{Y(w) - b,0\}^2\right] = E[(Y(w) - b)^2 I\left( Y(w) < b\right) ], \end{aligned}$$
(3)

where \(I(\cdot )\) represents the indicator function, and b is a benchmark set by the investor. The portfolio semivariance is similar to the variance, but it considers only the variability of portfolio returns below b.

Let \(X_1, \dots , X_T\) be T independent observations of X, where \(X_t = (X_{1t}, \dots , X_{Nt})^\top \) and \(X_{it}\) denotes the observation on asset i at time t. Consider the empirical semivariance estimator

$$\begin{aligned} {\widehat{\sigma }}_s^2(w)&= \dfrac{1}{T} \sum _{t=1}^T \text {min}\{Y_t(w) - b,0\}^2 = \dfrac{1}{T} \sum _{t=1}^T (Y_t(w) - b)^2 I\left( Y_t(w) < b\right) , \end{aligned}$$
(4)

where \(Y_t(w)=w^\top X_t\). In particular, we are interested in finding the optimal portfolio \({{\hat{w}}}\) by solving the minimum semivariance problem

$$\begin{aligned} \underset{w}{\min }\ \ {\widehat{\sigma }}_s^2(w) \ \ \text {s.t.} \ \sum _{j=1}^N w_{j} =1. \end{aligned}$$
(5)

The optimal portfolio weights obtained by solving (5) do not coincide with those of the minimum variance portfolio if asset returns are asymmetric and/or the benchmark differs from the mean (see Klebaner et al., 2016). Solving the optimization task in (5) is challenging as it relies on the non-smooth indicator function \(I(Y_t(w)<b)\), which impedes the use of standard optimizers, and requires the semicovariance matrix as an endogenous input.
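As a minimal illustration of the empirical semivariance (4), the following Python sketch evaluates both the min-form and the indicator-form of the estimator for a fixed weight vector. The simulated returns, the zero benchmark and the function names are assumptions made only for illustration; the point is that the set of periods counted by the indicator is determined by \(w\) itself.

```python
import numpy as np

def semivariance_min_form(w, X, b):
    """Empirical semivariance (4), written as the mean of min{Y_t(w) - b, 0}^2."""
    y = X @ w                              # portfolio returns Y_t(w) = w^T X_t
    return np.mean(np.minimum(y - b, 0.0) ** 2)

def semivariance_indicator_form(w, X, b):
    """The same estimator written with the indicator I(Y_t(w) < b)."""
    d = X @ w - b
    return np.mean(d ** 2 * (d < 0))       # boolean mask acts as the 0/1 indicator

rng = np.random.default_rng(0)
X = rng.normal(0.01, 0.05, size=(500, 3))  # simulated returns (T = 500, N = 3), illustration only
w = np.full(3, 1 / 3)                      # equally weighted portfolio
b = 0.0                                    # benchmark chosen by the investor (assumed)

print(semivariance_min_form(w, X, b), semivariance_indicator_form(w, X, b))
```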

To address the above challenges, several algorithms based on linear or quadratic programming have been suggested to minimize the semivariance of a portfolio (for a review see Estrada, 2008). Taking a different approach, Estrada (2008) proposes to approximate the semicovariance matrix using a heuristic approach based on whether the single assets, and not the portfolio as a whole, underperform the benchmark. Although this yields a symmetric and positive semi-definite matrix which is easy to compute, the discrepancy between this heuristic definition and the actual semivariance \(\sigma ^2_s(w)\) can be substantial and generally increases with the number of assets. The approximation error implied by this heuristic approach cannot be neglected and implies a significant estimation bias and performance loss (see Cheremushkin, 2009; Rigamonti, 2020). Unlike existing approaches, we propose a smoothed version of the empirical semicovariance matrix. The estimator is shown to converge to the sample semivariance estimator in (4). The proposed approximation can be conveniently optimized through a reweighting algorithm.

2.2 The smoothed semivariance estimator

Motivated by the above computational issues, we introduce the smoothed semivariance (SSV) estimator defined by

$$\begin{aligned} {{\widehat{\sigma }}}_s^2(w, \theta ) = \dfrac{1}{T} \sum _{t=1}^T (Y_t(w) - b)^2 \pi (Y_t(w) - b; \theta ), \end{aligned}$$
(6)

where \(\pi (\cdot ; \theta )\) is a continuous function indexed by the smoothing parameter \(\theta \), which we use to relax the binary weights \(I(Y_t(w)-b <0)\) in (4). In this paper, \(\pi \) is given by the parametric model

$$\begin{aligned} \pi (z;\theta ) = \left\{ \begin{array}{ll} 1 - F( z/\theta ), &{} \theta > 0 \\ I(z<0), &{} \theta = 0, \end{array} \right. \end{aligned}$$
(7)

where \(F(\cdot )\) is the distribution function of \(Y(w)-b\), and the smoothing parameter \(\theta \) may be determined based on the sample. We let \(\theta = \theta _T\) be a positive sequence decreasing to zero with the sample size T. This choice ensures smoothness of the estimator (6) with respect to w, while making the discrepancy between the smoothed estimator (6) and the traditional sample semivariance estimator (4) negligible as T diverges.

For \(\theta \rightarrow 0\), the smoothing function \(\pi \) converges to the indicator function \(I(\cdot )\) in (4), and in the limit \({{\widehat{\sigma }}}_s^2(w, \theta )\) coincides with the traditional empirical semivariance \({{\widehat{\sigma }}}_s^2(w) = {{\widehat{\sigma }}}_s^2(w, 0)\), regardless of the benchmark b set by the investor; see Fig. 1a. If \( b=\bar{Y}(w)=T^{-1}\sum _{t=1}^T Y_t(w)\) and \(\theta \rightarrow \infty \), the SSV weighs all observations equally (each weight tends to the constant \(1-F(0)\)), so that it becomes proportional to the sample variance of the portfolio returns and has the same minimizer; see Fig. 1c. For this reason, we focus on the special case where the SSV is defined as

$$\begin{aligned} {{\widehat{\sigma }}}_s^2(w, \theta ) = \dfrac{1}{T} \sum _{t=1}^T [Y_t(w) - {{\bar{Y}}}(w)]^2 \pi (Y_t(w) - {{\bar{Y}}}(w); \theta ), \end{aligned}$$
(8)

with the smooth function \(\pi (\cdot ; \theta )\) as defined in (7). In Sect. 4, we show that, due to parameter uncertainty, the lowest downside deviation in an out-of-sample exercise might be achieved with an intermediate value of \(\theta \), which corresponds to the situation in Fig. 1b.
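The following Python sketch evaluates the SSV (8) for a fixed portfolio and several values of \(\theta \). It assumes that \(F\) in (7) is the standard normal cdf (Sect. 4 uses a normal cdf; the standardization is our assumption), and the data and function names are hypothetical. For \(\theta = 0\) it returns the non-smoothed semivariance, for \(\theta = 1/T\) a value very close to it on this data, and for a large \(\theta \) approximately half the sample variance of the portfolio returns, since every weight tends to \(1-F(0) = 1/2\).

```python
import numpy as np
from scipy.stats import norm

def smoothing_weights(z, theta):
    """pi(z; theta) in (7); F is taken to be the standard normal cdf (assumption)."""
    if theta == 0:
        return (z < 0).astype(float)       # hard indicator I(z < 0)
    return 1.0 - norm.cdf(z / theta)       # smooth weights, decreasing in z

def ssv(w, X, theta):
    """Smoothed semivariance (8) with benchmark b = Ybar(w)."""
    y = X @ w
    z = y - y.mean()                       # Y_t(w) - Ybar(w)
    return np.mean(z ** 2 * smoothing_weights(z, theta))

rng = np.random.default_rng(1)
T, N = 240, 4
X = rng.normal(0.01, 0.05, size=(T, N))   # simulated returns, illustration only
w = np.full(N, 1 / N)

for theta in (0.0, 1 / T, 1 / np.sqrt(T), 100.0):
    print(f"theta = {theta:.4g}: SSV = {ssv(w, X, theta):.6e}")
print(f"half the sample variance: {0.5 * np.var(X @ w):.6e}")
```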

Fig. 1 Smoothing weights \(\pi (z;\theta )\) for different values of the smoothing parameter \(\theta \): (a) \(\theta = 0.0001\), (b) \(\theta =0.05\) and (c) \(\theta =1000\)

The following proposition shows that the SSV estimator \({{\widehat{\sigma }}}_s^2(w, \theta )\) is asymptotically equivalent to the non-smoothed semivariance \({{\widehat{\sigma }}}^2_s(w)\) as the sample size T increases, uniformly over bounded w. The proposition also provides the convergence rate.

Proposition 2.1

Let \(\theta = \theta _T\) be a sequence such that \(\theta _T \rightarrow 0\) as \(T \rightarrow \infty \). Assume: (i) \(\{ X_{t}, t\ge 1 \}\) are iid vectors with finite moments up to the second order; and (ii) the portfolio \(Y_t= w^\top X_t\) has a continuous and bounded density \(f(\cdot )\) such that \(y f(a y) \rightarrow 0\) as \(|y| \rightarrow \infty \) for all \(a \in {\mathbb {R}}^+\) and \(w \in {\mathbb {R}}^N\). Then for any w such that \(\Vert w \Vert _2 <C\), \(C>0\), we have

$$\begin{aligned} \left| {{\widehat{\sigma }}}_s^2(w, \theta _T) - {{\widehat{\sigma }}}^2_s(w) \right| = O_p\left( \theta ^{-1}_T f(k \theta _T^{-1})\right) , \end{aligned}$$
(9)

where \(k>0\) is some positive constant.

The main requirement is that the portfolio \(Y_t(w) = w^\top X_t\) has a density with sub-linear tail behavior, meaning that its tails vanish faster than \(1/|y|\) as \(|y| \rightarrow \infty \). This is a relatively mild condition satisfied by many distributions used in financial modelling. The rate of convergence between the smoothed and non-smoothed estimators is then governed by the tail behavior. For example, when \(X_t\) follows the N-variate normal distribution \({\mathcal {N}}_N(\eta , \Sigma )\), \((Y_t(w)- b)\) follows the univariate normal distribution \({\mathcal {N}}_1(w^\top \eta - b, w^\top \Sigma w)\). In this case, for any \(a>0\) and \(\Vert w \Vert _2 <C\) we have

$$\begin{aligned} y f(a y) = y \dfrac{1}{\sqrt{2 \pi w^\top \Sigma w}} \exp \left\{ - \dfrac{(a y - w^\top \eta + b)^2}{2 w^\top \Sigma w}\right\} \rightarrow 0, \ \ \text { as } |y| \rightarrow \infty . \end{aligned}$$

The tail of the normal distribution decreases exponentially fast in \(y^2\) and clearly satisfies the requirement of Proposition 2.1.

Proposition 2.1 contains information on the convergence rate of the smoothed estimator, thus providing some guidance on how to set \(\theta _T\) in applications. For normally distributed assets, the SSV estimator approximates the non-smoothed estimator up to a vanishing term of order \(\exp \{-k \theta _T^{-2} \}\), \(k>0\), in probability, because the exponential term in the normal pdf dominates. This means that setting \(\theta _T = T^{-1/2}\) ensures an approximation of order \(O_p(\exp \{ - k T \})\) for sufficiently large T. In real data applications, we recommend setting \(\theta _T = T^{-1/2}\) or even smaller, such as \(\theta _T = T^{-1}\), as these choices are expected to have a negligible impact on the statistical accuracy of either the smoothed or the non-smoothed estimator. In particular, by the Central Limit Theorem the standard deviation of such estimators is expected to be of order \(T^{-1/2}\), which dominates an approximation error that vanishes exponentially fast in T.
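To make this comparison concrete, the Python sketch below contrasts the approximation gap \(|{{\widehat{\sigma }}}_s^2(w,\theta _T) - {{\widehat{\sigma }}}_s^2(w)|\) with a plug-in, CLT-based standard error of the non-smoothed estimator for simulated normal portfolio returns. It assumes that \(F\) in (7) is the standard normal cdf, and the return scale is an illustrative choice. Because (7) applies \(F\) to \(z/\theta \) without rescaling, how close \(\theta _T = T^{-1/2}\) comes to the indicator depends on the scale of the returns, so the sketch prints both choices but only checks the \(\theta _T = T^{-1}\) case.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
T = 240
y = rng.normal(0.005, 0.04, size=T)        # simulated normal portfolio returns (assumed scale)
z = y - y.mean()                           # deviations from the benchmark b = Ybar

down = z ** 2 * (z < 0)                    # summands of the non-smoothed estimator (4)
sv = down.mean()
se = down.std() / np.sqrt(T)               # plug-in CLT standard error, of order T^{-1/2}

def ssv(theta):
    """SSV (8) for the fixed returns y; F = standard normal cdf (assumption)."""
    return np.mean(z ** 2 * (1.0 - norm.cdf(z / theta)))

for theta in (1 / np.sqrt(T), 1 / T):
    gap = abs(ssv(theta) - sv)
    print(f"theta = {theta:.4f}: |SSV - SV| = {gap:.2e}, se(SV) = {se:.2e}")
```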

3 Optimal portfolio selection

Consider a random sample \(X_1,\dots , X_T\) from \(X\sim {\mathcal {N}}_N(\eta ,\Sigma )\) and define the sample covariance matrix \({{\hat{\Sigma }}} = T^{-1} \sum _{t=1}^T (X_t - {{\bar{X}}})(X_t - \bar{X})^\top \). The global minimum variance (GMV) portfolio \({{\hat{w}}}_{mv}\) is found by minimizing the portfolio variance \({\widehat{\sigma }}^2(w) = w^\top {{\hat{\Sigma }}} w\) subject to the constraint \( \sum _{j=1}^N w_{j} =1\). The GMV portfolio has the closed-form expression \({{\hat{w}}}_{mv} = {{\hat{\Sigma }}}^{-1} {{\textbf {1}}}/({{\textbf {1}}}^{\top } \hat{\Sigma }^{-1} {{\textbf {1}}})\), where \( {{\textbf {1}}} \) is the \(N\times 1\) vector of ones.
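A minimal Python sketch of the closed-form GMV portfolio, using the \(1/T\) normalization of \({{\hat{\Sigma }}}\) given above. The two-asset covariance matrix reuses the variances and covariance quoted in Sect. 4, while the zero means, the sample size and the function name are assumptions for illustration. Solving a linear system instead of forming \({{\hat{\Sigma }}}^{-1}\) explicitly is a standard numerical choice.

```python
import numpy as np

def gmv_weights(X):
    """Global minimum variance weights Sigma^{-1} 1 / (1^T Sigma^{-1} 1)."""
    sigma = np.cov(X, rowvar=False, bias=True)   # bias=True gives the 1/T normalization
    ones = np.ones(X.shape[1])
    a = np.linalg.solve(sigma, ones)              # Sigma^{-1} 1 without an explicit inverse
    return a / (ones @ a)

rng = np.random.default_rng(3)
cov = np.array([[0.0050, 0.0019],
                [0.0019, 0.0034]])
X = rng.multivariate_normal(np.zeros(2), cov, size=1000)   # T = 1000 simulated periods
w_mv = gmv_weights(X)
print(w_mv, w_mv.sum())
```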

The semivariance minimization problem (5) can be formulated similarly. In particular, with the benchmark equal to the sample mean, \(b = {{\bar{Y}}}(w)\), the portfolio semivariance is given by \({\hat{\sigma }}_s^2(w) = w^\top {{\hat{\Sigma }}}_s(w) w \), where \({{\hat{\Sigma }}}_s(w)\) is the \(N\times N\) semicovariance matrix

$$\begin{aligned} {{\hat{\Sigma }}}_s(w) = \dfrac{1}{T} \sum _{t=1}^T (X_t - {{\bar{X}}})(X_t - {{\bar{X}}})^\top I\{ w^\top (X_t- {{\bar{X}}}) < 0\}. \end{aligned}$$
(10)
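The following Python sketch builds the semicovariance matrix (10) and checks numerically that \(w^\top {{\hat{\Sigma }}}_s(w) w\) reproduces the empirical semivariance (4) with \(b = {{\bar{Y}}}(w)\), while the matrix itself changes with \(w\). The Student-t returns, the two weight vectors and the function name are illustrative assumptions.

```python
import numpy as np

def semicovariance(w, X):
    """Endogenous semicovariance matrix (10): outer products of demeaned returns,
    kept only in periods where the demeaned portfolio return is negative."""
    D = X - X.mean(axis=0)                 # rows are (X_t - Xbar)^T
    down = (D @ w) < 0                     # I{w^T (X_t - Xbar) < 0}
    return D[down].T @ D[down] / X.shape[0]

rng = np.random.default_rng(4)
X = rng.standard_t(df=5, size=(300, 3)) * 0.03   # heavy-tailed simulated returns
w1 = np.array([0.5, 0.3, 0.2])
w2 = np.array([0.2, 0.2, 0.6])

# w^T Sigma_s(w) w reproduces the empirical semivariance (4) with b = Ybar(w) ...
z = X @ w1 - (X @ w1).mean()
print(w1 @ semicovariance(w1, X) @ w1, np.mean(z ** 2 * (z < 0)))
# ... but the matrix itself depends on w, which is what makes (11) hard.
print(np.abs(semicovariance(w1, X) - semicovariance(w2, X)).max())
```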

The minimum semivariance portfolio \({{\hat{w}}}_{0}\) is found by solving

$$\begin{aligned} \underset{w}{\min }\ \ \ w^\top {{\hat{\Sigma }}}_s(w) w \ \ \text {s.t.} \ \sum _{j=1}^N w_{j} =1. \end{aligned}$$
(11)

This is a challenging optimization task that cannot be tackled using gradient-based optimization techniques due to the non-smoothness of \({{\hat{\Sigma }}}_s(w)\). To address this issue, we replace \(\hat{\Sigma }_s(w)\) in (11) by the smoothed semicovariance matrix

$$\begin{aligned} {{\hat{\Sigma }}}_s(w;\theta ) = \dfrac{1}{T} \sum _{t=1}^T (X_t - \bar{X})(X_t - {{\bar{X}}})^\top \pi \left( w^\top (X_t- {{\bar{X}}}); \theta \right) , \end{aligned}$$
(12)

where the function \(\pi (\cdot ; \theta )\) is defined in (7). Thus, given \(\theta >0\), we compute the smoothed minimum semivariance portfolio by solving the approximate optimization task

$$\begin{aligned} \underset{w}{\min }\ \ \ w^\top {{\hat{\Sigma }}}_s(w;\theta ) w \ \ \text {s.t.} \ \sum _{j=1}^N w_{j} =1. \end{aligned}$$
(13)

The solution to the above problem clearly depends on the choice of the smoothing parameter \(\theta \), and we denote it by \({{\hat{w}}}_\theta \) in the rest of the paper.
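A short Python sketch of the smoothed semicovariance matrix (12), assuming that \(F\) in (7) is the standard normal cdf. It checks that \(w^\top {{\hat{\Sigma }}}_s(w;\theta ) w\) equals the SSV (8), and that for a very large \(\theta \) the matrix is close to one half of the sample covariance matrix \({{\hat{\Sigma }}}\), a rescaling that leaves the minimizer of (13) unchanged. Data, weights and names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def smoothed_semicovariance(w, X, theta):
    """Smoothed semicovariance (12): period t enters with weight
    pi(w^T (X_t - Xbar); theta) = 1 - Phi(w^T (X_t - Xbar) / theta)."""
    D = X - X.mean(axis=0)                     # rows are (X_t - Xbar)^T
    pi = 1.0 - norm.cdf((D @ w) / theta)       # one smoothing weight per period
    return (D * pi[:, None]).T @ D / X.shape[0]

rng = np.random.default_rng(5)
X = rng.normal(0.0, 0.04, size=(250, 3))       # simulated returns, illustration only
w = np.array([0.4, 0.4, 0.2])
theta = 1 / np.sqrt(X.shape[0])

z = X @ w - (X @ w).mean()
ssv = np.mean(z ** 2 * (1.0 - norm.cdf(z / theta)))   # SSV (8) computed directly
print(w @ smoothed_semicovariance(w, X, theta) @ w, ssv)
```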

Note that the minimum semivariance portfolio \({{\hat{w}}}_0\) and the minimum variance portfolio \({{\hat{w}}}_{mv}\) are recovered as limit cases of the optimization (13). In particular, for \(\theta \rightarrow 0\), we have \({{\hat{\Sigma }}}_s(w; \theta ) \rightarrow {{\hat{\Sigma }}}_s(w)\) for all \(w \in {\mathbb {R}}^N\), meaning that the minimum SSV portfolio \({{\hat{w}}}_\theta \) approximates the target (non-smooth) minimum semivariance portfolio \({{\hat{w}}}_{0}\) for a sufficiently small \(\theta \). For certain special cases, the minimum semivariance portfolio \({{\hat{w}}}_0\) may be obtained by brute-force computation, for example using grid search. However, this is typically infeasible in realistic applications where the number of assets N is moderate or large. In contrast, our smoothing approach with \(\theta >0\) can be applied to an arbitrary, and possibly very large, number of assets N. For \(\theta \rightarrow \infty \), the minimum SSV portfolio \({{\hat{w}}}_\theta \) converges to the sample minimum variance portfolio \( {{\hat{w}}}_{mv}\).

The next proposition shows that the minimum SSV portfolio \({{\hat{w}}}_\theta \) converges to the population minimum semivariance portfolio as long as \(\theta \) decreases to 0 with the sample size T. For the following analysis, we define the Lagrangian function \(L_s(v;\theta ) = \sigma ^2_s(w;\theta ) + \lambda \left( \sum _{j=1}^N w_j - 1\right) \) with \(v = (w^\top , \lambda )^\top \), and let \(v^*= (w^{*\top }, \lambda ^*)^\top \) be the minimizer of \(L_s(v;0)\). Here \(w^*\) is the optimal portfolio that minimizes the theoretical non-smooth population semivariance (3) in the absence of sampling variability. Note that (13) can be written in terms of the empirical objective

$$\begin{aligned} {{\hat{L}}}_s(v; \theta ) = \dfrac{1}{T}\sum _{t=1}^T \left\{ w^\top (X_t - {{\bar{X}}})(X_t - {{\bar{X}}})^\top \pi (w^\top (X_t- {{\bar{X}}}); \theta ) w + \lambda \left( \sum _{j=1}^N w_j - 1\right) \right\} . \end{aligned}$$

In particular, \({{\hat{L}}}_s(v;\theta )\) can be regarded as an M-estimating function, with w representing a statistical parameter. From this viewpoint, the minimizer \({{\hat{v}}}_\theta = ({{\hat{w}}}^\top _\theta , {{\hat{\lambda }}}_\theta )^\top \) of \({{\hat{L}}}_s(v; \theta )\) can be viewed as an M-estimator, which enables us to exploit existing theory to analyze its properties.

Proposition 3.1

Assume: (i) for every \(\epsilon >0\), \(\inf _{v: \Vert v - v^*\Vert _2> \epsilon }L_s(v;0) > L_s(v^*; 0)\), and (ii) \(X_t\) has finite moments up to the fourth order. Then, under the conditions given in Proposition 2.1, \({{\hat{w}}}_{\theta _T} \rightarrow w^*\) in probability as \(T \rightarrow \infty \).

Consistency of \({{\hat{w}}}_{\theta }\) with \(\theta \) decreasing to zero follows from standard arguments for M-estimators (see, e.g., Theorem 5.7 in van der Vaart, 2000). The main assumptions for consistency of \({{\hat{w}}}_\theta \) are the uniform convergence of the semivariance estimator shown in Proposition 2.1 and the uniqueness of the minimizer \(w^*\) of \(\sigma _s^2(w;0)\) required by condition (i).

3.1 Iteratively reweighted algorithm

The optimization task in (11) cannot be solved by a gradient-based approach due to the non-smoothness of the objective function, and even a generic optimization algorithm may struggle with (11) when the number of elements in w is moderate or large. To address this issue, we propose a reweighting approach whereby, starting from some initial portfolio weights, we alternate between computing the weighted covariance matrix \({{\hat{\Sigma }}}_s( w;\theta )\) and solving (13) to update the weights. Note that, given \({{\hat{\Sigma }}}_s( w;\theta )\), the optimization task in (13) is a standard minimum-variance problem which admits a closed-form solution for \(T>N\). This reweighting strategy leads to a fast and easy-to-implement algorithm, which we describe next.
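For completeness, the closed form follows from the Lagrangian of the minimum-variance problem for a fixed matrix \(S\), where \(S\) is our shorthand for \({{\hat{\Sigma }}}_s( w;\theta )\) evaluated at the current weights and is assumed positive definite. Since \(\pi (\cdot ;\theta ) > 0\) everywhere when \(\theta > 0\), this holds whenever the demeaned return matrix has full column rank, which requires \(T > N\):

$$\begin{aligned} {\mathcal {L}}(w,\lambda )&= w^\top S w + \lambda \left( {{\textbf {1}}}^\top w - 1\right) , \\ \nabla _w {\mathcal {L}}&= 2 S w + \lambda {{\textbf {1}}} = 0 \ \Rightarrow \ w = -\tfrac{\lambda }{2} S^{-1} {{\textbf {1}}}, \\ {{\textbf {1}}}^\top w = 1&\ \Rightarrow \ \lambda = -\dfrac{2}{{{\textbf {1}}}^\top S^{-1} {{\textbf {1}}}} \ \Rightarrow \ w = \dfrac{S^{-1} {{\textbf {1}}}}{{{\textbf {1}}}^\top S^{-1} {{\textbf {1}}}}. \end{aligned}$$

Because \(S\) is positive definite, this stationary point is the unique minimizer on the budget constraint.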

Let \(x_1, \dots , x_T\) be T realizations of the vector of asset returns, with \(x_{t} = (x_{1t}, \dots , x_{Nt})^\top \), where \(x_{it}\) denotes the realized return of asset i at time t. Moreover, define \(z_t(w) = w^\top (x_t - {{\bar{x}}})\), with \({{\bar{x}}} = T^{-1} \sum _{t=1}^T x_t\). We use the superscript [k] to denote the k-th iteration of our algorithm, i.e., \(w^{[k]}=\left( w^{[k]}_1, \dots , w^{[k]}_N\right) ^\top \) are the portfolio weights obtained at iteration k. For a given \(\theta \), we carry out the following steps:

Algorithm 1: Iteratively reweighted algorithm for minimum SSV portfolio

Step 0 (Initialization). Set \(k=0\) and \( \pi _t^{[0]}= I\left( z_t\left( w^{[0]}\right) <0\right) \), for \(t=1,\dots , T\), where \(w^{[0]}\) are initial weights. For example, set \(w^{[0]}_1=\dots =w^{[0]}_N =1/N\).

Step 1 (Parametric smoothing update). Compute \(\pi ^{[k]}_t =\pi \left( z_t\left( w^{[k]}\right) ; \theta \right) \), for \(1 \le t \le T\), where \(\pi (\cdot ; \theta )\) is the parametric smoother in (7).

Step 2 (Update of the optimal portfolio). Given \(\pi ^{[k]}_t\), \(t=1,\dots , T\), update the portfolio weights \({{\hat{w}}}^{[k+1]}\) by solving

$$\begin{aligned} \min _{w} \ {{\widehat{\sigma }}}^{2[k]}_s(w) = w^\top \hat{\Sigma }^{[k]}_s w \ \ \text { s.t.} \ \ \sum _{j=1}^N w_j =1, \end{aligned}$$
(14)

where \({{\hat{\Sigma }}}^{[k]}_s = T^{-1} \sum _{t=1}^T(x_t - {{\bar{x}}})(x_t - {{\bar{x}}})^\top \pi _t^{[k]}\) is the sample covariance matrix with observations reweighted using the smoothing weights \(\pi ^{[k]}_t\), \(t=1,\dots , T\). The solution of (14) is

$$\begin{aligned} {{\hat{w}}}^{[k+1]} = \frac{\left( {{\hat{\Sigma }}}^{[k]}_s\right) ^{-1} {{\textbf {1}}}}{ {{\textbf {1}}}^{\top } \left( {{\hat{\Sigma }}}^{[k]}_s\right) ^{-1} {{\textbf {1}}}}. \end{aligned}$$

Step 3. Set \(k \leftarrow k + 1\) and repeat Steps 1 and 2 until a convergence criterion is met, e.g., \( \Vert {{\hat{w}}}^{[k+1]}-{{\hat{w}}}^{[k]} \Vert _2/ \Vert w^{[k]}\Vert _2 <\tau \), for some tolerance \(\tau >0\).
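The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: \(F\) in (7) is taken to be the standard normal cdf, the iteration starts from equal weights, and the tolerance of \(10^{-4}\) mirrors the 0.01% precision mentioned in Sect. 4. The indicator weights of Step 0 are overwritten by Step 1 at \(k=0\), so only \(w^{[0]}\) is needed. The function name and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def min_ssv_portfolio(x, theta, tol=1e-4, max_iter=200):
    """Iteratively reweighted minimum SSV portfolio (sketch of Algorithm 1).

    x: (T, N) array of realized returns; theta > 0: smoothing parameter.
    Returns the final weights and the number of iterations performed.
    """
    T, N = x.shape
    d = x - x.mean(axis=0)                      # rows x_t - xbar, so z_t(w) = d[t] @ w
    ones = np.ones(N)
    w = ones / N                                # Step 0: w^[0] = 1/N
    for k in range(1, max_iter + 1):
        pi = 1.0 - norm.cdf((d @ w) / theta)    # Step 1: pi_t^[k] = pi(z_t(w^[k]); theta)
        sigma_k = (d * pi[:, None]).T @ d / T   # reweighted covariance matrix in (14)
        a = np.linalg.solve(sigma_k, ones)      # Step 2: closed-form solution of (14)
        w_new = a / (ones @ a)
        change = np.linalg.norm(w_new - w) / np.linalg.norm(w)   # Step 3 criterion
        w = w_new
        if change < tol:
            break
    return w, k

rng = np.random.default_rng(6)
x = rng.standard_t(df=4, size=(1260, 5)) * 0.01   # simulated daily-like returns
T = x.shape[0]

w_small, it_small = min_ssv_portfolio(x, theta=1 / T)    # close to minimum semivariance
w_large, it_large = min_ssv_portfolio(x, theta=100.0)    # close to the GMV portfolio
sigma = np.cov(x, rowvar=False, bias=True)
w_gmv = np.linalg.solve(sigma, np.ones(5))
w_gmv /= w_gmv.sum()
print(w_small, it_small)
print(w_large, w_gmv)
```

With a large \(\theta \) every weight is close to \(1/2\), so the reweighted matrix is nearly proportional to the sample covariance matrix and the output should essentially reproduce the GMV weights.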

A greater degree of flexibility is achieved by replacing Step 1 with non-parametric weights computed by kernel smoothing:

Step 1′ (Non-parametric smoothing update). Compute \(\pi ^{[k]}_t = 1- {{\hat{F}}}\left( z_t\left( w^{[k]}\right) /\theta \right) \), for \(1 \le t \le T\), where \({{\hat{F}}}\) is a non-parametric estimator of the cdf of \(Z_t\left( w^{[k]}\right) = Y_t\left( w^{[k]}\right) - {{\bar{Y}}}\left( w^{[k]}\right) \) based on the observations \(z_1\left( w^{[k]}\right) , \dots , z_T\left( w^{[k]}\right) \).
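Step 1′ can be sketched with a Gaussian-kernel estimate of the cdf, \({{\hat{F}}}(u) = T^{-1}\sum _{s=1}^T \Phi \{(u - z_s)/h\}\). The kernel, the rule-of-thumb bandwidth \(h\) and the function names are assumptions of this sketch; the text only states that \({{\hat{F}}}\) is a non-parametric estimator. The resulting weights would replace the parametric weights of Step 1.

```python
import numpy as np
from scipy.stats import norm

def kernel_cdf(u, sample, h):
    """Gaussian-kernel cdf estimate: F_hat(u) = mean_s Phi((u - z_s) / h)."""
    return norm.cdf((u[:, None] - sample[None, :]) / h).mean(axis=1)

def nonparametric_weights(z, theta, h=None):
    """Step 1': pi_t = 1 - F_hat(z_t / theta), with F_hat fitted to z_1, ..., z_T.
    The default bandwidth is Silverman's rule of thumb (assumption)."""
    if h is None:
        h = 1.06 * z.std() * len(z) ** (-1 / 5)
    return 1.0 - kernel_cdf(z / theta, z, h)

rng = np.random.default_rng(7)
y = rng.standard_t(df=4, size=500) * 0.02      # simulated portfolio returns Y_t(w^[k])
z = y - y.mean()                               # z_t(w^[k]) = Y_t - Ybar
pi = nonparametric_weights(z, theta=1 / np.sqrt(len(z)))
print(pi.min(), pi.max())
```

As with the parametric smoother, the weights lie in \([0,1]\) and decrease in \(z_t\), so periods with the largest shortfalls receive the largest weights.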

To find the final solution, the above algorithm solves a sequence of convex optimization tasks. In particular, note that Step 2 corresponds to a minimum variance portfolio optimization task where the usual sample covariance matrix is replaced by a weighted covariance matrix. Since the solution is available in closed-form, the computational time for this step is usually negligible. Based on our numerical experiments, the algorithm converges very quickly within a few iterations, even when the number of assets is large. Finally, note that our procedure is not limited to the constraint \(\sum _{j=1}^N w_j = 1\). A different set of constraints on the portfolio weights may be included in Step 2 of the algorithm, depending on the application at hand. Then a quadratic programming solver may be used to compute the update \({{\hat{w}}}^{[k+1]}\) instead of the explicit formula given above.
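When Step 2 includes further constraints, the closed form no longer applies. The Python sketch below uses SciPy's general-purpose SLSQP solver as a stand-in for a dedicated quadratic programming solver and adds long-only bounds \(0 \le w_j \le 1\) as an example constraint set; both choices, and the function name, are assumptions for illustration. A tight `ftol` is passed because portfolio variances are small numbers and SLSQP's default tolerance could otherwise stop the search early.

```python
import numpy as np
from scipy.optimize import minimize

def constrained_step2(sigma_k, lower=0.0, upper=1.0):
    """Step 2 of Algorithm 1 with box constraints lower <= w_j <= upper added to
    sum(w) = 1: minimize w^T sigma_k w numerically instead of using the closed form.
    In production code, res.success should be checked before using res.x."""
    n = sigma_k.shape[0]
    res = minimize(
        fun=lambda w: w @ sigma_k @ w,
        x0=np.full(n, 1.0 / n),
        jac=lambda w: 2.0 * sigma_k @ w,
        method="SLSQP",
        bounds=[(lower, upper)] * n,
        constraints=[{"type": "eq",
                      "fun": lambda w: np.sum(w) - 1.0,
                      "jac": lambda w: np.ones_like(w)}],
        options={"ftol": 1e-10, "maxiter": 500},
    )
    return res.x

# Two assets where the unconstrained solution would short the first asset
sigma_k = np.array([[0.040, 0.018],
                    [0.018, 0.010]])
print(constrained_step2(sigma_k))   # long-only optimum puts all weight on the second asset
```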

4 Simulation study

In this section, we compare the performance of the SSV estimator, both in-sample and out-of-sample, with that of the global minimum variance (GMV) portfolio and of the following competing semivariance estimation algorithms: Estrada (2008), Cumova and Nawrocki (2011) and Ballestero (2005). First, we compare our method with the minimum semivariance portfolio obtained via grid search, which can be considered as the global optimum in computationally feasible cases. To this end, we focus on a simple setting with only two assets, both having either negatively or positively skewed returns. We generate \(100\,000\) returns from a multivariate skew-normal distribution with the skewness of both assets equal to \(-\)0.5 or 0.5 (see Azzalini and Dalla Valle, 1996; Footnote 2). The variances of the two assets are set equal to 0.0050 and 0.0034, respectively, and their covariance to 0.0019 (Footnote 3). We set the benchmark \(b={{\bar{Y}}}(w)\), so that the minimum SSV portfolio converges to the sample minimum variance portfolio when \(\theta \rightarrow \infty \), and we evaluate the results based on the portfolio downside deviation.

We use rolling windows of different lengths to compare our SSV approach with various other approaches proposed in the literature. We rely on a parametric smoothing based on a normal cdf and test three values of the smoothing parameter: \(\theta = 1/T \), which gives a portfolio very close to the one obtained with grid search; \(\theta =100\), a high value that results in a portfolio close to the sample minimum variance portfolio; and the intermediate value \(\theta =1/\sqrt{T} \). These choices for \(\theta \) are guided by our theoretical analysis in Proposition 2.1. We also consider other portfolio selection rules: the minimum semivariance portfolio obtained via grid search, the sample global minimum variance portfolio, and the minimum semivariance portfolios obtained with the techniques proposed by Ballestero (2005), Estrada (2008) and Cumova and Nawrocki (2011).

Fig. 2 In-sample (left panels) and out-of-sample (right panels) downside deviation achieved by minimum variance and minimum semivariance portfolios with two assets and skewness of both assets equal to \(-\)0.5 (top panels) and 0.5 (bottom panels). Competing semivariance estimation algorithms: Estrada (2008), Cumova and Nawrocki (2011), Ballestero (2005)

Figure 2 shows the in-sample and out-of-sample results with negatively skewed (Panels a and b) and positively skewed (Panels c and d) returns. In the in-sample analysis, all strategies except Cumova and Nawrocki (2011) achieve a lower downside deviation with shorter rather than longer estimation windows. This is because sample estimates, rather than the true (unknown) inputs, are used in the calculations. In other words, overfitting creates the appearance of better in-sample results when the sample size is not very large. For multivariate normally distributed returns, the size of this effect can be derived analytically (Footnote 4). For longer estimation windows, as the sample estimates converge to the true values, this effect wanes.

In the out-of-sample evaluation, the downside deviation is larger for shorter estimation windows because the input parameters must be estimated (Footnote 5). As the estimation window increases, the downside deviation converges to the true downside deviation under known parameters. Asymptotically, the in-sample and out-of-sample downside deviations tend to the same values.

For all panels in Fig. 2, the dashed black line, which corresponds to the SSV estimator with \(\theta =1/T\), perfectly overlaps the green line obtained by minimizing the semivariance via grid search, which gives the global optimum. This means that our estimator provides an extremely precise approximation of the sample semicovariance matrix. The SSV estimator with \(\theta =100\), indicated by the dashed light blue line, matches the red line obtained by using the sample covariance matrix. As expected, the minimum SSV portfolio corresponds to the minimum variance portfolio when \(\theta \) is sufficiently large. Targeting the semivariance always gives the best results in-sample, but using the variance or an intermediate approach given by smoothing with \(\theta =1/\sqrt{T}\) (dashed brown line) can work better out-of-sample, especially for negatively skewed returns and short estimation windows, where parameter uncertainty is high (Footnote 6). Overall, our strategy significantly outperforms the three competing semicovariance matrix estimation algorithms in approximating the target sample semicovariance matrix across all considered settings.

Besides considering different estimation windows, we also study the in-sample effect of varying the number of assets. We generate 63,000 returns for a number of assets \(N \in \{20, 40, \dots , 160\}\) by randomly sampling with replacement daily returns of S&P 500 constituents, with data spanning from November 2, 1999, to April 28, 2023 (Footnote 7). As we use the same sample of days for all assets, we capture the cross-correlation of stock returns. We use a rolling estimation window of 5 years (i.e., \(1\,260\) days) to calculate the asset weights (Footnote 8). We then compute the downside deviation achieved by the different portfolio selection rules. Results are reported in Fig. 3.
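The resampling design can be sketched as follows: whole trading days are drawn with replacement and the same day index is used for every selected asset, which preserves the cross-correlation of returns. The array `daily_returns` is a hypothetical stand-in for the S&P 500 constituent data, and drawing the N assets uniformly without replacement is our assumption, since the text does not say how assets are selected.

```python
import numpy as np

def resample_days(daily_returns, n_obs, n_assets, rng):
    """Bootstrap whole days: one day index per draw, shared by all selected assets."""
    n_days, n_total = daily_returns.shape
    cols = rng.choice(n_total, size=n_assets, replace=False)   # asset subset (assumption)
    days = rng.integers(0, n_days, size=n_obs)                  # days drawn with replacement
    return daily_returns[np.ix_(days, cols)], days, cols

rng = np.random.default_rng(8)
daily_returns = rng.normal(0.0005, 0.02, size=(2500, 200))   # hypothetical stand-in data
sample, days, cols = resample_days(daily_returns, n_obs=63_000, n_assets=20, rng=rng)
print(sample.shape)
```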

Fig. 3 Downside deviation for positively skewed data when varying the number of assets N. Smoothing with \(\theta =1/\sqrt{T}\) is not plotted to improve the clarity of the illustration. Competing semivariance estimation algorithms: Estrada (2008), Cumova and Nawrocki (2011), Ballestero (2005)

An interesting finding is that the approximation of Cumova and Nawrocki (2011) performs well with small portfolios but breaks down as N grows, because the estimated semicovariance matrix becomes quasi-singular. The Estrada (2008) heuristic generally performs very poorly, and the algorithm proposed by Ballestero (2005), although much better than Estrada (2008), also fails to beat the sample covariance matrix. The SSV, on the other hand, matches the covariance matrix with \(\theta =100\) and significantly improves over it with \(\theta =1/T\). Moreover, the improvement over the other strategies increases with N. For the largest portfolio, the SSV estimation time with \(\theta =1/T\) and the convergence precision set at 0.01% (i.e., we stop the iterations when the difference between the portfolio weights estimated in the current and previous rounds is less than 0.01%) is only 0.27 s. This is well below the 0.67 s required by Cumova and Nawrocki (2011), although Ballestero (2005) and Estrada (2008) are by far the fastest methods, with 0.041 and 0.042 s, respectively. The exercise was carried out on a computer equipped with an Intel Core i7-8565U processor and 24 GB of RAM.

5 Real data application

We now consider real-world financial data to evaluate the in-sample and out-of-sample performance of the minimum SSV portfolios. As in our Monte Carlo simulations, for the in-sample results of the SSV portfolios we use the normal cdf with \(\theta =1/T\), \(\theta =1/\sqrt{T}\) and \(\theta =100\). According to the theoretical analysis in Proposition 2.1, choosing the smoothing parameter \(\theta \) of order \(1/\sqrt{T}\) or smaller makes the empirical SSV an accurate approximation of the standard, non-smoothed empirical SV when portfolio returns are normally distributed. We compare these results with the global minimum variance portfolio and the minimum semivariance portfolios obtained with the methods of Ballestero (2005), Estrada (2008) and Cumova and Nawrocki (2011). We employ monthly returns of the 17, 30 and 48 industry portfolios, and those of the 100 portfolios formed on size and book-to-market, all downloaded from the Kenneth R. French data library.Footnote 9 The 48 industry portfolio returns start in July 1969, as the dataset contains missing data before that date. The other datasets start in July 1926; for the 100 portfolios formed on size and book-to-market, however, we exclude 30 portfolios that contain missing data, so that \(N=70\) for this dataset. All datasets end in April 2023.
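To illustrate what the choice of θ does, the following sketch compares the smoothed and the non-smoothed empirical semivariance of a single return series. As a labelled simplification, it assumes that the smoothed version replaces the indicator \(1\{r_t<b\}\) with \(\Phi\big(-(r_t-b)/(\theta s)\big)\), where \(s\) is the sample standard deviation of the returns; the simulated returns and the helper names are hypothetical. Under this assumed form, the two measures nearly coincide for small θ, while for large θ every weight is close to 1/2 and the smoothed measure approaches half the sample variance, which is consistent with θ = 100 reproducing the minimum-variance solution.

```python
import numpy as np
from math import erf, sqrt

_erf = np.vectorize(erf)


def norm_cdf(z):
    """Standard normal cdf evaluated element-wise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / sqrt(2.0)))


def empirical_sv(r, b):
    """Standard empirical semivariance: T^{-1} sum (r_t - b)^2 1{r_t < b}."""
    d = r - b
    return np.mean(d ** 2 * (d < 0))


def smoothed_sv(r, b, theta):
    """Smoothed semivariance with the indicator replaced by a normal cdf.

    Assumed weight: Phi(-(r_t - b) / (theta * s)), s = sample std of r.
    """
    d = r - b
    return np.mean(d ** 2 * norm_cdf(-d / (theta * r.std())))


rng = np.random.default_rng(42)
r = rng.normal(0.01, 0.05, size=240)   # hypothetical monthly portfolio returns
b = r.mean()                           # benchmark equal to the sample mean
T = r.size
sv = empirical_sv(r, b)
for theta in (1 / T, 1 / np.sqrt(T), 100.0):
    print(f"theta = {theta:.4g}: SSV = {smoothed_sv(r, b, theta):.6f} (SV = {sv:.6f})")
```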

For estimation, we use a rolling window of 20 years, i.e., \(T=240\), obtaining 406 portfolio returns for the 48 industry portfolios and 922 returns for the other datasets. To calculate realized portfolio returns, in the in-sample analysis we use the asset returns of the last month in the estimation sample (i.e., the rolling window), \(x_{T}\), while in the out-of-sample analysis we use the returns \(x_{T+1}\). As in our simulation experiments, we set the benchmark equal to the sample mean, so that for \(\theta \rightarrow \infty \) the SSV estimator converges to the sample covariance estimator. For the realized in- and out-of-sample portfolio returns, in addition to the downside deviation (DD), we report the standard deviation (SD), skewness, excess kurtosis, the Sortino ratio and the Sharpe ratio. To make the last two measures comparable, we work with returns in excess of the risk-free rate from the Kenneth R. French data library. Furthermore, to account for the potential trading costs of the various portfolio selection rules, we report the average turnover (TO). The turnover at time t is \(TO_{t} = \sum _{i=1}^N|w_{i}(t)-w_{i}(t-1)|\), where each weight \( w_{i}(t-1) \) is adjusted for the effect of the returns realized in the previous period, as in DeMiguel et al. (2009). A TO of 1 means that, on average, assets worth 100% of the portfolio wealth have to be traded each period.
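The reported risk-return measures and the drift-adjusted turnover can be computed along the following lines. This is a sketch under stated assumptions: downside deviation is measured relative to the sample mean of the realized excess returns (mirroring the benchmark used for estimation), the Sortino and Sharpe ratios divide the mean excess return by DD and SD, skewness and excess kurtosis are simple moment estimators, and the turnover function expects total (not excess) asset returns so that the previous weights can drift. The function names `performance_summary` and `average_turnover` are hypothetical.

```python
import numpy as np


def performance_summary(excess, target=None):
    """Risk-return measures for a series of realized portfolio excess returns."""
    r = np.asarray(excess, dtype=float)
    mu = r.mean()
    b = mu if target is None else target           # assumed DD target: sample mean
    sd = r.std(ddof=1)
    dd = np.sqrt(np.mean(np.minimum(r - b, 0.0) ** 2))
    z = (r - mu) / r.std(ddof=0)                   # standardized returns for moments
    return {
        "DD": dd,
        "SD": sd,
        "skewness": np.mean(z ** 3),
        "excess_kurtosis": np.mean(z ** 4) - 3.0,
        "Sortino": mu / dd,
        "Sharpe": mu / sd,
    }


def average_turnover(weights, asset_returns):
    """Average turnover with drift-adjusted previous weights.

    weights[t]       : weights chosen at rebalancing date t (rows sum to 1).
    asset_returns[t] : total asset returns realized over the following period.
    """
    w = np.asarray(weights, dtype=float)
    r = np.asarray(asset_returns, dtype=float)
    grown = w[:-1] * (1.0 + r[:-1])                     # position values after one period
    drifted = grown / grown.sum(axis=1, keepdims=True)  # weights just before rebalancing
    return np.abs(w[1:] - drifted).sum(axis=1).mean()


rng = np.random.default_rng(7)
monthly_excess = rng.normal(0.006, 0.04, size=406)  # placeholder return series
print(performance_summary(monthly_excess))
```

For example, equal weights in two assets that return +10% and -10% drift to 0.55 and 0.45; rebalancing back to equal weights then gives a turnover of 0.1.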

For the out-of-sample exercise, we present only the SSV results for minimizing the portfolio semivariance with \(\theta =1/T\).Footnote 10 In addition to the comparison with the other strategies used in Table 1, we also combine our method with a shrinkage technique to account for parameter uncertainty and to improve the out-of-sample results.Footnote 11 In particular, when estimating the sample covariance matrix (for the GMV strategy) and the sample semicovariance matrix (for the SSV and the Ballestero, 2005, strategies) we shrink the implied correlation matrix R towards the identity matrix \({\textbf{I}}\). For the minimum-variance portfolio strategy (GMV), we decompose the sample covariance matrix into a vector of sample standard deviations \(\hat{\sigma }\) and a sample correlation matrix \({{\hat{R}}}\). The shrunk correlation matrix is then given by \({\widehat{RS}}=\delta {{\hat{R}}}+(1-\delta ) {\textbf{I}}\), with \(0<\delta <1\). To calculate the portfolio weights \({{\hat{w}}}\), we use the shrunk covariance matrix \({{\hat{\Sigma }}}:= diag({{\hat{\sigma }}})\, {\widehat{RS}}\, diag({{\hat{\sigma }}})\). We apply the same shrinkage procedure to the semicovariance matrix estimated with the method of Ballestero (2005), and iteratively within our SSV approach (Algorithm 1). In each loop k, after estimating the sample semicovariance matrix \({{\hat{\Sigma }}}^{[k]}_s = T^{-1} \sum _{t=1}^T(x_t - {{\bar{x}}})(x_t - {{\bar{x}}})^\top \pi _t^{[k]}\), we decompose \(\hat{\Sigma }^{[k]}_s\) into \({{\hat{\sigma }}}^{[k]}_s\) and \({{\hat{R}}}^{[k]}_s\), we shrink the semi-correlation matrix, \({\widehat{RS}}^{[k]}_s=\delta {{\hat{R}}}^{[k]}_s+(1-\delta ) {\textbf{I}}\), and we use the shrunk semicovariance matrix \({{\hat{\Sigma }}}^{[k]}_s:= diag\left( {{\hat{\sigma }}}^{[k]}_s\right) {\widehat{RS}}^{[k]}_s diag\left( {{\hat{\sigma }}}^{[k]}_s\right) \) to calculate the portfolio weights.
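The correlation shrinkage step can be written as a small helper that is applied either once to the sample covariance matrix (GMV) or to each semicovariance iterate \(\hat{\Sigma}^{[k]}_s\) before the weights are computed. This is a sketch: `shrink_correlation` and `min_risk_weights` are hypothetical names, only the budget constraint is imposed, and the simulated returns are placeholders. Because the diagonal of \(\widehat{RS}\) remains equal to one, the shrinkage leaves all (semi)variances unchanged and multiplies every off-diagonal (semi)covariance by \(\delta\).

```python
import numpy as np


def shrink_correlation(cov, delta):
    """Shrink the correlation implied by `cov` towards I, keeping the variances.

    Returns diag(sd) @ (delta * R + (1 - delta) * I) @ diag(sd).
    """
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)                          # implied correlation matrix R
    corr_shrunk = delta * corr + (1.0 - delta) * np.eye(cov.shape[0])
    return np.outer(sd, sd) * corr_shrunk                  # element-wise form of D RS D


def min_risk_weights(cov):
    """Weights minimizing w' cov w subject to sum(w) = 1 (no other constraints)."""
    ones = np.ones(cov.shape[0])
    z = np.linalg.solve(cov, ones)
    return z / z.sum()


# Placeholder monthly returns for 17 assets and a 240-month window.
rng = np.random.default_rng(1)
x = 0.01 * rng.standard_normal((240, 17)) @ (np.eye(17) + 0.2)
w_gmv_shrink = min_risk_weights(shrink_correlation(np.cov(x, rowvar=False), delta=0.8))
```

Within an SSV-type iteration, the same helper would be called on each estimated semicovariance matrix just before solving for the weights.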

The aim of this exercise is to show that our SSV approach can easily be combined with shrinkage and that it benefits more from it than the Ballestero (2005) approach, which is a difficult benchmark to beat out-of-sample. An exhaustive comparison of the proposed minimum-semivariance strategy (SSV) under different shrinkage approaches is beyond the scope of the present paper and is left for future research.

Tables 1 and 2 report the in-sample and out-of-sample results, respectively, with the lowest downside deviation for each dataset in bold. The first row of each table reports values for the broad US market as defined in the Kenneth R. French dataset, which uses all NYSE, AMEX and NASDAQ firms to compute its returns. Almost all portfolio selection rules improve out-of-sample over the broad market portfolio: they show lower downside deviation, standard deviation and excess kurtosis, and higher skewness, Sortino ratio and Sharpe ratio.

In Table 1, the in-sample downside deviation (DD) is not affected by parameter uncertainty, and the SSV estimator closely approximates the traditional sample semicovariance matrix. The SSV with \(\theta =1/T\) shows the best results in terms of DD. By contrast, the other three semicovariance estimation methods suffer from a poorer approximation of the semicovariance matrix than the proposed SSV method. In line with intuition, GMV shows the lowest standard deviation among the competitors. As expected, SSV with \(\theta =100\) matches the GMV portfolio, showing the flexibility of the proposed approach. In terms of turnover, the SSV strategy is competitive with the other portfolio selection rules. Although the estimation window of 240 observations is larger than the portfolio size, the semicovariance matrix estimated with the method of Cumova and Nawrocki (2011) becomes quasi-singular as the number of assets increases. The resulting extreme portfolio weights cause this method to strongly underperform for SBM 100.

The positive performance of the SSV approach also largely holds out-of-sample (see Table 2). While the SSV performs worse than the GMV and the Ballestero (2005) approaches without shrinkage, it still significantly improves over Estrada (2008) and Cumova and Nawrocki (2011). As mentioned before, the performance of the latter strategy quickly degrades as N increases, leading to very poor results with \(N=70\). The good performance of GMV reflects the trade-off between minimizing downside deviation and minimizing estimation error in an out-of-sample exercise where the data generating process changes over time. Although GMV uses a different objective function, i.e., it minimizes variance rather than semivariance, it benefits from using all observations for estimation. As returns in our datasets are not heavily skewed, the benefit of using all observations, and hence of lower estimation error, may outweigh the drawback of using variance instead of semivariance. When combined with shrinkage, the SSV approach achieves the lowest downside deviation for all datasets except the largest, where it matches the downside deviation of GMV. Furthermore, turnover more than halves when shrinkage is used, with basically the same values for SSV \(\theta =1/T\) (shrink) and GMV (shrink).

Table 1 In-sample, monthly data, benchmark equal to sample mean return
Table 2 Out-of-sample, monthly data, benchmark equal to sample mean return

As a robustness analysis, in “Appendix B” we report additional in- and out-of-sample results for daily and quarterly data. For daily data we use an estimation window of 1260 observations, i.e., approximately 5 years.Footnote 12 For quarterly data we use 80 observations in the estimation window, i.e., 20 years (as with monthly data). Furthermore, for all data frequencies we also report results with the benchmark equal to the sample median return (see Bernard et al., 2019). Our findings remain unchanged in these additional settings. The problem affecting the Cumova and Nawrocki (2011) algorithm disappears with daily data (owing to the much larger estimation sample), while it becomes more serious with quarterly data (as fewer observations are used for estimation). Again, out-of-sample, most portfolio selection rules improve over the broad US market in terms of downside deviation and the other risk-return measures, and, across the different portfolio selection rules, the SSV results are the most promising. As optimal shrinkage is not the focus of our analysis, for the out-of-sample results we simply set the shrinkage factor to 0.9 for daily data, 0.8 for monthly data and 0.7 for quarterly data. This corresponds to stronger shrinkage for lower-frequency data, justified by the fact that fewer observations are used for estimation, which is therefore less precise. Selecting an optimal, time-varying value for the shrinkage parameter would likely improve our results, but it is beyond the scope of this work.

6 Conclusions

The SSV matrix estimator introduced in this paper provides a highly precise approximation of the sample semicovariance matrix of a set of assets. By changing the single tuning parameter \(\theta \), the SSV spans the entire range of portfolios between the minimum-semivariance and the minimum-variance portfolio. The low computational cost of this procedure makes it suitable for portfolio optimization problems with many assets, unlike the grid search algorithm, which, despite being accurate, is feasible only for relatively small portfolios. Compared to the other approaches considered in this paper, the proposed method is unbiased in large samples, since it targets the actual portfolio semivariance instead of a heuristic approximation. Our theoretical derivation and numerical findings support these claims. Although we illustrate our algorithm on the relatively simple case of minimum-semivariance optimization, our procedure is very general. With minor modifications, the algorithm can accommodate more sophisticated constraints on the portfolio weights, depending on the application at hand. Furthermore, other objective functions involving the semivariance, beyond the quadratic objective studied in this paper, may be considered in the future within a similar framework.

For real-world data, we show that, in-sample, the proposed SSV approach outperforms the competing portfolio selection rules in terms of downside deviation. Out-of-sample, while the SSV approach underperforms some of the considered portfolio selection rules, it outperforms all of them when combined with shrinkage estimation of the semivariance. This indicates that the higher parameter uncertainty implied by semicovariance estimation can offset the benefits of targeting a more realistic objective function. Overall, our SSV approach provides a robust and flexible framework for semivariance portfolio minimization. Out-of-sample, almost all of the portfolio selection rules we consider improve over the broad market portfolio: they show lower downside deviation, standard deviation and excess kurtosis, and higher skewness, Sortino ratio and Sharpe ratio. Across the different settings, our SSV approach emerges as the most promising portfolio selection technique, especially when combined out-of-sample with a shrinkage approach.

Looking forward, several avenues for further research present themselves. First, applying the smoothed semivariance (SSV) estimator to periods in which asset returns have particularly heavy-tailed or skewed distributions could provide insights into its performance under different market conditions and risk profiles. Although we have considered both the mean and the median as benchmark b, as well as different data frequencies (i.e., daily, monthly and quarterly), further analysis is required to provide additional insights into the applicability and performance of the SSV method. While smooth weights based on the normal cdf worked reasonably well in our numerical studies, a more detailed understanding of the choice of the smoothing function across a wider range of scenarios would be useful. Second, investigating adaptive approaches for selecting the smoothing parameter \(\theta \) in the SSV estimator based on changing market conditions or asset characteristics could enhance its flexibility and robustness. For instance, \(\theta \) might be treated as a tuning parameter in an appropriate data-fitting procedure. Third, further analyses should focus on combining SSV with shrinkage techniques, as well as on developing other procedures to mitigate the effects of parameter uncertainty in semicovariance estimation. Fourth, analyzing the risk-return tradeoff in portfolio optimization using the SSV approach under different risk preferences and investment objectives could offer valuable guidance for investors. Lastly, applying the SSV estimator beyond traditional asset classes, to alternative investments such as cryptocurrencies, commodities or real estate, could broaden its applicability across different investment domains.