1 Introduction

Portfolio optimization is a prominent topic in finance that involves selecting securities from among a large and complex array of candidates to allocate wealth rationally with the goal of maximizing returns and minimizing risks. Although investors often have access to numerous assets, investing in a large number of them can result in high transaction costs and increased complexity in portfolio management. Consequently, most investors are only able to invest in a limited number of assets. In this paper, we address multi-period sparse portfolio selection problems, aimed at developing an optimal strategy for long-term investment through sparse portfolio selection.

Markowitz (1968) proposed the mean-variance (MV) portfolio selection model, which established the foundation of modern portfolio theory. This model defines the mean of returns as the measure of gain and considers the variance of returns as a measure of risk. Mathematically, the classical MV model can be formulated as a quadratic programming problem as follows:

$$\begin{aligned} \begin{array}{cl} \min \limits _{\textbf{x}} & \frac{1}{2}\textbf{x}^\top H\textbf{x}\\ \mathrm{s.t.} & \sum _{i=1}^n \mu _ix_i \ge \rho , \\ & \sum _{i=1}^n x_i = 1, \end{array} \end{aligned}$$
(1)

where \(\textbf{x}=[x_1,x_2,\ldots ,x_n]^\top \) denotes the weight vector, H denotes the covariance matrix of returns, and the objective function is to minimize portfolio risk. \(\mu \) is the expected return vector, and \(\rho \) is the minimum end-of-period wealth value. To enhance realism, many researchers have extended the MV model under various settings, such as minimum investment, maximum investment, and cardinality constraints (see, e.g., Jacob 1974; Perold 1984; Chang et al. 2000; Bertsimas and Cory-Wright 2022). We refer to Zhang et al. (2018), Mencarelli and d’Ambrosio (2019), Cui et al. (2022) for surveys on the MV portfolio selection model and its variants.
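A minimal numerical sketch of model (1) may help fix ideas. The instance below (covariance, returns, and the floor \(\rho \)) is synthetic and purely illustrative, and Python/SciPy is used here rather than the MATLAB environment of Sect. 5.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic instance (illustrative values only): n assets.
n = 5
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)         # positive-definite covariance matrix
mu = rng.uniform(0.02, 0.10, n)     # expected returns
rho = 0.04                          # required minimum expected return

# Model (1): minimize (1/2) x' H x  s.t.  mu' x >= rho,  1' x = 1.
res = minimize(
    fun=lambda x: 0.5 * x @ H @ x,
    x0=np.full(n, 1.0 / n),
    jac=lambda x: H @ x,
    constraints=[
        {"type": "ineq", "fun": lambda x: mu @ x - rho},
        {"type": "eq", "fun": lambda x: x.sum() - 1.0},
    ],
)
x_mv = res.x
print("weights:", np.round(x_mv, 4), "expected return:", mu @ x_mv)
```

Any QP solver would serve equally well here; SLSQP is chosen only because it accepts the two constraints of (1) directly.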

When the number of assets is large and returns are highly correlated, the MV model produces a weight vector with many nonzero entries that are arbitrarily close to zero, making the model unstable and leading to poor out-of-sample performance (Cui et al., 2018; Huang et al., 2021). On the other hand, holding an excessive number of assets increases management difficulty and transaction costs for investors. To address these issues, Gao and Li (2013) developed the cardinality-constrained MV model, which selects a small number of assets from a larger pool. This model is formulated as

$$\begin{aligned} \begin{array}{cl} \min \limits _{\textbf{x}} & \frac{1}{2}\textbf{x}^\top H\textbf{x}+\lambda \Vert \textbf{x}\Vert _0 \\ \mathrm{s.t.} & \sum _{i=1}^n \mu _ix_i \ge \rho , \\ & \sum _{i=1}^n x_i = 1, \end{array} \end{aligned}$$
(2)

where \(\lambda \) is a trade-off parameter, and \(\Vert \textbf{x}\Vert _0\) denotes the number of nonzero elements in \(\textbf{x}\). The model (2) can produce sparse portfolios, which are more attainable in real-life situations. However, the constrained MV model (2) is NP-hard (Gao & Li, 2013) and, in practice, can only be solved approximately by heuristic or convex relaxation methods (Chang et al., 2000; Bertsimas & Shioda, 2009; Anis & Kwon, 2022).
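To see why (2) is combinatorial, consider the following brute-force sketch on made-up data: it enumerates every support of size at most k, solves the restricted QP of (1) on each, and keeps the best \(\ell _0\)-penalized objective. The number of supports grows exponentially in n, which is precisely what the NP-hardness result reflects; this is an illustration, not one of the heuristic or relaxation methods cited above.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Toy instance (illustrative only): n assets, at most k held.
n, k = 6, 2
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)         # positive-definite covariance matrix
mu = rng.uniform(0.02, 0.10, n)     # expected returns
rho, lam = 0.04, 1e-3               # return floor and l0 trade-off

def solve_support(S):
    """Solve the MV QP restricted to the assets in S; None if infeasible."""
    idx = list(S)
    Hs, ms = H[np.ix_(idx, idx)], mu[idx]
    res = minimize(lambda w: 0.5 * w @ Hs @ w,
                   x0=np.full(len(idx), 1.0 / len(idx)),
                   constraints=[{"type": "ineq", "fun": lambda w: ms @ w - rho},
                                {"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    if not res.success:
        return None
    x = np.zeros(n)
    x[idx] = res.x
    return 0.5 * x @ H @ x + lam * len(idx), x

# Exhaustive search over all supports of size <= k (exponential in general).
cands = [out for r in range(1, k + 1)
         for S in combinations(range(n), r)
         if (out := solve_support(S)) is not None]
best_val, x_best = min(cands, key=lambda p: p[0])
print("objective:", best_val, "support:", np.flatnonzero(x_best))
```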

To remedy the NP-hardness of the \(\ell _0\)-norm penalty model (2), convex penalty methods have been developed to produce sparse portfolio selection solutions and improve out-of-sample performance. For instance, DeMiguel et al. (2009) proposed using \(\ell _1\) and squared \(\ell _2\) norm constraints to achieve the minimum variance criterion. It is worth noting that \(\ell _1\)-regularization for MV portfolio selection can be seen as an adaptation of the Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996), which can be solved exactly. Yen and Yen (2014) and Ho et al. (2015) introduced an elastic net penalty in the context of constrained minimum variance portfolio optimization. Fastrich et al. (2015) studied a weighted LASSO approach for minimum variance portfolios and proposed a calibration scheme wherein the weights are selected based on the variability in the volatility of each asset. Various regularization techniques applied in the Markowitz MV framework have been proposed in recent works such as Fastrich et al. (2015), Dai and Wen (2018), Corsaro and De Simone (2019), and Dai and Kang (2021).

The broad appeal of convex penalties such as \(\ell _1\)-regularization in portfolio optimization stems mainly from the fact that the resulting problems can be solved using convex optimization methods (Corsaro & De Simone, 2019; Kremer et al., 2020). However, Fan and Li (2001) showed that the \(\ell _1\)-penalty tends to produce biased estimates and becomes ineffective when both portfolio budget constraints and short-selling constraints are present in portfolio selection. To address this issue, nonconvex penalties that promote sparsity while countering bias by being singular at the origin have been suggested. Indeed, the nonconvex SCAD penalty proposed in Fan and Li (2001) has been used in Fastrich et al. (2015) and Kim et al. (2016) for sparse portfolio selection. Cui et al. (2018) proposed a model for solving sparse portfolio selection problems using nonconvex fraction penalty functions. More recently, Li and Zhang (2022) introduced a nonconvex penalty to promote sparse asset selection for short-term single-period portfolio selection problems. Other nonconvex penalties, such as the \(\ell _q\) norm (\(0< q < 1\)), the capped \(\ell _1\) penalty, and the log penalty, have also been used for sparse portfolio selection (Fastrich et al., 2014; Xu et al., 2016; Benidis et al., 2018). However, there is still a lack of unified frameworks and theoretically guaranteed algorithms for nonconvex penalty-based MV portfolio selection models.

In the actual decision-making process, investors often have the flexibility to adjust their asset positions multiple times depending on market conditions, making it a multi-period process (Li & Ng, 2000; Cui et al., 2014, 2022). Wealth is reallocated at the beginning of each period with the goal of maximizing returns upon exit from the market. Based on this background, Li and Ng (2000) first developed a multi-period MV model, and Cui et al. (2014) extended the MV model to multiple sub-periods under the assumption that short selling is not allowed. Pun and Wong (2019) developed a linear programming model for selecting sparse high-dimensional multi-period portfolios and introduced a constrained \(\ell _1\) minimization approach to directly estimate parameters in the optimal portfolio solution. Nystrup et al. (2019) proposed a model predictive control method based on a multivariate hidden Markov model with time-varying parameters to dynamically optimize an investment portfolio and control drawdowns. Corsaro et al. (2021a, 2021b) recently proposed a fused LASSO model to solve sparse multi-period portfolio problems and utilized the split Bregman algorithm in the implementation. Li et al. (2022) studied multi-period portfolio optimization problems with MV and risk parity asset allocation frameworks. For more information on the multi-period MV portfolio selection model, see the recent survey by Cui et al. (2022).

In this paper, we further study a possibly nonconvex penalty-based MV framework for solving the sparse multi-period portfolio selection problem. For the developed nonconvex optimization model, we propose a generalized alternating direction method of multipliers (ADMM) with theoretically guaranteed convergence. Previous work often used heuristic algorithms, such as genetic algorithms and particle swarm optimization, to solve such models (Chen & Wei, 2019; Silva et al., 2019); however, these methods converge slowly and their convergence cannot be guaranteed. As a benchmark method for structured convex optimization problems, ADMM is widely used in many practical fields (Boyd et al., 2011; Maneesha & Swarup, 2021; Han, 2022). ADMM can be seen as an extension of the augmented Lagrangian method that decomposes the problem and updates primal and dual variables alternately, making the subproblems easy to solve and the method applicable to large-scale optimization problems. Indeed, ADMM has been proposed to solve sparse portfolio problems with convex penalties, and global convergence can be guaranteed (Chen et al., 2020). In recent years, ADMM has been extended to nonconvex optimization problems, with global convergence guaranteed under the Kurdyka-Łojasiewicz framework (Kurdyka, 1998), as shown in Guo et al. (2017), Wu et al. (2017), Themelis and Patrinos (2020), and Boţ and Nguyen (2020). However, due to the special structure of the nonconvex penalty-based multi-period MV model, existing algorithms cannot solve the problem directly with guaranteed convergence.

The contributions of this paper are threefold. Firstly, we propose a possibly nonconvex penalty-based sparse multi-period MV model that includes two possibly nonconvex penalties on the weight vector, one producing a sparse portfolio within each period and the other reducing the number of changes between adjacent periods. This model provides a general framework for solving multi-period MV portfolio selection problems. Secondly, we propose a generalized ADMM to solve the unified model, in which each subproblem can be solved efficiently. Thirdly, a rigorous theoretical analysis of the generalized ADMM is conducted based on the Kurdyka-Łojasiewicz property. The computational scalability of the algorithm and the impressive performance of the presented model are demonstrated through out-of-sample empirical tests in the numerical experiments.

The rest of this paper is organized as follows. In Sect. 2, we propose a unified sparse multi-period MV model with possibly nonconvex penalties. In Sect. 3, we develop a generalized ADMM to solve the novel model. The global convergence of the proposed algorithm is rigorously analyzed in Sect. 4 based on subdifferential theory and the Kurdyka-Łojasiewicz property. We report some numerical results on several datasets from practical applications in Sect. 5. Finally, we conclude this paper in Sect. 6.

2 Nonconvex sparse multi-period mean-variance model

We first recall the fused LASSO (FL) model presented in Corsaro et al. (2021b). More precisely, let m denote the number of sub-periods; the decision taken at time \(j~(j=1,2,\ldots ,m)\) is held over the j-th sub-period \([j, j+1)\) of the investment. Let n denote the number of assets that can be invested in at each sub-period; then the portfolio strategy over the whole investment horizon can be written as

$$\begin{aligned} \textbf{x}=[\textbf{x}_1,\textbf{x}_2,\ldots ,\textbf{x}_m]^\top \in {\mathbb {R}}^N, \end{aligned}$$
(3)

where \(\textbf{x}_j\in {{\mathbb {R}}}^n\) is the portfolio of holdings at the beginning of sub-period \(j~(j=1,2,\ldots ,m)\) and \(N = mn\). Note that \((\textbf{x}_j)_i\) is the portion of the investor’s total wealth invested in asset i at j-th sub-period.

Assuming that \(j=1\) is the initial period, we denote \(\textbf{r}_j\in {\mathbb {R}}^n\) as the expected return vector and \(H_j\in {\mathbb {R}}^{n\times n}\) as the covariance matrix, which is assumed to be positive definite. Then, the FL model can be formulated as follows:

$$\begin{aligned} (\mathrm{FL}) \qquad \begin{array}{cl} \min \limits _{\textbf{x}\in {\mathbb {R}}^N} & \sum _{j=1}^m\left( \frac{1}{2}\textbf{x}_j^\top H_j\textbf{x}_j +\tau _1\Vert \textbf{x}_j\Vert _1\right) +\tau _2\sum _{j=1}^{m-1}\Vert \textbf{x}_{j+1}-\textbf{x}_j\Vert _1 \\ \mathrm{s.t.} & \textbf{x}_1^\top \textbf{1}_n= \xi _0, \\ & \textbf{x}_j^\top \textbf{1}_n=(\textbf{1}_n+\textbf{r}_{j-1})^\top \textbf{x}_{j-1}, ~~j=2,3,\ldots ,m,\\ & \textbf{x}_j^\top \textbf{1}_n\ge (\mathbf{x_{\min }})_{j-1}, ~~j=2,3,\ldots ,m,\\ & (\textbf{1}_n+\textbf{r}_m)^\top \textbf{x}_m \ge (\mathbf{x_{\min }})_m, \end{array} \end{aligned}$$
(4)

where \(\textbf{1}_n\) is the column vector of n ones, \(\xi _0\) is the initial wealth, \(\mathbf{x_{\min }}\) is the vector of expected minimum wealth, and \(\tau _1\) and \(\tau _2\) are trade-off parameters. Note that in the objective function of (4), the quadratic term represents the portfolio risk, namely the sum of all sub-period variances. The \(\ell _1\)-norm is used to promote sparsity in the solution. In particular, the terms \(\Vert \textbf{x}_j\Vert _1~(j=1,2,\ldots ,m)\) and \(\Vert \textbf{x}_{j+1}-\textbf{x}_j\Vert _1~ (j=1,2,\ldots , m-1)\) promote sparsity of the investment in a single period and of the rebalancing between successive periods, respectively. The constraints are all imposed on wealth: the initial budget, the self-financing balance between periods, and the minimum-wealth requirements.

As presented in Corsaro et al. (2021a, 2021b), the model (4) can be reformulated as the following compact form:

$$\begin{aligned} \begin{array}{cl} \min \limits _{\textbf{x}\in {\mathbb {R}}^N} & \frac{1}{2}\textbf{x}^\top H\textbf{x}+\tau _1\Vert \textbf{x}\Vert _1+\tau _2\Vert F\textbf{x}\Vert _1 \\ \mathrm{s.t.} & E\textbf{x} = \textbf{b}, \\ & G\textbf{x}\ge \mathbf{x_{\min }}, \end{array} \end{aligned}$$
(5)

where \(\textbf{b}=(\xi _0,0,0,\ldots ,0)^\top \in {\mathbb {R}}^m\), and the matrices are defined as follows:

$$\begin{aligned} H=\left( \begin{array}{cccc} H_1&\textbf{0}&\cdots &\textbf{0}\\ \textbf{0}&H_2&\ddots &\vdots \\ \vdots &\ddots &\ddots &\textbf{0}\\ \textbf{0}&\cdots &\textbf{0}&H_m \end{array} \right) ,~~~ F=\left( \begin{array}{ccccc} -I&I&\textbf{0}&\cdots &\textbf{0}\\ \textbf{0}&-I&I&\ddots &\vdots \\ \vdots &\ddots &\ddots &\ddots &\textbf{0}\\ \textbf{0}&\cdots &\textbf{0}&-I&I \end{array} \right) , \end{aligned}$$

and

$$\begin{aligned} E=\left( \begin{array}{cccc} \textbf{1}_n^\top &\textbf{0}&\cdots &\textbf{0}\\ -(\textbf{1}_n+\textbf{r}_1)^\top &\textbf{1}_n^\top &\ddots &\vdots \\ \vdots &\ddots &\ddots &\textbf{0}\\ \textbf{0}&\cdots &-(\textbf{1}_n+\textbf{r}_{m-1})^\top &\textbf{1}_n^\top \end{array} \right) ,~~~ G=\left( \begin{array}{ccccc} \textbf{0}&\textbf{1}_n^\top &\textbf{0}&\cdots &\textbf{0}\\ \vdots &\textbf{0}&\ddots &\ddots &\vdots \\ \textbf{0}&\cdots &\cdots &\textbf{0}&\textbf{1}_n^\top \\ \textbf{0}&\cdots &\cdots &\textbf{0}&(\textbf{1}_n+\textbf{r}_m)^\top \end{array} \right) . \end{aligned}$$
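For concreteness, the block operators of (5) can be assembled as follows for a toy instance (sizes and data are made up); E is built directly from the budget and self-financing constraints of (4), and G from the minimum-wealth constraints.

```python
import numpy as np
from scipy.linalg import block_diag

# Toy sizes (illustrative, not from the paper): m sub-periods, n assets.
m, n = 3, 4
N = m * n
rng = np.random.default_rng(1)

# Per-period data: covariance H_j (positive definite) and returns r_j.
Hs = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    Hs.append(M @ M.T + n * np.eye(n))
rs = [rng.uniform(0.0, 0.05, n) for _ in range(m)]
ones, I = np.ones(n), np.eye(n)

# H: N x N block-diagonal risk matrix.
H = block_diag(*Hs)

# F: (m-1)n x N first-difference operator, block rows x_{j+1} - x_j.
F = np.zeros(((m - 1) * n, N))
for j in range(m - 1):
    F[j*n:(j+1)*n, j*n:(j+1)*n] = -I
    F[j*n:(j+1)*n, (j+1)*n:(j+2)*n] = I

# E: m x N wealth-balance operator; row 0 fixes the budget 1'x_1 = xi_0,
# and row j enforces 1'x_{j+1} = (1 + r_j)'x_j.
E = np.zeros((m, N))
for j in range(m):
    E[j, j*n:(j+1)*n] = ones
    if j > 0:
        E[j, (j-1)*n:j*n] = -(ones + rs[j-1])

# G: m x N minimum-wealth operator; rows pick 1'x_{j+1} for j = 1..m-1
# and (1 + r_m)'x_m for the final-wealth constraint.
G = np.zeros((m, N))
for j in range(m - 1):
    G[j, (j+1)*n:(j+2)*n] = ones
G[m-1, (m-1)*n:] = ones + rs[m-1]

print(H.shape, F.shape, E.shape, G.shape)
```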
Table 1: Popular choices of \(\Phi (x)=\sum _{i=1}^N g_{\kappa }(x_i)\) or \(\Psi (x)=\sum _{i=1}^N g_{\kappa }(x_i)\) and their proximal operators

Fig. 1: Illustration of some common penalty functions

Recently, nonconvex penalties have received much attention in sparse learning problems, as they are nearly unbiased and can overcome the limitations of the \(\ell _1\)-norm. Moreover, in many cases \(\ell _1\) regularization has been shown to be suboptimal; for instance, in compressed sensing it cannot recover a signal from the fewest possible measurements (Xu et al., 2012). Therefore, in this paper we propose a general nonconvex penalty mean-variance (GNPMV) model for the multi-period sparse portfolio selection problem, as follows:

$$\begin{aligned} (\mathrm{GNPMV}) \qquad \begin{array}{cl} \min \limits _{\textbf{x}\in {\mathbb {R}}^N} & \frac{1}{2}\textbf{x}^\top H\textbf{x}+\tau _1\Phi (\textbf{x})+\tau _2\Psi (F\textbf{x}) \\ \mathrm{s.t.} & E\textbf{x} = \textbf{b}, \\ & G\textbf{x}\ge \mathbf{x_{\min }}, \end{array} \end{aligned}$$
(6)

where \(\Phi \) and \(\Psi \) are possibly nonconvex penalty functions, and the other notations are the same as those in (4). We list several popular choices of \(\Phi \) and \(\Psi \) in Table 1, including \(\ell _{1/2}\) regularization (Xu et al., 2012), smoothly clipped absolute deviation (SCAD) penalty (Fan & Li, 2001), minimax concave penalty (MCP) (Zhang, 2010a), and capped \(\ell _1\) penalty (CAP) (Zhang, 2010b). In Table 1, we also provide the proximal operator of the nonconvex penalties, which will be useful in solving the subproblems in the implementation. The proximal operator of a function g with \(\lambda >0\) is defined by

$$\begin{aligned} \textrm{prox}_{\lambda g }(t) = \arg \min _x \left\{ g (x) + \frac{1}{2\lambda }\left\| x-t\right\| ^2\right\} . \end{aligned}$$
(7)

In addition, we show several common penalty functions with fixed \(c=2\) and \(\kappa =1\) in Fig. 1 to illustrate the difference between convex and nonconvex penalties.
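Since Table 1 itself is not reproduced here, the following sketch illustrates definition (7): a brute-force grid evaluation of the proximal operator, the closed-form soft-thresholding operator for the \(\ell _1\) penalty, and one common parameterization of the MCP (assumed here, with the same \(c=2\) and \(\kappa =1\) as in Fig. 1; it need not match the paper's Table 1 exactly).

```python
import numpy as np

def prox_numeric(g, t, lam, lo=-5.0, hi=5.0, num=200001):
    """Brute-force evaluation of definition (7) on a grid (illustration only)."""
    x = np.linspace(lo, hi, num)
    return x[np.argmin(g(x) + (x - t) ** 2 / (2.0 * lam))]

def soft_threshold(t, lam):
    """Closed-form prox of lam*|x| (the l1 penalty): shrinks every input."""
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

def mcp(x, kappa=1.0, c=2.0):
    """One common MCP form (assumed): tapered l1, constant beyond c*kappa."""
    ax = np.abs(x)
    return np.where(ax <= c * kappa,
                    kappa * ax - x ** 2 / (2.0 * c),
                    c * kappa ** 2 / 2.0)

lam = 0.5
p_l1 = prox_numeric(np.abs, 3.0, lam)   # matches soft_threshold(3.0, 0.5) = 2.5
p_mcp = prox_numeric(mcp, 3.0, lam)     # 3.0: MCP leaves large inputs unshrunk
print(p_l1, p_mcp)
```

The comparison at t = 3 shows the bias issue discussed above: the \(\ell _1\) prox shrinks the input to 2.5, while the MCP prox returns it unchanged.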

3 A generalized ADMM for solving GNPMV model

The GNPMV model (6) is a nonconvex and nonsmooth optimization problem when nonconvex penalties are chosen. For such models, previous works have typically relied on existing solvers or heuristic algorithms. However, the convergence of such methods can be slow, and theoretical convergence guarantees are lacking.

We present a generalized alternating direction method of multipliers (ADMM) for efficiently solving the nonconvex GNPMV model (6) with guaranteed convergence. To this end, we introduce auxiliary variables \(\textbf{t}\in {\mathbb {R}}^{N}\), \(\textbf{y}\in {\mathbb {R}}^{N-n}\), and \(\textbf{z}\in {\mathbb {R}}^N\), which allow us to reformulate (6) as follows:

$$\begin{aligned} \begin{array}{cl} \min \limits _{\textbf{t},\textbf{y},\textbf{z},\textbf{x}} & \frac{1}{2}\textbf{x}^\top H\textbf{x} +\tau _1\Phi (\textbf{t})+\tau _2\Psi (\textbf{y}) + \Pi _{{{\mathbb {R}}}_+^N}(\textbf{z})\\ \mathrm{s.t.} & \textbf{x}=\textbf{t}, \\ & E\textbf{x} = \textbf{b}, \\ & F\textbf{x}= \textbf{y},\\ & G\textbf{x}+\textbf{z}= \mathbf{x_{\min }}, \end{array} \end{aligned}$$
(8)

where \(\Pi _{{{\mathbb {R}}}_+^N}(\textbf{z})\) is an indicator function that equals 0 if \(\textbf{z} \in {{\mathbb {R}}}_+^N\), and is otherwise infinite.

Let

$$\begin{aligned} A=\left( \begin{array}{c} I\\ E\\ F\\ G \end{array} \right) ,~~~ B=\left( \begin{array}{c} -I\\ 0\\ 0\\ 0 \end{array} \right) ,~~~ C=\left( \begin{array}{c} 0\\ 0\\ -I\\ 0 \end{array} \right) ,~~~ D=\left( \begin{array}{c} 0\\ 0\\ 0\\ I \end{array} \right) ,~~~ \textbf{q}=\left( \begin{array}{c} 0\\ \textbf{b}\\ 0\\ \textbf{x}_{\min } \end{array} \right) . \end{aligned}$$
(9)

Then, we can reformulate the model (8) into the following compact form:

$$\begin{aligned} \begin{array}{cl} \min \limits _{\textbf{t},\textbf{y},\textbf{z},\textbf{x}} & \frac{1}{2}\textbf{x}^\top H\textbf{x}+\tau _1\Phi (\textbf{t})+\tau _2\Psi (\textbf{y}) + \Pi _{{\mathbb {R}}_{+}^{N}}(\textbf{z})\\ \mathrm{s.t.} & A\textbf{x}+B\textbf{t}+C\textbf{y}+D\textbf{z}=\textbf{q}, \end{array} \end{aligned}$$
(10)

where \(\Phi \) and \(\Psi \) are proper, closed, and nonnegative functions that may be nonconvex and nonsmooth.

Define the augmented Lagrangian function of problem (10) as follows:

$$\begin{aligned}&{\mathcal {L}}_\beta ({ \textbf{t},\textbf{y},\textbf{z},\textbf{x},{\gamma }}) = \frac{1}{2}{ \textbf{x}^\top }H{ \textbf{x}}+ \tau _1\Phi (\textbf{t})+\tau _2\Psi (\textbf{y})+\Pi _{{\mathbb {R}}_{+}^{N}}({ \textbf{z}})\nonumber \\&~~~~~~~~+\langle \gamma , A\textbf{x} +B\textbf{t}+C\textbf{y}+D\textbf{z}-\textbf{q}\rangle +\frac{\beta }{2}\Vert A\textbf{x} +B\textbf{t}+C\textbf{y}+D\textbf{z}-\textbf{q}\Vert ^2, \end{aligned}$$
(11)

where \({\gamma }\) is the Lagrangian multiplier corresponding to the equality constraint in (10) and \(\beta >0\) is a penalty parameter. The generalized ADMM framework is presented in Algorithm 1, where the primal and dual variables are updated alternately with respect to the augmented Lagrangian function (11).

Algorithm 1: A generalized ADMM for solving GNPMV model (6)

Let \(\gamma _1\), \(\gamma _2\), and \(\gamma _3\) be the components of \({\gamma }\) corresponding to the Lagrangian multipliers with respect to the constraints \(\textbf{x}=\textbf{t}\), \(F\textbf{x}=\textbf{y}\), and \(G\textbf{x}+\textbf{z}=\textbf{x}_{\min }\) in (8), respectively. We now specify the implementation of subproblems in Algorithm 1:

  • The \(\textbf{t}\)-subproblem (12a) is equivalent to estimating the proximal operator of \(\Phi \), which can be read as

    $$\begin{aligned} \begin{aligned} \textbf{t}^{k+1}&= \arg \min _\textbf{t} \left\{ \tau _1\Phi (\textbf{t})+\frac{\beta }{2}\left\| \textbf{x}^k-\textbf{t}+\frac{{\gamma }_1^k}{\beta }\right\| ^2\right\} \\&= \textrm{prox}_{\frac{\tau _1}{\beta }\Phi }\left( \textbf{x}^k+\frac{{\gamma }_1^k}{\beta }\right) . \end{aligned} \end{aligned}$$
    (14)
  • Similarly, the \(\textbf{y}\)-subproblem (12b) is equivalent to estimating the proximal operator of \(\Psi \) as follows:

    $$\begin{aligned} \begin{aligned} \textbf{y}^{k+1}&= \arg \min _\textbf{y} \left\{ \tau _2\Psi (\textbf{y})+\frac{\beta }{2}\left\| {F\textbf{x}}^k-\textbf{y}+\frac{{\gamma }_2^k}{\beta }\right\| ^2\right\} \\&= \textrm{prox}_{\frac{\tau _2}{\beta }\Psi }\left( {F\textbf{x}}^k+\frac{{\gamma }_2^k}{\beta }\right) . \end{aligned}\end{aligned}$$
    (15)
  • The \(\textbf{z}\)-subproblem (12c) is equivalent to deriving the projection onto \({\mathbb {R}}_{+}^{N}\), which is

    $$\begin{aligned} \begin{aligned} \textbf{z}^{k+1}&= \arg \min _\textbf{z} \left\{ \Pi _{{\mathbb {R}}_{+}^{N}}(\textbf{z})+\frac{\beta }{2}\left\| G\textbf{x}^k+\textbf{z}- \mathbf{x_{\min }}+\frac{{\gamma }_3^k}{\beta }\right\| ^2\right\} \\&= \textrm{Proj}_{{\mathbb {R}}_{+}^{N} }\left( \mathbf{x_{\min }}-\frac{{\gamma }_3^k}{\beta }-G \textbf{x}^k\right) . \end{aligned}\end{aligned}$$
    (16)
  • The \(\textbf{x}\)-subproblem (12d) is equivalent to solving the following linear system:

    $$\begin{aligned} H\textbf{x} +A^\top {\gamma }^k+\beta A^\top (A\textbf{x} +B\textbf{t}^{k+1}+C\textbf{y}^{k+1}+D\textbf{z}^{k+1}-\textbf{q})=0. \end{aligned}$$
    (17)

We can see that the \(\textbf{z}\)-subproblem has an explicit solution, while the \(\textbf{t}\)- and \(\textbf{y}\)-subproblems depend on the choices of \(\Phi \) and \(\Psi \). If the popular nonconvex penalties presented in Table 1 are chosen, the closed-form solutions of the \(\textbf{t}\)- and \(\textbf{y}\)-subproblems can be obtained. The linear system (17) can be efficiently solved using sparse Cholesky factorization (Corsaro et al., 2021a) or the conjugate gradient method (Wright & Nocedal, 1999).
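To illustrate how the updates (14)–(17) fit together, the sketch below runs the scheme with the convex choice \(\Phi =\Psi =\Vert \cdot \Vert _1\) on random stand-in operators (the dimensions and data are made up, and \(\textbf{z}\) is given the dimension of \(G\textbf{x}\); this is an illustration, not the authors' implementation).

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in problem data (illustrative only).
m, n = 3, 4
N, Nf = m * n, (m - 1) * n
M = rng.standard_normal((N, N))
H = M @ M.T / N + np.eye(N)              # positive-definite risk matrix
E = rng.standard_normal((m, N))          # stand-in wealth-balance operator
F = rng.standard_normal((Nf, N))         # stand-in rebalancing operator
G = rng.standard_normal((m, N))          # stand-in minimum-wealth operator
b, x_min = rng.standard_normal(m), rng.standard_normal(m)
tau1 = tau2 = 0.1
beta = 10.0

# Stacked operators as in (9).
A = np.vstack([np.eye(N), E, F, G])
B = np.vstack([-np.eye(N), np.zeros((m + Nf + m, N))])
C = np.vstack([np.zeros((N + m, Nf)), -np.eye(Nf), np.zeros((m, Nf))])
D = np.vstack([np.zeros((N + m + Nf, m)), np.eye(m)])
q = np.concatenate([np.zeros(N), b, np.zeros(Nf), x_min])

def soft(v, lam):                         # prox of lam*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

K = np.linalg.cholesky(H + beta * A.T @ A)   # factor the x-system once
x, t, y = np.zeros(N), np.zeros(N), np.zeros(Nf)
z, gam = np.zeros(m), np.zeros(N + m + Nf + m)

for _ in range(5000):
    g1 = gam[:N]                           # multiplier for x = t
    g2 = gam[N + m:N + m + Nf]             # multiplier for F x = y
    g3 = gam[N + m + Nf:]                  # multiplier for G x + z = x_min
    t = soft(x + g1 / beta, tau1 / beta)                    # (14)
    y = soft(F @ x + g2 / beta, tau2 / beta)                # (15)
    z = np.maximum(x_min - g3 / beta - G @ x, 0.0)          # (16)
    rhs = -A.T @ gam + beta * A.T @ (q - B @ t - C @ y - D @ z)
    x = np.linalg.solve(K.T, np.linalg.solve(K, rhs))       # (17)
    gam = gam + beta * (A @ x + B @ t + C @ y + D @ z - q)  # dual update

resid = np.linalg.norm(A @ x + B @ t + C @ y + D @ z - q)
print("final residual:", resid)
```

Because the matrix \(H+\beta A^\top A\) of the \(\textbf{x}\)-subproblem is fixed across iterations, it is factorized once outside the loop, in the spirit of the factorization remark above.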

4 Convergence analysis

Although the theoretical convergence of ADMM has been studied for various nonconvex optimization problems, such as those in Guo et al. (2017), Themelis and Patrinos (2020), and Wang et al. (2019), the assumptions made in these studies are not always easy to verify or satisfy, especially in concrete applications. Thus, to keep this paper self-contained, we analyze the global convergence of the ADMM in Algorithm 1 for solving the nonconvex portfolio optimization problem (8).

4.1 Preliminaries

For an extended-real-valued function g, the domain of g is defined as

$$\begin{aligned} \textrm{dom} g:=\{\textbf{x}\in {\mathbb {R}}^n\;|\;g(\textbf{x})<\infty \}. \end{aligned}$$

A function g is closed if it is lower semicontinuous and is proper if \(\textrm{dom}g\ne \emptyset \) and \(g(\textbf{x})>-\infty \) for any \(\textbf{x}\in \textrm{dom} g\). For any point \(\textbf{x}\in {\mathbb {R}}^{n}\) and subset \(S \subseteq {\mathbb {R}}^{n}\), the Euclidean distance from \(\textbf{x}\) to S is defined by

$$\begin{aligned} \textrm{dist}(\textbf{x},S):= \inf \big \{\Vert \textbf{y}-\textbf{x}\Vert \; \big | \; \textbf{y}\in S\big \}. \end{aligned}$$

For a proper and closed function \(g:{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\cup \{\infty \}\), a vector \( \textbf{u}\in \partial g(\textbf{x})\) is a subgradient of g at \(\textbf{x}\in \textrm{dom}g\), where \(\partial g\) denotes the subdifferential of g (Rockafellar & Wets, 2009) defined by

$$\begin{aligned} \partial g(\textbf{x}):=\big \{\textbf{u}\in {\mathbb {R}}^n\;|\;\exists \textbf{x}^k\rightarrow \textbf{x},~ \widehat{\partial }g(\textbf{x}^k) \ni \textbf{u}^k \rightarrow \textbf{u} ~\textrm{with}~g(\textbf{x}^k)\rightarrow g(\textbf{x})\big \} \end{aligned}$$
(18)

with \(\widehat{\partial }g(\textbf{x})\) being the set of regular subgradients of g at \(\textbf{x}\):

$$\begin{aligned} \widehat{\partial }g(\textbf{x}):=\big \{\textbf{u}\in {\mathbb {R}}^n~|~g(\textbf{y})&\ge g(\textbf{x})+\langle \textbf{u},\textbf{y}-\textbf{x}\rangle +o(\Vert \textbf{y}-\textbf{x}\Vert ),~\forall \textbf{y}\in {\mathbb {R}}^n\big \}. \end{aligned}$$

As discussed in Rockafellar and Wets (2009), it holds that \(\widehat{\partial }g(\textbf{x})\subseteq \partial g(\textbf{x})\) and both of them are closed. Note that for a continuously differentiable function f, the subdifferential of f reduces to the gradient of f, denoted by \(\nabla f\). Furthermore, if \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) is continuously differentiable and \(g:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\cup \{\infty \}\) is proper and lower semicontinuous, it follows from Rockafellar and Wets (2009) that \(\partial (f+g)=\nabla f+\partial g\). A point \(\textbf{x}^*\) is called (limiting-) critical point or stationary point of a cost function F if it satisfies \(0\in \partial F(\textbf{x}^*)\), and the set of critical points of F is denoted by \(\textrm{crit} F\).
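As a simple illustration of these notions (added here for concreteness), consider the absolute value function \(g(x)=|x|\) on \({\mathbb {R}}\), whose regular and limiting subdifferentials coincide:

$$\begin{aligned} \partial g(x)=\widehat{\partial }g(x)=\left\{ \begin{array}{ll} \{\textrm{sign}(x)\}, & x\ne 0,\\ {[-1,1]}, & x=0, \end{array}\right. \end{aligned}$$

so \(0\in \partial g(0)\) and \(x^*=0\) is a critical point of g even though g is not differentiable there.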

Definition 1

We say that \((\textbf{t}^*, \textbf{y}^*, \textbf{z}^*, \textbf{x}^*, {\gamma }^*)\) is a critical point of the augmented Lagrangian function \( {\mathcal {L}}_\beta (\cdot )\) in (11) if it satisfies

$$\begin{aligned} \left\{ \begin{array}{l} 0\in \tau _1\partial _\textbf{t} \Phi ({\textbf{t}^*})+B^\top {\gamma }^*, \\ 0\in \tau _2\partial _\textbf{y} \Psi ({\textbf{y}^*})+C^\top {\gamma }^*, \\ 0\in \partial _\textbf{z} \Pi _{{\mathbb {R}}_{+}^{N}}({\textbf{z}^*})+D^\top {\gamma }^*, \\ 0=H\textbf{x}^*+A^\top {\gamma }^*,\\ 0=A\textbf{x}^*+B\textbf{t}^*+C\textbf{y}^*+D\textbf{z}^*-\textbf{q}. \end{array}\right. \end{aligned}$$
(19)

It is straightforward to observe that a critical point of the augmented Lagrangian function of (10) corresponds to a KKT point associated with it.

We now introduce the definition of Kurdyka-Łojasiewicz (KL) function and uniform KL property, as borrowed from Attouch et al. (2013); Bolte et al. (2014), respectively. These concepts will aid in establishing global convergence.

Definition 2

Let \(f:{\mathbb {R}}^n\rightarrow (-\infty ,\infty ]\) be a proper and lower semicontinuous function.

(i) The function f is said to have the KL property at \(\textbf{x}^*\in \textrm{dom}(\partial f)\) if there exist \(\eta \in (0,+\infty ]\), a neighborhood U of \(\textbf{x}^*\), and a continuous and concave function \(\varphi :[0,\eta )\rightarrow {\mathbb {R}}_+\) such that

(a) \(\varphi (0)=0\) and \(\varphi \) is continuously differentiable on \((0,\eta )\) with \(\varphi '>0;\)

(b) for all \(\textbf{x}\in U\cap \{\textbf{z}\in {\mathbb {R}}^n|f(\textbf{x}^*)<f(\textbf{z})<f(\textbf{x}^*)+\eta \}\), the following KL inequality holds:

$$\begin{aligned} \varphi '(f(\textbf{x})-f(\textbf{x}^*))\,\textrm{dist}(0,\partial f(\textbf{x}))\ge 1. \end{aligned}$$

(ii) If f satisfies the KL property at each point of \( \textrm{dom}(\partial f)\), then f is called a KL function.
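As a concrete instance of this definition (an illustration, not taken from the paper), the function \(f(x)=x^2\) has the KL property at \(x^*=0\) with \(\varphi (s)=\sqrt{s}\) and any \(\eta >0\): for \(x\ne 0\),

$$\begin{aligned} \varphi '(f(x)-f(x^*))\,\textrm{dist}(0,\partial f(x))=\frac{1}{2\sqrt{x^2}}\cdot |2x|=1\ge 1. \end{aligned}$$

More generally, it is known that semialgebraic functions, which include the penalties listed in Table 1, satisfy the KL property (Attouch et al., 2013; Bolte et al., 2014).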

Throughout this paper, we assume that the objective function of (10) is coercive and there exists at least a KKT point of (10).

4.2 Convergence

In this subsection, we analyze the convergence of Algorithm 1. Recalling the iterative scheme (12)–(13), we first present the first-order optimality conditions of the subproblems in Algorithm 1 as follows:

$$\begin{aligned} \left\{ \begin{array}{l} 0\in \tau _1\partial _\textbf{t} \Phi ({\textbf{t}^{k+1}})+B^\top {\gamma }^k +\beta B^\top (A \textbf{x}^k+B \textbf{t}^{k+1}+C \textbf{y}^{k}+D \textbf{z}^k-\textbf{q}), \\ 0\in \tau _2\partial _\textbf{y} \Psi ({\textbf{y}^{k+1}})+C^\top {\gamma }^k+\beta C^\top (A \textbf{x}^k+B \textbf{t}^{k+1}+C \textbf{y}^{k+1}+D \textbf{z}^k-\textbf{q}), \\ 0\in \partial _\textbf{z} \Pi _{{\mathbb {R}}_{+}^{N}}({\textbf{z}^{k+1}})+D^\top {\gamma }^k+\beta D^\top (A \textbf{x}^k+B \textbf{t}^{k+1}+C \textbf{y}^{k+1}+D \textbf{z}^{k+1}-\textbf{q}), \\ 0=H\textbf{x}^{k+1}+A^\top {\gamma }^{k}+\beta A^\top (A \textbf{x}^{k+1}+B \textbf{t}^{k+1}+C \textbf{y}^{k+1}+D \textbf{z}^{k+1}-\textbf{q}),\\ {\gamma }^{k+1}={\gamma }^k+\beta (A \textbf{x}^{k+1}+B \textbf{t}^{k+1}+C \textbf{y}^{k+1}+D \textbf{z}^{k+1}-\textbf{q}). \end{array}\right. \end{aligned}$$
(20)

In the following, we first present several lemmas to characterize the properties of the sequences generated by Algorithm 1. The proofs of these lemmas can be found in Appendix A.

Lemma 1

Let \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) be the sequence generated by Algorithm 1. Then, for any \(k>0\), we have

$$\begin{aligned} \Vert \gamma ^{k+1}-\gamma ^k\Vert ^2\le \frac{1}{\lambda _{\min }}\Vert A^\top (\gamma ^{k+1}-\gamma ^k)\Vert ^2, \end{aligned}$$

where \(\lambda _{\min }\) is the smallest eigenvalue of \(A^\top A\).

Lemma 2

Let \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) be the sequence generated by Algorithm 1, then the sequence \(\{{\mathcal {L}}_\beta (\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k)\}\) is decreasing, i.e.,

$$\begin{aligned} {\mathcal {L}}_\beta (\textbf{t}^{k+1},\textbf{y}^{k+1},\textbf{z}^{k+1},\textbf{x}^{k+1},{\gamma }^{k+1})-{\mathcal {L}}_\beta (\textbf{t}^{k},\textbf{y}^{k},\textbf{z}^{k},\textbf{x}^{k},{\gamma }^{k}) \le -b\Vert \textbf{x}^{k+1}-\textbf{x}^k\Vert ^2, \end{aligned}$$
(21)

where \(b>0\) is a certain positive constant.

Lemma 3

The sequence \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) generated by Algorithm 1 is bounded.

Lemma 4

Let \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) be the sequence generated by Algorithm 1, then we have

$$\begin{aligned} \underset{k\rightarrow \infty }{\lim }\Vert \textbf{t}^{k+1}-\textbf{t}^k\Vert +\Vert \textbf{y}^{k+1}-\textbf{y}^k\Vert +\Vert \textbf{z}^{k+1}-\textbf{z}^k\Vert +\Vert \textbf{x}^{k+1}-\textbf{x}^k\Vert +\Vert \gamma ^{k+1}-\gamma ^k\Vert =0. \end{aligned}$$

Remark 1

Note that in practical computation the value of \({\hat{\beta }}\) may be too large, which can lead to slow convergence. As suggested in Li and Pong (2016) and Yang et al. (2017), one can initialize the algorithm with a small \(\beta \) less than \({\hat{\beta }}\) and then increase \(\beta \) by a constant ratio whenever \(\beta \le {\hat{\beta }}\) and the sequence generated by the algorithm becomes unbounded or its successive changes do not vanish sufficiently fast. Either \(\beta >{\hat{\beta }}\) is reached after at most finitely many increases, in which case the conclusion of Lemma 4 holds, or the sequence remains bounded and its successive changes go to zero, so the assertions of Lemma 4 hold as well.

We provide the subsequential convergence result in the following theorem, and the proof can be found in Appendix A.5.

Theorem 5

Let \(\beta >\hat{\beta }\) and \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) be the sequence generated by Algorithm 1, then any cluster point \((\textbf{t}^*, \textbf{y}^*, \textbf{z}^*, \textbf{x}^*, \gamma ^*)\) of the sequence \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) is a stationary point of (10).

By utilizing the KL property, we can establish that the sequence generated by Algorithm 1 is globally convergent. The proof of this theorem can be found in Appendix A.7.

Theorem 6

Let \(\beta >\hat{\beta }\) and \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) be the sequence generated by Algorithm 1. Suppose \({\mathcal {L}}_{\beta }\) is a KL function, then the sequence \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k, \textbf{x}^k,\gamma ^k\}\) converges globally to a critical point of (10).

5 Numerical experiments

In this section, we apply the generalized ADMM, i.e., Algorithm 1, to solve the proposed GNPMV model (6). All numerical experiments are implemented in MATLAB 2019a on a 64-bit Windows 10 laptop with an Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz and 16 GB of RAM.

To evaluate the performance of the nonconvex penalty MV model (8), we consider a well-diversified investment and compare the results with those obtained by the 1/n strategy. The 1/n strategy, also called the naive portfolio, invests the same amount of money in all available assets. By recursively applying the 1/n allocation rule, we obtain the expected wealth of the naive portfolio as follows:

$$\begin{aligned} {\tilde{\xi }}=\frac{1}{n}\left( \cdots \left( \frac{1}{n}\left( \frac{\xi _{0}}{n} 1_{n}^\top \left( 1+\textbf{r}_{1}\right) \right) 1_{n}^\top \left( 1+\textbf{r}_{2}\right) \right) \cdots \right) 1_{n}^\top \left( 1+\textbf{r}_{m}\right) , \end{aligned}$$

where \(\xi _0\) denotes the wealth at the beginning of the investment, which is assumed to be one unit without loss of generality, and \(\textbf{r}_j\in {\mathbb {R}}^n,~j=1,2,\ldots ,m,\) is the expected return vector. We set the expected wealth of the naive portfolio to be the minimal expected wealth of each period, i.e., \(\textbf{x}_{\min }\) in (4) and (8) is the vector whose elements are all \({\tilde{\xi }}\). We now introduce some performance measures concerning portfolio risk and cost. Firstly, we compute the ratio between the number of non-zero weights and the total number of weights in the result, called Density, which is

$$\begin{aligned} \text { Density }=\frac{ amount }{N}, \end{aligned}$$

where amount denotes the number of non-zero weights in the result, and N is the total number of weights. This value measures the sparsity of the portfolio and reflects the investor's holding costs.
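As an illustration, the recursive naive-portfolio wealth \({\tilde{\xi }}\) and the Density measure above can be sketched as follows (a minimal Python sketch, assuming the return vectors \(\textbf{r}_j\) are stored as the rows of a NumPy array; the function names are ours, not the paper's):

```python
import numpy as np

def naive_wealth(returns, xi0=1.0):
    """Expected wealth of the 1/n strategy after m periods.

    returns : (m, n) array whose row j holds the return vector r_j.
    xi0     : initial wealth (one unit, as in the paper).
    """
    wealth = xi0
    n = returns.shape[1]
    for r in returns:
        # split wealth equally over n assets, then collect 1_n^T (1 + r_j)
        wealth = (wealth / n) * np.sum(1.0 + r)
    return wealth

def density(weights, tol=1e-8):
    """Ratio of non-zero weights to the total number of weights."""
    return np.count_nonzero(np.abs(weights) > tol) / weights.size
```

With two periods of a uniform 10% and 20% return on two assets, the naive wealth is \(1.1 \times 1.2 = 1.32\), matching the recursion in the displayed formula.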

Secondly, we denote the ratio between the estimated risk of the naive strategy and the estimated risk of the optimal strategy as Ratio, i.e.,

$$\begin{aligned} \text {Ratio}=\frac{{\tilde{\textbf{x}}}^\top H {{\tilde{\textbf{x}}}}}{\textbf{x}_{o}^\top H \textbf{x}_{o}}, \end{aligned}$$

where \(\tilde{\textbf{x}}\) denotes the 1/n portfolio selection, so the numerator represents the estimated risk of the naive portfolio strategy, and \(\textbf{x}_{o}\) denotes the optimal portfolio selection obtained by the tested models, so the denominator represents the estimated risk of the optimal one. This value measures the risk reduction factor relative to the benchmark. If Ratio \(>1\), the model is more efficient than the 1/n portfolio strategy.

Thirdly, we count the number of weight changes, which is a measure of transaction costs. We construct a matrix \(Y\in {\mathbb {R}}^{n \times (m-1)}\) to reflect the change in the weights of the same asset during two adjacent investment periods. Each element of Y indicates whether security i was bought or sold during period j, i.e.,

$$\begin{aligned} {Y}_{i, j}= {\left\{ \begin{array}{ll}1 &{} \text { if }\left| \left( \textbf{x}_{j+1}\right) _{i}-\left( \textbf{x}_{j}\right) _{i}\right| >0, \\ 0 &{} \text { otherwise},\end{array}\right. } \end{aligned}$$

where \(i=1,2, \ldots , n\) and \( j=1,2, \ldots , m-1\). The naive strategy re-executes the equal-allocation decision every period, so its total number of transactions is

$$\begin{aligned} {\tilde{\vartheta }}=(m-1) \times n. \end{aligned}$$

The number of transactions associated with the optimal strategy of the tested models can be expressed as

$$\begin{aligned} \vartheta _{o}=\sum _{i=1}^{n} \sum _{j=1}^{m-1} Y_{i, j}. \end{aligned}$$

To estimate the percentage of transactions of the optimal strategy, we define

$$\begin{aligned} \vartheta =\frac{\vartheta _{o}}{{\tilde{\vartheta }}}. \end{aligned}$$

If \(\vartheta <1\), the tested model effectively reduces the percentage of transactions, and thus reduces transaction costs and yields more profits.
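The risk-reduction Ratio and the transaction measures above can be sketched as follows (an illustrative Python sketch, assuming the multi-period weights are stored as an \(n \times m\) NumPy array whose column j is the portfolio of period j; the function names are ours):

```python
import numpy as np

def risk_ratio(x_naive, x_opt, H):
    """Ratio between the estimated risks of the naive and optimal portfolios."""
    return (x_naive @ H @ x_naive) / (x_opt @ H @ x_opt)

def transaction_fraction(X, tol=1e-8):
    """Fraction theta = theta_o / theta_tilde of transactions relative to naive.

    X : (n, m) array, column j holds the portfolio weights of period j.
    """
    n, m = X.shape
    # Y[i, j] = 1 if asset i is traded between periods j and j + 1
    Y = (np.abs(X[:, 1:] - X[:, :-1]) > tol).astype(int)
    theta_opt = Y.sum()
    theta_naive = (m - 1) * n  # naive strategy trades every asset each period
    return theta_opt / theta_naive
```

A strategy that never rebalances gives \(\vartheta = 0\), while one that trades every asset in every period matches the naive count and gives \(\vartheta = 1\).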

For the implementation of \(\beta \) in Algorithm 1, we adopt a strategy similar to that in Yang et al. (2017), as discussed in Remark 1. We choose \(\beta \) as follows: we initialize \(n_s=0\) and \(\beta =0.5 \hat{\beta }\). In the k-th iteration, we compute

$$\begin{aligned} \begin{aligned} obj^k&=\Vert \textbf{t}^{k}\Vert +\Vert \textbf{y}^{k}\Vert +\Vert \textbf{z}^{k}\Vert , \\ succ\_delta^k&=\Vert \textbf{t}^{k}-\textbf{t}^{k-1}\Vert +\Vert \textbf{y}^{k}-\textbf{y}^{k-1}\Vert +\Vert \textbf{z}^{k}-\textbf{z}^{k-1}\Vert . \end{aligned} \end{aligned}$$

Then, we increase \(n_s\) by 1 if \(succ\_delta^k>0.99\cdot succ\_delta^{k-1}\); clearly, \(n_s\) is nondecreasing in this procedure. We then update \(\beta \) to \(1.1 \beta \) whenever \(\beta \le 1.01 \hat{\beta }\) and either \(n_s \ge 0.3 k\) or \(obj^k>10^{10}\).
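The adaptive rule just described can be sketched as one update step (a schematic Python sketch only; `obj_k`, `succ_delta_k` and `succ_delta_prev` are assumed to be computed from the iterates as in the displayed formulas, and `beta_hat` is the threshold \(\hat{\beta }\) of Lemma 4):

```python
def update_beta(beta, beta_hat, k, n_s, obj_k, succ_delta_k, succ_delta_prev):
    """One step of the heuristic beta update used in the experiments.

    Returns the (possibly increased) beta and the updated counter n_s of
    iterations whose successive change decayed too slowly.
    """
    # count iterations where the successive change decays too slowly
    if succ_delta_k > 0.99 * succ_delta_prev:
        n_s += 1
    # increase beta while it is still (roughly) below the threshold and the
    # iterates look unbounded or the successive change does not vanish fast
    if beta <= 1.01 * beta_hat and (n_s >= 0.3 * k or obj_k > 1e10):
        beta *= 1.1
    return beta, n_s
```

Since \(\beta \) grows geometrically, the threshold \(1.01\hat{\beta }\) is exceeded after finitely many increases, consistent with Remark 1.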

5.1 Numerical performance of ADMM

We first test the performance of Algorithm 1, i.e., ADMM, for solving the proposed GNPMV model (6) with different penalties on the FF48 dataset. The FF48 dataset comes from the Fama and French database,Footnote 1 containing monthly returns for 48 industry sector portfolios from July 1926 to April 2022. We set the investment rebalancing at the end of each year, and test the model with investment periods of 10 and 20 years, i.e., \(m=10,20\). The assets in FF48 are moderately correlated, and the condition number of the covariance matrix is \(cond(H)=O(10^4)\), which indicates good numerical stability.

Table 2 Numerical comparisons between ADMM and CPLEX solver for different models on FF48 dataset

We test the performance of ADMM for solving the GNPMV model (6) with the \(\ell _1\) norm, SCAD and MCP penalties from Table 1, i.e., \(\Phi \) and \(\Psi \) are both chosen to be the same penalty function; the resulting models are denoted by FL, GNPMV\(_\textrm{SCAD}\) and GNPMV\(_\textrm{MCP}\), respectively. We fix \(\tau _1=0.001\) and \(\tau _2=0.01\) for each model, use \(tol:=10^{-4}\) as the stopping criterion, and select the parameters involved in the tested algorithm by simulation. As suggested in Fan and Li (2001), the parameters c and \(\kappa \) can be chosen empirically with cross-validation or generalized cross-validation techniques; by cross-validation, we fix \(c=9\) and \(\kappa =6\) for the SCAD and MCP penalty functions presented in Table 1. The maximum number of iterations is set to 25,000. For each period, we set the expected minimum wealth \((\textbf{x}_{\min })_j\), \(j=1,2,\ldots ,m\), to the expected value produced by the recursive application of the 1/n naive strategy, as presented in Corsaro et al. (2021a).
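For concreteness, the SCAD and MCP penalties can be sketched in their standard scalar forms (a Python sketch using the usual \((\lambda, a)\) and \((\lambda, \gamma)\) parametrizations of Fan and Li (2001) and Zhang's MCP; the exact parametrization with c and \(\kappa \) in Table 1 may differ, and the defaults below are illustrative, not the values used in the experiments):

```python
import numpy as np

def scad(t, lam, a=3.7):
    """Standard SCAD penalty (Fan and Li, 2001), applied elementwise."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(
        t <= lam,
        lam * t,                                                   # linear near zero
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),  # quadratic transition
            lam ** 2 * (a + 1) / 2,                                 # constant tail
        ),
    )

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP), applied elementwise."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= gamma * lam,
                    lam * t - t ** 2 / (2 * gamma),  # concave ramp
                    gamma * lam ** 2 / 2)            # constant tail
```

Both penalties behave like the \(\ell _1\) norm near zero but level off for large weights, which is what avoids the over-shrinkage of large positions that the convex \(\ell _1\) penalty suffers from.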

Since the proposed ADMM is customized, with theoretical guarantees, for solving the portfolio optimization problem (6), we compare it with a general-purpose solver, CPLEX. In Table 2, we report the obtained objective function value (f\(\_\)value), Density (Dens.(%)) and computing time (Time(s)). From the results presented in Table 2, we can see that ADMM obtains higher-quality solutions and costs less computing time than the CPLEX solver.

5.2 Effects of regularization parameters

Table 3 Numerical performance of SCAD penalty model with different choices of \(\tau _{1}\) and \(\tau _{2}\)

For the model (6), the setting of the regularization parameters \(\tau _{1}\) and \(\tau _{2}\) is important for trading off the risk measure and sparsity. Hence, in this subsection, we test the effects of \(\tau _1\) and \(\tau _2\) on the resulting optimal portfolio selection. The parameter \(\tau _{1}\) controls the sparsity within a group and affects the number of non-zero elements in the obtained portfolio selection. The parameter \(\tau _{2}\) characterizes the sparsity of the rebalancing between successive periods, which influences the turnover rate and the transaction cost. In the experiment, we first test the influence of \(\tau _1\) and \(\tau _2\) on GNPMV\(_\textrm{SCAD}\). Specifically, we set \(\Phi \) and \(\Psi \) both to be the SCAD penalty in Table 1, and test GNPMV\(_\textrm{SCAD}\) with \(\tau _{1}, \tau _{2} \in \left\{ 10^{-2}, 10^{-3}, 10^{-4}\right\} \).

Table 4 Numerical performance of MCP penalty model with different choices of \(\tau _{1}\) and \(\tau _{2}\)
Fig. 2
figure 2

Asset weight trend over time. Graphs refer to the FF48 dataset, model NRO with SCAD penalty, with different \(\tau _1\), \(\tau _2\). Top: 10-year investment, Bottom: 20-year investment

We report the numerical performance of GNPMV\(_\textrm{SCAD}\) with different choices of \(\tau _{1}\) and \(\tau _{2}\) for the 10- and 20-year investments on the FF48 dataset in Table 3, including the Density (Dens.(%)), Ratio and the percentage of transactions (\(\vartheta (\%)\)). From the results in the left half of Table 3, we can see that the proportion of non-zero elements (Density) is greatly reduced as \(\tau _{1}\) increases, thus achieving better sparsity and reducing the holding cost. The risk reduction factor (Ratio) is at least 1.46, which indicates that the investment risk of the optimization model is significantly lower than that of the naive portfolio. As \(\tau _{2}\) increases, the percentage of transactions of the optimal strategy \(\vartheta \) generally shows a downward trend and always remains below \(27 \%\), which shows that the regularization parameter \(\tau _{2}\) indeed promotes the smoothing effect between groups, thus reducing transaction costs. The 20-year investment results for the FF48 dataset under different \(\tau _{1}, \tau _{2}\) are also reported in Table 3. In all cases, the optimal portfolio outperforms the naive portfolio in terms of risk and turnover rate. More precisely, for the 10-year investment period on the FF48 dataset, the risk reduction factor Ratio is at least 1.46 and the percentage of transactions \(\vartheta \) is at most \(27 \%\); for the 20-year investment period, Ratio is at least 1.03 and the percentage of transactions is no more than \(20 \%\).

We further test the numerical performance of GNPMV\(_\textrm{MCP}\) with different choices of \(\tau _{1}\) and \(\tau _{2}\) and report the results in Table 4. As expected, all indicators perform well for both the 10-year and 20-year periods. By adjusting the parameters, we track the indicators that measure investment performance, namely the proportion of non-zero elements (Density), the risk reduction factor (Ratio), and the percentage of transactions of the optimal strategy \((\vartheta (\%))\). As listed in Table 4, the risk reduction factor Ratio is always greater than 1.00, which implies that, even with enhanced sparsity to reduce transaction costs, the risk remains lower than that of the naive strategy.

Fig. 3
figure 3

Asset weight trend over time. Graphs refer to the FF48 dataset, model NRO with MCP penalty, with different \(\tau _1\), \(\tau _2\). Top: 10-year investment, Bottom: 20-year investment

In Figs. 2 and 3, the trend of the optimal portfolio weights over time is shown to investigate in depth the differences of GNPMV with different penalties and parameters. In each picture, the number of color blocks in each rebalancing period represents the number of assets allocated, and the height of each color block represents the proportion of the amount allocated to that asset at that time. The graphs show an excellent smoothing effect of the nonconvex penalty term in all cases: the asset weight trends of GNPMV with nonconvex penalties are clearly smooth, and the color blocks become simpler as the parameters are adjusted.

5.3 Numerical comparisons between different penalties

In this subsection, we further compare the performance of GNPMV (6) with convex and nonconvex penalty functions. Specifically, we conduct experiments for GNPMV with the \(\ell _1\) penalty, i.e., FL in (4), on several datasets, i.e., FF48, DJ28, NasdqQ100 and SP500, and compare with GNPMV\(_{\ell _{1/2}}\), GNPMV\(_\textrm{SCAD}\), GNPMV\(_\textrm{MCP}\) and GNPMV\(_\textrm{CAP}\), which correspond to GNPMV (6) with the penalties presented in Table 1. We run the models to achieve the same predetermined target Ratio and compare the sparsity, turnover, short positions and computing time for these five models. To ensure fair comparisons between the models, we utilize a 5-fold cross-validation strategy for selecting the regularization parameters \(\tau _1\) and \(\tau _2\) under the preset target Ratio. Specifically, each dataset is randomly divided into five equally sized portions, four of which are allocated as training sets, i.e., each partition includes \(80\%\) of the data for training and \(20\%\) for testing. For each dataset, we perform cross-validation to compare sparsity under parameter values ranging over \(\{10^{-5},10^{-4},\ldots , 10^{3}\}\). The optimal parameters \(\tau _1\) and \(\tau _2\) are then chosen based on the averaged performance.
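The 5-fold cross-validation over the parameter grid can be sketched as follows (a Python sketch under our own naming; `evaluate_model` is a hypothetical stand-in for fitting GNPMV on a training split and scoring it on the test split, where lower scores are better):

```python
import itertools
import numpy as np

def select_parameters(data, evaluate_model, grid=None, n_folds=5, rng=None):
    """5-fold cross-validation over (tau1, tau2) pairs on a parameter grid.

    evaluate_model(train, test, tau1, tau2) is a user-supplied scoring
    function (hypothetical here); the pair with the lowest mean score wins.
    """
    if grid is None:
        grid = [10.0 ** p for p in range(-5, 4)]  # {1e-5, 1e-4, ..., 1e3}
    rng = np.random.default_rng(rng)
    folds = np.array_split(rng.permutation(len(data)), n_folds)
    best, best_score = None, np.inf
    for tau1, tau2 in itertools.product(grid, grid):
        scores = []
        for i in range(n_folds):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            scores.append(evaluate_model(data[train_idx], data[test_idx], tau1, tau2))
        score = np.mean(scores)
        if score < best_score:
            best, best_score = (tau1, tau2), score
    return best
```

Each of the 81 grid pairs is scored on five 80%/20% splits, and the pair with the best average is kept, mirroring the selection procedure described above.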

Table 5 Numerical comparison for different penalties with the same Ratio in FF48 dataset
Table 6 Numerical comparison for different penalties in DJ28, NasdqQ100 and SP500 dataset

The numerical results, including sparsity (Dens.), turnover rate (\(\vartheta \)), short positions (Shorts) and computing time (Time(s)), are reported in Tables 5 and 6. Table 5 presents the results for \(m=10,~20\) and 30 on the FF48 dataset when the target Ratio is not less than 2.0 and 2.5, respectively. We find that GNPMV with nonconvex penalties performs better in terms of sparsity, turnover rate and short positions in achieving the same target Ratio, while the computing time of solving the various models with ADMM is similar. Taking \(m=20\) as an example, the density of FL is about \(50\%\) and there are many short positions, which makes asset management difficult and costly, whereas GNPMV\(_{\ell _{1/2}}\), GNPMV\(_\textrm{SCAD}\), GNPMV\(_\textrm{MCP}\) and GNPMV\(_\textrm{CAP}\) achieve lower density and reduce the short positions to 0. Table 6 presents the numerical comparisons on the DJ28, NasdqQ100 and SP500 datasets, where GNPMV with nonconvex penalties again outperforms the \(\ell _1\) penalty, i.e., FL, except for GNPMV\(_\textrm{MCP}\) on the SP500 dataset. More importantly, GNPMV with nonconvex penalties can achieve no short positions, which fits the fact that short positions are restricted in many financial markets.

6 Conclusions

We introduced a nonconvex penalty-based mean-variance optimization model for solving multi-period sparse portfolio selection problems. The proposed model provides a unified framework for a broad class of regularized portfolio selection models. To handle the potential nonconvexity of the model, we developed a new solution method, a generalized alternating direction method of multipliers that extends the classical two-block scheme. With the aid of nonconvex optimization theory, we then conducted a rigorous convergence analysis to guarantee the efficiency of the proposed method. Numerical experiments on four datasets illustrate the benefits of the nonconvex penalty model in terms of sparsity within a single period and transactions between adjacent periods. In the future, we plan to extend our work to multi-period portfolio selection under uncertain returns.