1 Introduction

Portfolio optimization is a prominent topic in finance that involves selecting securities from among a large and complex array of candidates to allocate wealth rationally with the goal of maximizing returns and minimizing risks. Although investors often have access to numerous assets, investing in a large number of them can result in high transaction costs and increased complexity in portfolio management. Consequently, most investors are only able to invest in a limited number of assets. In this paper, we address multi-period sparse portfolio selection problems, aimed at developing an optimal strategy for long-term investment through sparse portfolio selection.

Markowitz (1968) proposed the mean-variance (MV) portfolio selection model, which established the foundation of modern portfolio theory. This model defines the mean of returns as the measure of gain and considers the variance of returns as a measure of risk. Mathematically, the classical MV model can be formulated as a quadratic programming problem as follows:

$$\begin{aligned} \begin{array}{cl} \min \limits _{\textbf{x}} & \frac{1}{2}\textbf{x}^\top H\textbf{x}\\ \mathrm{s.t.} & \sum _{i=1}^n \mu _ix_i \ge \rho , \\ & \sum _{i=1}^n x_i = 1, \end{array} \end{aligned}$$
(1)

where \(\textbf{x}=[x_1,x_2,\ldots ,x_n]^\top \) denotes the weight vector, H denotes the covariance matrix of returns, and the objective function is to minimize portfolio risk. \(\mu \) is the expected return vector, and \(\rho \) is the minimum end-of-period wealth value. To enhance realism, many researchers have extended the MV model under various settings, such as minimum investment, maximum investment, and cardinality constraints (see, e.g., Jacob 1974; Perold 1984; Chang et al. 2000; Bertsimas and Cory-Wright 2022). We refer to Zhang et al. (2018), Mencarelli and d’Ambrosio (2019), Cui et al. (2022) for surveys on the MV portfolio selection model and its variants.
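A minimal numerical sketch of model (1) may help fix ideas. The instance below (covariance, returns, and the floor \(\rho \)) is synthetic and purely illustrative, and Python/SciPy is used here rather than the MATLAB environment of Sect. 5.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic instance (illustrative values only): n assets.
n = 5
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)         # positive-definite covariance matrix
mu = rng.uniform(0.02, 0.10, n)     # expected returns
rho = 0.04                          # required minimum expected return

# Model (1): minimize (1/2) x' H x  s.t.  mu' x >= rho,  1' x = 1.
res = minimize(
    fun=lambda x: 0.5 * x @ H @ x,
    x0=np.full(n, 1.0 / n),
    jac=lambda x: H @ x,
    constraints=[
        {"type": "ineq", "fun": lambda x: mu @ x - rho},
        {"type": "eq", "fun": lambda x: x.sum() - 1.0},
    ],
)
x_mv = res.x
print("weights:", np.round(x_mv, 4), "expected return:", mu @ x_mv)
```

Any QP solver would serve equally well here; SLSQP is chosen only because it accepts the two constraints of (1) directly.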

When the number of assets is large and returns are highly correlated, the MV model produces a weight vector with many nonzero entries that are arbitrarily close to zero, making the model unstable and leading to poor out-of-sample performance (Cui et al., 2018; Huang et al., 2021). On the other hand, holding an excessive number of assets increases management difficulty and transaction costs for investors. To address these issues, Gao and Li (2013) developed the cardinality-constrained MV model, which selects a small number of assets from a larger pool. This model is formulated as

$$\begin{aligned} \begin{array}{cl} \min \limits _{\textbf{x}} & \frac{1}{2}\textbf{x}^\top H\textbf{x}+\lambda \Vert \textbf{x}\Vert _0 \\ \mathrm{s.t.} & \sum _{i=1}^n \mu _ix_i \ge \rho , \\ & \sum _{i=1}^n x_i = 1, \end{array} \end{aligned}$$
(2)

where \(\lambda \) is a trade-off parameter, and \(\Vert \textbf{x}\Vert _0\) denotes the number of nonzero elements in \(\textbf{x}\). The model (2) can produce sparse portfolios, which are more attainable in real-life situations. However, the constrained MV model (2) is NP-hard (Gao & Li, 2013) and, in practice, can only be solved approximately by heuristic or convex relaxation methods (Chang et al., 2000; Bertsimas & Shioda, 2009; Anis & Kwon, 2022).
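To see why (2) is combinatorial, consider the following brute-force sketch on made-up data: it enumerates every support of size at most k, solves the restricted QP of (1) on each, and keeps the best \(\ell _0\)-penalized objective. The number of supports grows exponentially in n, which is precisely what the NP-hardness result reflects; this is an illustration, not one of the heuristic or relaxation methods cited above.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Toy instance (illustrative only): n assets, at most k held.
n, k = 6, 2
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)         # positive-definite covariance matrix
mu = rng.uniform(0.02, 0.10, n)     # expected returns
rho, lam = 0.04, 1e-3               # return floor and l0 trade-off

def solve_support(S):
    """Solve the MV QP restricted to the assets in S; None if infeasible."""
    idx = list(S)
    Hs, ms = H[np.ix_(idx, idx)], mu[idx]
    res = minimize(lambda w: 0.5 * w @ Hs @ w,
                   x0=np.full(len(idx), 1.0 / len(idx)),
                   constraints=[{"type": "ineq", "fun": lambda w: ms @ w - rho},
                                {"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    if not res.success:
        return None
    x = np.zeros(n)
    x[idx] = res.x
    return 0.5 * x @ H @ x + lam * len(idx), x

# Exhaustive search over all supports of size <= k (exponential in general).
cands = [out for r in range(1, k + 1)
         for S in combinations(range(n), r)
         if (out := solve_support(S)) is not None]
best_val, x_best = min(cands, key=lambda p: p[0])
print("objective:", best_val, "support:", np.flatnonzero(x_best))
```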

To remedy the NP-hardness of the \(\ell _0\)-norm penalty model (2), convex penalty methods have been developed to produce sparse portfolio selection solutions and improve out-of-sample performance. For instance, DeMiguel et al. (2009) proposed using \(\ell _1\) and squared \(\ell _2\) norm constraints to achieve the minimum variance criterion. It is worth noting that \(\ell _1\)-regularization for MV portfolio selection can be seen as an adaptation of the Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996), which can be solved exactly. Yen and Yen (2014) and Ho et al. (2015) introduced an elastic net penalty in the context of constrained minimum variance portfolio optimization. Fastrich et al. (2015) studied a weighted LASSO approach for minimum variance portfolios and proposed a calibration scheme wherein the weights are selected based on the variability in the volatility of each asset. Various regularization techniques applied in the Markowitz MV framework have been proposed in recent works such as Fastrich et al. (2015), Dai and Wen (2018), Corsaro and De Simone (2019), and Dai and Kang (2021).

The broad appeal of convex penalties such as \(\ell _1\)-regularization in portfolio optimization stems mainly from the fact that the resulting problems can be solved using convex optimization methods (Corsaro & De Simone, 2019; Kremer et al., 2020). However, Fan and Li (2001) showed that the \(\ell _1\)-penalty tends to produce biased estimates and becomes ineffective when both portfolio budget constraints and short-selling constraints are present in portfolio selection. To address this issue, nonconvex penalties that promote sparsity while countering bias by being singular at the origin have been suggested. Indeed, the nonconvex SCAD penalty proposed in Fan and Li (2001) has been used in Fastrich et al. (2015) and Kim et al. (2016) for sparse portfolio selection. Cui et al. (2018) proposed a model for solving sparse portfolio selection problems using nonconvex fraction penalty functions. More recently, Li and Zhang (2022) introduced a nonconvex penalty to promote sparse asset selection for short-term single-period portfolio selection problems. Other nonconvex penalties, such as the \(\ell _q\) norm (\(0< q < 1\)), the capped \(\ell _1\) penalty, and the log penalty, have also been used for sparse portfolio selection (Fastrich et al., 2014; Xu et al., 2016; Benidis et al., 2018). However, there is still a lack of unified frameworks and theoretically guaranteed algorithms for nonconvex penalty-based MV portfolio selection models.

In the actual decision-making process, investors often have the flexibility to adjust their asset positions multiple times depending on market conditions, making it a multi-period process (Li & Ng, 2000; Cui et al., 2014, 2022). Wealth is reallocated at the beginning of each period with the goal of maximizing returns upon exit from the market. Based on this background, Li and Ng (2000) first developed a multi-period MV model, and Cui et al. (2014) extended the MV model to multiple sub-periods under the assumption that short selling is not allowed. Pun and Wong (2019) developed a linear programming model for selecting sparse high-dimensional multi-period portfolios and introduced a constrained \(\ell _1\) minimization approach to directly estimate parameters in the optimal portfolio solution. Nystrup et al. (2019) proposed a model predictive control method based on a multivariate hidden Markov model with time-varying parameters to dynamically optimize an investment portfolio and control drawdowns. Corsaro et al. (2021a, 2021b) recently proposed a fused LASSO model to solve sparse multi-period portfolio problems and utilized the split Bregman algorithm in the implementation. Li et al. (2022) studied multi-period portfolio optimization problems with MV and risk parity asset allocation frameworks. For more information on the multi-period MV portfolio selection model, see the recent survey by Cui et al. (2022).

In this paper, we further study a possibly nonconvex penalty-based MV framework for solving the sparse multi-period portfolio selection problem. For the developed nonconvex optimization model, we propose a generalized alternating direction method of multipliers (ADMM) with theoretically guaranteed convergence. Previous work often used heuristic algorithms, such as genetic algorithms and particle swarm optimization, to solve such models (Chen & Wei, 2019; Silva et al., 2019); however, these methods converge slowly and their convergence cannot be guaranteed. As a benchmark method for structured convex optimization problems, ADMM is widely used in many practical fields (Boyd et al., 2011; Maneesha & Swarup, 2021; Han, 2022). ADMM can be seen as an extension of the augmented Lagrangian method that decomposes the problem and updates primal and dual variables alternately, making the subproblems easy to solve and the method applicable to large-scale optimization problems. Indeed, ADMM has been proposed to solve sparse portfolio problems with convex penalties, and global convergence can be guaranteed (Chen et al., 2020). In recent years, ADMM has been extended to nonconvex optimization problems, with global convergence guaranteed under the Kurdyka-Łojasiewicz framework (Kurdyka, 1998), as shown in Guo et al. (2017), Wu et al. (2017), Themelis and Patrinos (2020), and Boţ and Nguyen (2020). However, due to the special structure of the nonconvex penalty-based multi-period MV model, existing algorithms cannot solve the problem directly with guaranteed convergence.

The contributions of this paper are threefold. Firstly, we propose a possibly nonconvex penalty-based sparse multi-period MV model that includes two possibly nonconvex penalties on the weight vector, one producing a sparse portfolio within each period and the other reducing the number of changes between adjacent periods. This model provides a general framework for solving multi-period MV portfolio selection problems. Secondly, we propose a generalized ADMM to solve the unified model, in which each subproblem can be solved efficiently. Thirdly, a rigorous theoretical analysis of the generalized ADMM is conducted based on the Kurdyka-Łojasiewicz property. The computational scalability of the algorithm and the impressive performance of the presented model are demonstrated through out-of-sample empirical tests in the numerical experiments.

The rest of this paper is organized as follows. In Sect. 2, we propose a unified sparse multi-period MV model with possibly nonconvex penalties. In Sect. 3, we develop a generalized ADMM to solve the novel model. The global convergence of the proposed algorithm is rigorously analyzed in Sect. 4 based on subdifferential theory and the Kurdyka-Łojasiewicz property. We report some numerical results on several datasets from practical applications in Sect. 5. Finally, we conclude this paper in Sect. 6.

2 Nonconvex sparse multi-period mean-variance model

We first recall the fused LASSO (FL) model presented in Corsaro et al. (2021b). More precisely, let m denote the number of sub-periods; the decision taken at time \(j~(j=1,2,\ldots ,m)\) is held over the j-th sub-period \([j, j+1)\) of the investment. Let n denote the number of assets that can be invested in at each sub-period; then the portfolio strategy over the whole investment horizon can be written as

$$\begin{aligned} \textbf{x}=[\textbf{x}_1,\textbf{x}_2,\ldots ,\textbf{x}_m]^\top \in {\mathbb {R}}^N, \end{aligned}$$
(3)

where \(\textbf{x}_j\in {{\mathbb {R}}}^n\) is the portfolio of holdings at the beginning of sub-period \(j~(j=1,2,\ldots ,m)\) and \(N = mn\). Note that \((\textbf{x}_j)_i\) is the portion of the investor’s total wealth invested in asset i at j-th sub-period.

Assuming that \(j=1\) is the initial period, we denote \(\textbf{r}_j\in {\mathbb {R}}^n\) as the expected return vector and \(H_j\in {\mathbb {R}}^{n\times n}\) as the covariance matrix, which is assumed to be positive definite. Then, the FL model can be formulated as follows:

$$\begin{aligned} (\mathrm{FL}) \qquad \begin{array}{cl} \min \limits _{\textbf{x}\in {\mathbb {R}}^N} & \sum _{j=1}^m\left( \frac{1}{2}\textbf{x}_j^\top H_j\textbf{x}_j +\tau _1\Vert \textbf{x}_j\Vert _1\right) +\tau _2\sum _{j=1}^{m-1}\Vert \textbf{x}_{j+1}-\textbf{x}_j\Vert _1 \\ \mathrm{s.t.} & \textbf{x}_1^\top \textbf{1}_n= \xi _0, \\ & \textbf{x}_j^\top \textbf{1}_n=(\textbf{1}_n+\textbf{r}_{j-1})^\top \textbf{x}_{j-1}, ~~j=2,3,\ldots ,m,\\ & \textbf{x}_j^\top \textbf{1}_n\ge (\mathbf{x_{\min }})_{j-1}, ~~j=2,3,\ldots ,m,\\ & (\textbf{1}_n+\textbf{r}_m)^\top \textbf{x}_m \ge (\mathbf{x_{\min }})_m, \end{array} \end{aligned}$$
(4)

where \(\textbf{1}_n\) is the column vector of n ones, \(\xi _0\) is the initial wealth, \(\mathbf{x_{\min }}\) is the vector of expected minimum wealth, and \(\tau _1\) and \(\tau _2\) are trade-off parameters. Note that in the objective function of (4), the quadratic term represents the portfolio risk, namely the sum of all sub-period variances. The \(\ell _1\)-norm is used to promote sparsity in the solution. In particular, the terms \(\Vert \textbf{x}_j\Vert _1~(j=1,2,\ldots ,m)\) and \(\Vert \textbf{x}_{j+1}-\textbf{x}_j\Vert _1~ (j=1,2,\ldots , m-1)\) promote sparsity of the investment in a single period and of the rebalancing between successive periods, respectively. The constraints are all imposed on wealth: the initial budget, the self-financing balance between periods, and the minimum-wealth requirements.

As presented in Corsaro et al. (2021a, 2021b), the model (4) can be reformulated as the following compact form:

$$\begin{aligned} \begin{array}{cl} \min \limits _{\textbf{x}\in {\mathbb {R}}^N} & \frac{1}{2}\textbf{x}^\top H\textbf{x}+\tau _1\Vert \textbf{x}\Vert _1+\tau _2\Vert F\textbf{x}\Vert _1 \\ \mathrm{s.t.} & E\textbf{x} = \textbf{b}, \\ & G\textbf{x}\ge \mathbf{x_{\min }}, \end{array} \end{aligned}$$
(5)

where \(\textbf{b}=(\xi _0,0,0,\ldots ,0)^\top \in {\mathbb {R}}^m\), and the matrices are defined as follows:

$$\begin{aligned} H=\left( \begin{array}{cccc} H_1&\textbf{0}&\cdots &\textbf{0}\\ \textbf{0}&H_2&\ddots &\vdots \\ \vdots &\ddots &\ddots &\textbf{0}\\ \textbf{0}&\cdots &\textbf{0}&H_m \end{array} \right) ,~~~ F=\left( \begin{array}{ccccc} -I&I&\textbf{0}&\cdots &\textbf{0}\\ \textbf{0}&-I&I&\ddots &\vdots \\ \vdots &\ddots &\ddots &\ddots &\textbf{0}\\ \textbf{0}&\cdots &\textbf{0}&-I&I \end{array} \right) , \end{aligned}$$

and

$$\begin{aligned} E=\left( \begin{array}{cccc} \textbf{1}_n^\top &\textbf{0}&\cdots &\textbf{0}\\ -(\textbf{1}_n+\textbf{r}_1)^\top &\textbf{1}_n^\top &\ddots &\vdots \\ \vdots &\ddots &\ddots &\textbf{0}\\ \textbf{0}&\cdots &-(\textbf{1}_n+\textbf{r}_{m-1})^\top &\textbf{1}_n^\top \end{array} \right) ,~~~ G=\left( \begin{array}{ccccc} \textbf{0}&\textbf{1}_n^\top &\textbf{0}&\cdots &\textbf{0}\\ \vdots &\textbf{0}&\ddots &\ddots &\vdots \\ \textbf{0}&\cdots &\cdots &\textbf{0}&\textbf{1}_n^\top \\ \textbf{0}&\cdots &\cdots &\textbf{0}&(\textbf{1}_n+\textbf{r}_m)^\top \end{array} \right) . \end{aligned}$$
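For concreteness, the block operators of (5) can be assembled as follows for a toy instance (sizes and data are made up); E is built directly from the budget and self-financing constraints of (4), and G from the minimum-wealth constraints.

```python
import numpy as np
from scipy.linalg import block_diag

# Toy sizes (illustrative, not from the paper): m sub-periods, n assets.
m, n = 3, 4
N = m * n
rng = np.random.default_rng(1)

# Per-period data: covariance H_j (positive definite) and returns r_j.
Hs = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    Hs.append(M @ M.T + n * np.eye(n))
rs = [rng.uniform(0.0, 0.05, n) for _ in range(m)]
ones, I = np.ones(n), np.eye(n)

# H: N x N block-diagonal risk matrix.
H = block_diag(*Hs)

# F: (m-1)n x N first-difference operator, block rows x_{j+1} - x_j.
F = np.zeros(((m - 1) * n, N))
for j in range(m - 1):
    F[j*n:(j+1)*n, j*n:(j+1)*n] = -I
    F[j*n:(j+1)*n, (j+1)*n:(j+2)*n] = I

# E: m x N wealth-balance operator; row 0 fixes the budget 1'x_1 = xi_0,
# and row j enforces 1'x_{j+1} = (1 + r_j)'x_j.
E = np.zeros((m, N))
for j in range(m):
    E[j, j*n:(j+1)*n] = ones
    if j > 0:
        E[j, (j-1)*n:j*n] = -(ones + rs[j-1])

# G: m x N minimum-wealth operator; rows pick 1'x_{j+1} for j = 1..m-1
# and (1 + r_m)'x_m for the final-wealth constraint.
G = np.zeros((m, N))
for j in range(m - 1):
    G[j, (j+1)*n:(j+2)*n] = ones
G[m-1, (m-1)*n:] = ones + rs[m-1]

print(H.shape, F.shape, E.shape, G.shape)
```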
Table 1: Popular choices of \(\Phi (x)=\sum _{i=1}^N g_{\kappa }(x_i)\) or \(\Psi (x)=\sum _{i=1}^N g_{\kappa }(x_i)\) and their proximal operators

Fig. 1: Illustration of some common penalty functions

Recently, nonconvex penalties have received much attention in sparse learning problems, as they are nearly unbiased and can overcome the limitations of the \(\ell _1\)-norm. Moreover, in many cases \(\ell _1\) regularization has been shown to be suboptimal; for instance, in compressed sensing it cannot recover a signal from the fewest possible measurements (Xu et al., 2012). Therefore, in this paper we propose a general nonconvex penalty mean-variance (GNPMV) model for the multi-period sparse portfolio selection problem, as follows:

$$\begin{aligned} (\mathrm{GNPMV}) \qquad \begin{array}{cl} \min \limits _{\textbf{x}\in {\mathbb {R}}^N} & \frac{1}{2}\textbf{x}^\top H\textbf{x}+\tau _1\Phi (\textbf{x})+\tau _2\Psi (F\textbf{x}) \\ \mathrm{s.t.} & E\textbf{x} = \textbf{b}, \\ & G\textbf{x}\ge \mathbf{x_{\min }}, \end{array} \end{aligned}$$
(6)

where \(\Phi \) and \(\Psi \) are possibly nonconvex penalty functions, and the other notations are the same as those in (4). We list several popular choices of \(\Phi \) and \(\Psi \) in Table 1, including \(\ell _{1/2}\) regularization (Xu et al., 2012), smoothly clipped absolute deviation (SCAD) penalty (Fan & Li, 2001), minimax concave penalty (MCP) (Zhang, 2010a), and capped \(\ell _1\) penalty (CAP) (Zhang, 2010b). In Table 1, we also provide the proximal operator of the nonconvex penalties, which will be useful in solving the subproblems in the implementation. The proximal operator of a function g with \(\lambda >0\) is defined by

$$\begin{aligned} \textrm{prox}_{\lambda g }(t) = \arg \min _x \left\{ g (x) + \frac{1}{2\lambda }\left\| x-t\right\| ^2\right\} . \end{aligned}$$
(7)

In addition, we show several common penalty functions with fixed \(c=2\) and \(\kappa =1\) in Fig. 1 to illustrate the difference between convex and nonconvex penalties.
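Since Table 1 itself is not reproduced here, the following sketch illustrates definition (7): a brute-force grid evaluation of the proximal operator, the closed-form soft-thresholding operator for the \(\ell _1\) penalty, and one common parameterization of the MCP (assumed here, with the same \(c=2\) and \(\kappa =1\) as in Fig. 1; it need not match the paper's Table 1 exactly).

```python
import numpy as np

def prox_numeric(g, t, lam, lo=-5.0, hi=5.0, num=200001):
    """Brute-force evaluation of definition (7) on a grid (illustration only)."""
    x = np.linspace(lo, hi, num)
    return x[np.argmin(g(x) + (x - t) ** 2 / (2.0 * lam))]

def soft_threshold(t, lam):
    """Closed-form prox of lam*|x| (the l1 penalty): shrinks every input."""
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

def mcp(x, kappa=1.0, c=2.0):
    """One common MCP form (assumed): tapered l1, constant beyond c*kappa."""
    ax = np.abs(x)
    return np.where(ax <= c * kappa,
                    kappa * ax - x ** 2 / (2.0 * c),
                    c * kappa ** 2 / 2.0)

lam = 0.5
p_l1 = prox_numeric(np.abs, 3.0, lam)   # matches soft_threshold(3.0, 0.5) = 2.5
p_mcp = prox_numeric(mcp, 3.0, lam)     # 3.0: MCP leaves large inputs unshrunk
print(p_l1, p_mcp)
```

The comparison at t = 3 shows the bias issue discussed above: the \(\ell _1\) prox shrinks the input to 2.5, while the MCP prox returns it unchanged.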

3 A generalized ADMM for solving GNPMV model

The GNPMV model (6) is a nonconvex and nonsmooth optimization problem when nonconvex penalties are chosen. For such models, previous works have typically relied on existing solvers or heuristic algorithms. However, the convergence of such methods can be slow, and theoretical convergence guarantees are lacking.

We present a generalized alternating direction method of multipliers (ADMM) for efficiently solving the nonconvex GNPMV model (6) with guaranteed convergence. To this end, we introduce auxiliary variables \(\textbf{t}\in {\mathbb {R}}^{N}\), \(\textbf{y}\in {\mathbb {R}}^{N-n}\), and \(\textbf{z}\in {\mathbb {R}}^N\), which allow us to reformulate (6) as follows:

$$\begin{aligned} \begin{array}{cl} \min \limits _{\textbf{t},\textbf{y},\textbf{z},\textbf{x}} & \frac{1}{2}\textbf{x}^\top H\textbf{x} +\tau _1\Phi (\textbf{t})+\tau _2\Psi (\textbf{y}) + \Pi _{{{\mathbb {R}}}_+^N}(\textbf{z})\\ \mathrm{s.t.} & \textbf{x}=\textbf{t}, \\ & E\textbf{x} = \textbf{b}, \\ & F\textbf{x}= \textbf{y},\\ & G\textbf{x}+\textbf{z}= \mathbf{x_{\min }}, \end{array} \end{aligned}$$
(8)

where \(\Pi _{{{\mathbb {R}}}_+^N}(\textbf{z})\) is an indicator function that equals 0 if \(\textbf{z} \in {{\mathbb {R}}}_+^N\), and is otherwise infinite.

Let

$$\begin{aligned} A=\left( \begin{array}{c} I\\ E\\ F\\ G \end{array} \right) ,~~~ B=\left( \begin{array}{c} -I\\ 0\\ 0\\ 0 \end{array} \right) ,~~~ C=\left( \begin{array}{c} 0\\ 0\\ -I\\ 0 \end{array} \right) ,~~~ D=\left( \begin{array}{c} 0\\ 0\\ 0\\ I \end{array} \right) ,~~~ \textbf{q}=\left( \begin{array}{c} 0\\ \textbf{b}\\ 0\\ \textbf{x}_{\min } \end{array} \right) . \end{aligned}$$
(9)

Then, we can reformulate the model (8) into the following compact form:

$$\begin{aligned} \begin{array}{cl} \min \limits _{\textbf{t},\textbf{y},\textbf{z},\textbf{x}} & \frac{1}{2}\textbf{x}^\top H\textbf{x}+\tau _1\Phi (\textbf{t})+\tau _2\Psi (\textbf{y}) + \Pi _{{\mathbb {R}}_{+}^{N}}(\textbf{z})\\ \mathrm{s.t.} & A\textbf{x}+B\textbf{t}+C\textbf{y}+D\textbf{z}=\textbf{q}, \end{array} \end{aligned}$$
(10)

where \(\Phi \) and \(\Psi \) are proper, closed, and nonnegative functions that may be nonconvex and nonsmooth.

Define the augmented Lagrangian function of problem (10) as follows:

$$\begin{aligned}&{\mathcal {L}}_\beta ({ \textbf{t},\textbf{y},\textbf{z},\textbf{x},{\gamma }}) = \frac{1}{2}{ \textbf{x}^\top }H{ \textbf{x}}+ \tau _1\Phi (\textbf{t})+\tau _2\Psi (\textbf{y})+\Pi _{{\mathbb {R}}_{+}^{N}}({ \textbf{z}})\nonumber \\&~~~~~~~~+\langle \gamma , A\textbf{x} +B\textbf{t}+C\textbf{y}+D\textbf{z}-\textbf{q}\rangle +\frac{\beta }{2}\Vert A\textbf{x} +B\textbf{t}+C\textbf{y}+D\textbf{z}-\textbf{q}\Vert ^2, \end{aligned}$$
(11)

where \({\gamma }\) is the Lagrangian multiplier corresponding to the equality constraint in (10) and \(\beta >0\) is a penalty parameter. The generalized ADMM framework is presented in Algorithm 1, where the primal and dual variables are updated alternately with respect to the augmented Lagrangian function (11).

Algorithm 1: A generalized ADMM for solving GNPMV model (6)

Let \(\gamma _1\), \(\gamma _2\), and \(\gamma _3\) be the components of \({\gamma }\) corresponding to the Lagrangian multipliers with respect to the constraints \(\textbf{x}=\textbf{t}\), \(F\textbf{x}=\textbf{y}\), and \(G\textbf{x}+\textbf{z}=\textbf{x}_{\min }\) in (8), respectively. We now specify the implementation of subproblems in Algorithm 1:

  • The \(\textbf{t}\)-subproblem (12a) is equivalent to estimating the proximal operator of \(\Phi \), which can be read as

    $$\begin{aligned} \begin{aligned} \textbf{t}^{k+1}&= \arg \min _\textbf{t} \left\{ \tau _1\Phi (\textbf{t})+\frac{\beta }{2}\left\| \textbf{x}^k-\textbf{t}+\frac{{\gamma }_1^k}{\beta }\right\| ^2\right\} \\&= \textrm{prox}_{\frac{\tau _1}{\beta }\Phi }\left( \textbf{x}^k+\frac{{\gamma }_1^k}{\beta }\right) . \end{aligned} \end{aligned}$$
    (14)
  • Similarly, the \(\textbf{y}\)-subproblem (12b) is equivalent to estimating the proximal operator of \(\Psi \) as follows:

    $$\begin{aligned} \begin{aligned} \textbf{y}^{k+1}&= \arg \min _\textbf{y} \left\{ \tau _2\Psi (\textbf{y})+\frac{\beta }{2}\left\| {F\textbf{x}}^k-\textbf{y}+\frac{{\gamma }_2^k}{\beta }\right\| ^2\right\} \\&= \textrm{prox}_{\frac{\tau _2}{\beta }\Psi }\left( {F\textbf{x}}^k+\frac{{\gamma }_2^k}{\beta }\right) . \end{aligned}\end{aligned}$$
    (15)
  • The \(\textbf{z}\)-subproblem (12c) is equivalent to deriving the projection onto \({\mathbb {R}}_{+}^{N}\), which is

    $$\begin{aligned} \begin{aligned} \textbf{z}^{k+1}&= \arg \min _\textbf{z} \left\{ \Pi _{{\mathbb {R}}_{+}^{N}}(\textbf{z})+\frac{\beta }{2}\left\| G\textbf{x}^k+\textbf{z}- \mathbf{x_{\min }}+\frac{{\gamma }_3^k}{\beta }\right\| ^2\right\} \\&= \textrm{Proj}_{{\mathbb {R}}_{+}^{N} }\left( \mathbf{x_{\min }}-\frac{{\gamma }_3^k}{\beta }-G \textbf{x}^k\right) . \end{aligned}\end{aligned}$$
    (16)
  • The \(\textbf{x}\)-subproblem (12d) is equivalent to solving the following linear system:

    $$\begin{aligned} H\textbf{x} +A^\top {\gamma }^k+\beta A^\top (A\textbf{x} +B\textbf{t}^{k+1}+C\textbf{y}^{k+1}+D\textbf{z}^{k+1}-\textbf{q})=0. \end{aligned}$$
    (17)

We can see that the \(\textbf{z}\)-subproblem has an explicit solution, while the \(\textbf{t}\)- and \(\textbf{y}\)-subproblems depend on the choices of \(\Phi \) and \(\Psi \). If the popular nonconvex penalties presented in Table 1 are chosen, the closed-form solutions of the \(\textbf{t}\)- and \(\textbf{y}\)-subproblems can be obtained. The linear system (17) can be efficiently solved using sparse Cholesky factorization (Corsaro et al., 2021a) or the conjugate gradient method (Wright & Nocedal, 1999).
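To illustrate how the updates (14)–(17) fit together, the sketch below runs the scheme with the convex choice \(\Phi =\Psi =\Vert \cdot \Vert _1\) on random stand-in operators (the dimensions and data are made up, and \(\textbf{z}\) is given the dimension of \(G\textbf{x}\); this is an illustration, not the authors' implementation).

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in problem data (illustrative only).
m, n = 3, 4
N, Nf = m * n, (m - 1) * n
M = rng.standard_normal((N, N))
H = M @ M.T / N + np.eye(N)              # positive-definite risk matrix
E = rng.standard_normal((m, N))          # stand-in wealth-balance operator
F = rng.standard_normal((Nf, N))         # stand-in rebalancing operator
G = rng.standard_normal((m, N))          # stand-in minimum-wealth operator
b, x_min = rng.standard_normal(m), rng.standard_normal(m)
tau1 = tau2 = 0.1
beta = 10.0

# Stacked operators as in (9).
A = np.vstack([np.eye(N), E, F, G])
B = np.vstack([-np.eye(N), np.zeros((m + Nf + m, N))])
C = np.vstack([np.zeros((N + m, Nf)), -np.eye(Nf), np.zeros((m, Nf))])
D = np.vstack([np.zeros((N + m + Nf, m)), np.eye(m)])
q = np.concatenate([np.zeros(N), b, np.zeros(Nf), x_min])

def soft(v, lam):                         # prox of lam*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

K = np.linalg.cholesky(H + beta * A.T @ A)   # factor the x-system once
x, t, y = np.zeros(N), np.zeros(N), np.zeros(Nf)
z, gam = np.zeros(m), np.zeros(N + m + Nf + m)

for _ in range(5000):
    g1 = gam[:N]                           # multiplier for x = t
    g2 = gam[N + m:N + m + Nf]             # multiplier for F x = y
    g3 = gam[N + m + Nf:]                  # multiplier for G x + z = x_min
    t = soft(x + g1 / beta, tau1 / beta)                    # (14)
    y = soft(F @ x + g2 / beta, tau2 / beta)                # (15)
    z = np.maximum(x_min - g3 / beta - G @ x, 0.0)          # (16)
    rhs = -A.T @ gam + beta * A.T @ (q - B @ t - C @ y - D @ z)
    x = np.linalg.solve(K.T, np.linalg.solve(K, rhs))       # (17)
    gam = gam + beta * (A @ x + B @ t + C @ y + D @ z - q)  # dual update

resid = np.linalg.norm(A @ x + B @ t + C @ y + D @ z - q)
print("final residual:", resid)
```

Because the matrix \(H+\beta A^\top A\) of the \(\textbf{x}\)-subproblem is fixed across iterations, it is factorized once outside the loop, in the spirit of the factorization remark above.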

4 Convergence analysis

Although the theoretical convergence of ADMM has been studied for various nonconvex optimization problems, such as those in Guo et al. (2017), Themelis and Patrinos (2020), and Wang et al. (2019), the assumptions made in these studies are not always easy to verify or satisfy, especially in concrete applications. Thus, to keep this paper self-contained, we analyze the global convergence of the ADMM in Algorithm 1 for solving the nonconvex portfolio optimization problem (8).

4.1 Preliminaries

For an extended-real-valued function g, the domain of g is defined as

$$\begin{aligned} \textrm{dom} g:=\{\textbf{x}\in {\mathbb {R}}^n\;|\;g(\textbf{x})<\infty \}. \end{aligned}$$

A function g is closed if it is lower semicontinuous and is proper if \(\textrm{dom}g\ne \emptyset \) and \(g(\textbf{x})>-\infty \) for any \(\textbf{x}\in \textrm{dom} g\). For any point \(\textbf{x}\in {\mathbb {R}}^{n}\) and subset \(S \subseteq {\mathbb {R}}^{n}\), the Euclidean distance from \(\textbf{x}\) to S is defined by

$$\begin{aligned} \textrm{dist}(\textbf{x},S):= \inf \big \{\Vert \textbf{y}-\textbf{x}\Vert \; \big | \; \textbf{y}\in S\big \}. \end{aligned}$$

For a proper and closed function \(g:{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\cup \{\infty \}\), a vector \( \textbf{u}\in \partial g(\textbf{x})\) is a subgradient of g at \(\textbf{x}\in \textrm{dom}g\), where \(\partial g\) denotes the subdifferential of g (Rockafellar & Wets, 2009) defined by

$$\begin{aligned} \partial g(\textbf{x}):=\big \{\textbf{u}\in {\mathbb {R}}^n\;|\;\exists \textbf{x}^k\rightarrow \textbf{x},~ \widehat{\partial }g(\textbf{x}^k) \ni \textbf{u}^k \rightarrow \textbf{u} ~\textrm{with}~g(\textbf{x}^k)\rightarrow g(\textbf{x})\big \} \end{aligned}$$
(18)

with \(\widehat{\partial }g(\textbf{x})\) being the set of regular subgradients of g at \(\textbf{x}\):

$$\begin{aligned} \widehat{\partial }g(\textbf{x}):=\big \{\textbf{u}\in {\mathbb {R}}^n~|~g(\textbf{y})&\ge g(\textbf{x})+\langle \textbf{u},\textbf{y}-\textbf{x}\rangle +o(\Vert \textbf{y}-\textbf{x}\Vert ),~\forall \textbf{y}\in {\mathbb {R}}^n\big \}. \end{aligned}$$

As discussed in Rockafellar and Wets (2009), it holds that \(\widehat{\partial }g(\textbf{x})\subseteq \partial g(\textbf{x})\) and both of them are closed. Note that for a continuously differentiable function f, the subdifferential of f reduces to the gradient of f, denoted by \(\nabla f\). Furthermore, if \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) is continuously differentiable and \(g:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\cup \{\infty \}\) is proper and lower semicontinuous, it follows from Rockafellar and Wets (2009) that \(\partial (f+g)=\nabla f+\partial g\). A point \(\textbf{x}^*\) is called (limiting-) critical point or stationary point of a cost function F if it satisfies \(0\in \partial F(\textbf{x}^*)\), and the set of critical points of F is denoted by \(\textrm{crit} F\).
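As a simple illustration of these notions (added here for concreteness), consider the absolute value function \(g(x)=|x|\) on \({\mathbb {R}}\), whose regular and limiting subdifferentials coincide:

$$\begin{aligned} \partial g(x)=\widehat{\partial }g(x)=\left\{ \begin{array}{ll} \{\textrm{sign}(x)\}, & x\ne 0,\\ {[-1,1]}, & x=0, \end{array}\right. \end{aligned}$$

so \(0\in \partial g(0)\) and \(x^*=0\) is a critical point of g even though g is not differentiable there.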

Definition 1

We say that \((\textbf{t}^*, \textbf{y}^*, \textbf{z}^*, \textbf{x}^*, {\gamma }^*)\) is a critical point of the augmented Lagrangian function \( {\mathcal {L}}_\beta (\cdot )\) in (11) if it satisfies

$$\begin{aligned} \left\{ \begin{array}{l} 0\in \tau _1\partial _\textbf{t} \Phi ({\textbf{t}^*})+B^\top {\gamma }^*, \\ 0\in \tau _2\partial _\textbf{y} \Psi ({\textbf{y}^*})+C^\top {\gamma }^*, \\ 0\in \partial _\textbf{z} \Pi _{{\mathbb {R}}_{+}^{N}}({\textbf{z}^*})+D^\top {\gamma }^*, \\ 0=H\textbf{x}^*+A^\top {\gamma }^*,\\ 0=A\textbf{x}^*+B\textbf{t}^*+C\textbf{y}^*+D\textbf{z}^*-\textbf{q}. \end{array}\right. \end{aligned}$$
(19)

It is straightforward to observe that a critical point of the augmented Lagrangian function of (10) corresponds to a KKT point associated with it.

We now introduce the definition of Kurdyka-Łojasiewicz (KL) function and uniform KL property, as borrowed from Attouch et al. (2013); Bolte et al. (2014), respectively. These concepts will aid in establishing global convergence.

Definition 2

Let \(f:{\mathbb {R}}^n\rightarrow (-\infty ,\infty ]\) be a proper and lower semicontinuous function.

(i) The function f is said to have the KL property at \(\textbf{x}^*\in \textrm{dom}(\partial f)\) if there exist \(\eta \in (0,+\infty ]\), a neighborhood U of \(\textbf{x}^*\), and a continuous and concave function \(\varphi :[0,\eta )\rightarrow {\mathbb {R}}_+\) such that

(a) \(\varphi (0)=0\) and \(\varphi \) is continuously differentiable on \((0,\eta )\) with \(\varphi '>0;\)

(b) for all \(\textbf{x}\in U\cap \{\textbf{z}\in {\mathbb {R}}^n|f(\textbf{x}^*)<f(\textbf{z})<f(\textbf{x}^*)+\eta \}\), the following KL inequality holds:

$$\begin{aligned} \varphi '(f(\textbf{x})-f(\textbf{x}^*))\,\textrm{dist}(0,\partial f(\textbf{x}))\ge 1. \end{aligned}$$

(ii) If f satisfies the KL property at each point of \( \textrm{dom}(\partial f)\), then f is called a KL function.
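As a concrete instance of this definition (an illustration, not taken from the paper), the function \(f(x)=x^2\) has the KL property at \(x^*=0\) with \(\varphi (s)=\sqrt{s}\) and any \(\eta >0\): for \(x\ne 0\),

$$\begin{aligned} \varphi '(f(x)-f(x^*))\,\textrm{dist}(0,\partial f(x))=\frac{1}{2\sqrt{x^2}}\cdot |2x|=1\ge 1. \end{aligned}$$

More generally, it is known that semialgebraic functions, which include the penalties listed in Table 1, satisfy the KL property (Attouch et al., 2013; Bolte et al., 2014).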

Throughout this paper, we assume that the objective function of (10) is coercive and there exists at least a KKT point of (10).

4.2 Convergence

In this subsection, we analyze the convergence of Algorithm 1. Recalling the iterative scheme (12)–(13), we first present the first-order optimality conditions of the subproblems in Algorithm 1 as follows:

$$\begin{aligned} \left\{ \begin{array}{l} 0\in \tau _1\partial _\textbf{t} \Phi ({\textbf{t}^{k+1}})+B^\top {\gamma }^k +\beta B^\top (A \textbf{x}^k+B \textbf{t}^{k+1}+C \textbf{y}^{k}+D \textbf{z}^k-\textbf{q}), \\ 0\in \tau _2\partial _\textbf{y} \Psi ({\textbf{y}^{k+1}})+C^\top {\gamma }^k+\beta C^\top (A \textbf{x}^k+B \textbf{t}^{k+1}+C \textbf{y}^{k+1}+D \textbf{z}^k-\textbf{q}), \\ 0\in \partial _\textbf{z} \Pi _{{\mathbb {R}}_{+}^{N}}({\textbf{z}^{k+1}})+D^\top {\gamma }^k+\beta D^\top (A \textbf{x}^k+B \textbf{t}^{k+1}+C \textbf{y}^{k+1}+D \textbf{z}^{k+1}-\textbf{q}), \\ 0=H\textbf{x}^{k+1}+A^\top {\gamma }^{k}+\beta A^\top (A \textbf{x}^{k+1}+B \textbf{t}^{k+1}+C \textbf{y}^{k+1}+D \textbf{z}^{k+1}-\textbf{q}),\\ {\gamma }^{k+1}={\gamma }^k+\beta (A \textbf{x}^{k+1}+B \textbf{t}^{k+1}+C \textbf{y}^{k+1}+D \textbf{z}^{k+1}-\textbf{q}). \end{array}\right. \end{aligned}$$
(20)

In the following, we first present several lemmas to characterize the properties of the sequences generated by Algorithm 1. The proofs of these lemmas can be found in Appendix A.

Lemma 1

Let \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) be the sequence generated by Algorithm 1. Then, for any \(k>0\), we have

$$\begin{aligned} \Vert \gamma ^{k+1}-\gamma ^k\Vert ^2\le \frac{1}{\lambda _{\min }}\Vert A^\top (\gamma ^{k+1}-\gamma ^k)\Vert ^2, \end{aligned}$$

where \(\lambda _{\min }\) is the smallest eigenvalue of \(A^\top A\).

Lemma 2

Let \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) be the sequence generated by Algorithm 1, then the sequence \(\{{\mathcal {L}}_\beta (\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k)\}\) is decreasing, i.e.,

$$\begin{aligned} {\mathcal {L}}_\beta (\textbf{t}^{k+1},\textbf{y}^{k+1},\textbf{z}^{k+1},\textbf{x}^{k+1},{\gamma }^{k+1})-{\mathcal {L}}_\beta (\textbf{t}^{k},\textbf{y}^{k},\textbf{z}^{k},\textbf{x}^{k},{\gamma }^{k}) \le -b\Vert \textbf{x}^{k+1}-\textbf{x}^k\Vert ^2, \end{aligned}$$
(21)

where \(b>0\) is a certain positive constant.

Lemma 3

The sequence \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) generated by Algorithm 1 is bounded.

Lemma 4

Let \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) be the sequence generated by Algorithm 1, then we have

$$\begin{aligned} \underset{k\rightarrow \infty }{\lim }\Vert \textbf{t}^{k+1}-\textbf{t}^k\Vert +\Vert \textbf{y}^{k+1}-\textbf{y}^k\Vert +\Vert \textbf{z}^{k+1}-\textbf{z}^k\Vert +\Vert \textbf{x}^{k+1}-\textbf{x}^k\Vert +\Vert \gamma ^{k+1}-\gamma ^k\Vert =0. \end{aligned}$$

Remark 1

Note that in practical computation the value of \({\hat{\beta }}\) may be too large, which can lead to slow convergence. As suggested in Li and Pong (2016) and Yang et al. (2017), one can initialize the algorithm with a small \(\beta \) less than \({\hat{\beta }}\) and then increase \(\beta \) by a constant ratio whenever \(\beta \le {\hat{\beta }}\) and the sequence generated by the algorithm becomes unbounded or its successive changes do not vanish sufficiently fast. Either \(\beta >{\hat{\beta }}\) is reached after at most finitely many increases, in which case the conclusion of Lemma 4 holds, or the sequence remains bounded and its successive changes go to zero, so the assertions of Lemma 4 hold as well.

We provide the subsequential convergence result in the following theorem, and the proof can be found in Appendix A.5.

Theorem 5

Let \(\beta >\hat{\beta }\) and \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) be the sequence generated by Algorithm 1, then any cluster point \((\textbf{t}^*, \textbf{y}^*, \textbf{z}^*, \textbf{x}^*, \gamma ^*)\) of the sequence \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) is a stationary point of (10).

By utilizing the KL property, we can establish that the sequence generated by Algorithm 1 is globally convergent. The proof of this theorem can be found in Appendix A.7.

Theorem 6

Let \(\beta >\hat{\beta }\) and \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k,\textbf{x}^k,\gamma ^k\}\) be the sequence generated by Algorithm 1. Suppose \({\mathcal {L}}_{\beta }\) is a KL function, then the sequence \(\{\textbf{t}^k,\textbf{y}^k,\textbf{z}^k, \textbf{x}^k,\gamma ^k\}\) converges globally to a critical point of (10).

5 Numerical experiments

In this section, we apply the generalized ADMM, i.e., Algorithm 1, to solve the proposed GNPMV model (6). All numerical experiments are implemented in MATLAB 2019a on a 64-bit Windows 10 laptop with an Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz and 16 GB of RAM.

To evaluate the performance of the nonconvex penalty MV model (8), we consider a well-diversified investment and compare the results with those obtained by the 1/n strategy. The 1/n strategy, also called the naive portfolio, invests the same amount of money in all available assets. By recursively applying the 1/n allocation rule, we obtain the expected wealth of the naive portfolio as follows:

$$\begin{aligned} {\tilde{\xi }}=\frac{1}{n}\left( \cdots \left( \frac{1}{n}\left( \frac{\xi _{0}}{n} 1_{n}^\top \left( 1+\textbf{r}_{1}\right) \right) 1_{n}^\top \left( 1+\textbf{r}_{2}\right) \right) \cdots \right) 1_{n}^\top \left( 1+\textbf{r}_{m}\right) , \end{aligned}$$

where \(\xi _0\) denotes the wealth at the beginning of the investment, which is assumed to be one unit without loss of generality, and \(\textbf{r}_j\in {\mathbb {R}}^n,~j=1,2,\ldots ,m,\) is the expected return vector. We set the expected wealth of the naive portfolio to be the minimal expected wealth of each period, i.e., \(\textbf{x}_{\min }\) in (4) and (8) is the vector whose elements are all \({\tilde{\xi }}\). We now introduce some performance measures concerning portfolio risk and cost. Firstly, we compute the ratio between the number of non-zero weights and the total number of weights in the result, called Density, which is

$$\begin{aligned} \text { Density }=\frac{ amount }{N}, \end{aligned}$$

where amount denotes the number of non-zero weights in the result, and N is the total number of weights. This value measures the sparsity of the portfolio and reflects the investor's holding costs.
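As an illustration, the recursive naive-portfolio wealth \({\tilde{\xi }}\) and the Density measure above can be sketched as follows (a minimal Python sketch, assuming the return vectors \(\textbf{r}_j\) are stored as the rows of a NumPy array; the function names are ours, not the paper's):

```python
import numpy as np

def naive_wealth(returns, xi0=1.0):
    """Expected wealth of the 1/n strategy after m periods.

    returns : (m, n) array whose row j holds the return vector r_j.
    xi0     : initial wealth (one unit, as in the paper).
    """
    wealth = xi0
    n = returns.shape[1]
    for r in returns:
        # split wealth equally over n assets, then collect 1_n^T (1 + r_j)
        wealth = (wealth / n) * np.sum(1.0 + r)
    return wealth

def density(weights, tol=1e-8):
    """Ratio of non-zero weights to the total number of weights."""
    return np.count_nonzero(np.abs(weights) > tol) / weights.size
```

With two periods of a uniform 10% and 20% return on two assets, the naive wealth is \(1.1 \times 1.2 = 1.32\), matching the recursion in the displayed formula.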

Secondly, we denote the ratio between the estimated risk of the naive strategy and the estimated risk of the optimal strategy as Ratio, i.e.,

$$\begin{aligned} \text {Ratio}=\frac{{\tilde{\textbf{x}}}^\top H {{\tilde{\textbf{x}}}}}{\textbf{x}_{o}^\top H \textbf{x}_{o}}, \end{aligned}$$

where \(\tilde{\textbf{x}}\) denotes the 1/n portfolio selection, so the numerator represents the estimated risk of the naive portfolio strategy, and \(\textbf{x}_{o}\) denotes the optimal portfolio selection obtained by the tested models, so the denominator represents the estimated risk of the optimal one. This value measures the risk reduction factor relative to the benchmark. If Ratio \(>1\), the model is more efficient than the 1/n portfolio strategy.

Thirdly, we count the number of weight changes, which is a measure of transaction costs. We construct a matrix \(Y\in {\mathbb {R}}^{n \times (m-1)}\) to reflect the change in the weights of the same asset during two adjacent investment periods. Each element of Y indicates whether security i was bought or sold during period j, i.e.,

$$\begin{aligned} {Y}_{i, j}= {\left\{ \begin{array}{ll}1 &{} \text { if }\left| \left( \textbf{x}_{j+1}\right) _{i}-\left( \textbf{x}_{j}\right) _{i}\right| >0, \\ 0 &{} \text { otherwise},\end{array}\right. } \end{aligned}$$

where \(i=1,2, \ldots , n\) and \( j=1,2, \ldots , m-1\). The naive strategy re-executes the equal-allocation decision every period, so its total number of transactions is

$$\begin{aligned} {\tilde{\vartheta }}=(m-1) \times n. \end{aligned}$$

The number of transactions associated with the optimal strategy of the tested models can be expressed as

$$\begin{aligned} \vartheta _{o}=\sum _{i=1}^{n} \sum _{j=1}^{m-1} Y_{i, j}. \end{aligned}$$

To estimate the percentage of transactions of the optimal strategy, we define

$$\begin{aligned} \vartheta =\frac{\vartheta _{o}}{{\tilde{\vartheta }}}. \end{aligned}$$

If \(\vartheta <1\), the tested model effectively reduces the percentage of transactions, and thus reduces transaction costs and yields more profits.
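The risk-reduction Ratio and the transaction measures above can be sketched as follows (an illustrative Python sketch, assuming the multi-period weights are stored as an \(n \times m\) NumPy array whose column j is the portfolio of period j; the function names are ours):

```python
import numpy as np

def risk_ratio(x_naive, x_opt, H):
    """Ratio between the estimated risks of the naive and optimal portfolios."""
    return (x_naive @ H @ x_naive) / (x_opt @ H @ x_opt)

def transaction_fraction(X, tol=1e-8):
    """Fraction theta = theta_o / theta_tilde of transactions relative to naive.

    X : (n, m) array, column j holds the portfolio weights of period j.
    """
    n, m = X.shape
    # Y[i, j] = 1 if asset i is traded between periods j and j + 1
    Y = (np.abs(X[:, 1:] - X[:, :-1]) > tol).astype(int)
    theta_opt = Y.sum()
    theta_naive = (m - 1) * n  # naive strategy trades every asset each period
    return theta_opt / theta_naive
```

A strategy that never rebalances gives \(\vartheta = 0\), while one that trades every asset in every period matches the naive count and gives \(\vartheta = 1\).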

For the implementation of \(\beta \) in Algorithm 1, we adopt a strategy similar to that in Yang et al. (2017), as discussed in Remark 1. We choose \(\beta \) as follows: we initialize \(n_s=0\) and \(\beta =0.5 \hat{\beta }\). In the k-th iteration, we compute

$$\begin{aligned} \begin{aligned} obj^k&=\Vert \textbf{t}^{k}\Vert +\Vert \textbf{y}^{k}\Vert +\Vert \textbf{z}^{k}\Vert , \\ succ\_delta^k&=\Vert \textbf{t}^{k}-\textbf{t}^{k-1}\Vert +\Vert \textbf{y}^{k}-\textbf{y}^{k-1}\Vert +\Vert \textbf{z}^{k}-\textbf{z}^{k-1}\Vert . \end{aligned} \end{aligned}$$

Then, we increase \(n_s\) by 1 if \(succ\_delta^k>0.99\cdot succ\_delta^{k-1}\); clearly, \(n_s\) is nondecreasing in this procedure. We then update \(\beta \) to \(1.1 \beta \) whenever \(\beta \le 1.01 \hat{\beta }\) and either \(n_s \ge 0.3 k\) or \(obj^k>10^{10}\).
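The adaptive rule just described can be sketched as one update step (a schematic Python sketch only; `obj_k`, `succ_delta_k` and `succ_delta_prev` are assumed to be computed from the iterates as in the displayed formulas, and `beta_hat` is the threshold \(\hat{\beta }\) of Lemma 4):

```python
def update_beta(beta, beta_hat, k, n_s, obj_k, succ_delta_k, succ_delta_prev):
    """One step of the heuristic beta update used in the experiments.

    Returns the (possibly increased) beta and the updated counter n_s of
    iterations whose successive change decayed too slowly.
    """
    # count iterations where the successive change decays too slowly
    if succ_delta_k > 0.99 * succ_delta_prev:
        n_s += 1
    # increase beta while it is still (roughly) below the threshold and the
    # iterates look unbounded or the successive change does not vanish fast
    if beta <= 1.01 * beta_hat and (n_s >= 0.3 * k or obj_k > 1e10):
        beta *= 1.1
    return beta, n_s
```

Since \(\beta \) grows geometrically, the threshold \(1.01\hat{\beta }\) is exceeded after finitely many increases, consistent with Remark 1.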

5.1 Numerical performance of ADMM

We first test the performance of Algorithm 1, i.e., ADMM, for solving the proposed GNPMV model (6) with different penalties on the FF48 dataset. The FF48 dataset comes from the Fama and French database,Footnote 1 containing monthly returns for 48 industry sector portfolios from July 1926 to April 2022. We set the investment rebalancing at the end of each year, and test the model with investment periods of 10 and 20 years, i.e., \(m=10,20\). The assets in FF48 are moderately correlated, and the condition number of the covariance matrix is \(cond(H)=O(10^4)\), which indicates good numerical stability.

Table 2 Numerical comparisons between ADMM and CPLEX solver for different models on FF48 dataset

We test the performance of ADMM for solving the GNPMV model (6) with the \(\ell _1\) norm, SCAD and MCP penalties from Table 1, i.e., \(\Phi \) and \(\Psi \) are both chosen to be the same penalty function; the resulting models are denoted by FL, GNPMV\(_\textrm{SCAD}\) and GNPMV\(_\textrm{MCP}\), respectively. We fix \(\tau _1=0.001\) and \(\tau _2=0.01\) for each model, use \(tol:=10^{-4}\) as the stopping criterion, and select the parameters involved in the tested algorithm by simulation. As suggested in Fan and Li (2001), the parameters c and \(\kappa \) can be chosen empirically with cross-validation or generalized cross-validation techniques; by cross-validation, we fix \(c=9\) and \(\kappa =6\) for the SCAD and MCP penalty functions presented in Table 1. The maximum number of iterations is set to 25,000. For each period, we set the expected minimum wealth \((\textbf{x}_{\min })_j\), \(j=1,2,\ldots ,m\), to the expected value produced by the recursive application of the 1/n naive strategy, as presented in Corsaro et al. (2021a).
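For concreteness, the SCAD and MCP penalties can be sketched in their standard scalar forms (a Python sketch using the usual \((\lambda, a)\) and \((\lambda, \gamma)\) parametrizations of Fan and Li (2001) and Zhang's MCP; the exact parametrization with c and \(\kappa \) in Table 1 may differ, and the defaults below are illustrative, not the values used in the experiments):

```python
import numpy as np

def scad(t, lam, a=3.7):
    """Standard SCAD penalty (Fan and Li, 2001), applied elementwise."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(
        t <= lam,
        lam * t,                                                   # linear near zero
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),  # quadratic transition
            lam ** 2 * (a + 1) / 2,                                 # constant tail
        ),
    )

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP), applied elementwise."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= gamma * lam,
                    lam * t - t ** 2 / (2 * gamma),  # concave ramp
                    gamma * lam ** 2 / 2)            # constant tail
```

Both penalties behave like the \(\ell _1\) norm near zero but level off for large weights, which is what avoids the over-shrinkage of large positions that the convex \(\ell _1\) penalty suffers from.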

Since the proposed ADMM is customized, with theoretical guarantees, for solving the portfolio optimization problem (6), we compare it with a general-purpose solver, CPLEX. In Table 2, we report the obtained objective function value (f\(\_\)value), Density (Dens.(%)) and computing time (Time(s)). From the results presented in Table 2, we can see that ADMM obtains higher-quality solutions and costs less computing time than the CPLEX solver.

5.2 Effects of regularization parameters

Table 3 Numerical performance of SCAD penalty model with different choices of \(\tau _{1}\) and \(\tau _{2}\)

For the model (6), the setting of the regularization parameters \(\tau _{1}\) and \(\tau _{2}\) is important for trading off the risk measure and sparsity. Hence, in this subsection, we test the effects of \(\tau _1\) and \(\tau _2\) on the resulting optimal portfolio selection. The parameter \(\tau _{1}\) controls the sparsity within a group and affects the number of non-zero elements in the obtained portfolio selection. The parameter \(\tau _{2}\) characterizes the sparsity of the rebalancing between successive periods, which influences the turnover rate and the transaction cost. In the experiment, we first test the influence of \(\tau _1\) and \(\tau _2\) on GNPMV\(_\textrm{SCAD}\). Specifically, we set \(\Phi \) and \(\Psi \) both to be the SCAD penalty in Table 1, and test GNPMV\(_\textrm{SCAD}\) with \(\tau _{1}, \tau _{2} \in \left\{ 10^{-2}, 10^{-3}, 10^{-4}\right\} \).

Table 4 Numerical performance of MCP penalty model with different choices of \(\tau _{1}\) and \(\tau _{2}\)
Fig. 2
figure 2

Asset weight trend over time. Graphs refer to the FF48 dataset, model NRO with SCAD penalty, with different \(\tau _1\), \(\tau _2\). Top: 10-year investment, Bottom: 20-year investment

We report the numerical performance of GNPMV\(_\textrm{SCAD}\) with different choices of \(\tau _{1}\) and \(\tau _{2}\) for the 10- and 20-year investments on the FF48 dataset in Table 3, including the Density (Dens.(%)), Ratio and the percentage of transactions (\(\vartheta (\%)\)). From the results in the left half of Table 3, we can see that the proportion of non-zero elements (Density) is greatly reduced as \(\tau _{1}\) increases, thus achieving better sparsity and reducing the holding cost. The risk reduction factor (Ratio) is at least 1.46, which indicates that the investment risk of the optimization model is significantly lower than that of the naive portfolio. As \(\tau _{2}\) increases, the percentage of transactions of the optimal strategy \(\vartheta \) generally shows a downward trend and always remains below \(27 \%\), which shows that the regularization parameter \(\tau _{2}\) indeed promotes the smoothing effect between groups, thus reducing transaction costs. The 20-year investment results for the FF48 dataset under different \(\tau _{1}, \tau _{2}\) are also reported in Table 3. In all cases, the optimal portfolio outperforms the naive portfolio in terms of risk and turnover rate. More precisely, for the 10-year investment period on the FF48 dataset, the risk reduction factor Ratio is at least 1.46 and the percentage of transactions \(\vartheta \) is at most \(27 \%\); for the 20-year investment period, Ratio is at least 1.03 and the percentage of transactions is no more than \(20 \%\).

We further test the numerical performance of GNPMV\(_\textrm{MCP}\) with different choices of \(\tau _{1}\) and \(\tau _{2}\) and report the results in Table 4. As expected, all indicators perform well for both the 10-year and 20-year periods. By adjusting the parameters, we track the indicators that measure investment performance, namely the proportion of non-zero elements (Density), the risk reduction factor (Ratio), and the percentage of transactions of the optimal strategy \((\vartheta (\%))\). As listed in Table 4, the risk reduction factor Ratio is always greater than 1.00, which implies that, even with enhanced sparsity to reduce transaction costs, the risk remains lower than that of the naive strategy.

Fig. 3
figure 3

Asset weight trend over time. Graphs refer to the FF48 dataset, model NRO with MCP penalty, with different \(\tau _1\), \(\tau _2\). Top: 10-year investment, Bottom: 20-year investment

In Figs. 2 and 3, the trend of the optimal portfolio weights over time is shown to investigate in depth the differences of GNPMV with different penalties and parameters. In each picture, the number of color blocks in each rebalancing period represents the number of assets allocated, and the height of each color block represents the proportion of the amount allocated to that asset at that time. The graphs show an excellent smoothing effect of the nonconvex penalty term in all cases: the asset weight trends of GNPMV with nonconvex penalties are clearly smooth, and the color blocks become simpler as the parameters are adjusted.

5.3 Numerical comparisons between different penalties

In this subsection, we further compare the performance of GNPMV (6) with convex and nonconvex penalty functions. Specifically, we conduct experiments for GNPMV with the \(\ell _1\) penalty, i.e., FL in (4), on several datasets, i.e., FF48, DJ28, NasdqQ100 and SP500, and compare with GNPMV\(_{\ell _{1/2}}\), GNPMV\(_\textrm{SCAD}\), GNPMV\(_\textrm{MCP}\) and GNPMV\(_\textrm{CAP}\), which correspond to GNPMV (6) with the penalties presented in Table 1. We run the models to achieve the same predetermined target Ratio and compare the sparsity, turnover, short positions and computing time for these five models. To ensure fair comparisons between the models, we utilize a 5-fold cross-validation strategy for selecting the regularization parameters \(\tau _1\) and \(\tau _2\) under the preset target Ratio. Specifically, each dataset is randomly divided into five equally sized portions, four of which are allocated as training sets, i.e., each partition includes \(80\%\) of the data for training and \(20\%\) for testing. For each dataset, we perform cross-validation to compare sparsity under parameter values ranging over \(\{10^{-5},10^{-4},\ldots , 10^{3}\}\). The optimal parameters \(\tau _1\) and \(\tau _2\) are then chosen based on the averaged performance.
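The 5-fold cross-validation over the parameter grid can be sketched as follows (a Python sketch under our own naming; `evaluate_model` is a hypothetical stand-in for fitting GNPMV on a training split and scoring it on the test split, where lower scores are better):

```python
import itertools
import numpy as np

def select_parameters(data, evaluate_model, grid=None, n_folds=5, rng=None):
    """5-fold cross-validation over (tau1, tau2) pairs on a parameter grid.

    evaluate_model(train, test, tau1, tau2) is a user-supplied scoring
    function (hypothetical here); the pair with the lowest mean score wins.
    """
    if grid is None:
        grid = [10.0 ** p for p in range(-5, 4)]  # {1e-5, 1e-4, ..., 1e3}
    rng = np.random.default_rng(rng)
    folds = np.array_split(rng.permutation(len(data)), n_folds)
    best, best_score = None, np.inf
    for tau1, tau2 in itertools.product(grid, grid):
        scores = []
        for i in range(n_folds):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            scores.append(evaluate_model(data[train_idx], data[test_idx], tau1, tau2))
        score = np.mean(scores)
        if score < best_score:
            best, best_score = (tau1, tau2), score
    return best
```

Each of the 81 grid pairs is scored on five 80%/20% splits, and the pair with the best average is kept, mirroring the selection procedure described above.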

Table 5 Numerical comparison for different penalties with the same Ratio in FF48 dataset
Table 6 Numerical comparison for different penalties in DJ28, NasdqQ100 and SP500 dataset

The numerical results, including sparsity (Dens.), turnover rate (\(\vartheta \)), short positions (Shorts) and computing time (Time(s)), are reported in Tables 5 and 6. Table 5 presents the results for \(m=10,~20\) and 30 on the FF48 dataset when the target Ratio is not less than 2.0 and 2.5, respectively. We find that GNPMV with nonconvex penalties performs better in terms of sparsity, turnover rate and short positions in achieving the same target Ratio, while the computing time of solving the various models with ADMM is similar. Taking \(m=20\) as an example, the density of FL is about \(50\%\) and there are many short positions, which makes asset management difficult and costly, whereas GNPMV\(_{\ell _{1/2}}\), GNPMV\(_\textrm{SCAD}\), GNPMV\(_\textrm{MCP}\) and GNPMV\(_\textrm{CAP}\) achieve lower density and reduce the short positions to 0. Table 6 presents the numerical comparisons on the DJ28, NasdqQ100 and SP500 datasets, where GNPMV with nonconvex penalties again outperforms the \(\ell _1\) penalty, i.e., FL, except for GNPMV\(_\textrm{MCP}\) on the SP500 dataset. More importantly, GNPMV with nonconvex penalties can achieve no short positions, which fits the fact that short positions are restricted in many financial markets.

6 Conclusions

We introduced a nonconvex penalty-based mean-variance optimization model for solving multi-period sparse portfolio selection problems. The proposed model provides a unified framework for a broad class of regularized portfolio selection models. To handle the potential nonconvexity of the model, we developed a new solution method, a generalized alternating direction method of multipliers that extends the classical two-block scheme. With the aid of nonconvex optimization theory, we then conducted a rigorous convergence analysis to guarantee the efficiency of the proposed method. Numerical experiments on four datasets illustrate the benefits of the nonconvex penalty model in terms of sparsity within a single period and transactions between adjacent periods. In the future, we plan to extend our work to multi-period portfolio selection under uncertain returns.