1 Introduction

In many real-world applications of mathematical programming models, the continuous decision variables x_i, i=1,⋯,n, have to be confined to a disconnected set {0}∪[l_i,u_i], i=1,⋯,n, and the number of nonzero variables x_i has to be bounded above by a certain number K<n. In general, a variable x_i∈{0}∪[l_i,u_i] is referred to as a semi-continuous variable, where 0∈[l_i,u_i] is also allowed, and the constraint controlling the number of nonzero variables is referred to as a cardinality constraint. In production planning, semi-continuous variables can be used to describe the state of a production process that is either turned off (inactive), in which case nothing is produced, or turned on (active), in which case the production amount has to lie in a certain interval due to managerial and technological considerations. In portfolio selection optimization models, semi-continuous variables are closely related to the so-called minimum buy-in threshold, which prevents investors from holding a small position in any asset, while the cardinality constraint limits the total number of different assets in the optimal portfolio due to transaction costs and managerial concerns. The cardinality constraint is particularly important in portfolio management using an index tracking strategy, where a market benchmark index is tracked by a small group of assets.

The concepts of semi-continuous variables and cardinality constraints can be generalized to a decision vector of the form \(x=(x_{1},\cdots,x_{p})^{T}\in\Re^{n_{1}}\times\cdots\times\Re^{n_{p}}\) with n_1+⋯+n_p=n. The sub-vector \(x_{i}\in\Re^{n_{i}}\) is called semi-continuous if

$$A_ix_i\leqslant b_iy_i,\quad y_i\in\{0,1\}, \quad\quad (1)$$

where \(A_{i}\in\Re^{m_{i}\times n_{i}}\), \(b_{i}\in\Re^{m_{i}}\), and A_ix_i⩽0 implies x_i=0. For x=(x_1,⋯,x_p)^T∈ℜ^n, the cardinality constraint for x can be expressed as

$$\operatorname{card}(x)\leqslant K, \quad\quad (2)$$

where \(\operatorname{card}(x)\) is defined as the number of i∈{1,⋯,p} such that x_i≠0. Using the binary variables y_i∈{0,1} in (1), the cardinality constraint (2) is equivalent to e^Ty⩽K, where e is the column vector of all ones and y∈{0,1}^p.
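As a small illustration (not from the survey; the vector below is an assumption), the equivalence between \(\operatorname{card}(x)\leqslant K\) and the indicator-variable constraint e^Ty⩽K can be checked directly:

```python
def card(x, tol=0.0):
    # number of nonzero entries of x (the cardinality, or "0-norm")
    return sum(1 for xi in x if abs(xi) > tol)

def indicator_bound(x, K):
    # y_i = 1 iff x_i != 0; the cardinality constraint becomes e^T y <= K
    y = [1 if xi != 0 else 0 for xi in x]
    return sum(y) <= K

x = [0.0, 1.5, 0.0, -2.0]
assert card(x) == 2
assert indicator_bound(x, K=2) and not indicator_bound(x, K=1)
```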

A general formulation of mathematical programming with semi-continuous variables and cardinality constraint can be expressed as the following mixed-integer programming:
$$(\mathrm{P})\quad \min\bigl\{F(x,y,z) \mid A_ix_i\leqslant b_iy_i,\ i=1,\cdots,p,\ e^Ty\leqslant K,\ (x,y,z)\in\Omega,\ y\in\{0,1\}^p\bigr\},$$
where F(x,y,z): \((\Re^{n_{1}}\times\cdots\times\Re^{n_{p}})\times\Re ^{p}\times\Re^{q}\to\Re\) is a convex function, the set Ω⊆ℜ^n×ℜ^p×ℜ^q represents the general constraints on (x,y,z), and K is an integer with 1⩽K⩽p. Problem (P) is, in general, NP-hard, as testing the feasibility of (P) with linear constraints is already NP-complete (see [7]). We point out that there are polynomially solvable cases of (P); for instance, a class of polynomially solvable cardinality-constrained quadratic optimization problems was identified in [28].

In this paper, we survey some of the recent advances in optimization problems with semi-continuous variables and cardinality constraints. This survey is motivated by the continuously increasing interest in problem (P) in recent years from the operations research, management science, financial engineering and engineering communities. In the sequel, we only consider the situation where the objective function of (P) has the following separable form:

$$F(x,y,z)=f(x)+c^Ty+g(z), \quad\quad (3)$$

where f(x) and g(z) are convex cost functions of x and z, respectively, and c∈ℜ^n is the vector of fixed-cost coefficients associated with the semi-continuous variables.

In the next section, we describe some examples of problem (P) arising from different real-world applications. One of the efficient techniques for improving the continuous relaxation of (P) is the construction of the convex envelope of a univariate function over a semi-continuous variable using the perspective function. In Sect. 3, we describe the perspective reformulation of problem (P) when f(x) in (3) is a separable function. In particular, we discuss two tractable perspective reformulations: the SOCP reformulation and the perspective cut reformulation. A dual method for deriving the perspective reformulation is also presented. In Sect. 4, we focus on the quadratic case of (P). We give an SDP approach for computing the “best” diagonal decomposition of the quadratic objective function. Specialized branch-and-bound and branch-and-cut methods for the quadratic case of (P) are also discussed. In Sect. 5, we discuss various approximate methods and techniques for dealing with the cardinality or sparsity constraint. Finally, we give some concluding remarks in Sect. 6.

2 Examples of Applications

In this section, we describe some examples of problem (P) arising from portfolio selection, compressed sensing and subset selection, the quadratic uncapacitated facility location problem, and the unit commitment problem in power systems.

Example 1

(Portfolio selection)

Suppose there are n risky assets in the financial market with random return vector R=(R 1,⋯,R n )T. The expected return vector and the covariance matrix of R are given as μ and Q, respectively. A mean-variance portfolio selection model with cardinality and minimum buy-in threshold can be modeled as
$$(\mathrm{MV})\quad \begin{array}{ll} \min & x^TQx\\ \mathrm{s.t.} & \mu^Tx\geqslant\rho,\quad e^Tx=1,\\ & \alpha_iy_i\leqslant x_i\leqslant u_iy_i,\ i=1,\cdots,n,\\ & e^Ty\leqslant K,\quad y\in\{0,1\}^n, \end{array}$$
where x_i∈ℜ represents the proportion of the total capital invested in the ith asset, ρ is a prescribed return level set by the investor, α_i∈(0,1) represents the minimum transaction amount of the ith asset, and u_i∈(α_i,1] is the maximum position of the ith asset.

Portfolio selection problems with cardinality and minimum threshold constraints have been studied by many researchers (see, e.g., [6, 7, 9, 11, 16, 29, 37, 44]). The model (MV) has also been investigated in the finance literature in the context of limited diversification, small portfolios and portfolio selection models with real features (see, e.g., [8, 33, 34, 42, 46]). In the literature, there are also heuristic procedures for problem (MV) based on metaheuristics such as genetic algorithms, tabu search, and simulated annealing (see, e.g., [12, 15, 20, 40, 43]). However, these metaheuristics are not guaranteed to find the optimal, or even a satisfactory near-optimal, solution of (MV).

Example 2

(Compressed sensing and subset selection)

Compressed sensing is an important problem in signal processing (see, e.g., [10] and the references therein). The problem can be formulated as
$$(\mathrm{CS})\quad \min\bigl\{\|Ax-b\|_2^2 \mid \|x\|_0\leqslant K\bigr\},$$
where A∈ℜ^{m×n} is a data matrix, b∈ℜ^m is an observation vector, and 1⩽K⩽n is an integer controlling the sparsity of the solution. In the compressed sensing problem, it is often assumed that m<n.

In multivariate linear regression, we are given m observed data points (a_i,b_i) with a_i∈ℜ^n and b_i∈ℜ. The goal is to minimize the least-squares measure \(\sum^{m}_{i=1}(a_{i}^{T}x-b_{i})^{2}\) using only a subset of the prediction variables in x. This subset selection problem then has the same form as (CS) (see [3, 6, 41]). In contrast to compressed sensing, the number of data points in subset selection is often much larger than the dimension of the data (m>n).

In practice, we can always impose lower and upper bounds on x, i.e., −l_i⩽x_i⩽u_i, i=1,⋯,n, for some sufficiently large positive numbers l_i and u_i. Thus, (CS) is a special case of (P).

Example 3

(Separable quadratic UFL)

Consider a set of customers N={1,⋯,n} and a set of facilities M={1,⋯,m}. A fixed cost c_i is incurred if facility i∈M is opened, and the total number of opened facilities should be no more than K, where 1⩽K⩽m. All customers have unit demand that can be satisfied from the opened facilities. Let y_i∈{0,1} indicate whether or not facility i is opened, and let x_{ij} denote the fraction of the demand of customer j satisfied from facility i. The transportation cost is defined by \(q_{ij}x_{ij}^{2}\). The separable quadratic uncapacitated facility location problem [31] can then be formulated as
$$(\mathrm{SQUFL})\quad \begin{array}{ll} \min & \sum_{i=1}^m c_iy_i+\sum_{i=1}^m\sum_{j=1}^n q_{ij}x_{ij}^2\\ \mathrm{s.t.} & \sum_{i=1}^m x_{ij}=1,\ j=1,\cdots,n,\\ & 0\leqslant x_{ij}\leqslant y_i,\ i=1,\cdots,m,\ j=1,\cdots,n,\\ & e^Ty\leqslant K,\quad y\in\{0,1\}^m. \end{array}$$
The problem (SQUFL) is a generalization of the classical linear uncapacitated facility location problem in combinatorial optimization. We see that the decision vector can be written as x=(x_1,⋯,x_m)∈ℜ^{mn}, where x_i=(x_{i1},⋯,x_{in})^T for i=1,⋯,m. Since 0⩽x_i⩽y_ie, x_i is semi-continuous for i=1,⋯,m.

Example 4

(Unit commitment problem)

This is a problem arising from electrical power production (see [21, 25]). Given a set I of thermal generating units, each unit i∈I is either turned off or turned on with the power output lying between \(p^{i}_{\min}\) and \(p^{i}_{\max}\). The cost function of the ith unit is a quadratic function of the power output p: f_i(p)=a_ip^2+b_ip+c_i, i∈I. The power demand over time period t∈T is d_t, where T is the set of time periods in the planning horizon. The unit commitment problem is to generate power to meet the demand while minimizing the total cost. The operation of thermal units has to satisfy certain minimum up- and down-time constraints: whenever a unit is turned on or turned off, it must remain committed or decommitted for a certain time. Let u_{it}∈{0,1} represent the commitment status of unit i at time period t, and x_{it}⩾0 the power output of unit i at time period t. Let U denote the set of u_{it} satisfying the minimum up- and down-time constraints. The basic unit commitment problem can then be formulated as a mixed-integer separable quadratic problem:
$$(\mathrm{UC})\quad \begin{array}{ll} \min & \sum_{t\in T}\sum_{i\in I}\bigl(a_ix_{it}^2+b_ix_{it}+c_iu_{it}\bigr)\\ \mathrm{s.t.} & \sum_{i\in I}x_{it}=d_t,\ t\in T,\\ & p^{i}_{\min}u_{it}\leqslant x_{it}\leqslant p^{i}_{\max}u_{it},\ i\in I,\ t\in T,\\ & u\in U,\quad u_{it}\in\{0,1\}. \end{array}$$
It is clear that (UC) is a special case of (P) with only semi-continuous variables. More realistic constraints can be further attached to the above basic formulation (see [25]).

3 Perspective Reformulations for Mathematical Programming with Semi-continuous Variables

In this section, we consider the following special case of (P):
$$(\mathrm{P_s})\quad \begin{array}{ll} \min & \sum_{i=1}^n f_i(x_i)+c^Ty+g(z)\\ \mathrm{s.t.} & Ax+By+Cz\leqslant b,\\ & l_iy_i\leqslant x_i\leqslant u_iy_i,\ i=1,\cdots,n,\\ & y\in\{0,1\}^n, \end{array}$$
where f_i (i=1,⋯,n) are univariate convex functions and A, B, C are matrices of appropriate dimensions. Note that the cardinality constraint \(\sum_{i=1}^{n}y_{i}\leqslant K\) can be included in the linear constraints on (x,y,z). A novel reformulation technique, called the perspective reformulation, was proposed by Frangioni and Gentile [21–23] (see also [31]). This reformulation method is based on constructing the convex envelope of a univariate function over a semi-continuous variable and gives rise to a much more efficient mixed-integer programming reformulation than the standard formulation (Ps).

3.1 Convex Envelope and Perspective Reformulation

Consider the following one-dimensional minimization over a semi-continuous variable:

$$\min\bigl\{h(s)+dt \mid \alpha t\leqslant s\leqslant\beta t,\ t\in\{0,1\}\bigr\}, \quad\quad (4)$$

where h:ℜ→ℜ is a convex function. This problem can be equivalently restated as

$$\min\bigl\{\hat{h}(s,t) \mid t\in\{0,1\}\bigr\}, \quad\quad (5)$$

where

$$\hat{h}(s,t)=\begin{cases} 0, & (s,t)=(0,0),\\ h(s)+d, & s\in[\alpha,\beta],\ t=1,\\ +\infty, & \text{otherwise}. \end{cases} \quad\quad (6)$$

The convex envelope of \(\hat{h}(s,t)\) is defined by

$$\overline{co}(\hat{h}) (s,t)=\inf\bigl\{v\mid(s,t,v)\in\operatorname{conv}\bigl(\operatorname{epi}(\hat{h})\bigr)\bigr\}, $$

where \(\operatorname{epi}(\hat{h})\) is the epigraph defined by \(\operatorname{epi}(\hat{h})=\{(s,t,v)\mid v\geqslant\hat{h}(s,t)\}\). By (6), the convex hull of \(\operatorname{epi}(\hat{h})\) consists of all the points of the following form:

$$(1-\theta) (0,0,\bar{w})+\theta(\bar{s},1,\bar{v})=\bigl(\theta\bar {s}, \theta,(1-\theta)\bar{w}+\theta\bar{v}\bigr), $$

where \(\bar{w}\geqslant0\), \(\bar{v}\geqslant h(\bar{s})+d\), \(\bar{s}\in [\alpha,\beta]\) and θ∈[0,1]. Letting t=θ and \(s=\theta\bar{s}=t\bar{s}\), we have \(\bar{s}=s/t\) and αtsβt. Since \(\bar{w}\geqslant0\) and \(\bar{v}\geqslant h(\bar{s})+d\), we have

$$(1-\theta)\bar{w}+\theta\bar{v}\geqslant\theta\bigl(h(\bar{s})+d\bigr)=t\cdot h(s/t)+dt $$

for t∈(0,1]. Thus,
$$\overline{co}(\hat{h})(s,t)=t\,h(s/t)+dt,\quad \alpha t\leqslant s\leqslant\beta t,\ t\in(0,1].$$

This function is continuous at (0,0) by defining 0/0:=0. The function th(s/t) is called the perspective function of h(s) in convex analysis [32]; see Fig. 1. Replacing \(\hat{h}(s,t)\) in (5) by \(\overline{co}(\hat {h})(s,t)\), we obtain the following equivalent form of (4):

$$\min\bigl\{t\,h(s/t)+dt \mid \alpha t\leqslant s\leqslant\beta t,\ t\in\{0,1\}\bigr\}. \quad\quad (7)$$

Problem (7) is hence called the perspective reformulation of problem (4). Its continuous relaxation is

$$\min\bigl\{t\,h(s/t)+dt \mid \alpha t\leqslant s\leqslant\beta t,\ t\in[0,1]\bigr\}. \quad\quad (8)$$

Since the objective function of (8) is the convex envelope of h(s)+dt, (8) is tighter than the direct continuous relaxation of (4). Actually, since there are no other constraints in (4), the optimal values of (4) and (8) are the same.

Fig. 1 The perspective function of h(s)=s^2 on [−1,1]
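The tightening effect can be observed numerically. The following sketch (with assumed values h(s)=s², α=0.5, β=2, d=1, not data from the survey) compares, at a fixed s, the lower bound given by the direct continuous relaxation of (4) with the perspective envelope th(s/t)+dt over the same feasible t:

```python
# Compare the direct relaxation objective h(s)+d*t with the perspective
# envelope t*h(s/t)+d*t at a fixed s, for h(s)=s^2 (assumed parameters).
def h(s):
    return s * s

alpha, beta, d = 0.5, 2.0, 1.0
s = 1.0

# grid of t values satisfying alpha*t <= s <= beta*t and t in (0,1]
ts = [s / beta + k * (1.0 - s / beta) / 1000 for k in range(1001)]
ts = [t for t in ts if alpha * t <= s <= beta * t and 0 < t <= 1]

naive = min(h(s) + d * t for t in ts)          # direct relaxation bound
persp = min(t * h(s / t) + d * t for t in ts)  # perspective envelope bound

assert persp >= naive  # the perspective relaxation is at least as tight
```

Here the direct relaxation yields 1.5 while the perspective envelope yields 2.0, illustrating the strictly tighter bound.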

Now, we apply the one-dimensional perspective reformulation (7) to problem (Ps) which has a separable objective function. The resulting perspective reformulation is
$$(\mathrm{PR_s})\quad \begin{array}{ll} \min & \sum_{i=1}^n y_if_i(x_i/y_i)+c^Ty+g(z)\\ \mathrm{s.t.} & Ax+By+Cz\leqslant b,\\ & l_iy_i\leqslant x_i\leqslant u_iy_i,\ i=1,\cdots,n,\\ & y\in\{0,1\}^n. \end{array}$$
The problem (PRs) is more efficient than (Ps) in the sense that the continuous relaxation of (PRs) is tighter than that of (Ps).

The above perspective reformulation technique can be extended to problems with an additional nonseparable term p(x) in the objective function. In fact, we can introduce a copy constraint \(x=\tilde{z}\) and rewrite the objective function as \(\sum_{i=1}^{n}f_{i}(x_{i})+c^{T}y+g(z)+p(\tilde{z})\). The perspective reformulation is then applicable to the resulting problem. In general, we can use the perspective reformulation for the minimization of any convex function over semi-continuous variables provided that the objective function has a separable term \(\sum_{i=1}^{n}f_{i}(x_{i})\).

3.2 A Dual Approach for Perspective Reformulation

In this subsection, we describe a dual approach for deriving the perspective reformulation (PRs). Without loss of generality, we consider a simple version of (Ps):

$$\min\Bigl\{\sum_{i=1}^n f_i(x_i)+c^Ty \;\Big|\; Ax\leqslant b,\ l_iy_i\leqslant x_i\leqslant u_iy_i,\ y_i\in\{0,1\},\ i=1,\cdots,n\Bigr\}. \quad\quad (9)$$

Dualizing the first constraint of (9) yields the following Lagrangian relaxation:
$$d(\lambda)=\min\Bigl\{\sum_{i=1}^n f_i(x_i)+c^Ty+\lambda^T(Ax-b) \;\Big|\; l_iy_i\leqslant x_i\leqslant u_iy_i,\ y_i\in\{0,1\},\ i=1,\cdots,n\Bigr\}.$$
The Lagrangian dual of problem (9) then takes the following form,
$$(\mathrm{D_s})\quad \max_{\lambda\geqslant0}\ d(\lambda),$$ where d(λ) denotes the Lagrangian dual function.
Next, we show that the dual problem (Ds) is equivalent to the continuous relaxation of (PRs). For each i=1,⋯,n, let \(q_{i}=\min_{l_{i}\leqslant x_{i}\leqslant u_{i}} (f_{i}(x_{i})+\lambda^{T}a_{i}x_{i})\), where a i is the ith column of A. Then,

$$d(\lambda)=-\lambda^Tb+\sum_{i=1}^n\min\{0,\ q_i+c_i\}. \quad\quad (10)$$

Since f_i is convex, by strong duality, we have

$$q_i=\max\bigl\{r_i(\zeta_i,\eta_i,\lambda) \mid \zeta_i\geqslant0,\ \eta_i\geqslant0\bigr\}, \quad\quad (11)$$

where ζ_i and η_i are the multipliers for the inequalities l_i⩽x_i and x_i⩽u_i, respectively, and r_i(ζ_i,η_i,λ) is the corresponding dual function. It then follows from (10) and (11) that

$$\begin{array}{ll} \max & -\lambda^Tb+\sum_{i=1}^n\sigma_i\\ \mathrm{s.t.} & \sigma_i-r_i(\zeta_i,\eta_i,\lambda)-c_i\leqslant0,\ i=1,\cdots,n,\\ & \sigma_i\leqslant0,\ \zeta_i\geqslant0,\ \eta_i\geqslant0,\ i=1,\cdots,n,\quad \lambda\geqslant0. \end{array} \quad\quad (12)$$

Note that problem (12) is a convex program since r_i(ζ_i,η_i,λ) is a concave function. It is obvious that the Slater condition holds for (12). Dualizing the first constraint in problem (12) with multiplier y_i⩾0 and using strong duality, we have

Applying again the strong duality to the inner maximization of the above problem, we obtain

Letting z_i=y_ix_i gives x_i=z_i/y_i. Substituting x_i=z_i/y_i into the above problem yields exactly the continuous relaxation of (PRs) for problem (9).

The above discussion reveals that the lower bound provided by the continuous relaxation of the perspective reformulation (PRs) is the same as the dual bound of (Ps), which is usually much tighter than the bound from the direct continuous relaxation of the original problem (Ps).

3.3 SOCP Reformulation and Perspective Cut Reformulation

Although the perspective reformulation (PRs) is, in theory, tighter than the original mixed-integer formulation, the nonlinear term y_if_i(x_i/y_i) makes the objective more nonlinear and less tractable, even for simple nonlinear functions such as quadratic functions. In this subsection, we describe two tractable reformulations derived from the basic perspective reformulation.

The second-order cone programming (SOCP) reformulation [1, 31, 45] is obtained by introducing an additional variable ϕ i =y i f i (x i /y i ) for each i. The problem (PRs) can be then restated as
$$(\mathrm{SOCP_s})\quad \begin{array}{ll} \min & \sum_{i=1}^n\phi_i+c^Ty+g(z)\\ \mathrm{s.t.} & \phi_i\geqslant y_if_i(x_i/y_i),\ i=1,\cdots,n,\\ & Ax+By+Cz\leqslant b,\\ & l_iy_i\leqslant x_i\leqslant u_iy_i,\ y_i\in\{0,1\},\ i=1,\cdots,n. \end{array}$$
If the constraint ϕ_i⩾y_if_i(x_i/y_i) is SOCP-representable, then the continuous relaxation of problem (SOCPs) is an SOCP problem that can be solved efficiently by interior-point methods [2, 5]. For instance, consider the quadratic case where \(f_{i}(x_{i})=a_{i}x_{i}^{2}+b_{i}x_{i}\) with a_i>0. Then, \(y_{i}f_{i}(x_{i}/y_{i})=a_{i}x_{i}^{2}/y_{i}+b_{i}x_{i}\). Problem (PRs) can thus be equivalently rewritten as

$$\begin{array}{ll} \min & \sum_{i=1}^n(\phi_i+b_ix_i)+c^Ty+g(z)\\ \mathrm{s.t.} & \phi_iy_i\geqslant a_ix_i^2,\ \phi_i\geqslant0,\ i=1,\cdots,n,\\ & Ax+By+Cz\leqslant b,\\ & l_iy_i\leqslant x_i\leqslant u_iy_i,\ y_i\in\{0,1\},\ i=1,\cdots,n. \end{array} \quad\quad (13)$$

It is easy to see that \(\phi_{i}y_{i}\geqslant a_{i}x_{i}^{2}\), ϕ_i⩾0, y_i⩾0 can be represented by the following SOCP constraint:
$$\biggl\Vert\begin{pmatrix}2\sqrt{a_i}\,x_i\\ \phi_i-y_i\end{pmatrix}\biggr\Vert_2\leqslant\phi_i+y_i.$$
Thus, problem (13) is a mixed-integer SOCP reformulation of (Ps). Standard MIQP solvers such as CPLEX can be used to solve (13). Computational results in [21, 31, 48] showed that the SOCP reformulation is more efficient than the standard MIQP formulation (Ps).
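The equivalence between the rotated-cone constraint ϕy⩾ax² (with ϕ,y⩾0) and its SOCP form can be verified numerically; the following sketch samples random points (the value of a and the tolerance are assumptions, not from the survey):

```python
import math
import random

# Numeric sanity check (a sketch) that phi*y >= a*x^2 with phi, y >= 0
# matches the second-order cone form ||(2*sqrt(a)*x, phi - y)||_2 <= phi + y.
def soc_holds(phi, y, x, a):
    return math.hypot(2.0 * math.sqrt(a) * x, phi - y) <= phi + y

random.seed(0)
a = 1.5
for _ in range(1000):
    phi = random.uniform(0.0, 2.0)
    y = random.uniform(0.0, 2.0)
    x = random.uniform(-2.0, 2.0)
    gap = phi * y - a * x * x
    if abs(gap) < 1e-9:
        continue  # skip numerically ambiguous boundary cases
    assert (gap >= 0.0) == soc_holds(phi, y, x, a)
```

The identity behind the check is (ϕ+y)²−(ϕ−y)²=4ϕy, so squaring both sides of the cone constraint recovers ϕy⩾ax².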

An alternative way of deriving a tractable reformulation from (PRs) is the perspective cut reformulation proposed in [21–23]. Recall that the convex envelope of the extended univariate function \(\hat{h}(s,t)\) defined in (6) is given by ψ(s,t)=th(s/t)+dt when αt⩽s⩽βt and t∈(0,1]. Assume that h is differentiable. Then,

$$\nabla\psi(s,t)=\biggl[h'(s/t),h(s/t)-\frac{s}{t}h'(s/t)+d \biggr]^T. $$

Note that ∇ψ(s,t) depends only on s/t and hence is constant on the line \(\{(\bar{s}t,t)\mid t\in[0,1]\}\) for any \(\bar{s}\in [\alpha,\beta]\). Recall from convex analysis that the convex function ψ(s,t) is fully characterized by its tangent planes. For a given point \((\bar{s},\bar{t})\in{\rm dom}(\hat{h})\), the tangent plane of ψ is

$$v=\psi(\bar{s},\bar{t})+\nabla^T\psi(\bar{s},\bar {t})\bigl[(s,t)-( \bar{s},\bar{t})\bigr]. $$

Since ∇ψ(s,t) is constant on \((\bar{s}t,t)\), we only need to consider the tangent planes at \((\bar{s},1)\) for \(\bar{s}\in[\alpha,\beta]\). The epigraph \(\operatorname{epi}(\hat{h})\) can then be represented by the following infinitely many linear inequalities:

$$v\geqslant h(\bar{s})+d+ h'(\bar{s}) (s-\bar{s})+\bigl[h( \bar{s})+d-h'(\bar {s})\bar{s}\bigr](t-1),\quad \forall\bar{s}\in[ \alpha,\beta], $$

which can be simplified to

$$v\geqslant h'(\bar{s})s+\bigl[h(\bar{s})+d-h'(\bar{s})\bar{s}\bigr]t,\quad \forall\bar{s}\in[\alpha,\beta]. \quad\quad (14)$$

The inequalities (14) are called perspective cuts [21]. Applying this “linearized” representation of \(\operatorname{epi}(\hat{h})\) to the perspective reformulation (PRs), we obtain the following perspective cut (P/C) reformulation of (Ps):
$$(\mathrm{PC_s})\quad \begin{array}{ll} \min & \sum_{i=1}^n v_i+c^Ty+g(z)\\ \mathrm{s.t.} & v_i\geqslant f_i'(\bar{x}_i)x_i+\bigl[f_i(\bar{x}_i)-f_i'(\bar{x}_i)\bar{x}_i\bigr]y_i,\quad \forall\bar{x}_i\in[l_i,u_i],\ i=1,\cdots,n,\\ & Ax+By+Cz\leqslant b,\\ & l_iy_i\leqslant x_i\leqslant u_iy_i,\ y_i\in\{0,1\},\ i=1,\cdots,n. \end{array}$$
Problem (PCs) is a semi-infinite mixed-integer linear programming problem, which cannot be solved directly. Nevertheless, “localized” subproblems of (PCs) with a small finite subset of perspective cuts can be embedded in a branch-and-cut framework, where the violated perspective cuts with \(\bar{x}_{i}=x_{i}^{*}/y_{i}^{*}\) are added at each node when \(y_{i}^{*}\) is fractional in the optimal solution (x^*,y^*,z^*,v^*) of the continuous subproblem of the current node. The above solution scheme can be implemented either by a tailor-made branch-and-cut method (see [21, 22]) or by means of cut-callback procedures in CPLEX (see [23]).
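As a sanity check (a sketch with assumed parameters, not from the survey), for h(s)=s² the cut family (14) reads v⩾2s̄s+(d−s̄²)t, and maximizing the right-hand side over s̄∈[α,β] recovers the perspective function value at any feasible (s,t):

```python
# For h(s)=s^2 the perspective cut at sbar is v >= 2*sbar*s + (d - sbar^2)*t.
# Maximizing the cut right-hand side over sbar in [alpha, beta] should
# reproduce the perspective function t*h(s/t)+d*t (assumed parameters below).
alpha, beta, d = 0.5, 2.0, 1.0

def cut_value(sbar, s, t):
    return 2.0 * sbar * s + (d - sbar * sbar) * t

def perspective(s, t):
    return t * (s / t) ** 2 + d * t

s, t = 0.9, 0.6  # feasible point: alpha*t <= s <= beta*t, t in (0,1]
sbars = [alpha + k * (beta - alpha) / 2000 for k in range(2001)]
best = max(cut_value(sb, s, t) for sb in sbars)

assert abs(best - perspective(s, t)) < 1e-3
```

The maximizing s̄ is s/t, which is exactly the cut-separation rule \(\bar{x}_i=x_i^*/y_i^*\) described above.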

Computational results in [23] showed that the P/C reformulation is more efficient than the SOCP reformulation for problems with only semi-continuous variables. As mentioned in [31], the SOCP reformulation has the advantage of a simple and straightforward implementation using optimization modeling languages, without appealing to specialized branch-and-cut procedures. Moreover, the numerical results in [48] also suggested that the SOCP reformulation can be more efficient than the P/C reformulation when a cardinality constraint with small cardinality K is present.

A main drawback of the SOCP reformulation is the introduction of additional variables and constraints, which makes the continuous relaxation more complex and time-consuming to solve. Recently, Frangioni et al. [24] proposed a projected perspective reformulation for problem (Ps) with quadratic functions \(f_{i}(x_{i})=a_{i}x_{i}^{2}+b_{i}x_{i}\). This reformulation is based on a piecewise-quadratic description of \((1/y_{i})a_{i}x_{i}^{2}+b_{i}x_{i}+c_{i}y_{i}\), which is the convex envelope of f_i(x_i)+c_iy_i over the semi-continuous variable. Consequently, the projected perspective reformulation in [24] is a mixed-integer piecewise-quadratic program whose continuous relaxation has roughly the same size as the standard continuous relaxation.

4 Quadratic Programming with Semi-continuous Variables and Cardinality Constraints

In this section, we focus on quadratic programming with semi-continuous variables and a cardinality constraint. This problem is a special case of (P) with a convex quadratic objective function and linear constraints:

$$(\mathrm{CCQP})\quad \begin{array}{lll} \min & x^TQx+c^Tx & (15)\\ \mathrm{s.t.} & Ax\leqslant b, & (16)\\ & \alpha_iy_i\leqslant x_i\leqslant u_iy_i,\ y_i\in\{0,1\},\ i=1,\cdots,n, & (17)\\ & e^Ty\leqslant K, & (18) \end{array}$$

where Q is an n×n positive semidefinite matrix, c∈ℜ^n, A∈ℜ^{m×n}, b∈ℜ^m, K is an integer satisfying 1⩽K⩽n, and 0<α_i<u_i. Problem (CCQP) is an important case of (P) because quadratic objective functions are used in most applications of (P), as seen in Sect. 2.

4.1 Diagonal Decompositions

As the quadratic objective function q(x) is usually nonseparable, the perspective reformulations in Sect. 3 cannot be directly applied to (CCQP). A diagonal decomposition was proposed in [22] to extract separable terms from the quadratic form x^TQx. Let \(d\in\Re^{n}_{+}\) be such that Q−D⪰0, where \(D=\operatorname{diag}(d)\) denotes the diagonal matrix with diagonal vector d. The quadratic objective function of (P) can then be decomposed as

$$x^TQx=x^T(Q-D)x+x^TDx=x^T(Q-D)x+\sum_{i=1}^nd_ix_i^2. \quad\quad (19)$$

Replacing the separable term x^TDx by its convex envelope over the semi-continuous variables, which is the sum of the perspective functions of \(d_{i}x_{i}^{2}\) over x_i∈{0}∪[α_i,u_i] for i=1,⋯,n, the perspective reformulation of (CCQP) has the following form:
$$(\mathrm{PR}(d))\quad \min\Bigl\{x^T(Q-D)x+\sum_{i=1}^n\frac{d_ix_i^2}{y_i}+c^Tx \;\Big|\; (16)\text{--}(18)\Bigr\}.$$
The SOCP reformulation of (CCQP) is:

$$(\mathrm{SOCP}(d))\quad \begin{array}{ll} \min & x^T(Q-D)x+\sum_{i=1}^n\phi_i+c^Tx\\ \mathrm{s.t.} & \phi_iy_i\geqslant d_ix_i^2,\ \phi_i\geqslant0,\ i=1,\cdots,n,\\ & (16)\text{--}(18). \end{array} \quad\quad (20)$$

On the other hand, using the perspective cuts (14) for quadratic function \(f_{i}(x_{i})=d_{i}x_{i}^{2}\), we obtain the P/C reformulation of (PR(d)):
$$(\mathrm{PC}(d))\quad \begin{array}{ll} \min & x^T(Q-D)x+\sum_{i=1}^n v_i+c^Tx\\ \mathrm{s.t.} & v_i\geqslant 2d_i\bar{x}_ix_i-d_i\bar{x}_i^2y_i,\quad \forall\bar{x}_i\in[\alpha_i,u_i],\ i=1,\cdots,n,\\ & (16)\text{--}(18). \end{array}$$
A key issue in implementing the SOCP reformulation (SOCP(d)) and the P/C reformulation (PC(d)) is how to choose the parameter vector d. A natural choice is d=(λ_min−ε)e when Q is positive definite, where λ_min is the minimum eigenvalue of Q, ε>0 is a sufficiently small scalar, and e is the all-ones column vector. Frangioni and Gentile [22] suggested using a heuristic to find a diagonal matrix \(D=\operatorname{diag}(d)\) by solving a simple semidefinite program (SDP):
$$(\mathrm{SDP_s})\quad \max\bigl\{e^Td \mid Q-\operatorname{diag}(d)\succeq0,\ d\geqslant0\bigr\},$$
which we will call the “small” SDP problem. Numerical results in [22] show that this approach compares favorably with the minimum eigenvalue method. A further question arises: how can we find a “better” d in the perspective reformulation?
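The minimum-eigenvalue choice of d can be illustrated on a small example (the 2×2 matrix Q below is an assumption, not data from the survey); with d=(λ_min−ε)e, the matrix Q−diag(d) stays positive semidefinite, which we verify via Sylvester's criterion:

```python
import math

# Minimum-eigenvalue diagonal extraction for a 2x2 example (a sketch):
# d = (lambda_min - eps) * e keeps Q - diag(d) positive semidefinite.
Q = [[2.0, 1.0], [1.0, 2.0]]

# eigenvalues of a symmetric 2x2 matrix in closed form
tr = Q[0][0] + Q[1][1]
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
lam_min = tr / 2.0 - math.sqrt((tr / 2.0) ** 2 - det)

eps = 1e-6
d = [lam_min - eps, lam_min - eps]
R = [[Q[0][0] - d[0], Q[0][1]], [Q[1][0], Q[1][1] - d[1]]]

# Sylvester's criterion for a 2x2 matrix: both leading principal minors >= 0
assert R[0][0] >= 0 and R[0][0] * R[1][1] - R[0][1] * R[1][0] >= 0
```

For this Q, λ_min=1, so d≈(1,1); the "small" SDP would instead maximize e^Td subject to the same PSD restriction, which can only give a larger (or equal) total shift.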

Zheng et al. [48] proposed an SDP approach for selecting the “best” parameter vector d in the reformulation (SOCP(d)) such that the continuous relaxation of (SOCP(d)) is as tight as possible.

The continuous relaxation of (SOCP(d)) has the following form:
$$\begin{array}{ll} \min & x^T(Q-D)x+\sum_{i=1}^n\phi_i+c^Tx\\ \mathrm{s.t.} & \phi_iy_i\geqslant d_ix_i^2,\ \phi_i\geqslant0,\ i=1,\cdots,n,\\ & Ax\leqslant b,\ \alpha_iy_i\leqslant x_i\leqslant u_iy_i,\ i=1,\cdots,n,\\ & e^Ty\leqslant K,\ y\in[0,1]^n. \end{array}$$
The parameter vector d corresponding to the tightest continuous relaxation of (SOCP(d)) can be found by solving the following problem:

$$\max_{d\geqslant0,\ Q-\operatorname{diag}(d)\succeq0}\ v\bigl(\overline{\mathrm{SOCP}}(d)\bigr), \quad\quad (21)$$

where v(⋅) denotes the optimal value of problem (⋅). It was shown in [48] that problem (21) is equivalent to the following SDP problem:

where β_i:=α_i+u_i for i=1,⋯,n.

Compared with the “small” SDP formulation (SDPs) proposed in [22], the formulation (SDPl) has the drawback of a larger dimension: 4n+m+2 variables in (SDPl) compared with only n variables in (SDPs). Also, (SDPl) has n additional 2×2 linear matrix inequalities. In spite of its larger size, (SDPl) can still be solved efficiently by interior-point methods such as SeDuMi due to its simple structure. The longer time spent on solving the SDP problem (SDPl) can be well compensated by the savings in the computation time for the SOCP or P/C reformulations, as witnessed by the computational results in [48].

A Lagrangian decomposition scheme was proposed in [44] for the cardinality constrained quadratic program with q(x)=x^T(H^TH)x+x^TDx+c^Tx (without semi-continuous variables). In [44], the dual bound is computed by a subgradient method for fixed H and \(D=\operatorname{diag}(d)\) and is used in a branch-and-bound method. This Lagrangian decomposition method was extended in [48] to give an alternative derivation of the SOCP reformulation for (CCQP) when a fixed diagonal decomposition is used.

4.2 Specialized Branch-and-Bound Methods for (CCQP)

In this subsection, we discuss specialized branch-and-bound and branch-and-cut methods for solving (CCQP).

Since the objective function of (CCQP) is convex, a direct implementation of the branch-and-bound method is possible: at each node of the search tree, solve a convex quadratic subproblem of (CCQP) with all or part of the variables (x,y) and branch on some y_i by forcing y_i=0 or y_i=1. This procedure may be undesirable because the number of variables in the continuous relaxation is doubled compared to the number of original decision variables. Bienstock [7] proposed solving the subproblem with the original variables and a surrogate constraint \(\sum_{i=1}^{n} x_{i}/u_{i}\leqslant K\) which replaces the constraint \(\sum_{i=1}^{n}y_{i}\leqslant K\). Clearly, \(\sum_{i=1}^{n} x_{i}/u_{i}\leqslant K\) is valid for (CCQP) since x_i⩽u_iy_i. The continuous subproblem then has the following form:
$$\min\Bigl\{q(x) \;\Big|\; Ax\leqslant b,\ \sum_{i=1}^n x_i/u_i\leqslant K,\ x^l_i\leqslant x_i\leqslant x^u_i,\ i=1,\cdots,n\Bigr\},$$
where \(x^{l}_{i}\) and \(x^{u}_{i}\) are the lower and upper bounds of x_i at the current node; it is possible that \(x^{l}_{i}=x^{u}_{i}\) when the variable is fixed. A primal feasible method is used in [7] for solving the continuous subproblem. The branching in [7] is done directly on the variable x_i instead of on the binary variable y_i: either the constraint x_i⩽0 is added when x_i is branched down, or the constraint x_i⩾α_i is added when x_i is branched up. Mixed-integer rounding cuts, knapsack cuts and disjunctive cuts for the cardinality constraint and semi-continuous variables were investigated in [7]. Recently, strong valid inequalities for the semi-continuous knapsack polyhedron and the cardinality constraint were derived in [18, 19]. These cuts can be incorporated into branch-and-cut methods.

Bertsimas and Shioda [6] proposed a branch-and-bound method in which the continuous subproblem at each node is solved by Lemke’s pivoting method. One advantage of using Lemke’s method in a branch-and-bound framework is that the optimal solution of the parent node can be used as an initial point for solving the continuous subproblem of the current node, because Lemke’s method can be started from an infeasible basic solution.

Shaw et al. [44] presented a branch-and-bound method for cardinality constrained mean-variance portfolio problems in which the asset returns are driven by a factor model. The covariance matrix of the asset returns can then be expressed as Q=H^TH+D, where H∈ℜ^{m×n} and D is a nonnegative diagonal matrix. Unlike the above two exact methods, where a quadratic programming relaxation is solved at each node, the lower bounds in [44] are computed by solving a Lagrangian dual problem via a subgradient method. Recently, [27] derived the optimal control law for the discrete-time cardinality constrained linear-quadratic control problem, in which the number of time periods where controls can be applied is limited, using a semidefinite programming solution scheme.

To provide upper bounds in branch-and-bound or branch-and-cut methods, fast heuristics are needed to find feasible solutions of (CCQP) at the root node or at some sub-nodes during the branch-and-bound search. A simple and natural heuristic for generating a feasible solution of (CCQP) is as follows.

Heuristic 1

([34])

Let x^* be the optimal solution of the continuous relaxation. Rank the absolute values of \(x^{*}_{i}\) as \(|x^{*}_{i_{1}}|\leqslant|x^{*}_{i_{2}}|\leqslant\cdots\leqslant|x^{*}_{i_{n}}|\). Re-solve the continuous problem with \(x_{i_{j}}=0\) for j=1,⋯,n−K and \(x_{i_{j}}\geqslant\alpha_{i_{j}}\) for j=n−K+1,⋯,n. If this problem is feasible, then its optimal solution is feasible for (CCQP).
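The variable-fixing step of Heuristic 1 can be sketched as follows (a simplified illustration; the solver calls for the continuous problems are omitted, and the sample vector is an assumption):

```python
# Sketch of the fixing step of Heuristic 1: rank |x*_i| and fix the n-K
# smallest to zero; the remaining K variables are forced active (x_i >= alpha_i).
def fix_pattern(x_star, K):
    order = sorted(range(len(x_star)), key=lambda i: abs(x_star[i]))
    zero_set = order[: len(x_star) - K]      # indices fixed to x_i = 0
    active_set = order[len(x_star) - K:]     # indices forced to x_i >= alpha_i
    return sorted(zero_set), sorted(active_set)

zero_set, active_set = fix_pattern([0.01, 0.4, -0.02, 0.3], K=2)
assert zero_set == [0, 2] and active_set == [1, 3]
```

The restricted continuous problem would then be re-solved with these fixings; if it is feasible, its optimum is feasible for (CCQP).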

Another heuristic is based on solving a small mixed-integer subproblem of (CCQP) by the branch-and-bound method. The number of variables of the subproblem is K+κ, where κ is a small integer, e.g., κ=⌈n/10⌉. The heuristic can be described as follows:

Heuristic 2

([7])

Let x^* be the optimal solution of the continuous relaxation. Rank the absolute values of \(x^{*}_{i}\) as \(|x^{*}_{i_{1}}|\leqslant|x^{*}_{i_{2}}|\leqslant\cdots\leqslant|x^{*}_{i_{n}}|\). Set \(x_{i_{j}}=0\) for j=1,⋯,n−K−κ in (CCQP). Solve the reduced mixed-integer quadratic program with the initial upper bound obtained from Heuristic 1 (or +∞ if no feasible solution is found by Heuristic 1). Early termination of the branch-and-bound method for the small reduced subproblem can also be enforced by setting a limit on the maximum time, number of nodes or relative gap. Since K+κ is often much smaller than n in practice, Heuristic 2 is expected to find a reasonably good feasible solution of (CCQP) quickly. If the reduced mixed-integer program is infeasible, we can increase κ until the reduced problem becomes feasible.

5 Approximate Methods for Cardinality Constrained Problems

In the sparse optimization literature, for a vector x=(x_1,⋯,x_n)^T∈ℜ^n, the cardinality function \(\operatorname{card}(x)\) is also called the (quasi) ℓ_0-norm, denoted by ∥x∥_0. A general form of cardinality constrained mathematical programming can be formulated as
$$(\mathrm{P_c})\quad \min\bigl\{f(x) \mid g(x)\leqslant0,\ h(x)=0,\ \|x\|_0\leqslant K\bigr\},$$
where f:ℜ^n→ℜ, g:ℜ^n→ℜ^m, h:ℜ^n→ℜ^p are continuously differentiable functions. (Pc) can be regarded as a special case of (P) without explicit semi-continuous variables. An important special case of (Pc) is the cardinality constrained quadratic program, where f is quadratic and g and h are affine. In particular, the following problem of minimizing a quadratic function under a cardinality constraint is of interest:

$$(\mathrm{QP}_{\mathrm{c}})\quad \min\bigl\{x^TQx+c^Tx\mid\|x \|_0\leqslant K\bigr\}. $$

The above formulation includes many applications such as compressed sensing and subset selection. Specialized convex relaxations for (QPc) are discussed in [4]. A problem closely related to the cardinality constrained quadratic program is the ℓ_0-norm minimization problem, i.e., finding sparse solutions of linear equations:
$$\min\bigl\{\|x\|_0 \mid Ax=b\bigr\}.$$
The reader is referred to [10, 35, 47] for an extensive literature on this problem.

In this section, we describe different inexact methods for (Pc) or its special cases. These methods are mainly based on various approximations and relaxations of the ℓ_0-norm function ∥x∥_0, except for the penalty decomposition and alternating direction method in [38]. The suboptimal or local solutions obtained from these approximate methods can be used to improve the performance of branch-and-bound methods for finding exact solutions of (Pc).

5.1 ℓ_p-Norm Approximation

A popular approach in the literature for dealing with the ℓ_0-norm ∥x∥_0 is to replace it with the ℓ_1-norm ∥x∥_1. The resulting problem then becomes

$$\min\bigl\{f(x) \mid g(x)\leqslant0,\ h(x)=0,\ \|x\|_1\leqslant K\bigr\}. \quad\quad (22)$$

The above problem is a convex relaxation of (Pc) when f and g are convex and h is affine. The ℓ_1-norm constraint ∥x∥_1⩽K in problem (22) can also be incorporated into the objective function as a regularization or penalty term, yielding the following convex problem,

$$\min\bigl\{f(x)+\lambda\|x\|_1 \mid g(x)\leqslant0,\ h(x)=0\bigr\}, \quad\quad (23)$$

where λ>0 is a regularization parameter. The approach of replacing the ℓ_0-norm with the ℓ_1-norm is called basis pursuit [17]. In contrast to its successful applications in finding sparse solutions of linear systems, the ℓ_1-norm approximation (22) or (23), however, does not necessarily produce solutions with the desired sparsity for general cardinality constrained optimization problems. The nonconvex ℓ_p-norm (0<p<1) can be used in place of ∥x∥_0 to enforce stricter sparsity, leading to the following nonconvex approximation of (Pc):

$$\min\bigl\{f(x) \mid g(x)\leqslant0,\ h(x)=0,\ \|x\|_p^p\leqslant K\bigr\}, \quad\quad (24)$$

or its regularized problem:

$$\min\bigl\{f(x)+\lambda\|x\|_p^p \mid g(x)\leqslant0,\ h(x)=0\bigr\}. \quad\quad (25)$$

The ℓ_p-norm approximation is based on the following property:
$$\lim_{p\to0^+}\|x\|_p^p=\|x\|_0$$
for any fixed x (see Fig. 2 for the one-dimensional case).

Fig. 2 ℓ_p-norm functions with different values of p
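This limit is easy to observe numerically (a sketch; the test vector is an assumption, not from the survey):

```python
# For fixed x, sum_i |x_i|^p -> ||x||_0 as p -> 0+, since |x_i|^p -> 1
# whenever x_i != 0 and 0^p = 0 for p > 0.
def lp_p(x, p):
    return sum(abs(xi) ** p for xi in x)

x = [0.0, 0.5, -3.0, 0.0, 1.2]
card = sum(1 for xi in x if xi != 0)  # ||x||_0 = 3

vals = [lp_p(x, p) for p in (0.5, 0.1, 0.01, 0.001)]
assert abs(vals[-1] - card) < 1e-2
# the approximation error shrinks as p decreases
assert all(abs(v - card) >= abs(w - card) for v, w in zip(vals, vals[1:]))
```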

It is shown in [30] that the ℓ_p minimization \(\min\{ \|x\|_{p}^{p}\mid Ax=b\}\) is strongly NP-hard when 0<p<1. An interior-point method is suggested in [30] for the ℓ_p minimization. Lai and Wang [36] also developed an iterative solution method for the ℓ_p minimization. In [26], it is proved that the ℓ_0 and ℓ_p minimization problems with linear equality and inequality constraints are equivalent for some sufficiently small p>0. A successive linearization algorithm is proposed in [26] for finding a stationary point of the ℓ_p minimization problem. Chen et al. [13] proved that the unconstrained ℓ_q-ℓ_p minimization:

$$\min_{x\in\Re^n} \|Ax-b\|^q_q+\lambda\|x \|^p_p, \quad \mbox{where } q\geqslant1,~\lambda>0,~ 0<p<1, $$

is strongly NP-hard. Lower bounds on the parameter λ in the ℓ_2-ℓ_p minimization for achieving the sparsity requirement ∥x∥_0⩽K are established in [13, 14].

5.2 Mangasarian’s Approximation Method

Mangasarian [39] suggested replacing ∥x∥_0 by the following exponential function:

for some α>0. It is clear that ϕ α (x)⩽∥x0 and

$$\lim_{\alpha\to+\infty}\phi_\alpha(x)=\|x\|_0 $$

for any fixed x (see Fig. 3 for the one-dimensional case). It is shown in [39] that the \(\phi_\alpha\) minimization and the \(\ell_0\) minimization over a polyhedral set are equivalent for some sufficiently large α>0. A successive linearization algorithm is also proposed in [39] to find a stationary point of the \(\phi_\alpha\) minimization problem.

Fig. 3 Functions \(1-e^{-\alpha|x|}\) with different values of α
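The two properties of \(\phi_\alpha\) can likewise be checked numerically, assuming \(\phi_\alpha(x)=\sum_{i=1}^n(1-e^{-\alpha|x_i|})\) as in [39]; the sketch below uses an illustrative vector, not data from [39]:

```python
import math

def phi_alpha(x, alpha):
    """Mangasarian's exponential approximation sum_i (1 - e^{-alpha |x_i|})."""
    return sum(1.0 - math.exp(-alpha * abs(xi)) for xi in x)

def card(x):
    """Return ||x||_0."""
    return sum(1 for xi in x if xi != 0)

x = [0.0, 2.5, -0.3, 1.0]               # illustrative vector, ||x||_0 = 3
for alpha in (1.0, 10.0, 100.0):
    val = phi_alpha(x, alpha)
    assert val <= card(x)               # phi_alpha(x) <= ||x||_0 always holds
    print(f"alpha = {alpha:6}:  phi_alpha(x) = {val:.4f}")
```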

5.3 DC Approximation

In this subsection, we introduce a DC approach [49] to approximate the \(\ell_0\)-norm function \(\|x\|_0\). We first note that

$$\|x\|_0=\sum_{i=1}^n\operatorname{sign}\bigl(|x_i|\bigr), $$

(26)

where \(\operatorname{sign}(z)\) denotes the sign function of z∈ℜ, which is discontinuous at 0. Consider the following piecewise linear approximation of \(\operatorname{sign}(|z|)\):

$$w(z,t)=\left \{ \begin{array}{@{}l@{\quad}l@{}} \frac{|z|}{t}, & |z|\leqslant t,\\ 1, & |z|>t, \end{array} \right . $$

(27)

where t>0 is a parameter (see Fig. 4). It is easy to see that

$$\lim_{t\to0^+}w(z,t)=\operatorname{sign}\bigl(|z|\bigr) $$

for any fixed z.

Fig. 4 (a) Function \(y=\operatorname{sign}(|z|)\); (b) function \(y=w(z,t)\)

We see that the function w(z,t) can also be expressed as

$$w(z,t)=\frac{1}{t}|z| -\frac{1}{t} \bigl[(z-t)^++(-z-t)^+ \bigr]= \frac{1}{t}\bigl[h(z,0)-h(z,t)\bigr], $$

where \(a^{+}=\max(a,0)\) and \(h(z,t)=(z-t)^{+}+(-z-t)^{+}\). Since h(z,t) is a convex function of z, w(z,t) is a DC function (difference of two convex functions) of z. Using w(z,t), we can construct the following piecewise linear underestimation of the \(\ell_0\)-norm function \(\|x\|_0\) for \(x\in\Re^{n}\):

$$\psi(x,t)=\sum_{i=1}^nw(x_i,t)= \frac{1}{t}\Biggl(\|x\|_1-\sum_{i=1}^nh(x_i,t) \Biggr). $$

We see that ψ(x,t) is a nonsmooth piecewise linear DC function of x and \(\lim_{t\to0^{+}}\psi(x,t)=\|x\|_{0}\) for any fixed x.
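The two expressions of w(z,t), the piecewise linear form and the DC form, can be checked against each other numerically, along with the convergence of ψ(x,t) to \(\|x\|_0\). A minimal sketch with an illustrative vector x:

```python
# Sketch of the DC approximation pieces defined above; the test vector x
# is an illustrative choice, not taken from the text.

def h(z, t):
    """h(z,t) = (z - t)^+ + (-z - t)^+, a convex function of z."""
    return max(z - t, 0.0) + max(-z - t, 0.0)

def w(z, t):
    """DC form: w(z,t) = (1/t) [ h(z,0) - h(z,t) ]."""
    return (h(z, 0.0) - h(z, t)) / t

def w_piecewise(z, t):
    """Equivalent piecewise linear form: |z|/t on [-t, t], and 1 outside."""
    return abs(z) / t if abs(z) <= t else 1.0

def psi(x, t):
    """psi(x,t) = sum_i w(x_i,t), a piecewise linear underestimate of ||x||_0."""
    return sum(w(xi, t) for xi in x)

x = [0.0, 2.5, -0.3, 1.0]               # illustrative vector, ||x||_0 = 3
for t in (1.0, 0.1, 0.01):
    assert all(abs(w(xi, t) - w_piecewise(xi, t)) < 1e-9 for xi in x)
    print(f"t = {t:5}:  psi(x,t) = {psi(x, t):.4f}")
```

For t=1 the nonzero entry −0.3 lies inside [−t,t] and contributes only 0.3, while for small t every nonzero entry contributes 1 and ψ(x,t) reaches the cardinality of x.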

A prominent feature of the above piecewise linear DC approximation lies in its polyhedral properties, which can be exploited to construct tighter convex subproblems via strengthening cuts when a linearization method is used to derive a convex approximation of the constraint ψ(x,t)⩽K. In fact, by the definition of \(w(x_i,t)\), it always holds that \(w(x_i,t)\leqslant1\) (i=1,⋯,n). The convex inner approximation of \(w(x_i,t)\leqslant1\) at \(y_i\) is

$$\frac{1}{t}|x_i|- \frac{1}{t}\bigl[h(y_i,t)+\xi _i(x_i-y_i) \bigr]\leqslant1, \quad i=1,\cdots,n, $$

where \(\xi_i\in\partial h(y_i,t)\) for i=1,⋯,n. The above n inequalities provide strengthening cuts for the feasible set of the convex subproblems in the linearization method (see [49]).
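The validity of such cuts follows from the convexity of h(·,t): the linearization \(h(y_i,t)+\xi_i(x_i-y_i)\) underestimates \(h(x_i,t)\), so each cut's left-hand side overestimates \(w(x_i,t)\), and imposing it to be at most 1 only shrinks the feasible set. A small numerical check (with an arbitrary valid subgradient choice on the middle piece):

```python
import random

def h(z, t):
    """h(z,t) = (z - t)^+ + (-z - t)^+."""
    return max(z - t, 0.0) + max(-z - t, 0.0)

def subgrad_h(z, t):
    """One element of the subdifferential of h(., t) at z."""
    if z > t:
        return 1.0
    if z < -t:
        return -1.0
    return 0.0                          # a valid choice on [-t, t]

def w(z, t):
    """w(z,t) = (1/t) [ h(z,0) - h(z,t) ]."""
    return (h(z, 0.0) - h(z, t)) / t

def cut_lhs(x, y, t):
    """Left-hand side of the cut obtained by linearizing h(., t) at y."""
    xi = subgrad_h(y, t)
    return abs(x) / t - (h(y, t) + xi * (x - y)) / t

# Since h(x,t) >= h(y,t) + xi (x - y), the cut LHS majorizes w(x,t),
# so imposing cut_lhs <= 1 tightens (never cuts off) the region w <= 1.
random.seed(0)
t = 0.5
for _ in range(1000):
    x, y = random.uniform(-3, 3), random.uniform(-3, 3)
    assert cut_lhs(x, y, t) >= w(x, t) - 1e-12
print("linearized cut majorizes w(x, t) at all sampled points")
```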

5.4 Penalty Decomposition and Alternating Direction Method

Alternating direction method (ADM), or block coordinate descent method, is a classical method for solving convex programs of the form \(\min_{(x,y)\in X\times Y}f(x,y)\). The idea of ADM is simple and straightforward: the function f(x,y) is minimized over x for fixed y and over y for fixed x alternately, in the hope that the iterative sequence {(x^k,y^k)} eventually converges to a stationary point of the problem. This method is particularly useful when the problem over (x,y) reduces to "easier" subproblems when either x or y is fixed.

We now consider the problem (Pc) with convex f and g, which can be rewritten as

$$\min_{x,y}\ \bigl\{f(x)\mid g(x)\leqslant0,\ h(x)=0,\ \|y\|_{0}\leqslant K,\ x=y \bigr\}. $$

Penalizing the equality constraint x=y with a penalty parameter ρ>0, we obtain

$$\min_{x,y}\ \Bigl\{f(x)+\frac{\rho}{2}\|x-y\|^{2}_{2} \;\Big|\; g(x)\leqslant0,\ h(x)=0,\ \|y\|_{0}\leqslant K \Bigr\}. $$

(28)

We observe that problem (28) reduces to a continuous convex program for fixed y, and to the minimization of a separable quadratic function with a cardinality constraint for fixed x. It turns out that the latter problem can be solved explicitly, as stated in the following lemma.

Lemma 1

([38])

Consider the problem:

$$\min \Biggl\{\,\sum_{i=1}^n q_i(x_i)\ \Big|\ \|x\|_0\leqslant K,\ x_i\in X_i,\ i=1,\cdots,n \Biggr\}, $$

(29)

where q_i(0)=0 (i=1,⋯,n). Let \(q_{i}^{*}=\min_{x_{i}\in X_{i}}q_{i}(x_{i})\) with minimizer \(\bar{x}^{*}_{i}\). Let \(\{q^{*}_{i}\}_{i=1}^{n}\) be ranked in increasing order: \(q^{*}_{i_{1}}\leqslant q^{*}_{i_{2}}\leqslant\cdots\leqslant q^{*}_{i_{n}}\). Then the optimal solution \(x^{*}\) of (29) is given by

$$x^*_{i_k}=\left \{ \begin{array}{@{}l@{\quad}l@{}} \bar{x}_{i_k}^*, & 1\leqslant k\leqslant K,\\ 0, & \mbox{\textit{otherwise}}. \end{array} \right . $$
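The selection rule of Lemma 1 admits a direct implementation. In the sketch below, each q_i is supplied together with its minimizer over X_i; the function name and the quadratic example are illustrative, not from [38] (a guard skips indices with \(q_i^*\geqslant0\), for which \(x_i=0\) is at least as good since \(q_i(0)=0\)):

```python
def solve_cardinality_separable(q_list, minimizers, K):
    """
    Solve min sum_i q_i(x_i) s.t. ||x||_0 <= K by Lemma 1: rank the
    partial minima q_i* in increasing order and activate the K best
    indices, setting all remaining variables to 0.
    """
    n = len(q_list)
    q_star = [q_list[i](minimizers[i]) for i in range(n)]
    order = sorted(range(n), key=lambda i: q_star[i])   # increasing q_i*
    x = [0.0] * n
    for i in order[:K]:
        if q_star[i] < 0:          # activating i helps only if q_i* < q_i(0) = 0
            x[i] = minimizers[i]
    return x

# Illustrative quadratics q_i(z) = (z - c_i)^2 - c_i^2 (so q_i(0) = 0)
# with X_i = R: the minimizer is c_i and q_i* = -c_i^2, hence Lemma 1
# keeps the K entries with largest |c_i|.
c = [3.0, -1.0, 0.5, -2.0]
qs = [lambda z, ci=ci: (z - ci) ** 2 - ci ** 2 for ci in c]
print(solve_cardinality_separable(qs, c, K=2))   # [3.0, 0.0, 0.0, -2.0]
```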

It is shown in [38] that, under some mild conditions, the ADM for (Pc) converges to a local minimizer of (Pc). Computational results in [38] show that the ADM is capable of finding feasible solutions of good quality within reasonable computing time. We remark that the alternating direction method for (Pc) can be extended to problem (CCQP) with semi-continuous variables \(x_i\in\{0\}\cup[\alpha_i,u_i]\), where 0<α_i<u_i. Finally, we point out that the augmented Lagrangian function \(f(x)+\lambda^{T}(x-y)+\frac{\rho}{2}\|x-y\|^{2}_{2}\) can also be employed in the penalized problem (28).
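To make the decomposition concrete, the following sketch alternates the two subproblems for a simplified caricature of (Pc) with \(f(x)=\frac{1}{2}\|x-a\|^{2}\) and no other constraints (this instance is illustrative and is not the exact scheme of [38]): the x-step is a closed-form convex minimization, and the y-step keeps the K largest-magnitude entries of x, exactly as Lemma 1 prescribes.

```python
def adm_penalty(a, K, rho=2.0, iters=100):
    """
    Penalty-decomposition ADM for the illustrative instance
        min 0.5 * ||x - a||^2   s.t.  ||x||_0 <= K.
    x-step: minimize 0.5||x - a||^2 + (rho/2)||x - y||^2  (closed form);
    y-step: minimize ||x - y||^2 s.t. ||y||_0 <= K, i.e. keep the K
    largest-magnitude entries of x (the Lemma 1 solution).
    """
    n = len(a)
    y = [0.0] * n
    for _ in range(iters):
        x = [(a[i] + rho * y[i]) / (1.0 + rho) for i in range(n)]
        keep = set(sorted(range(n), key=lambda i: -abs(x[i]))[:K])
        y = [x[i] if i in keep else 0.0 for i in range(n)]
    return y

a = [0.2, -3.0, 1.5, 0.1, 2.0]
print(adm_penalty(a, K=2))   # converges to the K largest-magnitude entries of a
```

For this separable instance the iterates settle on the support of the K largest |a_i|, illustrating how the penalty couples the convex x-step with the combinatorial y-step.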

6 Conclusions

We have summarized in this paper some recent advances in mathematical programming with semi-continuous variables and cardinality constraint. Our focus is mainly on the theory and solution techniques that are potentially applicable to problems arising from real-world applications. On one hand, semi-continuous variables and cardinality constraint have been widely used in modeling real-world optimization problems, leading to increasing interest in, and demand for, efficient solution methods to tackle this kind of discrete constraint. On the other hand, the inherent combinatorial nature makes it very difficult to find exact solutions of problems with realistic dimensions and data structures. The current literature lacks a systematic investigation of the theory and solution methods for mathematical programming with semi-continuous variables and cardinality constraint. Further research efforts are needed to better our understanding of this class of challenging mathematical programming problems.