1 Introduction

Over the past decades, the problem of sparse signal recovery, now commonly termed compressed sensing (CS) [1,2,3], has been greatly developed and widely used in many fields such as pattern recognition [4], image processing [5,6,7], medical imaging [8] and camera design [9], to name a few. Simply speaking, a signal \({\varvec{\widehat{x}}}\in {\mathbb {R}}^{n}\) is said to be sparse if it has far fewer nonzero components than its length, and if there are at most \(k(\ll n)\) nonzero entries in \({\varvec{\widehat{x}}}\), then \({\varvec{\widehat{x}}}\) is said to be a k-sparse signal. In fact, one of the key goals of CS is the recovery of such a k-sparse signal \({\varvec{\widehat{x}}}\) from as few observations \({\varvec{b}}=A{\varvec{\widehat{x}}}\) as possible, where \(A\in {\mathbb {R}}^{m\times n}(m<n)\) is a pre-designed measurement matrix and \({\varvec{b}}\in {\mathbb {R}}^{m}\) is the resultant observed signal. To achieve this goal, it is often suggested to solve the following \(\ell _{1}\) method [10, 11]

$$\begin{aligned} \min _{{\varvec{x}}\in {\mathbb {R}}^{n}}~ \Vert {\varvec{x}} \Vert _{1}=\sum _{i=1}^n| x_i|~~~\text {s.t.}~~~A{\varvec{x}}={\varvec{b}}. \end{aligned}$$
(1.1)

Since model (1.1) is convex, it can be efficiently solved by many convex optimization algorithms [12, 13], and numerous recovery guarantees, including recovery conditions as well as the resulting recovery error estimates, have been obtained for (1.1); see, e.g., [14,15,16].
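For illustration only (this snippet is not taken from the cited works), model (1.1) can be posed in a few lines with a generic convex solver; the sketch below uses the Python package cvxpy:

```python
import cvxpy as cp

def solve_l1(A, b):
    """Basis pursuit, i.e. model (1.1): min ||x||_1  s.t.  Ax = b."""
    x = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x == b]).solve()
    return x.value
```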

Unfortunately, the \(\ell _{1}\) method does not incorporate any side information on \({\varvec{\widehat{x}}}\), since the \(\ell _{1}\) norm treats all components of the variable \({\varvec{x}}\) equally. Considering that such side information is often available in many real-world applications, it is natural to expect that the performance of model (1.1) can be further improved if the side information is well integrated. In general, there are two common types of side information in the CS field. The first one takes the form of a known support estimate. To deal with this type of side information, the authors in [17] first modeled the known support estimate as a set K, and then integrated it into the \(\ell _{1}\) norm, leading to the model

$$\begin{aligned} \min _{{\varvec{x}}\in {\mathbb {R}}^{n}}~ \Vert {\varvec{x}}_{K^{c}} \Vert _{1}\triangleq \sum _{i\in K^{c}} |x_i|~~~\text {s.t.}~~~A{\varvec{x}}={\varvec{b}}, \end{aligned}$$
(1.2)

where \(K^{c}\) denotes the complement of K in \(\{1,2,3,\dots , n\}\). Their work also showed that the resultant recovery conditions are weaker than those without side information. In [18], the authors considered a more general weight on the known support estimate rather than a constant one. In [19], a variant of the iterative hard thresholding (IHT) algorithm was proposed by incorporating the partially known support information, and some theoretical analysis was also established for this algorithm. In [20], the orthogonal matching pursuit (OMP), as an iterative greedy algorithm, was extended by using the partially known support. The authors of [21] also considered embedding the known support information into the iterative reweighted least squares (IRLS) algorithm at each iteration, leading to a reduction of both the number of measurements and the computational cost. Recently, some new recovery conditions were obtained by Ge et al. in [22].

Another type of side information takes the form of a reference signal similar to the original signal \({\varvec{\widehat{x}}}\). Side information of this type usually comes from applications such as magnetic resonance imaging (MRI) [23], video processing [24, 25] and estimation problems [26]. For example, in many video processing problems, we usually know some previous video frames before we cope with the next ones. These previously known frames can, to some degree, be viewed as side information for the next frames. By adding an \(\ell _{1}\)-norm and an \(\ell _{2}\)-norm approximation term to model (1.1), respectively, Mota et al. [27] proposed to solve the \(\ell _{1}\)\(\ell _{1}\) method

$$\begin{aligned} {\varvec{x}}^{\sharp }=\mathop {\arg \min }_{{\varvec{x}}\in {\mathbb {R}}^{n}}~\Vert {\varvec{x}} \Vert _{1}+\beta \Vert {\varvec{x}}-{\varvec{w}}\Vert _1~~~\text {s.t.}~~~A{\varvec{x}}={\varvec{b}} \end{aligned}$$
(1.3)

and the \(\ell _{1}\)\(\ell _{2}\) method

$$\begin{aligned} {\varvec{x}}^{\diamond }=\mathop {\arg \min }_{{\varvec{x}}\in {\mathbb {R}}^{n}}~\Vert {\varvec{x}} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{x}}-{\varvec{w}}\Vert ^{2}_{2}~~~\text {s.t.}~~~A{\varvec{x}}={\varvec{b}}, \end{aligned}$$
(1.4)

where \(\beta\) is a positive parameter and \({\varvec{w}}\in {\mathbb {R}}^{n}\) is the reference signal that models the side information. For simplicity, \({\varvec{w}}\) is assumed to obey \(\text {Supp}({\varvec{w}})\subset \text {Supp}({\varvec{\widehat{x}}})\), where \(\text {Supp}({\varvec{w}})=\{i:{\varvec{w}}_{i}\ne 0, i=1,2,\cdots ,n\}\). Based on some statistical tools, the authors answered how many measurements are required to ensure the exact recovery of any k-sparse signal \({\varvec{\widehat{x}}}\), and convincing experiments were conducted to support their claims. Note that there also exist some works which embed the prior information of the desired signals into other models. For example, Zhang et al. [28] recently proposed an \(\ell _{1-2}\) model to deal with signal recovery with prior information. We refer the interested reader to [28,29,30] and the references therein for more details.

In this paper, we revisit the above \(\ell _{1}\)\(\ell _{1}\) and \(\ell _{1}\)\(\ell _{2}\) methods for exact sparse recovery with side information. Different from the pioneering work [27], this paper investigates both methods in a deterministic way. To do so, by means of the powerful null space property (NSP), we establish two kinds of deterministic necessary and sufficient conditions for these two methods. The obtained theoretical results not only complement the previous, statistically based work [27], but also show, somewhat surprisingly, that the sharp exact recovery conditions of model (1.1) remain valid for the \(\ell _{1}\)\(\ell _{1}\) model (1.3). Moreover, the numerical experiments show that, by incorporating the side information, the \(\ell _{1}\)\(\ell _{1}\) method is superior to the other methods in terms of the number of measurements required for exact recovery.

The rest of this paper is organized as follows: the main theoretical results are presented in Sect. 2, the numerical experiments are provided in Sect. 3, and we conclude the paper in Sect. 4.

2 The deterministic analysis of \(\ell _{1}\)\(\ell _{1}\) and \(\ell _{1}\)\(\ell _{2}\) methods

Our main results, namely the exact recovery guarantees for the \(\ell _{1}\)\(\ell _{1}\) and \(\ell _{1}\)\(\ell _{2}\) methods, are presented in this section. Before moving on, we first introduce the following two key definitions.

Definition 2.1

(NSP, see, e.g., [31]) A matrix \(A\in {\mathbb {R}}^{m\times n}\) is said to satisfy the k-order NSP if, for all subsets \(K\subset \{1,2,\ldots ,n\}\) with \(|K|\le k\) and all \({\varvec{h}}\in \text {Ker}(A)\backslash \{0\}\triangleq \{{\varvec{h}}:A{\varvec{h}}=0, {\varvec{h}}\ne 0\}\), it holds that

$$\begin{aligned} \Vert {\varvec{h}}_K \Vert _{1}< \Vert {\varvec{h}}_{K^c} \Vert _{1}. \end{aligned}$$
(2.5)

Furthermore, if it holds that

$$\begin{aligned} \Vert {\varvec{h}}_K \Vert _{1}\le \alpha \Vert {\varvec{h}}_{K^c} \Vert _{1} \end{aligned}$$
(2.6)

for certain \(0<\alpha <1\), then we say A satisfies the k-order stable NSP with constant \(\alpha\).
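Verifying the NSP exactly is computationally hard in general, but inequality (2.5) itself is easy to illustrate. The following Python sketch (our own, purely illustrative) samples random vectors from \(\text {Ker}(A)\) and evaluates the worst-case index set K for each sample; it is a sanity check rather than a certificate that A satisfies the NSP.

```python
import numpy as np
from scipy.linalg import null_space

def nsp_ratio(h, k):
    """For one null-space vector h, return ||h_K||_1 / ||h_{K^c}||_1 with K
    chosen adversarially as the k largest-magnitude entries of h."""
    a = np.sort(np.abs(h))[::-1]
    return a[:k].sum() / a[k:].sum()

rng = np.random.default_rng(0)
m, n, k = 70, 256, 20
A = rng.standard_normal((m, n))
N = null_space(A)                     # orthonormal basis of Ker(A), shape (n, n - m)

# sample random directions in Ker(A); ratios below 1 are consistent with
# the k-order NSP (a Monte-Carlo illustration, not a proof)
ratios = [nsp_ratio(N @ rng.standard_normal(N.shape[1]), k) for _ in range(200)]
print(max(ratios))
```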

Definition 2.2

(Restricted isometry property, see, e.g., [3]) A matrix A is said to satisfy the k-order restricted isometry property (RIP) if there exists \(0<\delta <1\) such that

$$\begin{aligned} (1-\delta )\Vert {\varvec{h}}_K \Vert _{2}^{2}\le \Vert A{\varvec{h}}_K \Vert _{2}^{2}\le (1+\delta )\Vert {\varvec{h}}_K \Vert _{2}^{2}, \end{aligned}$$
(2.7)

holds for all \({\varvec{h}}\in {\mathbb {R}}^{n}\) and all subsets \(K\subset \{1,2,\ldots ,n\}\) with \(|K|\le k\). Moreover, the smallest \(\delta\) obeying (2.7) is denoted by \(\delta _{k}\) and is known as the k-order restricted isometry constant (RIC).

Theorem 2.3

Assume that \(\text {Supp}({\varvec{w}})\subset \text {Supp}({\varvec{\widehat{x}}})\). Then every k-sparse signal \({\varvec{\widehat{x}}}\) satisfying \(A{\varvec{\widehat{x}}}={\varvec{b}}\) is the unique solution of the \(\ell _{1}\)\(\ell _{1}\) model (1.3) if and only if A obeys the k-order NSP.

Remark 2.4

The k-order NSP has been shown to be a necessary and sufficient condition for the classical \(\ell _{1}\) method to ensure exact k-sparse signal recovery. According to our Theorem 2.3, this condition also applies to the \(\ell _{1}\)\(\ell _{1}\) model (1.3). On the other hand, it has been shown in [32, 33] that if A obeys the tk-order RIP with RIC \(\delta _{tk}\) for some \(t>1\), then A satisfies the k-order stable NSP with constant

$$\begin{aligned} \alpha =\frac{\delta _{tk}}{\sqrt{\left( 1-\left( \delta _{tk}\right) ^{2}\right) (t-1)}}. \end{aligned}$$

If one further restricts \(\alpha <1\), then we will get

$$\begin{aligned} \delta _{tk}<\sqrt{\frac{t-1}{t}}. \end{aligned}$$
(2.8)

Note that condition (2.8) has been proved to be sharp for the classical \(\ell _{1}\) method to exactly recover any k-sparse signal. By Theorem 2.3, condition (2.8) is thus also applicable to the \(\ell _{1}\)\(\ell _{1}\) model (1.3). As far as we know, such RIC-based sufficient conditions have not been established for the \(\ell _{1}\)\(\ell _{1}\) method before.
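For a quick numerical illustration of this relation (the values below are chosen by us purely for illustration), the following snippet evaluates \(\alpha\) for a given \(\delta _{tk}\) and t and compares \(\delta _{tk}\) with the bound in (2.8):

```python
import numpy as np

def stable_nsp_constant(delta_tk, t):
    # alpha expressed via the tk-order RIC, as stated in Remark 2.4
    return delta_tk / np.sqrt((1.0 - delta_tk ** 2) * (t - 1.0))

t = 2.0
bound = np.sqrt((t - 1.0) / t)        # condition (2.8): delta_tk < sqrt(1/2) ~ 0.7071
print(stable_nsp_constant(0.70, t))   # ~ 0.980 < 1: a valid stable NSP constant
print(stable_nsp_constant(0.72, t))   # ~ 1.037 > 1: (2.8) violated, alpha no longer < 1
```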

Proof of Theorem 2.3

First, we prove the sufficiency. Pick any feasible k-sparse vector \({\varvec{\widehat{x}}}\) and any nonzero \({\varvec{h}}\in \text {Ker}(A)\), and let \(K=\text {Supp}({\varvec{\widehat{x}}})\). Since \({\varvec{h}}\in \text {Ker}(A)\), it holds that \(A({\varvec{\widehat{x}}}+{\varvec{h}})=A{\varvec{\widehat{x}}}={\varvec{b}}\). And

$$\begin{aligned} \Vert {\varvec{\widehat{x}}}+{\varvec{h}} \Vert _{1}+ & {} \beta \Vert {\varvec{\widehat{x}}}-{\varvec{w}}+{\varvec{h}} \Vert _{1} =\Vert {\varvec{\widehat{x}}}_{K}+{\varvec{h}}_{K} \Vert _{1}+\beta \Vert {\varvec{\widehat{x}}}_{K}+{\varvec{h}}_{K}-{\varvec{w}} \Vert _{1}+\Vert {\varvec{h}}_{K^c} \Vert _{1}+\beta \Vert {\varvec{h}}_{K^c} \Vert _{1}\nonumber \\\ge & {} \Vert {\varvec{\widehat{x}}}_{K} \Vert _{1}-\Vert {\varvec{h}}_{K} \Vert _{1}+\beta \Vert {\varvec{\widehat{x}}}_{K}-{\varvec{w}} \Vert _{1}-\beta \Vert {\varvec{h}}_{K} \Vert _{1}+\Vert {\varvec{h}}_{K^c} \Vert _{1}+\beta \Vert {\varvec{h}}_{K^c} \Vert _{1}\nonumber \\= & {} \Vert {\varvec{\widehat{x}}}_{K} \Vert _{1}+\beta \Vert {\varvec{\widehat{x}}}_{K}-{\varvec{w}} \Vert _{1}+(1+\beta )(\Vert {\varvec{h}}_{K^c} \Vert _{1}-\Vert {\varvec{h}}_{K} \Vert _{1}), \end{aligned}$$
(2.9)

where the first equality uses \(\text {Supp}({\varvec{w}})\subset K\) and the inequality follows from the triangle inequality.

Since A obeys the k-order NSP, we have \(\Vert {\varvec{h}}_K \Vert _{1}< \Vert {\varvec{h}}_{K^c} \Vert _{1}\), which together with \(\beta >0\) gives \(\Vert {\varvec{\widehat{x}}}+{\varvec{h}} \Vert _{1}+\beta \Vert {\varvec{\widehat{x}}}+{\varvec{h}}-{\varvec{w}} \Vert _{1}>\Vert {\varvec{\widehat{x}}} \Vert _{1}+\beta \Vert {\varvec{\widehat{x}}}-{\varvec{w}} \Vert _{1}\). Hence, \({\varvec{\widehat{x}}}\) is the unique solution of (1.3) and the sufficiency is proved.

Now, we prove the necessity. Given any nonzero \({\varvec{h}}\in \text {Ker}(A)\) and any K with \(|K|\le k\), construct \({\varvec{\widehat{x}}}\) by setting \({\varvec{\widehat{x}}}_{i}=-\text {sign}({\varvec{h}}_{i})\Vert {\varvec{h}}\Vert _{\infty }\) for all \(i\in K\) and \({\varvec{\widehat{x}}}_{i}=0\) otherwise. Then \(\Vert {\varvec{\widehat{x}}} \Vert _0\le k\) and \(\Vert {\varvec{\widehat{x}}}_{K}+ \tau {\varvec{h}}_{K} \Vert _{1}=\Vert {\varvec{\widehat{x}}}_{K} \Vert _{1}-\tau \Vert {\varvec{h}}_{K} \Vert _{1}\) for all \(0<\tau \le 1\). Now, by replacing \({\varvec{h}}\) in (2.9) with \(\tau {\varvec{h}}\), and noting that \(\Vert {\varvec{\widehat{x}}}+\tau {\varvec{h}} \Vert _{1}+\beta \Vert {\varvec{\widehat{x}}}+\tau {\varvec{h}}-{\varvec{w}} \Vert _{1}>\Vert {\varvec{\widehat{x}}} \Vert _{1}+\beta \Vert {\varvec{\widehat{x}}}-{\varvec{w}} \Vert _{1}\) must hold since \({\varvec{\widehat{x}}}\) is the unique solution, we can easily deduce that

$$\begin{aligned} (1+\beta )(\Vert \tau {\varvec{h}}_{K^c} \Vert _{1}-\Vert \tau {\varvec{h}}_{K} \Vert _{1})>0 \end{aligned}$$

for any \(0<\tau \le 1\) and \(\beta >0\), which requires (2.5) to hold. \(\square\)

In what follows, we establish the stable NSP condition of order k for the \(\ell _{1}\)\(\ell _{2}\) model (1.4).

Theorem 2.5

Assume that \({\varvec{w}}\) is a side information signal with \(\text {Supp}({\varvec{w}})\subset \text {Supp}({\varvec{\widehat{x}}})\). Then the k-sparse signal \({\varvec{\widehat{x}}}(\ne 0)\) is the unique solution of the \(\ell _{1}\)\(\ell _{2}\) model (1.4) if and only if A obeys the k-order stable NSP with constant

$$\begin{aligned} \alpha =\frac{1}{1+\beta \left( \Vert {\varvec{\widehat{x}}} \Vert _{\infty }+\Vert {\varvec{w}} \Vert _{\infty }\right) } \end{aligned}$$
(2.10)

Remark 2.6

Compared with the previous NSP condition for the \(\ell _{1}\)\(\ell _{1}\) model (1.3), the obtained stable NSP condition for the \(\ell _{1}\)\(\ell _{2}\) model (1.4) appears somewhat less sharp. Besides, it is also affected by the infinity norms of the desired k-sparse signal \({\varvec{\widehat{x}}}\) and the reference signal \({\varvec{w}}\). From this point of view, the \(\ell _{1}\)\(\ell _{2}\) model is less appealing than the \(\ell _{1}\)\(\ell _{1}\) model from a theoretical perspective.

Proof of Theorem 2.5

Our proof is partially inspired by [34]. We start with the sufficiency. Pick any feasible k-sparse vector \({\varvec{\widehat{x}}}\) and any nonzero \({\varvec{h}}\in \text {Ker}(A)\), and let \(K=\text {Supp}({\varvec{\widehat{x}}})\). Since \({\varvec{h}}\in \text {Ker}(A)\), we first have \(A({\varvec{\widehat{x}}}+{\varvec{h}})=A{\varvec{\widehat{x}}}={\varvec{b}}\), and

$$\begin{aligned}&\Vert {\varvec{\widehat{x}}}+{\varvec{h}} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{\widehat{x}}}+{\varvec{h}}-{\varvec{w}} \Vert _{2}^{2}=\Vert {\varvec{\widehat{x}}}_{K}+{\varvec{h}}_{K} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{\widehat{x}}}_{K}+{\varvec{h}}_{K}-{\varvec{w}} \Vert _{2}^{2}+\Vert {\varvec{h}}_{K^c} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{h}}_{K^c} \Vert _{2}^{2}\nonumber \\&\ge \Vert {\varvec{\widehat{x}}}_{K} \Vert _{1}-\Vert {\varvec{h}}_{K} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{\widehat{x}}}_{K}-{\varvec{w}} \Vert _{2}^{2}+\frac{\beta }{2}\Vert {\varvec{h}}_{K} \Vert _{2}^{2}+\beta \langle {\varvec{\widehat{x}}}_{K}-{\varvec{w}},{\varvec{h}}_{K}\rangle +\Vert {\varvec{h}}_{K^c} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{h}}_{K^c} \Vert _{2}^{2}\nonumber \\&\ge \Vert {\varvec{\widehat{x}}}_{K} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{\widehat{x}}}_{K}-{\varvec{w}} \Vert _{2}^{2}+\Vert {\varvec{h}}_{K^c} \Vert _{1}-\Vert {\varvec{h}}_{K} \Vert _{1}+\beta \langle {\varvec{\widehat{x}}}_{K}-{\varvec{w}},{\varvec{h}}_{K}\rangle +\frac{\beta }{2}\Vert {\varvec{h}} \Vert _{2}^{2}\nonumber \\&\ge \Vert {\varvec{\widehat{x}}}_{K} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{\widehat{x}}}_{K}-{\varvec{w}} \Vert _{2}^{2}+\Vert {\varvec{h}}_{K^c} \Vert _{1}-\Vert {\varvec{h}}_{K} \Vert _{1}-\beta (\Vert {\varvec{\widehat{x}}}\Vert _{\infty }+\Vert {\varvec{w}}\Vert _{\infty })\Vert {\varvec{h}}_{K} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{h}} \Vert _{2}^{2} \end{aligned}$$
(2.11)

where we have used \(\Vert {\varvec{h}}_{K} \Vert _{2}^{2}+\Vert {\varvec{h}}_{K^c} \Vert _{2}^{2}=\Vert {\varvec{h}} \Vert _{2}^{2}\) in the second inequality and \(\langle {\varvec{\widehat{x}}}_{K}-{\varvec{w}},{\varvec{h}}_{K}\rangle \ge -(\Vert {\varvec{\widehat{x}}}\Vert _{\infty }+\Vert {\varvec{w}}\Vert _{\infty })\Vert {\varvec{h}}_{K} \Vert _{1}\) in the third. Since \(\Vert {\varvec{h}} \Vert _{2}^{2}>0\) and the k-order stable NSP with \(\alpha\) given by (2.10) yields \([1+\beta \left( \Vert {\varvec{\widehat{x}}} \Vert _{\infty }+\Vert {\varvec{w}} \Vert _{\infty } \right) ]\Vert {\varvec{h}}_K \Vert _{1}\le \Vert {\varvec{h}}_{K^c} \Vert _{1}\), we get \(\Vert {\varvec{\widehat{x}}}+{\varvec{h}} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{\widehat{x}}}+{\varvec{h}}-{\varvec{w}} \Vert _{2}^{2}>\Vert {\varvec{\widehat{x}}} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{\widehat{x}}}-{\varvec{w}} \Vert _{2}^{2}\). Hence, \({\varvec{\widehat{x}}}\) is the unique minimizer of (1.4).

As for the necessity, it suffices to show that for any given nonzero \({\varvec{h}}\in \text {Ker}(A)\) and any K with \(|K|\le k\), the stable NSP of order k with \(\alpha\) given by (2.10) holds. Similarly to the proof of Theorem 2.3, we can construct \({\varvec{\widehat{x}}}\) supported on K such that \(\Vert {\varvec{\widehat{x}}} \Vert _0\le k\) and \(\Vert {\varvec{\widehat{x}}}_{K}+ \tau {\varvec{h}}_{K} \Vert _{1}=\Vert {\varvec{\widehat{x}}}_{K} \Vert _{1}-\tau \Vert {\varvec{h}}_{K} \Vert _{1}\) for all \(0<\tau \le 1\). Furthermore, choosing \({\varvec{\widehat{x}}}_{i}=-\text {sign}({\varvec{h}}_{i})\Vert {\varvec{\widehat{x}}}\Vert _{\infty }\) and \({\varvec{w}}_{i}=\text {sign}({\varvec{h}}_{i})\Vert {\varvec{w}}\Vert _{\infty }\) for \(i\in K\), we get \(\langle {\varvec{\widehat{x}}}_{K}-{\varvec{w}}, \tau {\varvec{h}}_{K}\rangle =-(\Vert {\varvec{\widehat{x}}}\Vert _{\infty }+\Vert {\varvec{w}}\Vert _{\infty })\tau \Vert {\varvec{h}}_{K} \Vert _{1}\) for any \(0<\tau \le 1\). Now, replacing \({\varvec{h}}\) in (2.11) with \(\tau {\varvec{h}}\), we observe that the inequalities of (2.11) hold with equality. Since \({\varvec{\widehat{x}}}\) is the unique solution, we must have \(\Vert {\varvec{\widehat{x}}}+\tau {\varvec{h}} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{\widehat{x}}}+\tau {\varvec{h}}-{\varvec{w}} \Vert _{2}^{2}>\Vert {\varvec{\widehat{x}}} \Vert _{1}+\frac{\beta }{2}\Vert {\varvec{\widehat{x}}}-{\varvec{w}} \Vert _{2}^{2}\), so we get

$$\begin{aligned} \Vert \tau {\varvec{h}}_{K^c} \Vert _{1}-\Vert \tau {\varvec{h}}_{K} \Vert _{1}-\beta (\Vert {\varvec{\widehat{x}}}\Vert _{\infty }+\Vert {\varvec{w}}\Vert _{\infty })\Vert \tau {\varvec{h}}_{K} \Vert _{1}+\frac{\beta }{2}\Vert \tau {\varvec{h}} \Vert _{2}^{2}>0 \end{aligned}$$

for any \(0<\tau \le 1\). Dividing the above inequality by \(\tau\) and letting \(\tau \rightarrow 0^{+}\) yields \([1+\beta (\Vert {\varvec{\widehat{x}}}\Vert _{\infty }+\Vert {\varvec{w}}\Vert _{\infty })]\Vert {\varvec{h}}_{K} \Vert _{1}\le \Vert {\varvec{h}}_{K^c} \Vert _{1}\), i.e., the k-order stable NSP with \(\alpha\) given by (2.10), which proves the necessity. \(\square\)

3 Numerical simulations

As can be seen in the previous sections, our goal in this paper is to provide deterministic recovery conditions for the \(\ell _{1}\)\(\ell _{1}\) model and the \(\ell _{1}\)\(\ell _{2}\) model to guarantee exact sparse recovery, and the theoretical results obtained for these models can be found in Theorem 2.3 and Theorem 2.5, respectively. On the other hand, it remains a difficult problem to construct measurement matrices that provably satisfy the conditions stated in the two theorems. Nevertheless, we still provide some numerical simulations to demonstrate the efficiency of the two models with side information. Since the \(\ell _{1}\)\(\ell _{1}\) model, the \(\ell _{1}\)\(\ell _{2}\) model and the \(\ell _{1}\) model are all convex, we resort in this paper to the popular and easy-to-use CVX package (with the SeDuMi solver) to solve them.
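As a rough illustration of how these problems are posed, the following Python sketch uses the cvxpy package instead of the MATLAB CVX toolbox employed in our experiments; it can be combined with the data-generation sketch given in Sect. 3.1.

```python
import cvxpy as cp

def solve_l1_l1(A, b, w, beta):
    """The l1-l1 model (1.3): min ||x||_1 + beta*||x - w||_1  s.t.  Ax = b."""
    x = cp.Variable(A.shape[1])
    obj = cp.Minimize(cp.norm(x, 1) + beta * cp.norm(x - w, 1))
    cp.Problem(obj, [A @ x == b]).solve()
    return x.value

def solve_l1_l2(A, b, w, beta):
    """The l1-l2 model (1.4): min ||x||_1 + (beta/2)*||x - w||_2^2  s.t.  Ax = b."""
    x = cp.Variable(A.shape[1])
    obj = cp.Minimize(cp.norm(x, 1) + (beta / 2) * cp.sum_squares(x - w))
    cp.Problem(obj, [A @ x == b]).solve()
    return x.value
```

In our actual experiments, all models are solved with CVX and SeDuMi as stated above.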

3.1 Experiments on synthetic signals

We start with experiments on synthetic signals. For simplicity, in all experiments, we assume that the length of the desired signal \({\varvec{\widehat{x}}}\) is \(n=256\) and its sparsity is k. We generate such a k-sparse signal \({\varvec{\widehat{x}}}\) as follows: the locations of the nonzero components in \({\varvec{\widehat{x}}}\) are generated at random and their values are drawn from a standard normal distribution. We assume that the reference signal \({\varvec{w}}\) is \(k_{{\varvec{w}}}\)-sparse with \(k_{{\varvec{w}}}\le k\), and its nonzero entries are chosen at random from the nonzero entries of \({\varvec{\widehat{x}}}\) so that \(\text {Supp}({\varvec{w}})\subseteq \text {Supp}({\varvec{\widehat{x}}})\). Obviously, when \(k_{{\varvec{w}}}<k\), \({\varvec{w}}\) contains part of the side information of \({\varvec{\widehat{x}}}\), and when \(k_{{\varvec{w}}}=k\), \({\varvec{w}}\) contains all the information of \({\varvec{\widehat{x}}}\), i.e., \({\varvec{w}}={\varvec{\widehat{x}}}\). Besides, we generate the measurement matrix \(A\in {\mathbb {R}}^{m\times n}\) by drawing its entries from a standard Gaussian distribution. To judge the recovery performance of the competing methods, we adopt the signal-to-noise ratio (SNR), see, e.g., [35], given by

$$\begin{aligned} \text {SNR}({\varvec{\widehat{x}}},{\varvec{\widetilde{x}}})=20\log _{10}\frac{\Vert {\varvec{\widehat{x}}}\Vert _{2}}{\Vert {\varvec{\widehat{x}}}-{\varvec{\widetilde{x}}}\Vert _{2}}, \end{aligned}$$

where \({\varvec{\widehat{x}}}\) and \({\varvec{\widetilde{x}}}\) denote the original signal and the signal recovered by a given model, respectively. Unless otherwise specified, the average SNR over 50 independent trials is reported as the final result.
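A short numpy sketch of the data-generation protocol and the SNR metric just described (the function names and the seed are ours and purely illustrative) might look as follows:

```python
import numpy as np

def make_problem(n=256, k=20, k_w=10, m=70, seed=0):
    """Generate a k-sparse signal, a k_w-sparse reference with
    Supp(w) contained in Supp(x_hat), and a Gaussian measurement matrix."""
    rng = np.random.default_rng(seed)
    x_hat = np.zeros(n)
    supp = rng.choice(n, size=k, replace=False)
    x_hat[supp] = rng.standard_normal(k)           # nonzero values ~ N(0, 1)

    w = np.zeros(n)
    w_supp = rng.choice(supp, size=k_w, replace=False)
    w[w_supp] = x_hat[w_supp]                      # reference copies part of x_hat

    A = rng.standard_normal((m, n))                # Gaussian measurement matrix
    return x_hat, w, A, A @ x_hat

def snr(x_hat, x_tilde):
    """SNR in dB, as defined above."""
    return 20 * np.log10(np.linalg.norm(x_hat) / np.linalg.norm(x_hat - x_tilde))
```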

We first conduct a simple experiment for the \(\ell _{1}\)\(\ell _{1}\) model and the \(\ell _{1}\)\(\ell _{2}\) model to test their recovery performance on a signal with side information. In this experiment, we set \(m=70\), \(k=20\), \(k_{{\varvec{w}}}=10\), \(\beta =10^{4}\) for the \(\ell _{1}\)\(\ell _{1}\) model and \(\beta =10^{-4}\) for the \(\ell _{1}\)\(\ell _{2}\) model.

Fig. 1 Recovery performance of the models (1.3) and (1.4) on a given sparse signal with side information

Figure 1 plots the resultant recovery performance. One can easily observe that both the \(\ell _{1}\)\(\ell _{1}\) model and the \(\ell _{1}\)\(\ell _{2}\) model complete the recovery task well, with the two recovered signals being almost identical to the original signal. In the above experiment, we simply set \(\beta =10^{4}\) for the \(\ell _{1}\)\(\ell _{1}\) model and \(\beta =10^{-4}\) for the \(\ell _{1}\)\(\ell _{2}\) model. Obviously, if these parameters are further optimized, the resulting SNR performance will improve correspondingly. To select a proper \(\beta\) for the models (1.3) and (1.4), we let \(\beta\) be chosen from \(\{10^{-8},10^{-7},\cdots , 10^{8}\}\) and keep the other parameters the same as before.

Fig. 2 Selection of the best \(\beta\)’s for both the \(\ell _{1}\)\(\ell _{1}\) model and the \(\ell _{1}\)\(\ell _{2}\) model

Figure 2 plots the obtained results. According to Fig. 2, in what follows we set \(\beta =10^{5}\) for the model (1.3) and \(\beta =10^{-6}\) for the model (1.4).

Furthermore, fixing \(k=20\) and \(k_{{\varvec{w}}}=10\), we test the recovery performance of the two models under different numbers of measurements. In this set of experiments, we also compare these two models with the classical \(\ell _{1}\) model (1.1) and the weighted \(\ell _{1}\) model (without noise) in [36], i.e.,

$$\begin{aligned} \min _{{\varvec{x}}\in {\mathbb {R}}^{n}}~\sum _{i=1}^{n}{\varvec{w}}_{i}|{\varvec{x}}_{i}|~~~\text {s.t.}~~~A{\varvec{x}}={\varvec{b}}, \end{aligned}$$
(3.12)

where

$$\begin{aligned} {\varvec{w}}_{i}= \left\{ \begin{array}{ll} \omega \in [0,1], &{} i\in K\\ 1, &{} i\in K^{c} \end{array} \right. , \end{aligned}$$

and \(K\subseteq \{1,2,3,\cdots , n\}\) models the known support estimate of the original signal. For simplicity, we set \(K=\text {Supp}({\varvec{w}})\). Since (3.12) is also convex, it can be solved easily by means of CVX, as sketched below. Obviously, the above weighted \(\ell _{1}\) model reduces to (1.2) when \(\omega =0\). According to [36], it is suggested to set \(\omega\) closer to 0 when the value of \(|K\cap \text {Supp}({\varvec{\widehat{x}}})|/|K|\) is closer to 1. Considering that in our experiments K is strictly included in \(\text {Supp}({\varvec{\widehat{x}}})\), we set \(\omega =0\) for the weighted \(\ell _{1}\) model to achieve its best recovery performance. Figure 3 shows the obtained results.
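A minimal cvxpy sketch of this weighted \(\ell _{1}\) baseline (the weight construction follows the description above; variable names are ours) could read:

```python
import numpy as np
import cvxpy as cp

def solve_weighted_l1(A, b, K, omega=0.0):
    """Weighted l1 model (3.12): entries indexed by K get weight omega,
    all remaining entries get weight 1."""
    n = A.shape[1]
    weights = np.ones(n)
    weights[K] = omega
    x = cp.Variable(n)
    obj = cp.Minimize(cp.norm(cp.multiply(weights, x), 1))
    cp.Problem(obj, [A @ x == b]).solve()
    return x.value
```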

Fig. 3 Recovery performance of four models under different numbers of measurements

Figure 3 first shows that increasing m leads to better recovery for all four models. Among them, the \(\ell _{1}\)\(\ell _{1}\) model performs best, followed by the weighted \(\ell _{1}\) model, while the \(\ell _{1}\)\(\ell _{2}\) model and the classical \(\ell _{1}\) model perform worst. This indicates that a good choice of the term penalizing the deviation between the true signal and its reference plays a key role in enhancing the recovery performance. It also shows that the \(\ell _{1}\)\(\ell _{1}\) model takes better advantage of the side information than the weighted \(\ell _{1}\) model. As for the classical \(\ell _{1}\) model itself, it is suggested to add an \(\ell _{1}\)-norm based error term, rather than an \(\ell _{2}\)-norm based one, to its objective function to boost the recovery performance when side information on the signals becomes available. This observation is also consistent with the conclusion drawn in [21].

Moreover, to further examine how the sparsity of the signals with side information affects the \(\ell _{1}\)\(\ell _{1}\) model and the \(\ell _{1}\)\(\ell _{2}\) model, we use these two models to recover k-sparse signals under different \(k_{{\varvec{w}}}\)-sparse reference signals. Figure 4 first plots the recovery results with k ranging over \(\{10,12,14,\cdots , 28\}\), where \(k_{{\varvec{w}}}\) is set to \(k_{{\varvec{w}}}=\lceil k/2 \rceil\).

Fig. 4 Recovery performance of four models under different sparsity of the desired signals

In general, if the number of measurements is fixed, a larger k always leads to poorer recovery, and this conclusion can be easily drawn from Fig. 4. However, once the side information of the desired signals is available and well modeled, the recovery performance can be further improved. Since the classical \(\ell _{1}\) model does not take the side information into consideration, its performance is weaker than that of the \(\ell _{1}\)\(\ell _{1}\) model and the weighted \(\ell _{1}\) model. In this experiment, the \(\ell _{1}\)\(\ell _{2}\) model again performs poorly. This reconfirms that if one can take good advantage of the side information and model it well, the recovery performance can be further improved. In Fig. 5, we investigate how the recovery performance is affected by the “quality/quantity” of the side information. To be specific, we fix the sparsity of the original signals at \(k=20\) and let \(k_{{\varvec{w}}}\) range over \(\{1,3,5,7,\cdots , 17\}\). Obviously, the larger the value of \(k_{{\varvec{w}}}\), the higher the “quality/quantity” of the side information, and it is expected that the recovery performance will improve considerably as \(k_{{\varvec{w}}}\) increases.

Fig. 5 Recovery performance of four models under different “quality/quantity” of side information

It is easy to see from Fig. 5 that the recovery performance of both the \(\ell _{1}\)\(\ell _{1}\) model and the weighted \(\ell _{1}\) model is consistent with our expectations and is far better than that of the other two models. Note that in this set of experiments, we also fix \(m=64\).

At the end of this part, we conduct a special experiment in which the reference signal \({\varvec{w}}\) is set to \({\varvec{w}}={\varvec{\widehat{x}}}\). Under this setting, it becomes very important to investigate how the parameter \(\beta\) affects the recovery performance of both the \(\ell _{1}\)\(\ell _{1}\) model and the \(\ell _{1}\)\(\ell _{2}\) model. In this experiment, we set \(m=64\), \(k=k_{{\varvec{w}}}=20\), and let the parameter \(\beta\) range over \(\{10^{-8},10^{-7},\cdots , 10^{8}\}\). Figure 6 plots the obtained results.

Fig. 6 Recovery performance of the \(\ell _{1}\)\(\ell _{1}\) model and the \(\ell _{1}\)\(\ell _{2}\) model under different \(\beta\)’s

Obviously, the larger \(\beta\) is, the better both models perform. When \(\beta\) is relatively small, the two models perform similarly; however, as \(\beta\) increases, the \(\ell _{1}\)\(\ell _{1}\) model performs much better than the \(\ell _{1}\)\(\ell _{2}\) model. It should also be noted that the assumption \({\varvec{w}}={\varvec{\widehat{x}}}\) is usually impractical in real-world applications. Nevertheless, we can roughly conclude that a relatively large \(\beta\) helps the \(\ell _{1}\)\(\ell _{1}\) model yield a better recovery performance.

3.2 Experiments on real-world images

In this part, we apply the above-mentioned four models to a real-world image recovery problem.

Fig. 7 The \(128\times 128\) test images, numbered from 1 to 10, from left to right and from top to bottom

Figure 7 shows the ten real-world images used in the following experiments. As is well known, real-world images are generally not sparse themselves, but they can be made nearly sparse by applying sparsifying transforms such as the discrete cosine transform (DCT). On the other hand, almost all real-world images exhibit local smoothness, which indicates that one can use some known information of the original image to help recover unknown neighboring information. Therefore, we take the nearly sparse vectors generated by applying the DCT to each column of the input images as the test signals for these models. To be specific, let the original image G be denoted by \(G=[{\varvec{g}}_{1},{\varvec{g}}_{2},\cdots ,{\varvec{g}}_{d}]\) with \({\varvec{g}}_{i}\in {\mathbb {R}}^{n}\) for \(i=1,2,\cdots ,d\), and let the DCT dictionary be denoted by \(D\in {\mathbb {R}}^{n\times n}\); then we obtain the desired sparse (test) signal \({\varvec{x}}_{i}\) by \({\varvec{x}}_{i}=D{\varvec{g}}_{i}\). To model the side information of \({\varvec{x}}_{i}\), we first generate \({\varvec{r}}_{i}={\varvec{x}}_{i}+0.01\cdot \Vert {\varvec{x}}_{i}\Vert _{2}\cdot \xi\), where \(\xi \in {\mathbb {R}}^{n}\) has entries drawn independently from the standard normal distribution; the signal \({\varvec{r}}_{i}\) can be viewed as a perturbed version of \({\varvec{x}}_{i}\). As for the support estimate K in the weighted \(\ell _{1}\) model, we set K to be the indices of the \(\lceil 0.01 n\rceil\) largest (in absolute value) entries of \({\varvec{r}}_{i}\). As for the \(\ell _{1}\)\(\ell _{1}\) model and the \(\ell _{1}\)\(\ell _{2}\) model, we set the reference signal of \({\varvec{x}}_{i}\) to \({\varvec{w}}_{i}\) with \(({\varvec{w}}_{i})_{j}=({\varvec{r}}_{i})_{j}\) for \(j\in K\) and 0 otherwise. As before, we generate the measurement matrix \(A\in {\mathbb {R}}^{m\times n}\) with \(m=\lceil n/4\rceil\), whose entries are drawn independently from the standard normal distribution. The other parameters are set as in Sect. 3.1. Once the recovered signals are obtained one by one, with \({\varvec{x}}_{i}^{\sharp }\) denoting the ith recovered signal for \(i=1,2,\cdots , d\), we can obtain the recovered image \(G^{\sharp }\) by using

$$\begin{aligned} G^{\sharp }=D^{-1}[{\varvec{x}}_{1}^{\sharp }, {\varvec{x}}_{2}^{\sharp }, \cdots , {\varvec{x}}_{d}^{\sharp }]. \end{aligned}$$
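The following Python sketch outlines this per-column pipeline for a single column of one image; it is only an illustration, using scipy's orthonormal DCT in place of the DCT dictionary D and cvxpy in place of the MATLAB CVX toolbox used in our experiments.

```python
import numpy as np
import cvxpy as cp
from scipy.fft import dct, idct

def recover_column(g, beta=1e5, seed=0):
    """Recover one image column g via the l1-l1 model (1.3) in the DCT domain."""
    rng = np.random.default_rng(seed)
    n = g.size
    x = dct(g, norm='ortho')                                     # nearly sparse DCT coefficients
    r = x + 0.01 * np.linalg.norm(x) * rng.standard_normal(n)    # perturbed copy of x
    K = np.argsort(np.abs(r))[-int(np.ceil(0.01 * n)):]          # support estimate
    w = np.zeros(n)
    w[K] = r[K]                                                  # reference signal

    m = int(np.ceil(n / 4))
    A = rng.standard_normal((m, n))                              # Gaussian measurements
    b = A @ x

    z = cp.Variable(n)
    obj = cp.Minimize(cp.norm(z, 1) + beta * cp.norm(z - w, 1))  # model (1.3)
    cp.Problem(obj, [A @ z == b]).solve()
    return idct(z.value, norm='ortho')                           # back to the image domain
```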

To eliminate the column effect on each recovered image, we recover the original images by column and by row, respectively, and then use the average of the two results as the final output. Moreover, to evaluate the quality of the recovered images, we use two popular indices, i.e., the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM); more details on these two indices can be found in [37]. Table 1 lists the obtained PSNR|SSIM results of the ten test images recovered by the four different models, where the highest PSNR and SSIM values are marked in bold.

Table 1 PSNR|SSIM performance of ten images by four different models

It is easy to see that the \(\ell _{1}\)\(\ell _{1}\) model performs best among all the models, followed by the weighted \(\ell _{1}\) model. The \(\ell _{1}\) model and the \(\ell _{1}\)\(\ell _{2}\) model perform almost the same, but both are far worse than the \(\ell _{1}\)\(\ell _{1}\) model. These results further confirm the claims drawn previously. Note that, in this set of experiments, we still set \(\beta =10^{5}\) for the \(\ell _{1}\)\(\ell _{1}\) model and \(\beta =10^{-6}\) for the \(\ell _{1}\)\(\ell _{2}\) model; it is believed that if \(\beta\) were further optimized, the final performance of these two models could be further improved.

4 Conclusion and future work

In this paper, we establish two NSP-based necessary and sufficient conditions for the \(\ell _{1}\)\(\ell _{1}\) and \(\ell _{1}\)\(\ell _{2}\) methods when side information on the desired signal is available. These deterministic theoretical results provide a good complement to the previous, statistically based work on these two methods. Besides, our experiments demonstrate that the \(\ell _{1}\)\(\ell _{1}\) method with side information outperforms the \(\ell _{1}\)\(\ell _{2}\) method, the weighted \(\ell _{1}\) method and the traditional \(\ell _{1}\) method in terms of exact sparse signal recovery.

Note that there also exist a large number of other methods that deal with sparse recovery using prior information on the signals, such as the weighted \(\ell _{p}\) method [29] and the weighted \(\ell _{1-2}\) method [28, 30]. Unfortunately, when compared with these methods, the \(\ell _{1}\)\(\ell _{1}\) method seems to under-perform. To overcome this drawback, we suggest replacing the \(\ell _{1}\) norm used in the \(\ell _{1}\)\(\ell _{1}\) method with the \(\ell _{p}\) quasi-norm with \(0<p<1\) or with the \(\ell _{1-2}\) metric, which directly leads to the following two models

$$\begin{aligned} \min _{{\varvec{x}}\in {\mathbb {R}}^{n}}~\Vert {\varvec{x}} \Vert _{p}+\beta \Vert {\varvec{x}}-{\varvec{w}}\Vert _1~~~\text {s.t.}~~~A{\varvec{x}}={\varvec{b}}, \end{aligned}$$

and

$$\begin{aligned} \min _{{\varvec{x}}\in {\mathbb {R}}^{n}}~\Vert {\varvec{x}} \Vert _{1}-\Vert {\varvec{x}} \Vert _{2}+\beta \Vert {\varvec{x}}-{\varvec{w}}\Vert _1~~~\text {s.t.}~~~A{\varvec{x}}={\varvec{b}}. \end{aligned}$$

On the other hand, considering that unknown noise widely exists in many real-world applications, it is necessary to extend the models (1.3) and (1.4) to more general settings. Inspired by the recent work [33], one may consider investigating a regularized version of (1.3), i.e.,

$$\begin{aligned} {\varvec{x}}^{\sharp }=\mathop {\arg \min }_{{\varvec{x}}\in {\mathbb {R}}^{n}}~\Vert {\varvec{x}} \Vert _{1}+\beta \Vert {\varvec{x}}-{\varvec{w}}\Vert _{1}+\theta \Vert A{\varvec{x}}-{\varvec{b}}\Vert _{2}^{2}, \end{aligned}$$

where \(\theta >0\) is another trade-off parameter. All the above considerations will be left for future work.
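As a starting point for such an extension, a minimal cvxpy sketch of this regularized formulation (the default values of \(\beta\) and \(\theta\) below are purely illustrative) could look like:

```python
import cvxpy as cp

def solve_regularized_l1_l1(A, b, w, beta=1e5, theta=1.0):
    """Unconstrained (regularized) version of model (1.3): the equality
    constraint Ax = b is replaced by the data-fidelity term theta*||Ax - b||_2^2."""
    x = cp.Variable(A.shape[1])
    obj = cp.Minimize(cp.norm(x, 1)
                      + beta * cp.norm(x - w, 1)
                      + theta * cp.sum_squares(A @ x - b))
    cp.Problem(obj).solve()
    return x.value
```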