1 Introduction

Compressed sensing (CS) [1–4] is a novel data acquisition scheme that samples a signal at a sub-Nyquist rate, which allows simultaneous data acquisition and compression. The original signal can be faithfully recovered from the measurement samples if it is sparse with respect to a particular basis and sampled via a random projection. With efficient measurement and stable reconstruction, the CS technique has been of interest in a variety of research fields, e.g., communications [5–7], sensor networks [8–10], image processing [11–13], and radar [14].

Recently, a great deal of attention has been paid to the CS technique for data confidentiality in the information security field. A CS-based cryptosystem encrypts a plaintext through a CS measurement process by keeping the sensing matrix secret. The ciphertext can then be decrypted by a CS reconstruction process. Thus, the CS-based cryptosystem performs simultaneous data acquisition and encryption at the physical layer. Such a lightweight cryptosystem is particularly attractive for secure communications in wireless sensor networks, where the resources are not sufficient for providing data confidentiality by conventional encryption.

The security potential of compressed sensing was hinted at by Candes and Tao [3], where the measurement samples were referred to as a weakly encrypted ciphertext. In [15], Rachlin and Baron proved that the CS-based cryptosystem cannot be perfectly secure but might be computationally secure. Orsdemir et al. [16] showed that it is computationally secure against a key search technique via an algebraic approach. Subsequently, many researchers have studied the security of CS-based cryptosystems for practical applications, which will be discussed in more detail in Section 2.3. For a comprehensive review of CS techniques in information security, readers are referred to [17].

In this paper, we study the computational security of a CS-based cryptosystem that encrypts a plaintext with a partial unitary sensing matrix embedding a secret keystream. The keystream to be embedded is obtained by a keystream generator of stream ciphers, which ensures fast and efficient generation of the keystream. Assuming that the keystream is part of the original one with an extremely long period, we renew it at each encryption, which leads to a one-time sensing (OTS) cryptosystem. Then, the initial seed (or state) of the original keystream generator is essentially the secret key of the CS-based cryptosystem. With the sensing matrix, we demonstrate that the CS-based cryptosystem theoretically guarantees a stable and robust CS decryption for a legitimate recipient.

For security analysis, we first use probability metrics to investigate the security in a statistical manner. The total variation (TV) distance [18] between probability distributions of ciphertexts conditioned on a pair of plaintexts is examined as a security measure for the indistinguishability [19] of our CS-based cryptosystem. We investigate the TV distance by developing upper bounds on the relative entropy [20] and the Hellinger distance [21], which demonstrates that our CS-based cryptosystem can be computationally secure in terms of the indistinguishability, as long as the keystream length for each encryption is sufficiently large with low compression \(\left (\frac {M}{N} \right)\) and sparsity \(\left (\frac {K}{N} \right)\) ratios.

Next, we analyze the security of our CS-based cryptosystem by examining the resistance against a cryptanalytic attack. We consider a potential chosen plaintext attack (CPA) from an adversary to recover the key of our CS-based cryptosystem. In the CPA, the adversary needs to restore a keystream embedded in CS encryption, which is nontrivial unlike in stream ciphers, since the keystream is not directly exposed by a known plaintext-ciphertext pair. Associated with the key recovery attack, we show that the security of our CS-based cryptosystem is based on the mathematical intractability of a constrained integer least-squares (ILS) problem. For a sub-optimal, but feasible key recovery attack, we consider a successive approximate maximum-likelihood (ML) detection (SAMD) for the adversary’s CPA and investigate the performance by developing an upper bound on the success probability. Finally, theoretical analysis and numerical results reveal that our CS-based cryptosystem can be secure against the key recovery attack through the SAMD.

This paper is organized as follows. Section 2 reviews the CS principle, discusses some known CS-based cryptosystems, and summarizes the contributions of this paper. In Section 3, we describe a mathematical model of the CS-based cryptosystem proposed by this paper. We discuss a theoretical guarantee of CS decryption for a legitimate recipient by the cryptosystem. In Section 4, we analyze the indistinguishability of our CS-based cryptosystem, to demonstrate the computational security. Section 5 introduces an adversary’s potential CPA strategy for key recovery, where we describe the details and examine the performance of SAMD. Section 6 presents numerical results to demonstrate the reliability and the security of our CS-based cryptosystem. Finally, concluding remarks will be given in Section 7.

2 Background

2.1 Notations

A matrix (or a vector) is represented by a boldface upper (or lower) case letter. \(\mathbf {U}^{T}\) and \(|\mathbf {U}|\) denote the transpose and the determinant of a matrix \(\mathbf {U}\), respectively. \(\text {tr}(\mathbf {U})\) denotes the trace of \(\mathbf {U}\), i.e., the sum of all diagonal entries of \(\mathbf {U}\). \(\mathbf {U}(k,t)\) is the entry of an M×N matrix \(\mathbf {U}\) in the kth row and the tth column, where 0≤k≤M−1 and 0≤t≤N−1. \(\mu (\mathbf {U})\) denotes the maximum magnitude of the entries of \(\mathbf {U}\), i.e., \(\mu (\mathbf {U}) = \underset {k, t}{\max } |\mathbf {U}(k,t)|\). \(\text {diag}(\mathbf {s})\) is a diagonal matrix whose diagonal entries are taken from a vector \(\mathbf {s}\). An identity matrix is denoted by \(\mathbf {I}\), where the dimension is determined by the context. \(\mathbf {W}\) is a conventional N×N Walsh-Hadamard matrix, where \(\mathbf {W} \mathbf {W}^{T} = \mathbf {W}^{T} \mathbf {W} = N \mathbf {I}\). Also, \(\mathbf {D}\) denotes a discrete cosine transform (DCT) matrix, where \(\mathbf {D} \mathbf {D}^{T} = \mathbf {D}^{T} \mathbf {D} = N \mathbf {I}\). For a vector \(\mathbf {x} = (x_{0},\cdots, x_{N-1})^{T} \in \mathbb {R}^{N}\), the \(l_{p}\)-norm of \(\mathbf {x}\) is denoted by \( || \mathbf {x} ||_{p} = \left (\sum _{k=0}^{N-1} |x_{k}|^{p} \right)^{\frac {1}{p}} \), where 1≤p<∞. If the context is clear, \(||\mathbf {x}||\) denotes the \(l_{2}\)-norm of \(\mathbf {x}\). A vector \(\mathbf {n} \sim {\mathcal {N}} \left (\mathbf {0}, \sigma ^{2} \mathbf {I}\right)\) is a Gaussian random vector with mean \(\mathbf {0} = (0, \cdots, 0)^{T}\) and covariance \(\sigma ^{2} \mathbf {I}\). Finally, \(\mathbb {E}[\!\cdot ]\) denotes the average of a random vector or a random matrix.

Table 1 summarizes the abbreviations of this paper.

Table 1 Abbreviations

2.2 Compressed sensing

Compressed sensing (CS) [1–3] aims to recover a sparse signal from measurements that appear to be incomplete. A signal \(\mathbf {x} \in \mathbb {R}^{N}\) is called K-sparse with respect to a sparsifying (orthonormal) basis Ψ if θ=Ψ x has at most K nonzero entries, where K≪N. The sparse signal x is linearly measured by \(\mathbf {r} = \boldsymbol {\Phi } \mathbf {x} + \mathbf {n} = \boldsymbol {\Phi } \boldsymbol {\Psi }^{T} \boldsymbol {\theta } + \mathbf {n} \in \mathbb {R}^{M} \), where Φ is an M×N measurement matrix with M≪N and \(\mathbf {n} \in \mathbb {R}^{M}\) is the measurement noise. The CS theory states that if the sensing matrix \(\mathbf {A} = \boldsymbol {\Phi } \boldsymbol {\Psi }^{T}\) obeys the restricted isometry property (RIP) [2], a stable and robust reconstruction of θ can be guaranteed from the incomplete measurement r. The CS reconstruction is accomplished by solving the \(l_{1}\)-minimization problem of

$$ \boldsymbol{\widehat{\theta}} = \underset{\boldsymbol{\theta}}{\text{argmin}} \ || {\boldsymbol{\theta}} ||_{1} \quad \text{subject to } \ ||\mathbf{A} {\boldsymbol{\theta}} - \mathbf{r}||_{2} \leq \epsilon $$

with convex optimization or greedy algorithms [4]. For simplicity, this paper assumes Ψ=I, i.e., x is sparse in the canonical basis, which yields the sensing matrix A=Φ.
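As a rough illustration of the reconstruction step, the following sketch recovers a K-sparse vector with orthogonal matching pursuit, one of the greedy alternatives to the \(l_{1}\)-minimization. It is not the solver used in this paper; the Gaussian sensing matrix, the dimensions, and the function name `omp` are assumptions made only for this example.

```python
import numpy as np

def omp(A, r, K):
    """Greedy K-sparse estimate of theta from r ~ A @ theta (orthogonal matching pursuit)."""
    N = A.shape[1]
    support = []
    residual = r.copy()
    for _ in range(K):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0   # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Least-squares fit restricted to the selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], r, rcond=None)
        residual = r - A[:, support] @ coef
    theta_hat = np.zeros(N)
    theta_hat[support] = coef
    return theta_hat

rng = np.random.default_rng(0)
M, N, K = 40, 128, 4                             # assumed toy dimensions
A = rng.standard_normal((M, N)) / np.sqrt(M)     # assumed Gaussian sensing matrix
theta = np.zeros(N)
theta[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
r = A @ theta                                    # noiseless measurements
theta_hat = omp(A, r, K)
print("reconstruction error:", np.linalg.norm(theta_hat - theta))
```

With noise, the loop would typically stop once the residual norm falls below the tolerance ε of the constraint instead of after exactly K iterations.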

2.3 Prior works on CS-based cryptosystems

Since the foundational works of [15] and [16], there have been many research efforts on CS-based cryptosystems. Bianchi, Bioglio, and Magli [22, 23] analyzed the security of a noiseless CS-based cryptosystem utilizing random Gaussian sensing matrices in an OTS manner. In [24], a similar analysis has been made for a noiseless CS-based cryptosystem having a circulant sensing matrix for efficient CS processes. Cambareri et al. [25] proposed a CS-based cryptosystem that supports multiclass encryption using a random Bernoulli matrix and its class-dependent variations. In spite of exploiting different security measures, i.e., indistinguishability [23] and asymptotic spherical security [25], the security analyses of [23] and [25] showed that the statistical properties of ciphertexts reveal only the information about the energy of the plaintexts. The security of the multiclass encryption scheme has been further investigated in [26] against a known plaintext attack (KPA), by examining the average number of candidate solutions matching a plaintext-ciphertext pair.

In addition to the secret sensing matrix, a CS-based cryptosystem may employ an extra cryptographic primitive, which can be considered as a product cipher. For instance, scrambling or random permutation has been additionally accomplished, before [27] or after [28] CS encryption. In [29], nonlinear diffusion has been added to quantized ciphertexts. Zhang et al. [30] proposed a bi-level protected CS (BLP-CS), where the sparsifying basis and the sensing matrix are generated with different secret keys. In the BLP-CS, the knowledge of both the sparsifying basis and the sensing matrix is required for CS decryption.

To gain resistance against KPA and CPA, a CS-based cryptosystem normally operates in an OTS manner, by renewing the sensing matrix at each encryption. As the renewal requires additional complexity and can quickly use up the cryptographic resources for generating each sensing matrix, a CS-based cryptosystem reusing the sensing matrix during multiple encryptions has also been of interest. However, it is insecure against KPA and CPA, since an adversary can easily recover the sensing matrix with N linearly independent plaintexts by solving the system of linear equations [15]. While reusing the same sensing matrix, the BLP-CS [30] attempted to overcome the weakness and to achieve CPA resistance by ensuring a RIPless reconstruction for an adversary.

CS-based cryptosystems can work in a framework of physical layer security [31]. The emerging technology of physical layer security is a promising paradigm for enhancing wireless security [32], by exploiting the randomness of wireless channel characteristics. In [33], Agrawal and Vishwanath derived sufficient conditions for secret communications via CS in a wiretap channel. Reeves et al. [34] investigated the secrecy capacity of a wiretap channel employing CS. Dautov and Tsouri [35] used the received signal strength indicator (RSSI) from wireless channels for secure key establishment in a CS-based cryptosystem, where the shared key can be used to form a common sensing matrix in a sender and a recipient. In practice, a variety of CS-based cryptosystems concerning the security and privacy of multimedia, imaging, and smart grid data have been suggested and studied in [36–39].

2.4 Summary of contributions

The main results of this paper are summarized in comparison with prior works. Our CS-based cryptosystem encrypts a plaintext with a partial unitary sensing matrix embedding a secret keystream, which is used only once for each encryption. Thus, it operates in an OTS manner, similar to those of [22–25], but different from the BLP-CS [30]. It can further reduce the consumption of the cryptographic resource by renewing only the keystream of length N, not replacing the entire M×N sensing matrix, at each encryption. Unlike the BLP-CS, our CS-based cryptosystem uses only a single cryptographic primitive, or the secret keystream, while keeping the sparsifying basis public. Furthermore, the secret keystream can be efficiently generated by a keystream generator of stream ciphers. Based on the RIP analysis, the knowledge of the sensing matrix, or equivalently the keystream, theoretically guarantees a reliable CS decryption.

In the security analysis, we obtain the results by two different approaches. On the one hand, we demonstrate the indistinguishability of our CS-based cryptosystem, by investigating the TV distance between probability distributions of a pair of ciphertexts. This statistical approach resembles the analysis of [23], but we use a new probability metric of the Hellinger distance [21] to characterize the TV distance. On the other hand, we consider a potential CPA from an adversary for key recovery of our CS-based cryptosystem. By formulating the CPA as an NP-hard problem, we show that the success of the CPA is computationally infeasible for a sufficiently large keystream length. In addition, we introduce a sub-optimal but feasible CPA strategy and investigate the performance with the highest possible success probability. Finally, the CPA performance turns out to be quite poor even under an optimistic scenario, which guarantees the security against the CPA for our CS-based cryptosystem. The second type of security analysis is new in this paper.

3 Mathematical model

3.1 CS encryption with a partial unitary sensing matrix

A CS-based cryptosystem encrypts a sparse plaintext \(\mathbf {x} \in \mathbb {R}^{N}\) through the CS measurement process with a sensing matrix \(\boldsymbol {\Phi } \in \mathbb {R}^{M \times N}\), which produces the ciphertext \(\mathbf {r} = \boldsymbol {\Phi } \mathbf {x} + \mathbf {n} \in \mathbb {R}^{M}\), where \(\mathbf {n} \sim \mathcal {N}\left (\mathbf {0}, \sigma ^{2} \mathbf {I}\right)\) is a measurement noise. This paper proposes a CS-based cryptosystem that employs a partial unitary sensing matrix Φ embedding a secret keystream, as defined in Definition 1.

Definition 1

The sensing matrix\(^{1}\) of our CS-based cryptosystem is defined by

$$ \boldsymbol{\Phi} = \frac{1}{\sqrt{M}} \mathbf{R}_{\Omega} \mathbf{U} = \frac{1}{\sqrt{MN}} \mathbf{R}_{\Omega} \mathbf{U}_{1} \text{diag}(\mathbf{s}) \mathbf{U}_{2}. $$
(1)

In (1), \(\mathbf {R}_{\Omega }\) is a public random subsampling operator that selects M rows out of N uniformly at random, where the selected indices are specified by \(\Omega = \{\omega _{0}, \cdots, \omega _{M-1}\}\). Also, \(\mathbf {U}_{i} \in {\mathbb {R}}^{N \times N} \) is a unitary matrix up to scaling, i.e., \(\mathbf {U}_{i}^{T} \mathbf {U}_{i} = \mathbf {U}_{i} \mathbf {U}_{i}^{T} = N \mathbf {I}\), for i=1 and 2. In particular, each entry of \(\mathbf {U}_{1}\) has unit magnitude, i.e., \(|\mathbf {U}_{1}(k,t)| = 1\) for all 0≤k,t≤N−1. Finally, \(\mathbf {U} = \frac {1}{\sqrt {N}} \mathbf {U}_{1} \text {diag}(\mathbf {s}) \mathbf {U}_{2} \) is also unitary in the same sense for \(\mathbf {s} \in \{-1, +1\}^{N}\), where s is a secret keystream to be embedded in Φ for each CS encryption.
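The construction of (1) can be sketched numerically as follows. This is a minimal illustration under stated assumptions: both unitary factors are taken to be a Sylvester-type Walsh-Hadamard matrix, the keystream is drawn from a pseudorandom generator rather than a stream cipher, and the helper names `sylvester_hadamard` and `sensing_matrix` are hypothetical.

```python
import numpy as np

def sylvester_hadamard(N):
    """N x N matrix with +-1 entries and W @ W.T = N * I (N a power of two)."""
    W = np.array([[1]])
    while W.shape[0] < N:
        W = np.kron(np.array([[1, 1], [1, -1]]), W)
    return W

def sensing_matrix(U1, U2, s, omega):
    """Phi = (1/sqrt(M N)) R_Omega U1 diag(s) U2, following (1)."""
    N = U1.shape[0]
    M = len(omega)
    U = (U1 * s) @ U2                      # U1 @ diag(s) @ U2 via column scaling
    return U[omega, :] / np.sqrt(M * N)    # R_Omega keeps the rows indexed by omega

rng = np.random.default_rng(1)
N, M = 16, 6                                             # assumed toy dimensions
W = sylvester_hadamard(N)
s = rng.choice([-1, 1], size=N)                          # stand-in for the secret keystream
omega = np.sort(rng.choice(N, size=M, replace=False))    # public row subset
Phi = sensing_matrix(W, W, s, omega)
print(Phi.shape)
```

Because \(\mathbf {U}_{i} \mathbf {U}_{i}^{T} = N \mathbf {I}\) and \(\text {diag}(\mathbf {s})^{2} = \mathbf {I}\), the rows of the resulting Φ are mutually orthogonal with \(\boldsymbol {\Phi } \boldsymbol {\Phi }^{T} = \frac {N}{M} \mathbf {I}\), whatever keystream is embedded.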

In this paper, we use \(\mathbf {U}_{1} = \mathbf {H}\), an N×N Hadamard matrix constructed from a binary m-sequence [40] of period \(N-1 = 2^{n}-1\) for a positive integer n, i.e., \(\mathbf {d} = \left (d_{0}, \cdots, d_{2^{n}-2} \right)\), where \(d_{k} \in \{0,1\}\). For 0≤k,t≤N−1, each entry of H is given by

$$\mathbf{H}(k, t) = \left\{ \begin{array}{ll} 1, & \quad \text{if}\ k=0\ \text{or}\ t=0, \\ (-1)^{d_{k+t-2}}, & \quad \text{otherwise,} \end{array} \right. $$

where the index k+t−2 is computed modulo \(2^{n}-1\). From the structure, H is symmetric, i.e., \(\mathbf {H}^{T} = \mathbf {H}\). As d has the ideal two-level autocorrelation [40], i.e.,

$$\sum\limits_{k = 0}^{2^{n}-2} (-1)^{d_{k} + d_{k + \tau}} = \left\{ \begin{array}{ll} 2^{n}-1, & \quad \text{if } \tau = 0, \\ -1, & \quad \text{if } 1 \leq \tau \leq 2^{n}-2, \end{array} \right. $$

where k+τ is computed modulo \(2^{n}-1\), it follows that \(\mathbf {H} \mathbf {H}^{T} = \mathbf {H}^{T} \mathbf {H} = N \mathbf {I}\). Since H is public, the structure and the initial state of an n-stage linear feedback shift register (LFSR) generating the binary m-sequence d are publicly known.
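The construction of H can be checked on a small case. The sketch below assumes n=4 and the LFSR recurrence \(d_{k+4} = d_{k+1} \oplus d_{k}\) (polynomial \(x^{4}+x+1\)) with initial state (1,0,0,0); these parameters and the function names are illustrative choices, not the ones fixed by this paper.

```python
import numpy as np

def m_sequence(n, taps, state):
    """One period (2**n - 1 bits) of an LFSR sequence with d[k+n] = XOR of d[k+j], j in taps."""
    d = list(state)
    for k in range(2**n - 1 - n):
        d.append(sum(d[k + j] for j in taps) % 2)
    return np.array(d)

def hadamard_from_m_sequence(d):
    """H(k,t) = 1 if k == 0 or t == 0, else (-1)**d[(k+t-2) mod (N-1)]."""
    N = len(d) + 1
    H = np.ones((N, N), dtype=int)
    for k in range(1, N):
        for t in range(1, N):
            H[k, t] = (-1) ** int(d[(k + t - 2) % (N - 1)])
    return H

# Assumed example: n = 4, recurrence d[k+4] = d[k+1] XOR d[k], period 15.
d = m_sequence(4, taps=(0, 1), state=(1, 0, 0, 0))
H = hadamard_from_m_sequence(d)
N = H.shape[0]
print("H symmetric:", np.array_equal(H, H.T))
print("H H^T = N I:", np.array_equal(H @ H.T, N * np.eye(N, dtype=int)))
```

The orthogonality of distinct rows follows from the balance of d (inner product with the all-one row) and from the off-peak autocorrelation value −1, which cancels the leading product 1·1.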

3.2 Keystream generation for CS encryption

In the sensing matrix Φ of (1), we assume that s is a segment of length N from an original keystream of an extremely long period, which makes it possible to renew the keystream s at each CS encryption. For fast and efficient keystream generation, one may employ an LFSR-based nonlinear keystream generator of stream ciphers. For example, we may consider the combinatorial sequence generator [41], the filtering sequence generator [42], the clock-controlled generator [43, 44], the shrinking generator [45], and the self-shrinking generator (SSG) [46], each of which presents a simple structure but a remarkable resistance against various attacks. For more details on keystream generators and stream ciphers, see [47] and [48]. Regarding the keystream of our CS-based cryptosystem, we make the following assumption.

Assumption 1

An original keystream from a stream cipher is designed to have nice pseudorandomness properties [40] such as balance, large period, low autocorrelation, and large linear complexity. With the properties, we assume that each element of the keystream s takes +1 or −1 independently and uniformly at random, which facilitates the security analysis of our CS-based cryptosystem.

When we employ a keystream generator to produce the keystream s, the initial seed (or state) of the generator is essentially the key of our CS-based cryptosystem. The key should be kept secret between a sender and a legitimate recipient, whereas the structure of the keystream generator can be publicly known. For secure key exchange, we may establish a separate secure channel, or use the key establishment via the RSSI from wireless channels as in [35].
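To make the role of the key concrete, the following sketch produces a ±1 keystream segment with a self-shrinking generator: the LFSR output is read in pairs, and the second bit of a pair is emitted only when the first bit is 1. The 4-stage register, its taps, and the function names are toy assumptions for illustration; a practical generator would use a far longer register, and the initial state `key_state` plays the role of the secret key.

```python
import numpy as np

def lfsr_bits(taps, state):
    """Endless LFSR bit stream with a[k+n] = XOR of a[k+j] for j in taps."""
    reg = list(state)
    while True:
        out = reg[0]
        feedback = sum(reg[j] for j in taps) % 2
        reg = reg[1:] + [feedback]
        yield out

def self_shrinking_keystream(taps, key_state, N):
    """First N symbols of a self-shrinking generator, mapped from {0,1} to {+1,-1}."""
    bits = lfsr_bits(taps, key_state)
    out = []
    while len(out) < N:
        first, second = next(bits), next(bits)
        if first == 1:                    # keep the second bit of a (1, b) pair
            out.append(1 - 2 * second)    # 0 -> +1, 1 -> -1
    return np.array(out)

# Toy parameters only: a 4-stage register is far too short for real use.
s = self_shrinking_keystream(taps=(0, 1), key_state=(1, 0, 0, 0), N=16)
print(s)
```

In an OTS operation, successive encryptions would consume successive length-N segments of the same stream rather than restarting from the key.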

3.3 CS decryption

For CS decryption, a noisy ciphertext \(\mathbf {r} = \boldsymbol {\Phi } \mathbf {x} + \mathbf {n} \in {\mathbb {R}}^{M} \) is available for an adversary as well as a legitimate recipient, where \(\mathbf {n} \sim {\mathcal {N}}\left (\mathbf {0}, \sigma ^{2} \mathbf {I}\right)\) is a measurement noise. A legitimate recipient of the ciphertext r, who knows Φ, attempts to recover the plaintext x by conducting a CS reconstruction. Meanwhile, an adversary will make various attempts to recover the plaintext x or the keystream s, with no knowledge of Φ.

Proposition 1 presents the reliability and the stability of our CS-based cryptosystem for a legitimate recipient, which is from the RIP result [49, 50] of a partial unitary sensing matrix.

Proposition 1

[49, 50] For a legitimate recipient, our CS-based cryptosystem theoretically guarantees a stable decryption of a K-sparse plaintext with bounded errors, as long as \(M = \mathcal {O}\left (\mu ^{2}(\mathbf {U}) \cdot K\log ^{4}N \right)\).

When \(\mathbf {U}_{1} = \mathbf {H}\), numerical experiments revealed that \(\mu (\mathbf {U}) = {\mathcal {O}}\left (\sqrt {\log N}\right) \) for (i) \(\mathbf {U}_{2} = \mathbf {W}\) or (ii) \(\mathbf {U}_{2} = \mathbf {D}\), if each entry of the keystream s takes +1 or −1 uniformly at random. In this case, if \(M = \mathcal {O}\left (K\log ^{5}N\right)\), Proposition 1 guarantees a stable decryption.

Table 2 summarizes a symmetric-key CS-based cryptosystem proposed in this paper.

Table 2 Symmetric-key CS-based cryptosystem

4 Security analysis

A CS-based cryptosystem cannot be perfectly secure [15] but is believed to be computationally secure [15, 16]. In this section, we analyze the computational security of our CS-based cryptosystem by studying the notion of indistinguishability [19].

Assume that a cryptosystem produces a ciphertext by encrypting one of two possible plaintexts. The cryptosystem is said to have the indistinguishability, if no adversary can determine in polynomial time which of the two plaintexts corresponds to the ciphertext, with probability significantly better than that of a random guess [51]. In short, if a cryptosystem has the indistinguishability, an adversary is unable to learn any partial information of the plaintext in polynomial time from a given ciphertext.

Specifically, let us consider an indistinguishability experiment [51] restricted to K-sparse plaintexts. First, an adversary creates a pair of plaintexts \(\mathbf {x}_{1}\) and \(\mathbf {x}_{2}\), each with at most K nonzero entries. Then, our CS-based cryptosystem produces a ciphertext \(\mathbf {r} = \boldsymbol {\Phi } \mathbf {x}_{h} + \mathbf {n}\) by randomly selecting h, where h=1 or 2. Given r, the adversary attempts to figure out which plaintext, \(\mathbf {x}_{1}\) or \(\mathbf {x}_{2}\), was encrypted, by carrying out a polynomial time test \({\mathcal {D}}: \mathbf {r} \rightarrow h \in \{1, 2\}\).

In this paper, we make use of the total variation (TV) distance [18] to evaluate the performance of the indistinguishability experiment. Let \(d_{\text {TV}}(p_{1}, p_{2})\) be the TV distance between the probability distributions \(p_{1} = \Pr (\mathbf {r} | \mathbf {x}_{1})\) and \(p_{2} = \Pr (\mathbf {r} | \mathbf {x}_{2})\). Then, it is readily checked from [52] that the probability that an adversary can successfully distinguish the plaintexts by any binary hypothesis test \({\mathcal {D}}\) is bounded by

$$ p_{d} \leq \frac{1}{2} + \frac{d_{\text{TV}} (p_{1}, p_{2}) }{2}. $$
(2)

Therefore, if \(d_{\text {TV}}(p_{1}, p_{2})\) approaches zero, the probability of success will be at most that of a random guess, which leads to the indistinguishability of a cryptosystem. Consequently, one can argue that a cryptosystem with \(d_{\text {TV}}(p_{1}, p_{2})\) closer to zero would be more secure in terms of the indistinguishability. Since computing \(d_{\text {TV}}(p_{1}, p_{2})\) directly is difficult [53], we compute two probability metrics instead to bound the TV distance, which ultimately examines the indistinguishability of our CS-based cryptosystem.

4.1 Relative entropy

In [23] and [24], the relative entropy (or the Kullback-Leibler divergence [20]) has been used to quantify the indistinguishability. Precisely, the relative entropy of two probability distributions gives an upper bound on the TV distance by Pinsker’s inequality [54] or the refinements [55], which ultimately bounds the success probability of the indistinguishability experiment by (2).

In (1), one may assume that the entries of Φ are asymptotically Gaussian for a sufficiently large N, since each one can be seen as the sum of independent random variables weighted by each entry of s. Along with the Gaussian noise n, we assume that r, conditioned on \(\mathbf {x}_{1}\) (or \(\mathbf {x}_{2}\)), is a jointly Gaussian random vector. Also, \( {\mathbb {E}}[\boldsymbol {\Phi }] = \frac {1}{\sqrt {MN}} \mathbf {R}_{\Omega } \mathbf {U}_{1} \cdot {\mathbb {E}} [\!\text {diag}(\mathbf {s})] \cdot \mathbf {U}_{2} = \mathbf {0}\) for a given \(\mathbf {R}_{\Omega }\), as each entry of s takes ±1 with probability 1/2 under Assumption 1. Thus, \( {\mathbb {E}}\left [\!\mathbf {r} | \mathbf {x}_{h}\right ] = {\mathbb {E}}[\boldsymbol {\Phi }] \cdot \mathbf {x}_{h} + {\mathbb {E}}[\!\mathbf {n}] = \mathbf {0}\). With the Gaussian random vector r, the relative entropy between \(p_{1} = \Pr (\mathbf {r} | \mathbf {x}_{1})\) and \(p_{2} = \Pr (\mathbf {r} | \mathbf {x}_{2})\) has the following closed-form expression [56]

$$ D\left(p_{1} || p_{2}\right) = \frac{1}{2} \left[ \log \frac{|\mathbf{C}_{2}|}{|\mathbf{C}_{1}|} + \text{tr} \left(\mathbf{C}_{2}^{-1} \mathbf{C}_{1}\right) - M \right], $$
(3)

where \(\mathbf {C}_{1}\) and \(\mathbf {C}_{2}\) are the covariance matrices of r conditioned on \(\mathbf {x}_{1}\) and \(\mathbf {x}_{2}\), respectively. By measuring the relative entropy by (3), we obtain an upper bound on the TV distance, i.e.,

$$ d_{\text{TV}} (p_{1}, p_{2}) \leq \min \left(\sqrt{\frac{D(p_{1} || p_{2})}{2}}, \ 1 \right) $$
(4)

by Pinsker’s inequality. In (4), the upper bound is capped at 1, since \(d_{\text {TV}}(p_{1}, p_{2}) \in [0,1]\).
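The chain from (3) through (4) to (2) can be evaluated numerically once the two covariance matrices are available. The sketch below assumes zero-mean Gaussian ciphertexts as in the text; the function names and the toy covariances \(\mathbf {C}_{1} = \mathbf {I}\), \(\mathbf {C}_{2} = 2 \mathbf {I}\) are illustrative assumptions, not values from this paper.

```python
import numpy as np

def gaussian_kl(C1, C2):
    """Relative entropy D(p1 || p2) of zero-mean Gaussians with covariances C1, C2, as in (3)."""
    M = C1.shape[0]
    _, logdet1 = np.linalg.slogdet(C1)
    _, logdet2 = np.linalg.slogdet(C2)
    trace_term = np.trace(np.linalg.solve(C2, C1))   # tr(C2^{-1} C1)
    return 0.5 * (logdet2 - logdet1 + trace_term - M)

def tv_upper_bound_pinsker(kl):
    """Bound (4): d_TV <= min(sqrt(D/2), 1)."""
    return min(np.sqrt(max(kl, 0.0) / 2.0), 1.0)

def success_probability_bound(d_tv):
    """Bound (2): p_d <= 1/2 + d_TV/2."""
    return 0.5 + 0.5 * d_tv

C1 = np.eye(2)           # assumed toy covariance under x_1
C2 = 2.0 * np.eye(2)     # assumed toy covariance under x_2
kl = gaussian_kl(C1, C2)
tv = tv_upper_bound_pinsker(kl)
print(f"D = {kl:.4f}, d_TV <= {tv:.4f}, p_d <= {success_probability_bound(tv):.4f}")
```

Working with log-determinants avoids overflow or underflow of \(|\mathbf {C}_{h}|\) when M is large.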

In what follows, we present an upper bound on the relative entropy with some constraints on plaintexts, which subsequently yields an analytic upper bound on the maximum TV distance by (4).

Theorem 1

In our CS-based cryptosystem, assume that each plaintext x has at most K nonzero entries with the constant energy \(\mathcal {E}_{x} = || \mathbf {x} ||^{2}\). Then, the relative entropy of (3) is bounded by

$$ D(p_{1} || p_{2}) \leq \frac{M}{2} \left( K \mu^{2} (\mathbf{U}_{2}) \cdot \text{PNR} - \log \left(K \mu^{2} (\mathbf{U}_{2}) \cdot \text{PNR} + 1 \right) \right), $$
(5)

where \(\text {PNR} = \frac {\mathcal {E}_{x}}{M \sigma ^{2}}\) is the plaintext-to-noise power ratio (PNR).

Proof

See the Appendices. □

In Theorem 1, \(\mu (\mathbf {U}_{2}) = 1\) if \(\mathbf {U}_{2} = \mathbf {W}\), while \(\mu (\mathbf {U}_{2}) = \sqrt {2}\) if \(\mathbf {U}_{2} = \mathbf {D}\). However, if \(\mathbf {U}_{2} = \sqrt {N} \mathbf {I}\), then \(\mu (\mathbf {U}_{2}) = \sqrt {N}\) and the upper bound grows with N. Thus, Theorem 1 implies that one must not use \(\mathbf {U}_{2} = \sqrt {N} \mathbf {I}\), to achieve the indistinguishability of our CS-based cryptosystem.
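The dependence of the bound (5) on \(\mu (\mathbf {U}_{2})\) and the PNR can be tabulated directly. The sketch below uses \(\mu ^{2}(\mathbf {U}_{2}) = 1\) and 2 for the two transforms named above; the values of M, K, and the PNR grid are assumptions chosen only for illustration.

```python
import numpy as np

def relative_entropy_bound(M, K, mu2, pnr):
    """Right-hand side of (5) with x = K * mu(U2)^2 * PNR."""
    x = K * mu2 * pnr
    return 0.5 * M * (x - np.log1p(x))

M, K = 64, 8   # assumed ciphertext length and sparsity
for name, mu2 in [("U2 = W", 1.0), ("U2 = D", 2.0)]:
    for pnr_db in (-20, -10, 0):
        pnr = 10 ** (pnr_db / 10)
        bound = relative_entropy_bound(M, K, mu2, pnr)
        tv = min(np.sqrt(bound / 2), 1.0)   # Pinsker bound on the TV distance
        print(f"{name}, PNR = {pnr_db:>4} dB: D <= {bound:.4f}, d_TV <= {tv:.4f}")
```

Since \(x - \log (1+x)\) is increasing in x>0, a larger \(\mu (\mathbf {U}_{2})\), K, or PNR always loosens the bound, which is the reason a transform with \(\mu (\mathbf {U}_{2}) = \sqrt {N}\) is ruled out.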

To ensure a reliable CS decryption for a legitimate recipient, our CS-based cryptosystem can set \(K = {\mathcal {O}} \left (\frac {M}{ \mu ^{2} (\mathbf {U}) \log N} \right)\) for nonuniform CS recovery [57], which yields the following corollary.

Corollary 1

In our CS-based cryptosystem with \(\mathbf {U}_{1} = \mathbf {H}\) and \(N = 2^{n}\), assume \(\mathbf {U}_{2} = \mathbf {W}\) or \(\mathbf {D}\), where \(\mu (\mathbf {U})={\mathcal {O}}(\sqrt {\log N})\). In Theorem 1, if \(K \leq \frac {c M}{n^{2}} \) with a constant c, then

$$ \begin{aligned} D(p_{1} || p_{2}) & \leq \frac{M}{2} \left(\frac{c M \mu^{2} (\mathbf{U}_{2})}{n^{2}} \cdot \text{PNR}\right. \\ &\quad - \log \left.\left(\frac{c M \mu^{2} (\mathbf{U}_{2})}{n^{2}} \cdot \text{PNR} + 1 \right) \right). \end{aligned} $$

Thus, if the keystream length N is sufficiently large with given M and PNR, our CS-based cryptosystem will have low relative entropy, which contributes to the indistinguishability against an adversary, while guaranteeing the reliability for a legitimate recipient.

4.2 Hellinger distance

To bound the TV distance, we may use another probability metric, the Hellinger distance [21]. In our CS-based cryptosystem, recall that the ciphertext r, conditioned on \(\mathbf {x}_{h}\), is assumed to be a jointly Gaussian random vector with zero mean and the covariance matrix \(\mathbf {C}_{h}\), where h=1 or 2. Then, the Hellinger distance for the multivariate Gaussian distributions \(p_{1}\) and \(p_{2}\) is given by [58, 59]

$$ d_{\mathrm{H}} (p_{1}, p_{2}) = \sqrt{1- \frac{|\mathbf{C}_{1}|^{\frac{1}{4}} |\mathbf{C}_{2}|^{\frac{1}{4}}}{|\mathbf{C}_{3}|^{\frac{1}{2}}} }, $$
(6)

where \(\mathbf {C}_{3} = \frac {\mathbf {C}_{1} + \mathbf {C}_{2}}{2}\). The Hellinger distance is particularly useful by giving both upper and lower bounds on the TV distance [60], i.e.,

$$ {}d_{\mathrm{H}}^{2} (p_{1}, p_{2}) \leq d_{\text{TV}} (p_{1}, p_{2}) \leq d_{\mathrm{H}}(p_{1}, p_{2}) \sqrt{2 - d_{\mathrm{H}}^{2} (p_{1}, p_{2})}. $$
(7)
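Equations (6) and (7) translate into a few lines of numerical code. The sketch below assumes zero-mean Gaussian distributions as in the text; the function names and the scalar toy covariances are illustrative assumptions.

```python
import numpy as np

def gaussian_hellinger(C1, C2):
    """Hellinger distance (6) between zero-mean Gaussians with covariances C1 and C2."""
    C3 = 0.5 * (C1 + C2)
    _, ld1 = np.linalg.slogdet(C1)
    _, ld2 = np.linalg.slogdet(C2)
    _, ld3 = np.linalg.slogdet(C3)
    # |C1|^{1/4} |C2|^{1/4} / |C3|^{1/2}, evaluated in the log domain
    affinity = np.exp(0.25 * ld1 + 0.25 * ld2 - 0.5 * ld3)
    return np.sqrt(max(1.0 - affinity, 0.0))

def tv_bounds_from_hellinger(d_h):
    """Lower and upper bounds (7) on the TV distance."""
    return d_h**2, d_h * np.sqrt(2.0 - d_h**2)

C1 = np.array([[1.0]])   # assumed toy covariance under x_1
C2 = np.array([[2.0]])   # assumed toy covariance under x_2
d_h = gaussian_hellinger(C1, C2)
lower, upper = tv_bounds_from_hellinger(d_h)
print(f"d_H = {d_h:.4f}, {lower:.4f} <= d_TV <= {upper:.4f}")
```

Unlike the Pinsker bound, the upper bound in (7) never exceeds 1, so no explicit cap is needed.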

In what follows, we present an upper bound on the Hellinger distance of (6), which leads to an analytic upper bound on the maximum TV distance by (7).

Theorem 2

Recall the assumptions and definitions of Theorem 1. In our CS-based cryptosystem, the Hellinger distance of (6) is bounded by

$$ d_{\mathrm{H}} (p_{1}, p_{2}) \leq \sqrt{1 - \left(\frac{2\sqrt{K \mu^{2} (\mathbf{U}_{2}) \cdot \text{PNR} + 1}}{K \mu^{2} (\mathbf{U}_{2})\cdot \text{PNR} + 2} \right)^{\frac{M}{4} }}, $$
(8)

where \(\text {PNR} = \frac {\mathcal {E}_{x}}{M \sigma ^{2}}\).

Proof

See the Appendices. □

Corollary 2

In our CS-based cryptosystem with \(\mathbf {U}_{1} = \mathbf {H}\) and \(N = 2^{n}\), assume \(\mathbf {U}_{2} = \mathbf {W}\) or \(\mathbf {D}\), where \(\mu (\mathbf {U}) = {\mathcal {O}}\left (\sqrt {\log N}\right)\). In Theorem 2, if \(K \leq \frac {c M}{n^{2}}\) with a constant c, then

$$ d_{\mathrm{H}} (p_{1}, p_{2}) \leq \sqrt{1 - \left(\frac{2n \sqrt{c M \mu^{2} (\mathbf{U}_{2}) \cdot \text{PNR} + n^{2}}} {c M \mu^{2} (\mathbf{U}_{2}) \cdot \text{PNR} + 2n^{2}} \right)^{\frac{M}{4} }}. $$

Thus, if the keystream length N is sufficiently large with given M and PNR, our CS-based cryptosystem will have low Hellinger distance, which contributes to the indistinguishability against an adversary, while guaranteeing the reliability for a legitimate recipient.
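The behavior described by Theorem 2 and Corollary 2 can be explored with a short numerical sketch. The constant c=1, the values of M and PNR, and the rounding of K to an integer are assumptions made for illustration only.

```python
import numpy as np

def hellinger_bound(M, K, mu2, pnr):
    """Right-hand side of (8) with x = K * mu(U2)^2 * PNR."""
    x = K * mu2 * pnr
    ratio = 2.0 * np.sqrt(x + 1.0) / (x + 2.0)       # lies in (0, 1]
    return np.sqrt(max(1.0 - ratio ** (M / 4.0), 0.0))

def tv_upper_from_hellinger(d_h):
    """Upper bound in (7)."""
    return d_h * np.sqrt(2.0 - d_h**2)

def corollary_sparsity(M, n, c=1.0):
    """Largest integer K allowed by K <= c M / n^2 (c is an assumed constant)."""
    return max(int(c * M / n**2), 1)

M, pnr, mu2 = 256, 0.1, 1.0    # assumed values; mu2 = 1 corresponds to U2 = W
for n in (8, 10, 12):
    K = corollary_sparsity(M, n)
    d_h = hellinger_bound(M, K, mu2, pnr)
    print(f"n = {n:2d}, K = {K}: d_H <= {d_h:.4f}, d_TV <= {tv_upper_from_hellinger(d_h):.4f}")
```

With M fixed, the admissible K shrinks as n grows, and both bounds decrease accordingly.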

Remark 1

Theorems 1 and 2 suggest that the relative entropy and the Hellinger distance will approach zero as PNR decreases. Accordingly, our CS-based cryptosystem will have low TV distance by (4) and (7) at low PNR. Similarly, the TV distance will be low when M and K are small. Consequently, our CS-based cryptosystem can be indistinguishable at low PNR for small M and K.

Remark 2

When \(N = 2^{n}\) increases, Corollaries 1 and 2 suggest that if M is fixed, the relative entropy and the Hellinger distance will decrease at a given PNR by reducing \(K = {\mathcal {O}} \left (\frac {M}{n^{2}} \right)\), which will be confirmed by numerical results of Section 6. On the other hand, if M increases with \(M = {\mathcal {O}} \left (K n^{2}\right) \) for a given K, numerical results reveal that they also decrease over N at a given PNR, a trend that the upper bounds of Theorems 1 and 2 do not capture. This observation implies that there is room to improve the bounds of the theorems. Combined with Remark 1, the TV distance will be low if the keystream length N is sufficiently large with low compression \(\left (\frac {M}{N} \right)\) and sparsity \(\left (\frac {K}{N} \right)\) ratios, which leads to the asymptotic indistinguishability of our CS-based cryptosystem.

5 Potential key recovery attack

In this section, we consider a potential key recovery attack in which an adversary attempts to recover the key of our CS-based cryptosystem. In the CPA, the adversary tries to restore a keystream from a ciphertext (stage 1) and then to recover the original key from the restored keystream via algebraic cryptanalysis (stage 2). With a sufficiently long key, we assume that the number of keystream bits required for the algebraic cryptanalysis, denoted by D, is much larger than the ciphertext length M. For convenience of analysis, we assume D=N, which means that the adversary needs to restore a keystream of full length N from stage 1. Figure 1 illustrates the potential CPA from an adversary for key recovery. This section discusses the adversary’s strategy for keystream recovery in stage 1. Once a keystream is successfully restored through stage 1, a known cryptanalysis [47, 48] can be carried out in stage 2 for key recovery, which will not be discussed in this paper.

Fig. 1 An adversary’s chosen plaintext attack for key recovery against our CS-based cryptosystem

5.1 Mathematical intractability of keystream recovery

In stage 1 of the CPA, an adversary needs to observe a correct N-bit keystream from a ciphertext that has been encrypted by a chosen plaintext. We assume that the adversary will choose a plaintext x such that each entry of \(\widehat {\mathbf {x}} = \mathbf {U}_{2} \mathbf {x} \) is nonzero for a unitary matrix \(\mathbf {U}_{2}\). Then, the corresponding ciphertext is given by

$$ \begin{aligned} \mathbf{r} = \boldsymbol{\Phi} \mathbf{x} + \mathbf{n} & = \frac{1}{\sqrt{MN}} \mathbf{R}_{\Omega} \mathbf{U}_{1} \text{diag}(\mathbf{s}) \mathbf{U}_{2} \mathbf{x} + \mathbf{n}\\ & = \frac{1}{\sqrt{MN}} \mathbf{R}_{\Omega} \mathbf{U}_{1} \text{diag}(\widehat{\mathbf{x}}) \mathbf{s} + \mathbf{n}\\ & = \mathbf{A} \mathbf{s} +\mathbf{n}, \end{aligned} $$
(9)

where \(\mathbf {A} = \frac {1}{\sqrt {MN}} \mathbf {R}_{\Omega } \mathbf {U}_{1} \text {diag}(\widehat {\mathbf {x}})\). Unlike in stream ciphers, restoring the keystream s from the known plaintext-ciphertext pair is not a trivial task, since s is hidden under compression in r.
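The rewriting in (9) rests on the identity \(\text {diag}(\mathbf {s}) \widehat {\mathbf {x}} = \text {diag}(\widehat {\mathbf {x}}) \mathbf {s}\), which can be confirmed numerically. The matrices below are random placeholders (they are not unitary) and the dimensions are arbitrary, since the identity is purely algebraic; all names are chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 8, 3
U1 = rng.choice([-1.0, 1.0], size=(N, N))   # placeholder with unit-magnitude entries
U2 = rng.standard_normal((N, N))            # placeholder for the second transform
s = rng.choice([-1.0, 1.0], size=N)         # keystream
x = rng.standard_normal(N)                  # chosen plaintext
omega = np.sort(rng.choice(N, size=M, replace=False))

x_hat = U2 @ x
Phi = (U1[omega, :] * s) @ U2 / np.sqrt(M * N)   # (1/sqrt(MN)) R_Omega U1 diag(s) U2
A = U1[omega, :] * x_hat / np.sqrt(M * N)        # (1/sqrt(MN)) R_Omega U1 diag(x_hat)

print(np.allclose(Phi @ x, A @ s))
```

From the adversary's viewpoint, A is fully known for a chosen plaintext, so the unknown keystream enters the ciphertext linearly through only M noisy equations in N binary unknowns.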

From the ciphertext r of (9), an adversary needs to find a most likely keystream, which is equivalent to a maximum-likelihood (ML) estimate of

$$ \widehat{\mathbf{s}} = \underset{\mathbf{s} \in \{-1, +1\}^{N}}{\text{argmin}} || \mathbf{r} - \mathbf{A} \mathbf{s} ||^{2}. $$
(10)

Finding the ML solution of (10) is known as a constrained integer least-squares (ILS) problem, which is also called a closest vector problem (CVP) [61] in lattices. For a general A, the constrained ILS problem is proven to be NP-hard [62].

To find a most likely keystream of (10), an exhaustive ML search requires a complexity of \({\mathcal {O}}\left (2^{N}\right)\), which would be computationally infeasible if the keystream length N is sufficiently large. Alternatively, the generalized sphere decoding (GSD) algorithms [63–65] can find an ML solution to the ILS problem of the underdetermined system with M<N. However, as their complexity is exponential in N−M [63–65], the GSD is not applicable to the ILS problem with M≪N. To the best of our knowledge, there is no polynomial-time algorithm to find an ML solution of (10) with M≪N for a sufficiently large N.
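The exponential cost of the exhaustive search is easy to see on a toy instance. The sketch below enumerates all \(2^{N}\) sign vectors for N=10; the random matrix standing in for A of (9), the noiseless ciphertext, and the function name are assumptions for illustration.

```python
import itertools
import numpy as np

def exhaustive_ml(A, r):
    """Brute-force solution of (10): try all 2**N sign vectors."""
    N = A.shape[1]
    best_s, best_cost, tried = None, np.inf, 0
    for signs in itertools.product((-1.0, 1.0), repeat=N):
        s = np.array(signs)
        cost = np.sum((r - A @ s) ** 2)
        tried += 1
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s, best_cost, tried

rng = np.random.default_rng(3)
N, M = 10, 4                                        # toy sizes; real N is far larger
A = rng.standard_normal((M, N)) / np.sqrt(M * N)    # stand-in for A in (9)
s_true = rng.choice([-1.0, 1.0], size=N)
r = A @ s_true                                      # noiseless ciphertext
s_hat, cost, tried = exhaustive_ml(A, r)
print(f"candidates tried: {tried}, residual: {cost:.3e}")
```

Each additional keystream bit doubles the number of candidates, so the same loop is already out of reach for keystream lengths of a few hundred bits.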

In summary, the computational security of our CS-based cryptosystem against the key recovery attack stems from the mathematical hardness that no polynomial-time algorithm is known to find an ML solution to the underdetermined ILS problem. In fact, the mathematical intractability of the ILS problem has been exploited by public-key cryptosystems [66–68]. In our symmetric-key CS-based cryptosystem, it also ensures that if the keystream length N is sufficiently large with M≪N, no adversary will be able to find a most likely keystream of length N in polynomial time, which demonstrates the computational security of our CS-based cryptosystem against the key recovery attack.

5.2 Successive approximate maximum-likelihood detection (SAMD)

In Section 5.1, we demonstrated that the ML detection would be infeasible for keystream recovery, as long as the keystream length is sufficiently large. As an alternative, we consider a sub-optimal, but feasible keystream recovery process for the CPA. Instead of restoring an N-bit keystream at once, we assume that an adversary attempts to restore a disjoint J-bit segment\(^{2}\) of the keystream from each detection, where J≪N, and repeats the detection \(\lceil \frac {N}{J} \rceil \) times successively to restore the keystream of full length N. In this subsection, we describe the details of the successive detection process for keystream recovery.

For convenience of analysis, we assume a chosen plaintext such that \(\widehat {\mathbf {x}} = \left (\sqrt {MN}, \cdots, \sqrt {MN}\right)^{T}\) in (9), which yields A=R Ω U 1 for our analysis\(^{3}\). In the keystream recovery, an adversary is free to choose the value of J and the J-bit positions of the keystream to be restored at the ith detection. Let Θ i ⊂{0,⋯,N−1} be a set of indices, where |Θ i |=J if 1≤i≤n s −1 and |Θ i |=N−(n s −1)J if i=n s , for \(n_{s} = \lceil \frac {N}{J} \rceil \). Also, Θ a ∩Θ b =ϕ for a≠b, where ϕ is the empty set, and \(\Theta _{1} + \cdots + \Theta _{n_{s}} = \{0, \cdots, N-1 \}\).

Let \(\mathbf {s}_{\Theta _{i}} \in \{ -1, +1 \}^{|\Theta _{i} |}\) be a |Θ i |-bit vector, where the entries are taken from the indices of Θ i in the keystream s. At the ith detection, an adversary attempts to find \(\mathbf {s}_{\Theta _{i}}\) from the ciphertext r of (9). With \(\mathbf {s}_{\Theta _{1}}, \cdots, \mathbf {s}_{\Theta _{i-1}}\) that have been detected from the previous detections, the ith detection should use a new ciphertext r i by subtracting their contribution from r, i.e.,

$$ \mathbf{r}_{i} = \mathbf{r} - \sum\limits_{h=1}^{i-1} \mathbf{R}_{\Omega} \mathbf{U}_{1} \mathbf{R}_{\Theta_{h}}^{T} \widehat{\mathbf{s}}_{\Theta_{h}}, $$
(11)

where \(\widehat {\mathbf {s}}_{\Theta _{h}}\) is an estimate from the hth detection. In (11), \(\mathbf {R}_{\Theta _{h}}^{T}\) is an N×J column selection operator that selects J columns of U 1 whose indices are specified by Θ h . Let Δ i ={0,⋯,N−1}∖(Θ 1+⋯+Θ i ), where \(\Delta _{n_{s}} = \phi \), and \(\mathbf {R}_{\Delta _{i}}^{T}\) be an N×(N−iJ) column selection operator whose indices are specified by Δ i . By assuming \(\widehat {\mathbf {s}}_{\Theta _{h}} = \mathbf {s}_{\Theta _{h}}\) for 1≤h≤i−1, we have from (11)

$$ \begin{aligned} \mathbf{r}_{i} & = \mathbf{R}_{\Omega} \mathbf{U}_{1} \mathbf{R}_{\Theta_{i}}^{T} \mathbf{s}_{\Theta_{i}} + \mathbf{R}_{\Omega} \mathbf{U}_{1} \mathbf{R}_{\Delta_{i}}^{T} \mathbf{s}_{\Delta_{i}} + \mathbf{n} \\ & = \mathbf{m}_{i} + \mathbf{w}_{i} + \mathbf{n}, \end{aligned} $$
(12)

where \(\mathbf {m}_{i} = \mathbf {R}_{\Omega } \mathbf {U}_{1} \mathbf {R}_{\Theta _{i}}^{T} \mathbf {s}_{\Theta _{i}}\) corresponds to a desired component to be detected at the ith detection, \(\mathbf {w}_{i} = \mathbf {R}_{\Omega } \mathbf {U}_{1} \mathbf {R}_{\Delta _{i}}^{T} \mathbf {s}_{\Delta _{i}} \) is an interfering component from the keystream segments that have not been detected yet, and \(\mathbf {n} \sim {\mathcal N} (\mathbf {0}, \sigma ^{2} \mathbf {I})\) is a Gaussian random noise.
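The decomposition in (12) can be checked numerically. The sketch below is a toy verification under stated assumptions: U 1 is a Sylvester-type Hadamard matrix, the ciphertext is noiseless, the segments are consecutive, and all previous estimates are correct. The function names are hypothetical.

```python
import random

def sylvester_hadamard(n):
    """Return the n x n Sylvester Hadamard matrix (n must be a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def partial_product(H, omega, idx, vals):
    """Compute R_Omega U_1 R_idx^T vals: rows in omega, columns in idx."""
    return [sum(H[k][j] * v for j, v in zip(idx, vals)) for k in omega]

rng = random.Random(1)
N, M, J = 16, 4, 4
H = sylvester_hadamard(N)
omega = sorted(rng.sample(range(N), M))           # hypothetical Omega
s = [rng.choice((-1, 1)) for _ in range(N)]       # true keystream
r = partial_product(H, omega, list(range(N)), s)  # noiseless r = R_Omega U_1 s

# Consecutive segments Theta_1, ..., Theta_{n_s} (here J divides N).
thetas = [list(range(h * J, (h + 1) * J)) for h in range(N // J)]

i = 3  # detection index (1-based)
r_i = list(r)
for theta in thetas[: i - 1]:
    # Subtract the contribution of an already detected segment, as in (11).
    contrib = partial_product(H, omega, theta, [s[j] for j in theta])
    r_i = [a - b for a, b in zip(r_i, contrib)]

theta_i = thetas[i - 1]
delta_i = [j for theta in thetas[i:] for j in theta]                # Delta_i
m_i = partial_product(H, omega, theta_i, [s[j] for j in theta_i])  # desired part
w_i = partial_product(H, omega, delta_i, [s[j] for j in delta_i])  # interference
```

With correct previous estimates and no noise, `r_i` equals `m_i + w_i` entry by entry, which is (12) with n = 0.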

In (12), \(\mathbf {w}_{n_{s}} = \mathbf {0}\) since \(\Delta _{n_{s}} = \phi \). On the other hand, if 1≤i≤n s −1, w i is the sum of N−iJ column vectors of R Ω U 1, each of which is weighted by the corresponding entry of \(\mathbf {s}_{\Delta _{i}}\). Since each entry of \(\mathbf {s}_{\Delta _{i}}\) takes +1 or −1 randomly and independently under Assumption 1, w i will approximately follow a jointly Gaussian distribution by the central limit theorem [69]. By noting that w i +n can be modeled as a Gaussian random vector for 1≤i≤n s , r i is also Gaussian for a given \(\mathbf {s}_{\Theta _{i}}\). Then,

$$ \begin{aligned} {\mathbb{E}}\left[\mathbf{r}_{i} | \mathbf{s}_{\Theta_{i}} \right] &= {\mathbb{E}}\left[\mathbf{m}_{i} | \mathbf{s}_{\Theta_{i}} \right] + {\mathbb{E}}\left[\mathbf{w}_{i} | \mathbf{s}_{\Theta_{i}} \right] + {\mathbb{E}}\left[\mathbf{n} | \mathbf{s}_{\Theta_{i}} \right]\\ &= \mathbf{R}_{\Omega} \mathbf{U}_{1} \mathbf{R}_{\Theta_{i}}^{T} \mathbf{s}_{\Theta_{i}} = \mathbf{m}_{i}, \end{aligned} $$
(13)

where \({\mathbb {E}}\left [\mathbf {w}_{i} | \mathbf {s}_{\Theta _{i}} \right ] = {\mathbb {E}}\left [\mathbf {n} | \mathbf {s}_{\Theta _{i}} \right ] = \mathbf {0}\), since \(\mathbf {s}_{\Theta _{i}}\) is independent of w i and n, respectively. Also, the covariance of r i is given by

$$ {}{\mathbb{E}}\!\left[\!\left(\mathbf{r}_{i} \,-\, \mathbf{m}_{i}\right)\!\left(\mathbf{r}_{i} \,-\, \mathbf{m}_{i}\right)^{T}\! | \mathbf{s}_{\Theta_{i}}\!\right] \!\!= \!{\mathbb{E}}\!\left[\!\left(\mathbf{w}_{i} \,+\, \mathbf{n}\right) \!\left(\mathbf{w}_{i} \,+\, \mathbf{n}\right)^{T}\!\right]\! =\! \mathbf{K}_{i} + \sigma^{2} \mathbf{I}, $$
(14)

where w i and n are independent. In (14),

$$ \begin{aligned} \mathbf{K}_{i} = {\mathbb{E}}\left[\mathbf{w}_{i} \mathbf{w}_{i}^{T}\right] & = \mathbf{R}_{\Omega} \mathbf{U}_{1} \mathbf{R}_{\Delta_{i}}^{T} \cdot {\mathbb{E}}\left[\mathbf{s}_{\Delta_{i}}\mathbf{s}_{\Delta_{i}}^{T}\right] \cdot \mathbf{R}_{\Delta_{i}} \mathbf{U}_{1}^{T} \mathbf{R}_{\Omega}^{T} \\ & = \mathbf{R}_{\Omega} \mathbf{U}_{1} \mathbf{R}_{\Delta_{i}}^{T} \cdot \mathbf{R}_{\Delta_{i}} \mathbf{U}_{1}^{T} \mathbf{R}_{\Omega}^{T}, \end{aligned} $$
(15)

where \({\mathbb {E}}\left [\mathbf {s}_{\Delta _{i}} \mathbf {s}_{\Delta _{i}}^{T}\right ] = \mathbf {I}\). Since K i does not depend on \(\mathbf {s}_{\Theta _{i}}\), the covariance of r i in (14) is equal for all possible \(\mathbf {s}_{\Theta _{i}} \in \{-1, +1\}^{|\Theta _{i}|}\) at each ith detection. Under the Gaussian model of r i with equal covariance, we can apply the ML decision rule [70] at the ith detection, which yields

$$ \widehat{\mathbf{s}}_{\Theta_{i}} = \underset{\mathbf{s}_{\Theta_{i}} \in \{-1, +1\}^{|\Theta_{i}|}}{\text{argmin}} \left(\mathbf{r}_{i} - \mathbf{m}_{i} \right)^{T} \left(\mathbf{K}_{i} + \sigma^{2} \mathbf{I} \right)^{-1} \left(\mathbf{r}_{i} - \mathbf{m}_{i} \right). $$
(16)
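The covariance in (15), which enters the decision rule (16), follows from \({\mathbb {E}}\left [\mathbf {s}_{\Delta _{i}} \mathbf {s}_{\Delta _{i}}^{T}\right ] = \mathbf {I}\). The sketch below checks this at toy sizes by averaging \(\mathbf {w}_{i} \mathbf {w}_{i}^{T}\) exactly over all equiprobable bipolar \(\mathbf {s}_{\Delta _{i}}\). The index sets and the Sylvester-type Hadamard matrix are illustrative assumptions.

```python
import itertools

def sylvester_hadamard(n):
    """Return the n x n Sylvester Hadamard matrix (n must be a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

N = 8
H = sylvester_hadamard(N)
omega = [1, 4, 6]       # hypothetical subsampled rows (M = 3)
delta = [2, 3, 5, 7]    # hypothetical undetected indices Delta_i

# B = R_Omega U_1 R_Delta^T: rows in omega, columns in delta.
B = [[H[k][j] for j in delta] for k in omega]

# Closed form (15): K_i = B B^T.
K = [[sum(a * b for a, b in zip(ra, rb)) for rb in B] for ra in B]

# Exact expectation of w w^T over all 2^|Delta| sign patterns.
M = len(omega)
acc = [[0] * M for _ in range(M)]
count = 0
for s_delta in itertools.product((-1, 1), repeat=len(delta)):
    w = [sum(b * v for b, v in zip(row, s_delta)) for row in B]
    for p in range(M):
        for q in range(M):
            acc[p][q] += w[p] * w[q]
    count += 1
K_avg = [[acc[p][q] / count for q in range(M)] for p in range(M)]
```

The averaged matrix `K_avg` coincides with the closed form `K`, and its diagonal entries equal |Δ i | because every entry of the Hadamard matrix is ±1.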

In (11) and (12), we assumed that all the estimates \(\widehat {\mathbf {s}}_{\Theta _{h}}\), 1≤h≤i−1, from the previous detections are correct, and thus ignored the estimation errors \(\mathbf {s}_{\Theta _{h}} - \widehat {\mathbf {s}}_{\Theta _{h}}\) while subtracting their contribution from r. Therefore, (16) is not a true ML detection, but an approximation that is optimistic for the adversary.

Finally, the adversary carries out the approximate ML detection of (16) n s times successively for 1≤i≤n s and restores the full N-bit keystream by combining the disjoint |Θ i |-bit estimates \(\widehat {\mathbf {s}}_{\Theta _{i}} \). Throughout this paper, this detection process is called the successive approximate ML detection (SAMD). In what follows, we present an upper bound on the success probability of the SAMD.

Theorem 3

In the SAMD, recall the approximate ML decision rule of (16) applied at each ith detection for 1≤i≤n s , where \(n_{s} = \lceil \frac {N}{J} \rceil \). Let λ min(K i ) be the minimum eigenvalue of the covariance matrix K i in (15). Let P succ be the probability that an N-bit keystream can be successfully restored by the SAMD. Then,

$$ P_{succ} \leq \prod_{i=1}^{n_{s}} \left(1 - Q \left(\sqrt{\frac{M \mu^{2} (\mathbf{U}_{1})}{ \lambda_{min} (\mathbf{K}_{i}) + \sigma^{2}}} \right) \right) \triangleq P_{succ, UB}, $$
(17)

where \(Q(x) = \frac {1}{\sqrt {2 \pi }} \int _{x}^{\infty } e^{-\frac {t^{2}}{2}} dt\).

Proof

See the Appendices. □

Theorem 3 shows the result for a general unitary matrix U 1, which suggests that our CS-based cryptosystem should choose an N×N unitary matrix U 1 such that μ(U 1) is as small as possible, regardless of N, in order to degrade the performance of the SAMD. In this paper, μ(U 1)=1 from U 1=H.

The upper bound on the success probability of Theorem 3 represents the highest possible performance that the SAMD can achieve with no estimation errors at each detection, which is an optimistic scenario for an adversary. In reality, the actual probability of success will be much lower than the upper bound, due to estimation errors and error propagation through detections. If an adversary finds a solution of (16) via an exhaustive search, the complexity of each detection of the SAMD will be \({\mathcal O} \left (2^{J}\right)\) with J≪N.
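For a given list of minimum eigenvalues, the bound (17) is a plain product and can be evaluated directly. The following sketch uses the identity Q(x) = erfc(x/√2)/2. The function names and the eigenvalue list are hypothetical placeholders, not values taken from the paper.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def samd_success_upper_bound(M, mu, lambda_mins, sigma2):
    """Evaluate P_succ,UB of (17) from lambda_min(K_i), i = 1, ..., n_s."""
    bound = 1.0
    for lam in lambda_mins:
        bound *= 1.0 - q_function(math.sqrt(M * mu ** 2 / (lam + sigma2)))
    return bound

# Made-up eigenvalues for illustration; the last detection has lambda_min = 0.
example = samd_success_upper_bound(
    M=48, mu=1.0, lambda_mins=[900.0, 400.0, 100.0, 0.0], sigma2=1.0)
```

Each factor lies in (0.5, 1), so the product shrinks as the number of detections grows, and a smaller λ min(K i ) pushes its factor toward 1.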

5.3 Minimum eigenvalues of K i

Theorem 3 implies that minimizing λ min(K i ) can improve the performance of the SAMD. At the ith detection of the SAMD, it is an adversary that determines the selection operator \(\mathbf {R}_{\Theta _{i}}\). Therefore, if the adversary appropriately chooses Θ i (or equivalently Δ i ) to minimize λ min(K i ), the success probability of the SAMD can be improved. In this paper, we consider three possible selections for Θ i that the adversary may choose reasonably.

  1) Uniform selection: \(\Theta _{i} = \left \{i-1, \ \lfloor \frac {N}{J} \rfloor +i-1, \ \cdots, \ (J-1)\lfloor \frac {N}{J} \rfloor +i-1 \right \}\).

  2) Consecutive selection: Θ i ={(i−1)J, (i−1)J+1, ⋯, iJ−1}.

  3) Random selection: Θ i selects the J indices from {0,⋯,N−1}∖(Θ 1+⋯+Θ i−1) uniformly at random.

Each selection is valid for 1≤i≤n s −1, and \(\Theta _{n_{s}} = \{0, \cdots, N-1\} \setminus (\Theta _{1} + \cdots + \Theta _{n_{s}-1})\), where \(n_{s} = \lceil \frac {N}{J} \rceil \). To further minimize λ min(K i ), the adversary might be able to develop a more sophisticated selection of Θ i by exploiting the structure of R Ω and U 1. However, we leave this issue open for future research. Regarding the selection operator, we have the following assumption.
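The three selections can be generated as follows. This is a minimal sketch with a hypothetical function name. It follows the index formulas above for 1≤i≤n s −1 and assigns all remaining indices to the last set.

```python
import random

def build_selections(N, J, kind, rng=None):
    """Return [Theta_1, ..., Theta_{n_s}] as lists of 0-based indices."""
    n_s = (N + J - 1) // J          # ceil(N / J)
    step = N // J                   # floor(N / J)
    rng = rng or random.Random(0)
    remaining = list(range(N))
    thetas = []
    for i in range(1, n_s):         # detections 1, ..., n_s - 1
        if kind == "uniform":
            theta = [k * step + i - 1 for k in range(J)]
        elif kind == "consecutive":
            theta = list(range((i - 1) * J, i * J))
        elif kind == "random":
            theta = sorted(rng.sample(remaining, J))
        else:
            raise ValueError("unknown selection type: " + kind)
        chosen = set(theta)
        remaining = [j for j in remaining if j not in chosen]
        thetas.append(theta)
    thetas.append(remaining)        # Theta_{n_s}: all indices not yet selected
    return thetas

# Example: N = 10, J = 3 gives n_s = 4 detections.
uniform_sets = build_selections(10, 3, "uniform")
```

For N=10 and J=3, the uniform selection yields {0,3,6}, {1,4,7}, {2,5,8}, and the last set {9}.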

Assumption 2

Once an adversary chooses a value of J and a type of selection, we assume that they will be fixed throughout all detections of the SAMD.

Intuitively, a larger J will ensure better detection performance for the SAMD, since a longer keystream segment is removed after each detection, which leaves less interference for the subsequent detections. This intuition will be justified by the numerical results of Section 6. In this regard, Assumption 2 is reasonable, since the adversary’s sensible option is to fix the value of J to the largest one allowed by the available computing power. In addition, the numerical results of Section 6 show that λ min(K i ) is hardly affected by the type of selection, which also supports Assumption 2.

In what follows, we present a theoretical lower bound on λ min(K i ) for 1≤i≤n s , if Θ i is a random selection.

Theorem 4

In our CS-based cryptosystem with U 1=H, assume that an adversary chooses a random selection for Θ i in the ith detection of the SAMD, where \(1 \leq i \leq n_{s} = \lceil \frac {N}{J} \rceil \). Let \(I_{T} = \lceil \frac {N - c_{2} M\log M}{J} \rceil \) for a constant c 2>0. Then,

$$ \lambda_{\min}(\mathbf{K}_{i}) \geq \left\{ \begin{array}{ll} \left(\sqrt{N - iJ} - \sqrt{c_{1} M \log M} \right)^{2}, & \quad \text{if } i < I_{T}, \\ 0, & \quad \text{if } i \geq I_{T} \end{array} \right. $$
(18)

with high probability, where c 1 is a constant with 0<c 1<c 2.

Proof

See the Appendices. □

The numerical results of Section 6 show that the lower bound also holds for uniform and consecutive selections. Using the bound, Corollary 3 presents a further upper bound on the success probability of the SAMD, which is straightforward from Theorems 3 and 4 with μ(H)=1.

Corollary 3

In our CS-based cryptosystem with U 1=H, if an adversary chooses a random selection for Θ i , 1≤i≤n s , during the SAMD, P succ, UB in Theorem 3 is bounded by

$$\begin{aligned} {}P_{\mathrm{succ, UB}} \!\leq\! & \left(\!1\! - Q \left(\sqrt{\frac{M }{\sigma^{2}}} \right) \right)^{n_{s} - I_{T} + 1} \\ & \cdot \prod_{i=1}^{I_{T} -1}\!\! \left(\! 1\! -\! Q\! \left(\! \sqrt{\frac{M }{(\sqrt{N - iJ} \,-\,\! \sqrt{c_{1} M \log M})^{2}\,+\,\sigma^{2}}}\! \right) \right) \\ & \triangleq P_{\text{succ}, \mathrm{U}^{2}\mathrm{B}}, \end{aligned} $$

where \(I_{T} = \lceil \frac {N - c_{2} M\log M}{J} \rceil \) for constants c 1 and c 2 with 0<c 1<c 2.
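The bound of Corollary 3 can be evaluated numerically as sketched below. Several points are assumptions rather than statements of the paper: log is taken as the natural logarithm, I T is clamped to at least 1 so that the exponent never exceeds n s , the noise variance in the example is arbitrary, and the default constants c 1=0.5 and c 2=1 are the ones used for the lower bound in Section 6.3.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def corollary3_bound(N, M, J, sigma2, c1=0.5, c2=1.0):
    """Evaluate the bound of Corollary 3 (natural logarithm assumed)."""
    n_s = (N + J - 1) // J                                   # ceil(N / J)
    i_t = max(1, math.ceil((N - c2 * M * math.log(M)) / J))  # clamped I_T
    # Detections i >= I_T use the trivial bound lambda_min(K_i) >= 0.
    bound = (1.0 - q_function(math.sqrt(M / sigma2))) ** (n_s - i_t + 1)
    # Detections i < I_T use the nontrivial lower bound of Theorem 4.
    for i in range(1, i_t):
        lam_lb = (math.sqrt(N - i * J) - math.sqrt(c1 * M * math.log(M))) ** 2
        bound *= 1.0 - q_function(math.sqrt(M / (lam_lb + sigma2)))
    return bound

# Illustrative parameters: N = 1024, M = 48 as in Section 6, sigma^2 = 1 assumed.
example = corollary3_bound(N=1024, M=48, J=64, sigma2=1.0)
```

For i<I T , the definition of I T gives N−iJ>c 2 M log M>c 1 M log M, so the difference of square roots inside the loop is positive.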

6 Numerical results

This section presents numerical results to demonstrate the reliability and the security of our CS-based cryptosystem. In numerical experiments, each plaintext x has at most K nonzero entries, where the positions are chosen uniformly at random and the coefficients are taken from the Gaussian distribution. In CS encryption, \(\boldsymbol {\Phi }= \frac {1}{\sqrt {MN}} \mathbf {R}_{\Omega } \mathbf {U}_{1} \text {diag}(\mathbf {s}) \mathbf {U}_{2}\), where U 1=H, and U 2=W or D. Also, the secret keystream s is generated by the self-shrinking generator [46] of a 128-stage LFSR. For CS decryption, the CoSaMP recovery algorithm [71] has been employed for a legitimate recipient to decrypt each ciphertext with the knowledge of Φ.

6.1 CS decryption of a legitimate recipient

Figure 2 demonstrates the performance of CS decryption of a legitimate recipient, where the plaintext length is N=1024 and the ciphertext length is M=48. The figure sketches the normalized mean squared error (NMSE), defined by \(\text {NMSE} = {\mathbb {E}} \left [ \frac {||\mathbf {x} - \widehat {\mathbf {x}} ||^{2}}{||\mathbf {x}||^{2}} \right ]\), where x and \(\widehat {\mathbf {x}}\) are the original and decrypted plaintexts, respectively. We examine the performance with a total of 10,000 plaintexts at a given PNR, where each one has at most K=4 nonzero entries. For comparison, we sketch the performance of CS reconstruction with a random Gaussian sensing matrix for Φ. The figure shows that the performance of our CS decryption is as good as that of CS recovery with a random Gaussian sensing matrix. As a consequence, it demonstrates that our CS-based cryptosystem guarantees a reliable CS decryption for a legitimate recipient.
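In a simulation, the expectation in the NMSE definition is typically replaced by a sample average over the tested plaintexts. A minimal sketch, with a hypothetical function name and plain Python lists standing in for the vectors:

```python
def nmse(originals, estimates):
    """Sample average of ||x - x_hat||^2 / ||x||^2 over paired plaintexts."""
    total = 0.0
    for x, x_hat in zip(originals, estimates):
        err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        energy = sum(a ** 2 for a in x)   # assumes x is not the zero vector
        total += err / energy
    return total / len(originals)

# A perfect recovery gives 0; an all-zero estimate gives 1.
perfect = nmse([[1.0, 0.0, 2.0]], [[1.0, 0.0, 2.0]])
all_zero = nmse([[3.0, 4.0]], [[0.0, 0.0]])
```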

Fig. 2 The normalized mean squared error (NMSE) of CS decryption for a legitimate recipient

6.2 Indistinguishability

Figure 3 displays the upper and lower bounds of TV distance over PNR with U 2=W, where N=1024, M=48, and K=4. In the figure, the relative entropy of (3) and the Hellinger distance of (6) were computed using the covariance matrix of (19). Averaged over 10,000 pairs of randomly generated plaintexts (x 1,x 2) with at most K nonzero entries each, the relative entropy and the Hellinger distance yield the bounds of (4) and (7) on the TV distance, respectively. For comparison, we also sketch the theoretical upper bounds on the TV distance, which are obtained by the maximum relative entropy of (5) and the maximum Hellinger distance of (8), respectively. The figure shows that the TV distance approaches zero as the noise level grows, which implies that our CS-based cryptosystem can be indistinguishable at low PNR. As PNR increases, however, we observe that the upper and lower bounds increase and finally converge to certain levels, respectively. More extensive simulations agreed with the implication of Remark 1 that the CS-based cryptosystem will have lower TV distances with smaller PNR, M, and K. We made similar observations of the TV distance when U 2=D and/or each plaintext has bipolar nonzero entries.

Fig. 3 The upper and lower bounds of total variation distance over PNR

Figure 4 depicts the upper bounds on the success probability of an adversary in the indistinguishability experiment, where the best- and worst-case upper bounds of (2) are from the minimum and maximum achievable TV distances of (7), respectively, obtained by the Hellinger distance (6). In the figure, U 2=W and PNR=25 dB. With a given ciphertext length M=48, the maximum sparsity is set as \(K = \left \lfloor c M / \log _{2}^{2} N \right \rfloor \) for each \(N = 2^{n}\), to ensure a reliable nonuniform CS decryption for a legitimate recipient, where c=8.5. For comparison, we sketch the empirical success probability of CS decryption by a legitimate recipient, where a decrypted plaintext has been declared as a success if \(\frac {||\mathbf {x} - \widehat {\mathbf {x}} ||^{2}}{||\mathbf {x}||^{2}} < 10^{-2}\). The figure reveals that the adversary’s success probability approaches that of a random guess as the keystream length N increases, while a legitimate recipient maintains its reliability.

Fig. 4 The success probability of legitimate recipient and adversary for a given M

Figure 5 also displays the upper bounds on the success probability of an adversary in the indistinguishability experiment. This time, the ciphertext length is kept as \(M = \left \lceil c K \log _{2}^{2} N \right \rceil \) for each \(N = 2^{n}\) with a given K=4, where c=0.12. As in Fig. 4, it also reveals that the adversary’s success probability approaches 0.5 as the keystream length N increases, while a legitimate recipient maintains its reliability. In conclusion, the empirical results of Figs. 4 and 5 show that if the keystream length N is sufficiently large with low compression \(\left (\frac {M}{N} \right)\) and sparsity \(\left (\frac {K}{N} \right)\) ratios, our CS-based cryptosystem can be computationally secure in terms of the indistinguishability, while guaranteeing a reliable CS decryption for a legitimate recipient.
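The parameter rules used for Figs. 4 and 5 are easy to tabulate. The sketch below reads \(\log _{2}^{2} N\) as \((\log _{2} N)^{2} = n^{2}\) for \(N = 2^{n}\), which is an interpretation, and uses exact rational arithmetic so that the floor and ceiling are not disturbed by rounding. The function names are hypothetical.

```python
from fractions import Fraction
import math

def max_sparsity(M, n, c=Fraction(17, 2)):
    """K = floor(c * M / (log2 N)^2) for N = 2**n, with c = 8.5 (rule of Fig. 4)."""
    return math.floor(c * M / (n * n))

def ciphertext_length(K, n, c=Fraction(12, 100)):
    """M = ceil(c * K * (log2 N)^2) for N = 2**n, with c = 0.12 (rule of Fig. 5)."""
    return math.ceil(c * K * n * n)

# N = 1024 (n = 10) gives K = 4 for M = 48, and M = 48 for K = 4.
k_at_1024 = max_sparsity(48, 10)
m_at_1024 = ciphertext_length(4, 10)
```

Both rules reproduce the setting N=1024, M=48, and K=4 used in Figs. 2 and 3.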

Fig. 5 The success probability of legitimate recipient and adversary for a given K

6.3 Performance of SAMD

Figure 6 sketches the minimum eigenvalues of the covariance matrix K i of (15) at the ith detection for various J∈{32,48,64,80}, where N=1024 and M=48. For comparison, it also sketches the lower bound of Theorem 4, where c 1=0.5 and c 2=1. For each i, we tested 100,000 pairs of (Ω,Θ i ) for random subsampling and selection operators R Ω and \(\mathbf {R}_{\Theta _{i}}\), where Θ i was fixed across the tested pairs in the case of uniform and consecutive selections. In each subfigure, λ min(K i ) is sketched over 1≤i≤n s −1, where \(n_{s} = \lceil \frac {N}{J} \rceil \). Figure 6 shows that if J increases, λ min(K i ) decreases faster over i, which suggests that the detection performance will be improved as J increases. This is plausible because if more keystream bits are detected from the ith detection with no estimation errors, more interfering components can be subtracted from the (i+1)th detection. In addition, it appears that the minimum eigenvalues are insensitive to the type of Θ i , which means that an adversary may expect no benefit from a particular selection of Θ i . Finally, Fig. 6 demonstrates that the lower bound of Theorem 4 is valid, not only for random selection but also for uniform and consecutive selections.

Fig. 6 The minimum eigenvalues of K i at the ith detection of the SAMD

Figure 7 displays the upper bounds on the success probability of the SAMD for keystream recovery. For comparison, it also sketches the theoretical upper bound of Corollary 3 for random selection Θ i . In view of the adversary’s bounded computing power, we set J≤128, where the complexity of each detection in the SAMD will be \({\mathcal {O}}\left (2^{J}\right)\) by an exhaustive search. Since λ min(K i ) has similar values for different types of Θ i in Fig. 6, the upper bounds of Fig. 7 are also similar for every selection type. Moreover, the upper bounds increase over J, which follows from the sharp decline of λ min(K i ) over J observed in Fig. 6. However, even if an adversary chooses a large value of J, the upper bounds on the success probability remain significantly low, which implies that the SAMD has little prospect of restoring a correct N-bit keystream. Note that this is the result of an optimistic scenario, and in reality, the actual probability of success of the SAMD will be much lower than the upper bounds, due to estimation errors and their propagation through the SAMD.

Fig. 7 The upper bounds on the success probability of the SAMD

7 Conclusions

This paper has proposed a CS-based cryptosystem that encrypts a plaintext with a partial unitary sensing matrix embedding a secret keystream. We demonstrated that our CS-based cryptosystem can offer a theoretically and empirically reliable decryption performance for a legitimate recipient, which is the first contribution of this paper. Then, we examined the indistinguishability of our CS-based cryptosystem by studying the TV distance as a security measure. To investigate the TV distance, we developed upper bounds on the relative entropy and the Hellinger distance, respectively. From the second contribution, we showed that our CS-based cryptosystem can be computationally secure in terms of the indistinguishability, as long as the keystream length for each encryption is sufficiently large with low compression and sparsity ratios.

In addition, we considered a potential CPA from an adversary to recover the key of our CS-based cryptosystem. The computational security of our CS-based cryptosystem against the CPA is based on the mathematical hardness that no polynomial-time algorithm is known to find an ML solution to the underdetermined ILS problem for keystream recovery. As a sub-optimal approach, we introduced the SAMD for an adversary to restore a secret keystream in polynomial time. In the third contribution, we developed an upper bound on the success probability of the SAMD and demonstrated that the performance of the keystream recovery through the SAMD is very pessimistic. In conclusion, our CS-based cryptosystem with a partial unitary sensing matrix embedding a secret keystream can be secure against the CPA, while guaranteeing a stable and robust decryption for a legitimate recipient.

8 Endnotes

1 This paper assumes that a plaintext x is sparse in canonical basis, or Ψ=I. In general, if a plaintext x is sparse with respect to an arbitrary orthonormal basis Ψ, i.e., x=Ψ T θ, the sensing matrix A=Φ Ψ T maintains the form of (1) by considering U 2 Ψ T as a new unitary matrix U 2.

2 In the last detection, an \((N - (\lceil \frac {N}{J} \rceil - 1) J)\)-bit segment will be restored, where \(\lceil \frac {N}{J} \rceil \) denotes the smallest integer greater than or equal to \(\frac {N}{J}\).

3 Under this assumption, numerical results showed that the upper bound on the success probability of the successive detection is more favorable for an adversary than that of \(\widehat {\mathbf {x}}\) with arbitrary nonzero entries.

9 Appendices

9.1 Proof of Theorem 1

We give a brief sketch for the proof of Theorem 1, as the underlying technique is similar to that of Theorem 1 in [72]. Similar to Lemma 1 of [72], the covariance matrix of r is given by

$$ \mathbf{C}_{h} = {\mathbb{E}}\left[\mathbf{r} \mathbf{r}^{T} | \mathbf{x}_{h}\right] = \mathbf{R}_{\Omega} \widetilde{\mathbf{C}}_{h} \mathbf{R}_{\Omega}^{T} + \sigma^{2} \mathbf{I}, $$
(19)

where \( \widetilde {\mathbf {C}}_{h} = \frac {1}{N} \mathbf {U}_{1}^{T} \text {diag}\left (\frac {|\widehat {\mathbf {x}}_{h}|^{2} }{M} \right) \mathbf {U}_{1} \) for \(\widehat {\mathbf {x}}_{h} = \mathbf {U}_{2} \mathbf {x}_{h}\). Let λ 1(C h )≥⋯≥λ M (C h ) be the eigenvalues of C h , while \(\lambda _{1}(\widetilde {\mathbf {C}}_{h}) \geq \cdots \ge \lambda _{N} \left (\widetilde {\mathbf {C}}_{h}\right)\) be the eigenvalues of \(\widetilde {\mathbf {C}}_{h}\). With \(\widehat {\mathbf {x}}_{h} = \mathbf {U}_{2} \mathbf {x}_{h} = \left (\widehat {x}_{h,0}, \cdots, \widehat {x}_{h,N-1}\right)^{T}\), let v h =(v h,0,⋯,v h,N−1)T, where \(v_{h,k} = |\widehat {x}_{h, \pi (k)}|^{2}\) for k=0,⋯,N−1, and π(k) is a permutation for v h,0≥⋯≥v h,N−1. From the definition of \(\widetilde {\mathbf {C}}_{h}\), it is clear that \(\lambda _{t}\left (\widetilde {\mathbf {C}}_{h}\right) = \frac {v_{h, t-1}}{M} \ge 0\) for t=1,⋯,N.

In (19), \(\widehat {\mathbf {C}}_{h} = \mathbf {R}_{\Omega } \widetilde {\mathbf {C}}_{h} \mathbf {R}_{\Omega }^{T} \) is an M×M principal submatrix of \(\widetilde {\mathbf {C}}_{h} \), where successive application of the interlacing inequality [73] leads to \( \lambda _{t+N-M} \left (\widetilde {\mathbf {C}}_{h}\right) \leq \lambda _{t} \left (\widehat {\mathbf {C}}_{h}\right) \leq \lambda _{t} \left (\widetilde {\mathbf {C}}_{h}\right)\) for 1≤t≤M. Thus, \( \underset {h}{\min } \ \underset {\mathbf {x}_{h}}{\min } \ \lambda _{M} \left (\widehat {\mathbf {C}}_{h}\right) = \underset {h}{\min } \ \underset {\mathbf {x}_{h}}{\min } \ \lambda _{N} \left (\widetilde {\mathbf {C}}_{h}\right) = 0\) from v h,N−1≥0. On the other hand, \( \underset {h}{\max } \ \underset {\mathbf {x}_{h}}{ \max } \ \lambda _{1} \left (\widehat {\mathbf {C}}_{h}\right) = \underset {h}{\max } \ \underset {\mathbf {x}_{h}}{ \max } \ \lambda _{1} \left (\widetilde {\mathbf {C}}_{h}\right) = \underset {h}{\max } \ \underset {\mathbf {x}_{h}}{ \max } \ \frac {v_{h, 0}}{M}\). By the Cauchy-Schwarz inequality, we obtain \( \frac {v_{h, 0}}{M} = \frac {|\widehat {x}_{h, \pi (0)}|^{2}}{M} = \frac {1}{M} \left | \sum _{k \in \mathcal {S}} x_{h, k} \mathbf {U}_{2}(\pi (0), k) \right |^{2} \leq \frac {K \mu ^{2} (\mathbf {U}_{2}) \cdot \mathcal {E}_{x}}{M}\), where \(\mathcal {S}\) is the index set of the nonzero entries of x h with \(|\mathcal {S}| \leq K\). As \(\lambda _{t} (\mathbf {C}_{h}) = \lambda _{t} \left (\widehat {\mathbf {C}}_{h}\right) + \sigma ^{2}\) from \(\mathbf {C}_{h} = \widehat {\mathbf {C}}_{h} + \sigma ^{2} \mathbf {I}\), we have

$$ \begin{aligned} \lambda_{\min} & = \underset{h}{\min} \ \underset{\mathbf{x}_{h}}{\min} \ \lambda_{M}(\mathbf{C}_{h}) = \sigma^{2}, \\ \lambda_{\max} & = \underset{h}{\max} \ \underset{\mathbf{x}_{h}}{\max} \ \lambda_{1}(\mathbf{C}_{h}) = \frac{K \mu^{2}(\mathbf{U}_{2}) \cdot \mathcal{E}_{x}}{M} + \sigma^{2}, \end{aligned} $$
(20)

where h=1 or 2.

Meanwhile, the upper bound on \(\text {tr} \left (\mathbf {C}_{2}^{-1} \mathbf {C}_{1} \right)\) in Lemma 3 of [72] yields

$$ {}\begin{aligned} D(p_{1} || p_{2}) & \leq \frac{1}{2} \sum\limits_{t=1}^{M} \left(\log \frac{\lambda_{M+1-t}(\mathbf{C}_{2})}{\lambda_{t}(\mathbf{C}_{1})} + \frac{\lambda_{t}(\mathbf{C}_{1})}{\lambda_{M+1-t}(\mathbf{C}_{2})} - 1 \right) \\ & = \frac{1}{2} \sum\limits_{t=1}^{M} \, f(z_{t}), \end{aligned} $$

where f(z)=z− log z−1 and \(z_{t} = \frac {\lambda _{t} (\mathbf {C}_{1})}{ \lambda _{M+1-t} (\mathbf {C}_{2})} > 0\). With λ min and λ max in (20), define \( \tau = \frac {\lambda _{\max }}{\lambda _{\min }} = \frac {K \mu ^{2} (\mathbf {U}_{2}) \mathcal {E}_{x}}{M \sigma ^{2}} + 1 > 1\). Similar to the proof of Theorem 1 in [72], \(D(p_{1} || p_{2}) \leq \frac {M}{2} f(\tau)\), which yields (5).
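The final bound \(\frac {M}{2} f(\tau)\) is straightforward to evaluate. The sketch below assumes that the logarithm is natural and writes τ in terms of the PNR in linear scale, using \(\tau = K \mu ^{2}(\mathbf {U}_{2}) \cdot \text {PNR} + 1\) as stated in the proof of Theorem 2. The function name and the example value of μ(U 2) are hypothetical.

```python
import math

def relative_entropy_bound(M, K, mu_u2, pnr):
    """Evaluate (M / 2) * f(tau), f(z) = z - log z - 1, tau = K mu^2 PNR + 1."""
    tau = K * mu_u2 ** 2 * pnr + 1.0
    return 0.5 * M * (tau - math.log(tau) - 1.0)

# PNR given in dB is converted to linear scale first (mu = 1 is assumed here).
pnr_linear = 10.0 ** (25.0 / 10.0)
example = relative_entropy_bound(M=48, K=4, mu_u2=1.0, pnr=pnr_linear)
```

Since f(1)=0 and f is increasing for z>1, the bound vanishes as the PNR goes to zero and grows with K, μ(U 2), and the PNR.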

9.2 Proof of Theorem 2

We use definitions and notations in the proof of Theorem 1. Let λ 1(C 3)≥⋯≥λ M (C 3) be the eigenvalues of \(\mathbf {C}_{3} = \frac {\mathbf {C}_{1} + \mathbf {C}_{2}}{2}\). Clearly, the eigenvalues of C 1, C 2, and C 3 are positive by (20) and the Weyl inequality [73]. In (6), let \( \Gamma = \frac {|\mathbf {C}_{1}|^{\frac {1}{2}} |\mathbf {C}_{2}|^{\frac {1}{2}}}{|\mathbf {C}_{3}|} \triangleq \frac {\Gamma _{n}}{\Gamma _{d}}\). Then,

$$ \begin{aligned} \Gamma_{d} = \prod_{t=1}^{M} \lambda_{t} (\mathbf{C}_{3}) &\leq \left(\frac{\sum_{t=1}^{M} \lambda_{t} (\mathbf{C}_{3})}{M} \right)^{M} = \left(\frac{\text{tr} (\mathbf{C}_{3})}{M} \right)^{M}\\& = \left(\frac{\text{tr} (\mathbf{C}_{1}) + \text{tr} (\mathbf{C}_{2})}{2M} \right)^{M}, \end{aligned} $$
(21)

where the inequality is from the arithmetic mean-geometric mean inequality. For h=1 or 2, the tth diagonal entry of \(\widetilde {\mathbf {C}}_{h} = \frac {1}{N} \mathbf {U}_{1}^{T} \text {diag}\left (\frac {|\widehat {\mathbf {x}}_{h}|^{2} }{M} \right) \mathbf {U}_{1} \) is given by \(\frac {1}{MN} \sum _{k=0}^{N-1} |\widehat {x}_{h,k}|^{2} \mathbf {U}_{1}^{2} (k, t) = \frac {1}{MN} || \widehat {\mathbf {x}}_{h} ||^{2} = \frac {1}{M} ||\mathbf {x}_{h}||^{2} = \frac {\mathcal {E}_{x}}{M},\) where \(\mathbf {U}_{1}^{2} (k, t) = 1\) for 0≤t≤N−1. Note that \(\widehat {\mathbf {C}}_{h} = \mathbf {R}_{\Omega } \widetilde {\mathbf {C}}_{h} \mathbf {R}_{\Omega }^{T}\) has the same diagonal entries as \(\widetilde {\mathbf {C}}_{h}\). Thus, from \(\mathbf {C}_{h} = \widehat {\mathbf {C}}_{h} + \sigma ^{2} \mathbf {I}\), we have

$$ \text{tr}(\mathbf{C}_{h}) = \text{tr}(\widehat{\mathbf{C}}_{h}) + M \sigma^{2} = \mathcal{E}_{x} + M \sigma^{2}, $$
(22)

where (21) becomes

$$ \Gamma_{d} \leq \left(\frac{\mathcal{E}_{x}}{M} + \sigma^{2} \right)^{M}. $$
(23)

In Γ n , the geometric mean-harmonic mean inequality yields

$$ |\mathbf{C}_{h}|^{\frac{1}{2}} = \left(\prod_{t=1}^{M} \lambda_{t} (\mathbf{C}_{h}) \right)^{\frac{1}{2}} \ge \left(\frac{1}{\frac{1}{M} \sum_{t=1}^{M} \lambda_{t}^{-1} (\mathbf{C}_{h})} \right)^{\frac{M}{2}}, $$
(24)

where h=1 or 2. By the Kantorovich inequality [74],

$$ \begin{aligned} \frac{1}{M} \sum\limits_{t=1}^{M} \lambda_{t}^{-1} (\mathbf{C}_{h}) & \leq \frac{M}{4 \ \text{tr}(\mathbf{C}_{h})} \left(\frac{\lambda_{1} (\mathbf{C}_{h})}{\lambda_{M} (\mathbf{C}_{h})} + \frac{\lambda_{M} (\mathbf{C}_{h})}{\lambda_{1} (\mathbf{C}_{h})} \!+ 2 \right) \\ & = \frac{M}{4 \ \text{tr}(\mathbf{C}_{h})} \left(\frac{\lambda_{\max}}{\lambda_{\min}} + \frac{\lambda_{\min}}{\lambda_{\max}} + 2 \right) \\ & = \frac{M}{4 \ \text{tr}(\mathbf{C}_{h})} \left(\tau + \frac{1}{\tau} + 2 \right), \end{aligned} $$
(25)

where λ 1(C h ) and λ M (C h ) have been replaced by λ max and λ min of (20), respectively. In (25), \(\tau = \frac {\lambda _{\max }}{\lambda _{\min }} = \frac {K \mu ^{2}(\mathbf {U}_{2}) \cdot \mathcal {E}_{x}}{M \sigma ^{2}} + 1 = K \mu ^{2}(\mathbf {U}_{2}) \cdot \text {PNR} + 1\). By (22), (24), and (25),

$$ \Gamma_{n} \geq \left(\frac{4 \sqrt{\text{tr}(\mathbf{C}_{1}) \cdot \text{tr}(\mathbf{C}_{2})} }{M(\tau + \frac{1}{\tau} + 2)} \right)^{M} = \left(\frac{4 \left(\frac{\mathcal{E}_{x}}{M} + \sigma^{2} \right) }{\tau + \frac{1}{\tau} + 2} \right)^{M}. $$
(26)

By combining Γ d and Γ n , (23) and (26) yield

$$ \begin{aligned} \Gamma = \frac{\Gamma_{n}}{\Gamma_{d}} &\geq \frac{\left(\frac{4 \left(\frac{\mathcal{E}_{x}}{M} + \sigma^{2} \right) }{\tau + \frac{1}{\tau} + 2} \right)^{M}} {\left(\frac{\mathcal{E}_{x}}{M} + \sigma^{2} \right)^{M}} = \left(\frac{2\sqrt{\tau} }{\tau + 1} \right)^{\frac{M}{2}} \\&= \left(\frac{2\sqrt{K \mu^{2} (\mathbf{U}_{2})\cdot \text{PNR}+1} }{ K \mu^{2} (\mathbf{U}_{2}) \cdot \text{PNR}+ 2} \right)^{\frac{M}{2}}. \end{aligned} $$

Finally, the proof is completed by \(d_{\mathrm {H}} (p_{1}, p_{2}) = \sqrt {1 - \Gamma ^{\frac {1}{2}}}\).

9.3 Proof of Theorem 3

In (15), K i is the Gram matrix, or \(\mathbf {K}_{i} = \mathbf {A}_{i}^{T} \mathbf {A}_{i}\) with \(\mathbf {A}_{i} = \mathbf {R}_{\Delta _{i}} \mathbf {U}_{1}^{T} \mathbf {R}_{\Omega }^{T}\) for 1≤i≤n s −1, where λ min(K i )≥0, since K i is positive semi-definite [73]. Let \(\mathbf {s}_{\Theta _{i}}\) and \(\mathbf {s}_{\Theta _{i}} '\) be a pair of correct and wrong J-bit segments from a keystream s at the index set Θ i , respectively. From (13), \({\mathbb {E}}\left [\mathbf {r}_{i} | \mathbf {s}_{\Theta _{i}} \right ] = \mathbf {m}_{i} = \mathbf {R}_{\Omega } \mathbf {U}_{1} \mathbf {R}_{\Theta _{i}}^{T} \mathbf {s}_{\Theta _{i}} \) and \({\mathbb {E}}\left [\mathbf {r}_{i} | \mathbf {s}_{\Theta _{i}} ' \right ] = \mathbf {m}_{i} ' = \mathbf {R}_{\Omega } \mathbf {U}_{1} \mathbf {R}_{\Theta _{i}}^{T} \mathbf {s}_{\Theta _{i}} '\), respectively. Also, (14) yields \({\mathbb {E}}\left [\left (\mathbf {r}_{i} - \mathbf {m}_{i} \right)\left (\mathbf {r}_{i} - \mathbf {m}_{i} \right)^{T} | \mathbf {s}_{\Theta _{i}} \right ] = {\mathbb {E}}\left [\left (\mathbf {r}_{i} - \mathbf {m}_{i} ' \right)\left (\mathbf {r}_{i} - \mathbf {m}_{i} '\right)^{T} | \mathbf {s}_{\Theta _{i}} '\right ] = \mathbf {K}_{i} + \sigma ^{2} \mathbf {I}\). Assuming that r i is a Gaussian random vector, the binary hypothesis detection of Section 3.2 in [70] reveals that the pairwise error probability that the wrong segment \(\mathbf {s}_{\Theta _{i}} '\) is chosen instead of the correct \(\mathbf {s}_{\Theta _{i}}\) at the ith detection satisfies

$$ \begin{aligned} {}\text{Pr}\left[\!\mathbf{s}_{\Theta_{i}} \!\rightarrow\! \left. \mathbf{s}_{\Theta_{i}} ' \right| \mathbf{s}_{\Theta_{i}}, \mathbf{s}_{\Theta_{i}} ' \right] & \geq Q \left(\frac{|| \mathbf{m}_{i} - \mathbf{m}_{i} '||}{2 \sqrt{\lambda_{\min}(\mathbf{K}_{i})+\sigma^{2}}} \right) \\ & = Q \left(\frac{|| \mathbf{R}_{\Omega} \mathbf{U}_{1} \mathbf{R}_{\Theta_{i}}^{T} \left(\mathbf{s}_{\Theta_{i}} - \mathbf{s}_{\Theta_{i}}'\right)||} {2 \sqrt{\lambda_{\min}(\mathbf{K}_{i})+\sigma^{2}}} \right)\!. \end{aligned} $$
(27)

We assume that the pairwise error event occurs only for a specific \(\mathbf {s}_{\Theta _{i}}'\), which is closest to \(\mathbf {s}_{\Theta _{i}}\), and ignore all the other \(\mathbf {s}_{\Theta _{i}}'\). In other words, we take into account only a single \(\mathbf {s}_{\Theta _{i}}'\), where \(\mathbf {s}_{\Theta _{i}} - \mathbf {s}_{\Theta _{i}}'\) has a single nonzero entry (+2 or −2), or equivalently \(|| \mathbf {s}_{\Theta _{i}} - \mathbf {s}_{\Theta _{i}} ' || = 2\) for a given \(\mathbf {s}_{\Theta _{i}}\). This assumption, similar to the one in [75], is favorable for an adversary. From (27), the error probability under the assumption is given by

$$ {}\begin{aligned} P_{e}^{(i)} & \,=\, \sum_{\mathbf{s}_{\Theta_{i}}}\! \text{Pr} \left[\! \mathbf{s}_{\Theta_{i}}\right] \!\cdot\! \sum_{\mathbf{s}_{\Theta_{i}} ' } \text{Pr}\left[\! \mathbf{s}_{\Theta_{i}} \rightarrow\! \mathbf{s}_{\Theta_{i}} '\! \mid\! \mathbf{s}_{\Theta_{i}}, \mathbf{s}_{\Theta_{i}} ' \right] \!\cdot\! \text{Pr} \left[ \mathbf{s}_{\Theta_{i}}' \mid \mathbf{s}_{\Theta_{i}} \right] \\ & =\! \sum_{\mathbf{s}_{\Theta_{i}}}\! \text{Pr} \left[ \mathbf{s}_{\Theta_{i}}\right] \!\cdot\! \text{Pr}\left[ \mathbf{s}_{\Theta_{i}} \!\rightarrow\! \mathbf{s}_{\Theta_{i}} ' \mid \mathbf{s}_{\Theta_{i}}, \mathbf{s}_{\Theta_{i}} ', ||\mathbf{s}_{\Theta_{i}} - \mathbf{s}_{\Theta_{i}} ' || \,=\, 2\! \right] \\ & =\! \text{Pr}\left[ \mathbf{s}_{\Theta_{i}} \rightarrow \mathbf{s}_{\Theta_{i}} ' \mid \mathbf{s}_{\Theta_{i}}, \mathbf{s}_{\Theta_{i}} ', ||\mathbf{s}_{\Theta_{i}} - \mathbf{s}_{\Theta_{i}} ' || = 2 \right] \\ & \geq Q \left(\frac{\sqrt{\sum_{k=0}^{M-1} 4 \left| \mathbf{U}_{1} \left(\omega_{k}, \theta_{i, \tau}\right) \right|^{2} }}{2 \sqrt{\lambda_{\min}(\mathbf{K}_{i})+\sigma^{2}}} \right) \\ & = Q \left(\sqrt{\frac{M \mu^{2}(\mathbf{U}_{1})}{\lambda_{\min}(\mathbf{K}_{i})+\sigma^{2}}} \right), \end{aligned} $$
(28)

where \(\omega_{k} \in \Omega\) and \(\theta_{i,\tau} \in \Theta_{i}\). In (28), we assumed that \(\mathbf {s}_{\Theta _{i}}\) and \(\mathbf {s}_{\Theta _{i}} '\) differ only at a position corresponding to the column index \(\theta_{i,\tau}\) of \(\mathbf{U}_{1}\). Note that \(P_{e}^{(i)}\) is obtained under the assumption that all the estimates from the previous \(i-1\) detections have been subtracted with no errors to yield \(\mathbf{r}_{i}\) of (12). Then, the success probability of the \(i\)th detection is

$$ \begin{aligned} P_{s}^{(i)} & = \text{Pr} \left[ \widehat{\mathbf{s}}_{\Theta_{i}} = \mathbf{s}_{\Theta_{i}} \mid \widehat{\mathbf{s}}_{\Theta_{1}} = \mathbf{s}_{\Theta_{1}}, \cdots, \widehat{\mathbf{s}}_{\Theta_{i-1}} = \mathbf{s}_{\Theta_{i-1}} \right] \\ & = 1 - P_{e}^{(i)} \leq 1 - Q \left(\sqrt{\frac{M \mu^{2}(\mathbf{U}_{1})}{\lambda_{\min}(\mathbf{K}_{i})+\sigma^{2}}} \right), \end{aligned} $$
(29)

where \(1 \leq i \leq n_{s}\). If a correct N-bit keystream is to be restored, all the component detections should be successful. Thus, the success probability of the SAMD is

$$ {}\begin{aligned} P_{\text{succ}} &= \text{Pr} \left[\widehat{\mathbf{s}}_{\Theta_{1}} = \mathbf{s}_{\Theta_{1}}, \cdots, \widehat{\mathbf{s}}_{\Theta_{n_{s}}} = \mathbf{s}_{\Theta_{n_{s}}} \right] \\ & = \prod_{i=1}^{n_{s}} \text{Pr} \left[ \widehat{\mathbf{s}}_{\Theta_{i}} = \mathbf{s}_{\Theta_{i}} \mid \widehat{\mathbf{s}}_{\Theta_{1}} = \mathbf{s}_{\Theta_{1}}, \cdots, \widehat{\mathbf{s}}_{\Theta_{i-1}} = \mathbf{s}_{\Theta_{i-1}} \right] \\ & = \prod_{i=1}^{n_{s}} P_{s}^{(i)}. \end{aligned} $$
(30)

Finally, we obtain the upper bound of (17) by combining (29) and (30), which completes the proof.
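The combination of (29) and (30) can be evaluated numerically. The Python sketch below is an illustration under stated assumptions, not part of the proof: the function names are hypothetical, `mu_sq` stands for \(\mu^{2}(\mathbf{U}_{1})\), and the caller is assumed to supply one value of \(\lambda_{\min}(\mathbf{K}_{i})\) per detection stage, with \(\lambda_{\min}(\mathbf{K}_{i}) + \sigma^{2} > 0\).

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = Pr[Z > x] for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def stage_success_bound(M, mu_sq, lam_min, sigma_sq):
    """Upper bound (29) on the success probability of one detection stage.

    Assumes lam_min + sigma_sq > 0 so that the argument of Q is finite.
    """
    return 1.0 - q_function(math.sqrt(M * mu_sq / (lam_min + sigma_sq)))


def samd_success_bound(M, mu_sq, lam_mins, sigma_sq):
    """Product (30) of the per-stage bounds, one lam_min value per stage."""
    bound = 1.0
    for lam_min in lam_mins:
        bound *= stage_success_bound(M, mu_sq, lam_min, sigma_sq)
    return bound
```

Because every factor is strictly below one, the product shrinks as the number of detection stages grows, which is the mechanism behind the upper bound on the success probability of the SAMD.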

9.4 Proof of Theorem 4

In (15), let \(\mathbf {A}_{i} = \mathbf {R}_{\Delta _{i}} \mathbf {H}^{T} \mathbf {R}_{\Omega }^{T}\) with \(\mathbf{U}_{1} = \mathbf{H}\). Then, the singular values of \(\mathbf{A}_{i}\) are equal to the square roots of the eigenvalues of \(\mathbf {K}_{i} = \mathbf {A}_{i}^{T} \mathbf {A}_{i}\), where \(\lambda_{\min}(\mathbf{K}_{i}) \geq 0\) for all \(i\). In other words, if \(\sigma_{\min}(\mathbf{A}_{i})\) denotes the minimum singular value of \(\mathbf{A}_{i}\), then \(\lambda _{\min }(\mathbf {K}_{i}) = \sigma _{\min }^{2} (\mathbf {A}_{i})\).

To examine \(\sigma_{\min}(\mathbf{A}_{i})\) for \(1 \leq i \leq n_{s} - 1\), we first define \(\mathbf {B}_{i} = \mathbf {H}^{T} \mathbf {R}_{\Omega }^{T}\). Then, \(\mathbf{B}_{i}\) is an \(N \times M\) matrix satisfying \( \mathbf {B}_{i}^{T} \mathbf {B}_{i} = \mathbf {R}_{\Omega } \mathbf {H} \cdot \mathbf {H}^{T} \mathbf {R}_{\Omega }^{T} = N \mathbf {I}\), which means that the columns of \(\mathbf{B}_{i}\) are mutually orthogonal. Also, it is clear that the \(l_{2}\)-norm of each row of \(\mathbf{B}_{i}\) is \(\sqrt {M}\), since each entry of \(\mathbf{B}_{i}\) is ±1. If \(\Theta_{i}\) is a random selection, so is \(\Delta_{i}\), and \(\mathbf {A}_{i} = \mathbf {R}_{\Delta _{i}} \mathbf {B}_{i}\) is an \((N - iJ) \times M\) matrix obtained by randomly subsampling \((N - iJ)\) rows from \(\mathbf{B}_{i}\), with the selected row indices specified by \(\Delta_{i}\). For such a matrix \(\mathbf{A}_{i}\), Corollary 5.55 of [4] shows that for every \(t \geq 0\),

$$ \sigma_{\min} (\mathbf{A}_{i}) \geq \sqrt{N-iJ} - t \sqrt{M} $$
(31)

with probability at least \(1-2M e^{-ct^{2}}\) for a constant \(c>0\). The corollary assumed that \(t \geq \sqrt {c_{1} \log {M}}\) and \(N - iJ > c_{2} M \log M\) for the bound to be nontrivial and nonnegative, where \(0 < c_{1} < c_{2}\). Thus, the bound of (31) is valid only for \(i <\lceil \frac {N - c_{2} M\log M}{J} \rceil = I_{T}\), and we set \(\sigma_{\min}(\mathbf{A}_{i}) \geq 0\) if \(i \geq I_{T}\), which gives the bound of (18) from \(\lambda _{\min }(\mathbf {K}_{i}) = \sigma _{\min }^{2} (\mathbf {A}_{i})\).
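The case distinction above can be sketched in Python as follows. This is an illustrative reading of the argument, not a restatement of (18): the function names are hypothetical, the constants \(t\) and \(c_{2}\) are left as inputs because the corollary does not fix their values, and the natural logarithm is assumed for \(\log M\).

```python
import math


def stage_threshold(N, M, J, c2):
    """I_T = ceil((N - c2 * M * log M) / J); natural log is an assumption."""
    return math.ceil((N - c2 * M * math.log(M)) / J)


def lambda_min_lower_bound(N, M, J, i, t, c2):
    """Lower bound on lambda_min(K_i) obtained by squaring (31).

    Returns 0 for i >= I_T, where (31) is not applied, and clamps a
    negative right-hand side of (31) to 0 before squaring.
    """
    if i >= stage_threshold(N, M, J, c2):
        return 0.0
    sigma_lb = math.sqrt(N - i * J) - t * math.sqrt(M)
    return max(sigma_lb, 0.0) ** 2
```

The bound holds only with the probability stated for (31), so the returned value should be read as a high-probability lower bound rather than a deterministic one.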