On data complexity of distinguishing attacks versus message recovery attacks on stream ciphers

Designs, Codes and Cryptography

Abstract

We revisit the different approaches used in the literature to estimate the data complexity of distinguishing attacks on stream ciphers and analyze their inter-relationships. In the process, we formally argue which approach is applicable (or not applicable) in which scenario. To our knowledge, this is the first such exposition. We also perform a rigorous statistical analysis of the message recovery attack that exploits a distinguisher and show that, in practice, there is a significant gap between the data complexity of a message recovery attack and that of the underlying distinguishing attack. Contrary to what one might expect, this gap is not simply a constant factor determined by the false positive and false negative rates; it also depends on the number of samples of the distinguishing attack. We perform a case study on the RC4 stream cipher to demonstrate that the typical complexities for message recovery attacks inferred in the literature are underestimates and that the actual values are considerably larger.

Figs. 1–7 (images not reproduced here)

Notes

  1. A family of probability distributions \(\left\{ P_{\theta }\right\} \) indexed by the parameter \(\theta \) is said to be identifiable w.r.t. \(\theta \), if

    $$\begin{aligned} \theta _1\ne \theta _2 \Rightarrow P_{\theta _1 } \ne P_{\theta _2}. \end{aligned}$$

    Otherwise the family is said to be non-identifiable.

References

  1. AlFardan N.J., Bernstein D.J., Paterson K.G., Poettering B., Schuldt J.C.N.: On the security of RC4 in TLS. In: King S.T. (ed.) Proceedings of the 22nd USENIX Security Symposium, Washington, DC, USA, 14–16 August 2013, pp. 305–320. USENIX Association, Santa Clara (2013).

  2. Aumasson J.-P., Fischer S., Khazaei S., Meier W., Rechberger C.: New features of Latin dances: analysis of Salsa, ChaCha, and Rumba. In: Nyberg K. (ed.) Fast Software Encryption, 15th International Workshop, FSE 2008, Lausanne, Switzerland, 10–13 February 2008, Revised Selected Papers, vol. 5086. Lecture Notes in Computer Science, pp. 470–488. Springer, Berlin (2008).

  3. Baignères T., Junod P., Vaudenay S.: How far can we go beyond linear cryptanalysis? In: Lee P.J. (ed.) Advances in Cryptology—ASIACRYPT 2004, 10th International Conference on the Theory and Application of Cryptology and Information Security, Jeju Island, Korea, 5–9 December 2004, Proceedings, vol. 3329. Lecture Notes in Computer Science, pp. 432–450. Springer, Berlin (2004).

  4. Baignères T., Sepehrdad P., Vaudenay S.: Distinguishing distributions using Chernoff information. In: Heng S.-H., Kurosawa K. (eds.) Provable Security—4th International Conference, ProvSec 2010, Malacca, Malaysia, 13–15 October 2010, Proceedings, vol. 6402. Lecture Notes in Computer Science, pp. 144–165. Springer, Berlin (2010).

  5. Banik S., Isobe T.: Cryptanalysis of the full Spritz stream cipher. In: Peyrin T. (ed.) Fast Software Encryption—23rd International Conference, FSE 2016, Bochum, Germany, 20–23 March 2016, Revised Selected Papers, vol. 9783. Lecture Notes in Computer Science, pp. 63–77. Springer, Berlin (2016).

  6. Basu R., Ganguly S., Maitra S., Paul G.: A complete characterization of the evolution of RC4 pseudo random generation algorithm. J. Math. Cryptol. 2(3), 257–289 (2008).

  7. Blahut R.E.: Principles and Practice of Information Theory. Addison-Wesley Longman Publishing, Boston (1987).

  8. Blondeau C., Gérard B., Tillich J.-P.: Accurate estimates of the data complexity and success probability for various cryptanalyses. Des. Codes Cryptogr. 59(1–3), 3–34 (2011).

  9. Casella G., Berger R.: Statistical Inference. Duxbury Resource Center, Boston (2001).

  10. Cover T.M., Thomas J.A.: Elements of Information Theory. Wiley Series in Telecommunications and Signal Processing. Wiley-Interscience, Hoboken (2006).

  11. Ekdahl P., Johansson T.: Distinguishing attacks on SOBER-t16 and t32. In: Daemen J., Rijmen V. (eds.) Fast Software Encryption, 9th International Workshop, FSE 2002, Leuven, Belgium, 4–6 February 2002, Revised Papers, vol. 2365. Lecture Notes in Computer Science, pp. 210–224. Springer, Berlin (2002).

  12. Fluhrer S.R., McGrew D.A.: Statistical analysis of the alleged RC4 keystream generator. In: Schneier B. (ed.) Fast Software Encryption, 7th International Workshop, FSE 2000, New York, NY, USA, 10–12 April 2000, Proceedings, vol. 1978. Lecture Notes in Computer Science, pp. 19–30. Springer, Berlin (2000).

  13. Garman C., Paterson K.G., Van der Merwe T.: Attacks only get better: password recovery attacks against RC4 in TLS. In: Jung J., Holz T. (eds.) 24th USENIX Security Symposium, USENIX Security 15, Washington, D.C., USA, 12–14 August 2015, pp. 113–128. USENIX Association, Santa Clara (2015).

  14. Gupta S.S., Maitra S., Paul G., Sarkar S.: (Non-)random sequences from (non-)random permutations–analysis of RC4 stream cipher. J. Cryptol. 27(1), 67–108 (2014).

  15. Gut A.: Probability: A Graduate Course, 2nd edn. Springer, New York (2013).

  16. Hardy G.H., Littlewood J.E., Pólya G.: Inequalities. Cambridge Mathematical Library. Cambridge University Press, Cambridge (1952).

  17. Kullback S., Leibler R.A.: On information and sufficiency. Ann. Math. Stat. 22, 49–86 (1951).

  18. Maitra S., Paul G., Raizada S., Sen S., Sengupta R.: Some observations on HC-128. Des. Codes Cryptogr. 59(1–3), 231–245 (2011).

  19. Mantin I.: Predicting and distinguishing attacks on RC4 keystream generator. In: Cramer R. (ed.) Advances in Cryptology—EUROCRYPT 2005, 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Aarhus, Denmark, 22–26 May 2005, Proceedings, vol. 3494. Lecture Notes in Computer Science, pp. 491–506. Springer, Berlin (2005).

  20. Mantin I., Shamir A.: A practical attack on broadcast RC4. In: Matsui M. (ed.) Fast Software Encryption, 8th International Workshop, FSE 2001, Yokohama, Japan, 2–4 April 2001, Revised Papers, vol. 2355. Lecture Notes in Computer Science, pp. 152–164. Springer, Berlin (2001).

  21. Neyman J., Pearson E.S.: On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. R. Soc. Lond. Ser. A 231, 289–337 (1933).

  22. Samajder S., Sarkar P.: Another look at normal approximations in cryptanalysis. IACR Cryptology ePrint Archive 2015, 679 (2015).

  23. Samajder S., Sarkar P.: Rigorous upper bounds on data complexities of block cipher cryptanalysis. IACR Cryptology ePrint Archive 2015, 916 (2015).

  24. Stankovski P., Ruj S., Hell M., Johansson T.: Improved distinguishers for HC-128. Des. Codes Cryptogr. 63(2), 225–240 (2012).

  25. Wu H.: The stream cipher HC-128. In: Robshaw M.J.B., Billet O. (eds.) New Stream Cipher Designs—The eSTREAM Finalists, vol. 4986. Lecture Notes in Computer Science, pp. 39–47. Springer, Berlin (2008).

Author information

Corresponding author

Correspondence to Goutam Paul.

Additional information

Communicated by C. Cid.

The second author worked on this paper during the summer and winter breaks of 2015, while enrolled in the Bachelor of Statistics programme.

Appendices

Appendix A: Proof of Lemma 2

Proof

Suppose \({\mathcal {S}}\) is the sample space and \(\phi : {\mathcal {S}} \longrightarrow [0,1]\) is the test function [9] of the test under consideration, with false positive rate \(\alpha \) and false negative rate \(\beta \); i.e., we reject \(H_0\) with probability \(\phi (\varvec{x})\) when \(\varvec{X}=\varvec{x}\) is observed. Here \(P_n\) and \(Q_n\) denote the distributions of \(\varvec{X}\) under \(H_0\) and \(H_1\), respectively. Then, by definition,

$$\begin{aligned} E_{H_0}[\phi (\varvec{X})]= & {} \sum _{\varvec{x} \in {\mathcal {S}}} \phi (\varvec{x})P_n(\varvec{x})= \alpha ,\\ E_{H_1}[(1-\phi )(\varvec{X})]= & {} \sum _{\varvec{x} \in {\mathcal {S}}} (1-\phi (\varvec{x}))Q_n(\varvec{x})= \beta . \end{aligned}$$

Note that,

$$\begin{aligned} D_{KL}(Q_n\Vert P_n)= & {} \sum _{\varvec{x} \in {\mathcal {S}}} Q_n(\varvec{x}) \log _2 \frac{Q_n(\varvec{x})}{P_n(\varvec{x})} = \sum _{\varvec{x} \in {\mathcal {S}}} P_n(\varvec{x})\frac{Q_n(\varvec{x})}{P_n(\varvec{x})} \log _2 \frac{Q_n(\varvec{x})}{P_n(\varvec{x})} \\= & {} \sum _{\varvec{x} \in {\mathcal {S}}} P_n(\varvec{x}) f\left( \frac{Q_n(\varvec{x})}{P_n(\varvec{x})}\right) \end{aligned}$$

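where \(f:{\mathbb {R}}^{+} \rightarrow {\mathbb {R}}\) is defined by \(f(z) = z \log _2 (z)\) for all \(z >0\). Since \( \dfrac{d^2f(z)}{dz^2} = {(z\ln 2)}^{-1}> 0\) for all \(z >0\), f is convex (and continuous). Hence, applying Jensen’s inequality with the non-negative weights \(\phi (\varvec{x})P_n(\varvec{x})/\sum _{\varvec{x} \in {\mathcal {S}}} \phi (\varvec{x})P_n(\varvec{x})\), which sum to 1 (here we assume \(\alpha >0\)), we have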

$$\begin{aligned}&\sum _{\varvec{x} \in {\mathcal {S}}} \frac{\phi (\varvec{x})P_n (\varvec{x})}{\sum _{\varvec{x} \in {\mathcal {S}}} \phi (\varvec{x}) P_n(\varvec{x})} f\left( \frac{Q_n(\varvec{x})}{P_n(\varvec{x})}\right) \\ \\&\quad \ge f\left( \sum _{\varvec{x} \in {\mathcal {S}}} \frac{ \phi (\varvec{x}) P_n(\varvec{x})}{\sum _{\varvec{x} \in {\mathcal {S}}} \phi (\varvec{x}) P_n(\varvec{x})} \frac{Q_n(\varvec{x})}{P_n(\varvec{x})}\right) \\ \\&\quad = f\left( \frac{\sum _{\varvec{x} \in {\mathcal {S}}} \phi (\varvec{x}) Q_n(\varvec{x})}{ \sum _{\varvec{x} \in {\mathcal {S}}} \phi (\varvec{x})P_n(\varvec{x})}\right) \\ \\&\quad = f\left( \frac{1-\beta }{\alpha }\right) . \end{aligned}$$

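Here the last equality uses \(\sum _{\varvec{x} \in {\mathcal {S}}} \phi (\varvec{x}) Q_n(\varvec{x}) = 1-\beta \) and \(\sum _{\varvec{x} \in {\mathcal {S}}} \phi (\varvec{x}) P_n(\varvec{x}) = \alpha \). Multiplying both sides by \(\alpha \), we get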

$$\begin{aligned} \sum _{\varvec{x} \in {\mathcal {S}}} \phi (\varvec{x}) P_n(\varvec{x}) f\left( \frac{Q_n(\varvec{x})}{P_n(\varvec{x})}\right)\ge & {} \sum _{\varvec{x} \in {\mathcal {S}}} \phi (\varvec{x}) P_n(\varvec{x}) f\left( \frac{1-\beta }{\alpha }\right) \\ \\= & {} \alpha f\left( \frac{1-\beta }{\alpha }\right) \\ \\= & {} (1-\beta )\log _2\left( \frac{1-\beta }{\alpha }\right) . \end{aligned}$$

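Replacing \(\phi \) by \(1-\phi \) (for which \(\sum _{\varvec{x} \in {\mathcal {S}}} (1-\phi (\varvec{x}))P_n(\varvec{x}) = 1-\alpha \) and \(\sum _{\varvec{x} \in {\mathcal {S}}} (1-\phi (\varvec{x}))Q_n(\varvec{x}) = \beta \)) and repeating the argument, we get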

$$\begin{aligned} \sum _{\varvec{x} \in {\mathcal {S}}} (1-\phi (\varvec{x})) P_n(\varvec{x}) f\left( \frac{Q_n(\varvec{x})}{P_n(\varvec{x})}\right) \ge \beta \log _2\frac{\beta }{1-\alpha }. \end{aligned}$$

Summing the two inequalities, we get

$$\begin{aligned}&\sum _{\varvec{x} \in {\mathcal {S}}} P_n(\varvec{x}) f\left( \frac{Q_n(\varvec{x})}{P_n(\varvec{x})}\right) \\&\quad \ge \beta \log _2\frac{\beta }{1-\alpha } +(1- \beta )\log _2\frac{1-\beta }{\alpha }, \end{aligned}$$

which is the desired result, since the left-hand side equals \(D_{KL}(Q_n\Vert P_n)\). \(\square \)
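As a sanity check on Lemma 2, the following Python sketch (an illustration, not code from the paper) draws random pairs of strictly positive distributions \(P_n, Q_n\) on a small finite sample space together with a random randomized test \(\phi \), computes \(\alpha \) and \(\beta \) exactly as defined above, and records how far \(D_{KL}(Q_n\Vert P_n)\) lies above the bound. The sample-space sizes, the seed, and the helper names (`kl_bits`, `lemma2_bound`, `min_slack`) are illustrative assumptions.

```python
import math
import random


def kl_bits(q, p):
    """D_KL(Q || P) in bits for two distributions on the same finite space."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def lemma2_bound(alpha, beta):
    """beta*log2(beta/(1-alpha)) + (1-beta)*log2((1-beta)/alpha), for 0 < alpha, beta < 1."""
    return beta * math.log2(beta / (1 - alpha)) + (1 - beta) * math.log2((1 - beta) / alpha)


def random_distribution(rng, size):
    """A random strictly positive probability vector of the given size."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def min_slack(trials, seed=0):
    """Smallest value of D_KL(Q||P) - bound over random (P, Q, phi) triples."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        size = rng.randint(2, 8)
        p = random_distribution(rng, size)
        q = random_distribution(rng, size)
        # Randomized test: reject H0 with probability phi[x] when x is observed.
        phi = [rng.uniform(0.01, 0.99) for _ in range(size)]
        alpha = sum(f * px for f, px in zip(phi, p))       # false positive rate (under P_n)
        beta = sum((1 - f) * qx for f, qx in zip(phi, q))  # false negative rate (under Q_n)
        worst = min(worst, kl_bits(q, p) - lemma2_bound(alpha, beta))
    return worst


print(min_slack(2000, seed=2018))  # expected to be >= 0, up to floating-point rounding
```

Keeping \(\phi \) strictly inside (0, 1) guarantees \(0<\alpha ,\beta <1\), so every logarithm in the bound is finite.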

Appendix B: Proof of Lemma 4

Proof

This proof of the Chernoff–Stein Lemma follows [10]. First note that, since \(P_n\) and \(Q_n\) are product distributions,

$$\begin{aligned} \log _2 \left[ \frac{P_n(X_1,\ldots ,X_n)}{Q_n(X_1,\ldots ,X_n)} \right] = \sum _{k=1}^n \log _2 \left[ \frac{P(X_k)}{Q(X_k)} \right] , \end{aligned}$$

and, by the weak law of large numbers,

$$\begin{aligned} \frac{1}{n} \sum _{k=1}^n \log _2 \left[ \frac{P(X_k)}{Q(X_k)} \right]&\mathop {\longrightarrow }\limits ^{p}&E_P \left[ \log _2 \left( \frac{P(X_1)}{Q(X_1)} \right) \right] \\= & {} D_{KL} (P \Vert Q), \end{aligned}$$

under the null hypothesis, i.e., when \(X_1,\ldots ,X_n\) are i.i.d. with distribution P. Hence,

$$\begin{aligned} \frac{1}{n} \log _2 \left[ \frac{P_n(X_1,\ldots ,X_n)}{Q_n(X_1,\ldots ,X_n)} \right] \mathop {\longrightarrow }\limits ^{p} D_{KL} (P \Vert Q), \end{aligned}$$

which, by the definition of convergence in probability, gives that for all \(\epsilon , \alpha > 0\) there exists \(N_{\epsilon ,\alpha } \in {\mathbb {N}}\) such that, for all \(n \ge N_{\epsilon ,\alpha }\) (the probability below tends to 1, so it eventually exceeds \(1-\alpha \) strictly),

$$\begin{aligned} P_n \left[ \Bigg |\frac{1}{n} \log _2 \left[ \frac{P_n(\varvec{X})}{Q_n(\varvec{X})} \right] - D_{KL} (P\Vert Q)\Bigg | < \epsilon \right] > 1- \alpha , \end{aligned}$$
(26)

where \(D=D_{KL}(P\Vert Q)\) and \(\varvec{X}=(X_1,\ldots ,X_n)\). Now define \(A_n^{\epsilon }\) to be the subset of \({\mathcal {\chi }}^n\) consisting of all \(\varvec{x}=(x_1,\ldots ,x_n)\) such that

$$\begin{aligned} P_n(\varvec{x})2^{-n(D+\epsilon )}< Q_n(\varvec{x}) < P_n(\varvec{x}) 2^{-n(D-\epsilon )}, \end{aligned}$$

i.e.,

$$\begin{aligned} \left| \frac{1}{n} \log _2 \left[ \frac{P_n(\varvec{x})}{Q_n(\varvec{x})} \right] - D\right| < \epsilon . \end{aligned}$$

Then, Eq. (26) gives,

$$\begin{aligned} P_n (A_n^{\epsilon }) \ge 1- \alpha , \end{aligned}$$

\(\forall \, n \ge N_{\epsilon ,\alpha }\). Also note that

$$\begin{aligned} Q_n(A_n^{\epsilon })&= \sum _{\varvec{x} \in A_n^{\epsilon }} Q_n (\varvec{x}) && (27)\\ &< \sum _{\varvec{x} \in A_n^{\epsilon }} P_n (\varvec{x}) 2^{-n(D-\epsilon )} \le 2^{-n(D-\epsilon )}, && (28) \end{aligned}$$

and

$$\begin{aligned} Q_n(A_n^{\epsilon })&= \sum _{\varvec{x} \in A_n^{\epsilon }} Q_n (\varvec{x}) && (29)\\ &> \sum _{\varvec{x} \in A_n^{\epsilon }} P_n (\varvec{x}) 2^{-n(D+\epsilon )} && (30)\\ &= 2^{-n(D+\epsilon )} P_n(A_n^{\epsilon }) \ge (1 - \alpha )2^{-n(D+\epsilon )}, && (31) \end{aligned}$$

for all \(n \ge N_{\epsilon ,\alpha }\). Now consider the test which rejects the null if and only if \(\varvec{x} \notin A_n^{\epsilon }\). Then, by Eqs. (26)–(28), for all \(n \ge N_{\epsilon ,\alpha }\),

$$\begin{aligned} 1-P_n(A_n^{\epsilon })< \alpha \quad \text {and} \quad Q_n(A_n^{\epsilon })< 2^{-n(D-\epsilon )}, \end{aligned}$$

which says that the non-randomized test with acceptance region \(A_n^{\epsilon }\) has size less than \(\alpha \) and false negative error less than \(2^{-n(D-\epsilon )}\). So, by the definition of \( \beta _{n,\alpha }\) as the least attainable false negative error among level-\(\alpha \) non-randomized tests, we have

$$\begin{aligned} \beta _{n,\alpha } < 2^{-n(D-\epsilon )}. \end{aligned}$$

Thus, for every \(\epsilon > 0\) and all \(n \ge N_{\epsilon ,\alpha }\) (the bound on the limit superior follows because \(\epsilon \) is arbitrary),

$$\begin{aligned} \frac{\log _2 \beta _{n,\alpha }}{n} < -D + \epsilon \Longrightarrow \limsup _{n \rightarrow \infty } \frac{\log _2 \beta _{n,\alpha }}{n} \le - D. \end{aligned}$$
(32)
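To see Eqs. (26)–(31) at work, the sketch below (an illustration, not the paper's code) takes a hypothetical i.i.d. Bernoulli example, \(H_0: X_k\sim \text {Bernoulli}(p)\) versus \(H_1: X_k\sim \text {Bernoulli}(q)\). Because \(\frac{1}{n}\log _2[P_n(\varvec{x})/Q_n(\varvec{x})]\) depends only on the number k of ones in \(\varvec{x}\), the set \(A_n^{\epsilon }\) is a union of count classes, so \(P_n(A_n^{\epsilon })\) and \(Q_n(A_n^{\epsilon })\) can be computed exactly. The values p = 0.3, q = 0.5, \(\epsilon = 0.02\) and the function name `typical_set` are assumptions chosen for illustration.

```python
import math


def log_binom_pmf(n, k, p):
    """Natural log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def typical_set(n, p, q, eps):
    """Return (P_n(A), log2 Q_n(A), D) for A = A_n^eps, with X_k i.i.d.
    Bernoulli(p) under H0 and Bernoulli(q) under H1 (hypothetical example)."""
    a, b = math.log2(p / q), math.log2((1 - p) / (1 - q))
    d = p * a + (1 - p) * b  # D = D_KL(P || Q) in bits
    # (1/n) log2[P_n(x)/Q_n(x)] is the same for every sequence with k ones.
    members = [k for k in range(n + 1) if abs((k * a + (n - k) * b) / n - d) < eps]
    if not members:
        return 0.0, -math.inf, d
    p_mass = sum(math.exp(log_binom_pmf(n, k, p)) for k in members)
    logs = [log_binom_pmf(n, k, q) for k in members]
    m = max(logs)  # log-sum-exp, since Q_n(A) is tiny for large n
    log2_q_mass = (m + math.log(sum(math.exp(x - m) for x in logs))) / math.log(2)
    return p_mass, log2_q_mass, d


eps = 0.02
for n in (200, 1000, 3000):
    p_mass, log2_q, d = typical_set(n, 0.3, 0.5, eps)
    upper_ok = log2_q < -n * (d - eps)                      # Eq. (28)
    lower_ok = log2_q > math.log2(p_mass) - n * (d + eps)   # Eqs. (30)-(31)
    print(n, round(p_mass, 3), upper_ok, lower_ok)
```

Both bounds on \(Q_n(A_n^{\epsilon })\) hold for every n, while \(P_n(A_n^{\epsilon })\) only approaches 1 as n grows, which is why the argument needs \(n \ge N_{\epsilon ,\alpha }\).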

On the other hand, consider any other test with rejection region \({\mathcal {R}}\), such that \(P_n({\mathcal {R}})< \alpha \). Then we have, \(\forall \, n \ge N_{\epsilon ,\alpha }\),

$$\begin{aligned} Q_n({\mathcal {R}}^c)\ge & {} Q_n({\mathcal {R}}^c \cap A_n^{\epsilon }) \\= & {} \sum _{\varvec{x} \in {\mathcal {R}}^c \cap A_n^{\epsilon }}Q_n(\varvec{x}) \\> & {} \sum _{\varvec{x} \in {\mathcal {R}}^c \cap A_n^{\epsilon }} 2^{-n(D+\epsilon )} P_n(\varvec{x}) \\= & {} 2^{-n(D+\epsilon )} P_n({\mathcal {R}}^c \cap A_n^{\epsilon }) \\\ge & {} 2^{-n(D+\epsilon )} (P_n({A_n^{\epsilon }})-P_n({\mathcal {R}})) \\\ge & {} 2^{-n(D+\epsilon )} (1-2\alpha ) \end{aligned}$$

Hence, \(\forall \, n \ge N_{\epsilon ,\alpha }\),

$$\begin{aligned} \beta _{n,\alpha } = \min _{{\mathcal {R}},P_n({\mathcal {R}})<\alpha }Q_n({\mathcal {R}}^c) > 2^{-n(D+\epsilon )} (1-2\alpha ), \end{aligned}$$

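Taking \(\log _2\), dividing by n, and noting that \(\frac{1}{n}\log _2(1-2\alpha ) \rightarrow 0\) for any fixed \(\alpha < 1/2\), we get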

$$\begin{aligned} \liminf _{n \rightarrow \infty } \frac{\log _2 \beta _{n,\alpha }}{n} \ge -D - \epsilon , \;\forall \, \epsilon >0. \end{aligned}$$

Therefore,

$$\begin{aligned} \liminf _{n \rightarrow \infty } \frac{\log _2 \beta _{n,\alpha }}{n} \ge -D. \end{aligned}$$
(33)

Combining Eqs. (32) and (33) we get the desired result. \(\square \)
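The lemma says that the optimal false negative error decays like \(2^{-nD}\). The sketch below illustrates this numerically for a hypothetical Bernoulli example (p = 0.3 under the null, q = 0.5 under the alternative, \(\alpha = 0.05\); all assumptions). It uses the non-randomized likelihood-ratio threshold test on the number K of ones, with the smallest threshold whose size is below \(\alpha \). This test is one convenient choice and is not claimed to attain \(\beta _{n,\alpha }\) exactly, so the computed error is an upper bound on \(\beta _{n,\alpha }\).

```python
import math


def log_binom_pmf(n, k, p):
    """Natural log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def kl_bits(p, q):
    """D_KL(Bernoulli(p) || Bernoulli(q)) in bits."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))


def log2_beta(n, p, q, alpha):
    """log2 of the false negative error of the test 'accept H0 iff K <= t', where
    t is the smallest threshold with size Pr_P(K > t) < alpha.
    Assumes p < q, so that the likelihood ratio P_n/Q_n decreases in K."""
    pmf_p = [math.exp(log_binom_pmf(n, k, p)) for k in range(n + 1)]
    tail, t = 0.0, n  # invariant: tail = Pr_P(K > t) < alpha
    while t > 0 and tail + pmf_p[t] < alpha:
        tail += pmf_p[t]
        t -= 1
    logs = [log_binom_pmf(n, k, q) for k in range(t + 1)]
    m = max(logs)  # log-sum-exp: Pr_Q(K <= t) can be astronomically small
    return (m + math.log(sum(math.exp(x - m) for x in logs))) / math.log(2)


p, q, alpha = 0.3, 0.5, 0.05
print("D =", kl_bits(p, q))
for n in (100, 1000, 4000):
    print(n, -log2_beta(n, p, q, alpha) / n)  # empirical error exponent
```

In this example the empirical exponent stays below D and the gap closes only slowly as n grows, which is a reminder that the lemma is an asymptotic statement.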

Appendix C: Proof of Lemma 6

Proof

Without loss of generality, assume that \(s=0\). Note that

$$\begin{aligned} \sum _{r=1}^{N-1} \Pr (W_0< W_r)\ge & {} \Pr ( \exists \; 1 \le k \le (N-1) \; s.t. \; W_0 < W_k ) \\= & {} \Pr \Bigg (\max _{0 \le i \le N-1} W_i \ne W_0\Bigg ) \\\ge & {} \Pr ( W_0 < W_l), \quad \forall \; l=1,\ldots ,N-1. \end{aligned}$$

For all \(k=1,\ldots ,N-1\), we have \(\Pr (W_0< W_k) = \Pr (W_0-W_k < 0)\) and

$$\begin{aligned} W_0-W_k \sim {\mathcal {N}}(\mu _0-\mu _k, \frac{1}{n}(\sigma _{00} +\sigma _{kk}-2\sigma _{0k})), \end{aligned}$$

i.e.,

$$\begin{aligned} R_k := \dfrac{\sqrt{n}}{\sqrt{\sigma _{00}+\sigma _{kk}-2 \sigma _{0k}}}(W_0-W_k) \sim {\mathcal {N}}(\sqrt{n}\delta _k,1). \end{aligned}$$

where \(\delta _k := (\mu _0-\mu _k)/\sqrt{\sigma _{00}+\sigma _{kk}-2\sigma _{0k}}\). Hence,

$$\begin{aligned} \Pr (W_0-W_k< 0) = \Pr (R_k < 0)= \varPhi (-\sqrt{n} \delta _k). \end{aligned}$$
(34)
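Eq. (34) can be checked by simulation. The sketch below assumes \(\delta _k = (\mu _0-\mu _k)/\sqrt{\sigma _{00}+\sigma _{kk}-2\sigma _{0k}}\), which is what makes the mean of \(R_k\) equal to \(\sqrt{n}\delta _k\). It samples \((W_0, W_k)\) from the bivariate normal distribution with mean \((\mu _0,\mu _k)\) and covariance matrix \(\frac{1}{n}\begin{pmatrix}\sigma _{00}&\sigma _{0k}\\ \sigma _{0k}&\sigma _{kk}\end{pmatrix}\) and compares the empirical frequency of \(W_0<W_k\) with \(\varPhi (-\sqrt{n}\delta _k)\). All numerical values and function names are hypothetical.

```python
import math
import random


def std_normal_cdf(x):
    """Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def eq34_prediction(mu0, muk, s00, skk, s0k, n):
    """Phi(-sqrt(n) * delta_k), with delta_k as assumed above."""
    delta_k = (mu0 - muk) / math.sqrt(s00 + skk - 2 * s0k)
    return std_normal_cdf(-math.sqrt(n) * delta_k)


def simulate(mu0, muk, s00, skk, s0k, n, trials=100_000, seed=1):
    """Monte Carlo estimate of Pr(W_0 < W_k) for (W_0, W_k) ~ N((mu0, muk), Sigma / n)."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix Sigma / n
    l11 = math.sqrt(s00 / n)
    l21 = (s0k / n) / l11
    l22 = math.sqrt(skk / n - l21 ** 2)
    hits = 0
    for _ in range(trials):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w0 = mu0 + l11 * z1
        wk = muk + l21 * z1 + l22 * z2
        hits += w0 < wk
    return hits / trials


params = dict(mu0=0.30, muk=0.28, s00=0.21, skk=0.20, s0k=0.05, n=200)
print(simulate(**params), eq34_prediction(**params))
```

The two printed numbers should agree to within Monte Carlo error (a standard error of roughly 0.0015 for 100,000 trials).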

Thus,

$$\begin{aligned} \sum _{r=1}^{N-1} \varPhi (-\sqrt{n} \delta _r)\ge & {} \Pr ( \exists \; 1 \le k \le (N-1) \; s.t. \; W_0 < W_k ) \\\ge & {} \varPhi (-\sqrt{n}\delta _l), \quad \forall \; l=1,\ldots ,N-1. \end{aligned}$$

which gives

$$\begin{aligned} \sum _{r=1}^{N-1} \varPhi (-\sqrt{n} \delta _r)\ge & {} \Pr ( \exists \; 1 \le k \le (N-1) \; s.t. \; W_0 < W_k ) \\\ge & {} \max _{1 \le l \le N-1} \varPhi (-\sqrt{n}\delta _l) = \varPhi (-\sqrt{n}\delta ) \end{aligned}$$

Here \(\delta := \min _{1 \le l \le N-1} \delta _l = \delta _j\), since \(\varPhi \) is increasing. We now show that the ratio of the two outer bounds in the above inequality tends to 1 as \(n \rightarrow \infty \); thus, for large n, the two bounds are close and the middle term can be approximated by the right-hand bound. Applying L’Hospital’s Rule (differentiating numerator and denominator with respect to n), we obtain

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{ \varPhi (-\sqrt{n}\delta )}{\sum _{r=1}^{N-1} \varPhi (-\sqrt{n} \delta _r)}= & {} \lim _{n \rightarrow \infty } \frac{\delta e^{-\frac{n \delta ^2}{2}}}{\sum _{r=1}^{N-1} \delta _r e^{-\frac{n \delta _r^2}{2}}} \\= & {} 1 \end{aligned}$$

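because \(\delta = \delta _j < \delta _k\) for all \(k \ne 0,j\), so each term with \(r \ne j\) in the denominator is exponentially smaller than the \(r=j\) term. Therefore,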

$$\begin{aligned} \Pr \Bigg (\max _{0 \le i \le N-1} W_i \ne W_0\Bigg ) = \Pr (\exists \; 1 \le k \le (N-1) \; s.t. \; W_0 < W_k ) \approx \varPhi (-\sqrt{n}\delta ). \end{aligned}$$

\(\square \)
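The convergence of the ratio of the two bounds can also be seen numerically. The sketch below, with hypothetical values \(\delta _1=0.1\), \(\delta _2=0.15\), \(\delta _3=0.2\) (so that the minimum is attained uniquely), evaluates \(\varPhi (-\sqrt{n}\delta )/\sum _r \varPhi (-\sqrt{n}\delta _r)\) using \(\varPhi (x) = \tfrac{1}{2}\operatorname{erfc}(-x/\sqrt{2})\). The function name `bound_ratio` is an assumption.

```python
import math


def std_normal_cdf(x):
    """Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def bound_ratio(n, deltas):
    """Lower bound Phi(-sqrt(n)*delta) divided by the union bound
    sum_r Phi(-sqrt(n)*delta_r), where delta = min(deltas)."""
    root_n = math.sqrt(n)
    lower = std_normal_cdf(-root_n * min(deltas))
    upper = sum(std_normal_cdf(-root_n * d) for d in deltas)
    return lower / upper


deltas = [0.1, 0.15, 0.2]  # hypothetical delta_1, delta_2, delta_3 with a unique minimum
for n in (1, 10, 100, 1000, 2000):
    print(n, bound_ratio(n, deltas))
```

The uniqueness of the minimum matters: if m of the \(\delta _r\) were tied at the minimum, the same ratio would tend to \(1/m\) instead of 1, and \(\varPhi (-\sqrt{n}\delta )\) alone would underestimate the error probability.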

About this article

Cite this article

Paul, G., Ray, S. On data complexity of distinguishing attacks versus message recovery attacks on stream ciphers. Des. Codes Cryptogr. 86, 1211–1247 (2018). https://doi.org/10.1007/s10623-017-0391-z

  • DOI: https://doi.org/10.1007/s10623-017-0391-z
