Revisiting Mutual Information Analysis: Multidimensionality, Neural Estimation and Optimality Proofs

  • Research Article
  • Journal of Cryptology

Abstract

Recent works have shown how Mutual Information Neural Estimation (MINE) can be applied to side-channel analysis in order to evaluate the amount of leakage of an electronic device. One of the main advantages of MINE over classical estimation techniques is that it enables the computation of mutual information between high-dimensional traces and a secret, which is relevant for leakage assessment. However, optimally exploiting this information in an attack context in order to retrieve a secret remains a non-trivial task, especially when a profiling phase of the target is not allowed. Within this context, the purpose of this paper is to address this problem based on a simple idea: side-channel traces contain multiple leakage sources, and optimal attacks should necessarily exploit most or all of them. To this aim, a new mathematical framework, designed to bridge classical Mutual Information Analysis (MIA) and the multidimensional aspect of neural-based estimators, is proposed. One of its goals is to provide rigorous proofs consolidating the mathematical basis behind MIA, thus alleviating inconsistencies found in the state of the art. This framework allows the derivation of a new attack called Neural Estimated Mutual Information Analysis (NEMIA). To the best of our knowledge, it is the first unsupervised attack able to benefit from both the power of deep learning techniques and the valuable theoretical properties of mutual information. Simulations and experiments conducted in this paper suggest that NEMIA performs better than classical and more recent deep-learning-based unsupervised side-channel attacks, especially in low-information contexts.

Notes

  1. Due to MI estimator limitations, \({\mathcal {D}}(k)\) is often replaced in practice by \(\max _i {\mathcal {I}} (f(Z_k), L[i])\), where L[i] represents the i-th sample of the trace. This does not affect the theory described in this section so we decided to keep it as described in Eq. 5 for the sake of simplicity. More details are provided in Sect. 4.2.

  2. Note that other criteria, such as the distance to the mean of the wrong hypotheses, could also have been used without modifying the analysis, as discussed in Remark 1.
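The per-sample distinguisher of Note 1 can be sketched as follows. This is a minimal plug-in (histogram) MI estimator over discretized traces, not the neural estimator used in the paper, and all function names are illustrative:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in (histogram) estimate of I(X; Y) in bits for discrete samples."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = {a: np.mean(x == a) for a in set(x.tolist())}
    py = {b: np.mean(y == b) for b in set(y.tolist())}
    return sum((c / n) * np.log2((c / n) / (px[a] * py[b]))
               for (a, b), c in joint.items())

def distinguisher(traces, f_z):
    """max_i I(f(Z_k); L[i]) for each key hypothesis k, mirroring Note 1."""
    return {k: max(mutual_information(vals, traces[:, i])
                   for i in range(traces.shape[1]))
            for k, vals in f_z.items()}
```

Taking the maximum over samples sidesteps the difficulty of estimating MI against a full multidimensional trace with classical estimators, at the cost of ignoring joint leakage across samples.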

References

  1. M.I. Belghazi, A. Baratin, S. Rajeswar, S. Ozair, Y. Bengio, A. Courville, R.D. Hjelm, MINE: mutual information neural estimation (2018)

  2. E. Brier, C. Clavier, F. Olivier, Correlation power analysis with a leakage model, in Marc Joye and Jean-Jacques Quisquater, editors, Cryptographic Hardware and Embedded Systems - CHES 2004 (Springer, Berlin, Heidelberg, 2004)

  3. L. Batina, B. Gierlichs, E. Prouff, M. Rivain, F.-X. Standaert, N. Veyrat-Charvillon, Mutual information analysis: A comprehensive study. J. Cryptol. 24(2), 269–291 (2011)

  4. R. Benadjila, E. Prouff, R. Strullu, E. Cagli, C. Dumas, Study of deep learning techniques for side-channel analysis and introduction to the ASCAD database (ANSSI, France & CEA, LETI, MINATEC Campus, France, 2018)

  5. N.J. Beaudry, R. Renner, An intuitive proof of the data processing inequality (2012)

  6. C. Chan, A. Al-Bashabsheh, H.P. Huang, M. Lim, D.S. Handason Tam, C. Zhao, Neural entropic estimation: A faster path to mutual information estimation (2019)

  7. E. Cagli, C. Dumas, E. Prouff, Convolutional neural networks with data augmentation against jitter-based countermeasures, in Wieland Fischer and Naofumi Homma, editors, Cryptographic Hardware and Embedded Systems—CHES 2017 (Springer International Publishing, Cham, 2017), pp. 45–68

  8. S. Chari, C.S. Jutla, J.R. Rao, P. Rohatgi, Towards sound approaches to counteract power-analysis attacks, in Michael Wiener, editor, Advances in Cryptology—CRYPTO’ 99 (Springer, Berlin, Heidelberg, 1999), pp. 398–412

  9. K. Choi, S. Lee, Regularized mutual information neural estimation (2020)

  10. V. Cristiani, M. Lecomte, T. Hiscock, A bit-level approach to side channel based disassembling, in CARDIS 2019 (Prague, Czech Republic, 2019)

  11. V. Cristiani, M. Lecomte, P. Maurine, Leakage assessment through neural estimation of the mutual information, in International Conference on Applied Cryptography and Network Security (ACNS), volume 12418 of Lecture Notes in Computer Science (Rome, Italy, 2020), pp. 144–162

  12. J. Doget, E. Prouff, M. Rivain, F.-X. Standaert, Univariate side channel attacks and leakage modeling. J. Cryptogr. Eng. 1, 123–144 (2012)

  13. M. Abdelaziz Elaabid, S. Guilley, Portability of templates. J. Cryptogr. Eng. 2, 63–74 (2012)

  14. B. Gierlichs, L. Batina, P. Tuyls, B. Preneel, Mutual information analysis, in Elisabeth Oswald, Pankaj Rohatgi, editors, Cryptographic Hardware and Embedded Systems—CHES 2008 (Springer, Berlin, Heidelberg, 2008)

  15. D.P. Kingma, J. Ba, Adam: A method for stochastic optimization (2014)

  16. P. Kocher, J. Jaffe, B. Jun, Differential power analysis, in Annual International Cryptology Conference (1999)

  17. P.C. Kocher, Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems, in Advances in Cryptology—CRYPTO ’96, 16th Annual International Cryptology Conference, Santa Barbara, California, USA, August 18–22, 1996, Proceedings, volume 1109 of Lecture Notes in Computer Science (Springer, 1996), pp. 104–113

  18. X. Lin, I. Sur, S.A. Nastase, A. Divakaran, U. Hasson, M.R. Amer, Data-efficient mutual information neural estimator (2019)

  19. L. Masure, V. Cristiani, M. Lecomte, F.-X. Standaert, Don’t learn what you already know: Scheme-aware modeling for profiling side-channel analysis against masking. Cryptology ePrint Archive, Paper 2022/493 (2022). https://eprint.iacr.org/2022/493

  20. L. Masure, C. Dumas, E. Prouff, A comprehensive study of deep learning for side-channel analysis. IACR Trans. Cryptograph. Hardware Embedded Syst. 2020 (2019)

  21. A. Moradi, T. Eisenbarth, A. Poschmann, C. Rolfes, C. Paar, M.T. Manzuri, M. Salmasizadeh, Information leakage of flip-flops in DPA-resistant logic styles. IACR Cryptology ePrint Archive 2008, 188 (2008)

  22. T.S. Messerges, Using second-order power analysis to attack DPA-resistant software, in Çetin K. Koç and Christof Paar, editors, Cryptographic Hardware and Embedded Systems—CHES 2000 (Springer, Berlin, Heidelberg, 2000), pp. 238–251

  23. H. Maghrebi, T. Portigliatti, E. Prouff, Breaking cryptographic implementations using deep learning techniques, in Claude Carlet, M. Anwar Hasan, and Vishal Saraswat, editors, Security, Privacy, and Applied Cryptography Engineering (Springer International Publishing, Cham, 2016), pp. 3–26

  24. C. Percival, Cache missing for fun and profit, in Proceeding of BSDCan 2005 (2005)

  25. E. Prouff, M. Rivain, Theoretical and practical aspects of mutual information based side channel analysis, in Michel Abdalla, David Pointcheval, Pierre-Alain Fouque, and Damien Vergnaud, editors, Applied Cryptography and Network Security (Springer, Berlin, Heidelberg, 2009), pp. 499–518

  26. E. Prouff, M. Rivain, Masking against side-channel attacks: A formal security proof, in Thomas Johansson and Phong Q. Nguyen, editors, Advances in Cryptology—EUROCRYPT 2013 (Springer, Berlin, Heidelberg, 2013), pp. 142–159

  27. E. Prouff, M. Rivain, R. Bevan, Statistical analysis of second order differential power analysis. IEEE Trans. Comput. 58(6), 799–811 (2009)

  28. J.-J. Quisquater, D. Samyde, Electromagnetic analysis (EMA): measures and counter-measures for smart cards, in Isabelle Attali and Thomas Jensen, editors, Smart Card Programming and Security (Springer, Berlin, Heidelberg, 2001), pp. 200–210

  29. O. Reparaz, B. Gierlichs, I. Verbauwhede, Generic DPA attacks: curse or blessing? in Emmanuel Prouff, editor, Constructive Side-Channel Analysis and Secure Design (Springer International Publishing, Cham, 2014), pp. 98–111

  30. O. Reparaz, B. Gierlichs, I. Verbauwhede, A note on the use of margins to compare distinguishers, in Emmanuel Prouff, editor, Constructive Side-Channel Analysis and Secure Design (Springer International Publishing, Cham, 2014), pp. 1–8

  31. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)

  32. A. Schaub, E. Schneider, A. Hollender, V. Calasans, L. Jolie, R. Touillon, A. Heuser, S. Guilley, O. Rioul, Attacking suggest boxes in web applications over HTTPS using side-channel stochastic algorithms, in Lecture Notes in Computer Science, vol. 8924, pp. 116–130 (2014)

  33. B. Timon, Non-profiled deep learning-based side-channel attacks with sensitivity analysis. IACR Trans. Cryptograph. Hardware Embedded Syst. 2019(2), 107–131 (2019)

  34. N. Veyrat-Charvillon, F.-X. Standaert, Mutual information analysis: How, when and why? In Christophe Clavier and Kris Gaj, editors, Cryptographic Hardware and Embedded Systems—CHES 2009 (Springer, Berlin, Heidelberg, 2009), pp. 429–443

  35. C. Whitnall, E. Oswald, A comprehensive evaluation of mutual information analysis using a fair evaluation framework. In Phillip Rogaway, editor, Advances in Cryptology—CRYPTO 2011 (Springer, Berlin, Heidelberg, 2011), pp. 316–334

  36. C. Whitnall, E. Oswald, F.-X. Standaert, The myth of generic DPA...and the magic of learning, in Josh Benaloh, editor, Topics in Cryptology—CT-RSA 2014 (Springer International Publishing, Cham, 2014), pp. 183–205

Author information

Corresponding author

Correspondence to Valence Cristiani.

Additional information

Communicated by François-Xavier Standaert.

Appendices

A Proof of Lemma 1

Lemma 1

Let f: \({\mathcal {Z}} \rightarrow {\mathbb {R}}^n\) be any function. For any leakage model \(\varphi \): \({\mathcal {Z}} \rightarrow {\mathbb {R}}^n\), there exists a decomposition of f into \(f = f_2 \circ f_1\), with \(f_1: {\mathcal {Z}} \rightarrow {\mathbb {N}}\), \(f_2: {\mathbb {N}} \rightarrow {\mathbb {R}}^n\), satisfying the following two properties:

  1. \(\exists \,f_3: {{\,\textrm{Im}\,}}f_1 \rightarrow {\mathbb {R}}^n \text { such that } f_3 \circ f_1 = \varphi \)

  2. \(\forall z \in {\mathcal {Z}}, f_2\vert _{f_1 \big (\varphi ^{-1}(\{\varphi (z)\})\big )} \) is bijective with reciprocal \(f_2^{-1}\vert _{f_2\circ f_1\big (\varphi ^{-1}(\{\varphi (z)\}) \big )}\)

Proof

Let us define a partition \({\mathcal {Z}} = \sqcup _{i=1}^n P_i\), where two elements \(z_1, z_2 \in {\mathcal {Z}}\) are in the same \(P_i\) if and only if:

  • \(\varphi (z_1) = \varphi (z_2)\)

  • \(f(z_1) = f(z_2)\)

Then, one may define \(f_1\) as \(f_1(z) = i, \forall z \in P_i\). Since \(f_1\) only collides for elements that already collide through \(\varphi \), there exists \(f_3\) such that \(f_3 \circ f_1 = \varphi \). As f is constant on \(P_i\), let us denote by \(v_i\) its output on elements of \(P_i\). Then \(f_2\) can be defined as \(f_2(i) = v_i\) so that \(f_2 \circ f_1 = f\). Now let us prove 2). Let \(z \in {\mathcal {Z}}\) and \(a,b \in f_1(\varphi ^{-1}(\{\varphi (z)\}))\) be such that \(f_2(a) = f_2(b)\). There exist \(z_a\) and \(z_b\) such that \(a = f_1(z_a)\) and \(b=f_1(z_b)\) with \(\varphi (z_a) = \varphi (z_b) = \varphi (z)\). So:

  • \(\varphi (z_a) = \varphi (z_b)\)

  • \(f_2(f_1(z_a)) = f_2(f_1(z_b)) \iff f(z_a) = f(z_b)\)

which means that \(z_a\) and \(z_b\) are in the same \(P_i\) and thus collide through \(f_1\). So \(a = b\), which proves that \(f_2\vert _{f_1(\varphi ^{-1}(\{\varphi (z)\}))}\) is injective. Then, restricting its codomain to its image, this function is bijective with reciprocal function: \(f_2^{-1}\vert _{f_2\circ f_1(\varphi ^{-1}(\{\varphi (z)\}))}\). \(\square \)
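The constructive proof above translates directly into code. The following sketch (hypothetical helper names, finite domain assumed) builds \(f_1\), \(f_2\), \(f_3\) by labeling each class of the partition induced jointly by \(\varphi \) and f:

```python
def decompose(f, phi, domain):
    """Construct f1, f2, f3 as in the proof of Lemma 1: index each class
    of the partition induced jointly by phi and f with an integer."""
    labels, f2_table, f3_table = {}, {}, {}
    for z in domain:
        key = (phi(z), f(z))
        if key not in labels:
            i = len(labels)
            labels[key] = i
            f2_table[i] = f(z)    # f is constant on each class P_i
            f3_table[i] = phi(z)  # phi is constant on each class P_i too
    f1 = lambda z: labels[(phi(z), f(z))]
    f2 = lambda i: f2_table[i]
    f3 = lambda i: f3_table[i]
    return f1, f2, f3
```

By construction \(f_2 \circ f_1 = f\) and \(f_3 \circ f_1 = \varphi \), matching both properties of the lemma.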

B Proof of Corollary 1

Definition 1

A function f is said to be wider than g if there exists another function h such that: \(h \circ f = g\).

Corollary 1

Let L be defined as in (47). Then, for any function \({\bar{h}}\) wider than \(\text {HW}\), \({\mathcal {S}}_{\text {HW}} \ge {\mathcal {S}}_{{\bar{h}}}\).

Proof

There exists h such that \(h \circ {\bar{h}} = \text {HW}\). So:

$$\begin{aligned} \begin{aligned} {\mathcal {S}}_{\text {HW}}&= {\mathcal {I}} \big (\text {HW}(Z_{k^*}), L \big ) - \max _{k\ne k^*} \big [ {\mathcal {I}} \big (\text {HW}(Z_k), L \big )\big ] \\&= {\mathcal {I}} \big (h \circ {\bar{h}}(Z_{k^*}), L \big ) - \max _{k\ne k^*} \big [ {\mathcal {I}} \big (h \circ {\bar{h}}(Z_k), L \big )\big ] \end{aligned} \end{aligned}$$
(81)

Since removing h in the second term can only increase the information:

$$\begin{aligned} {\mathcal {S}}_{\text {HW}} \ge {\mathcal {I}} \big (h \circ {\bar{h}}(Z_{k^*}), L \big ) - \max _{k\ne k^*} \big [ {\mathcal {I}} \big ({\bar{h}}(Z_k), L \big )\big ] \end{aligned}$$
(82)

By Theorem 2, \(\text {HW}\) maximizes over g the quantity \({\mathcal {I}} \big (g(Z_{k^*}), L \big )\), so removing h in the first term cannot increase the information:

$$\begin{aligned} \begin{aligned} {\mathcal {S}}_{\text {HW}}&\ge {\mathcal {I}} \big ({\bar{h}}(Z_{k^*}), L \big ) - \max _{k\ne k^*} \big [ {\mathcal {I}} \big ({\bar{h}}(Z_k), L \big )\big ] \\ {\mathcal {S}}_{\text {HW}}&\ge {\mathcal {S}}_{{\bar{h}}} \end{aligned} \end{aligned}$$
(83)

\(\square \)

C Complementary Material on the Entropy

Lemma 2

Let A and B be two discrete random variables. Let f: \({\mathcal {A}} \rightarrow {\mathbb {R}}^n\) be any function. Then:

$$\begin{aligned} {\mathcal {H}}\big (f(A) \mid B\big ) \le {\mathcal {H}}(A \mid B) \end{aligned}$$
(84)

Proof

The data processing inequality [5] ensures that applying f to a random variable cannot increase its mutual information with any other variable, and since \({\mathcal {I}}(X,X \mid B) = {\mathcal {H}}(X \mid B)\) for any variable X:

$$\begin{aligned} \begin{aligned} {\mathcal {I}}\big (f(A),f(A) \mid B\big )&\le {\mathcal {I}}(A,A \mid B) \\ {\mathcal {H}}\big (f(A) \mid B\big )&\le {\mathcal {H}}(A \mid B) \end{aligned} \end{aligned}$$
(85)

\(\square \)

Lemma 3

Let A and B be two discrete random variables. Let f: \({\mathcal {A}} \rightarrow {\mathbb {R}}^n\) be any function. Then:

$$\begin{aligned} {\mathcal {H}}\big (A \mid f(B)\big ) \ge {\mathcal {H}}(A \mid B) \end{aligned}$$
(86)

Proof

Again, using the data processing inequality [5]:

$$\begin{aligned} \begin{aligned} {\mathcal {I}}\big (A, f(B)\big )&\le {\mathcal {I}}(A, B) \\ {\mathcal {H}}(A) - {\mathcal {H}}\big (A \mid f(B)\big )&\le {\mathcal {H}}(A) - {\mathcal {H}}(A \mid B)\\ {\mathcal {H}}\big (A \mid f(B)\big )&\ge {\mathcal {H}}(A \mid B) \end{aligned} \end{aligned}$$
(87)

\(\square \)
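Both lemmas can be checked numerically on a small joint distribution. The sketch below (illustrative names, exact arithmetic over an explicit p(a, b)) verifies the two conditional-entropy inequalities:

```python
from collections import defaultdict
from math import log2

def cond_entropy(joint):
    """H(X | Y) in bits from a dict {(x, y): p(x, y)}."""
    py = defaultdict(float)
    for (x, y), p in joint.items():
        py[y] += p
    return -sum(p * log2(p / py[y]) for (x, y), p in joint.items() if p > 0)

def push(joint, g, coord):
    """Image distribution after applying g to coordinate 0 or 1."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(g(x), y) if coord == 0 else (x, g(y))] += p
    return dict(out)

# A uniform on {0,1,2,3}; B correlated with the parity of A
joint = {(a, b): 0.25 * (0.8 if b == a % 2 else 0.2)
         for a in range(4) for b in range(2)}

f = lambda a: a % 2              # non-injective: merges symbols of A
g = lambda b: 0                  # constant: destroys all information in B

h_a_b  = cond_entropy(joint)                 # H(A | B)
h_fa_b = cond_entropy(push(joint, f, 0))     # H(f(A) | B)  (Lemma 2)
h_a_gb = cond_entropy(push(joint, g, 1))     # H(A | g(B))  (Lemma 3)
```

Merging symbols of A can only lower the conditional entropy, while degrading the conditioning variable B can only raise it.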

Fig. 6 Network architecture for MINE

Fig. 7 Network architecture for the classifiers (Supervised and DDLA)

D Network Architectures

Figures 6 and 7 show the network architectures used for the experiments performed respectively with MINE and with the classifiers (supervised and DDLA). For fairness, we tried to keep the two architectures as close as possible. The optimizer used in both cases is Adam [15] with default parameters. The loss function used for the classifiers is the categorical cross-entropy. Note that when using convolutional layers with MINE, they should only be applied to the trace variable and not to \(f(Z_k)\), for which a convolution would not make sense.
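The objective MINE optimizes, independently of the architecture in Fig. 6, is the Donsker–Varadhan lower bound on mutual information. The following minimal numpy illustration evaluates that bound with a fixed, untrained statistics function T (a deliberate simplification; the paper's networks and hyperparameters are not reproduced here):

```python
import numpy as np

def dv_lower_bound(T, x, z, rng):
    """Donsker-Varadhan bound E_joint[T] - log E_product[exp(T)].
    MINE trains a network T to maximize this; here T stays fixed."""
    joint_term = T(x, z).mean()
    z_shuffled = rng.permutation(z)    # samples from the product of marginals
    marginal_term = np.log(np.exp(T(x, z_shuffled)).mean())
    return joint_term - marginal_term

rng = np.random.default_rng(0)
n, rho = 200_000, 0.9
x = rng.standard_normal(n)
z = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
true_mi = -0.5 * np.log(1 - rho**2)    # nats, for jointly Gaussian (X, Z)

T = lambda a, b: 0.4 * a * b           # a crude, untrained "statistics network"
estimate = dv_lower_bound(T, x, z, rng)
```

Even this crude T yields a strictly positive lower bound on I(X; Z); training T, as MINE does, tightens the bound toward the true value.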

About this article

Cite this article

Cristiani, V., Lecomte, M. & Maurine, P. Revisiting Mutual Information Analysis: Multidimensionality, Neural Estimation and Optimality Proofs. J Cryptol 36, 38 (2023). https://doi.org/10.1007/s00145-023-09476-0
