
NNBits: Bit Profiling with a Deep Learning Ensemble Based Distinguisher

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13871)


We introduce a deep learning ensemble (NNBits) as a tool for bit-profiling and evaluation of cryptographic (pseudo) random bit sequences. On the one hand, we show how to use the NNBits ensemble to explain parts of the seminal work of Gohr [16]: Gohr’s depth-1 neural distinguisher reaches a test accuracy of 78.3% in round 6 for SPECK32/64 [3]. Using the bit-level information provided by NNBits we can partially explain the accuracy obtained by Gohr (78.1% vs. 78.3%). This is achieved by constructing a distinguisher which only uses the information about correct or incorrect predictions on the single-bit level and which achieves 78.1% accuracy. We also generalize two heuristic aspects in the construction of Gohr’s network: i) the particular input structure, which reflects expert knowledge of SPECK32/64, as well as ii) the cyclic learning rate.

On the other hand, we extend Gohr’s work as a statistical test on avalanche datasets of SPECK32/64, SPECK64/128, SPECK96/144, SPECK128/128, and AES-128. In combination with the NNBits ensemble we use the extended version of Gohr’s neural network to draw a comparison with the NIST Statistical Test Suite (NIST STS) on the previously mentioned avalanche datasets. We compare NNBits in conjunction with Gohr’s generalized network to the NIST STS and conclude that the NNBits ensemble performs at least as well as the NIST STS. Furthermore, we demonstrate cryptanalytic insights that result from bit-level profiling with NNBits: for example, we show how to infer the strong input difference \((0x0040, 0x0000)\) for SPECK32/64 or infer a signature of the multiplication in the Galois field of AES-128.


  • Cryptanalysis
  • Evaluation tools
  • Block cipher
  • Distinguisher
  • Avalanche dataset
  • Bit-profiling
  • Neural networks
  • Random number generator



  1. Note that this statement has limited practical implications: even if enough data, sufficient representational power of the network, and sufficient computational resources to train the network are given, the training itself may be an NP-hard problem [26].

  2. This limit has recently been surpassed by human cryptanalysis in [8], giving machine learning a new threshold to overcome.


  1. Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016), pp. 265–283 (2016)


  2. Géron, A.: Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media (2019).

  3. Bacuieti, N.N., Batina, L., Picek, S.: Deep neural networks aiding cryptanalysis: a case study of the Speck distinguisher. Cryptology ePrint Archive, pp. 1–24 (2022).

  4. Baksi, A., Breier, J., Chen, Y., Dong, X.: Machine learning assisted differential distinguishers for lightweight ciphers. In: 2021 Design, Automation Test in Europe Conference Exhibition (DATE), pp. 176–181 (2021).

  5. Beaulieu, R., Shors, D., Smith, J., Treatman-Clark, S., Weeks, B., Wingers, L.: The SIMON and SPECK families of lightweight block ciphers. National Security Agency (NSA), 9800 Savage Road, Fort Meade, MD 20755, USA (2013)


  6. Bellini, E., Rossi, M.: Performance comparison between deep learning-based and conventional cryptographic distinguishers. In: Arai, K. (ed.) Intelligent Computing. LNNS, vol. 285, pp. 681–701. Springer, Cham (2021).


  7. Benamira, A., Gerault, D., Peyrin, T., Tan, Q.Q.: A deeper look at machine learning-based cryptanalysis. In: Canteaut, A., Standaert, F.-X. (eds.) EUROCRYPT 2021. LNCS, vol. 12696, pp. 805–835. Springer, Cham (2021).


  8. Biryukov, A., dos Santos, L.C., Teh, J.S., Udovenko, A., Velichkov, V.: Meet-in-the-filter and dynamic counting with applications to speck. Cryptology ePrint Archive (2022)


  9. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996).


  10. Brown, R.G.: DieHarder: a GNU public license random number tester. Duke University Physics Department, Durham, NC 27708-0305 (2006).

  11. Castro, J.C.H., Sierra, J.M., Seznec, A., Izquierdo, A., Ribagorda, A.: The strict avalanche criterion randomness test. Math. Comput. Simul. 68(1), 1–7 (2005).


  12. Daemen, J., Hoffert, S., Van Assche, G., Van Keer, R.: The design of Xoodoo and Xoofff. IACR Trans. Symmetric Cryptol. 2018(4), 1–38 (2018).

  13. Daemen, J., Rijmen, V.: AES proposal: Rijndael (1999).

  14. Feistel, H.: Cryptography and computer privacy. Sci. Am. 228(5), 15–23 (1973)


  15. Gohr, A.: Deep speck (2019).

  16. Gohr, A.: Improving attacks on round-reduced speck32/64 using deep learning. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019. LNCS, vol. 11693, pp. 150–179. Springer, Cham (2019).


  17. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 19. The MIT Press (2017).

  18. Gunning, D., Vorm, E., Wang, J.Y., Turek, M.: Darpa’s explainable AI (XAI) program: a retrospective. Appl. AI Lett. 2, e61 (2021).

  19. Gustafson, H., Dawson, E., Golić, J.D.: Automated statistical methods for measuring the strength of block ciphers. Stat. Comput. 7(2), 125–135 (1997)


  20. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016).

  21. Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991).


  22. Hou, Z., Ren, J., Chen, S., Fu, A.: Improve neural distinguishers of Simon and speck. Secur. Commun. Netw. 2021 (2021).

  23. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018).

  24. Langford, S.K., Hellman, M.E.: Differential-linear cryptanalysis. In: Desmedt, Y.G. (ed.) CRYPTO 1994. LNCS, vol. 839, pp. 17–25. Springer, Heidelberg (1994).


  25. L’Ecuyer, P., Simard, R.: TestU01: a software library in ANSI C for empirical testing of random number generators, software user’s guide. Département d’Informatique et Recherche opérationnelle, Université de Montréal, Montréal, Québec, Canada (2001).

  26. Livni, R., Shalev-Shwartz, S., Shamir, O.: On the computational efficiency of symmetric neural networks. Adv. Neural Inf. Process. Syst. 27, 855–863 (2014).

  27. Makridakis, S., Spiliotis, E., Assimakopoulos, V.: The M4 competition: 100,000 time series and 61 forecasting methods. Int. J. Forecast. 36(1), 54–74 (2020).


  28. Moritz, P., et al.: Ray: a distributed framework for emerging AI applications. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018), pp. 561–577 (2018)


  29. Oreshkin, B.N., Carpov, D., Chapados, N., Bengio, Y.: N-BEATS: neural basis expansion analysis for interpretable time series forecasting. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020.

  30. Reddi, S.J., Kale, S., Kumar, S.: On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237 (2019)

  31. Rukhin, A., et al.: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications. NIST (2010)


  32. Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., Lillicrap, T., Silver, D.: Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588(7839), 604–609 (2020).


  33. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings (2015).

  34. Soto, J.: Randomness testing of the advanced encryption standard candidate algorithms. NIST Interagency/Internal Report (NISTIR) (1999).

  35. Soto, J., Bassham, L.: Randomness testing of the advanced encryption standard finalist candidates. NIST Interagency/Internal Report (NISTIR) (2000).

  36. Švenda, P., Ukrop, M., Matyáš, V.: Determining cryptographic distinguishers for eStream and SHA-3 candidate functions with evolutionary circuits. In: Obaidat, M.S., Filipe, J. (eds.) ICETE 2013. CCIS, vol. 456, pp. 290–305. Springer, Heidelberg (2014).


  37. PyTorch Team: PyTorch ResNet implementation (2022).

  38. Ray Team: Ray (2022).

  39. Technology Innovation Institute (TII): Crypto-TII NNBits (2022).

  40. Virtanen, P., et al.: SciPy 1.0: fundamental algorithms for scientific computing in python. Nat. Methods 17, 261–272 (2020).


  41. Walker, J.: ENT: a pseudorandom number sequence test program. Web site (2008).

  42. Yadav, T., Kumar, M.: Differential-ML distinguisher: machine learning based generic extension for differential cryptanalysis. In: Longa, P., Ràfols, C. (eds.) LATINCRYPT 2021. LNCS, vol. 12912, pp. 191–212. Springer, Cham (2021).



Author information

Correspondence to Anna Hambitzer.

A Details for the Generalized Network Experiment

We apply the Generalized network as a distinguisher to the avalanche datasets of SPECK32/64, SPECK64/128, SPECK96/144, SPECK128/128 and AES-128 in settings that are advantageous for machine learning. Table 6 summarizes the experimental settings for each cipher. We generate X bit sequences of the length of avalanche units for the respective cipher. A randomly chosen half of the inputs X has the label \(Y=0\) and contains random data. The other half of the inputs has the label \(Y=1\) and contains avalanche units of a cipher, that is, non-random data. A Generalized network, as presented in Sect. 4.2, is trained on a subset \(X_\textrm{train}\) to predict the labels \(Y_\textrm{train}\) for 10 epochs. Subsequently, previously unseen data \(X_\textrm{test}\) is used to evaluate the accuracy A of the distinguisher.
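The dataset construction above can be sketched as follows. This is a minimal illustration, not the paper's code: the toy 16-bit "cipher" and the helper names `toy_encrypt` and `avalanche_unit` are hypothetical stand-ins for the actual SPECK/AES avalanche-unit generation.

```python
import numpy as np

def toy_encrypt(bits, key):
    """Toy keyed map on a 16-bit block: XOR key, then rotate (illustration only)."""
    return np.roll(bits ^ key, 3)

def avalanche_unit(encrypt, pt, key):
    """Concatenate C XOR C_i for every single-bit flip of the input pt.

    For an n-bit block this yields one avalanche unit of n*n bits."""
    base = encrypt(pt, key)
    rows = []
    for i in range(len(pt)):
        flipped = pt.copy()
        flipped[i] ^= 1
        rows.append(base ^ encrypt(flipped, key))
    return np.concatenate(rows)

rng = np.random.default_rng(0)
n, n_samples = 16, 64
unit_len = n * n

# Label Y=1: avalanche units of the toy cipher; label Y=0: uniform random bits.
X, Y = [], []
for _ in range(n_samples // 2):
    pt = rng.integers(0, 2, n, dtype=np.uint8)
    key = rng.integers(0, 2, n, dtype=np.uint8)
    X.append(avalanche_unit(toy_encrypt, pt, key)); Y.append(1)
    X.append(rng.integers(0, 2, unit_len, dtype=np.uint8)); Y.append(0)
X, Y = np.array(X), np.array(Y)
```

A real experiment would replace `toy_encrypt` with a round-reduced cipher and split \((X, Y)\) into \(X_\textrm{train}\) and \(X_\textrm{test}\) before training.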

Table 6 summarizes the avalanche unit bit sizes, the number of avalanche units for training \(X_\textrm{train}\) and testing \(X_\textrm{test}\), as well as the distinguisher’s accuracy A for relevant rounds. The accuracy is given as the mean and standard deviation over four runs of the previously described experiment.

Table 6. Accuracies A for distinguishing avalanche units of the respective cipher from random data. Bold marks the first round for which the distinguisher offers no advantage over a random guess.

B Details of NIST Results

Fig. 12. Randomness evaluation of rounds by NIST STS.

C Details of NNBits Results

Table 7. Summary of the number of training and testing avalanche units presented to the neural network ensemble for each cipher. The detailed training outcomes for each round are shown in Table 8.
Table 8. Detailed results for the NNBits analysis presented in Table 5. For each round r the training settings (number of epochs, number of training avalanche sequences, number of testing avalanche sequences, as well as the runtime in minutes) and the resulting test accuracy, p-value and randomness result are shown.

D Bit Profiles of AES-128

D.1 AES Round 1/10 Bit Pattern

The previous analysis of SPECK32/64 has shown a particular region of weak bits. In AES-128, however, we find repeating patterns of weak and strong bits in rounds 1 and 2 in the 128-bit sub-blocks of the avalanche unit.

Figure 13 shows details of the patterns observed after one round of AES-128. The complete avalanche unit of AES-128 consists of \(128 \times 128=16{,}384\) bits. We analyze the complete avalanche unit in blocks of 128 bits (Fig. 13a)) and can identify four recurring patterns of weak and strong bits that occur throughout the avalanche unit. For example, one of the patterns occurs in the avalanche blocks \(s=0\) to \(s=7\) (Fig. 13b)). Exemplary sections for the distributions of weak and strong bit patterns are shown in Fig. 13c).

After one round of AES-128, 96 consecutive bits of the 128 bits in each sub-block can be predicted with 100% accuracy. The remaining 32 bits (4 bytes) can be predicted with less than 100% accuracy, which can be understood as follows. The round function of the AES is such that changing one byte in the input results in differences in one column of the output after one round (with the other columns remaining undisturbed). This is a well-known fact about the AES, due in particular to the MDS property of the mixcolumns operation. For the avalanche dataset, this implies that for each sub-block of 128 bits (corresponding to one input difference bit), 4 bytes (one column) are nonzero, while the rest of the bytes are all zeroes.
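The one-column propagation described above can be checked directly with a minimal one-round AES written from the public AES specification (a self-contained sketch for illustration, not the paper's code; the S-box is derived from the GF(2^8) inverse plus affine map rather than a lookup table):

```python
def gmul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return p

def sbox(b):
    """AES S-box: GF(2^8) inverse followed by the affine map (constant 0x63)."""
    inv = next((x for x in range(1, 256) if gmul(b, x) == 1), 0) if b else 0
    s = 0
    for i in range(8):
        bit = ((inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
               ^ (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8))) & 1
        s |= (bit ^ ((0x63 >> i) & 1)) << i
    return s

def one_round(block, key):
    """Key addition + one AES round (SubBytes, ShiftRows, MixColumns)."""
    # Load 16 bytes column-major into a 4x4 state, add key, substitute.
    st = [[sbox(block[4 * c + r] ^ key[4 * c + r]) for c in range(4)]
          for r in range(4)]
    # ShiftRows: rotate row r left by r positions.
    st = [[st[r][(c + r) % 4] for c in range(4)] for r in range(4)]
    # MixColumns: multiply each column by the circulant matrix (2, 3, 1, 1).
    M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
    out = [0] * 16
    for c in range(4):
        for r in range(4):
            out[4 * c + r] = (gmul(M[r][0], st[0][c]) ^ gmul(M[r][1], st[1][c])
                              ^ gmul(M[r][2], st[2][c]) ^ gmul(M[r][3], st[3][c]))
    return out

key = list(range(16))
pt = [0] * 16
pt2 = pt.copy()
pt2[5] ^= 0x80  # flip one bit of input byte 5
diff = [a ^ b for a, b in zip(one_round(pt, key), one_round(pt2, key))]
nonzero = [i for i, d in enumerate(diff) if d]  # exactly one 4-byte column
```

By the MDS property of mixcolumns, the single differing byte after SubBytes/ShiftRows expands to a fully nonzero 4-byte column, while the other 12 bytes of the difference remain zero.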

The distribution of patterns of Fig. 13a) and Fig. 13b) is still observable after two rounds of AES (we show the equivalent 2-round patterns in the appendix Fig. 14c)). When encrypting for two rounds, each of the nonzero bytes of round 1 is sent to a different column through the shiftrows operation, and then propagated to a whole column through mixcolumns, so that after two rounds, all the bytes of the dataset are non-zero. Furthermore, there are relations between the bytes of each column: mixcolumns applies a linear transformation to a 4-byte column, and by construction, only one byte is non-zero in each column. Therefore, the resulting values are multiples (in the Galois field of AES) of a single variable, with the coefficients (2, 3, 1, 1), in an order that depends on the position of the 128-bit block in the avalanche dataset. The bytes with coefficient 1 are consistently predicted, whereas only some bits of the bytes with coefficients 2 and 3 are reliably predicted. This explains the peculiar pattern observed in the prediction, where for each group of 4 bytes, there are peaks for 2 bytes, and for some of the bits among the remaining 2 bytes.
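The byte relations described above can be reproduced with a few lines of GF(2^8) arithmetic. This is a sketch under the standard AES mixcolumns matrix; `mix_single` is a hypothetical helper applying that matrix to a column in which only one byte, v, is nonzero:

```python
def gmul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return p

# Standard AES MixColumns matrix (circulant on the coefficients 2, 3, 1, 1).
M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def mix_single(v, row):
    """MixColumns output for a column that is zero except for byte `row` = v.

    The result is the four GF(2^8) multiples {2v, 3v, v, v}, permuted
    according to which row of the column holds the nonzero byte."""
    return [gmul(M[r][row], v) for r in range(4)]
```

For instance, a nonzero byte v in row 0 yields the column (2v, v, v, 3v), matching the observation that the two bytes with coefficient 1 are consistently predicted while the multiples by 2 and 3 are only partially predicted.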

Fig. 13. NNBits ensemble analysis of the avalanche sequence of AES-128. a) The avalanche block s corresponds to bits \(128 \times s \ldots 128 \times (s+1)\). b) Four patterns occur over the total of 16,384 bits in one avalanche unit. c) Examples of the recurring byte patterns of weak and strong bits observed in round 1/10.

D.2 AES Round 2/10 Bit Pattern

See Sect. D.1 for the context of the analysis shown in Fig. 14.

Fig. 14. NNBits ensemble analysis of the avalanche sequence of AES-128. a) The avalanche block s corresponds to bits \(128 \times s \ldots 128 \times (s+1)\). b) Four patterns occur over the total of 16,384 bits in one avalanche unit. c) Examples of the recurring patterns of weak (green) and strong (orange) bits. The pattern is actually the same, but shifted, with a starting point indicated by the black arrow. (Color figure online)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hambitzer, A., Gerault, D., Huang, Y.J., Aaraj, N., Bellini, E. (2023). NNBits: Bit Profiling with a Deep Learning Ensemble Based Distinguisher. In: Rosulek, M. (ed.) Topics in Cryptology – CT-RSA 2023. Lecture Notes in Computer Science, vol 13871. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30871-0

  • Online ISBN: 978-3-031-30872-7

  • eBook Packages: Computer Science (R0)