## Abstract

The Keccak hash function won the SHA-3 competition (2008–2012) and became the NIST SHA-3 standard in 2015. In this paper, we focus on practical collision attacks against round-reduced SHA-3 and some Keccak variants. Following the framework developed by Dinur et al. at FSE 2012, where 4-round collisions were found by combining 3-round differential trails with 1-round connectors, we extend the connectors to up to three rounds and thereby achieve collision attacks on up to 6 rounds. The extension is possible thanks to the large degrees of freedom offered by the wide internal state. By linearizing the S-boxes of the first round, the problem of finding solutions for 2-round connectors is converted into that of solving a system of linear equations. When linearization is applied to the first two rounds, 3-round connectors become possible. However, because linearization quickly consumes the degrees of freedom, the connector succeeds only when the 3-round differential trails satisfy some additional conditions. We develop dedicated strategies for searching differential trails and find that such special differential trails indeed exist. In summary, we obtain the first real collisions on six instances: three round-reduced instances of SHA-3, namely 5-round SHAKE128, SHA3-224 and SHA3-256, and three instances of the Keccak contest, namely Keccak[1440, 160, 5, 160], Keccak[640, 160, 5, 160] and Keccak[1440, 160, 6, 160], improving the number of practically attacked rounds by two. We remark that this work is still far from threatening the security of the full 24-round SHA-3 family.
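Once the S-boxes are linearized, finding connector solutions ends in an ordinary linear system over GF(2). The following is a minimal Gaussian-elimination sketch in Python. It assumes each equation is encoded as a bitmask of its variables plus a right-hand-side bit; this encoding and the function names are our own illustration, not the authors' implementation.

```python
def solve_gf2(equations):
    """Solve a linear system over GF(2).

    Each equation is (mask, rhs): bit i of mask is the coefficient of x_i, rhs is 0 or 1.
    Returns one solution as an int (bit i = x_i, free variables set to 0), or None if
    the system is inconsistent.
    """
    rows = []  # (pivot, mask, rhs); invariant: no row contains another row's pivot bit
    for mask, rhs in equations:
        # Reduce the incoming equation by every existing pivot row.
        for pivot, pmask, prhs in rows:
            if (mask >> pivot) & 1:
                mask ^= pmask
                rhs ^= prhs
        if mask == 0:
            if rhs:
                return None  # the equation reduced to 0 = 1
            continue  # redundant equation
        pivot = mask.bit_length() - 1
        # Clear the new pivot from existing rows to keep the system fully reduced.
        rows = [(p, m ^ mask, r ^ rhs) if (m >> pivot) & 1 else (p, m, r) for p, m, r in rows]
        rows.append((pivot, mask, rhs))
    solution = 0
    for pivot, _, rhs in rows:
        solution |= rhs << pivot  # with all free variables at 0, x_pivot = rhs
    return solution


def satisfies(equations, x):
    """Check that the assignment x (bit i = x_i) satisfies every equation."""
    return all(bin(mask & x).count("1") % 2 == rhs for mask, rhs in equations)


# x0 ^ x1 = 1, x1 = 1, x1 ^ x2 = 0  ->  x0 = 0, x1 = 1, x2 = 1
eqs = [(0b011, 1), (0b010, 1), (0b110, 0)]
print(solve_gf2(eqs))  # 6
```

Variables left free by the elimination represent the remaining degrees of freedom; enumerating their values yields further solutions of the same system.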


## Notes

1. Our experiment shows that \(2^{28.87}\) pairs of 5-round Keccak can be evaluated per second on an NVIDIA GeForce GTX 970 graphics card.
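The reported rate turns directly into wall-clock estimates. The sketch below is our own back-of-the-envelope helper; the \(2^{40}\) workload in the example is hypothetical and not taken from the paper, and overheads such as message generation and collision checking are ignored.

```python
RATE_LOG2 = 28.87  # log2 of 5-round Keccak pairs evaluated per second on one GTX 970 (Note 1)


def seconds_for_pairs(log2_pairs, rate_log2=RATE_LOG2):
    """Estimated seconds to evaluate 2**log2_pairs pairs at 2**rate_log2 pairs per second."""
    return 2.0 ** (log2_pairs - rate_log2)


# Hypothetical workload of 2^40 pairs: about 2^11.13 s, i.e. roughly 37 minutes on one card.
print(round(seconds_for_pairs(40) / 60, 1))
```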

## References

1. J.-P. Aumasson, W. Meier. Zero-sum distinguishers for reduced Keccak-f and for the core functions of Luffa and Hamsi. *Rump session of Cryptographic Hardware and Embedded Systems, CHES 2009* (2009)
2. G. Bertoni, J. Daemen, M. Peeters, G. Van Assche. Keccak crunchy crypto collision and pre-image contest. http://keccak.noekeon.org/crunchy_contest.html
3. G. Bertoni, J. Daemen, M. Peeters, G. Van Assche. Cryptographic sponge functions. *Submission to NIST (Round 3)* (2011). http://sponge.noekeon.org/CSF-0.1.pdf
4. G. Bertoni, J. Daemen, M. Peeters, G. Van Assche. The Keccak reference, version 3.0. http://keccak.noekeon.org, January 2011
5. G. Bertoni, J. Daemen, M. Peeters, G. Van Assche. KeccakTools. http://keccak.noekeon.org/ (2015)
6. A. Canteaut, editor. *Fast Software Encryption—19th International Workshop, FSE 2012, Washington, DC, USA, March 19–21, 2012. Revised Selected Papers*, volume 7549 of *Lecture Notes in Computer Science* (Springer, 2012)
7. P.-L. Cayrel, G. Hoffmann, M. Schneider. GPU implementation of the Keccak hash function family. In *International Conference on Information Security and Assurance* (Springer, 2011), pp. 33–42
8. J. Daemen. *Cipher and Hash Function Design Strategies Based on Linear and Differential Cryptanalysis*. Ph.D. thesis, KU Leuven, March 1995
9. J. Daemen, G. Van Assche. Differential propagation analysis of Keccak. In Canteaut [6], pp. 422–441
10. I. Dinur, O. Dunkelman, A. Shamir. New attacks on Keccak-224 and Keccak-256. In Canteaut [6], pp. 442–461
11. I. Dinur, O. Dunkelman, A. Shamir. Collision attacks on up to 5 rounds of SHA-3 using generalized internal differentials. In S. Moriai, editor, *Fast Software Encryption—20th International Workshop, FSE 2013, Singapore, March 11–13, 2013. Revised Selected Papers*, volume 8424 of *Lecture Notes in Computer Science* (Springer, 2013), pp. 219–240
12. I. Dinur, O. Dunkelman, A. Shamir. Improved practical attacks on round-reduced Keccak. *J. Cryptol.* **27**(2), 183–209 (2014)
13. I. Dinur, P. Morawiecki, J. Pieprzyk, M. Srebrny, M. Straus. Cube attacks and cube-attack-like cryptanalysis on the round-reduced Keccak sponge function. In E. Oswald, M. Fischlin, editors, *Advances in Cryptology—EUROCRYPT 2015, Sofia, Bulgaria, April 26–30, 2015, Proceedings, Part I*, volume 9056 of *LNCS* (Springer, 2015), pp. 733–761
14. A. Duc, J. Guo, T. Peyrin, L. Wei. Unaligned rebound attack: application to Keccak. In Canteaut [6], pp. 402–421
15. J. Guo, J. Jean, I. Nikolic, K. Qiao, Y. Sasaki, S. M. Sim. Invariant subspace attack against Midori64 and the resistance criteria for S-box designs. *IACR Trans. Symmetric Cryptol.* **2016**(1), 33–56 (2016)
16. J. Guo, M. Liu, L. Song. Linear structures: applications to cryptanalysis of round-reduced Keccak. In J. H. Cheon, T. Takagi, editors, *Advances in Cryptology—ASIACRYPT 2016, Hanoi, Vietnam, December 4–8, 2016, Proceedings, Part I*, volume 10031 of *LNCS* (2016), pp. 249–274
17. J. Jean, I. Nikolic. Internal differential boomerangs: practical analysis of the round-reduced Keccak-f permutation. In G. Leander, editor, *Fast Software Encryption—FSE 2015, Istanbul, Turkey, March 8–11, 2015, Revised Selected Papers*, volume 9054 of *LNCS* (Springer, 2015), pp. 537–556
18. S. Kölbl, F. Mendel, T. Nad, M. Schläffer. Differential cryptanalysis of Keccak variants. In M. Stam, editor, *Cryptography and Coding—14th IMA International Conference, IMACC 2013, Oxford, UK, December 17–19, 2013. Proceedings*, volume 8308 of *Lecture Notes in Computer Science* (Springer, 2013), pp. 141–157
19. S. Mella, J. Daemen, G. Van Assche. New techniques for trail bounds and application to differential trails in Keccak. *IACR Trans. Symmetric Cryptol.* **2017**(1), 329–357 (2017)
20. G. S. Murthy. *Optimal loop unrolling for GPGPU programs*. Ph.D. thesis, The Ohio State University (2009)
21. M. Naya-Plasencia, A. Röck, W. Meier. Practical analysis of reduced-round Keccak. In D. J. Bernstein, S. Chatterjee, editors, *Progress in Cryptology—INDOCRYPT 2011—12th International Conference on Cryptology in India, Chennai, India, December 11–14, 2011. Proceedings*, volume 7107 of *Lecture Notes in Computer Science* (Springer, 2011), pp. 236–254
22. NIST. SHA-3 Competition. http://csrc.nist.gov/groups/ST/hash/sha-3/index.html, 2007–2012
23. NVIDIA. CUDA C programming guide. *Nvidia Corporation*, **120**(18) (2011)
24. K. Qiao, L. Song, M. Liu, J. Guo. New collision attacks on round-reduced Keccak. In J. Coron, J. B. Nielsen, editors, *Advances in Cryptology—EUROCRYPT 2017—36th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Paris, France, April 30–May 4, 2017, Proceedings, Part III*, volume 10212 of *Lecture Notes in Computer Science* (2017), pp. 216–243
25. G. Sevestre. Implementation of Keccak hash function in tree hashing mode on Nvidia GPU (2010)
26. L. Song, G. Liao, J. Guo. Non-full Sbox linearization: applications to collision attacks on round-reduced Keccak. In J. Katz, H. Shacham, editors, *Advances in Cryptology—CRYPTO 2017—37th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 20–24, 2017, Proceedings, Part II*, volume 10402 of *Lecture Notes in Computer Science* (Springer, 2017), pp. 428–451
27. The U.S. National Institute of Standards and Technology. SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions. Federal Information Processing Standard, FIPS 202, 5 August 2015
28. V. Volkov. Better performance at lower occupancy. In *Proceedings of the GPU Technology Conference, GTC*, volume 10, San Jose, CA (2010)

## Acknowledgements

This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its Strategic Capability Research Centres Funding Initiative, NTU under research grants M4080456 and M4082123, and Ministry of Education Singapore under Grant M4012049. Guohong Liao is partially supported by the National Natural Science Foundation of China (Grant No. 61572028). Guozhen Liu is partially supported by the State Scholarship Fund (No. 201706230141) organized by China Scholarship Council. Meicheng Liu is partially supported by the National Natural Science Foundation of China (Grant No. 61672516). Kexin Qiao and Ling Song are partially supported by the National Natural Science Foundation of China (Grant Nos. 61802399, 61802400, 61732021 and 61772519), the Youth Innovation Promotion Association CAS, and Chinese Major Program of National Cryptography Development Foundation (Grant No. MMJJ20180102).

## Additional information

This paper is based mainly on [24] and [26]. The work was done while all the authors were working at Nanyang Technological University in Singapore.

Communicated by Vincent Rijmen.

## Appendices

### A Linearizable Affine Subspaces

The 80 two-dimensional linearizable affine subspaces are listed in Table 7.
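Assuming that an affine subspace is called linearizable when χ restricted to it coincides with an affine map, the count of 80 can be reproduced by brute force over all two-dimensional affine subspaces of \(\mathbb{F}_2^5\). Because χ has algebraic degree 2, its second derivative is constant, so linearizability depends only on the underlying linear subspace and the linearizable subspaces come in full sets of 8 cosets (10 linear subspaces times 8 cosets). The Python sketch below uses our own bit-ordering convention (bit i of an integer is \(a_i\)); the function names are illustrative.

```python
from functools import reduce
from operator import xor


def chi(row):
    """Keccak chi on one 5-bit row: b_i = a_i ^ (~a_{i+1} & a_{i+2}), a_i = bit i of row."""
    a = [(row >> i) & 1 for i in range(5)]
    return sum((a[i] ^ ((a[(i + 1) % 5] ^ 1) & a[(i + 2) % 5])) << i for i in range(5))


def two_dim_subspaces():
    """All 2-dimensional linear subspaces of F_2^5 and all their cosets (affine subspaces)."""
    linear = {frozenset({0, u, v, u ^ v}) for u in range(1, 32) for v in range(u + 1, 32)}
    affine = {frozenset(a ^ w for w in W) for W in linear for a in range(32)}
    return linear, affine


def is_linearizable(subspace):
    # On a 2-dimensional affine subspace, a function is affine iff the XOR of
    # its values over the 4 points is zero.
    return reduce(xor, (chi(x) for x in subspace)) == 0


linear, affine = two_dim_subspaces()
linearizable = [S for S in affine if is_linearizable(S)]
print(len(linear), len(affine), len(linearizable))  # 155 1240 80
```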

### B GPU Implementation

### B.1 Techniques for GPU Implementation Optimization

The techniques commonly used to optimize CUDA programs include memory optimizations, execution configuration optimizations and instruction-level parallelism (ILP).

*Memory Optimizations* Registers usually have the shortest access latency of all memory types, so keeping data in registers as much as possible generally improves efficiency. However, dynamically indexed arrays cannot be stored in registers, so we declare separate variables for the 25 lanes by hand to keep them in registers. Constant memory is a type of read-only memory and is the best choice when all threads of a warp read the same memory location, so we store the 24 round constants there. When the threads in a warp read physically adjacent data, texture memory provides better performance than global memory and also reduces memory traffic. We therefore bind the input data and some frequently accessed read-only data to texture memory.

*Execution Configuration* Since resources such as registers and shared memory are limited on each graphics card, the number of threads per block affects performance: too many threads running in parallel leave too few registers and too little shared memory for each thread, while too few threads directly reduce overall performance. According to our experiments, blocks of 128 threads give the best performance.

*Instruction-Level Parallelism* As shown in [28] and in the hashcat and ccminer implementations, making adjacent instructions independent of each other gives better performance. Without changing the functionality of the program, we can reorder instructions to improve efficiency. Loop unrolling [20] is another good way to obtain ILP.

### B.2 Hardware Specification of GPU

### C Details of Differential Trail Search

### C.1 Analysis of the Starting Point of the Search

The following paragraphs describe how specific attributes of the differential trails were determined. We take the 5-round Keccak collision as an example to explain why such trails are necessary.

*Searching from Light \(\beta _3\)* Our initial goal is to find collisions for 5-round Keccak. To obtain a 5-round collision of Keccak, we need 4-round differential trails satisfying the three requirements mentioned in Sect. 5.2. However, it is difficult to meet all of them simultaneously, even though each of them can be fulfilled on its own.

The reason is as follows. Since we aim for practical attacks, \(w_2+w_3+w_{4}^d\) must be small enough, say 55; that is, the last three rounds of the trail must be light and sparse. Unfortunately, when we restrict a 3-round trail to be lightweight and extend it backward by one round, we almost always obtain a heavy state \(\alpha _2\) (usually \(\#AS(\alpha _2)>120\)) whose weight may exceed the TDF. Take Keccak-224 as an example: its \({\mathtt{TDF}} \) is 191, which indicates that \(\#\mathrm {AS}(\alpha _2)<92\) is needed, as the least weight of an active S-box is 2. Moreover, a lightweight 3-round trail satisfies Requirement (1) only occasionally, and the larger *d* is, the fewer trails satisfy it.

With these requirements in mind, we search for 4-round differential trail cores starting from light middle state differences \(\beta _3\). From each light \(\beta _3\) we search forward and backward, checking whether Requirements (1) and (2), respectively, are satisfied; once both are satisfied, we compute the brute-force weight \(w_2+w_3+w_{4}^d\), hoping it is small enough for a practical attack.

*\(\alpha _3, \alpha _4\) in CP-Kernel* The designers of Keccak show in [4] that it is not possible to construct low-weight 3-round differential trails that stay in the CP-kernel. However, 2-round differential trails in the CP-kernel are possible, as studied in [9, 14, 21].
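In standard Keccak terminology, the CP-kernel is the set of states in which every column has even parity, so that θ acts on them as the identity. The Python sketch below assumes lanes stored as w-bit integers `A[x][y]` with bit z of weight \(2^z\); the helper names are ours, not those of KeccakTools. It checks kernel membership and shows how a single active bit outside the kernel is spread by θ.

```python
W = 64  # lane size of Keccak-f[1600]; 32 for Keccak-f[800]


def rotl(v, n, w=W):
    """Rotate a w-bit lane left by n positions (bit z moves to z + n)."""
    n %= w
    return ((v << n) | (v >> (w - n))) & ((1 << w) - 1)


def column_parity(A):
    """C[x]: bit z is the parity of column (x, z); A is indexed A[x][y]."""
    return [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]


def in_cp_kernel(A):
    """True if every column has even parity, so that theta leaves A unchanged."""
    return not any(column_parity(A))


def theta(A, w=W):
    C = column_parity(A)
    D = [C[(x - 1) % 5] ^ rotl(C[(x + 1) % 5], 1, w) for x in range(5)]
    return [[A[x][y] ^ D[x] for y in range(5)] for x in range(5)]


def popcount_state(A):
    return sum(bin(lane).count("1") for column in A for lane in column)


# One active bit: outside the kernel, theta adds the two neighbouring columns (10 bits).
single = [[0] * 5 for _ in range(5)]
single[0][0] = 1
print(in_cp_kernel(single), popcount_state(theta(single)))  # False 11

# Two active bits in the same column: in the kernel, theta is the identity.
pair = [[0] * 5 for _ in range(5)]
pair[0][0] = pair[0][1] = 1
print(in_cp_kernel(pair), theta(pair) == pair)  # True True
```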

We restrict \(\alpha _3\) to the CP-kernel. If \(\rho ^{-1}\circ \pi ^{-1}(\beta _3)\) is outside the CP-kernel and sparse, say with 8 active bits, the number of active bits of \(\alpha _3=L^{-1}(\beta _3)\) increases due to the strong diffusion of \(\theta ^{-1}\) and the sparseness of \(\beta _3\). When \(\#AS(\alpha _3)>11\), the complexity of searching backward from one \(\beta _3\) exceeds \(2^{34.87}\), which is too time-consuming. On the other hand, we should also confine \(\alpha _4\) to the CP-kernel; otherwise, the requirement \(\alpha _{n_r}^d=0\) may not be satisfied. As can be seen from the lightest 3-round trail for Keccak-*f*[1600] [14], after \(\theta \) the nonzero difference bits are diffused across the state, making a 224-bit collision impossible (a 160-bit collision is still possible). Our starting point is therefore a special \(\beta _3\) which ensures that \(\alpha _3 = L^{-1}(\beta _3)\) lies in the CP-kernel and for which there exists a compatible \(\alpha _4\) in the CP-kernel. Fortunately, such \(\beta _3\)'s can be obtained with KeccakTools [5].
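As a consistency check (our own arithmetic, assuming the figure \(2^{34.87}\) comes from the rough per-S-box bound \(C_2\ge 9^{AS}\) given in Sect. C.2), the threshold of 11 active S-boxes corresponds to

\[
9^{11} = 2^{11\log_2 9} \approx 2^{11\times 3.1699} \approx 2^{34.87}, \qquad 9^{12} \approx 2^{38.04},
\]

so a single additional active S-box already pushes the backward search beyond the limit \(C_2\le 2^{35}\) used in Sect. C.2. Likewise, the forward limit \(C_1\le 2^{36}\) corresponds to \(4^{18}=2^{36}\).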

### C.2 Detailed Algorithm for Searching Differential Trails

In this section, we describe the algorithm for finding differential trails in more detail. First, light \(\beta _3\)'s, namely 2-round trail cores in the CP-kernel, are generated with KeccakTools [5] and then extended one round forward and one round backward to find suitable 4-round trail cores. Note that all extensions must be traversed. Given a \(\beta _3\), suppose there are \(C_1\) possible one-round forward extensions and \(C_2\) possible one-round backward extensions. These two numbers are mainly determined by the active S-boxes of \(\beta _3\): if the number of active S-boxes is *AS*, then roughly \(C_1\ge 4^{AS}\) and \(C_2\ge 9^{AS}\) according to the DDT. In the search for 4-round trail cores, \(C_2\) dominates the time complexity, while for 5-round trail cores of Keccak[1440, 160, 6, 160], we start from (\(\beta _3,\beta _4\)) generated by KeccakTools, and \(C_1\) is almost as large as \(C_2\).
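The per-S-box factor 4 in \(C_1\ge 4^{AS}\) can be read off the difference distribution table (DDT) of χ. A Python sketch, using our own bit-ordering convention (bit i of an integer is \(a_i\)):

```python
def chi(row):
    """Keccak chi on a 5-bit row; bit i of row is a_i."""
    a = [(row >> i) & 1 for i in range(5)]
    return sum((a[i] ^ ((a[(i + 1) % 5] ^ 1) & a[(i + 2) % 5])) << i for i in range(5))


def ddt():
    """T[din][dout] = number of x with chi(x) ^ chi(x ^ din) == dout."""
    T = [[0] * 32 for _ in range(32)]
    for din in range(32):
        for x in range(32):
            T[din][chi(x) ^ chi(x ^ din)] += 1
    return T


T = ddt()
# Number of compatible output differences for each nonzero input difference.
forward = [sum(1 for v in T[d] if v) for d in range(1, 32)]
print(min(forward), max(max(row) for row in T[1:]))  # 4 8
```

Since χ acts independently on each row, \(C_1\) is the product of these per-row counts over the active rows, which gives \(C_1\ge 4^{AS}\). Backward counts for \(C_2\) can be read from the columns of `T` in the same way.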

1. Generate \(\beta _3\) such that \(\alpha _3=L^{-1}(\beta _3)\) lies in the CP-kernel and there exists a compatible \(\alpha _4\) in the CP-kernel, using TrailCoreInKernelAtC of KeccakTools [5] with the parameter *aMaxWeight* set to 60. We obtain more than 3000 such cores.
2. For each \(\beta _3\), if \(C_1\le 2^{36}\), traverse all possible \(\alpha _4\), compute \(\beta _4\) and check whether a collision is possible for \(\beta _4\). If so, keep this \(\beta _3\) and record this forward extension; otherwise, discard this \(\beta _3\).
3. For each remaining \(\beta _3\), if \(C_2\le 2^{35}\), try all possible \(\beta _2\) compatible with \(\alpha _3=L^{-1}(\beta _3)\), and compute \(AS(\alpha _2)\), where \(\alpha _2=L^{-1}(\beta _2)\). If \(AS(\alpha _2)\le 110\), check whether the trail core (\(\beta _2, \beta _3, \beta _4\)) is practical for the collision attack.
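The quantities \(AS(\cdot)\) used in these steps only require counting active rows. The sketch below assumes states are stored as lanes `A[x][y]` with bit z of weight \(2^z\); the function names are hypothetical, and the pre-filter merely restates the bound \(C_1\ge 4^{AS}\) as a necessary condition for \(C_1\le 2^{36}\).

```python
def active_sboxes(A):
    """Count active S-boxes (rows) of a state difference A[x][y].

    The row at (y, z) is active iff some lane (x, y) has bit z set, so OR-ing the five
    lanes of plane y and counting set bits gives the number of active rows in that plane.
    """
    total = 0
    for y in range(5):
        plane = A[0][y] | A[1][y] | A[2][y] | A[3][y] | A[4][y]
        total += bin(plane).count("1")
    return total


def forward_prefilter(as_beta3, limit_log2=36):
    """Necessary condition for C1 <= 2**limit_log2, using C1 >= 4**AS (log2(4) = 2)."""
    return 2 * as_beta3 <= limit_log2


state = [[0] * 5 for _ in range(5)]
state[0][0] = 1          # row (y=0, z=0) active
state[1][0] = 1          # same row, still one active S-box
state[0][0] |= 1 << 5    # row (y=0, z=5) becomes active too
print(active_sboxes(state))  # 2
```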

To find a 5-round trail core for \({\textsc {Keccak}}[1440,160,6,160]\), we adapt the second step as follows.

For each \(\beta _3\), extend it forward by one round using KeccakFTrailExtension of KeccakTools [5] with weight up to 45. For each generated 2-round core \((\beta _3,\beta _4)\), if \(C_1\le 2^{36}\) for \(\beta _4\), traverse all possible \(\alpha _5\) and compute \(\beta _5\). Check whether there exists an \(\alpha _6\) such that \(\alpha _6^{d}=0\). If so, record the three-round core \((\beta _3,\beta _4,\beta _5)\); otherwise, discard this \(\beta _3\).
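The existence of an \(\alpha _6\) with \(\alpha _6^{d}=0\) can be decided row by row, since χ acts independently on each 5-bit row. The sketch below assumes the standard Keccak output order (the digest is the first d bits of the state, lanes ordered by \(x+5y\), bit z of a lane at offset z) and that `beta` is \(\beta _5\), the difference entering the last χ; all names are hypothetical.

```python
def chi(row):
    """Keccak chi on a 5-bit row; bit i of row is a_i."""
    a = [(row >> i) & 1 for i in range(5)]
    return sum((a[i] ^ ((a[(i + 1) % 5] ^ 1) & a[(i + 2) % 5])) << i for i in range(5))


def row_can_vanish(din, mask):
    """True if some output difference of chi for input difference din is zero on mask."""
    return any(((chi(x) ^ chi(x ^ din)) & mask) == 0 for x in range(32))


def digest_diff_can_vanish(beta, d, w=64):
    """Is there an output of chi, compatible with beta, whose first d bits are zero?"""
    for y in range(5):
        for z in range(w):
            # Positions of row (y, z) that fall into the first d output bits.
            mask = sum(1 << x for x in range(5) if (x + 5 * y) * w + z < d)
            din = sum(((beta[x][y] >> z) & 1) << x for x in range(5))
            if mask and not row_can_vanish(din, mask):
                return False
    return True


def state_with_bit(x, y, z):
    s = [[0] * 5 for _ in range(5)]
    s[x][y] = 1 << z
    return s


print(digest_diff_can_vanish(state_with_bit(0, 0, 0), 160))  # False: bit x=0 always flips
print(digest_diff_can_vanish(state_with_bit(3, 0, 0), 160))  # True
```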

### D Differential Trails

In this section, we give details of the differential trails of Keccak mentioned in Sect. 5. More precisely, we present trail cores. For example, a 4-round trail core \((\beta _2,\beta _3,\beta _4)\), consisting of three state differences, represents the set of 4-round differential trails

\[
\beta _1 \xrightarrow {\chi } \alpha _2 \xrightarrow {L} \beta _2 \xrightarrow {\chi } \alpha _3 \xrightarrow {L} \beta _3 \xrightarrow {\chi } \alpha _4 \xrightarrow {L} \beta _4 \xrightarrow {\chi } \alpha _5,
\]

where \(\alpha _5\) is compatible with \(\beta _4\) and \(\beta _1\rightarrow \alpha _2\) has the least weight determined by \(\beta _2\). In our collision attacks on 5-round (6-round) Keccak, 4-round (5-round) trail cores are needed.

The 1600-bit state is displayed as a \(5\times 5\) array, ordered from left to right, where ‘|’ acts as the separator; each lane is denoted in hexadecimal using little-endian format; ‘0’ is replaced with ‘-’ for differential trails.
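To compare states against the trail tables programmatically, a state can be rendered in roughly this format. The sketch below rests on our own reading, not confirmed by the text: each printed line is one plane y with lanes x = 0..4 from left to right, and each lane is written as a 16-digit hexadecimal integer in which bit z has weight \(2^z\). The exact digit order of the published tables may differ.

```python
def format_lane(v, w=64):
    """Hex form of a w-bit lane (bit z has weight 2**z), with every '0' shown as '-'."""
    return format(v, "0{}x".format(w // 4)).replace("0", "-")


def format_state(A, w=64):
    """One line per plane y, lanes x = 0..4 separated by '|' (assumed layout)."""
    return "\n".join("|".join(format_lane(A[x][y], w) for x in range(5)) for y in range(5))


state = [[0] * 5 for _ in range(5)]
state[0][0] = 1
print(format_state(state))
```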

### E Instances of Collisions

In this section, we give instances of collisions against \({\textsc {Keccak}}[1440,160,5,160]\), \({\textsc {Keccak}}[640,160,5,160]\), \({\textsc {Keccak}}[1440,160,6,160]\), 5-round SHAKE128, 5-round SHA3-224 and 5-round SHA3-256, respectively. We denote the two colliding messages by \(M_1\) and \(M_2\) (Tables 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19).
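To check a listed collision, one hashes \(M_1\) and \(M_2\) with the corresponding round-reduced instance. The following Python sketch of a Keccak sponge with a configurable number of rounds is our own code, not the authors'. It assumes that the round-reduced variants use the first \(n_r\) rounds (`start=0`); note that FIPS 202 defines Keccak-p[b, \(n_r\)] as the last \(n_r\) rounds, which corresponds to `start=24-nr`. It also assumes the contest instances use plain pad10*1 (`suffix=0x01`). With 24 rounds the output should agree with Python's `hashlib`. The message values from the tables are not reproduced here.

```python
import hashlib


def rc_bit(t):
    """Round-constant bit rc(t) of FIPS 202 (LFSR x^8 + x^6 + x^5 + x^4 + 1)."""
    if t % 255 == 0:
        return 1
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1


def round_constant(ir):
    """64-bit iota constant of round ir: bit 2^j - 1 equals rc(j + 7*ir)."""
    return sum(rc_bit(j + 7 * ir) << ((1 << j) - 1) for j in range(7))


def rho_offsets(w):
    off = [[0] * 5 for _ in range(5)]
    x, y = 1, 0
    for t in range(24):
        off[x][y] = ((t + 1) * (t + 2) // 2) % w
        x, y = y, (2 * x + 3 * y) % 5
    return off


def keccak_rounds(A, nr, w=64, start=0):
    """Apply rounds start .. start+nr-1 of Keccak-f[25w] to lanes A[x][y]."""
    mask = (1 << w) - 1
    off = rho_offsets(w)

    def rot(v, n):
        return ((v << n) | (v >> (w - n))) & mask

    for ir in range(start, start + nr):
        C = [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]
        D = [C[(x - 1) % 5] ^ rot(C[(x + 1) % 5], 1) for x in range(5)]
        A = [[rot(A[x][y] ^ D[x], off[x][y]) for y in range(5)] for x in range(5)]  # theta, rho
        A = [[A[(x + 3 * y) % 5][x] for y in range(5)] for x in range(5)]          # pi
        A = [[A[x][y] ^ (~A[(x + 1) % 5][y] & A[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]                                                      # chi
        A[0][0] ^= round_constant(ir) & mask                                         # iota
    return A


def keccak_sponge(msg, rate_bytes, out_bytes, nr=24, suffix=0x06, w=64, start=0):
    """Sponge with pad10*1; suffix 0x06 = SHA-3, 0x1F = SHAKE, 0x01 = original Keccak."""
    lb = w // 8  # bytes per lane
    padded = bytearray(msg) + bytes([suffix])
    padded += bytes(-len(padded) % rate_bytes)
    padded[-1] |= 0x80
    state = bytearray(25 * lb)

    def permute(s):
        A = [[int.from_bytes(s[(x + 5 * y) * lb:(x + 5 * y + 1) * lb], "little")
              for y in range(5)] for x in range(5)]
        A = keccak_rounds(A, nr, w, start)
        return bytearray(b"".join(A[i % 5][i // 5].to_bytes(lb, "little") for i in range(25)))

    for off in range(0, len(padded), rate_bytes):
        for i in range(rate_bytes):
            state[i] ^= padded[off + i]
        state = permute(state)
    out = bytearray()
    while True:
        out += state[:rate_bytes]
        if len(out) >= out_bytes:
            return bytes(out[:out_bytes])
        state = permute(state)


print(keccak_sponge(b"abc", 136, 32) == hashlib.sha3_256(b"abc").digest())  # True
```

Under these assumptions, 5-round SHA3-224 of a message `m` would be `keccak_sponge(m, 144, 28, nr=5)`, and Keccak[640, 160, 5, 160] would be `keccak_sponge(m, 80, 20, nr=5, suffix=0x01, w=32)`.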

## About this article

### Cite this article

Guo, J., Liao, G., Liu, G. *et al.* Practical Collision Attacks against Round-Reduced SHA-3.
*J Cryptol* **33**, 228–270 (2020). https://doi.org/10.1007/s00145-019-09313-3

### Keywords

- Cryptanalysis
- Hash function
- SHA-3
- Keccak
- Collision
- Linearization
- Differential
- GPU