1 Introduction

Hash functions are among the most important basic primitives in cryptography, used in many applications such as digital signatures, message integrity checks and message authentication codes (MAC). Informally, a hash function H is a function that takes an arbitrarily long message M as input and outputs a fixed-length hash value of size n bits. Classical security requirements are collision resistance and (second)-preimage resistance. Namely, it should be impossible for an adversary to find a collision (two distinct messages that lead to the same hash value) in less than \(2^{n/2}\) hash computations or a (second)-preimage (a message hashing to a given challenge) in less than \(2^n\) hash computations. More complex security properties can be considered up to the point where the hash function should be indistinguishable from a random oracle, thus presenting no weakness whatsoever. Most standardized hash functions are based upon the Merkle-Damgård paradigm [4, 19] and iterate a compression function h with fixed input size to handle arbitrarily long messages. The compression function itself should ensure equivalent security properties in order for the hash function to inherit them.

Recent impressive progress in cryptanalysis [26–29] led to the fall of most standardized hash primitives, such as MD4, MD5, SHA-0 and SHA-1. All these algorithms share the same design rationale for their compression function (i.e., they incorporate additions, rotations, XORs and boolean functions in an unbalanced Feistel network), and we usually refer to them as the MD-SHA family. As of today, only SHA-2, RIPEMD-128 and RIPEMD-160 remain unbroken among this family, but the rapid improvement of the attacks led NIST to organize a 4-year SHA-3 competition to design a new hash function, eventually leading to the selection of Keccak [1]. This choice was justified partly by the fact that Keccak was built upon a completely different design rationale than the MD-SHA family. Yet, we cannot expect the industry to quickly move to SHA-3 unless a real issue is identified in current hash primitives. Meanwhile, the SHA-3 competition monopolized most of the cryptanalysis power during the last four years, and it is now crucial to continue the study of the unbroken MD-SHA members.

The notation RIPEMD represents several distinct hash functions related to the MD-SHA family, the first representative being RIPEMD-0 [2], which was recommended in 1992 by the European RACE Integrity Primitives Evaluation (RIPE) consortium. Its compression function basically consists of two MD4-like [21] functions computed in parallel (but with different constant additions for the two branches), with 48 steps in total. Early cryptanalysis by Dobbertin on a reduced version of the compression function [7] seemed to indicate that RIPEMD-0 was a weak function, and this was fully confirmed much later by Wang et al. [26], who showed that one can find a collision for the full RIPEMD-0 hash function with as few as \(2^{16}\) computations.

However, in 1996, due to the cryptanalysis advances on MD4 and on the compression function of RIPEMD-0, the original RIPEMD-0 was reinforced by Dobbertin, Bosselaers and Preneel [8] to create two stronger primitives RIPEMD-128 and RIPEMD-160, with 128/160-bit output and 64/80 steps, respectively (two other lesser-known variants with 256- and 320-bit output, RIPEMD-256 and RIPEMD-320, were also proposed, but with a claimed security level equivalent to an ideal hash function with a twice smaller output size). The main novelty compared to RIPEMD-0 is that the two computation branches were made much more distinct by using not only different constants, but also different rotation values and boolean functions, which greatly hardens the attacker’s task of finding good differential paths for both branches at the same time. The security indeed seems to have increased, since as of today no attack is known on the full RIPEMD-128 or RIPEMD-160 compression/hash functions and the two primitives are worldwide ISO/IEC standards [10].

Even though no result is known on the full RIPEMD-128 and RIPEMD-160 compression/hash functions yet, many analyses have been conducted in recent years. In [18], a preliminary study checked to what extent the known attacks [26] on RIPEMD-0 can apply to RIPEMD-128 and RIPEMD-160. Then, following the extensive work on preimage attacks for the MD-SHA family, [20, 22, 25] describe high-complexity preimage attacks on up to 36 steps of RIPEMD-128 and 31 steps of RIPEMD-160. Collision attacks were considered in [16] for RIPEMD-128 and in [15] for RIPEMD-160, with 48 and 36 steps broken, respectively. Finally, distinguishers based on nonrandom properties such as second-order collisions are given in [15, 16, 23], reaching about 50 steps with a very high complexity.

1.1 Our Contributions

In this article, we introduce a new type of differential path for RIPEMD-128 using one nonlinear differential trail for both the left and right branches and, contrary to previous works, not necessarily located in the early steps (Sect. 3). The high differential cost of these two nonlinear parts is mostly avoided by using the freedom degrees in a novel way: Some message words are used to handle the nonlinear parts in both branches and the remaining ones are used to merge the internal states of the two branches (Sect. 4). Overall, we obtain the first cryptanalysis of the full 64-round RIPEMD-128 hash and compression functions. Namely, we provide a distinguisher based on a differential property for both the full 64-round RIPEMD-128 compression function and hash function (Sect. 5). Previously best-known results for nonrandomness properties only applied to 52 steps of the compression function and 48 steps of the hash function. More importantly, we also derive a semi-free-start collision attack on the full RIPEMD-128 compression function (Sect. 5), significantly improving the previous free-start collision attack on 48 steps. Any further improvement in our techniques is likely to provide a practical semi-free-start collision attack on the RIPEMD-128 compression function. In order to increase the confidence in our reasoning, we independently implemented the two main parts of the attack (the merge and the probabilistic part) and the observed complexity matched our predictions. Our results and previous work complexities are given in Table 1 for comparison.

Table 1 Summary of known and new results on RIPEMD-128 hash function

2 Description of RIPEMD-128

RIPEMD-128  [8] is a 128-bit hash function that uses the Merkle-Damgård construction as domain extension algorithm: The hash function is built by iterating a 128-bit compression function h that takes as input a 512-bit message block \(m_i\) and a 128-bit chaining variable \(cv_i\):

$$\begin{aligned} cv_{i+1}=h(cv_i, m_{i}) \end{aligned}$$

where the message m to hash is padded beforehand to a multiple of 512 bits and the first chaining variable is set to a predetermined initial value \(cv_0=IV\) (defined by four 32-bit words 0x67452301, 0xefcdab89, 0x98badcfe and 0x10325476 in hexadecimal notation).
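As an illustration, the Merkle-Damgård iteration can be sketched in a few lines of Python (a minimal sketch, assuming MD4-style padding with a little-endian 64-bit length field; `compress` is a stand-in for the real compression function h described below):

```python
import struct

IV = (0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476)

def md_pad(msg: bytes) -> bytes:
    # Append 0x80, then zeros, then the 64-bit little-endian bit length,
    # so that the total length is a multiple of 64 bytes (512 bits).
    bitlen = 8 * len(msg)
    msg += b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    return msg + struct.pack("<Q", bitlen)

def md_iterate(compress, iv, msg: bytes):
    # Domain extension: cv_{i+1} = h(cv_i, m_i), returning the final cv.
    cv = iv
    padded = md_pad(msg)
    for i in range(0, len(padded), 64):
        cv = compress(cv, padded[i:i + 64])
    return cv
```

Any toy `compress` function can be plugged in to observe the chaining behavior.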

We refer to [8] for a complete description of RIPEMD-128. In the rest of this article, we denote by \([Z]_i\) the i-th bit of a word Z, starting the counting from 0. By least significant bit we refer to bit 0, while by most significant bit we will refer to bit 31. \(\boxplus \) and \(\boxminus \) represent the modular addition and subtraction on 32 bits, and \(\oplus \), \(\vee \), \(\wedge \), the bitwise “exclusive or”, the bitwise “or”, and the bitwise “and” function, respectively.
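For concreteness, these notations translate directly into code (a small Python sketch of the bit indexing and 32-bit modular operations just defined):

```python
MASK = 0xffffffff

def bit(z: int, i: int) -> int:
    # [Z]_i: the i-th bit of the word Z, bit 0 being the least significant.
    return (z >> i) & 1

def add32(x: int, y: int) -> int:
    # Modular addition on 32 bits.
    return (x + y) & MASK

def sub32(x: int, y: int) -> int:
    # Modular subtraction on 32 bits.
    return (x - y) & MASK
```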

2.1 The RIPEMD-128 Compression Function

The RIPEMD-128 compression function is based on MD4, with the particularity that it uses two parallel instances of it. We refer to these two computation branches as the left branch and the right branch, and we denote by \(X_i\) (resp. \(Y_i\)) the 32-bit word of the left branch (resp. right branch) that will be updated during step i of the compression function. The process is composed of 64 steps divided into 4 rounds of 16 steps each in both branches.

2.1.1 Initialization

The 128-bit input chaining variable \(cv_i\) is divided into 4 words \(h_i\) of 32 bits each that will be used to initialize the left and right branches 128-bit internal state:

$$\begin{aligned} \begin{array}{l c l c l c l} X_{-3}=h_{0} &{} \,\,\, &{} X_{-2}=h_{1} &{} \,\,\, &{} X_{-1}=h_{2} &{} \,\,\, &{} X_{0}=h_{3} \\ Y_{-3}=h_{0} &{} \,\,\, &{} Y_{-2}=h_{1} &{} \,\,\, &{} Y_{-1}=h_{2} &{} \,\,\, &{} Y_{0}=h_{3} . \end{array} \end{aligned}$$

2.1.2 The Message Expansion

The 512-bit input message block is divided into 16 words \(M_i\) of 32 bits each. Every word \(M_i\) will be used once in every round in a permuted order (similarly to MD4) and for both branches. We denote by \(W^l_i\) (resp. \(W^r_i\)) the 32-bit expanded message word that will be used to update the left branch (resp. right branch) during step i. We have for \(0\le j \le 3\) and \(0\le k \le 15\):

$$\begin{aligned} \begin{array}{c c c c c} W^l_{j\cdot 16 + k} = M_{\pi ^l_j(k)} &{} \,\,\, &{} \hbox {and} &{} \,\,\, &{} W^r_{j\cdot 16 + k} = M_{\pi ^r_j(k)} \\ \end{array} \end{aligned}$$

where permutations \(\pi ^l_j\) and \(\pi ^r_j\) are given in Table 2.

Table 2 Word permutations for the message expansion in RIPEMD-128.
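Since Table 2 is not reproduced here, the following Python sketch hard-codes the word permutations as given in the RIPEMD-128 specification (these values should correspond to Table 2; treat them as an assumption of this sketch):

```python
# Word permutations pi^l_j (left) and pi^r_j (right), one row per round j,
# taken from the RIPEMD-128 specification.
PI_L = [
    [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
    [ 7,  4, 13,  1, 10,  6, 15,  3, 12,  0,  9,  5,  2, 14, 11,  8],
    [ 3, 10, 14,  4,  9, 15,  8,  1,  2,  7,  0,  6, 13, 11,  5, 12],
    [ 1,  9, 11, 10,  0,  8, 12,  4, 13,  3,  7, 15, 14,  5,  6,  2],
]
PI_R = [
    [ 5, 14,  7,  0,  9,  2, 11,  4, 13,  6, 15,  8,  1, 10,  3, 12],
    [ 6, 11,  3,  7,  0, 13,  5, 10, 14, 15,  8, 12,  4,  9,  1,  2],
    [15,  5,  1,  3,  7, 14,  6,  9, 11,  8, 12,  2, 10,  0,  4, 13],
    [ 8,  6,  4,  1,  3, 11, 15,  0,  5, 12,  2, 13,  9,  7, 10, 14],
]

def expanded_word(M, branch: str, i: int) -> int:
    # W^l_i or W^r_i for step i = 16*j + k.
    j, k = divmod(i, 16)
    pi = PI_L if branch == "l" else PI_R
    return M[pi[j][k]]
```

In particular, one can verify the property exploited later in Sect. 3.2: \(M_{14}\) is inserted last in the 4th round of the right branch (`PI_R[3][15] == 14`) and second-to-last in the 1st round of the left branch (`PI_L[0][14] == 14`).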

2.1.3 The Step Function

At every step i, the registers \(X_{i+1}\) and \(Y_{i+1}\) are updated with functions \(f^l_j\) and \(f^r_j\) that depend on the round j to which step i belongs:

$$\begin{aligned} X_{i+1}&= \left( X_{i-3} \boxplus \Phi ^l_j(X_{i}, X_{i-1}, X_{i-2}) \boxplus W^l_i \boxplus K^l_j\right) \lll s^l_i\\ Y_{i+1}&= \left( Y_{i-3} \boxplus \Phi ^r_j(Y_{i}, Y_{i-1}, Y_{i-2}) \boxplus W^r_i \boxplus K^r_j\right) \lll s^r_i \end{aligned}$$

where \(K^l_j,K^r_j\) are 32-bit constants defined for every round j and every branch, \(s^l_i,s^r_i\) are rotation constants defined for every step i and every branch, \(\Phi ^l_j,\Phi ^r_j\) are 32-bit boolean functions defined for every round j and every branch. All these constants and functions are given in Tables 3 and 4.
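A minimal Python sketch of one branch step (the MD4-like recurrence; the actual round constants \(K_j\), rotation amounts \(s_i\) and boolean functions \(\Phi _j\) come from Tables 3 and 4 and are simply passed as parameters here):

```python
def rol(x: int, s: int) -> int:
    # Left-rotation of a 32-bit word by s positions.
    x &= 0xffffffff
    return ((x << s) | (x >> (32 - s))) & 0xffffffff

def step(state, phi, w: int, k: int, s: int) -> int:
    # From (X_{i-3}, X_{i-2}, X_{i-1}, X_i), the expanded word W_i,
    # round constant K_j and rotation s_i, compute
    # X_{i+1} = (X_{i-3} + Phi_j(X_i, X_{i-1}, X_{i-2}) + W_i + K_j) <<< s_i.
    x3, x2, x1, x0 = state
    return rol((x3 + phi(x0, x1, x2) + w + k) & 0xffffffff, s)
```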

Table 3 Rotation constants in RIPEMD-128
Table 4 Boolean functions and round constants in RIPEMD-128, with \(\hbox {XOR}(x, y, z) := x \oplus y \oplus z\), \(\hbox {IF}(x, y, z) := x \wedge y \oplus \bar{x} \wedge z\) and \(\hbox {ONX}(x, y, z) := (x \vee \bar{y}) \oplus z\)
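The three boolean functions from Table 4 are easy to express in code (a direct Python transcription, operating on 32-bit words):

```python
M32 = 0xffffffff

def XOR(x, y, z):
    return x ^ y ^ z

def IF(x, y, z):
    # x selects bitwise between y (where x is 1) and z (where x is 0).
    return (x & y) ^ (~x & M32 & z)

def ONX(x, y, z):
    return ((x | (~y & M32)) ^ z) & M32
```

The selection behavior of IF is what allows difference absorption, as discussed in Sect. 3.2.1.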

2.1.4 The Finalization

A finalization and a feed-forward are applied when all 64 steps have been computed in both branches. The four 32-bit words \(h'_i\) composing the output chaining variable are finally obtained by:

$$\begin{aligned} h'_0&=h_{1} \boxplus X_{63} \boxplus Y_{62}, \quad h'_1=h_{2} \boxplus X_{62} \boxplus Y_{61},\\ h'_2&=h_{3} \boxplus X_{61} \boxplus Y_{64}, \quad h'_3=h_{0} \boxplus X_{64} \boxplus Y_{63}. \end{aligned}$$

3 A New Family of Differential Paths for RIPEMD-128

3.1 The General Strategy

The first task for an attacker looking for collisions in some compression function is to set a good differential path. In the case of RIPEMD and, more generally, of double- or multi-branch compression functions, this can be quite a difficult task because the attacker has to find a good path for all branches at the same time. This is exactly what the designers of multi-branch functions are hoping for: It is unlikely that good differential paths exist in both branches at the same time when the branches are made distinct enough (note that the main weakness of RIPEMD-0 is that both branches are almost identical, so the same differential path can be used for the two branches at the same time).

Differential paths in recent collision attacks on the MD-SHA family are composed of two parts: a low-probability nonlinear part in the first steps and a high-probability linear part in the remaining ones. Only the latter will be handled probabilistically and will impact the overall complexity of the collision finding algorithm, since during the first steps the attacker can choose message words independently. This strategy proved to be very effective because it makes it possible to find much better linear parts than before by relaxing many constraints on them. The previous approaches for attacking RIPEMD-128 [16, 18] are based on the same strategy: building good linear paths for both branches, but without including the first round (i.e., the first 16 steps). The first round in each branch will be covered by a nonlinear differential path, and this is depicted on the left in Fig. 1. The collision search is then composed of two subparts, the first handling the low-probability nonlinear paths with the message blocks (Step ①) and then the remaining steps in both branches are verified probabilistically (Step ②).

Fig. 1

Previous (left-hand side) and new (right-hand side) approach for collision search on double-branch compression functions

This differential path search strategy is natural when one handles the nonlinear parts in a classic way (i.e., computing only forward) during the collision search, but in Sect. 4 we will describe a new approach for using the available freedom degrees provided by the message words in double-branch compression functions (see the right-hand side of Fig. 1): Instead of handling the first rounds of both branches at the same time during the collision search, we will attack them independently (Step ①), then use some remaining free message words to merge the two branches (Step ②) and finally handle the remaining steps in both branches probabilistically (Step ③). This new approach broadens the search space of good linear differential parts and eventually provides us with better candidates in the case of RIPEMD-128.

3.2 Finding a Good Linear Part

Since any active bit in a linear differential path (i.e., a bit containing a difference) is likely to cause many conditions in order to control its spread, most successful collision searches start with a low-weight linear differential path, therefore reducing the complexity as much as possible. RIPEMD-128 is no exception, and because every message word is used once in every round of every branch in RIPEMD-128, the best would be to insert only a single-bit difference in one of them. This was considered in [16], but the authors concluded that no single-word difference leads to a good choice and they eventually had to utilize one active bit in two message words instead, therefore doubling the amount of differences inserted during the compression function computation and reducing the overall number of steps they could attack (this was also considered in [15] for RIPEMD-160, but only 36 rounds could be reached for a semi-free-start collision attack). By relaxing the constraint that both nonlinear parts must necessarily be located in the first round, we show that a single-word difference in \(M_{14}\) is actually a very good choice.

3.2.1 Boolean Functions

Analyzing the various boolean functions in RIPEMD-128 rounds is very important. Indeed, there are three distinct functions: XOR, ONX and IF, all with very distinct behavior. The function IF is nonlinear and can absorb differences (a difference on one of its inputs can be blocked from spreading to the output by setting some appropriate bit conditions). In other words, one bit difference in the internal state during an IF round can be forced to create only a single-bit difference 4 steps later, thus providing no diffusion at all. On the other hand, XOR is arguably the most problematic function in our situation because it cannot absorb any difference when only a single-bit difference is present on its input. Thus, one bit difference in the internal state during an XOR round will double the number of bit differences every step and quickly lead to an unmanageable amount of conditions. Moreover, the linearity of the XOR function makes it problematic to obtain a solution when using the nonlinear part search tool, as the tool strongly leverages nonlinear behavior. In between, the ONX function is nonlinear for two inputs and can absorb differences up to some extent. We can easily conclude that the goal for the attacker will be to locate the biggest proportion of differences in the IF or, if needed, in the ONX functions, and to avoid the XOR parts as much as possible.
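This absorption-versus-diffusion contrast is easy to observe experimentally (a Python sketch with arbitrary example values; the bit condition on x is what blocks the difference in IF):

```python
M32 = 0xffffffff
def IF(x, y, z):  return (x & y) ^ (~x & M32 & z)
def XOR(x, y, z): return x ^ y ^ z

d = 1 << 10                                     # single-bit difference on input z
x, y, z = 0xffffffff, 0x13572468, 0x2468ace0    # arbitrary example values

# IF absorbs: with [x]_10 = 1, the z branch is cut and the difference vanishes.
assert IF(x, y, z) == IF(x, y, z ^ d)

# XOR cannot absorb: the difference always reaches the output.
assert XOR(x, y, z) ^ XOR(x, y, z ^ d) == d
```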

3.2.2 Choosing a Message Word

We would like to find the best choice for the single-message word difference insertion. The XOR function located in the 4th round of the right branch must be avoided, so we are looking for a message word that is incorporated either very early (so we can propagate the difference backward) or very late (so we can propagate the difference forward) in this round. Similarly, the XOR function located in the 1st round of the left branch must be avoided, so we are looking for a message word that is incorporated either very early (for a free-start collision attack) or very late (for a semi-free-start collision attack) in this round as well. It is easy to check that \(M_{14}\) is a perfect candidate, being inserted last in the 4th round of the right branch and second-to-last in the 1st round of the left branch.

3.2.3 Building the Linear Part

Once we have chosen that the only message difference will be a single bit in \(M_{14}\), we need to build the whole linear part of the differential path inside the internal state. By linear we mean that all modular additions will be modeled as a bitwise XOR function. Moreover, if a difference is input of a boolean function, it is absorbed whenever possible in order to keep the path as low-weight as possible (yet, for a few special bit positions it might be more interesting not to absorb the difference if it can erase another difference in later steps). We give the rough skeleton of our differential path in Fig. 2. Both differences inserted in the 4th round of the left and right branches are simply propagated forward for a few steps, and we are very lucky that this linear propagation leads to two final internal states whose difference can be mutually erased after application of the compression function finalization and feed-forward (which is yet another argument in favor of \(M_{14}\)). All differences inserted in the 3rd and 2nd rounds of the left and right branches are propagated linearly backward and will be later connected to the bit difference inserted in the 1st round by the nonlinear part. Note that since a nonlinear part usually has a low differential probability, we will try to make it as thin as possible. No difference will be present in the input chaining variable, so the trail is well suited for a semi-free-start collision attack. Following this method and reusing notations from [3] given in Table 5, we eventually obtain the differential path depicted in Fig. 3, the “?” representing unrestricted bits that will be constrained during the nonlinear parts search.
We had to choose the bit position for the \(M_{14}\) message difference insertion and, among the 32 possible choices, the most significant bit was selected because it is the one maximizing the differential probability of the linear part we just built (this is explained by the fact that many conditions due to carry control in modular additions are avoided on the most significant bit position).

Fig. 2

Shape of our differential path for RIPEMD-128. The numbers are the message words inserted at each step, and the red curves represent the rough amount of differences in the internal state during each step. The arrows show where the bit differences are injected with \(M_{14}\)

Table 5 Notations used in [3] for a differential path: x represents a bit of the first message and \(x^{*}\) stands for the same bit of the second message
Fig. 3

Differential path for RIPEMD-128, before the nonlinear parts search. The notations are the same as in [3] and are described in Table 5. The column \(\pi ^l_i\) (resp. \(\pi ^r_i\)) contains the indices of the message words that are inserted at each step i in the left branch (resp. right branch), which corresponds to \(\pi ^l_j(k)\) (resp. \(\pi ^r_j(k)\)) with \(i=16\cdot j + k\)

3.3 The Nonlinear Differential Part Search Tool

Starting from Fig. 3, our goal is now to instantiate the unconstrained bits denoted by “?” such that only inactive (“0”, “1” or “-”) or active bits (“n”, “u” or “x”) remain and such that the path does not contain any direct inconsistency. This is generally a very complex task, but we implemented a tool similar to [3] for SHA-1 in order to perform this task in an automated way. Since RIPEMD-128 also belongs to the MD-SHA family, the original technique works well, in particular when used in a round with a nonlinear boolean function such as IF.

We have to find a nonlinear part for the two branches and we remark that these two tasks can be handled independently. We have included the special constraint that the nonlinear parts should be as thin as possible (i.e., restricted to the smallest possible number of steps), so as to later reduce the overall complexity (linear parts have higher differential probability than nonlinear ones).

3.4 The Final Differential Path Skeleton

Applying our nonlinear part search tool to the trail given in Fig. 3, we obtain the differential path in Fig. 4, for which we provide at each step i the differential probability \(\hbox {P}^l[i]\) and \(\hbox {P}^r[i]\) of the left and right branches, respectively. Also, we give for each step i the accumulated probability \(\hbox {P}[i]\) starting from the last step, i.e., \(\hbox {P}[i]=\prod _{j=63}^{j=i} (\hbox {P}^r[j] \cdot \hbox {P}^l[j])\).

Fig. 4

Differential path for RIPEMD-128, after the nonlinear parts search. The notations are the same as in [3] and are described in Table 5. The column \(\pi ^l_i\) (resp. \(\pi ^r_i\)) contains the indices of the message words that are inserted at each step i in the left branch (resp. right branch), which corresponds to \(\pi ^l_j(k)\) (resp. \(\pi ^r_j(k)\)) with \(i=16\cdot j + k\). The column \(\hbox {P}^l[i]\) (resp. \(\hbox {P}^r[i]\)) represents the \(\log _2()\) differential probability of step i in left (resp. right) branch. The column P[i] represents the cumulated probability (in \(\log _2()\)) until step i for both branches, i.e., \(\hbox {P}[i]=\prod _{j=63}^{j=i} (\hbox {P}^r[j] \cdot \hbox {P}^l[j])\)

One can check that the trail has differential probability \(2^{-85.09}\) (i.e., \(\prod _{i=0}^{63} \hbox {P}^l[i]=2^{-85.09}\)) in the left branch and \(2^{-145}\) (i.e., \(\prod _{i=0}^{63} \hbox {P}^r[i]=2^{-145}\)) in the right branch. Its overall differential probability is thus \(2^{-230.09}\) and since we have 511 bits of message with unspecified value (one bit of \(M_4\) is already set to “1”), plus 127 unrestricted bits of chaining variable (one bit of \(X_0=Y_0=h_3\) is already set to “0”), we expect many solutions to exist (about \(2^{407.91}\)).
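This counting argument can be verified with a few lines of arithmetic:

```python
left, right = -85.09, -145.0   # log2 differential probabilities of the branches
total = left + right           # log2 probability of the whole trail
freedom = 511 + 127            # unconstrained message bits + chaining variable bits
solutions = freedom + total    # log2 of the expected number of solutions

assert round(total, 2) == -230.09
assert round(solutions, 2) == 407.91
```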

In order for the path to provide a collision, the bit difference in \(X_{61}\) must erase the one in \(Y_{64}\) during the finalization phase of the compression function: \(h'_2=h_{3} \boxplus X_{61} \boxplus Y_{64}\). Since the signs of these two bit differences are not specified, this happens with probability \(2^{-1}\) and the overall probability of following our differential path and obtaining a collision for a randomly chosen input is \(2^{-231.09}\).

4 Utilization of the Freedom Degrees

In the differential path from Fig. 4, the difference mask is already entirely set, but almost all message bits and chaining variable bits have no constraint with regard to their value. All these freedom degrees can be used to reduce the complexity of the straightforward collision search (i.e., choosing random 512-bit message values) that requires about \(2^{231.09}\) RIPEMD-128 step computations. We will utilize these freedom degrees in three phases:

  • Phase 1: We first fix some internal state and message bits in order to prepare the attack. This will allow us to handle in advance some conditions in the differential path as well as to facilitate the merging phase. This preparation phase is done once and for all.

  • Phase 2: We will iteratively fix the internal state words \(X_{21}\), \(X_{22}\), \(X_{23}\), \(X_{24}\) from the left branch, and \(Y_{11}\), \(Y_{12}\), \(Y_{13}\), \(Y_{14}\) from the right branch, as well as message words \(M_{12}\), \(M_{3}\), \(M_{10}\), \(M_{1}\), \(M_{8}\), \(M_{15}\), \(M_{6}\), \(M_{13}\), \(M_{4}\), \(M_{11}\) and \(M_{7}\) (the ordering is important). This will provide us with a starting point for the merging phase. However, due to a lack of freedom degrees, we will need to perform this phase several times in order to get enough starting points to eventually find a solution for the entire differential path.

  • Phase 3: We use the remaining unrestricted message words \(M_{0}\), \(M_{2}\), \(M_{5}\), \(M_{9}\) and \(M_{14}\) to efficiently merge the internal states of the left and right branches.

4.1 Phase 1: Preparation

Before starting to fix a lot of message and internal state bit values, we need to prepare the differential path from Fig. 4 so that the merge phase can later be done efficiently and so that the probabilistic part will not be too costly. Understanding these constraints requires a deep insight into the differences propagation and conditions fulfillment inside the RIPEMD-128 step function. Therefore, the reader not interested in the details of the differential path construction is advised to skip this subsection.

The first constraint that we set is \(Y_3=Y_4\). The effect is that the IF function at step 4 of the right branch, \(\mathtt{IF} (Y_2,Y_4,Y_3)=(Y_2 \wedge Y_3) \oplus (\overline{Y_2} \wedge Y_4)=Y_3=Y_4\), will not depend on \(Y_2\) anymore. We will see in Sect. 4.3 that this constraint is crucial in order for the merge to be performed efficiently.
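One can verify this absorption property directly (a Python sketch, using the IF definition from Table 4; with \(Y_3=Y_4\) both selection branches return the same value):

```python
import random

M32 = 0xffffffff
def IF(x, y, z):
    return (x & y) ^ (~x & M32 & z)

rng = random.Random(0)
Y3 = Y4 = 0x0f0f3355                  # enforce the constraint Y_3 = Y_4
for _ in range(1000):
    Y2 = rng.getrandbits(32)
    # IF(Y_2, Y_4, Y_3) collapses to Y_3 = Y_4, whatever Y_2 is.
    assert IF(Y2, Y4, Y3) == Y3
```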

The second constraint is \(X_{24}=X_{25}\) (except on the two bit positions of \(X_{24}\) and \(X_{25}\) that contain differences), and the effect is that the IF function at step 26 of the left branch (when computing \(X_{27}\)), \(\mathtt{IF} (X_{26},X_{25},X_{24})=(X_{26}\wedge X_{25}) \oplus (\overline{X_{26}} \wedge X_{24})=X_{24}=X_{25}\), will not depend on \(X_{26}\) anymore. Before the final merging phase starts, we will not know \(M_0\), and having this \(X_{24}=X_{25}\) constraint will allow us to directly fix the conditions located on \(X_{27}\) without knowing \(M_0\) (since \(X_{26}\) directly depends on \(M_0\)). Moreover, we fix the first 12 bits of \(X_{23}\) and \(X_{24}\) to “01000100u001” and “001000011110”, respectively, because we have checked experimentally that this choice is among the few that minimize the number of bits of \(M_9\) that need to be set in order to verify many of the conditions located on \(X_{27}\).

The third constraint consists of setting the bits 18 to 30 of \(Y_{20}\) to “0000000000000”. The effect is that for these 13 bit positions, the ONX function at step 21 of the right branch (when computing \(Y_{22}\)), \(\mathtt{ONX} (Y_{21},Y_{20},Y_{19})=(Y_{21} \vee \overline{Y_{20}}) \oplus Y_{19}\), will not depend on the 13 corresponding bits of \(Y_{21}\) anymore. Again, because we will not know \(M_0\) before the merging phase starts, this constraint will allow us to directly fix the conditions on \(Y_{22}\) without knowing \(M_0\) (since \(Y_{21}\) directly depends on \(M_0\)).
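The same kind of check works for this ONX constraint (a Python sketch): forcing bits 18 to 30 of \(Y_{20}\) to zero makes \(Y_{21} \vee \overline{Y_{20}}\) all-ones on those positions, so the output ignores \(Y_{21}\) there.

```python
M32 = 0xffffffff
def ONX(x, y, z):
    return ((x | (~y & M32)) ^ z) & M32

mask = ((1 << 13) - 1) << 18       # bit positions 18..30
Y20 = 0x77777777 & ~mask & M32     # bits 18-30 of Y_20 forced to 0
Y19 = 0x55667788

# On bits 18-30, the output is the same for any value of Y_21.
outs = {ONX(Y21, Y20, Y19) & mask for Y21 in (0, M32, 0xdeadbeef)}
assert len(outs) == 1
```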

Finally, the last constraint that we enforce is that the first two bits of \(Y_{22}\) are set to “10” and the first three bits of \(M_{14}\) are set to “011”. We have checked experimentally that this particular choice of bit values reduces the spectrum of possible carries during the addition of step 24 (when computing \(Y_{25}\)) and we obtain a probability improvement from \(2^{-1}\) to \(2^{-0.25}\) to reach “u” in \(Y_{25}\).

We give in Fig. 5 our differential path after having set these constraints (we denote a bit \([X_i]_j\) with the constraint \([X_i]_j=[X_{i-1}]_j\) by “\(\;\hat{}\;\)”). We observe that all the constraints set in this subsection consume in total \(32+51+13+5=101\) bits of freedom degrees, and a huge amount of solutions (about \(2^{306.91}\)) are still expected to exist.
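The freedom-degree bookkeeping of this subsection is summarized by:

```python
consumed = 32 + 51 + 13 + 5     # bits fixed by the four constraints above
remaining = 407.91 - consumed   # log2 of the number of solutions still expected

assert consumed == 101
assert round(remaining, 2) == 306.91
```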

Fig. 5

Differential path for RIPEMD-128, after the nonlinear parts search. The notations are the same as in [3] and are described in Table 5. Moreover, we denote by “\(\;\hat{}\;\)” the constraint on a bit \([X_i]_j\) such that \([X_i]_j=[X_{i-1}]_j\). The column \(\pi ^l_i\) (resp. \(\pi ^r_i\)) contains the indices of the message words that are inserted at each step i in the left branch (resp. right branch), which corresponds to \(\pi ^l_j(k)\) (resp. \(\pi ^r_j(k)\)) with \(i=16\cdot j + k\)

4.2 Phase 2: Generating a Starting Point

Once the differential path is properly prepared in Phase 1, we would like to utilize the huge amount of freedom degrees available to directly fulfill as many conditions as possible. Our approach is to fix the value of the internal state in both the left and right branches (they can be handled independently), exactly in the middle of the nonlinear parts where the number of conditions is important. Then, we will fix the message words one by one following a particular scheduling and propagating the bit values forward and backward from the middle of the nonlinear parts in both branches.

4.2.1 Fixing the Internal State

We chose to start by setting the values of \(X_{21}\), \(X_{22}\), \(X_{23}\), \(X_{24}\) in the left branch, and \(Y_{11}\), \(Y_{12}\), \(Y_{13}\), \(Y_{14}\) in the right branch, because they are located right in the middle of the nonlinear parts. We take the first word \(X_{21}\) and randomly set all of its unrestricted “-” bits to “0” or “1” and check whether any direct inconsistency is created by this choice. If that is the case, we simply pick another candidate until no direct inconsistency is deduced. Otherwise, we can go to the next word \(X_{22}\). If too many tries fail for a particular internal state word, we can backtrack and pick another choice for the previous word. Finally, if no solution is found after a certain amount of time, we just restart the whole process, so as to avoid being blocked in a particularly bad subspace with no solution.
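The procedure just described is a randomized backtracking search over the internal state words; a generic Python sketch (illustrative only, with a toy consistency predicate standing in for the real differential-path checks):

```python
import random

def fix_words(n, consistent, tries=64, seed=None):
    # Fix n words one by one; on repeated failure, backtrack one word.
    rng = random.Random(seed)
    fixed = []
    while len(fixed) < n:
        for _ in range(tries):
            cand = rng.getrandbits(32)
            if consistent(fixed + [cand]):
                fixed.append(cand)
                break
        else:
            if not fixed:
                return None       # a full restart would happen here
            fixed.pop()           # backtrack the previous word
    return fixed
```

For example, `fix_words(3, lambda ws: all(w % 2 == 0 for w in ws), seed=1)` fixes three words under a toy "even values only" constraint.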

4.2.2 Fixing the Message Words

Similarly to the internal state words, we randomly fix the value of message words \(M_{12}\), \(M_{3}\), \(M_{10}\), \(M_{1}\), \(M_{8}\), \(M_{15}\), \(M_{6}\), \(M_{13}\), \(M_{4}\), \(M_{11}\) and \(M_{7}\) (following this particular ordering that facilitates the convergence toward a solution). The difference here is that the left and right branch computations are no longer independent, since the message words are used in both of them. However, this does not change anything in our algorithm, and the very same process is applied: For each new message word randomly fixed, we compute forward and backward from the known internal state values and check for any inconsistency, using backtracking and reset if needed.

Overall, finding one new solution for this entire Phase 2 takes about 5 minutes of computation on a recent PC with a naive implementation. However, when one starting point is found, we can generate many more at a very cheap cost by randomizing message words \(M_4\), \(M_{11}\) and \(M_7\), since the most difficult part is to fix the first 8 message words of the schedule. For example, once a solution is found, one can directly generate \(2^{18}\) new starting points by randomizing a certain portion of \(M_7\) (because \(M_7\) has no impact on the validity of the nonlinear part in the left branch, while in the right branch one has only to ensure that the last 14 bits of \(Y_{20}\) are set to “u0000000000000”), and this was verified experimentally.

We give an example of such a starting point in Fig. 6, and we emphasize that by “solution” or “starting point” we mean a differential path instance with exactly the same probability profile as this one. The 3 constrained bit values in \(M_{14}\) come from the preparation in Phase 1, and the 3 constrained bit values in \(M_{9}\) are necessary conditions in order to fulfill step 26 when computing \(X_{27}\). It is also important to remark that, for whatever instance is found during this second phase, the position of these 3 constrained bit values will always be the same thanks to our preparation in Phase 1.

Fig. 6

Differential path for RIPEMD-128, after the second phase of the freedom degree utilization. The notations are the same as in [3] and are described in Table 5. The column \(\pi ^l_i\) (resp. \(\pi ^r_i\)) contains the indices of the message words that are inserted at each step i in the left branch (resp. right branch), which corresponds to \(\pi ^l_j(k)\) (resp. \(\pi ^r_j(k)\)) with \(i=16\cdot j + k\). The column \(\hbox {P}^l[i]\) (resp. \(\hbox {P}^r[i]\)) represents the \(\log _2()\) differential probability of step i in left (resp. right) branch. The column P[i] represents the cumulated probability (in \(\log _2()\)) until step i for both branches, i.e., \(\hbox {P}[i]=\prod _{j=63}^{j=i} (\hbox {P}^r[j] \cdot \hbox {P}^l[j])\)

The probabilities displayed in Fig. 6 for early steps (steps 0 to 14) are not meaningful here since they assume an attacker only computing forward, while in our case we will compute backward from the nonlinear parts to the early steps. However, we can see that the uncontrolled accumulated probability (i.e., Step ③ on the right side of Fig. 1) is now improved to \(2^{-29.32}\), or \(2^{-30.32}\) if we add the extra condition for the collision to happen at the end of the RIPEMD-128 compression function.

4.3 Phase 3: Merging the Left and Right Branches

At the end of the second phase, we have several starting points equivalent to the one from Fig. 6, with many conditions already verified and an uncontrolled accumulated probability of \(2^{-30.32}\). Our goal for this third phase is to use the remaining free message words \(M_{0}\), \(M_{2}\), \(M_{5}\), \(M_{9}\) and \(M_{14}\) to make sure that both the left and right branches start with the same chaining variable.

We recall that during the first phase we enforced that \(Y_3=Y_4\), and for the merge we will require an extra constraint (this will later make \(X_1\) linearly dependent on \(X_4\), \(X_3\) and \(X_2\)). The message words \(M_{14}\) and \(M_9\) will be used to fulfill this constraint, while message words \(M_0\), \(M_2\) and \(M_5\) will be used to perform the merge of the two branches with only a few operations and with a success probability of \(2^{-34}\).

4.3.1 Handling the Extra Constraint with \(M_{14}\) and \(M_9\)

First, let us deal with the constraint , which can be rewritten as . Thus, we have by replacing \(M_5\) using the update formula of step 8 in the left branch. Finally, isolating \(X_{6}\) and replacing it using the update formula of step 9 in the left branch, we obtain:

(1)

All values on the right-hand side of this equation are known if \(M_{14}\) is fixed. Therefore, in order to fulfill our extra constraint, we could simply pick a random value for \(M_{14}\) and then directly deduce the value of \(M_9\) thanks to Eq. (1). However, one can see in Fig. 6 that 3 bits are already fixed in \(M_9\) (the last one being the 10th bit of \(M_9\)), and thus a valid solution would be found only with probability \(2^{-3}\). In order to avoid this extra complexity factor, we first randomly fix the first 24 bits of \(M_{14}\), which allows us to directly deduce the first 10 bits of \(M_9\). We then check that our extra constraint is fulfilled up to the 10th bit (knowing the first 24 bits of \(M_{14}\) leads to the first 24 bits of \(X_{11}\), \(X_{10}\), \(X_{9}\), \(X_{8}\) and the first 10 bits of \(X_{7}\), which is exactly what we need according to Eq. (1)). Once a solution is found after \(2^3\) tries on average, we can randomize the remaining unrestricted bits of \(M_{14}\) (the 8 most significant bits) and eventually deduce the 22 most significant bits of \(M_9\) with Eq. (1). With this method, we completely remove the extra \(2^{3}\) factor, because its cost is amortized by the final randomization of the 8 most significant bits of \(M_{14}\).
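To make the amortization argument concrete, the following back-of-the-envelope computation (an illustrative sketch, not part of the original attack code) checks that the \(2^3\) filtering cost becomes negligible once spread over the \(2^8\) free randomizations of \(M_{14}\):

```python
# A random 24-bit prefix of M14 satisfies the 3 bit conditions on M9
# with probability 2^-3, so 2^3 tries are needed on average; each
# success then yields 2^8 (M14, M9) pairs for free by varying the
# 8 most significant bits of M14.
tries_per_success = 2 ** 3
pairs_per_success = 2 ** 8
amortized_tries = tries_per_success / pairs_per_success
print(amortized_tries)  # 0.03125 extra tries per pair: negligible
```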

4.3.2 Merging the Branches with \(M_0\), \(M_2\) and \(M_5\)

Once \(M_9\) and \(M_{14}\) are fixed, we still have message words \(M_0\), \(M_2\) and \(M_5\) to determine for the merging. One can see that with only these three message words undetermined, all internal state values except \(X_2\), \(X_1\), \(X_{0}\), \(X_{-1}\), \(X_{-2}\), \(X_{-3}\) and \(Y_2\), \(Y_1\), \(Y_{0}\), \(Y_{-1}\), \(Y_{-2}\), \(Y_{-3}\) are fully known when computing backward from the nonlinear parts in each branch.

This is where our first constraint \(Y_3=Y_4\) comes into play. Indeed, when writing \(Y_1\) from the equation in step 4 in the right branch, we have:

which means that \(Y_1\) is already completely determined at this point (the bit condition present in \(Y_1\) in Fig. 6 is actually handled for free when fixing \(M_{14}\) and \(M_9\), since it only requires knowing the first 9 bits of \(M_9\)). In other words, the constraint \(Y_3=Y_4\) implies that \(Y_1\) does not depend on \(Y_2\), which is currently undetermined. Another effect of this constraint can be seen when writing \(Y_2\) from the equation in step 5 in the right branch:

where is a constant.

Our second constraint is useful when writing \(X_1\) and \(X_2\) from the equations of steps 4 and 5 in the left branch

where is a constant.

Finally, our ultimate goal for the merge is to ensure that \(X_{-3}=Y_{-3}\), \(X_{-2}=Y_{-2}\), \(X_{-1}=Y_{-1}\) and \(X_{0}=Y_{0}\), knowing that all other internal states are determined when computing backward from the nonlinear parts in each branch, except , and . We therefore write the equations relating these eight internal state words:

If these four equations are verified, then we have merged the left and right branches to the same input chaining variable. We first remark that \(X_0\) is already fully determined, and thus the second equation \(X_{-1}=Y_{-1}\) only depends on \(M_2\). Moreover, it is a T-function in \(M_2\) (any bit i of the equation depends only on the first i bits of \(M_2\)) and can therefore be solved very efficiently bit by bit. We give more details on how to solve this T-function in “Appendix 1”; the average cost to find one \(M_2\) solution is one RIPEMD-128 step computation.
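As an illustration of the bit-by-bit solving strategy, the following sketch solves a toy T-function equation. The function `f` below is a hypothetical stand-in built from modular additions and XORs with arbitrary constants (so that flipping input bit i flips output bit i); it is not the actual RIPEMD-128 equation:

```python
MASK = 0xFFFFFFFF

def solve_tfunction(f, target, nbits=32):
    """Solve f(m) == target bit by bit, from the least significant bit.

    Assumes f is a T-function (bit i of f(m) depends only on bits
    0..i of m) built from modular additions and XORs with constants,
    so that flipping bit i of m flips bit i of f(m).
    """
    m = 0
    for i in range(nbits):
        if (f(m) ^ target) & (1 << i):  # bit i does not match yet
            m |= 1 << i                 # flipping m_i fixes f(m)_i
    return m if f(m) == target else None

# Toy equation of the same flavor (hypothetical constants):
f = lambda m: ((((0xdeadbeef + m) & MASK) ^ 0x12345678) + 0x0f0f0f0f) & MASK
m = solve_tfunction(f, f(0xcafebabe))  # recovers 0xcafebabe
```

Each loop iteration costs one evaluation of `f`, which matches the intuition that one solution is found for roughly the cost of a single step computation.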

Since \(X_0\) is already fully determined, from the \(M_2\) solution previously obtained we directly deduce the value of \(M_0\) that satisfies the first equation \(X_{0}=Y_{0}\). From \(M_2\) we can compute the value of \(Y_{-2}\), and since we require \(X_{-2} = Y_{-2}\), we calculate \(X_{-3}\) from \(M_0\) and \(X_{-2}\). At this point, the first two equations are fulfilled and we still have the value of \(M_5\) to choose.

The third equation can be rewritten as , where and \(C_2\), \(C_3\) are two constants. Similarly, the fourth equation can be rewritten as , where \(C_4\) and \(C_5\) are two constants. Solving either of these two equations with regard to V can be costly because of the rotations, so we combine them to create a simpler one: . This equation is easier to handle because the rotation coefficient is small: we guess the 3 most significant bits of and we simply solve the equation one 3-bit layer at a time, starting from the least significant bit. Once the value of V is deduced, we straightforwardly obtain , and the cost of recovering \(M_5\) is equivalent to 8 RIPEMD-128 step computations (the 3-bit guess implies a factor of 8, but the resolution can be implemented very efficiently with tables).
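The guess-and-propagate resolution can be sketched on a hypothetical equation of the same shape, \(x \boxplus (x \lll 3) = t\) (the actual combined equation is not reproduced here). We guess the 3 most significant bits of x, which feed the 3 least significant bits of the rotated term, and then determine the remaining bits from the least significant bit up:

```python
def rotl(x, r, n=32):
    """Rotate the n-bit word x left by r positions."""
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)

def solve_rot_add(t, r=3, n=32):
    """Find all x with (x + rotl(x, r)) mod 2^n == t.

    The r most significant bits of x feed the r least significant
    bits of rotl(x, r), so we guess them (a factor 2^r = 8) and then
    determine the remaining bits of x from the LSB up, tracking the
    carry of the modular addition.
    """
    solutions = []
    for guess in range(1 << r):
        bits = [None] * n
        for j in range(r):                      # place the guessed top bits
            bits[n - r + j] = (guess >> j) & 1
        carry, consistent = 0, True
        for i in range(n):
            rb = bits[(i - r) % n]              # bit i of rotl(x, r)
            ti = (t >> i) & 1
            if bits[i] is None:                 # bit i of x still unknown
                bits[i] = ti ^ rb ^ carry
            elif (bits[i] ^ rb ^ carry) != ti:  # contradiction with guess
                consistent = False
                break
            xi = bits[i]
            carry = (xi & rb) | (carry & (xi | rb))
        if consistent:
            x = sum(b << i for i, b in enumerate(bits))
            if (x + rotl(x, r, n)) & ((1 << n) - 1) == t:
                solutions.append(x)
    return solutions
```

The wrong guesses are eliminated either by the consistency check on the top bits or by the final verification, which reflects the factor-8 cost mentioned above.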

When all three message words \(M_0\), \(M_2\) and \(M_5\) have been fixed, the first, second and a combination of the third and fourth equalities are necessarily verified. However, there is only a probability of \(2^{-32}\) that both the third and fourth equations are fulfilled simultaneously. Moreover, one can check in Fig. 6 that there is one bit condition on \(X_{0}=Y_{0}\) and one bit condition on \(Y_{2}\), which adds a further factor of \(2^{-2}\). We evaluate the whole process to cost about 19 RIPEMD-128 step computations on average: there are 17 steps to compute backward after having identified a proper couple \(M_{14}\), \(M_9\), and the 8 RIPEMD-128 step computations to obtain \(M_5\) are only done 1/4 of the time, because the two bit conditions on \(Y_{2}\) and \(X_{0}=Y_{0}\) are filtered beforehand.

To summarize the merging: We first compute a couple \(M_{14}\), \(M_9\) that satisfies a special constraint, we find a value of \(M_2\) that verifies \(X_{-1}=Y_{-1}\), then we directly deduce \(M_0\) to fulfill \(X_{0}=Y_{0}\), and we finally obtain \(M_5\) to satisfy a combination of \(X_{-2}=Y_{-2}\) and \(X_{-3}=Y_{-3}\). Overall, with only 19 RIPEMD-128 step computations on average, we were able to do the merging of the two branches with probability \(2^{-34}\).

5 Results and Implementation

5.1 Complexity Analysis and Implementation

After the quite technical description of the attack in the previous section, we would like to wrap everything up to get a clearer view of the attack complexity, the amount of freedom degrees, etc. Given a starting point from Phase 2, the attacker can perform \(2^{26}\) merge processes (because 3 bits are already fixed in both \(M_9\) and \(M_{14}\), and the extra constraint consumes 32 bits) and since one merge process succeeds only with probability of \(2^{-34}\), he obtains a solution with probability \(2^{-8}\). Since he needs \(2^{30.32}\) solutions from the merge to have a good chance to verify the probabilistic part of the differential path, a total of \(2^{38.32}\) starting points will have to be generated and handled.

The attack starts at the end of Phase 1, with the path from Fig. 5. From there, the attacker generates \(2^{38.32}\) starting points in Phase 2, that is, \(2^{38.32}\) differential paths like the one from Fig. 6 (with the same step probabilities). In Phase 3, for each starting point, he tries \(2^{26}\) times to find a solution for the merge, with an average complexity of 19 RIPEMD-128 step computations per try. The final semi-free-start collision complexity is thus \(19 \cdot 2^{26+38.32}\) RIPEMD-128 step computations, which corresponds to \((19/128) \cdot 2^{64.32} = 2^{61.57}\) RIPEMD-128 compression function computations (there are 64 step computations in each branch).
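As a sanity check, the conversion from step computations to compression function calls can be recomputed numerically from the figures above:

```python
import math

# 2^(26 + 38.32) merge attempts, 19 step computations each.
log2_steps = math.log2(19) + 26 + 38.32
# One compression call costs 2 * 64 = 128 step computations.
log2_compressions = log2_steps - math.log2(128)
print(round(log2_compressions, 2))  # -> 61.57
```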

The merge process has been implemented, and we provide, in hexadecimal notation, an example of a message and chaining variable pair that verifies the merge (i.e., they follow the differential path from Fig. 4 until step 25 of the left branch and step 20 of the right branch). The second member of the pair is simply obtained by adding a difference on the most significant bit of \(M_{14}\).

$$\begin{aligned} \begin{array}{ccccccc} h_0 = \mathtt{0x1330db09} &{} \quad &{} h_1 = \mathtt{0xe1c2cd59} &{} \quad &{} h_2 = \mathtt{0xd3160c1d} &{} \quad &{} h_3 = \mathtt{0xd9b11816} \\ M_{0} = \mathtt{0x4b6adf53} &{} \quad &{} M_{1} = \mathtt{0x1e69c794} &{} \quad &{} M_{2} = \mathtt{0x0eafe77c} &{} \quad &{} M_{3} = \mathtt{0x35a1b389} \\ M_{4} = \mathtt{0x34a56d47} &{} \quad &{} M_{5} = \mathtt{0x0634d566} &{} \quad &{} M_{6} = \mathtt{0xb567790c} &{} \quad &{} M_{7} = \mathtt{0xa0324005} \\ M_{8} = \mathtt{0x8162d2b0} &{} \quad &{} M_{9} = \mathtt{0x6632792a} &{} \quad &{}M_{10} = \mathtt{0x52c7fb4a} &{} \quad &{}M_{11} = \mathtt{0x16b9ce57} \\ M_{12} = \mathtt{0x914dc223}&{} \quad &{}M_{13} = \mathtt{0x3bafc9de} &{} \quad &{}M_{14} = \mathtt{0x5402b983} &{} \quad &{}M_{15} = \mathtt{0xe08f7842} \\ \end{array} \end{aligned}$$

We measured the efficiency of our implementation in order to compare it with our theoretical complexity estimation. As a point of reference, we observed that, on the same computer, an optimized implementation of RIPEMD-160 (OpenSSL v.1.0.1c) performs \(2^{21.44}\) compression function computations per second. With 4 rounds instead of 5 and about 3/4 fewer operations per step, we extrapolated that RIPEMD-128 would run at \(2^{22.17}\) compression function computations per second. Our implementation performs \(2^{24.61}\) merge processes (both Phase 2 and Phase 3) per second on average, which therefore corresponds to a final semi-free-start collision complexity of \(2^{61.88}\) RIPEMD-128 compression function computations. While our practical results confirm our theoretical estimations, we emphasize that there is room for improvement, since our attack implementation is not really optimized. As a side note, we also verified experimentally that the probabilistic part in both the left and right branches can be fulfilled.

A last point needs to be checked: the complexity estimation for the generation of the starting points. Indeed, as many as \(2^{38.32}\) starting points are required at the end of Phase 2, and since the algorithm is quite heuristic, it is hard to analyze precisely. The amount of freedom degrees is not an issue, since we already saw in Sect. 4.1 that about \(2^{306.91}\) solutions are expected to exist for the differential path at the end of Phase 1. With our implementation, a completely new starting point takes about 5 minutes to generate on average, but from one such path we can directly derive \(2^{18}\) equivalent ones by randomizing \(M_7\). Using the OpenSSL implementation as reference, this amounts to \(2^{50.72}\) RIPEMD-128 computations to generate all the starting points we need in order to find a semi-free-start collision. This rough estimation is extremely pessimistic, since it does not even take into account the fact that, once a starting point is found, one can also randomize \(M_4\) and \(M_{11}\) to find many other valid candidates with a few operations. Finally, one may argue that with this method the starting points generated are not independent enough (in the backward direction when merging and/or in the forward direction when verifying probabilistically the linear part of the differential path). However, no such correlation was detected during our experiments, and previous attacks on similar hash functions [12, 14] showed that only a few rounds were enough to observe independence between bit conditions. In addition, even if some correlations existed, since we are looking for many solutions, the effect would be averaged among good and bad candidates.
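The \(2^{50.72}\) figure can be recomputed from the quantities given above (5 minutes per fresh path, \(2^{18}\) derived paths each, and the \(2^{22.17}\) compressions-per-second reference):

```python
import math

log2_fresh_paths = 38.32 - 18        # each fresh path yields 2^18 equivalents
log2_secs_per_path = math.log2(300)  # about 5 minutes per fresh path
log2_rate = 22.17                    # reference: compressions per second
total_cost = log2_fresh_paths + log2_secs_per_path + log2_rate
print(round(total_cost, 2))  # -> 50.72
```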

5.2 Collision for the RIPEMD-128 Compression Function

We described in the previous sections a semi-free-start collision attack on the full RIPEMD-128 compression function requiring \(2^{61.57}\) computations. It is clear from Fig. 6 that we can remove the last 4 steps of our differential path in order to attack a 60-step reduced variant of the RIPEMD-128 compression function. No difference will be present in the internal state at the end of the computation, and we directly obtain a collision, saving a factor of \(2^{4}\) over the full RIPEMD-128 attack complexity.

We also give in “Appendix 2” a slightly different freedom degrees utilization when attacking 63 steps of the RIPEMD-128 compression function (the first step being taken out) that saves a factor \(2^{1.66}\) over the collision attack complexity on the full primitive.

5.3 Distinguishers

The setting for the distinguisher is very simple. As a nonrandom property, the attacker will find one input m such that \(H(m) \oplus H(m \oplus {\varDelta }_I) = {\varDelta }_O\). In other words, he will find an input m such that, when a fixed and predetermined difference \({\varDelta }_I\) is applied to it, he observes another fixed and predetermined difference \({\varDelta }_O\) on the output. This problem is called the limited-birthday problem [9], because the fixed differences remove the ability of an attacker to use a birthday-like algorithm when H is a random function. The best-known algorithm to find such an input for a random function is to simply pick random inputs m and check whether the property is verified. This has a cost of \(2^{128}\) computations for a 128-bit output function.

Of course, considering the differential path we built in the previous sections, in our case we will use \({\varDelta }_O=0\), and \({\varDelta }_I\) is defined to contain no difference on the input chaining variable and only a difference on the most significant bit of \(M_{14}\). If we are able to find a valid input with fewer than \(2^{128}\) computations for RIPEMD-128, we obtain a distinguisher.

5.3.1 Distinguisher for the RIPEMD-128 Compression Function

A collision attack on the RIPEMD-128 compression function can already be considered a distinguisher. However, since the gap between the attack cost (\(2^{61.57}\)) and the generic cost (\(2^{128}\)) is very large, we can relax some of the conditions in the differential path to reduce the distinguisher's computational complexity. Indeed, we can straightforwardly relax the collision condition on the compression function finalization, as well as the condition in the last step of the left branch. Overall, the distinguisher complexity is \(2^{59.57}\), while the generic cost will be very slightly less than \(2^{128}\) computations, because only a small set of possible differences \({\varDelta }_O\) can now be reached on the output.

5.3.2 Distinguisher for the RIPEMD-128 Hash Function

There are two main distinctions between attacking the hash function and attacking the compression function. Firstly, when attacking the hash function, the input chaining variable is specified to be a fixed public IV. Secondly, a part of the message has to contain the padding.

Since the chaining variable is fixed, we cannot apply our merging algorithm as in Sect. 4. Instead, we utilize the available freedom degrees (the message words) to handle only one of the two nonlinear parts, namely the one in the right branch, because it is the most complex. We use the same method as in Phase 2 of Sect. 4, and we very quickly obtain a differential path such as the one in Fig. 7. One can remark that the first six message words inserted in the right branch are free (\(M_5\), \(M_{14}\), \(M_7\), \(M_{0}\), \(M_9\) and \(M_{2}\)), and we will fix them to merge the right branch to the predefined input chaining variable. The entirety of the left branch will be verified probabilistically (with probability \(2^{-84.65}\)), as well as the steps located after the nonlinear part in the right branch (from step 19, with probability \(2^{-19.75}\)). The bit condition on the IV can be handled by prepending a random message, and the few conditions in the early steps when computing backward are directly fulfilled when choosing \(M_2\) and \(M_9\).

Fig. 7

Differential path for the full RIPEMD-128 hash function distinguisher. The notations are the same as in [3] and are described in Table 5. The column \(\pi ^l_i\) (resp. \(\pi ^r_i\)) contains the indices of the message words that are inserted at each step i in the left branch (resp. right branch), which corresponds to \(\pi ^l_j(k)\) (resp. \(\pi ^r_j(k)\)) with \(i=16\cdot j + k\). The column \(\hbox {P}^l[i]\) (resp. \(\hbox {P}^r[i]\)) represents the \(\log _2()\) differential probability of step i in left (resp. right) branch. The column P[i] represents the cumulated probability (in \(\log _2()\)) until step i for both branches, i.e., \(\hbox {P}[i]=\prod _{j=63}^{j=i} (\hbox {P}^r[j] \cdot \hbox {P}^l[j])\)

Overall, adding the extra condition to obtain a collision after the finalization of the compression function, we end up with a complexity of \(2^{105.4}\) computations to get a collision after the first message block. Once this collision is found, we add an extra message block without difference to handle the padding and we obtain a collision for the whole hash function. In the ideal case, generating a collision for a 128-bit output hash function with a predetermined difference mask on the message input requires \(2^{128}\) computations, and we obtain a distinguisher for the full RIPEMD-128 hash function with \(2^{105.4}\) computations.
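The \(2^{105.4}\) figure follows directly from the probabilities given above:

```python
left_branch = 84.65   # log2 cost of verifying the whole left branch
right_tail = 19.75    # right branch after the nonlinear part (from step 19)
finalization = 1      # extra condition for a collision after the feed-forward
total_log2 = left_branch + right_tail + finalization
print(round(total_log2, 2))  # -> 105.4
```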

Since the first publication of our attack at the EUROCRYPT 2013 conference [13], this distinguisher has been improved by Iwamoto et al. [11]. They remarked that one can convert a semi-free-start collision attack on a compression function into a limited-birthday distinguisher for the entire hash function. They use our semi-free-start collision finding algorithm on the RIPEMD-128 compression function, but they need to find about \(2^{33.2}\) valid input pairs. As explained in Sect. 4.1, the amount of freedom degrees is sufficient for this requirement to be fulfilled.

6 Conclusion

In this article, we proposed a new cryptanalysis technique for RIPEMD-128 that led to a collision attack on the full compression function as well as a distinguisher for the full hash function. We believe that our method still has room for improvements, and we expect a practical collision attack on the full RIPEMD-128 compression function to be found during the coming years. While our results do not endanger the collision resistance of the RIPEMD-128 hash function as a whole, we emphasize that semi-free-start collision attacks are a strong warning sign indicating that RIPEMD-128 might not be as secure as the community expected. Considering the history of the attacks on the MD5 compression function [5, 6], the MD5 hash function [28] and then MD5-protected certificates [24], we believe that a function other than RIPEMD-128 should be used for new security applications (we also remark that, considering today's computing power, the RIPEMD-128 output size is too small to provide sufficient security with regard to collision attacks).

Aside from reducing the complexity of the collision attack on the RIPEMD-128 compression function, future work includes applying our methods to RIPEMD-160 and other functions based on parallel branches. Since the first publication of our attacks at the EUROCRYPT 2013 conference [13], our semi-free-start search technique has been used by Mendel et al. [17] to attack the RIPEMD-160 compression function. So far, this direction has turned out to be less efficient than expected for this scheme, due to a much stronger step function.

It would also be interesting to scrutinize whether there might be any way to use some other freedom degrees techniques (neutral bits, message modifications, etc.) on top of our merging process.