
The suffix-free-prefix-free hash function construction and its indifferentiability security analysis

  • Regular Contribution
International Journal of Information Security

Abstract

In this paper, we observe that in the seminal work on the indifferentiability analysis of iterated hash functions by Coron et al. and in subsequent works, the initial value \((IV)\) of hash functions is fixed. In addition, these indifferentiability results do not depend on the Merkle–Damgård (MD) strengthening in the padding functionality of the hash functions. We propose a generic \(n\)-bit-iterated hash function framework, called suffix-free-prefix-free (SFPF), that is based on an \(n\)-bit compression function, works for arbitrary \(IV\)s, and does not possess MD strengthening. We formally prove that SFPF is indifferentiable from a random oracle (RO) when the compression function is viewed as a fixed input-length random oracle (FIL-RO). We show that some hash function constructions proposed in the literature fit in the SFPF framework, while others that do not fit in this framework are not indifferentiable from a RO. We also show that the SFPF hash function framework with the provision of MD strengthening generalizes any \(n\)-bit-iterated hash function that is based on an \(n\)-bit compression function, has an \(n\)-bit chaining value, and is proven indifferentiable from a RO.

Fig. 1
Fig. 2
Fig. 3


References

  1. Bellare, M., Canetti, R., Krawczyk, H.: Pseudorandom functions revisited: the cascade construction and its concrete security. In: Proceedings of the 37th Annual IEEE Symposium on Foundations of Computer Science, FOCS’96, pp. 514–523. IEEE Computer Society Press (1996)

  2. Bellare, M., Ristenpart, T.: Multi-property-preserving hash domain extension and the EMD transform. In: Proceedings of ASIACRYPT 2006, vol. 4284 of Lecture Notes in Computer Science, pp. 299–314. Springer (2006)

  3. Bellare, M., Rogaway, P.: Random oracles are practical: a paradigm for designing efficient protocols. In: Proceedings of CCS ’93, pp. 62–73. ACM Press (1993)

  4. Chang, D., Lee, S., Nandi, M., Yung, M.: Indifferentiable security analysis of popular hash functions with prefix-free padding. In: Proceedings of ASIACRYPT 2006, vol. 4284 of Lecture Notes in Computer Science, pp. 283–298. Springer (2006)

  5. Chang, D., Nandi, M.: Improved indifferentiability security analysis of chopMD hash function. In: Proceedings of FSE 2008, vol. 5086 of Lecture Notes in Computer Science, pp. 429–443. Springer (2008)

  6. Chang, D., Sung, J., Hong, S., Lee, S.: Indifferentiable security analysis of choppfMD, chopMD, chopMDP, chopWPH, chopNI, chopEMD, chopCS, and chopESh hash domain extensions. Cryptology ePrint archive, report 2008/407 (2008)

  7. Coron, J.-S., Dodis, Y., Malinaud, C., Puniya, P.: Merkle–Damgård revisited: how to construct a hash function. In: Proceedings of CRYPTO 2005, vol. 3621 of Lecture Notes in Computer Science, pp. 430–448. Springer (2005)

  8. Coron, J.-S., Patarin, J., Seurin, Y.: The random oracle model and the ideal cipher model are equivalent. In: Proceedings of CRYPTO 2008, vol. 5157 of Lecture Notes in Computer Science, pp. 1–20. Springer (2008)

  9. Damgård, I.B.: A design principle for hash functions. In: Proceedings of CRYPTO 1989, vol. 435 of Lecture Notes in Computer Science, pp. 416–427. Springer (1989)

  10. Dobbertin, H., Bosselaers, A., Preneel, B.: RIPEMD-160: a strengthened version of RIPEMD. In: Proceedings of FSE 1996, vol. 1039 of Lecture Notes in Computer Science, pp. 71–82. Springer (1996)

  11. Gong, Z., Lai, X., Chen, K.: A synthetic indifferentiability analysis of some block-cipher-based hash functions. Des. Codes Cryptogr. 48(3), 293–305 (2008)


  12. Hirose, S., Park, J.H., Yun, A.: A simple variant of the Merkle–Damgård scheme with a permutation. In: Proceedings of ASIACRYPT 2007, vol. 4833 of Lecture Notes in Computer Science, pp. 113–129. Springer (2007)

  13. Joux, A.: Multicollisions in iterated hash functions. Application to cascaded constructions. In: Proceedings of CRYPTO 2004, vol. 3152 of Lecture Notes in Computer Science, pp. 306–316. Springer (2004)

  14. Kelsey, J., Kohno, T.: Herding hash functions and the Nostradamus attack. In: Proceedings of EUROCRYPT 2006, vol. 4004 of Lecture Notes in Computer Science, pp. 183–200. Springer (2006)

  15. Kelsey, J., Schneier, B.: Second preimages on n-bit hash functions for much less than \(2^{n}\) work. In: Proceedings of EUROCRYPT 2005, vol. 3494 of Lecture Notes in Computer Science, pp. 474–490. Springer (2005)

  16. Lai, X., Massey, J.L.: Hash functions based on block ciphers. In: Proceedings of EUROCRYPT 1992, vol. 658 of Lecture Notes in Computer Science, pp. 53–66. Springer (1992)

  17. Lucks, S.: A failure-friendly design principle for hash functions. In: Proceedings of ASIACRYPT 2005, vol. 3788 of Lecture Notes in Computer Science, pp. 474–494. Springer (2005)

  18. Maurer, U.M., Renner, R., Holenstein, C.: Indifferentiability, impossibility results on reductions, and applications to the random oracle methodology. In: Proceedings of TCC ’04, vol. 2951 of Lecture Notes in Computer Science, pp. 21–39. Springer (2004)

  19. Merkle, R.C.: One way hash functions and DES. In: Proceedings of CRYPTO 1989, vol. 435 of Lecture Notes in Computer Science, pp. 428–446. Springer (1989)

  20. National Institute of Standards and Technology: FIPS PUB 180-2, Secure Hash Standard, Aug 2002

  21. Preneel, B.: Analysis and design of cryptographic hash functions. Thesis (Ph.D.), Katholieke Universiteit Leuven, Leuven, Belgium, Jan 1993


Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments on the paper. We also thank Colin Boyd and Choudary Gorantla for their comments on an earlier version of this paper and Shoichi Hirose for his discussions on this topic.

Author information

Correspondence to Erik Zenner.

Additional information

A portion of this project and the initial submission were done while the author was a postdoctoral researcher in the Department of Mathematics, Technical University of Denmark, sponsored by the Danish Council for Independent Research–Technology and Production Sciences (FTP), grant number 09-066486/FTP. Part of this work was done while the author was visiting the C R Rao Advanced Institute of Mathematics, Statistics and Computer Science (AIMSCS), India. A portion of this project and the initial submission were done while the author was employed in the Department of Mathematics, Technical University of Denmark.

Appendices

Appendix 1: Hash function constructions

Algorithms 5–10 describe some of the hash function constructions used in the main text.

Algorithms 5–10 (figures a5–a10)

Appendix 2: Proof of Theorem 2

In the following, we provide a proof for Theorem 2.

Proof

Let \(\mathcal A \) be an adversary whose goal is to distinguish \((H^f,(f_1,f_2,f_3))\) from \((R,S^{f})\) by asking \(q\) non-repetitive queries, where \(q=\tau q^{\bar{H^f}}+q^{\bar{f_1}}+q^{\bar{f_2}}+q^{\bar{f_3}}\). Recall from the previous discussion that the simulator \(S^{f}\) answers each new query of \(\mathcal A \) to \(\bar{f_1}\) and \(\bar{f_2}\) with a random value. The simulator \(S^{f}\) answers each new query to \(\bar{f_3}\) by checking whether the current query can be combined with previous entries in \(T_{\bar{f_1}},\,T_{\bar{f_2}}\) and \(T_{\bar{f_3}}\), so that its answers remain consistent with the queries to \(\bar{H^f}\) and \(R\). Since the tables \(T_{\bar{f_1}},\,T_{\bar{f_2}}\) and \(T_{\bar{f_3}}\) together hold at most \(q\) entries, the \(i\)th query is checked against at most \(i\) entries, and the running time of the simulator is \(t_{S}\le \sum_{i=1}^{q} i = q(q+1)/2\). The running time \(t_{A}\) that \(\mathcal A \) spends to maximize its advantage for \(q\) queries is unrestricted.
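The query budget and the simulator's running-time bound are plain arithmetic. The following Python sketch makes them concrete; the function names are hypothetical, and it assumes that the \(i\)th query is compared against at most \(i\) stored table entries, which is one way to arrive at \(t_S\le q(q+1)/2\).

```python
def total_queries(tau: int, q_h: int, q_f1: int, q_f2: int, q_f3: int) -> int:
    """q = tau * q_H + q_f1 + q_f2 + q_f3: each H-query may cost up to tau blocks."""
    return tau * q_h + q_f1 + q_f2 + q_f3


def simulator_work_bound(q: int) -> int:
    """Worst-case table comparisons, assuming query i is compared with <= i entries."""
    work = 0
    for i in range(1, q + 1):
        work += i  # the i-th query is compared with at most i entries
    return work


if __name__ == "__main__":
    q = total_queries(tau=4, q_h=3, q_f1=2, q_f2=5, q_f3=1)
    assert simulator_work_bound(q) == q * (q + 1) // 2
    print(q, simulator_work_bound(q))
```

The loop form mirrors the per-query counting argument; the assertion confirms it agrees with the closed form \(q(q+1)/2\).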

We analyze the advantage of \(\mathcal A \) by considering Games \(G_{i}\) for \(i\in \{0,1,\ldots ,7\}\), which are informally described below (formal descriptions of the games are given in “Appendix 4”). We denote \(\mathcal A \) with access to (playing) Game \(G_i\) by \(\mathcal A ^{G_i}\). For each game \(G_{i}\), we let \(p_i=Pr[\mathcal A ^{G_i}\Rightarrow 1 ]\). We start with the game \(G_{0}\), which directly communicates with \((H^{f},(f_1,f_2,f_3))\), and complete the proof with the game \(G_{7}\), which emulates \((R,S^{f})\). The intermediate games \(G_{1},\ldots ,G_{6}\) gradually transform \(G_0\) into \(G_7\). The sequence of games is as follows:

  • Game 0 \((G_0)\) : This game models the interaction of \(\mathcal A \) with \(H^f,f_1,f_2\) and \(f_3\).

  • Game 1 \((G_1)\) : This game exactly emulates \(H^f\) and \(f_1,f_2\) and \(f_3\). It is identical to \(G_{0}\) except that the FIL-ROs \(f_1,f_2\) and \(f_3\) are sampled in a “lazy” manner. Namely, we introduce a controller \(C_{H}\) that keeps the history of all queries to \(\bar{f_1},\bar{f_2}\) and \(\bar{f_3}\) in the tables \(T_{\bar{f_1}},\,T_{\bar{f_2}}\), and \(T_{\bar{f_3}}\), respectively. Initially, the tables are empty. Upon receiving a query from \(\mathcal A \) to \(\bar{f_1},\bar{f_2}\) or \(\bar{f_3}\), \(C_{H}\) first checks the respective table \(T_{\bar{f_1}},\,T_{\bar{f_2}}\), or \(T_{\bar{f_3}}\) for an entry corresponding to the query; if one is found, \(C_{H}\) returns that entry to \(\mathcal A \), which keeps the answers consistent. Otherwise, \(C_{H}\) returns a random value for the query and records the new query–answer pair in the table. In addition, \(C_{H}\) uses a subroutine, denoted \(IH\), that emulates the iteration process of the SFPF hash function \(H^{f}\) to answer the queries of \(\mathcal A \) to \(\bar{H^{f}}\). Hence, \(G_{1}\) is a syntactic representation of \(G_{0}\), and \(p_{1}=p_{0}\).

  • Game 2 \((G_2)\) : This game is identical to \(G_{1}\) except that \(C_{H}\) maintains trees to detect the connections between queries and responses. The functions \(GetPath\) and \(NewPath\) (described earlier) are used to access and update the trees, respectively. The only change from \(G_1\) to \(G_2\) is the access and update of the trees for new queries to \(\bar{f_1},\bar{f_2}\) and \(\bar{f_3}\). This has no effect on the random selection of the values returned to \(\mathcal A \). Thus, \(p_{2}=p_{1}\).

  • Game 3 \((G_3)\) : In this game, \(C_{H}\) no longer lets \(IH\) communicate directly with \(\bar{f_1},\bar{f_2}\) and \(\bar{f_3}\); instead, it modifies \(IH\) so that the FIL-ROs are simulated inside \(IH\). This has no effect on the values returned to \(\mathcal A \). Thus, \(p_{3}=p_{2}\).

  • Game 4 \((G_4)\) : This game is identical to \(G_{3}\) except that, for each new query to \(\bar{f_3}\), \(C_{H}\) searches the trees for a root connected to the current query to \(\bar{f_3}\). If \(C_{H}\) finds such a path, it concatenates the message blocks included in that path with the message block of the current query to \(\bar{f_3}\) and queries the result to \(IH\). Since \(IH\) returns a random value for this query, this change does not alter \(\mathcal A \)’s advantage compared with \(G_3\). Thus, \(p_{4}=p_{3}\).

  • Game 5 \((G_5)\) : In this game, \(C_H\) restricts the values returned to \(\mathcal A \): the value returned for a query to \(\bar{f_1}\) or \(\bar{f_2}\) must not collide with any value in \(T_{\bar{f_1}}^{R}\bigcup T_{\bar{f_2}}^{Q_y} \bigcup T_{\bar{f_2}}^{R}\bigcup T_{\bar{f_3}}^{Q_y}\). Any event that causes \(C_{H}\) to terminate the game is called a \(bad\) event; these events are described below. Games \(G_4\) and \(G_5\) are identical until a bad event is set to true in \(G_5\), which is denoted by \(bad \leftarrow \mathrm{true}\). Hence, the maximum advantage of \(\mathcal A \) in distinguishing \(G_5\) from \(G_4\) (the transition from \(G_4\) to \(G_5\)) is at most the maximum probability that a \(bad\) event occurs. Thus:

    $$\begin{aligned}&\left| Pr [\mathcal A ^{G_5}\Rightarrow 1] - Pr [\mathcal A ^{G_4}\Rightarrow 1]\right|\\&\le Pr [\mathcal A ^{G_5}\Rightarrow (bad \leftarrow true)] \end{aligned}$$

    We denote by \(Pr^{bad_{\bar{f_1}}}_{G_5}\) and \(Pr^{bad_{\bar{f_2}}}_{G_5}\) the probabilities that the bad events \(bad_{\bar{f_1}}\) and \(bad_{\bar{f_2}}\) (described below), respectively, are set to true in \(G_5\). By the union bound:

    $$\begin{aligned} Pr [\mathcal A ^{G_5}\Rightarrow (bad \leftarrow true)] \le Pr^{bad_{\bar{f_1}}}_{G_5}+Pr^{bad_{\bar{f_2}}}_{G_5} \end{aligned}$$

    Now we bound each of the bad events as follows:

    1.

      \(bad_{\bar{f_1}}\): This event is set to true if the random value currently selected for a query to \(\bar{f_1}\) collides with a value in \((T_{\bar{f_1}}^{R}\bigcup T_{\bar{f_2}}^{Q_y} \bigcup T_{\bar{f_2}}^{R}\bigcup T_{\bar{f_3}}^{Q_y} )\). For the \(i\)th query to \(\bar{f_1}\), there are \(i-1\) previously defined input–output pairs for \(\bar{f_1}\) (contributing \(i-1\) range points) and up to \(q^{\bar{f_2}}\) (resp. \(q^{\bar{f_3}}\)) previous queries to \(\bar{f_2}\) (resp. \({\bar{f_3}}\)); each query to \(\bar{f_2}\) contributes both a chaining input and a range point. Thus:

      $$\begin{aligned} \left| (T_{\bar{f_1}}^{R}\bigcup T_{\bar{f_2}}^{Q_y} \bigcup T_{\bar{f_2}}^{R}\bigcup T_{\bar{f_3}}^{Q_y}) \right|\le i-1+2q^{\bar{f_2}}+q^{\bar{f_3}} \end{aligned}$$

      Since the answer to the \(i\)th query to \(\bar{f_1}\) is selected uniformly at random from \(\{0,1\}^n\), the probability that it hits one of the values that set this bad event to true is at most \((i-1+2q^{\bar{f_2}}+q^{\bar{f_3}})/2^n\). Summing this probability over all queries to \(\bar{f_1}\) bounds the probability \(Pr^{bad_{\bar{f_1}}}_{G_5}\) that \(bad_{\bar{f_1}}\) occurs:

      $$\begin{aligned}&Pr^{bad_{\bar{f_1}}}_{G_5}\le \sum ^{q^{\bar{f_1}}}_{i=1} \frac{(i-1+2q^{\bar{f_2}}+q^{\bar{f_3}})}{2^n}\\&=\frac{1}{2^n} \left(\sum ^{q^{\bar{f_1}}}_{i=1}i \!-\! \sum ^{q^{\bar{f_1}}}_{i=1}1 \!+\! 2q^{\bar{f_2}}\sum ^{q^{\bar{f_1}}}_{i=1}1\!+\! q^{\bar{f_3}}\sum ^{q^{\bar{f_1}}}_{i=1}1\right)\\&\le \frac{{(q^{\bar{f_1}})^2}+ 2q^{\bar{f_1}}(2q^{\bar{f_2}}+q^{\bar{f_3}})}{2^{n+1}}\\&\le \frac{{q^{\bar{f_1}}}({q^{\bar{f_1}}}+2q^{\bar{f_2}}+q^{\bar{f_3}})}{2^{n}} \end{aligned}$$
    2.

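      \(bad_{\bar{f_2}}\): This event is set to true if the random value currently selected for a query to \(\bar{f_2}\) collides with a value in \((T_{\bar{f_1}}^{R}\bigcup T_{\bar{f_2}}^{Q_y} \bigcup T_{\bar{f_2}}^{R}\bigcup T_{\bar{f_3}}^{Q_y})\). For the \(i\)th query to \(\bar{f_2}\), the \(i-1\) previous queries to \(\bar{f_2}\) contribute at most \(2(i-1)\) values, and the queries to \(\bar{f_1}\) and \(\bar{f_3}\) contribute at most \(q^{\bar{f_1}}+q^{\bar{f_3}}\) values. Following the calculation for \(bad_{\bar{f_1}}\), we bound the probability \(Pr^{bad_{\bar{f_2}}}_{G_5}\) that \(bad_{\bar{f_2}}\) occurs as follows: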

      $$\begin{aligned} Pr^{bad_{\bar{f_2}}}_{G_5}&\le \sum ^{q^{\bar{f_2}}}_{i=1} \frac{(2(i-1)+q^{\bar{f_1}}+q^{\bar{f_3}})}{2^n}\\&= \frac{q^{\bar{f_2}}(q^{\bar{f_2}}-1)+ q^{\bar{f_2}}(q^{\bar{f_1}}+q^{\bar{f_3}})}{2^{n}}\\&\le \frac{{(q^{\bar{f_2}})^2}+ q^{\bar{f_2}}(q^{\bar{f_1}}+q^{\bar{f_3}})}{2^{n}}\\&= \frac{{ q^{\bar{f_2}}(q^{\bar{f_1}}+q^{\bar{f_2}}+q^{\bar{f_3}})}}{2^{n}} \end{aligned}$$

    Thus:

    $$\begin{aligned}&\left| Pr [\mathcal A ^{G_5}\Rightarrow 1] - Pr [\mathcal A ^{G_4}\Rightarrow 1]\right|\le Pr^{bad_{\bar{f_1}}}_{G_5}+Pr^{bad_{\bar{f_2}}}_{G_5}\\&\quad \le \frac{{q^{\bar{f_1}}}({q^{\bar{f_1}}}+2q^{\bar{f_2}}+q^{\bar{f_3}})}{2^{n}}+ \frac{q^{\bar{f_2}}(q^{\bar{f_1}}+q^{\bar{f_2}}+q^{\bar{f_3}})}{2^{n}}\\&\quad \le \frac{(q^{\bar{f_1}}+q^{\bar{f_2}})(q^{\bar{f_1}}+2q^{\bar{f_2}}+q^{\bar{f_3}})}{2^{n}} \end{aligned}$$
  • Game 6 \((G_6)\) : In game \(G_6\), \(C_H\) changes \(IH\) so that it simply returns a random value for any new query. The implementation of \(IH(y,M)\) in \(G_5\) follows the \(SFPF\) iteration, while \(G_6\) returns a random value for any new query to \(\bar{H^{f}}\). The values returned for the queries to \(\bar{H^f}\) in \(G_6\) and \(G_5\) are determined by the \(IH\) subroutine. Both games return random values for any new query \((y,M)\) to \(\bar{H^{f}}\), where \(M\) consists of \(N\) message blocks \(M_{i}\) for \(i=1,\ldots ,N\). \(G_{5}\) answers such queries by invoking \(\bar{f_1}(y_0,M_1)\), \(\bar{f_2}(y_{i-1},M_i)\) for \(2\le i \le N-1\), and \(\bar{f_3} (y_{N-1},M_N)\), in that order. In contrast, for any new query \((y,M)\) to \(\bar{H^{f}}\), \(G_6\) does not invoke \(\bar{f_1}, \bar{f_2}\) and \(\bar{f_3}\) and selects the answer uniformly at random from \(\{0,1\}^{n}\). Hence, in \(G_6\) the cardinalities of \(T_{\bar{f_1}}^{R},\,T_{\bar{f_2}}^{Q_y},T_{\bar{f_2}}^{R}\), and \(T_{\bar{f_3}}^{Q_y}\) decrease by up to \(q^{\bar{H^f}}, (\tau -2)q^{\bar{H^f}}, (\tau -2)q^{\bar{H^f}}\) and \(q^{\bar{H^f}}\), respectively, which reduces the probability of a bad event in \(G_6\) compared with \(G_5\). Hence, we have:

    $$\begin{aligned}&\left| Pr [\mathcal A ^{G_6}\Rightarrow 1] - Pr [\mathcal A ^{G_5}\Rightarrow 1]\right|\le \left| Pr^{bad_{\bar{f_1}}}_{G_5}\!-\!Pr^{bad_{\bar{f_1}}}_{G_6}\right|\\&\qquad \!+\!\left|Pr^{bad_{\bar{f_2}}}_{G_5}\!-\!Pr^{bad_{\bar{f_2}}}_{G_6} \right|\le Pr^{bad_{\bar{f_1}}}_{G_5}\!+\!Pr^{bad_{\bar{f_2}}}_{G_5}\\&\quad \le \frac{(q^{\bar{f_1}}+q^{\bar{f_2}})(q^{\bar{f_1}}+2q^{\bar{f_2}}+q^{\bar{f_3}})}{2^{n}} \end{aligned}$$
  • Game 7 \((G_7)\) : We finish with the “ideal” game \(G_7\), which exactly simulates \(R\) and \(S^{f}\). In this game, \(\bar{H^f}\) no longer forwards its queries to \(IH\) and responds to any new query with a random value. This has no effect on the values returned to \(\mathcal A \). Thus, in \(G_{7}\), \(\mathcal A \) gains no additional advantage over \(G_{6}\), and \(p_{7}=p_{6}\). In this game, \(\bar{H^f}\) is exactly the same as \(R\), and the controller \(C_{H}\) is precisely equivalent to \(S^{f}\), our proposed simulator for \(f_{1},\,f_{2}\) and \(f_{3}\).
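To make the game hops concrete, the following Python sketch models a controller in the spirit of \(C_H\) in \(G_7\), that is, the simulator \(S^f\): lazily sampled tables for \(\bar{f_1}\) and \(\bar{f_2}\), a map from chaining values to root paths playing the role of \(GetPath\)/\(NewPath\), and \(\bar{f_3}\) answers routed through \(R\) whenever a path to a root exists. It is a hypothetical illustration, not the paper's algorithm: it assumes \(n=128\)-bit chaining values and message blocks, messages of at least two blocks, and the iteration \(y_1=f_1(IV,M_1)\), \(y_i=f_2(y_{i-1},M_i)\), \(H=f_3(y_{N-1},M_N)\); it also ignores (rather than aborts on) the collisions that \(G_5\) treats as bad events.

```python
import os

N_BYTES = 16  # hypothetical n = 128 bits for chaining values and message blocks


def rand_value() -> bytes:
    """Uniform n-bit string; only called when a query is new (lazy sampling)."""
    return os.urandom(N_BYTES)


class LazyRO:
    """Variable-input-length random oracle R(IV, M), sampled lazily."""

    def __init__(self):
        self.table = {}

    def __call__(self, iv: bytes, blocks) -> bytes:
        key = (iv, tuple(blocks))
        if key not in self.table:
            self.table[key] = rand_value()
        return self.table[key]


class Simulator:
    """Sketch of S^f: random answers for f1/f2, f3 kept consistent with R."""

    def __init__(self, ro: LazyRO):
        self.ro = ro
        self.t1, self.t2, self.t3 = {}, {}, {}  # T_f1, T_f2, T_f3
        self.paths = {}  # chaining value -> (IV, [M_1, ..., M_i]) from a root

    def f1(self, y: bytes, m: bytes) -> bytes:
        if (y, m) not in self.t1:
            out = rand_value()
            self.t1[(y, m)] = out
            # New root path; a collision on `out` would be a bad event in G5.
            self.paths.setdefault(out, (y, [m]))
        return self.t1[(y, m)]

    def f2(self, y: bytes, m: bytes) -> bytes:
        if (y, m) not in self.t2:
            out = rand_value()
            self.t2[(y, m)] = out
            if y in self.paths:  # extend the path that ends at y
                iv, blocks = self.paths[y]
                self.paths.setdefault(out, (iv, blocks + [m]))
        return self.t2[(y, m)]

    def f3(self, y: bytes, m: bytes) -> bytes:
        if (y, m) not in self.t3:
            if y in self.paths:  # path found: the answer must agree with R
                iv, blocks = self.paths[y]
                self.t3[(y, m)] = self.ro(iv, blocks + [m])
            else:  # no path to a root: a fresh random value suffices
                self.t3[(y, m)] = rand_value()
        return self.t3[(y, m)]


def sfpf_iterate(sim: Simulator, iv: bytes, blocks) -> bytes:
    """IH-style iteration over the simulated functions; assumes len(blocks) >= 2."""
    y = sim.f1(iv, blocks[0])
    for m in blocks[1:-1]:
        y = sim.f2(y, m)
    return sim.f3(y, blocks[-1])
```

For a message of two or more blocks, `sfpf_iterate(sim, iv, msg)` and a direct query `ro(iv, msg)` return the same value, which is the consistency between \(\bar{H^f}\) and \(\bar{f_3}\) that the controller must maintain.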

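We complete the proof by combining Games 0 to 7. Note that \(G_0\) emulates \(H^f\) and \(f_1, f_2\) and \(f_3\), and \(G_7\) exactly emulates \(R\) and \(S^{f}\). Since \(p_0=p_1=p_2=p_3=p_4\) and \(p_6=p_7\), the triangle inequality gives \(|p_0-p_7|\le |p_5-p_4|+|p_6-p_5|\), and the bounds derived for Games 5 and 6 yield: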

$$\begin{aligned}&Adv^{indif}_{R,S}(\mathcal A )=\left|Pr [\mathcal A ^{H^f,(f_1,f_2,f_3)}\Rightarrow 1]-Pr [\mathcal A ^{R,S^{f}}\Rightarrow 1] \right| \\&\quad \le 2\times ( Pr^{bad_{\bar{f_1}}}_{G_5}+Pr^{bad_{\bar{f_2}}}_{G_5})\\&\quad \le 2\times \frac{(q^{\bar{f_1}}+q^{\bar{f_2}})(q^{\bar{f_1}}+2q^{\bar{f_2}}+q^{\bar{f_3}})}{2^{n}} \end{aligned}$$

Since \(q^{\bar{f_1}}+q^{\bar{f_2}}\le q\) and \(q^{\bar{f_1}}+2q^{\bar{f_2}}+q^{\bar{f_3}}\le 2q\), this simplifies to

$$\begin{aligned} Adv^{indif}_{R,S}(\mathcal A )= \epsilon \le \frac{4q^2}{2^{n}} \end{aligned}$$
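The chain of inequalities for \(Pr^{bad_{\bar{f_1}}}_{G_5}\), \(Pr^{bad_{\bar{f_2}}}_{G_5}\), their combined bound, and the final \(4q^2/2^n\) estimate can be sanity-checked with exact rational arithmetic. The Python sketch below (function names hypothetical) evaluates the original sums term by term and compares them with the closed forms used in the proof; it checks the algebra on concrete query counts only.

```python
from fractions import Fraction
from itertools import product


def bad_f1_sum(q1: int, q2: int, q3: int, n: int) -> Fraction:
    """Sum_{i=1}^{q1} (i - 1 + 2*q2 + q3) / 2^n, the bound on Pr[bad_f1] in G5."""
    return sum((Fraction(i - 1 + 2 * q2 + q3, 2 ** n) for i in range(1, q1 + 1)), Fraction(0))


def bad_f2_sum(q1: int, q2: int, q3: int, n: int) -> Fraction:
    """Sum_{i=1}^{q2} (2*(i - 1) + q1 + q3) / 2^n, the bound on Pr[bad_f2] in G5."""
    return sum((Fraction(2 * (i - 1) + q1 + q3, 2 ** n) for i in range(1, q2 + 1)), Fraction(0))


def check_bounds(q1: int, q2: int, q3: int, tau: int, q_h: int, n: int) -> Fraction:
    """Check each inequality of the derivation; return 2 * (combined bound)."""
    q = tau * q_h + q1 + q2 + q3
    b1 = Fraction(q1 * (q1 + 2 * q2 + q3), 2 ** n)
    b2 = Fraction(q2 * (q1 + q2 + q3), 2 ** n)
    combined = Fraction((q1 + q2) * (q1 + 2 * q2 + q3), 2 ** n)
    assert bad_f1_sum(q1, q2, q3, n) <= b1
    assert bad_f2_sum(q1, q2, q3, n) <= b2
    assert b1 + b2 <= combined
    assert 2 * combined <= Fraction(4 * q * q, 2 ** n)
    return 2 * combined


if __name__ == "__main__":
    for q1, q2, q3, qh in product(range(6), repeat=4):
        check_bounds(q1, q2, q3, tau=3, q_h=qh, n=16)
    print("all inequalities hold on the grid")
```

Using `Fraction` avoids floating-point rounding, so each `<=` comparison is exact.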

\(\square \)

Appendix 3: Simulator for the SFPF-N hash function

In this section, we present a simulator for the SFPF-N hash function in Algorithm 11. This simulator emulates \(f_1\) and \(f_2\) such that SFPF-N is indifferentiable from \(R\). For simplicity and without loss of generality, this simulator assumes that the entire last block is used for MD strengthening. Its running time is \(t_S=O(q^{2})\), and \(\mathcal A \)’s advantage after \(q\) queries is bounded by \(\epsilon \le O(\tau ^{2} \cdot q^2 \cdot 2^{-n})\).
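The assumption that the entire last block is used for MD strengthening can be illustrated with a padding routine. The Python sketch below is hypothetical and is not the SFPF-N padding of Algorithm 11: it assumes 128-bit blocks, pads the message with a single 1 bit followed by 0 bits up to a block boundary, and then appends one full block that encodes the message length in bits.

```python
def md_strengthen(message: bytes, block_bytes: int = 16) -> list:
    """Pad with 10* to a block boundary, then append a full block holding the bit length."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"                        # the single '1' bit (byte-aligned)
    padded += b"\x00" * (-len(padded) % block_bytes)  # zero bits up to a block boundary
    padded += bit_len.to_bytes(block_bytes, "big")    # dedicated length (strengthening) block
    return [padded[i:i + block_bytes] for i in range(0, len(padded), block_bytes)]


if __name__ == "__main__":
    blocks = md_strengthen(b"abc")
    print(len(blocks), blocks[-1].hex())
```

With this padding, the last block processed by the iteration carries only the message length, which is the situation the simulator assumes.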

Algorithm 11 (figure a11)

Appendix 4: Formal description of the Games used in the indifferentiability analysis of the SFPF hash function

In this section, we provide figures that formally describe the games used in the indifferentiability analysis of the SFPF hash function. See Figs. 4–8.

Fig. 4

\(G_0\) representation

Fig. 5

\(G_1\) (boxes removed) and \(G_2\) (boxes included) representation

Fig. 6

\(G_3\) (boxes removed) and \(G_4\) (boxes included) representation

Fig. 7

\(G_6\) and \(G_5\) representation with their \(IH\) subfunctions

Fig. 8

\(G_7\) representation

Cite this article

Bagheri, N., Gauravaram, P., Knudsen, L.R. et al. The suffix-free-prefix-free hash function construction and its indifferentiability security analysis. Int. J. Inf. Secur. 11, 419–434 (2012). https://doi.org/10.1007/s10207-012-0175-4

  • DOI: https://doi.org/10.1007/s10207-012-0175-4
