1 Introduction

Chaos, a ubiquitous phenomenon in nature, is a deterministic, random-like process generated by nonlinear dynamical systems. Chaotic dynamical systems have the following main characteristics: sensitivity to tiny changes in initial conditions, random-like behavior, ergodicity, unstable periodic orbits, desirable diffusion and confusion properties, and a one-way property. Because of these properties, chaotic systems are good candidates for use in cryptography. Existing research includes chaos-based symmetric encryption [1–7], security protocols [8, 9], asymmetric encryption [10, 11], and hash functions [12–18].

A hash function is one of the major tools in cryptography and is usually used for integrity protection in conjunction with message authentication and digital signature schemes. Well-known hash functions such as MD4, MD5, and SHA-1 [19–21] are realized by complicated methods based on logical operations or on multi-round iteration of available ciphers [22]. Recently, weaknesses have been discovered in the widely used hash functions MD5 and SHA-1: collision attacks on MD5 are easy to carry out [23–25], and the collision attack on SHA-1 is close to practical [26]. Consequently, the design of secure and efficient hash functions is attracting more and more attention.

In recent years, there has been considerable progress in the construction of one-way hash functions based on chaotic maps, including 1D piecewise linear/nonlinear chaotic maps [27, 28], a 2D nonlinear piecewise linear chaotic map [29], high-dimensional chaotic maps [30], chaotic neural networks [31], hyperchaos [32], and chaotic S-boxes [33]. This indicates extensive effort in constructing hash functions with new methods, although some of these algorithms have been proven to be insecure [22, 34–37].

Recently, high-dimensional dynamical systems have become an attractive research field in many scientific areas, particularly in computer science, and have received considerable attention in secure communication. Chaotic maps of this kind have better mixing properties and higher complexity, and the time series they generate are more difficult, or even impossible, to predict. These properties make them good candidates for secure communications. To the best of our knowledge, only a few chaos-based hash functions built on high-dimensional dynamical systems have been reported [30, 32, 38]. In a related application, an efficient image encryption algorithm based on a high-dimensional chaotic map is proposed in [4]. The high-dimensional chaotic maps of [4] possess remarkable dynamical properties such as an invariant measure, ergodicity, the possibility of KS-entropy calculation, and bifurcation without any period doubling [4, 39, 40].

Hence, in this paper, a new scheme for constructing a parallel hash function based on a high-dimensional trigonometric map is proposed, and its efficiency is investigated in detail. The proposed algorithm fulfills the performance requirements of hash functions. Theoretical analysis and experimental results indicate that the scheme has several useful features, such as good statistical properties, strong collision resistance, and high flexibility, while keeping the parallel structure and message sensitivity required of practical hash functions.

2 A one-way hash function based on a high-dimensional chaotic map

2.1 High-dimensional chaotic map

A hierarchy of high-dimensional chaotic maps with ergodic behavior on the two intervals [0, 1] and [0, ∞) is presented in [4, 39, 40]. One-parameter families of chaotic maps of the interval [0, 1] with an invariant measure can be defined as a ratio of polynomials of degree N:

$$\Phi_N(x, \alpha) = \frac{\alpha^2 F}{1 + (\alpha^2 - 1) F},$$

where α is the control parameter. F is replaced by the square of the Chebyshev polynomial of the first kind, $T_N$, where N is the degree of the polynomial. Hence,

$$\Phi_N(x, \alpha) = \frac{\alpha^2 \left(T_N(\sqrt{x})\right)^2}{1 + (\alpha^2 - 1)\left(T_N(\sqrt{x})\right)^2}.$$

We use its conjugate (isomorphic) map [39, 41]. Conjugacy means that the invertible map $h(x) = \frac{1 - x}{x}$ maps I = [0, 1] into [0, ∞) [39, 42]. Using the hierarchy of families of one-parameter chaotic maps, we can generate a new hierarchy of tripled maps with an invariant measure. In this paper, one of the chaotic maps of this hierarchy on the interval [0, ∞) is adapted for constructing a chaos-based hash function. This chaotic map is defined as [4]
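The conjugacy between the rational map on [0, 1] and the trigonometric map on [0, ∞) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that the Chebyshev polynomial is evaluated at the square root of x, so that T_N(√x) = cos(N arccos √x), and the function names `phi`, `h`, `phi_conjugate`, and `conjugacy_gap` are illustrative only.

```python
import math


def phi(x, alpha, n):
    """Rational map Phi_N on [0, 1], using T_N(sqrt(x)) = cos(n * arccos(sqrt(x)))."""
    t2 = math.cos(n * math.acos(math.sqrt(x))) ** 2
    return alpha**2 * t2 / (1.0 + (alpha**2 - 1.0) * t2)


def h(x):
    """Conjugacy map h(x) = (1 - x) / x, sending (0, 1] onto [0, inf)."""
    return (1.0 - x) / x


def phi_conjugate(u, alpha, n):
    """Trigonometric map on [0, inf): (1 / alpha^2) * tan^2(n * arctan(sqrt(u)))."""
    return math.tan(n * math.atan(math.sqrt(u))) ** 2 / alpha**2


def conjugacy_gap(x, alpha, n):
    """Relative difference between h(Phi_N(x)) and the conjugate map applied to h(x)."""
    left = h(phi(x, alpha, n))
    right = phi_conjugate(h(x), alpha, n)
    return abs(left - right) / max(abs(left), abs(right))


# Both sides should agree up to floating-point rounding.
print(conjugacy_gap(0.3, 1.5, 2))
print(conjugacy_gap(0.7, 2.0, 3))
```

Under this assumption, iterating the map on [0, ∞) is equivalent to iterating the rational map on [0, 1] and then applying h, which is why the trigonometric form can be used directly.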

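$$\Phi_N(x,\alpha):\quad \begin{cases} x(n+1) = \dfrac{1}{\alpha_1^2}\tan^2\left(N_1 \arctan\sqrt{x(n)}\right),\\[4pt] y(n+1) = \dfrac{1}{\alpha_2^2(x(n))}\tan^2\left(N_2 \arctan\sqrt{y(n)}\right),\\[4pt] z(n+1) = \dfrac{1}{\alpha_3^2(x(n),y(n))}\tan^2\left(N_3 \arctan\sqrt{z(n)}\right), \end{cases} \tag{1}$$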

where

$$\begin{aligned} \beta(x(n)) &= \left(\alpha_2^2 - \alpha_2 + \epsilon\, x(n)\right)^2, & \alpha_2(x(n)) &= \frac{2\beta(x(n))}{1 + \beta(x(n))} \cdot \frac{\beta(x(n+1))}{\beta(x(n))},\\ \beta(x(n), y(n)) &= \left(\alpha_3^2 - \alpha_3 + \epsilon'\, y(n)\right)^2, & \alpha_3(x(n), y(n)) &= \frac{2\beta(x(n), y(n))}{1 + \beta(x(n), y(n))} \times \frac{\beta(x(n+1), y(n+1))}{\beta(x(n), y(n))}. \end{aligned}$$

Here $\alpha_1$, $\alpha_2(x(n))$, and $\alpha_3(x(n), y(n))$ are the control parameters of the chaotic map, $\epsilon$ and $\epsilon'$ are the coupling parameters, and $N_1$, $N_2$, and $N_3$ are the degrees of the Chebyshev polynomials. In this paper, Equation 1 is used as the high-dimensional chaotic map in the process of generating the hash function.
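One iteration of Equation 1 can be sketched as follows. This is an illustrative reading rather than the authors' implementation, and it rests on several assumptions: the square root inside the arctangent follows from the conjugacy with the rational map, each coupled control parameter is read as the product of the two fractions 2β/(1 + β) and β(next)/β(current), and the second coupling term depends on y(n) only. The names `beta`, `tan2`, `step`, `eps1`, and `eps2` are hypothetical; the parameter values are those listed for the algorithm.

```python
import math


def beta(alpha, eps, u):
    """Coupling term (alpha^2 - alpha + eps * u)^2 (assumed reading of the formula)."""
    return (alpha**2 - alpha + eps * u) ** 2


def tan2(n, u):
    """tan^2(n * arctan(sqrt(u))) for u >= 0."""
    return math.tan(n * math.atan(math.sqrt(u))) ** 2


def step(state, a1, a2, a3, eps1, eps2, n1, n2, n3):
    """One iteration of the three coupled maps; returns (x, y, z) at time n + 1."""
    x, y, z = state
    x_new = tan2(n1, x) / a1**2

    # alpha_2(x(n)) = 2*beta(x(n)) / (1 + beta(x(n))) * beta(x(n+1)) / beta(x(n))
    bx, bx_new = beta(a2, eps1, x), beta(a2, eps1, x_new)
    a2_eff = 2.0 * bx / (1.0 + bx) * bx_new / bx
    y_new = tan2(n2, y) / a2_eff**2

    # alpha_3(x(n), y(n)) has the same shape, coupled through y.
    bxy, bxy_new = beta(a3, eps2, y), beta(a3, eps2, y_new)
    a3_eff = 2.0 * bxy / (1.0 + bxy) * bxy_new / bxy
    z_new = tan2(n3, z) / a3_eff**2
    return x_new, y_new, z_new


PARAMS = dict(a1=12.0, a2=1.5, a3=3.0, eps1=0.2, eps2=0.3, n1=20, n2=4, n3=4)
first = step((3.0, 8.0, 93.0), **PARAMS)
print(first)
```

Because tan² is non-negative and the effective control parameters are squared, every component stays in [0, ∞), which matches the interval on which the map is defined.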

2.2 Proposed algorithm

In this section, the framework of the proposed algorithm is described.

Step 1. First, the message $M : \{0,1\}^a$ is input and transformed into message blocks $M_{n \times 1}$.

Step 2. Keys of the algorithm (initial conditions, control parameters, and coupling parameters) are input.

Step 3. Creation and initialization of the threads T1 and T2. In this step, two separate threads are created, and their initial conditions and control parameters are initialized (the threads are shown in Figure 1 as two parallel vertical rectangles).

Step 4. To remove the transient effect and to set the initial conditions of the threads, Equation 1 is iterated 1,500 times. The initial conditions of thread one (T1) are set to $x'_0 = x_{1000}$, $y'_0 = y_{1000}$, and $z'_0 = z_{1000}$, and those of thread two (T2) are set to $x''_0 = x_{1500}$, $y''_0 = y_{1500}$, and $z''_0 = z_{1500}$.

Step 5. Each thread processes one chunk of the message using $y_i$ and $z_i$ of the map in Equation 1. In each round, the control parameters are modified using the value of the processed chunk of the message and the new initial conditions $y_i$ and $z_i$. This step is repeated until the message is exhausted, and the final $z_n$ of each thread is used to generate the matrices D′ and D″ (see Figure 1).

Step 6. The matrices D′ and D″ are generated from the final values of $z_n$ in step 5 (see Figure 1). In this step, the chaotic map of Equation 1 is iterated L/8 times, where L is the size of the final hash value (256 by default). Finally, the matrices are combined using the function given below.

Function Combine(D′[1..k], D″[1..k])
   For i = 1 to k
      If D′[i] mod 2 = 0
         D[i] = D′[i] XOR D″[i];
      Else
         D[i] = (D′[i] mod 2^8) XOR D″[i];
      End if
   End for
   Return D;
End Function
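The combining function of step 6 can be written as runnable code. This is a minimal sketch under assumptions: the two inputs are taken to be the per-thread matrices flattened to integer lists, the parity test and the mod 2^8 reduction are applied to the first input, and the name `combine` is illustrative.

```python
def combine(d_first, d_second):
    """Conditionally mix two equal-length integer lists into one result list."""
    if len(d_first) != len(d_second):
        raise ValueError("both inputs must have the same length")
    result = []
    for a, b in zip(d_first, d_second):
        if a % 2 == 0:
            # Even entry: plain XOR of the two thread outputs.
            result.append(a ^ b)
        else:
            # Odd entry: reduce modulo 2^8 before the XOR.
            result.append((a % 2**8) ^ b)
    return result


print(combine([2, 259], [1, 1]))  # [3, 2]
```

The branch on the parity of the first operand makes the mixing depend on the data itself, in contrast to a plain XOR, which treats both operands symmetrically.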

Step 7. The final value of the hash function is output.

Figure 1 Block diagram.

Figure 1 shows a block diagram that illustrates the steps in the algorithm. In this algorithm, the values of the parameters are as follows:

Initial conditions: x = 3, y = 8, and z = 93

Control parameters: $\alpha_1 = 12$, $\alpha_2 = 1.5$, and $\alpha_3 = 3$

Coupling parameters: $\epsilon = 0.2$ and $\epsilon' = 0.3$

Degrees of the Chebyshev polynomials: $N_1 = 20$, $N_2 = 4$, and $N_3 = 4$

Note that all of the control parameters are selected in the chaotic region.

3 Performance analysis

3.1 Hash results of messages

We chose a message of 1,024 characters, all set to zero, because this makes it easier to show the sensitivity of the hash function to the message.

Condition 1: The original message contains 1,024 null characters.

Condition 2: Change the first character to 1.

Condition 3: Change the 800th character to Z.

Condition 4: Add a space to the end of the message.

Condition 5: Change the hash key $\alpha_1$ from 12 to 12.00000000000001.

Condition 6: Change the hash key x from 3 to 3.00000000000001.

The corresponding hash values in hexadecimal format are given as follows:

Condition 1: FD97319170D87CB2F25CDCD8917ACA60

Condition 2: E0BC3C61DFC69D6D40B7491BF88427EC

Condition 3: 9EEAFAAABFFE6539A011C180CA4E70B9

Condition 4: A164446E3387324D7E267991225B6C9

Condition 5: 3E5576B020A6095859217050F59E97F

Condition 6: 761B97931E7528308A4ED14F7ACA8028

The graphical display of the binary sequences is shown in Figure 2. The simulation results indicate that the one-way property of the proposed algorithm is strong and that even the smallest difference in the message or key causes large changes in the final hash value.
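The size of these changes can be quantified by counting the bits that differ between two digests. The sketch below is illustrative only: `changed_bits` is a hypothetical helper, and the two digests are the values listed for Conditions 1 and 2, copied as printed.

```python
def changed_bits(hex_a, hex_b):
    """Number of differing bits between two digests given as hexadecimal strings."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


cond1 = "FD97319170D87CB2F25CDCD8917ACA60"
cond2 = "E0BC3C61DFC69D6D40B7491BF88427EC"
print(changed_bits(cond1, cond2), "of 128 bits differ")
```

For an ideal 128-bit hash, the count is expected to be close to 64 for any pair of distinct inputs.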

Figure 2 Hash values under different conditions.

3.2 Statistic analysis of diffusion and confusion

To hide message redundancy, Shannon pointed out in his classic paper [43] that ‘It is possible to solve many kinds of ciphers by statistical analysis,’ and he therefore suggested two methods, diffusion and confusion, to frustrate powerful statistical analysis. For a hash value in binary format, each bit is either 1 or 0, so the ideal diffusion effect is that any tiny change in the initial conditions changes each bit with a probability of 50%. The test proceeds as follows: a message is selected and its hash value is generated; then, 1 bit of the message is changed at random and a new hash value is generated. The two hash values are compared, and the number of changed bits is counted and denoted $B_i$. This test is performed N times, and the corresponding distribution of $B_i$ is shown in Figure 3a,b, where N = 10,000.

Minimum changed bit number: $B_{\min} = \min(\{B_i\}_1^N)$;

Maximum changed bit number: $B_{\max} = \max(\{B_i\}_1^N)$;

Mean changed bit number: $\bar{B} = \sum_{i=1}^{N} B_i / N$;

Mean changed probability: $P = (\bar{B}/128) \times 100\%$;

Standard variance of the changed bit number: $\Delta B = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(B_i - \bar{B})^2}$;

Standard variance: $\Delta P = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(B_i/128 - P)^2} \times 100\%$.
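These statistics can be computed with a short script. The sketch below is not the authors' test harness: the proposed chaotic hash is not reproduced here, so a SHA-256 digest truncated to 128 bits serves purely as a placeholder, and the names `stand_in_hash` and `diffusion_stats` are hypothetical. Inside the ΔP sum, P is used as a fraction so that both terms have the same units.

```python
import hashlib
import math
import random


def stand_in_hash(message):
    """Placeholder 128-bit digest (truncated SHA-256); NOT the proposed hash."""
    return hashlib.sha256(message).digest()[:16]


def diffusion_stats(n_trials, msg_len=64, seed=0, hash_fn=stand_in_hash):
    """Flip one random message bit per trial and summarize the changed-bit counts B_i."""
    rng = random.Random(seed)
    bits = 8 * len(hash_fn(b""))
    counts = []
    for _ in range(n_trials):
        msg = bytearray(rng.getrandbits(8) for _ in range(msg_len))
        before = int.from_bytes(hash_fn(bytes(msg)), "big")
        pos = rng.randrange(msg_len * 8)
        msg[pos // 8] ^= 1 << (pos % 8)  # flip exactly one bit
        after = int.from_bytes(hash_fn(bytes(msg)), "big")
        counts.append(bin(before ^ after).count("1"))
    mean = sum(counts) / n_trials
    p = mean / bits
    delta_b = math.sqrt(sum((b - mean) ** 2 for b in counts) / (n_trials - 1))
    delta_p = math.sqrt(sum((b / bits - p) ** 2 for b in counts) / (n_trials - 1))
    return {
        "B_min": min(counts),
        "B_max": max(counts),
        "B_mean": mean,
        "P": p * 100.0,
        "delta_B": delta_b,
        "delta_P": delta_p * 100.0,
    }


print(diffusion_stats(1024))
```

Passing a different `hash_fn` would apply the same measurement to any other hash function that returns a bytes digest.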

Figure 3 Distribution of changed bit number. (a) Plot of $B_i$ and (b) histogram of $B_i$.

Tests were run with N = 512, 1,024, 2,048, and 10,000, and the corresponding data are listed in Table 1. From the data in Table 1, we conclude that the mean changed bit number $\bar{B}$ and the mean changed probability P are both very close to the ideal values of 64 bits and 50%, while ΔB and ΔP are very small, which indicates that the capability for diffusion and confusion is stable and strong.

Table 1 Statistics of the number of changed bits $B_i$

3.3 Analysis of collision resistance and birthday attack resistance

Collision resistance and resistance to birthday attacks are similar in idea: both stem from the probability that two random inputs hash to the same value. In the proposed scheme, the state of the chaotic map is directly related to each message bit, and the control parameters and initial conditions of the chaotic map are fully affected by the message bits. These operations ensure that each bit of the final hash value depends on all the bits of the message, so even a 1-bit change in the message leads to a completely different hash code. Moreover, because the initial conditions and control parameters of the chaotic map depend on both the message and the state of the map, the keyed hash function is highly sensitive to the keys.

3.4 Collision test

To investigate the collision resistance of the proposed algorithm, we performed the following test [12, 28, 29]. First, a message is chosen at random, and its hash value is generated and stored in ASCII format. Then, a bit at a random position in the message is flipped, and the hash value of the modified message is generated and stored in ASCII format. The two hash values are compared, and the number of ASCII characters with the same value at the same location is counted. The absolute difference d of the two hash values, the number ω of characters with the same value at the same location, and the theoretical number of tests with ω such hits during N independent experiments, denoted $W_N(\omega)$, are calculated using the following formulas:

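$$d = \sum_{i=1}^{N} \left| t(e_i) - t(e'_i) \right|, \tag{2}$$

$$\omega = \sum_{i=1}^{N} f\left(t(e_i), t(e'_i)\right), \quad \text{where } f(x, y) = \begin{cases} 1, & x = y,\\ 0, & x \neq y, \end{cases} \tag{3}$$

$$W_N(\omega) = N \times \mathrm{Prob}\{\omega\} = N \frac{s!}{\omega!\,(s - \omega)!} \left(\frac{1}{2^k}\right)^{\omega} \left(1 - \frac{1}{2^k}\right)^{s - \omega}, \tag{4}$$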

where $e_i$ and $e'_i$ are the ith entries of the original and the new hash value, respectively, and the function t(·) converts an entry to its equivalent decimal value. In Equation 4, ω = 0, 1, ..., s. The theoretical values of Equation 4 are as follows [14]: $W_N(0)$ = 9,416, $W_N(1)$ = 572, $W_N(2)$ = 12, and $W_N(\omega)$ = 0 for ω = 3, 4, ..., 16. The experimental values of $W_N(\omega)$ for the proposed algorithm are as follows: $W_N(0)$ = 9,387, $W_N(1)$ = 591, $W_N(2)$ = 22, and $W_N(\omega)$ = 0 for ω = 3, 4, ..., 16. A plot of the distribution of the number of ASCII characters with the same value at the same location in the hash value is given in Figure 4a,b. These results suggest that the proposed algorithm has stronger collision resistance than the algorithms in [12, 14, 27, 44] and is comparable with those in [45, 46].
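Equation 4 is a binomial expectation and can be evaluated directly. The sketch below assumes N = 10,000 tests, s = 16 characters per digest, and k = 8 bits per character, which is an inference from the 128-bit digest rather than a statement of the parameters behind the values quoted from [14]; under these assumed parameters the evaluation need not reproduce the quoted values exactly. The name `expected_hits` is hypothetical.

```python
from math import comb


def expected_hits(n_tests, s, k):
    """W_N(w) for w = 0..s: expected number of tests with exactly w equal characters."""
    p = 1.0 / 2**k  # chance that two random k-bit characters coincide
    return [n_tests * comb(s, w) * p**w * (1.0 - p) ** (s - w) for w in range(s + 1)]


w_theory = expected_hits(10_000, 16, 8)
for omega, value in enumerate(w_theory[:4]):
    print(omega, round(value, 2))
```

Comparing experimental hit counts against such a table shows whether equal characters occur more often than chance alone would predict.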

Figure 4 Distribution of ω. ω represents the number of 8-bit subblocks with the same value at the same location in the hash value (N = 10,000). (a) Decimal scale of $W_N(\omega)$ and (b) logarithmic scale of $W_N(\omega)$.

3.5 Theoretical value of mean of absolute difference

Assume that the hash digest H follows a discrete uniform distribution over the range 0 to 255 (with 16 elements). The ideal mean of the sum of its elements is then 1/2 of the maximum value, which is attained when all elements equal 255: (255 × 16)/2 = 2,040. Using this ideal theoretical value of the hash digest H, we can calculate the ideal theoretical absolute difference of two hash digests. If the two hash digests H and H′ are ideally uniform, the ‘sum of absolute difference’ of the two digests has to equal 2/3 of the mean value of a uniform distribution [47]. Since the mean value for a 128-bit hash digest is 2,040, the ideal mean value of the absolute difference of H and H′ is (2/3) × 2,040 = 1,360. In the experiments, the minimum, maximum, and mean values of the absolute difference of two hash values are 547, 2,246, and 1,365.9641, respectively. The mean of 1,365.9641 is very close to the ideal value.
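The 2/3 rule comes from the continuous uniform distribution; for byte values 0 to 255, the expectation can also be computed exactly by enumeration. The sketch below is an illustrative check, with `mean_abs_difference` a hypothetical name. It gives 65,535/768 ≈ 85.33 per character and about 1,365.31 for 16 characters, which is close to both the continuous estimate of 1,360 and the measured mean.

```python
from fractions import Fraction


def mean_abs_difference(levels):
    """Exact E|X - Y| for independent X, Y uniform on {0, ..., levels - 1}."""
    total = sum(abs(i - j) for i in range(levels) for j in range(levels))
    return Fraction(total, levels * levels)


per_char = mean_abs_difference(256)
print(per_char, float(16 * per_char))
```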

3.6 Pseudo-collision resistance

In a pseudo-collision attack, the attacker not only has access to the message M and can modify each block $M_i$ of the message during the hashing process but can also change the intermediate hash value $H_{i-1}$. The main goal of this attack is to exploit weaknesses in the compression function so that the attacker can reconstruct a hash value for a different message [45, 46]. In the proposed algorithm, the message is processed twice by two different threads, once from the beginning to the end and once from the end to the beginning, so there are two intermediate hash values at the same time for each message block. Because of the nature of the proposed high-dimensional chaotic map, a small change in one or both of these intermediate values (here, the control parameters and initial conditions) leads to a completely different sequence, so it is almost impossible to find a pattern by changing these values.

3.7 Resistance of forgery attacks

Most parallel hash function algorithms have a mixing section in their structure, which usually uses the XOR operation for this purpose. This operation lets attackers exploit the commutative law to mount a forgery attack on the hash function, and some of these algorithms have been broken in this way [37]. The proposed algorithm processes the whole message from both ends, which leads to high complexity in the mixture, and in the final step it uses a conditional method instead of a simple XOR operation in order to resist forgery attacks in any section of the algorithm.

3.8 Key space analysis

In this algorithm, besides the control parameter and initial condition of the piecewise nonlinear chaotic map, the parameters of the one-dimensional chaotic map are chosen as the secret keys. From a security point of view, the key space of a cryptographic algorithm should not be smaller than $2^{128}$ in order to resist brute force attacks [48, 49]. If a precision of $10^{-14}$ is used, the key space for the two coupling parameters, three initial conditions, and three control parameters of the chaotic map is over $2^{279}$, which is large enough to resist brute force attacks.
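The key space estimate can be reproduced with a short calculation. The sketch below assumes that each key parameter contributes 10^14 distinguishable values at the stated precision; `key_space_bits` is a hypothetical helper. Under this assumption, six parameters give about 2^279, and counting all eight listed parameters gives a larger bound of about 2^372, so the stated lower bound holds either way.

```python
import math


def key_space_bits(n_params, decimal_digits=14):
    """log2 of the key space when each parameter has 10**decimal_digits values."""
    return n_params * decimal_digits * math.log2(10)


for n in (6, 8):
    print(n, "parameters:", round(key_space_bits(n), 2), "bits")
```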

3.9 Analysis of speed

The proposed hash function is a multi-threaded algorithm that works on both single-core and multi-core processors, and its speed roughly doubles on a multi-core processor. Moreover, the time needed to generate the hash value is directly related to the message length; Figure 5 shows that the computational complexity of the algorithm is linear, that is, approximately Θ(n). Although the algorithm is slower than traditional hash functions such as SHA-1 and MD5 [20], its speed is acceptable for practical use; at the same time, the algorithm is flexible, since the digest size can be 128, 160, 256, or 512 bits. Finally, we compare the average speed on the specified platform with that of several hash algorithms [12, 13, 20, 21, 28, 29, 50, 51]. The average speed of our algorithm (45 Mbps) is higher than that of the algorithms in [12, 13, 50, 51], whereas the algorithms of [20, 21, 28, 29] are faster.

Figure 5 Computational time vs. message size.

3.10 Uniform distribution on hash space

To check the distribution property in hash space, the following test, similar to that in [14], is performed: two hash values are generated according to the method described in the previous subsection, and the number of changed bits at each bit location is counted. Figure 6 shows the statistical results for N = 10,000. The minimum, maximum, and mean of the changed bit number are 4,896, 5,124, and 5,063, respectively. The mean of the toggled bit numbers, 5,063, is very close to the ideal value of 5,000 (half of the number of tests), and the toggled bit numbers of all bit locations in hash space concentrate around the ideal value. This indicates resistance against statistical attack.
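The per-location count can be sketched as follows. As the proposed hash is not reproduced here, a SHA-256 digest truncated to 128 bits is used purely as a placeholder, and the names `stand_in_hash` and `toggle_counts` are hypothetical. For each trial, one random message bit is flipped and every digest bit position that changed is tallied.

```python
import hashlib
import random


def stand_in_hash(message):
    """Placeholder 128-bit digest (truncated SHA-256); NOT the proposed hash."""
    return hashlib.sha256(message).digest()[:16]


def toggle_counts(n_trials, msg_len=64, seed=1, hash_fn=stand_in_hash):
    """For each digest bit position, count how often it toggles over n_trials."""
    rng = random.Random(seed)
    n_bits = 8 * len(hash_fn(b""))
    counts = [0] * n_bits
    for _ in range(n_trials):
        msg = bytearray(rng.getrandbits(8) for _ in range(msg_len))
        before = int.from_bytes(hash_fn(bytes(msg)), "big")
        pos = rng.randrange(msg_len * 8)
        msg[pos // 8] ^= 1 << (pos % 8)  # flip exactly one message bit
        diff = before ^ int.from_bytes(hash_fn(bytes(msg)), "big")
        for bit in range(n_bits):
            counts[bit] += (diff >> bit) & 1
    return counts


counts = toggle_counts(1000)
print(min(counts), max(counts), sum(counts) / len(counts))
```

For a well-distributed hash, each position should toggle in roughly half of the trials.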

Figure 6 Distribution of hash value in hash space. The minimum, maximum, and mean of the toggled bit number are 4,896, 5,124, and 5,063, respectively, for N = 10,000.

3.11 Randomness tests for the hash

Cryptographic hash functions are an important tool of cryptography and play a fundamental role in efficient and secure information processing. They can also be used to construct pseudo-random number generators. To ensure the security of a cryptosystem, the cipher must have properties such as good distribution, long period, high complexity, and efficiency. For this algorithm, several randomness tests are performed on a stream of random numbers generated by the main dynamical system (Equation 1) of the hash function. These tests include DIEHARD [52], the NIST statistical test suite [53], and the ENT test suite [54]. According to Tables 2 and 3, Table 4, and Table 5, which present the NIST, DIEHARD, and ENT test results of the sample, respectively, the stream behaves in a highly random manner.

Table 2 Results of the SP800-22 test suite for the 32-bit proposed high-dimensional map
Table 3 Results of the SP800-22 test suite for the 32-bit proposed high-dimensional map
Table 4 DIEHARD test suite for the 32-bit proposed high-dimensional map
Table 5 Max grade of ENT test suite
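One of the quantities of the kind reported by the ENT suite is the Shannon entropy per byte, whose ideal value for a random stream is 8 bits. The sketch below shows how this quantity is computed for an arbitrary byte sequence; it is an illustration of the definition, not the ENT program itself, and `entropy_bits_per_byte` is a hypothetical name.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data):
    """Shannon entropy of a byte sequence, in bits per byte (maximum 8)."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


print(entropy_bits_per_byte(bytes(range(256))))  # every byte value once: 8.0
print(entropy_bits_per_byte(b"abab"))            # two equally likely symbols: 1.0
```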

3.12 Resistance to birthday attack

The birthday attack is one of the classic attacks on cryptographic hash functions; it is independent of the algorithm and can be applied to any hash function. The main goal of this attack is to find two messages with identical hash values in fewer than $2^{n/2}$ trials, where n is the size of the hash value [46, 51, 55]. The proposed algorithm is flexible, so the length of the hash value can be tuned. If the hash value size is set to 128, the difficulty of the attack is $2^{64}$, which is large enough to resist brute force attack. To analyze the security and resistance of the proposed algorithm against such an attack, several tests, including the random number batteries and collision tests, are applied. The results in Table 5 show that the entropy of the random numbers generated by the hash function is very close to the ideal value (7.999994 ≈ 8) and that the serial correlation coefficient in the same table is very close to zero, which supports the collision resistance of the algorithm. The proposed algorithm, illustrated by the block diagram in Figure 1, is very complex, and the initial conditions and control parameters are modified in each round. According to [56], ‘if an appropriate padding scheme is used and the compression function is collision-resistant, then the hash function will also be collision resistant’. Therefore, the results of the tests, the size of the hash value, and the collision resistance of the proposed algorithm suggest that a successful birthday attack is practically infeasible and that the algorithm is resistant to this type of attack.
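The cost of a generic birthday attack can be estimated for each supported digest size. The sketch below uses the standard approximation that about sqrt(π/2 · 2^n) hash evaluations are expected before the first collision of an ideal n-bit hash; `birthday_trials_log2` is a hypothetical helper that returns the base-2 logarithm of this number.

```python
import math


def birthday_trials_log2(digest_bits):
    """log2 of the approximate expected number of hashes before a first collision."""
    return digest_bits / 2 + 0.5 * math.log2(math.pi / 2)


for n in (128, 160, 256, 512):
    print(n, "bit digest: about 2^%.2f trials" % birthday_trials_log2(n))
```

The constant factor adds only about a third of a bit, so the $2^{n/2}$ rule of thumb is a good summary of the attack cost.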

3.13 Flexibility

In this paper, an algorithm based on a high-dimensional chaotic map is proposed in order to address problems such as the size of the hash value, padding of the message, resistance against differential attacks, and the capability of working efficiently on multi-core CPUs using parallel processing. Although the algorithm is designed as an unkeyed hash function, its control parameters and initial conditions can be treated as the secret key of a keyed hash function. In comparison with conventional fixed-length hash algorithms such as MD5, MAC-DES, and HMAC-MD5, the proposed algorithm has the advantage that it can be adapted to 160, 256, or 512 bits with a small change in the steps. Because the algorithm is designed to work in parallel, its speed is not affected much by a longer digest. Furthermore, in the proposed algorithm, the chaotic map is iterated with double-precision floating-point arithmetic. A double-precision floating-point number is represented according to the IEEE 754 floating-point standard, which is used in virtually all computers produced since 1980 [57]. The program is implemented in Visual C++.NET, whose compiler uses the IEEE 754 floating-point standard, so the hash values generated on any computer would be the same.

4 Conclusion

A novel parallel hash function based on a high-dimensional chaotic map is proposed. At the core of the presented dynamical system lie interesting features such as an invariant measure, ergodicity, bifurcation without period doubling, and the possibility of KS-entropy calculation [4]. By using significant properties of chaos and the structure of the chaotic map, the proposed algorithm connects the message to all parameters, which ensures the uniform sensitivity of the hash value to the message. Since the coupled architecture is a complex chaotic system, it is very difficult or even impossible to predict its time series. Therefore, the complexity and nonlinearity of the chaotic map yield strong bit confusion and diffusion at the low expense of floating-point computations. Theoretical analysis and numerical simulation indicate that the proposed scheme achieves several desirable features such as high flexibility, good statistical properties, parallel implementation, high key sensitivity, and message sensitivity. Moreover, it resists linear analysis, exhaustive key search, and statistical attack. It can be concluded that the proposed algorithm fulfills the performance requirements of a hash function and that it is practical and reliable, with high potential to be adopted in real-world applications.