StringENT test suite: ENT battery revisited for efficient P value computation

Random numbers play a key role in a wide variety of applications, ranging from mathematical simulation to cryptography. Generating random or pseudo-random numbers is not an easy task, especially when hardware, time and energy constraints are considered. In order to assess whether generators behave in a random fashion, there are several statistical test batteries. ENT is one of the simplest and most popular, at least in part due to its efficacy and speed. Nonetheless, only one of the tests of this suite provides a p value, which is the most useful and standard way to determine whether the randomness hypothesis holds at a given significance level. As a consequence, rather arbitrary and at times misleading bounds are set in order to decide which intervals are acceptable for its results. This paper introduces an extension of the battery, named StringENT, which, while sticking to the fast speed that makes ENT popular and useful, still succeeds in providing p values with which sound decisions can be made about the randomness of a sequence. It also highlights a flagrant randomness flaw that the classical ENT battery is not capable of detecting but that the new StringENT notices, and introduces two additional tests.


Introduction
Random numbers are ubiquitous, and their use is widespread in technology. They have a myriad of applications: mathematical simulation and generation [1][2][3][4][5][6], initialization of encryption algorithms [7][8][9][10][11][12], key generation [13][14][15], video ... Random number generators are commonly divided into two classes:
1. True random number generators (TRNGs) use what we call entropy sources, which are sources of intrinsic natural randomness, in order to draw random bits. Examples of such natural phenomena include radioactive decay, thermal noise, clock jitter, meteorological changes, etc. However, extracting this sort of information is usually rather costly, and hence the speed of such generators generally cannot match the bitrate that many applications demand. Moreover, the data usually has to be processed through a so-called whitening process to remove biases.
2. Pseudo-random number generators (PRNGs) produce a seemingly random output, which is obtained through deterministic methods. They are initially fed with a seed, which ideally emanates from a TRNG or some other high-entropy source.
There are numerous proposals that implement lightweight PRNGs and TRNGs, so that they can be fit in small embedded systems. Since these random numbers are frequently used for security purposes, it is extremely important that they actually behave as randomly as possible and that in particular there is no way to predict their output by just looking at previously generated numbers.
Hence, randomness tests play a major role in assessing and designing new random number generator proposals.
In this paper, we will treat number sequences one byte at a time. Therefore, for an n-byte sequence x = x_0 x_1 ... x_{n−1} to be considered random enough, we expect x_0, x_1, ..., x_{n−1} to be indistinguishable from independent, identically distributed (i.i.d.) samples from a discrete uniform distribution U{0, 1, ..., 255}. If we instead worked with the sequence at the bit level, the equivalent definition would be that the output bits cannot be efficiently and consistently distinguished, with a given test battery, from i.i.d. samples coming from a Bernoulli(1/2) distribution.
This paper is organized as follows: in Sect. 2, we describe the theoretical foundation on which our work is based. In Sect. 3, we analyse the (in)dependence of the tests under study. In Sect. 4, we describe a degenerate generator we have designed that is able to fool the ENT battery, thus highlighting its limitations. In Sect. 5, we carry out a detailed study of the statistics underlying the ENT tests and define the p values associated with them. In Sect. 6, we describe two tests that we will incorporate into the new battery to extend its analysis capacity and address the problem detected in Sect. 4. In Sects. 7 and 8, we focus, respectively, on the measurement of computational time and existing correlations. Finally, in Sect. 9, we draw the main conclusions of our work and point out future research lines.

Randomness tests
We have established what we expect from a random sequence. Therefore, we now have a hypothesis that we can contrast against a given byte sequence. This can be done through hypothesis testing, by computing statistics from which we derive p values. Suppose we have a sequence x = x_0 ... x_{n−1}, with each x_i ∈ {0, 1, ..., 255}.

Statistics
A statistic is a function of the sequence x, f (x). It can represent any kind of operation, for instance, the sequence mean, the number of zeros that x has in binary representation, etc. If we can derive the distribution of the statistic under the assumption that x is random, this enables us to perform hypothesis testing.

Hypothesis testing
Hypothesis testing is a statistical inference procedure that allows us to check whether a certain hypothesis holds. We call the hypothesis we want to test the null hypothesis, H 0 . In our case, the null hypothesis is that a given sequence is random. The alternative hypothesis is the opposite of the null hypothesis, and is denoted H 1 . In our case, this means the sequence is not random.
A statistic whose distribution under the null hypothesis is known is computed. Since the distribution is known, we can compute the probability of obtaining statistics as extreme or more extreme than the ones observed. This is called a p value.
We fix a number α, usually between 0.001 and 0.05, and use it as a discriminating threshold. If the observed p value is smaller than α, we deem the statistic to be too extremely unlikely, and so we reject the null hypothesis. Otherwise, we do not reject the null hypothesis.
Therefore, even when the null hypothesis is true, there is a probability α that we will reject it. This error is referred to as a Type I error. On the other hand, in the scenario where the null hypothesis does not hold but we fail to obtain a p value smaller than α, and hence do not reject the null hypothesis, we incur a Type II error.
It is important to note that statistical tests are necessary but never sufficient to completely evaluate the quality of a generator. Designers and practitioners should never put blind faith in their results, as only negative results are set in stone. It is important to constantly keep in mind that many poor or fully deterministic random number generators can easily fool most if not all randomness tests, a case in point being AES in counter mode.
That said, there are many randomness test suites which compute various useful statistics in order to assess whether a sequence looks random. To name a few, we can highlight the following (for more details about batteries see [18]):
1. ENT (for Entropy) [19]. It is made of five simple tests, which are extremely fast. Only one of the tests outputs an approximate p value; the rest just produce statistics that can yield useful information with respect to the apparent randomness of the sequence.
2. NIST SP 800-22 [20]. It is one of the most widely used, due to it being backed by the National Institute of Standards and Technology. It is a quite demanding suite with 15 tests and a relatively high computing cost, especially compared to the former suite.
3. TESTU01 [21]. It treats the sequence as a collection of floating point numbers in [0, 1). It has three sets of tests, composed of 10, 96 and 100 tests each.
4. Dieharder [22]. This suite is built upon the famous Diehard suite, and adds a number of tests from other sources as well.
We will focus on ENT because it is very widely used due to its speed. However, we will prove that this speed comes with some caveats.

ENT
The ENT test suite [19] consists of the following statistics: entropy, ideal compression rate, Chi-square test, arithmetic mean, Monte Carlo estimation of π and serial correlation.
The Chi-square test is the only one which outputs an approximate p value, which helps determine whether the sequence is significantly non-random. We will now explain what the tests compute.

Entropy
This test finds the entropy as classically defined in Information Theory [23]. We denote the sequence X = {x_k}, k = 0, ..., n−1. The number of occurrences of each value i between 0 and 255 is counted, Ap_i = #{0 ≤ k ≤ n − 1 : x_k = i}, and we denote P_X(i) = Ap_i / n. The entropy of X is then computed by the following formula [23]:

$$H(X) = -\sum_{i=0}^{255} P_X(i) \log_2 P_X(i) \qquad (1)$$

Intuitively, the entropy conveys the number of bits per byte of incompressible information the sequence holds. Therefore, it yields a value between 0 and 8. A value of 0 means the sequence contains no information, and 8 means the sequence is incompressible at the byte level. The ideal compression percentage is also given, which is computed as:

$$C = 100 \cdot \frac{8 - H(X)}{8}$$

It is really important to bear in mind that this is simply one additional visualization of the entropy's meaning, and does not actually constitute a different independent test.
Note that perfect entropy is achieved when all values i have P_X(i) = 2^{−8} = 1/256, as one can see that Eq. (1) then evaluates to exactly 8.
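As an illustration, the entropy and ideal compression percentage can be computed as follows (a minimal sketch; the function names are ours, not part of ENT's code):

```python
from collections import Counter
from math import log2

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy H(X) of the byte sequence, in bits per byte."""
    n = len(data)
    counts = Counter(data)  # Ap_i for each observed byte value i
    return -sum((c / n) * log2(c / n) for c in counts.values())

def ideal_compression_percent(data: bytes) -> float:
    """Percentage by which an ideal byte-level compressor could shrink the data."""
    return 100.0 * (8.0 - entropy_bits_per_byte(data)) / 8.0
```

A constant file yields entropy 0 (compression 100%), while a sequence containing every byte value equally often yields entropy 8 (compression 0%).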

Chi-square test
The Chi-square test measures the goodness of fit of the observed number of occurrences with respect to a uniform distribution. We draw a statistic known as X², computed in the following fashion:

$$X^2 = \sum_{i=0}^{255} \frac{(Ap_i - E[Ap_i])^2}{E[Ap_i]}$$

where Ap_i again denotes the occurrences of value i ∈ {0, 1, ..., 255} in the sequence, Ap_i = #{0 ≤ k ≤ n − 1 : x_k = i}, and E[Ap_i] is the expected number of appearances under the perfect randomness hypothesis, that is, n/256. After computing this value, the test displays an approximate p value in the form of a percentage: the probability, under the randomness hypothesis, of getting a statistic X² as extreme as the one observed, that is, the probability of getting a worse fit. This p value is given by

$$p = P(\chi^2_{255} \geq X^2)$$

where χ²_255 is the Chi-square distribution with 255 degrees of freedom. Usually, as a rule of thumb, we require that the sequence be large enough to expect at least around 5 appearances per bin. Otherwise, the distribution of X² might not be close enough to the χ²_255 distribution. This test is also focused on measuring uniformity of the values {0, 1, ..., 255} in the sequence.
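The statistic and an approximate p value can be sketched as follows. ENT's own code evaluates the χ²_255 tail probability numerically; to keep this sketch self-contained we substitute the well-known Wilson-Hilferty cube-root normal approximation instead (function names are ours):

```python
from statistics import NormalDist

def chi_square_stat(data: bytes) -> float:
    """X^2 goodness-of-fit statistic against the uniform byte distribution."""
    n = len(data)
    expected = n / 256  # E[Ap_i] under the randomness hypothesis
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

def chi_square_p_value(x2: float, df: int = 255) -> float:
    """P(chi2_df >= x2), via the Wilson-Hilferty normal approximation."""
    z = ((x2 / df) ** (1 / 3) - (1 - 2 / (9 * df))) / ((2 / (9 * df)) ** 0.5)
    return 1.0 - NormalDist().cdf(z)
```

A plain counter looped over all byte values gives X² = 0.00, the suspiciously perfect fit discussed in Sect. 4.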

Arithmetic mean
Since we interpret bytes as integers between 0 and 255, this is simply the arithmetic mean of all observed bytes, which has an expected value of 127.5 under the randomness hypothesis. We will write

$$\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i$$

Monte Carlo Pi estimation
In order to perform this test, coordinates are generated in the square [0, 2^24 − 1]². In order to do so, 3 bytes form the value of x and another 3 characterize y. Since 3 bytes are 24 bits, these represent unsigned integer values from 0 to 2^24 − 1. Once a pair (x, y) is generated, we check whether it is inside the closed Euclidean ball of centre (0, 0) and radius r = 2^24 − 1. This is true if and only if x² + y² ≤ r². Then the number of generated points within the circle is counted. The ratio between the points within the circle and the total number of points approximates the ratio between the area of the quarter circle and the area of the square:

$$\frac{\text{pts\_in}}{\text{pts}} \approx \frac{\pi r^2 / 4}{r^2} = \frac{\pi}{4} \qquad (4)$$

So the π estimation is given by

$$\hat{\pi} = 4 \cdot \frac{\text{pts\_in}}{\text{pts}}$$

The error percentage is also shown (Fig. 1).
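The procedure can be sketched as follows (a minimal illustration; whether ENT reads the three bytes in big- or little-endian order is an implementation detail, and we assume big-endian here; the function name is ours):

```python
def monte_carlo_pi(data: bytes) -> float:
    """ENT-style Monte Carlo estimation of pi from a byte sequence."""
    r_sq = (2 ** 24 - 1) ** 2
    pts = pts_in = 0
    # Consume the sequence six bytes at a time: three for x, three for y.
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")
        y = int.from_bytes(data[i + 3:i + 6], "big")
        pts += 1
        if x * x + y * y <= r_sq:  # inside the quarter circle
            pts_in += 1
    return 4.0 * pts_in / pts
```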

Serial correlation
This test measures how one byte depends on the previous byte. Alternatively, one can see it as a way of measuring the predicting power of the last output byte over the following one. It considers each pair of consecutive bytes and returns a value between −1 and 1, with 0 meaning no linear relation between bytes, and an absolute value of 1 meaning complete linear dependence. This coefficient is computed as Pearson's r correlation coefficient between the sequence x = x_0 ... x_{n−1} and the sequence shifted one byte "to the left", so that y_i = x_{i+1} with indices taken cyclically. Pearson's correlation coefficient r between two samples x and y is given by the following expression:

$$r = \frac{\sum_{i=0}^{n-1} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=0}^{n-1} (x_i - \bar{x})^2 \; \sum_{i=0}^{n-1} (y_i - \bar{y})^2}} \qquad (5)$$

This is the only test in the original ENT suite that tries, to a certain extent, to measure independence, as opposed to the others, which are concerned exclusively with uniformity.
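A direct sketch of this computation, with the shift taken cyclically (the function name is ours):

```python
def serial_correlation(data: bytes) -> float:
    """Pearson's r between the sequence and its one-byte cyclic shift."""
    n = len(data)
    x = list(data)
    y = x[1:] + x[:1]  # sequence shifted one byte "to the left"
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5
```

A strictly alternating sequence gives r = −1, while a slowly increasing ramp gives r close to 1.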
Unfortunately only one of these tests produces an approximate p value. This means the interpretation of the rest of statistics is not trivial. We will address this issue in Sect. 5. In Fig. 2, we show the result we get from applying the classical ENT test to a gif file included with the source code.

Independence of tests
We studied the tests in ENT with good-quality pseudorandom data, by analysing 10,000 files of size 15 MB drawn from Unix's PRNG /dev/urandom. Firstly, we ran the suite to compute the resulting 50,000 statistics (five per file). We show a histogram of the data in Fig. 3. One of the first things we notice is that the entropy takes very few distinct values for these random-like data; more precision for this output might be desirable. Another interesting fact visible in Fig. 3 is that the Mean, Monte Carlo π and Serial Correlation statistics seem to be normally distributed. In order to check that this is indeed the case, we ran the Shapiro-Wilk and Anderson-Darling tests, each of which returns a p value. We divided the 10,000 files into two groups of 5,000 and performed both normality tests on each set (S1 and S2, respectively). Table 1 confirms that the Mean, Monte Carlo π and Serial Correlation statistics are indeed normally distributed. This will be very helpful for developing p values for these statistics later, in Sect. 5.
A quick look at the definition of the tests hints that the Chi-square test and the Entropy test might be closely related, since they both aim to measure the uniformity of bytes in the sequence. First, we draw a scatter plot of the statistics obtained for our data; see Fig. 4. It shows that the data for the Entropy and Chi-square tests lie almost exactly on a line. In order to have a closer look at this, we compute Pearson's r correlation coefficient, as defined in Eq. (5), between these two statistics. The result is shown in Table 2: there is an almost perfect negative correlation between the entropy and the Chi-square statistic. This is rather intuitive, since an even allocation of the values 0-255 in the sequence that leads to high entropy also leads to a lower X² value for the goodness-of-fit statistic. Nonetheless, we have to beware of the fact that the entropy took very few distinct values. A linear regression model showed that the residuals were not normal, so no linear expression can be extrapolated that works for both random and non-random data. Nonetheless, it is clear that the Chi-square test and the Entropy test are highly correlated. Note that the π estimation and the Mean test also have a remarkable correlation.
In Table 2, we have coloured each correlation coefficient for which their corresponding p value under the independence hypothesis was smaller than 0.01. This means the compared tests are very likely not independent.

Modified counter with good results
Since most of ENT's tests focus only on the uniformity of the sequence or source under analysis, there might be many easily predictable sequences capable of obtaining good results in the battery. For instance, a counter with values from 0 to 2^16 − 1, where two bytes are output for each counter value, will seemingly have perfect entropy, very low error for the Monte Carlo π estimation, perfect arithmetic mean, and no serial correlation. We show the results in Fig. 5 for 16 full loops of this counter.
However, we can notice in Fig. 5 that the X² statistic is 0.00, which is too good a fit. By modifying this counter a bit, we can get a more "random-looking" X² statistic while keeping very good randomness results, even though the sequence still remains essentially a counter.
In order to do so, first we express the 0 to 2^16 − 1 counter over two bytes as pairs (i, j) with i, j ∈ {0, 1, ..., 255}. That is, i is the most significant (left) byte of the counter and j the least significant (right) one. Now, we are going to replace some appearances of the value 250 with the value 251, as shown in Algorithm 1.
Note that in this pseudo-counter, all byte values from 0 to 255 will have precisely the expected number of appearances, n/256, except for 250 and 251. The change is performed every time i mod k = 0, and so this results in L · 256/k more appearances than expected for 251, while 250 falls short of the expected amount by that same number. Therefore, the X² statistic of the sequence will be

$$X^2 = 2 \cdot \frac{(L \cdot 256/k)^2}{n/256} = \frac{256 \cdot L}{k^2} \qquad (6)$$

Now, considering that the length of the file produced by Algorithm 1 is n = L · 2 · 256² (which is obvious by multiplying the ranges of the loops and considering that every inner iteration outputs two bytes), it is easy to see from Eq. (6) that it suffices to take L and k such that k² = L in order to get a statistic of exactly 256, which is less suspicious than the 0.00 we obtained before and much closer to the expected value (∼255.3 for 255 degrees of freedom).
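Since the listing of Algorithm 1 is not reproduced here, the following is a sketch of the modified k, L-counter reconstructed from the description above (the exact form of the paper's pseudo-code may differ in detail):

```python
def modified_counter(L: int, k: int) -> bytes:
    """k,L-modified counter: the 16-bit counter (i, j), looped L times,
    where the low byte 250 is replaced by 251 whenever i mod k == 0."""
    out = bytearray()
    for _ in range(L):
        for i in range(256):        # most significant byte
            for j in range(256):    # least significant byte
                if j == 250 and i % k == 0:
                    j = 251         # the only deviation from a plain counter
                out.append(i)
                out.append(j)
    return bytes(out)
```

With L = 16 and k = 4, this produces the sequence of Fig. 6, whose X² statistic is exactly 256 · L/k² = 256.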
For instance, we can take L = 16, to loop 16 times as we did for the initial counter, and k = 4. This yields the results shown in Fig. 6. Therefore, we have obtained impeccable results for the battery using our modified counter. Figures 7 and 8 show two representations that should make abundantly clear how non-random this source is, despite its excellent results with ENT.

Calculating p values from existing statistics
One of the most obvious improvements the battery could use is offering p values for the statistics that do not provide one. This would allow us to perform proper hypothesis testing, in order to obtain statistically sound results from which we can conclude whether the randomness hypothesis can be rejected. For instance, say we have a sequence whose arithmetic mean is close to, but not exactly, 127.5: is it close enough? That will depend on the sequence length. A given approximation might be excellent for a 50-byte sequence, but extremely poor for a 20 MB file.
However, there are some derivations by which we can provide a p value, which will tell us how likely it would be to obtain results "more extreme" than the ones observed if the sequence were truly random.

Arithmetic mean
The arithmetic mean is arguably the simplest statistic in the battery. For a sequence x = x_0 ... x_{n−1}, we get the statistic x̄ = (1/n) ∑ x_i. Under the randomness hypothesis, the x_i are i.i.d. samples from a discrete uniform distribution Y ∼ U{0, ..., 255}. Therefore, we can apply the Central Limit Theorem and compare our results to a standard normal distribution N(0, 1).
If each x_i for i ∈ {0, 1, ..., n − 1} is an independent sample from Y, the Central Limit Theorem states that the following holds:

$$\frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1)$$

It is straightforward to find μ and σ. Since Y is a discrete uniform distribution between 0 and 255, we get:

$$\mu = 127.5, \qquad \sigma = \sqrt{\frac{256^2 - 1}{12}} \approx 73.90 \qquad (8)$$

Therefore we can obtain a two-tailed p value. First we compute the following z statistic:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \qquad (9)$$

Under the randomness hypothesis, z will approximately follow a standard normal distribution, and will be near 0 if the mean is near 127.5. Therefore, "more extreme" results than z are those with an absolute value bigger than |z|. Now, if we denote by Φ the standard normal cumulative distribution function, we get the p value from the following equation, which resembles the two-tailed p values from [20]:

$$p = 2 \cdot (1 - \Phi(|z|)) \qquad (10)$$

Since the original ENT code already contains a numerical approximation of Φ, from which it approximates the value of χ²_255, we can perform this computation without adding too much code, and thus turn the arithmetic mean into a more useful and insightful test.
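Putting Eqs. (8)-(10) together, the whole computation can be sketched as follows (function names are ours):

```python
from statistics import NormalDist

MU = 127.5
SIGMA = ((256 ** 2 - 1) / 12) ** 0.5  # std. dev. of U{0,...,255}, ~73.90

def mean_p_value(data: bytes) -> float:
    """Two-tailed p value for the arithmetic mean test, via the CLT."""
    n = len(data)
    z = (sum(data) / n - MU) / (SIGMA / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

A full counter has mean exactly 127.5 and hence p = 1, while an all-zeros file is rejected immediately.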

Serial correlation
For the serial correlation, the same problem holds: it is hard to tell what constitutes a critical boundary for the serial correlation coefficient, and how to factor in the sequence length.
If two sequences are independent, their correlation should be close to zero. In the case of a random sequence, we would expect the correlation between the sequence and its one-byte shift to be near zero. If this value were too large, it would indicate that the generator is predictable.
In order to obtain a p value from Pearson's correlation coefficient between two sets of samples x = x_0 ... x_{n−1} and y = y_0 ... y_{n−1}, we use the statistic

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \qquad (11)$$

which follows a Student's t-distribution with n − 2 degrees of freedom when the samples come from a bivariate normal distribution. This requirement is usually dismissed when the number of samples is large enough. Nonetheless, there is always a caveat with this omission, and it can lead to issues with small sample sizes [24]. Since the sequences analysed by StringENT will typically consist of at least thousands of bytes, this approximation will be good enough. We will compute the statistic t and find a two-tailed p value from Student's t-distribution with n − 2 degrees of freedom, n being the number of bytes of the sequence. We will denote this distribution T. Since T is symmetric, this p value's computation is analogous to the one in Eq. (10):

$$p = 2 \cdot (1 - T(|t|)) \qquad (12)$$

Nonetheless, the code in the ENT battery does not come with any tables of Student's t-distribution. Therefore, in order to avoid adding too much new code to the battery, we have made a compromise between accuracy and code length and used the normal approximation of the t-distribution given in [25,26], which yields three decimal digits of precision. Moreover, since the t-distribution converges to the normal distribution as the degrees of freedom grow, and we will typically use it with very large values of df = n − 2, the accuracy will be reasonably good. Our two-tailed p value then follows by evaluating Eq. (12) with this normal approximation in place of T.
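A sketch of this computation follows. Since we do not reproduce the specific normal approximation of [25,26], we simply use the standard normal CDF in place of the t-distribution, which is adequate for the large df typical of StringENT inputs (the function name is ours):

```python
from statistics import NormalDist

def serial_correlation_p_value(r: float, n: int) -> float:
    """Two-tailed p value for Pearson's r computed on n samples."""
    t = r * (n - 2) ** 0.5 / (1.0 - r * r) ** 0.5
    # For large df = n - 2, Student's t is essentially N(0, 1).
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))
```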

Monte Carlo Pi estimation
We look at Eq. (4). Given a coordinate (x, y) ∈ [0, 2^24 − 1]² at random, if x and y are independently and uniformly distributed according to U(0, 2^24 − 1), the probability that (x, y) lies within the circle is π/4, as is easy to deduce from Fig. 1. Therefore, in our case, in which x and y will be, under the randomness hypothesis, i.i.d. samples from U{0, ..., 2^24 − 1}, the probability that (x, y) lies within the circle should be around π/4. Therefore, we can model the experiment "draw a coordinate and see whether it lies within the circle" as a Bernoulli(π/4) distribution. That is, an experiment with probability π/4 of success, and 1 − π/4 of failure.
Hence, since under randomness hypothesis each coordinate is independent from all others, the number of coordinates that lie within the circle follows a Binomial distribution with p = π/4 and n = number of points, Binomial(π/4, n).
A particular case of the Central Limit Theorem, known as the De Moivre-Laplace Theorem [27], states that if B_n follows a Binomial(p, n) distribution, then

$$\frac{B_n - np}{\sqrt{np(1-p)}} \xrightarrow{d} N(0, 1) \qquad (15)$$

Therefore, the number of points that fall within the circle leads us to an approximately standard normal statistic.
This allows us to find a p value. Denoting by π̂ the approximation of π obtained through the observed sequence, and using Eq. (4), if we refer to the number of coordinates in the sequence as pts, where pts = ⌊n_bytes/6⌋, the number of points within the circle is given by

$$\text{pts\_in} = \frac{\hat{\pi}}{4} \cdot \text{pts}$$

Therefore, since under the randomness hypothesis pts_in is approximately distributed as a Binomial(π/4, pts), due to Eq. (15) we get the following statistic with approximate standard normal distribution:

$$z = \frac{\text{pts\_in} - \frac{\pi}{4}\,\text{pts}}{\sqrt{\text{pts} \cdot \frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)}}$$

We now follow the same path as before. Under the null hypothesis, the sequence is random and the number of points expected to fall within the circle is around (π/4) · pts, and so the p value, just like in Eq. (10), is given by

$$p = 2 \cdot (1 - \Phi(|z|))$$

In order to double-check that these formulae are indeed well derived, we confirmed p value uniformity for the 10,000 files of /dev/urandom which we used in Sect. 3.
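The z statistic and the resulting p value can be sketched as follows (the function name is ours):

```python
from math import pi
from statistics import NormalDist

def pi_p_value(pts_in: int, pts: int) -> float:
    """Two-tailed p value for the Monte Carlo pi test (De Moivre-Laplace)."""
    p = pi / 4.0  # success probability of one point landing inside
    z = (pts_in - p * pts) / (pts * p * (1.0 - p)) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```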
If the p values are well formulated, they should be uniformly distributed between 0 and 1. In order to check that this is the case, we perform a Chi-square uniformity test, dividing the p values into 15 bins that partition [0, 1] into even intervals, and measuring the goodness of fit of the number of p values in each bin against the expected 10,000/15. The resulting p values for this uniformity test are shown in Table 3, and they confirm our expectations of uniformity. To see this graphically, we also show the corresponding histograms in Fig. 9.

New tests
We decided to add a couple of very fast tests in order to extend the capabilities of the original ENT battery and address the issue detailed in Sect. 4.

Runs test
Firstly, we add a simple runs test, as explained in [28], which in turn takes it from [29]. This test is one of the best-known and most used for randomness testing. In fact, all the most popular test suites include it as an important element, for example the NIST SP 800-22 battery, Dieharder, Crypt-X and SPRNG, as well as older batteries such as Knuth's or Diehard. Given a sequence x = x_0 ... x_{n−1}, we create a sequence of n signs, + and −, by comparing each value to the median 127.5. If a value is below the median we assign the sign −, and if it is above it we assign a +. We show a small example in Table 4.
A run is then defined as a subsequence of the sign sequence consisting entirely of +'s or −'s, such that the signs at either end of the run are different from the sign of the run.
For instance, the example of Table 4 consists of 4 runs: 2 of + signs and 2 of − signs. We define n₁ as the number of − signs and n₂ as the number of + signs. Then, the expected number of runs is given by

$$\bar{R} = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

We denote by R the number of observed runs and by s²_R its variance [29]:

$$s_R^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$$

Under the randomness hypothesis, R is approximately normally distributed, and so the statistic z = (R − R̄)/s_R is approximately standard normal. Therefore, a two-tailed p value is obtained just like in Sect. 5.1.
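The whole runs test can be sketched as follows (a minimal illustration; since bytes are integers, no byte ever equals the median 127.5, so ties need no special handling; the function name is ours):

```python
from statistics import NormalDist

def runs_p_value(data: bytes) -> float:
    """Two-tailed p value for the runs test around the median 127.5."""
    signs = [b > 127 for b in data]  # True for '+', False for '-'
    n2 = sum(signs)                  # number of '+' signs
    n1 = len(signs) - n2             # number of '-' signs
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n1 + n2
    mean_r = 2.0 * n1 * n2 / n + 1.0
    var_r = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - mean_r) / var_r ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

A strictly alternating sequence has far too many runs and is rejected, while a sequence alternating in pairs has a run count close to the expectation.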

Local means test
We also created a specific test to address the problem of the pseudo-counter passing the battery successfully. This test is based on two principles: the arithmetic mean's p value of Sect. 5.1 and a Chi-square goodness-of-fit test.
We divide the sequence into N blocks of M bytes each (and discard the last n mod M bytes). We denote each block X_i = x_{iM} x_{iM+1} ... x_{(i+1)M−1} for i = 0, ..., N − 1. We get a z_i statistic by replicating Eq. (9) on each block:

$$z_i = \frac{\bar{X}_i - \mu}{\sigma / \sqrt{M}}$$

where σ and μ are the same as in Eq. (8) and X̄_i is the mean of block X_i. Since the z_i are approximately distributed as standard normals, we can now take a goodness-of-fit statistic:

$$X^2 = \sum_{i=0}^{N-1} z_i^2 \qquad (17)$$

Under the randomness hypothesis, each block X_i is independent from all blocks X_j with j ≠ i, and so the statistic of Eq. (17) has a χ²_N distribution, that is, a Chi-square distribution with N degrees of freedom. Therefore, we can obtain a one-tailed p value by reusing the Chi-square code already in the battery, with no extra code.
Note that this test in the bit mode becomes the block frequency test used in [20], and is inspired by it. We will take M = 1024 as default.
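The test above can be sketched as follows; in place of the exact χ²_N tail probability that the battery computes, we substitute the well-known Wilson-Hilferty normal approximation to keep the sketch self-contained (the function name is ours):

```python
from statistics import NormalDist

MU = 127.5
SIGMA = ((256 ** 2 - 1) / 12) ** 0.5  # std. dev. of U{0,...,255}

def local_means_p_value(data: bytes, M: int = 1024) -> float:
    """One-tailed p value for the local means test with block size M."""
    N = len(data) // M  # the n mod M trailing bytes are discarded
    x2 = 0.0
    for i in range(N):
        block = data[i * M:(i + 1) * M]
        z = (sum(block) / M - MU) / (SIGMA / M ** 0.5)
        x2 += z * z
    # x2 ~ chi-square with N degrees of freedom under randomness;
    # Wilson-Hilferty normal approximation of its tail probability.
    wh = ((x2 / N) ** (1 / 3) - (1 - 2 / (9 * N))) / ((2 / (9 * N)) ** 0.5)
    return 1.0 - NormalDist().cdf(wh)
```

A sequence whose blocks all have mean exactly 127.5 passes, while one made of all-zero blocks followed by all-255 blocks is rejected outright.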
After introducing these changes, and setting a bigger precision for the entropy output, we run the extended battery on the modified counter of Sect. 4. The output is shown in Fig. 10.
We see that the local means test is able to easily detect the non-randomness of the modified counter. Since each block in the output of the counter has very skewed means from the expected 127.5, the goodness-of-fit statistic is extremely high, and so the corresponding p value is less than 10 −6 . Therefore, we get a significant result at α = 0.01 that allows us to confidently reject the randomness hypothesis for the modified counter.

Execution times
In this section, we look at the new changes we introduced to the battery code, and what computational cost they bring.

Computation of the p value for the arithmetic mean test
When the battery calculates the arithmetic mean, the additional cost of the p value is that of evaluating Eqs. (9) and (10), which takes a constant number of steps. Thus this computation is of the order of O(1).

Computation of the p value for the serial correlation test
Similarly to the previous case, this is performed in a constant number of operations that does not depend on the length of the sequence, so its complexity is of the order of O(1).

Computation of the p value for the Monte Carlo estimation of π
It is also a constant number of operations, since the battery has already calculated the number of points inside the circle and the total number of points. Therefore we have a complexity of the order of O(1).

Runs test
The runs can be counted in a single pass through the sequence, because while the i-th byte is being examined, its sign can be compared with that of the (i−1)-th byte, determining at the same time whether a new run begins. Finally, a constant number of operations must be added to find the p value, regardless of the number of runs and of positive and negative signs. Therefore, this test has a complexity of the order of O(n), where n is the number of bytes in the sequence.

Local means test
As with the runs, in a single pass through the sequence all the block means are found, normalized and squared, and the X² statistic of Eq. (17) is updated. Thus, the complexity is of the order of O(n).
Therefore, while StringENT does more calculations and will take slightly longer, its order of complexity is still O(n), the same as all ENT computations. To empirically verify this result, we experimented with files of varying sizes and compared the results. We also include the results of applying the NIST SP 800-22 battery as a reference, since it is one of the most commonly used in practice. We note that, although the execution time has gone up, StringENT is still extremely fast, particularly when compared to the NIST SP 800-22 suite. The results are shown in Table 5. These were computed using a standard Windows 10 (64-bit) machine.

Redundancies
After the derivation of the arithmetic mean p value, we noticed that the p values for the arithmetic mean and the uniformity Chi-square test are the same in the bit mode of the suite. This makes sense, since in ENT's bit mode the Chi-square test just measures how many ones and zeros there are and compares that to the expected rate of 0.5. That is, in bit mode, the Chi-square test and the arithmetic mean test perform the very same computation. This is also true of the newly introduced runs test and the serial correlation test, which consequently output the same p values (save for digit precision). Therefore, we acknowledge that by using the bit mode of the battery we can incur many undesirable redundancies, and hence we discourage its usage.
On the other hand, we saw in Sect. 3 that the Entropy test and the Chi-square test are very highly correlated, which means they also provide similar information: both tests focus on the uniformity of byte values. In order to illustrate this, we run the battery on a jpg picture of a seaside landscape. It is displayed in Fig. 11, and the corresponding results in Fig. 12. By looking at the output we can confirm:
1. Displaying a 0 per cent compression rate is not necessarily a good indicator of randomness at all, as it takes "relatively little" entropy (just 0.03 below the maximum of 8 bits/byte for this file). Therefore, even a picture with multiple non-random patterns can achieve a 0 per cent compression rate, although the picture is clearly non-random. This is particularly true of compressed formats such as jpg/mpeg/webP, etc.
2. The rest of the tests all fail with a very high statistical significance, as one should expect.
Also, having now more digits of precision for the Entropy and the new tests, we checked correlations again. Figure 13 shows a scatter plot matrix of some of the StringENT statistics for the 10,000 /dev/urandom files. Table 6 shows the correlations obtained for these tests, and Table 7 the associated p values. We can see, as highlighted in bold, that many tests are not independent. Therefore, if we really want to maximize StringENT's efficiency, we can remove some of them.

Conclusions and future work
We have studied the popular ENT test suite for evaluating random number generators, shown that some trivially flawed generators (i.e. pseudo-counter) can fool it and then proposed an extension we call StringENT.
With StringENT, we keep most of the simplicity and speed that made ENT so widely used, but add some much-needed functionality (in particular, the computation of p values for all existing tests) and two new tests. The user can now select which tests to run through a parameter of the executable, to find a convenient trade-off between speed and coverage.
With all tests now offering a p value, it becomes easier to analyse their correlations, which we have done in this work. This also means the output statistics are now easier to interpret.
Future work could focus on designing, implementing and selecting new sets of tests which are similarly fast but do not introduce correlations of any kind.
To sum up, we believe that a promising way forward will consist in adding more tests to StringENT that maintain the characteristic speed of the battery while offering independent, uncorrelated insights.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.