In several places in this paper we have used the term true randomness. In doing so, we have implicitly made the simplifying assumption that “the sequence which possesses all the properties that a truly random sequence would have” can be considered sufficiently random. The quoted statement comes from the abstract of [14]. Its author extends this line of thought by stating that satisfactory testing “would require an infinite number of tests.” By invoking an infinite number of tests, these considerations become more philosophical than technical. Finite, though far-reaching, sets of tests have nevertheless been designed. Among these, the TestU01 library [17] and the NIST Statistical Test Suite [1] are easily accessible and frequently used by cryptographers ([17]: over 1000 citations; [1]: over 2800 citations [10]). The NIST Suite is more recent and appears easier to use. Therefore, we have chosen it as a practical answer to the question of whether the data we generate can be considered random.
While aware that true randomness is in general a difficult and still open problem, we have used the NIST Suite as a means to classify the bit sequences generated by our methods as either only apparently unordered but definitely not random, or as possible to consider random. Seeking to apply an infinite number of tests would bring us no closer to the answer to our basic question: yes or no. So, if there is no indication that a sequence should be treated as not random, we shall consider it not merely random-looking but truly random, always in a limited, engineering sense.
For the tests, a pixel was expressed as two bits: 00 for black, 01 for red, 10 for green and 11 for blue. This arbitrary assignment was believed not to influence the result. Pixels were saved in two separate files: by rows and by columns. Both shares were tested, so for each image there were four data files. Besides the two images mentioned above, a number of well-known benchmark images were used: baboon, peppers and Lena (to be found in the sources cited on the historical web site [13]).
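The two-bit encoding and the two reading orders could be sketched as follows (a minimal illustration: the color-to-bits mapping follows the text, while the array layout and the function name are our assumptions):

```python
# Two-bit codes for the four share colors, as described in the text.
COLOR_BITS = {"black": "00", "red": "01", "green": "10", "blue": "11"}

def share_to_bits(share, by_columns=False):
    """Serialize a share (a 2-D list of color names) into a bit string,
    reading pixels either row by row or column by column."""
    rows, cols = len(share), len(share[0])
    if by_columns:
        order = (share[r][c] for c in range(cols) for r in range(rows))
    else:
        order = (share[r][c] for r in range(rows) for c in range(cols))
    return "".join(COLOR_BITS[p] for p in order)

share = [["black", "red"],
         ["green", "blue"]]
print(share_to_bits(share))                   # by rows:    "00011011"
print(share_to_bits(share, by_columns=True))  # by columns: "00100111"
```

Each share thus yields two bit files, one per reading direction, which are then fed to the test suite.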
Default values of the parameters of the NIST software were used; in particular, \(\alpha =0.01\). Results for the tests which have subtests (CumulativeSums, NonOverlappingTemplate, RandomExcursions, RandomExcursionsVariant) were shown together, so the 188 tests with their subtests were finally treated as just 15 tests.
To capture the state of randomness for one image as a whole, we attempted to show the results for 100 random realizations of coding, two reading directions and two shares on a single page. This conforms with the concept of small multiples introduced by Tufte [26], which advises presenting all the relevant data together so that they can be perceived simultaneously.
The results are presented in the form of histograms of p-values (Figs. 7, 8); further on we shall refer to the p-values simply as p. The width of the bins is \(\alpha /4\). The counts for test failures, \(p\le \alpha\), denoted in the key of the graphs as low, are shown in shades of red. The histogram values for test successes, denoted as good, are shown in shades of gray. Separately, to the right of the range of p, the number of cases in which the preconditions for the tests were not met is shown in shades of blue and denoted with a question mark, meaning not applicable. This can happen for the tests RandomExcursions, RandomExcursionsVariant and Universal, for which the NIST Suite can issue warnings. The data for the basic shares, denoted share 1, are shown with filled symbols, and for the coding shares, denoted share 2, with empty symbols. The data for reading the pixels by columns (vertically) are shown with bars, and by rows (horizontally) with circles.
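The classification used in the histograms can be mimicked as follows (a sketch: \(\alpha\), the bin width and the low/good/not-applicable labels follow the text; the function names and the use of None for an unmet precondition are our assumptions):

```python
ALPHA = 0.01
BIN_WIDTH = ALPHA / 4  # histogram bin width used in the graphs

def classify(p):
    """Classify a single NIST p-value as in the graphs:
    'low'  - test failed (p <= alpha),
    'good' - test passed,
    'n/a'  - test preconditions not met (represented here by None)."""
    if p is None:
        return "n/a"
    return "low" if p <= ALPHA else "good"

def bin_index(p):
    """Index of the histogram bin of width alpha/4 that holds p."""
    return int(p // BIN_WIDTH)

pvalues = [0.004, 0.2, None, 0.73]
print([classify(p) for p in pvalues])  # ['low', 'good', 'n/a', 'good']
```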
To show clearly the important data for \(p\le \alpha\), which indicate test failures, a nonlinear axis for p is used. The transformation is \(p\rightarrow p^a\), with a chosen so that the point 0.01 takes the place of the former point 0.1.
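The exponent follows from requiring \(0.01^a = 0.1\): since \(0.01 = 0.1^2\), we get \(a = \ln 0.1 / \ln 0.01 = 1/2\), so the transformation is simply a square root. A quick numerical check:

```python
import math

# Exponent a such that the point p = 0.01 is mapped onto 0.1.
a = math.log(0.1) / math.log(0.01)
print(a)          # ~ 0.5
print(0.01 ** a)  # ~ 0.1
```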
Let us start with the results of the analysis for the image TextBlack, shown in Fig. 6, which was used in [24] to illustrate the basic random algorithm for two-level images introduced in [23]. The image and the result of its decoding are shown together.
The graphs of p-values in Fig. 7 indicate that the basic random version of the coding is truly random. For all the tests, the majority of realizations passed successfully. The tests that failed are in the minority, and their frequencies conform to the generally flat distribution of p-values. The cases which did not meet the preconditions of some tests are also rare.
The results for the image parrots (shown in Fig. 5a), coded with the coordinated method, are shown in Fig. 8.
The same results for this and other images are shown in abstracted form in Table 1, where for each test only the numbers of low, good and not applicable results are given. This hides the distribution of p-values but makes it possible to condense the results for many images into a limited space. The interested reader can find the histograms themselves in the supplementary material [3].
Table 1 Abstracted results of statistical randomness tests shown as pairs (or triplets, where applicable) of numbers of realizations, with colored text: [low (red values) good (black values) n/a (blue values)]

For three images, the numbers of black pixels were reduced with three methods: free, intermediate and coordinated. Results from the coordinated method are generally worse than those of the free method. This indicates that restricting the changes in the shares to operating on each segment separately, although giving visually better results, is considerably less random than making the changes freely. The free method, however, has the important drawback of reducing the contrast of the decoded image, shown in the images of Figs. 4f and 5e. It also does not guarantee randomness in all cases (baboon: ApproximateEntropy; parrots: Runs, LongestRun, ApproximateEntropy, Serial). The local intermediate method gave less random results than the coordinated one, especially for the tests Frequency and CumulativeSums. Therefore, we discontinued testing this method, even though it yields decoded images free from additional errors. We did not show the results for the global intermediate method, which is unacceptable due to information leaks, although, unexpectedly, it generally exhibits better randomness. (Some detailed results are available in the supplementary material [3].)
The tests which can be considered to confirm the randomness of the color shares are Frequency, BlockFrequency, CumulativeSums, Rank, RandomExcursions, RandomExcursionsVariant and LinearComplexity.
The tests which always or nearly always reject the randomness are Runs, LongestRun, OverlappingTemplate and ApproximateEntropy.
It can also be noted that the preconditions for RandomExcursions and RandomExcursionsVariant were not met more than twice as frequently for images read horizontally than for those read vertically (Fig. 8), which is an indication of directionality in objects expected to be isotropic.
Finally, it should be stated that the search for true randomness, successful for black-and-white images, remains an unreachable target for color ones.