Quantum Random Numbers generated by the Cloud Superconducting Quantum Computer

A cloud quantum computer is similar to a random number generator in that its physical mechanism is inaccessible to the users. In this respect, a cloud quantum computer is a black box. In both devices, the users decide the device condition from the output. A framework to achieve this exists in the field of random number generation in the form of statistical tests for random number generators. In the present study, we generated random numbers on the 20-qubit cloud quantum computer and evaluated the condition and stability of its qubits using statistical tests for random number generators. As a result, we observed that the qubits varied in bias and stability. Statistical tests for random number generators may provide a simple indicator of qubit condition and stability, enabling users to decide for themselves which qubits inside a cloud quantum computer to use.


Introduction
Given a coin with an unknown probability distribution, there are two approaches to decide whether the coin is fair [1]. The first approach is to examine the coin itself; one expects an evenly shaped coin to yield fair results. The second approach is to actually toss the coin a number of times to see if the output is sound. In this approach, the coin is treated as a black box. A random number generator is similar to a coin in that it is expected to produce unbiased and independent 0s and 1s. Unlike a coin, however, the physical mechanism of a random number generator is often inaccessible to its users. Therefore, users rely on statistical tests to decide the fairness of the device from its output.
Random number generators play an important role in cryptography, particularly in the context of key generation. For example, the security of the RSA cryptosystem is based on keys that are determined by random choices of two large prime numbers [2]. If the choices of prime numbers are not random, an adversary could predict future keys and hence compromise the security of the system. Randomness in cryptography derives from what is called the seed. The seed is provided by physical random number generators [3,4]. It is required that the physical mechanism of the physical random number generator remains a black box for the seed to be unpredictable. Given that the measurement outcomes are theoretically unpredictable in quantum mechanics, random number generators based on quantum phenomena are a promising source of unpredictability [5,6,7].
Cloud quantum computers are quantum computers that are accessed online [8,9,10,11,12,13]. In order to use a cloud quantum computer, users are required to send programs specifying the quantum circuit to be executed and the number of times the circuit should be run [14]. When the user's turn arrives, the quantum computer executes the program and returns the results [15]. A similarity between random number generators and cloud quantum computers is that the users do not have direct access to the physical mechanism of the device. So, as far as the users are concerned, both random number generators and cloud quantum computers are black boxes. In the field of random number generation, much research has been done on how to characterize the device from its output. This led to the creation of statistical tests for random number generators. The present study aims to introduce the idea of statistical tests for random number generators to the field of cloud quantum computing. This aim is supported by three points. Firstly, the cloud quantum computer is a black box to its users, which is also the case with random number generators. Secondly, quantum computers become random number generators when given certain programs. Finally, the cloud quantum computer lacks a simple benchmark that would enable its users to decide the condition of the device.
The rest of this article is organized as follows. In Section 2, the min-entropy, which is a measure of uniformity often employed in the field of cryptography, is introduced. Section 3 generally explains statistical tests for random number generators. Section 4 deals with a group of statistical tests called the NIST SP 800-22. In Section 5, we present the result of statistical analysis of random number samples obtained from the cloud quantum computer, IBM 20Q Poughkeepsie. This section includes the test results of the 8 statistical tests from the NIST SP 800-22. Finally, Section 6 is devoted to the conclusion.

Min-entropy
Among various entropy measures for uniformity, the min-entropy is often used in the context of cryptography. The min-entropy of a random variable X is defined as follows:
$$H_\infty(X) = -\log_2 \max_x P(X = x). \tag{1}$$
On the other hand, Shannon's entropy, which is also a measure of uniformity, is defined as follows:
$$H(X) = -\sum_x P(X = x) \log_2 P(X = x). \tag{2}$$
Both measures (1) and (2) take values ranging from 0 to 1 for a random variable on {0, 1}. The reason why the min-entropy is more appropriate in the context of cryptography is that it is more sensitive than Shannon's entropy to deviations from uniformity. This is apparent from Fig. 1. The min-entropy also indicates the probability that an adversary with knowledge of the probability distribution of X predicts the outcome of X correctly [16]. Here, the adversary predicts the value that appears with the highest probability, and therefore succeeds with probability $2^{-H_\infty(X)}$. For this reason, the min-entropy considers the maximum probability of X.
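As an illustration of the two definitions, the following minimal sketch estimates both entropies from the empirical proportion of 1s in a binary sample. The function names `min_entropy` and `shannon_entropy` are hypothetical, and using the observed proportion as the probability estimate is an assumption of this sketch rather than a procedure stated in the text.

```python
import math

def min_entropy(bits):
    """Min-entropy (in bits) of the empirical distribution of a 0/1 sequence."""
    p1 = sum(bits) / len(bits)          # observed proportion of 1s
    return -math.log2(max(p1, 1.0 - p1))

def shannon_entropy(bits):
    """Shannon entropy (in bits) of the empirical distribution of a 0/1 sequence."""
    p1 = sum(bits) / len(bits)
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in (p1, 1.0 - p1) if p > 0.0)

# A slightly biased sample: 52 ones and 48 zeros.
sample = [1] * 52 + [0] * 48
print(f"min-entropy     = {min_entropy(sample):.6f}")
print(f"Shannon entropy = {shannon_entropy(sample):.6f}")
```

For this slightly biased sample the min-entropy falls noticeably further below 1 than Shannon's entropy does, which is the sensitivity difference described above.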

Statistical Tests for Random Number Generators
Statistical tests for random number generators are necessary to confirm that a random number generator is suitable for use in encryption processes [17]. Random number generators used in this context are required to have unpredictability. This means that given any subset of a sequence produced by the device, no adversary can predict the rest of the sequence, including past or future output. Random number generators with unpredictability should produce unbiased and independent bits. Statistical tests aim to detect random number generators with bias and/or patterns, which make them predictable devices. When subjected to statistical tests, a random number generator is considered a black box. This means that the only information available is its output. Under the null hypothesis that the generator is unbiased and independent, one expects its output to have certain characteristics. The characteristics of the output are quantified by the test statistic, whose probability distribution under the null hypothesis is known. From the test statistic, the probability that a true random number generator produces an output with a worse test statistic value is calculated. This is called the p-value. If this probability is below the level of significance α, the generator fails the test and the null hypothesis that the generator is unbiased and independent is rejected. Since statistical tests for random number generators merely rule out significantly biased and/or correlated generators, these tests do not verify that a device is the ideal random number generator. Nevertheless, a generator that passes the tests is more reliable than a generator that does not. This is why statistical tests are usually organized in the form of test suites, so as to be comprehensive. Some well-known test suites are the NIST SP 800-22 [18], TestU01 [19], and the Dieharder test.

NIST SP 800-22
The NIST SP 800-22 is a series of statistical tests for cryptographic random number generators provided by the National Institute of Standards and Technology [18]. Random number generators for cryptographic purposes are required to have unpredictability, which is not strictly necessary in other applications such as simulation and modeling, but is a crucial element of randomness. The test suite contains 16 tests, each with a different test statistic to characterize deviations of binary sequences from randomness. The entire testing procedure of the NIST SP 800-22 is divided into 3 steps. The first step is to subject all samples to the 16 tests. For each sample, each test returns a p-value, that is, the probability that an unbiased and independent RNG would produce a sample whose test statistic is at least as extreme as the one observed. The p-value is then compared to the level of significance α = 0.01. If the p-value is under the level of significance, the sample fails the test. The second step involves the proportion of passed samples for each test. Under the level of significance α = 0.01, 1% of samples obtained from an unbiased and independent RNG are expected to fail each test. If the proportion of passed samples is too high or too low, the RNG fails the test. Finally, p-value uniformity is checked for each test. Suppose one tested 100 binary samples. This yields 100 p-values per test. If the samples are independent, the p-values should be uniformly distributed for all tests. The distribution of p-values is checked via the chi-squared test.
Table 1 The minimum length n required for each test in order to obtain meaningful results. The tests not employed in the present study are shaded in grey. Note that the tests will be referred to by their test # in the results section.

Frequency Test
The frequency test aims to test whether a sequence contains a reasonable proportion of 0s and 1s. If the probability of obtaining the sequence from an independent and unbiased random number generator is lower than 1%, it follows that the random number generator is not independent and unbiased. The minimum length required for this test is 100.
Test Description
1. Convert the sequence ε into ±1 using the formula $X_i = 2\varepsilon_i - 1$.
2. Add the elements of X together to obtain $S_n = X_1 + X_2 + \cdots + X_n$.
3. Compute the test statistic: $s_{obs} = |S_n| / \sqrt{n}$.
4. Compute the p-value as $\mathrm{erfc}(s_{obs}/\sqrt{2})$, using the complementary error function $\mathrm{erfc}(z) = \frac{2}{\sqrt{\pi}} \int_z^{\infty} e^{-u^2}\,du$.
5. Compare the p-value to 0.01. If p-value ≥ 0.01, then the sequence is random. Otherwise, the sequence is not random.
In the worked example, the p-value evaluates to $\mathrm{erfc}(s_{obs}/\sqrt{2}) \approx 0.527089$. Since P-value = 0.527089 > 0.01, the sequence is random.
This test is equivalent to testing the histogram for bias. Because the test only considers the proportion of 1s, sequences such as 0000011111 or 0101010101 would pass the test. Failing this test means that the sample is overall biased.
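The frequency test can be sketched in a few lines using only the standard library. The function name `frequency_test` is hypothetical. The input 1011010101 is the short example sequence used in the NIST SP 800-22 documentation; whether the worked example in this section used the same input is an assumption, although it yields the same p-value of about 0.527089.

```python
import math

def frequency_test(bits, alpha=0.01):
    """Frequency (monobit) test: returns (p_value, passed)."""
    n = len(bits)
    s_n = sum(2 * b - 1 for b in bits)         # map 0/1 to -1/+1 and add up
    s_obs = abs(s_n) / math.sqrt(n)            # test statistic
    p_value = math.erfc(s_obs / math.sqrt(2))  # complementary error function
    return p_value, p_value >= alpha

bits = [int(c) for c in "1011010101"]
p_value, passed = frequency_test(bits)
print(f"P-value = {p_value:.6f}, passed = {passed}")
```

Feeding the function 0101010101 returns a p-value of 1, which illustrates why a strictly alternating sequence passes this particular test.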

Frequency Test Within a Block
Firstly, the sequence is divided into N blocks of size M. The proportion of 1s is then computed for each block, which yields N proportions. The second part of this test checks whether the deviation of these proportions from 1/2 can be attributed to chance. This is done with the chi-squared (χ²) test. For meaningful results, a sample with a length of at least 100 is required. The following is the test description.
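Test Description
1. Divide the sequence into $N = \lfloor n/M \rfloor$ non-overlapping blocks of size M.
2. Determine the proportion of 1s in each block using $\pi_i = \frac{1}{M} \sum_{j=1}^{M} \varepsilon_{(i-1)M+j}$ for $1 \le i \le N$.
3. Compute the test statistic: $\chi^2_{obs} = 4M \sum_{i=1}^{N} (\pi_i - 1/2)^2$.
4. Compute the p-value as $\mathrm{igamc}(N/2, \chi^2_{obs}/2)$. Note that igamc stands for the incomplete gamma function.
5. Compare the p-value to 0.01. If p-value ≥ 0.01, then the sequence is random. Otherwise, the sequence is not random.
In the worked example, the p-value evaluates to $\mathrm{igamc}(N/2, \chi^2_{obs}/2) = 0.801252$. Since P-value = 0.801252 > 0.01, the sequence is random.
This test divides the sequence into blocks and checks each block for bias. Depending on the block size, samples such as 001100110011 or 101010101010 could pass the test. Failing this test means that certain sections of the sequence are biased.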
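A possible standard-library sketch of the block frequency test follows. The names `igamc_half` and `block_frequency_test` are hypothetical. The helper evaluates the regularized upper incomplete gamma function only for first arguments of the form k/2, using the recurrence Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Γ(a + 1); this restriction is a simplification chosen here to avoid external dependencies. The input 0110011010 with M = 3 is the example from the NIST SP 800-22 documentation and gives the p-value 0.801252; that the worked example above used the same input is an assumption.

```python
import math

def igamc_half(k, x):
    """Regularized upper incomplete gamma Q(k/2, x) for a positive integer k."""
    if x <= 0.0:
        return 1.0
    # Closed-form starting values: Q(1/2, x) = erfc(sqrt(x)), Q(1, x) = exp(-x).
    if k % 2:
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:
        a, q = 1.0, math.exp(-x)
    # Climb to a = k/2 with Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Gamma(a + 1).
    while a < k / 2 - 1e-9:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q

def block_frequency_test(bits, block_size, alpha=0.01):
    """Frequency test within a block: returns (p_value, passed)."""
    n_blocks = len(bits) // block_size          # leftover bits are discarded
    if n_blocks == 0:
        raise ValueError("sequence is shorter than one block")
    chi2 = 0.0
    for i in range(n_blocks):
        block = bits[i * block_size:(i + 1) * block_size]
        pi_i = sum(block) / block_size          # proportion of 1s in block i
        chi2 += (pi_i - 0.5) ** 2
    chi2 *= 4.0 * block_size
    p_value = igamc_half(n_blocks, chi2 / 2.0)
    return p_value, p_value >= alpha

bits = [int(c) for c in "0110011010"]
p_value, passed = block_frequency_test(bits, block_size=3)
print(f"P-value = {p_value:.6f}, passed = {passed}")
```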

Runs Test
The proportion of 0s and 1s does not suffice to identify a random sequence. A run, which is an uninterrupted sequence of identical bits, is also a factor to be taken into account. The runs test determines whether the lengths of runs and the oscillation between runs in a sequence are as expected for a random sequence. A minimum sample length of 100 is required for this test. As in the preceding tests, the p-value is compared to 0.01: if p-value ≥ 0.01, then the sequence is random. Otherwise, the sequence is not random.
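For concreteness, the runs test can be sketched as follows. The formulas are taken from the NIST SP 800-22 definition of the test, not from this section: the total number of runs V_n is compared with its expectation 2nπ(1 − π), where π is the proportion of 1s, and the test is only carried out if the sequence first satisfies |π − 1/2| < 2/√n. The function name `runs_test` is hypothetical, and returning a p-value of 0 when the prerequisite fails is a convention assumed here.

```python
import math

def runs_test(bits, alpha=0.01):
    """Runs test: returns (p_value, passed)."""
    n = len(bits)
    pi = sum(bits) / n                           # proportion of 1s
    # Prerequisite: the sequence must not be too biased for the test to apply.
    if abs(pi - 0.5) >= 2.0 / math.sqrt(n):
        return 0.0, False
    # Total number of runs = number of positions where the bit changes, plus 1.
    v_obs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    numerator = abs(v_obs - 2.0 * n * pi * (1.0 - pi))
    denominator = 2.0 * math.sqrt(2.0 * n) * pi * (1.0 - pi)
    p_value = math.erfc(numerator / denominator)
    return p_value, p_value >= alpha

bits = [int(c) for c in "1001101011"]
p_value, passed = runs_test(bits)
print(f"P-value = {p_value:.6f}, passed = {passed}")
```

The sequence 1001101011 is the short example from the NIST documentation; it contains 7 runs and gives a p-value of about 0.147232.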

The Longest Run of Ones Within a Block Test
This test determines whether the longest run of ones (111···) within blocks of size M is consistent with what would be expected in a random sequence. Note that K, N and $\pi_i$ are determined by M. See Tables 4 and 5.

Discrete Fourier Transform Test
This test checks for periodic patterns in the sequence by performing a discrete Fourier transform (DFT). The minimum sample length required for this test is 1000. The following is the test description.
Test Description
1. Convert the sequence ε of 0s and 1s into a sequence X of −1s and +1s.
2. Apply a DFT on X: S = DFT(X). This should yield a sequence of complex variables representing the periodic components of the sequence of bits at different frequencies.
In the worked example, the resulting p-value is 0.031761.
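Because the remaining steps of the description are not reproduced above, the following sketch fills them in under an explicit assumption: it follows the formulation in NIST SP 800-22 Rev. 1a, in which the magnitudes of the first n/2 DFT coefficients are compared with the threshold T = √(ln(1/0.05)·n), N0 = 0.95n/2 is the expected number of magnitudes below T, N1 is the observed number, d = (N1 − N0)/√(n·0.95·0.05/4), and the p-value is erfc(|d|/√2). The function name `dft_test` is hypothetical, and the naive O(n²) transform is used only to keep the example dependency-free. This sketch is not expected to reproduce the p-value quoted above, whose input is not given.

```python
import cmath
import math

def dft_test(bits, alpha=0.01):
    """Discrete Fourier transform (spectral) test: returns (p_value, passed)."""
    n = len(bits)
    x = [2 * b - 1 for b in bits]                  # map 0/1 to -1/+1
    # Magnitudes of the first n/2 DFT coefficients (naive transform).
    magnitudes = []
    for k in range(n // 2):
        s_k = sum(x_t * cmath.exp(-2j * math.pi * t * k / n)
                  for t, x_t in enumerate(x))
        magnitudes.append(abs(s_k))
    threshold = math.sqrt(math.log(1.0 / 0.05) * n)  # 95% peak-height threshold
    n0 = 0.95 * n / 2.0                              # expected count below threshold
    n1 = sum(1 for m in magnitudes if m < threshold) # observed count below threshold
    d = (n1 - n0) / math.sqrt(n * 0.95 * 0.05 / 4.0)
    p_value = math.erfc(abs(d) / math.sqrt(2.0))
    return p_value, p_value >= alpha

# A strictly periodic sequence is rejected by the test.
p_value, passed = dft_test([0, 1] * 100)
print(f"P-value = {p_value:.6f}, passed = {passed}")
```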

Cumulative Sums Test
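The cumulative sums test is basically a random walk test. It checks how far from 0 the sum of the sequence in terms of ±1 reaches. For a sequence that contains uniform and independent 0s and 1s, the sum should be close to 0. This test requires a minimum sample length of 100.
Test Description
1. Convert 0 to −1 and 1 to +1 to obtain the sequence X.
2. In forward mode, compute the sum of the first i elements of X. In backward mode, compute the sum of the last i elements of X.
3. Find the maximum absolute value z of the sums.
4. Compute the following p-value:
$$\text{P-value} = 1 - \sum_{k=(-n/z+1)/4}^{(n/z-1)/4} \left[ \Phi\!\left(\frac{(4k+1)z}{\sqrt{n}}\right) - \Phi\!\left(\frac{(4k-1)z}{\sqrt{n}}\right) \right] + \sum_{k=(-n/z-3)/4}^{(n/z-1)/4} \left[ \Phi\!\left(\frac{(4k+3)z}{\sqrt{n}}\right) - \Phi\!\left(\frac{(4k+1)z}{\sqrt{n}}\right) \right].$$
Φ is the cumulative distribution function for the standard normal distribution.
5. Compare the p-value to 0.01. If p-value ≥ 0.01, the sequence passes the test. Otherwise, the sequence fails the test.
In the worked example, the maximum value in forward mode is z = 2, and P-value = 0.941740 for both forward and backward modes. Since P-value = 0.941740 ≥ 0.01, the sequence passes the test.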
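The cumulative sums test can be sketched with the standard library as follows. The function name `cumulative_sums_test` is hypothetical, and truncating the summation limits toward zero is an assumption about how non-integer limits are rounded; the limits are integers in the example below, so the choice does not matter there. The input 1011010101 is an assumed example: it has length 10 and a maximum excursion of z = 2 in both directions, and it gives a p-value close to the 0.941740 quoted above.

```python
import math
from statistics import NormalDist

def cumulative_sums_test(bits, mode="forward", alpha=0.01):
    """Cumulative sums (cusum) test: returns (p_value, passed)."""
    x = [2 * b - 1 for b in bits]               # map 0/1 to -1/+1
    if mode == "backward":
        x = x[::-1]                             # sum from the end of the sequence
    n = len(x)
    z, running = 0, 0
    for value in x:
        running += value
        z = max(z, abs(running))                # largest excursion from 0
    phi = NormalDist().cdf                      # standard normal CDF
    root_n = math.sqrt(n)
    # Summation limits, truncated toward zero (assumed rounding convention).
    upper = int((n / z - 1) / 4)
    sum1 = sum(phi((4 * k + 1) * z / root_n) - phi((4 * k - 1) * z / root_n)
               for k in range(int((-n / z + 1) / 4), upper + 1))
    sum2 = sum(phi((4 * k + 3) * z / root_n) - phi((4 * k + 1) * z / root_n)
               for k in range(int((-n / z - 3) / 4), upper + 1))
    p_value = 1.0 - sum1 + sum2
    return p_value, p_value >= alpha

bits = [int(c) for c in "1011010101"]
for mode in ("forward", "backward"):
    p_value, passed = cumulative_sums_test(bits, mode)
    print(f"{mode}: P-value = {p_value:.6f}, passed = {passed}")
```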
Once the p-value has been calculated for all tests and samples, the proportion of samples that passed the test is computed for each test. Let us consider a case where 1000 samples were subjected to each of the 15 tests. This results in 1000 p-values per test. For example, if 950 out of 1000 samples passed the frequency test, the proportion of passed samples is 0.95. If the proportion of passed samples falls within the following range for all 15 tests, the samples pass the second step of the NIST SP 800-22. The acceptable range of proportion is calculated with
$$(1 - \alpha) \pm 3\sqrt{\frac{\alpha(1 - \alpha)}{m}}, \tag{8}$$
where α stands for the level of significance and m the sample size. Whether the coefficient should be 3 is still a matter of debate; it has been suggested that the coefficient should be 2.6 [20]. In the case of the current example, (8) can be calculated using α = 0.01 and m = 1000 as
$$0.99 \pm 3\sqrt{\frac{0.01 \times 0.99}{1000}} \approx 0.99 \pm 0.009439.$$
From the fact that 0.95 is not within the acceptable range, it follows that the samples fail the frequency test. The same process is repeated for every test, and unless the samples pass all tests, the hypothesis that the RNG is unbiased and independent is rejected.
The final step of the NIST SP 800-22 is to evaluate p-value uniformity of each test. In order to perform the chi-squared (χ²) test, the range of the p-value is divided into 10 regions: [0.1k, 0.1k + 0.1) for k = 0, 1, ..., 9. The test statistic is given by
$$\chi^2 = \sum_{i=1}^{10} \frac{(F_i - m/10)^2}{m/10}, \tag{10}$$
where $F_i$ is the number of samples in the i-th region and m is the sample size.
When the number of samples in each region is 2, 8, 10, 13, 17, 17, 13, 10, 8, 2, the test statistic (10) is calculated as χ² = 25.200000. From χ², the p-value is
$$\text{P-value} = \mathrm{igamc}\!\left(\frac{9}{2}, \frac{\chi^2}{2}\right).$$
Therefore, in the current example where χ² = 25.200000, the p-value is 0.002758. The level of significance for p-value uniformity is α = 0.0001. So when the p-value is 0.002758, which is above this level, it follows that the p-value distribution is considered uniform. The p-value uniformity test requires at least 55 samples. As mentioned before, passing the NIST SP 800-22 does not ensure that a sequence is truly random [21,22,23].
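The second and third steps can be checked numerically with a short sketch. The names `acceptable_proportion_range`, `igamc_half` and `pvalue_uniformity` are hypothetical; `igamc_half` evaluates the incomplete gamma function only for first arguments of the form k/2, which is sufficient here because the first argument is 9/2. The coefficient 3 is exposed as a parameter so that the alternative value 2.6 can be tried.

```python
import math

def acceptable_proportion_range(alpha, m, coefficient=3.0):
    """Range (1 - alpha) +/- coefficient * sqrt(alpha * (1 - alpha) / m)."""
    half_width = coefficient * math.sqrt(alpha * (1.0 - alpha) / m)
    return (1.0 - alpha) - half_width, (1.0 - alpha) + half_width

def igamc_half(k, x):
    """Regularized upper incomplete gamma Q(k/2, x) for a positive integer k."""
    if x <= 0.0:
        return 1.0
    if k % 2:
        a, q = 0.5, math.erfc(math.sqrt(x))      # Q(1/2, x)
    else:
        a, q = 1.0, math.exp(-x)                 # Q(1, x)
    while a < k / 2 - 1e-9:                      # Q(a+1, x) = Q(a, x) + x^a e^-x / Gamma(a+1)
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q

def pvalue_uniformity(counts):
    """Chi-squared statistic and p-value for the counts in the 10 p-value regions."""
    m = sum(counts)
    expected = m / len(counts)
    chi2 = sum((f - expected) ** 2 / expected for f in counts)
    return chi2, igamc_half(len(counts) - 1, chi2 / 2.0)

low, high = acceptable_proportion_range(0.01, 1000)
print(f"acceptable range for m = 1000: [{low:.6f}, {high:.6f}]")

chi2, p_value = pvalue_uniformity([2, 8, 10, 13, 17, 17, 13, 10, 8, 2])
print(f"chi2 = {chi2:.6f}, P-value = {p_value:.6f}")
```

With m = 579 the lower bound evaluates to about 0.977595, and the upper bound exceeds 1, which is why only a lower limit is meaningful for that sample size.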

Quantum Random Numbers Generation on the Cloud Quantum Computer
According to quantum mechanics, the measurement outcomes of the superposition state $(|0\rangle + |1\rangle)/\sqrt{2}$ along the computational basis ideally form a random number sequence. This means that the resulting sequences are expected to pass the statistical tests for RNGs explained previously. Here, the computational basis, $|0\rangle$ and $|1\rangle$, spans the two-dimensional Hilbert space. In a quantum computer, the desired state $(|0\rangle + |1\rangle)/\sqrt{2}$ is generated from the initial state $|0\rangle$ by applying the Hadamard gate to a single quantum bit (qubit).
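For reference, the ideal case can be written out explicitly. The following is the standard textbook calculation for a noiseless qubit, not a result specific to this study; it shows why an ideal qubit yields unbiased bits and a min-entropy of 1.

```latex
\begin{align}
  H &= \frac{1}{\sqrt{2}}
       \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
  &
  H|0\rangle &= \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \\
  P(0) &= \bigl|\langle 0|H|0\rangle\bigr|^2 = \tfrac{1}{2},
  &
  P(1) &= \bigl|\langle 1|H|0\rangle\bigr|^2 = \tfrac{1}{2}, \\
  H_\infty &= -\log_2 \max\{P(0), P(1)\} = 1.
\end{align}
```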
In the present study, the cloud superconducting quantum computer, IBM 20Q Poughkeepsie, was used. The device was given the circuit in Fig. 2 and was repeatedly instructed to execute the circuit 8192 times without interruption from 2019/05/09 11:24:27 GMT. Because the quantum computers have multiple users across the globe, interruptions between jobs occur [24]. 8192 is the maximum number of uninterrupted executions (shots) available. Running the circuit with 8192 shots yields a binary sequence with a length of 8192 per qubit. This process was automatically repeated across calibrations. The device goes through calibration once a day (see Table 6). As a result, 579 samples were obtained from the IBM 20Q Poughkeepsie device. Note that each qubit produced 579 samples, each with a length of 8192. The samples were subjected to 8 tests from the NIST SP 800-22, which are: the frequency test, frequency within a block test, runs test, longest runs within a block test, DFT test, approximate entropy test, and cumulative sums test (forward and backward, counted as two tests). The p-value of each test corresponding to the respective samples was computed. For each test, the proportion of passed samples was checked. The acceptable range of the proportion of passed samples for 579 samples under the level of significance α = 0.01 is > 0.977595.
By continuously running the IBM 20Q Poughkeepsie device for 5 days, we obtained 579 samples for each of the 20 qubits. In theory, these samples should qualify as the output of an ideal random number generator. In random number generation, the output sequences are checked for two properties: bias and patterns. When the sequences show signs of bias or patterns, the device is not in ideal condition. The same logic applies to the cloud quantum computer.
In the present section, the random number output of each qubit inside the IBM 20Q Poughkeepsie device is analyzed. For the purpose of comparison, the data plots of the 20 qubits are aligned to resemble the device topology. The device topology shown in Fig. 3 represents the pairs of qubits which can be entangled. The min-entropy of each of the 579 samples was computed for each qubit. This resulted in a min-entropy transition plot with 579 points for each of the 20 qubits. Figure 4 is organized to form the topology of IBM 20Q Poughkeepsie. The min-entropy takes values from 0 to 1 depending on the highest probability of the probability distribution. When the probability distribution is uniform, the min-entropy is 1. Figure 4 shows how each qubit has a unique tendency for min-entropy. Qubit [17], for example, shows a sudden drop in min-entropy at around 60 hours. A sudden drop in min-entropy suggests that the measurement results can vary depending on when the cloud quantum computer executes a circuit.
Next, the samples are checked for bias. Each qubit produced 579 samples with a length of 8192, which together form a 4,743,168-bit sequence. Figure 5 shows the proportion of 1s in the entire sequence output by each qubit. Under the level of significance α = 0.01, the proportion of 1s of a 4,743,168-bit sequence should fall between the red lines. The result is that none of the qubits produced acceptable proportions of 1s (see Fig. 5).
Fig. 5 The proportion of 1s of qubit[0]∼[19]. Horizontal axis: qubit number; vertical axis: proportion of 1s; red lines: acceptable range under α = 0.01.
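The width of the acceptable band can be estimated with a short calculation. The sketch below assumes that the band is derived from the frequency test criterion, that is, the p-value erfc(s_obs/√2) must be at least α, which is equivalent to the proportion of 1s lying within z/(2√n) of 1/2, where z is the two-sided standard normal quantile for α. The text does not state how the red lines were obtained, so this derivation and the function name `acceptable_proportion_of_ones` are assumptions.

```python
import math
from statistics import NormalDist

def acceptable_proportion_of_ones(n, alpha=0.01):
    """Band of the proportion of 1s for which the frequency test p-value is >= alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided normal quantile
    half_width = z / (2.0 * math.sqrt(n))
    return 0.5 - half_width, 0.5 + half_width

n_bits = 579 * 8192                               # 4,743,168 bits per qubit
low, high = acceptable_proportion_of_ones(n_bits)
print(f"n = {n_bits}: acceptable proportion of 1s in [{low:.6f}, {high:.6f}]")
```

Under this assumption the band is roughly 0.5 ± 0.0006, so even a bias of a tenth of a percent is detectable at this sequence length.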
The problem with histograms is that they fail to detect certain anomalies. For example, a sequence consisting of all 0s for the former half and all 1s for the latter half yields a perfect histogram. However, such a sequence is clearly not random. To compensate for this flaw, we focused on the transition of the number of 1s in the sequence. Ideally, the number of 1s in a random number sequence should always be roughly half of the sequence length. The difference between the ideal number of 1s and the observed number of ones for the 4,743,168-bit sequence of each qubit is examined in Fig. 6. Note that here, too, the figures are aligned topologically. Figure 6 shows the stability of each qubit in terms of the proportion of 1s in its output; a linear plot suggests that the qubit is being stably operated.
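The transition described above can be computed as a running difference between the observed and ideal counts of 1s. In the sketch below, the function name `ones_deviation` and the sign convention (observed minus ideal) are assumptions; the example input is the half-0s, half-1s sequence mentioned in the text.

```python
from itertools import accumulate

def ones_deviation(bits):
    """Observed minus ideal number of 1s after each bit (ideal = half the bits so far)."""
    return [count - (i + 1) / 2 for i, count in enumerate(accumulate(bits))]

# All 0s in the first half and all 1s in the second half: a perfect histogram,
# but the running deviation drifts far from 0 before returning.
bits = [0] * 4 + [1] * 4
deviation = ones_deviation(bits)
print(deviation)
print("final deviation:", deviation[-1], "largest excursion:", max(abs(d) for d in deviation))
```

A qubit with a constant bias produces a straight line with non-zero slope in such a plot, whereas a change of slope indicates that the bias itself has changed over time.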

Conclusion
We characterized the qubits in a cloud quantum computer by using statistical tests for random number generators to provide a potential indicator of the device's condition. The IBM 20Q Poughkeepsie device was repeatedly run for a period of 5 days, and 579 samples with a length of 8192 were obtained for each of the 20 qubits. These samples were statistically analyzed for bias and patterns. To evaluate the uniformity of each sample, the min-entropy was computed. The transition of min-entropy showed that the qubits have unique characteristics. We identified a sudden drop of min-entropy in qubit [17]. The histogram of the proportion of 1s in the 4,743,168-bit sequences produced by each qubit revealed that overall, none of the qubits produced acceptable proportions of 1s. However, we evaluated the stability of each qubit from the time-series data of the proportion of 1s, and found that qubits [0] and [12] were relatively stable. Instead of deciding whether a qubit is ideal or not under the level of significance, interpreting the p-value as an indicator of how close to ideal a qubit is provides a more flexible interpretation of the test results. Finally, 8 tests from the NIST SP 800-22 were applied to the 579 samples of the 20 qubits. None of the qubits cleared the standards of the test suite. However, the test results show that qubits [0] and [12] were the closest to ideal in terms of randomness. As is the case with random number generators, the cloud quantum computer is a black box to its users. Therefore, the users are required to decide for themselves when to use the device and which qubits to choose. Statistical tests for random number generators are a potential candidate for a simple indicator of qubit condition and stability inside a cloud quantum computer.

Fig. 6 The difference between the ideal and observed increase in the number of 1s of qubit [0]∼[19]. The figure has been rotated 90 degrees.
The longest run of ones within a block test checks whether the longest run of ones within blocks of size M is consistent with what would be expected in a random sequence. The possible values of M for this test are limited to three values, namely, 8, 128 and 10000, depending on the length of the sequence to be tested.
Test Description
1. Divide the sequence into blocks of size M. The choices of M and N are determined in regard to the length of the sequence. N denotes the number of blocks, and the bits left over after forming N blocks are discarded.

Table 2 Choices of M for the longest run of ones within a block test.
2. Classify each block into the following categories regarding M and the length of the longest run in each block. See Table 3.

Table 3 Classifications of each block. Columns: classes $v_i$ for M = 8, M = 128 and M = 10000.

Table 4 Values of K and N corresponding to M.

Table 5 Values of $\pi_i$ corresponding to K and M.

Table 6 The correspondence between calibration start/end time and time of job sent. All dates and times are in GMT.

Table 7 P-values corresponding to the proportion of passed samples for each test. The test names corresponding to the test # can be found in Table 1. The acceptable range is > 0.977595.