On the effects of biased quantum random numbers on the initialization of artificial neural networks



Introduction
The intrinsic non-deterministic nature of quantum mechanics (Kofler and Zeilinger, 2010) makes random number generation a native application of quantum computers. Bird et al. (2020) studied by way of example how such quantum random numbers can affect stochastic machine learning algorithms. For this purpose, electron-based superposition states were prepared and measured on quantum hardware to create random 32-bit integers. These numbers were subsequently used to initialize the weights in neural network models and to determine random splits in decision trees and random forests. Bird et al. observed that quantum random numbers can lead to superior results for certain numerical experiments in comparison with classically generated pseudo-random numbers.
However, the authors did not explain this behavior further. In particular, they did not discuss the statistical properties of the generated quantum numbers. Due to technical imperfections and physical phenomena like decoherence and dissipation, measurement results from a quantum computer might in fact deviate significantly from idealized theoretical predictions (Tamura and Shikano, 2020, Shikano et al., 2020, Tamura and Shikano, 2021). This raises the question of whether the observed effect stems not from the ability of the quantum random number generator to sample perfectly randomly from the uniform distribution, but instead from its ability to sample bit strings from a very particular distribution that is imposed by the quantum hardware.
We therefore revisit this topic in the present manuscript and generate biased random numbers using real quantum hardware, where the specifics of the bias are determined by the natural imperfections of the hardware itself. The bias is therefore not under our control and even beyond our full understanding. With this approach, we aim to better comprehend the effects observed by Bird et al. for an analogous setup and explore the resulting implications. In summary, our main goal is to further study the results of that work and to analyze the effects of quantum and classical random numbers with and without biases on neural network initialization. Our analysis is mainly based on numerical experiments and statistical tests.
The structure of the remaining paper is as follows. In section 2, we briefly summarize the background of the main ingredients of our work, namely quantum computing and random number generation. Subsequently, we present the setup of our quantum random number generator and discuss the statistics of its results in section 3. In section 4, we study the effects of the generated quantum random numbers on artificial neural network weight initialization using numerical experiments. Finally, we close with a conclusion.

Background
In the following, we provide a brief introduction to quantum computing and random number generation without claiming to be exhaustive. For more in-depth explanations, we refer to the cited literature.

Quantum computing
Quantum mechanics is a physical theory that describes objects at the scale of atoms and subatomic particles, e.g., electrons and photons (Norsen, 2017). An important interdisciplinary subfield is quantum information science, which considers the interplay of information science with quantum effects and includes the research direction of quantum computing (Nielsen and Chuang, 2011).

Quantum devices
A quantum computer is a processor which utilizes quantum mechanical phenomena to process information (Benioff, 1980, Grumbling and Horowitz, 2019). Theoretical studies show that quantum computers are able to solve certain computational problems significantly faster than classical computers, for example, in the fields of cryptography (Pirandola et al., 2020) and quantum simulations (Georgescu et al., 2014). Recently, different hardware solutions for quantum computers have been realized and are steadily being improved. For example, superconducting devices (Huang et al., 2020) and ion traps (Bruzewicz et al., 2019) have been successfully used to perform quantum computations. However, various technical challenges are still unresolved, so that the current state of technology, which is subject to substantial limitations, is also referred to as noisy intermediate-scale quantum (NISQ) computing (Preskill, 2018). Nevertheless, quantum supremacy on NISQ devices has already been verified experimentally for a specialized task of randomized sampling (Boixo et al., 2018, Wu et al., 2021).
There are different theoretical models to describe quantum computers, typically used for specific hardware or in different contexts. We only consider the quantum circuit model, in which a computation is considered as a sequence of quantum gates and the quantum computer can consequently be seen as a quantum circuit (Nielsen and Chuang, 2011). In contrast to a classical computer, which operates on electronic bits with a well-defined binary state of either 0 or 1, a quantum circuit works with qubits. A qubit is described by a quantum mechanical state, which can represent a binary 0 or 1 in analogy to a classical bit. In addition, however, it can also represent any superposition of these two values. Such a quantum superposition is a fundamental principle of quantum mechanics and cannot be explained with classical physical models. Moreover, two or more qubits can be entangled with each other. Entanglement is also a fundamental principle of quantum mechanics and leads to non-classical correlations (Bell and Aspect, 2004).

Figure 1: Sketch of the three-step quantum computation process consisting of an initial state preparation, a sequence of gate operations and a final measurement, which yields the result of the computation. Also shown are the errors associated with each step in the computation process: the state preparation errors, the gate errors, and the measurement errors, respectively. They are all hardware-related errors, which can in principle be reduced (or even eliminated) by technological advances. These errors can cause a hardware-related uncertainty (statistical and systematic) of the computation result. On the other hand, the intrinsic randomness of quantum mechanics emerging at the time of the measurement causes an intrinsic uncertainty of the computation result, which is an integral part of quantum computing and can be exploited to construct QRNGs.
In order to illustrate the aforementioned fundamental quantum principles and to connect them with well-known notions from the field of machine learning, one can consider the following intuitive (but physically inaccurate) simplifications: Superposition states can be understood as probability distributions over a finite state space, while entanglement amounts to high-order dependencies between univariate random variables. This intuition particularly emphasizes the close relationship between quantum mechanics and probability theory.
Any quantum computation can be considered as a three-step process, which is sketched in Fig. 1. First, an initial quantum state of the qubits is prepared, usually a low-energy ground state. Second, a sequence of quantum gates deterministically transforms the initial state into a final quantum state. Third, a measurement is performed on the qubits to determine an outcome. When a qubit is measured, the result of the measurement is always either 0 or 1, but the observation is non-deterministic with a probability depending on the quantum state of the qubit at the time of the measurement.
In this sense, a quantum computation includes an intrinsic element of randomness. This randomness is in particular not a consequence of lack of knowledge about the quantum system, but an integral part of quantum mechanics itself. In contrast to classical mechanics, where complete knowledge about the initial state of a system allows one to infer all later (and earlier) states, complete knowledge about a quantum mechanical state does not generally allow the prediction of a single measurement outcome, but only its probability as determined by Born's rule (Norsen, 2017). The non-deterministic nature of quantum mechanics relies on the assumption that there are no so-called hidden variables whose knowledge would lead to a deterministic behavior (Norsen, 2017). Various pieces of theoretical and experimental evidence, for example based on Bell's theorem (Bell and Aspect, 2004) or the Kochen-Specker theorem (Kochen and Specker, 1975), strongly suggest that there are no such hidden variables. However, a conclusive answer to the question of quantum non-determinism is still subject to scientific discourse. For a more detailed discussion about this topic, we refer to Bera et al. (2017) and references therein. Since our work concerns the practical application of random numbers in machine learning algorithms and a theoretical provability of their randomness from first principles is beyond the scope of this paper, we presume in the following that quantum mechanics is indeed intrinsically non-deterministic for all purposes considered.
NISQ devices, as their name suggests, are typically only capable of computing noisy results. A fundamental reason is that the quantum computer, despite all technical efforts, is not perfectly isolated and interacts (weakly) with its environment. In particular, there are two major effects of the environment that can contribute to computational errors, namely dissipation and decoherence in the sense of dephasing (Zurek, 2007, Vacchini, 2016). Dissipation describes the decay of qubit states of higher energy due to an energy exchange with the environment. Decoherence, on the other hand, represents a loss of quantum superpositions as a consequence of environmental interactions. Typically, decoherence is more dominant than dissipation. Beyond these typical effects, other (possibly unknown) influences can occur, which can lead to additional uncertainties.
To compensate for the resulting computational errors to a certain extent, error correction can be used (Roffe, 2019). However, it is generally not possible to completely eliminate statistical (also called aleatoric) or systematic (also called epistemic) uncertainties, which might originate from quantum and classical effects, respectively. Therefore, quantum algorithms must be designed to be sufficiently robust for practical applications on NISQ hardware.
In Fig. 1, we briefly outline different error sources in the quantum computation process. Specifically, each computation step is affected by certain hardware-related errors, which are referred to as state preparation errors, gate errors, and measurement errors, respectively (Nachman and Geller, 2021). All of them are a consequence of the imperfect physical hardware and they are non-negligible for NISQ devices (Leymann and Barzen, 2020). The resulting hardware-related uncertainty might be both statistical and systematic. In addition, the final measurement step is also affected by the intrinsic randomness of quantum mechanics. The measurement ultimately yields a computation result that contains two layers of uncertainty (Heese and Freyberger, 2014): First, the uncertainty caused by the hardware-related errors, and second, the uncertainty caused by the intrinsic randomness. While technological advances (like better hardware and improved algorithm design) can in principle reduce (or even eliminate) hardware-related errors and thus the hardware-related uncertainty, the intrinsic uncertainty is an integral part of quantum computing. It is this intrinsic uncertainty which can be exploited to construct QRNGs.

Quantum machine learning
In a machine learning context, we may identify a quantum circuit with a parameterizable probability distribution over all possible measurement outcomes, where each measurement of the circuit draws a sample from this distribution. The interface between quantum mechanics and machine learning can be attributed to the field of quantum machine learning (Biamonte et al., 2017). A typical use case is the processing of classical data using algorithms that are fully or partially computed with quantum circuits, which is also called quantum-enhanced machine learning (Dunjko et al., 2016).
The noisy nature of NISQ devices presents a challenge for machine learning applications. On the other hand, the probabilistic nature of quantum computing can be related to the statistical background of machine learning algorithms, for which the understanding and modeling of uncertainty is crucial. A review of different types of uncertainty in machine learning and how to typically deal with them can, for example, be found in Hüllermeier and Waegeman (2021).

Random number generation
For many machine learning methods, random numbers are a crucial ingredient and therefore random number generators (RNGs) are an important tool. Examples include sampling from generative models like generative adversarial networks, variational autoencoders or Markov random fields, parameter estimation via stochastic optimization methods, as well as randomized regularization and validation techniques, random splitting for cross-validation, drawing of random mini-batches, and computing a stochastic gradient, to name a few. Randomness also plays an important role in non-deterministic optimization algorithms or the initialization of (trainable) neural network parameters (Glorot and Bengio, 2010, He et al., 2015).
At its core, a RNG performs random coin tosses in the sense that it samples from a uniform distribution over a binary state space (or, more generally, a discrete state space of arbitrary size).
Given a sequence of randomly generated bits, corresponding integer or floating-point values can be constructed straightforwardly.

Classical RNGs
In the classical world, there are two main types of random number generators. Pseudo-random number generators (PRNGs) represent a class of algorithms to generate a sequence of apparently random (but in fact deterministic) numbers from a given seed (James and Moneta, 2020). In other words, the seed fully determines the order of the bits in the generated sequence, but the statistical properties of the sequence (e.g., mean and variance) are independent of the seed (as determined by the underlying algorithm). We remark that PRNGs can also be constructed based on machine learning algorithms (Pasqualini and Parton, 2020).
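The seed dependence described above can be illustrated with a short sketch. It assumes NumPy's default generator as a stand-in PRNG (the source does not prescribe a specific one), and `prng_bits` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def prng_bits(seed, n):
    """Draw n pseudo-random bits from a generator seeded with `seed`."""
    return np.random.default_rng(seed).integers(0, 2, size=n)

# Same seed: the bit sequence is reproduced exactly (deterministic).
a = prng_bits(42, 100_000)
b = prng_bits(42, 100_000)

# Different seed: a different ordering of bits, but similar statistics.
c = prng_bits(7, 100_000)

same_sequence = np.array_equal(a, b)
different_sequence = not np.array_equal(a, c)
mean_a, mean_c = a.mean(), c.mean()
```

Both sample means should lie close to 0.5 regardless of the seed, while only identical seeds reproduce the identical sequence.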
The more advanced true random number generators (TRNGs) are hardware devices that receive a signal from a complex physical process, which is unpredictable for all practical purposes, to extract random numbers (Yu et al., 2019). A multitude of physical effects can be used as sources of entropy for TRNGs, with only some of them directly linked to quantum phenomena. For example, metastability in latches can be exploited in specialized electrical circuits (CMOS devices) to yield random bits (Tokunaga et al., 2008, Holleman et al., 2008). Usually, such setups are built to calibrate themselves to account for hardware-inherent bias effects. Several of these self-calibrating entropy sources can be combined to further increase the cryptographic quality (Mathew et al., 2015). Other approaches make use of ring oscillators to source randomness from timing jitter (Kim et al., 2017), or exploit random telegraph noise to produce bit streams (Puglisi et al., 2018, Brown et al., 2020).
For TRNGs, the lack of knowledge about the observed physical system induces randomness, but it cannot be guaranteed in principle that the dynamics of the underlying physical system are unpredictable (if quantum effects are not sufficiently involved). Likewise, the statistical properties of the generated random sequence are not in principle guaranteed to be constant over time since they are subject to the hidden process.
Independent of their source, random numbers have to fulfill two properties: First, they have to be truly random (i.e., the next random bit in the sequence must not be predictable from the previous bits) and second, they have to be unbiased (i.e., the statistics of the random bit sequence must correspond to the statistics of the underlying uniform distribution). In other words, they have to be secure and reliable. A "good" RNG has to produce numbers that fulfill both requirements. In practice, it is difficult to rigorously prove the quality of RNGs. For a bit sequence of finite length, there is no formal method to decide its randomness with certainty. On the other hand, an infinite bit sequence cannot be tested in finite time (Khrennikov, 2015). Therefore, statistical tests are typically used to check specific properties of RNGs with a certain confidence.
Typically, statistical tests are organized in the form of test suites (e.g., the NIST Statistical Test Suite described in Rukhin et al., 2010) to provide a comprehensive statistical screening. A predictive analysis based on machine learning methods can also be used for a quality assessment (Li et al., 2020). It remains a challenge to certify classical RNGs in terms of the aforementioned criteria (Balasch et al., 2018) to, e.g., ensure cryptographic security.
When implementing learning and related algorithms, PRNGs are typically used. Despite the broad application of randomness in machine learning, the apparent lack of research regarding the particular choice of RNGs suggests that it is usually not crucial in practice. This assumption has been experimentally verified, e.g., in Rajashekharan and Shunmuga Velayutham (2016) for differential evolution and is most certainly due to the fact that modern PRNGs seem to be sufficiently secure and reliable for most practical purposes. The influence of different seeds for a PRNG on various deep learning algorithms for computer vision has been studied empirically in Picard (2021) with the result that it is often possible to find seeds that lead to a much better or much worse performance than the average. This highlights the fact that numerical experiments with non-deterministic algorithms have to be conducted carefully to account for the variance of random numbers. However, the specific implications of varying degrees of security and reliability of RNGs on machine learning applications generally remain unresolved, i.e., it generally remains unclear whether a certain machine learning algorithm may suffer or benefit from the artifacts of an imperfect RNG. In the present work, we approach this still rather open field of research by specifically considering the randomness in artificial neural network initialization.

Quantum RNGs
As previously stated, quantum computers (or, more generally, quantum systems) have an intrinsic ability to produce truly random outcomes in a way that cannot be predicted or emulated by any classical device (Calude et al., 2010). Therefore, it seems natural to utilize them as a source of random numbers in the sense of a quantum random number generator (QRNG). Such QRNGs (Herrero-Collantes and Garcia-Escartin, 2017) have already been realized with different quantum systems, for example using nuclear decay (Park et al., 2020) or optical devices (Leone et al., 2020).
In summary, the main difference between randomness from classical systems and randomness from quantum systems is that a classical system is fully deterministic and therefore all randomness can only result from a lack of knowledge about the system, whereas a quantum system is non-deterministic and therefore, even with perfect knowledge, an intrinsic randomness may be involved. In this sense, the origin of randomness is different for quantum and classical RNGs. However, it is in principle not possible to mathematically distinguish the randomness of a classical system from the randomness of a quantum system (Khrennikov, 2015).
A simple QRNG can be straightforwardly realized using a quantum circuit. For this purpose, each of its qubits has to be brought into a superposition of 0 and 1 such that both outcomes are equally probable to be measured. This operation can for example be performed by applying a single Hadamard gate on each qubit (Nielsen and Chuang, 2011). Each measurement of the circuit consequently generates a sequence of i.i.d. random bits, one for each qubit.
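As a rough illustration of the idealized (error-free) circuit, the following sketch applies a Hadamard matrix to the ground state and samples outcomes according to Born's rule. It is a classical NumPy emulation, so the samples themselves come from a PRNG and only mimic the ideal statistics; `sample_shots` is a hypothetical helper, not part of any quantum SDK.

```python
import numpy as np

# Hadamard gate and single-qubit ground state |0>.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
ket0 = np.array([1.0, 0.0])

# State after the gate: balanced superposition of |0> and |1>.
state = H @ ket0

# Born's rule: outcome probabilities are squared amplitude magnitudes.
probs = np.abs(state) ** 2

def sample_shots(n_qubits, n_shots, rng):
    """Emulate ideal measurements: one independent fair bit per qubit and shot."""
    return rng.choice(2, size=(n_shots, n_qubits), p=probs)

rng = np.random.default_rng(0)
shots = sample_shots(n_qubits=65, n_shots=1000, rng=rng)
```

Each row of `shots` corresponds to one measurement of the circuit, i.e., one bit string with one entry per qubit.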
However, when computing this simple QRNG circuit on a NISQ device, it can be expected that the results will deviate from the theoretical expectations due to statistical and systematic uncertainties such that the QRNG is likely to produce biased outcomes. This means that it is in fact not guaranteed that the measurement outcomes obey the theoretically predicted probability distribution of a fair coin toss. It is not even guaranteed that the measurement outcomes are truly random in the sense that bits are generated entirely independently. As a consequence (and based on the fact that quantum non-determinism is not ultimately resolved), it cannot be generally taken for granted that random numbers from such a QRNG are naturally "better" than random numbers from PRNGs, both with respect to security and reliability. For this reason, technically more refined solutions are necessary to realize trustworthy QRNGs on NISQ devices. Moreover, QRNGs have to be certified similarly to classical RNGs. For example, to enable a theoretically secure QRNG, the Bell inequality (Pironio et al., 2010) or the Kochen-Specker theorem can be utilized (Abbott et al., 2014, 2015, Kulikov et al., 2017). For an experimental verification of random bit sequences from a QRNG, entanglement-based public tests of randomness can be used without violating the secrecy of the generated sequences (Jacak et al., 2020).
Currently, there exist various commercial and non-commercial QRNGs, which can be used to create quantum random numbers on demand, for example ANU QRNG (2021). Although there still seem to be some practical challenges (Martínez et al., 2018, Petrov et al., 2020), theoretical and technological advances in the field will most certainly lead to a steady improvement of QRNGs.

Biased QRNG
Motivated by the work in Bird et al. (2020), we take a different approach than usual in this manuscript. Instead of aiming for a RNG with as little bias as possible, we discuss whether the typical bias in a naively implemented, gate-based QRNG can actually be beneficial for certain machine learning applications. In other words, we consider the bias that is "naturally" imposed by the quantum hardware itself (i.e., by the hardware-related errors outlined in Fig. 1). In addition to a bias, we also accept that the randomness of the results is not necessarily guaranteed in the sense that the QRNG can (to some degree) produce correlations or predictable patterns from systematic quantum hardware errors. Since the imperfections of the quantum hardware are beyond our control (i.e., they can in particular not be switched off at will), a RNG realized in this way contains unknown and uncontrollable elements. Therefore, we have to analyze its outcomes statistically to capture the effects of these elements on the generated random numbers. In the present section, we first describe our experimental setup for such a naively implemented QRNG and subsequently discuss the statistics of the resulting "hardware-biased" quantum random numbers.

Setup
To realize a hardware-biased QRNG (B-QRNG), we utilize a physical quantum computer, which we access remotely via Qiskit (Abraham et al., 2019) using the cloud-based quantum computing service provided by IBM Quantum (IBM, 2021). With this service, users can send online requests for quantum experiments using a high-level quantum circuit model of computation, which are then executed sequentially (LaRose, 2019). The respective quantum hardware, also called backend, operates on superconducting transmon qubits.
For our application, we specifically use the ibmq_manhattan backend (version 1.11.1), which is one of the IBM Quantum Hummingbird r2 processors with N ≡ 65 qubits. A sketch of the backend topology diagram can be found in Fig. 2(a). It indicates the hardware index of each qubit and the pairs of qubits that support two-qubit gate operations between them. IBM also provides an estimate for the relaxation time $T_1$ and the dephasing time $T_2$ for each qubit at the time of operation. The mean and standard deviation of these times over all qubits read $T_1$ ≈ (59.11 ± 15.25) µs and $T_2$ ≈ (74.71 ± 31.22) µs, respectively.
Initially, all qubits in this backend are prepared in the ground state. Our B-QRNG circuit, which is sketched in Fig. 2(b), consists of one Hadamard gate applied to each qubit such that it is brought into a balanced superposition of ground state and excited state. A subsequent measurement on each qubit should therefore ideally (i.e., in the error-free case) reveal an outcome of either 0 (corresponding to the ground state) or 1 (corresponding to the excited state) with equal probability. However, since we run the circuit on real quantum hardware, we can expect to obtain random numbers which deviate from these idealized outcomes due to hardware-related errors. An analogous setup with a different backend is considered in Tamura and Shikano (2020), Shikano et al. (2020), Tamura and Shikano (2021).
We sort the qubit measurements according to their respective hardware index in ascending order so that each run of the backend yields a well-defined bit string of length N. Such a single run is called a shot in Qiskit. We perform sequences of S ≡ 8192 shots (which is the upper limit according to the backend access restrictions imposed by IBM) for which we concatenate the resulting bit strings in the order in which they are executed. Such a sequence of shots is called an experiment in Qiskit. We repeat this experiment R ≡ 564 times (900 experiments is the upper limit set by IBM) and again concatenate the resulting bit strings in the order of execution. A sequence of experiments is denoted as a job in Qiskit and can be submitted directly to the backend. It is run in one pass without interruption from other jobs.
Our submitted job ran from March 5, 2021 10:45 AM GMT to March 5, 2021 11:58 AM GMT. The final result of the job is a bit string of length M ≡ NSR = 300 318 720 as sketched in Fig. 3. The choice of R is determined by the condition M ⪆ 3 × 10^8, which we have estimated as sufficient for our numerical experiments. We split the bit string into chunks of length C ≡ 32 to obtain L ≡ M/C = 9 384 960 random 32-bit integers, which we use for the following machine learning experiments.
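The bookkeeping of the job size can be checked with a few lines of arithmetic. The sketch below assumes that R is chosen as the smallest number of experiments for which M reaches 3 × 10^8, which is our reading of the stated condition rather than something the setup spells out.

```python
import math

N = 65      # qubits per shot
S = 8192    # shots per experiment
C = 32      # bits per integer
target_bits = 3e8  # required minimum length of the bit string

# Smallest number of experiments R with N * S * R >= target_bits (assumed reading).
R = math.ceil(target_bits / (N * S))

M = N * S * R   # total number of measured bits
L = M // C      # number of 32-bit integers
remainder = M % C
```

With these values, M is an exact multiple of C, so no bits are discarded when the bit string is split into 32-bit chunks.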

Statistics
Before we utilize our generated random numbers for learning algorithms, we first briefly discuss their statistics. The measurement results from the nth qubit can be considered as a Bernoulli random variable (Forbes et al., 2011), where n ∈ {0, …, 64} represents the hardware index as outlined in Fig. 2. Such a variable has a probability mass function
$$ f(b; p) \equiv p^{b} (1 - p)^{1 - b} \tag{1} $$
depending on the value of the bit $b \in \mathbb{B} \equiv \{0, 1\}$ and the success probability p ∈ [0, 1] of observing an outcome b = 1.

Bias
We denote the measured bit string from our B-QRNG as a vector $B \in \mathbb{B}^{M}$. The extracted bit string exclusively resulting from measurements of the nth qubit is given by the vector
$$ b_n \equiv (B_{n+1}, B_{n+1+N}, B_{n+1+2N}, \dots, B_{n+1+M-N}) \tag{2} $$
with $b_n \in \mathbb{B}^{M/N}$. Based on its population, the corresponding expected probability $p_n(b)$ of obtaining the bit b for the nth qubit is given by
$$ p_n(b) \equiv \frac{N}{M} \sum_{i=1}^{M/N} \mathbb{1}(b_n^i = b), \tag{3} $$
where $b_n^i$ denotes the ith element of $b_n$, with the indicator function
$$ \mathbb{1}(x = y) \equiv \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise} \end{cases} \tag{4} $$
such that $p_n(0) + p_n(1) = 1$. From an idealized prediction of the measurement results of qubits in a balanced superposition, we would assume that all expected probabilities $p_0(b), \dots, p_{N-1}(b)$ correspond to the uniform probability
$$ \bar{p} \equiv \frac{1}{2} \tag{5} $$
with uncertainties coming only from the finite number of samples. We show the estimated probabilities in Fig. 4. It is apparent that all bit probabilities deviate significantly from their idealized value $\bar{p}$, Eq. (5). In particular, we find an expected probability and standard deviation with respect to all measured bits of
$$ p(0) \approx 0.5112 \pm 0.0215. \tag{6} $$
Figure 4: Measured bit distribution for each qubit from the B-QRNG on ibmq_manhattan. We show the expected probability $p_n(0)$ of obtaining a zero bit from the measured bit string for the nth qubit, Eq. (3), and (stacked on top) its complement $p_n(1) = 1 - p_n(0)$. Also shown are the corresponding expected probabilities with respect to all measured bits p(0) ≈ 0.51 and p(1) = 1 − p(0) ≈ 0.49, respectively, Eq. (6). Apparently, all bit distributions deviate differently from the uniform probability $\bar{p}$, Eq. (5), which we assume to be a consequence of the imperfect hardware. The distributions with the highest (n = 50) and lowest (n = 19) expected probabilities of obtaining a zero bit are marked on top.
We assume that this is a consequence of the imperfect hardware with its decoherence and dissipation effects. In particular, the fact that p(0) > p(1) is most likely a consequence of dissipation since a bit of 0 corresponds to an observation of a qubit ground state, whereas a bit of 1 is associated with an excited state.
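The per-qubit estimate of the zero-bit probability can be sketched as follows. Since the measured bit string is not reproduced here, the example uses synthetic data with hypothetical per-qubit success probabilities (`true_p1`), and it assumes that the bits of each shot are stored consecutively in ascending hardware index, as described in the setup. The function name `per_qubit_zero_probability` is introduced only for this illustration.

```python
import numpy as np

def per_qubit_zero_probability(bit_string, n_qubits):
    """Estimate the zero-bit probability of each qubit from an interleaved bit string."""
    # Each row is one shot, each column one qubit (ascending hardware index).
    shots = np.asarray(bit_string).reshape(-1, n_qubits)
    return 1.0 - shots.mean(axis=0)

rng = np.random.default_rng(1)
n_qubits, n_shots = 65, 20_000

# Hypothetical (made-up) success probabilities, one per qubit.
true_p1 = rng.uniform(0.45, 0.52, size=n_qubits)

# Synthetic shots: bit is 1 with the qubit-specific probability.
shots = (rng.random((n_shots, n_qubits)) < true_p1).astype(np.uint8)
B = shots.reshape(-1)  # concatenated bit string

p0 = per_qubit_zero_probability(B, n_qubits)
p0_mean, p0_std = p0.mean(), p0.std()
```

The mean and standard deviation of `p0` over all qubits play the role of the summary quoted in the text for the real data.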
From a χ² test (Pearson, 1900) on the measured bit distribution, the null hypothesis of a uniform zero bit occurrence can be rejected as expected with a confidence level of 1.0000. To further quantify the deviation of the measured probabilities from a uniform distribution, we utilize the discrete Hellinger distance (Hellinger, 1909)
$$ H(q_1, q_2) \equiv \frac{1}{\sqrt{2}} \sqrt{\sum_{i \in Q} \left( \sqrt{q_1(i)} - \sqrt{q_2(i)} \right)^2}, \tag{7} $$
which can be used to measure similarities between two discrete probability distributions $q_1 \equiv q_1(i)$ and $q_2 \equiv q_2(i)$ defined on the same probability space Q. By iterating over all qubits, we find the mean and standard deviation of the Hellinger distance between each measured qubit distribution and the uniform distribution. The mean value quantifies the average deviation of the measured qubit distributions from the idealized uniform distribution and confirms our qualitative observations. The non-negligible standard deviation results from the fluctuations between the individual qubit outcomes.
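Both quantities can be sketched in a few lines. The Hellinger distance below uses the standard discrete definition with the 1/√2 normalization, and the χ² test is written out for the two-outcome case (one degree of freedom), where the p-value reduces to a complementary error function. The bit counts in the example are made up for illustration and are not the measured data.

```python
import math
import numpy as np

def hellinger(q1, q2):
    """Discrete Hellinger distance between two distributions on the same space."""
    q1 = np.asarray(q1, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    return float(np.sqrt(np.sum((np.sqrt(q1) - np.sqrt(q2)) ** 2)) / np.sqrt(2.0))

def chi2_uniform_test(n_zeros, n_ones):
    """Pearson chi-square test of the hypothesis that 0 and 1 are equally likely."""
    expected = (n_zeros + n_ones) / 2.0
    stat = ((n_zeros - expected) ** 2 + (n_ones - expected) ** 2) / expected
    # Survival function of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Distance of a slightly biased bit distribution from the uniform one.
distance = hellinger([0.5112, 0.4888], [0.5, 0.5])

# Hypothetical counts: 5112 zeros and 4888 ones out of 10000 bits.
stat, p_value = chi2_uniform_test(5112, 4888)
```

For a fixed bias, the test statistic grows linearly with the number of bits, so very long bit strings reject uniformity even for small deviations.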

Randomness
Although quantum events intrinsically exhibit a truly random behavior, the output of our B-QRNG is the result of a complex physical experiment behind a technically sophisticated pipeline that appears as a black box to us and it can therefore not be assumed with certainty that its outcomes are indeed statistically independent. To examine this issue in more detail, we briefly study the randomness of the resulting bit string in the following.
For this purpose, we make use of the Wald-Wolfowitz runs test (Wald and Wolfowitz, 1940), which can be used to test the null hypothesis that elements of a binary sequence are mutually independent. We perform a corresponding test on the measured bit string from the nth qubit $b_n$, Eq. (2), and denote the resulting p-value as $p^r_n$. The null hypothesis has to be rejected if this probability does not exceed the significance level, which we choose as α = 0.05.
Figure 5: p-values $p^r_n$ of the Wald-Wolfowitz runs test on the measured bit string $b_n$ from the nth qubit, Eq. (2). We show p-values in different colors depending on whether or not they exceed α = 0.05. In case of $p^r_n \leq \alpha$, the corresponding hardware indices are additionally denoted on top of the plot and indicate the qubits that fail the test of randomness.
The test results are shown in Fig. 5. We find that the bit strings from almost all qubits pass the test and can therefore be considered random in the sense of the test criteria. However, the bit strings from five qubits fail the test, which implies non-randomness. We also perform a test on the total bit string B, which yields the p-value $p^r$ ≈ 0.0000 < α such that the test also fails for the entire sequence of random numbers.
In summary, we find that the reliability of the generated quantum random numbers is questionable. A typical binary random sequence from a PRNG of the same length as B can be expected to pass the Wald-Wolfowitz runs test. However, within the scope of this work, the reason for this observation cannot be further investigated and we accept it as an integral part of our naive approach to the B-QRNG. Further work regarding the properties of our setup (applied to a different quantum hardware) can be found in Tamura and Shikano (2020), Shikano et al. (2020), Tamura and Shikano (2021), which contain similar observations. A lack of reliability is not surprising considering the fact that we have not aimed for a certified random number generation and our setup is motivated by a strongly idealized model of quantum gate computers, as already mentioned above.
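A minimal sketch of the runs test is given below. It uses the common normal approximation of the number of runs with a two-sided p-value; whether the analysis in the text used exactly this variant is an assumption, and `runs_test` is a hypothetical helper name.

```python
import math
import numpy as np

def runs_test(bits):
    """Wald-Wolfowitz runs test (normal approximation, two-sided p-value)."""
    bits = np.asarray(bits).astype(int)
    n1 = int(bits.sum())
    n0 = bits.size - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("sequence must contain both zeros and ones")
    n = n0 + n1
    # A new run starts wherever two neighboring bits differ.
    runs = 1 + int(np.count_nonzero(bits[1:] != bits[:-1]))
    mu = 2.0 * n1 * n0 / n + 1.0                 # expected number of runs
    var = (mu - 1.0) * (mu - 2.0) / (n - 1.0)    # variance of the number of runs
    z = (runs - mu) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p_value

alpha = 0.05

# Strictly alternating bits: far too many runs, independence is rejected.
runs_alt, z_alt, p_alt = runs_test([0, 1] * 500)

# Repeating 0011 pattern: the run count is close to its expectation.
runs_blk, z_blk, p_blk = runs_test([0, 0, 1, 1] * 250)
```

The second sequence is obviously predictable yet yields a large p-value, which illustrates that the test checks only one specific property of a sequence and that passing it does not establish randomness.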

Integers
Next, we analyze the resulting random 32-bit integers. To obtain these, we convert B into a vector of integers $B \rightarrow I \in \{0, \dots, 2^C - 1\}^L$ by consecutively grouping its elements into bit strings of length C and converting them to non-negative integers according to Eq. (9) for j ∈ {1, . . ., L}. For a bit string of Bernoulli random variables B with a fair success probability p = 0.5, Eqs. (1) and (5), the sequence of random integers in I would be uniformly distributed.
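The grouping of bits into integers can be sketched as follows. The bit order is an assumption: the first bit of each group is treated as the least significant one, following the convention $\sum_{i=0}^{C-1} \tau_{i+1}(j)\, 2^i = j$ used for the theoretical model. The function name `bits_to_integers` is hypothetical, and dropping an incomplete trailing group is a choice made for this example.

```python
def bits_to_integers(bits, C=32):
    """Group a bit sequence into consecutive chunks of C bits and
    convert each chunk into a non-negative integer.

    Assumption: the first bit of a chunk is the least significant bit.
    Bits of an incomplete trailing chunk are dropped.
    """
    L = len(bits) // C
    integers = []
    for j in range(L):
        group = bits[j * C:(j + 1) * C]
        integers.append(sum(bit << i for i, bit in enumerate(group)))
    return integers
```

With C = 32, a chunk of all ones yields the largest representable value $2^{32} - 1$.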
However, as we have seen before, this assumption does not hold true for the results from our B-QRNG. To analyze the distribution of the integers, we rescale them by a division by $\xi \equiv 2^C - 1$ and sort them into K equally sized bins. The population $c_k$ of the kth bin is given by Eq. (11) with the indicator function from Eq. (12) for k ∈ {1, . . ., K}.
Additionally, we consider a simplified theoretical description of the bin population by modeling the bit string as the result of a Bernoulli process with a single success probability p, Eq. (1). That is, the bits represent i.i.d. Bernoulli random variables. The integer j ∈ {0, . . ., ξ} corresponding to a bit string $\tau(j) \in \mathbb{B}^C$ is determined in analogy to Eq. (9) such that $\sum_{i=0}^{C-1} \tau_{i+1}(j)\, 2^i = j$. The probability mass function of the resulting integers is given by Eq. (13), its expected value $\hat{I}(p)$ by Eq. (14), and the information entropy (Shannon, 1948) in nats, $S_I(p)$, by Eq. (15). We show a plot of Eqs. (14) and (15) in Fig. 6. Finally, the predicted (possibly non-integer) population of the kth bin, $\hat{c}_k(p)$, is given by Eq. (16), which we use as our simplified model of Eq. (11).
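The properties of this simplified model can be checked numerically for a small number of bits. The following sketch enumerates all integers for a given C and evaluates the probability mass function, the expected value, and the entropy by brute force. For i.i.d. bits, the expected value equals p(2^C − 1) and the entropy equals C times the binary entropy of p, which is consistent with the behavior shown in Fig. 6. All function names are assumptions made for this example.

```python
import math


def integer_pmf(p, C):
    """Probability of each integer in {0, ..., 2**C - 1} whose C bits
    are i.i.d. Bernoulli variables with success probability p."""
    pmf = []
    for j in range(2 ** C):
        ones = bin(j).count("1")
        pmf.append(p ** ones * (1.0 - p) ** (C - ones))
    return pmf


def expected_value(pmf):
    """Expected value of the integer distribution."""
    return sum(j * q for j, q in enumerate(pmf))


def entropy_nats(pmf):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(q * math.log(q) for q in pmf if q > 0.0)


def binary_entropy_nats(p):
    """Entropy of a single Bernoulli(p) bit in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
```

The brute-force enumeration is only feasible for small C; for C = 32 the closed-form expressions have to be used instead.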
We show both the measured bin population $c_k$, Eq. (11), and the theoretical bin population $\hat{c}_k(p)$, Eq. (16), for a success probability p corresponding to the expected probability of all measured bits p(1) = 1 − p(0), Eq. (6), in Fig. 7. Clearly, the generated sequence of random integers is not uniformly distributed (i.e., with a population of L/K in each bin). Instead, we find a complex arrangement of spikes and valleys in the bin populations.
Specifically, since p(0) > p(1), random integers become more probable when their binary representation contains as many zeros as possible, which is reflected in the bin populations. In particular, the first bin (containing the smallest integers) has the highest population. The minor deviations between the measured and the theoretical bin populations result from the finite number of measured samples and the simplification of the theoretical model: The success probability of each bit from the B-QRNG specifically depends on the qubit it is generated from as shown in Fig. 4, whereas our theoretical model only uses one success probability for all bits corresponding to p(1).
We recall the Hellinger distance, Eq. (7), to quantify the deviation of the distribution of integers from the uniform distribution. Specifically, we evaluate $H(p_c, \bar{p}_c)$, Eq. (17), where we have made use of the measured integer distribution $p_c \equiv p_c(k) \equiv c_k/L$ and the corresponding uniform distribution $\bar{p}_c \equiv \bar{p}_c(k) \equiv 1/K$ with k ∈ {1, . . ., K}. This metric quantifies our observations from Fig. 7.
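The binning and the comparison with the uniform distribution can be sketched as follows. Since Eq. (7) is not repeated here, the sketch assumes the common normalization of the Hellinger distance for discrete distributions, for which the distance lies between 0 and 1. The function names and the treatment of the upper bin edge are assumptions made for this example.

```python
import math


def bin_populations(integers, K, C=32):
    """Count how many integers from {0, ..., 2**C - 1} fall into each of
    K equally sized bins of the rescaled range [0, 1]."""
    xi = 2 ** C - 1
    counts = [0] * K
    for value in integers:
        # Integer arithmetic avoids rounding issues; the largest
        # value xi is assigned to the last bin.
        k = min(value * K // xi, K - 1)
        counts[k] += 1
    return counts


def hellinger(p, q):
    """Hellinger distance between two discrete distributions
    (assumed normalization: values in [0, 1])."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))
```

For a list of integers `values`, the measured distribution is obtained as `[c / len(values) for c in bin_populations(values, K)]` and can be compared with the uniform reference `[1 / K] * K`.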
For comparative purposes, we show additional theoretical bin populations for other success probabilities in Fig. 8. As expected, the rugged pattern of the distribution becomes sharper for lower or higher values of p and the deviation from the uniform distribution increases.
show greatly differing patterns in learning patterns and their overall results when using PRNG and QRNG methods to generate the initial weights." To ensure repeatability of our experiments, PyTorch is run in deterministic mode with fixed (i.e., hard-coded) random seeds. The main hardware component is an Nvidia GeForce GTX 1080 Ti graphics card. Our Python implementation of the experiments is publicly available online (Wolter, 2021).
In the present section, we first summarize the considered RNGs. Subsequently, we present the two experiments and discuss their results.

RNGs
In total, we use four different RNGs to initialize neural network weights:
1. B-QRNG: Our hardware-biased quantum random number generator introduced in section 3, from which we extract the integer sequence I according to Eq. (9). The data is publicly available online (Heese et al., 2023).
2. QRNG: A bias-free quantum random number generator (ANU QRNG, 2021) based on quantum-optical hardware that performs broadband measurements of the vacuum field contained in the radio-frequency sidebands of a single-mode laser to produce a continuous stream of random bits (Symul et al., 2011, Haw et al., 2015). We particularly use a publicly available pre-generated sequence of random bits from this stream (ANU QRNG, 2017), extract the first M bits and convert them into the integer sequence $I' \in \{0, \dots, 2^C - 1\}^L$ according to Eq. (9). Based on the Hellinger distance $H(p'_c, \bar{p}_c) \approx 0.0018$, Eq. (7), with $p'_c \equiv p'_c(k) \equiv c'_k/L$ and the bin populations $c'_k$ determined in analogy to Eqs. (11) and (12) for k ∈ {1, . . ., K}, we find that I′ is indeed much closer to the uniform distribution than I, Eq. (17). We visualize the corresponding integer distribution in Fig. 9.
3. PRNG: A classical pseudo-random number generator that yields unbiased outcomes.
4. B-PRNG: A "pseudo hardware-biased quantum random number generator", which generates a bit string of i.i.d. Bernoulli random variables with a success probability p corresponding to the expected probability of all measured bits p(1) = 1 − p(0), Eqs. (1) and (6), using the native pseudo-random number generator from PyTorch. The bit strings are then converted into integers according to Eq. (9). Their probability mass function is given by Eq. (13).
All of these RNGs, which are summarized in Tab. 1, produce 32-bit random numbers. However, the random numbers from the B-QRNG and the QRNG are taken in order (i.e., unshuffled) from the predefined sequences I and I′, respectively, whereas the PRNG and the B-PRNG algorithmically generate random numbers on demand based on a given random seed.
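The principle behind the B-PRNG can be sketched as follows. For the sake of a dependency-free example, the sketch uses the pseudo-random number generator from the Python standard library instead of the native PyTorch generator used in our experiments. The function name `biased_integers`, its parameters, and the bit order are assumptions made for this example.

```python
import random


def biased_integers(count, p, C=32, seed=0):
    """Generate `count` integers whose C bits are i.i.d. Bernoulli
    variables with success probability p (seeded for repeatability)."""
    rng = random.Random(seed)
    integers = []
    for _ in range(count):
        value = 0
        for i in range(C):
            # rng.random() is uniform on [0, 1), so bit i is set
            # with probability p.
            if rng.random() < p:
                value |= 1 << i
        integers.append(value)
    return integers
```

For p = 0.5, the resulting integers are uniformly distributed, whereas the extreme cases p = 0 and p = 1 produce only the constant values 0 and $2^{32} - 1$, respectively.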
For the sake of completeness, we also analyze the binary random numbers from the B-QRNG and the QRNG, respectively, with the NIST Statistical Test Suite for the validation of random number generators (Rukhin et al., 2010, NIST, 2010). For this purpose, the bit strings are segmented into smaller sequences and multiple statistical tests are evaluated on each sequence. Each test consists of one or more sub-tests with the null hypothesis that the sequence being tested is random. Based on the proportion of sequences for which a sub-test satisfies the null hypothesis, it is considered as passed or rejected, where a rejection indicates non-randomness. A more detailed discussion about this procedure can also be found in Sýs et al. (2015).
A summary of our results is listed in Tab. 2. It shows that the B-QRNG numbers fail a majority of statistical tests of randomness, as expected, whereas the QRNG passes all.

CNN
In the first experiment, we consider a LeNet-5 inspired CNN with ReLU activation functions and without dropout (LeCun et al., 1998). The network weights are initialized as proposed by He et al. (2015), but we use a uniform distribution instead of a normal distribution, as is also common. This means that each weight $w_i$ (with i = 1, 2, . . . ) is sampled uniformly according to
$$ w_i \sim \mathcal{U}(-h_i, h_i), \tag{18} $$
where $h_i > 0$ is chosen such that a constant output variance can be achieved over all layers. The network biases are initialized analogously.
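To illustrate how a random 32-bit integer can be turned into a network weight, the following sketch maps an integer linearly onto the symmetric interval around zero from which the weight is sampled. Both the linear mapping and the bound $h = \sqrt{6/n_\mathrm{in}}$ (the usual choice for a He-type uniform initialization with ReLU activations and $n_\mathrm{in}$ input connections) are assumptions made for this example and are not necessarily identical to our implementation.

```python
import math

XI = 2 ** 32 - 1  # largest 32-bit integer


def he_uniform_bound(fan_in):
    """Bound h of a He-type uniform initialization (assumed form):
    U(-h, h) has variance h**2 / 3 = 2 / fan_in."""
    return math.sqrt(6.0 / fan_in)


def integer_to_weight(value, h):
    """Map an integer from {0, ..., XI} linearly onto [-h, h]
    (assumed mapping: 0 -> -h and XI -> +h)."""
    return (2.0 * value / XI - 1.0) * h
```

Under this mapping, integers from an unbiased generator yield weights that are uniformly distributed around zero, whereas a bias towards large integers shifts the weights towards positive values.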
As data set, we use the MNIST handwritten digit recognition problem (LeCun et al., 1998), which contains 70 000 grayscale images of handwritten digits in 28 × 28 pixel format. The digits are split into a training set of 60 000 images and a test set of 10 000 images. The network is trained using Adadelta (Zeiler, 2012) over d ≡ 14 epochs.
In Fig. 10 we show the CNN test accuracy convergence for each epoch over 31 independent training runs using the four RNGs from section 4.1. The use of a biased RNG means that the He et al. initialization is actually effectively realized based on a non-uniform distribution instead of a uniform distribution. Therefore, such an approach could potentially be considered a new type of initialization strategy (depending on the bias), which is why one might expect a different training efficiency. However, the results show that the choice of RNG for the network weight initialization has no major effect on the CNN test accuracy convergence. Only a closer look reveals that the mean QRNG results seem to be slightly superior to the others in the last epochs.
To quantify this observation, we utilize Welch's (unequal variances) t-test for the null hypothesis that two independent samples have identical expected values without the assumption of equal population variance (Welch, 1947). We apply this test to each pair of the four results from different RNGs, where the resulting test accuracies from all runs in a specific epoch are treated as samples. We denote the two results to be compared as x and y, respectively, with $x, y \in \mathbb{R}^{31 \times d}$ for 31 runs and d epochs. Consequently, for each pair of results and each epoch i ∈ {1, . . ., d}, we obtain a two-tailed p-value $p^t_i(x, y)$. The null hypothesis has to be rejected if such a p-value does not exceed the significance level, which we choose as α = 0.05.
We are particularly interested in whether the aforementioned null hypothesis holds true for all epochs. To counteract the problem of multiple comparisons, we use the Holm-Bonferroni method (Holm, 1979). If the resulting condition on the smallest p-value over all epochs, $p^t_{\min}(x, y)$, Eq. (19), is fulfilled, no overall statistically significant deviation between the results from different RNGs is present.
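The correction for multiple comparisons can be sketched as follows. The sketch takes the per-epoch p-values as given (e.g., from a Welch t-test), computes Holm-Bonferroni adjusted p-values in their standard step-down form, and checks whether the smallest adjusted p-value exceeds the significance level. The exact form of Eq. (19) may differ from this formulation; the function names are assumptions made for this example.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is scaled by (m - k + 1); the running
        # maximum keeps the adjusted values monotonically non-decreasing.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def no_significant_deviation(p_values, alpha=0.05):
    """True if no null hypothesis is rejected after the correction."""
    return min(holm_adjust(p_values)) > alpha
```

Since the smallest adjusted p-value equals the smallest raw p-value multiplied by the number of epochs (capped at 1), the overall check reduces to a single comparison.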
In addition, we also quantify the correlation of x and y using the Pearson correlation coefficient (Pearson, 1895) of the mean values over all runs, ρ(x, y), Eq. (20), where we make use of the abbreviations $x'_i \equiv \sum_{j=1}^{31} x_{ji}/31$ and $y'_i \equiv \sum_{j=1}^{31} y_{ji}/31$. A coefficient of 1 implies a perfect linear correlation of the means, whereas a coefficient of 0 indicates no linear correlation.
For the results from the CNN experiment, we obtain the similarity and correlation metrics listed in Tab. 3 in the rows marked with "CNN". Summarized, we find a high mutual similarity (Eq. (19) holds true) and almost perfect mutual correlations of the results. This means that the choice of RNG for the network weight initialization has no statistically significant effect on the CNN test accuracy convergence and, in particular, the QRNG results are not superior despite the visual appearance in Fig. 10.
At this point, the question arises whether a different bias of the RNGs might have led to better training results. To answer this question, we consider additional pseudo-random number generators B-PRNG(p), which are based on a Bernoulli process with success probability p, Eq. (1), such that the originally considered B-PRNG corresponds to B-PRNG(p(1)), Eq. (6), and the PRNG corresponds to B-PRNG(0.5). In the extreme cases of p = 0 and p = 1, B-PRNG(p) is not random anymore and produces only constant values of 0 and $2^{32} - 1$, respectively. The probability mass function of the resulting integers is given by Eq. (13). We train the CNN again on the MNIST data set with a weight initialization based on B-PRNG(p) for different values of p ∈ [0, 1] and consider the test accuracy at epoch 14.
The results are shown in Fig. 11. Clearly, the mean test accuracy attains a maximum at p = 0.5, which corresponds to an unbiased pseudo-random number generator (i.e., the PRNG). For smaller and larger success probabilities, the mean test accuracy decreases. In particular, we observe a steep drop in performance for p < 0.2 and p > 0.95, which indicates that a bias of the random number generator towards 0 has more severe effects than a bias towards 1. The worst performance is achieved for p = 0 and p = 1, respectively.
We recall that for p = 0.5, weights are sampled uniformly around zero, Eq. (18). Thus, for p > 0.5, the weights are more probable to be positive, whereas for p < 0.5, they are more probable to be negative, cf. Eq. (14). Since our CNN contains ReLU activation functions, a shift of the weights towards negative values leads to vanishing gradients. According to our experiments, this seems to become significant for p < 0.2. On the other hand, an equivalent shift towards positive values does not drastically decrease the training performance and even for p = 0.95 the test accuracy is above 98.6 %. However, for p = 1 the test accuracy also drops. We think that the reason for this behavior is that the weights are in this case constant and attain the maximum value of the distribution, Eq. (18). The resulting lack of diversity, which is for example evident from the entropy, Eq. (15), is probably the cause for the bad training performance (Frankle and Carbin, 2019).
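The vanishing gradients for negatively shifted weights can be illustrated with a single ReLU neuron. The sketch assumes non-negative inputs (such as pixel intensities or the outputs of a preceding ReLU layer) and omits the bias term for simplicity; the function name `relu_neuron` is hypothetical.

```python
def relu_neuron(weights, inputs):
    """Output of a bias-free ReLU neuron and the gradient of the
    output with respect to each weight."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs))
    if pre_activation > 0.0:
        # Active neuron: the gradient w.r.t. weight i is input i.
        return pre_activation, list(inputs)
    # Inactive neuron: output and all weight gradients vanish.
    return 0.0, [0.0] * len(inputs)
```

If all weights are negative and all inputs are non-negative, the pre-activation cannot become positive, so that the neuron remains inactive and receives no gradient signal.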

RNN
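In the second experiment, we consider a recurrent LSTM cell with a uniform initialization in analogy to Eq. (18), which we apply on the synthetic adding and memory standard benchmarks (Hochreiter and Schmidhuber, 1997) with T = 64 for the memory problem. For this purpose, we use RMSprop (Hinton, 2012) with a step size of $10^{-3}$ to optimize LSTM cells (Hochreiter and Schmidhuber, 1997) with a state size of 256. For each problem, a total of $9 \times 10^5$ updates with training batches of size 128 is computed until the training stops. In total, there are $\lfloor 9 \times 10^5 / 128 \rfloor = 7031$ training steps. Since the synthetic data sets are infinitely large, overfitting is not an issue and we can consequently use the training loss as performance metric. Specifically, we consider 89 consecutive training steps as one epoch, which leads to d ≡ 7031/89 = 79 epochs in total, each associated with the mean loss of the corresponding training steps.
The results are shown in Fig. 12, where we present the loss for each of the 79 epochs over 31 independent training runs for both problems. Again, we compare the results using random numbers from the four RNGs from section 4.1. The use of a biased RNG effectively realizes a non-uniform initialization (depending on the bias) in comparison with the uniform initialization from a non-biased RNG. However, we find that no RNG yields a major difference in performance.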
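The grouping of training steps into epochs for the RNN experiments can be sketched as follows, assuming $9 \times 10^5$ updates, a batch size of 128, and 89 training steps per epoch. The function name `epoch_means` is an assumption made for this example.

```python
def epoch_means(step_losses, steps_per_epoch=89):
    """Average the per-step training losses over consecutive blocks of
    `steps_per_epoch` steps; an incomplete final block is dropped."""
    n_epochs = len(step_losses) // steps_per_epoch
    return [
        sum(step_losses[e * steps_per_epoch:(e + 1) * steps_per_epoch])
        / steps_per_epoch
        for e in range(n_epochs)
    ]


# Number of training steps and resulting number of epochs.
n_steps = 9 * 10 ** 5 // 128
n_epochs = n_steps // 89
```

With these numbers, the training steps divide evenly into epochs, so that no step is discarded.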
In analogy to the first experiment, we list the similarity and correlation metrics in Tab. 3 in the rows marked with "RNN-M" and "RNN-A", respectively. Again, we find a high mutual similarity (Eq. (19) holds true) and correlation. Thus, the choice of RNG also has no statistically significant effect in this second experiment. Due to the numerical effort required to train the RNNs, we cannot perform an analysis of different biases of RNGs as in the first experiment.

Conclusions
Summarized, by running a naively designed quantum random number generator on a quantum gate computer, we have generated a random bit string. Its statistical analysis has revealed a significant bias and mutual dependencies as imposed by the quantum hardware. When converted into a sequence of integers, we have found a specially shaped distribution of values with a rich pattern. We have utilized these integers as hardware-biased quantum random numbers (B-QRNG). Motivated by the results from Bird et al. (2020), we have deliberately chosen to use these biased and correlated random numbers to study their impact on machine learning algorithms. Specifically, we have studied their effect on the initialization of artificial neural network weights in two experiments. For comparison, we have additionally considered unbiased random numbers from another quantum random number generator (QRNG) and a classical pseudo-random number generator (PRNG) as well as random numbers from a classical pseudo-random number generator replicating the hardware bias (B-PRNG). The two experiments consider a CNN and an RNN, respectively, and show no statistically significant influence of the choice of RNG.
Despite a similar setup, we have not been able to replicate the observation from Bird et al. (2020), where it is stated that quantum random number generators and pseudo-random number generators "do inexplicably produce different results to one another when employed in machine learning." However, we have not explicitly attempted to replicate the numerical experiments from the aforementioned work, but have instead considered two different examples that we consider typical applications of neural networks in machine learning.
Since our results are only exemplary, it may indeed be possible that there is an advantage in the usage of biased quantum random numbers for certain applications. Based on our studies, we expect, however, that in such cases it will in fact not be the "true randomness" of the quantum random numbers that will cause an effect, but rather the opposite: their hardware-induced bias, including possible correlations. But is quantum hardware really necessary to produce such results? It seems that classical pseudo-random number generators are also able to mimic these effects. This holds all the more because the reliability and security of PRNGs can be ensured with less effort and greater confidence than that of gate-based QRNGs on NISQ devices. Therefore, we think that for typical machine learning applications the usage of (high-quality) pseudo-random numbers is sufficient. Accordingly, a more elaborate experimental or theoretical study of the effects of biased pseudo-random numbers (with particular patterns) on certain machine learning applications could be a suitable research topic, e.g., to better understand the claims from Bird et al. (2020).
Repeatability is generally difficult to achieve for numerical calculations involving random numbers (Crane, 2018). In particular, our B-QRNG can in principle not be forced to reproduce a specific random sequence (as opposed to PRNGs). Furthermore, the statistics of the generated quantum random numbers may depend on the specific configuration of the quantum hardware at the time of operation. It might therefore be possible that a repetition of the numerical experiments with quantum random numbers obtained at a different time or from a different quantum hardware may lead to significantly different results. To ensure the greatest possible transparency, the source code for our experiments is publicly available online (Wolter, 2021) and may serve as a point of origin for further studies.

Figure 2: Main components of our B-QRNG setup: (a) topology diagram of the backend and (b) circuit diagram.

Figure 3: Bit string composition from our B-QRNG. A single job is submitted to the backend, which consists of 564 experiments. In each experiment, 8192 shots are performed. In each shot, each of the 65 qubits yields a single bit. The resulting bit string consequently contains 300 318 720 bits.

Figure 5: Results of Wald-Wolfowitz runs test on the bit strings of all qubits, where $p^r_n$ denotes the resulting p-value of the bit string of the nth qubit $b_n$, Eq. (2). We show p-values in different colors depending on whether or not they exceed α = 0.05. In case of $p^r_n \leq \alpha$, the corresponding hardware indices are additionally denoted on top of the plot and indicate the qubits that fail the test of randomness.

Figure 6: Expected value $\hat{I}(p)$, Eq. (14), and entropy $S_I(p)$, Eq. (15), for a random integer from the domain {0, . . ., ξ} resulting from a string of C random bits from a Bernoulli process with success probability p, Eq. (1). The expected value is proportional to p, whereas the entropy attains its maximum value at p = 0.5. We apply rescaling factors to constrain both quantities to the same scale.

Figure 7: Measured distribution of 32-bit integers from the B-QRNG. The values from the generated vector of random integers I, Eq. (9), are rescaled by a division by $(2^{32} - 1)$ and sorted into 250 equally sized bins. The kth bin (with k ∈ {1, . . ., 250}) has a population of $c_k$ according to Eq. (11). For comparison, the corresponding theoretical population of the kth bin $\hat{c}_k(p)$ is shown, which is obtained from a Bernoulli process according to Eq. (16) with a success probability of p = p(1) = 1 − p(0), Eq. (6). The minor deviations between the two populations result from the finite number of measured samples as well as the observation that bits from different qubits have their own success probability, cf. Fig. 4. An outline of the uniform bin population is shown as a frame of reference.

Figure 9: Distribution of 32-bit integers from the QRNG in analogy to Fig. 7. The values from the vector of random integers I′ are rescaled by a division by $(2^{32} - 1)$ and sorted into 250 equally sized bins. The population of the kth bin (with k ∈ {1, . . ., 250}) is denoted by $c'_k$. For comparison, we also show the corresponding population $c_k$, Eq. (11), from the B-QRNG and an outline of the uniform bin population.

Figure 11: CNN test accuracy on the MNIST data at epoch 14 using different pseudo-random number generators B-PRNG(p), which are based on a Bernoulli process with success probability p, Eq. (1). We consider p ∈ {0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1}. Shown are mean values over 30 runs with the respective standard deviations (one sigma) as error bars for (a) the full bias range and (b) a zoom on the peak of the accuracy at p = 0.5. For comparison, we also plot the corresponding results from Fig. 10 for the B-PRNG with p = p(1), Eq. (6), as well as for the PRNG with p = 0.5.

Figure 12: RNN convergence on two benchmark data sets using four different RNGs (B-QRNG, QRNG, PRNG, and B-PRNG from section 4.1). Shown are mean values over all runs with the respective standard deviations (one sigma) in analogy to Fig. 10. The inset plot zooms in on the means of the final epochs.

Table 1 :
Overview of the four considered RNGs presented in section 4.1, which are either based on a classical pseudo-random number generator or a quantum experiment (as indicated by the rows) and yield either unbiased or biased outcomes (as indicated by the columns).

Table 2 :
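Summary of the results from the NIST Statistical Test Suite for the validation of random number generators (NIST, 2010) applied to the whole sequence of binary random numbers from the B-QRNG and the QRNG, respectively. For each of the two bit strings, a series of statistical tests is evaluated, where each test consists of one or more sub-tests with the null hypothesis that the sequence being tested is random. The bit strings are segmented into sequences and each sub-test is run on each of these sequences. A sub-test is accepted or rejected based on a certain proportion of sequences that satisfy the null hypothesis. A rejection therefore indicates non-randomness. We choose 100 bitstreams in the program options such that each sequence contains at least $10^6$ bits, as recommended. The required number of passed sequences of the total number of sequences for the acceptance of a sub-test is 96 of 100 for all tests except for the Random excursions tests, for which it is 68 of 72 due to a reduced number of effectively used sequences. The predefined standard parameters are used for all tests (e.g., significance level α = 0.01). We list for each statistical test the corresponding number of accepted ("✓") and rejected ("✗") sub-tests. The total number of acceptances and rejections are shown at the bottom in bold. In addition, the column "passed" contains information about the number of passed sequences. For tests with only one or two sub-tests, we explicitly list the number of passed sequences. Otherwise, we present the respective means and standard deviations. A detailed description of the software and its statistical tests of randomness can be found in Rukhin et al. (2010). Summarized, the QRNG data passes all tests of randomness, whereas most of the tests fail for the B-QRNG data, which indicates non-randomness. Note that the Random excursions tests are not applicable for the B-QRNG data since they require the acceptance of the rejected Frequency test (Sýs et al., 2015).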

Table 3 :
Smallest p-value of Welch's t-test over all epochs $p^t_{\min}(x, y)$, Eq. (19), and Pearson correlation coefficient ρ(x, y), Eq. (20), of the experimental data. The metrics are listed for all mutual combinations of the results from the four RNGs (B-QRNG, QRNG, PRNG, and B-PRNG from section 4.1) of all experiments (CNN, RNN-M, and RNN-A from sections 4.2 and 4.3, respectively).