1 Introduction

The intrinsic non-deterministic nature of quantum mechanics (Kofler & Zeilinger, 2010) makes random number generation a native application of quantum computers. Bird et al. (2020) have exemplarily studied how such quantum random numbers can affect stochastic machine learning algorithms. For this purpose, electron-based superposition states have been prepared and measured on quantum hardware to create random 32-bit integers. These numbers have subsequently been used to initialize the weights in neural network models and to determine random splits in decision trees and random forests. Bird et al. have observed that quantum random numbers can lead to superior results for certain numerical experiments in comparison with classically generated pseudo-random numbers.

However, the authors have not further explained this behavior. In particular, they have not discussed the statistical properties of the generated quantum numbers. Due to technical imperfections and physical phenomena like decoherence and dissipation, measurement results from a quantum computer might in fact significantly deviate from idealized theoretical predictions (Tamura & Shikano, 2020; Shikano et al., 2020; Tamura & Shikano, 2021). This raises the question of whether it is not the superior ability of the quantum random number generator to sample perfectly randomly from the uniform distribution that leads to the observed effect, but instead its ability to sample bit strings from a very particular distribution that is imposed by the quantum hardware.

We therefore revisit this topic in the present manuscript and generate biased random numbers using real quantum hardware, where the specifics of the bias are determined by the natural imperfections of the hardware itself. The bias is therefore not under our control and even beyond our full understanding. With this approach, we aim to better comprehend the effects observed by Bird et al. for an analogous setup and explore the resulting implications. Summarized, our main goal is to further study the results of that work and to analyze the effects of quantum and classical random numbers with and without biases on neural network initialization. Our analysis is mainly based on numerical experiments and statistical tests.

The structure of the remaining paper is as follows. In Sect. 2, we briefly summarize the background of the main ingredients of our work, namely quantum computing and random number generation. Subsequently, we present the setup of our quantum random number generator and discuss the statistics of its results in Sect. 3. In Sect. 4, we study the effects of the generated quantum random numbers on artificial neural network weight initialization using numerical experiments. Finally, we close with a conclusion.

2 Background

In the following, we provide a brief introduction to quantum computing and random number generation without claiming to be exhaustive. For more in-depth explanations, we refer to the cited literature.

2.1 Quantum computing

Quantum mechanics is a physical theory that describes objects at the scale of atoms and subatomic particles, e. g., electrons and photons (Norsen, 2017). An important interdisciplinary subfield is quantum information science, which considers the interplay of information science with quantum effects and includes the research direction of quantum computing (Nielsen & Chuang, 2011).

2.1.1 Quantum devices

A quantum computer is a processor which utilizes quantum mechanical phenomena to process information (Benioff, 1980; Grumbling & Horowitz, 2019). Theoretical studies show that quantum computers are able to solve certain computational problems significantly faster than classical computers, for example, in the fields of cryptography (Pirandola et al., 2020) and quantum simulations (Georgescu et al., 2014). Recently, different hardware solutions for quantum computers have been realized and are being steadily improved. For example, superconducting devices (Huang et al., 2020) and ion traps (Bruzewicz et al., 2019) have been successfully used to perform quantum computations. However, various technical challenges are still unresolved, so that the current state of technology, which is subject to substantial limitations, is also referred to as noisy intermediate-scale quantum (NISQ) computing (Preskill, 2018). Nevertheless, quantum supremacy on NISQ devices has already been verified experimentally for a specialized task of randomized sampling (Boixo et al., 2018; Wu et al., 2021).

There are different theoretical models to describe quantum computers, typically used for specific hardware or in different contexts. We only consider the quantum circuit model, in which a computation is considered as a sequence of quantum gates and the quantum computer can consequently be seen as a quantum circuit (Nielsen & Chuang, 2011). In contrast to a classical computer, which operates on electronic bits with a well-defined binary state of either 0 or 1, a quantum circuit works with qubits. A qubit is described by a quantum mechanical state, which can represent a binary 0 or 1 in analogy to a classical bit. In addition, however, it can also represent any superposition of these two values. Such a quantum superposition is a fundamental principle of quantum mechanics and cannot be explained with classical physical models. Moreover, two or more qubits can be entangled with each other. Entanglement is also a fundamental principle of quantum mechanics and leads to non-classical correlations (Bell & Aspect, 2004).

In order to illustrate the aforementioned fundamental quantum principles and to connect them with well-known notions from the field of machine learning, one can consider the following intuitive (but physically inaccurate) simplifications: Superposition states can be understood as probability distributions over a finite state space, while entanglement amounts to high-order dependencies between univariate random variables. This intuition particularly emphasizes the close relationship between quantum mechanics and probability theory.

Any quantum computation can be considered as a three-step process, which is sketched in Fig. 1. First, an initial quantum state of the qubits is prepared, usually a low-energy ground state. Second, a sequence of quantum gates deterministically transforms the initial state into a final quantum state. Third, a measurement is performed on the qubits to determine an outcome. When a qubit is measured, the result of the measurement is always either 0 or 1, but the observation is non-deterministic with a probability depending on the quantum state of the qubit at the time of the measurement.

Fig. 1

Sketch of the three-step quantum computation process consisting of an initial state preparation, a sequence of gate operations and a final measurement, which yields the result of the computation. Also shown are the errors associated with each step in the computation process: the state preparation errors, the gate errors, and the measurement errors, respectively. They are all hardware-related errors, which can in principle be reduced (or even eliminated) by technological advances. These errors can cause a hardware-related uncertainty (statistical and systematic) of the computation result. On the other hand, the intrinsic randomness of quantum mechanics emerging at the time of the measurement causes an intrinsic uncertainty of the computation result, which is an integral part of quantum computing and can be exploited to construct QRNGs

In this sense, a quantum computation includes an intrinsic element of randomness. This randomness is in particular not a consequence of a lack of knowledge about the quantum system, but an integral part of quantum mechanics itself. In contrast to classical mechanics, where complete knowledge about the initial state of a system allows one to infer all later (and earlier) states, complete knowledge about a quantum mechanical state does not generally allow the prediction of a single measurement outcome, but only its probability as determined by Born’s rule (Norsen, 2017). The non-deterministic nature of quantum mechanics relies on the assumption that there are no so-called hidden variables whose knowledge would lead to a deterministic behavior (Norsen, 2017). Various theoretical and experimental results, for example based on Bell’s theorem (Bell & Aspect, 2004) or the Kochen-Specker theorem (Kochen & Specker, 1975), strongly suggest that there are no such hidden variables. However, a conclusive answer to the question of quantum non-determinism is still a matter of scientific discourse. For a more detailed discussion of this topic, we refer to Bera et al. (2017) and references therein. Since our work concerns the practical application of random numbers in machine learning algorithms and a theoretical provability of their randomness from first principles is beyond the scope of this paper, we presume in the following that quantum mechanics is indeed intrinsically non-deterministic for all purposes considered.

NISQ devices, as their name suggests, are typically only capable of computing noisy results. A fundamental reason is that the quantum computer, despite all technical efforts, is not perfectly isolated and interacts (weakly) with its environment. In particular, there are two major effects of the environment that can contribute to computational errors, namely dissipation and decoherence in the sense of dephasing (Zurek, 2007; Vacchini, 2016). Dissipation describes the decay of qubit states of higher energy due to an energy exchange with the environment. Decoherence, on the other hand, represents a loss of quantum superpositions as a consequence of environmental interactions. Typically, decoherence dominates over dissipation. Beyond these typical effects, other (possibly unknown) influences can occur, which can lead to additional uncertainties.

To compensate for the resulting computational errors to a certain extent, error correction can be used (Roffe, 2019). However, it is generally not possible to completely eliminate statistical (also called aleatoric) or systematic (also called epistemic) uncertainties, which might originate from quantum and classical effects, respectively. Therefore, quantum algorithms must be designed to be sufficiently robust for practical applications on NISQ hardware.

In Fig. 1, we briefly outline different error sources in the quantum computation process. Specifically, each computation step is affected by certain hardware-related errors, which are referred to as state preparation errors, gate errors, and measurement errors, respectively (Nachman & Geller, 2021). All of them are a consequence of the imperfect physical hardware and they are non-negligible for NISQ devices (Leymann & Barzen, 2020). The resulting hardware-related uncertainty might be both statistical and systematic. In addition, the final measurement step is also affected by the intrinsic randomness of quantum mechanics. The measurement ultimately yields a computation result that contains two layers of uncertainty (Heese & Freyberger, 2014): First, the uncertainty caused by the hardware-related errors, and second, the uncertainty caused by the intrinsic randomness. While technological advances (like better hardware and improved algorithm design) can in principle reduce (or even eliminate) hardware-related errors and thus the hardware-related uncertainty, the intrinsic uncertainty is an integral part of quantum computing. It is this intrinsic uncertainty which can be exploited to construct QRNGs.

2.1.2 Quantum machine learning

In a machine learning context, we may identify a quantum circuit with a parameterizable probability distribution over all possible measurement outcomes, where each measurement of the circuit draws a sample from this distribution. The interface between quantum mechanics and machine learning can be attributed to the field of quantum machine learning (Biamonte et al., 2017). A typical use case is the processing of classical data using algorithms that are fully or partially computed with quantum circuits, which is also called quantum-enhanced machine learning (Dunjko et al., 2016).

The noisy nature of NISQ devices presents a challenge for machine learning applications. On the other hand, the probabilistic nature of quantum computing can be related to the statistical background of machine learning algorithms, for which the understanding and modeling of uncertainty is crucial. A review about different types of uncertainty in machine learning and how to typically deal with them can for example be found in Hüllermeier and Waegeman (2021).

2.2 Random number generation

For many machine learning methods, random numbers are a crucial ingredient, and random number generators (RNGs) are therefore an important tool. Examples include sampling from generative models like generative adversarial networks, variational autoencoders or Markov random fields, parameter estimation via stochastic optimization methods, randomized regularization and validation techniques, random splits for cross-validation, the drawing of random mini-batches, and the computation of stochastic gradients, to name a few. Randomness also plays an important role in non-deterministic optimization algorithms and in the initialization of (trainable) neural network parameters (Glorot & Bengio, 2010; He et al., 2015).

At its core, an RNG performs random coin tosses in the sense that it samples from a uniform distribution over a binary state space (or, more generally, a discrete state space of arbitrary size). Given a sequence of randomly generated bits, corresponding integer or floating-point values can be constructed straightforwardly.
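As a simple illustration, the following Python sketch shows how such a bit stream can be grouped into 32-bit words and converted to integers and floating-point values; the NumPy generator is merely a stand-in for an arbitrary bit source and is not part of any RNG discussed in this work.

```python
# Sketch: convert a stream of random bits into 32-bit integers and floats.
import numpy as np

rng = np.random.default_rng(seed=0)      # stand-in for an arbitrary bit source
bits = rng.integers(0, 2, size=4 * 32)   # a stream of 128 random bits

# Group the stream into words of 32 bits, read each word as an unsigned
# integer (least significant bit first), and rescale to floats in [0, 1].
words = bits.reshape(-1, 32)
integers = (words * 2 ** np.arange(32, dtype=np.int64)).sum(axis=1)
floats = integers / (2.0 ** 32 - 1.0)
print(integers, floats)
```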

2.2.1 Classical RNGs

In the classical world, there are two main types of random number generators. Pseudo-random number generators (PRNGs) represent a class of algorithms to generate a sequence of apparently random (but in fact deterministic) numbers from a given seed (James & Moneta, 2020). In other words, the seed fully determines the order of the bits in the generated sequence, but the statistical properties of the sequence (e. g., mean and variance) are independent of the seed (as determined by the underlying algorithm). We remark that PRNGs can also be constructed based on machine learning algorithms (Pasqualini & Parton, 2020).

The more advanced true random number generators (TRNGs) are hardware devices that receive a signal from a complex physical process, which is unpredictable for all practical purposes, to extract random numbers (Yu et al., 2019). A multitude of physical effects can be used as sources of entropy for TRNGs, with only some of them directly linked to quantum phenomena. For example, metastability in latches can be exploited in specialized electrical circuits (CMOS devices) to yield random bits (Tokunaga et al., 2008; Holleman et al., 2008). Usually, such setups are built to calibrate themselves to account for hardware-inherent bias effects. Multiple of these self-calibrating entropy sources can be combined to further increase the cryptographic quality (Mathew et al., 2015). Other approaches make use of ring oscillators to source randomness from timing jitter (Kim et al., 2017), or exploit random telegraph noise to produce bit streams (Puglisi et al., 2018; Brown et al., 2020).

For TRNGs, the lack of knowledge about the observed physical system induces randomness, but it cannot be guaranteed in principle that the dynamics of the underlying physical system are unpredictable (if quantum effects are not sufficiently involved). Likewise, the statistical properties of the generated random sequence are not in principle guaranteed to be constant over time since they are subject to the hidden process.

Independent of their source, random numbers have to fulfill two properties: First, they have to be truly random (i. e., the next random bit in the sequence must not be predictable from the previous bits) and second, they have to be unbiased (i. e., the statistics of the random bit sequence must correspond to the statistics of the underlying uniform distribution). In other words, they have to be secure and reliable. A “good” RNG has to produce numbers that fulfill both requirements. In practice, it is difficult to rigorously prove the quality of RNGs. For a bit sequence of finite length, there is no formal method to decide its randomness with certainty. On the other hand, an infinite bit sequence cannot be tested in finite time (Khrennikov, 2015). Therefore, statistical tests are typically used to check specific properties of RNGs with a certain confidence.

Typically, statistical tests are organized in the form of test suites (e. g., the NIST Statistical Test Suite described in Rukhin et al., 2010) to provide a comprehensive statistical screening. A predictive analysis based on machine learning methods can also be used for a quality assessment (Li et al., 2020). It remains a challenge to certify classical RNGs in terms of the aforementioned criteria (Balasch et al., 2018) to, e. g., ensure cryptographical security.

When implementing learning and related algorithms, PRNGs are typically used. Despite the broad application of randomness in machine learning, the apparent lack of research regarding the particular choice of RNGs suggests that it is usually not crucial in practice. This assumption has been experimentally verified, e. g., in Rajashekharan and Shunmuga Velayutham (2016) for differential evolution and is most certainly due to the fact that modern PRNGs seem to be sufficiently secure and reliable for most practical purposes. The influence of different seeds for a PRNG on various deep learning algorithms for computer vision has been studied empirically in Picard (2021) with the result that it is often possible to find seeds that lead to a much better or much worse performance than the average. This highlights the fact that numerical experiments with non-deterministic algorithms have to be conducted carefully to account for the variance of random numbers. However, the specific implications of varying degrees of security and reliability of RNGs on machine learning applications generally remain unresolved, i. e., it generally remains unclear whether a certain machine learning algorithm may suffer or benefit from the artifacts of an imperfect RNG. In the present work, we approach this still rather open field of research by specifically considering the randomness in artificial neural network initialization.

2.2.2 Quantum RNGs

As previously stated, quantum computers (or, more generally, quantum systems) have an intrinsic ability to produce truly random outcomes in a way that cannot be predicted or emulated by any classical device (Calude et al., 2010). Therefore, it seems natural to utilize them as a source of random numbers in the sense of a quantum random number generator (QRNG). Such QRNGs (Herrero-Collantes & Garcia-Escartin, 2017) have already been realized with different quantum systems, for example using nuclear decay (Park et al., 2020) or optical devices (Leone et al., 2020).

Summarized, the main difference between randomness from classical systems and randomness from quantum systems is that a classical system is fully deterministic and therefore all randomness can only result from a lack of knowledge about the system, whereas a quantum system is non-deterministic and therefore – even with perfect knowledge – an intrinsic randomness may be involved. In this sense, the origin of randomness is different for quantum and classical RNGs. However, it is in principle not possible to mathematically distinguish the randomness of a classical system from the randomness of a quantum system (Khrennikov, 2015).

A simple QRNG can be straightforwardly realized using a quantum circuit. For this purpose, each of its qubits has to be brought into a superposition of 0 and 1 such that both outcomes are equally probable to be measured. This operation can for example be performed by applying a single Hadamard gate on each qubit (Nielsen & Chuang, 2011). Each measurement of the circuit consequently generates a sequence of i.i.d. random bits, one for each qubit.
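For illustration, this idealized circuit can be written down in a few lines of Qiskit; the following sketch uses a noiseless simulator from the qiskit_aer package and a reduced number of five qubits (both our assumptions for this example), whereas the experiments in Sect. 3 run the analogous circuit on real IBM hardware.

```python
# Sketch of the idealized QRNG circuit: one Hadamard gate per qubit,
# followed by a measurement of all qubits on a noiseless simulator.
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

n_qubits = 5
qc = QuantumCircuit(n_qubits)
qc.h(range(n_qubits))   # equal superposition of 0 and 1 on every qubit
qc.measure_all()        # each shot yields one random bit per qubit

counts = AerSimulator().run(qc, shots=8192).result().get_counts()
print(counts)           # e.g., {'01101': 263, '11010': 241, ...}
```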

However, when computing this simple QRNG circuit on a NISQ device, it can be expected that the results will deviate from the theoretic expectations due to statistical and systematic uncertainties such that the QRNG is likely to produce biased outcomes. This means that it is in fact not guaranteed that the measurement outcomes obey the theoretically predicted probability distribution of a fair coin toss. It is not even guaranteed that the measurement outcomes are truly random in the sense that the bits are generated entirely independently. As a consequence (and based on the fact that quantum non-determinism is not ultimately resolved), it cannot be generally taken for granted that random numbers from such a QRNG are naturally “better” than random numbers from PRNGs, both with respect to security and reliability. For this reason, technically more refined solutions are necessary to realize trustworthy QRNGs on NISQ devices. Moreover, QRNGs have to be certified similar to classical RNGs. For example, to enable a theoretically secure QRNG, the Bell inequality (Pironio et al., 2010) or the Kochen-Specker theorem can be utilized (Abbott et al., 2014, 2015; Kulikov et al., 2017). For an experimental verification of random bit sequences from a QRNG, entanglement-based public tests of randomness can be used without violating the secrecy of the generated sequences (Jacak et al., 2020).

Currently, there exist various commercial and non-commercial QRNGs, which can be used to create quantum random numbers on demand, for example ANU (2021). Although there still seem to be some practical challenges (Martínez et al., 2018; Petrov et al., 2020), theoretical and technological advances in the field will most certainly lead to a steady improvement of QRNGs.

3 Biased QRNG

Motivated by the work in Bird et al. (2020), we take a different approach than usual in this manuscript. Instead of aiming for an RNG with as little bias as possible, we discuss whether the typical bias in a naively implemented, gate-based QRNG can actually be beneficial for certain machine learning applications. In other words, we consider the bias that is “naturally” imposed by the quantum hardware itself (i. e., by the hardware-related errors outlined in Fig. 1). In addition to a bias, we also accept that the randomness of the results is not necessarily guaranteed in the sense that the QRNG can (to some degree) produce correlations or predictable patterns from systematic quantum hardware errors. Since the imperfections of the quantum hardware are beyond our control (i. e., they can in particular not be switched off at will), an RNG realized in this way contains unknown and uncontrollable elements. Therefore, we have to analyze its outcomes statistically to capture the effects of these elements on the generated random numbers. In the present section, we first describe our experimental setup for such a naively implemented QRNG and subsequently discuss the statistics of the resulting “hardware-biased” quantum random numbers.

3.1 Setup

To realize a hardware-biased QRNG (B-QRNG), we utilize a physical quantum computer, which we access remotely via Qiskit (Abraham et al., 2019) using the cloud-based quantum computing service provided by IBM Quantum (IBM, 2021). With this service, users can send online requests for quantum experiments using a high-level quantum circuit model of computation, which are then executed sequentially (LaRose, 2019). The respective quantum hardware, also called backend, operates on superconducting transmon qubits.

For our application, we specifically use the ibmq_manhattan backend (version 1.11.1), which is one of the IBM Quantum Hummingbird r2 processors with \(N \equiv {65}\) qubits. A sketch of the backend topology diagram can be found in Fig. 2a. It indicates the hardware index of each qubit and the pairs of qubits that support two-qubit gate operations between them. IBM also provides an estimate for the relaxation time \(T_1\) and the dephasing time \(T_2\) for each qubit at the time of operation. The mean and standard deviation of these times over all qubits read \(T_1 \approx (59.11 \pm 15.25)\,\mathrm{\upmu \text {s}}\) and \(T_2 \approx (74.71 \pm 31.22)\,\mathrm{\upmu \text {s}}\), respectively.

Fig. 2

Main components of our B-QRNG setup: a topology diagram of the backend and b circuit diagram

Initially, all qubits in this backend are prepared in the ground state. Our B-QRNG circuit, which is sketched in Fig. 2b, consists of one Hadamard gate applied to each qubit such that it is brought into a balanced superposition of ground state and excited state. A subsequent measurement on each qubit should therefore ideally (i. e., in the error-free case) reveal an outcome of either 0 (corresponding to the ground state) or 1 (corresponding to the excited state) with equal probability. However, since we run the circuit on real quantum hardware, we can expect to obtain random numbers which deviate from these idealized outcomes due to hardware-related errors. An analogous setup with a different backend is considered in Tamura and Shikano (2020); Shikano et al. (2020); Tamura and Shikano (2021).

We sort the qubit measurements according to their respective hardware index in an ascending order so that each run of the backend yields a well-defined bit string of length N. Such a single run is called a shot in Qiskit. We perform sequences of \(S \equiv {8192}\) shots (which is the upper limit according to the backend access restrictions imposed by IBM) for which we concatenate the resulting bit strings in the order in which they are executed. Such a sequence of shots is called experiment in Qiskit. We repeat this experiment \(R \equiv {564}\) times (900 experiments is the upper limit set by IBM) and again concatenate the resulting bit strings in the order of execution. A sequence of experiments is denoted as a job in Qiskit and can be submitted directly to the backend. It is run in one pass without interruption from other jobs.

Our submitted job ran from March 5, 2021 10:45 AM GMT to March 5, 2021 11:58 AM GMT. The final result of the job is a bit string of length \(M \equiv NSR={300318720}\) as sketched in Fig. 3. The choice of R is determined by the condition \(M \gtrapprox 3 \times 10^{8}\), which we have estimated as sufficient for our numerical experiments. We split the bit string into chunks of length \(C \equiv {32}\) to obtain \(L \equiv M/C={9384960}\) random 32-bit integers, which we use for the following machine learning experiments.

Fig. 3

Bit string composition from our B-QRNG. A single job, consisting of 564 experiments, is submitted to the backend. In each experiment, 8192 shots are performed, and in each shot, each of the 65 qubits yields a single bit. The resulting bit string consequently contains 300318720 bits

3.2 Statistics

Before we utilize our generated random numbers for learning algorithms, we first briefly discuss their statistics. The measurement results from the nth qubit can be considered as a Bernoulli random variable (Forbes et al., 2011), where \(n\in \{0,\dots ,64\}\) represents the hardware index as outlined in Fig. 2. Such a variable has a probability mass function

$$\begin{aligned} f(b;p) \equiv p^b (1-p)^{1-b} \end{aligned}$$
(1)

depending on the value of the bit \(b \in {\mathbb {B}}\) and the success probability \(p \in [0,1]\) of observing an outcome \(b=1\).

3.2.1 Bias

We denote the measured bit string from our B-QRNG as a vector \({\textbf {B}} \in {\mathbb {B}}^M\). The extracted bit string exclusively resulting from measurements of the nth qubit is given by the vector

$$\begin{aligned} {\textbf {b}}_n \equiv \left( B_{n+1}, B_{n+1+N}, \dots , B_{n+1+M-N} \right) \end{aligned}$$
(2)

with \({\textbf {b}}_n \in {\mathbb {B}}^{M/N}\). Based on its population, the corresponding expected probability \(p_n(b)\) of obtaining the bit b for the nth qubit is given by

$$\begin{aligned} p_n(b) = \frac{N\sum _{i=1}^{M/N}\mathbbm {1}_n(i,b)}{M} \end{aligned}$$
(3)

with the indicator function

$$\begin{aligned} \mathbbm {1}_n(i,b) \equiv {\left\{ \begin{array}{ll} 1 &{} \text {if}\; B_{n+(i-1)N+1} = b \\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(4)

such that \(p_n(0) + p_n(1) = 1\). From an idealized prediction of the measurement results of qubits in a balanced superposition, we would assume that all expected probabilities \(p_0(b),\dots ,p_{N-1}(b)\) correspond to the uniform probability

$$\begin{aligned} {\tilde{p}} \equiv {\tilde{p}}(b) \equiv \frac{1}{2} \end{aligned}$$
(5)

with uncertainties coming only from the finite number of samples.
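A minimal NumPy sketch of this per-qubit estimate is given below; the synthetically biased bit string is our stand-in for the measured data, which cannot be reproduced in a few lines.

```python
# Sketch: estimate the per-qubit zero-bit probabilities p_n(0), Eq. (3).
import numpy as np

N = 65                                           # number of qubits
rng = np.random.default_rng(seed=0)
B = (rng.random(N * 8192) > 0.5112).astype(int)  # toy stand-in for the data

shots = B.reshape(-1, N)            # row = shot, column n = qubit n, Eq. (2)
p0 = 1.0 - shots.mean(axis=0)       # estimates p_n(0) for n = 0, ..., 64
print(p0.mean(), p0.std())          # mean bias comparable to Eq. (6)
```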

We show the estimated probabilities in Fig. 4. It is apparent that all bit probabilities deviate significantly from their idealized value \({\tilde{p}}\), Eq. (5). In particular, we find an expected probability and standard deviation with respect to all measured bits of

$$\begin{aligned} {\bar{p}}(0) \approx {0.5112 \pm 0.0215}. \end{aligned}$$
(6)

We assume that this is a consequence of the imperfect hardware with its decoherence and dissipation effects. In particular, the fact that \({\bar{p}}(0) > {\bar{p}}(1)\) is most likely a consequence of dissipation since a bit of 0 corresponds to an observation of a qubit ground state, whereas a bit of 1 is associated with an excited state.

Fig. 4

Measured bit distribution for each qubit from the B-QRNG on ibmq_manhattan. We show the expected probability \(p_n(0)\) of obtaining a zero bit from the measured bit string for the nth qubit, Eq. (3), and (stacked on top) its complement \(p_n(1)=1-p_n(0)\). Also shown are the corresponding expected probabilities with respect to all measured bits \({{\bar{p}}}(0) \approx {0.51}\) and \({{\bar{p}}}(1)=1-{{\bar{p}}}(0) \approx {0.49}\), respectively, Eq. (6). Apparently, all bit distributions deviate differently from the uniform probability \({\tilde{p}}\), Eq. (5), which we assume to be a consequence of the imperfect hardware. The distributions with the highest (\(n=50\)) and lowest (\(n=19\)) expected probabilities of obtaining a zero bit are marked on top

From a \(\chi ^2\) test (Pearson, 1900) on the measured bit distribution, the null hypothesis of a uniform zero bit occurrence can be rejected as expected with a confidence level of 1.0000. To further quantify the deviation of the measured probabilities from a uniform distribution, we utilize the discrete Hellinger distance (Hellinger, 1909)

$$\begin{aligned} \text {H}(q_1, q_2) \equiv \frac{1}{\sqrt{2}} \sqrt{ \sum _{i \in Q} \left( \sqrt{q_1(i)} - \sqrt{q_2(i)} \right) ^2 }, \end{aligned}$$
(7)

which can be used to measure similarities between two discrete probability distributions \(q_1 \equiv q_1(i)\) and \(q_2 \equiv q_2(i)\) defined on the same probability space Q. By iterating over all qubits we find the mean and standard deviation

$$\begin{aligned} \langle \text {H}( p_n, {\tilde{p}} ) \rangle \approx {0.0133 \pm 0.0110}. \end{aligned}$$
(8)

The mean value quantifies the average deviation of the measured qubit distributions from the idealized uniform distribution and confirms our qualitative observations. The non-negligible standard deviation results from the fluctuations in-between the individual qubit outcomes.
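Both statistics are straightforward to compute; a sketch with SciPy on a synthetically biased bit string (again our stand-in for the measured data) could look as follows.

```python
# Sketch: chi-squared test of uniform bit occurrence and the mean Hellinger
# distance between the per-qubit distributions and 1/2, Eqs. (7) and (8).
import numpy as np
from scipy.stats import chisquare

def hellinger(q1, q2):
    """Discrete Hellinger distance between two distributions, Eq. (7)."""
    return np.sqrt(np.sum((np.sqrt(q1) - np.sqrt(q2)) ** 2) / 2.0)

rng = np.random.default_rng(seed=1)
B = (rng.random(65 * 8192) > 0.5112).astype(int)   # toy biased bit string

n0 = np.count_nonzero(B == 0)                      # observed zero-bit count
stat, p_value = chisquare([n0, B.size - n0])       # expected: uniform counts
print(p_value < 0.05)                              # True: reject uniformity

p0 = 1.0 - B.reshape(-1, 65).mean(axis=0)          # p_n(0) per qubit, Eq. (3)
dists = [hellinger([p, 1.0 - p], [0.5, 0.5]) for p in p0]
print(np.mean(dists), np.std(dists))               # cf. Eq. (8)
```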

3.2.2 Randomness

Although quantum events intrinsically exhibit a truly random behavior, the output of our B-QRNG is the result of a complex physical experiment behind a technically sophisticated pipeline that appears as a black box to us. It can therefore not be assumed with certainty that its outcomes are indeed statistically independent. To examine this issue in more detail, we briefly study the randomness of the resulting bit string in the following.

For this purpose, we make use of the Wald-Wolfowitz runs test (Wald & Wolfowitz, 1940), which can be used to test the null hypothesis that elements of a binary sequence are mutually independent. We perform a corresponding test on the measured bit string from the nth qubit \({\textbf {b}}_n\), Eq. (2), and denote the resulting p-value as \(p^{\text {r}}_n\). The null hypothesis has to be rejected if this probability does not exceed the significance level, which we choose as \(\alpha = {0.05}\).
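Since the runs test rests on a normal approximation of the number of runs, it can be sketched in a few lines. The following hand-rolled implementation is our own illustration (statsmodels, for example, also ships a version), not the code used for the reported results; it returns the two-sided p-value.

```python
# Sketch: Wald-Wolfowitz runs test for a binary sequence.
import numpy as np
from scipy.stats import norm

def runs_test(bits):
    """Two-sided p-value for the null hypothesis of an i.i.d. binary sequence."""
    bits = np.asarray(bits)
    n1 = np.count_nonzero(bits)
    n0 = bits.size - n1
    runs = 1 + np.count_nonzero(bits[1:] != bits[:-1])  # observed runs
    n = n0 + n1
    mean = 1.0 + 2.0 * n0 * n1 / n
    var = 2.0 * n0 * n1 * (2.0 * n0 * n1 - n) / (n ** 2 * (n - 1.0))
    z = (runs - mean) / np.sqrt(var)
    return 2.0 * norm.sf(abs(z))

rng = np.random.default_rng(seed=2)
print(runs_test(rng.integers(0, 2, size=100_000)))  # large p-value expected
print(runs_test(np.tile([0, 1], 50_000)))           # ~0: alternating pattern
```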

The test results are shown in Fig. 5. We find that the bit strings from almost all qubits pass the test and can therefore be considered random in the sense of the test criteria. However, the bit strings from five qubits fail the test, which implies non-randomness. We also perform a test on the total bit string \({\textbf {B}}\), which yields the p-value \(p^{\text {r}} \approx {0.0000} < \alpha\) such that the test also fails for the entire sequence of random numbers.

Summarized, we find that the reliability of the generated quantum random numbers is questionable. A typical binary random sequence from a PRNG of the same length as \({\textbf {B}}\) can be expected to pass the Wald-Wolfowitz runs test. However, within the scope of this work, the reason for this observation cannot be further investigated and we accept it as an integral part of our naive approach to the B-QRNG. Further work regarding the properties of our setup (applied to a different quantum hardware) can be found in Tamura and Shikano (2020); Shikano et al. (2020); Tamura and Shikano (2021), which contain similar observations. A lack of reliability is not surprising considering the fact that we have not aimed for a certified random number generation and our setup is motivated by a strongly idealized model of quantum gate computers, as already mentioned above.

Fig. 5

Results of Wald-Wolfowitz runs test on the bit strings of all qubits, where \(p^{\text {r}}_n\) denotes the resulting p-value of the bit string of the nth qubit \({\textbf {b}}_n\), Eq. (2). We show p-values in different colors depending on whether or not they exceed \(\alpha = {0.05}\). In case of \(p^{\text {r}}_n \le \alpha\), the corresponding hardware indices are additionally denoted on top of the plot and indicate the qubits that fail the test of randomness

3.2.3 Integers

Next, we analyze the resulting random 32-bit integers. To obtain these, we convert \({\textbf {B}}\) into a vector of integers \({\textbf {B}} \mapsto {\textbf {I}} \in \{0,\dots ,2^{C}-1\}^L\) by consecutively grouping its elements into bit strings of length C and converting them to non-negative integers according to

$$\begin{aligned} I_j \equiv \sum _{i=0}^{C-1} B_{C (j-1) + i + 1} 2^i \end{aligned}$$
(9)

with \(j \in \{1,\dots ,L\}\). For a bit string of Bernoulli random variables \({\textbf {B}}\) with a fair success probability \(p={\tilde{p}}\), Eqs. (1) and (5), the sequence of random integers in \({\textbf {I}}\) would be uniformly distributed. However, as we have seen before, this assumption does not hold true for the results from our B-QRNG. So the question arises as to what the distribution of random integers looks like for our unfair set of Bernoulli variables.

For this purpose, we rescale the elements of \({\textbf {I}}\) by a division by

$$\begin{aligned} \xi \equiv 2^{C}-1 \end{aligned}$$
(10)

such that \({\textbf {I}}/\xi \in [0,1]^L\) and group the range [0, 1] into \(K \equiv {250}\) equally sized bins. Thus, the population of the kth bin is given by

$$\begin{aligned} c_k \equiv \sum _{i=1}^{L} \mathbbm {1}(I_i,k) \end{aligned}$$
(11)

with the indicator function

$$\begin{aligned} \mathbbm {1}(i,k) \equiv {\left\{ \begin{array}{ll} 1 &{} \text {if}\; k< K \;\wedge \; \frac{k-1}{K} \le \frac{i}{\xi } < \frac{k}{K} \\ 1 &{} \text {if}\; k = K \;\wedge \; \frac{K-1}{K} \le \frac{i}{\xi } \\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(12)

for \(k \in \{1,\dots ,K\}\).
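A compact sketch of this conversion and binning, once more with a synthetically biased bit string as a stand-in for \({\textbf {B}}\), could read as follows.

```python
# Sketch: convert the bit string to rescaled integers, Eq. (9), and bin them
# according to Eqs. (10)-(12).
import numpy as np

rng = np.random.default_rng(seed=3)
B = (rng.random(320_000) > 0.5112).astype(np.int64)     # toy biased bit string

C, K = 32, 250
I = (B.reshape(-1, C) * 2 ** np.arange(C, dtype=np.int64)).sum(axis=1)  # Eq. (9)
xi = 2.0 ** C - 1.0                                     # rescaling, Eq. (10)
c, _ = np.histogram(I / xi, bins=K, range=(0.0, 1.0))   # populations, Eq. (11)
print(c[0], c.mean())   # the first bin tends to be over-populated, cf. Fig. 7
```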

Additionally, we consider a simplified theoretical description of the bin population by modeling the bit string as the result of a Bernoulli process with a single success probability p, Eq. (1). That is, the bits represent i.i.d. Bernoulli random variables. The integer \(j \in \{0,\dots ,\xi \}\) corresponding to a bit string \(\varvec{\tau }(j) \in {\mathbb {B}}^{C}\) is determined in analogy to Eq. (9) such that \(\sum _{i=0}^{C-1} \tau _{i+1}(j) 2^i = j\). The probability mass function of the resulting integers can consequently be written as

$$\begin{aligned} P(j,p) \equiv \prod _{i=1}^{C} p^{\tau _i(j)} (1-p)^{1-{\tau _i(j)}}. \end{aligned}$$
(13)

Its expected value is given by

$$\begin{aligned} {\hat{I}}(p) \equiv \sum _{i=0}^{\xi } i P(i,p) = \xi p \end{aligned}$$
(14)

and the information entropy (Shannon, 1948) in nats by

$$\begin{aligned} S_I(p) \equiv -\sum _{i=0}^{\xi } P(i,p) \ln P(i,p) = - C \left[ p \ln p + (1-p) \ln (1-p) \right] . \end{aligned}$$
(15)

We show a plot of Eqs. (14) and (15) in Fig. 6. Finally, the predicted (possibly non-integer) population of the kth bin reads

$$\begin{aligned} {\hat{c}}_k(p) \equiv L \sum _{i=0}^{\xi } \mathbbm {1}(i,k) P(i,p), \end{aligned}$$
(16)

which we use as our simplified model of Eq. (11).
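For a reduced word length, this model can be evaluated by direct enumeration. The following sketch uses \(C=8\) bits instead of 32 (our simplification, since enumerating all \(2^{32}\) integers is impractical) and numerically verifies Eqs. (14) and (15).

```python
# Sketch: Bernoulli model of the integer distribution, Eqs. (13)-(15),
# enumerated for a reduced word length of C = 8 bits.
import numpy as np

C, p = 8, 0.4888                                  # p close to the measured bias
j = np.arange(2 ** C)
ones = np.array([bin(x).count("1") for x in j])   # Hamming weight per integer
P = p ** ones * (1.0 - p) ** (C - ones)           # probability mass, Eq. (13)

xi = 2 ** C - 1
print(np.isclose((j * P).sum(), xi * p))          # expected value, Eq. (14)
print(-(P * np.log(P)).sum())                     # entropy in nats, Eq. (15)
```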

Fig. 6

Expected value \({\hat{I}}(p)\), Eq. (14), and entropy \(S_I(p)\), Eq. (15), for a random integer from the domain \(\{0,\dots ,\xi \}\) resulting from a string of C random bits from a Bernoulli process with success probability p, Eq. (1). The expected value is proportional to p, whereas the entropy attains its maximum value at \(p={0.5}\). We apply rescaling factors to constrain both quantities to the same scale

We show both the measured bin population \(c_k\), Eq. (11), and the theoretical bin population \({\hat{c}}_k(p)\), Eq. (16), for a success probability p corresponding to the expected probability of all measured bits \({\bar{p}}(1)=1-{{\bar{p}}}(0)\), Eq. (6), in Fig. 7. Clearly, the generated sequence of random integers is not uniformly distributed (i. e., with a population of L/K in each bin). Instead, we find a complex arrangement of spikes and valleys in the bin populations.

Specifically, since \({\bar{p}}(0) > {\bar{p}}(1)\), random integers become more probable when their binary representation contains as many zeros as possible, which is reflected in the bin populations. In particular, the first bin (containing the smallest integers) has the highest population. The minor deviations between the measured and the theoretic bin populations result from the finite number of measured samples and the simplification of the theoretical model: The success probability of each bit from the B-QRNG specifically depends on the qubit it is generated from as shown in Fig. 4, whereas our theoretical model only uses one success probability for all bits corresponding to \({\bar{p}}(1)\).

Fig. 7

Measured distribution of 32-bit integers from the B-QRNG. The values from the generated vector of random integers \({\textbf {I}}\), Eq. (9), are rescaled by a division by \((2^{32}-1)\) and sorted into 250 equally sized bins. The kth bin (with \(k \in \{1,\dots ,250\}\)) has a population of \(c_k\) according to Eq. (11). For comparison, the corresponding theoretic bin population of the kth bin \({\hat{c}}_k({\bar{p}}(b))\) is shown, which is obtained from a Bernoulli process according to Eq. (16) with a success probability of \(p={\bar{p}}(1)=1-{\bar{p}}(0)\), Eq. (6). The minor deviations between the two populations result from the finite number of measured samples as well as the observation that bits from different qubits have their own success probability, cf. Fig. 4. An outline of the uniform bin population is shown as a frame of reference

We recall the Hellinger distance, Eq. (7), to quantify the deviation of the distribution of integers from the uniform distribution. Specifically, we find

$$\begin{aligned} \text {H}(p_c, {\tilde{p}}_c) \approx {0.0213}, \end{aligned}$$
(17)

where we have made use of the measured integer distribution \(p_c \equiv p_c(k) \equiv c_k/L\) and the corresponding uniform distribution \({\tilde{p}}_c \equiv {\tilde{p}}_c(k) \equiv 1/K\) with \(k\in \{1,\dots ,K\}\). This metric quantifies our observations from Fig. 7.

For comparative purposes, we show additional theoretical bin populations for other success probabilities in Fig. 8. As expected, the rugged pattern of the distribution becomes sharper for lower or higher values of p and the deviation from the uniform distribution increases.

Fig. 8

Theoretical distribution of 32-bit integers in analogy to Fig. 7 for different success probabilities \(p \in \{p_1={0.3},p_2={0.4},p_3={0.5},p_4={0.6},p_5={0.7}\}\), Eq. (16). The population axis is scaled logarithmically. We also show the (rescaled) mean values \({\hat{I}}(p)/\xi\), Eq. (14), and the uniform distribution \({\tilde{p}}_c\) as used in Eq. (17). The corresponding Hellinger distances, Eq. (7), with \({\hat{p}}_c(p) \equiv {\hat{p}}_c(p;k) \equiv {\hat{c}}_k(p)/L\) and \(k\in \{1,\dots ,K\}\) read \(\text {H}({\hat{p}}_c(p_1), {\tilde{p}}_c) \approx \text {H}({\hat{p}}_c(p_5), {\tilde{p}}_c) \approx {0.3776}\), \(\text {H}({\hat{p}}_c(p_2), {\tilde{p}}_c) \approx \text {H}({\hat{p}}_c(p_4), {\tilde{p}}_c) \approx {0.1867}\), and \(\text {H}({\hat{p}}_c(p_3), {\tilde{p}}_c) \approx {0.0000}\), respectively

4 Experiments

To study the effects of quantum-based network initializations, we consider two independent experiments, which are both implemented in PyTorch (Paszke et al., 2019): First, a convolutional neural network (CNN) and second, a recurrent neural network (RNN). The choice of these experiments is motivated by the statement from Bird et al. (2020) that “neural network experiments show greatly differing patterns in learning patterns and their overall results when using PRNG and QRNG methods to generate the initial weights.”

To ensure repeatability of our experiments, PyTorch is run in deterministic mode with fixed (i. e., hard-coded) random seeds. The main hardware component is a Nvidia GeForce GTX 1080 Ti graphics card. Our Python implementation of the experiments is publicly available online (Wolter, 2021).

In the present section, we first summarize the considered RNGs. Subsequently, we present the two experiments and discuss their results.

4.1 RNGs

In total, we use four different RNGs to initialize neural network weights:

1. B-QRNG: Our hardware-biased quantum random number generator introduced in Sect. 3, from which we extract the integer sequence \({\textbf {I}}\) according to Eq. (9). The data is publicly available online (Heese et al., 2023).

2. QRNG: A bias-free quantum random number generator (ANU, 2021) based on quantum-optical hardware that performs broadband measurements of the vacuum field contained in the radio-frequency sidebands of a single-mode laser to produce a continuous stream of binary random numbers (Symul et al., 2011; Haw et al., 2015). We particularly use a publicly available pre-generated sequence of random bits from this stream (ANU, 2017), extract the first M bits and convert them into the integer sequence \({\textbf {I'}} \in \{0,\dots ,2^{C}-1\}^L\) according to Eq. (9). Based on the Hellinger distance \(\text {H}(p'_c, {\tilde{p}}_c) \approx {0.0018}\), Eq. (7), with \(p'_c \equiv p'_c(k) \equiv c'_k/L\) and \(c'_k \equiv \sum _{i=1}^{L} \mathbbm {1}(I'_i,k)\), Eq. (12), for \(k\in \{1,\dots ,K\}\), we find that \({\textbf {I'}}\) is indeed much closer to the uniform distribution than \({\textbf {I}}\), Eq. (17). We visualize the corresponding integer distribution in Fig. 9.

3. PRNG: The (presumably unbiased) native pseudo-random number generator from PyTorch.

4. B-PRNG: A “pseudo hardware-biased quantum random number generator”, which generates a bit string of i.i.d. Bernoulli random variables with a success probability p corresponding to the expected probability of all measured bits \({\bar{p}}(1)=1-{\bar{p}}(0)\), Eqs. (1) and (6), using the native pseudo-random number generator from PyTorch. The bit strings are then converted into integers according to Eq. (9), as sketched below. Their probability mass function is given by Eq. (13).
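A minimal PyTorch sketch of such a B-PRNG (with illustrative sequence length and seed) could look as follows.

```python
# Sketch: B-PRNG producing 32-bit integers from i.i.d. Bernoulli bits with
# the measured success probability of Eq. (6), drawn from PyTorch's PRNG.
import torch

torch.manual_seed(0)
p, C, L = 1.0 - 0.5112, 32, 1000                 # p = measured bias, Eq. (6)
bits = torch.bernoulli(torch.full((L, C), p))    # L x C biased random bits
weights = 2.0 ** torch.arange(C, dtype=torch.float64)
integers = (bits.double() * weights).sum(dim=1)  # conversion as in Eq. (9)
print(integers[:3])
```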

All of these RNGs, which are summarized in Tab. 1, produce 32-bit random numbers. However, the random numbers from the B-QRNG and the QRNG are taken in order (i. e., unshuffled) from the predefined sequences \({\textbf {I}}\) and \({\textbf {I'}}\), respectively, whereas the PRNG and the B-PRNG algorithmically generate random numbers on demand based on a given random seed.

Table 1 Overview of the four considered RNGs presented in Sect. 4.1, which are either based on a classical pseudo-random number generator or a quantum experiment (as indicated by the rows) and yield either unbiased or biased outcomes (as indicated by the columns)

For the sake of completeness, we also analyze the binary random numbers from the B-QRNG and the QRNG, respectively, with the NIST Statistical Test Suite for the validation of random number generators (Rukhin et al., 2010; NIST, 2010). For this purpose, the bit strings are segmented into smaller sequences and multiple statistical tests are evaluated on each sequence. Each test consists of one or more sub-tests with the null hypothesis that the sequence being tested is random. Based on the proportion of sequences for which a sub-test satisfies the null hypothesis, it is considered as passed or rejected, where a rejection indicates non-randomness. A more detailed discussion about this procedure can also be found in Sýs et al. (2015).

A summary of our results is listed in Tab. 2. It shows that the B-QRNG numbers fail the majority of the statistical tests of randomness, as expected, whereas the QRNG passes all of them.

Fig. 9

Distribution of 32-bit integers from the QRNG in analogy to Fig. 7. The values from the vector of random integers \({\textbf {I'}}\) are rescaled by a division by \((2^{32}-1)\) and sorted into 250 equally sized bins. The population of the kth bin (with \(k \in \{1,\dots ,250\}\)) is denoted by \(c'_k\). For comparison, we also show the corresponding population \(c_k\), Eq. (11), from the B-QRNG and an outline of the uniform bin population

Table 2 Summary of the results from the NIST Statistical Test Suite for the validation of random number generators (NIST, 2010) applied to the whole sequence of binary random numbers from the B-QRNG and the QRNG, respectively

4.2 CNN

In the first experiment, we consider a LeNet-5 inspired CNN with ReLU activation functions and without dropout (Lecun et al., 1998). The network weights are initialized as proposed by He et al. (2015), but we use a uniform distribution instead of a normal distribution, as is also common. This means that each weight \(w_i\) (with \(i=1,2,\dots\)) is sampled uniformly according to

$$\begin{aligned} w_i \sim {\mathcal {U}}(-h_i,h_i), \end{aligned}$$
(18)

where \(h_i>0\) is chosen such that a constant output variance can be achieved over all layers. The network biases are initialized analogously.
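To illustrate how an external integer stream can drive this initialization, consider the following sketch. The fan-in based bound corresponds to the standard He et al. choice for ReLU networks; the helper init_from_stream and the random stand-in stream are hypothetical constructions for this example, not the paper's actual implementation.

```python
# Sketch: fill a layer's weights with He-uniform values, Eq. (18), taken
# from a precomputed stream of 32-bit integers (e.g., the B-QRNG sequence).
import math
import torch

def init_from_stream(layer, stream, offset=0):
    """Initialize a linear layer's weights from an integer stream."""
    fan_in = layer.weight.shape[1]
    h = math.sqrt(6.0 / fan_in)                        # He bound for ReLU
    n = layer.weight.numel()
    u = stream[offset:offset + n] / (2.0 ** 32 - 1.0)  # rescale to [0, 1]
    with torch.no_grad():
        layer.weight.copy_((2.0 * u - 1.0).reshape(layer.weight.shape) * h)
    return offset + n                                  # position for next layer

layer = torch.nn.Linear(128, 64)
stream = torch.randint(0, 2 ** 32, (10_000,)).double()   # stand-in stream
offset = init_from_stream(layer, stream)
print(layer.weight.abs().max() <= math.sqrt(6.0 / 128))  # within [-h, h]
```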

As data we use the MNIST handwritten digit recognition problem (LeCun et al., 1998), which contains 70,000 grayscale images of handwritten digits in \({28}\times {28}\) pixel format. The digits are split into a training set of 60,000 images and a test set of 10,000 images. The network is trained using Adadelta (Zeiler, 2012) over \(d \equiv {14}\) epochs.

In Fig. 10 we show the CNN test accuracy convergence for each epoch over 31 independent training runs using the four RNGs from Sect. 4.1. The use of a biased RNG means that the He et al. initialization is actually effectively realized based on a non-uniform distribution instead of a uniform distribution. Therefore, such an approach could potentially be considered a new type of initialization strategy (depending on the bias), which is why one might expect a different training efficiency. However, the results show that the choice of RNG for the network weight initialization has no major effect on the CNN test accuracy convergence. Only a closer look reveals that the mean QRNG results seem to be slightly superior to the others in the last epochs.

Fig. 10

CNN test accuracy convergence on the MNIST data set using four different random number generators (B-QRNG, QRNG, PRNG and B-PRNG from Sect. 4.1). Shown are mean values over 31 runs with the respective standard deviations (one sigma). The inset plot zooms in on the means of the final epochs

To quantify this observation, we utilize Welch’s (unequal variances) t-test for the null hypothesis that two independent samples have identical expected values without the assumption of equal population variance (Welch, 1947). We apply this test to each pair of the four results from different RNGs, where the resulting test accuracies from all runs in a specific epoch are treated as samples. We denote the two results to be compared as \({\textbf {x}}\) and \({\textbf {y}}\), respectively, with \({\textbf {x}},{\textbf {y}} \in {\mathbb {R}}^{{31} \times d}\) for 31 runs and d epochs. Consequently, for each pair of results and each epoch \(i \in \{1,\dots ,d\}\), we obtain a two-tailed p-value \(p_i^t({\textbf {x}}, {\textbf {y}})\). The null hypothesis has to be rejected if such a p-value does not exceed the significance level, which we choose as \(\alpha = {0.05}\).

We are particularly interested whether the aforementioned hypothesis holds true for all epochs. To counteract the problem of multiple comparisons, we use the Holm-Bonferroni method (Holm, 1979) to adjust the p-values \(p_i^t({\textbf {x}}, {\textbf {y}}) \mapsto {\bar{p}}_i^t({\textbf {x}}, {\textbf {y}})\) for all \(i \in \{1,\dots ,d\}\). Summarized, if the condition

$$\begin{aligned} \min _{i, {\textbf {x}}, {\textbf {y}}} {\bar{p}}_i^t({\textbf {x}}, {\textbf {y}}) \equiv \min _{{\textbf {x}}, {\textbf {y}}} {\bar{p}}_{\min }^t({\textbf {x}}, {\textbf {y}}) \overset{!}{>}\ \alpha = {0.05} \end{aligned}$$
(19)

is fulfilled, no overall statistically significant deviation between the results from different RNGs is present.
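A sketch of this procedure with SciPy and statsmodels on synthetic accuracy arrays (toy data in place of the actual experimental results) could read as follows.

```python
# Sketch: per-epoch Welch t-tests followed by a Holm-Bonferroni correction.
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(seed=4)
runs, d = 31, 14                             # runs and epochs, cf. Sect. 4.2
x = rng.normal(0.99, 1e-3, size=(runs, d))   # toy accuracies, first RNG
y = rng.normal(0.99, 1e-3, size=(runs, d))   # toy accuracies, second RNG

p_vals = [ttest_ind(x[:, i], y[:, i], equal_var=False).pvalue for i in range(d)]
reject, p_adj, _, _ = multipletests(p_vals, alpha=0.05, method="holm")
print(p_adj.min())   # compare with alpha = 0.05, cf. Eq. (19)
```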

In addition, we also quantify the correlation of \({\textbf {x}}\) and \({\textbf {y}}\) using the Pearson correlation coefficient (Pearson, 1895)

$$\begin{aligned} \rho ({\textbf {x}},{\textbf {y}}) \equiv \frac{\sum _{i=1}^{d} {\bar{x}}'_i {\bar{y}}'_i}{\sqrt{\sum _i ({\bar{x}}'_i)^2 \sum _j ({\bar{y}}'_j)^2}} \in [-1,1] \end{aligned}$$
(20)

of the mean values over all runs, where we make use of the abbreviations \({\bar{x}}'_i \equiv x'_i - \sum _{j=1}^d x'_j/d\), \(x'_i \equiv \sum _{j=1}^{{31}} x_{ji}/{31}\), \({\bar{y}}'_i \equiv y'_i - \sum _{j=1}^d y'_j/d\), and \(y'_i \equiv \sum _{j=1}^{{31}} y_{ji}/{31}\). A coefficient of 1 implies a perfect linear correlation of the means, whereas a coefficient of 0 indicates no linear correlation.

For the results from the CNN experiment, we obtain the similarity and correlation metrics listed in Tab. 3 in the rows marked with “CNN”. Summarized, we find a high mutual similarity [Eq. (19) holds true] and almost perfect mutual correlations of the results. This means that the choice of RNG for the network weight initialization has no statistically significant effect on the CNN test accuracy convergence and, in particular, the QRNG results are not superior despite the visual appearance in Fig. 10.

Table 3 Minimum p-values from Welch’s t-test over all epochs \({\bar{p}}_{\min }^t({\textbf {x}}, {\textbf {y}})\), Eq. (19), and Pearson correlation coefficient \(\rho ({\textbf {x}}, {\textbf {y}})\), Eq. (20), of the experimental data

At this point, the question arises whether a different bias of the RNGs might have led to better training results. To answer this question, we consider additional pseudo-random number generators B-PRNG(p), which are based on a Bernoulli process with success probability p, Eq. (1), such that the originally considered B-PRNG corresponds to B-PRNG(\({\bar{p}}(1)\)), Eq. (6), and the PRNG corresponds to B-PRNG(0.5). In the extreme cases of \(p={0}\) and \(p={1}\), B-PRNG(p) is not random anymore and produces only constant values of 0 and \(2^{32}-1\), respectively. The probability mass function of the resulting integers is given by Eq. (13). We train the CNN again on the MNIST data set with a weight initialization based on B-PRNG(p) for different values of \(p \in [0,1]\) and consider the test accuracy at epoch 14.

The results are shown in Fig. 11. Clearly, the mean test accuracy attains a maximum at \(p={0.5}\), which corresponds to an unbiased pseudo-random number generator (i. e., the PRNG). For smaller and larger success probabilities, the mean test accuracy decreases. In particular, we observe a steep drop in performance for \(p<{0.2}\) and \(p>{0.95}\), which indicates that a bias of the random number generator towards 0 has more severe effects than a bias towards 1. The worst performance is achieved for \(p={0}\) and \(p={1}\), respectively.

We recall that for \(p={0.5}\), weights are sampled uniformly around zero, Eq. (18). Thus, for \(p>{0.5}\), the weights are more probable to be positive, whereas for \(p<{0.5}\), they are more probable to be negative, cf. Eq. (14). Since our CNN contains ReLU activation functions, a shift of the weights towards negative values leads to vanishing gradients. According to our experiments, this seems to become significant for \(p<{0.2}\). On the other hand, an equivalent shift towards positive values does not drastically decrease the training performance and even for \(p={0.95}\) the test accuracy is above \({98.6\,\mathrm{\%}}\). However, for \(p={1}\) the test accuracy also drops. We think that the reason for this behavior is that the weights are in this case constant and attain the maximum value of the distribution, Eq. (18). The resulting lack of diversity, which is for example evident from the entropy, Eq. (15), is probably the cause for the bad training performance (Frankle & Carbin, 2019).

Fig. 11

CNN test accuracy on the MNIST data at epoch 14 using different pseudo-random number generators B-PRNG(p), which are based on a Bernoulli process with success probability p, Eq. (1). We consider \(p \in \{{0}, {.05}, {.1}, {.2}, {.3}, {.4}, {.5}, {.6}, {.7}, {.8}, {.9}, {.95}, {1}\}\). Shown are mean values over 30 runs with the respective standard deviations (one sigma) as error bars for (a) the full bias range and (b) a zoom on the peak of the accuracy at \(p={0.5}\). For comparison, we also plot the corresponding results from Fig. 10 for the B-PRNG with \(p={\bar{p}}(1)\), Eq. (6), as well as for the PRNG with \(p={0.5}\)

4.3 RNN

In the second experiment, we consider a recurrent LSTM cell (Hochreiter & Schmidhuber, 1997) with a state size of 256 and a uniform initialization in analogy to Eq. (18), which we apply to the synthetic adding and memory standard benchmarks (Hochreiter & Schmidhuber, 1997) with \(T={64}\) for the memory problem. For the optimization, we use RMSprop (Hinton, 2012) with a step size of \(10^{-3}\) and training batches of size 128. For each problem, a total of \(9 \times 10^{5}\) training samples is processed until the training stops, which corresponds to \(\lfloor 9 \times 10^{5} / 128 \rfloor = {7031}\) training steps in total.

Since the synthetic data sets are infinitely large, overfitting is not an issue and we can consequently use the training loss as a performance metric. Specifically, we consider 89 consecutive training steps as one epoch, which leads to \(d \equiv {7031}/{89} = {79}\) epochs in total, each associated with the mean loss of the corresponding training steps.
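For reference, a training batch for the adding problem can be generated along the following lines; this sketch is our own illustrative construction, and the exact data generation in the published code (Wolter, 2021) may differ.

```python
# Sketch: one training batch for the synthetic adding problem, consisting of
# (value, marker) pairs whose target is the sum of the two marked values.
import torch

def adding_batch(batch_size=128, seq_len=64):
    values = torch.rand(batch_size, seq_len)   # uniform values in [0, 1)
    markers = torch.zeros(batch_size, seq_len)
    for b in range(batch_size):
        i, j = torch.randperm(seq_len)[:2]     # two distinct marked positions
        markers[b, i] = markers[b, j] = 1.0
    targets = (values * markers).sum(dim=1)    # sum of the marked values
    return torch.stack([values, markers], dim=-1), targets

x, y = adding_batch()
print(x.shape, y.shape)   # torch.Size([128, 64, 2]) torch.Size([128])
```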

The results are shown in Fig. 12, where we present the loss for each of the 79 epochs over 31 independent training runs for both problems. Again, we compare the results using random numbers from the four RNGs from Sect. 4.1. The use of a biased RNG effectively realizes a non-uniform initialization (depending on the bias) in comparison with the uniform initialization from a non-biased RNG. However, we find that no RNG yields a major difference in performance.

Fig. 12

RNN convergence on two benchmark data sets using four different RNGs (B-QRNG, QRNG, PRNG and B-PRNG from Sect. 4.1). Shown are mean values over 31 runs with the respective standard deviations (one sigma) in analogy to Fig. 10. The inset plot zooms in on the means of the final epochs

In analogy to the first experiment, we list the similarity and correlation metrics in Tab. 3 in the rows marked with “RNN-M” and “RNN-A”, respectively. Again, we find a high mutual similarity [Eq. (19) holds true] and correlation. Thus, the choice of RNG also has no statistically significant effect in this second experiment. Due to the numerical effort required to train the RNNs, we cannot perform an analysis of different biases of RNGs as in the first experiment.

5 Conclusions

Summarized, by running a naively designed quantum random number generator on a quantum gate computer, we have generated a random bit string. Its statistical analysis has revealed a significant bias and mutual dependencies as imposed by the quantum hardware. When converted into a sequence of integers, we have found a specially shaped distribution of values with a rich pattern. We have utilized these integers as hardware-biased quantum random numbers (B-QRNG). Motivated by the results from Bird et al. (2020), we have deliberately chosen to use these biased and correlated random numbers to study their impact on machine learning algorithms.

Specifically, we have studied their effect on the initialization of artificial neural network weights in two experiments. For comparison, we have additionally considered unbiased random numbers from another quantum random number generator (QRNG) and a classical pseudo-random number generator (PRNG) as well as random numbers from a classical pseudo-random number generator replicating the hardware bias (B-PRNG). The two experiments consider a CNN and a RNN, respectively, and show no statistically significant influence of the choice of RNG.

Despite a similar setup, we have not been able to replicate the observation from Bird et al. (2020), where it is stated that quantum random number generators and pseudo-random number generators “do inexplicably produce different results to one another when employed in machine learning.” However, we have not explicitly attempted to replicate the numerical experiments from the aforementioned work, but have instead considered two different examples that we consider typical applications of neural networks in machine learning.

Since our results are only exemplary, it may indeed be possible that there is an advantage in the usage of biased quantum random numbers for certain applications. Based on our studies, we expect, however, that in such cases it will in fact not be the “true randomness” of the quantum random numbers, but rather the opposite – their hardware-induced bias, including possible correlations – that will cause an effect. But is quantum hardware really necessary to produce such results? It seems that classical pseudo-random number generators are also able to mimic these effects. Even more, the reliability and security of PRNGs can be ensured with less effort and greater confidence than those of gate-based QRNGs on NISQ devices. Therefore, we think that for typical machine learning applications the usage of (high-quality) pseudo-random numbers is sufficient. Accordingly, a more elaborate experimental or theoretical study of the effects of biased pseudo-random numbers (with particular patterns) on certain machine learning applications could be a suitable research topic, e. g., to better understand the claims from Bird et al. (2020).

Repeatability is generally difficult to achieve for numerical calculations involving random numbers (Crane, 2018). In particular, our B-QRNG can in principle not be forced to reproduce a specific random sequence (as opposed to PRNGs). Furthermore, the statistics of the generated quantum random numbers may depend on the specific configuration of the quantum hardware at the time of operation. It might therefore be possible that a repetition of the numerical experiments with quantum random numbers obtained at a different time or from a different quantum hardware may lead to significantly different results. To ensure the greatest possible transparency, the source code for our experiments is publicly available online (Wolter, 2021) and may serve as a point of origin for further studies.