1 Introduction

In recent years, there has been substantial interest in algorithms based on “quantum linear algebra”, where quantum states are used to represent vectors of exponentially large dimension, which are manipulated by large matrices representing quantum operations (Zhao et al. 2019; Gilyén et al. 2018). A particularly fruitful domain for such methods is quantum machine learning, where quantum algorithms promise exponential or high-degree polynomial improvements on computational bottlenecks, or enhanced model expressiveness due to the ability to construct kernels using access to exponentially large Hilbert spaces (Aïmeur et al. 2006; Lloyd et al. 2013; 2014b; Schuld et al. 2014; Zhao et al. 2019; Dunjko and Briegel 2018; Schuld and Killoran 2019; Tiwari et al. 2020; Dehdashti et al. 2020; Kübler et al. 2019; Havlíček et al. 2019). Many of these methods leverage quantum algorithms for solving linear systems that trace back to the seminal result of Harrow, Hassidim and Lloyd (2009). This algorithm provides a way to generate quantum states proportional to \(A^{-1}\mathbf{x}\) in time logarithmic in the dimension of the vector \(\mathbf{x}\), and was subsequently improved to achieve exponentially better precision (Childs et al. 2017). However, as Aaronson (2015) pointed out, these quantum-assisted machine learning approaches come with several caveats relating to the sparsity and conditioning of the matrix involved and the structure of the input vector. Crucially, nearly all of these algorithms assume access to a quantum accessible database which can produce quantum states with amplitudes proportional to the classical input entries, referred to as amplitude encoding. This requirement persists even under the reasonable assumption that the loading and construction of the required databases constitute a one-off overhead and do not contribute to the computational complexity analysis of data processing.

A breakthrough came with the seminal work of Kerenidis and Prakash, who provided the first end-to-end quantum machine learning algorithm with an explicit description of a quantum accessible data structure for efficient input preparation, applied to recommendation systems (Kerenidis and Prakash 2017). The same data structure was subsequently applied to improve the performance of the quantum linear system algorithm in dense cases (Wossnig et al. 2018). The Kerenidis-Prakash (KP) model explicitly stores the amplitudes in a binary tree structure. As such, in addition to realising amplitude encoding for the row vectors of a matrix, the KP structure allows for the stronger ability of preparing classical probability distributions proportional to the squared entries of a vector – so-called \(\ell_{2}\)-sampling. From a classical perspective, intriguingly, recent results on quantum-inspired machine learning algorithms (Tang 2018a; Gilyén et al. 2018; Tang 2018b; Chia et al. 2018) have shown that assuming such a data structure capable of efficient \(\ell_{2}\)-sampling leads to equally efficient classical algorithms in the low-rank case. The KP structure allows for input preparation both in the quantum and the classical case with a logarithmic storage overhead. The entries in the KP structure, i.e. the partial sums, are fixed beforehand and stored in the data structure. As a result, specific vectors for which the correct partial sums are stored can be directly accessed as amplitude-encoded quantum states without post-selection. However, practical situations may arise where computation and re-computation of the partial sums is inefficient. For instance, it could be desirable to generate input vectors collecting entries from differing rows of a matrix stored in the data structure, as in statistical resampling techniques for cross-validation, or specific entry-wise functions may be required for different algorithms using the same input data. In such situations, a more flexible and general input query model would have significant relevance. Alternatively, some quantum machine learning algorithms operate on an input model in which data is stored as entries of density matrices and subsequently read out by exponentiating the density matrices and simulating them as Hamiltonians (Lloyd et al. 2014a; Rebentrost et al. 2014). This method requires explicit encoding and storage of data as quantum mixed states, and is restricted by a rigidity similar to that of the KP model.

2 Input Model and Amplitude Encoding

In this work, we examine the generic input model where only entry-wise access is allowed, and prove using smoothed analysis that both quantum amplitude encoding and classical \(\ell_{2}\)-sampling can be achieved with a constant number of queries in realistic machine learning applications with moderately noisy input data. In the quantum setting, entry-wise access is equivalent to having access to an oracle \(O_{\mathbf{f(x)}}\) that implements the unitary transformation,

$$ \sum\limits_{i,j} \alpha_{ij} \vert i \rangle\vert j \rangle \xrightarrow{O_{\mathbf{f(x)}}} \sum\limits_{i,j} \alpha_{ij} \vert i \rangle\vert j + f(x_i) \rangle. $$
(1)

whereas in the classical setting, it is the standard random access memory that provides the mapping \(i \rightarrow f(x_{i})\). Without committing to an ad hoc data structure, either in the form of the KP model or of density matrix access, this primitive input assumption preserves the flexibility of loading the data under arbitrary entry-wise functions. We assume in the sequel \(f(x_{i}) = x_{i}\) for simplicity. Generating the exact amplitude-encoded states (or \(\ell_{2}\)-samples) from the entry-wise oracle requires, by the Grover (or unstructured) search lower bound, \({\Omega}(\sqrt{D})\) (respectively \({\Omega}(D)\)) calls for \(\mathbf {x}\in \mathbb R^{D}\). This observation is a common critique of early quantum machine learning proposals which did not explicitly address the issue of initial state preparation, yet attempted to argue for polylogarithmic runtimes. Here we show that, in fact, for practical machine learning where the algorithm is robust against moderate input noise represented by entry-wise perturbations, quantum state preparation based on amplitude encoding has high success probability with only constant query cost. Furthermore, by an analogous argument, we show that low-rank quantum-inspired algorithms operating under the same practical assumptions do not require access to special data structures to attain polylogarithmic runtime.

For quantum amplitude encoding, our objective is to prepare the state

$$ \begin{array}{@{}rcl@{}} \vert \mathbf{x} \rangle = \|\mathbf{x}\|_{2}^{-1}\sum\limits_{i=1}^{D} x_{i} \vert i \rangle \end{array} $$
(2)

with amplitudes proportional to the entries \(x_{i}\) of \(\mathbf {x} \in \mathbb {R}^{D}\), given access to an oracle that realises the mapping in Eq. 1. Quantum random access memory (QRAM) provides one way of implementing such an operation. In this case, x is stored classically and one is able to access the corresponding memory cells in quantum superposition (Giovannetti et al. 2008). In other cases, the oracle can also be constructed if \(x_{i}\) is efficiently computable given the index i. To produce |x〉 probabilistically for any x, one can start with the state \(D^{-\frac {1}{2}}{\sum }_{i} \vert i \rangle \vert 0 \rangle \) as the query state for the operation in Eq. 1 to obtain \(D^{-\frac {1}{2}}{\sum }_{i} \vert i \rangle \vert x_{i} \rangle \). An ancillary qubit is prepared in the state |0〉 and then conditionally rotated based on the second register to obtain

$$ \begin{array}{@{}rcl@{}} D^{-\frac{1}{2}} \sum\limits_{i} \vert i \rangle\vert x_{i} \rangle\left( \sqrt{1-|x_{i}|^{2}} \vert 0 \rangle + x_{i} \vert 1 \rangle\right), \end{array} $$
(3)

where we have assumed for simplicity that x is normalised such that |xi|≤ 1. Performing a second oracle call to uncompute the registers |xi〉, and post-selecting onto |1〉 results in the desired state |x〉.

A known barrier to state preparation in the oracle setting is the probability of projecting onto the correct subspace in the final step, which is given by \(D^{-1} {\sum }_{i} |x_{i}|^{2}\). When the entries of x are of similar magnitude, |x〉 can be prepared using a constant number of queries. However, in the case where a few entries are much larger than the rest, the lower bounds on unordered search (Boyer et al. 1996) imply that preparing the corresponding state requires \({\Omega }\left (\sqrt {D}\right )\) oracle queries (Soklakov and Schack 2006). This argument extends to the case where it is only necessary to prepare any \(\vert \mathbf {x^{\prime }} \rangle \) such that \(\|\mathbf {x^{\prime }} - \mathbf {x}\|_{2}\) is sufficiently small.

However, here we make the observation that if real-valued data comes from a realistic source subject to noise, such as datasets of ocean temperatures (International Comprehensive Ocean-Atmosphere Data Set) and geostatistics (Practical Geostatistics 2000 Data Sets), and the algorithm is robust against such realistic input noise, then quantum state preparation can be done efficiently with a constant number of oracle queries. Formally, our argument is based on analysing the smoothed complexity of the amplitude encoding procedure when the input vectors are subject to small perturbations. Smoothed complexity was introduced by Spielman and Teng (2009), originally to explain the efficient performance of the simplex algorithm for linear programming in typical real-world scenarios. The same line of reasoning was subsequently applied to analyse the practical efficiency of various important algorithms in mathematical programming, machine learning, numerical analysis, discrete mathematics and combinatorial optimisation (Spielman and Teng 2001).

3 Smoothed Analysis

The key intuition here is to analyse the performance of the algorithms when the worst-case data input is subject to noise, represented by a small element-wise Gaussian perturbation. Following the convention of Spielman and Teng (2009), we state the definition of smoothed complexity and then prove that preparing amplitude-encoded states has a smoothed query complexity independent of the dimension:

Definition 1 (Smoothed Complexity, Spielman and Teng 2009)

Given an algorithm \(\mathcal {A}\) with an input domain \({\Omega }_{D}=\mathbb {R}^{D}\), the smoothed complexity of \(\mathcal {A}\) with σ-Gaussian perturbation is defined as

$$ \begin{array}{@{}rcl@{}} \text{Smoothed}^{\sigma}_{\mathcal{A}}(D)=\max_{\mathbf{x}\in [-1,1]^{D}} \mathbb{E}_{\mathbf{g}}[T_{\mathcal{A}}(\mathbf{x}+\mathbf{g})], \end{array} $$
(4)

where g is a Gaussian random vector whose entries are independent Gaussians with variance σ2, and \(T_{\mathcal {A}}\) denotes the runtime of \(\mathcal {A}\).

Furthermore, \(\mathcal {A}\) is said to have polynomial smoothed complexity if there exist positive constants \(k_{1}\), \(k_{2}\), \(D_{0}\), \(\sigma_{0}\) and c such that for all \(D \ge D_{0}\) and \(0 \le \sigma \le \sigma_{0}\), it holds that

$$ \begin{array}{@{}rcl@{}} \text{Smoothed}^{\sigma}_{\mathcal{A}}(D)\le c \sigma^{-k_{2}} D^{k_{1}}. \end{array} $$
(5)
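As an illustration of Definition 1 (not part of the original analysis), the following hedged Python sketch estimates \(\mathbb{E}_{\mathbf{g}}[T_{\mathcal{A}}(\mathbf{x}+\mathbf{g})]\) by Monte Carlo for the state-preparation routine, using the idealised query count \(T_{\mathcal{A}}(\mathbf{x}) \propto \sqrt{D}/\|\mathbf{x}\|_{2}\) derived in Theorem 1 below (constants ignored). The maximisation over \([-1,1]^{D}\) is replaced by spot-checking x = 0, which the proof identifies as the worst case; the function names are ours.

```python
import numpy as np

def runtime_amplitude_encoding(x):
    """Idealised query count ~ sqrt(D)/||x||_2 (Theorem 1, with amplitude
    amplification); multiplicative constants are ignored."""
    return np.sqrt(len(x)) / np.linalg.norm(x)

def smoothed_runtime(x, sigma, n_trials=2000, rng=None):
    """Monte Carlo estimate of E_g[T(x + g)] for g ~ N(0, sigma^2 I), cf. Eq. (4)."""
    rng = np.random.default_rng() if rng is None else rng
    perturbed = x + sigma * rng.standard_normal((n_trials, len(x)))
    return np.mean([runtime_amplitude_encoding(row) for row in perturbed])

# The worst case in [-1, 1]^D for this runtime is x = 0; compare a few candidates.
D, sigma = 4096, 0.05
rng = np.random.default_rng(1)
for label, x in [("x = 0 (worst case)", np.zeros(D)),
                 ("x uniform in [-1, 1]", rng.uniform(-1, 1, D))]:
    print(label, smoothed_runtime(x, sigma, rng=rng))
# The x = 0 estimate stays close to 1/sigma = 20 and barely moves with D.
```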

Theorem 1

Given oracle access \(O_{\mathbf{x}}\) to the entries of \(\mathbf {x}\in \mathbb {R}^{D}\), the amplitude encoding of x into |x〉 has smoothed complexity \(\mathcal {O}(1/\sigma )\).

Proof

Let \(\mathcal {A}\) be the algorithm that maps \(D^{-\frac {1}{2}} {\sum }_{i} \vert i \rangle \vert x_{i} \rangle \) into \(\vert \mathbf {x} \rangle = \|\mathbf {x}\|_{2}^{-1}{\sum }_{i=1}^{D} x_{i} \vert i \rangle \). After applying the controlled rotation and uncomputing the second register, the optimal success probability of projecting the state,

$$ \begin{array}{@{}rcl@{}} D^{-\frac{1}{2}} \sum\limits_{i} \vert i \rangle\left( \sqrt{1-|x_{i}|^{2}} \vert 0 \rangle + x_{i} \vert 1 \rangle\right) \end{array} $$
(6)

onto the desired state |x〉 is given by \(P_{\mathcal {A}}=\mathcal {O}(\frac {\|\mathbf {x}\|_{2}}{\sqrt {D}})\) with fixed-point amplitude amplification (Gilyén et al. 2018). From the definition of smoothed complexity in Eq. 4, taking the worst case over the input, we have

$$ \begin{array}{@{}rcl@{}} \text{Smoothed}^{\sigma}_{\mathcal{A}}(D)=&\max_{\mathbf{x}\in [-1,1]^{D}} \mathbb{E}_{\mathbf{g}}[T_{\mathcal{A}}(\mathbf{x}+\mathbf{g})] \\ =& \left( \min_{\mathbf{x}\in [-1,1]^{D}} \mathbb{E}_{\mathbf{g}}[P_{\mathcal{A}}(\mathbf{x}+\mathbf{g})]\right)^{-1}, \end{array} $$
(7)

where the second line follows from the fact that the expected runtime is inversely proportional to the expected success probability. Note that since \(P_{\mathcal {A}}(\mathbf {x}+\mathbf {g}) = \mathcal {O}(\frac {\|\mathbf {x}+\mathbf {g}\|_{2}}{\sqrt {D}})\) and g is a zero-mean Gaussian random vector, the minimum of the expected success probability is attained at x = 0. We thus have

$$ \begin{array}{@{}rcl@{}} \text{Smoothed}^{\sigma}_{\mathcal{A}}(D)&=&\left( \mathbb{E}_{\mathbf{g}}[P_{\mathcal{A}}(\mathbf{g})]\right)^{-1}\\ &=& D^{\frac{1}{2}} \left( \mathbb{E}[\|\mathbf{g}\|_{2}]\right)^{-1}\\ &=& \mathcal{O}\left( \frac{1}{\sigma}\right), \end{array} $$
(8)

where the last equality follows by noting that the random variable \(\|\mathbf{g}\|_{2}\) is, by definition, a (σ-scaled) chi-distributed variable with mean

$$ \begin{array}{@{}rcl@{}} \mathbb{E}[\|\mathbf{g}\|_{2}] = \sqrt{2}\sigma\frac{\Gamma((D+1)/2)}{\Gamma(D/2)}=\mathcal{O}(\sigma\sqrt{D} ). \end{array} $$
(9)

The result of Eq. 8 implies that when the input vector is subject to a certain level of noise, represented by an element-wise Gaussian perturbation, the query complexity of preparing the amplitude encoding is independent of the dimensionality.
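As a quick numerical check of Eqs. 8 and 9 (an illustration, not part of the original derivation), the following Python snippet evaluates the scaled chi mean and the resulting query count \(\sqrt{D}/\mathbb{E}[\|\mathbf{g}\|_{2}]\); it assumes SciPy is available and that constants are ignored.

```python
import numpy as np
from scipy.special import gammaln

def expected_norm(D, sigma):
    """E[||g||_2] = sqrt(2) * sigma * Gamma((D+1)/2) / Gamma(D/2), Eq. (9)."""
    return np.sqrt(2.0) * sigma * np.exp(gammaln((D + 1) / 2) - gammaln(D / 2))

sigma = 0.05
for D in [10, 1_000, 100_000]:
    # Smoothed query count from Eq. (8): sqrt(D) / E[||g||_2].
    print(D, np.sqrt(D) / expected_norm(D, sigma))
# All values are ~ 1/sigma = 20 (up to a constant approaching 1), independent of D.
```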

We have seen that quantum state preparation based on amplitude encoding has a constant runtime when the input is subject to Gaussian noise of finite variance. An analogous reasoning applies in the fully classical setting. Classically, particularly in the context of quantum-inspired ML algorithms, \(\ell_{2}\)-sampling can be achieved by simple rejection sampling: an index is chosen uniformly at random and the value xj is read (we assume \(|x_{j}| \le c\), where c is a known constant upper bound on the entries; for simplicity we take c = 1, consistent with the normalisation above). A random real is then sampled uniformly from the interval (0,1), and if this value is below |xj|2, the index j is output; otherwise the process is repeated. As desired, the acceptance probability of the element j is given by \( P_{j}^{\text {accept}} = |x_{j}|^{2}, \) which leads to the following average runtime for producing an \(\ell_{2}\)-sample from the correct distribution:

$$ \begin{array}{@{}rcl@{}} \mathbb{E}[T_{\mathcal{R}}] = \left( \frac{1}{D}\sum\limits_{j} P_{j}^{\text{accept}}\right)^{-1} = D\left( \sum\limits_{j}|x_{j}|^{2}\right)^{-1}. \end{array} $$
(10)
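A minimal Python sketch of this rejection sampler follows (assuming c = 1 as above; the function name l2_sample is ours, not from the original). Each round costs one entry-wise RAM query, and the empirical average number of rounds can be compared directly against Eq. 10.

```python
import numpy as np

def l2_sample(x, rng=None, max_rounds=1_000_000):
    """Draw one index j with probability |x_j|^2 / ||x||_2^2 by rejection sampling.

    Assumes |x_j| <= 1 (c = 1 in the text). Each round costs one RAM query;
    the expected number of rounds is D / ||x||_2^2, cf. Eq. (10).
    """
    rng = np.random.default_rng() if rng is None else rng
    D = len(x)
    for rounds in range(1, max_rounds + 1):
        j = rng.integers(D)                     # uniform index, one entry-wise query
        if rng.uniform(0.0, 1.0) < x[j] ** 2:   # accept with probability |x_j|^2
            return j, rounds
    raise RuntimeError("no sample accepted within max_rounds")

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 10_000)
samples = [l2_sample(x, rng) for _ in range(2000)]
avg_rounds = np.mean([r for _, r in samples])
print(avg_rounds, len(x) / np.sum(x ** 2))  # empirical runtime vs Eq. (10)
```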

We can perform an analogous smoothed analysis for this classical rejection sampling process. Denoting by \(\mathcal {R}\) the algorithm that performs rejection sampling, we have

$$ \begin{array}{@{}rcl@{}} \text{Smoothed}^{\sigma}_{\mathcal{R}}(D)&=&\max_{\mathbf{x}\in [-1,1]^{D}} \mathbb{E}_{\mathbf{g}}[T_{\mathcal{R}}(\mathbf{x}+\mathbf{g})] \\ &=& D \left( \mathbb{E}[\|\mathbf{g}\|_{2}]\right)^{-2}\\ &=& \mathcal{O}\left( \frac{1}{\sigma^{2}}\right). \end{array} $$
(11)

Thus the smoothed complexities of preparing a quantum amplitude encoding from QRAM queries and of classical \(\ell_{2}\)-sampling from classical RAM are \(\mathcal {O}(\frac {1}{\sigma })\) and \(\mathcal {O}(\frac {1}{\sigma ^{2}})\) respectively, given a σ2-variance Gaussian perturbation on the input. The quadratic quantum improvement in the dependency on σ comes directly from amplitude amplification.

4 Element-Wise Perturbation: A Closer Look

In practical settings, the effect of noisy element-wise perturbations on input data is well studied in the machine learning literature (Nettleton et al. 2010; Kalapanidas et al. 2003; Cesa-Bianchi et al. 2011). As a specific example, in the domain of computer vision, Dodge and Karam (2016) examined the effect of various synthetic noise effects on the performance of popular deep learning computer vision models such as Caffe, VGG16, VGG-CNN-S, and GoogLeNet. Two results relevant to this work are that pixel-wise Gaussian noise and changes in image contrast have little effect on performance. While adversarial attacks aim to find the worst-case corruption which leads to misclassification, Dodge and Karam (2016) empirically show that random perturbations of half a pixel and small shifts in the pixel values are likely to have a negligible effect. Furthermore, if the entry-wise perturbation represents a systematic shift in the data points, it has no effect on a large class of useful machine learning models. For example, kernel methods such as Gaussian processes, support vector machines, determinantal point processes and Gaussian Markov random fields, to name but a few, most commonly use stationary kernels, which are shift invariant (Genton 2001). More generally, any digital data processing with floating-point arithmetic only makes sense if the overall results of such a computation maintain their validity when the features of the input vector are perturbed below the machine precision. This is also practically reasonable, since real-world data come from measurements of finite precision.

An element-wise perturbation can be represented by an offset in the \(\infty \)-norm induced distance between the original vector x and the perturbed vector \(\mathbf {x}^{\prime }\). Assuming that the data analysis process, along with the particular instance of the input data, is robust against such small \(\infty \)-norm perturbations, we can in fact choose to work with vector entries \(x^{\prime }_{i}\) which are half-integer multiples of the base precision 𝜖, such that \(\mathbf {x^{\prime }}\) is the closest representable vector to x (as shown in Fig. 1), which satisfies \(|\mathbf {x^{\prime }} - \mathbf {x}|_{\infty }\leq \frac {\epsilon }{2}\) and whose distance from the true value of the data is less than 𝜖. This offset rounding can be implemented at the oracle access stage, or effectively realised at the controlled rotation stage, as discussed earlier. The aforementioned robustness assumption in data processing implies that the analysis of the results is insensitive to such offset rounding. In the quantum setting, the key benefit of using the offset rounding is that, since the exact representation of 0 is not included in this offset rounding convention, we have \(|x_{i}|\ge \frac {\epsilon }{2}\), and the probability of the final projection step succeeding is at least

$$ \begin{array}{@{}rcl@{}} D^{-1} \sum\limits_{i} |x_{i}|^{2} \ge \frac{\epsilon^{2}}{4}, \end{array} $$
(12)

and hence independent of D. Inverting the lower bound on the success probability gives an upper bound on the expected number of queries of \(\mathcal {O} (1/\epsilon ^{2})\). The query complexity can be improved to \(\mathcal {O}(1/\epsilon )\) using fixed-point quantum amplitude amplification based on the methods presented in Gilyén et al. (2018), which achieves the same optimal query complexity as in Yoder et al. (2014), but without introducing an additional phase that could be undesirable if multiple vectors are required to be prepared in superposition. Note that in some cases, a systematic perturbation of the data vectors, say a positive offset (+ 𝜖/2) applied to the data points, is undesirable. In these cases, one can opt for a near-white noise offset, either using a suitable pseudo-random number generator seeded by the memory location being queried, or by adding random data wi, performing |i〉|xi〉→|i〉|xi + wi〉. Critically, however, the number of oracle queries necessary to successfully prepare the state always has an upper bound that is independent of the database size.

Fig. 1 Numerical rounding conventions. In the standard rounding convention, scalar values are rounded to the nearest integer multiple of the precision 𝜖. Alternatively, we can consider an offset rounding convention, where values are rounded to the nearest half-integer multiple of 𝜖. As a result, this convention does not contain an exact representation of 0. In either scheme, the rounded value is always within \(\frac {\epsilon }{2}\) of the true value
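To make the offset rounding convention of Fig. 1 concrete, the following hedged Python sketch (the helper offset_round is ours, not from the original) rounds entries to half-integer multiples of 𝜖 and numerically checks the bounds used above, including the lower bound of Eq. 12.

```python
import numpy as np

def offset_round(x, eps):
    """Round each entry to the nearest half-integer multiple of eps,
    i.e. to the grid {..., -3eps/2, -eps/2, +eps/2, +3eps/2, ...},
    which contains no exact representation of 0."""
    return (np.floor(x / eps) + 0.5) * eps

eps = 2.0 ** -10
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 1_000_000)
x_rounded = offset_round(x, eps)

print(np.max(np.abs(x_rounded - x)) <= eps / 2)   # infinity-norm rounding error bound
print(np.min(np.abs(x_rounded)) >= eps / 2)       # no entry is rounded to 0
p_success = np.mean(x_rounded ** 2)               # D^{-1} sum_i |x_i|^2
print(p_success >= eps ** 2 / 4)                  # Eq. (12), independent of D
```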

It is worth cautioning that there exist scenarios, especially in computational learning theory (Arunachalam and de Wolf 2017), for which the robustness assumption against entry-wise data perturbation does not generally hold. For instance, if the input vector contains zeros with special meanings, such as indicating a canonical vector or representing unknown values, the model may be sensitive to shifting the zeros even by a small amount. In particular, when loading a high-dimensional data point with constant or polylogarithmic sparsity, a systematic shift on the zero entries will produce a state vector converging to a uniform superposition, hence losing the information about the original input necessary for meaningful analysis. Nevertheless, the constraint that \(|\mathbf {x^{\prime }} - \mathbf {x}|_{\infty }\leq \epsilon \), rather than requiring closeness in the \(\ell_{2}\)-norm, is still meaningful for a large class of practical machine learning tasks. We should also note that the quantum state preparation discussed here is useful for quantum algorithms which aim to improve the efficiency of conventional classical algorithms but otherwise realise the same input-output functionality, such that the quantum algorithms inherit the desired robustness property of their conventional counterparts.

5 Conclusion

In summary, we have shown that any application which is robust under small \(\infty \)-norm perturbations, as is the case in most practical machine learning, allows for efficient input preparation, both in the sense of classical \(\ell_{2}\)-sampling and of coherent amplitude-encoded state preparation. In the context of quantum machine learning, this suggests that the caveat related to state preparation raised by Aaronson (2015) can generally be overcome for a wide range of practical use-cases, due to the natural robustness assumption. In the context of quantum-inspired machine learning, this finding removes the need for special data structures that store partial sums. Hence we have provided a concrete argument for the feasibility of input preparation for both quantum and quantum-inspired machine learning algorithms under the most general and flexible entry-wise query access model, which we believe carries both conceptual merit and practical utility for the promising interplay between quantum information and machine learning.