Pseudo-dimension of quantum circuits

We characterize the expressive power of quantum circuits with the pseudo-dimension, a measure of complexity for probabilistic concept classes. We prove pseudo-dimension bounds on the output probability distributions of quantum circuits; the upper bounds are polynomial in circuit depth and number of gates. Using these bounds, we exhibit a class of circuit output states out of which at least one has exponential gate complexity of state preparation, and moreover demonstrate that quantum circuits of known polynomial size and depth are PAC-learnable.


Introduction
An important line of research in classical learning theory is characterizing the expressive power of function classes using complexity measures. Such complexity bounds can in turn be used to bound the size of training data required for learning. Among the most prominent of these is the Vapnik-Chervonenkis (VC) dimension introduced by (Vapnik and Chervonenkis 1971). Other well-known measures are the pseudo-dimension due to (Pollard 1984), the fat-shattering dimension due to (Alon et al. 1997), the Rademacher complexities (see Bartlett and Mendelson 2002), and more generally covering numbers in metric spaces.
The goal of characterizing an object's expressive power also appears in different guises throughout quantum information. A well-known example is quantum state tomography. (Aaronson 2007) related a variant of state tomography to a classical learning task whose difficulty can be bounded using the fat-shattering dimension of a particular function class related to the set of quantum states. Associated to this is a corresponding upper bound on sample complexity. (Aaronson 2007) observes that there is no analogous theorem for general quantum process tomography, but leaves as an open question whether there are restricted classes of operations that are information-efficiently learnable. We answer this question in the affirmative, demonstrating in a precise sense that quantum circuits admit a generalization of Aaronson's learning theorem.
Unitary complexity and state complexity are yet another example of how one may capture the richness of a function class that corresponds to a quantum computational process (see e.g. Aaronson 2016). For unitary complexity, the challenge is to determine, e.g., how many two-qubit unitaries (i.e. two-qubit logical gates, in a computational setting) are required to implement a certain multi-qubit unitary (i.e. a quantum circuit). For state complexity, it is to determine how many unitaries are required to produce a certain multi-qubit state. An alternative perspective, adopted in this work, is to consider the expressive power of a set of circuits with a fixed number of unitaries.
E-mail address: †caro@ma.tum.de, ‡idatta@stanford.edu. Date: February 6, 2020.
In this work we describe a new way of applying complexity measures from classical learning, specifically pseudo-dimension, to quantum information. We associate with a quantum circuit a natural probabilistic function class describing the outcome probabilities of measurements performed on the circuit output. In this way, a function class corresponding to a quantum circuit can be studied with the classical tool of pseudo-dimension. Here, we show that the pseudo-dimension of such a class can be bounded in terms of a polynomial of the circuit depth and size. We also give two applications of these bounds, one in quantum state complexity, the other in learnability of quantum circuits.
These findings are noteworthy not only because of the results themselves, but because we demonstrate the power of pseudo-dimension to gain insight into quantum computation. We hope that these tools may be applied to other problems in quantum computing in future work.
1.1. Related Work. (Aaronson 2007) showed that using the framework of PAC learning, one can introduce a variant of quantum state tomography and prove an upper bound on the required number of copies of the unknown state. This idea was developed further in (Aaronson 2018) and (Aaronson et al. 2018).
Motivated by Aaronson's work, (Cheng et al. 2016) use pseudo-dimension and fat-shattering dimension to characterize the learnability of measurements, as a dual problem to learning the state. We apply this mathematical framework to study the problem of learning the circuit itself, in particular by offering a natural function class corresponding to a quantum circuit. (Rocchetto 2018) proved that stabilizer states, prevalent in error correction, are computationally-efficiently learnable, establishing a connection between efficient classical simulability and computationally efficient learnability. This was realized experimentally for small optical systems in (Rocchetto et al. 2019). Similarly, in Sec. 5 we pose as an open problem whether there are quantum operations that can be PAC-learned with modest computation, which could then in principle be demonstrated in an experiment.
While we take a formal approach to learning quantum circuits, others have studied learning unitaries numerically, e.g. with heuristics such as gradient descent (Kiani et al. 2020). Practical machine learning algorithms have also been used for state tomography by (Torlai et al. 2018), and similar techniques could be applied to restricted classes of process tomography.
Another branch of quantum learning deals with whether quantum examples can decrease the information-theoretic complexity of learning a classical function. There are different flavors of this question, e.g. depending on whether learning is distribution-specific or distribution-independent. (Arunachalam and de Wolf 2017) gives an overview of some of these aspects of quantum learning.
In classical learning theory, bounding the complexity measures of function classes (based on complexity-theoretic assumptions) has been studied widely. (Goldberg and Jerrum 1995) derived an upper bound on the VC-dimension of a function class in terms of the runtime required by an algorithm implementing the elements of that class. (Karpinski and Macintyre 1997) established an analogous bound for the function class implemented by a neural network (for various activation functions) in terms of the number of nodes and the number of programmable parameters of the network. (Koiran 1996) demonstrated that by bounding the complexity of function classes implemented on a given architecture, one can lower-bound the size of an architecture implementing a specific "hard" function.
1.2. Overview of our Results. We consider the general scenario in which one measures the output state of a 2-local quantum circuit, generating a probability distribution. We define a function class that characterizes such probability distributions and prove an upper bound on its pseudo-dimension. By doing so, we provide insight into the complexity or "hardness" of the circuit and the output state that gives rise to the probability distribution. The first variant of our result (Theorem 3.3) applies to circuits with fixed architecture, namely those for which the positions of the 2-qudit unitaries (i.e. gates) are fixed, but the unitaries themselves may vary. The second variant (Theorem 3.7) applies to circuits with only the depth and number of gates fixed. In both cases, the upper bound is polynomial in the dimensionality $d$, the depth $\delta$, and the size $\gamma$. More precisely, the pseudo-dimension bounds are $O(d^4 \cdot \gamma \log \gamma)$ in the former scenario and $O(\delta \cdot d^4 \cdot \gamma^2 \log \gamma)$ in the latter. We show that this approach is viable whether the input state is fixed or variable, and whether the gates constituting the circuit are unitary or quantum operations.
We demonstrate how to apply these complexity upper bounds to explicitly construct, for each $n \in \mathbb{N}$, a finite-but-large set of $n$-qubit quantum states, out of which at least one cannot be implemented by a 2-local qudit circuit of subexponential depth or size. Though finite, the set of states is large, i.e. with cardinality of order $2^{2^n}$.
Analogously to (Aaronson 2007), we use our pseudo-dimension bounds to prove a relaxed variant of quantum process tomography, which following Aaronson's terminology can be called pretty-good circuit tomography:
Theorem (Informal, see 4.2). Given a circuit depth and size polynomial in the number of qudits and known in advance to the learner, polynomially-many training examples suffice to learn the unitary implemented by a 2-local quantum circuit. This result generalizes to learning a circuit whose constituent gates are quantum operations rather than unitaries.
In this framework, each training example is a three-tuple of the input state, the observed measurement outcome, and the corresponding measurement probability. Alternately, one may take each training example as a two-tuple of the input state and the measurement outcome, whose probability is the corresponding measurement probability (see Aaronson 2007, Appendix 8).
We review the basics of quantum information, quantum computation, and classical learning theory in Sec. 2. We also discuss prior classical results as motivation. Sec. 3 contains our main results on the pseudo-dimension of quantum circuits and the respective proofs. In Sec. 4, we apply these results to quantum state complexity lower bounds and to a learning problem for quantum operations. We conclude with open questions in Sec. 5.

Preliminaries
As our readership includes both physicists and computer scientists, in this section we review the mathematical frameworks of quantum information theory and learning theory. Further details appear in the reference texts (Heinosaari and Ziman 2013) and (Nielsen and Chuang 2010).
2.1. Quantum Information and Computation. The most general descriptor of a $d$-level quantum system or statistical ensemble thereof is a density matrix, an element of
$$\{\rho \in \mathbb{C}^{d\times d} \mid \rho \geq 0,\ \operatorname{tr}[\rho] = 1\}.$$
Here, $\rho \geq 0$ means that the matrix $\rho$ is Hermitian and all its eigenvalues are non-negative. An important subset of density matrices is the set of pure states, which are one-dimensional projections. Following Dirac notation, we denote the projector onto the subspace spanned by a unit vector $|\psi\rangle \in \mathbb{C}^d$ by $|\psi\rangle\langle\psi|$. By the spectral theorem, every quantum state can be written as a convex combination of pure states, though this decomposition is not unique in general.
Central to the framework of quantum mechanics is the measurement, the mechanism by which one may observe properties of a quantum system. These are typically described by so-called positive-operator valued measures (POVMs). As we focus on measurements with a finite set of outcomes $\{i\}$, it suffices to think of measurements as collections of so-called effect operators $\{E_i\}_i$ with $E_i \geq 0$ for all $i$ and $\sum_i E_i = \mathbb{1}$. The set of effect operators is
$$\{E \in \mathbb{C}^{d\times d} \mid 0 \leq E \leq \mathbb{1}\}.$$
Again, we highlight a special case: if we take an orthonormal basis $\{|\psi_i\rangle\}_{i=1}^d$ of $\mathbb{C}^d$, then the set $\{|\psi_i\rangle\langle\psi_i|\}_{i=1}^d$ is called a projective measurement. Born's rule connects measurements to measurement outcomes: given a state characterized by a density operator $\rho$, the effect operator $E_i$ has a corresponding probability $p_i = \operatorname{tr}[\rho E_i]$. Thus the requirement that the effect operators sum to the identity can be seen as probabilities summing to one. In the special case of a pure state $\rho = |\psi\rangle\langle\psi|$ and a projective measurement $\{|\psi_i\rangle\langle\psi_i|\}_{i=1}^d$, Born's rule reads $p_i = |\langle\psi_i|\psi\rangle|^2$.
So far we have described the components of static quantum theory. The dynamics of quantum states are described by so-called quantum operations, which we denote by
$$\mathcal{T}(\mathbb{C}^d) := \{T : \mathbb{C}^{d\times d} \to \mathbb{C}^{d\times d} \mid T \text{ linear, completely positive, and trace-non-increasing}\}.$$
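As a small numerical illustration of Born's rule $p_i = \operatorname{tr}[\rho E_i]$, the following sketch evaluates outcome probabilities for a single qubit. The helper names `trace_of_product` and `born_probabilities` are hypothetical and chosen only for this example; matrices are plain nested lists.

```python
def trace_of_product(a, b):
    """Return tr[a b] for two square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def born_probabilities(rho, effects):
    """Born's rule: p_i = tr[rho E_i] for every effect operator E_i.

    The trace is real for a valid state and valid effects, so only the
    real part is kept.
    """
    return [trace_of_product(rho, effect).real for effect in effects]


# Illustrative input: the pure state |+><+| measured in the computational basis.
rho_plus = [[0.5, 0.5], [0.5, 0.5]]
computational_basis = [
    [[1, 0], [0, 0]],  # |0><0|
    [[0, 0], [0, 1]],  # |1><1|
]
probabilities = born_probabilities(rho_plus, computational_basis)
print(probabilities)
```

Because the two projectors sum to the identity and $\operatorname{tr}[\rho] = 1$, the returned probabilities sum to one.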
Here, a map $T$ is completely positive if $T \otimes \mathrm{Id}_n$ is positivity-preserving for every $n \in \mathbb{N}$. If $T \in \mathcal{T}(\mathbb{C}^d)$ is trace-preserving, we call $T$ a quantum channel. An important example is the unitary quantum channel, $T(\rho) = U\rho U^*$ for some unitary $U \in \mathbb{C}^{d\times d}$.
2.2. Classical Learning Theory and Complexity Measures. Next we describe the "probably approximately correct" (PAC) model of learning, introduced and formalized by (Vapnik and Chervonenkis 1971) and (Valiant 1984). In (realizable) PAC learning for spaces $\mathcal{X}$, $\mathcal{Y}$ and a concept class $\mathcal{F} \subset \mathcal{Y}^{\mathcal{X}}$, a learning algorithm receives as input labeled training data $\{(x_i, f(x_i))\}_{i=1}^m$ for some $f \in \mathcal{F}$, where the samples $x_i$ are drawn independently according to some probability distribution $D$ on $\mathcal{X}$ that is unknown to the learner. Given the training examples, the goal of the learner is to approximate the unknown function $f$ by a hypothesis function $h$, with high probability.
We can formalize this as follows: first, we introduce a loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$ to quantify the discrepancy between the hypothesis $h$ and the function $f$. We call a concept class $\mathcal{F}$ PAC-learnable if there exists a learning algorithm $A$ such that for every $D \in \mathrm{Prob}(\mathcal{X})$, $f \in \mathcal{F}$ and $\delta, \varepsilon \in (0, 1)$, running $A$ on training data drawn according to $D$ and $f$ yields a hypothesis $h$ such that $\mathbb{E}_{x\sim D}[\ell(h(x), f(x))] \leq \varepsilon$ with probability $\geq 1 - \delta$ (with regard to the choice of training data). Moreover, we quantify the minimum amount of training data that an algorithm $A$ needs to meet the above conditions by a map $m_{\mathcal{F}} : (0, 1) \times (0, 1) \to \mathbb{N}$, $(\delta, \varepsilon) \mapsto m_{\mathcal{F}}(\delta, \varepsilon)$, the so-called sample complexity of $\mathcal{F}$.
A standard approach to assessing learnability is to characterize the complexity of the respective concept class $\mathcal{F}$. Many such complexity measures are used, the most common being the VC-dimension for binary-valued function classes $\mathcal{F} \subset \{0, 1\}^{\mathcal{X}}$, named after its progenitors (Vapnik and Chervonenkis 1971). This combinatorial parameter can be shown to fully characterize the learnability: a concept class $\mathcal{F} \subset \{0, 1\}^{\mathcal{X}}$ is PAC-learnable (w.r.t. the 0-1-loss) if and only if the VC-dimension of $\mathcal{F}$ is finite. Moreover, the sample complexity of PAC learning $\mathcal{F}$ can be expressed in terms of its VC-dimension (see Blumer et al. 1989; Hanneke 2016).
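In this work, we employ a widely-used extension of the VC-dimension to real-valued concept classes:
Definition 2.1. (Pseudo-Dimension (Pollard 1984)) Let $\mathcal{F}$ be a real-valued concept class. A set $\{x_1, \ldots, x_k\} \subset \mathcal{X}$ is pseudo-shattered by $\mathcal{F}$ if there are $y_1, \ldots, y_k \in \mathbb{R}$ such that for any $C \subseteq \{1, \ldots, k\}$ there is an $f_C \in \mathcal{F}$ such that for all $1 \leq i \leq k$:
$$i \in C \iff f_C(x_i) > y_i.$$
The pseudo-dimension of $\mathcal{F}$ is defined to be
$$\mathrm{Pdim}(\mathcal{F}) := \sup\{k \in \mathbb{N}_0 \mid \text{there is a set of size } k \text{ that is pseudo-shattered by } \mathcal{F}\}.$$
Alternatively, one can express the pseudo-dimension in terms of the VC-dimension. Namely,
$$\mathrm{Pdim}(\mathcal{F}) = \mathrm{VCdim}\big(\{\mathcal{X} \times \mathbb{R} \ni (x, y) \mapsto \mathbb{1}[f(x) - y > 0] \mid f \in \mathcal{F}\}\big).$$
There is also a scale-sensitive version of the pseudo-dimension:
Definition 2.2. (Fat-Shattering Dimension (Alon et al. 1997)) Let $\mathcal{F}$ be a real-valued concept class and let $\gamma > 0$. A set $\{x_1, \ldots, x_k\} \subset \mathcal{X}$ is $\gamma$-fat-shattered by $\mathcal{F}$ if there are $y_1, \ldots, y_k \in \mathbb{R}$ such that for any $C \subseteq \{1, \ldots, k\}$ there is an $f_C \in \mathcal{F}$ such that for all $1 \leq i \leq k$:
$$i \in C \Rightarrow f_C(x_i) \geq y_i + \gamma \quad \text{and} \quad i \notin C \Rightarrow f_C(x_i) \leq y_i - \gamma.$$
The $\gamma$-fat-shattering dimension of $\mathcal{F}$ is defined to be
$$\mathrm{fat}_{\mathcal{F}}(\gamma) := \sup\{k \in \mathbb{N}_0 \mid \text{there is a set of size } k \text{ that is } \gamma\text{-fat-shattered by } \mathcal{F}\}.$$
Note that, trivially, $\mathrm{fat}_{\mathcal{F}}(\gamma) \leq \mathrm{Pdim}(\mathcal{F})$ holds for every $\gamma > 0$ and for every real-valued function class $\mathcal{F}$.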
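To make the notion of pseudo-shattering concrete, here is a minimal brute-force sketch for finite function classes on finite domains. It assumes the convention that index $i$ lies in $C$ exactly when $f_C(x_i) > y_i$, and it searches thresholds only over a user-supplied finite grid, so in general it yields a lower bound on the pseudo-dimension. The function names are hypothetical.

```python
from itertools import combinations, product


def is_pseudo_shattered(functions, points, thresholds):
    """True if every above/below pattern on `points` is realized.

    Assumed convention: a function f marks point x_i as "in C" exactly
    when f(x_i) > y_i, where y_i is the threshold paired with x_i.
    """
    patterns = {
        tuple(f(x) > y for x, y in zip(points, thresholds))
        for f in functions
    }
    return len(patterns) == 2 ** len(points)


def pseudo_dimension(functions, domain, candidate_thresholds):
    """Largest k such that some k-subset of `domain` is pseudo-shattered,
    with thresholds restricted to the finite grid `candidate_thresholds`."""
    best = 0
    for k in range(1, len(domain) + 1):
        shattered = any(
            is_pseudo_shattered(functions, points, thresholds)
            for points in combinations(domain, k)
            for thresholds in product(candidate_thresholds, repeat=k)
        )
        if not shattered:
            break  # subsets of shattered sets are shattered, so stop here
        best = k
    return best


# Illustrative class: step functions x -> 1 if x >= a else 0 on {0, 1, 2}.
domain = [0, 1, 2]
steps = [lambda x, a=a: 1.0 if x >= a else 0.0 for a in range(4)]
print(pseudo_dimension(steps, domain, [0.5]))
```

For the step functions, every pair of points realizes only three of the four patterns, so the search stops at sets of size one.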
Sample complexity upper bounds for $[0, 1]$-valued function classes in terms of the fat-shattering dimension have been proved in (Bartlett and Long 1998; Anthony and Bartlett 2000).

Pseudo-Dimension Bounds for Quantum Circuits
We now formulate how to characterize the expressive power of quantum circuits. In particular, we consider circuits with $n$ input registers of qudits, size (i.e. number of gates) $\gamma$, and depth (i.e. number of layers) $\delta$. More precisely, we consider circuits composed of two-qudit unitaries, i.e. logical gates with two inputs. Note that two-qudit gates include one-qudit gates. We assume that gates in the same layer and acting on disjoint pairs of qudits can act in parallel.
In this section, we assign function classes to quantum circuits and then derive bounds on the pseudo-dimension of these function classes, in terms of the number of qudits and the size and depth of the circuits.First, we fix quantum circuit structure and inputs, varying only the entries of the unitary gates and thereby the resulting function.Then, we broaden our scope to variable circuit architectures, variable inputs, and circuits whose 'gates' are general quantum operations.
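An important tool that will recur throughout our work is the following result on polynomial sign assignments, used in (Goldberg and Jerrum 1995) to derive VC-dimension bounds from computational complexity. Here, a "consistent non-zero sign assignment" to a set of polynomials $\{p_1, \ldots, p_m\}$ is a vector $b \in \{\pm 1\}^m$ s.t. there exist $x_1, \ldots, x_n \in \mathbb{R}$ for which $\mathrm{sgn}(p_i(x_1, \ldots, x_n)) = b_i$ for all $1 \leq i \leq m$.
Theorem 3.1. Let $\{p_1, \ldots, p_m\}$ be a set of real polynomials in $n$ variables with $m \geq n$, each of degree at most $d \geq 1$. Then the number of consistent non-zero sign assignments to $\{p_1, \ldots, p_m\}$ is at most $\left(\frac{4edm}{n}\right)^n$.
The following implication of Theorem 3.1 for consistent but not necessarily non-zero sign assignments to sets of polynomials was observed in (Goldberg and Jerrum 1995).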
Corollary 3.2. Let $\{p_1, \ldots, p_m\}$ be a set of real polynomials in $n$ variables with $m \geq n$, each of degree at most $d \geq 1$. Then the number of consistent sign assignments to $\{p_1, \ldots, p_m\}$ is at most $\left(\frac{8edm}{n}\right)^n$.
Proof: (Sketch) This can be obtained by applying Theorem 3.1 to the set $\{p_1 + \varepsilon, p_1 - \varepsilon, \ldots, p_m + \varepsilon, p_m - \varepsilon\}$ with $\varepsilon > 0$ chosen sufficiently small.
3.1. Fixed Circuit Structure. Suppose we fix the architecture of a quantum circuit of depth $\delta$ and size $\gamma$. Specifically, we restrict our attention to 2-local quantum circuits, i.e. circuits whose logical gates have support on two qudits, not necessarily neighboring each other. (See Fig. 1.) "Fixed architecture" means that we specify the positions of the two-qudit unitaries, namely their order and which qudits they act on. Though the unitaries' positions are fixed, we may vary the entries of the unitaries themselves. Can we bound the pseudo-dimension of the function class of measurement probability distributions that this circuit generates? And how does the bound depend on $d$ (the dimensionality of the qudits), $\delta$ and $\gamma$?
To formalize this question: let $n \in \mathbb{N}$ be the number of qudits, $d \in \mathbb{N}$ be their dimensionality, and $N$ be a fixed quantum circuit architecture of depth $\delta$ and size $\gamma$ acting on $n$ qudits. We enumerate the positions of the two-qudit unitaries in $N$ by tuples $(i, j)$ with $1 \leq i \leq \delta$ denoting the layer and $1 \leq j \leq \gamma_i$ the position of the unitary among all the unitaries inside layer $i$, where w.l.o.g. we count from top to bottom and take into account only the first qudit on which a unitary acts. Note that $\sum_{i=1}^{\delta} \gamma_i = \gamma$, and trivially $\gamma_i \leq \gamma$ and $\gamma_i \leq \frac{n}{2}$. We write the unitary at position $(i, j)$ as $U^{(i,j)}$. These constitute the "free parameters" which we can vary in order to make the quantum circuit perform different tasks. The overall unitary implemented by $N$ when plugging in the unitaries $\{U^{(i,j)}\}_{1\leq i\leq\delta,\,1\leq j\leq\gamma_i}$ at the respective positions we denote by $U_{N|\{U^{(i,j)}\}}$. Note that $U_{N|\{U^{(i,j)}\}}$ strongly depends on the two-qudit unitaries that are plugged into the architecture, but sometimes we will suppress this dependence and simply write $U_N$ for notational ease. The quantum circuit $N$ now gives rise to the following set of output states:
$$\big\{U_{N|\{U^{(i,j)}\}}\,|0\rangle^{\otimes n} \;\big|\; U^{(i,j)} \text{ two-qudit unitaries}\big\}.$$
These output states in turn give rise to a function class of measurement probability distributions with regard to product measurements:
$$\mathcal{F}_N := \big\{f : X \to [0, 1],\ f(x) = \big|\langle x|\,U_{N|\{U^{(i,j)}\}}\,|0\rangle^{\otimes n}\big|^2 \;\big|\; U^{(i,j)} \text{ two-qudit unitaries}\big\},$$
where we take $X = S_d \times \ldots \times S_d$ to be the Cartesian product of $n$ unit spheres of $\mathbb{C}^d$ and write $|x\rangle = |x_1\rangle \otimes \ldots \otimes |x_n\rangle$ for $x = (x_1, \ldots, x_n) \in X$.
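The function class $\mathcal{F}_N$ can be explored numerically for small instances. The following sketch evaluates $f(x) = |\langle x|U_N|0\rangle^{\otimes n}|^2$ for a fixed architecture given as a list of qudit pairs. The data layout (nested lists, the most significant digit first) and the names `apply_two_qudit_gate` and `output_probability` are assumptions made only for this illustration.

```python
import math
from itertools import product


def apply_two_qudit_gate(state, gate, a, b, n, d):
    """Apply a (d*d) x (d*d) matrix `gate` to qudits a != b of an n-qudit state.

    Basis index convention (an assumption of this sketch): the index is the
    base-d number with digits (z_0, ..., z_{n-1}), z_0 most significant.
    """
    new_state = [0j] * len(state)
    for idx, amp in enumerate(state):
        if amp == 0:
            continue
        digits = [(idx // d ** (n - 1 - k)) % d for k in range(n)]
        col = digits[a] * d + digits[b]
        for row in range(d * d):
            out = digits.copy()
            out[a], out[b] = divmod(row, d)
            out_idx = sum(z * d ** (n - 1 - k) for k, z in enumerate(out))
            new_state[out_idx] += gate[row][col] * amp
    return new_state


def output_probability(x, positions, gates, n, d):
    """f(x) = |<x| U_N |0...0>|^2 for a product vector x = (x_1, ..., x_n)."""
    state = [0j] * d ** n
    state[0] = 1.0
    for (a, b), gate in zip(positions, gates):
        state = apply_two_qudit_gate(state, gate, a, b, n, d)
    amplitude = 0j
    for digits in product(range(d), repeat=n):
        idx = sum(z * d ** (n - 1 - k) for k, z in enumerate(digits))
        bra = 1 + 0j
        for k, z in enumerate(digits):
            bra *= x[k][z].conjugate()
        amplitude += bra * state[idx]
    return abs(amplitude) ** 2


# Illustrative architecture on n = 2 qubits: two gates on the pair (0, 1).
s = 1 / math.sqrt(2)
hadamard_on_first = [[s, 0, s, 0], [0, s, 0, s], [s, 0, -s, 0], [0, s, 0, -s]]
cnot = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
positions = [(0, 1), (0, 1)]
gates = [hadamard_on_first, cnot]
zero = (1.0, 0.0)
print(output_probability((zero, zero), positions, gates, n=2, d=2))
```

Keeping `positions` fixed while varying the entries of `gates` traces out different elements of one class $\mathcal{F}_N$; the example gates prepare a Bell state, for which the outcome $|00\rangle$ has probability close to one half.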
The main insight of this subsection is the following:
Theorem 3.3. With the notation and assumptions from above, it holds that $\mathrm{Pdim}(\mathcal{F}_N) \in O(d^4 \cdot \gamma \log \gamma)$.
To prove this result, we provide the following.
Lemma 3.4. With the notation and assumptions from above, there exists a polynomial $p_N$ in $2\gamma d^4 + 2dn$ real variables of degree $\leq 2(\gamma + n)$ such that every $f \in \mathcal{F}_N$ can be obtained from $p_N$ by fixing values for the first $2\gamma d^4$ variables. Moreover, in each term of $p_N$, the degree in the first $2\gamma d^4$ real variables is $\leq 2\gamma$ and the degree in the last $2dn$ real variables is $\leq 2n$.
Proof: We first observe that
$$f(x) = \big|\langle x|\,U_N\,|0\rangle^{\otimes n}\big|^2 = \big|\langle 0|^{\otimes n}\,U_N^{\dagger}\,|x\rangle\big|^2.$$
We study this expression in a layer-wise analysis. When reading the circuit from right to left, the state $|x\rangle$ that enters layer $\delta$ is transformed by the unitaries $\{U^{(\delta,j)\dagger}\}_{1\leq j\leq\gamma_\delta}$ such that each amplitude of the state after the $\delta$th layer is a linear combination of the amplitudes of $|x\rangle$, where each coefficient is a multilinear monomial of degree $\gamma_\delta$ in some of the $\gamma_\delta \cdot d^4$ complex entries of the $\{U^{(\delta,j)\dagger}\}_{1\leq j\leq\gamma_\delta}$. By iterating this reasoning, we see that the state after the $(\delta - i)$th layer has amplitudes which are given by a linear combination of the amplitudes of $|x\rangle$, where each coefficient is a multilinear polynomial of degree $\sum_{k=\delta-i}^{\delta}\gamma_k$ in the entries of the unitaries in layers $\delta - i, \ldots, \delta$. In particular, the $|0\rangle^{\otimes n}$-amplitude of the state $U_N^{\dagger}|x\rangle$ can be written as a linear combination of the amplitudes of $|x\rangle$, where each coefficient is given by a multilinear polynomial $q_N$ of degree $\sum_{i=1}^{\delta}\gamma_i = \gamma$. Recalling that the probability of observing outcome $|0\rangle^{\otimes n}$ is the square of the absolute value of the corresponding amplitude of $U_N^{\dagger}|x\rangle$, we obtain from the polynomial $q_N$ a polynomial $p_N = |q_N|^2$ that describes the output probabilities. As $q_N$ has degree at most $\gamma$ in the $\gamma \cdot d^4$ complex parameters of the unitaries, $p_N$ has degree at most $2\gamma$ in the corresponding $2\gamma \cdot d^4$ real parameters. Fixing these $2\gamma d^4$ parameters corresponds to fixing the circuit, and therefore one may obtain every $f \in \mathcal{F}_N$ by fixing these parameters in $p_N$.
Moreover, $p_N$ is a polynomial in the $2dn$ real parameters which give rise to the amplitudes of $|x\rangle$. (Here, the assumption that $|x\rangle$ is a product state enters.) As each such amplitude has degree $\leq n$ in the $dn$ complex parameters, the degree of $p_N$ in the $2dn$ real parameters is at most $2n$.
Remark 3.5. We formulate the result only for measurement operators consisting of tensor products of 1-dimensional projections, and continue to do so throughout this manuscript. For such product measurements we write $|x\rangle = \bigotimes_{i=1}^{n}|x_i\rangle$ with $|x_i\rangle = \sum_{j=0}^{d-1} x_{i,j}\,|j\rangle$, so we associate $dn$ complex variables with $x$. That each amplitude of $|x\rangle$ can be written as a product of $n$ complex parameters gives rise to the upper bound of $n$ in the degree.
We could instead look at more general measurement operators consisting of 1-dimensional projections without requiring product structure, i.e. entangled measurements. In this scenario, we would write $|x\rangle = \sum_{z\in\{0,\ldots,d-1\}^n} x_z\,|z\rangle$, associating $d^n$ complex variables with $x$. In this setup, each amplitude of $|x\rangle$ is simply a polynomial of degree 1 in these complex variables.
As we fix the variables corresponding to $x$ and $y$ in the shattering assumption that appears in our proof of Theorem 3.3, their corresponding degrees are not relevant to our argument; only the degree in the entries of the unitaries enters our analysis. Therefore, both product measurements and entangled measurements lead to the same pseudo-dimension bound. This is due to the fact that allowing for entangled measurements changes the set of allowed inputs but not the function class itself.
Now that we have established Lemma 3.4, we can prove Theorem 3.3 with reasoning analogous to that in (Goldberg and Jerrum 1995).
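Proof of Theorem 3.3: Let $\{x_1, \ldots, x_m\} \subset X$ be pseudo-shattered by $\mathcal{F}_N$ with thresholds $y_1, \ldots, y_m \in \mathbb{R}$. By Lemma 3.4, there exists a polynomial $p_N$ in $2\gamma d^4 + 2dn$ real variables of degree $\leq 2(\gamma + n)$ such that for every $C \subset \{1, \ldots, m\}$ there exists an assignment $\Xi_C$ to the first $2\gamma d^4$ variables of $p_N$ with
$$p_N(\Xi_C, x_i) - y_i > 0 \iff i \in C.$$
In particular, this implies (using the "moreover" part of Lemma 3.4) that the set $\{p_N(\cdot, x_i) - y_i\}_{i=1}^{m}$ is a set of $m$ polynomials of degree $\leq 2\gamma$ in $2\gamma d^4$ real variables that has at least $2^m$ different consistent sign assignments. Comparing this lower bound with the upper bound on the number of consistent sign assignments from Corollary 3.2 bounds $m$, and hence the pseudo-dimension.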
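The counting step can be made tangible numerically. The sketch below assumes that Corollary 3.2 is applied with degree $2\gamma$ and $2\gamma d^4$ variables, so that a pseudo-shattered set of size $m$ must satisfy $2^m \leq (8e \cdot 2\gamma \cdot m / (2\gamma d^4))^{2\gamma d^4}$, and it searches for the largest such $m$. The explicit constants come from this assumed application and are not claimed to match those of Theorem 3.3; the function names are hypothetical.

```python
import math


def is_feasible(m, gamma, d):
    """Check 2**m <= (8*e*degree*m/num_vars)**num_vars in log form.

    Assumptions of this sketch: degree = 2*gamma and
    num_vars = 2*gamma*d**4, as in the polynomial set built from Lemma 3.4.
    """
    num_vars = 2 * gamma * d ** 4
    degree = 2 * gamma
    log_bound = num_vars * math.log(8 * math.e * degree * m / num_vars)
    return m * math.log(2) <= log_bound


def max_shattered_size(gamma, d):
    """Largest m >= num_vars that is still compatible with the inequality."""
    lo = 2 * gamma * d ** 4      # always feasible: 2 <= 8*e*degree
    hi = 2 * lo
    while is_feasible(hi, gamma, d):
        lo, hi = hi, 2 * hi
    # Feasible sizes form an interval, so binary search finds its right end.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_feasible(mid, gamma, d):
            lo = mid
        else:
            hi = mid
    return lo


for gamma in (1, 2, 4, 8):
    print(gamma, max_shattered_size(gamma, d=2))
```

The printed values grow slightly faster than linearly in $\gamma$, which is the qualitative behaviour expected from a bound of order $d^4 \cdot \gamma \log \gamma$.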
The attentive reader may notice that we do not explicitly refer to the unitarity assumption in our reasoning; our argument mainly uses linearity.This already hints at a generalization to quantum circuits not of unitaries but of operations, which we will describe in subsection 3.4.In that subsection, we will also see how the unitarity assumption implicit in this proof produces a better upper bound than in the general setting of quantum operations.
Remark 3.6. We formulate our bounds in terms of the pseudo-dimension, not its scale-sensitive version called fat-shattering dimension, even though the latter is more commonly used in classical learning. In our scenario, however, the pseudo-dimension and the fat-shattering dimension effectively coincide. This is because we could apply our reasoning for general matrices instead of only unitaries in the setting of Theorem 3.3 as well and achieve the same bounds. In that case, however, the resulting real-valued function class is closed under scalar multiplication with non-negative scalars and it follows from the definition that for such classes, the fat-shattering dimension equals the pseudo-dimension.
3.2. Variable Circuit Structure. Whereas in the previous subsection we fixed a quantum circuit architecture and only varied the entries of the two-qudit unitaries plugged into this structure, we now additionally vary the structure of the quantum circuit architecture itself and consider the complexity of the class of all quantum circuits of a given depth and size. Once again, we consider 2-local quantum circuits, i.e. circuits with one- and two-qudit gates acting on arbitrary pairs of qudits.
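The class of states which is of relevance in this analysis is
$$\big\{U_N\,|0\rangle^{\otimes n} \;\big|\; N \text{ a 2-local quantum circuit architecture of depth } \delta \text{ and size } \gamma \text{ with two-qudit unitaries plugged in}\big\}.$$
Again, this set of states gives rise to a function class via
$$\mathcal{F}_{\delta,\gamma} := \bigcup_{N} \mathcal{F}_N,$$
where the union runs over all 2-local quantum circuit architectures $N$ of depth $\delta$ and size $\gamma$, and where $X$ is as above given by $X = S_d \times \ldots \times S_d$. As before, we want to bound the pseudo-dimension of this function class.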
We summarize the result of this subsection in the following:
Theorem 3.7. With the notation and assumptions from above, it holds that $\mathrm{Pdim}(\mathcal{F}_{\delta,\gamma}) \in O(\delta \cdot d^4 \cdot \gamma^2 \log \gamma)$.
As with Theorem 3.3, the main step towards this result consists of relating the functions appearing in $\mathcal{F}_{\delta,\gamma}$ to polynomials. The difference here is that we must upper bound the number of polynomials, as below.
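Lemma 3.8. With the notation and assumptions from above, there exists a set $P_{\delta,\gamma}$ of polynomials in $2\gamma d^4 + 2dn$ real variables of degree $\leq 2(\gamma + n)$ such that for every $f \in \mathcal{F}_{\delta,\gamma}$ there exists a polynomial $p \in P_{\delta,\gamma}$ such that $f$ can be obtained from $p$ by fixing values for the first $2\gamma d^4$ variables, and such that $|P_{\delta,\gamma}|$ is at most the number of quantum circuit architectures of depth $\delta$ and size $\gamma$, which is bounded in the proof below. Moreover, in each term of $p \in P_{\delta,\gamma}$ the degree in the first $2\gamma d^4$ real variables is $\leq 2\gamma$ and the degree in the last $2dn$ real variables is $\leq 2n$.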
Proof: Given an ordering of the $\gamma$ two-qudit gates, there are at most $\frac{\gamma!}{(\gamma-\delta)!}\,\delta^{\gamma-\delta}$ ways to assign them among the $\delta$ layers. The term $\frac{\gamma!}{(\gamma-\delta)!}$ counts assigning a single gate to each layer, to ensure that there are no trivial (empty) layers. Having assigned each layer one gate, the remaining $\gamma - \delta$ gates may be distributed to any of the $\delta$ layers.
Fix a layer $i$, with $1 \leq i \leq \delta$. Assume that there are $\gamma_i$ two-qudit unitaries to be applied in layer $i$. Then there are $\binom{n}{2\gamma_i}$ ways of choosing the qudits on which the unitaries act. After this choice is made, there are $(2\gamma_i)!$ ways to form pairs from these $2\gamma_i$ qudits. Note that here, the order of the pairs as well as the order of the qudits inside each pair is relevant. In total, there are $(2\gamma_i)! \cdot \binom{n}{2\gamma_i}$ ways of assigning qudits to the unitaries in the $i$th layer. Hence, combining these per-layer counts with the number of ways of assigning gates to layers yields an upper bound on the number of different quantum circuit architectures. The proof is completed by applying Lemma 3.4 to every such quantum circuit architecture.
Now that we have established Lemma 3.8, we can prove Theorem 3.7 by reasoning analogous to that in (Goldberg and Jerrum 1995). See the Appendix for the proof of Theorem 3.7.
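The two counts used in the proof of Lemma 3.8 are easy to evaluate for small parameters. The sketch below computes the number of ways to assign $\gamma$ ordered gates to $\delta$ layers and the number of ways to assign qudits to the $\gamma_i$ gates of one layer. The function names are hypothetical, and reading the qudit count as a binomial coefficient $\binom{n}{2\gamma_i}$ is an assumption of this example.

```python
import math


def layer_assignments(gamma, delta):
    """gamma!/(gamma-delta)! * delta**(gamma-delta).

    First factor: one gate per layer so that no layer is empty.
    Second factor: each remaining gate may go to any of the delta layers.
    """
    return (math.factorial(gamma) // math.factorial(gamma - delta)) * delta ** (gamma - delta)


def qudit_assignments(n, gamma_i):
    """(2*gamma_i)! * C(n, 2*gamma_i): choose the qudits touched in the layer,
    then arrange them into ordered pairs whose order also matters."""
    return math.factorial(2 * gamma_i) * math.comb(n, 2 * gamma_i)


print(layer_assignments(gamma=3, delta=2))
print(qudit_assignments(n=4, gamma_i=2))
```

Both quantities grow quickly but only enter the pseudo-dimension bound through their logarithm, which is why the variable-architecture bound remains polynomial in $\delta$ and $\gamma$.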
3.3. Extension to Circuits with Variable Inputs. We now modify the results of 3.1 and 3.2 to allow not only for the fixed input $|0\rangle^{\otimes n}$, but also for variable input. This is of use, for instance, in subsection 4.2, in which we consider the PAC-learnability of quantum circuits (of unitary gates or more general quantum channels). In that context, allowing variable input amounts to learning the entire quantum circuit, rather than just its action on $|0\rangle^{\otimes n}$. This is necessary in order to meaningfully compare the learning problem in subsection 4.2 to exact circuit tomography.
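To consider variable input states, we define the following function classes, analogously to those in subsections 3.1 and 3.2:
$$\mathcal{F}'_N := \big\{f : X \times Y \to [0, 1],\ f(x, y) = \big|\langle x|\,U_{N|\{U^{(i,j)}\}}\,|y\rangle\big|^2 \;\big|\; U^{(i,j)} \text{ two-qudit unitaries}\big\}, \qquad \mathcal{F}'_{\delta,\gamma} := \bigcup_{N} \mathcal{F}'_N,$$
where the union runs over all architectures $N$ of depth $\delta$ and size $\gamma$, and where $Y$ can be taken as the computational basis states $\{0, 1, \ldots, d - 1\}^n$, or more generally as $Y = S_d \times \ldots \times S_d$.
Lemma 3.9. With the notation and assumptions from above the following holds: There exists a polynomial $p'_N$ in $2\gamma d^4 + 4dn$ real variables of degree $\leq 2\gamma + 4n$ such that every $f \in \mathcal{F}'_N$ can be obtained from $p'_N$ by fixing values for the first $2\gamma d^4$ variables. Moreover, in each term of $p'_N$ the degree in the first $2\gamma d^4$ real variables is $\leq 2\gamma$, the degree in the $2dn$ real variables corresponding to $x \in X$ is $\leq 2n$, and the degree in the $2dn$ real variables corresponding to $y \in Y$ is $\leq 2n$.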
Proof: Consider the product state input $|y\rangle = \sum_z y_z\,|z\rangle$. As we consider product states, each $y_z$ is a product of $n$ complex parameters. Following the same reasoning as before, for a fixed $z \in \{0, \ldots, d - 1\}^n$, $\langle z|U_N^{\dagger}|x\rangle$ is a multilinear polynomial $q_N^z$. Then, the amplitude $\langle y|U_N^{\dagger}|x\rangle$ is
$$q'_N(x, y) := \sum_{z} \overline{y_z}\, q_N^z(x).$$
In the above equation, $q'_N(x, y)$ has degree at most $n$ in $y$, and so upon squaring the amplitude $q'_N(x, y)$ to obtain $p'_N(x, y)$ as in Lemma 3.4, we have a degree at most $2n$ in the $2dn$ real variables corresponding to $y$. The rest follows from Lemma 3.4.
The bound from Theorem 3.3 still holds for the case of variable circuit input, with the proof proceeding almost identically upon replacing Lemma 3.4 by Lemma 3.9. The $2dn$ additional variables that arise from the polynomial $y$-dependence do not alter the bound because we fix the values of these variables in the pseudo-shattering assumption.
3.4. Extension to Circuits of Quantum Operations. We finish this section by describing an extension of Theorems 3.3 and 3.7 to the case of circuits of quantum operations, instead of only unitaries. This generalization is relatively straightforward because the decisive property of unitaries used in our previous proofs was not the preservation of inner products, but rather linearity. This setting is useful to e.g. describe circuits with imperfect gates. Rather than consider a logical gate that implements a unitary exactly, each gate can instead be considered a quantum operation that executes the desired unitary with some probability, and e.g. depolarizes input qudits with some probability. (Other noise models are of course possible.)
We use analogous notation to that introduced at the beginning of subsection 3.1, writing $T_{N|\{T^{(i,j)}\}}$ for the overall quantum operation implemented by $N$ when plugging the two-qudit quantum operations $\{T^{(i,j)}\}_{1\leq i\leq\delta,\,1\leq j\leq\gamma_i}$ into the respective positions of the quantum circuit.
The quantum circuit $N$ (of operations) now gives rise to the set of output states
$$\big\{T_{N|\{T^{(i,j)}\}}\big(|0\rangle\langle 0|^{\otimes n}\big) \;\big|\; T^{(i,j)} \text{ two-qudit quantum operations}\big\}.$$
By taking into account all possible quantum circuits of size $\gamma$ and depth $\delta$, we obtain the union of these sets over all architectures $N$ of depth $\delta$ and size $\gamma$. These states now yield again a p-concept class, consisting of the functions $x \mapsto \operatorname{tr}\big[|x\rangle\langle x|\,\rho\big]$ on $X$ for $\rho$ in this union. In this scenario, we show:
Theorem 3.10. With the notation and assumptions from above, the pseudo-dimension of this p-concept class is $O(\delta \cdot d^8 \cdot \gamma^2 \log \gamma)$.
Proof: We only sketch the reasoning, as it is similar to that in the proof of Theorem 3.7. We first need to establish an analogue of Lemma 3.8. To this end, observe that a quantum operation acting on two-qudit states can be interpreted as a $d^4 \times d^4$ matrix with complex entries. Moreover, we may write
$$\operatorname{tr}\big[|x\rangle\langle x|\; T_N\big(|0\rangle\langle 0|^{\otimes n}\big)\big] = \operatorname{tr}\big[T_N^{*}\big(|x\rangle\langle x|\big)\; |0\rangle\langle 0|^{\otimes n}\big],$$
where $T_N^{*}$ denotes the adjoint operation of $T_N$ with regard to the Hilbert-Schmidt inner product. As before, we can do a layer-wise analysis of the transformation of $|x\rangle\langle x|$ and observe that the entries of the (sub-normalized) density matrix after a layer can be written as linear combinations of the entries of the (sub-normalized) density matrix before the layer. Moreover, the coefficients can be written as multilinear polynomials with the degree determined by the number of two-qudit operations in the layer. Hence, we obtain the result of Lemma 3.4 with $d^8$ instead of $d^4$. The bound on the number of different quantum circuit architectures can be derived in exactly the same way as before, so the analogue of Lemma 3.8 holds, completing the proof of the theorem.
Theorem 3.10 and its proof sketch also help to elucidate the relevance of the unitarity assumption in Theorems 3.3 and 3.7. The unitarity assumption allows us to work on the level of pure states, and to consider only a class of quantum operations with a smaller number of free parameters. This effectively reduces the dimension of each two-qudit subsystem from $d^8$ to $d^4$ for the respective operations, or from $d^4$ to $d^2$ for the states.

Applications
In this section, we explore two different applications of our pseudo-dimension upper bounds.First, we employ the pseudo-dimension to exhibit a large but finite discrete set of quantum states, out of which at least one is hard to implement.Second, we combine the pseudo-dimension bound with results from the theory of p-concept learning to derive the PAC-learnability of quantum circuits.
4.1.Quantum State Complexity Lower Bounds.It is well known that almost all n-qubit unitaries require an exponential (in n) number of 2-qubit unitaries to be implemented.Similarly, almost all pure n-qubit states require an application of exponentially (in n) many 2-qubit unitaries to be generated from the |0 ⊗n state (see e.g.Nielsen and Chuang 2010).However, in neither case are there explicit examples of unitaries or states saturating this exponentiality bound.(See (Aaronson 2016) for more information on unitary and state complexity.)We will use the pseudo-dimension as a tool to exhibit a discrete set of pure qubit states such that at least one of them requires exponentially many 2-qubit unitaries to be generated from |0 ⊗n .
The drawback of our result is that the size of this set is $2^{2^n}$ and thus unsatisfyingly large. By relatively simple deliberations this size can be reduced by an order of $2^n$ elements, though this is negligible compared to the overall size.
We now describe the construction of the candidate set of states. For a subset $C \subset \{|x0\rangle\}_{x\in\{0,1\}^n}$, namely a subset of the set of all computational basis states of $n+1$ qubits that end on 0, with $C \neq \emptyset$, define $|\psi_C\rangle = \frac{1}{\sqrt{|C|}}\sum_{|x0\rangle\in C}|x0\rangle$ with density matrix $\rho_C = |\psi_C\rangle\langle\psi_C|$. For $C = \emptyset$ we take (Note that the $(n+1)$st qubit only really matters for $\rho_\emptyset$.) Our set of interest will be This discrete set of $2^{2^n}$ multi-qubit quantum states now gives rise to a class of p-concepts This class has large pseudo-dimension, as described in the following lemma.
Proof: Consider the subset of computational basis states $\{|x0\rangle\}_{x\in\{0,1\}^n}$ and the corresponding threshold values $y_{x0} = 0$ independently of $x0$. By construction of $S$ and thus $F_S$ the following holds: For any Hence, $\mathrm{Pdim}(F_S) \geq 2^n$, because we have found an example of a set of size $2^n$ that is pseudo-shattered.
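The shattering argument in this proof can be checked by brute force for very small n. The following is a minimal sketch, not the authors' construction verbatim: the helper names are hypothetical, the p-concept is assumed to be $f_C(x) = |\langle x0|\psi_C\rangle|^2$, the state for $C = \emptyset$ is assumed to be the basis state ending in 1 (the source's definition is not recoverable here), and a point is counted as "above threshold" when $f_C(x) > 0$.

```python
from itertools import combinations, product
from math import sqrt

def basis_index(x_bits):
    """Index of |x0> among the 2^(n+1) computational basis states."""
    idx = 0
    for b in x_bits + (0,):  # append the (n+1)st qubit, fixed to 0
        idx = 2 * idx + b
    return idx

def psi(C, n):
    """Amplitudes of the uniform superposition over the basis states in C."""
    amp = [0.0] * 2 ** (n + 1)
    if not C:
        # Assumption: for the empty set use |0...01>, orthogonal to every |x0>.
        amp[1] = 1.0
        return amp
    for x in C:
        amp[basis_index(x)] = 1 / sqrt(len(C))
    return amp

def f(C, x, n):
    """Probability of observing |x0> when measuring |psi_C>."""
    return psi(C, n)[basis_index(x)] ** 2

def pseudo_shattered(n):
    """Check that all 2^(2^n) above/below-threshold patterns occur (threshold 0)."""
    points = list(product((0, 1), repeat=n))
    patterns = set()
    for r in range(len(points) + 1):
        for C in combinations(points, r):
            patterns.add(tuple(f(C, x, n) > 0 for x in points))
    return len(patterns) == 2 ** len(points)
```

Each subset C produces the pattern that is positive exactly on C, so the number of distinct patterns equals the number of subsets. The enumeration is doubly exponential in n and is only meant for n of 1, 2 or 3.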
We now combine this simple observation with Theorem 3.7, which gives us the following:
Theorem 4.2. With the notation introduced above, if γ and δ are such that each state in S can be generated from the state $|0\rangle^{\otimes(n+1)}$ by some circuit of size γ and depth δ, then
Note that any set of functions which pseudo-shatters a set of size $2^n$ has to have at least $2^{2^n}$ elements. Hence, the large size of the set C is an automatic consequence of our line of reasoning.
Remark 4.4. We note that a set of n-qubit states with cardinality doubly exponential in n such that at least one of them needs an exponential number of gates (up to logarithmic factors) to be implemented can also be obtained with more standard reasoning. Namely, it is well known that there are n-qubit states the approximation of which up to trace-distance ε requires $\Omega\left(\frac{2^n \log(1/\varepsilon)}{\log n}\right)$ unitary gates (see Nielsen and Chuang 2010, chap. 4.5.4). So if we pick a $\frac{1}{2}$-net of size $O(2^{2^n})$ for the set of pure n-qubit quantum states, this will have the desired properties.
We sketch another way in which our pseudo-dimension bound could be used in quantum state complexity and which might potentially lead to a smaller set of candidates. Given pure n-qubit states $|\psi_1\rangle, \ldots, |\psi_m\rangle$ and efficiently implementable (i.e. with polynomially many 2-qubit unitary gates arranged in polynomially many layers) unitaries $U_1, \ldots, U_k$, one can study the set of states $\{U_i|\psi_j\rangle\}_{1\leq i\leq k,\, 1\leq j\leq m}$. If an exponential (in n) pseudo-dimension lower bound can be established then, since every $U_i$ is efficiently implementable, one can conclude that at least one among the states $|\psi_j\rangle$ is not efficiently implementable.
The advantage of such a pseudo-dimension-based reasoning would be that m need not be doubly exponential in n, since we can compensate for this in k. This realization can already be used to reduce the size of the set of candidate states given in Corollary 4.3. However, we have not yet been able to identify sufficiently many efficiently-implementable unitaries to reduce the size below doubly exponential. Nevertheless, there is likely room for improvement in applying our method to state complexity.
4.2. Learnability of Quantum Circuits.
We now use our pseudo-dimension bounds to study learnability. Specifically, we use the pseudo-dimension bound for the case of variable inputs (subsection 3.3) combined with the generalization to quantum operations (subsection 3.4). We proceed quite similarly to (Aaronson 2007).
The learning problem which we want to study is the following: Let $\mu \in \mathrm{Prob}((X \times Y) \times [0,1])$ be a probability measure (unknown to the learner). Let $S = \{((x^{(i)}, y^{(i)}), p^{(i)})\}_{i=1}^{m}$ be corresponding training data drawn i.i.d. according to µ. A learner must, upon input of training data S, size Γ ∈ N, depth ∆ ∈ N, confidence δ ∈ [0, 1), accuracy ε ∈ [0, 1) and error margin β ∈ (0, 1), with probability $\geq 1-\delta$ with regard to the choice of training data, output a hypothesis quantum circuit N of size Γ and depth ∆ consisting of two-qudit operations such that where the infimum runs over all quantum circuits M of size Γ and depth ∆. Here, $f_N$ denotes the function $f_N(x, y) = \langle x|T_N(|y\rangle\langle y|)|x\rangle$ and $f_M$ is defined analogously, similarly to subsection 3.3.
We use our pseudo-dimension bound in order to upper-bound the size of the training data sufficient for solving this task. More precisely, we make use of sample complexity upper bounds from the fat-shattering dimension as proved in (Anthony and Bartlett 2000; Bartlett and Long 1998), together with the fact that the fat-shattering dimension is upper-bounded by the pseudo-dimension.
First we restrict our scope to the "realizable" scenario, i.e. we will assume the probability measure to be of the form for some quantum circuit N * of size Γ and depth ∆.This will in particular imply that for quantum circuits M of size Γ and depth ∆ inf Colloquially, realizability means that there exists a set of "correct" parameters Γ and ∆ and these are known to the learner, i.e. training samples are promised to be drawn from circuits of size Γ and depth ∆.
$+ \log\frac{1}{\delta}$ suffices to guarantee that, with probability $\geq 1-\delta$ with regard to the choice of training data S,
In our setting, this result implies:
Corollary 4.6. Let N* be a quantum circuit of quantum operations with size Γ and depth ∆. Let $\mu \in \mathrm{Prob}(X \times Y)$ be a probability measure unknown to the learner. Let $+ \log\frac{1}{\delta}$ suffice to guarantee that, with probability $\geq 1-\delta$ with regard to choice of the training data, any quantum circuit N of size Γ and depth ∆ that satisfies
Proof: Combine Theorem 4.5 with Theorem 3.7 (more precisely, with its version for variable input states, which can be proved for operations analogously to the reasoning in subsection 3.3) and use that the fat-shattering dimension is always upper-bounded by the pseudo-dimension.
Note that in particular, this implies that for the class of circuits of quantum operations with polynomial size and depth in the number of qudits, a hypothesis that performs well on training data will also perform well in a probably approximately correct sense.
Next, we want to discuss briefly how our result compares to the work (Aaronson 2007) on the learnability of quantum states. There, it is shown that quantum states can be PAC-learned with a sample complexity that depends linearly on the number of qubits and (among other dependencies) polynomially on $1/\varepsilon$, where ε denotes the desired accuracy. However, this result does not imply learnability of quantum channels with a sample complexity that depends polynomially on the number of qubits. This observation is already stated in (Aaronson 2007), and we provide an alternate, intuitive explanation for why the result on states does not directly apply to operations.
One can straightforwardly apply the result of (Aaronson 2007) to learn the Choi-Jamiolkowski state of a quantum channel. One can then compute measurement probabilities of output states of a channel T acting on n-qubit states, using its Choi-Jamiolkowski state τ. For this we must make use of the formula tr Here, we see that any error on the side of the Choi-Jamiolkowski state will be multiplied by a factor exponential in n, and thus in this case the overall n-dependence of the sample complexity bound from (Aaronson 2007) becomes exponential via the accuracy-dependence. This motivates our study of learnability of a restricted class of quantum operations. Finding such operations for which process tomography is possible was left as an open problem in (Aaronson 2007). Our answer to this question is that a PAC-version of quantum process tomography is possible when we restrict our scope to operations that can be implemented by quantum circuits of depth and size polynomial in the number of qudits. However, note that this is subject to a realizability assumption: the learner must know in advance a polynomial bound on the size and depth of the circuit. We show that requiring the operations to be efficiently implementable automatically reduces the information-theoretic complexity of learning, so that only a modest number of training examples is needed. We do not make any statement about the computational complexity of this learning task; this remains an open problem.
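The exponential factor mentioned in this paragraph can be made explicit with the standard Choi-Jamiolkowski identity. The following is a sketch under an assumed convention, which need not match the authors' exact formula: $\tau = (T \otimes \mathrm{id})(|\Omega\rangle\langle\Omega|)$ with $|\Omega\rangle = 2^{-n/2}\sum_i |i\rangle|i\rangle$, an effect operator $E$, an input state $\rho$, and a learned approximation $\hat\tau$ whose measurement probabilities are accurate up to $\eta$ (the symbols $E$, $\hat\tau$ and $\eta$ are introduced here for illustration only).

```latex
\begin{align*}
\operatorname{tr}\!\left[\tau\,(E \otimes \rho^{T})\right]
  &= \frac{1}{2^{n}} \sum_{i,j} \operatorname{tr}\!\left[T(|i\rangle\langle j|)\,E\right]
     \langle j|\rho^{T}|i\rangle
   = \frac{1}{2^{n}} \operatorname{tr}\!\left[E\,T(\rho)\right], \\
\operatorname{tr}\!\left[E\,T(\rho)\right]
  &= 2^{n}\,\operatorname{tr}\!\left[\tau\,(E \otimes \rho^{T})\right], \\
\left|\operatorname{tr}\!\left[\hat\tau\,(E \otimes \rho^{T})\right]
      - \operatorname{tr}\!\left[\tau\,(E \otimes \rho^{T})\right]\right| \le \eta
  &\;\Longrightarrow\;
  \left|2^{n}\operatorname{tr}\!\left[\hat\tau\,(E \otimes \rho^{T})\right]
      - \operatorname{tr}\!\left[E\,T(\rho)\right]\right| \le 2^{n}\eta .
\end{align*}
```

Under this convention, reaching accuracy ε on the channel's output probabilities requires $\eta \le 2^{-n}\varepsilon$ on the state side, and a sample complexity polynomial in $1/\eta$ then becomes exponential in n.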
How can this probably approximately correct version of quantum process tomography be put to use? Given polynomially many uses of a black box implementing an unknown quantum operation of polynomial size and depth, one can exhibit a circuit of two-qudit quantum operations that approximates the unknown channel. In other words, we obtain a classical description of an approximate copy of the channel.

Open Problems
Finally, in this section we discuss future directions and possible generalizations of our results. Two natural parameters of a circuit, depth and size, appear polynomially in the pseudo-dimension upper bounds. Notably, these bounds are independent of the number of qudits in the circuit. Are our upper bounds tight in their dependence on size and depth? Can similar techniques produce pseudo-dimension lower bounds?
Our application of pseudo-dimension for state complexity lower bounds complements known methods (described e.g. in Nielsen and Chuang 2010) based on counting dimensions or covering arguments. We exhibit a class of states of size $2^{2^n}$, for which at least one has exponential state complexity. Can we exploit this new technique to exhibit a smaller set of states? Perhaps the most exciting application of pseudo-dimension bounds could be provable complexity lower bounds, if the reasoning in 4.1 is sharpened or the tools are developed further.
If circuit depth and size are known in advance, one can information-efficiently learn the circuit.If the learner receives training data generated by an approximation of the circuit, does the result still hold?Can the realizability assumption be relaxed?
Does "pretty-good circuit tomography" have applications? On the theory side, this might involve exploiting the learning process as an approximate copy-machine for quantum circuits. Of interest for both theory and experiment is whether circuits can be learned with a reasonable amount of computation. One can imagine progress on this question for process tomography similar to that for state tomography; demonstrating a class of states for which learning is computationally efficient in (Rocchetto 2018) made it possible to learn physically interesting states in a laboratory in (Rocchetto et al. 2019). An efficiency improvement in the process tomography case might also have experimental ramifications.
Theorem 3.1. (Warren 1968, Theorem 3) Let $\{p_1, \ldots, p_m\}$ be a set of real polynomials in n variables with $m \geq n$, each of degree at most $d \geq 1$. Then the number of consistent non-zero sign assignments to $\{p_1, \ldots, p_m\}$ is at most $\left(\frac{4edm}{n}\right)^{n}$.
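To get a feel for how Theorem 3.1 limits shattering, one can compare Warren's bound with the $2^m$ sign patterns that m shattered points would require. The sketch below is illustrative only: the function names are hypothetical, and it applies the generic counting argument to m polynomials in n variables of degree d rather than reproducing the specific polynomial counts used in the proofs of Section 3.

```python
from math import e

def warren_bound(m, n, d):
    """Warren's upper bound (4*e*d*m/n)^n on consistent non-zero sign assignments."""
    if m < n or n < 1 or d < 1:
        raise ValueError("Warren's theorem requires m >= n >= 1 and d >= 1")
    return (4 * e * d * m / n) ** n

def max_shatterable(n, d):
    """Largest m >= n for which 2^m sign patterns do not exceed Warren's bound."""
    m = n  # at m = n the bound (4*e*d)^n already exceeds 2^n
    while 2 ** (m + 1) <= warren_bound(m + 1, n, d):
        m += 1
    return m
```

Since $2^m$ grows exponentially in m while the bound grows only polynomially in m for fixed n and d, the loop terminates, and the returned value scales roughly like $n \log d$ up to constants and logarithmic factors. This is the mechanism by which a polynomial parametrization yields a pseudo-dimension upper bound.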
Under the assumption of the Theorem we can conclude $F_S \subset F_{\delta,\gamma}$. Now combine the lower bound of Lemma 4.1 with the upper bound from Theorem 3.7.
Corollary 4.3. There exists a $C \subset \{|x0\rangle\}_{x\in\{0,1\}^n}$ such that $|\psi_C\rangle = \frac{1}{\sqrt{|C|}}\sum_{|x0\rangle\in C}|x0\rangle$ cannot be implemented by a quantum circuit of 2-qubit unitaries with subexponential (in n) size or depth.
Theorem 4.5. (Anthony and Bartlett 2000, Corollary 3 .
M.C.C. and I.D. thank Michael Wolf for suggesting this problem and both Michael Wolf and Yifan Jia for insightful discussions. M.C.C. gratefully acknowledges support from the TopMath Graduate Center of the TUM Graduate School at the Technische Universität München, Germany, and from the TopMath Program at the Elite Network of Bavaria. I.D. gratefully acknowledges that this material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No. DGE 1656518, and by the German Academic Exchange Service (DAAD) under Grant No. 57381410. Any conclusions expressed in this material are those of the authors and do not necessarily reflect the views of the aforementioned institutions.