Unfolding measurement distributions via quantum annealing

High-energy physics is replete with hard computational problems and it is one of the areas where quantum computing could be used to speed up calculations. We present an implementation of likelihood-based regularized unfolding on a quantum computer. The inverse problem is recast in terms of quadratic unconstrained binary optimization (QUBO), which has the same form of the Ising hamiltonian and hence it is solvable on a programmable quantum annealer. We tested the method using a model that captures the essence of the problem, and compared the results with a baseline method commonly used in precision measurements at the Large Hadron Collider (LHC) at CERN. The unfolded distribution is in very good agreement with the original one. We also show how the method can be extended to include the effect of nuisance parameters representing sources of systematic uncertainties affecting the measurement.

Problems in experimental high-energy physics have always challenged information and communication technologies.The Large Hadron Collider at CERN is a great example of this, and with its upgrade program it is crucial that the computational resources continue scaling [1].Quantum computing is one the technologies that holds the promise of speeding up computationally expensive tasks [2].Progress has been rapid in the development of quantum computing hardware, but implementations remain imperfect.Noise and limited scalability are the primary issues, leading to the term noisy, intermediatescale quantum (NISQ) era [3].Given the potential of quantum computing, there has been some exploratory work on using quantum devices in high-energy physics, for instance, in finding the Higgs boson in a simplified search domain [4].This proof-of-concept was shown on a programmable quantum annealer [5,6], which is the largest scale quantum computer currently available.We use the same architecture to apply quantum computing in a practical problem and solve a frequently appearing task in experiments: unfolding.
Unfolding is the procedure of correcting for distortions due to limited resolution of the measuring device [7] as found in particle physics [8,9], astronomy [10] and other fields.This mathematical treatments is also known as the inverse problem or deconvolution.Unfolding is not necessary if the only goal is to compare the theory with the experimental results.In this case, a simulation of the experimental apparatus is used to account e.g. for the interaction of radiation with matter, nuclear interactions, lens distortions, etc.In practical terms, this is usually carried out using computer codes such as Geant4 [11], FLUKA [12], and Delphes [13].On the other hand, unfolding is essential if the aim is to compare measurements coming from different experiments.In general, each experimental apparatus has a unique signature in terms of detection efficiency, geometric acceptance and resolution.The overall effect is that the numerical value of some quantity generated in a given physical process is changed by the process of measurement.Unfolding is the mathematical procedure to "correct" for these effects and recover the original value.Solving unfolding is a computationally challenging task: in what follows, we demonstrate how to map the problem to a quantum computer to accelerate the computations with upcoming quantum hardware.Fig. 1 outlines the proposed scheme.Usually, the physical observable x being measured is distributed with a probability density function f (x) whose functional form is not known a priori.It is customary to make use of binned distributions in the form of normalized histograms.Then, probability p j of finding x in the bin j is simply given by the integral of f (x) in that bin, i.e. : (1)

arXiv:1908.08519v1 [physics.data-an] 22 Aug 2019
The sum of all p j is equal to one.This probability distribution is turned into an actual physical prediction by multiplying the value of each bin by a dimensionful quantity such as a cross-section.Typically, this probability density function is estimated by simulating the physics process multiple times, using Monte Carlo methods, as it is done for example in so-called event generators such as Pythia [15] or Herwig [16].The result of this operation is a histogram representing the "true" distribution x.After applying the effect of the experimental apparatus, one obtains another histogram corresponding to the "reconstructed" distribution y.
In its simplest form, the method to unfold the measurement to the truth level is to estimate efficiency factors j for each bin from the simulation: The unfolded distribution x is thus obtained by applying these efficiency factors to a measured distribution d such that: This method, called bin-to-bin correction, has strong limitations: the histograms representing distributions before and after interacting with the experimental apparatus (in the following referred to as truth-and reco-level respectively) must have the same binning, correlations between adjacent bins are neglected, and there is no way to account for migration of events generated in a truth bin j but ending up in a reco bin i.
These difficulties can be overcome if we promote the efficiency correction factors j to a matrix R ij (usually called response matrix), estimated from the simulation, such that: In principle, the number of bins at reco and truth level do not have to match, allowing also for over-and underconstrained unfolding.Typically, R is presented as a matrix of (reco, truth) with rows that are normalized to unity, so that each entry is the probability that events produced in a given truth bin j will be observed in the reco bin i.The inverse of this matrix multiplication, as applied in Eq. 5, is commonly known as unregularized matrix inversion unfolding.
Despite its relative simplicity, this method still suffers from numerical instabilities due to the inversion of the response matrix.This can be exacerbated by limitations in the simulations that may require very large computational resources to estimate off-diagonal elements with a high degree of precision.These elements encode very unlikely situations where the limited resolution causes migrations, or a dramatic drop of efficiency or acceptance due to incomplete hermiticity, lack of instrumentation and other effects.One possible approach is to constrain the truth-level distribution x by maximizing a likelihood function L(y|d) that depends on the estimated reco-level spectrum y = y(x) and observed data d.In the case of a counting experiment, the likelihood is usually the product of Poisson or Gaussian (in the limit of large y) distributions for each bin.Ideally, the truth-level unfolded distribution x is expected to show a good degree of regularity.For example, there should not be sharp transitions unless otherwise anticipated from a theoretical model.Among various possibilities, the most common way to achieve this is to impose an additional constraint in the form of a Lagrange multiplier added to the likelihood function, whose effect is to favour smooth solutions.The strength of the regularization can be controlled by an additional parameter β.A common measure of smoothness to be minimized is the second derivative of the distribution.This is an example of what is known more generally as Tikhonov regularization.
The complete likelihood to be maximized is thus: For numerical stability reason, it is often preferable to minimize the logarithm of the likelihood, log L(y|d).More advanced methods [17,18] have been developed to integrate out nuisance parameters, i.e. parameters needed to account for extra variation in the model such as sources of systematic uncertainty.
QUBO formulation of the unfolding problem-To translate matrix operations to the quantum computing realm, an intermediate step has to be taken: the likelihood function presented in Eq. 6 has to be adapted so that the unfolding can be expressed in terms of a binary optimization, which in turn can be implemented on a quantum annealer.
The quadratic unconstrained binary optimization (QUBO) model requires the minimization of an objective function: where q is a vector of binary elements (observing that q 2 j = q j if q j can only be either 0 or 1) and C is a square matrix of constants, often presented in upper triangular form.QUBO is a special case of quadratic programming.Usually, the minimization is subject to a number of constraints, which can be encoded in a quadratic penalty term, such that: The linear constraints appear on the diagonal of the C matrix, while the off-diagonal elements encode the quadratic penalty terms.
To rewrite the unfolding equation, we follow the approach described in Ref. [19].First, we replace the Poisson term with a sum of squares of differences, i.e. the square of the L 2 norm between the observed data and the unfolded spectrum, which corresponds to taking the Gaussian approximation.Then, the regularization is obtained by minimizing the second derivative up to a multiplicative factor λ, which is obtained by multiplying the truth-level spectrum x by the Laplacian operator D. Thus, the objective function to be minimized is: The detailed conversion of Eq. 12 into QUBO form can be found in the appendix.Among some other algebraic manipulations, we have to deal with the fact that the weights above have been derived for a real-valued vector x (of N elements) rather than a binary-value vector q of nN elements, where n is the number of bits.This can easily resolved by constructing equivalent matrices that act upon the discretized version of x.This can be achieved by expanding x as: The QUBO is formulated from Equation 13, by replacing x i with its binary encoding.
This way, the QUBO minimization can be expressed in terms of the binary vector q: Systematic uncertainties-Sources of systematic uncertainty shift the value of each bin by a certain amount.Typically, variations corresponding to ±1 standard deviation are estimated from the simulation, and an interpolation is done during the optimization.For each bin i, the effect of the k-th systematic is represented by an additional term, so that the predicted spectrum is: where y is the central prediction, s ik is the variation in bin i due to systematic k of strength z k , which corresponds to one standard deviation when z k = 1.
For the purpose of the QUBO optimization, the effect of systematic errors is encoded by extending the truthlevel vector and the response matrix.In this case the values of the systematic uncertainties can be altered and the predicted spectrum will change accordingly.Generally, such an equation does not have a unique solution as it corresponds to an under-constrained unfolding (there are more parameters to fit at truth-level than bins at reco-level).
An additional term S is introduced to penalize large systematics, with some strength, γ.The final objective function to be minimized is thus: For the derivation, refer to the appendix.In this case, the systematic shifts z k are being minimized simultaneously to the spectrum, with a penalty imposed for any deviation from their nominal value of 0, with a strength controlled by the parameter γ.
Quantum Annealing-Many NISQ-era quantum devices such as programmable quantum annealers [5], variational quantum eigensolvers [20] and variational circuits [21] on gate-model quantum computers, or networks of degenerate optical parametric oscillators [22] are proposed for solving QUBO tasks [23][24][25].In most cases, the QUBO is mapped to a classical Ising model with a simple change of variables.This includes the D-Wave 2000Q quantum annealer with 2038 manufactured spins that our experiments relied on.On this device, the coupling and onsite fields of the Ising model are bounded the following way: where Ĥ is the cost Hamiltonian for which the annealer attempts to find the ground state.
Results-We tested the method described above in a realistic situation, where the quantity to be unfolded is represented by a histogram with 5 bins and migrations are not negligible (i.e.non-zero off-diagonal elements in the response matrix) [26].We assume that the distortions introduced by the detector are perfectly known, hence we applied the same response matrix to both the signal and pseudo-data truth-level distributions in order to obtain their corresponding reco-level histograms, which will be different in general.The pseudo-data reco-level histogram is then unfolded using the following methods: D'Agostini iterative Bayesian [17] with N itr = 4; QUBO solved on the CPU by the neal simulator; and QUBO executed by the real QPU, with the regularization strength λ set values between 0 and 1.
A first test is shown in Fig. 2, where a peaking distribution is unfolded.This is representative of the mass spectrum in presence of a resonance.All the unfolding methods are able to recover the truth-level distribution.There is no apparent difference between the lower-noise annealer DW 2000Q 5 solver and the older regular-noise DW 2000Q 2 1.
Fig. 3 shows the unfolding applied to a steeply falling spectrum, representative of a certain class of observables such as the transverse momentum of a particle.In this case, the unregularized QUBO is in good agreement with both the truth distribution and the D'Agostini method.The impact of the regularization is evident as larger and larger values of the λ parameter result in flatter and flatter solutions which do not agree well with the truth level.This is a confirmation of the well-known fact that the regularization is problem-dependent, and an optimal value of its strength has to be carefully estimated e.g. from the simulation.
The good agreement with the simulated annealing and with the D'Agostini method indicates that with a large enough numerical precision, the QUBO unfolding would have a comparable performance with the traditional methods.On the other hand, the execution on the QPU is less accurate due to hardware limitations.
A further test was performed to compare the difference between a 4-bit and 8-bit encoding.The result is shown in Fig. 4. In principle, with more bits it should be possible to have a better numerical precision.However, it also results in longer chains of qbits in the minor embedding.
To test the effect of systematic uncertainties, we added one component to the pseudo-data: a shape systematic whose effect is to distort the number of entries in each bin.We set the strength of this systematic to -0.75.The unfolding is shown in Fig. 5.The discretized nature of the optimization introduces some interesting features.For this particular setup, the full solution is not degenerate, even when no penalty is applied for increasing the systematic uncertainty (i.e.γ = 0).In this case, the simulated annealing correctly picks the value of the systematic and each bin of the distribution.The QPU also picks out this solution on average.
If a sufficiently large penalty parameter is applied (in this case, γ=1000), the annealing prefers a value of the systematic strength close to zero.In order to accommodate this choice, the bins gets shifted.The solution found by the simulated annealer in this case matches the brute force solution, however this solution is not found by the QPU.While more work remains to understand many subtleties, a more refined version of this methodol-ogy could be used to estimate the impacts of systematic uncertainties.This is an area for continued work.Finally, we used the DWave-Hybrid software framework to find the solution by searching for the state of minimum energy using a Tabu heuristic algorithm [27,28] and the QPU in parallel.The result we obtained is completely in agreement with both the truth-level distribution and the one obtained with the simulated annealing.While the hybrid approach goes well beyond the needs of the scenario under consideration, we argue that this is likely to be the key for a typical measurement at the LHC where the number of bins is larger than 10 and the number of sources of systematic uncertainty is on the order of 100.
Conclusions-NISQ-era quantum devices are meant to work in close integration with classical computing resources, creating hybrid classical-quantum algorithms.We found that using the quantum annealer together with

Unfolded
True value lower noise 4bits lower noise 8bits regular noise 4bits regular noise 8bits FIG. 4. distribution of pseudo-data, unfolded with either a 4-or 8-bits encoding, of a peaked spectrum.The data corresponding to runs submitted to the QPU are averaged over 20 with 5000 reads each.The maximum chain length in the best embedding is 7 and 15 qbits respectively.The 8-bit encoding allows a more fine-grained estimation of the unfolded histogram.a classical optimization metaheuristics helps with two problems: (1) problems larger than the hardware spin system can be solved; (2) results are quickly refined into the optimal solutions in a few iterations between the quantum hardware and the classical algorithm.The close agreement between the correct results and the one obtained by the hybrid classical-quantum protocol indicates the relevance of this paradigm to practical problems in experimental high-energy physics.
Upcoming quantum hardware, like the Pegasus architecture that has a denser connectivity and over five thousand qubits [29], could easily tilt the balance towards the quantum part of the protocol, yielding faster time to solution compared to a purely classical solver.The scheme we introduced here may also be amenable to taking into account discretization effects, which can be significant in problems with low statistics [30].

APPENDIX
This appendix gives the detailed derivation of mapping the unfolding problem to a quadratic unconstrained binary optimization (QUBO).After deriving the basic formulation with Tikhonov regularization, we add an additional regularization term for systematic uncertainties.
Deriving the QUBO formulation of the unfolding problem To rewrite the unfolding equation, we follow the approach described in Ref. [19].First, we replace the Poisson term with a sum of squares of differences, i.e. the square of the L 2 norm between the observed data and the unfolded spectrum.Then, the regularization is obtained by minimizing the second derivative up to a multiplicative factor λ, which is obtained by multiplying the truth-level spectrum x by the Laplacian operator, e.g.: Thus, the objective function to be minimized is: To convert Eq. 19 into QUBO form such as Eq. ( 12) in the main text, we rewrite the equations using index notation, i.e.: A summation over repeated indices is implicit, unless otherwise noted.Thus, we have: The term d i d i can be ignored, since it adds a constant, which does not affect the optimization.Before determining the QUBO weights, we need to encode the floating point values x i into an appropriate binary format for the QUBO.
The simplest encoding, which is suitable for some situations, is a standard n-bit integer encoding: where q a ∈ {0, 1}.However, this is somewhat limiting for two reasons: firstly, this limits the values to positive integers; secondly, current quantum computing hardware requires the bit encodings to be small, which may put limitations on the range of values which can be encoded.
Standard floating point encodings do not lend themselves naturally to the QUBO problem because the exponent bits give rise to non-quadratic terms.Here, we chose to use a non-standard encoding.Each value x i is encoded with n bits using an offset, α i , and a scaling parameter, β i : The values of the offset and scaling parameters are stored on the classical computer, so that they can be used to encode and decode the the QUBO results.In order to provide additional flexibility, each value in x may be encoded with a different number of bits.This flexibility is implemented in the code, but for clarity of exposition, the fixed-bit encoding is given here.To simplify the notation, we define β ia instead of writing the sum over j: Defining a 0 ≡ i × n, we can reproduce Equation 25 by setting β ia = β i 2 n−(a−a0) for 0 ≤ (a − a 0 ) < n and 0 otherwise.The QUBO is formulated from Equation 23, by replacing x i with its binary encoding.Additionally, for notational convenience, we define W jk ≡ (R ij R ik + λD ij D ik ).This gives: As before, constant terms (those not containing q a ) can be removed.Expressing in terms of the binary values q a , we also have q a q a = q a .Enforcing the requirement a < b then gives the QUBO weights: The repeated index a is not summed over in Equation 29.In general, determining the parameters α i and β i is a problem-specific task.They define the space of possible solutions, and therefore can be chosen to force certain constraints.For example, for problems where x ∈ N choosing α i , β i ∈ N will ensure the solution also respects this form.Nonetheless, the solution will only span a certain range.We use a heuristic to form initial guesses for suitable choices.Additional heuristics and criteria for updating the values based on results can be developed in a straightforward manner.
Deriving the QUBO formulation of systematic uncertainties-Sources of systematic uncertainty shift the value of each bin by a certain amount.Typically, variations corresponding to ±1 standard deviation are estimated from the simulation, and an interpolation is done during the optimization.For each bin i, the effect of the k-th systematic is represented by an additional term, so that the predicted spectrum is: where y is the central prediction, s ik is the variation in bin i due to systematic k of strength z k .The shifts are normalized to correspond to one standard deviation when z k =1.
For the purpose of the QUBO optimization, the effect of systematic is encoded by extending the truth-level vector and the response matrix as: x N z 1 . . . .
In this case the values of the systematic uncertainties can be altered and the predicted spectrum will change accordingly.Generally, such an equation does not have a unique solution as it corresponds to an under-constrained unfolding (there are more parameter to fit at truth-level than bins at reco-level).An additional term is introduced to penalize large systematics, i.e. : The final objective function to be minimized is thus: The QUBO weights are then as given in Equations 28 and 29 but with In this case, the systematic shifts z k are being minimized simultaneously to the spectrum.A penalty is imposed for any deviation from their nominal value of 0, with a strength controlled by the parameter γ.

FIG. 1 .
FIG.1.The unfolding procedure corrects measured distributions, removing distortions caused by the detector.Here, we implement unfolding on a quantum computer[14].
FIG.2.Truth-level distribution of pseudo-data, unfolded with different methods, of a peaked spectrum.The data corresponding to runs submitted to the QPU are averaged over 20 executions with 5000 reads each.
FIG.3.Truth-level distribution of pseudo-data, unfolded with different methods and values of the regularization strength λ, of a steeply-falling spectrum.The data corresponding to runs submitted to the QPU are averaged over 20 executions with 5000 reads each.

FIG. 5 .
FIG.5.Truth-level distribution of pseudo-data, unfolded with different methods, of a peaked spectrum including the effect of a shape systematic.The data corresponding to runs submitted to the QPU are averaged over 20 executions with 5000 reads each.