Quantum tomography for collider physics: Illustrations with lepton pair production

Quantum tomography is a method to experimentally extract all that is observable about a quantum mechanical system. We introduce quantum tomography to collider physics with the illustration of the angular distribution of lepton pairs. The tomographic method bypasses much of the field-theoretic formalism to concentrate on what can be observed with experimental data, and how to characterize the data. We provide a practical, experimentally driven guide to model-independent analysis using density matrices at every step. Comparison with traditional methods of analyzing angular correlations of inclusive reactions finds many advantages in the tomographic method, which include manifest Lorentz covariance, direct incorporation of positivity constraints, exhaustively complete polarization information, and new invariants free from frame conventions. For example, experimental data can determine the entanglement entropy of the production process, which is a model-independent invariant that measures the degree of coherence of the subprocess. We give reproducible numerical examples and provide a supplemental standalone computer code that implements the procedure. We also highlight a property of complex positivity that guarantees in a least-squares type fit that a local minimum of a $\chi^{2}$ statistic will be a global minimum: There are no isolated local minima. This property with an automated implementation of positivity promises to mitigate issues relating to multiple minima and convention-dependence that have been problematic in previous work on angular distributions.


I. INTRODUCTION
Tomography builds up higher dimensional objects from lower dimensional projections. Quantum tomography [1] is a strategy to reconstruct all that can be observed about a quantum physical system. After becoming a focal point of quantum computing, quantum tomography has recently been applied in a variety of domains [2][3][4][5][6][7][8][9].
The method of quantum tomography uses a known "probe" to explore an unknown system. Data is related directly to matrix elements, with minimal model dependence and optimal efficiency.
Collider physics is conventionally set up in a framework of unobservable and model-dependent scattering amplitudes. In quantum tomography these unobservable features are skipped to deal directly with observables. The unknown system is parameterized by a certain density matrix ρ(X), which is model-independent. The probe is described by a known density matrix ρ(probe). The matrices are represented by numbers generated and fit to experimental data, not abstract operators. Quantum mechanics predicts an experiment will measure tr(ρ(probe) · ρ(X)), where tr is the trace. In many cases ρ(probe) is extremely simple: A 3 × 3 matrix, say. What will be observed is strictly limited by the dimension and symmetries of the probe. The powerful efficiency of quantum tomography comes from exploiting the probe's simplicity in the first steps. The description never involves more variables than will actually be measured.
We illustrate the advantages of quantum tomography with inclusive lepton-pair production. It is a relatively mature subject chosen for its pedagogical convenience. Despite the maturity of the subject, we discover new things. For example, the puzzling plethora of ad hoc invariant quantities is completely cleared up. We also find new ways to assist experimental data analysis. Positivity is a central issue overlooked in the literature, which we show how to control. Moreover, the tomography procedure carries over straightforwardly to many final states, including the inclusive production of charmonium, bottomonium, dijets (including boosted tops), HH, $W^+W^-$, and ZZ. Our practical guide to analyzing experimental data uses density matrices at each step and circumvents the more elaborate traditional theoretical formalism. We concentrate on making tools available to experimentalists. We give a step-by-step guide where density matrices stand as definite arrays of numbers, bypassing unnecessary formalism.

PRODUCTION
The tomography procedure reconstructs all that can be observed about a quantum physical system. For inclusive lepton pair production, what can be observed is the invariant mass distribution, the lepton pair angular distribution dN/dΩ and the polarization of the unknown intermediate state of the system, contained in ρ(X). In this section we reconstruct dN/dΩ and ρ(X) from first principles using tomography. Structure functions and model-dependent assumptions about the intermediate state, common to the traditional formalism [33][34][35][36][37], do not appear.
Expert readers who are accustomed to seeing some of these formulas derived may note that the method of derivation is particularly simple. It is worth noting which conventional steps we do not follow. That also explains why some of the relations we find seem to have been overlooked in the past.

A. Kinematics
Consider inclusive production of a lepton pair with 4-momenta $k$, $k'$ from the collision of two hadrons with 4-momenta $P_A$, $P_B$:

$$P_A\, P_B \to \ell^+(k)\, \ell^-(k') + X,$$

where X and the final state lepton spins are unobserved and thus summed over. In the high energy limit $k^2 = k'^2 = 0$.
Let the total pair momentum be $Q = k + k'$. The azimuthal distribution of total pair momenta in the lab frame is isotropic. Lepton pair angular distributions are described in the pair rest-frame defined event-by-event. In this frame the pair momenta are back-to-back and equal in magnitude. The frame orientation depends on the beam momenta and the pair total momentum.
Defining momentum observables via a Lorentz-covariant frame convention allows calculations to be done in any frame. In its rest frame the total pair momentum is $Q^\mu = (\sqrt{Q^2}, \vec Q = 0)$. A set of xyz spatial axes in this frame will be defined by three 4-vectors $X^\mu$, $Y^\mu$, $Z^\mu$, satisfying

$$Q\cdot X = Q\cdot Y = Q\cdot Z = 0. \qquad (1)$$

The frame vectors being orthogonal implies

$$X\cdot Y = Y\cdot Z = X\cdot Z = 0. \qquad (2)$$

Taking $P_A = (1, 0, 0, 1)$, $P_B = (1, 0, 0, -1)$ (light-cone ± vectors), a frame satisfying the relations of Eq. 1 and Eq. 2 is given by

$$Z^\mu = P_A^\mu\, Q\cdot P_B - P_B^\mu\, Q\cdot P_A; \quad X^\mu = Q^\mu - P_A^\mu \frac{Q^2}{2 Q\cdot P_A} - P_B^\mu \frac{Q^2}{2 Q\cdot P_B}; \quad Y^\mu = \epsilon^{\mu\nu\alpha\beta} P_{A\nu} P_{B\alpha} Q_\beta. \qquad (3)$$

These frame vectors define the Collins-Soper (CS) frame. The normalized frame vectors are $\hat X^\mu = X^\mu/\sqrt{-X\cdot X}$, and similarly for $\hat Y^\mu$, $\hat Z^\mu$. The components of the lepton momentum difference $\ell = k - k'$ along the normalized frame vectors define a unit direction $\hat\ell$. In fact, $\hat\ell_J = (\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)_J$, where θ, φ are the polar and azimuthal angles of one (e.g. plus-charge) lepton in the rest frame of Q. The meaning of a "Lorentz invariant cos θ" is a scalar $\hat Z_\mu (k - k')^\mu$ which becomes $\hat z\cdot(\vec k - \vec k')$ in the rest frame of Q.
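The covariant frame construction can be checked numerically. The following is a minimal sketch, assuming the metric signature (+,−,−,−), the Levi-Civita sign convention ε^{0123} = +1, and the standard covariant form of the Collins-Soper vectors; the function names (`mdot`, `collins_soper_frame`, `lepton_direction`) and the sample momenta are illustrative choices, not taken from the paper's supplemental code.

```python
import itertools
import numpy as np

METRIC = np.diag([1.0, -1.0, -1.0, -1.0])  # assumed signature (+,-,-,-)

def mdot(p, q):
    """Minkowski product p.q of two 4-vectors with upper indices."""
    return float(p @ METRIC @ q)

# Rank-4 Levi-Civita symbol; the sign convention EPS4[0,1,2,3] = +1 is an assumption.
EPS4 = np.zeros((4, 4, 4, 4))
for perm in itertools.permutations(range(4)):
    EPS4[perm] = round(np.linalg.det(np.eye(4)[list(perm)]))

def collins_soper_frame(PA, PB, Q):
    """Return normalized spacelike 4-vectors (Xhat, Yhat, Zhat), each orthogonal to Q."""
    Q2 = mdot(Q, Q)
    Z = PA * mdot(Q, PB) - PB * mdot(Q, PA)
    X = Q - PA * Q2 / (2 * mdot(Q, PA)) - PB * Q2 / (2 * mdot(Q, PB))
    # Y^mu = eps^{mu nu alpha beta} PA_nu PB_alpha Q_beta (indices lowered with the metric)
    Y = np.einsum("mnab,n,a,b->m", EPS4, METRIC @ PA, METRIC @ PB, METRIC @ Q)
    return tuple(v / np.sqrt(-mdot(v, v)) for v in (X, Y, Z))

def lepton_direction(k, kp, PA, PB):
    """Unit 3-vector of the lepton in the pair rest frame, built from Lorentz scalars only."""
    Q, ell = k + kp, k - kp
    frame = collins_soper_frame(PA, PB, Q)
    # The overall sign is a convention: spacelike components carry a minus sign.
    return np.array([-mdot(v, ell) for v in frame]) / np.sqrt(mdot(Q, Q))

# Illustrative massless momenta (arbitrary units), with nonzero pair transverse momentum.
PA = np.array([1.0, 0.0, 0.0, 1.0])
PB = np.array([1.0, 0.0, 0.0, -1.0])
k = np.array([5.0, 3.0, 4.0, 0.0])
kp = np.array([13.0, -5.0, 0.0, 12.0])

Xh, Yh, Zh = collins_soper_frame(PA, PB, k + kp)
lhat = lepton_direction(k, kp, PA, PB)
print("lhat =", lhat, " |lhat| =", np.linalg.norm(lhat))
```

No boost or rotation is performed anywhere: the direction comes entirely from invariant products, which is the point of the covariant convention. The construction degenerates when the pair has zero transverse momentum, since X and Y then vanish before normalization.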

B. The angular distribution, in terms of the probe and target density matrices
The standard amplitude for inclusive production of a fermion-anti-fermion pair of spins $s$, $s'$ has a string of gamma-matrices contracted with final state spinors $v_a(k's')$, $\bar u_b(k, s)$. When the amplitude is squared, these factors appear bi-linearly, as in $u_a(ks)\bar u_{a'}(ks)$. Summing over unobserved $s$ and dropping $m\delta_{aa'}$, a form of density matrix appears:

$$\sum_s u_a(ks)\,\bar u_{a'}(ks) \to (\gamma\cdot k)_{aa'}.$$

The Feynman rule for the density matrix of two relativistic final state fermions (or anti-fermions, or any combination) is a factor given by

$$\rho_{aa',bb'}(k, k') \to (\gamma\cdot k)_{aa'}\,(\gamma\cdot k')_{bb'}.$$

This fundamental equality is not present in pure state quantum systems. There is no spinor corresponding to a fermion averaged over initial spins, nor to a fermion summed over final spins.
As shown in the Appendix, the rest of the cross section appears in the target density matrix ρ(X), which must have four indices to contract with the probe indices: where dLIPS is the Lorentz invariant phase space.
Note that $u_a(ks)\bar u_{a'}(ks)$ is not positive definite since the Dirac adjoint $\bar u_a(ks) = (u^\dagger(ks)\gamma^0)_a$ has a factor of $\gamma^0$, introduced by convention. Removing it, $\sum_s u_a(ks)u^\dagger_{a'}(ks)$ becomes positive by inspection. (Any matrix of the form $M\cdot M^\dagger$ has non-negative eigenvalues.) $\rho(k, k')$ as written is not normalized, because the Feynman rules shuffle spinor normalizations into overall factors. To make the arrow in Eq. 4 into an equality, multiply on the right by $\gamma^0$ twice, and standardize the normalizations. The same steps applied to $\rho_{aa',bb'}(X)$ cancel the $\gamma^0$ factors. The result is that the probability to find two fermions has the fundamental quantum mechanical form

$$P(k, k') = \mathrm{tr}\big(\rho(k, k')\,\rho(X)\big). \qquad (5)$$

The left side of Eq. 5 is $d\sigma(k, k')$, the same as the joint probability $P(Q, \ell\,|\,\mathrm{init})$, where init are the initial state variables. The phase space for two massless leptons converts as

$$\frac{d^3k}{2E}\,\frac{d^3k'}{2E'} = \frac{1}{8}\, d^4Q\, d\Omega.$$

We can write $P(Q, \ell\,|\,\mathrm{init}) = P(\ell\,|\,Q, \mathrm{init})\,P(Q\,|\,\mathrm{init})$.
Here $P(Q\,|\,\mathrm{init}) = d\sigma/d^4Q$, and $P(\ell\,|\,Q, \mathrm{init}) = dN/d\Omega$ is the conditional probability to find $\ell$ given Q and the initial state. This factorization is general and unrelated to one-boson exchange, parton model, or other considerations. Since $P(\ell\,|\,Q, \mathrm{init})$ is a probability, quantum mechanics predicts it is a trace:

$$\frac{dN}{d\Omega} = \frac{3}{4\pi}\,\mathrm{tr}\big(\rho(\ell)\,\rho(X)\big), \qquad (6)$$

where tr indicates the trace, $d\Omega = d\cos\theta\, d\phi$, and ρ(ℓ), the probe, is a 3 × 3 matrix to be defined momentarily which depends only on the directions $\hat\ell_J$. The target hadronic system is represented by ρ(X). Since the probe ρ(ℓ) is a 3 × 3 matrix, ρ(X) is also a 3 × 3 matrix of numbers. The description has just been reduced from $\rho_{aa',bb'}(k, k')$, a Dirac tensor with $4^4$ possible matrix elements, to a 3 × 3 Hermitian matrix with 8 independent elements, since tr(ρ(ℓ)) = 1 is one condition. Equation 6 is the most general angular distribution that can be observed. It is valid for like-sign and unlike-sign pairs, and assumes no model for how the pairs are produced. The Dirac form (and Dirac traces) is over-complicated, because the formalism is built to describe every possible exclusive reaction for every possible in and out state, which is more than is needed here.

C. The probe matrix
The probe matrix ρ(ℓ) is given by

$$\rho_{ij}(\ell) = \frac{1+a}{3}\,\delta_{ij} - a\,\hat\ell_i\hat\ell_j - \imath\, b\,\epsilon_{ijk}\hat\ell_k, \qquad (7)$$

which is derived in the Appendix. The Standard Model predicts only two parameters, a and b. If on-shell lepton helicity is conserved (as in lowest order production by a minimally-coupled vector boson) then a = 1/2 and $b = c_A c_V$. The latter is not a prediction but a definition. If the production is parity-symmetric then $c_A = 0$. The only non-trivial prediction of the Standard Model is the value of $c_A c_V$. Lowest-order production by Z bosons predicts $b = \sin^2\theta_W \sim 0.22$. More generally, the probe matrix itself represents a reduced system that is unknown a priori. It should be determined experimentally. Consider the angular distribution of $e^+e^- \to \mu^+\mu^-$. Let $\rho(e; \hat z)$ describe electrons with parameters $a_e$, $b_e$ colliding along the z axis. Let $\rho(\mu; \hat\ell)$ describe muons with parameters $a_\mu$, $b_\mu$ emerging along direction $\hat\ell$. A short calculation using Eq. 7 twice gives

$$\mathrm{tr}\big(\rho(e; \hat z)\,\rho(\mu; \hat\ell)\big) = \frac{1 - a_e a_\mu}{3} + a_e a_\mu \cos^2\theta + 2\, b_e b_\mu \cos\theta. \qquad (8)$$
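The "short calculation" can be verified numerically. The sketch below assumes the probe matrix has the form $\rho_{ij} = \frac{1+a}{3}\delta_{ij} - a\,\hat\ell_i\hat\ell_j - \imath b\,\epsilon_{ijk}\hat\ell_k$, which is consistent with the eigenvalues (1/2 + b, 1/2 − b, 0) quoted later for a = 1/2; the helper names and the parameter values are illustrative only.

```python
import numpy as np

# Levi-Civita symbol in three dimensions.
EPS3 = np.zeros((3, 3, 3))
EPS3[0, 1, 2] = EPS3[1, 2, 0] = EPS3[2, 0, 1] = 1.0
EPS3[0, 2, 1] = EPS3[2, 1, 0] = EPS3[1, 0, 2] = -1.0

def probe_matrix(lhat, a=0.5, b=0.0):
    """3x3 probe density matrix for unit direction lhat (assumed form of Eq. 7)."""
    lhat = np.asarray(lhat, dtype=float)
    return ((1 + a) / 3) * np.eye(3) - a * np.outer(lhat, lhat) \
        - 1j * b * np.einsum("ijk,k->ij", EPS3, lhat)

def direction(theta, phi):
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

# Illustrative (not fitted) parameters for the electron and muon probes.
a_e, b_e, a_mu, b_mu = 0.5, 0.1, 0.5, 0.2
theta, phi = 0.7, 1.3

rho_e = probe_matrix([0.0, 0.0, 1.0], a_e, b_e)          # electrons along z
rho_mu = probe_matrix(direction(theta, phi), a_mu, b_mu)  # muons along lhat

trace_value = np.trace(rho_e @ rho_mu).real
closed_form = (1 - a_e * a_mu) / 3 + a_e * a_mu * np.cos(theta) ** 2 \
    + 2 * b_e * b_mu * np.cos(theta)
print(trace_value, closed_form)
```

With a = 1/2 for both probes the closed form reduces to (1/4)(1 + cos²θ) + 2 b_e b_µ cos θ, the familiar shape with a forward-backward asymmetry term.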

D. How tomography works: dN/dΩ as a function of ρ( ), ρ(X)
Let $G_\ell$ be a set of probe operators, with expectation values $\langle G_\ell\rangle = \mathrm{tr}(G_\ell\,\rho(X))$. The trace defines the Hilbert-Schmidt inner product of operators. The condition for operators (matrices) to be orthonormal is

$$\mathrm{tr}(G_\ell\, G_k) = \delta_{\ell k}. \qquad (9)$$

There are $N^2 - 1$ orthonormal N × N Hermitian operators, not including the identity. When a complete set of probe operators has been measured, the density matrix is tomographically reconstructed from observables as

$$\rho(X) = \sum_\ell \langle G_\ell\rangle\, G_\ell,$$

where the sum runs over a complete orthonormal set, including the normalized identity $1_{N\times N}/\sqrt N$. For a pure state density matrix, there exists a basis $\{G_\ell\}$ such that only one term appears in the sum over ℓ. Then $\rho_{pure} = |\psi\rangle\langle\psi|$, and $|\psi\rangle$ is reconstructed as the eigenvector of $\rho_{pure}$. Each orthogonal probe operator measures the corresponding component of the unknown system, and is classified by its transformation properties. For angular distributions the transformations of interest are rotations. ρ(ℓ) contains tensors transforming like spin-0, spin-1 and spin-2. Each tensor of a given type is orthogonal to the others.
Organizing transformation properties simplifies things significantly. Recall the general form of ρ(ℓ), from Eq. 7. The most general form for ρ(X) that is observable will have the same general expansion, with new parameters:

$$\rho(\ell) = \frac{1}{3}\,1_{3\times3} + b\,\hat\ell\cdot\vec J + a\,U(\ell), \qquad U_{ij}(\ell) = \frac{\delta_{ij}}{3} - \hat\ell_i\hat\ell_j; \qquad (10)$$

$$\rho(X) = \frac{1}{3}\,1_{3\times3} + \frac{1}{2}\,\vec S\cdot\vec J + U(X). \qquad (11)$$

These formulas reiterate Eq. 7 while identifying $(J_k)_{ij} = -\imath\,\epsilon_{ijk}$ as the generator of the rotation group in the 3 × 3 representation. Upon taking the trace as an inner product, orthogonality selects each term in ρ(X) that matches its counterpart in ρ(ℓ). For example J is orthogonal to all the other terms except the same component of J: $\mathrm{tr}(J_i J_k) = 2\delta_{ik}$. Orthogonality makes it trivial to predict which density matrix terms can be measured by probe matrix terms. We call the matching of terms "the mirror trick." We now make several relevant comments about Eq. 10 and Eq. 11:
• All density matrices can be written as $1_{N\times N}/N$ to take care of the normalization, plus a traceless Hermitian part. The unit matrix is the spin-0 part and invariant under rotations. The only contribution of the 1 terms is $\mathrm{tr}(1 \times 1)/N^2 = 1/N$.
• The textbook density matrix spin vector $\vec S$ consists of those parameters coupled to the angular momentum operator. This is also called the spin-1 contribution. The quantum mechanical average angular momentum of the system is $\langle\vec J\rangle = \mathrm{tr}(\rho(X)\,\vec J) = \vec S$. When the coordinates are rotated, the J matrices transform exactly so that $\vec S$ rotates like a vector under proper rotations, and a pseudovector under a change of parity.
• The last term of Eq. 10, the spin-2 part, is real, symmetric and traceless. By the mirror trick it can only communicate with a corresponding spin-2 term in ρ(X) denoted $U_{ij}(X)$, which is real, symmetric and traceless. It can be considered a measure of angular momentum fluctuations. A common mistake assumes the quadrupole U should be zero in a pure "spin state." Actually a pure state with $|\vec S| = 1$ has a density matrix

$$\rho_{ij} = \frac{1}{2}\big(\delta_{ij} - S_i S_j\big) - \frac{\imath}{2}\,\epsilon_{ijk} S_k,$$

whose real, symmetric, traceless part $U_{ij} = \frac{1}{2}(\delta_{ij}/3 - S_i S_j)$ does not vanish. For example, when $\vec S = \hat z$ the density matrix has one circular polarization eigenstate with eigenvalue unity, and two zero eigenvalues. Pure states exist with $\vec S = 0$: They have real eigenvectors corresponding to linear polarization. From the spectral resolution $\rho(X) = \sum_\alpha \lambda_\alpha |e_\alpha\rangle\langle e_\alpha|$, there is no observable distinction between a density matrix and the occurrence of pure states $|e_\alpha\rangle$ with probabilities $\lambda_\alpha$, which are the density matrix eigenvalues.
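The reconstruction of a density matrix from measured expectation values can be illustrated for any N. This is a sketch under the assumption that the complete set includes the normalized identity, so that the sum over $\langle G\rangle G$ closes without a separate 1/N term; the basis construction (generalized Gell-Mann matrices rescaled to unit Hilbert-Schmidt norm) and the function names are illustrative choices.

```python
import numpy as np

def orthonormal_hermitian_basis(n):
    """N^2 Hermitian matrices with tr(G_l G_k) = delta_lk, identity/sqrt(N) first."""
    basis = [np.eye(n, dtype=complex) / np.sqrt(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sym = np.zeros((n, n), dtype=complex)
            sym[i, j] = sym[j, i] = 1 / np.sqrt(2)
            asym = np.zeros((n, n), dtype=complex)
            asym[i, j], asym[j, i] = -1j / np.sqrt(2), 1j / np.sqrt(2)
            basis += [sym, asym]
    for k in range(1, n):
        d = np.zeros(n)
        d[:k], d[k] = 1.0, -float(k)      # traceless diagonal generator
        basis.append(np.diag(d).astype(complex) / np.sqrt(k * (k + 1)))
    return basis

def measure(rho, basis):
    """Expectation values <G> = tr(G rho); real because G and rho are Hermitian."""
    return [np.trace(G @ rho).real for G in basis]

def reconstruct(expectations, basis):
    """Tomographic reconstruction rho = sum_l <G_l> G_l."""
    return sum(g * G for g, G in zip(expectations, basis))

# A random 3x3 density matrix stands in for the unknown system.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho_unknown = A @ A.conj().T
rho_unknown /= np.trace(rho_unknown).real

basis = orthonormal_hermitian_basis(3)
rho_rebuilt = reconstruct(measure(rho_unknown, basis), basis)
print(np.max(np.abs(rho_rebuilt - rho_unknown)))
```

For N = 3 the list has nine entries: the identity plus the eight traceless Hermitian matrices counted in the text.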
• As it stands the U ij matrices in Eq. 10 and Eq. 11 have not been expanded in a complete set of symmetric, orthonormal 3 × 3 matrices. Regardless ρ(X) can be fit to data whether or not an expansion is done. The purpose of such work is to complete the classification process to assist with interpreting data. We sketch the steps here. Details are provided in an Appendix.
Let E M be a basis of traceless orthonormal matrices where U( ) = ∑ M tr(U( )E M )E M . This is the tomographic expansion of the probe. Choose E M so the outputs are normalized realvalued spin-2 spherical harmonics Y M (θ, φ). The expansion of the unknown system will be By orthogonality the spin-2 contribution to the angular distribution will be dN dΩ Writing out the terms gives The label X has been dropped in ρ M and MM is a matrix available from textbooks [38]. The traditional A k , λ k conventions do not use orthogonal functions. Transformations from the traditional conventions to the ρ M convention are given in an Appendix.
Note the transformation properties listed are exact. The systematic and statistical errors of a measurement appear in fitting ρ(X).

E. Fitting ρ(X), dN/dΩ
Quantum mechanics requires that ρ(X) be positive, which means it has no negative eigenvalues. Positivity produces subtle non-linear constraints, similar to unitarity. In the 3 × 3 case the relations are generally cubic polynomials. Positivity is not the same concept as yielding a positive cross section, and generally is a more restrictive set of relations. If density matrices are not used it is quite straightforward to fit data yielding a positive cross section while violating positivity.
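The distinction between a positive cross section and a positive density matrix is easy to exhibit. The sketch below assumes the helicity-conserving probe (a = 1/2, b = 0, in the form $\frac{1+a}{3}\delta_{ij} - a\hat\ell_i\hat\ell_j$) and a hand-picked Hermitian, unit-trace matrix; the example matrix is hypothetical and chosen only to make the point.

```python
import numpy as np

def probe_matrix(lhat, a=0.5):
    """Parity-symmetric probe (b = 0), assumed form (1+a)/3 * 1 - a * lhat lhat^T."""
    lhat = np.asarray(lhat, dtype=float)
    return ((1 + a) / 3) * np.eye(3) - a * np.outer(lhat, lhat)

def direction(theta, phi):
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

# Hermitian with unit trace, but NOT a valid density matrix: one eigenvalue is negative.
rho_bad = np.diag([0.6, 0.6, -0.2])

# Scan the angular distribution tr(rho(l) rho_bad) over a grid of directions.
rates = [np.trace(probe_matrix(direction(t, p)) @ rho_bad)
         for t in np.linspace(0.0, np.pi, 61)
         for p in np.linspace(0.0, 2 * np.pi, 121)]

min_rate = min(rates)
min_eigenvalue = np.linalg.eigvalsh(rho_bad).min()
print("smallest angular rate:", min_rate)       # strictly positive everywhere
print("smallest eigenvalue:  ", min_eigenvalue)  # negative: positivity is violated
```

A fit performed directly on angular coefficients could return this matrix without any warning, since every predicted rate is positive.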
Fortunately positivity can be implemented by the Cholesky decomposition of ρ X [39], which is discussed in the Appendix. For the 3 × 3 case it is: where the parameters −1 ≤ m α ≤ 1.
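A Cholesky-type parameterization of this kind can be sketched as follows. The ordering of the nine real parameters inside the lower-triangular matrix is an assumption made for illustration and need not match Eq. 14; dividing by the trace plays the role of the normalization condition on the $m_\alpha$.

```python
import numpy as np

def density_from_cholesky(m):
    """Build rho = M M^dagger / tr(M M^dagger) from nine real parameters.

    Hypothetical ordering: m[0:3] are the real diagonal entries, m[3:9] are the
    real and imaginary parts of the three entries below the diagonal.
    """
    m = np.asarray(m, dtype=float)
    M = np.array([[m[0],               0.0,                0.0],
                  [m[3] + 1j * m[4],   m[1],               0.0],
                  [m[5] + 1j * m[6],   m[7] + 1j * m[8],   m[2]]])
    rho = M @ M.conj().T
    # tr(M M^dagger) equals the sum of the squared parameters.
    return rho / np.trace(rho).real

rng = np.random.default_rng(2)
m_random = rng.uniform(-1.0, 1.0, size=9)
rho = density_from_cholesky(m_random)

print("trace:      ", np.trace(rho).real)
print("eigenvalues:", np.linalg.eigvalsh(rho))
```

Any real parameter vector, except the zero vector, yields a Hermitian, unit-trace matrix with non-negative eigenvalues, so a fitting routine can explore the parameters freely without leaving the physical region.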
Event by event ρ(ℓ) is an array of numbers, and ρ(X)(m) is an array of parameters. The results are combined to make the Jth instance of $\mathrm{tr}(\rho_J(\ell)\,\rho(X)(m))$, where ρ(X)(m) has been parameterized in Eq. 14. Fit the $m_\alpha$ parameters to the data set. For example, the log likelihood L of the set $J = 1 \ldots J_{max}$ is

$$L(m) = \sum_{J=1}^{J_{max}} \log \mathrm{tr}\big(\rho_J(\ell)\,\rho(X)(m)\big). \qquad (17)$$

Sample code available online (footnote 10) carries out these steps, returning parameters $m_\alpha$. The details of cuts and acceptance appear in fitting the numbers $m_\alpha$ using numbers for the lepton matrix ρ(lep) (not angles, nor trigonometric functions). In one example with simulated Z-boson data we fitted ρ(X) and, using the Standard Model parameters for ρ(ℓ), Eq. 3 and Eq. 7, evaluated the trace. Integrated over φ, this expression becomes 1.57 + 0.137 cos θ + 1.56 cos²θ.
A 1 + cos²θ distribution is the leading order Drell-Yan prediction for virtual spin-1 boson annihilation, while 0.137 cos θ represents a charge asymmetry. It is trivial to go from tr(ρ(ℓ)ρ(X)) to a conventional parameterization of an angular distribution by taking inner products of orthogonal functions. It is also easy to expand ρ(X) in a basis of orthonormal matrices with the same results. Note these steps are exact, and much different from fitting data to trigonometric functions in some convention, which tends to yield multiple solutions, along with violations of positivity, which can introduce pathological convention-dependence. Perhaps struggles with convention-dependence of quarkonium data [41,42] are related to this. It would be interesting to investigate.

F. Summary of quantum tomography procedure
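To analyze data for each event labeled J:
• Make the lepton density matrix. For Z bosons in the Standard Model (a = 1/2, $b = c_A c_V$) it is

$$\rho_{J,ij}(\ell) = \frac{1}{2}\big(\delta_{ij} - \hat\ell_i\hat\ell_j\big) - \imath\, b\,\epsilon_{ijk}\hat\ell_k.$$

• The results are combined to make the Jth instance of $\mathrm{tr}(\rho_J(\ell)\,\rho(X)(m))$, where ρ(X)(m) has been parameterized in Eq. 14. Fit the $m_\alpha$ parameters to the data set. For example, the log likelihood L of the set $J = 1 \ldots J_{max}$ is $L(m) = \sum_{J} \log \mathrm{tr}(\rho_J(\ell)\,\rho(X)(m))$. Sample code available online (see footnote 10) carries out these steps, returning parameters $m_\alpha$.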
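The steps above can be sketched end to end on simulated events. This is not the supplemental code referred to in footnote 10: the probe form, the value b = 0.2, the parameter ordering in the lower-triangular matrix, the rejection sampler, and all function names are assumptions made for illustration. A real analysis would pass `log_likelihood` to a numerical optimizer over the nine parameters; here it is only evaluated at two parameter points.

```python
import numpy as np

EPS3 = np.zeros((3, 3, 3))
EPS3[0, 1, 2] = EPS3[1, 2, 0] = EPS3[2, 0, 1] = 1.0
EPS3[0, 2, 1] = EPS3[2, 1, 0] = EPS3[1, 0, 2] = -1.0

def probe_matrices(lhats, a=0.5, b=0.2):
    """Stack of probe matrices, one per event direction (assumed form of Eq. 7)."""
    lhats = np.atleast_2d(lhats)
    return ((1 + a) / 3) * np.eye(3) \
        - a * np.einsum("ni,nj->nij", lhats, lhats) \
        - 1j * b * np.einsum("ijk,nk->nij", EPS3, lhats)

def density_from_cholesky(m):
    """rho(X)(m) = M M^dagger / tr(M M^dagger); the parameter ordering is hypothetical."""
    M = np.array([[m[0], 0, 0],
                  [m[3] + 1j * m[4], m[1], 0],
                  [m[5] + 1j * m[6], m[7] + 1j * m[8], m[2]]], dtype=complex)
    rho = M @ M.conj().T
    return rho / np.trace(rho).real

def log_likelihood(m, probes):
    """Sum over events of log tr(rho_J(l) rho(X)(m))."""
    rates = np.einsum("nij,ji->n", probes, density_from_cholesky(m)).real
    return float(np.sum(np.log(rates)))

def sample_directions(rho_x, n, rng, a=0.5, b=0.2):
    """Rejection-sample n unit vectors with density proportional to tr(rho(l) rho_x)."""
    v = rng.normal(size=(6 * n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    rates = np.einsum("nij,ji->n", probe_matrices(v, a, b), rho_x).real
    ceiling = (1 + a) / 3 + abs(b)   # largest probe eigenvalue bounds the rate
    keep = rng.uniform(size=len(v)) * ceiling < rates
    return v[keep][:n]

rng = np.random.default_rng(3)
m_true = [1.0, 1.0, 0.05, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0]   # polarized test system
m_flat = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # unpolarized system

events = sample_directions(density_from_cholesky(m_true), 3000, rng)
probes = probe_matrices(events)

ll_true = log_likelihood(m_true, probes)
ll_flat = log_likelihood(m_flat, probes)
print("log L (generating parameters):", ll_true)
print("log L (unpolarized guess):    ", ll_flat)
```

Each event enters only through its 3 × 3 array of numbers, so acceptance cuts change which events are kept but not the form of the likelihood.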
G. Comments
1. The possible symmetries of ρ(ℓ) enter here. Suppose $c_A = 0$. Then ρ(ℓ) is even under parity, real and symmetric. The imaginary antisymmetric elements of ρ(X) are orthogonal, and contribute nothing to the angular distribution. When known in advance, the redundant parameters of ρ(X) can be set to zero while making the fit. (That does not mean unmeasured parameters can be forgotten when dealing with positivity.) In general a fitting routine will either report a degeneracy for redundant parameters, or converge to values generated by round-off errors. Degeneracy will always be detected in the Hessian matrix computed to evaluate uncertainties.
2. The normalization condition $\sum_k m_k^2 = 1$ can be postponed by removing the normalization factor built from $\sum_\alpha m_\alpha^2$ from Eq. 14, and subtracting $J_{max}\log(\sum_k m_k^2)$ from the log-likelihood (Eq. 17). When that is done the fitted density matrix will not be automatically normalized, due to the symmetry ρ(X) → λρ(X) of the modified likelihood. The density matrix becomes normalized by dividing by its trace. Incorporating such tricks improved the speed of the code available online (see footnote 10) by a factor of about 100.
3. Algorithms are said to compute a "unique" Cholesky decomposition, which would seem to predict m α given ρ(X). The algorithms choose certain signs of m α by a convention making the diagonals of M positive. However that is not quite enough to assure a numerical fit finds a unique solution.
The fundamental issue is that $MM^\dagger = \rho(X)$ is solved by $M = \sqrt{\rho(X)}$, and the square root is not unique. There are $2^N$ arbitrary sign choices possible among N eigenvalues of ρ(X). Forcing the diagonals of M to be positive reduces the possibilities greatly, and an algorithm exists to force a unique, canonical form of $m_\alpha$ in a data fitting routine. We did not make use of such a routine, since fitting ρ(X) is the objective. Depending upon the data fitting method, increasing the number of ways for $M(m_\alpha)$ to make a fit sometimes makes convergence faster.
4. Let $\langle\,\rangle_{exp}$ stand for the expectation value of a quantity in the experimental distribution of events. By symmetry $\langle\hat\ell\rangle_{exp}$ and $\langle\hat\ell_i\hat\ell_j\rangle_{exp}$ are vector and tensor estimators, respectively, which must depend on the vector and tensor parameters $\vec S$, $U_{ij}(X)$ in the underlying density matrix. A calculation finds that the dependence is linear. An estimate of ρ(X) not needing a parameter search then exists directly from data. However positivity of ρ(X) is more demanding, and not automatically maintained by such estimates.
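A moment estimator of this kind can be worked out under stated assumptions. Taking the probe form $\frac{1+a}{3}\delta_{ij} - a\hat\ell_i\hat\ell_j - \imath b\epsilon_{ijk}\hat\ell_k$, the expansion $\rho(X) = 1/3 + \frac{1}{2}\vec S\cdot\vec J + U(X)$, and full angular acceptance, integrating over the sphere gives $\langle\hat\ell\rangle = b\,\vec S$ and $\langle\hat\ell_i\hat\ell_j\rangle = \delta_{ij}/3 - (2a/5)\,U_{ij}(X)$. These relations are a derivation made for this sketch, not formulas quoted from the paper; the test matrix and function names are illustrative.

```python
import numpy as np

EPS3 = np.zeros((3, 3, 3))
EPS3[0, 1, 2] = EPS3[1, 2, 0] = EPS3[2, 0, 1] = 1.0
EPS3[0, 2, 1] = EPS3[2, 1, 0] = EPS3[1, 0, 2] = -1.0

def probe_matrices(lhats, a=0.5, b=0.2):
    """Stack of probe matrices, one per direction (assumed probe form)."""
    return ((1 + a) / 3) * np.eye(3) \
        - a * np.einsum("ni,nj->nij", lhats, lhats) \
        - 1j * b * np.einsum("ijk,nk->nij", EPS3, lhats)

def sample_directions(rho_x, n, rng, a=0.5, b=0.2):
    """Rejection-sample directions with density proportional to tr(rho(l) rho_x)."""
    v = rng.normal(size=(6 * n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    rates = np.einsum("nij,ji->n", probe_matrices(v, a, b), rho_x).real
    keep = rng.uniform(size=len(v)) * ((1 + a) / 3 + abs(b)) < rates
    return v[keep][:n]

def estimate_density_matrix(lhats, a=0.5, b=0.2):
    """Invert the moment relations <l> = b S and <l_i l_j> = delta/3 - (2a/5) U."""
    mean_l = lhats.mean(axis=0)
    mean_ll = np.einsum("ni,nj->ij", lhats, lhats) / len(lhats)
    S = mean_l / b
    U = -(5 / (2 * a)) * (mean_ll - np.eye(3) / 3)
    # rho = 1/3 + (1/2) S.J + U, with (J_k)_ij = -i eps_ijk
    return np.eye(3) / 3 - 0.5j * np.einsum("ijk,k->ij", EPS3, S) + U

# Hypothetical positive-definite test system.
rho_true = np.array([[0.45, -0.15j, 0.05],
                     [0.15j, 0.45, 0.00],
                     [0.05, 0.00, 0.10]])

rng = np.random.default_rng(4)
events = sample_directions(rho_true, 40000, rng)
rho_est = estimate_density_matrix(events)

print("largest element-wise deviation:", np.max(np.abs(rho_est - rho_true)))
print("eigenvalues of the estimate:   ", np.linalg.eigvalsh(rho_est))
```

The estimate is Hermitian with unit trace by construction, but nothing forces its eigenvalues to be non-negative, which is the caveat made in the text. Limited acceptance would also modify the moment relations.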

Convex Optimization
The issue of multiple solutions for ρ(X) is different. Multiple minima of χ² statistics affect fits to cross sections parameterized by trigonometric functions. However, quantum tomography using maximum likelihood happens to be a problem of convex optimization. In brief, when ρ is positive then ⟨e|ρ|e⟩ is a positive convex function of |e⟩. Then tr(ρ(ℓ)ρ(X)) is convex, being equivalent to a positively weighted sum of such terms. The logarithm is a concave function, leading to a convex optimization problem. That means that when ρ(X) is a local maximum of likelihood it is the global maximum. Exceptions can only come from degeneracies due to symmetry or an inadequate number of data points [43]. Convex optimization is important because without such a property the evaluation of high-dimensional fits by trial and error can be exponentially difficult.

Table I lists discrete transformation properties of all terms under parity P, time reversal T, and lepton charge conjugation C. If leptons have different flavors (as in like or unlike sign eµ) the C operation swaps the particle defining ℓ̂.
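The convexity argument of this subsection can be checked numerically. Because tr(ρ(ℓ)ρ(X)) is linear in ρ(X) and the logarithm is concave, the log-likelihood evaluated along the straight line between any two density matrices must lie on or above the chord. The sketch below tests this on random inputs; the probe form, its parameters, and the helper names are assumptions for illustration.

```python
import numpy as np

EPS3 = np.zeros((3, 3, 3))
EPS3[0, 1, 2] = EPS3[1, 2, 0] = EPS3[2, 0, 1] = 1.0
EPS3[0, 2, 1] = EPS3[2, 1, 0] = EPS3[1, 0, 2] = -1.0

def probe_matrices(lhats, a=0.5, b=0.2):
    """Stack of probe matrices, one per direction (assumed probe form)."""
    return ((1 + a) / 3) * np.eye(3) \
        - a * np.einsum("ni,nj->nij", lhats, lhats) \
        - 1j * b * np.einsum("ijk,nk->nij", EPS3, lhats)

def random_density_matrix(rng):
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

def log_likelihood(rho_x, probes):
    return float(np.sum(np.log(np.einsum("nij,ji->n", probes, rho_x).real)))

rng = np.random.default_rng(5)
v = rng.normal(size=(500, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
probes = probe_matrices(v)      # any fixed set of event directions will do

rho1, rho2 = random_density_matrix(rng), random_density_matrix(rng)
L1, L2 = log_likelihood(rho1, probes), log_likelihood(rho2, probes)

# Concavity: L(t rho1 + (1-t) rho2) - [t L1 + (1-t) L2] >= 0 for all t in [0, 1].
gaps = [log_likelihood(t * rho1 + (1 - t) * rho2, probes) - (t * L1 + (1 - t) * L2)
        for t in np.linspace(0.0, 1.0, 21)]
print("smallest gap above the chord:", min(gaps))
```

The set of density matrices is itself convex, so every point on the segment is a valid ρ(X). The statement concerns ρ(X), not the Cholesky parameters, whose sign ambiguities produce equivalent copies of the same maximum.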

Discrete Transformation Properties
When coordinates XYZ are defined the direction of Ŷ = Ẑ × X̂ is even under time reversal and parity, which is exactly the opposite of X and Z. Then $\vec S\cdot\hat Y$ is T-odd, contributing the sin θ sin φ term (footnote 11). The XY and ZY matrix elements of ρ(X) are also odd under T, contributing the terms shown. T-odd terms come from imaginary parts of amplitudes, which are generated by loop corrections in perturbative QCD.

Footnote 11: In a forthcoming study [44] of inclusive lepton pair production near the Z pole, we find interesting, new features in the $S_y$ data of Ref. [11].

TABLE I: Discrete transformation properties of the terms of the angular distribution (columns: term, origin, dN/dΩ, C, P, T, CP, PT).

Notice that every term in the lepton density matrix (Eq. 7) is automatically symmetric under CP. This is a kinematic fact of the lepton pair probe which does not originate in the Standard Model. As a result the CP transformations of the angular distribution depend on the coupling to the unknown system. If overall CP symmetry exists the target density matrix will have CP odd terms where CP odd terms are found. In the Standard Model these cos θ and sin θ cos φ terms correspond to charge asymmetries of leptons correlated with charge asymmetries of the system, namely the beam quark and anti-quark distributions.
While weak CP violation is a mainstream topic, P and CP symmetry of the strong interactions at high energies has not been tested [45]. The gauge sector of QCD is kinematically CP symmetric, because the non-Abelian $\mathrm{tr}(\vec E\cdot\vec B)$ term is a pure divergence (footnote 12). Higher order terms in a gauge-covariant derivative expansion are expected to exist, and can violate CP symmetry [45].
However, measuring violation of CP or fundamental T symmetry in scattering experiments is invariably frustrated by the experimental impossibility of preparing time-reversed counterparts. Some ingenuity is needed to devise a signal. It appears that any signal will involve four independent 4-momenta $p_J$ and a quantity of the form $\Omega_4 = \epsilon_{\alpha\beta\lambda\sigma}\, p_1^\alpha p_2^\beta p_3^\lambda p_4^\sigma$. For example a term going like $\ell\cdot Y \sim \epsilon^{\alpha\beta\lambda\sigma}\,\ell_\alpha Q_\beta P_{A\lambda} P_{B\sigma}$ might possibly originate in fundamental T symmetry violation, and be mistaken for perturbative loop effects. A more creative road to finding CP violation involves two pairs with sum and difference vectors $Q, \ell$; $Q', \ell'$, and the scalar $\epsilon^{\alpha\beta\lambda\sigma}\,\ell_\alpha Q_\beta Q'_\lambda \ell'_\sigma$, which is even under C and odd under P. The pairs need not be leptons (although "double Drell Yan" has long been discussed) but might be (say) $\mu^+\mu^-\pi^+\pi^-$. It would be interesting to explore further what a tomographic approach to such observables might uncover.

Density Matrix Invariants
We mentioned that scattering planes, trig functions, boosts and rotations could be avoided, and the examples show how. Once a frame convention is defined the lepton "coordinates" $(X_J\cdot\ell_J,\, Y_J\cdot\ell_J,\, Z_J\cdot\ell_J)$ are actually Lorentz scalars. However they depend on the convention for XYZ, which is arbitrary. At least four different conventions compete for attention. Moreover, once a frame is chosen, at least two naming schemes (the "$A_k$" and "$\lambda_k$" schemes) exist to describe the angular distribution in terms of trigonometric polynomials.
Well-constructed invariants can reduce the confusion associated with convention-dependent quantities [44, 46, 47, 48]. Since $\vec S$ transforms like a vector its magnitude-squared $\vec S^2$ is rotationally invariant. The spin-1 part of ρ(X) does not mix with the real symmetric part under rotations. Since it is traceless, the real symmetric (spin-2) part has two independent eigenvalues, which are rotationally invariant (footnote 13). Finally the dot-products of three eigenvectors $\hat e_j$ of the spin-2 part with $\vec S$ are rotationally invariant. Then $(\hat e_j\cdot\vec S)^2$ are three invariants not depending on the sign of eigenvectors. That suggests six possible invariants, but $\sum_j (\hat e_j\cdot\vec S)^2 = \vec S^2$ makes the $\vec S$ invariants dependent, leaving five independent rotational invariants. That is consistent with counting 8 real parameters in a 3 × 3 Hermitian matrix, subject to 3 free parameters of the rotation group, leaving 8 − 3 = 5 rotational invariants. The same counting for unitary transformations would leave only the two independent eigenvalues of the matrix.
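The invariant counting can be made concrete. The sketch below extracts $\vec S$ as tr(ρ J) with $(J_k)_{ij} = -\imath\epsilon_{ijk}$, takes the real symmetric traceless part as the spin-2 tensor, and checks that the listed quantities are unchanged by a random proper rotation ρ → R ρ Rᵀ. The function name and the random test matrix are illustrative.

```python
import numpy as np

EPS3 = np.zeros((3, 3, 3))
EPS3[0, 1, 2] = EPS3[1, 2, 0] = EPS3[2, 0, 1] = 1.0
EPS3[0, 2, 1] = EPS3[2, 1, 0] = EPS3[1, 0, 2] = -1.0
J = [-1j * EPS3[:, :, k] for k in range(3)]   # (J_k)_ij = -i eps_ijk

def rotational_invariants(rho):
    """Return (S^2, eigenvalues of U, squared projections of S on eigenvectors of U)."""
    S = np.array([np.trace(rho @ Jk).real for Jk in J])
    U = rho.real - np.eye(3) / 3          # real, symmetric, traceless spin-2 part
    evals, evecs = np.linalg.eigh(U)      # eigenvalues in ascending order
    proj2 = (evecs.T @ S) ** 2            # independent of eigenvector signs
    return float(S @ S), evals, proj2

rng = np.random.default_rng(6)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = A @ A.conj().T
rho /= np.trace(rho).real

# Random proper rotation from a QR decomposition, with the determinant fixed to +1.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1

s2, evals, proj2 = rotational_invariants(rho)
s2_rot, evals_rot, proj2_rot = rotational_invariants(R @ rho @ R.T)
print(s2, evals, proj2)
print(s2_rot, evals_rot, proj2_rot)
```

Seven numbers are returned, but the eigenvalues sum to zero and the squared projections sum to $\vec S^2$, leaving the five independent invariants counted in the text.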
Any function of invariants is invariant. The combinations below have useful physical interpretations:
• The degree of polarization d is a standard measure of the deviation from the unpolarized case. It comes from the sum of the squares of the eigenvalues of ρ minus 1/3, normalized to the maximum possible:

$$d = \sqrt{\frac{3\,\mathrm{tr}(\rho^2) - 1}{2}}.$$

When d = 0 the system is unpolarized, and when d = 1 the system is a pure state.
• The entanglement entropy S is the quantum mechanical measure of order. The formula is

$$S = -\mathrm{tr}(\rho\log\rho).$$

In terms of eigenvalues $\rho_\alpha$, $S = -\sum_\alpha \rho_\alpha \log(\rho_\alpha)$. When $\rho \to 1_{N\times N}/N$ the system is unpolarized, and S = log(N). That is the maximum possible entropy, and minimum possible information. When S = 0 the entropy is the minimum possible, providing the maximum possible information, and the system is a pure state.

Footnote 13: Work by Faccioli and collaborators [49,50] attempted to construct invariants by inspecting the transformation properties of ratios of sums of angular distribution coefficients upon making rotation about the conventional Y axis. The method cannot identify a true invariant unless Y happens to be an eigenvector of the matrix. By the same method the group also identified $\vec S^2$ as a "parity violating invariant," while $\vec S^2$ is actually even under parity. Parity violation is not required to measure $\vec S$ with polarized beams.
It is instructive to interpret $e^S$ as the "effective dimension" of the system. For example the eigenvalues (1/2 + b, 1/2 − b, 0) occur in the density matrix of on-shell fermion annihilation with helicity conservation. One zero-eigenvalue describes an elliptical disk-shaped object. The entropy ranges from S = 0 ($e^S = 1$ for b = 1/2, a one dimensional stick shape) to S = log(2) ($e^S = 2$ for a disk-shaped object with maximum symmetry). As expected, an unpolarized 3-dimensional system has three equal eigenvalues, is shaped like a sphere, and $e^S \to 3$. Figure 1 shows the entropy of the lepton density matrix ρ(ℓ) (Eq. 7) in the plane of parameters (a, b). The Standard Model leptons from lowest order s-channel Z production have a = 1/2, $b = \sin^2\theta_W$, which is shown in Fig. 1 as a dot. The edge a = 1/2 corresponds to on-shell helicity conservation, with eigenvalues 0, $1/2 - \sin^2\theta_W$, $1/2 + \sin^2\theta_W$. The a, b parameters of leptons from a different production process, or subject to radiative corrections, must still lie inside the triangle. Maximal symmetry with eigenvalues (1/2, 1/2, 0) occurs where the line of $b = \sin^2\theta_W$ just touches the b = −a line, which happens at $\sin^2\theta_W = 1/4$. That is not far from the Standard Model value, which is very interesting. Since no established theory predicts $\sin^2\theta_W$ one cannot rule out a deeper connection.
It is tempting but incorrect to assume the bounds discussed would apply to the same terms of a more general density matrix. For example, add $-c\,\hat n_i\hat n_j$ to the expression in Eq. 7, where $\hat n\cdot\hat\ell = 0$, and update the normalization condition. The resulting positivity region of a, b, c is shown in Figure 2, which also shows the plane c = 0 equivalent to Figure 1. At the extrema c = ±1 the region of consistent (a, b) parameters shrinks to single points.
The matrix for ρ(X) computed earlier is an example where all terms in any standard convention happen to occur. By inspection this system (mostly quark-antiquark annihilation) is superficially much like the lepton one. The entropy of ρ(X) is 0.68 and $e^S = 1.96$, and one eigenvalue is close to zero. Of course there is much more information in the other parameters, the orientation of eigenvectors, $\vec S$, and its magnitude.
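The effective-dimension interpretation is simple to reproduce. The sketch below evaluates the entropy and a degree of polarization for the helicity-conserving eigenvalue pattern (1/2 + b, 1/2 − b, 0) and for the unpolarized case. The square-root normalization of d used here is an assumption chosen so that d runs from 0 (unpolarized) to 1 (pure state); the function names are illustrative.

```python
import numpy as np

def entropy(rho):
    """S = -tr(rho log rho), computed from eigenvalues; zero eigenvalues contribute nothing."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log(lam)))

def degree_of_polarization(rho):
    """Assumed normalization: d = sqrt((N tr(rho^2) - 1) / (N - 1))."""
    n = rho.shape[0]
    purity = np.trace(rho @ rho).real
    return float(np.sqrt(max(n * purity - 1.0, 0.0) / (n - 1)))

def helicity_conserving(b):
    """Diagonal stand-in with eigenvalues (1/2 + b, 1/2 - b, 0)."""
    return np.diag([0.5 + b, 0.5 - b, 0.0]).astype(complex)

disk = helicity_conserving(0.0)        # maximally symmetric disk
stick = helicity_conserving(0.5)       # pure state
sphere = np.eye(3, dtype=complex) / 3  # unpolarized

for name, rho in [("disk", disk), ("stick", stick), ("sphere", sphere)]:
    print(name, "effective dimension:", np.exp(entropy(rho)),
          " d:", degree_of_polarization(rho))
```

The effective dimension interpolates continuously between these cases as b varies, which is what the entropy contours of Figure 1 display.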

IV. DISCUSSION
The quantum tomography procedure offers at least seven significant advantages over standard methods of analyzing the angular correlations of inclusive reactions:
• Simplicity and Efficiency. Tomography exploits a structured order of analysis. By construction, unobservable elements never appear.
• Covariance. Physical quantities are expressed covariantly every step of the way. That is not always the case with quantities like angular distributions.
• Complete polarization information. The unknown density matrix ρ(X) contains all possible information, ready for classification under symmetry groups.
• Model-independence. No theoretical planning, nor processing, nor assumptions are made about the unknown state. The process of defining general structure functions has been completely bypassed. It is not even necessary to assume anything about the spin of s- or t-channel intermediates. The observable target structure is always a mirror of the probe structure. The "mirror trick" is universal as described in Section II D.
• Manifest positivity. A pattern of misconceptions in the literature misidentifies positivity as being equivalent to positive cross sections. It is not difficult to fit data to an angular distribution and violate positivity. In fact, an angular distribution expressed in terms of expansion coefficients actually lacks the quantum mechanical information to enforce positivity.
• Convex optimization. The positive character of the density matrix leads to convex optimization procedures to fit experimental data. This provides a powerful analysis tool that ensures convergence.
• Frame independence. Once the unknown density matrix has been reconstructed, rotationally invariant quantities can be made by straightforward methods. This is illustrated in Section III A 3, which includes a discussion of the entanglement entropy.
Quantum tomography has already yielded significant results. Our tomographic analysis [44] of a recent ATLAS study of Drell-Yan lepton pairs with invariant mass near the Z pole [11] discovered surprising features in the density matrix eigenvalues and entanglement entropy. By way of advertising, we have also gained insight into the mysterious Lam-Tung relation [51], including why it holds at NLO but fails at NNLO. These topics will be presented in separate papers.
where dΠ LIPS ∼ dΩ · d 4 q and q is the sum of the final-state pair momentum. Then, for given pair momentum q, the angular distribution is, where the proportionality suppresses an overall normalization. The conditional probability given q captures the event-by-event character of angular correlations. The explicit q dependence might suggest we assumed an s-channel boson intermediate state, but we have not.
If we know dσ/dΩ and ρ probe for a given event, we can reconstruct ρ X for that event. This is the essence of the quantum tomography procedure. The probe is what is known, and it determines what can be discovered. Figure 3 shows the diagram for the simplest lepton-pair probe. It is completely non-specific about the process creating a lepton with momentum k 1 and Dirac density matrix polarization αα , and an anti-lepton with momentum k 2 and polarization ββ. From the Feynman rules

B. Probe matrix
The symbol ∼ indicates that the high-energy limit has been taken and a trivial overall normalization has been ignored.
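The reconstruction of $\rho_X$ from a known probe can be sketched numerically. The fragment below is a toy illustration under stated assumptions, not the procedure of the supplemental code: it assumes the angular distribution has the form ${\rm tr}(\rho_{\rm probe}\,\rho_X)$, and it uses the hypothetical unpolarized probe $\rho_{\rm probe}(\hat n) = \frac{3}{8\pi}(1 - \hat n \hat n^{T})$ in a Cartesian basis, with $\hat n$ the lepton direction. With ${\rm tr}\,\rho_X = 1$ the distribution is linear in the real symmetric part of $\rho_X$, so ordinary linear least squares recovers it. The antisymmetric part is invisible to this particular probe.

```python
import numpy as np

def probe(n):
    """Hypothetical probe matrix for lepton direction n (unit 3-vector)."""
    return (3.0 / (8.0 * np.pi)) * (np.eye(3) - np.outer(n, n))

def design_row(n):
    """Coefficients of the six independent elements of a symmetric matrix in n^T S n."""
    x, y, z = n
    return [x * x, y * y, z * z, 2 * x * y, 2 * x * z, 2 * y * z]

def reconstruct(directions, w):
    """Recover the real symmetric, unit-trace rho from values w = tr(probe(n) rho)."""
    A = np.array([design_row(n) for n in directions])
    b = 1.0 - (8.0 * np.pi / 3.0) * np.asarray(w)   # equals n^T rho n when tr(rho) = 1
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    xx, yy, zz, xy, xz, yz = s
    return np.array([[xx, xy, xz], [xy, yy, yz], [xz, yz, zz]])

# Synthetic "truth": a random real, positive, unit-trace matrix (not data).
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
rho_true = M @ M.T
rho_true /= np.trace(rho_true)

# Noise-free pseudo-measurements of the distribution at random directions.
dirs = rng.normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
w = np.array([np.trace(probe(n) @ rho_true) for n in dirs])

rho_fit = reconstruct(dirs, w)
```

This unconstrained linear inversion does not enforce positivity by itself. With noisy data the fitted matrix can acquire negative eigenvalues, which is the motivation for parameterizations that are positive by construction.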

FIG. 3: The lepton pair density matrix $\rho^{\alpha\alpha'}_{\beta\beta'}({\rm lep})$, in black, coupled to the colliding system density matrix $\rho(X)$. The matrix labels on legs are diagonal in momenta $k_1$, $k_2$. Off-diagonal polarization (Dirac) indices are explicitly shown. The Feynman rules are the same as for ordinary diagrams.
Continuing, $\rho(X)$ is something of vast complexity, which can only couple to $\rho({\rm lep})$ via the indices shown. The Dirac structure of $\rho(X)$ can be expanded over several complete sets. However, the relevant (observable) part of $\rho(X)^{\beta\beta'}_{\alpha\alpha'}$ will be its projection onto the subspace coupled to this particular probe. If and when the lepton pair originates from an intermediate boson with vertex $c_V\gamma^{\mu} + c_A\gamma^{\mu}\gamma_5$, then
The general possibility that these parameters might be functions of $q^2$, namely vertex form factors, has emerged on its own. Notice that in the rest frame oriented naturally along the lepton momentum, $\rho({\rm lep})$ is not diagonal. The diagonal elements are interpretable as probabilities, even classical probabilities. The off-diagonal elements convey the information about entanglement.

C. Positivity
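There is a positivity issue in fitting angular distribution data. Represent $\rho = MM^{\dagger}$, and then $\rho \ge 0$ automatically. Any $M = HU$, where $H = H^{\dagger}$ and $UU^{\dagger} = 1$. Since $U$ cancels out of $MM^{\dagger}$, we can take $M$ to be self-adjoint. To parameterize $N \times N$ matrices $M$ use SU(N) representations $G_a$, normalized to ${\rm tr}(G_aG_b) = (1/2)\delta_{ab}$. For $N = 3$ those are the Gell-Mann matrices. We define parameters $m_\mu$ by expanding $M$ over the identity and the $G_a$. Computing $\rho = M^2$ requires the symmetric product of the generators, $G_aG_b + G_bG_a = \delta_{ab}/N + d_{abg}G_g$, where $d_{abg}$ denotes the totally symmetric structure constants. Check the trace of both sides for the normalization of $\delta_{ab}$. Everything else must be traceless and spanned by $G_g$. Then $\rho$ is quadratic in the $m_\mu$, and the normalization ${\rm tr}(\rho) = 1$ needs $\sum_{\mu=0}^{8} m_\mu m_\mu = 1$. This requires each $0 \le m_\mu^2 \le 1$, although the normalization condition itself is more restrictive.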
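A short numerical sketch of this parameterization for $N = 3$ follows. It is an illustration under assumptions rather than the paper's code: the basis normalization (identity scaled by $1/\sqrt{3}$, generators scaled by $\sqrt{2}$) is chosen here so that ${\rm tr}(\rho) = 1$ is equivalent to $\sum_\mu m_\mu m_\mu = 1$, and the function names are hypothetical. For contrast, the last function builds a straight linear expansion $\rho = 1/3 + c_g G_g$, which has unit trace for any coefficients but is not guaranteed to be positive.

```python
import numpy as np

def gell_mann_generators():
    """Return G_a = lambda_a / 2 for the eight Gell-Mann matrices."""
    lam = np.zeros((8, 3, 3), dtype=complex)
    lam[0, 0, 1] = lam[0, 1, 0] = 1
    lam[1, 0, 1], lam[1, 1, 0] = -1j, 1j
    lam[2, 0, 0], lam[2, 1, 1] = 1, -1
    lam[3, 0, 2] = lam[3, 2, 0] = 1
    lam[4, 0, 2], lam[4, 2, 0] = -1j, 1j
    lam[5, 1, 2] = lam[5, 2, 1] = 1
    lam[6, 1, 2], lam[6, 2, 1] = -1j, 1j
    lam[7] = np.diag([1, 1, -2]) / np.sqrt(3)
    return lam / 2

def rho_from_m(m):
    """Positive, unit-trace rho = M M^dagger from nine real parameters m_mu."""
    m = np.asarray(m, dtype=float)
    m = m / np.linalg.norm(m)                    # imposes sum_mu m_mu m_mu = 1
    G = gell_mann_generators()
    # Assumed normalization: tr(E_mu E_nu) = delta_{mu nu}.
    basis = np.concatenate([np.eye(3)[None] / np.sqrt(3), np.sqrt(2) * G])
    M = np.tensordot(m, basis, axes=1)           # Hermitian by construction
    return M @ M.conj().T

def rho_linear(c):
    """Straight expansion 1/3 + c_g G_g: unit trace, positivity not guaranteed."""
    G = gell_mann_generators()
    return np.eye(3) / 3 + np.tensordot(np.asarray(c, dtype=float), G, axes=1)

G = gell_mann_generators()
trace_products = np.einsum("aij,bji->ab", G, G).real   # tr(G_a G_b)

rng = np.random.default_rng(2)
rho_good = rho_from_m(rng.normal(size=9))
rho_bad = rho_linear([0, 0, 2, 0, 0, 0, 0, 0])          # large c_3 chosen on purpose
min_eig_good = np.linalg.eigvalsh(rho_good).min()
min_eig_bad = np.linalg.eigvalsh(rho_bad).min()
```

Any real vector `m` yields a valid density matrix, so a fit can vary the `m` parameters freely, whereas the linear coefficients `c` must be checked against the eigenvalue constraint after the fact.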
There is a degeneracy issue in the nonlinear relation of $m_\mu$ to a straight expansion $\rho = 1/3 + c_gG_g$. The positivity problem is often solved with the Cholesky decomposition: $\rho = LL^{\dagger}$, where $L$ is a lower-triangular matrix with real entries on the diagonal. $L$ is related to $M$ by right-multiplication with a unitary matrix, $L = MU$. There are $N + 2N(N-1)/2 = N^2$ free real parameters in a lower-triangular matrix with real diagonal entries, which matches the number of real parameters of an $N \times N$ Hermitian matrix. As before, the condition ${\rm tr}(\rho) = 1$ requires $\sum m_\mu m_\mu = 1$. The Cholesky decomposition is unique, in the sense above, when $\rho$ is positive definite.
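The Cholesky parameterization can be sketched in a few lines. This is an illustrative fragment, not the supplemental code: the ordering of the $N^2$ real parameters (diagonal first, then real parts, then imaginary parts of the strictly lower triangle) is an arbitrary choice made here, and the function names are hypothetical. Dividing by the trace is equivalent to imposing $\sum m_\mu m_\mu = 1$, because ${\rm tr}(LL^{\dagger})$ is the sum of the squared parameters.

```python
import numpy as np

def lower_triangular_from_params(p, n):
    """Pack n*n real parameters into a lower-triangular L with real diagonal."""
    p = np.asarray(p, dtype=float)
    if p.size != n * n:
        raise ValueError("expected n*n real parameters")
    n_off = n * (n - 1) // 2
    L = np.zeros((n, n), dtype=complex)
    L[np.diag_indices(n)] = p[:n]                         # n real diagonal entries
    rows, cols = np.tril_indices(n, k=-1)
    L[rows, cols] = p[n:n + n_off] + 1j * p[n + n_off:]   # complex entries below it
    return L

def rho_from_params(p, n=3):
    """Return rho = L L^dagger, rescaled to unit trace."""
    L = lower_triangular_from_params(p, n)
    rho = L @ L.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(1)
params = rng.normal(size=9)          # arbitrary illustrative values
rho = rho_from_params(params)

# For positive definite rho, NumPy returns the unique factor with positive diagonal.
L_unique = np.linalg.cholesky(rho)
```

Because the diagonal of `L` may carry either sign in this packing, several parameter vectors map to the same `rho`. The factor returned by `np.linalg.cholesky` fixes the diagonal to be positive, which is the sense in which the decomposition is unique.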

D. Collected conventions
As a consequence of consistent definitions, our $\rho_M$ and $Y_M$ transform under rotations like real representations of spin-2. Other conventions have long existed. Table II shows the relations of the $\rho_M$ parameters compared to the ad-hoc conventions known as $A_k$ and $\lambda_k$. The $\rho_M$ are self-explanatory because they correspond to orthonormal harmonics and transform like spin-2 representations. The arbitrary normalizations and conventions relating different basis functions have been a barrier to interpretation, needlessly complicating transformations between angular frame conventions.
Basis functions (bottom row, quadrupole part):
$1/\sqrt{3} - \sqrt{3}\cos^2\theta$    $\sin 2\theta \cos\phi$    $\sin^2\theta \cos 2\phi$    $\sin^2\theta \sin 2\phi$    $\sin 2\theta \sin\phi$

TABLE II: Three ways of parameterizing the monopole (left of double vertical line) relative to quadrupole (right) part of the angular distribution. The bottom row (right) represents our spin-2 basis functions that are both orthogonal and uniformly normalized, compared to ad-hoc conventions of the other rows. The spin-2 coefficient combinations listed in each row are the ones that mix linearly under rotations of the frame coordinates. To find the parameterization of a given row, multiply the coefficient in each column by the function at the bottom, and add. To absolutely normalize the $A_j$ form, multiply the entire sum by $3/(16\pi)$, and the normalized series will begin at $1/(4\pi)$. To absolutely normalize the $\lambda_j$ form, multiply the entire sum by $3/(4\pi)$. The $\rho_M$ form is absolutely normalized by definition. The constant $c = 3/(8\pi\sqrt{2})$ has been divided out to match the other conventions. The absolutely normalized form uses the sum of the $\rho_M$ row multiplied by $c$.
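The statement that the five quadrupole basis functions are orthogonal and uniformly normalized can be checked numerically. The sketch below integrates products of the functions over the sphere, using Gauss-Legendre quadrature in $\cos\theta$ and a uniform grid in $\phi$. The grid sizes and function names are choices made for this illustration.

```python
import numpy as np

def quadrupole_basis(theta, phi):
    """The five spin-2 basis functions from the bottom row of Table II."""
    s, c = np.sin(theta), np.cos(theta)
    return np.array([
        1 / np.sqrt(3) - np.sqrt(3) * c**2,
        np.sin(2 * theta) * np.cos(phi),
        s**2 * np.cos(2 * phi),
        s**2 * np.sin(2 * phi),
        np.sin(2 * theta) * np.sin(phi),
    ])

def sphere_grid(n_theta=16, n_phi=32):
    """Quadrature nodes and weights for integration over the unit sphere."""
    x, wx = np.polynomial.legendre.leggauss(n_theta)     # nodes in cos(theta)
    phi = np.linspace(0.0, 2 * np.pi, n_phi, endpoint=False)
    theta_grid, phi_grid = np.meshgrid(np.arccos(x), phi, indexing="ij")
    weights = wx[:, None] * (2 * np.pi / n_phi) * np.ones_like(phi_grid)
    return theta_grid, phi_grid, weights

theta_grid, phi_grid, weights = sphere_grid()
f = quadrupole_basis(theta_grid, phi_grid)               # shape (5, n_theta, n_phi)

# Gram matrix of overlaps: integral of f_a * f_b over the sphere.
gram = np.einsum("aij,bij,ij->ab", f, f, weights)

# Overlap of each function with the constant (monopole) term.
monopole_overlap = np.einsum("aij,ij->a", f, weights)
```

The Gram matrix comes out proportional to the identity with the common value $16\pi/15$ on the diagonal, and each function integrates to zero against the monopole.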
Our self-explanatory conventions for the spin-1 parameters are given in Table III. For example, it is quite easy to remember that $S_x \to \sin\theta\cos\phi$, $S_y \to \sin\theta\sin\phi$, and $S_z$ leads to $\cos\theta$ angular dependence.
$S_x$    $S_y$    $S_z$
$\sin\theta\cos\phi$    $\sin\theta\sin\phi$    $\cos\theta$

TABLE III: Parameterizing the spin-1 part of the angular distribution. To form the angular distribution, the coefficients in each row are multiplied by the functions in the bottom row and added to those from Table II.