Seeing opportunity in every difficulty: protecting information with weak value techniques

A weak value is an effective description of the influence of a pre and post-selected 'principal' system on another 'meter' system to which it is weakly coupled. Weak values can describe anomalously large deflections of the meter, and deflections in otherwise unperturbed variables: this motivates investigation of the potential benefits of the protocol in precision metrology. We present a visual interpretation of weak value experiments in phase space, enabling an evaluation of the effects of three types of detector noise as 'Fisher information efficiency' functions. These functions depend on the marginal distribution of the Wigner function of the meter, and give a unified view of the weak value protocol as a way of protecting Fisher information from detector imperfections. This approach explains why weak value techniques are more effective for avoiding detector saturation than for mitigating detector jitter or pixelation.

involves the definition of a new quantity A_w, the weak value. It may be thought of as a generalisation of the eigenvalue of Â [3]. As can be seen from the above expression, it emerges only through a weak coupling (g ≪ 1) of a principal system, preselected into |i⟩ and postselected into |f⟩, with a 'meter' degree of freedom, initialised in |m⟩. This generalisation leaves us with an economical, effective description of the meter solely in terms of the generally complex and unbounded quantity A_w. The physical effect of the imaginary component [4], the breakdown of the aforementioned approximations [5,6], and the full-order, exact evolution of the meter [7] all fall within the purview of Tsutsui's first category and have been the subject of many studies. Ref. [6] is a comprehensive treatment of nonperturbative analogues of Equation (1) featuring various coupling Hamiltonians and meter states. Insofar as research in this category is mathematical it is uncontroversial, and the discussion mainly focusses on the approximations to be made and the range of their validity.
The fruits of this first category of research feed into the other two. The second category considers the foundational aspects of weak values. They have been associated with quantum paradoxes [8], with negative probabilities, and with macroscopic realism [9] (as has the wider topic of pre- and postselected measurements [8,10]). Weak values have been argued to provide an operational definition of the quantum wavefunction [11], and to reveal particle trajectories of Bohmian mechanics [12]. The question of whether weak values are quantum mechanical has also been debated [13]. It is perhaps unsurprising that this second category of research has seen a great deal of disagreement and controversy, since it involves speculation into the meaning of weak values, and indeed into the meaning of quantum mechanics itself (where there remain open questions on the interpretation of the formalism and the peculiar role of measurement [14]).
The third category ought to be far more clear-cut, for it deals with the use of weak values and associated techniques in technology, specifically for parameter estimation. Here, one might expect, the question of whether the technique is genuinely useful or not would be a matter of fact: unforgiving and unquestionable. Why, then, has this line of research also been afflicted by apparent contradictions and disagreements? The third category was ignited by an experiment in which a weak value technique was used to measure the spin-Hall effect of light [15]. The interaction between the polarisation and transverse momentum of a light beam at an interface may be modelled using Equation (1), where the coupling constant g is considered an unknown parameter to be investigated. Using clever pre- and postselected polarisation states, the lateral displacements could be measured with a sensitivity of around 1 Angstrom. Other impressive experiments include the estimation of the angular tilt of a mirror in a Sagnac interferometer to a precision of a few hundred femtoradians [16]. From 2013 onwards, however, theoretical treatments of the weak value technique called into doubt whether there was any true advantage over standard strategies [17,18,19,20], especially from the point of view of parameter estimation. There was also a slew of theoretical [21,22] and experimental [23] papers arguing for the merits of weak value techniques. A review of these results is given in Ref. [24].
A central tool in our approach is to use Fisher information, rather than the more commonly employed signal-to-noise ratio. Fisher information is a way of measuring the amount of information that an observable random variable such as x or k carries about an unknown parameter upon which the probability of x or k depends. In AAV's original work [2], the subject of investigation is the operator Â itself. The canonical example they gave was of a spin-1/2 particle traversing a magnetic field gradient, as in the famous Stern-Gerlach experiment (where the coupling between the spin and the field gradient causes a beam of such particles to be deflected 'up' or 'down' depending on their spin state). The title of the paper trumpeted the astonishing result that A_w could take very large values (say 100), where usually an eigenvalue bounded by ±1/2 would appear. From the parameter estimation perspective, however, the emphasis is rather more on the coupling constant g: in the Stern-Gerlach setup this corresponds to the magnitude of the magnetic field gradient, and generally controls the 'strength' of the measurement via the degree of correlation that is built up between principal system and meter. Since the weak value A_w multiplies g in Equation (1), it controls the information about g that can be extracted in the subsequent detection of the meter.
The lively debate in the third category has left open the question of whether and how weak value techniques can offer precision advantages. The arguments against, in a nutshell, state that the postselected meters carry an amplified amount of Fisher information per measured data point, but this is more than cancelled by the low probability of the signal for each potential data point surviving the postselection process. So the arguments seem quite clear-cut after all, but such a summary belies some of the subtleties in evaluating metrological merit. Firstly, there is the issue of technical noise: as long as postselection is performed in the experiment before any ultimate detection (say with a polariser), rather than merely implemented as voluntary data loss, the weak value technique has characteristics that physically distinguish it from regular approaches. This means that certain types of detector noise may favour one approach over the other, potentially reversing the conclusions reached for ideal detection. It is then difficult to say in full generality when advantages may be had, and the situation must be judged mostly on a case-by-case basis. Secondly, there is the issue of resource counting. This is an ever-present question in metrology, and is informed by the 'cost' that is associated with each component of an experiment. The figure of merit (e.g. the precision of the method) is often given 'per unit cost', as in 'Fisher information per photon'. Since the constraints on the cost depend on the nature of the experiment, and can even vary from laboratory to laboratory, any general claims will necessarily depend on the details. Nevertheless we will discuss a scenario that will end up favouring weak value techniques: namely, when system-meter pairs (e.g. photons) are more costly to detect than to create.
In this paper we present a new, visual way of intuiting the utility of weak values. The main tool we use is the Wigner function: a quasi-probability distribution over phase space. We hope our approach may help bring clarity to an otherwise confusing field. We will not provide a comprehensive treatment of all possible implementations of the weak value protocol, but instead make choices that simplify the arguments while remaining faithful to the original approach and also representing most experiments so far. In our treatment of three types of technical noise (namely transverse jitter, pixelation and saturation) we rely heavily on theoretical results that have been previously reported [19,6,25] but require some re-application in our new setting.

A simple model
We will consider a simple model of a qudit (the principal system) coupled to a single continuously varying degree of freedom (the meter), prepared in a real-valued Gaussian state: ψ(x) represents the position wavefunction of the meter, while φ(k) describes the distribution over values for the conjugate momentum. σ is a parameter controlling the uncertainty in x; since Gaussian states saturate the Heisenberg uncertainty principle, it also controls the uncertainty in k, but with an inverse relationship. To develop our visual depiction of weak value experiments, we make use of the Wigner function description [26] of ψ: The Wigner function may be thought of as a representation of the wavefunction that allows one to simultaneously visualise both position (x) and momentum (k) variables. When x is a component of the transverse position of a quantum particle (e.g. a photon or spin) propagating along z, k is the corresponding component of the transverse wavevector (with the other transverse coordinate integrated out). Although the possibility of negative regions (for a general Wigner function) precludes the interpretation of W(x, k) as a genuine probability distribution, it has the attractive property that its marginal distributions p(k) = ∫ W(x, k) dx and p(x) = ∫ W(x, k) dk correspond to those derived in the usual fashion, via the Born rule, for measurements of k and x respectively. They are therefore bona fide probability distributions, in the sense that they are positive and integrate to unity. Since the initial state of the meter is normalised, we thus naturally have that the volume of the initial Wigner function is ∫∫ W(x, k) dx dk = 1. See Figure 1.
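The marginal and normalisation properties described above can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes a particular convention that is not spelled out in the text here: ψ(x) ∝ exp(−x²/(4σ²)) with ħ = 1, so that W(x, k) = (1/π) exp(−x²/(2σ²) − 2σ²k²), the position variance is σ² and the momentum variance is 1/(4σ²). The function name `wigner_gaussian` is hypothetical.

```python
import numpy as np

def wigner_gaussian(x, k, sigma=1.0):
    """Wigner function of an ASSUMED real Gaussian wavefunction
    psi(x) = (2*pi*sigma**2)**(-1/4) * exp(-x**2 / (4*sigma**2)), with hbar = 1."""
    return np.exp(-x**2 / (2 * sigma**2) - 2 * sigma**2 * k**2) / np.pi

sigma = 1.0
x = np.linspace(-8.0, 8.0, 801)
k = np.linspace(-8.0, 8.0, 801)
dx, dk = x[1] - x[0], k[1] - k[0]
X, K = np.meshgrid(x, k, indexing="ij")
W = wigner_gaussian(X, K, sigma)

# Marginals: integrate out the conjugate variable (simple Riemann sums).
p_x = W.sum(axis=1) * dk
p_k = W.sum(axis=0) * dx

volume = W.sum() * dx * dk            # total volume under W
var_x = (x**2 * p_x).sum() * dx       # variance of the position marginal
var_k = (k**2 * p_k).sum() * dk       # variance of the momentum marginal

print(volume, var_x, var_k)
```

Under the stated convention the product of the two standard deviations is 1/2, which is the minimum-uncertainty property referred to in the text.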

Weak value technique
Fig. 1 (panel labels: Standard measurement; Limit of weak value technique) We represent the initial quantum state of the meter via a Wigner function. A standard measurement always succeeds and results in a small shift in the x direction (black arrow, final state shown in Figure 4). The weak value technique is postselected, and allows for larger shifts in any direction φ in phase space. For the linear regime considered by AAV to hold, the weak value shift must lie well within the red curve, which is a parametric ellipse traced out by the maximum of the Wigner function when g|A_w| = σ is substituted into Equation (4) and φ is allowed to vary. The characteristics of weak value experiments offer no fundamental advantage under perfect detection, but can alleviate certain types of detector imperfection. In this figure we take σ = 1, g = 0.1, λ* = 1.

By applying Equation (1) to (2), the Wigner function of the meter after the weak interaction, conditioned on successful postselection of the system into |f⟩, will be: This can be realised by spotting that the real and imaginary components of the weak value have actions e^{−ig Re(A_w) k̂} (unitary) and e^{g Im(A_w) k̂} (non-unitary), which mutually commute. This expression extends beyond the simple case of a real Gaussian function that we consider here, but not to arbitrary complex-valued functions: for a comprehensive treatment, see Ref. [6].
The approximation leading to Equation (4) requires g|A_w| ≪ σ [5]; a condition that is depicted as a red ellipse in Figure 1. The probabilistic nature of the protocol is reflected in W_{A_w} being sub-normalised: the volume under the postselected Wigner function is less than one and equal to the success probability.
The Wigner function representation implies that a generally complex weak value will introduce a shift in an oblique direction in phase space. This raises the question of using 'rotated' phase space observables to detect the shift. Define ŝ_θ := x̂ cos θ + k̂ sin θ.
The distribution for this rotated observable may be extracted from the Wigner function by integrating over its conjugate t̂_θ := −x̂ sin θ + k̂ cos θ [27]: This is once more a Gaussian distribution, which can be thought of as a 'view' of W(x, k) from an oblique angle. The variance σ_θ² is now a function of the x and k variances. One can show that after the weak value technique one is left once again with a simple shift: where we have written the weak value in polar form:
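As an illustration of how the variance of the rotated observable depends on the x and k variances, here is a hedged sketch. It assumes the same convention as a real Gaussian meter with position variance σ² and momentum variance 1/(4σ²), with no x-k correlation, so that σ_θ² = σ² cos²θ + sin²θ/(4σ²). Because a Gaussian Wigner function is everywhere positive, it can be sampled like an ordinary probability distribution; the helper name `quadrature_variance` is hypothetical.

```python
import numpy as np

def quadrature_variance(theta, sigma=1.0):
    """Variance of s_theta = x cos(theta) + k sin(theta) for an ASSUMED
    uncorrelated Gaussian with var(x) = sigma^2 and var(k) = 1/(4 sigma^2)."""
    var_x = sigma**2
    var_k = 1.0 / (4.0 * sigma**2)
    return var_x * np.cos(theta)**2 + var_k * np.sin(theta)**2

rng = np.random.default_rng(1)
sigma, theta, n_samples = 1.0, np.pi / 6, 200_000

# Sample phase-space points from the (positive) Gaussian Wigner function.
x = rng.normal(0.0, sigma, n_samples)
k = rng.normal(0.0, 1.0 / (2.0 * sigma), n_samples)

# 'View' the distribution along the rotated direction theta.
s_theta = x * np.cos(theta) + k * np.sin(theta)

sampled_var = s_theta.var()
predicted_var = quadrature_variance(theta, sigma)
print(sampled_var, predicted_var)
```

Setting θ = 0 recovers the position variance and θ = π/2 the momentum variance, so the two familiar marginals appear as special cases.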

Standard measurement
We will define a standard measurement by setting |i⟩ = |f⟩ = |i*⟩, the eigenstate of Â with eigenvalue of greatest magnitude. This scheme succeeds deterministically, and so results in a Wigner function of unit volume: Here λ* = arg max_λ {|λ| : Â|i⟩ = λ|i⟩}. The two strategies we consider can thus both be thought of as restricted shifts of the initial Wigner function in phase space. Figure 1 shows the shift corresponding to the case where λ* = 1 with a black arrow. In depicting both standard and weak value techniques in the same plots (below in Figures 2 and 4 in particular), we have avoided dramatic examples such as A_w = 100 for several reasons. Firstly, it is somewhat of a distraction that we wish to demote in favour of the more subtle but ultimately more useful properties of weak value techniques. Secondly, it is difficult to show such examples without adversely affecting our visual picture. Thirdly, and most importantly, our intuition is actually much better served by realising that the red ellipse in Figure 1, which represents the very limit of validity of AAV's approach, is fixed by σ. Very large weak values are of course still possible, but only by ensuring g is smaller and smaller. In this scenario, the black arrow shrinks while the red ellipse remains constant, meaning that the 'amplification factor' |A_w|/λ* becomes very large, while the size of the shift itself must actually remain small. This has previously been described as the requirement that the signal-to-noise ratio per data point must be low [28].

Precision of estimating g
Whenever a quantity of interest is not directly measured, it is necessary to process the raw data in order to 'measure' (or, more strictly, 'estimate') that quantity. For instance, g is not directly measured, but inferred from repeated measurements of x. The data are subsequently processed thus: Here angled brackets stand for the sample average. The rules governing the data processing are known as the 'estimator': above we used a tilde to denote the estimator for g in the standard technique. For the weak value technique, the procedure is similar, but x is replaced by s_θ and λ* is replaced by the coefficient of g in the argument of p_θ of Equation (6). Because the data (the input to the estimator function, x) are random variables, the estimates (the output of the estimator function, g̃) are also random variables. We will assume that all estimators under consideration are unbiased, meaning that they give the correct answer on average, i.e. ⟨g̃⟩ = g.
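The estimation procedure for the standard technique can be illustrated with a small simulation. This is a sketch under stated assumptions: the estimator is taken to be the sample average of x divided by λ*, which is how the surrounding text describes it, and the data are drawn from a Gaussian of width σ centred on gλ*. All numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

g_true = 0.1       # coupling constant to be estimated (illustrative value)
lam_star = 1.0     # eigenvalue of greatest magnitude
sigma = 1.0        # width of the meter distribution in x
n_trials = 100_000

# Each trial yields one position measurement drawn from the shifted Gaussian.
x = rng.normal(loc=g_true * lam_star, scale=sigma, size=n_trials)

# ASSUMED estimator: sample average of x, rescaled by the known eigenvalue.
g_est = x.mean() / lam_star

# Expected statistical spread of the estimate over repeated experiments.
std_err = sigma / (lam_star * np.sqrt(n_trials))

print(g_est, std_err)
```

Repeating the whole simulation with different seeds would scatter `g_est` around `g_true` with a spread close to `std_err`, which is what is meant by the estimates themselves being random variables.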
As alluded to above, the classical Fisher information about the coupling constant g: is the central figure of merit of classical parameter estimation. The Cramér-Rao bound states that in the limit of many trials, the variance of an unbiased estimator for g (the square of its uncertainty, or standard deviation) will be lower-bounded by the reciprocal of the total Fisher information, which is N F_g for N independent trials [29]. The higher F_g, the better the precision, and the Cramér-Rao bound can be saturated by efficient estimation strategies: maximum likelihood, for example. Owing to the simplicity of the model considered here, the estimator given in Equation (9) is indeed efficient for ideal detection, but requires modification when detector imperfections are included [30].
The Fisher information measures the information content of a probability distribution p_θ(s_θ), which can be derived from the Wigner function upon fixing a measurement. Under our formalism, any phase space quadrature may be considered, or indeed noisy implementations thereof. The choice of measurement thus influences p_θ(s_θ) and thereby also influences F_g: so we may use the latter to evaluate the suitability of different measurement schemes.
Because we consider a single parameter, the maximum classical Fisher information about g (when considering all possible measurements) of the joint system-meter state immediately after the interaction is given by the quantum Fisher information [31,32,33]: H_g is therefore a property of the initial quantum states alone, once the Hamiltonian has been fixed. It is clear that the smaller σ (the narrower the distribution in space), the more information can be extracted. We will consider different measurement schemes that can harvest this maximal information in different ways, by channelling it into different parts of parameter space. That is, we will consider how close one can get to saturating In general, if we consider system and meter to be a single indivisible quantum system, harvesting all of the information would require entangled preparation and entangled measurement. However, for the case where the average momentum of the meter is zero as it enters the weak interaction (i.e. we take ⟨k⟩ = 0 in accordance with Equation (2)), factorable preparations and measurements will suffice. It follows from the convexity of the QFI [37]: that using mixed states cannot increase the Fisher information.
The chain rule of differentiation is a particularly effective tool in relating the Fisher information of a shifted distribution (which does not change shape) to properties of the shape of the distribution itself [19]. Namely: where the last step follows from the Gaussian shape of the marginal distribution.
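For a Gaussian of fixed width σ whose mean is shifted by νg, the Fisher information about g is the standard result ν²/σ². The sketch below checks this numerically by evaluating F_g = ∫ (∂p/∂g)² / p ds with a central finite difference. The symbol ν for the coefficient of g in the shift follows the usage later in the text; the function names are hypothetical.

```python
import numpy as np

def gaussian(s, mean, sigma):
    """Normalised Gaussian probability density."""
    return np.exp(-(s - mean)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def fisher_information(g, nu, sigma, dg=1e-5):
    """Numerical Fisher information about g for p(s - nu*g), p Gaussian."""
    s = np.linspace(-12 * sigma, 12 * sigma, 20001)
    ds = s[1] - s[0]
    p = gaussian(s, nu * g, sigma)
    # Central finite difference for the derivative of p with respect to g.
    dp_dg = (gaussian(s, nu * (g + dg), sigma)
             - gaussian(s, nu * (g - dg), sigma)) / (2 * dg)
    return np.sum(dp_dg**2 / p) * ds

nu, sigma, g = 3.0, 1.0, 0.1
F_numeric = fisher_information(g, nu, sigma)
F_analytic = nu**2 / sigma**2
print(F_numeric, F_analytic)
```

The result does not depend on the value of g itself, only on ν and σ, which reflects the fact that a pure shift does not change the shape of the distribution.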

Optimal protocols
It is straightforward to see that our standard measurement sets a high quantum Fisher information, and then extracts all of it. Recall that we set |i⟩ = |f⟩ to the eigenstate of Â with highest-magnitude eigenvalue λ*, and then measure x: Full information is harvested, but one is 'stuck' with measuring x and with having a high flux onto the detector.
Our weak value technique, on the other hand, uses arbitrary initial and final states. Rotated quadratures have We may use this formula, along with Equations (4) and (14), to evaluate a corrected Fisher information for a weak value shift in an arbitrary direction φ and arbitrary measurement direction θ in phase space. To perform this calculation, we will normalise the distribution before calculating the Fisher information and then correct it by the postselection probability. This is appropriate given the additivity of Fisher information: N independent experiments enjoy a total Fisher information of N F, or in the case of weak value experiments a total of |⟨f|i⟩|² N F. We have: (17) We may now show that this information is always lower than that of the standard measurement: The first inequality is proved in the appendix, along with a prescription for saturating it. The second inequality follows from the Cauchy-Schwarz inequality [19]. The third inequality follows from the selection of the optimal initial state |i*⟩ by the standard measurement strategy. It is possible to get close to saturating these inequalities whilst maintaining a large weak value [22,23]. This means that almost all of the Fisher information may be concentrated into an unlikely or 'dark' detector mode. We will not focus so much here on what combination of initial and final states, coupling parameters and meter states are necessary to approach equality in Equations (18)-(20), save to note that if Â² = I then |f⟩ = Â|i⟩ will saturate the second condition and implies a real weak value. The question has been considered outside of the linear regime we consider [19,38]. Instead the important point is that one may sacrifice only a small amount of information at this stage of the discussion: the sacrifice may well be considered negligible in comparison to other benefits that we describe below.
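The trade-off between amplification and postselection probability can be made concrete with a qubit example. The sketch below is an illustration under assumptions, not a reproduction of the missing equations: it takes Â to be the Pauli Z operator, uses the usual definition A_w = ⟨f|Â|i⟩/⟨f|i⟩, and evaluates the product |⟨f|i⟩|²|A_w|² = |⟨f|Â|i⟩|², which the Cauchy-Schwarz argument bounds by ⟨i|Â²|i⟩ and hence by λ*². The state parametrisation `ket` and the chosen angles are hypothetical.

```python
import numpy as np

# Principal-system observable: Pauli Z (an illustrative choice), eigenvalues +1 and -1.
A = np.array([[1, 0], [0, -1]], dtype=complex)
lam_star = np.max(np.abs(np.linalg.eigvalsh(A)))

def ket(theta, phi=0.0):
    """Qubit state cos(theta/2)|0> + exp(i phi) sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

i_state = ket(np.pi / 2)                 # preselected state
f_state = ket(-np.pi / 2 + 0.2, 0.3)     # nearly orthogonal postselected state

overlap = np.vdot(f_state, i_state)      # <f|i> (vdot conjugates its first argument)
p_post = abs(overlap)**2                 # postselection probability
A_w = np.vdot(f_state, A @ i_state) / overlap   # complex weak value

# Fraction of the standard-measurement information that survives postselection,
# ASSUMING the surviving information scales as p_post * |A_w|^2 / lam_star^2.
info_fraction = p_post * abs(A_w)**2 / lam_star**2

print(A_w, p_post, info_fraction)
```

With these angles the weak value is several times larger than λ*, the postselection succeeds only a few percent of the time, and the surviving fraction stays below one while remaining close to it.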

Protecting information by alleviating technical noise
So far we have considered 'ideal' detection. This is to be understood as a projective and orthogonal measurement that one would typically find in a quantum mechanics textbook: a Hermitian observable x̂ or k̂, for example. Such an idealised measurement offers unlimited precision when the number of repetitions N tends to infinity. For finite trials, we get a finite precision; but this is a consequence of the quantum noise in the prepared state |m⟩ (often referred to as 'shot noise') rather than the detector. If the initial state was noise-free (which would correspond to the limit of a highly 'squeezed' Wigner function with x Dirac-delta distributed) an ideal detector would offer 'perfect precision': an exact estimate with zero uncertainty after a single trial. Otherwise, non-ideal detectors are modelled by different kinds of coarse-grained approximations to ideal detectors, and imply a loss of information.
We will consider the following examples of technical noise: (1) transverse jitter, (2) pixelation and (3) saturation. Each will be defined by a transformation which relates the ideal distribution p(s) to the imperfect one p'(s). The main result of this paper is a graphical presentation of the information-obscuring effect of these transformations, as Fisher information efficiency functions η, defined by: These functions allow one to describe different imperfect measurements as η × 100% Fisher efficient, where η = 1 recovers the ideal information. Let us consider the relative Fisher information for the weak value technique compared to the standard technique, both under noisy detection: At best F_g[p_wv(s)] for the weak value technique may be comparable to that of the standard measurement: then the second factor of this equation (the relative Fisher information under ideal detection) will be close to unity. If η favours the weak value technique, however, the first factor (the relative Fisher information efficiency) can be greater than unity and there is scope for practical advantages to be had. Our central idea is that one can guide the Fisher information around in phase space. In particular one can usher it away from danger; from regions of phase space where the information would be lost or degraded. One can concentrate a large proportion (near one hundred percent) of the available information into a low number of events, or even into a conjugate variable for less-noisy detection. One can guide the detector distribution away from a faulty or noisy part of the detector.

Transverse jitter
Consider that the detector is in a random motion, so that the actual distribution at the detector is Under this model it is possible to show that [19]: Note that there is no dependence on s_θ, but only on σ_θ and ζ_θ. Since weak values may channel information into the s_θ variable (with θ not necessarily zero, which is impossible in standard measurements), this can be useful in the case where a more stable measurement of s_θ (than of x) is available [28]. See Figure 2. Transverse jitter reduces the Fisher information of a shifted Gaussian, but in a way that is independent of the magnitude of the shift and the intensity of the beam. It can be more or less severe in position or momentum space, however: in this example one could use a weak value technique to swap detection in x (90% Fisher efficient) for detection in k (99% Fisher efficient).
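A simple way to see why jitter efficiency depends only on the two widths is to assume that the jitter is Gaussian with standard deviation ζ_θ and independent of the signal. This is an assumption made for this sketch; the jitter model of Ref. [19] is not reproduced in the text here. Convolving a Gaussian of variance σ_θ² with Gaussian jitter gives a Gaussian of variance σ_θ² + ζ_θ², so the efficiency would be η = σ_θ²/(σ_θ² + ζ_θ²). The jitter widths below are chosen to reproduce the 90% and 99% figures quoted in the example above; they are illustrative, not values taken from the paper.

```python
import math

def eta_jitter(sigma_theta, zeta_theta):
    """Fisher efficiency under ASSUMED independent Gaussian jitter of
    standard deviation zeta_theta acting on a Gaussian of width sigma_theta."""
    return sigma_theta**2 / (sigma_theta**2 + zeta_theta**2)

sigma = 1.0
zeta_x = sigma / 3.0               # illustrative jitter in position detection
zeta_k = sigma / math.sqrt(99.0)   # illustrative (smaller) jitter in momentum detection

eta_x = eta_jitter(sigma, zeta_x)
eta_k = eta_jitter(sigma, zeta_k)

# Relative efficiency gained by swapping detection in x for detection in k.
relative_efficiency = eta_k / eta_x
print(eta_x, eta_k, relative_efficiency)
```

In this picture neither the size of the shift nor the beam intensity enters, so any benefit comes purely from choosing a quadrature in which the detector is more stable.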

Pixelation
Detection of continuous variables is often performed in a discretised fashion, such as with the pixel arrays found in modern cameras. We can model this by Pr(n; νg) = ∫_{rn}^{r(n+1)} p(s − νg) ds, where r is the width of each pixel, labelled by integers n and centred at (n + 1/2)r. In practice, this one-dimensional distribution is the result of summing over the y direction of a two-dimensional pixel array. Here we have fixed the detector so that the pixel boundaries are aligned with the centroid of the initial meter state; for the more general case of free alignment, see Ref. [19]. The efficiency η_pixel can be computed numerically, and a few examples are shown in Figure 3. The efficiency is surprisingly high and only very weakly dependent on the parameters that distinguish standard measurements from weak value techniques: it is therefore likely that η_pixel^wv ≈ η_pixel^std ≈ 1. There are two points to be made. Firstly, since η_pixel is so robust to changing r, we conclude that pixelation does not represent much of a difficulty: there is limited scope for protecting information from this kind of imperfection. Secondly, η_pixel does not depend very strongly on the size of the shift νg. Recall that the distinguishing property of the weak value technique is the ability to arrange for anomalously large shifts: so that even if η_pixel is low for the standard technique (say around 84% as in Figure 3), it will be very similar for the weak value technique. Therefore there is not much of an opportunity in this particular difficulty.
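The pixelation efficiency can be evaluated directly from the binned probabilities. The sketch below is one possible numerical treatment, assuming a Gaussian p(s) of width σ, pixel boundaries at integer multiples of r, and η_pixel defined as the Fisher information of the binned distribution about the shift divided by the ideal value 1/σ² (the factor ν² cancels in this ratio). The function names are hypothetical.

```python
import math

def cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def eta_pixel(r, shift=0.0, sigma=1.0, n_max=200):
    """Fisher efficiency of a pixelated Gaussian relative to ideal detection.
    Pixel n spans [n*r, (n+1)*r]; 'shift' is the displacement of the Gaussian."""
    fisher = 0.0
    for n in range(-n_max, n_max):
        lo = (n * r - shift) / sigma
        hi = ((n + 1) * r - shift) / sigma
        prob = cdf(hi) - cdf(lo)            # probability of landing in pixel n
        if prob <= 0.0:
            continue                        # pixel too far out to contribute
        dprob = (pdf(lo) - pdf(hi)) / sigma # derivative of prob w.r.t. the shift
        fisher += dprob**2 / prob
    return fisher * sigma**2                # ideal Fisher information is 1/sigma^2

for r in (0.5, 1.0, 1.5):
    print(r, eta_pixel(r), eta_pixel(r, shift=0.3))
```

Comparing the two columns of output for a given r gives a feel for how weakly the efficiency depends on the size of the shift.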

Saturation
Avoiding detector saturation is perhaps the most promising application of weak values in metrology. An information-theoretical advantage has been conjectured several times [15,28,19,39], and a recent study by Harris, Boyd and Lundeen [25] gives these conjectures a theoretical underpinning. We refer the reader to [25] for a full analysis, but the rough argument is as follows: for a deterministic strategy, there will be some maximum photon flux above which the information will be degraded due to 'bleaching' or saturation of the photodetectors. On the other hand, if |⟨f|i⟩|² is small enough, then saturation will not be an issue and η_sat ≈ 1.
In practice, it is likely that saturation problems will be avoided altogether (by reducing the intensity of the laser source, for example with a neutral density filter). It will therefore serve our purposes adequately to define for Θ the Heaviside step function defined by Of course in reality η_sat may not fall to zero and may or may not feature a discontinuity at P_sat, but will monotonically decrease with |⟨f|i⟩|² for a high enough input photon flux. We show our rough model pictorially in Figure 4.

Conclusion
By using the Wigner representation of the meter to describe weak value experiments, we have provided a simple picture of the flexibility of the method and how it compares to standard measurements. Working with real Gaussian meter states and in the linear regime considered by AAV, we have extended previous arguments [19] concerning the Fisher information about the coupling parameter between system and meter: allowing for arbitrary complex weak values and measurements at oblique angles in phase space. Under ideal detection, the Fisher information of the weak value technique will never exceed the Fisher information of the standard strategy without postselection. We then considered the impact of three types of technical noise, namely (1) transverse jitter, (2) pixelation and (3) photo-saturation of the detector. The Fisher information efficiency that characterises these imperfections may in some cases be higher for the weak value technique than for the standard strategy, in which case weak value techniques would give improved precision. Whilst this might conceivably be the case for (1) and (2), (3) is the killer application of weak value techniques. By concentrating almost all of the available quantum Fisher information into a low number of events, it can amplify the parameter to be measured when detector saturation would otherwise be a limitation. As long as the cost of generating a photon is considered lower than the cost of detecting one, weak values therefore offer greater precision per unit cost than standard measurements. This conclusion invites experimental exploration and implementation.

Fig. 3 (panel annotations: 98%, 92%, 84%; zoom panel: 84.21%, 84.24%) Plots of η_pixel(νg) (blue) for pixel widths of r = 0.5, 1, 1.5 respectively. This plot uses σ = 1, with the continuous distribution before pixelation (p(x), black) and the pixel boundaries (at (n + 1/2)r, vertical grey bars) shown as a guide for the eye. The dependence on s is weakly periodic and becomes weaker as the pixel size reduces. Even for ostensibly very large pixel sizes, the detection scheme is surprisingly effective: in these examples being at least 84% Fisher efficient. For pixel sizes much smaller than σ, the detector is almost 100% Fisher efficient. The bottom panel is a zoom of the r = 1.5 example, and shows the shifts under a standard measurement (black) and weak value measurement with A_w = 3 (red). Pixelation has mostly a negligible effect in general, but can even have a more detrimental effect on larger shifts (as in this example) depending on overall alignment of the detector [19,30].

Fig. 4 (panel labels: Standard measurement; Weak value technique) The problem of detector saturation is shown here in red as a maximum Wigner function volume (in our formalism this is equivalent to a maximum Wigner function height). When the Wigner function pokes through the red surface, this implies that the distribution at the detector will be 'clipped', resulting in a loss of Fisher information. The weak value technique allows almost all of the available Fisher information (which is mostly concentrated into a distribution conditioned on an unlikely postselection event) to continue to a detector which would otherwise saturate. This figure uses |λ*| = 1, A_w = 3. The black arrow and the red ellipse have the same meaning as in Figure 1.