Optical resolution from Fisher information

The information gained by performing a measurement on a physical system is most appropriately assessed by the Fisher information, which in fact establishes lower bounds on estimation errors for an arbitrary unbiased estimator. We revisit the basic properties of the Fisher information and demonstrate its potential to quantify the resolution of optical systems. We illustrate this with some conceptually important examples, such as single-slit diffraction, spectroscopy and superresolution techniques.


Introduction
One of the numerous definitions of the word "information" in the online Merriam-Webster dictionary [1] is "knowledge obtained from investigation, study, or instruction", typically about a certain subject or event. Nowadays, this notion has influenced the world of global communication in which we live. Other terms, such as "intelligence", "news", "facts", and "data" are often regarded as synonyms of this word. In the same dictionary entry, right after two other definitions that immediately follow, a slightly more technical meaning of this word reads "a numerical quantity that measures the uncertainty in the outcome of an experiment to be performed". This operational definition forms the basis of our understanding of Nature through data acquisition and statistical inference.
As with all scientific methods, it is necessary to determine the "accuracy" (or the "degree of conformity of a measure to a standard or a true value" [2]) of the information gained. Among related concepts, the accuracy of the acquired scientific information also correlates, in some respects, with its "exactness" and "correctness", thereby setting the standards of scientific truth.
In physics, any piece of information about a system is always masked by noise, which originates from several confounding factors, such as imperfections in the measurement apparatus, experimental limitations, and other plausible error sources. In optics, signal measurements of a given object yield statistical information about quantities attributed to it, which may be extracted using known methods of statistical inference and information theory. In general, the inferred values of physical quantities differ from the corresponding true values because of the presence of noise in the data. With more refined and optimized data analysis strategies, the effect of noise can be reduced to retrieve the original piece of information of interest. The uncertainties of these inferred values also depend on the choice of the statistical estimator for the measured parameters and on the particular configurations of the measurements. We shall employ a well-known statistical quantity, the Fisher information, to quantify the information gained via a measurement with any optical device. In this way, we will be able to evaluate and optimize the performance of measurement schemes and/or signal estimation strategies.
To illustrate its broad applicability in optics, the concept of Fisher information shall be applied to diffraction problems, spectroscopic measurements with noisy position-sensitive photodetectors and optical imaging of spatially bounded objects. These problems are intimately connected with important theoretical concepts, such as the point spread function, Rayleigh resolution limit and superresolution techniques.
e-mail: lsanchez@fis.ucm.es
Fig. 1. Even though the intensity on the right shows only one maximum, the actual shape is broader than a single point intensity and contains information about two points. This information is retrievable by data analysis.
This paper is organized as follows. The standard theory of two-point resolution in optics is first briefly reviewed in sect. 2. This is followed by an introduction to the concept of the Fisher information in sect. 3. This machinery is applied in sect. 4 to some selected examples. Spectroscopic resolution is introduced in sect. 5 whereas sect. 6 discusses superresolution. Finally, sect. 7 serves as an epilogical treatise to biased estimation strategies that can potentially beat the standard inference limits defined by the Fisher information.

Two-point resolution of an imaging system
In optics, perfect imaging implies the existence of a one-to-one correspondence between points of the object plane and those of the image plane. Unfortunately, this perfect correspondence is prohibited by the laws of wave optics, where points are imaged in general as extended diffraction spots. The corresponding normalized intensity distribution is known as the point spread function (PSF). Put differently, the PSF represents the normalized response of an optical imaging system to a point light source. This function equivalently describes the image degradation due to diffraction phenomena and optical aberrations and provides a simple and intuitive way of evaluating the quality of an optical imaging system.
Assuming linearity and spatial invariance of an incoherent optical imaging system, the output intensity at the image plane, $I_i$, can be expressed as a convolution of the input intensity at the object plane, $I_o$, and the system PSF $h$:
$$ I_i(x', y') = \iint I_o(x, y)\, h(x' - x, y' - y)\, dx\, dy , \tag{1} $$
where $(x, y)$ and $(x', y')$ are the coordinates of the object and image planes, respectively. Neglecting optical aberrations and assuming the Fraunhofer approximation, the shape of the diffraction spot is readily expressed as the scaled Fourier transform of the aperture function of the imaging system. For example, for a rectangular aperture of dimensions $a \times b$, we get
$$ h(x', y') \propto \mathrm{sinc}^2\!\left(\frac{\pi a x'}{\lambda z}\right) \mathrm{sinc}^2\!\left(\frac{\pi b y'}{\lambda z}\right) , \tag{2} $$
where $\mathrm{sinc}\, x = (\sin x)/x$, $\lambda$ is the central wavelength, and $z$ is the image distance. For a circular aperture of diameter $a$, the PSF shows cylindrical symmetry and can be expressed in terms of the Bessel function of the first kind $J_1$, in polar coordinates ($r = \sqrt{x'^2 + y'^2}$), as
$$ h(r) \propto \left[\frac{2 J_1(\pi a r/\lambda z)}{\pi a r/\lambda z}\right]^2 . \tag{3} $$
Two-point resolution, which is defined as the ability to resolve two point sources of equal intensity, is a widely used measure of the overall resolving capabilities of an imaging system. The classical Rayleigh criterion [3] is certainly the most famous way to deal with the problem: according to this criterion, two point sources are just resolved if the maximum of the first PSF coincides with the first minimum of the second one. Two point sources are optically resolved if the separation is larger than that defined by the criterion. Figure 1 describes this pictorially. For instance, with an optical system possessing a circular aperture of diameter $a$, the Rayleigh criterion establishes a separation $d$ between the sources of
$$ d \simeq 1.22\, \frac{\lambda z}{a} . \tag{4} $$
Nevertheless, this criterion should not be taken dogmatically [4]. Good visual resolution of two point sources, such as a double star, requires two separated maxima. Moreover, optimal digital processing of the data acquired by a camera allows, in principle, for an improvement of the optical resolution beyond the Rayleigh limit. Going back to fig. 1, and comparing with the image at the Rayleigh limit shown on the left, the image on the right, although poorly resolved according to the Rayleigh criterion, evidently differs in shape and size from the image of a single point. Hence some information about any hidden features of the object is preserved (think of an object signal coming from two stars that are very close to each other) and can be retrieved with clever post-processing. This measurement regime is the superresolution regime, that is, beyond the standard resolution limit defined by the Rayleigh criterion. We shall discuss this in sect. 6, with more detailed statistical analysis.
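The numerical factor in the Rayleigh criterion can be checked directly from the circular-aperture PSF. The following is a minimal sketch, assuming the Airy form of the PSF written in the scaled variable u = πar/(λz) with a the aperture diameter; the helper names (`bessel_j1`, `airy_psf`, `first_zero_of_j1`) are hypothetical and use only the Python standard library.

```python
import math

def bessel_j1(x, steps=2000):
    """J1(x) from the integral representation (1/pi) * int_0^pi cos(t - x sin t) dt."""
    h = math.pi / steps
    # Midpoint rule; the integrand is smooth and symmetric, so convergence is fast.
    total = sum(math.cos((k + 0.5) * h - x * math.sin((k + 0.5) * h))
                for k in range(steps))
    return total * h / math.pi

def airy_psf(u):
    """Circular-aperture PSF, normalized to 1 at the centre, with u = pi*a*r/(lambda*z)."""
    if abs(u) < 1e-12:
        return 1.0
    return (2.0 * bessel_j1(u) / u) ** 2

def first_zero_of_j1(lo=3.0, hi=4.5, tol=1e-10):
    """Bisection for the first positive root of J1, i.e. the first minimum of the PSF."""
    f_lo = bessel_j1(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j1(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

u0 = first_zero_of_j1()
rayleigh_factor = u0 / math.pi          # d = rayleigh_factor * lambda * z / a
# Intensity midway between two sources at the Rayleigh separation,
# relative to the intensity at the position of either source.
saddle_ratio = 2.0 * airy_psf(u0 / 2.0) / (airy_psf(0.0) + airy_psf(u0))
print(round(rayleigh_factor, 2))        # about 1.22
```

The quantity `saddle_ratio` measures how shallow the dip between the two maxima is at the Rayleigh separation, which is one way to see why the criterion is a convention rather than a hard limit.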

Fisher information
Consider a measurement consisting of $n$ output channels described by the random vector $X = (X_1, X_2, \ldots, X_n)$, where $X = x = (x_1, x_2, \ldots, x_n)$ is one of the many possible data vectors obtainable from a given physical system. The probability $p(x|\theta)$ for measuring $x$ is a conditional probability that depends on the value of the unknown true parameter $\theta$ that describes some property of the system. The goal is to infer the value of the unknown parameter $\theta$ as faithfully as possible from the measured data $x$, a well-known procedure of parameter estimation in statistics. We denote an estimator for $\theta$ calculated from such an estimation procedure by $\hat{\theta}(x)$. Owing to the presence of noise, any such estimator $\hat{\theta}(x)$ always possesses an inherent statistical uncertainty $\Delta\hat{\theta}$, which depends on the choice of estimation strategy.
For example, upon tossing a coin $N$ times, we observe a certain number $x$ of "heads". These measurement data can be used to estimate the true bias $\theta$ of the coin, that is, the probability of observing a "head" in a single flip. One plausible choice of estimator would be $\hat{\theta}(x) = \hat{\theta}_1(x) = x/N$, the relative frequency of observing a head in the experiment. If one ignores the observed data $x$ and insists that the coin is fair, then $\hat{\theta}(x) = \hat{\theta}_2(x) = 1/2$ is yet another option. Depending on the situation, either the former or the latter estimator will describe the coin more accurately. When no additional prior information about the bias $\theta$ of the coin is known, $\hat{\theta}_1(x)$ provides a much more reliable estimate as it converges to the true value in the limit of large $N$ (a consistent estimator).
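The comparison of the two coin estimators can be made quantitative by computing their mean squared errors exactly. The sketch below assumes binomial statistics for the number of heads; the function names and the values N = 100 and θ = 0.7 are illustrative choices, not taken from the text.

```python
import math

def binomial_pmf(x, n, theta):
    """Probability of observing x heads in n tosses of a coin with bias theta."""
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)

def mse(estimator, n, theta):
    """Exact mean squared error, summed over all possible head counts x."""
    return sum(binomial_pmf(x, n, theta) * (estimator(x, n) - theta) ** 2
               for x in range(n + 1))

def relative_frequency(x, n):
    """First estimator: the observed fraction of heads."""
    return x / n

def always_fair(x, n):
    """Second estimator: ignore the data and insist on a fair coin."""
    return 0.5

n, theta = 100, 0.7
mse_1 = mse(relative_frequency, n, theta)   # equals theta * (1 - theta) / n
mse_2 = mse(always_fair, n, theta)          # equals (theta - 1/2)**2
mse_2_fair = mse(always_fair, n, 0.5)       # zero if the coin really is fair
```

For a biased coin the data-driven estimator wins by a wide margin, while for an exactly fair coin the constant estimator has zero error, which is the sense in which either choice can be the more accurate one depending on the situation.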
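An important relation between the measurement statistics and estimation error is the Cramér-Rao inequality for the estimation mean squared error (MSE) of an unbiased estimator $\hat{\theta}$, characterized by the property $\overline{\hat{\theta}} = \theta$ [5,6],
$$ (\Delta\hat{\theta})^2 \ge \frac{1}{F(\theta)} , \tag{5} $$
where the bar denotes the average over the data,
$$ F(\theta) = \overline{\left[\frac{\partial \ln p(x|\theta)}{\partial \theta}\right]^2} \tag{6} $$
is the Fisher information and $(\Delta\hat{\theta})^2 = \overline{(\hat{\theta} - \theta)^2}$ stands for the variance. Estimators saturating the Cramér-Rao lower bound (CRLB) $1/F(\theta)$ are called efficient estimators.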
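To proceed further we note that for $n$ independent channels, the conditional probability takes the form
$$ p(x|\theta) = \prod_{j=1}^{n} p_j(x_j|\theta) , \tag{7} $$
where $p_j(x_j|\theta)$ is the conditional probability for the $j$-th measurement output [7]. As an example, suppose that $x_j$ is a continuous variable and
$$ p_j(x_j|\theta) = \frac{1}{\sqrt{2\pi}\,\Delta x_j} \exp\!\left[-\frac{(x_j - \bar{x}_j)^2}{2(\Delta x_j)^2}\right] \tag{8} $$
is a normal distribution with mean $\bar{x}_j = \bar{x}_j(\theta)$ and standard deviation $\Delta x_j$, independent of $\theta$. Then, the Fisher information is
$$ F(\theta) = \sum_{j=1}^{n} \frac{(\bar{x}_j')^2}{(\Delta x_j)^2} , \tag{9} $$
where a prime denotes a derivative with respect to the unknown true parameter $\theta$. The formula in eq. (9) is also valid for the Poissonian distribution
$$ p_j(k|\theta) = \frac{\lambda_j^{k}\, e^{-\lambda_j}}{k!} , \tag{10} $$
with $x_j$ taking random integer values $k$ and $\lambda_j = \bar{x}_j$, in which case $(\Delta x_j)^2 = \bar{x}_j$. For another important class of measurements that involve sequential counting of sampling events, $x$ is a sequence of event occurrences collected into the vector $x = (m_1, m_2, \ldots, m_n)$ of integers with $\sum_{j=1}^{n} m_j = N$, the corresponding data statistics is a multinomial distribution,
$$ p(x|\theta) = \frac{N!}{\prod_{j} m_j!} \prod_{j=1}^{n} p_j(\theta)^{m_j} , \tag{11} $$
and the following formula holds:
$$ F(\theta) = N \sum_{j=1}^{n} \frac{[p_j'(\theta)]^2}{p_j(\theta)} . \tag{12} $$
We shall not discuss here the relationship with the quantum aspects of quantum tomography and quantum Fisher information, which are analyzed in ref. [8].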
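As a rough illustration of how these formulas are used in practice, the following sketch evaluates the Fisher information for independent channels (sum of squared mean derivatives divided by the channel variances) and for multinomial counts (N times the sum of squared probability derivatives divided by the probabilities). The derivatives are taken by central finite differences; the function names, the step size h and the two toy models are assumptions made for the example.

```python
def fisher_independent(mean_fn, variances, theta, h=1e-6):
    """Sum over channels of (d mean_j / d theta)^2 / variance_j."""
    up, down = mean_fn(theta + h), mean_fn(theta - h)
    return sum(((u - d) / (2 * h)) ** 2 / v
               for u, d, v in zip(up, down, variances))

def fisher_multinomial(prob_fn, n_events, theta, h=1e-6):
    """N * sum over channels of (d p_j / d theta)^2 / p_j."""
    p = prob_fn(theta)
    up, down = prob_fn(theta + h), prob_fn(theta - h)
    return n_events * sum(((u - d) / (2 * h)) ** 2 / pj
                          for u, d, pj in zip(up, down, p))

# Toy model 1: Poissonian channels whose means scale linearly with theta.
weights = [1.0, 2.0, 3.0]

def poisson_means(theta):
    return [theta * w for w in weights]

theta = 2.0
# For Poissonian statistics the variance of each channel equals its mean.
f_poisson = fisher_independent(poisson_means, poisson_means(theta), theta)

# Toy model 2: the coin, viewed as a two-outcome multinomial experiment.
def coin_probabilities(theta):
    return [theta, 1.0 - theta]

f_coin = fisher_multinomial(coin_probabilities, 100, 0.7)
```

In the first model the result is the sum of the weights divided by θ, and in the second it is N/[θ(1 - θ)], the inverse of the variance of the relative-frequency estimator.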

Diffraction through an amplitude grating: estimating wavelength
For a better understanding of the role played by the Fisher information in optics, we begin by looking at a simple measurement of the wavelength of a laser source using an amplitude grating and a collimator. Behind the amplitude grating, diffraction spots appear, which are here represented as diffraction orders (see fig. 2). The mean angle of propagation $\bar{\varphi}_m$ of an optical beam after passing through the grating is a simple function of the diffraction order $m$ (in the small-angle approximation):
$$ \bar{\varphi}_m = m \lambda f_g , \tag{13} $$
with $m$ integer and $f_g$ being the grating frequency. Let us consider the measurement of two output channels corresponding to diffraction orders $m = 1$ and $m = -1$. For large separation of these spots, one may assume that these angular measurement outcomes are independent and normally distributed,
$$ p(\varphi_{\pm 1}|\lambda) = \frac{1}{\sqrt{2\pi}\,\Delta\varphi} \exp\!\left[-\frac{(\varphi_{\pm 1} \mp \lambda f_g)^2}{2(\Delta\varphi)^2}\right] , \tag{14} $$
where $(\Delta\varphi)^2$ is the corresponding angular uncertainty that characterizes the size of each diffraction spot (note that $\lambda f_g \gg \Delta\varphi$). In general, $(\Delta\varphi)^2$ varies with $\lambda$, but for simplicity, we may take this to be a constant for small variations in $\lambda$, which is so for many practical purposes. The resulting joint probability for the angular outcomes is factorized:
$$ p(\varphi_1, \varphi_{-1}|\lambda) = p(\varphi_1|\lambda)\, p(\varphi_{-1}|\lambda) . \tag{15} $$
Every unbiased estimator $\hat{\lambda}$ for the true wavelength $\lambda$ will have an MSE $(\Delta\hat{\lambda})^2$ and, according to eq. (9), this MSE is bounded from below by the inverse Fisher information:
$$ (\Delta\hat{\lambda})^2 \ge \frac{1}{F(\lambda)} = \frac{(\Delta\varphi)^2}{2 f_g^2} . \tag{16} $$
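A short simulation can make this bound concrete. The sketch below assumes the small-angle model in which the two measured angles are Gaussian with means +λf_g and -λf_g and common standard deviation Δφ, so that the Fisher information is 2f_g²/(Δφ)². The numerical values (a 633 nm source, 100 lines/mm, Δφ = 0.1 mrad) and the simple linear estimator are illustrative assumptions.

```python
import random

def wavelength_crlb(f_g, dphi):
    """Cramer-Rao bound on the wavelength variance for the two-channel Gaussian model."""
    fisher = 2.0 * f_g**2 / dphi**2
    return 1.0 / fisher

def simulate_linear_estimator(lam, f_g, dphi, trials=20000, seed=1):
    """Monte Carlo mean and MSE of the estimator (phi_plus - phi_minus) / (2 f_g)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        phi_plus = rng.gauss(lam * f_g, dphi)     # order m = +1
        phi_minus = rng.gauss(-lam * f_g, dphi)   # order m = -1
        estimates.append((phi_plus - phi_minus) / (2.0 * f_g))
    mean = sum(estimates) / trials
    mse = sum((e - lam) ** 2 for e in estimates) / trials
    return mean, mse

lam = 633e-9      # wavelength in metres (assumed value)
f_g = 1.0e5       # grating frequency in lines per metre (assumed value)
dphi = 1.0e-4     # angular uncertainty in radians (assumed value)

crlb = wavelength_crlb(f_g, dphi)
mean_estimate, mse_estimate = simulate_linear_estimator(lam, f_g, dphi)
```

With these numbers the bound corresponds to a wavelength uncertainty of roughly 0.7 nm, and the simulated MSE of the linear estimator should agree with the bound to within the statistical scatter of the simulation.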

Diffraction through a slit: estimating position
A different aspect of analyzing measurement uncertainties using the Fisher information can be demonstrated through the problem of single-slit diffraction. The measurement apparatus consists of the slit and a position-sensitive photodetector placed in the far-field zone behind the slit (see fig. 3). Since the position and wave vector $k = (k_x, k_y, k_z)$ are conjugate variables, one expects their uncertainties to be mutually related. We suppose that $z$ labels the axis of propagation and consider electromagnetic waves propagating from $z < 0$ towards a single slit of width $a$ placed in the $z = 0$ plane, with field amplitude $E(x, y = 0, z) \propto e^{i(k_z z + k_x x)}$. We make the additional assumption of paraxial propagation with respect to the $z$-axis; namely, $k_x \ll k_z$. If the slit is parallel to the $y$-axis and extends from $x = -a/2$ to $x = a/2$, the intensity distribution in $x$ (for any $y$) in the $z = 0$ plane reads
$$ I(x) \propto \begin{cases} 1, & |x| \le a/2 , \\ 0, & \text{otherwise} . \end{cases} \tag{17} $$
In the far-field limit, the field amplitude $e^{i k_x x}$ at $z = 0$ can be used to compute the intensity registered by the photodetector placed in the far-field plane at $z = d \gg a^2/\lambda$ using the Fraunhofer approximation [9]. The resulting intensity distribution is
$$ I(x'|k_x) \propto \mathrm{sinc}^2\!\left[\frac{a}{2}\left(\frac{k x'}{d} - k_x\right)\right] , \tag{18} $$
where $x'$ is the detector coordinate and $k = 2\pi/\lambda$. The joint parametric dependence on the position and wave number $k_x$ defines a relationship between the two physical quantities in wave propagation. The corresponding uncertainties $\Delta x$ and $\Delta k_x$ would also depend on this relationship. This means that when a measurement is made on these two quantities for the same light source, their uncertainties cannot vary arbitrarily. The identity of this relationship between these uncertainties is revealed by simply calculating
$$ (\Delta x)^2 = \frac{a^2}{12} , \qquad F(k_x) = \frac{a^2}{3} = 4 (\Delta x)^2 , \tag{19} $$
where $(\Delta x)^2$ is the variance of the uniform distribution (17) across the slit and $F(k_x)$ is the Fisher information about $k_x$ carried by the far-field distribution (18), so that we find the uncertainty relation [10]
$$ (\Delta x)^2 (\Delta\hat{k}_x)^2 \ge \frac{1}{4} . \tag{20} $$
One might be tempted to believe that the inequality in (20) forms the basis of Heisenberg's uncertainty relation $(\Delta x)^2 (\Delta p_x)^2 \ge \hbar^2/4$ just by a simple substitution of $p_x = \hbar k_x$. However, the inequality (20) has a completely different origin than the Heisenberg relation [11-14].
The classical inequality quantifies the principal limitation in performing statistical inference on the wave number $k_x$. In particular, for a fixed set of data, the product on the left-hand side varies with different estimation strategies. For instance, for the same set of position data collected, one may either choose to take the sample mean to obtain a linear estimator $\hat{k}_x$ for the unknown $k_x$, which gives an infinite MSE because of the slowly decaying tails of the distribution function in eq. (18), or one may choose the maximum-likelihood (ML) estimator, which will saturate this inequality (see fig. 4). Uncertainty relations based on the CRLB, such as that in (20), are extremely useful for evaluating the performance of complicated measurement schemes, as will be shown below.
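The classical uncertainty product can be verified numerically. The sketch below assumes that the far-field detection probability is a squared sinc function, p(u) = (a/2π) sinc²(au/2), where u is the transverse wave number associated with the detector position measured relative to k_x, so that k_x acts as a pure shift parameter. Under that assumption the Fisher information is a²/3, and with the slit variance a²/12 the product of the position variance and the CRLB equals 1/4. Function names and integration settings are illustrative.

```python
import math

def dsinc(s):
    """Derivative of sin(s)/s, with the series limit -s/3 near the origin."""
    if abs(s) < 1e-4:
        return -s / 3.0
    return (s * math.cos(s) - math.sin(s)) / s**2

def fisher_info_kx(a, s_max=2000.0, step=0.01):
    """Fisher information about k_x for p(u) = (a / 2 pi) * sinc^2(a u / 2).

    Writing p = g^2 gives (dp/du)^2 / p = 4 (dg/du)^2, which avoids dividing
    by the zeros of p. In the scaled variable s = a u / 2 the result is
    F = (a^2 / pi) * integral of sinc'(s)^2 over the real line.
    """
    n = int(s_max / step)
    # Midpoint rule on [0, s_max], doubled because the integrand is even.
    integral = 2.0 * step * sum(dsinc((k + 0.5) * step) ** 2 for k in range(n))
    # Leading-order contribution of both truncated tails: sinc'(s)^2 ~ cos^2(s)/s^2.
    integral += 1.0 / s_max
    return a**2 / math.pi * integral

a = 2.0                              # slit width in arbitrary units (assumed value)
fisher = fisher_info_kx(a)
slit_variance = a**2 / 12.0          # variance of a uniform distribution of width a
uncertainty_product = slit_variance / fisher   # (Delta x)^2 times the CRLB on k_x
```

The same squared sinc distribution has an infinite second moment, which is why a sample-mean estimator performs poorly here even though the Fisher information is finite.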

Resolution of a spectrometer
In sect. 4.1, the simple measurement of the laser-light wavelength was used to illustrate the basic concepts of the Fisher information and its relations to statistical inference. In this section, we discuss the implications of the Fisher information in evaluating the performance of spectrometers, which are devices used to observe the spectrum of an incoming signal, as sketched in fig. 5. Because a spectrum carries unique information about the material from which the light emanates, spectroscopy is an essential tool for material analysis.
The spectral resolution is an important parameter that determines the spectrometer performance. This parameter quantifies the spectrometer's ability to distinguish between two closely spaced spectral bands and observe fine structures in the spectrum. Just like the case for two-point resolution, the standard spectral resolution criteria are based on the widths of the spectral bands at the output of the spectrometer and do not take data post-processing methods into account. In practice, to identify the chemical composition of a sample, its corresponding spectrum does not need to be fully resolved. Characteristic information about the sample can alternatively be retrieved by digitally matching its spectrum to a set of previously recorded spectra [15][16][17][18]. In this section, the Fisher information will be employed to discuss spectral resolution in detail.
To study the resolution of a spectrometer, it is necessary to understand how this device works. The key component of the spectrometer is an optical prism or diffraction grid, a dispersive element that spatially separates the individual spectral components of the measured signal. Other important components include the input slit, collimating and focusing optical elements and a position-sensitive photodetector. An incoming signal passes through the slit placed in the focal plane of a lens and is collimated before impinging on the dispersive element. Assuming a plane wave incident on a grating, the angle of deflection $\theta'$ for the first diffraction order is governed by the grid equation
$$ \sin\theta' = \sin\theta + \lambda f_g \tag{21} $$
and depends on the signal wavelength $\lambda$, inclination angle $\theta$ and spatial frequency of the grid $f_g$. The output signal is collected by the second lens, whose optical axis is adjusted to coincide approximately with the direction of propagation for the diffraction order $+1$, and focused on a photodetector. Thereafter, the spectral component $\lambda' = \lambda + \Delta\lambda$ is imaged at a position
$$ x' \simeq \gamma\, z f_g\, \Delta\lambda \tag{22} $$
with respect to the optical axis of the focusing lens. Here $z$ is the focal length and $\gamma = 1/\sqrt{1 - (f_g \lambda)^2}$ is a scaling factor that arises from the paraxial approximation for the sine function in eq. (21). This factor approaches unity for small $f_g$. According to eq. (22), the separation of the spectral bands is proportional to the frequency $f_g$ of the grating and the focal length $z$ of the focusing element. One effect that needs to be considered during the discussion of spectrometer resolution is image blurring due to the different incident angles $\theta$ of the signal. Blurring arises from the $\theta$-dependent term in eq. (21). In order to avoid this false line splitting, the input signal is collimated before it intersects the grating. The quality of collimation strongly depends on the width $a$ of the collimator slit. Another important element of the imaging path is the excitation laser, which illuminates the specimen.
Consequently, interaction of a laser beam with the specimen results in a set of spectral bands that encode the chemical signatures of the specimen. Imperfect temporal coherence of the laser is reflected in the broadening of the spectral bands of interest. Finally, the image of the signal is obtained by convolving the signal bands broadened by the excitation laser profile, $S(x')$, with the scaled aperture function of the entrance slit (again, the parameter $\gamma$ plays the role of the scaling factor here):
$$ I(x') \propto \int S(x'')\, \mathrm{rect}\!\left(\frac{x' - x''}{\gamma a}\right) dx'' . \tag{23} $$
Assuming that the laser spectral band has a Gaussian profile of standard deviation $b$ centred at $x'_0$, the observed intensity turns out to be
$$ I(x') \propto \mathrm{erf}\!\left(\frac{x' - x'_0 + \gamma a/2}{\sqrt{2}\, b}\right) - \mathrm{erf}\!\left(\frac{x' - x'_0 - \gamma a/2}{\sqrt{2}\, b}\right) , \tag{24} $$
where erf(·) is the error function. Both the collimating and the focusing components are assumed to be free of aberrations.
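The appearance of the error function can be checked with a few lines of code. The sketch below assumes a single Gaussian band of standard deviation b convolved with a unit-area rectangular image of the slit of width w on the detector (how w relates to the physical slit width and to γ is left open here). It compares the closed-form difference of error functions with a direct numerical convolution. All names and parameter values are illustrative.

```python
import math

def band_profile(x, center, slit_width, b):
    """Gaussian band (std b) convolved with a unit-area rectangle of the given width."""
    w = slit_width
    upper = (x - center + w / 2.0) / (math.sqrt(2.0) * b)
    lower = (x - center - w / 2.0) / (math.sqrt(2.0) * b)
    return (math.erf(upper) - math.erf(lower)) / (2.0 * w)

def band_profile_numeric(x, center, slit_width, b, steps=2000):
    """Same profile from a midpoint-rule average of the Gaussian over the slit image."""
    w = slit_width
    dt = w / steps
    total = 0.0
    for k in range(steps):
        t = -w / 2.0 + (k + 0.5) * dt
        arg = (x - center - t) / b
        total += math.exp(-0.5 * arg * arg) / (math.sqrt(2.0 * math.pi) * b)
    return total * dt / w

closed_form = band_profile(0.7, 0.0, 2.0, 0.5)
brute_force = band_profile_numeric(0.7, 0.0, 2.0, 0.5)
```

For a slit image much narrower than b the profile is essentially the Gaussian band itself, while for a much wider one it approaches a flat-topped rectangle with smooth edges of width of order b.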
A good spectral resolution in the recorded data is not always necessary. Information about the sample is encoded not only in the number and position of individual spectral intensity peaks, but in the intensity profile as a whole. Not surprisingly, the Fisher information and the corresponding CRLB reveal this information. The discrete structure of the photodetector and the presence of noise in the data are incorporated in such an analysis. This is important because the detected signals are often very weak. The results of simulated spectroscopy of a pair of spectral bands are illustrated in fig. 6. The left panel shows the intensity profiles for three different separations of a pair of broadened spectral bands. The full width at half maximum (FWHM) of each spectral band is 2.8 cm$^{-1}$. This corresponds to a Rayleigh separation of approximately 3.1 cm$^{-1}$. The separation $d$ of the two bands can be estimated from the data. The right panel shows the Fisher information as a function of the true separation for three different noise levels. Here, Poissonian detection statistics is assumed. As expected, the information about $d$ gets smaller with decreasing band separation and increasing noise level. However, some information about the separation is always extractable, even in regimes well beyond the standard Rayleigh limit.
To turn the Fisher information about the spectral-band separation into a resolution limit that can be directly compared with the standard Rayleigh limit, we consider a simple hypothesis testing method based on the CRLB. We denote by $\sigma^2$ the CRLB for an efficient unbiased estimator of $d$, so that $\sigma$ is the corresponding standard deviation. With reference to the CRLB, the spectral bands are resolved (unresolved) if the zero separation $d = 0$ falls outside (inside) the interval $[d - 3\sigma, d + 3\sigma]$ about the true value. This limit is evaluated in fig. 7 for two different excitation laser bandwidths. Comparing the blue solid line with the left panel of fig. 6, we emphasize that clever post-processing of high-quality data brings about a resolution gain of at least an order of magnitude with respect to the standard heuristic resolution limits.
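The 3σ criterion can be turned into a small calculation. The sketch below is a simplified stand-in for the simulations described in the text: it assumes two equal Gaussian bands of standard deviation s separated by d, a pixelated detector, and Poissonian counts with a constant background per pixel (kept positive so that no pixel has zero mean). The band shape, photon number, pixel grid and all function names are assumptions made for illustration, and the numbers are not those of figs. 6 and 7.

```python
import math

def gauss_cdf(x, mu, s):
    return 0.5 * (1.0 + math.erf((x - mu) / (math.sqrt(2.0) * s)))

def pixel_means(d, n_photons, s, edges, background):
    """Mean Poissonian counts per pixel for two equal bands at -d/2 and +d/2."""
    means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = 0.5 * sum(gauss_cdf(hi, c, s) - gauss_cdf(lo, c, s)
                      for c in (-d / 2.0, d / 2.0))
        means.append(n_photons * p + background)
    return means

def fisher_separation(d, n_photons, s, edges, background, h=1e-4):
    """Poissonian Fisher information about d: sum of (d mean_j / d d)^2 / mean_j."""
    up = pixel_means(d + h, n_photons, s, edges, background)
    down = pixel_means(d - h, n_photons, s, edges, background)
    mid = pixel_means(d, n_photons, s, edges, background)
    return sum(((u - v) / (2.0 * h)) ** 2 / m for u, v, m in zip(up, down, mid))

def resolution_limit(n_photons, s, edges, background, d_grid):
    """Smallest separation on the grid for which d = 0 lies outside [d - 3s, d + 3s]."""
    for d in d_grid:
        sigma = 1.0 / math.sqrt(fisher_separation(d, n_photons, s, edges, background))
        if d - 3.0 * sigma > 0.0:
            return d
    return None

s = 1.0                                        # band standard deviation (assumed)
edges = [-8.0 + 0.25 * k for k in range(65)]   # 64 pixels covering [-8, 8]
d_grid = [0.01 * k for k in range(1, 301)]
limit_low_noise = resolution_limit(1.0e6, s, edges, 1.0, d_grid)
fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * s
```

In this toy model the Fisher information vanishes quadratically as d goes to zero, yet with a million detected photons the 3σ limit already sits far below the width of a single band, in line with the qualitative statement about the gain from post-processing.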

Superresolution of extended objects
Consider the imaging of an extended object of amplitude $U_o$ by a linear spatially-invariant optical system described by the impulse response function $h(\cdot)$. The input-output transformation in eq. (1) for intensities (incoherent imaging), or complex amplitudes (coherent imaging), is considerably simplified in the spatial frequency domain,
$$ G_i(f) = H(f)\, G_o(f) . \tag{25} $$
Fig. 8. Superresolution of the optical imaging system (IS). The spatially bounded phase grating (PG) is illuminated by a plane wave. If the grid is blazed, only signals of one diffraction order are propagated toward the imaging system. Even if the frequency of the phase grating is higher than the cutoff frequency of the imaging system, information about the frequency is still present in the position-sensitive detector (PD), which is placed in the focal plane of the imaging system.
Physically, the spectrum of the image $G_i$ is obtained by simply applying a frequency filter, described by a transfer function $H(\cdot)$, to the spectrum of the object $G_o$. Transfer functions represent important characteristics of imaging systems. They describe how well the individual spatial frequencies of the object spectrum are mapped to the final image spectrum, and are thus related to the resolution discussed previously. Indeed, fine details, sharp edges, etc. of the object correspond to high-frequency components of its spectrum. The filtering of these components thus results in blurring and a reduction in resolution. The standard resolution criterion is based on the highest frequency transmitted, $f_{\max}$. Without loss of generality, we focus on spatially invariant diffraction-limited coherent imaging systems, whose transfer functions can be, under mild conditions, shown to be scaled versions of the system aperture functions. Generalization to incoherent illumination or imperfect systems is straightforward. The finite spatial extent of any real aperture means that the corresponding transfer function is bounded and there is always a frequency cutoff beyond which spectral transfer drops to zero. Signals of higher frequencies are completely filtered out from the image and the imaging system therefore acts as a low-pass frequency filter. One would think that the resolution of a real imaging system is always limited to the size of the cutoff frequency period $1/f_{\max}$ and that an object modulation faster than this cutoff is not observable. Fortunately, it turns out that the resolution can be extended far beyond this limit, provided that some prior information about the object is available.
To keep the analysis as simple as possible we discuss imaging of a simple 1-D periodic object modulation $U_o(x) = \exp(-i 2\pi f_g x)$, where $f_g$ is the modulation frequency (see fig. 8). This frequency is assumed to exceed the cutoff frequency $f_{\max}$ of the imaging system with a finite rectangular aperture ($f_g > f_{\max}$). Since the corresponding object spectrum $G_o(f) \propto \delta(f - f_g)$ is a delta function peaked at $f_g$, no information about the object is available in the image:
$$ G_i(f) = H(f)\, G_o(f) \propto H(f_g)\, \delta(f - f_g) = 0 . $$
The situation drastically changes with a realistic object of finite size. In this case, the delta-function spectrum of the oscillatory modulation is broadened by a convolution with the Fourier transform of the rectangular function defining the object length $\ell$:
$$ G_o(f) \propto \mathrm{sinc}[\pi \ell (f - f_g)] . $$
Although the spectrum remains centered outside the support of $H(f)$, notice that the "tails" of the sinc function overlap with the transmitted range of frequencies (see fig. 9) and information about $f_g$ is now accessible despite the fact that $f_g$ is larger than the cutoff $f_{\max}$, since
$$ G_i(f) \propto H(f)\, \mathrm{sinc}[\pi \ell (f - f_g)] \not\equiv 0 \quad \text{for } |f| \le f_{\max} . $$
This discussion can be generalized to any spatially bounded object [9]. The Fourier spectrum of any spatially bounded function is unbounded and analytic. As a consequence, the full spectrum can, in principle, be reconstructed by analytic continuation from the observable frequency region. The Fisher information can be used to quantify the resolution limits of such superresolution techniques.
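How much of the object spectrum leaks into the passband can be estimated with a short calculation. The sketch below assumes a unit-energy object exp(-i2πf_g x) truncated to a window of a given length, whose spectral density is the squared sinc function centred at f_g, and an ideal rectangular transfer function that passes |f| ≤ f_max. The function names and the numerical values are illustrative.

```python
import math

def object_spectrum_density(f, f_g, length):
    """|G_o(f)|^2 for the truncated modulation, normalized to unit total energy."""
    arg = math.pi * length * (f - f_g)
    s = 1.0 if abs(arg) < 1e-12 else math.sin(arg) / arg
    return length * s * s

def transmitted_fraction(f_g, f_max, length, samples_per_lobe=50):
    """Fraction of the object's spectral energy inside the passband |f| <= f_max."""
    n = max(1000, int(2 * f_max * length * samples_per_lobe))
    step = 2.0 * f_max / n
    # Midpoint rule over the passband.
    return step * sum(object_spectrum_density(-f_max + (k + 0.5) * step, f_g, length)
                      for k in range(n))

f_max = 10.0
inside = transmitted_fraction(5.0, f_max, 10.0)          # modulation below the cutoff
short_object = transmitted_fraction(13.0, f_max, 1.0)    # 130% of cutoff, short object
long_object = transmitted_fraction(13.0, f_max, 10.0)    # 130% of cutoff, long object
```

For a modulation above the cutoff the transmitted fraction is small but nonzero, and it shrinks roughly in inverse proportion to the object length, approaching the zero-information limit of an infinitely extended object.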
To demonstrate this, let us return to the example of an amplitude grating of finite size $\ell$. This example is sketched in fig. 9. Only diffraction of a single order shall concern us here, which is a consequence of illuminating a blazed grid with a plane wave. The present goal is to estimate the grid frequency $f_g$ from the image spectrum recorded with a position-sensitive photodetector. In the superresolution regime, most of the object intensity is filtered out according to the system transfer function and noise becomes an important factor. The results that are summarized in fig. 10 show the normalized Fisher information and the MSE against the true frequency and the data noise-to-signal ratio NSR. By increasing the modulation frequency $f_g$ of the object, the Fisher information decreases and the MSEs become larger, as expected. The higher the grid frequency $f_g$, the more difficult its estimation. In addition, we observe that spectral observations in the standard resolution regime and the superresolution regime differ strongly as far as the sensitivity of the measurement to data noise is concerned: While increasing the noise from NSR = 100 to NSR = 10 has little effect on the estimation of $f_g$ at 50% of the cutoff frequency $f_{\max}$, for modulations at 130% of $f_{\max}$, the MSE is more than doubled and the inference of even faster modulations requires data that are almost noiseless. As such, we conclude that resolution limits of a given imaging system cannot be determined before the noise properties and a statistical model for the signal detection are specified, contrary to the overly simplified treatments based on the geometry of the PSF alone that can be found in some standard textbooks (see [3] for instance).
Finally, in fig. 11, the role of the object size in superresolution estimation is discussed. For objects of infinite size, the Fisher information about the frequencies exceeding the cutoff is zero. By decreasing the size of the object, information about low-frequency features is reduced, yet inference in the superresolution regime becomes possible. Smaller object sizes, or a larger amount of prior information about the object, allow the resolution to be extended far beyond the imaging-system cutoff. As mentioned above, high-quality data with very little noise and few other systematic errors are required to exploit this benefit.

Biased estimation strategies: beating the Cramér-Rao Lower Bound
Thus far, this work has been devoted to the discussion of the CRLB and its implications in the analysis of optical measurements. These results are, strictly speaking, only applicable to unbiased estimation strategies. To complete the picture, we shall remind the reader of the possibility of beating the statistical inference limits set by the CRLBs by looking at estimators that are biased.
In sect. 4.1, we investigated the inference limit of wavelength estimation by considering angular measurements of two diffraction spots of orders $m = 1$ and $m = -1$. This limit is defined by the CRLB in eq. (16). Evidently, the linear unbiased estimator that saturates this inequality is given by
$$ \hat{\lambda}_{\mathrm{BLUE}} = \frac{d}{2}\,(\varphi_1 - \varphi_{-1}) , $$
where $d = 1/f_g$ is the grating spacing. This efficient estimator is also known as the best linear unbiased estimator (BLUE) [19,20] that minimizes the variance for the problem. One can understand why this linear estimator is the BLUE by recognizing that this estimator maximizes the likelihood $p(\varphi_1, \varphi_{-1}|\lambda)$, which is in fact a Gaussian distribution for the random vector $(\varphi_1, -\varphi_{-1})$ with mean vector $(\lambda/d, \lambda/d)$ and covariance matrix $\mathrm{diag}(\sigma^2, \sigma^2)$, and so the statement of asymptotic efficiency, which states that the maximum-likelihood (ML) estimator of a Gaussian likelihood achieves the CRLB, holds [21]. On the other hand, one may choose another estimator,
$$ \hat{\lambda} = d \sqrt{\frac{\varphi_1^2 + \varphi_{-1}^2}{2}} , \tag{29} $$
which takes the root-mean-square (RMS) of the data. This estimator is no longer unbiased since $\overline{\hat{\lambda}} \neq \lambda$. One can check that the variance $(\Delta\hat{\lambda})^2 = \overline{(\hat{\lambda} - \lambda)^2}$ of this positive estimator is given by
$$ (\Delta\hat{\lambda})^2 = d^2 \left\{ \sigma^2 + 2\mu^2 - \sqrt{\pi}\, \mu \sigma\, e^{-\mu^2/2\sigma^2} \left[ \left(1 + \frac{\mu^2}{\sigma^2}\right) I_0\!\left(\frac{\mu^2}{2\sigma^2}\right) + \frac{\mu^2}{\sigma^2}\, I_1\!\left(\frac{\mu^2}{2\sigma^2}\right) \right] \right\} , $$
with $\mu = \lambda/d$, $\sigma = \Delta\varphi$, and $I_n(y)$ the modified Bessel function of the first kind of order $n$. Note that with this positive estimator, it is possible to achieve an MSE smaller than the CRLB. For simplicity, let us take the spacing $d$ to be of unit length and $\lambda$ to be a scaled wavelength relative to $d$. When $\lambda \gtrsim \Delta\varphi/2$, the MSE for the biased estimator in eq. (29) is always smaller than that for the BLUE, with the largest improvement typically near $\lambda = \Delta\varphi$ and $(\Delta\hat{\lambda})^2 \approx (\Delta\varphi)^2/2 = \mathrm{CRLB}$ in the "infrared" regime ($\lambda \gg \Delta\varphi$). More generally, under certain conditions, it is possible to find a more accurate biased estimator that gives a lower MSE than the CRLB for unbiased estimators. This possibility is not surprising and has been well known in the theory of statistical inference for decades [22-24].
When the spots of the two diffraction orders are closely spaced ($\lambda \lesssim \Delta\varphi/2$), the MSE for the biased estimator is higher than that of the BLUE and we have $(\Delta\hat{\lambda})^2 \approx (\Delta\varphi)^2 = 2 \times \mathrm{CRLB}$ in the "ultraviolet" regime ($\lambda \ll \Delta\varphi$), so that the BLUE is the better choice (see fig. 12). However, note that the result of eq. (16) applies under the condition that the scaled $\lambda$ is much larger than the angular uncertainty $\Delta\varphi$. In a more realistic experiment, when the orders are spaced too closely, the diffraction spots overlap and are in general impossible to distinguish, so that the angular measurement of a specific diffraction order loses its meaning. Therefore, these arguments make physical sense only for moderate to large values of the scaled $\lambda$ relative to $\Delta\varphi$, for which a biased estimation strategy may prove to be much more reliable for the same data.
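The trade-off between the two estimators can be reproduced with a small Monte Carlo experiment. The sketch below assumes the Gaussian two-channel model with means +λ/d and -λ/d and common standard deviation Δφ, takes the linear estimator to be d(φ₁ - φ₋₁)/2, and takes the RMS estimator to be d·sqrt[(φ₁² + φ₋₁²)/2]. These explicit forms, the unit choices d = 1 and Δφ = 1, and all names are assumptions of the example.

```python
import math
import random

def compare_estimators(lam, dphi, d=1.0, trials=50000, seed=7):
    """Monte Carlo MSE of the linear (unbiased) and RMS (biased) wavelength estimators."""
    rng = random.Random(seed)
    mu = lam / d
    se_linear = 0.0
    se_rms = 0.0
    for _ in range(trials):
        phi_plus = rng.gauss(mu, dphi)      # order m = +1
        phi_minus = rng.gauss(-mu, dphi)    # order m = -1
        linear = d * (phi_plus - phi_minus) / 2.0
        rms = d * math.sqrt((phi_plus**2 + phi_minus**2) / 2.0)
        se_linear += (linear - lam) ** 2
        se_rms += (rms - lam) ** 2
    return se_linear / trials, se_rms / trials

dphi = 1.0
crlb = dphi**2 / 2.0                               # bound for unbiased estimators, d = 1
mse_linear_mid, mse_rms_mid = compare_estimators(1.0, dphi)      # lambda = dphi
mse_linear_small, mse_rms_small = compare_estimators(0.1, dphi)  # lambda << dphi
```

In such a run the RMS estimator should fall below the unbiased bound when λ is comparable to Δφ and rise towards twice the bound when λ is much smaller than Δφ, while the linear estimator stays at the bound throughout.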
As an additional assurance, we mention here that if the objective is to supply an estimator that gives the minimal MSE possible, then the bias of this estimator is not a real problem as long as it is a consistent estimator. For the biased estimator in eq. (29), as long as the accuracy of the measured values $\varphi_1$ and $\varphi_{-1}$ is high, this estimator approaches the true value.

Conclusions
We have discussed, in this tutorial review, various applications of the Fisher information in optics. Through these applications, we highlighted that the Cramér-Rao lower bound derived from this information measure sets the limits of optical resolution, both in the standard Rayleigh regime and in the superresolution regime. These concepts also give rise to fundamental inequalities that appear in wave optics and are particularly useful in evaluating the performance of optical imaging systems. As a final note, we remind the reader that more precise estimation strategies can be employed to beat the limit set by the Cramér-Rao bound in statistical inference if some prior information about the object of interest is known, and in some cases, there is a remarkably large improvement for the corresponding estimators over those obtained with common unbiased estimation strategies.
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.