Fluorescent X-ray Scan Image Quality Prediction

This paper summarizes approaches to image quality prediction in support of an effort under the IARPA RAVEN program to demonstrate a non-destructive, tabletop X-ray microscope for high-resolution 3D imaging of integrated circuits (ICs). The fluorescent X-rays are generated by scanning an electron beam along an appropriately patterned target layer placed in front of the sample and are then detected after passing through the sample by a high-resolution (in both solid angle and energy) backside sensor array. The images are created by way of a model-based tomographic inversion algorithm, with image resolution depending critically on the electron beam scan density and diversity of sample orientations. We derive image quality metrics that quantify the image point spread function and noise sensitivity for any proposed experiment design. Application of these metrics will guide final system design when physical data are not yet available.


Introduction
Image quality (IQ) assessment is an important element of system design, optimization, and quality control [1,2]. A complete assessment evaluates the entire imaging chain, including both the data acquisition and image reconstruction stages. Retrospective IQ measurement is based on careful analysis of images taken in a series of laboratory measurements, often using specialized phantoms. In contrast, prospective image quality prediction is based on a high-quality end-to-end simulation of the system data product and of the image reconstruction algorithm subsequently applied to those data. A key application is to guide system design in advance of hardware construction, when physical data are not yet available. Similarly, it allows one to tune or customize an existing system to a particular set of specialized tasks under conditions where the retrospective approach is time-consuming or otherwise infeasible (e.g., target sensitivity to, or potential degradation under, multiple measurements). Given an accurate system forward model, IQ assessment is relatively straightforward in cases of "direct" image reconstruction algorithms, in which a relatively simple (though perhaps numerically intensive) forward algorithm is applied to the data to obtain an image. An example is back-projection image formation, which is also linear in the data. Various Fourier transform methods (e.g., SAR or range-Doppler processing) are special cases. Construction of various IQ metrics, such as point spread functions, is equally direct.

Distribution Statement A. Approved for public release: distribution unlimited.

Peter B. Weichman (peter.weichman@baesystems.com), Eugene M. Lavely (eugene.lavely@baesystems.com), Fast Labs, BAE Systems, Electronic Systems, 600 District Avenue, Burlington, MA 01803, USA
Less straightforward are imaging algorithms that rely on model-based iterative reconstruction (MBIR). The resulting algorithm may be highly nonlinear in the data, and IQ metrics may therefore vary strongly with position, noise levels, source and detector blur, etc. This increases both the complexity of, and the necessity for, IQ assessment. Here, we develop an IQ assessment approach for the fluorescent X-ray tomography system (Fig. 1). We define and evaluate a set of rigorous IQ metrics informed by the MBIR algorithm, based on the Poisson statistics of photon counting. Specific forward model elements for this system that must be accurately modeled include focal spot blur (finite-size X-ray source region in the target), cross-talk between superconducting detector pixels when recording the change in conductivity due to a single photon capture, and noise correlations. This paper is intended to provide a detailed theoretical basis for a general set of IQ metrics.

Fig. 1 Top: the microscope interior chamber, highlighting the electron column, the electron detectors (SE, BSE), and the TES-based [3] transmission X-ray detector assembly. The target/sample assembly is mounted in the sample holder and positioned for imaging with a translation and rotation stage. The detector package is mounted in a bellows inserted into a "receiver tube" for high proximity to the X-ray source. The instrument is a customized adaptation of Orsay Physics' "NanoSpace" instrument, developed collaboratively with Orsay Physics. Bottom: target/sample assembly and stimulated signals. The tomographic data collection uses fluorescent X-rays induced in a metallic thin film deposited onto the prepared backside of the sample. To obtain the desired resolution, the sample must be decapsulated and the silicon backside thinned to within a few hundred nanometers of the semiconductor device layer prior to the deposition step.
Specific illustrative applications to the RAVEN IC tomography problem will be presented separately.

System Forward Model
In the X-ray microscope system (Fig. 1), the collected fluorescent photon count data forms the basis for a tomographic inversion of the sample structure. We begin by accounting for the physics and geometry of the measurement, assuming ideal sample stage (translation and rotation) operation, perfect knowledge of the electron beam location, and ideal photon detector behavior. Later, we will include effects of sample stage and electron beam uncertainty, as well as non-ideal detector effects such as pixel cross-talk and noise properties.

Measurement Physics
In what follows, we choose a 3D coordinate system that is fixed in the sample. Of course, it is generally the sample that is translated and rotated while the electron beam and detector apparatus remain fixed. However, it is the detailed sample structure that is the desired end-product of the tomographic inversion, and this takes the form of a set of fixed functions of position within the sample volume. In this frame, the relative positions and orientations of the source, receiver, and electron beam must be carefully tracked for each measurement.
For an idealized point source x_S and point receiver x_R, the mean count rate for a chosen fluorescent photon with sharp energy E is modeled in the form

\bar{n}(x_S, x_R; E) = \frac{n_0(x_S, E)}{4\pi |x_R - x_S|^2}\, e^{-M(x_S, x_R; E)} + n_B(E),    (1)

where n_0 is the raw fluorescent photon production rate and n_B is the background rate (due to bremsstrahlung, multiple scattering, etc.) of continuum photons deposited in the energy bin \Delta E about E. The first term represents the photon intensity derived from the physical optics approximate solution to the 3D wave equation, including geometric spreading and energy dissipation. For the latter, fluorescent photons are absorbed (or scattered to lower energy) according to the line integral (valid in the weak absorption limit characteristic of higher energy X-rays)

M(x_S, x_R; E) = \int_0^{|x_R - x_S|} \mu[x(s), E]\, ds,    (2)

in which

x(s) = x_S + s\, \hat{e}_{SR}, \quad \hat{e}_{SR} = \frac{x_R - x_S}{|x_R - x_S|},    (3)

defines the ray, parameterized by physical distance 0 ≤ s ≤ |x_R − x_S|, connecting the two endpoints. The function μ(x, E) is the local absorption rate as a function of 3D location x, commonly modeled as a linear superposition

\mu(x, E) = \sum_A \sigma_A(E)\, n_A(x),    (4)

in which the index A ranges over atomic elements (Au, Al, Si, Cu, etc.), σ_A(E) is the corresponding absorption cross section, and n_A(x) is the element number density. For specified electron beam entry center point x_S^0, we write the position-dependent fluorescent production rate in the form

n_0(x_S, E) = \bar{n}_0\, S(x_S; E, x_S^0, \hat{e}_S^0),    (5)

in which the source distribution function S is normalized to unit integral in the first argument, and is now explicitly parameterized as well by the electron beam energy E (through fluorescent production efficiency, electron multiple scattering, etc.), the beam center point x_S^0, and the incident direction \hat{e}_S^0 of that beam. Thus,

\bar{n}_0 = \int d^3 x_S\, n_0(x_S, E)    (6)

represents the total integrated production rate, in general also depending on energy and incident beam direction, but this will normally be suppressed from the notation. We now use S to define an average of any function F(x_S, x_R) of the source and receiver points by

\langle F \rangle = \int d^3 x_S\, S(x_S)\, \frac{1}{A_R \cdot \hat{e}_{SR}^0} \int_{A_R} d^2 x_R\, F(x_S, x_R),    (7)

which includes separate averages over the 3D source region volume and over the 2D receiver pixel area A_R.
The result is defined by the entry center point x_S^0 of the electron beam on the target surface, the receiver center coordinate x_R^0, and the corresponding mean incident X-ray flux direction

\hat{e}_{SR}^0 = \frac{x_R^0 - x_S^0}{|x_R^0 - x_S^0|}.

The receiver area integral is normalized by the total projected area A_R \cdot \hat{e}_{SR}^0 along the incident X-ray flux direction. More generally, one could also incorporate a receiver sensitivity profile that is not uniform over the pixel.
With the above definitions, for a finite-size source and receiver, the actual mean measured count rate is expressed in the form

\bar{n}(x_S^0, x_R^0; E) = \bar{n}_0\, \frac{\Delta\Omega_R}{4\pi}\, \langle e^{-M} \rangle + \bar{n}_B,    (8)

in which \bar{n}_B is the integrated background rate, and

\Delta\Omega_R = \frac{A_R \cdot \hat{e}_{SR}^0}{|x_R^0 - x_S^0|^2}    (9)

is the solid angle subtended by the receiver pixel; in the denominator, we have simply substituted |x_R − x_S| → |x_R^0 − x_S^0| under the reasonable assumption that the source region and pixel diameters are very small compared to their separation. The background term is not resolved in detail here since it is more difficult to model quantitatively and is assumed to be a smooth function of E, which can be fit and subtracted directly from the data. Although in principle present, any residual dependence of the background on μ is neglected. The function S encompasses the entire fluorescent X-ray creation process, including incident electron beam shape and energy spectrum, fluorescing layer thickness, and orientation. All of these need to be included in the system model. For a horizontally homogeneous target, the extra dependence of S on x_S^0 drops out (with only the difference variable x_S − x_S^0 surviving). However, target inhomogeneity will in general be present (e.g., through varying deposited film thickness).
The source region may be crudely thought of as a cylinder with axis oriented along \hat{e}_S^0 and diameter determined by electron beam scattering characteristics. More generally, S would either need to be determined via prior calibration measurements in the absence of the sample, or modeled through Monte Carlo simulations of the electron beam, with given energy and geometry, interacting with the target with given atomic composition and geometry [4]. Clearly, detailed calibration measurements would be most desirable, but these are unlikely to be available given that the target film is most likely directly applied to the sample in our setup (Fig. 1). In the absence of such measurements, an effort should be made to produce as uniform a film as possible. With sufficient measurement diversity, it turns out that although it is essential that the electron beam position be tracked to high precision, the source intensity (hence film thickness) may to a certain extent be included as part of the inversion. Thus, given a sufficiently large combination of source and receiver pixel positions, there may be sufficient redundancy in the geometry of rays passing through the sample to refine the estimate for the amplitude \bar{n}_0 in Eq. 8 based on the sample-present measurements only (see, e.g., [5] and references therein).
Of course, for tomographic purposes, the critical dependence of Eq. 8 is on the variation of M with the precise fluorescent photon path, which could be quite complex due to discontinuities of μ across sample constituent boundaries. As alluded to above, the coordinate system is defined by the fixed function μ(x), while all other parameters, including the detailed geometry of the function S, vary with the measurement. The aim is to reconstruct μ(x) from an appropriately large set of line integral estimates \bar{M}_l, after subtraction of the continuous background, and properly accounting for photon Poisson statistics in the low-count regime (see Section 3).
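As an aside, the line integral (2) is straightforward to approximate numerically. The following minimal sketch (illustrative only; the uniform unit-spaced voxel grid and midpoint sampling are assumptions of this example, not the system's actual discretization) evaluates M = ∫ μ ds along a single source-receiver ray:

```python
import numpy as np

def ray_line_integral(mu, x_s, x_r, n_steps=1000):
    # Approximate M = \int mu ds along the segment x_s -> x_r (Eq. 2 style)
    # by midpoint sampling on a unit-spaced voxel grid (an assumption of
    # this sketch, not the paper's discretization).
    x_s, x_r = np.asarray(x_s, float), np.asarray(x_r, float)
    length = float(np.linalg.norm(x_r - x_s))
    ds = length / n_steps
    t = (np.arange(n_steps) + 0.5) / n_steps      # midpoint parameters
    pts = x_s + t[:, None] * (x_r - x_s)          # sample points on the ray
    idx = np.clip(pts.astype(int), 0, np.array(mu.shape) - 1)
    return ds * mu[idx[:, 0], idx[:, 1], idx[:, 2]].sum()

# uniform absorber: M should equal mu0 times the path length
mu = np.full((8, 8, 8), 0.05)
M = ray_line_integral(mu, [0.5, 0.5, 0.5], [7.5, 7.5, 7.5])
```

A uniform absorber provides a simple correctness check (M = μ₀ × path length); a production implementation would instead use exact ray-voxel intersection lengths w_j (e.g., a Siddon-type traversal).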

Simplified form
Equation 8 implies, in general, a very complicated exponential relationship between the source and receiver profiles and the key line integrals. However, there are natural conditions under which the averages over these profiles can be moved directly to the line integrals themselves. Specifically, the experimental design and X-ray energies are deliberately selected so that e^{−M(x_S, x_R; E)} is large enough that a reasonable fraction of the n_0(x_S, E) source photons make it through the sample, generating a correspondingly reasonable fluorescence count rate, distinguishable from the background. Thus, M(x_S, x_R; E) = O(1), and we further assume that, for most source-receiver pairs,

|\delta M(x_S, x_R; E)| \equiv |M(x_S, x_R; E) - \bar{M}(x_S^0, x_R^0; E)| \ll 1.    (10)

There are clearly cases where this will fail, such as when a ray happens to run for a long distance parallel and immediately adjacent to a straight metal-nonmetal boundary. However, this will likely be a rare occurrence: much more commonly, slight adjustments of a ray will produce similarly small changes in the material profile along the ray, hence only slight adjustments to the net line integral. Note that if this were not the case, then the experimental resolution requirements are likely not being met, and the inversion will not produce an accurate result.
Proceeding under the assumption that the experimental design ensures the validity of Eq. 10, to high accuracy one obtains

\langle e^{-M} \rangle = e^{-\bar{M}} \langle e^{-\delta M} \rangle \approx e^{-\bar{M}},    (11)

in which

\bar{M}(x_S^0, x_R^0; E) = \langle M(x_S, x_R; E) \rangle    (12)

is the measurement average of the line integral. Substituting (11) into (8), the mean photon count rate follows in the form

\bar{n}(x_S^0, x_R^0; E) = \bar{n}_0\, \frac{\Delta\Omega_R}{4\pi}\, e^{-\bar{M}(x_S^0, x_R^0; E)} + \bar{n}_B.    (13)

Higher order terms, O(⟨δM²⟩), are neglected. Equation 13 conveniently allows one to treat the measured photon count as a single Poisson process involving a "cone volume" average of rays, rather than a complex superposition of Poisson processes for each individual ray in the cone. This greatly simplifies the inversion scheme. For an actual microchip, there will be structure on many length scales, requiring a multi-scale approach to estimate μ (or, more specifically, the atomic densities n_A) constrained by available prior information.
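The accuracy of the simplified form rests on the smallness of the spread δM across the source-receiver cone. A quick numerical sketch (the Gaussian spread of line-integral values is an assumption made purely for illustration) confirms that ⟨e^{−M}⟩ and e^{−M̄} agree to O(⟨δM²⟩):

```python
import numpy as np

rng = np.random.default_rng(0)
# Line-integral values for the rays in one source-receiver "cone":
# M = O(1) with a small spread (the validity condition, Eq. 10 style).
M = 1.0 + 0.02 * rng.standard_normal(10_000)
exact = np.exp(-M).mean()          # full cone average <e^{-M}>
approx = np.exp(-M.mean())         # simplified form e^{-Mbar}, Eq. 11
rel_err = abs(exact - approx) / exact   # expected O(<dM^2>/2) ~ 2e-4
```

Increasing the spread toward δM = O(1) makes the discrepancy grow quadratically, which is the regime where the single-Poisson-process treatment breaks down.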

Measurement Imperfections
In the above discussion, we have assumed perfect knowledge of the source positions x S and the associated function S . These will be impacted by electron beam wandering and by sample stage translation and rotation uncertainties. We have also not yet accounted for detector uncertainties, such as pixel cross-talk and multiple photon absorption within a detection time window, that will generate, respectively, photon count and energy estimate errors.
For typical CT scans, which use scintillator detectors, the X-ray photon is not directly absorbed; rather, it creates a cascade of visible-light photons that are then converted to an electronic signal. Scattering of these photons between pixels can be an important blurring effect.
In contrast, the TES detector pixel fully absorbs an X-ray photon, generating an equilibrium heating effect that translates to a superconducting-element conductivity change that in turn measures the photon energy [6]. The detailed equilibration process is very complicated and might also deposit energy in more than one pixel. However, the energies deposited in the individual pixels, while summing to E, would each be less than E, hence would appear as part of the background bremsstrahlung. This leads to a fluorescent X-ray undercount rather than a spreading of the count [2]. Such processes might then be modeled in the form [1,2]

\bar{n} \to \varepsilon(x_R^0, E)\, \bar{n},    (14)

in which ε(x_R^0, E) ≤ 1 is an overall efficiency factor for the detector pixel defined by x_R^0. This is significantly simpler than the scintillator blurring model that replaces the scalar ε by a matrix B that mixes the count among nearby pixels [2].

Tomographic Inversion
The MBIR result for the absorption function μ(x) is based on minimization of an appropriate objective function,

\hat{\mu} = \arg\min_{\mu} \Phi(\mu, n),    (15)

in which the array μ now represents the absorption function values over an M-element sample grid, and the array n represents the (integer photon count) data. The function takes a penalized likelihood (PL) form,

\Phi(\mu, n) = -L(\mu, n) + \beta R(\mu),    (16)

with log-likelihood term L, quantifying the photon count Poisson statistics, and regularization term R, which encompasses all sample prior knowledge. The latter includes prior constraints on the decomposition (4), such as material stepwise uniformity (with sharp, flat interfaces), metal interconnect geometry, and expected metal types in various sample regions. The relative weight β controls the balance between data fidelity and regularization, and its optimal choice must be part of an investigation using various known test samples. For the present problem, the data consist of independent photon count measurements for each source-receiver pair, with mean count

\bar{n}_l[\mu] = b_l\, a_l[\mu] + r_l,    (17)

where τ_l is the dwell time, and we define the following:

b_l = \tau_l\, \bar{n}_{0,l}\, \frac{\Delta\Omega_{R,l}}{4\pi}, \quad r_l = \tau_l\, \bar{n}_{B,l}, \quad a_l[\mu] = \langle e^{-M} \rangle_l.    (18)

These are, respectively, the source mean photon count (determined by target and e-beam properties), the mean background count (obtained, e.g., by a smooth fit over a range of energies surrounding the fluorescent energy), and the attenuation factor. The first two are, for now, assumed accurately known, while the latter contains the μ-dependence targeted by the inversion. For simplicity, additional dependence on beam incident direction \hat{e}_S, beam geometry, etc., has been suppressed from the notation, and extensions to problems where \bar{n}_0 is also uncertain [5] will not be treated here. The Poisson likelihood takes the form of an independent sum over all measurements,

L(\mu, n) = \sum_l h_l[n_l, \bar{n}_l(\mu)],    (19)

in which

h_l[n_l, \bar{n}_l] = n_l \ln \bar{n}_l - \bar{n}_l - \ln(n_l!),    (20)

with n_l being the actual measured (integer) photon count.
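The penalized-likelihood structure can be sketched as follows. This is an illustrative toy (the weight matrix W, the counts b_l and r_l, and the placeholder quadratic regularizer are all invented for the example) using the simplified mean count n̄_l = b_l e^{−w̄_l·μ} + r_l:

```python
import numpy as np
from math import lgamma

def poisson_loglik(n, nbar):
    # L = sum_l [ n_l ln(nbar_l) - nbar_l - ln(n_l!) ], Eq. 19-20 style
    return sum(nl * np.log(nb) - nb - lgamma(nl + 1.0)
               for nl, nb in zip(n, nbar))

def objective(mu, n, W, b, r, beta, R):
    # Penalized-likelihood objective Phi = -L + beta*R with the simplified
    # mean count nbar_l = b_l exp(-(W mu)_l) + r_l (W, b, r invented here)
    nbar = b * np.exp(-W @ mu) + r
    return -poisson_loglik(n, nbar) + beta * R(mu)

rng = np.random.default_rng(3)
W = rng.uniform(0.2, 1.0, (5, 3))        # blur-averaged intersection lengths
mu_true = np.array([0.4, 0.7, 0.2])
b = np.full(5, 2000.0)                   # source counts b_l
r = np.full(5, 8.0)                      # background counts r_l
n = b * np.exp(-W @ mu_true) + r         # "noise-free" data n = nbar[mu_true]
R_quad = lambda m: float(m @ m)          # placeholder quadratic regularizer
```

With noise-free data n = n̄[μ_true] and β = 0, the objective is minimized at the true absorption values, since each h_l is maximized at n̄_l = n_l.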

Simplified Form Likelihood
In general, the μ-dependence of \bar{n}_l is complicated (being a superposition of exponentials of linear functionals of μ), but with the approximation (10), Eq. 13 allows one to simplify (17) to the form

\bar{n}_l(\mu) = b_l\, e^{-\bar{x}_l[\mu]} + r_l.    (21)

The key simplification here is that μ now enters through the single linear functional \bar{x}_l = \bar{x}_l[\mu]. Explicitly, upon discretizing the sample, one may write an individual line integral in the form

x(x_S, x_R) = \sum_j w_j(x_S, x_R)\, \mu_j,    (22)

in which w_j(x_S, x_R) is the length of the intersection of the line segment connecting x_S to x_R with voxel j. It follows that one may write

\bar{x}_l[\mu] = \sum_j \bar{w}_{lj}\, \mu_j = \bar{w}_l \cdot \mu,    (23)

in which the (non-negative) matrix elements

\bar{w}_{lj} = \langle w_j(x_S, x_R) \rangle_l    (24)

are weighted averages of the intersection lengths over the paths associated with measurement l (characterized by source spot center x_{S,l}^0 and receiver pixel center x_{R,l}^0). The function h_l[n_l, \bar{n}_l(x)] has a unique minimum at

\hat{x}_l = \ln\left[\frac{b_l}{n_l - r_l}\right],    (25)

but is not a convex function of x if r_l > 0, unfortunately ruling out certain numerically efficient objective function minimization algorithms. The result (25) simply states that the most likely absorption model x is such that the measured photon count is precisely the expected count: n_l = \bar{n}_l(x). Except for statistical outliers, the measured photon count will typically lie in the range r_l < n_l < b_l + r_l, yielding the physically required positive minimum, \hat{x}_l > 0. Note that a simplified measurement model might forgo the average in Eqs. 17 or 24, using instead the single set of line segment lengths associated with the single line connecting the measurement centroids. The result is then an inversion that neglects "blurring" effects in the real data. In principle, for sufficiently large photon counts, one may expect there to be a noticeable difference between images derived from the two different models.
As long as the parameters {b_l, r_l, x_{S,l}, x_{R,l}}, along with the associated fluorescence weighting functions {S_l} (which depend on the measurement through the local fluorescent layer structure and orientation), are accurately known, the blur-aware model should produce an unblurred image from the blurred data, whereas the mismatched model will produce a blurred image from blurred data.
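The closed-form minimum (25) can be checked directly: substituting x̂_l back into the simplified mean count reproduces the measured count exactly. A short numerical sketch (with invented counts satisfying r_l < n_l < b_l + r_l):

```python
import numpy as np

# Closed-form minimizer of h_l (Eq. 25 style) for illustrative counts
# satisfying r < n < b + r; b, r, n are invented example numbers.
b, r, n = 1200.0, 40.0, 300.0
x_hat = np.log(b / (n - r))        # most likely (average) line integral
nbar = b * np.exp(-x_hat) + r      # expected count evaluated at the minimum
# at the minimum the expected count equals the measured count: nbar == n
```

Counts outside the stated range (n_l ≤ r_l) would make the logarithm undefined, which is the statistical-outlier case noted in the text.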

Image Quality Metrics: Smooth Reconstructions
We consider first tomographic reconstructions based on smoothly varying targets, controlled by L_2-type regularizations. This rules out the piecewise-constant-type prior constraints associated with the decomposition (4). We will generalize the theory to include the latter in subsequent sections.

Spatial Resolution Metrics
In what follows, we assume the existence of a rapidly converging algorithm that produces \hat{\mu}[n] for a given fixed data array n and an underlying statistical signal forward model defined by Eqs. 17-20.
The simplest image quality metrics are based on "noise-free" data. Thus, let

\hat{\mu}_0[\mu] = \hat{\mu}[\bar{n}(\mu)]    (27)

be the solution (15) obtained by replacing the data values by the mean values, n_i → \bar{n}_i[\mu], following the minimization (i.e., the replacement is not n = \bar{n}(\mu) inside the minimization, which would entail a minimization that includes the μ-dependence of \bar{n}). Since n_i appears linearly in the μ-dependent terms in Eq. 20 (the n_i! term is a trivial normalization that plays no role in the minimization (15)), the evaluation of Eq. 27 for continuous n is straightforward.
Equation 27 defines an estimator \hat{\mu}_0 = \hat{\mu}[\bar{n}(\mu)] as a function of the true μ. An obvious measure of image quality is therefore how closely the former reproduces the latter. A way to quantify this is through the set of point spread functions

P_j = \nabla_\mu \hat{\mu}_0 \cdot \hat{e}^{(\mu)}_j = \nabla_n \hat{\mu}[\bar{n}(\mu)] \cdot \left[ \nabla_\mu \bar{n}(\mu) \cdot \hat{e}^{(\mu)}_j \right],    (28)

in which [\hat{e}^{(\mu)}_j]_l = δ_{lj} perturbs the absorption on the single voxel j. These measure the change in the estimated μ associated with a change in the actual μ (under the assumption of noise-free data). In a perfect world, \hat{\mu} = μ, and one obtains [P_j]_i = δ_{ij}: the inversion result precisely tracks the input value. More realistically, the support of P_j will spread to nearby voxels, and optimal measurement design is aimed at minimizing this spread.
The last line of Eq. 28 naturally factors the point spread function into the product of (a) a vector quantifying the sensitivity of the data to absorption changes at the single site x_j, with (b) a matrix quantifying the sensitivity of the inversion to changes in the data. Using Eqs. 17 and 18, one obtains for term (a):

\frac{\partial \bar{n}_l}{\partial \mu_j} = -b_l\, \langle w_j(x_S, x_R)\, e^{-M} \rangle_l.    (29)

The result is nonzero only if \bar{w}_{lj} > 0, i.e., the ray intersects voxel j within the support of the average (24). The matrix term (b) quantifies the degree to which the set of measurements (together with the regularization conditions) can be used to resolve a unique signature from voxel j. Resolution will improve as one increases the number of line integrals passing through x_j from as diverse a set of directions as possible. An explicit expression is obtained from the minimum condition

\nabla_\mu \Phi(\hat{\mu} + \delta\hat{\mu},\, n + \delta n) = 0    (30)

in the neighborhood of the given solution \hat{\mu}[n] at δμ = 0, δn = 0. We emphasize here again that ∇_μ operates only on the first argument of Φ, not on the implicit μ-dependence of \bar{n}. To linear order, one obtains

[\nabla_\mu \nabla_\mu \Phi]_0\, \delta\hat{\mu} + [\nabla_n \nabla_\mu \Phi]_0\, \delta n = 0,    (31)

in which the subscript 0 indicates that the second derivatives are evaluated at δμ = 0, δn = 0. The smooth reconstruction assumption being made in this section is equivalent here to the existence of the derivatives in Eq. 31, and as a consequence one obtains

\nabla_n \hat{\mu} = -[\nabla_\mu \nabla_\mu \Phi]_0^{-1}\, [\nabla_n \nabla_\mu \Phi]_0.    (32)

Note here that ∇_μ∇_μΦ is a square matrix, hence with nominally well-defined inverse, while ∇_n∇_μΦ is in general not square. From Eqs. 16-20, one obtains

\nabla_n \nabla_\mu \Phi = -\sum_l \frac{\nabla_\mu \bar{n}_l}{\bar{n}_l} \otimes e^{(n)}_l,    (33)

in which [e^{(n)}_l]_m = δ_{lm} is the unit vector along data dimension l, while

R_{ij} = \frac{\partial^2 R}{\partial \mu_i\, \partial \mu_j}    (34)

is the corresponding second derivative matrix derived from the regularization term, entering through ∇_μ∇_μΦ = −∇_μ∇_μL + βR. It is precisely the existence of the matrix R that defines a smooth reconstruction [1,2]. A typical choice is a quadratic form for R, giving rise to a constant, positive definite matrix R. The mean photon count second derivatives are given by

\frac{\partial^2 \bar{n}_l}{\partial \mu_i\, \partial \mu_j} = b_l\, \langle w_i\, w_j\, e^{-M} \rangle_l,    (35)

so that

\frac{\partial^2 L}{\partial \mu_i\, \partial \mu_j} = \sum_l \left[ \left( \frac{n_l}{\bar{n}_l} - 1 \right) \frac{\partial^2 \bar{n}_l}{\partial \mu_i\, \partial \mu_j} - \frac{n_l}{\bar{n}_l^2}\, \frac{\partial \bar{n}_l}{\partial \mu_i}\, \frac{\partial \bar{n}_l}{\partial \mu_j} \right].    (36)
For sufficiently high-quality data, one may expect n_l[\mu]/\bar{n}_l[\mu] \approx 1, and to leading order (36) simplifies to

\frac{\partial^2 L}{\partial \mu_i\, \partial \mu_j} \approx -\sum_l \frac{1}{\bar{n}_l}\, \frac{\partial \bar{n}_l}{\partial \mu_i}\, \frac{\partial \bar{n}_l}{\partial \mu_j},    (37)

which involves only the first derivative (29), with all quantities evaluated at the input value μ. In this way, under conditions where the image is expected to be reasonably accurate, correspondingly accurate image quality metrics may be derived from the forward model alone, without actually needing to first derive the tomographic inversion \hat{\mu}.
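Under the leading-order approximation (37), the point spread functions reduce to a Gauss-Newton form computable from the forward model alone. The sketch below (illustrative W, b, r, and a quadratic regularizer with R = I, all invented for the example) assembles P_j = (F + βR)^{-1} F ê_j with Fisher matrix F = Jᵀ diag(1/n̄) J:

```python
import numpy as np

def psf_matrix(W, mu, b, r, beta, R):
    # Leading-order PSFs (Eq. 37-style Gauss-Newton form): columns are
    # P_j = (F + beta*R)^{-1} F e_j, with Fisher matrix F = J^T diag(1/nbar) J
    # and sensitivity J_lj = d nbar_l / d mu_j = -wbar_lj (nbar_l - r_l).
    # W, b, r, R here are illustrative stand-ins for the paper's quantities.
    nbar = b * np.exp(-W @ mu) + r
    J = -W * (nbar - r)[:, None]
    F = J.T @ (J / nbar[:, None])
    return np.linalg.solve(F + beta * R, F)

rng = np.random.default_rng(1)
W = rng.uniform(0.0, 1.0, (40, 6))    # blur-averaged intersection lengths
mu = np.full(6, 0.3)
R = np.eye(6)                          # quadratic regularizer Hessian
P0 = psf_matrix(W, mu, 1e4, 10.0, 0.0, R)    # beta -> 0: identity PSF
P1 = psf_matrix(W, mu, 1e4, 10.0, 10.0, R)   # beta > 0: shrunk/spread PSF
```

As β → 0 the PSF matrix approaches the identity (perfect tracking for an invertible F), while increasing β trades resolution for noise suppression.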

Local Noise Metrics
The next step is to explore the effects of noise, measuring image quality in the presence of both model imperfection and measurement noise. Adopting the linear approximation, the effects of noise may be quantified by the covariance

C(\mu) = \nabla_n \hat{\mu}\; C[n]\; (\nabla_n \hat{\mu})^T,

in which C[n] is the underlying photon count covariance. In our model, the Poisson statistics of each measurement are independent and C[n] is therefore diagonal, with each measurement variance equal to the mean:

C[n]_{lm} = \langle (n_l - \bar{n}_l)(n_m - \bar{n}_m) \rangle = \bar{n}_l\, \delta_{lm} \approx n_l\, \delta_{lm}.
The local image covariance C(μ) is in general non-diagonal, since the matrix ∇_n \hat{\mu}[\bar{n}(\mu)], the same as that appearing in Eqs. 28 and 32, is nonlocal: the sensitivity of the image to a particular photon count spreads over the neighborhood of the corresponding source-receiver ray.
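In the same linearized setting, the image covariance C(μ) = ∇_n μ̂ C[n] (∇_n μ̂)ᵀ can be sketched as follows (again with invented W, b, r and a quadratic regularizer R = I; the β = 0 case reduces to the inverse Fisher matrix):

```python
import numpy as np

def image_covariance(W, mu, b, r, beta, R):
    # Linearized image covariance C = G C[n] G^T (illustrative sketch):
    # C[n] = diag(nbar) for independent Poisson counts, and
    # G = nabla_n muhat ~ (F + beta R)^{-1} J^T diag(1/nbar) under the
    # simplified forward model nbar = b exp(-W mu) + r (assumed inputs).
    nbar = b * np.exp(-W @ mu) + r
    J = -W * (nbar - r)[:, None]                 # d nbar_l / d mu_j
    F = J.T @ (J / nbar[:, None])                # Fisher information
    G = np.linalg.solve(F + beta * R, J.T / nbar[None, :])
    return G @ np.diag(nbar) @ G.T

rng = np.random.default_rng(4)
W = rng.uniform(0.0, 1.0, (50, 5))
mu = np.full(5, 0.25)
R = np.eye(5)
C0 = image_covariance(W, mu, 1e4, 10.0, 0.0, R)   # beta = 0: inverse Fisher
C5 = image_covariance(W, mu, 1e4, 10.0, 5.0, R)   # regularization damps variance
```

The off-diagonal structure of C reflects the nonlocality of ∇_n μ̂ noted above; increasing β reduces the total variance at the cost of resolution.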

Simplified Form Image Quality Metrics
For the simplified form Eq. 21, with Eq. 23, one obtains

\bar{n}_l(\mu) = b_l\, e^{-\bar{w}_l \cdot \mu} + r_l,

which then produces

\frac{\partial \bar{n}_l(\mu)}{\partial \mu_j} = -\bar{w}_{lj}\, b_l\, e^{-\bar{w}_l \cdot \mu} = -\bar{w}_{lj}\, (\bar{n}_l - r_l),

which depends on a combination of the (average) line-integral sensitivity \bar{w}_{lj} of measurement l to voxel j and the overall expected non-background photon count (\bar{n}_l − r_l). Once again, with high-quality data, one may use \bar{n}_l[\hat{\mu}] interchangeably with \bar{n}_l[\mu], avoiding the need to first reconstruct \hat{\mu}. Moreover, if the photon counts are sufficiently high, one will have n_l/\bar{n}_l ≈ 1, and the quality metrics may then be computed directly from the data with the replacement \bar{n} → n [2]: the simplified forms require only the blur-averaged geometrical parameters \bar{w}_l, measured photon counts n_l, and data-derived background counts r_l, without any direct reference to μ or \hat{\mu}.
Correspondingly, from a simulation point of view, one may investigate the quality of a proposed experimental setup using the forward model alone to generate the valuesn(μ) (along with background count estimates) from a given target model μ. One may either use these values for n, or add further realism by generating simulated Poisson-distributed counts from these computed average counts. One is not required to go through the more complicated inversion step of first derivingμ[n(μ)] from μ.

Numerical Considerations
The matrix inversion in Eq. 32 is perhaps the most numerically intensive step in the computation of the point spread functions (28). Under certain conditions, this step may be sped up. An important case is that of approximate translation invariance, where Fourier transforms can be used to speed up the inversion tremendously. Thus, let A_{ij} be a matrix that is a function in both indices of a coordinate on a rectangular grid. One may always rewrite this as a function of the difference and center coordinates,

A_{ij} = \tilde{A}\left(x_i - x_j;\ \tfrac{1}{2}(x_i + x_j)\right).

However, we suppose now that \tilde{A} is local in the first argument, effectively vanishing outside of some range r_0, A_{ij} → 0 for |x_i − x_j| > r_0, while essentially constant over this same range in the second argument. The objective is to solve a matrix equation of the form

A X = B,    (46)

in which we define X = \nabla_n \hat{\mu}, with indices lying in the distinct spatial grid and photon measurement spaces. For the present problem, we identify

A = [\nabla_\mu \nabla_\mu \Phi]_0, \quad B = -[\nabla_n \nabla_\mu \Phi]_0.

Under the slowly varying condition, one may then derive an approximate solution to Eq. 46 using Fourier transform "windowing." Thus, about each given point x, consider a box of diameter 2r_0, and define the local Fourier transforms

\hat{A}(q; x) = \frac{1}{M} \sum_i{}' e^{-i q \cdot x_i}\, \tilde{A}(x_i; x), \quad \hat{B}(q; x) = \frac{1}{M} \sum_i{}' e^{-i q \cdot (x_i - x)}\, B(x_i),

in which M is the number of points in the box, and the prime on the sums indicates restriction to the corresponding box centered at the origin (for \hat{A}) or at x (for \hat{B}); the photon measurement index of B, and of the solution X, is carried along passively. The wave vector q ranges over the box domain reciprocal lattice, and describes the more rapid variation of A and B on scales smaller than r_0. Note that since \tilde{A} is short range in its first argument, the sum restriction is essentially redundant. Applying the local Fourier transform to both sides of (46), one obtains

\hat{A}(q; x)\, \hat{X}(q; x) \approx \hat{B}(q; x).

In the approximate equality, we have neglected the x_i − x_j dependence of the second argument of \tilde{A}, thereby obtaining a local version of the convolution theorem.
The matrix inversion now reduces to an algebraic division of both sides by \hat{A} and a subsequent inverse Fourier transform, and one obtains the desired result

X(x_i) \approx \sum_q e^{i q \cdot (x_i - x)}\, \frac{\hat{B}(q; x)}{\hat{A}(q; x)}.

Here, the result is restricted to the local box neighborhood of a chosen point x. The global solution is obtained by varying x over the appropriate global set of box centers.
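In the fully translation-invariant limit (constant second argument of Ã, periodic boundaries), A is circulant and the windowed construction reduces to the ordinary convolution theorem. The toy sketch below (an invented short-range kernel with r_0 = 1) checks the FFT solve against a direct matrix solve:

```python
import numpy as np

# Translation-invariant limit: A_ij = a(x_i - x_j) with periodic boundaries,
# so A is circulant and A x = b becomes an algebraic division in Fourier space.
n = 64
kernel = np.zeros(n)
kernel[[0, 1, -1]] = [2.0, -0.6, -0.6]   # short-range kernel, range r0 = 1
# circulant matrix with A[i, j] = kernel[(i - j) mod n]
A = np.array([np.roll(kernel, i) for i in range(n)]).T
b = np.sin(2 * np.pi * np.arange(n) / n)

# Fourier solve: Ahat(q) xhat(q) = bhat(q), then inverse transform
x_fft = np.real(np.fft.ifft(np.fft.fft(b) / np.fft.fft(kernel)))
x_direct = np.linalg.solve(A, b)         # O(n^3) reference solution
```

The windowed version of Section 5 applies this same division independently in each box of diameter 2r_0, allowing Ã to drift slowly with the center coordinate.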

Consistency Conditions
It remains to discuss the conditions under which one indeed expects \tilde{A} to obey the desired conditions. First, the regularization term R is typically local, e.g., depending on near-neighbor differences μ(x_i) − μ(x_j). Often a quadratic regularization is used [2], in which case R is a constant, near-diagonal matrix (or perhaps a slowly varying set of near-diagonal quadratic coefficients, depending on prior knowledge of the target), and the desired conditions indeed hold because such a regularization biases the solution toward relatively smooth \hat{\mu}(x).
Similarly, if the bias is toward smooth, slowly varying \hat{\mu}(x), the likelihood term (19) will be a smooth function of the mean counts \bar{n}(\mu). However, each independent count \bar{n}_l depends on a narrow cylinder of voxels connecting source and receiver and is hence strongly nonlocal. In particular, the second derivative (35) will be nonzero for any pair x_i, x_j lying within the cylinder defined by \bar{n}_l.
On the other hand, the sum over l in Eq. 19 contains many cylinders, and one expects high-resolution images to emerge only when each voxel is intersected by many cylinders. In this case for widely separated x i , x j , one expects a small number of cylinders to pass through both, and a correspondingly small number of terms will contribute to the l-sum in the first equalities in Eqs. 36 or 37. In contrast, for x i , x j within a cylinder diameter, many terms will contribute. It follows that ∂ 2 L/∂μ i ∂μ j will be strongly peaked about the diagonal x i = x j , and the locality property emerges.

Generalized Image Quality Metrics
We next consider more general classes of regularization terms R. The structure of L remains the same as before, and we continue to assume that many l-cylinders pass through each voxel; thus, ∂²L/∂μ_i∂μ_j continues to be near diagonal. More problematic is the structure (4), in which the densities n_A(x) are piecewise slowly varying, but with sharp interfaces between regions, further biased, e.g., toward rectilinear metal interconnect geometry. The regularization term will be relatively agnostic to the position of an interface, but sharp interfaces dictate an L_1 rather than L_2 regularization. The second derivative of R will therefore be very singular when evaluated at \hat{\mu}(x). Moreover, strong rectilinear constraints may lead to global shifts in an interconnect position with small change in \bar{n}. For example, for cylinders aligned along a preferred interconnect direction, the value of \bar{n}_l will have a large jump as the cylinder crosses from one side of a metal interface to the other.

Illustrative Regularization
Let the target consist of K different materials/compounds with absorptions \{\mu(\kappa)\}_{\kappa=1}^K. Each target voxel is then assigned a material index M_i ∈ {1, 2, ..., K} with absorption μ_i = μ(M_i). A simple regularization term might take the Potts model form

R(M) = -\sum_i h_{M_i} - \sum_{\langle ij \rangle} J_{M_i}\, \delta_{M_i, M_j},

in which the fields h_κ are used to control the fractional area of each material type, and the coupling constants J_κ > 0 favor nearest-neighbor voxels i, j being of the same material type. Both could also be made slow functions of position in order to encode prior information on material types in different regions of the target.
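A Potts-type term of this kind can be sketched on a 2D label grid as follows (the couplings are illustrative; J is indexed by material type as in the text, and the fields h_κ bias the area fractions):

```python
import numpy as np

def potts_energy(M, J, h):
    # Potts-model regularization R(M) = -sum_<ij> J[M_i] delta(M_i, M_j)
    #                                   - sum_i h[M_i]  (illustrative sketch):
    # like-material neighbor pairs lower the energy (J_kappa > 0), and the
    # fields h_kappa control the fractional area of each material type.
    M = np.asarray(M)
    bonds = 0.0
    for axis in (0, 1):  # horizontal and vertical nearest-neighbor bonds
        a = M.take(range(M.shape[axis] - 1), axis=axis)
        b = M.take(range(1, M.shape[axis]), axis=axis)
        bonds += np.sum(np.where(a == b, J[a], 0.0))
    return -bonds - np.sum(h[M])

J = np.array([1.0, 1.0])                    # couplings per material type
h = np.array([0.0, 0.0])                    # area-fraction fields
uniform = np.zeros((4, 4), dtype=int)       # single-material labeling
checker = np.indices((4, 4)).sum(0) % 2     # maximally fragmented labeling
```

A uniform labeling attains strictly lower energy than a checkerboard, reflecting the bias toward contiguous single-material regions.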
Additional terms may be included that favor particular material boundary orientations (e.g., rectilinear). For example, if a, b, c, d are counter-clockwise (starting from the first quadrant) labels of a 2 × 2 plaquette centered on some point i, with corresponding material labels M_a(i), M_b(i), M_c(i), M_d(i), then the terms in

R_2(M) = -\sum_i L_{M_a(i)}\, \delta_{M_a(i), M_b(i)}\, \delta_{M_c(i), M_d(i)}\, \left(1 - \delta_{M_a(i), M_c(i)}\right)

are nonzero only if sites a, b are both the same material but different from c, d, or vice versa, hence favoring horizontal interfaces. Additional terms, for example, could be introduced favoring certain interface types (e.g., metal-insulator).

Properly Designed Image Quality Metrics
By balancing the data-driven likelihood term L against this alternative type of regularization term, characteristic material solutionsM will emerge with certain types of enforced geometries (e.g., piecewise constant, preferred shape and orientation)-notional geometries are illustrated in Fig. 2. The size of the coefficients h κ , J κ , L κ , . . . are chosen to emphasize different target features in proportion to one's degree of confidence in such prior information.
Of course, the more high-quality data one has (the larger L is relative to R), the less impact these coefficients will have. They are most useful under data-starved conditions, in which R is able to resolve ambiguities in L in favor of the desired geometry. For example, some violations of geometry rules may in fact be real (due to target imperfections, unrecognized manufacturing specifications, etc.), and sufficiently high-quality data must be permitted to resolve the discrepancy in favor of L rather than R. One must therefore ensure that the coefficients in R are chosen in such a way that the regularization does not completely dominate the data. Thus, the difference in the likelihood terms, |L[μ(M_0'), \bar{n}_0] − L[μ(M_0), \bar{n}_0]|, should be small, where M_0' is an R-reducing adjustment of a given model M_0 and \bar{n}_0 the mean counts generated by M_0: the adjustments in M_0 that significantly reduce R should remain consistent with the data. The results here are therefore quite different from the continuously varying solutions favored by the L_2-type regularizations discussed in Section 4. In particular, it no longer makes sense to define the objective function as a continuous function of μ, and to subsequently formulate the sensitivity of the minimum \hat{\mu} in terms of gradients (see Eqs. 28 and 30-32). The regularization smoothing process is unlikely to be sensitive to changes limited to a single site.

Fig. 2 Notional target geometries favored by the regularization term (16), including resolution varying with position. Different material constituents (e.g., metallic interconnects colored blue and green; insulating materials colored yellow and orange) are generally contiguous and compact in shape. However, the Poisson likelihood terms (19) and (20) ensure that the photon count data strongly influence the shape of the boundary, to be consistent with the line integral values (2) along the fans of X-ray paths (receiver array not shown). Proper balance between the two sets of terms is influenced by the density and diversity of line integrals and by a physically reasonable choice for the regularization amplitude β.
Instead, one needs to recognize that small changes in the photon count data could lead to strongly nonlocal changes in $\hat M$, e.g., a small lateral shift of an entire metal interconnect (which would be best sensed by a photon count measurement whose source-receiver cylinder of support strongly overlaps the region of this shift). Thus, instead of considering the effects of single local changes in $\mu$, as in Eq. 28, one should probe the effects of more general changes consistent with $R$. Specifically, let $M_0$ be a representative physically consistent material geometry (in the sense that $R(M_0)$ is small), with corresponding mean photon counts $\bar n_0 = \bar n[\mu(M_0)]$, and let $\hat M_0$ be the minimizer. An obvious image quality requirement is that $|\hat M_0 - M_0|$ should be small (in an appropriate norm that, e.g., measures the number of voxels on which the models differ). Image quality should also be judged on the ability of the photon count data (sparse as it might be) to quantitatively distinguish changes consistent with $R$. If the data is so starved that such changes have too small an effect on $L$, then the tomography experiment cannot be expected to yield robust results. To quantify this, consider the class of changes $\delta M_0$ such that $|R(M_0 + \delta M_0) - R(M_0)|$ is small. We then demand that $L$ be able to sense such changes: the measurement design must be capable of disambiguating nearby target geometries that are both consistent with $R$. Defining the reconstruction change $\delta \hat M_0$ as the difference between the minimizer obtained from the perturbed mean counts $\bar n[\mu(M_0 + \delta M_0)]$ and $\hat M_0$, a more general image quality metric, generalizing (28), is that $\delta \hat M_0$ should be "close" to $\delta M_0$ (also in the sense of differing on a small number of target pixels). In the present case, $\delta M_0$ might correspond to a small, smooth translation of a material boundary, and the desire would be that $\delta \hat M_0$ substantially matches this change.
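The voxel-counting norms invoked above are straightforward to make concrete. The sketch below (hypothetical helper names, not from the paper) scores a reconstruction by the fraction of voxels on which two discrete models disagree, and scores perturbation recovery by the overlap of the supports of the true and reconstructed changes.

```python
import numpy as np

def voxel_mismatch(M_a, M_b):
    """|M_a - M_b| measured as the fraction of voxels on which two
    discrete material models disagree."""
    M_a, M_b = np.asarray(M_a), np.asarray(M_b)
    return float(np.mean(M_a != M_b))

def support_overlap(dM_true, dM_recon):
    """Jaccard overlap between the supports (changed voxels) of the true
    perturbation and the induced reconstruction change; 1.0 is perfect."""
    s_t = np.asarray(dM_true) != 0
    s_r = np.asarray(dM_recon) != 0
    union = np.logical_or(s_t, s_r).sum()
    return float(np.logical_and(s_t, s_r).sum() / union) if union else 1.0
```

For the boundary-translation example, `support_overlap` applied to the shifted shell of voxels directly quantifies how much of the translation the reconstruction captured.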

Forward-Model-Only Approximate Formulation
As discussed below (37), it would be desirable to derive reasonably accurate image quality metrics from the forward model alone. Thus, at the end of Section 4.1, we succeeded in estimating the key second derivative matrices appearing in Eq. 32 in terms of the forward photon data $\bar n(\mu)$ alone, without reference to the optimal model $\hat\mu$. Here, the presence of $R(M)$, which is deliberately sensitive to the discrete nature of $M$ in order to help select models with desirable piecewise constant geometries, is a complication because the assumption of continuous second derivatives no longer makes sense. The alternative approach taken here is to consider only models $M_0$ consistent with the geometrical constraints, hence with relatively small values of $R[M_0]$, and to consider only perturbations $\delta M_0$ such that $R(M_0 + \delta M_0) - R(M_0)$ is very small, i.e., such that the change in the tomographic estimate $\delta \hat M_0$ is driven by the change $\bar n[\mu(M_0 + \delta M_0)] - \bar n[\mu(M_0)]$ in the photon count data. Thus, we suppose that the measurement protocol is such that the change in $\hat M_0$ is dominated by the change in $L$, which remains continuous in $\mu$.
It is convenient to write the mean photon count likelihood term in the form
$$\bar L(\hat\mu, \mu) \equiv L(\hat\mu, n)\big|_{n = \bar n(\mu)}, \quad (57)$$
in which one calls out the separate dependencies on the reconstructed and true absorption profiles $\hat\mu, \mu$ (both appearing, via (19), through their respective mean photon counts $\bar n(\hat\mu), \bar n(\mu)$). One notes as well that a model perturbation $M \to M + \delta M$ generates an absorption profile perturbation
$$\delta\mu \equiv \delta\mu(M, \delta M), \quad (58)$$
which reassigns the absorption values on the support of $\delta M$. With this notation, the minimum condition for the reconstruction $\hat\mu_0(\mu_0)$ now takes the form
$$\bar L(\hat\mu_0 + \delta\hat\mu, \mu_0) - \bar L(\hat\mu_0, \mu_0) = 0 \quad (59)$$
over the subspace of permissible $\delta\hat\mu \equiv \delta\mu(\hat M_0, \delta \hat M)$. Given that the mean photon counts $\bar n(\mu)$ depend continuously on $\mu$, and that one expects only small changes if the support of $\delta\mu$ is small (except for singular cases, which we neglect, where the support of a long narrow change happens to line up exactly with a ray trajectory), one may approximate (59) by
$$(\delta\hat\mu \cdot \nabla_1)\,\bar L(\hat\mu_0, \mu_0) = 0, \quad (60)$$
in which $\nabla_{1,2}$ will refer to $\mu$-derivatives with respect to the first and second arguments. Considering next small changes $\mu_0 \to \mu_0 + \delta\mu_0$ to the true profile, the generalization of the minimum condition (31) is
$$(\delta\hat\mu \cdot \nabla_1)(\delta\hat\mu_0 \cdot \nabla_1 + \delta\mu_0 \cdot \nabla_2)\,\bar L(\hat\mu_0, \mu_0) = 0, \quad (61)$$
which is enforced for all permitted independent choices of $\delta\hat\mu, \delta\mu_0$, and is to be solved for the induced change $\delta\hat\mu_0$ in the reconstruction for each given $\delta\mu_0$. The second derivatives of $\bar L$ are computed as in Section 4.1. As observed below (37), they may be expressed entirely in terms of the forward model parameters alone; one need not perform the tomographic inversion in order to evaluate the image quality. A reasonable starting point for investigating solutions to Eq. 61 is to evaluate the second derivatives of $\bar L$ at $\hat\mu_0 = \mu_0$ (as in Eq. 37), which then also allows one to draw the perturbations $\delta\hat\mu, \delta\hat\mu_0, \delta\mu_0$ from the same set.
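For a Poisson count model, the second derivatives evaluated at $\hat\mu_0 = \mu_0$ reduce to Fisher-information-like quantities built from forward quantities only. A minimal sketch, assuming ray-sum mean counts $\bar n_d = N_d \exp[-(\mathbf{A}\mu)_d]$ with a known path-length matrix $\mathbf{A}$ (this specific parameterization is an illustrative assumption consistent with the line integrals (2), not necessarily the paper's exact form):

```python
import numpy as np

def mean_counts(A, mu, N0):
    """Mean photon counts nbar_d = N0_d * exp(-sum_i A_di mu_i) per ray."""
    return N0 * np.exp(-A @ mu)

def fisher_matrix(A, mu, N0):
    """F_ij = sum_d A_di A_dj nbar_d: the expected curvature of the Poisson
    log-likelihood at the true profile, computed from the forward model
    alone (no tomographic inversion required)."""
    nbar = mean_counts(A, mu, N0)
    return A.T @ (A * nbar[:, None])
```

Directional curvatures such as $(\delta\mu \cdot \nabla_1)^2 \bar L$ then become quadratic forms $\delta\mu^\top F\, \delta\mu$ (up to the sign convention of the objective).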
Thus, if $\{\delta\mu_a\}_{a=1}^{L}$ is the permitted common set of small-$R$ perturbations, and one defines the $L \times L$ matrices
$$L^{(1)}_{ab} = (\delta\mu_a \cdot \nabla_1)(\delta\mu_b \cdot \nabla_1)\,\bar L(\mu_0, \mu_0), \quad L^{(2)}_{ab} = (\delta\mu_a \cdot \nabla_1)(\delta\mu_b \cdot \nabla_2)\,\bar L(\mu_0, \mu_0), \quad (62)$$
then Eq. 61 takes the form
$$L^{(1)}_{ac} + L^{(2)}_{ab} \simeq 0 \quad (63)$$
for all $a$, and for some value $c$ for each choice of $b$. The use of "$\simeq$" here indicates that perfect equality is unlikely to be achieved when only discrete values of the absorption on a discrete lattice are permitted (compared to the continuous values permitted in the corresponding matrix (31)). Instead, one will likely only obtain approximate correspondence, e.g., of displacements of various interfaces between Potts domains of uniform material. For a perfect measurement, one expects $c = b$, but in general, there will be an imperfect correspondence between the true and reconstructed perturbations.
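The approximate correspondence (63) can be probed numerically: project the second-derivative operators onto the permitted perturbation set and, for each true-perturbation index $b$, search for the reconstruction index $c$ that best cancels the residual. A sketch with hypothetical inputs, where `H1` and `H2` stand in for $\nabla_1\nabla_1 \bar L$ and $\nabla_1\nabla_2 \bar L$ (assumed precomputed from the forward model as above):

```python
import numpy as np

def projected_matrices(H1, H2, perturbations):
    """L1_ab = dmu_a^T H1 dmu_b and L2_ab = dmu_a^T H2 dmu_b for a finite
    set of permitted perturbations (one per row of `perturbations`)."""
    P = np.asarray(perturbations, dtype=float)
    return P @ H1 @ P.T, P @ H2 @ P.T

def match_perturbations(L1, L2):
    """For each true index b, return the reconstruction index c minimizing
    max_a |L1[a, c] + L2[a, b]|, a discrete analog of Eq. 63."""
    L = L1.shape[0]
    return [int(np.argmin([np.max(np.abs(L1[:, c] + L2[:, b]))
                           for c in range(L)]))
            for b in range(L)]
```

A perfect measurement yields $c = b$ for every $b$; systematic mismatches flag perturbation directions that the proposed experiment design cannot disambiguate.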