1 Introduction

Image quality (IQ) assessment is an important element of system design, optimization, and quality control [1, 2]. A complete assessment evaluates the entire imaging chain including both data acquisition and image reconstruction stages. Retrospective IQ measurement is based on careful analysis of images taken in a series of laboratory measurements, often using specialized phantoms. In contrast, prospective image quality prediction is based on a high-quality end-to-end simulation of the system data product, and subsequent image reconstruction algorithm applied to that data. A key application is to guide system design in advance of hardware construction when physical data are not yet available. Similarly, it allows one to tune or customize an existing system to a particular set of specialized tasks under conditions where the retrospective approach is time-consuming or infeasible for other reasons (e.g., target sensitivity to, or potential degradation under, multiple measurements).

Given an accurate system forward model, IQ assessment is relatively straightforward in cases of “direct” image reconstruction algorithms, in which a relatively simple (though perhaps numerically intensive) forward algorithm is applied to the data to obtain an image. An example is back projection image formation, which is also linear in the data. Various Fourier transform methods (e.g., SAR or range-Doppler processing) are special cases. Construction of various IQ metrics, such as point spread functions, is equally direct.

Less straightforward are imaging algorithms that rely on model-based iterative reconstruction (MBIR). The resulting algorithm may be highly nonlinear in the data, and IQ metrics may therefore vary strongly with position, noise levels, source and detector blur, etc. This certainly increases the complexity of, and necessity for, IQ assessment. Here, we develop an IQ assessment approach for the fluorescent X-ray tomography system (Fig. 1). We define and evaluate a set of rigorous IQ metrics informed by the MBIR framework, which is based on the Poisson statistics of photon counting. Specific forward model elements for this system that must be accurately modeled include focal spot blur (finite-size X-ray source region in the target), cross-talk between superconducting detector pixels when recording the change in conductivity due to a single photon capture, and noise correlations.

Fig. 1

Top: Illustrated is the microscope interior chamber, highlighting the electron column, electron detectors (SE, BSE), and the TES-based [3] transmission X-ray detector assembly. The target/sample assembly is mounted in the sample holder and positioned for imaging with a translation and rotation stage. The detector package is mounted in a bellows inserted into a “receiver tube” for close proximity to the X-ray source. The instrument is a customized adaptation of Orsay Physics’ “NanoSpace” instrument, performed collaboratively with Orsay Physics. Bottom: Target/sample assembly and stimulated signals. The tomographic data collection uses fluorescent X-rays induced in a metallic thin film deposited onto the prepared backside of the sample. To obtain the desired resolution, the sample must be decapsulated and the silicon backside thinned to within a few hundred nanometers of the semiconductor device layer prior to the deposition step.

This paper is intended to provide a detailed theoretical basis for a general set of IQ metrics. Specific illustrative applications to the RAVEN IC tomography problem will be presented separately.

2 System Forward Model

In the X-ray microscope system (Fig. 1), the collected fluorescent photon count data forms the basis for a tomographic inversion of the sample structure. We begin by accounting for the physics and geometry of the measurement, assuming ideal sample stage (translation and rotation) operation, perfect knowledge of the electron beam location, and ideal photon detector behavior. Later, we will include effects of sample stage and electron beam uncertainty, as well as non-ideal detector effects such as pixel cross-talk and noise properties.

2.1 Measurement Physics

In what follows, we choose a 3D coordinate system that is fixed in the sample. Of course, it is generally the sample that is translated and rotated while the electron beam and detector apparatus remain fixed. However, it is the detailed sample structure that is the desired end-product of the tomographic inversion, and this takes the form of fixed functions of position within the sample volume. In this frame, the relative positions and orientations of the source, receiver, and electron beam must be carefully tracked for each measurement.

For an idealized point source xS and point receiver xR, the mean count rate for a chosen fluorescent photon with sharp energy E is modeled in the form

$$ \begin{array}{@{}rcl@{}} n(\textbf{x}_{S},\textbf{x}_{R};E) &=& n_{0}(\textbf{x}_{S},E) \frac{e^{-M(\textbf{x}_{S},\textbf{x}_{R};E)}}{4\pi |\textbf{x}_{R} - \textbf{x}_{S}|^{2}} \\ &&+\ n_{B}(\textbf{x}_{S},\textbf{x}_{R};E) {\Delta} E \end{array} $$
(1)

where n0 is the raw fluorescent photon production rate, and nB is the background rate (due to bremsstrahlung, multiple scattering, etc.) of continuum photons deposited in the energy bin ΔE about E. The first term represents the photon intensity derived from the physical optics approximate solution to the 3D wave equation, including geometric spreading and energy dissipation. For the latter, fluorescent photons are absorbed (or scattered to lower energy) according to the line integral (valid in the weak absorption limit characteristic of higher-energy X-rays)

$$ M(\textbf{x}_{S},\textbf{x}_{R};E) = {\int}_{0}^{|\textbf{x}_{R}-\textbf{x}_{S}|} \mu[\textbf{x}(s);E] \text{ds} $$
(2)

in which

$$ \begin{array}{@{}rcl@{}} \textbf{x}(s) &=& \textbf{x}_{S} + s \hat{\textbf{e}}_{{SR}} \\ \hat{\textbf{e}}_{{SR}} &=& \frac{\textbf{x}_{R} - \textbf{x}_{S}}{|\textbf{x}_{R} - \textbf{x}_{S}|} \end{array} $$
(3)

defines the ray, parameterized by physical distance 0 ≤ s ≤|xRxS|, connecting the two endpoints. The function μ(x, E) is the local absorption rate as a function of 3D location x, commonly modeled as a linear superposition

$$ \mu(\textbf{x};E) = \sum\limits_{A} \mu_{A}(E) n_{A}(\textbf{x}), $$
(4)

in which the index A ranges over atomic elements (Au, Al, Si, Cu, etc.) and nA(x) is the element number density.
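
By way of illustration, the following minimal sketch (Python/NumPy; the cross section, density, and slab geometry are placeholder values, not system parameters) evaluates the superposition (4) and approximates the line integral (2) by midpoint sampling along the ray (3):

```python
# Sketch of Eqs. 2-4: mu(x;E) as a superposition of element contributions,
# and M as a sampled line integral along the source-receiver ray.
import numpy as np

def mu_of_x(x, mu_A, n_A):
    """mu(x;E) = sum_A mu_A(E) n_A(x) at a single energy E (Eq. 4)."""
    return sum(mu_A[a] * n_A[a](x) for a in mu_A)

def line_integral_M(x_S, x_R, mu_fn, n_steps=512):
    """Midpoint-rule approximation of Eq. 2 along the ray x(s) of Eq. 3."""
    x_S, x_R = np.asarray(x_S, float), np.asarray(x_R, float)
    length = np.linalg.norm(x_R - x_S)
    e_SR = (x_R - x_S) / length                  # unit vector, Eq. 3
    s = (np.arange(n_steps) + 0.5) * (length / n_steps)
    ds = length / n_steps
    return sum(mu_fn(x_S + si * e_SR) * ds for si in s)

# Example: a uniform silicon slab occupying 0 < z < 1 (placeholder units).
mu_A = {"Si": 2.0e-24}                           # cross section per atom
n_A = {"Si": lambda x: 5.0e22 if 0.0 < x[2] < 1.0 else 0.0}  # number density
M = line_integral_M([0, 0, -1], [0, 0, 2],
                    lambda x: mu_of_x(x, mu_A, n_A))
print(M)  # ~ mu_Si * n_Si * (slab thickness) = 0.1
```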

For specified electron beam entry center point \(\textbf {x}_{S}^{0}\), we write the position-dependent fluorescent production rate in the form

$$ n_{0}(\textbf{x}_{S},E) = \bar n_{0}(\textbf{x}_{S}^{0}) {\Phi}_{S}(\textbf{x}_{S} - \textbf{x}_{S}^{0}; E,\textbf{x}_{S}^{0},\hat{\textbf{e}}_{S}^{0}) $$
(5)

in which the source distribution function ΦS is normalized to unit integral in the first argument, and is now explicitly parameterized as well by the electron beam energy E (through fluorescent production efficiency, electron multiple scattering, etc.), the beam center point \(\textbf {x}_{S}^{0}\), and the incident direction \(\hat {\textbf {e}}_{S}^{0}\) of that beam. Thus,

$$ \bar n_{0}(\textbf{x}_{S}^{0}) = \int d\textbf{x}_{S} n_{0}(\textbf{x}_{S},E) $$
(6)

represents the total integrated production rate, in general also depending on energy and incident beam direction, but this will normally be suppressed from the notation.

We now use ΦS to define an average of any function F(xS, xR) of the source and receiver points by

$$ \begin{array}{@{}rcl@{}} \langle F(\textbf{x}_{S},\textbf{x}_{R}) \rangle_{0} &\equiv& \bar F(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E,\hat{\textbf{e}}_{S}) \\ &=& \int d\textbf{x}_{S} {\Phi}_{S}(\textbf{x}_{S} - \textbf{x}_{S}^{0}; E,\textbf{x}_{S}^{0},\hat{\textbf{e}}_{S}^{0}) \\ &&\times\ {\int}_{A_{R}} \frac{d\textbf{a}_{R} \cdot \hat{\textbf{e}}_{{SR}}^{0}} {\textbf{A}_{R} \cdot \hat{\textbf{e}}_{{SR}}^{0}} F(\textbf{x}_{S},\textbf{x}_{R}), \end{array} $$
(7)

which includes separate averages over the 3D source region volume and over the 2D receiver pixel area AR. The result is defined by the entry center point \(\textbf {x}_{S}^{0}\) of the electron beam on the target surface, the receiver center coordinate \(\textbf {x}_{R}^{0}\), and the corresponding mean incident X-ray flux direction \(\hat {\textbf {e}}_{{SR}}^{0} = \frac {\textbf {x}_{R}^{0} - \textbf {x}_{S}^{0}}{|\textbf {x}_{R}^{0} - \textbf {x}_{S}^{0}|}\). The receiver area integral is normalized by the total projected area \(\textbf {A}_{R} \cdot \hat {\textbf {e}}_{{SR}}^{0}\) along the incident X-ray flux direction. More generally, one could also incorporate a receiver sensitivity profile that is not uniform over the pixel.

With the above definitions, for finite-size source and receiver, the actual mean measured count rate is expressed in the form

$$ \begin{array}{@{}rcl@{}} \bar n_{\text{meas}}(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E) &=& \int d\textbf{x}_{S} {\int}_{A_{R}} d\textbf{a}_{R} \cdot \hat{\textbf{e}}_{{SR}}^{0}\, n(\textbf{x}_{S},\textbf{x}_{R};E) \\ &=& \frac{\Delta {\Omega}_{{SR}}}{4\pi} \bar n_{0}(\textbf{x}_{S}^{0}) \langle e^{-M(\textbf{x}_{S},\textbf{x}_{R};E)} \rangle_{0} \\ &&+\ \bar n_{B}(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E) {\Delta} E \end{array} $$
(8)

in which \(\bar n_{B}\) is the integrated background rate, and

$$ {\Delta} {\Omega}_{{SR}} = \frac{\textbf{A}_{R} \cdot \hat{\textbf{e}}_{{SR}}^{0}} {|\textbf{x}_{S}^{0} - \textbf{x}_{R}^{0}|^{2}} $$
(9)

is the solid angle subtended by the receiver pixel—in the denominator, we have simply substituted \(\textbf {x}_{S} - \textbf {x}_{R} \to \textbf {x}_{S}^{0} - \textbf {x}_{R}^{0}\) under the reasonable assumption that source region and pixel diameter are very small compared to their separation. The background term is not resolved in detail here since it is more difficult to model quantitatively and is assumed to be a smooth function of E, which can be fit and subtracted directly from the data. Although in principle present, any residual dependence of the background on μ is neglected.
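
The source and receiver averages of Eqs. 7–8 lend themselves to Monte Carlo estimation. The sketch below assumes, purely for illustration, an isotropic Gaussian source profile ΦS and a uniform square receiver pixel normal to the z-axis; M_fn stands for any callable implementing Eq. 2 (such as the line-integral sketch above):

```python
# Monte Carlo estimate of <exp(-M(x_S, x_R))>_0 (Eq. 7), from which the mean
# count of Eq. 8 follows on multiplying by (dOmega_SR/4pi) nbar_0 and adding
# the background term. Source/pixel shapes here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mean_transmission(x_S0, x_R0, M_fn, sigma_S, pixel_half, n_samples=2000):
    x_S0, x_R0 = np.asarray(x_S0, float), np.asarray(x_R0, float)
    vals = np.empty(n_samples)
    for k in range(n_samples):
        x_S = x_S0 + sigma_S * rng.standard_normal(3)     # draw from Phi_S
        da = pixel_half * rng.uniform(-1.0, 1.0, size=2)  # uniform over pixel
        x_R = x_R0 + np.array([da[0], da[1], 0.0])
        vals[k] = np.exp(-M_fn(x_S, x_R))
    return vals.mean()
```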

The function ΦS encompasses the entire fluorescent X-ray creation process, including incident electron beam shape and energy spectrum, fluorescing layer thickness, and orientation. All of these need to be included in the system model. For a horizontally homogeneous target, the extra dependence of ΦS on \(\textbf {x}_{S}^{0}\) drops out (with only the difference variable \(\textbf {x}_{S} - \textbf {x}_{S}^{0}\) surviving). However, target inhomogeneity will in general be present (e.g., through varying deposited film thickness).

The source region may be crudely thought of as a cylinder with axis oriented along \(\hat {\textbf {e}}_{S}^{0}\) and diameter determined by electron beam scattering characteristics. More generally, ΦS would either need to be determined via prior calibration measurements in the absence of the sample, or modeled through Monte Carlo simulations of the electron beam, with given energy and geometry, interacting with the target with given atomic composition and geometry [4]. Clearly, detailed calibration measurements would be most desirable, but these are unlikely to be available given that the target film is most likely directly applied to the sample in our setup (Fig. 1). In the absence of this, an effort should be made to produce as uniform a film as possible. With sufficient measurement diversity, it turns out that although it is essential that the electron beam position be tracked to high precision, the source intensity (hence film thickness) may to a certain extent be included as part of the inversion. Thus, given a sufficiently large combination of source and receiver pixel positions, there may be sufficient redundancy in the geometry of rays passing through the sample to refine the estimate for the amplitude \(\bar n_{0}\) in Eq. 8 based on the sample-present measurements only (see, e.g., [5] and references therein).

Of course, for tomographic purposes, the critical dependence of Eq. 8 is on the variation of M with the precise fluorescent photon path, which could be quite complex due to discontinuities of μ across sample constituent boundaries. As alluded to above, the coordinate system is defined by the fixed function μ(x), while all other parameters, including the detailed geometry of the function ΦS, vary with the measurement.

The aim is to reconstruct μ(x) from an appropriately large set of line integral estimates \(\{M(\textbf {x}_{S}^{0},\textbf {x}_{R}^{0};E) \}\). These are extracted from \(n_{\text {meas}}(\textbf {x}_{S}^{0},\textbf {x}_{R}^{0};E)\), after subtraction of the continuous background, and properly accounting for photon Poisson statistics in the low count regime (see Section 3).

2.2 Simplified form

Equation 8 implies, in general, a very complicated exponential relationship between the source and receiver profiles and the key line integrals. However, there are natural conditions under which the averages over these profiles can be moved directly to the line integrals themselves. Specifically, the experimental design and X-ray energies are deliberately selected so that \(e^{-M(\textbf {x}_{S},\textbf {x}_{R};E)}\) is large enough that a reasonable fraction of the n0(xS, E) source photons make it through the sample, generating a correspondingly reasonable fluorescence count rate, distinguishable from the background. Thus, M(xS, xR;E) = O(1), and we further assume that, for most source-receiver pairs \((\textbf {x}_{S}^{0},\textbf {x}_{R}^{0})\), the variation of M with \(\textbf {x}_{S} - \textbf {x}_{S}^{0}\) and \(\textbf {x}_{R} - \textbf {x}_{R}^{0}\) is weak:

$$ \begin{array}{@{}rcl@{}} M(\textbf{x}_{S},\textbf{x}_{R};E) &\equiv& M^{0} + \delta M(\textbf{x}_{S},\textbf{x}_{R};E),\ \ M^{0} \equiv M(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E), \\ &&\frac{|\delta M|}{M^{0}} \ll 1. \end{array} $$
(10)

There are clearly cases where this will fail, such as when a ray happens to run for a long distance parallel and immediately adjacent to a straight metal–nonmetal boundary. However, this will likely be a rare occurrence: much more commonly, slight adjustments of a ray will produce similarly small changes in the material profile along the ray, hence only slight adjustments to the net line integral. Note that if this were not the case, the experimental resolution requirements would likely not be met, and the inversion would not produce an accurate result.

Proceeding under the assumption that the experimental design ensures the validity of Eq. 10, to high accuracy one obtains:

$$ \begin{array}{@{}rcl@{}} \langle e^{-M(\textbf{x}_{S},\textbf{x}_{R};E)} \rangle_{0} &\approx& e^{-M(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E)} \\ &&\times\ [1 - \langle \delta M(\textbf{x}_{S},\textbf{x}_{R};E) \rangle_{0}] \\ &\approx& e^{-M(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E) - \langle \delta M(\textbf{x}_{S},\textbf{x}_{R};E) \rangle_{0}} \\ &=& e^{-\bar M(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E)} \end{array} $$
(11)

in which

$$ \begin{array}{@{}rcl@{}} \bar M(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E) &=& \langle M(\textbf{x}_{S},\textbf{x}_{R};E) \rangle_{0} \\ &=& \int d\textbf{x}_{S} {\Phi}_{S}(\textbf{x}_{S} - \textbf{x}_{S}^{0};E) \\ &&\times\ {\int}_{A_{R}} \frac{d\textbf{a}_{R} \cdot \hat{\textbf{e}}_{SR}} {\textbf{A}_{R} \cdot \hat{\textbf{e}}_{{SR}}} \\ && \times\ {\int}_{0}^{|\textbf{x}_{R}-\textbf{x}_{S}|} \mu[\textbf{x}(s);E] \text{ds} \ \ \ \ \ \ \end{array} $$
(12)

is the measurement average of the line integral. Substituting (11) into (8), the mean photon count rate follows in the form

$$ \begin{array}{@{}rcl@{}} \bar n_{\text{meas}}(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E) &\approx& \frac{\Delta {\Omega}_{SR}}{4\pi} \bar n_{0}(\textbf{x}_{S}^{0}) e^{-\bar M(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E)} \\ &&+\ \bar n_{B}(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E) {\Delta} E. \end{array} $$
(13)

Higher-order terms \(\langle \delta M^{2} \rangle_{0}\) are neglected. Equation 13 conveniently allows one to treat the measured photon count as a single Poisson process involving a “cone volume” average of rays, rather than a complex superposition of Poisson processes for each individual ray in the cone. This greatly simplifies the inversion scheme. For an actual microchip, there will be structure on many length scales, requiring a multi-scale approach to estimate μ (or, more specifically, the atomic densities nA) constrained by available prior information.
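
The accuracy of the replacement \(\langle e^{-M} \rangle_{0} \to e^{-\bar M}\) is easily checked numerically. In the sketch below (illustrative values), the relative error grows as the neglected second-order term \(\langle \delta M^{2} \rangle_{0}/2\):

```python
# Numerical check of Eq. 11: <exp(-M)> vs exp(-<M>) for fluctuations dM
# of increasing spread about M0 = O(1).
import numpy as np

rng = np.random.default_rng(1)
M0 = 1.0
for spread in (0.01, 0.1, 0.5):                    # |dM| / M0
    dM = spread * M0 * rng.standard_normal(100_000)
    exact = np.exp(-(M0 + dM)).mean()              # LHS of Eq. 11
    approx = np.exp(-(M0 + dM).mean())             # RHS of Eq. 11
    print(spread, abs(exact - approx) / exact)     # ~ spread**2 / 2
```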

2.3 Measurement Imperfections

In the above discussion, we have assumed perfect knowledge of the source positions xS and the associated function ΦS. These will be impacted by electron beam wandering and by sample stage translation and rotation uncertainties. We have also not yet accounted for detector uncertainties, such as pixel cross-talk and multiple photon absorption within a detection time window, that will generate, respectively, photon count and energy estimate errors.

Typical CT scanners use scintillator detectors, in which the X-ray photon is not directly absorbed but instead creates a cascade of visible-light photons that are then converted to an electronic signal. Scattering of these photons between pixels can be an important blurring effect.

In contrast, the TES detector pixel fully absorbs an X-ray photon, generating an equilibrium heating effect that translates to a superconducting element conductivity change that in turn measures the photon energy [6]. The detailed equilibration process is very complicated, and might also deposit energy in more than one pixel. However, the energies deposited in the individual pixels, while summing to E, would each be less than E, and hence would appear as part of the background bremsstrahlung. This leads to a fluorescent X-ray undercount rather than a spreading of the count [2]. Such processes might then be modeled in the form [1, 2]

$$ \bar n_{\text{meas}}(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E) \to \varepsilon(\textbf{x}_{R}^{0},E) \bar n_{\text{meas}}(\textbf{x}_{S}^{0},\textbf{x}_{R}^{0};E) $$
(14)

in which \(\varepsilon (\textbf {x}_{R}^{0},E) \leq 1\) is an overall efficiency factor for the detector pixel defined by \(\textbf {x}_{R}^{0}\). This is significantly simpler than the scintillator blurring model that replaces the scalar ε by a matrix B that mixes the count among nearby pixels [2].

3 Tomographic Inversion

The MBIR result for the absorption function μ(x) is based on minimization of an appropriate objective function

$$ \hat {\boldsymbol{\mu}}(\textbf{n}) = \arg \min_{\boldsymbol{\mu}} {\Phi}({\boldsymbol{\mu}},\textbf{n}) $$
(15)

in which the array μ now represents the absorption function values over an M element sample grid, and the array n represents the (integer photon count) data. The function Φ takes a penalized likelihood (PL) form

$$ {\Phi}({\boldsymbol{\mu}},\textbf{n}) = L({\boldsymbol{\mu}},\textbf{n}) + \beta R({\boldsymbol{\mu}}), $$
(16)

with negative log-likelihood term L, quantifying the photon count Poisson statistics, and regularization term R, which encompasses all sample prior knowledge. The latter includes prior constraints on the decomposition (4), such as material stepwise uniformity (with sharp, flat interfaces), metal interconnect geometry, and expected metal types in various sample regions. The relative weight β controls the balance between data fidelity and regularization, and its optimal choice must be part of an investigation using various known test samples.

For the present problem, the data consist of independent photon count measurements for each source-receiver pair, with mean count

$$ \begin{array}{@{}rcl@{}} \bar n_{l} &=& \tau_{l} \bar n_{\text{meas}}(\textbf{x}_{S,l}^{0},\textbf{x}_{R,l}^{0};E) \\ &\equiv& b_{l} \langle e^{-x_{l}} \rangle_{0} + r_{l} \end{array} $$
(17)

where τl is the dwell time, and we define the following:

$$ \begin{array}{@{}rcl@{}} b_{l} &=& \tau_{l} \frac{\Delta {\Omega}_{SR}}{4\pi} \bar n_{0}(\textbf{x}_{S,l}^{0}) \\ r_{l} &=& \tau_{l} \bar n_{B}(\textbf{x}_{S,l}^{0},\textbf{x}_{R,l}^{0};E) {\Delta} E \\ x_{l} &=& M(\textbf{x}_{S,l},\textbf{x}_{R,l};E). \end{array} $$
(18)

These are, respectively, the source mean photon count (determined by target and e-beam properties), the mean background count (obtained, e.g., by a smooth fit over a range of energies surrounding the fluorescent energy), and the attenuation factor. The first two are, for now, assumed accurately known, while the latter contains the μ-dependence targeted by the inversion. For simplicity, additional dependence on beam incident direction \(\hat {\textbf {e}}_{S}\), beam geometry, etc., has been suppressed from the notation, and extensions to problems where \(\bar n_{0}\) is also uncertain [5] will not be treated here.

The Poisson likelihood takes the form of an independent sum over all measurements,

$$ L({\boldsymbol{\mu}},\textbf{n}) = \sum\limits_{l=1}^{N} h_{l}(n_{l},\bar n_{l}), $$
(19)

in which

$$ \begin{array}{@{}rcl@{}} h_{l} &=& -\ln\left( \frac{\bar n_{l}^{n_{l}} e^{-\bar n_{l}}}{n_{l}!} \right) \\ &=& \bar n_{l} - n_{l} \ln(\bar n_{l}) + \ln(n_{l}!), \end{array} $$
(20)

with nl being the actual measured (integer) photon count. Recall here that the dependence on μ resides, via (1) and (2), in the mean count \(\bar n_{l} = \bar n_{l}[{\boldsymbol {\mu }}]\).
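
For reference, the negative log-likelihood of Eqs. 19–20 is direct to evaluate; the sketch below uses SciPy's gammaln for the ln(nl!) term to keep the constant numerically stable:

```python
# Poisson negative log-likelihood of Eqs. 19-20 for independent measurements.
import numpy as np
from scipy.special import gammaln

def poisson_nll(n, n_bar):
    """L = sum_l [ nbar_l - n_l ln(nbar_l) + ln(n_l!) ]."""
    n, n_bar = np.asarray(n, float), np.asarray(n_bar, float)
    return np.sum(n_bar - n * np.log(n_bar) + gammaln(n + 1.0))
```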

3.1 Simplified Form Likelihood

In general, the μ-dependence of \(\bar n_{l}\) is complicated (being a superposition of exponentials of linear functionals of μ), but with the approximation (10), Eq. (13) allows one to simplify (17) to the form

$$ \bar n_{l}(\bar x_{l}) = b_{l} e^{-\bar x_{l}} + r_{l},\ \ \bar x_{l} \equiv \langle x_{l} \rangle_{0}. $$
(21)

The key simplification here is that μ now enters through the single linear functional \(\bar x_{l} = \bar x_{l}[{\boldsymbol {\mu }}]\). Explicitly, upon discretizing the sample one may write an individual line integral in the form

$$ M(\textbf{x}_{S},\textbf{x}_{R};E) = \sum\limits_{j=1}^{M} w_{j}(\textbf{x}_{S},\textbf{x}_{R}) \mu_{j} $$
(22)

in which wj(xS, xR) is the length of the intersection of the line segment connecting xS to xR with voxel j. It follows that one may write

$$ \bar x_{l} = \sum\limits_{j} \bar w_{{lj}} \mu_{j} \equiv \bar{\textbf{w}}_{l} \cdot {\boldsymbol{\mu}} $$
(23)

in which the (non-negative) matrix elements

$$ \begin{array}{@{}rcl@{}} \bar w_{{lj}} &=& \langle w_{j}(\textbf{x}_{S,l},\textbf{x}_{R,l}) \rangle_{0} \\ &\equiv& \int d\textbf{x}_{S} {\Phi}_{S}(\textbf{x}_{S} - \textbf{x}_{S,l}^{0};E) \\ &&\times\ {\int}_{A_{R,l}} \frac{d\textbf{a}_{R,l} \cdot \hat{\textbf{e}}_{SR,l}^{0}} {\textbf{A}_{R,l} \cdot \hat{\textbf{e}}_{{SR},l}^{0}} w_{j}(\textbf{x}_{S},\textbf{x}_{R}) \end{array} $$
(24)

are weighted averages of the intersection lengths over the paths associated with measurement l (characterized by source spot center \(\textbf {x}_{S,l}^{0}\) and receiver pixel center \(\textbf {x}_{R,l}^{0}\)).
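
A minimal sketch of these geometric weights follows. It approximates wj(xS, xR) of Eq. 22 by dense sampling along the segment (an exact voxel traversal, e.g., Siddon-type, would replace this in practice); the blurred weights of Eq. 24 then follow by averaging the output over sampled (xS, xR) pairs, as in the Monte Carlo sketch of Section 2.1:

```python
# Approximate intersection lengths w_j (Eq. 22) on a regular voxel grid with
# origin at 0; sampling-based stand-in for an exact ray traversal.
import numpy as np

def ray_weights(x_S, x_R, grid_shape, voxel_size, n_steps=4096):
    x_S, x_R = np.asarray(x_S, float), np.asarray(x_R, float)
    ds = np.linalg.norm(x_R - x_S) / n_steps
    pts = x_S + (np.arange(n_steps)[:, None] + 0.5) / n_steps * (x_R - x_S)
    idx = np.floor(pts / voxel_size).astype(int)
    w = np.zeros(grid_shape)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    np.add.at(w, tuple(idx[inside].T), ds)  # accumulate path length per voxel
    return w
```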

The function \(h_{l}[n_{l},\bar n_{l}(x)]\) has a unique minimum at

$$ b_{l} e^{-x} + r_{l} = n_{l} \ \Rightarrow\ x = \ln\left( \frac{b_{l}}{n_{l} - r_{l}}\right), $$
(25)

but is not a convex function of x if rl > 0—unfortunately ruling out certain numerically efficient objective function minimization algorithms. The result (25) simply states that the most likely absorption model x is such that the measured photon count is precisely the expected count: \(n_{l} = \bar n_{l}(x)\). Except for statistical outliers, the measured photon count will typically lie in the range bl + rl > nl > rl, yielding the physically required positive minimum, x > 0.
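
A per-measurement line-integral estimate based on Eq. 25 must therefore guard against such outliers; a minimal sketch (the clipping floor eps is an arbitrary numerical safeguard):

```python
# Line-integral estimates x_l = ln(b_l / (n_l - r_l)) from Eq. 25, with the
# net count clipped away from zero to handle statistical outliers.
import numpy as np

def estimate_x(n, b, r, eps=0.5):
    net = np.clip(np.asarray(n, float) - r, eps, None)  # avoid log(<= 0)
    return np.log(b / net)
```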

Note that a simplified measurement model might forgo the average in Eqs. 17 or 24, using instead the single set of line segment lengths

$$ w_{{lj}}^{0} \equiv w_{j}(\textbf{x}_{S,l}^{0},\textbf{x}_{R,l}^{0}), $$
(26)

associated with the single line connecting the measurement centroids. The result is then an inversion that neglects “blurring” effects in the real data. In principle, for sufficiently large photon counts, one may expect there to be a noticeable difference between images derived from the two different models. As long as parameters {bl, rl, xS, l, xR, l}, along with the associated fluorescence weighting functions {ΦS, l} (which depend on the measurement through the local fluorescent layer structure and orientation) are accurately known, the blur-aware model should produce an unblurred image from the blurred data, whereas the mismatched model will produce a blurred image from blurred data.

4 Image Quality Metrics: Smooth Reconstructions

We consider first tomographic reconstructions of smoothly varying targets, controlled by L2-type regularizations. This rules out the piecewise-constant type of prior constraint underlying the decomposition (4). We will generalize the theory to include the latter in subsequent sections.

4.1 Spatial Resolution Metrics

In what follows, we assume the existence of a rapidly converging algorithm that produces \(\hat {\boldsymbol {\mu }}[\textbf {n}]\) for given fixed data array n and underlying statistical signal forward model defined by Eqs. 17–20.

The simplest image quality metrics are based on “noise free” data. Thus, let \(\hat {\boldsymbol {\mu }}(\bar {\textbf {n}})\) be the solution (15)

$$ \hat{\boldsymbol{\mu}}(\bar{\textbf{n}}) = \left.\arg \min_{\boldsymbol{\mu}} {\Phi}({\boldsymbol{\mu}},\textbf{n}) \right|_{\textbf{n} = \bar{\textbf{n}}({\boldsymbol{\mu}})} $$
(27)

obtained by replacing the data values by the mean values, \(n_{i} \to \bar n_{i}[{\boldsymbol {\mu }}]\), following the minimization (i.e., the replacement is not \(\textbf {n} = \bar {\textbf {n}}(\hat {\boldsymbol {\mu }})\), which would entail a minimization that includes the μ-dependence of \(\bar {\textbf {n}}\)). Since ni appears linearly in the μ-dependent terms in Eq. 20 (the ni! term is a trivial normalization that plays no role in the minimization (15)), the evaluation of Eq. 27 for continuous \(\bar {\textbf {n}}\) is straightforward.

Equation 27 defines an estimator \(\hat {\boldsymbol {\mu }} = \hat {\boldsymbol {\mu }}[\bar {\textbf {n}}({\boldsymbol {\mu }})]\) as a function of the true μ. An obvious measure of image quality is therefore how closely the former reproduces the latter. A way to quantify this is through the set of point spread functions

$$ \begin{array}{@{}rcl@{}} \textbf{P}_{j} &=& \lim_{\epsilon \to 0} \frac{\hat {\boldsymbol{\mu}}[\bar{\textbf{n}}({\boldsymbol{\mu}} + \epsilon \hat{\textbf{e}}^{(\mu)}_{j})] - \hat {\boldsymbol{\mu}}[\bar{\textbf{n}}({\boldsymbol{\mu}})]}{\epsilon} \\ &=& \frac{\partial \hat {\boldsymbol{\mu}}[\bar{\textbf{n}}({\boldsymbol{\mu}})]}{\partial \mu_{j}} = \frac{\partial \bar{\textbf{n}}({\boldsymbol{\mu}})}{\partial \mu_{j}} \cdot \nabla_{\textbf{n}} \hat {\boldsymbol{\mu}}[\bar{\textbf{n}}({\boldsymbol{\mu}})], \end{array} $$
(28)

in which \(\epsilon [\hat {\textbf {e}}^{(\mu )}_{j}]_{l} = \epsilon \delta _{lj}\) perturbs the absorption on the single voxel j. These measure the change in the estimated \(\hat \mu \) associated with a change in the actual μ (under the assumption of noise-free data). In a perfect world, \(\hat {\boldsymbol {\mu }} = {\boldsymbol {\mu }}\), and one obtains [Pj]i = δij—the inversion result precisely tracks the input value. More realistically, the support of Pj will spread to nearby voxels, and optimal measurement design is aimed at minimizing this spread.

The last line of Eq. 28 naturally factors the point spread function into the product of (a) a vector quantifying the sensitivity of the data to absorption changes at the single site xj, with (b) a matrix quantifying the sensitivity of the inversion to changes in the data. Using Eqs. 17 and 18, one obtains for term (a):

$$ \frac{\partial \bar n_{l}({\boldsymbol{\mu}})}{\partial \mu_{j}} = - b_{l} \langle w_{{lj}} e^{-x_{l}} \rangle_{0}. $$
(29)

The result is nonzero only if \(\bar w_{{lj}} > 0\), i.e., the ray intersects voxel j within the support of the average (24).

The matrix term (b) quantifies the degree to which the set of measurements (together with the regularization conditions) can be used to resolve a unique signature from voxel j. Resolution will improve as one increases the number of line integrals passing through xj from as diverse a set of directions as possible. An explicit expression is obtained from the minimum condition

$$ \nabla_{\boldsymbol{\mu}} {\Phi}(\hat {\boldsymbol{\mu}} + \delta {\boldsymbol{\mu}},\bar{\textbf{n}} + \delta \textbf{n}) = 0 $$
(30)

in the neighborhood of the given solution \(\hat {\boldsymbol {\mu }}[\bar {\textbf {n}}]\) at δμ = 0, δn = 0. We emphasize here again that ∇μ operates only on the first argument of Φ, not on the implicit μ-dependence of \(\bar {\textbf {n}}\). To linear order, one obtains

$$ (\delta {\boldsymbol{\mu}} \cdot \nabla_{\boldsymbol{\mu}}) \nabla_{\boldsymbol{\mu}} {\Phi}_{0} + (\delta \textbf{n} \cdot \nabla_{\textbf{n}}) \nabla_{\boldsymbol{\mu}} {\Phi}_{0} = 0, $$
(31)

in which the subscript 0 indicates that the second derivatives are evaluated at δμ = 0, δn = 0. The smooth reconstruction assumption being made in this section is equivalent here to the existence of the derivatives in Eq. 31, and as a consequence one obtains

$$ \nabla_{\textbf{n}} \hat {\boldsymbol{\mu}} = -(\nabla_{\boldsymbol{\mu}} \nabla_{\boldsymbol{\mu}} {\Phi}_{0})^{-1} (\nabla_{\boldsymbol{\mu}} \nabla_{\textbf{n}} {\Phi}_{0}). $$
(32)

Note here that ∇μμΦ0 is a square matrix, hence with nominally well-defined inverse, while ∇μnΦ0 is in general not square. From Eqs. 1620, one obtains

$$ \begin{array}{@{}rcl@{}} \nabla_{\boldsymbol{\mu}} \nabla_{\boldsymbol{\mu}} {\Phi} &=& \sum\limits_{l} \left[\left( 1 - \frac{n_{l}}{\bar n_{l}}\right) \nabla_{\boldsymbol{\mu}} \nabla_{\boldsymbol{\mu}} \bar n_{l} \right. \\ &&+\ \left. \frac{n_{l}}{\bar n_{l}^{2}} (\nabla_{\boldsymbol{\mu}} \bar n_{l}) (\nabla_{\boldsymbol{\mu}} \bar n_{l})^{T} \right] + \textbf{R} \\ \nabla_{\boldsymbol{\mu}} \nabla_{\textbf{n}} {\Phi} &=& - \sum\limits_{l} \frac{1}{\bar n_{l}} (\nabla_{\boldsymbol{\mu}} \bar n_{l}) \textbf{e}^{(n)}_{l} \end{array} $$
(33)

in which \([\textbf {e}^{(n)}_{l}]_{m} = \delta _{{lm}}\) is the unit vector along data dimension l, while

$$ \textbf{R} = \nabla_{\boldsymbol{\mu}} \nabla_{\boldsymbol{\mu}} R $$
(34)

is the corresponding second derivative matrix derived from the regularization term. It is precisely the existence of the matrix R that defines a smooth reconstruction [1, 2]. A typical choice is a quadratic form for R, giving rise to a constant, positive definite matrix R. The mean photon count second derivatives are given by

$$ \frac{\partial^{2} \bar n_{l}({\boldsymbol{\mu}})}{\partial \mu_{i} \partial \mu_{j}} = b_{l} \langle w_{{li}} w_{{lj}} e^{-x_{l}} \rangle_{0}. $$
(35)

Finally, setting \({\boldsymbol {\mu }} \to \hat {\boldsymbol {\mu }}[\bar {\textbf {n}}[{\boldsymbol {\mu }}]]\) and \(\textbf {n} \to \bar {\textbf {n}}[{\boldsymbol {\mu }}]\), one obtains

$$ \begin{array}{@{}rcl@{}} \nabla_{\boldsymbol{\mu}} \nabla_{\boldsymbol{\mu}} {\Phi}_{0} &=& \sum\limits_{l} \left[\left( 1 - \frac{\bar n_{l}({\boldsymbol{\mu}})} {\bar n_{l}(\hat {\boldsymbol{\mu}})}\right) (\nabla_{\boldsymbol{\mu}} \nabla_{\boldsymbol{\mu}} \bar n_{l})_{0} \right. \\ &&+\ \left. \frac{\bar n_{l}({\boldsymbol{\mu}})}{\bar n_{l}(\hat {\boldsymbol{\mu}})^{2}} (\nabla_{\boldsymbol{\mu}} \bar n_{l})_{0} (\nabla_{\boldsymbol{\mu}} \bar n_{l})_{0}^{T} \right] + \textbf{R} \\ \nabla_{\boldsymbol{\mu}} \nabla_{\textbf{n}} {\Phi}_{0} &=& -\sum\limits_{l} \frac{1}{\bar n_{l}(\hat {\boldsymbol{\mu}})} (\nabla_{\boldsymbol{\mu}} \bar n_{l})_{0} \textbf{e}^{(n)}_{l} \end{array} $$
(36)

in which the subscript 0 in the \(\bar n_{l}\)-gradient terms denotes substitution \({\boldsymbol {\mu }} \to \hat {\boldsymbol {\mu }}\) inside xl in Eqs. 29 and 35.

For sufficiently high-quality data, one may expect \(\bar n_{l}[{\boldsymbol {\mu }}]/\bar n_{l}[\hat {\boldsymbol {\mu }}] \simeq 1\), and to leading order (36) simplifies to

$$ \begin{array}{@{}rcl@{}} \nabla_{\boldsymbol{\mu}} \nabla_{\boldsymbol{\mu}} {\Phi}_{0} &\approx& \sum\limits_{l} \frac{1}{\bar n_{l}({\boldsymbol{\mu}})} (\nabla_{\boldsymbol{\mu}} \bar n_{l}) (\nabla_{\boldsymbol{\mu}} \bar n_{l})^{T} + \textbf{R} \\ \nabla_{\boldsymbol{\mu}} \nabla_{\textbf{n}} {\Phi}_{0} &\approx& -\sum\limits_{l} \frac{1}{\bar n_{l}({\boldsymbol{\mu}})} (\nabla_{\boldsymbol{\mu}} \bar n_{l}) \textbf{e}^{(n)}_{l}, \end{array} $$
(37)

which involve only the first derivative (29), and all quantities are evaluated at the input value μ. In this way, under conditions where the image is expected to be reasonably accurate, correspondingly accurate image quality metrics may be derived from the forward model alone, without actually needing to first derive the tomographic inversion \(\hat {\boldsymbol {\mu }}\).
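
These leading-order expressions translate directly into code. The sketch below assumes the simplified model of Section 3.1 (Eqs. 42–43), with precomputed blur-averaged weights W (rows w̄l), source and background counts b and r, and a quadratic-regularizer Hessian R; it returns the point spread matrix of Eq. 28 assembled via Eqs. 32 and 37:

```python
# Point spread functions from the forward model alone (Eqs. 28, 32, 37).
# W: (N_meas, M_vox) blur-averaged weights; b, r: (N_meas,); R: (M_vox, M_vox).
import numpy as np

def psf_metrics(W, b, r, mu, R):
    n_bar = b * np.exp(-W @ mu) + r               # mean counts, Eq. 42
    J = -(n_bar - r)[:, None] * W                 # d nbar / d mu, Eq. 43
    F = J.T @ (J / n_bar[:, None])                # Fisher-like term, Eq. 37
    H = F + R                                     # grad_mu grad_mu Phi_0
    G = np.linalg.solve(H, J.T / n_bar[None, :])  # grad_n mu_hat, Eq. 32
    P = G @ J                                     # columns are the P_j, Eq. 28
    return P, G
```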

4.2 Local Noise Metrics

The next step is to explore the effects of noise, measuring image quality in the presence of both model imperfection and measurement noise. Adopting the linear approximation,

$$ \hat {\boldsymbol{\mu}}(\textbf{n}) = \hat {\boldsymbol{\mu}}(\bar{\textbf{n}}) + [\textbf{n} - \bar{\textbf{n}}({\boldsymbol{\mu}})] \cdot \nabla_{\textbf{n}} \hat {\boldsymbol{\mu}}[\bar{\textbf{n}}({\boldsymbol{\mu}})], $$
(38)

the effects of noise may be quantified by the covariance

$$ \textbf{C}(\hat {\boldsymbol{\mu}}) = \nabla_{\textbf{n}} \hat {\boldsymbol{\mu}}[\bar{\textbf{n}}({\boldsymbol{\mu}})]^{T} \textbf{C}(\bar{\textbf{n}}) \nabla_{\textbf{n}} \hat {\boldsymbol{\mu}}[\bar{\textbf{n}}({\boldsymbol{\mu}})] $$
(39)

in which

$$ \textbf{C}(\bar{\textbf{n}}) = \langle [\textbf{n} - \bar{\textbf{n}}({\boldsymbol{\mu}})] [\textbf{n} - \bar{\textbf{n}}({\boldsymbol{\mu}})]^{T} \rangle $$
(40)

is the underlying photon count covariance. In our model, the Poisson statistics of each measurement are independent and \(\textbf {C}[\bar {\textbf {n}}]\) is therefore diagonal, with each measurement variance equal to the mean:

$$ \textbf{C}[\bar{\textbf{n}}]_{{lm}} = \langle (n_{l} - \bar n_{l}) (n_{m} - \bar n_{m}) \rangle = \bar n_{l} \delta_{{lm}} \approx n_{l} \delta_{{lm}}. $$
(41)

The local image covariance \(\textbf {C}(\hat {\boldsymbol {\mu }})\) is in general non-diagonal since the matrix \(\nabla _{\textbf {n}} \hat {\boldsymbol {\mu }}[\bar {\textbf {n}}({\boldsymbol {\mu }})]\), the same as that appearing in Eqs. 28 and 32, is nonlocal: the sensitivity of the image to a particular photon count spreads over the neighborhood of the corresponding source-receiver ray.
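
The linearized propagation of Eqs. 38–41 is easily verified by simulation. The sketch below uses a random stand-in for \(\nabla _{\textbf {n}} \hat {\boldsymbol {\mu }}\) (in practice, the matrix G returned by the point-spread sketch of Section 4.1) and compares the empirical covariance of the linearized estimator against Eq. 39:

```python
# Monte Carlo check of Eqs. 38-41 on a tiny random problem; dimensions and
# values are illustrative only.
import numpy as np

rng = np.random.default_rng(2)
N_meas, M_vox = 40, 10
G = 0.01 * rng.standard_normal((M_vox, N_meas))  # stand-in for grad_n mu_hat
n_bar = rng.uniform(50.0, 200.0, N_meas)         # mean counts

C_pred = G @ (n_bar[:, None] * G.T)              # Eq. 39, C(nbar) diagonal
samples = rng.poisson(n_bar, size=(20_000, N_meas))
d_mu = (samples - n_bar) @ G.T                   # linearized fluctuation, Eq. 38
C_emp = np.cov(d_mu, rowvar=False)
print(np.max(np.abs(C_emp - C_pred)) / np.max(np.abs(C_pred)))  # small
```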

4.3 Simplified Form Image Quality Metrics

For the simplified form Eq. 21, with Eq. 23, one obtains

$$ \bar n_{l}({\boldsymbol{\mu}}) = b_{l} e^{-\bar{\textbf{w}}_{l} \cdot {\boldsymbol{\mu}}} + r_{l}, $$
(42)

which then produces

$$ \begin{array}{@{}rcl@{}} \frac{\partial \bar n_{l}({\boldsymbol{\mu}})}{\partial \mu_{j}} &=& -\bar w_{{lj}} b_{l} e^{-\bar{\textbf{w}}_{l} \cdot {\boldsymbol{\mu}}} = -\bar w_{{lj}} (\bar n_{l} - r_{l}) \\ &\approx& -\bar w_{{lj}} (n_{l} - r_{l}) \\ \frac{\partial^{2} \bar n_{l}({\boldsymbol{\mu}})}{\partial \mu_{i} \partial \mu_{j}} &=& \bar w_{{li}} \bar w_{{lj}} (\bar n_{l} - r_{l}) \end{array} $$
(43)

and depends on a combination of the (average) line integral sensitivity \(\bar {\textbf {w}}_{l}\) of measurement l to voxels i, j and the overall expected non-background photon count \((\bar n_{l} - r_{l})\).

Once again, with high-quality data, one may use \(\bar n_{l}[\hat {\boldsymbol {\mu }}]\) interchangeably with \(\bar n_{l}[{\boldsymbol {\mu }}]\), avoiding the need to first reconstruct \(\hat {\boldsymbol {\mu }}\). Moreover, if the photon counts are sufficiently high, one will have \(n_{l}/\bar n_{l} \approx 1\), and the quality metrics may then be computed directly from the data with the replacement \(\bar {\textbf {n}} \to \textbf {n}\) [2]: the simplified forms require only the blur-averaged geometrical parameters \(\bar {\textbf {w}}_{l}\), measured photon counts nl, and data-derived background counts rl without any direct reference to μ or \(\hat {\boldsymbol {\mu }}\).

Correspondingly, from a simulation point of view, one may investigate the quality of a proposed experimental setup using the forward model alone to generate the values \(\bar {\textbf {n}}({\boldsymbol {\mu }})\) (along with background count estimates) from a given target model μ. One may either use these values for n, or add further realism by generating simulated Poisson-distributed counts from these computed average counts. One is not required to go through the more complicated inversion step of first deriving \(\hat {\boldsymbol {\mu }}[\bar {\textbf {n}}({\boldsymbol {\mu }})]\) from μ.
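
A minimal forward-simulation sketch under the simplified model (same assumed inputs as the point-spread sketch above):

```python
# Generate mean counts from a target model mu (Eq. 42) and, optionally,
# Poisson-distributed integer counts for added realism.
import numpy as np

def simulate_counts(W, b, r, mu, noisy=True, seed=0):
    n_bar = b * np.exp(-W @ mu) + r       # mean counts
    if not noisy:
        return n_bar                      # "noise-free" data, as in Eq. 27
    return np.random.default_rng(seed).poisson(n_bar)
```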

4.4 Numerical Considerations

The matrix inversion in Eq. 32 is perhaps the most numerically intensive step in the computation of the point spread functions (28). Under certain conditions, this step may be sped up. An important case is that of approximate translation invariance, where Fourier transforms can be used to speed up the inversion tremendously. Thus, let

$$ A_{{ij}} = A(\textbf{x}_{i},\textbf{x}_{j}) $$
(44)

be a matrix that is a function in both indices of a coordinate on a rectangular grid. One may always rewrite this as a function of the difference and center coordinates,

$$ A_{{ij}} = \tilde A\left( \textbf{x}_{i} - \textbf{x}_{j}; \frac{\textbf{x}_{i} + \textbf{x}_{j}}{2} \right). $$
(45)

However, we suppose now that \(\tilde A\) is local in the first argument, effectively vanishing outside of some range r0, Aij → 0 for |xixj| > r0, while essentially constant over this same range in the second argument.

The objective is to solve a matrix equation of the form

$$ \sum\limits_{j} A_{{ij}} B_{{jl}} = C_{{il}} \ \Rightarrow \ B_{il} = \sum\limits_{j} \left[A^{-1}\right]_{{ij}} C_{{jl}}, $$
(46)

in which we define

$$ B_{{il}} = B(\textbf{x}_{i},n_{l}),\ \ C_{{il}} = C(\textbf{x}_{i},n_{l}), $$
(47)

with indices lying in the distinct spatial grid and photon measurement spaces. For the present problem, we identify

$$ \begin{array}{@{}rcl@{}} A(\textbf{x}_{i},\textbf{x}_{j}) &\equiv& \frac{\partial^{2} {\Phi}_{0}} {\partial \mu(\textbf{x}_{i}) \partial \mu(\textbf{x}_{j})} \\ B(\textbf{x}_{i},n_{l}) &=& \frac{\partial \hat \mu(\textbf{x}_{i})}{\partial n_{l}} \\ C(\textbf{x}_{i},n_{l}) &=& -\frac{\partial^{2} {\Phi}_{0}} {\partial \mu(\textbf{x}_{i}) \partial n_{l}}. \end{array} $$
(48)

Under the slowly varying condition, one may then derive an approximate solution to Eq. 46 using Fourier transform “windowing.” Thus, about each given point x, consider a box of diameter 2r0, and define the local Fourier transforms

$$ \begin{array}{@{}rcl@{}} \hat A(\textbf{q};\textbf{x}) &=& {\sum\limits_{i}}' e^{-i\textbf{q} \cdot \textbf{x}_{i}} \tilde A(\textbf{x}_{i};\textbf{x}), \\ \hat B_{l}(\textbf{q};\textbf{x}) &=& \frac{1}{M} {\sum\limits_{i}}' e^{-i\textbf{q} \cdot \textbf{x}_{i}} B(\textbf{x} + \textbf{x}_{i},n_{l}) \\ \hat C_{l}(\textbf{q};\textbf{x}) &=& \frac{1}{M} {\sum\limits_{i}}' e^{-i\textbf{q} \cdot \textbf{x}_{i}} C(\textbf{x}+\textbf{x}_{i},n_{l}) \end{array} $$
(49)

in which M is the number of points in the box, and the prime on the sums indicates restriction to the corresponding box centered at the origin. The wave vector q ranges over the box domain reciprocal lattice, and describes the more rapid variation of A and B on scales smaller than r0. Note that since \(\tilde A\) is short range in its first argument, the sum restriction is essentially redundant.

Applying the local Fourier transform to both sides of (46), one obtains

$$ \begin{array}{@{}rcl@{}} \hat C_{l}(\textbf{q};\textbf{x}) &=& \frac{1}{M} {\sum\limits_{j}}' e^{-i \textbf{q} \cdot \textbf{x}_{j}} B(\textbf{x} + \textbf{x}_{j},n_{l}) {\sum\limits_{i}}' e^{-i \textbf{q} \cdot (\textbf{x}_{i} - \textbf{x}_{j})} \\ &&\times\ \tilde A\left( \textbf{x}_{i} - \textbf{x}_{j}; \textbf{x} + \frac{\textbf{x}_{i} + \textbf{x}_{j}}{2} \right) \\ &\approx& \hat A(\textbf{q};\textbf{x})\, \frac{1}{M} {\sum\limits_{j}}' e^{-i \textbf{q} \cdot \textbf{x}_{j}} B(\textbf{x} + \textbf{x}_{j},n_{l}) \\ &=& \hat A(\textbf{q};\textbf{x}) \hat B_{l}(\textbf{q};\textbf{x}) \end{array} $$
(50)

In the approximate equality, we have neglected the variation of the second argument of \(\tilde A\) across the box (i.e., the deviation of \(\textbf {x} + (\textbf {x}_{i} + \textbf {x}_{j})/2\) from x), thereby obtaining a local version of the convolution theorem. The matrix inversion now reduces to an algebraic division of both sides by \(\hat A\) and a subsequent inverse Fourier transform, and one obtains the desired result

$$ B(\textbf{x}+\textbf{x}_{i},n_{l}) \approx \sum\limits_{\textbf{q}} e^{i\textbf{q} \cdot \textbf{x}_{i}} \frac{\hat C_{l}(\textbf{q};\textbf{x})}{\hat A(\textbf{q};\textbf{x})}. $$
(51)

Here, the result is restricted to the local box neighborhood of a chosen point x. The global solution is obtained by varying x over the appropriate global set of box centers.
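
A one-dimensional numerical sketch of the windowed solve follows (kernel and right-hand side are illustrative). Within a single box where A is taken as exactly translation invariant (circulant), the Fourier division of Eq. 51 reproduces the direct dense solve:

```python
# 1D demonstration of Eqs. 49-51: for A_ij ~ a(x_i - x_j), local in its
# argument, the solve A B = C becomes pointwise division in Fourier space.
import numpy as np

box = 64
xi = np.arange(box) - box // 2
a = np.exp(-np.abs(xi) / 4.0)                 # short-range kernel, r0 ~ 4 voxels
c = np.exp(-(xi / 10.0) ** 2)                 # local slice of C for one l

ker = np.fft.ifftshift(a)                     # kernel reindexed to zero offset
A_hat = np.fft.fft(ker)                       # transform of A (Eq. 49)
b_sol = np.real(np.fft.ifft(np.fft.fft(c) / A_hat))   # Eq. 51

i = np.arange(box)
A_mat = ker[(i[:, None] - i[None, :]) % box]  # circulant matrix form of A
b_dir = np.linalg.solve(A_mat, c)
print(np.max(np.abs(b_sol - b_dir)))          # agreement to round-off
```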

4.4.1 Consistency Conditions

It remains to discuss the conditions under which one indeed expects \(\tilde A\) to obey the desired conditions.

First, the regularization term R is typically local, e.g., depending on near-neighbor differences μ(xi) − μ(xj). Often a quadratic regularization is used [2], in which case R is a constant, near-diagonal matrix (or perhaps a slowly varying set of near-diagonal quadratic coefficients, depending on prior knowledge of the target), and the desired conditions indeed hold because such a regularization biases the solution toward relatively smooth \(\hat \mu (\textbf {x})\).

Similarly, if the bias is toward smooth, slowly varying \(\hat {\boldsymbol {\mu }}(\textbf {x})\), the likelihood term (19) will be a smooth function of the mean counts \(\bar {\textbf {n}}(\hat {\boldsymbol {\mu }})\). However, each independent count \(\bar n_{l}\) depends on a narrow cylinder of voxels connecting source and receiver and is hence strongly nonlocal. In particular, the second derivative (35) will be nonzero for any pair xi, xj lying within the cylinder defined by \(\bar n_{l}\).

On the other hand, the sum over l in Eq. 19 contains many cylinders, and one expects high-resolution images to emerge only when each voxel is intersected by many cylinders. In this case, for widely separated xi, xj, one expects a small number of cylinders to pass through both, and a correspondingly small number of terms will contribute to the l-sum in the first equalities in Eqs. 36 or 37. In contrast, for xi, xj within a cylinder diameter, many terms will contribute. It follows that ∂2L/∂μi∂μj will be strongly peaked about the diagonal xi = xj, and the locality property emerges.

5 Generalized Image Quality Metrics

We next consider more general classes of regularization terms R. The structure of L remains the same as before, and we continue to assume that many l-cylinders pass through each voxel. Thus, ∂2L/∂μi∂μj continues to be near diagonal. More problematic is the structure (4) in which the densities nA(x) are piecewise slowly varying, but with sharp interfaces between, further biased, e.g., toward metal interconnect rectilinear geometry. The regularization term will be relatively agnostic to the position of the interface, but sharp interfaces dictate an L1 rather than L2 regularization. The second derivative of R will therefore be very singular when evaluated at \(\hat \mu (\textbf {x})\). Moreover, strong rectilinear constraints may lead to global shifts in an interconnect position with small change in n. For example, for cylinders aligned along a preferred interconnect direction, the value of nl will have a large jump as the cylinder crosses from one side of a metal interface to the other.

5.1 Illustrative Regularization

Let the target consist of K different materials/compounds with absorption \(\{\mu (\kappa ) \}_{\kappa = 1}^{K}\). Each target voxel is then assigned a material index Mi ∈{1,2,…, K} with absorption μi = μ(Mi). A simple regularization term might take the Potts model form

$$ R(\textbf{M}) = -\sum\limits_{\kappa = 1}^{K} h_{\kappa} \sum\limits_{i} \delta_{M_{i},\kappa} - \sum\limits_{\kappa = 1}^{K} J_{\kappa} \sum\limits_{\langle i,j \rangle} \delta_{M_{i},\kappa} \delta_{M_{j},\kappa}, $$
(52)

in which the fields hκ are used to control the fractional area of each material type, and the coupling constants Jκ > 0 favor nearest neighbor voxels 〈i, j〉 being of the same material type. Both could also be made slow functions of position in order to encode prior information on material types in different regions of the target.
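
A sketch evaluating Eq. 52 on a 2D label image (the fields and couplings are illustrative; position-dependent hκ, Jκ would simply enter as arrays):

```python
# Potts-model regularizer of Eq. 52 for a 2D array M of material labels.
import numpy as np

def potts_R(M, h, J):
    """R(M) = -sum_k h_k N_k - sum_k J_k (# equal nearest-neighbor pairs of type k)."""
    R = 0.0
    for k in range(len(h)):
        mask = (M == k)
        R -= h[k] * mask.sum()
        R -= J[k] * ((mask[1:, :] & mask[:-1, :]).sum()     # vertical pairs
                     + (mask[:, 1:] & mask[:, :-1]).sum())  # horizontal pairs
    return R

M = np.zeros((32, 32), dtype=int)
M[8:24, 8:24] = 1                        # a single square inclusion
print(potts_R(M, h=[0.0, 0.1], J=[1.0, 1.0]))
```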

Additional terms may be included that favor particular material boundary orientations (e.g., rectilinear). For example, if a, b, c, d are counter-clockwise (starting from the first quadrant) labels of a 2 × 2 plaquette centered on some point i, with corresponding material labels Ma(i), Mb(i), Mc(i), Md(i), then the terms in

$$ \begin{array}{@{}rcl@{}} R_{4}(\textbf{M}) &=& -\sum\limits_{\kappa} L_{\kappa} \sum\limits_{i} [\delta_{M_{a}(i),\kappa} \delta_{M_{b}(i),\kappa} \\ &&\ \ \ \ \times\ (1-\delta_{M_{c}(i),\kappa}) (1 - \delta_{M_{d}(i),\kappa}) \\ &&+\ \delta_{M_{c}(i),\kappa} \delta_{M_{d}(i),\kappa} \\ &&\ \ \ \ \times\ (1-\delta_{M_{a}(i),\kappa}) (1 - \delta_{M_{b}(i),\kappa})] \end{array} $$
(53)

are nonzero only if sites a, b are both the same material but different from c, d, or vice versa, hence favoring horizontal interfaces. Additional terms, for example, could be introduced favoring certain interface types (e.g., metal–insulator).

5.2 Properly Designed Image Quality Metrics

By balancing the data-driven likelihood term L against this alternative type of regularization term, characteristic material solutions \(\hat {\textbf {M}}\) will emerge with certain types of enforced geometries (e.g., piecewise constant, preferred shape and orientation)—notional geometries are illustrated in Fig. 2. The sizes of the coefficients hκ, Jκ, Lκ,… are chosen to emphasize different target features in proportion to one’s degree of confidence in such prior information. Of course, the more high-quality data one has (the larger L is relative to R), the less impact these coefficients will have. They are most useful under data-starved conditions in which R is able to resolve ambiguities in L in favor of the desired geometry. For example, some violations of geometry rules may in fact be real (due to target imperfections, unrecognized manufacturing specifications, etc.), and sufficiently high-quality data must be permitted to resolve the discrepancy in favor of L rather than R. One must therefore ensure that the coefficients in R are chosen in such a way that the regularization does not completely dominate the data. Thus, the difference in the likelihood terms, \(|L[{\boldsymbol {\mu }}(\hat {\textbf {M}}_{0}), \bar {\textbf {n}}_{0}] - L[{\boldsymbol {\mu }}(\textbf {M}_{0}), \bar {\textbf {n}}_{0}]|\), should be small: the adjustments in M0 that significantly reduce R should remain consistent with the data.

Fig. 2

Illustration of sample geometry benefiting from Potts model-type regularization (52) and (53) of the objective function (16), including resolution varying with position. Different material constituents (e.g., metallic interconnects colored blue and green; insulating materials colored yellow and orange) are generally contiguous and compact in shape. However, the Poisson likelihood term (19) and (20) ensures that the photon count data strongly influence the shape of the boundary to be consistent with the line integral values (2) along the fans of X-ray paths (receiver array not shown). Proper balance between the two sets of terms is influenced by the density and diversity of line integrals and by a physically reasonable choice for the regularization amplitude β.

The results here are therefore quite different from the continuously varying solutions favored by the L2-type regularizations discussed in Section 4. In particular, it no longer makes sense to define the objective function Φ as a continuous function of μ, and to subsequently formulate the sensitivity of the minimum \(\hat {\boldsymbol {\mu }}\) in terms of gradients—see Eqs. 28 and 30–32. The regularization smoothing process is unlikely to be sensitive to changes limited to a single site.

Instead, one needs to recognize that small changes in the photon count data could lead to strongly nonlocal changes in \(\hat {\textbf {M}}\), e.g., small lateral shift of an entire metal interconnect (which would be best sensed by a photon count measurement whose source-receiver cylinder of support strongly overlaps the region of this shift). Thus, instead of considering the effects of single local changes in μ, as in Eq. 28, one should probe the effects of more general changes consistent with R. Specifically, let M0 be a representative physically consistent material geometry (in the sense that R(M0) is small), with corresponding mean photon counts \(\bar n_{0} = \bar n[{\boldsymbol {\mu }}(\textbf {M}_{0})]\), and let

$$ \hat{\textbf{M}}_{0} = \arg \min_{\textbf{M}} {\Phi}(\textbf{M},\bar{\textbf{n}}_{0}) $$
(54)

be the minimizer. An obvious image quality requirement is that \(|\hat {\textbf {M}}_{0} - \textbf {M}_{0}|\) should be small (in an appropriate norm that, e.g., measures the number of voxels on which the models differ).

Image quality should also be judged on the ability of the photon count data (sparse as it might be) to quantitatively distinguish changes consistent with R. If the data is so starved that such changes have too small an effect on L, then the tomography experiment cannot be expected to yield robust results. To quantify this, consider the class of changes δM0 such that \(|R(\hat {\textbf {M}}_{0} + \delta \textbf {M}_{0}) - R(\hat {\textbf {M}}_{0})|\) is small. We then demand that L be able to sense such changes—the measurement design must be capable of disambiguating nearby target geometries that are both consistent with R. Defining the reconstruction change \(\delta \hat {\textbf {M}}_{0}\) by

$$ \hat{\textbf{M}}_{0} + \delta \hat{\textbf{M}}_{0} = \arg \min_{\textbf{M}} {\Phi}[\textbf{M},\bar{\textbf{n}}(\textbf{M}_{0} + \delta \textbf{M}_{0})], $$
(55)

then a more general image quality metric, generalizing (28), is that

$$ \begin{array}{@{}rcl@{}} \textbf{P}(\textbf{M}_{0},\delta \textbf{M}_{0}) &=& \hat{\textbf{M}}[\bar{\textbf{n}}(\textbf{M}_{0} + \delta \textbf{M}_{0})] - \hat{\textbf{M}}[\bar{\textbf{n}}(\textbf{M}_{0})] \\ &\equiv& \delta \hat{\textbf{M}}_{0} \end{array} $$
(56)

should be “close” to δM0 (also in the sense of differing on a small number of target pixels). In the present case, δM0 might correspond to a small, smooth translation of a material boundary, and the desire would be that \(\delta \hat {\textbf {M}}_{0}\) substantially matches this change.
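
Operationally, this metric requires only two reconstructions and a voxel-wise comparison. In the sketch below, forward_counts and reconstruct are hypothetical placeholders for the forward model \(\bar {\textbf {n}}[{\boldsymbol {\mu }}(\textbf {M})]\) and the discrete minimizer of Eq. 15, respectively:

```python
# Generalized perturbation metric of Eqs. 54-56: fraction of voxels on which
# the reconstruction change fails to track the true perturbation dM0.
import numpy as np

def iq_perturbation_metric(M0, dM0, forward_counts, reconstruct):
    M_hat0 = reconstruct(forward_counts(M0))          # Eq. 54
    M_hat1 = reconstruct(forward_counts(M0 + dM0))    # Eq. 55
    dM_hat = M_hat1 - M_hat0                          # Eq. 56
    return np.mean(dM_hat != dM0)
```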

5.2.1 Forward-Model-Only Approximate Formulation

As discussed below (37), it would be desirable to derive reasonably accurate image quality metrics from the forward model alone. Thus, at the end of Section 4.1, we succeeded in estimating the key second derivative matrices appearing in Eq. 32 in terms of the forward photon data \(\bar {\textbf {n}}({\boldsymbol {\mu }})\) alone, without reference to the optimal model \(\hat {\boldsymbol {\mu }}\). Here, the presence of R(M), which is deliberately sensitive to the discrete nature of M in order to help select models with desirable piecewise constant geometries, is a complication because the assumption of continuous second derivatives no longer makes sense.

An alternative approach taken here is to consider only models M0 consistent with the geometrical constraints, hence with relatively small values of R(M0), and to consider only perturbations δM0 such that R(M0 + δM0) − R(M0) is very small, i.e., such that the change in the tomographic estimate \(\delta \hat {\textbf {M}}_{0}\) is driven by the change \(\bar {\textbf {n}}[{\boldsymbol {\mu }}(\textbf {M}_{0} + \delta \textbf {M}_{0})] - \bar {\textbf {n}}[{\boldsymbol {\mu }}(\textbf {M}_{0})]\) in the photon count data. Thus, we suppose that the measurement protocol is such that the change in Φ is dominated by the change in L, which remains continuous in μ.

It is convenient to write the mean photon count likelihood term in the form

$$ L(\hat {\boldsymbol{\mu}},\textbf{n})|_{\textbf{n} = \bar{\textbf{n}}({\boldsymbol{\mu}})} \equiv \tilde L(\hat {\boldsymbol{\mu}},{\boldsymbol{\mu}}), $$
(57)

in which one calls out the separate dependencies on the reconstructed and true absorption profiles \(\hat {\boldsymbol {\mu }}\), μ (both appearing, via (19), through their respective mean photon counts \(\bar {\textbf {n}}(\hat {\boldsymbol {\mu }})\), \(\bar {\textbf {n}}({\boldsymbol {\mu }})\)). One notes as well that a model perturbation M0M0 + δM generates an absorption field change

$$ \begin{array}{@{}rcl@{}} \delta {\boldsymbol{\mu}}(\textbf{M}_{0},\delta \textbf{M}) &\equiv& {\boldsymbol{\mu}}(\textbf{M}_{0} + \delta \textbf{M}) - {\boldsymbol{\mu}}(\textbf{M}_{0}) \\ &=& \sum\limits_{i} [\mu(M_{0,i} + \delta M_{i}) - \mu(M_{0,i})] \textbf{e}_{i} \ \ \ \ \ \ \end{array} $$
(58)

which reassigns the absorption values on the support of δM. With this notation, the minimum condition for the reconstruction \(\hat {\boldsymbol {\mu }}_{0}({\boldsymbol {\mu }}_{0})\) now takes the form

$$ \tilde L(\hat {\boldsymbol{\mu}}_{0} + \delta \hat {\boldsymbol{\mu}}, {\boldsymbol{\mu}}_{0}) - \tilde L(\hat {\boldsymbol{\mu}}_{0}, {\boldsymbol{\mu}}_{0}) = 0 $$
(59)

over the subspace of permissible \(\delta \hat {\boldsymbol {\mu }} \equiv \delta \hat {\boldsymbol {\mu }}(\hat {\textbf {M}}_{0},\delta \hat {\textbf {M}})\). Given that the mean photon counts \(\bar {\textbf {n}}({\boldsymbol {\mu }})\) depend continuously on μ, and that one expects only small changes if the support of δμ is small (except for singular cases, which we neglect, where the support of a long narrow change happens to line up exactly with a ray trajectory), one may approximate (59) by

$$ (\delta \hat {\boldsymbol{\mu}} \cdot \nabla_{1}) \tilde L(\hat {\boldsymbol{\mu}}_{0}, {\boldsymbol{\mu}}_{0}) = 0, $$
(60)

in which ∇1,2 will refer to μ-derivatives with respect to the first and second arguments.

Considering next small changes μ0μ0 + δμ0 to the true profile, the generalization of the minimum condition (31) is

$$ (\delta \hat {\boldsymbol{\mu}} \cdot \nabla_{1}) (\delta \hat {\boldsymbol{\mu}}_{0} \cdot \nabla_{1} + \delta {\boldsymbol{\mu}}_{0} \cdot \nabla_{2}) \tilde L(\hat {\boldsymbol{\mu}}_{0},{\boldsymbol{\mu}}_{0}) = 0, $$
(61)

which is enforced for all permitted independent choices of \(\delta \hat {\boldsymbol {\mu }}, \delta {\boldsymbol {\mu }}_{0}\), and is to be solved for the induced change \(\delta \hat {\boldsymbol {\mu }}_{0}\) in the reconstruction for each given δμ0. The second derivatives of \(\tilde L\) are computed as in Section 4.1. As observed below (37), they may be expressed entirely in terms of the forward model parameters alone—one need not perform the tomographic inversion in order to evaluate the image quality.

A reasonable starting point for investigating solutions to Eq. 61 is to evaluate the second derivatives of \(\tilde L\) at \(\hat {\boldsymbol {\mu }}_{0} = {\boldsymbol {\mu }}_{0}\) (as in Eq. 37), which then also allows one to draw the perturbations \(\delta \hat {\boldsymbol {\mu }}\), \(\delta \hat {\boldsymbol {\mu }}_{0}\), δμ0 from the same set. Thus, if \(\{\delta {\boldsymbol {\mu }}_{a} \}_{a = 1}^{L}\) are the permitted common set of small R perturbations, and one defines the L × L matrices

$$ \begin{array}{@{}rcl@{}} L^{(1)}_{{ab}} &=& (\delta {\boldsymbol{\mu}}_{a} \cdot \nabla_{1}) (\delta {\boldsymbol{\mu}}_{b} \cdot \nabla_{1}) \tilde L({\boldsymbol{\mu}}_{0},{\boldsymbol{\mu}}_{0}) \\ L^{(2)}_{{ab}} &=& (\delta {\boldsymbol{\mu}}_{a} \cdot \nabla_{1}) (\delta {\boldsymbol{\mu}}_{b} \cdot \nabla_{2}) \tilde L({\boldsymbol{\mu}}_{0},{\boldsymbol{\mu}}_{0}), \end{array} $$
(62)

then Eq. 61 takes the form

$$ L^{(1)}_{{ac}} + L^{(2)}_{{ab}} \simeq 0 $$
(63)

for all a, and for some value c for each choice of b. The use of “\(\simeq \)” here indicates that perfect equality is unlikely to be achieved when only discrete values of the absorption on a discrete lattice are permitted (compared to the continuous values permitted in the corresponding matrix (31)). Instead, one will likely only obtain approximate correspondence, e.g., of displacements of various interfaces between Potts domains of uniform material. For a perfect measurement, one expects c = b, but in general, there will be an imperfect correspondence between the true and reconstructed perturbations.
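
Evaluated at \(\hat {\boldsymbol {\mu }}_{0} = {\boldsymbol {\mu }}_{0}\) with the Poisson form of \(\tilde L\), both matrices in Eq. 62 reduce to projections of the Fisher-like matrix of Eq. 37 onto the perturbation set, with L(2) = −L(1), consistent with c = b in Eq. 63 for a perfect measurement. A sketch under the simplified model of Section 3.1 (same assumed inputs as the earlier sketches; dmus is the list of permitted perturbations δμa):

```python
# Perturbation matrices of Eq. 62 from the forward model alone.
import numpy as np

def perturbation_matrices(W, b, r, mu0, dmus):
    n_bar = b * np.exp(-W @ mu0) + r
    J = -(n_bar - r)[:, None] * W            # d nbar / d mu, Eq. 43
    V = np.stack([J @ dmu for dmu in dmus])  # directional data changes
    L1 = V @ (V / n_bar[None, :]).T          # L^(1)_{ab}, Eq. 62
    return L1, -L1                           # L^(2) = -L^(1) at mu_hat0 = mu_0
```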