Trace formulation for photonic inverse design with incoherent sources

Spatially incoherent light sources, such as spontaneously emitting atoms, naively require Maxwell's equations to be solved many times to obtain the total emission, which becomes computationally intractable in conjunction with large-scale optimization (inverse design). We present a trace formulation of incoherent emission that can be efficiently combined with inverse design, even for topology optimization over thousands of design degrees of freedom. Our formulation includes previous reciprocity-based approaches, limited to a few output channels (e.g. normal emission), as special cases, but generalizes to a continuum of emission directions by exploiting the low-rank structure of emission problems. We present several examples of incoherent-emission topology optimization, including tailoring the geometry of fluorescent particles, a periodically emitting surface, and a structure emitting into a waveguide mode, as well as discussing future applications to problems such as Raman sensing and cathodoluminescence.


Introduction
Incoherent emission (light emission from random current sources) arises in many problems in optics: spontaneous emission (fluorescence) (Milonni, 1976;Kim, 1986;Polimeridis et al, 2015), thermal emission in both far (Carey et al, 2008) and near (Basu et al, 2009;Rodriguez et al, 2013) fields, scintillation (Brenny et al, 2014;Roques-Carmes et al, 2021), Casimir and van der Waals forces (Gong et al, 2021), Raman scattering in fluid suspensions (Pilot et al, 2019), incoherent incident waves (Wolf, 2007) (which can be transformed to random sources via the equivalence principle (Harrington, 2001)), and even scattering from surface roughness via a Born approximation (Johnson et al, 2005). However, accurate modeling of such spatially random sources can pose severe computational challenges, because a direct approach would involve averaging the results of many simulations over an ensemble of sources (Rodriguez et al, 2011;Luo et al, 2004;Bao et al, 2019); the statistics (correlation functions) of the sources are known, but the difficulty is converting this into statistics (e.g. average power) of the resulting fields. In the cases of fluorescence (Polimeridis et al, 2015), near-field thermal radiation (Rodriguez et al, 2013), and Casimir forces (Gong et al, 2021), for example, tractable methods for arbitrary geometries were only obtained recently. This challenge is compounded when one wishes to perform inverse design (Molesky et al, 2018)-large-scale optimization of emission over many geometric parameters, perhaps even over "every pixel" of a design region via topology optimization (TopOpt) (Jensen and Sigmund, 2011)-because one must then repeat the computation 10s-1000s of times as the design evolves, e.g. to maximize spontaneous emission (Rogobete et al, 2003;Liang and Johnson, 2013;Wang et al, 2018;Yao et al, 2020) or Raman emission (Christiansen et al, 2020) from a single molecule, much less a distribution of sources.
In this paper, we present a unified framework for inverse design of incoherent emission, combining a trace formulation adapted from recent work (Rodriguez et al, 2013;Polimeridis et al, 2015;Reid et al, 2017) (Sec. 2) with a new algorithm to simultaneously optimize the geometry and evolve to an accurate estimate of the average emission/trace (Sec. 2.5). We apply this framework to perform density-based TopOpt (Jensen and Sigmund, 2011) on several example problems in two dimensions: fluorescence from an optimized nanoparticle (Sec. 4.1), enhanced emission from a corrugated surface analogous to a light-emitting diode (Erchak et al, 2001) (Sec. 4.2), and optimized emission into a waveguide (Sec. 4.3). In each case, the emission is not from a single molecule, but the average power produced by an ensemble of incoherent emitters at every point in some material. We show that this emission can be computed by a small number of "eigen-sources" of a Hermitian operator, which can be determined by a Rayleigh-quotient optimization (Li, 2015) that is combined with the inverse-design (geometric) optimization. In the special case of emission into a small number K of channels, such as K farfield directions, K waveguide modes, or K points in space, we show that a simple algebraic manipulation transforms the problem into K simulations (Sec. 2.3)-this unifies and generalizes known results based on Kirchhoff's law of thermal radiation (Reif, 1965;Greffet et al, 2018) or (more generally) reciprocity (Roques-Carmes et al, 2021;Janssen et al, 2010) for computing emission into a single planewave direction-but our alternative approach (Sec. 2.5) yields a small number of solves even for K → ∞. The other well known special case is that of a single emitter location with a random orientation, which reduces to the local density of states (LDOS) via three Maxwell solves (Milonni, 1976;Oskooi and Johnson, 2013), and this appears as another low-rank special case in our formulation (Sec. 2.4). We believe that this computational framework will enable many future developments in the computational design of complex optical devices involving a wide variety of incoherent processes (Sec. 5).
Density-based TopOpt has attracted increasing interest over the last few decades because of its ability to reveal surprising high-efficiency designs by optimizing over thousands or even millions of design degrees of freedom (Jensen and Sigmund, 2011). It parameterizes a structure by an artificial "density" ρ(x) ∈ [0, 1] at every point (or every "pixel") in a design region, which is typically passed through smoothing and threshold steps to yield a physical "binary" design consisting of one of two materials at every point. We apply a damped-diffusion filter (Lazarov and Sigmund, 2011), which regularizes the problem by setting a minimum lengthscale on the design. (Additional manufacturing constraints can be imposed by well-known techniques (Hammond et al, 2021), but in the present work we focus on the fundamental algorithms and not on experimental realization.) Once a scalar objective function (to be optimized) is defined, such as the emitted power (e.g. the new formulation in this paper), its derivatives (sensitivities) with respect to all the design parameters can be efficiently computed with a single additional simulation via adjoint methods (Molesky et al, 2018;Tortorelli and Michaleris, 1994). Given the objective function and its derivatives, a variety of large-scale optimization algorithms are available; we use the CCSA/MMA method (Svanberg, 2002). We employ a recent free/open-source finite-element method (FEM) package, Gridap.jl (Badia and Verdugo, 2020), in the Julia language (Bezanson et al, 2017), which allows us to efficiently code highly customized FEM-based trace formulations in a high-level language, with the construction of the adjoint problem aided by automatic-differentiation (AD) tools (Revels et al, 2016;Innes, 2018).

Trace Formulation
In this section, we first review the formulation of the frequency-domain Maxwell equations as a linear equation, discretized for numerical computation, with physical quantities like power as quadratic forms. Then we show how the ensemble average of such an expression over a distribution of random current sources can be rewritten as a deterministic trace formula. Finally, we explain how such a trace formula can be evaluated efficiently in the context of photonics optimization, both in the "easy" cases of coupling to a small number of output/input channels as well as in the more general cases of a continuum of outputs.

Wave sources and quadratic outputs
In the frequency domain, the linear Maxwell equations for the electric field E in response to a time-harmonic current source at a frequency ω are (Jin, 2014) where ε(x, ω) is the relative electric permittivity, µ is the relative magnetic permeability (µ ≈ 1 for most materials at optical and infrared wavelengths, so we assume µ = 1 throughout this work), c is the speed of light in vacuum, and f = iωJ is a current-source term. Numerically, one discretizes the problem (e.g. using finite elements (Jin, 2014)) into a linear equation: where A is a matrix representing the Maxwell operator on the left-hand of Eq. (1), u is a vector representing the discretized electric (and/or magnetic) field, and b is a vector representing the discretized source term. In the following, it is algebraically convenient to work with such a discretized (finite-dimensional) form, to avoid cumbersome infinite-dimensional linear algebra, but one could straightforwardly translate to the latter context as well (Joannopoulos et al, 2008). Most physical quantities P of interest in photonics-such as power (via the Poynting flux), energy density, and force (via the Maxwell stress tensor)can be expressed as quadratic functions of the electromagnetic fields u. Since these are real-valued quantities, they correspond in particular to Hermitian quadratic forms where † denotes the conjugate transpose (adjoint) and O = O † is a Hermitian matrix/operator. In this paper, we are mainly concerned with computing emitted power P , which is constrained by the outgoing boundary conditions to be non-negative, in which case O must furthermore be a positive semidefinite Hermitian matrix (i.e., non-negative eigenvalues) in the subspace of permissible u, a property that will be useful in Sec. 2.5.

Trace formula for random sources
Now, consider the case where one has an ensemble of random current sources b drawn from some statistical distribution with zero mean and a known correlation function (e.g. a known mean-square current at each point if they are spatially uncorrelated). In this case, we wish to compute the ensemble average, denoted by · · · , of our quadratic form Eq. (3): where Note that only b is random in the right-hand expression. Naively, this average could be computed by a brute-force method in which one explicitly solves the Maxwell equations (u = A −1 b) for many possible sources b and then integrates over the distribution, perhaps by a Monte-Carlo (random-sampling) method. That approach is possible, and has been accomplished e.g. for evaluating thermal radiation (Rodriguez et al, 2011;Luo et al, 2004), but is computationally expensive. Worse, such a direct approach quickly becomes prohibitive in the context of inverse design, where the averaging must be repeated for many geometries over the course of solving an optimization problem using an iterative algorithm.
Instead, we adapt "trace formula" techniques that have been developed for similar problems in thermal radiation (Rodriguez et al, 2013) and spontaneous emission (Polimeridis et al, 2015), where one must compute the average effect of many random current sources distributed throughout a volume. The basic trick (as reviewed in yet another related setting in (Reid et al, 2017)) is to write the scalar P as a 1 × 1 "matrix" trace, and then employ the cyclic-shift property (Lax, 2013) to group the b terms together: Here, the ensemble average is now confined to the bb † term, which is just the correlation matrix B (Johnson and Wichern, 2018) of the currents; such a matrix is positive semi-definite, so it can be factorized (Trefethen and Bau, 1997) (for convenience below) as for some known matrix D. Further information about constructing the matrix B or its factorization D is given in Appendix A. (For the case of finite-element discretizations, we show that B is a sparse matrix that is straightforward to assemble and D is, for example, a sparse Cholesky factor (Davis, 2006).) Algebraically, expressing our results in terms of D below leads to convenient Hermitian matrices, but we show in Appendix B that the final algorithms can easily employ B directly to avoid the computational cost of an explicit factorization. In the simple case where random currents are spatially uncorrelated, which holds for spontaneous emission and thermal emission in local materials (Landau et al, 1980), B and D are conceptually diagonal linear operators whose diagonal entries are the mean-square and root-mean-square currents, respectively, at each point in space. Whether this leads to a strictly diagonal matrix depends on the discretization scheme as explained in Appendix A. For instance, in the case of thermal and quantum fluctuations, the mean-square currents are given by the fluctuation-dissipation theorem (FDT) (Landau et al, 1980), while for spontaneous emission one can use the FDT with a "negative temperature" determined by the population inversion (Pick et al, 2015;Patra, 2015). Inserting Eq. (6) into Eq. (5), we obtain our objective as the trace of a deterministic Hermitian matrix H (which is positive-semidefinite if O is, as for power), given by: The challenge now is to efficiently compute such a matrix trace. Evaluating a trace is easy once the matrix elements are known-it is the sum of the diagonal entries-but the difficulty in Eq. (7) is the computation of A −1 D. Recall that the N × N matrix A is a discretized Maxwell operator where N is the number of grid points (or basis functions), a huge matrix (especially in 3D). There are fast methods to solve for A −1 (Dv) for any single right-hand side v, typically because the matrix A is sparse (mostly zero) as in finite-element methods (Jin, 2014), but computing the whole matrix A −1 D corresponds to solving N righthand sides. Equivalently, computing explicit (dense) matrix inverses A −1 is typically prohibitively expensive (in both time and storage) for matrices arising in large physical systems (Davis, 2006). Fortunately, a large number of "iterative" algorithms have been proposed for estimating matrix traces to any desired accuracy using relatively few matrix-vector products (Hutchinson, 1989;Ubaru et al, 2017), and what remains is to find a method well-suited to inverse design.

Trace computation: Few output channels
In the important special cases where the desired output is the power in a small number (K) of discrete directions/channels/ports, or perhaps the intensity at a few points in space, we show in this section that the trace computation Eq. (7) simplifies to only K scattering problems. This fact is a generalization of earlier results commonly derived from electromagnetic reciprocity (Chew, 2008), such as the well-known Kirchhoff's law of thermal radiation (reciprocity of emission and absorption) (Reif, 1965) or analogous results for scintillation (Roques-Carmes et al, 2021). More generally, this simplification arises whenever the matrix O in Eq. (3) is low rank.
For example, suppose that the objective function is the electric field intensity E(x 1 ) 2 at a single point x 1 in space, which is the case for "metalens" optimization problems in which one is maximizing intensity at a focal spot (Bayati et al, 2021). In matrix notation for a discretized problem, this quantity corresponds to where e 1 is the unit vector with a nonzero entry at the location ("grid point") corresponding to x 1 . We then have a rank-1 (Lax, 2013) matrix O = e 1 e † 1 , and the trace Eq. (7) simplifies to and A − † e 1 corresponds to solving a (conjugate-) transposed Maxwell problem with a "source" e 1 at the output location, which is closely related to electromagnetic reciprocity (Chew, 2008).
Another important example where O is low-rank arises when the output P is the power in one (or more) orthogonal "wave channels" (Snyder and Love, 1983), such as waveguide modes, planewave directions (e.g. diffraction orders), or spherical waves. In such cases the power in a given channel can be computed by squaring a mode-overlap integral (e.g. a Fourier component for planewaves) of the form o † 1 u 2 (Snyder and Love, 1983). Exactly as in the single-point case above, this corresponds to a rank-1 matrix O = o 1 o † 1 and one must solve only a single "reciprocal" scattering problem to obtain the trace, where the "source" term is the (conjugated) output mode o 1 . This is precisely the situation in Kirchhoff's law, where in order to compute the average thermal radiation (emissivity) in a given direction, one solves a reciprocal problem for the absorption of an incident planewave in the opposite direction (the absorptivity) (Reif, 1965;Greffet et al, 2018;Janssen et al, 2010). A similar technique was recently applied to optimize the average power emitted in the normal direction from a scintillation device (Roques-Carmes et al, 2021).
More generally, such cases correspond to an output quadratic form O that takes a low-rank (Lax, 2013) form: where K is the number of rank-1 terms o i o † i (e.g. output channels/ports, output points, or other "overlap integrals"). Substituting Eq. (10) into Eq. (7) and applying the cyclic-trace identity, we obtain: where corresponds to a single "reciprocal" Maxwell solve (Electromagnetic reciprocity simply corresponds to the fact that A T = A for reciprocal materials (Chew, 2008).) Hence, the full trace-the average emission into K channels-can be computed with only K solves, and in many such cases K = 1.

Trace computation: Few input channels
One trivial special case in which the trace computation drastically simplifies is that of only a few sources or a few input channels, most famously in the case of the local density of states (LDOS): emission by a molecule at a single location in space but with a random polarization (Milonni, 1976;Oskooi and Johnson, 2013). In the case of LDOS, this reduces the trace computation to three Maxwell solves, one per principal polarization direction, making the problem directly tractable for topology optimization (Liang and Johnson, 2013;Wang et al, 2018;Yao et al, 2020). More generally, this situation corresponds to the correlation matrix B being low-rank: if B is rank K, we can compute the trace in K solves.
In particular, suppose that the currents b are of the form b = K i=1 β i b i where the b i are "input channel" basis functions (e.g. a point source with a particular orientation, or an equivalent-current source for a waveguide mode (Oskooi and Johnson, 2013)) and β i are uncorrelated random numbers with zero mean and unit mean-square. Then the correlation matrix B = bb † is simply the rank- In this case, the trace simplifies to: where computing u i = A −1 b i again requires only K solves, one per source b i .

Trace computation: Many output channels
In general, neither the matrix O nor the matrix B are low rank-for example, one may be interested in the total power radiated into a continuum of angles above a surface, or some other infinite set of possible far-field distributions, from sources distributed over a continuous spatial region. Fortunately, it turns out that there is another structure we can exploit: the Hermitian matrix (7) is itself typically approximately low rank ("numerically low rank" (Markovsky, 2012)) even if O is not: the trace, which equals the sum of the eigenvalues of H (Lax, 2013), is dominated by a few of H's largest eigenvalues. In this section, we first explain why that is the case, and then show how it can be exploited to efficiently estimate the trace during optimization.
There are two reasons to expect approximate low-rank structure of H (which we illustrate with numerical examples in Sec. 4). First, on physical grounds, emission enhancement arises due to resonances (via the Purcell effect) (Agio and Cano, 2013), but in any finite volume there is some limit to the number of resonances that can interact strongly with emitters in a given bandwidth, related to an average density of states (Yu et al, 2010). The traditional definition of resonant modes corresponds to poles of A −1 at complex resonant frequencies, which are (linear or nonlinear) eigenvalues ω satisfying det A(ω) = 0 (Nussenzveig, 1972); analogously, Eq. (7) decomposes the total power into a sum of eigenvalues corresponding to "resonant current" sources which diagonalize H at a given frequency. More explicitly, if A −1 D can be accurately approximated by the action of K resonances of A (a quasinormal mode expansion (Lalanne et al, 2018;Ge et al, 2014)), so that A −1 can be replaced by a rank-K matrix, it follows that H is also approximately rank ≤ K (since it is a product of rank-deficient matrices (Lax, 2013)). Moreover, geometric optimization to maximize the emitted power modifies the structure to further enhance one or more resonances (Liang and Johnson, 2013), and we observe that this sometimes increases the concentration of the trace into a few eigenvalues of H; that is, optimized structures tend to be even lower rank. Second, in a more general mathematical sense, the matrix H is built from offdiagonal blocks of the Green's function matrix A −1 , connecting sources (at the the support of D) to emitted power at some other location (the support of O, e.g. where the Poynting flux is computed), and off-diagonal blocks of Green's functions are known to be approximately low-rank (Hackbusch, 2015). This is closely related to fast methods for integral equations, such as the fast-multipole method and others (Gibson, 2021); essentially, far fields mostly depend on low-order spatial moments of the near fields/currents.
If tr H is dominated by K N largest eigenvalues of the N × N matrix H, then one merely needs a numerical algorithm to compute the K extremal (largest-magnitude) eigenvalues using only a sequence matrix-vector products Hv (corresponding to individual scattering problems). Fortunately, there are many such algorithms, especially for Hermitian H (Lanczos, 1950;Knyazev, 2001), and one can simply increase K until the trace converges to any desired tolerance. We argue here that methods based on Rayleigh-quotient maximization are particularly attractive for inverse design because they can be combined with geometric/topology optimization. The key fact is that one can express the sum of the largest K eigenvalues as the maximum of a block Rayleigh quotient (Li, 2015;Johnson and Joannopoulos, 2001;Knyazev, 2001;Kokiopoulou et al, 2011), and for positive semidefinite H (= positive semidefinite O) this sum is a lower bound on the trace (Kokiopoulou et al, 2011): where V represents any K-dimensional subspace basis, so that one is maximizing the trace over all possible subspaces. This ≥ becomes equality for N = K, but in many problems (below) we find that K < 10 suffices for < 1% error in the trace (and, as expected from the arguments above, we find in Sec. 4.1 that the required K increases with the diameter of the emission region). Computationally, one can maximize the right-hand side of Eq. (14) by some form of gradient ascent (Li, 2015;Knyazev, 2001), each step of which only requires the evaluation of A −1 DV for a N × K matrix V. That is to say, one only needs K Maxwell solves at each step (instead of N for the full matrix H), which vastly reduces the computational cost.
Moreover, this Rayleigh-quotient maximization formula is especially attractive in the context of inverse design, because it can be combined with the geometric optimization itself. That is, instead of "nesting" the trace computation inside a larger geometric optimization procedure, we can simply add V to the geometry degrees of freedom and optimize over both V and the geometry simultaneously. The full inverse-design problem with incoherent emission can now be bounded by a single optimization problem: where the geometric parameters (e.g. material densities (Jensen and Sigmund, 2011) or level sets (van Dijk et al, 2013)) only affect A and (perhaps) D, and may be subject to some geometric and/or material constraints. The gradient of the right-hand side with respect to the geometry can be computed efficiently with adjoint methods (Molesky et al, 2018;Tortorelli and Michaleris, 1994), whereas the gradient with respect to V has a simple analytical formula (Johnson and Joannopoulos, 2001) (Appendix C), so a variety of gradient-based optimization algorithms (Chong and Zak, 2001) can be applied to simultaneously evolve both V and the geometry. Furthermore, the Rayleigh quotient has the nice property that, since we are maximizing a lower bound on the full trace, the actual performance P is guaranteed to be at least as good as the estimated performance at every optimization step.

Topology-optimization formulation
In this section, we briefly review the density-based TopOpt formulation (Jensen and Sigmund, 2011) that we employ for our example applications in Sec. 4. The key idea of TopOpt is that an "artificial density" field, ρ(x) ∈ [0, 1], is defined on a spatial "design" domain. This field is then filtered (to impose a non-strict minimum length-scale) and thresholded (to mostly "binarize" the geometry, resulting in a physically admissible geometry). The resulting smoothed and thresholded field is then used to control the spatial material distribution, constituting the structure under design. The design field, ρ, is discretized into a finite number of design degrees of freedom, which constitutes the design variables in the inverse design problem to be solved, e.g. Eq. (15), using a finite-element method (FEM) on a triangular mesh (Badia and Verdugo, 2020;Jin, 2014), and the geometry is optimized using a well-known gradient-based algorithm that scales to high-dimensional problems with thousands or millions of degrees of freedom (Svanberg, 2002). Given a density ρ(x) ∈ [0, 1], one should first regularize the optimization problem by setting a non-strict minimum lengthscale r f , as otherwise one may obtain arbitrarily fine features as the spatial resolution is increased. This is achieved by convolving ρ with a low-pass filter to obtain a smoothed densitỹ ρ (Jensen and Sigmund, 2011). There are many possible filtering algorithms, but in an FEM setting (with complicated nonuniform meshes), it is convenient to perform the smoothing by solving a simple "damped diffusion" PDE, also called a Helmholtz filter (Lazarov and Sigmund, 2011): where r f is the lengthscale design parameter and n is the normal vector at the boundary ∂Ω D of the design domain Ω D . This damped-diffusion filter essentially makesρ a weighted average of ρ over a radius of roughly r f (Lazarov and Sigmund, 2011). (In addition to this filtering, it is possible to impose additional fabrication/lengthscale constraints, for example to comply with semiconductor-foundry design rules (Hammond et al, 2021).) Next, one employs a smooth threshold projection on the intermediate variableρ to obtain a "binarized" density parameterρ that tends towards values of 0 or 1 almost everywhere (Wang et al, 2010): where β is a steepness parameter and η = 0.5 is the threshold. During optimization, one begins with a small value of β (allowing smoothly varying structures) and then gradually increases β to progressively binarize the structure (Christiansen and Sigmund, 2021); here, we used β = 5, 10, 20, 40, 80, similar to previous authors (Christiansen et al, 2020). Finally, one obtains a material, described by an electric relative permittivity (dielectric constant) ε(r) in Eq. (1), given by: where ε 1 is the background material (usually air, ε 1 = 1) and ε 2 is the design material (we use dielectric of ε 2 = 12 throughout this work). Equation (18) includes an optional "artificial loss" term ∼ 1/Q, which effectively smooths out resonances to have quality factors ≤ Q (fractional bandwidth ≥ 1/Q) (Liang and Johnson, 2013). Such an artificial loss is useful in single-ω emission optimization in order to set a minimum bandwidth of enhanced emission, rather than obtaining diverging enhancement over an arbitrarily narrow bandwidth as is possible with lossless dielectric materials (Liang and Johnson, 2013). Also, optimizing low-Q resonances often leads to betterbehaved optimization problems (less "stiff" problems with faster convergence), so during optimization we start with a low Q = 5 and geometrically increase it (to Q = 1000) as the optimization progresses (Liang and Johnson, 2013) The details of the FEM discretization are described in Appendix C, but it is essentially a standard triangular mesh with first-order Lagrange elements (Jin, 2014) and perfectly matched layers (PMLs) for absorbing boundaries (Oskooi and Johnson, 2011). We discretized ρ and {ρ,ρ} with piecewise-constant (0thorder) and first-order elements, respectively. During optimization, one must ultimately compute the sensitivity of the objective function (the trace from Sec. 2) with respect to the degrees of freedom ρ-for each step outlined above (smoothing, threshold, PDE solve, etcetera) we formulate a vector-Jacobian product following the adjoint method for sensitivity analysis (Molesky et al, 2018;Tortorelli and Michaleris, 1994) with some help from automation (Revels et al, 2016), and then these are automatically composed ("backpropagated") by an automatic-differentiation (AD) system (Innes, 2018). In this way, the gradient with respect to all of the degrees of freedom (ρ at every mesh element) can be computed with about the same cost as that of evaluating the objective function once (Molesky et al, 2018).

Numerical examples
In this section, we present three example problems in 2D illustrating how our trace-optimization procedure works in practice for typical problems involving ensembles of spatially incoherent emitters. We start in Sec. 4.1 with a general case where we are maximizing the total emitted power from many emitters distributed throughout a "fluorescent" dielectric material. Next, in Sec. 4.2, we study the enhanced emission from a corrugated surface, analogous to a light-emitting diode (Erchak et al, 2001), showing how the trace formulation Fig. 1 (a) A 2D fluorescent particle (of dielectric ε = 12) with a circular design domain of radius r. The emitters are distributed uniformly within the dielectric material. The total power P radiated outwards in any direction (integral of Poynting flux over Γout) at a wavelength λ is optimized. (b) Typical local optima found for design radius r = 0.5λ with filling-ratio R f = 0.5 and bandwidth quality factor Q = 1000. The numbers above denote the optimized emitting (average) power in arbitrary unit. (c) Emitted power of a disk as a function of the disk radius r for different bandwidth quality factors Q. (d) The number of eigenvalues that contribute 99% of the trace as a function of the disk radius r for different bandwidth quality factors Q.
can be applied to a periodic structure with aperiodic emitters. Both of these examples are based on the general algorithm from Sec. 2.5, which can handle emission into a continuum of possible angles. Finally, in Sec. 4.3 we apply the more specialized algorithm from Sec. 2.3 to optimizing emission from a fluorescent material into a single-mode waveguide. Since Maxwell's equations are scale-invariant (Joannopoulos et al, 2008), the same optimal designs will be obtained for any wavelength λ if the geometry (thickness and period) is scaled with λ (for the same dielectric constants).

Fluorescent particle
In this example, illustrated in Fig. 1a, we optimize the shape/topology of a 2D fluorescent dielectric (ε = 12) particle constrained to have a given area lying within a circular design domain of radius r, maximizing the total power P radiated outwards in any direction at a wavelength λ. The emitters are distributed uniformly within the dielectric material. Further computational details can be found in Appendix C.1.
Because this is a non-convex optimization problem, topology optimization can converge to different local optima from different initial geometries (Molesky et al, 2018). Figure 1b shows multiple local-optima geometries for a design radius r = 0.5λ with filling-ratio R f = 0.5 and bandwidth quality factor Q = 1000 (artificial loss, from Sec. 3), obtained from different initial geometries (disks of different radii and/or ε). The numbers above the geometries denote the corresponding emitted (average) power P in arbitrary units. In this particular case, after examining a large number of local optima (not shown), we found that the best local optimum is simply a circular disk with a particular radius. The existence of many local optima with performance varying by factors of 2-5 is not unusual in wave problems (Yao et al, 2020;Diaz and Sigmund, 2010;Bermel et al, 2010), and while various heuristic strategies have been proposed to avoid poor local minima (Mutapcic et al, 2009;Aage and Egede Johansen, 2017;Bermel et al, 2010;Schneider et al, 2019) beyond simply probing multiple random starting points, the only way to obtain rigorous guarantees is to derive theoretical upper bounds (Miller et al, 2016;Yao et al, 2020) as discussed further in Sec. 5 (purely numerical global search can generally provide practical guarantees only for very low-dimensional Maxwell optimization (Azunre et al, 2019)).
Whether the best optimum is a disk changes with the design-domain radius, and appears to depend on whether there is a nearby radius with a high-Q resonance at the design λ. (In fact, for this particular case the locally optimal disk has an area slightly less than our upper bound, meaning that the area constraint is not active. In consequence, this particular disk remains a local optimum even if the design domain is enlarged, and apparently remains a global optimum until the design domain is sufficiently enlarged to admit a stronger resonance. Although the area constraint is not active at this particular local optimum, it is active at intermediate points during the optimization process, and there are many other local optima that would also be found if the area constraint were not present. Physically that emitted power can increased simply by adding more fluorescent material; correspondingly, without an area constraint we often find a local optimum in which the design region is almost entirely filled with dielectric.) In Fig. 1c, we show how the average power radiated by a circular disk varies with radius r/λ, and clearly exhibits a series of sharp peaks correspond to radii which support high-Q resonances at λ: the familiar whispering-gallery resonant modes (Yang et al, 2015).
The key assumption of our algorithm in Sec. 2.5 was that only a small number of eigenvalues would contribute to the trace, and this assumption clearly holds here. In Fig. 1d, we plot the number of eigenvalues that contribute 99% of the trace as a function of the disk radius. We can see that only a small number of eigenvalues is required to obtain a good estimate of the trace; we find similar results for other shapes. Naively, one might expect that the number of contributing eigenvalues would scale with the area (or volume in 3d), corresponding to the number of resonances per unit bandwidth from the density of states (DOS) (Yu et al, 2010). However, we find that the scaling is nearly linear with the disk radius; the reason the simple DOS argument fails is that it doesn't take into account the variable loss (radiation) rates of the modes, which causes most of the resonances to contribute weakly even if the real part of their frequency is close to the emission frequency. In fact, we have found similar linear scaling of the number of contributing eigenvalues for many other shapes, including other locally optimized shapes, and it appears to be an interesting open theoretical question to prove (or disprove) asymptotic linear scaling.

Periodic emitting surface
In this example, we enhance the emission from a thin "emitting layer" by optimizing a periodically patterned surface situated on top of the layer.-this is inspired by a light-emitting diode (LED) with a patterned surface above an active emitting layer, where it is well known that a periodic pattern can enhance emission via guided-mode resonances (Erchak et al, 2001;Noda and Fujita, 2009). As illustrated in Fig. 2a, the design domain consists of dielectric material (ε = 12) in air with a period L and thickness H d = 0.5λ, the spontaneous-emission current sources are uniformly distributed on an horizontal line ("active layer") inside a lower-index substrate (ε = 2.25) a distance H s = 0.1λ below the design domain. The objective, here, is the total power emitted upwards, integrated over all angles (i.e., the total Poynting flux) using the methods of Sec. 2.5. (Emission purely into the normal direction could be optimized much more efficiently using the methods of Sec. 2.3.) Further computational details can be found in Appendix C.2. Fig. 3 (a) A 2D fluorescent dielectric (ε = 12) medium coupling to a waveguide (ε = 12, width λ/2 √ 12). The design domain is of height H d = 1.5λ and width L d = 0.5λ, the power coupled into a single-mode dielectric waveguide (mode overlap integral at Γout) is optimized. (b) Optimized shape with a filling-ratio R f = 0.5. (c) Averaged field intensity |Hz| 2 distribution. About 64% percent of the power is coupled into the waveguide mode.
Even though the dielectric structure is periodic (the design domain is a single unit cell of ε), the emitters are not periodic-they are independent random currents at every point in the active layer. Computationally, however, we can still reduce the simulation of non-periodic sources in a periodic medium to a set of small unit-cell simulations, using the "array-scanning method" (Capolino et al, 2007). An arbitrary aperiodic source current can be Fourier-decomposed into a superposition of Bloch-periodic sources (J k (x + L) = e ikL J k (x)), each of which can be simulated with a single unit cell and Bloch-periodic boundary conditions in x. The total power is then simply obtained from an integral ( π/L −π/L dk)) over the Bloch wavevector k in the Brillouin zone. For incoherent aperiodic random sources, each of these Bloch-periodic unit-cell calculations is an operator trace (over random currents in the unit cell only) computed by the methods of Sec. 2. (Unit-cell calculations for different k values are completely independent and can be performed in parallel.) Further details of this formulation are described in Appendix C.2. (Moreover, the array-scanning method can be viewed as a special case of a reduction using symmetry: for any symmetry group, sources can be decomposed into a superposition of "partner functions" of the irreducible representations of the symmetry group (Inui et al, 2012), thus reducing the simulation domain even for asymmetrical random sources.) The optimized structures for the design parameter H d = 0.5λ, H s = 0.1λ is shown in Fig. 2b. Note that we have also optimized over the period L (here, simply by repeating the optimization for different values of L) to find an optimized period L = 0.6λ. The eigenvalue distribution of the average power is given in Fig. 2c: Again, we observe that only the first few eigenvalues contribute significantly to the trace, as conjectured in Sec. 2.5.

Emission into a waveguide
This example considers a fluorescent dielectric (ε = 12) medium in air, similar to Sec. 4.1, but in this case we are maximizing the power coupled into a single-mode dielectric waveguide (ε = 12, width λ/2 √ 12) rather than into radiation (Fig. 3a). Since the output is a single channel (O is rank 1), this allows us to apply the method of Sec. 2.3 to perform only a single "reciprocal" Maxwell solve per optimization step. Since the waveguide breaks the rotational symmetry of the problem, the optimum structure is now very different from a circular disk, and must somehow redirect light emitted anywhere in the fluorescent material into the waveguide. This task is made more difficult by the fact that we employ a design domain whose size is only 1.5λ×0.5λ, so the optimization cannot simply surround the emitters with a multi-layer Bragg mirror to confine the radiation (as occurs when optimizing LDOS in a large design domain (Liang and Johnson, 2013;Wang et al, 2018)). Further computational details can be found in Appendix C.3. Figure 3b shows the optimized geometry with a design domain of height H d = 1.5λ and width L d = 0.5λ. The material is constrained to fill at most half of the design domain (to illustrate that we can independently constrain the design region and the design volume); unlike for the disk optimum in Sec. 4.1, this area constraint was active at the optimum shown here. The corresponding averaged field intensity |H z | 2 is displayed in Fig. 3c. We found that 64% of the power is coupled into the desired waveguide mode. In comparison, only 4% of the power is coupled to the waveguide mode for a trivial rectangular design where the the whole design domain is filled with ε = 12 fluorescent material.

Conclusion
We presented a trace formulation and accompanying algorithms for topology optimization of incoherent emitters, which unify and generalize earlier work, and in particular provide the first tractable optimization algorithms for the challenging case of many random emitters and many output channels. Looking forward, we believe that there are many potential applications of these ideas, as well as further algorithmic improvements and generalizations.
We are already preparing to use these techniques to optimize Raman sensing in fluid suspensions of many Raman molecules, in contrast to previous work that only considered a single molecule location (Christiansen et al, 2020;Pan et al, 2021)-it will help us to answer the interesting open question of the optimal spatial density of "hot spots" where light is concentrated to enhance Raman emission. Another application is enhancing cathodoluminescence or other forms of scintillation detectors, which were previously optimized only for normal emission (Roques-Carmes et al, 2021). In contrast to spontaneous emission, where the light is emitted by spatially uncorrelated point sources, one can instead consider incoherent beams of light consisting of uncorrelated random planewave amplitudes-this corresponds to spatially correlated random currents (Wolf, 2007), and we are investigating the resulting trace formulation to design metalenses for incoherent focusing. Other applications include the study of radiation loss due to surface roughness, which can be modeled via random sources with a prescribed correlation function related to the manufacturing disorder and may naively require a large number of Maxwell solves (Johnson et al, 2005;Kita et al, 2018;Payne and Lacey, 1994). Nor is our approach limited to Maxwell's equations-it is applicable to any linear system where one wishes to optimize quadratic functions of random source terms.
Algorithmically, we are investigating ways to apply more sophisticated algorithms to the joint structure/trace optimization problem Eq. (15). When solving the eigenproblem alone (maximizing over V to obtain extremal eigenvalues), it is well known that one can greatly improve upon straightforward gradient ascent by Krylov algorithms such as Arnoldi (Trefethen and Bau, 1997) or LOBPCG (Knyazev, 2001), and we would like to incorporate Krylov acceleration into to joint problem as well. Recent techniques to accelerate frequency-domain solves for multiple sparse inputs and outputs (Lin et al, 2022) may also be applicable to accelerate our trace optimization (since we have multiple sources in a sparse subset of the domain, and objective functions like the power only involve sparse outputs). Similar to the stochastic Lanczos algorithm (Ubaru et al, 2017), one could further exploit the fact that we are computing the trace of a function f (A) of the Maxwell operator A in order to relate the trace more efficiently to Krylov subspaces of A. More generally, there are other applications where one is maximizing tr f (A(p), p) for some f and some parameters p, and it seems similarly beneficial to combine the trace estimation with the parameter optimization in such problems.
Theoretically, it is desirable to complement improved numerical optimizations with new rigorous upper bounds on incoherent emission. Significant progress has already been made on bounding thermal-emission processes (Miller et al, 2015;Molesky et al, 2020) as well as to absorption (Kuang and Miller, 2020;Miller et al, 2016) (related to emission via reciprocity), and many of these techniques should be adaptable to other forms of random emission.

Disclosure
On behalf of all authors, the corresponding author states that there is no conflict of interest.

Replication of results
The code for Sec. 4 can be found at https://github.com/WenjieYao/TraceFormula.

Appendix A Correlation matrix
In this section, we show how to compute the correlation matrix B corresponding to random current sources J discretized in a finite-element basis. One can express the frequency-domain Maxwell equations either in terms of the electric field E, in which case the source term is proportional to J, or in terms of the magnetic field H, in which case the source term is proportional to ∇ × J (Jin, 2014). These two formulations lead to different B correlation matrices.
In particular, we consider the case where the currents J (at a frequency ω) are spatially uncorrelated with a given correlation function: where C is a given 3 × 3 Hermitian positive-semidefinite correlation matrix. For example, in 2D with in-plane electric currents, as in the examples of Sec. 4, one has where J 2 0 (x) is the mean-square current at x. For isotropic random currents, C = J 2 0 I where I is the identity matrix. In a finite-element method, the source vector b is constructed by taking inner products of the source current with real vector-valued basis "element" functionsv n (Nedelec elements in 3D, orv nẑ with scalar Lagrange elementŝ v n in 2D for z-polarized fields) (Jin, 2014). That is, the components of b are b n = v n · (source current) dΩ.
For an electric-field formulation with a source current J, we obtain the correlation function: For localized basis functions (as in a finite-element method), this results in an extremely sparse matrix B-it is zero ifv m andv n don't overlap, or in regions where the mean-square current C is zero. (If C is the identity, B is equal to the Gram matrix of the basis.) Note also that, by construction, B is a Hermitian semidefinite matrix, so it has factorization B = DD † , such as a Cholesky factorization (Trefethen and Bau, 1997). For a magnetic-field formulation, J is replaced by ∇ × J above, but we can simply integrate by parts (Joannopoulos et al, 2008) to move the ∇ × curl operation to act on the basis functions, yielding: Again, this yields a sparse Hermitian semidefinite matrix B.
In the 2D examples of Sec. 4, we employed a magnetic-field formulation with an out-of-plane magnetic field H = H zẑ and corresponding basis functionŝ v nẑ , along with in-plane current sources corresponding to Eq. (A2). In this case, Eq. (A5) simplifies to:

Appendix B Factorization-free trace formulation
Although it is conceptually attractive to use a trace formulation Eq. (7) in terms of the Hermitian matrix H, this formulation required a factorization B = DD † of the correlation matrix B. Computationally, it is desirable to avoid this factorization, especially if the current distribution (and hence B) depends on the geometric degrees of freedom ρ (which would require us to differentiate through the matrix factorization in our adjoint calculation). Instead, it is straightforward to reformulate our optimization problems Eq. (11) and Eq. (15) in terms of B alone using a change of variables. For the few-output-channel case in Sec. 2.3, one can simply start with Eq. (7) and rewrite it as P = tr[A − † OA −1 B], which for a low-rank O simplifies, similar to Eq. (11), to where A † u i = o i , and we have defined the parameter ρ dependence (which can effect both A and B) as a function g(ρ) for use in the adjoint formulation of Appendix C. For the many-channel case of Eq. (15), the key point is that we can choose V to be orthogonal to the nullspace N (D) of D, as any nullspace component would contribute nothing to the trace (DV projects it to zero). Equivalently, we can choose V = D † W (⊥ N (D) (Lax, 2013)) for any N × K matrix W, and this change of variables yields a new optimization problem: where again we have defined the function g(ρ, W) for the parameter and W dependence, along with U = A −1 BW, for use in the adjoint formulation of Appendix C. where The PML conductivity σ (x), = x, y, z function is used to gradually "turn on" the PML to compensate for discretization errors (Oskooi and Johnson, 2011), and we use a quadratic profile σ ( is the distance inside the PML).

C.1 Fluorescent particle
For the problem of Sec. 4.1, the governing equation is exactly Eq. (C11) with µ = 1, whose weak form is (Jin, 2014): where k 0 = ω/c is the free-space wave number, f = (∇ × J) ·ẑ is the source term, and ∇Λ denotes the linear operator ∇Λu = ∇(Λu). The matrix A and the source vector b for the discretized Maxwell equation Eq.
(2) are obtained by replacing u and v with the finite-element basis functionsû n andv n , using first-order Lagrange elements on a triangular mesh (Jin, 2014). The mesh was generated with Gmsh (Geuzaine and Remacle, 2009), corresponding to a spatial resolution of roughly λ/40 in the air and λ/80 in the design region. Notice that in Eq. (B8), only U (via A) and B (describing emission only in the dielectric) depend on the design parameters ρ. We have now the optimization problem as: where R f is the area filling-ratio. Applying adjoint-method analysis (Molesky et al, 2018;Tortorelli and Michaleris, 1994), we obtain the partial derivatives: where Z is the result of an adjoint solve: The partial derivative with respect to W is simply obtained via matrix (Petersen and Pedersen, 2012) CR calculus (Kreutz-Delgado, 2009): We validated the derivatives from the adjoint method against finite differences at random points, and found that the relative error was only about 10 −6 or less, which is not a problem for the CCSA algorithm when converging the optimum to only a few decimal places.
The analysis workflow for this example is shown in Fig. C1. This CCSA update is implemented with NLopt in Julia (Johnson, 2021) for an increasing series of β = 5, 10, 20, 40, 80. And for each β, the loop is terminated either a relative difference of 10 −8 is achieved or the maximum iteration reaches 200. The design parameter ρ is bounded from 0 to 1.

C.2 Periodic emitting surface
For the problem of Sec. 4.2, we simulate a single unit cell with Bloch-periodic boundary conditions in x. Since Gridap only supports periodic boundary conditions in its current version, we make a change of variables H z → H z e ikx so that H z is the periodic "Bloch envelope" function (Joannopoulos et al, 2008). In comparison to Eq. (C10) in Appendix C.1, this corresponds to the transformation ∇ → ∇ + ikx (Joannopoulos et al, 2008): with periodic boundaries in x, whose weak form (including PML in y) can then be obtained via integration by parts: where Λ is the diagonal PML "stretching" matrix Eq. (C12). The objective (average power) is then constructed by a Brillouin-zone integration over the Bloch wavevector k (Capolino et al, 2007): where L is the period of the unit cell and A k is assembled using Eq. (C19). Since this integrand is a periodic function of k, the integral can be approximated by a simple trapezoidal sum over equally spaced points k with exponential accuracy (Trefethen and Weideman, 2014); we used 100 k points in order to resolve sharp resonances. Commuting the integral and the trace in Eq. (C20), similarly to Appendix B (noting that tr = tr ), we obtain g(ρ, W) = max ρ,W L 2π π/L −π/L tr U k (ρ) † OU k (ρ)(W † B(ρ)W) −1 dk , , The adjoint analysis for Eq. (C21) is almost the same as in Appendix C.1, except for the additional integration over k. Also, it shares the same analysis workflow as in Appendix C.1.

C.3 Emission into a waveguide
For the problem of Sec. 4.3, the governing equation and the weak form are identical to Appendix C.1. The main difference is our objective function is now the power in a waveguide mode, computed via an overlap integral using mode orthogonality (Snyder and Love, 1983), rather than a total Poynting flux. Here, we briefly review how this overlap integral is implemented in the finite-element method.
For a propagating waveguide mode with electric and magnetic fields e i and h i , the modal-expansion coefficient α i of that mode for a total magnetic field H is given by the overlap integral (Snyder and Love, 1983) where we have assumed an x-oriented waveguide in 2D and an in-plane electricfield polarization. The power carried by this mode is then simply |α i | 2 . In Sec. 4.3, our objective is the power |α 0 | 2 in a single mode: where N 0 is the normalization (which can be omitted for optimization) from Eq. (C22). If H z is expressed as a linear combination n u nûn of finite-element basis functionsû n , Eq. (C23) becomes o † u 2 as in Eq. (10), where o has components o n given by the linear functional o n = o(û n ) = 1 N 0 e y0ûn dy .
Computationally, the assembly of o in finite-element software is equivalent to constructing a right-hand-side (source) vector b. The optimization becomes: By the adjoint method, for any K, we obtain the derivatives: where w i solves Aw i = Bu i and u i solves the reciprocal problem A † u i = o i from Eq. (B7). This derivative is also compared with the finite difference method and a difference of about 10 −6 is observed. The analysis work flow is provided in Fig. C2.