1 Introduction

Incoherent emission (light emission from random current sources) arises in many problems in optics: spontaneous emission (fluorescence) (Milonni 1976; Kim 1986; Polimeridis et al. 2015), thermal emission in both far (Carey et al. 2008) and near (Basu et al. 2009; Rodriguez et al. 2013) fields, scintillation (Brenny et al. 2014; Roques-Carmes et al. 2021), Casimir and van der Waals forces (Gong et al. 2021), Raman scattering in fluid suspensions (Pilot et al. 2019), incoherent incident waves (Wolf 2007) (which can be transformed to random sources via the equivalence principle Harrington 2001), and even scattering from surface roughness via a Born approximation (Johnson et al. 2005). However, accurate modeling of such spatially random sources can pose severe computational challenges, because a direct approach would involve averaging the results of many simulations over an ensemble of sources (Rodriguez et al. 2011; Luo et al. 2004; Bao et al. 2019); the statistics (correlation functions) of the sources are known, but the difficulty is converting this into statistics (e.g., average power) of the resulting fields. In the cases of fluorescence (Polimeridis et al. 2015), near-field thermal radiation (Rodriguez et al. 2013), and Casimir forces (Gong et al. 2021), for example, tractable methods for arbitrary geometries were only obtained recently. This challenge is compounded when one wishes to perform inverse design (Molesky et al. 2018)—large-scale optimization of emission over many geometric parameters, perhaps even over “every pixel” of a design region via topology optimization (TopOpt) (Jensen and Sigmund 2011)—because one must then repeat the computation 10–1000 s of times as the design evolves, e.g., to maximize spontaneous emission (Rogobete et al. 2003; Liang and Johnson 2013; Wang et al. 2018; Yao et al. 2020) or Raman emission (Christiansen et al. 2020) from a single molecule, much less a distribution of sources.

In this paper, we present a unified framework for inverse design of incoherent emission, combining a trace formulation adapted from recent work (Rodriguez et al. 2013; Polimeridis et al. 2015; Reid et al. 2017) (Sect. 2) with a new algorithm to simultaneously optimize the geometry and evolve to an accurate estimate of the average emission/trace (Sect. 2.5). We apply this framework to perform density-based TopOpt (Jensen and Sigmund 2011) on several example problems in two dimensions: fluorescence from an optimized nanoparticle (Sect. 4.1), enhanced emission from a corrugated surface analogous to a light-emitting diode (Erchak et al. 2001) (Sect. 4.2), and optimized emission into a waveguide (Sect. 4.3). In each case, the emission is not from a single molecule, but the average power produced by an ensemble of incoherent emitters at every point in some material. We show that this emission can be computed by a small number of “eigen-sources” of a Hermitian operator, which can be determined by a Rayleigh-quotient optimization (Li 2015) that is combined with the inverse-design (geometric) optimization. In the special case of emission into a small number K of channels, such as K far-field directions, K waveguide modes, or K points in space, we show that a simple algebraic manipulation transforms the problem into K simulations (Sect. 2.3)—this unifies and generalizes known results based on Kirchhoff’s law of thermal radiation (Reif 1965; Greffet et al. 2018) or (more generally) reciprocity (Roques-Carmes et al. 2021; Janssen et al. 2010) for computing emission into a single planewave direction—but our alternative approach (Sect. 2.5) yields a small number of solves even for \(K\rightarrow \infty\). The other well-known special case is that of a single-emitter location with a random orientation, which reduces to the local density of states (LDOS) via three Maxwell solves (Milonni 1976; Oskooi and Johnson 2013), and this appears as another low-rank special case in our formulation (Sect. 2.4). We believe that this computational framework will enable many future developments in the computational design of complex optical devices involving a wide variety of incoherent processes (Sect. 5).

Density-based TopOpt has attracted increasing interest over the last few decades because of its ability to reveal surprising high-efficiency designs by optimizing over thousands or even millions of design degrees of freedom (Jensen and Sigmund 2011). It parameterizes a structure by an artificial “density” \(\rho (\mathbf {x}) \in [0,1]\) at every point (or every “pixel”) in a design region, which is typically passed through smoothing and threshold steps to yield a physical “binary” design consisting of one of two materials at every point. We apply a damped-diffusion filter (Lazarov and Sigmund 2011), which regularizes the problem by setting a minimum length scale on the design. (Additional manufacturing constraints can be imposed by well-known techniques Hammond et al. 2021, but in the present work, we focus on the fundamental algorithms and not on experimental realization.) Once a scalar objective function (to be optimized) is defined, such as the emitted power (e.g., the new formulation in this paper), its derivatives (sensitivities) with respect to all the design parameters can be efficiently computed with a single additional simulation via adjoint methods (Molesky et al. 2018; Tortorelli and Michaleris 1994). Given the objective function and its derivatives, a variety of large-scale optimization algorithms are available; we use the CCSA/MMA method (Svanberg 2002). We employ a recent free/open-source finite-element method (FEM) package, Gridap.jl (Badia and Verdugo 2020), in the Julia language (Bezanson et al. 2017), which allows us to efficiently code highly customized FEM-based trace formulations in a high-level language, with the construction of the adjoint problem aided by automatic-differentiation (AD) tools (Revels et al. 2016; Innes 2018).

2 Trace formulation

In this section, we first review the formulation of the frequency-domain Maxwell equations as a linear equation, discretized for numerical computation, with physical quantities like power as quadratic forms. Then we show how the ensemble average of such an expression over a distribution of random current sources can be rewritten as a deterministic trace formula. Finally, we explain how such a trace formula can be evaluated efficiently in the context of photonics optimization, both in the “easy” cases of coupling to a small number of output/input channels as well as in the more general cases of a continuum of outputs.

2.1 Wave sources and quadratic outputs

In the frequency domain, the linear Maxwell equations for the electric field \(\mathbf {E}\) in response to a time-harmonic current source at a frequency \(\omega\) are (Jin 2014)

$$\left[ \nabla \times \frac{1}{\mu }\nabla \times -\left( \frac{\omega }{c}\right) ^{2}\varepsilon \right] \mathbf {E}=\mathbf {f},$$
(1)

where \(\varepsilon (\mathbf {x},\omega )\) is the relative electric permittivity, \(\mu\) is the relative magnetic permeability, (\(\mu \approx 1\) for most materials at optical and infrared wavelengths, so we assume \(\mu =1\) throughout this work), c is the speed of light in vacuum, and \(\mathbf {f}={\text{i}}\omega \mathbf {J}\) is a current-source term.

Numerically, one discretizes the problem (e.g., using finite elements Jin 2014) into a linear equation:

$$\mathbf {A}\mathbf {u}=\mathbf {b} ,$$
(2)

where \(\mathbf {A}\) is a matrix representing the Maxwell operator on the left hand of Eq. (1), \(\mathbf {u}\) is a vector representing the discretized electric (and/or magnetic) field, and \(\mathbf {b}\) is a vector representing the discretized source term. In the following, it is algebraically convenient to work with such a discretized (finite-dimensional) form, to avoid cumbersome infinite-dimensional linear algebra, but one could straightforwardly translate to the latter context as well (Joannopoulos et al. 2008).

Most physical quantities P of interest in photonics—such as power (via the Poynting flux), energy density, and force (via the Maxwell stress tensor)—can be expressed as quadratic functions of the electromagnetic fields \(\mathbf {u}\). Since these are real-valued quantities, they correspond in particular to Hermitian quadratic forms:

$$P = \mathbf {u}^{\dagger} \mathbf {O} \mathbf {u} ,$$
(3)

where \(\dagger\) denotes the conjugate transpose (adjoint) and \(\mathbf {O}=\mathbf {O}^{\dagger}\) is a Hermitian matrix/operator. In this paper, we are mainly concerned with computing emitted power P, which is constrained by the outgoing boundary conditions to be non-negative, in which case \(\mathbf {O}\) must furthermore be a positive semidefinite Hermitian matrix (i.e., non-negative eigenvalues) in the subspace of permissible \(\mathbf {u}\), a property that will be useful in Sect. 2.5.

2.2 Trace formula for random sources

Now, consider the case where one has an ensemble of random current sources \(\mathbf {b}\) drawn from some statistical distribution with zero mean and a known correlation function (e.g., a known mean-square current at each point if they are spatially uncorrelated). In this case, we wish to compute the ensemble average, denoted by \(\langle \cdots \rangle\), of our quadratic form Eq. (3):

$$\langle P \rangle = \left\langle \mathbf {u}^{\dagger} \mathbf {O}\mathbf {u}\right\rangle = \left\langle \mathbf {b}^{\dagger} \mathbf {A}^{-\dagger }\mathbf {O}\mathbf {A}^{-1}\mathbf {b}\right\rangle ,$$
(4)

where \(\mathbf {A}^{-\dagger }\) denotes \((\mathbf {A}^{-1})^{\dagger} = (\mathbf {A}^{\dagger} )^{-1}\). Note that only \(\mathbf {b}\) is random in the right-hand expression.

Naively, this average could be computed by a brute-force method in which one explicitly solves the Maxwell equations (\(\mathbf {u} = \mathbf {A}^{-1}\mathbf {b}\)) for many possible sources \(\mathbf {b}\) and then integrates over the distribution, perhaps by a Monte Carlo (random-sampling) method. That approach is possible and has been accomplished, e.g., for evaluating thermal radiation (Rodriguez et al. 2011; Luo et al. 2004), but is computationally expensive. Worse, such a direct approach quickly becomes prohibitive in the context of inverse design, where the averaging must be repeated for many geometries over the course of solving an optimization problem using an iterative algorithm.

Instead, we adapt “trace formula” techniques that have been developed for similar problems in thermal radiation (Rodriguez et al. 2013) and spontaneous emission (Polimeridis et al. 2015), where one must compute the average effect of many random current sources distributed throughout a volume. The basic trick (as reviewed in yet another related setting in Reid et al. 2017) is to write the scalar \(\langle P \rangle\) as a \(1 \times 1\) “matrix” trace and then employ the cyclic-shift property (Lax 2013) to group the \(\mathbf {b}\) terms together:

$$\langle P \rangle = \left\langle \mathbf {b}^{\dagger} \mathbf {A}^{-\dagger }\mathbf {O}\mathbf {A}^{-1}\mathbf {b}\right\rangle ={\text {tr}}\left\langle \mathbf {b}^{\dagger} \mathbf {A}^{-\dagger }\mathbf {O}\mathbf {A}^{-1}\mathbf {b}\right\rangle ={\text {tr}}\left[ \mathbf {A}^{-\dagger }\mathbf {O}\mathbf {A}^{-1}\langle \mathbf {b}\mathbf {b}^{\dagger} \rangle \right] .$$
(5)

Here, the ensemble average is now confined to the \(\langle \mathbf {b}\mathbf {b}^{\dagger} \rangle\) term, which is just the correlation matrix \(\mathbf {B}\) (Johnson and Wichern 2018) of the currents; such a matrix is positive semidefinite, so it can be factorized (Trefethen and Bau 1997) (for convenience below) as follows:

$$\langle \mathbf {b}\mathbf {b}^{\dagger} \rangle =\mathbf {B}=\mathbf {D}\mathbf {D}^{\dagger} ,$$
(6)

for some known matrix \(\mathbf {D}\). Further information about constructing the matrix \(\mathbf {B}\) or its factorization \(\mathbf {D}\) is given in “Appendix 1.” (For the case of finite-element discretizations, we show that \(\mathbf {B}\) is a sparse matrix that is straightforward to assemble and \(\mathbf {D}\) is, for example, a sparse Cholesky factor Davis 2006.) Algebraically, expressing our results in terms of \(\mathbf {D}\) below leads to convenient Hermitian matrices, but we show in “Appendix 2” that the final algorithms can easily employ \(\mathbf {B}\) directly to avoid the computational cost of an explicit factorization. In the simple case where random currents are spatially uncorrelated, which holds for spontaneous emission and thermal emission in local materials (Landau et al. 1980), \(\mathbf {B}\) and \(\mathbf {D}\) are conceptually diagonal linear operators whose diagonal entries are the mean-square and root-mean-square currents, respectively, at each point in space. Whether this leads to a strictly diagonal matrix depend on the discretization scheme as explained in “Appendix 1.” For instance, in the case of thermal and quantum fluctuations, the mean-square currents are given by the fluctuation–dissipation theorem (FDT; Landau et al. 1980), while for spontaneous emission, one can use the FDT with a “negative temperature” determined by the population inversion (Pick et al. 2015; Patra 2015).

Inserting Eq. (6) into Eq. (5), we obtain our objective as the trace of a deterministic Hermitian matrix \(\mathbf {H}\) (which is positive-semidefinite if \(\mathbf {O}\) is, as for power), given by

$$\langle P \rangle ={\text {tr}}\underbrace{\left[ (\mathbf {A}^{-1}\mathbf {D})^{\dagger} \mathbf {O}(\mathbf {A}^{-1}\mathbf {D})\right] }_\mathbf {H}.$$
(7)

The challenge now is to efficiently compute such a matrix trace. Evaluating a trace is easy once the matrix elements are known—it is the sum of the diagonal entries—but the difficulty in Eq. (7) is the computation of \(\mathbf {A}^{-1}\mathbf {D}\). Recall that the \(N \times N\) matrix \(\mathbf {A}\) is a discretized Maxwell operator where N is the number of grid points (or basis functions), a huge matrix (especially in 3D). There are fast methods to solve for \(\mathbf {A}^{-1} (\mathbf {D} \mathbf {v})\) for any single right-hand side \(\mathbf {v}\), typically because the matrix \(\mathbf {A}\) is sparse (mostly zero) as in finite-element methods (Jin 2014), but computing the whole matrix \(\mathbf {A}^{-1}\mathbf {D}\) corresponds to solving N right-hand sides. Equivalently, computing explicit (dense) matrix inverses \(\mathbf {A}^{-1}\) is typically prohibitively expensive (in both time and storage) for matrices arising in large physical systems (Davis 2006). Fortunately, a large number of “iterative” algorithms have been proposed for estimating matrix traces to any desired accuracy using relatively few matrix–vector products (Hutchinson 1989; Ubaru et al. 2017), and what remains is to find a method well-suited to inverse design.

2.3 Trace computation: few output channels

In the important special cases where the desired output is the power in a small number (K) of discrete directions/channels/ports, or perhaps the intensity at a few points in space, we show in this section that the trace computation equation (7) simplifies to only K scattering problems. This fact is a generalization of earlier results commonly derived from electromagnetic reciprocity (Chew 2008), such as the well-known Kirchhoff’s law of thermal radiation (reciprocity of emission and absorption) (Reif 1965) or analogous results for scintillation (Roques-Carmes et al. 2021). More generally, this simplification arises whenever the matrix \(\mathbf {O}\) in Eq. (3) is low rank.

For example, suppose that the objective function is the electric-field intensity \(\Vert \mathbf {E}(\mathbf {x}_{1})\Vert ^{2}\) at a single point \(\mathbf {x}_{1}\) in space, which is the case for “metalens” optimization problems in which one is maximizing intensity at a focal spot (Bayati et al. 2021). In matrix notation for a discretized problem, this quantity corresponds to

$$P = \Vert \mathbf {E}(\mathbf {x}_{1})\Vert ^{2} = \Vert \mathbf {e}_{1}^{\dagger} \mathbf {u}\Vert ^{2} = \mathbf {u}^{\dagger} \underbrace{\mathbf {e}_{1}\mathbf {e}_{1}^{\dagger} }_\mathbf {O} \mathbf {u} ,$$
(8)

where \(\mathbf {e}_{1}\) is the unit vector with a nonzero entry at the location (“grid point”) corresponding to \(\mathbf {x}_{1}\). We then have a rank-1 (Lax 2013) matrix \(\mathbf {O} = \mathbf {e}_{1}\mathbf {e}_{1}^{\dagger}\), and the trace equation (7) simplifies to \(\langle P \rangle = \mathbf {v}_{1}^{*} \mathbf {v}_{1} = \Vert \mathbf {v}_{1} \Vert ^{2},\) where

$$\mathbf {v}_{1} = \mathbf {D}^{\dagger} \mathbf {A}^{-\dagger } \mathbf {e}_{1}$$
(9)

and \(\mathbf {A}^{-\dagger } \mathbf {e}_{1}\) corresponds to solving a (conjugate-) transposed Maxwell problem with a “source” \(\mathbf {e}_{1}\) at the output location, which is closely related to electromagnetic reciprocity (Chew 2008).

Another important example where \(\mathbf {O}\) is low rank arises when the output P is the power in one (or more) orthogonal “wave channels” (Snyder and Love 1983), such as waveguide modes, planewave directions (e.g., diffraction orders), or spherical waves. In such cases, the power in a given channel can be computed by squaring a mode-overlap integral (e.g., a Fourier component for planewaves) of the form \(\Vert \mathbf {o}_{1}^{\dagger} \mathbf {u} \Vert ^{2}\) (Snyder and Love 1983). Exactly as in the single-point case above, this corresponds to a rank-1 matrix \(O = \mathbf {o}_{1}\mathbf {o}_{1}^{\dagger}\) and one must solve only a single “reciprocal” scattering problem to obtain the trace, where the “source” term is the (conjugated) output mode \(\mathbf {o}_{1}\). This is precisely the situation in Kirchhoff’s law, where in order to compute the average thermal radiation (emissivity) in a given direction, one solves a reciprocal problem for the absorption of an incident planewave in the opposite direction (the absorptivity) (Reif 1965; Greffet et al. 2018; Janssen et al. 2010). A similar technique was recently applied to optimize the average power emitted in the normal direction from a scintillation device (Roques-Carmes et al. 2021).

More generally, such cases correspond to an output quadratic form \(\mathbf {O}\) that takes a low-rank (Lax 2013) form:

$$\mathbf {O} = \sum _{i=1}^{K} \mathbf {o}_{i}\mathbf {o}_{i}^{\dagger} ,$$
(10)

where K is the number of rank-1 terms \(\mathbf {o}_{i}\mathbf {o}_{i}^{\dagger}\) (e.g., output channels/ports, output points, or other “overlap integrals”). Substituting Eq. (10) into Eq. (7) and applying the cyclic-trace identity, we obtain

$$\begin{aligned} \langle P \rangle= & {} \sum _{i=1}^{K}{\text {tr}}\left[ (\mathbf {A}^{-1}\mathbf {D})^{\dagger} \mathbf {o}_{i}\mathbf {o}_{i}^{\dagger} \mathbf {A}^{-1}\mathbf {D}\right] \\= & {} \sum _{i=1}^{K}\mathbf {o}_{i}^{\dagger} \mathbf {A}^{-1}\mathbf {D}(\mathbf {A}^{-1}\mathbf {D})^{\dagger} \mathbf {o}_{i} \\= & {} \sum _{i=1}^{K} \mathbf {v}_{i}^{\dagger} \mathbf {v}_{i} = \sum _{i=1}^{K} \Vert \mathbf {v}_{i} \Vert ^{2}, \end{aligned}$$
(11)

where

$$\mathbf {v}_{i}=\mathbf {D}^{\dagger} \mathbf {A}^{-\dagger }\mathbf {o}_{i}$$
(12)

corresponds to a single “reciprocal” Maxwell solve \(\mathbf {A}^{-\dagger }\mathbf {o}_{i} = (A^{-T} \mathbf {o}_{i}^{*})^{*}\) (a single scattering problem) for each i. (Electromagnetic reciprocity simply corresponds to the fact that \(A^{T} = A\) for reciprocal materials Chew 2008.) Hence, the full trace—the average emission into K channels—can be computed with only K solves, and in many such cases \(K=1\).

2.4 Trace computation: few input channels

One trivial special case in which the trace computation drastically simplifies is that of only a few sources or a few input channels, most famously in the case of the local density of states (LDOS): emission by a molecule at a single location in space but with a random polarization (Milonni 1976; Oskooi and Johnson 2013). In the case of LDOS, this reduces the trace computation to three Maxwell solves, one per principal polarization direction, making the problem directly tractable for topology optimization (Liang and Johnson 2013; Wang et al. 2018; Yao et al. 2020). More generally, this situation corresponds to the correlation matrix \(\mathbf {B}\) being low-rank: if \(\mathbf {B}\) is rank K, we can compute the trace in K solves.

In particular, suppose that the currents \(\mathbf {b}\) are of the form \(\mathbf {b} = \sum _{i=1}^{K} \beta _{i} \mathbf {b}_{i}\) where the \(\mathbf {b}_{i}\) are “input channel” basis functions (e.g., a point source with a particular orientation, or an equivalent-current source for a waveguide mode Oskooi and Johnson 2013) and \(\beta _{i}\) are uncorrelated random numbers with zero mean and unit mean-square. Then the correlation matrix \(\mathbf {B} = \langle \mathbf {b} \mathbf {b}^{\dagger} \rangle\) is simply the rank-K matrix \(\mathbf {B} = \sum _{i} \mathbf {b}_{i} \mathbf {b}_{i}^{\dagger}\). In this case, the trace simplifies to

$${\text {tr}}\mathbf {H}= \sum _{i=1}^{K} \mathbf {u}_{i}^{\dagger} \mathbf {O}\mathbf {u}_{i} ,$$
(13)

where computing \(\mathbf {u}_{i}=\mathbf {A}^{-1}\mathbf {b}_{i}\) again requires only K solves, one per source \(\mathbf {b}_{i}\).

2.5 Trace computation: many output channels

In general, neither the matrix \(\mathbf {O}\) nor the matrix \(\mathbf {B}\) are low rank—for example, one may be interested in the total power radiated into a continuum of angles above a surface, or some other infinite set of possible far-field distributions, from sources distributed over a continuous spatial region. Fortunately, it turns out that there is another structure we can exploit: the Hermitian matrix \(\mathbf {H} = (\mathbf {A}^{-1} \mathbf {D})^{\dagger} \mathbf {O} (\mathbf {A}^{-1} \mathbf {D})\) from Eq. (7) is itself typically approximately low rank (“numerically low rank” Markovsky 2012) even if \(\mathbf {O}\) is not: the trace, which is equal to the sum of the eigenvalues of \(\mathbf {H}\) (Lax 2013), is dominated by a few of \(\mathbf {H}\)’s largest eigenvalues. In this section, we first explain why that is the case, and then show how it can be exploited to efficiently estimate the trace during optimization.

There are two reasons to expect approximate low-rank structure of \(\mathbf {H}\) (which we illustrate with numerical examples in Sect. 4). First, on physical grounds, emission enhancement arises due to resonances (via the Purcell effect) (Agio and Cano 2013), but in any finite volume there is some limit to the number of resonances that can interact strongly with emitters in a given bandwidth, related to an average density of states (Yu et al. 2010). The traditional definition of resonant modes corresponds to poles of \(\mathbf {A}^{-1}\) at complex resonant frequencies, which are (linear or nonlinear) eigenvalues \(\omega\) satisfying \(\det A(\omega )=0\) (Nussenzveig 1972); analogously, Eq. (7) decomposes the total power into a sum of eigenvalues corresponding to “resonant current” sources which diagonalize \(\mathbf {H}\) at a given frequency. More explicitly, if \(\mathbf {A}^{-1}\mathbf {D}\) can be accurately approximated by the action of K resonances of \(\mathbf {A}\) (a quasinormal mode expansion Lalanne et al. 2018; Ge et al. 2014), so that \(\mathbf {A}^{-1}\) can be replaced by a rank-K matrix, it follows that \(\mathbf {H}\) is also approximately rank \(\le K\) (since it is a product of rank-deficient matrices Lax 2013). Moreover, geometric optimization to maximize the emitted power modifies the structure to further enhance one or more resonances (Liang and Johnson 2013), and we observe that this sometimes increases the concentration of the trace into a few eigenvalues of \(\mathbf {H}\); that is, optimized structures tend to be even lower rank. Second, in a more general mathematical sense, the matrix \(\mathbf {H}\) is built from off-diagonal blocks of the Green’s function matrix \(\mathbf {A}^{-1}\), connecting sources (at the the support of \(\mathbf {D}\)) to emitted power at some other location (the support of \(\mathbf {O}\), e.g., where the Poynting flux is computed), and off-diagonal blocks of Green’s functions are known to be approximately low-rank (Hackbusch 2015). This is closely related to fast methods for integral equations, such as the fast-multipole method and others (Gibson 2021); essentially, far fields mostly depend on low-order spatial moments of the near fields/currents.

If \({\text {tr}}\mathbf {H}\) is dominated by \(K \ll N\) the largest eigenvalues of the \(N \times N\) matrix \(\mathbf {H}\), then one merely needs a numerical algorithm to compute the K extremal (largest magnitude) eigenvalues using only a sequence matrix–vector products \(\mathbf {H}\mathbf {v}\) (corresponding to individual scattering problems). Fortunately, there are many such algorithms, especially for Hermitian \(\mathbf {H}\) (Lanczos 1950; Knyazev 2001), and one can simply increase K until the trace converges to any desired tolerance. We argue here that methods based on Rayleigh-quotient maximization are particularly attractive for inverse design because they can be combined with geometric/topology optimization. The key fact is that one can express the sum of the largest K eigenvalues as the maximum of a block Rayleigh quotient (Li 2015; Johnson and Joannopoulos 2001; Knyazev 2001; Kokiopoulou et al. 2011), and for positive semidefinite \(\mathbf {H}\) (\(=\) positive semidefinite \(\mathbf {O}\)) this sum is a lower bound on the trace (Kokiopoulou et al. 2011):

$${\text {tr}}\mathbf {H}\ge \max _{\mathbf {V}\in {\mathbb {C}}^{N\times K}}{\text {tr}}\left[ (\mathbf {A}^{-1}\mathbf {D}\mathbf {V})^{\dagger} \mathbf {O}(\mathbf {A}^{-1}\mathbf {D}\mathbf {V})(\mathbf {V}^{\dagger} \mathbf {V})^{-1}\right] ,$$
(14)

where \(\mathbf {V}\) represents any K-dimensional subspace basis, so that one is maximizing the trace over all possible subspaces. This \(\ge\) becomes equality for \(N=K\), but in many problems (below), we find that \(K < 10\) suffices for \(< 1\%\) error in the trace (and, as expected from the arguments above, we find in Sect. 4.1 that the required K increases with the diameter of the emission region).

Computationally, one can maximize the right-hand side of Eq. (14) by some form of gradient ascent (Li 2015; Knyazev 2001), each step of which only requires the evaluation of \(\mathbf {A}^{-1}\mathbf {D}\mathbf {V}\) for a \(N \times K\) matrix \(\mathbf {V}\). That is to say, one only needs K Maxwell solves at each step (instead of N for the full matrix \(\mathbf {H}\)), which vastly reduces the computational cost.

Moreover, this Rayleigh-quotient maximization formula is especially attractive in the context of inverse design, because it can be combined with the geometric optimization itself. That is, instead of “nesting” the trace computation inside a larger geometric optimization procedure, we can simply add \(\mathbf {V}\) to the geometry degrees of freedom and optimize over both \(\mathbf {V}\) and the geometry simultaneously. The full inverse-design problem with incoherent emission can now be bounded by a single optimization problem:

$$\langle P \rangle _{\text{optimum}}\ge \max _{{\text{geometry}},\mathbf {V}\in {\mathbb {C}}^{N\times K}}{\text {tr}}\left[ (\mathbf {A}^{-1}\mathbf {D}\mathbf {V})^{\dagger} \mathbf {O}(\mathbf {A}^{-1}\mathbf {D}\mathbf {V})(\mathbf {V}^{\dagger} \mathbf {V})^{-1}\right] ,$$
(15)

where the geometric parameters (e.g., material densities Jensen and Sigmund 2011 or level sets van Dijk et al. 2013) only affect \(\mathbf {A}\) and (perhaps) \(\mathbf {D}\), and may be subject to some geometric and/or material constraints. The gradient of the right-hand side with respect to the geometry can be computed efficiently with adjoint methods (Molesky et al. 2018; Tortorelli and Michaleris 1994), whereas the gradient with respect to \(\mathbf {V}\) has a simple analytical formula (Johnson and Joannopoulos 2001) (“Appendix 3”), so a variety of gradient-based optimization algorithms (Chong and Zak 2001) can be applied to simultaneously evolve both \(\mathbf {V}\) and the geometry. Furthermore, the Rayleigh quotient has the nice property that, since we are maximizing a lower bound on the full trace, the actual performance \(\langle P \rangle\) is guaranteed to be at least as good as the estimated performance at every optimization step.

3 Topology-optimization formulation

In this section, we briefly review the density-based TopOpt formulation (Jensen and Sigmund 2011) that we employ for our example applications in Sect. 4. The key idea of TopOpt is that an “artificial density” field, \(\rho (\mathbf {x}) \in [0,1]\), is defined on a spatial “design” domain. This field is then filtered (to impose a non-strict minimum length scale) and thresholded (to mostly “binarize” the geometry, resulting in a physically admissible geometry). The resulting smoothed and thresholded field is then used to control the spatial material distribution, constituting the structure under design. The design field, \(\rho\), is discretized into a finite number of design degrees of freedom, which constitutes the design variables in the inverse-design problem to be solved, e.g., Eq. (15), using a finite-element method (FEM) on a triangular mesh (Badia and Verdugo 2020; Jin 2014), and the geometry is optimized using a well-known gradient-based algorithm that scales to high-dimensional problems with thousands or millions of degrees of freedom (Svanberg 2002).

Given a density \(\rho (\mathbf {x}) \in [0,1]\), one should first regularize the optimization problem by setting a non-strict minimum length scale \(r_{\text {f}}\), as otherwise one may obtain arbitrarily fine features as the spatial resolution is increased. This is achieved by convolving \(\rho\) with a low-pass filter to obtain a smoothed density \(\tilde{\rho }\) (Jensen and Sigmund 2011). There are many possible filtering algorithms, but in an FEM setting (with complicated nonuniform meshes), it is convenient to perform the smoothing by solving a simple “damped diffusion” PDE, also called a Helmholtz filter (Lazarov and Sigmund 2011):

$$\begin{aligned} -r_{\text {f}}^{2}\nabla ^{2}\tilde{\rho }+\tilde{\rho }&=\rho , \\ \left. \frac{\partial \tilde{\rho }}{\partial \mathbf {n}} \right| _{\partial \varOmega _{\text {D}}}&=0 , \end{aligned}$$
(16)

where \(r_{\text {f}}\) is the length scale design parameter and \(\mathbf {n}\) is the normal vector at the boundary \(\partial \varOmega _{\text {D}}\) of the design domain \(\varOmega _{\text {D}}\). This damped-diffusion filter essentially makes \(\tilde{\rho }\) a weighted average of \(\rho\) over a radius of roughly \(r_{\text {f}}\) (Lazarov and Sigmund 2011). (In addition to this filtering, it is possible to impose additional fabrication/length scale constraints, for example to comply with semiconductor-foundry design rules Hammond et al. 2021.)

Next, one employs a smooth threshold projection on the intermediate variable \(\tilde{\rho }\) to obtain a “binarized” density parameter \(\tilde{\tilde{\rho }}\) that tends towards values of 0 or 1 almost everywhere (Wang et al. 2010):

$$\tilde{\tilde{\rho }} = \frac{\tanh (\beta \eta )+\tanh \left( \beta (\tilde{\rho }-\eta )\right) }{\tanh (\beta \eta )+\tanh \left( \beta (1-\eta )\right) },$$
(17)

where \(\beta\) is a steepness parameter and \(\eta = 0.5\) is the threshold. During optimization, one begins with a small value of \(\beta\) (allowing smoothly varying structures) and then gradually increases \(\beta\) to progressively binarize the structure (Christiansen and Sigmund 2021); here, we used \(\beta =5,10,20,40,80\), similar to previous authors (Christiansen et al. 2020).

Finally, one obtains a material, described by an electric relative permittivity (dielectric constant) \(\varepsilon (\mathbf {r})\) in Eq. (1), given by

$$\varepsilon (\mathbf {r}) = \left[ \varepsilon _{1} +(\varepsilon _{2}-\varepsilon _{1})\tilde{\tilde{\rho }}(\mathbf {r})\right] \left( 1+\frac{{\text{i}}}{2Q}\right) ,$$
(18)

where \(\varepsilon _{1}\) is the background material (usually air, \(\varepsilon _{1}=1\)) and \(\varepsilon _{2}\) is the design material (we use dielectric of \(\varepsilon _{2}=12\) throughout this work).

Equation (18) includes an optional “artificial loss” term \(\sim 1/ Q\), which effectively smooths out resonances to have quality factors \(\le Q\) (fractional bandwidth \(\ge 1/ Q\)) (Liang and Johnson 2013). Such an artificial loss is useful in single-\(\omega\) emission optimization in order to set a minimum bandwidth of enhanced emission, rather than obtaining diverging enhancement over an arbitrarily narrow bandwidth as is possible with lossless dielectric materials (Liang and Johnson 2013). Also, optimizing low-Q resonances often leads to better-behaved optimization problems (less “stiff” problems with faster convergence), so during optimization we start with a low \(Q = 5\) and geometrically increase it (to \(Q = 1000\)) as the optimization progresses (Liang and Johnson 2013).

The details of the FEM discretization are described in “Appendix 3,” but it is essentially a standard triangular mesh with first-order Lagrange elements (Jin 2014) and perfectly matched layers (PMLs) for absorbing boundaries (Oskooi and Johnson 2011). We discretized \(\rho\) and \(\{{\tilde{\rho }},\tilde{{\tilde{\rho }}}\}\) with piecewise-constant (0th-order) and first-order elements, respectively. During optimization, one must ultimately compute the sensitivity of the objective function (the trace from Sect. 2) with respect to the degrees of freedom \(\rho\)—for each step outlined above (smoothing, threshold, PDE solve, etcetera) we formulate a vector–Jacobian product following the adjoint method for sensitivity analysis (Molesky et al. 2018; Tortorelli and Michaleris 1994) with some help from automation (Revels et al. 2016), and then these are automatically composed (“backpropagated”) by an automatic-differentiation (AD) system (Innes 2018). In this way, the gradient with respect to all of the degrees of freedom (\(\rho\) at every mesh element) can be computed with about the same cost as that of evaluating the objective function once (Molesky et al. 2018).

4 Numerical examples

In this section, we present three example problems in 2D illustrating how our trace-optimization procedure works in practice for typical problems involving ensembles of spatially incoherent emitters. We start in Sect. 4.1 with a general case where we are maximizing the total emitted power from many emitters distributed throughout a “fluorescent” dielectric material. Next, in Sect. 4.2, we study the enhanced emission from a corrugated surface, analogous to a light-emitting diode (Erchak et al. 2001), showing how the trace formulation can be applied to a periodic structure with aperiodic emitters. Both of these examples are based on the general algorithm from Sect. 2.5, which can handle emission into a continuum of possible angles. Finally, in Sect. 4.3, we apply the more specialized algorithm from Sect. 2.3 to optimizing emission from a fluorescent material into a single-mode waveguide. Since Maxwell’s equations are scale invariant (Joannopoulos et al. 2008), the same optimal designs will be obtained for any wavelength \(\lambda\) if the geometry (thickness and period) is scaled with \(\lambda\) (for the same dielectric constants).

4.1 Fluorescent particle

In this example, illustrated in Fig. 1a, we optimize the shape/topology of a 2D fluorescent dielectric (\(\varepsilon = 12\)) particle constrained to have a given area lying within a circular design domain of radius r, maximizing the total power P radiated outwards in any direction at a wavelength \(\lambda\). The emitters are distributed uniformly within the dielectric material. Further computational details can be found in “Appendix C.1.”

Fig. 1
figure 1

a A 2D fluorescent particle (of dielectric \(\varepsilon =12\)) with a circular design domain of radius r. The emitters are distributed uniformly within the dielectric material. The total power P radiated outwards in any direction (integral of Poynting flux over \(\varGamma _{\text {out}}\)) at a wavelength \(\lambda\) is optimized. b Typical local optima found for design radius \(r=0.5\lambda\) with filling ratio \(R_{\text {f}}=0.5\) and bandwidth quality factor \(Q=1000\). The numbers above denote the optimized emitting (average) power in arbitrary unit. c Emitted power of a disk as a function of the disk radius r for different bandwidth quality factors Q. d The number of eigenvalues that contribute 99% of the trace as a function of the disk radius r for different bandwidth quality factors Q

Because this is a non-convex optimization problem, topology optimization can converge to different local optima from different initial geometries (Molesky et al. 2018). Figure 1b shows multiple local-optima geometries for a design radius \(r=0.5\lambda\) with filling ratio \(R_{\text {f}}=0.5\) and bandwidth quality factor \(Q=1000\) (artificial loss, from Sect. 3), obtained from different initial geometries (disks of different radii and/or \(\varepsilon\)). The numbers above the geometries denote the corresponding emitted (average) power P in arbitrary units. In this particular case, after examining a large number of local optima (not shown), we found that the best local optimum is simply a circular disk with a particular radius. The existence of many local optima with performance varying by factors of 2–5 is not unusual in wave problems (Yao et al. 2020; Diaz and Sigmund 2010; Bermel et al. 2010), and while various heuristic strategies have been proposed to avoid poor local minima (Mutapcic et al. 2009; Aage and Egede Johansen 2017; Bermel et al. 2010; Schneider et al. 2019) beyond simply probing multiple random starting points, the only way to obtain rigorous guarantees is to derive theoretical upper bounds (Miller et al. 2016; Yao et al. 2020) as discussed further in Sect. 5 (purely numerical global search can generally provide practical guarantees only for very low-dimensional Maxwell optimization Azunre et al. 2019).

Whether the best optimum is a disk changes with the design-domain radius and appears to depend on whether there is a nearby radius with a high-Q resonance at the design \(\lambda\). (In fact, for this particular case, the locally optimal disk has an area slightly less than our upper bound, meaning that the area constraint is not active. In consequence, this particular disk remains a local optimum even if the design domain is enlarged, and apparently remains a global optimum until the design domain is sufficiently enlarged to admit a stronger resonance. Although the area constraint is not active at this particular local optimum, it is active at intermediate points during the optimization process, and there are many other local optima that would also be found if the area constraint were not present. Physically that emitted power can increased simply by adding more fluorescent material; correspondingly, without an area constraint we often find a local optimum in which the design region is almost entirely filled with dielectric.) In Fig. 1c, we show how the average power radiated by a circular disk varies with radius \(r/\lambda\) and clearly exhibits a series of sharp peaks correspond to radii which support high-Q resonances at \(\lambda\): the familiar whispering-gallery resonant modes (Yang et al. 2015).

The key assumption of our algorithm in Sect. 2.5 was that only a small number of eigenvalues would contribute to the trace, and this assumption clearly holds here. In Fig. 1d, we plot the number of eigenvalues that contribute 99% of the trace as a function of the disk radius. We can see that only a small number of eigenvalues is required to obtain a good estimate of the trace; we find similar results for other shapes. Naively, one might expect that the number of contributing eigenvalues would scale with the area (or volume in 3d), corresponding to the number of resonances per unit bandwidth from the density of states (DOS) (Yu et al. 2010). However, we find that the scaling is nearly linear with the disk radius; the reason the simple DOS argument fails is that it does not take into account the variable loss (radiation) rates of the modes, which causes most of the resonances to contribute weakly even if the real part of their frequency is close to the emission frequency. In fact, we have found similar linear scaling of the number of contributing eigenvalues for many other shapes, including other locally optimized shapes, and it appears to be an interesting open theoretical question to prove (or disprove) asymptotic linear scaling.

4.2 Periodic emitting surface

In this example, we enhance the emission from a thin “emitting layer” by optimizing a periodically patterned surface situated on top of the layer—this is inspired by a light-emitting diode (LED) with a patterned surface above an active emitting layer, where it is well known that a periodic pattern can enhance emission via guided-mode resonances (Erchak et al. 2001; Noda and Fujita 2009). As illustrated in Fig. 2a, the design domain consists of dielectric material (\(\varepsilon = 12\)) in air with a period L and thickness \(H_{\text {d}}=0.5\lambda\), the spontaneous-emission current sources are uniformly distributed on an horizontal line (“active layer”) inside a lower-index substrate (\(\varepsilon =2.25\)) a distance \(H_{\text {s}}=0.1\lambda\) below the design domain. The objective, here, is the total power emitted upwards, integrated over all angles (i.e., the total Poynting flux) using the methods of Sect. 2.5. (Emission purely into the normal direction could be optimized much more efficiently using the methods of Sect. 2.3.) Further computational details can be found in “Appendix C.2”.

Fig. 2
figure 2

a Unit cell of a 2D periodic emitting surface with period L. The design domain consists of dielectric material (\(\varepsilon = 12\)) in air with thickness \(H_{\text {d}}=0.5\lambda\), the spontaneous-emission current sources are uniformly distributed on an horizontal line (purple line) inside a lower-index substrate (\(\varepsilon =2.25\)) a distance \(H_{\text {s}}=0.1\lambda\) below the design domain. The objective is the total power emitted upwards, integrated over \(\varGamma _{\text {out}}.\) b Optimized geometry with period \(L=0.6\lambda\). c The eigenvalue distribution of the average power for the optimized geometry

Even though the dielectric structure is periodic (the design domain is a single unit cell of \(\varepsilon\)), the emitters are not periodic—they are independent random currents at every point in the active layer. Computationally, however, we can still reduce the simulation of non-periodic sources in a periodic medium to a set of small unit-cell simulations, using the “array-scanning method” (Capolino et al. 2007). An arbitrary aperiodic source current can be Fourier decomposed into a superposition of Bloch-periodic sources (\(\mathbf {J}_{k}(x+L)=e^{{\text {i}}kL}\mathbf {J}_{k}(x)\)), each of which can be simulated with a single unit cell and Bloch-periodic boundary conditions in x. The total power is then simply obtained from an integral (\(\int _{-\pi /L}^{\pi /L}{\text {d}}k)\)) over the Bloch wavevector k in the Brillouin zone. For incoherent aperiodic random sources, each of these Bloch-periodic unit-cell calculations is an operator trace (over random currents in the unit cell only) computed by the methods of Sect. 2. (Unit-cell calculations for different k values are completely independent and can be performed in parallel.) Further details of this formulation are described in “Appendix C.2.” (Moreover, the array-scanning method can be viewed as a special case of a reduction using symmetry: for any symmetry group, sources can be decomposed into a superposition of “partner functions” of the irreducible representations of the symmetry group Inui et al. 2012, thus, reducing the simulation domain even for asymmetrical random sources.)

The optimized structures for the design parameter \(H_{\text {d}}=0.5\lambda\), \(H_{\text {s}}=0.1\lambda\) are shown in Fig. 2b. Note that we have also optimized over the period L (here, simply by repeating the optimization for different values of L) to find an optimized period \(L=0.6\lambda\). The eigenvalue distribution of the average power is given in Fig. 2c: again, we observe that only the first few eigenvalues contribute significantly to the trace, as conjectured in Sect. 2.5.

4.3 Emission into a waveguide

This example considers a fluorescent dielectric (\(\varepsilon =12\)) medium in air, similar to Sect. 4.1, but in this case, we are maximizing the power coupled into a single-mode dielectric waveguide (\(\varepsilon =12\), width \(\lambda /2\sqrt{12}\)) rather than into radiation (Fig. 3a). Since the output is a single channel (O is rank 1), this allows us to apply the method of Sect. 2.3 to perform only a single “reciprocal” Maxwell solve per optimization step. Since the waveguide breaks the rotational symmetry of the problem, the optimum structure is now very different from a circular disk, and must somehow redirect light emitted anywhere in the fluorescent material into the waveguide. This task is made more difficult by the fact that we employ a design domain of which size is only \(1.5\lambda \times 0.5\lambda\), so the optimization cannot simply surround the emitters with a multi-layer Bragg mirror to confine the radiation (as occurs when optimizing LDOS in a large design domain Liang and Johnson 2013; Wang et al. 2018). Further computational details can be found in “Appendix C.3.”

Fig. 3
figure 3

a A 2D fluorescent dielectric (\(\varepsilon =12\)) medium coupling to a waveguide (\(\varepsilon =12\), width \(\lambda /2\sqrt{12}\)). The design domain is of height \(H_{\text {d}}=1.5\lambda\) and width \(L_{\text {d}}=0.5\lambda\), the power coupled into a single-mode dielectric waveguide (mode overlap integral at \(\varGamma _{\text {out}}\)) is optimized. b Optimized shape with a filling ratio \(R_{\text {f}}=0.5\). c Averaged field intensity \(\langle \vert H_{z}\vert ^{2}\rangle\) distribution. About 64% of the power is coupled into the waveguide mode

Figure 3b shows the optimized geometry with a design domain of height \(H_{\text {d}}=1.5\lambda\) and width \(L_{\text {d}}=0.5\lambda\). The material is constrained to fill at most half of the design domain (to illustrate that we can independently constrain the design region and the design volume); unlike for the disk optimum in Sect. 4.1, this area constraint was active at the optimum shown here. The corresponding averaged field intensity \(\langle \vert H_{z}\vert ^{2} \rangle\) is displayed in Fig. 3c. We found that 64% of the power is coupled into the desired waveguide mode. In comparison, only 4% of the power is coupled to the waveguide mode for a trivial rectangular design where the the whole design domain is filled with \(\varepsilon =12\) fluorescent material.

5 Conclusion

We presented a trace formulation and accompanying algorithms for topology optimization of incoherent emitters, which unify and generalize earlier work, and in particular provide the first tractable optimization algorithms for the challenging case of many random emitters and many output channels. Looking forward, we believe that there are many potential applications of these ideas, as well as further algorithmic improvements and generalizations.

We are already preparing to use these techniques to optimize Raman sensing in fluid suspensions of many Raman molecules, in contrast to previous work that only considered a single-molecule location (Christiansen et al. 2020; Pan et al. 2021)—it will help us to answer the interesting open question of the optimal spatial density of “hot spots” where light is concentrated to enhance Raman emission. Another application is enhancing cathodoluminescence or other forms of scintillation detectors, which were previously optimized only for normal emission (Roques-Carmes et al. 2021). In contrast to spontaneous emission, where the light is emitted by spatially uncorrelated point sources, one can instead consider incoherent beams of light consisting of uncorrelated random planewave amplitudes—this corresponds to spatially correlated random currents (Wolf 2007), and we are investigating the resulting trace formulation to design metalenses for incoherent focusing. Other applications include the study of radiation loss due to surface roughness, which can be modeled via random sources with a prescribed correlation function related to the manufacturing disorder and may naively require a large number of Maxwell solves (Johnson et al. 2005; Kita et al. 2018; Payne and Lacey 1994). Nor is our approach limited to Maxwell’s equations—it is applicable to any linear system where one wishes to optimize quadratic functions of random source terms.

Algorithmically, we are investigating ways to apply more sophisticated algorithms to the joint structure/trace-optimization problem equation (15). When solving the eigenproblem alone (maximizing over \(\mathbf {V}\) to obtain extremal eigenvalues), it is well known that one can greatly improve upon straightforward gradient ascent by Krylov algorithms such as Arnoldi (Trefethen and Bau 1997) or LOBPCG (Knyazev 2001), and we would like to incorporate Krylov acceleration into to joint problem as well. Recent techniques to accelerate frequency domain solves for multiple sparse inputs and outputs (Lin et al. 2022) may also be applicable to accelerate our trace optimization (since we have multiple sources in a sparse subset of the domain, and objective functions like the power only involve sparse outputs). Similar to the stochastic Lanczos algorithm (Ubaru et al. 2017), one could further exploit the fact that we are computing the trace of a function \(f(\mathbf {A})\) of the Maxwell operator \(\mathbf {A}\) in order to relate the trace more efficiently to Krylov subspaces of \(\mathbf {A}\). More generally, there are other applications where one is maximizing \({\text {tr}}f(\mathbf {A}(p),p)\) for some f and some parameters p, and it seems similarly beneficial to combine the trace estimation with the parameter optimization in such problems.

Theoretically, it is desirable to complement improved numerical optimizations with new rigorous upper bounds on incoherent emission. Significant progress has already been made on bounding thermal-emission processes (Miller et al. 2015; Molesky et al. 2020) as well as to absorption (Kuang and Miller 2020; Miller et al. 2016) (related to emission via reciprocity), and many of these techniques should be adaptable to other forms of random emission.