Stability and conditioning of immersed finite element methods: analysis and remedies

This review paper discusses the developments in immersed or unfitted finite element methods over the past decade. The main focus is the analysis and the treatment of the adverse effects of small cut elements. We distinguish between adverse effects regarding the stability and adverse effects regarding the conditioning of the system, and we present an overview of the developed remedies. In particular, we provide a detailed explanation of Schwarz preconditioning, element aggregation, and the ghost penalty formulation. Furthermore, we outline the methodologies developed for quadrature and weak enforcement of Dirichlet conditions, and we discuss open questions and future research directions.


Introduction
Over the past decades, the finite element method (FEM) has become an essential tool in scientific research and engineering. In its standard form, the finite element method requires the construction of a mesh that fits to the boundary of the considered geometry. For problems of practical interest, such a boundary-fitting mesh is constructed using mesh generators. Automatically generating boundary-fitting meshes can lack robustness, however, in the sense that manual intervention is required to, for example, repair distorted elements or non-matching surfaces. This is particularly the case when the geometry is very complex, such as for (possibly non-water-tight) CAD objects with many patches and trimming curves, geometries presented in the form of scan data, or settings in which frequent remeshing is required (for example in fluid-structure interactions). For such problems, immersed methods have been demonstrated to be capable of establishing a more efficient analysis pipeline, as illustrated by the examples in Figure 1.
The immersed analysis concept was originally proposed in the context of the finite difference method by Peskin [62] in 1972. This immersed boundary method (IBM) and its enhancements have been employed in a wide range of applications ever since (see, e.g., Ref. [63] for a contemporary review). The application of the immersed element concept in the finite element setting can be traced back to the work on the partition of unity method [64,65], generally referred to as the generalized or extended finite element method (GFEM [66,67] or XFEM [68,69]), where elements are cut in order to construct enrichment functions. The concept of cutting finite elements as an unfitted meshing technique was pioneered by Hansbo [70]. This work can be considered as the first instance of an immersed finite element method. The pace in the development and impact of immersed finite element methods increased significantly with the introduction of the finite cell method (FCM) [2,9,71,72], which combines the cut element concept with higher-order basis functions, and CutFEM [21,23,[73][74][75], which generally employs the ghost penalty to enhance numerical stability [73,76]. Besides these prominent immersed FEM techniques, other notable examples are the aggregated finite element method (AgFEM) [77,77,78], the Cartesian grid finite element method (cgFEM) [54,79], weighted extended B-splines (WEB-splines) [80,81] and immersed B-splines (isplines) [82]. Discontinuous Galerkin methods can readily be used on cut meshes after aggregation of cells, since they can be posed on polytopal meshes [83]. In recent years, the immersed analysis concept has been considered in conjunction with isogeometric analysis [84,85] (often referred to as iga-FCM [9] or immersogeometric analysis [41]). In this setting, immersed methods have been demonstrated to be capable of leveraging the advantageous properties of the spline basis functions used in isogeometric analysis, while enhancing the versatility of the simulation workflow for cases where boundary-fitting spline geometries are not readily available, e.g., in scan-based analyses.
Immersed finite element methods are typically confronted by three computational challenges in comparison to standard mesh-fitting finite elements, viz.: i) the numerical evaluation of integrals over cut elements; ii) the imposition of (essential) boundary conditions over the immersed or unfitted boundaries; and iii) the stability of the formulation in relation to small cut elements. A myriad of advanced techniques has been developed over the past decades to resolve these challenges, which has made immersed finite element techniques a competitive simulation strategy for a wide range of problems. This review focuses on the third challenge, i.e., the effects of small cut elements on the performance of immersed finite element methods. We restrict ourselves to a high-level consideration of the first two of these challenges, and we refer the reader to, e.g., Ref. [72] for a detailed review on these topics.
Small cut elements typically give rise to stability and conditioning problems. In this article we review three prominent methods to resolve these problems, which have been developed in recent years, viz.: (Schwarz) preconditioning, ghost-penalty stabilization (commonly used in CutFEM), and aggregation of cut elements (commonly used in AgFEM). In the discussion of these techniques, it is important to note that small cut elements do not only affect the conditioning of the problem, but also the accuracy of the solution. More specifically, direct application of Nitsche's method for the weak imposition of essential boundary conditions requires a Nitsche parameter that scales inversely proportional with the size (in particular the thickness) of the cut element. As observed in, e.g., [1], the unboundedness of this parameter can deteriorate the accuracy of the solution. When using preconditioning techniques to resolve the small-cut-element problem, it is important to realize that these techniques do not resolve this potential issue regarding the accuracy. In contrast to preconditioning techniques, ghost penalty stabilization and aggregation ensure well-posedness with a Nitsche parameter inversely proportional to the ambient-domain mesh size, in addition to controlling the condition number. This makes the immersed finite element approximation using these stabilization techniques robust with respect to the cut element configurations, and preserves the error estimates of boundary-fitted finite element methods. It should be noted that, besides ghost penalty stabilization and aggregation, several other techniques have been developed to resolve the stability problems on small cut elements. However, in general, these do not simultaneously resolve the conditioning problem. An overview of these techniques is also presented in this review.
This review article has three objectives, viz.: (i) to clarify that the small-cut-element problem is multi-faceted; (ii) to present the different techniques in an accessible theoretical framework, so that the implications of practical choices become apparent to a non-expert audience; and (iii) to provide a comparison of the techniques for conditioning and stabilization of immersed finite element methods. With the myriad of immersed finite element techniques available, naturally comes the luxury problem of choosing which technique is most suitable in a particular situation. The rigorous theoretical underpinning of the methods as reviewed in this article is instrumental to aiding in the selection of a particular method, as it provides a fundamental understanding of the relation between small cut elements, conditioning, and stability and accuracy.
It should be noted that immersed methods are not the only techniques to create a robust workflow to deal with complex or implicitly defined geometries. One alternative is the shifted boundary method [86][87][88][89]. The shifted boundary method aims to replace an immersed problem with a similar boundary-fitted problem on the interior element mesh, by projecting boundary conditions from the real (unfitted) boundary to the interior element boundaries. This concept was already introduced in [90], and is also applied in [59]. This method bypasses the aforementioned three computational challenges of immersed finite element methods, but instead introduces other challenges, such as a non-trivial treatment of boundary conditions (including the projection of the boundary data), and a non-obvious geometrical treatment. Another approach is to use hybridizable techniques on unfitted meshes [91,92]. Hybridizable methods can naturally be posed on polytopal meshes, giving additional geometrical flexibility compared to standard finite element methods. As a result, these methods can readily be used on the meshes obtained after the intersection of the boundary representation and background mesh and possibly after aggregation of elements. The impact of small cut elements and small cut faces (these schemes add unknowns on the mesh skeleton) on stability and condition numbers has only been studied very recently in the context of hybrid high-order methods, see [92]. A detailed discussion of these alternative techniques is beyond the scope of this work.
This article is structured as follows. In Section 2 we introduce the basic formulation based on a model problem and discuss the developments regarding the three computational challenges associated with immersed finite element methods. In Section 3 a compact analysis of the illconditioning problem is presented, and Schwarz preconditioning is discussed as a natural technique to resolve this problem. Section 4 then considers stabilization techniques, specifically the ghostpenalty method and aggregation technique, and presents a theoretical framework required to analyze the stability and conditioning properties of these techniques. A discussion on the current state of the field and concluding remarks are finally presented in Section 5. Note that the results presented in this manuscript are reproduced from previous publications by the authors, which are referenced in the text or in the captions.

Immersed finite element methods
In this section we introduce the immersed finite element framework. Section 2.1 presents the concept of immersed finite element methods and specifies the formulations based on a model problem. In Section 2.2 the most prominent challenges of immersed finite element methods, compared to standard boundary-fitted finite element methods, are discussed.

Formulations
As a model problem, we consider the Poisson equation on the domain Ω ⊂ R d , with d ∈ {2, 3} being the number of spatial dimensions ( Figure 2). The domain Ω has a boundary ∂Ω with outward-pointing unit normal vector n. The boundary consists of complementary parts ∂Ω D and ∂Ω N on which Dirichlet (or essential) and Neumann (or natural) conditions are prescribed with boundary data g D and g N , respectively. The field variable u : Ω → R is subject to the strong formulation (Strong) where ∂ n = n · ∇ denotes the normal gradient operator.
In boundary-fitted finite elements, the domain Ω is subdivided into elements T (generally at the expense of a geometrical error), which together comprise the mesh T h (Figure 2b). The size of element T is denoted by h T and a global mesh size parameter is defined as h = max T ∈T h h T . This manuscript only considers quasi-uniform discretizations, that is h T ≈ h ∀T ∈ T h . On the mesh, shape function are introduced, which form the basis of the approximate solution u h . Integration is performed via standard quadrature rules for polynomials on simplices or hypercubes. Homogeneous Dirichlet conditions are usually imposed strongly by removing the basis functions with support on ∂Ω D . The span of the remaining basis functions is referred to as V h,0 ⊂ H 1 0 (Ω). The Bubnov-Galerkin finite element method imposes inhomogeneous boundary conditions through Figure 2: Schematic representation the domain Ω, a boundary-fitted discretization, the embedding in A, and an unfitted discretization. a so-called lifting function, (g D ) ∈ H 1 (Ω), which is generally constructed using the removed basis functions. The Bubnov-Galerkin finite element formulation corresponding to the strong form (2.1) can then be condensed into where the symmetric bilinear form a(w h , v h ) = Ω ∇w h · ∇v h dV is continuous and coercive on H 1 0 (Ω), and the linear form ∇v h dV is also continuous on H 1 0 (Ω). These conditions are sufficient to guarantee well-posedness of the problem (2.2) to find w h ∈ H 1 0 (Ω) and u h ∈ H 1 (Ω) for sufficiently smooth boundary data [93,94]. Because of the coercivity of the bilinear form, it induces the operator norm v h 2 a = a(v h , v h ), which is identical to the H 1 0 (Ω)-seminorm for the boundary-fitted finite element formulation of the Poisson problem. In immersed finite element methods, the domain Ω is embedded into an ambient domain A (Figure 2c). Instead of generating a boundary-fitted partitioning of the domain Ω, the ambient domain is partitioned by the background mesh T A h (Figure 2d). Since the ambient domain is geometrically simple, mesh generation is trivial. But, as the element boundaries do not coincide with the boundaries of the domain Ω, evaluating integrals requires dedicated procedures (see Section 2.2.1). Hence, the complexity of the geometry is essentially captured by the integration procedure, instead of by the mesh generation. This geometrical simplification also has implications on the Galerkin formulation.
To specify the immersed finite element formulation, we define the active mesh T h as the set of elements T which intersect the domain Ω, i.e., where T Ω = T ∩ Ω. Furthermore, the active domain is defined as the union of all the active elements Ω h = ∪ T ∈T h T ⊇ Ω. The set of active elements that are cut by (and therefore intersect) the boundary is defined as and the set of internal elements that are fully supported on the domain Ω is defined as Analogous to the boundary-fitted case, shape functions are introduced on the active mesh T h and an approximate solution u h is formed as a linear combination of the restriction of the shape functions to Ω. The space spanned by these functions is referred to as V h . In contrast to the boundary-fitted finite element formulation, in the unfitted setting it is generally not feasible to create a finite dimensional subspace of V h,0 ⊂ H 1 0 (Ω), which is needed to strongly impose Dirichlet conditions. Instead, Dirichlet conditions are generally imposed weakly. The most common approach for weakly enforcing Dirichlet boundary conditions is via Nitsche's method [95], which gives rise to the immersed finite element formulation where the bilinear and linear form are defined as The parameter β, commonly referred to as the Nitsche or penalty parameter, must be chosen large enough to ensure coercivity of the bilinear form a h (·, ·) in the discrete space Ω , which can be computed by solving a generalized eigenvalue problem [96]. This can result in arbitrarily high values of β over the entire boundary ∂Ω D . It is therefore common to select a local, element-wise, Nitsche parameter satisfying TΩ by solving a local generalized eigenvalue problem for each element. Based on dimensional considerations, one can infer that β| T should be inversely proportional to a generalized thickness, h TΩ , of T Ω normal to T ∩ ∂Ω D [97,98]. We refer to h TΩ as a generalized thickness, because T Ω can be irregularly shaped and, consequently, a well-defined length scale cannot generally be provided. While, with an element-wise parameter, a single small cut element does not result in a high value of β over the entire boundary, an element-wise parameter is still locally unbounded, which can lead to numerical issues [1]. With appropriate stabilization (see Section 4), the properties of the immersed formulation revert to those of the boundary-fitted setting considered in the original paper by Nitsche [95], and a global parameter inversely proportional to the mesh width of the background mesh suffices. In the remainder of this manuscript, both element-wise and global Nitsche parameters will be indicated by β and the notation a h (·, ·) and l h (·) will be used in both stabilized and unstabilized formulations, with the choice of the Nitsche parameter following from the context. Similar to the boundary-fitted setting, the coercive and bounded bilinear form a h induces an equivalent operator norm This operator norm coincides with the matrix energy norm of a corresponding coefficient vector, which renders it useful in the analysis of condition numbers of linear-algebraic systems emerging from immersed formulations.
A myriad of immersed finite element variants to the problem (2.6)-(2.7) exist, the most prominent of which will be discussed in Section 2.2.2. A particularly noteworthy variation to (2.7) is the penalty method, which corresponds to the case where the terms that involve normal gradients in the above-mentioned operators are left out. This makes the method formally inconsistent with the strong form (2.1), but simplifies the implementation and ensures coercivity independent of the parameter β. In this review, we do not discuss the penalty method in detail and focus on the application of Nitsche's method. In general, the use of the penalty method instead of Nitsche's method has a negligible impact on the conditioning, but does affect the accuracy of the solution.
To solve the immersed finite element formulation (2.6) numerically, it is recast as a linear algebra problem where the components of the matrix A ∈ R N ×N and right hand side vector b ∈ R N correspond to with φ i the i-th basis function and N the number of dimensions of the finite dimensional function space V h . The approximate solution u h is given by with x i the components of the coefficient vector x ∈ 2 (N ) in equation (2.8). In the remainder we employ two norms for this coefficient vector, viz. the 2 vector norm x 2 2 = x T x and, as A is Symmetric Positive Definite (SPD), the matrix energy norm Note that, on account of (2.9), this matrix energy norm is equal to the operator norm, i.e., In the remainder of this manuscript we focus our presentation on the single field Poisson problem (2.1), discretized by quasi-uniform meshes with C 0 -continuous piecewise polynomial basis functions of order p. The presented analyses and methods naturally extend to vector-valued problems that can be expressed as a Cartesian product of scalar fields (i.e., one field per space dimension). Mixed formulations and discretizations with local refinements are not discussed in detail, but, unless otherwise specified, the provided insights also carry over mutatis mutandis to these cases. Curl-conforming and div-conforming problems (i.e., posed in H(curl; Ω) or H(div; Ω), respectively), suffer from similar problems with stability and conditioning on small cut elements as the grad-conforming problems (posed in H 1 (Ω)) considered here. With respect to conditioning, maximum continuity splines as commonly used in isogeometric analysis [84,99] can behave differently from C 0 -continuous bases. Therefore, Section 3 also considers B-spline bases as a special case.

Challenges in immersed finite elements
Although the immersed finite element formulation introduced above fits within the general framework of the conventional finite element method, the application of immersed finite elements involves the consideration of various specific challenges. In this section we discuss the most prominent of these, viz.: (i) numerical evaluation of integrals over cut elements, (ii) imposition of Dirichlet boundary conditions, and (iii) stability and conditioning of the formulation.

Cut-element integration
In boundary-fitted FEM, integrals over the domain Ω are split into element-wise integrals. The elements generally correspond to polygons such as simplices or hypercubes, and the integrands are usually element-wise polynomials. For this reason, standard quadrature rules can readily be used. Integration in immersed methods is more involved, as the integration procedure should adequately approximate integrals on, in principle, arbitrarily shaped cut elements. This problem closely relates to the special treatment of discontinuous integrands in enriched finite element methods such as XFEM and GFEM.
In immersed finite element methods the geometry representation is independent of the mesh. In general, there are two ways to represent the geometry, viz. implicit representations (e.g., voxel data [49,100] or a level set function [47]) and explicit boundary representations (e.g., spline surfaces [3], B-rep objects [9], or isogeometric analysis on trimmed CAD objects [8,12]). For all geometry representations, dedicated techniques are required to evaluate volume integrals over the intersection of active elements with the domain. A myriad of such integration procedures has been developed over the years in the context of immersed FEM (see [72,101] for reviews/comparisons) and enriched FEM (see [102]), an overview of which is presented below. Techniques to integrate over trimmed boundaries are strongly dependent on the geometry description. In explicit representations, the geometry description itself can sometimes be leveraged, whereas generally a boundary-reconstruction procedure is required in the case of implicit boundary representations. The reader is referred to, e.g., Refs. [72] for a discussion regarding the various techniques to handle different geometry representations, and the related problem of integrating over unfitted boundaries.
The cut element volume integration techniques can be categorized as: • Octree subdivision: The general idea of octree (or quadtree in two dimensions) integration is to capture the geometry of a cut element by recursively bisecting sub-cells that intersect with the boundary of the domain, as illustrated in Figure 3. At every level of this recursion, sub-cells that are completely inside the domain are preserved, while sub-cells that are completely outside of the domain are discarded. This cut element subdivision strategy was initially proposed in the context of the finite cell method (FCM) in [2] and is generally appraised for its simplicity and robustness with respect to cut element configurations. Octree integration has been widely adopted in immersed FEM, see, e.g., [41,47,72,103]. Various generalizations and improvements to the original octree procedure have been proposed, of which the consideration of tetrahedral cells [48,104], the reconstruction of the unfitted boundary by tessellation at the lowest level of bisectioning [47], and the consideration of variable integration schemes for the sub-cells [101], Figure 3: Illustration of the octree procedure to integrate cut elements, in which integration sub-cells that intersect with the immersed boundary are recursively bisected.
are particularly noteworthy. Despite the various improvements to the original octree strategy, a downside of the technique remains the number of integration sub-cells (and consequently the number of integration points) that result from the procedure, especially in three dimensions and with high-order bases, where the refinement depth needs to be increased under mesh refinement to reduce the integration error with the same rate as the approximation error [105].
• Cut element reparameterization: Accurate cut element integration schemes can be obtained by modifying the geometry parameterization of cut elements in such a way that the immersed boundary is fitted. This strategy was originally developed in the context of XFEM by decomposing cut elements into various sub-cells with only one curved side and then to alter the geometry mapping related to the curved sub-cell to obtain a higher-order accurate integration scheme [102]. This concept has been considered in the context of implicitly defined geometries (level sets) [106][107][108][109], the NURBS-enhanced finite element method (NEFEM) [110,111], the Cartesian grid finite element method (cgFEM) [54,79], and the mesh-transformation methodology presented in [112]. In the context of the finite cell method, the idea of cut element reparameterization has been adopted as part of the smart octree integration strategy [113][114][115], where a boundary fitting procedure is employed at the lowest level of octree bisectioning in order to obtain higher-order integration schemes for cut elements with curved boundaries. Reparameterization procedures have the potential to yield accurate integration schemes at a significantly lower computational cost than octree procedures, but generally compromise in terms of robustness with respect to cut element configurations.
• Polynomial integration: Provided that one can accurately evaluate integrals over cut elements (for example using octree integration), it is possible to construct computationally efficient integration rules for specific classes of integrands. In the context of immersed finite element methods, it is of particular interest to derive efficient cut-element integration rules for polynomial functions. The two most prominent methods to integrate polynomial functions over cut elements are moment fitting techniques [115][116][117][118][119], in which integration point weights and (possibly) positions are determined in order to yield exact quadrature rules, and equivalent polynomial methods [120,121], in which a non-polynomial (e.g., discontinuous) integrand is represented by an equivalent polynomial which can then be treated using standard integration procedures. Such methods have been demonstrated to yield efficient quadrature rules for a range of scenarios. A downside of such techniques is the need for the evaluation of the exact integrals (using an adequate cut-element integration procedure) in order to determine the optimized integration rules. This can make the construction of such quadrature rules computationally expensive, which makes them more suitable in the context of time-dependent and non-linear problems (with fixed boundaries and interfaces), for which the construction of the integration rule is only considered as a pre-processing operation (for each cut element) and the optimized integration rule can then be used throughout the simulation. In the group of Dominik Schillinger at TU Darmstadt, work is currently being done to use neural networks for the computation of integration point weights to accelerate this process.
• Dimension-reduction of integrals: Depending on the problem under consideration, it can be possible to reformulate volumetric integrals over cut elements by equivalent lower-dimensional integrals. This approach is advantageous from a computational effort point of view, as the equivalent integrals are generally less costly to evaluate. A reformulation of volume integrals in terms of boundary integrals has been proposed in the context of XFEM in [122] and in the immersed FEM setting in [123]. Another dimension-reduction approach pertains to high-order quadrature for embedded methods with hexahedral background meshes and implicitly defined convex surfaces [124]. This technique relies on a reduction of integrals to one-dimensional integrations and provides strictly positive weights. The methodology proposed in [125] (and references therein) provides closed-form formulas for the integration of monomials on convex and nonconvex polyhedra and the extension to curved domains, relying on a reduction of integrals up to vertex evaluations. A downside of dimension-reduction techniques is that they are less general than standard quadrature rules. For example, some techniques can lead to negative quadrature weights in the case of high-order FEM [126], which can potentially cause instabilities with inexact arithmetics.
• Parameter optimization: Various strategies have been proposed to optimize the parameters of the cut element integration techniques discussed above, most notably for octree subdivision techniques. In [101,105] algorithms are proposed to select the integration order on the different levels of sub-cells. Ref. [126] presents a methodology to reduce the number of integration points in a manner similar to moment fitting techniques. These optimization techniques have demonstrated that reducing the number of integration points does not necessarily compromise the accuracy of the simulations. This is theoretically supported by Strang's first lemma [93,127,128], which indicates that integration does not need to be exact in order to attain (optimal) convergence [127] 1 . It should be mentioned that this lemma is also considered in the context of CutFEM in e.g., [105,129,130].
In the selection of an appropriate cut element integration scheme one balances robustness (with respect to cut element configurations), accuracy, and expense. If one requires a method that automatically treats a wide range of cut element configurations and is willing to pay the price in terms of accuracy and computational effort, octree integration is the compelling option. For moderate accuracy, quadratures that rely on exact monomial integrations and only involve vertex evaluations are appealing in terms of accuracy and robustness. If the accuracy and computational-expense requirements are more stringent and the range of configurations is suitably restricted, alternative techniques such as cut element reparameterization are attractive. In the case of implicit boundary representations, an additional consideration in the selection of the cut element integration scheme is whether or not it is required to obtain a parameterization of (or integration scheme on) the unfitted boundary. Parameter optimization procedures can aid in fine-tuning the balance between robustness, accuracy and expense. The appropriateness of the various techniques also depends on the way in which the geometry is represented (implicit vs. explicit), as this can have a substantial impact on the implementation of a particular technique.

Dirichlet boundary condition imposition
Since the (immersed) boundary of the physical domain does not coincide with the background mesh, the imposition of boundary conditions in immersed methods requires special consideration. Given that a parameterized boundary representation for integration over the boundary exists, Neumann (or natural) boundary conditions can be imposed weakly, in the same way as done in boundary-fitted FEM. The imposition of Dirichlet (or essential) boundary conditions in immersed FEM is not as straightforward, however. Because of the disparity between the background grid and the physical domain, basis functions defined on the background mesh are not generally interpolatory on the unfitted boundary. This precludes strong imposition of Dirichlet conditions as in boundary-fitted FEM. Therefore, boundary conditions in immersed FEM are usually imposed weakly. Different techniques for the imposition of Dirichlet boundary conditions on unfitted boundaries exist, the most prominent of which are: • Penalty method: The penalty method supplements the weak form of boundary-fitted FEM with a penalty term that penalizes differences between the approximate solution and the prescribed Dirichlet data. This approach, which has been applied in the pioneering work on the finite cell method [2], is generally considered as the most straightforward technique to impose Dirichlet conditions on unfitted boundaries. The formulation omits the boundary terms that arise from the partial integration in the derivation of the weak form -in boundary-fitted FEM these terms drop out as the test functions vanish on Dirichlet boundaries -such that a modeling error is introduced which yields an inconsistent formulation. For appropriately selected penalty parameters the inconsistency can be acceptable [131], making the penalty method effective for a broad class of immersed problems with complex geometries, see, e.g., [49,100]. Nevertheless, the choice of an appropriate penalty parameter is challenging. A too small value does not adequately enforce the prescribed boundary conditions, while a too large value exacerbates the conditioning problems [8] and can lead to large, nonphysical, gradients on cut elements [1,132,133].
• Nitsche's method: Nitsche's method [95] can be considered as the consistent equivalent of the penalty method, as it retains the boundary gradients (sometimes referred to as the flux terms) in the weak formulation. Through appropriate scaling of the Nitsche (or penalty) parameter, a stable formulation is obtained, see, e.g., [134]. Nitsche's method is a widely used technique for the weak imposition of boundary conditions in immersed finite element methods. An elegant aspect of Nitsche's method is that the parameters can be computed per element [96], avoiding potential difficulties in the selection of a single global Nitsche parameter, see, e.g., [10]. The value of the Nitsche parameter should be inversely proportional to the thickness of the element 1 , and can become arbitrarily large for small cut elements [1]. This problem can be remedied by means of additional stabilization terms such as the ghost penalty, or by the aggregation of basis functions; see Section 4. Also, a nonsymmetric Nitsche method can be applied to avoid the need for stabilization [135][136][137], although this does affect the linear solver. Additionally, nonsymmetric Nitsche methods are not adjoint consistent, which means that these formulations result in suboptimal approximation properties in the L 2 (Ω)-norm [135].
• Lagrange multiplier techniques: Dirichlet conditions on immersed boundaries can be enforced by supplementing the weak formulation with additional constraint terms, e.g., [134,138]. In contrast to the penalty method and Nitsche's method, in Lagrange multiplier techniques these constraint terms are associated with an auxiliary field variable which is defined over the unfitted boundary. This auxiliary field is referred to as the Lagrange multiplier field. Lagrange multiplier techniques result in a saddle point problem, which implies that the discrete Lagrange multiplier field needs to be selected in such a way that a stable system is obtained [139]. Examples of Lagrange multiplier type techniques for immersed FEM are presented in [35,[140][141][142][143]. While an advantage of Lagrange multiplier techniques over Nitsche's method and the penalty method is that these do not require the selection of a parameter, the downsides are that additional degrees of freedom are introduced through the Lagrange multiplier field, and that a (inf-sup) stable discretization of the Lagrange multiplier field is generally non-trivial. Additionally, for many problems, the introduction of Lagrange multipliers changes the nature of the linear system from positive (semi-)definite to indefinite and breaks the diagonal dominance. This affects the applicability of iterative solvers (in particular this precludes the conjugate gradient method) and the factorization in sparse direct solvers.
• Basis function redefinition: An alternative class of techniques to impose Dirichlet conditions on immersed boundaries is based on the idea to redefine the basis functions in such a way that the modified (non-vanishing) basis functions are interpolatory on the unfitted boundary. This enables traditional strong imposition of boundary conditions, as is standard in boundary-fitted finite element methods. Prominent examples of methods based on this concept are WEBsplines [80,81] and, more recently, i-splines [82]. The main advantage of these techniques is that they do not require modifications to the weak formulation in comparison to boundary-fitted FEM, but the algorithms to redefine the basis functions constitute an additional non-trivial component in the implementation and analysis.
In the selection of an appropriate technique for imposing Dirichlet conditions in immersed finite elements, consistency and accuracy requirements are a prominent consideration. If there are no stringent accuracy requirements, the penalty method is an attractive option on account of its simplicity. If this method does not meet the accuracy requirements, one can resort to consistent weak formulations, where particularly Nitsche's method strikes a suitable balance between accuracy and ease of implementation. Basis function redefinition strategies are an attractive alternative when there are reasons to enforce Dirichlet conditions in a strong manner, like in boundary-fitted finite element formulations.

Stability and conditioning
Immersed discretizations that make use of finite element spaces defined on a background mesh generally suffer from the so-called small-cut-element problem. Conventional boundary-fitted finite element methods impose conditions on the shape and the size of the elements in the considered (family of) meshes. Such conditions can be (more-or-less) directly managed by the mesh-generation algorithm. In immersed finite element methods, on the other hand, one has no control over the shape and size of cut elements. Consequently, cut elements can have arbitrarily small intersections with the physical domain, which is commonly referred to as the small-cut-element problem. This loss of control in the immersed or unfitted setting can have extensive implications on the wellposedness and conditioning of the resulting discrete problem, unless the immersed method is judiciously formulated. The challenge of stability and conditioning in immersed finite elements with respect to small cut elements is the main focus of this review. Frequently, this small-cut-element problem is considered as a single-faceted problem. In our opinion, however, there are two distinct (albeit strongly related) facets to the small-cut-element problem, viz. stability and conditioning: • Stability is related to the fact that the numerical formulation itself can be ill-posed in the immersed setting. The most clear example of this is the imposition of Nitsche's method with standard unfitted finite element spaces. The Nitsche parameter required for coercivity tends to infinity under mesh refinement, which can lead to unbounded gradients on immersed or unfitted boundaries. It should be mentioned that the stability of immersed finite elements is closely related to the way in which essential boundary conditions are enforced, and, in the case of Nitsche's method, to the value of the Nitsche parameter.
• Conditioning is related to the linear algebraic problem of obtaining the solution of the discrete system that arises from an immersed finite element formulation. Even if the problem is properly defined from a stability perspective, e.g., with only Neumann conditions on unfitted boundaries, the resulting system can be arbitrarily ill-conditioned, which impedes the solution of the system. This is caused by functions that are only supported on small cut elements, for which the operator norm (that is equal to the matrix energy norm of the corresponding coefficient vector) is affected by the cut-element size, while the norm of the coefficient vector itself is not. This implies that the eigenvalues of the system matrix can be arbitrarily close to zero, depending on the cut-element configuration.
A myriad of techniques has been developed to counteract the problems associated with small cut elements, encompassing treatments for both stability and conditioning issues. The effects of cut elements on the conditioning of the linear system are discussed in detail in Section 3.1. Section 3.2 provides an overview of tailored preconditioners that address this issue. Specific attention in this section is devoted to Schwarz preconditioners, that form a natural resolution to the conditioning problem. The stability of immersed discretizations is treated in detail in Section 4, which considers both the stability problem itself in Section 4.1 and the methodologies that have been developed to resolve it in Section 4.2. Two particular approaches, viz. element aggregation and the ghostpenalty method, resolve the stability problems in a manner that yields enhanced coercivity in H 1 (Ω h ) (i.e., on the union of all active elements) instead of just in H 1 (Ω). Consequently, these approaches do not only guarantee stability, but simultaneously preclude conditioning problems. Element aggregation and the ghost-penalty method are discussed in detail in Sections 4.3 and 4.4, respectively, and Section 4.5 presents a unified mathematical approach to analyze these techniques.
Remark 2.1. Interpretation from the perspective of norm equivalences. In terms of the problem definitions presented above, stability and conditioning can be distinguished as different norm equivalences in the discrete space. Stability pertains to the strength of the norm equivalence between the H 1 (Ω)-norm, which is a common measure for establishing the quality of a solution, and the operator norm, · a h (or the equivalent β-norm or energy norm, which will be defined in Section 4). Conditioning, on the other hand, pertains to the strength of the norm equivalence of the matrix energy norm, · A (which is equal to the operator norm of the corresponding discrete function), and the 2 vector norm, · 2 . In an unfitted discretization that contains small cut elements, the equivalences between these norms can become very weak, which indicates the potential of difficulties with respect to stability and/or conditioning. To ensure coercivity of the weak formulation when Nitsche's method is employed in an immersed setting without a dedicated stabilization technique, a (locally) very large Nitsche parameter is required. Such a very large Nitsche parameter, however, also causes the operator norm to be only very weakly bounded by the H 1 (Ω)-norm [1]. Similarly, functions that are only supported on small cut elements have a small operator norm, while the size of the cut elements does not affect the the 2 -norm of the coefficient vector. For this reason, the bound on the 2 -norm of the coefficient vector in terms of the matrix energy norm can be arbitrarily weak [144].

Remark 2.2. Stable explicit time integration.
While not considered in detail in this review, a related challenge is the stability of explicit time integrators on unfitted grids containing small cut elements. Analogous to the system matrix, the eigenvalues of the (consistent) mass matrix cannot be bounded from below in immersed formulations, such that the stable time step can become arbitrarily small. Similar to the stability of the solution on small cut elements discussed above, this can generally be resolved by the function space manipulations discussed in Section 4.3 or by the addition of the weak stabilization terms as discussed in Section 4.4 to the mass matrix, see, e.g., [145]. Regarding the stability of explicit time integrators, also the investigations into the spectral behavior of Nitsche's method in both boundary-fitted and immersed settings presented in [146][147][148] are of particular interest. It should be mentioned that systems with lumped mass matrices and smooth (isogeometric) discretizations form an exception to the dependence of the stable time-step size on cut elements. For such systems, the eigenvalues of the lumped mass matrix on small cut elements scale more favorably with the size of small cut elements than the eigenvalues of the stiffness matrix. Therefore, stable explicit time integration can be performed without additional stabilization and with time steps dependent on the background element size, see [149,150] for details.

Ill-conditioning and preconditioning
An integral aspect of finite element methods is to find the solution of the corresponding linear system of equations. For small systems this is generally done by a direct solver, which factorizes the linear system and directly computes the solution up to machine precision. The computational cost of direct solvers scales poorly with the size of the system, however, resulting in computation time and memory requirements that become prohibitive for large systems [151]. For this reason, large systems are generally solved by iterative solvers, the computational cost of which generally scales better with the size of the system [152]. The convergence of iterative solvers is strongly correlated with the conditioning of the system. Without tailored stabilization or preconditioning, systems derived from immersed finite element formulations are generally severely ill-conditioned, impeding the application of iterative solvers [144]. In Section 3.1, an analysis of the causes of ill-conditioning in immersed finite element systems is presented. Preconditioners which effectively resolve these conditioning problems are discussed in Section 3.2. Section 3.3 provides a detailed discussion about Schwarz preconditioners, which have commonly been applied to immersed finite element systems in recent years. This last subsection also includes an example that illustrates the effect of Schwarz preconditioning on an immersed finite element simulation of a scan-based geometry.

Condition number
The conditioning of a linear system is often used as an indication of the complexity of solving that linear system by means of an iterative solver. An important property of a system matrix to indicate the conditioning is the (Euclidean) condition number, κ (A), which is defined as the product of the norm of the system matrix and the norm of its inverse While the condition number is formally defined as the sensitivity of the solution to perturbations in the right hand side, it also relates to the convergence of iterative solvers. It is noted that the convergence of these solvers is dependent on more factors than simply the condition number, such as, e.g., the grouping of eigenvalues and the orthogonality of eigenvectors [152,153]. Based on the equality between the matrix energy norm y A of a vector y in the vector space 2 (N ) and the operator norm v h a h of the corresponding function v h = i y i φ i in the isomorphic function space V h , the norm of the (symmetric) system matrix A can be written as Similarly, for the norm of the inverse of A it can be written that The identities in (3.2) and (3.3) convey that the condition number of a linear system emanating from an (immersed) finite element formulation is determined by the tightness of the equivalence between, on one side, the vector norm y 2 of the coefficient vector, and, on the other side, the operator norm v h a h of the corresponding function, on the isomorphic spaces 2 (N ) and V h . The weakness of this norm equivalence for systems that contain small cut elements is the root cause of ill-conditioning in immersed finite element methods. The norms defined above can be employed to provide an estimate, or lower bound, for the condition number of a system with a small cut element. To derive this estimate, we consider a two-dimensional cut-element scenario as indicated in Figure 4. First, we define the volume fraction η i of the cut element T i ∈ T cut h as the ratio between the volume of the intersection of the element with the domain Ω and the volume of the full element in the background grid Second, we note that in a polynomial basis of order p it is generally possible to construct a function with ξ = (ξ 1 , ξ 2 ) a local coordinate that has its origin ξ = (0, 0) at the (not-cut) vertex of T i , as indicated in Figure 4, and where the corresponding coefficient vector y in the vector space 2 (N ) has a norm that is approximately y 2 ≈ 1 depending on the employed basis (e.g., B-splines, Lagrange, or integrated Legendre). Under the assumption that element T i is cut by a natural boundary such that the weak form does not contain boundary terms, the operator norm of v h is given by This shows that while the 2 vector norm y 2 is not affected by the volume fraction η i , the operator norm v h a h and, by equivalence, the matrix energy norm scales with the volume fraction η i . Therefore it follows that with denoting an inequality involving a constant (i.e., a b indicates a > Cb for some constant C). As the norm A 2 does not depend on the volume fraction η i , this results in a condition number κ(A) η −2p i . In [144] this derivation is performed in an abstract setting for more general cut scenarios (under certain shape assumptions), for different numbers of dimensions, and with different boundary conditions. This results in the estimate of the condition number of linear systems derived from immersed finite element formulations of second order problems with η denoting the smallest volume fraction in the system, i.e., η = min i η i . The scaling relation (3.8) is numerically verified in [144] for different grid sizes, discretization orders, and for both C 0 -continuous and maximum continuity spline bases.
It can be noticed that the shape of the cut element is not included in the condition number estimate. That is because within certain shape-regularity assumptions (see [144] for details), the shape of the cut element is not a dominant factor in the order of magnitude of the smallest eigenvalue of the system matrix. When the system matrix is (diagonally) scaled, as will be discussed below, the shape of the cut element does play a role. It should be mentioned that, under the same shape-regularity assumptions, the derivation in [144] also establishes that when a local, element-wise, Nitsche parameter is employed, the condition number estimate is not affected by the type of boundary condition imposed on the (cut) boundary of the smallest cut element. When a global Nitsche parameter is employed, the value of this parameter is determined by the smallest cut element. While this does not affect the smallest eigenvalue of the system, this makes the largest eigenvalue dependent on the smallest cut element as well, resulting in even larger condition numbers.
Effect of diagonal scaling, smoothness and the cut-element shape The previous paragraph considered the condition number of the system matrix A as is, without any treatment. It is noted, however, that linear algebra solvers typically perform basic rescaling procedures, most notably diagonal scaling operations such as Jacobi, Gauss-Seidel or SSOR preconditioning [154]. Consequently, it is important to not only consider the condition number independently, but also to have a closer look at the coefficient vector y corresponding to a function with a very small operator norm v h a h . As indicated in [144], there exist two mechanisms by which the operator norm of a basis function can be much smaller than the norm of the corresponding coefficient vector. First, it is possible that a basis function in itself is very small. In this case, simply the unit vector corresponding to that specific basis function will already cause a large condition number through (3.3) (i.e., with this unit vector taken as y, the quotient in this equation will be very large). Second, on small cut elements it is possible that the dependence of certain basis functions on a specific parametric coordinate or on higher-order terms diminishes. This essentially reduces the dimension of the space spanned by the basis functions on the small cut element, such that these basis functions become almost linearly dependent (see Figure 5). In this case, the small function v h as in (3.5) cannot be represented by a unit vector, and the coefficient vector y corresponding to the small function requires multiple nonzero entries. This is an important nuance in relation to the scaling of the system, as small eigenvalues that correspond to an (almost) unit vector are resolved by diagonal scaling, while small eigenvalues caused by almost linear dependencies are not.
Because of the above-discussed distinction between the classes of small eigenvalues in immersed finite element systems, there is a significant difference in conditioning between, on one hand, higher-order (p ≥ 2) C 0 -polynomials, and, on the other hand, maximal continuity splines and linears (note that linears are a subclass of maximal continuity splines). This is because in the former case, almost linear dependencies will be formed on all small cut elements. For such systems, Jacobi preconditioning lowers the condition number, but the resulting systems are still ill-conditioned and the condition number still shows a dependence on the smallest cut element. For maximum continuity splines, the shape of small cut elements starts to play a role. On elements in which a vertex is contained in T Ω , as indicated by element T 1,Ω in Figure 5, a discretization with maximum continuity splines will only contain a single basis function of which the support is restricted to the small cut element. Therefore, a Jacobi preconditioner suffices to repair the conditioning with regard to that element. On cut elements where T Ω does not contain a vertex, as indicated by element T 2,Ω in Figure 5, the diminished dependence on the (in this case horizontal) parametric coordinate causes almost linear dependencies. Consequently, such elements still cause ill-conditioning even after diagonal rescaling of immersed systems with maximum continuity splines. Because of this dependence on the continuity of the basis, a scaled linear system derived from an unfitted discretization with maximum continuity splines will contain far fewer problematically small eigenvalues than a scaled linear system derived from an unfitted discretization with C 0 -continuous basis functions of the same degree p ≥ 2. Depending on the size of the system and the smoothness of the boundary, the number of problematic eigenvalues in an immersed iso- Figure 5: Illustration of an unfitted boundary with different types of cut elements. Element T1,Ω contains a vertex. With maximal continuity splines, only one basis function is supported on this element only (for a linear basis, the node corresponding to this function is indicated by the blue dot). Rescaling the basis functions will resolve the conditioning problems for this type of cut element. Element T2,Ω does not contain a vertex, and multiple basis functions are only supported on this element (with a linear basis, these are the basis functions corresponding to the nodes indicated with the green dots). If the volume fraction of this element is very small, the dependence of the basis functions on the horizontal coordinate (relative to the dependence on the vertical coordinate) diminishes, such that the basis functions describe essentially the same degree of freedom and become linearly dependent. Therefore, rescaling the basis functions will not resolve the conditioning problems on this type of cut element. In a similar manner, the relative contribution of higher-order terms diminishes relative to the contribution of linear terms on small cut elements. As a result, with higher-order discretizations that are not of maximal continuity, almost linear dependencies generally occur on small cut elements of any shape, while these only occur on specifically cut elements with discretizations of maximal continuity.
geometric discretization in conjunction with diagonal scaling can even be small enough to render direct application of a Krylov-subspace based iterative solver feasible. Because only cut elements of a specific shape cause small eigenvalues for systems based on maximum continuity splines in conjunction with diagonal scaling, the overall smallest volume fraction and the scaled condition number are in general only weakly correlated in such systems [144].

Preconditioning and literature overview
Based on the conditioning analysis presented above, ill-conditioning of immersed FEM systems can effectively be negated by dedicated preconditioning techniques. The general idea of preconditioning is to construct a preconditioning matrix, B, which is an approximation of the inverse of the system matrix A, and then to solve the preconditioned system Although the original system matrix A in equation (2.8) can be ill-conditioned, i.e., κ(A) 1, a properly formed preconditioner results in a well-conditioned preconditioned system matrix, i.e., κ(BA) ≈ κ(A −1 A) = κ(I) = 1. In constructing the preconditioner B, one balances the computational effort required to compute and apply the preconditioner with the extent to which the inverse of the original system matrix is approximated.
Various dedicated preconditioning techniques to resolve ill-conditioning problems caused by small cut elements have been developed in the context of GFEM and XFEM, such as: a preconditioner based on local Cholesky decompositions [155], a FETI-type preconditioner tailored to XFEM [156], an algebraic multigrid preconditioner that is based on the Schur complement of the enriched basis functions [157], and domain decomposition preconditioners based on additive Schwarz [158,159].
In recent years, dedicated preconditioners have been developed for immersed FEM. It is demonstrated in [160] that systems with linear bases can effectively be treated by a diagonal preconditioner in combination with the removal of very small basis functions. Under certain restrictions on the cut-element geometry, it is derived in [161] that a scalable preconditioner for linear bases is obtained by combining a Jacobi preconditioner for basis functions on cut elements with a standard multigrid preconditioner for interior basis functions that do not intersect the boundary. In [144], a preconditioner is developed that combines diagonal scaling with local Cholesky factorizations. This technique is motivated by the analysis of the conditioning problems of immersed methods in the previous section, and can be interpreted as a local change of basis on small cut elements, where the Cholesky factorizations correspond to local orthonormalization procedures for almost linearly dependent basis functions, which are identified by a tailored algorithm. The resulting preconditioner effectively resolves ill-conditioning for immersed methods discretized with higher-order (continuous) bases.
Recently, Schwarz preconditioners -which were already discussed above in the context of GFEM and XFEM [158,159] -have also gained momentum in immersed finite elements. The concept of Schwarz preconditioning to overcome the problem of linear dependencies on small cut elements was considered in [61,162]. This concept was generalized to multilevel hp-finite element bases in [60], where it was also demonstrated to be effective in parallel computing frameworks. In [163], Schwarz preconditioning of unfitted systems was applied as a smoother in a multigrid solver, making it suitable for large scale computations. The methodology was applied in highperformance parallel-computing settings in [164,165], with [165] demonstrating excellent scalability for problems with multi-billion degrees of freedom distributed over close to 10 5 cores. A Balancing Domain Decomposition by Constraints (BDDC) scalable preconditioner, tailored to immersed FEM by choosing appropriate weighting coefficients for cut basis functions, has been proposed in [166]. BDDC methods are multilevel additive domain decomposition algorithms that can scale up to millions of cores/subdomains [167]. The preconditioner in [166] results in an effective preconditioner for linear basis functions and exhibits the same parallelism potential as standard BDDC, thus being well-suited for large-scale systems on distributed-memory machines.
It is worth mentioning that dedicated preconditioners have also been developed and investigated for ghost-penalty stabilized discretizations. While, as will be demonstrated in Section 4, such stabilized systems do not suffer from the typical conditioning problems related to small cut elements discussed in Section 3.1, the different setting of the problem and the involvement of additional terms does warrant the investigation into the applicability of efficient solvers, in particular multigrid techniques. Dedicated multigrid routines for Nitsche-based ghost-penalty stabilized methods are presented in [168][169][170]. It is notable that [170] additionally presents multigrid routines for Lagrange multiplier based methods, see also [171,172]. Furthermore, in [173] and [174] a similar approach as in [161] (discussed above) is followed. In these references it is demonstrated that a stable splitting of the degrees of freedom exists, and that a scalable solver is obtained by combining a standard multigrid technique for a well-defined set of internal degrees of freedom with a diagonal preconditioner for the set of degrees of freedom along the boundary. While this technique is only applicable to ghost-penalty stabilized systems, in contrast to [161] it can also be applied to higher-order discretizations. A final noteworthy contribution is [175], which presents a preconditioner for ghost-penalty stabilized immersed interface problems of high contrast. In this reference a different splitting of the degrees of freedom is applied to define a Schur complement, based on which preconditioners are presented that are robust to high contrast ratios.
The remainder of this section focuses on Schwarz preconditioning and the considerations regarding its application to systems derived from immersed finite element methods.

Concept of Schwarz preconditioning
The concept of Schwarz preconditioning is to invert (restrictions of) local blocks of the system matrix A ∈ R N ×N and then sum these contributions to form the preconditioner. To provide a definition, we consider a set of (potentially overlapping) index blocks, where each index block contains M i ≥ 1 indices. The additive Schwarz preconditioner is then defined as and the multiplicative Schwarz preconditioner as The prolongation operator P i ∈ R N ×Mi consists of the unit vectors corresponding to the M i indices in the i-th index block. Pre-and post-multiplying the system matrix A ∈ R N ×N with the (transpose of) this prolongation operator restricts it to the submatrix A i ∈ R Mi×Mi consisting of only the indices in the i-th index block. Similarly, the opposite pre-and post-multiplication with these operators injects the local inverse It is to be noted that the index blocks may overlap. In additive Schwarz these contributions are then added in B AS and treated simultaneously. In multiplicative Schwarz, repetitions of the same index are treated sequentially. Application of the multiplicative Schwarz preconditioner can be expressed by recursive relations; see, e.g., [163]. It is noted that index blocks consisting of a single index simply reduce to Jacobi preconditioning in additive Schwarz, and to Gauss-Seidel preconditioning in multiplicative Schwarz. The formal definition and details about the construction of Schwarz preconditioners are presented [61] and [163].

Application to immersed systems
Schwarz preconditioners can be conceived of as locally orthonormalizing the basis functions corresponding to the indices in a block. In order to effectively employ the concept of Schwarz preconditioning as a tailored preconditioner for unfitted systems, it is therefore essential that for every set of almost linearly dependent functions, there is an index block containing all these functions. Since almost linear dependencies occur between basis functions that are supported on a small cut element, the index blocks are generally chosen by selection procedures based on the overlapping support of basis functions, either for all active elements in T h or only for the cut elements in T cut h . A discussion regarding considerations in the index blocks is provided later in this section.
A further interpretation of Schwarz preconditioning in relation to immersed finite element methods can be obtained from the additive Schwarz lemma. This lemma states that for a Symmetric Positive Definite (SPD) matrix A, the B −1 AS -inner product of an arbitrary vector y is equal to [176,177] (see [178,179] for this specific form) In these identities,ỹ i ∈ R Mi denotes a block vector corresponding to the i-th index block. The statement y = j P jỹj indicates that the sum of the prolongations of these block vectors form a partition of the vector y ∈ R N . A set of block vectors with this property exists, provided that every index is contained in (at least) one index block. In the case that the blocks do not overlap, this set of block vectors is unique. In the case that the index blocks do overlap, multiple sets of block vectors have the partition property. Accordingly, the lemma states that the B −1 AS -inner product of y is equal to the minimum of the sum of the A i -inner products of the block vectorsỹ i over all sets of block vectors with the partition property.
To relate this to the specific conditioning problems in immersed finite element methods, recall that B AS can be considered as a sparse approximation of A −1 . Efficient preconditioning requires B AS to have similar properties as A −1 or, similarly, B −1 AS to have similar properties as A. The analysis of the conditioning problems associated with immersed finite element methods in Section 3.1 conveys that the principal cause of these problem is that, potentially, y 2 y A in case y corresponds to a function comprised of i) a very small basis function or ii) almost linearly dependent basis functions. For the first case, it follows from (3.12) that y must also have a small B −1 AS -inner product, such that this property is captured by the additive Schwarz preconditioner, independent of the index blocks. For the second case, assume that indeed for every set of almost linearly dependent functions, there is an index block containing all these functions, in accordance with the previously stated postulate. Then, in case y corresponds to a function comprised of almost linearly dependent basis functions, y can be written as the prolongation of a single block vector. Consequently, also in the second case, it follows from (3.12) that y will have a small B −1 AS -inner product, such that this property is captured in the additive Schwarz preconditioner. As a result, with a proper choice of the index blocks, for both causes of very small eigenvalues of the matrix A, the corresponding modes will also be present in B −1 AS . The Schwarz preconditioner thus specifically targets the problematic aspects of small eigenvalues due to small cut elements, and thereby effectively resolves the ill-conditioning in systems derived from immersed formulations.

Effectivity and multigrid preconditioning
While the effectivity of Schwarz preconditioning for immersed problems can be explained by the additive Schwarz lemma in combination with a particular selection of the blocks, a formal mathematical bound on the eigenvalues of an unfitted system treated by Schwarz preconditioning has not yet been formulated. Numerical results, however, consistently show that the resulting systems behave the same as boundary-fitted systems, in the sense that (for second order problems as the Poisson equation) the condition number scales as h −2 , and the number of iterations that is required to solve the linear system up to a prescribed tolerance is proportional to h −1 [61,163].
To resolve the remaining grid dependence after Schwarz preconditioning, the Schwarz preconditioner can be applied as a smoother in a geometric multigrid framework. In [163][164][165] this is demonstrated to result in a methodology that is robust to cut elements and which solves linear systems with quasi-optimal complexity, i.e., at a computational cost that is linear with the number of degrees of freedom (DOFs). A delicate consideration in a multigrid framework is the choice between additive and multiplicative Schwarz. As demonstrated in [163], the stability of a multigrid solver with an additive Schwarz smoother requires a considerable amount of relaxation, and for this reason [163] employs a multiplicative implementation. As discussed later in this section, multiplicative Schwarz is less suited for parallelization, such that parallel implementations employ either additive Schwarz [165] or a hybrid variant [164]. Another important aspect of multigrid solvers with Schwarz smoothers is the dependence of their effectivity on the discretization order, which is investigated and discussed in [163].

Block selection
As previously mentioned, for Schwarz preconditioning to be effective, it is essential that every combination of basis functions that can become almost linearly dependent is contained in an index block, which is generally achieved by selecting these blocks based on the overlap in the supports of basis functions. In [61], an index block is devised for every cut element, containing all basis functions supported on it. For basis functions that are not supported on any cut element, a simple diagonal scaling is performed. A tailored block selection procedure for multilevel hp-adaptive discretizations is developed in [60]. In this procedure, the set of basis functions supported on a refined (leaf) element is further restricted to only the necessary DOFs. Additionally, the procedure is optimized by only devising blocks for cut elements with a volume fraction that is smaller than a certain threshold η * ∈ [0, 1]. For locally refined discretizations based on truncated hierarchical B-splines, a block selection strategy is presented in [163]. In [61] it is noted that, for vectorvalued problems, degrees of freedom describing the solution in a certain geometrical dimension can generally not form a linear dependency with degrees of freedom describing the solution in another geometrical dimension. Hence, it can be beneficial to generate separate index blocks for each geometrical dimension.
The most important consideration in the selection of index blocks is the size of the blocks. Small blocks are computationally inexpensive, but can miss almost linear dependencies on pathologically cut elements. Large blocks are more robust, but are computationally more expensive. With the Schwarz preconditioner directly employed in an iterative solver, the number of iterations is generally large enough for small blocks to be more efficient. When the Schwarz procedure is employed as a smoother in a multigrid method, the small number of iterations generally renders larger blocks more appropriate. For this reason, [163] and [165] consider block selection procedures based on multiple elements. In [163] an index block is created for every basis function, containing all the basis functions with a support that is encapsulated by the support of the function for which the block is created. Ref. [165] employs index blocks containing all basis functions supported on a cluster of 2 d elements, with d the number of dimensions.
Regarding the lack of consensus on the block selection procedures in different contributions, it can be concluded that an optimal choice of index blocks is still an unresolved question.

Extension to other problems
The additive Schwarz lemma in (3.12) only pertains to Symmetric Positive Definite (SPD) systems. Such systems cover a large number of applications, encompassing many problems in structural mechanics, but do not comprise all problems commonly solved by immersed finite element methods. In particular, incompressible-flow problems in which the pressure takes the form of a Lagrange multiplier in the weak formulation are indefinite, and problems involving convection or formulations based on the nonsymmetric Nitsche method are not symmetric. Applications of the Schwarz framework to immersed finite element approximations of such problems are considered in [61], in which for all considered problems it is observed that the solver is independent of the cut elements and that the number of iterations is approximately inversely proportional to the mesh size.

Implementation
A specific operation in the construction of a Schwarz preconditioner is the computation of stable inverses of the submatrices A i . With very small eigenvalues, numerical round-off errors can cause detrimental errors in the inversion of submatrices with eigenvalues that are too close to machine precision. In the worst case, this can lead to negative eigenvalues in the preconditioned system, resulting in failure of the iterative solver. For that reason, the inverses are generally not computed by a simple inversion operation, but via an eigenvalue decomposition. After the decomposition, eigenvalues that are smaller than a prescribed threshold are discarded. Details about this procedure are described in [60,61].
The Schwarz preconditioner is suitable for parallel implementations [60]. A particular facet of the parallelization of a Schwarz preconditioner is that each submatrix A i has to be available in a single subprocess for inversion. As this is not generally the case in parallel (boundary-fitted) finite element codes, this calls for special care in the implementation, e.g., by applying ghost elements [60]. As previously mentioned, a specific consideration pertaining to parallel multigrid implementations is the choice between additive and multiplicative Schwarz. While multiplicative Schwarz does not require relaxation, it does require extensive communication and synchronization between parallel processes, specifically in distributed memory systems. For that reason the parallel multigrid implementation in [165] employs an additive Schwarz smoother, while in [164] a hybrid variant is applied with an additive approach for DOFs that are shared between processes and a multiplicative treatment of DOFs that belong to a single process.  In (b) the convergence behavior of the preconditioned iterative solver is presented. Data previously published in [163].

Preconditioning example
To illustrate the effectivity of Schwarz preconditioning incorporated in a multigrid cycle in providing a robust solution procedure that is not affected by either the cut elements or by the size of the system, we consider the µCT-scanned trabecular bone specimen displayed in Figure 6a. This geometry was first studied in [47] and the presented results were previously published in [163]. The specimen is compressed with an average uniaxial strain of 1% in the horizontal direction, resulting in the (Frobenius norm) stresses indicated by the colors. Three different meshes are considered, consisting of 32 3 , 64 3 and 128 3 elements in the ambient domain. Figure 6a shows the active elements of the coarsest mesh on the right half of the geometry. With quadratic basis functions, these different meshes result in, respectively, 182 · 10 3 , 1.03 · 10 6 and 6.65 · 10 6 active degrees of freedom. Figure 6b shows the convergence behavior of the conjugate gradient solver preconditioned by a multigrid cycle with a tailored multiplicative Schwarz smoother for the different discretizations. It can be observed that the convergence is virtually independent of both the cut elements and the size of the system. For details regarding this example the reader is referred to [163].

Stability analysis
As indicated in Section 2.2.3, small cut elements do not only adversely affect the conditioning of the resulting linear system, but can also disturb the quality of the solution as the normal gradient is not adequately controlled. When Nitsche's method is applied on a boundary-fitted discretization, the trace inverse inequality ∂ n u h T ∩∂Ω D h −1/2 T ∇u h T holds on every element T adjacent to the Dirichlet boundary ∂Ω D ; see, e.g., [97]. By virtue of this trace inequality, the bilinear operator a h (·, ·) defined in (2.7) is coercive with a global Nitsche parameter β that scales with h −1 , and optimal error bounds with respect to the H 1 (Ω)-norm hold. With an immersed discretization, however, an element T that intersects ∂Ω D only partially intersects the physical domain Ω. Therefore, the aforementioned trace inverse inequality assumes the form ∂ n u h T ∩∂Ω D h −1/2 TΩ ∇u h TΩ , where h TΩ indicates a generalized thickness of the fragment T Ω = T ∩ Ω (the intersection of element T and the physical domain) normal to T ∩ ∂Ω D (the intersection of element T and the Dirichlet boundary); see Section 2.1. Consequently, coercivity of the bilinear form a h (·, ·) requires an element-wise Nitsche parameter β that is not inversely proportional to the characteristic mesh width of the background element, h T , but instead is inversely proportional to the generalized thickness of the intersection, i.e., β ∼ h −1 TΩ h −1 T . Due to the fact that without special precautions, in immersed finite element methods one has no control over h TΩ , the Nitsche parameter can in principle become arbitrarily large.
In [1] it is shown that the error of immersed Nitsche-based formulations admits a natural analysis in the norm (within this manuscript referred to as the β-norm) provided that β is large enough for the bilinear form to be coercive. Note that this norm is referred to differently in [1], and that it differs from the energy norm that will be introduced in Section 4.5. On account of the last term, this norm cannot be applied to the full space H 1 (Ω). For this reason, it is assumed that u ∈ H 2 (Ω) in the analysis, and the setting is restricted to the composite space V h ⊕ H 2 (Ω). In [1] it is demonstrated that the Nitsche-based approximation possesses a best-approximation property with respect to the β-norm, but that the error of the best approximation in this norm can still be arbitrarily large. In view of this and the fact that the equivalence between the β-norm in (4.1) and the H 1 (Ω)-norm can be arbitrarily weak, error bounds in the H 1 (Ω)-norm cannot be provided. Ref. [1] provides examples of computations where direct application of Nitsche's method fails for unfitted problems and provides references in which nonphysical stress patterns on small cut elements are reported.

Remedies and literature overview
Multiple stability-enhancing techniques have been developed to overcome the aforementioned problem. A list of these is presented below. Two techniques in particular are discussed in detail in the subsequent subsections, viz. element aggregation in Section 4.3 and ghost-penalty stabilization in Section 4.4. Essentially, these approaches provide control over the gradients on cut elements in terms of the gradients on interior elements, in a manner that is consistent with the original problem. Consequently, extended coercivity holds with respect to H 1 (Ω h ) (n.b. on the entire active domain), and the bilinear form is coercive with a well-behaved Nitsche parameter β ∼ h −1 T ( h −1 TΩ ). With element aggregation this is achieved through a modification of the approximation space, while with the ghost penalty a consistent stiffness is added to the bilinear form. As a result, these techniques provide optimal approximation properties, and condition number estimates analogous to those for boundary-fitted finite element formulations. Therefore, both aspects of the small-cutelement problem are resolved simultaneously. A unified analysis of the two methods is presented in Section 4.5.
• By employing a different formulation to enforce boundary conditions, the stability problems caused by Nitsche's method are circumvented. Such alternative methodologies to enforce boundary conditions are discussed in Section 2.2.2 (e.g., the nonsymmetric Nitsche method [135][136][137] or the shifted boundary method [86][87][88][89]). For some applications, however, these approaches lead to other complications, such as loss of consistency, symmetry, or adjoint consistency.
• A fictitious domain stiffness is commonly employed with high-order discretizations in the finite cell method [2,71] (see [72,103] for reviews). This approach is mathematically analyzed in [180]. With a fictitious domain stiffness, the volume integrals of the bilinear operator are extended into the fictitious domain multiplied by a small parameter (in practical applications this only holds for the fictitious parts of cut elements, i.e., Ω h \ Ω). This provides a stiffness on the fictitious part of cut elements, providing some control over the approximate solution on these. This approach has been applied to many real-world applications, notable examples of which are implant-vertebra models [49] and additively manufactured structures [100]. The fictitious domain stiffness is not consistent with the original problem, however. For this reason, a drawback of this approach is that it can be challenging (or for certain applications impossible) to set the parameter low enough to keep the consistency error at an acceptable level, but high enough to obtain sufficient control over the fictitious parts of elements. Besides the stability, the fictitious domain stiffness also improves the conditioning, but generally not to a level that permits the application of iterative solvers.
• A minimal stabilization procedure is developed in [181]. In this procedure, the elements are first classified as good (resp. bad) elements, i.e., elements with a sufficiently (resp. insufficiently) large overlap with the physical domain. Next, for each bad element, the normal derivative of the discrete function on the Dirichlet boundary that appears in the boundary integral of the bilinear form, is replaced by the normal derivative of an extension of the function on a nearby good element. This gives rise to a similar trace theorem as in boundary-fitted discretizations and, accordingly, the weak formulation is coercive with a Nitsche parameter that is inversely proportional to the (untrimmed) element size of the background grid such that optimal error estimates can be derived. This technique has mainly been applied to trimmed patches in isogeometric analysis, and can also be employed for coupling conditions on unfitted interfaces in overlapping multi-patch discretizations [182]. Additionally, in [183] it is shown that this methodology can be used to obtain inf-sup stable mixed formulations. It is to be noted that this approach does not provide control of the gradients in the (fictitious part of) cut elements themselves, and therefore only provides coercivity with respect to H 1 (Ω), not extended coercivity with respect to H 1 (Ω h ). As such, this approach yields stability and provides a well-posed imposition of boundary conditions, but it does not solve conditioning problems.
• A recovery-based stabilization technique is commonly applied in the Cartesian grid finite element method (cgFEM, [54,79]), and is similar to the ghost penalty with patch control [184]. In this stabilization procedure, the solution on a small cut element is weakly constrained by adding an L 2 -projection term over the entire background element to the bilinear form. Instead of constraining the solution on the small cut elements to a smooth extension of the solution on neighboring elements, Richardson iteration is applied to constrain the solution to a smooth extension of the approximate solution in the previous iteration. As a benefit of this technique, [184] mentions that it does not require operators that are not generally available in (immersed) finite element codes. Furthermore, [184] establishes the convergence of the Richardson iteration and a bound on the condition number similar to that of boundary-fitted finite element methods, and also illustrates these aspects numerically.
• A least squares stabilization term can be applied when the employed function space is (at least) C 1 -continuous. In [185] and [186] it is demonstrated that a stable and coercive formulation can also be achieved by applying a least squares finite element term in the vicinity of the Dirichlet boundary. Because in H 2 (Ω) the normal gradient on the boundary can be controlled by volumetric terms, coercivity is even achieved in the full space instead of only in the discrete space. A drawback of this approach is that it only applies to C 1 -continuous bases. The least squares stabilization does not repair the conditioning, such that in [185] this approach is combined with the removal of basis functions with very small supports in the physical domain and in [186] it is combined with a fictitious domain stiffness.

Aggregated finite element methods
In this section, we introduce a methodology to solve both the stability and conditioning issues described in Section 2.2.3. The method was originally proposed in [77] and was coined the aggregated finite element method (AgFEM). As the name points out, the method relies on an aggregation (also called agglomeration) of elements. This element aggregation is used to define a discrete extension operator from degrees of freedom on well-posed (interior) elements to ill-posed (cut) elements. The AgFEM space is the image of this operator. Thus, as indicated in Section 4.2, AgFEM solves the issues described above by modifying the finite element space, without altering or perturbing the bilinear form. We present this idea for (nodal or Lagrangian) C 0 -continuous finite element spaces. It can also be applied to discontinuous Galerkin methods, even though this case is trivial; discontinuous Galerkin schemes can readily be applied to the polytopal meshes obtained after aggregation.

Geometrical construction
Let us recall from Section 2.1 the problem domain Ω ⊂ R d that is embedded in the ambient domain A ⊃ Ω. The mesh T A h is a quasi-uniform and shape-regular discretization of A, T h is the set of all active elements T ∈ T A h that intersect Ω, and the union of all active elements is defined as Ω h = ∪ T ∈T h T ⊃ Ω. The set of cut elements that intersect the boundary ∂Ω of the problem domain is denoted as T cut h ⊂ T h and additionally we define the set of interior elements as T in h = T h \T cut h , see Figure 7a. As previously mentioned, immersed finite element formulations can lead to ill-conditioned discrete systems and unbounded gradients on cut elements. The interior elements in T in h do not play a role in these problems, but it is the cut elements in T cut h that can lead to the so-called small-cut-element problem.  The definition of finite element spaces that are robust to cut-element configurations makes use of an element aggregation strategy. The idea is to aggregate cut elements to interior elements. An aggregate contains one interior element (the root element) and one or more cut elements. We do not make any assumption on the shape of aggregates. As a result of this finite element space definition, aggregation is only active on the boundary. Interior elements that are not in touch with cut elements are not affected. We represent the aggregated mesh with T ag h . In Figure 7b, cut elements have been aggregated to interior elements, creating aggregates. The root element is highlighted with a different color in each aggregate. The aggregation algorithm is straightforward and can be found, e.g., in [77]. The algorithm proposed in this reference is iterative. At each iteration, cut elements are aggregated to an interior element or to a previously aggregated element (after the first iteration). The aggregation should minimize the aggregate size, since this will affect the final accuracy of the solution.
Remark 4.1. In this discussion, we considered for simplicity that cut elements are ill-posed and must be aggregated to interior elements. However, this naive definition of ill-posed elements is not a requirement of the method and more subtle definitions can be considered. For instance, one can define a threshold volume fraction η * ∈ [0, 1] (see (3.4)) and mark any element T i ∈ T cut h with η i ≤ η * as ill-posed. The naive case is recovered for η * = 1 and the standard Galerkin method with η * = 0. One can verify that the condition number and stability bounds hold for this relaxed definition with constants that depend on η * . η * = 1 is an excellent choice as soon as one wants to have enough resolution to capture geometrical details, which is the situation in most cases. For under-resolved situations, e.g., for thin-walled structures in which the thickness scale t is not captured by the background mesh, a more clever choice of η * , e.g., η * ∼ t h , is required. In practice, the choice of η * is a trade-off between accuracy (aggregate size) and wellposedness (condition number bound) of the resulting linear system. We note that the definition and implementation of AgFEM is independent of η * , since this parameter only affects the geometrical algorithm that aggregates elements.

Discontinuous spaces
Let us consider first the case of discontinuous Galerkin finite element spaces. We denote by V − h a discontinuous Galerkin space of a given order. The main reason why standard finite element methods on unfitted meshes fail is that one has no control over u h Ω h . Control over this quantity implies the stability of the method regardless of the cut location. Let us consider a discontinuous Galerkin space on top of Ω h (see Figure 8a). The support of a shape function on a cut element T ∈ T cut h is T Ω . This intersection depends on the cut location, and its measure can be arbitrarily small. In the limit of vanishing measure, the value of the degree of freedom associated with this shape function does not affect the linear system. Thus, the problem is singular.
(a) Discontinuous Galerkin constraints (b) Continuous Galerkin constraints Figure 8: On the left, we depict the degrees of freedom of a piecewise linear discontinuous Galerkin space Q − 1 (Ω h ). The nodes are deliberately placed in the element interiors to make it evident that they do belong to the element. Red circles represent well-posed degrees of freedom and blue squares ill-posed ones. In this case, for each ill-posed degree of freedom, its constraining degrees of freedom are the well-posed ones in the same aggregate using the expression in (4.4). On the right, we glue together degrees of freedom between elements to enforce C 0 continuity. In order to define the constraints, we must provide the concept of the aggregate owner of ill-posed degrees of freedom. This is obvious for aggregateinterior nodes (there is only one option). For the ill-posed degrees of freedom on aggregate interfaces, lines point to the owner aggregate. With this ownership information, one can now constrain the ill-posed degrees of freedom by the well-posed ones in the aggregate that owns them using the same expression (4.4) as above.
Given a mesh T h , we can define the discontinuous element-wise spaces depending on the local finite element space, namely polynomials up to order p or the tensor product of univariate polynomials of order p. The discontinuous Galerkin space V − h on the active mesh can either be P − p (Ω h ) or Q − p (Ω h ). Other spaces, like serendipity finite elements, could readily be considered. The space in Figure 8a corresponds to Q − 1 (Ω h ).
Based on the previous discussion, it is straightforward to check that degrees of freedom on interior elements are not problematic. The corresponding shape function has support on a whole element. Only the degrees of freedom on cut elements are problematic. Discontinuous Galerkin methods are geometrically flexible, i.e., the functional space and the topology of the elements are not connected. Thus, we can simply define the aggregated discontinuous Galerkin space as P − p (T ag h ) or Q − p (T ag h ). All these shape functions are polynomials, and their support is at least one full interior element. Thus, it can be checked that Extended stability of gradients is obtained using the same argument. Gradients of functions in V ag,− h are polynomials of degree p − 1 in each aggregate, and the norm on the whole aggregate can be bounded by the one on the root element, getting These relations are also employed in the mathematical analysis of stabilized methods in Section 4.5. The constants in the extended stability bounds (4.2) and (4.3) depend on the order of approximation, but are independent of the background mesh size h and the cut location. Thus, the small-cut-element problem is solved when using discontinuous Galerkin methods on aggregated meshes. Similarly, cell merging strategies in the finite volume setting have been proposed, e.g., in [187]. Its application to the discontinuous Galerkin method can be found in [83,188].
The aggregation method preserves the order of accuracy if the aggregate size is proportional to the background mesh size. The ratio between the aggregate size and root element size is related to the geometry of the domain boundary (see [77] for more details) and can be improved with refinement (for a given definition of the domain boundary ∂Ω). For a fixed mesh representation of the domain boundary (the standard situation), the assumption that the aggregate size is proportional to the background mesh size holds for h small enough.

Constraining discontinuous spaces
It is straightforward to see that V ag,− h ⊂ V − h . Even though it is not really needed for the discontinuous Galerkin method, one can build V ag,− h by constraining V − h . The idea is to define an extension In the discontinuous Galerkin case, this extension operator can be defined aggregate-wise. The idea is to extend the root-element shape functions to the whole aggregate. This is equivalent to constraining degrees of freedom on cut elements to the degrees of freedom on the respective root elements.
Let us recall Ciarlet's definition of finite elements; see, e.g., [93]. For nodal finite element methods (continuous or discontinuous), one can define a set of points (so-called nodes) and an associated set of degrees of freedom, corresponding to pointwise evaluation at the nodes. Nodes are chosen such that the degrees of freedom uniquely define functions in V − h and thus define a basis for its dual space. The dual basis of the degrees of freedom are the so-called shape functions. Thus, there is a correspondence between nodes, degrees of freedom and shape functions. We use the following notation: given a node α i , we represent its corresponding shape function and degree of freedom by φ i and σ i (·), respectively. The duality between the degrees of freedom and the shape functions implies σ i (φ j ) = δ ij with δ ij the Kronecker delta, and the relation between the nodes and the degrees of freedom in turn implies φ i (α j ) = δ ij . By virtue of the dual relation between degrees of freedom and shape functions, the following correspondence holds for all We can classify the degrees of freedom as well-posed (the ones of the root element) and illposed (the ones on the cut elements). We denote the set of well-posed (resp., ill-posed) degrees of freedom by O wpd h (resp., O ipd h ). Next, we define a map O ipd→wpd h (·) that for an ill-posed degree of freedom returns the well-posed ones. For discontinuous Galerkin methods, this map is defined aggregate-wise. At each aggregate T ∈ T ag h , the ill-posed degrees of freedom (blue squares in each aggregate of Figure 8a) are constrained by the values of its corresponding well-posed degrees of freedom (the ones in the corresponding root element, red circles in Figure 8a). More specifically, an ill-posed degree of freedom σ i (·) is constrained as where the basis functions φ j of well-posed degrees of freedom are evaluated outside of the elements on which these are defined in σ i (φ j ) and φ j (α i ). Equation

Aggregation for continuous spaces
Now, we would like to generalize the extension-based definition of the aggregated discontinuous finite element space to C 0 Lagrangian spaces. First, we discuss how the active and interior C 0 spaces can be obtained from the discontinuous space by enforcing C 0 continuity of piecewise polynomials using a local-to-global map. We also discuss why this approach cannot be easily applied to the aggregated meshes. Instead, we introduce the concept of ownership of ill-posed degrees of freedom on aggregates, and define an extension operator from well-posed to ill-posed degrees of freedom that preserves C 0 continuity. We can readily define the C 0 Lagrangian finite element space as a subspace of the discontinuous space as V h = V − h ∩ C 0 (Ω h ), and its restriction to interior elements is represented with V in h . In standard finite element methods, one can enforce C 0 continuity by using the previously introduced one-to-one relation between nodes, degrees of freedom, and shape functions. A shape function is different from zero only on its corresponding node, and zero on all the other nodes, such that the degree of freedom is just the evaluation on its corresponding node. C 0 continuity is enforced by the following equivalence class: two degrees of freedom must have the same value (irrespective of the element) if their corresponding nodes are in the same spatial point. Doing this, we glue together degrees of freedom from different elements. Figure 8b illustrates the result after gluing together the element-wise degrees of freedom in Figure 8a. This enforcement of continuity is amenable for implementation: one defines an element-wise matrix assembly and a local-to-global index map for degrees of freedom.
Unlike discontinuous Galerkin methods, this construction only holds for particular elementwise polynomial spaces and element topologies. First, it is not straightforward to implement such continuity in aggregated meshes. Second, it can destroy the approximation properties of V ag h . Let us take a look at an ill-posed degree of freedom on the interface between two aggregates (purple nodes in Figure 8b). In a straightforward extension of the constraints in (4.4), such a degree of freedom would be constrained by the multiple aggregates that contain it. This, together with the enforcement of C 0 -continuity, would couple the degrees of freedom between separate aggregates, and thereby could ruin approximation properties. In extreme cases, it can lead to over-constrained systems. This locking phenomenon has been analyzed in detail in [78].
The aggregated finite element method in [77] was designed to solve this locking phenomenon and provide an accurate C 0 Lagrangian finite element space on unfitted meshes that is easy to implement. The idea is to introduce the concept of ownership for ill-posed degrees of freedom. For each ill-posed global degree of freedom in V ag h , one assigns one of the aggregates in T ag h containing this degree of freedom as the owner. The choice of the owner is arbitrary and does not affect the main properties of the method. We can define a map O ipd→wpd h using the fact that each aggregate contains only one interior element. Given an ill-posed degree of freedom, its corresponding wellposed constraining degrees of freedom are those of the root element of the aggregate owner. We illustrate this construction in Figure 8b. The purple elements are the ones that belong to more than one aggregate (the other ones are straightforward). For these degrees of freedom, the small rectangle points to the aggregate owner of the degree of freedom. With this construction, we can now constrain global ill-posed degrees of freedom using the same constraints as for discontinuous Galerkin methods (4.4).
These constraints are local and define an extension E ag h : V in h → V ag h . In fact, the aggregated finite element space is the image of this extension operator. The method satisfies the desired approximation properties and the extended L 2 (Ω h ) and H 1 (Ω h ) stability in (4.2) and (4.3); see [77] for more details. We also refer to [189] for an alternative definition of the discrete extension operator that relies on interpolation for high-order polynomial bases and to [190][191][192] for the extension of higher-order continuous (isogeometric or spline-based) discretizations. The definition of a space-time discrete extension operator for moving domains and interfaces can be found in [193].

Implementation aspects
The aggregation algorithm can readily be implemented in a finite element code that supports mesh adaptivity. First, local element matrices are computed as usual. Next, the constraints for the aggregated finite element method are direct constraints, i.e., constraints which are enforced in the local-to-global assembly by expressing the ill-posed degrees of freedom in terms of well-posed ones to eliminate them from the system.
The distributed-memory implementation of the aggregated finite element method has been considered in [194]. The proposed implementation starts with a standard distributed implementation of the finite element method that relies on a sub-mesh partition (with gluing info, e.g., via one layer of neighbor elements). First, one must parallelize the aggregation algorithm. The parallel implementation of this step is straightforward; one needs to perform nearest neighbor communications at the end of each iteration of the aggregation algorithm to propagate information. The result is identical to the serial version (see [194, §3.2]).
In general, the aggregation strategy is such that aggregates can have support on multiple processors. Thus, constraints in (4.4) cannot be computed locally. To compute these constraints in a parallel environment, one requires information from the root element. The cut element to root element map (and its processor Id) is a by-product of the parallel aggregation step. This way, it is easy to define the information that each processor must receive from other processors to compute constraints. However, the inverse map that provides the cut elements constrained by a given root element is not straightforward. This map is required to prepare the data that needs to be sent to other processors. The parallel inverse path reconstruction algorithm in [194, §3.5] generates this information.

Adaptive meshes
The size of the aggregates on the boundary can harm the accuracy of the method. One way to limit the size of aggregates is via mesh refinement. However, local mesh refinement in continuous finite elements is not straightforward. In the resulting mesh so-called hanging nodes appear, which must be constrained by regular nodes to keep C 0 continuity. The nature of this constraint is analogous to the ones defined in (4.4) for aggregated finite elements.
The combination of mesh adaptivity and aggregation constraints is not straightforward. First, the definition of ill-posed versus well-posed degrees of freedom is more complex. A degree of freedom that only belongs to cut elements can be well-posed because it constrains well-posed hanging degrees of freedom. The right intertwining of these two sets of constraints has been analyzed in [195]. This work proposed a two-step algorithm to construct the discrete extension operator that carefully mixes aggregation constraints of problematic degrees of freedom and standard hanging degree of freedom constraints. Following this approach, the aggregated finite element space is defined as a discrete extension with well-defined linear constraints.

Extension to other problems
The aggregated spaces can be used to discretize elliptic and parabolic problems (combined with some time-stepping scheme). These schemes have, for example, been applied to nonlinear solid mechanics in [196]. The extension to interface problems is possible by defining independent aggregated spaces on both sides of the interface [197]. In this work continuity of traces is weakly enforced (as Dirichlet data), which leads to robust schemes that can handle high-contrast problems.
The extension of aggregated finite elements to indefinite systems has been analyzed in [198]. We note that its application to stabilized finite element formulations that transform the indefinite system into a definite one is straightforward [199]. The stability of indefinite problems relies on so-called inf-sup conditions. These conditions are stringent and are only valid for very specific i.e., finite element spaces. For instance, the velocity and pressure finite element spaces in the (Navier-)Stokes problem have to match to satisfy this condition. Aggregation on these spaces does not generally preserve inf-sup stability. The aggregated mixed finite element method for the Stokes problem in [198] starts with the inf-sup stable pair i.e., the velocity resides component-wise in Q p (T in h ) ∩ C 0 (T in h ) and the pressure is a discontinuous function in P p−1 (T in h ). The extension defined above for continuous and discontinuous spaces leads to a pair that does not satisfy the inf-sup condition. Instead, a modified extension of the velocity components is proposed in [198], which relies on serendipity finite elements. In some specific situations, additional pressure jump stabilization terms must also be added.
Aggregation has also been applied to enable stable explicit time stepping, such as in, e.g., [145] for the wave equation.

Solving the linear system
One of the main motivations of aggregated finite element methods is to produce a well-posed discrete system. For second-order elliptic problems, one can prove the same condition number bounds as for boundary-fitted formulations. More specifically, the condition number bound is of the order of h −2 ; see [77,Corollary 5.9]. Thus, the condition number is independent of the mesh location. As a result, one can apply standard direct or iterative solvers and preconditioning techniques. However, the sparsity pattern of the matrix is different from standard finite elements. As in adaptive finite elements, the sparsity pattern is affected by the constraints.
The linear solver step in aggregated finite element methods has been studied in [194]. The authors propose a row-wise linear algebra distribution layout for the resulting matrix in [194, §3.6] and minimize the amount of inter-processor communications. The authors use the parallel implementation of aggregated finite elements in FEMPAR [200]. They apply a standard algebraic multigrid preconditioner (called GAMG) from the PETSc library [201] using a standard configuration. The results in [194, §4] show excellent scalability and optimality properties for this standard solver applied to the aggregated finite element method. The aggregated results are very similar to the results obtained with boundary-fitted meshes. The largest problems that are considered exceed 10 9 elements on 16,464 processors. From these numerical experiments, it is observed that the additional coupling between boundary degrees of freedom does not affect a standard algebraic multigrid preconditioner.

A numerical example
This section is a summary of the numerical experiments in [194], in which the aggregated finite element method is considered. Herein, we are particularly interested in the linear solver step when using the aggregated finite element method. We refer the interested reader to [194], in which one can find a more thorough numerical experimentation, including, e.g., the time spent in all stages of the simulation and their scalability or the effect of some algorithmic parameters on the results. The motivation for this test is to show that the aggregated finite element method leads to a wellposed discrete system for which one can readily use standard iterative solvers and preconditioners. To this end, we consider the widely available linear solvers available in the PETSc library [201]. We use a conjugate gradient method from the KSP module of PETSc, preconditioned with the smoothed-aggregation algebraic multigrid preconditioner GAMG [202]. We set up the linear solver by relying as much as possible on the default configuration given by GAMG to effectively show that the aggregated finite element spaces lead to linear systems that can be efficiently solved using standard multigrid tools. The software used for these experiments is the MPI-parallel implementation of the AgFEM available at FEMPAR [200] linked with p4est v2.0 [203] for octree mesh handling and PETSc v3.9.0 [201] for distributed-memory linear algebra data structures and solvers.
We consider the Poisson equation with Dirichlet boundary conditions weakly imposed using Nitsche's method, as introduced in Section 2.1. The Nitsche parameter is taken as β = 10 h −1 . When using the standard finite element space without aggregation, this expression for the Nitsche parameter does not necessarily lead to a coercive bilinear form, as explained in Section 2.1. Therefore, in the results with standard finite element spaces, an element-wise Nitsche parameter that follows from a local generalized eigenvalue problem is employed. Note that, as mentioned previously, this local value of the parameter is not bounded due to the lack of stability of the method.
We consider hexahedral elements with continuous piecewise trilinear shape functions. The geometry in the experiments is the complex body depicted in Figure 9, which is often considered in the literature of immersed finite element methods; see [74]. The bounding box used to define the background mesh is the unit cube [0, 1] 3 , which is also depicted in Figure 9. We consider background meshes of hexahedral elements generated and partitioned in parallel with the p4est library. The test is performed for three different local loads (referred to as "load 1", "load 2" and "load 3") with 20 3 , 30 3 , and 40 3 elements per parallel subdomain, respectively. We generate the meshes by recursive subdivision of the unit cube. The finest mesh considered in the examples has 1, 073, 741, 824 elements (corresponding to refinement level 10), and it is partitioned into 16, 777 subdomains which are mapped to the same number of MPI tasks. Figure 9: View of the geometry and the bounding box considered in the numerical example. Figure reproduced from [194].
First, we analyze the impact of using aggregated finite-element or standard finite-element spaces on the number of solver iterations; see Figure 10a. This figure conveys that the use of aggregated finite element spaces is beneficial (and generally indeed essential) to achieve good performance of the linear solver. For the standard finite element spaces, some results are missing in Figure 10a. These correspond to cases in which the linear solver was not able to provide a converged solution. This shows that standard finite element spaces are not reliable in immersed discretization methods. In contrast, aggregated finite element spaces allow one to effectively solve the underlying linear systems. The results obtained with aggregated finite element spaces (blue lines) are very close to the expected optimal performance of multigrid methods (i.e., number of linear solver iterations is asymptotically independent of the problem size). Figure 10b reports the wall clock time spent in the linear solver step, separated into time spent in the solver setup and in the solver run. The solver wall clock time increases only moderately as the problem size increases.

Ghost-penalty stabilization
In this section, we present common techniques for the weak stabilization of immersed finite element methods, often referred to as the ghost penalty and generally associated with the Cut Finite Element Method (CutFEM) [73,74,76]. In the ghost-penalty approach, additional terms are added to the bilinear form which provide control over the solution on the elements that intersect the boundary, while the original approximation space V h is unmodified. These terms are denoted by s h (v h , w h ) and provide a contribution to the operator norm in the form of v h The underlying idea is to control the variation of the approximation function across neighboring elements, such that the solution on small cut elements is controlled by the solution on interior elements. A standard way to achieve this is by controlling the jump in the normal derivatives across faces, by adding appropriately scaled stiffnesses to these jumps. Another approach is to add a volumetric penalization to the difference between a function itself and the extension of the function on a neighboring element. In the subsequent sections, both approaches are discussed.

Face-based stabilization
The most common stabilization term for cut elements is the face-based ghost penalty [73,74], defined as where τ j are positive constants, F gh h is the set of all interior faces that are shared by an element in T cut h (see Figure 11), (·, ·) F indicates the inner product over F , and is the jump in the normal gradient across the face F . In this definition of the jump on face F , n 1 and v h,1 , and n 2 and v h,2 , correspond to the exterior unit normal and the function v h on the elements T 1 and T 2 that share face F , respectively. Note that, therefore, n 1 = −n 2 . With this penalty on the jumps in the normal derivatives between two neighboring elements, the norm of a function v h ∈ V h on one of the elements can be controlled by the norm on the other. Accordingly, gradients on cut elements are controlled by gradients on full elements in the interior. The parameters τ j in the stabilization terms must be chosen in an appropriate way. With too small parameters, the effect is negligible, while with too large parameters, degrees of freedom can be constrained too strongly, potentially degenerating the accuracy of the approximation [204]. On Dirichlet boundaries, the required value to provide coercivity of the bilinear form correlates inversely with the Nitsche parameter. On Neumann boundaries, coercivity of the bilinear form does not depend on the stabilization term. Nevertheless, stabilization can still be applied to the Neumann boundary in order to repair the conditioning problems discussed in Section 3.1, for which a scaling with h 2j+1 instead of h 2j−1 in (4.5) suffices [205].
To elaborate the ghost-penalty approach, let us for simplicity consider the case of linear elements, i.e., p = 1. It then holds that To derive these estimates, we consider the element-wise restrictions v h,i = v h | Ti (i ∈ {1, 2}), both extended onto T 1 ∪ T 2 ; see Figure 12. The second estimate in (4.7) follows from (4.8) with ∂ ni indicating either of the normal derivatives. Here we first added and subtracted v h,2 and used the triangle inequality. We then used the Taylor at ξ with point ξ F ∈ F the projection of ξ along the normal direction, which holds since v h,1 − v h,2 = 0 on F by virtue of the continuity of v h . Additionally, we used the estimate v h,2 T1 v h,2 T2 which holds since v h,2 is polynomial. The first estimate follows in the same way, but is simpler since the gradients are piecewise constant, leading to an h scaling factor instead of the h 3 factor.
Using the pairwise bounds in (4.7) -either directly or through a chain of neighboring elements -the degrees of freedom on cut elements in T cut h can be controlled by the degrees of freedom on interior elements T in h via the ghost penalty ∇v h To prove the first bound we proceed as The second bound in (4.9) can be established similarly. Note that the constants in these estimates will depend on the lengths of the paths connecting cut elements to interior elements, which is related to the properties and the resolution of the boundary, as well as the properties of the mesh. For smooth boundaries and locally quasi-uniform meshes with sufficiently small mesh sizes, one can show that the length of these paths is uniformly bounded. Conversely, the stabilization term can be bounded in terms of the L 2 -norm of the gradient of the approximation on the active domain Ω h which follows from where we first employed the triangle inequality, then a trace inverse estimate to pass from the face F to the neighboring elements T 1 and T 2 [97], and finally another inverse estimate to arrive at a bound in terms of ∇v h [93]. Using the same inverse estimate it follows that The estimates (4.9), (4.11), and (4.13) are central to the coercivity and continuity of the (stabilized) bilinear form with a (global) Nitsche parameter that scales inversely with the mesh size of the background grid. The estimates are also essential for the condition number bounds derived in Section 4.5. The principal notion of the ghost-penalty method is encapsulated in the bounds in (4.9), which convey that the ghost penalty provides control over the solution on the entire active domain including the fictitious part. The bounds (4.11) and (4.13) in turn impart that the ghost-penalty term is suitably well behaved.

Element-and patch-based stabilization forms
An alternative to the face-based penalty is a penalty that is based on volumetric integrals. This is convenient for higher-order polynomials, since it avoids the computation of higher-order derivatives at element interfaces, which is a non-standard operation in many finite element codes. An overview and analysis of different types of such ghost-penalty formulations is presented in [206].
A stabilization term based on element integrals can be defined according to any of the two forms where for each face F ∈ F gh h , T h (F ) is the pair of elements that share F and  .14) is presented in [207,208] and is obtained by example formulation 2 in [206, §2.4] in case the element pairs are chosen based on the faces in F gh h . We may also consider stabilization based on patches or macro elements, which is the form that was presented with the introduction of the ghost penalty in [76] and, for a certain choice of patches, is also considered in [208]. With this form of stabilization, each cut element is associated with a patch of neighboring elements that contains at least one element in the interior. We may then penalize the difference between the finite element function and a global polynomial on the patch. More precisely, let M T be a patch of elements containing T , then we can define is the L 2 -projector or the Ritz projector, respectively. Note that by strictly following this formulation, patches containing multiple cut elements will be contained in the summation multiple times. In practice, every patch that contains cut elements is included only once, or the boundary is simply covered by a set of non-overlapping patches such as in [209]. Ghost-penalty stabilization can also be combined with the concept of aggregation to design weakly aggregated schemes, as proposed in [78,206]. In these schemes, one makes use of the standard finite element space V h on the active mesh, and instead of strongly imposing constraints via an extension operator of interior degrees of freedom, the difference between V h and V ag h is weakly penalized. Recall the discrete extension operator E ag h v h can then be penalized similarly as in (4.16) These terms can be computed element-wise (in a similar manner as described in Section 4.3) and are similar to local-projection stabilization techniques; see, e.g., [210,211]. We note that the (strong) aggregated finite element method of Section 4.3 is formally recovered in the limit τ j → ∞. Thus, the stabilization method is also accurate for large values of τ j . This last statement does not hold for other forms of the ghost penalty, in which a large parameter can constrain too many degrees of freedom, which inhibits the accuracy [204] and is referred to as locking in [206]. Detailed analyses and experimentation with this type of stabilization can be found in [78] and [206]. Stabilization techniques based on volumetric integrals have also been developed for multimesh discretizations [212][213][214] and for problems on manifolds [75]. Additionally, it is noteworthy that with the H 1 -product and with the projector P p = 0 or P ag h = 0, the fictitious domain stiffness that is commonly employed in the finite cell method is recovered [215]. Fictitious domain stiffness requires τ 1 1, however, as this stabilization term is not consistent with the original problem. Therefore, with a fictitious domain stiffness, stability and conditioning need to be balanced with consistency.

Implementation aspects
To implement any of the aforementioned stabilization forms, one must be able to find face neighbors of given elements and assemble contributions from pairs of elements sharing a face. For higherorder elements, the face-based ghost penalty requires computation of higher-order derivatives. The element-based formulation may then be preferable. All the stabilization terms lead to additional coupling and, as a consequence, fill-in in the stiffness matrix [52,216]. Systems stabilized by the ghost penalty have similar condition number bounds as boundary-fitted formulations, and therefore larger systems (i.e., finer grids) yield larger condition numbers and require more iterations in iterative solvers. This can be overcome by combining the ghost penalty with a multigrid solver; see the discussion on this in Section 3.2.

Extension to other problems
The ghost penalty can also be applied to stabilize time-dependent problems, for which generally the mass form needs to be stabilized as well. In that case the scaling is chosen in such a way that the stabilization terms scale with the mesh parameter in the same way as the mass matrix; see (4.13). Similar to Schwarz preconditioning and aggregation, the ghost penalty can also be applied to provide inf-sup stable discretization of mixed formulations such as that of the Stokes problem; see, e.g., [21,28,52,217]. Furthermore, the ghost penalty can also be applied to stabilize discretizations of codimensional manifolds, such as partial differential equations posed on nonplanar surfaces discretized with three-dimensional meshes; see, e.g., [75,[218][219][220].

A unified analysis of aggregation and stabilization
In this section, we present a framework for the analysis of stable immersed finite element methods. We consider aggregated and ghost-penalty-stabilized techniques at the same time, to identify the common parts and the main differences. It is recalled from Section 4.1 that when Nitsche's method is applied without any stability-enhancing technique, the bilinear form is only coercive with an a priori unbounded Nitsche parameter β. In that case, most of the properties and estimates presented in this section do not apply, and error bounds cannot be provided. Remark 4.2 at the end of this section discusses which specific analyses still hold and which no longer hold without any form of stabilization.

Methods and formulations
We recall from (2.6) and (2.7) that the standard Nitsche finite element method takes the form with bilinear form a h (·, ·) according to We note that this is consistent with the strong formulation in (2.1), i.e., the exact solution u to the corresponding continuous problem satisfies provided that u is regular enough, for instance if u ∈ H 2 (Ω). To accommodate both approaches in the same analysis we consider the formulation where the finite dimensional function space is defined by (ghost-penalty stabilization) (4.22) and the bilinear form is defined by For the ghost-penalty method, the stabilization form is included in the definition of a Ω (·, ·) because the stabilization form s h (·, ·) is naturally paired with the bulk or volumetric term (∇·, ∇·) Ω to enable bounds of the form (4.7) and (4.9).

Properties
For the analysis, we start from the following intrinsic properties of aggregation methods and ghost-penalty formulations: • There is a constant (hidden in the binary operator ) such that where as before Ω h = ∪ T ∈T h T is the union of all the active elements, such that Ω h ⊃ Ω. See, respectively, (4.3) and (4.10) for these estimates in the context of aggregated and weakly stabilized methods. This property forms the basis of the effectiveness of weakly stabilized and aggregated methods. Essentially, it indicates that the solution on the extended domain Ω h , including the fictitious parts outside Ω of active elements, is controlled by a Ω (·, ·). Consequently, this precludes the very small eigenvalues on small cut elements discussed in Section 3.1, and it provides control of the normal gradients which enables bounding the Nitsche parameter β from above as discussed in Section 4.1 and 4.2. With aggregation this is achieved by directly precluding basis functions with a very small support in the physical domain as discussed in Section 4.3, while with the ghost penalty such functions are given a contribution in a Ω (·, ·) by s h (·, ·) as discussed in Section 4.4.
• There is an interpolation operator π h : H s (Ω) → V h, , s ≥ 2, such that we use a continuous extension operator E : v ∈ H s (Ω) → H s (R d ) to extend v outside of Ω. For ghost-penalty-stabilized methods, the interpolation operator is generally constructed by first extending the function v outside of Ω and then applying a standard interpolation operator, for instance the Clément operator [221]. For the aggregated finite element space, a specific construction of the extension is used to establish the interpolation estimate [192,222].
• The stabilization form s h (·, ·) satisfies This is a consistency assumption on the stabilization form s h (·, ·), formulated in such a way that the stabilization form only acts directly on functions in the finite element space, and acts on other functions v indirectly via their projection π h v. Note that the stabilization form s h (·, ·) may for instance involve higher-order derivatives, which are not generally well defined for the exact solution or the solution to a dual problem used in an error analysis. This complication is avoided by formulating the consistency condition (4.27) in terms of projections onto the finite element space.
To provide an example we verify this consistency assumption (4.27) for the face-based stabilization form (4.5) in the case of linear elements. Consider a face F ∈ F h shared by the two elements T 1 and T 2 . Then for any linear polynomial w ∈ P 1 (T 1 ∪ T 2 ) we have where we used a trace inverse estimate to pass from the face to the elements [97]. Using the Bramble-Hilbert lemma (see, e.g., [223]) we can choose w in such a way that where S(T i ) is the union of the set of elements in T h that share a node with T .
Element aggregation and ghost-penalty stabilization are conceptually different in the manner in which they approach the conditions (4.25)-(4.27). Aggregation methods aim to satisfy condition (4.25) by restricting the approximation to a suitable subspace V ag h ⊂ V h . On the other hand, condition (4.26) demands optimal approximation properties of the (sequence of) aggregated approximation spaces V ag h and, hence, the aggregated approximation spaces cannot be too small. Because the stabilization operator s h (·, ·) vanishes in aggregation methods, condition (4.27) is trivially satisfied. Ghost-penalty stabilization requires that the stabilization term s h (·, ·) is strong enough to comply with condition (4.25). On the other hand, condition (4.27) insists that the stabilization cannot be too strong, as this will affect the solution. Because ghost-penalty stabilization imposes no auxiliary conditions on the approximation spaces, condition (4.26) is trivially satisfied.

Coercivity and Continuity
We define the following energy norms Note that these norms are different from the β-norm that was applied in [1] and presented in (4.1) in Section 4.1. First, ||| · ||| h, employs h −1 , which is uniformly bounded, instead of β, which is not bounded in case no stabilization technique is applied. Second, in case of ghost-penalty stabilization, ||| · ||| h, also includes a ghost-penalty contribution. Consequently, whereas the βnorm admits a natural analysis of unstabilized forms, the |||v h ||| h, -norm naturally accompanies weakly stabilized and aggregated formulations.
and, if β is chosen large enough, a h, (·, ·) is coercive on V h,

34)
Proof. Under the notion that with aggregation or weak stabilization β h −1 , the continuity follows directly from the Cauchy-Schwarz inequality together with the observation that ∂ n v ∈ L 2 (∂Ω D ) for v ∈ V h, ⊕ H 2 (Ω) and therefore ∂ n v ∂Ω D is well defined. For the coercivity we use the element-wise trace inverse inequality [97] h ∂ n v 2 and (4.25) to conclude that where T cut h is the set of elements that intersect the boundary. Next, for any δ > 0, we have which leads to where C is the hidden constant between the first and final expressions in (4.36). Taking δ small enough to guarantee that δC 1/2 and β large enough such that β − (hδ) −1 > h −1 /2 on each element in T cut h , the coercivity follows.
we may now apply the Lax-Milgram lemma to show that there is a unique and stable solution u h ∈ V h, to (4.21).

Error Estimates
Conditions (4.25)-(4.27) ensure that the corresponding immersed finite-element approximation according to (4.18) has optimal approximation properties in the energy norm. Proof. We start by splitting the error in two parts using the interpolation operator We first consider the first term and show that To establish (4.41), we note that by virtue of a standard trace inequality (see, e.g., [223]) it holds that for v ∈ V h, + H 2 (Ω). Setting v = u − π h u and using (4.26), we obtain (4.41). Equation (4.43) applies the definition of ||| · ||| 2 h given in (4.30). Considering the second term in (4.40), we note that the definition of |||·||| h, and the coercivity property (4.34) imply In the case of the aggregation method, we have a h, (·, ·) = a h (·, ·) and ||| · ||| h, = ||| · ||| h and it immediately follows that where we used, consecutively, the consistency relation (4.20) to replace l h (w h ) by a h (u, w h ), continuity of a h according to (4.33), and the interpolation error estimate (4.41). In the case of ghost-penalty stabilization, it holds that where we used the interpolation error estimate (4.41) and the weak consistency condition (4.27) on s h (·, ·). From (4.44)-(4.46) it follows that the second term in (4.40) is also bounded in accordance with (4.39).

Condition number estimate
Under certain (non-restrictive) conditions, both aggregation and ghost-penalty stabilization engender stiffness matrices that exhibit similar condition numbers as conventional boundary-fitted finite element methods. To analyze the condition numbers corresponding to both methods, we consider the basis where y ∈ R N is the coefficient vector. It is to be noted that with ghost-penalty stabilization the basis V h, = V h is a standard nodal basis, while with aggregation V h, = V ag h is the basis that is modified at the boundary in such a way that the support of each basis function has a sufficiently large intersection with Ω, more precisely |supp and for the considered symmetric bilinear forms, the condition number of the stiffness matrix A corresponds to where λ max and λ min are the maximum and minimum eigenvalues of A, respectively. To estimate the condition number we start from the following basic bounds, which hold by construction for both aggregated and weakly stabilized methods: • There is a constant (hidden in the binary relation ) such that Both for weakly stabilized and aggregated methods, this bound follows from standard inverse inequalities; see (4.11) for the inverse estimate of the ghost-penalty form.
• There is a (hidden) constant such that This is a Poincaré inequality, requiring aggregation of the approximation space or ghost-penalty stabilization of the bilinear form to gain control on the approximate solution outside Ω.
• There are constants such that the following equivalence holds: For weakly stabilized methods we have V h, = V h and this is a standard estimate that follows from the quasi-uniformity of the mesh and the use of a nodal basis. For the aggregated finite element space V h, = V ag h , this estimate holds by virtue of the construction of the extension operator. Proof. To estimate the condition number according to (4.49), we use the following Rayleigh quotient characterization of the eigenvalues to bound the maximum (resp. minimum) eigenvalue from above (resp. below): We bound the Rayleigh quotient from above as follows where we used, consecutively, the definition (4.48) of the stiffness matrix, the equivalence (4.52), and continuity of a h, (·, ·) according to (4.33) and the inverse inequality (4.50). Conversely, to bound the quotient from below, we infer from the equivalence relation (4.52), coercivity condition (4.34), and the Poincaré inequality (4.51) that Combining the upper and lower bounds, we obtain the desired estimate Remark 4.2. Comparison to Nitsche's method without stabilization. We here briefly discuss which aspects of Section 4.5 do, and do not, apply when Nitsche's method is applied in an immersed setting without an additional stability-enhancing technique. First and foremost, the intrinsic property (4.25) will not be satisfied. Consequently, the normal gradients on the boundary are not sufficiently controlled by the gradients in the bulk, such that coercivity only holds with locally very large ( i.e., a priori unbounded) values for the Nitsche parameter β. Nevertheless, with the Nitsche parameter large enough, in the β-norm both (4.33) and (4.34) hold ( i.e., respectively continuity and coercivity), and only the continuous dependence on the data in the Lax-Milgram lemma can formally not be verified as the right-hand-side operator l h (·) is not bounded in the βnorm such that |||u h ||| β cannot be a priori bounded. The error estimate provided in (4.39) does not hold for unstabilized formulations in combination with the β-norm, as β ∼ h −1

TΩ
h −1 such that (4.43), and consequently (4.41), do not apply. Note that (4.41) is also employed in the proof of the bound on the second part of the right-hand-side of (4.40). As discussed in Section 4.1, this conclusion is also obtained in [1], which shows that a best approximation property in the βnorm can be demonstrated following Céa's lemma, but that the problem lies in the error estimates because a bound on the error of the best approximation in the β-norm cannot be provided. As already demonstrated in Section 3.1, condition number bounds cannot be provided and tailored preconditioning is required. Related to the analysis presented in this section, both (4.50) and (4.51) are contingent on ghost-penalty stabilization or aggregation. For (4.50) the bound cannot be provided as there is no a priori upper bound on β. Nevertheless, this lack of an upper bound to the eigenvalues is generally not the cause of the conditioning problems of these methods, and this type of large eigenvalues only occurs on highly irregularly shaped cut elements. The main problem with the conditioning is that without additional stabilization, (4.51) does not hold, and as demonstrated in Section 3.1 it is this lack of a lower bound to the eigenvalues that generally causes the ill-conditioning of unstabilized immersed methods.

Discussion and conclusion
In this review, we considered the problems resulting from small cut elements in immersed finite element methods, as well as the solutions that have been developed to counter these problems. This section assesses the current state of the field and presents a discussion about future research directions. Similarities and differences between the different techniques that are discussed in this manuscript are treated in the respective sections, and we do not provide a qualitative comparison here. This is because, first, the presented techniques pursue different goals related to the different problems on small cut elements, i.e., stability and conditioning. Second, the approaches are not mutually exclusive and combinations can be made. In Section 4 the combination of aggregation and ghost-penalty stabilization is already discussed, and similarly it is possible to combine it with, e.g., the minimal stabilization procedure, which fixes the stability but not the conditioning, with a preconditioning approach.
The field of immersed finite element methods has substantially grown and developed over the past decade. The employed formulations have matured, and are currently based on rigorous theoretical and mathematical foundations. Furthermore, efficient parallel implementations have been developed that enable the application to large problems into the billions of degrees of freedom. Certain aspects are still challenging, however, which particularly manifest in the application to real-world problems. These challenging aspects, and the corresponding unresolved questions, form the basis of future research directions that we believe are important for the further advancement of the field of immersed finite element methods. In this section we discuss these open research questions in a general sense, and indicate it if these are specifically related to one of the techniques that are discussed in this manuscript.
One aspect that requires further consideration to advance the field of immersed finite element methods, is the detection of small features in the geometry or the solution. The decoupling of the geometry description from the finite element mesh makes the consideration of adaptive discretizations natural to immersed formulations. In this regard, it is notable that grid refinements in the vicinity of domain boundaries are commonly considered in the hp-adaptive finite cell method [49,103,224,225]. Such refinements are not required along all boundaries, but mainly in regions where small solution features such as large gradients are expected. For efficiency purposes, we therefore consider the further advancement of error estimators and error-estimation-based adaptivity schemes tailored to immersed formulations as presented in [226] a desired development. One complication in regard of a posteriori error estimates is that the bilinear form is only coercive on discrete spaces, and that the dependence of the bilinear form on the discretization (both on the mesh size and on the order) complicates saturation assumptions. Besides the accuracy considerations related to enriched approximation spaces along small geometrical features, stabilized methods also require a certain level of geometry resolution, which makes local adaptive schemes well suited for the application of these techniques to problems on (scan-based) porous domains, as the refinements reduce the ratio of elements that are cut. Since element aggregation strongly ties degrees of freedom on badly cut elements to well posed degrees of freedom, local refinement procedures can be beneficial in cases where a large portion (or in certain regions even the majority) of the active mesh consists of badly cut elements. This is similar for ghost-penalty stabilization terms on domains with (relative to the mesh) thin structures, which can potentially make the domain artificially stiff in such cases. Note that in these cases the choice of the parameters η * (with aggregation; see Section 4.3) and τ (with ghost-penalty stabilization; see Section 4.4) is particularly critical, and that in many cases problems can be avoided by a adequate parameter choice. Nevertheless, the accurate approximation of problems with very thin features in an unfitted setting remains a difficult problem for any method, and deserves further attention. Schwarz preconditioners are not affected by the matter of mesh resolution, but it should be noted that this only pertains to the solving of the linear system. With a natural boundary condition on the unfitted boundary and a preconditioner to resolve the conditioning problems, a relatively low level of mesh resolvement can therefore lead to reasonable solutions. It should be mentioned that this can still result in artificial couplings as described in e.g., [47,163], however. With essential boundary conditions on the unfitted boundary, such as with flow problems in porous media, the application of a preconditioner without an additional stabilization technique will lead to inaccurate approximations, as described in Section 4.
While this manuscript focuses on single-field problems, mixed formulations, in particular flow problems, are an important application area of immersed finite element methods. The techniques presented in this manuscript are all applicable to such problems, and can retain or precondition inf-sup stable formulations (see, e.g., [198] for aggregation, [21,52] for ghost penalty, and [61] for preconditioning). An unresolved topic is the application of (pointwise) divergenceconforming or geometry-preserving discretizations for incompressible problems. Compatible pairs of approximation spaces that result in pointwise divergence-free approximations for boundaryfitted discretizations, generally lose this property in an unfitted setting. Furthermore, stabilization techniques complicate the formulation of pointwise divergence-free methods. Ghost-penalty terms interfere with the enforcement of incompressibility in the bilinear form, and it has not been thoroughly investigated how the existing discrete aggregation and extension operators affect the compatibility of structure-preserving pairs of approximation spaces. Besides these comments about H(div)-conforming discretizations for the conservation of geometry, similar considerations can be made about H(curl)-conforming discretizations when solving the Maxwell equations in electromagnetism.
An aspect that relates to both of the previous topics of mesh adaptivity and flow problems is the consideration of anisotropic grid refinements. To accurately capture boundary layers at a reasonable computational cost, boundary layer meshes are commonly anisotropic with a much finer resolution normal to the boundary. While the aspect of boundary layers in immersed methods has already been considered in multiple studies (see, e.g., [24,227]), present understanding of this aspect is incomplete, and this is a topic that requires further investigation.
While only briefly considered in Section 2.2.3, a topic that is strongly related to small cut elements is explicit time integration for problems in dynamics. This is particularly interesting for crash simulations, as such problems are commonly solved by explicit time steppers [150]. Although tools exist that result in stable time steps that are not affected by the cut elements (i.e., element aggregation, ghost penalty terms in the mass matrix, or lumping of the mass matrix [145,149,150]), the research on this topic is not exhaustive. In particular, a comprehensive analysis of stable time-step sizes with different choices for the mass and stiffness forms (i.e., boundary condition enforcement technique, ghost-penalty stabilization of both forms, lumping of the mass matrix, etc.) does, to the authors' knowledge, not yet exist.
Another open question deals with the choice, or the optimization, of parameters in the presented approaches. This does not only concern, e.g., the parameters η * and τ that are explicitly contained in element aggregation and the ghost penalty, but also hidden parameters (or choices) in the setup such as the choice of the index blocks with Schwarz preconditioning. While these parameters generally do not affect the asymptotic behavior, well-chosen (or, conversely, poorly-chosen) parameters can significantly affect the performance pre-asymptotically. We therefore consider an optimal choice of (hidden) parameters an important topic for future research. This specifically applies to the dependence of this optimality on the basis-function type and the discretization order and, in particular for discretizations with local refinements or local order elevations, to the option of choosing parameters locally or element-wise (similar to the Nitsche parameter without stabilization).
Over the past years, many well-developed, well-documented, and user-friendly open source codes have become available that aid the application of immersed finite element methods, a comprehensive list of which is beyond the scope of this work. These toolboxes support all facets of immersed finite element computations, such as pre-processing steps (in particular integration and the application of boundary conditions), the solution process itself, and also post-processing, where often a subtriangulation is employed like this is also done for higher-order boundary-fitted finite elements. Something that is not yet available is a graphical user interface that makes immersed finite element methods accessible also to non-expert users. Besides the developments in open source codes, also commercial software has adopted (aspects of) immersed methods. To provide some examples: in Ansys Fluent the Immersed Boundary Method (IBM) is available [63,228]; Ansys LS-DYNA facilitates the application of trimmed isogeometric analysis in both shell and solid models for static and dynamic analyses [229,230]; in Abaqus it is possible to (partially) include and exclude elements from a computation with element progressive activation [231,232], which is mainly intended for the simulation of additive manufacturing; and the platform Hyperganic [233] employs immersed finite element techniques to perform simulations. The implementation of the stabilization methods discussed in this work can require tools that are not generally available in existing software and codes. For instance, face-based ghost-penalty stabilization requires the computation of jumps in normal gradients up to the order of approximation. For quadratic and higher-order discretizations, this functionality is generally not available, even for software frameworks that include the possibility to perform face loops and compute jumps for standard Discontinuous Galerkin (DG) formulations. The implementation of patch-based ghost-penalty stabilization or aggregation require the implementation of an aggregation routine and the computation of aggregate-wise terms. These terms can be implemented using standard element-wise terms if the finite element software provides functionality to enforce linear constraints. Many codes already provide this functionality as this is essential for local grid refinements and periodic boundary conditions (e.g., deal.ii [234], fenics [235], gridap [236], fempar [200], netgen [237], or dune [238]). While (geometric) multigrid routines are intrusive to the finite element software, the implementation of tailored Schwarz preconditioners is exceptionally non-intrusive and only requires the matrix itself and the basis function connectivity. Furthermore, as additive Schwarz preconditioning is a well-established concept also in boundary-fitted finite element methods, these preconditioners are available in many codes. The construction of the index blocks is not generally the same in existing software, however, and it should be noted that the effectivity for immersed methods relies on the specific construction of these blocks. Additionally, advanced procedures in which e.g., index blocks are only devised for basis functions with support on cut elements, require the availability of this information or a routine to extract it.
By virtue of the stabilization, aggregation, and preconditioning techniques as discussed in this review, immersed finite element methods have developed into reliable and versatile computational analysis instruments. In our assessment, the potential of immersed methods and, in particular, their ability to provide new simulation workflows with minimal user intervention, has not been fully exploited yet. Another recommendation therefore pertains to the exploration of immersed methods in new application fields, such as hyperbolic systems, diffuse interface models, poroelasticity, computational aeroacoustics, and phase field fracture.