Cluster-Galaxy Weak Lensing

Weak gravitational lensing of background galaxies provides a direct probe of the projected matter distribution in and around galaxy clusters. Here we present a self-contained pedagogical review of cluster--galaxy weak lensing, covering a range of topics relevant to its cosmological and astrophysical applications. We begin by reviewing the theoretical foundations of gravitational lensing from the first principles, with special attention to the basics and advanced techniques of weak gravitational lensing. We summarize and discuss key findings from recent cluster--galaxy weak-lensing studies on both observational and theoretical grounds, with a focus on cluster mass profiles, the concentration--mass relation, the splashback radius, and implications from extensive mass calibration efforts for cluster cosmology.


Introduction
The propagation of light rays from a distant source to the observer is governed by the gravitational field of local inhomogeneities, as well as by the global geometry of the universe . Hence the images of background sources carry the imprint of gravitational lensing by intervening cosmic structures (see Pyne and Birkinshaw 1993 for the effects of lens motion that can also induce the frequency shift). Observations of gravitational lensing phenomena can thus be used to study the mass distribution of cosmic objects dominated by dark matter and to test models of cosmic structure formation (Blandford and Narayan 1992).
Galaxy clusters represent the largest class of self-gravitating systems formed in the universe, with typical masses of M ∼ 10 14−15 h −1 M . In the context of the standard structure formation scenario, cluster halos correspond to rare massive local peaks in the primordial density perturbations (e.g., Tinker et al. 2010). Galaxy clusters act as powerful cosmic lenses, producing a variety of detectable lensing effects from strong to weak lensing (Kneib and Natarajan 2011), including deflection, shearing, and magnifying of the images of background sources (e.g., Umetsu et al. 2016). The critical advantage of cluster gravitational lensing is its ability to study the mass distribution of individual and ensemble systems independent of assumptions about their physical and dynamical state (e.g., Clowe et al. 2006).
Weak gravitational lensing is responsible for the weak shape distortion and magnification of the images of background sources due to the gravitational field of intervening massive objects and large scale structure Schneider 2005;Umetsu 2010;Hoekstra 2013;Mandelbaum 2018). Weak shear lensing by galaxy clusters gives rise to levels of up to a few 10 percent of elliptical distortions in images of background sources. Thus, the weak shear lensing signal, as measured from small but coherent image distortions in galaxy shapes, can provide a direct measure of the projected mass distribution of galaxy clusters (e.g., Kaiser and Squires 1993;Fahlman et al. 1994;Okabe and Umetsu 2008). On the other hand, lensing magnification can influence the observed surface number density of background galaxies seen behind clusters, by enhancing their apparent fluxes and expanding the area of sky (e.g., Broadhurst et al. 1995Broadhurst et al. , 2005bTaylor et al. 1998;Umetsu et al. 2011b;Chiu et al. 2020). The former effect increases the source counts above the limiting flux, whereas the latter reduces the effective observing area in the source plane, thus decreasing the observed number of sources per unit solid angle. The net effect, known as magnification bias, depends on the intrinsic faint-end slope of the source luminosity function.
In this paper we present a self-contained pedagogical review of weak gravitational lensing of background galaxies by galaxy clusters (cluster-galaxy weak lensing), highlighting recent advances in our theoretical and observational understanding of the mass distribution in galaxy clusters. We shall begin by reviewing the theoretical foundations of gravitational lensing (Section 2), with special attention to the basics and advanced techniques of clustergalaxy weak lensing (Sections 3, 4, and 5). Then, we highlight and discuss key findings from recent cluster-galaxy weak-lensing studies (Sections 6), with a focus on cluster mass distributions (Section 6.1), the concentration-mass relation (Section 6.2), the splashback radius (Section 6.3), and implications from extensive mass calibration efforts for cluster cosmology (Section 6.4). Finally, conclusions are given in Section 7.
There have been a number of reviews of relevant subjects (e.g., Blandford and Narayan 1992;Narayan and Bartelmann 1996;Mellier 1999;Hattori et al. 1999a;Umetsu et al. 1999;Van Waerbeke and Mellier 2003;Schneider 2005;Kneib and Natarajan 2011;Hoekstra 2013;Futamase 2015;Mandelbaum 2018). For general treatments of gravitational lensing, we refer the reader to . For a general review of weak gravitational lensing, see  and Schneider (2005). For a comprehensive review of cluster lensing, see Kneib and Natarajan (2011). For a pedagogical review on strong lensing in galaxy clusters, see Hattori et al. (1999a).
Throughout this paper, we denote the present-day density parameters of matter, radiation, and Λ (a cosmological constant) in critical units as Ω m , Ω r , and Ω Λ , respectively (see, e.g., Komatsu et al. 2009). Unless otherwise noted, we assume a concordance Λ cold dark matter (ΛCDM) cosmology with Ω m = 0.3, Ω Λ = 0.7, and a Hubble constant of H 0 = 100h km s −1 Mpc −1 with h = 0.7. We denote the mean matter density of the universe at a particular redshift z as ρ(z) and the critical density as ρ c (z). The presentday value of the critical density is ρ c,0 = 3H 2 0 /(8πG) ≈ 1.88 × 10 −29 h 2 g cm −3 ≈ 2.78 × 10 11 h 2 M Mpc −3 , with G the gravitational constant. We use the standard notation M ∆ c or M ∆ m to denote the mass enclosed within a sphere of radius r ∆ c or r ∆ m , within which the mean overdensity equals ∆ c × ρ c (z) or ∆ m × ρ(z) at a particular redshift z. That is, M ∆ c = (4π/3)∆ c ρ c (z)r 3 ∆ c and M ∆ m = (4π/3)∆ m ρ(z)r 3 ∆ m . We generally denote three-dimensional radial distances as r and reserve the symbol R for projected radial distances. Unless otherwise noted, we use projected densities (e.g., Σ(R)) and distances (e.g., R) both defined in physical (not comoving) units. All quoted errors are 1σ confidence levels (CL) unless otherwise stated.

Theory of Gravitational Lensing
The local universe appears to be highly inhomogeneous on a wide range of scales from stars, galaxies, through galaxy groups and clusters, to forming superclusters, large-scale filaments, and cosmic voids. The propagation of light from a far-background source is thus influenced by the gravitational field caused by such local inhomogeneities along the line of sight. In general, a complete description of the light propagation in an arbitrary curved spacetime is a complex theoretical problem. However, a much simpler description is possible under a wide range of astrophysically relevant circumstances, which is referred to as the gravitational lensing theory (e.g., ; Kneib and Natarajan 2011). This section reviews the basics of gravitational lensing theory to provide a basis and framework for cluster lensing studies, with an emphasis on weak gravitational lensing.

Bending of Light in an Asymptotically Flat Spacetime
To begin with, let us consider the bending of light in a weak-field regime of an asymptotically flat spacetime in the framework of general relativity. Specifically, we assume an isolated and stationary mass distribution . Then, the metric tensor g µν (µ, ν = 0, 1, 2, 3) of the perturbed spacetime can be written as where (x µ ) = (ct, x 1 , x 2 , x 3 ) are the four spacetime coordinates, Φ N is the Newtonian gravitational potential in a weak-field regime |Φ N /c 2 | 1, and c is the speed of light in vacuum. We consider the metric given by Equation (1) to be the sum of a background metric g (b) µν and a small perturbation denoted by h µν , that is, g µν = g (b) µν + h µν with |h µν | 1.
The null geodesic condition leads to δk 0 (λ) = −2Φ N (λ)/c 2 + O(h 2 ), or cdt/dλ = 1 − 2Φ N (λ)/c 2 + O(h 2 ) > 1. The gravitational time delay ∆t grav , with respect to the unperturbed light propagation, is thus given by Note that there is an additional time delay due to a change in the geometrical path length caused by gravitational deflection (see Section 2.6.1).

Lens Equation
Let us consider the situation illustrated in Figure 1. A light ray propagates from a far-distant source (S) at the position η in the source plane to an observer (O), passing the position ξ in the lens plane, in which the light is deflected by a bending angleα. Here the source and lens planes are defined as planes perpendicular to the optical axis at the distance of the source and the lens, respectively. The exact definition of the optical axis does not matter, because the angular scales involved are very small. The angle between the optical axis and the unlensed source (S) position is β, and the angle between the optical axis and the image (I) is θ. The The angular position of the source (S) relative to the optical axis is denoted by β, and that of the image (I) relative to the optical axis is denoted by θ. The figure is adapted from Umetsu (2010). angular diameter distances between the observer and the lens, the observer and the source, and the lens and the source, are denoted by D l , D s , and D ls , respectively. As illustrated in Figure 1, we have the following geometrical relation: η = (D s /D l )ξ + D lsα (ξ). Equivalently, this is translated into the relation between the angular source and image positions, β = η/D s and θ = ξ/D l , as where we defined the reduced bending angle, or the deflection field (Broadhurst et al. 2005a), α(θ) = (D ls /D s )α in the last equality. Equation (8) is referred to as the lens equation, or the ray-tracing equation.
In general, the lens equation is nonlinear with respect to the image position θ, so that it may have multiple solutions θ for a given source position β. This corresponds to multiple imaging of a background source (see Hattori et al. 1999a;Kneib and Natarajan 2011). An illustration of the typical circularly symmetric lens system is shown in Figure 2. We refer to Keeton (2001) for a review of various families of mass models for gravitational lensing.

Cosmological Lens Equation
Here we turn to the cosmological lens equation that describes the light propagation in a locally inhomogeneous, expanding universe. There are various approaches to derive a cosmological version of the lens equation (e.g., Schneider 1985;Sasaki 1993;Seitz et al. 1994;  image parities (lower panel) for a typical axis-symmetric lens system. Three images (I 1 , I 2 , I 3 ) are produced for a source (S) at the location β = βs with a radial width δβ. The images are formed at the intersections of β = β(θ) and the horizontal line β = βs. Critical curves are also shown by two solid concentric circles. The inner circle represents the radial critical curve where dβ(θ)/dθ = 0, while the outer circle represents the tangential critical curve where β(θ) = 0. The image parities are illustrated in the lower panel. The images I 1 and I 2 have the same total parity as the source S, while I 3 has the opposite total parity to S. Futamase 1995;Dodelson 2003;Sereno 2009). We follow the approach by Futamase (1995) based on perturbed null geodesic equations as introduced in Section 2.1.
Consider a perturbed Friedman-Lemaítre-Robertson-Walker (FLRW) metric in the Newtonian gauge of the form (e.g., Kodama and Sasaki 1984): ds 2 = −(1 + 2Ψ/c 2 )c 2 dt 2 + (1 − 2Ψ/c 2 )a 2 (t) dχ 2 + r 2 (dϑ 2 + sin 2 ϑdϕ 2 ) , (9) where a(t) is the scale factor of the universe (normalized to unity at present), χ is the comoving distance, ϑ and ϕ are the spherical polar and azimuthal angles, respectively, Ψ is a scalar metric perturbation, K is the spatial curvature of the universe, and r = f K (χ) is the comoving angular diameter distance, The spatial curvature K is expressed with the total density parameter at the present epoch, Ω 0 ≡ X Ω X = Ω m + Ω r + Ω Λ , as K = (Ω 0 − 1)H 2 0 /c 2 . The evolution of a(t) is determined by the Friedmann equation, H(a) ≡ (da/dt)/a = H 0 [Ω r a −4 + Ω m a −3 + Ω Λ + (1 − Ω 0 )a −2 ] 1/2 . In the line element (9), we have neglected all terms of higher than O(Ψ/c 2 ), the contributions from vector and tensor perturbations, and the effects due to anisotropic stress. As we will discuss in Section 2.5.1, Ψ is interpreted as the Newtonian gravitational potential generated by local inhomogeneities of the matter distribution in the universe.
Since the structure of a light cone is invariant under the conformal transformation, we work with the conformally related spacetime metric given by ds 2 = a −2 ds 2 ≡g µν dx µ dx ν with (x µ ) = (η, χ, ϑ, ϕ), where η = c t dt /a(t ) is the conformal time. The metricg µν can be rewritten in the form ofg µν =g (b) µν + h µν , as a sum of the background metric and a small perturbation (|h| 1). We follow the prescription given in Section 2.1 to solve the null geodesic equations in the perturbed spacetime (Equation (9)). To this end, we consider past-directed null geodesics from the observer. Choosing the spherical coordinate system centered on the observer, we have k (b)µ = (−1, 1, 0, 0) in the background metric with Ψ = 0. The unperturbed path is parameterized by the affine parameter λ along the photon path as x (b)µ (λ) = (−λ, λ − λ o , ϑ I , ϕ I ), where λ o is the affine parameter at the observer and (ϑ I , ϕ I ) denote the angular direction of the image position on the sky. The comoving angular distance r in the background spacetime can be parameterized by λ as r(λ) ≡ f K [χ(λ − λ o )] (see Equation (10)).

Flat-sky Approximation
Now we consider a small patch of the sky around a given line of sight (ϑ = 0), across which the curvature of the sky is negligible (ϑ 1). Then, we can locally define a flat plane perpendicular to the line of sight. By noting that δθ ≡ (δϑ, ϑδϕ) is an angular displacement vector within this sky plane, we can express Equation (12) as where β(χ s ) is the (unlensed) angular position of the source, θ is the apparent angular position of the source image, and α(χ s ) is the deflection field given by (Futamase 1995): where is the transverse comoving gradient and the integral is performed along the perturbed trajectory (13) can be applied to a range of lensing phenomena, including multiple deflections of light from a background source (Section 2.5), strong and weak gravitational lensing by individual galaxies and clusters (Section 2.6), and cosmological weak lensing by the intervening large-scale structure (a.k.a., the cosmic shear). Note that the cosmological lens equation is obtained using the standard angular diameter distance in a background FLRW spacetime without employing the thin lens approximation (see Section 2.6).

Multiple Lens Equation
We consider a discretized version of the cosmological lens equation (Equation (13)) by dividing the radial integral between the source (χ = χ s ) and the observer (χ = 0) into N comoving boxes (N − 1 lens planes) separated by a constant comoving distance of ∆χ. The angular position θ (n) of a light ray in the nth plane (1 n N ) is then given by (e.g., Schneider 2019): where θ (0) is the apparent angular position of the source image andα (m) is the bending angle at the mth lens plane (m = 1, 2, ..., n − 1), The 2 × 2 Jacobian matrix of Equation (15) is expressed as (e.g., Jain et al. 2000): 3 where I denotes the identity matrix, , D n is the angular diameter distance between the observer and the nth lens plane, and D mn is the angular diameter distance between the mth and nth lens planes (m < n). In general, the Jacobian matrix A (n) can be decomposed into the following form: where κ is the lensing convergence, (γ 1 , γ 2 ) are the two components of the gravitational shear (see Section 2.6.2 for the definitions and further details of the convergence and shear), ω is the net rotation (e.g., Cooray and Hu 2002), and σ a (a = 1, 2, 3) are the Pauli matrices that satisfy σ a σ b = i abc σ c , with abc the Levi-Civita symbol. The Born approximation A (m) = I on the right-hand side of Equation (17) leads to a symmetric Jacobian matrix with ω = 0. The multiple lens equation has been widely used to study gravitational lensing phenomena by ray-tracing through N -body simulations (e.g., Schneider and Weiss 1988;Hamana et al. 2000;Jain et al. 2000).

Cosmological Poisson Equation
We assume here a spatially flat geometry with K = 0 motivated by cosmological observations based on CMB and complementary data sets (e.g., Hinshaw et al. 2013;Planck Collaboration et al. 2015b). The cosmological Poisson equation relates the scalar metric perturbation Ψ (see Equation (9)) to the matter density perturbation δρ on subhorizon scales as where δ = δρ/ρ is the density contrast with respect to the background matter density ρ of the universe, ρ = a −3 (3H 2 0 Ω m )/(8πG), and ∇ is the three-dimensional gradient operator in comoving coordinates. A key implication of Equation (19) is that the amplitude of Ψ is related to the amplitude of δ as |Ψ/c 2 | ∼ (3Ω m /2)(l/L H ) 2 (δ/a) where l and L H = c/H 0 denote the characteristic comoving scale of density perturbations and the Hubble radius, respectively. Therefore, assuming the standard matter power spectrum of density fluctuations (e.g., Smith et al. 2003), we can safely conclude that the degree of metric perturbation is always much smaller than unity, i.e., |Ψ/c 2 | 1, even for highly nonlinear perturbations with |δ| 1 on small scales of l L H (∼ 3h −1 Gpc).

Thin Lens Approximation
Let us turn to the case of gravitational lensing caused by a single cluster-scale halo. Galaxy clusters can produce deep gravitational potential wells, acting as powerful gravitational lenses. In cluster gravitational lensing it is often assumed that the total deflection angle, α(θ), is dominated by the cluster of interest and its surrounding large-scale environment, which becomes important beyond the cluster virial radius, r vir (Cooray and Sheth 2002;Oguri and Hamana 2011;Diemer and Kravtsov 2014).
Assuming that the light propagation is approximated by a single lens event due to the cluster and that a light deflection occurs within a sufficiently small region (χ l − ∆χ/2, χ l + ∆χ/2) compared to the relevant angular diameter distances, we can write the deflection field by a single cluster as where D s = a(χ s )r(χ s ) and D ls = a(χ s )r(χ s − χ l ) are the angular diameter distances from the observer to the source and from the deflector to the source, respectively, and r(χ l )θ is the comoving transverse vector on the lens plane. In a cosmological situation, the angular diameter distances D mn between the planes m and n (z m < z n ) are of the order of the Hubble radius, L H ≡ c/H 0 ∼ 3h −1 Gpc, while physical extents of clusters are about 2r 200m ∼ (2 − 4)h −1 Mpc in comoving units. Therefore, one can safely adopt the thin-lens approximation in cluster gravitational lensing. We then introduce the effective lensing potential ψ(θ) defined as where D l is the angular diameter distance between the observer and the lens, D l = a(χ l )r(χ l ).
In terms of ψ(θ), the deflection field α(θ) is expressed as where With the Fermat or time-delay potential defined by the lens equation can be equivalently written as ∇ θ τ (θ; β) = 0 (Blandford and Narayan 1986). Here the first term on the right hand side of Equation (23) is responsible for the geometric delay and the second term for the gravitational time delay. The Fermat potential τ (θ; β) is related to the time delay ∆t with respect to the unperturbed path in the observer frame by ∆t(θ; β) (Refsdal 1964). According to Fermat's principle, the images for a given source position β are formed at the stationary points of τ (θ; β) with respect to variations of θ (Blandford and Narayan 1986).
Note that cluster gravitational lensing is also affected by uncorrelated large-scale structure projected along the line of sight (e.g., Schneider et al. 1998;Hoekstra 2003;Umetsu et al. 2011a;Host 2012). The intervening large-scale structure in the universe perturbs the propagation of light from distant background galaxies, producing small but continuous transverse excursions along the light path. For a given depth of observations, the impact of such cosmic noise is most important in the cluster outskirts where the cluster lensing signal is small (Hoekstra 2003;Becker and Kravtsov 2011;Gruen et al. 2015).

Convergence and Shear
Let us work with local Cartesian coordinates θ = (θ 1 , θ 2 ) centered on a certain reference point in the image plane. The local properties of the lens mapping are described by the Jacobian matrix defined as where we have introduced the notation, ψ ,ij = ∂ 2 ψ/∂θ i ∂θ j (i, j = 1, 2). Alternatively, we can write the Jacobian matrix as A ij = δ ij − ψ ,ij (i, j = 1, 2) with δ ij the Kronecker delta in two dimensions. This symmetric 2 × 2 Jacobian matrix A can be decomposed as where σ a (a = 1, 2, 3) are the Pauli matrices (Section 2.5); κ(θ) is the lensing convergence responsible for the change in the trace part of the Jacobian matrix (tr(A) = 2(1 − κ)), with = ∇ 2 θ ; and (γ 1 , γ 2 ) are the two components of the complex shear γ(θ) := γ 1 (θ) + iγ 2 (θ), Note that Equation (26) can be regarded as a two-dimensional Poisson equation, ψ(θ) = 2κ(θ). Then, the Green function in the (hypothetical) infinite domain is −1 (θ, θ ) = ln |θ − θ |/(2π), 4 so that the convergence is related to the lensing potential as The Jacobian matrix is expressed in terms of κ and γ as The determinant of the Jacobian matrix (29) is given as detA = (1 − κ) 2 − |γ| 2 . In the weak-lensing limit where |κ|, |γ| 1, detA 1 − 2κ to the first order. The deformation of the image of an infinitesimal circular source (dβ → 0) behind the lens can be described by the inverse Jacobian matrix A −1 of the lens equation. In the weaklensing limit (|κ|, |γ| 1), we have where Γ ij is the symmetric trace-free shear matrix defined by Crittenden et al. 2002): 4 Here we assume that the field size is sufficiently larger than the characteristic angular scale of the lensing clusters but small enough for the flat-sky assumption to be valid. with ∂ i := ∂/∂θ i (i = 1, 2). The shear matrix can be expressed in terms of the Pauli matrices as Γ = σ 3 γ 1 + σ 1 γ 2 . The first term in Equation (30) describes the isotropic light focusing or area distortion in the weak-lensing limit, while the second term induces an asymmetry in lens mapping. The shear γ is responsible for image distortion and can be directly observed from image ellipticities of background galaxies in the regime where |κ|, |γ| 1 (see Section 3). Note that both κ and γ contribute to the isotropic and anisotropic distortions in the non-weak lensing regime.

Convergence + Shear
Convergence alone

Grabitational Lensing
Gravitational Lensing Fig. 3 Illustration of the effects of the convergence κ and the shear γ on the angular shape and size of a hypothetical circular source. The convergence acting alone causes an isotropic focusing (magnification) of the image (dashed circle), while the shear deforms it to an ellipse.
In Figure 3, we illustrate the effects of the lensing convergence κ and the gravitational shear γ on the angular shape and size of an infinitesimal circular source. The convergence acting alone causes an isotropic magnification of the image, while the shear deforms it to an ellipse. Note that the magnitude of ellipticity induced by gravitational shear in the weaklensing regime (|γ| < ∼ 0.1) is much smaller than illustrated here.

Magnification
Gravitational lensing describes the deflection of light by gravity. Lensing conserves the surface brightness of a background source, a consequence of Liouville's theorem. On the other hand, lensing causes an isotropic focusing of light rays, resulting in an amplification of the image flux through the local solid-angle distortion. Lensing magnification µ is thus given by taking the ratio between the lensed to the unlensed image solid angle as µ = δΩ I /δΩ S = 1/detA, with In the weak-lensing limit (|κ|, |γ| 1), the magnification factor to the first order is The magnitude change at κ(θ) = 0.1 is thus ∆m = −(5/2) log 10 (µ) ∼ −0.2.

Strong-and Weak-lensing Regimes
The A(θ) matrix has two local eigenvalues Λ ± (θ) at each image position θ, with Λ + Λ − . Fig. 4 Shape distortion field produced by a simulated lens with a bimodal mass distribution. At each grid point in the image plane (left panel), we have drawn the apparent shape for an intrinsically circular source using the local deformation factors Λ ± (Equation (34)). All ellipses have an equal area regardless of the magnification factor. The right panel shows the κ map of the bimodal lens. In both panels, the solid lines indicate the critical curves. The image distortion disappears locally along the curve κ = 1 indicated by the dashed line, which lies in the negative-parity region.
Images with detA(θ) > 0 have the same parity as the source, while those with detA(θ) < 0 have the opposite parity to the source. A set of closed curves defined by detA(θ) = 0 in the image plane are referred to as critical curves, on which lensing magnification formally diverges, and those mapped into the source plane are referred to as caustics (see Hattori et al. 1999a). The critical curves separate the image plane into even-and odd-parity regions with detA > 0 and detA < 0, respectively.
A lens system that has a region with κ(θ) > 1 can produce multiple images for certain source positions β, and such a system is referred to as being supercritical. Note that being supercritical is a sufficient but not a necessary condition for a general lens to produce multiple images, because the shear can also contribute to multiple imaging. Nevertheless, this provides us with a simple criterion to broadly distinguish the regimes of multiple and single imaging. Keeping this in mind, we refer to the region where κ(θ) > ∼ 1 as the strong-lensing regime and the region where κ(θ) 1 as the weak-lensing regime.

Critical Surface Mass Density
The lensing convergence κ is essentially a distance weighted mass overdensity projected along the line-of-sight. We express κ(θ) due to cluster gravitational lensing as where χ s is the comoving distance to the source plane; Σ = χ s 0 (ρ − ρ) adχ is the surface mass density field of the lens projected on the sky; and Σ cr is the critical surface mass density of gravitational lensing, 5 for z s > z l and Σ −1 cr (z l , z s ) = 0 (i.e., D ls /D s = 0) for an unlensed source with z s ≤ z l . In the second (approximate) equality of Equation (35), we have explicitly used the thin-lens approximation (Section 2.6.1). The critical surface mass density Σ cr depends on the geometric configuration (z l , z s ) of the lens-source system and the background cosmological parameters, such as (Ω m , Ω Λ , H 0 ). For example, for z l = 0.3 and z s = 1 in our fiducial cosmology, we have Σ cr ≈ 4.0 × 10 15 hM Mpc −2 . For a fixed lens redshift z l , the geometric efficiency of gravitational lensing is determined by the distance ratio D ls /D s as a function of z s and the background cosmology.
In order to translate the observed lensing signal into surface mass densities, one needs an estimate of Σ cr (z l , z s ) for a given background cosmology. In the regime where z l z s (say, z l < ∼ 0.2 for background galaxy populations at z s ∼ 1), Σ cr depends weakly on the source redshift z s , so that a precise knowledge of the source redshift distribution is less critical (e.g., Okabe and Umetsu 2008;Okabe et al. 2010).
Conversely, this distance dependence of the lensing effects can be used to constrain the cosmological redshift-distance relation by examining the geometric scaling of the lensing signal as a function of the background redshift (Taylor et al. 2007(Taylor et al. , 2012Medezinski et al. 2011;Dell'Antonio et al. 2019). Figure 5 compares D ls /D s as a function of z s for various sets of the lens redshift and the cosmological model. Note that, in the limit where the lensing matter is continuously distributed along the line of sight, the first equality of Equation (35) can be formally rewritten as with g(χ, χ s ) = r(χ)r(χ s − χ)/r(χ s ) and δ = δρ/ρ. Equation (37) coincides with the expression for the cosmic convergence due to intervening cosmic structures (see Jain et al. 5 In the weak-lensing literature, projected densities and distances are often defined to be in comoving units. For example, the critical surface mass density for lensing in comoving units, Σ cr (z l , zs), is related to that in physical units, Σcr(z l , zs), as Σ (c) cr = Σcr(1 + z l ) −2 . Similarly, the comoving projected separation R (c) is related to that in physical units, R, as R (c) = (1 + z l )R.

Fig. 5
Distance ratio D ls /Ds as a function of the source redshift zs for various sets of the lens redshift z l and the cosmological parameters (Ωm, Ω Λ ). The ratio D ls /Ds is shown for three different lens redshifts, z l = 0.2, 0.5, and 0.8 (from left to right) and for three sets of the cosmological parameters, (Ωm, Ω Λ ) = (1, 0), (0.3, 0), and (0.3, 0.7).

2000)
. It is interesting to compare the above line-of-sight integral (Equation (37)) to the thermal Sunyaev-Zel'dovich effect (SZE) in terms of the Compton-y parameter (e.g., Sunyaev and Zeldovich 1972;Rephaeli 1995;Birkinshaw 1999), where σ T , m e , k B are the Thomson scattering cross-section, the electron mass, and the Boltzmann constant, respectively; T CMB = T 0 (1 + z) is the temperature of CMB photons with T 0 = 2.725 K; and T e and n e are the electron temperature and number density of the intracluster gas, with P e = n e k B T e the electron pressure. In the second (approximate) equality, we have used T e T 0 (1 + z). The Compton-y parameter is proportional to the electron pressure integrated along the line of sight, thus probing the thermal energy content of thermalized hot plasmas residing in the gravitational potential wells of galaxy clusters. The combination of the thermal SZE and weak lensing thus provides unique astrophysical and cosmological probes (e.g., Doré et al. 2001;Umetsu et al. 2009;Osato et al. 2020).

Einstein Radius
Detailed strong-lens modeling using many sets of multiple images with measured spectroscopic redshifts allows us to determine the location of the critical curves (e.g., Zitrin et al. 2015;Meneghetti et al. 2017), which in turn provides accurate estimates of the projected total mass enclosed by them. In this context, the term Einstein radius is often used to refer to the size of the outer (tangential) critical curve (i.e., Λ − (θ) = 0; Section 2.6.4). We note, however, that there are several possible definitions of the Einstein radius used in the literature (see Meneghetti et al. 2013). Here we adopt the effective Einstein radius definition (Redlich et al. 2012;Meneghetti et al. 2013Meneghetti et al. , 2017Zitrin et al. 2015), ϑ Ein = A c /π, where A c is the (angular) area enclosed by the outer critical curve. For an axisymmetric lens, the average surface mass density within the critical area is equal to Σ cr (see Hattori et al. 1999b;Meneghetti et al. 2013), thus enabling us to directly estimate the enclosed projected mass by M 2D (< ϑ Ein ) = π(D l ϑ Ein ) 2 Σ cr . Even for general non-axisymmetric lenses, the projected enclosed mass profile M 2D (< ϑ) = Σ cr D 2 l ϑ ≤ϑ κ(θ ) d 2 θ at the location ϑ ∼ ϑ Ein is less sensitive to modeling assumptions and approaches (e.g., Umetsu et al. 2012Umetsu et al. , 2016Meneghetti et al. 2017), thus serving as a fundamental observable quantity in the strong-lensing regime (Coe et al. 2010).

Basics of Cluster Weak Lensing
In this section, we review the basics of cluster-galaxy weak lensing based on the thin-lens formalism (Section 2.6). Unless otherwise noted, we will focus on subcritical lensing (i.e., outside the critical curves). We consider both linear (κ 1) and mildly nonlinear regimes of weak gravitational lensing.

Spin Operator and Lensing Fields
For mathematical convenience, we introduce a concept of "spin" for weak-lensing quantities as follows Okura et al. 2007Okura et al. , 2008Schneider and Er 2008;Bacon and Schäfer 2009): a quantity is said to have spin N if it has the same value after rotation by 2π/N . The product of spin-A and spin-B quantities has spin (A + B) and the product of spin-A and spin-B * quantities has spin (A − B), where * denotes the complex conjugate.
We define a complex spin-1 operator ∂ := ∂ 1 + i∂ 2 that transforms as a vector, ∂ = ∂e iϕ , with ϕ being the angle of rotation relative to the original basis. Then, the lensing convergence is expressed in terms of ψ(θ) as where ∂∂ * = ∇ 2 θ is a scalar or a spin-0 operator. Similarly, the complex shear γ = γ 1 + iγ 2 ≡ |γ|e 2iφ γ is expressed as whereD is a spin-2 operator that transformes asD =De 2iϕ under a rotation of the basis axes by ϕ.

Linear Mass Reconstruction
Since γ(θ) and κ(θ) are both linear combinations of the second derivatives of ψ(θ), they are related to each other by (Kaiser 1995;Crittenden et al. 2002;Umetsu 2010) The shear-to-mass inversion can thus be formally expressed as Using −1 (θ, θ ) = ln |θ − θ |/(2π) (Section 2.6.2), Equation (43) in the flat-sky limit can be solved to yield the following nonlocal relation between κ and γ (Kaiser and Squires 1993, hereafter KS93): where κ 0 is an additive constant and D(θ) is a complex kernel defined as Similarly, the complex shear field can be expressed in terms of the convergence κ as This linear mass inversion formalism is often referred to as the KS93 algorithm. It is computationally faster to work in Fourier domain (Jain et al. 2000) using the fast Fourier transform algorithm. By taking the Fourier transform of Equation (42), we have a mass inversion relation in the conjugate Fourier space as where k is the two-dimensional wave vector conjugate to θ,κ(k) andγ(k) are the Fourier transforms of κ(θ) and γ(θ) = γ 1 (θ) + iγ 2 (θ), respectively. In practical applications, one may assumeκ(0) = 0 if the angular size of the observed shear field is sufficiently large so that the mean convergence across the data field is approximated to zero. Otherwise, one must explicitly account for the boundary conditions imposed by the observed shear field to perform a mass reconstruction on a finite field (e.g., Kaiser 1995;Seitz and Schneider 1996;Bartelmann et al. 1996;Seitz and Schneider 1997;Umetsu and Futamase 2000).
In Figure 6, we show the shape distortion field in the rich cluster Cl0024+1654 (z l = 0.395) obtained by Umetsu et al. (2010) from deep weak-lensing observations taken with Suprime-Cam on the 8.2 m Subaru telescope. They accounted and corrected for the effect of the weight function used for calculating noisy galaxy shapes, as well as for the anisotropic and smearing effects of the point spread function (PSF), using an improved implementation of the modified Kaiser et al. (1995, hereafter KSB) method (see Section 3.4.2). In the left panel of Figure 7, we show the κ(θ) field reconstructed from the Subaru weak-lensing data shown in Figure 6. A prominent mass peak is visible in the cluster center, around which the distortion pattern is clearly tangential ( Figure 6). In this study, a variant of the linear KS93 algorithm was used to reconstruct the κ map from the weak shear lensing data. In the right panel of Figure 7, we show the member galaxy distribution Σ n (θ) in the cluster. Overall, mass and light are similarly distributed in the cluster.   (Okabe et al. 2014). In the figure, the weak-lensing mass map is compared to the luminosity and number density distributions of spectroscopically identified cluster members, as well as to the projected large-scale structure model based on galaxy-galaxy lensing with the light-tracing-mass assumption. The projected mass and galaxy distributions in the Coma cluster are correlated well with each other. Thanks to the large angular extension of the Coma cluster, Okabe et al. (2014) measured the weak-lensing masses of 32 cluster subhalos down to the order of 10 −3 of the cluster virial mass.

Mass-sheet Degeneracy
Adding a constant mass sheet to κ(θ) in the shear-to-mass formula (46) does not change the shear field γ(θ) that is observable in the weak-lensing limit. This leads to a degeneracy of solutions for the weak-lensing mass inversion problem, which is referred to as the masssheet degeneracy (Falco et al. 1985;Gorenstein et al. 1988;Schneider and Seitz 1995).
As we shall see in Section 3.4, in general, the observable quantity for weak shear lensing is not the shear γ, but the reduced shear, in the subcritical regime where detA > 0 (or 1/g * in the negative-parity region with detA < 0). We see that the g(θ) field is invariant under the following global transformation: with an arbitrary scalar constant λ = 0 . This transformation is equivalent to scaling the Jacobian matrix A(θ) with λ, A(θ) → λA(θ). It should be noted that this transformation leaves the location of the critical curves (detA(θ) = 0) invariant as well. Moreover, the location of the curve defined by κ(θ) = 1, on which the distortion locally disappears, is left invariant under the transformation (Equation (49)). A general conclusion is that, all mass reconstruction methods based on shape information alone can determine the κ field only up to a one-parameter family (λ or κ 0 ) of linear transformations (Equation (49)).
In principle this degeneracy can be broken or alleviated, for example, by measuring the magnification factor µ in the subcritical regime (i.e., outside the critical curves; see Umetsu 2013), because µ transforms under the invariance transformation (Equation (49)) as

Nonlinear Mass Reconstruction
Following Seitz and Schneider (1995), we generalize the KS93 algorithm to include the nonlinear but subcritical regime (outside the critical curves). To this end, we express the KS93 inversion formula in terms of the observable reduced shear g(θ). Substituting γ = g(1 − κ) in Equation (44), we have the following integral equation: For a given g(θ) field, this nonlinear equation can be solved iteratively, for example, by initially setting κ(θ) = 0 everywhere , Equivalently, Equation (51) can be formally expressed as a power series expansion (Umetsu et al. 1999): whereĜ is the convolution operator defined bŷ HereĜ(θ, θ ) acts on a function of θ . The KS93 algorithm corresponds to the first-order approximation to this power series expansion in the weak-lensing limit. Note that solutions for nonlinear mass reconstructions suffer from the generalized mass-sheet degeneracy, as explicitly shown in Equation (52).
Here the first term associated with ψ E is a gradient or scalar E component and the second term with ψ B is a curl or pseudoscalar B component. The shear components (γ 1 , γ 2 ) are written in terms of ψ E and ψ B as As we have discussed in Section 3.1.1, the spin-2 γ(θ) field is coordinate dependent and transforms as γ = γe 2iϕ under a rotation of the basis axes by ϕ. The E and B components can be extracted from the shear matrix as where we have defined the E and B fields, κ E = (1/2) ψ E and κ B = (1/2) ψ B , respectively. This technique is referred to as the E/B-mode decomposition. We see from Equation (58) that the relations between E/B-fields and spin-2 fields are intrinsically nonlocal. Remembering that the shear matrix due to weak lensing is given as Γ ij = (∂ i ∂ j − δ ij /2)ψ(θ) (i, j = 1, 2), we identify ψ E (θ) = ψ(θ) and ψ B (θ) = 0. Hence, for a lensing-induced shear field, the E-mode signal is related to the convergence κ, i.e., the surface mass density of the lens, while the B-mode signal is identically zero. Figure 9 illustrates characteristic distortion patterns from E-mode (curl-free) and Bmode (divergence-free) fields. Weak lensing only produces curl-free E-mode signals, so that the presence of divergence-free B-modes can be used as a null test for systematic effects. In the weak-lensing regime, a tangential E-mode pattern is produced by a positive mass overdensity (e.g., halos), while a radial E-mode pattern is produced by a negative mass overdensity (e.g., cosmic voids). Now we turn to the issue of E/B-mode reconstructions from the spin-2 shear field. Rewriting Equation (58) in terms of the complex shear γ, we find  where (Z) and (Z) denote the real part and the imaginary part of a complex variable Z, respectively. Defining κ ≡ κ E +iκ B , we see that the first of Equation (59) is identical to the mass inversion formula (Equation (42)). The B-mode convergence κ B can thus be simply obtained as the imaginary part of Equation (44), which is expected to vanish for a purely weak-lensing signal. Moreover, the second of Equation (59) indicates that the transformation γ (θ) = iγ(θ) (γ 1 = −γ 2 , γ 2 = γ 1 ) is equivalent to an interchange operation of the E and B modes of the original maps by κ E (θ) = −κ B (θ) and κ B (θ) = κ E (θ). Since γ is a spin-2 field that transforms as γ = γe 2iϕ , this operation is also equivalent to a rotation of each ellipticity by π/4 with each position vector fixed. Note that gravitational lensing can induce B-modes, for example, when multiple deflections of light are involved (Section 2.5). However, these B modes can be generated at higher orders and the B-mode contributions coming from multiple deflections are suppressed by a large factor compared to the E-mode contributions. In real observations, intrinsic ellipticities of background galaxies also contribute to weak-lensing shear estimates. Assuming that intrinsic ellipticities have random orientations in projection space, such an isotropic ellipticity distribution will yield statistically identical contributions to the E and B modes. Therefore, the B-mode signal provides a useful null test for systematic effects in weaklensing observations ( Figure 9).

Flexion
Flexion is introduced as the next higher-order lensing effects responsible for an arc-like and weakly skewed appearance of lensed galaxies (Goldberg and Bacon 2005;Bacon et al. 2006) observed in a regime between weak and strong lensing (i.e., a nonlinear but subcritical regime). Such higher-order lensing effects occur when κ(θ) and γ(θ) are not spatially constant across a source galaxy image. By taking higher-order derivatives of the lensing potential ψ(θ), we can work with higher-order transformations of galaxy shapes by weak lensing (e.g., Massey et al. 2007b;Okura et al. 2007Okura et al. , 2008Goldberg and Leonard 2007;Schneider and Er 2008;Viola et al. 2012). The convergence κ is a spin-0 quantity, the first flexion F = F 1 + iF 2 is spin-1, the shear γ = γ 1 + iγ 2 is spin-2, and the second flexion G = G 1 + iG 2 is spin-3. This figure is taken from Bacon et al. (2006).
The third-order derivatives of ψ(θ) can be combined to form a pair of complex flexion fields as : The first flexion F has spin 1 and the second flexion G has spin 3. The two complex flexion fields satisfy the following consistency relation: Figure 10 illustrates the characteristic weak-lensing distortions with different spin values for an intrinsically circular Gaussian source ). If the angular size of an image is small compared to the characteristic scale over which ψ(θ) varies, we can locally expand Equation (13) to the next higher order as where A ij,k = −ψ ,ijk (i, j, k = 1, 2). The A ij,k matrix can be expressed with a sum of two terms, with the spin-1 part F ijk and the spin-3 part G ijk defined by Flexion has a dimension of inverse (angular) length, so that the flexion effects depend on the angular size of the source image. That is, the smaller the source image, the larger the amplitude of intrinsic flexion contributions (Okura et al. 2008). The shape quantities affected by the first flexion F alone have spin-1 properties, while those by the second flexion G alone have spin-3 properties. Note that, as in the case of the spin-2 shear field, what is directly observable from higherorder image distortions are the reduced flexion effects, F/(1 − κ) and G/(1 − κ) (Okura et al. 2007(Okura et al. , 2008Goldberg and Leonard 2007;Schneider and Er 2008), a consequence of the mass-sheet degeneracy.
From Equation (60), the inversion equations from flexion to κ can be obtained as follows : where the complex part iB describes the B-mode component that can be used to assess the noise properties of weak-lensing data (e.g., Okura et al. 2008). An explicit representation for the inversion equations is obtained in Fourier space as for k = 0.
In principle one can combine independent mass reconstructions κ a (k) (a = γ, F, G) linearly in Fourier space to improve the statistical significance with minimum noise variance weighting as (Okura et al. 2007): where W κ|a (k) = 1/P κ|a (k) the two-dimensional noise power spectrum of κ reconstructed using the observable a, with P (N ) a (k) the shot noise power, σ a the shape noise dispersion, and n a the mean surface number density of background source galaxies, for the observable a (a = γ, F, G). Assuming that errors in κ a (k) between different observables are independent, the noise power spectrum for the estimator (Equation (69)) is obtained as (Okura et al. 2007):  Figure 11 shows the κ field in the central region of the rich cluster Abell 1689 (z l = 0.183) reconstructed from the spin-1 flexion alone (Okura et al. 2008) measured with Subaru Suprime-Cam data. Okura et al. (2008) used measurements of higher-order lensing image characteristics (HOLICs) introduced by Okura et al. (2007). Their analysis accounted for the effect of the weight function used for calculating noisy shape moments, as well as for higher-order PSF effects. One can employ the assumption of random orientations for intrinsic HOLICs of background galaxies to obtain a direct estimate of flexion, in a similar manner to the usual prescription for weak shear lensing. Okura et al. (2008) utilized the Fourier-space relation (Equation (68)) between F (θ) and κ(θ) with the linear weak-lensing approximation. The B-mode convergence field was used to monitor the reconstruction error in the κ map. The reconstructed κ map exhibits a bimodal feature in the central region of the cluster. The pronounced main peak is associated with the brightest cluster galaxy (BCG) and central cluster members, while the secondary mass peak is associated with a local concentration of bright galaxies.

Shear Observables
Since the pioneering work of Kaiser et al. (1995), numerous methods have been proposed and implemented in the literature to accurately extract the lensing signal from noisy pixelized images of background galaxies (e.g., Kuijken 1999;Bridle et al. 2002;Bernstein and Jarvis 2002;Refregier 2003;Hirata and Seljak 2003;Miller et al. 2007). On the other hand, considerable progress has been made in understanding and controlling systematic biases in noisy shear estimates by relying on realistic galaxy image simulations (e.g., Heymans et al. 2006;Massey et al. 2007a;Refregier et al. 2012;Kacprzak et al. 2012;Mandelbaum et al. 2014Mandelbaum et al. , 2015Mandelbaum et al. , 2018a. Here we will review the basic idea and essential aspects of the moment-based KSB formalism. We refer the reader to Mandelbaum (2018) for a recent exhaustive review on the subject.

Ellipticity Transformation by Weak Lensing
In a moment-based approach to weak-lensing shape measurements, we use quadrupole moments Q ij (i, j = 1, 2) of the surface brightness distribution I(θ) of background galaxy images to quantify the shape of the images as : where q I [I(θ)] is a weight function and ∆θ i = θ i − θ i denotes the offset vector from the image centroid. Here we assume that the weight q I does not explicitly depend on θ but is set by the local value of the brightness I(θ) . The trace of Q ij describes the angular size of the image, while the traceless part describes the shape and orientation of the image. With the quadrupole moments Q ij , we define the complex ellipticity e = e 1 + ie 2 as 7 For an ellipse with a minor-to-major axis ratio of q(≤ 1), |e| = (1 − q 2 )/(1 + q 2 ).
The spin-2 ellipticity e (73) transforms under the lens mapping as where e (s) is the unlensed intrinsic ellipticity and g = γ/(1−κ) is the spin-2 reduced shear. Since e is a nonzero spin quantity with a direction dependence, the expectation value of the intrinsic source ellipticity e (s) is assumed to vanish, i.e., E(e (s) ) = 0, where E(X) denotes the expectation value of X. Schneider and Seitz (1995) showed that Equation (74) with the condition E(e (s) ) = 0 is equivalent to where e n is the image ellipticity for the nth object, w n is a statistical weight for the nth object, and δ g is the spin-2 complex distortion : Note that the complex distortion δ g is invariant under the transformation g → 1/g * . For an intrinsically circular source with e (s) = 0, we have On the other hand, in the weak-lensing limit (|κ|, |γ| 1), Equation (74) reduces to e (s) e − 2g e − 2γ. Assuming random orientations of source galaxies, we average observed ellipticities over a local ensemble of source galaxies to obtain For an input signal of g = 0.1, Equation (77) yields e ≈ 0.198. Hence, the weak-lensing approximation (Equation (78)) gives a reduced-shear estimate of g (est) ≈ 0.099, corresponding to a negative bias of 1%. For g = 0.2 in the mildly nonlinear regime, Equation (78) gives g (est) ≈ 0.192, corresponding to a negative bias of 4%.
In real observations, the reduced shear g may be estimated from a local ensemble average of background galaxies as g e /2. The statistical uncertainty in the reduced shear estimate g decreases with increasing the number of background galaxies N (see Section 4.2 for more details) as ∝ σ/ √ N , with σ the dispersion of background image ellipticities (dominated by the intrinsic shape noise). Weak-lensing analysis thus requires a large number of background galaxies to increase the statistical significance of the shear measurements.

The KSB Algorithm: A Moment-based Approach
For a practical application of weak shear lensing, we must account for various observational and instrumental effects, such as the impact of noise on the galaxy shape measurement (both statistical and systematic uncertainties), the isotropic smearing component of the PSF, and the effect of instrumental PSF anisotropy. Therefore, one cannot simply use Equation (78) to measure the shear signal from observational data.
A more robust estimate of the shape moments (Equation (72)) is obtained by using a weight function W (|θ|) that depends explicitly on the separation |θ| from the image centroid. In the KSB approach, a circular Gaussian that is matched to the size of each object is used as a weight function . The quadrupole moments obtained with such a weight function W (|θ|) suffer from an additional smearing and do not obey the transformation law (Equation (74)). Therefore, the expectation value E(e) of the image ellipticity is different from the distortion δ g = 2g/(1 + |g| 2 ) (see Equation (77)).
The KSB formalism Hoekstra et al. 1998) accounts explicitly for the Gaussian weight function used for measuring noisy shape moments, the effect of spin-2 PSF anisotropy, and the effect of isotropic PSF smearing. The KSB formalism and its variants assume that the PSF can be described as an isotropic function convolved with a small anisotropic kernel. In the limit of linear response to lensing and instrumental anisotropies, KSB derived the transformation law between intrinsic (unlensed) and observed (lensed) complex ellipticities, e (s) and e, respectively. The linear transformation between intrinsic and observed complex ellipticities can be formally expressed as Hoekstra et al. 1998;: where q i denotes the spin-2 PSF anisotropy kernel, (C q ) ij is a linear response matrix for the PSF anisotropy q i , (C g ) ij is a linear response matrix for the reduced shear g i . The PSF anisotropy kernel and the response matrices can be calculated from observable weighted shape moments of galaxies and stellar objects Erben et al. 2001). In real observations, the PSF anisotropy kernel q(θ) can be estimated from image ellipticities e * observed for a sample of foreground stars for which e (s) and g vanish, so that q i (θ) = (C q ) −1 ij e * j . Assuming that the expectation value of the intrinsic source ellipticity vanishes E[e (s) ] = 0, we find the following linear relation between the reduced shear and the ensemble-averaged image ellipticity: In the KSB formalism, the shear response matrix C g is denoted as P g (or P γ ) and dubbed pre-seeing shear polarizability. Similarly, C q is denoted as P sm and dubbed smear polarisability.
A careful calibration of the signal response P g is essential for any weak shear lensing analysis that relies on accurate shape measurements from galaxy images (see Mandelbaum 2018). The levels of shear calibration bias are often quantified in terms of a multiplicative bias factor m and an additive calibration offset c through the following relation between the true input shear signal, g true , and the recovered signal, g obs (Heymans et al. 2006;Massey et al. 2007a;Mandelbaum et al. 2014): The original KSB formalism, when applied to noisy observations, is known to suffer from systematic biases that depend primarily on the size and the detection signal-to-noise ratio (S/N) of galaxy images (e.g., Erben et al. 2001;Refregier et al. 2012). Different variants of the Kaiser et al. (1995) method (KSB+) have been developed and implemented in the literature primarily to study mass distributions of high-mass galaxy clusters (e.g., Hoekstra et al. 1998Hoekstra et al. , 2015Clowe et al. 2004;Umetsu et al. 2010Umetsu et al. , 2014Oguri et al. 2012;von der Linden et al. 2014a;Okabe and Smith 2016;Schrabback et al. 2018). Note that KSB+ pipelines calibrated against realistic image simulations of crowded fields can achieve a ∼ 2% shear calibration accuracy even in the cluster lensing regime (e.g., Herbonnet et al. 2019).

Tangential and Cross Shear Components
As we have seen in Section 3.1, the spin-2 shear components γ 1 and γ 2 are coordinate dependent, defined relative to a reference Cartesian coordinate frame (chosen by the observer). It is useful to consider components of the shear that are coordinate independent with respect to a certain reference point, such as the cluster center. We define a polar coordinate system (ϑ, ϕ) centered on an arbitrary point θ c on the sky, such that θ = (ϑ cos ϕ, ϑ sin ϕ) + θ c . The convergence κ(ϑ) averaged within a circle of radius ϑ about θ c is then expressed as where Σ(ϑ) is the surface mass density averaged within a circle of radius ϑ about θ c and Σ(ϑ) is the surface mass density averaged over a circle of radius ϑ about θ c . The reference point θ c can be taken to be the cluster center, which can be determined from symmetry of the strong-lensing pattern, the X-ray centroid position, or the BCG position. Let us introduce the tangential and 45 • -rotated cross shear components, γ + (θ) and γ × (θ), respectively, defined relative to the position θ c as which are directly observable in the weak-lensing limit where |κ|, |γ| 1 (see Section 3.4). Using the two-dimensional version of Gauss' theorem, we find the following identity for an arbitrary choice of θ c (Kaiser 1995): where we have defined the excess surface mass density ∆Σ(ϑ) around θ c as a function of ϑ by (Miralda-Escude 1991): From Equations (82) and (84), we find Equation (84) shows that, given an arbitrary circular loop of radius ϑ around the chosen center θ c , the tangential and cross shear components averaged around the loop extract Emode and B-mode distortion patterns (Section 3.2). An important implication of the first of Equation (84) is that, with tangential shear measurements from individual source galaxies (see Section 3.4), one can directly determine the azimuthally averaged ∆Σ(ϑ) profile around lenses in the weak-lensing regime, even if the mass distribution Σ(θ) is not axis-symmetric around θ c . Moreover, the second of Equation (84) tells us that the azimuthally averaged × component, or the B-mode signal, is expected to be statistically consistent with zero if the signal is due to weak lensing. Therefore, a measurement of the B-mode signal g × (ϑ) provides a useful null test against systematic errors.

Azimuthally Averaged Reduced Tangential Shear
The reduced tangential shear g + (ϑ) averaged over a circle of radius ϑ about an arbitrary reference point θ c is expressed as If the projected mass distribution around a cluster has quasi-circular symmetry (e.g., elliptical symmetry), then the azimuthally averaged reduced tangential shear g + (ϑ) around the cluster center can be interpreted as where γ + (ϑ) and κ(ϑ) are the tangential shear and the convergence, respectively averaged over a circle of radius ϑ about θ c .
According to N -body simulations in hierarchical ΛCDM models of cosmic structure formation, dark matter halos exhibit aspherical mass distributions that can be well approximated by triaxial mass models (e.g., Jing and Suto 2002;Limousin et al. 2013;Despali et al. 2014). Since triaxial halos have elliptical isodensity contours in projection on the sky (Stark 1977), Equation (88) can give a good approximation to describe the weak-lensing signal for regular clusters with a modest degree of perturbation. However, the approximation is likely to break down for merging and interacting lenses having complex, multimodal mass distributions. To properly model the weak-lensing signal in such a complex merging system, one needs to directly model the two-dimensional reduced-shear field (g 1 (θ), g 2 (θ)) with a lens model consisting of multi-component halos (e.g., Watanabe et al. 2011;Okabe et al. 2011;Medezinski et al. 2013). Alternatively, one may attempt to reconstruct the convergence field κ(θ) in a free-form manner from the observed reduced shear field, with additional constraints or assumptions to break the mass-sheet degeneracy (e.g., Jee et al. 2005;Bradač et al. 2006;Merten et al. 2009;Jauzac et al. 2012;Umetsu et al. 2015;Tam et al. 2020).
On the other hand, for a statistical ensemble of galaxy clusters, the average mass distribution around their centers tends to be spherically symmetric if the assumption of statistical isotropy holds (e.g., Okabe et al. 2013). Hence, the stacked weak-lensing signal for a statistical ensemble of clusters can be interpreted using Equation (88). For more details, see Sections 3.6.4 and 4.5.

Source-averaged Reduced Tangential Shear
With the assumption of quasi-circular symmetry in the projected mass distribution around clusters (see Equation (88)), let us consider the nonlinear effects on the source-averaged cluster lensing profiles. The reduced tangential shear for a given lens-source pair is written as To begin with, let us consider the expectation value for the reduced tangential shear averaged over an ensemble of source galaxies. For a given cluster, the source-averaged reduced tangential shear is expressed as where ... denotes the averaging over all sources, defined such that where the index s runs over all source galaxies around the lens (l) and w s is a statistical weight for each source galaxy. An optimal choice for the statistical weight is w s = 1/σ 2 g + ,s , with σ g + ,s the statistical uncertainty of g + (ϑ|z l , z s ) estimated for each source galaxy. Note that this choice for the weight assumes that σ g + ,s is independent of the lensing shear signal (see Schneider and Seitz 1995;Seitz and Schneider 1995). In the continuous limit, Equation (91) is written as with dN (z)/dz the redshift distribution function of the source sample and w(z) a statistical weight function. For a given cluster lens, In the weak-lensing limit, Equation (90) gives g + γ + . The next order of approximation is given by (Seitz and Schneider 1997): From Equation (93), we see that an interpretation of the averaged weak-lensing signal g + (ϑ|z l ) does not require knowledge of individual source redshifts. Instead, it requires ensemble information regarding the statistical redshift distribution dN (z)/dz of background source galaxies used for weak-lensing measurements.
For a lens at sufficiently low redshift (see Section 2.6.5), f l ≈ 1, thus leading to the single source-plane approximation: g + γ + /(1 − κ ). The level of bias introduced by this approximation is ∆ g + / g + (f l − 1) κ . In typical ground-based deep observations of z l ∼ 0.4 clusters, ∆f l = f l − 1 is found to be of the order of several percent (Umetsu et al. 2014), so that the relative error ∆ g + / g + is negligibly small in the mildly nonlinear regime of cluster lensing.

Source-averaged Excess Surface Mass Density
Next, let us consider the following estimator for the excess surface mass density ∆Σ(ϑ) for a given lens-source pair: This assumes that an estimate of Σ −1 cr (z l , z s ) for each individual source galaxy is available, for example, from photometric-redshift (photo-z) measurements. This estimator is widely used in recent cluster weak-lensing studies thanks to the availability of multi-band imaging data and the advances in photo-z techniques (see Section 4.1).
In real observations, if the photo-z probability distribution function (PDF), P s (z), is available for individual source galaxies (s), one can calculate averaged over the PDF for each source galaxy. Similarly to Equation (90), ∆Σ + (ϑ|z l , z s ) averaged over all sources is expressed as with where the index s runs over all source galaxies around the lens (l) and w ls is a statistical weight for each source galaxy. An optimal choice for the statistical weight is where σ g + ,s is the statistical uncertainty of g + (ϑ|z l , z s ) estimated for each source galaxy (Section 3.6.2). In the weak-lensing limit, we thus have ∆Σ + ∆Σ. The next order of approximation is

Lens-Source-averaged Excess Surface Mass Density
Finally, we consider an ensemble of galaxy clusters. Now, let ∆Σ be the ensemble mass distribution of these clusters. Then, ∆Σ + averaged over all lens-source (ls) pairs is expressed as (Johnston et al. 2007): with where ... denotes the averaging over all lens-source pairs, the double summation is taken over all clusters (l) and all source galaxies (s), and w ls is a statistical weight for each lenssource pair (ls). An optimal choice for the statistical weight is given by Equation (98). Again, the weak-lensing limit yields ∆Σ + ∆Σ and the next order of approximation is given by (Umetsu et al. 2014: Equation (102) can be used to interpret the stacked weak-lensing signal including the nonlinear regime of cluster lensing. In Section 4.5, we provide more details on the stacked weak-lensing methods.

Aperture Mass Densitometry
In this subsection, we introduce a nonparametric technique to infer a projected total mass estimate from weak shear lensing observations. Integrating Equation (86) between two concentric radii ϑ in and ϑ out (> ϑ in ), we obtain an expression for the ζ statistic as (Fahlman et al. 1994;Kaiser 1995;Squires and Kaiser 1996): where κ(ϑ in , ϑ out ) is the convergence averaged within a concentric annulus between ϑ in and ϑ out , In the weak-lensing regime where γ + (ϑ) g + (ϑ), ζ can be determined uniquely from the shape distortion field in a finite annular region at ϑ in θ ϑ out , because additive constants κ 0 from the invariance transformation (Equation (49)) cancel out in Equation (103). Note that this technique is also referred to as aperture mass densitometry.
Since galaxy clusters are highly biased tracers of the cosmic mass distribution, κ(ϑ in , ϑ out ) around a cluster is expected to be positive, so that ζ(ϑ in , ϑ out ) yields a lower limit to κ(ϑ in ). That is, the quantity M ζ ≡ π(D l ϑ in ) 2 Σ cr ζ(ϑ in , ϑ out ) yields a lower limit to the projected mass inside a circular aperture of radius ϑ in , M 2D = π(D l ϑ in ) 2 Σ(ϑ in ). This technique provides a powerful means to estimate the total cluster mass from shear data in the weaklensing regime |γ| 1. We now introduce a variant of aperture mass densitometry defined as (Clowe et al. 2000): where the aperture radii (ϑ, ϑ in , ϑ out ) satisfy ϑ < ϑ in < ϑ out , and the first and second terms in the second line of Equation (105) are equal to κ(ϑ) − κ(ϑ in ) and κ(ϑ in ) − κ(ϑ in , ϑ out ), respectively. In the weak-lensing limit, the quantity yields a lower limit to the projected mass inside a circular aperture of radius ϑ, that is, We can regard ζ c (ϑ|ϑ in , ϑ out ) as a function of ϑ for fixed values of (ϑ in , ϑ out ) and measure ζ c (ϑ|ϑ in , ϑ out ) at several independent aperture radii ϑ. As in the case of the standard ζ statistic (Equation (103)), one may choose the inner and outer annular radii (ϑ in , ϑ out ) to lie in the weak-lensing regime where g + γ + . In general, however, ϑ may lie in the nonlinear regime where γ + is not directly observable. In the subcritical regime, we can express γ + (ϑ) in terms of the observed reduced tangential shear g + (ϑ) as when assuming a quasi-circular symmetry in the projected mass distribution (Section 3.6). If these conditions are satisfied, for a given boundary condition κ 0 ≡ κ(ϑ in , ϑ out ), Equation (105) can be solved iteratively as : where we have introduced a differential operator defined asL(ϑ) = 1 2ϑ 2 d d ln ϑ ϑ 2 that satis-fiesLκ(ϑ) = κ(ϑ) andL1 = 1, and the quantities indexed by (n) refer to those in the nth iteration (n = 0, 1, 2, ...).
We solve a discretized version of Equation (109). See Appendix A of Umetsu et al. (2016) for discretized expressions for g + (ϑ) and κ(ϑ). One can start the iteration process with an initial guess of κ (0) (ϑ) = 0 for all ϑ bins and repeat it until convergence is reached in all ϑ bins. This procedure will yield a solution for the binned mass profile, for a given value of κ 0 . Note that the errors for the mass profile solution in different radial bins are correlated. The bin-to-bin error covariance matrix can be calculated with the linear approximation κ(ϑ) 1 in Equation (109), by propagating the errors for the binned g + (ϑ) profile (e.g., Okabe and Umetsu 2008;Umetsu et al. 2010;Okabe et al. 2010).
Alternatively, one can attempt to determine the boundary term κ 0 from shear data by incorporating additional iteration loops. Starting with an initial guess of κ 0 = 0, one can update the value of κ 0 in each iteration by using a specific mass model (e.g., a powerlaw profile) that best fits the binned κ(ϑ) profile. This iteration procedure is repeated until convergence is obtained (see Umetsu et al. 2010).

Standard Shear Analysis Methods
In this section, we outline procedures to obtain cluster mass estimates from azimuthally averaged reduced tangential shear measurements for a given galaxy cluster.

Background Source Selection
A critical source of systematics in weak lensing comes from accurately estimating the redshift distribution of background source galaxies, which is needed to convert the lensing signal into physical mass units (Medezinski et al. 2018b). Contamination of background samples by unlensed foreground and cluster galaxies with Σ −1 cr (z l , z s ) = 0, when not accounted for, leads to a systematic underestimation of the true lensing signal. Inclusion of foreground galaxies produces a dilution of the lensing signal that does not depend on the cluster-centric radius. In contrast, the inclusion of cluster galaxies significantly dilutes the lensing signal at smaller cluster radii and causes an underestimation of the concentration of the cluster mass profile (Broadhurst et al. 2005b), as well as of the halo mass M ∆ especially at higher overdensities ∆. The level of contamination by cluster galaxies increases with the cluster mass or richness (see Figure 12). A secure selection of background galaxies is thus key for obtaining accurate cluster mass estimates from weak gravitational lensing (Medezinski et al. 2007(Medezinski et al. , 2018bUmetsu and Broadhurst 2008;Okabe et al. 2013;Gruen et al. 2014).
In real observations, acquiring spectroscopic redshifts for individual source galaxies is not feasible, particularly to the depths of weak-lensing observations. Instead of spectroscopic redshifts, photo-z's can be used when multi-band imaging is available. Cluster weaklensing studies, however, often rely on two to three optical bands for deep imaging (e.g., Broadhurst et al. 2005b;Medezinski et al. 2010;Oguri et al. 2012;Okabe and Smith 2016), so that reliable photo-z's could not be obtained. Instead, well-calibrated field photo-z catalogs, such as COSMOS (Ilbert et al. 2009;Laigle et al. 2016), were used to determine the . Upper panels show the excess surface mass density profile ∆Σ + ; middle panels show the 45 • -rotated component, expected to be consistent with zero; and lower panels show the effective number density of source galaxies. All quantities in this figure were calculated using photo-z PDFs of individual source galaxies, P (z) (see also Section 3.6.3). Different lines in each panel show different source selection schemes: using all galaxies incorporating their P (z) (black), using P -cut selected galaxies (for which 98% of P (z) lies behind z l + 0.2; cyan), or CC-cutselected galaxies (blue). Weak-lensing S/N values for each selection are given in the legend of each panel. The figure is adapted from Medezinski et al. (2018b). redshift distribution dN (z)/dz of background galaxies for a given color-magnitude selection Okabe et al. 2010). Such field surveys are often limited to deep but small areas and thus subject to cosmic variance.
Dedicated wide-area optical surveys, such as the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP; Miyazaki et al. 2018a;Aihara et al. 2018a,b), the Dark Energy Survey (DES; Abbott et al. 2018), and the upcoming Large Synoptic Survey Telescope (LSST; Ivezic et al. 2008), are designed to observe in several broad bands, so that photo-z's are better determined. These photo-z estimates will still suffer from a large fraction of outliers due to inherent color-redshift degeneracies, as limited by a finite number of broad optical bands. The photo-z uncertainties are folded in by incorporating the full PDF for each source galaxy ). However, photo-z PDFs are often sensitive to the assumed priors. Moreover, the accuracy of photo-z PDFs will be limited by the representability of spectroscopic-redshift samples used for calibration. Alternative approaches rely on more stringent color cuts to reject objects with biased photo-z's (Medezinski et al. , 2011Umetsu et al. 2010Umetsu et al. , 2012Umetsu et al. , 2014Okabe et al. 2013), which however lead to lower statistical power because they result in lower source galaxy densities.
Using the first-year CAMIRA (Cluster finding Algorithm based on Multiband Identification of Red-sequence gAlaxies; Oguri et al. 2018) catalog of ∼ 900 clusters (0.1 < z l < 1.1) with richness N 20 found in ∼ 140 deg 2 of HSC-SSP survey data, Medezinski et al. (2018b) investigated robust source selection methods for cluster weak lensing. They compared three different source selection schemes: (1) relying on photo-z's and their full PDFs P (z) to correct for dilution (all), (2) selecting background galaxies in color-color space (CC-cut), and (3) selection of robust photo-z's by applying constraints on their cumulative PDF (P -cut). All three methods use photo-z PDFs of individual source galaxies, P (z), to convert the lensing signals into physical mass units. With perfect P (z) information, all these methods should thus yield consistent, undiluted ∆Σ + (R) profiles. After applying basic quality cuts, Medezinski et al. (2018b) found the typical mean unweighted galaxy number density in the HSC shape catalog to be n g = 21.7 arcmin −2 . Similarly, they found n g = 11.6 arcmin −2 and n g = 13.8 arcmin −2 for cluster lenses at z l < 0.4 using the CC-cut and P -cut methods, respectively. Medezinski et al. (2018b) showed that simply relying on the photo-z PDFs (all) results in a ∆Σ + profile that suffers from dilution due to residual contamination by cluster galaxies. Using proper limits, the CC-and P -cut methods give consistent ∆Σ + profiles with minimal dilution. Differences are only seen for rich clusters with N 50, where the P -cut method produces a slightly diluted signal in the innermost radial bin compared to the CC-cuts (see Figure 12). Employing either the P -cut or CC-cut selection results in cluster contamination consistent with zero to within the 0.5% uncertainties. For more details on the source selection methods, we refer the reader to Medezinski et al. (2018b) and references therein.

Tangential Shear Signal
Here we describe a procedure to derive azimuthally averaged radial profiles of the tangential (+) and cross (×) shear components around a given cluster lens at a certain redshift, z l . Specifically, we calculate for each cluster the lensing profiles, b=1 , in N bin discrete cluster-centric bins spanning the range ϑ ∈ [ϑ min , ϑ max ]. Since weak shear measurements of individual background galaxies (Equation (80)) are very noisy, we calculate the weighted average of the source ellipticity components as where the summation is taken over all source galaxies (s) that lie in the bin (b); g +,s and g ×,s represent the tangential and 45 • -rotated cross components of the reduced shear (Equation 83), respectively, estimated for each source galaxy; and w s is its statistical weight. The azimuthally averaged cross component, g × (ϑ) , is expected to be statistically consistent with zero (see Section 3.6.1). The statistical uncertainty per shear component per source galaxy is denoted by σ g + ,s = σ g × ,s ≡ σ g,s , which is dominated by the shape noise. Here σ g,s includes both contributions from the shape measurement uncertainty and the intrinsic dispersion of source ellipticities (e.g., Mandelbaum 2018). In general, an optimal choice for weighting is to apply an inversevariance weighting with w s = 1/σ 2 g,s (Section 3.6.2). However, using inverse-variance weights from noisy variance estimates may result in an unbalanced weighting scheme (e.g., sensitive to extreme values). To avoid this, one can employ a variant of inverse-variance weighting, w s = 1/(σ 2 g,s + α 2 g ), with α g a properly chosen softening constant (see, e.g., Hamana et al. 2003;Umetsu et al. 2009Umetsu et al. , 2014Okabe et al. 2010;Oguri et al. 2010;Okabe and Smith 2016). The error variance per shear component for g +,× (ϑ b ) is given by where we have assumed isotropic, uncorrelated shape noise, E(∆g i,s ∆g j,s ) = σ 2 g,s δ ss δ ij (i, j = +, ×) with s and s running over all source galaxies.
To quantify the significance of the tangential shear profile measurement {g + (ϑ b )} N bin b=1 , we define a linear S/N estimator by Umetsu et al. 2020): This estimator gives a weak-lensing S/N integrated over the radial range of the data. Equation (113) assumes that the total uncertainty is dominated by the shape noise and ignores the covariance between different radial bins. Note that we shall use the full covariance matrix for cluster mass measurements (Section 4.4). This S/N estimator is different from the conventional quadratic estimator defined by (e.g., Umetsu and Broadhurst 2008;Okabe and Smith 2016): As discussed by Umetsu et al. (2016Umetsu et al. ( , 2020, this quadratic definition breaks down and leads to an overestimation of significance in the noise-dominated regime where the actual per-bin S/N is less than unity. The observed g + (ϑ) profile can be interpreted according to Equation (93). Then, it is important to define the corresponding bin radii ϑ b so as to minimize systematic bias in cluster mass measurements. We define the effective clutter-centric bin radius ϑ b (b = 1, 2, ..., N bin ) using the weighted harmonic mean of lens-source transverse separations as : If source galaxies are uniformly distributed in the image plane and w s is taken to be constant, Equation (115) in the continuous limit yields ϑ b1 dϑw(ϑ)] −1 = (ϑ b1 + ϑ b2 )/2 for a single radial bin defined in the range ϑ ∈ [ϑ b1 , ϑ b2 ]. 8

NFW Model
The radial mass distribution of galaxy clusters is often modeled with a spherical Navarro-Frenk-White (Navarro et al. 1996, hereafter NFW) profile, which has been motivated by 8 In general, the weighted bin center is defined by ϑ dϑϑw(ϑ)] −1 with w(ϑ) a weight function. Assuming a power-law form for the weight function w(ϑ) ∝ ϑ −n , we see that Equation (115) corresponds to the case where w(ϑ) ∝ ϑ −1 , which is optimal for an isothermal density profile with γ + (ϑ) = κ(ϑ) ∝ 1/ϑ  cosmological N -body simulations (Navarro et al. 1996(Navarro et al. , 1997Oguri and Hamana 2011;Child et al. 2018). The radial dependence of the two-parameter NFW density profile is given by where ρ s is the characteristic density parameter and r s is the characteristic scale radius at which the logarithmic density slope, γ 3D (r) ≡ d ln ρ(r)/d ln r, equals −2. The logarithmic gradient of the NFW profile is γ 3D (r) = −[1 + 3(r/r s )]/[1 + (r/r s )]. For r/r s 1, γ 3D → −1, whereas for r/r s 1, γ 3D → −3. The radial range where the logarithmic density slope is close to the "isothermal" value of −2 is particularly important, given that such a mass distribution is needed to explain the flat rotation curves observed in galaxies.
The overdensity mass M ∆ is given by integrating Equation (116) out to the corresponding overdensity radius r ∆ at which the mean interior density is ∆ × ρ c (z l ) (Section 1). For a physical interpretation of the cluster lensing signal, it is useful to specify the NFW model by the halo mass, M 200c , and the concentration parameter, c 200c = r 200c /r s . The characteristic density ρ s is then given by Analytic expressions for the radial dependence of the projected NFW profiles, Σ NFW (R) = 2ρ s r s × f NFW (r/r s ) and Σ NFW (R) = 2ρ s r s × g NFW (r/r s ) with R = D l ϑ, are given by (Wright and Brainerd 2000, see also Bartelmann 1996): and The excess surface mass density for an NFW halo is then obtained as ∆Σ NFW (R) = Σ NFW (R) − Σ NFW (R). These projected NFW functionals provide a good approximation for the projected matter distribution around cluster-size halos (Oguri and Hamana 2011).
As an example, we show in Figure 13 the reduced tangential and 45 • -rotated shear profiles g + (ϑ) and g × (ϑ) , respectively, for two high-mass clusters, Abell 2142 and Abell 1689, obtained from Subaru/Suprime-Cam data ). The g + (ϑ) profiles are compared with their best-fit NFW and singular isothermal sphere (SIS) models. The SIS density profile is given by ρ(r) = σ 2 v /(2πGr 2 ), with σ v the one-dimensional isothermal velocity dispersion. For both clusters, the observed g + (ϑ) profiles are better fitted by the NFW model having an outward-steepening density profile. Abell 2142 is a nearby cluster at z l = 0.091 perturbed by merging substructures (e.g., Okabe and Umetsu 2008;Umetsu et al. 2009;Liu et al. 2018). The radial curvature observed in the g + profile of Abell 2142 is highly pronounced, so that the power-law SIS model is strongly disfavored by the Subaru weak-lensing data. From the best-fit NFW model, the mass and concentration parameters of Abell 2142 are constrained as M 200c = (13.0 ± 2.7) × 10 14 M and c 200c = 4.1 ± 0.8 Liu et al. 2018). In contrast, Abell 1689 (z l = 0.183) is among the best-studied clusters and the most powerful known lenses to date (e.g., Broadhurst et al. 2005a;Limousin et al. 2007;Umetsu and Broadhurst 2008;Lemze et al. 2009;Kawaharada et al. 2010;Coe et al. 2010;Diego et al. 2015;Umetsu et al. 2011bUmetsu et al. , 2015, characterized by a large Einstein radius (Section 2.6.6) of ϑ Ein = (47.0 ± 1.2) arcsec for a fiducial source at z s = 2 (Coe et al. 2010). This indicates a high degree of mass concentration in projection of the sky. In fact, the observed g + (ϑ) profile of Abell 1689 is well fitted by an NFW profile with a high concentration of c 200c ∼ 10 (Broadhurst et al. 2005b;Umetsu and Broadhurst 2008;Umetsu et al. 2009Umetsu et al. , 2015Coe et al. 2010), compared to the theoretical expectations, c 200c ∼ 4 (e.g., Bhattacharya et al. 2013;Diemer and Kravtsov 2015). From full triaxial modeling of two-dimensional weak-lensing, X-ray and Sunyaev-Zel'dovich effect (SZE) observations, Umetsu et al. (2015) obtained M 200c = (17.3 ± 2.7) × 10 14 M and c 200c = 7.91 ± 1.41, which overlaps with the 1σ tail of the predicted distribution of halo concentration. Moreover, the multi-probe data set is in favor of a triaxial geometry with a minor-to-major axis ratio of c/a = 0.39 ± 0.15 and a major axis closely aligned with the line of sight by (22 ± 10) • . Therefore, the superb lensing efficiency of Abell 1689 is likely caused by its intrinsically high mass concentration combined with a chance alignment of its major axis with the line-of-sight direction (see also Oguri et al. 2005).

Halo Model
At large cluster-centric distances, the correlated matter around the cluster, namely the 2-halo term (Cooray and Sheth 2002), contributes to the lensing signal. In the standard halo-model prescription (Oguri and Takada 2011;Oguri and Hamana 2011), the total lensing signal ∆Σ(R) is given as the sum of the 1-halo and 2-halo terms. The 1-halo term ∆Σ 1h accounts for the mass distribution within the main cluster halo, which can be described by a smoothly truncated NFW profile (Baltz et al. 2009, hereafter BMO), where the truncation parameter r t is set to a fixed multiple of the halo outer radius (e.g., r t ≈ 2.6r vir or r t ≈ 3r 200c ; see Oguri and Hamana 2011;Covone et al. 2014;Umetsu et al. 2014). Analytic expressions for the radial dependence of the projected BMO profiles are given by Baltz et al. (2009) and Oguri and Hamana (2011). The 2-halo term contribution ∆Σ 2h to the tangential shear signal is expressed as (Oguri and Takada 2011;Oguri and Hamana 2011, see de Putter and Takada 2010 for the full-sky expression): 9 where b h (M 200c ; z l ) is the linear halo bias (e.g., Tinker et al. 2010), k ≡ /[(1+z l )D l (z l )], P (k; z l ) is the linear matter power spectrum as a function of the comoving wavenumber k evaluated at the cluster redshift z l , and J n (x) is the Bessel function of the first kind and the nth order. We can compute the corresponding radial profile Σ 2h (R|M 200c , z l ) of the lensing convergence by replacing J 2 (x) in Equation (121) with the zeroth-order Bessel function J 0 (x). The 2-halo term is proportional to the product b h σ 2 8 , with σ 8 the rms amplitude of linear mass fluctuations in a sphere of comoving radius 8h −1 Mpc. In the standard ΛCDM model, the 2-halo term contribution to ∆Σ (or Σ) becomes important, on average, at R > ∼ several (or two) virial radii (Oguri and Hamana 2011;Becker and Kravtsov 2011). In particular cases where clusters are residing in extremely dense environments, the 2-halo contribution to the lensing signal could become more significant (Sereno et al. 2018a).

DK14 Model
Diemer and Kravtsov (2014, hereafter DK14) provide a useful fitting function for the spherically averaged density profile ρ(r) around dark matter halos calibrated for a suite of N -body simulations in ΛCDM cosmologies. The DK14 density profile is given by with r piv = 5r 200m and ∆ max = 10 3 , which is introduced as a maximum cutoff density of the outer term to avoid a spurious contribution at small halo radii (Diemer 2018). The inner profile ρ inner (r) describes the intra-halo mass distribution, which is modeled by an Einasto profile (Einasto 1965) with ρ −2 and r −2 the scale density and radius at which the logarithmic slope is −2 and α E the shape parameter describing the degree of profile curvature. The transition term f trans (r) characterizes the steepening around a truncation radius, r t . The outer term ρ outer , given by a softened power law, is responsible for the 2halo term. DK14 found that this fitting function provides a precise description ( < ∼ 5%) of their simulated DM density profiles at r < ∼ 9r vir . At larger radii, the outer term is expected to follow a shape proportional to the matter correlation function (e.g., Oguri and Takada 2011). As in the case of the NFW profile, it is useful to define the halo concentration by The DK14 profile is described by eight parameters, (ρ −2 , r −2 , α E , β, γ, r t , b e , s e ), and is sufficiently flexible to reproduce a range of fitting functions, such as the halo model (Oguri and Hamana 2011;Hikage et al. 2013) and density profiles with a sharp steepening feature associated with the splashback radius (see Section 6.3). Equation (122) can be used as a fitting function in conjunction with generic priors for some of the shape and structural parameters (see Diemer and Kravtsov 2014;More et al. 2015;Umetsu and Diemer 2017;Chang et al. 2018). By projecting ∆ρ(r) along the line of sight, we obtain the surface mass density responsible for gravitational lensing as where the line-of-sight integral is evaluated numerically. The publicly available code, COLOS-SUS (Diemer 2018), implements a range of calculations relating to three-dimensional and projected halo profiles including the NFW, Einasto, and DK14 models.

Shear Likelihood Function
The likelihood function L of a mass model for weak shear observations is written as where C is the N bin × N bin error covariance matrix for the binned reduced tangential shear profile d and g + (ϑ b |p) represents the theoretical expectation for g + (ϑ b ) (Equation (93)) predicted by the model parameterized by a set of parameters p. Note that modeling of the cluster lensing signal g + (ϑ|p) requires information of the lensing depth Σ −1 cr averaged over the source redshift distribution (Section 3.6.2). Similarly, one can define a likelihood function for the lensing convergence profile κ(ϑ), which can be reconstructed from combined shear and magnification measurements (e.g., Umetsu et al. 2011bUmetsu et al. , 2014. A well-characterized inference of the model parameters p can be obtained within the Bayesian framework by properly choosing the priors . In this context, when interpreting the cluster lensing signal with an NFW profile (Section 4.3.1), it is useful to take p = (M 200c , c 200c ) as fitting parameters. 10 Tangential shear fitting with a spherical NFW profile is a standard approach for measuring individual cluster masses from weak lensing (e.g., Okabe et al. 2010;Applegate et al. 2014;Hoekstra et al. 2015). Numerical simulations suggest that mass estimates from tangential shear fitting can be biased low (by ∼ 5%−10%; Meneghetti et al. 2010;Becker and Kravtsov 2011;Rasia et al. 2012) because local substructures that are abundant in the outskirts of massive clusters dilute the shear tangential to the cluster center. Moreover, systematic deviations of the lensing signal from the assumed NFW profile form in projection can lead to a substantial level of mass bias, even if the spherically averaged density profiles ρ(r) in three dimensions are well described by the NFW form (e.g., Sereno et al. 2016;Umetsu et al. 2020). Therefore, it is highly important to optimize the radial range for tangential shear fitting so as to alleviate the mass bias (  . The black solid line shows the total covariance C; the blue dashed line is the uncertainty due to intrinsic shapes of source galaxies (C stat , denoted as C shape in this paper), the orange dotted line is the covariance due to intrinsic variations of the projected cluster lensing signal (C int ), and the green dashed-dotted line is the cosmic noise covariance due to large-scale structure uncorrelated with clusters (C lss ). Shape noise is dominant for R < ∼ 3h −1 Mpc (comoving), while the cosmic noise dominates at larger separations. Right panel: correlation matrix of the stacked total covariance as a function of the radial bin. The correlation between radial bins appears at large separation due to the cosmic noise covariance. The figure is adapted from Miyatake et al. (2019).
In order to obtain robust constraints on the underlying cluster mass distribution, we need to ensure that the shear likelihood function (Equation (124)) includes all relevant sources of uncertainty (Gruen et al. 2015). Following Umetsu et al. (2016Umetsu et al. ( , 2020, we decompose the error covariance matrix C for where (C shape ) bb = σ 2 shape (ϑ b )δ bb is the diagonal statistical uncertainty due to the shape noise (Equation (112)); (C lss ) bb is the cosmic noise covariance due to uncorrelated largescale structure projected along the line of sight (Schneider et al. 1998;Hoekstra 2003); and (C int ) bb accounts for statistical fluctuations of the projected cluster lensing signal due to intrinsic variations associated with assembly bias and cluster asphericity (Gruen et al. 2015). Figure 14 shows the diagonal elements of the covariance matrix used in a stacked weaklensing analysis of Miyatake et al. (2019) and the correlation matrix, defined with the total covariance matrix as C bb / √ C bb C b b . A similar figure but for the κ profile was presented in Umetsu et al. (2016), which presents a joint weak and strong lensing analysis of 20 highmass clusters targeted by the CLASH survey (Cluster Lensing And Supernova survey with Hubble; Postman et al. 2012;Umetsu et al. 2014).
The elements of the C lss matrix are given by (Hoekstra 2003;Oguri and Takada 2011): whereĴ 2 ( ϑ b ) is the Bessel function of the first kind and second order averaged over the bth annulus (for the case of the binned convergence profile, see Umetsu et al. 2011aUmetsu et al. , 2016Gruen et al. 2015); and P κ ( ) is the two-dimensional convergence power spectrum (see Equation (37)) as a function of angular multipole calculated using the flat-sky and Limber's approximation as (Limber 1953;Kaiser 1992): with χ the comoving coordinate along the line of sight, W (χ, χ s ) = r(χ s − χ)/r(χ s ) the ratio of angular diameter distances D ls /D s , and P NL (k; χ) the nonlinear matter power spectrum. The convergence power spectrum P κ ( ) can be evaluated for a given source population and a cosmological model. In Equation (127), we have assumed a single comoving distance χ s corresponding to the effective single-plane redshift of source galaxies (i.e., all source galaxies lying at χ = χ s ). Provided that ∆ϑ b /ϑ b 1 with ∆ϑ b the radial bin width, we haveĴ 2 ( ϑ b ) J 2 ( ϑ b ) (without bin-averaging) in Equation (126).
The C int matrix describes statistical fluctuations of the projected cluster lensing signal at fixed halo mass due to intrinsic variations in halo concentration, triaxiality and orientation, and correlated secondary structures around the cluster, as well as to deviations from the assumed NFW form (Becker and Kravtsov 2011;Gruen et al. 2015). 11 Gruen et al. (2015) constructed a semi-analytical model of C int that is calibrated to cosmological numerical simulations. Umetsu et al. (2016) found that the diagonal part of the intrinsic covariance for the convergence κ can be well approximated by 12 with α int = 0.2 in the intracluster (1-halo) regime at R = D l ϑ < ∼ r 200m . This suggests that the intrinsic variance is self-similar in the sense that (C int κ ) bb /κ(ϑ b ) ∼ const. A further simplification can be made by setting the off-diagonal elements of C int κ to zero, i.e., (C int κ ) bb = α 2 int κ 2 (ϑ b )δ bb . In general, the diagonal approximation to C int κ can lead to an underestimation of parameter uncertainties (Gruen et al. 2015, see their Figure 5), where the degree of underestimation depends on the binning scheme, depth of weak-lensing observations, and halo mass. The impact of the diagonal approximation is more severe for 11 When simultaneously determining the mass and concentration for a given individual cluster, strictly speaking, the contribution from the intrinsic variance in concentration should be excluded from C int . Nevertheless, the effect of the concentration variance becomes important only at small cluster radii (Gruen et al. 2015). 12 Following Umetsu et al. (2016Umetsu et al. ( , 2020, this formalism excludes the external contribution from C lss (Equation (126)), which was formally included in the C int covariance by Gruen et al. (2015). deeper observations (or higher S/N weak-lensing data). 13 Assuming a representative mass profile, it is possible to convert the intrinsic covariance matrix C int κ for the convergence into that for the tangential shear. This can be done by assuming an NFW density profile together with the concentration-mass relation c 200c (M 200c , z) for a given cosmological model (Miyatake et al. 2019;Umetsu et al. 2020). The covariance C int for the g + profile obtained in this way thus depends on halo mass. Miyatake et al. (2019) found that, however, the intrinsic covariances with different halo masses remain nearly self-similar in their shapes once scaled by R → R/r 200m .

Stacked Weak Lensing Estimator
Stacking an ensemble of galaxy clusters helps average out large statistical fluctuations inherent in noisy weak-lensing observations of individual clusters. The statistical precision can be largely improved by stacking together a large number of clusters, allowing for tighter and more robust constraints on the ensemble properties of the cluster mass distribution. A stacked lensing analysis is thus complementary to an alternative approach that relies on individual cluster mass measurements (Sections 4.2 and 4.4). In particular, a comparison of the two approaches provides a useful consistency check in different S/N regimes (e.g., Okabe et al. 2010;Umetsu et al. 2014Umetsu et al. , 2016Umetsu et al. , 2020Okabe and Smith 2016).
Let us consider an ensemble of N galaxy clusters. We model the ensemble mass distribution of these clusters in terms of the excess surface mass density profile as Specifically, our model is a vector of N bin parameters containing the binned ∆Σ(R) profile as a function of the projected cluster-centric radius R (see Section 3.5). Here we aim to construct an unbiased estimator for the model m, or the ensemble ∆Σ(R) profile, given weak-lensing observations of N individual clusters. We assume that these clusters are distributed in redshift, having different geometric responses to the lensing signal through Σ −1 cr (z l , z s ). We express weak-lensing observations d l = { g + (R b |z l ) } N bin b=1 for a given cluster (l) as a sum of the signal vector s l and the noise vector n l as d l = s l + n l (l = 1, 2, ..., N ), with s l = a l m, where the response coefficient a l represents the source-averaged inverse critical surface mass density evaluated for the lth cluster (Equation (91)), In this expression, we assume that both d l and a l have been averaged over an ensemble of source galaxies to represent the respective source-averaged quantities for the lth cluster. For simplicity, we have ignored the nonlinearity between the lensing signal g + and the surface mass density ∆Σ (see Section 3.6.2). 14 We refer to Umetsu et al. (2020) for a treatment 13 Adopting a constant logarithmic binning with ∆ ln ϑ ∼ 0.3, Umetsu et al. (2016) found that the lensing S/N estimated using the diagonal approximation to C int κ is accurate to ∼ 10% for their ground-based weaklensing observations of high-mass clusters with M 200c ∼ 10 15 h −1 M . 14 Remember that the observable quantity for weak shear lensing is the reduced shear, g = γ/(1 − κ).
of the stacked weak-lensing analysis accounting for the nonlinear correction for the sourceaveraging effect. Assuming that n obeys Gaussian statistics and that the noise vectors between different clusters are statistically independent, we can write the total likelihood function of observa- where C l = n l n t l is the error covariance matrix (Section 4.4) for the lth cluster and N l = (2π) −N bin /2 |C l | −1/2 is a normalization factor. In ground-based cluster weak-lensing observations, the shear covariance matrix (C l ) bb per cluster (b, b = 1, 2, ..., N bin ) is dominated by the statistical uncertainty due to the shape noise. The contribution from cosmic noise (Section 4.4) becomes important at large cluster-centric distances ( Figure 14).
The total log-likelihood function ln P (d|m) is expressed as According to Bayes' theorem the posterior probability distribution of m given the data d is where P (m) is the prior probability distribution for the model m and P (d) is the evidence that serves as a normalization factor here. We assume an uninformative uniform prior for our model m, such that P (m|d) ∝ P (d|m). By maximizing ln P (m|d) with respect to m, we obtain the desired expression for the stacked weak-lensing estimator m as (e.g., Umetsu et al. 2011a): Note that the weight assigned to ∆Σ + of each cluster is proportional to a 2 l = Σ −1 cr,l 2 (see also Equation (98)) because s l ∝ a l . The error covariance matrix C for the stacked estimator ∆Σ + (Equation (136)) is given by with F the Fisher information matrix defined as (e.g., Umetsu et al. 2011a): The total S/N for detection is given by (e.g., Umetsu and Broadhurst 2008): Again, this quadratic S/N estimator breaks down and leads to an overestimation of significance if the actual per-bin S/N is less than unity (see Section 4.2). It is noteworthy that interpreting the effective mass from the stacked lensing signal (Equation (136)) requires caution especially when the cluster sample spans a wide range in mass and redshift. This is because the amplitude of the lensing signal is weighted by the redshift-dependent sensitivity and it is not linearly proportional to the cluster mass (e.g., Mandelbaum et al. 2005;Umetsu et al. 2016Umetsu et al. , 2020Melchior et al. 2017;. We refer the reader to Miyatake et al. (2019) and Murata et al. (2019) for further discussion of this issue. The results are based on Subaru HSC-SSP survey data, shown for three different source-selection methods (black squares, blue squares, and red circles). The data points with different selection methods are horizontally shifted for visual clarity. The solid line and the dashed line represent the best-fit NFW model and the halo model, respectively, derived from the fiducial P -cut measurements. The dotted line shows the 2-halo term of the best-fit halo model. The figure is adapted from Umetsu et al. (2020). Figure 15 shows the stacked weak-lensing signals around a sample of 136 spectroscopically confirmed X-ray groups and clusters at 0.033 z l 1.033 selected from the XMM-XXL survey . Their weak-lensing analysis is based on HSC-SSP survey data. The figure compares stacked ∆Σ + profiles of the XXL sample obtained with different source selection methods (see Section 4.1). This comparison shows no significant difference between these profiles within errors in all bins. From a single-mass-bin NFW fit to the stacked shear profile, Umetsu et al. (2020) found M 200c = (8.7±0.8)×10 13 h −1 M and c 200c = 3.5 ± 0.9 at a lensing-weighted mean redshift of z l ≈ 0.25. This is in agreement with the mean concentration expected for dark matter halos in the standard ΛCDM cosmology, c 200 ≈ 4.1 at M 200c = 8.7 × 10 13 h −1 M and z = 0.25 (e.g., Diemer and Joyce 2019). Figure 15 also displays the best-fit halo model including the effects of surrounding large-scale structure as a 2-halo term. Figure 15 shows that the 2-halo term in the range R ∈ [0.3, 3] h −1 Mpc (comoving) is negligibly small even in low-mass clusters and groups (e.g., Leauthaud et al. 2010;Covone et al. 2014;, for which the maximum radius corresponds to ∼ 3r 200c . This is because the tangential shear ∆Σ(R) = Σ(R) − Σ(R) is insensitive to flattened sheet-like structures .

Projected Halo Shape and Multipole Expansion
Halos formed in collisionless CDM simulations are not spherical and can have complex shapes. A more realistic description of individual cluster halos is as triaxial ellipsoids with minor-to-major axis ratios of order a/c ∼ 0.5, slowly increasing with halo-centric radius (Jing and Suto 2002;Bonamigo et al. 2015). More massive halos are less spherical and more prolate, as they tend to form later. The projected matter distributions around clusters are thus expected to be anisotropic, with typical axis ratios of q ∼ 0.6 (e.g., Okabe et al. 2018). The projected axis ratio of cluster halos varies slowly with cluster-centric distance (e.g., Okabe et al. 2018).
Following Adhikari et al. (2015), we introduce a formalism that allows for modeling the effects of halo ellipticity on weak shear observables based on an angular multipole expansion of the lensing fields. 15 Let us write the azimuthally averaged projected mass density profile as Assuming that q is constant with cluster-centric radius, we can write the surface mass density around clusters as Σ(x, y) ∝ R −η 0 q (Adhikari et al. 2015;Clampitt and Jain 2016), where R q is an elliptical radial coordinate defined as (Evans and Bridle 2009;Oguri et al. 2012;Umetsu et al. 2012Umetsu et al. , 2018: with q the minor-to-major axis ratio (0 < q 1). Here we have chosen the Cartesian coordinate system (x, y) centered on the halo, such that the x-axis is aligned with the major axis of the projected ellipse. We define the corresponding mass ellipticity by e = (1 − q 2 )/(1 + q 2 ).
We express the multipole expansion of Σ as (Adhikari et al. 2015;Clampitt and Jain 2016): where ϕ is the azimuthal angle relative to the halo's major axis, the multipole Σ (m) (R) is the coefficient of the e imϕ component of the azimuthal behavior, and we assume eη 0 /2 1 to justify the neglect of higher-order terms in the expansion. We thus model the projected mass distributions of clusters as the sum of a monopole and a quadrupole. We further assume that 15 For example, we decompose the κ field into angular multipoles as κ(R, ϕ) = ∞ m=−∞ κ (m) (R) e imϕ . Explicitly, the multipoles are κ (0) (R) = (2π) −1 κ(R, ϕ) dϕ for m = 0 and κ (m) (R) = π −1 κ(R, ϕ) cos mϕ dϕ for m 1.
The quadrupole Σ (2) can thus be completely determined by the monopole Σ (0) ∝ R −η 0 , up to a multiplicative factor corresponding to the halo ellipticity e.
Similarly, the quadrupole moments of the tangential (+) and cross (×) shear components are given by (Adhikari et al. 2015): where I 1 (R) and I 2 (R) are defined by (Clampitt and Jain 2016): Equation (143) suggests an optimal estimator weighted by cos 2ϕ to extract the quadrupole of the excess surface mass density, ∆Σ (2) (R), from tangential shear measurements. Weighted quadrupole estimators for the tangential and cross shear components are given by (Natarajan and Refregier 2000;Mandelbaum et al. 2006 where ∆Σ +,× (R|z l , z s ) = Σ cr (z l , z s )g +,× (R|z l , z s ) (Equation (94)); w ls = Σ −2 cr,ls /σ 2 g,ls is the statistical weight for each lens-source pair (ls), with σ g,ls the statistical uncertainty per shear component (see Equation (98)); and ϕ ls is the azimuthal angle of each source galaxy (s) relative to the major axis of each cluster lens (l). In real observations, we must rely on the major axis of the distribution of baryonic tracers (e.g., central galaxies, X-ray gas) in order to perform aligned, stacked lensing measurements by Equation (145)  As discussed by Mandelbaum et al. (2006), in practical applications, Equation (145) is susceptible to a possible systematic alignment of lens galaxy (e.g., BCGs) and source ellipticities. Such a spurious alignment signal can arise from an incomplete correction of the PSF anisotropy, which tends to affect neighboring objects in a similar manner. On the other hand, when interpreting the quadrupole shear signal, one must take into account possible misalignment between the underlying matter and tracer distributions, which will cause a dilution of the quadrupole shear signal. Moreover, modeling of the quadrupole shear based on the multipole expansion (Equation (142)) should only be applied to the case with a small halo ellipticity (eη/2 1), so that the higher-order terms can be safely ignored (see Equation (141)). Fig. 16 The quadrupole shear pattern produced by an elliptical mass distribution. The x-axis of the Cartesian coordinate system is aligned with the major axis of the tracer distribution, which is assumed to be aligned with the major axis of the underlying mass distribution. The sign convention for the Cartesian shear components (γ 1 , γ 2 ) is shown at the right. Note that the monopole shear (which is purely tangential) is not contributing to the shear pattern illustrated here. The figure is adapted from Clampitt and Jain (2016). Now we introduce the Cartesian estimator of Clampitt and Jain (2016). Compared to the estimator of Natarajan and Refregier (2000), a practical advantage of this estimator is that one of the two Cartesian components (∆Σ (±) 2 defined below) is insensitive to the spurious alignment of lens-source galaxy ellipticities (Clampitt and Jain 2016) discussed at the end of Section 4.6.1. With this estimator, we measure the stacked quadrupole shear signal with respect to a coordinate system with the x-axis aligned with the major axis of the distribution of baryonic tracers (e.g., central galaxies, X-ray gas). The monopole signal is nulled with this Cartesian estimator. We adopt the same sign convention for the Cartesian γ 1 and γ 2 components as defined in Clampitt and Jain (2016) and use ϕ to denote the azimuthal angle relative to the x-axis of each cluster. This is illustrated in Figure 16.

Cartesian Estimator
The Cartesian shear components are related to the tangential and cross components (see Equation (83)) by γ 1 (R, ϕ) = −γ + (R, ϕ) cos 2ϕ + γ × (R, ϕ) sin 2ϕ, In the framework of Adhikari et al. (2015) based on the multipole expansion, the multipole moments of the Cartesian shear components are written as follows (Clampitt and Jain 2016): (147) Equation (147) shows that the azimuthal dependence of the Cartesian shear components goes as cos 4ϕ (except for the two terms without ϕ dependence; see Clampitt and Jain 2016 for more discussion) and sin 4ϕ, so that there is a sign change in both components after every angle π/4. When moving around the circle, the shear signal from elliptical clusters transitions between regions where γ  Clampitt and Jain (2016). The x-axis of the Cartesian coordinate system is aligned with the major axis of the tracer distribution (black solid ellipse). We group together the Cartesian first and second shear components in same-sign regions of cos 4ϕ and sin 4ϕ (gray-shaded regions), respectively, and define four quadrupole shear components, namely, ∆Σ Following Clampitt and Jain (2016), we group together the first and second shear components (g 1 , g 2 ) of background source galaxies in the regions where cos 4ϕ and sin 4ϕ have the same sign (see Figure 17), respectively, and define the following estimator: where we have introduced the notation in analogy to the tangential shear (Equation (94)), where w ls = Σ −2 cr,ls /σ 2 g,ls is the statistical weight for each lens-source pair (ls), with σ g,ls the statistical uncertainty per shear component (see Equation (98)); and s runs over all source galaxies that fall in the specified bin (R, ϕ), different for each shear component i and sign (Clampitt and Jain 2016): i = 1, sign = −, −π/8 ϕ < π/8; i = 1, sign = +, π/8 ϕ < 3π/8; i = 2, sign = −, 0 ϕ < π/4; i = 2, sign = +, π/4 ϕ < π/2. For each case, the summation in Equation (148) also includes source galaxies lying in symmetrical regions shifted by π/2, π, and 3π/2, as illustrated in Figure 17. Figure 18 shows the stacked quadrupole shear profiles, ∆Σ , derived for a sample of 20 high-mass CLASH clusters ). The quadrupole shear signal was measured with respect to the major axis of the X-ray gas shape of each cluster. Umetsu et al. (2018) modeled the stacked ∆Σ (±) 1,2 profiles by assuming an elliptical NFW density profile with the major axis aligned with the X-ray major axis (for an elliptical extension of lensing mass models, see Keeton 2001). Any misalignment would thus lead to a dilution of the quadrupole signal and hence an underestimation of the halo ellipticity. Umetsu et al. (2018) obtained stacked constraints on the projected axis ratio of q = 0.67 ± 0.10 (or 1 − q = 0.33 ± 0.10), which is fully consistent with the median axis ratio q = 0.67 ± 0.07 of this sample obtained from their two-dimensional shear and magnification analysis of the 20 individual clusters. Their results suggest that the total matter distribution is closely aligned with the X-ray brightness distribution (with a median misalignment angle of |∆PA| = 21 • ± 7 • ) as expected from cosmological hydrodynamical simulations (see Okabe et al. 2018).

Magnification Bias
In addition to the shape distortions, gravitational lensing can cause an isotropic focusing of light rays, which results in an amplification of the image flux through the solid-angle distortion (Section 2.6.3). Lensing magnification provides complementary and independent observational alternatives to gravitational shear, especially at high redshift where source galaxies are more difficult to resolve (Van Waerbeke et al. 2010;Hildebrandt et al. 2011;Ford et al. 2014;Chiu et al. 2016Chiu et al. , 2020.

Magnified Source Counts
Let us consider source number counts n 0 (> F ) per unit solid angle as a function of the limiting flux F for a given population of background objects (e.g., color-magnitude-selected galaxies, quasars, etc.). In the absence of gravitational lensing, the intrinsic source counts can be written as where d 2 V (z)/dz/dΩ is the comoving volume element per redshift interval per unit solid angle, d 2 N (L, z)/dL/dV is the luminosity function of the background population, L(z) = 4πD 2 L (z)F is the luminosity threshold corresponding to the flux limit F at redshift z, with D L (z) the luminosity distance, and dn 0 [z| > L(z)]/dz is the redshift distribution function.
The former effect reduces the geometric area in the source plane, thus decreasing the observed number of background sources per unit solid angle. On the other hand, the latter effect amplifies the flux of background sources, increasing the observed number of sources above the limiting flux.
In the presence of gravitational lensing, the magnified source counts are given as where dn µ [z| > L(z)]/dz is the magnified redshift distribution function of the source population. Hence, the net change of the magnified source counts n µ (> F )/n 0 (> F ), known as magnification bias, depends on the intrinsic (unlensed) source luminosity function, d 2 N (L, z)/dL/dV . One can calculate the expectation value for the magnified source counts n µ (> F ) for a given background cosmology and a given source luminosity function.
In real observations, we apply different cuts (e.g., size, magnitude, and color cuts) in source selection for measuring the shear and magnification effects, thus leading to different source redshift distributions. In contrast to the former effect, measuring the effect of magnification bias does not require source galaxies to be spatially resolved, but it does require a stringent flux limit against incompleteness effects (Hildebrandt 2016;Chiu et al. 2020).
Equation (151) indicates that, when redshift information of individual source galaxies is available from spectroscopic redshifts, we can directly measure the magnified redshift distribution of background source galaxies for a flux-limited sample : Hence, in principle, the lensing-induced distortion of the redshift distribution dn µ [z| > L(z)]/dz can be measured from spectroscopic-redshift measurements with respect to the unlensed distribution dn 0 [z| > L(z)]/dz, which can be found in random fields. In particular, the integrated magnification-bias effect will translate into an enhancement in mean source redshift of the background sample (i.e., the first moment of Equation (152)). Using 300,000 BOSS survey galaxies with accurate spectroscopic redshifts, Coupon et al. (2013) measured their mean redshift depth behind four large samples of optically selected clusters from the Sloan Digital Sky Survey (SDSS) survey, totaling 5000-15,000 clusters. They found a > ∼ 1 percent level of mean redshift increase δz(R) toward the cluster center for SDSS-defined optical clusters with an effective mass of M 200c ∼ (1 − 2) × 10 14 M .

Magnification Observables
To simplify the calculations, we discretize Equation (151) as where n µ (z s | > F ) represents a subsample of the background population in the redshift interval [z s , z s + ∆z]. If the change in flux due to magnification is small compared to the range over which the slope of the luminosity function varies, the intrinsic source counts n 0 [z s | > L(z s )] can be approximated at L(z s ) by a power law with a logarithmic slope of 17 The magnified source counts n µ (z s | > F ) in the redshift interval [z s , z s + ∆z] are given by The corresponding magnification bias is given by Equation (156) implies a positive bias for α > 1 and a negative bias for α < 1. The net magnification effect on the source counts vanishes at α = 1. For a depleted sample of background sources with α 1, the effect of magnification bias is dominated by the geometric area distortion (b µ → µ −1 at α → 0) and insensitive to the intrinsic source luminosity function (Umetsu 2013). In the weak-lensing limit (|γ|, |κ| 1), we have b µ 1 + 2(α − 1)κ.
Hence, the flux magnification bias b µ in the weak-lensing limit provides a local measure of the surface mass density field, κ(θ). The combination of shear and magnification can thus be used to break or alleviate mass-sheet degeneracy (Schneider et al. 2000;Broadhurst et al. 2005b;Umetsu and Broadhurst 2008;Umetsu et al. 2011bUmetsu et al. , 2014Umetsu et al. , 2018.
In practical applications, we need to average over a broad range of source redshifts to increase the S/N. The magnification bias averaged over the source redshift distribution is expressed as In the continuous limit s n 0 (z s | > F ) → dz dn 0 (z| > F )/dz, we have the following equation (Umetsu 2013;Umetsu et al. 2016): Equation (159) gives a general expression for the flux magnification bias. Deep multi-band photometry spanning a wide wavelength range allows us to identify distinct populations of background galaxies (e.g., Medezinski et al. 2010Medezinski et al. , 2011Medezinski et al. , 2018bUmetsu et al. 2012Umetsu et al. , 2014Umetsu et al. , 2015. Since a given flux limit (F ) corresponds to different intrinsic luminosities L(z s ) at different source redshifts z s (Equation (150)), source counts of distinctly different background populations probe different regimes of magnification bias. The bias is strongly negative for quiescent galaxies at z s ∼ 1, with a faint-end slope of α ∼ 0.4 at the limiting magnitude z ≈ 25.6 ABmag (e.g., Umetsu et al. 2014Umetsu et al. , 2015. A net count depletion (b µ < 1) results for such a source population with α 1 (e.g., Broadhurst 1995;Taylor et al. 1998;Broadhurst et al. 2005b;Umetsu and Broadhurst 2008;Umetsu et al. 2011bUmetsu et al. , 2012Umetsu et al. , 2014Umetsu et al. , 2015Ford et al. 2012;Coe et al. 2012;Medezinski et al. 2013;Radovich et al. 2015;Ziparo et al. 2016;Wong et al. 2017), because the effect of magnification bias is dominated by the geometric area distortion. In the regime of density depletion, a practical advantage is that the effect is not sensitive to the exact form of the source luminosity function. The S/N for detection of b µ increases progressively as the flux limit F decreases. Fig. 19 Weak-lensing radial profiles of MACS J1206.2−0847 (z l = 0.439) from Subaru/Suprime-Cam observations. The top panel shows the reduced tangential shear profile g + (squares). The bottom panel shows the coverage-and masking-corrected number-count profiles nµ for flux-limited samples of blue and red background galaxies (blue and red circles, respectively). The error bars include contributions from Poisson counting uncertainties and those from intrinsic angular clustering of each source population. For the red sample, a strong depletion of the source counts is seen toward the cluster center due to magnification of the sky area, while a slight enhancement of blue counts is present in the innermost radial bins due to the effect of positive magnification bias. Also shown for each observed profile is the joint Bayesian reconstruction (shaded area) from combined strong-lensing, weak shear lensing, and positive/negative magnification-bias measurements. The figure is adapted from Umetsu (2013). Figure 19 displays weak-lensing radial profiles for the cluster MACS J1206.2−0847 at z l = 0.439 derived from Subaru/Suprime-Cam observations (Umetsu 2013). It is a highly massive X-ray cluster with M 200c = (15.9 ± 3.6) × 10 14 M (Umetsu et al. 2014) targeted by the CLASH survey (Postman et al. 2012). The black squares in the top panel show the reduced tangential shear profile g + (R). The blue and red circles in the bottom panel are positive and negative magnification-bias measurements n µ (R) showing density enhancement and depletion, respectively, as a function of cluster-centric radius R. These weak-lensing measurements yield respective S/N values of 10.2, 2.9, and 4.7. Figure 19 also shows a joint Bayesian reconstruction of each observed profile obtained from combined strong-lensing, weak shear lensing, and positive/negative magnification-bias measurements.

Nonlinear Effects on the Source-averaged Magnification Bias
It is instructive to consider a maximally depleted population of source galaxies with α = −d log n 0 (> F )/d log F = 0 at the limiting flux F . For such a population, the effect of magnification bias is purely geometric, b µ = µ −1 , and insensitive to details of the intrinsic source luminosity function, d 2 N (L, z)/dL/dv. In the nonlinear subcritical regime, the source-averaged inverse magnification factor is expressed as (Umetsu 2013): where ... denotes the averaging over the source-redshift distribution (see Equation (159)), f l = Σ −2 cr,l / Σ cr,l 2 is a quantity of the order of unity, and ∆ µ −1 is the correction with respect to the single source-plane approximation. The error associated with the single source-plane approximation is ∆ µ −1 / µ −1 (f l − 1) κ 2 − | γ | 2 , which is much smaller than unity for background populations with α ∼ 0 in the mildly nonlinear subcritical regime where | κ | ∼ | γ | ∼ O(10 −1 ). It is therefore reasonable to use the single sourceplane approximation for calculating the magnification bias of depleted source populations with α 0. In the regime of density enhancement (α > 1), on the other hand, interpreting the observed lensing signal requires detailed knowledge of the intrinsic source luminosity function (see, e.g., Chiu et al. 2016Chiu et al. , 2020, especially in the nonlinear regime where the flux amplification factor is correspondingly large (say, µ > ∼ 1.5). For example, a blue distant population of background galaxies is observed to have a well-defined redshift distribution that is fairly symmetric and peaked at a mean redshift of z s ∼ 2 (e.g., Lilly et al. 2007;Medezinski et al. 2010). Therefore, the majority of these faint blue galaxies are in the far background of typical cluster lenses, so that the lensing signal has a weaker dependence on the source redshift z s . In such a case, the single source-plane approximation may well be justified (Umetsu 2013).

Observational Systematics and Null Tests
In real observations, contamination of the background sample by unlensed galaxies is a critical source of systematics in cluster weak lensing, as discussed in Section 4.1. In particular, contamination by cluster galaxies has a direct impact on the interpretation of background source counts because it will cause an apparent density enhancement at small cluster-centric radii. To avoid significant contamination and alleviate this problem as much as possible, one often relies on a stringent color-color selection (Section 4.1) to measure the lensing magnification signal from a distinct population of background galaxies (e.g., Broadhurst et al. 2005b;Umetsu and Broadhurst 2008;Umetsu et al. 2011bUmetsu et al. , 2014Ziparo et al. 2016;Chiu et al. 2016Chiu et al. , 2020. If well-calibrated photo-z PDFs are available from multi-band observations, the impact of cluster contamination can be characterized and assessed by statistically decomposing the photo-z PDF P (z) into the cluster and random-field populations (e.g., Gruen et al. 2014;Chiu et al. 2020).
Moreover, for unbiased magnification-bias measurements, one has to correct for the incomplete area coverage due to masked regions and incomplete measurement annuli. Specifically, masking (or blocking) of background galaxies by foreground objects, cluster members, and saturated or bad pixels needs to be properly accounted for (see Umetsu et al. 2011b;Chiu et al. 2020). Another concern is the impact of blending effects in the crowded regions of cluster environments and the presence of intracluster light (Gruen et al. 2019a), which could bias the photometry and thus photo-z's. The effects of masking and blending on the source counts can be examined and quantified by injecting synthetic galaxies into real images from observations (Huang et al. 2018;Chiu et al. 2020).  Since the net effect of magnification bias is expected to vanish for a flux-limited background sample defined at α = 1 (Section 5.2), weak-lensing magnification provides a powerful null test, similar to the cross-shear (B-mode) signal in the case of weak shear lensing (Section 4.2). By performing a null test, one can empirically assess the level of residual bias that could be present in the measurement for a "lensing-cut" sample defined at α > 1 or α < 1. The only assumption made in this approach is that residual systematics are the same between the lensing-cut and null-test samples defined at different flux (magnitude) limits. This null test allows us to quantify the impact of deblending effects, biased photometry in crowded regions, and any incorrect assumptions about P (z)-decomposition . This is demonstrated in Figure 20. The figure shows the stacked magnification-bias profiles around a sample of 3029 CAMIRA clusters with richness N > 15 in the redshift range 0.2 z < 1.1, obtained using flux-limited low-z and high-z background samples, as well as the joint sample, selected in the g − i versus r − z diagram from HSC-SSP survey data . The magnification-bias signal for the full CAMIRA sample is detected at a significance level of 9.5σ. On the other hand, the residual bias estimated from the null-test samples was found to be statistically consistent with zero .

Recent Advances in Cluster Weak Lensing Observations
Galaxy clusters provide valuable information from the physics driving cosmic structure formation to the nature of dark matter and dark energy. Their content reflects that of the universe: ∼ 85% dark matter and ∼ 15% baryons (cf. Ω b /Ω m = (15.7 ± 0.4)%; Planck Collaboration et al. 2016a), with ∼ 90% of the baryons residing in the gaseous intracluster medium. Since clusters are highly massive and dominated by dark matter, they offer fundamental tests on the assumed properties of dark matter, as well as on models of nonlinear structure formation. The ΛCDM paradigm assumes that dark matter is effectively cold (nonrelativistic) and collisionless on astrophysical and cosmological scales (Bertone and Tait 2018). In this context, the standard CDM model and its variants, such as selfinteracting dark matter (SIDM; Spergel and Steinhardt 2000) and wave dark matter (ψDM; Schive et al. 2014), can provide a range of testable predictions for the properties of clustersize halos (Sections 6.1 and 6.2). A prime example is the "Bullet Cluster", a merging pair of clusters exhibiting a significant offset between the centers of the gravitational lensing mass and the X-ray peaks of the collisional cluster gas (Clowe et al. 2004. The data support that dark matter in clusters is effectively collisionless like galaxies, placing a robust upper limit on the self-interacting cross section for dark matter of σ DM /m < 1.25 cm 2 g −1 (Randall et al. 2008).
The abundance of clusters as a function of redshift provides a sensitive probe of the amplitude and growth rate of primordial density perturbations, as well as of the cosmological volume element d 2 V (z)/dz/dΩ. This cosmological sensitivity arises mainly because clusters populate the high-mass exponential tail of the halo mass function (e.g., Haiman et al. 2001). In principle galaxy clusters can complement other cosmological probes, such as CMB, galaxy clustering, cosmic shear, and distant supernova observations. To place cosmological constraints using clusters, it is essential to study large cluster samples with wellcharacterized selection functions, spanning a wide range in mass and redshift (Allen et al. 2004;Vikhlinin et al. 2009;Mantz et al. 2010). However, the ability of galaxy clusters to provide robust cosmological constraints is currently limited by systematic uncertainties in their mass calibration (Pratt et al. 2019). Since the level of mass bias is sensitive to calibration systematics of the instruments (Donahue et al. 2014;Israel et al. 2015) and is likely mass dependent (Sereno and Ettori 2017;Umetsu et al. 2020), a concerted effort is needed to enable an accurate mass calibration with weak gravitational lensing (see Section 6.4).
Substantial progress has been made in constructing statistical samples of galaxy clusters thanks to dedicated wide-field surveys in various wavelengths (Planck Collaboration et al. 2014a, 2015aBleem et al. 2015;Miyazaki et al. 2018b;Oguri et al. 2018). Systematic lensing studies of galaxy clusters often target X-ray-or SZE-selected samples (e.g., von der Linden et al. 2014a ;Postman et al. 2012;Gruen et al. 2014;Hoekstra et al. 2015;. This is because the hot intracluster gas provides an excellent tracer of the cluster's gravitational potential (e.g., Donahue et al. 2014), except for the cases of massive cluster collisions caught in an ongoing phase of dissociative mergers Okabe and Umetsu 2008). Moreover, X-ray and SZE observations provide useful centering information of individual clusters. The effect of off-centered clusters is to dilute and flatten the observed Σ(R) profile at scales smaller than the offset scale σ off (Johnston et al. 2007;Du and Fan 2014). Since flattened, sheet-like mass distributions produce little shear, the impact of miscentering on ∆Σ(R) is much larger. The off-centered ∆Σ(R) profile is strongly suppressed by smoothing at scales R < ∼ 2.5σ off (Johnston et al. 2007). A comparison of X-ray, SZE, and optical (e.g., BCGs) center positions allows us to empirically assess the level of halo miscentering. It should be noted, however, that a merger can boost the X-ray and SZE signals and make their peaks off-centered during the compression phase (Molnar et al. 2012). Although the timescale on which this happens is expected to be short (∼ 1 Gyr; Ricker and Sarazin 2001), it could induce a selection effect and contribute to the scatter in their scaling relations .
In this section, we review recent advances in our understanding of the distribution and amount of mass in galaxy clusters based on cluster weak-lensing observations.

Cluster Mass Distribution
The distribution and concentration of mass in dark-matter-dominated halos depend fundamentally on the properties of dark matter. For the case of collisionless CDM models, cosmological N -body simulations with sufficiently high resolution can provide accurate predictions for the end product of collisionless collapse in a expanding universe. Although the formation of halos is a complex, nonlinear dynamical process and halos are evolving through accretion and mergers, ΛCDM models predict that the structure of quasi-equilibrium halos characterized in terms of the spherically averaged density profile ρ(r) is approximately selfsimilar with a characteristic density cusp in their centers, ρ(r) ∝ 1/r (Navarro et al. 1996(Navarro et al. , 1997. The density profile ρ(r) of dark-matter-dominated halos steepens continuously with radius and it is well described by the NFW form out to the virial radius, albeit with large variance associated with the assembly histories of individual halos (Jing and Suto 2000).
Subsequent numerical studies with improved statistics and higher resolution found that the spherically averaged density profiles of ΛCDM halos are better approximated by the three-parameter Einasto function with an additional degree of freedom (Merritt et al. 2006;Gao et al. 2008), which is closely linked with the mass accretion history of halos (van den Bosch 2002; Ludlow et al. 2013). The Einasto profile has a power-law logarithmic slope of γ 3D (r) = −2(r/r −2 ) α E (Section 4.3.3). For a given halo concentration, an Einasto profile with α E ≈ 0.18 closely resembles the NFW profile over roughly two decades in radius (Ludlow et al. 2013). The shape parameter α E of ΛCDM halos increases gradually with halo mass and redshift (see Gao et al. 2008;Child et al. 2018, 0.15 < ∼ α E < ∼ 0.25 at z = 0), so that the density profiles of ΛCDM halos are not strictly self-similar (Navarro et al. 2010). By analyzing a large suite of N -body simulations in ΛCDM, Child et al. (2018) found that both Einasto and NFW profiles provide a good description of the stacked mass distributions of cluster-size halos at low redshift, implying that the two fitting functions are nearly indistinguishable for stacked ensembles of low-redshift clusters, in contrast to clusters at higher redshift (z > ∼ 1). The three-dimensional shape of collisionless halos is predicted to be generally triaxial with a preference for prolate shapes (Warren et al. 1992;Jing and Suto 2002), reflecting the collisionless nature of dark matter (Ostriker and Steinhardt 2003). Older halos tend to be more relaxed and thus to be rounder. Since more massive halos form later on average, cluster-size halos are expected to be more elongated than less massive systems (Shaw et al. 2006;Ho et al. 2006;Despali et al. 2014Despali et al. , 2017Bonamigo et al. 2015). Accretion of matter from the surrounding large-scale environment also plays a key role in determining the shape and orientation of halos. The halo orientation tends to be in the preferential infall direction of the subhalos and hence aligned along the surrounding filaments (Shaw et al. 2006). The shape and orientation of galaxy clusters thus provide an independent test of models of structure formation (see Section 4.6).
Prior to dedicated wide-field optical imaging surveys such as Subaru HSC-SSP and DES, several cluster lensing surveys carried out deep targeted observations toward a few to several tens of highly massive galaxy clusters with M 200c ∼ 10 15 M (e.g., Postman et al. 2012;Okabe et al. 2013;von der Linden et al. 2014a;Hoekstra et al. 2015). Since such clusters are extremely rare across the sky, targeted weak-lensing observations with deep multi-band imaging currently represent the most efficient approach to study in detail the high-mass population of galaxy clusters each with sufficiently high S/N (see Contigiani et al. 2019). In the last decade, cluster-galaxy weak-lensing observations have established that the total matter distribution within clusters in projection can be well described by cuspy, outwardsteepening density profiles (Umetsu et al. 2011b(Umetsu et al. , 2014Newman et al. 2013b;Okabe et al. 2013;, such as the NFW and Einasto profiles with a near-universal shape (Niikura et al. 2015;Umetsu and Diemer 2017), as predicted for collisionless halos in quasi-gravitational equilibrium (e.g., Navarro et al. 1996Navarro et al. , 1997Taylor and Navarro 2001;Merritt et al. 2006;Gao et al. 2008;Hjorth and Williams 2010;Williams and Hjorth 2010). Moreover, the shape and orientation of galaxy clusters as constrained by weak-lensing and multiwavelength data sets are found to be in agreement with ΛCDM predictions (e.g., Oguri et al. 2005;Evans and Bridle 2009;Morandi et al. 2012;Sereno et al. 2013Sereno et al. , 2018bUmetsu et al. 2015Umetsu et al. , 2018Chiu et al. 2018;Shin et al. 2018), although detailed studies of individual clusters are currently limited to a relatively small number of high-mass clusters with deep multiwavelength observations (see Sereno et al. 2018b;Umetsu et al. 2018). These results are all in support of the standard explanation for dark matter as effectively collisionless and nonrelativistic on sub-megaparsec scales and beyond.
In Figure 21, we show the ensemble-averaged ∆Σ + profile in the radial range R ∈ [0.1, 2.8] h −1 Mpc obtained for a stacked sample of 50 X-ray clusters  targeted by the LoCuSS Survey (Local Cluster Substructure Survey; Smith et al. 2016). Their weak shear lensing analysis is based on two-band imaging observations with Subaru/Suprime-Cam. Their cluster sample is drawn from the RASS catalogs at 0.15 < z < 0.3 and is approximately X-ray luminosity limited . The stacked shear profile of the LoCuSS sample is in excellent agreement with the NFW profile with M 200c = 6.37 +0.28 −0.27 × 10 14 h −1 M and c 200c = 3.69 +0.26 −0.24 at z l = 0.23. The 2-halo term contribution to ∆Σ for the LoCuSS sample is negligibly small in the radial range < ∼ 2r 200c . From a single Einasto profile fit to the stacked ∆Σ profile, Okabe and Smith (2016) obtained the best-fit Einasto shape parameter of α E = 0.161 +0.042 −0.041 , which is consistent within the errors with the ΛCDM predictions for cluster-size halos at z l = 0.23 (Gao et al. 2008;Dutton and Macciò 2014). The averaged mass profile is well described by a family of density profiles predicted for dark-matter-dominated halos in gravitational equilibrium, such as the NFW, Einasto, and DARKexp models. Cuspy halo models (red) that include the 2-halo contribution from surrounding large scale structure provide improved agreement with the data. This is a slightly modified version of the figure presented in Umetsu et al. (2016). Figure 22 shows the ensemble-averaged Σ profile of 16 CLASH X-ray-selected clusters ) based on a joint strong and weak lensing analysis of 16-band Hubble Space Telescope (HST) observations (Zitrin et al. 2015) and wide-field multicolor imaging taken primarily with Subaru/Suprime-Cam (Umetsu et al. 2014). The CLASH survey is an HST Multi-Cycle Treasury program designed to study with 525 assigned orbits the mass distributions of 25 high-mass clusters (Postman et al. 2012). In this sample, 20 clusters were selected to have regular X-ray morphologies and X-ray temperatures above 5 keV. Numerical simulations suggest that this X-ray-selected subsample is mostly composed of relaxed clusters (∼ 70%) but the rest (∼ 30%) are unrelaxed systems (Meneghetti et al. 2014). Another subset of five clusters were selected by their high-magnification lensing properties. Umetsu et al. (2016) studied a subset of 20 CLASH clusters (16 X-ray-selected and 4 highmagnification systems) taken from Umetsu et al. (2014), who presented a joint shear and magnification weak-lensing analysis of these individual clusters. The stacked Σ profile over two decades in radius, R ∈ [0.02, 2]r 200m , is well described by a family of density profiles predicted for cuspy dark-matter-dominated halos in gravitational equilibrium, namely, the NFW, Einasto, and DARKexp models (Umetsu et al. 2016). 18 In contrast, the single power-law, cored-isothermal, and Burkert density profiles are statistically disfavored by the data. Cuspy halo models that include the 2-halo term provide improved agreement with the data. Umetsu et al. (2016) found the best-fit NFW parameters for the stacked CLASH Σ profile of M 200c = 10.1 +0.8 −0.7 × 10 14 h −1 M and c 200c = 3.76 +0.29 −0.27 ) at a lensing-weighted mean redshift of z l ≈ 0.34. Similarly, the best-fit Einasto shape parameter for the stacked Σ profile is α E = 0.232 +0.042 −0.038 , which is in excellent agreement with predictions from ΛCDM numerical simulations, α E = 0.21 ± 0.07 (Meneghetti et al. 2014, α E = 0.24 ± 0.09 when fitted to surface mass density profiles of projected halos).
Note that the innermost bin in Figure 22 represents the mean density interior to R min = 40h −1 kpc corresponding to the typical resolution limit of their HST strong-lensing analysis, δϑ ≈ 10 arcsec. This scale R min is about twice the half-light radius of the CLASH BCGs (see Tian et al. 2020), within which the stellar baryons dominate the total mass of the clusters (e.g., Caminha et al. 2019). Determinations of the central slope of the dark matter density profile ρ DM (r) in clusters require additional constraints on the total mass in the innermost region, such as from stellar kinematics of the BCG (Newman et al. 2013a). In order to constrain ρ DM (r), one needs to carefully model the different contributions to the cluster total mass profile, coming from the stellar mass of member galaxies, the hot gas component, the BCG stellar mass, and dark matter (Sartoris et al. 2020). Moreover, it is important to take into account the velocity anisotropy on the interpretation of the line-of-sight stellar velocity dispersion profile of the BCG (Sartoris et al. 2020;He et al. 2020). Current measurements and interpretations of the asymptotic central slope of ρ DM (r) in galaxy clusters appear to be controversial (e.g., Newman et al. 2013a;Sartoris et al. 2020;He et al. 2020).
According to cosmological N -body simulations, the spherically averaged density profiles in the halo outskirts are most self-similar when expressed in units of overdensity radii r ∆ m defined with respect to the mean density of the universe, ρ(z). especially to r 200m (Diemer and Kravtsov 2014). This self-similarity indicates that overdensity radii defined with respect to the mean cosmic density are preferred to describe the structure and evolution of the outer density profiles. The structure and dynamics of the infall region are expected to be universal in units of the turnaround radius, according to self-similar infall models (Gunn and Gott 1972;Fillmore and Goldreich 1984;Bertschinger 1985;Shi 2016). In these models, the turnaround radius is a fixed multiple of the radius enclosing a given fixed overdensity with respect to the mean cosmic density. The outer profiles can thus be expected to be selfsimilar in r/r 200m . In contrast, the density profiles in the intra-halo (1-halo) region are found to be most self-similar when they are scaled by r 200c (Diemer and Kravtsov 2014). That is, density profiles of ΛCDM halos in N -body simulations prefer different scaling radii in different regions of the density profile (Diemer and Kravtsov 2014). These empirical scalings were confirmed in cosmological hydrodynamical simulations of galaxy clusters (Lau et al. 2015, see also Shi 2016). However, the physical explanation for the self-similarity of the inner density profile when rescaled with r 200c is less clear.
In Figure 23, we show the projected total mass density (Σ) and enclosed mass (M 2D ) profiles for seven CLASH clusters derived from a detailed strong-lensing analysis of Caminha et al. (2019) based on extensive spectroscopic information, primarily from the Multi Unit Spectroscopic Explorer (MUSE) archival data complemented with CLASH-VLT red-

Fig. 23
Projected projected mass density (left) and enclosed mass (right) profiles for seven CLASH clusters derived from a detailed strong-lensing analysis, rescaled by M 200c and r 200c obtained from NFW fits to independent weak-lensing measurements . Vertical lines color-coded for each cluster indicate the positions of multiple images used for the lens modeling, all belonging to spectroscopic confirmed families. MACS J0416 with a shallower inner density slope is a highly asymmetric merging system. The figure is adapted from Caminha et al. (2019). shift measurements (Biviano et al. 2013;Rosati et al. 2014). In the figure, the projected mass profiles of individual clusters are rescaled using M 200c and r 200c obtained from NFW fits to independent ground-based weak-lensing measurements . All clusters have a relatively large number of multiple-images constraints in the region 10 −2 < ∼ R/r 200c < ∼ 10 −1 , where shape of the rescaled Σ(R) and M 2D (< R) profiles are remarkably similar. Even MACS J0416 (Zitrin et al. 2013;Jauzac et al. 2015;Grillo et al. 2015;Balestra et al. 2016), which is a highly asymmetric merging system, does not deviate significantly from the overall homologous profiles. Within 10% and 20% of r 200c for the seven clusters, Caminha et al. (2019) measured a mean projected total mass value of 0.13 and 0.32 × M 200c , respectively, finding a remarkably small scatter of 5% and 6%. At these same radii, for the projected total mass density profiles, they found a mean value of 9.0 and 4.7 × M 200c /(πr 2 200c ), with a slightly larger scatter of 7% and 9%, respectively. The observed trend is qualitatively consistent with the predictions by Diemer and Kravtsov (2014) and Lau et al. (2015).

The Concentration-Mass Relation
The halo concentration c ∆ is a key quantity that characterizes the density structure of dark matter halos, where c ∆ is defined as the ratio of the outer halo radius r ∆ (typically defined at an overdensity of ∆ c = 200) and the inner characteristic radius r s at which the logarithmic density slope is −2 (Section 4.2). 19 The halo concentration as a function of halo mass and redshift is referred to as the concentration-mass (c-M ) relation (e.g., Duffy et al. 2008;Neto et al. 2007). For NFW halos, the c-M relation fully specifies the structure of halos at fixed halo mass and thus is a key ingredient of cluster cosmology.
In hierarchical ΛCDM models, c ∆ is predicted to depend on the accretion history. In the early phase of rapid mass accretion, the scale radius r s of a halo scales approximately as the virial radius and thus c ∆ remains nearly constant (Zhao et al. 2003). During the subsequent slow accretion phase, the scale radius stays approximately constant, whereas r ∆ continues to grow through a mixture of physical accretion and pseudo-evolution, resulting in an increase in halo concentration (Navarro et al. 1997;Bullock et al. 2001;Wechsler et al. 2002;Diemer et al. 2013;Correa et al. 2015). Since the mass accretion history depends on the amplitude and shape of peaks in the initial density field, as well as on the mass scale and the background cosmology, c ∆ depends on halo mass, redshift, and cosmological parameters (Prada et al. 2012;Dutton and Macciò 2014;Diemer and Kravtsov 2015;Diemer and Joyce 2019). There have been a number of attempts to obtain a more universal representation of c ∆ as a function of physical parameters, such as the halo peak height ν(M ∆ , z) and the local slope of the matter power spectrum d ln P (k)/d ln k (see Zhao et al. 2009;Prada et al. 2012;Dutton and Macciò 2014;Diemer and Kravtsov 2015;Diemer and Joyce 2019).
Since galaxy clusters are, on average, dynamically young and still growing through accretion and mergers, cluster halos are expected to have relatively low concentrations, c 200c (z = 0) ∼ 4, in contrast to individual galaxy halos that have denser central regions, c 200c (z = 0) ∼ 7 − 8 (Bhattacharya et al. 2013;Dutton and Macciò 2014;Diemer and Kravtsov 2015;Child et al. 2018;Diemer and Joyce 2019). These general trends are complicated by diverse formation and assembly histories of individual halos (Ludlow et al. 2013), which translate into a substantial scatter (∼ 30%-35%) in the c-M relation (e.g., Bhattacharya et al. 2013;Diemer and Kravtsov 2015).
In Figure 24,  (Bhattacharya et al. 2013;Meneghetti et al. 2014;Diemer and Kravtsov 2015), indicating no significant impact of the X-ray selection on the normalization and mass slope parameters. Okabe and Smith (2016) found an intrinsic lognormal dispersion of σ int (ln c 200c ) < 20% (68.3% CL) at fixed halo mass, which is lower than found for ΛCDM halos in N -body simulations (∼ 30%-35% for the full population of halos including both relaxed and unrelaxed systems; Bhattacharya et al. 2013;Diemer and Kravtsov 2015). Similar results on the concentration scatter were obtained for independent X-ray cluster samples (e.g., the CLASH and the XXL samples with σ int (ln c 200c ) = (13 ± 6)% and σ int (ln c 200c ) < 24% at the 99.7% CL, respectively; see Umetsu et al. 2016Umetsu et al. , 2020. This is likely caused in part by the X-ray selection bias in terms of the cool-core or relaxation state, as found by previous studies (Buote et al. 2007;Ettori et al. 2010;Eckert et al. 2011;Meneghetti et al. 2014).
Cluster samples are traditionally defined by X-ray or optical observables, and more recently through the thermal SZE strength (e.g., Planck Collaboration et al. 2015a). The SZE is a characteristic spectral distortion in the CMB induced by inverse Compton scattering between cold CMB photons and hot ionized electrons (Section 2.6.5; Rephaeli 1995;Birkinshaw 1999). Unlike any other detection techniques, SZE-selected cluster samples are nearly mass-limited and have well-behaved selection functions. This is because the SZE detection signal has a very weak dependence on the redshift (see Equation (38)) and it is also less sensitive to the relaxation state of the cluster. Blind SZE surveys can thus provide unbiased cluster samples representative of the full population of halos out to high redshift. This makes SZE surveys ideal for cosmological tests based on the evolution of the cluster abundance. We show in Figure 25 the c 200c -M 200c relation at z l = 0.20 obtained for a sample of Planck clusters targeted by the PSZ2LenS project   Hildebrandt et al. 2016). The PSZ2LenS sample represents a faithful subsample of the whole population of Planck clusters, for which homogeneous weak-lensing data and photometric redshifts are available from the CFHTLenS and RCSLenS surveys . The resulting relation between mass and concentration is in broad agreement with theoretical predictions from N -body simulations calibrated for recent ΛCDM cosmologies (Bhattacharya et al. 2013;Dutton and Macciò 2014;Meneghetti et al. 2014;Ludlow et al. 2016).

Superlens Clusters: Are They Overconcentrated?
In contrast to X-ray-or SZE-selected samples, galaxy clusters identified by the presence of strongly lensed giant arcs represent a highly biased population. In particular, cluster lenses selected to have large Einstein radii (e.g., ϑ Ein > 30 arcsec for z s = 2) represent the most lensing-biased population of clusters with their major axis preferentially aligned with the observer's line of sight (Hennawi et al. 2007;Oguri and Blandford 2009). Such an extreme population of cluster lenses is referred to as superlenses (Oguri and Blandford 2009). A selection bias in favor of prolate structure pointed to the observer is expected because this orientation boosts the projected surface mass density and hence the lensing signal. A population of superlens clusters is also expected to be biased toward halos with intrinsically higher concentrations (Hennawi et al. 2007;Sereno et al. 2010). Accordingly, in the context of ΛCDM, superlens clusters are predicted to have large apparent concentrations in projection of the sky, compared to typical clusters with similar masses and redshifts (Oguri and Blandford 2009). Calculations of the enhancement of the projected mass and thus boosted Einstein radii find a statistical bias of ∼ 34% in concentration based on N -body simulations of ΛCDM cosmologies (Hennawi et al. 2007). Semianalytical simulations based on triaxial halos find a concentration bias of ∼ 40%-60% for superlens clusters (Oguri and Blandford 2009). Despite attempts to correct for potential projection and selection biases inherent to lensing, initial results from combined strong-and weak-lensing measurements assuming a spherical halo revealed a relatively high degree of halo concentration in lensing clusters (Gavazzi et al. 2003;Kneib et al. 2003;Broadhurst et al. 2008;Zitrin et al. 2011;Umetsu et al. 2011a), lying above the c-M relation calibrated for ΛCDM cosmologies (e.g., Neto et al. 2007;Duffy et al. 2008) based on earlier Wilkinson Microwave Anisotropy Probe (WMAP) releases (Spergel et al. 2003;Komatsu et al. 2009). A possible explanation for the apparent discrepancy was that cluster halos are highly overconcentrated than expected from ΛCDM models. Motivated by possible implications for the overconcentration problem, one of the key objectives of the CLASH survey is to establish the degree of mass concentration for a lensing-unbiased sample of high-mass clusters using combined strongand weak-lensing measurements with homogeneous data sets (Postman et al. 2012 (Duffy et al. 2008;Bhattacharya et al. 2013;Dutton and Macciò 2014;Meneghetti et al. 2014;Diemer and Kravtsov 2015), all evaluated at z = 0.32 for the full population of halos. The dashed lines show 60% superlens corrections to the solid lines, accounting for the effects of strong-lensing selection and orientation bias expected for the population of superlens clusters (Oguri and Blandford 2009). The figure is adapted from Umetsu et al. (2016).
As a precursor study of the CLASH survey, Umetsu et al. (2011a) carried out a combined strong-and weak-lensing analysis of four superlens clusters of similar masses (A1689, A1703, A370, and Cl0024+1654; see Umetsu et al. 2011b) using strong-lensing, weaklensing shear and magnification measurements obtained from high-quality HST and Subaru observations. The stacked sample has a lensing-weighted mean redshift of z l ≈ 0.32. These clusters display prominent strong-lensing features characterized by large Einstein radii, θ Ein > ∼ 30 arcsec for z s = 2. Umetsu et al. (2011a) found that the stacked Σ profile of the four clusters in the range R ∈ [40, 2800]h −1 kpc is well described by a single NFW profile, with an effective concentration of c 200c = 6.31 ± 0.35 at M 200c = (13.4 ± 0.9) × 10 14 h −1 M , corresponding to an Einstein radius of ϑ Ein ≈ 36 arcsec for z s = 2. After applying a 50% superlens correction, Umetsu et al. (2011a) found a dis-crepancy of ∼ 2σ with respect to the c-M relation of Duffy et al. (2008) calibrated for the WMAP five-year cosmology. They concluded that there is no significant tension between the concentrations of the four clusters and those of ΛCDM halos if lensing biases are coupled to a sizable intrinsic scatter in the c-M relation.
In Figure 26, we show in the c 200c -M 200c plane the stacked lensing constraints obtained for the CLASH X-ray-selected subsample ) and those for the superlens sample of Umetsu et al. (2011a). The stacked lensing constraints for the two cluster samples are compared to theoretical c-M relations of Duffy et al. (2008), Bhattacharya et al. (2013), Dutton and Macciò (2014), Meneghetti et al. (2014), and Diemer and Kravtsov (2015, their mean relation), all evaluated for the full population of halos at z = 0.32. This comparison demonstrates that c-M relations that are calibrated for more recent simulations and ΛCDM cosmologies (WMAP seven-and nine-year cosmologies and Planck cosmologies) provide better agreement with the CLASH lensing measurements (Umetsu et al. 2014Merten et al. 2015). This is also in line with the findings of Dutton and Macciò (2014), who showed that the c-M relation in the WMAP five-year cosmology has a 20% lower normalization at z = 0 than in the Planck cosmology, which has a correspondingly higher normalization in terms of Ω m and σ 8 .
To account for the superlens bias in the Umetsu et al. (2011a) sample, Figure 26 shows each of the c-M predictions with a maximal 60% correction applied (Oguri and Blandford 2009). We see from the figure that, once the effects of selection and orientation bias are taken into account, the results of Umetsu et al. (2011a) come into line with the models of Dutton and Macciò (2014), Meneghetti et al. (2014), and Diemer and Kravtsov (2015), the three most recent c-M models studied in Umetsu et al. (2016). Hence, the discrepancy found by Umetsu et al. (2011a) can be fully reconciled by the higher normalization of the c-M relation as favored by more recent WMAP and Planck cosmologies (Komatsu et al. 2011;Hinshaw et al. 2013;Planck Collaboration et al. 2015b). Therefore, there appears to be no compelling evidence for the overconcentration problem within the standard ΛCDM framework (see also Oguri et al. 2012;Foëx et al. 2014;Robertson et al. 2020).
It is intriguing to note that, as already discussed in Section 4.3.1, full triaxial modeling of Abell 1689 (z l = 0.183) shows that combined lensing, X-ray, and SZE observations of the cluster can be consistently explained by its intrinsically high mass concentration combined with a chance alignment of its major axis with the line-of-sight direction (Umetsu et al. 2015). A careful interpretation of lensing, dynamical, and X-ray data based on Nbody/hydrodynamical simulations suggests that Cl0024+1654 (z l = 0.395) is the result of a high-speed, line-of-sight collision of two massive clusters viewed approximately 2-3 Gyr after impact when the gravitational potential has had time to relax in the center, but before the gas has recovered . Similar to the case of Cl0024+1654, Abell 370 (z l = 0.375) is faint in both X-ray and SZE signals and does not follow the X-ray/SZE observable-mass scaling relations (see Czakon et al. 2015). N -body/hydrodynamical simulations constrained by lensing, dynamical, X-ray, and SZE observations suggest that Abell 370 is a post major merger after the second core passage in the infalling phase, just before the third core passage (Molnar et al. 2020). In this post-collision phase, the gas has not settled down in the gravitational potential well of the cluster, which explains why A370 does not follow the mass scaling relations. Note that, because of its large projected mass and high lensing magnification capability, Abell 370 has been selected as one of the six Hubble Frontier Fields clusters (Lotz et al. 2017).
Finally, it should be noted that high-magnification-selected clusters at z l > 0.5, such as those selected by the CLASH and Frontier Fields surveys (Postman et al. 2012;Lotz et al. 2017), often turn out to be dynamically disturbed, highly massive ongoing mergers (e.g., Zitrin and Broadhurst 2009;Merten et al. 2011;Zitrin et al. 2013;Medezinski et al. 2013Medezinski et al. , 2016. These ongoing mergers can produce substructured, highly elongated lenses in projection of the sky (e.g., Umetsu et al. 2005;Acebron et al. 2019), enhancing the lensing efficiency (Meneghetti et al. 2003(Meneghetti et al. , 2007Redlich et al. 2012) by boosting the number of multiple images per critical area, due to the increased ratio of the caustic area relative to the critical area (Zitrin et al. 2013). The projected mass distributions of such ongoing mergers cannot be well described by a single NFW profile. In contrast to the superlens clusters at z l < 0.4 (Umetsu et al. 2011a), NFW fits to the lensing profiles of high-magnification CLASH clusters yield relatively low concentrations (see Umetsu et al. 2016).

Splashback Radius
In the standard ΛCDM paradigm of hierarchical structure formation, galaxy clusters form through accretion of matter along surrounding filamentary structures, as well as through successive mergers of smaller objects. An emerging picture of halo assembly is that shells of matter surrounding an overdense region in the early universe will initially expand with the Hubble flow, decelerate, turn around, and fall back in. Each shell will cross previously collapsed shells that are oscillating in the growing halo potential. In this picture, accreting particles will pile up near the apocenter of their first orbit, thus creating a sharp density enhancement or caustic in the halo outskirts (Gunn and Gott 1972;Fillmore and Goldreich 1984;Bertschinger 1985). This steepening feature depends on the slope of the initial mass perturbation, which determines the mass accretion rate of dark matter halos (Fillmore and Goldreich 1984;Lithwick and Dalal 2011). This is illustrated in Figure 27. Recently, a closer examination of the halo density profiles in cosmological N -body simulations has revealed systematic deviations from the NFW and Einasto profiles in the outskirts at r > ∼ 0.5r 200m (Diemer and Kravtsov 2014). In particular, the halo profiles exhibit a sharp drop in density, a feature associated with the orbital apocenter of the recently accreted matter in the growing halo potential (see Figure 27). The location of the outermost density caustic expected in collisionless halos is referred to as the splashback radius (Diemer and Kravtsov 2014;Adhikari et al. 2014;More et al. 2015). The splashback radius constitutes a physical boundary of halos because it sharply separates the multistream intra-halo region from the outer infall region (Diemer and Kravtsov 2014;More et al. 2015;Mansfield et al. 2017;Okumura et al. 2018). It is also related to the transition scale between the 1-halo and 2-halo regimes to a certain extent (Cooray and Sheth 2002;More et al. 2015;Tomooka et al. 2020).
Splashback features are determined by the orbits of dark matter particles in the halo potential and thus fully characterized in phase space (Diemer 2017;Okumura et al. 2018). Hence, the steepening feature in the density profile alone cannot capture the full dynamical information of dark matter halos (see Okumura et al. 2018). In particular, the "true" location of the splashback radius based on particle orbits is not equivalent to a particular location in the spherically averaged density profile (Diemer 2017;Diemer et al. 2017). Keeping this in mind, it is useful to define the splashback radius r sp as the halo radius where the logarithmic slope of the three-dimensional density profile, γ 3D (r) = d ln ρ(r)/d ln r, is steepest (Diemer and Kravtsov 2014;More et al. 2015).
In the context of ΛCDM, the location of r sp with respect to r 200m is predicted to decrease with mass accretion rate s(a) ≡ d ln M vir (a)/da (Diemer and Kravtsov 2014;Adhikari et al. 2014;More et al. 2015;Diemer et al. 2017) and to increase with Ω m (a) ≡ ρ(a)/ρ c (a) (More et al. 2015;Diemer et al. 2017), with some additional dependence on peak height ν (see Diemer et al. 2017). In ΛCDM cosmologies, fast accreting halos have r sp < ∼ r 200m with a sharper splashback feature, while for slowly accreting halos r sp can be as large as ∼ 1.6r 200m (More et al. 2015;Diemer et al. 2017). The steepening splashback signal is thus expected to be strongest for massive galaxy clusters because they are, on average, fast accreting systems (Adhikari et al. 2014;Diemer and Kravtsov 2014).
The splashback radius r sp has a well-defined feature that can be inferred from weak lensing and density statistics of the galaxy distribution. When using galaxies as a tracer of the mass distribution around clusters, however, one needs to account for the effect of dynamical friction that acts to reduce the orbital apocenter of subhalos hosting cluster galaxies (see Adhikari et al. 2016). Since the efficiency of dynamical friction increases with the ratio of subhalo to cluster halo mass, the impact of dynamical friction on the splashback feature is expected to depend on the luminosity of tracer galaxies (e.g., Chang et al. 2018).
Cluster-galaxy weak lensing can be used to directly test detailed predictions for the splashback feature in the outer density profile of cluster halos. Since the location of the steepest slope in three dimensions is a trade-off between the steepening 1-halo term and the 2-halo term, one needs to precisely measure the lensing signal in both 1-halo and 2halo regimes spanning a wide range in cluster-centric radius. The location of the threedimensional splashback radius r sp can then be inferred by forward-modeling the projected lensing profile (∆Σ(R) or Σ(R)) assuming a flexible fitting function, such as the DK14 profile (Section 4.3.3). 20 As discussed in More et al. (2016) and Umetsu and Diemer (2017), one would apply two requirements to claim a detection of the splashback radius using a DK14 model (Equation (122)), namely (1) that the location of the steepest slope in three dimensions can be identified at high statistical significance and (2) that this steepening is greater than that expected from a DK14 model with f trans (r) = 1 (i.e., an Einasto profile). The second criterion is important to ensure that the steepening is associated with a density caustic rather than the transition between the 1-halo and 2-halo terms. The best-fit DK14 profile is shown as blue dots at the locations of the data points. This is a slightly modified version of the figure presented in Umetsu and Diemer (2017). Umetsu and Diemer (2017) were the first to attempt to place direct constraints on the splashback radius r sp around clusters by using gravitational lensing observations. They developed methods for modeling averaged cluster lensing profiles scaled to a chosen halo overdensity ∆, which can be optimized for the extraction of gradient features that are local in cluster radius, in particular the density steepening due to the splashback radius. Umetsu and Diemer (2017) examined the ensemble mass distribution of 16 CLASH X-ray-selected clusters with a weighted mean mass of M 200m ≈ 13 × 10 14 h −1 M , by forward-modeling the Σ(R) profiles of individual clusters (Figure 22) obtained by Umetsu et al. (2016). The maximum radius of their ensemble analysis is R max ≈ 5h −1 Mpc ∼ 2.5r 200m . Their results are shown in Figure 28. They found that the CLASH ensemble mass profile in projection is remarkably well described by an NFW or Einasto density profile out to R ≈ 1.2r 200m (Section 6.1), beyond which the data exhibit a flattening with respect to the NFW or Einasto profile due to the 2-halo term. The gradient feature in the cluster outskirts is most pronounced for a scaling with r 200m , which is consistent with simulation results of Diemer and Kravtsov (2014) and Lau et al. (2015) (Section 6.1). Umetsu and Diemer (2017), however, did not find statistically significant evidence for the existence of r sp in the CLASH lensing data, as limited by the field of view of Suprime-Cam (34 × 27 arcmin 2 ) on the Subaru telescope. Assuming the DK14 profile form and generic priors calibrated with numerical simulations, they placed an informative lower limit on the splashback radius of the clusters, if it exists, of r sp /r 200m > 0.89 at 68% confidence. This constraint is in agreement with the ΛCDM expectation for the CLASH sample, r sp /r 200m ≈ 0.97 (More et al. 2015). Left: The top panel shows the stacked ∆Σ + profile as a function of (comoving) cluster-centric radius R (black points with error bars) derived for a sample of 3684 redMaPPer clusters (0.2 < z l < 0.55; 20 < λ < 100) in the first year DES data. The red line shows the model fit to the lensing measurements. The inferred location of the splashback radius rsp with its 1σ uncertainty is marked by the vertical orange band. The bottom panel shows the residual in the fits divided by the uncertainty of the measurement. Right: comparison of model-fits from the projected galaxy distribution (gray) and weak lensing (red). The upper panel shows the fraction of the density profile ρ coll (r)/ρ(r) for the collapsed material ρ coll (r) ≡ ρ E (r)ftrans(r) over the total density profile ρ(r) (see Equation (122)). The middle panel shows the logarithmic gradient of the total density profile compared to that of an NFW profile (dashed line). The lower panel shows the logarithmic gradient of the profile for the collapsed material compared to that of an NFW profile. The vertical lines mark the mean values of rsp inferred from the model fits for both galaxy and lensing measurements, while the horizontal bars in the middle panel indicate the uncertainties on rsp. The figure is adapted from Chang et al. (2018). Chang et al. (2018) measured both galaxy number density (Σ g ) and tangential shear (∆Σ + ) profiles around a statistical sample of clusters detected by the red-sequence Matchedfilter Probabilistic Percolation (redMaPPer) algorithm in the first year DES data. Their fiducial sample of 3684 clusters is defined by a redshift selection of 0.2 < z l < 0.55 and a richness selection of 20 < λ < 100. The sample is characterized by an effective mass of M 200m ≈ 1.8 × 10 14 h −1 M at a mean redshift of z l ≈ 0.41. The left panels of Figure 29 show the stacked ∆Σ + (R) profile around their fiducial sample along with the best-fit DK14 profile. For DK14 modeling of the weak-lensing signal (see Section 4.3.3), Chang et al. (2018) assumed uniform priors on (ρ −2 , r −2 , b e ρ, s e ) and Gaussian priors on the shape parameters of log 10 α E = log 10 (0.19) ± 0.1, log 10 β = log 10 (6.0) ± 0.2, and log 10 γ = log 10 (4.0) ± 0.2 (More et al. 2016;Umetsu and Diemer 2017), allowing a rep-resentative range of values calibrated by N -body simulations (Diemer and Kravtsov 2014;More et al. 2015). They also marginalized over two additional parameters describing the cluster miscentering effect expected for optically selected clusters (Johnston et al. 2007).
With the stacked DES weak-lensing measurements, Chang et al. (2018) constrained the location of the steepest slope in the three-dimensional density profile to lie in the range r sp /r 200m = 0.97 ± 0.15. The location and steepness of this gradient feature inferred from the weak-lensing signal agrees within the errors with those inferred from the stacked galaxy density measurements (r sp /r 200m = 0.82±0.05), as shown in the right panels of Figure 29. Note that, as mentioned above, the r sp determined by the galaxy density profile is expected to be smaller than that of the underlying matter distribution because of dynamical friction, depending on the mass of the galaxies used. From the weak-lensing (or galaxy density) profile, Chang et al. (2018) found the total cluster density profile at the location of r sp to be steeper than the NFW profile form at a significance level of 2.0σ (or 3.0σ). Similarly, they found the 1-halo term of the DK14 profile ρ E (r)f trans (r) at r sp to be steeper than the NFW form at the 2.9σ (or 4.6σ) level. Chang et al. (2018) found that r sp measured from weak lensing is in agreement with the expectation from N -body simulations within the large errors. In contrast, the r sp measured from the galaxy density profile with a higher precision is significantly smaller than that determined from the corresponding population of subhalos in N -body simulations, which is in agreement with previous results from SDSS data (More et al. 2016;Baxter et al. 2017). This discrepancy is likely due in part to the systematic effects associated with the optical cluster finding algorithm (Zu et al. 2017;Busch and White 2017).
More recently, Contigiani et al. (2019) placed a stacked lensing constraint on the splashback feature for their sample of 27 high-mass clusters at 0.15 < z l < 0.55 targeted by the CCCP survey (Cluster Canadian Comparison Project;Hoekstra et al. 2015). The cluster sample is characterized by a weighted mean mass of M 200m ≈ 1.2 × 10 15 h −1 M at a mean redshift of z l ∼ 0.2. Their analysis is based on wide-field weak-lensing data taken with CFHT/MegaCam with a 1 deg 2 field of view. Their data set is very similar in nature to the CLASH sample of Umetsu and Diemer (2017) because both studies are based on targeted lensing observations of similarly high-mass clusters. Although they did not detect a significant steepening, Contigiani et al. (2019) constrained the splashback radius for their stacked sample as r sp = 3.6 +1.2 −0.7 Mpc (comoving) assuming a DK14 profile with generic priors calibrated with numerical simulations. Although the sample size of clusters in Contigiani et al. (2019) is not significantly larger than that of Umetsu and Diemer (2017), the large field-of-view of CFHT/MegaCam allowed them to better constrain the location and steepness of the splashback feature in the cluster outskirts.
These studies represent a first step toward using cluster-galaxy weak lensing and density statistics of the galaxy distribution to examine well-defined predictions for the splashback features from cosmological N -body simulations (e.g., Diemer et al. 2017) and to explore the physics associated with the splashback radius of collisionless halos. A significant improvement in the statistical quality of data is expected from ongoing and upcoming wide-field surveys. On the other hand, improved understanding of systematic effects, such as selection bias of observable-selected clusters and projection effects, will be needed to harness the full potential of such high-precision measurements. Furthermore, the use of phase-space statistics will be extremely useful to explore the properties and signatures of dark matter, in particular of dark matter self interactions (More et al. 2016;Okumura et al. 2018).

Mass Calibration for Cluster Cosmology
Determining the abundance of rare massive galaxy clusters above a given mass threshold provides a powerful probe of cosmological parameters, especially Ω m and σ 8 (e.g., Rosati et al. 2002). This constraining power is primarily due to the fact that clusters constitute the high-mass tail of hierarchical structure formation, which is exponentially sensitive to the growth of cosmic structure (Haiman et al. 2001;Watson et al. 2014). Conversely, obtaining an accurate calibration of the mass scale for a given cluster sample is key to harness the power of cluster cosmology (Pratt et al. 2019). In this context, large statistical samples of clusters spanning a wide range in mass and redshift, with a well-characterized selection function, provide an independent means of examining any viable cosmological model (e.g., Allen et al. 2004;Vikhlinin et al. 2009;Mantz et al. 2010;de Haan et al. 2016). In principle, clusters can complement other cosmological probes if the systematics are well understood and controlled.
Thanks to dedicated blind SZE surveys that are made possible in recent years, large homogeneous samples of clusters have been obtained through the SZE selection, out to and beyond a redshift of unity over a wide area of the sky (e.g., Planck Collaboration et al. 2014aCollaboration et al. , 2015aBleem et al. 2015). In particular, the Planck satellite mission produced representative catalogs of galaxy clusters detected via the SZE signal from its all-sky survey. The Planck PSZ2 catalog contains 1653 SZE detections of cluster candidates from the 29 month fullmission Planck data (Planck Collaboration et al. 2015a). The full-mission Planck cosmology sample contains 439 clusters down to S/N of 6, representing the most massive population of clusters with a well-behaved selection function (Planck Collaboration et al. 2015a, 2016b. Despite these recent advances, we must reiterate that accurate cluster mass measurements are essential in the cosmological interpretation of the cluster abundance (Pratt et al. 2019). Since clusters are detected by the observable baryonic signature, an external calibration of the corresponding observable-mass scaling relation is necessary for a cosmological interpretation of the cluster sample, by accounting for inherent statistical effects and selection bias (e.g., Battaglia et al. 2016;Miyatake et al. 2019;Sereno et al. 2020). This was acutely demonstrated as an internal tension in the Planck analyses (Planck Collaboration et al. 2014bCollaboration et al. , 2016b, which revealed a non-negligible level of discrepancy between the cosmological parameters (Ω m , σ 8 ) derived from the Planck cluster counts and those from combining the Planck primary CMB measurements with non-cluster data sets, both within the framework of the standard ΛCDM cosmology. Here it should be note that Planck Collaboration et al. (2014b) employed X-ray observations with XMM-Newton to calibrate the scaling relation between the SZE signal strength and cluster mass for their Planck cluster cosmology sample. However, the determination of cluster mass relied on the assumption that the intracluster gas is in hydrostatic equilibrium (hereafter HSE) with the gravitational potential dominated by dark matter.
To characterize the overall level of mass bias (assumed to be constant in cluster mass and redshift) for their SZE-selected clusters, Planck Collaboration et al. (2014b) introduced a parameter defined as (see also Planck Collaboration et al. 2016b): where M SZE denotes the SZE mass proxy and M true is the true mass (see Penna-Lima et al. 2017), both defined at an overdensity of ∆ c = 500. Note that this factor includes not only astrophysical biases but all systematics encoded in the statistical relationship between the Planck-based mass and the true mass (see Planck Collaboration et al. 2014b;Donahue et al. 2014). The Planck team initially adopted 1−b = 0.8 as a fiducial value with a flat prior in the range [0.7, 1.0] (Planck Collaboration et al. 2014b), which is about the level expected due to deviations from the assumed HSE, b HSE ∼ (10 − 20)% (e.g., Nagai et al. 2007;Meneghetti et al. 2010;Angelinelli et al. 2020). If the bias were zero, the Planck CMB cosmology would predict far more massive clusters than observed. By analyzing the Planck cosmology sample from the full PSZ2 cluster catalog, Planck Collaboration et al. (2016b) found that the level of mass bias required to bring the Planck cluster counts and Planck primary CMB into full agreement in their base ΛCDM cosmology is 1 − b = 0.58 ± 0.04. Intriguingly, this would imply that Planck masses underestimate the true values by b = (42 ± 4)%, compared to b ∼ 20% initially adopted by Planck Collaboration et al. (2014b). The level of disagreement between the Planck cluster counts and Planck primary CMB is about 2σ. While this tension could potentially reflect a higher-than-expected sum of neutrino masses or more exotic physics, the confidence in such a scenario would be limited by systematic uncertainties arising from both astrophysical and observational effects (e.g., Planck Collaboration et al. 2014b;Donahue et al. 2014;Sereno and Ettori 2017). In fact, the level of disagreement appears to slightly decrease after accounting for updated lower values of the reionization optical depth (Planck Collaboration et al. 2016c). Nevertheless, this tension has attracted considerable attention in the cluster community and led to deeper investigations into the mass calibration for this representative cosmology sample of Planck clusters.
Several recent studies used weak-lensing mass estimates, M WL , to recalibrate cluster masses for subsets of the Planck cosmology sample assuming that the average weak-lensing mass is unbiased (von der Linden et al. 2014b;Hoekstra et al. 2015;Smith et al. 2016;Penna-Lima et al. 2017). The samples used in these studies typically contain a few tens of Planck clusters. The results of mass calibrations are controversial in terms of the inferred level of bias, with some studies finding relatively high values of mass bias, 1 − b WL ≡ M SZE /M WL ∼ 0.6 − 0.8 with typical uncertainties of ±0.1 (von der Linden et al. 2014b;Hoekstra et al. 2015;Penna-Lima et al. 2017), and some finding little bias for low-z Planck clusters, 1 − b WL = 0.95 ± 0.04 (at 0.15 < z < 0.3; Smith et al. 2016). However, it should be noted that some of these mass calibrations did not include the correction for Eddington bias and their inferred values of (1 − b WL ) are thus likely overestimated (see Battaglia et al. 2016;Medezinski et al. 2018a;Miyatake et al. 2019). The results of mass-calibration efforts for SZE-selected cluster samples are summarized in Figure 30 (for details, see Miyatake et al. 2019).
Recently, Medezinski et al. (2018a) performed a weak-lensing analysis of five Planck clusters located within ∼ 140 deg 2 of full-depth and full-color HSC-SSP data. With its unique combination of area and depth, the HSC Wide layer will provide uniformly determined weak-lensing mass measurements for thousands of clusters over the total sky area of ∼ 1000 deg 2 . This is different from previous studies in which weak-lensing measurements are based on targeted observations and/or archival data. Using the high-quality HSC So the mystery continues. More representative subsamples with greater overlap with the Planck cosmology sample are needed to draw a definitive conclusion about (1 − b) and its cosmological implications. When the full HSC-Wide survey is complete, we expect to have ∼ 40 Planck clusters observed in the total area of ∼ 1000 deg 2 . The level of uncertainty on the mass calibration, if assuming it is statistics dominated, will be reduced from the ∼ 10% level achieved with five Planck-HSC clusters (Medezinski et al. 2018a) to reach ∼ 4%. This is below the current level of systematic uncertainty in the cluster mass calibration, < ∼ 10% (Medezinski et al. 2018a;Miyatake et al. 2019) and will thus require an even more stringent treatment of weak-lensing systematics. It will also allow us to examine in detail the level of HSE bias and study its possible dependence on cluster mass and redshift, providing valuable information about the thermodynamic history of intracluster gas.

Conclusions
In this paper we presented a comprehensive review of cluster-galaxy weak lensing, covering a range of topics relevant to its cosmological and astrophysical applications. The goals of this review were (1) to provide a self-contained pedagogical overview of the theoretical foundations for gravitational lensing from the first principles (Section 2), with special attention to the basics and advanced concepts of cluster weak lensing (Sections 3, 4, and 5), and (2) to summarize and highlight recent advances in our understanding of the mass distribution in and around cluster halos based on numerical simulations and observational results (Section 6).
Thanks to concerted community efforts over the last few decades, there have been substantial progress over the last few decades in this area on both observational and theoretical grounds. In this review we focused on several key issues, namely, the shape of the mass density profile (Section 6.1), the c-M relation and its intrinsic scatter (Section 6.2), splashback features in the cluster outskirts (Section 6.3), and cluster mass calibrations for cluster cosmology (Section 6.4). Observations of cluster-galaxy weak lensing are, thus far, generally favorable for the standard ΛCDM paradigm of structure formation, in terms of the standard explanation for dark matter as effectively collisionless and nonrelativistic on submegaparsec scales and beyond, with an excellent match between data and predictions for cluster-size massive halos (Section 6).
These studies constitute an encouraging step toward using cluster-galaxy weak lensing to robustly test detailed predictions of ΛCDM and its variants, such as SIDM and ψDM, calibrated from cosmological numerical simulations. Such predictions can be unambiguously tested across a wide range in cluster mass and redshift, with large statistical samples of clusters from ongoing and planned lensing surveys, such as Subaru HSC-SSP, DES, the Large Synoptic Survey Telescope (LSST), WFIRST, and Euclid.