1 Introduction

The propagation of light rays from a distant source to the observer is governed by the gravitational field of local inhomogeneities, as well as by the global geometry of the universe (Schneider et al. 1992). Hence, the images of background sources carry the imprint of gravitational lensing by intervening cosmic structures. Observations of gravitational lensing phenomena can thus be used to study the mass distribution of cosmic objects dominated by dark matter and to test models of cosmic structure formation (Blandford and Narayan 1992).

Galaxy clusters represent the largest class of self-gravitating systems formed in the universe, with typical masses of \( M\sim 10^{{14 - 15}} h^{{ - 1}} M_{ \odot } \). In the context of the standard structure formation scenario, cluster halos correspond to rare massive local peaks in the primordial density perturbations (e.g., Tinker et al. 2010). Galaxy clusters act as powerful cosmic lenses, producing a variety of detectable lensing effects from strong to weak lensing (Kneib and Natarajan 2011), including deflection, shearing, and magnifying of the images of background sources (e.g., Umetsu et al. 2016). The critical advantage of cluster gravitational lensing is its ability to study the mass distribution of individual and ensemble systems independent of assumptions about their physical and dynamical state (e.g., Clowe et al. 2006).

Weak gravitational lensing is responsible for the weak shape distortion, or shear, and magnification of the images of background sources due to the gravitational field of intervening massive objects and large-scale structure (Bartelmann et al. 2001; Schneider 2005; Umetsu 2010; Hoekstra 2013; Mandelbaum 2018). Weak shear lensing by galaxy clusters gives rise to levels of up to a few 10% of elliptical distortions in images of background sources. Thus, the weak shear lensing signal, as measured from small but coherent image distortions in galaxy shapes, can provide a direct measure of the projected mass distribution of galaxy clusters (e.g., Kaiser and Squires 1993; Fahlman et al. 1994; Okabe and Umetsu 2008). On the other hand, lensing magnification can influence the observed surface number density of background galaxies seen behind clusters, by enhancing their apparent fluxes and expanding the area of sky (e.g., Broadhurst et al. 1995, 2005b; Taylor et al. 1998; Umetsu et al. 2011b; Chiu et al. 2020). The former effect increases the source counts above the limiting flux, whereas the latter reduces the effective observing area in the source plane, thus decreasing the observed number of sources per unit solid angle. The net effect, known as magnification bias, depends on the intrinsic faint-end slope of the source luminosity function.

In this paper, we present a self-contained pedagogical review of weak gravitational lensing of background galaxies by galaxy clusters (cluster–galaxy weak lensing), highlighting recent advances in our theoretical and observational understanding of the mass distribution in galaxy clusters. We shall begin by reviewing the theoretical foundations of gravitational lensing (Sect. 2), with special attention to the basics and advanced techniques of cluster–galaxy weak lensing (Sects. 3, 4, and 5). Then, we highlight and discuss key findings from recent cluster–galaxy weak-lensing studies (Sects. 6), with a focus on cluster mass distributions (Sect. 6.1), the concentration–mass relation (Sect. 6.2), the splashback radius (Sect. 6.3), and implications from extensive mass-calibration efforts for cluster cosmology (Sect. 6.4). Finally, conclusions are given in Sect. 7.

There have been a number of reviews of relevant subjects (e.g., Blandford and Narayan 1992; Narayan and Bartelmann 1996; Mellier 1999; Hattori et al. 1999; Umetsu et al. 1999; Van Waerbeke and Mellier 2003; Schneider 2005; Kneib and Natarajan 2011; Hoekstra 2013; Futamase 2015; Mandelbaum 2018). For general treatments of gravitational lensing, we refer the reader to Schneider et al. (1992). For a general review of weak gravitational lensing, see Bartelmann and Schneider (2001) and Schneider (2005). For a comprehensive review of cluster lensing, see Kneib and Natarajan (2011). For a pedagogical review on strong lensing in galaxy clusters, see Hattori et al. (1999a).

Throughout this paper, we denote the present-day density parameters of matter, radiation, and \({\varLambda }\) (a cosmological constant) in critical units as \(\varOmega _\mathrm {m}, \varOmega _\mathrm {r}\), and \(\varOmega _{\varLambda }\), respectively (see, e.g., Komatsu et al. 2009). Unless otherwise noted, we assume a concordance \({\varLambda }\) cold dark matter (\({\varLambda }\hbox {CDM}\)) cosmology with \(\varOmega _\mathrm {m}=0.3\), \(\varOmega _{\varLambda }=0.7\), and a Hubble constant of \(H_0 = 100\,\hbox {km s}^{-1}\,\hbox {Mpc}^{-1}\) with \(h=0.7\). We denote the mean matter density of the universe at a particular redshift z as \(\overline{\rho }(z)\) and the critical density as \(\rho _\mathrm {c}(z)\). The present-day value of the critical density is \(\rho _\mathrm {c,0}=3H_0^2/(8\pi G)\approx 1.88\times 10^{-29}h^2\,\hbox {g cm}^{-3}\approx 2.78\times 10^{11}h^2 M_\odot \,\hbox {Mpc}^{-3}\), with G the gravitational constant. We use the standard notation \(M_{\varDelta _\mathrm {c}}\) or \(M_{\varDelta _\mathrm {m}}\) to denote the mass enclosed within a sphere of radius \(r_{\varDelta _\mathrm {c}}\) or \(r_{\varDelta _\mathrm {m}}\), within which the mean overdensity equals \(\varDelta _\mathrm {c} \times \rho _\mathrm {c}(z)\) or \(\varDelta _\mathrm {m} \times \overline{\rho }(z)\) at a particular redshift z. That is, \(M_{\varDelta _\mathrm {c}}=(4\pi /3)\varDelta _\mathrm {c}\rho _\mathrm {c}(z)r_{\varDelta _\mathrm {c}}^3\) and \(M_{\varDelta _\mathrm {m}}=(4\pi /3)\varDelta _\mathrm {m}\overline{\rho }(z)r_{\varDelta _\mathrm {m}}^3\). We generally denote three-dimensional radial distances as r and reserve the symbol R for projected radial distances. Unless otherwise noted, we use projected densities (e.g., \(\varSigma (R)\)) and distances (e.g., R) both defined in physical (not comoving) units. All quoted errors are \(1\sigma \) confidence levels (CL) unless otherwise stated.

2 Theory of gravitational lensing

The local universe appears to be highly inhomogeneous on a wide range of scales from stars, galaxies, through galaxy groups and clusters, to forming superclusters, large-scale filaments, and cosmic voids. The propagation of light from a far-background source is thus influenced by the gravitational field caused by such local inhomogeneities along the line of sight. In general, a complete description of the light propagation in an arbitrary curved spacetime is a complex theoretical problem. However, a much simpler description is possible under a wide range of astrophysically relevant circumstances, which is referred to as the gravitational lensing theory (e.g., Schneider et al. 1992; Bartelmann and Schneider 2001; Kneib and Natarajan 2011). This section reviews the basics of gravitational lensing theory to provide a basis and framework for cluster lensing studies, with an emphasis on weak gravitational lensing.

2.1 Bending of light in an asymptotically flat spacetime

To begin with, let us consider the bending of light in a weak-field regime of an asymptotically flat spacetime in the framework of general relativity. Specifically, we assume an isolated and stationary mass distribution (Schneider et al. 1992). Then, the metric tensor \(g_{\mu \nu }\) (\(\mu ,\nu =0,1,2,3\)) of the perturbed spacetime can be written as:

$$\begin{aligned} {\text{d}}s^2&= g_{\mu \nu } \text {d}x^\mu \text {d}x^\nu \nonumber \\&= -(1+2\varPhi _\mathrm {N}/c^2)c^2 \text {d}t^2 +(1-2\varPhi _\mathrm {N}/c^2) \left[ (\text {d}x^1)^2+(\text {d}x^2)^2+(\text {d}x^3)^2\right] , \end{aligned}$$
(1)

where \((x^\mu )=(ct,x^1,x^2,x^3)\) are the four spacetime coordinates, \(\varPhi _\mathrm {N}\) is the Newtonian gravitational potential in a weak-field regime \(|\varPhi _\mathrm {N}/c^2|\ll 1\), and c is the speed of light in vacuum. We consider the metric given by Eq. (1) to be the sum of a background metric \(g_{\mu \nu }^{(\mathrm {b})}\) and a small perturbation denoted by \(h_{\mu \nu }\), that is, \(g_{\mu \nu }=g_{\mu \nu }^{(\mathrm {b})} + h_{\mu \nu }\) with \(|h_{\mu \nu }|\ll 1\).

To the first order in \(\varPhi _\mathrm {N}/c^2\), we have \(g_{\mu \nu }^{(\mathrm {b})}=\eta _{\mu \nu }=\mathrm {diag}(-1,1,1,1)\) and \(h_{\mu \nu }=\mathrm {diag}(-2\varPhi _\mathrm {N},-2\varPhi _\mathrm {N},-2\varPhi _\mathrm {N},-2\varPhi _\mathrm {N})/c^2\), where \(g^{\mu \nu }\) and \(g^{(\mathrm {b})\mu \nu }\) are defined by \(g^{\mu \rho } g_{\rho \nu }= \delta ^{\mu }_{\nu }\) and \(g^{(\mathrm {b})\mu \rho } g^{(\mathrm {b})}_{\rho \nu }= \delta ^{\mu }_{\nu }\), with \(\delta ^\mu _\nu \) the Kronecker delta symbol in four dimensions. Then, to the first order of h, we have \(g^{\mu \nu }= g^{(\mathrm {b})\mu \nu }-h^{\mu \nu }\), where \(h^{\mu \nu }\) is defined by \(h^{\mu \nu }\equiv g^{(\mathrm {b})\mu \rho } g^{(\mathrm {b})\nu \sigma } h_{\rho \sigma } =\eta ^{\mu \rho }\eta ^{\nu \sigma }h_{\rho \sigma }\).

The propagation of light is described by null geodesic equations:

$$\begin{aligned} k^\mu&\equiv \frac{\text {d}x^\mu (\lambda )}{\text {d}\lambda },\nonumber \\ 0&=g_{\mu \nu }k^\mu k^\nu ,\nonumber \\ \frac{\text {d}k^\mu }{\text {d}\lambda }&=-\varGamma ^{\mu }_{\nu \lambda }k^\nu k^\lambda , \end{aligned}$$
(2)

where \(k^\mu \) is the four-momentum, \(\lambda \) is the affine parameter, and \(\varGamma ^\mu _{\nu \lambda }\) denotes the Christoffel symbol, \(\varGamma ^{\mu }_{\nu \rho } = (1/2) g^{\mu \lambda }\left( g_{\lambda \nu ,\rho } + g_{\lambda \rho , \nu } - g_{\nu \rho ,\lambda }\right) \), with \(g_{\mu \nu }^{(\mathrm {b})}=\eta _{\mu \nu }\) and \(\varGamma ^{(\mathrm {b})\mu }_{\nu \rho }=0\) in the background Minkowski spacetime. For a light ray propagating along the \(x^3\)-direction in the background metric, the photon four-momentum \(k^{(\mathrm {b})\mu }\) and the unperturbed orbit \(x^{(\mathrm {b})\mu }\) are given by \(k^{(\mathrm {b})\mu }={\text{d}}x^{(\mathrm {b})\mu }/{\text{d}}\lambda =(1,0,0,1)\) and \(x^{(\mathrm {b})\mu }=(\lambda ,0,0,\lambda )\).

Now, we consider the light ray propagation in a perturbed spacetime. To this end, we express the perturbed orbit \(x^\mu (\lambda )\) as a sum of the unperturbed path \(x^{(\mathrm {b})\mu }(\lambda )\) and the deviation vector \(\delta x^\mu (\lambda )\):

$$\begin{aligned} x^\mu (\lambda ) = x^{(\mathrm {b})\mu }(\lambda ) + \delta x^\mu (\lambda ). \end{aligned}$$
(3)

Without loss of generality, we can take the deflection angle to lie in the \(x^3 x^1\) plane with \(x^2=0\), and we denote \((x^1, x^3) = (x^\perp , x^{||})\). In the weak-field limit of \(|\varPhi _\mathrm {N}/c^2|\ll 1\), the impact parameter b of the incoming light ray is much greater than the Schwarzschild radius of the deflector with mass M, that is, \(b\gg 2GM/c^2\). Then, the linearized null geodesic equations are written asFootnote 1:

$$\begin{aligned} k^\mu (\lambda )&= k^{(\mathrm {b})\mu }(\lambda ) + \delta k^\mu (\lambda ),\nonumber \\ 0&=h_{\mu \nu }k^{(\mathrm {b})\mu } k^{(\mathrm {b})\nu }+ 2g^{(\mathrm {b})}_{\mu \nu } k^{(\mathrm {b})\mu }\delta k^\nu ,\nonumber \\ \frac{{\text{d}}(\delta k^\mu )}{\text {d}\lambda }&= -2\varGamma ^{(\mathrm {b})\mu }_{\nu \lambda } k^{(\mathrm {b})\nu } \delta k^\lambda -\delta \varGamma ^{\mu }_{\nu \lambda } k^{(\mathrm {b})\nu } k^{(\mathrm {b})\lambda }. \end{aligned}$$
(4)

The perturbed Christoffel symbol is \(\delta \varGamma ^{\mu }_{\nu \rho }= (1/2) \eta ^{\mu \lambda }\left( h_{\lambda \nu ,\rho } + h_{\lambda \rho , \nu }- h_{\nu \rho ,\lambda } \right) + O(h^2)\). Choosing the boundary conditions in the in-state (\(\lambda \rightarrow -\infty \)) as \(\delta k^\mu (-\infty )=0\), we integrate the linearized null geodesic equations (Eq. (4)) to obtain the following equations for the spatial components in the out-state (\(\lambda \rightarrow +\infty \)):

$$\begin{aligned} \delta k^\perp (+\infty ) = -\frac{2}{c^2}\int _{-\infty }^{+\infty }\frac{\partial \varPhi _\mathrm {N}(\lambda )}{\partial x^\perp }\,\text {d}\lambda , \ \ \ \delta k^{||}(+\infty ) = 0, \end{aligned}$$
(5)

where \(\lambda =x^{||}(\lambda )+O(h)\). Taking the unperturbed path, we obtain the bending angle \(\hat{\alpha }\) in the small-angle scattering limit (\(|\hat{\alpha }|\ll 1\)) asFootnote 2

$$\begin{aligned} \hat{\alpha }\simeq \frac{k^{\perp }(+\infty )}{k^{||}(+\infty )} \simeq -\frac{2}{c^2}\int _{-\infty }^{+\infty } \frac{\partial \varPhi _\mathrm {N}(x^{||},x^\perp )}{\partial x^\perp }\,\text {d}x^{||}, \end{aligned}$$
(6)

which is known as the Born approximation. This yields an explicit expression for the bending angle of \(|\hat{\alpha }|\simeq 4GM/(rc^2) = 1.75^{\prime\prime} (M/M_\odot )(r/R_\odot )^{-1}\). General relativity predicts a deflection angle twice as large as that Newtonian physics would provide. Einstein’s prediction for the solar deflection of light is verified within \(\sim 0.1\%\) (e.g., Lebach et al. 1995).

The null geodesic condition leads to \(\delta k^0(\lambda )=-2\varPhi _\mathrm {N}(\lambda )/c^2 + O(h^2)\), or \(c\text {d}t/\text {d}\lambda =1-2\varPhi _\mathrm {N}(\lambda )/c^2 + O(h^2)>1\). The gravitational time-delay \(\varDelta t_\mathrm {grav}\), with respect to the unperturbed light propagation, is thus given by:

$$\begin{aligned} c\varDelta t_\mathrm {grav} = -2\int _{-\infty }^{+\infty }\text {d}\lambda \, \varPhi _\mathrm {N}(\lambda )/c^2. \end{aligned}$$
(7)

Note that there is an additional time-delay due to a change in the geometrical path length caused by gravitational deflection (see Sect. 2.6.1).

2.2 Lens equation

Fig. 1
figure 1

Illustration of a typical lens system. The light ray propagates from the source (S) at the position \(\varvec{\eta }\) in the source plane to the observer (O), passing the position \(\varvec{\xi }\) in the lens plane (L), resulting in a bending angle \(\hat{{\varvec{\alpha }}}\). The angular position of the source (S) relative to the optical axis is denoted by \({\varvec{\beta }}\), and that of the image (I) relative to the optical axis is denoted by \(\varvec{\theta }\). The \(D_l\), \(D_s\), and \(D_{ls}\) are the observer–lens, observer–source, and lens–source angular diameter distances, respectively

Let us consider the situation illustrated in Fig. 1. A light ray propagates from a far-distant source (S) at the position \(\varvec{\eta }\) in the source plane to an observer (O), passing the position \(\varvec{\xi }\) in the lens plane, in which the light is deflected by a bending angle \(\hat{{\varvec{\alpha }}}\). Here, the source and lens planes are defined as planes perpendicular to the optical axis at the distance of the source and the lens, respectively. The exact definition of the optical axis does not matter, because the angular scales involved are very small. The angle between the optical axis and the unlensed source (S) position is \({\varvec{\beta }}\), and the angle between the optical axis and the image (I) is \(\varvec{\theta }\). The angular diameter distances between the observer and the lens, the observer and the source, and the lens and the source, are denoted by \(D_l, D_s\), and \(D_{ls}\), respectively.

As illustrated in Fig. 1, we have the following geometrical relation: \(\varvec{\eta }=(D_s/D_l)\varvec{\xi }+D_{ls}\hat{{\varvec{\alpha }}}(\varvec{\xi })\). Equivalently, this is translated into the relation between the angular source and image positions, \({\varvec{\beta }}=\varvec{\eta }/D_s\) and \(\varvec{\theta }=\varvec{\xi }/D_l\), as:

$$\begin{aligned} {\varvec{\beta }}= \varvec{\theta }+\frac{D_{ls}}{D_s} \hat{{\varvec{\alpha }}}\equiv \varvec{\theta }+ {\varvec{\alpha }}(\varvec{\theta }), \end{aligned}$$
(8)

where we defined the reduced bending angle, or the deflection field (Broadhurst et al. 2005a), \({\varvec{\alpha }}(\varvec{\theta })=(D_{ls}/D_s)\hat{{\varvec{\alpha }}}\) in the last equality. Equation (8) is referred to as the lens equation, or the ray-tracing equation.

Fig. 2
figure 2

Illustration of the lens mapping \(\beta =\beta (\theta )\) (upper panel), image configuration (middle panel), and image parities (lower panel) for a typical axis-symmetric lens system. Three images (\(I_1,I_2,I_3\)) are produced for a source (S) at the location \(\beta =\beta _s\) with a radial width \(\delta \beta \). The images are formed at the intersections of \(\beta =\beta (\theta )\) and the horizontal line \(\beta =\beta _s\). Critical curves are also shown by two solid concentric circles in the middle panel. The inner circle represents the radial critical curve where \({\text{d}}\beta (\theta )/{\text{d}}\theta =0\), while the outer circle represents the tangential critical curve where \(\beta (\theta )=0\). The image parities are illustrated in the lower panel. The images \(I_1\) and \(I_2\) have the same total parity as the source S, while \(I_3\) has the opposite total parity to S. The concepts of critical curves and image parities are given in Sect. 2.6.3

In general, the lens equation is nonlinear with respect to the image position \(\varvec{\theta }\), so that it may have multiple solutions \(\varvec{\theta }\) for a given source position \({\varvec{\beta }}\). This corresponds to multiple imaging of a background source (see Hattori et al. 1999a; Kneib and Natarajan 2011). An illustration of the typical circularly symmetric lens system is shown in Fig. 2. We refer to Keeton (2001) for a review of various families of mass models for gravitational lensing.

2.3 Cosmological lens equation

Here, we turn to the cosmological lens equation that describes the light propagation in a locally inhomogeneous, expanding universe. There are various approaches to derive a cosmological version of the lens equation (e.g., Schneider 1985; Sasaki 1993; Seitz et al. 1994; Futamase 1995; Dodelson 2003; Sereno 2009). We follow the approach by Futamase (1995) based on perturbed null geodesic equations as introduced in Sect. 2.1.

Consider a perturbed Friedman–Lemaítre–Robertson–Walker (FLRW) metric in the Newtonian gauge of the form (e.g., Kodama and Sasaki 1984):

$$\begin{aligned} \text {d}s^2= -(1+2\varPsi /c^2)c^2\text {d}t^2 + (1-2\varPsi /c^2) a^2(t)\left[ \text {d}\chi ^2+r^2(\text {d}\vartheta ^2+\sin ^2\vartheta \text {d}\varphi ^2) \right] , \end{aligned}$$
(9)

where a(t) is the scale factor of the universe (normalized to unity at present), \(\chi \) is the comoving distance, \(\vartheta \) and \(\varphi \) are the spherical polar and azimuthal angles, respectively, \(\varPsi \) is a scalar metric perturbation, K is the spatial curvature of the universe, and \(r = f_K(\chi )\) is the comoving angular diameter distance:

$$\begin{aligned} f_K(\chi ) = {\left\{ \begin{array}{ll} (-K)^{-1/2}\sinh (\chi /\sqrt{-K}) &{} (K<0),\\ \chi &{} (K=0),\\ K^{-1/2}\sin (\chi /\sqrt{K}) &{} (K>0). \end{array}\right. } \end{aligned}$$
(10)

The spatial curvature K is expressed with the total density parameter at the present epoch, \(\varOmega _0\equiv \sum _X\varOmega _{X}=\varOmega _\mathrm {m}+\varOmega _\mathrm {r}+\varOmega _{\varLambda }\), as \(K=(\varOmega _0-1)H_0^2/c^2\). The evolution of a(t) is determined by the Friedmann equation, \(H(a)\equiv (\text {d}a/\text {d}t)/a=H_0[\varOmega _\mathrm {r}a^{-4} + \varOmega _\mathrm {m}a^{-3} + \varOmega _{\varLambda }+ (1-\varOmega _0)a^{-2}]^{1/2}\). In the line element (9), we have neglected all terms of higher than \(O(\varPsi /c^2)\), the contributions from vector and tensor perturbations, and the effects due to anisotropic stress. As we will discuss in Sect. 2.5.1, \(\varPsi \) is interpreted as the Newtonian gravitational potential generated by local inhomogeneities of the matter distribution in the universe.

Since the structure of a light cone is invariant under the conformal transformation, we work with the conformally related spacetime metric given by \(\text {d}\tilde{s}^2 = a^{-2}\text {d}s^2 \equiv \tilde{g}_{\mu \nu } \text {d}x^\mu \text {d}x^\nu \) with \((x^\mu )=(\eta , \chi , \vartheta , \varphi )\), where \(\eta =c\int ^t\text {d}t'/a(t')\) is the conformal time. The metric \(\tilde{g}_{\mu \nu }\) can be rewritten in the form of \(\tilde{g}_{\mu \nu }=\tilde{g}^{(\mathrm {b})}_{\mu \nu }+h_{\mu \nu }\), as a sum of the background metric and a small perturbation (\(|h|\ll 1\)).

We follow the prescription given in Sect. 2.1 to solve the null geodesic equations in the perturbed spacetime (Eq. (9)). To this end, we consider past-directed null geodesics from the observer. Choosing the spherical coordinate system centered on the observer, we have \(k^{(\mathrm {b})\mu }=(-1,1,0,0)\) in the background metric with \(\varPsi =0\). The unperturbed path is parameterized by the affine parameter \(\lambda \) along the photon path as \(x^{(\mathrm {b})\mu }(\lambda )=(-\lambda ,\lambda -\lambda _o,\vartheta _I,\varphi _I)\), where \(\lambda _o\) is the affine parameter at the observer and \((\vartheta _I,\varphi _I)\) denote the angular direction of the image position on the sky. The comoving angular distance r in the background spacetime can be parameterized by \(\lambda \) as \(r(\lambda )\equiv f_K[\chi (\lambda -\lambda _o)]\) (see Eq. (10)).

The perturbed null geodesic equations for the angular components (\(\vartheta ,\varphi \)) can be formally solved as:

$$\begin{aligned} \delta k^i (\lambda ) = -\frac{2}{r^2(\lambda )}\int _{\lambda _o}^{\lambda _s} \text {d}\lambda ^{\prime} \partial ^i \varPsi (\lambda ')/c^2 \ \ \ (i=\vartheta ,\varphi ), \end{aligned}$$
(11)

where \(\partial ^i \varPsi =(\varPsi _{,\vartheta }, \sin ^{-2}\vartheta \varPsi _{,\varphi })\) and we have chosen \(\delta k^\vartheta (\lambda _o)=\delta k^\varphi (\lambda _o)=0\). Inserting this result in Eq. (4) and integrating by part yield (Futamase 1995; Dodelson 2003):

$$\begin{aligned} \vartheta _S&= \vartheta _I -\frac{2}{c^2}\int _{\lambda _o}^{\lambda _s} \frac{r(\lambda _s-\lambda )}{r(\lambda _s)r(\lambda )}\partial ^\theta \varPsi (\lambda )\,\text {d}\lambda ,\nonumber \\ \varphi _S&= \varphi _I -\frac{2}{c^2}\int _{\lambda _o}^{\lambda _s} \frac{r(\lambda _s-\lambda )}{r(\lambda _s)r(\lambda )}\partial ^\varphi \varPsi (\lambda )\,\text {d}\lambda , \end{aligned}$$
(12)

where \(\lambda _s\) is the affine parameter at the background source, \((\vartheta _S, \varphi _S)\equiv (\vartheta (\lambda _s),\varphi (\lambda _s))\) denote the angular direction of the unlensed source position on the sky, and we set \(\delta \vartheta (\lambda _o)=\delta \varphi (\lambda _o)=0\). Here, the integral is performed along the perturbed trajectory \(x^\mu (\lambda )=x^{(\mathrm {b})\mu }(\lambda )+\delta x^\mu (\lambda )\). Equation (12) relates the observed direction of the image position \((\vartheta _I, \varphi _I)\) to the (unlensed) direction of the source position \((\vartheta _S,\varphi _S)\) for a given background cosmology and metric perturbation \(\varPsi (\varvec{\chi },\eta )\). This is a general expression of the cosmological lens equation obtained by Futamase (1995).

2.4 Flat-sky approximation

Now, we consider a small patch of the sky around a given line of sight (\(\vartheta =0\)), across which the curvature of the sky is negligible (\(\vartheta \ll 1\)). Then, we can locally define a flat plane perpendicular to the line of sight. By noting that \(\delta \varvec{\theta }\equiv (\delta \vartheta ,\vartheta \delta \varphi )\) is an angular displacement vector within this sky plane, we can express Eq. (12) as:

$$\begin{aligned} {\varvec{\beta }}(\chi _s) = \varvec{\theta }+ {\varvec{\alpha }}(\chi _s), \end{aligned}$$
(13)

where \({\varvec{\beta }}(\chi _s)\) is the (unlensed) angular position of the source, \(\varvec{\theta }\) is the apparent angular position of the source image, and \({\varvec{\alpha }}(\chi _s)\) is the deflection field given by (Futamase 1995):

$$\begin{aligned} {\varvec{\alpha }}(\chi _s) =-\frac{2}{c^2}\int _0^{\chi _s} \frac{r(\chi _s-\chi )}{r(\chi _s)} \varvec{\nabla }_{\perp }\varPsi [x^\mu (\chi )]\,\text {d}\chi , \end{aligned}$$
(14)

where \(\varvec{\nabla }_\perp \equiv r^{-1}(\lambda )(\partial _\vartheta ,\vartheta ^{-1}\partial _\varphi )\) is the transverse comoving gradient and the integral is performed along the perturbed trajectory \(x^\mu (\lambda )=x^{(\mathrm {b})\mu }(\lambda )+\delta x^\mu (\lambda )\) with \(\lambda =\chi + O(\varPsi /c^2)\). Equation (13) can be applied to a range of lensing phenomena, including multiple deflections of light from a background source (Sect. 2.5), strong and weak gravitational lensing by individual galaxies and clusters (Sect. 2.6), and cosmological weak lensing by the intervening large-scale structure (a.k.a., the cosmic shear). Note that the cosmological lens equation is obtained using the standard angular diameter distance in a background FLRW spacetime without employing the thin-lens approximation (see Sect. 2.6).

2.5 Multiple lens equation

We consider a discretized version of the cosmological lens equation (Eq. (13)) by dividing the radial integral between the source (\(\chi =\chi _s\)) and the observer (\(\chi =0\)) into N comoving boxes (\(N-1\) lens planes) separated by a constant comoving distance of \(\varDelta \chi \). The angular position \(\varvec{\theta }^{(n)}\) of a light ray in the nth plane (\(1\leqslant n\leqslant N\)) is then given by (e.g., Schneider et al. 1992; Schneider 2019):

$$\begin{aligned} {\varvec{\beta }}^{(n)}=\varvec{\theta }^{(0)}+\sum _{m=1}^{n-1} \frac{r(\chi _n-\chi _m)}{r(\chi _n)} \hat{{\varvec{\alpha }}}^{(m)}, \end{aligned}$$
(15)

where \(\varvec{\theta }^{(0)} = {\varvec{\beta }}^{(1)}\) is the apparent angular position of the source image and \(\hat{{\varvec{\alpha }}}^{(m)}\) is the bending angle at the mth lens plane (\(m=1,2,\ldots,n-1\)):

$$\begin{aligned} \hat{{\varvec{\alpha }}}^{(m)}= -\frac{2}{c^2}\varvec{\nabla }_\perp \varPsi [\chi _m,r(\chi _m){\varvec{\beta }}^{(m)}]\,\varDelta \chi . \end{aligned}$$
(16)

The \(2\times 2\) Jacobian matrix of Eq. (15) (\(1\leqslant n \leqslant N\)) is expressed as (e.g., Jain et al. 2000):Footnote 3

$$\begin{aligned} \mathscr{A}^{(n)} := \frac{\partial {\varvec{\beta }}^{(n)}}{\partial \varvec{\theta }^{(0)}} = \mathscr{I} + \sum _{m=1}^{n-1} \frac{D_{mn}}{D_n} \mathscr{H}^{(m)} \mathscr{A}^{(m)}, \end{aligned}$$
(17)

where \(\mathscr{I}\) denotes the identity matrix, \(\mathscr{H}^{(m)}\equiv \partial \hat{{\varvec{\alpha }}}^{(m)}/\partial {\varvec{\beta }}^{(m)}\) is a symmetric dimensionless Hessian matrix with \(\mathscr{H}_{ij}^{(m)}=-(2/c^2)r(\chi _m)\nabla _{\perp ,i}\nabla _{\perp ,j}\varPsi [\chi _m,r(\chi _m){\varvec{\beta }}^{(m)}]\,\varDelta \chi \) (\(i,j=1,2\)), \(D_n\) is the angular diameter distance between the observer and the nth lens plane, and \(D_{mn}\) is the angular diameter distance between the mth and nth lens planes (\(m<n\)). In general, the Jacobian matrix \(\mathscr{A}^{(n)}\) can be decomposed into the following form:

$$\begin{aligned} \mathscr{A}^{(n)}=(1-\kappa )\mathscr{I} - \gamma _1\sigma _3 - \gamma _2\sigma _1 -i\omega \sigma _2, \end{aligned}$$
(18)

where \(\kappa \) is the lensing convergence, \((\gamma _1,\gamma _2)\) are the two components of the gravitational shear (see Sect. 2.6.2 for the definitions and further details of the convergence and shear), \(\omega \) is the net rotation (e.g., Cooray and Hu 2002), and \(\sigma _{a}\; (a=1,2,3)\) are the Pauli matrices that satisfy \(\sigma _{a}\sigma _{b}=i\epsilon _{abc}\sigma _{c}\), with \(\epsilon _{abc}\) the Levi–Civita symbol in three dimensions. The Born approximation \(\mathscr{A}^{(m)}=\mathscr{I}\) on the right-hand side of Eq. (17) leads to a symmetric Jacobian matrix with \(\omega =0\).

The multiple lens equation has been widely used to study gravitational lensing phenomena by ray-tracing through N-body simulations (e.g., Schneider and Weiss 1988; Hamana et al. 2000; Jain et al. 2000).

2.5.1 Cosmological poisson equation

We assume here a spatially flat geometry with \(K=0\) motivated by cosmological observations based on cosmic microwave background (CMB) and complementary data sets (e.g., Hinshaw et al. 2013; Planck Collaboration et al. 2015b). The cosmological Poisson equation relates the scalar metric perturbation \(\varPsi \) (see Eq. (9)) to the matter density perturbation \(\delta \rho \) on subhorizon scales as:

$$\begin{aligned} \varvec{\nabla }^2 \varPsi (\varvec{\chi },\eta ) =4\pi G a^2 \delta \rho =\frac{3H_0^2\varOmega _\mathrm {m}}{2}\frac{\delta }{a}, \end{aligned}$$
(19)

where \(\delta =\delta \rho /\overline{\rho }\) is the density contrast with respect to the background matter density \(\overline{\rho }\) of the universe, \(\overline{\rho }=a^{-3}(3H_0^2\varOmega _\mathrm {m})/(8\pi G)\), and \(\varvec{\nabla }\) is the three-dimensional gradient operator in comoving coordinates. A key implication of Eq. (19) is that the amplitude of \(\varPsi \) is related to the amplitude of \(\delta \) as \(|\varPsi /c^2| \sim (3\varOmega _\mathrm {m}/2)(l/L_H)^2 (\delta /a)\) where l and \(L_H=c/H_0\) denote the characteristic comoving scale of density perturbations and the Hubble radius, respectively. Therefore, assuming the standard matter power spectrum of density fluctuations (e.g., Smith et al. 2003), we can safely conclude that the degree of metric perturbation is always much smaller than unity, i.e., \(|\varPsi /c^2|\ll 1\), even for highly nonlinear perturbations with \(|\delta |\gg 1\) on small scales of \(l\ll L_H\; (\sim 3h^{-1}\mathrm {Gpc}\)).

2.6 Thin-lens equation

2.6.1 Thin-lens approximation

Let us turn to the case of gravitational lensing caused by a single cluster-scale halo. Galaxy clusters can produce deep gravitational potential wells, acting as powerful gravitational lenses. In cluster gravitational lensing, it is often assumed that the total deflection angle, \({\varvec{\alpha }}(\varvec{\theta })\), is dominated by the cluster of interest and its surrounding large-scale environment, which becomes important beyond the cluster virial radius, \(r_\mathrm {vir}\) (Cooray and Sheth 2002; Oguri and Hamana 2011; Diemer and Kravtsov 2014).

Assuming that the light propagation is approximated by a single-lens event due to the cluster and that a light deflection occurs within a sufficiently small region (\(\chi _l-\varDelta \chi /2, \chi _l+\varDelta \chi /2\)) compared to the relevant angular diameter distances, we can write the deflection field by a single cluster as:

$$\begin{aligned} {\varvec{\alpha }}(\varvec{\theta }) \simeq -\frac{2}{c^2} \frac{D_{ls}}{D_s} \int _{\chi _l-\varDelta \chi /2}^{\chi _l+\varDelta \chi /2} \varvec{\nabla }_{\perp }\varPsi [\chi ,r(\chi _l)\varvec{\theta }]\,\text {d}\chi , \end{aligned}$$
(20)

where \(D_s=a(\chi _s)r(\chi _s)\) and \(D_{ls}=a(\chi _s)r(\chi _s-\chi _l)\) are the angular diameter distances from the observer to the source and from the deflector to the source, respectively, and \(r(\chi _l)\varvec{\theta }\) is the comoving transverse vector on the lens plane. In a cosmological situation, the angular diameter distances \(D_{mn}\) between the planes m and n (\(z_m < z_n\)) are of the order of the Hubble radius, \(L_H\equiv c/H_0\sim 3h^{-1}\mathrm {Gpc}\), while physical extents of clusters are about \(2r_\mathrm {200m} \sim (2 - 4)h^{-1}\mathrm {Mpc}\) in comoving units. Therefore, one can safely adopt the thin-lens approximation in cluster gravitational lensing.

We then introduce the effective lensing potential \(\psi (\varvec{\theta })\) defined as:

$$\begin{aligned} \psi (\varvec{\theta }) \simeq \frac{2}{c^2} \frac{D_{ls}}{D_lD_s} \int _{\chi _l-\varDelta \chi /2}^{\chi _l+\varDelta \chi /2} \varPsi [\chi ,r(\chi _l)\varvec{\theta }] \, a \text {d}\chi , \end{aligned}$$
(21)

where \(D_l\) is the angular diameter distance between the observer and the lens, \(D_l=a(\chi _l)r(\chi _l)\). In terms of \(\psi (\varvec{\theta })\), the deflection field \({\varvec{\alpha }}(\theta )\) is expressed as:

$$\begin{aligned} {\varvec{\alpha }}(\varvec{\theta })=-\varvec{\nabla }_{\theta }\psi (\varvec{\theta }), \end{aligned}$$
(22)

where \(\varvec{\nabla }_{\theta }=r\varvec{\nabla }_{\perp }=(\partial _\theta ,\theta ^{-1}\partial _\phi )\).

With the Fermat or time-delay potential defined by:

$$\begin{aligned} \tau (\varvec{\theta }; {\varvec{\beta }})\equiv \frac{1}{2}(\varvec{\theta }-{\varvec{\beta }})^2-\psi (\varvec{\theta }), \end{aligned}$$
(23)

the lens equation can be equivalently written as \(\varvec{\nabla }_{\theta }\tau (\varvec{\theta };{\varvec{\beta }})=0\) (Blandford and Narayan 1986). Here, the first term on the right-hand side of Eq. (23) is responsible for the geometric delay and the second term for the gravitational time-delay. The Fermat potential \(\tau (\varvec{\theta };{\varvec{\beta }})\) is related to the time-delay \(\varDelta t\) with respect to the unperturbed path in the observer frame by \(\varDelta t(\varvec{\theta };{\varvec{\beta }})=D_lD_s/(cD_{ls})(1+z_l)\tau (\varvec{\theta };{\varvec{\beta }})\equiv D_{\varDelta t}\tau (\varvec{\theta };{\varvec{\beta }})/c\). with \(D_{\varDelta t}=(1+z_l)D_lD_s/D_{ls}\propto H_0^{-1}\) the time-delay distance (Refsdal 1964). According to Fermat’s principle, the images for a given source position \({\varvec{\beta }}\) are formed at the stationary points of \(\tau (\varvec{\theta };{\varvec{\beta }})\) with respect to variations of \(\varvec{\theta }\) (Blandford and Narayan 1986).

Note that cluster gravitational lensing is also affected by uncorrelated large-scale structure projected along the line of sight (e.g., Schneider et al. 1998; Hoekstra 2003; Umetsu et al. 2011a; Host 2012). The intervening large-scale structure in the universe perturbs the propagation of light from distant background galaxies, producing small but continuous transverse excursions along the light path. For a given depth of observations, the impact of such cosmic noise is most important in the cluster outskirts where the cluster lensing signal is small (Hoekstra 2003; Becker and Kravtsov 2011; Gruen et al. 2015).

2.6.2 Convergence and shear

Let us work with local Cartesian coordinates \(\varvec{\theta }=(\theta _1,\theta _2)\) centered on a certain reference point in the image plane. The local properties of the lens mapping are described by the Jacobian matrix defined as:

$$\begin{aligned} \mathscr{A}(\varvec{\theta }):=\frac{\partial {\varvec{\beta }}}{\partial \varvec{\theta }} = \left( \begin{array}{cc} 1 - \psi _{,11}&{} - \psi _{,12} \\ -\psi _{,12} &{} 1 - \psi _{,22} \end{array}\right) , \end{aligned}$$
(24)

where we have introduced the notation, \(\psi _{,ij}=\partial ^2\psi /\partial \theta _i\partial \theta _j\) (\(i,j=1,2\)). Alternatively, we can write the Jacobian matrix as \(\mathscr{A}_{ij}=\delta _{ij}-\psi _{,ij}\) (\(i,j=1,2\)) with \(\delta _{ij}\) the Kronecker delta in two dimensions. This symmetric \(2\times 2\) Jacobian matrix \(\mathscr{A}\) can be decomposed as:

$$\begin{aligned} \mathscr{A}=(1-\kappa )\mathscr{I}- \gamma _1\sigma _3 - \gamma _2\sigma _1, \end{aligned}$$
(25)

where \(\sigma _{a} \; (a=1,2,3)\) are the Pauli matrices (Sect. 2.5); \(\kappa (\varvec{\theta })\) is the lensing convergence responsible for the change in the trace part of the Jacobian matrix (\(\mathrm {tr}(\mathscr{A})=2(1-\kappa )\)):

$$\begin{aligned} \kappa := \frac{1}{2}\left( \psi _{,11}+\psi _{,22}\right) = \frac{1}{2}\triangle \psi \end{aligned}$$
(26)

with \(\triangle = \varvec{\nabla }_\theta ^2\), and (\(\gamma _1,\gamma _2\)) are the two components of the complex shear \(\gamma (\varvec{\theta }):=\gamma _1(\varvec{\theta })+i\gamma _2(\varvec{\theta })\):

$$\begin{aligned} \gamma _1&:= \frac{1}{2}\left( \psi _{,11}-\psi _{,22}\right) ,\nonumber \\ \gamma _2&:= \frac{1}{2}\left( \psi _{,12}+\psi _{,21}\right) =\psi _{,12}. \end{aligned}$$
(27)

Note that Eq. (26) can be regarded as a two-dimensional Poisson equation, \(\triangle \psi (\varvec{\theta }) = 2\kappa (\varvec{\theta })\). Then, the Green function in the (hypothetical) infinite domain is \(\triangle ^{-1}(\varvec{\theta },\varvec{\theta }')=\ln |\varvec{\theta }-\varvec{\theta }'|/(2\pi )\),Footnote 4 so that the convergence is related to the lensing potential as:

$$\begin{aligned} \psi (\varvec{\theta })=\frac{1}{\pi } \int \ln (\varvec{\theta }-\varvec{\theta }') \kappa (\varvec{\theta }')\,{\text{d}}^2\theta '. \end{aligned}$$
(28)

The Jacobian matrix is expressed in terms of \(\kappa \) and \(\gamma \) as:

$$\begin{aligned} \mathscr{A}(\varvec{\theta }) = \left( \begin{array}{cc} 1 - \kappa - \gamma _1 &{} - \gamma _2 \\ - \gamma _2 &{} 1 - \kappa + \gamma _1 \end{array}\right) . \end{aligned}$$
(29)

The determinant of the Jacobian matrix (Eq. (29)) is given as \(\mathrm {det}\mathscr{A}=(1-\kappa )^2-|\gamma |^2\). In the weak-lensing limit where \(|\kappa |, |\gamma | \ll 1\), \(\mathrm {det}\mathscr{A}\simeq 1-2\kappa \) to the first order.

The deformation of the image of an infinitesimal circular source (\({\text{d}}{\varvec{\beta }}\rightarrow 0\)) behind the lens can be described by the inverse Jacobian matrix \(\mathscr{A}^{-1}\) of the lens equation. In the weak-lensing limit (\(|\kappa |, |\gamma |\ll 1\)), we have:

$$\begin{aligned} \left( \mathscr{A}^{-1}\right) _{ij} \simeq (1 + \kappa )\delta _{ij} + {\varGamma }_{ij} \ \ \ (i,j=1,2), \end{aligned}$$
(30)

where \(\varGamma _{ij}\) is the symmetric trace-free shear matrix defined by (Bartelmann and Schneider 2001; Crittenden et al. 2002):

$$\begin{aligned} \varGamma _{ij}= \left( \partial _i\partial _j-\delta _{ij}\frac{1}{2}\triangle \right) \psi (\varvec{\theta }), \end{aligned}$$
(31)

with \(\partial _i:=\partial /\partial \theta _i\) (\(i=1,2\)). The shear matrix can be expressed in terms of the Pauli matrices as \(\varGamma =\sigma _3\gamma _1+\sigma _1\gamma _2\). The first term in Eq. (30) describes the isotropic light focusing or area distortion in the weak-lensing limit, while the second term induces an asymmetry in lens mapping. The shear \(\gamma \) is responsible for image distortion and can be directly observed from image ellipticities of background galaxies in the regime where \(|\kappa |,|\gamma |\ll 1\) (see Sect. 3). Note that both \(\kappa \) and \(\gamma \) contribute to the area and shape distortions in the non-weak-lensing regime.

Fig. 3
figure 3

Illustration of the effects of the convergence \(\kappa \) and the shear \(\gamma \) on the angular shape and size of a hypothetical circular source. The convergence acting alone causes an isotropic focusing (magnification) of the image (dashed circle), while the shear deforms it to an ellipse

In Fig. 3, we illustrate the effects of the lensing convergence \(\kappa \) and the gravitational shear \(\gamma \) on the angular shape and size of an infinitesimal circular source. The convergence acting alone causes an isotropic magnification of the image, while the shear deforms it to an ellipse. Note that the magnitude of ellipticity induced by gravitational shear in the weak-lensing regime (\(|\gamma |\lesssim 0.1\)) is much smaller than illustrated here.

2.6.3 Magnification

Gravitational lensing describes the deflection of light by gravity. Lensing conserves the surface brightness of a background source, a consequence of Liouville’s theorem. On the other hand, lensing causes focusing of light rays, resulting in an amplification of the image flux through the local solid-angle distortion. Lensing magnification \(\mu \) is thus given by taking the ratio between the lensed to the unlensed image solid angle as \(\mu =\delta \varOmega ^I/\delta \varOmega ^S = 1/\mathrm {det}\mathscr{A}\), with:

$$\begin{aligned} \mu (\varvec{\theta })=\frac{1}{[1-\kappa (\varvec{\theta })]^2-|\gamma (\varvec{\theta })|^2}. \end{aligned}$$
(32)

In the weak-lensing limit (\(|\kappa |,|\gamma |\ll 1\)), the magnification factor to the first order is:

$$\begin{aligned} \mu (\varvec{\theta }) \simeq 1+2\kappa (\varvec{\theta }). \end{aligned}$$
(33)

The magnitude change at \(\kappa (\varvec{\theta })=0.1\) is thus \(\varDelta m = -(5/2)\log _{10}(\mu ) \sim -0.2\).

2.6.4 Strong- and weak-lensing regimes

The \(\mathscr{A}(\varvec{\theta })\) matrix has two local eigenvalues \(\varLambda _\pm (\varvec{\theta })\) at each image position \(\varvec{\theta }\):

$$\begin{aligned} \varLambda _{\pm }=1-\kappa \pm |\gamma |, \end{aligned}$$
(34)

with \(\varLambda _+ \geqslant \varLambda _-\).

Fig. 4
figure 4

Shape distortion field produced by a simulated lens with a bimodal mass distribution. At each grid point in the image plane (left panel), we have drawn the apparent shape for an intrinsically circular source using the local deformation factors \(\varLambda _\pm \) (Eq. (34)). All ellipses have an equal area regardless of the magnification factor. The right panel shows the \(\kappa \) map of the bimodal lens. In both panels, the solid lines indicate the critical curves. The image distortion disappears locally along the curve \(\kappa =1\) indicated by the dashed line, which lies in the negative-parity region

Images with \(\mathrm {det}\mathscr{A}(\varvec{\theta })>0\) have the same parity as the source, while those with \(\mathrm {det}\mathscr{A}(\varvec{\theta })<0\) have the opposite parity to the source. A set of closed curves defined by \(\mathrm {det}\mathscr{A}(\varvec{\theta })=0\) in the image plane are referred to as critical curves, on which lensing magnification formally diverges, and those mapped into the source plane are referred to as caustics (see Hattori et al. 1999a). The critical curves separate the image plane into even- and odd-parity regions with \(\mathrm {det}\mathscr{A}>0\) and \(\mathrm {det}\mathscr{A}<0\), respectively.

An infinitesimal circular source is transformed to an ellipse with a minor-to-major axis ratio (\(\leqslant 1\)) of \(|\varLambda _-/\varLambda _+|\) for \(\kappa <1\) and \(|\varLambda _+/\varLambda _-|\) for \(\kappa > 1\), and it is magnified by the factor \(|\mu |=1/|\varLambda _+\varLambda _-|\) (see Sect. 2.6.3). The gravitational distortion locally disappears along the curve defined by \(\mathrm {tr}[\mathscr{A}(\varvec{\theta })]=0\), i.e., \(\kappa (\varvec{\theta })=1\), which lies in the odd-parity region (Kaiser 1995). This is illustrated in Fig. 4 for a simulated lens with a bimodal mass distribution. Images forming along the outer (tangential) critical curve \(\varLambda _{-}(\varvec{\theta })=0\) are distorted tangentially to this curve, while images forming close to the inner (radial) critical curve \(\varLambda _{+}(\varvec{\theta })=0\) are stretched in the direction perpendicular to the critical curve.

A lens system that has a region with \(\kappa (\varvec{\theta })>1\) can produce multiple images for certain source positions \({\varvec{\beta }}\), and such a system is referred to as being supercritical. Note that being supercritical is a sufficient but not a necessary condition for a general lens to produce multiple images, because the shear can also contribute to multiple imaging. Nevertheless, this provides us with a simple criterion to broadly distinguish the regimes of multiple and single imaging. Keeping this in mind, we refer to the region where \(\kappa (\varvec{\theta }) \gtrsim 1\) as the strong-lensing regime and the region where \(\kappa (\varvec{\theta })\ll 1\) as the weak-lensing regime.

2.6.5 Critical surface mass density

The lensing convergence \(\kappa \) is essentially a distance-weighted mass overdensity projected along the line of sight. We express \(\kappa (\varvec{\theta })\) due to cluster gravitational lensing as:

$$\begin{aligned} \kappa (\varvec{\theta })= \int _0^{\chi _s} (\rho -\overline{\rho }) \left( \frac{c^2}{4\pi G}\frac{D_s}{D_lD_{ls}} \right) ^{-1} a \text {d}\chi \simeq \frac{\varSigma (\varvec{\theta })}{\varSigma _\mathrm {cr}}, \end{aligned}$$
(35)

where \(\chi _s\) is the comoving distance to the source plane; \(\varSigma =\int _0^{\chi _s} (\rho -\overline{\rho })\,a {\text{d}}\chi \) is the surface mass density field of the lens projected on the sky; and \(\varSigma _\mathrm {cr}\) is the critical surface mass density of gravitational lensingFootnote 5:

$$\begin{aligned} \varSigma _\mathrm {cr}(z_l,z_s) = \frac{c^2}{4\pi G}\frac{D_s}{D_lD_{ls}} \end{aligned}$$
(36)

for \(z_s > z_l\) and \(\varSigma _\mathrm {cr}^{-1}(z_l,z_s) = 0\) (i.e., \(D_{ls}/D_s= 0\)) for an unlensed source with \(z_s\leqslant z_l\). In the second (approximate) equality of Eq. (35), we have explicitly used the thin-lens approximation (Sect. 2.6.1). The critical surface mass density \(\varSigma _\mathrm {cr}\) depends on the geometric configuration (\(z_l, z_s\)) of the lens–source system and the background cosmological parameters, such as (\(\varOmega _\mathrm {m}, \varOmega _{\varLambda }, H_0\)). For example, for \(z_l=0.3\) and \(z_s=1\) in our fiducial cosmology, we have \(\varSigma _\mathrm {cr}\approx 4.0 \times 10^{15}hM_\odot \,\hbox {Mpc}^{-2}\). For a fixed lens redshift \(z_l\), the geometric efficiency of gravitational lensing is determined by the distance ratio \(D_{ls}/D_s\) as a function of \(z_s\) and the background cosmology.

Fig. 5
figure 5

Distance ratio \(D_{ls}/D_s\) as a function of the source redshift \(z_s\) for various sets of the lens redshift \(z_l\) and the cosmological parameters \((\varOmega _\mathrm {m},\varOmega _{\varLambda })\). The ratio \(D_{ls}/D_s\) is shown for three different lens redshifts, \(z_l=0.2, 0.5\), and 0.8 (from left to right) and for three sets of the cosmological parameters, \((\varOmega _\mathrm {m},\varOmega _{\varLambda })=(1,0), (0.3,0)\), and (0.3, 0.7)

To translate the observed lensing signal into surface mass densities, one needs an estimate of \(\varSigma _\mathrm {cr}(z_l,z_s)\) for a given background cosmology. In the regime where \(z_l\ll z_s\) (say, \(z_l\lesssim 0.2\) for background galaxy populations at \(z_s\sim 1\)), \(\varSigma _\mathrm {cr}\) depends weakly on the source redshift \(z_s\), so that a precise knowledge of the source-redshift distribution is less critical (e.g., Okabe and Umetsu 2008; Okabe et al. 2010).

Conversely, this distance dependence of the lensing effects can be used to constrain the cosmological redshift–distance relation by examining the geometric scaling of the lensing signal as a function of the background redshift (Taylor et al. 2007, 2012; Medezinski et al. 2011; Dell’Antonio et al. 2019). Figure 5 compares \(D_{ls}/D_s\) as a function of \(z_s\) for various sets of the lens redshift and the cosmological model.

Note that, in the limit where the lensing matter is continuously distributed along the line of sight, the first equality of Eq. (35) can be formally rewritten as:

$$\begin{aligned} \kappa (\varvec{\theta })= \frac{3H_0^2}{2c^2}\varOmega _\mathrm {m}\int _0^{\chi _s}\text {d}\chi \,g(\chi ,\chi _s)a^{-1}(\chi )\delta [\chi ,r(\chi )\varvec{\theta }], \end{aligned}$$
(37)

with \(g(\chi ,\chi _s)=r(\chi )r(\chi _s-\chi )/r(\chi _s)\) and \(\delta =\delta \rho /\overline{\rho }\). Equation (37) coincides with the expression for the cosmic convergence due to intervening cosmic structures (see Jain et al. 2000).

It is interesting to compare the above line-of-sight integral (Eq. (37)) to the thermal Sunyaev–Zel’dovich effect (SZE) in terms of the Compton-y parameter (e.g., Sunyaev and Zeldovich 1972; Rephaeli 1995; Birkinshaw 1999):

$$\begin{aligned} y=\frac{\sigma _\mathrm {T}}{m_\mathrm {e} c^2}\int n_\mathrm {e} k_\mathrm {B} \left( T_\mathrm {e}-T_\mathrm {CMB}\right) \,c\text {d}t\simeq \frac{\sigma _\mathrm {T}}{m_\mathrm {e} c^2}\int P_\mathrm {e}\,c{\text{d}}t, \end{aligned}$$
(38)

where \(\sigma _\mathrm {T}\), \(m_\mathrm {e}\), and \(k_\mathrm {B}\) are the Thomson scattering cross-section, the electron mass, and the Boltzmann constant, respectively; \(T_\mathrm {CMB}= T_0(1+z)\) is the temperature of CMB photons with \(T_0=2.725\,\hbox {K}\); and \(T_e\) and \(n_e\) are the electron temperature and number density of the intracluster gas, with \(P_\mathrm {e}= n_\mathrm {e} k_\mathrm {B}T_\mathrm {e}\) the electron pressure. In the second (approximate) equality, we have used \(T_e\gg T_0(1+z)\). The Compton-y parameter is proportional to the electron pressure integrated along the line of sight, thus probing the thermal energy content of thermalized hot plasmas residing in the gravitational potential wells of galaxy clusters. The combination of the thermal SZE and weak lensing thus provides unique astrophysical and cosmological probes (e.g., Doré et al. 2001; Umetsu et al. 2009; Osato et al. 2020).

2.6.6 Einstein radius

Detailed strong-lens modeling using many sets of multiple images with measured spectroscopic redshifts allows us to determine the location of the critical curves (e.g., Zitrin et al. 2015; Meneghetti et al. 2017), which, in turn, provides accurate estimates of the projected total mass enclosed by them. In this context, the term Einstein radius is often used to refer to the size of the outer (tangential) critical curve (i.e., \(\varLambda _{-}(\varvec{\theta })=0\); Sect. 2.6.4). We note, however, that there are several possible definitions of the Einstein radius used in the literature (see Meneghetti et al. 2013). Here, we adopt the effective Einstein radius definition (Redlich et al. 2012; Meneghetti et al. 2013, 2017; Zitrin et al. 2015), \(\vartheta _\mathrm {Ein}=\sqrt{A_\mathrm {c}/\pi }\), where \(A_\mathrm {c}\) is the (angular) area enclosed by the outer critical curve. For an axisymmetric lens, the average surface mass density within the critical area is equal to \(\varSigma _\mathrm {cr}\) (see Hattori et al. 1999; Meneghetti et al. 2013), thus enabling us to directly estimate the enclosed projected mass by \(M_\mathrm {2D}(<\vartheta _\mathrm {Ein})=\pi (D_l\vartheta _\mathrm {Ein})^2\varSigma _\mathrm {cr}\). Even for general non-axisymmetric lenses, the projected enclosed mass profile \(M_\mathrm {2D}(<\vartheta )=\varSigma _\mathrm {cr}D_l^2\int _{\vartheta '\le \vartheta }\kappa (\varvec{\theta }')\,{\text{d}}^2\theta '\) at the location \(\vartheta \sim \vartheta _\mathrm {Ein}\) is less sensitive to modeling assumptions and approaches (e.g., Umetsu et al. 2012, 2016; Meneghetti et al. 2017), thus serving as a fundamental observable quantity in the strong-lensing regime (Coe et al. 2010).

3 Basics of cluster weak lensing

In this section, we review the basics of cluster–galaxy weak lensing based on the thin-lens formalism (Sect. 2.6). Unless otherwise noted, we will focus on subcritical lensing (i.e., outside the critical curves). We consider both linear (\(\kappa \ll 1\)) and mildly nonlinear regimes of weak gravitational lensing.

3.1 Weak-lensing mass reconstruction

3.1.1 Spin operator and lensing fields

For mathematical convenience, we introduce a concept of “spin” for weak-lensing quantities as follows (Bacon et al. 2006; Okura et al. 2007, 2008; Schneider and Er 2008; Bacon and Schäfer 2009): a quantity is said to have spin N if it has the same value after rotation by \(2\pi /N\). The product of spin-A and spin-B quantities has spin (\(A+B\)), and the product of spin-A and spin-\(B^*\) quantities has spin (\(A-B\)), where \(*\) denotes the complex conjugate.

We define a complex spin-1 operator \(\partial :=\partial _1+i\partial _2\) that transforms as a vector, \(\partial '=\partial e^{i\varphi }\), with \(\varphi \) being the angle of rotation relative to the original basis. Then, the lensing convergence is expressed in terms of \(\psi (\varvec{\theta })\) as:

$$\begin{aligned} \kappa (\varvec{\theta }) = \frac{1}{2}\partial ^*\partial \psi (\varvec{\theta }), \end{aligned}$$
(39)

where \(\partial \partial ^*=\nabla _\theta ^2\) is a scalar or a spin-0 operator. Similarly, the complex shear \(\gamma =\gamma _1+i\gamma _2\equiv |\gamma |e^{2i\phi _\gamma }\) is expressed as:

$$\begin{aligned} \gamma (\varvec{\theta }) = \frac{1}{2}\partial \partial \psi (\varvec{\theta }) = \hat{\mathcal{D}} \psi (\varvec{\theta }), \end{aligned}$$
(40)

where

$$\begin{aligned} \hat{\mathcal{D}}:=\partial \partial /2 =(\partial _1^2-\partial _2^2)/2+i\partial _1\partial _2 \end{aligned}$$
(41)

is a spin-2 operator transforms such that \(\hat{\mathcal{D}}'=\hat{\mathcal{D}}e^{2i\varphi }\) under a rotation of the basis axes by \(\varphi \).

3.1.2 Linear mass reconstruction

Since \(\gamma (\varvec{\theta })\) and \(\kappa (\varvec{\theta })\) are both linear combinations of the second derivatives of \(\psi (\varvec{\theta })\), they are related to each other by (Kaiser 1995; Crittenden et al. 2002; Umetsu 2010)Footnote 6:

$$\begin{aligned} \triangle \kappa (\varvec{\theta }) = \partial ^*\partial ^*\gamma (\varvec{\theta }) = 2\hat{\mathcal{D}}^*\gamma (\varvec{\theta }). \end{aligned}$$
(42)

The shear-to-mass inversion can thus be formally expressed as:

$$\begin{aligned} \kappa (\varvec{\theta }) = \triangle ^{-1}(\varvec{\theta },\varvec{\theta }')\left[ \partial ^*\partial ^*\gamma (\varvec{\theta }') \right] =2{\hat{\mathcal{D}}^*} \left( \triangle ^{-1}(\varvec{\theta },\varvec{\theta }')\left[ \gamma (\varvec{\theta }')\right] \right) . \end{aligned}$$
(43)

Using \(\triangle ^{-1}(\varvec{\theta },\varvec{\theta }')=\ln |\varvec{\theta }-\varvec{\theta }'|/(2\pi )\) (Sect. 2.6.2), Eq. (43) in the flat-sky limit can be solved to yield the following nonlocal relation between \(\kappa \) and \(\gamma \) (Kaiser and Squires 1993, hereafter KS93):

$$\begin{aligned} \kappa (\varvec{\theta }) -\kappa _0 = \frac{1}{\pi } \int {\text{d}}^2\theta '\,D^*(\varvec{\theta }-\varvec{\theta }')\gamma (\varvec{\theta }'), \end{aligned}$$
(44)

where \(\kappa _0\) is an additive constant and \(D(\varvec{\theta })\) is a complex kernel defined as:

$$\begin{aligned} D(\varvec{\theta }) \equiv 2\pi \hat{\mathcal{D}}\left[ \triangle ^{-1}(\varvec{\theta })\right] = \frac{\theta _2^2-\theta _1^2-2i\theta _1\theta _2}{|\theta |^4} =-\frac{1}{(\theta _1-i\theta _2)^2}. \end{aligned}$$
(45)

Similarly, the complex shear field can be expressed in terms of the convergence \(\kappa \) as:

$$\begin{aligned} \gamma (\varvec{\theta }) = \frac{1}{\pi }\int {\text{d}}^2\theta '\,D(\varvec{\theta }-\varvec{\theta }')\kappa (\varvec{\theta }'). \end{aligned}$$
(46)

This linear mass inversion formalism is often referred to as the KS93 algorithm.

It is computationally faster to work in Fourier domain (Jain et al. 2000) using the fast Fourier transform algorithm. By taking the Fourier transform of Eq. (42), we have a mass inversion relation in the conjugate Fourier space as:

$$\begin{aligned} \kappa (\varvec{\theta })&= \int \frac{{\text{d}}^2k}{(2\pi )^2}\,\hat{\kappa }(\varvec{k}) e^{i\varvec{k}\cdot \varvec{\theta }},\nonumber \\ \hat{\kappa }(\varvec{k})&= \frac{k_1^2-k_2^2-2ik_1 k_2}{k_1^2+k_2^2} \hat{\gamma }(\varvec{k}) \ \ \ (\varvec{k}\ne 0), \end{aligned}$$
(47)

where \(\varvec{k}\) is the two-dimensional wave vector conjugate to \(\varvec{\theta }\), and \(\hat{\kappa }(\varvec{k})\) and \(\hat{\gamma }(\varvec{k})\) are the Fourier transforms of \(\kappa (\varvec{\theta })\) and \(\gamma (\varvec{\theta })=\gamma _1(\varvec{\theta })+i\gamma _2(\varvec{\theta })\), respectively. In practical applications, one may assume \(\hat{\kappa }(0)=0\) if the angular size of the observed shear field is sufficiently large, so that the mean convergence across the data field is approximated to zero. Otherwise, one must explicitly account for the boundary conditions imposed by the observed shear field to perform a mass reconstruction on a finite field (e.g., Kaiser 1995; Seitz and Schneider 1996; Bartelmann et al. 1996; Seitz and Schneider 1997; Umetsu and Futamase 2000).

Fig. 6
figure 6

Shape distortion field in the rich cluster Cl0024+1654 (\(z_l=0.395\)) obtained from deep weak-lensing observations taken with Subaru/Suprime-Cam. The mean surface number density of background galaxies after the color–color selection (Sect. 4.1) is \(n_\mathrm {g}=17.2\,\hbox {arcmin}^{-2}\). A stick with a length of 10% shear is indicated in the top right corner. The filled circle indicates the FWHM (1.4 arcmin) of the Gaussian smoothing. The distortion field exhibits a coherent tangential pattern around the cluster center. Image reproduced with permission from Umetsu et al. (2010), copyright by AAS

Fig. 7
figure 7

Comparison of mass and galaxy distributions in the rich cluster Cl0024+1654 (\(z_l=0.395\)). Left panel: projected mass distribution \(\kappa (\varvec{\theta })=\varSigma (\varvec{\theta })/\varSigma _\mathrm {cr}\) reconstructed from deep weak-lensing data (Fig. 6) taken with Subaru/Suprime-Cam. Right panel: surface number density distribution \(\varSigma _n(\varvec{\theta })\) of color–color-selected cluster member galaxies. The solid circle in each panel indicates the cluster virial radius of \(r_\mathrm {vir}\approx 1.8h^{-1}\mathrm {Mpc}\). Both maps are smoothed with a Gaussian of \(1.4'\) FWHM. Also overlaid on the \(\varSigma _n\) map is the \(\kappa (\varvec{\theta })\) field shown in the left panel, given in units of \(2\sigma \) reconstruction error from the lowest contour level of \(3\sigma \). North is to the top, east to the left. Image reproduced with permission from Umetsu et al. (2010), copyright by AAS

In Fig. 6, we show the shape distortion field in the rich cluster Cl0024+1654 (\(z_l=0.395\)) obtained by Umetsu et al. (2010) from deep weak-lensing observations taken with Suprime-Cam on the 8.2 m Subaru telescope. They accounted and corrected for the effect of the weight function used for calculating noisy galaxy shapes, as well as for the anisotropic and smearing effects of the point spread function (PSF), using an improved implementation of the modified Kaiser et al. (1995, hereafter KSB) method (see Sect. 3.4.2). In the left panel of Fig. 7, we show the \(\kappa (\varvec{\theta })\) field reconstructed from the Subaru weak-lensing data (see Fig. 6). A prominent mass peak is visible in the cluster center, around which the distortion pattern is clearly tangential (Fig. 6). In this study, a variant of the linear KS93 algorithm was used to reconstruct the \(\kappa \) map from the weak shear lensing data. In the right panel of Fig. 7, we show the member galaxy distribution \(\varSigma _n(\varvec{\theta })\) in the cluster. Overall, mass and light are similarly distributed in the cluster.

Fig. 8
figure 8

Projected mass distribution (contours) in the Coma cluster at \(z=0.0236\) reconstructed from a \(4\,\hbox {deg}^2\) weak-lensing survey with Subaru Suprime-Cam observations. Left and middle panels: luminosity and number density distributions of spectroscopically identified cluster members, respectively. Right panel: projected large-scale structure model based on galaxy–galaxy lensing results. The mean surface number density of background galaxies after the color–magnitude selection is \(n_\mathrm {g}=41.3\,\hbox {arcmin}^{-2}\). Image reproduced with permission from Okabe et al. (2014), copyright by AAS

Figure 8 shows the projected mass distribution in the very nearby Coma cluster (\(z_l=0.0236\)) reconstructed from a \(4\,\hbox {deg}^2\) weak-lensing survey of cluster subhalos based on Subaru Suprime-Cam observations (Okabe et al. 2014). In the figure, the weak-lensing mass map is compared to the luminosity and number density distributions of spectroscopically identified cluster members, as well as to the projected large-scale structure model based on galaxy–galaxy lensing with the light-tracing-mass assumption. The projected mass and galaxy distributions in the Coma cluster are correlated well with each other. Thanks to the large angular extension of the Coma cluster, Okabe et al. (2014) measured the weak-lensing masses of 32 cluster subhalos down to the order of \(10^{-3}\) of the cluster virial mass.

3.1.3 Mass-sheet degeneracy

Adding a constant mass sheet to \(\kappa (\varvec{\theta })\) in the shear-to-mass formula (46) does not change the shear field \(\gamma (\varvec{\theta })\) that is observable in the weak-lensing limit. This leads to a degeneracy of solutions for the weak-lensing mass inversion problem, which is referred to as the mass-sheet degeneracy (Falco et al. 1985; Gorenstein et al. 1988; Schneider and Seitz 1995).

As we shall see in Sect. 3.4, in general, the observable quantity for weak shear lensing is not the shear \(\gamma \), but the reduced shear:

$$\begin{aligned} g(\varvec{\theta })=\frac{\gamma (\varvec{\theta })}{1-\kappa (\varvec{\theta })} \end{aligned}$$
(48)

in the subcritical regime where \(\mathrm {det}\mathscr{A}>0\) (or \(1/g^*\) in the negative-parity region with \(\mathrm {det}\mathscr{A}<0\)). We see that the \(g(\varvec{\theta })\) field is invariant under the following global transformation:

$$\begin{aligned} \kappa (\varvec{\theta }) \rightarrow \lambda \kappa (\varvec{\theta }) + 1-\lambda , \ \ \ \gamma (\varvec{\theta }) \rightarrow \lambda \gamma (\varvec{\theta }) \end{aligned}$$
(49)

with an arbitrary scalar constant \(\lambda \ne 0\) (Schneider and Seitz 1995). This transformation is equivalent to scaling the Jacobian matrix \(\mathscr{A}(\varvec{\theta })\) with \(\lambda \), \(\mathscr {A}(\varvec{\theta }) \rightarrow \lambda \mathscr{A}(\varvec{\theta })\). It should be noted that this transformation leaves the location of the critical curves (\(\mathrm {det}\mathscr{A}(\varvec{\theta })=0\)) invariant as well. Moreover, the location of the curve defined by \(\kappa (\varvec{\theta })=1\), on which the distortion locally disappears, is left invariant under the transformation (Eq. (49)). A general conclusion is that all mass reconstruction methods based on shape information alone can determine the \(\kappa \) field only up to a one-parameter family (\(\lambda \) or \(\kappa _0\)) of linear transformations (Eq. (49)).

In principle, this degeneracy can be broken or alleviated, for example, by measuring the magnification factor \(\mu \) in the subcritical regime (i.e., outside the critical curves; see Umetsu 2013), because \(\mu \) transforms under the invariance transformation (Eq. (49)) as:

$$\begin{aligned} \mu (\varvec{\theta }) \rightarrow \lambda ^{-2} \mu (\varvec{\theta }). \end{aligned}$$
(50)

3.1.4 Nonlinear mass reconstruction

Following Seitz and Schneider (1995), we generalize the KS93 algorithm to include the nonlinear but subcritical regime (outside the critical curves). To this end, we express the KS93 inversion formula in terms of the observable reduced shear \(g(\varvec{\theta })\). Substituting \(\gamma =g(1-\kappa )\) in Eq. (44), we have the following integral equation:

$$\begin{aligned} \kappa (\mathbf {\theta })-\kappa _0 = \frac{1}{\pi }\int {\text{d}}^2 \theta ^{\prime} \, D^{*}(\varvec{\theta }-\varvec{\theta }^{\prime})\, g(\varvec{\theta }^{\prime})\, [1-\kappa (\varvec{\theta }^{\prime})]. \end{aligned}$$
(51)

For a given \(g(\varvec{\theta })\) field, this nonlinear equation can be solved iteratively, for example, by initially setting \(\kappa (\varvec{\theta })=0\) everywhere (Seitz and Schneider 1995),

Equivalently, Eq. (51) can be formally expressed as a power series expansion (Umetsu et al. 1999):

$$\begin{aligned} \kappa (\varvec{\theta }) - \kappa _0&= (1-\kappa _0) \left( \hat{\mathcal{G}} -\hat{\mathcal{G}}\circ \hat{\mathcal{G}} +\hat{\mathcal{G}}\circ \hat{\mathcal{G}}\circ \hat{\mathcal{G}}-\cdots \right) \nonumber \\&= (1-\kappa _0)\sum _{n=1}^{\infty }(-1)^{n-1} \hat{\mathcal{G}}^n, \end{aligned}$$
(52)

where \(\hat{\mathcal{G}}\) is the convolution operator defined by:

$$\begin{aligned} \hat{\mathcal{G}}(\varvec{\theta },\varvec{\theta }^{\prime}) := \frac{1}{\pi }\int {\text{d}}^2\theta ^{\prime}\, D^*(\varvec{\theta }-\varvec{\theta }^{\prime})g(\varvec{\theta }^{\prime})\,\times . \end{aligned}$$
(53)

Here, \(\hat{\mathcal{G}}(\mathbf {\theta },\mathbf {\theta }^{\prime})\) acts on a function of \(\varvec{\theta }^{\prime}\). The KS93 algorithm corresponds to the first-order approximation to this power series expansion in the weak-lensing limit. Note that solutions for nonlinear mass reconstructions suffer from the generalized mass-sheet degeneracy, as explicitly shown in Eq.  (52).

Note that there is another class of mass inversion algorithms that uses maximum-likelihood and Bayesian approaches to obtain a mass map solution and its error covariance matrix from weak-lensing data (e.g., Bartelmann et al. 1996; Bradač et al. 2006; Merten et al. 2009).

3.2 E/B decomposition

The shear matrix \(\varGamma (\varvec{\theta })=\gamma _1(\varvec{\theta })\sigma _3+\gamma _2(\varvec{\theta })\sigma _1\) that describes a spin-2 anisotropy can be expressed as a sum of two components corresponding to the number of degrees of freedom. By introducing two scalar fields \(\psi _E(\varvec{\theta })\) and \(\psi _B(\varvec{\theta })\), we decompose the shear matrix \(\varGamma _{ij}\) (\(i,j=1,2\)) into two independent modes as (Crittenden et al. 2002):

$$\begin{aligned} \varGamma (\varvec{\theta })= \left( \begin{array}{cc} \gamma _1 &{} \gamma _2\\ \gamma _2 &{} -\gamma _1 \end{array}\right) = \varGamma ^{(E)}(\varvec{\theta }) +\varGamma ^{(B)}(\varvec{\theta }), \end{aligned}$$
(54)

with

$$\begin{aligned} \varGamma _{ij}^{(E)}(\varvec{\theta })&= \left( \partial _i\partial _j-\delta _{ij} \frac{1}{2}\triangle \right) \psi _E(\varvec{\theta }),\nonumber \\ \varGamma _{ij}^{(B)}(\varvec{\theta })&= \frac{1}{2} \left( \epsilon _{kj}\partial _i\partial _k + \epsilon _{ki}\partial _j\partial _k \right) \psi _B(\varvec{\theta }), \end{aligned}$$
(55)

where \(\epsilon _{ij}\) is the Levi–Civita symbol in two dimensions, defined such that \(\epsilon _{11}=\epsilon _{22}=0\), \(\epsilon _{12}=-\epsilon _{21}=1\). Here, the first term associated with \(\psi _E\) is a gradient or scalar E component and the second term with \(\psi _B\) is a curl or pseudoscalar B component.

The shear components \((\gamma _1,\gamma _2)\) are written in terms of \(\psi _E\) and \(\psi _B\) as:

$$\begin{aligned} \gamma _1= & {} +\varGamma _{11}=-\varGamma _{22}=\frac{1}{2}\left( \psi _{E,11}-\psi _{E,22}\right) -\psi _{B,12} \end{aligned}$$
(56)
$$\begin{aligned} \gamma _2= & {} \varGamma _{12}=\varGamma _{21}=\psi _{E,12} +\frac{1}{2}\left( \psi _{B,11}-\psi _{B,22}\right) . \end{aligned}$$
(57)

As we have discussed in Sect. 3.1.1, the spin-2 \(\gamma (\varvec{\theta })\) field is coordinate dependent and transforms as \(\gamma ^{\prime}=\gamma e^{2i\varphi }\) under a rotation of the basis axes by \(\varphi \). The E and B components can be extracted from the shear matrix as:

$$\begin{aligned} 2\nabla ^2_\theta \kappa _E&\equiv \nabla ^4_\theta \psi _E=2\partial ^i\partial ^j\varGamma _{ij},\nonumber \\ 2\nabla ^2_\theta \kappa _B&\equiv \nabla ^4_\theta \psi _B=2\epsilon _{ij} \partial ^i\partial ^k\varGamma _{jk}, \end{aligned}$$
(58)

where we have defined the E and B fields, \(\kappa _E=(1/2)\triangle \psi _E\) and \(\kappa _B=(1/2)\triangle \psi _B\), respectively. This technique is referred to as the E/B-mode decomposition. We see from Eq. (58) that the relations between E/B fields and spin-2 fields are intrinsically nonlocal. Remembering that the shear matrix due to weak lensing is given as \(\varGamma _{ij}=(\partial _i\partial _j-\delta _{ij}\triangle /2)\psi (\varvec{\theta })\) (\(i,j=1,2\)), we identify \(\psi _E(\varvec{\theta })=\psi (\varvec{\theta })\) and \(\psi _B(\varvec{\theta })=0\). Hence, for a lensing-induced shear field, the E-mode signal is related to the convergence \(\kappa \), i.e., the surface mass density of the lens, while the B-mode signal is identically zero.

Fig. 9
figure 9

Illustration of shape distortion patterns from E-mode and B-mode fields. Image reproduced with permission from Van Waerbeke and Mellier (2003)

Figure 9 illustrates characteristic distortion patterns from E-mode (curl-free) and B-mode (divergence-free) fields. Weak lensing only produces curl-free E-mode signals, so that the presence of divergence-free B modes can be used as a null test for systematic effects. In the weak-lensing regime, a tangential E-mode pattern is produced by a positive mass overdensity (e.g., halos), while a radial E-mode pattern is produced by a negative mass overdensity (e.g., cosmic voids).

Now, we turn to the issue of E/B-mode reconstructions from the spin-2 shear field. Rewriting Eq. (58) in terms of the complex shear \(\gamma \), we find:

$$\begin{aligned} \triangle \kappa _E&= \mathfrak {R}\left( 2\hat{\mathcal{D}}^*\gamma \right) , \nonumber \\ \triangle \kappa _B&= \mathfrak {I}\left( 2\hat{\mathcal{D}}^*\gamma \right) , \end{aligned}$$
(59)

where \({\mathfrak {R}}(Z)\) and \({\mathfrak {I}}(Z)\) denote the real part and the imaginary part of a complex variable Z, respectively. Defining \(\kappa \equiv \kappa _E+i\kappa _B\), we see that the first of Eq. (59) is identical to the mass inversion formula (Eq. (42)). The B-mode convergence \(\kappa _B\) can thus be simply obtained as the imaginary part of Eq. (44), which is expected to vanish for a purely weak-lensing signal. Moreover, the second of Eq. (59) indicates that the transformation \(\gamma ^{\prime}(\varvec{\theta })=i\gamma (\varvec{\theta })\) (\(\gamma _1^{\prime}=-\gamma _2, \gamma _2'=\gamma _1\)) is equivalent to an interchange operation of the E and B modes of the original maps by \(\kappa _E'(\varvec{\theta })=-\kappa _B(\varvec{\theta })\) and \(\kappa '_B(\varvec{\theta })=\kappa _E(\varvec{\theta })\). Since \(\gamma \) is a spin-2 field that transforms as \(\gamma '=\gamma e^{2i\varphi }\), this operation is also equivalent to a rotation of each ellipticity by \(\pi /4\) with each position vector fixed.

Note that gravitational lensing can induce B modes, for example, when multiple deflections of light are involved (Sect. 2.5). However, these B modes can be generated at higher orders and the B-mode contributions coming from multiple deflections are suppressed by a large factor compared to the E-mode contributions (see, e.g., Krause and Hirata 2010). In real observations, intrinsic ellipticities of background galaxies also contribute to weak-lensing shear estimates. Assuming that intrinsic ellipticities have random orientations in projection space, such an isotropic ellipticity distribution will yield statistically identical contributions to the E and B modes. Therefore, the B-mode signal provides a useful null test for systematic effects in weak-lensing observations (Fig. 9).

3.3 Flexion

Flexion is introduced as the next higher order lensing effects responsible for an arc-like and weakly skewed appearance of lensed galaxies (Goldberg and Bacon 2005; Bacon et al. 2006) observed in a regime between weak and strong lensing (i.e., a nonlinear but subcritical regime). Such higher order lensing effects occur when \(\kappa (\varvec{\theta })\) and \(\gamma (\varvec{\theta })\) are not spatially constant across a source galaxy image. By taking higher order derivatives of the lensing potential \(\psi (\varvec{\theta })\), we can work with higher order transformations of galaxy shapes by weak lensing (e.g., Massey et al. 2007b; Okura et al. 2007, 2008; Goldberg and Leonard 2007; Schneider and Er 2008; Viola et al. 2012).

Fig. 10
figure 10

Weak-lensing distortions with different spin values increasing from left to right. The convergence \(\kappa \) is a spin-0 quantity, the first flexion \(F=F_1+iF_2\) is spin-1, the shear \(\gamma =\gamma _1+i\gamma _2\) is spin-2, and the second flexion \(G=G_1+iG_2\) is spin-3. Image reproduced with permission from Bacon et al. (2006), copyright by the authors

The third-order derivatives of \(\psi (\varvec{\theta })\) can be combined to form a pair of complex flexion fields as (Bacon et al. 2006):

$$\begin{aligned} F&:= F_1+i{F}_2 = \frac{1}{2}\partial \partial \partial ^*\psi =\partial \kappa ,\nonumber \\ G&:= G_1+i{G}_2 = \frac{1}{2}\partial \partial \partial \psi =\partial \gamma . \end{aligned}$$
(60)

The first flexion F has spin-1 and the second flexion G has spin-3. The two complex flexion fields satisfy the following consistency relation:

$$\begin{aligned} \partial ^* \partial G(\varvec{\theta }) = \partial \partial F(\varvec{\theta }). \end{aligned}$$
(61)

Figure 10 illustrates the characteristic weak-lensing distortions with different spin values for an intrinsically circular Gaussian source (Bacon et al. 2006).

If the angular size of an image is small compared to the characteristic scale over which \(\psi (\varvec{\theta })\) varies, we can locally expand Eq. (13) to the next higher order as:

$$\begin{aligned} \delta \beta _i = \mathscr{A}_{ij}\delta \theta _j + \frac{1}{2}\mathscr{A}_{ij,k}\delta \theta _j \delta \theta _k + O(\delta \theta ^3), \end{aligned}$$
(62)

where \(\mathscr{A}_{ij,k}=-\psi _{,ijk}\) (\(i,j,k=1,2\)). The \(\mathscr{A}_{ij,k}\) matrix can be expressed with a sum of two terms:

$$\begin{aligned} \mathscr{A}_{ij,k}={F}_{ijk}+{G}_{ijk}, \end{aligned}$$
(63)

with the spin-1 part \({F}_{ijk}\) and the spin-3 part \({G}_{ijk}\) defined by:

$$ F_{{ij1}} = - \frac{1}{2}\left( {\begin{array}{*{20}l} {3F_{1} } \hfill & {F_{2} } \hfill \\ {F_{2} } \hfill & {F_{1} } \hfill \\ \end{array} } \right),\;F_{{ij2}} = - \frac{1}{2}\left( {\begin{array}{*{20}l} {F_{2} } \hfill & {F_{1} } \hfill \\ {F_{1} } \hfill & {3F_{2} } \hfill \\ \end{array} } \right), $$
(64)
$$ G_{{ij1}} = - \frac{1}{2}\left( {\begin{array}{*{20}l} {G_{1} } \hfill & {G_{2} } \hfill \\ {G_{2} } \hfill & { - G_{1} } \hfill \\ \end{array} } \right),\;G_{{ij2}} = - \frac{1}{2}\left( {\begin{array}{*{20}l} {G_{2} } \hfill & { - G_{1} } \hfill \\ { - G_{1} } \hfill & { - G_{2} } \hfill \\ \end{array} } \right). $$
(65)

Flexion has a dimension of inverse (angular) length, so that the flexion effects depend on the angular size of the source image. That is, the smaller the source image, the larger the amplitude of intrinsic flexion contributions (Okura et al. 2008). The shape quantities affected by the first flexion F alone have spin-1 properties, while those by the second flexion G alone have spin-3 properties.

Note that, as in the case of the spin-2 shear field, what is directly observable from higher order image distortions are the reduced flexion effects, \(F/(1-\kappa )\) and \(G/(1-\kappa )\) (Okura et al. 2007, 2008; Goldberg and Leonard 2007; Schneider and Er 2008), a consequence of the mass-sheet degeneracy.

From Eq. (60), the inversion equations from flexion to \(\kappa \) can be obtained as follows (Bacon et al. 2006):

$$\begin{aligned} (\kappa +iB)_{F}= & {} \triangle ^{-1} \partial ^*{F}, \end{aligned}$$
(66)
$$\begin{aligned} (\kappa +iB)_{G}= & {} \triangle ^{-2} \partial ^*\partial ^*\partial ^*{G}, \end{aligned}$$
(67)

where the complex part iB describes the B-mode component that can be used to assess the noise properties of weak-lensing data (e.g., Okura et al. 2008). An explicit representation for the inversion equations is obtained in Fourier space as:

$$\begin{aligned} \widehat{\kappa }_{F}(\varvec{k})&= -i \frac{k_1\widehat{F}_1 + k_2\widehat{F}_2}{k_1^2+k_2^2},\nonumber \\ \widehat{\kappa }_{G}(\varvec{k})&= -i \frac{ \widehat{G}_1 (k_1^3 - 3k_1 k_2^2)+ \widehat{G}_2 (3k_1^2 k_2 - k_2^3) }{(k_1^2+k_2^2)^2}, \end{aligned}$$
(68)

for \(\varvec{k}\ne 0\).

In principle, one can combine independent mass reconstructions \(\widehat{\kappa }_a(\varvec{k})\) \((a=\gamma ,{F}, {G})\) linearly in Fourier space to improve the statistical significance with minimum noise variance weighting as (Okura et al. 2007):

$$\begin{aligned} \widehat{\kappa }(\varvec{k})=\frac{\sum _a \widehat{W}_{\kappa |a}(\varvec{k})\widehat{\kappa }_a(\varvec{k})}{\sum _a \widehat{W}_{\kappa |a}(\varvec{k}) }, \end{aligned}$$
(69)

where \(\widehat{W}_{\kappa |a}(\varvec{k}) = 1/P^{(N)}_{\kappa a}(\varvec{k})\) with \(P^{(N)}_{\kappa |a}(\varvec{k})\) the two-dimensional noise power spectrum of \(\kappa \) reconstructed using the observable a:

$$\begin{aligned} P^{(N)}_{\kappa |\gamma }(\varvec{k})&= \frac{P_{\gamma }^{(N)}(\varvec{k})}{2} = \frac{\sigma ^2_{\gamma }}{2 n_\gamma },\nonumber \\ P^{(N)}_{\kappa |F}(\varvec{k})&= \frac{P^{(N)}_F(\varvec{k})}{2k^2}=\frac{\sigma ^2_{F}}{2 n_F \varvec{k}^2},\nonumber \\ P^{(N)}_{\kappa |G}(\varvec{k})&= \frac{P^{(N)}_G(\varvec{k})}{2k^2}=\frac{\sigma ^2_{G}}{2 n_G \varvec{k}^2}, \end{aligned}$$
(70)

with \(P^{(N)}_a(\varvec{k})\) the shot noise power, \(\sigma _a\) the shape noise dispersion, and \(n_a\) the mean surface number density of background source galaxies, for the observable a (\(a=\gamma , F, G\)). Assuming that errors in \(\widehat{\kappa }_a(\varvec{k})\) between different observables are independent, the noise power spectrum for the estimator (Eq. (69)) is obtained as (Okura et al. 2007):

$$\begin{aligned} P^{(N)}_{\kappa }(\varvec{k})=\frac{1}{\sum _a \widehat{W}_a(\varvec{k})}= \frac{1}{\sum _a 1/P^{(N)}_{\kappa |a}(\varvec{k})}. \end{aligned}$$
(71)
Fig. 11
figure 11

Mass contours of the rich cluster Abell 1689 (\(z_l=0.183\)) reconstructed from spin-1 flexion measurements based on Subaru Suprime-Cam observations, superposed on the Suprime-Cam \(Vi'\) color image. The contours are spaced in units of \(1\sigma \) reconstruction error estimated from the rms of the B-mode reconstruction. The field size is \(4'\times 4'\). North is to the top and east to the left. Image reproduced with permission from Okura et al. (2008), copyright by AAS

Figure 11 shows the \(\kappa \) field in the central region of the rich cluster Abell 1689 (\(z_l=0.183\)) reconstructed from the spin-1 flexion alone (Okura et al. 2008) measured with Subaru Suprime-Cam data. Okura et al. (2008) used measurements of higher order lensing image characteristics (HOLICs) introduced by Okura et al. (2007). Their analysis accounted for the effect of the weight function used for calculating noisy shape moments, as well as for higher order PSF effects. One can employ the assumption of random orientations for intrinsic HOLICs of background galaxies to obtain a direct estimate of flexion, in a similar manner to the usual prescription for weak shear lensing. Okura et al. (2008) utilized the Fourier-space relation (Eq. (68)) between \(F(\varvec{\theta })\) and \(\kappa (\varvec{\theta })\) with the linear weak-lensing approximation. The B-mode convergence field was used to monitor the reconstruction error in the \(\kappa \) map. The reconstructed \(\kappa \) map exhibits a bimodal feature in the central region of the cluster. The pronounced main peak is associated with the brightest cluster galaxy (BCG) and central cluster members, while the secondary mass peak is associated with a local concentration of bright galaxies.

Note that, as discussed in Viola et al. (2012), there is a cross-talk between shear and flexion arising from shear–flexion coupling terms, which makes quantitative measurements of flexion challenging.

3.4 Shear observables

Since the pioneering work of Kaiser et al. (1995), numerous methods have been proposed and implemented in the literature to accurately extract the lensing signal from noisy pixelized images of background galaxies (e.g., Kuijken 1999; Bridle et al. 2002; Bernstein and Jarvis 2002; Refregier 2003; Hirata and Seljak 2003; Miller et al. 2007). On the other hand, considerable progress has been made in understanding and controlling systematic biases in noisy shear estimates by relying on realistic galaxy image simulations (e.g., Heymans et al. 2006; Massey et al. 2007a; Refregier et al. 2012; Kacprzak et al. 2012; Mandelbaum et al. 2014, 2015, 2018a).

Here, we will review the basic idea and essential aspects of the moment-based KSB formalism. We refer the reader to Mandelbaum (2018) for a recent exhaustive review on the subject.

3.4.1 Ellipticity transformation by weak lensing

In a moment-based approach to weak-lensing shape measurements, we use quadrupole moments \(Q_{ij}\) (\(i,j=1,2\)) of the surface brightness distribution \(I(\varvec{\theta })\) of background galaxy images to quantify the shape of the images as (Kaiser et al. 1995):

$$\begin{aligned} Q_{ij} \equiv \frac{\int {\text{d}}^2\theta \, q_I[I(\varvec{\theta })]\varDelta \theta _i \varDelta \theta _j}{\int {\text{d}}^2\theta \,q_I[I(\varvec{\theta })]}, \end{aligned}$$
(72)

where \(q_I[I(\varvec{\theta })]\) is a weight function and \(\varDelta \theta _i = \theta _i-\overline{\theta }_i\) denotes the offset vector from the image centroid. Here, we assume that the weight \(q_I\) does not explicitly depend on \(\varvec{\theta }\), but is set by the local value of the brightness \(I(\varvec{\theta })\) (Bartelmann and Schneider 2001). The trace of \(Q_{ij}\) describes the angular size of the image, while the traceless part describes the shape and orientation of the image. With the quadrupole moments \(Q_{ij}\), we define the complex ellipticity \(e=e_1+ie_2\) asFootnote 7:

$$\begin{aligned} e \equiv \frac{Q_{11} - Q_{22} + 2iQ_{12}}{Q_{11} + Q_{22}}. \end{aligned}$$
(73)

For an ellipse with a minor-to-major axis ratio of \(q \; (\leqslant 1)\), \(|e|=(1-q^2)/(1+q^2)\).

The spin-2 ellipticity e (Eq. (73)) transforms under the lens mapping as:

$$\begin{aligned} e^{(s)}=\frac{e-2g+e^* g^2}{1+|g|^2-2\mathfrak {R}(e^* g)}, \end{aligned}$$
(74)

where \(e^{(s)}\) is the unlensed intrinsic ellipticity and \(g = \gamma /(1 - \kappa )\) is the spin-2 reduced shear. Since e is a nonzero spin quantity with a direction dependence, the expectation value of the intrinsic source ellipticity \(e^{(s)}\) is assumed to vanish, i.e., \(\mathcal{E}(e^{(s)})=0\), where \(\mathcal{E}(X)\) denotes the expectation value of X. Schneider and Seitz (1995) showed that Eq. (74) with the condition \(\mathcal{E}(e^{(s)}) = 0\) is equivalent to:

$$\begin{aligned} 0=\sum _n w_n \frac{e_n - \delta _g}{1-\mathfrak {R}(e^*_n \delta _g)}, \end{aligned}$$
(75)

where \(e_n\) is the image ellipticity for the nth object, \(w_n\) is a statistical weight for the nth object, and \(\delta _g\) is the spin-2 complex distortion (Schneider and Seitz 1995):

$$\begin{aligned} \delta _g \equiv \frac{2g}{1+|g|^2}. \end{aligned}$$
(76)

Note that the complex distortion \(\delta _g\) is invariant under the transformation \(g\rightarrow 1/g^*\).

For an intrinsically circular source with \(e^{(s)}=0\), we have:

$$\begin{aligned} e = \delta _g = \frac{2g}{1+|g|^2}. \end{aligned}$$
(77)

On the other hand, in the weak-lensing limit (\(|\kappa |, |\gamma |\ll 1\)), Eq. (74) reduces to \(e^{(s)} \simeq e-2g \simeq e-2\gamma \). Assuming random orientations of source galaxies, we average observed ellipticities over a local ensemble of source galaxies to obtain:

$$\begin{aligned} \gamma \simeq g \simeq \frac{\mathcal{E}(e)}{2}. \end{aligned}$$
(78)

For an input signal of \(g=0.1\), Eq. (77) yields \(e\approx 0.198\). Hence, the weak-lensing approximation (Eq. (78)) gives a reduced-shear estimate of \(g^\mathrm {(est)}\approx 0.099\), corresponding to a negative bias of \(1\%\). For \(g=0.2\) in the mildly nonlinear regime, Eq. (78) gives \(g^\mathrm {(est)}\approx 0.192\), corresponding to a negative bias of \(4\%\).

In real observations, the reduced shear g may be estimated from a local ensemble average of background galaxies as \(\langle g \rangle \simeq \langle e\rangle /2\). The statistical uncertainty in the reduced-shear estimate \(\langle g\rangle \) decreases with increasing the number of background galaxies N (see Sect. 4.2 for more details) as \(\propto \sigma /\sqrt{N}\), with \(\sigma \) the dispersion of background image ellipticities (dominated by the intrinsic shape noise). Weak-lensing analysis thus requires a large number of background galaxies to increase the statistical significance of the shear measurements.

3.4.2 The KSB algorithm: a moment-based approach

For a practical application of weak shear lensing, we must account for various observational and instrumental effects, such as the impact of noise on the galaxy shape measurement (both statistical and systematic uncertainties), the isotropic smearing component of the PSF, and the effect of instrumental PSF anisotropy. Therefore, one cannot simply use Eq. (78) to measure the shear signal from observational data.

A more robust estimate of the shape moments (Eq. (72)) is obtained using a weight function \(W(|\varvec{\theta }|)\) that depends explicitly on the separation \(|\varvec{\theta }|\) from the image centroid. In the KSB approach, a circular Gaussian that is matched to the size of each object is used as a weight function (Kaiser et al. 1995). The quadrupole moments obtained with such a weight function \(W(|\varvec{\theta }|)\) suffer from an additional smearing and do not obey the transformation law (Eq. (74)). Therefore, the expectation value \(\mathcal{E}(e)\) of the image ellipticity is different from the distortion \(\delta _g = 2g/(1+|g|^2)\) (see Eq. (77)).

The KSB formalism (Kaiser et al. 1995; Hoekstra et al. 1998) accounts explicitly for the Gaussian weight function used for measuring noisy shape moments, the effect of spin-2 PSF anisotropy, and the effect of isotropic PSF smearing. The KSB formalism and its variants assume that the PSF can be described as an isotropic function convolved with a small anisotropic kernel. In the limit of linear response to lensing and instrumental anisotropies, KSB derived the transformation law between intrinsic (unlensed) and observed (lensed) complex ellipticities, \(e^{(s)}\) and e, respectively. The linear transformation between intrinsic and observed complex ellipticities can be formally expressed as (Kaiser et al. 1995; Hoekstra et al. 1998; Bartelmann and Schneider 2001):

$$\begin{aligned} e_i = e_i^{(s)} + \left( C^{g}\right) _{ij} g_j + \left( C^q\right) _{ij} q_j \ \ (i,j=1,2), \end{aligned}$$
(79)

where \(q_i\) denotes the spin-2 PSF anisotropy kernel, \((C^q)_{ij}\) is a linear response matrix for the PSF anisotropy \(q_i\), \((C^g)_{ij}\) is a linear response matrix for the reduced shear \(g_i\). The PSF anisotropy kernel and the response matrices can be calculated from observable weighted shape moments of galaxies and stellar objects (Kaiser et al. 1995; Bartelmann and Schneider 2001; Erben et al. 2001). In real observations, the PSF anisotropy kernel \(q(\varvec{\theta })\) can be estimated from image ellipticities \(e^*\) observed for a sample of foreground stars for which \(e^{(s)}\) and g vanish, so that \(q_i(\varvec{\theta })=(C^q)^{-1}_{ij}e^*_{j}\).

Assuming that the expectation value of the intrinsic source ellipticity vanishes \(\mathcal{E}[e^{(s)}]=0\), we find the following linear relation between the reduced shear and the ensemble-averaged image ellipticity:

$$\begin{aligned} g_i = \mathcal{E}\left[ \left( C^{g}\right) ^{-1}_{ij} (e-C^q q)_j \right] \ \ \ (i,j=1,2). \end{aligned}$$
(80)

In the KSB formalism, the shear response matrix \(C^g\) is denoted as \(P^g\) (or \(P^\gamma \)) and dubbed pre-seeing shear polarizability. Similarly, \(C^q\) is denoted as \(P^\mathrm {sm}\) and dubbed smear polarisability.

A careful calibration of the signal response \(P^g\) is essential for any weak shear lensing analysis that relies on accurate shape measurements from galaxy images (see Mandelbaum 2018). The levels of shear calibration bias are often quantified in terms of a multiplicative bias factor m and an additive calibration offset c through the following relation between the true input shear signal, \(g^\mathrm {true}\), and the recovered signal, \(g^\mathrm {obs}\) (Heymans et al. 2006; Massey et al. 2007a; Mandelbaum et al. 2014):

$$\begin{aligned} g_i^{\mathrm {obs}} = (1+m_i) g_i^{\mathrm {true}} + c_i \ \ (i=1,2), \end{aligned}$$
(81)

The original KSB formalism, when applied to noisy observations, is known to suffer from systematic biases that depend primarily on the size and the detection signal-to-noise ratio (S/N) of galaxy images (e.g., Erben et al. 2001; Refregier et al. 2012). Different variants of the Kaiser et al. (1995) method (KSB+) have been developed and implemented in the literature primarily to study mass distributions of high-mass galaxy clusters (e.g., Hoekstra et al. 1998, 2015; Clowe et al. 2004; Umetsu et al. 2010, 2014; Oguri et al. 2012; von der Linden et al. 2014a; Okabe and Smith 2016; Schrabback et al. 2018). Note that KSB+ pipelines calibrated against realistic image simulations of crowded fields can achieve a \(\sim 2\%\) shear calibration accuracy even in the cluster lensing regime (e.g., Herbonnet et al. 2019; Hernández-Martín et al. 2020).

3.5 Tangential- and cross-shear components

As we have seen in Sect. 3.1, the spin-2 shear components \(\gamma _1\) and \(\gamma _2\) are coordinate dependent, defined relative to a reference Cartesian coordinate frame (chosen by the observer). It is useful to consider components of the shear that are coordinate independent with respect to a certain reference point, such as the cluster center.

We define a polar coordinate system (\(\vartheta , \varphi \)) centered on an arbitrary point \(\varvec{\theta }_\mathrm {c}\) on the sky, such that \(\varvec{\theta }=(\vartheta \cos \varphi ,\vartheta \sin \varphi )+\varvec{\theta }_\mathrm {c}\). The convergence \(\overline{\kappa }(\vartheta )\) averaged within a circle of radius \(\vartheta \) about \(\varvec{\theta }_\mathrm {c}\) is then expressed as:

$$\begin{aligned} \overline{\kappa }(\vartheta )&:= \frac{2}{\vartheta ^2}\int _0^\vartheta \text {d}\vartheta '\, \vartheta ' \kappa (\vartheta ') \equiv \frac{\overline{\varSigma }(\vartheta )}{\varSigma _\mathrm {cr}},\nonumber \\ \kappa (\vartheta )&:=\oint \frac{\text {d}\varphi }{2\pi }\,\kappa (\vartheta ,\varphi )\equiv \frac{\varSigma (\vartheta )}{\varSigma _\mathrm {cr}}, \end{aligned}$$
(82)

where \(\overline{\varSigma }(\vartheta )\) is the surface mass density averaged within a circle of radius \(\vartheta \) about \(\varvec{\theta }_\mathrm {c}\) and \(\varSigma (\vartheta )\) is the surface mass density averaged over a circle of radius \(\vartheta \) about \(\varvec{\theta }_\mathrm {c}\). The reference point \(\varvec{\theta }_\mathrm {c}\) can be taken to be the cluster center, which can be determined from symmetry of the strong-lensing pattern, the X-ray centroid position, or the BCG position.

Let us introduce the tangential and \(45^\circ \)-rotated cross-shear components, \(\gamma _+(\varvec{\theta })\) and \(\gamma _\times (\varvec{\theta })\), respectively, defined relative to the position \(\varvec{\theta }_\mathrm {c}\) as:

$$\begin{aligned} \gamma _+(\varvec{\theta })&= -\gamma _1(\varvec{\theta })\cos (2\varphi ) - \gamma _2(\varvec{\theta })\sin (2\varphi ),\nonumber \\ \gamma _\times (\varvec{\theta })&= +\gamma _1(\varvec{\theta })\sin (2\varphi ) - \gamma _2(\varvec{\theta })\cos (2\varphi ), \end{aligned}$$
(83)

which are directly observable in the weak-lensing limit where \(|\kappa |,|\gamma |\ll 1\) (see Sect. 3.4). Using the two-dimensional version of Gauss’ theorem, we find the following identity for an arbitrary choice of \(\varvec{\theta }_\mathrm {c}\) (Kaiser 1995):

$$\begin{aligned} \gamma _+(\vartheta )&:= \oint \frac{\text {d}\varphi }{2\pi }\,\gamma _+(\vartheta ,\varphi ) = \overline{\kappa }(\vartheta ) -\kappa (\vartheta )\equiv \frac{\varDelta \varSigma (\vartheta )}{\varSigma _\mathrm {cr}},\nonumber \\ \gamma _\times (\vartheta )&:= \oint \frac{\text {d}\varphi }{2\pi }\,\gamma _\times (\vartheta ,\varphi )= 0, \end{aligned}$$
(84)

where we have defined the excess surface mass density \(\varDelta \varSigma (\vartheta )\) around \(\varvec{\theta }_\mathrm {c}\) as a function of \(\vartheta \) by (Miralda-Escude 1991):

$$\begin{aligned} \varDelta \varSigma (\vartheta ) = \overline{\varSigma }(\vartheta ) -\varSigma (\vartheta ). \end{aligned}$$
(85)

From Eqs. (82) and (84), we find:

$$\begin{aligned} \frac{\text {d}\overline{\kappa }(\vartheta )}{\text {d}\ln \vartheta } =-2\gamma _+(\vartheta ). \end{aligned}$$
(86)

Equation (84) shows that, given an arbitrary circular loop of radius \(\vartheta \) around the chosen center \(\varvec{\theta }_\mathrm {c}\), the tangential and cross-shear components averaged around the loop extract E-mode and B-mode distortion patterns (Sect. 3.2).

An important implication of the first of Eq. (84) is that, with tangential shear measurements from individual source galaxies (see Sect. 3.4), one can directly determine the azimuthally averaged \(\varDelta \varSigma (\vartheta )\) profile around lenses in the weak-lensing regime, even if the mass distribution \(\varSigma (\varvec{\theta })\) is not axis-symmetric around \(\varvec{\theta }_\mathrm {c}\). Moreover, the second of Eq. (84) tells us that the azimuthally averaged \(\times \) component, or the B-mode signal, is expected to be statistically consistent with zero if the signal is due to weak lensing. Therefore, a measurement of the B-mode signal \(\langle g_\times (\vartheta )\rangle \) provides a useful null test against systematic errors.

3.6 Reduced tangential shear

3.6.1 Azimuthally averaged reduced tangential shear

The reduced tangential shear \(g_+(\vartheta )\) averaged over a circle of radius \(\vartheta \) about an arbitrary reference point \(\varvec{\theta }_\mathrm {c}\) is expressed as:

$$\begin{aligned} g_+(\vartheta ) := \oint \frac{\text {d}\varphi }{2\pi }\,g_+(\vartheta ,\varphi ) =\oint \frac{\text {d}\varphi }{2\pi }\,\frac{\gamma _+(\vartheta ,\varphi )}{1-\kappa (\vartheta ,\varphi )}. \end{aligned}$$
(87)

If the projected mass distribution around a cluster has quasi-circular symmetry (e.g., elliptical symmetry), then the azimuthally averaged reduced tangential shear \(\langle g_+(\vartheta )\rangle \) around the cluster center can be interpreted as:

$$\begin{aligned} g_+(\vartheta ) \simeq \frac{\gamma _+(\vartheta )}{1-\kappa (\vartheta )}, \end{aligned}$$
(88)

where \(\gamma _+(\vartheta )\) and \(\kappa (\vartheta )\) are the tangential shear and the convergence, respectively, averaged over a circle of radius \(\vartheta \) about \(\varvec{\theta }_\mathrm {c}\).

According to N-body simulations in hierarchical \({\varLambda }\hbox {CDM}\) models of cosmic structure formation, dark-matter halos exhibit aspherical mass distributions that can be well approximated by triaxial mass models (e.g., Jing and Suto 2002; Limousin et al. 2013; Despali et al. 2014). Since triaxial halos have elliptical isodensity contours in projection on the sky (Stark 1977), Eq. (88) can give a good approximation to describe the weak-lensing signal for regular clusters with a modest degree of perturbation. However, the approximation is likely to break down for merging and interacting lenses having complex, multimodal mass distributions. To properly model the weak-lensing signal in such a complex merging system, one needs to directly model the two-dimensional reduced-shear field \((g_1(\varvec{\theta }),g_2(\varvec{\theta }))\) with a lens model consisting of multicomponent halos (e.g., Watanabe et al. 2011; Okabe et al. 2011; Medezinski et al. 2013). Alternatively, one may attempt to reconstruct the convergence field \(\kappa (\varvec{\theta })\) in a free-form manner from the observed reduced shear field, with additional constraints or assumptions to break the mass-sheet degeneracy (e.g., Jee et al. 2005; Bradač et al. 2006; Merten et al. 2009; Jauzac et al. 2012; Umetsu et al. 2015; Tam et al. 2020).

On the other hand, for a statistical ensemble of galaxy clusters, the average mass distribution around their centers tends to be spherically symmetric if the assumption of statistical isotropy holds (e.g., Okabe et al. 2013). Hence, the stacked weak-lensing signal for a statistical ensemble of clusters can be interpreted using Eq. (88). For more details, see Sects. 3.6.4 and 4.5.

3.6.2 Source-averaged reduced tangential shear

With the assumption of quasi-circular symmetry in the projected mass distribution around clusters (see Eq. (88)), let us consider the nonlinear effects on the source-averaged cluster lensing profiles. The reduced tangential shear for a given lens–source pair is written as:

$$\begin{aligned} g_+(\vartheta |z_l,z_s)=\varDelta \varSigma (\vartheta )\sum _{n=0}^\infty \left[ \varSigma _\mathrm {cr}^{-1}(z_l,z_s)\right] ^{n+1}\varSigma ^n(\vartheta ). \end{aligned}$$
(89)

To begin with, let us consider the expectation value for the reduced tangential shear averaged over an ensemble of source galaxies. For a given cluster, the source-averaged reduced tangential shear is expressed as:

$$\begin{aligned} \langle g_+(\vartheta |z_l)\rangle = \varDelta \varSigma \left[ \langle \varSigma _{\mathrm {cr},l}^{-1}\rangle + \langle \varSigma _{\mathrm {cr},l}^{-2}\rangle \varSigma + \langle \varSigma _{\mathrm {cr},l}^{-3}\rangle \varSigma ^2+\cdots \right] , \end{aligned}$$
(90)

where \(\langle \cdots \rangle \) denotes the averaging over all sources, defined such that:

$$\begin{aligned} \langle \varSigma _{\mathrm {cr},l}^{-n}\rangle = \left( \sum _{s}w_{s}\varSigma _{\mathrm {cr},ls}^{-n}\right) \left( \sum _s w_{s}\right) ^{-1}, \end{aligned}$$
(91)

where the index s runs over all source galaxies around the lens (l) and \(w_s\) is a statistical weight for each source galaxy. An optimal choice for the statistical weight is \(w_{s} = 1 / \sigma _{g_{+},s}^{2}\), with \(\sigma _{g_{+},s}\) the statistical uncertainty of \(g_+(\vartheta |z_l,z_s)\) estimated for each source galaxy. Note that this choice for the weight assumes that \(\sigma _{g_{+},s}\) is independent of the lensing shear signal (see Schneider and Seitz 1995; Seitz and Schneider 1995). In the continuous limit, Eq. (91) is written as:

$$\begin{aligned} \langle \varSigma _{\mathrm {cr},l}^{-n}\rangle = \left[ \int _0^\infty \text {d}z\,\frac{\text {d}N(z)}{\text {d}z}w(z)\varSigma _\mathrm {cr}^{-n}(z_l,z)\right] \left[ \int _0^\infty \text {d}z\,\frac{\text {d}N(z)}{\text {d}z}w(z)\right] ^{-1}, \end{aligned}$$
(92)

with \(\text {d}N(z)/\text {d}z\) the redshift distribution function of the source sample and w(z) a statistical weight function. For a given cluster lens, \(\varSigma _\mathrm {cr}^{-1}(z_l,z_s)\propto D_{ls}/D_s\), so that \(\langle \varSigma _{\mathrm {cr},l}^{-n}\rangle \propto \langle (D_{ls}/D_s)^{n}\rangle \).

In the weak-lensing limit, Eq. (90) gives \(\langle g_+\rangle \simeq \langle \gamma _+\rangle \). The next order of approximation is given by (Seitz and Schneider 1997):

$$\begin{aligned} \langle g_+\rangle&\simeq \frac{\langle \gamma _+\rangle }{1-f_l\langle \kappa \rangle },\nonumber \\ f_l&= \frac{\langle \varSigma _{\mathrm {cr},l}^{-2}\rangle }{\langle \varSigma _{\mathrm {cr},l}^{-1}\rangle ^2}. \end{aligned}$$
(93)

From Eq. (93), we see that an interpretation of the averaged weak-lensing signal \(\langle g_+(\vartheta |z_l)\rangle \) does not require knowledge of individual source redshifts. Instead, it requires ensemble information regarding the statistical redshift distribution dN(z)/dz of background source galaxies used for weak-lensing measurements.

For a lens at sufficiently low redshift (see Sect. 2.6.5), \(f_l\approx 1\), thus leading to the single source-plane approximation: \(\langle g_+\rangle \simeq \langle \gamma _+\rangle /(1-\langle \kappa \rangle )\). The level of bias introduced by this approximation is \(\varDelta \langle g_+\rangle /\langle g_+\rangle \simeq (f_l-1)\langle \kappa \rangle \). In typical ground-based deep observations of \(z_l\sim 0.4\) clusters, \(\varDelta f_l=f_l-1\) is found to be of the order of several percent (Umetsu et al. 2014), so that the relative error \(\varDelta \langle g_+\rangle /\langle g_+\rangle \) is negligibly small in the mildly nonlinear regime of cluster lensing.

3.6.3 Source-averaged excess surface mass density

Next, let us consider the following estimator for the excess surface mass density \(\varDelta \varSigma (\vartheta )\) for a given lens–source pair:

$$\begin{aligned} \varDelta \varSigma _+(\vartheta |z_l,z_s) := \varSigma _\mathrm {cr}(z_l,z_s)g_+(\vartheta |z_l,z_s). \end{aligned}$$
(94)

This assumes that an estimate of \(\varSigma _\mathrm {cr}^{-1}(z_l,z_s)\) for each individual source galaxy is available, for example, from photometric-redshift (photo-z) measurements. This estimator is widely used in recent cluster weak-lensing studies thanks to the availability of multiband imaging data and the advances in photo-z techniques (see Sect. 4.1).

In real observations, if the photo-z probability distribution function (PDF), \(P_s(z)\), is available for individual source galaxies (s), one can calculate:

$$\begin{aligned} \varSigma ^{-1}_{\mathrm {cr},ls}\equiv \left[ \int \text {d}z\,P_s(z)\varSigma _\mathrm {cr}^{-1}(z_l,z)\right] \left[ \int \text {d}z\,P_s(z)\right] ^{-1} \end{aligned}$$
(95)

averaged over the PDF for each source galaxy. Similarly to Eq. (90), \(\varDelta \varSigma _+(\vartheta |z_l,z_s)\) averaged over all sources is expressed as:

$$\begin{aligned} \langle \varDelta \varSigma _+\rangle = \varDelta \varSigma \left[ 1 + \langle \varSigma _{\mathrm {cr},l}^{-1}\rangle \varSigma + \langle \varSigma _{\mathrm {cr},l}^{-2}\rangle \varSigma ^2+\cdots \right] \end{aligned}$$
(96)

with

$$\begin{aligned} \langle \varSigma _{\mathrm {cr},l}^{-n}\rangle = \left( \sum _{s}w_{ls}\varSigma _{\mathrm {cr},ls}^{-n}\right) \left( \sum _s w_{ls}\right) ^{-1}, \end{aligned}$$
(97)

where the index s runs over all source galaxies around the lens (l) and \(w_{ls}\) is a statistical weight for each source galaxy. An optimal choice for the statistical weight is:

$$\begin{aligned} w_{ls} = \varSigma _{\mathrm {cr},ls}^{-2} / \sigma _{g_{+},s}^{2}, \end{aligned}$$
(98)

where \(\sigma _{g_{+},s}\) is the statistical uncertainty of \(g_+(\vartheta |z_l,z_s)\) estimated for each source galaxy (Sect. 3.6.2).

In the weak-lensing limit, we thus have \(\langle \varDelta \varSigma _+\rangle \simeq \varDelta \varSigma \). The next order of approximation is:

$$\begin{aligned} \langle \varDelta \varSigma _+\rangle \simeq \frac{\varDelta \varSigma }{1-\langle \varSigma _{\mathrm {cr},l}^{-1}\rangle \varSigma }. \end{aligned}$$
(99)

3.6.4 Lens–source-averaged excess surface mass density

Finally, we consider an ensemble of galaxy clusters. Now, let \(\varDelta \varSigma \) be the ensemble mass distribution of these clusters. Then, \(\varDelta \varSigma _+\) averaged over all lens–source (ls) pairs is expressed as (Johnston et al. 2007):

$$\begin{aligned} \langle \langle \varDelta \varSigma _+\rangle \rangle = \varDelta \varSigma \left[ 1 + \langle \langle \varSigma _{\mathrm {cr}}^{-1}\rangle \rangle \varSigma + \langle \langle \varSigma _{\mathrm {cr}}^{-2}\rangle \rangle \varSigma ^2+...\right] \end{aligned}$$
(100)

with

$$\begin{aligned} \langle \langle \varSigma _{\mathrm {cr}}^{-n}\rangle \rangle = \left( \sum _{l,s}w_{ls}\varSigma _{\mathrm {cr},ls}^{-n}\right) \left( \sum _{l,s} w_{ls}\right) ^{-1}, \end{aligned}$$
(101)

where \(\langle \langle \cdots \rangle \rangle \) denotes the averaging over all lens–source pairs, the double summation is taken over all clusters (l) and all source galaxies (s), and \(w_{ls}\) is a statistical weight for each lens–source pair (ls). An optimal choice for the statistical weight is given by Eq. (98).

Again, the weak-lensing limit yields \(\langle \langle \varDelta \varSigma _+\rangle \rangle \simeq \varDelta \varSigma \) and the next order of approximation is given by (Umetsu et al. 2014, 2020):

$$\begin{aligned} \langle \langle \varDelta \varSigma _+\rangle \rangle \simeq \frac{\varDelta \varSigma }{1-\langle \langle \varSigma _\mathrm {cr}^{-1}\rangle \rangle \varSigma }. \end{aligned}$$
(102)

Equation (102) can be used to interpret the stacked weak-lensing signal including the nonlinear regime of cluster lensing. In Sect. 4.5, we provide more details on the stacked weak-lensing methods.

3.7 Aperture mass densitometry

In this subsection, we introduce a nonparametric technique to infer a projected total mass estimate from weak shear lensing observations. Integrating Eq. (86) between two concentric radii \(\vartheta _\mathrm {in}\) and \(\vartheta _\mathrm {out}(>\vartheta _\mathrm {in})\), we obtain an expression for the \(\zeta \) statistic as (Fahlman et al. 1994; Kaiser 1995; Squires and Kaiser 1996):

$$\begin{aligned} \zeta (\vartheta _\mathrm {in},\vartheta _\mathrm {out})&:=\overline{\kappa }(\vartheta _\mathrm {in})- \overline{\kappa }(\vartheta _\mathrm {in},\vartheta _\mathrm {out})\nonumber \\&=\frac{2}{1-(\vartheta _\mathrm {in}/\vartheta _\mathrm {out})^2} \int _{\vartheta _\mathrm {in}}^{\vartheta _\mathrm {out}} \text {d}\ln \vartheta ' \,\gamma _+(\vartheta '), \end{aligned}$$
(103)

where \(\overline{\kappa }(\vartheta _\mathrm {in},\vartheta _\mathrm {out})\) is the convergence averaged within a concentric annulus between \(\vartheta _\mathrm {in}\) and \(\vartheta _\mathrm {out}\):

$$\begin{aligned} \overline{\kappa }(\vartheta _\mathrm {in},\vartheta _\mathrm {out}) :=\frac{1}{\pi (\vartheta _\mathrm {out}^2-\vartheta _\mathrm {in}^2)} \int _{\vartheta _\mathrm {in}}^{\vartheta _\mathrm {out}}\text {d}\vartheta '\,\vartheta ' \kappa (\vartheta '). \end{aligned}$$
(104)

In the weak-lensing regime where \(\gamma _+(\vartheta )\simeq g_+(\vartheta )\), \(\zeta \) can be determined uniquely from the shape distortion field in a finite annular region at \(\vartheta _\mathrm {in}\leqslant \theta \leqslant \vartheta _\mathrm {out}\), because additive constants \(\kappa _0\) from the invariance transformation (Eq. (49)) cancel out in Eq. (103). Note that this technique is also referred to as aperture mass densitometry.

Since galaxy clusters are highly biased tracers of the cosmic mass distribution, \(\overline{\kappa }(\vartheta _\mathrm {in},\vartheta _\mathrm {out})\) around a cluster is expected to be positive, so that \(\zeta (\vartheta _\mathrm {in},\vartheta _\mathrm {out})\) yields a lower limit to \(\overline{\kappa }(\vartheta _\mathrm {in})\). That is, the quantity \(M_\zeta \equiv \pi (D_l\vartheta _\mathrm {in})^2 \varSigma _\mathrm {cr}\zeta (\vartheta _\mathrm {in},\vartheta _\mathrm {out})\) yields a lower limit to the projected mass inside a circular aperture of radius \(\vartheta _\mathrm {in}\), \(M_\mathrm {2D}=\pi (D_l\vartheta _\mathrm {in})^2\overline{\varSigma }(\vartheta _\mathrm {in})\). This technique provides a powerful means to estimate the total cluster mass from shear data in the weak-lensing regime \(|\gamma |\ll 1\).

We now introduce a variant of aperture mass densitometry defined as (Clowe et al. 2000):

$$\begin{aligned} \zeta _\mathrm {c}(\vartheta |\vartheta _\mathrm {in},\vartheta _\mathrm {out})&:= \overline{\kappa }(\vartheta )-\overline{\kappa }(\vartheta _\mathrm {in},\vartheta _\mathrm {out})\nonumber \\&= 2\int _{\vartheta }^{\vartheta _\mathrm {in}}\text {d}\ln \vartheta '\,\gamma _+(\vartheta ') + \frac{2}{1-(\vartheta _\mathrm {in}/\vartheta _\mathrm {out})^2} \int _{\vartheta _\mathrm {in}}^{\vartheta _\mathrm {out}} \text {d}\ln \vartheta '\,\gamma _+(\vartheta '), \end{aligned}$$
(105)

where the aperture radii \((\vartheta ,\vartheta _\mathrm {in},\vartheta _\mathrm {out})\) satisfy \(\vartheta< \vartheta _\mathrm {in} < \vartheta _\mathrm {out}\), and the first and second terms in the second line of Eq. (105) are equal to \(\overline{\kappa }(\vartheta )-\overline{\kappa }(\vartheta _\mathrm {in})\) and \(\overline{\kappa }(\vartheta _\mathrm {in})-\overline{\kappa }(\vartheta _\mathrm {in},\vartheta _\mathrm {out})\), respectively. In the weak-lensing limit, the quantity

$$\begin{aligned} M_{\zeta _\mathrm {c}}(<\vartheta )\equiv \pi (D_l\vartheta )^2\varSigma _\mathrm {cr}\zeta _\mathrm {c}(\vartheta |\vartheta _\mathrm {in},\vartheta _\mathrm {out}) \end{aligned}$$
(106)

yields a lower limit to the projected mass inside a circular aperture of radius \(\vartheta \), that is:

$$\begin{aligned} M_\mathrm {2D}(<\vartheta )=\pi (D_l\vartheta )^2\overline{\varSigma }(\vartheta ). \end{aligned}$$
(107)

We can regard \(\zeta _\mathrm {c}(\vartheta |\vartheta _\mathrm {in},\vartheta _\mathrm {out})\) as a function of \(\vartheta \) for fixed values of \((\vartheta _\mathrm {in},\vartheta _\mathrm {out})\) and measure \(\zeta _\mathrm {c}(\vartheta |\vartheta _\mathrm {in},\vartheta _\mathrm {out})\) at several independent aperture radii \(\vartheta \). As in the case of the standard \(\zeta \) statistic (Eq. (103)), one may choose the inner and outer annular radii (\(\vartheta _\mathrm {in}, \vartheta _\mathrm {out}\)) to lie in the weak-lensing regime where \(g_+\simeq \gamma _+\). In general, however, \(\vartheta \) may lie in the nonlinear regime where \(\gamma _+\) is not directly observable. In the subcritical regime, we can express \(\gamma _+(\vartheta )\) in terms of the observed reduced tangential shear \(g_+(\vartheta )\) as:

$$\begin{aligned} \gamma _+(\vartheta ) = g_+(\vartheta ) [1-\kappa (\vartheta )], \end{aligned}$$
(108)

when assuming a quasi-circular symmetry in the projected mass distribution (Sect. 3.6). If these conditions are satisfied, for a given boundary condition \(\overline{\kappa }_0\equiv \overline{\kappa }(\vartheta _\mathrm {in},\vartheta _\mathrm {out})\), Eq. (105) can be solved iteratively as (Umetsu et al. 2010):

$$\begin{aligned} \zeta _\mathrm {c}^{(n+1)}(\vartheta |\vartheta _\mathrm {in},\vartheta _\mathrm {out})&= 2\int _{\vartheta }^{\vartheta _\mathrm {in}}\text {d}\ln \vartheta '\, g_+(\vartheta ') \left[ 1-\kappa ^{(n)}(\vartheta ')\right] \nonumber \\&+ \frac{2}{1-(\vartheta _\mathrm {in}/\vartheta _\mathrm {out})^2} \int _{\vartheta _\mathrm {in}}^{\vartheta _\mathrm {out}} \text {d}\ln \vartheta '\, g_+(\vartheta '),\nonumber \\ \kappa ^{(n)}(\vartheta )&= \hat{\mathcal{L}}\zeta _\mathrm {c}^{(n)}(\vartheta |\vartheta _\mathrm {in},\vartheta _\mathrm {out}) + \overline{\kappa }_0, \end{aligned}$$
(109)

where we have introduced a differential operator defined as \(\hat{\mathcal{L}}(\vartheta ) = \frac{1}{2\vartheta ^2}\frac{{\text{d}}}{\text {d}\ln \vartheta }\vartheta ^2\) that satisfies \(\hat{\mathcal{L}}\overline{\kappa }(\vartheta )=\kappa (\vartheta )\) and \(\hat{\mathcal{L}}1 = 1\), and the quantities indexed by (n) refer to those in the nth iteration (\(n=0,1,2,\ldots \)).

We solve a discretized version of Eq. (109). See Appendix A of Umetsu et al. (2016) for discretized expressions for \(g_+(\vartheta )\) and \(\overline{\kappa }(\vartheta )\). One can start the iteration process with an initial guess of \(\kappa ^{(0)}(\vartheta )=0\) for all \(\vartheta \) bins and repeat it until convergence is reached in all \(\vartheta \) bins. This procedure will yield a solution for the binned mass profile:

$$\begin{aligned} \overline{\kappa }(\vartheta ) =\zeta _\mathrm {c}(\vartheta |\vartheta _\mathrm {in},\vartheta _\mathrm {out})+\overline{\kappa }_0, \end{aligned}$$
(110)

for a given value of \(\overline{\kappa }_0\). Note that the errors for the mass profile solution in different radial bins are correlated. The bin-to-bin error covariance matrix \(C_{bb'}\equiv \mathrm {Cov}[\overline{\kappa }(\vartheta _b),\overline{\kappa }(\vartheta _{b'})]\) (\(b,b'=1,2,\ldots \)) can be calculated with the linear approximation \(\kappa (\vartheta )\ll 1\) in Eq. (109), by propagating the errors for the binned \(g_+(\vartheta )\) profile (e.g., Okabe and Umetsu 2008; Umetsu et al. 2010; Okabe et al. 2010).

Alternatively, one can attempt to determine the boundary term \(\overline{\kappa }_0\) from shear data by incorporating additional iteration loops. Starting with an initial guess of \(\overline{\kappa }_0=0\), one can update the value of \(\overline{\kappa }_0\) in each iteration using a specific mass model (e.g., a power-law profile) that best fits the binned \(\overline{\kappa }(\vartheta )\) profile. This iteration procedure is repeated until convergence is obtained (see Umetsu et al. 2010).

4 Standard shear analysis methods

In this section, we outline procedures to obtain cluster mass estimates from azimuthally averaged reduced tangential shear measurements for a given galaxy cluster.

4.1 Background source selection

A critical source of systematics in weak lensing comes from accurately estimating the redshift distribution of background source galaxies, which is needed to convert the lensing signal into physical mass units (Medezinski et al. 2018b). Contamination of background samples by unlensed foreground and cluster galaxies with \(\varSigma _\mathrm {cr}^{-1}(z_l,z_s)=0\), when not accounted for, leads to a systematic underestimation of the true lensing signal. Inclusion of foreground galaxies produces a dilution of the lensing signal that does not depend on the cluster-centric radius. In contrast, the inclusion of cluster galaxies significantly dilutes the lensing signal at smaller cluster radii and causes an underestimation of the concentration of the cluster mass profile (Broadhurst et al. 2005b), as well as of the halo mass \(M_\varDelta \), especially at higher overdensities \(\varDelta \). The level of contamination by cluster galaxies increases with the cluster mass or richness (see Fig. 12). A secure selection of background galaxies is thus key for obtaining accurate cluster mass estimates from weak gravitational lensing (Medezinski et al. 2007, 2010, 2018b; Umetsu and Broadhurst 2008; Okabe et al. 2013; Gruen et al. 2014).

Fig. 12
figure 12

Stacked weak-lensing profiles around CAMIRA clusters from the Subaru HSC survey, shown as a function of cluster-centric comoving radius R. This figure compares different source-selection methods for two different richness bins (left for \(20\leqslant N \leqslant 50\) and right or \(50\leqslant N \leqslant 200\)). Upper panels show the excess surface mass density profile \(\langle \langle \varDelta \varSigma _+\rangle \rangle \); middle panels show the \(45^\circ \)-rotated component, expected to be consistent with zero; and lower panels show the effective number density of source galaxies. All quantities in this figure were calculated using photo-z PDFs of individual source galaxies, P(z) (see also Sect. 3.6.3). Different lines in each panel show different source-selection schemes: using all galaxies incorporating their P(z) (black), using P-cut selected galaxies (for which 98% of P(z) lies behind \(z_l+0.2\); cyan), or CC-cut-selected galaxies (blue). Weak-lensing S/N values for each selection are given in the legend of each panel. Image reproduced with permission from Medezinski et al. (2018b), copyright by the author

In real observations, acquiring spectroscopic redshifts for individual source galaxies is not feasible, particularly to the depths of weak-lensing observations. Instead of spectroscopic redshifts, photo-zs can be used when multiband imaging is available. Cluster weak-lensing studies, however, often rely on two to three optical bands for deep imaging (e.g., Broadhurst et al. 2005b; Medezinski et al. 2010; Oguri et al. 2012; Okabe and Smith 2016), so that reliable photo-zs could not be obtained. Instead, well-calibrated field photo-z catalogs, such as COSMOS (Ilbert et al. 2009; Laigle et al. 2016), were used to determine the redshift distribution \(\text {d}N(z)/\text {d}z\) of background galaxies for a given color–magnitude selection (Medezinski et al. 2010; Okabe et al. 2010). Such field surveys are often limited to deep but small areas and thus subject to cosmic variance.

Dedicated wide-area optical surveys, such as the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP; Miyazaki et al. 2018a; Aihara et al. 2018a, b), the Dark Energy Survey (DES; Abbott et al. 2018), and the upcoming Large Synoptic Survey Telescope (LSST; Ivezic et al. 2008), are designed to observe in several broad bands, so that photo-zs are better determined. These photo-z estimates will still suffer from a large fraction of outliers due to inherent color–redshift degeneracies, as limited by a finite number of broad optical bands. The photo-z uncertainties are folded in by incorporating the full PDF for each source galaxy (Applegate et al. 2014). However, photo-z PDFs are often sensitive to the assumed priors. Moreover, the accuracy of photo-z PDFs will be limited by the representability of spectroscopic-redshift samples used for calibration. Alternative approaches rely on more stringent color cuts to reject objects with biased photo-zs (Medezinski et al. 2010, 2011; Umetsu et al. 2010, 2012, 2014; Okabe et al. 2013), which, however, lead to lower statistical power, because they result in lower source galaxy densities.

Using the first-year CAMIRA (Cluster finding Algorithm based on Multiband Identification of Red-sequence gAlaxies; Oguri et al. 2018) catalog of \(\sim 900\) clusters (\(0.1<z_l<1.1\)) with richness \(N \geqslant 20\) found in \(\sim 140\,\hbox {deg}^2\) of HSC-SSP survey data, Medezinski et al. (2018b) investigated robust source-selection methods for cluster weak lensing. They compared three different source-selection schemes: (1) relying on photo-z’s and their full PDFs P(z) to correct for dilution (all), (2) selecting background galaxies in color–color space (CC-cut), and (3) selection of robust photo-z’s by applying constraints on their cumulative PDF (P-cut). All three methods use photo-z PDFs of individual source galaxies, P(z), to convert the lensing signals into physical mass units. With perfect P(z) information, all these methods should thus yield consistent, undiluted \(\langle \langle \varDelta \varSigma _+(R)\rangle \rangle \) profiles. After applying basic quality cuts, Medezinski et al. (2018b) found the typical mean unweighted galaxy number density in the HSC shape catalog to be \(n_\mathrm {g}=21.7\,\hbox {arcmin}^{-2}\). Similarly, they found \(n_\mathrm {g}=11.6\,\hbox {arcmin}^{-2}\) and \(n_\mathrm {g}=13.8\,\hbox {arcmin}^{-2}\) for cluster lenses at \(z_l<0.4\) using the CC-cut and P-cut methods, respectively.

Medezinski et al. (2018b) showed that simply relying on the photo-z PDFs (all) results in a \(\langle \langle \varDelta \varSigma _+\rangle \rangle \) profile that suffers from dilution due to residual contamination by cluster galaxies. Using proper limits, the CC- and P-cut methods give consistent \(\langle \langle \varDelta \varSigma _+\rangle \rangle \) profiles with minimal dilution. Differences are only seen for rich clusters with \(N \geqslant 50\), where the P-cut method produces a slightly diluted signal in the innermost radial bin compared to the CC-cuts (see Fig. 12). Employing either the P-cut or CC-cut selection results in cluster contamination consistent with zero to within the 0.5% uncertainties. For more details on the source-selection methods, we refer the reader to Medezinski et al. (2018b) and references therein. An alternative approach to correct for dilution of the lensing signal is to statistically estimate the level of contamination and subtract it off (e.g., Varga et al. 2019), in which the effect of magnification bias must be properly taken into account (see Sect. 5).

4.2 Tangential shear signal

Here, we describe a procedure to derive azimuthally averaged radial profiles of the tangential (\(+\)) and cross (\(\times \)) shear components around a given cluster lens at a certain redshift, \(z_l\). Specifically, we calculate for each cluster the lensing profiles, \(\{\langle g_+(\vartheta _b)\rangle \}_{b=1}^{N_\mathrm {bin}}\) and \(\{\langle g_\times (\vartheta _b)\rangle \}_{b=1}^{N_\mathrm {bin}}\), in \(N_\mathrm {bin}\) discrete cluster-centric bins spanning the range \(\vartheta \in [\vartheta _\mathrm {min},\vartheta _\mathrm {max}]\).

Since weak shear measurements of individual background galaxies (Eq. (80)) are very noisy, we calculate the weighted average of the source ellipticity components as:

$$\begin{aligned} \langle g_+(\vartheta _b\rangle&= \frac{\sum _{s\in b}w_{s}\,g_{+,s}}{\sum _{s\in b} w_{s}},\nonumber \\ \langle g_\times (\vartheta _b)\rangle&=\frac{\sum _{s\in b} w_{s}\,g_{\times ,s}}{\sum _{s\in b}w_{s}}, \end{aligned}$$
(111)

where the summation is taken over all source galaxies (s) that lie in the bin (b); \(g_{+,s}\) and \(g_{\times ,s}\) represent the tangential and \(45^\circ \)-rotated cross components of the reduced shear (Eq. (83)), respectively, estimated for each source galaxy; and \(w_s\) is its statistical weight. The azimuthally averaged cross component, \(\langle g_\times (\vartheta )\rangle \), is expected to be statistically consistent with zero (see Sect. 3.6.1).

The statistical uncertainty per shear component per source galaxy is denoted by \(\sigma _{g_{+},s} = \sigma _{g_{\times },s} \equiv \sigma _{g,s}\), which is dominated by the shape noise. Here, \(\sigma _{g,s}\) includes both contributions from the shape measurement uncertainty and the intrinsic dispersion of source ellipticities (e.g., Mandelbaum 2018). In general, an optimal choice for weighting is to apply an inverse-variance weighting with \(w_s = 1/\sigma _{g,s}^2\) (Sect. 3.6.2). However, using inverse-variance weights from noisy variance estimates may result in an unbalanced weighting scheme (e.g., sensitive to extreme values). To avoid this, one can employ a variant of inverse-variance weighting, \(w_s = 1/(\sigma _{g,s}^2+\alpha _g^2)\), with \(\alpha _g\) a properly chosen softening constant (see, e.g., Hamana et al. 2003; Umetsu et al. 2009, 2014; Okabe et al. 2010; Oguri et al. 2010; Okabe and Smith 2016). The error variance per shear component for \(\langle g_{+,\times }(\vartheta _b)\rangle \) is given by:

$$\begin{aligned} \sigma ^2_\mathrm {shape}(\vartheta _b) = \frac{\sum _{s\in b} w_s^2\sigma _{g,s}^2}{\left( \sum _{s\in b} w_s\right) ^2}, \end{aligned}$$
(112)

where we have assumed isotropic, uncorrelated shape noise, \(\mathcal{E}(\varDelta g_{i,s}\varDelta g_{j,s'})=\sigma _{g,s}^2\delta _{s s'}\delta _{ij}\) (\(i,j=+,\times \)) with s and \(s'\) running over all source galaxies.

To quantify the significance of the tangential shear profile measurement \(\{g_+(\vartheta _b)\}_{b=1}^{N_\mathrm {bin}}\), we define a linear S/N estimator by (Sereno et al. 2017; Umetsu et al. 2020):

$$\begin{aligned} \left( \mathrm {S/N}\right) _\mathrm {L} = \frac{\sum _{b=1}^{N_\mathrm {bin}} g_+(\vartheta _b)/\sigma ^2_\mathrm {shape}(\vartheta _b)}{\left[ \sum _{b=1}^{N_\mathrm {bin}}1/\sigma ^2_\mathrm {shape}(\vartheta _b)\right] ^{1/2}}. \end{aligned}$$
(113)

This estimator gives a weak-lensing S/N integrated over the radial range of the data. Equation (113) assumes that the total uncertainty is dominated by the shape noise and ignores the covariance between different radial bins. Note that we shall use the full covariance matrix for cluster mass measurements (Sect. 4.4). This S/N estimator is different from the conventional quadratic estimator defined by (e.g., Umetsu and Broadhurst 2008; Okabe and Smith 2016):

$$\begin{aligned} \mathrm {S/N} = \left[ \sum _{b=1}^{N_\mathrm {bin}}g_{+}^2(\vartheta _b)/\sigma _{\mathrm {shape}}^2(\vartheta _b) \right] ^{1/2}. \end{aligned}$$
(114)

As discussed by Umetsu et al. (2016, 2020), this quadratic definition breaks down and leads to an overestimation of significance in the noise-dominated regime where the actual per-bin S/N is less than unity.

The observed \(\langle g_+(\vartheta )\rangle \) profile can be interpreted according to Eq. (93). Then, it is important to define the corresponding bin radii \(\vartheta _b\) so as to minimize systematic bias in cluster mass measurements. We define the effective clutter-centric bin radius \(\vartheta _b\) (\(b=1,2,\ldots ,N_\mathrm {bin}\)) using the weighted harmonic mean of lens–source transverse separations as (Okabe and Smith 2016; Sereno et al. 2017):

$$\begin{aligned} \vartheta _{b} = \frac{\sum _{s\in b}w_s}{\sum _{s\in b} w_s \vartheta _s^{-1}}. \end{aligned}$$
(115)

If source galaxies are uniformly distributed in the image plane and \(w_s\) is taken to be constant, Eq. (115) in the continuous limit yields \(\vartheta _b = [\int _{\vartheta _{b1}}^{\vartheta _{b2}}\text {d}\vartheta \vartheta w(\vartheta ))][\int _{\vartheta _{b1}}^{\vartheta _{b2}}\text {d}\vartheta w(\vartheta )]^{-1} = (\vartheta _{b1}+\vartheta _{b2})/2\) for a single radial bin defined in the range \(\vartheta \in [\vartheta _{b1},\vartheta _{b2}]\).Footnote 8

4.3 Lens mass modeling

4.3.1 NFW model

The radial mass distribution of galaxy clusters is often modeled with a spherical Navarro–Frenk–White (Navarro et al. 1996, hereafter NFW) profile, which has been motivated by cosmological N-body simulations (Navarro et al. 1996, 2004). The radial dependence of the two-parameter NFW density profile is given by:

$$\begin{aligned} \rho (r)=\frac{\rho _\mathrm {s}}{(r/r_\mathrm {s})(1+r/r_\mathrm {s})^2}, \end{aligned}$$
(116)

where \(\rho _\mathrm {s}\) is the characteristic density parameter and \(r_\mathrm {s}\) is the characteristic scale radius at which the logarithmic density slope, \(\gamma _\mathrm {3D}(r)\equiv \text {d}\ln {\rho (r)}/\text {d}\ln {r}\), equals \(-2\). The logarithmic gradient of the NFW profile is \(\gamma _\mathrm {3D}(r)=-[1+3(r/r_\mathrm {s})]/[1+(r/r_\mathrm {s})]\). For \(r/r_\mathrm {s}\ll 1\), \(\gamma _\mathrm {3D}\rightarrow -1\), whereas, for \(r/r_\mathrm {s}\gg 1\), \(\gamma _\mathrm {3D}\rightarrow -3\). The radial range where the logarithmic density slope is close to the “isothermal” value of \(-2\) is particularly important, given that such a mass distribution is needed to explain the flat rotation curves observed in galaxies.

The overdensity mass \(M_\varDelta \) is given by integrating Eq. (116) out to the corresponding overdensity radius \(r_\varDelta \) at which the mean interior density is \(\varDelta \times \rho _\mathrm {c}(z_l)\) (Sect. 1). For a physical interpretation of the cluster lensing signal, it is useful to specify the NFW model by the halo mass, \(M_\mathrm {200c}\), and the concentration parameter, \(c_\mathrm {200c}=r_\mathrm {200c}/r_\mathrm {s}\). The characteristic density \(\rho _\mathrm {s}\) is then given by:

$$\begin{aligned} \rho _\mathrm {s}= \frac{\varDelta }{3} \frac{c_{\varDelta }^3}{\ln (1+c_{\varDelta })-c_{\varDelta }/(1+c_{\varDelta })}\rho _\mathrm {c}(z_l). \end{aligned}$$
(117)

Analytic expressions for the radial dependence of the projected NFW profiles, \(\varSigma _\mathrm {NFW}(R)=2\rho _\mathrm {s}r_\mathrm {s}\times f_\mathrm {NFW}(R/r_\mathrm {s})\) and \(\overline{\varSigma }_\mathrm {NFW}(R)=2\rho _\mathrm {s}r_\mathrm {s}\times g_\mathrm {NFW}(R/r_\mathrm {s})\) with \(R=D_l\vartheta \), are given by Wright and Brainerd (2000, see also Bartelmann 1996):

$$\begin{aligned} f_\mathrm {NFW}(x)= {\left\{ \begin{array}{ll} \frac{1}{1-x^2}\left( -1+\frac{2}{\sqrt{1-x^2}}\mathrm {arctanh}{\sqrt{\frac{1-x}{1+x}}}\right) &{} (x <1),\\ \frac{1}{3} &{} (x =1),\\ \frac{1}{x^2-1}\left[ 1-\frac{2}{\sqrt{x^2-1}}\arctan {\sqrt{\frac{x-1}{x+1}}}\right) &{} (x>1), \end{array}\right. } \end{aligned}$$
(118)

and

$$\begin{aligned} g_\mathrm {NFW}(x)= {\left\{ \begin{array}{ll} \frac{2}{x^2}\left[ \frac{2}{\sqrt{1-x^2}}\mathrm {arctanh}\sqrt{\frac{1-x}{1+x}}+\ln \left( \frac{x}{2}\right) \right] &{} (x <1),\\ 2\left[ 1+\ln \left( \frac{1}{2}\right) \right] &{} (x =1),\\ \frac{2}{x^2}\left[ \frac{2}{\sqrt{x^2-1}}\arctan \sqrt{\frac{x-1}{x+1}}+\ln \left( \frac{x}{2}\right) \right] &{} (x>1). \end{array}\right. } \end{aligned}$$
(119)

The excess surface mass density for an NFW halo is then obtained as \(\varDelta \varSigma _\mathrm {NFW}(R)=\overline{\varSigma }_\mathrm {NFW}(R)-\varSigma _\mathrm {NFW}(R)\). These projected NFW functionals provide a good approximation for the projected matter distribution around cluster-size halos (Oguri and Hamana 2011).

Fig. 13
figure 13

Azimuthally averaged radial profiles of the reduced tangential shear \(g_+\) (top panels) and the cross-shear component \(g_\times \) (bottom panels) for Abell 2142 (left) and Abell 1689 (right) based on Subaru Suprime-Cam data. The solid and dashed lines show the best-fit NFW and SIS profiles for each cluster, respectively. The \(45^\circ \)-rotated cross-shear component is expected to be consistent with zero. Image reproduced with permission from Umetsu et al. (2009), copyright by AAS

As an example, we show in Fig. 13 the reduced tangential and \(45^\circ \)-rotated shear profiles \(\langle g_+(\vartheta )\rangle \) and \(\langle g_\times (\vartheta )\rangle \), respectively, for two high-mass clusters, Abell 2142 and Abell 1689, obtained from Subaru Suprime-Cam data (Umetsu et al. 2009). The \(\langle g_+(\vartheta )\rangle \) profiles are compared with their best-fit NFW and singular isothermal sphere (SIS) models. The SIS density profile is given by \(\rho (r)=\sigma _v^2/(2\pi G r^2)\), with \(\sigma _v\) the one-dimensional velocity dispersion. For both clusters, the observed \(\langle g_+(\vartheta )\rangle \) profiles are better fitted by the NFW model having an outward-steepening density profile. Abell 2142 is a nearby cluster at \(z_l=0.091\) perturbed by merging substructures (e.g., Okabe and Umetsu 2008; Umetsu et al. 2009; Liu et al. 2018). The radial curvature observed in the \(\langle g_+\rangle \) profile of Abell 2142 is highly pronounced, so that the power-law SIS model is strongly disfavored by the Subaru weak-lensing data. From the best-fit NFW model, the mass and concentration parameters of Abell 2142 are constrained as \(M_\mathrm {200c}=(9.1\pm 1.9)\times 10^{14}h^{-1}M_\odot \) and \(c_\mathrm {200c}=4.1\pm 0.8\) (Umetsu et al. 2009; Liu et al. 2018).

In contrast, Abell 1689 (\(z_l=0.183\)) is among the best-studied clusters and the most powerful known lenses to date (e.g., Broadhurst et al. 2005a; Limousin et al. 2007; Umetsu and Broadhurst 2008; Lemze et al. 2009; Kawaharada et al. 2010; Coe et al. 2010; Diego et al. 2015; Umetsu et al. 2011b, 2015), characterized by a large Einstein radius (Sect. 2.6.6) of \(\vartheta _\mathrm {Ein}=(47.0\pm 1.2)\,\hbox {arcsec}\) for a fiducial source at \(z_s=2\) (Coe et al. 2010). This indicates a high degree of mass concentration in projection of the sky. In fact, the observed \(\langle g_+(\vartheta )\rangle \) profile of Abell 1689 is well fitted by an NFW profile with a high concentration of \(c_\mathrm {200c}\sim 10\) (Broadhurst et al. 2005b; Umetsu and Broadhurst 2008; Umetsu et al. 2009, 2015; Coe et al. 2010), compared to the theoretical expectations, \(c_\mathrm {200c}\sim 4\) (e.g., Bhattacharya et al. 2013; Diemer and Kravtsov 2015). From full triaxial modeling of two-dimensional weak-lensing, X-ray, and SZE observations, Umetsu et al. (2015) obtained \(M_\mathrm {200c}=(12.1\pm 1.9)\times 10^{14}h^{-1}M_\odot \) and \(c_\mathrm {200c}=7.91\pm 1.41\), which overlaps with the \(1\sigma \) tail of the predicted distribution of halo concentration. Moreover, the multiprobe data set is in favor of a triaxial geometry with a minor-to-major axis ratio of \(c/a=0.39\pm 0.15\) and a major axis closely aligned with the line of sight by \((22\pm 10)^\circ \). Therefore, the superb lensing efficiency of Abell 1689 is likely caused by its intrinsically high mass concentration combined with a chance alignment of its major axis with the line-of-sight direction (see also Oguri et al. 2005).

4.3.2 Halo model

At large cluster-centric distances, the correlated matter around the cluster, namely the 2-halo term (Cooray and Sheth 2002), contributes to the lensing signal. In the standard halo-model prescription (Oguri and Takada 2011; Oguri and Hamana 2011), the total lensing signal \(\varDelta \varSigma (R)\) is given as the sum of the 1-halo and 2-halo terms. The 1-halo term \(\varDelta \varSigma _\mathrm {1h}\) accounts for the mass distribution within the main cluster halo, which can be described by a smoothly truncated NFW profile (Baltz et al. 2009, hereafter BMO):

$$\begin{aligned} \rho (r)=\frac{\rho _\mathrm {s}}{(r/r_\mathrm {s})(1+r/r_\mathrm {s})^2} \left[ 1+\left( \frac{r}{r_\mathrm {t}}\right) ^2\right] ^{-2}, \end{aligned}$$
(120)

where the truncation parameter \(r_\mathrm {t}\) is set to a fixed multiple of the halo outer radius (e.g., \(r_\mathrm {t}\approx 2.6r_\mathrm {vir}\) or \(r_\mathrm {t}\approx 3r_\mathrm {200c}\); see Oguri and Hamana 2011; Covone et al. 2014; Umetsu et al. 2014). Analytic expressions for the radial dependence of the projected BMO profiles are given by Baltz et al. (2009) and Oguri and Hamana (2011).

The 2-halo term contribution \(\varDelta \varSigma _\mathrm {2h}\) to the tangential shear signal is expressed as (Oguri and Takada 2011; Oguri and Hamana 2011, see de Putter and Takada 2010 for the full-sky expression):Footnote 9

$$\begin{aligned} \varDelta \varSigma _\mathrm {2h}(R|M_\mathrm {200c},z_l) = \frac{\overline{\rho }(z) b_\mathrm {h}(M_\mathrm {200c};z_l)}{(1+z_l)^3 D_l^2(z_l)} \int \frac{\ell \text {d}\ell }{2\pi }\,J_2(\ell \vartheta )P(k_\ell ; z_l), \end{aligned}$$
(121)

where \(b_\mathrm {h}(M_\mathrm {200c}; z_l)\) is the linear halo bias (e.g., Tinker et al. 2010), \(k_\ell \equiv \ell /[(1+z_l)D_l(z_l)]\), \(P(k;z_l)\) is the linear matter power spectrum as a function of the comoving wavenumber k evaluated at the cluster redshift \(z_l\), and \(J_n(x)\) is the Bessel function of the first kind and the nth order. We can compute the corresponding radial profile \(\varSigma _\mathrm {2h}(R|M_\mathrm {200c},z_l)\) of the lensing convergence by replacing \(J_2(x)\) in Eq. (121) with the zeroth-order Bessel function \(J_0(x)\). The 2-halo term is proportional to the product \(b_\mathrm {h}\sigma _8^2\), with \(\sigma _8\) the rms amplitude of linear mass fluctuations in a sphere of comoving radius \(8h^{-1}\mathrm {Mpc}\). In the standard \({\varLambda }\hbox {CDM}\) model, the 2-halo term contribution to \(\varDelta \varSigma \) (or \(\varSigma \)) becomes important, on average, at \(R\gtrsim \) several (or two) virial radii (Oguri and Hamana 2011; Becker and Kravtsov 2011). In particular cases where clusters are residing in extremely dense environments, the 2-halo contribution to the lensing signal could become more significant (Sereno et al. 2018a).

4.3.3 DK14 model

Diemer and Kravtsov (2014, hereafter DK14) provide a useful fitting function for the spherically averaged density profile \(\rho (r)\) around dark-matter halos calibrated for a suite of N-body simulations in \({\varLambda }\hbox {CDM}\) cosmologies. The DK14 density profile is given by:

$$\begin{aligned} \varDelta \rho&\equiv \rho (r)-\overline{\rho } =\rho _\mathrm {inner}\times f_\mathrm {trans}+\rho _\mathrm {outer},\nonumber \\ \rho _\mathrm {inner}&= \rho _{-2} \exp \left\{ -\frac{2}{\alpha _\mathrm {E}}\left[ \left( \frac{r}{r_{-2}}\right) ^{\alpha _\mathrm {E}}-1\right] \right\} ,\nonumber \\ f_\mathrm {trans}&= \left[ 1+ \left( \frac{r}{r_\mathrm {t}}\right) ^\beta \right] ^{-\frac{\gamma }{\beta }},\nonumber \\ \rho _\mathrm {outer}&=\frac{b_\mathrm {e}\overline{\rho }}{\varDelta _\mathrm {max}^{-1} + (r/r_\mathrm {piv})^{s_\mathrm {e}}}, \end{aligned}$$
(122)

with \(r_\mathrm {piv}=5r_\mathrm {200m}\) and \(\varDelta _\mathrm {max} = 10^3\), which is introduced as a maximum cutoff density of the outer term to avoid a spurious contribution at small halo radii (Diemer 2018). The inner profile \(\rho _\mathrm {inner}(r)\) describes the intra-halo mass distribution in a multistream region, which is modeled by an Einasto profile (Einasto 1965) with \(\rho _{-2}\) and \(r_{-2}\) the scale density and radius at which the logarithmic slope is \(-2\) and \(\alpha _\mathrm {E}\) the shape parameter describing the degree of profile curvature. The transition term \(f_\mathrm {trans}(r)\) characterizes the steepening around a truncation radius, \(r_\mathrm {t}\). The outer term \(\rho _\mathrm {outer}\), given by a softened power law, is responsible for the material infalling toward the cluster in a single-stream region at large halo radii. DK14 found that this fitting function provides a precise description (\(\lesssim 5\%\)) of their simulated DM density profiles at \(r\lesssim 9r_\mathrm {vir}\). At larger radii (\(r\gtrsim 9r_\mathrm {vir}\)), the outer term is expected to follow a shape proportional to the matter correlation function. As in the case of the NFW profile, it is useful to define the halo concentration by \(c_\varDelta =r_\varDelta /r_{-2}\).

The DK14 profile is described by eight parameters, \((\rho _{-2}, r_{-2}, \alpha _\mathrm {E}, \beta , \gamma , r_\mathrm {t}, b_\mathrm {e}, s_\mathrm {e})\), and is sufficiently flexible to reproduce a range of fitting functions, such as the halo model (Oguri and Hamana 2011; Hikage et al. 2013) and density profiles with a sharp steepening feature associated with the splashback radius (see Sect. 6.3). Equation (122) can be used as a fitting function in conjunction with generic priors for some of the shape and structural parameters (see Diemer and Kravtsov 2014; More et al. 2015; Umetsu and Diemer 2017; Chang et al. 2018). By projecting \(\varDelta \rho (r)\) along the line of sight, we obtain the surface mass density responsible for gravitational lensing as:

$$\begin{aligned} \varSigma (R)=2\int _R^\infty \frac{\varDelta \rho (r)r\text {d}r}{\sqrt{r^2-R^2}}, \end{aligned}$$
(123)

where the line-of-sight integral is evaluated numerically. The publicly available code, colossus (Diemer 2018), implements a range of calculations relating to three-dimensional and projected halo profiles including the NFW, Einasto, and DK14 models.

4.4 Shear likelihood function

The likelihood function \(\mathcal{L}\) of a mass model for weak shear observations \(\varvec{d}\equiv \{\langle g_+(\vartheta _b)\rangle \}_{b=1}^{N_\mathrm {bin}}\) is written as:

$$\begin{aligned} -2\ln \mathcal{L}(p)&= \sum _{b,b'=1}^{N_\mathrm {bin}} \left[ \langle g_+(\vartheta _b)\rangle -\widehat{g}_+(\vartheta _b|p)\right] (C^{-1})_{bb'} \left[ \langle g_+(\vartheta _{b'})\rangle -\widehat{g}_+(\vartheta _{b'}|p)\right] \nonumber \\&\quad + \ln \left[ (2\pi )^{N_\mathrm {bin}}\mathrm {det}(C)\right] , \end{aligned}$$
(124)

where C is the \(N_\mathrm {bin}\times N_\mathrm {bin}\) error covariance matrix for the binned reduced tangential shear profile \(\varvec{d}\) and \(\widehat{g}_+(\vartheta _b|\varvec{p})\) represents the theoretical expectation for \(\langle g_+(\vartheta _b)\rangle \) (Eq. (93)) predicted by the model parameterized by a set of parameters \(\varvec{p}\). Note that modeling of the cluster lensing signal \(\widehat{g}_+(\vartheta |\varvec{p})\) requires information of the lensing depth \(\langle \varSigma _\mathrm {cr}^{-1}\rangle \) averaged over the source-redshift distribution (Sect. 3.6.2). Similarly, one can define a likelihood function for the lensing convergence profile \(\kappa (\vartheta )\), which can be reconstructed from combined shear and magnification measurements (e.g., Umetsu et al. 2011b, 2014).

A well-characterized inference of the model parameters \(\varvec{p}\) can be obtained within the Bayesian framework by properly choosing the priors (Umetsu et al. 2020). In this context, when interpreting the cluster lensing signal with an NFW profile (Sect. 4.3.1), it is useful to take \(\varvec{p}=(M_\mathrm {200c},c_\mathrm {200c})\) as fitting parameters.Footnote 10 Tangential shear fitting with a spherical NFW profile is a standard approach for measuring individual cluster masses from weak lensing (e.g., Okabe et al. 2010; Applegate et al. 2014; Hoekstra et al. 2015). Numerical simulations suggest that mass estimates from tangential shear fitting can be biased low (by \(\sim 5-10\%\); Meneghetti et al. 2010b; Becker and Kravtsov 2011; Rasia et al. 2012), because local substructures that are abundant in the outskirts of massive clusters dilute the shear tangential to the cluster center. Moreover, systematic deviations of the lensing signal from the assumed NFW profile form in projection can lead to a substantial level of mass bias, even if the spherically averaged density profiles \(\rho (r)\) in three dimensions are well described by the NFW form (e.g., Sereno et al. 2016; Umetsu et al. 2020). Therefore, it is highly important to optimize the radial range for tangential shear fitting so as to alleviate the mass bias (von der Linden et al. 2014a; Applegate et al. 2014; Pratt et al. 2019).

Fig. 14
figure 14

Left panel: diagonal components of the covariance matrix as a function of cluster-centric radius R obtained from a stacked shear analysis based on Subaru Hyper Suprime-Cam observations. The black solid line shows the total covariance C; the blue dashed line is the uncertainty due to intrinsic shapes of source galaxies (\(C^\mathrm {stat}\), denoted as \(C^\mathrm {shape}\) in this paper), the orange dotted line is the covariance due to intrinsic variations of the projected cluster lensing signal (\(C^\mathrm {int}\)), and the green dashed-dotted line is the cosmic noise covariance due to large-scale structure uncorrelated with clusters (\(C^\mathrm {lss}\)). Shape noise is dominant for \(R\lesssim 3h^{-1}\mathrm {Mpc}\) (comoving), while the cosmic noise dominates at larger separations. Right panel: correlation matrix of the stacked total covariance as a function of the radial bin. The correlation between radial bins appears at large separation due to the cosmic noise covariance. Image reproduced with permission from Miyatake et al. (2019), copyright by AAS

To obtain robust constraints on the underlying cluster mass distribution, we need to ensure that the shear likelihood function (Eq. (124)) includes all relevant sources of uncertainty (Gruen et al. 2015). Following Umetsu et al. (2016, (2020), we decompose the error covariance matrix C for \(\varvec{d}=\{\langle g_+(\vartheta _b)\rangle \}_{b=1}^{N_\mathrm {bin}}\) as:

$$\begin{aligned} C = C^\mathrm {shape} + C^\mathrm {lss} + C^\mathrm {int}, \end{aligned}$$
(125)

where \((C^\mathrm {shape})_{bb'}=\sigma ^2_\mathrm {shape}(\vartheta _b)\delta _{bb'}\) is the diagonal statistical uncertainty due to the shape noise (Eq. (112)); \((C^\mathrm {lss})_{bb'}\) is the cosmic noise covariance due to uncorrelated large-scale structure projected along the line of sight (Schneider et al. 1998; Hoekstra 2003); and \((C^\mathrm {int})_{bb'}\) accounts for statistical fluctuations of the projected cluster lensing signal due to intrinsic variations associated with assembly bias and cluster asphericity (Gruen et al. 2015).

Figure 14 shows the diagonal elements of the covariance matrix used in a stacked weak-lensing analysis of Miyatake et al. (2019) and the correlation matrix, defined with the total covariance matrix as \(C_{bb'}/\sqrt{C_{bb}C_{b'b'}}\). A similar figure but for the \(\kappa \) profile was presented in Umetsu et al. (2016), which presents a joint weak and strong lensing analysis of 20 high-mass clusters targeted by the CLASH survey (Cluster Lensing And Supernova survey with Hubble; Postman et al. 2012).

The elements of the \(C^\mathrm {lss}\) matrix are given by (Hoekstra 2003; Oguri and Takada 2011):

$$\begin{aligned} (C^\mathrm {lss})_{bb'}=\int \frac{\ell \text {d}\ell }{2\pi }P_\kappa (\ell ) \hat{J_2}(\ell \vartheta _b)\hat{J_2}(\ell \vartheta _{b'}), \end{aligned}$$
(126)

where \(\hat{J}_2(\ell \vartheta _b)\) is the Bessel function of the first kind and second order averaged over the bth annulus (for the case of the binned convergence profile, see Umetsu et al. 2011a, 2016; Gruen et al. 2015); and \(P_\kappa (\ell )\) is the two-dimensional convergence power spectrum (see Eq. (37)) as a function of angular multipole \(\ell \) calculated using the flat-sky and Limber’s approximation as (Limber 1953; Kaiser 1992):

$$\begin{aligned} P_\kappa (\ell ) = \frac{9H_0^4\varOmega _\mathrm {m}^2}{4c^4}\int _0^{\chi _s}\text {d}\chi \, W^2(\chi ,\chi _s) a^{-2}(\chi )P_\mathrm {NL}\left[ k=\frac{\ell }{r(\chi )};\chi \right] , \end{aligned}$$
(127)

with \(\chi \) the comoving coordinate along the line of sight, \(W(\chi ,\chi _s)=r(\chi _s-\chi )/r(\chi _s)\) the ratio of angular diameter distances \(D_{ls}/D_s\), and \(P_\mathrm {NL}(k;\chi )\) the nonlinear matter power spectrum. The convergence power spectrum \(P_\kappa (\ell )\) can be evaluated for a given source population and a cosmological model. In Eq. (127), we have assumed a single comoving distance \(\chi _s\) corresponding to the effective single-plane redshift of source galaxies (i.e., all source galaxies lying at \(\chi =\chi _s\)). Provided that \(\varDelta \vartheta _b/\vartheta _b\ll 1\) with \(\varDelta \vartheta _b\) the radial bin width, we have \(\hat{J}_2(\ell \vartheta _b)\simeq J_2(\ell \vartheta _b)\) (without bin-averaging) in Eq. (126).

The \(C^\mathrm {int}\) matrix describes statistical fluctuations of the projected cluster lensing signal at fixed halo mass due to intrinsic variations in halo concentration, triaxiality and orientation, and correlated secondary structures around the cluster, as well as to deviations from the assumed NFW form (Becker and Kravtsov 2011; Gruen et al. 2015).Footnote 11 Gruen et al. (2015) constructed a semianalytical model of \(C^\mathrm {int}\) that is calibrated to cosmological numerical simulations. Umetsu et al. (2016) found that the diagonal part of the intrinsic covariance for the convergence \(\kappa \) can be well approximated byFootnote 12:

$$\begin{aligned} (C^\mathrm {int}_{\kappa })_{bb} \simeq \alpha _\mathrm {int}^2\kappa ^2(\vartheta _b) \end{aligned}$$
(128)

with \(\alpha _\mathrm {int}=0.2\) in the intracluster (1-halo) regime at \(R=D_l\vartheta \lesssim r_\mathrm {200m}\). This suggests that the intrinsic variance is self-similar in the sense that \(\sqrt{(C^\mathrm {int}_\kappa )_{bb}}/\kappa (\vartheta _b)\sim \mathrm {const}\). A further simplification can be made by setting the off-diagonal elements of \(C^\mathrm {int}_\kappa \) to zero, i.e., \((C^\mathrm {int}_{\kappa })_{bb'}=\alpha ^2_\mathrm {int}\kappa ^2(\vartheta _b)\delta _{bb'}\). In general, the diagonal approximation to \(C^\mathrm {int}_\kappa \) can lead to an underestimation of parameter uncertainties (Gruen et al. 2015, see Fig. 5), where the degree of underestimation depends on the binning scheme, depth of weak-lensing observations, and halo mass. The impact of the diagonal approximation is more severe for deeper observations (or higher S/N weak-lensing data).Footnote 13 Assuming a representative mass profile, it is possible to convert the intrinsic covariance matrix \(C^\mathrm {int}_\kappa \) for the convergence into that for the tangential shear. This can be done by assuming an NFW density profile together with the concentration–mass relation \(c_\mathrm {200c}(M_\mathrm {200c},z)\) for a given cosmological model (Miyatake et al. 2019; Umetsu et al. 2020). The covariance \(C^\mathrm {int}\) for the \(g_+\) profile obtained in this way thus depends on halo mass. Miyatake et al. (2019) found that, however, the intrinsic covariances with different halo masses remain nearly self-similar in their shapes once scaled by \(R\rightarrow R/r_\mathrm {200m}\).

4.5 Stacked weak-lensing estimator

Stacking an ensemble of galaxy clusters helps average out large statistical fluctuations inherent in noisy weak-lensing observations of individual clusters. The statistical precision can be largely improved by stacking together a large number of clusters, allowing for tighter and more robust constraints on the ensemble properties of the cluster mass distribution. A stacked lensing analysis is thus complementary to an alternative approach that relies on individual cluster mass measurements (Sects. 4.2 and 4.4). In particular, a comparison of the two approaches provides a useful consistency check in different S/N regimes (e.g., Okabe et al. 2010; Umetsu et al. 2014, 2016, 2020; Okabe and Smith 2016).

Let us consider an ensemble of N galaxy clusters. We model the ensemble mass distribution of these clusters in terms of the excess surface mass density profile as:

$$\begin{aligned} \varvec{m}= \left\{ \varDelta \varSigma (R_b)\right\} _{b=1}^{N_\mathrm {bin}}. \end{aligned}$$
(129)

Specifically, our model is a vector of \(N_\mathrm {bin}\) parameters containing the binned \(\varDelta \varSigma (R)\) profile as a function of the projected cluster-centric radius R (see Sect. 3.5). Here, we aim to construct an unbiased estimator for the model \(\varvec{m}\), or the ensemble \(\varDelta \varSigma (R)\) profile, given weak-lensing observations of N individual clusters.

We assume that these clusters are distributed in redshift, having different geometric responses to the lensing signal through \(\varSigma _\mathrm {cr}^{-1}(z_l,z_s)\). We express weak-lensing observations \(\varvec{d}_l=\{\langle g_{+}(R_b|z_l)\rangle \}_{b=1}^{N_\mathrm {bin}}\) for a given cluster (l) as a sum of the signal vector \(\varvec{s}_l\) and the noise vector \(\varvec{n}_l\) as:

$$\begin{aligned} \varvec{d}_{l} =\varvec{s}_{l} + \varvec{n}_{l} \ \ \ (l=1,2,\ldots ,N), \end{aligned}$$
(130)

with

$$\begin{aligned} \varvec{s}_{l} = a_{l} \varvec{m}, \end{aligned}$$
(131)

where the response coefficient \(a_l\) represents the source-averaged inverse critical surface mass density evaluated for the lth cluster (Eq. (91)):

$$\begin{aligned} a_{l} = \left\langle \varSigma _{\mathrm {cr},l}^{-1}\right\rangle . \end{aligned}$$
(132)

In this expression, we assume that both \(\varvec{d}_l\) and \(a_l\) have been averaged over an ensemble of source galaxies to represent the respective source-averaged quantities for the lth cluster. For simplicity, we have ignored the nonlinearity between the lensing signal \(g_+\) and the surface mass density \(\varDelta \varSigma \) (see Sect. 3.6.2).Footnote 14 We refer to Umetsu et al. (2020) for a treatment of the stacked weak-lensing analysis accounting for the nonlinear correction for the source-averaging effect.

Assuming that \(\varvec{n}\) obeys Gaussian statistics and that the noise vectors between different clusters are statistically independent, we can write the total likelihood function of observations \(\varvec{d}=\{\varvec{d}_1,\varvec{d}_2,\ldots ,\varvec{d}_N\}\) as:

$$\begin{aligned} P(\varvec{d}|\varvec{m}) = \left( \prod _{l=1}^{N}\mathcal{N}_l\right) \exp {\left[ -\frac{1}{2}\sum _{l=1}^N (\varvec{d}_l-a_l\varvec{m})^t C_l^{-1} (\varvec{d}_l-a_l\varvec{m}) \right] }, \end{aligned}$$
(133)

where \(C_l=\langle \varvec{n}_l \varvec{n}_l^t\rangle \) is the error covariance matrix (Sect. 4.4) for the lth cluster and \(\mathcal{N}_l=(2\pi )^{-N_\mathrm {bin}/2}|C_l|^{-1/2}\) is a normalization factor. In ground-based cluster weak-lensing observations, the shear covariance matrix \((C_l)_{bb'}\) per cluster (\(b,b'=1,2,\ldots ,N_\mathrm {bin}\)) is dominated by the statistical uncertainty due to the shape noise. The contribution from cosmic noise (Sect. 4.4) becomes important at large cluster-centric distances (Fig. 14).

The total log-likelihood function \(\ln {P(\varvec{d}|\varvec{m})}\) is expressed as:

$$\begin{aligned} \ln {P(\varvec{d}|\varvec{m})} = -\frac{1}{2}\sum _{l=1}^N (\varvec{d}_l-a_l\varvec{m})^t C_l^{-1} (\varvec{d}_l-a_l\varvec{m}) + \mathrm {const}. \end{aligned}$$
(134)

According to Bayes’ theorem, the posterior probability distribution of \(\varvec{m}\) given the data \(\varvec{d}\) is:

$$\begin{aligned} P(\varvec{m}|\varvec{d}) = P(\varvec{d}|\varvec{m})\frac{P(\varvec{m})}{P(\varvec{d})}, \end{aligned}$$
(135)

where \(P(\varvec{m})\) is the prior probability distribution for the model \(\varvec{m}\) and \(P(\varvec{d})\) is the evidence that serves as a normalization factor here. We assume an uninformative uniform prior for our model \(\varvec{m}\), such that \(P(\varvec{m}|\varvec{d}) \propto P(\varvec{d}|\varvec{m})\). By maximizing \(\ln {P(\varvec{m}|\varvec{d})}\) with respect to \(\varvec{m}\), we obtain the desired expression for the stacked weak-lensing estimator \(\widehat{\varvec{m}}\) as (e.g., Umetsu et al. 2011a):

$$\begin{aligned} \widehat{\varvec{m}} = \left( \displaystyle \sum _{l=1}^N a_l^2 C_l^{-1} \right) ^{-1} \, \left( \displaystyle \sum _{l=1}^{N}{a_l C_l^{-1} \varvec{d}_l} \right) \equiv \langle \langle \varvec{\varDelta \varSigma }_+\rangle \rangle . \end{aligned}$$
(136)

Note that the weight assigned to \(\varDelta \varSigma _+\) of each cluster is proportional to \(a_l^2=\left\langle \varSigma _{\mathrm {cr},l}^{-1}\right\rangle ^2\) (see also Eq. (98)), because \(\varvec{s}_l\propto a_l\). The error covariance matrix \(\mathcal{C}\) for the stacked estimator \(\langle \langle \varvec{\varDelta \varSigma }_+\rangle \rangle \) (Eq. (136)) is given by:

$$\begin{aligned} \mathcal{C}_{bb'} = \left( {F}^{-1}\right) _{bb'}, \end{aligned}$$
(137)

where F is the Fisher information matrix defined as (e.g., Umetsu et al. 2011a):

$$\begin{aligned} {F}_{bb'} \equiv - \mathcal{E}\left[ \frac{\partial ^2 \ln {P}(\varvec{m}|\varvec{d})}{\partial m_b \partial m_{b'}} \right] _{\varvec{m}=\widehat{\varvec{m}}} = \left( \sum _{l=1}^N a_l^2 C_l^{-1}\right) _{bb'}. \end{aligned}$$
(138)

The total S/N for detection is given by (e.g., Umetsu and Broadhurst 2008):

$$\begin{aligned} (\mathrm {S/N})^2 = \sum _{b,b'=1}^{N_\mathrm {bin}} \widehat{m}_b \left( \mathcal{C}^{-1}\right) _{bb'} \widehat{m}_{b'} = \widehat{\varvec{m}}^t \mathcal{C}^{-1}\widehat{\varvec{m}}. \end{aligned}$$
(139)

Again, this quadratic S/N estimator breaks down and leads to an overestimation of significance if the actual per-bin S/N is less than unity (see Sect. 4.2).

It is noteworthy that interpreting the effective mass from the stacked lensing signal (Eq. (136)) requires caution especially when the cluster sample spans a wide range in mass and redshift. This is because the amplitude of the lensing signal is weighted by the redshift-dependent sensitivity and it is not linearly proportional to the cluster mass (e.g., Mandelbaum et al. 2005; Umetsu et al. 2016, 2020; Melchior et al. 2017; Sereno et al. 2017). We refer the reader to Miyatake et al. (2019) and Murata et al. (2019) for further discussion of this issue.

Fig. 15
figure 15

Stacked tangential (\(\varDelta \varSigma _+\): top) and cross (\(\varDelta \varSigma _\times \): bottom) shear profiles as a function of cluster-centric comoving radius R, derived for a sample of 136 spectroscopically confirmed X-ray-selected groups and clusters (\(0.033\leqslant z_l\leqslant 1.033\)) detected in the \(25\,\hbox {deg}^2\) XXL-N region. The results are based on Subaru HSC-SSP survey data, shown for three different source-selection methods (black squares, blue squares, and red circles). The data points with different selection methods are horizontally shifted for visual clarity. The solid line and the dashed line represent the best-fit NFW model and the halo model, respectively, derived from the fiducial P-cut measurements. The dotted line shows the 2-halo term of the best-fit halo model. Ephor_AB and MLZ refer to HSC photo-z codes (Tanaka et al. 2018). Image reproduced with permission from Umetsu et al. (2020), copyright by AAS

Figure 15 shows the stacked weak-lensing signals around a sample of 136 spectroscopically confirmed X-ray groups and clusters at \(0.033\leqslant z_l\leqslant 1.033\) selected from the XMM-XXL survey (Umetsu et al. 2020). Their weak-lensing analysis is based on HSC-SSP survey data. The figure compares stacked \(\langle \langle \varDelta \varSigma _{+}\rangle \rangle \) profiles of the XXL sample obtained with different source-selection methods (see Sect. 4.1). This comparison shows no significant difference between these profiles within errors in all bins. From a single-mass-bin NFW fit to the stacked shear profile, Umetsu et al. (2020) found \(M_\mathrm {200c}=(8.7\pm 0.8)\times 10^{13}h^{-1}M_\odot \) and \(c_\mathrm {200c}=3.5\pm 0.9\) at a lensing-weighted mean redshift of \(z_l\approx 0.25\). This is in agreement with the mean concentration expected for dark-matter halos in the standard \({\varLambda }\hbox {CDM}\) cosmology, \(c_\mathrm {200}\approx 4.1\) at \(M_\mathrm {200c}=8.7\times 10^{13}h^{-1}M_\odot \) and \(z=0.25\) (e.g., Diemer and Joyce 2019). Figure 15 also displays the best-fit halo model including the effects of surrounding large-scale structure as a 2-halo term. Figure 15 shows that the 2-halo term in the range \(R\in [0.3,3]\,h^{-1}\mathrm {Mpc}\) (comoving) is negligibly small even in low-mass clusters and groups (e.g., Leauthaud et al. 2010; Covone et al. 2014; Sereno et al. 2015), for which the maximum radius corresponds to \(\sim 3r_\mathrm {200c}\). This is because the tangential shear \(\varDelta \varSigma (R)=\overline{\varSigma }(R)-\varSigma (R)\) is insensitive to flattened sheet-like structures (Schneider and Seitz 1995).

4.6 Quadrupole shear

Halos formed in collisionless CDM simulations are not spherical and can have complex shapes. A more realistic description of individual cluster halos is as triaxial ellipsoids with minor-to-major axis ratios of order \(a/c\sim 0.5\), slowly increasing with halo-centric radius (Jing and Suto 2002; Bonamigo et al. 2015). More massive halos are less spherical and more prolate, as they tend to form later. The projected matter distributions around clusters are thus expected to be anisotropic, with typical axis ratios of \(q\sim 0.6\) (e.g., Okabe et al. 2018). The projected axis ratio of cluster halos varies slowly with cluster-centric distance (e.g., Okabe et al. 2018).

For sufficiently massive clusters at low redshift, deep weak-lensing observations allow us to constrain the halo shape on an individual cluster basis, by forward-modeling the observed two-dimensional shear (or convergence) field with elliptical lens models (Oguri et al. 2010; Okabe et al. 2011; Watanabe et al. 2011; Umetsu et al. 2012; Medezinski et al. 2013, 2016; Wegner et al. 2017). This tow-dimensional fitting approach is flexible and can be readily generalized to include multiple halo components (see the discussion in Sect. 3.6.1, e.g., Okabe et al. 2011; Medezinski et al. 2013) and triaxial halo shapes (e.g., Oguri et al. 2005; Sereno and Umetsu 2011; Umetsu et al. 2015; Chiu et al. 2018).

In this subsection, we introduce quadrupole shear estimators for measuring the projected halo shape for a stacked ensemble of galaxy clusters.

4.6.1 Projected halo shape and multipole expansion

Following Adhikari et al. (2015), we introduce a formalism that allows for modeling the effects of halo ellipticity on weak shear observables based on an angular multipole expansion of the lensing fields.Footnote 15 Let us write the azimuthally averaged projected mass density profile as \(\varSigma ^{(0)}(R)\propto R^{-\eta _0}\) with \(\eta _0 = -d\ln {\varSigma ^{(0)}(R)/d\ln {R}}>0\). Assuming that q is constant with cluster-centric radius, we can write the surface mass density around clusters as \(\varSigma (x,y)\propto R_q^{-\eta _0}\) (Adhikari et al. 2015; Clampitt and Jain 2016), where \(R_q\) is an elliptical radial coordinate defined as (Evans and Bridle 2009; Oguri et al. 2012; Umetsu et al. 2012, 2018):

$$\begin{aligned} R_q = \left( qx^2+y^2/q\right) ^{1/2}, \end{aligned}$$
(140)

with q the minor-to-major axis ratio \((0<q\leqslant 1)\). Here, we have chosen the Cartesian coordinate system (xy) centered on the halo, such that the x-axis is aligned with the major axis of the projected ellipse. We define the corresponding mass ellipticity by \(e=(1-q^2)/(1+q^2)\).

We express the multipole expansion of \(\varSigma \) as (Adhikari et al. 2015; Clampitt and Jain 2016):

$$\begin{aligned} \varSigma (R,\varphi ) \propto R_q^{-\eta _0}&= R^{-\eta _0}\left[ 1+\frac{e\eta _0}{2}\cos {2\varphi } + O(e^2) \right] ,\nonumber \\&\equiv \varSigma ^{(0)}(R) + \varSigma ^{(2)}(R)\cos {2\varphi } + ..., \end{aligned}$$
(141)

where \(\varphi \) is the azimuthal angle relative to the halo’s major axis, the multipole \(\varSigma ^{(m)}(R)\) is the coefficient of the \(e^{im\varphi }\) component of the azimuthal behavior, and we assume \(e\eta _0/2\ll 1\) to justify the neglect of higher order terms in the expansion. We thus model the projected mass distributions of clusters as the sum of a monopole and a quadrupole. We further assume that:

$$\begin{aligned} e \simeq \frac{2\varSigma ^{(2)}(R)}{\eta _0(R)\varSigma ^{(0)}(R)}. \end{aligned}$$
(142)

The quadrupole \(\varSigma ^{(2)}\) can thus be completely determined by the monopole \(\varSigma ^{(0)}\propto R^{-\eta _0}\), up to a multiplicative factor corresponding to the halo ellipticity e.

Similarly, the quadrupole moments of the tangential (\(+\)) and cross (\(\times \)) shear components are given by (Adhikari et al. 2015):

$$\begin{aligned} \varSigma _\mathrm {cr}\gamma _+^{(2)}(R)&= \frac{e}{2}\left[ -\varSigma ^{(0)}(R)\eta _0(R)+I_1(R)+I_2(R)\right] \cos {2\varphi },\nonumber \\ \varSigma _\mathrm {cr}\gamma _\times ^{(2)}(R)&= \frac{e}{2}\left[ I_1(R)-I_2(R)\right] \sin {2\varphi }, \end{aligned}$$
(143)

where \(I_1(R)\) and \(I_2(R)\) are defined by (Clampitt and Jain 2016):

$$\begin{aligned} I_1(R)&= \frac{3}{R^4}\int _0^RR'^3\varSigma ^{(0)}(R')\eta _0(R')\,\text {d}R',\nonumber \\ I_2(R)&=\int _R^\infty \frac{\varSigma _0(R')\eta _0(R')}{R'}\,\text {d}R'. \end{aligned}$$
(144)

Equation (143) suggests an optimal estimator weighted by \(\cos {2\varphi }\) to extract the quadrupole of the excess surface mass density, \(\varDelta \varSigma ^{(2)}(R)\), from tangential shear measurements.

Weighted quadrupole estimators for the tangential and cross-shear components are given by (Natarajan and Refregier 2000; Mandelbaum et al. 2006):Footnote 16

$$\begin{aligned} \langle \langle \varDelta \varSigma ^{(2)}_+\rangle \rangle&= \left[ \sum _{l,s}w_{ls}\varDelta \varSigma _+(R|z_l,z_s)\cos {2\varphi _{ls}}\right] \left[ \sum _{l,s}w_{ls}\cos ^2{2\varphi _{ls}}\right] ^{-1},\nonumber \\ \langle \langle \varDelta \varSigma ^{(2)}_\times \rangle \rangle&= \left[ \sum _{l,s}w_{ls}\varDelta \varSigma _\times (R|z_l,z_s)\sin {2\varphi _{ls}}\right] \left[ \sum _{l,s}w_{ls}\sin ^2{2\varphi _{ls}}\right] ^{-1}, \end{aligned}$$
(145)

where \(\varDelta \varSigma _{+,\times }(R|z_l,z_s)=\varSigma _\mathrm {cr}(z_l,z_s) g_{+,\times }(R|z_l,z_s)\) (Eq. (94)); \(w_{ls} = \varSigma _{\mathrm {cr},ls}^{-2} / \sigma _{g,ls}^{2}\) is the statistical weight for each lens–source pair (ls), with \(\sigma _{g,ls}\) the statistical uncertainty per shear component (see Eq. (98)); and \(\varphi _{ls}\) is the azimuthal angle of each source galaxy (s) relative to the major axis of each cluster lens (l). In real observations, we must rely on the major axis of the distribution of baryonic tracers (e.g., central galaxies and X-ray gas) to perform aligned, stacked lensing measurements by Eq. (145) (Mandelbaum et al. 2006; van Uitert et al. 2012, 2017; Clampitt and Jain 2016).

As discussed by Hoekstra et al. (2004) and Mandelbaum et al. (2006), in practical applications, Eq. (145) is susceptible to a possible systematic alignment of lens galaxy (e.g., BCGs) and source ellipticities. Such a spurious alignment signal can arise from an incomplete correction of the PSF anisotropy, which tends to affect neighboring objects in a similar manner. On the other hand, when interpreting the quadrupole shear signal, one must take into account possible misalignment between the underlying matter and tracer distributions, which will cause a dilution of the quadrupole shear signal. Moreover, modeling of the quadrupole shear based on the multipole expansion (Eq. (142)) should only be applied to the case with a small halo ellipticity (\(e\eta /2\ll 1\)), so that the higher order terms can be safely ignored (see Eq. (141)).

4.6.2 Cartesian estimator

Fig. 16
figure 16

The quadrupole shear pattern produced by an elliptical mass distribution. The x-axis of the Cartesian coordinate system is aligned with the major axis of the tracer distribution, which is assumed to be aligned with the major axis of the underlying mass distribution. The sign convention for the Cartesian shear components (\(\gamma _1,\gamma _2\)) is shown at the right. Note that the monopole shear (which is purely tangential) is not contributing to the shear pattern illustrated here. Image reproduced with permission from Clampitt and Jain (2016), copyright by the authors

Now, we introduce the Cartesian estimator of Clampitt and Jain (2016). Compared to the estimator of Natarajan and Refregier (2000), a practical advantage of this estimator is that one of the two Cartesian components (\(\varDelta \varSigma _2^{(\pm )}\) defined below) is insensitive to the spurious alignment of lens–source galaxy ellipticities (Clampitt and Jain 2016) discussed at the end of Sect. 4.6.1. With this estimator, we measure the stacked quadrupole shear signal with respect to a coordinate system with the x-axis aligned with the major axis of the distribution of baryonic tracers (e.g., central galaxies and X-ray gas). The monopole signal is nulled with this Cartesian estimator. We adopt the same sign convention for the Cartesian \(\gamma _1\) and \(\gamma _2\) components as defined in Clampitt and Jain (2016) and use \(\varphi \) to denote the azimuthal angle relative to the x-axis of each cluster. This is illustrated in Fig. 16.

The Cartesian shear components are related to the tangential and cross components (see Eq. (83)) by:

$$\begin{aligned} \gamma _1(R,\varphi )&= -\gamma _+(R,\varphi )\cos {2\varphi }+\gamma _\times (R,\varphi )\sin {2\varphi },\nonumber \\ \gamma _2(R,\varphi )&= -\gamma _+(R,\varphi )\sin {2\varphi }-\gamma _\times (R,\varphi )\cos {2\varphi }. \end{aligned}$$
(146)

In the framework of Adhikari et al. (2015) based on the multipole expansion, the multipole moments of the Cartesian shear components are written as follows (Clampitt and Jain 2016):

$$\begin{aligned} \varSigma _\mathrm {cr}\gamma _1^{(2)}&= \frac{e}{4}\left( \left[ \varSigma ^{(0)}(R)\eta _0(R)-2I_1(R)\right] \cos {4\varphi } + \varSigma ^{(0)}(R)\eta _0(R)-2I_2(R)\right) ,\nonumber \\ \varSigma _\mathrm {cr}\gamma _2^{(2)}&= \frac{e}{4}\left[ \varSigma ^{(0)}(R)\eta _0(R)-2I_1(R)\right] \sin {4\varphi }. \end{aligned}$$
(147)

Equation (147) shows that the azimuthal dependence of the Cartesian shear components goes as \(\cos {4\varphi }\) (except for the two terms without \(\varphi \) dependence; see Clampitt and Jain 2016 for more discussion) and \(\sin {4\varphi }\), so that there is a sign change in both components after every angle \(\pi /4\). When moving around the circle, the shear signal from elliptical clusters transitions between regions where \(\gamma _1^{(2)}\) and then \(\gamma _2^{(2)}\) alternately dominate (Clampitt and Jain 2016), as illustrated in Fig. 16.

Fig. 17
figure 17

Illustration of the Cartesian quadrupole shear estimator of Clampitt and Jain (2016). The x-axis of the Cartesian coordinate system is aligned with the major axis of the tracer distribution (black solid ellipse). We group together the Cartesian first and second shear components in same-sign regions of \(\cos {4\varphi }\) and \(\sin {4\varphi }\) (gray-shaded regions), respectively, and define four quadrupole shear components, namely, \(\varDelta \varSigma _{1}^{(-)}\) (upper left), \(\varDelta \varSigma _{1}^{(+)}\) (upper right), \(\varDelta \varSigma _{2}^{(-)}\) (lower left), and \(\varDelta \varSigma _{2}^{(+)}\) (lower right). Image reproduced with permission from Umetsu et al. (2018), copyright by AAS

Following Clampitt and Jain (2016), we group together the first and second shear components \((g_1,g_2)\) of background source galaxies in the regions where \(\cos {4\varphi }\) and \(\sin {4\varphi }\) have the same sign (see Fig. 17), respectively, and define the following estimator:

$$\begin{aligned} \langle \langle \varDelta \varSigma _i^{(\pm )}(R)\rangle \rangle =\left[ \sum _{l,s} w_{ls}\varDelta \varSigma _{i}(R|z_l,z_s) \right] \left[ \sum _{l,s} w_{ls}\right] ^{-1} \ \ \ (i=1,2), \end{aligned}$$
(148)

where we have introduced the notation in analogy to the tangential shear (Eq. (94)):

$$\begin{aligned} \varDelta \varSigma _i(R|z_l,z_s) = \varSigma _\mathrm {cr}(z_l,z_s) g_i(R|z_l,z_s), \end{aligned}$$
(149)

where \(w_{ls} = \varSigma _{\mathrm {cr},ls}^{-2} / \sigma _{g,ls}^{2}\) is the statistical weight for each lens–source pair (ls), with \(\sigma _{g,ls}\) the statistical uncertainty per shear component (see Eq. (98)); and s runs over all source galaxies that fall in the specified bin \((R,\varphi )\), different for each shear component i and sign (Clampitt and Jain 2016): \(i=1\), \(\mathrm {sign}=-\), \(-\pi /8\leqslant \varphi <\pi /8\); \(i=1\), \(\mathrm {sign}=+\), \(\pi /8\leqslant \varphi <3\pi /8\); \(i=2\), \(\mathrm {sign}=-\), \(0\leqslant \varphi <\pi /4\); \(i=2\), \(\mathrm {sign}=+\), \(\pi /4\leqslant \varphi <\pi /2\). For each case, the summation in Eq. (148) also includes source galaxies lying in symmetrical regions shifted by \(\pi /2\), \(\pi \), and \(3\pi /2\), as illustrated in Fig. 17.

Fig. 18
figure 18

Stacked quadrupole shear profiles for a sample of 20 CLASH clusters measured with respect to the X-ray major axis of each cluster. Left panel: observed \(\langle \langle \varDelta \varSigma _1^{(-)}\rangle \rangle \) (red squares) and \(\langle \langle \varDelta \varSigma _2^{(-)}\rangle \rangle \) (blue triangles) profiles shown along with the best-fit elliptical-NFW model. Right panel: same as the left panel, but showing the results for the \(\langle \langle \varDelta \varSigma _{1,2}^{(+)}\rangle \rangle \) profiles. The best-fit model was obtained from a simultaneous elliptical-NFW fit to the four quadrupole shear profiles. Image reproduced with permission from Umetsu et al. (2018), copyright by AAS

Figure 18 shows the stacked quadrupole shear profiles, \(\langle \langle \varDelta \varSigma _1^{(+)}\rangle \rangle \), \(\langle \langle \varDelta \varSigma _1^{(-)}\rangle \rangle \), \(\langle \langle \varDelta \varSigma _2^{(+)}\rangle \rangle \), and \(\langle \langle \varDelta \varSigma _2^{(-)}\rangle \rangle \), derived for a sample of 20 high-mass CLASH clusters (Umetsu et al. 2018). The quadrupole shear signal was measured with respect to the major axis of the X-ray gas shape of each cluster. Umetsu et al. (2018) modeled the stacked \(\langle \langle \varDelta \varSigma _{1,2}^{(\pm )}\rangle \rangle \) profiles by assuming an elliptical-NFW density profile with the major axis aligned with the X-ray major axis (for an elliptical extension of lensing mass models, see Keeton 2001). Any misalignment would thus lead to a dilution of the quadrupole signal and hence an underestimation of the halo ellipticity. Umetsu et al. (2018) obtained stacked constraints on the projected axis ratio of \(q=0.67\pm 0.10\) (or \(1-q=0.33\pm 0.10\)), which is fully consistent with the median axis ratio \(q=0.67\pm 0.07\) of this sample obtained from their two-dimensional shear and magnification analysis of the 20 individual clusters. Their results suggest that the total matter distribution is closely aligned with the X-ray brightness distribution (with a median misalignment angle of \(|\varDelta \mathrm {PA}|=21^\circ \pm 7^\circ \)) as expected from cosmological hydrodynamical simulations (see Okabe et al. 2018).

5 Magnification bias

In addition to the shape distortions, gravitational lensing can cause focusing of light rays, which results in an amplification of the image flux through the solid-angle distortion (Sect. 2.6.3). Lensing magnification provides complementary and independent observational alternatives to gravitational shear, especially at high redshift where source galaxies are more difficult to resolve (Van Waerbeke et al. 2010; Hildebrandt et al. 2011; Ford et al. 2014; Chiu et al. 2016, 2020).

5.1 Magnified source counts

Let us consider source number counts \(n_0(>F)\) per unit solid angle as a function of the limiting flux F for a given population of background objects (e.g., color–magnitude-selected galaxies, quasars, etc.). In the absence of gravitational lensing, the intrinsic source counts can be written as:

$$\begin{aligned} n_0(>F) =\int \text {d}z\,\frac{\text {d}^2V}{\text {d}z\text {d}\varOmega }\, \int _{L(z)}^\infty \text {d}L\,\frac{\text {d}^2N(L,z)}{\text {d}L{\text {d}}V} \equiv \int \text {d}z\,\frac{{\text {d}}n_0[z|>L(z)]}{\text {d}z}, \end{aligned}$$
(150)

where \({\text {d}}^2V(z)/\text {d}z/\text {d}\varOmega \) is the comoving volume element per redshift interval per unit solid angle, \({\text {d}}^2N(L,z)/\text {d}L/\text {d}V\) is the luminosity function of the background population, \(L(z)=4\pi D_L^2(z)F\) is the luminosity threshold corresponding to the flux limit F at redshift z, with \(D_L(z)\) the luminosity distance, and \({\text{d}}n_0[z|>L(z)]/\text {d}z\) is the redshift distribution function.

Here, we focus on the subcritical regime with \(\mu (\varvec{\theta }) > 0\) (i.e., outside the critical curves). Lensing magnification causes focusing of light rays while conserving the surface brightness (Sect. 2.6.3), resulting in the following two competing effects (Broadhurst et al. 1995; Umetsu 2013):

  1. 1

    Area distortion: \(\delta \varOmega \rightarrow \mu (\varvec{\theta })\delta \varOmega \);

  2. 2

    Flux amplification: \(F\rightarrow \mu (\varvec{\theta })F\).

The former effect reduces the geometric area in the source plane, thus decreasing the observed number of background sources per unit solid angle. On the other hand, the latter effect amplifies the flux of background sources, increasing the observed number of sources above the limiting flux.

In the presence of gravitational lensing, the magnified source counts are given as:

$$\begin{aligned} n_\mu (>F)&=\int \text {d}z\,\frac{{\text{d}}^2V}{\mu (z)\text {d}z\text {d}\varOmega }\, \int _{L(z)/\mu (z)}^\infty \text {d}L\,\frac{{\text{d}}^2N(L,z)}{\text {d}L\text {d}V}\nonumber \\&= \int \text {d}z\,\frac{\text {d}n_0[z|>L(z)/\mu (z)]}{\mu (z)\text {d}z}\equiv \int \text {d}z\,\frac{\text {d}n_\mu [z|>L(z)]}{\text {d}z}, \end{aligned}$$
(151)

where \(\text {d}n_\mu [z|>L(z)]/\text {d}z\) is the magnified redshift distribution function of the source population. Hence, the net change of the magnified source counts \(n_\mu (>F)/n_0(>F)\), known as magnification bias, depends on the intrinsic (unlensed) source luminosity function, \({\text{d}}^2N(L,z)/\text {d}L/\text {d}V\). One can calculate the expectation value for the magnified source counts \(n_\mu (>F)\) for a given background cosmology and a given source luminosity function.

In real observations, we apply different cuts (e.g., size, magnitude, and color cuts) in source selection for measuring the shear and magnification effects, thus leading to different source-redshift distributions. In contrast to the former effect, measuring the effect of magnification bias does not require source galaxies to be spatially resolved, but it does require a stringent flux limit against incompleteness effects (Hildebrandt 2016; Chiu et al. 2020).

Equation (151) indicates that, when redshift information of individual source galaxies is available from spectroscopic redshifts, we can directly measure the magnified redshift distribution of background source galaxies for a flux-limited sample (Broadhurst et al. 1995):

$$\begin{aligned} \frac{\text {d}n_\mu [z|>L(z)]}{\text {d}z} =\frac{\text {d}n_0[z|>L(z)/\mu (z)]}{\mu (z)\text {d}z}. \end{aligned}$$
(152)

Hence, in principle, the lensing-induced distortion of the redshift distribution \(\text {d}n_\mu [z|>L(z)]/\text {d}z\) can be measured from spectroscopic-redshift measurements with respect to the unlensed distribution \(\text {d}n_0[z|>L(z)]/\text {d}z\), which can be found in random fields. In particular, the integrated magnification-bias effect will translate into an enhancement in mean source redshift of the background sample (i.e., the first moment of Eq. (152)). Using 300,000 BOSS survey galaxies with accurate spectroscopic redshifts, Coupon et al. (2013) measured their mean redshift depth behind four large samples of optically selected clusters from the Sloan Digital Sky Survey (SDSS), totaling 5000–15,000 clusters. They found a \(\gtrsim 1\) percent level of mean redshift increase \(\delta z(R)\) toward the cluster center for SDSS-defined optical clusters with an effective mass of \(M_\mathrm {200c}\sim (1-2)\times 10^{14}M_\odot \).

5.2 Magnification observables

To simplify the calculations, we discretize Eq. (151) as:

$$\begin{aligned} n_\mu (>F) \simeq \sum _s \frac{\text {d}n_\mu [z_s|>L(z_s)]}{\text {d}z}\varDelta z\equiv \sum _s n_\mu (z_s|>F), \end{aligned}$$
(153)

where \(n_\mu (z_s|>F)\) represents a subsample of the background population in the redshift interval \([z_s,z_s+\varDelta z]\). If the change in flux due to magnification is small compared to the range over which the slope of the luminosity function varies, the intrinsic source counts \(n_0[z_s|>L(z_s)]\) can be approximated at \(L(z_s)\) by a power law with a logarithmic slope ofFootnote 17:

$$\begin{aligned} \alpha (z_s) = \left. -\frac{{\text{d}}\log n_0(z_s|>L)}{\text {d}\log L}\right | _{L(z_s)}. \end{aligned}$$
(154)

The magnified source counts \(n_\mu (z_s|>F)\) in the redshift interval \([z_s,z_s+\varDelta z]\) are given by:

$$\begin{aligned} n_\mu (z_s|>F)&= \frac{1}{\mu }\frac{\text {d}n_0[z_s|>L(z_s)/\mu ]}{\text {d}z}\varDelta z&= n_0(z_s|>F)\,\mu ^{\alpha (z_s)-1}. \end{aligned}$$
(155)

The corresponding magnification bias is given by:

$$\begin{aligned} b_\mu (z_s) := \frac{n_\mu (z_s|>F)}{n_0(z_s|>F)} = \mu ^{\alpha -1}. \end{aligned}$$
(156)

Equation (156) implies a positive bias for \(\alpha > 1\) and a negative bias for \(\alpha < 1\). The net magnification effect on the source counts vanishes at \(\alpha =1\). For a depleted sample of background sources with \(\alpha \ll 1\), the effect of magnification bias is dominated by the geometric area distortion (\(b_\mu \rightarrow \mu ^{-1}\) at \(\alpha \rightarrow 0\)) and insensitive to the intrinsic source luminosity function (Umetsu 2013). In the weak-lensing limit (\(|\gamma |,|\kappa |\ll 1\)), we have:

$$\begin{aligned} b_\mu \simeq 1+2(\alpha -1)\kappa . \end{aligned}$$
(157)

Hence, the flux magnification bias \(b_\mu \) in the weak-lensing limit provides a local measure of the surface mass density field, \(\kappa (\varvec{\theta })\). The combination of shear and magnification can thus be used to break or alleviate mass-sheet degeneracy (Schneider et al. 2000; Broadhurst et al. 2005b; Umetsu and Broadhurst 2008; Umetsu et al. 2011b, 2014, 2018).

In practical applications, we need to average over a broad range of source redshifts to increase the S/N. The magnification bias averaged over the source-redshift distribution is expressed as:

$$\begin{aligned} \langle b_\mu \rangle := \frac{n_\mu (>F)}{n_0(>F)} = \frac{\sum _s n_0(z_s|>F) \mu ^{\alpha (z_s)-1}}{\sum _s n_0(z_s|>F)}. \end{aligned}$$
(158)

In the continuous limit \(\sum _s n_0(z_s|>F) \rightarrow \int \text {d}z\,\text {d}n_0(z|>F)/dz\), we have the following equation (Umetsu 2013; Umetsu et al. 2016):

$$\begin{aligned} \langle b_\mu \rangle = \left[ \int \text {d}z\,\frac{\text {d}n_0(z|>F)}{\text {d}z}\mu ^{\alpha (z)-1}\right] \left[ \int \text {d}z\,\frac{\text {d}n_0(z|>F)}{\text {d}z} \right] ^{-1} =\langle \mu ^{\alpha -1}\rangle . \end{aligned}$$
(159)

Equation (159) gives a general expression for the flux magnification bias. Deep multiband photometry spanning a wide wavelength range allows us to identify distinct populations of background galaxies (e.g., Medezinski et al. 2010, 2011, 2018b; Umetsu et al. 2012, 2014, 2015). Since a given flux limit (F) corresponds to different intrinsic luminosities \(L(z_s)\) at different source redshifts \(z_s\) (Eq. (150)), source counts of distinctly different background populations probe different regimes of magnification bias. The bias is strongly negative for quiescent galaxies at \(\langle z_s\rangle \sim 1\), with a faint-end slope of \(\alpha \sim 0.4\) at the limiting magnitude \(z'\approx 25.6\,\hbox {ABmag}\) (e.g., Umetsu et al. 2014, 2015). A net count depletion (\(b_\mu <1\)) results for such a source population with \(\alpha \ll 1\) (e.g., Broadhurst 1995; Taylor et al. 1998; Broadhurst et al. 2005b; Umetsu and Broadhurst 2008; Umetsu et al. 2011b, 2012, 2014, 2015; Ford et al. 2012; Coe et al. 2012; Medezinski et al. 2013; Radovich et al. 2015; Ziparo et al. 2016; Wong et al. 2017), because the effect of magnification bias is dominated by the geometric area distortion. In the regime of density depletion, a practical advantage is that the effect is not sensitive to the exact form of the source luminosity function. The S/N for detection of \(b_\mu \) increases progressively as the flux limit F decreases.

Fig. 19
figure 19

Weak-lensing radial profiles of MACS J1206.2−0847 (\(z_l = 0.439\)) from Subaru Suprime-Cam observations. The top panel shows the reduced tangential shear profile \(g_+\) (squares). The bottom panel shows the coverage- and masking-corrected number-count profiles \(n_\mu \) for flux-limited samples of blue and red background galaxies (blue and red circles, respectively). The error bars include contributions from Poisson counting uncertainties and those from intrinsic angular clustering of each source population. For the red sample, a strong depletion of the source counts is seen toward the cluster center due to magnification of the sky area, while a slight enhancement of blue counts is present in the innermost radial bins due to the effect of positive magnification bias. Also shown for each observed profile is the joint Bayesian reconstruction (shaded area) from combined strong-lensing, weak shear lensing, and positive/negative magnification-bias measurements. Image reproduced with permission from Umetsu (2013), copyright by AAS

Figure 19 displays weak-lensing radial profiles for the cluster MACS J1206.2−0847 at \(z_l = 0.439\) derived from Subaru Suprime-Cam observations (Umetsu 2013). It is a highly massive X-ray cluster with \(M_\mathrm {200c}=(11.1\pm 2.5)\times 10^{14}h^{-1}M_\odot \) (Umetsu et al. 2014) targeted by the CLASH survey. The black squares in the top panel show the reduced tangential shear profile \(g_+(R)\). The blue and red circles in the bottom panel are positive and negative magnification-bias measurements \(n_\mu (R)\) showing density enhancement and depletion, respectively, as a function of cluster-centric radius R. These weak-lensing measurements yield respective S/N values of 10.2, 2.9, and 4.7. Figure 19 also shows a joint Bayesian reconstruction of each observed profile obtained from combined strong-lensing, weak shear lensing, and positive/negative magnification-bias measurements.

5.3 Nonlinear effects on the source-averaged magnification bias

It is instructive to consider a maximally depleted population of source galaxies with \(\alpha =-\text {d}\log {n_0(>F)}/\text {d}\log {F}=0\) at the limiting flux F. For such a population, the effect of magnification bias is purely geometric, \(\langle b_\mu \rangle =\langle \mu ^{-1}\rangle \), and insensitive to details of the intrinsic source luminosity function, \({\text{d}}^2N(L,z)/{\text{d}}L/{\text{d}}z\). In the nonlinear subcritical regime, the source-averaged inverse magnification factor is expressed as (Umetsu 2013):

$$\begin{aligned} \langle \mu ^{-1}\rangle&= \left\langle (1-\kappa )^2-|\gamma |^2\right\rangle \nonumber \\&= \left( 1-\langle \kappa \rangle \right) ^2 - |\langle \gamma \rangle |^2 + (f_l-1)\left( \langle \kappa \rangle ^2 - |\langle \gamma \rangle |^2\right) \nonumber \\&\equiv (1-\langle \kappa \rangle )^2-|\langle \gamma \rangle |^2+\varDelta \langle \mu ^{-1}\rangle , \end{aligned}$$
(160)

where \(\langle \cdots \rangle \) denotes the averaging over the source-redshift distribution (see Eq. (159)), \(f_l=\langle \varSigma _{\mathrm {cr},l}^{-2}\rangle /\langle \varSigma _{\mathrm {cr},l}\rangle ^2\) is a quantity of the order of unity, and \(\varDelta \langle \mu ^{-1}\rangle \) is the correction with respect to the single source-plane approximation. The error associated with the single source-plane approximation is \(\varDelta \langle \mu ^{-1}\rangle /\langle \mu ^{-1}\rangle \simeq (f_l-1)\left( \langle \kappa \rangle ^2-|\langle \gamma \rangle |^2\right) \), which is much smaller than unity for background populations with \(\alpha \sim 0\) in the mildly nonlinear subcritical regime where \(|\langle \kappa \rangle |\sim |\langle \gamma \rangle |\sim O(10^{-1})\). It is therefore reasonable to use the single source-plane approximation for calculating the magnification bias of depleted source populations with \(\alpha \ll 0\).

In the regime of density enhancement (\(\alpha >1\)), on the other hand, interpreting the observed lensing signal requires detailed knowledge of the intrinsic source luminosity function (see, e.g., Chiu et al. 2016, 2020), especially in the nonlinear regime where the flux amplification factor is correspondingly large (say, \(\mu \gtrsim 1.5\)). For example, a blue distant population of background galaxies is observed to have a well-defined redshift distribution that is fairly symmetric and peaked at a mean redshift of \(\langle z_s\rangle \sim 2\) (e.g., Lilly et al. 2007; Medezinski et al. 2010). Therefore, the majority of these faint blue galaxies are in the far background of typical cluster lenses, so that the lensing signal has a weaker dependence on the source redshift \(z_s\). In such a case, the single source-plane approximation may well be justified (Umetsu 2013).

5.4 Observational systematics and null tests

In real observations, contamination of the background sample by unlensed galaxies is a critical source of systematics in cluster weak lensing, as discussed in Sect. 4.1. In particular, contamination by cluster galaxies has a direct impact on the interpretation of background source counts, because it will cause an apparent density enhancement at small cluster-centric radii. To avoid significant contamination and alleviate this problem as much as possible, one often relies on a stringent color–color selection (Sect. 4.1) to measure the lensing magnification signal from a distinct population of background galaxies (e.g., Broadhurst et al. 2005b; Umetsu and Broadhurst 2008; Umetsu et al. 2011b, 2014; Ziparo et al. 2016; Chiu et al. 2016, 2020). If well-calibrated photo-z PDFs are available from multiband observations, the impact of cluster contamination can be characterized and assessed by statistically decomposing the photo-z PDF P(z) into the cluster and random-field populations (e.g., Gruen et al. 2014; Chiu et al. 2020).

Moreover, for unbiased magnification-bias measurements, one has to correct for the incomplete area coverage due to masked regions and incomplete measurement annuli. Specifically, masking (or blocking) of background galaxies by foreground objects, cluster members, and saturated or bad pixels needs to be properly accounted for (see Umetsu et al. 2011b; Chiu et al. 2020). Another concern is the impact of blending effects in the crowded regions of cluster environments and the presence of intracluster light (Gruen et al. 2019a), which could bias the photometry and thus photo-zs. The effects of masking and blending on the source counts can be examined and quantified by injecting synthetic galaxies into real images from observations (Huang et al. 2018; Chiu et al. 2020).

Fig. 20
figure 20

Stacked magnification-bias profiles around a sample of 3029 CAMIRA clusters obtained using low-z (left) and high-z (middle) samples, as well as the joint (right) background sample, from the Subaru HSC-SSP survey. The results are shown as a function of the projected physical cluster-centric radius R. Filled circles with error bars represent the stacked magnification-bias measurements with the respective lensing-cut samples. The total detection significance for each measurement is labeled in each panel. Open diamonds with error bars show the results for “null-test” samples, for which the net effect of magnification bias is expected to be zero. The green shaded region in each panel represents the \(1\sigma \) confidence interval around the best-fit profile. Image reproduced with permission from Chiu et al. (2020), copyright by the authors

Since the net effect of magnification bias is expected to vanish for a flux-limited background sample defined at \(\alpha =1\) (Sect. 5.2), weak-lensing magnification provides a powerful null test, similar to the cross-shear (B-mode) signal in the case of weak shear lensing (Sect. 4.2). By performing a null test, one can empirically assess the level of residual bias that could be present in the measurement for a “lensing-cut” sample defined at \(\alpha > 1\) or \(\alpha < 1\). The only assumption made in this approach is that residual systematics are the same between the lensing-cut and null-test samples defined at different flux (magnitude) limits. This null test allows us to quantify the impact of deblending effects, biased photometry in crowded regions, and any incorrect assumptions about P(z)-decomposition (Chiu et al. 2020). This is demonstrated in Fig. 20. The figure shows the stacked magnification-bias profiles around a sample of 3029 CAMIRA clusters with richness \(N>15\) in the redshift range \(0.2\leqslant z < 1.1\), obtained using flux-limited low-z and high-z background samples, as well as the joint sample, selected in the \(g-i\) versus \(r-z\) diagram from HSC-SSP survey data (Chiu et al. 2020). The magnification-bias signal for the full CAMIRA sample is detected at a significance level of \(9.5\sigma \). On the other hand, the residual bias estimated from the null-test samples was found to be statistically consistent with zero (Chiu et al. 2020).

6 Recent advances in cluster weak-lensing observations

Galaxy clusters provide valuable information from the physics driving cosmic structure formation to the nature of dark matter and dark energy. Their content reflects that of the universe: \(\sim 85\%\) dark matter and \(\sim 15\%\) baryons (cf. \(\varOmega _\mathrm {b}/\varOmega _\mathrm {m}=(15.7\pm 0.4)\%\); Planck Collaboration et al. 2016a), with \(\sim 90\%\) of the baryons residing in a hot, X-ray-emitting phase of the intracluster medium. Massive clusters dominated by dark matter are not expected to be significantly affected by baryonic gas cooling (Blumenthal et al. 1986; Duffy et al. 2010), unlike individual galaxies, because the high temperature and low density prevent efficient cooling and gas contraction. Consequently, for clusters in a state of quasi-equilibrium, the form of their total mass profiles reflects closely the underlying dark-matter distribution. Hence, galaxy clusters offer fundamental tests on the assumed properties of dark matter, as well as on models of nonlinear structure formation.

The \({\varLambda }\hbox {CDM}\) paradigm assumes that dark matter is effectively cold (nonrelativistic) and collisionless on astrophysical and cosmological scales (Bertone and Tait 2018). In this context, the standard CDM model and its variants, such as self-interacting dark matter (SIDM; Spergel and Steinhardt 2000) and wave dark matter (\(\psi \)DM; Peebles 2000; Hu et al. 2000; Schive et al. 2014), can provide a range of testable predictions for the properties of cluster-size halos (Sects. 6.1 and 6.2). A prime example is the “Bullet Cluster”, a merging pair of clusters exhibiting a significant offset between the centers of the gravitational lensing mass and the X-ray peaks of the collisional cluster gas (Clowe et al. 2004, 2006). The data support that dark matter in clusters is effectively collisionless like galaxies, placing a robust upper limit on the self-interacting cross-section for dark matter of \(\sigma _\mathrm {DM}/m<1.25\,\hbox {cm}^2\,\hbox {g}^{-1}\) (Randall et al. 2008).

The abundance of clusters as a function of redshift provides a sensitive probe of the amplitude and growth rate of primordial density perturbations, as well as of the cosmological volume element \({\text{d}}^2V(z)/{\text{d}}z/{\text{d}}\varOmega \). This cosmological sensitivity arises mainly because clusters populate the high-mass exponential tail of the halo mass function (e.g., Haiman et al. 2001). In principle, galaxy clusters can complement other cosmological probes, such as CMB, galaxy clustering, cosmic shear, and distant supernova observations. To place cosmological constraints using clusters, however, it is essential to study large cluster samples with well-characterized selection functions, spanning a wide range in mass and redshift (Allen et al. 2004; Vikhlinin et al. 2009; Mantz et al. 2010). Currently, the ability of galaxy clusters to provide robust cosmological constraints is limited by systematic uncertainties in their mass calibration (Pratt et al. 2019). Since the level of mass bias is sensitive to calibration systematics of the instruments (Donahue et al. 2014; Israel et al. 2015) and is likely mass dependent (Sereno and Ettori 2017; Umetsu et al. 2020), a concerted effort is needed to enable an accurate mass calibration with weak gravitational lensing (see Sect. 6.4).

Substantial progress has been made in constructing statistical samples of galaxy clusters thanks to dedicated wide-field surveys in various wavelengths (Planck Collaboration et al. 2014a, 2015a; Bleem et al. 2015; Miyazaki et al. 2018b; Oguri et al. 2018). Systematic lensing studies of galaxy clusters often target X-ray- or SZE-selected samples (e.g., von der Linden et al. 2014a; Postman et al. 2012; Gruen et al. 2014; Hoekstra et al. 2015; Sereno et al. 2017). This is because the hot intracluster gas provides an excellent tracer of the cluster’s gravitational potential (e.g., Donahue et al. 2014), except for the cases of massive cluster collisions caught in an ongoing phase of dissociative mergers (Clowe et al. 2006; Okabe and Umetsu 2008). Moreover, X-ray and SZE observations provide useful centering information of individual clusters. The effect of off-centered clusters is to dilute and flatten the observed \(\varSigma (R)\) profile at scales smaller than the offset scale \(\sigma _\mathrm {off}\) (Johnston et al. 2007; George et al. 2012; Du and Fan 2014). Since flattened, sheet-like mass distributions produce little shear, the impact of miscentering on \(\varDelta \varSigma (R)\) is much larger. The off-centered \(\varDelta \varSigma (R)\) profile is strongly suppressed by smoothing at scales \(R\lesssim 2.5\sigma _\mathrm {off}\) (Johnston et al. 2007). A comparison of X-ray, SZE, and optical (e.g., BCGs) center positions allows us to empirically assess the level of halo miscentering. It should be noted, however, that a merger can boost the X-ray and SZE signals, and make their peaks off-centered during the compression phase (Molnar et al. 2012). Although the timescale on which this happens is expected to be short (\(\sim 1\,\hbox {Gyr}\); Ricker and Sarazin 2001), it could induce a selection effect and contribute to the scatter in their scaling relations (Umetsu et al. 2020).

In this section, we review recent advances in our understanding of the distribution and amount of mass in galaxy clusters based on cluster weak-lensing observations.

6.1 Cluster mass distribution

The distribution and concentration of mass in dark-matter-dominated halos depend fundamentally on the properties of dark matter. For the case of collisionless CDM models, cosmological N-body simulations with sufficiently high resolution can provide accurate predictions for the end product of collisionless collapse in an expanding universe. Although the formation of halos is a complex, nonlinear dynamical process and halos are evolving through accretion and mergers, \({\varLambda }\hbox {CDM}\) models predict that the structure of quasi-equilibrium halos characterized in terms of the spherically averaged density profile \(\rho (r)\) is approximately self-similar with a characteristic density cusp in their centers, \(\rho (r)\propto 1/r\) (Navarro et al. 1996, 1997). The density profile \(\rho (r)\) of dark-matter-dominated halos steepens continuously with radius and it is well described by the NFW form out to the virial radius, albeit with large variance associated with the assembly histories of individual halos (Jing and Suto 2000).

Subsequent numerical studies with improved statistics and higher resolution found that the spherically averaged density profiles of \({\varLambda }\hbox {CDM}\) halos are better approximated by the three-parameter Einasto function with an additional degree of freedom (Merritt et al. 2006; Gao et al. 2008), which is closely linked with the mass accretion history of halos (Ludlow et al. 2013). The Einasto profile has a power-law logarithmic slope of \(\gamma _\mathrm {3D}(r) = -2(r/r_{-2})^{\alpha _\mathrm {E}}\) (Sect. 4.3.3). For a given halo concentration, an Einasto profile with \(\alpha _\mathrm {E}\approx 0.18\) closely resembles the NFW profile over roughly two decades in radius (Ludlow et al. 2013). The shape parameter \(\alpha _\mathrm {E}\) of \({\varLambda }\hbox {CDM}\) halos increases gradually with halo mass and redshift (see Gao et al. 2008; Child et al. 2018; \(0.15\lesssim \alpha _\mathrm {E}\lesssim 0.25\) at \(z=0\)), so that the density profiles of \({\varLambda }\hbox {CDM}\) halos are not strictly self-similar (Navarro et al. 2010). By analyzing a large suite of N-body simulations in \({\varLambda }\hbox {CDM}\), Child et al. (2018) found that both Einasto and NFW profiles provide a good description of the stacked mass distributions of cluster-size halos at low redshift, implying that the two fitting functions are nearly indistinguishable for stacked ensembles of low-redshift clusters, in contrast to clusters at higher redshift (\(z\gtrsim 1\)).

The three-dimensional shape of collisionless halos is predicted to be generally triaxial with a preference for prolate shapes (Warren et al. 1992; Jing and Suto 2002), reflecting the collisionless nature of dark matter (Ostriker and Steinhardt 2003). Older halos tend to be more relaxed and thus to be rounder. Since more massive halos form later on average, cluster-size halos are expected to be more elongated than less massive systems (Shaw et al. 2006; Ho et al. 2006; Despali et al. 2014, 2017; Bonamigo et al. 2015). Accretion of matter from the surrounding large-scale environment also plays a key role in determining the shape and orientation of halos. The halo orientation tends to be in the preferential infall direction of the subhalos and hence aligned along the surrounding filaments (Shaw et al. 2006). The shape and orientation of galaxy clusters thus provide an independent test of models of structure formation (see Sect. 4.6).

Fig. 21
figure 21

Stacked tangential (\(\varDelta \varSigma _+\): top) and cross (\(\varDelta \varSigma _\times \): bottom) shear profiles obtained for a sample of 50 X-ray luminous LoCuSS clusters. Solid red and dashed green lines are the total mass model and the NFW model, respectively. The dotted blue and dash-dotted magenta lines represent the 2-halo term and the central point source multiplied by a factor of 10, respectively. Image reproduced with permission from Okabe and Smith (2016), copyright by the authors

Prior to dedicated wide-field optical imaging surveys such as Subaru HSC-SSP and DES, several cluster lensing surveys carried out deep targeted observations toward a few tens to several tens of highly massive galaxy clusters with \(M_\mathrm {200c}\sim 10^{15}M_\odot \) (e.g., Postman et al. 2012; Okabe et al. 2013; von der Linden et al. 2014a; Hoekstra et al. 2015). Since such clusters are extremely rare across the sky, targeted weak-lensing observations with deep multiband imaging currently represent the most efficient approach to study in detail the high-mass population of galaxy clusters each with sufficiently high S/N (see Contigiani et al. 2019).

In the last decade, cluster–galaxy weak-lensing observations have established that the total matter distribution within clusters in projection can be well described by cuspy, outward-steepening density profiles (Umetsu et al. 2011b, 2014, 2016; Beraldo e Silva et al. 2013; Newman et al. 2013b; Okabe et al. 2013; Sereno et al. 2017), such as the NFW and Einasto profiles with a near-universal shape (Niikura et al. 2015; Umetsu and Diemer 2017), as predicted for collisionless halos in quasi-gravitational equilibrium (e.g., Navarro et al. 1996, 1997; Taylor and Navarro 2001; Merritt et al. 2006; Gao et al. 2008; Hjorth and Williams 2010; Williams and Hjorth 2010). Moreover, the shape and orientation of galaxy clusters as constrained by weak-lensing and multiwavelength data sets are found to be in agreement with \({\varLambda }\hbox {CDM}\) predictions (e.g., Oguri et al. 2005; Evans and Bridle 2009; Oguri et al. 2010; Morandi et al. 2012; Sereno et al. 2013, 2018b; Umetsu et al. 2015, 2018; Chiu et al. 2018; Shin et al. 2018), although detailed studies of individual clusters are currently limited to a relatively small number of high-mass clusters with deep multiwavelength observations (see Sereno et al. 2018b; Umetsu et al. 2018). These results are all in support of the standard explanation for dark matter as effectively collisionless and nonrelativistic on sub-megaparsec scales and beyond, with an excellent match with standard \({\varLambda }\hbox {CDM}\) predictions (however, see Meneghetti et al. 2020, for an excess of galaxy–galaxy strong-lensing events in clusters with respect to \({\varLambda }\hbox {CDM}\)).

In Fig. 21, we show the ensemble-averaged \(\langle \langle \varDelta \varSigma _+\rangle \rangle \) profile in the radial range \(R\in [0.1,2.8]\,h^{-1}\mathrm {Mpc}\) obtained for a stacked sample of 50 X-ray clusters (Okabe and Smith 2016) targeted by the LoCuSS Survey (Local Cluster Substructure Survey; Smith et al. 2016). Their weak shear lensing analysis is based on two-band imaging observations with Subaru/Suprime-Cam. Their cluster sample is drawn from the ROSAT All-Sky Survey (RASS; Voges et al. 1999) at \(0.15<z<0.3\) and is approximately X-ray luminosity limited. The stacked shear profile of the LoCuSS sample is in excellent agreement with the NFW profile with \(M_\mathrm {200c}=6.37^{+0.28}_{-0.27}\times 10^{14}h^{-1}M_\odot \) and \(c_\mathrm {200c}=3.69^{+0.26}_{-0.24}\) at \(z_l=0.23\). The 2-halo term contribution to \(\varDelta \varSigma \) for the LoCuSS sample is negligibly small in the radial range \(\lesssim 2r_\mathrm {200c}\). From a single Einasto profile fit to the stacked \(\langle \langle \varDelta \varSigma \rangle \rangle \) profile, Okabe and Smith (2016) obtained the best-fit Einasto shape parameter of \(\alpha _\mathrm {E}=0.161^{+0.042}_{-0.041}\), which is consistent within the errors with the \({\varLambda }\hbox {CDM}\) predictions for cluster-size halos at \(z_l=0.23\) (Gao et al. 2008; Dutton and Macciò 2014).

Fig. 22
figure 22

Left panel: ensemble-averaged projected mass density profile \(\langle \langle \varSigma (R)\rangle \rangle \) (black squares) for a sample of 16 CLASH X-ray-selected clusters (gray lines), obtained from a joint analysis of strong-lensing, weak-lensing shear and magnification data. Right panel: models with \(\chi ^2\) probabilities to exceed (PTE) of \(>0.05\) are shown with solid lines, while those with \(\mathrm {PTE}\le 0.05\) are shown with dashed lines. The averaged mass profile is well described by a family of density profiles predicted for dark-matter-dominated halos in gravitational equilibrium, such as the NFW, Einasto, and DARKexp models. Cuspy halo models (red) that include the 2-halo contribution from surrounding large-scale structure provide improved agreement with the data. This is a slightly modified version of the figure presented in Umetsu et al. (2016)

Figure 22 shows the ensemble-averaged \(\langle \langle \varSigma \rangle \rangle \) profile of 16 CLASH X-ray-selected clusters (Umetsu et al. 2016) based on a joint strong- and weak-lensing analysis of 16-band Hubble Space Telescope (HST) observations (Zitrin et al. 2015) and wide-field multicolor imaging taken primarily with Subaru/Suprime-Cam (Umetsu et al. 2014). The CLASH survey is an HST Multi-Cycle Treasury program designed to study with 525 assigned orbits the mass distributions of 25 high-mass clusters. In this sample, 20 clusters were selected to have regular X-ray morphologies and X-ray temperatures above 5 keV. Numerical simulations suggest that this X-ray-selected subsample is mostly composed of relaxed clusters (\(\sim 70\%\)) but the rest (\(\sim 30\%\)) are unrelaxed systems (Meneghetti et al. 2014). Another subset of five clusters were selected by their high-magnification lensing properties. Umetsu et al. (2016) studied a subset of 20 CLASH clusters (16 X-ray-selected and 4 high-magnification systems) taken from Umetsu et al. (2014), who presented a joint shear and magnification weak-lensing analysis of these individual clusters. The stacked \(\langle \langle \varSigma \rangle \rangle \) profile over 2 decades in radius, \(R\in [0.02,2]r_\mathrm {200m}\), is well described by a family of density profiles predicted for cuspy dark-matter-dominated halos in gravitational equilibrium, namely, the NFW, Einasto, and DARKexp models (Umetsu et al. 2016).Footnote 18 In contrast, the single power-law, cored-isothermal, and Burkert density profiles are statistically disfavored by the data. Cuspy halo models that include the 2-halo term provide improved agreement with the data.

Umetsu et al. (2016) found the best-fit NFW parameters for the stacked CLASH \(\langle \langle \varSigma \rangle \rangle \) profile of \(M_\mathrm {200c}=10.1^{+0.8}_{-0.7}\times 10^{14}h^{-1}M_\odot \) and \(c_\mathrm {200c}=3.76^{+0.29}_{-0.27}\) (Umetsu et al. 2016) at a lensing-weighted mean redshift of \(z_l\approx 0.34\). Similarly, the best-fit Einasto shape parameter for the stacked \(\langle \langle \varSigma \rangle \rangle \) profile is \(\alpha _\mathrm {E}=0.232^{+0.042}_{-0.038}\), which is in excellent agreement with predictions from \({\varLambda }\hbox {CDM}\) numerical simulations, \(\alpha _\mathrm {E}=0.21 \pm 0.07\) (Meneghetti et al. 2014, \(\alpha _\mathrm {E}=0.24\pm 0.09\) when fitted to surface mass density profiles of projected halos).

Note that the innermost bin in Fig. 22 represents the mean density interior to \(R_\mathrm {min}=40h^{-1}\mathrm {kpc}\) corresponding to the typical resolution limit of their HST strong-lensing analysis, \(\delta \vartheta \approx 10\,\hbox {arcsec}\). This scale \(R_\mathrm {min}\) is about twice the typical half-light radius of the CLASH BCGs (see Tian et al. 2020), within which the stellar baryons dominate the total mass of the clusters (e.g., Caminha et al. 2019). Determinations of the central slope of the dark-matter density profile \(\rho _\mathrm {DM}(r)\) in clusters require additional constraints on the total mass in the innermost region, such as from stellar kinematics of the BCG (Newman et al. 2013a). To constrain \(\rho _\mathrm {DM}(r)\), one needs to carefully model the different contributions to the cluster total mass profile, coming from the stellar mass of member galaxies, the hot gas component, the BCG stellar mass, and dark matter (Sartoris et al. 2020). Moreover, it is important to take into account the velocity anisotropy on the interpretation of the line-of-sight stellar velocity dispersion profile of the BCG (Schaller et al. 2015; Sartoris et al. 2020; He et al. 2020). For these reasons, current measurements and interpretations of the asymptotic central slope of \(\rho _\mathrm {DM}(r)\) in galaxy clusters appear to be controversial (e.g., Newman et al. 2013a; Sartoris et al. 2020; He et al. 2020).

According to cosmological N-body simulations, the spherically averaged density profiles in the halo outskirts are most self-similar when expressed in units of overdensity radii \(r_{\varDelta _\mathrm {m}}\) defined with respect to the mean density of the universe, \(\overline{\rho }(z)\), especially to \(r_\mathrm {200m}\) (Diemer and Kravtsov 2014). This self-similarity indicates that overdensity radii defined with respect to the mean cosmic density are preferred to describe the structure and evolution of the outer density profiles. The structure and dynamics of the infall region are expected to be universal in units of the turnaround radius, according to self-similar infall models (Gunn and Gott 1972; Fillmore and Goldreich 1984; Bertschinger 1985; Shi 2016). In these models, the turnaround radius is a fixed multiple of the radius enclosing a given fixed overdensity with respect to the mean cosmic density. The outer profiles can thus be expected to be self-similar in \(r/r_\mathrm {200m}\). In contrast, the density profiles in the intra-halo (1-halo) region are found to be most self-similar when they are scaled by \(r_\mathrm {200c}\) (or any other critical overdensity radius with a reasonable threshold; Diemer and Kravtsov 2014). That is, density profiles of \({\varLambda }\hbox {CDM}\) halos in N-body simulations prefer different scaling radii in different regions of the density profile (Diemer and Kravtsov 2014). These empirical scalings were confirmed in cosmological hydrodynamical simulations of galaxy clusters (Lau et al. 2015, see also Shi 2016). However, the physical explanation for the self-similarity of the inner density profile when rescaled with \(r_\mathrm {200c}\) is less clear.

Fig. 23
figure 23

Projected mass density (left) and enclosed mass (right) profiles for seven CLASH clusters derived from a detailed strong-lensing analysis, rescaled by \(M_\mathrm {200c}\) and \(r_\mathrm {200c}\) obtained from NFW fits to independent weak-lensing measurements (Umetsu et al. 2018). Vertical lines color-coded for each cluster indicate the positions of multiple images used for the lens modeling, all belonging to spectroscopic confirmed families. MACS J0416 with a shallower inner density slope is a highly asymmetric merging system. Image reproduced with permission from Caminha et al. (2019), copyright by ESO

In Fig. 23, we show the projected total mass density (\(\varSigma \)) and enclosed mass (\(M_\mathrm {2D}\)) profiles for seven CLASH clusters derived from a detailed strong-lensing analysis of Caminha et al. (2019) based on extensive spectroscopic information, primarily from the Multi Unit Spectroscopic Explorer (MUSE) archival data complemented with CLASH-VLT redshift measurements (Biviano et al. 2013; Rosati et al. 2014). In the figure, the projected mass profiles of individual clusters are rescaled using \(M_\mathrm {200c}\) and \(r_\mathrm {200c}\) obtained from NFW fits to independent ground-based weak-lensing measurements (Umetsu et al. 2018). All clusters have a relatively large number of multiple-image constraints in the region \(10^{-2}\lesssim R/r_\mathrm {200c}\lesssim 10^{-1}\), where the shapes of the rescaled \(\varSigma (R)\) and \(M_\mathrm {2D}(<R)\) profiles are remarkably similar. Even MACS J0416 (Zitrin et al.