1 Introduction

In the present era of precision cosmology, various cosmological parameters are known at the percent level, while serious tension remains, in particular, concerning the value of the Hubble parameter [1]. As a next step, a similar precision is desired for galaxy clusters, to be called clusters from now on. Cluster masses can be several times \(10^{15}M_\odot \) and their size, several Mpc. A good understanding of suitable clusters may provide an additional grip on the nature of dark matter.

While interesting details can be derived from dynamical clusters such as the Bullet Cluster [2] and the Cosmic Train Wreck Cluster Abell 520 [3, 4], their structure is complicated and their analysis subject to questions. We shall focus here on clusters that are reasonably spherically symmetric, so that spherical symmetry is a good, and often employed, approximation.

Apart from being of interest in its own right, the study of clusters [5, 6] provides information on the dark matter versus modified Newton force dispute. In particular, the MOND theory [7] has achieved success for galaxies [8]. However, for fat clusters with \(M_{200}\sim 10^{15}M_\odot \), the gravitational acceleration starts out large in the center, and the Newton regime holds up to the MOND radius \(\sqrt{GM_{200}/a_0}\approx 1\) Mpc, given that \(a_0\approx 1.2\times 10^{-10}\) m/s\(^2\) [9]. In fact, self-gravitating isothermal spheres are unstable in Newtonian dynamics, hence they would expand to fill up their MOND radius, causing even high-acceleration systems like galaxy clusters to be affected by MOND in the sense that their size should correspond to their MOND radius [10]. Likewise, the observed velocity dispersion profile of Dragonfly 44 falsifies MOG at 5.5\(\sigma \) [11]. In short, modifications of gravity, such as MOND, but also EG [12, 13], MOG [14] and f(R), are under severe stress [15,16,17]. Like the related f(T) and f(RT) theories, they do not change matters appreciably inside the huge 1 Mpc domain, so that Zwicky’s conundrum (there must be dark matter or something alike) remains unsolved [16].

In principle, dark matter and modified gravity may co-exist, and this combination may actually be required to fully explain the properties of galaxy clusters like El Gordo [18]. In a MOND context, the best developed such proposal is the \(\nu \)HDM framework [19], in which galaxy clusters are explained together with the CMB anisotropies using 11 eV/\(c^2\) sterile neutrinos with the same overall density as the CDM in \(\Lambda \)CDM. This framework might also account for the Hubble tension [20].

Another issue of importance is to establish whether dark matter is self-interacting. Analysis of clusters puts forward a cross section-to-mass ratio \(\sigma /m\sim 1\,\mathrm{cm}^2/\mathrm{gr}\) [21, 22], although the question is not settled [23]. This large value excludes a lot of parameter space for various models of dark matter. Indeed, for MACHOs of Earth mass, the cross section would be comparable to the size of the Earth orbit (\(\pi \) AU\(^2\)), while in reality its cross section is \(\pi R_\oplus ^2\) with \(R_\oplus \) the Earth radius. Hence any type of 100% MACHO dark matter, even if consisting of axion stars or of (primordial) black holes, would be ruled out. Even dark matter particles heavier than, say, 0.4 GeV/\(c^2\), need mediators to establish the self-interaction [24].

Clusters are mostly dynamical, meaning that they are an aggregate of subclusters, which still need gigayears to reach equilibrium and form a (meta)stable cluster. For such situations, at best a reasonable description can be achieved. Two well-studied exceptions are the clusters Abell 1689 and Abell 1835. A1689 actually has one in-falling subcluster, far away from the center and of modest mass, so that including or excluding the quadrant in which it lies does not cause marked differences. A proper description of A1689 requires triaxiality [25], but the commonly employed spherical approximation actually functions rather well. A1835 looks even more symmetrical visually, and the spherical approximation works well, but see [26] for triaxiality in its X-ray gas and lensing.

The setup of this article is as follows. In Sect. 2 we present lensing and gas data for A1689 and A1835. In Sect. 3 we discuss models for their mass distributions. In Sect. 4 the fits to A1689 are discussed and in Sect. 5 those to A1835. In Sect. 6 we compare these fits and we close with a summary.

2 Data for Abell 1689 and Abell 1835

2.1 Strong lensing data for A1689

The galaxy cluster Abell 1689 lies at redshift \(z=0.183\) and acts as a strong lens, lensing many background galaxies into a number of up to 5 arclets, i.e., pieces of the ideal Einstein ring. From the observed strong lensing (SL) arclets, 2D mass maps have been generated using the program Lenstool [27], a strong lensing inversion algorithm. The code utilises the positions, magnitudes, shapes, multiplicity and spectroscopic redshifts for the multiply imaged background galaxies to derive the detailed mass distribution of the cluster. The overall mass distribution in cluster lenses is modeled in Lenstool as a superposition of smoother large-scale potentials and small scale substructure that is associated with the locations of bright, cluster member galaxies. Both potentials are described using parametric mass models. The parameters are adjusted in a Bayesian way, i.e., their posterior probability is probed with a MCMC sampler. This process allows an easy and reliable estimate of the errors on derived quantities such as the amplification maps and the mass maps.

This inversion is an underdetermined problem, so that an ensemble \(\mathcal{N}=1001\) of maps compatible with the data is produced. Integrated over the interior of circles around the cluster center, this yields data for \(M_{2d}(r)\), the mass inside a cylinder of projected radius r around the sightline to the cluster centre [17]. For each map m, these \(M_{2d}^{(m)}\) values are evaluated at radii \(r_n= r_1a^{n-1}\) with \(n=1,\ldots , 149\) and \(a=1.0388\), such that \((r_1,r_{149})=(3.15,879)\) kpc. Only \(N=117\) of these \(r_n\) contain physical data; the others are omitted. The ensemble averages \(M_{2d}(r_n)\) define the cylindrical mass density \({\overline{\Sigma }}{}_n=M_{2d}(r_n)/\pi r_n^2\), while their covariances \(\Gamma _{mn}\) also follow as averages over the maps [17]. The \({\overline{\Sigma }}{}_n\) data with error bars equal to \((\Gamma _{nn})^{1/2}\) are presented in Fig. 1, for the present cluster A1689 and for the later discussed cluster A1835. For small and large r, they show quite similar behavior, while at intermediate r, A1689 is denser than A1835.
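As an illustration of the bookkeeping described above, the following Python sketch builds the logarithmic grid and forms ensemble averages and covariances. The function names, the toy mass maps and the \(1/(\mathcal{N}-1)\) normalization of the covariance are assumptions made for the example; the actual maps come from Lenstool and are not reproduced here.

```python
import math
import random

def log_grid(r1, a, n_points):
    """Radii r_n = r1 * a**(n-1) for n = 1..n_points."""
    return [r1 * a ** k for k in range(n_points)]

def mean_and_covariance(samples):
    """Ensemble mean and sample covariance of a list of equally long profiles."""
    n_maps, n_r = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n_maps for i in range(n_r)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n_maps - 1)
            for j in range(n_r)] for i in range(n_r)]
    return mean, cov

radii = log_grid(3.15, 1.0388, 149)          # kpc, values quoted for A1689

# Toy stand-in for the mass maps (assumption): M_2d grows linearly with r,
# with 5% Gaussian scatter from map to map.
rng = random.Random(0)
toy_radii = radii[::30]                       # five radii keep the example small
toy_maps = [[1.0e13 * (r / 100.0) * (1.0 + 0.05 * rng.gauss(0.0, 1.0))
             for r in toy_radii] for _ in range(200)]

# Convert each map to Sigma_bar = M_2d / (pi r^2), then average over the ensemble.
sigma_bar_maps = [[m / (math.pi * r ** 2) for m, r in zip(row, toy_radii)]
                  for row in toy_maps]
sigma_bar_mean, gamma = mean_and_covariance(sigma_bar_maps)
```

With the rounded ratio \(a=1.0388\), the last grid point comes out close to the quoted 879 kpc.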

Fig. 1

Strong lensing data for the cylindrical mass density \({\overline{\Sigma }}\) as function of the radius for the clusters A1689 (upper, black) and A1835 (lower, blue). Both clusters behave similarly around their centers; A1689 has more mass around 100 kpc; the clusters behave similarly beyond 500 kpc

2.2 Regularization of the covariance matrix

We shall fit theoretical models for \({\overline{\Sigma }}(r)\) by minimizing

$$\begin{aligned}&\chi ^2({\overline{\Sigma }})=\sum _{i,j=1}^N [{\bar{\Sigma }}(r_i)-{\bar{\Sigma }}_i]C_{ ij}^{-1}[{\bar{\Sigma }}(r_j)-{\bar{\Sigma }}_j]. \end{aligned}$$
(1)
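A minimal sketch of how Eq. (1) may be evaluated without forming \(C^{-1}\) explicitly is given below. The use of a Cholesky factorization is an assumption of this example, not a statement about the code used for the fits; it requires a symmetric positive-definite \(C\).

```python
import math

def cholesky(c):
    """Lower-triangular L with L L^T = c, for symmetric positive-definite c."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def chi_squared(model, data, cov):
    """chi^2 = d^T C^{-1} d with d = model - data.

    Since C = L L^T, one has chi^2 = |y|^2 where L y = d,
    and y follows from forward substitution."""
    low = cholesky(cov)
    d = [m - x for m, x in zip(model, data)]
    y = []
    for i in range(len(d)):
        y.append((d[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return sum(v * v for v in y)
```

For a diagonal covariance this reduces to the familiar sum of squared residuals divided by the variances.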

In principle, one has \(C_{ij}=\Gamma _{ij}\). However, the matrix \(\Gamma \) has a big spread of eigenvalues, from \(4.4\times 10^{-15}\) to 0.216 \(\mathrm{gr}^2/\mathrm{cm}^4\). The near-degeneracies arise since the 2d maps are based on bins of which many are empty. The small eigenvalues are somewhat reflected in the small diagonal element around 120 kpc, see Fig. 1. But Eq. (1) with \(C=\Gamma \) is dominated by the very small eigenvalues, which are numerical artefacts. To regularize them, it is customary to employ a Tikhonov regularization that accounts for further scatter by adding a constant to the diagonal elements of \(\Gamma \) [28,29,30],

$$\begin{aligned} \mathbf{C}={\varvec{\Gamma }}+\sigma ^2_{SL} {\mathbf {1}}. \end{aligned}$$
(2)

In [29], where the data are binned, we take \(\sigma _{SL}=0.19\) gr/cm\(^2\) and in [30] 0.16 gr/cm\(^2\); the latter value is employed in Fig. 2. It is seen that for large r the Tikhonov-regularized \(C_{ ii}\) lie much above the empirical values \(\Gamma _{ ii}\), so that those data hardly play any role in the analysis. To acknowledge the decay of \({\overline{\Sigma }}\) as function of r, we add, instead of (2), a constant times \({\overline{\Sigma }}{}^2\) to the diagonal, so that we adopt the “poor man’s regularization”

$$\begin{aligned} \mathbf{C}={\varvec{\Gamma }}+\alpha ^2{\varvec{{\overline{\Sigma }}}}^2 \end{aligned}$$
(3)

with diagonal \(({\varvec{{\overline{\Sigma }}}})_{ ij}\equiv \delta _{ ij}{\overline{\Sigma }}_i\) and constant \(\alpha \). Writing this as

$$\begin{aligned}&\mathbf{{\tilde{C}}}={{\varvec{\tilde{\Gamma }}}}+\alpha ^2\,{\mathbf {1}}, \quad \mathbf{{\tilde{C}}}={\varvec{{\overline{\Sigma }}}}^{\,-1}{\varvec{C}}\,{\varvec{{\overline{\Sigma }}}}^{\,-1}, \nonumber \\&\quad {{\varvec{\tilde{\Gamma }}}}={\varvec{{\overline{\Sigma }}}}^{\,-1}{\varvec{\Gamma }}\,{\varvec{{\overline{\Sigma }}}}^{\,-1}, \end{aligned}$$
(4)

it is seen to generalize Eq. (2) for cases where \({\overline{\Sigma }}\) changes appreciably, as happens in cluster lensing. This regularization weighs the contributions to \(\chi ^2\) with basically equal weight for all \(r_i\). In Fig. 2 the lower (black) data present the \(\Gamma _{ ii}\). The red points (upper on the right side), present Eq. (2), while the blue ones (middle on the right), presenting Eq. (3), do more justice to the \(\Gamma _{ ii}\) data.
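The difference between Eqs. (2) and (3) can be made concrete with a small Python sketch. The three-point covariance matrix and the \({\overline{\Sigma }}\) values below are invented toy numbers; only \(\sigma _{SL}=0.16\) and \(\alpha ^2=0.001\) are taken from the text.

```python
def tikhonov(gamma, sigma_sl):
    """Eq. (2): add the constant sigma_sl**2 to every diagonal element."""
    n = len(gamma)
    return [[gamma[i][j] + (sigma_sl ** 2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def poor_mans(gamma, sigma_bar, alpha):
    """Eq. (3): add alpha**2 * sigma_bar_i**2 to diagonal element i."""
    n = len(gamma)
    return [[gamma[i][j] + ((alpha * sigma_bar[i]) ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def rescale(matrix, sigma_bar):
    """Eq. (4): M_ij -> M_ij / (sigma_bar_i * sigma_bar_j)."""
    n = len(matrix)
    return [[matrix[i][j] / (sigma_bar[i] * sigma_bar[j]) for j in range(n)]
            for i in range(n)]

# Toy inputs (assumptions): a decaying profile and a small covariance matrix.
sigma_bar = [2.0, 1.0, 0.2]
gamma = [[0.04, 0.01, 0.0],
         [0.01, 0.01, 0.001],
         [0.0, 0.001, 0.0001]]
alpha = 0.001 ** 0.5

gamma_tilde = rescale(gamma, sigma_bar)
c_tilde_poor = rescale(poor_mans(gamma, sigma_bar, alpha), sigma_bar)
c_tilde_tikh = rescale(tikhonov(gamma, 0.16), sigma_bar)

# Relative variance added at each radius by the two schemes.
added_poor = [c_tilde_poor[i][i] - gamma_tilde[i][i] for i in range(3)]
added_tikh = [c_tilde_tikh[i][i] - gamma_tilde[i][i] for i in range(3)]
```

In the rescaled variables, scheme (3) adds the same relative variance \(\alpha ^2\) at every radius, whereas the relative variance added by scheme (2) grows as \(1/{\overline{\Sigma }}{}_i^2\) where the profile decays.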

Fig. 2

Lower data (black): empirical values \(\Gamma _{ ii}\) of the \({\overline{\Sigma }}{}_i\) covariances, versus \(r_i\). The middle-upper data (red) show the Tikhonov regularized \(C_{ ii}\) with \(\sigma _{SL}=0.16\, \,\mathrm{gr}/\mathrm{cm}^2\). We employ the “poor man’s regularization” (3) for the \(C_{ ii}\) for \(\alpha ^2=0.001\) (in blue), which does more justice to the \(\Gamma _{ ii}\) data

2.3 The transversal shear

The transversal shear is defined as

$$\begin{aligned} g_t(r)=\frac{{\overline{\Sigma }}(r)-\Sigma (r)}{\Sigma _c-\Sigma (r)} , \end{aligned}$$
(5)

where \({\overline{\Sigma }}\) was introduced above, \(\Sigma _c\) is a constant, and \(\Sigma \) is the line-of-sight density (projected density, 2d density) at transversal distance r from the center,

$$\begin{aligned} \Sigma (r)=\int \limits _{-\infty }^\infty \mathrm{d}z\,\rho \big (\sqrt{r^2+z^2}\big ). \end{aligned}$$
(6)

Weak lensing (WL) data for \(\Sigma \) and for \(g_t\) with \(\Sigma _c=0.6815 \, \mathrm{gr}/\mathrm{cm}^2\) in A1689 have been reported by Umetsu et al. [31] and employed by us [30]. They are represented in Figs. 5 and 8, respectively. Since here all bins were filled, their covariance matrices are well behaved.

2.4 Mass profile of the X-ray gas

The mass density of the X-ray gas follows from the observation of the electron density. Recent data at \(r<900\) kpc have been presented in [17], which fit well to a cored Sérsic profile [30]

$$\begin{aligned} n_S(r)=n_e^0\exp \left[ k_g-k_g\left( 1+\frac{r^2}{R_g^{2}}\right) ^{1/(2n_g)}\right] . \end{aligned}$$
(7)

Data for \(r>1\) Mpc, taken from Planck-ROSAT [32], fit to

$$\begin{aligned} n_T=\frac{d_t^2n_e^0}{r^2 + R_t^2}, \qquad \rho _{g,T}={{\overline{m}}}_N n_T=\frac{ \sigma _g^2}{2\pi G(r^2+R_t^2)}.\nonumber \\ \end{aligned}$$
(8)

These behaviors combine into the global fit

$$\begin{aligned} n_e(r)=\big [n_S^{s_t}(r)+n_T^{s_t}(r) \big ]^{1/s_t}. \end{aligned}$$
(9)

so that \(n_e(0)=n_e^0\). The best fit parameters are

$$\begin{aligned}&n_e^0 =0.04376 \pm 0.00098\,\mathrm{cm}^{-3}, \qquad k_g=2.06 \pm 0.12 , \nonumber \\&R_g =21.8 \pm 1.4 \, \mathrm{kpc},\qquad n_g =3.044 \pm 0.062 , \nonumber \\&d_t =81.6 \pm 1.6\,\mathrm{kpc}, \qquad \sigma _g =476.5\pm 7.7\, \mathrm{km}/\mathrm{s},\nonumber \\&R_t = 718 \pm 108 \,\mathrm{kpc},\qquad s_t =8.4 \pm 2.7. \end{aligned}$$
(10)

For a typical \(Z=0.3\) in units of Solar metallicity [33, 34], the mass density of the X-ray gas reads

$$\begin{aligned} \rho _g(r)=\alpha m_H n_e(r) \end{aligned}$$
(11)

where \(\alpha \approx 15/13\) is the average number of nucleons per proton, \(m_H\) is the mass of a neutral hydrogen atom, and \(n_{e}(r)\) is the electron number density.
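For orientation, Eqs. (7)–(11) can be evaluated with the central values of Eq. (10) as in the following sketch. The hydrogen mass in grams is a standard value inserted here, not a number from the text, and the parameter uncertainties are ignored.

```python
import math

# Central best-fit values of Eq. (10)
N_E0 = 0.04376      # cm^-3
K_G = 2.06
R_G = 21.8          # kpc
N_G = 3.044
D_T = 81.6          # kpc
R_T = 718.0         # kpc
S_T = 8.4
NUCLEONS = 15.0 / 13.0   # the factor alpha of Eq. (11)
M_H = 1.6735e-24         # g, hydrogen atom mass (standard value, assumption)

def n_sersic(r):
    """Cored Sersic profile of Eq. (7); r in kpc, result in cm^-3."""
    return N_E0 * math.exp(K_G - K_G * (1.0 + r ** 2 / R_G ** 2) ** (1.0 / (2.0 * N_G)))

def n_tail(r):
    """Isothermal-like tail of Eq. (8); r in kpc, result in cm^-3."""
    return D_T ** 2 * N_E0 / (r ** 2 + R_T ** 2)

def n_electron(r):
    """Global fit of Eq. (9): smooth interpolation between the two regimes."""
    return (n_sersic(r) ** S_T + n_tail(r) ** S_T) ** (1.0 / S_T)

def rho_gas(r):
    """Gas mass density of Eq. (11) in g/cm^3."""
    return NUCLEONS * M_H * n_electron(r)
```

Because \(s_t\) is large, the combination (9) essentially selects the larger of the two profiles: the Sérsic part at small r and the tail at large r.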

2.5 Generating \(\Sigma \) data by numerical differentiation

It is useful to derive data for \(\Sigma \), which can be obtained from the SL data for \({\overline{\Sigma }}\). We start from the relation

$$\begin{aligned} \Sigma (r)= {\overline{\Sigma }}(r)+\frac{1}{2}\frac{\mathrm{d}{\overline{\Sigma }}(r)}{\mathrm{d}\log r} , \end{aligned}$$
(12)

which follows from the relation between the cylindrical mass density \({\overline{\Sigma }}\) and the line-of-sight mass density \(\Sigma \),

$$\begin{aligned} {\overline{\Sigma }}(r)= & {} \frac{M_{2d}(r)}{\pi r^2} =\frac{2}{r^2}\int _0^r\mathrm{d}u\,u\Sigma (u) , \end{aligned}$$
(13)

where \(M_{2d}\) is the mass in a cylinder of radius r around the cluster center. By taking \(\mathrm{d}{\overline{\Sigma }}\) and \(\mathrm{d}\log r\) from 58 pairs of adjacent data points, we work with the numerical derivative

$$\begin{aligned} \frac{\Delta {\overline{\Sigma }}(r)}{\Delta \log r} =\frac{{\overline{\Sigma }}{}_{2n}^{(1)}-{\overline{\Sigma }}{}_{2n-1}^{(1)}}{\log r_{2n}^{(1)}-\log r_{2n-1}^{(1)}},\quad \end{aligned}$$
(14)

to be considered at the geometrical average position,

$$\begin{aligned} r_n^{(2)}=\sqrt{r_{2n-1}^{(1)}r_{2n}^{(1)}}. \end{aligned}$$
(15)

Here \({\overline{\Sigma }}{}_{n}^{(1)}\) and \(r_{n}^{(1)}\) denote the original unbinned data (having bin size \(b=1\)). The data for \(\Sigma \) follow from these as

$$\begin{aligned}&\Sigma _n^{(2)}={\overline{\Sigma }}{}^{(2)}_n +\frac{{\overline{\Sigma }}{}_{2n}^{(1)}-{\overline{\Sigma }}{}_{2n-1}^{(1)}}{2\log r_{2n}^{(1)}/r_{2n-1}^{(1)}},\nonumber \\&\quad {\overline{\Sigma }}{}^{(2)}_n\equiv \frac{{\overline{\Sigma }}{}_{2n-1}^{(1)}+{\overline{\Sigma }}{}_{2n}^{(1)}}{2}. \end{aligned}$$
(16)

In these definitions, the superscript denotes that two \({\overline{\Sigma }}\) data points are employed for each \(\Sigma \) value. The covariance matrix of the \(\Sigma \), to be denoted as \(\Gamma (\Sigma ^{(2)})\), follows by the rule (16) from the covariance matrix \(\Gamma ({\overline{\Sigma }})\equiv \Gamma \). It has to be regularized as well. In analogy with (3) we take

$$\begin{aligned} C_{ mn}(\Sigma ^{(2)})=\Gamma _{ mn}(\Sigma ^{(2)})+\alpha ^2\delta _{ mn}(\Sigma _n^{(2)})^2. \end{aligned}$$
(17)
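The pairwise differentiation of Eqs. (14)–(16) can be sketched in Python as follows. The test profile \({\overline{\Sigma }}\propto 1/r\), for which Eq. (12) gives \(\Sigma ={\overline{\Sigma }}/2\) exactly, and the choice of the first 117 grid points are assumptions of the example.

```python
import math

def sigma_from_pairs(r, sigma_bar):
    """Apply Eqs. (15)-(16) to the consecutive pairs (1,2), (3,4), ...

    Returns the geometric-mean radii, the pair-averaged Sigma_bar and Sigma.
    A leftover unpaired last point is ignored."""
    r_mid, sb_mid, sig = [], [], []
    for k in range(0, len(r) - 1, 2):
        r_lo, r_hi = r[k], r[k + 1]
        s_lo, s_hi = sigma_bar[k], sigma_bar[k + 1]
        mean = 0.5 * (s_lo + s_hi)
        r_mid.append(math.sqrt(r_lo * r_hi))
        sb_mid.append(mean)
        # Sigma = Sigma_bar + (1/2) dSigma_bar / dlog r, Eq. (12)
        sig.append(mean + (s_hi - s_lo) / (2.0 * math.log(r_hi / r_lo)))
    return r_mid, sb_mid, sig

# Toy check on a 117-point logarithmic grid with Sigma_bar = 1/r.
grid = [3.15 * 1.0388 ** k for k in range(117)]
r2, sb2, sig2 = sigma_from_pairs(grid, [1.0 / x for x in grid])
```

For noise-free input on this grid the finite-difference result agrees with the exact \(\Sigma \) to better than one part in a thousand; the scatter discussed next is therefore a property of the data, not of the differencing rule.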

It turns out that several of the \({\overline{\Sigma }}{}_n^{(2)}-\Sigma _n^{(2)}\) are negative, an unphysical effect arising from noise in the data and/or lack of perfect sphericity. This can be overcome by first binning the data in bins of \(b=3\) points. For general b one has

$$\begin{aligned} {\overline{\Sigma }}{}_n^{(b)}= \frac{1}{b}\sum _{k=0}^{b-1}{\overline{\Sigma }}{}_{nb-k}^{(1)}. \end{aligned}$$
(18)

The binned position is located at

$$\begin{aligned} r_n^{(b)}=\Big (\prod _{k=0}^{b-1}r_{bn-k}^{(1)}\Big )^{1/b}. \end{aligned}$$
(19)

After the binning, the quantities are defined as in (16),

$$\begin{aligned}&\Sigma _n^{(2b)}={\overline{\Sigma }}{}_n^{(2b)}+\frac{{\overline{\Sigma }}{}_{2n}^{(b)}-{\overline{\Sigma }}{}_{2n-1}^{(b)}}{2 \log r_{2n}^{(b)}/r_{2n-1}^{(b)}}, \nonumber \\&\quad {\overline{\Sigma }}{}_n^{(2b)}=\frac{{\overline{\Sigma }}{}_{2n-1}^{(b)}+{\overline{\Sigma }}{}_{2n}^{(b)}}{2}, \end{aligned}$$
(20)

and located at the binned position

$$\begin{aligned} r_n^{(2b)}=\sqrt{r_{2n-1}^{(b)} r_{2n}^{(b)}}. \end{aligned}$$
(21)
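A short Python sketch of the binning of Eqs. (18)–(19) followed by the pairing of Eqs. (20)–(21) is given below. The 117-point toy grid with \({\overline{\Sigma }}=1/r\) is an assumption; with \(b=3\) it yields 39 bins and hence 19 pairs.

```python
import math

def bin_profile(r, sigma_bar, b):
    """Eqs. (18)-(19): arithmetic mean of Sigma_bar and geometric mean of r
    over consecutive groups of b points; an incomplete last group is dropped."""
    r_b, s_b = [], []
    for n in range(len(r) // b):
        chunk = slice(n * b, (n + 1) * b)
        s_b.append(sum(sigma_bar[chunk]) / b)
        r_b.append(math.exp(sum(math.log(x) for x in r[chunk]) / b))
    return r_b, s_b

def pair_up(r_b, s_b):
    """Eqs. (20)-(21) applied to consecutive pairs of bins."""
    r_out, sb_out, sig_out = [], [], []
    for k in range(0, len(r_b) - 1, 2):
        mean = 0.5 * (s_b[k] + s_b[k + 1])
        slope = (s_b[k + 1] - s_b[k]) / (2.0 * math.log(r_b[k + 1] / r_b[k]))
        r_out.append(math.sqrt(r_b[k] * r_b[k + 1]))
        sb_out.append(mean)
        sig_out.append(mean + slope)
    return r_out, sb_out, sig_out

# Toy input: 117 points on the logarithmic grid, Sigma_bar = 1/r.
r1 = [3.15 * 1.0388 ** k for k in range(117)]
sb1 = [1.0 / x for x in r1]
r3, sb3 = bin_profile(r1, sb1, 3)
r6, sb6, sig6 = pair_up(r3, sb3)
```

On a geometric grid the binned position of Eq. (19) coincides with the middle point of each bin of three.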

With the same rules taken bilinearly, the correlator \(\Gamma (\Sigma ^{(2b)})\) follows from \(\Gamma ({\overline{\Sigma }})\), and its regularization \(C(\Sigma ^{(2b)})\) involves \(\Sigma _i^{(2b)}\) and the common value \(\alpha =\sqrt{0.001}=0.032\).

We bin the data in bins of \(b=3\); this produces 19 pairs of data points from which \(\Sigma \) is determined. Likewise, we produce the related 19 data points for the combinations \({\overline{\Sigma }}-\Sigma \) and \(2\Sigma -{\overline{\Sigma }}\).

Their covariance matrices are constructed along similar lines, with a common value of \(\alpha \). Like the \(\Sigma \), these observables are made up by combining two adjacent bins,

$$\begin{aligned} O_i=\lambda _{2i-1}^{(b)}{\overline{\Sigma }}{}_{2i-1}^{(b)}+\lambda _{2i}^{(b)}{\overline{\Sigma }}{}_{2i}^{(b)}, \end{aligned}$$
(22)

with coefficients determined by (18) and (20). Their covariances,

$$\begin{aligned} \Gamma _{ ij}(O)=\sum _{s,s'=0}^1 \lambda ^{(b)}_{2i-s} \lambda ^{(b)}_{2j-s'} \,\Gamma ({\overline{\Sigma }}^{(b)})_{ 2i-s,2j-s'}, \end{aligned}$$
(23)

inherit small eigenvalues from \({\varvec{\Gamma }}({\overline{\Sigma }}) \), so regularization is again needed. We take \(\mathbf{C}(O)={\varvec{\Gamma }}(O)+\alpha ^2\mathbf{O}^2\), taking a common \(\alpha \) for \(O=\Sigma \), \({\overline{\Sigma }}-\Sigma \) and \(2\Sigma -{\overline{\Sigma }}\).
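The linear propagation of Eqs. (22)–(23) may be sketched as follows. The explicit weights are read off from Eqs. (18) and (20) under the assumption that each observable uses exactly the two bins of its pair; the function and key names are hypothetical.

```python
import math

def pair_weights(r_b, kind):
    """Weights (lambda_{2i-1}, lambda_{2i}) of Eq. (22) for each pair of bins."""
    weights = []
    for k in range(0, len(r_b) - 1, 2):
        inv = 1.0 / math.log(r_b[k + 1] / r_b[k])
        if kind == "sigma":                          # Sigma, Eq. (20)
            weights.append((0.5 - 0.5 * inv, 0.5 + 0.5 * inv))
        elif kind == "sigma_bar_minus_sigma":        # Sigma_bar - Sigma
            weights.append((-0.5 * inv, 0.5 * inv) if False else (0.5 * inv, -0.5 * inv))
        elif kind == "two_sigma_minus_sigma_bar":    # 2 Sigma - Sigma_bar
            weights.append((0.5 - inv, 0.5 + inv))
        else:
            raise ValueError("unknown observable: " + kind)
    return weights

def propagate(weights, values, gamma):
    """Observables from Eq. (22) and their covariance from Eq. (23)."""
    n = len(weights)
    obs = [w[0] * values[2 * i] + w[1] * values[2 * i + 1]
           for i, w in enumerate(weights)]
    cov = [[sum(weights[i][s] * weights[j][t] * gamma[2 * i + s][2 * j + t]
                for s in (0, 1) for t in (0, 1))
            for j in range(n)] for i in range(n)]
    return obs, cov

def regularize(cov, obs, alpha):
    """C(O) = Gamma(O) + alpha^2 O^2 on the diagonal."""
    n = len(cov)
    return [[cov[i][j] + ((alpha * obs[i]) ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
```

By construction the weights for \(\Sigma \) and \({\overline{\Sigma }}-\Sigma \) add up to \((\tfrac{1}{2},\tfrac{1}{2})\), the weights of the pair-averaged \({\overline{\Sigma }}\).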

It is our philosophy to use all information available, and in particular, not to skip the data for, say, \(r<300\) kpc. While such a restricted data set would involve small scatter also in the differentiated data, its fit would be more strongly driven by the regularization, see Figs. 2 and 3. The global outcome of such analyses is that the present fits still work, though less decisively, while other fits would be less strongly ruled out, or even comparably acceptable.

2.6 Data for A1835

The galaxy cluster Abell 1835 lies at redshift 0.2532 and shares many characteristics with Abell 1689. In Fig. 1 it is seen that it has a similar mass in the center, less mass around 100 kpc, and quite similar mass beyond 500 kpc.

Strong lensing and X-ray data for A1835 were presented recently by us [17]. Our routine generates mass maps as in A1689, at the radii \(r_n= r_1a^{n-1}\) for \(n=1,\cdots , 149\). This involves the same ratio a but a different \(r_1\), such that \((r_1,r_{149})=(4.027, 1120)\) kpc. Again 117 points contain physical information. We produced 1001 mass maps \(M_{2d}^{(m)}\), which are averaged to yield data for \({\overline{\Sigma }}\).

In A1835 the covariance matrix for \({\overline{\Sigma }}\) encounters the same problem as in A1689: there are very small eigenvalues (they range from 4.4 \(\times 10^{-15}\) to 0.22 gr\(^2\)/cm\(^4\)) and the diagonal elements have a minimum at 187 kpc, see Fig. 3. Instead of the Tikhonov regularization (2), we again adopt the “poor man’s regularization” (3), here with \(\alpha =0.05\). The elements \(\Gamma _{ ii}\) and \(C_{ ii}\) are presented in Fig. 3. The values of \(C_{ ii}/{\overline{\Sigma }}{}_i^2\) vary by a factor of 9.3. Making \(\alpha \) still smaller would enhance this ratio and induce an overfitting of the data around 187 kpc.

As before, we also construct 19 data points for \(\Sigma \), \({\overline{\Sigma }}-\Sigma \) and \(2\Sigma -{\overline{\Sigma }}\) from the \({\overline{\Sigma }}\) data in A1835.

Fig. 3

The empirical covariance elements \(\Gamma _{ ii}\) as function of the \(r_i\) (lower data, black). The Tikhonov regularization (2) with \(\sigma _{SL}=0.03\) gr/cm\(^2\) (red) exceeds the data strongly at large r. The “poor man’s regularization” (3) with \(\alpha _{SL}=0.01\) (blue) does more justice to the large-r data

The X-ray data for the electron density have been presented by us as well [29]. We found that the following profile explains the data well,

$$\begin{aligned}&\rho _{ g}(r)={{\overline{m}}}_Nn_e(r),\nonumber \\&\quad n_e(r)=\frac{(1+r^2/R_0^2)\, n_e^0}{ (1 + r^2/R_1^2) (1 + r^2/R_2^2) } . \end{aligned}$$
(24)

The best fit parameters are

$$\begin{aligned} n_e^0= & {} 0.0927\pm 0.0070 \, \mathrm{cm}^{-3},\quad R_0 = 91 \pm 13 \, \mathrm{kpc}, \nonumber \\ R_1= & {} 31.8\pm 2.9 \, \mathrm{kpc},\quad R_2 = 169 \pm 15 \, \mathrm{kpc}. \end{aligned}$$
(25)

In these clusters, the gas mass density only becomes significant at large r, because the galaxies are dominant at small r.

3 Theoretical mass profiles

3.1 Generalities

A celebrated profile in astrophysics is the Sérsic profile for the line-of-sight (2d) luminosity of galaxies, which has the form of a stretched exponential, see Eq. (26) below. In cosmology, the most popular mass profile is the so-called NFW profile (see Eq. (30) below), which has a cusp at the origin and falls off as a power law [35]. With further coauthors, the NFW authors observe in dark-matter-only simulations that the stretched exponential profile describes the 3d mass density better than the NFW profile [36]. As exposed in their Fig. 6, the authors find good fits for families of dwarf galaxies, families of galaxies and families of galaxy clusters, with mass densities ranging from \(\times 10^4M_\odot /\mathrm{kpc}^3\) to \(10^8M_\odot /\mathrm{kpc}^3\). The scales range from 0.2 kpc for dwarf galaxies, to several hundred kpc for galaxies, up to 1.5 Mpc for fat galaxy clusters, respectively. The mean Sérsic n values are 3.0 for dwarf- and galaxy-sized halos and 2.4 for cluster-sized halos, similar to the values that characterize luminous elliptical galaxies [36]. We consider two mass profiles: the double stretched exponential profile (DSE) and thermal fermions.

3.2 The stretched exponential BCG mass profile

A stretched exponential profile has three parameters, the amplitude \(\rho _0\), the scale R, and the index n,

$$\begin{aligned} \rho _{ SE}=\rho _0 \exp [{-(r/R)^{1/n}}]. \end{aligned}$$
(26)

It has total mass

$$\begin{aligned} M=\frac{4\pi }{3}\Gamma (1+3n) \rho _0R^3, \end{aligned}$$
(27)

where \(\Gamma (n)\) is the standard generalization of \((n-1)!\). It corresponds to a central line-of-sight mass density

$$\begin{aligned} S\equiv \Sigma (0)={\overline{\Sigma }}(0)=2\Gamma (1+n)\rho _0 R =\frac{\Gamma (n) M}{2\pi \Gamma (3n) R^2}.\nonumber \\ \end{aligned}$$
(28)
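The closed forms (27) and (28) can be cross-checked against direct numerical integration of Eq. (26), as in the sketch below. The midpoint rule, the cutoff radius and the step count are choices made for this example only, with units \(\rho _0=R=1\).

```python
import math

def rho_se(r, rho0, scale, n):
    """Stretched exponential density of Eq. (26)."""
    return rho0 * math.exp(-(r / scale) ** (1.0 / n))

def total_mass(rho0, scale, n):
    """Closed form of Eq. (27)."""
    return 4.0 * math.pi / 3.0 * math.gamma(1.0 + 3.0 * n) * rho0 * scale ** 3

def central_sigma(rho0, scale, n):
    """Closed form of Eq. (28): S = 2 Gamma(1+n) rho0 R."""
    return 2.0 * math.gamma(1.0 + n) * rho0 * scale

def midpoint(f, upper, steps):
    """Midpoint-rule integral of f from 0 to upper."""
    h = upper / steps
    return h * sum(f((k + 0.5) * h) for k in range(steps))

n_index = 2.0
mass_numeric = 4.0 * math.pi * midpoint(
    lambda r: r * r * rho_se(r, 1.0, 1.0, n_index), 3000.0, 150000)
sigma_numeric = 2.0 * midpoint(
    lambda z: rho_se(z, 1.0, 1.0, n_index), 3000.0, 150000)
```

For \(n=2\) the closed forms give \(M=960\pi \rho _0R^3\) and \(S=4\rho _0R\), and the second form of Eq. (28) returns the same \(S\) from \(M\).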

3.3 Double stretched exponential profile

The stretched exponential is an interesting candidate to model the combined mass density of the dark matter and the galaxies. For this aim, one assumes a sort of equilibration between them. To put it bluntly, for this profile one works with the adage “where there are stars, there cannot be dark matter”.

Since the central, brightest cluster galaxy is much heavier than the cluster halo extrapolated towards the center, we adopt an additional stretched exponential for it and arrive at the double stretched exponential profile (DSE). For the modeling of clusters, we thus assume a stretched exponential for the total cluster (halo, h), and an additional one for the brightest cluster galaxy (BCG, b). Incorporating the gas, the total mass density reads

$$\begin{aligned} \rho =\rho _be^{-(r/R_b)^{1/n_b}}+\rho _he^{-(r/R_h)^{1/n_h}} +\rho _{ g}(r) . \end{aligned}$$
(29)

In the central regions, \(\rho _{ g}\) is much smaller than the other two, and hence irrelevant. While \(\rho _h\) decays exponentially fast to zero for r beyond \(R_h\), \(\rho _{ g}\) only decays as a power law, so it assures a power law decay of the total density.

3.4 NFW profiles for the halo

A popular profile is the NFW profile,

$$\begin{aligned} \rho _{ NFW}= \frac{ A}{ (r/R) (1+r/R)^{2}} , \end{aligned}$$
(30)

and its generalization the gNFW profile (any n),

$$\begin{aligned} \rho _{ gNFW}= \frac{ A}{ (r/R)^{n} (1+r/R)^{3 - n}} . \end{aligned}$$
(31)

This often employed profile was first inferred from dark-matter-only simulations, and it is often supposed to hold when the baryon density is included. This has the benefit of very few fit parameters (2 for NFW, 3 for gNFW). A drawback is then that it does not provide a handle on the matter density of the galaxies.
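For reference, Eqs. (30) and (31) translate into the following short Python functions; the argument names are hypothetical and the amplitude and scale are left in arbitrary units.

```python
def rho_nfw(r, amp, scale):
    """NFW profile of Eq. (30): inner slope -1, outer slope -3."""
    x = r / scale
    return amp / (x * (1.0 + x) ** 2)

def rho_gnfw(r, amp, scale, n):
    """gNFW profile of Eq. (31): inner slope -n, outer slope -3."""
    x = r / scale
    return amp / (x ** n * (1.0 + x) ** (3.0 - n))
```

Setting \(n=1\) in the gNFW form recovers the NFW profile, and doubling r far outside R lowers either density by nearly a factor 8.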

Putting things together, we have a stretched exponential for the BCG, a (g)NFW for the halo, on top of the gas density, viz.

$$\begin{aligned} \rho (r)=\rho _be^{-(r/R_b)^{1/n_b}}+\rho _{\mathrm{(g)NFW}}(r)+\rho _{ g}(r). \end{aligned}$$
(32)

3.5 Thermal fermionic dark matter

In 2009 we found the first indications that thermal fermions provide a good fit for lensing data of the cluster Abell 1689 [34]. Our followup studies have confirmed this [17, 29, 30]. Here we subject this profile to a more stringent test.

Consider g thermal fermion species of mass m at temperature \(T=m\sigma ^2\), where \(\sigma \) is the velocity dispersion and \(\mu \) the chemical potential per unit mass, in the local gravitational potential \(\varphi (r)\). Their mass density reads

$$\begin{aligned} \!\! \rho _\nu (r)= & {} \!\!\int \! \!\frac{\mathrm{d}^3p}{(2\pi \hbar )^3}\frac{gm}{\exp \{[p^2/2m+m\varphi (r)-m\mu ]/T \}+1} \nonumber \\= & {} gm\Big (\frac{m\sigma }{\sqrt{2\pi }\hbar }\Big )^3\mathcal{L}i_{3/2}\Big [\frac{\mu -\varphi (r)}{\sigma ^2}\Big ]. \end{aligned}$$
(33)

The index \(\nu \) expresses that a possible realization lies in sterile neutrinos, as suggested by [19, 20] in the context of MOND [7], for which also arguments from the El Gordo cluster were given [37].

The function \(\mathcal{L}\, \! i_\alpha \) is in general defined as

$$\begin{aligned} \mathcal{L} \, \! i_\alpha (x)= & {} \frac{1}{\Gamma (\alpha )}\int _0^\infty \frac{\mathrm{d}y \, y^{\alpha -1}}{e^{y-x}+1} \qquad \nonumber \\= & {} -\mathrm{Li}_{\alpha }[-e^x]=\sum _{k=1}^\infty \frac{(-1)^{k-1}}{k^{\alpha }}e^{kx} \end{aligned}$$
(34)

with \(\Gamma (\alpha )\) Euler’s Gamma function and Li\(_\alpha \)(y) the standard polylogarithm. For \(\mathfrak {R}(\alpha )>0\) the integral is well defined at all real x. The sum converges for \(x\le 0\), so that \(\mathcal{L}\, \! i_\alpha (x)\approx e^x\) for \(x\ll -1\). A better approximation is

$$\begin{aligned} \mathcal{L}\, \! i_\alpha (x)\approx \frac{e^x}{1+\frac{1}{2^\alpha } e^x} \qquad (x\ll -1). \end{aligned}$$
(35)

Exact up to corrections of order \(e^{6x}\) is the Padé approximant

$$\begin{aligned}&\mathcal{L}\, \! i_\alpha (x)\nonumber \\&\quad \approx \frac{e^x+(\frac{2^\alpha }{3^\alpha }-\frac{1}{2^\alpha })e^{2x} + \frac{1}{2^\alpha }e^{3x}+( \frac{2^\alpha }{5^\alpha }+\frac{2^\alpha }{9^\alpha }-\frac{2}{4^\alpha })e^{4x}}{1+\frac{2^\alpha }{3^\alpha }e^x + \frac{1}{2^\alpha }e^{2x}+\frac{2^\alpha }{5^\alpha }e^{3x}}.\nonumber \\ \end{aligned}$$
(36)

For any \(\alpha >0\) the coefficients are positive. In our case \(\alpha =\frac{3}{2}\), Eq. (36) takes at \(x=0\) the value 0.768095, close to the exact value \((1-\frac{1}{2^{1/2}})\zeta _{3/2}=0.765147.\)
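The approximations (35) and (36) are easily compared with the series of Eq. (34), as in the Python sketch below. The function names and the truncation of the series at 200 terms are choices of this example; the series is used only for \(x<0\), where it converges rapidly.

```python
import math

def fermi_series(x, alpha, terms=200):
    """Truncated series of Eq. (34); intended for x < 0."""
    return sum((-1) ** (k - 1) * math.exp(k * x) / k ** alpha
               for k in range(1, terms + 1))

def fermi_simple(x, alpha):
    """Approximation of Eq. (35)."""
    z = math.exp(x)
    return z / (1.0 + z / 2.0 ** alpha)

def fermi_pade(x, alpha):
    """Pade-type approximant of Eq. (36)."""
    z = math.exp(x)
    c2 = 2.0 ** -alpha            # 1 / 2^alpha
    c23 = (2.0 / 3.0) ** alpha    # 2^alpha / 3^alpha
    c25 = (2.0 / 5.0) ** alpha    # 2^alpha / 5^alpha
    c29 = (2.0 / 9.0) ** alpha    # 2^alpha / 9^alpha
    num = z + (c23 - c2) * z ** 2 + c2 * z ** 3 \
        + (c25 + c29 - 2.0 * 4.0 ** -alpha) * z ** 4
    den = 1.0 + c23 * z + c2 * z ** 2 + c25 * z ** 3
    return num / den

value_at_zero = fermi_pade(0.0, 1.5)   # to be compared with 0.768095 in the text
```

At \(x=-1\) and \(\alpha =3/2\) the approximant (36) lies markedly closer to the series than Eq. (35) does.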

In this approach, the baryonic mass has to be specified. Various components contribute to the baryonic matter: the brightest cluster galaxy, the other (“halo”) galaxies, globular clusters, cold gas clouds, the X-ray gas, etc. A fit to data for the X-ray gas has been discussed above. For the brightest cluster galaxy (BCG, “b”) we adopt the previous stretched exponential form

$$\begin{aligned} \rho _{{ b}}(r)= & {} \frac{1}{\Gamma (1+3 n_{ b})} \frac{3M_{ b}}{4\pi R_{ b}^3} \exp [{-(r/R_{ b})^{1/n_{ b}}}]. \end{aligned}$$
(37)

All other parts are lumped into the term “mass density of galaxies” (G). An adequate profile with total mass \(M_{ G}\), inner scale \(R_c\) and outer scale \(R_g\) is [38]

$$\begin{aligned} \rho _{ G}(r)= & {} \frac{(R_c + R_g)M_{ G}}{2 \pi ^2 (r^2 + R_c^2) (r^2 + R_g^2)} , \end{aligned}$$
(38)

These components model the total mass density of galaxies \(\rho _{{ b}}(r)+ \rho _{ G}(r)\) which has at \(r=0\) the property

$$\begin{aligned} \Sigma _{{ b}}(0)+ \Sigma _{ G}(0)= \frac{\Gamma (n_{ b})}{ \Gamma (3 n_{ b})}\frac{M_{ b}}{2\pi R_{ b}^2}+\frac{ M_{ G}}{ 2\pi R_c R_g}. \end{aligned}$$
(39)

The gravitational potential \(\varphi \), which enters \(\rho _\nu \) in Eq. (33), is solved self-consistently from the Poisson equation

$$\begin{aligned} \varphi ''+\frac{2}{r}\varphi '=4\pi G\rho , \qquad \rho =\rho _b+\rho _G+\rho _g+\rho _\nu .\nonumber \\ \end{aligned}$$
(40)

3.6 Lensing observables

We focus on the line-of-sight mass density (2d-density)

$$\begin{aligned} \Sigma (r)=\int _{-\infty }^\infty \mathrm{d}z\,\rho (\sqrt{r^2+z^2}) =\int _r^\infty \mathrm{d}u\,\frac{2u\rho (u )}{\sqrt{u^2-r^2}} ,\nonumber \\ \end{aligned}$$
(41)

and the cylindrical mass density

$$\begin{aligned}&{\overline{\Sigma }}(r)=\frac{1}{\pi r^2}M_{2d}(r)=\frac{2}{ r^2}\int _0^r\mathrm{d}s \, s\Sigma (s)\nonumber \\&= \int _0^r\mathrm{d}u\,\frac{4u^2}{r^2}\rho (u )+\nonumber \\&\quad \int _r^\infty \mathrm{d}u\,\frac{4u\rho (u)}{u+\sqrt{u^2-r^2}}. \end{aligned}$$
(42)
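The projections (41) and (42), together with the shear (5), can be evaluated numerically as sketched below. The substitution \(u=\sqrt{r^2+z^2}\) in the outer integral, the midpoint rule, and the test density \(\rho =(1+r^2)^{-5/2}\) (for which \(\Sigma =\tfrac{4}{3}(1+r^2)^{-2}\) and \({\overline{\Sigma }}=\tfrac{4}{3}(1+r^2)^{-1}\)) are assumptions of the example.

```python
import math

def midpoint(f, lower, upper, steps):
    """Midpoint-rule integral of f over [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (k + 0.5) * h) for k in range(steps))

def sigma(rho, r, z_max, steps=20000):
    """Eq. (41), first form: line-of-sight integral, truncated at |z| = z_max."""
    return 2.0 * midpoint(lambda z: rho(math.hypot(r, z)), 0.0, z_max, steps)

def sigma_bar(rho, r, z_max, steps=20000):
    """Eq. (42), last form. In the outer integral u = sqrt(r^2 + z^2) is used,
    which turns 4 u rho du / (u + sqrt(u^2 - r^2)) into 4 z rho dz / (u + z)."""
    inner = midpoint(lambda u: 4.0 * u * u / (r * r) * rho(u), 0.0, r, steps)
    outer = midpoint(
        lambda z: 4.0 * z * rho(math.hypot(r, z)) / (math.hypot(r, z) + z),
        0.0, z_max, steps)
    return inner + outer

def shear(sigma_bar_value, sigma_value, sigma_c):
    """Transversal shear of Eq. (5)."""
    return (sigma_bar_value - sigma_value) / (sigma_c - sigma_value)

def rho_test(r):
    """Toy density with known projections (assumption, not a cluster model)."""
    return (1.0 + r * r) ** -2.5

s_num = sigma(rho_test, 1.0, 200.0)
sb_num = sigma_bar(rho_test, 1.0, 200.0)
```

The cutoff `z_max` must be chosen large enough for the neglected tail to be small; for slowly decaying densities a change of variables would be preferable.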

In the fermionic application, the Poisson equation allows one to express \(\Sigma (r)\) as [29]

$$\begin{aligned}&\Sigma (r)=\frac{1}{2\pi G}\int _0^\infty \mathrm{d}s\,\frac{\cosh 2s}{\sinh ^2s}\nonumber \\&\quad \Big [\varphi '(r\cosh s)-\frac{\varphi '(r)}{\cosh ^2s}\Big ] , \end{aligned}$$
(43)

and \({\overline{\Sigma }}(r)\) as the simpler expression [34]

$$\begin{aligned} {\overline{\Sigma }}(r)=\frac{1}{\pi G}\int _0^\infty \mathrm{d} s\, \varphi '(r\cosh s) . \end{aligned}$$
(44)

We also consider the combinations \({\overline{\Sigma }}-\Sigma \) and \(2\Sigma -{\overline{\Sigma }}\). If the mass density is cored at the origin, \({\overline{\Sigma }}-\Sigma \) will vanish there, so this combination tests the central behaviors. \(2\Sigma -{\overline{\Sigma }}\), on the other hand, tests the decay at large r. Indeed, consider an isothermal fall off \(\rho \approx \sigma ^2/2\pi G r^2\), for which

$$\begin{aligned} \Sigma \approx \frac{\sigma ^2}{2 G r},\quad {\overline{\Sigma }}\approx \frac{\sigma ^2}{G r}+ \frac{M_0}{r^2} . \end{aligned}$$
(45)

In our cases where \(\rho \) always exceeds at intermediate r its large-r asymptote, the “excess” mass \(M_0\) is positive. While \({\overline{\Sigma }}-\Sigma \) starts linearly from 0 at the origin, it will decay as \({\sigma ^2}/{2G r}+ {M_0}/{r^2}\) for large r. On the contrary, the combination \(2\Sigma -{\overline{\Sigma }}\) starts at some finite \(\Sigma (0)\), while at large r the leading terms cancel, leaving a \(-M_0/r^2\) decay. Obviously, it changes sign at some finite r; hence \(2\Sigma -{\overline{\Sigma }}\) is a sensitive quantity for testing the large-r regime. It actually holds by definition that

$$\begin{aligned} 2\Sigma (r)-{\overline{\Sigma }}(r)=\frac{1}{\pi }\,\frac{\mathrm{d}}{\mathrm{d}r}\left( \frac{M_{2d}(r)}{r}\right) , \end{aligned}$$
(46)

so its zero crossing occurs at the maximum of \(M_{2d}/r\). This is reminiscent of the circular rotation speed of objects in the cluster, \(v_{ rot}(r)=\sqrt{GM_{3d}(r)/r}\). The circular speed can have a maximum and, with it, \(M_{3d}/r\). Eq. (46) changes sign at an intermediate r.

4 Fits for A1689

We fit models for the mass distribution in A1689 to the strong lensing data for \({\overline{\Sigma }}\) and the weak lensing data for \(\Sigma \) and \(g_t\). We combine these with the SL data for \(\Sigma \), \({\overline{\Sigma }}-\Sigma \) and \(2\Sigma -{\overline{\Sigma }}\), derived numerically from \({\overline{\Sigma }}\), while neglecting their mutual correlations. To this end one may imagine that each of them derives from averages over 250 of the in total 1001 \(M_{2d}^{(m)}\) maps. Alternatively, one may view the derived values of \(\Sigma \), \({\overline{\Sigma }}-\Sigma \) and \(2\Sigma -{\overline{\Sigma }}\) just as tools to optimize the fit to the \({\overline{\Sigma }}\) data.

In A1689 the total \(\chi ^2\) is taken as

$$\begin{aligned} \chi ^2= & {} \frac{1}{4}\chi ^2_{SL}({\overline{\Sigma }})+\chi ^2_{WL} (\Sigma )+\chi ^2_{WL}(g_t)+\Delta \chi ^2 _{SL}\nonumber \\ \Delta \chi ^2_{SL}= & {} \chi ^2_{SL}(\Sigma )+\chi ^2_{SL} ({\overline{\Sigma }}-\Sigma )+\chi ^2_{SL}(2\Sigma -{\overline{\Sigma }}).\nonumber \\ \end{aligned}$$
(47)

The first term involves the 117 SL data for \({\overline{\Sigma }}\), with a weight factor 1/4 adopted to compensate for not-binning these data. \(\chi ^2_{WL}(\Sigma )\) involves the 14 WL data points for \(\Sigma \) and their covariance matrix, and \(\chi ^2_{WL}(g_t)\) involves the 13 WL data points for \(g_t\) and their diagonal covariance matrix. The correlation matrix \(C(\Sigma )\) is regularized by adding \(\alpha ^2\Sigma ^2\) on the diagonal of \(\Gamma (\Sigma )\). Likewise, for \(C({\overline{\Sigma }}-\Sigma )\) and \(C(2\Sigma -{\overline{\Sigma }})\) we add \(\alpha ^2({\overline{\Sigma }}-\Sigma )^2\) and \(\alpha ^2(2\Sigma -{\overline{\Sigma }})^2\), respectively, to their diagonals. In A1689 we adopt the values

$$\begin{aligned} \alpha ({\overline{\Sigma }})\!=\!0.03,\quad \alpha (\Sigma )\!=\!\alpha ({\overline{\Sigma }}-\Sigma )\!=\!\alpha (2\Sigma -{\overline{\Sigma }})\!=\!0.1.\nonumber \\ \end{aligned}$$
(48)

We neglect the correlation between the various SL terms in (47), but keep the off-diagonal elements in each one.
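A minimal sketch of how one term of Eq. (47) could be evaluated with the diagonal regularization, and how the terms are then combined, is given below. It assumes that the added diagonal \(\alpha ^2 d_i^2\) is built from the data values \(d_i\) (rather than from the model values) and that \(\chi ^2=r^T C^{-1}r\) with residual vector \(r\); the function names are hypothetical.

```python
import numpy as np

def chi2_regularized(data, model, cov, alpha):
    """chi^2 = r^T (C + diag(alpha^2 data^2))^{-1} r, with r = data - model."""
    data = np.asarray(data, dtype=float)
    resid = data - np.asarray(model, dtype=float)
    # Assumption: the added diagonal uses the data values, not the model values.
    cov_reg = np.asarray(cov, dtype=float) + np.diag((alpha * data) ** 2)
    # Solve the linear system instead of inverting; off-diagonal elements are kept.
    return float(resid @ np.linalg.solve(cov_reg, resid))

def chi2_total(chi2_sl_sigbar, chi2_wl_sigma, chi2_wl_gt, chi2_sl_derived):
    """Combine the terms as in Eq. (47); chi2_sl_derived holds the three SL-derived terms."""
    return 0.25 * chi2_sl_sigbar + chi2_wl_sigma + chi2_wl_gt + sum(chi2_sl_derived)

# Toy example with a vanishing covariance, so that only the regularization acts.
example = chi2_regularized([2.0, 1.0], [1.8, 1.1], np.zeros((2, 2)), alpha=0.1)
combined = chi2_total(4.0, 1.0, 1.0, [1.0, 1.0, 1.0])
```

In the toy example each residual equals \(\alpha \) times the data value, so each point contributes one unit to \(\chi ^2\).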

We have attempted various further regularization schemes, without much improvement of the fits.

As is typical when working with SL data that involve very small eigenvalues, the choice of our regularization parameter \(\alpha \) in Eq. (3) needs some care. Extreme cases are to be avoided: too large an \(\alpha \) effectively discards all information in the covariance matrix, while too small a value gives too much weight to the numerical artifacts in it. Hence \(\alpha \) has been selected to yield values of \(\chi ^2/\nu \) of order 1 for the best fit. It then serves to establish the relative quality of fits, and varying it within a reasonable range does not alter the relative quality much.
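The two extremes can be made concrete with a toy spectrum. The snippet below builds a hypothetical \(8\times 8\) covariance matrix with eigenvalues spanning twelve decades and unit data values (both are placeholders, not lensing data) and prints the smallest eigenvalue of the regularized matrix for several \(\alpha \).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Hypothetical covariance Q diag(eigs) Q^T with eigenvalues from 1 down to 1e-12.
q, _ = np.linalg.qr(rng.normal(size=(n, n)))
eigs = np.logspace(0, -12, n)
cov = (q * eigs) @ q.T
data = np.ones(n)  # placeholder data values

def smallest_eigenvalue(alpha):
    """Smallest eigenvalue of the covariance after adding alpha^2 data^2 on the diagonal."""
    cov_reg = cov + np.diag((alpha * data) ** 2)
    return np.linalg.eigvalsh(cov_reg).min()

for alpha in (0.0, 0.03, 0.1, 1.0):
    print(alpha, smallest_eigenvalue(alpha))
```

With unit data values the added term is \(\alpha ^2\) times the unit matrix, so every eigenvalue is shifted by \(\alpha ^2\): for small \(\alpha \) the tiny eigenvalues still dominate the inverse, while for \(\alpha \) of order one the added diagonal swamps the original matrix.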

4.1 Double stretched exponentials

The minimum of \(\chi ^2\) is determined as function of the parameters. The errors in the parameters \(\{p_i\}\) are set by linear regression. First, the Hessian \(H_{ij}=\frac{1}{2}\partial _{p_i}\partial _{p_j}\chi ^2\) is calculated by numerical differentiation and inverted. The diagonal elements provide the \(1\sigma \) error bars \(\delta p_i=\sqrt{(H^{-1})_{ ii}}\), while the off-diagonal elements represent parameter covariances. The best fit in A1689 reads

$$\begin{aligned} M_h= & {} (171.2 \pm 6.7)\times 10^{13}M_\odot ,\quad M_b=( 6.7 \pm 2.7)\times 10^{11}M_\odot , \nonumber \\ R_h= & {} (1.91\pm 0.81) \mathrm{kpc},\qquad \quad R_b=( 1.56 \pm 0.56)\mathrm{kpc}, \nonumber \\ n_h= & {} 2.84 \pm 0.07, \qquad \qquad \qquad n_b=1.29 \pm 0.44 \end{aligned}$$
(49)

While the halo is quite well constrained, the brightest cluster galaxy is less so; this is no big surprise, since the central data are scarce and relatively noisy. The \(M_b\) value is to be compared with \((13.0\pm 2.7)\times 10^{11}M_\odot \) from Loubser et al. [39].
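The error estimate used for these parameters, \(\delta p_i=\sqrt{(H^{-1})_{ii}}\) with \(H_{ij}=\frac{1}{2}\partial _{p_i}\partial _{p_j}\chi ^2\), can be sketched as follows. The central-difference stencil, the relative step size and the toy \(\chi ^2\) are assumptions chosen for illustration; the actual differentiation scheme used for the fits is not specified here.

```python
import numpy as np

def hessian_errors(chi2, p_best, rel_step=1e-3):
    """Return (errors, covariance) from H_ij = (1/2) d^2 chi2 / dp_i dp_j at p_best."""
    p_best = np.asarray(p_best, dtype=float)
    n = p_best.size
    # Step sizes scaled to each parameter; a vanishing parameter gets the bare step.
    h = rel_step * np.where(p_best != 0.0, np.abs(p_best), 1.0)
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            ei = np.zeros(n)
            ei[i] = h[i]
            ej = np.zeros(n)
            ej[j] = h[j]
            # Central second difference for the (mixed) partial derivative.
            d2 = (chi2(p_best + ei + ej) - chi2(p_best + ei - ej)
                  - chi2(p_best - ei + ej) + chi2(p_best - ei - ej)) / (4.0 * h[i] * h[j])
            hess[i, j] = hess[j, i] = 0.5 * d2
    cov = np.linalg.inv(hess)
    return np.sqrt(np.diag(cov)), cov

# Toy chi^2 with known 1-sigma errors 0.5 and 0.1 around the minimum (2, -1).
def toy_chi2(p):
    return ((p[0] - 2.0) / 0.5) ** 2 + ((p[1] + 1.0) / 0.1) ** 2

errors, cov = hessian_errors(toy_chi2, [2.0, -1.0])
```

For a \(\chi ^2\) that is exactly quadratic the finite differences are exact up to rounding; for the real fits the step size would have to be checked against the curvature scale of each parameter.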

The values of the separate \(\chi ^2\) terms of Eq. (47) are listed in the second column of Table 1. The last line gives \(\chi ^2/\nu \), with the number of degrees of freedom \(\nu =N-6\) for the six-parameter DSE model. The number of data points corresponding to (47) is \(N=152\frac{1}{4}\). The value \(\chi ^2/\nu =1.81\) represents an acceptable fit. The radii and enclosed masses at overdensities of 200 and 500 read, respectively,

$$\begin{aligned} r_{200}= & {} 1992\,\,\mathrm{kpc},\qquad M_{200}=(163 \pm 5)\times 10^{13} M_\odot , \nonumber \\ r_{500}= & {} 1435\,\,\mathrm{kpc},\qquad M_{500}=(139\pm 4)\times 10^{13} M_\odot .\nonumber \\ \end{aligned}$$
(50)
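The counting behind \(N=152\frac{1}{4}\) and \(\nu =N-6\) quoted above can be retraced with a few lines of Python. Only numbers stated in the text enter; the remainder attributed to the three SL-derived data sets is inferred by subtraction and is not quoted explicitly in the text.

```python
from fractions import Fraction

n_total = Fraction(609, 4)       # N = 152 1/4, as quoted in the text
n_sl_sigbar = Fraction(117, 4)   # 117 unbinned SL points, weighted by 1/4
n_wl = 14 + 13                   # WL points for Sigma and g_t
# Remainder, attributed to the three SL-derived sets (inferred, not quoted).
n_sl_derived = n_total - n_sl_sigbar - n_wl

nu_dse = n_total - 6             # six DSE parameters, Eq. (49)
print(float(n_sl_derived), float(nu_dse))
```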
Table 1 The separate \(\chi ^2\) values in A1689, for the double stretched exponential (DSE) model, and for the thermal fermion dark matter model
Fig. 4

Strong lensing data for the cylindrical mass density \({\overline{\Sigma }}\) in A1689 with the best double stretched exponential (DSE, red) and fermion (blue) fits. Both are very good

Fig. 5

The strong lensing data of the line-of-sight mass density \(\Sigma \) in A1689 (small symbols) agree well with the weak lensing data (large symbols). The difference between the best double stretched exponential (DSE, red, lower) and the best fermion (blue, upper) fit at large r is not statistically significant

Fig. 6

Data of \({\overline{\Sigma }}-\Sigma \) in A1689 with the best double stretched exponential fit (red) and the best fermion fit (blue)

Fig. 7

Data of \(r(2\Sigma -{\overline{\Sigma }})\) in A1689 with the best DSE fit (red, lower) and the best fermion fit (blue, upper). Additional data beyond 1 Mpc may settle their difference statistically

Fig. 8

Data of the transversal shear \(g_t\) in A1689. The small data points are obtained, without binning, from the strong lensing \({\overline{\Sigma }}\) data. The large data points are from weak lensing analysis. Both sets agree well. The best DSE fit (red) is the lowest, except around 1 Mpc, and the best fermion fit is in blue

4.2 Thermal fermions in A1689

The best fit occurs at parameter values

$$\begin{aligned}&m_{12} =1.56 \pm 0.14 \,\mathrm{eV},\quad&\mu = (7.3 \pm 1.4 )\times 10^6 \, \mathrm{km}^2/\mathrm{s}^2,\nonumber \\&\sigma =1290\pm 120 \, \mathrm{km}/\mathrm{s},\quad&M_G = ( 8.97 \pm 0.47)\times 10^{13}M_\odot ,\quad \nonumber \\&R_c =33.8 \pm 0.48 \,\mathrm{kpc},\quad&R_g\,=119 \pm 21 \,\mathrm{kpc},\quad \nonumber \\&M_b = ( 13.0\pm 2.7)\times 10^{11}M_\odot ,\,\,&R_b= 4.27 \pm 0.47 \, \mathrm{kpc},\quad \nonumber \\&n_b=0.90\pm 0.13 . \end{aligned}$$
(51)

where \(m_{12}=(g/12)^{1/4}m\), which we employ for comparison with earlier work. The first six parameters compare well with earlier fits, while the last three, referring to the BCG, are new. The value for \(M_b\) is adopted from Loubser et al. [39]. The \(\chi ^2\) values for the various components are presented in the right column of Table 1. The overall value \(\chi ^2/\nu =1.32\) represents a good fit. The enclosed masses at \(r_{200}\) and \(r_{500}\) in the fermion model are very close to the ones (50) of the DSE model,

$$\begin{aligned} \,\,r_{200}&=2048\,\mathrm{kpc},\quad&M_{200}=(165 \pm 6)\times 10^{13}M_\odot \nonumber \\ \,\, r_{500}&=1389\,\mathrm{kpc},\quad&M_{500}=(126\pm 4) \times 10^{13}M_\odot ,\nonumber \\ \end{aligned}$$
(52)

the reason being that both fits are good. For comparison, the MOND radius \(\sqrt{GM_{200}/a_0}\) with \(a_0\approx 1.2\times 10^{-10}\) m/s\(^2\) is 1.4 Mpc.
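The quoted MOND radius is easily checked. The sketch below evaluates \(\sqrt{GM_{200}/a_0}\) for \(M_{200}=165\times 10^{13}M_\odot \) from Eq. (52); the values of \(G\), the solar mass and the megaparsec are standard rounded constants inserted here, and the function name is hypothetical.

```python
import math

G = 6.674e-11      # m^3 kg^-1 s^-2
M_SUN = 1.989e30   # kg
MPC = 3.0857e22    # m
A0 = 1.2e-10       # m/s^2, value quoted in the text

def mond_radius_mpc(mass_msun, a0=A0):
    """MOND radius sqrt(G M / a0) in Mpc for a mass given in solar masses."""
    return math.sqrt(G * mass_msun * M_SUN / a0) / MPC

r_mond = mond_radius_mpc(165e13)
print(round(r_mond, 2))
```

The result is close to 1.4 Mpc, so the region inside roughly 1 Mpc probed by the strong lensing data lies within the Newtonian regime.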

4.3 NFW-type profiles in A1689

Fitting the 2-parameter NFW profile, combined with the gas profile to secure a proper fall-off at large r, to the SL data of \({\overline{\Sigma }}\) alone yields the good fit \(\chi ^2/\nu =0.88\). However, the fit deteriorates when other data are included, in particular because of the behavior at \(r\sim 2-3\) Mpc. With the weak lensing data for \(\Sigma \) and \(g_t\) added, the fit is already bad, with \(\chi ^2/\nu \sim 10\). When the \({\overline{\Sigma }}-\Sigma \) and \(2\Sigma -{\overline{\Sigma }}\) data are included, the situation worsens considerably; the failure at large r then already becomes relevant for \(r<1\) Mpc. Two fit parameters are simply too few, given the precision of the data.

Taking the sum of 2 NFW profiles does not work well either. Neither does the case regularly adopted in the literature of one NFW profile at small r and another at larger r (which corresponds to taking their maximum).

One of our further attempts to improve the fit involves a gNFW profile for the halo and a stretched exponential for the BCG, to which the gas density is again added for the behavior at large r. The best case is found when the gNFW index is \(n=0\), that is to say: no cusp in the halo part, the BCG being fully accounted for by the stretched exponential. This cored case \(n=0\) goes against the philosophy of a cuspy NFW profile. Since the remaining fit is far less good than in the DSE and thermal fermion models, we refrain from presenting further details.
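To illustrate why \(n=0\) means a cored halo, the sketch below evaluates a gNFW density in the commonly used form \(\rho (r)=\rho _s/[x^n(1+x)^{3-n}]\) with \(x=r/r_s\). This parametrization is an assumption made for the example, since the precise gNFW form used in the fit is not written out in this section; \(n=1\) reproduces the standard NFW profile.

```python
def gnfw_density(r, rho_s, r_s, n):
    """Generalized NFW density rho_s / (x^n (1 + x)^(3 - n)), with x = r / r_s."""
    x = r / r_s
    return rho_s / (x ** n * (1.0 + x) ** (3.0 - n))

# Central behavior in units rho_s = r_s = 1: cuspy for n = 1, finite for n = 0.
inner_cusp = gnfw_density(1e-4, 1.0, 1.0, 1.0)
inner_core = gnfw_density(1e-4, 1.0, 1.0, 0.0)
# At large radius both cases share the same r^-3 decay.
outer_ratio = gnfw_density(1e4, 1.0, 1.0, 0.0) / gnfw_density(1e4, 1.0, 1.0, 1.0)
print(inner_cusp, inner_core, outer_ratio)
```

For \(n=1\) the central density grows as \(1/r\), while for \(n=0\) it tends to the finite value \(\rho _s\); the outer slope is unaffected by \(n\).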

5 Fits for Abell 1835

In this cluster we take half the values of the regularization parameters (48) used in A1689,

$$\begin{aligned}&\alpha ({\overline{\Sigma }})=0.015,\quad \alpha (\Sigma )\nonumber \\&\quad =\alpha ({\overline{\Sigma }}-\Sigma )=\alpha (2\Sigma -{\overline{\Sigma }})=0.05. \end{aligned}$$
(53)

This sharpening is permissible, since the cluster looks more regular and, perhaps, because there are no WL data up to 2–3 Mpc that would put further constraints.

5.1 Double stretched exponentials in A1835

We repeat the above procedure. The best fit reads

$$\begin{aligned}&M_1=(0.60 \pm 0.21 )\times 10^{13}M_\odot , \quad M_2=(161\pm 15)\times 10^{13}M_\odot \nonumber \\&R_1=(8.0\pm 4.8)\,\mathrm{kpc}, \quad R_2=(29 \pm 14)\,\mathrm{kpc}, \nonumber \\&n_1=1.04 \pm 0.36, \quad n_2=1.94 \pm 0.20 . \end{aligned}$$
(54)

Over-densities of 200 and 500 relate to

$$\begin{aligned}&r_{200}=(1926 \pm 42)\,\mathrm{kpc},\quad M_{200}=(160 \pm 10) \times 10^{13} M_{\odot } , \nonumber \\&r_{500}= (1327 \pm 30) \,\mathrm{kpc},\quad M_{500}=(131.0 \pm 8.8) \times 10^{13} \, M_\odot .\nonumber \\ \end{aligned}$$
(55)

Coincidentally, this \(M_{200}=1.60 \times 10^{15} M_{\odot }\) is close to the \(M_{200}=1.65 \times 10^{15} M_{\odot }\) for A1689. Hence the MOND radius \(\sqrt{GM_{200}/a_0}\) with \(a_0\approx 1.2\times 10^{-10}\) m/s\(^2\) is again 1.4 Mpc.

Table 2 The separate \(\chi ^2\) values per data point for the double stretched exponential (DSE), and thermal fermion profiles, fit to data in A1835

The \(\chi ^2\) values for the separate components are given in Table 2. It is seen that the DSE fit is stunningly good, with the only large term arising from the weakly determined BCG. The excellence of the fit is also evident from the red curves in Figs. 9, 10, 11 and 12.

Fig. 9

Data of \({\overline{\Sigma }}\) in A1835 with the best Double Stretched Exponential fit (red) and the best fermion fit (blue)

Fig. 10

Data of \(\Sigma \) in A1835 with the best Double Stretched Exponential fit (red) and the best fermion fit (blue)

Fig. 11

Data of \({\overline{\Sigma }}-\Sigma \) in A1835 with the best Double Stretched Exponential fit (red) and the best fermion fit (blue)

Fig. 12

Data of \(r(2\Sigma -{\overline{\Sigma }})\) in A1835 with the best Double Stretched Exponential fit (red) and the best fermion fit (blue). While the difference between the two models is statistically relevant, precise data beyond 1 Mpc could be decisive

5.2 Thermal fermions in A1835

For the thermal fermion profile in this cluster it appears that the scale parameters \(R_c\) and \(R_g\) of the galaxy distribution are both about 100 kpc. Taking them equal, \(R_g=R_c\), eliminates one free parameter and gives as best fit

$$\begin{aligned}&m_{12}=(1.45 \pm 0.16)\,\mathrm{eV}, \quad \sigma =(1308 \pm 19 ) \,\mathrm{km}/\mathrm{s}, \quad \nonumber \\&\mu = (5.61 \pm 0.70 )\times 10^6\mathrm{km}^2/\mathrm{s}^2, \nonumber \\&M_G = (7.6\pm 6.2) \times 10^{13}M_\odot , \nonumber \\&R_c=(112\pm 32)\,\mathrm{kpc}, \, \qquad R_g=(150\pm 55) \mathrm{kpc}, \quad \nonumber \\&S_{ b}=(1.47\pm 0.21) \mathrm{gr}/\mathrm{cm}^2, R_{ b}= (8.6 \pm 0.16)\,\mathrm{kpc}, \quad \nonumber \\&n_{ b}=1.06 \pm 0.16. \end{aligned}$$
(56)

with \(S_b\) defined by (28). The best BCG parameters coincide with those in the DSE fit, but are more constrained. The BCG mass is

$$\begin{aligned} M_{ b}=(7.2\pm 0.9)\times 10^{12}M_\odot . \end{aligned}$$
(57)

Similar values at \(r_{200}\) and \(r_{500}\) arise, but with smaller error bars,

$$\begin{aligned}&r_{200}=(1926 \pm 25)\,\mathrm{kpc},\quad M_{200}=(160 \pm 6)\times 10^{13} M_{\odot } , \nonumber \\&r_{500}= (1327 \pm 15) \,\mathrm{kpc},\quad M_{500}=(131.0 \pm 4.4)\times 10^{13} \, M_\odot .\nonumber \\ \end{aligned}$$
(58)

5.3 NFW-type profiles in A1835

The NFW situation is comparable to that in A1689, but here no WL data exist, hence no lensing data beyond 1.12 Mpc, which allows more flexibility.

The \({\overline{\Sigma }}\)-only fit has again a very good \(\chi ^2/\nu =0.72\). Similarly to the case in A1689, this deteriorates when more data are included, the basic reason seeming to be that the NFW profile decays too slowly at large r.
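The slow decay can be quantified with the standard NFW result for the enclosed mass, \(M_{3d}(r)=4\pi \rho _s r_s^3\,m(r/r_s)\) with \(m(x)=\ln (1+x)-x/(1+x)\), which follows from integrating the density \(\rho _s/[x(1+x)^2]\). The sketch below evaluates the dimensionless function \(m(x)\); the function name and the sample radii are chosen here for illustration only.

```python
import math

def nfw_mass_shape(x):
    """Dimensionless NFW enclosed mass m(x) = ln(1 + x) - x / (1 + x), x = r / r_s."""
    return math.log1p(x) - x / (1.0 + x)

# Each further decade in radius adds roughly ln(10) to m(x): the mass never converges.
for x in (1.0, 10.0, 100.0, 1000.0):
    print(x, round(nfw_mass_shape(x), 3))
```

Because the density falls only as \(r^{-3}\), the enclosed mass keeps growing logarithmically, in contrast to a stretched exponential profile, whose total mass is finite.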

6 Summary

We have considered precise strong lensing data for the clusters A1689 and A1835. For A1689 we include existing weak lensing data. In both cases, the X-ray gas density is known from observations and fitted to an analytical profile.

The strong lensing data have been gathered for cylindrical mass \(M_{2d}(r)\) or, equivalently, the cylindrical mass density \({\overline{\Sigma }}(r)=M_{2d}/\pi r^2\), where r is the projected distance to the cluster center. After binning, the data allow numerical differentiation, to yield data for the line-of-sight mass density \(\Sigma \) and, consequently, for the combinations \({\overline{\Sigma }}-\Sigma \) and \(2\Sigma -{\overline{\Sigma }}\). They emphasize the behavior at the origin and at large r, respectively.

Fits to a double stretched exponential (DSE) profile and to a thermal fermion profile are considered. It is observed that both profiles fit reasonably well, with the fermions fitting better in A1689 and the DSE better in A1835. Somewhat surprisingly, NFW-type profiles fit considerably less well at this level of accuracy, even though the NFW profile is expected in the \(\Lambda \)CDM framework [35]; this is compensated, however, by the good fit of the DSE profile, of which the halo part was put forward as a better profile by the same authors with their collaborators [36]. Sharp data beyond 1 Mpc may help to discriminate further between the profiles, with the “winner” expected to be the double stretched exponential profile.

The covariance matrix of the strong lensing data has very small eigenvalues, hence a cutoff is needed, as is well known. We propose a “poor man’s” regularization, which is better suited to situations such as lensing, where the observable decays at large distance. The regularization parameter has been chosen to get a reasonable value for \(\chi ^2/\nu \), but a theoretical criterion to fix it, such as some maximum entropy condition, would be welcome.