1 Introduction

In the last decades, the growing availability of an increasingly larger amount of astrophysical and cosmological data led us into a so-called “precision cosmology” era. The effective model that best fits most of the collected observations is the \(\varLambda \)CDM model, which is based on General Relativity (GR) and on the introduction of two exotic fluids, the cold dark matter (DM) and the dark energy (DE). They should represent the \(\sim 27\%\) and \(\sim 68\%\) of the matter–energy content of the present-day Universe, respectively [1,2,3], and being responsible for the dynamical features of the Universe at all scales. Despite being the best model to explain the collected data, the \(\varLambda \)CDM paradigm is plagued by several problems, both experimental and theoretical [4]. On the one hand we have a huge variety of solutions for DM but we face the complete lack of any detection of any viable particle candidate for DM at fundamental scales [5]. On the other hand, we also have an humongous number of possibilities to describe DE, but also many theoretical shortcomings regarding its nature and behaviour, such as the discrepancy (\(\sim \)120 orders of magnitude) between the observed value of the cosmological constant and the vacuum energy density calculated via QFT. Other big issues are the presence of singularities in the theory and the completely unsatisfactory description of gravity at quantum level.

All these problems led to the idea of developing theories of gravity “beyond GR”, aiming at fixing its infrared (IR) and ultraviolet (UV) shortcomings. Several proposal for extended theories of gravity (ETGs) have been made during last thirty years [6, 7]. Some of these are based on modifications of the geometrical content of the theory, namely the Hilbert–Einstein Lagrangian; some others are based on modifications of the matter content, for example by adding extra scalar fields. Often these two kind of modifications are introduced together. The most famous and extensively investigated ETGs are f(R) theories [6, 8,9,10,11,12], where the Hilbert–Einstein Lagrangian is replaced by a general function of the Ricci scalar R, and scalar-tensor theories [13,14,15,16,17,18], in which one or more scalar fields are minimally or non-minimally coupled to gravity. These two classes of theories can be made equivalent, since f(R) theories can be recovered by scalar-tensor ones with some specific changes of variables and viceversa.

Other ETGs approaches replace the Ricci scalar R with the torsion scalar T, instead [19]. In fact, it is possible to build the so called Teleparallel Equivalent of General Relativity (TEGR) by giving up the Equivalence Principle and replacing the Ricci scalar R with the Torsion scalar T. TEGR, firstly introduced by Einstein himself [20], and GR are dynamically equivalent. However, the equivalence does not hold for the teleparallel equivalent of f(R), i.e. f(T) theories. The latter leads to second-order field equations, while the field equations derived by f(R) gravity, in metric formalism, are of fourth-order [21].

An interesting approach is the introduction of non-local terms [13, 22]. Non-locality is one of the main feature of quantum theory, thanks to the Heisenberg Principle, and it automatically arises in quantum field theory (QFT) when one-loop effective actions are considered. Instead, GR is a classical theory, hence local by definition. In order to merge the gravitational interaction with the quantum formalism, it is thus possible to implement an effective approach by adding non-local operators to the gravitational Lagrangian. The procedure can be considered as an effective approach to link gravitation and QFT.

In general, two main different families of non-local theories of gravity can be considered. The first ones are the Infinite Derivatives Theories of Gravity (IDGs), which present, at short range, non-locality caused by terms with entire analytic transcendental functions of a differential operator. The most commonly used terms are the exponential functions of the covariant d’Alembert operator, \(\Box = g_{\mu \nu } \nabla ^{\mu } \nabla ^{\nu }\). IDGs have been introduced to solve one of the main problems of f(R) theories, namely the lack of unitarity caused by bad ghosts. Moreover, this class of non-local terms weakens gravitational attraction at short length scale. Hence, a natural solution to UV shortcomings of GR is provided [23,24,25,26].

Then, we also have Integral Kernel Theories of Gravity (IKGs), inspired by quantum corrections obtained in QFT on curved spacetime. In IKGs, the Lagrangian is extended by adding transcendental functions of the fields (R, T, G, etc.), which can be always represented by the integral kernels of differential operators, such as

$$\begin{aligned} \Box ^{-1} \eta (x) \equiv \int d^{4}x' G(x,x') \eta (x') \end{aligned}$$
(1)

Due to its integral nature, this class of terms gives rise to long range non-localities, which can fix the IR shortcomings of GR [27,28,29,30]. Interesting IKG models have been proposed in [31, 32]. Subsequent comparisons with cosmological observations [33,34,35,36] showed agreement with data at a level statistically equivalent to \(\varLambda \)CDM and an intriguing reduction of the Hubble tension.

In this paper, we consider a metric IKG proposed in [28] to explain the late-time cosmic acceleration. Non-locality is introduced by a general function of the inverse d’Alembert operator, Eq. (1), the form of which can be found by applying the so-called Noether Symmetry Approach as discussed in [22]. Furthermore, we consider the local representation of the theory, firstly introduced in [37]. Two auxiliary scalar fields emerge as Lagrange multipliers, so that the new scalar-tensor theory is made equivalent to the original non-local theory. The Newtonian limit is finally performed and two Newtonian potentials, \(\phi \) and \(\psi \), are derived. These point-mass potentials can be generalized to extended spherically symmetric mass distributions which can model gravitational structures like galaxy clusters. Then, the non-local model can be compared to observational data. In other words, signatures of non-locality can emerge from large-scale structure.

Here, we perform an analysis similar to [38,39,40] using a sample of 19 high-mass galaxy clusters, for which strong- and weak-lensing data sets from deep Hubble Space Telescope (HST) and ground-based wide-field observations are available. All data products used in this study are obtained from the Cluster Lensing and Supernova survey with Hubble (CLASH) program [41]. The aim of the paper is to constrain the free parameters of the non-local theory and compare the results to those obtained in [42, 43], where the same theory has been analyzed using the orbits of S2 stars around the Galactic center.

The paper is organised as follow: in Sect. 2 we present the non-local model and its local scalar-tensor representation. Then, we sketch the Noether Symmetry Approach to derive the form of the non-local terms related to the existence of conserved quantities. Moreover, we perform the weak-field limit in order to find the Newtonian potentials, \(\phi \) and \(\psi \), and we generalize the results to extended spherically symmetric systems. Finally, our choice for the mass density profile is presented and an overview of the gravitational lensing theory, used to derive the theoretical prediction, is introduced. In Sect. 3, the data sets of the CLASH program that we have used in our analysis is presented. Then, we introduce the main points of the statistical analysis we have performed. Furthermore, in Sect. 4, the results of the statistical analysis are discussed and a comparison between them and previous results in the literature is made. In Sect. 5, the conclusions are drawn.

2 The model

The model we are going to study in this paper was firstly introduced in [28]. Here, the Hilbert–Einstein action is extended by adding an extra non-local term characterized by a general function of the inverse d’Alembert operator

$$\begin{aligned} S \, = \, \frac{1}{16 \pi G} \int d^{4}x \, \sqrt{-g} \, \Big [ R \big ( 1 + f(\Box ^{-1}R) \big ) \Big ] \end{aligned}$$
(2)

The function f is called distortion function and GR is immediately restored when f is set to zero. Notice that the theory defined by Eq. (2) can be seen as a special case of the Generalized non-local Teleparallel Gravity (GNTG) proposed in [44], from which the above model is recovered by setting the coupling constants equal to 1 and \(-1\), respectively.

The model in Eq. (2) explains well the current cosmic acceleration. In fact, the non-local terms here introduced allow a delayed response to the transition from radiation to matter dominance, and then avoid fine tunings often introduced to address late time acceleration [28]. Furthermore, using a proper piecewise-defined distortion function, the non-local model may lead to the unification of the early-time inflation with late-time acceleration [37].

A local representation of the non-local model Eq. (2) has been proposed in [37], where an equivalent scalar-tensor theory is built by introducing two auxiliary scalar fields, \(\eta \) and \(\xi \), so that the action is rewritten as follows

$$\begin{aligned} \begin{aligned} S \,&= \, \frac{1}{16 \pi G} \int d^{4}x \, \sqrt{-g} \, \Big [ R \big ( 1 + f(\eta ) \big ) + \xi \big ( \Box \eta - R \big ) \Big ] \, \\&= \, \frac{1}{16 \pi G} \int d^{4}x \, \sqrt{-g} \, \Big [ R \big ( 1 + f(\eta ) - \xi \big ) - \nabla ^{\mu }\xi \, \nabla _{\mu }\eta \Big ] \end{aligned}\nonumber \\ \end{aligned}$$
(3)

where the term \(\xi \Box \eta \) has been integrated by parts and the boundary term is set to zero. A general procedure to build a local representation of non-local theories is presented in [22]. Varying the action in Eq. (3) with respect to \(\xi \) and \(\eta \), we obtain the two Klein-Gordon equations:

$$\begin{aligned} \Box \eta= & {} R \Rightarrow \eta = \Box ^{-1}R \end{aligned}$$
(4)
$$\begin{aligned} \Box \xi= & {} -R \, \frac{df(\eta )}{d\eta } \end{aligned}$$
(5)

where the auxiliary fields behave as Lagrange multipliers. Equation (4) is a constraint such that substituting it into (3) one recovers (2). Instead, Eq. (5) is a non-trivial equation for the dynamics of the scalar field \(\xi \). Furthermore, the variation of the action in Eq. (3) plus the matter action, with respect to the metric tensor \(g_{\mu \nu }\), yields to the following gravitational field equation:

$$\begin{aligned} G_{\mu \nu }= & {} \frac{1}{1+f(\eta )-\xi } \, \Bigg [ \big (8\pi G \big )^{2} \, T^{(m)}_{\mu \nu } + \nabla _{\mu }\xi \, \nabla _{\nu }\eta \nonumber \\&+ \big ( \nabla _{\mu }\nabla _{\nu } - g_{\mu \nu } \, \Box \big ) \big ( f(\eta ) - \xi \big ) - \frac{1}{2} g_{\mu \nu } \nabla ^{\alpha }\xi \, \nabla _{\alpha }\eta \Bigg ] \end{aligned}$$
(6)

A non-trivial form for the distortion function has been reconstructed in [45] in order to reproduce the \(\varLambda \)CDM expansion. The model provided by substituting such form of f in Eq. (2) has been analysed in its localized version in [46], using Redshift-Space Distortions (RSD) data: a slightly lower value of \(\sigma _{8}\) is derived and, consequently, a better agreement with data results with respect to the \(\varLambda \)CDM model. Moreover, in [47], the author has shown that the same results can be obtained in the original non-local version in [45], as long as the initial conditions are set the same. Further analysis have been performed in [48], using Cosmic Microwave Background (CMB), Type Ia supernovae (SNIa) and RSD observations: a deficit of growth of linear structures and a higher lensing power as compared to \(\varLambda \)CDM is derived, so that a significant shift for the parameter \(\sigma _{8}\) and \(\tau _{re}\) results. It follows a CMB-RSD tension and “weak” evidence for the \(\varLambda \)CDM model. The same authors also highlight a modification in the propagation equations for Gravitational Waves (GWs), which provides a powerful way for testing deviations to GR with future third generation GW interferometers. See also [29, 30].

A drawback that could arise in the non-local theories of gravitation is the lack of a screening mechanism, which is necessary to avoid any undesired effect at Solar System scale. In [49], it is argued that inside bound objects \(\Box ^{-1}R\) acquires positive value, whereas it is negative in cosmology. Thus, as one is free to choose the distortion function, one can set it so that it vanishes for positive values of \(\Box ^{-1}R\), i.e. \(f(\Box ^{-1}R) \sim \theta (\Box ^{-1}R)\) where \(\theta \) is the Heaviside step function. It follows a perfect screening mechanism which allows the model to reduce to GR at the Solar System scales. However, in [50], has been shown that the value of \(\Box ^{-1}R\) is actually negative also at Solar System scale, therefore this procedure cannot be applied. As a consequence, the above model would present a time dependence of the effective Newton constant in the small scale limit and it would thus be ruled out by Lunar Laser Ranging (LLR) observations. This conclusion, drawn in [50], seems to be too strong, since it is still not clear how a Friedman–Lemaître–Robertson–Walker (FLRW) background quantities behave when evaluated from cosmological scales down to Solar System ones, where the system decouples from the Hubble flow. In fact, a full non-linear time- and scale-dependent solution around a non-linear structure would be necessary.

However, some proposals exist in this direction and the so-called Vainshtein mechanism [51] can be considered the paradigm to realize the screening. The main problem is that GR is very well-tested at Solar System scales and then any extension requires fine constraints to be physically viable. For example, f(R) gravity requires an accurate chameleon mechanism to give realistic models ranging from Solar System up to cosmology (see [52] for a discussion). Basically, any screening mechanism require a scalar field coupled to matter, and mediating a fifth-force which might span from Solar System up to cosmological scales. For high density, this force has to be suppressed, so that no deviation from GR should emerge. For lower densities, the modification to GR become effective with some observational signature. The screening can be accomplished in several ways: for example, a weak coupling between the field and matter in regions of high density can be considered. This coupling should induce a weak fifth-force. In this situation, the field can acquire a large mass in high density environments, being short-ranged and undetectable. In lower density regions, it should be light and long-ranged, as in the case of chameleon fields [52]. Finally, the field may change the kinetic contribution in the effective Lagrangian, with first or second order derivatives becoming important in a certain range, as in the Vainshtein case. In [38], it has been discussed for clusters of galaxies in presence of galileon fields. In the case we are going to discuss here, non-local terms can be “localized” and they result in an effective scalar field depending on the scale [36]. This feature can naturally give rise to some screening mechanism solving the above reported problems. We will discuss this topic in detail elsewhere.

2.1 The Noether symmetry approach

The action in Eq. (3) as well as the two non-trivial field equations Eqs. (5) and (6), all depend on the specific form of the distortion function \(f(\Box ^{-1}R)\). It is thus possible to use the Noether Symmetry Approach [53] to select a form of the distortion function such that the theory is invariant under point transformations. This method has been extensively used in the literature [44, 53,54,55,56,57], aiming to study modified theories of gravity based on general functions, such as f(R), f(T), etc. Such theories have to be checked against data in order to be constrained and then obtain physically reliable models. However, it is possible to constrain modified-gravity theories using a theoretical approach rather than a phenomenological one. Specifically, Noether symmetries can be used as a geometric criterion to choose among different models. Moreover, the presence of symmetries imply conserved quantities that, in many cases, have a physical meaning and allow to reduce dynamics and find exact solutions [53].

The Noether Symmetry Approach works as follows:

  • One first selects a class of background space-time metrics, which, in our case, is spherically symmetric, and one therefore writes the metric;

  • Then, one substitutes the metric into the Lagrangian density Eq. (2) and, after integrating out all the total derivative terms, it is possible to obtain a point-like canonical Lagrangian;

  • One derives the Noether vector X, i.e. the infinitesimal generator of point transformations;

  • It is then possible to apply the Noether symmetry condition [54]

    $$\begin{aligned} X^{[1]} {\mathcal {L}} + {\mathcal {L}} \bigg ( \frac{d\xi ^{t}}{dt} + \frac{d\xi ^{r}}{dr} \bigg ) \, = \, \frac{dh^{t}}{dt} + \frac{dh^{r}}{dr} \end{aligned}$$
    (7)

    where \(X^{[1]}\) is the first prolongation of X, \(\xi ^{t}\) and \(\xi ^{r}\) are the coefficients of the Noether vector and \(h^{t}\) and \(h^{r}\) are two arbitrary functions depending on time and generalized coordinates. Expanding the condition Eq. (7), one finds a system of equations with 9 unknown variables, which yields to two possible models that are invariant under point transformations

    $$\begin{aligned} f(\eta ) \,= & {} \, c_{4} + c_{3} , \eta \end{aligned}$$
    (8)
    $$\begin{aligned} f(\eta ) \,= & {} \, c_{4} + \frac{c_{5}}{c_{1}} \, e^{c_{1}\eta } . \end{aligned}$$
    (9)

See [22] for details. An overview of the general method is given in [54], while the explicit calculation of the Noether Symmetry Approach applied to the model (2) is performed in [42]. It is interesting to note that the two forms Eq. (8) and (9) of the distortion function correspond to those phenomenologically introduced in [58, 59].

Hereinafter, we consider the exponential form Eq. (9) for the distortion function and we set all the integration constants to one. Thus, we have

$$\begin{aligned} f(\eta ) \, = \, 1 + e^{\eta } . \end{aligned}$$
(10)

As reported in [60], this form of coupling is particularly relevant to achieve the so called “super-renormalizability” for effective theories of gravity. Here, it is selected thanks to the existence of related Noether symmetries.

2.2 The Newtonian limit

Now, let us perform the weak field limit for the above non-local model, in order to derive the Newtonian potentials which will be used to match observations. The gravitational field for a static and spherically symmetric metric is described by

$$\begin{aligned} ds^{2} \, = \, A(r) \, dt^{2} - B(r) \, dr^{2} - r^{2} \, d\varOmega ^{2} . \end{aligned}$$
(11)

Notice that the Birkhoff theorem is not guaranteed in non-local gravity, but we expect that static and spherically symmetric solutions are a good approximation when the Newtonian limit is performed. Not even the existence of the solution \(B(r) = 1/A(r)\) is guaranteed in modified theories of gravity [61], so it cannot be chosen a priori in Eq. (11).

It is well known, from GR, that expanding \(g_{00}\) up to \(v^{2} \sim {\mathcal {O}}(2)\), one obtains the Newtonian potential \(\phi \) for time-like particles. However, for theories beyond GR the Post-Newtonian (PN) limit is necessary, i.e.

$$\begin{aligned} g_{00} \sim {\mathcal {O}}(6), \quad g_{0i} \sim {\mathcal {O}}(5), \quad g_{ij} \sim {\mathcal {O}}(4) . \end{aligned}$$
(12)

When higher-order corrections are taken into account, two length scales arise, which are related to the scalar degrees of freedom we introduced to localize the theory. The expansion of the metric components as well as the scalar fields therefore gives

$$\begin{aligned} A(r) \,= & {} \, 1 + \frac{1}{c^2} \phi ^{(2)}(r) + \frac{1}{c^4} \phi ^{(4)}(r) + \frac{1}{c^6} \phi ^{(6)}(r) + {\mathcal {O}}(8) \nonumber \\ \end{aligned}$$
(13)
$$\begin{aligned} B(r) \,= & {} \, 1 + \frac{1}{c^2} \psi ^{(2)}(r) + \frac{1}{c^4} \psi ^{(4)}(r) + {\mathcal {O}}(6) \end{aligned}$$
(14)
$$\begin{aligned} \eta (r) \,= & {} \, \eta _{0} + \frac{1}{c^2} \eta ^{(2)}(r) + \frac{1}{c^4} \eta ^{(4)}(r) + \frac{1}{c^6} \eta ^{(6)}(r) + {\mathcal {O}}(8)\nonumber \\ \end{aligned}$$
(15)
$$\begin{aligned} \xi (r) \,= & {} \, \xi _{0} + \frac{1}{c^2} \xi ^{(2)}(r) + \frac{1}{c^4} \xi ^{(4)}(r) + \frac{1}{c^6} \xi ^{(6)}(r) + {\mathcal {O}}(8)\nonumber \\ \end{aligned}$$
(16)

Combining these expansions together with the 00- and 11-components of Eq. (6) and with the equations for the scalar fields, Eqs. (4) and (5), one obtains [42]

$$\begin{aligned} A(r)= & {} \, 1 - \frac{2GM\eta _{c}}{c^{2}r} + \frac{G^{2}M^{2}}{c^{4}r^{2}} \Bigg [ \frac{14}{9}\eta _{c}^{2} + \frac{18 r_{\xi } - 11 r_{\eta }}{6 r_{\eta } r_{\xi }} \, r \Bigg ] \nonumber \\&- \frac{G^{3}M^{3}}{c^{6}r^{3}} \Bigg [ \frac{50 r_{\xi } - 7 r_{\eta }}{12 r_{\eta } r_{\xi }} \, \eta _{c} \, r + \frac{16}{27} \eta _{c}^{3} - \frac{2 r_{\xi }^{2} - r_{\eta }^{2}}{r_{\eta }^{2} r_{\xi }^{2}} \, r^{2} \Bigg ]\nonumber \\ \end{aligned}$$
(17)
$$\begin{aligned} B(r)= & {} 1 + \frac{2GM\eta _{c}}{3 c^{2}r} + \frac{G^{2}M^{2}}{c^{4}r^{2}} \Bigg [ \frac{2}{9}\eta _{c}^{2} + \frac{3 r_{\eta } - 2 r_{\xi }}{2 r_{\eta } r_{\xi }} \, r \Bigg ] \end{aligned}$$
(18)
$$\begin{aligned} \eta (r)= & {} \frac{4GM\eta _{c}}{3 c^{2}r} - \frac{G^{2}M^{2}}{c^{4}r^{2}} \Bigg [ \frac{11 r_{\eta } + 6 r_{\xi }}{6 r_{\eta } r_{\xi }} \, r - \frac{2}{9}\eta _{c}^{2} \Bigg ] \nonumber \\&- \frac{G^{3}M^{3}}{c^{6}r^{3}} \Bigg [ \frac{r^{2}}{r_{\eta }^{2}} - \frac{25 r_{\eta } - 14 r_{\xi }}{12 r_{\eta } r_{\xi }} \, \eta _{c} \, r - \frac{4}{81} \eta _{c}^{3} \Bigg ] \end{aligned}$$
(19)
$$\begin{aligned} \xi (r)= & {} 1 + \frac{G^{2}M^{2}}{c^{4}r^{2}} \Bigg [ \frac{2}{3}\eta _{c}^{2} - \frac{13 r_{\eta } - 6 r_{\xi }}{6 r_{\eta } r_{\xi }} \, r \Bigg ] + \frac{G^{3}M^{3}}{c^{6}r^{3}} \nonumber \\&\times \Bigg [ \frac{20}{27} \eta _{c}^{3} - \frac{r_{\eta }^{2} - r_{\xi }^{2}}{r_{\eta }^{2} r_{\xi }^{2}} \, r^{2} - \frac{131 r_{\eta } + 6 r_{\xi }}{36 r_{\eta } r_{\xi }} \, \eta _{c} \, r \Bigg ] \end{aligned}$$
(20)

where \(\eta _{c}\) is a dimensionless constant, which can be set to 1 in order to recover the Newtonian limit for \(\phi \). The two parameters \(r_{\eta }\) and \(r_{\xi }\), which arise in the higher-order terms \({\mathcal {O}}(4)\) and \({\mathcal {O}}(6)\), are the length scales related to the the scalar degrees of freedom and thus to the non-localities. These length scales are the free parameters of the theory that we want to constrain by observations.

2.3 Extended spherically symmetric systems

In order to confront the theory against data, it is necessary to generalize the results in Eqs. (17), (18) to extended mass distributions. First of all, from Eqs. (17), (18) we derive the point-mass expressions for the gravitational \((\phi )\) and metric \((\psi )\) potentials, which are, respectively:

$$\begin{aligned} \phi (r)&= \phi _{0}(r) + \phi _{1}(r) + \phi _{2}(r) , \\ \phi _{0}(r)&= -\frac{GM}{r} , \nonumber \\ \phi _{1}(r)&= \frac{G^{2}M^{2}}{2c^{2}r^{2}} \bigg [ \frac{14}{9} + \left( \frac{3}{r_{\eta }} - \frac{11}{6 r_{\xi }}\right) r \bigg ] , \nonumber \\ \phi _{2}(r)&= \frac{G^{3}M^{3}}{2c^{4}r^{3}} \bigg [\frac{7 r_{\eta }-50r_{\xi }}{12r_{\eta }r_{\xi }} r -\frac{16}{27} + \frac{2r_{\xi }^{2}-r_{\eta }^{2}}{r_{\eta }^{2}r_{\xi }^2}r^2\bigg ] ; \nonumber \end{aligned}$$
(21)

and

$$\begin{aligned} \psi (r)&= \psi _{0}(r) + \psi _{1}(r) , \nonumber \\ \psi _{0}(r)&= -\frac{GM}{3r}, \nonumber \\ \psi _{1}(r)&= -\frac{G^{2}M^{2}}{2c^{2}r^{2}} \bigg [ \frac{2}{9} + \left( \frac{3}{2 r_{\xi }} - \frac{1}{r_{\eta }} \right) r \bigg ] . \end{aligned}$$
(22)

Before proceeding, it is worth having a look at the orders of magnitude of each of the above contributions, to forecast their weight in the analysis and thus estimate which kind of constraints (if any) it is possible to put on the theory. A rough estimation, using the typical masses and radii of the clusters of galaxies (respectively, \(M=10^{15}\) \(M_{\odot }\) and \(r=3\) Mpc), tells us that:

$$\begin{aligned}&{\mathcal {O}}\left( \frac{G M}{r}\right) \sim 10^{-27} \; \mathrm {kpc^{2} \, s^{-2}} \end{aligned}$$
(23)
$$\begin{aligned}&{\mathcal {O}}\left( \frac{G^2 M^2}{2 c^2 r^2}\right) \sim 10^{-32} \; \mathrm {kpc^{2} \, s^{-2}} \end{aligned}$$
(24)
$$\begin{aligned}&{\mathcal {O}}\left( \frac{G^3 M^3}{2 c^4 r^3}\right) \sim 10^{-37} \; \mathrm {kpc^{2} \, s^{-2}} \end{aligned}$$
(25)

It immediately follows that the terms (25) are completely negligible in our analysis, and we will not consider them further. However we can also make two additional and more important considerations about these qualitative results.

The first one is that, in order to expect a significant contribution from \(\phi _1\) and \(\psi _1\) with respect to the standard terms, \(\phi _0\) and \(\psi _0\), we should have \(r_{\eta } \sim r_{\xi }\sim 10^{-5}r\).

The second consideration is probably even more decisive: in case the non-local corrections to the potential were not large enough, we would have here a theory which does not reduce to GR. In fact, it is straightforward to see that in that case we would have \(\psi \sim \phi /3\). If such a scenario would be able to fit the data (and if yes, with which implications) is something we are going to check carefully in the following analysis.

The generalization to extended system is based on performing the following integrals (using spherical coordinates)

$$\begin{aligned} \varPhi _{0}(r) = \int _{0}^{\infty } \rho (x') x'^2 dx'\int _{0}^{\pi } \sin \theta ' d\theta ' \int _{0}^{2\pi } d\phi ' \phi _0(r) ,\nonumber \\ \end{aligned}$$
(26)

with r defined as:

$$\begin{aligned} r = |{\varvec{x}} - \varvec{x'}| = \big (x^{2}+ x'^{2} - 2\,x\,x'\,\mathrm {cos}\theta \big )^{1/2} , \end{aligned}$$
(27)

where \({\varvec{x}}\) is the vector position of the point in the space where we want to calculate the potential, and \(\varvec{x'}\) is the vector position connected to the mass distribution (source of the gravitational potential). Note how the integration over the radial coordinate has to be performed between 0 and \(\infty \) because Newton’s theorems are not guaranteed in non-local gravity, so that we cannot apply the Gauss theorem and the effects of external shell of matter cannot be neglected.Footnote 1

While the term \(\varPsi _0(r)\) is derived as in Eq. (26), \(\varPhi _1(r)\) and \(\varPsi _1(r)\) need an intermediate step, as they involve the term \(M^2\). Since the mass element can be written as

$$\begin{aligned} dM = \rho (x') \, x'^{2} \, dx' \, \sin \theta \, d\theta \, d\phi \, , \end{aligned}$$
(28)

and considering that \(dM^2 = 2 M dM\), with

$$\begin{aligned} M(r) = \, 4\pi \int _{0}^{r} dx' \, x'^{2} \, \rho (x')\, \end{aligned}$$
(29)

we get:

$$\begin{aligned} \varPhi _{1}(r)&= 2 \int _{0}^{\infty } \rho (x') x'^2 dx'\int _{0}^{x'} \rho (x'') x''^2 dx''\nonumber \\&\quad \times \int _{0}^{\pi } \sin \theta ' d\theta ' \int _{0}^{\pi } \sin \theta '' d\theta '' \\&\quad \times \int _{0}^{2\pi } d\phi ' \int _{0}^{2\pi } d\phi '' \phi _1(r) . \nonumber \end{aligned}$$
(30)

The same results holds for \(\varPsi _{1}(r)\).

2.4 The mass density profile

To compute the integrals of the extended potentials, it is necessary to make a choice for the mass density profile describing the mass distribution in galaxy clusters. In our analysis, the matter distribution in galaxy clusters is considered to be dominated by DM. We have chosen to describe the cluster mass distribution with a spherically symmetric Navarro–Frenk–White (NFW) mass density profile [63],

$$\begin{aligned} \rho _\mathrm {NFW}(r) = \frac{\rho _{s}}{\frac{r}{r_{s}} \Big ( 1 + \frac{r}{r_{s}} \Big )^{2}} \end{aligned}$$
(31)

where \(\rho _{s}\) and \(r_{s}\) are the characteristic halo density and radius, respectively. Here, the choice of the NFW density profile is motivated by cosmological N-body simulations [63, 64] and observational results based on gravitational lensing [65] (see Sect. 3), both in GR context.

It is useful to express the NFW parameters in terms of the overdensity radius \(r_\varDelta \) and the dimensionless concentration parameter \(c_\varDelta \) as

$$\begin{aligned} \rho _{s}= & {} \frac{\varDelta }{3} \rho _{c} \frac{c_{\varDelta }^{3}}{\mathrm {ln}(1+ c_{\varDelta }) - \frac{c_{\varDelta }}{1 + c_{\varDelta }}}, \end{aligned}$$
(32)
$$\begin{aligned} r_{s}= & {} \frac{r_{\varDelta }}{c_{\varDelta }}, \end{aligned}$$
(33)

where \(r_{\varDelta }\) is the spherical radius within which the mean interior density equals \(\varDelta \) times the critical density \(\rho _c\) of the Universe at the cluster redshift. Instead of working with \(r_{\varDelta }\), it is more common to use \(M_{\varDelta }\), i.e., the total mass contained within the overdensity radius \(r_{\varDelta }\),

$$\begin{aligned} M_{\varDelta } = \frac{4}{3} \pi r_{\varDelta }^{3} \varDelta \rho _{c} = 4 \pi \rho _{s} r_{s}^{3} \bigg [ \mathrm {ln}(1+ c_{\varDelta }) - \frac{c_{\varDelta }}{1 + c_{\varDelta }} \bigg ] . \end{aligned}$$
(34)

In the literature, the typical choice of the overdensity characterizing the halo mass is \(\varDelta = 200\), while higher overdensities, such as \(\varDelta = 500\) and 2500, are used to characterize the properties of halos in their inner regions. In our analysis, we set \(\varDelta =200\). Thus, the free NFW parameters that we have used for our analysis are \(\{ M_{200}, c_{200} \}\).

Once the choice for the mass density profile has been made, it is finally possible to compute analytically the integrals for the extended potentials from Eqs. (26) and (30), which we omit to write here explicitly just for the sake of clarity.

2.5 Gravitational lensing

All the previous machinery is fundamental to calculate the quantity which we will then compare with observational data which, as explained in next section, are based on gravitational lensing event analysis from the clusters of galaxies we have considered.

The general configuration for a gravitational lensing system [66, 67] has a foreground object (the lens) between the observer and the background source of light. The angular diameter distances between the observer and the lens and between the observer and the source are denoted by \(D_{L}\) and \(D_{S}\), respectively, while the angular diameter distance between the lens and the source is \(D_{LS}\). The angular diameter distance as a function of redshift is defined as:

$$\begin{aligned} D_{A}(z) = \frac{c}{1+z} \int _{0}^{z} \frac{dz'}{H(z')} . \end{aligned}$$
(35)

In a \(\varLambda \)CDM scenario, the Hubble function H(z) is given by the first Freedman equation,

$$\begin{aligned} H(z) = H_{0} \sqrt{\varOmega _{m} (1+z)^{3} + \varOmega _{k} (1+z)^{2} + \varOmega _{\varLambda }} . \end{aligned}$$
(36)

In our analysis, we assume as our fiducial background cosmology the one from the latest release of the Planck survey [1], with \(\varOmega _m = 0.308\), \(H_0 = 67.89\) and \(\varOmega _k=0\) (from which \(\varOmega _{\varLambda } = 1-\varOmega _m\)). It is important to note here that we are implicitly assuming that the non-local model we are analyzing behaves on cosmological scales as this chosen \(\varLambda \)CDM one, at least effectively. Any cosmological implication and analysis is out of the scopes of this paper, however it is worth noticing that, according to the Deser and Woodard model [28] which we are essentially adopting here, using non-local term is a natural way to address dark energy behaviour and recover the observed late universe (see also [22]. In this perspective, assuming \(\varLambda \)CDM is in agreement with this result.

It is generally verified that the distances \(D_{L}\) and \(D_{LS}\) are much larger than the physical extension of the lens, so that the latter can be approximated as a two-dimensional system (“thin-lens” approximation). In such a configuration the relation among the angular position of the source \((\mathbf {\theta }_{s})\) and the angular position of the image \((\mathbf {\theta })\) is given by the lens equation:

$$\begin{aligned} \mathbf {\beta } = \mathbf {\theta } - \frac{D_{LS}}{D_S}\, \hat{\mathbf {\alpha }}(\mathbf {\theta }) , \end{aligned}$$
(37)

where \({\hat{\mathbf {\alpha }}}\) is the deflection angle, which, in GR, is defined as:

$$\begin{aligned} \hat{\mathbf {\alpha }} = \frac{2}{c^{2}} \int _{-\infty }^{+\infty } \mathbf {\nabla }_{\perp } \varPhi \mathrm {d}z , \end{aligned}$$
(38)

with \(\mathbf {\nabla }_{\perp }\) being the two-dimensional gradient operator perpendicular to the light propagation and z the line-of-sight direction. The deflection angle is then directly related to the quantity which we will use to constrain our model, the convergence

$$\begin{aligned} \kappa (R) \equiv \frac{1}{c^{2}} \frac{D_{LS}D_{L}}{D_{S}} \int _{-\infty }^{+\infty } \nabla ^{2}_{r} \varPhi (R,z)\mathrm {d}z \end{aligned}$$
(39)

where: R is the two-dimensional projected radius in the lens plane;Footnote 2\(r=\sqrt{R^2+z^2}\) is the three-dimensional radius; \(\nabla ^{2}_{r}\) is the Laplacian operator in spherical coordinates; and c is the speed of light. If we make use of the Poisson equation

$$\begin{aligned} \nabla ^{2}_{r} \varPhi = 4\pi G \rho (r)\, \end{aligned}$$
(40)

we then obtain the final expression for the convergence

$$\begin{aligned} \kappa (R) = \int _{-\infty }^{+\infty } \frac{4 \pi G}{c^{2}} \frac{D_{LS}D_{L}}{D_{S}} \rho (R,z)\mathrm {d}z \equiv \frac{\varSigma }{\varSigma _{cr}}\, \end{aligned}$$
(41)

where \(\varSigma (R)\) is the surface density of the lens, defined as

$$\begin{aligned} \varSigma (R) = \int _{\infty }^{+\infty } \rho (R,z) \mathrm {d}z , \end{aligned}$$
(42)

and \(\varSigma _{c}\) is the critical surface density for gravitational lensing,

$$\begin{aligned} \varSigma _{c} \, \equiv \, \frac{c^{2}}{4\pi G} \, \frac{D_{s}}{D_{ls}D_{l}} . \end{aligned}$$
(43)

If we want to generalize all the results which we have obtained so far, valid in GR, we must express the convergence in its most general form as

$$\begin{aligned} \kappa (R) \equiv \frac{1}{c^{2}} \frac{D_{LS}D_{L}}{D_{S}} \int _{-\infty }^{+\infty } \nabla ^{2}_{r} \left( \frac{\varPhi (R,z) + \varPsi (R,z)}{2}\right) \mathrm {d}z\,\nonumber \\ \end{aligned}$$
(44)

where the potentials \(\varPhi \) and \(\varPsi \) which we are going to use are those defined in Sect. 2.3.

3 The data

The data sets we use for our analysis are obtained from the CLASH program [41]. One of the main goals of the CLASH program was to precisely determine the mass density profiles of 25 high-mass galaxy clusters using deep multi-band HST imaging, in combination with wide-field weak-lensing observations [68]. The CLASH sample contains 20 hot X-ray clusters (\(>5\) keV) with nearly concentric X-ray isophotes and a well-defined X-ray peak located close to the brightest cluster galaxy (BCG). Notice that no lensing preselection was used to avoid a biased sample towards intrinsically concentrated clusters and those systems whose major axis is preferentially aligned with the line of sight. Cosmological hydrodynamical simulations suggest that the CLASH X-ray-selected subsample is mostly composed of relaxed systems (\(\sim 70\%\)) and largely free of such orientation bias [69]. Additionally, the CLASH sample has five clusters selected by their exceptional lensing strength so as to magnify galaxies at high redshift. These clusters often turn out to be dynamically disturbed systems [67].

The CLASH sample spans nearly an order of magnitude in mass () and covers a wide redshift range (\(0.18< z < 0.90\) with a median redshift of \(z_\mathrm {med} = 0.40\)). For each of the 25 clusters, HST weak- and strong-lensing data products are available in the central regions [70]. For 20 of them (16 X-ray-selected and 4 lensing-selected clusters), radial convergence profiles were reconstructed [65] from the combination of ground-based weak-lensing shear and magnification data and HST lensing data.

In our analysis, we focus on 15 X-ray-selected and 4 lensing-selected clusters taken from the CLASH subsample analyzed in [65]. Here, one of the X-ray-selected clusters (RXJ1532) has been discarded because no multiple image systems have been identified in the cluster [70] and thus the mass reconstruction is based only on the wide-field weak-lensing data [65]. Our analysis sample spans a redshift range of \(0.187 \le z \le 0.686\), with a median redshift of \(z_\mathrm {med}= 0.352\). The typical resolution limit of the mass reconstruction, set by the HST lensing data, is 10 arcseconds, which corresponds to \(\approx 35 h^{-1}\) kpc at \(z_\mathrm {med}\). Note that, according to [69], about half of the selected sample clusters are expected to be unrelaxed.

It was found in [65] that the ensemble-averaged surface mass density \(\varSigma (R)\) of the CLASH X-ray-selected subsample is best described by the NFW profile, when GR is considered. The NFW model describes well the DM distribution in clusters, which is the dominant component over the whole cluster scale. The cluster baryons, such as the X-ray-emitting hot gas and BCGs, are sensitive to non-gravitational and local astrophysical phenomena. Accordingly, hydrostatic total mass estimates, based on the X-ray probe, are highly influenced by the dynamical and physical conditions of the cluster. In contrast, gravitational lensing can provide a direct probe of the projected mass distribution in galaxy clusters.

3.1 Statistical analysis

The aim of this work is to constrain the parameters of the non-local model (2) and the parameters describing the NFW profile for each individual cluster. Thus, the vector of our free theoretical parameters is \(\varvec{\theta } = \{c_{200}, \, M_{200}, \, r_{\eta }, \, r_{\xi } \}\). The \(\chi ^{2}\) function for each cluster used in the analysis is defined as

$$\begin{aligned} \chi ^{2} = \big ( \varvec{\kappa ^{theo}}(\varvec{\theta }) - \varvec{\kappa ^{obs}} \big ) \cdot {\mathbf {C}}^{-1} \cdot \big ( \varvec{\kappa ^{theo}}(\varvec{\theta }) - \varvec{\kappa ^{obs}} \big ) , \end{aligned}$$
(45)

where \(\varvec{\kappa ^{obs}}\) is the data vector containing the observed convergence values. The data vector consists of 15 data elements, each representing the value of \(\kappa \) measured in each radial bin. The vector \(\varvec{\kappa ^{theo}}(\varvec{\theta })\) contains theoretical predictions for the model convergence calculated from Eq. (44). Finally, \({\mathbf {C}}\) is the covariance error matrix constructed by [65].

The \(\chi ^{2}\) function is then minimized using our own Monte Carlo Markov Chain (MCMC) code written in \(\texttt {Wolfram Mathematica}\). The convergence of the chains has been checked according to the method proposed in [71]. Full convergence has been reached for 18 out of 19 sample clusters, while MACSJ0717 shows a pathological behaviour. Such behaviour is due to a peculiar shape of the \(\chi ^{2}\) function, with two different minima, which is shared by all the samples, but is especially pronounced in MACSJ0717. This shape is directly related to the degeneracy of the geometrical and the matter effects, as will clearly emerge from the results of our analysis.

In order to assess the validity of our non-local model against the standard GR case, we calculated the Bayesian evidence \({\mathcal {E}}\), so as to provide a statistical meaningful comparison tool between the two theories. We have calculated the Bayesian evidence using the algorithm proposed in [72]. As this algorithm is stochastic, in order to reduce the statistical noise we run it \(\sim 100\) times, obtaining a distribution of values from which we extract the best value of \({\mathcal {E}}\), as the median of the distribution, and the corresponding error.

Since the Bayesian evidence depends on the choice of the priors [73], we have always used uninformative flat priors on the parameters. For each cluster, we have assumed flat positive priors on the NFW parameters, i.e., \(c_{200}>0\) and \(M_{200}>0\) (note that the choice of NFW priors is different from that of [65], who used flat logarithmic priors on \(M_{200}\) and \(c_{200}\)), while for the non-local parameters we use flat logarithmic priors, \(-5<\mathrm {log}\,r_{\eta ,\xi }<5\).

Table 1 CLASH clusters ordered by redshift. For each cluster, we provide \(1\sigma \) constraints on each parameter in the top line, and the minimum values in the \(\chi ^2\) function in bottom one. Units: cluster radii are in kpc. Lensing-selected CLASH clusters are indicated by stars

Finally, the Bayes factors \({\mathcal {B}}\) are computed for each sample cluster. The Bayes factor is defined as the ratio of evidence factors of two models

$$\begin{aligned} {\mathcal {B}}^{\,i}_{j} = \frac{{\mathcal {E}}(M_{i})}{{\mathcal {E}}(M_{j})} \end{aligned}$$
(46)

where \(M_{j}\) is the reference model, which, in this case, is GR. The model selection is then performed by using the so-called Jeffreys scale [74], which states that: for \(\ln {\mathcal {B}}^{\,i}_{j}<0\) there is evidence against the model \(M_{i}\), i.e. the reference model is statistically favored; for \(0<\ln {\mathcal {B}}^{\,i}_{j} <1\) there is no significant evidence in favor of the model \(M_{i}\); for \(1<\ln {\mathcal {B}}^{\,i}_{j} <2.5\) the evidence is substantial; for \(2.5< \ln {\mathcal {B}}^{\,i}_{j} <5\) the evidence is strong; for \(\ln {\mathcal {B}}^{\,i}_{j} >5\) the evidence is decisive.

4 Results and discussion

First of all, we have performed the fits in the classical GR scenario (second column of Table 1), which will be our reference model. We can thereby cross-check our modelling and statistical analysis algorithm with results from the literature for the same sample. In the top panels of Fig. 1, we compare our results with [65], the original work were the lensing data we are using were first obtained and presented. Note that a direct comparison is possible because the same NFW parametrization, \(\{c_{200},M_{200}\}\), has been used, with the same mass modelling, i.e. no further components (gas, galaxies) have been used. The cross-check shows an excellent agreement, with all the nineteen estimates of both NFW parameters, \(c_{200}\) and \(M_{200}\), which agree within the \(1\sigma \) level.

Concerning the non-local model, results are shown in the third column of Table 1. The NFW parameters are well constrained by the lensing data. From the bottom panels of Figs. 1, 3, 4, 5 and 6, one can see that the estimates of concentration parameter \(c_{200}\) show no significant differences between GR and our non-local model, being all consistent with each other at the \(1\sigma \) level, although the non-local model tends to show higher concentrations than GR. On the other hand, this trend is much clearer in the cluster mass \(M_{200}\), whose non-local estimates clearly deviate from the GR ones.

Such increased estimates of \(M_{200}\) are very likely related to the form of the metric potentials and the role/weight of the correction terms. As explained above, if the non-local corrections are small and the \(\varPhi _0\) and \(\varPsi _0\) terms are the dominant contributions, the theory does not reduce to GR, because the potential \(\varPsi _0\) only takes into account 1/3 of the contribution which would be expected. Thus, in the non-local model, the cluster mass (mostly, and slightly for \(c_{200}\)) must be increased to compensate the missing contribution from \(\varPsi _0\) to the convergence. From Figs. 3, 4, 5 and 6, it immediately appears that the \(1\sigma \)- and \(2\sigma \)-regions shift toward higher masses, except for two cases (A2261 and MACSJ0717) where the \(1\sigma \) contours do not overlap.

As a further check, in Fig. 7 we compare the \((c_{200},M_{200})\) constraints obtained in our work for GR and for the non-local model with cM relations from the literature. As expected, the points representing the cM relation for each cluster scatter due to the different physical properties of the samples. However, for GR, the region spanned by the clusters agrees with the bands representing the relation obtained in the literature. Instead, for the non-local model, the same region shifts towards higher concentrations and masses, hence it does not coincide with expected results, especially with the green band derived in [75] using the same sample of our work. The gap is around 2\(\sigma \), thus the non-local model is just slightly statistically disfavoured respect to GR and cannot be discarded.

When we consider the non-local length scales \(r_{\eta }\) and \(r_{\xi }\), the statistics does not appear as much regular as in the case of the NFW parameters. The MCMC sampling does not identify a clear minimum in the \(\chi ^2\) function; instead, the \(\chi ^2\) exhibits a shallow valley in the parameter space. This is due to the fact that the non-local corrections are too small to be effectively detected given the observational uncertainties. We find that the MCMC samples are mostly concentrated in the region \(-1<\mathrm {log}\,r_{\eta , \, \xi }<5\), and only lower limits can be extracted (see column 3 of Table 1).

Fig. 1
figure 1

Concentrations and masses of CLASH clusters: (top panels) comparison between our results and those from [65] in the GR scenario; (bottom panels) comparison between the GR and the nonlocal scenario from our analysis

Fig. 2
figure 2

Gravitational lensing potential (dimensionless) contributions for A611. Left panel: total gravitational potential for GR (black) and nonlocal (blue). Right panel: \(\varPhi (r)\) (green) and \(\varPsi (r)\) (red) contributions; zeroth-order terms are in dark colors, first-order terms are in light color

Fig. 3
figure 3

Contour plot: blue—GR; green—nonlocal

Fig. 4
figure 4

Contour plot: blue—GR; green—nonlocal

Fig. 5
figure 5

Contour plot: blue—GR; green—nonlocal

Lower limits on the typical non-local scales can be set with the present data. They are \(r_{\eta }>4\times 10^{-5} - 7\times 10^{-2}\, \mathrm {kpc}\) and \(r_{\xi }>2\times 10^{-5} - 3\times 10^{-2}\, \mathrm {kpc}\), so that the magnitudes of the non-local corrections \(\phi _{1}\) and \(\psi _{1}\) in Eq. (2122) are \(\phi _{1} \sim 10^{-28} - 10^{-25}\; \mathrm {kpc^{2} \, s^{-2}}\) and \(\psi _{1} \sim 10^{-27} - 10^{-24} \; \mathrm {kpc^{2} \, s^{-2}}\). Thus, such terms might be dominant or of comparable magnitude with respect to the zeroth-order terms, \(\phi _{0}\) and \(\psi _{0}\), when the non-local parameters approach their lower bounds. This result would appear to conflict with the increased estimates of \(c_{200}\) and \(M_{200}\) that we obtained for the non-local model.

However, a deeper insight of the MCMC results shows that it naturally emerges from the degeneracy of the geometrical and mass effects. In fact, given that the theory cannot be reduced to GR, and that \(\varPsi _{NL}\) can only account for \(\varPsi _{GR}/3\), there exist only two possibilities to fit the observations: an increase of estimated cluster mass or a strong contribution of the non-local corrections, \(\varPhi _{1}(r)\) and \(\varPsi _{1}(r)\), to the lensing potential. Both options are statistically viable, leading to an agreement with data at a level statistically equivalent to GR.

This degeneracy of geometrical and matter effects corresponds to a peculiar shape of the \(\chi ^{2}\) function, which presents two minima. Such behaviour is extremely noticeable in MACSJ0717, whose Markov chain does not remain in a specific region and consequently does not converge to particular values of the parameters. In particular, for MACSJ0717, the minimum corresponding to low values of the non-local length scales (i.e. strong non-local correction to the gravitational potential) is remarkably deep and the associated estimates of \(M_{200}\) are extremely low. It follows that the non-local model seems to be able to fit the observations thanks to its non-local geometrical contributions to the lensing potential, with a very good statistical viability. Moreover, the fact that the corresponding \(M_{200}\) estimates are very low means that little matter contribution is necessary to fit data, so that it would open to the possibility to fit the gravitational lensing observations without dark matter. Further studies will be necessary to investigate in detail this possibility.

Fig. 6
figure 6

Contour plot: blue—GR; green—nonlocal

Fig. 7
figure 7

Concentrations and masses of CLASH clusters: comparison between our work and cM relations from the literature. Yellow band: cM relation from [76]; orange band: cM relation from [77]; green band: cM relation for CLASH clusters derived by [75]. The upper and lower limits of the colored bands correspond to the minimum and maximum redshift values of CLASH clusters

On the other hand, the \(\chi ^{2}\) function of our non-local model shows a low sensibility with respect to the non-local parameters, \(r_{\eta }\) and \(r_{\xi }\), so that the \(\chi ^{2}\) minimum corresponding to their lower limits results in an unstable equilibrium point. In contrast, the \(\chi ^2\) minimum corresponding to increased values of the NFW parameters turns out to be a stable equilibrium point and, as a result, the MCMC sampling is concentrated in this specific area.

This behaviour is dominant in all the sample clusters except for MACSJ0717. Such an example is illustrated in Fig. 2. The upper panel shows an overlap of the non-local lensing potential with that of GR, demonstrating that the two theories are able to fit the data at similar levels of significance. In the lower panels, the two contributions to the non-local lensing potential, \(\varPhi (r)\) and \(\varPsi (r)\), are presented separately and each of them is decomposed into its zeroth order and first order terms. One can immediately see that \(\varPhi _{1}(r)\) is completely subdominant with respect to the zeroth order term, which is greater (more negative) than the GR potential due to the increased mass estimate. Thus, the resulting effective potential is dominated by the contribution that corresponds to the “2/3 of GR” minimum. It follows the ostensibly contradictory results: the posterior constraints on the NFW parameters are dominated by regions of high posterior probability, while the lower bounds of the non-local length scales are related to the minimum corresponding to the fully non-local regime. Both solutions can fit data at similar levels of statistical significance, as a result of the matter-geometry degeneracy.

Fig. 8
figure 8

Profiles: blue—GR; red—nonlocal

Fig. 9
figure 9

Profiles: blue—GR; red—nonlocal

Fig. 10
figure 10

Profiles: blue—GR; red—nonlocal

We may now compare our results with those of [42], which are obtained by numerically simulating the S2 star orbits around our Galactic centre. It immediately appears that the two results for \(r_{\eta }\) are not consistent: in [42] the authors obtain a constraint of \(10^{7}\) km \(< r_{\eta } < 10^{10}\) km, while our lower bound is \(r_{\eta } > 10^{14}\)km. Instead, the two estimates for the parameter \(r_{\xi }\) are consistent with each other, because our lower bound \(r_{\xi } > 10^{12}\) km is within that found in [42], \(r_{\xi } > 10^{6}\) km. These results are well expected. Our analysis is performed at galaxy cluster scales, while [42] is at galactic scales. Even though the theoretical parameters should be independent of the test scale, it is difficult to reach a constraint of the AU order if the resolution limit of the galaxy clusters data is of the order of \(\sim 10\) kpc. Notice that the results for \(r_{\xi }\) are consistent, probably due to the fact that this parameter is associated with the scalar field which is not dynamical, but only plays an auxiliary role to localize the original Lagrangian in Eq. (2).

Finally, the computed Bayes factors are presented in the fourth column of Table 1. Using the Jeffreys scale, we can see that there is no evidence in favour of the non-local model with respect to GR. For ten samples we have \(0<\ln {\mathcal {B}}^{\,i}_{j} <1\), for eight samples \(\ln {\mathcal {B}}^{\,i}_{j}\) is slightly negative and for one sample \(\ln {\mathcal {B}}^{\,i}_{j}>1\). However, in both negative and “substantial” cases the logarithm of the Bayes factor is consistent with [0, 1] within few \(\sigma \).

5 Summary and conclusions

Since the non-local theory provides a viable mechanism to explain the late time cosmic acceleration with no need to introduce any form of dark energy, it is of great interest to perform further tests of this model also to address dark matter issues. In fact, its features and the physical consequences of non-locality have to be further analysed both on astrophysical and cosmological scales.

In this work, we have performed a completely new test of the non-local model, using gravitational lensing data at galaxy cluster scales. The theory provides two effective ways to fit the observations at the same level of statistical significance as GR. On the one hand, when the high-value limit of the non-local parameters is reached, the non-local model reduces to a GR-like theory, being able to fit the data at the cost of increased cluster mass. In this scenario, the non-local theory mimics GR at galaxy cluster scales, affecting only the estimated cluster mass. We cross-checked our results by comparing them with \(c_{200}-M_{200}\) relations from the literature, finding that the non-local model is slightly disfavoured with respect to GR. However, further comparisons with the constraints from different probes are necessary.

On the other hand, when the low-value limit of the non-local length scales is reached, the non-local corrections to the lensing potential become larger and comparable to, or even dominant over, the standard zeroth-order terms. In such a scenario, the non-local scenario may be able to mimic GR, neither affecting the mass estimates nor the statistical viability of the model. Moreover, when the non-local contributions becomes completely dominant, the non-local theory seems to be able to fit the lensing observations with extremely low cluster mass. Consequently, it emerges an intriguing possibility to fit data with no dark matter. In order to investigate such a scenario, further studies should be performed, taking into account the baryonic contributions from the hot gas and stellar components in galaxy clusters (Figs. 8, 9, 10.