1 Introduction

The Einstein weak equivalence principle (WEP) is one of the cornerstones of general relativity. It states that the trajectories of any freely falling, uncharged test bodies are independent of their energy, internal structure, or composition [1]. In the parameterized post-Newtonian (PPN) formalism, the WEP requires that the PPN parameter \(\gamma \) be the same for different particles, or for the same type of particle with different energies (hereafter, “different particles” refers to both cases). The WEP has been tested with the time delays of different particles. In 1964, [2] proposed using the time delay between the transmission of radar pulses towards either of the inner planets and the detection of their echoes. The Shapiro time delay can be formulated as \(t_\mathrm{Shapiro}=-\frac{1+\gamma }{c^3}\int _{r_e}^{r_0}\varPsi (r)\mathrm{d}r\), where \(r_e\) and \(r_0\) are the emitting position of the particle and the position of the observer, respectively, c is the speed of light, and \(\varPsi (r)\) is the gravitational potential. If the WEP is violated, two different particles emitted simultaneously and traveling through the same gravitational potential will arrive at different times. The relative Shapiro time delay is given by

$$\begin{aligned} \varDelta t_{\mathrm{gra}} = \frac{|\gamma _1-\gamma _2|}{c^3}\int _{r_e}^{r_0}\varPsi (r)\mathrm{d}r. \end{aligned}$$
(1)
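As a rough numerical illustration of Eq. (1), the sketch below assumes a point-mass potential \(\varPsi (r)=-GM/r\) and a purely radial path, for which the integral has the closed form \(-GM\ln (r_0/r_e)\). These simplifications, the function names and the constants are our own illustrative choices and are not taken from the works cited above.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg
KPC = 3.0857e19    # one kiloparsec, m

def relative_shapiro_delay(delta_gamma, mass_kg, r_e, r_0):
    """|Delta t_gra| from Eq. (1) for an ASSUMED point-mass potential
    Psi(r) = -G M / r integrated along a radial path from r_e to r_0."""
    potential_integral = -G * mass_kg * math.log(r_0 / r_e)
    return abs(delta_gamma * potential_integral) / C**3

def delta_gamma_bound(delay_s, mass_kg, r_e, r_0):
    """Invert Eq. (1): the value of |gamma_1 - gamma_2| obtained if the
    whole delay is attributed to the gravitational term."""
    unit_delay = relative_shapiro_delay(1.0, mass_kg, r_e, r_0)
    return delay_s / unit_delay

# Hypothetical example: a 1e12 solar-mass potential, path from 1 kpc to 1 Mpc.
bound = delta_gamma_bound(1.0, 1e12 * M_SUN, 1.0 * KPC, 1000.0 * KPC)
print(f"Delta gamma for a 1 s delay: {bound:.2e}")
```

The inversion is only meaningful if the observed delay is purely gravitational, which is exactly the assumption questioned in the remainder of this section.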

To date, the observed time delays between different types of particles (e.g. photons, neutrinos, or gravitational waves), or between particles of the same type with different energies, have been used to test the WEP. Examples include the different arrival times of photons and neutrinos from SN1987A [3, 4], the time delay of a PeV-energy neutrino event associated with a giant flare of the blazar PKS B1424-418 [5], photons in different energy bands of gamma-ray bursts (GRBs) [6, 7], polarized photons of GRBs [7], radio signals in different frequency bands from fast radio bursts (FRBs) [8,9,10] and the Crab pulsar [11], and gravitational wave (GW) sources [12,13,14,15]. However, the observed time delay contains several terms:

$$\begin{aligned} \varDelta t_{\mathrm{obs}}=\varDelta t_{\mathrm{int}}+\varDelta t_{\mathrm{LIV}}+\varDelta t_{\mathrm{spec}}+\varDelta t_{\mathrm{DM}}+\varDelta t_{\mathrm{gra}}, \end{aligned}$$
(2)

where \(\varDelta t_{\mathrm{int}}\) is the intrinsic time delay between two particles due to different emission times, \(\varDelta t_{\mathrm{LIV}}\) is the time delay caused by Lorentz-invariance violation (LIV), \(\varDelta t_{\mathrm{spec}}\) is the potential time delay due to a non-zero rest mass of the photon, and \(\varDelta t_{\mathrm{DM}}\) is the time delay contributed by dispersion by free electrons along the line of sight.

In the high-energy range (from keV to GeV), \(\varDelta t_{\mathrm{DM}}\) is negligible. However, most previous work omitted all of the other potential effects on the basis of two assumptions. First, the observed time delay is mainly attributed to the gravitational potential, so the time delays caused by other effects are omitted. Second, the time delays caused by other effects and \(\varDelta t_\mathrm{gra}\) have the same sign, so the observed delay gives an upper limit on the violation of the WEP. Neither assumption is reasonable. For the first assumption, in some cases the \(\varDelta t_{\mathrm{int}}\) term dominates the observed time delay, especially when the observed time delay is very small, which leads to a non-physical constraint on the WEP. In addition, \(\varDelta t_{\mathrm{spec}}\) and \(\varDelta t_{\mathrm{LIV}}\) are strongly correlated with \(\varDelta t_{\mathrm{gra}}\) in the total observed time delay. For the second assumption, in the special case that \(\varDelta t_{\mathrm{other}}\), the delay caused by other effects, and \(\varDelta t_{\mathrm{gra}}\) have opposite signs (e.g. \(\varDelta t_{\mathrm{other}}=-\,0.99\) s and \(\varDelta t_{\mathrm{gra}}=1\) s), a much smaller observed time delay is obtained (0.01 s in this example). This leads to a much tighter, but incorrect, constraint on the WEP. Therefore, the intrinsic time delay \(\varDelta t_{\mathrm{int}}\) and \(\varDelta t_{\mathrm{other}}\) can severely limit the power of this method. To avoid these effects, we propose that strongly lensed cosmic transients can be used to test the WEP. This is the first WEP constraint that corrects for these other effects. The time delay between different particles due to strong gravitational lensing is a powerful tool, which has been used to test Lorentz-invariance violation [16] and the speed of GWs [17, 18].
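The sign-cancellation problem described above can be made concrete with a few lines of arithmetic. In this sketch the conversion factor `K` (the delay produced per unit \(\varDelta \gamma \), i.e. the prefactor and integral of Eq. (1)) and the true violation are hypothetical numbers, chosen only so that \(\varDelta t_{\mathrm{gra}}=1\) s as in the example in the text.

```python
# Hypothetical numbers: K is the delay produced per unit Delta-gamma,
# i.e. K = (1/c^3) |integral of Psi dr| in Eq. (1); 1e7 s is assumed.
K = 1.0e7                      # s
true_delta_gamma = 1.0e-7      # assumed true WEP violation

dt_gra = true_delta_gamma * K  # about 1 s, as in the example in the text
dt_other = -0.99               # s, opposite sign (intrinsic, LIV, ...)
dt_obs = dt_gra + dt_other     # about 0.01 s

# Naive approach: attribute the whole observed delay to gravity.
naive_bound = abs(dt_obs) / K

print(f"true  Delta gamma = {true_delta_gamma:.1e}")
print(f"naive upper bound = {naive_bound:.1e}")
print("bound excludes the true value:", naive_bound < true_delta_gamma)
```

With these numbers the naive "upper limit" is about two orders of magnitude below the assumed true value, which is the sense in which such a constraint is tighter but incorrect.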

2 The method and constraints on WEP

Gravitational lensing is a prediction of general relativity. Since the first observed example of gravitational lensing, the quasar QSO 0957+561A, B [19], it has become a powerful tool in many fields of astronomy, such as probing dark matter halos, large-scale structure, the Hubble constant, and cosmological parameters. Generally, a strongly lensed source produces multiple images. The differences between the arrival times of the images are caused by the Shapiro time delay and by the geometric delay due to the bending of light rays [20]. In a general lens model, the time delay of an image relative to the case in which the source, lens and image lie on a straight line is

$$\begin{aligned} t(\varvec{\theta })=\frac{1+z_l}{c}\frac{d_ld_s}{d_{ls}}\left[ \frac{1}{2}(\varvec{\theta }-\varvec{\beta })^2-\psi (\varvec{\theta })\right] , \end{aligned}$$
(3)

where \(\varvec{\theta }\) and \(\varvec{\beta }\) are the position vectors of the image and source, \(z_l\) is the redshift of the lens, \(d_l\) and \(d_s\) are the angular diameter distances of the lens and source, \(d_{ls}\) is the angular diameter distance from the lens to the source and \(\psi (\varvec{\theta })=\frac{d_{ls}}{d_ld_s}\frac{1+\gamma }{c^2}\int \varPsi (d_l\varvec{\theta },z)\mathrm{d}z\) is the projected gravitational potential [20]. Actually, \(t(\varvec{\theta })\) can be divided into \(t_\mathrm{geo}=\frac{1+z_l}{c}\frac{d_ld_s}{d_{ls}}\frac{1}{2}(\varvec{\theta }-\varvec{\beta })^2\) and \(t_\mathrm{gra}=\frac{1+z_l}{c}\frac{d_ld_s}{d_{ls}}\psi (\varvec{\theta })\), which are the geometric time delay and Shapiro time delay, respectively.

Fig. 1

The geometry of the gravitational lens considered here. Illustration of a strongly lensed cosmic transient which can be used to test WEP

We show the strong lensing of a bright cosmic transient in Fig. 1. In this figure, O and L are the observer and the lens object. \(S_1\) and \(S_2\) are two signals associated with each other in this transient event, for example a GW and its electromagnetic (EM) counterpart, or particles in two different energy bands from a cosmic explosion. \(d_l\), \(d_s\), \(d_{ls}\) and \(\varvec{\beta }\) are the same as those in Eq. (3). \(\varvec{\theta _1}\) and \(\varvec{\theta _2}\) are the positions of the two images formed by the gravitational lens. \(P_1\) and \(P_2\) are the trajectories of the light rays of the two images. We assume that the intrinsic time delay between \(S_1\) and \(S_2\) is \(\varDelta t_{\mathrm{int}}\) (\(S_1\) is earlier than \(S_2\)) and that the first and second images of \(S_1\) arrive at \(t_{11}\) and \(t_{12}\), respectively. Here we simply assume that \(\varDelta t_\mathrm{spec}\) and \(\varDelta t_{\mathrm{LIV}}\) can be omitted; we discuss them in Sect. 3. If the WEP is valid, one can expect that the first and second images of \(S_2\) will arrive at \(t_{21}=t_{11}+(1+z_s)\varDelta t_{\mathrm{int}}\) and \(t_{22}=t_{12}+(1+z_s)\varDelta t_{\mathrm{int}}\), where \(z_s\) is the redshift of the source. If there is a small violation of the WEP, the positions of the images change slightly by \(\delta \theta _{11}\), \(\delta \theta _{12}\), \(\delta \theta _{21}\), \(\delta \theta _{22}\) and the arrival times of the signals become \(t_{11}^\prime \), \(t_{12}^\prime \), \(t_{21}^\prime \), and \(t_{22}^\prime \). Similar to [21], we perform a Taylor expansion of these new arrival times and keep the first-order terms

$$\begin{aligned} t_{11}^\prime= & {} t_{11}+\frac{\partial t}{\partial \theta }|_{\theta _1}\delta \theta _{11}+\frac{\partial t}{\partial \gamma }|_{\theta _1,\gamma _0}(\gamma _1-\gamma _0), \\ t_{12}^\prime= & {} t_{12}+\frac{\partial t}{\partial \theta }|_{\theta _2}\delta \theta _{12}+\frac{\partial t}{\partial \gamma }|_{\theta _2,\gamma _0}(\gamma _1-\gamma _0), \\ t_{21}^\prime= & {} t_{21}+\frac{\partial t}{\partial \theta }|_{\theta _1}\delta \theta _{21}+\frac{\partial t}{\partial \gamma }|_{\theta _1,\gamma _0}(\gamma _2-\gamma _0), \\ t_{22}^\prime= & {} t_{22}+\frac{\partial t}{\partial \theta }|_{\theta _2}\delta \theta _{22}+\frac{\partial t}{\partial \gamma }|_{\theta _2,\gamma _0}(\gamma _2-\gamma _0). \end{aligned}$$

Because the Fermat principle, which we assume is still valid in this case, requires that the lensed image position makes the travel time stationary, \(\frac{\partial t}{\partial \theta }|_{\theta _1}\) and \(\frac{\partial t}{\partial \theta }|_{\theta _2}\) are equal to 0, and the second terms on the right-hand sides of these equations all vanish. Then, by comparing \(t_{21}^\prime -t_{11}^\prime \) with \(t_{22}^\prime -t_{12}^\prime \), the effect of the intrinsic time delay \(\varDelta t_{\mathrm{int}}\) is removed naturally, since it enters both differences identically. The difference between \(\gamma _1\) and \(\gamma _2\) is

$$\begin{aligned} \varDelta \gamma \equiv |\gamma _1-\gamma _2|\le 2(1+\alpha )\frac{|(t_{22}^\prime -t_{12}^\prime )-(t_{21}^\prime -t_{11}^\prime )|}{|t_{22}^\prime -t_{21}^\prime |}, \end{aligned}$$
(4)

where \(\alpha =\varDelta t_{\mathrm{geo}}/\varDelta t_{\mathrm{gra}}\) is the ratio of time delays caused by geometric effect and Shapiro delay effect. Hereafter, we define ‘time delay’ as \(t_{22}^\prime -t_{12}^\prime \) or \(t_{21}^\prime -t_{11}^\prime \), which is the time difference between different particles in the same light path. The ‘strong lensing time delay’ is defined as \(t_{22}^\prime -t_{21}^\prime \) or \(t_{12}^\prime -t_{11}^\prime \), which is the time delay between two different paths due to strong lensing. The value of \(\alpha \) depends on the choice of the lens model.
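The cancellation of the intrinsic delay in Eq. (4) can be checked with a toy model. The sketch below assumes that the geometric delay of each path is independent of \(\gamma \), while the Shapiro delay scales as \((1+\gamma )/2\) relative to its \(\gamma =1\) value (consistent with \(\psi \propto 1+\gamma \) in Eq. (3)). The function names and all numerical values are hypothetical.

```python
def arrival_times(gamma_1, gamma_2, geo, gra_gr, dt_int_obs):
    """Toy arrival times t[i][j] for signal i (0: S1, 1: S2) along path j.

    geo[j]    : geometric delay of path j (s), same for both signals
    gra_gr[j] : Shapiro delay of path j for gamma = 1 (s); it is scaled
                by (1 + gamma) / 2 for other values of gamma
    dt_int_obs: observer-frame intrinsic lag of S2, (1 + z_s) * dt_int
    """
    gammas = (gamma_1, gamma_2)
    lags = (0.0, dt_int_obs)
    return [[geo[j] + 0.5 * (1.0 + gammas[i]) * gra_gr[j] + lags[i]
             for j in range(2)] for i in range(2)]

def delta_gamma_estimate(t, alpha):
    """Right-hand side of Eq. (4) evaluated on the four arrival times."""
    double_difference = (t[1][1] - t[0][1]) - (t[1][0] - t[0][0])
    lensing_delay = t[1][1] - t[1][0]
    return 2.0 * (1.0 + alpha) * abs(double_difference) / abs(lensing_delay)

# Hypothetical lens: path 2 is delayed by 3e6 s (geometry) + 1.2e7 s (Shapiro).
geo, gra_gr = (0.0, 3.0e6), (0.0, 1.2e7)
alpha = (geo[1] - geo[0]) / (gra_gr[1] - gra_gr[0])   # 0.25

for lag in (5.0, 500.0):   # two very different intrinsic delays
    t = arrival_times(1.0, 1.0 + 1e-7, geo, gra_gr, lag)
    print(lag, delta_gamma_estimate(t, alpha))
```

In this noise-free toy model both intrinsic lags return essentially the injected \(\varDelta \gamma =10^{-7}\), because the lag enters \(t_{21}^\prime -t_{11}^\prime \) and \(t_{22}^\prime -t_{12}^\prime \) identically and drops out of their difference.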

To illustrate our method, we use the singular isothermal sphere (SIS) model, which has proved to be a reliable model for lenses. In the SIS model, the stars and other mass components of a galaxy are assumed to be distributed like the particles of an ideal gas. The projected potential and Einstein radius of a SIS lens are \(\psi (\xi )=\frac{d_{ls}}{d_s}\frac{4\pi \sigma _v^2}{c^2}|\xi |\) and \(\theta _E=4\pi \frac{\sigma _v^2}{c^2}\frac{d_{ls}}{d_s}\), respectively, where \(\sigma _v\) is the velocity dispersion and \(\xi \) is the angular distance from the center of the lens [20]. If the lensing is strong, \(\beta <\theta _E\), there are two images of the source at the positions \(\theta _{\pm }=\beta \pm \theta _E\). Then, from Eq. (3), the time delay between the two images is

$$\begin{aligned} \varDelta T_{\mathrm{SIS}} = 2\beta \theta _E\frac{1+z_l}{c}\frac{d_ld_s}{d_{ls}}, \end{aligned}$$
(5)

which is entirely caused by the Shapiro time delay effect (i.e. \(\varDelta T_{\mathrm{SIS}}=t_{\mathrm{gra,\,SIS}}\)). The difference in the arrival times of the two images caused by the geometric time delay is \(\varDelta t_{\mathrm{geo}}=0\), which means that the light trajectories of the two images, \(P_1\) and \(P_2\), have equal lengths. It should be pointed out that the SIS model is an idealized lens model, and real strong lensing by a galaxy is unlikely to be an exact SIS case. For a given strong lensing event, if other observations of the properties of the lens object are available, the value of \(\alpha \) can be calculated with a suitable lens model. Generally, both \(\varDelta t_\mathrm{geo}\) and \(\varDelta t_{\mathrm{gra}}\) are of the order of \(GM/c^3\), where M is the mass of the lens [22]. Therefore, one can expect the parameter \(\alpha \) to be of order unity.
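The SIS result can be verified directly from Eq. (3). The sketch below evaluates the two bracketed terms of Eq. (3) in dimensionless form (angles in arbitrary units, overall factor \(\frac{1+z_l}{c}\frac{d_ld_s}{d_{ls}}\) omitted), using \(\psi (\theta )=\theta _E|\theta |\) and the image positions \(\theta _\pm =\beta \pm \theta _E\). The function name and the sample values are illustrative.

```python
def sis_terms(beta, theta_e):
    """Geometric and potential terms of Eq. (3) for the two SIS images.

    Returns {label: (geometric, potential)} in dimensionless units;
    valid for strong lensing, beta < theta_e.
    """
    terms = {}
    for label, theta in (("plus", beta + theta_e), ("minus", beta - theta_e)):
        geometric = 0.5 * (theta - beta) ** 2   # (1/2)(theta - beta)^2
        potential = theta_e * abs(theta)        # psi(theta) = theta_E |theta|
        terms[label] = (geometric, potential)
    return terms

beta, theta_e = 0.25, 1.0
terms = sis_terms(beta, theta_e)

delta_geo = terms["minus"][0] - terms["plus"][0]   # geometric difference
delta_pot = terms["plus"][1] - terms["minus"][1]   # potential difference

print("geometric difference:", delta_geo)
print("potential difference:", delta_pot, "vs 2*beta*theta_E =", 2 * beta * theta_e)
```

Both images have the same geometric term \(\theta _E^2/2\), so the whole delay between them comes from the potential term and equals \(2\beta \theta _E\), in agreement with Eq. (5) and \(\alpha =0\).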

GRBs are promising tools to constrain the WEP for photons with different energies [6]. Owing to their large luminosities, GRBs can be observed at very high redshifts [23], so a GRB has a much greater probability of being lensed by a galaxy or galaxy cluster. Thanks to the success of the BATSE, Swift and Fermi satellites, the number of detected GRBs keeps increasing. Many studies have searched for strongly lensed GRBs in several GRB catalogs, but no such event has been found [24, 25]. Li and Li (2014) searched for potential lensing events in the BATSE GRB data and found four candidate pairs. The second pair, GRBs 2044 and 2368, has properties closest to those of a strongly lensed GRB event. We use this pair as an example, although they excluded the lensing possibility in their work [25]. The flux ratios in the four considered energy channels appear similar. In addition, the angular separation of the two bursts is \(\varDelta \theta =3.88^\circ \), while their location uncertainties are \(2.88^\circ \) and \(6.06^\circ \). The detected time delay between 2044 and 2368 is about \(1.77\times 10^7\) s, and the time delays between the photons in the 25–60 and 60–110 keV energy channels are \(0.085\pm 0.042\) s for 2044 and \(1.730\pm 0.162\) s for 2368. Assuming a lensed GRB event with similar time delays, we constrain the WEP with our method. From Eq. (4), the constraint on the violation of the WEP between the photons in these two energy bands is \(\varDelta \gamma \le 2(1+\alpha )\frac{1.730-0.085}{1.77\times 10^7}=1.86(1+\alpha )\times 10^{-7}\). If we choose the SIS lens model, \(\alpha =0\), we have \(\varDelta \gamma \le 1.86\times 10^{-7}\). For the Fermi GBM, the expected time to observe one lensed GRB is about 11 years [25]. Since the Fermi GBM has been operating for about 9 years and will be in service for more than another 10 years, it is reasonable to expect a lensed GRB event during its operating period.
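The arithmetic behind this estimate is reproduced in the sketch below. The function name is illustrative, and the propagated uncertainty is our own addition: it assumes that the two quoted errors are independent and Gaussian, which the source does not state.

```python
import math

def wep_bound(dt_image1, dt_image2, lensing_delay, alpha=0.0):
    """Eq. (4): bound on Delta gamma from the energy-channel time delays
    measured in the two images and the strong lensing time delay (all in s)."""
    return 2.0 * (1.0 + alpha) * abs(dt_image2 - dt_image1) / abs(lensing_delay)

# Values quoted in the text for BATSE triggers 2044 and 2368.
dt_2044, err_2044 = 0.085, 0.042
dt_2368, err_2368 = 1.730, 0.162
lensing_delay = 1.77e7

bound_sis = wep_bound(dt_2044, dt_2368, lensing_delay)            # alpha = 0
bound_alpha1 = wep_bound(dt_2044, dt_2368, lensing_delay, 1.0)    # alpha = 1

# ASSUMPTION: independent Gaussian errors on the two channel delays.
sigma_sis = 2.0 * math.hypot(err_2044, err_2368) / lensing_delay

print(f"SIS (alpha = 0): {bound_sis:.2e}")
print(f"alpha = 1      : {bound_alpha1:.2e}")
print(f"propagated 1-sigma uncertainty (SIS): {sigma_sis:.1e}")
```

Since \(\alpha \) is expected to be of order unity, a realistic lens model changes the bound by a factor of a few but not its order of magnitude.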

3 Discussion

In this section, we discuss two potential biases. The first is the time delay potentially caused by a non-zero rest mass of the photon, \(\varDelta t_{\mathrm{spec}}\), and by LIV, \(\varDelta t_{\mathrm{LIV}}\). These two effects are similar in that both cause an energy-dependent photon speed. If the photon has a non-zero rest mass, higher-energy photons travel faster than lower-energy photons. In contrast, LIV leads to the opposite effect: higher-energy photons travel more slowly because of the so-called vacuum dispersion effect [26,27,28,29]. In both cases, the time delay terms \(\varDelta t_{\mathrm{spec}}\) and \(\varDelta t_{\mathrm{LIV}}\) arise from a difference in the traveling speeds of photons with different energies.

In Fig. 1, there are two traveling paths, and for each path there are two images formed by two different kinds of particles. Let us assume that the speeds of the two kinds of particles are \(v_1\) and \(v_2\) and that the lengths of the two traveling paths are \(L_1\) and \(L_2\), respectively. The time delays caused by this effect are then \(L_1/v_1-L_1/v_2\) and \(L_2/v_1-L_2/v_2\), respectively. Their contribution to the difference of time delays \(|(t_{22}^\prime -t_{12}^\prime )-(t_{21}^\prime -t_{11}^\prime )|\), which we use in Eq. (4) to constrain the WEP, is \(\frac{(L_1-L_2)(v_2-v_1)}{v_1v_2}\). Consider a strong lensing system with a strong lensing time delay of about 1 year, in which the time delays between different particles, \(t_{22}^\prime -t_{12}^\prime \) and \(t_{21}^\prime -t_{11}^\prime \), are about 1 s; the difference of the time delays \(|(t_{22}^\prime -t_{12}^\prime )-(t_{21}^\prime -t_{11}^\prime )|\) should then also be of the order of 1 s. For this kind of strong lensing system, the difference between \(L_1\) and \(L_2\) should be of the order of 1 light year. However, the distance of a typical cosmic source is of the order of 1 Gpc, so we have

$$\begin{aligned}&\frac{(L_1-L_2)(v_2-v_1)}{v_1v_2}/\left( \frac{L_1}{v_1}-\frac{L_1}{v_2}\right) \nonumber \\&\quad = \frac{L_1-L_2}{L_1} \sim \frac{1 \mathrm{ly}}{1 \mathrm{Gpc}}\sim 10^{-9}. \end{aligned}$$
(6)

Therefore, even if the time delay caused by a non-zero photon rest mass or by LIV is hundreds of times larger than the observed time delay between two different particles, \(\frac{(L_1-L_2)(v_2-v_1)}{v_1v_2}\) contributes only a very small part of \(|(t_{22}^\prime -t_{12}^\prime )-(t_{21}^\prime -t_{11}^\prime )|\). It is thus reasonable to omit the terms \(\varDelta t_{\mathrm{spec}}\) and \(\varDelta t_{\mathrm{LIV}}\), which means that our method also removes the potential effects of a non-zero photon rest mass and LIV.
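The order of magnitude in Eq. (6) can be checked with standard unit conversions. In the sketch below, the 100 s speed-induced delay is a hypothetical value standing in for "hundreds of times larger" than a 1 s observed delay, and the function name is illustrative.

```python
LY = 9.4607e15     # one light year, m
GPC = 3.0857e25    # one gigaparsec, m

def residual_fraction(path_difference_m, path_length_m):
    """Eq. (6): fraction of a speed-induced delay along one path that
    survives in the difference of time delays between the two paths."""
    return path_difference_m / path_length_m

fraction = residual_fraction(1.0 * LY, 1.0 * GPC)

# Hypothetical speed-induced delay along one path (rest mass or LIV), in s.
speed_induced_delay = 100.0
residual = fraction * speed_induced_delay

print(f"(L1 - L2) / L1         = {fraction:.1e}")
print(f"residual in difference = {residual:.1e} s")
```

Even for a speed-induced delay of 100 s, the residual left in the double difference is many orders of magnitude below the 1 s scale considered in the text.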

The second point is that the two traveling paths may introduce other contributions to the observed time delay. For a typical strong lensing system, the Einstein radius is about \(4\times 10^{-6} (\frac{M}{10^{11}M_\odot })^{0.5}(\frac{D}{1\,\mathrm{Gpc}})^{-0.5}\) rad [20], which is much smaller than the relativistic beaming angle \(\theta \sim \Gamma ^{-1}\sim 10^{-3}\), where \(\Gamma \) is the Lorentz factor of the GRB’s jet [30]. Therefore, the difference in the special-relativistic boost factor caused by the different viewing angles can reasonably be neglected. In addition, the different paths may also experience different gravitational potentials, such as weak lensing by large-scale structure and the potentials of our Galaxy and the local galaxy cluster, which may also contribute to the observed time delay. However, from Eq. (1), the difference between the Shapiro time delays of the two paths depends on the total difference of the gravitational potential integrated along the whole paths, which includes the potentials of large-scale structure, the local galaxy cluster and our Galaxy. Therefore, this effect is already accounted for, even if the gravitational potentials along the two paths differ slightly.
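The comparison between the Einstein radius and the beaming angle can be reproduced numerically. The sketch below assumes the point-mass form \(\theta _E=\sqrt{4GM/(c^2D)}\), with D standing for the effective distance \(d_ld_s/d_{ls}\); this form reproduces the scaling quoted above, but it is our assumption about the expression behind it. The Lorentz factor \(\Gamma =10^3\) is chosen to match the quoted beaming angle.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg
GPC = 3.0857e25    # one gigaparsec, m

def einstein_radius_rad(mass_msun, distance_gpc):
    """ASSUMED point-mass Einstein radius sqrt(4 G M / (c^2 D)) in radians,
    where D is the effective distance d_l d_s / d_ls."""
    mass_kg = mass_msun * M_SUN
    return math.sqrt(4.0 * G * mass_kg / (C**2 * distance_gpc * GPC))

theta_e = einstein_radius_rad(1e11, 1.0)
beaming_angle = 1.0 / 1000.0   # 1 / Gamma for an assumed Gamma = 1000

print(f"Einstein radius : {theta_e:.1e} rad")
print(f"beaming angle   : {beaming_angle:.1e} rad")
print(f"ratio           : {theta_e / beaming_angle:.1e}")
```

The two lines of sight are separated by an angle of order \(\theta _E\), which is a small fraction of the jet beaming cone, so both images sample essentially the same part of the jet.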

4 Conclusions

In this paper, we have proposed a method to constrain the violation of the WEP with strongly lensed cosmic transients. Our method does not require any assumption about the physical mechanism of the transient. Moreover, because it uses the difference of the time delays of the two lensed images, the potential effects of the intrinsic time delay \(\varDelta t_{\mathrm{int}}\), the LIV time delay \(\varDelta t_{\mathrm{LIV}}\) and the non-zero rest mass time delay \(\varDelta t_{\mathrm{spec}}\) are naturally removed. By analyzing the properties of the gravitational lensing time delay, we find that the parameter \(\alpha \), which represents the ratio of the time delays between two lensed images caused by the geometric and Shapiro time delay effects, is zero for the SIS lens model. Therefore, the light ray paths of the two lensed images have the same length in the SIS lens model, which means that any test that depends on the path length difference, such as testing the difference between the speeds of light and GWs, cannot work in the SIS lens model. Even for a realistic lens, one should take the parameter \(\alpha \) into account when constraining the speed of GWs; otherwise, an unreasonably tight constraint will be obtained.

As the number of detected GRBs increases, the detection of a strongly lensed GRB is promising, although no such event has been found so far. Assuming a strongly lensed GRB whose time delay between images is about \(1.77\times 10^7\) s and whose time delays between the photons in the two energy channels are \(0.085\pm 0.042\) and \(1.730\pm 0.162\) s, one can constrain the WEP at about the \(10^{-7}\) level. Besides GRBs, lensed FRBs and GW events with EM counterparts are also potential candidates for testing the WEP with our method [31,32,33]. Interestingly, lensing of FRBs has been proposed to probe dark matter [34, 35].

From Eq. (4), the time delay measurements of a single strong gravitational lensing event can give a tight constraint on \(\varDelta \gamma \). Eq. (4) also shows that the upper limit on \(\varDelta \gamma \) is proportional to the uncertainty of the time delay measurement and inversely proportional to the strong lensing time delay. Typical strong lensing by a galaxy gives multiple images with strong lensing time delays of days to months [36]. Because the strong lensing time delay is proportional to the mass of the lens, the time delay due to a galaxy cluster lens is much longer [37], which gives a much stricter constraint on \(\varDelta \gamma \). If a strong lensing time delay of several months is observed and the time delay difference between the two images is about 0.1 s, the constraint on the violation of the WEP can reach \(\varDelta \gamma <10^{-8}\). The measurement of the time delay between two different particles depends on the accuracy of the timing measurement of the transient event. The timing accuracies for FRBs, GWs and GRBs are about 0.01 ms [38], \(10^{-4}\) ms [31] and 0.1 s [39], respectively. Recently, the detections of GW170817 and its electromagnetic counterparts [14, 40] have also encouraged us to seek a more reliable constraint on the WEP using strong lensing of GWs and their electromagnetic counterparts. Therefore, the precision of WEP tests can be improved by several orders of magnitude in the future, if the lens is a galaxy cluster and the strongly lensed cosmic transients, such as GW events and FRBs, have much more precise time delay measurements.
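The scaling discussed above can be tabulated for the three kinds of transients. This sketch assumes that the measurable difference of time delays is limited by the quoted timing accuracy of each transient, and it adopts a hypothetical strong lensing time delay of three months with \(\alpha =0\); the names and these choices are illustrative only.

```python
DAY = 86400.0                      # s
lensing_delay = 90.0 * DAY         # hypothetical three-month lensing delay

# Timing accuracies quoted in the text, converted to seconds.
timing_accuracy_s = {
    "FRB": 0.01e-3,     # 0.01 ms
    "GW": 1.0e-4 * 1e-3,  # 1e-4 ms
    "GRB": 0.1,         # 0.1 s
}

def forecast(lensing_delay_s, alpha=0.0):
    """Eq. (4) with the time delay difference set to the timing accuracy
    (ASSUMPTION: timing accuracy is the limiting uncertainty)."""
    return {name: 2.0 * (1.0 + alpha) * sigma / lensing_delay_s
            for name, sigma in timing_accuracy_s.items()}

bounds = forecast(lensing_delay)
for name, value in sorted(bounds.items(), key=lambda item: item[1]):
    print(f"{name:>3}: Delta gamma < {value:.1e}")
```

Under these assumptions the achievable bound simply tracks the timing accuracy, so FRBs and GW events would improve on a GRB-based constraint by the ratio of their timing accuracies.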