1 Introduction

Recent observations of type Ia supernovae (SNIa) provide the main evidence that the current Universe is in an accelerating phase of expansion [1]. Cosmic microwave background (CMB) data indicate that the present Universe also has negligible space curvature [2]. Therefore, if we assume the Friedmann–Robertson–Walker (FRW) model, in which the effects of nonhomogeneities are neglected, the acceleration must be driven by a dark energy component \(X\) (a matter fluid violating the strong energy condition \(\rho _{{X}}+3p_{{X}}\ge 0\)). This kind of energy represents roughly 70 % of the matter content of the current Universe. Because the nature as well as the mechanism of the cosmological origin of the dark energy component are unknown, some alternative theories try to eliminate the dark energy option by modifying the theory of gravity itself. The main prototype of this kind of model is a class of covariant brane models based on the Dvali–Gabadadze–Porrati (DGP) model [3] as generalized to cosmology by Deffayet [4]. The simplest explanation of a dark energy component is the cosmological constant with the effective equation of state \(p=-\rho \), but then the problem of its smallness appears, and hence of its relatively recent dominance. Although the \(\Lambda \)CDM model offers a possibility of explaining the observational data, it is only an effective theory containing the enigmatic theoretical term, the cosmological constant \(\Lambda \). Numerous other candidates for a dark energy description have also been proposed, like the evolving scalar field [5], usually referred to as quintessence, the phantom energy [6, 7], the Chaplygin gas [8], etc. Some authors believe that the dark energy problem belongs to the quantum gravity domain [9].

Recent Planck observations still favor the standard cosmological model [10], especially at the high multipoles. However, in this model there are some problems with understanding the values of the density parameters for both dark matter and dark energy. The question is why the energy densities of the vacuum and of dark matter are of the same order in the current Universe. A very popular methodology for addressing this problem is to treat the coefficient of the equation of state as a free parameter (the wCDM model), which is then estimated from the astronomical and astrophysical data. The observations from the CMB and baryon acoustic oscillation (BAO) data sets give \(w_X=-1.13^{+0.24}_{-0.23}\) at the 95 % confidence level [10].

An alternative to this phantom dark energy mechanism of alleviating the coincidence problem is to consider an interaction between dark matter and dark energy: the interaction model. Many authors have investigated observational constraints on the interaction model. Costa et al. [11] concluded that the interaction models are in agreement with the admissible observational data, which can provide some argument toward the consistency of the measured density parameters. Yang and Xu [12] constrained some interaction models under a choice of ansatz for the energy transfer mechanism. In this investigation the joint geometrical tests give a stricter constraint on the interaction model if information from the large scale structure of the Universe [\(f\sigma _{8}(z)\) data] is included. These authors found a nonzero interaction rate at the \(3\sigma \) level. This means that the recent cosmic observations favor it, but with a rather small interaction between the two dark sectors. However, the measurement of the redshift-space distortion could rule out a large interaction rate at the \(1\sigma \) level. Zhang and Liu [13], using the SNIa observations, \(H(z)\) data (OHD), CMB, and secular Sandage–Loeb data, obtained a small value of the interaction parameter: \(\delta =-0.019\pm 0.01\,(1 \sigma ), \pm 0.02\,(2\sigma )\).

In all interaction models a specific ansatz for the form of the interaction is postulated. There are infinitely many such models, each with a different form of interaction, and there is some kind of theoretical bias or degeneracy, analogous to that coming from the choice of the potential form in scalar field cosmology. Szydlowski [14] proposed the idea of estimating the interaction parameter without any ansatz for the form of the interaction.

These theoretical models are consistent with the observations; they are able to explain the phenomenon of the accelerated expansion of the Universe. But should we really prefer such models over the \(\Lambda \)CDM one? All observational constraints indicate that the \(\Lambda \)CDM model still fits the observational data well, yet a small value of the interaction remains admissible. To answer this question we should use model comparison methods to confront the existing cosmological models with the observations at hand. We choose the information and Bayesian criteria of model selection, which are based on Occam's razor (principle), the well-known and effective instrument in science, to obtain a definite answer to the question of whether the interacting \(\Lambda \)CDM model can be rejected.

Let us assume that we have \(N\) pairs of measurements \((y_i,x_i)\) and that we want to find the relation between the \(y\) and \(x\) variables. Suppose that we can postulate \(k\) possible relations \(y\equiv f_l(x,\bar{\theta })\), where \(\bar{\theta }\) is the vector of the unknown model parameters and \(l=1,\dots ,k\). With the assumption that our observations come with uncorrelated Gaussian errors with mean \(\mu _i=0\) and standard deviation \(\sigma _i\), the goodness of fit of the theoretical model is measured by the quantity \(\chi ^2\) given by

$$\begin{aligned} \chi ^2=\sum _{i=1}^{N} \frac{(f_l(x_i,\bar{\theta }) - y_i)^2}{\sigma _i^2}=-2\ln L, \end{aligned}$$
(1)

where \(L\) is the likelihood function. Within the particular family of models \(f_l\), the one that minimizes the \(\chi ^2\) quantity is denoted by \(f_l(x,\hat{\bar{\theta }})\). The best model from our set of \(k\) models \(\{f_1(x,\hat{\bar{\theta }}),\dots ,f_k(x,\hat{\bar{\theta }})\}\) might then seem to be the one with the smallest value of \(\chi ^2\). But this method can give misleading results. Generally speaking, a more complex model yields a smaller value of \(\chi ^2\); thus the most complex one would always be chosen as the best from the set under consideration.
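This pitfall can be illustrated in a few lines of code. The following minimal sketch (in Python, with synthetic straight-line data; all numerical values are illustrative) fits polynomials of increasing degree and shows that \(\chi ^2\) alone always rewards the more complex model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a straight line plus Gaussian noise of known sigma.
sigma = 0.5
x = np.linspace(0.0, 5.0, 30)
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, x.size)

# Candidate relations f_l: polynomials of increasing degree (complexity).
for degree in (1, 2, 3, 5):
    theta_hat = np.polyfit(x, y, degree)          # best-fit parameters
    residuals = np.polyval(theta_hat, x) - y
    chi2 = np.sum(residuals**2 / sigma**2)        # chi^2 = -2 ln L up to a constant
    print(f"degree {degree}: chi2 = {chi2:.2f}")  # decreases as complexity grows
```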

A clue is given by Occam's principle, known also as Occam's razor: "If two models describe the observations equally well, choose the simplest one." This principle has an aesthetic as well as an empirical justification. Let us quote a simple example which illustrates this rule [15]. In Fig. 1 a black box is observed together with a white one behind it. One can postulate two models: first, that there is one box behind the black box; second, that there are two boxes of identical height and color behind the black box. Both models explain our observations equally well. According to Occam's principle we should accept the simpler explanation, namely that there is only one white box behind the black one. Is it not more probable that there is only one box rather than two boxes of the same height and color?

Fig. 1 The illustration of Occam's principle

We cannot use this principle directly because situations in which two models explain the observations equally well are rare. But in information theory, as well as in Bayesian theory, there are methods of model comparison which incorporate such a rule.

In information theory there are no true models. There is only reality, which can be approximated by models depending on some number of parameters. The best model from the set under consideration should be the best approximation to the truth. The information lost when the truth is approximated by a given model is measured by the so-called Kullback–Leibler (KL) information, so the best model should minimize this quantity. It is impossible to compute the KL information directly because it depends on the truth, which is unknown. Akaike [16] found an approximation to the KL quantity, which is called the Akaike information criterion (AIC), given by

$$\begin{aligned} \text {AIC}=-2\ln \mathcal {L} +2d, \end{aligned}$$
(2)

where \(\mathcal {L}\) is the maximum of the likelihood function and \(d\) is the number of model parameters. The model which is the best approximation to the truth from the set under consideration has the smallest value of the AIC quantity. It is convenient to evaluate the differences between the AIC quantities computed for the models in the set and the AIC of the best one. Those differences (\(\Delta _{\text {AIC}}\)) are easy to interpret and allow a quick assessment of the 'strength of evidence' for a considered model with respect to the best one. Models with \(0 \le \Delta _{\text {AIC}}\le 2\) have substantial support (evidence), those with \(4<\Delta _{\text {AIC}}\le 7\) have considerably less support, while models with \(\Delta _{\text {AIC}} > 10\) have essentially no support with respect to the best model.

It is worth noting that the complexity of the model is interpreted here as the number of its free parameters that can be adjusted to fit the model to the observations. If the models under consideration fit the data equally well, according to the Akaike rule the best one is that with the smallest number of model parameters (the simplest one in this approach).
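The AIC bookkeeping of Eq. (2) and the \(\Delta _{\text {AIC}}\) differences are straightforward to compute; a minimal sketch, with invented best-fit \(\chi ^2\) values used purely for illustration:

```python
def aic(chi2_min, d):
    """Akaike information criterion, Eq. (2), with -2 ln L = chi2_min
    at the best fit and d free model parameters."""
    return chi2_min + 2 * d

# Hypothetical best-fit chi^2 values for three nested models.
models = {"A (d=2)": (100.0, 2), "B (d=3)": (99.1, 3), "C (d=5)": (98.7, 5)}

aics = {name: aic(c, d) for name, (c, d) in models.items()}
best = min(aics.values())
for name, value in aics.items():
    # Delta_AIC <= 2: substantial support; > 10: essentially none.
    print(f"model {name}: AIC = {value:.1f}, Delta_AIC = {value - best:.1f}")
```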

In the Bayesian framework the best model (from the model set under consideration) is that which has the largest value of probability in the light of the data (so-called posterior probability) [17]

$$\begin{aligned} P(M_{i}|D)=\frac{P(D|M_{i})P(M_{i})}{P(D)}, \end{aligned}$$
(3)

where \(P(M_{i})\) is a prior probability for the model \(M_{i}\), \(D\) denotes the data, \(P(D)\) is the normalization constant,

$$\begin{aligned} P(D)= \sum _{i=1}^{k} P(D|M_{i})P(M_{i}). \end{aligned}$$
(4)

\(P(D|M_{i})\) is the marginal likelihood, also called the evidence,

$$\begin{aligned} P(D|M_{i})=\int P(D|\bar{\theta },M_{i})P(\bar{\theta }|M_{i}) \ \mathrm{d} \bar{\theta } \equiv E_{i}, \end{aligned}$$
(5)

where \(P(D|\bar{\theta },M_{i})\) is the likelihood under model \(i\) and \(P(\bar{\theta }|M_{i})\) is the prior probability for \(\bar{\theta }\) under model \(i\).

Let us note that we could include Occam's principle by assigning a greater prior probability to the simpler model, but this is not necessary and is rarely done in practice. Usually one assumes that there is no evidence to favor one model over another, which leads to assigning equal prior values to all models under consideration. It is convenient to evaluate the posterior ratio for the models under consideration, which in the case of flat priors for the models reduces to the evidence ratio, called the Bayes factor,

$$\begin{aligned} B_{ij} = \frac{P(D|M_i)}{P(D|M_j)}. \end{aligned}$$
(6)

The interpretation of twice the natural logarithm of the Bayes factor is as follows: \(0<2\ln B_{ij}\le 2\) counts as weak evidence, \(2<2\ln B_{ij}\le 6\) as positive evidence, \(6<2\ln B_{ij}\le 10\) as strong evidence, and \(2\ln B_{ij}> 10\) as very strong evidence against model \(j\) compared to model \(i\). This quantity is our Occam's razor. Let us simplify the problem to illustrate how this principle works here [15, 18].
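For a model with few parameters the evidence (5) and the Bayes factor (6) can be computed by direct numerical integration. The sketch below is a toy one-parameter example, assuming Gaussian data with unknown mean and a flat prior (all numbers illustrative and unrelated to the cosmological analysis); it shows how a needlessly wide prior is penalized:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(0.1, 1.0, 50)      # N = 50 draws with sigma = 1

def log_like(theta):
    return -0.5 * np.sum((data - theta)**2)

def evidence(delta, num=4001):
    """E = int L(theta) P(theta) dtheta, Eq. (5), with a flat prior
    P(theta) = 1/delta on the interval [-delta/2, delta/2]."""
    theta = np.linspace(-delta / 2, delta / 2, num)
    like = np.exp([log_like(t) for t in theta])
    return np.trapz(like, theta) / delta

# Model 1: an economical prior; model 2: a needlessly wide prior.
B12 = evidence(2.0) / evidence(20.0)          # Bayes factor, Eq. (6)
print(f"2 ln B_12 = {2 * np.log(B12):.2f}")   # ~ 4.6: positive evidence
```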

Assume that \(\bar{P}(\bar{\theta }|D,M)\) is the non-normalized posterior probability for the vector \(\bar{\theta }\) of the model parameters. In this notation \(E=\int \bar{P}(\bar{\theta }|D,M)\,\mathrm{d}\bar{\theta }\). Suppose that the posterior has a strong peak at its maximum \(\bar{\theta }_{\text {MOD}}\). It is then reasonable to approximate the logarithm of the posterior by its Taylor expansion in the neighborhood of \(\bar{\theta }_{\text {MOD}}\), which yields

$$\begin{aligned} \bar{P}(\bar{\theta }|D,M)&= \bar{P}(\bar{\theta }_\text {MOD}|D,M) \nonumber \\&\quad \times \exp \left[ -\frac{1}{2}(\bar{\theta }-\bar{\theta }_{\text {MOD}})^T C^{-1}(\bar{\theta }-\bar{\theta }_{\text {MOD}})\right] , \end{aligned}$$
(7)

where \(\left[ C^{-1} \right] _{ij} = -\left[ \frac{\partial ^2\ln \bar{P}(\bar{\theta }|D,M)}{\partial \theta _i\partial \theta _j}\right] _{\bar{\theta }=\bar{\theta }_\text {MOD}}\). The posterior is thus approximated by a Gaussian distribution with mean \(\bar{\theta }_\text {MOD}\) and covariance matrix \(C\). The evidence then has the form

$$\begin{aligned} E&= \bar{P}(\bar{\theta }_{\text {MOD}}|D,M) \nonumber \\&\times \int \exp \left[ -\frac{1}{2}(\bar{\theta }-\bar{\theta }_{\text {MOD}})^T C^{-1}(\bar{\theta }-\bar{\theta }_{\text {MOD}})\right] \ \mathrm{d} \bar{\theta }. \end{aligned}$$
(8)

Because the posterior has a strong peak near its maximum, the dominant contribution to the integral comes from the neighborhood of \(\bar{\theta }_\text {MOD}\); the contribution from other regions of \(\bar{\theta }\) can be ignored, so we can extend the limits of integration to the whole of \(R^d\). With this assumption one obtains \(E=(2\pi )^{\frac{d}{2}}\sqrt{\det C}\,\bar{P}(\bar{\theta }_{\text {MOD}}|D,M)= (2\pi )^{\frac{d}{2}}\sqrt{\det C}\,P(D|\bar{\theta }_\text {MOD},M)P(\bar{\theta }_\text {MOD}|M)\). Suppose that the likelihood function has a sharp peak at \(\hat{\bar{\theta }}\) and that the prior for \(\bar{\theta }\) is nearly flat in the neighborhood of \(\hat{\bar{\theta }}\). In this case \(\hat{\bar{\theta }}=\bar{\theta }_{\text {MOD}}\) and the expression for the evidence takes the form \(E=\mathcal {L}(2\pi )^{\frac{d}{2}}\sqrt{\det C}\,P(\hat{\bar{\theta }}|M)\). The quantity \((2\pi )^{\frac{d}{2}}\sqrt{\det C}\,P(\hat{\bar{\theta }}|M)\) is called the Occam factor (OF). In the case of one model parameter with a flat prior \(P(\theta |M)=\frac{1}{\Delta \theta }\), the \(\mathrm{OF} =\frac{\sqrt{2\pi }\,\sigma }{\Delta \theta }\), which can be interpreted as the ratio of the volume occupied by the posterior to the volume occupied by the prior in parameter space. The more of the parameter space the prior wastes, the smaller the value of the evidence. It is worth noting that the evidence does not penalize parameters which are unconstrained by the data [19].
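The same toy model as above can be used to check the Laplace approximation and the Occam factor against the direct integration; a sketch assuming unit data variance (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(0.1, 1.0, 50)

def laplace_evidence(data, delta):
    """Laplace approximation E = L_max (2 pi)^(1/2) sqrt(C) P(theta_hat | M)
    for the toy model above: Gaussian data of unit variance, flat prior."""
    theta_hat = data.mean()                    # sharp peak of the likelihood
    sigma_post = 1.0 / np.sqrt(data.size)      # sqrt(C) for unit data variance
    log_lmax = -0.5 * np.sum((data - theta_hat)**2)
    occam_factor = np.sqrt(2.0 * np.pi) * sigma_post / delta
    return np.exp(log_lmax) * occam_factor

# Close to the direct integration above; halving the prior width delta
# doubles the Occam factor and hence the evidence.
print(laplace_evidence(data, 2.0), laplace_evidence(data, 1.0))
```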

As the evidence is hard to evaluate, an approximation to this quantity was proposed by Schwarz [20], the so-called Bayesian information criterion (BIC), given by

$$\begin{aligned} \text {BIC}=-2\ln \mathcal {L}+d\ln N, \end{aligned}$$
(9)

where \(N\) is the number of data points. The best model from a set under consideration is the one which minimizes the BIC quantity. One can notice the similarity between the AIC and BIC quantities, though they come from different approaches to the model selection problem. The dissimilarity is seen in the so-called penalty term \(ad\), which penalizes more complex models (complexity is identified here with the number of free model parameters). One can evaluate the factor by which an additional parameter must improve the goodness of fit in order to be included in the model. This factor must be greater than \(a\): equal to \(2\) in the AIC case and to \(\ln N\) in the BIC case. Notice that the latter depends on the number of data points.

It can be shown that there is the simple relation between the BIC and the Bayes factor,

$$\begin{aligned} 2 \ln B_{ij} = -(\text {BIC}_i - \text {BIC}_j). \end{aligned}$$
(10)

The quantity \(B_{ij}\) is the Bayes factor for the hypothesis (model) \(i\) against the hypothesis (model) \(j\). We categorize this evidence against the model \(j\) using the following ranking. The evidence against the model \(j\) is not worth more than a bare mention when twice the natural logarithm of the Bayes factor (or minus the difference between the BICs) is \(0< 2\ln B_{ij} \le 2\), is positive when \(2< 2\ln B_{ij} \le 6\), is strong when \(6< 2\ln B_{ij}\le 10\), and is very strong when \( 2\ln B_{ij} > 10\).
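A minimal sketch of Eqs. (9) and (10), with invented \(\chi ^2\) values and parameter counts used purely for illustration:

```python
import numpy as np

def bic(chi2_min, d, N):
    """Bayesian information criterion, Eq. (9)."""
    return chi2_min + d * np.log(N)

def two_ln_bayes_factor(bic_i, bic_j):
    """2 ln B_ij approximated from the BIC difference, Eq. (10)."""
    return -(bic_i - bic_j)

# Illustrative numbers only: model j has one extra parameter and a
# marginally better fit over N = 600 data points.
N = 600
bic_i, bic_j = bic(560.0, 2, N), bic(558.5, 3, N)
print(f"2 ln B_ij = {two_ln_bayes_factor(bic_i, bic_j):.2f}")
# ~ 4.9: positive evidence against the more complex model j
```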

It should be pointed out that the model selection methods presented here are widely used in the context of cosmological model comparisons [18, 19, 21–40]. We should keep in mind that conclusions based on such quantities depend on the data at hand. Let us recall the example with the black box. Suppose that we take a few steps toward the box so that we can see a difference between the heights of the left and right sides of the white box. Our conclusion now changes.

Let us quote an example taken from [30]. Assume that we want to compare the Newtonian and Einsteinian theories in the light of data coming from a laboratory experiment in which general relativistic effects are negligible. In this situation the Bayes factor between the Newtonian and Einsteinian theories will be close to unity. But comparing the general relativistic and Newtonian explanations of the deflection of a light ray that just grazes the Sun's surface gives a Bayes factor of \(\sim 10^{10}\) in favor of the former (and even greater with more accurate data).

We share George Efstathiou's opinion [41–43] that there is no sound theoretical basis for considering dynamical dark energy, whereas we are beginning to see an explanation for a small cosmological constant emerging from a more fundamental theory. In our opinion the \(\Lambda \)CDM model has the status of a satisfactory effective theory. Efstathiou argued why the cosmological constant should be given a higher weight as a candidate for the dark energy description than dynamical dark energy. In this argumentation Occam's principle is used to point out the more economical model explaining the observational data.

The main aim of this paper is to compare the simplest cosmological model, the \(\Lambda \)CDM model, with its generalization in which an interaction between the dark energy and matter sectors is allowed, using the methods described above.

2 Interacting \(\Lambda \)CDM model

The interaction interpretation of the continuity condition (conservation condition) has been investigated in the context of the coincidence problem since the paper of Zimdahl [44]; for recent developments in this area see Olivares et al. [45, 46], and see also Le Delliou et al. [47] for a discussion of recent observational constraints.

Let us consider two basic equations which determine the evolution of FRW cosmological models,

$$\begin{aligned} \frac{\ddot{a}}{a}&=-\frac{1}{6}(\rho +3p) ,\end{aligned}$$
(11)
$$\begin{aligned} \dot{\rho }&=-3H(\rho +p). \end{aligned}$$
(12)

Equation (11) is called the acceleration equation and Eq. (12) is the conservation (or adiabatic) condition. Equation (11) can be rewritten in a form analogous to the Newtonian equation of motion,

$$\begin{aligned} \ddot{a}=-\frac{\partial V}{\partial a}, \end{aligned}$$
(13)

where \(V=V(a)\) is the potential function of the scale factor \(a\). To evaluate \(V(a)\) from (13), it is useful to rewrite (12) in the new equivalent form

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t}(\rho a^3) + p \frac{\mathrm{d}}{\mathrm{d}t}(a^3)=0. \end{aligned}$$
(14)

From (11) we obtain

$$\begin{aligned} \frac{\partial V}{\partial a}=\frac{1}{6}(\rho +3p)a. \end{aligned}$$
(15)

It is convenient to calculate the pressure \(p\) from (14) and then substitute into (15). After simple calculations we obtain from (15)

$$\begin{aligned} \frac{\partial V}{\partial a} = - \frac{1}{6}\left( a^2 \frac{\mathrm{d}\rho }{\mathrm{d}a}+2a\rho \right) = -\frac{1}{6}\frac{\mathrm{d}}{\mathrm{d}a}\left( \rho a^2\right) . \end{aligned}$$
(16)

Therefore, integrating with respect to \(a\) (and setting the irrelevant additive constant to zero),

$$\begin{aligned} V(a)=-\frac{\rho a^2}{6}. \end{aligned}$$
(17)

In Eq. (17) \(\rho \) denotes the effective energy density of the fluid filling the Universe.

This gives a very simple interpretation of (11): the evolution of the Universe is equivalent to the motion of a particle of unit mass in a potential well parameterized by the scale factor. In this procedure of reducing the FRW evolution to the investigation of a dynamical system of Newtonian type, we assume only that the effective energy density satisfies the conservation condition. We do not assume the conservation condition for each energy component separately (i.e., non-interacting matter sectors).

Equations (11) and (12) admit a first integral, usually called the Friedmann first integral. This first integral has a simple interpretation in the particle-like description of the FRW cosmology, namely energy conservation. We have

$$\begin{aligned} \frac{\dot{a}^2}{2}+V(a)=E=-\frac{k}{2}, \end{aligned}$$
(18)

where \(k\) is the curvature constant and \(V\) is given by Eq. (17).
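The particle-like description can be verified numerically: integrating Eq. (13) with the potential (17) conserves the first integral (18) along the trajectory. A sketch for a flat \(\Lambda \)CDM model, in units with \(H_0=1\) (parameter values illustrative):

```python
import numpy as np
from scipy.integrate import solve_ivp

Om, Ol = 0.3, 0.7                  # illustrative flat LCDM parameters, k = 0

def V(a):                          # V(a) = -rho_eff a^2 / 6 with rho_eff = 3 H^2
    return -0.5 * (Om / a + Ol * a**2)

def dV_da(a):
    return 0.5 * Om / a**2 - Ol * a

def rhs(t, y):                     # Eq. (13): a'' = -dV/da
    a, adot = y
    return [adot, -dV_da(a)]

adot0 = np.sqrt(-2.0 * V(1.0))     # Eq. (18) with E = -k/2 = 0 at a = 1
sol = solve_ivp(rhs, (0.0, 1.0), [1.0, adot0], rtol=1e-10, atol=1e-12)

a, adot = sol.y
print(np.max(np.abs(adot**2 / 2 + V(a))))   # ~ 0: the first integral is conserved
```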

Let us consider the Universe filled with two fluid components,

$$\begin{aligned} \rho =\rho _{{m}} + \rho _X, \quad p=0+w_X\rho _X, \end{aligned}$$
(19)

where \(\rho _{{m}}\) denotes the energy density of the usual dust matter and \(\rho _X\) denotes the energy density of dark energy satisfying the equation of state \(p_X=w_X\rho _X\), where \(w_X=w_X(a)\). Then Eq. (14) can be separated into the dark matter and dark energy sectors, which in general can interact,

$$\begin{aligned}&\frac{\mathrm{d}}{\mathrm{d}t}(\rho _{{m}} a^3) + 0 \cdot \frac{\mathrm{d}}{\mathrm{d}t}(a^3)=\Gamma ,\end{aligned}$$
(20)
$$\begin{aligned}&\frac{\mathrm{d}}{\mathrm{d}t}(\rho _X a^3) + w_X(a)\rho _X \frac{\mathrm{d}}{\mathrm{d}t}(a^3)=-\Gamma . \end{aligned}$$
(21)

In our previous paper [48] it was assumed that

$$\begin{aligned} \Gamma =\alpha a^n \frac{\dot{a}}{a}, \end{aligned}$$
(22)

which enables us to integrate (20), since \(\Gamma =\alpha a^{n-1}\dot{a}=\frac{\mathrm{d}}{\mathrm{d}t}\big (\frac{\alpha }{n}a^{n}\big )\); together with (21) rewritten in terms of the scale factor, this gives

$$\begin{aligned}&\rho _m = \frac{C}{a^3}+\frac{\alpha }{n}a^{n-3},\end{aligned}$$
(23)
$$\begin{aligned}&\frac{\mathrm{d} \rho _X}{\mathrm{d}a}+\frac{3}{a}(1+w_X(a))\rho _X=-\alpha a^{n-4}. \end{aligned}$$
(24)

The solution of the homogeneous part of equation (24) can be written in terms of the average \(\overline{w_X}(a)\) as

$$\begin{aligned} \rho _X=\rho _{X,0}a^{-3(1+\overline{w_X}(a))}, \end{aligned}$$
(25)

where

$$\begin{aligned} \overline{w_X}(a)=\frac{\int w_X(a)\,\mathrm{d}(\ln a)}{\int \mathrm{d}(\ln a)}. \end{aligned}$$
(26)

The solution of the nonhomogeneous equation (24) is

$$\begin{aligned} \rho _X&= - \alpha \left[ \int _1^a a^{n-1+3\overline{w_X}(a)}\mathrm{d}a\right] a^{-3(1+\overline{w_X}(a))} \nonumber \\&\quad +\, \frac{C_X}{a^{3(1+\overline{w_X}(a))}}. \end{aligned}$$
(27)

Finally we obtain

$$\begin{aligned} \rho _{\text {eff}}&\equiv 3H^2 +3\frac{k}{a^2}=\rho _m+\rho _{X} \nonumber \\&= \frac{C_m}{a^3}+\frac{\alpha }{n}a^{n-3} +\frac{C_X}{a^{3(1+\overline{w_X}(a))}} \nonumber \\&\quad - \alpha \left[ \int _1^a a^{n-1+3\overline{w_X}(a)}\mathrm{d}a\right] a^{-3(1+\overline{w_X}(a))}. \end{aligned}$$
(28)

The second and last terms originate from the interaction between the dark matter and dark energy sectors.

Let us consider the simplest case, \(\overline{w_X}(a)=\text {const}=w_X\). Then the integration in (27) can be performed explicitly and we obtain

$$\begin{aligned} \rho _{\text {eff}}=\frac{C_m}{a^3}+\frac{C_X}{a^{3(1+w_X)}}+\frac{C_\text {int}}{a^{3-n}} \end{aligned}$$
(29)

where \(C_{\text {int}}=\frac{\alpha }{n}-\frac{\alpha }{n+3w_X}\). In this case we obtain one additional term, scaling like \(a^{n-3}\), in \(\rho _{\text {eff}}\) and hence in the Friedmann first integral. It is convenient to rewrite the Friedmann first integral in a new form, using dimensionless density parameters. Then we obtain

$$\begin{aligned} \left( \frac{H}{H_0}\right) ^2&= \Omega _{{m},0}(1+z)^3+\Omega _{k,0}(1+z)^2 \nonumber \\&\quad +\, \Omega _{\text {int}}(1+z)^{3-n} + \Omega _{X,0}(1+z)^{3(1+w_X)}. \end{aligned}$$
(30)

Note that this additional power law term related to the interaction can also be interpreted as a Cardassian or polytropic term [49, 50] (one can easily show that the assumed form of the interaction always generates a correction of type \(a^m\), \(m=n-1\), in the potential of the \(\Lambda \)CDM model, and vice versa). Another interpretation of this term might originate from decaying-Lambda cosmology, in which the Lambda term is parametrized by the scale factor [51].
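Equation (30) translates directly into the Hubble function that enters all the likelihoods of the next section; a minimal sketch (parameter values illustrative, with the dark energy density parameter fixed by the constraint \(E^2(0)=1\)):

```python
import numpy as np

def E2(z, Om0, Ok0, Oint0, n, wX=-1.0):
    """Dimensionless (H/H0)^2 of Eq. (30) for the interacting model;
    OX0 follows from the constraint E2(0) = 1."""
    OX0 = 1.0 - Om0 - Ok0 - Oint0
    zp1 = 1.0 + z
    return (Om0 * zp1**3 + Ok0 * zp1**2
            + Oint0 * zp1**(3 - n) + OX0 * zp1**(3 * (1 + wX)))

# Illustrative values; Oint0 = 0 recovers the LCDM Hubble function.
z = np.linspace(0.0, 2.0, 5)
print(np.sqrt(E2(z, Om0=0.3, Ok0=0.0, Oint0=0.01, n=1.0)))
```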

In the next section we compare the above model, under the assumption \(\overline{w_X}(a)=\text {const}=-1\), with the \(\Lambda \)CDM model.

3 Data

To estimate the parameters of the two models we used the modified CosmoMC code [52, 53] with the implemented nested sampling algorithm MultiNest [54, 55].

We used the observational data of 580 type Ia supernovae (the Union2.1 compilation [56]), 31 observational data points of the Hubble function from [57–66] collected in [67], the measurements of BAO from the Sloan Digital Sky Survey (SDSS-III) combined with the 2dF Galaxy Redshift Survey [68–71], the 6dF Galaxy Survey [72, 73], and the WiggleZ Dark Energy Survey [74–76]. We also used information coming from determinations of the Hubble function based on the Alcock–Paczyński test [77, 78]. This test is very restrictive in the context of modified gravity models.

The likelihood function for the type Ia supernova data is defined by

$$\begin{aligned} L_{\text {SN}} \propto \exp \left[ -\frac{1}{2} \sum _{i,j}\left( \mu _{i}^{\text {obs}} - \mu _{i}^{\text {th}}\right) C_{ij}^{-1} \left( \mu _{j}^{\text {obs}} - \mu _{j}^{\text {th}}\right) \right] , \end{aligned}$$
(31)

where \(C_{ij}\) is the covariance matrix with the systematic errors, \(\mu _{i}^{\text {obs}}=m_{i}-M\) is the distance modulus, \(\mu _{i}^{\text {th}}=5\log _{10}D_{Li} + \mathcal {M}=5\log _{10}d_{Li} + 25\), \(\mathcal {M}=-5\log _{10}H_{0}+25\), and \(D_{Li}=H_{0}d_{Li}\), where \(d_{Li}\) is the luminosity distance given by \(d_{Li}=(1+z_{i})c\int _{0}^{z_{i}} \frac{\mathrm{d}z'}{H(z')}\) (with the assumption \(k=0\)).
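A sketch of the supernova likelihood (31) in code, using a toy diagonal covariance matrix (all numbers illustrative):

```python
import numpy as np

def ln_like_sn(mu_obs, mu_th, cov):
    """Logarithm of the supernova likelihood, Eq. (31), with the full
    covariance matrix C (systematic errors included)."""
    delta = mu_obs - mu_th
    return -0.5 * delta @ np.linalg.solve(cov, delta)

# Toy example with three supernovae and a diagonal covariance matrix.
mu_obs = np.array([35.1, 38.0, 40.2])
mu_th = np.array([35.0, 38.1, 40.1])
cov = np.diag([0.02, 0.03, 0.05])
print(ln_like_sn(mu_obs, mu_th, cov))
```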

For \(H(z)\) the likelihood function is given by

$$\begin{aligned} L_{H_z} \propto \exp \left[ - \sum _i\frac{\left( H^{\text {th}}(z_i)-H^{\text {obs}}_i\right) ^2}{2 \sigma _i^2} \right] , \end{aligned}$$
(32)

where \(H^{\text {th}}(z_i)\) denotes the theoretically predicted Hubble function and \(H^{\text {obs}}_i\) the observed value.

The likelihood function for the BAO data is characterized by

$$\begin{aligned} L_{\text {BAO}} \propto \exp \left[ -\frac{1}{2} \sum _{i,j}\left( d^{\text {th}}(z_i)-d^{\text {obs}}_i\right) C_{ij}^{-1} \left( d^{\text {th}}(z_j)-d^{\text {obs}}_j\right) \right] , \end{aligned}$$
(33)

where \(C_{ij}\) is the covariance matrix with the systematic errors, \(d^{\text {th}}(z_i)\equiv r_s(z_d) \left[ (1+z_i)^2 D_\mathrm{A}^2(z_i)\frac{cz_i}{H(z_i)} \right] ^{-\frac{1}{3}}\), \(r_s(z_d)\) is the sound horizon at the drag epoch, and \(D_\mathrm{A}\) is the angular diameter distance.

The likelihood function for the information coming from the Alcock–Paczyński test is given by

$$\begin{aligned} L_{AP} \propto \exp \left[ - \sum _i\frac{\left( \mathrm{AP}^{\text {th}}(z_i)-\mathrm{AP}^{\text {obs}}_i\right) ^2}{2 \sigma _i^2} \right] \end{aligned}$$
(34)

where \(\mathrm{AP}^{\text {th}}(z_i)\equiv \frac{H(z_i)}{H_0 (1+z_i)}\).

Finally, we used the likelihood function for the CMB shift parameter \(R\) [79], which is defined by

$$\begin{aligned} L_{\text {CMB}} \propto \exp \left[ -\frac{1}{2}\frac{(R^{\text {th}}-R^{\text {obs}})^2}{\sigma _{\mathcal {A}}^2} \right] \end{aligned}$$
(35)

where \(R^{\text {th}}=\frac{\sqrt{\Omega _{{m}} H_0^2}}{c}(1+z_{*})D_\mathcal {A}(z_{*})\), \(D_\mathcal {A} (z_{*})\) is the angular diameter distance to the last scattering surface, \(R^{\text {obs}}=1.7477\), and \(\sigma _{\mathcal {A}}^{-2}=48976.33\) [80].

The total likelihood function \(L_{\text {tot}}\) is defined as

$$\begin{aligned} L_{\text {tot}}=L_{\text {SN}}L_{H_z}L_{\text {BAO}}L_{\text {CMB}}L_{\mathrm{AP}}. \end{aligned}$$
(36)
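In practice one works with logarithms, so the product (36) becomes a sum of log-likelihoods; a schematic sketch (the individual functions are assumed to be defined as in Eqs. (31)–(35)):

```python
def ln_like_total(params, parts):
    """Sum of the individual log-likelihoods, i.e. the logarithm of
    Eq. (36); `parts` is a list of functions returning ln L_i(params)."""
    return sum(ln_like(params) for ln_like in parts)

# e.g. ln_like_total(theta, [ln_like_sn, ln_like_hz, ln_like_bao,
#                            ln_like_cmb, ln_like_ap])
```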

4 Results

4.1 The model parameter estimation

The results of the estimation of the parameters of the \(\Lambda \)CDM and the interacting \(\Lambda \)CDM models are presented in Table 1. Given the likelihood function (31), we first estimated the model parameters using the Union2.1 data alone. Next, the parameter estimations were performed with the joint data of Union2.1, \(H(z)\), BAO, and the Alcock–Paczyński test [likelihood functions (31)–(34)]. Finally, we estimated the model parameters with the joint data enlarged by the CMB data [the total likelihood function (36)].

Table 1 The means of the marginalized posterior PDFs with 68 % confidence levels for the parameters of the models. Parameter values from the joint posterior probabilities are shown in brackets. Estimations were made using the Union2.1, \(H(z)\), BAO, Alcock–Paczyński test, and CMB \(R\) data sets

The value of the interaction parameter \(\Omega _{\text {int},0}\) is very small for all data sets. In particular, the result for the second data set [Union2.1, \(H(z)\), BAO, AP data] indicates that the interaction is probably negligible. There is also no clear indication of the direction of the interaction, if it is a physical effect: for the Union2.1 data set alone the interaction parameter \(\Omega _{\text {int},0}\) is negative, which together with the greater value of \(\Omega _{{m},0}\) in the interacting \(\Lambda \)CDM model implies a flow from the dark energy sector to the matter sector, while for the data set consisting of all the data the situation is the opposite.

The uncertainties of the estimated model parameters are presented in two ways: as 68 % confidence levels in Table 1 and as marginalized probability distributions in Figs. 2 and 3.

Fig. 2 Posterior constraints for the interaction model. Joint probability distributions for \(h_{100}\), \(\Omega _{{M},0}\), \(\Omega _{\text {int}}\), and \(m\) with each other, as well as marginalized probability distributions for each variable. Solid lines denote the 68 and 95 % confidence intervals of the fully marginalized probabilities; the colors illustrate the mean likelihood of the sample. Top: estimations with the Union2.1 data only. Middle: estimations made using the Union2.1, \(H(z)\), BAO, and Alcock–Paczyński test data sets. Bottom: estimations made using the Union2.1, \(H(z)\), BAO, Alcock–Paczyński test, and CMB \(R\) data sets

Fig. 3 Posterior constraints for the \(\Lambda \)CDM model. Joint probability distributions for \(h_{100}\) and \(\Omega _{{M},0}\) with each other, as well as marginalized probability distributions for each variable. Solid lines denote the 68 and 95 % confidence intervals of the fully marginalized probabilities; the colors illustrate the mean likelihood of the sample. Top: estimations with the Union2.1 data only. Middle: estimations made using the Union2.1, \(H(z)\), BAO, and Alcock–Paczyński test data sets. Bottom: estimations made using the Union2.1, \(H(z)\), BAO, Alcock–Paczyński test, and CMB \(R\) data sets

4.2 The likelihood ratio test

We begin our statistical analysis with the likelihood ratio test. In this test one of the models (the null model) is nested in a second model (the alternative model) by fixing one of the second model's parameters. In our case the null model is the \(\Lambda \)CDM model, the alternative model is the interacting \(\Lambda \)CDM model, and the parameter in question is \(\Omega _{\text {int}}\). We have

$$\begin{aligned} H_0 :\Omega _{\text {int}}&= 0, \\ H_1 :\Omega _{\text {int}}&\ne 0. \end{aligned}$$

The statistic is given by

$$\begin{aligned} \lambda = 2 \ln \left( \frac{L(H_1|D)}{L(H_0|D)} \right) = 2\left( \frac{\chi ^2_{\Lambda \text {CDM}}}{2} - \frac{\chi ^2_{\text {int}}}{2} \right) = \chi ^2_{\Lambda \text {CDM}} - \chi ^2_{\text {int}}, \end{aligned}$$
(37)

where \(L(H_1|D)\) is the likelihood of the interacting \(\Lambda \)CDM model and \(L(H_0|D)\) is the likelihood of the \(\Lambda \)CDM model, obtained for the three different data sets. The statistic \(\lambda \) has the \(\chi ^2\) distribution with \(df=n_1-n_0=2\) degrees of freedom, where \(n_1\) is the number of parameters of the alternative model and \(n_0\) is the number of parameters of the null model. The results are presented in Table 2. In all three cases the \(p\) values are greater than the significance level \(\alpha = 0.05\), so the null hypothesis cannot be rejected. In other words, we cannot reject the hypothesis that there is no interaction between the dark matter and dark energy sectors.
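A sketch of the test in code, with illustrative \(\chi ^2\) values (not those of Table 2); the survival function of the \(\chi ^2\) distribution gives the \(p\) value:

```python
from scipy.stats import chi2

def lr_test(chi2_lcdm, chi2_int, df=2):
    """Likelihood ratio test, Eq. (37): lambda = chi2_LCDM - chi2_int,
    referred to a chi^2 distribution with df degrees of freedom."""
    lam = chi2_lcdm - chi2_int
    return lam, chi2.sf(lam, df)           # sf gives the p value

lam, p = lr_test(563.0, 562.1)             # illustrative chi^2 values
print(f"lambda = {lam:.2f}, p = {p:.3f}")  # p > 0.05: H0 not rejected
```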

Table 2 The results of the likelihood ratio test for the \(\Lambda \)CDM model (null model) and the interacting \(\Lambda \)CDM model (alternative model): the values of \(\chi ^2_{\text {int}}\) and \(\chi ^2_{\Lambda \text {CDM}}\), the test statistic \(\lambda \), and the corresponding \(p\) value (\(df=4-2=2\)). Estimations were made using the Union2.1, \(H(z)\), BAO, Alcock–Paczyński test, and CMB \(R\) data sets

4.3 The model comparison using the AIC, BIC, and Bayes evidence

To obtain the values of the AIC and BIC quantities we performed the \(\chi ^2=-2\ln L\) minimization procedure after marginalization over the \(H_0\) parameter in the range \(\langle 60,80 \rangle \). The results are presented in Table 3.

Table 3 Values of \(\chi ^2\), AIC, \(\Delta \)AIC (with respect to the \(\Lambda \)CDM model), BIC, and the Bayes factor. Estimations were made using the Union2.1, \(H(z)\), BAO, Alcock–Paczyński test, and CMB \(R\) data sets

Regardless of the data set, the differences of the AIC quantities lie in the interval \((3.4, 4)\), just below the interval \((4,7)\) which would indicate considerably less support for the interacting \(\Lambda \)CDM model. This means that while the \(\Lambda \)CDM model should be preferred over the interacting \(\Lambda \)CDM model, the latter cannot be ruled out.

However, we can arrive at a decisive conclusion by employing the Bayes factor. The differences of the BIC quantities are greater than 10, with values in the interval \((12,13)\), for all data sets. Thus the Bayes factor indicates very strong evidence against the interacting \(\Lambda \)CDM model compared to the \(\Lambda \)CDM model. Therefore we are strongly convinced that we should reject the interaction between the dark energy and dark matter sectors on the grounds of Occam's principle.

5 Conclusion

We considered the cosmological model with dark energy represented by the cosmological constant and the model with an interaction between dark matter and dark energy (the interacting \(\Lambda \)CDM model). These models were studied statistically using the available astronomical data and then compared using tools taken from information theory as well as from Bayesian theory. In both cases the model selection is based on Occam's principle, which states that if two models describe the observations equally well we should choose the simpler one. According to the Akaike and Bayesian information criteria the model complexity is interpreted in terms of the number of free model parameters, while according to the Bayesian evidence a more complex model occupies a greater volume of the parameter space.

Anyone using Bayesian methods in astronomy and cosmology should be aware of the ongoing debate about not only the pros but also the cons of this approach. Efstathiou provided a critique of the evidence ratio approach, indicating difficulties in defining models and priors [81]. Jenkins and Peacock [82] called attention to the noise in the data, which does not allow one to accept or reject a model based solely on whether the evidence ratio reaches some threshold value. That is the reason why we also used the AIC, which is based on information theory.

The observational constraints on the parameter values which we have obtained confirm previous results that if the interaction between dark energy and matter is a real effect, it must be very small. Therefore it seems natural to ask whether a cosmology with interaction between dark energy and matter is plausible.

At the beginning of our model selection analysis we performed the standard likelihood ratio test. The conclusion of this test was a failure to reject the null hypothesis that there is no interaction between the matter and dark energy sectors at the significance level \(\alpha =0.05\). This was the first clue against the interacting \(\Lambda \)CDM model. The \(\Delta \)AIC between the two models was less conclusive: while the \(\Lambda \)CDM model received more support, the interacting \(\Lambda \)CDM model could not be rejected. On the other hand, the Bayes factor gave a decisive result: there is very strong evidence against the interacting \(\Lambda \)CDM model compared with the \(\Lambda \)CDM model. Given the weak or almost non-existent support for the interacting \(\Lambda \)CDM model, and bearing in mind Occam's razor, we are inclined to reject this model.

There is also a theoretical argument against the interacting \(\Lambda \)CDM model. If we consider the \(H^2\) formula which is the basis of the estimation, there is a degeneracy: one cannot distinguish the effects of the interaction from the effect of a varying equation of state \(w(z)\) depending on time or redshift.

As was noted by Kunz [83], there is a dark degeneracy problem: the effect of the interaction cannot be distinguished from the effect of an additional non-interacting fluid with the constant equation of state \(w_{\text {int}}=-n/3\). Therefore, if we consider a mixture of all three non-interacting fluids, we obtain the coefficient of the equation of state for the dark energy and interacting fluid in the form

$$\begin{aligned} w_{\text {dark}}&= \frac{p_X+ p_{\text {int}}}{C_X(1+z)^{3(1+w_X)} + C_{\text {int}}(1+z)^{3-n}} \nonumber \\&= \frac{w_{X}C_X(1+z)^{3(1+w_{X})} + w_{\text {int}}C_{\text {int}}(1+z)^{3-n}}{C_X(1+z)^{3(1+w_X)} + C_{\text {int}}(1+z)^{3-n}}. \end{aligned}$$
(38)