AIC, BIC, Bayesian evidence against the interacting dark energy model

Recent astronomical observations have indicated that the Universe is in a phase of accelerated expansion. While there are many cosmological models which try to explain this phenomenon, we focus on the interacting $\Lambda$CDM model, in which an interaction between the dark energy and dark matter sectors takes place. This model is compared to its simpler alternative, the $\Lambda$CDM model. To choose between these models the likelihood ratio test was applied as well as the model comparison methods (employing Occam's principle): the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the Bayesian evidence. Using the current astronomical data: SNIa (Union2.1), $h(z)$, BAO, the Alcock--Paczynski test and CMB, we evaluated both models. The analyses based on the AIC indicated that there is less support for the interacting $\Lambda$CDM model when compared to the $\Lambda$CDM model, while those based on the BIC indicated that there is strong evidence against it in favor of the $\Lambda$CDM model. Given the weak or almost nonexistent support for the interacting $\Lambda$CDM model, and bearing in mind Occam's razor, we are inclined to reject this model.


Introduction
Recent observations of type Ia supernovae (SNIa) provide the main evidence that the current Universe is in an accelerating phase of expansion [1]. Cosmic microwave background (CMB) data indicate that the present Universe also has a negligible space curvature [2]. Therefore if we assume the Friedmann-Robertson-Walker (FRW) model, in which the effects of nonhomogeneities are neglected, then the acceleration must be driven by a dark energy component X (a matter fluid violating the strong energy condition $\rho_X + 3p_X \geq 0$). This kind of energy represents roughly 70 % of the matter content of the current Universe. Because the nature as well as the mechanism of the cosmological origin of the dark energy component are unknown, some alternative theories try to eliminate the dark energy option by modifying the theory of gravity itself. The main prototype of this kind of models is a class of covariant brane models based on the Dvali-Gabadadze-Porrati (DGP) model [3] as generalized to cosmology by Deffayet [4]. The simplest explanation of a dark energy component is the cosmological constant with the effective equation of state $p = -\rho$, but then the problem of its smallness appears, and hence of its relatively recent dominance. Although the $\Lambda$CDM model offers a possibility of explaining the observational data, it is only an effective theory which contains the enigmatic theoretical term, the cosmological constant $\Lambda$. Numerous other candidates for a dark energy description have also been proposed, like an evolving scalar field [5], usually referred to as quintessence, the Chaplygin gas [8], the phantom energy [6,7], etc. Some authors believe that the dark energy problem belongs to the quantum gravity domain [9].
Recent Planck observations still favor the standard cosmological model [10], especially for the high multipoles. However, in this model there are some problems with understanding the values of the density parameters for both dark matter and dark energy. The question is why the energies of vacuum and dark matter are of the same order for the current Universe. A very popular methodology to solve this problem is to treat the equation of state coefficient as a free parameter, i.e. the wCDM model, which should be estimated from the astronomical and astrophysical data. The observations from the CMB and baryon acoustic oscillation (BAO) data sets give $w_X = -1.13^{+0.24}_{-0.23}$ at the 95 % confidence level [10].
An alternative to this idea of the phantom dark energy mechanism of alleviating the coincidence problem is to consider an interaction between dark matter and dark energy: the interaction model. Many authors have investigated observational constraints on the interaction model. Costa et al. [11] concluded that the interaction models are in agreement with the admissible observational data, which can provide some argument toward consistency of the measured density parameters. Yang and Xu [12] constrained some interaction models under the choice of an ansatz for the energy transfer mechanism. From this investigation the joint geometrical tests show a stricter constraint on the interaction model if we include information from the large scale structure [$f\sigma_8(z)$ data] of the Universe. These authors have found the interaction rate in the 3σ region. This means that the recent cosmic observations favor it, but with a rather small interaction between both dark sectors. However, the measurement of the redshift-space distortion could rule out a large interaction rate in the 1σ region. Zhang and Liu [13], using the SNIa observations, $H(z)$ data (OHD), CMB, and secular Sandage-Loeb data, obtained a small value of the interaction parameter. In all interaction models a specific ansatz for the form of the interaction is postulated. There are infinitely many such models with different forms of interaction, and there is some kind of theoretical bias or degeneracy, analogous to the one coming from the choice of the potential form in scalar field cosmology. Szydlowski [14] proposed the idea of estimating the interaction parameter without any ansatz for the form of the interaction.
These theoretical models are consistent with the observations; they are able to explain the phenomenon of the accelerated expansion of the Universe. But should we really prefer such models over the $\Lambda$CDM one? All observational constraints show that the $\Lambda$CDM model still gives a good fit to the observational data. But from these constraints a small value of the interaction is still admissible. To answer this question we should use some model comparison methods to confront the existing cosmological models with the observations at hand. We choose the information and Bayesian criteria of model selection, which are based on Occam's razor (principle), the well-known and effective instrument in science, to obtain a definite answer of whether the interacting $\Lambda$CDM model can be rejected. Let us assume that we have N pairs of measurements $(y_i, x_i)$ and that we want to find the relation between the y and x variables. Suppose that we can postulate k possible relations $y \equiv f_i(x, \theta)$, where $\theta$ is the vector of the unknown model parameters and $i = 1, \ldots, k$. With the assumption that our observations come with uncorrelated Gaussian errors with a mean $\mu_i = 0$ and a standard deviation $\sigma_i$, the goodness of fit for the theoretical model is measured by the quantity $\chi^2$ given by
$$\chi^2 = \sum_{i=1}^{N} \frac{(y_i - f(x_i, \theta))^2}{\sigma_i^2} = -2\ln L + \mathrm{const},$$
where L is the likelihood function. For a particular family of models $f_l$ we denote the one minimizing the $\chi^2$ quantity by $f_l(x, \hat\theta)$. The best model from our set of k models $f_1(x, \hat\theta), \ldots, f_k(x, \hat\theta)$ could be the one with the smallest value of the quantity $\chi^2$. But this method could give us misleading results. Generally speaking, for a more complex model the value of $\chi^2$ is smaller, thus the most complex one would be chosen as the best from the set under consideration. A clue is given by Occam's principle, known also as Occam's razor: "If two models describe the observations equally well, choose the simplest one". This principle has an aesthetic as well as an empirical justification.
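As a toy illustration of this goodness-of-fit measure, the $\chi^2$ sum can be computed directly. A minimal Python sketch; the data points, model, and error bars below are invented for the example:

```python
import numpy as np

def chi2(y_obs, y_model, sigma):
    # chi^2 = sum_i ((y_i - f(x_i, theta)) / sigma_i)^2
    return float(np.sum(((y_obs - y_model) / sigma) ** 2))

# Invented measurements with uncorrelated Gaussian errors (sigma_i = 0.1)
x = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = np.array([0.1, 1.9, 4.1, 5.9])
sigma = np.full_like(y_obs, 0.1)

# Candidate relation y = f(x, theta) with theta = (slope,) = (2.0,)
y_model = 2.0 * x
```

Each residual here is 0.1, i.e. exactly one standard deviation, so the statistic evaluates to 4.0, about one per data point, which is the hallmark of a good fit.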
Let us quote a simple example which illustrates this rule [15]. In Fig. 1 a black box and a white one behind it are observed. One can postulate two models: first, that there is one box behind the black box; second, that there are two boxes of identical height and color behind the black box. Both models explain our observations equally well. According to Occam's principle we should accept the simpler explanation, that there is only one white box behind the black one. Is it not more probable that there is only one box rather than two boxes of exactly the same height and color?
We cannot use this principle directly because situations when two models explain the observations equally well are rare. But both information theory and Bayesian theory provide quantitative counterparts of it. In information theory there are no true models. There is only reality, which can be approximated by models depending on some number of parameters. The best model from the set under consideration is the one which is the best approximation to the truth. The information lost when the truth is approximated by the model under consideration is measured by the so-called Kullback-Leibler (KL) information, so the best model minimizes this quantity. It is impossible to compute the KL information directly because it depends on the truth, which is unknown. Akaike [16] found an approximation to the KL quantity, which is called the Akaike information criterion (AIC), given by
$$\mathrm{AIC} = -2\ln L + 2d,$$
where $L$ is the maximum of the likelihood function and d is the number of model parameters. The model which is the best approximation to the truth from the set of models under consideration has the smallest value of the AIC quantity. It is convenient to evaluate the differences between the AIC quantities computed for the rest of the models from the set and the AIC of the best one. Those differences ($\Delta$AIC) are easy to interpret and allow a quick assessment of the 'strength of evidence' for a considered model with respect to the best one. The models with $0 \leq \Delta\mathrm{AIC} \leq 2$ have substantial support (evidence), those with $4 < \Delta\mathrm{AIC} \leq 7$ have considerably less support, while models with $\Delta\mathrm{AIC} > 10$ have essentially no support with respect to the best model. It is worth noting that the complexity of the model is interpreted here as the number of its free parameters that can be adjusted to fit the model to the observations. If the models under consideration fit the data equally well, according to the Akaike rule the best one is that with the smallest number of model parameters (the simplest one in this approach).
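In code the criterion and the $\Delta$AIC differences are one-liners. A sketch; the maximized log-likelihood values below are invented for illustration and are not the paper's results:

```python
def aic(ln_l_max, d):
    # AIC = -2 ln L_max + 2 d, with d the number of free model parameters
    return -2.0 * ln_l_max + 2.0 * d

# Invented maximized log-likelihoods for two nested models:
# (ln L_max, number of parameters)
models = {"LCDM": (-272.5, 2), "int-LCDM": (-272.3, 4)}
aics = {name: aic(lnl, d) for name, (lnl, d) in models.items()}
best = min(aics.values())
deltas = {name: value - best for name, value in aics.items()}
```

Here the two extra parameters buy only a 0.4 improvement in $-2\ln L$, so the more complex model ends up with $\Delta\mathrm{AIC} = 3.6$: noticeably less support on the scale quoted above.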
In the Bayesian framework the best model (from the model set under consideration) is the one with the largest value of the probability in the light of the data (the so-called posterior probability) [17]
$$P(M_i|D) = \frac{P(D|M_i)P(M_i)}{P(D)},$$
where $P(M_i)$ is the prior probability for the model $M_i$, $D$ denotes the data, and $P(D)$ is the normalization constant. $P(D|M_i)$ is the marginal likelihood, also called the evidence,
$$P(D|M_i) = \int P(D|\theta, M_i) P(\theta|M_i)\, \mathrm{d}\theta,$$
where $P(D|\theta, M_i)$ is the likelihood under model $i$ and $P(\theta|M_i)$ is the prior probability for $\theta$ under model $i$.
Let us note that we can include Occam's principle by assuming a greater prior probability for the simpler model, but this is not necessary and rarely used in practice. Usually one assumes that there is no evidence favoring one model over another, which leads to assigning equal values of the prior to all models under consideration. It is convenient to evaluate the posterior ratio for the models under consideration, which in the case of a flat prior over the models reduces to the evidence ratio, called the Bayes factor,
$$B_{ij} = \frac{P(D|M_i)}{P(D|M_j)}.$$
The interpretation of twice the natural logarithm of the Bayes factor is as follows: $0 < 2\ln B_{ij} \leq 2$ counts as weak evidence, $2 < 2\ln B_{ij} \leq 6$ as positive evidence, $6 < 2\ln B_{ij} \leq 10$ as strong evidence, and $2\ln B_{ij} > 10$ as very strong evidence against model $j$ compared to model $i$. This quantity is our Occam's razor. Let us simplify the problem to illustrate how this principle works here [15,18]. Assume that $\tilde P(\theta|D, M)$ is the non-normalized posterior probability for the vector $\theta$ of the model parameters. In this notation $E = \int \tilde P(\theta|D, M)\, \mathrm{d}\theta$. Suppose that the posterior has a strong peak at the maximum $\theta_{\mathrm{MOD}}$. It is then reasonable to approximate the logarithm of the posterior by its Taylor expansion in the neighborhood of $\theta_{\mathrm{MOD}}$, so we finish with the expression
$$\ln \tilde P(\theta|D, M) \approx \ln \tilde P(\theta_{\mathrm{MOD}}|D, M) - \frac{1}{2}(\theta - \theta_{\mathrm{MOD}})^T C^{-1} (\theta - \theta_{\mathrm{MOD}}),$$
where $C^{-1} = -\nabla\nabla \ln \tilde P(\theta|D, M)\big|_{\theta=\theta_{\mathrm{MOD}}}$. The posterior is thus approximated by a Gaussian distribution with the mean $\theta_{\mathrm{MOD}}$ and the covariance matrix $C$. The evidence then has the form
$$E = \tilde P(\theta_{\mathrm{MOD}}|D, M) \int \exp\left[-\frac{1}{2}(\theta - \theta_{\mathrm{MOD}})^T C^{-1}(\theta - \theta_{\mathrm{MOD}})\right] \mathrm{d}\theta.$$
Because the posterior has a strong peak near the maximum, the highest contribution to the integral comes from the neighborhood of $\theta_{\mathrm{MOD}}$. The contribution from the other regions of $\theta$ can be ignored, so we can expand the limits of the integral to the whole of $\mathbb{R}^d$. With this assumption one obtains $E = (2\pi)^{d/2}\sqrt{\det C}\, \tilde P(\theta_{\mathrm{MOD}}|D, M)$. Suppose that the likelihood function has a sharp peak at $\hat\theta$ and the prior for $\theta$ is nearly flat in the neighborhood of $\hat\theta$. In this case $\hat\theta = \theta_{\mathrm{MOD}}$ and the expression for the evidence takes the form $E = L(\hat\theta)(2\pi)^{d/2}\sqrt{\det C}\, P(\hat\theta|M)$, where the quantity $(2\pi)^{d/2}\sqrt{\det C}\, P(\hat\theta|M)$ is called the Occam factor (OF).
When we consider the case of one model parameter with a flat prior, $P(\theta|M) = \frac{1}{\Delta\theta}$, the Occam factor is $\mathrm{OF} = \frac{\sqrt{2\pi}\,\sigma_\theta}{\Delta\theta}$, which can be interpreted as the ratio of the volume occupied by the posterior to the volume occupied by the prior in the parameter space. The more parameter space is wasted by the prior, the smaller the value of the evidence. It is worth noting that the evidence does not penalize parameters which are unconstrained by the data [19].
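This one-parameter case can be checked numerically. The sketch below integrates a Gaussian likelihood against a flat prior on $[0, \Delta\theta]$ and compares the result with the Laplace form $L_{\max}\,\sqrt{2\pi}\,\sigma_\theta/\Delta\theta$; all numbers are invented for the example:

```python
import math

sigma_theta = 0.05   # posterior width
theta_hat = 0.5      # location of the likelihood peak
delta_theta = 1.0    # flat prior on [0, 1]
l_max = 3.0          # maximum of the likelihood

def likelihood(theta):
    return l_max * math.exp(-0.5 * ((theta - theta_hat) / sigma_theta) ** 2)

# Evidence E = int L(theta) P(theta|M) dtheta with P(theta|M) = 1/delta_theta,
# computed with a composite midpoint rule
n = 20000
h = delta_theta / n
evidence = sum(likelihood((i + 0.5) * h) for i in range(n)) * h / delta_theta

occam_factor = math.sqrt(2.0 * math.pi) * sigma_theta / delta_theta
laplace_evidence = l_max * occam_factor
```

The two values agree to many digits because the peak sits ten standard deviations from either prior edge; widening the prior at fixed data shrinks the Occam factor and hence the evidence, which is exactly the "wasted parameter space" penalty described above.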
As the evidence is hard to evaluate, an approximation to this quantity was proposed by Schwarz [20], the so-called Bayesian information criterion (BIC), given by
$$\mathrm{BIC} = -2\ln L + d\ln N,$$
where $L$ is the maximum of the likelihood function, $d$ is the number of model parameters, and N is the number of data points. The best model from the set under consideration is the one which minimizes the BIC quantity. One can notice the similarity between the AIC and BIC quantities, though they come from different approaches to the model selection problem. The dissimilarity is seen in the so-called penalty term $ad$, which penalizes more complex models (complexity is identified here with the number of free model parameters). One can evaluate the factor by which an additional parameter must improve the goodness of fit to be included in the model. This factor must be greater than $a$, equal to 2 in the AIC case and to $\ln N$ in the BIC case. Notice that the latter depends on the number of data points. It can be shown that there is a simple relation between the BIC and the Bayes factor,
$$2\ln B_{ij} \approx -(\mathrm{BIC}_i - \mathrm{BIC}_j).$$
The quantity $B_{ij}$ is the Bayes factor for the hypothesis (model) $i$ against the hypothesis (model) $j$. We categorize this evidence against the model $j$ using the following ranking. The evidence against the model $j$ is not worth more than a bare mention when twice the natural logarithm of the Bayes factor (or minus the difference between the BICs) satisfies $0 < 2\ln B_{ij} \leq 2$, is positive when $2 < 2\ln B_{ij} \leq 6$, is strong when $6 < 2\ln B_{ij} \leq 10$, and is very strong when $2\ln B_{ij} > 10$.
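The BIC and its link to the Bayes factor are equally short in code (a sketch; the sample size is chosen only to show the scale of the penalty):

```python
import math

def bic(ln_l_max, d, n_data):
    # BIC = -2 ln L_max + d ln N
    return -2.0 * ln_l_max + d * math.log(n_data)

def two_ln_bayes_factor(bic_i, bic_j):
    # 2 ln B_ij is approximately -(BIC_i - BIC_j)
    return -(bic_i - bic_j)

# With N = 580 data points an extra parameter must improve -2 ln L by
# ln(580) ~ 6.36 just to break even on the BIC scale (vs. 2 for the AIC)
penalty_per_parameter = math.log(580)
```

For nested models this makes the BIC much harsher than the AIC once N is large: with two extra parameters and several hundred data points the gap between $\Delta$AIC and $\Delta$BIC is roughly $2(\ln N - 2)$, which is why modest $\Delta$AIC values can coexist with decisive $\Delta$BIC values.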
It should be pointed out that the model selection methods presented here are widely used in the context of cosmological model comparison [18,19,21-40]. We should keep in mind that conclusions based on such quantities depend on the data at hand. Let us mention again the example with the black box. Suppose that we moved a few steps toward this box, so that we can now see the difference between the heights of the left and right sides of the white box. Our conclusion changes now. Let us also quote the example taken from [30]. Assume that we want to compare the Newtonian and Einsteinian theories in the light of data coming from a laboratory experiment where general relativistic effects are negligible. In this situation the Bayes factor between the Newtonian and Einsteinian theories will be close to unity. But comparing the general relativistic and Newtonian explanations of the deflection of a light ray that just grazes the Sun's surface gives a Bayes factor of $\sim 10^{10}$ in favor of the former (and even greater with more accurate data).
We share George Efstathiou's opinion [41-43] that there is no sound theoretical basis for considering dynamical dark energy, whereas we are beginning to see an explanation for a small cosmological constant emerging from a more fundamental theory. In our opinion the $\Lambda$CDM model has the status of a satisfactory effective theory. Efstathiou argued why the cosmological constant should be given a higher weight as a candidate for the dark energy description than dynamical dark energy. In this argumentation Occam's principle is used to point out the more economical model explaining the observational data.
The main aim of this paper is to compare the simplest cosmological model, the $\Lambda$CDM model, with its generalization in which an interaction between the dark energy and matter sectors is allowed, using the methods described above.

Interacting $\Lambda$CDM model
The interaction interpretation of the continuity condition (conservation condition) has been investigated in the context of the coincidence problem since the paper of Zimdahl [44]; for recent developments in this area see Olivares et al. [45,46]; see also Le Delliou et al. [47] for a discussion of recent observational constraints.
Let us consider the two basic equations which determine the evolution of the FRW cosmological models (in units where $8\pi G = c = 1$),
$$\frac{\ddot a}{a} = -\frac{1}{6}(\rho + 3p), \quad (11)$$
$$\dot\rho = -3H(\rho + p), \quad (12)$$
where $a$ is the scale factor and $H \equiv \dot a/a$ is the Hubble function. Equation (11) is called the acceleration equation and Eq. (12) is the conservation (or adiabatic) condition. Equation (11) can be rewritten in a form analogous to the Newtonian equation of motion,
$$\ddot a = -\frac{\partial V}{\partial a}, \quad (13)$$
where $V = V(a)$ is the potential function of the scale factor $a$. To evaluate $V(a)$ from (13) via integration by parts it is useful to rewrite (12) in the new equivalent form
$$\frac{\mathrm{d}}{\mathrm{d}a}(\rho a^3) = -3pa^2. \quad (14)$$
From (11) we obtain
$$\ddot a = -\frac{1}{6}(\rho + 3p)a. \quad (15)$$
It is convenient to calculate the pressure $p$ from (14) and then substitute it into (15). After simple calculations we obtain from (15)
$$\ddot a = \frac{1}{6}\frac{\mathrm{d}}{\mathrm{d}a}(\rho a^2). \quad (16)$$
Therefore
$$V(a) = -\frac{\rho a^2}{6}. \quad (17)$$
In Eq. (17) $\rho$ means the effective energy density of the fluid filling the Universe. We find a very simple interpretation of (11): the evolution of the Universe is equivalent to the motion of a particle of unit mass in the potential well parameterized by the scale factor. In the procedure of the reduction of the problem of the FRW evolution to the problem of the investigation of a dynamical system of a Newtonian type we only assume that the effective energy density satisfies the conservation condition. We do not assume the conservation condition for each energy component (or non-interacting matter sectors) separately.
Equations (11) and (12) admit the first integral, which is usually called the Friedmann first integral. This first integral has a simple interpretation in the particle-like description of the FRW cosmology, namely energy conservation. We have
$$\frac{\dot a^2}{2} + V(a) = -\frac{k}{2}, \quad (18)$$
where $k$ is the curvature constant and $V$ is given by Eq. (17). Let us consider the Universe filled with two fluid components,
$$\rho = \rho_m + \rho_X, \quad (19)$$
where $\rho_m$ means the energy density of the usual dust matter and $\rho_X$ denotes the energy density of dark energy satisfying the equation of state $p_X = w_X \rho_X$, where $w_X = w_X(a)$. Then Eq. (14) can be separated into the dark matter and dark energy sectors, which in general can interact,
$$\dot\rho_m + 3H\rho_m = Q, \quad (20)$$
$$\dot\rho_X + 3H(1 + w_X(a))\rho_X = -Q, \quad (21)$$
where $Q$ describes the energy transfer between the sectors. In our previous paper [48] it was assumed that
$$Q = nH\rho_m, \quad (22)$$
which enables us to integrate (20), which gives
$$\rho_m = \alpha_n a^{n-3}, \quad (23)$$
where $\alpha_n$ is an integration constant. Equation (21) then becomes the nonhomogeneous equation
$$\frac{\mathrm{d}\rho_X}{\mathrm{d}a} + \frac{3(1 + w_X(a))}{a}\,\rho_X = -n\alpha_n a^{n-4}. \quad (24)$$
The solution of the homogeneous part of equation (24) can be written in terms of the average of $w_X(a)$ as
$$\rho_X = C_X a^{-3(1 + \bar w_X(a))}, \quad (25)$$
where
$$\bar w_X(a) = \frac{\int w_X(a)\, \mathrm{d}\ln a}{\int \mathrm{d}\ln a}. \quad (26)$$
The solution of the nonhomogeneous equation (24) is
$$\rho_X = C_X a^{-3(1 + \bar w_X(a))} - n\alpha_n a^{-3(1 + \bar w_X(a))} \int_1^a \tilde a^{\,n-1+3\bar w_X(\tilde a)}\, \mathrm{d}\tilde a. \quad (27)$$
Finally we obtain
$$\rho_{\mathrm{eff}} = \alpha_n a^{n-3} + C_X a^{-3(1 + \bar w_X(a))} - n\alpha_n a^{-3(1 + \bar w_X(a))} \int_1^a \tilde a^{\,n-1+3\bar w_X(\tilde a)}\, \mathrm{d}\tilde a. \quad (28)$$
The second and last terms originate from the interaction between the dark matter and dark energy sectors. Let us consider the simplest case of a constant coefficient, $w_X(a) = w_X = \mathrm{const}$. Then the integration in (27) can be performed and we obtain
$$\rho_{\mathrm{eff}} = C_{\mathrm{int}}\, a^{n-3} + \left(C_X + \frac{n\alpha_n}{n + 3w_X}\right) a^{-3(1+w_X)}, \quad (29)$$
where $C_{\mathrm{int}} = \alpha_n - \frac{n\alpha_n}{n + 3w_X}$. In this case we obtain one additional power law term, related to the interaction, in $\rho_{\mathrm{eff}}$ and hence in the Friedmann first integral. It is convenient to rewrite the Friedmann first integral in a new form, using dimensionless density parameters; the interaction then appears as an additional power law term with the density parameter $\Omega_{\mathrm{int},0}$. Note that this additional power law term related to the interaction can also be interpreted as a Cardassian or polytropic term [49,50] (one can easily show that the assumed form of the interaction always generates a power law correction in the potential of the $\Lambda$CDM model and vice versa). Another interpretation of this term might originate from the Lambda decaying cosmology, where the Lambda term is parametrized by the scale factor [51].
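For the statistical analysis what matters is the resulting Hubble function. A minimal sketch, assuming, purely for illustration, that the interaction contributes a term $\Omega_{\mathrm{int},0}(1+z)^{3-n}$ to $H^2/H_0^2$ and that flatness closes the model; the exact parametrization used in the estimation may differ:

```python
def hubble2(z, omega_m, omega_int, n, h0=70.0):
    # H^2(z) with an extra power-law interaction term (illustrative assumption:
    # rho_int scales like (1+z)**(3-n)); flatness fixes the last term,
    # omega_lambda = 1 - omega_m - omega_int.
    omega_lambda = 1.0 - omega_m - omega_int
    x = 1.0 + z
    return h0 ** 2 * (omega_m * x ** 3 + omega_int * x ** (3.0 - n) + omega_lambda)
```

Setting `omega_int = 0` (or `n = 0`, which makes the extra term dust-like) recovers the $\Lambda$CDM limit, i.e. the nesting exploited by the likelihood ratio test.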
In the next section we draw a comparison between the above model, with the assumption $w_X = -1$, and the $\Lambda$CDM model.

Data
To estimate the parameters of the two models we used the modified CosmoMC code [52,53] with the implemented nested sampling algorithm MultiNest [54,55].
The likelihood function for the type Ia supernova data is defined by
$$\ln L_{\mathrm{SN}} = -\frac{1}{2}\sum_{i,j} \left(\mu_i^{\mathrm{obs}} - \mu_i^{\mathrm{th}}\right) C_{ij}^{-1} \left(\mu_j^{\mathrm{obs}} - \mu_j^{\mathrm{th}}\right),$$
where $C_{ij}$ is the covariance matrix with the systematic errors and $\mu^{\mathrm{obs}}$, $\mu^{\mathrm{th}}$ are the observed and theoretical distance moduli. The likelihood function for the $h(z)$ data is given by
$$\ln L_{h(z)} = -\frac{1}{2}\sum_i \left(\frac{H^{\mathrm{th}}(z_i) - H_i^{\mathrm{obs}}}{\sigma_i}\right)^2,$$
where $H^{\mathrm{th}}(z_i)$ denotes the theoretically estimated Hubble function and $H_i^{\mathrm{obs}}$ is the observational data.
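Both likelihoods are quadratic forms and can be sketched directly (the arrays below are hypothetical; the real analysis uses the Union2.1 covariance with systematics):

```python
import numpy as np

def ln_like_sn(mu_obs, mu_th, cov):
    # ln L = -1/2 (mu_obs - mu_th)^T C^{-1} (mu_obs - mu_th), constant dropped
    r = mu_obs - mu_th
    return -0.5 * float(r @ np.linalg.solve(cov, r))

def ln_like_hz(h_obs, h_th, sigma):
    # Uncorrelated h(z) points: ln L = -1/2 sum ((H_th - H_obs) / sigma)^2
    return -0.5 * float(np.sum(((h_th - h_obs) / sigma) ** 2))
```

`np.linalg.solve` is used instead of explicitly inverting the covariance matrix, which is cheaper and numerically safer for large matrices.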
The likelihood function for the BAO data is characterized by
$$\ln L_{\mathrm{BAO}} = -\frac{1}{2}\sum_{i,j} \left(d_i^{\mathrm{obs}} - d_i^{\mathrm{th}}\right) C_{ij}^{-1} \left(d_j^{\mathrm{obs}} - d_j^{\mathrm{th}}\right),$$
where $C_{ij}$ is the covariance matrix with the systematic errors, $d^{\mathrm{th}} = r_s(z_d)/D_V(z)$, $r_s(z_d)$ is the sound horizon at the drag epoch, and $D_V(z) = \left[(1+z)^2 D_A^2(z)\, cz/H(z)\right]^{1/3}$ is the volume distance built from the angular diameter distance $D_A$.
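The theoretical BAO observable requires the distances; a self-contained numerical sketch for a flat universe, with `hubble` standing for any function returning $H(z)$ in km/s/Mpc:

```python
C_KMS = 299792.458  # speed of light in km/s

def comoving_distance(z, hubble, steps=2000):
    # D_C = c * int_0^z dz' / H(z'), trapezoidal rule, flat space assumed
    h = z / steps
    edges = 0.5 * (1.0 / hubble(0.0) + 1.0 / hubble(z))
    interior = sum(1.0 / hubble(i * h) for i in range(1, steps))
    return C_KMS * h * (edges + interior)

def d_a(z, hubble):
    # Angular diameter distance in a flat universe
    return comoving_distance(z, hubble) / (1.0 + z)

def d_v(z, hubble):
    # BAO volume distance D_V = [(1+z)^2 D_A^2 c z / H(z)]^(1/3)
    return ((1.0 + z) ** 2 * d_a(z, hubble) ** 2
            * C_KMS * z / hubble(z)) ** (1.0 / 3.0)
```

For a constant $H(z) = H_0$ all three collapse to multiples of $cz/H_0$, which makes a handy sanity check on the integration.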
The likelihood function for the information coming from the Alcock-Paczynski test is given by
$$\ln L_{\mathrm{AP}} = -\frac{1}{2}\sum_i \left(\frac{AP^{\mathrm{th}}(z_i) - AP^{\mathrm{obs}}(z_i)}{\sigma_i}\right)^2.$$
Finally, we used the likelihood function for the CMB shift parameter R [79], which is defined by
$$\ln L_{\mathrm{CMB}} = -\frac{1}{2}\frac{\left(R^{\mathrm{th}} - R^{\mathrm{obs}}\right)^2}{\sigma_R^2},$$
where $R^{\mathrm{th}} = \sqrt{\Omega_{m,0}}\, H_0 (1 + z_*) D_A(z_*)/c$, $D_A(z_*)$ is the angular diameter distance to the last scattering surface, $R^{\mathrm{obs}} = 1.7477$, and $\sigma_R^{-2} = 48976.33$ [80]. The total likelihood function $L_{\mathrm{tot}}$ is defined as the product
$$L_{\mathrm{tot}} = L_{\mathrm{SN}}\, L_{h(z)}\, L_{\mathrm{BAO}}\, L_{\mathrm{AP}}\, L_{\mathrm{CMB}}.$$
(Table 1 caption: top, estimations made using the Union2.1, h(z), BAO, and determinations of the Hubble function using the Alcock-Paczynski test data sets; bottom, estimations made using the Union2.1, h(z), BAO, determinations of the Hubble function using the Alcock-Paczynski test, and the CMB R data sets.)
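The shift-parameter term and the combination of the data sets then take one line each (a sketch; only $R^{\mathrm{obs}}$ and $\sigma_R$ are taken from the text above):

```python
import math

R_OBS = 1.7477
SIGMA_R = 48976.33 ** -0.5  # from sigma_R^{-2} = 48976.33

def ln_like_cmb(r_th):
    # ln L = -1/2 (R_th - R_obs)^2 / sigma_R^2
    return -0.5 * ((r_th - R_OBS) / SIGMA_R) ** 2

def ln_like_total(*ln_likes):
    # Independent data sets: likelihoods multiply, so log-likelihoods add
    return sum(ln_likes)
```

A one-sigma offset in the theoretical shift parameter costs exactly 0.5 in log-likelihood, as for any Gaussian term.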

The model parameter estimation
The results of the estimation of the parameters of the $\Lambda$CDM and the interacting $\Lambda$CDM models are presented in Table 1. Given the likelihood function (31), we first estimated the model parameters using the Union2.1 data only. Next, the parameter estimation with the joint data of Union2.1, h(z), BAO, and the Alcock-Paczynski test [likelihood functions (31)-(34)] was performed. Finally, we estimated the model parameters with the joint data enlarged by the CMB data [the total likelihood function (36)].
The value of the interaction parameter $\Omega_{\mathrm{int},0}$ is very small for all data sets. Especially the result for the second data set [Union2.1, h(z), BAO, AP data] indicates that the interaction is probably negligible. There is also no clear indication of the direction of the interaction, if it is a physical effect. For the Union2.1 data set alone the interaction parameter $\Omega_{\mathrm{int},0}$ is negative, which together with a greater value of $\Omega_{m,0}$ in the interacting $\Lambda$CDM model implies an energy flow from the dark energy sector to the matter sector, while for the data set consisting of all data the direction is the opposite.
The uncertainty of each estimated model parameter is presented in two ways: as 68 % confidence levels in Table 1 and as the marginalized probability distributions in Figs. 2 and 3.

The likelihood ratio test
We begin our statistical analysis with the likelihood ratio test. In this test one of the models (the null model) is nested in a second model (the alternative model) by fixing one of the second model's parameters. In our case the null model is the $\Lambda$CDM model, the alternative model is the interacting $\Lambda$CDM model, and the parameter in question is $\Omega_{\mathrm{int},0}$. We have the null hypothesis $H_0\colon \Omega_{\mathrm{int},0} = 0$ and the alternative hypothesis $H_1\colon \Omega_{\mathrm{int},0} \neq 0$. The test statistic is given by $\lambda = 2(\ln L_{H_1} - \ln L_{H_0})$, which under the null hypothesis is asymptotically $\chi^2$ distributed with the number of degrees of freedom equal to the number of parameters fixed under the null. The results are presented in Table 2. In all three cases the p values are greater than the significance level $\alpha = 0.05$, which is why the null hypothesis cannot be rejected. In other words, we cannot reject the hypothesis that there is no interaction between the dark matter and dark energy sectors.
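The mechanics of the test are easy to sketch. Assuming, for the example, that a single parameter is fixed under the null (one degree of freedom; Table 2 lists the actual p values), the $\chi^2_1$ survival function has a closed form via the complementary error function:

```python
import math

def lr_statistic(ln_l_null, ln_l_alt):
    # lambda = 2 (ln L_alt - ln L_null), nonnegative for nested fits
    return 2.0 * (ln_l_alt - ln_l_null)

def chi2_sf_1dof(x):
    # P(X > x) for X ~ chi^2 with 1 degree of freedom: erfc(sqrt(x/2))
    return math.erfc(math.sqrt(x / 2.0))

# Illustrative: freeing the extra parameter improves ln L by about 1.92,
# so lambda ~ 3.84, which sits right at the 5 % significance threshold
p_value = chi2_sf_1dof(lr_statistic(-100.0, -98.079))
```

A p value above $\alpha = 0.05$ means the improvement in fit from freeing $\Omega_{\mathrm{int},0}$ is no larger than chance would produce, so the no-interaction null model stands.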

The model comparison using the AIC, BIC, and Bayes evidence
To obtain the values of the AIC and BIC quantities we performed the $\chi^2 = -2\ln L$ minimization procedure after marginalization over the $H_0$ parameter in the range [60, 80]. The results are presented in Table 3.
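Handling the nuisance parameter $H_0$ can be sketched with a simple grid scan over the quoted range [60, 80]. For simplicity the sketch minimizes over the grid (a profile likelihood) rather than integrating, so it is only an approximation to the marginalization used here:

```python
def chi2_min_over_h0(chi2_of_h0, lo=60.0, hi=80.0, steps=200):
    # Scan H0 on a uniform grid in [lo, hi] and keep the smallest chi^2;
    # chi2_of_h0 is any callable returning chi^2 at a given H0
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(chi2_of_h0(h0) for h0 in grid)

# Invented toy chi^2 with minimum 1.0 at H0 = 70
toy_chi2 = lambda h0: (h0 - 70.0) ** 2 / 25.0 + 1.0
```

With a fine enough grid the result is insensitive to the step size; for the toy quadratic the minimum 1.0 at $H_0 = 70$ is recovered exactly, since 70 lies on the grid.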
Regardless of the data set, the differences of the AIC quantities lie in the interval (3.4, 4), just below the interval (4, 7), which indicates considerably smaller support for the interacting $\Lambda$CDM model. It means that while the $\Lambda$CDM model should be preferred over the interacting $\Lambda$CDM model, the latter cannot be ruled out. However, we can arrive at a decisive conclusion by employing the Bayes factor. The differences of the BIC quantities are greater than 10, with values in the interval (12, 13) for all data sets. Thus the Bayes factor indicates very strong evidence against the interacting $\Lambda$CDM model when compared to the $\Lambda$CDM model. Therefore we are strongly convinced that we should reject the interaction between the dark energy and dark matter sectors, due to Occam's principle.

Conclusion
We considered the cosmological model with dark energy represented by the cosmological constant and the model with an interaction between dark matter and dark energy (the interacting $\Lambda$CDM model). These models were studied statistically using the available astronomical data and then compared using tools taken from information theory as well as Bayesian theory. In both cases the model selection is based on Occam's principle, which states that if two models describe the observations equally well we should choose the simpler one. According to the Akaike and Bayesian information criteria the model complexity is interpreted in terms of the number of free model parameters, while according to the Bayesian evidence a more complex model is one occupying a greater volume of the parameter space. Anyone using Bayesian methods in astronomy and cosmology should be aware of the ongoing debate about not only the pros but also the cons of this approach. Efstathiou provided a critique of the evidence ratio approach, indicating difficulties in defining models and priors [81]. Jenkins and Peacock [82] called attention to too much noise in the data, which does not allow one to decide to accept or reject a model based solely on whether the evidence ratio reaches some threshold value. That is the reason why we also used the AIC, which is based on information theory.
The observational constraints on the parameter values which we have obtained confirm previous results: if the interaction between dark energy and matter is a real effect, it should be very small. Therefore it seems natural to ask whether a cosmology with an interaction between dark energy and matter is plausible.
At the beginning of our model selection analysis we performed the standard likelihood ratio test. The conclusion of this test was a failure to reject, at the significance level $\alpha = 0.05$, the null hypothesis that there is no interaction between the matter and dark energy sectors. It was the first clue against the interacting $\Lambda$CDM model. The AIC comparison of both models was less conclusive. While the $\Lambda$CDM model received more support, the interacting $\Lambda$CDM model could not be rejected. On the other hand, the Bayes factor gave a decisive result; there was very strong evidence against the interacting $\Lambda$CDM model compared to the $\Lambda$CDM model. Given the weak or almost nonexistent support for the interacting $\Lambda$CDM model, and bearing in mind Occam's razor, we are inclined to reject this model.
We also have a theoretical argument against the interacting $\Lambda$CDM model. If we consider the $H^2$ formula which is the basis for the estimation, there is a degeneracy: one cannot distinguish the effects of the interaction from the effects of a varying equation of state coefficient $w(z)$ depending on time or redshift.
As was noted by Kunz [83], there is a dark degeneracy problem. It means that the effect of the interaction cannot be distinguished from the effect of an additional non-interacting fluid with the constant equation of state $w_{\mathrm{int}} = n/3 - 2$. Therefore if we consider a mixture of all three non-interacting fluids we obtain the coefficient of the equation of state for the dark energy and interacting fluid in the form
$$w_{\mathrm{dark}} = \frac{p_X + p_{\mathrm{int}}}{\rho_X + \rho_{\mathrm{int}}} = \frac{w_X C_X (1+z)^{3(1+w_X)} + w_{\mathrm{int}} C_{\mathrm{int}} (1+z)^{3-n}}{C_X (1+z)^{3(1+w_X)} + C_{\mathrm{int}} (1+z)^{3-n}}.$$