1 Introduction

The standard model of cosmology is almost universally accepted as the concordance model for explaining cosmological observations [1, 2]. This is based on the incorporation of cold dark matter (CDM) to explain aspects of clustering [3, 4], while the late time accelerated expansion of the Universe [5, 6] is described through the action of a cosmological constant [7]. While theoretical problems [8] of the cosmological constant description and the direct measurability of CDM [9, 10] have been in question for decades, the recent problems of cosmological tensions [11,12,13,14,15,16,17,18] have brought into question the predictive power of the \(\varLambda \)CDM concordance model.

The cosmological tensions issue is most pronounced with the Hubble constant tension, which has shown a growing discrepancy between direct and indirect determinations of the \(H_0\) parameter [19]. The indirect approaches rely on assuming a \(\varLambda \)CDM cosmology [20], which is part of the reason why this model is being reconsidered as the standard model of cosmology. In terms of indirect measurements, the latest reported values from the Planck and ACT collaborations are respectively \(H_0^\textrm{P18} = 67.4 \pm 0.5\) \(\mathrm{km\, s}^{-1} \textrm{Mpc}^{-1}\) [21] and \(H_0^\mathrm{ACT-DR4} = 67.9 \pm 1.5\) \(\mathrm{km\, s}^{-1} \textrm{Mpc}^{-1}\) [22], which point to a generally lower Hubble constant. On the other end of the spectrum, direct measurements of the Hubble constant have come from various different phenomenological sources. The strongest determination of the constant has come from the SH0ES team, who report a best value of \(H_0^\textrm{R20} = 73.2 \pm 1.3\) \(\mathrm{km\, s}^{-1} \textrm{Mpc}^{-1}\) [23]. This is based on observations of Type Ia Supernovae (SN-Ia) that are calibrated using Cepheid stars in their host galaxies. In this spirit, strong lensing measurements of quasar systems have also produced a consistent direct result of \(H_0^\textrm{HW} = 73.3^{+1.7}_{-1.8}\) \(\mathrm{km\, s}^{-1} \textrm{Mpc}^{-1}\), due to the H0LiCOW Collaboration [24]. On the other hand, there is a direct result using the Tip of the Red Giant Branch (TRGB) technique which yields a lower value of the Hubble constant, \(H_0^\textrm{F20} = 69.8 \pm 1.9 \,\mathrm{km\, s}^{-1} \textrm{Mpc}^{-1}\) [25]. While systematics feature in every experiment, the Hubble tension appears in several independent surveys and has persisted in the literature for several years.

The community has responded in several ways to this pressing problem. While work on understanding whether systematics may be the source of this tension will be ongoing for years to come, there is a growing body of work that considers modifications to our standard picture of cosmology. The Hubble tension has been confronted with several interesting approaches in the literature, including modifications to early Universe dark energy [20], as well as the neutrino sector [26], and renewed interest in modifications to gravitational models [27,28,29,30,31,32,33]. These approaches all offer interesting paths to new physics, either through revisiting the foundations of cosmological models or by adding unknown components to the cosmological framework. However, many of these models are degenerate with each other in terms of current observational approaches, which may require a new way of investigating new physics in the observational sector. One such approach is to consider the class of so-called model-independent methods. In this work, we aim to extend the current implementation of artificial neural networks (ANN) [34] in terms of the Hubble diagram so that, eventually, reconstructions of cosmological models can be performed.

Through ANNs, real-world observational data can be used for undertaking reconstructions and inferences that are independent of any underlying physical models. They are also free of many of the statistical assumptions that appear in many of the other techniques. In this work, we reconstruct the Hubble diagram from various combined data sets where we fully incorporate the information in the data, specifically the covariance matrix. We do this by building on ReFANN [35], which was originally designed for reconstructing the Hubble diagram for data sets with independent uncertainties, based on PyTorch. We ran this code on GPUs, which significantly reduced the computational time as compared with CPU runs. In Sect. 2, we briefly introduce the data sets and discuss the reconstruction methodology adopted. We show the outputs for these analyses in Sect. 3. We compare and contrast our ANN outputs against their GP analogues in Sect. 4. The null tests for these outputs are performed in Sect. 5, while in Sect. 6 we discuss our main results and make some concluding remarks.

2 Observational data sets and methodology

In this part of the work we present the reconstruction methods used with a particular emphasis on ANNs and their architecture. We also discuss the data sets under investigation together with the priors used from the literature.

2.1 Methodology

The most popular approach to using model-independent techniques to study cosmology is through Gaussian processes (GP) [36], since they offer an integrated way to produce cosmological parameters together with their associated uncertainties. GP is based on a covariance function, or kernel, that characterizes the relationship between pairs of data points in a distribution. The kernel is functionally dependent on non-physical hyperparameters which can be fitted using standard optimization methods. The literature contains numerous works based on using this approach to reconstruct cosmological parameters [37,38,39,40,41,42,43,44,45,46,47,48,49,50]. Most recently, GPs have been used to reconstruct cosmological models [29,30,31,32,33] from a foundational perspective. However, GPs suffer from two major drawbacks, namely (i) they have an overfitting issue at low redshifts which can artificially constrain the Hubble constant at the level of its uncertainties; (ii) there is an over-reliance on the choice of kernel which may affect the profile of the reconstructed parameters.
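As a point of reference, the sketch below illustrates a GP reconstruction of this kind with a squared-exponential kernel, showing how the kernel hyperparameters (amplitude and length scale) are optimized against the data. It uses scikit-learn rather than the dedicated cosmology codes employed in the literature, and the toy data points and kernel choice are assumptions made purely for illustration.

```python
# Minimal GP sketch (not the pipeline used in this work): reconstruct H(z) with a
# squared-exponential kernel whose hyperparameters are fitted to the data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy cosmic-chronometer-like data: redshifts, H(z) in km/s/Mpc, 1-sigma errors.
z_obs = np.array([0.07, 0.4, 0.9, 1.3, 1.75])
H_obs = np.array([69.0, 82.0, 117.0, 168.0, 202.0])
sigma_H = np.array([19.6, 8.9, 23.0, 17.0, 40.0])

kernel = ConstantKernel(100.0) * RBF(length_scale=2.0)          # sigma_f^2 * exp(-dz^2 / 2 l^2)
gp = GaussianProcessRegressor(kernel=kernel, alpha=sigma_H**2,  # diagonal noise variances
                              normalize_y=True, n_restarts_optimizer=10)
gp.fit(z_obs[:, None], H_obs)                                   # maximizes the marginal likelihood over (sigma_f, l)

z_rec = np.linspace(0.0, 2.0, 100)
H_rec, H_err = gp.predict(z_rec[:, None], return_std=True)      # mean reconstruction and its uncertainty
```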

An alternative approach to reconstructing cosmological parameters is through ANNs, which also open the way to the use of more complex data such as non-Gaussian data points and correlated data sets. Here, artificial neurons are modeled to mimic their biological counterpart, and are organized into layers through which input signals are transformed into output signals. One example of such a formulation takes redshifts as inputs and returns the Hubble parameter and its uncertainty as outputs [51,52,53]. An ANN is generally composed of a large number of neurons that undergo training to optimize their associated hyperparameter values. A recent study in which this is performed is Ref. [35], which was further studied in Ref. [54] using null tests. GPs are attractive as an approach because they naturally give higher order derivatives of their reconstructed function, and given that most cosmological models involve such derivatives, they enter into the range of models that can be reconstructed in this way. In the recent work Ref. [55], the Hubble diagram ANN reconstruction method was extended to higher order derivatives using a Monte Carlo approach. This has opened the way for performing reconstructions of cosmological models. However, this work is based on using independent data points, whereas most real world data is correlated in some way. This is normally contained in some covariance matrix. In Markov chain Monte Carlo analyses, this covariance matrix would feature in the log-likelihood of the sampler. Our main aim in the current work is to extend the reconstruction approach of the Hubble diagram to include covariance information. Together with the reconstruction of higher derivatives of the Hubble parameter, this means that more complex reconstruction programmes of cosmological models can be considered.

Fig. 1 The general structure of the adopted ANN, where the input is the redshift of a cosmological parameter \(\varUpsilon (z)\), and the outputs are the corresponding value and error of \(\varUpsilon (z)\)

To do this, consider the mechanics of ANN systems in which an input layer is connected to an output layer through a series of hidden internal layers where the majority of neurons are located. These neurons each feature hyperparameters which are set by training, with the aim of having new inputs produce outputs consistent with real observations. In our setup, the input signal simply consists of a redshift value while the output layer gives the mean Hubble parameter at that redshift together with the uncertainty at that point. This system is depicted in Fig. 1 for a generalized scenario where each redshift value z results in a generic cosmological parameter \(\varUpsilon (z)\) together with its corresponding uncertainty \(\sigma _\varUpsilon ^{}(z)\).

In the ANN architecture, each neuron possesses an activation function which calibrates the impact that neuron has on the output for a particular input signal. Each neuron depends on hyperparameters (weights and biases) which take on optimal values during the training of the ANN. The layers are then structured through the input and output connections between neurons. In this way, a signal traverses the whole network to produce an output signal in a structured way. In this work, we consider the exponential linear unit (ELU) [56] as the activation function, specified by

$$\begin{aligned} f(x) = \begin{cases} x &{} \text {if } x>0 \\ \alpha \left( e^x-1\right) &{} \text {if } x \le 0 \end{cases}\,, \end{aligned}$$
(1)

where \(\alpha \) is a positive hyperparameter that scales the value to which negative inputs saturate, while positive inputs continue to traverse the network unchanged. Thus, complexity in the data would be incorporated through differently optimized hyperparameter values.
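For illustration, a minimal sketch of the kind of network shown in Fig. 1 is given below, written in PyTorch since our implementation builds on it. The two hidden layers of 64 neurons and the 0.2 dropout rate anticipate the configuration selected later for the Pantheon data, while the class and variable names are purely illustrative and not taken from the actual code.

```python
# Sketch of the Fig. 1 network: redshift in, [parameter value, uncertainty] out.
import torch
import torch.nn as nn

class HubbleNet(nn.Module):
    def __init__(self, n_hidden=64, dropout=0.2, alpha=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, n_hidden),          # input layer: the redshift z
            nn.ELU(alpha=alpha),             # ELU activation, Eq. (1)
            nn.Dropout(dropout),
            nn.Linear(n_hidden, n_hidden),   # second hidden layer
            nn.ELU(alpha=alpha),
            nn.Dropout(dropout),
            nn.Linear(n_hidden, 2),          # outputs: Upsilon(z) and sigma_Upsilon(z)
        )

    def forward(self, z):
        return self.net(z)

model = HubbleNet()
z = torch.tensor([[0.5]])                    # a single test redshift
value, sigma = model(z)[0]                   # untrained outputs, shown for illustration only
```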

The hyperparameter values are set in the training process, where real data is fed through the system and the hyperparameter values are optimized against it. This is characterized by a loss function which measures the difference between predicted and ground truth values in \(\varUpsilon \). By minimizing the loss function, the ANN hyperparameters are optimized for particular data sets. An example of this process is gradient descent combined with the back-propagation algorithm, while the Adam algorithm [57] is a refined version of this optimization scheme. The L1 loss function is the simplest and most direct way of assessing the difference between the predicted and observed values of some parameter, where the absolute differences between observed and predicted values of the Hubble parameter at the observation redshifts are summed, that is

$$\begin{aligned} \textrm{L1} = \sum _i |H_\textrm{obs}(z_i) - H_\textrm{pred}(z_i)|, \end{aligned}$$
(2)

where \(H_\textrm{obs}(z)\) and \(H_\textrm{pred}(z)\) are observed and ANN predicted values of the Hubble parameter at observation redshifts z. This is akin to the MCMC log-likelihood for independent data sets (less the uncertainties). Other loss functions exist but they do not generally incorporate more complexity in the data. In this work, we consider a native way to incorporate more complexity in the observed data sets by defining a new loss function analogous to the MCMC log-likelihood for correlated data sets. We do this by defining the following \(\chi ^2\) loss function

$$\begin{aligned} \mathrm{L_{\chi ^2}} = \sum _{i,j} \left[ H_\textrm{obs}(z_i) - H_\textrm{pred}(z_i)\right] ^\text {T} \textrm{C}_{ij}^{-1} \left[ H_\textrm{obs}(z_j) - H_\textrm{pred}(z_j)\right] , \end{aligned}$$
(3)

where \(\textrm{C}_{ij}\) is the total noise covariance matrix of the data, which includes the statistical noise and systematics. In this way, we are able to natively use correlated data in our ANN architecture. While the exact details of the training process are contained in Sect. 3, this loss function ensures that the ANN will infer Hubble expansion values that reflect both the mean observational values as well as the covariance matrix relationships between these points. To ensure the fidelity of this process, we employ a batch size that is equal to the Pantheon compilation sample size. On the other hand, one could divide this matrix and utilize smaller batch sizes if the whole data set were unmanageably large.
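To make this concrete, a minimal sketch of how the loss of Eq. (3) could be written in PyTorch is given below. It is not lifted from our actual code; the inverse covariance is precomputed once, the residual vector over the batch is contracted with it, and the function and variable names are illustrative.

```python
# Sketch of the covariance-aware loss of Eq. (3).
import torch

def chi2_loss(H_pred, H_obs, C_inv):
    """H_pred, H_obs: tensors of shape (N,); C_inv: (N, N) inverse covariance."""
    residual = H_obs - H_pred
    return residual @ C_inv @ residual           # sum_{i,j} r_i (C^-1)_{ij} r_j

# Toy 3-point correlated data set, purely for illustration:
H_obs = torch.tensor([70.0, 90.0, 120.0])
C = torch.tensor([[4.0, 1.0, 0.0],
                  [1.0, 9.0, 2.0],
                  [0.0, 2.0, 16.0]])             # statistical + systematic covariance
C_inv = torch.linalg.inv(C)

H_pred = torch.tensor([71.0, 88.0, 123.0], requires_grad=True)  # stands in for the ANN output
loss = chi2_loss(H_pred, H_obs, C_inv)
loss.backward()                                  # in training, gradients flow back through the network
```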

Fig. 2 Plots showing the reduced \(\chi ^2\) (left panel), and the evolution of the \(\chi ^2\) loss function (right panel), for configuring the optimal neural network architecture using the Pantheon SN-Ia \(d_L\) compilation

Table 1 Reduced \(\chi ^2\) obtained with different neural network architectures to determine the optimal configuration for the Pantheon SN-Ia \(d_L\) data. The best neural network architecture is highlighted in bold

In order to configure and train our network, we undertake the following steps:

1. Designing the neural network: After sorting the observational data sets from low to high redshift, we use a simple ANN with one input layer (to feed the training redshifts) and one output layer (to predict the reconstructed function). We take into account network models with 1 and 2 hidden layers. The dropout rate is set to 0.2 to prevent over-fitting. The number of neurons in the hidden layers is chosen as \(2^n\) where \(2 \le n \le 13\), so the ANN architectures are \(1,~ 2^n,~ 1\) for ANNs with 1 hidden layer and \(1,~ 2^n,~ 2^n,~ 1\) for those with two hidden layers.

2. Determining the optimal network configuration: The hyperparameters (weights and biases) of the network are initialized with fixed values. Each ANN configuration is trained for \(10^5\) iterations, to ensure that the loss function no longer decreases. We set the initial learning rate to 0.01, which decreases with the number of iterations, and compute the averaged loss over the last 100 iterations. Predictions are made at the training redshifts and the reduced \(\chi ^2\) is evaluated for all the architectures considered. The ANN architecture with the least averaged loss over the last 100 iterations, and a reduced \(\chi ^2\) just below 1, is chosen as the optimal configuration. The optimal network architecture for the Pantheon \(d_L\) compilation is found to be 1, 64, 64, 1 (see Fig. 2 and Table 1). Proceeding in a similar fashion, we get 1, 1024, 1 as the optimal network structure for the Hubble H(z) data.

3. Monte Carlo approach for final predictions: This optimal network architecture is then run 500 times, with random initialization of the hyperparameters and with dropout enabled. Thus, we obtain 500 samples of the reconstructed function at the corresponding test redshifts, from which we compute the mean function and the respective uncertainties.

4. Derivative predictions: With the 500 realizations of the predicted functions, we compute numerical derivatives as \( f'(z_i) \simeq \frac{f(z_{i+1}) - f(z_{i-1})}{z_{i+1} - z_{i-1}} \). From the reconstructed \(f'(z)\) samples, we obtain the mean values of the reconstructed \(f'(z)\) along with the associated confidence levels using another MC routine [55] (see the sketch after this list).

5. Batch size: For determining the optimal network configuration, we employ a batch size equal to the data size. During the final predictions, the batch size adopted for the Pantheon compilation is 40 (equal to the size of the binned Pantheon data), and half the number of available measurements for the Hubble data.
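A minimal sketch of steps 3 and 4 is given below. Since the trained networks are not reproduced in this snippet, the 500 realizations are faked with noisy copies of a smooth curve; everything else, the averaging over realizations and the central-difference derivative, follows the procedure described above, with illustrative names throughout.

```python
# Sketch of the Monte Carlo prediction and derivative steps.
import numpy as np

n_real = 500
z_test = np.linspace(0.01, 2.0, 200)

# f_samples[k] would be the k-th network realization evaluated at z_test;
# here each realization is a toy noisy copy of a smooth curve.
rng = np.random.default_rng(0)
f_samples = np.array([np.sqrt(0.3 * (1 + z_test)**3 + 0.7)
                      + 0.02 * rng.standard_normal(z_test.size)
                      for _ in range(n_real)])

f_mean = f_samples.mean(axis=0)                    # mean reconstructed function
f_err = f_samples.std(axis=0)                      # 1-sigma uncertainty band

# Central differences, f'(z_i) ~ [f(z_{i+1}) - f(z_{i-1})] / (z_{i+1} - z_{i-1}),
# taken per realization (np.gradient uses central differences at interior points).
df_samples = np.array([np.gradient(f, z_test) for f in f_samples])
df_mean = df_samples.mean(axis=0)
df_err = df_samples.std(axis=0)
```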

These are also illustrated in Fig. 3 where the different processes in the construction, training and eventual reconstruction procedures are connected together.

Fig. 3 Flow of the ANN architecture design and reconstruction process

2.2 Data sets

We now employ ANNs to reconstruct the Hubble diagram, considering three sources of data. These include the cosmic chronometers (CC) and baryonic acoustic oscillation (BAO) measurements of the Hubble parameter, as well as the type Ia supernovae (SN) apparent magnitude data. Furthermore, keeping in mind the rising \(H_0\) tension, we consider the most precise Cepheid calibration result of \(H_0 = 73.3 \pm 1.04\) \(\mathrm{km\, s}^{-1} \textrm{Mpc}^{-1}\) [58] by the SH0ES team (hereafter referred to as R21), the recently inferred \(H_0 = 69.7 \pm 1.9\) \(\mathrm{km\, s}^{-1} \textrm{Mpc}^{-1}\) [59] via the Tip of the Red Giant Branch (TRGB) calibration technique (hereafter referred to as TRGB), and the most precise early-time determination of \(H_0 = 67.4 \pm 0.5\) \(\mathrm{km\, s}^{-1} \textrm{Mpc}^{-1}\) [21] inferred from the Cosmic Microwave Background (CMB) sky by the Planck 2018 survey (hereafter referred to as P18). In our analysis, we assume Gaussian prior distributions with means and variances corresponding to the central and 1\(\sigma \) reported values of each prior above.

Fig. 4 Marginalized posteriors for the calibrated values of the supernovae absolute magnitude \(M_B\) in the Pantheon compilation considering the R21, TRGB, and P18 \(H_0\) priors (in units of \(\mathrm{km\, s}^{-1} \textrm{Mpc}^{-1}\)), respectively. The constraints obtained are \(M_B = -19.302 \pm 0.031\), \(-19.369 \pm 0.037\) and \(-19.425\pm 0.017\) corresponding to the R21, TRGB and P18 \(H_0\) priors

For the SN data, we take into account the full Pantheon [60] compilation consisting of 1048 supernovae. We attempt to reconstruct the comoving distances from the Pantheon compilation. To begin with, we convert the apparent magnitudes m(z) from the full supernova sample to the respective luminosity distances (in units of Mpc), as

$$\begin{aligned} d_L(z) = {10^{\frac{1}{5} \left[ m(z) - M_B -25 \right] }}, \end{aligned}$$
(4)

where \(M_B\) is the absolute magnitude of the supernovae. We obtain the marginalized constraints on \(M_B\) assuming vanilla \(\varLambda \)CDM, considering a uniform prior \(M_B \in [-35, -5]\), via a Markov chain Monte Carlo (MCMC) analysis using the emcee [61] python library. The calibrated constraints obtained, \(M_B = -19.302 \pm 0.031\), \(-19.369 \pm 0.037\) and \(-19.425\pm 0.017\) corresponding to the R21, TRGB and P18 \(H_0\) priors respectively, are shown in Fig. 4 using GetDist [62].
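As a simple illustration of Eq. (4), the sketch below converts apparent magnitudes to luminosity distances and propagates the magnitude uncertainties to first order. The quadrature combination of \(\sigma _m\) and \(\sigma _{M_B}\) is an assumption made here for illustration only, not a description of the full Pantheon covariance treatment, and the magnitudes shown are toy values.

```python
# Sketch of Eq. (4): Pantheon apparent magnitudes -> luminosity distances in Mpc,
# with a first-order propagation of the magnitude uncertainties.
import numpy as np

def m_to_dL(m, sigma_m, M_B, sigma_MB):
    dL = 10 ** ((m - M_B - 25.0) / 5.0)                               # Eq. (4), d_L in Mpc
    sigma_dL = (np.log(10.0) / 5.0) * dL * np.sqrt(sigma_m**2 + sigma_MB**2)
    return dL, sigma_dL

# e.g. with the R21-calibrated M_B = -19.302 +/- 0.031 and toy apparent magnitudes:
m = np.array([14.0, 18.5, 22.0])
sigma_m = np.array([0.10, 0.12, 0.15])
dL, sigma_dL = m_to_dL(m, sigma_m, M_B=-19.302, sigma_MB=0.031)
```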

Next, we make use of the latest 32 CC Hubble parameter measurements [63,64,65,66,67,68,69], covering redshifts up to \(z \sim 2\). These data do not assume any particular cosmological model but rely on the differential age technique applied to galaxies, and we consider the full covariance matrix including the systematic and calibration errors [70]. We also take into account the BAO Hubble distance \(\frac{d_H(z)}{r_d}\) measurements [71,72,73,74,75,76,77] from different galaxy surveys such as the Sloan Digital Sky Survey (SDSS), the Baryon Oscillation Spectroscopic Survey (BOSS) and the extended Baryon Oscillation Spectroscopic Survey (eBOSS), such that

$$\begin{aligned} H(z) = c/{d_H(z)}. \end{aligned}$$
(5)

Note that the BAO H(z) data assume a fiducial value for the radius of the comoving sound horizon \(r_d\). To investigate the effect of the sound horizon scale on the reconstruction when using the BAO data, we consider the constraint \(r_d h = 102.56 \pm 1.87\) obtained by Camarena and Marra [78], keeping in mind the degeneracy between \(H_0\) and \(r_d\).
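The conversion of the BAO measurements can be sketched as below, where samples of \(r_d\) are obtained by combining the \(r_d h\) prior with an \(H_0\) prior (R21 is used here as an example) and propagated Monte Carlo style through Eq. (5). The quoted \(d_H/r_d\) data point is a toy value for illustration only.

```python
# Sketch of Eq. (5) for a BAO point: d_H/r_d -> H(z), propagating the r_d*h and H_0 priors.
import numpy as np

c = 299792.458                                   # km/s
rng = np.random.default_rng(1)
n_samp = 10000

dH_over_rd = 19.77                               # toy d_H/r_d measurement
sigma_dH_over_rd = 0.47

rd_h = rng.normal(102.56, 1.87, n_samp)          # r_d * h prior [Mpc]
H0 = rng.normal(73.3, 1.04, n_samp)              # R21 prior [km/s/Mpc]
rd = rd_h / (H0 / 100.0)                         # r_d in Mpc
x = rng.normal(dH_over_rd, sigma_dH_over_rd, n_samp)

H_z = c / (x * rd)                               # Eq. (5), H(z) in km/s/Mpc
print(H_z.mean(), H_z.std())
```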

3 Neural network reconstruction

After preparation of the \(d_L\) data, we train a network model to learn the complex relationships between z, \(d_L(z)\) and \(\sigma _{d_L}(z)\). With this trained model, any arbitrary number of \(d_L(z)\) samples can be reconstructed by feeding a sequence of redshifts to the network. Before training the network model on real data, we determine the optimal network configuration, i.e. the optimal number of neurons and layers, following Sect. A of [55].

Now, for the given sample of reconstructed \(d_L(z)\), we can arrive at the evolution of the normalized transverse comoving distance, D, from the Pantheon sample as

$$\begin{aligned} D(z) = \frac{H_0}{c(1+z)} d_L(z). \end{aligned}$$
(6)
Fig. 5 Plots for the reconstructed (i) D(z) (left panel), and (ii) \(D^\prime (z)\) (right panel), using neural networks from the Pantheon SN data considering R21, TRGB, and P18 \(H_0\) priors

Fig. 6 Plots for the reconstructed reduced Hubble parameter E(z) from the (i) Pantheon SN compilation (left panel) and (ii) combined CC+BAO Hubble data set (right panel), using neural networks considering R21, TRGB, and P18 \(H_0\) priors

The plot for the reconstructed D is shown in the left panel of Fig. 5. In this setting, the reconstruction is produced by feeding a number of redshift points into the ANN so that values of D and its associated uncertainty can be obtained. The observational covariance information will have been imprinted on the ANN through the training process due to the form of the loss function, while the reconstructed diagram will simply be composed of mean values and uncertainties at specific redshift points. We also undertake the simultaneous reconstruction of \(D^\prime (z)\), the first order derivative of D(z), where the prime denotes a derivative with respect to the redshift z, via an MC routine on multiple \(d_L(z)\) realizations, such that \(D^\prime (z) = \frac{H_0}{c(1+z)} ~d_L^\prime (z)\). This compounding of MC with ANNs is undertaken following the methodology described in Ref. [55]. The plot for the reconstructed \(D^\prime (z)\) is shown in the right panel of Fig. 5. Finally, one can plot the evolution of the reduced Hubble parameter E(z) from the supernovae data as \(E(z) = 1/D^\prime (z)\), given in the left panel of Fig. 6.
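A schematic of this supernova route to E(z), under the same Monte Carlo logic, is sketched below. The \(d_L(z)\) realizations are replaced by toy curves since the trained networks are not reproduced here, and the \(H_0\) prior is drawn explicitly so that its uncertainty propagates into D(z), \(D^\prime (z)\) and E(z); all names and numbers are illustrative.

```python
# Sketch: from d_L(z) realizations, form D(z) via Eq. (6), differentiate, and invert, E = 1/D'.
import numpy as np

rng = np.random.default_rng(2)
z = np.linspace(0.01, 2.0, 200)
n_real = 500
c = 299792.458                                                   # km/s

# dL_samples[k] stands in for the k-th ANN realization of d_L(z) in Mpc (toy curves).
dL_samples = np.array([(c / 70.0) * (1 + z) * np.log(1 + z) * (1 + 0.01 * rng.standard_normal())
                       for _ in range(n_real)])

H0 = rng.normal(73.3, 1.04, n_real)                              # e.g. the R21 prior
D = (H0[:, None] / (c * (1 + z))) * dL_samples                   # Eq. (6)
D_prime = np.array([np.gradient(Dk, z) for Dk in D])             # central differences per realization
E = 1.0 / D_prime                                                # E(z) = 1/D'(z)

E_mean, E_err = E.mean(axis=0), E.std(axis=0)
```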

For a comparison between the Hubble and supernovae data sets, we next utilize the ANN method to reconstruct the reduced Hubble parameter,

$$\begin{aligned} E(z) = H(z)/H_0, \end{aligned}$$
(7)

directly from the combined CC+BAO Hubble data. The uncertainty associated with the reconstructed E(z) is obtained via the Monte Carlo method. Plots for the reconstructed E(z) from the Hubble data are shown in the right panel of Fig. 6.

Fig. 7 Plots for the reconstructed (i) D(z) (left panel), and (ii) \(D^\prime (z)\) (right panel), using Gaussian processes from the Pantheon SN data considering R21, TRGB, and P18 \(H_0\) priors

Fig. 8 Plots for the reconstructed reduced Hubble parameter E(z) from the (i) Pantheon SN compilation (left panel) and (ii) combined CC+BAO Hubble data set (right panel), using Gaussian processes considering R21, TRGB, and P18 \(H_0\) priors

4 Comparison with Gaussian processes reconstruction

In this section, we compare the ANN-based reconstructions obtained in this work with their Gaussian process analogues. We recall that the methods by which these two reconstruction strategies function are fundamentally different. While GP requires some constraints on the type of data that it can be applied to, ANNs make vastly fewer assumptions and feature a much higher number of hyperparameters, which are then fit during the training of the neural network. Thus, one would expect an ANN to be much less constrained by the complexity of the data, and to have wider uncertainties. On the other hand, since GP does have some information about the behavior of the data, it can obtain smaller uncertainties.

We start by comparing the normalized transverse comoving distance D(z) (6), which quantifies the comoving distance for an object of relatively small characteristic length with respect to the Hubble flow. This is an appropriate way in which to interpret the SN data, since it does not require a fully determined cosmological model on which to perform numerical integrals. In our case, we first show the reconstruction for D(z) in Fig. 5, where the evolution is shown for a wider range of redshifts with means being shown for the \(\varLambda \)CDM model, as well as reconstructions for various literature priors. Given our reconstruction approach, we can also show the reconstruction of the redshift derivative of D(z) for the same priors. This can be contrasted with the analogous plot in Fig. 7, which is the GP reconstruction of the same quantities. In both cases, the reconstructions have very low uncertainties for most of the evolution of both D(z) and its first derivative. This happens because of the large volume of data in the Pantheon sample. Thus, both methods function quite well in the reconstruction of this particular data set.

The other comparison that provides an important dimension to the performance of GP and ANNs is that of the reduced Hubble parameter described in Eq. (7), which is a rescaled Hubble parameter that features a theoretical prior in that \(E(0) = 1\). This rescaled Hubble parameter is used for both the Pantheon data set as well as for Hubble data in the form of CC+BAO. For the ANN reconstruction, the reduced Hubble parameter gives Fig. 6, in which the reconstruction based on the Pantheon data set shows good behavior for low to medium values of redshift but then becomes numerically unbounded for much larger redshifts, while the same parameter is well behaved for the whole data range in the CC+BAO case. On the other hand, the GP reconstruction, shown in Fig. 8, has associated uncertainties that increase at slightly lower redshifts for the Pantheon data set case. Also, the CC+BAO reconstruction is in mild tension with \(\varLambda \)CDM at comparatively lower redshifts.

GP and ANN both have positive features in reconstructing cosmological data sets. However, ANNs show greater promise in that they impose fewer restrictions on the training data and can model more complex data-set structures.

5 Null tests

We now introduce some null tests, namely the \(\mathscr {O}m\) diagnostics [79,80,81], followed by the \(H_0\) diagnostics [82], to test the validity of the concordance model of cosmology.

5.1 \(\mathscr {O}m\) diagnostics

The \(\mathscr {O}m\) diagnostic [79,80,81] serves as a null test to distinguish the \(\varLambda \)CDM model from alternative dark energy and modified gravity models, defined as

$$\begin{aligned} \mathscr {O}m (z) = \frac{E^2(z) - 1}{(1+z)^3 -1}, \end{aligned}$$
(8)

where \(E (z) = {H(z)}/{H_0}\) is the reduced Hubble parameter. It works on the principle that different models have different evolutionary trajectories in the \(z\)–\(\mathscr {O}m(z)\) plane. Being a function of H(z) only, which can be directly reconstructed from observational data, it is independent of the cosmic equation of state. Moreover, there is no dependence on any theory of gravity. So, this exercise serves as an alternative route towards understanding the late-time cosmic acceleration in the absence of any convincing physical theory [83,84,85,86].

For a universe with an underlying expansion history E(z) given by the \(\varLambda \)CDM model, \(\mathscr {O}m(z)\) will essentially be a constant, exactly equal to \(\varOmega _{m0}\), the matter density parameter at the present epoch. The slope of \(\mathscr {O}m(z)\) can differentiate between different dark energy and modified gravity models even if \(\varOmega _{m0}\) is not accurately known. Therefore, any possible deviation of \(\mathscr {O}m(z)\) from \(\varOmega _{m0}\) can be used to draw inferences on the dynamics of the universe. For the phenomenological wCDM model, where the dark energy component is described by a constant equation of state parameter w, a positive slope of \(\mathscr {O}m(z)\) indicates a phantom behaviour of dark energy, whereas a negative slope points towards a quintessence dark energy model.
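As a simple illustration, the \(\mathscr {O}m\) diagnostic of Eq. (8) can be evaluated on Monte Carlo realizations of the reconstructed E(z) as sketched below. The E(z) samples here are stand-ins generated around a fiducial \(\varLambda \)CDM curve, purely to show the mechanics of the error propagation.

```python
# Sketch of the Om(z) diagnostic of Eq. (8), evaluated per E(z) realization so that the
# uncertainty band follows from the sample scatter.
import numpy as np

def om_diagnostic(z, E):
    return (E**2 - 1.0) / ((1.0 + z)**3 - 1.0)        # Eq. (8); avoid z = 0 exactly

rng = np.random.default_rng(3)
z = np.linspace(0.05, 2.5, 150)
E_samples = np.array([np.sqrt(0.3 * (1 + z)**3 + 0.7) * (1 + 0.02 * rng.standard_normal(z.size))
                      for _ in range(500)])           # stand-in for the reconstructed E(z)

Om_samples = om_diagnostic(z, E_samples)
Om_mean, Om_err = Om_samples.mean(axis=0), Om_samples.std(axis=0)
# For an exact LambdaCDM E(z) with Omega_m0 = 0.3, Om(z) is flat at 0.3 for every z.
```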

Fig. 9 Plots for the reconstructed \(\mathscr {O}m\) diagnostics using (i) neural networks (left panel) and (ii) Gaussian processes (right panel), from the Pantheon SN data considering R21, TRGB, and P18 \(H_0\) priors

Fig. 10 Plots for the reconstructed \(\mathscr {O}m\) diagnostics using (i) neural networks (left panel) and (ii) Gaussian processes (right panel), from the combined CC+BAO Hubble data considering R21, TRGB, and P18 \(H_0\) priors

We plot the \(\mathscr {O}m\) diagnostics, as a function of the redshift z, using the reconstructed E(z) in Figs. 9 and 10 from the Pantheon SN and combined CC+BAO Hubble data respectively. The uncertainties associated with the reconstructed \(\mathscr {O}m\) diagnostics are obtained by an MC error propagation technique. We also show a comparison between the two methods of reconstruction, i.e. implementation with neural networks in the left panel, and employing Gaussian processes in the right panel. Figures 9 and 10 show that the reconstructed values are not well constrained at lower redshifts \(z < 0.2\). The mean reconstructed \(\mathscr {O}m\) curves in both figures show evolution with increasing redshift. In Fig. 9, we find that the mean curves are characterised by a significant positive slope for \(z > 1\); nonetheless the \(\varLambda \)CDM model assuming the Planck best-fit \(\varOmega _{m0}= 0.315\) [21] is consistent with the \(\mathscr {O}m\) reconstruction at the 2\(\sigma \) confidence level. In contrast, the reconstruction profile in Fig. 10 tends to be characterised by a negative slope for \(z > 1\), excluding \(\varLambda \)CDM at the 2\(\sigma \) confidence level for \(z>2\). This deviation from the concordance model possibly arises from the inclusion of the high redshift Ly-\(\alpha \) BAO measurements, which calls for further investigation.

5.2 \(H_0\) diagnostics

The Hubble tension, routinely presented as a mismatch between the Hubble constant \(H_0\) determined from local measurements and a value inferred from the CMB sky assuming \(\varLambda \)CDM cosmology, essentially boils down to a disagreement between two numbers. Assuming this tension is cosmological in origin, the authors in [82] explore the possibility of other inferred values of \(H_0\), predicting that a “running of \(H_0\) with z” may be expected within the concordance model. Similar possibilities of a steadily varying trend in the inferred \(H_0\) as one moves from low to high redshift data have also been studied [87,88,89,90,91,92,93,94]. Such a phenomenological evolution of \(H_0\) with redshift could be a straightforward alternative in resolving the tension without any direct investigation of the fundamental framework. One such diagnostic that flags possible deviations from \(\varLambda \)CDM is the \(H_0\) diagnostic \(\textbf{H0}\), defined as

$$\begin{aligned} \textbf{H0} = \frac{H(z)}{ \sqrt{\varOmega _{m0}(1+z)^3 + 1-\varOmega _{m0} }} . \end{aligned}$$
(9)

This quantity \(\textbf{H0}\) provides us with a null test for the concordance model and a non-constancy of \(\textbf{H0}\) suggests evidence for new physics beyond \(\varLambda \)CDM.
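A corresponding sketch for the \(\textbf{H0}\) diagnostic of Eq. (9) is given below, where both the reconstructed H(z) samples and the \(\varOmega _{m0}\) constraint are drawn as random realizations so that their uncertainties propagate jointly; all numbers are stand-ins for illustration.

```python
# Sketch of the H0(z) diagnostic of Eq. (9): divide the reconstructed H(z) by the
# LambdaCDM expansion shape with Omega_m0 sampled from its constraint. A flat result
# is consistent with LambdaCDM.
import numpy as np

rng = np.random.default_rng(4)
z = np.linspace(0.05, 2.5, 150)
n_real = 500

H_samples = 73.3 * np.sqrt(0.30 * (1 + z)**3 + 0.70) \
            * (1 + 0.02 * rng.standard_normal((n_real, z.size)))    # stand-in for reconstructed H(z)
Om0 = rng.normal(0.290, 0.016, n_real)                               # e.g. the R21-prior constraint

H0_diag = H_samples / np.sqrt(Om0[:, None] * (1 + z)**3 + 1.0 - Om0[:, None])   # Eq. (9)
H0_mean, H0_err = H0_diag.mean(axis=0), H0_diag.std(axis=0)
```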

Fig. 11 Plots for the reconstructed \(\textbf{H0}\) diagnostics using (i) neural networks (left panel) and (ii) Gaussian processes (right panel), from the Pantheon SN data considering R21, TRGB, and P18 \(H_0\) priors

Fig. 12 Plots for the reconstructed \(\textbf{H0}\) diagnostics using (i) neural networks (left panel) and (ii) Gaussian processes (right panel), from the combined CC+BAO Hubble data considering R21, TRGB, and P18 \(H_0\) priors

In this section, we plot the evolution of \(\textbf{H0}\) with respect to the redshift z from the reconstructed E(z) in Figs. 11 and 12 from the Pantheon SN and combined CC+BAO Hubble data respectively. The left panels correspond to the reconstruction with ANNs, whereas the right panels represent the reconstruction using GPs. We make use of the employed \(H_0\) priors to obtain the numerator \(H(z) = H_0 E(z)\) on the RHS of Eq. (9). The denominator has been fixed by sampling \(\varOmega _{m0}\) directly via an MCMC analysis with the combined CC+BAO+SN data sets assuming \(\varLambda \)CDM cosmology. The constraints obtained on \(\varOmega _{m0}\) are \(0.290 \pm 0.016\), \(0.298 \pm 0.017\) and \(0.303 \pm 0.016\) considering the R21, TRGB and P18 \(H_0\) priors. The uncertainties associated with the parameter \(\varOmega _{m0}\) and the reconstructed H(z) are propagated using the MC error propagation technique.

Our results show that the mean reconstructed \(\textbf{H0}\) curves in both figures exhibit a non-monotonic evolution with respect to z. In Fig. 11, \(\textbf{H0}\) progressively increases with increasing z, but on going beyond \(z>2\) we observe a dip in the reconstruction profile. The presence of such a dip is also apparent in the right panel when employing GPs. We also plot the R21, TRGB, and P18 \(H_0\) values as black solid, dashed and dotted lines respectively, for comparison with the obtained \(\textbf{H0}(z)\). We find that the reconstructed errors accommodate \(\varLambda \)CDM within a \(2\sigma \) level. The non-monotonic nature of \(\textbf{H0}\) is clearly visible in Fig. 12, when the Hubble data is taken into consideration. The reconstructed \(\textbf{H0}\) profile indicates a clear deviation from \(\varLambda \)CDM at the 2\(\sigma \) confidence level, driven by the Lyman-\(\alpha \) BAO measurements which lead to a significant dip in \(\textbf{H0}\) for \(z>2\). However, if we restrict our attention to \(z < 1\), where the quality of available data is much better, we find little evidence for any deviation from \(\varLambda \)CDM cosmology.

6 Conclusion

Even though reconstruction techniques have been a very popular topic of research in cosmology over the last few years, the majority of studies focus on GP to reconstruct dark energy and its potential theoretical foundations. GPs, however, suffer from various problems, among which are overfitting at low redshifts, meaning that the reconstructed function is too closely aligned with the low-redshift data points, as well as the selection of a kernel, which introduces a statistical bias.

ANNs have been proposed as a promising alternative to GPs but, in contrast to GPs, they originally allowed only the cosmological parameters to be reconstructed, without their derivatives. There has been recent work on the reconstruction of higher derivatives of the Hubble function in [55], where the authors use an MC approach. Even though this helps with the testing of cosmological models, only independent data points have been used up to now, while most realistic data sets are correlated in some way.

In this work, our goal was to include covariance information in the reconstruction approach in order to be able to use more realistic data sets. Once we reconstruct a cosmological parameter, we can use the Monte Carlo approach to reconstruct its higher derivatives and thus reproduce or test the viability of various cosmological models with better accuracy than before.

In greater detail, we reconstructed the Hubble diagram for various combinations of Cosmic Chronometers, Baryon Acoustic Oscillations, as well as the 1048 data points of the Pantheon Supernovae type Ia compilation, which are correlated. To do this, we extended ReFANN, which was originally built on PyTorch and designed to work with independent data points only.

The type of data that ANNs can use is not as constrained as in GP. Specifically, ANNs make far fewer assumptions, because the much larger number of hyperparameters they use allows them to imitate the underlying process more flexibly than GPs. For this reason, one would expect ANNs to produce larger uncertainties; however, this is not the case here. Because of the large volume of data in the Pantheon set, both GP and ANNs perform in a similar way in terms of error bars. Thus, the comparison between the two techniques shows more potential for the latter, since it does not rely on rigid training data and can also represent more complicated data-set structures.

Last but not least, apart from the reconstruction of the Hubble function, we performed null tests in order to check the consistency of our results. In particular, through the \(\mathscr {O}m\) and the \(\textbf{H0}\) diagnostics we tried to identify possible deviations from the \(\varLambda \)CDM model. Both diagnostics indicate a deviation from the concordance model at \(z>2\), most probably because of the inclusion of the high redshift BAO data points. However, they can both accommodate \(\varLambda \)CDM at the \(2\sigma \) confidence level.

What would be interesting to see from now on is not only forecasting observations for experiments in progress that are about to publish their results, but also using the reconstructed Hubble parameter and its derivative to constrain or even rule out alternative cosmological models.