1 Introduction

One of the biggest challenges for the cosmological community is the explanation of the current accelerated expansion of the Universe. A theoretical component, commonly called Dark Energy (DE), is introduced to explain this mysterious phenomenon, although its nature remains unknown [1,2,3]. The standard model of cosmology, or simply \(\Lambda \)CDM, is the homogeneous and isotropic cosmological model whose material content is as follows: ordinary matter, the simplest form of Dark Energy known as the cosmological constant \(\Lambda \) and, finally, a key component for the formation of structures in the Universe called Cold Dark Matter (CDM). It has had significant achievements, such as being in excellent agreement with most of the currently available data, for example, measurements coming from the Cosmic Microwave Background radiation [4], Supernovae Ia (SNeIa) [5], Cosmic Chronometers (CC) [6] and Baryon Acoustic Oscillations (BAO) [7]. Nevertheless, the \(\Lambda \)CDM model has its drawbacks: on theoretical grounds, the cosmological constant faces several problems, i.e., fine-tuning and cosmic coincidence [8, 9], and from an observational point of view, it also suffers from the so-called Hubble tension, a disagreement in the measured value of the Hubble parameter \(H_0\) among different datasets [10]. The presence of these issues opens the possibility of extensions beyond the standard cosmological model by considering, for instance, a dynamical DE, modifications to the general theory of relativity [11] or other approaches.

The search for possible signatures of cosmological models beyond \(\Lambda \)CDM has led to the creation of an impressive set of high-accuracy surveys, already underway or being planned [12,13,14], to gather a considerable amount of information that constrains the properties of the Universe. A viable cosmological model that leads to the current accelerated universal expansion is required to comply with all the relevant observational data. Extensions to the cosmological constant that allow a redshift-dependent equation-of-state (EoS) w(z) include extra dimensions [15], modified gravity [16], scalar fields [17], scalar-tensor theories with non-minimal derivative coupling to the Einstein tensor [18] and graduated dark energy [19], just to mention a few. However, in the absence of a fundamental and definitive theory of dark energy, a time-dependent behavior can also be investigated by choosing a mathematically appealing EoS or a simple parameterized form; examples of such forms in terms of redshift include a Taylor expansion [20], polynomial [21], logarithmic [22], oscillatory [23, 24] and combined forms [25], or parameterizations in terms of cosmic time [26]. Nonetheless, the a priori assumption of a specific theoretical model may lead to misleading model-dependent results regardless of the dark energy properties. Hence, instead of committing to a particular cosmological model, non-parametric inference techniques allow information to be extracted directly from the data to detect features within cosmological functions, for instance, w(z). The main aim of a non-parametric approach is to infer (reconstruct) an unknown quantity based mainly on the data while making as few assumptions as possible [8, 27]. Several non-parametric techniques are used to perform model-independent reconstructions of cosmological functions directly from the data, such as histogram density estimators [28], Principal Component Analysis (PCA) [29], smoothed step functions [30], Gaussian processes [31,32,33,34], the Simulation Extrapolation method (SIMEX) [35] and Bayesian nodal free-form methods [36, 37].

After the reconstruction is performed, the resulting function can be considered as a new model with which to look for possible deviations from the standard \(\Lambda \)CDM. In other words, the result of a model-independent reconstruction may be used to analyze its similarity with different theoretical models and, therefore, to select the best description of the data. There are several examples of model-independent reconstructions of cosmological functions; some of them focus on dark energy features [28, 30, 38, 39], the cosmic expansion [35], the deceleration parameter [34], the growth rate of structure formation [33], the luminosity distance [40, 41] and the primordial power spectrum [42, 43], among many others.

On the other hand, the recent increase in computing power and the vast amount of coming data have allowed the incursion of machine learning methods as analysis tools in observational cosmology [44,45,46,47,48,49]. In this work, we focus on the computational models called Artificial Neural Networks (ANNs). They have been used in a variety of applications, such as image analysis [50, 51], N-body simulations [52, 53] and statistical methods [54,55,56,57,58].

In a similar way that model-independent reconstructions are used to recover an underlying baseline function, the main goal of this paper is to propose a new method based on artificial neural networks that uses solely the current datasets with minimal theoretical assumptions. Here, we refer to the neural network outputs as model-independent reconstructions because they do not incorporate any a priori cosmological assumption to generate the model from the datasets. This work is similar to previous research in which neural networks produce reconstructions of cosmological functions [59,60,61]. However, the novel differences here are the exploration of more cosmological datasets, the absence of a fiducial cosmology in the reconstructions, the exclusive use of the observational data to train the neural networks (even if they are small), and the new treatment of the non-diagonal error covariance matrices.

A benefit of using well-trained neural networks is that they do not rely on a fiducial cosmology; the generated data can be regarded as new observations of the same nature as the original dataset. Another advantage of neural networks over other standard interpolation techniques is that, due to their nonlinear modeling capabilities, they do not require any assumption about the statistical distribution of the data. In addition, neural networks also allow us to generate computational models for the errors of the observational datasets: when the errors have no correlations (diagonal covariance matrices), we develop a single neural network model that considers both the measurements and the errors from the original dataset, whereas when the error matrices are non-diagonal we must generate a separate neural model. We show that our methodology can be applied to several astronomical datasets, including full covariance matrices with correlations among measurements, for which we introduce a special treatment with variational autoencoders.

The rest of the paper has the following structure. In Sect. 2, we briefly introduce the cosmological and statistical concepts used throughout this work: cosmological models, functions and observations in Sect. 2.1; a short summary of Bayesian inference in Sect. 2.2 and an overview of neural networks in Sect. 2.3. Section 3 describes the methodology used during the neural network training to generate computational models based on cosmological data. Section 4 contains our results: in Sect. 4.1 we show the generation of model-independent reconstructions using neural networks from observational measurements of the Hubble parameter H(z), the combination \(f\sigma _8(z)\) of the growth rate of cosmological perturbations times the matter power spectrum normalization, and the distance modulus \(\mu (z)\) along with its covariance matrix. In Sect. 4.2, we use Bayesian inference on two cosmological models to check the consistency of our reconstructions in comparison with the original data and the expected values of the cosmological parameters. Finally, in Sect. 5 we present our final comments. Furthermore, within the appendices, a brief description of feedforward neural networks and variational autoencoders is included, as well as the training process used for the networks and our experimental method to learn the covariance matrix.

2 Cosmological and statistical background

This section introduces the cosmological models, functions, and datasets used throughout this work. The datasets are used to develop the model-independent reconstructions with our method and the cosmological models are used to compare these reconstructions with the theoretical predictions. We also provide a brief overview of the relevant concepts of Bayesian inference, which we use as a consistency test for the results of our neural network reconstructions, and of the essential elements of Artificial Neural Networks, which are the core of our proposed method. Throughout this paper we use the geometric unit system where \( \hbar = c = 8\pi G = 1\).

2.1 Cosmological models and datasets

2.1.1 Models

The Friedmann equation describing the late-time dynamical evolution for a flat-\(\Lambda \)CDM model can be written as:

$$\begin{aligned} H(z)^2 = H_0^2\left[ \Omega _{m,0}(1+z)^3 + (1-\Omega _{m,0})\right] , \end{aligned}$$
(1)

where H is the Hubble parameter and \(\Omega _{m}\) is the matter density parameter; a subscript 0 attached to any quantity denotes its present \((z= 0)\) value. In this case, the DE EoS is \(w(z) = -1\).

A step beyond the standard model is to consider a dynamical dark energy, where the evolution of its EoS is usually parameterized. A commonly used form of w(z) is to take into account the next contribution of a Taylor expansion in terms of the scale factor, \(w(a)= w_0 + (1-a)w_a\), or in terms of redshift, \(w(z) = w_0 + \frac{z}{1+z} w_a\); we refer to this model as CPL [20, 62]. The parameters \(w_0\) and \(w_a\) are real numbers such that at the present epoch \(w|_{z=0}=w_0\) and \(dw/da|_{a=1}=-w_a\); we recover \(\Lambda \)CDM when \(w_0 = -1\) and \(w_a=0\). Hence the Friedmann equation for the CPL parameterization turns out to be:

$$\begin{aligned} H(z)^2 = H_0^2\left[ \Omega _{m,0}(1+z)^3 + (1-\Omega _{m,0})(1+z)^{3(1+w_0+w_a)}\, e^{-\frac{3w_a z}{1+z}}\right] . \end{aligned}$$
(2)
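For reference, a minimal numerical sketch of Eqs. (1) and (2) could be written as follows; the function names and parameter values are our own illustrative choices, not quantities taken from the analysis of this paper.

```python
import numpy as np

def hubble_lcdm(z, H0=67.4, Omega_m0=0.316):
    """Flat-LCDM Hubble parameter of Eq. (1), in km s^-1 Mpc^-1."""
    return H0 * np.sqrt(Omega_m0 * (1 + z)**3 + (1 - Omega_m0))

def hubble_cpl(z, H0=67.4, Omega_m0=0.316, w0=-1.0, wa=0.0):
    """CPL Hubble parameter of Eq. (2); reduces to LCDM for w0=-1, wa=0."""
    dark_energy = (1 - Omega_m0) * (1 + z)**(3 * (1 + w0 + wa)) * np.exp(-3 * wa * z / (1 + z))
    return H0 * np.sqrt(Omega_m0 * (1 + z)**3 + dark_energy)

z = np.linspace(0.0, 2.0, 5)
assert np.allclose(hubble_lcdm(z), hubble_cpl(z))  # consistency check at w0=-1, wa=0
```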

2.1.2 Datasets

Cosmic chronometers (CC) are passively evolving galaxies whose differential ages allow direct measurements of the Hubble parameter H(z). These measurements have been collected over several years [6, 63,64,65,66,67,68,69], and 31 data points are now available within redshifts between 0.09 and 1.965, along with their independent statistical errors.

The growth rate measurement usually refers to the product \(f\sigma _8(a)\), where \( f (a) \equiv d \ln \delta (a)/d\ln a \) is the growth rate of cosmological perturbations given by the density contrast \(\delta (a)\equiv \delta \rho /\rho \), with \(\rho \) the energy density, and \(\sigma _8\) is the normalization of the power spectrum on scales within spheres of radius \(8h^{-1}\)Mpc [70]. Therefore, the observable quantity \(f\sigma _8(a)\), or equivalently \(f\sigma _8(z)\), is obtained by solving numerically:

$$\begin{aligned} f\sigma _8 (a) = a\frac{\delta '(a)}{\delta (1)} \sigma _{8,0}. \end{aligned}$$
(3)
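As an illustration only, Eq. (3) can be evaluated by integrating the standard sub-horizon linear growth equation for \(\delta (a)\) on a flat-\(\Lambda \)CDM background; the sketch below is our own minimal implementation with illustrative parameter values, not necessarily the numerical scheme used in this work.

```python
import numpy as np
from scipy.integrate import solve_ivp

Omega_m0, sigma8_0 = 0.316, 0.8            # illustrative values

def E(a):                                   # E(a) = H(a)/H0 for flat LCDM
    return np.sqrt(Omega_m0 * a**-3 + 1.0 - Omega_m0)

def growth_rhs(a, y):
    # Linear growth: delta'' + (3/a + E'/E) delta' = (3/2) Omega_m0 delta / (a^5 E^2)
    delta, ddelta = y
    dE_da = -1.5 * Omega_m0 * a**-4 / E(a)
    d2 = -(3.0 / a + dE_da / E(a)) * ddelta + 1.5 * Omega_m0 * delta / (a**5 * E(a)**2)
    return [ddelta, d2]

a_ini = 1e-3                                # matter domination: delta ~ a
sol = solve_ivp(growth_rhs, [a_ini, 1.0], [a_ini, 1.0], dense_output=True, rtol=1e-8)
delta_today = sol.y[0, -1]

def fsigma8(z):
    a = 1.0 / (1.0 + np.asarray(z, dtype=float))
    delta, ddelta = sol.sol(a)              # delta(a) and its derivative
    return a * ddelta / delta_today * sigma8_0   # Eq. (3)

print(fsigma8([0.0, 0.5, 1.0]))
```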

The \(f{\sigma _8}\) data are obtained through the peculiar velocities from Redshift Space Distortion (RSD) measurements [71] observed in galaxy redshift surveys, or through weak lensing [72], where the density perturbations of the galaxies are proportional to the perturbations of matter. We used an extended version of the Gold-2017 compilation available in [73], which includes 22 independent measurements of \(f\sigma {_8}(z)\) with their statistical errors obtained from redshift space distortion measurements across various surveys (see references therein); the authors explain that the data used from the \(f\sigma {_8}\) combination have been shown to be unbiased.

Table I of [73] contains the \(f\sigma {_8}(z)\) measurements along with their standard deviations used in this work to form our training dataset. The same table also indicates the reference matter density parameter \(\Omega _{m,0}\) for each measurement and other details of the dataset.

Supernovae (SNeIa). The SNeIa dataset used in this work corresponds to the Joint Lightcurve Analysis (JLA), a compilation of 740 Type Ia supernovae. It is available in a binned version that consists of 31 data points with a covariance matrix \(C_{\textrm{JLA}} \in {\mathbb {R}}^{31 \times 31}\) related to the systematic measurement errors [5]. As a proof of concept, we focused on the binned version because, even though the treatment of the \({\mathbb {R}}^{740 \times 740}\) matrix from the entire dataset is a straightforward process, it is very computationally expensive (see Appendix 1 for details). However, it can be implemented on more powerful computers.

Let us assume a spatially flat universe, for which the relationship between the luminosity distance \(d_L\) and the comoving distance D(z) is given by:

$$\begin{aligned} d_L (z) = \frac{1}{H_0}(1+z)D(z), \qquad \textrm{with}\qquad D(z) = H_0\int _0^z \frac{dz'}{H(z')}. \end{aligned}$$
(4)

Using \(d_L\) defined in Eq. (4), and considering that the distance is expressed in Megaparsecs, the distance modulus is defined as follows:

$$\begin{aligned} \mu (z) = m - M = 5 \log _{10} d_L(z) + 25, \end{aligned}$$
(5)

where m is the apparent magnitude and M refers to the absolute magnitude. According to Ref. [5], in order to use the JLA binned data and to perform the Bayesian parameter estimation, we need to apply the following likelihood:

$$\begin{aligned} \log \mathcal {L}= -\frac{1}{2}\, r^T \cdot C_{\textrm{JLA}}^{-1} \cdot r, \end{aligned}$$
(6)

where \(r = \mu _b - M - 5\log _{10} d_L (z)\) and \(\mu _b\) is the distance modulus obtained from the binned JLA dataset.

We can use the definition of the theoretical distance modulus from Eq. (5) and obtain \(r = \mu _b - \mu (z) + (25 - M)\); for simplicity, we fixed M because prior knowledge suggests a constant value [2]. However, reference [5] warns about the importance of treating the absolute magnitude M as a free parameter in the Bayesian inference when using the binned dataset, to avoid potential issues with the estimated value of the Hubble parameter. Nevertheless, since our main aim is to use the Bayesian inference as a proof of concept for our methodology and not to draw cosmological conclusions from the results, we have fixed M to the same value in all the tests for the sake of simplifying computations with the data and their covariance matrix.

Details about the calibration of the Type Ia supernovae binned dataset and its covariance matrix used in this work are contained in appendices E and F of Ref. [5].
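For completeness, a minimal numerical sketch of Eqs. (4) and (5) on a flat-\(\Lambda \)CDM background could look as follows; the factor of c is reinserted so that \(d_L\) comes out in Megaparsecs, and the parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad

H0, Omega_m0, c = 67.4, 0.316, 299792.458    # km/s/Mpc, km/s (illustrative)

def H(z):
    return H0 * np.sqrt(Omega_m0 * (1 + z)**3 + 1 - Omega_m0)

def luminosity_distance(z):
    """Eq. (4): d_L = (1+z) * c * int_0^z dz'/H(z'), in Mpc."""
    integral, _ = quad(lambda zp: 1.0 / H(zp), 0.0, z)
    return (1 + z) * c * integral

def distance_modulus(z):
    """Eq. (5), with d_L expressed in Mpc."""
    return 5.0 * np.log10(luminosity_distance(z)) + 25.0

print([round(distance_modulus(z), 2) for z in (0.1, 0.5, 1.0)])
```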

2.2 Bayesian inference

Given a set of observational data and a mathematical expression for a cosmological model, a conditional probability function can be constructed regarding the model parameters and the observables. There are many ways to infer the combination of parameters that best fit the data. In cosmology, Bayesian inference algorithms have been used prominently [74,75,76]; however, methods such as the Laplace approximation [77], genetic algorithms [46, 78], simulated annealing [79] or particle swarm optimization [80] have also been explored.

Bayesian statistics is a paradigm in which probabilities are computed given the prior knowledge of the data [81, 82]. It can perform two essential tasks in data analysis: parameter estimation and model comparison. Bayes' theorem, on which it is based, is as follows:

$$\begin{aligned} P(\theta |D)= \frac{P(D|\theta )P(\theta )}{P(D)}, \end{aligned}$$
(7)

where D represents the observational dataset and \(\theta \) is the set of free parameters in the theoretical model. \(P(\theta )\) is the prior probability density function and represents the previous knowledge of the parameters. \(\mathcal {L}= P(D|\theta )\) is the likelihood function and indicates the conditional probability of the data D given the parameters \(\theta \) of a model. Finally, P(D) is a normalization constant, that is, the likelihood marginalization, and is called the Bayesian evidence. This quantity is very useful in model comparison; for example, it has been used in several papers to compare dark energy models through the Bayes factor and Jeffreys' scale [17, 36].

Considering the datasets described above, we use the following log-likelihoods:

$$\begin{aligned} \log \mathcal {L}_i = {-\frac{1}{2}}(D^i_{\textrm{th}}-D^i_{\textrm{obs}})^T \cdot C_i^{-1} \cdot (D^i_{\textrm{th}}-D^i_{\textrm{obs}}), \end{aligned}$$
(8)

where \(i=1,2\) corresponds to the cosmic chronometers [\(D^{i=1} = H(z)\)] and growth rate measurements [\(D^{i=2} = f\sigma _8(z)\)] datasets. \(D_{\textrm{obs}}\) represents the observational measurements, while \(D_{\textrm{th}}\) is the theoretical value for the cosmological models. \(C_{i=1}\) and \(C_{i=2}\) are diagonal covariance matrices. The log-likelihood for the SNeIa has been previously defined (see Eq. 6).
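A minimal sketch of the Gaussian log-likelihood of Eq. (8), valid for both diagonal and non-diagonal covariance matrices (the SNeIa case of Eq. (6) uses the residual \(r = \mu _b - M - 5\log _{10} d_L\)), could be written as follows; the function name and numbers are our own illustration.

```python
import numpy as np

def gauss_loglike(D_th, D_obs, cov):
    """Multivariate Gaussian log-likelihood as in Eq. (8)."""
    r = np.asarray(D_th, dtype=float) - np.asarray(D_obs, dtype=float)
    return -0.5 * r @ np.linalg.solve(cov, r)

# Diagonal example (e.g. cosmic chronometers): cov = diag(sigma^2)
sigma = np.array([10.0, 12.0, 9.0])
print(gauss_loglike([70.0, 90.0, 110.0], [68.0, 92.0, 111.0], np.diag(sigma**2)))
```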

2.3 Artificial neural networks

Fig. 1

Neural network architectures chosen to model the data from cosmic chronometers (CC), \(f{\sigma _8}\) measurements and the JLA SNeIa compilation, respectively; the batch size found for each case was 16, 1 and 1. In the last architecture, there is only one node in the output layer because the errors are computed with a variational autoencoder (described in Appendix D) given the original non-diagonal covariance matrix of the systematic errors. Blue numbers indicate the number of nodes in each layer. It is worth mentioning that, in all diagrams, both the redshift functions and the errors in the output layers refer to observational measurements, and no functional form of any cosmology is being considered

Artificial Neural Networks (ANNs) are computational models that learn the intrinsic patterns of a dataset. A neural network consists of several sets of neurons or nodes grouped into layers, and the connections between nodes of different layers are associated with numbers called weights. The training of a neural network aims to find the best values for all the weights to produce a generalization of the data, and this is done through the minimization of an error function (called the loss function) that measures the difference between the values predicted by the neural network and the actual values of the dataset (see Appendix A for more details and Ref. [83] for an introduction to the subject).

The Universal Approximation Theorem [84] states that an Artificial Neural Network with at least one hidden layer with a finite number of neurons can approximate any continuous function if the activation function is continuous and nonlinear. Therefore, an ANN is capable of learning the intrinsic functions inside cosmological datasets and generating a model based only on the data. Two types of artificial neural networks are implemented in this work: Feedforward Neural Networks (FFNN) and AutoEncoders (AE). FFNNs, also called multilayer perceptrons or deep feedforward networks, are the quintessential deep learning models [85]. In this type of ANN, the connections and information flow are feed-forward, i.e., from the first to the last layer without loops. They consist of one input layer, at least one hidden layer, and an output layer. The input consists of the dataset's independent variables (or features), while the output contains the dependent variables (or labels).
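As an illustration of the kind of FFNN used in this work, the minimal tf.keras sketch below maps a redshift to an observable and its error; the architecture, hyperparameters and random training arrays are placeholders, not the tuned configurations of Fig. 1.

```python
import numpy as np
from tensorflow import keras

# Placeholder training set: redshift -> (observable, error); in practice these
# arrays would contain, e.g., the cosmic-chronometer H(z) measurements.
z_train = np.random.uniform(0.1, 2.0, size=(31, 1)).astype("float32")
y_train = np.random.normal(size=(31, 2)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(1,)),                     # redshift
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dropout(0.2),                   # enables MC dropout later
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(2),                       # (observable, error)
])
model.compile(optimizer="adam", loss="mse")      # MSE, Eq. (9)

history = model.fit(z_train, y_train, validation_split=0.2,
                    epochs=500, batch_size=16, verbose=0)
# Comparing history.history["loss"] with ["val_loss"] is how the bias-variance
# balance mentioned in Sect. 3 can be monitored.
```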

On the other hand, autoencoders [86] are trained to generate a copy of their input at their output. We use this type of network to learn the errors of a dataset when they form a non-diagonal covariance matrix; in particular, we use Variational Autoencoders (VAE). Details about autoencoders are given in Appendix B.

3 Methodology

The datasets in this work contain redshifts, an observable for each redshift and the corresponding statistical errors. Our goal is to generate neural network models for the data despite the complex dependency among these three variables. That is, we take advantage of the ability of neural networks to model the relationship between these variables. Neural networks, with a structure based on multiple neurons and nonlinear activation functions, allow us to generate computational models entirely independent of any existing cosmological model or statistical assumptions.

Even though neural networks are standard in the treatment of large datasets, there is no mathematical constraint on using them for a dataset of any size, as shown in [87]; in particular, it is demonstrated that neural models can be used with a total number of weights larger than the number of sample data points. New approaches in neural network research that focus on small datasets are presented in references [88, 89], and a machine learning subfield, so-called few-shot learning [90], uses only a handful of samples to train the network. It is worth mentioning that with small datasets, although the computing resources are not demanding, it can become challenging to find the hyperparameters that generate an acceptable model. By monitoring the behavior of the loss function in both the training and validation sets, we can check the balance between bias and variance and thus verify that the neural network is well calibrated.

In all our datasets, we use the lowest and highest redshifts as the limits of the training set, and then we select a random 20% as the validation set. We do not use a test set due to the small size of the datasets. However, we test several combinations of hyperparameters to select those that generate an excellent neural network model.

For the analysis of cosmic chronometers and \(f{\sigma _8} \) measurements, we work with FFNNs because their diagonal covariance matrices can be arranged into a single column of the same length as the number of observational measurements. For these networks, we use the mean squared error (MSE) as the loss function, which is a usual choice in regression problems:

$$\begin{aligned} {\textrm{MSE}} = \frac{1}{n} \sum _{i=1}^{n} (Y_i - {\hat{Y}}_i)^2, \end{aligned}$$
(9)
Fig. 2

Hubble parameter H(z) reconstruction with FFNNs. Left: Purple points represent the FFNN predictions for H(z) along with their error bars in red. Right: Same as for the FFNN but adding MC-DO; we executed the Monte Carlo dropout 100 times to compute the uncertainties of the predictions, therefore the purple points are the average predictions of the MC-DO executions, and the red error bars are the uncertainties of the FFNN plus the error predictions (see Eq. 16). In both cases, we compare the neural reconstructions with the original cosmic chronometers (green bars) and H(z) from \(\Lambda \)CDM, as shown in the labels. The small panels show the individual behavior of the loss function (MSE) in the training (red) and validation (green) sets; these plots suggest an excellent neural network model with no overfitting or underfitting

where \(Y_i\) are the predictions of the ANN, \({\hat{Y}}_i\) the expected values, and n the number of predictions (the length of the prediction and target vectors).

In the case of SNeIa data, we use a FFNN to learn the distance modulus and a Variational Autoencoder for the non-diagonal covariance matrix of the systematic errors.

In addition, we implemented the Monte Carlo Dropout (MC-DO) method [91] in all our FFNNs. This method allows the output of a neural network to have an associated uncertainty and generates more robust models, since dropout acts as a regularization technique. In the last part of Appendix A, we describe the basic definitions of Dropout and MC-DO.
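Schematically, and assuming a tf.keras network with Dropout layers such as the sketch in Sect. 2.3, MC-DO predictions can be obtained by keeping dropout active at prediction time:

```python
import numpy as np

def mc_dropout_predict(model, z, n_samples=100):
    """Evaluate the network n_samples times with dropout active (training=True)
    and return the mean prediction and its standard deviation."""
    samples = np.stack([model(z, training=True).numpy() for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

z_grid = np.linspace(0.0, 2.0, 1000).reshape(-1, 1).astype("float32")
mean_pred, mc_std = mc_dropout_predict(model, z_grid)   # 'model' as in the earlier sketch
```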

We found the best architectures (shown in Fig. 1) among several combinations of the intrinsic parameters (hyperparameters) of the neural networks. Appendix C describes our careful selection of the hyperparameters of the feedforward neural networks, such as epochs, number of nodes, and how we have applied the MC-DO method. On the other hand, Appendix D explains how we configure the VAE neural network, its loss function and other details about its training, with the non-diagonal covariance matrix of the binned JLA compilation.

Once the neural networks are well trained, they constitute a model-independent reconstruction, which we can compare with observations and theoretical predictions. As a consistency test of our neural reconstructions, we perform Bayesian inference for the \(\Lambda \)CDM and CPL models, and we expect the posterior probabilities to be very similar between the reconstruction and the original datasets; otherwise, another neural network architecture must be chosen. We use the following flat priors: for the matter density parameter today \(\Omega _m \in [0.05, \; 0.5]\), for the physical baryon density parameter \(\Omega _b h^2 \in [0.02, \; 0.025]\), for the reduced Hubble constant \(h \in [0.4, \; 0.9]\), and for the amplitude of the (linear) power spectrum \(\sigma _8 \in [0.6, \; 1.0]\). When assuming the CPL parameterisation, we use \(w_0 \in [-2.0, \; 0.0]\) and \(w_a \in [-2.0, \; 2.0]\). The parameter h refers to the dimensionless reduced Hubble constant today, \(H_0\)/(100 km s\(^{-1}\) Mpc\(^{-1}\)).
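As a schematic example of this consistency test, the snippet below samples a flat-\(\Lambda \)CDM posterior for H(z)-like data with the emcee sampler; the data arrays, the reduced two-parameter prior and the sampler settings are our own illustrative choices, not the full setup used in this work.

```python
import numpy as np
import emcee

# Illustrative H(z)-type data; in practice these would be either the original
# measurements or points drawn from the neural network reconstruction.
z_obs = np.array([0.1, 0.5, 1.0, 1.5])
H_obs = np.array([70.0, 91.0, 120.0, 150.0])
cov = np.diag(np.array([5.0, 6.0, 8.0, 10.0])**2)

def log_prior(theta):
    Om, h = theta                                   # reduced flat priors
    return 0.0 if (0.05 < Om < 0.5 and 0.4 < h < 0.9) else -np.inf

def log_posterior(theta):
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    Om, h = theta
    H_th = 100.0 * h * np.sqrt(Om * (1 + z_obs)**3 + 1 - Om)   # Eq. (1)
    r = H_th - H_obs
    return lp - 0.5 * r @ np.linalg.solve(cov, r)              # Eq. (8)

ndim, nwalkers = 2, 32
p0 = np.array([0.3, 0.7]) + 1e-3 * np.random.randn(nwalkers, ndim)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior)
sampler.run_mcmc(p0, 5000, progress=True)
samples = sampler.get_chain(discard=1000, flat=True)           # posterior samples
```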

4 Results

From the observational datasets, we have trained the neural networks to reconstruct the Hubble parameter H(z), the growth rate \(f{\sigma _8}(z)\) and the distance modulus \(\mu (z)\); their predictions constitute the corresponding model-independent reconstructions. Finally, we have performed the parameter estimation to test the consistency of the reconstructions.

4.1 Reconstructions

4.1.1 H(z) data

To visualize the H(z) reconstructions performed by the FFNN using the CC, we generate predictions of H(z) and their corresponding errors at 1000 different redshifts. In Fig. 2 we show the FFNN alone (left) and the FFNN using MC-DO (right), where the original data points with their statistical errors are shown in green, while the neural network reconstruction along with its predicted errors is shown in magenta. Also in this figure, we compare the outputs of the neural network models with the theoretical predictions of \(\Lambda \)CDM using the two sets of values that give rise to the Hubble tension: \(H_0 = 73.24 \;\)km s\(^{-1}\) Mpc\(^{-1}\) and \(\Omega _m = 0.27\) coming from the Cepheid variables [92] and, on the other hand, \(H_0 = 67.40 \;\) km s\(^{-1}\) Mpc\(^{-1}\) and \(\Omega _m = 0.316\) measured by the Planck mission [4].

Fig. 3

Neural reconstructions for \(f\sigma _8(z)\) (red lines) and their respective errors. The original observations are in green with their uncertainties. Left: The \(f{\sigma _{8}}\) reconstruction performed by FFNN alone. Right: \(f{\sigma _{8}}\) reconstruction predicted by the FFNN using Monte Carlo dropout, the averages of 100 executions of MC-DO are indicated with the red data and their standard deviations are added to the error predictions. In both cases, the small panels display the behavior of the loss function (MSE) in the training (red curve) and validation (green curve) sets; in this case, these curves also show a good neural network model

We can notice that the FFNN alone and with MC-DO generate reconstructions in agreement with the theoretical predictions, based exclusively on the observable measurements and their statistical errors. The observational data points are only a few, therefore the scatter of the measurements is underestimated; however, based on the curves for the loss function, we can confirm that the neural networks generate good models. The dispersion in the FFNN+MC-DO reconstruction is higher because it performs statistics on several predictions and includes the uncertainty of the method itself; therefore, its results are more robust and reliable than those of the FFNN alone.

It is worth mentioning that the reconstructions performed by our method with FFNNs are consistent with the H(z) reconstructions of other works performed with Gaussian processes [47, 93,94,95] and with neural networks [60], where the training dataset consists of H(z) evaluations from a flat \(\Lambda \)CDM cosmology, the redshifts are distributed under a gamma distribution, and the errors are produced by an analytical expression [96]. In this sense, the advantages of our results are that they make no statistical assumptions about the data, as Gaussian processes usually do, we do not use either the Friedmann equation or another cosmological equation to augment the datasets, and the neural networks learned to model the observational errors directly without imposing an analytical expression beforehand.

4.1.2 \(f{\sigma _8}(z)\) data

We trained the FFNNs with the extended Gold-2017 compilation of growth rate measurements and their statistical uncertainties, and we generated 1000 predictions from the trained neural networks to visualize the \(f{\sigma _{8}}(z)\) reconstructions. In Fig. 3, we plot the original data with their uncertainties (green), while the neural network predictions and their errors are displayed in red (the left panel is the FFNN alone and the right panel corresponds to FFNN+MC-DO). We also draw some curves of \(f{\sigma _{8}}(z)\) from the analytical evaluation of the CPL model for different values of \(w_0\) and \(w_a\). We notice that the models are within the reconstructions in both cases. Hence, this dataset by itself may provide only loose constraints on the CPL parameters, mainly because there are very few points and relatively large statistical errors. However, the values \(w_0 = -0.8\) and \(w_a = -0.4\) (yellow line) seem to have a better agreement with the reconstruction.

We can analyze Fig. 3 to compare the two results. We can deduce that it is better to use MC-DO than the FFNN alone because MC-DO uses dropout as a regularization technique, avoiding overfitting and producing a more general data model. The small dataset makes it difficult for the FFNN alone to learn at redshifts close to zero; however, FFNN+MC-DO performs better in that respect. Regarding the MC-DO improvement, it can be noticed that in the case of the FFNN method several data points are outside the reconstruction, while in the reconstruction generated by FFNN+MC-DO only the \(f{\sigma _{8}}(z=0.17)=0.51\) point is excluded. Despite the large errors and the sparsity of the data, the FFNNs could generate a model consistent with the underlying cosmological theory of the \(\Lambda \)CDM and CPL models. Moreover, the reconstructions produced by the FFNNs have a similar trend to other model-independent reconstructions of \(f{\sigma _{8}}(z)\) made with Gaussian processes [33, 97], with the advantage of setting aside any statistical assumption about the data distribution.

Fig. 4

Percentage error (fractional difference times 100) between the \(\Lambda \textrm{CDM}\) theoretical predictions for the distance modulus and the observational measurements, and between the \(\Lambda \textrm{CDM}\) predictions and the neural network reconstructions. In the small panels of both figures, the behavior of the loss function, on a logarithmic scale, is shown for both the validation (green curve) and training (red curve) sets along with the number of epochs for each case (300 and 1800); it can be seen that after the training process, we obtain acceptable models for the binned JLA dataset. Left: FFNN alone. Right: FFNN with Monte Carlo Dropout

Fig. 5

Left: Original covariance matrix with systematic errors from the JLA compilation (binned version) with 961 entries. Middle: Absolute error of the covariance matrices predicted by the VAE+FFNN with respect to the original one. Right: Absolute error of the covariance matrices predicted by the VAE+FFNN with MC-DO

4.1.3 Distance modulus \(\mu (z)\) data

Our reconstruction methodology for the distance modulus differs from those previously presented; in this case, the main aim is modeling the errors of the observational measurements when they are correlated, that is, when the covariance matrix is non-diagonal. For this purpose, we introduce a new method based on a variational autoencoder (VAE) along with an FFNN to perform the whole neural network modeling for this dataset.

With the distance modulus reconstruction performed by the FFNNs, we have generated synthetic data points from 31 log-uniformly distributed redshift values \(z \in [0.01, 1.3]\) plus a small Gaussian noise, for both the FFNN alone and the FFNN+MC-DO. For comparison, Fig. 4 shows the percentage differences between the \(\Lambda \)CDM predictions and the original observations from the binned JLA compilation (in green), and between the \(\Lambda \)CDM predictions and the neural network reconstructions (in red).
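Schematically, and assuming a trained tf.keras model for \(\mu (z)\) with a single output node, the synthetic points can be drawn as follows; the noise level and the reading of "log-uniform" as a log-spaced grid are our own illustrative choices.

```python
import numpy as np

z_new = np.logspace(np.log10(0.01), np.log10(1.3), 31)               # 31 log-spaced redshifts
mu_pred = model.predict(z_new.reshape(-1, 1).astype("float32"))       # trained mu(z) network (assumed)
mu_synthetic = mu_pred[:, 0] + np.random.normal(0.0, 0.01, size=31)   # small Gaussian noise (illustrative)
```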

We can generate several points at any different values of redshift from the neural network models trained with the distance modulus and model the errors with a VAE neural network (see Appendix D for details of the developed method). Our motivation for using autoencoders for the covariance matrix is that an autoencoder is trained to generate an output of the same nature as the input while encoding a compressed representation in the part between the encoder and the decoder. In addition, if we use a VAE, during training this compressed representation is also sampled through variational inference and, at the end of training, we know the probability distribution that characterizes it and can perform interpolations, sweeping the latent space, to generate new covariance matrices. Furthermore, we can force the dimension of this compressed representation (latent space) to be one-dimensional, for easier interpretation or to map to another 1D distribution.
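For instance, once a VAE with a one-dimensional latent space has been trained on the flattened \(31\times 31\) covariance matrix, new matrices can be generated by sweeping that latent variable. The sketch below assumes a trained tf.keras decoder and adds, as our own safeguard, a symmetrization step and a Cholesky-based positive-definiteness check.

```python
import numpy as np

def sample_covariances(decoder, latent_values, n=31):
    """Decode 1D latent values into candidate n x n covariance matrices."""
    matrices = []
    for z_latent in latent_values:
        flat = decoder.predict(np.array([[z_latent]], dtype="float32"))[0]
        C = flat.reshape(n, n)
        C = 0.5 * (C + C.T)            # enforce symmetry
        np.linalg.cholesky(C)          # raises LinAlgError if not positive definite
        matrices.append(C)
    return matrices

# Sweep the (approximately standard-normal) latent space of the trained VAE.
new_covariances = sample_covariances(decoder, np.linspace(-2.0, 2.0, 5))
```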

A limitation of our method is that the number of new points, and their errors, must correspond to the dimensionality of the covariance matrix, in our case 31. Figure 5 shows the absolute error of the outputs of the VAE trained with the non-diagonal covariance matrix of the JLA systematic errors; it can be seen that in both cases (VAE+FFNN and VAE+FFNN+MC-DO) the differences are two or more orders of magnitude lower than the original matrix; therefore, the new matrices are in agreement with the original one. Nonetheless, in Sect. 4.2, we test these covariance matrices predicted by the neural networks in a Bayesian inference framework to verify whether they are statistically consistent with the original data.

From Fig. 4, it can be seen that the reconstructions are in better agreement with the \(\Lambda \)CDM model than with the original data points; this may occur because, when the neural network generates a model for all data points, it underestimates some of the observational variances and focuses more on the similarity of all observations. The FFNN alone has a smaller error in the first prediction, but the FFNN+MC-DO reconstructs the last redshifts better; however, based on the behavior of the loss function, we can say that the computational models generated by the neural networks for the binned JLA compilation are acceptable, both in the case of the FFNN alone and with MC-DO.

Fig. 6

2D marginalized posterior distributions from different combinations of datasets: original data, reconstructions data points from FFNN and FFNN+MC-DO. The green dashed lines (\(w_0=-1\), \(w_a=0\)) and (\(\Omega _1 = 0\), \(\Omega _2 = 0\)) correspond to the \(\Lambda \)CDM model. The constraints are plotted with \(1\sigma \) and \(2\sigma \) confidence contours

Table 1 Parameter estimation using Bayesian inference with datasets from different sources: original, FFNN alone, and FFNN using Monte Carlo dropout

4.2 Testing reconstructions with Bayesian inference

We use a Bayesian inference process for testing the consistency of the reconstructions obtained with the neural networks. In addition to the three original datasets (cosmic chronometers, \(f{\sigma _8}\) measurements, and binned JLA compilation), we have created two datasets for each type of observation from the trained FFNNs, with and without MC-DO. As a proof of concept, the new datasets for CC and \(f{\sigma _8}\) consist of 50 points uniformly distributed at random in redshift, while for SNeIa they consist of 31 points log-uniformly distributed in redshift (the same size as the original dataset). For the SNeIa case, we also generated its respective covariance matrix with the decoder part of the trained VAE. We performed the Bayesian inference with the data from the neural network reconstructions and with the original data to evaluate the quality of the reconstructions. For this purpose, we analyze the \(\Lambda \)CDM and CPL models. The idea is that if the neural network reconstructions are satisfactory, the Bayesian estimates of the parameters for the theoretical models should be very similar to those obtained with the original observations, i.e., they should have similar means and standard deviations in the posterior distributions. If this condition is not satisfied, it is necessary to retrain the neural networks or use another hyperparameter configuration.

We have used the data from CC, \(f\sigma _8\) measurements, and JLA separately. The most representative results are in Fig. 6, along with Table 1, which contains mean values and standard deviations; they have been sorted according to the dataset used as a source (original, FFNN, and FFNN+MC-DO) and to the two models involved (\(\Lambda \)CDM and CPL). Results are displayed for the reduced Hubble parameter h, \(\sigma _8\), and the \(w_0\) and \(w_a\) parameters of the CPL model. In addition, the last column of Table 1 contains the \(-2\ln \mathcal {L}_{\textrm{max}}\) of the Bayesian inference process for each case. One thing to note is that the neural networks build models that can be thought of as a function \(g: z \in {\mathbb {R}}\rightarrow v\in {\mathbb {R}}^2\), \(v = (f(z), err(f(z)))\), where both f(z) and the error of the observational measurements are being modeled, so when neural network predictions are used to perform Bayesian inference, the errors are of the same order of magnitude as the original ones. Before analyzing each scenario separately, it is worth mentioning some generalities in the results. First, when using each source separately, the constraints are consistent and they all have a similar best fit (maximum likelihood); second, the results agree with the \(\Lambda \)CDM model.

Regarding the parameter estimation displayed in Table 1 and the posterior distributions shown in Fig. 6, we notice that the best-fit values are mutually contained within their \(1\sigma \) standard deviations, in agreement with the \(\Lambda \)CDM and CPL values. Therefore, through the Bayesian parameter estimation, the neural network models generated from cosmic chronometers, \(f{\sigma _8}(z)\) measurements and the distance modulus are statistically consistent with each other and with the original datasets.

5 Conclusions

Throughout this work, we generated neural network models for cosmological datasets between redshifts \(z=0\) and \(z=2\) (cosmic chronometers, \(f\sigma _8\) measurements, and SNeIa). We used the neural models to generate model-independent reconstructions of H(z), \(f\sigma _8(z)\) and \(\mu (z)\). Then, we applied Bayesian inference to data points from the reconstructions to verify that they can reproduce the expected values of the cosmological parameters in \(\Lambda \)CDM and CPL models.

We have shown that well-calibrated artificial neural networks can produce computational models for cosmological data, even when the original datasets are small. The neural network models generate model-independent reconstructions of the Hubble parameter H(z), the growth rate \(f\sigma _8(z)\) and the distance modulus \(\mu (z)\) exclusively from observational data and without assuming any cosmological model. Our results are consistent with previous works using different non-parametric inference techniques.

In general, the results of the neural networks with MC-DO are better because they take into account the uncertainty of the produced models, and the dropout technique provides regularization, generating a more robust model. On the other hand, the standard deviations (or variance) of the FFNN+MC-DO predictions are small, which gives us confidence that the neural network is well trained. Although the FFNN+MC-DO predictions may have more variance than those of the FFNN alone, the fact that the results obtained with both models are close allows us to conclude that the FFNN predictions are also acceptable.

Because we take into account the original statistical errors as part of the training datasets, in the reconstructions of H(z) and \(f\sigma _8\) the errors have also been modeled by the neural networks. Since we are generating models for the errors, the new error bars are independent of the existence of a real data point at a given redshift, which is not the case with Gaussian processes.

As seen in the appendices, a disadvantage of our method is that neural network training and hyperparameter tuning are computationally more complex and consume more CPU time than other interpolation or non-parametric inference techniques. However, our method offers some advantages that can make it a viable alternative:

  • Well-trained neural network models can be generated even with few data points.

  • No fiducial cosmology is necessary to generate model-independent neural reconstructions consistent with cosmological theory.

  • No assumptions have to be made about the statistical distribution of the data.

  • It allows building computational models for observational data and their errors, even when the errors are correlated.

We have explored the generation of synthetic covariance matrices through a VAE neural network, and the results have allowed us to carry out Bayesian inference without drawbacks. The results we have obtained, as a first approach, are in agreement with other techniques. For larger datasets, we consider that using more complex autoencoder architectures and a slightly different approach for dealing with the computational demand will be convenient.

It is worth mentioning that the results obtained in this work are for the chosen observations and have been sufficient to show some interesting features from the data alone. In this way, we can see that using neural networks for the model-independent reconstructions can complement the analysis of cosmological models and improve the interpretations of their behaviors. We plan to apply similar techniques to other data types, including a full set of covariance matrices, and also incorporate more sophisticated hyperparameter tuning to improve reconstructions.