Measuring the Hubble constant with cosmic chronometers: a machine learning approach

Local measurements of the Hubble constant (H0) based on Cepheids and Type Ia supernovae differ by ≈5σ from the value of H0 estimated from Planck CMB observations under ΛCDM assumptions. In order to better understand this H0 tension, the comparison of different methods of analysis will be fundamental for interpreting the data sets provided by the next generation of surveys.
In this paper, we deploy machine learning algorithms to measure H0 through a regression analysis on synthetic data of the expansion rate, assuming different values of redshift and different levels of uncertainty. We compare the performance of different regression algorithms, such as Extra-Trees, Artificial Neural Networks, Gradient Boosting and Support Vector Machines, and we find that the Support Vector Machine exhibits the best performance in terms of the bias-variance tradeoff in most cases, showing itself to be a competitive cross-check to non-supervised regression methods such as Gaussian Processes.


I. INTRODUCTION
The standard model of Cosmology consists of a flat, homogeneous and isotropic universe whose energy content is dominated by a cosmological constant (Λ) and cold dark matter (ΛCDM) [1,2]. Such a model provides the best description of cosmological observations, such as temperature fluctuations of the Cosmic Microwave Background (CMB) [3], luminosity distances to Type Ia Supernovae (SNe) [4], the large-scale clustering of galaxies (LSS), and weak gravitational lensing (WL) [5][6][7][8]. Despite its tremendous success, this model presents theoretical caveats, such as the value of the vacuum energy density [9,10], in addition to observational challenges, e.g. the 5σ Hubble constant tension between CMB and SNe observations [11][12][13], as well as milder tensions between matter density perturbation estimates from CMB and LSS, and a slightly enhanced CMB lensing amplitude relative to the ΛCDM prediction. These conflicting measurements may hint at physics beyond the standard cosmology.
In this paper we discuss the ability to measure the Hubble constant H0 from cosmic chronometer measurements, H(z), using different ML algorithms. We first produce H(z) synthetic data sets with different numbers of data points and measurement uncertainties, in order to perform a benchmark test of the H0 constraints for each algorithm given the quality of the input data. Rather than performing a numerical reconstruction across the redshift range probed by the data and then fitting H0, we carry out an extrapolation of the reconstructed H(z) values down to z = 0. We also compare the performance of these algorithms with other non-parametric reconstruction methods, such as the popularly adopted Gaussian Processes (GAP) [81]. Our goal is to verify whether they can provide a competitive cross-check for GAP.
The paper is structured as follows: Section II is dedicated to the cosmological framework and the simulations produced for our analysis. Section III explains how this analysis is performed, along with the metrics adopted for algorithm performance evaluation. Section IV presents our results; finally, our main conclusions and final remarks are presented in Section V.

II. SIMULATIONS A. Prescription
In order to compare how different predicting algorithms perform with different qualities of data, we produce simulated H(z) data sets adopting the following prescription: (i) We assume as fiducial cosmology the flat ΛCDM model given by Planck 2018 (TT, TE, EE+lowE+lensing; hereafter P18) [3], so that the Hubble parameter follows the Friedmann equation for the fiducial ΛCDM model, H(z) = H0 [Ωm(1 + z)^3 + 1 − Ωm]^(1/2). (ii) We compute the values of H(z) at Nz data points whose redshifts follow the distribution p(z) of [27] (a Gamma distribution with shape k and scale θ), where we fix θ and k to their respective best fits to the real cosmic chronometer data, as in [27], i.e., θbf = 0.647 and k = 1.048.
(iii) In order to understand how our knowledge of H(z) along the redshift range affects the performance of the statistical learning, we provide different sets of H(z), assuming different numbers of points, Nz = 20, 30, 50 and 80, and different relative uncertainties, i.e., σH/H = 0.008, 0.01, 0.03, 0.05, 0.08. This variation of Nz and σH(z) allows us to evaluate what level of accuracy in the H(z) measurements is necessary in order to obtain a given precision on the prediction of H0.
(iv) We also produce H(z) simulations based on the current cosmic chronometer data, which consist of Nz = 31 measurements presented in Table I (see also Table I in [27]).
Such a prescription provides a benchmark to test the performance of the ML algorithms deployed.
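As an illustration, the prescription above can be sketched in a few lines of Python. The specific P18 parameter values (H0 = 67.36 km/s/Mpc, Ωm = 0.315) and the Gamma form of p(z) are assumptions made here for concreteness; the helper names are ours, not from the original analysis.

```python
import numpy as np

# Fiducial flat LCDM parameters (P18-like values, assumed here for illustration)
H0_FID, OMEGA_M = 67.36, 0.315

def hubble(z):
    """Friedmann equation for flat LCDM: H(z) = H0 * sqrt(Om*(1+z)^3 + 1 - Om)."""
    return H0_FID * np.sqrt(OMEGA_M * (1.0 + z) ** 3 + 1.0 - OMEGA_M)

def make_dataset(n_z, sigma_rel, theta=0.647, k=1.048, seed=42):
    """Draw n_z redshifts from a Gamma-distributed p(z) with shape k and
    scale theta (best-fit values quoted in the text), then attach the
    requested relative uncertainty sigma_rel = sigma_H / H to each point."""
    rng = np.random.default_rng(seed)
    z = np.sort(rng.gamma(shape=k, scale=theta, size=n_z))
    hz = hubble(z)
    return z, hz, sigma_rel * hz
```

Looping `make_dataset` over Nz = 20, 30, 50, 80 and σH/H = 0.008-0.08 reproduces the grid of simulation specifications described in items (iii) above.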

B. Uncertainty estimation
Although these algorithms are able to provide measurements of H0 at a given redshift, they do not provide their uncertainties. We develop a Monte Carlo-bootstrap (MC-bootstrap) method for this purpose, described as follows: • Rather than creating a single simulation centred on the fiducial model for each data set (item (i) of Section II), we produce H(z) measurements at a given redshift following a normal distribution centred on the fiducial value, N(Hfid(z), σH), where Hfid(z) represents the H(z) value given by the fiducial Cosmology and σH is its uncertainty, set by the relative error σH/H described in item (iii) of Section II.
• As for the "real data" simulations, described in item (iv) of Section II, we replace the i-th H(z) measurement presented in the second column of Table I by a value drawn from a normal distribution centred on the fiducial model, i.e., N(Hfid(zi), σH;i), where zi represents the redshift of each data point and σH;i its corresponding uncertainty (first and third columns of Table I, respectively).
• We repeat this procedure 100 times for each data set of Nz data points with σH/H uncertainties, as described in items (iii) and (iv) of Section II.
• The 100 MC realisations produced for each case are provided as inputs to each of the ML algorithms described in Section III A. • We report the average and standard deviation of these 100 values as the H0 measurement and its uncertainty, respectively, for each Nz and σH/H case. The same applies to the "real data" simulations.
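The MC-bootstrap loop above can be sketched as follows. Here `fit_predict_H0` is a placeholder for any of the tuned regressors of Section III; the function name and interface are ours, chosen for illustration.

```python
import numpy as np

def mc_bootstrap_H0(z, h_fid, sigma_h, fit_predict_H0, n_real=100, seed=0):
    """Perturb each H(z) point within its Gaussian error, refit, collect H0.

    fit_predict_H0(z, hz) stands in for any regressor returning the
    extrapolated value of H(z) at z = 0. Returns the mean and standard
    deviation over n_real Monte Carlo realisations, as in the text."""
    rng = np.random.default_rng(seed)
    h0_samples = []
    for _ in range(n_real):
        hz = rng.normal(h_fid, sigma_h)            # one MC realisation
        h0_samples.append(fit_predict_H0(z, hz))   # extrapolate to z = 0
    h0_samples = np.asarray(h0_samples)
    return h0_samples.mean(), h0_samples.std(ddof=1)
```

For instance, plugging in a simple linear extrapolation for `fit_predict_H0` on a mock data set recovers the input intercept with an uncertainty reflecting the per-point errors.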

III. ANALYSIS A. Methods
Our regression analyses are carried out on all simulated and "real" data sets with several ML algorithms available in the scikit-learn package [90]. Firstly, we divide our input sample into training and testing data sets as

    z_train, z_test, hz_train, hz_test = train_test_split(z, hz, test_size=0.25, random_state=42)

so that our testing subset contains 25% of the original sample size. Then we deploy different ML algorithms on the training set, looking for the best combination of hyperparameters with the help of GridSearchCV. This scikit-learn function, given a ML method, performs the learning with all combinations of hyperparameters and reports the performance of each one of them during the cross-validation (CV) procedure. Such a procedure is performed for the sake of avoiding overfitting on the test set. We chose CV = 3 in our analysis, given the limited number of H(z) data points.
The ML methods deployed in our analysis are given as follows:

• Extra-Trees (EXT): An ensemble of randomised decision trees (extra-trees). The goal of the algorithm is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features; a tree can be seen as a piecewise constant approximation [91]. We evaluate the hyperparameter values that best fit the input simulations through a grid search, given by:

    gcv = GridSearchCV(ExtraTreesRegressor(min_samples_split=2, random_state=42),
                       param_grid={'n_estimators': np.arange(1, 100, 2),
                                   'max_depth': np.arange(1, 10, 2)},
                       cv=3, refit=True)

• Support Vector Machines (SVM): A linear model that creates a line or hyperplane to separate data into different classes [94,95]. Originally developed for classification problems, it has also been extended to regression, as is the goal of this work. The hyperparameter grid search of the SVM method reads:

    gcv = GridSearchCV(SVR(kernel='poly', C=100, gamma='auto',
                           epsilon=.1, coef0=1),
                       param_grid={'degree': np.arange(1, 10)},
                       cv=3, refit=True)

Note that we adopted the default evaluation metric for each ML algorithm as defined by the scikit-learn package. So the EXT method uses the squared-error metric to define the quality of a tree split (likewise for the GBR and ANN loss functions), whereas the SVM method assumes ε = 0.1, so that samples whose prediction is at least ε away from their true target are penalised.
In order to evaluate the performance of these methods, we report the results of the training and test scores as

    print(gcv.best_estimator_.predict([[0.]]),
          gcv.score(z_train, hz_train),
          gcv.score(z_test, hz_test))

Moreover, we deploy the well-known Gaussian Process regression (GAP) algorithm on the same simulated data sets using the GaPP package [81]. We compare the results obtained with the ML algorithms just described with GAP, since the latter has been widely used in the literature for similar purposes for about a decade. Two GAP kernels are assumed in our analysis, namely the Squared Exponential (SqExp) and Matérn(5/2) (Mat52). We justify these choices on the basis that the SqExp kernel exhibits greater differentiability than the Mat52, which may result in a larger degree of smoothing in the data reconstruction (hence smaller reconstruction uncertainties) that may or may not fully represent the underlying data.

B. Robustness of results
We define the bias (b) as the average displacement of the predicted Hubble constant (H0^pred), obtained from the MC-bootstrap method, from the fiducial value, i.e., b = ⟨∆H0⟩ with ∆H0 = H0^pred − H0^fid, and the Mean Squared Error (MSE) as the average squared displacement, MSE = ⟨(∆H0)^2⟩. Using the definition of variance, Var = ⟨(∆H0 − ⟨∆H0⟩)^2⟩, we estimate the bias-variance tradeoff (BVT) of our analysis through the decomposition MSE = b^2 + Var; therefore we can evaluate the performance of these algorithms for each simulated data-set specification.
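The decomposition MSE = b^2 + Var can be checked directly on the MC-bootstrap samples. A numerical sketch (the sample values below are made up for illustration):

```python
import numpy as np

def bias_variance(h0_samples, h0_fid):
    """Bias, variance and MSE of the predicted H0 relative to the fiducial value."""
    delta = np.asarray(h0_samples) - h0_fid   # Delta H0 = H0^pred - H0^fid
    b = delta.mean()                          # bias: average displacement
    mse = (delta ** 2).mean()                 # mean squared error
    var = delta.var()                         # population variance (ddof=0)
    return b, var, mse                        # identity: mse == b**2 + var

# illustrative MC samples, not real results
samples = [68.1, 66.9, 67.5, 67.8, 66.7]
b, var, mse = bias_variance(samples, 67.36)
```

Note that the identity only holds exactly with the population variance (ddof=0); with the sample variance (ddof=1) a small correction factor appears.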

IV. RESULTS
We show our H0 measurements for each algorithm in Fig. 1. The top panels present the results obtained from the EXT (left) and ANN (right) algorithms, the middle panels display the GBR (left) and SVM (right) results, and the bottom ones show the GAP predictions for the Mat52 and SqExp kernels in the left and right plots, respectively. Each data point in these plots represents the predicted H0 value (H0^pred) according to the prescription described in Section II B for each simulated data-set specification, i.e., the different σH/H results plotted against the Nz values. The light blue horizontal line corresponds to the fiducial H0. We can see that GBR and EXT are able to correctly predict the fiducial H0 for higher Nz and lower σH/H values, but not otherwise: especially for low values of Nz, these algorithms predict a larger H0 value. This indicates a bias in the results, despite the cross-validation procedure adopted. We also find that ANN and SVM are able to recover the fiducial H0 even for lower-quality sets of simulations; however, their predictions present larger variances as σH/H increases.
These results show that GBR and EXT are more sensitive to Nz concerning their bias, whereas ANN is more sensitive to σH/H with respect to its variance. On the other hand, SVM exhibits the best bias-variance tradeoff among all algorithms, as shown in Fig. 2, along with Tables II to VI, presented in Appendix B. We find that SVM is able to recover the fiducial H0 without significant losses in bias and variance as the simulation quality decreases. Such a result may arise for a few reasons, for example the non-guaranteed convergence of neural networks: by an appropriate choice of hyperparameters, an ANN can approach a target function until a satisfactory result is reached, whereas SVMs are theoretically grounded in their capacity to converge to the solution of a problem. Note that we adopted a polynomial kernel for SVM and a nonlinear activation function for ANN, namely "relu", in order to make a fair comparison between them. As for the EXT case, such an algorithm is known to be prone to overfitting and to outliers, which can explain the larger bias with lower variance; a similar problem applies to the GBR case as well. Moreover, both demand a longer training time than ANN and SVM, which translates into a longer computational time to obtain the H0 measurements and uncertainties in our case.
Regarding the comparison with the results obtained with GAP using the Squared Exponential (SqExp) and Matérn(5/2) (Mat52) kernels, we find good agreement between the SVM and GAP measurements of H0, in spite of a slightly larger BVT for the former. Note, however, that GAP can also be prone to overfitting, as the data points with smaller uncertainties have a greater impact in determining the function that best represents the underlying data in the numerical reconstruction, especially when assuming the SqExp kernel, which presents greater differentiability than the Mat52 case. Therefore, we show that SVM can be used as a cross-check method for GAP regression, which has been widely used in the literature. Moreover, we show the H(z) reconstructions obtained from the simulations mimicking the real data configuration in Fig. 3, at the 1, 2 and 3σ confidence levels, alongside the actual H(z) measurements. We can clearly see a "step-wise" behaviour in the EXT and GBR reconstructions, in contrast to the other algorithms, which illustrates the bias problems they face, as commented before. Once again, the GAP results exhibit the smallest uncertainties of all, but this may also be due to possible overfitting, as exemplified by the dip at the high-z end of the reconstruction. Nevertheless, the Hubble constant values measured by all algorithms are in agreement with each other, as depicted in Table VII, where EXT and GBR again exhibit a tendency towards larger H0 values (and hence larger BVT), while ANN and SVM present less biased results, as in the previous cases. Interestingly, we find that ANN performed slightly better than SVM this time, yielding a slightly lower BVT. Note also that our results are in good agreement with the predicted H0 of [27], who also used an ANN in their analysis, but we obtain a slightly lower uncertainty in our predictions: roughly 17% versus 23% in their case.
We also checked whether the test sample size affects the H0 predictions and their bias-variance tradeoff. We find that the default choice, i.e., a 75%-25% split between training and test samples, respectively, provides the best results for all algorithms, compared to 90%-10% and 60%-40% splits, for instance. The EXT and GBR algorithms perform similarly for the 10% and 25% cases, but their predictions become significantly worse in the 40% case. This is an expected result, since both algorithms require large training sets to carry out such predictions. On the other hand, ANN and SVM perform significantly worse for split choices other than the default one. Finally, we verified the results for different cross-validation values, such as CV = 2, 4, 8, finding values consistent with those obtained with the standard choice CV = 3.

9 The authors of [27] reported H0 = 67.33 ± 15.74 km s−1 Mpc−1.

V. CONCLUSIONS
Machine learning has been gaining formidable importance in recent times. Given the state of the art of modern computation and processing facilities, the application of machine learning algorithms in the physical sciences has become not only popular but essential in the process of handling huge data sets and performing model prediction, especially in light of forthcoming redshift surveys.
Our work focused on a comparison of different machine learning algorithms for the sake of measuring the Hubble constant from cosmic chronometer measurements. We used four different algorithms in our analysis, based on decision trees, artificial neural networks, support vector machines and gradient boosting, as available in the scikit-learn python package. We applied them to simulated H(z) data sets with different specifications, assuming a flat ΛCDM model consistent with the Planck 2018 best fit, in order to measure H0 through an extrapolation procedure.
Our uncertainties were estimated using a Monte Carlo-bootstrap method on the simulations, after properly splitting them into training and test sets and performing a grid search over the hyperparameter space during the cross-validation procedure. In addition, we created a performance ranking of these methods via the bias-variance tradeoff, and compared them with other established methods in the literature, e.g. Gaussian Processes as implemented in the GaPP code.
We found that the algorithms based on decision trees and gradient boosting present the lowest performance of all, as they provide low variance but a large bias in the reconstructed H0. Instead, the artificial neural networks and support vector machines are able to correctly recover the fiducial H0 value, with the latter method exhibiting the lowest variance among them. We also found that the support vector machine algorithm presents benchmark metrics compatible with those of the Gaussian Processes. This result shows that such a method can be successfully used as a cross-check between different non-parametric reconstruction techniques, which will be of great importance with the advent of next-generation cosmological surveys [14][15][16][17][18][19], as they are expected to provide H(z) measurements with a few percent precision.
• Artificial Neural Network (ANN): A Multilayer Perceptron algorithm that trains using backpropagation, with no activation function in the output layer [92]. The ANN hyperparameter grid search consists of:

    gcv = GridSearchCV(MLPRegressor(activation='relu', solver='lbfgs',
                                    learning_rate='adaptive', max_iter=200,
                                    random_state=42),
                       param_grid={'hidden_layer_sizes': np.arange(10, 250, 10)},
                       cv=3, refit=True)

• Gradient Boosting Regression (GBR): This estimator builds an additive model in a forward stage-wise fashion, allowing for the optimisation of arbitrary differentiable loss functions. In each stage, a regression tree is fit on the negative gradient of the given loss function [93]. The grid search over the GBR hyperparameters corresponds to:

    gcv = GridSearchCV(GradientBoostingRegressor(random_state=42),
                       param_grid={'n_estimators': np.arange(1, 200, 5)},
                       cv=3, refit=True)

FIG. 1: H0 measurements from the EXT (top left), ANN (top right), GBR (centre left), SVM (centre right), SqExp (lower left) and Mat52 (lower right) algorithms, plotted against the number of simulated H(z) measurements. Each data point represents a different σH/H value, whereas the light blue horizontal lines denote the fiducial H0 value.