Improving the accuracy of satellite and reanalysis precipitation data by their ensemble usage

This study evaluated the outputs of five precipitation (PCP) datasets: the ECMWF reanalysis 5th generation (ERA5), precipitation estimation from remotely sensed information using artificial neural networks-climate data record (PERSIANN-CDR), Asian precipitation-highly resolved observational data integration toward evaluation (APHRODITE), the National Centers for Environmental Prediction climate forecast system reanalysis (NCEP CFSR), and the Climatic Research Unit (CRU) dataset. The PCP outputs of these datasets were compared with data from nine synoptic stations in Khuzestan province, Iran. APHRODITE agreed best with the observed PCP at most stations (R² > 0.85, root-mean-square error (RMSE) < 17.049 mm, and −4.25 < BIAS < 2.633 mm). However, CRU had the highest critical success index (CSI > 0.711) and the lowest false alarm ratio (FAR < 0.2), and ERA5 had the highest probability of detection (POD > 0.967) at most stations. The PCP outputs of the reanalysis (ERA5), interpolated (APHRODITE, NCEP CFSR, and CRU), and satellite (PERSIANN-CDR) datasets were then combined using multivariate adaptive regression splines (MARS) models to reduce the PCP estimation error. The RMSE decreased at all stations except Ahvaz, and the BIAS also decreased. These results indicate that combining datasets in an ensemble is a suitable way to reduce the error and increase the accuracy of PCP estimates.


Introduction
PCP is an important climatic variable with large temporal and spatial variability, and it has significant effects on human life and the environment. Hence, PCP must be measured at many locations, and accurate PCP data are essential for water resource planning and management. Although synoptic stations provide PCP data, topographic and economic constraints limit their spatial coverage.
PCP data can be obtained from (1) in situ measurements, (2) remote sensing, (3) numerical model outputs, and (4) a combination of these methods using data assimilation techniques (Hosseini-Moghari et al. 2018). PCP estimates from satellites and the gridded data produced by climate data centers can provide good spatial coverage of PCP over large regions.
Several researchers have studied the performance of PCP data from different approaches. Tanarhte et al. (2012) compared and evaluated temperature and PCP datasets against observations in eight regions representing distinct climate regimes in the Mediterranean and the Middle East for the period 1961-2000. Their results showed that, for the Middle East region, with its relatively high density of synoptic stations, APHRODITE outputs had low correlation and similarity with the other datasets. Awange et al. (2019) evaluated the monthly multi-source weighted-ensemble precipitation (MSWEP) product, which combines satellite data, synoptic station data, and reanalysis outputs, over Australia and Africa during 1981-2016. They found good correlations over most of Australia except in regions with monsoon rainfall, and they showed that in Africa this product has no advantage over satellite or reanalysis data. Lockhoff et al. (2019) evaluated several PCP products (one synoptic station dataset, three satellite-based datasets, and two reanalysis-based datasets) in western and central Europe. Their results showed that the quality of the datasets depends largely on region, season, and PCP characteristics, and that the satellite and reanalysis PCP outputs were inaccurate, with overestimations of more than 40%. Khoshchehreh et al. (2020) evaluated three PCP products in a heterogeneous, data-scarce basin in Iran and found that ERA-Interim performed best among them. Bai et al. (2020) studied 12 PCP products over the Qinghai-Tibet Plateau at multiple temporal scales, including raw retrievals, products blended with rain gauge data, and multi-source blended products.
Their results showed that PERSIANN (PCP estimation from remotely sensed information using artificial neural networks) performed poorly and that multi-source blended PCP products performed better than the others. Fallah et al. (2020) evaluated several PCP products against monthly observations from 155 rain gauges in the Karun watershed in southwestern Iran during 2000-2015, considering gauge-interpolated, multi-source, and reanalysis datasets. They found that most datasets significantly underestimate PCP, with the underestimation usually smallest at low altitudes and increasing toward mountainous areas. Izadi et al. (2021) compared ERA5 PCP with synoptic station observations in nine PCP zones of Iran during 2000-2018. In southwestern Iran, which includes both a mountainous area and a flat region, ERA5 and observed PCP had a correlation greater than 0.55 and a mean MAE of 0.74 mm.
Global precipitation databases offer adequate spatiotemporal resolution and the long time series required for climatic studies. However, because their accuracy varies from region to region, the results of each database, or of a combination of databases, must be evaluated comprehensively in each region (Tanarhte et al. 2012). This need is greater in regions with few synoptic stations, such as Iran. In addition, despite extensive research worldwide, no PCP dataset has been agreed upon as the most authoritative, and a sufficiently dense observation network is difficult to establish on a global scale. Thus, such evaluations are best conducted at the local scale.
Since Khuzestan province is the largest producer of agricultural and livestock products in Iran, correctly estimating PCP, the main source of water for the sustainable development of this region, is of great importance. The objectives of this research are (1) to evaluate the accuracy of the reanalysis (ERA5), satellite (PERSIANN-CDR), and interpolated (APHRODITE, NCEP CFSR, and CRU) PCP outputs; (2) to determine the performance of the datasets in plain, mountainous, and coastal regions; and (3) to improve on the individual datasets by combining them in an ensemble using the MARS method. As the use of satellite, reanalysis, and interpolated data for PCP estimation has grown, a novelty of this work is the ensemble use of these databases, which has not been addressed in the literature.

Case study
Khuzestan province (area 64,000 km²) is located in southwestern Iran between latitudes 29° 58′ and 33° 4′ N and longitudes 47° 41′ and 50° 39′ E, and occupies about 4% of the total area of Iran. The mean annual precipitation in the province is 248.3 mm, with 39 rainy days per year (Fig. 1). This study used data from nine synoptic stations (Table 1), obtained from the Iranian Meteorological Organization.

PCP datasets
In this study, PCP values were obtained from satellite, interpolated, and reanalysis datasets, which are listed in Table 2. These datasets are appropriate for arid and semiarid regions such as Iran. The CRU, NCEP CFSR, PERSIANN-CDR, and ERA5 datasets have global coverage and can be applied in different regions of the world, while the APHRODITE dataset covers Asia (including Iran).

Performance criteria
Two groups of indices were used as performance criteria in this research (Table 3). The first group consists of the Nash-Sutcliffe efficiency (NSE), root-mean-square error (RMSE), normalized RMSE (NRMSE), bias (BIAS), mean absolute error (MAE), and squared correlation coefficient (R²). The NSE and R² describe how well the dataset outputs reproduce the variation of the observed PCP, while RMSE, NRMSE, BIAS, and MAE quantify the magnitude of the estimation error. The second group, known as classified statistical indices, consists of the critical success index (CSI), probability of detection (POD), and false alarm ratio (FAR); these indices evaluate whether a dataset correctly detects the occurrence or nonoccurrence of PCP. Because PCP is intermittent (it occurs on some days and not on others), these categorical indices are essential. In Table 3, N is the number of data, i is the day, and O_i and M_i are the observed and dataset-estimated values. H, J, and F denote the number of rainy days detected by the dataset, the number of rainy days it failed to detect, and the number of days on which it falsely detected rain, respectively. POD measures the ability of a dataset to correctly detect PCP occurrence; it ranges from 0 to 1, and values close to 1 indicate better performance. CSI is the fraction of correctly detected PCP occurrences among all days on which PCP was either observed or estimated. FAR is the fraction of estimated rainy days on which no PCP was observed; the smaller this index, the better the dataset. Friedman (1991) introduced the multivariate adaptive regression splines (MARS) method, which is widely used in data mining and model building.
MARS may be considered an extension of linear stepwise regression or a modified form of the regression tree, and it yields an explicit estimation function. The method is built from basis functions, each of which is either a linear spline (hinge) function or the product of two or more such functions, representing interactions between variables. The space of explanatory variables is divided into separate regions at knots, each chosen to give the largest reduction in the mean squared error. The model is fitted in two steps. In the forward pass, a large number of basis functions with different knots are added to the model one after another, producing a complex model that overfits. In the backward pass, the basis functions that contribute least to the estimation are removed. Finally, the best model is selected by minimizing the generalized cross-validation (GCV) criterion (Friedman 1991). MARS is a nonparametric regression model that performs well in simulating nonlinear phenomena such as PCP, and it is computationally fast. It has shown high accuracy in simulating PCP (Abraham et al. 2001), flow discharge, suspended sediment load (Esmaeili-Gisavandani et al. 2022), and floods (Msilini et al. 2020). Figure 2 illustrates the flowchart of the research methodology.
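To make the forward and backward passes concrete, the following is a minimal NumPy sketch of a simplified, additive MARS (reflected hinge pairs only, no interaction products). It is not the implementation used in this study. The GCV form, in which each knot (one per hinge pair) is charged `penalty` extra parameters, the penalty value of 3, the `max_terms` limit, and all function names are assumptions for illustration. In the setting of this study, the columns of `X` would be the monthly PCP outputs of the five datasets at one station and `y` the observed monthly PCP at that station.

```python
import numpy as np


def hinge(x, knot, sign):
    """Hinge basis function: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return np.maximum(0.0, sign * (x - knot))


def design(X, terms):
    """Design matrix: intercept column plus one hinge column per (variable, knot, sign) term."""
    cols = [np.ones(X.shape[0])] + [hinge(X[:, j], t, s) for j, t, s in terms]
    return np.column_stack(cols)


def fit_ls(B, y):
    """Ordinary least squares; returns coefficients and the residual sum of squares (RSS)."""
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coef, float(np.sum((y - B @ coef) ** 2))


def gcv(rss, n, n_terms, penalty=3.0):
    """GCV with an assumed effective parameter count C = n_terms + penalty * (n_terms - 1) / 2."""
    c = n_terms + penalty * (n_terms - 1) / 2.0
    if c >= n:
        return float("inf")
    return (rss / n) / (1.0 - c / n) ** 2


def forward_pass(X, y, max_terms=11):
    """Greedily add the hinge pair (variable, knot) that most reduces the RSS."""
    terms = []
    while len(terms) + 3 <= max_terms:  # intercept + current terms + one new pair
        best = None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j])[1:-1]:  # interior data values as candidate knots
                trial = terms + [(j, t, 1), (j, t, -1)]
                _, rss = fit_ls(design(X, trial), y)
                if best is None or rss < best[0]:
                    best = (rss, j, t)
        if best is None:
            break
        _, j, t = best
        terms += [(j, t, 1), (j, t, -1)]
    return terms


def backward_pass(X, y, terms, penalty=3.0):
    """Drop one term at a time (the one whose removal raises RSS least); keep the lowest-GCV subset."""
    n = len(y)
    current = list(terms)
    _, rss = fit_ls(design(X, current), y)
    best_terms, best_score = list(current), gcv(rss, n, len(current) + 1, penalty)
    while current:
        rss_k, k = min(
            (fit_ls(design(X, current[:i] + current[i + 1:]), y)[1], i)
            for i in range(len(current))
        )
        current = current[:k] + current[k + 1:]
        score = gcv(rss_k, n, len(current) + 1, penalty)
        if score < best_score:
            best_terms, best_score = list(current), score
    return best_terms


def fit_mars(X, y, max_terms=11, penalty=3.0):
    """Forward pass, GCV-based backward pruning, then a final least-squares fit."""
    terms = backward_pass(X, y, forward_pass(X, y, max_terms), penalty)
    coef, _ = fit_ls(design(X, terms), y)
    return terms, coef


def predict(X, terms, coef):
    """Evaluate the fitted piecewise-linear model."""
    return design(X, terms) @ coef
```

Each retained term is a hinge on one input, so the fitted model is a piecewise-linear equation in the dataset outputs, which is the kind of station-specific equation a MARS ensemble produces. A full MARS implementation would also search products of hinges (interactions) in the forward pass.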

The performance criteria for the best PCP dataset
Table 4 lists the values of the performance criteria at the nine synoptic stations for the best PCP dataset; the complete results for all five datasets are given in the Appendix (Table 6).
Table 3: Performance criteria used in this study.
The R² values of the five datasets range from 0.59 to 0.97 across all stations, which is acceptable. APHRODITE gives the best R² at all stations, ranging from 0.85 to 0.97 (average 0.92). PERSIANN-CDR, ERA5, NCEP CFSR, and CRU are the second-best dataset at 5, 2, 1, and 1 stations, respectively, while NCEP CFSR, CRU, and ERA5 have the lowest R² at 4, 4, and 1 stations, respectively.
The NSE and MAE also indicate that APHRODITE outperformed the other datasets; its NSE ranges from 0.835 to 0.965 across stations (0.9 on average). Based on BIAS, the APHRODITE, PERSIANN-CDR, and CRU outputs are better at most stations, whereas those of the other two datasets are unsatisfactory. ERA5 and PERSIANN-CDR overestimated the observed PCP, whereas APHRODITE produced the lowest PCP values at most stations.
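The continuous criteria behind these comparisons can be computed with a short function such as the one below. Table 3 is not reproduced in this text, so two conventions are assumptions: BIAS is taken as the mean of M_i − O_i (positive values mean overestimation; Table 3 may use the opposite sign), and NRMSE is RMSE divided by the mean observed PCP. The function name `continuous_metrics` is hypothetical.

```python
import numpy as np


def continuous_metrics(obs, est):
    """NSE, RMSE, NRMSE, BIAS, MAE and R² for observed (O_i) and dataset-estimated (M_i) PCP."""
    o = np.asarray(obs, dtype=float)
    m = np.asarray(est, dtype=float)
    err = m - o
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return {
        # 1 minus the ratio of squared errors to the variance of the observations
        "NSE": float(1.0 - np.sum(err ** 2) / np.sum((o - o.mean()) ** 2)),
        "RMSE": rmse,
        "NRMSE": rmse / float(o.mean()),  # assumption: normalized by the mean observed PCP
        "BIAS": float(np.mean(err)),      # assumption: mean(M_i - O_i); positive = overestimation
        "MAE": float(np.mean(np.abs(err))),
        "R2": float(np.corrcoef(o, m)[0, 1] ** 2),  # squared Pearson correlation
    }


print(continuous_metrics([10, 20, 30, 40], [12, 18, 33, 41]))
```

For the toy series in the usage line, the errors are 2, −2, 3, and 1 mm, giving BIAS = 1 mm, MAE = 2 mm, and NSE = 1 − 18/500 = 0.964.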
As mentioned previously, the statistical classification indices POD, CSI, and FAR were also used to evaluate the PCP outputs. Based on CSI, CRU performed well in correctly predicting PCP, with values between 0.71 and 0.82 (0.78 on average) at all stations, whereas PERSIANN-CDR had the lowest CSI at all stations. CRU also had the best FAR at most stations, with an average of 0.16 over all stations, which indicates that it rarely reports rain when none was observed. Hence, this dataset appears suitable for forecasting rainy days, flood warning, and drought monitoring systems. Furthermore, based on POD, ERA5 and APHRODITE performed very well in correctly detecting PCP occurrence.
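The classified indices follow from counting hits (H), misses (J), and false alarms (F), as defined for Table 3. The sketch below uses the standard definitions POD = H/(H + J), FAR = F/(H + F), and CSI = H/(H + J + F). The rain/no-rain threshold of 0 mm is an assumption (the text does not state one), and the function name is hypothetical.

```python
import numpy as np


def contingency_scores(obs, est, threshold=0.0):
    """POD, FAR and CSI from hits (H), misses (J) and false alarms (F) of a rain/no-rain split."""
    o = np.asarray(obs, dtype=float) > threshold  # observed PCP occurrence
    m = np.asarray(est, dtype=float) > threshold  # estimated PCP occurrence (assumed threshold)
    H = int(np.sum(o & m))    # rainy days detected by the dataset
    J = int(np.sum(o & ~m))   # rainy days missed by the dataset
    F = int(np.sum(~o & m))   # rain reported by the dataset but not observed
    nan = float("nan")
    return {
        "H": H, "J": J, "F": F,
        "POD": H / (H + J) if H + J else nan,
        "FAR": F / (H + F) if H + F else nan,
        "CSI": H / (H + J + F) if H + J + F else nan,
    }


print(contingency_scores([0, 5, 3, 0, 2, 0], [1, 4, 0, 0, 3, 0]))
```

In the usage line there are 2 hits, 1 miss, and 1 false alarm, so POD = 2/3, FAR = 1/3, and CSI = 0.5. Because CSI penalizes both misses and false alarms, a dataset with a high POD can still have a modest CSI.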
The performance of the five datasets at the nine synoptic stations is also summarized in the Taylor diagrams (Taylor 2001) shown in Fig. 3.
A Taylor diagram displays three performance criteria: the Pearson correlation coefficient, the standard deviation, and the root-mean-square deviation (RMSD). Each point in the diagram represents one dataset, and the dataset whose point lies closest to the reference point of the observed PCP performs best.
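A single point can carry all three statistics because, when RMSD is the centered root-mean-square difference (each series' mean removed), they satisfy the law of cosines RMSD² = σ_o² + σ_m² − 2σ_oσ_mR (Taylor 2001). The sketch below computes the statistics for one station and dataset and makes that identity checkable; population standard deviations are assumed, and the function name is hypothetical.

```python
import numpy as np


def taylor_statistics(obs, est):
    """Return (std of obs, std of estimates, Pearson r, centered RMS difference)."""
    o = np.asarray(obs, dtype=float)
    m = np.asarray(est, dtype=float)
    sd_o, sd_m = float(o.std()), float(m.std())  # population standard deviations (ddof=0)
    r = float(np.corrcoef(o, m)[0, 1])
    # Centered RMSD: RMS difference after removing each series' mean, so bias is excluded
    crmsd = float(np.sqrt(np.mean(((m - m.mean()) - (o - o.mean())) ** 2)))
    return sd_o, sd_m, r, crmsd


sd_o, sd_m, r, e = taylor_statistics([10, 20, 30, 40, 25], [12, 18, 33, 41, 20])
# Law-of-cosines check: e**2 equals sd_o**2 + sd_m**2 - 2*sd_o*sd_m*r up to rounding
print(e ** 2, sd_o ** 2 + sd_m ** 2 - 2 * sd_o * sd_m * r)
```

Because the centered RMSD ignores the mean offset, a dataset can sit close to the observed point on a Taylor diagram and still have a large BIAS, which is why BIAS is reported separately.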
According to the Taylor diagrams for each station, APHRODITE performs best, with the smallest RMSD at all stations except Safiabad. In general, CRU and PERSIANN-CDR are the next most suitable datasets, and ERA5 performs worst.

Using the MARS method to combine and improve the PCP outputs
To obtain better PCP estimates, the PCP outputs of the five datasets were combined using the MARS method, which yields an equation for PCP at each synoptic station. Because each station has different climatic and topographic features, a separate equation must be fitted for each station; these equations are given in the Appendix (Table 7). The results are shown in Table 5. Overall, the MARS model produced better results than any of the five individual datasets.
As Table 5 shows, the R² values of the MARS method were lower than those of the APHRODITE dataset, but they remained acceptable (no less than 0.86). The largest reduction occurred at Ahvaz, and the mean R² over all stations was 0.96 for APHRODITE and 0.93 for MARS.
Comparing the RMSE of the APHRODITE dataset with that of the hybrid MARS method shows that MARS reduced the RMSE. At Abadan, for example, the RMSE decreased from 11.3 mm (APHRODITE) to 5.98 mm (MARS). The RMSE increased slightly only at Ahvaz; overall, it decreased by an average of 2.44 mm (20%) across stations. The largest RMSE improvements of the hybrid MARS model were obtained in the coastal areas of the Persian Gulf, at the Abadan and Bandar-Mahshahr stations (46% and 42%, respectively, relative to APHRODITE).
With the MARS method, the MAE decreased at all stations except Omidiyeh, Ahvaz, and Safiabad, where it increased by 2.3%, 2.1%, and 8%, respectively. Overall, the MAE improved by an average of 0.62 mm (11%) relative to the APHRODITE dataset. The largest MAE improvements of the hybrid MARS method were again obtained in the coastal areas of the Persian Gulf, at the Abadan and Bandar-Mahshahr stations (36% and 29%, respectively, relative to APHRODITE).
With respect to BIAS, the MARS method outperformed the APHRODITE dataset at all stations, with values close to zero at many stations. At Ramhormoz, for example, the BIAS decreased from 2.63 mm (APHRODITE) to 0.000057 mm (MARS).
Taken together, these results indicate that the hybrid MARS method performed better than the APHRODITE dataset.

Discussion
Water management is especially important in semiarid regions. This study comprehensively analyzed the reliability of five PCP datasets at the monthly scale in Khuzestan province, Iran. Because observed PCP data are scarce, it explored the use of reanalysis, satellite, and interpolated datasets for PCP estimation and, to increase accuracy, combined the outputs of the five datasets using the MARS method.
The results at all synoptic stations generally indicate high accuracy of the APHRODITE PCP outputs; for example, its R² exceeds 0.88 at all stations. Comparing gauge records with the estimates of the five datasets shows that APHRODITE tracks the variations of observed rainfall well. Although APHRODITE is a reliable PCP dataset, it has a major drawback: it does not provide PCP estimates after 2007 for the Middle East (Iran and some other areas of Asia). Other researchers have also reported good performance of the APHRODITE dataset (Darand and Khandu 2020; Ghajarnia et al. 2015; Shayeghi et al. 2020). However, Eini et al. (2019) found that the CRU and NCEP CFSR datasets performed better than APHRODITE. Saemian et al. (2021) evaluated PCP datasets in Iran and found R² values above 0.83, similar to the results of this study. The high spatial resolution and Asia-specific development of APHRODITE explain its better results compared with the other datasets.
The results indicate that accuracy depends directly on station topography: R² values at higher-elevation stations, such as Masjed Soleiman and Behbahan, are higher than at the other stations. This has also been reported by other researchers (Darand and Khandu 2020) and underlines the importance of considering orographic and geographic effects on the quality of PCP data (Fallah et al. 2020). Topography, regional climate, and model resolution are among the factors that affect the performance of PCP datasets.
This study found that ERA5 PCP outputs were unsatisfactory at most stations. In contrast, other researchers have reported acceptable performance for ERA5 (Khoshchehreh et al. 2020; Saemian et al. 2021; Taghizadeh et al. 2021).
According to BIAS, APHRODITE underestimated PCP at most stations, consistent with Bhattacharyya et al. (2022) but contrary to Ghajarnia et al. (2015). The largest underestimations by CRU occurred at the higher-elevation stations (Masjed Soleiman and Behbahan). Other researchers have reported overestimation by ERA5 (Chen et al. 2021; Fallah et al. 2020; Izadi et al. 2021), while both overestimation (Wang et al. 2020) and underestimation (Fallah et al. 2020) have been reported for PERSIANN-CDR. Table 4 shows that CRU has the highest CSI, followed by NCEP CFSR and APHRODITE, while PERSIANN-CDR, the satellite dataset, has the lowest.
These results suggest that satellite PCP datasets such as PERSIANN-CDR need substantial adjustment in Khuzestan province, consistent with the findings of Ghajarnia et al. (2015) for Iran. The POD of all five datasets exceeded 90 percent, so the statistical classification indices indicate that all the datasets are reliable for detecting rainfall occurrence.
The Taylor diagram analysis shows the superiority of APHRODITE over the other datasets. PERSIANN-CDR has a good correlation and standard deviation but an unsuitable RMSD, in agreement with Ghajarnia et al. (2015). Although PERSIANN-CDR, CRU, and NCEP CFSR have almost equal RMSD values, PERSIANN-CDR and CRU have standard deviations closer to that of the observed data and are therefore better than NCEP CFSR for PCP estimation.
This study used the MARS method to combine the outputs of the five PCP datasets, and the hybrid models produced better results than the individual datasets. MARS divides the input domain into intervals at knots and fits a linear (hinge) function in each, which allows it to represent nonlinear relationships and improves its results. The fitted equations can be used to estimate PCP reliably when observed data are unavailable.

Conclusion
In general, PCP datasets are divided into interpolated, reanalysis, and satellite datasets. This study investigated the PCP outputs of CRU, NCEP CFSR, and APHRODITE as interpolated datasets, ERA5 as a reanalysis dataset, and PERSIANN-CDR as a satellite dataset, and compared them with observed PCP at nine synoptic stations with diverse geographical and climatic conditions. It also introduced a way to combine the outputs of the five datasets to increase their accuracy. According to most performance criteria, APHRODITE agreed best with the observed PCP in Khuzestan province. Hence, APHRODITE, which is based on interpolating terrestrial PCP data, performs best, followed by PERSIANN-CDR and then the other interpolated and reanalysis datasets.
The accuracy of the results depends directly on topography: stations at higher elevations produced better results.
The MARS method was then used to combine the outputs of the five PCP datasets to obtain better results. Hybrid methods combine the strengths of interpolated datasets, which agree well with the observed PCP, with those of reanalysis and satellite datasets; for example, the ability of ERA5 to detect PCP occurrence can improve the accuracy of the hybrid method. Hybrid methods should also be advantageous in areas without observed PCP data, since they draw on other datasets in addition to interpolated ones. The MARS method reduced the RMSE and MAE at all stations by, on average, 20% and 11% relative to the APHRODITE dataset, and the BIAS improved from 1.75 mm to 0.03 mm. Combining the datasets with MARS performed particularly well in the coastal Persian Gulf region, at the Abadan and Bandar-Mahshahr synoptic stations.
In conclusion, hybrid model PCP data can be used with high accuracy and confidence where observed data are difficult to obtain.
Authors' contributions MJ, AA and ML contributed to the preparation of this manuscript.
Funding The author(s) received no specific funding for this work.
Availability of data and materials All data, models, and code are available from the corresponding author by request.

Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval
The manuscript is an original work with its own merit, has not been previously published in whole or in part, and is not being considered for publication elsewhere.