1 Introduction

Accurate precipitation measurements are a valuable resource for the scientific community, as they can be used in hydrological models for various purposes, e.g., prediction of floods and droughts [1,2,3,4]. However, precipitation measurements from ground stations do not provide sufficient coverage: many basins have few or no precipitation measurements, which constrains the performance of hydrological models and their related applications.

To overcome this limitation, many researchers have used interpolated gauge-based precipitation products in their hydrological studies. Lim Kam Sian et al. [5] studied the seasonal precipitation modes over Africa using daily precipitation data (1° × 1° spatial resolution) from the Global Precipitation Climatology Centre (GPCC) dataset, while Moazzam et al. [6] investigated the impact of climate change on snow cover over Pakistan, utilizing monthly rainfall and temperature data (0.5° × 0.5° spatial resolution) from the gridded Climatic Research Unit (CRU) dataset. Even though these products are based on gauge data, they carry some uncertainty due to the sparsity of the gauge network, the interpolation method with which they were produced, and the orographic variability [7,8,9]. Mallakpour et al. [8], on the other hand, used six different daily gridded precipitation datasets to study the rainfall characteristics over the United States. Two of them were solely gauge-based, namely, the Climate Prediction Center (CPC) and the Daily Surface Weather and Climatological Summaries (DAYMET) datasets; one was produced by interpolating gauge and remote sensing data, the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) dataset; and the rest were produced by interpolating gauge, remote sensing, and reanalysis data, namely, the Multi-Source Weighted-Ensemble Precipitation (MSWEP), the Climate Hazards Group Infrared Precipitation with Stations (CHIRPS), and the Modern-Era Retrospective analysis for Research and Applications (MERRA) datasets. According to Timmermans et al. [9], the uncertainty in these datasets derives from the source data and the assimilation method used, even though they offer high spatial and temporal resolution.

On the other hand, remote sensing precipitation estimates, with their high spatial and temporal coverage, provide an alternative when measurements from gauges are sparse or not available. The Tropical Rainfall Measuring Mission (TRMM) (1997–2014) was the first satellite mission focused on estimating rainfall from space [10]. Its successor, the Global Precipitation Measurement (GPM) mission, provides the Integrated Multi-satellite Retrievals for GPM (IMERG), which combine data from numerous Passive Microwave (PMW) and Infrared (IR) imagers/sounders, precipitation radars, and gauges. IMERG has drawn the attention of researchers owing to its improved temporal resolution (approximately every 30 min), latency (minimum latency 4 h), and reliability [11, 12]. The drawback of IMERG GPM precipitation estimates, though, is their coarse spatial resolution (0.1°). For this reason, many researchers have applied various statistical downscaling techniques to improve the spatial resolution and enhance the information content [13, 14].

Statistical downscaling techniques are based on the correlations between a predictand and one or more predictors. Their performance depends on the correlated variables and their statistical relation. Many different variables that are statistically correlated with precipitation have been used, e.g., the Normalized Difference Vegetation Index (NDVI) and Land Surface Temperature (LST), for the spatial downscaling of the initial precipitation estimates [15, 16], with most applications focusing on monthly or annual precipitation estimates. For downscaling daily precipitation, cloud properties from the Moderate Resolution Imaging Spectroradiometer (MODIS), onboard the Aqua satellite, have also been used as predictors, namely, cloud optical thickness (COT), cloud effective radius (CER), and cloud water path (CWP). COT is a measure of how much light is attenuated by cloud droplets as it travels through the atmosphere and thus describes the cloud thickness, while CER denotes the size of the cloud droplets. In addition, CWP is the total amount of water in an air column inside the cloud per unit area. Many previous studies have underlined their inter-connection with precipitation [17, 18]. For example, Zhao et al. [19] found that polluted clouds with high COT can cause an increase in CER, triggering collision/coalescence processes and the initiation of precipitation, affecting CWP.

Different algorithms and approaches have been utilized for the statistical downscaling of the initial precipitation estimates, including Artificial Neural Network (ANN) [20], Support Vector Machine (SVM) [21], Random Forest (RF) [22], Multivariate Linear Regression (MLR) [18], Geographically Weighted Regression (GWR) [23], and Deep Learning (DL) methods [24].

In this study, we investigated the performance of the MLR statistical method with a residual correction algorithm, proposed by Sharifi et al. [18], for spatially downscaling daily IMERG precipitation estimates to a finer resolution, utilizing daily cloud properties from Aqua/MODIS over a region in northeastern Greece, for the period 01/2015–11/2022.

2 Study Area

Our study area is a region in northeastern Greece (40.55–41.35° N, 24.55–25.85° E) with a temperate climate and precipitation of approximately 400–600 mm/year [25, 26] (Fig. 1). The precipitation pattern is typical, with more rain during the autumn and winter seasons and less during the summer, which is influenced by the topography and associated with the westerlies that enter the Mediterranean from the Atlantic Ocean [26, 27]. The complex terrain with mountains, plains, sea, and islands and the wide range of precipitation totals over the region make it an ideal region for investigating the performance of this statistical downscaling method [28].

Fig. 1 The study area with the location of the 6 rain gauges used (red dots)

3 Data and Methods

3.1 GPM–IMERG Precipitation Product

The IMERG algorithm integrates numerous multi-satellite retrievals from various bands (infrared (IR) and microwave (MW)) and gauge observations, providing remote sensing precipitation estimates at a spatial resolution of 0.1° × 0.1°, between 60° S–60° N, at a temporal resolution of different time scales (half-hourly, 3-hourly, daily, and monthly). It provides 3 core half-hourly products, namely, the IMERG Early Run (IMERG-E), the IMERG Late Run (IMERG-L), and the IMERG Final Run (IMERG-F) products with approximately 4 h, 18 h, and 4 months latency after observation time, respectively. The IMERG-F product, being adjusted based on the Global Precipitation Climatology Centre (GPCC) analysis, was found to have higher accuracy than the other two products, and thus, it is recommended for research purposes [29]. On the other hand, the IMERG-L product, with its 18 h latency availability and high accuracy, is the ideal precipitation product for near real-time applications. All the GPM IMERG data are available freely via the Goddard Earth Sciences Data and Information Services Center (GES DISC) website (https://disc.gsfc.nasa.gov/). More details about the algorithm and derivation of IMERG products can be found in Huffman et al. [12, 29, 30].

In our investigation, the daily IMERG-L precipitation estimates from the GPM era (Level 3, v06, 0.1° × 0.1°) were utilized, for the period 01/2015-11/2022, in order to evaluate the validity of the methodology presented in Sharifi et al. [18] in a Mediterranean region (Fig. 2a).

Fig. 2 (a) IMERG Late Run precipitation estimates (mm), (b) MODIS cloud optical thickness (COT), (c) MODIS cloud effective radius (CER, µm), and (d) MODIS cloud water path (CWP, g/m²), on March 6, 2015, over the study area. The white areas indicate no data

3.2 MODIS Cloud Products

MODIS, onboard the Aqua satellite, is a passive remote sensing sensor with a swath of approximately 2300 km. It covers the entire globe daily, measuring the reflected solar and terrestrial radiation in 36 bands [31], with an overpass time over the study area at about 13:30 local time (LT). Products are offered at various temporal (daily, 8-day, and monthly) and spatial resolutions (~1, ~3, ~5, ~10, and ~100 km). The Level 2 (L2) products have a spatial resolution of 1, 3, and 5–10 km, while the Level 3 (L3) products (spatiotemporally aggregated from the L2 products) have a 1° spatial resolution [32]. Cloud products are created using an optical/microphysical algorithm that takes into consideration various visible, IR, and thermal MODIS channels in conjunction with theoretical model calculations [33]. More information about MODIS cloud products can be found in Levy et al. [31,32,33]. MODIS daily L2 and L3 products are provided by the Level-1 and Atmosphere Archive and Distribution System Distributed Active Archive Center (LAADS DAAC) website (https://ladsweb.modaps.eosdis.nasa.gov).

For our analysis, MODIS cloud optical thickness (COT), cloud effective radius (CER), and cloud water path (CWP) daily products (L2, v06.1, 0.01° × 0.01°) were used for the period 01/2015-11/2022 (Fig. 2b–d).

3.3 Rain Gauge Network

Precipitation data from 6 rain gauge stations for the period 01/01/2015–30/11/2022 were employed to validate the final downscaled precipitation estimates (Fig. 1). These stations are situated at varying elevations, so that the efficacy of our methodology can be assessed across diverse geographical settings. The characteristics of the stations are summarized in Table 1.

Table 1 Characteristics of the 6 meteorological stations used in this study

3.4 Multivariate Linear Regression Downscaling Method

The process of downscaling coarse-resolution remote sensing data using the MLR method has attracted the attention of the scientific community for many years. As far as the downscaling of satellite-derived precipitation estimates is concerned, many different approaches have been proposed, incorporating different predictors, e.g., elevation, NDVI, and LST [34, 35]. The implementation of this method is based on the assumption that if there is a statistical relationship between the predictand and the predictor(s) at a coarse scale, it will also be retained at a finer scale.

In our study, the MLR method was applied to the IMERG precipitation estimates using three cloud properties from the Aqua/MODIS sensor as predictors, namely, the cloud optical thickness (COT), the cloud effective radius (CER), and the cloud water path (CWP), similar to the methodology presented in Sharifi et al. [18], but for all rain events, re-applying the MLR method for each daily dataset. Our dataset comprised 2870 values for the period 01/01/2015–30/11/2022.

First, the cloud properties from MODIS were resampled, using bilinear interpolation, to match the IMERG data’s spatial resolution (from 0.01 to 0.1°). Then, to gain a first insight into the relation between the variables, we performed a statistical analysis on the data (Fig. S1, Supplementary Material). We found a weak, positive correlation between IMERG and each of the parameters (R2 = 0.14 for IMERG–COT, R2 = 0.04 for IMERG–CER, and R2 = 0.12 for IMERG–CWP), which is statistically significant at the 95% confidence level. Afterward, the statistical relationship between the predictand (IMERG) and the three predictors (COT, CER, and CWP) was investigated, calculating the intercept value and the regression coefficients for each day. The Predicted Precipitation at 0.1° spatial resolution was calculated, using the following equation.

$$\text{PP}_{i}=\alpha_{i}+x_{1i}\ast\text{COT}+x_{2i}\ast\text{CER}+x_{3i}\ast\text{CWP}$$
(1)

where PPi refers to the Predicted Precipitation at day i, αi to the intercept value at day i, and x1i, x2i, and x3i to the estimated regression coefficients of COT, CER, and CWP at day i, respectively [18]. To find the difference between the IMERG precipitation estimates and the PP values for each day i, their residuals were computed, according to Eq. 2.

$$\text{Residuals}_{i}=\text{IMERG}_{i}-\text{PP}_{i}$$
(2)
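As an illustration of Eqs. 1 and 2, the daily fit can be sketched with ordinary least squares in NumPy. This is a minimal sketch under stated assumptions, not the code used in the study: the function name `fit_daily_mlr`, the use of NaN to mark missing pixels, and the choice of `numpy.linalg.lstsq` as the solver are all assumptions introduced here.

```python
import numpy as np

def fit_daily_mlr(imerg, cot, cer, cwp):
    """Fit Eq. 1 for one day and return the coefficients and the Eq. 2 residuals.

    All inputs are 2-D arrays on the same coarse (0.1 degree) grid.
    Assumption: missing data are encoded as NaN.
    """
    # Use only pixels where the predictand and all three predictors are available.
    valid = np.isfinite(imerg) & np.isfinite(cot) & np.isfinite(cer) & np.isfinite(cwp)
    if valid.sum() < 4:
        raise ValueError("at least 4 valid pixels are needed to fit 4 parameters")

    # Design matrix: a column of ones for the intercept, then COT, CER, CWP.
    design = np.column_stack([np.ones(valid.sum()), cot[valid], cer[valid], cwp[valid]])
    coeffs, *_ = np.linalg.lstsq(design, imerg[valid], rcond=None)

    # Eq. 1: Predicted Precipitation on the coarse grid (NaN where a predictor is missing).
    pp = coeffs[0] + coeffs[1] * cot + coeffs[2] * cer + coeffs[3] * cwp
    # Eq. 2: residuals between IMERG and the prediction.
    residuals = imerg - pp
    return coeffs, residuals
```

The function is called once per day, so each day gets its own intercept and regression coefficients, as described above.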

Finally, the calculated residuals were resampled to 0.01° × 0.01° spatial resolution, using bilinear interpolation.

The PP at 0.01° spatial resolution was also estimated, using Eq. 1 but with COT, CER, and CWP variables, at their initial 0.01° spatial resolution. The calculated residuals were added to the PP at 0.01° spatial resolution, for the estimation of the final downscaled IMERG precipitation product.
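In the notation of Eqs. 1 and 2, this last step can be written compactly as follows. The superscript (0.01), which marks quantities on the fine grid, and the operator 𝓑, which stands for the bilinear resampling of the residuals from 0.1° to 0.01°, are notational assumptions introduced here for illustration and do not appear in Sharifi et al. [18].

$$\text{PP}^{(0.01)}_{i}=\alpha_{i}+x_{1i}\ast\text{COT}^{(0.01)}+x_{2i}\ast\text{CER}^{(0.01)}+x_{3i}\ast\text{CWP}^{(0.01)}$$

$$\text{IMERG}^{(0.01)}_{i}=\text{PP}^{(0.01)}_{i}+\mathcal{B}\left(\text{Residuals}_{i}\right)$$

The intercept and the regression coefficients are those fitted at 0.1° for day i. Only the predictor fields and the residuals change resolution.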

3.5 MLR Method Application

In order to downscale the IMERG precipitation estimates from 0.1 to 0.01° spatial resolution, the following steps were applied.

  1) The CER, COT, and CWP variables were resampled to 0.1° spatial resolution, using the bilinear interpolation method.

  2) The statistical relationship between the IMERG precipitation estimates and the COT, CER, and CWP variables for each day was found, and the parameters αi, x1i, x2i, and x3i were calculated.

  3) By using the COT, CER, and CWP variables at 0.1° spatial resolution, and the intercept values and regression coefficients for each day that were calculated in step 2, Eq. 1 was utilized to find the PP variables at 0.1° spatial resolution for each day.

  4) The residuals between the IMERG precipitation estimates and the PP variables for each day were calculated using Eq. 2 and resampled to 0.01° spatial resolution.

  5) By using the COT, CER, and CWP variables at 0.01° spatial resolution, and the intercept values and regression coefficients for each day that were calculated in step 2, the PP variables for each day at 0.01° spatial resolution were found using Eq. 1.

  6) Finally, we added the residuals (step 4) to the respective calculated PP variables (step 5), in order to get the final downscaled precipitation estimates.
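The resampling and assembly steps (1 and 3 to 6) can be sketched as below, taking the daily intercept and coefficients of step 2 as given. This is a hedged illustration rather than the study's implementation: the function names, the hand-written bilinear sampler, the clamping of fine pixels that lie outside the outermost coarse cell centres, and the representation of each grid by ascending 1-D arrays of cell-centre coordinates are all assumptions made here.

```python
import numpy as np

def bilinear_resample(grid, src_y, src_x, dst_y, dst_x):
    """Bilinearly sample `grid` (values at ascending centres src_y, src_x) at dst_y x dst_x.

    Targets outside the source centres are clamped to the edge value.
    NaN in any of the four neighbouring source pixels propagates to the output.
    """
    # Fractional source index of every target coordinate (np.interp clamps at the ends).
    fy = np.interp(dst_y, src_y, np.arange(src_y.size))
    fx = np.interp(dst_x, src_x, np.arange(src_x.size))
    y0 = np.clip(np.floor(fy).astype(int), 0, src_y.size - 2)
    x0 = np.clip(np.floor(fx).astype(int), 0, src_x.size - 2)
    wy = (fy - y0)[:, None]  # weight of the lower row
    wx = (fx - x0)[None, :]  # weight of the right column
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x0 + 1] * wx
    bottom = grid[y0 + 1][:, x0] * (1 - wx) + grid[y0 + 1][:, x0 + 1] * wx
    return top * (1 - wy) + bottom * wy

def predicted_precipitation(coef, cot, cer, cwp):
    """Eq. 1 with coef = (intercept, x1, x2, x3)."""
    a, x1, x2, x3 = coef
    return a + x1 * cot + x2 * cer + x3 * cwp

def downscale_day(imerg_c, cot_f, cer_f, cwp_f, coef, coarse_axes, fine_axes):
    """Return the downscaled field for one day, given the step-2 coefficients."""
    (cy, cx), (fy, fx) = coarse_axes, fine_axes
    # Step 1: predictors from the fine grid to the coarse grid.
    cot_c, cer_c, cwp_c = (bilinear_resample(p, fy, fx, cy, cx)
                           for p in (cot_f, cer_f, cwp_f))
    # Step 3: Predicted Precipitation on the coarse grid.
    pp_c = predicted_precipitation(coef, cot_c, cer_c, cwp_c)
    # Step 4: residuals (Eq. 2), resampled to the fine grid.
    resid_f = bilinear_resample(imerg_c - pp_c, cy, cx, fy, fx)
    # Step 5: Predicted Precipitation on the fine grid, same coefficients.
    pp_f = predicted_precipitation(coef, cot_f, cer_f, cwp_f)
    # Step 6: residual correction.
    return pp_f + resid_f
```

For the study area, the coarse axes would hold 8 × 13 cell centres and the fine axes 80 × 130, assuming grids aligned to the bounding box given in Section 2.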

The methodology followed is summarized in Fig. 3.

Fig. 3 Methodology flowchart

4 Results

4.1 MLR Application

Since the IMERG precipitation estimates and the MODIS parameters have different spatial resolutions, a bilinear interpolation was implemented to match the resolution of COT, CER, and CWP with that of the IMERG product (0.1°). Then, the MLR method was performed and the residuals were calculated. The results for the 6th of March 2015 are presented in Fig. 4 as an example. The resampled residuals at 0.01° spatial resolution were found to have a spatial distribution similar to that of the original IMERG precipitation estimates (IMERG(0.1)) (Fig. 4a), even though there was a lack of MODIS data over the southeastern part of the region on that day. In addition, in some cases, negative residuals were observed, an indication of the greater influence of some variables over certain areas (Fig. 4b) [18]. After applying the MLR method using the high-resolution MODIS parameters, the obtained Predicted Precipitation at 0.01° spatial resolution (PP(0.01), Fig. 4c) was found to be lower than the IMERG(0.1) but had the same spatial distribution in most of the cases. Finally, the downscaled precipitation (IMERG(0.01)) was calculated, after applying the residual correction to the PP(0.01) data (Fig. 4d). The unavailability of MODIS data resulted in some missing data at the southernmost part of the study area. Nevertheless, the spatial distribution of the initial IMERG(0.1) data was preserved in the IMERG(0.01) throughout the period examined.

Fig. 4 (a) Original IMERG precipitation estimates at 0.1° spatial resolution (mm), IMERG(0.1), (b) residuals at 0.01° spatial resolution (mm), (c) Predicted Precipitation at 0.01° spatial resolution (mm), PP(0.01), and (d) final downscaled IMERG precipitation (mm), IMERG(0.01), on March 6, 2015, over the study region. The white areas indicate no data

4.2 MLR Evaluation

To investigate the performance of the MLR method and specifically the effect of the residual correction on the PP(0.01) data, a statistical analysis was conducted. The relation between the initial IMERG precipitation estimates and the predicted precipitation at 0.01° spatial resolution, before and after the residual correction, was examined over the 6 sites (rain gauge stations), namely, Avdira, Komotini, Kimmeria, Kosmio, Kerasia, and Nimfaia. Table 2 presents the results for the study period, comparing the initial IMERG precipitation estimates (IMERG(0.1)) with the predicted precipitation produced by the downscaling algorithm without the residual correction (PP(0.01)) and with the final downscaled IMERG precipitation estimates produced after the residual correction (IMERG(0.01)). Over all sites, the coefficient of determination (R2) between the initial and the final downscaled IMERG precipitation estimates increased significantly after applying the residual correction to the predicted precipitation at 0.01° (up to 37%), with a respective decrease of the Root Mean Square Error (RMSE) (up to 75%), the Normalized Root Mean Square Error (NRMSE) (up to 79%), and the Percent Bias (PB) (up to 98%). The observed improvement is in agreement with similar studies that used analogous methodology to improve the spatial resolution of coarse remote sensing data [18, 25].
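The four scores can be sketched as below. The exact definitions are not given in the text, so the ones used here are assumptions: R2 is taken as the squared Pearson correlation, NRMSE as the RMSE divided by the mean of the reference series, and PB as the summed difference relative to the summed reference, in percent. Other conventions exist (for example, normalizing the RMSE by the range or the standard deviation) and would give different NRMSE values. The function name `evaluation_metrics` is also introduced here.

```python
import numpy as np

def evaluation_metrics(estimate, reference):
    """Return R2, RMSE, NRMSE and PB between two series.

    Assumed definitions (not taken from the paper):
      R2    = squared Pearson correlation coefficient
      NRMSE = RMSE / mean(reference)
      PB    = 100 * sum(estimate - reference) / sum(reference)
    Pairs with a NaN in either series are dropped.
    """
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    ok = np.isfinite(est) & np.isfinite(ref)
    est, ref = est[ok], ref[ok]

    r = np.corrcoef(est, ref)[0, 1]
    rmse = np.sqrt(np.mean((est - ref) ** 2))
    return {
        "R2": r ** 2,
        "RMSE": rmse,
        "NRMSE": rmse / np.mean(ref),
        "PB": 100.0 * np.sum(est - ref) / np.sum(ref),
    }
```

With these definitions, a positive PB indicates that the estimate overestimates the reference in total.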

Table 2 Coefficients of determination (R2), Root Mean Square Errors (RMSE), Normalized Root Mean Square Errors (NRMSE), and Percent Bias (PB) between the initial IMERG precipitation estimates (IMERG(0.1)) and the predicted precipitation at 0.01° spatial resolution before (PP(0.01)) and after the residual correction method (IMERG(0.01)), over six different sites (rain gauge stations), for the period 01/2015–11/2022

4.3 MLR Validation

To validate our results, a statistical analysis was also conducted between the precipitation estimates before (IMERG(0.1)) and after the application of the MLR method with residual correction (IMERG(0.01)) and the observations from our gauge sites (Fig. 5, Figs. S2–S6, Supplementary Material). A slight improvement was found over all our sites, with a respective decrease in the RMSE value. Specifically, over Avdira, an R2 of 0.4 with an RMSE of 8.795, an NRMSE of 4.362, and a PB of 10.661 was found between the IMERG(0.1) data and the rain gauge observations (Fig. 5a). On the other hand, a slightly higher coefficient of determination (R2 = 0.42) with a lower RMSE of 8.363 and NRMSE of 1.871 was observed between the IMERG(0.01) data and the observations (Fig. 5b). In addition, the residuals were closer to the “Zero Line” than before the MLR application, indicating a slightly better agreement with the observations, especially for rain events lower than 10 mm (Fig. 5a, b, bottom plots). It is worth noting that IMERG tends to overestimate precipitation over our region, due to its insufficient calibration by the Global Precipitation Climatology Centre (GPCC) product over Greece, which may explain the low R2 and the higher PB values that were observed in our statistical analysis [36].

Fig. 5 Scatterplots of (a) IMERG(0.1) versus rain gauge observations and (b) IMERG(0.01) versus rain gauge observations after the application of the Multivariate Linear Regression method and the residual correction, with the respective residuals (bottom plots), over the Avdira gauge station, for the period 01/2015–11/2022

The overall performance was similar over the other sites (Table S1, Supplementary Material), in agreement with previous studies [18]. It should be noted that the performance was only evaluated for the dates when both MODIS and rain gauge data were available. This resulted in utilizing 555 sets of values over Kimmeria, 457 over Avdira, 89 over Kosmio, 92 over Kerasia, 104 over Nimfaia, and 662 sets of values over Komotini.
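Restricting the comparison to dates on which both series are available can be sketched as follows. This is an illustrative assumption about the bookkeeping, not the procedure used in the study: the function name `pair_by_date`, the NaN encoding of missing days, and the requirement that each series contains every date at most once are introduced here.

```python
import numpy as np

def pair_by_date(dates_a, values_a, dates_b, values_b):
    """Keep only the dates on which both series have a finite value.

    Assumptions: dates are numpy datetime64 arrays with no repeated dates
    within a series, and missing values are encoded as NaN.
    """
    dates_a, dates_b = np.asarray(dates_a), np.asarray(dates_b)
    values_a = np.asarray(values_a, dtype=float)
    values_b = np.asarray(values_b, dtype=float)

    # Drop missing days first, then intersect the remaining dates.
    a_ok, b_ok = np.isfinite(values_a), np.isfinite(values_b)
    common, ia, ib = np.intersect1d(dates_a[a_ok], dates_b[b_ok], return_indices=True)
    return common, values_a[a_ok][ia], values_b[b_ok][ib]
```

The length of the returned arrays is the number of value pairs entering the statistics for a given station.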

Thus, the presented approach verifies that the MLR method can successfully downscale daily IMERG precipitation, enhancing its spatial resolution and may offer a suitable input to hydrological models and many other environmental applications. It is important to note that despite the limitations of the IMERG precipitation estimates over complex terrains [36, 37], the suggested approach was able to improve the accuracy of the IMERG data.

5 Discussion

Our study showed that applying the residual correction method to the precipitation estimates predicted from MODIS cloud properties (COT, CER, and CWP) can improve the final downscaled IMERG precipitation product. Over all rain gauge stations, the R2 (RMSE, NRMSE, and PB) between the IMERG(0.01) and the IMERG(0.1) products increased (decreased) significantly compared to the R2 (RMSE, NRMSE, and PB) between the PP(0.01) and IMERG(0.1) products (Table 2).

In addition, the capability of the IMERG(0.01) downscaled product to capture the precipitation pattern compared to the initial IMERG(0.1) product was also examined. For this reason, a comparison between the precipitation measurements from the 6 gauge stations and the IMERG(0.1) and IMERG(0.01) products was conducted. Over all stations, the R2 and RMSE values between IMERG(0.01) and observations were slightly better than the respective R2 and RMSE values between IMERG(0.1) and observations (Fig. 5, Figs. S2–S6, Table S1, Supplementary Material). A greater improvement was observed in the NRMSE values, an indication that the IMERG(0.01) product can perform well in complex terrain at varying altitudes. According to the PB values, which in our case serve as a metric for the over-/underestimation of the actual precipitation, the IMERG(0.01) was found to overestimate precipitation more than the IMERG(0.1). This is probably due to the fact that IMERG tends to overestimate precipitation over our region [36, 38].

6 Conclusions

In this work, the performance of the MLR method using a residual correction algorithm to spatially downscale daily IMERG-L precipitation estimates from 0.1 to 0.01° was investigated, over a region in NE Greece, for the period 01/2015–11/2022. For this purpose, daily cloud properties from Aqua/MODIS with a 0.01° spatial resolution were utilized, namely, cloud optical thickness, cloud effective radius, and cloud water path.

The implementation of the MLR method resulted in a downscaled IMERG precipitation product with higher spatial resolution and improved performance, following the spatial distribution of precipitation events of the original product. Despite the unavailability of MODIS data that was observed over many pixels during the studied time period, the initial IMERG data were spatially improved.

In addition, to investigate the effect of the residuals on the downscaled predicted precipitation data, the relation between the initial IMERG precipitation estimates and the predicted precipitation at 0.01° spatial resolution, before and after the residual correction, was statistically examined over different areas. Our analysis showed that the spatial resolution of predicted precipitation was improved by performing the residual correction algorithm.

Finally, for validation purposes, a statistical analysis was conducted between the precipitation estimates before and after the MLR method application with residual correction and gauge measurements from six sites. The final downscaled product was found to be slightly better correlated with the rain gauge observations over all sites than the initial IMERG product, probably due to the insufficient calibration of IMERG precipitation products over Greece.

Thus, the implementation of the MLR method in conjunction with the residual correction algorithm is an efficient tool for downscaling remote sensing products with a coarse spatial resolution, over complex terrains.

Since our MLR method is based on MODIS data, the unavailability of daily data in some cases prevents us from establishing the exact statistical relation between the predictand and the predictor(s). In addition, IMERG products, due to their limited calibration over some areas, may overestimate or underestimate the rainfall intensity, leading to a low correlation with the rain measurements provided by the established gauge network. In future work, we will investigate the influence of more parameters on precipitation, in order to improve the MLR technique described in our study and to develop a stronger statistical relationship between the parameters and precipitation. We also intend to investigate the optimal method for generating high-resolution precipitation data that could be used in hydrological models, by combining our approach with other statistical and dynamical downscaling techniques over regions with varying climatological conditions.