1 Introduction

Drought, as a natural disaster, can affect various economic sectors, nature, and society, and millions of people are currently facing challenges caused by it. Drought is defined as a period of below-normal precipitation lasting months or even years, and climate change may exacerbate the impacts of drought events with different characteristics. Although a drought typically begins with a precipitation shortage, scholars have identified different types of drought, such as meteorological, hydrological, agricultural, socio-economic, and ecological drought (Crausbay et al. 2017), which can be interconnected. For example, when meteorological drought persists for an extended period, groundwater is affected and hydrological drought develops, accompanied by a scarcity of soil moisture known as agricultural drought. Subsequently, socio-economic drought occurs when there is an imbalance between water supply and demand, and when ecosystems are impacted by drought, the condition is referred to as ecological drought. The various drought types have distinct characteristics and quantification systems. To formulate effective policies, it is crucial to determine how drought has evolved and to understand its potential impacts (Prodhan et al. 2021). However, many environmental studies have underestimated the significance of drought because of the lack of sufficient spatio-temporal field measurements of drought intensity at regional and global scales (Shen et al. 2019). Spatial patterns of drought can be derived by applying interpolation methods to in-situ data; however, interpolation cannot capture accurate drought patterns where the in-situ network is sparse. Recent developments in space-based satellite techniques are promising, mainly because measurements across the electromagnetic spectrum are sensitive to subsurface soil moisture and drought conditions.

In recent times, significant attention has been given to the use of satellite data for drought monitoring. Most of these efforts focus on the relationships between observations in the thermal and visible domains and drought events (Rhee et al. 2010; Small et al. 2018; Wang and Qu 2007). The fundamental concept underlying these efforts is that the optical and thermal domains can measure specific environmental variables, such as total water content, surface temperature, vegetation fraction, and soil color, which are significantly impacted by drought events (Mohseni and Mokhtarzade 2020; Shen et al. 2019). Previous studies have proposed and examined various indicators for monitoring drought states across different climatic regions, including the standardized precipitation index (SPI), deciles index, Palmer drought severity index, and precipitation anomaly index. Among these, SPI stands out as the most widely used indicator for drought tracking and analysis. This is attributed to several factors: (1) its simple computation, (2) its reliance only on available precipitation data, (3) its flexibility to be computed for any required time scale, (4) its ability to assess both short-term and long-term consequences for water resources, particularly soil moisture, and (5) its applicability to the study of all types of drought, which is crucial given the central role of precipitation in each of them (Azimi and Azhdary Moghaddam 2020).

Although satellite-based data can be used to compute drought indices, precipitation variation usually occurs at spatial scales smaller than the satellite pixel size. Therefore, downscaling satellite-based precipitation data is an essential step for hydro-climatic modeling. Among the available satellite products, Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks–Climate Data Record (PERSIANN-CDR) provides precipitation estimates derived from infrared records of global geosynchronous satellites. PERSIANN-CDR provides long-term information (since 1983) at a spatial scale of 0.25° in the 60°S–60°N latitude band (Ullah et al. 2019). PERSIANN-CDR is more credible than other spatial precipitation data because various data sources are utilized to create its products (Ashouri et al. 2015). However, satellite precipitation estimates suffer from an important drawback, namely their coarse spatial scale (Sun et al. 2018), which makes them unsuitable for regional applications. Therefore, a reliable downscaling method is necessary to increase the spatial resolution of satellite precipitation estimates.

Downscaling methods can be categorized as dynamical or statistical (Zhao et al. 2020). Dynamical downscaling methods provide fine-scale climatic data by modeling the physical processes with a numerical weather model (NWM) or regional climate model (RCM), but they usually require heavy computation. Dynamical methods are more suitable for downscaling meteorological variables present in RCMs (such as precipitation) than for indices, and the spatial resolution of RCMs varies from 20 to 60 km. Statistical downscaling methods simulate the statistical relationship between fine-scale and coarse-scale covariates, so auxiliary data can be employed to improve the downscaling performance (Immerzeel et al. 2009). Statistical downscaling methods are classified into weather typing and analogs, regression approaches, and weather generators. Owing to their simplicity and low computational demand, regression methods are a good alternative to dynamical downscaling. Artificial intelligence (AI) methods, a type of statistical downscaling, are highly popular because of their potential for modeling nonlinear and complex phenomena. Various studies have been conducted to downscale precipitation (Goly et al. 2014; Sachindra et al. 2018; Tripathi et al. 2006), and some have employed AI models to monitor and model drought. For instance, Neeti et al. (2021) developed an integrated framework employing a random forest algorithm to produce precipitation data at fine spatial scales and used the disaggregated results for monitoring meteorological drought. The results indicated that the developed framework is viable and generally adequate for application in other areas for drought analysis. Rhee and Im (2017) proposed a fine-scale drought modeling framework for ungauged regions, in which the standardized precipitation evapotranspiration index (SPEI) and SPI were estimated at a 0.05° spatial resolution using AI methods applied to satellite products, as well as spatial interpolation by kriging. Park et al. (2017) developed a high-resolution soil moisture drought index (HSMDI) based on the random forest algorithm for monitoring hydrological, agricultural, and meteorological droughts; the obtained results showed a high correlation with in-situ data. Anagnostopoulou (2017) employed a statistical downscaling method using an artificial neural network (ANN) to simulate the SPI, and RCM precipitation was also utilized to predict the SPI. The outcomes revealed that RCMs accurately captured the spatial extent, duration, and intensity of drought for SPI-12, while the ANN performed better for estimating SPI-3 than for long-term indices. In another study, Anagnostopoulou et al. (2013) developed a statistical downscaling method using an ANN for simulating the SPI. While their methodology overestimated the mean SPI, the reconstruction of SPI-6 and SPI-3 for the spring and winter seasons yielded reliable outcomes.

More recently, deep learning-based AI methods have been applied for drought modeling. For example, Lee et al. (2019) used a deep feedforward neural network (DFNN) model to simulate soil moisture via satellite data for drought investigation, showing a high correlation with in-situ measurements. Zhang et al. (2017) found that a DFNN model trained by satellite data is capable of modeling complicated patterns of soil moisture. Agana and Homaifar (2017) designed a deep belief network (DBN) to model drought using only historical data of the standardized streamflow index (SSI) as predictors. Shen et al. (2019) proposed a drought monitoring method employing a DFNN model, demonstrating a high ability to simulate agricultural and meteorological droughts. Prodhan et al. (2021) monitored agricultural drought employing a DFNN where soil and vegetation parameters, as well as precipitation, were considered as predictors. They showed that the DFNN method outperformed distributed random forest (DRF) and gradient boosting machine (GBM) models. Furthermore, several studies have used deep learning to downscale precipitation. For instance, Kumar et al. (2021) employed a methodology based on a convolutional neural network (CNN) to downscale precipitation over India, showing high accuracy compared to observations. In another study by Sun and Lan (2021), precipitation downscaling was performed using a CNN model over China, revealing that the CNN model had high performance and good skills in reproducing frequency distributions of daily precipitation. Wang et al. (2021) suggested a super-resolution deep residual network (SRDRN) for precipitation downscaling. The outcomes demonstrated that the SRDRN method not only captured spatial and temporal patterns remarkably well but also reproduced precipitation extremes in different locations and times at the local scale. However, it is essential to note that deep learning models have a high number of trainable parameters, and in situations where data is scarce, a deep learning model may not be a suitable choice (Sharghi et al. 2022).

Although data-driven methods (e.g., ANNs) may provide relatively reliable results, it is clear that, for a given problem, different methods may yield different results. Thus, by integrating different methods through an ensemble technique, different patterns of a phenomenon can be captured more accurately, since no single method is necessarily the best for all time periods and conditions. Such model ensemble methods have been used in various contexts (e.g., see Nourani et al. 2022; Shamseldin et al. 1997; Sharghi et al. 2019) but, to the best of the authors' knowledge, not in the field of downscaling and drought modeling. Additionally, no study has yet applied deep learning-based downscaled precipitation products for spatio-temporal drought monitoring.

In this paper, to fill the aforementioned gaps, an attempt was made to reproduce SPI (as a simple and commonly used index for drought monitoring) at a fine spatial resolution based on PERSIANN-CDR data. The suggested framework not only downscales precipitation but also predicts the SPI. Moreover, the vegetation fraction index and soil temperature were utilized as drought-dependent variables to downscale PERSIANN-CDR-based SPI employing shallow AI methods (adaptive neuro-fuzzy inference system (ANFIS), support vector regression (SVR), and feedforward neural network (FFNN)) and the deep learning-based long short-term memory (LSTM) method. Finally, an AI-based ensemble technique was used as a post-processing step.

2 Materials and methods

2.1 Study area and employed data

2.1.1 Study area and in-situ data

The proposed methodology has been applied to the northwestern region of Iran. Figure 1 presents the spatial extent and geographic position of the study area, which includes two lakes and seventeen rivers. The northern, snow-capped district, incorporating Lake Urmia, experiences abundant precipitation and is characterized by deep valleys and fertile lowlands. The western areas have a mountainous climate owing to their high elevations. The Zagros Mountains, extending from northwest to southeast, hinder the full flow of moisture-laden currents from the Atlantic Ocean and the Mediterranean Sea. Additionally, cold Siberian and Central Asian air currents entering the western part from the north and northeast intensify the coldness and humidity of the climate, leading to severe cold and heavy snowfall. Another influential factor in the region's climate is the presence of Lake Urmia, which not only provides humidity but also moderates temperatures. The maximum temperature reaches 34 °C in July, while the minimum temperature drops to − 16 °C in January. Generally, the mean annual precipitation in the western area is 300–400 mm. The volcanic peak of Sahand Mountain (3707 m) is the highest point in the central area, which has a semi-arid, cold climate influenced by Mediterranean continental air masses; its low-lying parts are also affected by gentle breezes from the Caspian Sea. Overall, the central area is a cold, mountainous, semi-arid region with a mean annual precipitation of 250 to 300 mm. The eastern area, situated in a mountainous region, experiences a cold, semi-arid climate. Large parts of the eastern area are green and forested, with bitterly cold winters and temperatures dropping to − 25 °C. The special geographical and topographic features of the eastern area, including mountain ranges exceeding 4000 m in altitude and wide plains, contribute to better precipitation conditions than in other parts of the country, ranging between 250 and 600 mm.

Fig. 1
figure 1

Study area, selected meteorological sites and mean monthly precipitation, LST and NDVI maps (2015–2021)

Northwestern Iran hosts several operational in-situ meteorological stations. Among the variables measured at these stations (precipitation, evaporation, and air temperature), this paper utilized monthly precipitation records from 15 stations (see Fig. 1) to evaluate the computed high-resolution SPI indices obtained via the proposed AI-based downscaling method.

2.1.2 Satellite data

In this study, monthly PERSIANN-CDR data obtained from https://chrsdata.eng.uci.edu/ were utilized. Derived from satellite infrared measurements, PERSIANN-CDR is a collaborative effort between the National Oceanic and Atmospheric Administration (NOAA) and the Center for Hydrometeorology and Remote Sensing (CHRS). It is employed to generate spatially distributed daily precipitation data from 1983 to the present. Unlike previous PERSIANN products, such as the real-time and adjusted versions, PERSIANN-CDR utilizes GridSat-B1 information rather than passive microwave and Climate Prediction Center datasets, thereby using a different infrared dataset as input. Notably, PERSIANN-CDR provides continuous precipitation data since 1983, rendering it suitable for drought analyses that require long-term records (Ashouri et al. 2015). The PERSIANN-CDR retrieval employs ANNs and provides a spatial scale of 0.25° with quasi-global coverage (60°N–60°S). Additionally, the 3-hourly precipitation rates are accumulated to determine monthly precipitation (Mohseni et al. 2021).

In this study, two satellite-based datasets were used as auxiliary data alongside the precipitation data: Terra monthly vegetation indices (MOD13C2) and Terra monthly LST & emissivity (MOD11C3), both MODIS products at a 0.05° resolution. Daytime land surface temperature (LST) from MOD11C3, which provides monthly mean LST and emissivity values at 0.05° × 0.05°, served as the satellite-based soil temperature parameter, while the normalized difference vegetation index (NDVI) from MOD13C2 was used as the satellite-based vegetation factor.

2.2 Standardized precipitation index

The standardized precipitation index (SPI) is an indicator utilized to characterize environmental drought across various timescales, including 3, 6, and 12 months. Vegetation conditions and soil thermal inertia serve as proxies for hydrological and agricultural droughts, as they are governed by vegetation water content and soil moisture. Over short periods, such as one month, a decrease in precipitation may not impact soil thermal inertia or vegetation health; however, prolonged periods of reduced precipitation can affect surface water storage and vegetation health over a season. SPI, calculated from precipitation data, exhibits a strong correlation with vegetation and temperature conditions. To generate high-resolution environmental drought indices, SPI can therefore be downscaled from coarse-scale satellite precipitation estimates using MODIS products. In this study, we explored the correlation between two drought-dependent parameters, LST and NDVI, and SPI for downscaling environmental drought to a fine spatial scale.

The SPI is computed from precipitation data at different timescales, including 3, 6, and 12 months, using the formula:

$$SPI=\frac{X-\overline{X}}{\sigma }$$
(1)

where X is the precipitation value, \(\overline{X}\) is the average precipitation over a specific period, and σ is the standard deviation of precipitation. In this study, SPI was calculated in RStudio using the SPEI package for the period 2000 to 2021. It is important to note that SPI at different time scales is considered because each serves distinct objectives; in agricultural drought assessment, for instance, each index is valuable for investigating the impact on different crops. SPI calculated at short time scales (e.g., 3 months) can serve as an indicator of immediate effects, such as reduced flow in smaller streams, snowpack, and soil moisture, whereas at medium scales (e.g., 6 to 12 months) it can indicate decreased streamflow and reservoir storage. Therefore, to obtain a comprehensive understanding of the potential effects of a drought, SPI should be computed at different time scales. The relationship between the time scale and the effects of a drought depends on human interference and natural conditions. As discussed in previous studies, drought indices, including SPI, are often correlated with soil temperature and vegetation (Zhu et al. 2017). Various drought classifications exist in the literature based on different SPI thresholds; Table 1 illustrates the classification of meteorological drought by McKee et al. (1993).
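For illustration, the sketch below implements Eq. (1) in Python for a chosen time scale. It is a simplified standardization under assumed variable names; the SPEI R package used in this study additionally fits a gamma distribution to the accumulated precipitation before the normal transformation, which is omitted here.

```python
import pandas as pd

def spi_simplified(precip: pd.Series, scale: int = 6) -> pd.Series:
    """Simplified SPI following Eq. (1).

    precip : monthly precipitation values indexed by date (illustrative input).
    scale  : accumulation window in months (3, 6 or 12).
    Note: the operational SPI (e.g., the SPEI R package used in this study)
    first fits a gamma distribution to the accumulated series and then maps
    it to a standard normal variate; the plain standardization below only
    illustrates Eq. (1).
    """
    accumulated = precip.rolling(scale).sum()                      # X
    return (accumulated - accumulated.mean()) / accumulated.std()  # (X - Xbar) / sigma
```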

Table 1 Drought classification based on SPI scores (McKee et al. 1993)

It is important to mention that the auxiliary data, namely NDVI and LST, were also aggregated to the same time scales used for calculating the SPIs, as sketched below. The resulting variables are denoted NDVI-3, NDVI-6, NDVI-12, LST-3, LST-6, and LST-12.
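A minimal sketch of this transformation, assuming it is an n-month rolling mean (the exact aggregation operator is not stated in the text):

```python
import pandas as pd

def match_timescale(predictor: pd.Series, scale: int) -> pd.Series:
    """Aggregate a monthly predictor (NDVI or LST) to the SPI time scale,
    e.g. NDVI-3, LST-6, LST-12. A rolling mean is assumed here."""
    return predictor.rolling(scale).mean()

# ndvi_6 = match_timescale(ndvi_monthly, 6)
# lst_12 = match_timescale(lst_monthly, 12)
```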

2.3 Downscaling methodology

In recent decades, various statistical downscaling methods have been employed to downscale satellite-based hydro-climatic data, and AI methods have demonstrated acceptable performance across areas with diverse topographic and climatic conditions. Figure 2 outlines the main steps of the proposed downscaling framework of this study. Three shallow AI models are employed: FFNN, a commonly used AI method; ANFIS, a hybrid AI method capable of handling uncertainties through fuzzy concepts; and SVR, an AI model based on structural risk minimization. In addition, the deep learning LSTM model, which is increasingly popular but requires careful comparison with shallow AI models, is used, and an ensemble technique is incorporated to downscale the SPI at different temporal scales. The methodology operates under the assumption that the statistical relationship between the NDVI and LST predictors and SPI at a lower resolution remains valid at a higher resolution; previous studies, such as Gidey et al. (2018), have considered and confirmed this assumption. To establish the connection between the 0.05° predictive variables and the 0.25° SPI data, the NDVI and LST parameters were first resampled to a resolution of 0.25° using the bilinear method. Simultaneously, the 0.25° SPI data were resampled, using the bilinear technique, to 1.25° for use as model input; this involved dividing each 1.25° pixel into a 5 × 5 grid to match the 0.25° resolution.
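The resampling step can be sketched as follows; the example uses SciPy's bilinear interpolation and is only an assumption about the tooling, which is not specified in the text.

```python
from scipy.ndimage import zoom

def resample_bilinear(raster, factor):
    """Bilinear (order=1) resampling of a 2-D grid.

    factor = 1/5 coarsens by a factor of five (e.g. 0.05 deg -> 0.25 deg,
    or 0.25 deg -> 1.25 deg); factor = 5 would refine by the same ratio.
    """
    return zoom(raster, factor, order=1)

# ndvi_025 = resample_bilinear(ndvi_005, 1 / 5)   # NDVI/LST: 0.05 deg -> 0.25 deg
# spi_125  = resample_bilinear(spi_025, 1 / 5)    # SPI:      0.25 deg -> 1.25 deg
```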

Fig. 2
figure 2

Overall diagram of the proposed framework for downscaling SPI

The primary objective of this study was to compute SPI data at a resolution of 0.05°, utilizing SPI data both as input and as target. Given the original resolution of 0.25°, the proposed approach involved training the model at this resolution to facilitate downscaling to 0.05°. To achieve this, the model was initially trained to downscale from 1.25° to 0.25°, leveraging datasets available at both resolutions. Once the model had been trained successfully at the 0.25° resolution, it was applied to downscale from 0.25° to 0.05°, as data at the finer resolution were not directly available for training. Because the downscaling factor is 5 (0.25° to 0.05°), this two-stage process was used to compute SPI data at the desired 0.05° resolution (refer to Fig. 3). This multi-step training approach allowed the framework to effectively utilize the available data and bridge the resolution gap, enabling the computation of SPI at the targeted finer resolution. Finally, to assess the modeling performance, the simulated results were compared with the recorded gauge-based data. For further clarification, information on the employed satellite data is tabulated in Table 2.
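The two-stage scheme can be sketched as below: each coarse SPI pixel is replicated over its 5 × 5 block of fine pixels and stacked with the fine-scale predictors, and the trained mapping is then reused one level finer. Function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def build_training_pairs(spi_coarse, ndvi_fine, lst_fine, block=5):
    """Stage 1 training data: learn the mapping from 1.25 deg to 0.25 deg.

    spi_coarse          : SPI at 1.25 deg, shape (H, W)
    ndvi_fine, lst_fine : predictors at 0.25 deg, shape (5H, 5W)
    Each coarse pixel is repeated over its 5 x 5 block so that every fine
    pixel receives [SPI_coarse, NDVI, LST] as input; the training target is
    the 0.25 deg SPI at the same fine pixels.
    """
    spi_repeated = np.kron(spi_coarse, np.ones((block, block)))  # 5 x 5 replication
    X = np.column_stack([spi_repeated.ravel(),
                         ndvi_fine.ravel(),
                         lst_fine.ravel()])
    return X

# Stage 2 (application): feed [SPI at 0.25 deg repeated to the 0.05 deg grid,
# NDVI and LST at 0.05 deg] to the already trained model to obtain SPI at 0.05 deg.
```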

Fig. 3
figure 3

Resampling of SPI data from 0.25° to 1.25° to be used as input of models to improve modeling performance

Table 2 Data information employed in this study

The sub-steps of the proposed methodology for downscaling SPI are as follows:

  i. LST and NDVI products from MODIS at 0.05° were resampled to a 0.25° scale using the bilinear interpolation method to serve as predictors for the models.

  ii. PERSIANN-CDR products at 0.25° were resampled to 1.25° using the bilinear method, and the precipitation data were used to calculate SPI as a predictor for the models.

  iii. The models were trained using monthly data from 2000 to 2015, while the data from 2016 to 2021 were used for validation purposes.

  iv. The trained models were employed to estimate SPI values at 0.05° using LST and NDVI datasets at a 0.05° resolution for the years 2020–2021.

  v. Validation of the SPI downscaled to a resolution of 0.05° was carried out employing in-situ data.

In some similar studies, SPI was not included as an input, and the residuals were interpolated at the end to reduce errors. However, in this study, by considering SPI as an input, there is no need for the final interpolation of residuals because the upscaled SPI has been used in the model training from the beginning to enhance the modeling (Mohseni et al. 2021; Mohseni and Mokhtarzade 2020; Sharifi et al. 2019).

2.3.1 Feed-forward neural network

The feed-forward neural network (FFNN), calibrated by the backpropagation (BP) algorithm, is among the most commonly used models in engineering studies. The FFNN is the earliest form of ANN, and the phrase "feed-forward" signifies the forward flow of information through the network, from the input layer to the output layer. The FFNN models in this study were designed as follows:

  a) Various FFNN architectures with one hidden layer (containing 1 to 10 neurons) and different numbers of epochs were created and tested.

  b) The models were calibrated using the BP algorithm with gradient descent optimization.

  c) The model demonstrating the best performance, i.e., the one with the maximum determination coefficient, was selected.

It should be mentioned that there is no universal rule for designing an ANN architecture; therefore, each model must be structured individually, typically by trial and error, as sketched below.
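A hedged sketch of this search with scikit-learn (the study itself used MATLAB); DC is approximated here by the coefficient of determination on validation data, and all hyperparameter values are placeholders.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

def select_ffnn(X_train, y_train, X_val, y_val):
    """Try one-hidden-layer FFNNs with 1-10 neurons and several epoch budgets,
    trained with gradient-descent back-propagation, and keep the network with
    the highest determination coefficient on the validation set."""
    best_net, best_dc = None, float("-inf")
    for n_neurons in range(1, 11):
        for epochs in (100, 500, 1000):
            net = MLPRegressor(hidden_layer_sizes=(n_neurons,),
                               solver="sgd", learning_rate_init=0.01,
                               max_iter=epochs, random_state=0)
            net.fit(X_train, y_train)
            dc = r2_score(y_val, net.predict(X_val))
            if dc > best_dc:
                best_net, best_dc = net, dc
    return best_net, best_dc
```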

2.3.2 Adaptive neuro-fuzzy inference system

The term "neuro-fuzzy" modeling refers to the use of neural learning algorithms to tune fuzzy inference systems. A fuzzy system comprises three main components: a fuzzifier, a fuzzy database, and a defuzzifier, where the two main parts of the fuzzy database are the fuzzy rule base and the inference engine. The fuzzy rule base consists of rules relating fuzzy propositions (Jang et al. 1997), and fuzzy inference is employed in the operational analysis. The Sugeno-type FIS used by Jang (1993) was applied in this study.
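To make the inference mechanism concrete, the sketch below shows the forward pass of a first-order Sugeno FIS with Gaussian membership functions; ANFIS additionally tunes the membership and consequent parameters with a hybrid learning rule, which is omitted here, and all names are illustrative.

```python
import numpy as np

def gauss_mf(x, center, sigma):
    """Gaussian membership function (fuzzifier)."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_predict(x, rules):
    """Forward pass of a first-order Sugeno FIS.

    x     : 1-D input vector, e.g. [SPI_coarse, NDVI, LST]
    rules : list of dicts with 'centers', 'sigmas' (one per input) and
            'coeffs' (len(x) + 1 linear consequent parameters).
    """
    weights, outputs = [], []
    for rule in rules:
        # Rule firing strength: AND of antecedents via the product operator
        w = np.prod([gauss_mf(xi, c, s)
                     for xi, c, s in zip(x, rule["centers"], rule["sigmas"])])
        # First-order Sugeno consequent: linear function of the inputs
        y = rule["coeffs"][0] + np.dot(rule["coeffs"][1:], x)
        weights.append(w)
        outputs.append(y)
    weights = np.asarray(weights)
    # Defuzzification: weighted average of the rule outputs
    return np.dot(weights, outputs) / (weights.sum() + 1e-12)
```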

2.3.3 Support vector regression

Cortes and Vapnik (1995) introduced the support vector machine (SVM) approach, which offers satisfactory solutions to problems such as pattern recognition, regression, prediction, and classification. Statistical learning theory and structural risk minimization are two useful features of SVM that distinguish it from the FFNN. SVR, the regression form of SVM, maps the inputs through a nonlinear kernel and fits a linear function in the resulting feature space, thereby simulating the target data in a nonlinear manner (Raghavendra and Deka 2014).
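A minimal scikit-learn sketch (the paper's implementation was in MATLAB); by assumption, C, epsilon, and gamma below correspond to the constant, approximation-accuracy, and kernel parameters tuned in Table 3, and the default values shown are placeholders.

```python
from sklearn.svm import SVR

def fit_svr(X_train, y_train, C=10.0, epsilon=0.1, gamma="scale"):
    """RBF-kernel SVR for SPI simulation.

    X columns: e.g. [SPI_coarse, NDVI, LST]; y: target SPI.
    Hyperparameter values are placeholders, not the tuned values of Table 3.
    """
    model = SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma)
    model.fit(X_train, y_train)
    return model
```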

2.3.4 Ensemble unit

A model ensemble method combines different techniques to enhance the final accuracy. Various studies in diverse contexts have recommended ensembling the results of various models as an efficient post-processing method to improve estimation performance (Kazienko et al. 2013). The use of ensemble methods in this study is justified by their capacity to capitalize on individual models' strengths while alleviating their weaknesses. The rationale includes:

  1. Diversity of models: ensembles merge predictions from diverse models, capturing a broader range of patterns.

  2. Compensation for model limitations: ensembles offset individual model limitations, as different models excel in different regions or conditions.

  3. Reduction of overfitting: nonlinear neural ensembles reduce overfitting, enhancing generalization to unseen data.

  4. Improved robustness: ensembles minimize the impact of outliers, which is valuable in complex environmental modeling.

  5. Performance boost: combining model outputs often yields superior performance.

  6. Model consensus: ensembles generate predictions based on consensus, enhancing reliability.

In summary, an ensemble was strategically chosen to enhance the downscaling methodology, providing a more comprehensive and accurate representation of the underlying data relationships and addressing variability and uncertainty in environmental modeling.

In this paper, a nonlinear technique was employed to integrate the outcomes of the individual models used to enhance the final modeling accuracy. A new FFNN is created as a nonlinear ensemble method. The results of single methods (SVR, ANFIS, and FFNN) are considered as inputs for another new FFNN model to create the nonlinear neural ensemble model (refer to Fig. 4).

Fig. 4
figure 4

Diagram of suggested nonlinear ensemble method

It should be noted that the employed ensemble technique is a shallow AI ensemble, which combines single shallow AI models to exploit each model's capability, enhance the accuracy of shallow modeling, and reduce uncertainty. Shallow AI models, having fewer layers, may face difficulty with more complicated tasks, whereas deep learning models with more layers can extract features and patterns from datasets more accurately and easily. However, deep learning methods require more data for training, so when data availability is limited, an ensemble post-processing technique may be the better choice.
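A minimal sketch of the nonlinear neural ensemble described above (Fig. 4), assuming a scikit-learn FFNN as the combiner; tanh stands in for the tan-sig activation, and scikit-learn's default optimizer is used in place of scaled conjugate gradient.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def neural_ensemble(train_outputs, y_train, new_outputs):
    """Nonlinear neural ensemble.

    train_outputs / new_outputs: arrays of shape (n_samples, 3) stacking the
    FFNN, ANFIS and SVR predictions; a small one-hidden-layer FFNN combiner
    is trained against the observed target and then applied to new outputs.
    """
    combiner = MLPRegressor(hidden_layer_sizes=(5,), activation="tanh",
                            max_iter=2000, random_state=0)
    combiner.fit(train_outputs, y_train)
    return combiner.predict(new_outputs)

# spi_ens = neural_ensemble(np.column_stack([p_ffnn, p_anfis, p_svr]), y_train,
#                           np.column_stack([q_ffnn, q_anfis, q_svr]))
```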

2.3.5 Long short-term memory (LSTM) network

Conventional neural networks cannot effectively capture long- and short-term temporal dependencies in sequences. To address this issue, a special type of recurrent neural network (RNN) has been designed, employing a loop structure to learn the temporal dependence between different variables. However, a typical RNN struggles to capture long-term dependencies effectively, as it tends to lose current and dependent information. A specific RNN structure, known as the LSTM network (Hochreiter and Schmidhuber 1997; Li et al. 2022), has been developed to overcome this limitation. In the LSTM network, three "gate" structures are designed to create a chain architecture of memory units. Figure 5 illustrates the diagram of an LSTM block with, for example, 5 hidden units and 2 input dimensions.

Fig. 5
figure 5

Structure of an LSTM unit with 5 hidden units and 2 input dimensions

Through the gates, information is either discarded or retained within the memory unit, which safeguards and controls the cell state. The LSTM network thereby addresses both the vanishing and exploding gradient problems through its memory unit, overcoming the limitations of conventional RNNs. In this study, the LSTM network comprises a sequence input layer, an LSTM layer, a fully connected layer, and a regression layer.
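The layer stack described above can be sketched in Keras as follows (the paper's network was built in MATLAB; the hidden-unit count and input names are placeholders):

```python
import tensorflow as tf

def build_lstm(n_features, hidden_units=5):
    """Sequence input -> LSTM layer -> fully connected layer -> regression output."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, n_features)),  # sequence input layer
        tf.keras.layers.LSTM(hidden_units),               # LSTM layer (gated memory cells)
        tf.keras.layers.Dense(1),                         # fully connected / regression layer
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# model = build_lstm(n_features=3)   # e.g. [SPI_coarse, NDVI, LST]
# model.fit(X_seq_train, y_train, epochs=200, validation_data=(X_seq_val, y_val))
```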

2.4 Modeling performance

The root mean square error (RMSE) and Nash–Sutcliffe efficiency (or determination coefficient, DC) were employed to evaluate the modeling performance (Nourani and Behfar 2021):

$$RMSE=\sqrt{\frac{\sum_{t=1}^{N}\left(I_{(t)}-I_{com(t)}\right)^{2}}{N}}$$
(2)
$$DC=1-\frac{\sum_{t=1}^{N}\left(I_{(t)}-I_{com(t)}\right)^{2}}{\sum_{t=1}^{N}\left(I_{(t)}-\overline{I}\right)^{2}}$$
(3)

where N stands for the number of data points, \({I}_{(t)}\) is the observed variable, \(\overline{I }\) is the average of the observed variable, and \({I}_{com (t)}\) is the modeled variable. DC ranges between -∞ and 1, with an optimal value of 1, while RMSE ranges between 0 and ∞, with an optimal value of 0. It should be noted that DC is different from the correlation coefficient (CC). DC indicates the agreement between modeled and observed variables, while CC indicates the linear correlation between them.
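For reference, a direct NumPy implementation of Eqs. (2) and (3):

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error, Eq. (2)."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return np.sqrt(np.mean((obs - sim) ** 2))

def dc(obs, sim):
    """Nash-Sutcliffe efficiency / determination coefficient, Eq. (3)."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```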

Although DC, RMSE, and CC are the most commonly used criteria for the verification of satellite data, RMSE alone is not sufficient for verifying satellite-based precipitation data, as reported by Baez-Villanueva et al. (2018). Thus, for a more rigorous verification, the false alarm ratio (FAR) and probability of detection (POD) were also utilized for the precipitation-based drought estimates:

$$POD=\frac{H}{H+M}$$
(4)
$$FAR=\frac{F}{H+F}$$
(5)

where H is the number of satellite-based precipitation measurements that correctly detected the drought event (SPI <  − 1), M represents the number of drought events reported by the ground station but not detected by the satellite-based precipitation estimates, and F refers to the number of drought conditions identified by the satellite-based precipitation estimates but not reported at the ground station. Overall, the FAR and POD scores range from 0 to 1, with optimal values of 1 and 0 for POD and FAR, respectively.
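The two detection scores follow directly from the drought/no-drought contingency counts, for example:

```python
import numpy as np

def pod_far(spi_gauge, spi_model, threshold=-1.0):
    """POD and FAR, Eqs. (4)-(5); a drought event is flagged when SPI < -1."""
    obs = np.asarray(spi_gauge) < threshold    # droughts reported by the ground station
    est = np.asarray(spi_model) < threshold    # droughts in the satellite-based estimates
    hits = np.sum(obs & est)                   # H
    misses = np.sum(obs & ~est)                # M
    false_alarms = np.sum(~obs & est)          # F
    return hits / (hits + misses), false_alarms / (hits + false_alarms)
```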

3 Results and discussion

In this study, NDVI and LST were first upscaled to a resolution of 0.25° employing the bilinear resampling method. The framework operates under the assumption that the methods (FFNN, ANFIS, SVR, and LSTM) linking the SPI to NDVI and LST at a low spatial resolution (0.25° × 0.25°) are equally effective at a high spatial resolution (0.05° × 0.05°; Zhang et al. 2018). The effectiveness of these methods depends on the relationship between the noted datasets (Zhang and Jia 2013; Mohseni et al. 2021). According to the thermal inertia theory, as vegetation intensity decreases, soil temperature increases, and vice versa; the correlation between vegetation fraction and soil temperature is therefore negative, and this negative relation is more pronounced under dry soil conditions. In this context, SPI has a positive relation with NDVI and a negative relation with LST: as soil temperature decreases, the drought indices increase, and vice versa. Some studies have reported that LST is more strongly related to SPI than NDVI (e.g., Mohseni et al. 2021), while others (e.g., Zhang and Jia 2013) reported the reverse. This discrepancy can be related to the climate and vegetation intensity of a region. In the study area, where vegetation is not very dense, the correlation of SPI with NDVI was less apparent than that with LST; moreover, evaporation, which is strongly governed by land surface temperature, is high in this region, so the correlation of SPI with LST was higher than with NDVI. The degree of correlation between a drought index and vegetation/soil temperature variables depends not only on climatic conditions but also on the type of drought index; for instance, SPEI is strongly dependent on evapotranspiration, but employing such indices requires additional datasets.

A correlation analysis was conducted among LST, NDVI, and SPI at different time scales (Fig. 6). In Fig. 6, the left column shows the CC pattern of LST and SPI at different time scales, while the right column depicts the CC pattern of NDVI and SPI over the study area. The spatial distribution of CC shows broadly similar patterns at different time scales, shaped by the climate regimes of the different areas. According to Fig. 6, in the northern regions the relationship of SPI with LST is weaker than in the southern regions, but its relationship with NDVI is stronger than elsewhere. Conversely, in the central and southern regions, SPI has a stronger relationship with LST and a weaker relationship with NDVI than in the northern regions.

Fig. 6
figure 6

Correlation between SPI-3 and: (a) LST, (b) NDVI, SPI-6 and: (c) LST, (d) NDVI, SPI-12 and: (e) LST, (f) NDVI
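The pixel-wise CC maps of Fig. 6 can be reproduced along the following lines (a sketch; variable names are illustrative):

```python
import numpy as np

def pixelwise_cc(spi_stack, predictor_stack):
    """Pearson correlation between SPI and a predictor (LST or NDVI) per pixel.

    Both inputs have shape (time, rows, cols); the result has shape (rows, cols).
    """
    spi_anom = spi_stack - spi_stack.mean(axis=0)
    pred_anom = predictor_stack - predictor_stack.mean(axis=0)
    numerator = (spi_anom * pred_anom).sum(axis=0)
    denominator = np.sqrt((spi_anom ** 2).sum(axis=0) * (pred_anom ** 2).sum(axis=0))
    return numerator / denominator
```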

The linking models (FFNN, ANFIS, SVR, and LSTM) between the low-resolution PERSIANN-CDR-based SPIs (i.e., SPI-3, SPI-6, and SPI-12) and NDVI and LST were created using the training dataset. For the FFNN model, networks with different numbers of neurons and epochs were examined to select the optimum structure for downscaling. In the case of ANFIS, different membership functions (MFs) were employed, with the triangular MF yielding the best performance in simulating SPI, and the number of calibration iterations (epochs) was also varied to determine the best ANFIS configuration. The SVR method was implemented using the radial basis function (RBF) kernel, which has fewer tuning factors than the polynomial and sigmoid kernels and, based on smoothness assumptions, can lead to better performance than other kernel functions (Noori et al. 2011). The parameters of the RBF kernel in the SVR models were tuned to obtain optimum results (see Table 3). As mentioned earlier, the upscaled SPI data, together with LST and NDVI, were fed into the input layers of the models to enhance modeling accuracy. The outcomes of this first stage are summarized in Table 3. In addition to the shallow AI models, deep learning-based modeling was also carried out with the LSTM model, whose structure was determined by a trial-and-error procedure to obtain the optimum network.

Table 3 Results of the first step for single shallow, deep learning, and ensemble methods for training and test phases

Next, the outputs of the single shallow AI models (FFNN, ANFIS, and SVR) produced in the previous stage were integrated using the ensemble post-processing method. For the nonlinear ensemble, a network similar to the single FFNN was trained using the scaled conjugate gradient variant of the BP algorithm and the tan-sig activation function for the hidden and output layers, and a trial-and-error procedure was employed to obtain the optimal structure. The results of the ensemble post-processing technique are also presented in Table 3. It should be noted that a, b, and c in the SVR method stand for the constant parameter (a), approximation accuracy (b), and kernel parameter (c). In the ANFIS structure, MF-a refers to the employed MFs and the number of MFs (a). Furthermore, a in the structure of the FFNN and ensemble refers to the number of hidden neurons (a), and a in the LSTM model represents the number of hidden units. All AI methods were coded in MATLAB.

The results presented in Table 3 can be interpreted in light of the aims of the study: to explore the capability of deep learning in comparison with shallow AI-based models, to evaluate the impact of a nonlinear ensemble post-processing technique in improving the modeling results, and ultimately to compare deep learning and shallow AI-based ensemble methods. As evident from Table 3, the outputs of the individual shallow AI methods are close to each other, although at different stages and time scales one of the models slightly outperformed the others; in general, ANFIS, owing to its use of fuzzy theory, performed slightly better than the others in most cases. According to Table 3, the ensemble method could enhance the results of the single shallow AI models for SPI-3, SPI-6, and SPI-12 by 18%, 19%, and 9%, respectively, in the training phase, and by 18%, 15%, and 23%, respectively, in the test phase. The improvement of DC in the training phase is not significant for SPI-12, but the performance improvement is considerable in the test phase for all temporal scales. The improvement in the test phase for SPI-12 exceeds that of the other scales, given that the overall results for SPI-12 are less accurate than for the others. Considering that each model has its own advantages and disadvantages, the ensemble technique, because it incorporates the benefits of all models, may simulate the task more precisely than the single methods. For instance, the mean SPI-6 obtained using PERSIANN-CDR products and the corresponding mean SPI-6 predicted by the employed models over January 2020 to June 2021 are depicted in Fig. 7, where it is clear that the ensemble outperforms the other methods. As examples, points 1, 2, and 3 in June 2020, August 2020, and January 2021, respectively, were considered; at these points, SVR, FFNN, and ANFIS, respectively, outperformed the other shallow AI models. Overall, the ensemble technique showed the best performance among all employed methods. It should be noted that there are different methods for combining model outputs but, as discussed in previous studies (Shamseldin et al. 1997; Sharghi et al. 2019), the neural ensemble model generally outperforms the others; therefore, the neural ensemble method was employed in this study to improve the modeling performance.

Fig. 7
figure 7

Time-series comparing mean SPI-6 derived from PERSIANN-CDR data and corresponding mean predicted SPI-6 by employed models from January 2020 to June 2021

On the other hand, the deep learning LSTM model significantly outperforms the shallow AI models in the training phase, while it only slightly outperforms them in the test phase, and overtraining of the LSTM is evident. This is due to the high number of trainable parameters of deep learning models; for instance, the FFNN, ANFIS, SVR, and LSTM models for SPI-3 have 16, 50, 3, and 186 trainable parameters, respectively. In other words, because the LSTM model has many degrees of freedom, it performs well in the training phase, but errors in the many calibrated parameters increase the error in the testing phase. The LSTM results also deteriorate further for SPI-12, which has fewer training data than the other indices. At this point, it is worth emphasizing that the amount of calibration data is very important; if more data were available and used for training, the deep learning model might perform significantly better than the other models. Thus, the ensemble post-processing method can be a good alternative to a deep learning method when limited data are available. It is also clear from Table 3 that the performance of the methods in estimating SPI-3, and especially SPI-6, is higher than in estimating SPI-12.

In the second step, fine-scale inputs were fed into the trained single shallow AI models as well as the LSTM to downscale SPI; the results of this step are presented in Table 4. It is worth mentioning that in some previous downscaling studies (e.g., Sharifi et al. 2019), residual correction was employed to correct and improve the downscaling outcomes. However, because the upscaled SPI was employed as input data in this study, no residual correction is needed. In other words, previous downscaling studies used only auxiliary parameters as model inputs, which limited the accuracy of the raw estimates, so residual correction had to be applied to remove the bias of the estimations. In fact, studies that use only auxiliary parameters as input do not explicitly address the downscaling in the training phase of the modeling; instead, in a final step, the residuals of the estimated results are corrected using the low-resolution target. In contrast, in the current modeling, when the low-resolution target is used as an input, the error is reduced by the BP algorithm and loss function during the training process, and there is no need to modify the residuals in a further modeling step. Comparing the methodology employed in this study with previous studies, it is clear that the methodology used in some earlier works consists of more steps (i.e., one interpolation by ANN, one interpolation by spline, and then the sum of the ANN estimations and the resampled errors; e.g., see Sharifi et al. 2019) than the methodology proposed in this paper (i.e., only one interpolation by an AI model).

Table 4 Results of the downscaling phase of the modeling for individual shallow, deep learning and ensemble methods at different temporal scales of SPI

Finally, an evaluation was conducted by comparing the SPI-3, SPI-6, and SPI-12 indices computed from the observed ground-gauge data with the SPI indices predicted by the proposed method, to provide a more detailed and in-depth comparison of the employed models. The results of this evaluation are presented in Table 4. They indicate that the modeling performance for the SPI-3 and SPI-6 indices is broadly similar, although, in terms of DC, the suggested model performs better in estimating SPI-6 than SPI-3. The POD values computed for SPI-6 are 0.76 and 0.8 for the LSTM and ensemble methods, respectively, indicating that about 80% of the drought conditions could be detected by the proposed modeling framework. On the other hand, the FAR scores for SPI-6 are 0.22 and 0.24 for the ensemble and LSTM methods, respectively, indicating that roughly a quarter of the drought events captured by the methodology are false alarms. Figure 8 shows the downscaled results of the ensemble and LSTM methods for SPI-6 for a wet month (September 2020) over the study area, with a mean SPI of 0.32; it should be noted that the study area is semi-arid, and there are not many months with an SPI greater than 1. Figure 8 also depicts similar outcomes for a dry month (June 2021), in which the mean SPI is about − 1.3. The SPI computed from the PERSIANN-CDR data is also displayed to compare the results of the deep learning and ensemble methods. According to Fig. 8a, the PERSIANN-CDR SPI data vary from − 1.02 to 1.64 in September 2020, the wet month, with a mean SPI of 0.31 for this month. In June 2021, drought conditions can be observed, since the majority of SPI values in this month are negative (Fig. 8d). Considering Fig. 8, it is evident that the low-resolution SPI map derived from PERSIANN-CDR data and the high-resolution maps obtained from the proposed methodology are almost in the same range in both dry and wet months, indicating the competency of the suggested framework in linking vegetation and thermal factors to the drought indices. However, the estimates of the ensemble technique were more accurate and reliable than those of the LSTM model.

Fig. 8
figure 8

Spatial map of SPI-6 in September 2020 from (a) PERSIANN-CDR data (0.25°), (b) ensemble method (0.05°), (c) LSTM (0.05°); and in June 2021 from (d) PERSIANN-CDR data (0.25°), (e) ensemble method (0.05°), (f) LSTM (0.05°)

As mentioned earlier, 15 sites were employed to further evaluate the proposed downscaling model. Figure 9 displays scatterplots and provides a quantitative comparison of each algorithm, including the in-situ SPI data (reference), the original PERSIANN-CDR-based SPI, and the downscaled SPI. As observed in Fig. 9, the original PERSIANN-CDR SPI values are scattered far from the bisector (1:1) line and tend to show larger deviations for extreme values, whereas the other methods performed better in estimating the extremes. Moreover, considering the results of the test step, the ensemble technique led to the most accurate results.

Fig. 9
figure 9

Scatterplots between in-situ and simulated data for SPI-3 obtained by (a) PERSIANN-CDR, (b) Ensemble, (c) LSTM; for SPI-6 obtained by (d) PERSIANN-CDR, (e) Ensemble, (f) LSTM; and for SPI-12 obtained by (g) PERSIANN-CDR, (h) Ensemble, (i) LSTM

To provide a more detailed comparison, the performance of the ensemble and LSTM methods in generating, for example, SPI-6 compared to the in-situ based values is presented in Table 5.

Table 5 The performance metrics of ensemble and LSTM methods for downscaled SPI-6 for 15 stations

The results presented in Table 5 indicate that the DC scores for 12 of the 15 stations are higher than 0.5, suggesting that the employed methodology simulated the recorded data adequately at 80 percent of the stations. Conversely, DC scores below 0.5 indicate lower accuracy, but they do not necessarily imply a failure of the proposed framework in simulating the task. The CC values calculated between the in-situ and obtained SPIs, except for Miandoab station, vary from 0.4982 to 0.9613. These outcomes confirm a reliable consistency between the SPI indices derived from ground data and the results simulated by the proposed methodology. The RMSE scores range from 0.2772 to 0.9513. The FAR and POD scores examine other aspects of the suggested framework. According to Table 5, the mean POD for the LSTM and ensemble methods is 0.77 and 0.81, respectively, showing that about 80% of the drought conditions could be detected by the suggested methodology. Nevertheless, the POD score is low for the Sardasht station (i.e., 0.25); although this score is unsatisfactory, the suggested framework is still capable of detecting 25% of the drought conditions at this site. Furthermore, the mean FAR is 0.23, indicating that 23% of the detected drought events are false alarms.

Comparing the results of Table 5 and Fig. 6, it is clear that the accuracy of the outputs is higher in areas with a high correlation between SPI and LST, whereas there is no obvious relation between high NDVI correlations in Fig. 6 and the model results in Table 5. Moreover, considering the detailed outputs presented in Table 5, the modeling outcomes for stations 2, 4, 5, 6, 9, 10, and 11 are more accurate than for the others, while the worst outputs were obtained for stations 13 and 15. Thus, it can be concluded that SPI values at stations with relatively high average precipitation, located in the northeast, east, and central regions, could be predicted more accurately than in other regions, whereas SPI values at stations with relatively low average precipitation in the southwest could not be predicted accurately, although the POD value of station 13 still indicates that 60% of the drought events were detected. Taylor diagrams of all SPI indices (SPI-3, SPI-6, and SPI-12) are plotted in Fig. 10 (Taylor 2001). In a Taylor diagram, a smaller distance between the downscaled SPI and the reference data (in-situ records) indicates better modeling performance.

Fig. 10
figure 10

Taylor diagram for (a) SPI-3, (b) SPI-6 and (c) SPI-12 for test phase

As illustrated in Fig. 10, ensemble predictions are closer to the reference data compared to the other methods. Overall, the results indicate that the ensemble model, with a lower RMSE and higher CC and DC, clearly outperformed the other employed methods. Furthermore, when the available training data is limited, deep learning is not a suitable choice for simulation, and the ensemble strategy can be a suitable alternative.

The approach proposed in the current study for downscaling SPI in northwestern Iran offers novel contributions and distinctions compared with existing studies, showcasing advances in AI-driven drought modeling. Similar to Rhee et al. (2010) and Shen et al. (2019), the proposed methodology leverages multiple data sources, incorporating both PERSIANN-CDR and MODIS-derived variables; however, our distinct contribution lies in the integration of these diverse datasets to downscale the SPI using AI models, whereas those studies focused on monitoring agricultural drought. In line with Small et al. (2018), the current study recognizes the significance of the temporal scale, specifically highlighting the superior performance of SPI-6 in capturing soil moisture dynamics and emphasizing the importance of seasonal variations in vegetation response during drought conditions. Furthermore, our study shares similarities with Prodhan et al. (2021) and Shen et al. (2019) in exploring deep learning in a drought context; while those studies monitored drought, our study uniquely downscales SPI, showcasing the efficiency of the ensemble approach for data-limited regions. Acknowledging contextual disparities, our AI-based method overcomes limited-data challenges, improving the SPI simulation and detecting 80% of drought conditions. The integration of upscaled SPI as a predictor stands out, distinguishing our work from studies such as Mohseni et al. (2021) that lack this element. In contrast to these studies, our ensemble post-processing substantially improves accuracy during the test phase. In summary, the current study presents a unique combination of multi-source data integration and upscaled SPI incorporation, contributing to the discourse on AI-driven drought modeling and highlighting the efficacy of ensemble approaches in regions with data constraints.

4 Conclusions

In conclusion, this study aimed to compute high-resolution spatio-temporal SPI indices from satellite-based precipitation data together with thermal and optical parameters, with the ultimate goal of downscaling the SPI indices obtained from PERSIANN-CDR products. The proposed framework performed the downscaling task by exploiting the correlation of SPI with the vegetation index and the soil temperature variables derived from MODIS products. The fundamental assumption of the proposed methodology was that drought events exhibit a meaningful relationship with vegetation conditions and soil temperature. The main challenges in this research were the inconsistency between the resolutions of the different datasets and the transfer of a model trained at a coarser resolution to a finer resolution. The downscaling procedure involved both shallow AI models, specifically SVR, ANFIS, and FFNN, and the LSTM deep learning method; following the downscaling process, an ensemble post-processing technique was applied exclusively to the shallow AI models. To overcome these challenges, the resolution of the high-resolution predictive parameters was first reduced to the resolution of the existing target (0.25°) for the years 2000 to 2015, and the models were created and trained at this resolution. In the next step, high-resolution inputs were fed to the trained models to obtain high-resolution outputs at a spatial resolution of 0.05° for the period 2016 to 2021, which was the goal of the problem at hand.

According to the obtained results, the SPI indices derived from the original PERSIANN-CDR data do not closely match the measured ground data. This discrepancy, attributed to spatial resolution differences and errors in the satellite measurements, prompted strategic solutions. Through a careful examination of the relationship between the drought-dependent variables and SPI, we successfully downscaled SPI, with the modeling of SPI-6 demonstrating superior accuracy compared with SPI-3 and SPI-12. This suggests that, over the 6-month period, drought indices are primarily correlated with soil moisture, a key factor influencing soil temperature and the vegetation index. The integration of MODIS LST and NDVI data, coupled with the upscaled SPI data, provided an effective downscaling method. It is important to note that precipitation data from different satellites may contain systematic errors stemming from variations in the use of infrared and microwave signals or from differences in precipitation modeling, leading to potential inaccuracies; consequently, precipitation data from alternative satellites may yield better outcomes for specific time periods or regions. Despite potential errors due to precipitation timing around satellite overpasses, the nonlinear neural ensemble of shallow AI models significantly improved modeling accuracy, by up to 20% and 25% in the training and test phases, respectively, achieving a mean DC score of 0.67 in the validation phase. Although the deep learning LSTM model showed slightly better performance than the shallow AI models during the training phase, its performance was hampered by the limited available data relative to the larger number of parameters involved in deep learning modeling. As a result, the ensemble technique, particularly for unseen data in the test phase, outperformed the deep learning model. Additionally, the proposed methodology successfully detects approximately 80% of drought conditions. A major strength of the proposed framework lies in the pioneering use of upscaled SPI as a predictor for the AI models, an approach not explored in prior downscaling studies. This feature, coupled with globally available auxiliary inputs, positions the suggested methodology for broad applicability in various case studies and geographic areas. Additionally, the proposed model holds promise for diverse downscaling purposes, extending beyond satellite products to other parameters or global climate model (GCM) data. Despite the reasonable accuracy of the SPI indices at the 0.05° scale compared with ground data, it is crucial to acknowledge the inherent uncertainties in the proposed modeling approach, particularly those related to the reliance on cloud-free conditions for satellite-based data. While this limitation is not highly significant at the monthly modeling scale employed here, it remains an area for consideration in future research.

Looking ahead, future studies could benefit from exploring alternative black box models, such as genetic programming, and validating nonlinear ensemble methods like ANFIS and SVR in place of FFNN. Furthermore, extending the application of the proposed methodology to multi-step ahead modeling could yield valuable insights and enhance its overall applicability.