Introduction

Accurate climate prediction is vital for the planning and management of water resources and the long-term sustainability of hydrological projects1. Global Circulation Models (GCMs) are considered the most dependable numerical models for understanding the likely future climate2,3. GCMs simulate the past climate based on observed concentrations of greenhouse gases (GHGs) and the likely future climate based on assumed future GHG concentrations4. However, uncertainties are involved in both the past and future simulations made by GCMs, even after the significant improvements made in their recent versions5,6. Knowledge of possible uncertainties and their respective solutions has also progressed significantly through the years3,5,7,8.

A watershed-level climate analysis is necessary for planning suitable mitigation and adaptation techniques9,10. GCMs are often incapable of providing the fine-scale simulations required for local-scale studies. To overcome this limitation, several downscaling techniques have been developed and improved11,12. However, these downscaled or raw GCM simulations often carry considerable biases, which are frequently corrected through appropriate bias correction techniques13,14,15,16. Another strategy for reducing the uncertainties associated with GCMs is the appropriate selection of one or more GCMs3,17. Different approaches are followed for selecting the best GCM or an ensemble of GCMs.

Many of the earlier studies used the outputs of a single GCM. Recently, the use of ensembles of several GCMs has become common practice18. The main aim of using multi-model ensembles (MMEs) is to improve the reliability of future projections19. In general, ensembling is done in two ways: (1) calculating the mean or median of a set of GCM outputs, or (2) assigning weights to the GCMs considered. To calculate the weights of the GCMs according to their past performance, multi-criteria decision-making methods or other metrics are often used3,20. Techniques ranging from multiple linear regression (MLR) to complex machine learning (ML) algorithms are used to develop MMEs. ML algorithms are gaining popularity since they are found to be more effective than other ensembling techniques (e.g. Acharya et al.21; Ahmed et al.2; Crawford et al.22; Sachindra et al.23; Wang et al.24). However, most of these studies have evaluated MMEs at monthly, annual or seasonal scales; reliable MMEs of daily climate variables are thus necessary.

All the above approaches basically consider climate data to be stationary and linear. Several ML models have been proposed for climate data downscaling and multi-model ensemble prediction as an alternative that can address the non-linearity in time series data2,23,25,26,27,28,29,30,31,32. The most commonly used models are Support Vector Machines (SVM), Random Forests (RF) and artificial neural networks (ANNs), which can model complex, mostly nonlinear relationships in climate data. Although these approaches can address non-linearity, they have the drawback of assuming that all inputs and outputs are independent of each other, even when dealing with sequential data33. Since climate data has dependency between successive values, it is imperative to consider this dependency. Long Short-Term Memory (LSTM) deep learning models are specifically designed to learn long-term dependencies present in sequential data34. Compared to shallow ANN architectures, deep learning models are more capable of representing highly varying nonlinear functions, such as complex temporal patterns, via high-level temporal abstractions35,36.

The present study aims at the comparison and improvement of MMEs using various ensembling techniques. Special attention is paid to improving the MMEs of daily climate variables, namely precipitation (P), minimum temperature (Tmin) and maximum temperature (Tmax). Furthermore, special emphasis has been given to testing the ability of each ensembling technique in simulating monsoon rainfall. The methodology proposed in the present study is demonstrated on the Netravati basin, a tropical river basin on the southwest coast of India. The paper is organized as follows: the “Study area” and “Data products” sections introduce the study area and datasets considered. The “Methodology” section describes the methodology followed for creating ensembles of GCMs using simple statistical techniques (mean), regression models (i.e., SVM and MLR), ensemble learning models (i.e., extra tree regressor (ETR) and RF), and a deep learning time series model (i.e., multivariate LSTM). The “Results and discussion” section presents the results, while the “Summary and conclusions” section concludes and discusses the scope for future work.

Study area

The Western Ghats of India is one of the global biodiversity hotspots. It is biologically rich and biogeographically unique, with diverse species of plants, mammals, birds and amphibians37. The Netravati, a west-flowing river which drains into the Arabian Sea, is located in the central zone of the Western Ghats. The basin lies between 12°30′N and 13°10′N latitude and 74°50′E and 75°50′E longitude, covering an area of about 3415 km2 (Fig. 1). The Netravati river basin experiences a humid tropical climate with an average annual rainfall of around 4000 mm. The rainfall over the basin is distributed across three seasons, namely the Pre-Monsoon (March–May), Southwest Monsoon (June–September), and Northeast Monsoon (October–December). The Southwest Monsoon is the major contributor to annual rainfall. The average daily temperature is highest during March to May and lowest during December and January. The average minimum and maximum temperatures of the basin are about 19 and 29 °C, respectively. The elevation in the basin varies from 0 to 1884 m above mean sea level (MSL). Geologically, the basin is of Precambrian formations. The upper part of the basin mainly consists of sandy clay loam soil, while the lower parts consist of clay loam soil38. Mountainous dense forests cover the upstream parts of the basin, while agricultural and urban lands dominate the lower parts. The Netravati river is a major source of water for agriculture, industries and civic life in cities in the basin such as Mangaluru, Bantwal, Puttur, Dharmasthala and Ujire39. The basin is socially, economically and culturally important.

Figure 1
figure 1

Location of the selected study area—Netravati basin (Generated using ArcMap 10.3).

Data products

Reference precipitation and temperature dataset

High-resolution gridded precipitation data from the year 1901 at a daily timescale with a spatial resolution of 0.25° longitude × 0.25° latitude has been made available by the India Meteorological Department (IMD). This dataset was created by converting 6995 station-based observations into grid values using Shepard’s interpolation technique40. The dataset can represent spatial rainfall distribution features such as heavy rainfall areas in the orographic regions of the west coast and low rainfall on the leeward side of the Western Ghats40. IMD also provides gridded daily maximum and minimum temperature from the year 1951 at a spatial resolution of 1° longitude × 1° latitude. This dataset was developed from 395 quality-controlled stations using a modified version of Shepard’s angular distance weighting algorithm for interpolation41. These datasets can be accessed through IMD Pune’s website (http://www.imdpune.gov.in/Clim_Pred_LRF_New/Grided_Data_Download.html) and are extensively used for climate-related research and applications over India42,43,44,45. Hence, the daily rainfall and temperature datasets from IMD were used in this study as the reference/observation datasets.

GCM precipitation and temperature dataset

The statistically downscaled and bias-corrected Coupled Model Intercomparison Project, Phase 5 (CMIP5) dataset provided by the National Aeronautics and Space Administration (NASA) Earth Exchange Global Daily Downscaled Projections (NEX-GDDP) was used in this study. This dataset includes historical (1950–2005) and future (2006–2100) climate projections (Representative Concentration Pathway (RCP) scenarios 4.5 and 8.5) of precipitation and temperature at high spatial (0.25°, ~25 km grid scale) and temporal (daily) resolutions from 21 GCMs. The dataset was generated using the Bias-Correction Spatial Disaggregation (BCSD) method46 with the Global Meteorological Forcing Dataset (GMFD) provided by the Terrestrial Hydrology Research Group at Princeton University47. The data can be accessed from the NASA Center for Climate Simulation (NCCS) portal (https://portal.nccs.nasa.gov/datashare/NEXGDDP/). The list of 21 GCMs with their countries of origin is given in Table 1. This dataset has been utilized in many studies around the world42,47,48,49 and is considered the highest-resolution and most accurate climate dataset based on CMIP5 scenarios in India50. The high resolution of NEX-GDDP not only provides information at finer scales but also incorporates local topography effects, which influence local extremes of rainfall events. A study by Jain et al. (2019) evaluated and compared the performance of the NEX-GDDP dataset with CMIP5 and CORDEX datasets in India. They found that the NEX-GDDP data could realistically capture the precipitation and temperature variability in India and recommended it for future climate impact studies.

Table 1 Twenty-one CMIP5 models included in NEX-GDDP dataset.

Further, bias-corrected daily projections of precipitation, maximum temperature, and minimum temperature for South Asia developed by Mishra et al.51 using outputs from 13 CMIP6 GCMs were also used in this study. This dataset is bias-corrected using Empirical Quantile Mapping (EQM) for the historical (1951–2014) and projected (2015–2100) periods. It contains bias-corrected projections for four scenarios (SSP126, SSP245, SSP370, SSP585) and has been technically validated against observations for both means and extremes of precipitation, maximum temperature and minimum temperature51. The spatial resolution of this bias-corrected dataset is 0.25°. The list of these 13 GCMs with their countries of origin is given in Table 2. Hereafter, these GCMs are collectively referred to as the CMIP6 dataset.

Table 2 Thirteen CMIP6 models considered in the study.

Methodology

There are many methods available for ensembling, such as Bayesian approaches and machine learning approaches52,53. Six techniques were used for creating MMEs of P, Tmax and Tmin simulated by the 21 NEX-GDDP and 13 CMIP6 GCMs in the Netravati basin. These methods were the mean, Multiple Linear Regression (MLR), Support Vector Machine (SVM), Extra Tree Regressor (ETR), Random Forest (RF) and Long Short-Term Memory (LSTM). Together they cover the major types of existing machine learning ensembling methods and can be classified as simple statistical techniques (mean), regression models (i.e., SVM and MLR), ensemble learning models (i.e., ETR and RF), and a deep learning time series model (i.e., multivariate LSTM). All these methods try to improve the GCM simulations with respect to the observation dataset in the historical time period. All methods except LSTM were implemented for P, Tmin and Tmax using the scikit-learn library in Python54. The LSTM was implemented using Keras, one of the most popular deep learning libraries in Python55. All calculations were carried out independently for each grid cell, and the results for one representative grid in the basin are shown to simplify the presentation. More about data pre-processing and a brief description of each ensembling method are provided in the following sections.

Data preparation

Each ensembling method was carried out at each grid point, considering P, Tmax and Tmin separately. Bilinear interpolation was performed to bring the GCM values onto the corresponding observation grids in the basin. The ensemble mean was calculated as the mean of P, Tmax and Tmin simulated by all GCMs at each grid. The data was split into training and testing datasets for the validation and comparison of each ensembling method. The input to each ML model was preprocessed using principal component analysis (PCA), described below.
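The regridding step can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the coarse GCM grid, its extents and the field values are invented, and SciPy's `RegularGridInterpolator` with `method="linear"` stands in for the bilinear interpolation described above.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical coarse GCM grid over the basin (0.5 deg spacing, values invented)
gcm_lat = np.arange(12.0, 14.0, 0.5)   # 12.0, 12.5, 13.0, 13.5
gcm_lon = np.arange(74.0, 76.5, 0.5)   # 74.0 ... 76.0
field = np.random.default_rng(3).uniform(0, 50, size=(len(gcm_lat), len(gcm_lon)))

# Bilinear ("linear" in 2-D) interpolation onto a 0.25 deg observation grid
interp = RegularGridInterpolator((gcm_lat, gcm_lon), field, method="linear")
obs_lat, obs_lon = np.meshgrid(np.arange(12.5, 13.25, 0.25),
                               np.arange(74.75, 75.75, 0.25), indexing="ij")
regridded = interp(np.stack([obs_lat.ravel(), obs_lon.ravel()], axis=1))
```

The interpolated values at the observation grid points then replace the raw GCM values as inputs to the ensembling methods.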

Principal component analysis (PCA)

Before applying any ML algorithm, it is vital to retain only the relevant features in the training dataset. This way of reducing the feature space is termed dimensionality reduction or feature selection56. In this study, the features are the various GCMs in the ensemble. Ahmed et al.2 noted that the choice of the number of GCMs used in an MME is a key decision in ensembling. In the present study, PCA was used for this purpose. PCA is part of exploratory data analysis in ML for predictive models57. It makes the model simple and efficient, which in turn reduces its run time. PCA prevents overfitting and converts a group of correlated variables into uncorrelated variables through an orthogonal transformation58. A principal component (PC) is chosen such that it describes most of the available variance59, thus removing the risk of multicollinearity. In this study, the PCs of the 21 GCMs of the NEX-GDDP dataset and the 13 GCMs of the downscaled CMIP6 dataset were calculated separately for each grid. The leading PCs that together contributed more than 95% of the variance were used as input to the ML models.
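The 95% cumulative-variance rule can be implemented with scikit-learn's `PCA`. The function below is a minimal sketch under assumed shapes (rows are days, columns are the GCM values at one grid cell); the function name is our own, not from the study.

```python
import numpy as np
from sklearn.decomposition import PCA

def select_pcs(gcm_matrix, threshold=0.95):
    """Keep the leading principal components whose cumulative explained
    variance first exceeds `threshold`; rows are days, columns are GCMs."""
    pca = PCA().fit(gcm_matrix)
    cum = np.cumsum(pca.explained_variance_ratio_)
    # index of the first PC whose cumulative contribution exceeds the threshold
    n_keep = int(np.searchsorted(cum, threshold, side="right")) + 1
    return pca.transform(gcm_matrix)[:, :n_keep]
```

The retained PC scores, rather than the raw GCM columns, then form the uncorrelated input features for each ML model.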

Machine learning algorithms

MMEs were developed for P, Tmax and Tmin separately at each grid point in the basin using machine learning methods. The observed and simulated values of P, Tmax and Tmin were divided into a calibration period and a validation period. The first 45 years (1951–1995) of overlapping observed and simulated data were used for calibrating the MMEs, and the rest of the data were used for validating them. More details about the methods adopted in the study are given in the following sections.

Multiple linear regression (MLR)

MLR is a common form of regression analysis. It attempts to explain the relationship between one dependent variable and two or more independent variables by fitting a linear equation60. It has been widely used in climate studies for downscaling and impact analysis27,61. In general, MLR can be written mathematically as:

$$y={\beta }_{0}+{\beta }_{1}{x}_{1}+\dots +{\beta }_{n}{x}_{n}+\varepsilon $$
(1)

where y is the dependent variable, \({\mathrm{x}}_{\mathrm{i}}\) are the independent variables, \({\upbeta }_{\mathrm{i}}\) are the parameters, and \(\upvarepsilon \) is the error.

In this study, ordinary linear least squares (LLS) regression, which minimizes the residual sum of squares between the observed values and the ensemble values, was used. This was implemented using the ‘sklearn.linear_model’ module in Python54.
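A minimal sketch of this step with `sklearn.linear_model` is shown below. The data are synthetic stand-ins for the retained PCs and the observed series, with invented coefficients; only the fitting call mirrors the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins: X_train plays the role of the retained PCs for the
# calibration period, y_train the observed variable at the same grid cell.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 3))
y_train = X_train @ np.array([1.5, -0.7, 0.2]) + 0.1 * rng.normal(size=200)

# Ordinary linear least squares fit
mme_mlr = LinearRegression().fit(X_train, y_train)
ensemble = mme_mlr.predict(X_train)   # MLR-based MME estimate
```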

Support vector machine

SVM is based on Vapnik–Chervonenkis (VC) theory and the rule of structural risk minimization62, and is used for various climate change and hydrological applications2,23,25,63. Support Vector Regression (SVR) is the form of SVM that solves nonlinear regression problems by mapping low-dimensional data to a high-dimensional feature space using kernel functions. Mathematically, an SVR model can be represented as follows:

$$y=\sum_{i=1}^{n}\left({\alpha }_{i}-\widehat{{\alpha }_{i}}\right)Kernel\langle {x}_{i},x\rangle +b$$
(2)

where \(\mathrm{Kernel}\langle {\mathrm{x}}_{\mathrm{i}},\mathrm{x}\rangle \) represents the kernel function used; \({\mathrm{\alpha }}_{\mathrm{i}}\,\,\mathrm{ and}\,\,\widehat{{\mathrm{\alpha }}_{\mathrm{i}}}\) denote the Lagrange multipliers; \({\mathrm{x}}_{\mathrm{i}}\) denote the vectors; x represents the independent vector; b represents the bias parameter.

SVR uses a symmetrical loss function, which equally penalizes high and low misestimates. Using Vapnik’s \(\varepsilon \)-insensitive approach, a flexible tube of minimal radius is formed symmetrically around the estimated function, such that absolute errors less than a certain threshold \(\varepsilon \) are ignored both above and below the estimate. In this manner, points outside the tube are penalized, but those within the tube, either above or below the function, receive no penalty. One of the main advantages of SVR is that its computational complexity does not depend on the dimensionality of the input space. Additionally, it has excellent generalization capability, with high prediction accuracy64.

MMEs which used the polynomial kernel function performed better than MMEs using other kernel functions. Hence, the polynomial kernel function was used in this study, similar to Sachindra et al.23 and Ahmed et al.2. The choice of hyperparameters plays a crucial role in machine learning methods. In the current study, Bayesian hyperparameter optimization (BHO) was used to determine the hyperparameters of all machine learning algorithms, implemented with the “hyperopt” package in Python65. The important hyperparameters optimized for SVR are C, the kernel function and epsilon.
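A polynomial-kernel SVR can be sketched with scikit-learn as follows. The data are synthetic, and the hyperparameter values (C, epsilon, degree) are illustrative placeholders for the ones the study tuned with BHO, not the tuned values themselves.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-ins for the PC inputs (X) and observations (y)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.05 * rng.normal(size=300)

# Polynomial kernel; coef0=1 lets the kernel also represent lower-order terms.
# In the study, C, epsilon and the kernel were chosen by Bayesian optimisation.
svr = SVR(kernel="poly", degree=2, coef0=1.0, C=10.0, epsilon=0.01).fit(X, y)
pred = svr.predict(X)
```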

Random forest and extra tree regressor

The RF and ETR models are ensemble machine learning techniques. RF was proposed by Breiman66 based on a combination of statistical learning theory and classification or regression methods. The multiple classification and regression decision trees (CART) included in the algorithm prevent over-fitting and accommodate different types of input variables. The algorithm generates many independent trees and makes a decision based on the characteristics of nonparametric statistical regression and randomness26. A decision tree comprises a root node, sub-nodes, and leaf nodes; a leaf node corresponds to a judgement level, while a sub-node contains a judgement rule. The average of the predicted values from all trees is the output of the algorithm. RF is internally cross-validated using the out-of-bag (OOB) score25. ETR is a variation of RF that adds a further level of randomness to the splitting of the trees67. It is an extension of RF with two major differences: (1) ETR does not apply bootstrapping; each tree is trained on the whole training set, and (2) ETR selects a random cut point instead of a locally optimal cut point. The split which gives the highest score is selected from the set of randomly generated splits. That is, k decision trees are generated and m features are selected for each training sample; at each decision tree, a random cut point is chosen. This helps to avoid overfitting to some extent. More about ETR can be found in Xu et al.25.
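The two tree ensembles can be sketched side by side with scikit-learn. Note that `ExtraTreesRegressor` defaults to `bootstrap=False` (each tree sees the whole training set), matching difference (1), while `RandomForestRegressor` exposes the OOB score mentioned above. The data and settings below are illustrative, not the study's tuned values.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic nonlinear target standing in for a climate variable
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(400, 3))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1]

# RF: bootstrapped trees, internally cross-validated via the OOB score
rf = RandomForestRegressor(n_estimators=200, oob_score=True,
                           random_state=1).fit(X, y)

# ETR: no bootstrapping, random cut points at each split
etr = ExtraTreesRegressor(n_estimators=200, random_state=1).fit(X, y)
```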

Long short-term memory (LSTM) deep learning models

Climate data is time series data involving a sequence of observations over regularly spaced intervals, with trend (upward, downward, or absent), seasonality (periodic fluctuation within a certain period), cyclic variations (rises and falls) and irregular or random components68,69. Meteorological predictions of GCMs can be seen as multivariate sequential data. Hence, an LSTM model, which belongs to the family of deep recurrent neural networks, can be used for creating multi-model ensembles of climate data. The current prediction of an LSTM is influenced by the network activations from previous time steps; this connection develops a memory of previous events in the LSTM network. The architecture of an LSTM cell is given in Fig. 2, where ft, it and ot are the forget, input, and output gates, respectively. Xt, St and Ct are the input, hidden state and cell state at time step t, respectively, while St-1 and Ct-1 are the hidden and cell state at time step t − 1. ⊗, ⊕ and σ denote pointwise multiplication, pointwise addition and sigmoid activation, respectively.

Figure 2
figure 2

Architecture of a LSTM cell.

The network has three inputs: Xt, the input at the current time step; St-1, the output from the previous LSTM unit; and Ct-1, the memory of the previous unit. As for outputs, St is the output of the current network and Ct is the memory of the current unit. The LSTM model has learnable input (it), output (ot) and forget (ft) gates that modulate the flow of information, and it maintains an explicit hidden state that is recursively carried forward and updated as each element of the sequential data passes through the network. The input gate decides what information to add from the present input to the cell state; the forget gate decides what must be removed from the St-1 state, thus keeping only relevant information; and the output gate decides what information to output from the current cell state. More information on LSTM can be found in Bouktif et al.69 and Sagheer and Kotb70. In this study, the LSTM was optimized for learning rate, batch size, units, layers and window size.
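The "window" hyperparameter determines how many past days feed each prediction. A minimal NumPy sketch of this input preparation is shown below; the function and variable names are our own, and the Keras layer stack shown in comments is likewise illustrative, not the study's tuned architecture.

```python
import numpy as np

def make_windows(features, target, window):
    """Stack `window` consecutive days of inputs as one LSTM sample.
    features: (n_days, n_features) array (e.g. the retained PCs),
    target:   (n_days,) observed values. Returns X shaped
    (n_samples, window, n_features), as Keras LSTM layers expect, and y."""
    X, y = [], []
    for t in range(window, len(target)):
        X.append(features[t - window:t])
        y.append(target[t])
    return np.asarray(X), np.asarray(y)

# Illustrative Keras stack (layer sizes are assumptions, not the tuned values):
# model = keras.Sequential([
#     keras.layers.LSTM(64, input_shape=(window, n_features)),
#     keras.layers.Dense(1),
# ])
# model.compile(optimizer="adam", loss="mse")
```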

Performance evaluation

The observed and simulated values of P, Tmax and Tmin used for developing the MMEs were divided into calibration and validation datasets. The first 45 years (1951–1995) of overlapping observed and simulated data were used for calibrating the machine learning models; the rest of the data were used for validating the MMEs developed using each method. Performance evaluation on the validation data at a daily scale was done in terms of the Root-Mean-Square Error (RMSE), also called the Root-Mean-Square Deviation (RMSD), and the correlation coefficient (R). These performance indicators are widely used by many researchers71,72,73. Further, the daily data were aggregated to monthly data for performance evaluation. Scatter plots and Taylor diagrams were used for the evaluation of performance on a monthly basis. The scatter plots, along with the coefficient of determination (R2), provided a useful comparison of observed and MME values, while the Taylor diagrams summarised the performance of each MME in terms of RMSD, R and standard deviation (SD). The procedure was repeated explicitly for the MMEs of precipitation in the monsoon season to study their ability in simulating rainfall magnitudes.
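For completeness, the two daily-scale indicators can be computed as follows; this is a straightforward NumPy sketch of the standard definitions, with function names of our own choosing.

```python
import numpy as np

def rmse(obs, sim):
    """Root-mean-square error between observed and simulated series."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def pearson_r(obs, sim):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(np.asarray(obs, float),
                             np.asarray(sim, float))[0, 1])
```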

Results and discussion

The performance evaluation of each ensembling method for simulating P, Tmin and Tmax was done grid-wise at daily and monthly scales for the NEX-GDDP and CMIP6 datasets separately. The performance evaluation at the daily scale was done using R and RMSE; the results of this evaluation during the validation period are given in Table 3. Further, scatter plots and Taylor diagrams were used to evaluate the performance on a monthly basis. The performance of each ML method was more or less the same at each grid. Hence, the results obtained for one representative grid in the basin are shown and discussed to simplify the presentation.

Table 3 Performance of various MMEs in simulating daily P, Tmin and Tmax.

Performance evaluation of MMEs in the case of precipitation

Performance evaluation of MMEs for daily rainfall

The results of the performance evaluation for daily precipitation given in Table 3 indicate that the ML approaches improved the performance of the MMEs when compared with the mean ensemble approach. However, the improvements are not very significant for any ML method except LSTM. The MME developed using LSTM for the NEX-GDDP dataset significantly improved the R value from 0.52 to 0.74 compared to the mean ensemble technique. Similarly, a reduction in RMSE from 19.03 to 14.59 was achieved by using LSTM for ensembling. Thus, the MMEs made using LSTM perform significantly better for both the NEX-GDDP and CMIP6 datasets. The same is observed in the scatterplots of monthly precipitation given in Figs. 3 and 4 for the NEX-GDDP and CMIP6 datasets, respectively. The R2 value increased from 0.82 to 0.94 and from 0.78 to 0.92 for the LSTM ensemble compared to the mean ensemble for the NEX-GDDP and CMIP6 datasets, respectively. Figures 5 and 6 show the Taylor diagrams of observed and MME-simulated monthly precipitation for the NEX-GDDP and CMIP6 datasets during the validation period. These figures demonstrate that the MME developed using the LSTM method matches the observed data better than the MMEs developed using other methods.

Figure 3
figure 3

Scatter plot of observed and MME simulated monthly precipitation for NEX-GDDP dataset.

Figure 4
figure 4

Scatter plot of observed and MME simulated monthly precipitation for CMIP6 dataset.

Figure 5
figure 5

Taylor diagram of observed and MME simulated monthly precipitation of NEX-GDDP dataset during the validation period.

Figure 6
figure 6

Taylor diagram of observed and MME simulated monthly precipitation of CMIP6 dataset during the validation period.

Performance evaluation of MMEs for monsoon season

The results of the performance evaluation for daily precipitation in the monsoon months (June to September) are given in Table 4. These results indicate that the ML approaches MLR, SVM, ETR and RF show only a slight, insignificant improvement in MME performance compared with the mean ensemble approach for daily precipitation in the monsoon months of the NEX-GDDP and CMIP6 datasets. However, the MME made using LSTM shows significant improvement in daily monsoon rainfall performance in terms of R and RMSE. The MME developed using LSTM for the NEX-GDDP dataset improved the R value from 0.038 to 0.386 compared to the mean ensemble technique; similarly, a reduction in RMSE from 31.49 to 23.35 was achieved. Similar improvements in R (0.031 to 0.357) and RMSE (29.26 to 23.33) were seen for the CMIP6 dataset. Thus, the MMEs of monsoon precipitation made using LSTM perform significantly better for both the NEX-GDDP and CMIP6 datasets. The same is observed in the scatterplots of monthly monsoon precipitation given in Figs. 7 and 8 for the NEX-GDDP and CMIP6 datasets, respectively. The R2 value increased from 0.506 to 0.81 and from 0.366 to 0.788 for the LSTM ensemble compared to the mean ensemble for the NEX-GDDP and CMIP6 datasets, respectively. Figures 9 and 10 show the Taylor diagrams of observed and MME-simulated monthly monsoon precipitation for the NEX-GDDP and CMIP6 datasets during the validation period. These figures demonstrate that the MME of monsoon precipitation developed using the LSTM method matches the observed data better than the MMEs developed using other methods.

Table 4 Performance of various MMEs in simulating monsoon P.
Figure 7
figure 7

Scatter plot of observed and MME simulated monthly monsoon precipitation for NEX-GDDP dataset.

Figure 8
figure 8

Scatter plot of observed and MME simulated monthly monsoon precipitation for CMIP6 dataset.

Figure 9
figure 9

Taylor diagram of observed and MME simulated monthly monsoon precipitation of NEX-GDDP dataset during the validation period.

Figure 10
figure 10

Taylor diagram of observed and MME simulated monthly monsoon precipitation of CMIP6 dataset during the validation period.

Performance evaluation of MMEs in the case of maximum temperature

Table 3 reveals that all ML methods performed significantly better than the ensemble mean approach in simulating daily maximum temperature. The MMEs developed for the NEX-GDDP dataset using MLR, SVM, ETR, RF and LSTM gave R values of 0.838, 0.832, 0.86, 0.872 and 0.868, respectively, while the mean ensemble gave an R value of 0.484. The MMEs made using the LSTM and RF methods performed best, with RF slightly outperforming LSTM; further, MLR slightly outperformed SVM. Figures 11 and 12 show the scatter plot and Taylor diagram of the average monthly maximum temperature simulated by the MMEs developed with the different ensembling approaches against the reference dataset. These figures show that the performance of the MMEs developed by all ensembling methods is more or less the same on a monthly basis. In the case of the CMIP6 dataset, significant improvement is seen in the MMEs developed by ML methods compared to the mean ensemble approach at both daily and monthly scales. The MME developed by the LSTM method performed best, with an R value of 0.869 for daily maximum temperature. The scatterplot (Fig. 13) and Taylor diagram (Fig. 14) show the better performance of all ML methods compared to the mean ensemble approach.

Figure 11
figure 11

Scatter plot of observed and MME simulated average monthly maximum temperature for NEX-GDDP dataset.

Figure 12
figure 12

Taylor diagram of observed and MME simulated average monthly maximum temperature of NEX-GDDP dataset during the validation period.

Figure 13
figure 13

Scatter plot of observed and MME simulated average monthly maximum temperature for CMIP6 dataset.

Figure 14
figure 14

Taylor diagram of observed and MME simulated average monthly maximum temperature of CMIP6 dataset during the validation period.

Performance evaluation of MMEs in the case of minimum temperature

All ML methods performed significantly better than the mean ensembling method for minimum temperature in both the NEX-GDDP and CMIP6 datasets. For the NEX-GDDP dataset, the R value improved from 0.522 to about 0.8 when ML methods were used, and a similar increase in R value was observed for the CMIP6 dataset. For the average monthly minimum temperature, however, no significant improvement was observed, as can be seen in the scatter plots and Taylor diagrams. Figures 15 and 16 show the scatter plots of the different MMEs of average monthly minimum temperature against the reference dataset for the NEX-GDDP and CMIP6 datasets, respectively, while Figs. 17 and 18 show the corresponding Taylor diagrams. LSTM remained the best-performing model for minimum temperature, with R values of 0.872 and 0.801 for the NEX-GDDP and CMIP6 datasets, respectively.

Figure 15
figure 15

Scatter plot of observed and MME simulated average monthly minimum temperature for NEX-GDDP dataset.

Figure 16
figure 16

Scatter plot of observed and MME simulated average monthly minimum temperature for CMIP6 dataset.

Figure 17
figure 17

Taylor diagram of observed and MME simulated average monthly minimum temperature of NEX-GDDP dataset during the validation period.

Figure 18
figure 18

Taylor diagram of observed and MME simulated average monthly minimum temperature of CMIP6 dataset during the validation period.

Inter-comparisons of performance of different MMEs

Different approaches, namely the mean, regression models (i.e., SVM and MLR), ensemble learning models (i.e., ETR and RF), and a deep learning time series model (i.e., multivariate LSTM), were used to create MMEs of the 21 NEX-GDDP and 13 CMIP6 model outputs for P, Tmin and Tmax. In the case of precipitation, LSTM significantly outperformed all the other MME approaches, with R values of 0.74 and 0.73 for the NEX-GDDP and CMIP6 datasets, respectively. The performance of all the other MME approaches was more or less the same, with R values in the range of 0.52 to 0.58. Similarly, the LSTM MMEs gave R2 values of 0.94 and 0.92 for monthly precipitation in the NEX-GDDP and CMIP6 datasets, respectively. The study done explicitly for monsoon rainfall shows that all methods except LSTM failed to give good MME performance. This shows that the LSTM method is, to an extent, successful in predicting rainfall magnitude in the monsoon months. Hence, this study reveals the superiority of LSTM over the other methods in ensembling monsoon precipitation.

In the case of temperature, however, all ML approaches performed equally well compared to the mean ensembling approach, improving the R value from about 0.5 to the range of 0.8. For the maximum temperature of the NEX-GDDP dataset, the MME made with RF (R = 0.872) slightly outperformed LSTM (R = 0.868). In all other cases, all ML methods performed equally well, with LSTM showing a slightly better performance. The same pattern was observed at all grid points in the basin. The ensemble learning models RF and ETR also performed well for maximum and minimum temperature, outperforming MLR and SVM in all cases. Hence, the LSTM, RF and ETR algorithms are recommended for creating MMEs in the basin. In general, all ML methods performed better than the mean ensemble approach, consistent with other studies such as that of Ahmed et al.2.

Summary and conclusions

In this study, an attempt has been made to evaluate the performance of MMEs developed using six ensembling methods. These ensembling techniques include a simple statistical technique (mean), regression models (i.e., SVM and MLR), ensemble learning models (i.e., ETR and RF), and a deep learning time series model (i.e., LSTM). The performance evaluation of each ensembling technique was done in order to find the best-performing MMEs of the 21 NEX-GDDP and 13 CMIP6 GCMs in the Netravati basin. The comparison shows that, in the case of precipitation, an LSTM model for climate model ensemble prediction performs significantly better than the benchmark models, including the other machine learning techniques and the mean ensembling technique. It gave coefficients of determination of 0.94 and 0.92 for the NEX-GDDP and CMIP6 monthly precipitation datasets, respectively. The LSTM MME could also simulate the monsoon rainfall magnitude satisfactorily compared to all the other methods. Hence, LSTM deep learning models are seen to be an attractive approach for climate data prediction. This could be because of their capability of learning long-term dependencies in observed data, which leads to better prediction results that outperform several alternative machine learning and statistical approaches. In the case of temperature, all the ML methods showed equally good performance, with RF and LSTM performing consistently well in all cases. Coefficients of determination of around 0.9 and 0.8 were observed for the MMEs developed using the RF and LSTM techniques for monthly average maximum and minimum temperature, respectively. Hence, based on this study, RF and LSTM are recommended for the creation of MMEs in the basin. In general, all ML approaches performed better than the mean ensemble approach. However, this study limits its scope to machine learning methods and does not analyse their performance on extreme values.
Hence, a future study which analyses their effectiveness on extreme values may be done. Further, other multi-model combination approaches, such as triple collocation and Bayesian approaches, may be explored in future studies53,74. Thus, based on the present study, the following specific conclusions may be drawn:

  1.

    The inter-comparison of MMEs developed using mean, SVM, MLR, ETR, RF and LSTM show that ML-based MMEs performed better than the mean ensemble approach. Therefore, ML methods are recommended for the creation of MMEs of climate data in future studies.

  2.

    A time series model like LSTM could be a good choice for creation of MMEs. Hence, more studies which explore the usage of time series/sequential models for creation of MMEs may be done in the future.