Proposing an ensemble machine learning based drought vulnerability index using M5P, dagging, random sub-space and rotation forest models

Drought is one of the major barriers to the socio-economic development of a region. Drought vulnerability modelling is important for managing and reducing the impact of drought. This study proposes, for the first time, the use of ensemble machine learning techniques, i.e. M5P, M5P-Dagging, M5P-Random SubSpace (RSS) and M5P-Rotation Forest (RTF), to produce drought vulnerability maps (DVMs) for the state of Odisha in India. A total of 248 drought-prone villages (samples) and 53 drought vulnerability indicators (DVIs) under exposure (28), sensitivity (15) and adaptive capacity (10) were used to produce the DVMs. Of the total samples, 70% were used for training the models and 30% for validating them. Finally, the DVMs were validated using the area under the curve (AUC) of receiver operating characteristics, precision, mean absolute error, root mean square error, K-index, and the Friedman and Wilcoxon rank tests. Nearly 37.9% of the study region exhibited high to very high vulnerability to drought. All the models were capable of modelling drought vulnerability. According to the Friedman and Wilcoxon rank tests, significant differences occurred among the outputs of the ensemble models. The accuracy of the M5P base classifier improved after ensembling with the RSS and RTF meta classifiers but decreased with Dagging. According to the validation statistics, the M5P-RTF model achieved the highest accuracy in modelling drought vulnerability, with an AUC of 0.901. The prepared model would help planners and decision-makers to formulate strategies for reducing the damage caused by drought.


Introduction
Drought vulnerability assessment is becoming an important topic of research due to the increased interest in developing evaluation approaches and adaptation strategies that are associated with climate change. Frequent droughts and tremendous heat events, according to the Intergovernmental Panel on Climate Change (IPCC, 2001), might increase drought vulnerability and effects on socio-economic condition of the region. Drought, an inherent feature of the earth's climate, frequently emerges without notice and with no discernible borders, resulting in yearly agriculture damage that costs billions of dollars (Ortiz-Bobea 2021). Drought impacts nearly all climatic zones (Ajayi & Ilori 2020) and over half of the world every year (Feng et al. 2019a, b). According to Rosselló et al. (2020), drought ranks first among natural catastrophes in terms of the number of people directly impacted across the world. Drought happens in high-and low-rainfall locations and in all climatic zones; its repercussions are crucial and costly, impacting more people globally than other natural catastrophes (Mishra et al. 2021). Drought has varying consequences based on the degree of progress and coping skills of regions and nations; it affects the economy and livelihood, as well as the trade of public and private companies in developing nations.
Major catastrophes, such as cyclones, floods and droughts, affect the economy of India. Drought occurs on a regular basis in various parts of the country and is a geographically widespread hazard globally. According to a recent study by the National Centre for Atmospheric Research (NCAR), the percentage of the Earth's area affected by severe drought increased fourfold from 1970 to the 2000s. According to Baarsch et al. (2020), the percentage of worldwide land experiencing very dry conditions increased from 10 to 15% in the 1970s to over 30% in 2002. Drought vulnerability is receiving more attention because of its significant economic consequences and social risk concerns (Karimi et al. 2018; Haile et al. 2020). Drought vulnerability depends on the dependability of the water resources system and its capacity to adapt efficiently, and it may increase with population growth, with differing water requirements, and with the intensification of conflicting demands for water resources (Thomas et al. 2016; Hoque et al. 2021). In disaster management, drought vulnerability assessment is the latest paradigm that helps decision makers plan for drought, assign resources and mitigate damage. The degree of drought exposure and the area's capacity to cope with damage determine a region's vulnerability to drought, and underdeveloped places with weaker coping capacities and greater exposure are at the greatest danger. The vulnerability of any area is determined by exposure, sensitivity and adaptive capacity (Intergovernmental Panel on Climate Change [IPCC] 2001). Exposure refers to how much and how long a population is exposed to a disaster, sensitivity refers to how much a system is affected by a disaster, and adaptive capacity refers to how well a system can withstand and absorb a disaster (Ebi et al. 2006).
Vulnerability includes both temporal and spatial aspects, because it evolves in response to technological changes, human behaviour, activities and legislation (Bevacqua et al. 2018;Turner 2021). The vulnerability manifests itself in certain places at various times, indicating that it is context-, place-and time-specific, as well as particular to the viewpoint of the individual judging it (Germain and Knight 2021). In the fields of geography, agricultural science, water resources, climate science and social science, numerous studies on assessment of vulnerability have been conducted (Mukherjee et al. 2019). Some scholars have established quantitative methods of drought vulnerability (Han and Zhang 2018;Hurlbert and Gupta 2019;Sharafi et al. 2020), whereas others have tried to conceptualise the character of vulnerability from different theoretical perspectives (Kaufman et al. 2020).
Creating techniques for measuring vulnerability is challenging because of the complexity of the systems under investigation and because vulnerability is not a directly observable phenomenon (Alodah 2019). A rigorous assessment of vulnerability is critical, because it may help build focused drought prevention and response plans. Murthy (2020) devised a statistical weighting methodology to measure agricultural drought potential and found that non-irrigated farmland and rangeland on sandy soils seem to be the places most sensitive to agricultural drought. The security graph idea was utilised by West et al. (2019) to measure the susceptibility of India to dryness under extreme climatic stress; environmental pressure was obtained from indicators of water stress generated by the WaterGAP model. The World Meteorological Organization (WMO) suggested the use of the standardised precipitation index (SPI) (Kobrossi et al. 2021) as the worldwide index to assess drought. Nyairo et al. (2020) used dynamic system analysis methods to explore the vulnerability of the food system to climate change and land degradation in the pastoral Kalahari area of Botswana, with emphasis on drought susceptibility. Huynh and Stringer (2018) offered an outline of the susceptibility of coupled social-ecological systems to climate change. Guo et al. (2021) evaluated drought vulnerability across three drought intensities, namely, very-high, extremely-high and critical regions, for wheat farmers in the western part of Iran. They concluded that drought vulnerability influences socio-cultural and economic conditions. Banihashemi et al. (2021) established and assessed drought susceptibility parameters among wheat farmers in Mashhad County, Iran, including social, economic and technical variables. Paul et al. (2020) used multi-attribute strategic planning techniques based on a set of criteria, performance measures and various indicators to build a novel and effective technique for the spatial evaluation of drought susceptibility in the Zayandeh-Rood river basin in Iran. Thomas et al. (2016) created a drought vulnerability indicator (DVI) that captures multiple dimensions of drought susceptibility, assessed at the pan-African level and based on economic capability, renewable natural capital, societal and human resources, technology and infrastructure.
The present research was conducted in the state of Odisha, India, where drought is a serious concern (Senapati 2019; Saha et al. 2021). A large part of Odisha is frequently affected by drought (Sahu and Nandi 2016), and farmers in the state suffer substantial agricultural losses as a result. Numerous studies have predicted or forecast drought conditions throughout the world using different machine learning techniques. Methods such as artificial neural networks (Nabipour et al. 2020), random forest (Dikshit et al. 2020) and support vector machines (Zahraie and Nasseri 2011) provided good results for hydrological and meteorological drought forecasting. When Dikshit et al. (2021a) employed a variety of deep learning algorithms to forecast drought, the results were superior to those produced using conventional statistical methods. Some research has been done on assessing drought vulnerability using conventional statistical techniques. Recently, Hoque et al. (2020) and Saha et al. (2021) used the analytical hierarchical process (AHP) and Fuzzy-AHP methods to analyse drought vulnerability. Saha et al. (2021a) used ANN and the Bagging method to assess the drought vulnerability of Karnataka state. A variety of ensemble machine learning algorithms (MLAs) have been utilised to model susceptibility to hazards other than drought, including landslides (Antronico et al. 2020), gully erosion (Roy et al. 2021), land subsidence (Tien Bui et al. 2018), deforestation (Saha et al. 2021b) and floods (Nhu et al. 2020a, b, c). These algorithms achieved better accuracy than conventional statistical and semi-quantitative methods (Dikshit et al. 2020). Combining various models is advised in order to decrease model errors and improve forecast accuracy.
Although each combination demonstrated greater prediction performance with high dependability, new ensemble-based techniques still need to be investigated and used. In various geo-hazards, ensemble models including bagging, boosting and stacking have frequently been used. Bagging builds a number of prediction functions and then combines them into a single predictive function; it helps to stabilise unstable estimation or classification schemes and may be used to increase the accuracy of learning algorithms. Boosting can correct the mistakes made by weak learners. Its core concept is to train several weak classifiers on the same training set and then combine them to create a stronger final classifier (Freund and Schapire 1997). Stacking is an ensemble learning approach in which a learning algorithm is trained to combine the predictions of several different learning models, and it often performs better than any single trained model (Wolpert 1992). It is important to keep in mind that models with theoretically ideal performance may not always produce superior outcomes in practical situations. Furthermore, it is uncertain whether different ML techniques can be applied or generalised across varied geographic settings. Most of the MLAs were used for forecasting drought conditions rather than drought vulnerability. Dagging, Random Subspace (RSS) and Rotation Forest (RTF) provided good results in landslide, flood and deforestation susceptibility modelling (Pham et al. 2017; Wang et al. 2020; Saha et al. 2021b) but have not been applied to drought vulnerability modelling. The main research questions are as follows: (1) can ensemble MLAs provide better results than the conventional statistical and semi-quantitative methods in drought vulnerability assessment? (2) can meta classifiers increase the accuracy of the base classifier (M5P) in drought vulnerability modelling?
Therefore, the main novelty of our work is the assessment of the drought vulnerability of Odisha using ensemble models, namely M5P, M5P-Dagging, M5P-RSS and M5P-RTF. To conduct this research, a wide variety of exposure, sensitivity and adaptive capacity factors (53 in total) were used to account for all possible drought scenarios. The criteria were chosen according to past research, and the evaluation was conducted using well-known MLAs. In earlier studies the predictive capacity of such models was considerable, so the use of MLAs in itself is not new. What is distinctive here is the use of these ensemble machine learning models to evaluate drought vulnerability while taking into account as many as 53 parameters. The primary goal of the current study is to develop a relative drought vulnerability map for Odisha using a variety of meteorological, hydrological and socioeconomic factors.

Study area
Odisha is situated on India's eastern coast, between latitudes 17.31°N and 22.31°N and longitudes 81.31°E and 87.29°E (Fig. 1). The region's coastline stretches for 485 km along the Bay of Bengal. High temperatures, high humidity, medium to high rainfall and brief, mild winters define Odisha's tropical climate (Santos et al. 2021). The state's average rainfall is 1451.2 mm, most of which falls between June and September. Drought, floods and cyclones occur every year, with various degrees of intensity (Pidathala et al. 2018). The central plateaus, Utkal plains, central mountains, western hills, and highlands and floodplains are the five primary physiographic zones of Odisha. Brahmani, Mahanadi and Baitarani are the state's main rivers, all of which flow into the Bay of Bengal. Odisha has a population of 42 million, according to the 2011 Indian Census (3.47% of the total population of India). Drought is not unknown to the people of Odisha, as it happens every year in several parts of the state, with variable degrees of intensity and scale. The first severe drought in the state occurred in 1866 (Saha et al. 2021a). Since then, the state has experienced several moderate-to-severe drought events. This indicates that droughts of moderate-to-high intensity occur every 8 years or so in Odisha. Droughts of exceptional intensity occurred in Odisha in 1866, 1919, 1965, 2000-2001, 2015 (www.business-standard.com/article/current-affairs) and 2019 (Swain et al. 2021a, b). The 2000-2001 drought was the most devastating. Droughts hit nine districts throughout Odisha in 2018, most of which were in western Odisha, where farmers faced crop losses of 33% and higher owing to moisture stress. Drought hit at least 25 of the 30 districts in 2015, owing to an irregular southwest monsoon. Drought has long been a problem in the state's western and south-central regions. In terms of strategic management and planning, a drought risk analysis of this state is critical.

Methodology
The drought vulnerability maps were produced using four well-known MLAs in the following four phases (Fig. 2).
Step-1: Selection of vulnerability parameters: The selection of the drought vulnerability factors (DVFs) was based on a review of the literature and the state of the environment. The variables were then separated into three sub-categories, namely exposure, sensitivity and adaptive capacity.
Step-2: Development of the thematic data layers: To forecast spatial drought vulnerability, data on drought-impacted villages and DVFs were gathered.
Step-3: DV maps preparation: With the aid of training datasets, ensemble machine learning models (M5P, M5P-Dagging, M5P-RTF and M5P-RSS) were used to create drought vulnerability maps. Individual indices for exposure, sensitivity and adaptive capacity were computed from the selected parameters. Finally, the three indices were combined to create the drought vulnerability map.
Step-4: Validation and comparison of the models: To verify the models used, the AUC-ROC, precision, K-index, root mean square error (RMSE) and mean absolute error (MAE) were computed. The Friedman and Wilcoxon rank tests were also used to examine the differences in the models' prediction performance.
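As an illustration of the model comparison in Step-4, the Friedman statistic can be sketched in a few lines. This is a minimal sketch under stated assumptions: the table layout (one row per validation block, one column per model), the function names and the sample scores are all hypothetical, and the statistic is the textbook form without a tie correction, not necessarily the exact procedure used in this study.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for table[block][model], without tie correction."""
    n = len(table)        # number of blocks
    k = len(table[0])     # number of models being compared
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical scores of three models on three validation blocks.
scores = [[0.81, 0.85, 0.90],
          [0.79, 0.84, 0.88],
          [0.80, 0.86, 0.91]]
print(friedman_statistic(scores))
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; the Wilcoxon signed-rank test is then typically applied pairwise to locate which models differ.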

Constructing spatial data layers
Identification of current drought-impacted regions is critical for drought vulnerability mapping. A total of 248 drought-affected locations (points) were collected from the Odisha state record and were divided in a 70:30 ratio into training and testing points (Fig. 2). Similarly, the same number of non-drought-affected locations were selected randomly for training and testing the applied models. Both drought and non-drought locations were identified after consulting the data from the Disaster Management Department of Odisha and local dwellers. Various factor layers were created in the ArcGIS environment based on the available data (Table 1). Various socioeconomic and meteorological characteristics (a total of 53 parameters) were chosen based on prior literature and the geo-environmental condition of the study area (Table 2). These parameters were then divided into three categories, namely, adaptive capacity, sensitivity and exposure. After taking into account all of the layers, data integration and analysis were completed. For producing the final drought vulnerability maps, the resolution of the DEM (30 m × 30 m) was taken as the base resolution, and factors with a finer or coarser resolution than the DEM were resampled to it. Afterwards, the M5P model was utilised as the base classifier, and three more models were used to create the novel ensemble models: M5P-Dagging, M5P-RSS and M5P-RTF.
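The 70:30 partition of the sample points can be sketched as follows. This is only an illustration: the point identifiers, the seeds and the helper name `split_points` are assumptions, and the study does not state how the random split was implemented.

```python
import random


def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a list of sample points and split it into training and testing parts."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers for the 248 drought and 248 non-drought locations.
drought = [("drought", i) for i in range(248)]
non_drought = [("non_drought", i) for i in range(248)]

train_d, test_d = split_points(drought)
train_n, test_n = split_points(non_drought, seed=7)

# Splitting each class separately keeps both sets balanced.
training = train_d + train_n
testing = test_d + test_n
print(len(training), len(testing))
```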

Exposure indicators
These variables indicated the degree to which a region or its people are subjected to drought (Table 3). Severe drought frequency (%) of 3, 6, 12 and 24 months, frequency of extreme drought (%) of 3, 6, 12 and 24 months, magnitude of drought of 3, 6, 12 and 24 months, mean intensity of drought of 3, 6, 12 and 24 months, return period of extreme drought of 3, 6, 12 and 24 months, return period of severe drought of 3, 6, 12 and 24 months, critical rainfall, yearly average rainfall, rainfall trend and vegetation condition index (VCI) are some of the indicators used to measure exposure (Fig. 3). The Standard Precipitation Index was used to calculate severity and frequency of extreme drought, magnitude of drought, mean intensity of drought and extreme and severe drought return periods. Where the rainfall was high, the drought effect was low. When the requirement for rainfall was higher, the chances of drought were high in that particular area. By the frequency, magnitude, intensity and return period of severe and extreme drought, the occurrence of a drought scenario in a specific area can be determined.

Standard Precipitation Index (SPI)
To gauge the severity of the drought, the SPI value was calculated (McKee et al. 1993). The WMO authorised this index.
Only rainfall data are required to calculate the SPI. The index can be calculated for a variety of time periods, including 1, 3, 6, 12, 24 and 48 months (Mehr and Vaheddoost 2020). Specific SPIs were estimated from the precipitation data with the equation given by McKee et al. (1993) in the R environment.
Only the classifications of severe and extreme drought were utilised to determine the severity of drought incidence in Odisha (Table 3). The droughts were categorised based on the SPI drought classification values given by McKee et al. (1993). The SPI periods utilised in the study were 3, 6, 12 and 24 months, since they effectively depict long-term rainfall conditions and may be used to assess reservoir levels, stream flows and groundwater levels. The frequency of severe and extreme droughts is computed as a percentage using Eq. 1:

$$DF_{i,100} = \frac{N_i}{i \cdot n} \times 100 \tag{1}$$

where $DF_{i,100}$ represents the drought frequency for time scale $i$ in 100 years; $N_i$ denotes the number of drought months for time scale $i$ within the $n$-year set; and $i$ is the time scale (3, 6, 12 and 24 months).

Cumulative water stress throughout the period of drought is represented by the drought magnitude (MD), and the average of this cumulative water scarcity over the drought period is denoted by the mean drought intensity (MID) (Dayal et al. 2018). MD and MID can be computed using Eqs. 2 and 3:

$$MD_j = -\sum_{i=1}^{m} SPI_{ij} \tag{2}$$

$$MID_j = \frac{MD_j}{m} \tag{3}$$

where $SPI_{ij}$ denotes the SPI value in month $i$ of the drought period for a specific time scale $j$ (3, 6, 12 and 24 months), and $m$ denotes the number of months in the drought period.

The California approach (Wable et al. 2019) was applied to calculate the recurrence interval (RI) of severe and extreme drought. All SPI values were sorted in ascending order, and a rank was assigned to each value. The recurrence interval is then given by Eq. 4:

$$RI = \frac{n}{p} \tag{4}$$

where $n$ is the number of occurrences, and $p$ is the event's rank.
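The SPI-derived exposure statistics described above can be sketched in plain Python. The sketch assumes that a drought month is one whose SPI is at or below a chosen threshold (-1.5 here, a hypothetical default for the severe class), that a drought event is supplied as a list of consecutive monthly SPI values, and that the SPI series itself has already been computed; the function names and sample values are illustrative only.

```python
def drought_frequency(spi, time_scale, n_years, threshold=-1.5):
    """Drought months divided by (time scale x years), as a percentage."""
    n_drought_months = sum(1 for value in spi if value <= threshold)
    return n_drought_months / (time_scale * n_years) * 100.0


def drought_magnitude(spi_event):
    """Negative sum of SPI over the months of one drought event."""
    return -sum(spi_event)


def mean_drought_intensity(spi_event):
    """Drought magnitude divided by the number of months in the event."""
    return drought_magnitude(spi_event) / len(spi_event)


def recurrence_intervals(spi):
    """California method: sort ascending, rank from 1, RI = n / rank."""
    n = len(spi)
    return [(value, n / rank) for rank, value in enumerate(sorted(spi), start=1)]


# Hypothetical 2-year monthly SPI-3 series (24 values), for illustration only.
spi3 = [0.3, -0.2, -1.6, -2.1, -1.8, 0.1, 0.8, 1.2, 0.4, -0.5, -0.9, 0.0,
        0.6, 1.1, 0.2, -0.3, -0.7, 0.5, 0.9, 0.3, -0.1, 0.4, 0.7, 0.2]
event = spi3[2:5]  # the three consecutive months below the threshold

print(drought_frequency(spi3, time_scale=3, n_years=2))
print(drought_magnitude(event), mean_drought_intensity(event))
print(recurrence_intervals(spi3)[0])  # driest month gets rank 1, hence the longest RI
```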
Linear regression was used to assess the rainfall trend (Panda and Sahu 2019). Critical or threshold rainfall is defined as the total amount of rain below which drought will occur (Ghosh, 2019). Eq. 5 was used to compute the critical rainfall:

$$\text{Critical rainfall} = \bar{X} + SPI \times \sigma \tag{5}$$

where $\sigma$ indicates the standard deviation of rainfall, $\bar{X}$ indicates the average rainfall, and the SPI value is set to -1.5, which is taken as the threshold for critical rainfall.

Equation 6 was used to calculate the vegetation condition index (VCI):

$$VCI_i = \frac{NDVI_i - NDVI_{min}}{NDVI_{max} - NDVI_{min}} \times 100 \tag{6}$$

where $NDVI_i$ represents the NDVI value of a single pixel in month $i$, and $NDVI_{max}$ and $NDVI_{min}$ are the maximum and minimum NDVI values for the same pixel, respectively. The state of vegetation is generally expressed as a percentage. VCI values around 0% indicate severe dryness, and VCI values from 50 to 100% represent typical vegetative conditions. Drought conditions were indicated by a VCI of less than 50%, whereas severe drought was indicated by a VCI of 0% to 35%. The VCI has been used to analyse the geographical features of drought, but prior research has seldom assessed its efficacy in detecting and distinguishing water-stressed farmland from other vegetation.

Table 3 (recovered fragment): SPI less than -2: Extreme Drought
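The rainfall trend, critical rainfall and VCI calculations can be illustrated with a short sketch. It assumes an ordinary least-squares slope against the year index for the trend, the sample standard deviation for the rainfall spread, and hypothetical input values; none of these choices is confirmed by the study.

```python
import statistics


def linear_trend(values):
    """Least-squares slope of values against their index (e.g. mm per year)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx


def critical_rainfall(rainfall, spi_threshold=-1.5):
    """Mean rainfall plus the SPI threshold times the standard deviation."""
    return statistics.mean(rainfall) + spi_threshold * statistics.stdev(rainfall)


def vci(ndvi, ndvi_min, ndvi_max):
    """Vegetation condition index of one pixel, as a percentage."""
    return (ndvi - ndvi_min) / (ndvi_max - ndvi_min) * 100.0


# Hypothetical annual rainfall totals (mm) and NDVI values, for illustration only.
annual_rainfall = [1400, 1500, 1600]
print(linear_trend(annual_rainfall))       # slope in mm per year
print(critical_rainfall(annual_rainfall))  # rainfall below which drought is assumed
print(vci(0.5, ndvi_min=0.2, ndvi_max=0.8))
```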

Sensitivity indicators
Cropping intensity, irrigation intensity, net sown area, population density, small and marginal farmers, total water demand, total water use, temperature trend, bare soil index (BSI), slope, evaporation, aridity index, soil texture, altitude and annual wet day frequency were the sensitivity factors (Fig. 4). These factors affect how exposure manifests; for example, when a region's population grows, more individuals are exposed to drought, increasing the region's vulnerability (Naumann et al. 2019). Evaporation and temperature are two of the most important meteorological variables that influence land cover, ecological sustainability and water balance (Ekwueme and Agunwamba 2020). Both biophysical and economic variables are included in this class of sensitivity. Temperature is a key determinant of drought sensitivity (Mega et al. 2019). As the temperature rises, drought becomes more prevalent (Shi et al. 2021). In this study, the temperature trend was calculated using linear regression. Total water usage was linked to drought vulnerability, since water demands in areas with high water use are likely to be greater during dry years. Therefore, regions with considerably higher water usage would experience more dry seasons than regions with lower water use. Cropping intensity is defined as the ratio of gross cropped area to net sown area, expressed as a percentage. According to Potopová et al. (2021), crop intensity and drought intensity increase together. In areas where there is a lack of water, a drought might be disastrous (Bakht et al. 2020). Domestic water use, agricultural water use, animal water use and industrial water requirements are all added together to get the overall water demand. Drought will affect marginal and small farmers more severely if their numbers are high (Venancio et al. 2020). Small farmers will be more affected by drought conditions than large farmers, since most of them utilise low-tech production techniques and have limited agricultural area.
The net planted area can also be used to estimate drought vulnerability. Drought will have a greater impact on agriculture with the growth of the net sown area and vice versa. As the aridity index value increases, the dryness will increase, but dryness will decrease as the aridity index value decreases (Wu et al. 2021).
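The index relates the moisture deficit of an area to its atmospheric water demand. One form that is consistent with the PET and AET terms used in this section is Thornthwaite's aridity index, given here as an assumption about the expression rather than as a confirmed reproduction of the equation used in this study; the symbol $I_a$ is introduced only for illustration:

```latex
I_a = \frac{PET - AET}{PET} \times 100
```

Under this form, $I_a$ is 0 when actual evapotranspiration meets the full potential demand and approaches 100 as the deficit grows, which matches the statement that dryness increases with the index value.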
Here, PET stands for potential evapotranspiration and AET stands for actual evapotranspiration. The drought effect is higher in areas with high aridity (Yves et al. 2020). When the soil is open or bare, the area will be highly affected by drought, whereas an area covered by vegetation will be relatively protected from the effect of drought (Sankaran 2019). The bareness of the soil is calculated by using the BSI. It is a numerical indicator that normalises the blue, red, near-infrared and short-wave-infrared spectral bands of a multispectral image. The spatial pattern of soil bareness is obtained by combining those bands in a defined fashion (Eq. 8), and it is utilised as the sensitivity data layer (Fig. 2g) for this investigation:

$$BSI = \frac{(b_{SWIR} + b_R) - (b_{NIR} + b_B)}{(b_{SWIR} + b_R) + (b_{NIR} + b_B)} \tag{8}$$

where $b_{SWIR}$ stands for short-wave infrared band brightness, $b_R$ for red band brightness, $b_{NIR}$ for near-infrared band brightness and $b_B$ for blue band brightness.

A coarser soil texture cannot hold moisture in the top layer. Consequently, rain water will penetrate deep into the soil, and the top layer of the soil will remain dry. Such an area will be more affected by drought events. In contrast, if the soil texture is fine, the soil will hold water in the top layer, and the top layer will remain wet. Such an area will be less affected by drought events. Annual wet day frequency is the most significant indicator of drought. The likelihood of a drought decreases as the number of wet days in a year increases, and vice versa.

Adaptive capacity
Vulnerability is typically defined as the inability of a population group to respond adequately to a certain widespread stressor (Cianconi et al. 2020). Social vulnerability therefore refers to a population group's vulnerability as a result of a deficit of resources with which to react to a hazard (Antronico et al. 2020). Consequently, environmental indicators of adaptive capacity, such as distance from river, distance from wetland, net water availability, NDWI and NDVI, and socio-economic factors, such as distance from dam, net irrigated area, income index, health index and education index, were included in this study. These factors demonstrate the population's capacity to respond to a drought situation (Fig. 5). Higher availability of irrigation facilities reduces drought vulnerability, because it fulfils the water demand during the dry season. Drought cannot readily affect regions where vast volumes of water are available at all times of the year (Mafi-Gholami et al. 2020). If there is insufficient rain in a particular location with high water availability, then the region may make up the shortfall by using available water sources. Areas with no water supplies other than rainfall will experience drought. The health, wealth and level of education of the local people strongly influence drought susceptibility. If a region's health, education and economic possibilities improve, drought vulnerability will diminish (Phelps and Kelly 2019). The farther a place is from dams, rivers and wetlands, the more susceptible it is to drought conditions, because the supply of water declines with increasing distance from the dam, the river and the wetlands (Moser et al. 2019). A dense plant cover protects an area from dryness (Helcoski et al. 2020), whereas bare terrain is more susceptible to drought. If the NDVI value is closer to 1, then a significant quantity of plant cover is available. If the value is closer to -1, then a very small amount of vegetation cover is available. The NDVI is calculated (Eq. 9) using Landsat 8 Operational Land Imager (OLI) imagery, as follows:

$$NDVI = \frac{NIR - Red}{NIR + Red} \tag{9}$$

NDWI is an important parameter for identifying drought situations, because the moisture or water condition can be represented by this index (Marusig et al. 2020). McFeeters developed the NDWI approach in 1996 to represent water bodies based on the fact that water has the highest absorption, and vegetation has the highest reflectance, in the near infrared (Ety et al. 2021).

Fig. 3 Factors of exposure: Drought magnitude: A 3 month, B 6 month, C 12 month, D 24 month; Mean drought intensity: E 3 month, F 6 month, G 12 month, H 24 month; Return period of extreme drought: I 3 month, J 6 month, K 12 month, L 24 month; Return period of severe drought: M 3 month, N 6 month, O 12 month, P 24 month; Extreme drought frequency: Q 3 month, R 6 month, S 12 month, T 24 month; Severe drought frequency: U 3 month, V 6 month, W 12 month, X 24 month; Y Vegetation condition index (VCI); Z Rainfall trend; A1 Average annual rainfall; A2 Critical rainfall
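The two spectral indices used as adaptive capacity layers are both normalised differences and can be sketched together. The McFeeters NDWI is taken here in its usual green and near-infrared form; the function names and the reflectance values are hypothetical, and per-pixel raster handling is left out.

```python
def normalised_difference(a, b):
    """(a - b) / (a + b), the common form behind NDVI and NDWI."""
    return (a - b) / (a + b)


def ndvi(nir, red):
    """Vegetation index: high when near-infrared reflectance exceeds red."""
    return normalised_difference(nir, red)


def ndwi(green, nir):
    """McFeeters water index: high when green reflectance exceeds near-infrared."""
    return normalised_difference(green, nir)


# Hypothetical surface reflectance values for a vegetated pixel and a water pixel.
vegetated = {"green": 0.08, "red": 0.05, "nir": 0.45}
water = {"green": 0.06, "red": 0.04, "nir": 0.02}

for name, px in (("vegetated", vegetated), ("water", water)):
    print(name, ndvi(px["nir"], px["red"]), ndwi(px["green"], px["nir"]))
```

With Landsat 8 OLI, the green, red and near-infrared inputs correspond to bands 3, 4 and 5, respectively.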

Dagging
Another ensemble machine learning (EML) approach used to generate meta-learners is the dagging algorithm, often called disjoint aggregating (Zounemat-Kermani et al. 2021). Dagging is comparable to bagging, but the sampling technique is different. Rather than using bootstrap sampling, this technique uses disjoint sampling to draw randomised training subsets from the original dataset without replacement (Barzegar et al. 2021). Finally, the different output models derived from the disjoint samples are combined by majority vote.
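The disjoint sampling and majority vote can be sketched as follows. This is a minimal illustration, not the implementation used in the study: a one-nearest-neighbour rule stands in for the M5P base learner, the folds are drawn by a plain random shuffle, and all names and sample points are hypothetical.

```python
import random
from collections import Counter


def nearest_neighbour(train):
    """Stand-in base learner: label a point by its nearest training sample."""
    def predict(x):
        best = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
        return best[1]
    return predict


def dagging_fit(samples, n_folds, make_learner, seed=0):
    """Train one base learner on each of n_folds disjoint parts of the data."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    # Slicing with a stride gives disjoint folds that together cover every sample.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    return [make_learner(fold) for fold in folds]


def dagging_predict(learners, x):
    """Combine the base learners by majority vote."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples: class 0 near the origin, class 1 near (10, 10).
samples = [((i * 0.1, i * 0.1), 0) for i in range(10)]
samples += [((10 + i * 0.1, 10 + i * 0.1), 1) for i in range(10)]

learners = dagging_fit(samples, n_folds=3, make_learner=nearest_neighbour)
print(dagging_predict(learners, (0.2, 0.3)), dagging_predict(learners, (10.1, 9.9)))
```

Because each learner sees only its own fold, the folds must stay large enough for the base learner to be useful, which limits the number of folds on small datasets such as the one used here.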

Random Subspace (RSS)
RSS was introduced in 1998 to enhance the reliability of weak classifiers and the performance of individual classifiers. RSS is a common random selection approach, in which the subset of features used to train each classifier is chosen at random (Costello and Lee 2020). RSS has been utilised in a variety of disciplines, including economics and medicine, but very seldom in groundwater potential determination.
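The random feature selection at the heart of RSS can be sketched as below. As an assumption for illustration, a one-nearest-neighbour rule restricted to the chosen features stands in for the M5P base learner, predictions are averaged, and the names and sample data are hypothetical.

```python
import random


def nearest_on_features(train, features):
    """Stand-in base learner that measures distance only on the given features."""
    def predict(x):
        best = min(train, key=lambda s: sum((s[0][f] - x[f]) ** 2 for f in features))
        return best[1]
    return predict


def random_subspace_fit(samples, n_learners, subspace_size, seed=0):
    """Train each learner on all samples but on a random subset of the features."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    learners = []
    for _ in range(n_learners):
        features = rng.sample(range(n_features), subspace_size)
        learners.append(nearest_on_features(samples, features))
    return learners


def random_subspace_predict(learners, x):
    """Average the predictions of the individual learners."""
    return sum(learner(x) for learner in learners) / len(learners)


# Hypothetical four-feature samples with target 0.0 (low) or 1.0 (high vulnerability).
samples = [((i * 0.1,) * 4, 0.0) for i in range(5)]
samples += [((10 + i * 0.1,) * 4, 1.0) for i in range(5)]

learners = random_subspace_fit(samples, n_learners=5, subspace_size=2)
print(random_subspace_predict(learners, (0.1, 0.2, 0.1, 0.3)))
print(random_subspace_predict(learners, (9.8, 10.1, 10.0, 10.2)))
```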

Rotation Forest (RTF)
RTF is an EML classifier that aims to produce a diverse set of accurate classifiers within the ensemble (Subasi et al. 2019). The method uses bootstrap sampling to train decision trees, as in bagging, and builds each classifier in the ensemble on features transformed by a feature extraction approach such as principal component analysis (PCA). In Rotation Forest, feature extraction is applied for every one of the base classifiers to sustain diversity (Geran Malek et al. 2021). Each base classifier is trained on the whole dataset in the rotated feature space to optimise individual accuracy. The model parameters are randomly divided into subgroups, and every subset is subjected to feature extraction (Table 4). The final result is obtained by combining the outputs of the trees.

Ensemble of models
Nguyen et al. (2020) defined ensemble modelling as a strategy for combining the outputs of many models into a single embedded model to improve prediction capacity. This technique has attracted the attention of academics working on specific machine learning and data mining models. Seni and Elder (2010) described the creation of ensemble models by using the weighted integration of single models. However, the method used for calculating these weights is complicated. In this study, M5P was used as the base classifier, and RTF, RSS and Dagging were used as meta classifiers. The ensemble meta classifiers were used to optimise the input data from the training dataset before creating the drought vulnerability models. The M5P base classifier was then applied to the optimised input data to identify classes for the spatial prediction of drought vulnerability. Finally, models for drought vulnerability were developed using the machine learning ensemble frameworks, and maps of drought vulnerability were created from the results of the trained models.

Receiver operating characteristics (ROC)
Validation is a crucial step in determining the scientific relevance of a completed study (Hribar et al. 2018; Hong et al. 2018). The area under the ROC curve (AUC) was used to examine the predictive ability of the models. The ROC is a graphical representation of a model's performance in a diagnostic assessment (Heldt et al. 2021). The rate of correct predictions (drought-impacted zones) and the rate of erroneous predictions (non-drought-impacted zones) are plotted on the Y and X axes, respectively. The AUC ranges from 0 to 1, and a value nearer to 1 indicates a better ability of the model to predict (McCune et al. 2020).
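As an illustration of how an AUC value can be obtained, the sketch below uses the rank interpretation of the AUC: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The function name and the toy labels and scores are assumptions for illustration, not data from this study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))

# Toy example: 1 = drought-impacted sample, 0 = non-impacted sample.
example_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

In the toy example three of the four positive-negative pairs are ordered correctly, so the value is 0.75.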

Precision
When the same data are assessed repeatedly, precision describes how close the estimated values are to one another; it represents the degree of random deviation in the estimation process (Reynolds et al. 2021). For a classification it is computed as
$$\text{Precision} = \frac{TP}{TP + FP}$$
where TP = true positive value and FP = false positive value.
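A minimal sketch of the precision calculation from the TP and FP counts defined above follows. The helper names and the toy labels are assumptions for illustration only.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)

def precision_from_labels(y_true, y_pred, positive=1):
    """Count TP and FP from paired observed and predicted class labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return precision(tp, fp)

# Toy example: three positive predictions, two of them correct.
toy_precision = precision_from_labels([1, 0, 1, 0, 1], [1, 1, 1, 0, 0])
```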

Root mean square error
The employed models' predictive power was assessed using the ROC and precision, while the prediction error was assessed using the RMSE and MAE. The RMSE was determined by comparing the data observed in the field with the values predicted by the model (Willmott et al. 2005). The following formula was used to determine the RMSE value:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right)^2}$$
where $O_i$ and $S_i$ are the anticipated and observed values, respectively, and $n$ is the total number of data points.

MAE
MAE is similar to RMSE in that it is calculated from the differences between model-predicted values and field-observed values, but it averages the absolute differences and therefore does not consider their direction (Willmott et al. 2005) (Eq. 13), as follows:
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|S_i - O_i\right|$$
where $O_i$ and $S_i$ are the anticipated and observed values, respectively, and $n$ is the total number of data points.
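The two error measures can be computed side by side, as in the sketch below. The function names and the toy values are assumptions for illustration; both functions take paired lists of observed and model-predicted values.

```python
import math

def rmse(observed, predicted):
    """Root mean square error: square root of the mean squared difference."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error: mean of the absolute differences."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n

# Toy example: one large error dominates RMSE more than MAE.
toy_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
toy_mae = mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because the differences are squared before averaging, RMSE is never smaller than MAE on the same data and penalises large individual errors more heavily.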

K-index
This coefficient is used to determine how accurate a categorisation is. Kappa is a measure of how well a categorisation performs compared with randomly assigning values (Silva and Eugenio Naranjo 2020). The kappa coefficient can take any value between -1 and 1. A value of 0 indicates that the categorisation is no better than a random one (Ghada et al. 2019). A negative value indicates that the categorisation is less accurate than a random one, and a positive value means that it is superior to random classification.
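For a two-class problem, the kappa coefficient can be computed from the four cells of a confusion matrix, as sketched below. The function name and the counts are illustrative assumptions; the formula is the usual (observed agreement minus chance agreement) divided by (1 minus chance agreement).

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - chance) / (1.0 - chance)

# Toy example: 85 of 100 samples classified correctly.
toy_kappa = cohen_kappa(tp=40, fp=5, fn=10, tn=45)
```

In the toy example the observed agreement is 0.85 and the chance agreement is 0.5, which gives a kappa of 0.7.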

Friedman and Wilcoxon rank test
In 1937, Friedman created a non-parametric test to identify substantial differences among the applied models (Miraki et al. 2019). If the P-values are < 0.05, then the alternative hypothesis is accepted, thereby implying that a substantial difference exists among the predictions of the models (Chung et al. 2019).
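The Friedman statistic itself is straightforward to compute: the models are ranked within each sample, the ranks are summed per model, and the sums enter a chi-square-type statistic. The sketch below assumes a table with one row per sample and one column per model, uses average ranks for ties without a tie correction, and leaves out the P-value lookup; the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def friedman_statistic(table):
    """Friedman chi-square for n rows (samples) and k columns (models)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

The statistic is compared with a chi-square distribution with k - 1 degrees of freedom; pairwise follow-up comparisons, such as the Wilcoxon test named in the heading, are not covered here.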

Exposure mapping using ensemble models
The four MLAs were used to create the exposure maps (Fig. 6). Using the natural-break method, each model's predicted drought exposure was divided into five exposure classes (Hoque et al. 2021). The north-western and western parts of the research region were highly exposed to drought. The high and very-high exposure zones for the M5P model encompassed 10.80% and 24.71% of the state, respectively. Outside the north-western and western parts of the state, very-low, low and moderate exposure zones accounted for 19.52%, 34.15% and 10.80% of the total area, respectively (Table 5). For the M5P-Dagging model, very-low and low exposure areas covered 47.79% of the total area in the eastern to southern part. The high and very-high exposure zones comprised 33.09% of the total land on the western and north-western sides. The north-eastern and central parts

Sensitivity mapping using ensemble models
The same four MLAs were used to create the sensitivity maps (Fig. 7). Using the natural-break method, each model's predicted drought sensitivity was divided into five sensitivity classes, as for exposure. In the north-western and western parts of the study area, the high and very-high sensitivity zones for the M5P model encompassed 17.16% and 21.10% of the state, respectively. Very-low, low and moderate sensitivity zones accounted for 23.20%, 23.85% and 14.67% of the total area, respectively (Table 6). For the M5P-Dagging model, very-low and low sensitivity areas covered 52.49% of the total area in the eastern to southern part. The high and very-high sensitivity zones comprised 32.40% of the total land on the western and north-western sides. The north-eastern and central parts were occupied by a moderate sensitivity zone (15.08%) (Table 6). In the north-western and western parts of the study region, the high and very-high sensitivity zones for the M5P-RSS model comprised 15.57% and 23.42% of the study area, respectively. Very-low, low and moderate sensitivity zones accounted for 23.38%, 18.04% and 19.56% of the total area, respectively (Table 6). For the M5P-RTF model, very-low and low sensitivity areas covered 76,422.36 sq. km of the total area in the eastern to southern part. The high and very-high sensitivity zones comprised 54,697.55 sq. km of the total land on the western and north-western sides. The entire central part was occupied by a moderate sensitivity zone (24,587.08 sq. km) (Fig. 4). Owing to the high evaporation rate, the increasing trend of temperature and high aridity, the sensitivity to drought was very high in the western part of the state, as shown by all the models.
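The natural-break classification applied to the model outputs can be illustrated with a small exact routine that chooses class boundaries minimising the within-class sum of squared deviations (a Fisher-Jenks style criterion). This is a sketch under assumptions: the GIS implementation used in the study may differ in detail, the function names are hypothetical, and the toy values are not study data.

```python
import bisect

def natural_breaks(values, n_classes):
    """Upper bound of each class, minimising within-class squared deviation."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    s1, s2 = [0.0], [0.0]                      # prefix sums of x and x^2
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] from its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j values into c classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j], cut[c][j] = cand, i
    bounds, j = [], n
    for c in range(n_classes, 0, -1):          # walk the cuts backwards
        bounds.append(data[j - 1])
        j = cut[c][j]
    return bounds[::-1]

def classify(value, bounds):
    """Index of the class (0 = lowest) whose upper bound first covers value."""
    return min(bisect.bisect_left(bounds, value), len(bounds) - 1)

toy_bounds = natural_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)
```

With five classes, the class indices 0 to 4 would correspond to the very-low, low, moderate, high and very-high zones.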

Adaptive capability mapping by using ensemble models
To produce the adaptive capability maps, the M5P, M5P-Dagging, M5P-RSS and M5P-RTF models were also employed (Fig. 8). Using the natural-break method, each model's predicted drought adaptive capability was divided into five adaptive capability classes, as in the exposure and sensitivity maps. In the north-western and western parts of the region, the high and very-high adaptive capability classes for the M5P model covered 12.03% and 3.73% of the study area, respectively. Outside the north-western and western parts of the research territory, very-low, low and moderate adaptive capability zones covered 10.94%, 36.85% and 36.42% of the total area, respectively (Table 7). For the M5P-Dagging model, very-low and low adaptive capability areas covered 58.77% of the total area from the eastern to southern part. The high and very-high adaptive capability zones comprised 23.6% of the total land. The north-eastern and central parts were occupied by a moderate adaptive capability zone (17.60%). For the M5P-RSS model, the high and very-high adaptive capability zones made up 25.37% and 7.62% of the research area, respectively. Zones with very-low, low and moderate adaptive capability made up 11.79%, 38.83% and 16.37% of the total area, respectively (Table 7). For the M5P-RTF model, very-low and low adaptive capability

Vulnerability mapping by using ensemble models
The four ensemble models generated three indices, namely exposure, sensitivity and adaptive capacity, which were then utilised to create maps of drought vulnerability (Fig. 9). The predicted drought vulnerability was categorised into five classes, namely, very-high, high, moderate, low and very-low, by the natural-break method. For the M5P model, very-high to high vulnerability zones were found in the western and north-western parts of the study region. These two zones occupied 35.6% of the study area. Very-low to low vulnerability areas were found along the north-eastern to southern part of the map. These two zones occupied 51.96% of the area. A moderate vulnerability zone was found in the central part (12.41%), between the very-high to high and the very-low to low vulnerability zones (Table 8). For the M5P-Dagging model, very-low and low vulnerability areas covered 71,879.59 sq. km of the total area in the eastern to southern part. The high and very-high vulnerability zones comprised 62,400.14 sq. km of the total land in the western and north-western part. The entire central part was occupied by a moderate vulnerability zone (21,427.26 sq. km) (Table 8). In the north-western and western parts of the study region, the high and very-high vulnerability zones for the M5P-RSS model comprised 14.22% and 23.68% of the study area, respectively. Outside the north-western and western parts of the research region, very-low, low and moderate vulnerability zones accounted for 25.53%, 25.81% and 10.74% of the total area, respectively (Table 8). For the M5P-RTF model, very-low and low vulnerability areas covered 75,345.34 sq. km of the total area from the eastern to southern part. The high and very-high vulnerability zones occupied 59,030.78 sq. km of the total land on the western and north-western sides. The middle part of the map was under the zone of moderate vulnerability (21,330.87 sq. km).
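Because some models are reported in percentages and others in square kilometres, a small conversion helps when comparing them. The sketch below converts the M5P-RTF vulnerability areas quoted above into shares of their combined total; the function name and the grouping into three zones are assumptions for illustration.

```python
def area_shares(areas_sq_km):
    """Convert zone areas to percentages of their combined total."""
    total = sum(areas_sq_km.values())
    return {zone: 100.0 * area / total for zone, area in areas_sq_km.items()}

# M5P-RTF vulnerability areas as quoted in the text (sq. km).
m5p_rtf_areas = {
    "very-low to low": 75345.34,
    "moderate": 21330.87,
    "high to very-high": 59030.78,
}
m5p_rtf_shares = area_shares(m5p_rtf_areas)
```

For these figures the high to very-high share comes to roughly 37.9% of the mapped area.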

Correlation between drought vulnerability and exposure, sensitivity and adaptive capacity indices
The vulnerability of an area is significantly associated with the exposure, sensitivity and adaptive capability of that area (Singha et al. 2020). To assess the relationship between drought vulnerability and the three indices of exposure, sensitivity and adaptive capability, the Pearson correlation method was used. The exposure and sensitivity indices were very strongly related to the overall drought vulnerability. The exposure index correlation values (r) ranged from 0.545 to 0.990, and the sensitivity index r values ranged from 0.556 to 0.984 (Table 9). The association of the adaptive capability index with overall drought vulnerability was not as strong as those of the exposure or sensitivity indices. Table 9 shows that the r values for all pairs were lower for M5P-Dagging than for the other applied models.
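The Pearson correlation between an index and the overall vulnerability can be computed as in the following sketch. The function name and the toy values are assumptions for illustration and are not taken from Table 9.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Toy example: index values and vulnerability values at three locations.
toy_r = pearson_r([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```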

Comparison and validation between different models of drought vulnerability
Validation is a crucial step in reaching a conclusion on the predictive ability of the deployed models (Karstoft et al. 2021). The ROC, RMSE, MAE, precision, K-index, Friedman test and Wilcoxon test were used to compare and validate the models. For the training data, the ROC curves showed that the AUCs of the M5P-RTF, M5P-RSS, M5P and M5P-Dagging models were 0.873, 0.855, 0.842 and 0.805, respectively (Table 10). For the testing data, the AUCs of the M5P-RTF, M5P-RSS, M5P and M5P-Dagging models were 0.901, 0.874, 0.859 and 0.852, respectively (Fig. 10). As the AUC values ranged from 0.805 to 0.901, all four models had good prediction capabilities for generating the drought vulnerability map. However, the M5P-RTF model was the best fit for producing a map of drought vulnerability, because it had the highest AUC value in both training and testing (

Discussion
Drought is one of the most disastrous climatic hazards. It harms livelihoods in areas where most people depend on agricultural activities. Many parts of India are affected by drought almost every year, and Odisha is one of the states frequently affected by this climatic hazard. Numerous scientists have conducted drought prevention and policy development research for India (Javadinejad et al. 2020; Sam et al. 2020). Studies conducted in Odisha to identify the areas prone to drought indicated that drought is a serious concern (Senapati 2019; Saha et al. 2021a). In most of the previous studies, researchers focused more on drought forecasting than on drought vulnerability. However, for formulating scientific strategies to reduce the effect of drought, an assessment of drought vulnerability that considers the various indicators of exposure, sensitivity and adaptive capacity is essential. In the present study, drought vulnerability was evaluated using three different groups of parameters, namely, exposure, sensitivity and adaptive capacity. A wide variety of exposure, sensitivity and adaptive capacity factors were used to account for all possible drought scenarios. SPI-based drought estimation is well established and has been utilised for the assessment of drought (Malik et al. 2020; Mehr et al. 2020). The criteria were chosen on the basis of past studies and the geo-environmental conditions of the study area. The evaluation was conducted using well-known MLAs. Three ensemble models, namely, M5P-Dagging, M5P-RSS and M5P-RTF, were used alongside the M5P base model for assessing drought vulnerability. Exposure, sensitivity and adaptive capacity maps were prepared by applying each of the four models (M5P, M5P-Dagging, M5P-RSS and M5P-RTF) to the corresponding indicators of exposure, sensitivity and adaptive capacity.
A number of researchers have used the M5P, Dagging, RTF and RSS MLAs in numerous disciplines, such as predicting stream flow (Onyari & Ilunga 2013), flood hazard (Nhu et al. 2020b), landslide (Antronico et al. 2020), assessment of deforestation susceptibility (Saha et al. 2021b), gully erosion (Nhu et al. 2020a; Roy et al. 2021) and drought hazard (Buthelezi 2020). In each case, the forecasting capacity of the model output was highly appreciable, and the use of machine learning techniques has therefore become very frequent. However, these ensemble machine learning models have not previously been used for assessing drought vulnerability.
In other applications, such as landslides, floods, water level prediction and evaluation of spring potential, the M5P model has provided good output (Nhu et al. 2020a, b, c). In the present research, all the applied models provided excellent results, as in the aforementioned fields. Compared with conventional semi-quantitative methods such as AHP (Hoque et al. 2020) and Fuzzy-AHP (Saha et al. 2021), the applied ensemble models provided better results in the present study. Among the applied models, M5P-RTF achieved the highest accuracy (90.10%). In the ensemble models, M5P was used as the base classifier, and RSS, Dagging and RTF were used as meta classifiers. RSS and RTF increased the accuracy of the M5P base classifier. In the M5P-RTF ensemble model, the accuracy of the M5P base classifier increased by nearly 5%, as rotation forest is a tree-based ensemble that transforms the subsets of attributes before building each tree. When all of the attributes are real-valued, rotation forest generally outperforms the most common alternatives. In the present study, M5P-RTF gave better results than ensemble models applied in other fields, such as AdaBoost, Bagging, Dagging, MultiBoost, RTF and RSS with ANN as the base classifier in landslide modelling (Pham et al. 2017; Wang et al. 2020), and stacking and blending with KNN, RF and SVM as base classifiers in flood susceptibility modelling (Yao et al. 2022). The findings of this work should aid in the formulation of drought relief measures in Odisha and serve as a reference for future drought research, particularly in terms of strategy creation. Four maps (exposure, sensitivity, adaptive capability and drought vulnerability) were created, each of which was categorised into five groups using the natural-break technique (Hoque et al. 2020).
As per the categorisation, the state's north-western regions are extremely vulnerable to drought, owing to rising temperatures, declining rainfall, a high frequency of extreme drought, limited water supply, high evaporation and high water demand. The eastern half of the state is rather drought-resistant due to its location along the Bay of Bengal's coastal strip. It is important to raise awareness of the threat of drought in this region to protect the people who depend on agricultural activity. There is good scope for extending this work. New factors and indices can be added in the future to allow for a more exact depiction of drought vulnerability. Deep learning methods are now used in many fields and could in future be used for preparing drought vulnerability maps. Drought-prediction techniques are improving over time, and academics can keep up with them and improve them with their own contributions (Madrigal et al. 2018). With more ground-level data, the models' accuracy can be improved even further. Such studies will help agricultural planners develop appropriate solutions in drought-prone areas.

Advantages of present work
Nevertheless, a full assessment of drought in the region requires the characterisation of drought, which permits activities such as early drought warning and drought risk mitigation; this assessment would improve preparedness and disaster planning (Tsesmelis et al. 2019; Garca [...] agricultural operations and crop management strategies using the resources at hand. The created maps of drought vulnerability can help crop insurers to target regions that are vulnerable to drought and promote farmer involvement in crop insurance. Before moving further with drought mitigation measures, the results of our study can also be complemented by the IMD's drought forecast maps. Additionally, the bivariate choropleth maps show whether the hydro-climatic features, the socio-economic situation, or both are the dominant component driving the overall drought vulnerability at any location. As a result, those involved in creating drought mitigation plans may want to consider strengthening drought-prone areas by improving their socio-economic status and hydro-climatic adaptation. The socio-economic situation can be improved through greater irrigation capacity, better groundwater conservation and other measures. Conversely, focusing on crop selection and on agricultural management techniques suited to the hydro-climatic conditions can aid hydro-climatic adaptation. In principle, the framework for assessing vulnerability to disasters described in this study may be employed to update drought vulnerability information with real-time data so that drought mitigation techniques can be adjusted.

Conclusion
In this research, ensemble techniques were adopted and implemented for drought vulnerability evaluation in Odisha, which has never been done before in drought vulnerability assessments. Because the nature of drought changes, the approaches for measuring drought susceptibility across place and time need to change as well. On the basis of prior studies, fifty-three drought-determining variables were layered in a GIS environment to create the drought vulnerability maps. The M5P, M5P-Dagging, M5P-RSS and M5P-RTF models were applied to these factor layers and showed very-high drought vulnerability over 25.74%, 25.12%, 23.68% and 25.56% of the region, respectively. If effective drought management strategies are not adopted, this area will remain susceptible in the near future. We could not conduct field observations or document the perception of local people because of funding and time constraints. However, this limitation has no bearing on the correctness of the models used. The state's northern areas are particularly vulnerable to drought due to increasing temperatures, diminishing rainfall, a high frequency of extreme drought, restricted water supply, high evaporation and high water demand. The eastern half of the state is drought-resistant due to its location along the Bay of Bengal's coastal strip. The main limitations of the study include the lack of data on crop types or agricultural practices in the drought-vulnerable region. Future research may focus on the impact of drought vulnerability on agriculture and socio-economic conditions, as well as on how drought vulnerability evolves under changing climate and socio-economic conditions. The government should implement a variety of programmes and increase public awareness. Odisha could construct machinery and irrigation infrastructure to aid in water conservation.
The findings of this study can be utilised by local government and private organisations in the Indian state of Odisha for water resource management, environmental preservation and land use planning.

Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. No fund was received for this work.

Declarations
Competing interests The authors declare no competing interests.
Conflict of Interest None.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.