Introduction

Indonesia is a country that is vulnerable to climate change (International Monetary Fund. Asia and Pacific Dept, 2021; Mariah, 2010) because of its high population density and strong dependence on natural resources (Ministry of Foreign Affairs of the Netherlands, 2018). Jakarta, the current capital of Indonesia on the island of Java, is one of the most populous cities in the world and one of the cities most threatened by climate change due to environmental instability (Ward et al., 2013). Air quality in Jakarta has declined because population density has increased vehicle emissions, and because of the large number of coal-fired power plants, forest fires, and a history of open burning of rubbish (Edwards et al., 2020; Syuhada et al., 2023). These complex problems have prompted the government to move the national capital (Ibu Kota Negara, IKN) to East Kalimantan Province; the new capital is estimated to be home to 1.9 million people starting in 2024 and to cover a land area of 256,000 hectares (Gokkon, 2023).

The new capital is a special regional government unit at the provincial level whose territory is the seat of the National Capital (Republic of Indonesia, 2022). The relocation of the capital is based on several considerations, including integrating and creating economic and political growth in the center of the country (Kementerian PPN/Bappenas, 2021). It will also reduce the density of Java, which has been central to the country’s economic growth. The leading performance indicators for achieving the IKN development goals are building a city that is harmonious with nature, easily accessible and connected, circular, resilient, safe, affordable, and technology-friendly, and that provides economic opportunities for everyone (Berawi, 2022). However, the construction of IKN has also raised concerns about massive environmental damage on the island of Kalimantan, one of the most important biodiversity centers and carbon sinks in the world (Teo et al., 2020; Van de Vuurst & Escobar, 2020). IKN infrastructure development could disrupt biodiversity, reduce forest carbon stocks, reduce water availability, and cause problems with pollution, waste, noise, and drainage systems (Rahayu, 2022). A more in-depth study of the climate and environmental conditions of IKN is needed to support the strategic program for developing IKN under the Forest City and Climate Plus Villages (Proklim+) concepts (Tursilowati et al., 2023). The relocation of the capital will intensify land use, including building construction, changes in vegetation cover, and daily human activities, and will affect future water availability (Denryanto & Virgianto, 2021; Kementerian PPN/Bappenas, 2021).

This research predicts surface water availability in the IKN area as part of future water resource management efforts. Identifying water availability plays an important role in land management, as it is the primary basis for resource planning and for managing water and land ecosystems for sustainable development (Jin & Ge, 2021). As cities grow, water needs increase. The worrying prospect is a growing population with fewer resources and a higher demand for water (Santos et al., 2019). Several studies report that water availability is influenced by climatic, geological, and hydrogeological conditions (Baba & Gündüz, 2017; Chen et al., 2020; Sabathier et al., 2021). In this research, water availability is identified using remote sensing technology. Surface water mapping is important in many remote sensing studies, including estimating water availability, analyzing its changes, and forecasting floods and droughts (R. et al., 2022).

Remote sensing can be used to detect changes in the water content of soil or vegetation using indices based on the near-infrared (NIR, 0.7–1.3 μm) and shortwave infrared (SWIR) bands (Le et al., 2023). This research uses three multi-band spectral indices to estimate surface water: NDVI, NDWI, and LSWI. Santos et al. (2019) used NDVI and land surface temperature (LST) to monitor future water needs. NDVI is correlated with changes in groundwater levels, vegetation growth conditions, and soil moisture (Aguilar et al., 2012). Ashok et al. (2021) monitored dynamic wetland changes using Landsat imagery based on NDVI and the Normalized Difference Water Index (NDWI). NDWI is a remote sensing index that estimates the water content of vegetation and can monitor slight changes in the water content of water bodies (JRC European Commission, 2011). Other indices with high sensitivity to water content use the NIR and SWIR bands, as formulated in the Land Surface Water Index (LSWI). LSWI can represent the water content of vegetation (Bajgain, 2015) and can be used to assess drought (Otkin et al., 2021; Christian et al., 2022). Acharya et al. (2018) explored methods for surface water extraction with remote sensing spectral indices in combination with geographic information system (GIS) technology. For spatial modeling involving many parameters, including the analysis of water availability, some researchers have relied on integrating remote sensing with artificial neural networks (ANN) (Imran et al., 2023; Mulualem & Liou, 2020).

Artificial neural networks are efficient and have achieved reasonably good accuracy in predicting the risk of natural disasters such as floods (Jahangir et al., 2019) and erosion (Arif et al., 2017; Gholami & Booij, 2022), in monitoring and predicting water quality (Assegid et al., 2012; Lu et al., 2020), and in drought forecasting (Dikshit et al., 2022; Ozan Evkaya & Sevinç Kurnaz, 2021). Given the limited data and the need to extend modeling with remote sensing data, this research uses an ANN to predict surface water availability in the new capital city. ANNs can make reliable predictions even with limited and noisy data (Arif & Danoedoro, 2014; Arif & Nursantosa, 2021). Several researchers have used ANNs for monitoring and prediction with spatial data and found them more accurate than other prediction methods (Assegid et al., 2012; Kizil et al., 2012; Santos et al., 2019). Bhavya et al. (2023) showed that an ANN can identify nonlinear relationships between inputs and outputs, making it superior to other methods in predicting spatial and temporal variations in groundwater quality.

The findings of this paper can serve as a baseline for predicting future water availability in other regions using the same parameters. The main objective of this research is to obtain the best model for predicting surface water availability using a backpropagation ANN.

Methodology

Study Area

This study predicts water availability in the new capital city and its surroundings (Fig. 1). IKN is located at 0°58′23″S, 116°42′31″E, in North Penajam Paser Regency and Kutai Kartanegara Regency, East Kalimantan Province, in the geographic center of Indonesia. The research area also covers the buffer areas around IKN, namely North Penajam Paser Regency, Kutai Kartanegara Regency, Balikpapan City, and Samarinda City (Fig. 1).

Fig. 1 Study area

Data and Image Processing

The image data used in this research are Sentinel-2 surface reflectance (Level-2A) images from 2022, accessed and processed in Google Earth Engine (GEE). All images from that year with a maximum cloud cover of 30% were selected, cloud-masked, and averaged into a single composite using the following script.

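    // Cloud-masking function for Sentinel-2 surface reflectance images,
    // based on the QA60 band (bit 10 = opaque clouds, bit 11 = cirrus).
    function maskS2clouds(image) {
      var qa = image.select('QA60');
      var cloudBitMask = 1 << 10;
      var cirrusBitMask = 1 << 11;
      // Keep pixels where both the cloud flag and the cirrus flag are zero.
      var mask = qa.bitwiseAnd(cloudBitMask).eq(0)
          .and(qa.bitwiseAnd(cirrusBitMask).eq(0));
      // Scale digital numbers to surface reflectance (0-1).
      return image.updateMask(mask).divide(10000);
    }

    // 2022 composite over the study area; `geometry` is the study-area
    // polygon drawn in the GEE Code Editor.
    var S2A = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
        .filterDate('2022-01-01', '2023-01-01')
        .filterBounds(geometry)
        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 30)) // cloud cover percentage
        .map(maskS2clouds)
        .mean()
        .clip(geometry);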
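The bit test inside maskS2clouds can be checked outside GEE with ordinary JavaScript bitwise operators. The sketch below is illustrative only: it mirrors just the masking logic (bit 10 = opaque clouds, bit 11 = cirrus, and a pixel is kept only when both flags are zero), the QA60 values are hypothetical, and it does not reproduce GEE's server-side image operations.

```javascript
// Plain-JavaScript version of the QA60 test used in maskS2clouds.
const CLOUD_BIT_MASK = 1 << 10; // bit 10: opaque clouds (value 1024)
const CIRRUS_BIT_MASK = 1 << 11; // bit 11: cirrus clouds (value 2048)

// A pixel is kept only when both flags are zero.
function isClearPixel(qa60) {
  return (qa60 & CLOUD_BIT_MASK) === 0 && (qa60 & CIRRUS_BIT_MASK) === 0;
}

// Hypothetical QA60 values: clear, cloud, cirrus, cloud + cirrus.
console.log([0, 1024, 2048, 3072].map(isClearPixel));
// [ true, false, false, false ]
```

Bits other than 10 and 11 do not affect the result, which is why a bitwise AND with each mask is used rather than a comparison of the whole QA60 value.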

The GEE platform is a cloud computing resource that allows researchers to easily access and analyze satellite imagery (Amani et al., 2020). All image processing and index transformations in this research were carried out in GEE. The indices used in the research are presented in Table 1.

Table 1 Spectral indices used in identifying water availability

The Normalized Difference Vegetation Index (NDVI) can be used to monitor surface water availability because it is based on the normalized difference between near-infrared and red reflectance. This difference is related to the amount of vegetation cover and available water. When more water is available, vegetation can grow and reflect more near-infrared light, resulting in higher NDVI values. Conversely, when less water is available, vegetation growth is limited and NDVI values are lower (Huang et al., 2018). NDVI can therefore be used to detect changes in vegetation cover related to changes in water availability.
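As a minimal per-pixel sketch of the NDVI calculation, the function below assumes Sentinel-2 band B8 as NIR and band B4 as red, both as surface reflectance; the contents of Table 1 are not reproduced here, so this band mapping is an assumption. In GEE the same quantity is usually obtained with image.normalizedDifference(['B8', 'B4']).

```javascript
// NDVI = (NIR - Red) / (NIR + Red), evaluated per pixel.
// Assumption: NIR = Sentinel-2 B8 and Red = B4, as surface reflectance (0-1).
function ndvi(nir, red) {
  const denom = nir + red;
  // Return NaN instead of dividing by zero (e.g. for no-data pixels).
  return denom === 0 ? NaN : (nir - red) / denom;
}

// Hypothetical reflectances for dense vegetation and for open water.
console.log(ndvi(0.40, 0.05).toFixed(2)); // 0.78
console.log(ndvi(0.02, 0.04).toFixed(2)); // -0.33
```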

The Normalized Difference Water Index (NDWI) was developed to delineate open water features and enhance their presence in remotely sensed digital imagery. NDWI uses reflected near-infrared radiation and visible green light to enhance such features while suppressing soil and terrestrial vegetation features (McFeeters, 1996). The third index, LSWI, is a water-related vegetation index that represents the total water content of vegetation using the near-infrared and shortwave infrared (SWIR) bands. The determination of sample points as regions of interest (ROI) is presented in Table 2.
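NDWI and LSWI are both normalized differences, so they can be sketched with one helper. The band choices are assumptions, since Table 1 is not reproduced here: B3 (green) and B8 (NIR) for the McFeeters NDWI, and B8 (NIR) and B11 (SWIR) for LSWI. The pixel reflectances are hypothetical.

```javascript
// Generic normalized difference (a - b) / (a + b); NaN when a + b is zero.
function normalizedDifference(a, b) {
  const denom = a + b;
  return denom === 0 ? NaN : (a - b) / denom;
}

// NDWI (McFeeters, 1996): (Green - NIR) / (Green + NIR).
// Assumption: Green = Sentinel-2 B3, NIR = B8.
const ndwi = (green, nir) => normalizedDifference(green, nir);

// LSWI: (NIR - SWIR) / (NIR + SWIR).
// Assumption: NIR = B8, SWIR = B11.
const lswi = (nir, swir) => normalizedDifference(nir, swir);

// Hypothetical surface reflectances for two pixels.
const waterPixel = { green: 0.08, nir: 0.02, swir: 0.01 };
const vegetationPixel = { green: 0.07, nir: 0.40, swir: 0.18 };
console.log(ndwi(waterPixel.green, waterPixel.nir) > 0); // true: open water
console.log(ndwi(vegetationPixel.green, vegetationPixel.nir) > 0); // false
console.log(lswi(vegetationPixel.nir, vegetationPixel.swir) > 0); // true: green vegetation
```

LSWI becomes negative when SWIR reflectance exceeds NIR reflectance, as it does for dry vegetation.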

Table 2 Determination of water availability values

ANN Classifier

Artificial neural networks are information processing systems with characteristics similar to biological neural networks, namely the neural networks in the human brain (Fausett, 1994). ANNs were initially designed as pattern recognition and data analysis tools, and they have an advantage over conventional statistical methods that require normally distributed data (Ardizzone et al., 2002; Arif & Danoedoro, 2013; Imran et al., 2023; Xiong et al., 2010). ANNs effectively identify patterns and structures in multidimensional data (Giuffrida et al., 2020) and have achieved higher accuracy in remote sensing image classification than other methods (Liu et al., 2022; Miller et al., 1995). This research uses the backpropagation algorithm because it has produced good classification accuracy in several cases (Arif & Nursantosa, 2021; Suliman & Zhang, 2015). A backpropagation ANN learns from examples and generalizes from them, which makes it useful for classification tasks in remote sensing (Ma et al., 2019). In addition, backpropagation can handle large amounts of data and can be used for both supervised and unsupervised learning (Li et al., 2022). The model follows the standard stages of an artificial neural network: determining the network architecture, including the input and target (output) layers; preparing sample data; training on the sample data; and testing the network on both the training data and data not used in training. The activation function used in this research is the sigmoid function. The ANN architecture used in this research is illustrated in Fig. 2.

Fig. 2 The ANN architecture for water availability
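The training stages described above (sigmoid units, a learning rate, a momentum term, and an RMS-error stopping rule) can be illustrated with a minimal backpropagation sketch. It is not the IDRISI Selva implementation used in this study, and several choices are hypothetical: a single hidden layer of four units, a learning rate of 0.5 (larger than the 0.01 reported here, so that the toy example converges quickly), seeded random initial weights, and six made-up training pixels. The inputs are [NDVI, NDWI, LSWI] and the three outputs correspond to HW, VW, and NW.

```javascript
// Hypothetical sketch of online backpropagation with momentum.
// Small seeded linear congruential generator for reproducible initial weights.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

const sigmoid = (x) => 1 / (1 + Math.exp(-x));

function trainBackprop(samples, targets, opts) {
  const { hidden, learningRate, momentum, targetRms, maxIterations, seed } = opts;
  const rand = makeRng(seed);
  const nIn = samples[0].length;
  const nOut = targets[0].length;
  // Each weight row ends with a bias weight; initial weights in [-0.5, 0.5).
  const init = (rows, cols) =>
    Array.from({ length: rows }, () => Array.from({ length: cols }, () => rand() - 0.5));
  const w1 = init(hidden, nIn + 1); // input -> hidden
  const w2 = init(nOut, hidden + 1); // hidden -> output
  const dw1 = w1.map((row) => row.map(() => 0)); // previous updates (momentum)
  const dw2 = w2.map((row) => row.map(() => 0));

  const layer = (w, input) =>
    w.map((row) => sigmoid(row.reduce((s, wj, j) => s + wj * input[j], 0)));
  const forward = (x) => {
    const xb = [...x, 1];
    const h = layer(w1, xb);
    const hb = [...h, 1];
    return { xb, h, hb, y: layer(w2, hb) };
  };

  let rms = Infinity;
  let iterations = 0;
  while (iterations < maxIterations && rms > targetRms) {
    let sse = 0;
    samples.forEach((x, p) => {
      const { xb, h, hb, y } = forward(x);
      // Output deltas: error times the sigmoid derivative y(1 - y).
      const dOut = y.map((yk, k) => {
        const e = targets[p][k] - yk;
        sse += e * e;
        return e * yk * (1 - yk);
      });
      // Hidden deltas: output deltas propagated back through w2 (bias excluded).
      const dHid = h.map((hj, j) =>
        hj * (1 - hj) * dOut.reduce((s, dk, k) => s + dk * w2[k][j], 0));
      // Update = learning rate * delta * input + momentum * previous update.
      w2.forEach((row, k) => row.forEach((_, j) => {
        dw2[k][j] = learningRate * dOut[k] * hb[j] + momentum * dw2[k][j];
        row[j] += dw2[k][j];
      }));
      w1.forEach((row, j) => row.forEach((_, i) => {
        dw1[j][i] = learningRate * dHid[j] * xb[i] + momentum * dw1[j][i];
        row[i] += dw1[j][i];
      }));
    });
    rms = Math.sqrt(sse / (samples.length * nOut)); // RMS error of this epoch
    iterations += 1;
  }
  const predict = (x) => {
    const { y } = forward(x);
    return y.indexOf(Math.max(...y)); // index of the winning class
  };
  return { predict, rms, iterations };
}

// Hypothetical training pixels [NDVI, NDWI, LSWI]; classes 0 = HW, 1 = VW, 2 = NW.
const X = [
  [-0.30, 0.50, 0.40], [-0.20, 0.40, 0.30],
  [0.70, -0.60, 0.30], [0.80, -0.70, 0.40],
  [0.10, -0.20, -0.10], [0.05, -0.15, -0.20],
];
const labels = [0, 0, 1, 1, 2, 2];
const T = labels.map((c) => [0, 1, 2].map((k) => (k === c ? 1 : 0)));
const net = trainBackprop(X, T, {
  hidden: 4, learningRate: 0.5, momentum: 0.4,
  targetRms: 0.05, maxIterations: 10000, seed: 42,
});
console.log(net.iterations, net.rms.toFixed(4));
```

Training stops either when the RMS error falls below the target or when the iteration limit is reached, which is why, as in the simulations reported here, a smaller target RMS error generally needs more iterations.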

Accuracy Assessment

The kappa coefficient was used to evaluate the accuracy of the classification results. Kappa is calculated from the observed and expected frequencies on the diagonal of a square contingency table. A confusion matrix approach was used in this study (Table 3), comparing the extracted water availability classes with reference data. Each pixel is assigned to one of three categories:

  i. High water availability (HW): pixels extracted from water bodies.
  ii. Medium to low water availability (vegetation water, VW): pixels where water is detected in vegetation.
  iii. No water (non-water, NW): pixels extracted from built-up land.

Based on Table 3, overall accuracy (OA) and the kappa coefficient are used together to assess the accuracy of the resulting map. OA is calculated by dividing the number of correctly classified pixels (the sum of the main diagonal) by the total number of pixels in the error matrix (Congalton & Green, 2010). The kappa coefficient is calculated with the following formula, and its level of agreement is interpreted according to Table 4 (Landis & Koch, 1977).

$$K = \frac{N\sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} \left(x_{i+} \cdot x_{+i}\right)}{N^{2} - \sum_{i=1}^{r} \left(x_{i+} \cdot x_{+i}\right)}$$

where:

  • r = number of rows (classes) in the matrix
  • x_ii = number of observations in row i and column i (diagonal elements)
  • x_i+ = total number of observations in row i
  • x_+i = total number of observations in column i
  • N = total number of observations

Table 3 A confusion matrix
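The OA and kappa definitions can be checked with a short sketch that computes both from a confusion matrix, using the same symbols as the formula (N, x_ii, x_i+, x_+i). The 3-class matrix in the example is hypothetical and is not taken from Tables 7 and 8.

```javascript
// Overall accuracy (OA) and Cohen's kappa from an r x r confusion matrix,
// following K = (N * sum(x_ii) - sum(x_i+ * x_+i)) / (N^2 - sum(x_i+ * x_+i)).
function accuracyAndKappa(matrix) {
  const r = matrix.length;
  const rowTotals = matrix.map((row) => row.reduce((a, b) => a + b, 0)); // x_i+
  const colTotals = matrix[0].map((_, j) =>
    matrix.reduce((a, row) => a + row[j], 0)); // x_+i
  let n = 0; // N
  let diagonal = 0; // sum of x_ii
  let chance = 0; // sum of x_i+ * x_+i
  for (let i = 0; i < r; i++) {
    n += rowTotals[i];
    diagonal += matrix[i][i];
    chance += rowTotals[i] * colTotals[i];
  }
  return {
    overallAccuracy: diagonal / n,
    kappa: (n * diagonal - chance) / (n * n - chance),
  };
}

// Hypothetical 3-class matrix (HW, VW, NW); not the values of Tables 7 and 8.
const exampleMatrix = [
  [50, 2, 0],
  [3, 40, 5],
  [0, 4, 96],
];
const exampleScores = accuracyAndKappa(exampleMatrix);
console.log(exampleScores.overallAccuracy.toFixed(3), exampleScores.kappa.toFixed(3));
// 0.930 0.888
```

Kappa is lower than OA because it subtracts the agreement expected by chance from the row and column totals.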

Table 4 Standard interpretations of Cohen’s kappa (Landis & Koch, 1977)
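Assuming Table 4 follows the commonly cited Landis and Koch (1977) bands (poor below 0, then slight, fair, moderate, substantial, and almost perfect in steps of 0.20), the interpretation step can be written as a simple lookup. The helper name is hypothetical.

```javascript
// Agreement bands of Landis and Koch (1977) for Cohen's kappa.
function interpretKappa(kappa) {
  if (kappa < 0) return "poor";
  if (kappa <= 0.2) return "slight";
  if (kappa <= 0.4) return "fair";
  if (kappa <= 0.6) return "moderate";
  if (kappa <= 0.8) return "substantial";
  return "almost perfect";
}

console.log(interpretKappa(0.96)); // almost perfect
```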

Results

The results shown in Fig. 3 are the NDVI (Fig. 3a), NDWI (Fig. 3b), and LSWI (Fig. 3c) values derived from images with less than 30% cloud cover acquired between January and December 2022. Table 5 shows the statistics of NDVI, NDWI, and LSWI. The three variables are strongly related (Fig. 4), as shown by the correlations between NDVI and NDWI (R² = 0.94, Fig. 4a), NDVI and LSWI (R² = 0.73, Fig. 4b), and NDWI and LSWI (R² = 0.68, Fig. 4c).
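R² values like these can be computed from paired pixel values as the square of Pearson's correlation coefficient r, which equals the coefficient of determination of a least-squares line fitted to the pairs. The sample arrays below are hypothetical. Because squaring removes the sign, r itself is needed to state whether a relationship is positive or negative.

```javascript
// Pearson's r between two equally long arrays, and R^2 = r * r.
function correlation(x, y) {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  const r = sxy / Math.sqrt(sxx * syy);
  return { r, r2: r * r };
}

// Hypothetical paired index values sampled at the same pixels.
const indexA = [0.12, 0.35, 0.48, 0.66, 0.81];
const indexB = [0.05, 0.21, 0.30, 0.47, 0.58];
const { r, r2 } = correlation(indexA, indexB);
console.log(r.toFixed(3), r2.toFixed(3));
```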

Table 5 NDVI, NDWI, and LSWI statistics
Fig. 3 Input dataset: (a) NDVI; (b) NDWI; (c) LSWI

Fig. 4 Correlation: (a) NDVI and NDWI; (b) NDVI and LSWI; (c) NDWI and LSWI

Figure 3 shows the input data for the ANN, which were the basis for selecting 449 pixels as ROI for the ANN training data. ANN processing was carried out using IDRISI Selva. The ROI points were randomly selected and divided into 60% training and 40% testing data points to calculate kappa and overall accuracy. The network parameters were determined by trial and error because there are no general rules on the value of each ANN parameter that produces good accuracy; this depends on the complexity of the problem and the condition of the data (Arif et al., 2017; Riyanto et al., 2022; Stathakis, 2009). The number of hidden layers was set to one or two (Pourdarbani et al., 2019). Previous studies (Arif & Danoedoro, 2013; Arif & Nursantosa, 2021) tested up to 10,000 iterations with an RMS error of 0.001. This study ran four simulations with different ANN parameters to identify water availability (Table 6). In all four simulations, the RMS error during training was lower than during testing (Fig. 5).
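A reproducible random 60/40 split of the 449 ROI samples can be sketched as below. This assumes a simple (non-stratified) random split drawn with a seeded generator; the text does not state how the sample was drawn, so the seed, the generator, and the rounding rule are hypothetical.

```javascript
// Seeded linear congruential generator returning floats in [0, 1).
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Shuffle a copy of the samples (Fisher-Yates) and cut it into two parts.
function splitSamples(items, trainFraction, seed) {
  const rand = makeRng(seed);
  const shuffled = items.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const nTrain = Math.round(items.length * trainFraction);
  return { trainSet: shuffled.slice(0, nTrain), testSet: shuffled.slice(nTrain) };
}

// 449 hypothetical ROI identifiers split 60% / 40%.
const roiIds = Array.from({ length: 449 }, (_, i) => i);
const { trainSet, testSet } = splitSamples(roiIds, 0.6, 2022);
console.log(trainSet.length, testSet.length); // 269 180
```

A stratified variant would apply the same split separately to the HW, VW, and NW samples so that each class keeps its proportion in both sets.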

Fig. 5 Relationship between RMS error and iterations: (a) ANN 1; (b) ANN 2; (c) ANN 3; (d) ANN 4

The highest accuracy was obtained with ANN 3 (Table 6; Fig. 6c), whose parameters are 2 hidden layers (HL), a learning rate (LR) of 0.01, a momentum (M) of 0.4, a root mean square (RMS) error of 0.001, and 25,000 iterations (I). The maps of the prediction results of the four simulations are presented in Fig. 6. Testing was carried out at sample points rather than on individual pixel values: eighty sample points were observed in the classified image and compared with the values of the three parameters determined in Table 2. The accuracy results for the training and testing data are presented in Table 6. The accuracy values are calculated from the confusion matrices of the ANN 3 simulation (Tables 7 and 8).

Table 6 ANN simulation for training data
Fig. 6 Water availability classification for different ANN simulations: (a) ANN 1; (b) ANN 2; (c) ANN 3; (d) ANN 4

Table 7 Training data accuracy on ANN 3
Table 8 Test data accuracy on ANN 3

Each simulation produces different class areas (Fig. 6). ANN 1 and ANN 3 (Fig. 7a and c) are dominated by pink, i.e., the non-water (NW) class, while ANN 2 and ANN 4 (Fig. 7b and d) are dominated by green, i.e., the vegetation water (VW) class. The percentage area of each class is presented in Fig. 7.

Fig. 7 Area of each classification result: (a) ANN 1; (b) ANN 2; (c) ANN 3; (d) ANN 4

Figure 7 shows that ANN 1 and ANN 3 have almost the same class percentages, with a much larger share of NW than of VW and HW. In ANN 2 and ANN 4, by contrast, the percentage of VW is higher than that of NW and HW.
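Class percentages such as those in Fig. 7 follow directly from pixel counts, assuming all pixels have equal area (as with 10 m Sentinel-2 pixels in a projected coordinate system). The helper and the labels below are hypothetical; unlabeled or no-data pixels are excluded from the denominator.

```javascript
// Percentage of pixels in each class, assuming equal-area pixels.
// Labels not listed in `classes` (e.g. null for no-data) are skipped.
function classPercentages(labels, classes) {
  const counts = new Map(classes.map((c) => [c, 0]));
  let valid = 0;
  for (const label of labels) {
    if (counts.has(label)) {
      counts.set(label, counts.get(label) + 1);
      valid += 1;
    }
  }
  return Object.fromEntries(
    classes.map((c) => [c, valid === 0 ? 0 : (100 * counts.get(c)) / valid]));
}

// Hypothetical classified pixels (one no-data pixel).
const demoPixels = ["NW", "NW", "NW", "VW", "HW", null];
console.log(classPercentages(demoPixels, ["HW", "VW", "NW"]));
// { HW: 20, VW: 20, NW: 60 }
```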

Discussion

This study shows that ANN is a practical approach for mapping water availability in IKN using only three input variables: NDVI, NDWI, and LSWI. The high standard deviations of NDVI and NDWI (Table 5) reflect the sensitivity of NDVI to small changes in vegetation, which can cause relatively high variability in NDVI values, especially in areas with diverse vegetation types or under fluctuating environmental conditions (Lou et al., 2021; Zhang et al., 2018). ANNs can be relied on to analyze and interpret remote sensing data, even with limited data (Giuffrida et al., 2020; Linderman et al., 2004; Miller et al., 1995). In the correlation test, NDVI shows a positive correlation with NDWI (R² = 0.94, Fig. 4a). Roy and Bari (2022) similarly found a positive correlation coefficient of 0.98 between NDVI and NDWI. In contrast, Abdalkadhum et al. (2021) found a lower correlation coefficient of 0.13. The strength of the correlation between NDVI and NDWI can vary with soil moisture and land cover type (Gao, 1996; Gu et al., 2008). Figure 4b shows a positive correlation between NDVI and LSWI (R² = 0.73). However, LSWI and NDVI may also be negatively correlated under drought conditions because LSWI is more sensitive to changes in soil moisture than NDVI (Bajgain et al., 2015). Meanwhile, NDWI shows the same spatial pattern as LSWI (Fig. 3b and c) and is positively correlated with it (R² = 0.68, Fig. 4c), but LSWI is more sensitive to vegetation and soil water content than NDWI (Bajgain et al., 2015; Chandrasekar et al., 2010). LSWI values are positive for green vegetation and negative for dry vegetation.

The ANN simulation results show that architectures with 2 HL produce higher accuracy than those with 1 HL (Table 6), as in several other studies (Arif & Nursantosa, 2021; Mahmon & Ya’acob, 2014). The number of HL strongly influences the classification results (Fig. 6). The simulations with 1 HL, namely ANN 2 and ANN 4 (Fig. 6b and d), have the same spatial pattern, with a training accuracy below 80%. These two simulations assign a larger area to the VW class than to the NW and HW classes (Fig. 7). In contrast, the 2 HL simulations (ANN 1 and ANN 3) have the same spatial pattern (Fig. 6a and c), with a training accuracy above 90%, and assign a larger area to the NW class than to the other classes. ANN results are highly sensitive to the network parameters: the same input data can produce different results when only the network parameters differ (Arif et al., 2017; Arif & Danoedoro, 2013; Czarnecki & Podolak, 2013). In this study, the learning rate alone does not determine accuracy: ANN 3 and ANN 4 use the same LR but produce different accuracies. A learning rate that is too large can cause the model to converge too quickly to a suboptimal solution, while a learning rate that is too small can cause the process to stall (Chinwe et al., 2021). When large training samples are provided, the learning rate can become an issue, and the underlying system may become incapable of useful generalization (Taborsky, 2022). In this research, more iterations produced higher accuracy: ANN 3, trained for 25,000 iterations, is more accurate than ANN 1, trained for 10,000 iterations. The number of iterations is also related to the RMS error (Fig. 5): reaching a small RMS error requires many iterations, whereas a larger RMS error is reached in fewer iterations (Oliveira et al., 2011). Several other researchers have shown that the number of iterations significantly influences accuracy (Maggiori et al., 2017; Wan-Kadir et al., 2013; Yuan et al., 2009).

Whatever analytical method is used to detect water availability, its usefulness ultimately depends on whether it can be applied in all regions. Further research is needed before the method is applied to locations with different geographical conditions. Although the ANN approach is more practical than analytical procedures such as those used by Abdalkadhum et al. (2021), Acharya et al. (2018), and McFeeters (1996), it can be cumbersome because appropriate ANN parameters must be found by trial and error to produce good accuracy. However, this study has developed an efficient procedure that can be expanded or modified with additional data, including field data. ANNs offer a more accurate and comprehensive approach to predicting water availability than other existing methods because they can handle complex relationships and nonlinearities in the data. ANN is therefore a valuable tool for water management and decision-making. Some researchers use linear regression methods to predict water availability (Adamowski et al., 2012; Fernandes et al., 2023; Jafar et al., 2023). However, linear regression assumes a linear relationship between the independent and dependent variables, which may not hold in complex systems such as water availability (Tu, 1996).

Conclusions

The NDVI, NDWI, and LSWI spectral indices can be used as inputs for predicting water availability with artificial neural networks. Four simulations were carried out to determine the best ANN prediction, and the best results were obtained with the parameter combination 2 HL, LR 0.01, M 0.4, RMS 0.001, and 25,000 iterations, giving an overall accuracy (OA) of 97.7% and a kappa coefficient of 0.96. The predicted water availability in the research area is HW (0.51%), VW (20.41%), and NW (79.08%). This study can serve as a reference for policymakers and the public on the importance of water conservation and the potential consequences of water scarcity in IKN.