Introduction

Iraq has abundant surface water resources compared to other countries in the Arabian Peninsula. However, mismanagement of this precise resource and interventions in the upstream of Tigris and Euphrates rivers as well as their tributaries by riparian countries bordering Iraq have made surface water gradually a scarce resource in Iraq (Al-Ansari 2013). It is anticipated that groundwater will play an important role in the area to supplement water supply to growing population and agricultural activities in near future. The southern desert of Iraq contains huge amount of groundwater resources suitable for agricultural and industrial uses and even for drinking after appropriate treatment. In the east and north parts of the desert, a set of springs and flowing wells exists which extends parallel to the Euphrates River. The spatial demarcation of this hydrogeological system can facilitate groundwater resources management and development efficiently, and thus agricultural development in the west of the Euphrates River.

Delineation of groundwater potential zones is an important prerequisite for implementation of successful groundwater development, protection, and management program (Ozdemir 2011a). Two main approaches are usually used for demarcation of groundwater potential zones namely, data-driven and knowledge driven (Bonham-Carter 1994). The data-driven approaches use known locations of well/spring as dependent variable and groundwater occurrence controlling factors as independent variables. The advantages of these methods are that they need little data, and they are less affected by potential bias in human input (McKay and Harris 2015). Examples of these techniques are frequency ratio (Ozdemir 2011a; Oh et al. 2011; Manap et al. 2011; Moghaddam et al. 2013; Pourtaghi and Pourghasemi 2014; Naghibi et al. 2014; Elmahdy and Mohamed 2014; Al-Abadi 2015b), artificial neural networks (Corsini et al. 2009; Lee et al. 2012), weights of evidence (Corsini et al. 2009; Ozdemir 2011b; Lee et al. 2012; Pourtaghi and Pourghasemi 2014; Al-Abadi 2015a), maximum of entropy (Rahmati et al. 2016), evidential belief functions (Nampak et al. 2014; Mogaji et al. 2014; Pourghasemi and Beheshtirad 2015), logistic regression (Ozdemir 2011a; Pourtaghi and Pourghasemi 2014), and Shannon’s entropy (Naghibi et al. 2014; Al-Abadi 2015b). In recent years, machine learning techniques such as, boosted regression tress, classification and regression tress (CART), decision tress, and random forest (Lee and Lee 2015; Naghibi et al. 2016; Rahmati et al. 2016) have also been used of spatial zoning of groundwater potential. On the other hand, knowledge driven approaches do not need geographic information of wells or springs, rather they rely on the data to determine the weights or importance of independent groundwater occurrence controlling factors. Examples of these approaches are index overly (Jha et al. 2010; Machiwal et al. 2010; Manap et al. 2011; Abdalla 2012; Pandey et al. 2013; Al-Abadi and Al-Shamma’a 2014), fuzzy logic (Shahid et al. 2002), and analytical hierarchical process (Adiat et al. 2012; Rahmati et al. 2014).

The random forests (RF) algorithm is a machine learning technique which has been applied recently as a data-driven predictive model for groundwater potential mapping (Naghibi et al. 2016; Rahmati et al. 2016). Machine learning is defined as a field of computer science that gives computers the ability to learn without being explicitly programmed. The RF is one of the most powerful, fully automated machine learning techniques (Fernandez-Delgado et al. 2014). It can handle data from various measurement scales without any statistical assumptions (Rahmati et al. 2016). At the same time, RF is computationally inexpensive than other machine learning algorithms like, neural networks or support vector machines (Rodriguez-Galiano and Chica-Olmo 2012). Another advantage of the RF is that it allows assessment of the importance of input variables in prediction (Rahmati et al. 2016). Although RF has been applied recently in many earth science disciplines including groundwater potential mapping, its capability to delineate groundwater flowing well zones is still not well explored.

The major objective of this study is to delineate the artesian zone in the southern desert of Iraq using RF and GIS. The methodology proposed in the study can be used for quick but efficient mapping of groundwater resources with limited amount of data and human interference. It is also expected that the groundwater artesian zone map development in the present study will allow efficient planning, management and development of groundwater resources in the study area.

The study area

The southern desert locates in the southern part of Iraq, south of Euphrates River, encompass an area of about 78,390 km2 (Fig. 1). Four administrative governorates share the area of the southern desert namely, Najif, Samawa, Nasiriya, and Basra. The most parts of the southern desert are unpopulated due to extreme climate and severe water scarcity. Few urban centers are located along the Euphrates River, while small towns such as Al-Shbicha, Al-Salman, and Al-Busaiya are sparsely distributed through the area. The topography of the study area is mostly flat with a rising elevation from the northeast to the southwest (Al-Jiburi and Al-Basrawi 2008). The elevation ranges between 1 and 494 m with an average of 229 m (Fig. 2). The climate in major portion of the southern desert is arid with an annual average rainfall typically in range of 100–150 mm. However, the rainfall varies extremely from year to year. Most of the rainfall occurs between October and May with little or no rainfall in other months (Parsons 1955). Despite small amount of rainfall, different types of vegetation grow in the southern desert as rainfall mostly occurs in winter when evaporation is less. The climatic data at nine meteorological stations within and around the study area for the time period 1980–2015 show that the annual average of rainfall, temperature, evaporation, relative humidity, and wind speed in the area are 108 mm, 30.4°C, 352 mm, 46 %, and 2.9 m/s, respectively. The geological formations exposed in the study area, from oldest to youngest are the Aidah, Rus and Dammam with their various limestone members, and the Zahra, Euphrates, Fatha, Dibdibba and recent alluvium (Parsons 1955). In general, all the formations contain groundwater in varying amounts and quality. Since most of the springs and flowing wells occur with the Euphrates, Dammam, and Quaternary rocks, a brief description of these rocks is given here. The Euphrates Formation is the most widely spread formation of the Early Miocene sequence in Iraq with thickens up to 160 m. It comprises 8 m thick shalley chalky and well bedded recrystallized limestones with texture ranging from oolitic to chalky, which locally contain corals and shell coquinas (Jassim and Goff 2006). The Dammam Formation mainly consists of chalky limestone, dolomite, mark and Shales. The Quaternary formation in the eastern part of the study area consists of sand and alluvium deposits of recent and Pleistocene ages. The geological map of the study area is given in Fig. 3.

Fig. 1
figure 1

Location map of the study area

Fig. 2
figure 2

Elevation (m) map of the study area

Fig. 3
figure 3

Geological map of the study area (after Sissakian 2000)

From the tectonic point of view, the southern desert is a part of the Arabian Platform, which is characterized by the presence of block tectonics and the absence of tectonic folds (Buday and Jassim 1987). The main structural features include several northeast-southwest transversal faults. The eastern boundary of the southern desert is sharply marked by the southern extension of Hit-Abu Jir fault system. This fault system represents structural and geological boundary zone that separates the desert from the Mesopotamia Zone.

There are five major aquifer groups in the study area (Jassim and Goff 2006), which are termed as 4, 5, 6, 7, and 10 aquifer groups (Fig. 4). The aquifer group 4 represents limestone of the Palaeogene–Neogene Euphrates Formation, Kirkuk Group and Ghar formations of the Western and Southern desert of the Rutba Subzone and the Salman Zone. The aquifer group 5 represents the karstified and fractured limestones of the Palaeogene Um-Er Radhuma, and Jill and Dammam formations of the southern desert of the Salman zone. The aquifer group 6 depicts the sandstone and conglomerate of the Miocene to Pleistoncen Ghar and Zahra formations and the Nukhaib Graben fill of the western and southern deserts of the Rutba subzone and the Salman zone (Jassim and Goff 2006). The aquifer group 7 represents the sandstone of the Mio-Pliocene Dibdibba Formation of the southern desert in the Salman Zone, while the aquifer group 10 represents the sands of the Quaternary Mesopotamian flood plain of central and the south of Iraq in the Mesopotamian zone. A detailed description of these aquifer units with relevant information can be found in Jassim and Goff (2006). Groundwater in the study area moves from the west and the southwest (recharge areas) to the east and the northeast (discharge areas) (Fig. 5). Groundwater level varies from ten meters in recharge areas to near surface or artesian in discharge areas (Al-Jiburi and Al-Basrawi 2008). Groundwater quality in the study area can be classified into three types namely, mixed saline with either sulfates of chlorides dominant, carbonate water and high nitrate waters (Parsons 1955).

Fig. 4
figure 4

Spatial distribution of aquifer groups in the southern desert of Iraq [after Jassim and Goff (2006)]

Fig. 5
figure 5

Geological cross sections in the Iraqi Southern Desert [after Araim (1984)]

The main aquifers underlying the study area from oldest to youngest are: Hartha, Tayart, Umm Er Radhuma, Dammam, Ghar-Euphrates, Dibdibba, and Quaternary (Fig. 5). A hydraulic connection is possible between these water bearing layers as a result of the piezometric changes throughout these aquifers. Since Dammam aquifer has high hydraulic pressure, most of the wells drilled in this aquifer is flowing artesian well. A brief description of these aquifers is given here. Dammam formation comprises of limestone, dolomite, limestone and dolomite, with marl and evaporates. Dammam aquifer is characterized by high permeability due to presence of cavities, karstfiied features, factures, fissures, and joints. The Dammam aquifer is considered as the main regional groundwater aquifer in the southern desert due to its wide extension and huge amount of stored groundwater (GEOSURV 1983). The hydrological characteristic of the Dammam aquifer is presented in Table 1. The transmissivity of the aquifer ranges from 3.1 to 4752 m2/day, and thus it regards as extremely heterogeneous. The hydraulic conductivity ranges from 0.1 to 100 m/day, while static water level ranges from 0 to 170 m below the earth surface. The total dissolved solid is in the range of 350 to 8530 mg/l. The dominant water type is sulphatic, in addition to chloride and biocarbonatic water types. The source of sulphate is attributed to the presence of evaporates within the rocks or gypsiferous soil (Al-Jiburi and Al-Basrawi 2000).

Table 1 Hydrogeological data of Dammam aquifer in southern desert of Iraq [after Al-Jiburi and Al-Basrawi (2008)]

Random forest algorithm

The RF is an ensemble of learning techniques that generate many classification tresses which are aggregated to compute a classification or regression (Breiman 1984; 2001). Ensemble learning (EL) is a method that generates many classifier and aggregate their results. EL can be classified into two well-know methodologies: boosting and bagging. In boosting, successive trees give extra weight to point incorrect prediction by earlier predictors, and finally a weighted vote is taken for prediction (Liaw and Wiener 2002). In bagging, successive tress is independently constructed using a bootstrap sample of the dataset, and do not based on generated earlier trees. The prediction is taken as a simple majority vote. RF belongs to the family of ensemble methods appeared in machine learning at the end of 1990s (Dietterich 2000). The principle of RF is to combine many binary decision tresses built using several bootstrap samples coming from the learning sample L and choosing randomly a subset of explanatory variable X at each node (Genuer et al. 2008). Roughly, two-third of the learning samples (also called bag samples) is used for prediction, while the remaining one-third [also called out-of-bag (OOB)] is used for validation. RF algorithm splits the target variable (parent node), into binary pieces, where the child nodes are ‘purer’ than the parent node (Carranza and Laborte 2015a). Each node is split using the best among a subset of predictors randomly chosen at that node. The process of data splitting in each internal node is iterated until a pre-specified stop requirement is reached (Carranza and Laborte 2015a). After that, a simple regression model is attached for every child node (leaf). The final output of RF is the majority of output from all decision tress. The RF algorithm for growing a random forest of k classification trees is as follows (Peters et al. 2007; Hastie et al. 2009):

  1. i.

    for i = 1 to k do:

  2. 1.

    Draw a bootstrap sample (subset X i ) from the original dataset X;

  3. 2.

    Use X i to grow an unpruned classification tree to the maximum depth with the following modification compared with standard classification tree building: (a) randomly select m variables from p variables; (b) choose the best split among these variables; and (c) split the node into two daughter nodes.

  4. ii.

    Predict new data according to the majority vote of the ensemble of k trees.

An unbiased estimate of the error rate is obtained during the construction of a RF based on the training data as:

  1. i.

    At each bootstrap iteration, predict the data in OOB that are not used in the construction of the ith tree.

  2. ii.

    At the end of the run, aggregate the OOB predictions, calculate the error rate, and call it the OOB estimate of error rate.

A more detailed description of the RF algorithm can be found in Breiman (2001) and Liaw and Wiener (2002).

Materials and methods

Data used

The data used in the creation of spatial zones of groundwater flowing wells involve target variable (geographic location of the flowing well) and various predictor variables represented by thematic layers of groundwater affecting occurrence factors. The flowing well inventory was developed by a team of General Commission of Groundwater, Ministry of Water Resource of Iraq through extensive filed survey in the year 2013. In total, 93 perennial flowing wells were identified. This dataset is randomly partitioned using random algorithm in MINITAB 16 software into two sets: training and testing. Of the 93 flowing well locations, 65 wells (70 %) were used as training dataset, and the remaining 28 wells (30 %) were used as validation dataset.

Eleven predictors were used in this study, which were selected based on literature review, expert opinion, data availability, and field conditions. These variables are elevation, slope, profile curvature, aspect, topographic wetness index (TWI), stream power index (SPI), distance to Abu Jir fault, distance to Euphrates River, major aquifer group, total hydraulic head, and well depth. The thematic maps of all predictors were prepared as raster layer with 30 × 30 m resolution in ArcGIS 10.2 software. The total number of pixels for each thematic raster layer was 144,007,200 (12,204 columns and 11,800 rows). Topographic variables namely, elevation, slope, profile curvature, aspect, TWI, and SPI were prepared using Advanced Spaceborne Thermal Emission and Reflection Radiometer-Global Digital Elevation Model (ASTER-GDEM) data with a spatial resolution of 30 m, download from United State of Geological Survey (USGS) website (earthexplorer.usgs.gov). The importance of these variables in delineating groundwater potential zones are extensively described in literatures (Ozdemir 2011a, b; Oh et al. 2011; Pourtaghi and Pourghasemi 2014; Naghibi et al. 2014). The surface elevation map was directly derived from DEM and classified into five categories (McDonald et al. 1990): plains (0–9 m), rises (9–30 m), low hills (30–90), hills (90–300 m), and mountains (>300 m) (Fig. 2). Slope (%) map was also derived from filled DEM and classified into five classes (de Winnaar et al. 2007): flat (<2 %), undulating (2–8 %), rolling (8–15 %), hilly (15–30 %), and mountainous (>30 %) (Fig. 6a). Profile curvature was classified into three classes namely, concave (<0), flat (0), and convex (>0) (Fig. 6b). Aspect was classified into ten classes (Fig. 6c): Flat (−1), North (0–22.5), Northeast (22.5–67.5), East (67.5–112.5), Southeast (112.5-157.5), South (157.5–202.5), Southwest (202.5–247.5), West (247.5–292.5), Northwest (292.5–337.5), and North (337.5–360.0). The secondary topographic predictor variables namely, TWI and SPI were derived using following equations (Moore et al. 1991),

Fig. 6
figure 6figure 6figure 6figure 6figure 6

a Map showing the distribution of slope (%) in the study area. b Map showing spatial distribution of total curvature in the study area. c Aspect map of the study area. d The topographic wetness index (TWI) map of the study area. e The stream power index (SPI) map of the study area. f Map showing the distances to Abu Jir fault (m). g Map showing the distances to Euphrates River. h Total head distribution over the study area (after Al-Jiburi and Al-Basrawi 2008). i Spatial distribution of groundwater depth in the study area

$$TWI = \ln \left( {\frac{a}{\tan \beta }} \right)$$
(1)
$$SPI = A_{s} \tan \beta$$
(2)

where, a is the local unslope area draining through a certain point per unit contour length and \(\tan \beta\) is the local slope in degrees, and A s is the specific catchment area. The Raster Calculator in ArcGIS software was used to derive TWI and SPI layers and finally classified into five categories to prepare the thematic maps of those variables as shown in Fig. 6d, e, respectively.

The proximity variables namely, distance to Abu Jir fault and distance to Euphrates River were prepared by using Euclidean Distance method in ArcGIS environment. Both variables were classified into ten classes using equal interval classification scheme (Fig. 6f, g). It was found that the correlation between well locations and distance from these linear features decreases as the distance increase, and thus a strong negative correlation exists.

Three predictor variables related to hydrogeological characteristics namely, total hydraulic head, well depth, and major aquifer groups were used in this study. Hard copy of total hydraulic head map (Al-Jiburi and Al-Basrawi 2008) was scanned, georeferenced, and then digitized using ArcGIS software (Fig. 6h). Generally, groundwater moves from recharge areas (northeast) to discharge areas (southwest). Groundwater either discharges in form of springs or flows underground into Mesopotamian plan sediments (GEOSURV 1983). Data presented in Table 1 were used to create the map of groundwater depth. The values of groundwater depth were interpolated using inverse distance weighting technique (IDW) method to prepare the map. IDW is a deterministic interpolation technique traditionally used to interpolate groundwater depth (Reed et al. 2000). In IDW, deterministic interpolation techniques create surface from sample points using mathematical functions, based on either the extent of similarity or the degree of smoothing (radial basic function RBF) (Adhikary and Dash 2014). In mathematical terms, IDW is written as:

$$z(x_{ \circ } ) = \frac{{\sum\nolimits_{\text{i = 1}}^{1} {\frac{{x_{i} }}{{h_{ij}^{\beta } }}} }}{{\sum\nolimits_{\text{i = 1}}^{1} {\frac{1}{{h_{ij}^{\beta } }}} }}$$
(3)

where, \(z\left( {x_{ \circ } } \right)\) is the interpolated value, n is total number of sample data, x i is the ith data value, h ij is the separation distance between interpolated value and sample data value, and β is the weighting power factor. The optimal weighting power depends on the spatial structure of the data and is influenced by the coefficient of variation, skewness and kurtosis of the data (Mueller et al. 2001). The map of groundwater depth is shown in Fig. 6i. The figure shows that the depth to groundwater increases from south to north. To create the aquifer group layer, a hard copy of this predictive variable (Fig. 4) was scanned, georeferenced, digitized and finally convert from vector to raster using conversion tool in ArcGIS.

Mapping groundwater flowing well potential zone

The randomForest package in R software was used for the development of RF model. The target variable, the locations of well was represented by 1 for flowing wells and 0 for non-flowing wells. For the training of RF mode, equal number of flowing and non-flowing wells was selected to get the optimal result (Carranza and Laborte 2015b). Non-flowing locations should be distal to any flow location because locations proximal to existing potential zone are likely to have similar multivariate spatial data signatures as the potential zone and thus preclude achievement of desired results (Carranza and Laborte 2015a). Therefore, the point pattern analysis was used to find the distance from any flowing location and corresponding probability that there is one flowing location situated next to it. The point pattern analysis showed that the distance for any non-flowing well location in which there is 100 % probability of a neighboring flowing well location is approximately 10 km. Hence, the 65 non-flowing well locations were selected from areas beyond 10 km of every flowing well location. At flowing and non-flowing well locations, the values of predictor variables were extracted using ArcGIS. These data were stored as *.csv text file and exported to R statistical software to run RF model. Obtained results of RF model were the values between 0 and 1, where 1 refers to high probability of getting flowing well, while 0 refers to non-flowing well. Finally, the probability of getting 1 were stored as text file and exported to the ArcGIS as point shape file. The point data were finally interpolated to prepare the map of groundwater flowing well potential.

Validation of the results

The accuracy of the RF model developed in this study to delineate groundwater flowing well zone are investigated using relative operating characteristics (ROC) curve. The ROC curve is commonly used for examining the quality of deterministic and probabilistic detection and forecast system (Swets 1988). It is a common method used to assess the accuracy of a diagnostic test (Egan 1975). The ROC plots the sensitivity (false postive rate) on X axis against 100-specificity (true positive rate) on Y axis. In ROC analysis, the area under the ROC curves (AUC) used to measure the prediction accuracy qualitatively (Maier and Dandy 2000). The predictive capability of a model is excellent if AUC = 1–9; very good 0.8–0.9; good 0.8–0.7; average 0.7–0.6; and poor 0.6–0.5 (Yesilnacar 2005). Usually, the AUC are used to evaluate the performance during both model training (success rate) and testing (prediction rate). The success rate explains how well the resulting groundwater potential map classified the area of flowing well locations during training (Al-Abadi 2015a), while prediction rate provides a measure of the accuracy of predictive model with unseen data (testing dataset).

Results and discussion

The parameters required to run RF algorithm in R packages are the number of trees (ntree) and the number of predictors (mtry) randomly sampled at each split node. These parameters are determined using OOB error. The OOB error rate is a helpful estimator of the generalization error depending on the number of trees (Rahmati et al. 2016). The OOB error rate depicted in Fig. 7 shows that the OOB error rate decreases as the number of trees increase. When the OOB equals to 0.01, mtry and ntree are equal to 3 and 500, respectively. The mtry value is further checked using the equation proposed by Breiman (2001), which postulates that the mtry should be less than log2(M + 1) in order to minimize the generalization error and correlation among decision tress, where M is the number of predictors used. As the number of predictors in this study is 11, the mtry should be less than int(log2(11 + 1) or 3. On the other hand, Rodrigues-Galiano et al. (2014) proposed that ntree value of 1000 results relatively low prediction errors and most stable predictions (ZhenJie et al. 2015). From OOB plot in (Fig. 7), it is obvious that error rate stabilizes at 500 (ntree), and therefore, ntree value of 500 was adopted in the study.

Fig. 7
figure 7

Plot of out-of-bag (OOB) error of random forest model

One of the most attractive features of the RF algorithm is its capability to rank the importance of predictors according to predictor’s marginal effect on the target variable while keeping all the other predictors constant (Carranza and Laborte 2015a). In order to assess the predictor’s importance in RF model developed in the study, two parameters were used namely, mean decrease accuracy (MeanDecreaseAccuracy) and mean decrease in Gini coefficient (MeanDecreaseGini) (Fig. 8). The mean decrease accuracy is a measure to explain how the model fit decreases as a variable dropped from the analysis. The greater the drop, the more significant the variable. On the other hand, the mean decrease in Gini coefficient is a measure of how each variable contributes to the homogeneity of the nodes and leaves in the resulting RF model. The Gini index is often used to describe the overall explanatory power of the predictors. Therefore, mean decrease accuracy is more important for variable selection, while Gini index is important in defining the explanatory association among the variables selected. The mean decrease accuracy plot (Fig. 8) identified the elevation, well depth, distance to Euphrates River, Distance to Abu-Jir fault, aquifer groups, and groundwater heads as most importance predictors. The same results were obtained using Gini index with different ranks of importance. Both measures indicate that the slope, curvature, aspect, TWI, and SPI have lower effect on the groundwater flowing well potential in the study area. Therefore, these predictors were removed and the RF model was run again. The overall accuracy for both RF runs are presented in Table 2. It is obvious from Table 2 that removal of the less important predictors caused an increase in model accuracy from 93.75 % (model I) to 95.31 % (model II). According to Landis and Koch (1977) the coefficient is the best index of fit between the predictors and the target variable. Kappa values >0.8 = strong fit, 0.4–0.8 = moderate fit, and <0.4 = poor fit. Both RF model have Kappa coefficients greater than 0.8 and thus regard as excellent accuracy models but the model II more accurate than model II (0.87 versus 0.91 kappa coefficient, for models I and II, respectively). Therefore, results of the RF model II were used for further analysis. The RF model gave a probability value between 0 and 1 at 128 observation points. In ArcGIS, the probability of getting 1 was interpolated using IDW algorithm to generate the flowing well potential map, as shown in (Fig. 9). The probability values were classified into five categories using brake classification scheme namely, very low (0.00–0.11), low (0.11–0.34), moderate (0.34–0.61), high (0.61–0.83), and very high (0.83–1.0). It was found that the high and very high potential zones cover an area of about 11,868 km2 or 15 % of the total area, which are mainly concentrated in northeastern parts of the study area. The moderate zones encompass total area of about 4342 km2 or 6 % of total study area. The majority of the study area, 62,180 km2 or 79 % of total area was identified as very low and low potential for groundwater flowing well. The low potential zones are found to concentrate mainly in the southwestern parts of the study area. The high and very high potentiality zones are basically associated with low elevation values, closeness to the Abu Jir fault, closeness to the Euphrates River, aquifer groups 10 and 4, and low hydraulic heads.

Fig. 8
figure 8

Mean decrease accuracy and mean decrease GINI of effective predictors

Table 2 Accuracy of RF models using all predictor and only the five most important predictors of target variable
Fig. 9
figure 9

Groundwater flowing well potential index map of the study area

The plot of ROC curves for RF model is shown in (Fig. 10). The AUC for success and prediction rates were 0.98 and 0.97, respectively, which correspond to 98 and 97 % accuracy, respectively. This indicates the excellent capability of RF model in delineating groundwater flowing well potential zone in the study area.

Fig. 10
figure 10

ROC plot of random forest model

Conclusions

The efficacy of RF machine learning technique in demarcating flowing well potential zone at southern desert of Iraq has been investigated in the present study. The spatial associations between target variable (locations of flowing wells) and set of predictors (groundwater occurrence controlling factors) were used to model groundwater potential using RF model. Eleven predictor variables namely, elevation, slope angle, curvature, aspect, TWI, SPI, distance to Abu Jir fault, distance to Euphrates River, major aquifer group, total hydraulic head, and well depth were used for demarcation of groundwater flowing well potential zones. The study revealed that elevation, well depth, distance to Euphrates River, distance to Abu-Jir fault, aquifer groups, and groundwater heads are the most importance predictors, while the slope, curvature, aspect, TWI, and SPI have less influence in delineating groundwater potential in the study area. The groundwater flowing well potential index map shows that the high to very high, moderate, and low to very low potential zones occupy 15, 6, and 79 % of the total area of southern desert of Iraq, respectively. The validation of RF model using ROC curve revealed that the AUC’s of success and prediction rates were 0.98 and 0.97, respectively, which indicate the excellent capability of RF model in delineating groundwater flowing well potential in GIS. It is expected that the map developed in this study will provide valuable information for the development of groundwater resources to solve the long lasting water scarcity in the region. In future, the performance of RF model can be compared with other conventional methods to show its efficacy. Furthermore, other state of art data driven models can also be used in future for mapping groundwater potential zone.