Introduction

Groundwater is a vital natural resource around the world. Globally, groundwater provides approximately 50 % of current potable water supplies, 40 % of the industrial water demand, and 20 % of the water used for irrigation (Molden 2007). It is considered the main source of future water supply, irrigation, and food production under the impacts of global change (Clifton et al. 2010). In spite of huge groundwater reserves, supplies are heading for a crisis in many countries and need more attention. The main aquifers around the world are under pressure to meet the growing demand for water caused by population growth (Shahid and Hazarika 2010). Managing groundwater reserves in a sustainable manner is a major challenge. A goal of groundwater resource assessment is to provide information on the current status of the resource and insights into the future availability of groundwater (Reilly et al. 2008). Determining groundwater availability involves calculating the volume of groundwater in an area or within an aquifer. In recent years, several authors have attempted to delineate groundwater availability using different data-driven and knowledge-driven techniques combined with remote sensing (RS) and geographic information systems (GIS). Data-driven and knowledge-driven techniques reflect different perspectives in spatial modeling (Rajabi et al. 2014). The data-driven technique is based solely on the data, while the knowledge-driven technique is based on evidence of varying quality, guidelines, and expert opinions. GIS and RS are effective spatial tools widely used for the assessment, monitoring, management, and visual representation of geographic information in several fields, including environmental, disaster, and hydrological fields (Jha et al. 2010). The techniques combined with them include probabilistic frequency ratio (Ozdemir 2011a; Oh et al. 2011; Manap et al. 2011; Moghaddam et al. 2013; Pourtaghi and Pourghasemi 2014; Naghibi et al. 2014; Elmahdy and Mohamed 2014; Al-Abadi 2015b), logistic regression (Ozdemir 2011a; Pourtaghi and Pourghasemi 2014), Shannon’s entropy (Naghibi et al. 2014; Al-Abadi 2015b), weights of evidence (Corsini et al. 2009; Ozdemir 2011b; Lee and Kim 2011; Pourtaghi and Pourghasemi 2014; Al-Abadi 2015a), artificial neural networks (Corsini et al. 2009; Lee and Kim 2011), fuzzy logic (Shahid et al. 2002), analytical hierarchy process (AHP) (Adiat et al. 2012; Rahmati et al. 2014), certainty factor (Razandi et al. 2015), maximum entropy (Rahmati et al. 2016), and the evidential belief function model (Nampak et al. 2014; Mogaji et al. 2014; Pourghasemi and Beheshtirad 2015). More recently, machine learning techniques such as boosted regression trees, classification and regression trees (CART), decision trees, and random forest (Lee and Lee 2015; Naghibi et al. 2016; Rahmati et al. 2016) have been used more frequently for the spatial zoning of groundwater resources.

Of all the techniques mentioned above, the frequency ratio (FR) is the most applicable in the study of groundwater potentiality. The FR is a simple bivariate statistical tool for calculating the probabilistic relationship between independent and dependent variables, including multi-classified maps (Oh et al. 2011). It is the probability of occurrence of a certain attribute (Bonham-Carter 1994). According to Ozdemir (2011b), the FR provides a very simple and flexible technique to delineate spring potential zones. The same conclusions were reached by Moghaddam et al. (2013) and Elmahdy and Mohamed (2014) for the evaluation of spring and aquifer yield potentiality, respectively. Naghibi et al. (2014) conducted a comparative study between the FR and index of entropy models for groundwater qanat potential mapping in the Moghan watershed, Iran. They concluded that both techniques were capable of delineating groundwater qanat potential zones with very good accuracy. The same conclusion was reached by Al-Abadi (2015b) for delineating groundwater potential zones in northeastern Wasit governorate, Iraq.

Index of entropy (also called Shannon’s entropy) is the average unpredictability in a random variable, which is equivalent to its information content. The entropy of groundwater potential refers to the extent to which the various factors controlling groundwater occurrence influence groundwater productivity. Several influencing factors add extra entropy to the index system. The entropy value can be used to calculate objective weights for the index system (Jaafari et al. 2013). The application of this method for demarcating groundwater availability zones is still limited. Naghibi et al. (2014) and Al-Abadi (2015b) successfully applied this technique for demarcating groundwater qanat potential and groundwater potential yield, respectively.

The northeastern parts of Missan and Wasit governorates along the border between Iraq and Iran, which include the Badra–Ali Al-Gharbi–Teeb areas, have multiple aquifers that are used for different purposes such as grazing, agriculture, and, more recently, the oil industry. The water demand in the study area has increased progressively during the last decade. Mismanagement of the aquifers may lead to mining of the groundwater reserve and deterioration of groundwater quality. Therefore, it is vital to develop the groundwater resource in a manner that allows this finite resource to be managed properly.

The first step in this context is to demarcate the groundwater availability of the aquifer system in the study area, which is the main objective of this study. The groundwater availability zones of the study area were first delineated using the FR with a linear combination technique, under the assumption that all factors have the same weight. The same linear combination technique was then applied again after calculating an appropriate weight for each groundwater occurrence factor using the index of entropy approach. The results were compared and validated to choose the best method to delineate groundwater availability zones in the study area. One reason for selecting these techniques among all the powerful techniques mentioned above is that the FR is simple, easy to implement, and does not require advanced mathematical skills. In addition, combining the FR with the index of entropy highlights the importance of the factors affecting the existence of groundwater. Re-evaluating these factors allows groundwater to be studied more accurately.

The study area

The study area lies between (31°17″–33°55″) latitude and (45°38″–47°53″) longitude (Fig. 1). It extends over an area of about 7288 km2 along the Iraq–Iran border and includes the northeastern parts of Wasit and Missan governorates. The major portion of the study area is flat and featureless. Three-quarters of the study area is a gently sloping plain that occupies the southwestern parts. The remaining quarter lies in the northeastern parts, runs roughly parallel to the Iranian border, and is characterized by low anticlinal folds with intervening synclinal valleys (Parsons 1956). Elevation in the study area ranges from 6 to 691 m with an average of 45 m above mean sea level (Fig. 2). The climate of the study area is characterized by hot, dry summers, cold winters, and a pleasant spring and fall. Approximately 90 % of the annual rainfall occurs between November and April, most of it in the winter months from December to March. The remaining 6 months are dry and hot. The area receives a mean annual rainfall of approximately 212 mm, distributed unevenly between the plain and the mountainous parts. Three intermittent streams flow through the study area, all of which originate in Iranian territory. In the north of the study area, the Galal–Badra stream is regarded as the bigger one. The mean monthly discharges of this stream are 2.5 and 1000 m3/s in drought and flood periods, respectively (Al-Shammary 2006). In the south, the area is crossed by two streams, Teeb and Dewereg. The bigger one is Teeb, which enters Iraqi territory at the town of Teeb and runs from north to south until it ends in Al-Sanaf marsh outside the study area. The other stream is Dewereg, which enters Iraqi territory at the Fauqi area and runs from east to northeast until it ends in Al-Rais marsh. 
The most common landforms within the study area are valley networks, alluvial fans, flood plains, sebkhahs, Ahwar (marshes), and sand dunes (Barwary 1993). Because of the prolonged drought conditions and the intermittent nature of the streams in the study area, most farmers depend on groundwater for their irrigation needs.

Fig. 1 Location map of the study area

Fig. 2 Elevation map (m) of the study area

Rocks of Middle Miocene and Pliocene age exist in the study area. These rocks are buried beneath the Mesopotamian plain by thick deposits of Pleistocene and Holocene age. Most of the study area is covered with fluviatile, lacustrine, and aeolian sediments of recent age. The stratigraphic succession in the study area consists of the following formations: Jeriby, Fatha, Injana, Bakhtiari, and Quaternary deposits. A brief description of these units is given in Table 1. The geologic map of the study area is presented in Fig. 3. From a tectonic point of view, Iraq can be divided into three tectonically different areas: the Stable Shelf with major buried arches and antiforms but no surface anticlines, the Unstable Shelf with surface anticlines, and the Zagros Suture, which comprises thrust sheets of radiolarian chert, igneous rocks, and metamorphic rocks (Jassim and Goff 2006). The largest part of the study area lies within the Mesopotamian Zone. The Mesopotamian Zone is the easternmost unit of the Stable Shelf. It is bounded in the northeast by the folded ranges of Pesh-i-Kuh in the east, and Hemrin and Makhul in the north. The zone was probably uplifted during the Hercynian deformation, but it subsided from late Permian time onwards. The study area contains buried faulted structures below the Quaternary cover, separated by broad synclines (Buday and Jassim 1987). The fold structures mainly trend northwest to southeast in the eastern part of the zone and north–south in the southern part.

Table 1 Geological description of the lithological units in the study area

Fig. 3 Exposed lithological units

The aquifer system in the study area consists of two hydrogeological units. The first represents the shallow unconfined aquifer consisting mainly of layers of sand and gravel, with overlapping clay and silt. This hydrogeological unit is located within the Quaternary lithological layers. The second hydrogeological unit is Mukdadiya water-bearing layer. The aquifer condition of this unit is confined/semi-confined. The regional groundwater flow is from northeast to southwest. Depths to groundwater range from 26 to 162 m. The hydraulic characteristics of the two hydrogeological units were estimated by means of pumping test (Al-Shammary 2006). For the unconfined aquifer, the hydraulic conductivity, transmissivity, and specific yield were 6.3 m/day, 228.43 m2/day, and 0.012, respectively. For the confined aquifer the values were 3.5 m/day, 81.07 m2/day, and 0.0017 for hydraulic conductivity, transmissivity, and storage coefficient, respectively.

Modeling techniques

Frequency ratio FR

The frequency ratio model is based on the observed statistical association between the geographic locations of productive boreholes, flowing wells, and springs and each of the associated factors controlling groundwater occurrence in an area. The FR is the ratio of the area where boreholes occurred to the total study area. To calculate the FR, the following formula is used (Ozdemir 2011b):

$$FR = \frac{A/B}{C/D} = \frac{b}{a}$$
(1)

where A is the area of a class for the groundwater factor; B is the total area of the factor; C is the number of pixels in the class area of the factor; D is the number of total pixels in the study area; b is the percentage for area with respect to a class for the factor and a is the percentage for the entire domain.

The groundwater availability index (GAI) based on this technique is computed as (Ozdemir 2011b; Jaafari et al. 2013; Naghibi et al. 2014):

$$GAI = \sum\limits_{i = 1}^{n} {FR_{i} }$$
(2)

where FR_i is the frequency ratio for a factor and n is the total number of factors used. A detailed mathematical background of this method can be found in Lee et al. (2006).
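As an illustration of Eqs. 1 and 2, the following minimal Python sketch computes the FR for each class of one factor and sums the FR values of all factors at a single cell. It assumes the conventional reading of Eq. 1, in which b is the share of training boreholes falling in a class and a is the share of the study area (in pixels) occupied by that class. The function names and the counts are hypothetical and are not taken from Table 2.

```python
def frequency_ratio(boreholes_per_class, pixels_per_class):
    """Return the FR of each class of one factor (Eq. 1, FR = b / a).

    ASSUMPTION: b = share of boreholes in the class,
                a = share of study-area pixels in the class.
    """
    total_boreholes = sum(boreholes_per_class)
    total_pixels = sum(pixels_per_class)
    ratios = []
    for n_boreholes, n_pixels in zip(boreholes_per_class, pixels_per_class):
        b = n_boreholes / total_boreholes
        a = n_pixels / total_pixels
        # A class that covers no pixels cannot host boreholes; report FR = 0.
        ratios.append(b / a if a > 0 else 0.0)
    return ratios


def gai_equal_weights(fr_values_at_cell):
    """Eq. 2: sum the FR values of all factors at one raster cell."""
    return sum(fr_values_at_cell)


# Hypothetical factor with two classes: half of the boreholes fall in a
# class that covers only a quarter of the area, so its FR is 2.
example_fr = frequency_ratio([50, 50], [250, 750])
example_gai = gai_equal_weights([example_fr[0], 1.5, 0.8])
```

A value above 1 marks a class in which boreholes are over-represented relative to its area, which is how the FR values are interpreted in the results.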

Index of entropy

In information theory, entropy is a measure of uncertainty in a random variable (Ihara 1993). The entropy indicates the extent of the instability, disorder, imbalance, and uncertainty of a system (Yufeng and Fengxiang 2009). Index of entropy is the average unpredictability in a random variable, which is equivalent to its information content. The following equations are used to calculate the information coefficient w_j (the weight of each factor) (Bednarik et al. 2010; Bednarik et al. 2012; Constantin et al. 2011; Jaafari et al. 2013):

$$P_{ij} = FR = \frac{b}{a}$$
(3)
$$\left( {P_{ij} } \right) = \frac{{P_{ij} }}{{\sum\limits_{j = 1}^{Sj} {P_{ij} } }}$$
(4)
$$H_{j} \; = \; - \sum\limits_{i = 1}^{Sj} {(P_{ij} )\;\log_{2} \;(P_{ij} ),\;\;\;\;\;\;\;j\; = \;1, \ldots ,n}$$
(5)
$$H_{j\hbox{max} } = \log_{2} Sj$$
(6)
$$I_{j} \; = \;\frac{{H_{j\hbox{max} } \; - \;H_{j} }}{{H_{j\hbox{max} } }},\;\;\;\;\;\;\;I\; = \;(0,1),\;\;\;\;j\; = \;1, \ldots ,n$$
(7)
$$w_{j} = I_{j} P_{ij}$$
(8)

where (P_ij) is the probability density, H_j and H_jmax refer to entropy values, S_j is the number of classes, I_j is the information coefficient, and w_j is the resultant weight value for the factor as a whole. The range of w_j is between 0 and 1. The final GAI is calculated using Eq. 9:

$$GAI = \sum\limits_{j = 1}^{n} {FR_{j} \times w_{j} }$$
(9)
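The chain from Eq. 3 to Eq. 9 can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in the study. Eq. 8 does not state which P value enters the product, so the code assumes the mean FR of the factor's classes, a choice that should be checked against the cited references. The function names and FR values are hypothetical.

```python
import math


def entropy_weight(fr_classes):
    """Return (H_j, H_jmax, I_j, w_j) for one factor from its class FR values."""
    s_j = len(fr_classes)
    total = sum(fr_classes)
    if s_j < 2 or total <= 0:
        raise ValueError("need at least two classes and a positive FR sum")
    density = [p / total for p in fr_classes]                   # Eq. 4
    # Eq. 5; classes with zero density contribute nothing (0 * log 0 := 0).
    h_j = -sum(p * math.log2(p) for p in density if p > 0)
    h_max = math.log2(s_j)                                      # Eq. 6
    i_j = (h_max - h_j) / h_max                                 # Eq. 7
    # ASSUMPTION: the P in Eq. 8 is taken as the mean FR of the factor.
    p_mean = total / s_j
    return h_j, h_max, i_j, i_j * p_mean                        # Eq. 8


def gai_weighted(fr_at_cell, weights):
    """Eq. 9: weighted sum of the factor FR values at one raster cell."""
    return sum(fr * w for fr, w in zip(fr_at_cell, weights))


# A factor whose classes all have the same FR carries no information (w = 0),
# while a factor with all boreholes in one of two classes gets I_j = 1.
uniform = entropy_weight([1.0, 1.0, 1.0, 1.0])
concentrated = entropy_weight([2.0, 0.0])
```

Under this assumption a weight can exceed 1 when the mean FR of a factor is large.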

Spatial data preparation

In general, four steps must be implemented to delineate aquifer availability zones using the FR and index of entropy approaches:
(a) Collect the borehole location data and divide them randomly into two sets, training and testing, according to a specific ratio such as 70/30 or 80/20. The training data set is used solely to investigate the statistical relationship between borehole locations and the groundwater occurrence factors. The testing set is used to validate the results and to show the capability of the model to predict availability zones (blind test).
(b) Build the geospatial database. In this step, the evidential thematic layers of the groundwater occurrence factors are prepared from different sources such as conventional maps, field surveys, and RS. All thematic layers must be converted to raster format for use in further analysis.
(c) Investigate the relationship between the borehole locations (training data set) and the groundwater occurrence factors using the FR and index of entropy. The likelihood ratio for each class of each factor and the appropriate weight for each factor are computed. The GAI is then computed and classified into different classes using an appropriate classification scheme such as natural breaks or geometric interval, depending on the conditions of the study area and personal experience.
(d) Validate the results and compare the effectiveness of the models in predicting groundwater availability zones. In general, validation is carried out for the training set (success rate) and the testing set (prediction rate) using the well-known Relative Operating Characteristic (ROC) technique, or directly by comparing the borehole locations with the predicted groundwater prospective zones.
If the GAI is estimated using different techniques, the capabilities of these techniques are also compared and the model with the minimum prediction error is selected. A flow chart clarifying this procedure is presented in Fig. 4.
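The ROC validation in step (d) is commonly summarized by the area under the curve (AUC). The sketch below computes the AUC through its rank interpretation. It rests on two assumptions that the text does not state: that the AUC is the summary measure, and that GAI values at borehole cells are compared with GAI values at background (non-borehole) cells. The function name and scores are hypothetical.

```python
def roc_auc(scores_at_boreholes, scores_at_background):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen borehole cell has a higher
    GAI than a randomly chosen background cell; ties count as one half.
    """
    wins = 0.0
    for positive in scores_at_boreholes:
        for negative in scores_at_background:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5
    return wins / (len(scores_at_boreholes) * len(scores_at_background))


# Hypothetical GAI values: perfect separation gives 1.0,
# identical score distributions give 0.5 (no better than chance).
auc_perfect = roc_auc([3.0, 4.0], [1.0, 2.0])
auc_chance = roc_auc([1.0, 2.0], [1.0, 2.0])
```

Applying the function to the training boreholes corresponds to the success rate, and applying it to the testing boreholes corresponds to the prediction rate.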

Fig. 4 Flow chart of delineating groundwater availability zones

Borehole locations inventory

The borehole data were obtained from the General Commission of Groundwater/Ministry of Water Resources, Iraq. The records include relevant data such as geographical location (UTM), borehole flow rate (l/s), depth of borehole (m), type of aquifer (confined, unconfined), and physicochemical analysis of groundwater constituents. In total, there are 211 productive boreholes in the study area. Of these, only the boreholes with a relatively high flow rate (8 l/s) (137) were mapped and used for building the groundwater potential models. The flow rate was chosen after a literature review and consultation with groundwater experts (Jabar Al-Syadi, Expert, General Commission of Groundwater/Ministry of Water Resources, Iraq, personal communication). From these high-flow-rate boreholes, 95 (70 %) were randomly selected as training data and the remaining 43 (30 %) were kept for validation purposes. The commercial statistical software MINITAB v.16 was used for splitting the data.
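The study performed the split in MINITAB. For readers without that software, a random 70/30 split can be sketched in Python as follows. The function name, the seed, and the borehole identifiers are hypothetical, and the example uses 100 identifiers rather than the study's borehole count.

```python
import random


def split_boreholes(borehole_ids, train_fraction=0.7, seed=42):
    """Randomly split borehole identifiers into training and testing sets."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(borehole_ids)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers BH001 ... BH100.
all_ids = [f"BH{i:03d}" for i in range(1, 101)]
train_ids, test_ids = split_boreholes(all_ids)
```

Fixing the seed keeps the training and testing sets identical between runs, which matters when two models are compared on the same validation data.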

Selection of groundwater occurrence factors

In this study, ten groundwater occurrence factors were selected based on expert opinion and literature review. These factors were elevation (m), slope (°), curvature, aspect, topographic wetness index (TWI), stream power index (SPI), geology, soil, land use/land cover (LULC), and distance to faults. Elevation is an important factor for groundwater occurrence because weather and climatic conditions vary greatly at different elevations, and this causes differences in soil and vegetation (Aniya 1985). Slope is the rise or fall of the land surface. It is an important factor in groundwater availability mapping because it controls the accumulation of water in an area, and hence groundwater recharge (Ozdemir 2011a). Curvature is the second derivative of a surface, or the slope of the slope (Kimerling et al. 2011). It represents the morphology of the topography. There are three different types of curvature: total, profile, and plan. The profile curvature is parallel to the direction of the maximum slope and mainly affects the acceleration and deceleration of flow across the surface. A negative value of profile curvature indicates that the surface is upwardly convex, a positive value implies that the surface is upwardly concave, and a value of zero indicates that the surface is linear (Oh and Lee 2010). Plan curvature is perpendicular to the direction of the maximum slope and mainly affects the convergence and divergence of flow across the surface. A positive value indicates that the surface is sidewardly convex, a negative value indicates that the surface is sidewardly concave, and a value of zero implies that the surface is linear (Kimerling et al. 2011). The combination of the profile and plan curvature is called total curvature. Considering both plan and profile curvatures allows the flow across a surface to be understood more accurately. Aspect, or slope direction, identifies the downslope direction of the maximum rate of change in value from each cell in a raster to its neighbors (Burrough and McDonnell 1998). 
The values of aspect indicate the compass direction that the surface faces at that location. Aspect is measured clockwise in degrees from 0 (due north) to 360 (again due north), coming full circle. Because flat areas have no downslope direction, they are given a value of −1. Aspect strongly affects hydrologic processes via evapotranspiration and the direction of frontal precipitation, and thus affects weathering processes and vegetation and root development, especially in drier environments; therefore, it is considered in this study. Topographic indices such as TWI and SPI have a vital role in the spatial variation of hydrological conditions such as soil moisture, groundwater flow, and slope stability (Lee and Kim 2011). These topographic indices have been used to describe spatial soil moisture patterns (Moore et al. 1991). Geology is an important factor in groundwater potential mapping (Ozdemir 2011a, b; Lee et al. 2012; Manap et al. 2011; Nampak et al. 2014). Geology influences the occurrence of groundwater because lithological variation often leads to differences in the porosity and hydraulic conductivity of rocks and soils. Soil refers to the uppermost portion of the unsaturated zone, characterized by significant biological activity. Soil affects the amount of recharge that can infiltrate to the groundwater, and hence the groundwater storage of an aquifer. The other factor considered, LULC, defines the biological state of the earth’s surface and how people use the land for socio-economic activity. It is considered a factor for demarcating groundwater availability by many researchers (Manap et al. 2011; Nampak et al. 2014; Gumma and Pavelic 2013; Moghaddam et al. 2013). Variation in LULC categories contributes to the variation of soil conditions and subsequently groundwater occurrence. Structural setting controls the occurrence and movement of groundwater. 
Most rocks possess fractures and other discontinuities which facilitate storage and movement of fluids through them. Some discontinuities, e.g. faults and dykes, may also act as barriers to water (Singhal and Gupta 1999). Both are taken into consideration in this study as main factors contributing to groundwater availability within the study area.

Preparation of thematic raster layers

All the above-mentioned groundwater occurrence factors were prepared in raster format with a cell size of 30 × 30 m using ArcGIS 10.2 commercial software and its extensions Spatial Analyst, Geostatistical Analyst, Image Analyst, Arc Hydro, and ArcToolbox. To classify the continuous values of the influential raster layers, different classification schemes such as Jenks, equal interval, and manual were used. Data classification groups raw data into predefined classes, or bins. These classes may be represented in a map by unique symbols or, in the case of thematic maps, by a unique color or hue. The Jenks classification scheme (also called the natural breaks classification method) is a data clustering method designed to determine the best arrangement of values into different classes. The method seeks to minimize the variance within classes, so that values in the same class are as similar as possible, and to maximize the variance between classes, so that the classes are as different as possible (Jenks et al. 1967). The advantage of the natural breaks (Jenks) classification is that it identifies real classes within the data. This is useful because it creates thematic layer maps that accurately represent trends in the data. The selection of this classification scheme is based on literature reviews and the author’s experience of the study area and its conditions (Al-Abadi 2015b).
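The natural breaks objective can be illustrated with a small Python sketch that tries every possible set of breaks and keeps the one with the smallest total within-class sum of squared deviations. This exhaustive search is a stand-in chosen for clarity and is practical only for small samples; GIS packages use a more efficient optimization for the same objective. The function names and sample values are hypothetical.

```python
from itertools import combinations


def within_class_ssd(values):
    """Sum of squared deviations of the values from their class mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks(values, n_classes):
    """Group sorted values into n_classes with minimal within-class variance.

    Exhaustive search over all break positions (small samples only).
    """
    data = sorted(values)
    if not 1 <= n_classes <= len(data):
        raise ValueError("n_classes must be between 1 and the sample size")
    best_cost, best_groups = float("inf"), None
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        bounds = (0,) + cuts + (len(data),)
        groups = [data[bounds[k]:bounds[k + 1]] for k in range(n_classes)]
        cost = sum(within_class_ssd(g) for g in groups)
        if cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups


# Hypothetical raster values with three obvious clusters.
classes = natural_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3)
```

The breaks fall in the gaps between clusters rather than at equal intervals, which is the behavior the text describes.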

To create the topographic factors, i.e., elevation, slope angle, curvature, aspect, TWI, and SPI, the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER-GDEM) (http://gdem.ersdac.jspacesystems.or.jp/search.jsp) was used. The ASTER-GDEM was developed jointly by the Ministry of Economy, Trade, and Industry (METI) of Japan and the United States National Aeronautics and Space Administration (NASA). The spatial resolution of an ASTER-GDEM tile is approximately 30 m. Six raw DEM tiles were downloaded from the above web location and merged to create a new raster. The new raster was then reprojected to the UTM projected coordinate system (WGS datum, 38 N), and sinks were filled to create the filled elevation raster layer of the study area (Fig. 2a). A DEM free of sinks, also called a depressionless DEM, is the preferred input for computing the flow direction in a basin. If a cell in the DEM grid is surrounded by cells with higher elevation, water is trapped in it and cannot flow. The fill sinks function modifies the elevation values to eliminate these problems. The elevation raster layer was derived directly from the filled DEM and classified using the natural break classification scheme into five classes (Fig. 2) for use in further analysis. The slope angle (°) of the study area was derived from the filled DEM using the Spatial Analyst extension of ArcMap and is presented in Fig. 5 after being classified manually into five categories: flat-gentle slope (<5°), fair slope (5°–15°), moderate slope (15°–30°), steep slope (30°–50°), and very steep slope (>50°) (Pourghasemi et al. 2013). The curvature raster map was also derived directly from the filled DEM, and the resulting raster was classified manually into three classes: convex <0, flat 0, and concave >0 (Fig. 6). The aspect map was also prepared from the filled DEM and classified into ten classes (Fig. 7): Flat (−1), North (0–22.5) (337.5–360), Northeast (22.5–67.5), East (67.5–112.5), Southeast (112.5–157.5), South (157.5–202.5), Southwest (202.5–247.5), West (247.5–292.5), and Northwest (292.5–337.5). The TWI and SPI are defined mathematically as (Moore et al. 1991):

$$TWI = \ln \left( {\frac{a}{\tan \beta }} \right)$$
(10)
$$SPI = A_{s} \tan \beta$$
(11)

where a is the local upslope area draining through a certain point per unit contour length, β is the local slope angle in degrees, and A_s is the specific catchment area. To compute the TWI and SPI factors, the flow direction and flow accumulation layers must first be computed; these are the main steps in terrain analysis and watershed delineation. The flow direction function computes, for every cell in the filled DEM, the direction in which water would flow out of it. The value of a cell in the flow direction raster is one of eight codes (1, 2, 4, 8, 16, 32, 64, or 128), each representing the direction of one of the eight neighboring cells. The flow direction grid is used as input to create the flow accumulation layer. The flow accumulation function computes, for each cell in the filled DEM array, the number of cells flowing into it. Flow accumulation allows the area draining to any specified point in a DEM to be determined. The Arc Hydro extension of ArcGIS was used to derive these layers. The flow accumulation layer was used to derive the raster layers of TWI and SPI using Eqs. 10 and 11, and both layers were subsequently classified into four classes using the Jenks classification system (Figs. 8, 9).
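Equations 10 and 11 can be evaluated cell by cell once the flow accumulation and slope rasters exist. The Python sketch below shows the arithmetic for a single cell. It makes two assumptions that are not stated in the text: the specific catchment area is approximated as (flow accumulation + 1) × cell size, and a small minimum slope is imposed so that the logarithm in Eq. 10 stays defined on flat cells. All names are hypothetical.

```python
import math

CELL_SIZE_M = 30.0  # raster cell size stated in the text


def specific_catchment_area(flow_acc_cells, cell_size=CELL_SIZE_M):
    """ASSUMPTION: upslope area per unit contour length, taken as
    (contributing cells + the cell itself) * cell area / cell width."""
    return (flow_acc_cells + 1) * cell_size


def twi(flow_acc_cells, slope_deg, min_slope_deg=0.1):
    """Eq. 10: TWI = ln(a / tan(beta)), with beta given in degrees."""
    # ASSUMPTION: clamp the slope to avoid division by zero on flat cells.
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area(flow_acc_cells) / math.tan(beta))


def spi(flow_acc_cells, slope_deg):
    """Eq. 11: SPI = A_s * tan(beta), with beta given in degrees."""
    return specific_catchment_area(flow_acc_cells) * math.tan(math.radians(slope_deg))


# A ridge cell (no inflow) versus a valley cell (1000 cells of inflow),
# both on a 5 degree slope: the valley cell is wetter and has more stream power.
twi_ridge, twi_valley = twi(0, 5.0), twi(1000, 5.0)
spi_ridge, spi_valley = spi(0, 5.0), spi(1000, 5.0)
```

For a fixed slope, both indices grow with the contributing area; for a fixed contributing area, the TWI falls and the SPI rises as the slope steepens.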

Fig. 5 Map of slope angle (%)

Fig. 6 Map of total curvature

Fig. 7 Aspect map of the study area

Fig. 8 Raster map of TWI

Fig. 9 Raster map of SPI

The geological raster layer was prepared by converting the geologic map of the study area (Fig. 3) from polygon to raster format using the conversion tool in ArcGIS 10.2 after assigning an appropriate integer code to each lithological unit. To prepare the raster layer of soil infiltration capacity, a total of 50 soil samples were collected at a depth of 25 cm below the ground surface. The samples were placed in clean polyethylene containers and transported to the soil laboratory of the geology department for grain size analysis using a Sizer Master Instrument. The geographical location of each sample was determined with a handheld global positioning system (GPS). The samples were assigned appropriate texture names with the help of the United States Department of Agriculture (USDA) texture triangle (http://soils.usda.gov/technical/aids/investigations/texture). The soil types in the study area were then converted to soil permeability values based on soil taxonomy (USDA 1986). The average typical values of infiltration rate were interpolated using kriging in the Geostatistical Analyst extension of ArcGIS 10.2 to produce the soil infiltration rate raster layer. The soil raster layer was then classified into five classes using the Jenks system (Fig. 10) and reclassified for use in the subsequent analysis. The LULC raster layer was prepared by analyzing LANDSAT 8 images. Three raw images covering the study area were first downloaded from the official web site of the United States Geological Survey (USGS) (http://earthexplorer.usgs.gov/). The seven bands (bands 1–7) of each image were composited and radiometrically enhanced, and the three composite images were merged to create a new mosaic raster. The mosaic raster was then clipped to the study area and classified using the supervised maximum likelihood approach with the Image Classification tool in ArcGIS 10.2. 
Four LULC classes were found in the study area after comparison with ground truth: Urban, Agriculture, Barren, and Shrub (Fig. 11). The Barren and Shrub classes encompass an area of 5892 km2 (98 %). Only 120 km2 of the study area (2 %) is covered by the Urban and Agriculture classes. To create a raster map of distance to faults, the hard copy of the tectonic map of Iraq (Geological Survey of Iraq 1996) was scanned, geo-referenced, and digitized manually in ArcMap. The distance to faults raster was prepared by applying the Distance command in Spatial Analyst and reclassified by the equal interval classification scheme into ten classes (Fig. 12).

Fig. 10 Raster map of soil infiltration rates

Fig. 11 Map of LULC

Fig. 12 Distance to faults map

Results and discussion

The ten thematic raster layers were overlaid with the borehole location inventory map to apply the FR model and the combined FR-index of entropy model. The results of applying the FR and index of entropy approaches are summarized in Table 2. The FR relates the area where high-flow-rate boreholes occur to the whole area, so a value of 1 indicates an average correlation (Moghaddam et al. 2013). A value greater than 1 indicates a high correlation, and a value lower than 1 indicates a lower correlation (Lee and Pradhan 2006). Analysis of the FR results indicates that the elevation ranges 30–71 and 30–145 have high FR values (2.30 and 2.65, respectively), indicating a high probability of groundwater availability for these classes. The FR values of the remaining elevation ranges were zero or approximately zero, indicating a low probability of groundwater availability. It is accepted that groundwater occurrence decreases with altitude (Moghaddam et al. 2013). Higher altitudes have higher runoff, whereas lower altitudes have more recharge and a higher infiltration rate (Manap et al. 2011). In the case of slope, the FR is greater than 1 for the slope range 5–15, indicating high groundwater availability associated with this range. The rest of the classes have an FR below 1, indicating low groundwater availability. It is accepted that as the slope increases, the runoff increases as well (Israil et al. 2006). According to Madrucci et al. (2008), slopes greater than 35° are regarded as a constraint on groundwater occurrence. In the case of curvature, all three classes have FR values less than 1, implying that this factor plays only a minor role in controlling groundwater occurrence in the study area. The only class with an FR value approaching significance was the flat class. Flat areas are suitable for the accumulation of water and thus enhance groundwater recharge and replenishment. 
With respect to TWI, the class −2.44 to 8.37 has the highest FR value (1.15), followed by the 8.37–9.24 and 9.24–10.40 classes (0.99 and 0.96, respectively). In the case of SPI, the FR increases as the SPI increases. The highest FR value (25.92) is associated with the last class, 6.68–18.21. In the case of geology, the higher FR values are associated with Quaternary deposits (inner flood, alluvium, and alluvial flood sediments), with FR equal to 1.84, 1.76, and 1.44, respectively, implying that these lithological units have a higher probability of groundwater occurrence in the study area. In the case of LULC, the urban and agriculture classes have high FR values (13.44 and 13.38, respectively), indicating the favorability of these classes for groundwater occurrence. For the soil factor, the FR values are close to or greater than 1 for the highest infiltration rates (classes 5.65–7.16 and 7.16–10, with FRs of 0.99 and 4.01, respectively). A higher soil infiltration rate increases the opportunity for water percolation, and thus groundwater recharge and groundwater availability within an area. In the case of distance to faults, the highest FR is found in the first class, 0–2713 m (2.82), followed by the third class, 5797–8880 (1.01). The FRs for the remaining classes equal zero or approach zero. This indicates the importance of the structural setting for groundwater availability in the study area.

Table 2 Computation of FR and entropy weights

The groundwater availability index (GAI) was first computed using Eq. 2, under the assumption that all groundwater factors have the same influence on groundwater availability, and is shown as a map in Fig. 13. The GAI values range from 3.97 to 58.20 and were classified with the Jenks classification scheme into five classes: very low (3.97–8.73), low (8.73–11.94), moderate (11.95–17.45), high (17.49–26.79), and very high (26–58.20). The areas occupied by each of these classes are summarized in Table 3. The very low to low classes cover 72 % of the area (5302 km²), the moderate class extends over 20 % (1447 km²), and the high to very high classes occur within only 8 % (539 km²). The large area occupied by the very low to low classes indicates the low groundwater availability of the aquifer system in the study area. The spatial distribution of these classes reveals that the high and very high classes are concentrated in the center of the Badra area, north of the study area, and extend over scattered areas in the Ali Al-Gharbi and Teeb areas, mainly close to the Iraq–Iran border.
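The equal-influence overlay and the class-area summary described above can be sketched as follows. This is a sketch under stated assumptions: it takes the GAI to be the cell-wise sum of the FR-reclassified layers, which is how the equal-influence assumption is read here, and it reuses the upper class bounds reported in the text instead of recomputing Jenks breaks. The function names and the tiny input layers are hypothetical.

```python
import numpy as np

# Upper bounds of the first four classes, as reported in the text for the FR model.
BREAKS = np.array([8.73, 11.94, 17.45, 26.79])
LABELS = ["very low", "low", "moderate", "high", "very high"]

def gai_equal_weight(fr_layers):
    """Cell-wise sum of FR-reclassified factor layers (all factors weighted equally)."""
    return np.sum(np.asarray(fr_layers, dtype=float), axis=0)

def class_area_share(gai, breaks):
    """Fraction of cells falling in each class defined by the given upper bounds."""
    idx = np.digitize(gai, breaks, right=True)  # 0 = lowest class
    counts = np.bincount(idx.ravel(), minlength=len(breaks) + 1)
    return counts / gai.size

# Two hypothetical 2x2 FR layers (the study uses ten layers).
fr_layers = [
    [[2.0, 4.0], [6.0, 10.0]],
    [[3.0, 6.0], [9.0, 20.0]],
]

gai = gai_equal_weight(fr_layers)
shares = class_area_share(gai, BREAKS)
for label, share in zip(LABELS, shares):
    print(f"{label}: {share:.0%}")
```

Multiplying each share by the total study area would give class areas comparable to those listed in Table 3.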

Fig. 13 Groundwater availability zones derived by FR model

Table 3 Distribution of the groundwater availability classes and areas covered by each of these classes

The weights computed for each factor using the index of entropy model (Table 2) indicate that the most influential factors on groundwater availability in the study area were LULC, soil, elevation, slope, and geology. The computed weights for these factors were 2.258, 1.528, 0.4926, and 0.2327, respectively. The weights were 0.1915, 0.1811, 0.0197, 0.0140, and 0.0113 for distance to faults, SPI, aspect, TWI, and curvature, respectively, indicating that these factors play a minor role in controlling groundwater occurrence. The combined FR-index of entropy model was computed using Eq. 9. The computed weight of each factor was multiplied by the associated factor thematic layer, and the products were summed to produce the GAI. The GAI values obtained ranged from 2.23 to 50.22. These were classified into five classes using the Jenks scheme: very low (2.24–4.30), low (4.30–5.49), moderate (5.49–8.34), high (8.34–17.86), and very high (17.86–50.22) (Fig. 14). The very low to low classes cover 70 % of the area (5118 km²), the moderate class extends over 16 % (1155 km²), and the high to very high classes occur within only 14 % (1015 km²). The high to very high zones are concentrated in the center of the Badra area and the northeastern parts of the Ali Al-Gharbi and Teeb areas, close to the Iraq–Iran border. The results are consistent with the first model, but the area covered by the high to very high classes is approximately double that obtained by applying FR alone. The results of both models confirm that groundwater availability within the study area is low.
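The weighting and overlay steps can be sketched as below. Eq. 9 is not reproduced in this section, so the overlay follows only the verbal description given above (each weight multiplied by its layer, then summed). The weight function uses one commonly cited formulation of the index of entropy, in which the information coefficient is scaled by the mean FR of the factor; that scaling, the function names, and all input values are assumptions made for illustration and may differ from the equations used in this study.

```python
import numpy as np

def entropy_weight(fr_values):
    """Index-of-entropy weight for one factor from the FR values of its classes.

    Assumed formulation: normalise FR to probabilities, compute Shannon entropy
    (base 2), convert it to an information coefficient, then scale by the mean FR.
    Requires at least two classes.
    """
    fr = np.asarray(fr_values, dtype=float)
    p = fr / fr.sum()
    nz = p[p > 0]                       # treat 0 * log2(0) as 0
    h = -np.sum(nz * np.log2(nz))       # entropy of the class distribution
    h_max = np.log2(fr.size)            # entropy of a uniform distribution
    info = (h_max - h) / h_max          # 0 = uninformative, 1 = fully concentrated
    return float(info * fr.mean())

def gai_weighted(fr_layers, weights):
    """Cell-wise weighted sum of FR-reclassified factor layers."""
    layers = np.asarray(fr_layers, dtype=float)
    w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    return np.sum(w * layers, axis=0)

# Hypothetical FR values per class for two factors.
w_uniform = entropy_weight([1.0, 1.0, 1.0, 1.0])       # FR does not vary by class
w_concentrated = entropy_weight([0.0, 0.0, 0.0, 4.0])  # FR concentrated in one class

# Hypothetical 1x2 FR layers for two factors, combined with illustrative weights.
gai = gai_weighted([[[1.0, 2.0]], [[3.0, 4.0]]], [2.0, 0.5])
print(w_uniform, w_concentrated, gai)
```

Under this formulation, a factor whose FR is identical across classes receives a weight of zero, which matches the interpretation that factors with small weights contribute little to discriminating groundwater occurrence.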

Fig. 14 Groundwater availability zones produced by FR-index of entropy model

The next step in the analysis is to validate the results. The relative operating characteristic (ROC) was used in this study to examine the accuracy of the predictive models. The ROC is a useful method for representing the quality of deterministic and probabilistic detection and forecast systems (Swets 1973). It is a graphical chart that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is constructed by plotting the true-positive rate (sensitivity) on the Y axis against the false-positive rate (1 − specificity) on the X axis at various threshold settings. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve (AUC) characterizes the quality of a forecast system by describing the system's ability to anticipate correctly the occurrence or non-occurrence of predefined 'events' (Mason and Graham 2002). In other words, the AUC measures how well a parameter can distinguish between two groups (in the original diagnostic setting, diseased/normal). The relationship between AUC and prediction accuracy can be summarized as follows (Yesilnacar and Topal 2005): poor (0.5–0.6), average (0.6–0.7), good (0.7–0.8), very good (0.8–0.9), and excellent (0.9–1). The AUC was obtained for both the training data (success rate) and the testing data (prediction rate) for both models using the ROC module in the IDRISI software (Fig. 15). The success rate explains how well the resulting GAI map classifies the area of existing borehole locations. The success rate results were obtained by comparing the training borehole locations (95) with the two GAI maps.
The AUC values for the FR and combined FR-index of entropy models were 0.832 and 0.834, respectively, indicating that the FR-index of entropy model performs slightly better than FR alone. The prediction rate is used as a measure of the predictive capability of a model. It uses only the testing data points to investigate prediction performance. The prediction rates for both models are shown in Fig. 15. The prediction rates for the FR and FR-index of entropy models were 0.804 and 0.806, respectively, implying that the FR-index of entropy model was slightly better than FR. These results show that both models were capable of delineating groundwater availability zones in the study area with very good results, but the FR-index of entropy model was slightly better than FR alone.
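The ROC construction described above can be sketched in a few lines. This is not the IDRISI ROC module used in the study; it is a minimal illustration in which every distinct score is used as a threshold and the AUC is obtained with the trapezoidal rule. The scores and the binary labels (1 for a location with a borehole, 0 otherwise) are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by thresholding at each distinct score."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = labels.sum()
    n_neg = (~labels).sum()
    fpr, tpr = [0.0], [0.0]                       # curve starts at the origin
    for t in np.unique(scores)[::-1]:             # thresholds from high to low
        predicted = scores >= t
        tpr.append((predicted & labels).sum() / n_pos)    # sensitivity
        fpr.append((predicted & ~labels).sum() / n_neg)   # 1 - specificity
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical GAI-like scores and observed presence (1) or absence (0) of boreholes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

fpr, tpr = roc_points(scores, labels)
area = auc(fpr, tpr)
print(round(area, 3))
```

Running the same calculation once with the training locations and once with the testing locations would give the success rate and the prediction rate, respectively.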

Fig. 15 Validation results using ROC curves

Conclusions

The main conclusions from this study are: (a) The FR and FR-index of entropy approaches, combined with RS and GIS technologies, provide a powerful tool for delineating groundwater availability in an arid region. (b) The predictive capability of the FR-index of entropy model is slightly better than that of FR alone; the prediction rates for FR and FR-index of entropy were 0.804 and 0.806, respectively. (c) Although the two models resulted in approximately similar prediction rates, the main advantage of combining the models is that it highlights the importance of the factors influencing groundwater occurrence and allows more in-depth study of these factors and their contribution to future management of the area's groundwater resources. (d) The weights computed using the index of entropy approach indicate that the most influential factors on groundwater availability in the study area were LULC, soil, elevation, slope, and geology. The computed weights for these factors were 2.258, 1.528, 0.4926, and 0.2327, respectively. The weights were 0.1915, 0.1811, 0.0197, 0.0140, and 0.0113 for distance to faults, SPI, aspect, TWI, and curvature, respectively, indicating that these factors play a minor role in controlling groundwater occurrence. (e) The very low to low groundwater availability zones occupy 72 and 70 % of the total area for the FR and FR-index of entropy models, respectively, indicating that groundwater availability is low. (f) The results of this study could be used for efficient management of groundwater resources in the study area, taking into account the challenges facing the water resources of Iraq.