Investigation of distinctive physico-chemical soil correlations for Kirkuk city using spatial analysis technique incorporated with statistical modeling

Under challenging economic conditions, it is difficult to perform laboratory tests across an entire city to define the soil characteristics needed for a given task. Hence, this study applies spatial analysis, using the inverse distance weighted (IDW) technique, to the available data for 56 soil samples collected from various depths in Kirkuk city. The studied characteristics are classified into physical soil properties (gravel, sand, silt, and clay contents) and chemical soil properties (organic content, gypsum content, pH, sulfur trioxide (SO3), and total dissolved solids (TDS) content). Moreover, statistical investigations, namely physico-chemical correlation and linear single and linear multi regression models, have been applied. The study constructs thematic digital maps for visualizing the physical and chemical characteristics of Kirkuk soil. In the statistical analysis, negative linear correlations are obtained between the gravel content and the pH, gypsum, organic, and SO3 contents, whereas a positive linear correlation is obtained between the gravel content and the TDS content. Furthermore, the proposed linear multi regression model predicts the pH values reasonably well, with most values close to the equality line. The digital maps, together with the statistical models, provide comprehensive spatial, agricultural, chemical and mechanical visualizations of the nature and morphology of the soils in Kirkuk city.

the landfill areas to avoid the population extents, prevent contamination of any nearby drinking and irrigation groundwater, and avoid threats to human health [4]. Preparing even a simple digital soil map requires millions of pixels or cells. A geographical information system (GIS) is an assembly of automated functions and records that are coupled for the purpose of retrieving, storing, managing, and analyzing information [12]. A GIS can produce maps that combine different aspects such as roads, industry, prospective pollution sources, crop supplies and degradation distributions [20]. Currently, soil scientists hold a huge database of borehole data such as borehole logs, borehole plans, borehole tests, and surface and subsurface soil properties. Most of these soil data are kept as hard copies in boxes, drawers, and file cabinets, and finding such information is very time consuming. It is generally very challenging and costly to identify the soil properties for any agricultural, civil or geotechnical engineering purpose. GIS applications were introduced as an advanced technology for classifying potential geological hazards, arranging and tracking fieldwork, generating maps and images, and expanding communications [1,25]. A land use regression model incorporated with geographical information analysis was implemented for measuring different weather seasons in Iran [22].
It is recognized that soil plays a vital role in miscellaneous activities all over the world. Soil has been shown to have a major influence on the biological environment and on human health since it is a chemical source for plants [36]. The sorption of insecticides in any soil relies on several significant soil characteristics such as pH, organic content and clay content [19]. Increasingly, attention has focused on the character and function of large voids in the soil with respect to both chemical and water movement. The hypothesis is that macropores act as preferential flow paths, and observations reveal that such voids allow fast travel of both chemicals and water [7]. The controlling factor during infiltration is the soil permeability rather than the effect of capillary forces. In addition, highly permeable soils such as gravels and sands can expose an unconfined groundwater aquifer to contamination from surface sources [8,33]. Common clay minerals such as kaolinite, illite and smectite have great impacts on several soil characteristics, including organic matter, soil nature, cation exchange capacity and dispersity [31]. Clay minerals are increasingly recognized as an essential component of soils that affects the retention, bioavailability and mobility of different heavy metals in certain environments [34]. Moreover, clayey soils can be considered fundamental barriers against liquid contaminants. Thus, information on the distribution of clayey soils supports a thorough understanding of where the spread of contamination is resisted [5]. Contaminated soil can cause miscellaneous engineering problems such as differential settlement, damage to structures due to soil leachate, and cracking and inclination of the ground surface [15]. In addition, the locations and volumes of contaminated soils are needed for the remediation process.
However, soils that are problematic for engineering could be fertile lands for agricultural uses. Thus, collecting the various available agricultural, geotechnical, chemical and physical soil characteristics in several digital maps is essential, specifically in Kirkuk city, where such maps are necessary to identify the right places for various applications.
Physico-chemical soil analysis has been applied in agriculture in several ways. Physico-chemical soil characterization has been used for determining the biological properties of soils in different agricultural locations in Italy [21]. The physico-chemical analysis of spiroxamine hydrolysis has been examined under various pH values to improve knowledge about the effect of water and its degradation impacts [9]. Moreover, physico-chemical investigations of both wastewater and groundwater have been carried out to check the quality of such water for drinking and agricultural purposes, where the measurements included pH, temperature and conductivity [6,10]. In addition, sorption in ten agricultural soils collected from various places has been tested, with pH, organic content, electrical conductivity, clay content, gravel content, and silt content measured [37]. Furthermore, the physico-chemical characteristics of both corn and rice, including the development of thin flaked rice, have been inspected [3,18]. Likewise, physico-chemical analysis has been applied to different agricultural products such as pomegranate [16], papaya leaf powder [32], and Indian wheat [30]. Thus, it is essential to identify the distribution of each soil type in order to assess the compatibility between the physico-chemical properties of the soil and the planted agricultural products.
Spatial analysis of heterogeneous soil characteristics plays an essential role in improving soil mapping and surveys, developing reliable information for precise fertilization and enhancing nutrient management in soils [23]. Spatial analysis for soil physico-chemical classification has been used to produce various maps of Iranian soils with distinctive properties [24]. In addition, it has been shown that phosphorus variations in soils require detailed temporal and spatial investigation for a better understanding of phosphorus management systems [11].
Based on ESRI specifications [13], GIS applications have been customized for natural landslide searching, habitat mapping, agricultural reclamation and air pollutant monitoring. In addition, an expert-based scheme was developed to provide two-dimensional visualizations of ground conditions [2]. An integrated geographical information system and database geostatistical classification has been developed to protect subsurface records. The Arc GIS system can link soil formations with their properties at any spot within the subject region.
The main objective of this study is to provide a full database of the available soil information obtained from 56 soil samples at various depths in Kirkuk city using spatial analysis tools with the inverse distance weighted method. The specific objectives are, first, to produce digital maps for visualizing the physical and chemical soil characteristics of Kirkuk city, including the distributions of gravel, sand, silt, clay, pH, gypsum, total dissolved solids, organic and sulfur trioxide contents, and second, to predict the physico-chemical behavior of Kirkuk soils using single and multi linear regression models. The resulting digital maps and statistical models can provide detailed physical, agricultural, chemical and mechanical visualizations of the behavior and features of the soils in Kirkuk city.

The study area
Kirkuk is one of Iraq's governorates, with a heritage dating back about 5000 years [28]. Kirkuk is located at latitude 35° 28′ 5″ north and longitude 44° 23′ 31″ east, about 350 m above sea level, in UTM projection zone 38 N. The area of Kirkuk is about 9679 km², which represents 2.2% of the entire area of Iraq [29]. Kirkuk contains 13 governmental offices forming four districts, namely Kirkuk, Al-Hawiga, Daquq, and Maxmur, as shown in Fig. 1.

Methodology
Assembling geographic information is considered the most costly and time-consuming part of developing a database for a GIS application. The process of collecting and rearranging data in GIS is split into two groups of processes: assembling and integrating information. Data collection involves entering new data into the GIS, whereas data integration involves converting data and preserving its integrity. If the spatial distribution of any studied characteristic is random, global autocorrelation analysis is permitted to be used in GIS [27]. Moreover, spatial analysis can be used with a standardized spatial weights matrix [14]. It is worth mentioning that the progress and use of spatial methods has sped up the development of landform arrangement [35].
The main source of the data is the Kirkuk constructional laboratory for material testing. In addition, fieldwork observations from site investigations have been used: soil data for 56 samples from different depths were collected, comprising physical and chemical properties together with the latitude and longitude of each point recorded with a global positioning system receiver (GS 20 Leica). Examples of the geographical, physical and chemical properties of soil samples from Kirkuk city are summarized in Table 1.
All source data were entered in an Excel sheet with the latitude and longitude of each investigated point alongside the laboratory data. The thematic map (shape file) was created using the tools in Arc Map. The interpolation maps were created by interpolating a surface from the points using the inverse distance weighted (IDW) technique, as shown in Fig. 2.
The inverse distance weighted procedure is simple, widely used, and one of the most important interpolation methods. The IDW algorithm assumes that the value at an unsampled point can be estimated as a weighted average of the values at neighboring sampled points, with nearer points contributing more to the estimate than points farther away.
In the IDW estimate, Y_o = estimated value of the variable z at point I; y_i = sample value at point I; d_i = Euclidean distance between the observed sample point and the estimated point; N = a weighting coefficient based on distance; n = inverse distance weighting power.
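For orientation, the symbols listed above fit the standard textbook form of the IDW estimator. The expression below is that standard form, not necessarily the exact equation used by the authors; the summation limit m (the number of sample points in the neighborhood search) is a symbol introduced here for illustration only.

```latex
Y_o = \frac{\sum_{i=1}^{m} y_i \, d_i^{-n}}{\sum_{i=1}^{m} d_i^{-n}}
```

With n = 0 this reduces to the plain average of the neighboring samples, and larger n concentrates the weight on the nearest samples.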
The interpolation method used is based on a distance-weighted average, also known as a potential model method. In all these models, the exponent applied to the distance is defined by the user. In the created digital maps, a surface calculated by IDW is a function of both the sample values and the neighborhood search strategy. Both the maximum and minimum values of any surface can be exactly interpreted using IDW. The developed digital maps are strongly affected by any clustering or outliers. IDW assumes that the modeled phenomenon is driven by local variation, which can be captured using a suitably defined neighborhood search.
Utilizing parameters obtained from part of the input data, the outliers have been evaluated. IDW interpolation explicitly relies on the assumption that things that are close to one another are more alike than those that are farther apart. IDW uses the measured values surrounding the prediction location to estimate a value at that location, and the measured values closest to the prediction location have more influence on the predicted value than those farther away. IDW assumes that each measured point has a local influence that diminishes with distance. It gives greater weights to the points nearest to the prediction location, and the weights decrease as a function of distance, hence the name inverse distance weighted.
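The weighting idea described above can be sketched in a few lines of Python. This is a minimal illustration and not the Arc GIS implementation: it assumes planar coordinates, uses every sample rather than a neighborhood search, and takes a power of 2 as an arbitrary default, since the exponent is user-defined. The function name `idw_estimate` and the sample values are hypothetical.

```python
import math

def idw_estimate(samples, target, power=2.0):
    """Estimate a value at `target` from (x, y, value) samples by IDW.

    Assumptions (not from the paper): planar coordinates, all samples
    used, default power of 2.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # A sampled location returns its own measured value.
            return value
        weight = distance ** -power  # weight decreases with distance
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical gravel contents (%) at three illustrative locations.
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 30.0), (0.0, 10.0, 20.0)]
at_sample = idw_estimate(samples, (0.0, 0.0))
between = idw_estimate(samples, (5.0, 0.0))
print(at_sample, between)
```

Because the estimate is a weighted average with positive weights, it always lies between the smallest and largest sample values, which is why outliers and clustered samples have a visible effect on the resulting map.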

Physical soil distribution
The gravel distribution of the Kirkuk map is analyzed using the IDW interpolation method. The studied area can be classified into five main classes in terms of gravel content, namely very low, low, moderate, high, and very high, with the numerical ranges shown in Fig. 3a. The first class ranges from 0.007 to 9.54%, whereas the fifth class ranges from 32.07 to 75.21%; the fifth class represents the highest stable area, is identified by the green color, and is distributed in the center of the studied area. The second class ranges from 9.54 to 17.71%, is shown in light brown, and is distributed in the eastern and northeastern parts of the studied area. The third class ranges from 17.71 to 25.67%; this moderate class is shown in light yellow and is distributed in the central, eastern, southwestern, and northern parts of the studied area. The map clearly shows that the gravel content is highest in the middle and western zones of Kirkuk city and lowest in the northern and middle southern zones. It should be noted that areas with high gravel content have several characteristics such as high bearing capacity, high permeability, and high water and chemical infiltration capability. In addition, the high gravel content areas are more susceptible to soil boiling under the impact of any expected earthquakes. Similarly, the sand distribution of the Kirkuk map is analyzed using the Arc GIS technique as shown in Fig. 3b. The sand and gravel distributions are almost similar on the Kirkuk map: the sand content is also highest in the middle and western zones of Kirkuk city and lowest in the northern and middle southern zones. The sand content ranges from 1.20 to 15.01% for the very low zone, 15.01 to 20.6% for the low zone, 20.6 to 26.78% for the moderate zone, 26.78 to 32.71% for the high zone, and 32.71 to 64.03% for the very high zone.
In a similar manner, the high sand content areas are considered coarse-grained zones with relatively high bearing capacity, high permeability, and high water and chemical infiltration potentials. Moreover, such high sand content zones are exposed to soil boiling under the effect of any anticipated earthquakes. Furthermore, the zones of high gravel and sand contents are not suitable for agricultural purposes since the water retention capability of such soils is low.
In addition, the Arc GIS technique is used to examine the silt distribution of the Kirkuk map as shown in Fig. 3c. The silt content is highest in the center of the northern zones of Kirkuk city and lowest in the middle central to middle eastern zones. The silt content ranges from 1.46 to 24.92% for the very low zone, 24.92 to 32.81% for the low zone, 32.81 to 40.38% for the moderate zone, 40.38 to 47.57% for the high zone, and 47.57 to 73.73% for the very high zone. It is worth mentioning that the zones with high silt contents are the most appropriate areas for agricultural purposes since such soils can retain water and the necessary chemical additives properly.
The clay distribution of the Kirkuk map is examined using the Arc GIS technique as demonstrated in Fig. 3d. The clay content is highest in the middle southern and upper northern zones of Kirkuk city and lowest in the western zone of the city. The clay content ranges from 0.02 to 9.2% for the very low zone, 9.2 to 14.91% for the low zone, 14.91 to 21.35% for the moderate zone, 21.35 to 28.39% for the high zone, and 28.39 to 52.93% for the very high zone. It should be noted that the areas with high clay contents have specific features such as low permeability and low water and chemical infiltration potentials. In addition, the high clay content zones exhibit a high expected consolidation settlement that might require a long time to complete as any generated excess pore water pressure dissipates. Moreover, such high clay content areas can act as a barrier against heavy metal contamination.
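The five-class zoning used for the maps in this section amounts to binning each interpolated value against four break points. The sketch below uses the gravel ranges quoted earlier in this section; the bounds of the "high" gravel class (25.67 to 32.07%) are not stated explicitly and are inferred from the neighboring classes. Assigning a value that falls exactly on a break to the upper class is an assumption, and the function name `classify` is hypothetical.

```python
from bisect import bisect_right

# Upper bounds of the first four gravel classes (%), from the quoted ranges.
# The 25.67-32.07 "high" class is inferred, not stated in the text.
GRAVEL_BREAKS = [9.54, 17.71, 25.67, 32.07]
CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]

def classify(value, breaks=GRAVEL_BREAKS):
    """Return the class name for a content value given ascending breaks."""
    # bisect_right counts how many breaks are <= value, which is the
    # index of the class; a value equal to a break goes to the upper class.
    return CLASS_NAMES[bisect_right(breaks, value)]

for gravel in (5.0, 12.3, 20.0, 30.0, 50.0):
    print(gravel, classify(gravel))
```

The same function can be reused for sand, silt, or clay by passing the corresponding break points as `breaks`.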

Chemical soil distribution
The Arc GIS technique was used to evaluate the pH distribution of the Kirkuk map as shown in Fig. 4a. The pH is highest in the southeastern zone of the city and lowest in the southwestern zone. The pH ranges from 6.13 to 7.51 for the very low zone, 7.51 to 7.88 for the low zone, 7.88 to 8.25 for the moderate zone, 8.25 to 8.59 for the high zone, and 8.59 to 9.69 for the very high zone. The zones with low pH might have negative consequences for the fate of any organic or inorganic chemicals, erosion, and leaching losses.
In addition, the Arc GIS technique is also used to investigate the gypsum distribution of the Kirkuk map as shown in Fig. 4b. The gypsum content is highest in the western zone of the city, while the lowest content extends from the middle to the southeastern zone. The gypsum content ranges from 0.22 to 0.82% for the very low zone, 0.82 to 1.19% for the low zone, 1.19 to 1.66% for the moderate zone, 1.66 to 2.36% for the high zone, and 2.36 to 5.75% for the very high zone. Places with high gypsum content are more susceptible to differential settlement, particularly in rainy weather, since gypsum dissolves in water and can be washed away, producing large uneven voids in the soil. Thus, rigorous caution should be taken in designing the sewage system in such places to prevent geotechnical problems under the foundations of built structures.
Arc GIS, or Arc Map with the Arc tool map technique, is used to study the total dissolved solids (TDS) distribution of the Kirkuk map as shown in Fig. 4c. The TDS content is highest in the southwestern zone of the city and lowest in the middle western zone. The TDS content ranges from 0.02 to 1.08% for the very low zone, 1.08 to 1.79% for the low zone, 1.79 to 3.01% for the moderate zone, 3.01 to 4.49% for the high zone, and 4.49 to 15.18% for the very high zone. The zones with high TDS content are more toxic due to the increase in salinity, alterations in the ionic composition of the water, and the toxicity of individual ions. Thus, such high TDS areas might have constrained biodiversity, changes in biotic communities, and lasting consequences at certain life stages.
The organic distribution of the Kirkuk map is examined using Arc Map with the Arc tool map technique as demonstrated in Fig. 4d. The organic content is highest in the southwestern zone of the city, while the lowest content extends from the northeastern to the southeastern zone. The organic content ranges from 0.045 to 0.36% for the very low zone, 0.36 to 0.63% for the low zone, 0.63 to 1% for the moderate zone, 1 to 1.44% for the high zone, and 1.44 to 3.49% for the very high zone. Places with high organic content could be beneficial for agricultural purposes. In addition, a high organic content has a positive impact on the sorption of insecticides, and such soils represent biochemical resources for plants.
Finally, the sulfur trioxide (SO3) distribution of the Kirkuk map is analyzed using the Arc GIS technique as shown in Fig. 4e. The SO3 content is highest in the southwestern zone of the city and lowest in the southeastern zone. The SO3 content ranges from 0.06 to 0.34% for the very low zone, 0.34 to 0.54% for the low zone, 0.54 to 0.79% for the moderate zone, 0.79 to 1.13% for the high zone, and 1.13 to 3.22% for the very high zone. It is important to state that SO3 is a very toxic and corrosive material whose poisonous effect exceeds that of SO2 by more than ten times [17]. In addition, SO3 can cause health problems in human beings, such as asthma. Hence, the zones with high SO3 content represent places of extreme risk to human health.

Physico-chemical correlation
The correlations between the physical soil characteristics (gravel, sand, silt and clay contents) and the contents of pH, gypsum, TDS, organic, and SO3 are shown in Table 2. There are significant positive correlations between the physical properties of the Kirkuk soil, with the highest degree of correlation, up to 0.99, between gravel and clay contents. In addition, there are positive correlations between the contents of gravel, sand, silt and clay and the pH, with the highest degree of correlation, up to 0.55, between clay content and pH. However, negative correlations can be observed between the contents of gravel, sand, silt and clay and the gypsum content, where the strongest negative correlation, up to −0.19, was obtained between the gravel and clay contents and the gypsum content. Similarly, negative correlations are obtained between all the studied physical soil properties and the TDS content, with the strongest negative correlation of −0.30 between gravel and TDS contents. Moreover, negative correlations are obtained between all the studied physical soil properties and the organic and SO3 contents, with the strongest negative correlation of −0.19 between gravel and SO3 contents. It is worth mentioning that the correlations among the chemical soil properties vary between positive and negative, with the highest positive correlation of 0.84 between gypsum and SO3 contents and the strongest negative correlation of −0.30 between pH and TDS contents.
In Kirkuk city, a strong positive correlation between gravel and clay (0.99) has been observed, whereas a weak positive correlation between gravel and clay (0.384) has been reported in the literature [26]. This disparity is primarily related to the topographic distribution of the soil, since a wide range of soil samples has been obtained all over the Kirkuk area with no restriction to a particular soil type. In addition, a strong positive correlation between sand and clay (0.97) has been observed in Kirkuk city, whereas a strong negative correlation between sand and clay (−0.866) has been stated in the literature [24]. This variation is attributed to the difference in the geological soil characteristics between the studied and compared areas. Moreover, a significant positive correlation between sand and silt (0.91) has been obtained, while a good negative correlation between sand and silt (−0.79) has been specified in the literature [24]. Similarly, this difference is attributed to the divergence in the geological soil characteristics between the examined and compared areas. Furthermore, a high positive correlation between silt and clay (0.92) has been obtained, whilst a weak negative correlation between silt and clay (−0.35) has been determined in the literature [24]. The key explanation could be the different grain size boundaries between silt and clay in various soil classification schemes, together with the different geological distributions of the inspected and compared areas.
There was a moderate positive correlation between pH and sand (0.53), while a strong positive correlation between pH and sand (0.778) was found in the literature [37]. The key discrepancy between the degrees of correlation might be attributed to the sample size: in the comparative study, ten soil samples were collected from one agricultural area, whilst the present study included a wide variety of soil samples distributed all over Kirkuk city. By comparison, a moderate positive pH-silt correlation (0.45) was observed, whereas a strong negative pH-silt correlation (−0.814) was recorded in the literature [37]. Similarly, this contrast might be due to the difference in the number of soil samples and in the composition of the environment surveyed in the inspected and compared regions. In addition, a moderate positive correlation between pH and clay (0.55) has been obtained, while a weak negative correlation between pH and clay (−0.027) has been reported in the literature [24]. This variation is due to the difference in geological soil characteristics between the examined and compared areas.
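A table of pairwise correlation coefficients such as the one discussed in this section can be computed as sketched below. The sketch assumes the reported "degree of correlation" is the Pearson coefficient, which the text does not state explicitly; the function names and all data values are hypothetical and are not taken from Table 1 or Table 2.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_table(columns):
    """Map every ordered pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b])
            for a in names for b in names}

# Hypothetical values for five samples (illustration only).
columns = {
    "clay": [5.0, 10.0, 15.0, 20.0, 25.0],
    "pH": [7.2, 7.6, 7.5, 8.1, 8.4],
    "gypsum": [2.1, 1.8, 1.9, 1.2, 0.9],
}
table = correlation_table(columns)
for pair, r in table.items():
    print(pair, round(r, 2))
```

A coefficient near +1 or −1 indicates a nearly linear relation, while values near 0 indicate little linear association, which is the sense in which the terms strong, moderate and weak are used above.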

Linear regression model
A linear regression model has been proposed to predict the chemical soil properties based on the observed physical soil properties as follows:
(2) Chemical Soil Property = A * Physical Soil Property(%) + B
where A and B are model parameters.
A summary of the details of the proposed model (Eq. 2) is given in Table 3. The least squares method has been used to fit the linear regression model. In Table 3
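A least squares fit of a single-predictor linear model can be sketched as follows. The sketch assumes that A is the slope and B the intercept, which the text does not state; the function name `fit_line` is hypothetical, and the data are synthetic values lying exactly on a line so that the result can be checked by hand. They are not values from Table 3.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = A * x + B; returns (A, B)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # A (assumed to be the slope)
    intercept = mean_y - slope * mean_x   # B (assumed to be the intercept)
    return slope, intercept

# Synthetic illustration: pH decreasing with gravel content (%).
gravel = [0.0, 10.0, 20.0, 30.0, 40.0]
ph = [8.5, 8.3, 8.1, 7.9, 7.7]
A, B = fit_line(gravel, ph)
print(round(A, 4), round(B, 4))
```

The sign of A gives the direction of the linear relation, so a negative A corresponds to a chemical property that decreases as the physical property increases.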

Linear multi regression model
A linear multi regression model has been proposed to predict the chemical soil properties based on the observed physical soil properties as follows:
(3) Chemical Soil Property = C * gravel(%) + D * sand(%) + E * silt(%) + F * clay(%) + G
where C, D, E, F, and G are model parameters.
A summary of the details of the linear multi regression model (Eq. 3) is given in Table 4. The least squares method has been used to fit the linear multi regression model. In Table 4 3 contents with values more than 1.5%. Overall, all the chemical properties were predicted reasonably using the proposed linear multi regression model.
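A least squares fit of the five parameters in Eq. 3 can be sketched by solving the normal equations. This is a minimal pure-Python illustration under stated assumptions and not the authors' procedure: the function names are hypothetical, and the data are synthetic, generated from known coefficients so that the recovered values can be checked. They are not values from Table 4.

```python
def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_multi(rows, ys):
    """Fit y = C*gravel + D*sand + E*silt + F*clay + G; returns [C, D, E, F, G]."""
    design = [list(r) + [1.0] for r in rows]  # last column carries G
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(coeffs, row):
    return sum(c * v for c, v in zip(coeffs[:4], row)) + coeffs[4]

# Synthetic (gravel, sand, silt, clay) contents in %, illustration only.
rows = [(10, 20, 30, 25), (30, 25, 20, 10), (5, 15, 45, 30), (40, 30, 15, 5),
        (20, 10, 35, 28), (2, 10, 40, 20), (35, 15, 10, 8)]
true_coeffs = [-0.01, 0.005, 0.002, 0.02, 7.5]  # assumed values for the demo
ys = [predict(true_coeffs, r) for r in rows]
coeffs = fit_multi(rows, ys)
print([round(c, 4) for c in coeffs])
```

One caveat follows directly from the algebra: if the four fractions of every sample add up to the same total (for example exactly 100%), the predictors are collinear with the constant term G and the system has no unique solution, which the sketch reports as an error. Plotting `predict(coeffs, row)` against the observed values and comparing with the 1:1 line gives the equality line check referred to in the text.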

Conclusions
In this study, the spatial analysis has been implemented using the inverse distance weighted (IDW) technique with the data of 56 different soil samples collected from distinctive depths of Kirkuk city. Both physical and chemical properties for collected soil samples have been investigated. Moreover, statistical investigations represented by physico-chemical correlation, linear single and linear multi regression models have been utilized. Based on the available data and the results of this study, the following conclusions can be summarized:
1. Nine digital maps for soil type distribution of Kirkuk city including the percentages of the square area of gravel, sand, silt and clay distributions have been obtained.
2. The gravel and sand are distributed similarly on the created digital maps where the highest percentages of both elements lie in the middle and western zones of Kirkuk city. These high gravel and sand content zones are more susceptible to the influence of soil boiling under the impact of any expected earthquakes.
3. The silt distribution is condensed in the center of the northern zones of Kirkuk city, whereas the clay distribution has the highest share in the middle southern and upper northern zones of Kirkuk city. The zones with high silt content are most suitable areas for agricultural purposes while the zones with high clay content are more appropriate to work as barriers against heavy metal contamination.
4. The gypsum and organic distributions on the produced digital maps have the highest percentages in the western and southwestern zones of the city respectively. The high gypsum content zones are susceptible to differential settlement while the high organic content zones can be classified as rich biochemical resources of the plants.
5. The pH distribution has the highest percentage in the southeastern zone of the city where such zones might affect the organic and inorganic chemicals negatively. However, both SO3 and TDS distributions on the developed digital map have the highest percentages in the southwestern zone of the city where the areas with high SO3 and TDS contents are very toxic and represent corrosive materials.
6. For Kirkuk soils, it has been proved that the gravel and clay contents were correlated strongly with a degree of correlation of 0.99 whereas positive correlations have been obtained between physical soil characteristics and pH content with the highest degree of correlation of 0.55.
7. Using the linear regression model for Kirkuk soils, negative linear correlations are proved for the relation between the values of the pH, gypsum, organic, and SO3 contents with the increase of the gravel content, whereas a positive linear correlation is obtained for the values of TDS content with the increase of the gravel content.
8. Using the linear regression model for Kirkuk soils, negative linear correlations are noticed for the relation between the values of the gypsum, TDS, organic, and SO3 contents with the increase of the clay content, whereas a positive linear correlation is obtained for the values of pH content with the increase of the clay content.
9. Using the linear multi regression model for Kirkuk soils, the proposed model predicts the pH values reasonably where most of the values were close to the equality line. However, the proposed model over predicted most of the values of gypsum, TDS, organic and SO3 contents.