Modeling groundwater total dissolved solid from derived electromagnetic data using multiple linear regression analysis: a case study of groundwater contamination

The high concentration of total dissolved solids (TDS) and other physicochemical parameters in groundwater around dumpsites has been used to implicate contamination from decomposed waste materials. A simple multiple linear regression (MLR) TDS model that integrates the TDS data derived from boreholes and hand-dug wells with the geophysical parameters obtained from frequency-domain electromagnetic (EM) data was developed in this research. The aim is to support efficient monitoring and exploration of groundwater resources around the Olusosun dumpsite and its communities. The observed TDS concentrations of water samples collected from boreholes and hand-dug wells, and the corresponding estimated ground conductivity data in the vertical dipole mode (VD 40) and horizontal dipole modes (HD 40 and HD 20) obtained from geophysical surveys, were regressed in Microsoft Excel to generate an MLR TDS model. The integrity of the derived TDS model was appraised to examine the possibility of deploying it to investigate the TDS content of groundwater around the study area. The EM data and the resistivity models obtained around the study area confirmed ongoing contamination around the dumpsite. The developed TDS model can be used with high confidence for groundwater TDS prediction in parts of the study area where only terrain conductivity data, and no borehole parameters, are available. Terrain conductivity data alone can therefore be applied to the model to predict the concentration of TDS in groundwater where there are no boreholes or hand-dug wells, reducing the cost and time of determining and monitoring both parameters independently. With the aid of the ArcGIS software, the TDS model was used to generate a TDS estimate map for the area. Knowledge of the TDS variability shown in such a map could give a clue about the integrity of the underground water around the site.


Introduction
The use of geophysical tools for contamination studies around dumpsites and hydrocarbon depots is increasingly gaining ground and cannot be over-emphasized. A dumpsite is a portion of land set aside by Governments or communities of people for the sole aim of depositing wastes that are generated daily from homes and institutions. Most times, these activities are done indiscriminately. One of the great dangers that this kind of practice poses to man is that of pollution to the environment, and a major challenge that comes with environmental pollution is to determine its extent in the environment.
The Olusosun dumpsite is improperly designed and therefore not protected, thus allowing for environmental contamination in the vicinity of the site. According to international standards, dumpsites are usually designed so that the walls are lined with polyethylene geomembrane liners or clayey materials meant to inhibit the lateral and vertical movement of the leachates produced from the refuse materials dumped on them. The risk of contamination is particularly high around dumpsites with no leachate collection systems and control. However, this scenario is not limited to dumpsites that do not have leachate control equipment, as groundwater around dumpsites with geomembrane liners and leachate collection facilities is also susceptible to contamination from leachate, particularly if there are problems of improper design and construction, or challenges of maintenance (Bjerg et al. 2003). The primary threat of contamination to groundwater emanates from the leachate formed from waste materials, which most often contains toxic chemical substances, mainly when wastes of industrial origin are involved (Enekwechi and Longe 2007). It has, however, been previously documented that leachates from dumpsites for non-toxic waste could also contain complex organic compounds, chlorinated hydrocarbons, and metals at concentrations that pose a threat to both ground and surface water. The generated leachate is usually made up of inorganic and organic components. With the passage of time, the generated leachate moves into subsurface systems, changing the chemical and physical characteristics of groundwater. Sang-il and Peter (1993) reported that heavy metals, including cadmium, arsenic, and chromium, were found at excessive levels in groundwater due to landfilling activities.
According to Enekwechi and Longe (2007), the volume of leachate produced is hugely dependent on the area of the landfill, the meteorological and hydrogeological factors and the integrity of capping.
Electromagnetic surveys are often used to locate conductive materials such as buried metal objects, ore bodies, and fluid-filled features and to map conductive plumes, such as landfill leachate or saltwater intrusion (Frischknecht et al. 1991; Grady and Haeni 1984; McNeill 1990; Powers et al. 1999). The DC resistivity techniques are increasingly being used in contamination studies due to their capacity to discriminate between zones free of contamination and contaminated areas. Principally, the methods do not reveal contamination directly; rather, they reveal it through sharp variations in subsurface resistivity caused by the presence of the contaminants.
One of the primary sources of water is from groundwater. This vital resource is used for industrial purposes and domestic use in many parts of Nigeria, particularly in the coastal areas. But unfortunately, due to surface water pollution based on human activities, it has become a threat to human life. The high concentration of TDS and other physicochemical parameters of groundwater around dumpsites from the literature have been used to confirm contamination from decomposed waste materials.
Because of the daily increase in the population of people living in the Lagos metropolis, there is a steady increase in the volume of waste generated per day, which, by implication, increases the chance of contamination of the environment. Hence, there is a need for constant information on the status of groundwater around the dumpsites located in the city. One of the ways by which this objective can be achieved is through modeling groundwater properties and geophysical data to assist in groundwater quality prediction and monitoring.
Therefore, this research attempts to integrate the EM and resistivity methods with hydrophysical parameters of groundwater from existing boreholes and wells to investigate possible contamination of groundwater due to leachate emanating from the dumpsite. A proposed model of groundwater TDS was also developed by correlating the observed TDS of water samples collected within and around the vicinity of the dumpsite with the multiple terrain conductivity data set (HD 40, VD 20, and VD 40) obtained from a geophysical method.

The study area
The Olusosun dumpsite is located within longitude 03.372E to 03.374E and latitude 06.588N to 06.595N in the Ikeja Local Council Area of Lagos State, toward the northern part of the Lagos metropolis (Fig. 1). The dumpsite is managed and maintained by the Lagos State Waste Management Authority (LAWMA). The need to maintain the site and prevent harmful effects on the residents of the area necessitated the establishment of the agency. The landfill began operation in 1991, and it is situated on 42 ha of land (LAWMA 2004). Since its establishment, the site has undergone rehabilitation, which included the construction of an access road for ease of dumping, spreading of waste materials, compaction of waste, reclamation of land, and even recycling of waste materials. It is located amid community settlements, industrial estates, and commercial centers. Waste materials therefore originate from these settlements, and waste materials from other parts of Lagos also find their way to the site. The dumpsite is accessible by tarred roads along the Ibadan-Lagos express.

Geology of the study area
The study location is situated within the Eastern Dahomey Basin, which extends from southeastern Ghana, passing through Togo and the Republic of Benin, to terminate in southwestern Nigeria. The subsurface basement rock, the Okitipupa ridge, separates the Basin from the Niger Delta Basin. In terms of the geology of the Basin, the constituent rock formations are made up of a Cretaceous to Tertiary sedimentary sequence that pinches out on the east. Generally, outcrops of rocks within the Basin are poor as a result of massive soil and vegetation cover. As a result, it was challenging to obtain information about the geology of the area, but with recent borehole drilling and road cuts, it became easier to gather information about the stratigraphic units within the Basin. Omatsola and Adegoke (1981) proposed the lithological sequences of the Basin to include the Abeokuta Formations (Ise, Afowo, and Araromi Formations), and Ewekoro, Akinbo, Oshoshun, Ilaro, and Benin Formations (Fig. 2). The rock units are composed mainly of loose sediment ranging from clay, silt, and coarse to fine-grained sand, called coastal plain sand. The exposed surface consists of sands with lenses of clays that are not well sorted. Parts of the sand in some places are cross-bedded and are characteristic of a continental to transitional environment (Agagu 1985; Enu 1990; Jones and Hockey 1964; Nton 2001).
In terms of groundwater prospectivity in the Lagos metropolis, there are four hydrogeological rock units that are usually explored. The first of them extended from the surface to a depth of about 12 m below the ground surface and made up of intercalation of sand and clay. Their shallow depths make them prone to contamination and pollution arising, most times from human activity. The second aquifer unit occurs at a depth ranging between 20 and 100 m below the sea level. In terms of prospecting for groundwater in the city of Lagos, this rock unit is relatively safer. The third is encountered at a depth range of between 130 and 160 m, while the fourth hydrogeologic unit, separated from the third

Materials and methods
The EM data were obtained with a portable ground conductivity meter (Geonics EM-34). Three coil spacings of 40 m, 20 m, and 10 m are usually employed, using frequencies of 400, 1,600, and 6,400 Hz, respectively. At each position, six terrain conductivity readings were acquired by making measurements in both the horizontal and vertical dipole modes at all three intercoil spacings. The parameters obtained, therefore, represent lateral variations in terrain electromagnetic conductivity at different depths. For the vertical dipole mode, the exploration depths for the 10 m, 20 m, and 40 m intercoil spacings were 15.0 m, 30.0 m, and 60.0 m, respectively. For the horizontal dipole mode, the respective depths are 7.5 m, 15.0 m, and 30.0 m.
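The depths quoted for the two dipole modes are consistent with simple multiples of the intercoil spacing: about 1.5 times the spacing in the vertical dipole mode and 0.75 times the spacing in the horizontal dipole mode. The sketch below tabulates that relationship; the function and constant names are illustrative and the factors are inferred from the depths stated in the text, not taken from the instrument documentation.

```python
# Nominal exploration depth of a two-coil ground conductivity meter.
# ASSUMPTION: the depth/spacing factors are inferred from the depths
# quoted in the text (e.g. VD at 40 m spacing -> 60 m, HD at 20 m -> 15 m).
DEPTH_FACTOR = {"VD": 1.5, "HD": 0.75}


def exploration_depth(mode: str, spacing_m: float) -> float:
    """Return the nominal exploration depth in metres for a dipole mode."""
    return DEPTH_FACTOR[mode] * spacing_m


for spacing in (10, 20, 40):
    hd = exploration_depth("HD", spacing)
    vd = exploration_depth("VD", spacing)
    print(f"spacing {spacing} m: HD depth {hd} m, VD depth {vd} m")
```

Under these factors, the HD 20 and VD 10 configurations both sample down to roughly 15 m, which is why they are treated as shallow readings relative to the water table.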
Sixteen (16) traverse lines were occupied on and outside of the dumpsite, with the length of each line ranging between 170 and 240 m (Fig. 1). The separation between measurement points along the line was 10 m. Traverses 1-8 were located on the dumpsite while the remaining 8 traverses served as control and were located at various distances away from the dumpsite (between 100 and 600 m). The data so obtained were subsequently integrated with groundwater hydrophysical parameters to characterize the study area accurately.
Since the objective of the study is TDS prediction through an MLR model, sixteen water samples from hand-dug wells and boreholes were obtained in proximity to the EM data collection points (Fig. 1) and analyzed to determine their TDS content. This was achieved with the aid of a portable TDS meter and a plastic bowl that was used to collect the water samples. The geographic coordinates of the sampling points were taken with a global positioning system receiver (Garmin GPS Channel 76 model).
At the end of the data acquisition exercise, TDS values of the selected locations and EM parameters (VD 40, HD 20, and HD 40) were obtained and subsequently analyzed using the regression analysis and correlation module of the Excel 2010 Statistical Software Package. In an MLR analysis, the relationship between a dependent variable (in this case, TDS) and two or more independent variables (in this case, the EM parameters) is modeled using linear predictor functions whose unknown model parameters (the regression coefficients) are estimated from the data.
Mathematically, the Pearson correlation coefficient is defined as:

$$ r = \frac{\sum_{j=1}^{N} (x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{N} (x_j - \bar{x})^2 \, \sum_{j=1}^{N} (y_j - \bar{y})^2}} \qquad (1) $$

where $x_j$ and $y_j$ are the $N$ paired observations of the two variables and $\bar{x}$ and $\bar{y}$ are their means. Considering the generalized multiple linear regression:

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon_i \qquad (2) $$

where $Y$ is the dependent variable to be predicted, $\beta_0$ is the intercept of the regression line, $\beta_1$ through $\beta_n$ are the slopes of the regression line, $X_1$ through $X_n$ are the multiple independent variables, and $\varepsilon_i$ is the error component.
The coefficients $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_n$ will be obtained through the interactive model regression analysis of the measured EM data and the TDS parameters. The values of the predicted TDS and the acquired VD 40, HD 20, and HD 40 parameters were interpolated in the ArcGIS software using the kriging interpolation method to produce the subsurface spatial distribution maps of the area. The kriging gridding technique weights the surrounding measured parameters to determine the prediction for each position. The weights depend on the prediction location and on the overall spatial arrangement of the measured points (Geoff 2005).
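To make the estimation of the regression coefficients concrete, the following is a minimal sketch of an ordinary least squares fit for a multiple linear regression, solved through the normal equations. It is not the Excel procedure used in the study; the function name `fit_mlr` and all numbers are hypothetical, and the responses are generated from known coefficients purely so that the fit can be checked.

```python
def fit_mlr(X, y):
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + bn*xn.

    X is a list of predictor rows, y the list of responses.
    Returns the coefficients [b0, b1, ..., bn].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # intercept column first
    k = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y)
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda idx: abs(M[idx][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] for i in range(k)]


# HYPOTHETICAL predictor values (three conductivity-like columns) and
# responses generated from known coefficients, used only to check the fit.
X = [(50, 160, 45), (80, 40, 130), (120, 90, 85),
     (150, 60, 70), (200, 100, 140), (90, 75, 60)]
true_beta = [10.0, 2.0, 0.5, -1.0]
y = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], row)) for row in X]

beta = fit_mlr(X, y)
print([round(b, 4) for b in beta])
```

Because the synthetic responses contain no noise, the recovered coefficients should match the generating ones to within floating-point error; with field data the fit would instead minimise the sum of squared residuals.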
A 2D resistivity survey was integrated into the study to provide information that complements the results from the EM survey. This was achieved with the aid of a digital multielectrode system (Super Sting R8 Earth Resistivity/IP meter). The terrameter used 84 electrodes and was deployed along with an electrode selector for the survey. The pole-dipole configuration was chosen because of its advantage in providing good vertical resolution and a clear image of the contaminated zone. Data were obtained in May 2014, and a time-lapse survey followed in December 2015. The Earth Imager resistivity inversion software was used to process and invert the 2D resistivity data.

Results and discussion
In the discussion of the cause of the conductivity anomaly, it is essential to note that non-organic contamination of soil or groundwater usually leads to an increase in the conductivity of the groundwater. Then, the challenge with the interpretation is to explain the variations in conductivity aside from the conductivity changes caused by other factors such as changing lithology. This research did not consider lithologic variations as a potential contributing factor in the study area. Conductivity has a linear relationship with salinity, and soluble salts are more likely to build up in areas where there is less direct contact with surface water. Zones of lower resistivity could be interpreted as areas with more confined systems, whereas areas with higher resistivity could be interpreted as areas with greater groundwater and surface water interaction. In summary, the influences of natural and cultural interferences on the geophysical data obtained in the study area are interpreted to be minimal. Therefore, trends and anomalies observed in the data are most likely to be caused by the elevation of the concentrations of non-organic contaminants in the leachates.
The mean values of the calculated terrain conductivity and the obtained TDS values of groundwater around and within the dumpsite that were used for the MLR analysis are presented in Table 1. Initially, the whole conductivity data set was used for the MLR analysis, but the inclusion of the VD 10, HD 10, and HD 20 data did not yield reasonable predicted TDS values; these data were considered not to be statistically relevant and were removed. This trend is attributed to the relatively shallow depths of investigation of VD 10 (15 m), HD 10 (7.5 m), and HD 20 (15 m) when compared with the varied depths to the water table around the study area (18 m to 30 m).
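Considering Eq. 2, and using the statistically significant EM parameters, the model becomes:

$$ Y = \beta_0 + \beta_1\,\mathrm{HD\,40} + \beta_2\,\mathrm{VD\,20} + \beta_3\,\mathrm{VD\,40} + \varepsilon_i \qquad (3) $$

where $Y$ is the TDS of the water samples, $\beta_0$ is the intercept of the regression line, $\beta_1$ through $\beta_3$ are the regression coefficients, HD 40 is the horizontal dipole coil orientation data measured at a 40 m coil spacing, VD 20 is the vertical dipole coil orientation data measured at a 20 m coil spacing, VD 40 is the vertical dipole coil orientation data measured at a 40 m coil spacing, and $\varepsilon_i$ is the error component as in Eq. 2.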
From the summarised results of the MLR analysis (Table 2), the Beta coefficients $\beta_0 = 60.42$, $\beta_1 = 9.58$, $\beta_2 = 3.53$, and $\beta_3 = -12.70$ were derived. Therefore, substituting these parameters into Eq. 3, the mathematical formulation of the MLR model becomes:

$$ \mathrm{TDS} = 60.42 + 9.58\,\mathrm{HD\,40} + 3.53\,\mathrm{VD\,20} - 12.70\,\mathrm{VD\,40} \qquad (4) $$

From Eq. 4, the TDS and EM parameters (VD 40, VD 20, and HD 40) become the dependent and multiple independent variables, respectively. Based on the work of Mazac et al. (1985), Eq. 4 is referred to as a model. Hence, it is established as the proposed simple multiple linear regression TDS model for predicting TDS of groundwater around the study area. From the MLR analysis carried out, the predicted TDS values are shown in Fig. 3.
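As an illustration of how the fitted model could be applied to a new set of conductivity readings, the sketch below evaluates the regression with the coefficients reported above. The pairing of each coefficient with its EM parameter (β1 with HD 40, β2 with VD 20, β3 with VD 40) is an assumption based on the order in which the parameters are listed and on the signs discussed in the text; the function name and the input readings are hypothetical.

```python
# Coefficients reported in the text (Table 2).
# ASSUMPTION: beta1 -> HD 40, beta2 -> VD 20, beta3 -> VD 40.
BETA_0 = 60.42
BETA_HD40 = 9.58
BETA_VD20 = 3.53
BETA_VD40 = -12.70


def predict_tds(hd40: float, vd20: float, vd40: float) -> float:
    """Estimate groundwater TDS (mg/L) from terrain conductivity (mS/m)."""
    return BETA_0 + BETA_HD40 * hd40 + BETA_VD20 * vd20 + BETA_VD40 * vd40


# HYPOTHETICAL readings at one station, for illustration only.
estimate = predict_tds(hd40=100.0, vd20=80.0, vd40=90.0)
print(round(estimate, 2))
```

Because the VD 40 coefficient is negative and large, the estimate is sensitive to the balance between the HD 40 and VD 40 readings, so inputs well outside the range of the calibration data could give implausible (even negative) values.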
Tables 2 and 3 present the results of the parameters evaluated from the regression analysis. A sensitivity analysis of the derived parameters was carried out to evaluate the significance of the EM parameters in modeling the TDS content of groundwater around the study area. From Table 2 and Eq. 4, the positive signs of the coefficients and tStat values of HD 40 and VD 20 show that a positive association exists between TDS and the HD 40 and VD 20 parameters. In contrast, a negative association exists between TDS and the VD 40 parameter. The P values indicate the strength of a particular variable within the multiple independent variables in predicting the dependent variable, in other words, how much value it adds to the modeling equation. The P values of the individual EM parameters are given in Table 2. The F statistic also has a P value (6.89355E−12) far below the 0.05 threshold that corresponds to the 95% confidence level. Therefore, one can conclude that the integrity of the model as a predictive tool is high. Also, the most critical factor in determining the success of the model, the adjusted R square, when compared with the multiple R or R square (Table 3), shows that the model accounts for 98.5% of the variance in the concentration of TDS in groundwater around the study area. The importance of these results stems from the fact that, with HD 40, VD 20, and VD 40 data obtained from a non-investigated part of the study area, the proposed TDS model can be used to estimate the TDS content of groundwater in that area.
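For readers unfamiliar with the adjusted R square, the sketch below shows the standard correction of R square for the number of predictors and samples. The sample count of 16 and the three predictors follow the text; the R square value passed in is hypothetical, since Table 3 is not reproduced here, and the function name is illustrative.

```python
def adjusted_r_squared(r_squared: float, n_samples: int, n_predictors: int) -> float:
    """Standard adjusted R^2: penalises R^2 for the number of predictors."""
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / (n_samples - n_predictors - 1)


# 16 water samples and 3 EM predictors as in the study;
# the R^2 of 0.90 is a HYPOTHETICAL value for illustration.
print(adjusted_r_squared(0.90, n_samples=16, n_predictors=3))
```

The adjusted value is always at or below the raw R square, and the gap widens as predictors are added without a matching gain in explained variance, which is why it is the preferred figure when comparing models with different numbers of EM parameters.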

Model validation
The mean values of the measured terrain conductivity (VD 40, VD 20, and HD 40) and the observed groundwater TDS parameters obtained from the work of Ayolabi et al. (2015) around the Olusosun dumpsite (Table 4) were applied to the model to validate the predictive power of the developed TDS model. The results were then compared with their observed TDS values. From Fig. 4, it can be seen that a plot of TDS against profile location from the observed TDS parameters correlates well with the plot of the TDS parameters derived from the application of the observed terrain conductivity data (VD 40, VD 20, and HD 40) on the proposed model. This goes further to show that the proposed TDS model can be used as a predictive tool for estimating TDS concentration in groundwater in regions not investigated within the study area, provided the VD 40, VD 20, and HD 40 parameters from such an area are known.
[Fig. 4: observed TDS from water samples (Ayolabi et al. 2014) and predicted TDS using the proposed model, plotted as TDS (mg/l) against EM profile and water sample location]
The proposed TDS model was appraised to examine its suitability as a predictive tool for the estimation of groundwater TDS around the study area. Koutsoyiannis (1977) offered a means of measuring the accuracy of a prediction achieved from a mathematical model. It is termed the Theil inequality coefficient, defined by the mathematical equation (Eq. 5) in which Y is Theil's inequality coefficient, X i is the measured TDS value of groundwater, and X is the predicted TDS from the model (Fig. 3).
Equation 5 was used to examine the suitability of the model as a predictive tool. The Y value obtained using Eq. 5 is 0.042. The closer Y is to zero, the more suitable the model will be as a predictive tool for estimation of groundwater TDS around the study area. A value of 1 indicates that the prediction is no better than guesswork. The outcome of this analysis has also shown the integrity of the proposed model in estimating groundwater TDS in regions around the study area.
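A Theil-style inequality coefficient can be sketched as follows. The exact expression used in the study is not reproduced in the text, so this example assumes one common form, the root of the summed squared prediction errors divided by the root of the summed squared observations; the function name and the numbers are hypothetical and the result should not be expected to reproduce the value of 0.042 reported above.

```python
import math


def theil_inequality(observed, predicted):
    """Theil-style inequality coefficient.

    ASSUMED form: sqrt(sum((x_i - xhat_i)^2)) / sqrt(sum(x_i^2)).
    Returns 0 for a perfect prediction; values near 1 mean the errors
    are as large as the observations themselves.
    """
    numerator = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)))
    denominator = math.sqrt(sum(o ** 2 for o in observed))
    return numerator / denominator


# HYPOTHETICAL observed and predicted TDS values (mg/L).
observed_tds = [120.0, 450.0, 800.0, 1000.0]
predicted_tds = [130.0, 430.0, 820.0, 980.0]
print(theil_inequality(observed_tds, predicted_tds))
```

With this form, a model that predicts zero everywhere scores exactly 1, which matches the interpretation that values approaching 1 indicate no predictive skill.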

Spatial variation of predicted TDS and measured EM parameters
The predicted TDS and the VD 40, HD 40, and HD 20 parameters were interpolated using the ArcGIS software to produce the subsurface spatial distribution maps of these parameters around the study area (Figs. 5, 6, 7, 8). From the predicted TDS map, it is clear that the TDS concentration in groundwater around the study area ranges from 27 to 1128 mg/L. The legend scale in Fig. 5 shows that the TDS concentration decreases with distance from the waste site into the surrounding groundwater, suggesting that the closer the groundwater is to the dumpsite, the higher the level of TDS. This is a reflection of the flow pattern of the contaminants and probably of the groundwater, as contaminants are usually mobilized and moved in the direction of groundwater flow. The high concentration of TDS close to the site could be linked with the high level of leachate formation arising from the decomposed biodegradable materials on the site, and by implication, the contamination around the site. The spatial variation map of TDS shows a higher concentration toward the southern part of the study area, whereas the north-western part and pockets of the south-western and south-eastern portions of the study area show low TDS distribution. Figure 6 shows the contour plot of ground conductivity produced using the HD 20 data. Here, the emphasis is on the relative investigative depths of the three intercoil separations. The VD 40, HD 20, and HD 40 coil configurations were used, with 68 mS/m taken as a benchmark for the uncontaminated zone, and the EM data were interpreted with respect to these values. At this depth (15 m), the map shows high conductivity on the dumpsite and generally around the central portion of the map, with a probable southward trend of migration of contaminants. These high values around the central part could be tied to the decomposed waste materials on the dumpsite.
Any water-bearing sand around the zones with moderately high to high apparent conductivity is likely to be impacted by the contaminants generated on the dumpsite. The north-western, south-western, and south-eastern portions of the map with low to medium apparent conductivity values reflect the non-polluted area.
The HD 40 map shows conductivity values ranging between 38 and 216 mS/m with conductivity values from 38 to 78 mS/m used as the benchmark for the non-contaminated zone (Fig. 7). At this depth (30 m), the same scenario exists when compared with the HD 20 map, with high conductivity values seen reducing from the dumpsite around the center toward the southern and the eastern portions of the study area. Again, the northern, north-western, south-western, and the extreme south-eastern parts with low to medium conductivity values indicate the non-contaminated parts of the study area.
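The benchmark-based reading of the conductivity maps can be sketched as a simple threshold test. The upper benchmark of 78 mS/m is the HD 40 value quoted in the text; the function name and the station readings are hypothetical, and a real interpretation would also weigh lithology and map context rather than a single cutoff.

```python
def flag_above_benchmark(readings_ms_per_m, benchmark_upper=78.0):
    """Return True for each reading that exceeds the uncontaminated benchmark."""
    return [value > benchmark_upper for value in readings_ms_per_m]


# HYPOTHETICAL HD 40 readings (mS/m) along one traverse.
hd40_readings = [38.0, 78.0, 79.0, 216.0]
print(flag_above_benchmark(hd40_readings))
```

Passing a different `benchmark_upper` allows the same test to be applied to another coil configuration whose benchmark range differs.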
The VD 40 spatial variation map shows conductivity values ranging between 39 and 186 mS/m with conductivity values from 39 to 72 mS/m used as the benchmark for the uncontaminated zone (Fig. 8). At this depth of investigation (60 m), high apparent conductivity values are still observed around the dumpsite and radiating away from the center. The results from previous work and the 2D resistivity surveys carried out on the dumpsite during the course of this study indicated that the contaminants had migrated vertically downward beyond 60 m depth (Fig. 9). From the 2D resistivity models, the contaminated zones are characterised by resistivity values ranging from 0.63 to 12.5 Ωm. The pole-dipole models obtained on the dumpsite in May 2014 (Fig. 9a) and December 2015 (Fig. 9b) show clear evidence of vertical migration and contamination of the study area. Information from the borehole log obtained around the area indicates that the shallow aquifers underlying the dumpsite have been invaded by the contaminants from the decomposed waste materials. The results of the EM parameters and TDS in Table 1 were used to plot graphs showing the relationship between the estimated TDS values and the HD 40, VD 20, and VD 40 values (Fig. 10). The associations between TDS and HD 40 (0.7864) and between TDS and VD 20 (0.687) are high, while the relationship between TDS and VD 40 (0.3621) is weak.
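The strength of association between TDS and a single EM parameter can be quantified with the Pearson correlation coefficient, sketched below in plain Python. Whether the values quoted above are correlation coefficients or coefficients of determination is not stated in the text, so the example prints both; the function name and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL paired values: terrain conductivity (mS/m) and TDS (mg/L).
conductivity = [40.0, 65.0, 90.0, 130.0, 180.0]
tds = [60.0, 210.0, 380.0, 520.0, 900.0]

r = pearson_r(conductivity, tds)
print(round(r, 4), round(r ** 2, 4))  # correlation and its square
```

The squared value is the fraction of variance in TDS explained by a straight-line fit on that single parameter, which is the quantity usually annotated on scatter plots of this kind.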

Conclusion
Using Microsoft Excel, a simple multiple linear regression TDS model was developed for predicting the TDS content of groundwater around the Olusosun dumpsite. The TDS model was developed by regressing the terrain conductivity data (VD 40, VD 20, and HD 40) obtained around the area against observed TDS values obtained from water samples from boreholes and hand-dug wells in the study area. From the outcome of this study, the TDS content of groundwater was adequately assessed using the generated multiple linear regression TDS model. The sensitivity of the model was evaluated from the outcome of the MLR analysis. The model was also appraised and validated using data from a previous study to examine its suitability as a predictive tool for the estimation of groundwater TDS around the study area. Even though the TDS model is proposed for use in the vicinity of the dumpsite, it may also be used confidently for the estimation of the TDS content of groundwater in parts of the study area where hand-dug wells and boreholes do not exist and only VD 40, VD 20, and HD 40 data are available. By extension, terrain conductivity data alone from other areas with similar geology can be applied to the model to estimate the TDS content of groundwater, thereby reducing the cost and time of acquiring and monitoring both parameters independently.
The geophysical investigations and hydrophysical assessment of groundwater around the dumpsite established zones that had been affected by contaminants generated from degraded waste products. From the analysis and interpretation of the spatial distribution maps produced for the area, using the EM data acquired from the area, these zones appeared to be concentrated at the center of the dumpsite with less impact on the other areas. The lateral and vertical spread of the contamination around the site was also exposed from the maps and the 2D resistivity models of the area.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.