Assessing data-scarce contaminated groundwater sites surrounding petrochemical industries

A common problem when studying groundwater contamination in low-income countries is that data required for a detailed risk assessment are limited. This study presents a method for assessment of the potential impact of groundwater contamination by total petroleum hydrocarbons (TPH) in a data-scarce region. Groundwater modeling, using the MODFLOW, was used to simulate regional-scale flow pattern. Then, a semi-analytical contamination transport model was calibrated by minimization of the absolute errors between measured and modeled concentrations. The method was applied to a case study in Kazakhstan to assess the potential spreading of a TPH plume, based on historical observations. The limited data included general information about the local geology, observations of GW level in the area, and concentrations during 5 years of TPH in monitoring wells surrounding the source of the pollution. The results show that the plume could spread up to 2–6 km from the source, depending on estimate of the initial concentrations, until the concentration reaches permissible levels. Sensitivity analysis identified parameters of longitudinal and transverse dynamic dispersivity together with the plume of TPH spreading, as the priority subjects for future investigations. The proposed approach can be used as a tool for governmental and municipal decision-makers to better plan the usage of affected groundwater sites in data-scarce regions. It can also help to decrease the negative impact of contaminated GW on human health and to better manage the industrial pollution.


Introduction
Industrial activities are recognized as major source of pollution worldwide (Hossain 2011). Kazakhstan is a country that currently invests heavily in industrial capacity to develop the economy (UN 2019). Industrial development and economic growth, however, put pressure on environment and may jeopardize environmental safety and well-being of society (Li 2016). Several studies have shown that air (Assanov et al. 2021), water (Hrkal et al. 2006;Karatayev et al. 2017), and soil (Mikolasch et al. 2019;Woszczyk et al. 2018) are under pressure from industrial pollution in Kazakhstan. The current severe ecological status in many industrial regions serves as a challenge for the country to enforce environmental assessment, policy, and remediation practices (Russell et al. 2018). Up to 65% of all freshwater resources in Kazakhstan may be permanently lost due to wasteful use and polluting activities. Simultaneously, the industry consumes about 25% of all available freshwater in Kazakhstan (Karatayev et al. 2017). It is projected that the mismanagement of freshwater resources could lead to a national water deficit by 2030 (Thomas 2015).
Processes of water usage in the oil refinery industry in Kazakhstan with consequent environmental pollution have already been described by Radelyuk et al. (2019). While Kazakhstan has declared it will carry out the implementation of Best Available Techniques (BAT) and approaches towards developing a green economy (Abdildin et al. 2021), the mechanisms for implementation of such measures still have loopholes (Radelyuk et al. 2021a). According to Kazakhstani law, if water in a wastewater discharge point is already polluted and water quality cannot be assured as safe for any type of usage, further wastewater discharge may continue without strict limitation (Ministry of Environment 2012). The requirement for this is that the receiving water body should not be used as a source for drinking, domestic use, or irrigation purposes. Also, the impact on groundwater should be eliminated, as even a relatively small pollution discharge may affect the whole aquifer Water Code 2003). According to Locatelli et al. (2019), when groundwater has been exposed to pollution for decades, the concentration in the aquifer may have achieved quasi-steady-state conditions. In Kazakhstan, many receiving water bodies, such as lakes and ponds, have already been polluted during the Soviet period and the pollution process has been continuing ever since. Thus, industries in Kazakhstan use their legal right to discharge improperly treated wastewater into the environment.
Monitoring of groundwater quality in Kazakhstan is usually carried out by installation of observation wells surrounding the source of contamination. The concentration of contaminants should not exceed the defined limit at a regulated boundary of a sanitary zone, which is defined as being 1000 m downgradient from a contamination source (Ministry of Economy 2015). However, recent research in the experimental area of this study has shown that pollutant concentrations often exceed permissible levels outside the sanitary zone. Consequently, pollutants at some sites are likely to be spreading towards areas with substantial groundwater use (Radelyuk et al. 2021b).
According to EU (EU 2002), the following investigations should be considered to analyze existing pressures and impacts of the pollution on environment and health of settlements: (1) groundwater modeling for identification of groundwater flow and potentially affected areas, and (2) contamination transport modeling, as the old industrial spills continue polluting the aquifer. These studies are a basis for further management of the affected area. Groundwater research, a key component of the procedure, is associated with long-term investigations, uncertainty, challenges for cooperation and data sharing between a wide range of stakeholders, and the use of advanced technological measures to determine the likely fate of contaminants in the subsurface (Li 2016). Karatayev et al. (2017) noted that poor collaboration between key stakeholders (government, industry, and academia) in Kazakhstan is a major weakness for supplying decision-makers with quantitative facts. Consequently, the required research faces several limitations.
Under the conditions of lacking relevant data, a holistic view is needed to evaluate the situation and to give input to decision-making and actions towards improvement of the ecological status of the affected region. This research is an attempt to deal with the complexity assessing and managing groundwater pollution in a low-income country like Kazakhstan, and to obtain an insight concerning pollution spread in an efficient manner in situations where limited data are available.
The method proposed in this study uses a two-step procedure. The first step includes carrying out numerical groundwater modeling using MODFLOW to define the groundwater flow direction. In the second step, a contamination transport model is applied for general description of plume development in the aquifer. As a result, the potential fate of contaminants can be assessed under consideration of different scenarios, depending on local conditions. The suggested method is applied to a case study where groundwater contamination occurs from a petrochemical industrial cluster in Kazakhstan. The aim is first: to identify the area of the aquifer that would be affected by contamination based on historical observations; and second, to assess the potential hazard from the spreading of the contamination in groundwater under different scenarios.
This paper consists of the following sections. The Methodology section includes three parts. Part 1 presents the general procedure of the developed method. Part 2 gives general insights about requirements for each step of the procedure. Part 3 shows how the method was applied to a real-world case study with limited data. The Results and discussion present results of the applied method. The Conclusions section discusses limitations, advantages, and disadvantages of the method, and how the method can be used for real-world problems.

Procedure
The procedure developed in the present study is introduced in Fig. 1. First, all available data should be collected and screened. Potential data sources are official reports from the government and industry, personal communication, and previous research in nearby areas (Step 1). Second, groundwater modeling is performed, using a basic set of information regarding local lithology, boundary conditions, and observed piezometric heads (Step 2). A regional-scale model is developed if the area of investigations is large. After calibration of the regional-scale model, local scale modeling is carried out to define the direction of groundwater flow, as a starting point for contamination transport modeling. Modeling result is considered reliable if the estimated hydrogeological characteristics match the values from technical reports and available literature. Finally, contamination transport modeling is carried out to consider real and potential scenarios of contamination within the aquifer (Step 3). A steady-state contamination transport model is based on the solution of partial differential equations for advection-dispersion processes. The modeling is considered successful, if the calibrated input parameters to the contamination transport modeling fit the values from the groundwater modeling and available data, including measured against modeled concentrations (McMahon et al. 2001). Then, the obtained parameters are used for estimation of spreading of the contamination plume in space, based on historical records and potential scenarios.

General description of the developed method
Step 1. Data preparation Data needed for groundwater modeling are obtained from hydrogeological field studies. Modeling of groundwater flow is reliable when geological characteristics such as the properties of the aquifer matrix, the aquifer thickness and bedrock elevations and hydrological properties such as groundwater levels, boundary characteristics, the hydraulic conductivity (HK) of the aquifer matrix, conductance, and other data if they are available (Hashemi et al. 2013(Hashemi et al. , 2015. However, data from especially older studies need to be quality controlled. The collected data are used to characterize the aquifer, define boundary conditions, achieve an efficient discretization scheme, and subsequently, to build the groundwater flow model. Contamination transport modeling requires information about advection-dispersion processes in the porous media, which depends on both properties of geological layers and the contaminant. They include: concentrations of contaminants at the monitoring points to validate the process of contaminant spread and related characteristics; starting concentrations of the pollutants, depending on the source of pollution, and recharge concentrations of the pollutants depending on the load from the source. The required local geological characteristics for transport modeling include longitudinal dispersivity and porosity. Field studies are important, and if lacking, uncertainty increases (Naseri-Rad and Berndtsson 2019).

Step 2. GW modeling
After all available data are gathered, a conceptual site model (CSM), which is an efficient tool for presentation and understanding of groundwater flow and transport processes (Todd and Mays 2005), is created. The CSM is a consideration of all hydrogeological processes that control the transport, migration, and any potential and real impacts of contamination in groundwater. The conceptualization includes first, assignment of boundary conditions, such as identification and consideration of inflows/outflows/ barriers to the system, geometry, receptor(s), interactions between surface water and groundwater, etc., and second, assignment of values for aquifer characteristics (McMahon et al. 2001).
The choice of a modeling approach depends on two factors: the data availability and reflection to the modeling objectives (McMahon et al. 2001). In case of model construction for investigation of groundwater pollution, one of the most popular and efficient tools for GW modeling is MODFLOW-2000, which was developed by McDonald and Harbaugh (Harbaugh et al. 2000). The CSM is represented in a quantitative framework, which is based on the principal equation of the three-dimensional groundwater flow equation for porous medium using a finite-difference method. The Groundwater Modeling System (GMS 9.0) software can be used to perform the modeling under a steady-state condition Eq. (1): where K xx , K yy , and K zz are hydraulic conductivity along the x, y, and z axis assumed to be parallel to the major axes of hydraulic conductivity (m/d); h is the potentiometric head (m); and W is volumetric flux per unit volume, representing sources and/or sinks of water. Steady-state modeling is the optimal option for consideration of boundary conditions as well as simulating the general groundwater flow (Hashemi et al. 2012). Thus, steady-state conditions are assumed as the best option under data scarcity, such as lack of information about temporal changes of groundwater level, storage parameters, recharge, and discharge rates.
The next step is the calibration process, which aims to estimate and utilize all available model parameters. Manual (trial and error), automated [e.g., PEST package (Doherty et al. 1994)], and combined (manual and automated) calibration methods can be used to calibrate the model by minimizing the difference between simulated and observed groundwater levels, as those measurements are usually available and are commonly used in the calibration processes (Barnett et al. 2012). The problem with only head data being available is that this parameter on its own cannot mathematically constrain an inverse solution of the groundwater flow equation (Haitjema 2006). This problem can be solved using other parameters for calibration, such as hydraulic conductivity, conductance, transmissivity, and drain values, with ranges based on reported available data. The process of calibration is considered successful when the root-meansquare error (RMSE) residual of head is less than a certain value relevant for local conditions (typically 1 m, as default) (Fienen et al. 2013).
If the aquifer is large with limited data and with an uneven distribution of monitoring wells, it is recommended that a number of groundwater flow models are developed at a regional to a local scale. In this case, all the available information together with the simulated regional groundwater flow direction are transferred to the local model as an initial condition. This will help establish reliable conditions for the model boundaries as well as assigning realistic hydraulic parameter values.
However, the local scale modeling calibration may encounter some problems, as the grid size changes from regional to local scale. This may contribute to increased uncertainty of defining correct pathways of contamination transport. To overcome this, while keeping the same layer data such as bottom, surface, and groundwater elevations, the boundary conditions for the local model are assigned to the specified heads (Mehl and Hill 2010). The model grid is re-discretized if needed within the decreased grids. When the groundwater model is calibrated, a particle-tracking tool for MODFLOW, named MODPATH, can be used to assign the required coordinates for contamination transport modeling via displaying the groundwater flow direction in the study area (Pollock 2012). In this way, possible pathways for pollution transport can be determined.
Step 3. Contamination transport modeling The transport model incorporates steady-state solution for groundwater flow as a starting point to compute how concentrations of dissolved contaminants from a plan source change over time as they are transported in groundwater by advection and dispersion. This accounts for the role of varying hydraulic conductivity and other spatially variable hydraulic parameters that accompany aquifer heterogeneity. The transport model used for this purpose is based on partial differential equations for dispersion that have been developed for homogeneous and isotropic media, where Darcy's law is valid (Ogata and Banks 1961;Ogata 1970;Bear 2013). Generally, contaminant source concentration and mass discharge in time are not well known at most contaminated sites (Locatelli et al. 2019). According to Sauty (1980), if a tracer is continuously injected into a uniform flow field from a point source, contamination will spread, as shown in Fig. 2. In this case, flow is governed by Eq. (2), where mass transport is calculated in two dimensions: where v x is down gradient fluid velocity, D L is longitudinal dispersion coefficient (m 2 /d), D T is transverse dispersion coefficient (m 2 /d), and C is solute concentration at x, and y (mg/m 3 ).
An observation point with known concentration is assumed at location x = 0 , and y = 0 with a uniform flow velocity at a rate v x parallel to the x axis. Then, a solute is represented with a concentration C 0 at a rate Q over the aquifer thickness, b . The equation to calculate the concentration on a distance from the observation point can be found from a Green function (Bear 2013;Fried 1975) for the injection of a unit amount of a contaminant, under a steady-state condition, is expressed as where K 0 is the modified Bessel function of the second kind and zero order, C 0 Q b is the rate that the contaminant is injected, and b is the thickness of the aquifer over which the contaminant is injected.
Some parameters (i.e., v x , D L , D T , and Q ) need to be addressed, prior to solving Eq. (3). According to Darcy's law, where v x is average linear velocity (m/d), K is hydraulic conductivity (m/d), and = peffective porosity, and dh dl = hydraulic gradient (m/m).
Multiplying v x by the cross-sectional area of the plume makes up the injection rate of the contaminant ( Q ) to the aquifer: where T is plume width and b is the thickness of the aquifer over which the contaminant is injected. The longitudinal and transverse hydrodynamic dispersion, D L and D T , respectively, can be calculated as: where L is longitudinal dynamic dispersivity and T is transverse dynamic dispersivity. A value of longitudinal dispersivity may exceed 50 m for long flow fields (Gelhar 1986). The ratio between L and T can be equal from 6 to 20.
The D * is an effective diffusion coefficient, which can be calculated as follows: where is a coefficient related to the tortuosity (Bear 2013) and D d is the diffusion coefficient (m 2 /d). However, as D d is small and is always less than 1, the effective diffusion coefficient, D * is neglected.
The solution of the contaminant transport problem Eq. (3) in 2D may be calculated by the Plume2DSS() add-in (Renshaw 2013) in Excel. However, this add-in requires exact values of transport parameters, which are unavailable at the site. Thus, the authors developed an optimizer using Macro in Visual Basic for Application (VBA) in Excel (Naseri-Rad et al. 2021). This Macro solves the Plume2DSS() using an optimizer with multiple iterations within the given range of transport parameters to minimize the absolute error (difference between measured and calculated concentrations at each specific point) for each measurement event.
The model is assigned by the following characteristics: the exact locations of sampling points (x, y, and dL), obtained from the groundwater model; contaminant concentrations; hydraulic gradients between any two wells; and ranges of value changes for transport parameters, i.e., T , b , K , dh, p , L , and T . The developed code then calculates the concentration at any point downstream by optimizing the calibration parameters in the given range to minimize the error. Once optimal parameters are determined, the model is validated using these in the Plume2DSS() add-in, and subsequent comparison between C calculated (a concentration, obtained without running the code) and C measured . Thus, optimal transport parameters are numerically obtained via minimization of the absolute error.
As a final step, a fate and transport modeling assessment is carried out. Once all required solute transport parameters are defined, new coordinates (x + x i , y + y i , and dL i ) are assigned via extension of the already defined pathways (an example is shown in Fig. 2 for the pathways BD and/or BC) and C i (x + x i , y + y i ) is solved using Eq. (3).
The proposed method was applied to the case study to illustrate its applicability in a real-world (although admittedly a complex) environmental problem. The complexity in this case study comes from the heterogeneous geology

Study area description
The Pavlodar region is one of the largest industrial centers in Kazakhstan. Metallurgical, chemical, and petrochemical industrial activities have been causing the environmental condition of the region. The local petrochemical industry is a major actor in the economic activities and the main taxpayer in the region, which accounts for about 50% of the city budget (Neftepererabotchik 2019). As residents of the rural area near the industrial zone are using the groundwater from the shallow aquifer for their drinking and domestic purposes (Tussupova et al. 2016), an investigation was needed to assess potentially affected sites and to prevent members of the local community from drinking unsafe water. The source of potential contamination in this case study is a recipient pond "Sarymsaq" (Appendix A), where wastewater is discharged by the cluster of local petrochemical industry sites (Radelyuk et al. 2021b).
The wastewater from the oil refineries is mainly characterized by petroleum hydrocarbons (Alva-Argaez et al. 2007). Even after primary mechanical treatment, it is difficult to remove hydrocarbons from the wastewater (Bruno et al. 2020). The most basic biological treatment method, activated sludge, which is used by refineries in Kazakhstan, cannot efficiently treat the wastewater for two reasons: first, because the petroleum hydrocarbons have a low level of biodegradability; and, second, because the salinity and toxicity of wastewater inhibits the efficiency of the treating biomass.
While developed countries have identified certain indicators, such as polycyclic aromatic hydrocarbons (PAHs), or benzene, toluene, ethylbenzene, xylenes (BTEX), for the better estimation of toxic effects from wastewater (CONCAWE 2018; USEPA 2019), Kazakhstani oil refineries monitor only the sum of total petroleum hydrocarbons (TPH), without an assessment of the constituent potentially toxic chemical compounds. The term TPH describes the range of hundreds to thousands of individual compounds. Previous research has shown that hydrocarbons, originating from the petrochemical industry, are practically ubiquitous in groundwater, are usually only biodegraded to a limited extent, and the resulting contaminant plumes may be several kilometers long (Balderacchi et al. 2013). Measured concentrations in groundwater samples show that the amount of TPH constantly exceeds the permissible limit in all monitoring wells surrounding Sarymsaq, which confirms the presence of groundwater pollution in the area (Radelyuk et al. 2021b). The amount of TPH usually has maximum peaks during the spring season, which can be explained by intensive snowmelt and large infiltration of water through the soil profile.
Step 1. Data preparation The following sources of information were used for this study: investigations of the hydrogeological conditions conducted during the Soviet and post-Soviet period (Kosolapov et al. 1993); and several recent field surveys, which mainly confirm the information from the previous investigations. The data, needed for contamination transport modeling, which were collected from representatives of industry, include concentrations of TPH in observation wells. There was no information available on initial concentration of TPH in the pond (as the pond has served as a discharge point for wastewater for over 30 years), recharge concentration of TPH (as it always depends on the load of pollutants from factories, which varies depending on the conditions of wastewater treatment systems and the capacity of the factory itself), and the characteristics of the pond, such as sedimentation properties, geometry of the pond, etc. Also, there were no opportunities to perform investigations and measurements to obtain detailed data about TPH characteristics and longitudinal dispersivity, and to identify the fraction content and degradation properties of TPH. As the constraint in data availability, particularly, detailed content of TPH in this certain case study, it was decided to follow the example from the Total Petroleum Hydrocarbon Criteria Working Group and assume that TPH is not degradable (Gustafson et al. 1997). Consequently, degradation characteristics of hydrocarbons were neglected in this study.
The study area is located on the western side of the West-Siberian plate. The surface is a mildly sloping plain with elevations ranging from 132 m above mean sea level in the southeast to 105 m in the northwest. The main recharge is from the Salair-Altai mountains, which are situated in Russia, where infiltration of precipitation, melting of glaciers, and runoff in mountain rivers generate an artesian flow toward the region. Locally, the precipitation also feeds the groundwater. The piezometric levels are established at a depth of 2-51 m below the ground surface. The values of transmissivity of the aquifer vary between 22 and 133 m 2 /d, increasing in the eastern direction (Kosolapov et al. 1993). The groundwater is discharging into the Irtysh River to the west.
The hydrogeological cross-section is mainly represented by three formations (Appendix A). The formation of Upper-Quaternary deposits of the first supra flood plain terrace (aQ III ), which is distributed along Irtysh River, 4-5 km wide. The water-bearing sediments consist of quartz-feldspar sands. The top layer is composed of sandy loam and loam, and the bottom layer is composed of gravel and pebbles. The thickness of the formation is up to 20 m. The groundwater in the aquifer has a free surface. The aquifer complex in Upper-Miocene Lower-Middle-Pliocene deposits of the Pavlodar suite (N 1-2 pv) is distributed over the entire region. The thickness varies from 2 to 7 m in the northern and northwestern parts of the study area, and increase in a southeasterly direction to 80 m. These sediments are characterized by uneven distribution and the occurrence of sand among the clays. Water-bearing sediments in these aquifers consist of quartz-feldspar and micaceous sands. The sands are coarse-grained with gravel and pebbles in the south, and a fine-grained, sometimes clayey texture, in the north. Groundwater is mostly confined and occurs at a depth of 2-28 m to the surface. The formation of the Kulunda formation (N 2 kln) is partly included in the Pavlodar formation. The thickness varies between 5 and 26 m. The water-bearing layer is characterized by alluvial sand, mixed with gravel and pebbles, with clayey and loamy lenses. The Pavlodar formation covers the Tavolzhan formation (N 1 tv). This layer (N 1 tv) consists mostly of clay that constitutes a bottom for the upper formation. Thus, the sediments above N 1 tv formation is the subject of this study. As the formation is complex and there is only one piezometric measurement available at each well location, we only considered one layer for the modeling procedure.
Step 2. Groundwater modeling The study area of this research was discretized into a 3D finite-difference model grid as a one-layer unconfined aquifer. The grid size was assumed 500 m under consideration of the large size of the territory as well as the location and the size of the potential contaminant sinks and sources. The ground surface elevation was obtained from the DEM-file from the Advanced Land Observing Satellite (ALOS) project (Advanced Land Observing Satellite (ALOS) Project). Initial head and bottom elevations of the aquifer were generated by inverse distance weighted interpolation of measured values. The groundwater levels in the observation wells around the pond were entered into the model by knowing the exact location of 18 observation wells. However, the resolution of the chosen DEM was 30 × 30 m, which may affect the accuracy of the calculated levels. The eastern boundary was assigned as a general-head boundary, as it crosses a chain of lakes. The conductance and transmissivity values were taken from the technical reports (Panichkin et al. 2008;Kosolapov et al. 1993). Irtysh River was assigned as a drain on the western boundary of the model with assigned conductance. This was calculated using where k is hydraulic conductivity of riverbed material, l is length of reach, t is thickness of riverbed, and w is width of the river. The initial hydraulic conductivity values were assigned according to the existing data, and subsequently, they were calibrated during the modeling process. The southern boundary is close to the Balkyldak pond, which was the recipient of wastewater from a local chemical plant. Currently, the pond is surrounded by a concrete wall, deepened into the ground, as the pond received 10-15 t of mercury through wastewater from a local chemical plant up to 1990 (Ullrich et al. 2007). Hence, this boundary was defined as a no-flow boundary. The potential recharge for shallow aquifers is normally estimated as the difference between total precipitation and evapotranspiration (Sathe and Mahanta 2019). However, the result might have a negative value, as the reported values for annual precipitation are about 303-352 mm, and for annual evaporation is around 957 mm (Heaven et al. 2007).
A steady-state condition is assumed for the model, as there is relatively negligible exploitation of groundwater resources, and there are no significant trends in observed heads during a long-term period, despite seasonal fluctuations (Radelyuk et al. 2021b). The average head value was used to assign the initial head for the steady-state modeling.
After calibration of the model for a regional-scale assessment, the model grid was re-discretized, and the size of the grids was changed from 500 to 100 m for local scale modeling. This modeling, first, gives an opportunity for a detailed estimation of particle tracking, and second, for insertion of an extra layer for the receiving pond, as a general-head boundary that defines lake-aquifer interactions (Mylopoulos et al. 2007). The local model was aimed to be used for following contamination transport identification. Thus, it is important to focus on a certain area, where the contaminant spread is to be estimated. Hence, a new grid was constructed, and the new model was re-run and re-calibrated.
Step 3. Contamination transport modeling Pathway 1 and 2 (P1 and P2) originated from the monitoring wells W1 and W3, respectively, where coordinates x, and y, and measured concentrations of TPH were established as initial conditions (x 0 = 0, y 0 = 0, and C 0 ) for the contamination modeling. The W2 and W4 (for Pathways 1 and 2, respectively) are located at the boundary of the sanitary zone, where the concentration of TPH should not exceed a regulatory limit of 100 μg/L for drinking and domestic purposes according to Kazakhstani legislation (Ministry of Economy 2015). Coordinates for W2 and W4 (x, y, and dL) and measured concentrations of TPH (C) were assigned as having accurate values, while parameters T, b, p, K, α L , α T , and dh were given a range for the studied aquifer for running the solver coupled with the VBA code to obtain C modeled with a minimized error. The obtained values of transport parameters are defined as being calibrated. After that, the average values of the calibrated parameters, which are associated with the geological characteristics (b, p, K, dh) and T, as the parameter of quasi-steady conditions of the contamination, were assigned for all measurement events; and the solver was re-run to re-calibrate the model by comparison between C modeled and C measured . These averages, defined as unified, were used for the following contaminant fate and transport modeling assessment.
The conceptual fate transport model for P1 and P2, on the examples of lines BD and BC, respectively, is presented in Fig. 2. The directions towards the nearest agricultural fields were chosen, as the nearest consumers of groundwater. The results for P1 are expected having a wave structure as the distance for them was calculated along the example of the direction BD in Fig. 2. This means that y-coordinates were moved in the negative direction, which caused higher concentrations with the following cross-section and y = 0, and subsequently heading away from the source of the pollution. The direction BC in Fig. 2 indicates P2 with a straightforward direction from the source of pollution. The distance was modeled until the concentration of TPH reached a concentration that complied with the regulatory limit. The distance (dL i ), and related dh i steps were assumed based on the groundwater model.
The final step was to look at the sensitivity of the parameters. The idea of this analysis is to identify the most critical parameters for that influence the calibration of the contaminant fate and transport model. The parameters K, T, b, p, dh, α L , and α T were assessed, as input parameters for solving Eq. (3). The assessment was performed by individually varying only one parameter at the time within its range of possible values.
Parameter uncertainty is always a major concern. However, the purpose of this study was to provide a tool that gives a general picture of contamination transport by illustrating the patterns of change in contaminant concentration in space with the following assessment for potential fate for the environment and local inhabitants. Calibration, validation, and sensitivity analysis of the model were done to the extent when available data allowed for this, while a detailed discussion about this procedure including uncertainty analysis is beyond the scope of this study (McMahon et al. 2001).

Groundwater modeling
The hydraulic conductivity, conductance of boundaries, and drainage characteristics of the river were calibrated using PEST and PEST Pilot points after the initial forward run of MODFLOW (Fig. 3a). The simulated hydraulic heads were compared to observed heads (Appendix B). The RMSE was equal to 0.7 m and the R 2 coefficient equal to 0.96, which indicate good performance of the model.
The results of the calibration reflect the lithology of the study area (Appendix C). The higher hydraulic conductivity values (5.5 m/d) occur in sediments adjacent to the Irtysh River and in Upper-Quaternary deposits (aQ III ), as they consist mainly of well-sorted quartz-feldspar rich sands and gravels. The decrease of hydraulic conductivity (5.5-0.8 m/d) from the northwest to the southeast is explained by the local topography, as the groundwater is deeper, and extent of covered (confined) conditions of the aquifer is larger. The conductance of the drain (river) (50 m 2 /d) and transmissivity of the boundaries (125 m 2 /d) were approximately equal to the accessible data (from technical reports) after calibration, which confirms an acceptable estimation of groundwater flow.
After calibration and adjustment of the regional model, local scale modeling using MODFLOW and MODPATH was performed. Figure 3b shows the simulated water level contours for the steady-state conditions. This suggests that the direction of groundwater flow and contaminant transport would be in a westerly direction with some displacement towards the southwest. Wells 1 and 2 (W1 and W2) and Wells 3 and 4 (W3 and W4) could be used for contamination transport modeling, as they are in the downstream direction from the source. Particle tracking showed the potential transport of contaminants towards agricultural fields, where groundwater is used for irrigation (4.5 km from the pond) and nearby rural areas (Michurino and Pavlodarskoe villages, 8 km from the pond). However, MODPATH considers only porosity and not longitudinal dispersivity. The RMSE of the local model was equal to 0.6 m and the R 2 coefficient equal to 0.68, which show a good fit of the model (Appendix D).

Contamination transport modeling
After calibration of the groundwater model, two pathways, located downstream from the source of pollution, were chosen for the contamination transport modeling. Figure 4 and Appendix E show the results of the calibration of the transport parameters. The R 2 coefficient was equal to 0.9 that indicates the good performance of the model. The calculated means of the differences between groundwater levels (dh) by MODFLOW were equal to 0.1 m and match the contamination transport model results. However, dh is one of the most uncertain parameters in this case study, as the number of measurements of groundwater level is low.
The assumptions are considered credible if Q (from Eq. (5)) is similar in both models. In the present case, the average flow rate was calculated to be 50 m 3 /d and the same value resulted from the contamination transport model. The large plume width (T) is explained by the width of the Sarymsaq pond, which is equal to 1.5 km. Hence, according to dispersion, T might be larger than 1500 m. The longitudinal dispersivity depends on the scale of spread of contamination and was close to values found in literature. However, both parameters had high range of potential values, caused by a large scale of calculations. Further studies on a smaller scale may give more accurate results. The calibrated values of aquifer thickness were equal to 10 m, which is confirmed by local lithology (Appendix A). The range for hydraulic conductivity and porosity was extracted from literature and was given as 0.8-8 m/d and 0.22-0.3, respectively. The unified values were equal to 5.6 m/d and 0.28, for hydraulic conductivity and porosity, respectively. Figure 5 presents the potential results of the contamination transport, based on measured concentrations in the monitoring wells, and unified transport geological parameters for individual cases and historical scenarios. The distance (dL i ) step was assumed as the square root of x + 100 and y + 20 m from the observation points. The dh i was estimated as 0.01 m for each dL i step, based on the groundwater model. Obviously, the plume width correlates with the starting concentrations. The potential maximum distribution during the spring season in 2019 and 2018 was equal to 6 and 3 km for Pathways 1 and 2, respectively. The starting concentrations (C 0 ) showed maximum historical values in those years. The same situation was found for the autumn periods when the potential maximum spreading of the contamination was 4.5 and 2.6 km for Pathways 1 and 2, respectively. The calculated distance is higher during the spring seasons than in autumn. Lower values of the calculated plume fit the lower concentrations with a resulting shorter distance from the origin of pollution. However, even for these cases, the distance was relatively large, almost 1.2 km from the source of the contamination. The final step of the application is the analysis of the sensitivity of aquifer parameters. The ranges for K, T, b, and p were changed several times in an equal manner: they were decreased by 25% and 50% and increased by 25% and 50%. For the investigation of the relationships between dh and dL i , the range for dh was considered between 0.01 and 0.1 m. The C 0 was assumed to be 500 μg/L for all scenarios. To assess the potential spread of plume, depending on the initial load of the TPH, the range for C 0 was considered between 50 and 1000 μg/L. After establishing new values, dL i was calculated to the extent, when the concentration of TPH is within permissible levels. The results are presented in Fig. 6. While all parameters show relevant changes in the resulting dL i within the range of change, longitudinal and transverse dynamic dispersivity together with plume width were considered as the most sensitive parameters in this study. This is explained by the fact that these parameters might be changed over time depending on contamination characteristics, while parameters describing geological characteristics are constant. The increase of dh causes significant increase in the distance of plume spreading, as it is related to the enhanced linear velocity, and consequently, enhanced injection rate of the contaminant. The parameters may vary significantly depending on the scale and characteristics of contaminants. If the C 0 is equal to 1000 μg/L, the contamination can spread up to 5 km for Pathway 1 and 2.5 km for Pathway 2. The highest rate of the spreading belongs to the area near the source of pollution, e.g., the growth of the starting concentrations from 100 to 150 μg/L causes extension of the plume spread corresponding to 0.25 km (from 1.30 to 1.55 km) for Pathway 2, while the following dL i was changed for 120-50 m for each 50 μg/L. Thus, future data acquisition for the most sensitive parameters is required to better control the situation of the groundwater contamination in the region.

Conclusions
The management and remediation of groundwater contamination in developing countries is often hampered by lack of data. In this case, it is possible to assess the potential contamination from a holistic viewpoint with some general results, which can be beneficial for guiding future investigations. The method presented in this study has several advantages and is proposed as an example for cases where detailed site-specific data are not available. Basic information about the local hydrogeology and concentrations of pollutants at observation points can be used to estimate the further spreading of pollution. When calculated errors from contamination transport modeling and subsequently calibrated parameters, related to local conditions characteristics (i.e., T, b, K, and dh), match or confirm the results of the calibrated MODFLOW model and available literature data, the model can be considered to be reliable. At the same time, incorrect identification of important parameters, such as initial coordinates, increases the risks of incorrect modeling. Also, the lack of knowledge about contaminant characteristics affects the study. The processes of volatilization and degradation, which are not considered in this study, decrease the amount of petroleum compounds in space and time. However, some petroleum compounds may not be degradable, which cause risks even at low concentrations.
This study showed that the risks for inhabitants in the potentially affected rural area of Kazakhstan can be avoided, as the plume has not reached the villages considered to be within the risk zone. However, agricultural areas at a distance of 2-6 km downstream the source of pollution could be affected by contaminated groundwater. Thus, the recommendation is to avoid usage of groundwater from the shallow aquifer for irrigation purposes in this area. The authors suggest performing additional investigations to determine the extent to which the TPH plume has spread from the contamination source, based on the general results, and especially the results of the sensitivity analysis. This study has also shown the need for local industries to pre-treat wastewater properly before discharging to ponds to decrease the potential fate and spreading of the pollution to the subsurface.
Finally, this study highlights the necessity of integrating key stakeholders in the government-science-industry collaboration. The joint effect of governmental activities, scientific approach, and industrial activity can contribute to the reduction of negative societal impact of pollution release.