1 Introduction

Today, exposure to water and air pollution is the main environmental risk factor affecting people's health [1]. These pollutants are hazardous substances that have the potential to affect human well-being [2]. Tropical countries face severe drinking water shortages in both urban and rural areas. In Cameroon, intermittent and recurrent supply cuts by the Cameroon Water Utilities Corporation (CAMWATER) force households to rely on groundwater. Analysis of chemical parameters in Yaoundé showed that 51% of groundwater samples exceeded the World Health Organization (WHO) nitrate standard of 50 mg/l, with high and extreme values near groundwater recharge areas [3]. Such groundwater is unsafe for consumption, particularly for infants and young children; it exacerbates existing microbiological risks and contributes to illness. Chemical analysis has also revealed a persistent and direct impact of wastewater disposal practices on groundwater quality. To reduce pollutant levels in groundwater, regulatory measures have been established through Law No. 96/12 of August 1996 on the environmental management framework and Decree No. 2011/2582/PM on the protection of the atmosphere. However, the challenge remains significant in industrial areas near sensitive receptors, such as the town of Mbalmayo, whose light fuel oil thermal power plant releases pollutants harmful to human health [4]. Elsewhere, measures to develop and use non-conventional water resources, such as industrial, urban, rural and agricultural effluents, have been proposed to compensate for the agricultural water deficit in Iran [5]. To maintain drinking water quality standards in sensitive areas, pollutant emissions must be assessed and managed, either through direct monitoring at the site or through dispersion prediction modelling [6].
Due to limited financial resources, the prediction of pollutants in sensitive receptor areas must be done through dispersion modelling, which effectively predicts the movement of pollutants in space and time at a local scale [7].

Several researchers have modelled pollutant dispersion successfully using Gaussian, Lagrangian, and computational fluid dynamics (CFD) models. The Gaussian model has a simple mathematical expression and requires only a few parameters, which makes it useful in operational cases because results are obtained quickly, but its accuracy is limited [8]. Compared to the CFD and Lagrangian models, it is faster in computation time but less accurate in its predictions. A model that is both fast and accurate is therefore needed to improve the quality of results [9]. Several works in the literature on atmospheric dispersion modelling have pursued this goal of accurate and operational prediction. For example, Florian [10] used a Flow'air-3D approach based on CFD wind data calculated at the industrial site and the Safety Lagrangian Atmospheric Model (SLAM) code. The Flow'air-3D/SLAM combination, which balances accuracy and speed, represented dispersion satisfactorily in a complex case, but the model is limited to a single site. Pierre et al. [11] used cellular automata (CA) for transition rule calculation and artificial neural networks (ANN) for atmospheric dispersion modelling of methane puffs. The combined CA-ANN model aims to predict puff evolution accurately and quickly, but it has limitations: errors increase with the number of iterations, and the recurrent-type neural network used can become unstable during modelling.

Sihang et al. [12] developed a method for fast and accurate monitoring of hazardous gas emissions from pollution sources, using a combination of neural networks, particle swarm optimization, and expectation maximization. The neural networks were used to predict the concentration distribution, while particle swarm optimization and expectation maximization were used to evaluate source parameters and accelerate the convergence process. Rongxiao et al. [13] proposed a new model for hazardous source estimation, combining artificial neural networks and a hybrid of particle swarm optimization and simulated annealing. The neural networks predicted dispersion, while simulated annealing improved the global search. The results showed that the method could accurately estimate hazardous sources and wind fields. Rongxiao et al. [14] aimed to control emissions and mitigate public health problems caused by industrial pollutants, using a model that integrated neural networks and the atmospheric dispersion modelling system AERMOD. The neural network predicted dispersion, while AERMOD provided data for the prediction. The results showed that the model was feasible for contaminant prediction, although it still underestimated the concentrations. Jun et al. [15] worked on the efficient retrieval of air pollution maps using machine learning. They proposed a tree-based multi-cascade spatial–temporal learning model (MCST-Tree) for air quality (AQ) inference to retrieve pollution maps. Their results show that mobile sampling significantly improves the spatio-temporal modelling capability. In this paper, we instead use a coupling of a genetic algorithm and an artificial neural network (GA-ANN), which proved more accurate than the CFD and Gaussian models in our comparison (Sect. 3.1). The GA-ANN coupling is a prediction and optimization model that considers the chemical and physical characteristics of the environment.
This study, conducted between January 2017 and December 2021, aims to estimate the impact of the activity of the Mbalmayo thermal power plant on the groundwater used as a resource by the surrounding population. To this end, the concentration of infiltrated fuel oil is compared with the water quality standards and criteria. The GA-ANN coupling is used as the model to optimise the objective function and to predict the concentration of fuel oil infiltrated and dispersed in the groundwater.

In this article we will first present the coupled genetic algorithm-neural network model. Then we will compare this model with other models used in the literature review. Finally, based on field data, we will estimate the level of oil concentration in the groundwater, with the aim of helping decision makers in their interventions.

2 Materials and methods

2.1 Location of study area

The thermal power plant is located in Mbalmayo, a town about 45 km south-southeast of Yaoundé, the capital of Cameroon. The plant lies inside the Mbalmayo Forest Reserve at 3° 28′ 06″ N, 11° 30′ 51″ E, where it occupies a surface area of about 1.6 ha (Fig. 1). The town has a population of about 300,000 inhabitants and is built on parent rocks of schistose origin. The soils are very fertile and ferralitic; they are also very permeable, which favours the infiltration of water and thus helps to recharge the water table. The rocks are acidic, and their high acidity is due to the leaching of bases by rainwater [17]. The soils of the lowlands are very sandy and are often hydromorphic, with the water table close to the surface. The Mbalmayo thermal power plant uses two WARTSILA units and has a production capacity of 10 MW. However, it is responsible for accidental pollution of the ground with hydrocarbons such as light fuels. These pollutants infiltrate the soil through spillage, leakage or discharge on a local scale, and their fate depends on their chemical and physical nature, as well as on meteorological conditions and the topography of the environment. Several parameters can influence the vertical infiltration of pollutants and their horizontal dispersion in the soil, such as the water content of the soil \((W_{e})\), the porosity \((p_{0})\) and the permeability \((K)\), which expresses the ability of a rock to let water pass through it. Together, the porosity (Table 1) and the permeability coefficient (Table 2) characterise the actual groundwater flow velocity.

Fig. 1 Localisation of industrial centre and the Forest Reserve of Mbalmayo

Table 1 Grammage of porosity in rocks [18]
Table 2 Orders of magnitude of the permeability coefficient at saturation of homogeneous soils [19]

The typical pathway of a pollutant is as follows: it begins at the soil surface, then moves vertically through the unsaturated zone (UZ) above the water table, reaches the water table, and then moves horizontally in the water table with the spread of the pollutant plume, which is often aligned with the flow direction. This can pose a threat to the health of vulnerable individuals near the source and potentially cause waterborne illnesses like cholera.

During the study period, from January 2017 to December 2021, soil samples were collected in the field and used to determine the soil water content experimentally. Figure 2 shows the evolution of the water content over this period, from the lowest value of We = 1% in January 2020 to the highest value of We = 82% in October 2017.

Fig. 2 Monthly soil water content over a 5 year period

2.2 Standards and water quality

This article focuses on fuel oil spillage from the Mbalmayo thermal power plant in Cameroon. The concentrations of the spilled oil in the groundwater are estimated, and the impact on the surrounding population is then evaluated. The estimated concentrations are compared with the water quality standards defined by Cameroonian environmental management laws and groundwater protection regulations (Law No. 96/12 and Decree No. 2011/2582/PM), which are presented in Table 3.

Table 3 Environmental standards for land disposal in Cameroon

2.3 Experimentation

In this study, 12 soil samples were taken from several points near the power plant. Soil moisture content was determined as follows. Wet soil samples were collected monthly at multiple points in Mbalmayo, labelled, and weighed on a digital balance in the physics and science laboratory of the Ecole Normale Supérieure, Yaoundé I. After 24 h, each sample was weighed a second time to obtain the dry soil mass and, by difference, the mass of evaporated water. The water content of the soil was then calculated from the following relations:

$$ m_{e} = m_{se} - m_{ss} $$
(1)
$$ W_{e} = \frac{{m_{e} }}{{m_{se} }} \times 100 $$
(2)

where \(m_{e}\) is the mass of evaporated water, \(m_{se}\) is the wet soil mass, \(m_{ss}\) is the dry soil mass and \(W_{e}\) is the soil water content (%).
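As a worked illustration of Eqs. (1) and (2), the short Python sketch below computes the water content from two weighings. The function name `water_content` and the sample masses (50 g wet, 40 g dry) are hypothetical and are not taken from the field data.

```python
def water_content(wet_mass_g: float, dry_mass_g: float) -> float:
    """Return the soil water content W_e in percent, following Eqs. (1) and (2)."""
    if wet_mass_g <= 0:
        raise ValueError("wet soil mass must be positive")
    if not 0 <= dry_mass_g <= wet_mass_g:
        raise ValueError("dry soil mass must lie between 0 and the wet soil mass")
    evaporated_mass_g = wet_mass_g - dry_mass_g      # Eq. (1): m_e = m_se - m_ss
    return evaporated_mass_g / wet_mass_g * 100.0    # Eq. (2): W_e = m_e / m_se * 100


# Hypothetical sample: 50 g of wet soil that weighs 40 g at the second weighing
example_we = water_content(50.0, 40.0)
print(f"W_e = {example_we:.1f} %")
```

With these illustrative masses, 10 g of water is lost from a 50 g sample, so the water content is about 20%.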

2.4 Presentation of genetic algorithm coupled to neural networks

The initial design of artificial intelligent systems was introduced by Robbins et al. [21]. To address the trade-off between accuracy and speed of results, Ostad-Ali-Askari et al. [22] used artificial neural networks (ANNs) to estimate nitrate pollution in groundwater in the marginal area of the Zayandeh-rood River, Isfahan, Iran. ANNs are a group of machine learning techniques inspired by biological neurons [23]. An ANN is a fairly accurate and fast model that can predict the concentration of pollutants at any point in a complex area. It consists of an input layer, one or more hidden layers and an output layer (Fig. 3). Its learning process adjusts the parameters of the network layers so that the error on the results is as small as possible [24]. This requires selecting the input variables and parameters that affect the dispersion of the pollutants. Increasing the number of neurons in the hidden layer can improve the accuracy of the network. In this study, the ANNs are trained using the MATLAB neural network toolbox.
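The layered structure of an ANN can be illustrated with a minimal forward pass through a network with a single hidden layer. This is only a sketch in Python, not the MATLAB toolbox configuration used in the study; the function name `ann_forward`, the tanh activation and all weights and inputs are assumptions chosen for illustration.

```python
import math


def ann_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: input layer -> one tanh hidden layer -> linear output neuron."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        # Each hidden neuron applies tanh to a weighted sum of the inputs
        activation = sum(w * x for w, x in zip(weights, inputs)) + bias
        hidden.append(math.tanh(activation))
    # The output neuron is a weighted sum of the hidden activations
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network: two inputs, two hidden neurons, one output
prediction = ann_forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.5, -0.25], [0.1, 0.2]],
    hidden_biases=[0.0, 0.0],
    output_weights=[1.0, 2.0],
    output_bias=0.3,
)
print(prediction)
```

Training would then adjust the weights and biases so that the difference between such predictions and the observed concentrations is as small as possible.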

Fig. 3 Flowchart of the Hybridization algorithm GA-PSO coupled ANN

Optimization is an essential process in modelling the atmospheric dispersion of pollutants. The particle swarm optimization (PSO) algorithm, for example, is an intelligent optimization method that guides particles towards the optimal solution and is very fast in computation time, but it tends to become trapped in a local optimum [13]. Genetic algorithms (GAs), by contrast, are stochastic optimization algorithms based on the mechanisms of natural selection and genetics. The process is as follows: one starts with a population of arbitrarily chosen initial potential solutions, called chromosomes, and evaluates their relative performance. Based on this performance, a new population of potential solutions is created using simple evolutionary operators: selection, crossover and mutation. This cycle is repeated until a satisfactory solution is found [25]. The method is computationally intensive, but effective in finding the global solution [26]. In this paper, the genetic algorithm is used as an objective function optimization tool (Fig. 3).
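The selection, crossover and mutation cycle can be sketched in a few lines of Python. The example below is a generic real-coded genetic algorithm applied to a toy objective, not the objective function or the settings of this study; the function names, the tournament selection, the blend crossover and all parameter values are assumptions made for illustration.

```python
import random


def tournament(population, objective, rng):
    """Selection: return the fitter of two randomly drawn chromosomes."""
    a, b = rng.sample(population, 2)
    return a if objective(a) < objective(b) else b


def genetic_minimise(objective, low, high, pop_size=30, generations=60,
                     mutation_rate=0.2, seed=0):
    """Minimise `objective` on [low, high] with a basic real-coded GA."""
    rng = random.Random(seed)
    # Arbitrarily chosen initial population of candidate solutions (chromosomes)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    best = min(population, key=objective)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            parent1 = tournament(population, objective, rng)
            parent2 = tournament(population, objective, rng)
            # Crossover: random blend of the two parents
            weight = rng.random()
            child = weight * parent1 + (1.0 - weight) * parent2
            # Mutation: occasional Gaussian perturbation
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1 * (high - low))
            children.append(min(max(child, low), high))  # keep within bounds
        population = children
        # Remember the best chromosome seen in any generation
        best = min(population + [best], key=objective)
    return best


# Toy objective with a known minimum at x = 2 (not the paper's objective function)
best_x = genetic_minimise(lambda x: (x - 2.0) ** 2, low=-10.0, high=10.0)
print(best_x)
```

Each generation costs one objective evaluation per tournament comparison, which reflects why the approach is computationally heavier than a single deterministic search.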

The GA-ANN coupling is used to optimise the prediction of the pollutant concentration at time step n + 1. The calculation starts with the genetic algorithm, which uses natural selection mechanisms to evolve from one generation (k) to the next (k + 1) in order to find the best value of the fitness function, which in this study represents the concentration at the initial time (Fig. 3). This value, together with other input variables such as the advection term, the diffusion term, the molecular diffusion coefficient (\(D_{m}\)) and the mass flow rate (\(Q_{s}\)), is then fed into the input layer of the ANN, which returns the predicted concentration at the output layer. The prediction model is based on an explicit discretization of the advection and diffusion terms. The flowchart of the coupling process is shown in Fig. 3.

2.5 Selection of input variables

The selection of input variables is necessary for the efficiency of artificial neural networks. It is a difficult task in the definition of a neural network, but a careful selection reduces the complexity of the model. For this study, the input variables are taken from the convection–diffusion equation in porous media, Eq. (3):

$$ \frac{\partial \left( C \right)}{{\partial t}} + \nabla \cdot (\mathbf{v}_{0} C) - \nabla \cdot (D_{m} \nabla C) = Q_{s} $$
(3)

This equation reproduces the behaviour of pollutants in the study area, where \(C(x,y,z,t)\) is the concentration at point M and time t (µg/m³), \(D_{m}\) is the molecular diffusion coefficient (m²/s), \(\mathbf{v}_{0} = (u_{0}, v_{0}, w_{0})\) is the water flow velocity, with components along the ox, oy and oz axes, and \(Q_{s}\) is the mass flow rate (kg/s).

In the case of infiltration of the solute into the groundwater along the vertical axis oz we have:

$$ \frac{\partial \left( C \right)}{{\partial t}} + w_{0} \frac{\partial C}{{\partial z}} - D_{m} \frac{{\partial^{2} C}}{{\partial z^{2} }} = Q_{s} $$
(4)

For a dispersion of the solute in the underground medium along the horizontal ox and oy we have:

$$ \frac{\partial \left( C \right)}{{\partial t}} + u_{0} \frac{\partial C}{{\partial x}} + v_{0} \frac{\partial C}{{\partial y}} - D_{m} \frac{{\partial^{2} C}}{{\partial x^{2} }} - D_{m} \frac{{\partial^{2} C}}{{\partial y^{2} }} = Q_{s} $$
(5)

2.6 Meshing of the study area

The pollutant concentration at a point in the domain is predicted by first meshing the domain (Fig. 4), then explicitly discretizing the advection–diffusion equation by finite differences in order to obtain the approximate value of the concentration at each time and space step. The differential operators are discretized using Taylor series expansions. The concentrations at the nodes neighbouring the target node are used as input variables by the neural networks [11].

Fig. 4 Mesh of the domain

From Eq. (4) we have:

$$ \frac{{C_{i,j,k}^{n + 1} - C_{i,j,k}^{n} }}{\Delta t} + w_{0} \frac{{C_{i,j,k + 1}^{n} - C_{i,j,k - 1}^{n} }}{2\Delta z} - D_{m} \frac{{C_{i,j,k + 1}^{n} - 2C_{i,j,k}^{n} + C_{i,j,k - 1}^{n} }}{{(\Delta z)^{2} }} = Q_{s} $$
(6)
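Solving Eq. (6) for \(C_{i,j,k}^{n + 1}\) gives an explicit update in which the new value at a node depends only on the old values at that node and its two vertical neighbours. The Python sketch below applies this update to a one-dimensional profile along z. The function name, the fixed boundary nodes and the numerical values are assumptions made for illustration; the boundary conditions used in the study are not stated here.

```python
def step_vertical(c, w0, dm, qs, dz, dt):
    """Advance the vertical concentration profile by one time step using Eq. (6).

    c is a list of nodal concentrations along z. The first and last nodes are
    held fixed, which is an assumption of this sketch.
    """
    new_c = list(c)
    for k in range(1, len(c) - 1):
        # Central difference for the advection term w0 * dC/dz
        advection = w0 * (c[k + 1] - c[k - 1]) / (2.0 * dz)
        # Central difference for the diffusion term Dm * d2C/dz2
        diffusion = dm * (c[k + 1] - 2.0 * c[k] + c[k - 1]) / dz ** 2
        # Forward difference in time, rearranged for the new value
        new_c[k] = c[k] + dt * (qs - advection + diffusion)
    return new_c


# Hypothetical linear profile and parameters, for illustration only
profile = [0.0, 1.0, 2.0, 3.0]
updated = step_vertical(profile, w0=1.0, dm=0.5, qs=0.0, dz=1.0, dt=0.1)
print(updated)
```

For a linear profile the diffusion term vanishes, so only advection changes the interior values. Explicit schemes of this kind are conditionally stable, so the time step has to be chosen small relative to the mesh size.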

From Eq. (5) we have:

$$ \frac{{C_{i,j,k}^{n + 1} - C_{i,j,k}^{n} }}{\Delta t} + u_{0} \frac{{C_{i + 1,j,k}^{n} - C_{i - 1,j,k}^{n} }}{2\Delta x} + v_{0} \frac{{C_{i,j + 1,k}^{n} - C_{i,j - 1,k}^{n} }}{2\Delta y} - D_{m} \frac{{C_{i + 1,j,k}^{n} - 2C_{i,j,k}^{n} + C_{i - 1,j,k}^{n} }}{{(\Delta x)^{2} }} - D_{m} \frac{{C_{i,j + 1,k}^{n} - 2C_{i,j,k}^{n} + C_{i,j - 1,k}^{n} }}{{(\Delta y)^{2} }} = Q_{s} $$
(7)
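Equation (7) can be rearranged in the same way to give an explicit update on the horizontal mesh. The sketch below, in Python, advances a two-dimensional grid by one time step. The function name, the fixed boundary nodes, the grid size and the parameter values are assumptions made for illustration only.

```python
def step_horizontal(c, u0, v0, dm, qs, dx, dy, dt):
    """Advance a 2D concentration grid c[i][j] by one time step using Eq. (7).

    i runs along ox and j along oy. Boundary nodes are held fixed, which is an
    assumption of this sketch.
    """
    nx, ny = len(c), len(c[0])
    new_c = [row[:] for row in c]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            # Central differences for the two advection terms
            adv_x = u0 * (c[i + 1][j] - c[i - 1][j]) / (2.0 * dx)
            adv_y = v0 * (c[i][j + 1] - c[i][j - 1]) / (2.0 * dy)
            # Central differences for the two diffusion terms
            dif_x = dm * (c[i + 1][j] - 2.0 * c[i][j] + c[i - 1][j]) / dx ** 2
            dif_y = dm * (c[i][j + 1] - 2.0 * c[i][j] + c[i][j - 1]) / dy ** 2
            new_c[i][j] = c[i][j] + dt * (qs - adv_x - adv_y + dif_x + dif_y)
    return new_c


# Hypothetical 5 x 5 grid with a unit concentration at the centre, no flow
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
after = step_horizontal(grid, u0=0.0, v0=0.0, dm=1.0, qs=0.0,
                        dx=1.0, dy=1.0, dt=0.1)
print(after[2][2], after[1][2])
```

With no flow and no source, the central value decreases while its four neighbours increase by the same total amount, which is the expected behaviour of pure diffusion.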

3 Results and discussion

3.1 Performance of the GA-ANN coupling

The present study evaluated the coupled GA-ANN model, the CFD model and the Gaussian model by comparing them with measurements from sensors, which are operational monitoring tools in the field. Figure 5 shows that all three models underestimate the concentration values beyond 260 s. However, the coupled GA-ANN model estimates the concentration level over time more accurately than the CFD and Gaussian models.

Fig. 5 GA-ANN comparison to CFD and Gaussian model

3.2 Evolution of pollutants over time as a function of depth

At Mbalmayo, the soil texture is sandy-clay with a high porosity between 0 and 90 cm depth, and clay with a low porosity beyond 90 cm [27]. Figure 6 shows the evolution of the pollutant concentration over time at x = 500 m, y = 500 m for different depths. At a depth of z = 41 cm, the concentration remains at 0 µg/m³ between 0 and 250 days, then rises to a maximum of 0.22 µg/m³ between 251 and 270 days. At a depth of z = 130 cm, the concentration of fuel oil, which is 0 µg/m³ on the first day, remains constant until 455 days. The evolution of the concentration therefore depends on the depth, and in turn on the texture and structure of the soil at that depth.

Fig. 6 Concentration versus depth curves

Figures 7, 8 and 9 show the evolution of the pollutant concentration as a function of depth between January and December 2020. In January, with a water content of We = 1%, the pollutants reach a depth of about 12 cm; in May, with We = 60%, about 108 cm; and in October, with We = 76%, about 128 cm. The infiltration of pollutants into the soil therefore depends on the water content: the higher the water content, the deeper the pollutants infiltrate.

Fig. 7 Evolution of the concentration for different months as a function of depth and permeability

Fig. 8 Evolution of the concentration for different months as a function of depth and permeability

Fig. 9 Evolution of the concentration for different months as a function of depth and permeability

The soil of Mbalmayo is porous, with a porosity Po that varies between 26 and 48% (Table 1). Figure 10 shows the evolution of the fuel oil concentration as a function of depth for different soil porosities in the town of Mbalmayo. For a porosity Po = 26%, the fuel oil reaches a depth of about 20 cm, and for a porosity Po = 48%, it reaches a depth of about 98 cm. Porosity is therefore a very important parameter in the evolution of the oil in the soil: the higher the porosity, the deeper the oil infiltrates, and the lower the porosity, the less it infiltrates.

Fig. 10 Evolution of concentration as a function of depth and porosity

The soil of Mbalmayo is permeable, with a permeability K that varies from \(10^{-5}\) to \(10^{-1}\) m/s (Table 2). Figure 11 shows the evolution of the pollutant concentration as a function of depth for different soil permeabilities in the town of Mbalmayo. For a permeability coefficient K = \(10^{-5}\) m/s, the pollutants reach a depth of about 20 cm, and for K = \(10^{-1}\) m/s, they reach a depth of about 98 cm. Permeability is therefore a very important parameter in the evolution of pollutants in the soil: the higher the permeability, the deeper the pollutants infiltrate, and the lower the permeability, the less they infiltrate.

Fig. 11 Evolution of concentration as a function of depth and permeability

Figures 12 and 13 show the concentration distribution at t = 100 s in the Mbalmayo groundwater. Red represents the concentration of pollutants such as fuel oil in the vicinity of the thermal power plant, and blue represents the absence of pollutants. As the pollutant disperses into the water table, it gradually dilutes until it disappears.

Fig. 12 Distribution of pollutants in 2 dimensions in the water table of the city of Mbalmayo for z = 150 cm

Fig. 13 Distribution of pollutants in 3 dimensions in the water table of the city of Mbalmayo for z = 150 cm

Figure 14 shows the horizontal spread of the pollutant concentration in the groundwater for various permeability coefficients. For a permeability coefficient K = \(10^{-5}\) m/s, the concentration decreases progressively from C = 3.7 µg/m³ until it vanishes at about 220 m; for K = \(10^{-1}\) m/s, it decreases progressively until it vanishes at about 445 m. Thus, the higher the permeability coefficient, the faster the pollutants disperse away from the source and approach the sensitive receptors while gradually diluting in the water; the lower the permeability coefficient, the closer the pollutant remains to the source of pollution. The distance of 445 m is therefore the expected distance at which the pollutant concentration tends to disappear completely. Comparing the concentration of fuel oil in the soil with the water quality standards and criteria shows that the limit threshold of 50 mg/l is respected throughout the study area.

Fig. 14 Evolution of the concentration of pollutants for different values of the water velocity

4 Conclusion

This paper presents a new model for groundwater quality assessment that combines genetic algorithm optimization and neural network prediction. The coupling of a genetic algorithm and neural networks was used to predict the concentration of fuel oil infiltrated and dispersed in groundwater from parameters such as precipitation, water content, porosity, and permeability. An experimental study was conducted to obtain water content values, with data collected between January 2017 and December 2021. For validation, the GA-ANN model was compared with the CFD and Gaussian models and found to be more accurate and efficient than the Gaussian model. Soil water content, permeability and porosity were assessed. An increase in water content, permeability or porosity leads to a more rapid infiltration of fuel oil into the soil, and variations in these parameters affect the evolution of the concentration. Finally, comparing the concentration of fuel oil in the vicinity of the source, which is 3.7 µg/m³, with the concentration limit for fuel oil, which is 50 mg/l, shows that the groundwater quality respects the standards of the World Health Organization over the 445 m distance. However, there is a risk of groundwater pollution by fuel oil in the 0 to 445 m zone in the event of heavy activity at the Mbalmayo thermal power plant. For sustainable health protection, settlement of the population beyond a 445 m perimeter and the construction of a water treatment plant are recommended. As a contribution, these results can assist emergency authorities in their decision-making process. The model nevertheless has limitations, as it underestimates some concentration values. Finally, micro-sensors should be considered as real-time monitoring tools.