Introduction

Morocco is a country with limited water availability, possessing a total annual renewable resource of 29 billion cubic meters (BCM). This includes 4 BCM of groundwater, which provides 60% to 70% of the national potable water supply, although certain deep aquifers are non-renewable or have limited recharge capacity (Faysse et al. 2010; Hssaisoune et al. 2020). However, rapid development and agricultural expansion have resulted in aquifer overexploitation, with extraction exceeding natural recharge rates in most major basins (Bouchaou 2004; Ait Brahim et al. 2017; Eslamian et al. 2017; Echogdali et al. 2023). This depletion is compounded by the increased water stress projected for the region under climate change (Bahir et al. 2021). Addressing sustainability thus remains a key challenge. The Tensift-AL Haouz basin, which encompasses the AL Haouz aquifer system near Marrakech, exemplifies this issue: it is currently undergoing severe depletion, with water levels decreasing by up to 65 m over several decades (Hssaisoune et al. 2020). Studies of stable isotopes, water balance, and flow modeling indicate substantial modern recharge but an alarming decline in groundwater levels (Boukhari et al. 2015; Hadri et al. 2021; Kamal et al. 2021). As the expansion of irrigation drives abstraction, better characterization of these aquifers is needed to support sustainable management.

A key research gap lies in delineating complex subsurface geometries to constrain hydrogeological models (El Mezouary 2016; Zuffetti et al. 2020). The vastness of the region, the scarcity of direct measurements of aquifer depth, and spatial heterogeneity all create uncertainty (Hermans et al. 2023). Characterizing aquifer substrate topography is crucial for accurate groundwater modeling, analysis, and management. The height and form of the bedrock or other limiting boundary under an unconfined aquifer exert first-order control over system behavior (Fetter 2018; Somers and McKenzie 2020). Substrate topography influences critical parameters such as saturated thickness, transmissivity, flow velocities, and boundary conditions (Tokunaga 2009). Neglecting complex heterogeneous substrates can result in incorrect model assumptions and predictions (Xu and Valocchi 2015; Jing et al. 2019). Oversimplified flat or sloping substrate representations, for example, cannot account for the effects of hidden valleys, bedrock highs, or low-permeability inclusions (Deutsch and Siegel 2020; Song et al. 2020; van Woerkom et al. 2021). However, direct sampling of aquifer bottom elevations is expensive and geographically constrained, which makes high-resolution mapping from direct measurements alone impractical.

Traditionally, characterizing aquifer geometry relied heavily on direct subsurface measurements from boreholes and wells. Hand-drawn cross sections and contour maps were used to define hydrostratigraphic surfaces and boundaries by filling in the gaps between sparsely sampled points (Maliva 2016). Adequately constraining intricate three-dimensional aquifer structures required extensive drilling. Geophysical techniques such as ground-penetrating radar, resistivity, and seismic surveys offer complementary imaging of the subsurface in the areas between boreholes (Bechtel et al. 2014). Nevertheless, uncertainties in geophysical inversion limit the ability to determine the detailed characteristics of an aquifer. Furthermore, traditional hydrogeological models faced difficulties in incorporating diverse datasets and representing complex geometries (Turner 2006). Statistical methods such as kriging enhance property mapping by integrating spatial correlations (Kitanidis 1997). However, computational requirements limit the use of simulations with a large number of parameters. Ultimately, the ability to understand complex aquifer systems using traditional methods was hindered by significant data limitations and methodological constraints.

The application of machine learning (ML) and deep learning (DL) techniques has recently led to significant progress across diverse domains. Advanced statistical learning algorithms that exploit large datasets have substantially improved the modeling of complex systems, yielding novel insights and enhanced predictive abilities. ML and DL techniques have greatly accelerated progress in the interconnected fields of geology, hydrology, water resource management, and hydrogeological characterization. In geology, for example, ML and DL have proved highly valuable for mineral exploration, where they can identify subtle patterns in geophysical and geochemical data that indicate the presence of mineral deposits beneath the Earth's surface (Zhao et al. 2016; Zuo 2017). Deep learning also enables new approaches to seismic data analysis, improving subsurface exploration through more precise structural interpretation (Di et al. 2018; Wang et al. 2018). These applications facilitate the identification of concealed patterns within geological datasets, thereby yielding practical and actionable insights.

Machine learning algorithms that utilize meteorological and streamflow data have significantly advanced flood forecasting and early warning systems (Shamshirband et al. 2020; El Mezouary et al. 2022). Long short-term memory neural networks greatly improve river discharge and rainfall–runoff predictions (Fang et al. 2017; Kratzert et al. 2018), enabling more informed water management. Deep learning has also extended the capabilities of satellite remote sensing, capturing land surface characteristics with high precision (Kattenborn et al. 2021; Aboutalebi et al. 2022; Eshetie et al. 2023). ML also aids in monitoring drought (Docheshmeh Gorgij et al. 2022; Shahfahad et al. 2023), projecting reservoir inflow (Gupta and Kumar 2022; Latif and Ahmed 2023), and forecasting water demand (Xu et al. 2022; Zanfei et al. 2022). The adaptability of contemporary statistical learning methods is enabling the exploration of novel hydrologic modeling capabilities. In water resources engineering, ML enhances the monitoring of water distribution systems (Fu et al. 2022; Yu et al. 2023) and forecasts reservoir inflow (Huang et al. 2022; Saab et al. 2022). Within hydrogeology, multiple studies have adopted ML for aquifer mapping by extracting information from geophysical surveys and auxiliary datasets (Shirmard et al. 2022; Bonogo et al. 2023). ML is also used to forecast groundwater level and quality (Singha et al. 2021; Tao et al. 2022; Deng et al. 2023; Ko and Yoo 2023), optimize pumping strategies (Gaur et al. 2018), and generate detailed maps of groundwater potential (Mousavi et al. 2017). Physics-informed neural networks show potential for constructing interpretable hydrogeological models (Raissi et al. 2019; Li et al. 2022).

Previous studies have primarily examined subregions rather than the entire aquifer because of the intricate nature of the AL Haouz-Mejjate system. For instance, Rmiki et al. (2021) conducted research in the central AL Haouz region, whereas Rochdane et al. (2018) developed a three-dimensional model specifically for the eastern section of the basin. Rochdane et al. (2015) and similar initiatives analyzed the geometry of the eastern AL Haouz and Tassaout aquifers. Other localized studies in the AL Haouz region include electrical resistivity tomography in the eastern AL Haouz (Rochdane et al. 2022), gravimetric analysis in the western AL Haouz, and a structural diagram of AL Haouz-Mejjate (El Goumi et al. 2010; Chouikri et al. 2016). In contrast, this research pioneers the integration of geospatial datasets with machine learning and deep learning models to map aquifer substrates. By combining sparse borehole data with terrain, geology, hydrology, and other features, the models capture complicated interactions to produce accurate high-resolution maps. The methodology enables advanced analysis of substrates, improving hydrogeological understanding and sustainability in the vital AL Haouz-Mejjate region, a crucial economic, agricultural, and touristic hub for Morocco that faces water scarcity pressures. The research offers multiple innovations, including fusing diverse data sources, applying state-of-the-art algorithms to discern subtle patterns, and extending the limited existing knowledge of regional aquifer architecture. By illuminating these hidden freshwater aquifer systems through data and artificial intelligence, the study provides a foundation for refined groundwater modeling and management amid rising demands in the region. The detailed substrate visualizations are expected to be valuable for securing water resources in this major tourist destination and beyond.

Study area

The AL Haouz-Mejjate basin of Marrakech is located in central Morocco and is surrounded by a number of significant physiographic features. The northern limit is defined by the Jebilet massif, while the western boundary is defined by the Essaouira and Chichaoua plateaus. The foothills of the High Atlas Mountains lie to the east and south. This combination of uplands forms a large enclosed depression with an extent of 6800 km² (Fig. 1). The basin is divided into three major subregions: western AL Haouz (Mejjate Plain), central AL Haouz, and eastern AL Haouz (Bernet and Prost 1975; Sinan 2000).

Fig. 1

Map illustrating the geographic location and geology of the AL Haouz-Mejjate plain, with structural anomalies referenced from El Goumi et al. (2010), Rochdane et al. (2015), and Chouikri et al. (2016)

Groundwater resources in AL Haouz-Mejjate Aquifer

The AL Haouz plain surrounding Marrakech, a crucial agricultural and urban hub, faces a particularly severe situation. The AL Haouz aquifer system is a crucial water source for Marrakech, with potable water demand reaching 65 MCM in 2014 (Zhao et al. 2019). However, excessive pumping poses a significant threat to the equilibrium of the system. This highlights the need for better hydrogeological knowledge to ensure long-term sustainability. Investigation of the intricate sedimentary aquifer system of AL Haouz is impeded by limited direct sampling.

The hydrogeological basin of the Tensift-AL Haouz, including the AL Haouz aquifer, has a semiarid climate with an average annual rainfall of roughly 240 mm, most of which falls between November and April (AGR 2008). Mountain runoff and groundwater reservoirs provide the water supply. Agriculture, made possible by irrigation, is one of the most significant socioeconomic activities. Three principal irrigation systems are used: large-scale hydraulic networks, small to medium hydraulic systems, and individual private irrigation on farms. Irrigated areas cover around 120,000 hectares and are dominated by grain agriculture as well as olive and fruit orchards (Bzioui 2004; Water 2008).

Geological and hydrogeological context of AL Haouz-Mejjate Aquifer

The AL Haouz Plain is a tectonic sedimentary basin filled with siliciclastic deposits generated from Neogene and Quaternary erosion of the uplifted High Atlas range (Ambroggi and Thuille 1952; Bernet and Prost 1975; Ferrandini and MARREC 1982). On a hydrogeological level, geological structures such as faults, flexures, anticlines, and synclines have a strong influence on reservoir geometry and groundwater circulation (Sinan 2000; El Goumi et al. 2010; Rochdane et al. 2015; Chouikri et al. 2016).

Deep soundings in the south of the plain make it possible to locate the aquifer reservoirs in the central and eastern parts of AL Haouz. These reservoirs bevel out very quickly (Fig. 2a, b), placing the Neogene deposits directly on top of the bedrock. Deep layers are therefore absent, and the Secondary and Neogene terrains are in direct contact with the more recent formations (Bernet and Prost 1975). The western AL Haouz (Mejjate) basin has two main layers, visible in the geological cross section (Fig. 2c): an upper unconfined aquifer hosted in thick Quaternary and Mio-Pliocene formations, and a deep confined aquifer that rests on an impermeable substrate of Miocene clays and marls (Ambroggi and Thuille 1952; Bernet and Prost 1975; Sinan 2000). These sediments were carried and deposited by an Atlasian wadi network, resulting in large alluvial fans and fluvial structures. This basin fill is made up of alternating permeable pebble, gravel, and sand lenses interspersed with virtually impermeable clay and marl layers. It sits unconformably on an impermeable Miocene clay and marl base (Sinan 2000).

Fig. 2

Subfigures a and b illustrate geological cross sections through the eastern and middle parts of AL Haouz, respectively, while subfigure c shows a cross section from the western AL Haouz-Mejjate plain. Image from Boukhari et al. (2015)

The principal water-bearing formations are the Plio-Quaternary alluvial layers, which are recharged by Atlas Mountain streamflow (Bouimouass et al. 2020; Fakir et al. 2021; Hajhouji et al. 2022). The depth of the marly substrate varies regionally, and in some areas Triassic or Paleozoic schist bedrock forms the base. This multilayered structure serves as a significant unconfined aquifer system, supplying a key source of water to the region (Bernet and Prost 1975; Boukhari et al. 2015). Of the two primary aquifers within the AL Haouz Plain, this study focuses specifically on the high-resolution characterization of the upper unconfined layer hosted in Quaternary and Mio-Pliocene sediments. Because this is an unconfined alluvial aquifer system, the geometry and properties of the layer are heavily influenced by the underlying substrate surface (Sinan 2000; Rochdane et al. 2015).

Methods

Data preprocessing

To support aquifer characterization, lithological data from 635 reconnaissance boreholes and production wells were obtained from the Tensift Hydraulic Basin Agency (ABHT). These logs contain stratigraphic profiles that indicate material types and depth transitions. Some also include dates and piezometric observations. The logs, originally available only as scans, were digitized and converted into a standardized format suitable for geographic information system (GIS) analysis. The key attributes collected include a unique identifier, surface elevation, depth of each stratigraphic unit, lithological facies codes, piezometric level, outflow rate, permeability, transmissivity, and water inflow. Missing attributes were filled using other ABHT datasets, such as the 1971 and 2011 piezometric surveys and pumping test surveys, and the USGS digital elevation model (DEM).

Stratigraphic columns were divided into three hydrogeological units. The first unit, UHF-1, comprises the upper unconfined aquifer system. It is distinguished by high permeability and consists of permeable materials, including those lying above the water table, such as sand, gravel, and conglomerate formations (Duffield 2019). These materials transmit water readily, which makes them essential to the functioning of the aquifer system. The second unit, UHF-2, is semipermeable and comprises limestone formations mixed with sands, gravels, clay, or marl (Lewis et al. 2006; Duffield 2019). The variety of materials found in the upper aquifer highlights the complexity and importance of this geological domain.

The third unit, UHF-3, represents the aquifer substrate. It is made up of impervious layers positioned below the aquifer materials mentioned above. These layers, which primarily consist of marls, clays, and shales, naturally have low permeability (Lewis et al. 2006; Duffield 2019; Neuzil 2019). This property restricts the vertical movement of water and plays a crucial part in hydrogeology: the impervious layers of UHF-3 frequently act as barriers that keep groundwater contained within the aquifer system. Figure 3 shows an example of a classified log that illustrates the importance of the three-unit classification in mapping substrate depth.
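The three-unit classification can be sketched in code as follows. This is a minimal illustration, not the procedure actually applied to the ABHT logs: the lithology lookup table, the layer tuple format `(top_m, bottom_m, lithology)`, and the rule that the substrate is the top of the run of UHF-3 layers at the base of the log are all assumptions introduced here.

```python
# Hypothetical lithology-to-unit lookup; the actual ABHT facies codes are not given in the text.
UNIT_BY_LITHOLOGY = {
    "sand": "UHF-1", "gravel": "UHF-1", "conglomerate": "UHF-1",
    "limestone": "UHF-2", "sandy limestone": "UHF-2",
    "marl": "UHF-3", "clay": "UHF-3", "shale": "UHF-3",
}


def classify_log(layers):
    """Map each (top_m, bottom_m, lithology) layer to its hydrogeological unit."""
    return [(top, bottom, UNIT_BY_LITHOLOGY[lith]) for top, bottom, lith in layers]


def substrate_elevation(surface_elev_m, layers):
    """Return the elevation of the substrate top, or None if the log never reaches it.

    Assumption: the substrate is the top of the run of UHF-3 layers at the base of the log.
    """
    units = classify_log(layers)
    if not units or units[-1][2] != "UHF-3":
        return None  # borehole stopped inside the aquifer materials
    top_of_substrate = units[-1][0]
    for top, _bottom, unit in reversed(units):
        if unit != "UHF-3":
            break
        top_of_substrate = top
    return surface_elev_m - top_of_substrate


# Invented example log (depths in metres below a 480 m surface elevation).
log = [(0.0, 12.0, "gravel"), (12.0, 30.0, "sandy limestone"),
       (30.0, 41.0, "sand"), (41.0, 55.0, "marl"), (55.0, 60.0, "clay")]
print(substrate_elevation(480.0, log))       # 439.0
print(substrate_elevation(480.0, log[:3]))   # None
```

A log that ends in permeable material returns `None`, which mirrors the idea of keeping only boreholes that reach the substrate.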

Fig. 3

Illustration of an example of borehole classification, showcasing borehole Log No. IRE 397/52-ABHT before and after analysis, classified into three-unit classes. UHF-1 and UHF-2 denote permeable and semipermeable materials, respectively, while UHF-3 represents impermeable materials corresponding to the substrate bedrock level

Of the 635 compiled logs, 168 boreholes fully penetrated the aquifer system and reached the substrate, thereby yielding substrate elevations. Their spatial distribution is as follows: 51% in western AL Haouz, 30% in central AL Haouz, and 19% in eastern AL Haouz (Fig. 4). To avoid biasing the substrate estimates, the remaining 467 logs were excluded because they contained incomplete information or did not reach the substrate.

Fig. 4

Comprehensive map illustrating the distribution of real reconnaissance boreholes (depicted by green points) and synthetic reconnaissance boreholes (depicted by black points) across the study area

Synthetic data processing

Aquifer substrate elevations in areas with limited available measurements were estimated using a multi-step methodology incorporating machine learning (ML) and deep learning (DL) models, geostatistical interpolation, and the integration of actual and simulated data points. The goal was to fill spatial data gaps and improve the subsurface mapping resolution (Chaplot et al. 2006; Bamisaiye 2018). To expand the coverage, synthetic boreholes (green points in Fig. 4) were generated in locations without direct subsurface measurements from reconnaissance boreholes (black points in Fig. 4). These synthetic boreholes effectively simulate possible subsurface logs at unsampled points by leveraging relationships and patterns learned from real measurement sites (Chen et al. 2023; Zhang et al. 2023). They provide realistic virtual borehole readings that supplement the spatial density when interpolated into maps. Simultaneously, geological formations, lithological details, and diverse hydrological properties were compiled from accessible sources to thoroughly describe the geological context within the study area. This contextual data aids in constraining and validating the synthetic data. The integration of real measurements and modeled synthetic logs enables expanded subsurface characterization.

After gathering the necessary data, machine learning (ML) and deep learning (DL) models were developed to predict the elevations of the aquifer substrate. The models were trained using the 168 real borehole substrates (Fig. 4). The training phase allowed the models to learn the input–output relationships that capture the interaction between hydrogeological parameters and substrate elevations. The trained ML and DL models were then applied to the synthetic borehole locations created in the previous step, which are depicted as green points in Fig. 4. The models predicted the substrate elevations at these synthetic locations using the corresponding hydrogeological parameters as input. Subsequently, the real and synthetic data were combined to unify the recorded and predicted substrate measurements. This produced an expanded dataset that greatly increased spatial coverage and improved the overall depiction of substrate information throughout the study area. The combined dataset formed the basis for the subsequent interpolation (Corchado and Aiken 2002; Tunkiel et al. 2022). Ordinary kriging, a geostatistical interpolation technique, was used to create a high-resolution substrate surface map. Kriging uses spatial patterns within the dataset to estimate substrate elevations at locations that were not sampled. Importantly, the incorporation of the synthetic predicted points enabled substrate elevations to be estimated between sparsely distributed actual measurements.
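The final interpolation step can be illustrated with a compact ordinary kriging estimator. The sketch below is written under stated assumptions: the exponential variogram and its `sill` and `corr_range` parameters are placeholders (the fitted variogram is not given in the text), the coordinates and elevations are invented, and the "synthetic" values simply stand in for ML-predicted elevations.

```python
import math


def variogram(h, sill=1.0, corr_range=10.0):
    """Exponential variogram model (an assumed choice, not the study's fitted model)."""
    return sill * (1.0 - math.exp(-h / corr_range))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points, values, target):
    """Estimate the value at `target` from (x, y) `points` with known `values`."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    n = len(points)
    # Variogram matrix bordered by the unbiasedness constraint (weights sum to 1).
    a = [[variogram(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values))


# Invented data: measured elevations merged with stand-ins for model predictions.
real = {(0.0, 0.0): 430.0, (10.0, 0.0): 445.0}
synthetic = {(0.0, 10.0): 438.0, (10.0, 10.0): 452.0}
combined = {**real, **synthetic}
print(ordinary_kriging(list(combined), list(combined.values()), (5.0, 5.0)))
```

Because the weights are constrained to sum to one, the estimator reproduces the data exactly at sampled locations, which is why adding synthetic points directly shapes the interpolated surface in data-poor areas.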

Machine learning and deep learning architectures

The synthetic data depicted in Fig. 4 are employed to assess a diverse array of traditional machine learning algorithms and contemporary deep neural networks for the purpose of mapping aquifer substrates. Linear regression (LR) assumes a linear relationship between the geospatial inputs and the target aquifer substrate elevation (Hastie et al. 2009). Regression trees (tree) capture hierarchical interactions between parameters through recursive binary splits (Elith et al. 2008). Ensemble methods like random forests improve generalizability by averaging many individual decision trees (Prasad et al. 2006; Bernard et al. 2009). Support vector machines (SVM) find optimal hyperplanes for classification and regression (Shmilovici 2010). Gaussian process regression (GPR) models handle nonlinearity in a flexible manner (Snelson 2008; Saul et al. 2016). The quadratic kernel makes it possible to model data that vary at several scales. GPR is particularly useful in spatial statistics and geostatistics, where multivariate statistical analysis on metric spaces is performed (Park 2011). These techniques provide breadth across linear, tree, and kernel-based learners. Deep artificial neural networks (ANN) are also implemented and assessed. Feedforward networks with multiple hidden layers learn hierarchical feature representations. Various activation functions, regularization schemes, and optimization algorithms are tested. Hyperparameter optimization identifies ideal model configurations (Cho et al. 2020). Partial dependence plots help visualize internal network workings. By propagating signals through stacked nonlinear transformations, deep learning can uncover subtle geospatial relationships that are difficult for classical methods to capture (Camps-Valls et al. 2021).

To reduce dimensionality, we employed principal component analysis (PCA) to derive six principal components capturing 95% of the variance across all models. For feature selection, we used the minimum redundancy maximum relevance (MRMR) algorithm (Shirzad and Keyvanpour 2015), the ANOVA F-statistic (FTest) (Elssied et al. 2014), and the RReliefF algorithm (the regression variant of ReliefF). Our analysis of the machine learning models, aimed at identifying the most influential parameters for aquifer substrate prediction, was multifaceted. Sensitivity analysis was conducted by systematically perturbing inputs and monitoring the resulting output changes (Montavon et al. 2018). Additionally, we employed partial dependence plots to visualize the marginal impact of individual parameters on the predicted substrate (Molnar 2020). In the case of the Gaussian process model, our focus was on examining the learned length-scale hyperparameters (MacKay 2003).
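The perturbation-based sensitivity analysis can be sketched as a one-at-a-time procedure. The example below is only illustrative: the feature names, the 10% relative step, and the toy linear model are hypothetical and stand in for a trained ML or DL model.

```python
def one_at_a_time_sensitivity(predict, baseline, rel_step=0.10):
    """Change in model output when each input is raised by `rel_step` in turn."""
    base_out = predict(baseline)
    effects = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)
        perturbed[name] = value * (1.0 + rel_step)
        effects[name] = predict(perturbed) - base_out
    return effects


def toy_model(features):
    """Hypothetical stand-in for a trained substrate-elevation model."""
    return 0.8 * features["surface_elevation_m"] - 2.0 * features["slope_deg"] + 5.0


baseline = {"surface_elevation_m": 500.0, "slope_deg": 3.0}
effects = one_at_a_time_sensitivity(toy_model, baseline)
ranking = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
print(ranking)  # features ordered from most to least influential
```

Ranking inputs by the absolute output change gives a simple importance ordering that can be compared with the feature selection scores.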

All techniques are trained on standardized data with stratified sampling. Cross validation guards against overfitting during hyperparameter tuning (Charilaou and Battat 2022). Model evaluation uses out-of-sample testing on borehole substrate measurements. The specific goals are: 1) gathering representative datasets; 2) analyzing diverse ML/DL architectures; 3) finding critical predictive factors; 4) creating 3D substrate representations with uncertainty; and 5) comparing model accuracy against observed data. Meeting these goals will reveal insights into the linkages that drive aquifer geometry and demonstrate the benefits of cutting-edge ML and DL for illuminating complicated subsurface systems. The approach is summarized in Fig. 5 as an architecture diagram that evaluates each method's ability to distill predictive multivariate patterns that generalize accurately across the domain. The following sections describe the mathematical foundation of each classical and modern machine learning technique and its potential benefits for illuminating subsurface geometries given the available geospatial datasets.
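As an illustration of how cross validation partitions the borehole records, the sketch below generates plain k-fold splits. It is a simplified assumption: the number of folds (`k=5`) and the random seed are not stated in the text, and this version shuffles the samples rather than applying the stratified sampling described above.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs that partition range(n_samples) into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield training, validation


# 168 matches the number of boreholes that reached the substrate.
for fold_number, (train_idx, val_idx) in enumerate(k_fold_indices(168), start=1):
    print(fold_number, len(train_idx), len(val_idx))
```

Each record is used for validation exactly once, so every hyperparameter setting is scored on data it was not fitted to.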

Fig. 5

Comprehensive illustration of the architectural framework employed for substrate elevation prediction in the study area, encompassing data acquisition, methodology, analysis, model results, and predictive outcomes

Linear regression

Linear regression (LR) models the relationship between explanatory variables \(x\) and response \(y\) as:

$$y={\beta }_{0}+{\beta }_{1}{x}_{1}+\dots +{\beta }_{p}{x}_{p}+\varepsilon$$
(1)

where the \(\beta\) terms are model coefficients and \(\varepsilon\) is an error term. Ordinary least squares minimizes the sum of squared residuals to solve for the coefficients (Draper and Smith 1998). Regularization methods like ridge regression reduce overfitting on the training data (Tibshirani 1996).
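For a single predictor, Eq. (1) has a closed-form least-squares solution, which makes the effect of ridge regularization easy to see. The sketch below is a minimal example with invented data; the `ridge_lambda` parameter name and the choice to leave the intercept unpenalized are assumptions of this illustration.

```python
def fit_line(x, y, ridge_lambda=0.0):
    """Fit y = b0 + b1 * x by least squares.

    ridge_lambda > 0 adds the penalty ridge_lambda * b1**2, which shrinks the
    slope toward zero; the intercept is left unpenalized.
    """
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / (sxx + ridge_lambda)
    b0 = y_mean - b1 * x_mean
    return b0, b1


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
print(fit_line(x, y))                    # ordinary least squares
print(fit_line(x, y, ridge_lambda=5.0))  # ridge: smaller slope
```

With `ridge_lambda=0` the function reduces to ordinary least squares; increasing it trades a small bias for lower variance in the coefficient estimate.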

Support vector machines

Support vector machines (SVMs) find an optimal hyperplane to separate classes or predict values using:

$$y\left(x\right)={w}^{T}\Phi \left(x\right)+b$$
(2)

where \(w\) is the normal vector to the hyperplane, \(\Phi \left(x\right)\) maps \(x\) to a higher-dimensional space, and \(b\) is the bias. The solution maximizes the margin between classes. The dual Lagrangian form admits kernel functions, which allows support vector regression (SVR) to fit nonlinear functions (Smola and Schölkopf 2004).
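The role of the mapping \(\Phi\) in Eq. (2) can be made concrete with a small example. The sketch below uses a homogeneous degree-2 polynomial kernel for two-dimensional inputs, chosen purely for illustration (it is not necessarily the kernel used in the study), and shows that the kernel value equals the inner product of the explicitly mapped features. The weights and bias are arbitrary numbers, not fitted values.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def quadratic_kernel(x, z):
    """k(x, z) = (x . z)^2, computed without forming phi explicitly."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def predict(w, b, x):
    """Eq. (2): y(x) = w^T Phi(x) + b."""
    return sum(wi * fi for wi, fi in zip(w, phi(x))) + b


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * c for a, c in zip(phi(x), phi(z)))
print(explicit, quadratic_kernel(x, z))  # the two values agree up to rounding
print(predict((1.0, 0.0, 1.0), 0.5, x))  # 5.5
```

This equivalence is what lets the dual formulation work with kernel evaluations alone, even when the feature space is very high-dimensional.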

Gaussian process regression

A Gaussian process regression (GPR) defines a distribution over functions, which is represented as:

$$f(x)\sim {\mathcal{G}\mathcal{P}}\left( {\mu \left( x \right),k\left( {x,x^{\prime } } \right)} \right)$$
(3)

where \(\mu \left(x\right)\) is the mean function and \(k\left(x,x^{\prime}\right)\) is the covariance kernel function. This provides a flexible nonparametric Bayesian model for regression. The predictive distribution at a point is Gaussian, with a mean and variance obtained by conditioning on the data (Rasmussen 2003).
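The Gaussian predictive distribution can be computed directly from the kernel matrix. The sketch below assumes a zero mean function, a squared-exponential kernel with unit length scale and signal variance, and a small noise term for numerical stability; these are illustrative choices, not the configuration reported for the study, and the training values are invented.

```python
import math


def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance k(x, x') for scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_predict(x_train, y_train, x_new, noise_var=1e-6):
    """Posterior mean and variance at x_new for a zero-mean Gaussian process."""
    n = len(x_train)
    k = [[rbf(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(xi, x_new) for xi in x_train]
    alpha = solve(k, y_train)  # K^{-1} y
    v = solve(k, k_star)       # K^{-1} k_*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var


x_train, y_train = [0.0, 1.0, 2.0], [0.5, -0.2, 0.3]
print(gp_predict(x_train, y_train, 1.0))  # near a data point: low variance
print(gp_predict(x_train, y_train, 5.0))  # away from the data: higher variance
```

The variance grows with distance from the training points, which is the property that makes GPR attractive when an uncertainty estimate is needed alongside the prediction.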

Ensemble methods

Ensembles combine multiple weak learners like decision trees into one predictive model for improved performance. Algorithms like random forests bootstrap training data and attributes to build diverse trees and average their predictions (Breiman 2001). Boosting methods like XGBoost incrementally add models to focus on difficult instances (Chen and Guestrin 2016).

Decision trees

Decision trees (tree) make predictions by greedily splitting the feature space into partitions based on criteria like information gain or reduction in variance (Quinlan 1986). Recursive binary splitting forms branches and nodes that segment the data. Pruning and ensemble techniques reduce the overfitting to which individual trees are prone.
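A single greedy split based on reduction in variance can be sketched as follows. This is a minimal one-feature illustration with invented data; a full regression tree would apply the same search recursively to each partition and across all features.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(x, y):
    """Return (threshold, variance_reduction) of the best binary split on one feature."""
    pairs = sorted(zip(x, y))
    parent = variance([target for _, target in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical feature values
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(pairs)
        gain = parent - weighted
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, gain)
    return best


x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
print(best_split(x, y))  # (6.5, 56.25)
```

The threshold is placed midway between neighboring feature values, a common convention rather than a requirement of the method.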

Artificial neural networks

Artificial neural networks (ANN) are computing systems inspired by biological neural networks and designed to identify patterns in data. They comprise interconnected nodes or "neurons" which transform input signals using an activation function like the sigmoid. The model can be represented as:

$$y_{i} = f\left( {\sum\limits_{j} {W_{{ij}} } x_{j} + b_{i} } \right)$$
(4)

where \({x}_{j}\) are the input features, \({W}_{ij}\) are the learned weights, \({b}_{i}\) are the biases, and \(f\) is the activation function. By stacking layers of neurons, very complex relationships can be modeled (Goodfellow et al. 2016). Neural networks are trained using backpropagation and gradient descent to iteratively minimize a loss function like mean squared error (Ruder 2016). Regularization methods like dropout prevent overfitting (Srivastava et al. 2014).
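The forward pass of Eq. (4) can be written in a few lines. The sketch below stacks one sigmoid hidden layer and a linear output layer; the layer sizes and all weights are arbitrary illustrative numbers, not a trained network, and training by backpropagation is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def dense(x, weights, biases, activation):
    """One layer of Eq. (4): y_i = f(sum_j W_ij x_j + b_i)."""
    return [activation(sum(w * xj for w, xj in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Feed x through a list of (weights, biases, activation) layers."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0], sigmoid),  # hidden layer, 2 neurons
    ([[2.0, -2.0]], [0.0], identity),                    # linear output for regression
]
print(forward([1.0, 1.0], network))
```

A linear output layer is the usual choice for regression targets such as elevations, since a sigmoid output would confine predictions to the interval (0, 1).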

Evaluation metrics

Rigorous quantitative validation was used to build the models and evaluate the machine learning techniques considered for high-resolution characterization of the aquifer substrate. On out-of-sample data, the core performance indicators were root mean squared error (RMSE), mean squared error (MSE), R-squared, and mean absolute error (MAE). The RMSE and MSE are based on the squared differences between predicted and measured values, with lower scores indicating higher accuracy (Chai and Draxler 2014; Hodson 2022). R-squared \(\left({R}^{2}\right)\) gives the proportion of variance explained, with values close to one indicating stronger explanatory power (Rights and Sterba 2019). The MAE gives the average magnitude of errors. Used together, these measures provide complementary insights into precision, bias, and representation capabilities (Willmott and Matsuura 2005).

Mean squared error

The mean squared error (MSE) is a common metric for regression model performance that quantifies the average squared difference between the predicted and true target values:

$$MSE=\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}-{\hat{y}}_{i}\right)}^{2}$$
(5)

where \({y}_{i}\) is the true value, \({\hat{y}}_{i}\) is the predicted value, and \(n\) is the number of samples. \(MSE\) penalizes larger errors more strongly than mean absolute error. Lower \(MSE\) indicates better model performance.

Root mean squared error

The root mean squared error (RMSE) takes the square root of the mean squared error:

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}-{\hat{y}}_{i}\right)}^{2}}$$
(6)

This returns error in the same units as the target data, facilitating interpretation. Like \(MSE\), lower \(RMSE\) denotes higher predictive accuracy.

Mean absolute error

The mean absolute error \((MAE)\) calculates the average magnitude of errors without squaring:

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|{y}_{i}-{\hat{y}}_{i}\right|$$
(7)

Because \(MAE\) avoids amplifying outliers, it can be more robust than \(MSE\) or \(RMSE\) in some applications. Lower MAE signifies better model performance.

R-squared

R-squared \(\left({R}^{2}\right)\) measures how well a model fits the actual data compared to a naive baseline:

$${R}^{2}=1-\frac{S{S}_{res}}{S{S}_{tot}}=1-\frac{\sum_{i=1}^{n}{\left({y}_{i}-{\hat{y}}_{i}\right)}^{2}}{\sum_{i=1}^{n}{\left({y}_{i}-\overline{y }\right)}^{2}}$$
(8)

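where \(S{S}_{res}\) is the residual sum of squares and \(S{S}_{tot}\) is the total sum of squares. \({R}^{2}\) typically ranges from \(0\) to \(1\), with higher values indicating more variance explained by the model; on out-of-sample data it can become negative when the model predicts worse than the mean of the observations.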
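The four metrics can be computed together from paired observed and predicted values. The sketch below follows the definitions of MSE, RMSE, MAE, and \({R}^{2}=1-S{S}_{res}/S{S}_{tot}\) given above; the elevation values are invented for illustration and are not results from the study.

```python
import math


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented substrate elevations in metres.
observed = [430.0, 445.0, 452.0, 461.0]
predicted = [432.0, 441.0, 455.0, 460.0]
print(mse(observed, predicted))   # 7.5
print(mae(observed, predicted))   # 2.5
print(round(rmse(observed, predicted), 3), round(r_squared(observed, predicted), 3))
```

Here the RMSE (about 2.74 m) exceeds the MAE (2.5 m) because squaring gives the single 4 m error more weight.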

These metrics were calculated on both the validation and the independent test sets. Comparing performance across the two sets allowed model generalization and possible overfitting to be examined (Raschka 2018). A divergence between validation and test accuracy signified overfitting to the validation data, whereas techniques that maintained accuracy from validation to testing generalized well.

Results and Discussion

Machine Learning and Deep Learning Multivariate Model Performance

To comprehensively assess the accuracy of substrate prediction, Table 1 compiles validation and testing metrics across categorized machine learning approaches to quantify trade-offs in predictive accuracy. Root mean squared error (RMSE) constitutes the primary evaluation criterion. Additionally, other vital statistics like mean absolute error (MAE), model explanatory power (R-squared), computational configuration, and hyperparameters detail comparative capability. Together, the multiple comparative graphics (Figs. 6 and 7) and evaluation indices within Table 1 enable standardized scoring of predictive capacity. Lower errors and higher cluster density confirm model utility and generalizability. We present comprehensive results below to empower selection of the optimal approach based on application-specific accuracy thresholds while accounting for real-world uncertainty through independent testing.

Table 1 Comparison of model performance metrics (MSE, MAE, and R-squared) for validation and test phases across various machine learning and deep learning architectures (GPR, ANN, SVM, tree, ensemble, and linear regression), with details of preset architectures
Fig. 6

Comparison of Aquifer Subsurface Elevation Prediction Models. Subfigures (a.1 to f.1) depict scatterplot maps showing actual vs. predicted values for each ML model, while subfigures (a.2 to f.2) illustrate residual analysis maps for validation data of each ML model. The models are denoted as follows: a: LR (linear regression), b: tree (decision tree), c: SVM (support vector machine), d: GPR (Gaussian process regression), e: ANN (artificial neural network), and f: ensemble

Fig. 7

Correlation between actual and predicted subsurface values during the validation phase (a.1 to f.1) and test step (a.2 to f.2). The models represented are: a: LR (linear regression), b: tree (decision tree), c: SVM (support vector machine), d: GPR (Gaussian process regression), e: ANN (artificial neural network), and f: ensemble

The scatterplots presented in Fig. 6 compare the real borehole substrate elevations with the elevations predicted by the machine learning (ML) and deep learning (DL) models during the validation step. Specifically, subplots (Fig. 6a.1) to (Fig. 6f.1) show the performance of linear regression (LR), decision trees (tree), support vector machines (SVM), Gaussian process regression (GPR), artificial neural networks (ANN), and ensemble methods (ensemble), respectively. Each subfigure plots the borehole record number along the x-axis and the corresponding substrate elevation response along the y-axis. The vertical distance between the true and predicted elevation for each borehole quantifies the prediction error. Moreover, the residual error distributions are visualized in subplots (Fig. 6a.2) to (Fig. 6f.2) to further assess disparities between predictions and measurements. Observing residual patterns helps identify where model parameters can be refined to enhance accuracy. Together, the scatterplots and residual plots enable evaluation of precision and efficiency across machine learning and deep learning model configurations (Zhang et al. 2018). A smaller distance between true and predicted points signifies enhanced precision, while residuals distributed approximately symmetrically and evenly around zero signify a well-fitted model.

Figure 7 assesses correlation strength through coefficient analysis and plotting against the 1:1 ideal fit line. It examines the correlation between observed and predicted subsurface values for both the validation phase (subplots Fig. 7a.1 to Fig. 7f.1) and the testing phase (subplots Fig. 7a.2 to Fig. 7f.2). Comparing these visualizations shows whether complex patterns truly underlie subsurface responses or whether simpler assumptions suffice to explain the variance without overfitting noise. The graphical representations offer insight into the alignment of actual and predicted values, which is essential for evaluating the models' capability to capture nuances in subsurface elevation (Verma et al. 2024). By discerning these correlation patterns and trends, researchers can refine model architectures and parameter configurations to enhance predictive accuracy and reliability (Chou et al. 2011). Notably, a closer alignment of markers with the 45-degree correlation line indicates higher precision, while discrepancies in metrics or patterns between the validation and testing phases indicate potential instability.

Analyzing the Table 1 results, the linear regression (LR) methods offer simplicity, while more sophisticated tree ensembles and support vector machines (SVM) attempt to extract subtle signals from intricate response patterns. Gaussian process regression (GPR) occupies the middle ground, imposing smoothness assumptions while allowing slight nonlinear deviations. Specifically, the customized rational quadratic Gaussian process regression (Wang et al. 2021) achieves a validation RMSE of 64.37 m alongside an MAE of 49.77 m and an R-squared of 0.84 (Table 1, Model 8, and model 4.19). Critically, errors on unseen test data remain highly consistent at 61.15 m RMSE, 43.59 m MAE, and 0.89 R-squared (Figs. 6d and 7d). This demonstrates a generalization capacity unmatched by the other approaches. In contrast, the interactions linear regression (LR) fits the data well, per its 63.01 m validation RMSE, but does not carry over as well to unseen data, where the test error rises to 66.45 m (Table 1, Model 4.1). Capturing complex geology requires more than simplistic assumptions. Moving to ensemble approaches, the boosted trees model delivers validation accuracy near 80 m RMSE, bested by all models except the fine tree model. However, larger deviations emerge on the test data, with testing errors more than 20 m higher. The pairing of Subfigures 6a and 7a highlights this model deficiency through dispersed data clusters and residuals spiking over +120 m (Table 1, Model 4.14). Overly flexible deep neural networks display this overfitting behavior more severely, fitting noise in validation and then diverging on test data (Figs. 6e and 7e).
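The validation-to-test comparison for the two models quoted above can be made explicit with a short calculation. This is only an illustrative sketch: the RMSE values are those reported in the text, while the function and variable names are hypothetical.

```python
# Validation and test RMSE (m) as reported in the text for two models.
reported_rmse = {
    "GPR (rational quadratic)": (64.37, 61.15),
    "LR (interactions)": (63.01, 66.45),
}

def generalization_gap(val_rmse, test_rmse):
    """Return the absolute (m) and relative (%) change from validation to test."""
    abs_gap = test_rmse - val_rmse
    rel_gap = 100.0 * abs_gap / val_rmse
    return abs_gap, rel_gap

gaps = {name: generalization_gap(v, t) for name, (v, t) in reported_rmse.items()}
for name, (abs_gap, rel_gap) in gaps.items():
    print(f"{name}: {abs_gap:+.2f} m ({rel_gap:+.1f} %)")
```

With these figures, the GPR error decreases by about 3.2 m from validation to test, whereas the LR error increases by about 3.4 m.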

Among the model options, Gaussian process regression (GPR) stands apart as best balancing accuracy and consistency (Zhao et al. 2022). The smooth kernel functions estimate nonlinear trends while retaining generalizability. The combination of a low test MAE near 43 m (Fig. 6e.2) and a high test R-squared of 0.89 supports GPR as a reliable and accurate tool for subsurface insight.
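For reference, the rational quadratic kernel can be written in its standard textbook form as follows. The parameterization and hyperparameter values of the customized model are not stated here, so the symbols below are assumptions:

$$k\left(x,{x}^{\prime}\right)={\sigma }_{f}^{2}{\left(1+\frac{{\Vert x-{x}^{\prime}\Vert }^{2}}{2\alpha {\ell}^{2}}\right)}^{-\alpha }$$

where \({\sigma }_{f}\) is the signal standard deviation, \(\ell\) is the characteristic length scale, and \(\alpha >0\) controls the mixture of length scales. As \(\alpha \to \infty\), this kernel approaches the squared exponential kernel, so smaller values of \(\alpha\) allow variation over a wider range of spatial scales.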

The analysis of the MRMR, FTest, and RReliefF feature ranking algorithms (Table 2) revealed that elevation-based features, namely digital elevation model (DEM) data and piezometric levels from the year 2011, consistently ranked highest across the ranking algorithms, followed by permeability. This observation aligns with the strong physical relationship between elevation and subsurface properties. Permeability estimated from pumping tests also emerged as a key parameter, as it carries information about subsurface geology. These rankings identify the factors and relationships most important for accurate aquifer substrate mapping.
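To illustrate the idea behind a univariate F-test ranking, the following sketch scores each feature by the F-statistic of a simple linear regression of the target on that feature, \(F={r}^{2}/\left(1-{r}^{2}\right)\cdot \left(n-2\right)\), where \(r\) is the Pearson correlation. This is not necessarily the exact procedure used in the study, MRMR and RReliefF are not shown, and all data values and names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def f_statistic(x, y):
    """F-statistic of a one-predictor linear regression of y on x."""
    r2 = pearson_r(x, y) ** 2
    if r2 >= 1.0:
        return math.inf  # perfect linear relationship
    return r2 / (1.0 - r2) * (len(x) - 2)

# Hypothetical borehole records (illustration only).
substrate = [250.0, 300.0, 380.0, 430.0, 520.0, 610.0]
features = {
    "dem": [400.0, 460.0, 520.0, 600.0, 690.0, 760.0],
    "piezometric_level": [380.0, 430.0, 500.0, 560.0, 640.0, 735.0],
    "permeability": [3.0, 1.0, 4.0, 2.0, 5.0, 3.5],
}

scores = {name: f_statistic(values, substrate) for name, values in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # most predictive first
print(ranking)
```

A univariate score of this kind captures only linear, one-at-a-time relevance, which is one reason to compare it with methods such as MRMR and RReliefF that account for redundancy or local interactions.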

Table 2 Feature ranking results using MRMR, FTest, and RReliefF algorithms applied to the input model, with features represented in rows and ranking algorithms in columns

ML and DL Multivariate Substrate Model Visualization

The investigation of the AL Haouz aquifer system through machine learning (ML) and deep learning (DL) models, supported by visualization techniques, provides a comprehensive understanding of its complex subsurface architecture. Figures 8, 9, 10, and 11 offer crucial insights into the aquifer geometry. Figure 8 presents the predicted substrate surfaces as contour plots, peaks, and depression zones obtained from the various ML and DL models. These visualizations show the spatial distribution of the substrate surface and how well each ML and DL model represents the physical properties of the aquifer substrate. Figure 9 delineates the positions of four cross sections within the aquifer region, showing topography, piezometric level data from 2011, and aquifer substrate predictions generated by the GPR model. Projecting the test boreholes onto the same cross-section lines clarifies the relationship between topography, piezometric levels, and substrate predictions. Vertical substrate-piezometric-DEM profiles superposed on the test borehole logs further elucidate prediction uncertainty for the GPR model. Notably, the GPR model accurately captures the substrate elevation for the test boreholes, with a few instances of overestimation, such as at boreholes 3844/53, 3744/53 (Fig. 9a), and 1723/53 (Fig. 9d). This overprediction can be attributed to the overlap of confined and unconfined subsurface layers in the Mejjate part and at the western limits of the central AL Haouz-Mejjate aquifer, a characteristic geometry affirmed by Sinan (2000).

Fig. 8

Comparison of modeled substrate elevation using machine learning and deep learning models. Subfigures a to f represent GPR, ANN, SVM, LR, ensemble, and tree models, depicted as contours, peaks, and depressions

Fig. 9

Subfigures a to d depict cross sections illustrating DEM, piezometric, and simulated substrate data overlaid on reserved, tested real borehole data. UHF-1 and UHF-2 denote permeable and semipermeable materials, respectively, while UHF-3 represents impermeable materials associated with the substrate bedrock level. The color scale indicates the piezometric level

Fig. 10

a Aquifer thickness map. Subfigures b to g depict cross-section profiles from north to south and west to east, illustrating DEM and simulated substrate in multiple aquifer locations

Fig. 11

3D visualization illustrating aquifer thickness between substrate elevation from the GPR model and topographic DEM, from multiple angle views

Insights into the complex aquifer geometry in the AL Haouz-Mejjate region are provided in Fig. 10, with aquifer thickness (Fig. 10a) and six DEM-substrate profiles depicted (Fig. 10b-g). Profiles 1–4 are vertical transects running from north to south, while Profiles 5–6 are horizontal sections extending from east to west. These profiles reveal morphological disparities among the western, middle, and eastern regions of the AL Haouz aquifer area. Distinct boundaries, as described in earlier literature (Bernet and Prost 1975), are evident, such as between the western and central Haouz parts near the Nfiss stream, and between the central and eastern Haouz at the R’dat river (Fig. 10a, g). The set of south–north profiles in Fig. 10 (profiles 3–6) catalogs aquifer thickness variations and confirms a basin-scale synclinal subsurface architecture that structurally constrains modern flow patterns. The central and western zones exhibit significant sediment thickness, likely accumulated through fluvial channel migration and overbank flooding under paleoclimatic fluctuations. Meanwhile, minimal deposition and erosion in the northwestern area manifest as shale bedrock outcrops, as verified in our analysis by abrupt substrate elevation gains west of the Nfiss River (Profile 1 in Fig. 10b). Regarding variations in substrate depth, noticeable discrepancies exist across Profiles 1 to 4 (Fig. 10b to e). The shallow thickness below 10 m in the northwestern and northeastern regions indicates either significant erosion or limited deposition. Conversely, the southwestern and south-central regions contain more than 100 to 200 m of aquifer thickness, suggesting a significant volume of accessible potential reservoirs (Sinan and Razack 2009; El Goumi et al. 2010; Chouikri et al. 2016).

The delineated cross sections offer parallel representations of different geospatial characteristics and illustrate how well the ML model captures subsurface elevation. Vertical substrate-piezometric-DEM profiles further elucidate prediction uncertainty, encompassing a wide range of actual borehole substrate levels. The aquifer thickness map and multiple vertical transects highlight the clear differentiation of western, central, and eastern areas through rapid changes in substrate elevation. This confirms earlier findings based on geological descriptors and water table contours. The substrate surfaces within the western, central, and eastern zones display a largely parallel, uniform south–north progression, as expected for layered sedimentary units, affirming interpretations by Bernet and Prost (1975). However, substantial depressions and valleys characterize the subsurface across the full region, likely associated with buried ancient drainage networks and structural events.

Lastly, Fig. 11 offers a 3D perspective of the modeled aquifer system, highlighting how surface topography influences subsurface materials and groundwater levels. This validation of the consistency of the GPR model across the study area reinforces the utility of machine learning techniques in understanding complex hydrogeological systems.
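The thickness shown between the topographic DEM and the modeled substrate amounts to a cell-by-cell difference of two elevation grids. A minimal sketch follows, assuming both grids share the same extent and resolution. The grid values and names are hypothetical, and clipping negative differences to zero is an assumption of this sketch.

```python
def aquifer_thickness(dem, substrate):
    """Cell-wise thickness (m): surface elevation minus substrate elevation.

    Negative differences (predicted substrate above the surface) are clipped
    to zero, which is an assumption of this sketch.
    """
    return [
        [max(0.0, z_top - z_base) for z_top, z_base in zip(dem_row, sub_row)]
        for dem_row, sub_row in zip(dem, substrate)
    ]

# Hypothetical 2 x 2 elevation grids in meters (illustration only).
dem = [[520.0, 540.0], [610.0, 650.0]]
substrate = [[515.0, 420.0], [400.0, 660.0]]

thickness = aquifer_thickness(dem, substrate)

# Count cells thinner than 10 m, the shallow-thickness threshold quoted in the text.
thin_cells = sum(1 for row in thickness for t in row if t < 10.0)
print(thickness, thin_cells)
```

Cells where the clipping is triggered are worth inspecting separately, since they flag locations where the predicted substrate lies above the land surface.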

Comparative analysis: machine learning/deep learning vs. kriging vs. gravimetric maps

While subsurface architecture has been conceptualized through the preceding ML and DL investigations (Figs. 6–10), validating these map models requires comparison with previous studies and with conventional interpolation methods.

In this section, we conducted a comparative analysis between the GPR, ANN, conventional kriging (CK), and gravimetry substrate model maps. Figure 12a–d shows that the GPR map (Fig. 12a) and the ANN map (Fig. 12b) quantitatively and visually outperform the simple kriging interpolation (Fig. 12c). While the conventional kriging map displays smoothed uniformity, the GPR and neural network maps exhibit nuanced relief aligned with the gravity signals that indicate structural anomalies (El Goumi et al. 2010; Chouikri et al. 2016) (Fig. 12d). Quantitatively, the ML and DL models achieved lower root mean squared errors of 61 m and 54 m, respectively, versus 100 m for kriging (R2 = 0.63), highlighting the predictive gains. The capacity to ingest diverse datasets and represent nonlinear relationships allows efficient identification of concealed depositional and structural patterns.

Fig. 12

Comparative visualization of substrate elevation estimation techniques: a Gaussian process regression contour and relief maps; b neural network contour and relief maps; c simple kriging; and d gravimetry map for the aquifer area. Structural anomalies referenced from El Goumi et al. (2010), Rochdane et al. (2015), and Chouikri et al. (2016)

Figures 6, 7, 8, 9, and 10, presented in the previous subsection, show through vertical and lateral sections how data fusion exposes sharp structural compartment boundaries between western, central, and eastern areas, corroborating the concealed faults and asymmetric synclinal folding noted in earlier conceptual models (Rochdane et al. 2015; Mandour Abdennabi et al. 2016; Rochdane et al. 2018; Rochdane et al. 2022). The angular subsurface transitions and rapid elevation changes over short distances reflect the Cenozoic tectonic events that dictate modern differences in aquifer productivity. Capturing this geological legacy, which is unachievable through individual soundings, underscores the ability of machine learning to reveal complexity from scattered measurements. While interpolation relies on surface continuity assumptions (Li and Heap 2008), multivariate data-driven methodologies extract signals from indirect proxies to reconstruct detail beyond direct sampling. Revealing the interconnected conduit-barrier architecture by computationally harmonizing decades of field evidence provides the key inputs needed for the next groundwater model of AL Haouz-Mejjate.

Deep learning bivariate model

Factors most predictive

Our comparative assessment provides insights into the primary factors governing accurate mapping of aquifer substrate characteristics. Foremost among these is elevation: the digital elevation model (DEM) and piezometric levels consistently ranked as the most predictive individual features (Smith 2021). This finding aligns with the known dependence of subsurface geometries and aquifer properties on topographic relief and water table heights. By helping to explain variance in substrate observations, elevation provides critical predictive power. However, our findings also highlight the importance of collective nonlinear effects between geospatial variables in producing accurate maps (Luo et al. 2023). Models that could flexibly represent complex variable interactions and non-stationary relationships significantly outperformed traditional linear techniques. This suggests that while elevation offers the strongest individual predictive signal, precise aquifer mapping relies heavily on modeling intricate multivariate dependencies. Developing methods to effectively capture these nonlinear relationships while avoiding overfitting remains an open research need.

While the multivariate machine learning model incorporates diverse datasets for holistic substrate mapping, analyzing the specific dependency between surface and subsurface elevations could reveal additional hydrogeological insights. Numerous previous studies have shown strong correlations between topographic attributes and aquifer geometry, given their common response to geomorphological factors (Kumar and P 2022; Ruuska et al. 2023). The feature importance ranking results also highlighted digital elevation model (DEM) data as highly influential (see Table 2). To directly model the relationship between digital elevation model (DEM) input data and observed aquifer substrate elevations, a customized feedforward neural network architecture was developed in MATLAB. This comparatively shallow nonlinear topology leveraged 168 paired data instances to flexibly represent complex mapping functions between the variables (Koçak and Şiray 2021). The paired elevation data instances enabled supervised learning to produce substrate depths from surface inputs (Goodfellow et al. 2016).

Bivariate training model process

The network was trained using the Levenberg–Marquardt backpropagation algorithm, an efficient technique combining gradient descent and Gauss–Newton methods (Ampazis and Perantonis 2000; Yu and Wilamowski 2018). Mean squared error loss was optimized, with training allowed to run for up to 300 epochs. To improve generalization, the data were divided with a 70/15/15 ratio into training, validation, and testing sets using random sampling. The network inputs and outputs were min–max normalized to aid convergence. Various performance metrics were monitored throughout training, including the mean squared error (MSE) loss on the training, validation, and test partitions. The final model achieved good predictive accuracy with a test set R-squared of 0.85 against measured borehole data. The MATLAB environment enabled rapid prototyping and visualization for this bivariate neural network modeling case study.
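The random 70/15/15 division and the min–max normalization can be sketched as follows. This is an illustrative Python outline rather than the MATLAB workflow used in the study: the function names, the random seed, and the target range of 0 to 1 are assumptions.

```python
import random

def split_indices(n, ratios=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition range(n) into training, validation, and test indices."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )

def min_max_scale(values, vmin, vmax, lo=0.0, hi=1.0):
    """Linearly map values from [vmin, vmax] to [lo, hi]."""
    span = vmax - vmin
    return [lo + (hi - lo) * (v - vmin) / span for v in values]

def min_max_invert(scaled, vmin, vmax, lo=0.0, hi=1.0):
    """Map scaled values back to the original units."""
    return [vmin + (s - lo) * (vmax - vmin) / (hi - lo) for s in scaled]

# 168 paired DEM-substrate records, as stated in the text.
train_idx, val_idx, test_idx = split_indices(168)
print(len(train_idx), len(val_idx), len(test_idx))

# Hypothetical DEM values (m) scaled with limits taken from the same sample.
dem = [200.0, 500.0, 800.0]
dem_scaled = min_max_scale(dem, min(dem), max(dem))
print(dem_scaled)
```

With 168 records, this division yields 118 training, 25 validation, and 25 test samples. Predictions made in scaled units are mapped back to meters with `min_max_invert` before computing errors.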

The original multivariate model used data from many different sources to make a complete map of the substrate. This bivariate analysis, on the other hand, was meant to focus on the relationship between surface topography and subsurface levels. Therefore, the input layer was reduced to accept only the 30 m DEM raster. The output layer consisted solely of the measured borehole substrate elevations. This simplification to just two variable domains allowed directly quantifying and modeling their linkage, which was revealed to be critical in feature importance analyses. The hidden layer size was adjusted to maintain the complexity required to represent nonlinear interactions without overparameterization (Allen-Zhu et al. 2019). No additional geospatial data was provided in order to examine the DEM-substrate association in isolation. However, their physical correlation may integrate influences from other factors like lithology and drainage. Still, concentrating on the elevation pairing through a tailored neural network provided insights into this important connection complementary to the high-dimensional model. The architecture adaptations balance representation power and interpretability for unpacking this specific subsystem interaction critical to aquifer geometry. Specialized variable analyses will form a growing toolkit for granular process understanding.
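The reduced architecture, one DEM input, one hidden layer, and one substrate output, corresponds to a forward pass of the following form. This is a sketch under stated assumptions: the hidden layer size, the tanh hidden activation, the linear output, and all weight values are hypothetical, since they are not specified in the text.

```python
import math

def predict_substrate(dem_scaled, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-input, single-output, one-hidden-layer network.

    Hidden units use tanh and the output unit is linear (assumed here).
    """
    hidden = [math.tanh(w * dem_scaled + b) for w, b in zip(w_hidden, b_hidden)]
    return sum(v * h for v, h in zip(w_out, hidden)) + b_out

# Hypothetical weights for three hidden units (illustration only).
w_hidden = [1.5, -0.8, 2.0]
b_hidden = [0.0, 0.3, -1.0]
w_out = [0.6, -0.2, 0.4]
b_out = 0.1

# Input and output are in normalized units, not meters.
estimate = predict_substrate(0.5, w_hidden, b_hidden, w_out, b_out)
print(estimate)
```

Each additional hidden unit adds three parameters, so the hidden layer size directly controls the trade-off between flexibility and overparameterization discussed above.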

Bivariate model convergence

Tables 3 and 4 report the model training progress and the model training results, respectively. The neural network model was trained for 10 epochs. The training mean squared error (MSE) loss declined from an initial 1.76E+05 to 3.53E+03 by the stopping epoch 10 (Table 3), decreasing at each epoch and reaching its minimum at completion. The best validation MSE of 8883 occurred at epoch 4 in the performance plot (Fig. 13e). Thereafter, overfitting increased the validation and test MSE relative to the training MSE. Within 4 epochs, the model achieved the targeted validation checks and gradient. By epoch 4, R-squared reached 0.935 for training, 0.8467 for validation, and 0.8586 for testing. Analyzing convergence and performance enabled identifying the ideal trained model at epoch 4, before overfitting effects prevailed. The performance plot and metrics demonstrate efficient learning of the DEM-substrate relationship given the constraints.
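The selection of epoch 4 follows the usual early-stopping logic: training halts once the validation loss has failed to improve for a fixed number of consecutive checks, and the weights from the best epoch are retained. In the sketch below, only the best validation MSE of 8883 at epoch 4 comes from the text. The remaining loss values are hypothetical, and a patience of six checks is an assumption chosen because it is consistent with a best epoch of 4 and a stop at epoch 10.

```python
import math

def select_best_epoch(val_mse, patience=6):
    """Return (best_epoch, best_loss, stopped_epoch) under early stopping."""
    best_epoch, best_loss = 0, float("inf")
    fails = 0
    stopped_epoch = len(val_mse) - 1
    for epoch, loss in enumerate(val_mse):
        if loss < best_loss:
            best_epoch, best_loss, fails = epoch, loss, 0
        else:
            fails += 1
            if fails >= patience:
                stopped_epoch = epoch
                break
    return best_epoch, best_loss, stopped_epoch

# Hypothetical validation curve indexed from epoch 0; only 8883 at epoch 4 is reported.
val_mse = [150000.0, 60000.0, 20000.0, 11000.0, 8883.0,
           9100.0, 9400.0, 9600.0, 9900.0, 10300.0, 10800.0, 11200.0]

best_epoch, best_loss, stopped_epoch = select_best_epoch(val_mse)
best_rmse = math.sqrt(best_loss)  # convert MSE (m^2) to RMSE (m)
print(best_epoch, stopped_epoch, round(best_rmse, 1))
```

Taking the square root of the best validation MSE gives a validation RMSE of roughly 94 m, which places the loss values on the same scale as the RMSE figures quoted elsewhere.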

Table 3 Model training progress: initial, stopped, and target values for parameters of the bivariate ANN model (epoch, elapsed time, performance, gradient, Mu, validation checks)
Table 4 Model training results for the bivariate ANN model, displaying MSE, RMSE, and R-square values for training, validation, and test datasets
Fig. 13

Subfigures a to d display regression plots comparing the ANN bivariate predicted model with the actual substrate for the training, validation, test, and all sets. Subfigure e illustrates the model convergence plots, indicating the training and validation losses across epochs. Subfigure f presents histograms depicting the distribution of prediction errors for the training, validation, and test sets

Regression plots visualize the relationship between predicted substrate outputs and actual targets for the training, validation, test, and overall data (Fig. 13a-d). Strong linear correlations are evident, with minimal deviation from the 1:1 line across all partitions. The training data exhibit the tightest fit, given their direct use in optimizing model parameters. The validation and test sets overlay closely, with similar scatter around the ideal fit line. This limited overfitting is quantified by their comparable R-squared values of 0.85 for test and 0.84 for validation. The consistent alignment highlights the network’s capability to represent the elevation–substrate relationship on both the training data and new out-of-sample data. The overall data regression achieved a very high R-squared of 0.91, further demonstrating the excellent model fit. Together, the regressions indicate that the neural network successfully learned meaningful subsurface patterns linking surface elevations and aquifer substrate depths.

The error histogram (Fig. 13f) visualizes uncertainties by plotting substrate elevation prediction errors for training, validation, and test data. Tight distributions centered on zero confirm minimal overall bias. The similar spread and symmetry indicate consistent, approximately normal error distributions across partitions (Bishara and Hittner 2015), without fat tails. A large spike at ~2 m shows many highly accurate predictions. The overlap of the validation and test distributions indicates negligible overfitting. While some larger errors occur in the tails, they are few, with only ~3–5 errors beyond ±100 m. The tight clustering around zero substantiates proficient generalization beyond the training data (Brunton et al. 2021). Together with the regression and MSE metrics, the error analysis lends confidence that the network reliably models the intricate DEM-substrate relationships. Minor deviations are expected, but the predominance of small errors underscores precise modeling of the complex geospatial correlations governing aquifer geometries.
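The checks read from the error histogram (bias near zero, few errors beyond ±100 m, and concentration around zero) can also be computed directly from the prediction errors. The following is a minimal sketch with hypothetical error values and function names. The 100 m threshold follows the text, and the 50 m bin width is an assumption.

```python
import math

def error_summary(errors, threshold=100.0):
    """Mean error (bias) and the count and share of errors beyond +/- threshold."""
    n = len(errors)
    n_large = sum(1 for e in errors if abs(e) > threshold)
    return {"bias": sum(errors) / n, "n_large": n_large, "frac_large": n_large / n}

def histogram(errors, width=50.0):
    """Count errors per bin, keyed by the lower edge of each bin."""
    counts = {}
    for e in errors:
        lower_edge = math.floor(e / width) * width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical prediction errors in meters (illustration only).
errors = [-120.0, -35.0, -10.0, 2.0, 2.0, 5.0, 20.0, 36.0, 100.0]

summary = error_summary(errors)
bins = histogram(errors)
print(summary, bins)
```

Running the same summary separately on the training, validation, and test errors gives a numerical counterpart to the visual comparison of the three histograms.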

Nonlinear bivariate model function fit

By connecting DEM inputs to substrate depth outputs, the neural network model learned a complex, nonlinear fit surface (Fig. 14). With a test RMSE of 72 m and an R2 of 0.85, the accuracy is reflected in the close clustering of training, validation, and test points around the fit curve (Fig. 14a). In regions with elevations higher than 700 m, substrate depth rises by more than 200 m for every 100 m of DEM rise, indicating a nearly 1:1 correspondence. The slope is shallower between 500 and 700 m, at about 100 m of substrate for every 100 m of surface rise (Almalki and Angelides 2022). The surface flattens to less than 50 m per 100 m below 500 m as the aquifer is limited. Larger deviations of up to 100 m appear in the associated error plot (Fig. 14b) within the DEM range of 500–600 m, which contains the majority of measurements, suggesting modulating factors such as lithology and structural events. However, errors are tightly clustered near zero across the DEM ranges of 200–400 m and 700–800 m. This indicates the model's interpolation ability under suitable constraints. Combined with the 72 m test RMSE, it shows effective generalization.

Fig. 14

a Feature fit plots illustrating the learned mapping from DEM to substrate elevations. b Associated error plot depicting the function fit result from the ANN bivariate model

Even though the fit surface is complicated, the high R2 and many small errors show that the neural network is able to find important subsurface patterns from sparse elevation data. The model strikes a balance between regularization to prevent overfitting given the constraints and flexibility to represent real-world complexities. In addition to providing quantitative metrics to increase confidence, the visualization provides qualitative validation of learning.

Comparison of multivariate and bivariate model performances

The original multivariate models, which incorporate additional data sources, attained superior performance, with test RMSE values of 61.15 m for Gaussian process regression (GPR) and 54.1 m for the multivariate neural network (Table 1), whereas the bivariate neural network model mapped substrate from the DEM alone with a test RMSE of 72 m (Table 4). This shows the benefit of integrating all available data rather than concentrating on individual variable pairs. To improve prediction accuracy, the GPR model makes use of supplementary data from piezometric, geologic, and permeability measurements. However, the DEM-substrate relationship is more thoroughly explained by the specialized bivariate network. Additionally, the streamlined model could offer baseline elevation-based estimates in regions lacking ancillary data. Accuracy and interpretability are traded off in the multivariate and bivariate approaches. Combining them could lead to increased performance and transparency (Ismail et al. 2020), for example, by initializing the holistic GPR model with DEM-substrate fits. This emphasizes the value of multifaceted modeling for thorough hydrogeological understanding.
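Using the test RMSE values quoted above, the accuracy cost of the bivariate simplification can be expressed as a relative increase in error:

$$\frac{72-61.15}{61.15}\approx 0.18,\qquad \frac{72-54.1}{54.1}\approx 0.33$$

That is, the bivariate test RMSE is roughly 18% higher than that of the multivariate GPR model and roughly 33% higher than that of the multivariate neural network.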

This focused analysis of directly modeling subsurface elevations from surface topography alone is in line with the original goal of figuring out why the relationship between elevation and substrate is so important. Even though the multivariate model uses data from many different sources to map an aquifer as a whole, focusing on the DEM-substrate pairing made it possible to look at this connection in more detail. The customized neural network architecture, nonlinear fit plot, and specialized regression metrics all gave more information about this variable linkage than the high-dimensional model could (Pasupa and Sunhem 2016). The results impart new details on how surface elevations govern subsurface geometries across the watershed. And isolating this bivariate behavior facilitates transferring insights to regions where only minimal elevation data is available. By trading off some holistic accuracy for an in-depth look at a key physical correlation, this study exemplifies the value of targeted modeling approaches for unlocking specific hydrogeological knowledge. The findings will help refine future multivariate models and illuminate linkage intricacies essential for robust aquifer characterization.

Advantages of using deep learning and machine learning approach

This study demonstrates the benefits of specialized deep learning architectures for targeted predictive insights into the complex AL Haouz aquifer system. Bivariate neural networks isolate and examine the relationship between surface elevation and subsurface depth, while multivariate Gaussian processes integrate diverse regional datasets more broadly. The nonlinear subsurface fit plot revealed local intricacies by concentrating solely on the elevation–substrate connection. Simplified statistics facilitated detailed analysis, trading some generalizability for enhanced interpretability (Kratzert et al. 2018). However, Gaussian processes achieved superior overall mapping performance due to greater flexibility in representing uncertainty (Vasudevan et al. 2009).

As shown through feature importance ranking, focused deep networks can extract key associations between topography and burial depth that drive aquifer structure in the AL Haouz basin. Though partial, these targeted insights support and enhance multivariate models by separating influential signals from indirect proxies. This two-pronged approach of specialized and generalized learning combines accuracy with transparency for subsurface characterization (Naghibi and Pourghasemi 2015).

Continued advances in multi-objective deep learning (Vahdat-Aboueshagh et al. 2022) and physics-aware neural networks (Raissi et al. 2019; Roy et al. 2023) should expand possibilities for focused statistical hydrogeology in the AL Haouz region. Machine learning creates new observational windows by concentrating computational power on elucidating interactions between variables controlling this complex aquifer system (Shen 2018). As demonstrated here, strategic deep learning reveals intricacies obscured within immense regional datasets, supporting more holistic modeling of the vital AL Haouz water resource.

Limitations, model uncertainties, and perspectives

The study presents promising capabilities for aquifer substratum characterization through the application of machine learning and deep learning models. However, it also reveals several key perspectives and limitations that warrant further consideration. Primarily, the limited availability of data, especially borehole substrate measurements, introduces uncertainties into the models' training and evaluation processes. The sparse and uneven spatial coverage of these measurements, particularly in the eastern Haouz region, hampers the models' accuracy and generalization and limits the use of multivariate machine learning in AL Haouz. Since the bivariate deep learning model considers only the elevation–substrate pairing, it does not take into account how valuable additional hydrogeological data could be.

Addressing this limitation would require more extensive reconnaissance drilling of aquifer properties to improve model performance (Hubbard et al. 1999; Adombi et al. 2021; Rödiger et al. 2023). Additionally, the interference between single-layer and multilayer systems, particularly evident in the western Haouz, poses challenges to accurately predicting substratum heights in this region. Furthermore, the study highlights the potential benefits of incorporating supplementary data sources beyond core geospatial datasets like topography, piezometry, and permeability. Integration of hydraulic head time series, recharge estimates, geophysical surveys, and geological maps could provide added constraints for data-driven modeling and enhance prediction capabilities (Yeh et al. 2008; Abowarda et al. 2021). However, the scarcity of such ancillary measurements precluded their inclusion in the current study, suggesting avenues for future research and data collection efforts.

Moreover, while the explainability of machine learning models such as the Gaussian process regression (GPR) model provides valuable insights, the black-box nature of neural network models makes it difficult to understand their internal workings and to diagnose mispredictions. Adopting optimized interpretable algorithms could enhance the trustworthiness and diagnostic capabilities of the models (Razavi 2021; Tabasi et al. 2022; Alshehri and Rahman 2023). Additionally, advancements in subsurface measurement technologies and diverse data integration hold promise for unlocking the full potential of machine learning for robust aquifer geometry characterization.

Despite the significant potential demonstrated by machine learning and deep learning models in revealing subsurface structures, their overall applicability remains limited, particularly in the context of the AL Haouz region. To address these challenges, we recommend integrating enhanced field observations, incorporating domain knowledge, and combining specialized deep learning techniques with holistic machine learning approaches so that complex aquifer architectures can be resolved more robustly. This multifaceted approach could lead to more accurate and reliable predictions, facilitating better management and conservation of groundwater resources in semiarid regions like AL Haouz.

Conclusions

This study demonstrates the capabilities of data-driven modeling for high-resolution aquifer base mapping, integrating sparse borehole data with regional datasets to produce accurate three-dimensional substrate surfaces. The Gaussian process model generalized well: validation and testing errors were consistent at around 60 m RMSE, with an R² of 0.84. Visualizations and multiple vertical transects confirmed a clear differentiation of western, central, and eastern zones marked by abrupt substrate changes, supporting geological interpretations of the structural constraints that govern modern flow.
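For readers who want to reproduce the two summary statistics quoted above, the following is a minimal sketch of the standard definitions of RMSE and R², assuming paired observed and predicted substratum elevations in metres. The function names and the sample elevations are hypothetical and chosen only for illustration; they are not taken from the study's data or code.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (metres here)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical substratum elevations (m a.s.l.), NOT values from the study.
observed = [400.0, 450.0, 500.0, 550.0, 600.0]
predicted = [410.0, 440.0, 500.0, 560.0, 590.0]

print(f"RMSE = {rmse(observed, predicted):.2f} m")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

Computing both metrics separately on the validation and test subsets is what allows the consistency of the errors to be checked: similar values on both subsets suggest the model is not overfitting the training boreholes.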

Aquifer thickness maps revealed significant deposits in western and central areas, likely from paleoclimatic fluvial shifts. In the north, by contrast, thin strata and erosion were expressed as shale outcrops. The complex morphologies of valleys and depressions correspond to intricate patterns along subsurface profiles, highlighting how paleodrainage systems and structural events still influence modern reservoirs.

Comparisons showed that multivariate models incorporating diverse data sources were more accurate than specialized bivariate networks. However, concentrating on the key elevation–substrate relationship offered valuable interpretability and transferability. Isolating this linkage provided new details on how surface topography relates to subsurface geometry, yielding insights applicable even in data-scarce regions.

By generating high-fidelity 3D aquifer architecture maps without additional direct drilling, the demonstrated methodology can guide groundwater modeling and sustainable management amid development pressures. Expanding model fusion and physics-infused training offers further untapped potential. Overall, by unraveling the hidden intricacies of vital systems, this study shows the valuable role machine learning can play in characterizing aquifers to meet rising water challenges, not only in the AL Haouz-Mejjate region but also in similar areas worldwide.