Introduction

The use of electrical submersible pumps (ESPs) as a primary oil and gas recovery method has gained wide acceptance, with approximately 15–20% of the roughly one million wells worldwide using the technology (Breit and Ferrier 2011). Of the oilfield lift systems existing today, the ESP is regarded as one of the most versatile, handling a wide range of flow rates and suiting both vertical and deviated wells. ESPs are commonly called the champions of high volume and depth, delivering flow rates of over 45,000 bbl/day and operating at depths of up to 12,000 ft (Guo et al. 2007). They also present a small footprint, making them useful even in offshore installations. Their drawbacks, however, include low efficiency in handling high gas volumes (especially when the gas volume exceeds 10% at the pump intake) and low tolerance for solids-laden fluids. The anatomy of a typical ESP unit is shown in Fig. 1a. It essentially consists of surface equipment, namely the electric power supply, a transformer, a control board, and a control valve, while the downhole equipment comprises a centrifugal pump, an electric motor, a seal section, electric cable, and sensors (Bafghi and Vahedi 2018).

Fig. 1

a Components of a typical ESP unit, b Free body diagram of flow process in an ESP unit Source: Fontes et al. (2020)

The working principle of the ESP as described by Kimrays Inc (2023) is as follows: heavy-duty cables connected to surface controls supply power to the motor. The motor rotates the shaft connected to the pump. The rotating impellers draw reservoir fluid in through the pump intake, pressurize it, and transport it to the surface. Since the ESP comprises multiple centrifugal pump stages connected to a submersible electric motor, the fluid repeats this process in every stage of the pump (Takacs 2009). To optimize performance, a downhole sensor that communicates real-time system data such as pump intake and discharge pressures, temperatures, and vibration can be installed. The seal chamber isolates and protects the motor from corrosive wellbore fluids and equalizes the wellbore pressure with the oil pressure inside the motor. Figure 1b illustrates the free body diagram of the process behind an ESP-assisted oil production system. To lift fluid from an oil well, the ESP system adjusts two variables: the frequency (f) at which the pump rotates and the choke opening (zc). This imparts additional pressure to the reservoir fluid and drives it to the surface through the production tubing (Fontes et al. 2020). The hydrocarbon flow rate (qp), the flow type in the production column, and the pump head dynamics (H) stipulate the operational envelope constraints, namely the upthrust and downthrust constraints. This envelope is the region within which electrical submersible pumps are operated; it is depicted in Fig. 1b as the gap between the minimum and maximum rate lines. The pump head is the quotient of the pump's pressure increase and the product of fluid density and gravitational acceleration. As the volumetric rate through the pump rises, the head it develops drops (Hoffmann and Stanko 2017). The wellhead, bottomhole, and manifold pressures are represented by Pwh, Pbh, and Pm, respectively.

Numerous studies have been carried out on various aspects of electrical submersible pumps using artificial intelligence (AI) techniques. For ESP optimization studies, Mohammadzaheri et al. (2016) used a genetic algorithm, while Dachanuwattana et al. (2022) utilized a random forest algorithm and a neural network. For failure prediction of ESP systems, Okoro et al. (2021), Chen et al. (2022), and Brasil et al. (2023) utilized neural networks, while Ambade et al. (2021) utilized a random forest algorithm. For flow rate prediction in ESP-lifted wells, Mohammadzaheri et al. (2016, 2019), Azim (2020), Popa et al. (2022), and Sabaa et al. (2023) utilized neural networks; Hallo et al. (2017) used genetic programming; Ganat and Hrairi (2018) used nonlinear regression; Ricardo et al. (2018) used support vector machines; Franklin et al. (2022) used physics-informed neural networks; while Abdalla et al. (2023) used support vector regression, extreme gradient boosting trees, and convolutional neural networks for their model development. Other studies on ESPs include: power consumption of ESP units (Khakimyanov et al. 2016); mathematical modelling of ESP electric motors (Cortes et al. 2019); comparison of ESP electric motors (Bafghi and Vahedi 2018); validation of empirical models for head and surging characteristics of ESPs (Ali et al. 2022); and the development of an automatic control mechanism for ESP units (Krishnamoorthy et al. 2016). Beyond conventional artificial neural networks, recent studies on production rate forecasting and production enhancement have made extensive use of deep neural networks/deep learning approaches. Examples include: a data-driven proxy model for accurate production evaluation of shale gas wells with complex fracture networks (Chu et al. 2023); ultimate recovery estimation of fractured horizontal wells in tight oil reservoirs (Luo et al. 2022); prediction of production data (production rate, cumulative oil production, and temperature field) of steam-assisted gravity drainage in heavy oil reservoirs (Wang et al. 2021a); image recognition models (Wang et al. 2021b); optimization of thermochemical production in heavy oil reservoirs (Zhou and Wang 2022); gas emissions (Zhou et al. 2022); and slope stability estimation (Huang et al. 2023).

This study is focused on developing a robust model for predicting fluid flow rate in ESP-lifted wells. Existing models for ESP-operated wells were developed without the model being stated explicitly, without computational cost evaluations, and without sensitivity analyses of the model's input variables. All these factors limit their usage in real-time prediction of oil flow rate. To fill this gap, this work uses an artificial neural network to develop an explicit, computationally efficient, and technically reproducible model for predicting oil production rate in an ESP-operated well, with inputs that are easily obtainable at the surface.

Of the many machine learning algorithms, the neural network algorithm was chosen for the significant advantage it possesses over traditional nonlinear modelling tools, namely its ability to capture the complex nonlinear relationships that exist between a set of input variables and an output parameter without requiring the form of the model to be stated in advance (Livingstone et al. 1997). Other desirable features of neural networks include their fault tolerance (Torres-Huitzil and Girau 2017) and the availability of several training algorithms. Despite these advantages, the vast number of tuning parameters, coupled with the trial-and-error approach involved in finding the network's optimal architecture, leads to long training times, which is one of the drawbacks of this algorithm (Ng et al. 2020).

The attributes of the proposed model for which novelty is claimed are as follows. First, the computational cost of the models is presented, a feature missing in most ANN models on this subject. Second, the proposed models are presented explicitly (something rarely found in published ANN predictive modelling studies), which allows them to be easily deployed in software applications. Third, a sensitivity analysis of the input variables is provided. The combination of these features makes this study unique and constitutes its contribution to knowledge.

This work is divided into five sections. The second section contains a critical review of extant literature on the use of artificial intelligence in developing models for oil flow rate in artificially lifted wells. The third section describes the dataset used, the data analytics approaches applied to it, the modelling method, and the metrics used to evaluate the developed model. The fourth section discusses the results obtained and compares them with existing models, while the fifth section concludes the study by highlighting the significant findings and recommending directions for future work.

Review of related studies

Table 1 provides a summary of the models that have been developed to estimate flow rate in artificially lifted wells. The studies are arranged in chronological order from the earliest to the latest. The key elements of each study, such as the modelling strategy, fluid type, data size used to construct the models, model inputs, and research gaps, are highlighted in the summary. Since the literature on artificial lift systems is vast, papers focusing on modelling oil flow rate in artificially lifted wells were given higher priority for inclusion, in keeping with the aim of the study. Furthermore, to make the review more concrete, critiques in the form of research gaps are highlighted for each reviewed work. The following are the key findings from this summary:

  i.

    Input variable selection: The inputs used by various researchers for their model development were wide and varied, with each researcher choosing different input variables even when developing models for the same kind of artificial lift system. This is evidenced by the fact that while some researchers used a large number of input variables (as many as 28), in one case just a single input was utilized. This indicates no unanimity amongst researchers as to the parameters affecting fluid flow in an artificially lifted well and, by extension, may be one of the reasons it is difficult to have a universal model for oil flow rate in artificially lifted wells. Some of these input variables are parameters that are not easily obtainable at the surface. Furthermore, some researchers used GOR or GLR as an input parameter, although others, such as Khan et al. (2020), have argued that GOR/GLR reveals the output to the model and ultimately defeats the purpose of developing a model for oil flow rate.

  ii.

    Type of artificial lift system: The predominant artificial lift system considered by most researchers is the gas lift system. The most probable reason for this focus is the set of attractive characteristics enumerated by Garrouch et al. (2020): gas lift uses essentially no moving parts; it has proven adaptable across a wide range of well parameters and reservoir characteristics; and it has a relatively low operational cost, owing to the fact that it takes up very little space at the wellhead. This is particularly advantageous for offshore platforms, where space is at a premium and every square inch is expensive (Ranjan et al. 2015). Finally, gas lift is advantageous in producing high gas–liquid ratio wells, which tend to be problematic when pump-assisted methods such as electrical submersible pumps are used.

  iii.

    Parametric importance of input variables: It was observed that whereas Khan et al. (2020) reported that oil API gravity had a significant impact on oil rate predictions in artificially lifted wells, Elgibaly et al. (2021) posited that it contributed little to the prediction of oil flow rate in such wells. According to Ghareeb and Shedid (2007), factors that have a significant effect on production rate include producing depth, wellhead temperature, and tubing size, while bottomhole temperature, gas oil ratio, and water cut have a negligible effect.

  iv.

    Modelling technique adopted: The literature review indicates that numerous correlations have been developed to predict the flow rate in artificially lifted wells using machine learning techniques. About 95% of the models were developed using the ANN technique, mainly as multiple-input single-output models. Other techniques such as SVR, fuzzy logic, and hybrid intelligent models such as ANFIS were utilized, but such studies were few and far between. Among the ANN models, some were complex, while a few were relatively simple in nature.

  v.

    Data set size: According to Syed et al. (2022), any machine learning process for developing predictive models requires a large amount of data. The volume of training data impacts the performance of machine learning-based models in several ways. First, the expressive capability of neural networks, i.e. their ability to approximate functions (Lu et al. 2017), is significantly influenced by the amount of training data; more training data generally increase the expressive ability of the learnt patterns. Second, more data lead to better generalization: a larger dataset provides more varied instances, which helps the neural network learn a wider variety of patterns and variations in the data, so the network performs better on previously unseen data. Third, more data lead to less overfitting: a larger dataset makes it less probable that the network memorizes noise or outliers, which would cause it to become overly focused on the training data and perform badly on fresh input. As Ajiboye et al. (2015) put it, employing a substantial amount of data can improve the accuracy and generalizability of a predictive model. However, what exactly constitutes a big dataset remains undefined. With small datasets, techniques such as Bayesian regularization and the general regression neural network (GRNN), a probabilistic network, can be applied.

  vi.

    Generalizability of models: In predictive modelling using machine learning techniques, generalizability is the ability of a model to carry out its intended function on data from a fresh, independent, or unseen dataset (i.e. data never utilized during the development process), similar to the external validation of a model. If a model does not meet this criterion, it may have learnt biases from the methods used for data production and processing rather than the underlying link between the attributes and the desired output. This makes it more difficult to produce reliable results and to replicate results in practice. Most of the studies failed to evaluate the generalizability of the models they developed; rather, they relied solely on training and testing the models on the same dataset. This approach can result in overestimating model performance and lead to suboptimal performance on a new dataset (Abdalla et al. 2023).

Table 1 Summary of existing machine learning-based models for predicting flow rate in artificially lifted wells

Materials and methods

This section highlights the data sources and development methods required to create the model stated in the study's objectives. Also included are specifics about the modelling process and its parameters.

Data description

The data used for this study were sourced from a field located on the western side of the Gulf of Suez area, Egypt. The reservoir structure is an elongated horst block trending NW–SE with general tilting to the NE. The data were collected from seven ESP-operated wells producing from oil-bearing Miocene reefal vuggy carbonates and fractured Eocene formations with a permeability range of 10 to 100 md and a porosity range of 11 to 24%. The ESP pumps used are Schlumberger G-series models (GN4000, 50 Hz, 2917 RPM, 86 stages, nominal housing diameter of 5.13 inches, and minimum casing size of 6.625 inches) with an optimum operating range of 2667–4000 bpd, a shaft brake horsepower limit of 500 hp, and a housing burst pressure limit of 6000 psi. The dataset consists of 275 data points. The main parameters in the dataset include: upstream pressure (UP P), downstream pressure (Down P), ESP pump intake pressure (Pi), pump discharge pressure (Pd), intake temperature (Ti), discharge temperature (Tm), casing head pressure (CHP), gas oil ratio (GOR), choke size (1/64 inch), and the output parameter, oil flow rate (Q). Table 2 gives a statistical description of the input and output variables using measures such as the mean, standard deviation, and range.

Table 2 Statistical description of data used in the study

Exploratory data analysis

According to Abdalla et al. (2023), exploratory data analysis (EDA) utilizes a variety of tools, including visualizations (scatter and box plots, histograms, etc.) and dimensionality reduction techniques, to assist in finding relationships, concealed patterns, and anomalies. In this study, the box plot is used to establish the distribution of the data for each variable. Figure 2 shows the box and whisker plot for the normalized input and output variables.

Fig. 2

Box plot showing the data distribution of the normalized input and output variables

From Fig. 2, it is observed that almost all the variables except the pump intake temperature have dispersed data, as is evident from the lengths of the boxes: the longer the box, the more dispersed the data, whereas the shorter the box, the less dispersed the data. Furthermore, since the median is not in the middle of the box but closer to its bottom, and the whisker is shorter at the lower end of the box, the oil flow rate and discharge pressure distributions are positively skewed, while the pump intake pressure is negatively skewed.

Data normalization

The dataset utilized for the study needed to be normalized in order for the neural network to seamlessly learn the relationship between the input variables and the output variable, thereby improving the network's performance. Put differently, normalizing these variables eases the network's convergence. To reinforce the importance of data normalization, Khare and Shiva (2007) reported that if one input has a large value and another a small one, but both show comparable variance, the network is very likely to ignore the small input in favour of the larger one. The variables in this study originally have ranges that differ by several orders of magnitude. The mapminmax method was used to normalize each input and output variable so that they all ranged from − 1 to + 1. The formula shown in Eq. 1 was used to normalize each variable in the dataset, reducing the range of potential values from 0 to 6172.1 down to between − 1 and + 1.

$$Y=2\left(\frac{X-{X}_{{\text{min}}}}{{X}_{{\text{max}}}-{X}_{{\text{min}}}}\right)-1$$
(1)

where \({X}_{{\text{min}}}\) and \({X}_{{\text{max}}}\) are the minimum and maximum values of the variable, X is an observed value, and Y is the corresponding normalized value.

However, as soon as the training of the neural network is completed, since the value of the network output is normalized, there is need to de-normalize it in order to transform it into the actual value. Equation 2 is the formula used for the denormalization.

$$X=0.5\left(Y+1\right)\left({X}_{{\text{max}}}-{X}_{{\text{min}}}\right)+{X}_{{\text{min}}}$$
(2)

where \({X}_{{\text{min}}}\) and \({X}_{{\text{max}}}\) are the minimum and maximum values of the variable, X is the recovered actual value, and Y is the normalized value.
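As a minimal sketch (in Python; the study itself used MATLAB's mapminmax), Eqs. 1 and 2 can be implemented as follows, assuming the minimum and maximum are taken from the observed data:

```python
import numpy as np

def normalize(x, x_min, x_max):
    """Eq. 1: map raw values to the range [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def denormalize(y, x_min, x_max):
    """Eq. 2: map normalized values back to engineering units."""
    return 0.5 * (y + 1.0) * (x_max - x_min) + x_min

# Example with the flow rate range quoted in the text (0 to 6172.1)
q = np.array([0.0, 3086.05, 6172.1])
q_n = normalize(q, 0.0, 6172.1)            # -> [-1.0, 0.0, 1.0]
assert np.allclose(denormalize(q_n, 0.0, 6172.1), q)
```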

Overview of artificial neural networks

Neural networks are computational algorithms inspired by biological systems, particularly the brain, created to learn from data and use that information to predict the outcomes of complicated systems. The neuron is a neural network's fundamental building block; neurons are connected to form a network that can handle complex problems (Behnoud far and Hosseini, 2017). A neural network is made up of three layers: the input, hidden, and output layers. The neurons in the input layer represent the input parameters to the network, i.e. the set of values or features in a dataset needed to forecast the outcome. The neurons in the hidden layer are responsible for feature extraction. A neural network processes information as follows: first, each of the inputs (I1, I2, I3, I4) is assigned a connection weight (W1, W2, W3, W4). These weights are real numbers assigned to each input that define the importance of that input in predicting the output. Each input is multiplied by its connection weight, the weighted inputs are summed, and a bias term (b) is added to the summation. The essence of the bias is to increase or decrease the input that goes into the activation function. The summation is passed through a transfer or activation function, and the output is then computed and transferred to another neuron. The responsibility of the activation function is to introduce nonlinearity into the neural network model. The sigmoid transfer function and the linear activation function (purelin) are recommended for the hidden and output layers, respectively (Maier and Dandy 2000). All of this is illustrated in Fig. 3.

Fig. 3

Illustration of a neural network process
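The computation just described amounts to a few lines of code. The following Python sketch uses hypothetical values for the four inputs, weights, and bias of Fig. 3, with tanh standing in for the sigmoid-type activation:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Single-neuron forward pass: weighted sum plus bias, then activation."""
    z = np.dot(weights, inputs) + bias  # weighted sum of inputs + bias term b
    return np.tanh(z)                   # nonlinear activation function

I = np.array([0.5, -0.2, 0.8, 0.1])    # hypothetical inputs I1..I4
W = np.array([0.4, -1.1, 0.3, 0.9])    # hypothetical weights W1..W4
print(neuron(I, W, bias=0.05))
```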

Procedure for developing a predictive model using neural networks

The series of steps outlined in Fig. 4 shows the pathway for developing a predictive model using neural networks based on supervised learning. The dataset is first normalized and then divided into training, validation, and testing subsets before being fed into the network. The training dataset is the slice of the dataset the network learns from, while the validation dataset provides an objective assessment of the network during the tuning of the model's hyperparameters. The test dataset, on the other hand, is utilized after the model is completely trained, to evaluate how well the model fits the data. To establish the performance of the model, the mean square error (MSE) and the goodness of fit are the metrics employed. The network with an MSE closest to zero and a goodness of fit closest to one is chosen as the best-performing model.

Fig. 4

Flowchart for developing a predictive model using neural networks

Hyperparameter settings for model training, testing and validation

The settings used for the neural network model are presented in Table 3. By default, the MATLAB software partitions the data into three sets: the training dataset (70%), the validation dataset (15%), and the test dataset (15%). Training data are used to adjust the weights of the neurons; validation data are used to ensure that the network generalizes during the training stage; and testing data are used to evaluate the network after it has been developed. Figure 5 is a schematic diagram showing the splitting of the data into training, validation, and testing datasets.

Table 3 Hyperparameter settings for the neural network model
Fig. 5

Data splitting in the neural network

The stopping criteria are usually established by a preset error index, e.g. the mean square error (MSE), or by the number of epochs reaching 1000. For the neural network model, the lowest MSE was used. Furthermore, in order to prevent overfitting, the 'early stopping' strategy was adopted.
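A minimal Python sketch of the 70/15/15 partition and the early-stopping check follows; the split mirrors MATLAB's default random partition, and the patience value of 6 is an assumption (it matches MATLAB's default validation-failure limit):

```python
import numpy as np

def split_70_15_15(X, y, seed=0):
    """Random 70/15/15 split into training, validation, and test sets."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_tr, n_val = int(0.70 * len(X)), int(0.15 * len(X))
    parts = np.split(idx, [n_tr, n_tr + n_val])
    return [(X[p], y[p]) for p in parts]

def should_stop(val_mse_history, patience=6):
    """Early stopping: halt when the validation MSE has not improved
    for `patience` consecutive epochs (or at the 1000-epoch cap)."""
    best = int(np.argmin(val_mse_history))
    return len(val_mse_history) - 1 - best >= patience

# Toy usage: 275 samples with 9 inputs, as in the study's dataset
X, y = np.random.rand(275, 9), np.random.rand(275)
(train, val, test) = split_70_15_15(X, y)
```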

Protocol for finding the optimal neural network architecture

The learning and performance of a neural network are significantly impacted by the number of neurons in its hidden layer. The universal approximation theorem asserts that, for any input–output mapping function in supervised learning, there exists a multilayer perceptron with some number of hidden layer neurons that approximates it. Unfortunately, the theorem provides no guidance on how to find this number, so the trial-and-error method was used to discover it, as sketched below.
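A sketch of this search in Python follows. It uses scikit-learn's MLPRegressor as a stand-in (scikit-learn does not provide the Levenberg–Marquardt trainer used in this study, so the L-BFGS solver substitutes for it); the 1–20 neuron range and 25 restarts per topology are those reported later in the paper:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

def best_hidden_size(X_tr, y_tr, X_te, y_te, max_neurons=20, trials=25):
    """Trial-and-error search over hidden layer sizes 1..max_neurons,
    with `trials` random restarts each; keeps the lowest test MSE."""
    results = {}
    for n in range(1, max_neurons + 1):
        best = np.inf
        for seed in range(trials):  # restarts guard against poor initial weights
            net = MLPRegressor(hidden_layer_sizes=(n,), activation="tanh",
                               solver="lbfgs", max_iter=1000, random_state=seed)
            net.fit(X_tr, y_tr)
            best = min(best, mean_squared_error(y_te, net.predict(X_te)))
        results[n] = best
    return min(results, key=results.get), results
```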

Model performance assessment methods

The task of evaluating the performance of a model is fundamental to predictive modelling. Many error metrics exist in the literature; however, since there is not yet any consensus on the most appropriate metric for evaluating model errors, this work uses three of them: the coefficient of determination (R2), the mean square error (MSE), and the root mean square error (RMSE). The choice of these three metrics is based on the following. First, a recent study by Chicco et al. (2021) suggested that the coefficient of determination (R squared) should serve as a standard metric for regression models, since it is more informative and does not have the interpretability issues of other metrics. The R2 value shows the discrepancy between the actual and predicted data and indicates how close the points are to the bisector in the scatter plot of the two variables. Second, the RMSE is a statistical measure of the variance of predicted values around actual values. The RMSE is said to be a good statistical criterion for model evaluation because the unit of the error is the same as the unit of the measured value, making it more interpretable than the MSE (Jierula et al. 2021). Furthermore, the RMSE is optimal for normal (Gaussian) errors (Hodson 2022). The distinction between R-squared and RMSE is that the former is a relative measure of fit while the latter is an absolute measure of fit. The RMSE is a good indicator of how effectively the model predicts the output parameter; if the primary goal of the model is prediction, this fit criterion is crucial. The MSE is the average squared difference between actual and predicted values (Adamowski et al. 2012). A perfect model is described by an R2 of 1 and an RMSE and MSE of 0. The mathematical descriptions of the three statistical metrics are shown in Eqs. 3, 4 and 5.

$${R}^{2}=1-\frac{\sum_{i=1}^{N}{\left({y}_{{\text{actual}}}-{y}_{{\text{predicted}}}\right)}^{2}}{\sum_{i=1}^{N}{\left({y}_{actual}-\overline{y }\right)}^{2}}$$
(3)
$${\text{MSE}}=\frac{1}{N}\sum_{i=1}^{N}{\left({y}_{{\text{actual}}}-{y}_{{\text{predicted}}}\right)}^{2}$$
(4)
$${\text{RMSE}}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({y}_{{\text{actual}}}-{y}_{{\text{predicted}}}\right)}^{2}}$$
(5)

For Eq. 3, Eq. 4 and Eq. 5, N = number of data samples, \({y}_{{\text{actual}}}\) = the actual or experimental values, \({y}_{{\text{predicted}}}\) = values predicted by the developed model, \(\overline{y }\) = average of the actual or experimental values.
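Equations 3–5 translate directly into code; a minimal Python sketch:

```python
import numpy as np

def mse(y_actual, y_predicted):
    """Eq. 4: mean square error."""
    return np.mean((y_actual - y_predicted) ** 2)

def rmse(y_actual, y_predicted):
    """Eq. 5: root mean square error (same units as the measured value)."""
    return np.sqrt(mse(y_actual, y_predicted))

def r2(y_actual, y_predicted):
    """Eq. 3: coefficient of determination."""
    ss_res = np.sum((y_actual - y_predicted) ** 2)
    ss_tot = np.sum((y_actual - np.mean(y_actual)) ** 2)
    return 1.0 - ss_res / ss_tot
```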

Results and discussion

This section presents the results obtained in the course of developing neural network models for predicting oil flow rate in an ESP-operated well. First, the result of the method used for selecting the model architecture is presented, followed by the performance of the proposed model and its mathematical representation. The parametric importance of the input variables, the model's computational burden analysis, and the field deployment of the model are then highlighted.

Optimal neural network architecture determination

The number of neurons in the network's hidden layer influences the generalization ability of the ANN model. In order to find the appropriate architecture for the network, the trial-and-error approach was adopted: numerous network topologies were evaluated, with the number of neurons in the hidden layer varied between 1 and 20. The error function chosen was the MSE, and the decision on the optimum topology was based on the minimum error on the testing data. Each topology was tried 25 times in order to get the best network from that topology. At the end of the trials, it was observed that a network having 10 neurons in the hidden layer gave the optimal performance for oil flow rate prediction, with a test MSE of 2.37E-05. Figure 6 illustrates the MSE obtained with different numbers of hidden neurons for the oil flow rate data; it highlights the magnitude of the errors obtained for each number of neurons tried in the hidden layer.

Fig. 6

MSE of model by varying the number of hidden layer neurons

The optimal architecture of the ANN for the model is shown in Fig. 7. This figure aids in the interpretability of the network's optimal topology and helps both experts and non-experts in machine learning visualize how the different layers of the neural network are interconnected.

Fig. 7

Optimal architecture of the neural network for oil flow rate estimation

Figure 8 presents the scatter plots of the proposed model for the training, validation, and testing datasets. The model predictions are in agreement with the experimental values for the training, testing, and validation sets, as seen in their correlation coefficients (R): the closer the correlation coefficient is to 1, the better the model. However, using only the correlation coefficient as the basis for determining the predictive capability of a neural network model is not always recommended, because a model may have a good correlation coefficient yet predict poorly when subjected to new datasets that were not used during the ANN training process.

Fig. 8

Scatter plots of ANN model for oil rate prediction

Furthermore, the blue, green, red, and black solid lines inclined at 45° in the regression plots in Fig. 8 are the best-fit linear regression lines between the model outputs and the model targets. These plots show whether the data points have good or poor fits. In this instance, the training, testing, and validation data all have good fits, since the data points all lie on the 45° line.

Performance evaluation of developed neural network model for oil flow rate

The network's performance with regard to its training, validation, and testing capability is presented here. Given that the main objective of a trained neural network is prediction, the criterion for testing predictive capability should be the model's performance on the test dataset. Every neural network in this analysis was trained for up to 1000 iterations using the Levenberg–Marquardt algorithm. The performance of the models, evaluated using the three statistical error metrics, is summarized in Table 4. From the table, the coefficient of determination for the developed ANN model was 0.999965 on the test data, indicating that the model explains well over 99.9% of the variance in the test data. According to the categorization of R-values by Taylor (1990), a weak or low correlation has R ≤ 0.35, a moderate correlation ranges over 0.36 ≤ R ≤ 0.67, and a high correlation ranges over 0.68 < R < 1.0. In like manner, Ozili (2023) opined that if the R2 value is more than 0.5, a strong correlation exists between the actual/experimental and predicted values. In addition, according to Alel et al. (2018), R2 values between 0.7 and 0.9 can be interpreted as high positive correlation, while values between 0.0 and 0.3 can be interpreted as negligible correlation. The R2 values obtained for the proposed ANN models in this study therefore indicate a strong correlation between the ANN-predicted values and the field data. With respect to error, small MSE and RMSE values indicate that the model is close to the line of best fit; for a perfect model, the MSE and RMSE would be 0. For the oil flow rate data used in developing the ANN model, reasonable values were obtained for the MSE and RMSE.

Table 4 Summary of ANN model performance for oil flow rate prediction in an ESP operated well

Figure 9 shows the plots of the field data against the predictions from the developed ANN model. This figure enables one to visualize how well the developed model matches the field data. From the figure, it can be seen that the model captures the nonlinearities associated with the prediction of oil flow rate.

Fig. 9

Comparison of field data and neural network model predictions for oil flow rate

Neural network model weights and biases

The weights and biases of a neural network are the most important parts of its development and functionality. The essence of including these crucial details of the developed models is to ensure that the models are reproducible (i.e. that a model can be recreated without the original code). Legendi et al. (2013) posited that model reproduction is scarcely carried out, since successful reproductions do not seem to deliver new scientific results and the reasons for failed reproductions may be hard to discern. Table 5 lists the weights and biases of the developed empirical correlation that can be used to predict oil flow rate in ESP-operated wells. These terms are described as follows:

Table 5 Weights and biases for ANN Model for oil flow rate

The neural network weights: A neural network is initialized with a set of weights at the start of training. The initial weight values are essentially random numbers generated by the network and used as coefficients that multiply the input parameter values. A neural network weight is similar to the slope in a linear regression model, where each weight is multiplied by its input and the products are added up to produce the output. A weight essentially dictates how much influence an input will have on the output. For example, a negative weight indicates that an increase in that input leads to a decrease in the output and vice versa, while a weight near zero implies that increasing or decreasing the input would barely change the output. During training, the weights are adjusted until the optimum weights are obtained. In this work, Wij (or IW) and W2 (or LW) are the nomenclatures used to represent the input-to-hidden layer weights and the hidden-to-output layer weights, respectively.

Threshold or bias: In the context of neural networks, threshold and bias are the same. The bias plays the role that the intercept plays in a linear equation: it is an additional parameter used to adjust the output along with the weighted sum of the inputs to the neuron. The unique thing about the bias is that it does not interact with the actual input data. The biases from the input layer to the hidden layer and from the hidden layer to the output layer are represented as b1 and b2, respectively.

Mathematical representation of developed models

The model generated by applying the Levenberg–Marquardt algorithm is given in Eq. 6.

$$\mathrm{Oil\ flow\ rate\ }({\text{Q}})={\text{purelin}}\left[\sum_{{\text{j}}=1}^{10}{{\text{LW}}}_{{\text{j}},1}\,{\text{tansig}}\left(\sum_{{\text{i}}=1}^{9}{{\text{X}}}_{{\text{i}}}\,{{\text{IW}}}_{{\text{i}},{\text{j}}}+{{\text{b}}}_{1,{\text{j}}}\right)+{{\text{b}}}_{2}\right]$$
(6)

Equation 6 is the MATLAB representation of the trained feed-forward neural network model correlating the nine input parameters with the oil flow rate. Here, 'purelin' and 'tansig' are MATLAB functions used in computing the network's output from its input: tansig is equivalent to the hyperbolic tangent (tanh), while purelin is a linear (identity) mapping. The weights from the inputs to the hidden layer and from the hidden layer to the output layer are represented by IW and LW, respectively.
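A minimal Python sketch of Eq. 6 follows, assuming the weights and biases have been read from Table 5 into arrays of the stated shapes:

```python
import numpy as np

def esp_ann_forward(x_norm, IW, b1, LW, b2):
    """Forward pass of the [9-10-1] network in Eq. 6.

    x_norm : (9,)    normalized inputs (Eq. 1)
    IW     : (10, 9) input-to-hidden weights
    b1     : (10,)   hidden-layer biases
    LW     : (10,)   hidden-to-output weights
    b2     : float   output bias
    """
    hidden = np.tanh(IW @ x_norm + b1)  # tansig is equivalent to tanh
    return LW @ hidden + b2             # purelin is the identity

# The normalized output is then mapped back to bbl/day with Eq. 2.
```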

Explicit representation of the developed ANN Model

Presenting the model explicitly enhances its replicability (the ability to obtain the same results using the same data and code). Reproducibility and replicability of machine learning models have been neglected in most studies applying machine learning (Miłkowski et al. 2018). Furthermore, Stroebe and Strack (2014) reported that the majority of published machine learning studies lack any source code and rarely document how precisely the model was created. To present the neural network model explicitly, its weights and biases are given in Table 5. With these weights and biases, the model can be replicated and its results reproduced. Equation 7 is the explicit representation of the neural network model for oil flow rate prediction. This explicit nature makes the model easy to deploy in software and also makes it explainable.

$$Q=2953{Q}_{n}+3219.1$$
(7)

where \({Q}_{n}\) is the normalized value of the oil flow rate predicted by the network and is given by

$${Q}_{n}=\sum_{i=1}^{10}{A}_{i}+ 1.20717$$

where:

$$A_{1} = -0.0622\tanh\left[-1.593{\text{UP}}_{p} - 1.138{\text{Down}}_{p} + 1.879P_{i} + 2.374P_{d} + 0.4639T_{i} + 0.12626T_{m} + 0.6418{\text{CHP}} + 0.000641{\text{GOR}} - 0.3256{\text{Choke}} + 1.556\right]$$
$$A_{2} = -0.07763\tanh\left[1.319{\text{UP}}_{p} + 0.6377{\text{Down}}_{p} + 1.497P_{i} - 0.31888P_{d} - 0.80358T_{i} - 0.23196T_{m} + 0.4871{\text{CHP}} - 1.626{\text{GOR}} + 1.597{\text{Choke}} + 0.456572\right]$$
$$A_{3} = 0.597564\tanh\left[-1.3267{\text{UP}}_{p} + 1.355{\text{Down}}_{p} + 0.797P_{i} + 1.427P_{d} - 0.677T_{i} + 0.44158T_{m} + 1.0209{\text{CHP}} - 0.9327{\text{GOR}} + 0.999{\text{Choke}} + 1.7075\right]$$
$$A_{4} = 0.01796\tanh\left[0.973{\text{UP}}_{p} - 1.25145{\text{Down}}_{p} - 0.5788P_{i} + 1.7811P_{d} + 1.0255T_{i} - 0.42848T_{m} + 1.053{\text{CHP}} - 0.848{\text{GOR}} - 1.92856{\text{Choke}} - 0.37282\right]$$
$$A_{5} = 3.9449\tanh\left[0.2081{\text{UP}}_{p} - 0.00359{\text{Down}}_{p} - 0.00845P_{i} - 0.01064P_{d} + 0.0381T_{i} + 0.01367T_{m} + 0.00022{\text{CHP}} - 0.0567{\text{GOR}} + 0.5825{\text{Choke}} - 1.0109\right]$$
$$A_{6} = 0.0186\tanh\left[0.502{\text{UP}}_{p} + 2.557{\text{Down}}_{p} + 0.877P_{i} - 2.281P_{d} + 0.1887T_{i} - 1.14353T_{m} - 0.18639{\text{CHP}} - 0.3393{\text{GOR}} + 0.150647{\text{Choke}} + 1.1497\right]$$
$$A_{7} = -0.07636\tanh\left[-1.05069{\text{UP}}_{p} + 0.9855{\text{Down}}_{p} - 2.11743P_{i} - 0.05981P_{d} - 1.32395T_{i} - 1.40679T_{m} + 0.109623{\text{CHP}} - 1.3732{\text{GOR}} - 0.04751{\text{Choke}} - 1.32631\right]$$
$$A_{8} = -0.04353\tanh\left[1.11{\text{UP}}_{p} + 0.41{\text{Down}}_{p} - 1.89P_{i} - 0.10571P_{d} + 0.4665T_{i} - 0.00994T_{m} + 0.4396{\text{CHP}} - 0.0204{\text{GOR}} - 0.86887{\text{Choke}} + 1.272\right]$$
$$A_{9} = -0.90049\tanh\left[-0.49128{\text{UP}}_{p} - 0.01563{\text{Down}}_{p} + 0.030539P_{i} - 0.07407P_{d} + 0.23T_{i} + 0.0999T_{m} - 0.01282{\text{CHP}} + 0.186064{\text{GOR}} + 0.954{\text{Choke}} - 1.42641\right]$$
$$A_{10} = -0.03159\tanh\left[1.409764{\text{UP}}_{p} - 0.67655{\text{Down}}_{p} + 1.4049P_{i} + 0.053311P_{d} - 0.94784T_{i} + 1.011T_{m} - 0.04299{\text{CHP}} - 0.878{\text{GOR}} + 0.835656{\text{Choke}} - 1.6201\right]$$

where tanh = hyperbolic tangent, \({UP}_{p}\) = upstream pressure, \({Down}_{P}\) = downstream pressure, \({P}_{i}\) = pump intake pressure, \({P}_{d}\) = pump discharge pressure, \({T}_{i}\) = intake temperature, \({T}_{m}\) = discharge temperature, CHP = casing head pressure, and GOR = gas oil ratio. It should be noted that all input variables in the terms A1 to A10 are expressed as normalized values.
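To illustrate how each term translates into code, the following Python snippet evaluates the first hidden-neuron term A1 (coefficients copied from the expression above) for a vector of normalized inputs; the remaining terms follow the same pattern, and Eq. 7 then denormalizes the sum:

```python
import numpy as np

# Coefficients of A1, ordered [UP_p, Down_p, Pi, Pd, Ti, Tm, CHP, GOR, Choke]
W_A1 = np.array([-1.593, -1.138, 1.879, 2.374, 0.4639,
                 0.12626, 0.6418, 0.000641, -0.3256])

def A1(x_norm):
    """First hidden-neuron term of the explicit model (normalized inputs)."""
    return -0.0622 * np.tanh(W_A1 @ x_norm + 1.556)

# Q = 2953 * (A1(x) + ... + A10(x) + 1.20717) + 3219.1   (Eq. 7)
```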

Relative importance of input parameters in the developed neural network model

A viable means of gaining insight into a model is to assess its behaviour as one or more of its parameters are varied; this gives an idea of the importance of each variable to the model's output. The contribution of each input parameter to the prediction of the output defines the relative importance of that variable. Many methods exist in the literature for calculating the relative importance of input variables, for example Garson's algorithm, the connection weights algorithm, partial derivatives, sensitivity plots, forward and backward stepwise addition, and input perturbation (Olden et al. 2004). Garson's algorithm (Garson 1991) was chosen for this work. Using this algorithm, the relative importance of a given input variable is defined as shown in Eq. 8.

$${{\text{RI}}}_{{\text{x}}}=\sum_{{\text{y}}=1}^{{\text{m}}}{{\text{w}}}_{{\text{xy}}}{{\text{w}}}_{{\text{yz}}}$$
(8)

where RIx is the relative importance of input parameter x, \(\sum_{y=1}^{m}{w}_{xy}{w}_{yz}\) is the sum of the products of the connection weights from input x to hidden neuron y and from hidden neuron y to output neuron z, and m is the total number of hidden neurons.

Procedure for determining relative importance of input variables for ANN model using Garson’s algorithm

(a) Extract the absolute values of the connection weights (input layer weights and hidden layer weights) from the weights and biases in Table 5. The result is shown in Table 6.

Table 6 Final connection weights for the inputs of the neural network model

(b) Multiply the input-hidden layer weight vectors by the hidden-output layer weight vector. This yields the values shown in Table 7.

Table 7 Product of input-hidden layer weights and hidden-output layer weights

(c) The rows of Table 7 are summed, producing the values in column 11 of Table 7.

(d) Each value in columns 2–10 of Table 7 is divided by the corresponding row total in column 11, producing Table 8. Thereafter, the absolute values in each of columns 2 to 10 are summed; the relative importance of the parameters is based on these sums.

Table 8 Quotient of input-hidden layer weights and hidden-output layer weights

The relative importance in percentage terms is calculated by dividing each column sum of Table 8 by the grand total of all column sums and multiplying by 100. The result is shown in Fig. 10.
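Steps (a)–(d) reduce to a few array operations. A minimal Python sketch of Garson's algorithm, assuming the weight matrices are arranged as in the earlier forward-pass sketch:

```python
import numpy as np

def garson_importance(IW, LW):
    """Garson's algorithm (steps a-d above).

    IW : (hidden, inputs) input-to-hidden weights
    LW : (hidden,)        hidden-to-output weights
    Returns the relative importance of each input in percent.
    """
    contrib = np.abs(IW) * np.abs(LW)[:, None]      # steps (a)-(b)
    contrib /= contrib.sum(axis=1, keepdims=True)   # steps (c)-(d): divide by row totals
    importance = contrib.sum(axis=0)                # column sums across hidden neurons
    return 100.0 * importance / importance.sum()    # percentages, as in Fig. 10

# Toy usage with random weights for a [9-10-1] network
rng = np.random.default_rng(1)
print(garson_importance(rng.normal(size=(10, 9)), rng.normal(size=10)))
```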

Fig. 10

Relative Importance of parameters used in developing the neural network model

As shown in Fig. 10, choke size and upstream pressure have the highest effect on oil flow rate estimation in ESP operated wells, while casing head pressure has the least effect on it.

Evaluation of the effects of dimensional reduction on the predictive ability of the neural network

Dimensionality reduction is a crucial stage in extracting relevant information from a high-dimensional dataset. The objective is to reduce the dataset's dimensionality to its intrinsic dimensionality, i.e. the fewest parameters necessary to model the data without sacrificing any information relevant to the application. In this instance, guided by the parametric importance analysis carried out in this study (which shows each input variable's usefulness), the dimensional reduction was implemented by excising the input variable with the least contribution to the prediction of oil flow rate, in this case the casing head pressure. Thus, the casing head pressure was dropped from the network and the effect of its removal was assessed by developing a new network without it. After reducing the dimension of the inputs, the new model with 8 inputs had an optimal architecture of [8-8-1]. The explicit representation of the new model is shown in Eq. 9:

$$Q=2953{Q}_{n}+3219.1$$
(9)

where \({Q}_{n}\) is the normalized value of the oil flow rate predicted by the network and is given by

$${Q}_{n}={\sum }_{i=1}^{8}{B}_{i}+0.890054$$

where:

$$B_{1} = 0.261071\tanh\left[0.073513{\text{UP}}_{p} + 0.851334{\text{Down}}_{p} + 0.822379P_{i} + 0.5587P_{d} - 0.945T_{i} + 0.571T_{m} + 0.0058{\text{GOR}} - 0.8906{\text{Choke}} + 1.6921\right]$$
$$B_{2} = 1.21239\tanh\left[0.4707{\text{UP}}_{p} + 0.0154{\text{Down}}_{p} + 0.294P_{i} - 0.0217P_{d} - 0.7067T_{i} + 0.1315T_{m} + 0.09{\text{GOR}} + 0.15287{\text{Choke}} - 0.42546\right]$$
$$B_{3} = -1.70762\tanh\left[0.616{\text{UP}}_{p} - 0.01565{\text{Down}}_{p} + 0.001513P_{i} - 0.02088P_{d} + 0.10916T_{i} - 0.02843T_{m} - 0.03752{\text{GOR}} - 0.20343{\text{Choke}} - 0.42564\right]$$
$$B_{4} = 0.99992\tanh\left[0.5612{\text{UP}}_{p} - 0.0493{\text{Down}}_{p} - 0.33P_{i} - 0.00645P_{d} + 0.8464T_{i} - 0.2053T_{m} - 0.2555{\text{GOR}} + 0.3404{\text{Choke}} + 0.160741\right]$$
$$B_{5} = 0.130195\tanh\left[1.1939{\text{UP}}_{p} - 0.393{\text{Down}}_{p} - 1.308P_{i} - 0.892P_{d} + 1.024T_{i} - 1.6239T_{m} + 0.48{\text{GOR}} + 1.383{\text{Choke}} - 1.58115\right]$$
$$B_{6} = 0.06857\tanh\left[1.228{\text{UP}}_{p} - 2.319{\text{Down}}_{p} + 1.17P_{i} - 0.901P_{d} + 0.7468T_{i} + 1.565T_{m} - 1.786{\text{GOR}} - 0.04742{\text{Choke}} + 1.685515\right]$$
$$B_{7} = -0.50875\tanh\left[0.3{\text{UP}}_{p} - 0.00298{\text{Down}}_{p} + 0.0326P_{i} - 0.07555P_{d} - 1.1065T_{i} + 0.35265T_{m} - 0.599{\text{GOR}} + 1.1098{\text{Choke}} + 0.866529\right]$$
$$B_{8} = 1.61344\tanh\left[0.7258{\text{UP}}_{p} + 0.004508{\text{Down}}_{p} - 0.05869P_{i} + 0.006947P_{d} + 0.260616T_{i} + 0.024664T_{m} - 0.21825{\text{GOR}} + 0.78666{\text{Choke}} - 1.65034\right]$$

where tanh = hyperbolic tangent, \({UP}_{p}\) = upstream pressure, \({Down}_{P}\) = downstream pressure, \({P}_{i}\) = pump intake pressure, \({P}_{d}\) = pump discharge pressure, \({T}_{i}\) = intake temperature, \({T}_{m}\) = discharge temperature, and GOR = gas oil ratio; the casing head pressure (CHP) no longer appears, having been excised. It should be noted that all input variables in the terms B1 to B8 are expressed as normalized values.

The performance of the new model with 8 inputs is summarized in Table 9. From the table, it is observed that the new model (i.e. excluding the casing head pressure) performs essentially as well as the original nine-input model (Table 4), judging from the error metrics. This implies that the casing head pressure has minimal effect on the prediction of oil flow rate in ESP-operated wells; a sketch of the reduction step is given below. It is crucial to emphasize that the developed model (Eq. 9) should be applied within the studied range of the relevant parameters, and caution must be taken when extrapolating beyond this range.
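As a minimal sketch of the reduction step (Python; the feature order shown is an assumption matching the listing in the data description), the CHP column is excised before retraining the [8-8-1] network with the same search procedure used earlier:

```python
import numpy as np

# Assumed column order following the data description (Table 2)
FEATURES = ["UP_P", "Down_P", "Pi", "Pd", "Ti", "Tm", "CHP", "GOR", "Choke"]

def drop_feature(X, name, features=FEATURES):
    """Remove one input column (here CHP) prior to retraining."""
    keep = [i for i, f in enumerate(features) if f != name]
    return X[:, keep]

# X8 = drop_feature(X, "CHP")  # then rerun best_hidden_size(...) on X8
```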

Table 9 Model performance for oil flow rate prediction in an ESP operated well with 8 inputs

Comparison of the performance of existing models with the developed model

A comparison of the performance of existing flow rate prediction models with the developed model is presented in Table 10. From the table, it is observed that most of the existing models used a limited number of metrics to evaluate their performance. Another point worth noting is that the developed model performs better than the existing models. Since the model developed in this study is meant for flow rate estimation in ESP-lifted wells, it would be most natural to benchmark it directly against other ESP models on common data, given that the input variables for gas lift wells are radically different from those of ESP wells. However, this is not possible for two reasons: first, there is a general scarcity of flow rate data from ESP-assisted wells; second, of the seven models in Table 10 meant for flow rate prediction in ESP wells, six were not presented in an explicit manner, making it difficult to fit experimental/field data to them.

Table 10 Performance comparison of existing models with the developed neural network model

Computational cost of the developed ANN models

A mathematical model must be implemented in software in order to be useful. The computational complexity of a model is one factor that determines its effectiveness in software; it concerns how quickly the model can compute and how much memory it uses. Khan Academy (2022) defines an efficient algorithm as one that produces the desired result with the least amount of memory usage and execution time. A crucial component of an algorithm's design is estimating its complexity, because it provides vital information about the expected performance (INTERSOG 2020).

For a neural network, an increase in each hidden layer's node count directly affects how much memory is used. It follows that if the neural network size is extended horizontally, the memory usage will increase (Mahendran 2021). Therefore, a model's computational cost (memory and execution time) increases in direct proportion to its complexity and size. However, there is an effort to try and lessen the ever-increasing memory and computing costs that models incur as their number of parameters continues to expand (Bird et al. 2021).

The success of a software-intensive system in the real world depends on the non-functional software characteristics of quality parameters, such as dependability, computational cost, performance, and memory consumption. In this instance, two distinct costs are examined: (i) the computational cost, which depends on how quickly the resulting model can carry out computations, and (ii) the memory consumption cost, which gauges the eventual size of the model. Computing expenses are more commonly measured in computer science as "complexity".

Computational cost of the model: one way to gauge the computational cost of a model is simply to count how many computations it performs. In neural network models, MACCs (multiply-accumulate operations) are a measure of computational complexity: they estimate how many basic arithmetic operations are required to process data through the network. It stands to reason that a model with many computations to carry out will take longer to execute; knowledge of MACCs therefore aids in predicting a model's execution time on various hardware platforms. Generally, models with lower MACCs have a lower computational workload (Lin and Yeh 2022). For a simple neural network, the MACCs are computed with the formula in Eq. 10.

$$MACCs=\left[\left(2*\mathrm{Number\, of\, Inputs}+1\right)\mathrm{ Number\, of\, hidden\, layer\, neurons}\right]+\mathrm{Number\, of\, output\, neurons}$$
(10)

Memory consumption: a neural network is usually represented by a matrix of weights and biases. Each weight can be represented using a 64-bit floating-point number (double precision), which occupies 8 bytes of data, since 8 bits equal 1 byte. Thus, the size in memory depends on the layers of the neural network. For instance, a neural network architecture of [3-5-1] requires 26 doubles, or 208 bytes, to represent the network, since the weights and biases of the aggregation functions are represented by [(3 × 5) + (5 × 1) + (5 + 1)] doubles. Such a network has a total of 26 parameters that must be optimized to obtain the best result; with a huge number of parameters to optimize, a huge amount of computing resources would be required. The memory consumption of existing models and the models developed in this study is presented in Table 11.
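Equation 10 and the byte-count rule above are straightforward to apply in code; a minimal Python sketch, applied to the two architectures developed in this study:

```python
def maccs(n_inputs, n_hidden, n_outputs=1):
    """Eq. 10: MACC estimate for a single-hidden-layer network."""
    return (2 * n_inputs + 1) * n_hidden + n_outputs

def memory_bytes(n_inputs, n_hidden, n_outputs=1):
    """Parameter count x 8 bytes (64-bit doubles); [3-5-1] gives 208 bytes."""
    n_params = (n_inputs * n_hidden) + (n_hidden * n_outputs) + n_hidden + n_outputs
    return 8 * n_params

print(maccs(9, 10), memory_bytes(9, 10))  # nine-input [9-10-1] model
print(maccs(8, 8),  memory_bytes(8, 8))   # reduced [8-8-1] model
```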

Table 11 Computational workload of developed model compared with existing models

From Table 11, comparing the MACCs of existing models for ESP-operated wells with those of the models developed in this study for oil rate estimation, it is found that the models developed here occupy the least memory in terms of space requirements while also having a small number of MACCs, implying that their computation would be fast. This makes the model useful for real-time deployment in the field, where oil flow rate data are required continuously.

Conclusions

This work aimed at developing an explicit, computationally efficient, and accurate predictive tool for estimating oil flow rate in electrical submersible pump-operated wells. Since oil flow rate prediction is a challenging problem that depends on many variables with high parameter interdependencies, the neural network algorithm was chosen for this purpose and applied successfully to the flow rate estimation process. A database sourced from the field was used to train the neural network structures and check the validity of the network's algorithm. The predictive results showed good agreement with the field data. Based on the findings of this work, the following conclusions are drawn:

  i.

    The uniqueness of this study lies in three features of the developed model namely: its explicit representation, its computational cost evaluation and memory consumption as well as its high predictive capacity. These combined attributes are uncommon in most AI models that aim to predict flow rates in ESP-lifted wells.

  ii.

    The performance of the developed model as demonstrated by its high predictive precision compared with the existing models indicates that the developed model is a viable alternative for flow rate prediction in ESP-lifted wells.

  iii.

    The explicit nature of the developed model coupled with the sensitivity analysis of the input variables helped in making the developed model explainable and easy to incorporate in a software application.

  iv.

    The computational cost analysis of the developed model in comparison with existing models indicates that the developed model would perform computations in a timely manner should it be deployed in a software application.

  v.

    It is recommended that the model be tried on other fields operated with ESPs to further assess its predictive strength.