Introduction

Liquid loading is an operational constraint that is associated with gas wells where the dominant driving mechanism for hydrocarbon production is associated by gas-driven mechanisms. Liquid loading occurs in a gas well when the liquid starts to accumulate in the tubing or casing and the velocity of gas is lower than the critical value (the minimum velocity at which the gas pushes the liquid out of the gas well). Thus, when hydrocarbon is produced in a well, the density difference and the effect of gravity the gas will flow faster than the liquid in the well, and hence, the gas push the liquid out of the well by forming a gas core at the centre of casing or tubing. As production continues at the late life of a gas well, or a gas-driven well, the reservoir pressure continues to deplete, and the gas velocity will decrease below the critical value, leading to liquid hold up at the upper part of the casing or tubing, and the liquid starts to fall back, accumulate, and overtime create an hydrostatic back pressure that is far higher than the formation pressure of the well, limiting the flow of gas into the well, which might eventually leave the well dead (Inability of the well to produce hydrocarbons). This phenomenon described is known as liquid loading. Figure 1 shows how liquid loading occurs in a well after a long period of production; (a) the well producing effectively without liquid loading occurring, (b) the gradual occurrence of liquid loading in the well with the gas rate reduces; (c) the liquid holdup taking place in the well bore; (d) the liquid loading fully taken place in the wellbore and hydrocarbon production has stopped due to back pressure by the liquids in the wellbore.

Fig. 1
figure 1

Life history and the process of liquid loading in a gas well (Liu et al. 2017)

Liquid loading despite being very detrimental to the well is difficult to predict, and therefore, software-based technique provides a holistic solution that enables complete understanding of its behaviour. Liquid loading is a function of gas velocity, downhole well components, and surface equipment, and accurately modelling liquid loading will facilitate understanding of its behaviour. However, current mathematical models referenced in the technical literature (Turner et al. 1969; Coleman et al. 1991a, b, c; Nosseir et al. 2000; Li et al. 2016; Dousi et al. 2006; Yuan 2013; Guner 2012; Makinde et al. 2013; Tan et al. 2014; Li et al. 2014; Vieiro et al. 2016) have error and degree of inaccuracies. We have identified the need to develop a new software data-driven code-based environment—Trained Artificial Neural Network—ANN methods that incorporate and optimized the strength of conventional models incorporating well nodal analysis to bridge knowledge gap in prediction of liquid loading for the main purpose of this research work. In addition, software-based methods could facilitate deeper understanding and simulation of intrinsic flow behaviour because data resulting in trained neural network presents reliable prediction of liquid loading in hydrocarbon well systems than attempting to determine this by computational modelling of the physical system. Liquid loading, if adequately predicted prior to it been detrimental to the well, offers solutions to mitigate associated consequences. Some of the mitigation methods could be: deployment of mechanical methods such as gas lift, plunger lift, pumping, and velocity string installation or use of chemical (foamers). Although liquid loading is a serious problem, there is still not yet a common consensus how previous researchers have predicted and determine liquid loading problem over time. Therefore, models from previous works are referenced here as prediction of liquid loading by Turner et al. (1969), Coleman et al. (1991a, b, c), Nosseir et al. (2000), Li et al. (2016), Dousi et al. (2006), Yuan (2013), Guner (2012), Makinde et al. (2013), Tan et al. (2014), Li et al. (2014), Vieiro et al. (2016) just to name a few published article reported to progress in the research work on liquid loading. Turner et al. (1969) researched the analysis and prediction of the minimum flowrate for the continuous removal of liquid from gas wells where they analysed the equation used for calculating the minimum flowrate from physical properties which was proposed by Jones (1947) and Duckler (1960). Jones (1947) and Dukler (1960) analysis postulates the existence of two proposed physical models for the removal of gas from the wells which are; the liquid film moves along the wall of the pipe and the liquid droplet entrained in the gas core. Turner et al. (1969) suggested the concept of the liquid droplet entrained in the gas model to determine the minimum condition required for liquid loading not to occur. Their model compared with field data achieves reasonable fit with an adjustment of 20%. However, the major limitation in their work was that the exchange between the liquid accumulation on the wall of the production tubing and the liquid droplet entrained in the gas–liquid interface were not considered and also the assumptions used were not stated. Coleman et al. (1991a, b, c) demonstrated that there was no need for the adjustment factor of 20% that Turner et al. (1969) applied in their work for reservoirs of low pressures of approximately less than 500psia. Coleman et al. (1991a, b, c) also showed that liquid loading can be modelled and predicted with conventional quantitative analysis technique and hence liquid loading can be predicted far before it takes place. Pleter (1990) did thorough research on a series of field tests on liquid loading of gas wells where he found out that various types of behaviour could be encountered in such wells. Nosseir et al. assumed a drag coefficient value of 0.44 for Reynolds number (Re), 2 × 105–104 and Re value greater than 10^6 have a drag coefficient value to be 0.2. The researchers were able to show the validity of Turner’s and the relevance of the 20% adjustment factor considered by Turner et al. for the predicted minimum gas flowrate. Li et al. (2001) progress Turner and Coleman et al. (1991a, b, c) models by considering the effect of deformation of the free-falling liquid droplet in a gas medium. Thus their model showed liquid droplet is entrained in a high-velocity gas stream, a pressure difference exists between the back and front portions of the droplet leading to its deformation and its shape changes from spherical to convex bean with unequal sides (flat) or ellipsoidal (Fig. 2).

Fig. 2
figure 2

a Flat-shaped droplet model. b Shape of entrained droplet moving in high velocity gas (Li et al. 2001)

Gool et al. (2008) developed a model that agrees with the real life data condition to predict gradual decline in the well production rate resulting from liquid loading. Zhou et al. (2009) suggested that Turner et al. (1969) model failed to predict accurately liquid loading even with a 20% adjustment could not predict some wells, field data also proves that the adjusted model underestimates liquid loading at times (Zhou et al. 2009). A more thorough research on the two theories proposed by Turner et al. (1969) postulated as the continuous film model and the liquid droplet model led to the development of an empirical model prediction for liquid loading in liquid hold up. Zhou et al. (2009) suggested that their model provides better results when compared with Turner et al. (1969) 20% adjusted model and confirmed their model agree with the findings of Coleman et al. (1991a, b, c). Wang et al. (2010) developed a new calculation method for the calculation of critical velocity based on the droplet model. Their model did not use the normal spherical droplet that was used by Turner et al. (1969) but a disk-like ellipsoid shaped droplet which agrees with the actual gas well production. Luan et al. (2012) developed a new model for the prediction of liquid loading in low-pressure wells, in this model they adopted the Li et al. (2001) approach, which is a deformation of the droplet due to gas lifting efficiency. Thus the authors introduced a dimensionless parameter “S” which characterizes the extent of loss of gas lift energy in the wellbore. Luo (2013) conducted a thorough review and comparison of the two models proposed by Turner et al. (1969) which are the continuous film model and the liquid droplet model where his results show that the continuous film model is the best model for liquid loading prediction and may be the major reason for the accumulation of liquid(s). In the wellbore, especially in inclined wells, Faidairo et al. (2014) presented an improved Guo et al. (2006) model based on Turner et al. (1969) critical velocity model, by considering accumulation and kinetic terms that were neglected by Guo et al. (2006) in his model. Few authors have used artificial neural network models for the prediction of liquid loading in gas wells (Khamehchi et al. 2014; Osman et al. 2002; Ghadam 2015). (Khamehchi et al. 2014), (Osman et al. 2002) Osman et al. (2002) and Khamehchi et al.(2014) employed ANN to develop a model and reported a better prediction and higher accuracy than existing models. In this study, the predictive capability of other data-driven methods was investigated and improved for the prediction of liquid loading using machine learning and software-based Bayesian and other statistical measures to improved accuracy in the prediction of liquid loading in gas wells hydrocarbon production systems. The improvement in the modelling approaches adopted neural network artificial intelligence black box to model the algorithm software system using data in and out of the black box to train the model with the input and output data and compared this with the typical results from prior experience that provides improvement in the solution. The ANN machine learning models optimize data from artificial neural network in Bayesian neural network, particle swarm optimization and genetic algorithm framework in the following steps: [1] identify the various variables that can be used for the detection of liquid loading, [2] develop artificial and Bayesian neural network models by coding the algorithms for the prediction of liquid loading using Python programming, [3] develop particle swarm optimization and genetic models by coding the algorithms for the prediction of liquid loading on MATLAB, [4] obtain solution(s) to model using MATLAB and Python, and [5] compare the results of software with the results obtained by other models.

Model development and methodology

In this research paper we have developed a software-based data-driven model for the prediction of liquid loading in gas wells. The data-driven model incorporated in software code environment was evolved from the method solutions of (Turner et al. 1969), particle swarm optimization, and genetic algorithm. The genetic algorithm and particle swarm optimization model algorithms were used to evaluate the optimal values of the constants in Turners et al. (1969) equations by coding these algorithms in MATLAB. The equations are presented in Eqs. (1) and (2) and the optimal constants were gotten by minimizing the mean square errors between the actual flowrates of fluids out of the wellbore to that which the algorithms obtain per iteration, while artificial neural network and Bayesian neural network were programmed using python language to predict liquid loading.

$$V_{c} \left( {{\text{water}}} \right) = \beta_{1} \frac{{\left( {67 - 0.0031P} \right)^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 4}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$4$}}}} }}{{\sqrt {0.0031P} }}$$
(1)
$$V_{c} \left( {{\text{condensate}}} \right) = \beta_{2} \frac{{\left( {45 - 0.0031P} \right)^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 4}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$4$}}}} }}{{\sqrt {0.0031P} }}$$
(2)
$$Q_{g} = 5.2759\frac{P \times A}{Z} \times \left( {c \times \left( {\beta_{1} \frac{{\left( {67 - 0.0031P} \right)^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 4}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$4$}}}} }}{{\sqrt {0.0031P} }}} \right) + b\left( {\beta_{2} \frac{{\left( {45 - 0.0031P} \right)^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 4}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$4$}}}} }}{{\sqrt {0.0031P} }}} \right)} \right)$$
(3)

where \(P\): pressure, Ib/ft2; A: Cross-sectional area of conduit, ft2; \(Z\): compressibility factor, dimensionless; \(V_{c} \left( {{\text{Water}}} \right)\): critical velocity of gas required to push water out of the wellbore, ft/s; \(V_{c} \left( {{\text{Condensate}}} \right)\): critical velocity of gas required to push condensate out of the wellbore, ft/Thus genetic algorithm and particle swarm optimizatn aried and evaluated to dermine the optimal values of \(\beta_{1}\). and \(\beta_{2}\). in Eq. (1) that minimizes the difference in the mean square error (MSE) between the test gas production rate \(Q_{{{\text{jg}}.{\text{actual}}}}\). and predicted minimum production rate for continuous removal of liquid from the well \(Q_{{{\text{jg}}.{\text{pred}}}}\). Variable c is equal to a value of one when water only or both oil and condensate are produced in the well, otherwise it has a value of zero, while variable b has a value of one when only condensate is produced otherwise it has a value of zero.

T mean square error (MSE) is given as:

$${\text{MSE}} = \frac{1}{n}\mathop \sum \limits_{j = 1}^{n} \left( {Q_{{{\text{jg}}.{\text{actual}}}} - Q_{{{\text{jg}}.{\text{pred}}}} } \right)^{2}$$
(4)

where \(Q_{{{\text{jg}}.{\text{actual}}}}\): measured critical rate, Mscf/day; \(Q_{{{\text{jg}}.{\text{pred}}}}\): predicted critical rate, Mscf/day; MSE: Mean squared error.

Particle swarm optimization (PSO)

Particle swarm optimization is based on the social behaviour of fish and birds. This model was initially developed by James Kennedy and Russell C Eberhart in 1995. Particle swarm optimization is a swarm intelligence which has population size (P) of candidate solution called particle. Each population has its own position \(X_{j}^{m}\)., velocity \(V_{j}^{m}\)., personal and global best (\({\text{Pb}}_{j}^{m}\). and \({\text{Gb}}_{j}^{m} )\) has shown in the figure below. Particles communicate and learn from each other in order to give the best solution to the optimization problem. There are two equations used in particle swarm optimization, the first is for updated veloci othe particle and the last for updated the position of the particle. These two equations are described below:

$$\begin{aligned} V_{j}^{m + 1} &= \omega \times V_{j}^{m} + a_{1} \times {\text{rand}}\left( {0,1} \right)*\left( {Pb_{j}^{m} - X_{j}^{m} } \right) \\ &\quad + a_{2} \times {\text{rand}}\left( {0,1} \right)*\left( {Gb_{j}^{m} - X_{j}^{m} } \right) \\ \end{aligned}$$
(5)
$$X_{j}^{m + 1} = X_{j}^{m} + V_{j}^{m + 1}$$
(6)

where \(X_{j}^{m}\): particle position, \(V_{j}^{m}\): particle velocity, \(Pb_{j}^{m}\): personal best, \(Gb_{j}^{m}\).—global best, \(a_{1}\). and \(a_{2}\): acceleration elements, \(X_{j}^{m + 1}\): updated position, \(V_{j}^{m + 1}\): updated velocity, \(\omega \times V_{j}^{m}\): Inertia term, \({\text{a}}_{1} \times {\text{rand}}\left( {0,1} \right) \times \left( {{\text{Pb}}_{{\text{j}}}^{{\text{m}}} - {\text{ X}}_{{\text{j}}}^{{\text{m}}} } \right)\).: Cognitive component, \(a_{2} \times {\text{rand}}\left( {0,1} \right)*\left( {Gb_{j}^{m} - X_{j}^{m} } \right)\): Social component (Fig. 3).

Fig. 3
figure 3

Flowcharts of Particle Swarm Optimization (Wang et al. 2018)

Genetic algorithm (GA)

Genetic algorithm is a type of metaheuristics machine learning optimization algorithm developed by David. E Goldberg it is based on the evolution of biological processes analogy was inspired by Darwin’s theory of evolution, which states that the survival of an organism is affected by rule "the strongest species that survives". The chromosome with the highest fitness value will be selected for the next iteration. After several iterations, the chromosome value will converge to a certain value, which is the best solution for the defined problem (Fig. 4).

Fig. 4
figure 4

Genetic algorithm flowchart

Bayesian neural network (BNN)

Bayesian neural network in comparison with other machine learning algorithm is based on probability distribution making it more robust in predicting liquid loading as it incorporates function of probabilities and graphs to accommodate different operating conditions which may lead to liquid loading in well production rather than deterministic approach, more adept to deal with uncertainty, making BNN a good model candidate for this research work. The model is presented in Eq. (7)

$$\begin{aligned} P\left( {\Theta | \, X_{1} , \ldots, \, X_{n} } \right) \, = \frac{{P\left( {x_{1} , \ldots, x_{n} |\Theta } \right)*P\left( \Theta \right)}}{{P\left( { x_{1} ,\ldots, x_{n} } \right)}} \\ \alpha L (\Theta \left( {x_{1} , \ldots, x_{n} } \right) P\left( \Theta \right) \end{aligned}$$
(7)

where X = (X1,...,Xn); vector of variables, P (Θ|X1,...,Xn); probability distribution, P(Θ); conditional probability distributions for the values of each input variable, Θ; Set of local parameters.

A Bayesian network (BN) for a vector of variables X = (X1,...,Xn) consists of two components: P(Θ|X1,...,Xn) represent the probability distribution a structure “S” represented by a directed acyclic graph (DAG), expressing a set of conditional in-dependencies between variables. Set of local parameters Θ representing the conditional probability distributions for the values of each input variable given the different value combinations of their parents, according to the structure “S” of the network. Jospin et al. (2020) (Fig. 5).

Fig. 5
figure 5

Bayesian neural network

Artificial neural network (ANN)

Artificial neural network (ANN) is defined as a smart adaptive system that is similar to the learning methods used by the brain that creates a specific and defined relationship between the input and output variables provided. In designing any artificial neural network model, the type of architecture, the number of hidden layers, the total number of the neurons as well as the connectivity between the layers are of great interest and this must be accurately determined (Awoleke and Lane 2011). The neurons, which comprises of nodes and processing elements, and interconnections (weights) are the two main components of them (Fig. 6).

Fig. 6
figure 6

Diagram showing Artificial Neuron (Babuska 2010)

Field data used for this study

Two different field data were used for this project which are Xinjiang North-west gas field data and the 106 data sets from condensate/water gas well used by Turner et al. (1969), and several authors such as Ikpeka et al. (2019), was used for this research work. Turner et al. (1969), Barnea (1986), Li et al. (2001), and Ikpeka et al. (2019), Chen et al. (2016), Liu et al. (2018). Statistical summary of the 106 data set and Xinjiang North-West gas field data are presented in appendix.

Critical rate prediction using Bayesian neural network

The Bayesian neural network would be used for prediction of liquid loading through the steps shown in the flowchart below (Fig. 7).

Fig. 7
figure 7

Bayesian neural network flowchart for the prediction of liquid loading

Critical rate prediction optimization using particle swarm optimization

See Fig. 8.

Fig. 8
figure 8

Flow diagram for the customized application of the particle swarm optimization algorithm to identify the required unknown using the fitness function based on the data set

Critical rate prediction optimization using genetic algorithm

Genetic algorithm is used to optimize Eq. (9) follows the steps below;

  • Step 1 Determine the number of chromosomes, generation, and mutation rate and crossover rate value.

  • Step 2 Generate chromosome–chromosome number of the population, and the initialization value of the genes chromosome-chromosome with a random value.

  • Step 3 Carried out steps 4–7 until the number of generations is met.

  • Step 4 Evaluation of fitness value (mean square error (MSE)) of chromosomes by calculating objective function which is the formula of the critical rate of the gas. Rank each sample in the population ascending according to its fitness function and update the swarm position. The swarm with the least mean square error serves as the best solution and the particle swarm is updated based on these results.

  • Step 5 Selection chromosomes for the next iteration based on the results in step 4.

  • Step 6 Mating of chromosomes-based chromosome crossover parameter.

  • Step 7 Mutation of chromosomes-based chromosome mutation parameter.

  • Step 8 Repeat the algorithm several times to verify if it is converging to a stable solution. Selection of the chromosome with the best value.

Statistical analysis for the determination of model accuracy

Confusion matrix is a statistical table that is used to evaluate the performance of a model by determination of uncertainty, error, accuracy, sensitivity, etc. Confusion matrix was used to determine the accuracy, validity, and comparative analysis between the new models gotten by genetic algorithm, firefly algorithm and particle swarm optimization and other existing models such as Ikpeka et al. model (2019), Turner et al. model (1969), and Li et al. model. (2001) and so much more. The confusion matrix is shown in Table 1.

Table 1 Modified Confusion Matrix Table for liquid loading prediction

Results and discussion

Generation of new model using genetic algorithm

A new software code environment was evolved from genetic algorithm, particle swarm optimization data-driven models, and Bayesian neural networks solutions to obtain optimum result from this algorithm presented below. The comparative study of this model is provided in the other sections of this paper.

The assumptions of the models Genetic algorithm are;

  • The liquid droplet of both water and oil are spherical

  • Change in liquid droplet shapes is compensated by the algorithm (Table 2).

Table 2 Genetic algorithm parameters used to predict critical gas rate

The new critical gas rate equation gotten by this machine learning optimization algorithm is given as:

$$Q_{{{\text{gga}}}} = 5.2759\frac{P \times A}{Z} \times \left( {c \times \left( {7.8961\frac{{\left( {67 - 0.0031P} \right)^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 4}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$4$}}}} }}{{\sqrt {0.0031P} }}} \right) + b \times \left( {5.209\frac{{\left( {45 - 0.0031P} \right)^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 4}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$4$}}}} }}{{\sqrt {0.0031P} }}} \right)} \right)$$

where \(P\): pressure, Ib/ft2, A: Cross-sectional area of conduit, ft2, \(Z\): compressibility factor, dimensionless.

The genetic algorithm model prediction accuracy is quite high compared to firefly algorithm model and particle swarm optimization model based on data set recording accuracy of 89% and 83.3% from the simulation in MATLAB. The simulated prediction for liquid loading through genetic algorithms could be further improved through machine learning algorithm discussed in other sections.

An improved model using particle swarm optimization (PSO)

A new model was obtained using particle swarm optimization due to its vast applications in several industries where the global best particles are selected as the unknown parameters as defined in Eq. 10. The parameters assigned to the PSO were the number of swarm size N, Varmin (lower bound of unknown variables), Varmax (upper bound of unknown variables) maximum generation T, acceleration elements a1 and a2 and the inertia weight ω, their values are shown in Table 3. The data-driven model gotten from the MATLAB code representing the background model and particle swarm optimization is based on the assigned parameters shown in Table 3. Particle swarm optimization model sensitivity analysis reveals it was most sensitive to the inertia weight, cognitive component, and social component. The optimal values of the model parameters are presented in Table 3.

$$Q_{{{\text{gpso}}}} = 5.2759\frac{P \times A}{Z} \times \left( {c \times \left( {3.9481\frac{{\left( {67 - 0.0031P} \right)^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 4}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$4$}}}} }}{{\sqrt {0.0031P} }}} \right) + b\left( {3.9520\frac{{\left( {45 - 0.0031P} \right)^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 4}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$4$}}}} }}{{\sqrt {0.0031P} }}} \right)} \right)$$

where P: pressure, Ib/ft2, A: Cross-sectional area of conduit, ft2, Z: compressibility factor, dimensionless.

Table 3 Particle swarm optimization parameters used to predict critical gas rate is presented below

Particle swarm model exhibits the same prediction as firefly algorithm with the simulated accuracy of 92% and 83% for 106 data set gotten from water/condensate gas wells and Xinjiang North-west gas field data, respectively, this model proves to have an extremely high prediction for loaded wells which is our major concern, further analysis, validation of this new model gotten through the aid of genetic machine learning algorithm are discussed in other sections of this chapter.

A new model using artificial neural network

A software-based code environment was developed through the application of artificial neural network. The new model is validated by combining the 106 data points and Xinjiang well data. The data set is divided into two sets which are the train and test data. The train set is 60% of the whole data while 40% is the test set presented in other sections (Fig. 9).

Fig. 9
figure 9

Artificial neural network design for liquid loading prediction

A new model using Bayesian neural network

A software-based model was developed through the application of Bayesian neural network/Naive Bayes classification. This new model is validated by combining the 106 data points and Xinjiang well data. The data set is divided into two sets which are the train and test data. The train set is 60% of the whole data while 40% is the test set. The results gotten from this model are presented in other sections.

Comparative study

A comparative study was carried out in our study between the data-driven models obtained using genetic algorithm, particle swarm optimization, and the software-based model obtained using artificial neural network and Bayesian neural network and other models using confusion matrix statistical analysis and cross-plots. The 106 data sets consist of 53 unloaded wells and 37 loaded wells while Xinjiang North-west gas field in China consist of 10 unloaded wells and 8 loaded wells. Comparative analysis and confusion matrix were based on these data.

Comparative analysis of the different models using cross-plots

According to Turner et al. (1969), validity and accuracy of model for liquid loading prediction is based on the degree of data separation it exhibits on a cross-plot. He also stated the following;

  1. 1.

    Unloaded wells/ wells that would unload easily should be located above the diagonal.

  2. 2.

    Loaded well should plot or to be located below the diagonal.

  3. 3.

    Near-loaded wells/ wells about to be loaded should plot below and very close to the diagonal.

The different models including the data-driven models and the software-based models are validated and compared based on these rules after their results have been plotted on the cross-plot.

Cross-plots obtained based of the 106 data points

Despite the success in the prediction of unloaded wells by Turner et al. model, it fails to accurately predict most of the loaded wells. This shows that this model has poor data separation based on the predicted critical rate (Figs. 10, 11).

Fig. 10
figure 10

Cross-plot of test flowrate against critical flowrate using Turner et al. model

Fig. 11
figure 11

Cross-plot of test flowrate against critical flowrate using Li et al. model

The data separation gotten by Li et al. (2001) model is closed to that achieved by Turner et al. (1969) model, but it is of lower accuracy compared to that of Turner et al. (1969). More study was carried out by Ikpeka et al. (2019), where he developed a mathematical model based on the combination of the assumptions of the critical velocity of Li et al. (2001) and Turner et al. (1969), he had a more accurate prediction and data separation compared to these two models it was formed from. The cross-plot obtained by Ikpeka et al. (2019), is re-plotted and shown in Fig. 12 of this study. The model achieves a better data separation compared to Turner et al. (1969) and Li et al. (2001) models as shown; however, the model accuracy is not satisfactory due to the fact that it still predicted some loaded wells which is of primary concern wrongly, as the loaded wells falls in the unloaded region as shown in Fig. 12 (Figs. 13, 14).

Fig. 12
figure 12

Cross-plot of test flowrate against critical flowrate using Ikpeka et al. (2019) model

Fig. 13
figure 13

Cross-plot of test flowrate against critical flowrate using Genetic algorithm model

Fig. 14
figure 14

Cross-plot of test flowrate against critical flowrate using particle swarm optimization model

PSO data-driven model and GA data-driven model achieve greater data separation as compared to Turner et al. (1969), Li et al. (2001) and Ikpeka et al. (2019). These two models also give excellent prediction of loaded wells which is not the case with Turner et al. (1969), Li et al. (2001), and Ikpeka et al. (2019). The 106 data points and Xinjiang well data were combined together in order to have large volume of data for the artificial neural network and Bayesian neural network, out of the combined data 40% was taken as the test data while 60% was taken as the train data, the cross-plots of the test and train data for the artificial neural network and Bayesian neural network are presented in the following figures (Figs. 15, 16).

Fig. 15
figure 15

Cross-plot of test flowrate against critical flowrate for ANN training set

Fig. 16
figure 16

Cross-plot of test flowrate against critical flowrate for ANN test set

The models existing and developed in this project ANN software-based model gave the best data separation for both the test and train data set (Figs. 17, 18).

Fig. 17
figure 17

Cross-plot of test flowrate against critical flowrate for BNN training set

Fig. 18
figure 18

Cross-plot of test flowrate against critical flowrate for BNN test set

BNN software-based model achieves a data separation that is poor compared to ANN software-based model, PSO data-driven model and FA data-driven model, but a better data separation compared to Turner et al. (1969), Li et al. (2001). It achieved a data separation that is close to that of Ikpeka et al. (2019) model for both the test and train data set.

Cross-plots obtained based on Xinjiang north-west gas field in China

Genetic algorithm and particle swarm optimization achieve better data separation compared to the existing models as discussed. These models were also tested using Xinjiang North-west gas field in China. This evaluation shows that genetic algorithm data-driven model exhibits high predictive capacity compared to that of Particle swarm optimization algorithm (Figs. 19, 20).

Fig. 19
figure 19

Cross-plot of test flowrate against critical flowrate using particle swarm optimization model and genetic algorithm for loaded wells

Fig. 20
figure 20

Cross-plot of test flowrate against critical flowrate using particle swarm optimization model and genetic algorithm for unloaded wells

Model validation using confusion matrix statistical analysis

The model validation using confusion matrix statistical analysis incorporates

  • 106 field data points consist of 53 unloaded wells, and 37 loaded wells.

  • Xinjiang North-west gas field in China consists of 10 unloaded wells and 8 loaded wells.

From the tables presented above, data-driven model sand the software-based models were shown to exhibit higher accuracy compared to the existing mathematical models except for BNN whose results and accuracy are close to that given by Ikpeka et al. (2019). The newly obtained models are ranked based on the accuracy they gave. ANN software-based model gave accuracy of 93% and 92% for test and train data respectively, followed by particle swarm optimization data-driven model which gave accuracy of 92% and 83% for the 106 data set and Xinjiang North-west gas field, respectively, followed by genetic algorithm data-driven models with accuracy of 89% and 83% accuracy, respectively, for the 106 data set and Xinjiang North-west gas field, respectively, while BNN gave accuracy of 77% and 73% for train and test data, respectively (Tables 4, 5, 6, 7, 8).

Table 4 Statistical analysis using confusion matrix for genetic algorithm model and other models using 106 data points
Table 5 Statistical analysis using confusion matrix for particle swarm optimization and other models using 106 data points
Table 6 Statistical analysis using confusion matrix for the new models and other existing models using Xinjiang North-west gas field data
Table 7 Artificial Neural Network Validation using Combined Data Set
Table 8 Bayesian neural network (Naïve-Bayes classifications validation using combined data set)

Advantages and disadvantages of study

Advantages of study

Liquid loading problem has affected several fields’ production and has reduced the supply of hydrocarbons. This study presents efficient and effective models that should be incorporated into other software for the prediction of liquid loading, hence saving production companies huge work over operation cost and production time.

Disadvantages of study

This study is limited to vertical wellbores and requires modifications to make it suited for horizontal and deviated wells.

Limitation of study

The scope of this study is limited to vertical wellbores, and hence, it is required that further study should involve the modification of the equations gotten and software-based models to allow for prediction of liquid loading in horizontals by introducing wellbore deviation parameters.

Summary and conclusions

This study enumerates the use of software-based code environment and data-based-driven model neural network design that incorporates and optimized the use of Bayesian neural network, artificial neural network, genetic algorithm, and particle swarm optimization algorithms for the prediction of liquid loading in gas wells, the proficiency of this models were determined using106 data sets point adopted by (Turner et al. 1969) and Xinjiang North-west gas field data. The software data-driven models prediction were significant with the highlighted summaries and conclusion derived from the analysis and studies presented below.

  • These data-driven models and software-based models exhibit high predictive ability in the prediction of liquid loading in gas wells studied.

  • The different data-driven models and software-based models used outperform most of the widely applied published models with 92% accuracy for particle swarm optimization, 89% accuracy for genetic algorithm, based on the 106 data in comparison to 79%, 78%, 71%, and 66% accuracy for Turner et al., Barnea et al. Ikpeka et al. and Li et al. models, respectively.

  • The software-based model gotten using ANN gave accuracy of 92% and 93% for test and train data, respectively, in the prediction of liquid loading, while the software-based model gotten via BNN gave accuracy close to that given by Ikpeka et al. (2019) which are 73–77% for train and test data, respectively.

  • The particle swarm optimization and genetic algorithm model developed in this study performed at different level of accuracy of 92% and 89% accuracy respectively for the 106 data set and 83% for Xinjiang North-west gas field respectively.

  • Comparison using confusion matrix and cross-plot shows that the data-driven models and software-based models give higher reliability due to high date separation and accuracy compared to the previous models. This shows that there are still more that can be harnessed using data-driven model with respect to prediction liquid loading in gas wells.

  • The models gotten are limited to vertical wellbores.

  • These models can be applied to the prediction of liquid loading in deviated wells by making the right modifications such as incorporating the angle of deviation and so on and so forth and this is subject to further studies.