Advance artificial time series forecasting model for oil production using neuro fuzzy-based slime mould algorithm

Oil production forecasting is an important task for managing petroleum reservoir operations. In this study, a time series forecasting model for oil production is proposed using an improved version of the adaptive neuro-fuzzy inference system (ANFIS). The model is improved by an optimization algorithm, the slime mould algorithm (SMA). The SMA is a recent algorithm that has been applied to different optimization tasks. However, its search mechanism suffers from some limitations, for example, becoming trapped in local optima. Thus, we modify the SMA using an intelligent search technique called opposition-based learning (OBL). The developed model, ANFIS-SMAOLB, is evaluated with real-world oil production data collected from two oilfields in two different countries, the Masila oilfield (Yemen) and the Tahe oilfield (China). Furthermore, the model is compared extensively with several methods using several evaluation measures. The outcomes confirmed the ability of the developed ANFIS-SMAOLB to act as an efficient time series forecasting model with significant performance.


Introduction
Forecasting oil production is a significant step in controlling costs and monitoring the operation of petroleum reservoirs. It helps reservoir engineers design plausible projects, which prevents blind investment and supports sustainable development. Therefore, accurate forecasting is highly desirable for controlling and managing the cost of oil reservoirs. The reservoir properties, including porosity, permeability, compressibility, fluid saturation, and other well operational parameters, have a significant effect on oil production. Consequently, it is challenging to forecast future oil production accurately because of the reservoir's complexity and uncertain subsurface conditions. Numerical reservoir simulation (NRS) and decline curve analysis (DCA) are conventional methods that are commonly used to predict oil production (Doublet et al. 1994; Cumming 2013; Cancelliere et al. 2011). However, both conventional methods have limitations that affect forecasting accuracy. Thus, the effective development of oilfields requires an approach that predicts oil production precisely, which helps in selecting proper oil recovery methods to increase oil production and enhance oil transfer from the subsurface to the surface. It also extends the oilfield's life cycle and increases economic profit. The DCA method uses empirical equations to fit historical oil production data in order to characterize the whole reservoir's production mechanism (Tomomi et al. 2000). Moreover, matching the historical production data of oil wells is a significant challenge and is time-consuming, even if the well's production history presents a perfect match.
Nevertheless, uncertain predictions can still be calculated, even under complex and unstable production conditions (Li et al. 2003). On the other hand, NRS is robust and reliable for predicting oil production; however, its accuracy and reliability depend on the static geological model and the quality of the dynamic reservoir simulation models, and constructing static geological models is extremely difficult (Hutahaean et al. 2015, 2016; Al Rassas et al. 2020). Furthermore, the parameterization approach of the static geological model and the way the objective components are combined have a significant effect on reservoir history matching and reservoir prediction (Song et al. 2020; Kalra et al. 2018). Although multi-objective optimization issues can be addressed effectively, even a perfect reservoir history matching model can produce a bad prediction. History matching is challenging and requires a great deal of time and extensive work.
Deep learning (DL) approaches and their implementation have recently grown in the petroleum industry, particularly in reservoir engineering applications (Alkinani et al. 2019), including predicting porosity and permeability (Erofeev et al. 2019; Ahmadi and Chen 2019), pressure-volume-temperature (PVT) properties (Goda et al. 2003; Alkinani et al. 2019), sensitivity analysis and history matching, and forecasting oil production (Ahmadi and Bahadori 2015; Montgomery and O'Sullivan 2017; Guo et al. 2018).
Furthermore, deep learning algorithms, which have evolved significantly, were introduced to the petroleum industry to overcome the complications of traditional methods (Song et al. 2020). Additionally, various machine learning and deep learning methods have been presented in the literature for forecasting oil production (Sagheer and Kotb 2019; Wang and Chen 2019). Song et al. (2020) employed Long Short-Term Memory (LSTM) for forecasting oil production time series. In (Alalimi et al. 2021), a modified Random Vector Functional Link network was proposed for time series prediction and applied to oil production in the Tahe oilfield, China. Liu et al. (2020) used LSTM with ensemble empirical mode decomposition to forecast oil production. In (Cc et al. 2013), a higher-order neural network was proposed as an oil production forecasting model. Masini et al. (2020) combined clustering algorithms, including density-based clustering, with artificial intelligence techniques, including LSTM and Vertical Flow Performance (VFP), to demonstrate assisted production forecasting from real-time data. McKenna et al. (2020) considered three levels of uncertainty (facies geometry, permeability distribution, and reservoir rock heterogeneity) to assess their influence on reservoir evaluation and prediction. Sequential Gaussian simulation and Kriging probability-field were used to estimate and demonstrate these uncertainty levels. Fan et al. (2021) presented a hybrid model that considers the benefits of linearity and non-linearity and the effect of manual operations by combining the autoregressive integrated moving average (ARIMA) model and the LSTM. Moreover, four evaluation measures were used to compute the forecasting accuracy. Rădulescu et al. (2020) proposed an econometric approach for forecasting oil production to allow decision-makers and oil product stakeholders to take responsibility for production in OECD partner countries. This responsibility is perceived from various perspectives: political, economic, environmental, military, social, etc. Sagheer and Kotb (2019) proposed a deep LSTM to address the drawbacks of conventional prediction techniques and provide accurate predictions. Semenychev et al. (2017) elucidated the complexities of modeling and forecasting in the petroleum industry by integrating several production trend models and models of fluctuation. These methods increase production forecasting accuracy by incorporating the fluctuation component models and controlling the model's evolution and fluctuation. Allen (2020) proposed a data-driven approach as an alternative to traditional production prediction methods. They presented a proxy-well model that predicts production by choosing significant parameters and reservoir data as independent predictor variables. After that, principal component analysis (PCA) was employed to obtain the relevant features, which were then used to estimate cumulative production. Wang et al. (2018) proposed a hybrid of nonlinear and linear prediction approaches that builds the forecast in two stages, combining a nonlinear grey approach with the metabolism idea to establish a nonlinear metabolism grey approach and incorporating it with ARIMA. Al-Shabandar et al. (2021) presented a new model for predicting oil production using a deep gated RNN that comprises several hidden layers, each with a set of nodes. This model was evaluated with long-term time-series data.
Negash and Yaw (2020) proposed a new model for oil production forecasting employing artificial neural networks (NNs), which requires a physics-based feature extraction to predict fluid production and to improve forecasting accuracy. Additionally, there are other models, such as those in (Suhag et al. 2017; Liu et al. 2020; Karasu et al. 2020; Male 2019; Aizenberg et al. 2014).
Furthermore, DL has not only been applied to forecasting oil production in the petroleum industry; recently, different DL methods were also employed to simulate carbon emission and reduction (Wang et al. 2022), as well as the impact of energy consumption during the COVID-19 pandemic. In this study, we develop a time-series forecasting approach for oil production using an improved adaptive neuro-fuzzy inference system (ANFIS) (Jang 1993). We utilize an enhanced version of a recently proposed metaheuristic optimization method, the slime mould algorithm (SMA), based on opposition-based learning (OBL).
In recent years, the ANFIS model has been adopted in various forecasting applications, such as oil consumption (Zhou et al. 2019). The SMA is a recently developed optimization algorithm. It simulates the positive and negative feedback of the slime mould's propagation waves, driven by a bio-oscillator, to form optimal paths that connect food sources, with efficient exploitation ability and exploratory propensity. Due to its competitive performance in solving complex optimization problems, it has been adopted in different applications.
The ANFIS is improved using a version of the SMA enhanced with the OBL; thus, it is named SMAOLB-ANFIS. It works by initializing a set of solutions, where each solution represents one configuration of the ANFIS parameters. We evaluate each solution using 70% of the samples as a training set. The solution that has the smallest fitness value is considered the best solution. Thereafter, the OBL operators are employed to boost the current population, and then the SMA operators are used to improve the current solutions until the terminal conditions are met. The best ANFIS configuration (the best solution) is evaluated using the remaining 30% of the samples as a testing set. The data used in this study are real datasets for the Masila oilfields in Yemen and the Tahe oilfields in China, provided by local partners. The proposed forecasting approach achieved significant performance on several evaluation metrics in comparison with other methods.
The main contributions of the current study are:
- Present an efficient forecasting model for oil production based on a new improved ANFIS model.
- Propose an enhanced SMA algorithm to optimize the ANFIS parameters using the OBL intelligent search technique.
- Evaluate the proposed forecasting model with real-world datasets from two different oilfields in Yemen and China, and compare the SMAOLB to several optimization methods to verify its performance.

Backgrounds
In this section, we give a brief description of the applied methods.

ANFIS
The ANFIS approach was established by Jang (1993) as a new artificial neural network (ANN). The structure of the ANFIS model combines an ANN with a Fuzzy Inference System (FIS). "IF-THEN" rules are applied to generate a mapping between inputs and outputs, known as the Takagi-Sugeno inference model. This gives ANFIS a robust learning capability, which makes it convenient and reliable for processing data. Owing to these characteristics, the ANFIS approach has been implemented in many applications.
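In the common ANFIS workflow, as drawn in Fig. 1, the inputs of Layer 1 are represented by $x$ and $y$, and $L_{1i}$ indicates the output of node $i$. The ANFIS mathematical model is expressed as follows:

$$L_{1i} = \mu_{A_i}(x),\ i = 1, 2, \qquad L_{1i} = \mu_{B_{i-2}}(y),\ i = 3, 4 \tag{1}$$

$$\mu(x) = \exp\left(-\left(\frac{x - \rho_i}{\alpha_i}\right)^2\right) \tag{2}$$

where $\mu$ indicates the generalized Gaussian membership function. The membership values of $\mu$ are defined by $A_i$ and $B_i$, and $\alpha_i$ and $\rho_i$ refer to the premise parameter set.

Fig. 1 The basic ANFIS structure

Eq. (3) is used for the second layer:

$$L_{2i} = w_i = \mu_{A_i}(x) \times \mu_{B_i}(y),\ i = 1, 2 \tag{3}$$

The output of the third layer is calculated as:

$$L_{3i} = \bar{w}_i = \frac{w_i}{\sum_{i=1}^{2} w_i} \tag{4}$$

in which $w_i$ represents the $i$th output of Layer 2.
Furthermore, the output of Layer 4 is generated by Eq. (5):

$$L_{4i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i) \tag{5}$$

in which $f$ indicates a function that uses the input and the parameters of the network as inputs, and $r_i$, $p_i$, and $q_i$ indicate the consequent parameters of node $i$. Finally, Layer 5 generates the output, which is computed as in Eq. (6):

$$L_5 = \sum_i \bar{w}_i f_i \tag{6}$$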
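The five layers described above can be traced in a few lines of code. The following is a minimal sketch, assuming two inputs, two rules, and Gaussian membership functions parameterised by a centre and a width; the function names and the data layout (`premise`, `consequent`) are illustrative assumptions, not the authors' implementation.

```python
import math

def gaussian_mf(v, centre, width):
    """Gaussian membership value of v for one fuzzy set."""
    return math.exp(-((v - centre) / width) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, two-rule Takagi-Sugeno ANFIS.

    premise: dict with keys "A" and "B", each a list of (centre, width) pairs
             (hypothetical layout chosen for this sketch)
    consequent: list of (p, q, r) tuples, one per rule
    """
    # Layer 1: membership degrees of x in A_i and of y in B_i
    mu_a = [gaussian_mf(x, c, w) for c, w in premise["A"]]
    mu_b = [gaussian_mf(y, c, w) for c, w in premise["B"]]
    # Layer 2: firing strength of each rule (product of memberships)
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalised firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted first-order rule outputs
    rule_out = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: overall output is the sum of the rule outputs
    return sum(rule_out), w_bar
```

In this sketch, the premise pairs and the consequent triples together form the parameter vector that an optimizer would tune.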

Slime mould algorithm
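In 2020, SMA was proposed as an alternative nature-inspired optimization technique that can be used to solve different optimization problems. It mimics the oscillation behavior of slime mould and its propagation wave feedback, which depends on a bio-oscillator, to generate the optimum routes that connect food sources. It has three primary phases:
1. Phase 1 (Approach food): This phase is presented in Eq. (7), which defines the approaching behavior of slime mould.

$$X(t+1) = \begin{cases} Z_b(t) + v_b \times \left(W \times Z_A(t) - Z_B(t)\right), & r < p \\ v_c \times X(t), & r \ge p \end{cases} \tag{7}$$

in which $v_b \in [-a, a]$ represents a random value, $v_c$ indicates a random value that is reduced from 1 to 0, and $t$ is the current iteration number. Moreover, $Z_b$ represents the best solution found so far, and the solutions of the slime mould are indicated by $X$. $Z_A$ and $Z_B$ are two randomly selected solutions. Additionally, $W$ represents the slime mould weight, whereas $p$ is calculated using Eq. (8):

$$p = \tanh\left|S(i) - DF\right| \tag{8}$$

in which $S(i)$ indicates the fitness value of the $i$-th solution, and $DF$ is the best fitness value.
The range of $v_b$ is computed using Eq. (9), where $\max_t$ is the maximum number of iterations:

$$a = \operatorname{arctanh}\left(1 - \frac{t}{\max_t}\right) \tag{9}$$

$W$ is computed as follows:

$$W(SmellIndex(i)) = \begin{cases} 1 + r \times \log\left(\dfrac{b_F - S(i)}{b_F - w_F} + 1\right), & \text{condition} \\ 1 - r \times \log\left(\dfrac{b_F - S(i)}{b_F - w_F} + 1\right), & \text{others} \end{cases} \tag{10}$$

$$SmellIndex = sort(S) \tag{11}$$

Here, condition indicates that $S(i)$ is ranked in the first half of $X$, and $r$ is randomly generated in [0, 1]. Moreover, $b_F$ indicates the best local fitness value, and $w_F$ is the worst local fitness value. $SmellIndex$ stores the sorted fitness values.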
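The approach-food phase can be illustrated with a short simplified sketch for a minimisation problem. Several choices here are assumptions made for illustration only: one scalar weight per solution rather than one per dimension, a base-10 logarithm in the weight formula, and $v_c$ drawn uniformly from $[-b, b]$ with $b = 1 - t/\max_t$. The function names are hypothetical.

```python
import math
import random

def sma_weights(fitness, rng):
    """Weight W of each solution for a minimisation problem."""
    n = len(fitness)
    # SmellIndex: indices sorted by fitness, best (smallest) first
    order = sorted(range(n), key=lambda i: fitness[i])
    best, worst = fitness[order[0]], fitness[order[-1]]
    # Guard against division by zero when all fitness values are equal
    span = (best - worst) or -1e-12
    weights = [0.0] * n
    for rank, i in enumerate(order):
        term = math.log10((best - fitness[i]) / span + 1.0)  # log base is an assumption
        r = rng.random()
        # First half of the ranking gets 1 + r*term, the rest 1 - r*term
        weights[i] = 1.0 + r * term if rank < n / 2 else 1.0 - r * term
    return weights

def approach_food(population, fitness, best_pos, best_fit, t, max_t, rng):
    """One approach-food update of every solution; requires 1 <= t <= max_t."""
    a = math.atanh(1.0 - t / max_t)  # half-width of the range of v_b
    b = 1.0 - t / max_t              # half-width of the range of v_c (assumed)
    weights = sma_weights(fitness, rng)
    new_population = []
    for i, x in enumerate(population):
        p = math.tanh(abs(fitness[i] - best_fit))
        if rng.random() < p:
            # Move relative to the best position using two random solutions
            z_a, z_b = rng.choice(population), rng.choice(population)
            new_x = [best_pos[d] + rng.uniform(-a, a) * (weights[i] * z_a[d] - z_b[d])
                     for d in range(len(x))]
        else:
            # Contract the current position
            new_x = [rng.uniform(-b, b) * x[d] for d in range(len(x))]
        new_population.append(new_x)
    return new_population
```

Because both `a` and `b` shrink to zero at the final iteration, the step sizes decay over time, which moves the search from exploration towards exploitation.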

Opposition-based learning
The OBL (Tizhoosh 2005) is an artificial intelligence technique that can be used to improve various optimization methods. The OBL strategy creates new opposite solutions from the current solutions of the given problem. It aims to select the better candidate by comparing fitness scores, in order to approach the ideal solution (Abd Elaziz et al. 2017). The opposite value $\bar{X}$ of a real value $X \in [LB, UB]$ is computed as shown in Eq. (14).

$$\bar{X} = UB + LB - X \tag{14}$$

Opposite point: this formulation is generalized to $n$ dimensions by Eq. (15):

$$\bar{x}_j = UB_j + LB_j - x_j, \quad j = 1, 2, \ldots, n \tag{15}$$

Furthermore, two solutions ($x$ and $x_{old}$) are compared in the optimization process based on their fitness values. The better solution is saved, whereas the other is removed. For maximization, if $f(x) \ge f(x_{old})$, then $x$ is stored; otherwise, $x_{old}$ is stored.
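The opposite point and the selection step can be written compactly. The sketch below is illustrative only; the function names and the `maximise` flag are assumptions, and the comparison is made between a solution and its opposite.

```python
def opposite(x, lb, ub):
    """Opposite point of x within the per-dimension bounds [lb_j, ub_j]."""
    return [u + l - xi for xi, l, u in zip(x, lb, ub)]

def obl_select(x, lb, ub, fitness, maximise=False):
    """Keep whichever of x and its opposite has the better fitness."""
    x_opp = opposite(x, lb, ub)
    fx, fo = fitness(x), fitness(x_opp)
    keep_x = fx >= fo if maximise else fx <= fo
    return x if keep_x else x_opp
```

For example, with bounds [0, 10] in each dimension, the opposite of (9, 6) is (1, 4); when a sphere function is minimised, the opposite point is the one that is kept.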

Proposed SMAOLB-ANFIS model
The developed oil production forecasting model is discussed in this section. The proposed model improves the performance of ANFIS using the SMA enhanced with OBL. The main target of using SMAOLB is to optimize the parameters of ANFIS, as in Fig. 2. The first step in the developed model, named SMAOLB-ANFIS, is to split the oil production dataset into training and testing sets; the training set is then used during the learning stage. In this stage, the developed SMAOLB-ANFIS constructs a population X, which has a set of N solutions; each one refers to one configuration of the ANFIS parameters. The next step is to assess the performance of the ANFIS constructed according to the current configuration X i by using a fitness function that measures the error between T and P,
where T and P denote the targets and predicted outputs, respectively, and N a indicates the total number of samples of the training set. The next process is to update the current population X by applying the modified SMAOLB. This is achieved by using the operators of SMA as discussed in Algorithm 1, followed by applying the OBL operator as discussed in Eq. (15). Because OBL needs more computational time, the developed SMAOLB uses OBL only during the exploration phase. The next step is to check the terminal condition; if it is not satisfied, the updating steps are repeated; otherwise, the best configuration, X b , is returned. Thereafter, the testing set is applied to the best configuration X b , and its quality is evaluated by predicting the oil production. The description of the developed ANFIS is presented in Algorithm 2.
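The evaluation step of this workflow can be sketched as follows. The root mean square error is assumed here as the fitness function purely for illustration, since the exact form is not restated in this section. The `predict` callable is a hypothetical stand-in for the ANFIS forward pass, and the split is chronological with a 70/30 ratio.

```python
import math

def rmse(targets, predictions):
    """Root mean square error between targets T and predictions P."""
    n = len(targets)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n)

def split_series(samples, train_percent=70):
    """Chronological split: the first train_percent % of samples are for training."""
    cut = len(samples) * train_percent // 100
    return samples[:cut], samples[cut:]

def training_fitness(config, train, predict):
    """Training error of one candidate configuration (assumed to be RMSE).

    predict(config, inputs) is a hypothetical callable returning one
    prediction per input.
    """
    inputs = [x for x, _ in train]
    targets = [y for _, y in train]
    return rmse(targets, predict(config, inputs))

def select_best(population, train, predict):
    """Return (fitness, configuration) of the solution with the smallest fitness."""
    scored = [(training_fitness(c, train, predict), c) for c in population]
    return min(scored, key=lambda s: s[0])
```

The configuration returned by `select_best` plays the role of X b ; it would then be scored once on the held-out testing samples.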

First study area
The first study area is the Masila Basin, Yemen. It is one of the onshore basins located in Hadhramaut governorate. It occupies about 1250 km², and it is considered one of the Mesozoic sedimentary basins. It was generated as a rift basin associated with the Mesozoic breakup of Gondwanaland and the development of the Indian Ocean throughout the Jurassic and Cretaceous. The Mesozoic and Cenozoic sequences in Yemen's sedimentary basins are widely exposed. Many researchers have studied the lithostratigraphic structure of the Masila Basin, including the Sunah oilfield (Hakimi et al. 2014, 2017; Al-Areeq and Maky 2015). Block 14 in the Masila basin comprises 20 producing fields, as illustrated in Figure 3. The Sunah oilfield is located in the northwest portion of the Masila block. The S1A formation is made up of shelf sands with tidal and longshore impacts that range in thickness from 25 to 40 feet. Figure 3 presents the study area of the Masila basin, Block 14, Sunah oilfield.

Geological setting
The geological characteristics of the Masila oilfield play a substantial role in determining the hydrocarbon zones throughout the field. The hydrocarbon occurrence and movement were mainly controlled by several attributes, including petrophysical properties, facies, faults, folding, and fractures. The Masila block is located in Hadhramaut and ranks among the most active oilfields (Figure 4) (Hakimi et al. 2011

Second study area
The Tahe oilfield was discovered in the 1990s with total proven reserves of approximately 600 × 10^6 tonnes. The Tahe oilfield is situated in Luntai County, Xinjiang province (Höök et al. 2010; Tian et al. 2017). The Triassic oil formation in Block 9 of the Tahe oilfield is located about 60 kilometers (km) away from Luntai County, between longitudes 84°13′9″ and 84°18′52″ E and latitudes 41°15′56″ and 41°16′4″ N. The Triassic reservoir in Block 9 was discovered in 2002. It is a sandstone reservoir, which is considered a favorable place for hydrocarbon accumulations. Oil production started in 2002 and is divided into four stages of development, including the pre-production phase, upper-middle-class, stable production phase, and regressive phase (Li and Pan 2017; Yu et al. 2017). Figure 5 shows the location of this oilfield.

Geological setting
Geologically, Block 9 in the Tahe oilfield is a sandstone reservoir that belongs to the Triassic period. Block 9 contains 10 normal faults; three large normal faults extend from the north to the east direction, and the others are secondary

Evaluation metrics
To validate the ability of the developed method to predict oil production, a set of performance metrics is employed. These measures are the standard deviation (STD), Mean Square Error (MSE), Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Coefficient of Determination (R²); their formulations are given in Table 1.
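As a reference for these measures, the sketch below computes them with their standard textbook definitions, which are assumed to match Table 1. RMSE is included because it is the error measure reported in the results. The function name is illustrative, and STD is omitted because it is computed over repeated runs rather than over a single prediction vector.

```python
import math

def evaluation_metrics(targets, predictions):
    """Standard error measures for one vector of predictions."""
    n = len(targets)
    errors = [t - p for t, p in zip(targets, predictions)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a target is zero
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, targets)) / n
    mean_t = sum(targets) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in targets)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}
```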

Results
The experimental results are calculated on four real datasets to forecast oil production for Yemen and China (one dataset for Yemen and three for China). The Yemen dataset consists of 341 records collected yearly between 1993 and 2015, whereas the China datasets, namely TK905H, TK906H, and TK907H, contain 4108, 4143, and 3838 records, respectively, collected daily from 2003 to 2014. The averages of the datasets are as follows: Yemen = 31946.95, TK905H = 29.06, TK906H = 33.53, and TK907H = 38.04. These data vectors are formatted for time-series forecasting by applying the auto-correlation function (ACF). Accordingly, 7 lags are applied in preparing the China data for the forecasting process, whereas 2 lags are applied for the Yemen data. In addition, each dataset is divided into training and testing sets using 10-fold cross-validation.
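The lag-based formatting can be sketched as follows. The function names are illustrative, and the autocorrelation estimator shown is the common sample estimator, assumed here rather than taken from the study.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

def make_lagged_samples(series, n_lags):
    """Turn a series into (previous n_lags values, next value) pairs."""
    samples = []
    for t in range(n_lags, len(series)):
        samples.append((series[t - n_lags:t], series[t]))
    return samples
```

With `n_lags=2`, the series 1, 2, 3, 4, 5 yields the pairs ([1, 2], 3), ([2, 3], 4), and ([3, 4], 5); each pair supplies the model inputs and the target for one time step.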

Yemen oil field
To evaluate the proposed SMAOLB-ANFIS as a time series forecasting model, we used real datasets collected from the Masila oilfields, Yemen. Additionally, we compared the SMAOLB to other models, including the traditional ANFIS and several ANFIS versions improved with optimization techniques, namely, SMA, genetic algorithm (GA), particle swarm optimization (PSO), grey wolf optimizer (GWO), and sine cosine algorithm (SCA). Table 2 shows the evaluation results of all compared algorithms in terms of RMSE, MAE, MAPE, R², STD, and computational time (CPU time). For RMSE, the proposed SMAOLB obtained the best results, followed by PSO, GA, SMA, GWO, ANFIS, and SCA, respectively. It is clear that SMAOLB outperforms the traditional SMA and the traditional ANFIS, which confirms the advantage of the proposed method, improved by the OBL operators. In the case of MAE, the proposed SMAOLB also achieved the best results, followed by PSO, GA, SMA, ANFIS, GWO, and SCA, respectively. For R², the proposed SMAOLB achieved the best result with 99.6%. The PSO obtained the second rank, where PSO and GA obtained the third rank. The ANFIS and GWO obtained the fourth rank, and finally, SCA came in the last rank. For STD, SMAOLB also obtained the best rank, followed by PSO, GA, ANFIS, SMA, GWO, and SCA, respectively. In contrast to the previous measures, for computational time, GWO obtained the shortest time, followed by SCA, PSO, GA, SMA, and SMAOLB. This is because applying OBL enhances the search process of the SMA to obtain optimal solutions at the cost of additional computation. Additionally, Figure 6 illustrates the forecasting results of the SMAOLB-ANFIS and the compared models. As shown in this figure, the proposed SMAOLB obtained the values nearest to the target (real value).

Tahe oil field, China
For further evaluation of our proposed model, we use additional data from three wells in the Tahe oilfield, China. Tables 3-5 show the results of all algorithms for the Tahe oilfield. As illustrated in Table 3, for the well TK905H, the proposed SMAOLB obtained the best RMSE value. The PSO came in the second rank, and the GA obtained the third rank. The traditional SMA obtained the fourth rank, whereas the SCA and the traditional ANFIS recorded the fifth and sixth ranks, respectively. For the TK906H and TK907H wells, SMAOLB also came in the first rank, followed by PSO, GA, SMA, ANFIS, and SCA. From Table 4, for TK905H and TK906H, we see that the SMAOLB achieved the best MAE values, followed by PSO, GA, SMA, SCA, and ANFIS. For TK907H, SMAOLB is also the best, followed by PSO, GA, SMA, ANFIS, and SCA. Furthermore, Table 5 indicates that the developed SMAOLB obtained the best R² value for the three wells.

Statistical tests
For further analysis, in this section, the Friedman test is employed to test the robustness of the SMAOLB and the other compared algorithms on all applied evaluation measures. This test assumes that there is no significant difference between the results of the control method (i.e., SMAOLB) and the other compared methods. This assumption is named the null hypothesis, and it is accepted if the p-value is greater than 0.05. Otherwise (i.e., the p-value is less than 0.05), it is rejected, which confirms that the difference between SMAOLB and the other methods is significant.
As indicated in Table 6, the proposed SMAOLB recorded the best Friedman's value in terms of RMSE, MAE, and MAPE. The GA obtained the second rank for both MAE and MAPE, followed by PSO, SMA, GWO, ANFIS, and SCA. For RMSE, the PSO obtained the second rank, followed by GA, SMA, GWO, ANFIS, and SCA.
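The mean ranks behind such a comparison can be computed as in the sketch below, which also returns the Friedman chi-square statistic without tie correction. The function name and the input layout are assumptions; each row holds the errors of all methods on one dataset, with lower values being better.

```python
def friedman_mean_ranks(scores):
    """Mean rank of each method and the Friedman statistic.

    scores[i][j] is the error of method j on dataset i (lower is better).
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            # Find the run of methods tied with the one at position pos
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            avg_rank = (pos + end) / 2 + 1  # tied methods share the average rank
            for j in order[pos:end + 1]:
                rank_sums[j] += avg_rank
            pos = end + 1
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom to obtain the p-value; a method with a smaller mean rank performs better on average.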
In summary, the above-mentioned results confirm the competitive performance of the developed SMAOLB-ANFIS over the traditional ANFIS and the ANFIS modified using SMA. Moreover, it outperformed several optimizers that are applied to improve the ANFIS model, such as PSO, GA, SCA, and GWO. The developed SMAOLB combines the strengths of the SMA and the OBL strategy, which supports the SMA with a suitable mechanism to avoid getting stuck in local optima. The OBL is applied during the exploration phase, which increases the convergence rate towards the feasible regions that contain the optimal solutions (the parameters of ANFIS).

Conclusion
This study proposed a developed variant of the ANFIS model as a time-series forecasting method for oil production using real-world datasets. The traditional ANFIS was enhanced using an intelligent optimization method called SMAOLB. This method was developed by applying the OBL technique to improve the search process of the slime mould algorithm (SMA). The proposed forecasting model, called ANFIS-SMAOLB, was applied to forecast oil production using different datasets from two real-world oilfields in Yemen and China. We implemented several experiments considering several evaluation metrics and statistical tests to evaluate the performance of the developed ANFIS-SMAOLB. Additionally, we compared it to the original structure of the ANFIS and several modified ANFIS models using other optimization mechanisms, such as traditional SMA, SCA, PSO, GA, and GWO. We concluded that the SMAOLB showed better performance than the traditional ANFIS, SMA, and other ANFIS versions in all performance measures except the computational time (CPU time). Therefore, the main limitation of the developed SMAOLB is the computational time, which can be neglected compared to other performance measures that have more important roles in time series prediction and forecasting, such as R², RMSE, MAE, MAPE, and STD. For future work, there are other applications that could be addressed using the SMAOLB, such as feature selection, multi-optimization tasks, and scheduling tasks (e.g., cloud computing and machine job scheduling in manufacturing).

Declarations
Funding No funding was received for conducting this study.
Conflict of interest All authors declare that they have no conflict of interest.
Human and animal rights This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.