Introduction

Computational intelligence (CI) has positively impacted the oil and gas industry, especially the reservoir characterization and modeling process, in recent times (Al-Bulushi et al. 2009; Dutta and Gupta 2010; Asadisaghandi and Tahmasebi 2011; Al-Marhoun et al. 2012; Barros and Andrade 2013; Anifowose et al. 2014a). This positive impact has resulted from the successful application of various CI techniques such as artificial neural networks (ANNs), functional networks (FNs), fuzzy logic (FL), generalized regression neural networks, support vector machines (SVMs), and radial basis functions. These techniques have been used to predict various petroleum reservoir properties such as porosity, permeability, pressure–volume–temperature (PVT) behavior, depth, drive mechanism, structure and seal, diagenesis, well spacing, and well-bore stability. Some of these reservoir properties are used for the detection of drilling problems, determination of reservoir quality, optimization of reservoir architecture, identification of lithofacies, and estimation of reservoir volume.

The petroleum industry has partly succeeded in reducing or limiting the coring process and has encouraged the utilization of archival data acquired and stored over time. The concept of machine learning, with its CI paradigm, has been instrumental to the use of existing data, such as well logs and their accompanying core measurements, to predict the core values for new wells, for uncored sections of existing wells, and for new fields. These core properties have a significant impact on petroleum field operations and reservoir management (Jong-Se 2005). Before the application of CI in petroleum science and technology, mathematical equations and empirical correlations had been established to relate some of the well logs to their respective core values. For example, porosity measurements were directly obtained from core samples using the following relationship (Amyx et al. 1960):

$$\phi = \frac{{V_{\text{P}} }}{{V_{\text{B}} }}$$
(1)

where ϕ = porosity, V_P = pore volume, and V_B = bulk volume.

When calculated from density logs, porosity has been estimated using the following relations (Coates et al. 1997):

$$\phi_{\text{d}} = \frac{{\rho_{\text{ma}} - \rho_{\text{b}} }}{{\rho_{\text{ma}} - \rho_{\text{f}} }}$$
(2)

where ϕ_d = density-derived porosity; ρ_ma = matrix density; ρ_b = bulk density; and ρ_f = fluid density.
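
As an illustrative worked example (assuming a quartz matrix with ρ_ma = 2.65 g/cm³, water as the pore fluid with ρ_f = 1.00 g/cm³, and a measured bulk density of ρ_b = 2.40 g/cm³; these values are hypothetical and not taken from the cited studies), Eq. (2) gives:

$$\phi_{\text{d}} = \frac{2.65 - 2.40}{2.65 - 1.00} \approx 0.15,$$

i.e., about 15 % porosity.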

From sonic log, porosity has been expressed as (Wyllie et al. 1956):

$$\phi_{\text{s}} = \frac{{\Delta t - \Delta t_{\text{ma}} }}{{\Delta t_{\text{f}} - \Delta t_{\text{ma}} }}$$
(3)

where ϕ_s = sonic-derived porosity; Δt = transit time; Δt_f = fluid transit time; and Δt_ma = transit time for the rock matrix.

In a similar manner, a number of equations have been derived for the estimation of permeability from laboratory-measured properties. Among the popular ones is Darcy’s equation (Shang et al. 2003):

$$k = \frac{q\,\mu\,L}{A\,\Delta P}$$
(4)

where k = permeability (Darcy); q = flow rate (cc/s); μ = viscosity (cp); L = length (cm); A = cross-sectional area (cm²); and ΔP = pressure difference (atm).
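
For instance, under the hypothetical laboratory conditions of q = 2 cc/s, μ = 1 cp, L = 3 cm, A = 2 cm², and ΔP = 3 atm (illustrative values only), Eq. (4) gives:

$$k = \frac{2 \times 1 \times 3}{2 \times 3} = 1\ {\text{Darcy}}.$$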

Another popular equation derived for the calculation of permeability from other properties is the Kozeny–Carman equation (Kozeny 1927; Carman 1937) expressed as:

$$k = \frac{{\phi^{3} }}{{F_{\text{s}} \tau^{2} A^{2}_{\text{g}} (1 - \phi )^{2} }}$$
(5)

where k = permeability (µm²); ϕ = porosity (a fraction); F_s = shape factor; τ = tortuosity; and A_g = surface area per unit grain volume (µm⁻¹). The term F_sτ² is called the Kozeny constant.

From Eq. (5), several extensions were proposed, including those of Wyllie and Rose (1950), Timur (1968), Coates and Denoo (1981), and Amaefule et al. (1993). It has, however, been argued that all these equations reduce to linear terms, and further that natural phenomena such as porosity, water saturation, and permeability cannot be adequately estimated by linear relations. With relevant log measurements representing the dynamics of the subsurface, CI techniques have proved capable of using the available log–core pairs to predict the missing reservoir properties of the uncored but logged sections of the reservoir. They achieve this by establishing nonlinear relations between the log measurements and the core values. CI techniques have also been reported to outperform statistical regression tools (Mohaghegh 2000; Goda et al. 2003; Osman and Al-Marhoun 2005; Zahedi et al. 2009; Al-Marhoun et al. 2012).

As successful as the CI techniques have been, it has been shown that each technique has limitations and challenges that make its application undesirable under certain conditions, such as small, sparse, limited, or missing data scenarios (Chang 2008; Helmy et al. 2010; Anifowose and Abdulraheem 2010a) and conditions of model complexity and high data dimensionality (Mendel 2003; Van et al. 2011). The “no free lunch” theorem (Wolpert and Macready 1997) also holds true, as no single CI technique can be considered the best for all problems under all data and computing conditions. Since each technique has limitations and challenges associated with its strengths, there have been a few research attempts in the area of hybrid intelligent systems (HIS) (Chen and Zhao 2008; Helmy et al. 2010, 2012) aimed at achieving better generalization than individual CI techniques. This is the focus of this review.

The petroleum reservoir characterization process requires such a high degree of prediction accuracy that any deviation from expectation may result in huge losses in terms of wasted man-hours and investment. Conversely, even a small improvement in prediction accuracy will have a multiplicative effect on current exploration and production activities. Present prediction accuracies have remained acceptable in the petroleum industry, but there is always the quest for better and more reliable results. In view of this, there is a need for the hybridization of techniques whose strengths can complement the performance of other techniques, yielding higher prediction accuracy, reduced prediction error, and faster execution.

The major motivations for this review paper are:

  • The continued discovery of various CI techniques with common denominators that are suitable for hybridization.

  • The need to extract the relevant input parameters from the deluge of data currently being experienced with the advent of sophisticated data acquisition tools such as logging while drilling (LWD), sensing while drilling (SWD), and measurement while drilling (MWD) in petroleum exploration.

  • The consistent quest for better techniques in the prediction of petroleum reservoir properties for improved production of energy.

Hybrid intelligent systems

Hybrid computational intelligence is the branch of machine learning that studies the combination of two or more CI techniques working cooperatively as a single functional entity for better performance (Tsakonas and Dounias 2002; Guan et al. 2003). This process of combining the strengths of multiple CI techniques to overcome the effects of their weaknesses has become popular in recent times, especially in fields outside the oil and gas industry. The increased popularity of these systems lies in their extensive success in many real-world complex problems. A key prerequisite for the merging of technologies in the spirit of hybridization is the existence of a “common denominator” to build upon. This includes the inference procedures and excellent predictive capabilities deployed by the techniques.

A single overall technique that comes out of this approach of combining two or more existing techniques is called a hybrid system (Chandra and Yao 2006; Khashei et al. 2011). It combines different theoretical backgrounds and algorithms, such as data mining and soft computing methodologies. Hence, hybridization of CI techniques can boost their individual performance and enable them to deal with large-scale, complex problems more successfully than their individual components. The hybrid concept is rooted in the biological evolution of new traits based on the combination of desired traits in individual species. Hybrid methodology can come in different flavors: feature selection, cooperative architecture, or optimization. The focus of this review is the feature selection-based HIS. This kind of cooperative network has been shown to perform excellently in the few attempts in petroleum engineering as well as in most other applications. The reasons for focusing on the feature selection methodology are:

  • The petroleum industry is an application domain and not a core domain of computer science. This precludes the cooperative architecture methodology.

  • The feature selection-based hybrid methodology will benefit the petroleum industry in handling its high-dimensional, multimodal, and multi-scale streams of data. This will help to avoid or reduce the curse of dimensionality (Trunk 1979) in modeling its reservoir properties.

A general framework for hybrid modeling is shown in Fig. 1. The chart shows how each technique in the hybrid model contributes its respective part to solving the overall problem. This synergistic, cooperative effort combines the strengths of the individual techniques to solve a problem while suppressing their respective weaknesses.

Fig. 1

Generalized framework of hybrid intelligent systems

The following section reviews the petroleum reservoir characterization process and the benefits it has derived from the application of HIS.

Hybrid intelligent systems in petroleum reservoir characterization

Evolution of the petroleum reservoir characterization process

The logging and coring processes discussed in the “Introduction” section partly constitute the overall important task called petroleum reservoir characterization (Mohaghegh 2000). This is a process for quantitatively describing various reservoir properties and their spatial variability using available field and laboratory data. It is the process of building reservoir models, usually between the discovery phase of a reservoir and its management phase, by incorporating characteristics that have to do with its ability to store and produce hydrocarbons. It involves modeling the behavior of rocks under various circumstances with respect to the presence of oil and gas as well as their ability to flow through the medium. The ultimate aim of petroleum reservoir characterization is to determine the properties of the reservoir structure in order to identify the production techniques that will optimize production. Reservoir characterization plays a crucial role in modern reservoir management: it supports sound reservoir decisions and improves the asset value of oil and gas companies. It maximizes the integration of multidisciplinary data and knowledge and hence improves the reliability of reservoir predictions. The ultimate goal is a reservoir model with realistic tolerance for imprecision and uncertainty (Helmy et al. 2010).

The characterization process focuses on modeling each reservoir unit, predicting well behavior, understanding past reservoir performance, and forecasting future reservoir conditions. It integrates the technical disciplines of petrophysics, geophysics, geology, reservoir engineering, production engineering, petroleum economics, and data processing and management (Aminian and Ameri 2005; Wong et al. 2005). The evolution of the petroleum reservoir characterization process, from the direct measurement of various reservoir properties from core samples, through empirical equations and multivariate regression, to the use of CI techniques, has been discussed in the “Introduction” section. A major drawback of the empirical equations and correlations is that they have to be derived all over again from scratch when applied to new datasets, which is laborious and time-consuming. They are static and difficult to re-calibrate for new cases. While neither type of model can be generalized to new applications outside its design data coverage, CI techniques have the advantage of being dynamic: they are easily retrained with new datasets and can readily be adapted to new applications.

HIS have not been adequately utilized in the petroleum industry; awareness of them is only just growing. Existing hybrid models in the literature are mostly limited to genetic algorithms (GAs) with neuro-fuzzy systems and artificial neural networks coupled with fuzzy logic. One of the earliest works on HIS application in petroleum reservoir characterization is Jong-Se (2005), who used a fuzzy curve analysis based on fuzzy logic to select the well logs best related to core porosity and permeability data as input attributes to a neural network model. Another work is Xie et al. (2005), who developed a hybrid GA and fuzzy/neural inference system methodology that provides permeability estimates for all types of rocks in order to determine the volumetric estimate of permeability. Their proposed hybrid system consisted of three modules: one that classifies the lithology and categorizes the reservoir interval into user-defined lithology types, a second that uses a GA to optimize the permeability profile prediction, and a third that uses neuro-fuzzy inference systems to form a relationship for each permeability profile and lithology.

More recently, Zarei et al. (2008) used a hybrid GA and neuro-fuzzy model to determine optimal well locations within the minimum possible duration, using the net present value as the objective function. Another interesting hybrid algorithm proposed in the literature is that of Al-Anazi et al. (2009). With the objective of overcoming the poor performance of conventional techniques, such as empirical, linear, and multi-linear regression methods, in the estimation of petroleum reservoir properties, they presented a two-stage fuzzy ranking algorithm integrated with a fuzzy predictive model to improve generalization capability. They used fuzzy curve and surface analysis to identify information-rich well logs and filter out data dependencies. The results showed that the hybrid model performed better than the conventional methods and offered an effective dynamic system that can be continuously conditioned as new data become available.

In the spirit of pseudo-hybridization, Shahvar et al. (2009) used fuzzy logic to predict the flow zone index (FZI) of reservoir rocks from wireline logs, which then served as part of the input to an ANN model for the estimation of permeability. They reported that, with the successful prediction of FZI, the permeability estimates were highly satisfactory and more robust than those of conventional multi-linear regression. A fusion of GA and ANN was used to estimate reservoir permeability by Mohsen et al. (2007), with the GA automatically optimizing the tuning parameters and the ANN establishing a relationship between the log data and core permeability. A hybrid fuzzy-GA system for the optimization of gas production operations was proposed by Park et al. (2010a). They used traditional fuzzy logic to accommodate uncertainties in the field data and a GA as the primary optimization scheme to determine the optimum gas production rates of each well and the pipeline segment diameters that minimize investment costs. The classical, conventional hybrid technique used in the modeling of oil and gas reservoir properties is the adaptive neuro-fuzzy inference system (ANFIS), which combines the functionalities of the ANN and fuzzy logic techniques. This hybrid technique featured in the study of Ali et al. (2008), where it was used to predict the permeability of tight gas sands using a combination of core and log data.

In some of these proposed hybrid models, fuzzy logic was used to select the well logs best related to core porosity and permeability data, while the ANN component was used as a nonlinear regression method to develop a transformation between the selected well logs and core measurements. The GAs were used in some of the studies to optimize the tuning parameters of the CI technique and, in others, to select the dominant variables from the original well logs. The GA, ANN, and fuzzy logic algorithms, as implemented in those studies, have limitations that hamper their suitability for such roles. Based on experience gathered in previous studies and a critical review of the existing literature, it could be argued that the combination of evolutionary algorithms (such as GA, particle swarm optimization, ant colony optimization, and bee colony optimization), fuzzy logic, and ANN in the aforementioned hybrid models could be limited for the following reasons:

  • Though GA and the other evolutionary algorithms are very robust optimization algorithms, they are based on exhaustive search paradigms and are therefore well known for their long execution times, their need for high processing power due to their computational complexity, and their occasional inefficiency when they get caught in local optima (Bies et al. 2006).

  • Fuzzy logic, especially the type 2 fuzzy logic system, has been reported to become computationally intensive (Abe 2004), hence requiring more execution time when applied to high-dimensional data (Mendel 2003), and to perform poorly when applied to datasets of small size (Mendel 2003; Pratama et al. 2012).

  • ANN is also known to suffer from many deficiencies (Petrus et al. 1995; Rusu and Rusu 2006): there is no general framework for designing the appropriate network for a specific task, the number of hidden layers and hidden neurons of the network architecture is determined by trial and error, a large number of parameters is required to fit a good network structure, and pre-defined activation functions are used without considering the properties of the phenomena being modeled. However, recent studies have proposed ways to overcome some of these problems, especially an optimization-based workflow that determines an optimal topology within a suitable error margin (Enab and Ertekin 2014; Enyioha et al. 2014).

In view of the above, combining GA, ANN, and fuzzy logic in a hybrid model could simply compound their individual limitations despite their reported good individual performances (Xie et al. 2005; Mohsen et al. 2007; Ali et al. 2008; Zarei et al. 2008; Al-Anazi et al. 2009; Shahvar et al. 2009; Park et al. 2010a). Various studies that addressed the reported problems of ANN through the development of other algorithms, such as cascade correlation and radial basis function networks, did not improve its overall performance (Bruen and Yang 2005). It has not been shown in the literature that the fuzzy logic and GA components of hybrid models effectively neutralize the limitations of ANN. These deficiencies of ANN are part of the justification for looking toward HIS, in terms of performance and robustness, for the prediction of reservoir properties. In addition, there was the need to apply lightweight feature selection-based algorithms to extract the dominant input parameters rather than using complex algorithms to tune the parameters of the CI techniques.

Some of the early attempts at feature selection-based HIS in reservoir characterization include Helmy et al. (2010), Helmy and Anifowose (2010), and Anifowose and Abdulraheem (2010a). They combined the capabilities of functional networks (FNs), the type 2 fuzzy logic system (T2FLS), and support vector machines (SVMs) for the prediction of porosity and permeability of some Middle East carbonate reservoirs. The FN was used to reduce the dimensionality of the input data (well logs) for more efficient training of the T2FLS. The T2FLS component was used to handle the uncertainties in the input data before submitting the fuzzified output to the SVM component for prediction. The attempt was deemed excellent. However, it was felt that the combination of three techniques in a single hybrid model made it cumbersome and complex. It was also not clear how each component contributed to the overall improvement of the hybrid models, due to the seeming redundancy of the hybrid components. As a result, simpler and lightweight hybrid models consisting of no more than two components were proposed.

Consequently, in their later attempts (Anifowose and Abdulraheem 2011; Anifowose et al. 2013, 2014b), the same authors focused on simpler methodologies for combining the hybrid models, following Occam’s Razor principle of simplicity (Jefferys and Berger 1991). Due to the simplicity of the newly proposed design, the contribution of each component became clear. The CI techniques explored for the reduced-component hybrid models were taken from those with promising capabilities as reported in the literature. They include FN (Bruen and Yang 2005; Castillo et al. 2000, 2001; El-Sebakhy et al. 2007; El-Sebakhy 2009), T2FLS (Olatunji et al. 2011), SVM (El-Sebakhy et al. 2007; El-Sebakhy 2009; Abedi et al. 2012; Al-Anazi and Gates 2012; Sewell 2008; Vapnik 2000), decision trees (DTs) (Bray and Kristensson 2010), and extreme learning machines (ELMs) (Heeswijk et al. 2009; Avci and Coteli 2012). The proposed hybrid models include FN-SVM, DT-SVM, fuzzy ranking-SVM, FN-T2FLS, and FN-ELM. Kaydani et al. (2011) proposed a hybrid neural genetic algorithm to predict permeability from well log data in one of the Iranian heterogeneous oil reservoirs. Their approach was based on reservoir zonation according to geological characteristics and sorting the data in the same manner.

These studies focused on investigating the capability of the feature selection process to further improve the performance of SVM, T2FLS, and ELM. Since T2FLS was reported to perform poorly with small datasets (Helmy et al. 2010) and to take long to execute on datasets of high dimensionality (Karnik and Mendel 1999; Mendel 2003), the studies investigated the possibility of improving its performance with the assistance of a feature selection process. ELM was proposed (Huang et al. 2004) as an effort to overcome some of the shortcomings of ANN (Petrus et al. 1995), and it has been reported to perform well in other fields such as bioinformatics (Huang et al. 2006).

As for the choice of FN, DT, and fuzzy ranking, the authors identified them as good candidates for feature selection through a comprehensive literature search. They have also been considered lightweight, hence possible alternatives to heavyweight evolutionary algorithms such as GA. The results of the studies confirmed that the feature selection process contributed significantly to the improvement of the SVM, T2FLS, and ELM techniques. Interested readers can find more details of all the individual CI techniques mentioned so far in the computer science literature and in the literature on CI applications in petroleum engineering and geosciences. The applications of HIS so far in petroleum reservoir characterization are summarized in Table 1.

Table 1 Summary of feature selection-based hybrid systems in reservoir characterization

To close this section, it is pertinent to discuss a comparison of the CI-based methods and the conventional geostatistical methods: kriging and co-kriging.

Comparison of computational intelligence and geostatistics

Kriging and co-kriging are geostatistical techniques used to interpolate or extrapolate the value of a random field at an unobserved location from observations of its value at nearby locations. Interpolation or extrapolation is the estimation of a variable at an unmeasured location from observed values at surrounding locations. Both methods are generalized forms of univariate and multivariate linear regression models, for estimation at a point, over an area, or within a volume. Similar to other interpolation methods, they are linear-weighted averaging methods that not only assign weights according to functions that give a decreasing weight with increasing separation distance but also capitalize on the direction and orientation of the neighboring data to the unsampled location (Bohling 2005).

Kriging is defined as:

$$Z^{*} \left( u \right) - m\left( u \right) = \mathop \sum \limits_{a = 1}^{n\left( u \right)} w_{a} \left[ {Z\left( {u_{a} } \right) - m\left( {u_{a} } \right)} \right]$$
(6)

where Z*(u) is the linear regression estimator, u and u_a are the location vectors for the estimation point and one of the neighboring data points, indexed by a, n(u) is the number of data points in the local neighborhood used for the estimation of Z*(u), m(u) and m(u_a) are the expected values (means) of Z(u) and Z(u_a), and w_a(u) are the kriging weights assigned to the data Z(u_a) for estimation location u. The same data points will receive different weights for different estimation locations.

Z(u) is treated as a random field with a trend component, m(u), and a residual component, R(u) = Z(u) − m(u). Kriging estimates the residual at u as a weighted sum of the residuals at the surrounding data points. The kriging weights w_a are derived from the covariance function or semivariogram, which should characterize the residual component (Myers 1984; Switzer 2006).
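
A minimal sketch of this estimator is given below, assuming a known stationary mean (i.e., simple kriging), a hypothetical exponential covariance model, and one-dimensional sample locations; it is illustrative rather than a full geostatistical workflow.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=10.0):
    """Hypothetical exponential covariance model C(h) = sill * exp(-|h| / range)."""
    return sill * np.exp(-np.abs(h) / corr_range)

def simple_kriging(x_data, z_data, x0, mean):
    """Simple kriging estimate at x0: the known mean plus a weighted
    sum of the residuals at the neighboring data points."""
    C = exp_cov(x_data[:, None] - x_data[None, :])   # data-to-data covariances
    c0 = exp_cov(x_data - x0)                        # data-to-target covariances
    w = np.linalg.solve(C, c0)                       # kriging weights w_a
    return mean + w @ (z_data - mean)

# Hypothetical porosity observations along a 1-D well section (locations in m)
x_data = np.array([0.0, 5.0, 12.0, 20.0])
z_data = np.array([0.18, 0.21, 0.15, 0.19])
print(simple_kriging(x_data, z_data, x0=8.0, mean=0.18))
```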

The major difference between CI and geostatistics is that the former considers the nonlinear relationship between predictor and target variables, while the latter is linear. Due to limited computing power and the quest for affordable solutions, the reservoir characterization problem has historically been assumed to be linear, contrary to reality. This has been the major reason for the better performance of CI-based models over linear estimators. In addition, the learning capability of CI techniques makes them more adaptive to new datasets and increases their generalization capability.

Making a case for the feature selection-based hybrid methodology

The reason for focusing on the feature selection-based hybrid methodology in this review was discussed in the “Hybrid intelligent systems” section. In particular, the deluge of data in the oil and gas industry, acquired from modern data acquisition tools such as LWD, SWD, and MWD, has made the feature selection process a necessary step to ensure that only those attributes most relevant to the prediction of the targeted reservoir properties are used. The feature selection-based hybrid methodology is an ideal candidate, in terms of efficiency and accuracy, for extracting useful knowledge from such high-dimensional data. This is based on the reported successful applications in other fields (Peng and Wang 2009; Helmy et al. 2012) as well as the few successful attempts (Helmy et al. 2010; Kaydani et al. 2011) in petroleum reservoir characterization. Using input parameters that are not relevant to the target variables could degrade the performance and increase the time complexity of a model. Hence, the feature selection process provides three major benefits:

  • Reducing the dimensionality of the input data.

  • Extracting the most relevant of the attributes for best prediction performance.

  • Assisting the attainment of the optimality of a model.

These three benefits are highly desirable for keeping prediction models simple, following the principle of Occam’s Razor (Jefferys and Berger 1991), as well as for ensuring the optimum accuracy of reservoir property predictions. Petroleum and geoscience professionals have always been in search of models that offer increased accuracy in the prediction of the various petroleum reservoir properties, since even a marginal increase in prediction accuracy will further increase the efficiency of the exploration, production, and exploitation of hydrocarbon resources.

Although CI techniques have been successfully applied in the petroleum industry, the feature selection procedure has typically been carried out with statistical packages such as SPSS, which are based on the linear relationships among the predictor variables. The same argument made against the use of multivariate regression tools for predicting reservoir properties applies here: the features are highly nonlinear attributes and cannot be adequately modeled with such linear relationships. Another problem with the statistical tools is that they are completely offline; consequently, the results obtained from them have to be manually presented to the CI techniques for further processing. This creates a time lapse in addition to the inadequate relationships established among the predictor variables. Studies have shown that CI techniques are better than most statistical procedures and packages (Sfidari et al. 2012).

With the reports of the successful application of HIS (Jong-Se 2005; Lean et al. 2006; Evaggelos et al. 2006; Bullinaria and Li 2007; Jin et al. 2007; Mendoza et al. 2007; Anifowose and Abdulraheem 2010a; Helmy et al. 2012), it becomes clear that the petroleum industry is in dire need of this new modeling approach especially in the petroleum reservoir characterization workflow. The conceptual framework of the feature selection-based hybrid learning concept is presented in Fig. 2.

Fig. 2

Conceptual framework of a feature selection-based hybrid learning paradigm

In Fig. 2, an n-dimensional dataset is passed through a feature selection algorithm. The output is an m-dimensional subset of the dataset, where m ≤ n. Following the supervised machine learning paradigm, the subset is divided into training and testing parts. The training part, containing known input and output cases, is used to train a CI technique. The testing part is then passed to the trained model to predict the unknown target output. More complex configurations than this are also possible.
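
A minimal sketch of this workflow is given below, assuming scikit-learn with a mutual-information filter as a stand-in for the feature selection component and an SVM regressor as the CI component; the data, variable names, and choice of selector are hypothetical illustrations only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.svm import SVR
from sklearn.metrics import r2_score

# Hypothetical dataset: 200 samples of 8 well-log attributes (the n-dimensional input)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                                     # e.g. GR, RHOB, NPHI, DT, ...
y = 0.3 * X[:, 1] - 0.2 * X[:, 3] + 0.05 * rng.normal(size=200)   # core porosity proxy

# Stage 1: feature selection reduces the n attributes to m <= n
selector = SelectKBest(mutual_info_regression, k=4)
X_sel = selector.fit_transform(X, y)

# Stage 2: supervised learning (here an SVM regressor) on the reduced subset
X_train, X_test, y_train, y_test = train_test_split(X_sel, y, random_state=0)
model = SVR(kernel="rbf").fit(X_train, y_train)
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
```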

The next section discusses briefly some of the methods used for feature selection in the literature.

Feature selection algorithms used in the literature

An extensive search in the literature revealed only three algorithms that have been used for feature selection: functional networks, decision trees, and fuzzy information entropy (FIE) (also known as fuzzy ranking). Each of them is discussed in the following sections.

Functional networks

FN (Castillo 1998) is an extension of the ANN, which consists of different layers of neurons connected by links. In an ANN, each computing unit or neuron performs a simple calculation: a scalar, typically monotone, function f of a weighted sum of inputs. The function f associated with the neurons is fixed, and the weights are learned from data using well-known algorithms such as least-squares fitting. The main idea of FN is to allow the f functions themselves to be learned while suppressing the weights. In addition, the f functions are allowed to be multi-dimensional, though they can be equivalently replaced by functions of single variables. When there are several links, say m, going from the last layer of neurons to a given output unit, the value of this output unit can be written in several different forms (one per link). This leads to a system of m − 1 functional equations, which can be written directly from the topology of the network. Solving this system leads to a great simplification of the initial functions associated with the neurons. An example of this is shown in Fig. 3.

Fig. 3

a Structure of FN b its simplification (Castillo et al. 2001)

FN has been mathematically defined and generalized as follows:

  • If we assume a neuron with s inputs (x_1, …, x_s) and k outputs (y_1, …, y_k), then there exist k functions F_j, j = 1, …, k, such that y_j = F_j(x_1, …, x_s), j = 1, …, k.

FN also consists of a set of directed links that connect the input layer to the first layer of neurons, neurons of one layer to neurons of the next layer, and the last layer of neurons to the output units. Connections are represented by arrows, indicating the direction of information flow. FN has also featured in a number of research studies. A comprehensive demonstration of the application of FN in statistics and engineering is given by Castillo et al. (2000, 2001).

For effective learning of an FN, model selection is needed to choose the best FN model, using the minimum description length (MDL) principle (Castillo 1998). This measure allows comparisons not only of the quality of different approximations, but also of different FN models. It can also be used to compare models with different numbers of parameters, because it has a penalty term for overfitting. Accordingly, the best FN model for a given problem is the one with the smallest MDL value. This is calculated using the backward–forward method with the least-squares criterion to determine the least error attainable during the training process. The algorithm learns from the input data directly, by minimizing the sum of squared errors, in order to obtain the parameters, namely the number of neurons and the type of kernel functions needed for the fitting process. It works by building an initial model of all possible functional equations, simplifying the model, and selecting the best parameters for the simplified model. A detailed description of this method can be found in Castillo (1998).
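
As a rough illustration only (not the exact FN learning procedure of Castillo 1998), the sketch below scores hypothetical candidate basis-function models fitted by least squares using a BIC-like two-part approximation of the description length; the model with the smallest score is preferred.

```python
import numpy as np

def mdl_score(y, y_hat, n_params):
    """BIC-like approximation of the description length:
    (n/2)*ln(RSS/n) + (n_params/2)*ln(n). Smaller is better."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return 0.5 * n * np.log(rss / n) + 0.5 * n_params * np.log(n)

def fit_basis(X_basis, y):
    """Least-squares fit of a candidate model y ~ X_basis @ beta."""
    beta, *_ = np.linalg.lstsq(X_basis, y, rcond=None)
    return X_basis @ beta, X_basis.shape[1]

# Hypothetical single-predictor example with candidate polynomial basis families
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 150)
y = 1.5 * x + 0.3 * np.sin(6 * x) + 0.05 * rng.normal(size=150)

candidates = {
    "linear":    np.column_stack([np.ones_like(x), x]),
    "quadratic": np.column_stack([np.ones_like(x), x, x ** 2]),
    "cubic":     np.column_stack([np.ones_like(x), x, x ** 2, x ** 3]),
}
scores = {name: mdl_score(y, *fit_basis(B, y)) for name, B in candidates.items()}
print("Preferred model:", min(scores, key=scores.get), scores)
```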

Decision trees

DT learning, as a data mining and machine learning technique, uses a decision tree as a predictive model that maps observations about a problem to conclusions about the problem’s target value (Breiman 1984). DT models are also known as classification trees or regression trees. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. A tree can be “learned” by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when all records in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions (Yohannes and Webb 1999).

In data mining, trees can also be described as the combination of mathematical and computational techniques that aid the description, categorization, and generalization of a given set of data. Usually, data come in records of the form:

$$\left( {x,y} \right) = \left( {x_{1} , \, x_{2} , \, x_{3} \ldots , \, x_{k} ,y} \right)$$
(7)

where the dependent variable, y, is the target variable that we are trying to understand, classify, or generalize. The vector x is composed of the input variables, x_1, x_2, x_3, etc., that are used for that task.

Some of the criteria used for constructing decision trees include the Gini impurity and information gain (Moore 2015). The Gini impurity is based on the squared probabilities of membership for each target category in the node. It reaches its minimum (zero) when all cases in the node fall into a single target category. Mathematically, it is expressed as follows:

Suppose y takes on values in {1, 2,…, m}, and let f (i, j) = frequency of value j in node i. That is, f (i, j) is the proportion of records assigned to node i for which y = j as presented in the equation:

$$I_{\text{G}} \left( i \right) = 1 - \mathop \sum \limits_{j = 1}^{m} f\left( {i,j} \right)^{2} = \mathop \sum \limits_{j \ne k} f\left( {i,j} \right)f\left( {i,k} \right)$$
(8)

The information gain is based on the concept of entropy used in information theory as expressed in the equation:

$$I_{\text{E}} \left( i \right) = - \mathop \sum \limits_{j = 1}^{m} f\left( {i,j} \right)\log_{2} f\left( {i,j} \right)$$
(9)
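
As an illustration, both node-impurity measures above can be computed directly from the class proportions of a node; a minimal sketch with hypothetical class frequencies (e.g., three lithofacies classes) is:

```python
import numpy as np

def gini_impurity(freq):
    """I_G(i) = 1 - sum_j f(i, j)^2 for the class proportions f of node i."""
    f = np.asarray(freq, dtype=float)
    f = f / f.sum()
    return 1.0 - np.sum(f ** 2)

def entropy(freq):
    """I_E(i) = -sum_j f(i, j) * log2 f(i, j), ignoring empty classes."""
    f = np.asarray(freq, dtype=float)
    f = f / f.sum()
    f = f[f > 0]
    return -np.sum(f * np.log2(f))

# Hypothetical node with 40, 35 and 25 records in three lithofacies classes
print(gini_impurity([40, 35, 25]))   # ~0.655
print(entropy([40, 35, 25]))         # ~1.56 bits
```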

DTs are simple to understand and interpret. They require little data preparation and are able to handle both numerical and categorical data. They use a white-box model, so that if a given situation is observable in a model, the explanation for the condition is easily given by Boolean logic. It is possible to validate a model using statistical tests, making it possible to account for the reliability of the model. They are robust and perform well with large data in a short time (Yohannes and Webb 1999). However, the problem of learning an optimal DT is known to be NP-complete (Breiman 1984; Yohannes and Webb 1999). Consequently, practical DT algorithms are based on heuristics such as the greedy algorithm, where locally optimal decisions are made at each node. Such algorithms cannot guarantee to return the globally optimal decision tree, similar to ANN. Also, DT learners can create over-complex trees that do not generalize the data well, referred to as overfitting, thereby requiring additional mechanisms such as pruning to avoid this problem. This additional mechanism increases the complexity of implementation. Similar to the traditional ANN during its inception, there are some concepts, such as the XOR, parity, or multiplexer problems, that are hard for decision trees to learn because they cannot be expressed easily. In such cases, the decision tree becomes prohibitively large (Sherrod 2008).

Determining the relative importance of a feature is one of the basic tasks during the generation of a DT model. If a dataset is subdivided using the values of an attribute as separators, a number of subsets will be obtained. For each of these subsets, the information value I_i can be computed such that I_i < I, and the difference (I − I_i) is a measure of how well the parameter has discriminated between the different subsets. The parameter that maximizes this difference is then selected. The measure can also be viewed as a class separability measure. However, it suffers the drawback that it may choose parameters with very low information content (Lopez de Mantaras 1991). An example of a decision tree is shown in Fig. 4.
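
A minimal sketch of this attribute-ranking idea, using the entropy measure above as the information value and hypothetical discretized well-log attributes, is given below; it is illustrative only and not the procedure of any particular cited study.

```python
import numpy as np

def entropy_of_labels(labels):
    """Information value I of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute, labels):
    """I - I_i: reduction in information value after splitting on an attribute."""
    parent = entropy_of_labels(labels)
    child = 0.0
    for value in np.unique(attribute):
        mask = attribute == value
        child += mask.mean() * entropy_of_labels(labels[mask])
    return parent - child

# Hypothetical discretized well-log attributes and lithofacies labels
labels = np.array(["sand", "sand", "shale", "shale", "sand", "shale"])
gr_class = np.array(["low", "low", "high", "high", "low", "high"])   # discriminates well
cal_class = np.array(["a", "b", "a", "b", "a", "b"])                 # discriminates poorly
print(information_gain(gr_class, labels), information_gain(cal_class, labels))
```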

Fig. 4

Example of a decision tree (Sherrod 2008)

Fuzzy information entropy

Fachao et al. (2008) described FIE in the following manner: let U = {x_1, x_2, …, x_n} be a non-empty universe, let R be a fuzzy equivalence relation on U, and let [x_i]_R be the fuzzy equivalence class containing x_i generated by R; it follows that:

$$|\left[ {x_{\text{i}} } \right]_{R} | = \mathop \sum \limits_{j = 1}^{n} R\left( {x_{\text{i}} ,x_{j} } \right)$$
(10)

This is called the cardinality of [x_i]_R. The associated entropy measure is then given by:

$$H\left( R \right) = - \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \log_{2} \frac{{|\left[ {x_{\text{i}} } \right]_{R} |}}{n}$$
(11)

This relation H is called the information entropy of R. It is used to extract the input parameters (such as well logs) that have strong fuzzy relations (i.e., large cardinalities) with the target variables (such as porosity and permeability). However, the main drawback of the information entropy algorithm is its sensitivity to the dimensionality of the input data (i.e., the number of attributes) (White and Liu 1994). The pseudo-code for the implementation of this algorithm is given as:

  • Step 1. Input the information system IS = ⟨U, A, V, f⟩;

  • Step 2. For each A_i ∈ A: compute the fuzzy equivalence matrix and choose an appropriate fuzzy information filter operator F to filter the fuzzy equivalence matrix;

  • Step 3. red = ∅;

  • Step 4. For each A_i ∈ A − red, compute the significance of attribute A_i in the attribute set A_i ∪ red: sig(A_i, A_i ∪ red) = H(A_i ∪ red) − H(A_i);

  • Step 5. Choose the attribute A_x which satisfies G(A_x) = max_i [sig(A_i, A_i ∪ red)];

  • Step 6. If G(A_x) > 0, then red = red ∪ {A_x} and go to Step 4; else go to Step 7;

  • Step 7. Output the reduct of IS.
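
The significance in Step 4 can be read as the incremental entropy gained by adding an attribute to the current reduct; a minimal Python sketch under that working interpretation is given below. The similarity-based relation, the scaling, and the well-log example are hypothetical illustrations rather than the construction of Fachao et al. (2008), and the filter operator F of Step 2 is omitted for brevity.

```python
import numpy as np

def fuzzy_relation(column):
    """A simple similarity-based fuzzy equivalence relation for one attribute:
    R(x_i, x_j) = 1 - |x_i - x_j| after scaling the attribute to [0, 1]."""
    c = (column - column.min()) / (column.max() - column.min() + 1e-12)
    return 1.0 - np.abs(c[:, None] - c[None, :])

def fuzzy_entropy(relations):
    """H(R) for the intersection (element-wise minimum) of the given relations,
    using the cardinality |[x_i]_R| = sum_j R(x_i, x_j) defined above."""
    R = np.minimum.reduce(relations)
    card = R.sum(axis=1)
    n = R.shape[0]
    return -np.mean(np.log2(card / n))

def fie_reduct(X, tol=1e-9):
    """Greedy attribute reduction: repeatedly add the attribute whose inclusion
    increases the fuzzy information entropy the most (a working interpretation
    of Steps 3-6 of the pseudo-code above)."""
    rels = [fuzzy_relation(X[:, a]) for a in range(X.shape[1])]
    red, h_red = [], 0.0
    while len(red) < len(rels):
        gains = {a: fuzzy_entropy([rels[i] for i in red] + [rels[a]]) - h_red
                 for a in range(len(rels)) if a not in red}
        best = max(gains, key=gains.get)
        if gains[best] <= tol:          # no remaining attribute adds information
            break
        red.append(best)
        h_red += gains[best]
    return red

# Hypothetical well-log matrix: 50 samples x 4 attributes (e.g. GR, RHOB, NPHI, DT)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 3] = X[:, 0]                       # attribute 3 duplicates attribute 0
print("Selected attribute indices:", fie_reduct(X))
```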

Conclusion and the future of HIS in petroleum reservoir characterization

This review started with an appreciation of the application of CI in petroleum reservoir characterization and examined how it has evolved into the application of hybrid intelligent systems. A case was made for feature selection-based hybrid systems by highlighting the dire need for them in modern data-driven and data-centric reservoir modeling endeavors. With the advent of advanced real-time data acquisition tools such as LWD, MWD, and SWD, coupled with the recent need to integrate all manner of data, from well logs through seismic to nuclear magnetic resonance (NMR), for improved reservoir modeling and predictions, petroleum engineers have had to deal with datasets of increasingly high dimensionality. With this data deluge, there is the need to select the most relevant parameters in order to reduce the matrix dimension of the datasets and increase the predictive performance of the models. We have shown that the hybrid intelligent learning paradigm is a promising way to handle this challenge.

The various limitations of the existing CI techniques were reviewed, and hybrid systems were proposed as robust ways to handle and overcome some of these limitations. It was observed that, although some hybrid techniques have been applied in petroleum reservoir characterization, there is still room for further exploration and investigation. Many hybrid learning possibilities are yet to be discovered. Some of the feature selection algorithms that remain unexplored include the various polynomial degrees of the following:

  • Sequential forward selection (Jain and Zongker 1997; Sun and Yao 2006).

  • Sequential backward elimination (Jain and Zongker 1997).

  • Adaptive forward selection (Somol et al. 1999).

  • Adaptive backward selection (Somol et al. 1999).

  • Forward selection with backward elimination (bidirectional) (Mao 2003).

Each of these can be combined with any of the CI techniques, offering a vast number of possibilities for new hybrid models. With today’s powerful computing resources, algorithmic complexity and memory intensity may no longer be an issue of concern. Hence, yet-to-be-explored optimization-based hybrid systems can also be investigated. These include the combination of any of the existing CI techniques with new and state-of-the-art evolutionary algorithms such as bee colony, ant colony, the bat algorithm, cuckoo search, and particle swarm optimization. The evolutionary algorithms would be used to optimize the learnability of the CI techniques. It is believed that this will offer numerous advantages over manual optimization methods.