Introduction

Permeability prediction is one of the most important tasks in oil and gas reservoir evaluation. Laboratory analysis of cores obtained during drilling provides the most reliable permeability values, but because of its cost and procedural complexity this method cannot be applied across large areas. Conventional permeability prediction relies on multivariate statistical regression, most commonly a regression formula between porosity and permeability. However, laboratory core results show that permeability predicted by regression deviates substantially from core permeability, especially in low porosity–permeability reservoirs, and permeability prediction remains a challenge in tight and heterogeneous formations.
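For reference, the conventional porosity–permeability regression mentioned above can be sketched as an ordinary least-squares fit of log10 K against porosity. This is a minimal illustration assuming the common semi-log form log10 K = a + bφ; the function names and demonstration values are hypothetical, not data from this study.

```python
import math


def fit_semilog_perm(phi, k):
    """Least-squares fit of log10(K) = a + b * phi.

    phi: porosity values (%); k: permeability values (any consistent unit).
    Returns the intercept a and the slope b.
    """
    if len(phi) != len(k) or len(phi) < 2:
        raise ValueError("need at least two paired samples")
    y = [math.log10(v) for v in k]  # regress on log permeability
    x_mean = sum(phi) / len(phi)
    y_mean = sum(y) / len(y)
    sxx = sum((x - x_mean) ** 2 for x in phi)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(phi, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def predict_perm(a, b, phi):
    """Back-transform the semi-log model to permeability."""
    return 10 ** (a + b * phi)


# Hypothetical samples lying exactly on log10(K) = -1 + 0.1 * phi
phi_demo = [5.0, 10.0, 15.0, 20.0]
k_demo = [10 ** (-1 + 0.1 * p) for p in phi_demo]
a_fit, b_fit = fit_semilog_perm(phi_demo, k_demo)
```

In low porosity–permeability strata, samples with the same porosity can have both high and low permeability, so a single line of this kind leaves large residuals there; that is the limitation this study addresses.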

Permeability prediction has advanced along with theory and techniques (Geerits et al. 1999; Anifowose et al. 2013a, b, 2014a, b, c; Hassan et al. 2013). Early methods incorporated variables associated with permeability, such as porosity, saturation, and capillary pressure, into a permeability regression model (Coates and Dumanoir 1974; Kozeny 1927; Carman 1937), giving rise to a series of models such as the well-known Coates formula and the Kozeny–Carman formula. Lev Vernik (2000) established an exponential model controlled by porosity and shale content, but it applies only to shallow marine and fluvial–deltaic sediments of weakly diagenetic rocks whose permeability correlates strongly with grain (pore) size. Faruk Civan (2002) incorporated the fractal attributes of connected pore space into a bundle-of-tortuous-leaky-hydraulic-tubes model of porous media to estimate permeability. Logging techniques have also improved. Nuclear magnetic resonance (NMR) logs provide good estimates of pore-space characteristics that improve permeability correlations (Sen et al. 1990; Coates et al. 1991; Quintero et al. 1999; Amabeoku et al. 2001), and array sonic logging uses the frequency shift and time delay of acoustic waves to indicate permeability (Winkler 1989). Baziar et al. (2014) employed a co-active neuro-fuzzy inference system and support vector machines to predict the permeability of the Mesaverde tight gas sandstones in the Washakie Basin, USA. Data-mining and expert-system approaches have also been used to predict permeability in sandstone reservoirs, for example in the KS field (Nashawi and Malallah 2009; Gholami et al. 2012).

Low porosity–permeability reservoirs have their own characteristics. Heterogeneity caused by the differing curvature and shape coefficients of pore throats produces drastic variation between samples, and the disproportionate porosity–permeability correlation means that samples with the same porosity may have both high and low permeability, producing strong scatter in crossplots. Any empirical correlation obtained from such a plot is vulnerable to unrealistic results. All of these factors limit the accuracy and precision of permeability prediction in low porosity–permeability reservoirs. Feature engineering is the key process of discovering features that have a significant impact on the target variable (Qian et al. 2016). In this study, the feature engineering procedure of data-driven analytics receives the most attention in characterizing how permeability is represented. Starting from the main geological effects, a comprehensive process is used to analyze and select the main controlling factors that can be represented as variables, and appropriate analytical methods are then used to construct, integrate, and optimize the model. Given the characteristics of the Mesozoic clastic strata in the Gaoqing area, feature engineering combines reservoir petrophysical analysis with the Student–Newman–Keuls (SNK) method of statistical variance analysis. Optimal scaling analysis is used to quantify the scale of the discrete variable. Finally, a fuzzy logic algorithm, which can tolerate and explain objective contradictions, is adopted so that the calculation can be modeled with its error components. Compared with traditional regression, the fuzzy logic algorithm combined with feature engineering predicts permeability in closer agreement with core data.

Area of study

The Mesozoic buried hill in the Gaoqing area is one of the ancient buried hills in the Jiyang depression, with its Mesozoic strata dipping towards the north. The top of the Mesozoic section was intruded by multi-stage magmatism. The formation was also raised to the surface by sharp fault-related uplift in the south of the study area. Long-term weathering and erosion produced a hiatus between the Mesozoic strata and the overlying Kongdian Formation, forming an angular unconformity (Jiang 1998).

The Mesozoic strata are incomplete. Near the Mesozoic unconformity in the Gaoqing area, the lithology consists of red and purple, gray-purple, and gray mudstone with tuffaceous sandstone or conglomerate, together with multiple layers of mafic volcanic rocks, such as gray-green rock, diorite, and other igneous rocks. The remaining strata are mainly distributed on the northern slope of the Gaoqing field, forming an angular unconformity. The reservoirs are mainly shallow sandstone, siltstone, and basalt gravel with low porosity and low permeability.

Methodology

Data scenario

Core data used in this study were collected from the Mesozoic strata and the overlying unconformity in the Gaoqing area: three cored wells with porosity, permeability, and density measurements, rock descriptions, and microscopic observation of core thin sections, and 21 wells with a complete suite of well logs. Because sample counts were unevenly distributed, the sample set was divided into two subsets of equal size, one for training and one for testing the ability of the trained classifier to assign classes correctly. The nomenclature is as follows:

Nomenclature

GR, gamma ray log

DEN, density log

SP, spontaneous potential log

AC, acoustic (sonic) transit-time log

LLD, deep laterolog

Elec, electrofacies

LLS, shallow laterolog
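The source does not state how the two equal subsets were drawn. One plausible reading of "because sample counts were unevenly distributed" is a stratified split, in which each class is halved separately so both subsets keep similar class proportions. The sketch below assumes that reading, with electrofacies labels as the strata; the function name and labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_half_split(labels, seed=0):
    """Split sample indices into two halves, class by class.

    Each class (e.g., an electrofacies label) is shuffled and divided in
    half, so both subsets keep roughly the same class proportions even
    when class counts are unbalanced. For odd-sized classes the extra
    sample goes to the training subset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = (len(idxs) + 1) // 2
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)


# Hypothetical labels: 6 samples of class 1, 2 of class 4, 4 of class 5
labels_demo = [1, 1, 1, 1, 1, 1, 4, 4, 5, 5, 5, 5]
train_idx, test_idx = stratified_half_split(labels_demo, seed=1)
```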

 

Reservoir properties

The electrofacies of the region were classified with an optimized KNN clustering method (Wang et al. 2018b, c) based on the GR, SP, LLD, LLS, AC, and DEN log responses. The optimized method uses a weighted cosine distance to better fit the electrofacies model. To address initial-center selection and outliers, statistical methods such as box plots were applied. Based on the geological model and the similarity of the logging data, better cluster centers carrying more electrofacies information and a new distance measure were selected for the KNN clustering. This method divides the strata into six electrofacies classes (Table 1).
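The assignment step of a nearest-center classification under a weighted cosine distance can be sketched as follows. This is not the optimized procedure of Wang et al. (2018b, c), whose weights and center-selection rules are not given here: the centers, the weights, and the assumption that each log is first rescaled to 0–1 are hypothetical placeholders, and the vector order [GR, SP, LLD, LLS, AC, DEN] is our choice.

```python
import math


def weighted_cosine_distance(x, c, w):
    """Return 1 - weighted cosine similarity between log vector x and center c."""
    num = sum(wi * xi * ci for wi, xi, ci in zip(w, x, c))
    nx = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    nc = math.sqrt(sum(wi * ci * ci for wi, ci in zip(w, c)))
    if nx == 0 or nc == 0:
        raise ValueError("zero-length vector")
    return 1.0 - num / (nx * nc)


def assign_electrofacies(x, centers, w):
    """Index of the center with the smallest weighted cosine distance to x."""
    dists = [weighted_cosine_distance(x, c, w) for c in centers]
    return min(range(len(centers)), key=dists.__getitem__)


# Hypothetical rescaled log vectors in the order [GR, SP, LLD, LLS, AC, DEN]
centers_demo = [[0.2, 0.3, 0.8, 0.7, 0.3, 0.8],
                [0.8, 0.7, 0.2, 0.3, 0.7, 0.3]]
weights_demo = [1.5, 1.0, 1.0, 1.0, 1.2, 1.2]
sample_demo = [0.25, 0.35, 0.75, 0.7, 0.3, 0.75]
facies_demo = assign_electrofacies(sample_demo, centers_demo, weights_demo)
```

Cosine distance compares the direction of the log-response vector rather than its length, so curves with larger numeric ranges would dominate it unless the logs are rescaled first.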

Table 1 The cluster center and the definition of the electrofacies

The overall distributions of porosity and permeability are shown in Figs. 1 and 2. The two distributions are similar in shape, indicating some correspondence between porosity and permeability. According to the clastic reservoir classification standard in Table 2, porosity has a clearly multimodal distribution: most samples fall in the ultra-low-porosity class, with a mean of about 5%, or in the low-to-medium porosity range of 10–20%, and very few have extra-high porosity. Permeability is concentrated in the ultra-low- and low-permeability classes, ranging from 1 × 10⁻³ μm² to 40 × 10⁻³ μm². Its distribution is clearly bimodal, with modes at 1 × 10⁻³ μm² and 40 × 10⁻³ μm², both within the ultra-low-to-low permeability range. Overall, the region is dominated by low-porosity/low-permeability and low-porosity/ultra-low-permeability reservoirs, with a few medium-porosity/low-permeability reservoirs.

Fig. 1 Overall distribution of porosity (%)

Fig. 2 Overall distribution of permeability (10⁻³ μm²)

Table 2 Criteria for classifying clastic reservoir physical properties

The porosity and permeability distributions by electrofacies (Figs. 3 and 4) also correspond to each other. Elec1 and Elec2 show distinctly ultra-low porosity and ultra-low permeability, with \({\varphi }\le 10\%\) and \(K\le 1\times 10^{-3}\) μm², although one high-permeability point in their distribution requires further examination. Porosity in Elec3 spans a wide range, is shifted slightly to the right, and is mainly low, with fewer ultra-low-porosity and more medium-porosity samples; its permeability lies mainly in the low-permeability class, around 1 × 10⁻³ μm², with a small amount of medium permeability around 100 × 10⁻³ μm². Porosity in Elec4 and Elec5 has similar unimodal distributions shifted to the left, mainly medium and low porosity, but their permeability distributions differ considerably. Permeability in Elec4 is clearly bimodal, with modes at both 1 × 10⁻³ μm² and 40 × 10⁻³ μm², and low and medium permeability are also developed. Elec5 has a single mode below 10 × 10⁻³ μm², peaking at 1 × 10⁻³ μm², and shows ultra-low-permeability characteristics, with little low or medium permeability. Elec6 likewise spans a wide range, mainly low porosity with a small amount of medium porosity.

Fig. 3 Overall distribution of porosity in electrofacies categories

Fig. 4 Overall distribution of permeability in electrofacies categories

A box plot shows the symmetry and dispersion of data and is especially useful for analyzing outliers and extreme values. As shown in Figs. 5 and 6, the permeability distribution of Elec1 contains an extreme value, the porosity distribution of Elec4 contains two outliers, and the porosity and permeability distributions of Elec5 also contain outliers. Comparative analysis divides these extreme values and outliers into two types. Class 1: samples that differ greatly in petrophysics from adjacent samples. For example, the extreme point Sample 197 of Elec1 (marked with a cross) lies close to the Elec4 sample points, such as Sample 198. Class 2: because the electrofacies classification is quantitative, samples in transition zones between electrofacies also scatter and appear as outliers in the box plot, such as Samples 229 and 424 (marked with circles). Therefore, to improve the precision of permeability modeling, Class 1 outliers were removed and Class 2 samples were retained.
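The outlier and extreme-value screening can be sketched with the usual box-plot fences. The thresholds are an assumption, since the paper does not state them: values more than 1.5 × IQR beyond the box are flagged as outliers and more than 3 × IQR as extremes, a common convention used, for example, by SPSS box plots. Quartile definitions also vary; this sketch uses the default method of Python's `statistics.quantiles`. The fences only flag candidates; the Class 1 / Class 2 decision above still relies on petrophysical judgment.

```python
import statistics


def boxplot_flags(values, k_out=1.5, k_ext=3.0):
    """Flag values outside Tukey-style box-plot fences.

    Returns (outliers, extremes): values more than k_out * IQR but at most
    k_ext * IQR beyond the box, and values more than k_ext * IQR beyond it.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    outliers, extremes = [], []
    for v in values:
        dist = max(q1 - v, v - q3, 0.0)  # distance outside the box
        if dist > k_ext * iqr:
            extremes.append(v)
        elif dist > k_out * iqr:
            outliers.append(v)
    return outliers, extremes
```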

Fig. 5 Box plot of porosity in electrofacies

Fig. 6 Box plot of permeability in electrofacies

The main control factors of permeability

Feature engineering typically determines the main controlling factors and represents them as model parameters (Qian et al. 2016). The first task is therefore to identify the main factors controlling permeability. In the absence of significant fracturing, reservoir petrophysical properties are generally controlled by grain size, sorting, and diagenesis (Zeng and Li 2009; Abbaszadeh et al. 1996). The main factors affecting reservoir petrophysics can be inferred from whether the porosity–permeability trends of different facies converge or diverge. If the \({\varphi }\)–K transforms of the electrofacies nearly overlap and show minimal divergence in trend, homogenization by diagenetic modification (Skalinski and Kenter 2015) may be inferred to be a significant contributor to the fluid flow characteristics of the reservoir.

Figure 7 is the \({\varphi }\)–K crossplot, with colors representing the different electrofacies. The \({\varphi }\)–K relationships of the different electrofacies show little dispersion, indicating that the main geological factor controlling reservoir fluid flow in the Mesozoic strata of the Gaoqing area is diagenetic modification (Skalinski and Kenter 2015).
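The convergence test can be made semi-quantitative by fitting one φ–K transform per electrofacies and measuring how far apart the fitted lines are over the porosity range of interest. A sketch under assumptions: the semi-log form and the "largest gap in predicted log10 K" measure are our illustrative choices, not the criterion of Skalinski and Kenter (2015) or of this study, and the demonstration data are synthetic.

```python
import math
from collections import defaultdict


def facies_transforms(phi, k, facies):
    """Fit log10(K) = a + b * phi separately for each facies label.

    Each facies needs at least two distinct porosity values.
    Returns {facies: (a, b)}.
    """
    groups = defaultdict(lambda: ([], []))
    for p, kv, f in zip(phi, k, facies):
        groups[f][0].append(p)
        groups[f][1].append(math.log10(kv))
    fits = {}
    for f, (xs, ys) in groups.items():
        xm, ym = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - xm) ** 2 for x in xs)
        b = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sxx
        fits[f] = (ym - b * xm, b)
    return fits


def max_log_gap(fits, phi_grid):
    """Largest spread of predicted log10(K) between facies over phi_grid.

    Values near zero mean the transforms nearly overlap (convergent trends).
    """
    gaps = []
    for p in phi_grid:
        preds = [a + b * p for a, b in fits.values()]
        gaps.append(max(preds) - min(preds))
    return max(gaps)


# Synthetic samples: facies 1 and 2 share one trend, facies 3 is offset
phi_demo = [5.0, 10.0, 15.0] * 3
facies_demo = [1] * 3 + [2] * 3 + [3] * 3
k_demo = [10 ** (-1 + 0.1 * p + (1.0 if f == 3 else 0.0))
          for p, f in zip(phi_demo, facies_demo)]
fits_demo = facies_transforms(phi_demo, k_demo, facies_demo)
gap_12 = max_log_gap({f: fits_demo[f] for f in (1, 2)}, [5.0, 10.0, 15.0])
gap_all = max_log_gap(fits_demo, [5.0, 10.0, 15.0])
```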

Fig. 7 \({\varphi }\)–K crossplot in electrofacies

Diagenesis mainly comprises compaction, cementation, dissolution, and metasomatism. The first three contribute more to reservoir petrophysics, whereas metasomatism has little effect because it is weakly developed and produces only small amounts of new minerals. Thin-section observation shows that the reservoir cement is carbonate. In Fig. 8, carbonate cement appears red; quartz grains appear gray or white, other quartz grains containing asphaltene appear black or dark gray, and elongated feldspar grains are also visible.

Fig. 8 Carbonate cementation in Well G41

The crossplot of core-measured carbonate content against permeability (Fig. 9) shows that carbonate content is mostly below 5% and is not significantly correlated with permeability. Carbonate content in low- and ultra-low-permeability reservoirs is variable, with both low and high values, and carbonate cement below 5% has no significant effect on permeability.
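Whether carbonate content and permeability are "not significantly correlated" can be checked with a Pearson correlation and its t statistic. The sketch assumes permeability is log-transformed before the test (our choice, since it spans a wide range); the paper does not state which test it used.

```python
import math


def pearson_r_t(x, y):
    """Pearson correlation r and its t statistic (n - 2 degrees of freedom)."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired samples")
    xm, ym = sum(x) / n, sum(y) / n
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    syy = sum((b - ym) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, math.copysign(math.inf, r)  # perfect correlation
    return r, r * math.sqrt((n - 2) / (1.0 - r * r))
```

|t| is then compared with the two-tailed critical value of Student's t for n − 2 degrees of freedom (about 2.0 at the 5% level for several dozen samples); a smaller |t| means the correlation is not significant.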

Fig. 9 Carbonate content and permeability crossplot

The cementation–intergranular volume projection crossplot can explain how cementation influences reservoir petrophysics in this area. Figure 10 shows the effects of compaction and cementation on reservoir pore volume from previous research (Wang et al. 2018a). As the figure shows, reservoir pore space is mostly affected by compaction. A few samples are affected by cementation because they lie adjacent to the unconformity: near the unconformity surface, primary pores were not dissolved but were filled by carbonate, so cementation clearly reduced pore volume. Most of the reservoir, however, lies far from the unconformity surface, where compaction reduced pore volume and narrowed or closed pore throats, leaving no space for cement to fill; cement content is therefore low and the cementation effect weak. This indirectly indicates that cementation in the region is not strong. Because no cast thin sections were available, dissolution is difficult to evaluate. Overall, compaction is the main diagenetic effect.
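The compaction-versus-cementation comparison in Fig. 10 is commonly quantified with Lundegard's (1992) porosity-loss relations: compactional porosity loss COPL = OP − (100 − OP) × IGV / (100 − IGV) and cementational porosity loss CEPL = (OP − COPL) × CEM / IGV, where OP is the original porosity, IGV the intergranular volume, and CEM the intergranular cement, all in percent. We assume, without confirmation from the source, that the crossplot follows this convention; the input values in the example are hypothetical.

```python
def compaction_cementation_loss(op, igv, cem):
    """Porosity loss by compaction (COPL) and cementation (CEPL), in percent.

    op:  original (depositional) porosity, %
    igv: intergranular volume (intergranular porosity + cement), %
    cem: intergranular cement volume, %
    Relations after Lundegard (1992).
    """
    if not 0 <= cem <= igv < 100:
        raise ValueError("expect 0 <= cem <= igv < 100")
    copl = op - ((100.0 - op) * igv) / (100.0 - igv)
    cepl = (op - copl) * cem / igv if igv > 0 else 0.0
    return copl, cepl


# Hypothetical sample: 40% original porosity, 25% IGV, 5% cement
copl_demo, cepl_demo = compaction_cementation_loss(40.0, 25.0, 5.0)
```

With these hypothetical inputs, compaction removes 20 porosity units and cementation 4, the compaction-dominated pattern described above for most samples.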

Fig. 10 Cementation–intergranular volume projection crossplot

Diagenesis in this area mainly comprises compaction, syngenetic compaction and cementation, and syngenetic compaction, cementation, and dissolution (Wang et al. 2018a). Because cementation has little influence on reservoir petrophysics, the porosity change caused by compaction, \(\varDelta {\varphi }\), is proposed to characterize reservoir modification during diagenesis. With initial porosity \({{\varphi }}_{0}\), current porosity \({\varphi }\), and sorting coefficient \({S}_{0}\), the initial porosity and sorting coefficient are related as follows (Pan and Liu 2011):

$$\varphi_0 = 20.91 + \frac{22.90}{S_0},$$
$$\Delta\varphi = \varphi_0 - \varphi.$$
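The two relations translate directly into code. A minimal sketch assuming porosities in percent, consistent with the constants in the equation; the current-porosity value in the example is hypothetical, while S0 = 2 is the value adopted in the next paragraph for the volcanic electrofacies.

```python
def initial_porosity(sorting_coeff):
    """phi0 (%) from the sorting coefficient S0: phi0 = 20.91 + 22.90 / S0."""
    if sorting_coeff <= 0:
        raise ValueError("sorting coefficient must be positive")
    return 20.91 + 22.90 / sorting_coeff


def compaction_porosity_change(sorting_coeff, porosity):
    """Delta phi = phi0 - phi, both in percent."""
    return initial_porosity(sorting_coeff) - porosity


phi0_demo = initial_porosity(2.0)                  # 20.91 + 11.45 = 32.36 %
dphi_demo = compaction_porosity_change(2.0, 5.0)   # 32.36 - 5.0 = 27.36 %
```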

Porosity change is related to both deposition and diagenesis: deposition mainly determines the initial pore space, while diagenesis modifies the pore space more strongly. Sorting coefficients are mainly obtained by fitting laboratory sorting measurements on core. Elec1, Elec2, and Elec3 consist mainly of volcanic rock, for which the sorting coefficient is not defined. However, some researchers have used mercury-injection data to study the microstructure of volcanic rocks, using this parameter to characterize the arrangement of volcanic particles (Qu et al. 2007). For exploratory research and comparison, the sorting coefficient of these three electrofacies is set to 2 based on regional experience (Wang et al. 2018a, b, c). Figure 11 shows the relationship between \(\varDelta {\varphi }\) and permeability: permeability clearly decreases as \(\varDelta {\varphi }\) increases.

Fig. 11 \(\varDelta {\varphi }\)–K crossplot in electrofacies

Characterization of the factors controlling permeability modeling

The control-factor analysis shows that the change in porosity better reflects the effect of diagenesis on reservoir petrophysics, so the porosity change, as the controlling factor of permeability, can serve as a characterization parameter of the model. In this study, the strata have been classified into six electrofacies, each cluster being regarded as a distinct electrofacies that reflects formation hydrology, lithology, and diagenetic properties. The petrophysical analysis gives a qualitative understanding through statistical description: electrofacies can be an important independent variable for permeability modeling, but the porosity–permeability distributions overlap and no clear boundary can be observed between electrofacies. The electrofacies classification therefore cannot be used as the only parameter for estimating permeability. The Student–Newman–Keuls (SNK) method of variance analysis is used to compare the electrofacies quantitatively (Barr et al. 1977). When a statistically significant difference is found, the SNK method further identifies which groups differ and which do not, or whether all groups differ; it can therefore quantitatively evaluate differences in permeability between electrofacies and explain the relationship between electrofacies and permeability statistically. The results in Table 3 agree with the descriptive petrophysical analysis. Column N gives the number of samples, and the values in each subset are the calculated P values (Demuth 2006). The P value is the probability of obtaining the observed result, or a more extreme one, when the null hypothesis is true. The electrofacies fall into three subsets: Elec1 and Elec2 have similar petrophysics, as do Elec4 and Elec5. Elec3 and Elec6 clearly overlap, and Elec5 shows slight overlap as well. Different electrofacies thus share some petrophysical characteristics and their petrophysics has a wide fuzzy zone, yet electrofacies can still be shown to be a very important discrete parameter reflecting permeability.
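SNK is a post hoc procedure applied after a one-way analysis of variance. The sketch below covers only that first step: the F statistic testing whether mean permeability (assumed here to be log10 K) differs among electrofacies at all. The SNK step itself, which compares ordered group means against studentized-range critical values, is not implemented here, and the group values are hypothetical.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for {group_label: [values]}.

    Returns (F, df_between, df_within). F is compared with the critical
    value of the F distribution for those degrees of freedom.
    """
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    n = sum(len(v) for v in groups.values())
    k = len(groups)
    grand = sum(sum(v) for v in groups.values()) / n
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# Hypothetical log10(K) values for two electrofacies
f_demo, dfb_demo, dfw_demo = one_way_anova_f(
    {"Elec1": [1.0, 2.0, 3.0], "Elec2": [4.0, 5.0, 6.0]})
```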

Table 3 Comparison of permeability among electrofacies by the SNK method

Electrofacies is only a discrete attribute that distinguishes formation features. Well logs, as continuous data, contain abundant information on the hydrology, lithology, and diagenetic properties of the reservoir. DEN and AC logs are indirect measures of porosity and, in clastic rocks, also reflect formation permeability. The GR log indicates shale content in sandstone reservoirs, which is inversely related to permeability, particularly in low-permeability layers. The SP log reflects the concentration difference between the mud and the formation water, which is affected by the hydrology, lithology, and diagenetic characteristics of the reservoir. LLS and LLD measure resistivity near the wellbore and deep in the formation, respectively; analyzed together, and taking saturation–height effects into account, they indicate the severity of invasion and are therefore related to permeability.
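To make "indirect measures of porosity" concrete, the standard density-porosity and Wyllie time-average relations are sketched below. The defaults are assumptions typical of quartz sandstone with fresh-water filtrate (matrix density 2.65 g/cm³, fluid density 1.0 g/cm³, matrix transit time 55.5 μs/ft, fluid transit time 189 μs/ft), not calibrated values for the Gaoqing strata; an AC log recorded in μs/m would need converting or re-parameterizing.

```python
def density_porosity(rho_b, rho_ma=2.65, rho_f=1.0):
    """Density porosity (fraction) from bulk density rho_b in g/cm3."""
    return (rho_ma - rho_b) / (rho_ma - rho_f)


def sonic_porosity_wyllie(dt, dt_ma=55.5, dt_f=189.0):
    """Wyllie time-average porosity (fraction); transit times in us/ft."""
    return (dt - dt_ma) / (dt_f - dt_ma)


# Hypothetical readings from the same clean sandstone interval
phi_d = density_porosity(2.32)        # (2.65 - 2.32) / 1.65 = 0.20
phi_s = sonic_porosity_wyllie(82.2)   # (82.2 - 55.5) / 133.5 = 0.20
```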

Crossplots of each log against permeability by electrofacies (Figs. 12, 13, 14, 15, 16, 17) show how the logging properties are distributed among electrofacies. (a) Elec1 and Elec4 have lower GR values; Elec1 shows a relatively compact, ultra-low-permeability distribution, while Elec4 spans a wide range. The remaining electrofacies mostly lie above 110 API and show no obvious distribution pattern. (b) Elec4 has relatively low SP values, although high values also occur. (c) LLD and LLS show similar trends, and resistivity varies markedly in Elec4 and Elec5; the other electrofacies show no obvious features. (d) On the sonic log, Elec1, Elec4, and Elec5 show permeability increasing with acoustic transit time, and Elec4 and Elec5 overlap heavily at the same transit time. (e) The density log behaves similarly to the sonic log. Overall, the electrofacies overlap considerably, with no obvious distribution boundaries between them.

Fig. 12 GR–K crossplot in electrofacies

Fig. 13 SP–K crossplot in electrofacies

Fig. 14 LLD–K crossplot in electrofacies

Fig. 15 LLS–K crossplot in electrofacies

Fig. 16 AC–K crossplot in electrofacies

Fig. 17 DEN–K crossplot in electrofacies

Crossplots give a qualitative understanding of how each log affects the permeability model but cannot quantify the influence of each controlling factor. Based on the analysis above, variable selection is performed among the continuous variables: \(\varDelta {\varphi }\), representing the strength of diagenesis; GR, representing the radioactive characteristics of the strata; SP, representing subsurface fluid percolation; LLD and LLS, representing the electrical properties of the strata and the flushed zone; AC, representing the elastic properties of the strata; and DEN, representing density. Multivariate variable selection uses forward and backward stepwise methods based on statistical evaluation (Mark and Goldberg 2001); the results are given in Table 4. B is the unstandardized partial regression coefficient, listed with its standard error; Beta is the standardized regression coefficient; t is the t statistic testing the significance of each coefficient; and Sig is the significance level of that test. Variables are selected according to their P values, the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. The independent variables are ranked by P value to characterize their respective influence on the dependent variable, K (Charniak 1996).

Table 4 The statistic result of forward approach

Table 4 shows that \(\varDelta {\varphi }\) is the first variable entered by the forward method. This confirms the consistency between the statistical model and the geological analysis and supports defining \(\varDelta {\varphi }\) as a variable. Unexpectedly, however, SP and LLS enter as the second and third variables, and AC enters fourth.

The backward approach starts with all variables and removes, one at a time, those that are not statistically significant. Its result is the same as that of the forward method: the retained variables are \(\varDelta {\varphi }\), SP, LLS, and AC, and the excluded variables are, in order, GR, LLD, and DEN. The two methods are therefore consistent (see Table 5).

Table 5 The statistics result of backward approach
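Forward and backward selection can be sketched with partial F tests on ordinary least-squares fits. This illustrates the procedure under stated assumptions rather than reproducing the study's computation: entry and removal use fixed F thresholds (3.84 and 2.71, which correspond roughly to P = 0.05 and P = 0.10 for large samples) as stand-ins for P-value criteria, and the synthetic variables x1–x3 stand in for Δφ, GR, SP, LLD, LLS, AC, and DEN.

```python
import math
import random


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(cols, y):
    """Residual sum of squares of an OLS fit of y on cols plus an intercept."""
    rows = [[1.0] + [col[i] for col in cols] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def partial_f(rss_small, rss_big, n, p_big):
    """F statistic for one extra term; p_big counts parameters incl. intercept."""
    if rss_big <= 0:
        return math.inf
    return (rss_small - rss_big) / (rss_big / (n - p_big))


def forward_select(data, y, f_enter=3.84):
    """Add, one at a time, the variable with the largest partial F."""
    selected, remaining = [], list(data)
    current = rss([], y)
    while remaining:
        scores = {}
        for name in remaining:
            new = rss([data[v] for v in selected + [name]], y)
            scores[name] = (partial_f(current, new, len(y), len(selected) + 2), new)
        best = max(scores, key=lambda v: scores[v][0])
        if scores[best][0] < f_enter:
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best][1]
    return selected


def backward_eliminate(data, y, f_remove=2.71):
    """Start from all variables; drop the one with the smallest partial F."""
    kept = list(data)
    while kept:
        full = rss([data[v] for v in kept], y)
        scores = {name: partial_f(rss([data[v] for v in kept if v != name], y),
                                  full, len(y), len(kept) + 1)
                  for name in kept}
        worst = min(scores, key=scores.get)
        if scores[worst] >= f_remove:
            break
        kept.remove(worst)
    return kept


# Synthetic demo: y depends on x1 and x3; x2 is pure noise
rng = random.Random(7)
demo = {name: [rng.uniform(0.0, 1.0) for _ in range(120)] for name in ("x1", "x2", "x3")}
y_demo = [2.0 * a - 1.5 * c + rng.gauss(0.0, 0.1) for a, c in zip(demo["x1"], demo["x3"])]
forward_demo = forward_select(demo, y_demo)
backward_demo = backward_eliminate(demo, y_demo)
```

With true P-value criteria, each partial F would instead be converted to a P value from the F(1, n − p) distribution, which requires a statistics library.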

The descriptive analysis showed that LLS has a distribution trend similar to LLD and that SP has no obvious distribution pattern, yet the statistical calculation selects both as permeability indicators for the model. From logging principles, LLS, SP, and AC have in common that they all reflect flushed-zone information. The flushed zone is the part of the formation adjacent to the wellbore that is strongly flushed by mud filtrate, and different degrees of mud invasion strongly affect the conductivity, electrochemical properties, and elastic properties of reservoirs with different permeability. It is therefore the annular zone richest in permeability information (see Fig. 18).

Fig. 18 Diagram of the invasion characteristics of the reservoir and the corresponding radial depth of investigation

Interpreting the selected variables according to logging principles (Schlumberger 1986) gives the following. The SP log indicates a permeable formation through deflections in the natural electric potential recorded along the borehole. The spontaneous potential arises mainly from the salinity difference between the formation water and the mud, which drives ion exchange, and from the difference between formation pressure and mud-column pressure, which together produce the potential difference. The ion exchange between formation water and mud depends largely on the permeability of the strata. The shallow laterolog (LLS) measures resistivity in the flushed and transition zones. Strongly permeable formations admit more mud filtrate, and high-resistivity freshwater mud filtrate replacing low-resistivity formation water changes the flushed-zone resistivity by different amounts in formations of different permeability. The statistical result supports this link between permeability and the electrical properties of the flushed zone. Among the excluded variables, GR, excluded first, reflects formation radioactivity, which mostly carries clay-volume information, indicating that it has little impact on permeability here. LLD and DEN, excluded second and third, convey information mainly on the uninvaded zone and the mud cake, respectively, and contribute little permeability information. The acoustic log mainly reflects the elastic properties of the flushed and transition zones; entering as the fourth variable, it responds to permeability to some degree. However, this is a statistical result, and little research has shown theoretically that it can be derived from the other three log attributes.
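The salinity-contrast mechanism behind the SP response can be illustrated with the textbook electrochemical static SP of a clean, water-bearing sand, SSP = −K log10(Rmf/Rw), with K ≈ 61 + 0.133T (T in °F). This sketch is a simplification: it ignores shale effects and the electrokinetic (pressure-driven) component, uses resistivities in place of equivalent resistivities, and does not model the permeability dependence discussed above, which enters through the extent of invasion. The values in the example are hypothetical.

```python
import math


def static_sp(rmf, rw, temp_f=75.0):
    """Electrochemical static SP (mV) of a clean sand.

    SSP = -K * log10(Rmf / Rw), with K = 61 + 0.133 * T (T in degF).
    rmf: mud-filtrate resistivity, rw: formation-water resistivity (ohm-m).
    """
    if rmf <= 0 or rw <= 0:
        raise ValueError("resistivities must be positive")
    k = 61.0 + 0.133 * temp_f
    return -k * math.log10(rmf / rw)


# Hypothetical fresh mud (Rmf = 1.0) against saline water (Rw = 0.1) at 75 degF
ssp_demo = static_sp(1.0, 0.1, 75.0)  # -(61 + 9.975) * 1 = -70.975 mV
```

A fresher filtrate than formation water (Rmf > Rw) gives a negative deflection, and equal salinities give none.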

Scale quantification of discrete variable by optimal scale analysis

Based on the analysis above, the variables characterizing the permeability model are electrofacies, \(\varDelta {\varphi }\), SP, LLS, and AC. These variables are of different types: electrofacies is discrete, whereas the others are continuous. The discrete variable must be quantified on a numerical scale to be represented correctly in the model, and the scales of the four continuous variables, which have different attributes, must also be unified during modeling.

Ordinary regression requires strictly continuous data. When a discrete variable such as gender is encountered, regression cannot accurately reflect its different values, and entering manually assigned category labels directly into the regression model may render the variable meaningless. Optimal scaling regression solves this problem: it assigns quantitative values to the categories of a discrete variable, converting the categorical variable into a numerical one for statistical analysis (Paolillo 2009). Optimal scaling thus greatly improves the handling of categorical data, removes the restriction that discrete variables impose on model selection, and extends the applicability of regression analysis. The variables required for permeability modeling are analyzed with optimal scaling as follows:

As shown in Fig. 19, where the x axis is the original value and the y axis is the quantified value, optimal scaling applies a linear transformation to the continuous variables, so it does not change the role of continuous independent variables such as the well logs and \(\varDelta {\varphi }\). For the discrete variable, electrofacies, optimal scaling assigns quantified values. These values agree with the petrophysical analysis above and represent, on the model's scale, the impact of each electrofacies on permeability. The specific electrofacies values are shown in Table 6.
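The core idea of optimal scaling for a nominal variable can be shown in its simplest case. The sketch assumes a single nominal predictor, where the category scores that maximize the correlation with the response reduce to standardized category means of the response. With several predictors, as in the study's model, programs such as SPSS CATREG instead alternate between updating the category quantifications and the regression weights, which is not reproduced here. The labels and log10 K values are hypothetical.

```python
import math
from collections import defaultdict


def nominal_quantification(categories, y):
    """Optimal scores for one nominal predictor of response y.

    With a single nominal predictor, the scores that maximize the
    correlation with y are proportional to the category means of y.
    Scores are standardized to mean 0 and variance 1 over the samples.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for c, v in zip(categories, y):
        sums[c] += v
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    scored = [means[c] for c in categories]
    mu = sum(scored) / len(scored)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scored) / len(scored))
    if sd == 0:
        raise ValueError("all category means are equal")
    return {c: (m - mu) / sd for c, m in means.items()}


# Hypothetical electrofacies labels with log10(K) responses
labels_demo = ["Elec1", "Elec1", "Elec4", "Elec4", "Elec5", "Elec5"]
logk_demo = [-1.0, -0.8, 1.2, 1.6, -0.2, 0.2]
scores_demo = nominal_quantification(labels_demo, logk_demo)
```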

Fig. 19 Variable transformation by optimal scale analysis

Table 6 The transformation value of electrofacies

Model establishment by fuzzy logic

Fuzzy logic theory extends classical logic by allowing partial truth between entirely true and entirely false, taking all options between these alternatives into account (Zadeh 1965). Fuzzy logic is an effective tool for modeling uncertainty associated with vague, imprecise, and/or missing information about a particular factor in a problem (Boske and Diem 2000; Lababidi and Baker 2003). Permeability estimation is a typical case of imprecise data involving uncertainty, fuzzy correlations between rock properties, and the effects of man-made and/or natural disturbances. For this type of problem, fuzzy logic can tolerate and consistently interpret subjective concepts such as very high or very low permeability. It fills the information gap so that the problem can be expressed and calculated mathematically, treating the uncertainty as an inherent error term rather than ignoring or minimizing it (Cuddy 1997). In addition, fuzzy logic models are fully transparent, easy to understand, and require minimal user intervention. Their results are simple to interpret, yet they can often describe complex nonlinear systems that defy conventional logic (Nashawi and Malallah 2009).

The user chooses the number of bins into which the training data is to be divided. The program sorts the training data into roughly equal-sampled bins, starting at the lowest values and extending to the highest. For each data bin the program calculates the mean (\({\upmu }\)) and the standard deviation (\({\upsigma }\)) for all the associated curves to be used in the prediction. The mean and standard deviation values are then used by the program, when run in prediction mode, to find the most likely result.
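The training step can be sketched as follows. This is an illustrative reading of the description above, not the program's code: it assumes the bins are formed on the target (permeability) values, uses the bin median as the bin's result value (the text does not say which statistic is used), and uses the sample standard deviation. Function and key names are hypothetical.

```python
import statistics


def build_bins(target, curves, n_bins):
    """Sort training samples by target value into roughly equal-count bins.

    target: training permeability values
    curves: dict mapping an input-curve name to its training values
    Returns one dict per bin (lowest target values first) holding the
    sample count n, a representative result (median target, an assumption)
    and the (mean, standard deviation) of every input curve.
    """
    order = sorted(range(len(target)), key=lambda i: target[i])
    size = len(order) / n_bins
    bins = []
    for b in range(n_bins):
        idx = order[round(b * size):round((b + 1) * size)]
        if len(idx) < 2:
            raise ValueError("each bin needs at least two samples")
        stats = {}
        for name, vals in curves.items():
            members = [vals[i] for i in idx]
            stats[name] = (statistics.fmean(members), statistics.stdev(members))
        bins.append({"n": len(idx),
                     "result": statistics.median([target[i] for i in idx]),
                     "stats": stats})
    return bins
```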

To make the prediction, the program first calculates the fuzzy probability that an input log is in a certain bin. The following equation is used for this:

$$P(C_b) = \sqrt{n_b}\times \exp\left[-\frac{(C-\mu_b)^2}{2\sigma_b^2}\right],$$

where \(P(C_b)\) is the probability that curve C is in bin b, \(n_b\) is the number of samples in bin b, C is the input value for curve C, \(\mu_b\) is the mean value of curve C for bin b, and \(\sigma_b\) is the standard deviation of curve C for bin b.
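The membership equation transcribes directly; the sketch adds only a guard against a non-positive standard deviation. Argument names mirror the symbols above, and the numbers in the check are arbitrary.

```python
import math


def bin_probability(c, n_b, mu_b, sigma_b):
    """P(C_b) = sqrt(n_b) * exp(-(C - mu_b)**2 / (2 * sigma_b**2))."""
    if sigma_b <= 0:
        raise ValueError("sigma_b must be positive")
    return math.sqrt(n_b) * math.exp(-((c - mu_b) ** 2) / (2.0 * sigma_b ** 2))
```

The √n_b factor gives well-populated bins more weight, while the Gaussian term falls off as the input moves away from the bin mean; for example, an input one standard deviation from the mean of a 4-sample bin gives 2e^(−1/2) ≈ 1.21.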

The probabilities for all the input curves are then combined as follows:

$$\frac{1}{P_b} = \frac{1}{P\left(C1_b\right)} + \frac{1}{P\left(C2_b\right)} + \frac{1}{P\left(C3_b\right)} + \cdots$$

where \(P_b\) is the total probability for bin b and \(P(C1_b)\) is the probability for curve C1 in bin b.
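The combination rule above can be sketched as follows (hypothetical name `combined_probability`). Because every term 1/P(Ci_b) is positive, P_b can never exceed the smallest per-curve probability, so a bin scores highly only when all input curves agree on it.

```python
def combined_probability(curve_probs):
    """Combine per-curve probabilities for one bin: 1/P_b = sum_i 1/P(Ci_b)."""
    if not curve_probs:
        raise ValueError("need at least one curve probability")
    if any(p <= 0 for p in curve_probs):
        # A zero per-curve probability makes 1/P_b infinite, i.e. P_b = 0.
        return 0.0
    return 1.0 / sum(1.0 / p for p in curve_probs)
```

For instance, probabilities of 1.0 and 0.5 from two curves combine to 1/3, below either individual value.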

The most likely solution will be the bin with the highest probability. The program outputs the most likely bin result, the second highest probability bin, and a weighted average of these two highest results. The weighting is done as follows:

$$R_{\text{av}} = \frac{R_{\text{ml}} \times P_{\text{ml}} + R_{\text{sl}} \times P_{\text{sl}}}{P_{\text{ml}} + P_{\text{sl}}},$$

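where \(R_{\text{av}}\) is the weighted average result, \(R_{\text{ml}}\) the most likely result, \(R_{\text{sl}}\) the second most likely result, \(P_{\text{ml}}\) the probability of the most likely result, and \(P_{\text{sl}}\) the probability of the second most likely result.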
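Selecting the two most probable bins and forming the weighted average can be sketched as below. The function `weighted_result` is hypothetical; it assumes the combined probabilities P_b are already computed for every bin and that each bin has a single representative result value.

```python
def weighted_result(bin_results, bin_probs):
    """Return (R_ml, R_sl, R_av) from per-bin results and combined probabilities."""
    if len(bin_results) != len(bin_probs) or len(bin_probs) < 2:
        raise ValueError("need at least two bins with matching lengths")
    # Rank bin indices by combined probability, highest first.
    ranked = sorted(range(len(bin_probs)), key=lambda i: bin_probs[i], reverse=True)
    ml, sl = ranked[0], ranked[1]
    p_ml, p_sl = bin_probs[ml], bin_probs[sl]
    r_ml, r_sl = bin_results[ml], bin_results[sl]
    # Probability-weighted average of the two highest-ranked results.
    r_av = (r_ml * p_ml + r_sl * p_sl) / (p_ml + p_sl)
    return r_ml, r_sl, r_av
```

With results [0.1, 1.0, 10.0] and probabilities [0.1, 0.6, 0.3], the most likely result is 1.0, the second is 10.0, and the weighted average is (0.6 + 3.0)/0.9 = 4.0.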

To give a quantitative feel for the errors in the results, high and low result curves can be generated for both the most likely and the weighted average results. These curves are constructed as follows: at each depth level, the bin probabilities are converted into a normalized (0–1) cumulative frequency distribution, and the percentile of the result bin (ResPC) is found.

The low result is the bin at percentile ResPC − Er, and the high result is the bin at percentile ResPC + Er, where Er is the percentile error, set to 25% following the manual.
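The low/high construction leaves some details open, so the sketch below fills them with stated assumptions: bins are ordered from lowest to highest result, a bin's percentile is its inclusive cumulative value, and the low and high results are the first bins whose cumulative value reaches ResPC − Er and ResPC + Er, clipped to the 0–1 range. The function `low_high_bins` is hypothetical.

```python
import numpy as np

def low_high_bins(bin_probs, result_bin, er=0.25):
    """Return (low_bin, high_bin) indices bracketing result_bin.

    Assumptions: bins are ordered by result value, a bin's percentile is its
    inclusive cumulative value, and targets are clipped to [0, 1]."""
    p = np.asarray(bin_probs, dtype=float)
    cdf = np.cumsum(p) / p.sum()          # normalized (0-1) cumulative distribution
    res_pc = cdf[result_bin]              # ResPC
    lo_target = max(res_pc - er, 0.0)
    hi_target = min(res_pc + er, 1.0)
    # First bin whose cumulative value reaches each target percentile.
    low = int(np.searchsorted(cdf, lo_target, side="left"))
    high = int(np.searchsorted(cdf, hi_target, side="left"))
    return low, min(high, len(p) - 1)     # guard against rounding past the last bin
```

For bin probabilities proportional to [1, 2, 4, 2, 1] with bin 2 as the result, ResPC is 0.7; with Er = 25% the low result stays in bin 2 and the high result is bin 4, showing how coarse bins widen the upper bound.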

Results and discussion

Fig. 20 shows the well plot with the estimated permeability. Tracks 1–5 show the independent variables used to model permeability; track 6 shows the weighted average result and the most likely result compared with the core plugs; track 7 shows the most likely result together with the range between the low and high results; and track 9 shows the probability spectrum obtained at each sampling point. The plot shows that the calculated permeability agrees well with the trend of the core permeability.

Fig. 20 The well plot of Well G41

Comparing the calculated and core permeability over the whole well plot leads to several observations. There are two periods of volcanic intrusion. The volcanic rocks in the upper part lie close to the unconformity surface and have been affected by dissolution, with a mean permeability of nearly 2 × 10⁻³ μm², whereas the lower part retains the dense character of volcanic rock, with a permeability of only 0.0765 × 10⁻³ μm². In the 1070–1100 m interval, the electrofacies change more frequently. At 1075 m, in the middle- and low-permeability zone, the core permeability is 173 × 10⁻³ μm² and the calculated permeability is 163 × 10⁻³ μm², a relative error of only 6%. In the interval immediately below, the presence of Elec2 causes a sharp change in permeability, which is well reflected in the core data.
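As a check on the quoted figure at 1075 m, assuming the relative error is taken with respect to the core value:

$$\delta = \frac{\left|K_{\text{core}} - K_{\text{calc}}\right|}{K_{\text{core}}} = \frac{\left|173 - 163\right|}{173} \approx 5.8\% \approx 6\%$$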

The enlarged plot in Fig. 21 shows more clearly how the estimated permeability follows changes in permeability along the track. The estimate for Elec2 at 1100 m is consistent with the core permeability, indicating that the calculated permeability is usable in the low-permeability range. The interval at 1120 m lies at the edge of a thick Elec4 layer, at its junction with Elec3. The core data show a thin low-permeability zone between ultra-low-permeability reservoirs; there, the calculated permeability is 0.284 × 10⁻³ μm² against a core permeability of 2.5 × 10⁻³ μm². This discrepancy shows that the algorithm still cannot overcome the vertical resolution limit of the well logs. The same problem occurs at 1089 m and 1103 m.

Fig. 21 The well plot of Well G41 between 1000 and 1150 m

Agreement between the calculated and core permeability is highest at 1230 m and in the 1275–1300 m interval (see Fig. 22). The main reason is that the petrophysical characteristics of these strata are distinct, the layers are thicker, and the boundaries between electrofacies are clear. Agreement is lowest in the 1250–1270 m interval, where middle-, low- and ultra-low-permeability layers occur together. This interval is dominated by Elec6, whose petrophysical characteristics are complicated, and its electrofacies are numerous, thin and rapidly changing. As a result, the logging response is strongly affected by adjacent layers and cannot correctly reflect the permeability of the corresponding layer.

Fig. 22 The well plot of Well G41 between 1230 and 1330 m

The analysis above gives an overall picture of the model's precision. By electrofacies, the permeability calculation performs well in Elec4 and Elec5, whereas errors are relatively larger in Elec2 and Elec6, because the petrophysical character of Elec4 and Elec5 is distinct while that of Elec2 and Elec6 is more complex. By layer thickness, the model performs well in medium-thick layers but poorly in turbulent sedimentary environments with many thin, rapidly changing electrofacies. By permeability distribution, the model adapts to the full range of permeability: it identifies low- and ultra-low-permeability reservoirs as well as middle- to high-permeability layers, and shows good tolerance across this range. The method also supports uncertainty analysis, because it provides upper and lower limits for the calculated permeability.

Figures 23, 24 and 25 are crossplots comparing the calculated permeability (logarithm) with the core permeability (logarithm), together with the corresponding correlation coefficients. For the optimized approach in Fig. 23, the calculated permeability of the different electrofacies is consistent with the petrophysical analysis above: Elec1 and Elec2 mainly correspond to ultra-low-permeability strata below 1 × 10⁻³ μm²; most Elec4 values are higher than those of Elec5; Elec6 is calculated accurately, with only a few scattered points; and Elec3 has a wide core permeability range, with calculated values mainly between 1 and 10 × 10⁻³ μm². The accuracy is significantly higher than that of multiple linear regression using the same parameters (Fig. 24) and of a fuzzy model without feature engineering that uses all collected parameters (Fig. 25). The multiple linear-regression model performs poorly: its estimates fall in a narrow range above a value of 10, and it produces negative values in the ultra-low-permeability range. With a correlation coefficient of only 0.07, the regression model clearly fails as an estimator. The fuzzy model without feature engineering follows the core trend well, but fewer of its estimates fall near the reference line. This indicates that fuzzy logic performs well for permeability estimation, but the input parameters must be screened so that they capture the underlying relationship. With the optimized method, the core samples are estimated well and the correlation coefficient reaches 0.76, which meets the needs of exploration and production.
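The correlation coefficient on the log–log crossplots can be computed as sketched below. This assumes a Pearson coefficient on log10 values, since the section does not state the exact definition, and it drops non-positive values such as negative regression estimates, which cannot be log-transformed; the function name `log_correlation` is hypothetical.

```python
import numpy as np

def log_correlation(k_calc, k_core):
    """Pearson correlation between log10(calculated) and log10(core) permeability.

    Assumption: pairs with a non-positive value (e.g. negative regression
    estimates) are dropped because they cannot be log-transformed."""
    k_calc = np.asarray(k_calc, dtype=float)
    k_core = np.asarray(k_core, dtype=float)
    ok = (k_calc > 0) & (k_core > 0)
    if ok.sum() < 2:
        raise ValueError("need at least two positive pairs")
    x = np.log10(k_calc[ok])
    y = np.log10(k_core[ok])
    return float(np.corrcoef(x, y)[0, 1])
```

Working in log space matches the crossplot axes and keeps the few high-permeability samples from dominating the coefficient; note that silently dropping negative estimates flatters a model that produces them, so their count is worth reporting alongside the coefficient.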

Fig. 23 Cross plot of results from estimated permeability versus core permeability with feature engineering

Fig. 24 Cross plot of results from estimated permeability by linear regression model versus core permeability

Fig. 25 Cross plot of results from estimated permeability by fuzzy logic without feature engineering versus core permeability

Conclusion

A fuzzy logic approach based on the factors controlling permeability in low- and ultra-low-permeability reservoirs is proposed and applied to estimate the formation permeability of the Mesozoic strata in the Gaoqing area. The following conclusions are drawn.

1. The Mesozoic strata in the Gaoqing area are characterized by low and ultra-low permeability. Using optimized K-nearest-neighbor clustering, the reservoir is classified into six electrofacies: Elec1 and Elec2 show pronounced ultra-low porosity and ultra-low permeability; Elec4 and Elec5 have relatively better petrophysical properties; and Elec3 and Elec6 are the most complicated. Each electrofacies has its own characteristic distribution, but the ranges of different electrofacies overlap widely, so permeability cannot be assigned directly from the electrofacies prediction.

2. Logging data analysis shows that the main factor affecting reservoir permeability is the homogenization effect of diagenesis. Cementation is weak in the area; the petrophysical properties are mainly controlled by compaction and dissolution, and the change in porosity caused by diagenesis is used to characterize the diagenetic transformation.

3. Variables for permeability modeling are selected by forward and backward selection, yielding electrofacies, SP, LLS, AC and \(\varDelta {\varphi }\) as the combination of independent variables. Combining the variable-selection result with logging principles, this paper explains the meaning of each independent variable, unifying the mathematical model with the physical principles of the methods.

4. Optimal scaling analysis is applied to quantify both the discrete and the continuous variables of the model, giving quantified values for the discrete electrofacies variable and scale-transformed values for each continuous variable.

5. Fuzzy logic is used to model permeability from the selected independent variables. Accuracy analysis shows that the correlation coefficient reaches 0.76 after eliminating data points affected by scale error. The method is practical for strata ranging from ultra-low to middle permeability, with high calculation precision, but it adapts poorly to thin layers and to intervals with frequent sedimentary changes.