1 Introduction

Wireline logs including abundant in situ measurement information taken from boreholes can be effectively used to derive basic petrophysical parameters of hydrocarbon reservoirs (Serra 1984). Most of these formation properties (e.g., porosity, shale volume, water saturation etc.) are usually estimated in reliable quality-checked inversion procedures (Ball et al. 1987; Alberty and Hashmy 1984; Dobróka et al. 2016). However, permeability of rock formations cannot be directly estimated by the inversion of conventional well logs, since the available tool response equations do not include it explicitly. Therefore, it is generally estimated empirically by using some types of well logs (Wyllie and Rose 1950; Timur 1968; Coates and Dumanoir 1974) and these estimations often need to be calibrated by core data (Singh 2018). Due to different methods and conditions, the estimation error of permeability is rather large compared to the other petrophysical parameters (e.g., porosity and shale volume) that are required for evaluating the productivity of hydrocarbon reservoirs. Exploratory data analysis methods including regression analysis, dimension reduction and pattern recognition techniques are widely used in geosciences for different purposes. Tang and White (2008) made a comparative study on several statistical tools for facies classification. Jafarpour and McLaughlin (2009) applied a discrete cosine transform for reservoir characterization. Soares et al. (2020) used dictionary learning for the sparse representation of 4D seismic data to highlight only the main features of the dataset. Cosultchi et al. (2012) utilized principal component analysis and cluster analysis for aquifer identification in petroleum reservoirs. Generally, some linear regression analysis approach is used for relating the permeability to existing petrophysical parameters (Dahraj and Bhutto 2014; Handhal 2016). Multivariate statistical tools have been applied for permeability estimation in Al-Mudhafar (2020).

We present a new alternative to predict permeability of hydrocarbon reservoirs, which is based on a metaheuristic approach for factor analysis. The simultaneous processing of well logs allows not just to reduce the dimensionality of the data space, but to find correlation between the extracted factor variables and reservoir parameters. As one of the most important parameters, we estimate the intrinsic permeability utilizing all available well logs leading to more reliable estimates. Szabó and Dobróka (2018) suggested a statistical approach for the processing of well logs, in which a differential evolution algorithm based factor analysis was introduced for shale volume estimation. For solving the problem of factor analysis, we currently apply the particle swarm optimization (PSO) method proposed by Kennedy and Eberhart (1995), which is less computationally expensive and requires less parametrization than genetic algorithm (Rahmat-Samii 2003; Donelli et al. 2006). This metaheuristic technique is based on swarm intelligence (Del Ser 2019), which is widely used in engineering applications. For instance, Nguyen et al. (2020) utilized it for feature selection in data mining. Self et al. (2016) suggested PSO for selecting optimal drilling parameters to minimize the overall cost. Shaw and Srivastava (2007) evaluated the applicability of PSO to inversion of direct current, induced polarization and magnetotelluric data and concluded that the results are consistent with ridge regression and genetic algorithm based inversions. Essa and Elhussein (2018) presented a geophysical application, in which magnetic anomalies were interpreted successfully with the help of this metaheuristic.

Factor analysis is a multivariate statistical tool that is generally used to decrease the dimensionality of multi-parameter problems, while keeping most of the information contained in the original variables to simplify the data interpretation. The newly derived factors are necessarily uncorrelated and may reveal hidden information from the measurements (Lawley and Maxwell 1962). Thus, for processing wireline logging data it can be effectively applied because the measured datasets are usually quite large due to several applied logging tools and numerous measured depth points. The extracted factors from wireline logs are referred to as factor logs that can be linked to petrophysical parameters through regression analysis (Szabó 2011). In this way, factor analysis processes all available well logs to give an estimation to these parameters, while for instance, simple deterministic methods normally rely only on one log type to derive one parameter at a time. Since it is an effective approximate method, it can also process data for which an exact mathematical relation with petrophysical properties is not available (e.g., the caliper log), however they may contain useful lithological or physical information.

We show a two-level embedded computing algorithm for improving the performance of factor analysis. In the inner loop, the PSO algorithm runs to estimate the factor scores while the factor loadings are initially derived from the non-iterative approach of Jöreskog (2007). In the outer loop, the hyperparameters of the same PSO procedure called the cognitive and social parameters are refined by a Metropolis criterion based simulated annealing method as global optimization tool avoiding the possible localities of the objective function (Metropolis et al. 1953). These control parameters are important learning factors, which are normally chosen as arbitrary constants at the beginning of the PSO procedure. In this study, we suggest a much more intelligent manner as estimating them automatically. This approach was first proposed for the interpretation of direct push logs to estimate the volumetric soil moisture content of the shallow unsaturated zone (Abordán and Szabó 2021) as a further improved factor analysis technique used in Szabó et al. (2012). In this study, the factor scores are cross-plotted with reservoir properties. We assume the extracted factors hold quantitative information on the permeability of deep reservoir rocks. We extend the factor analysis algorithm to process multiwell logging data. Our permeability prediction results are shown involving single and multi-borehole data collected from a Hungarian gas field.

2 Methods

2.1 Traditional method of factor analysis

To initialize factor analysis, we build a common data matrix (D) from the standardized data of S number of measured well logs, in which every column represents a dataset recorded by different logging tools, while N number of rows contain all measured depths along the processed interval

$$ {\mathbf{D}} = \left( {\begin{array}{*{20}c} {\mathop D\nolimits_{11} } &\quad {\mathop D\nolimits_{12} } &\quad \cdots &\quad {\mathop D\nolimits_{1S} } \\ {\mathop D\nolimits_{21} } &\quad {\mathop D\nolimits_{22} } &\quad \cdots &\quad {\mathop D\nolimits_{2S} } \\ \vdots &\quad \vdots &\quad \vdots &\quad \vdots \\ {\mathop D\nolimits_{i1} } &\quad {\mathop D\nolimits_{i2} } &\quad \cdots &\quad {\mathop D\nolimits_{iS} } \\ \vdots &\quad \vdots &\quad \vdots &\quad \vdots \\ {\mathop D\nolimits_{N1} } &\quad {\mathop D\nolimits_{N2} } &\quad \cdots &\quad {\mathop D\nolimits_{NS} } \\ \end{array} } \right),$$
(1)

Factor analysis is based on the following model

$$ {\mathbf{D}} = {\mathbf{FL}}^{{\rm T}} + {\mathbf{E}} ,$$
(2)

where F is the factor scores matrix of size N-by-R (R is the number of derived factors). Quantity L is the factor loadings matrix of size S-by-R and E is the residual error matrix of size N-by-S (symbol T is the matrix transpose). According to Eq. (2), the observed wireline logs are expressed as a linear combination of the factors. The factor loadings in matrix L show the partial correlations between the observed and derived statistical variables. The largest part of total variance represented by matrix D is explained by the first factor, the scores of which are found in the first column of matrix F. Given that the factors are uncorrelated, we write the correlation matrix of standardized data as

$$ {\mathbf{R}} = N^{ - 1} {\mathbf{D}}^{{\rm T}} {\mathbf{D}} = {\mathbf{LL}}^{{\rm T}} + {{\varvec{\Psi}}}, $$
(3)

where \({{\varvec{\Psi}}}\) denotes the diagonal matrix of error variances independent of those of the common factors in matrix F. The calculation of correlation matrix R is easy to handle if matrix E is neglected in Eq. (2). This approximation helps to avoid the estimation of specific variances by cancelling matrix \({{\varvec{\Psi}}}\) in Eq. (3). Since the part of the total variance that may not be described by the common factors is neglected, this approach leads to an approximate solution for the factors using the popular method of Principal Component Analysis (PCA). Although PCA does not account for residual matrix E, but it gives a quick and unique (algebraic) solution for extracting the principal components (Maćkiewicz and Ratajczak 1993).

We approximate the factor loadings non-iteratively by a scaling-free estimation method (Jöreskog 2007)

$$ {\mathbf{L}} = (diag{{\varvec{\Sigma}}}^{ - 1} )^{ - 1/2} {{\varvec{\Omega}}}({{\varvec{\Gamma}}} - \theta \,{\mathbf{I}})^{1/2} {\mathbf{U}}, $$
(4)

where \({{\varvec{\Gamma}}}\) denotes the matrix of the first R number of eigenvalues of the empirical covariance matrix Σ, and the corresponding R number of eigenvectors are given in matrix \({{\varvec{\Omega}}}\), and U is a freely selected R-by-R orthogonal matrix. Quantity I denotes the identity matrix and θ is a properly chosen constant that must be accepted smaller than 1. In traditional dimension reduction approaches, the factor scores are calculated by the maximum likelihood method. By assuming the hypothesis of linearity

$$ {\mathbf{F}}^{{\rm T}} =({\mathbf{L}}^{{\rm T}}{\boldsymbol{\Psi }}^{-1} {\mathbf{L}})^{{ - 1}} {\mathbf{L}}^{{\rm T}} {\boldsymbol{\Psi }}^{{ - 1}} {\mathbf{D}}^{{\rm T}} $$
(5)

gives an unbiased estimate to the factor scores (Bartlett 1937). Since Eq. (5) is analogous to a least squares solution weighted by the specific variances (Menke 2012), the localities of the objective function to be minimized cannot be effectively avoided by the gradient-based searching for the absolute minimum.

2.2 Particle swarm optimization based factor analysis

To improve the performance of data reduction, we suggest PSO as a global optimization procedure to determine the vector of factor scores. Abordán and Szabó (2018) established the basic principles of the PSO assisted factor analysis (FA-PSO). To give a metaheuristic estimation of the factors, we reformulate the model of factor analysis given in Eq. (2) as

$$ {\mathbf{d}}^{(m)} = \tilde{\mathbf{L}}\mathbf{f} + {\mathbf{e}} $$
(6)

where d(m) is the vector of standardized wireline logging data of size SN, \({\tilde{\mathbf{L}}}\) denotes the matrix of factor loadings of size NS-by-NR, f is the expanded vector of factor scores of size RN and e is the vector of approximation errors of size SN (Szabó and Dobróka 2017). As a first step, we determine matrix \({\tilde{\mathbf{L}}}\) by Eq. (4) and rotate it using the varimax technique for easier interpretation (Kaiser 1958). We give the optimal values of the factor scores f by using PSO, which can be considered as an iterative inversion procedure. We use the following L2-norm based function as objective function

$$ E = \left[ {\frac{1}{NS}\sum\limits_{i = 1}^{NS} {(d_{i}^{(m)} - d_{i}^{(c)} )^{2} } } \right]^{1/2} $$
(7)

where d(c) contains the standardized calculated wireline logging data. Here, calculated data refers to the term \({\tilde{\mathbf{L}}\mathbf{f}}\) from Eq. (6), which also gives the possibility for calculating theoretical well logs that can be regarded as the solution of the forward problem of the inversion procedure. Previous researches suggest that it can be even used to generate theoretical well logs for intervals where measurements are missing, due to the high correlation of certain log types to the first factor (Szabó et al. 2012). By the minimization of the objective function in Eq. (7), we update the factor scores by PSO, while we keep the factor loadings constant to reduce runtime. It must be mentioned that it is possible to simultaneously refine the factor scores and loadings using the well-known maximum likelihood method, or an iteratively reweighted factor analysis for seeking a more robust solution (Szabó and Dobróka 2017).

Particle swarm optimization originally introduced by Kennedy and Eberhart (1995) was developed to mimic the social behavior of animals. Generally, it was applied for its easy implementation and high efficiency in solving complex problems (Kumar et al. 2017). To find the optimal solution, PSO applies a swarm of particles generated randomly by following uniform distribution in a pre-defined search space. Given an N-dimensional search space, vector fi = (fi1,fi2,…,fiN)T denotes the position of the ith particle. In our problem, quantity fi denotes the ith possible solution for the vector of factor scores. We store the particle’s velocity in vector vi = (vi1,vi2,…,viN)T, which defines its movement in each iteration within the search domain. The swarm of particles seek the minimum of the objective function defined in Eq. (7) by utilizing the following equations

$$ {\mathbf{f}}_{i} (t + 1) = {\mathbf{f}}_{i} (t) + {\mathbf{v}}_{i} (t + 1) $$
(8)
$$ {\mathbf{v}}_{i} (t + 1) = w{\mathbf{v}}_{i} (t) + r_{1} c_{1} ({\mathbf{p}}_{i} (t) - {\mathbf{f}}_{i} (t)) + r_{2} c_{2} ({\mathbf{g}}(t) - {\mathbf{f}}_{i} (t)) $$
(9)

where L denotes the number of particles in the swarm (i = 1,2,…,L), t stands for iterations (t = 1,…,tmax) where tmax is the final iteration step. During the search for the global optimum, PSO utilizes a memory which is updated after each iteration. Every particle’s best position is held in pi = (pi1,pi2,…,piN)T, while vector g stores the best position found by the whole population. Kennedy and Eberhart (1995) suggested to set the cognitive and social constants c1 and c2 equal to 2 as default, separately, during the initialization. These control parameters define to what extent the personal best position of particles pi and the very best position of the swarm g affect the movement of the ith particle during the search. In Eq. (9), r1 and r2 are uniformly distributed random parameters from 0 to 1, while w represents an inertia weight that was suggested by Shi and Eberhart (1998) to more effectively control the optimization procedure.

To start the FA-PSO procedure for finding the optimal vector of factor scores f, we generate a population of particles with uniform distribution within the search space, which can be pre-defined by solving Eq. (5) for the factor scores. As for w, we use a chaotic descending inertia weight scheme to better control the velocity of particles (Feng et al. 2007). We perform it in three steps: first we generate a random number (z) from the range 0 to 1. Then we set z by logistic mapping according to z′ = 4z(1–z) and finally

$$ w = (w_{1} - w_{2} )(t_{\max } - t)/t_{\max } + w_{2} z^{\prime} $$
(10)

where w1 and w2 represent the starting and the final values of inertia weight during the iterative procedure, respectively. Then in each iteration step, we re-determine the particles’ positions according to Eqs. (89), and re-calculate the objective function defined in Eq. (7) with the new factor scores f. We continuously update the particles until the maximum number of iterations is reached, thereafter we accept the particle with the lowest data misfit as the solution for the factor scores.

2.3 Permeability estimation by factor analysis

Previous studies show that the first statistical factor correlates well with the shale content of various geological formations (Szabó 2011; Szabó and Dobróka 2013; Puskarczyk et al. 2015; Puskarczyk 2020; Abordán and Szabó 2020). Advanced mathematical approaches have been developed to extract the shale volume from well logs as efficient as possible. The iteratively reweighted factor analysis gives a reliable estimation to the factors independent of the statistical distribution of input data (Szabó and Dobróka 2017). In addition to give a robust solution, the calculation of factor scores was established on global optimization by using evolutionary computation (Szabó and Dobróka 2018). By analyzing multiple well logs, we take all the measured information into account during the interpretation, which makes a step forward in increasing the reliability of estimation compared to single depth-point approaches using one or few well logs. Under taking all measured information we mean that we select all suitable well logs as input being highly sensitive to shale volume, in other words only those data types are processed that are significantly influenced by the variation of the relevant petrophysical parameter along the borehole. Based on the above considerations, the first statistical factor extracted from multiple well logs can be interpreted as a lithology indicator, which holds valuable information on the shale content of sedimentary formations. When applying factor analysis to well logs, a strong exponential connection between the first factor (F1) and shale volume (Vsh) can be set in geological formations of different ages. In the Hungarian and American examples given in Fig. 1, the consistence of the results is also confirmed by the validity of the same regression function (with approximately the same regression coefficients and by using proper scaling to the factor scores).

Fig. 1
figure 1

Regression relationships between the first factor and shale volume derived by well log analysis for different hydrocarbon fields

We assume a strong correlation between the absolute permeability and the first principal factor in sedimentary formations. This is partly due to the fact that permeability is to some extent inversely proportional to shale content of clastic formations (Schön and Georgi 2003; Revil and Cathles 1999). In our test site, an empirical relation is found between the decimal logarithm of permeability and shale volume both derived from wireline logs as well as core information (Fig. 2). The correlation coefficient for the first case is − 0.91, while for the latter it is − 0.72, both showing a clear relationship between the two quantities. In well log analysis, for estimating the absolute permeability (K) from wireline logging datasets, usually deterministic approaches are used which rely on porosity (Φ) and irreducible water saturation (Sw,irr) data. The most frequently applied manner is based on the use of similar empirical approaches like the formula of Timur (1968)

$$ K = 0.136\frac{{\Phi^{4.4} }}{{S_{w,irr}^{2} }} $$
(11)
Fig. 2
figure 2

Empirical connections between the intrinsic permeability and shale volume of Miocene hydrocarbon formations in the investigated site, Pannonian basin, East Hungary

Analogous studies show that the statistical factors include not only the information on lithological characteristics but also the hydraulic properties of reservoir rocks. For instance, Szabó (2015) explored the hydraulic conductivity of fractured groundwater formations successfully by factor analysis of hydrogeophysical logs. In this study, we test the proposed factor analysis approach to estimate permeability in six wells (Well 1–6) drilled in the Pannonian basin in East Hungary. We investigate unconsolidated upper Miocene gas-bearing formations with high average porosity showing a variability of different grain sizes and lithology as marl, silt and sands. Accordingly, we show that the well log of the first factor derived by the improved FA-PSO approach scaled into the range 0 to 1 can be directly used to estimate permeability along arbitrary intervals. For this purpose, we use a non-linear relationship, which connects the first factor to the intrinsic permeability of hydrocarbon reservoirs. In the given area, the first (properly scaled) factor extracted from the wireline logging dataset by FA-PSO is connected to the decimal logarithm of permeability (K) in the form as

$$ \lg (K) = a\,(1 - F_{1}^{b} )^{c} + d $$
(12)

where a, b, c, d are area specific coefficients estimated by regression analysis using for instance core data and other preliminary information. The above non-linear relationship is applied to different wireline logging datasets in the investigated hydrocarbon field to demonstrate the feasibility of the proposed method.

Our FA-PSO method is a fully metaheuristic approach, therefore its output is highly affected by the control parameters (i.e., learning factors) that are worth to be set as optimal as possible at initialization. The optimal parameter selection of PSO has been studied by several researches (Jiang et al. 2007; Fernández Martínez et al. 2010; Harrison et al. 2019). However, since the optimal control parameters highly depend on the optimization task, here an attempt is made to automatically select the optimal values of parameters c1 and c2 of PSO while solving factor analysis. These parameters are usually chosen according to empirical suggestions, which can make a problem ambiguous. To offer a more advanced solution of the presented PSO-based factor analysis, we propose a further developed algorithm to improve the FA-PSO method in regard of the c1 and c2 hyperparameters. For this purpose, we search for these parameters by Simulated Annealing method (SA) suggested by Metropolis et al. (1953) in an outer loop of our algorithm in an automated way. Then, in the inner loop, FA-PSO is run with the newly estimated c1 and c2 parameters. According to the previously outlined workflow, the procedure starts with the determination of the factor loadings according to Eq. (4). Then, we optimize the factor scores with the help of PSO by minimizing the objective function defined in Eq. (7). However, the control parameters c1 and c2 of PSO are also optimized during the procedure by SA in an outer loop as shown in Fig. 3. In the regression part of the workflow, we can correlate the first factor to available permeability values (and other petrophysical parameters). In the prediction phase, we estimate the permeability to missing logging intervals directly from the factor scores using the regression model in Eq. (12).

Fig. 3
figure 3

Hyperparameter estimation using SA algorithm for improving the PSO based factor analysis method. Cognitive and social constants (c1, c2) are automatically refined to give an estimate to the factor scores, while the permeability (K) is directly derived from the first factor using regression analysis

3 Numerical results

3.1 Single borehole application

We test the feasibility of factor analysis using in situ well logging data in a hydrocarbon well (Well-6). For permeability estimation, we simultaneously process the caliper (CAL), natural gamma-ray intensity (GR), potassium-thorium product (K × TH), spontaneous potential (SP), deep resistivity (Rd), bulk density (ρb), neutron-porosity (ΦN) and acoustic interval time (Δt) logs. Our investigation covers a 290 m long interval with a 0.2 m sampling interval. At the start of the FA-PSO procedure, the input logs are standardized and gathered into column vector d(m) according to Eq. (6). Then the factor loadings are estimated for three factors by the non-iterative approximate method of Jöreskog (2007) based on Eq. (4). The resulting factor loadings are rotated by the varimax algorithm and are collected in Table 1. The first factor is strongly related to the natural gamma-ray intensity, potassium-thorium index and bulk density logs that make it a good lithological indicator, which is consistent with previous applications (Szabó and Dobróka 2017). The second factor log correlates with the neutron-porosity and acoustic logs, while the third factor is greatly sensitive to the variation of the deep resistivity log.

Table 1 Orthogonally transformed factor loadings estimated in Well-6, Pannonian Basin, Hungary

Once the factor loadings are determined, we optimize the factor scores by the algorithm of PSO. First, we give an initial solution for the factor scores by solving Eq. (5), thus we define the search space of the PSO process as [−12, 12]. We estimate the amount of variance explained by the factors based on the singular value decomposition of the reduced covariance matrix of standardized data RΨ = LLT. Since three factors explain more than 90% of total variance of the input data, we extract three factors from the dataset. Thus, the number of unknowns is 4,353 (3 factor × 1451 depth points). For estimating their optimal values by PSO, we initialize 90 particles within the pre-defined search space. They search for the solution by moving around in the search domain according to Eqs. (8)–(9) to minimize the objective function in Eq. (7) in 5,000 iteration steps. In our experiment, we treat the cognitive and social parameters of PSO as unknown. Our algorithm allows changing them freely, instead of selecting them as arbitrary constants. We set the initial value of c1 and c2 equal to 1, respectively, while parameter w is set according to Eq. (10). At the end of the process, the estimated values of hyperparameters are c1 = 1.44 and c2 = 2.06. The variation of the control parameters of PSO and data distance as functions of iteration steps is plotted in Fig. 4. At the end of the optimization phase, the data distance calculated by Eq. (7) reaches 0.53. Both plots show a convergent and stable optimization procedure. The independence of results on the random initial particle positions in PSO algorithm has been earlier investigated in the context of inverse problem solutions (Abordán and Szabó 2020).

Fig. 4
figure 4

Convergence of the 1D PSO based factor analysis procedure in Well-6, Pannonian Basin, Hungary. The variation of learning parameters (c1 plotted with dashed line, c2 plotted with continuous line) during the iterative procedure (left panel). The decrease of data distance (energy) vs. iteration steps (right panel)

To find the regression relationship between the first factor found by the FA-PSO method and decimal logarithm of absolute permeability, we properly scale the former into the range of 0 to 1. For regression analysis, we use permeability data deterministically derived from Eq. (11). The regression coefficients in Eq. (12) are found to be a = 5.90, b = 1.27, d = 2.28 while c is fixed at 1, because it shows no sensitivity in this case. The correlation relationship between the scaled first factor and absolute permeability is plotted in Fig. 5. The Spearman’s rank correlation coefficient of -0.77 between the first factor and the decimal logarithm of permeability indicates a significant (non-linear) inverse relationship. The input well logs of the FA-PSO procedure are plotted in Fig. 6 in the first 8 tracks (note that the depth scale represents relative depth coordinates). The last track indicates that the fit between the factor analysis derived permeability (lg(K)_FA-PSO) and that of derived deterministically by Eq. (11) (lg(K)_DET) is consistent along the investigated interval. The CPU time of the FA-PSO procedure for automatically determined hyperparameters is ~ 7 h (~ 3000 depth points), while for fixed hyperparameters it is 86 s using an octa-core processor workstation.

Fig. 5
figure 5

Regression relationship found between the first factor and the decimal logarithm of permeability measured in mD in Well-6, Pannonian Basin, Hungary

Fig. 6
figure 6

The result of factor analysis in Well-6, Pannonian Basin, Hungary. The input logs of FA-PSO procedure are plotted in tracks 1-8: borehole caliper (CAL), natural gamma-ray intensity (GR), potassium-thorium product (K×TH), spontaneous potential (SP), deep resistivity (Rd), bulk density (ρb), neutron-porosity (ΦN) and acoustic (P-wave) transit time (Δt). The scaled first factor log (F1) is plotted in track 9. In track 10, lg(K)_FA-PSO represents the factor analysis derived logarithmic permeability (originally given in mD), while lg(K)_DET shows the deterministic estimation for the same quantity

3.2 Test of hyperparameters

The learning parameters of classical PSO are usually set to default values based on recommendations (Kennedy and Eberhart 1995). We suggest an improved PSO algorithm to improve these parameters, which forms the basis of the workflow presented in Fig. 3. We focus on the estimation of parameters c1 and c2 to optimize the PSO process. As input, we select the GR, K × TH, SP, Rd, ρb, ΦN, Δt logs in a 20 m long interval in Well-3. At the beginning of the SA procedure, both parameters are set as 1 and in every iteration step their value is slightly altered by adding a random perturbation (δ) to both values. We generate parameter δ as random number in each iteration step from the range −δmax to δmax. We also decrease the maximum value of perturbation (δmax) iteratively using δmax = δmax × ε, where constant ε is smaller than 1. Then with these new hyperparameters (c1 and c2), PSO optimizes the factor scores in 200 iterations. We define the energy function of SA as Eq. (7). If the energy difference (ΔE) in two subsequent SA iterations of the algorithm is negative (i.e., PSO reached a lower data distance), then the new values of c1 and c2 are accepted and the procedure is continued. If the energy difference is greater than 0 (i.e., the efficiency of PSO got worse with the new parameters), then the new c1 and c2 hyperparameters can be still accepted if a random number generated from 0 to 1 is smaller than acceptance probability Pa = exp(− ΔE/T), where T denotes the actual temperature of the search. This mechanism is the basis of the SA algorithm because it enables the search to get out of local minima of the energy function. To ensure the convergence to the global optimum of the objective function, the temperature must be reduced logarithmically as suggested by Geman and Geman (1984). We change the temperature in the qth iteration as T(new) = T0/lg(1 + q), where the initial temperature of the artificial system T0 is experimentally set to 5 × 10–6.

We run the SA procedure for 100 iteration steps to optimize the values of c1 and c2 as shown in Fig. 7. The newly generated control parameters are tested in each SA iteration step by running the FA-PSO method for 200 iterations for three consecutive times. This is particularly advantageous especially at the early stages, where data distances remain rather large due to the suboptimal values of c1 and c2. The mean of the three consecutive FA-PSO runs for each SA iteration can be seen in Fig. 7 (on the right panel). As the figure indicates, the whole procedure was run 200 times to check its stability. After approximately 40 iteration loops, SA successfully optimizes the values of the learning parameters for the FA-PSO algorithm in all 200 separate program runs and data distance defined by Eq. (7) did not decrease any further. This shows the stability of the SA-based hyperparameter estimation method. In Fig. 8, we illustrate all those value pairs of c1 and c2 that were tried by SA during the iterations in 200 repeated program runs. Green dots indicate the ones that did not lead to the global optimum, while the blue ones enabled PSO to converge to the optimal solution of factor scores. According to the results, c1 has less effect on the convergence and can be selected from a wider range (0.25–2.7), while the optimal range for c2 is more restricted (1.7–2.6) when solving the problem of factor analysis. The fitted regression line indicates a moderate correlation between c1 and c2. Our result agrees well with those of the literature, for instance Cleghorn and Engelbrecht (2014) derived similar convergent regions for the sum of the two parameters. This test justifies the advantage of using SA to find the optimal combination of hyperparameters. The CPU time of the FA-PSO procedure for automatically estimated hyperparameters is ~ 70 s, while for fixed hyperparameters is 0.27 s using an octa-core processor workstation.

Fig. 7
figure 7

Convergence of data distance (right panel) by altering control parameters c1 and c2 of the FA-PSO procedure (left panel) repeated 200 times using the dataset of Well-3, Pannonian Basin, Hungary

Fig. 8
figure 8

Cross-plot of learning parameters c1 and c2 estimated by the 1D Particle Swarm Optimization based factor analysis (green and blue dots) performed in Well-3, Pannonian Basin, Hungary. Red line indicates linear relationship between those points that led the PSO based factor analysis procedure to the global optimum. For the latter, the Pearson's correlation coefficient −0.58 indicates moderate correlation between the two quantities

3.3 Multi-borehole application

We extend the improved factor analysis to multi-dimensional cases, which allows the simultaneous processing of well logs measured from several neighboring boreholes. Let the measured data vector be d(h) defined in Eq. (6) in the hth borehole (h = 1,2,…,H). Then the model of factor analysis becomes

$$ \left( \begin{gathered} {\mathbf{d}}^{(1)} \\ \vdots \\ {\mathbf{d}}^{(h)} \\ \vdots \\ {\mathbf{d}}^{(H)} \\ \end{gathered} \right) = {\text{diag}}\left( {{\tilde{\mathbf{L}}}^{(1)} , \ldots ,{\tilde{\mathbf{L}}}^{(h)} , \ldots ,{\tilde{\mathbf{L}}}^{(H)} } \right) \times \left( {\begin{array}{*{20}c} {{\mathbf{f}}^{(1)} } \\ \vdots \\ {{\mathbf{f}}^{(h)} } \\ \vdots \\ {{\mathbf{f}}^{(H)} } \\ \end{array} } \right) + \left( {\begin{array}{*{20}c} {{\mathbf{e}}^{(1)} } \\ \vdots \\ {{\mathbf{e}}^{(h)} } \\ \vdots \\ {{\mathbf{e}}^{(H)} } \\ \end{array} } \right) $$
(13)

where \({\tilde{\mathbf{L}}}^{(h)}\) represents the matrix of factor loadings and vector f(h) contains the estimated factor scores along the hth borehole. There is Nh number of measured depth points in the hth wellbore, therefore the total number of data processed by factor analysis is N* = N1 + N2 + … + NH. In Eq. (13), we determine the matrix of factor loadings in a similar manner as in the previously presented one-dimensional case by Eq. (4), while we approximate the optimal values of factor scores by PSO.

We show a 2D application of FA-PSO to derive the permeability variations in a hydrocarbon zone in the same area of the Pannonian Basin. We gathered the wireline logging data collected from five wells (Well-1, -2, -3, -4, -5) into one dataset. The borehole investigated in Sect. 3.1 (Well-6) falls farther from the line of the other wells, thus we excluded it from our calculation. The GR, ρb, ΦN, Δt and shallow resistivity (Rs) logs being completely available all depths serve as the input of 2D factor analysis. By combining the dataset of the five wells, the total number of data is 37,525. The wells are located along a 2,200 m long profile. By estimating two factors, the orthogonally transformed factor loadings are given in Table 2. When calculating the factor scores the learning parameters are simultaneously optimized. The resultant values of learning parameters are c1 = 1.59 and c2 = 2.23 (Fig. 9). Due to the increased number of unknowns, in this case the PSO routine requires 25,000 iteration steps with 90 particles. Based on the calculation of the singular values of the reduced covariance matrix of standardized wireline logs, the first factor explains 77% of the total data variance, while the remaining 23% is explained by the second factor. The first extracted factor correlates well with all processed well logs except the P-wave acoustic interval time log, while the second factor is quite the opposite.

Table 2 Rotated factor loadings estimated in Wells 1–5, Pannonian Basin, Hungary
Fig. 9
figure 9

Convergence of the 2D PSO based factor analysis procedure in Wells 1–5, Pannonian Basin, Hungary. The variation of learning parameters (c1 plotted with dashed line, c2 plotted with continuous line) during the iterative procedure (left panel), the decrease of data distance (energy) vs. iteration steps (right panel)

We find a strong correlation between the first scaled factor and the decimal logarithm of permeability in the investigated Miocene formations (Fig. 10). The Spearman’s rank correlation coefficient is -0.93. Using the proposed model in Eq. (12), we establish the local regression model, where the available permeability values are obtained from deterministic modeling using Eq. (11). The regression coefficients in this case are a = 7.18, b = 4.5, c = 1.2 and d = –3.8. By interpolating the output data by ordinary kriging, we derive the maps of the estimated factors and that of the permeability estimated by 2D FA-PSO method along the investigated cross-section (Fig. 11). The depth of potential reservoirs and sealing units can be recognized well on the image (note that the depth scale represents relative depth coordinates). Of course, care must be taken, the farther the wells are from each other, the more uncertainty there is in permeability prediction because of lithological and structural variations. The reliability of the results can be ensured with the integration of core measurements and surface geophysical measurements. The CPU time of the FA-PSO procedure for automatically determined hyperparameters is ~ 100 h, while for fixed hyperparameters is ~ 20 min using an octa-core processor workstation.

Fig. 10
figure 10

Regression relationship found between the first factor and the decimal logarithm of permeability measured in mD in Wells 1–5, Pannonian Basin, Hungary

Fig. 11
figure 11

Result of permeability prediction using 2D PSO assisted factor analysis for multiple wells. Top panel shows the section of the extracted first factor, the middle panel includes that of the second factor, bottom panel illustrates the estimated logarithmic permeability (originally given in mD) in Wells 1–5, Pannonian Basin, Hungary

3.4 Calibration by core data

The results of factor analysis highly depend not just on the way of optimization but also what type and quality of the input data we have in the phase of regression analysis (Fig. 3). We test the feasibility of the permeability prediction method using the combination of wireline logs and core plug data. Let us choose a well log suite from Well-3 in the investigated area, which include the natural gamma-ray intensity (GR), potassium-thorium product (K × TH), spontaneous potential (SP), deep resistivity (Rd), shallow resistivity (Rs), bulk density (ρb), neutron-porosity (ΦN) and acoustic interval time (Δt) logs. From core plugs, the equivalent liquid permeability (KL) is available, which is derived from the measured gas permeability (KG) as

$$ K_{L} = \frac{{K_{G} }}{{1 + (\kappa /P_{m} )}} $$
(14)

where κ is a constant for a particular gas in a given rock type and Pm is the mean pressure (Klinkenberg 1941). We emphasize that that laboratory measurements on core plugs and wireline logging are carried out under different conditions (e.g., temperature and pressure etc.) yielding sometimes considerable differences in predicted permeability values. Moreover, core data provide only local information about permeability, while well logs are influenced by a larger extent of the rock formation, therefore the misfit between the well log derived permeability and that of core data sometimes can be significant. To improve the strength of correlation for the variables of Eq. (12), we let the most of the observed information concentrate into only one factor. We conduct the regression analysis to find the relationship between the first factor and lab permeability on those locations where core derived permeability data is available. In the investigated depth interval, core data is available at 54 non-equidistantly sampled locations, therefore wireline logging data is gathered only from these locations and standardized to serve as the input of factor analysis. The processed interval is 18 m long. First, factor loadings are estimated by Eq. (4) only for one factor. We show the calculated and rotated factor loadings in Table 3. Based on the table, we can conclude that the extracted factor from the wireline logging dataset shows a strong correlation with the GR, K × TH, ρb and ΦN logs, separately.

Table 3 Rotated loadings of the first factor estimated in Well-3, Pannonian Basin, Hungary

Once the factor loadings are calculated, the factor scores are optimized by PSO in the same manner as detailed in Sect. 3.1 (in Well-6). First, we solve Eq. (5) for estimating the initial values of factor scores to define the search space of the FA-PSO process [-3, 3]. We extract only one factor from the well logging dataset, therefore the number of unknowns adds up to only 54 (1 factor × 54 depth points). For estimating their optimal values, we randomly initialize 45 particles in the pre-defined search space. They look for the solution of factor scores by moving around in the search space according to Eqs. (89) to minimize the objective function in 200 iteration steps earlier defined in Eq. (7). We let the control parameters (c1, c2) of the PSO algorithm change automatically. At the last iteration step, the resultant values of learning parameters are c1 = 1.98 and c2 = 1.91, while the data distance reaches 0.66 (Fig. 12).

Fig. 12
figure 12

Convergence of the 1D PSO based factor analysis procedure in Well-3, Pannonian Basin, Hungary. The variation of learning parameters (c1 plotted with dashed line, c2 plotted with continuous line) during the iterative procedure (left panel), the decrease of data distance (energy) vs. iteration steps (right panel)

To interpret the regression relationship between the first factor and the decimal logarithm of absolute permeability, first we scale the former into the range of 0 to 1. As a reference value, the equivalent liquid permeability data is used. The non-linear relationship according to the model suggested in Eq. (12) between the first scaled factor and permeability is plotted in Fig. 13. The regression coefficients are found to be a = 2.95, b = 2.10, d = 0.17 while c is fixed at 1. The Spearman’s rank correlation coefficient is –0.78, which indicates a strong non-linear relationship. Because of having some outlying values, we can see only a moderate deviation between the well log analysis derived permeability and the data available from core plugs, nevertheless, it is well described by Eq. (12). The input (observed) well logs of factor analysis are plotted in the first eight tracks in Fig. 14, the scaled first factor log is available in track 9 of the same figure. The last track contains the decimal logarithm of permeability derived by the FA-PSO method (lg(K)_FA-PSO) and the core data (lg(K)_CORE) (note that the depth scale represents relative depth coordinates). We conclude that the two independent methods determine similar permeability distribution with depth along the processed interval.

Fig. 13
figure 13

Regression relationship found between the first factor and the decimal logarithm of core permeability measured in mD in Well-3, Pannonian Basin, Hungary

Fig. 14
figure 14

The result of factor analysis in Well-3, Pannonian Basin, Hungary. The input logs of FA-PSO procedure are plotted in tracks 1-8: natural gamma-ray intensity (GR), potassiumthorium product (K×TH), spontaneous potential (SP), deep resistivity (Rd), shallow resistivity (Rs), bulk density (ρb), neutron-porosity (ΦN) and acoustic (P-wave) transit time (Δt). The scaled first factor log (F1) is plotted in track 9. In track 10, lg(K)_FA-PSO represents the factor analysis derived logarithmic permeability (originally given in mD), while lg(K)_CORE shows the core derived value for the same quantity

4 Discussion

Petrophysical properties including volumetric parameters such as porosity, shale volume and water saturation as well as permeability cannot be directly measured by downhole geophysical tools. Normally, we estimate them by inverting the observed wireline logs, which allows also the quantification of their estimation accuracy by using the law of error propagation. In the knowledge of the data covariance matrix including the data variances in its main diagonal, one can derive the covariance matrix of the estimated model parameters using linear inverse theory (Menke 1984). The PSO method as intelligent optimization tool has been implemented in a stochastic calibration context, specifically for the estimation of subsurface properties (Russian et al. 2019; Patani et al. 2021). Its efficiency has been shown in some applications in well logging inversion, too. For instance, by applying a series expansion based discretization scheme the so-called interval inversion method with a great overdetermination (data-to-unknowns ratio) may significantly reduce the uncertainty of the inverted parameters (Dobróka et al. 2016). Global optimization tools like PSO, SA and genetic algorithms can be effectively combined with linearized inversion techniques to solve such an inverse problem. By the successive application of the PSO technique and the interval inversion method, not only the computer processing time of PSO inversion can be reduced, but there is a possibility to calculate the estimation errors of petrophysical properties from the Jacobi matrix generated in a given linear inversion step (Abordán and Szabó 2020; Szabó and Dobróka 2020).

The problem with the estimation of permeability is that the permeability is not included at all in the theoretical tool response functions connecting the model parameters with the measured data (Alberty and Hashmy 1984). Thus, the quality check of this parameter is not possible within the inversion procedure. In the practice, the permeability is derived from the inversion results (or from different other sources), by using some empirical relation. In this study, the suggested method for permeability prediction is based on factor analysis and its results are all set in a deterministic framework. Our approach provides a quite straightforward estimation for permeability using the factor scores. However, it is possible to test the reliability of the regression model by performing noise sensitivity tests using real data. Consider the well logs and core data presented in Sect. 3.4. First we add 5% Gaussian distributed noise to the input logs, then we run the FA-PSO procedure, and we repeat this process 30 times (in different runs, the noise was always regenerated and added to the observed data). The histograms of the resultant regression coefficients (a, b, d) are plotted in Fig. 15. The average and ranges of their estimated values are: a = 2.8 ± 0.7, b = 2.25 ± 0.75, d = 0.25 ± 0.75 (coefficient c is fixed as earlier). The result of regression analyses can be seen in the bottom right panel of the figure, which shows the consistency of regression model (12). The uncertainty of permeability prediction is acceptably low and is comparable with the level of data noise. This is also proven in Fig. 16, where the error intervals of the input logs, and those of the first factor and permeability are plotted, respectively. It can be seen that the uncertainty of permeability prediction is higher in formations with lower permeability, but it is favorable that the same parameter in the hydrocarbon bearing zones can be more reliably produced. The results are also confirmed by the core data available in the processed interval.

Fig. 15
figure 15

Result of repeating the run of FA-PSO procedure 30 times. The input data were contaminated by 5% Gaussian distributed noise. Top panels and bottom left panel show the domain of estimated regression coefficients. Bottom right panel show the possible range of permeability values estimated by factor analysis

Fig. 16
figure 16

The reliability of factor analysis based permeability estimation method tested by 30 repeated runs in Well-3, Pannonian Basin, Hungary. The input logs and their assumed error ranges (at 5% Gaussian noise) are plotted in tracks 1-8: natural gamma-ray intensity (GR), potassium-thorium product (K×TH), spontaneous potential (SP), deep resistivity (Rd), shallow resistivity (Rs), bulk density (ρb), neutron-porosity (ΦN) and acoustic (P-wave) transit time (Δt). The scaled first factor log (F1) including its possible upper and lower boundaries is plotted in track 9. In track 10, lg(K)_FA-PSO represents the factor analysis derived logarithmic permeability and its uncertainty interval, while lg(K)_CORE shows the core derived value for the same parameter

5 Conclusions

In the paper, we introduce a particle swarm optimization assisted factor analysis with automatically optimized control parameters to permeability prediction in hydrocarbon reservoirs. The hyperparameter estimation based PSO process requires no prior knowledge on its learning parameters. A metaheuristic search is guided for estimating both the optimal set of cognitive and social constants and the factor scores. Based on the above, we perform exploratory factor analysis by a new alternative optimization tool. The advantages of the proposed method against traditional approaches are: (I) derivative-free solution is given, (II) optimal misfit is achieved between the measurements and predictions (as global optimization is made), (III) solution is practically independent from the initial selection of factor scores, (IV) highly adaptive procedure applied to different well log suites (see Table 13), (V) more reliable solution is given by automatically estimating the learning parameters of PSO than those treating arbitrarily chosen control parameters, (VI) robust solution can be assured using an appropriate choose of the misfit function. More robust norms like a most frequent value based or L1-norm can be substituted into the Euclidean norm based objective functions (Szabó et al. 2018). At the moment, the only serious drawback of the hyperparameter estimation method is its long computational running time. For automatically calculated hyperparameters, the CPU times increase approximately 300 times compared to those of a single run using fixed values of c1 and c2. However, based on preliminary program runs, parameters c1 and c2 can be determined for the investigated area and then fixed for factor analysis, which may reduce the running time by orders of magnitude. In the knowledge of proper initial values, instead of the SA algorithm, a linearized optimization procedure may be proposed to reduce the running time greatly.

We have established a well usable regression model between the first factor estimated by the particle swarm optimization based factor analysis and the permeability of hydrocarbon-bearing formations. It is noted that pilot tests confirmed the validity of the same regression relation in some hydrocarbon wells in the Carpathian Foredeep, South Poland (and hydraulic conductivity in Hungary and the USA). This allows the estimation of permeability from an independent well-log-analysis method, which is based on the comprehensive interpretation of multiple wireline logs. We have tested the developed method in single wells in Hungarian hydrocarbon-bearing formations, which was extended successfully to two-dimensional application to simultaneously evaluate the data measured in multiple boreholes. The factor analysis derived permeability is compared and verified both by deterministic modeling of well logs and laboratory based measurements. As a next research step, the inertia weight w might also be automatically selected for PSO, and the factor loadings will be used as unknowns as we have done it in our earlier applications. By using fast computers and the tools of parallel computing, a generalized FA-PSO method can be developed for a reliable and fast 3D permeability prediction in different oilfields.