A preliminary study on wellbore flow interpretation of fiber optic vibration signals based on K-means clustering algorithm

Wellbore flow analysis from optical fiber vibration signals relies on distributed optical fiber logging, which identifies the fluid in the well with a distributed optical fiber acoustic sensor (DAS) and a distributed optical fiber temperature sensor (DTS). Distributed optical fiber sensors cause little downhole disturbance and offer high efficiency and low cost. In this paper, the wellhead data extracted by the distributed optical fiber acoustic sensor are used to calculate the upper bound of the fluid sound frequency band in the pipe by nonlinear least squares fitting. The K-means clustering algorithm is then used to cluster the optical fiber vibration signals in the low frequency band. From the clustering results, the ratio of the optical fiber signal characteristic values (eigenvalues) of the production layers is obtained, and its trend is found to be close to the trend of the water absorption intensity. Compared with traditional acoustic logging, wellbore flow analysis with a distributed optical fiber acoustic sensor can quickly determine the production contribution of each layer and the change of fluid phase state during the production cycle. Combined with traditional production logging technology, distributed optical fiber logging shows reliability and accuracy in data collection, logging interpretation and production application.
Starting from the principle of distributed optical fiber acoustic sensing, this paper briefly describes the properties of the distributed optical fiber acoustic sensor and the principle of injection profile logging, systematically introduces the processing of distributed optical fiber acoustic data, and focuses on the accuracy of the K-means clustering algorithm in analyzing distributed optical fiber acoustic signals and qualitatively judging production layers, which provides a new idea for judging production layers. An automatic identification technique for distributed fiber optic logging based on a K-means clustering model is proposed. The technique uses distributed fiber-optic acoustic sensors to collect wellhead data during injection profile logging, and the analysis results are consistent with the water absorption intensity results of the production reservoir. Wellbore flow interpretation of optical fiber vibration signals based on the K-means clustering algorithm can effectively classify distributed optical fiber data and solve the content problem in production layer analysis. It also gives better accuracy than distributed temperature logging.


Introduction
Geophysical logging is a conventional and important means of finding oil and gas in the oil and gas industry. As the requirements for data acquisition continuity and resolution keep rising, traditional acoustic logging tools are increasingly unable to meet the technical needs of the exploration industry. Optical fiber sensing logging measures acoustic waves and other parameters over the whole well section through a single fiber. During logging, the optical cable does not need to be moved and the downhole dynamic environment is not disturbed, so logging is more efficient and accurate [1]. Distributed optical fiber sensing, with its relatively low maintenance cost and dense sampling, has gradually become a research hotspot [2].
Distributed optical fiber acoustic sensing uses the optical fiber itself as the sensor for signal acquisition. The distributed optical fiber sensor contains no electronic components, is immune to electromagnetic interference, withstands high temperature and high pressure, is chemically inert, and has stable properties [3][4][5]. The wellbore flow analysis of the optical fiber vibration signal is based on the vibration signal returned by the distributed optical fiber acoustic sensor. Following the production logging principle, the signal is collected and analyzed during water injection profile logging, and the characteristic signal obtained by the distributed optical fiber sensor is used to monitor the vibration. Compared with traditional downhole monitoring methods, the advantage of the distributed optical fiber acoustic sensor is that the fiber itself carries both the transmitted and the received signal, which allows real-time downhole monitoring in various situations such as production and injection and yields distributed data from the surface to the bottom of the well [6].
In this paper, images are constructed from the characteristic signal obtained by the distributed optical fiber sensor, using the different frequency and amplitude characteristics that arise when fluids with different flow rates and properties pass through pore throats of different radii [7,8]. The main frequency bands are determined by finding an appropriate mathematical model. Finally, the optical fiber characteristic values are calculated and analyzed within the determined main frequency bands to preliminarily determine the production layer.
Distributed optical fiber acoustic sensing is one of the latest research results in the field of optical fiber sensing. Although it emerged late, it was soon commercialized and rapidly applied to exploration and production by the oil and gas industry [9]. It has become an important emerging reservoir monitoring technology throughout the whole life cycle of oil wells [10]. Distributed optical fiber logging has gradually moved from qualitative to quantitative analysis of the fluid. By monitoring and identifying the vibration signals of distributed optical fiber acoustic sensors, the production contribution of each layer and the change of fluid phase state during the production cycle can be obtained effectively and reliably [11,12].
The next section briefly describes the layered injection allocation method and the principle of optical fiber vibration signal recognition. The second section introduces the preprocessing of the wellhead data extracted by the distributed optical fiber acoustic sensor. The third section introduces the K-means clustering algorithm and the qualitative judgment of the production layer in detail. The fourth section carries out the error analysis and gives the results. Finally, the fifth and sixth sections discuss the findings and give the conclusion.

Fiber optic vibration signal recognition principle
In this paper, the distributed optical fiber acoustic sensor is applied to injection profile logging. In layered injection allocation (Fig. 1), oil layers with similar properties and characteristics are combined into one water injection layer, and the layers that must be treated separately are isolated by packers. When the layers require different injection amounts that must be controlled, water distributors regulate the injection amount of each layer [13]. Water injection profile measurement shows the natural water intake of each injection layer as well as the water intake of each layered section and sub-layer after injection allocation, reveals the contradictions between injection layers, and also shows the water intake of different parts of the same injection layer. In evaluation and analysis, the water absorption of a perforated layer is related to its perforated thickness, namely the water absorption thickness, and is described by the water absorption intensity. In actual operation, the distributed optical fiber acoustic sensor is placed in the well, and the fiber, acting as a geophone, picks up the micro-vibrations produced by fluid flow in the well [14]. The distributed optical fiber acoustic sensor turns thousands of meters of standard telecommunication fiber into an array of micro-detectors. Coherent optical time-domain reflectometry is used to observe the weak backscattered signal caused by heterogeneities in the glass core of the fiber [15,16]. Parameters such as sampling rate, spatial resolution and channel number are optimized in the surface reading and writing unit, and the original acoustic data are then transmitted from the reading and writing unit to the processing unit [17,18].
Imaging is carried out according to the relationship between the frequency of the fluid flowing in the pore throat and the speed of the frictional vibration in the pore throat [19], and a spatial image with good accuracy is formed. This allows imaging not only of the wellbore but also of the formation near the wellbore.
In this paper, the wellhead data extracted by the distributed optical fiber acoustic sensor are used for imaging. Nonlinear least squares fitting is used to find the optimal fitting function, from which the upper bound of the fluid sound frequency band in the pipe is calculated. The K-means clustering algorithm is used to cluster the low frequency optical fiber vibration signals. From the clustering results, the ratio of the optical fiber signal characteristic values of the production layers is obtained, and its trend is judged to be close to the trend of the water absorption intensity.

Nonlinear least squares model
The least squares method finds, within a given function class, a 'best' function $s(x)$ for a given set of data points $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$, such that $s(x) \approx f(x)$ and the sum of squared deviations is minimal, so that $s(x)$ reflects the overall trend of the data. Its advantage is that, among all linear unbiased estimators of a linear model, the least squares estimator has the minimum variance (the Gauss-Markov theorem). Geometrically, the idea of least squares is to minimize the sum of squared distances between the observed points and the estimated points, that is, to find the curve $y = s(x)$ for which the sum of squared distances to the given points $(x_i, y_i)$ is smallest. The function $s(x)$ is therefore called the fitting function or the least squares solution, and the method of finding it is called least squares curve fitting.
The general formulation of least squares curve fitting is as follows. Given $m + 1$ measured data points $(x_i, y_i)$ $(i = 0, 1, \ldots, m)$ with weight coefficients $w_i$, and a functional model $s(x, c)$ in which $c = (c_0, c_1, \cdots, c_n)^T$ are parameters to be determined, a fitting function $s(x)$ is constructed to approximate the unknown function $f(x)$ such that

$$Q(c) = \sum_{i=0}^{m} w_i \left[ s(x_i, c) - y_i \right]^2 = \min,$$

where $Q$ is the weighted sum of squared deviations. For $Q$ to attain its minimum, it must satisfy the necessary conditions

$$\frac{\partial Q}{\partial c_k} = 0, \quad k = 0, 1, \cdots, n.$$

In nonlinear least squares fitting, $s(x, c)$ is a nonlinear function of the coefficients $c = (c_0, c_1, \cdots, c_{n-1}, c_n)^T$. The nonlinear problem can be transformed into a linear least squares fitting problem by local linearization. However, the calculation error grows with each step of this process, and for more complex functions it can seriously affect the fitting result. Applying a nonlinear calculation method directly can therefore effectively reduce the error.
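As an illustration of how a nonlinear least squares problem can be solved iteratively, the following minimal sketch applies Gauss-Newton iteration, which re-linearizes the model around the current estimate at every step. The paper does not state which solver was used, so the method, the single-exponential model `a * exp(b * x)`, the synthetic data and all function names are assumptions made only for this example.

```python
import math

def model(x, a, b):
    """Hypothetical nonlinear model s(x, c) = a * exp(b * x), with c = (a, b)."""
    return a * math.exp(b * x)

def gauss_newton(xs, ys, a, b, iterations=50, tol=1e-12):
    """Fit a and b by repeatedly solving the linearized least squares problem."""
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) step = J^T r.
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e        # residual at the current estimate
            da = e               # partial derivative of s with respect to a
            db = a * x * e       # partial derivative of s with respect to b
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        det = jaa * jbb - jab * jab
        if abs(det) < 1e-300:
            break  # singular system: stop rather than divide by zero
        # Solve the 2x2 system with Cramer's rule.
        step_a = (ga * jbb - gb * jab) / det
        step_b = (gb * jaa - ga * jab) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

# Noise-free synthetic data generated from a = 5, b = -0.3 (illustrative only).
xs = [0.5 * i for i in range(20)]
ys = [model(x, 5.0, -0.3) for x in xs]
a_fit, b_fit = gauss_newton(xs, ys, a=4.0, b=-0.2)
```

With a reasonable starting point the iteration recovers the generating parameters. On noisy field data a damped variant such as Levenberg-Marquardt is usually more robust than the plain iteration shown here.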

Pre-processing modeling
In this paper, the distributed optical fiber vibration signal data of Well X, located in the Third Oil Production Plant in Suning County, Hebei Province, were acquired with an optical fiber sensor.
According to the assumption, the frequency spectrum of the optical fiber vibration signal in Well X at a flow rate of 80 m³ is plotted (Fig. 2). It is generally believed that the upper bound of the fluid sound frequency band in the tube lies in the neighborhood of the locally enlarged region in Fig. 2. The spectrum in this band agrees with the approximately monotonic decline of the fluid signal in the tube (the gradient of the monotone function is nearly constant). A nonlinear function model is adopted, and the normal distribution function is used to fit the solution.
Find the optimal fitting function model: The function fits the data well (see the locally enlarged area in Fig. 3), and the model is then solved and analyzed.
The function coefficients are collected in $c = (a, b, c, d)^T$ and solved for, and the following equation is obtained: The derivative of the fitted function is then taken, and the frequency at which the first derivative falls below 1 in magnitude is taken as the upper bound of the fluid frequency band in the tube. Combined with the local frequency spectrum model in Fig. 3, this gives an upper bound of 60 Hz for the sound frequency band of the flow in the tube.
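The derivative criterion described above can be sketched as a simple scan over frequency. This is a minimal illustration that assumes a double-exponential fit with four coefficients; the coefficient values below are hypothetical and are not the values fitted for Well X, and the function names are introduced only for this example.

```python
import math

def fitted_spectrum(f, a, b, c, d):
    """Assumed double-exponential fit s(f) = a*exp(b*f) + c*exp(d*f)."""
    return a * math.exp(b * f) + c * math.exp(d * f)

def fitted_slope(f, a, b, c, d):
    """Analytic first derivative of the assumed fit."""
    return a * b * math.exp(b * f) + c * d * math.exp(d * f)

def band_upper_bound(coeffs, f_max, step=1.0, slope_limit=1.0):
    """Return the lowest scanned frequency where |s'(f)| < slope_limit, or None."""
    f = 0.0
    while f <= f_max:
        if abs(fitted_slope(f, *coeffs)) < slope_limit:
            return f
        f += step
    return None

# Hypothetical coefficients (a, b, c, d); NOT the values fitted for Well X.
coeffs = (100.0, -0.1, 20.0, -0.01)
bound = band_upper_bound(coeffs, f_max=500.0)
```

The scan step sets the resolution of the resulting bound, so it should be no coarser than the frequency resolution of the measured spectrum.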
From Figs. 2 and 3, we obtain the spectrum fitting comparison in Fig. 4. Combined with the analysis of model (1) above, it can be concluded that the upper bound of the fluid sound frequency band in the tube is about 60 Hz, but the double exponential function in the

Pre-processing results
To sum up, the band below 60 Hz in Well X is the acoustic frequency band of the fluid in the pipe (fluid friction in the annular casing of the injection well introduces large errors into the original acoustic signal). The fiber vibration and sound signal in this frequency band is therefore no longer considered. As shown in Fig. 5, the preprocessed results are more consistent.
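The preprocessing step can be sketched as discarding every spectral bin at or below the 60 Hz bound. The example below is a minimal illustration on a synthetic trace; the sampling rate, the two test tones and the function names are assumptions, and a naive DFT is used only to keep the sketch free of external libraries.

```python
import cmath
import math

def amplitude_spectrum(samples, fs):
    """Naive DFT: return (frequency_hz, amplitude) pairs up to the Nyquist bin.

    The 2/n scaling gives sine amplitudes for interior bins; the DC and
    Nyquist bins are overestimated by a factor of two, which does not
    matter for this illustration.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        spectrum.append((k * fs / n, 2.0 * abs(coeff) / n))
    return spectrum

def drop_pipe_band(spectrum, upper_bound_hz=60.0):
    """Discard every bin at or below the in-pipe fluid band upper bound."""
    return [(f, amp) for f, amp in spectrum if f > upper_bound_hz]

# Synthetic trace: strong 30 Hz in-pipe noise plus a weaker 120 Hz component.
fs, n = 512.0, 256
trace = [3.0 * math.sin(2 * math.pi * 30 * i / fs)
         + 1.0 * math.sin(2 * math.pi * 120 * i / fs) for i in range(n)]
full = amplitude_spectrum(trace, fs)
kept = drop_pipe_band(full)
dominant_before = max(full, key=lambda item: item[1])[0]
dominant_after = max(kept, key=lambda item: item[1])[0]
```

Before filtering the 30 Hz tone dominates the spectrum; after the band is dropped, the 120 Hz component is the strongest remaining bin.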

K-means clustering algorithm
Mathematical model of K-means. A dataset containing N tuples is divided into K groups (K < N), each of which is a cluster. The grouping satisfies two conditions: 1. Every group contains at least one data record; 2. Every data record belongs to exactly one group.
Algorithm steps: 1. Arbitrarily select K objects as the initial cluster centers; 2. Calculate the distance between each object and every cluster center, and assign each object to the nearest center; 3. Recalculate each cluster center as the mean of the objects assigned to it; 4. Repeat steps 2 and 3 until the cluster centers no longer change [20]. The resulting partition minimizes, at least locally, the following quantity:

$$J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \lVert x_i - m_j \rVert^2,$$

where $x_i$ is the position of the i-th sample point, $m_j$ is the position of the j-th cluster center, and $S_j$ is the set of sample points assigned to the j-th cluster.
K-means clustering algorithm pseudo-code
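The algorithm steps can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the paper: it assumes squared Euclidean distance and k-means++ seeding, and the function names and the small one-dimensional sample data are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each later center is drawn with probability ~ D^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        threshold = rng.random() * sum(weights)
        cumulative = 0.0
        chosen = points[-1]  # fallback guards against rounding at the end
        for p, w in zip(points, weights):
            cumulative += w
            if cumulative > threshold:
                chosen = p
                break
        centers.append(chosen)
    return centers

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd iterations; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are stable too
        labels = new_labels
        # Step 3: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

# Hypothetical one-dimensional amplitudes forming three tight groups.
signal = [(1.00,), (1.01,), (0.99,), (5.00,), (5.01,), (4.99,),
          (9.00,), (9.01,), (8.99,)]
centers, labels = kmeans(signal, k=3)
```

Sorting the returned centers by value gives a natural ordering into low, middle and high response groups. Because K-means converges only to a local optimum, production use would normally repeat the run with several seeds and keep the partition with the smallest within-cluster sum of squares.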

Data analysis
The target formations (water absorption layers) are extracted from the Well X data, and the relationship between the optical fiber vibration sound signal and the water absorption intensity (see Table 1) of five water absorption layers (sequence numbers 1, 2, 3, 7 and 8) is analyzed. The Fisher discriminant method is used to determine the appropriate number of classes for the indicators, and then the cluster indices and cluster centroids are calculated. The sequence numbers of the three clusters are shown in Table 2. As Table 2 shows, this paper uses the squared Euclidean distance as the distance measure and the k-means++ algorithm to initialize the cluster centers [21,22].

Clustering naming and explanation
According to Table 1, the index data (processed data) are divided into three categories, and the classification results are shown in Table 3. The first category is the cluster centroid of the optical fiber vibration and sound signal when there is no response; it measures the baseline response of the optical fiber vibration signal. The value of this cluster centroid is therefore defined as the basic value of the optical fiber sound signal. The second category is the cluster centroid of the optical fiber vibration and sound signal under normal response (the probability density lies in the neighborhood above and below the median); it measures the normal response of the optical fiber vibration signal. The value of this cluster centroid is therefore defined as the characteristics of the optical fiber sound signal. The third category is the cluster centroid of the optical fiber vibration and sound signal under abnormal response (the signal falls in the outlier group); it measures the abnormal response of the optical fiber vibration signal. The value of this cluster centroid is therefore defined as the abnormal value of the optical fiber sound signal.

Cluster score analysis
This paper requires the optical fiber signal characteristic value, which is defined as the optical fiber sound signal characteristics minus the basic value of the optical fiber sound signal.
Comparing the data in Table 3 with the violin plot in Fig. 6, it can be seen that the ratio of the optical fiber signal characteristic values and the water absorption intensity data in Table 1 follow a consistent trend of 2.6 : 1.1 : 5.0 : 2.9 : 8.8.
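The characteristic value and its ratio across layers can be computed as in the minimal sketch below. The centroid values are hypothetical placeholders and are not taken from Table 3; the function names are likewise introduced only for this example.

```python
def characteristic_values(feature_centroids, baseline_centroids):
    """Per-layer characteristic value: feature centroid minus baseline centroid."""
    return [f - b for f, b in zip(feature_centroids, baseline_centroids)]

def as_shares(values):
    """Scale the values so that they sum to 1, making layers comparable."""
    total = sum(values)
    return [v / total for v in values]

# Hypothetical centroids for five layers; NOT the Table 3 values.
feature = [130.0, 115.0, 160.0, 135.0, 200.0]    # normal-response centroids
baseline = [100.0, 100.0, 100.0, 100.0, 100.0]   # no-response centroids
c_t = characteristic_values(feature, baseline)
shares = as_shares(c_t)
```

Expressing the characteristic values as shares of their sum makes them directly comparable with water absorption intensity expressed in the same way.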

Results
In this paper, the wellhead data extracted by the distributed optical fiber acoustic sensor are used to calculate the upper bound of the fluid sound frequency band in the pipe by nonlinear least squares fitting. The low-frequency optical fiber vibration signal of the target formation is extracted and analyzed with the K-means clustering algorithm. The Fisher discriminant method is used to determine the appropriate number of classes for the indicators, the cluster indices and cluster centroids are calculated, and the cluster centers are initialized with the k-means++ algorithm. Finally, the characteristic value of the optical fiber signal of each production layer is obtained. Comparing the characteristic value of the optical fiber signal of each production layer with the water absorption intensity [23] (see Table 4), the results show a positive correlation (see Fig. 7). According to the positive correlation between formation yield and water absorption intensity, combined with the positive correlation between the optical fiber vibration signal of each production layer (light reflection count) and the flow rate of the production layers, it is judged that the main production layer is 2951. small. Combined with an example, it is speculated that the erosion zone around the borehole wall may have been enlarged by long-term water injection erosion and the decrease of formation pressure, so that the water absorption layer shows no abnormal amplitude. The cause may also be a different acoustic energy distribution between perforation clusters [24], which produces abnormal DAS signals in individual production layers over time, or the increase of optical signal loss with measurement length, which makes the measured data inaccurate [25].
Taking the water absorption intensity obtained by isotope tracer injection profile logging in traditional layered injection allocation as the reference, this paper analyzes the error of both the characteristic value of the fiber vibration signal and the production contribution rate measured by the distributed temperature sensor. (The isotope logging result is taken as the accurate value. The abscissa is the accurate water absorption and the ordinate is the water absorption interpreted from temperature and from vibration, on the same scale, so that accurate interpretations lie near the diagonal.) It can be seen that the overall relative error of the fiber signal characteristic values with respect to the water absorption intensity ratio is smaller than that of the production contribution rate (see Fig. 7, Table 5). Given the advantages of the distributed optical fiber sensor, such as sensitivity, accuracy, and pressure and temperature resistance, using the distributed optical fiber acoustic sensor for fluid analysis around the well makes data collection more efficient and accurate [26]. The K-means clustering algorithm is then used to analyze the optical fiber vibration signal and qualitatively determine the production layers, so that the judgment is less subject to interference and more accurate.
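The error comparison can be sketched as a per-layer relative error of each interpretation against the isotope reference, after all three are expressed as shares. The numbers below are hypothetical and only illustrate the calculation; they are not the values in Table 5, and the function names are assumptions.

```python
def shares(values):
    """Express per-layer values as fractions of their total."""
    total = sum(values)
    return [v / total for v in values]

def relative_errors(estimate, reference):
    """Per-layer relative error of estimated shares against reference shares."""
    est, ref = shares(estimate), shares(reference)
    return [abs(e - r) / r for e, r in zip(est, ref)]

def mean_relative_error(estimate, reference):
    """Average of the per-layer relative errors."""
    errors = relative_errors(estimate, reference)
    return sum(errors) / len(errors)

# Hypothetical per-layer values; NOT measured data from Well X.
isotope = [2.0, 1.0, 4.0, 3.0]   # reference water absorption intensity
das = [2.2, 0.9, 4.1, 2.8]       # fiber vibration characteristic values
dts = [3.0, 0.5, 3.0, 3.5]       # temperature-derived contribution
das_error = mean_relative_error(das, isotope)
dts_error = mean_relative_error(dts, isotope)
```

A smaller mean relative error means the interpretation lies closer to the diagonal of the cross-plot described above.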

Discussion
Distributed optical fiber acoustic sensing has become an important method for downhole fluid monitoring and identification. In this paper, the wellhead data extracted by distributed optical fiber acoustic sensors are used for fluid analysis around wells for the first time. The paper mainly introduces the processing of distributed optical fiber acoustic data and the use of the K-means clustering algorithm to analyze distributed optical fiber acoustic signals.
According to the hypothesis of Section 3.3, and to simplify the subsequent interpretation, let the basic value of the optical fiber sound signal be $C_1$, the characteristic of the optical fiber sound signal be $C_2$, the abnormal value of the optical fiber sound signal be $C_3$, and the characteristic value vector of the optical fiber sound signal be $C_t$, where each $C_i$ $(i = 1, 2, 3, t)$ is a vector that stores the corresponding signal and $C_t = C_2 - C_1$. Let the water absorption intensity of the strata be $F = (F_1, F_2, \cdots, F_n)$ and the production of the strata be $T = (T_1, T_2, \cdots, T_n)$. Under these assumptions the vector $C_t = (C_{t1}, C_{t2}, \cdots, C_{tn})$ can be obtained. For ease of comparison, identical subscripts in $F$, $T$ and $C_t$ are assumed to denote the same stratum. From the experiments above, we obtain:

$$C_{t1} : C_{t2} : \cdots : C_{tn} = F_1 : F_2 : \cdots : F_n.$$

According to the positive correlation between the yield of each formation and the water absorption intensity:

$$F_1 : F_2 : \cdots : F_n = T_1 : T_2 : \cdots : T_n.$$

In summary, it can be deduced that:

$$C_{t1} : C_{t2} : \cdots : C_{tn} = T_1 : T_2 : \cdots : T_n.$$

This shows that the characteristic value vector $C_t$ of the optical fiber sound signal in each production layer is positively correlated with the formation yield $T$, so the characteristic value vector of the optical fiber sound signal can be used to infer the formation yield.
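One way to check how closely the characteristic value vector follows the water absorption intensity is the Pearson correlation coefficient. The sketch below is a minimal illustration; the vectors are hypothetical stand-ins for $C_t$ and $F$, not measured values, and the choice of Pearson correlation as the measure is an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-layer vectors; NOT measured data from Well X.
c_t = [30.0, 15.0, 60.0, 35.0, 100.0]   # characteristic values C_t
f = [2.0, 1.2, 4.5, 2.6, 7.0]           # water absorption intensity F
r = pearson(c_t, f)
```

A coefficient close to 1 supports the proportional relation between $C_t$ and $F$; with only a handful of layers, however, the coefficient should be read as indicative rather than conclusive.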
The applicability of the distributed optical fiber acoustic sensor to injection profile logging is demonstrated here by applying distributed optical fiber logging and qualitatively judging the production layers. As a new judgment method, it has great development prospects.

Conclusion
In this paper, the fluid around the well is analyzed using the wellhead data extracted by the distributed optical fiber acoustic sensor. The upper bound of the fluid sound frequency band in the pipe is calculated by nonlinear least squares fitting. The K-means clustering algorithm is used to cluster the optical fiber vibration signals in the low frequency band. From the clustering results, the ratio of the optical fiber signal characteristic values of the production layers is obtained, and its trend is judged to be close to the trend of the water absorption intensity. From the positive correlation between formation yield and water absorption intensity, it is deduced that the ratio of the optical fiber signal characteristic values of the production layers is positively correlated with the formation yield.
For the first time, this paper uses a distributed optical fiber acoustic sensor to extract wellhead data from optical fiber vibration signals and qualitatively judges the production layer with the K-means clustering algorithm. It is an innovative and frontier judgment method. The production layers are qualitatively judged through the relation $C_{t1} : C_{t2} : \cdots : C_{tn} = T_1 : T_2 : \cdots : T_n$ between the characteristic values of the optical fiber signal and the formation yield, which provides a new idea for judging the accuracy of production layers.