1 Introduction

The State of Queensland is home to approximately nine thousand natural gas wells [1], where energy operators depend on positive displacement pumps to produce hydrocarbons from these geographically dispersed Coal Seam Gas (CSG) assets. As the natural gas supplied from these wells is critical to sustaining energy demand in domestic and international markets, operators need to avoid unplanned downtime caused by pump failures. To monitor pump performance, data acquisition and control systems are deployed across the entire fleet of CSG wells, where they gather and transmit time-series data from pumps and well sensors. Depending on the natural gas operating company, a petroleum engineer may be assigned to manage anywhere from fifty to a hundred wells. They monitor streaming time-series data to evaluate pump performance and anticipate any failure that may disrupt gas production. However, monitoring, analysing, and mitigating issues on a well-by-well basis is a tedious task, and most often, critical pump events are either missed or ignored [2]. Most importantly, CSG producers are looking to add several hundred wells in the coming years to sustain global energy demand, which will only exacerbate the real-time pump performance analysis issue. This is where machine learning-assisted pump performance analysis can improve pump life.

1.1 Drawbacks of time-series analysis methods used for artificial lift systems

Generally, time-series analysis of Artificial Lift systems is based on either fuzzy logic [3, 4], physics-based models [5], or machine learning-based pattern recognition methods [6,7,8,9]. However, such methods present a drawback when assessing an Artificial Lift system's performance, as they identify events without context, which may or may not impact pump performance. Moreover, these methods rely on labelled or known events, and any new or outlier events are not detected. Furthermore, it is rare to find labelled datasets for Artificial Lift applications. In most cases, the assistance of subject matter experts (SMEs) is required to label datasets for improved failure prediction results [10]. However, labelling patterns in raw time-series data is challenging for SMEs, as each pump presents a different performance profile, where the same anomaly or event may have very different characteristics, such as the amplitude and length of an event.

1.2 Limitations of time-series clustering methods

In a recently published paper benchmarking eight (8) well-known time-series clustering methods [11], the authors noted the following limitations of their evaluation:

  1. Uniform length time-series: The benchmarked methods mentioned in the paper above were tested on time-series data of uniform length for a pre-defined time-window length. However, time-series data from industrial sensors mostly have non-uniform lengths.

  2. Known number of clusters: The datasets tested to benchmark the clustering methods had a known number of clusters (or k values). As our previous publications have demonstrated [12,13,14], it is impossible to pre-define a set number of clusters for industrial time-series data, especially when dealing with data gathered from Artificial Lift Systems.

Another notable study on deep time-series clustering [15] mentions similar or related drawbacks. These will be discussed later in the Related Work section.

1.3 A practical approach for streaming time-series analysis of artificial lift systems

To address the drawbacks and limitations mentioned above, we propose a human-assisted approach to labelling clustered time-series data that can be utilized for running streaming performance analytics of positive displacement pumps.

Our research has two unique parts. First, we define a streamlined process to cluster multivariate time-series data. This process is based on our previous research work, in which we convert multivariate time-series data into performance heatmap images [14]. These images are then converted into unlabelled clusters based on the methodology defined later in the paper.

Second, to assist with the cluster labelling process, we developed a cluster analysis tool for engineers, through which they can apply their petroleum domain expertise to label events of interest. Through this tool, petroleum engineers can combine their expertise with streaming analytics and automate the process of labelling events of interest, allowing them to manage Artificial Lift Systems proactively. Furthermore, petroleum engineers from two CSG operating companies currently use the cluster analysis tool developed as part of this research for their daily analysis of approximately five hundred Progressive Cavity Pump (PCP) wells.

2 Overview of Coal Seam Gas production

In eastern Australia, natural gas is predominantly produced through CSG production, where coal seams are depressurized through a dewatering process that allows gas to escape from coal cleats and flow to the surface. Positive displacement pumps are installed in CSG wells, which produce water to the surface and, in the process, depressurize the coal seams. In the oil and gas industry, such pumps are referred to as Artificial Lift Pumps, and a network of these pumps collectively forms an Artificial Lift System. In Fig. 1 (Left), we see how water is displaced from the bottom of the well through the Production Tubing, and gas is produced via the production casing.

Fig. 1

(Left) Depiction of natural gas production from coal seam gas (CSG) wells [16], (right) production stages of a coal seam gas well [17]

A salient characteristic of CSG wells is that they have a shorter production life span, usually ten (10) years, compared to conventional gas-producing wells. This lifespan is shown in Fig. 1 (right), with three (3) distinct stages, where a large quantity of water is produced initially to depressurize the coal seams, followed by a production stage with an increase in gas production. Finally, gas rates decline towards the end of the production lifecycle.

As gas production declines quickly, CSG producers in Queensland must periodically drill and add new wells to maintain natural gas supplies. Hence, many CSG wells are dotted across Queensland, and this geographical spread and density are shown in Fig. 2.

Fig. 2

a Geographical Spread of CSG Wells in Queensland. b High-Density Well Population of CSG Wells (Source: Queensland Globe)

2.1 Progressive Cavity Pumps

As in any positive displacement pump, a rotor and a stator work in tandem to push liquid through the pump and achieve vertical hydraulic lift. Figure 3 shows the various components of a PCP assembly installed in a producing well. The rotor and elastomer (stator) assembly is designed such that the cavities between them push the fluid through when the rotor is operational.

Fig. 3

(Left) Main components of a PCP system. (Right) Cut-out view of PCP rotor and stator [18]

Equations (1) and (2) show the correlation between speed, flow and torque. Time-series trends of these three parameters provide the necessary operational details of PCPs over their lifetime. Hence, the multivariate time-series analysis of our study will focus on these three parameters.

The correlation between flow and speed is shown in Eq. (1) [18].

$$q_{th} = s \omega$$
(1)

where qth = theoretical flow, s = pump displacement, ω = rotational speed.

The correlation between torque and speed is shown in Eq. (2) [18].

$$T_{\text{pr}} = \frac{P_{\text{pmo}}\, E_{\text{pt}}}{C\,\omega}$$
(2)

where Tpr = polished rod torque, Ppmo = prime mover power, Ept = power transmission efficiency, C = constant, ω = rotational speed
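To illustrate Eqs. (1) and (2), the short Python sketch below evaluates the flow and torque relations; the numeric displacement, power, efficiency and speed values are purely hypothetical and are not taken from this study.

```python
# Minimal sketch of Eqs. (1) and (2). All numeric values below are
# hypothetical and only illustrate the form of the relations.

def theoretical_flow(displacement, speed):
    """Eq. (1): q_th = s * omega."""
    return displacement * speed

def polished_rod_torque(prime_mover_power, transmission_efficiency, speed, constant=1.0):
    """Eq. (2): T_pr = (P_pmo * E_pt) / (C * omega)."""
    return (prime_mover_power * transmission_efficiency) / (constant * speed)

q_th = theoretical_flow(displacement=0.2, speed=150.0)
t_pr = polished_rod_torque(prime_mover_power=15000.0, transmission_efficiency=0.9, speed=150.0)
print(f"theoretical flow: {q_th:.1f}, polished rod torque: {t_pr:.1f}")
```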

2.2 Data gathering from CSG wells

As CSG wells are located in remote and geographically dispersed areas, operators must utilize Supervisory Control and Data Acquisition (SCADA) systems to control wells through wireless telemetry. Ultra-high frequency (UHF) or microwave radios transmit data from CSG wells to a central control room. Figure 4 shows a layout of a typical CSG well site. The Remote Telemetry Unit (RTU) installed at each well site is responsible for recording data from multiple sensors and forwarding it to a central SCADA system. The data are stored and historized in data servers and delivered onwards to a corporate Historian database. It is important to note that data transferred via SCADA systems may not always have a fixed transmit rate; hence, data reporting is in most cases asynchronous, and time windows are not of identical length. Some SCADA systems use a report-by-exception approach, where data are only transmitted when a critical data point changes by more than a pre-set percentage. The report-by-exception method also produces data of unequal time windows.
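The sketch below illustrates, under an assumed 5% deadband, why report-by-exception transmission produces unequal time windows; the sample stream and threshold are hypothetical.

```python
# Sketch of report-by-exception: a reading is only transmitted when it deviates
# from the last transmitted value by more than a pre-set percentage (deadband).
# The 5% deadband and the sample stream below are hypothetical.

def report_by_exception(samples, deadband_pct=5.0):
    """Yield only the (timestamp, value) pairs that would be transmitted."""
    last_sent = None
    for ts, value in samples:
        if last_sent is None or abs(value - last_sent) / abs(last_sent) * 100.0 > deadband_pct:
            last_sent = value
            yield ts, value

stream = [(0, 100.0), (1, 101.0), (2, 100.5), (3, 112.0), (4, 111.5), (5, 95.0)]
print(list(report_by_exception(stream)))
# [(0, 100.0), (3, 112.0), (5, 95.0)] -> transmitted points are unevenly spaced,
# so the resulting time windows are not of identical length.
```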

Fig. 4

Layout and key components of a CSG well

3 Related work

Unlike univariate time-series data, applying anomaly detection and clustering methods to multivariate time-series data is a complex task which requires additional interpretation and insights [19]. In this section, we will further shed light on research gaps in multivariate time-series-based anomaly detection and clustering methods. Furthermore, our previous work on Symbolic Aggregation Approximation (SAX)-based performance heatmap conversion [14] will be discussed to demonstrate why this novel approach provides a better basis for a human-in-the-loop approach when clustering multivariate time-series data. Finally, we will discuss why a human-in-the-loop approach adds value to the time-series analysis process proposed in this paper.

3.1 Neural net-based anomaly detection

Neural nets have become a popular choice for solving anomaly detection problems in time-series data. One approach proposes using a fully convolutional network, U-Net, to identify anomalies in multivariate time-series data [20]. This method treats a fixed-length multivariate time-series snapshot as a multi-channel image. A U-Net segmentation technique is applied to obtain a final convolution layer corresponding to an augmentation map. The last layer includes the anomaly identification classes for the time-series snapshot, and each anomaly class is considered a mutually exclusive event. However, there are two drawbacks to this anomaly detection approach. Firstly, as the U-Net architecture accepts a fixed number of samples as input in a time window, the time-series data must be resized using up-sampling or down-sampling techniques. Secondly, as each anomaly is a mutually exclusive event, it is difficult to segregate anomalies of interest from a routine change in process behaviour.

Another neural net-based anomaly detection approach proposes a Multi-Scale Convolutional Recurrent Encoder–Decoder (MSCRED) method [21]. This method converts multivariate time-series data into signature matrices based on the pairwise inner-product of time-series data streams. The matrices are encoded using a fully connected convolutional encoder. A Convolutional Long Short-Term Memory (ConvLSTM) network is used to extract the hidden layer of each encoder stage, which is added to a convolutional decoder to produce a reconstructed signature matrix. The difference between the original signature and the reconstructed matrix is labelled as the residual signature matrix. This matrix defines a loss function that helps the model detect anomalies in multivariate time-series data. The residual signature matrix also helps determine the duration of anomaly events in time-series data based on small, medium, and large time-window duration.

Although the MSCRED methodology is novel in its approach, there are two limitations to using it for multivariate time-series analysis. Firstly, identifying anomaly events depends on the time-window duration. Therefore, the duration of the small, medium and large time windows has to be tuned based on the properties of the time-series data and the application where it is applied. Secondly, this approach does not consider the state of the process from time zero (t0), when the process was initiated for the first time. This restriction therefore fails to observe any changes in pump mechanical degradation, which can provide additional insights into time-series-based performance analysis.

3.2 Neural net-based time-series clustering

Multiple research papers have recently been published on the use of neural net-based time-series clustering methods [15, 22,23,24], for both univariate and multivariate datasets. These methods extract feature matrices which are fed to a neural net architecture to extract low-dimensional embeddings. The embeddings are then used to cluster the time-series data with a known clustering method, primarily the k-means method, which means the number of clusters must be known beforehand.

Although our approach is similar, we do not need to know the number of clusters beforehand. Most importantly, our low-dimensional embeddings are based on the novel approach of SAX-derived time-series performance heatmap images.

3.3 Converting time-series data into performance heatmap images

This section provides an overview of how the SAX-based performance heatmaps are created for improved understanding of Artificial Lift Performance analysis and, more importantly, how these images provide contextual clustering of multivariate time-series data.

3.3.1 Expanding window technique

To understand how PCPs operate in CSG operations, it is essential to look at their performance from the day they are initiated into operation for dewatering wells. For this purpose, we use the expanding window technique shown in Fig. 5, which evaluates the multivariate data in the expansion stride based on the elapsed pump performance. By doing so, the exploratory data analysis methods utilized for performance analysis can capture the varying mechanical dynamics in the PCP through the pump's life.
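A minimal sketch of the expanding window idea is shown below: each window starts at t0 and grows by a fixed expansion stride, so every window contains the pump's full elapsed history. The stride length and the random stand-in data are hypothetical.

```python
import numpy as np

def expanding_windows(data, stride):
    """Yield windows [0:stride], [0:2*stride], ..., always anchored at t0."""
    for end in range(stride, len(data) + 1, stride):
        yield data[:end]

rng = np.random.default_rng(0)
series = rng.normal(size=(1000, 3))      # stand-in columns: flow, torque, speed
for window in expanding_windows(series, stride=250):
    print(window.shape)                  # (250, 3), (500, 3), (750, 3), (1000, 3)
```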

Fig. 5

Depiction of the Expanding Window technique, which captures PCP performance variation from time t0

3.3.2 Symbolic aggregation approximation (SAX)-based performance heatmaps for PCPs

Performance heatmaps help capture the temporal variation and time-window-based impact of multiple sensor readings in a single image [12]. By converting time-series data into performance heatmaps, it is possible to visualize the sequential variation in sensor readings while understanding the influence of change in sensor values between time windows. Furthermore, the performance heatmap approach is exempt from some of the shortcomings of other time-series-based image conversion techniques.

While other time-series-to-image conversion methods require a fixed sampling rate for each analysed time window to produce consistent images, the performance heatmap technique overcomes this deficiency by converting sensor values into Symbolic Aggregation Approximation (SAX) symbols [27]. The SAX symbols obtained through the conversion of time-series data are transformed into a symbol matrix and then converted to a performance heatmap, an example of SAX-based time-series image conversion [12]. Figure 6 shows a 1-h time-series trend of flow, speed and torque converted to a performance heatmap.
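For readers unfamiliar with SAX, the sketch below shows a generic conversion of a sensor trace into SAX symbols (z-normalisation, piecewise aggregate approximation, Gaussian breakpoints). It only illustrates the symbolisation step, not the exact heatmap pipeline of [12]; the segment count and alphabet size are assumptions.

```python
import numpy as np
from scipy.stats import norm

def sax_symbols(series, n_segments=8, alphabet="abcd"):
    """Convert a univariate series into a SAX word of length n_segments."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-9)                          # z-normalise
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    breakpoints = norm.ppf(np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[int(np.searchsorted(breakpoints, v))] for v in paa)

speed = np.concatenate([np.full(30, 150.0), np.full(30, 80.0)])    # step change in speed
print(sax_symbols(speed))   # symbols move from 'd' towards 'a' after the step change
```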

Fig. 6

An example of SAX based time-series image conversion [12]

Moreover, most image conversion techniques [28] are developed for univariate time-series data. Although some techniques convert multivariate data into images [29], they mostly rely on converting each univariate series into an image and then stacking the images horizontally or vertically to create a single 2D image.

3.3.3 Majority and anomaly heatmap images

Once the performance heatmaps are created, they can be split into majority and anomaly event images. Table 1 shows the time-based colour code used to label major, variation and anomaly events in a performance heatmap. In this study, we will only focus on majority and anomaly events, as the variation events are events in transition that are not significant in deducing any abnormal behaviour of the PCPs.

Table 1 Color code for performance heatmaps based on the counts of SAX symbols in a 1-day window

Figure 7 shows how a performance heatmap is split into majority and anomaly event images. We will use these images to create our unsupervised image cluster library for time-series data. Each image has a dimension of 48 × 48 × 3 (6912 pixels).

Fig. 7

Splitting a performance heatmap to majority and anomaly event images [12]

Figure 8 provides a breakdown of the majority and anomaly heatmaps, where the three (3) parameters (flow, torque and speed) are represented in their respective columns. The position of the coloured boxes provides the state of each parameter, which can be low, medium or high. These states can be used in the cluster labelling process to group various time-series clusters into similar performance and anomalous event categories.

Fig. 8

a Majority heatmap depicting medium flow, high torque and low speed. b Anomaly heatmap depicting medium flow, high torque and no speed event

3.4 Advantages of a human-in-the-loop approach for data labelling

Multivariate time-series data collected from industrial processes are seldom labelled, and hence, extracting any meaningful information from such data requires input from domain experts [19, 25]. This holds true for CSG operations, where the geophysical dynamics between coal seams and the pump require inference from domain experts for correct interpretation of normal and abnormal behaviour. By adding human inference to unlabelled data sets, it becomes easier for domain experts to accept the results generated by machine learning solutions [26].

In our methodology, we rely on cluster labelling from petroleum engineers to correctly identify various performance states of PCPs used in Artificial Lift Systems. The SAX performance heatmaps provide petroleum engineers with a visual context as to why different clusters are identified based on the majority and anomaly heatmaps. In the next section, we will cover in detail the methodology to cluster SAX-derived performance heatmaps and how petroleum engineers label these clusters via a cluster analysis tool.

4 Methodology

In this section, we will discuss the methodology shown in Fig. 9, which is used to cluster unlabelled time-series data:

  1. Develop a Performance Heatmap-specific Auto-Encoder: This step will be used as the first dimensionality reduction method to reduce the size of the images, which will lower memory usage and improve calculation times on the computer used for conducting the experiments.

  2. Embedding-based Dimensionality Reduction: In this step, we will experiment with various dimensionality reduction methods (DRMs) and reduce the dimension of the auto-encoded images to a 2-dimensional plane. Doing so allows the performance heatmap groupings to be visualized in an XY plot.

  3. Hierarchical Density-Based Spatial Clustering (HDBSCAN): We will use HDBSCAN to identify clusters within the 2-dimensional plane produced by each DRM in step 2.

  4. Cluster Labelling: Once the images have been clustered, we will assign a label to known historical events from a selected number of wells using a cluster analysis tool. Once these events are labelled, we will use an automated cluster labelling pipeline to identify events on real-time data.

Fig. 9

An overview of the methodology steps

Assumptions

Our methodology works under the following assumptions:

  1. Availability of Key Data Variables: To analyse PCP performance, the time-series data should include flow, torque and speed variables. These three variables are needed to produce the SAX performance heatmaps required for the clustering process. For other multivariate time-series applications, key variables should be defined based on the process being analysed.

  2. Domain Expertise: Petroleum engineers using the cluster analysis tool should have relevant experience in their field to properly label SAX performance heatmap clusters.

  3. Data Completeness: The multivariate dataset used for the clustering process must cover the entire operation cycle of a PCP in CSG operation, i.e. the time-series data from the beginning to the end-of-life of PCPs. This will help capture various performance heatmaps over the life cycle of CSG wells.

Experiment tracking setup

We used Weights & Biases [30] for experiment tracking and visualizations to develop insights for this paper. The Weights & Biases application allows automated tracking of machine learning experiments through Code Sweeps. Through this, multiple combinations of model training, hyperparameter tuning and clustering results can be captured and visualized to obtain the best results for machine learning projects. Figure 10 provides an overview of how multiple sweep experiments can be recorded and visualized to provide actionable insights into the effect of different layer properties for a deep auto-encoder (DAE). All coding for these experiments was done using Python 3.7 and necessary statistics, computer vision and machine learning libraries suited for this Python version.
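As an illustration of the sweep setup, the hedged sketch below defines a small grid sweep over DAE channel sizes with the wandb package; the project name, the train() stub and the logged value are placeholders rather than the exact configuration used in this study.

```python
import wandb

# Hypothetical sweep over two DAE layer widths; val_loss is the tracked metric.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "layer1": {"values": [16]},
        "layer2": {"values": [16, 8, 4]},
    },
}

def train():
    run = wandb.init()
    cfg = wandb.config
    # ... build and fit an auto-encoder using cfg.layer1 / cfg.layer2 ...
    wandb.log({"val_loss": 0.05})   # placeholder value for this sketch
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="pcp-heatmap-clustering")  # hypothetical project name
wandb.agent(sweep_id, function=train)
```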

Fig. 10

An example of a Weights and Biases (WANDB) Sweep to analyse the effect of various deep auto-encoder (DAE) layers on validation loss

4.1 I. Auto-encoder-based dimensionality reduction

This section will look at selecting the optimum auto-encoder (AE) to reduce the performance heatmaps to a compact latent representation. The SAX-based performance heatmaps have a dimension of 48 × 48 × 3 pixels (6912 pixels in total). To evaluate clustering results through an X–Y scatter plot, the data must eventually be represented in a two-dimensional space. We will utilize an AE approach to minimize pixel dimensionality. Furthermore, reducing dimensionality allows us to examine many images due to the reduced processing memory requirement, which improves the overall clustering analysis of the Performance Heatmaps.

4.1.1 i. Deep auto-encoder (DAE)

We will start the experiment by developing a DAE that reduces the performance heatmap to fewer dimensions. Figure 11 shows a fully connected neural network with an input, hidden and output layer. These layers form a DAE, where the hidden layer is the reduced latent representation of the input layer. The output layer is the input layer reconstruction based on the hidden layer's interpretation. To gauge the performance of the DAE, we track the validation loss (val_loss), where the lowest value determines the best performing DAE architecture.
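A minimal Keras sketch of such a fully connected DAE is given below for the flattened 48 × 48 × 3 heatmaps; the 16/16 channel widths correspond to the two-layer sweep discussed next, while the activation, optimiser and loss choices are assumptions.

```python
from tensorflow.keras import layers, models

INPUT_DIM = 48 * 48 * 3   # 6912 pixels per flattened performance heatmap

def build_dae(layer1=16, layer2=16):
    inputs = layers.Input(shape=(INPUT_DIM,))
    encoded = layers.Dense(layer1, activation="relu")(inputs)
    encoded = layers.Dense(layer2, activation="relu")(encoded)        # reduced latent representation
    decoded = layers.Dense(layer1, activation="relu")(encoded)
    decoded = layers.Dense(INPUT_DIM, activation="sigmoid")(decoded)  # reconstruction of the input
    model = models.Model(inputs, decoded)
    model.compile(optimizer="adam", loss="mse")
    return model

dae = build_dae()
dae.summary()
# dae.fit(train_imgs, train_imgs, validation_data=(val_imgs, val_imgs))  # tracks val_loss
```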

Fig. 11

An overview of a fully connected neural network with an input, hidden and output layer

Step 1: 2-layer DAE sweep run

In Step 1, we begin the experiment by evaluating a two-layer DAE to gauge the behaviour of val_loss over different channel sizes. The parameters for the first sweep run are shown in Table 2.

The settings shown in Table 2 run six (6) sweep experiments and measure the val_loss for different layer combinations. Figure 12 shows that for a two-layer deep neural network, a 16-channel layer2 produces the minimum val_loss compared to an eight or four-channel layer2. As shown in Fig. 13, the decoded image for a 16 × 8-channel DAE configuration does not accurately represent the original image. However, as shown in Fig. 14, the decoded image for a 16 × 16-channel DAE configuration is a more accurate copy of the original image. Table 3 confirms that sweep-4, where layer1 and layer2 are sixteen-channel each, produces the best val_loss for a two-layer DAE.

Table 2 Parameter settings for a 2-layer DAE sweep experiment
Fig. 12

Results showing the val_loss from a 2 Layer DAE Sweep Run

Fig. 13

Result of 16 × 8 DAE showing the Decoded Image versus the Input Image

Fig. 14

Result of 16 × 16 DAE showing the decoded image versus the input image

Table 3 val_loss results from the 2 Layer DAE sweep run

Step 2: 3-layer DAE sweep run

In this step, we will add a third layer to the DAE and attempt further dimensionality reduction. As per Table 4, which shows the setup for the sweep experiment used in this step, we will try dimensions of 8, 4 and 2 for the third layer of the DAE.

Table 4 Parameter settings for a 3-layer DAE sweep experiment

Figure 15 provides an overview of the three-layer DAE sweep experiment. A 16 × 16 × 8 DAE produces the minimum val_loss, and the results of this layer configuration are shown in Fig. 16. Results from all three (3) sweep runs are summarized in Table 5.

Fig. 15

Results showing the val_loss from a 3-layer DAE sweep run

Fig. 16

Result of 16 × 16 × 8 DAE showing the decoded image versus the input image

Table 5 val_loss results from the 3-layer DAE sweep run

Step 3: 4-layer DAE sweep run

In this step, we will experiment with dimensions of 8, 4 and 2 in the fourth layer of the DAE. Table 6 shows the setup for this sweep experiment.

Table 6 Parameter settings for a 4-layer DAE sweep experiment

Figure 17 shows that further dimensionality reduction to 2 or 4 channels increases the val_loss; hence, further reduction from 8 channels is not feasible. However, an eight (8) channel fourth layer does improve the overall val_loss of the DAE from 0.02292 (Table 5) to 0.022079 (Table 7). Results from the four-layer DAE are shown in Fig. 18, which validates that reducing dimensionality below 8 channels is not feasible with a DAE. Hence, our final DAE configuration is 16 × 16 × 8 × 8 for reducing the time-series heatmaps from 6912 pixels (48 × 48 × 3) to 8 dimensions.

Fig. 17

Results showing the val_loss from a 4-layer DAE sweep run

Table 7 val_loss results from the 4-layer DAE sweep run
Fig. 18

Result of 16 × 16 × 8 × 8 DAE showing the decoded image versus the input image

4.1.2 ii. Convolutional auto-encoder

To see if the val_loss and dimensions can be reduced further, we will use a four-layer convolutional auto-encoder (CAE) architecture. Table 8 shows the Sweep experiment parameters that are investigated using a four-layer architecture to see if the CAE can reduce the image to 8 or fewer dimensions while improving val_loss.

Table 8 Parameter settings for a 4-layer CAE sweep experiment

As shown in Fig. 19, the CAE val_loss for fewer than 8 dimensions (i.e. 4 or 2) in the fourth layer is relatively high compared with 8 dimensions. However, the 16 × 16 × 8 × 8 CAE configuration further reduces the val_loss compared to the DAE. Table 9 shows the comparison between the DAE and CAE val_loss. Based on this result, we will use a 16 × 16 × 8 × 8 CAE to encode the 48 × 48 × 3 major and anomaly event images to 8 dimensions. The final CAE architecture to encode the images is shown in Fig. 20.

Fig. 19

Sweep run confirming that the 16 × 16 × 8 × 8 CAE architecture produces the best val_loss for a 4-layer CAE

Table 9 Comparison of val_loss between the 4-layer DAE and CAE
Fig. 20

16 × 16 × 8 × 8 Convolutional auto-encoder
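The sketch below outlines one possible Keras realisation of the 16 × 16 × 8 × 8 CAE; the kernel sizes, strides and the dense bottleneck that flattens the feature maps to 8 latent dimensions are assumptions, since only the channel progression is specified above.

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(48, 48, 3))
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inputs)  # 24x24x16
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(x)       # 12x12x16
x = layers.Conv2D(8, 3, strides=2, padding="same", activation="relu")(x)        # 6x6x8
x = layers.Conv2D(8, 3, strides=2, padding="same", activation="relu")(x)        # 3x3x8
latent = layers.Dense(8, activation="relu")(layers.Flatten()(x))                # 8-dimensional code

x = layers.Dense(3 * 3 * 8, activation="relu")(latent)
x = layers.Reshape((3, 3, 8))(x)
x = layers.Conv2DTranspose(8, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(8, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
outputs = layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid")(x)

cae = models.Model(inputs, outputs)
cae.compile(optimizer="adam", loss="mse")
encoder = models.Model(inputs, latent)   # used later to produce 8-d codes for clustering
```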

4.2 II. High-density dimensionality reduction

We have demonstrated that the time-series-based images can be reduced to a latent size of eight (8) dimensions with a convolutional auto-encoder. However, to provide a visual distribution context to time-series image clustering, we need to reduce the number of dimensions to two (2), and this can be achieved by utilizing high-density dimensionality reduction techniques. For this paper, we will experiment with three (3) methods: t-distributed stochastic neighbour embedding (t-SNE) [31], uniform manifold approximation and projection (UMAP) [32], and the minimum-distortion embedding (MDE) [33] method. These methods take high-density multi-dimensional points and assign them to a two-dimensional map.
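A hedged sketch of how the 8-dimensional codes could be reduced to 2-D with each method is shown below, assuming the scikit-learn, umap-learn and pymde packages are available; hyperparameters are left at their defaults rather than the values tuned in our experiments, and the input array is a random stand-in for the CAE codes.

```python
import numpy as np
from sklearn.manifold import TSNE
import umap
import pymde

codes = np.random.rand(5000, 8).astype(np.float32)   # stand-in for the 8-d CAE codes

tsne_2d = TSNE(n_components=2).fit_transform(codes)
umap_2d = umap.UMAP(n_components=2).fit_transform(codes)
mde_2d = pymde.preserve_neighbors(codes, embedding_dim=2).embed().numpy()

print(tsne_2d.shape, umap_2d.shape, mde_2d.shape)    # each (5000, 2)
```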

4.2.1 i. t-Distributed stochastic neighbour embedding (t-SNE)

t-SNE determines the conditional probability of high-dimensional data points by computing the Euclidean distance between the points. The probability represents similarities between two points and determines if these points could be picked as neighbours [31]. The probability \({p}_{j\mid i}\) is represented by Eq. (3) [31], where \({x}_{i}\) and \({x}_{j}\) are the data points being compared for similarity. Figure 21 depicts the t-SNE distribution for various numbers of major heatmap images. This distribution provides abstract localization with no recognizable high-density areas for the images.

$$p_{j\mid i} = \frac{\exp \left( - \parallel x_{i} - x_{j} \parallel^{2} / 2\sigma_{i}^{2} \right)}{\sum_{k \ne i} \exp \left( - \parallel x_{i} - x_{k} \parallel^{2} / 2\sigma_{i}^{2} \right)}$$
(3)

where \(p_{j\mid i}\) = conditional probability.

Fig. 21

a t-SNE distribution for 172,193 major heatmap images, b t-SNE distribution for 554,436 major heatmap images, c t-SNE distribution for 817,475 major heatmap images

4.2.2 ii. Uniform manifold approximation and projection (UMAP)

Although t-SNE and UMAP share some clustering similarities [32], UMAP differentiates itself by computing high- and low-dimensional similarities for the distances between two points. Equations (4) and (5) [32] provide an overview of how these similarities are calculated.

$$v_{j\mid i} = \exp \left[ \frac{ - \left( d\left( x_{i}, x_{j} \right) - \rho_{i} \right)}{\sigma_{i}} \right]$$
(4)

where \({v}_{j\mid i}\)= high dimensional similarities, \({\sigma }_{i}\) = normalizing factor, \({\rho }_{i}\) = distance to the nearest neighbour

$$w_{ij} = \left( {1 + a\parallel y_{i} - y_{j} \parallel_{2}^{2b} } \right)^{ - 1}$$
(5)

where \(w_{ij}\) = low-dimensional similarities.

Figure 22 shows the UMAP distribution for various numbers of heatmap images. As highlighted in Fig. 23, the high-density groupings are visible within the overall distribution.

Fig. 22

a UMAP distribution for 172,193 major heatmap images, b UMAP distribution for 554,436 major heatmap images, c UMAP distribution for 817,475 major heatmap images

Fig. 23

a UMAP distribution of major heatmaps (554,436 Images), b observations of high-density areas in the UMAP distributions, which are separated from low-density areas

4.2.3 iii. Minimum-distortion embedding (MDE)

As the name suggests, the MDE dimensionality reduction method (DRM) embeds items as vectors whose pairwise distances are evaluated by distortion functions. Equation (6) [33] shows how the average distortion of an embedding is calculated, which MDE aims to minimize. Like UMAP, similar items will have vectors near each other, and dissimilar items will have vectors far apart.

$$E(X) = \frac{1}{\left| \mathcal{E} \right|} \sum_{(i,j) \in \mathcal{E}} f_{ij}\left( d_{ij} \right)$$
(6)

where X = embedding, \(d_{ij} = \parallel x_{i} - x_{j} \parallel_{2}\) = embedding distance between items i and j, \(f_{ij}\) = distortion functions, \(\mathcal{E}\) = set of item pairs.

Figure 24 depicts the distribution of the embeddings for various numbers of major heatmap images. Again, a concentrated mass in the centre represents similar vectors, and dissimilar vectors are spread around the concentrated group.

Fig. 24

a MDE distribution for 172,193 major heatmap images, b MDE distribution for 554,436 major heatmap images, c MDE distribution for 817,475 major heatmap images

Figure 25b shows a zoomed-in view of the concentrated mass of the similar vectors, and Fig. 25c shows how this mass further consists of neighbourhoods of high-density areas.

Fig. 25

a MDE distribution for 554,436 major heatmap images, b zoomed view of the high-density area, c zoomed view of the neighbourhoods within the high-density MDE area

4.3 III. Hierarchical density-based spatial clustering (HDBSCAN)

In this step, we will use HDBSCAN to conduct unsupervised clustering of the major heatmap images. The HDBSCAN algorithm is a density-based clustering method, where a simplified cluster tree is produced from which significant clusters are extracted [34].

Based on the experiment run, we see that the t-SNE 2-D embedding produced an increasing number of clusters as the number of images increased. However, UMAP and MDE maintain very similar cluster counts as the number of images increases. These results are shown in Fig. 26.

Fig. 26

Experiment results highlighting the progression of the number of clusters when running HDBSCAN clustering on different DRM methods

The experiment parameters are set as per Table 10. The cluster size determines the minimum number of samples a cluster must contain to be considered unique, and the sample size determines the critical mass within the cluster neighbourhood [35]. As per Table 10, our cluster size is 5 and our sample size is 200, which means that a cluster must contain at least 5 points, while 200 neighbouring points are required for a point within a cluster to be treated as a core point.
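The parameters in Table 10 appear to correspond to the hdbscan package's min_cluster_size and min_samples arguments; a minimal sketch under that assumption is shown below, with a random stand-in for the 2-D embedding.

```python
import numpy as np
import hdbscan

embedding_2d = np.random.rand(10000, 2)    # stand-in for the 2-D UMAP/MDE embedding

clusterer = hdbscan.HDBSCAN(min_cluster_size=5,       # "cluster size" in Table 10
                            min_samples=200,          # "sample size" in Table 10
                            prediction_data=True)     # retained for later streaming prediction
labels = clusterer.fit_predict(embedding_2d)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_outliers = int(np.sum(labels == -1))
print(f"{n_clusters} clusters, {n_outliers} outliers")
```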

Table 10 Parameter settings for the HDBSCAN unsupervised clustering experiment

Figure 27 shows the minimum and maximum number of clusters produced by each DRM. Based on this experiment run, we will discard t-SNE from any further assessment. Also, the UMAP clustering with HDBSCAN provides the narrowest distribution of clusters across the number of images used in this experiment. Based on this information, we will take an in-depth look at the UMAP and MDE clusters to understand how the images are sorted in the 2D plane and confirm which DRM provides us with a usable cluster distribution.

Fig. 27

Minimum and maximum number of clusters per DRM. The minimum cluster number corresponds to images from 10 wells, and the maximum cluster number corresponds to images from 70 wells

4.3.1 i. Clustering analysis

To understand the cluster formation in the MDE and UMAP reduction methods, we need to look at how the number of clusters and outliers varies with different combinations of cluster size and sample size parameters in HDBSCAN. To do this, we will run a clustering experiment with the parameters shown in Table 11.

Table 11 Parameter Settings for the HDBSCAN unsupervised clustering experiment

Figure 28 shows how cluster size and sample size impact the cluster and outlier counts for the MDE and UMAP reduction methods. Clustering the UMAP distribution provides consistent cluster counts, with zero outliers in most cases. Moreover, running an independent UMAP clustering experiment as per Table 12 shows that a sample size of less than 30 produces the most consistent results, where the outliers are minimised and the number of clusters is below 1000. The details of the independent UMAP clustering experiment are shown in Table 13 and Fig. 29. In Fig. 30a, we see that the MDE method produces a large spread of outliers beyond the core cluster area when the sample size is 5 and the cluster size is 2. However, in Fig. 30b we observe that the UMAP method produces 996 clusters with 0 outliers under the same sample and cluster size settings. Hence, it is clear that the UMAP DRM produces the most consistent number of HDBSCAN-derived clusters when the sample size is set to 5.

Fig. 28

Experiment run showing the effect of Cluster Size and Sample Size on UMAP and MDE cluster count

Fig. 29

Experiment run showing the effect of cluster size and sample size on UMAP cluster and outlier distribution

Fig. 30

a MDE distribution depicting 1055 clusters and 10,082 outliers, b UMAP distribution representing 996 clusters and 0 outliers

Table 12 Parameter settings for the HDBSCAN unsupervised clustering experiment
Table 13 HDBSCAN clustering sweep results showing the effect of sample size and cluster size on UMAP clusters

The 996 clusters and 0 outliers from the UMAP clusters will be used to identify time-series events. Although we have 996 clusters identified in the UMAP distribution, we will use the time-series labelling methodology to generalize the cluster grouping.

4.3.2 ii. Analysing the UMAP and HDBSCAN clusters for Performance Heatmap grouping

To understand the cluster formation in the UMAP reduction method, we will investigate two cluster areas, as shown in Fig. 31. The performance image grouping for major heatmaps, shown in Fig. 32a and b, depicts that similar images are grouped in their respective high-density areas. We will use the assigned cluster numbers to label PCP performance events and identify any cluster repetition patterns.

Fig. 31

Cluster Area 1 and Cluster Area 2 are investigated to understand the performance heatmap grouping

Fig. 32

a Major heatmap grouping in Cluster Area 1, b major heatmap grouping in Cluster Area 2

Using the experiment steps explained in the previous sections, we get a UMAP and HDBSCAN cluster layout for anomaly heatmaps, as shown in Fig. 33. For the anomaly heatmaps, we get 98 clusters and 0 outliers. Investigating Cluster Area 1, we see the groupings created in the identified dense area. As with the major heatmaps, we will use these cluster numbers to identify abnormal and anomalous PCP performance events.

Fig. 33

a Cluster grouping for anomaly heatmaps, b anomaly heatmap grouping in Cluster Area 1

4.4 IV Cluster labelling

After numbering the major and anomaly heatmap clusters in the previous step, we will now use a cluster labelling tool to add context to the cluster numbers. The cluster labelling tool, developed using PowerBI, provides an intuitive approach to identifying clusters. As part of the cluster labelling process, depicted in Fig. 34, we will use pre-identified event dates marked by production and Artificial Lift engineers to label events of interest. Furthermore, we will also discuss how cluster labelling can help identify the progression of the majority heatmaps over the lifetime of a well and depict the degradation of a PCP.

Fig. 34

Cluster labelling process with human-in-the-loop

4.4.1 i. Cluster labelling tool

The cluster labelling tool has three (3) areas, shown in Fig. 35. The major heatmap cluster and anomaly heatmap cluster areas, shown in Fig. 35a and b, respectively, show the UMAP distribution of clusters for a particular well being analysed for labelling the time-series data. Figure 35c shows the time-series trend with a days filter to browse various periods where abnormal activity or activity of interest may have occurred during PCP operations. Such periods can then be used to place clusters in categories that identify abnormal or anomalous PCP behaviour.

In Fig. 36, we look at a flow disturbance event on Day 84 of PCP operation. Two major heatmap clusters (44, 240) and two anomaly heatmap clusters (87, 90) were observed on Day 83. Upon selecting the area of flow disturbance on the time-series trend, we observed that both anomaly heatmap clusters, 87 and 90, are prevalent and relate to the flow column, as shown in Fig. 8. Furthermore, when the petroleum engineer selects the abnormal behaviour period (red dotted area in Fig. 37), only major heatmap cluster 240 is visible, which depicts that the pump is in a high flow and high torque state. Hence, by looking at this grouping, we can state that on Day 84, the PCP saw flow anomaly events while in a high flow, high torque state. Using this methodology, we can group the major and anomaly heatmap clusters into various states of PCP operations.

Fig. 35

a Major heatmap cluster area, b Anomaly heatmap cluster area, c Time-series trend area

Fig. 36

Day 84 of PCP operations with the respective major and anomaly heatmap clusters

Fig. 37

Day 84 of PCP operations with the selected abnormal behaviour and the respective major and anomaly heatmap clusters

5 Results

This study aimed to demonstrate a streamlined and reproducible method of labelling time-series data gathered from CSG wells, so that it may aid production engineers in identifying PCP performance profiles and abnormal production events. Our end-to-end approach is shown in Fig. 38, where saved cluster weights and labels support the streaming analytics process and allow operators to manage PCP wells by exception.
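A hedged sketch of that streaming step is shown below: incoming heatmaps are encoded with the saved CAE encoder, projected with the fitted UMAP reducer and assigned to existing clusters via HDBSCAN's approximate prediction, after which the engineer-defined group labels are looked up. The object names and the label dictionary are hypothetical; only the umap-learn transform() and hdbscan.approximate_predict() calls are standard package functionality.

```python
import hdbscan

def label_new_heatmaps(new_images, encoder, reducer, clusterer, cluster_to_group):
    """Assign engineer-defined group labels to newly arrived performance heatmaps.

    encoder          -- saved Keras CAE encoder (produces 8-d codes)
    reducer          -- fitted umap.UMAP instance
    clusterer        -- fitted hdbscan.HDBSCAN instance (built with prediction_data=True)
    cluster_to_group -- dict mapping cluster id -> engineer-assigned group label
    """
    codes = encoder.predict(new_images)            # 8-dimensional latent codes
    points_2d = reducer.transform(codes)           # project into the fitted UMAP space
    cluster_ids, _ = hdbscan.approximate_predict(clusterer, points_2d)
    return [cluster_to_group.get(int(c), "unlabelled") for c in cluster_ids]
```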

Fig. 38

Final method for labelling time-series data with the human-in-the-loop approach

In this section, we will highlight how our methodology produces meaningful labelled cluster groups that can be visualized as a coloured sequential bar chart against time-series data. Moreover, the anomalous events detected by our method were consistent between the two operators, specifically the solids through pump and gas through pump events, which are detrimental to PCP operational life. The streaming analytics approach was able to capture not only the amplitude of an abnormal event but also its duration.

5.1 I. Grouping cluster labels

Based on the observations made with the cluster labelling tool on 20 wells, the 996 major heatmap clusters and 98 anomaly heatmap clusters were segregated into groups, as shown in Table 14. The group labels were defined based on the experience of production and well surveillance engineers.

Table 14 Heatmap groups based on the observations made in the cluster labelling tool

In Table 15, we see the groups with a sample set of the images they represent. For example, the major heatmap group labelled erratic torque shows that images within this group did not have a stable torque profile, as no green box was recorded in the centre column. This indicates that the torque fluctuated significantly in this period; hence, no SAX character persisted long enough to record a symbol count as a major event as per Table 1. Similarly, the image grouping for anomaly heatmaps in Table 16 describes a state that the PCP is in momentarily, which may be considered an abrupt change.

Table 15 Sample images of major heatmap groups
Table 16 Sample images of anomaly heatmap groups

It is important to note that the characteristics of the major and anomaly heatmap groups can provide a performance profile for PCPs, either independently or in combination.

5.2 II. Cluster sequencing and visual analytics

To understand how heatmap groups define the performance of a PCP, we will use the colour code in Table 17 to represent each image group. These colour codes are then plotted along with the time-series data to understand the cluster sequencing and identify patterns in PCP performance.

Table 17 Color codes for major and anomaly heatmap groups

In Fig. 39, we see the heatmap groups plotted with the time-series trend over the lifespan of a PCP well. Figure 39a represents the major heatmap groups, and Fig. 39c represents the anomaly heatmap groups. The progression of the major heatmap groups matches the state of PCP performance during the dewatering, stable-flow, and high-torque pumping regimes.

Fig. 39

a Major heatmap group plotted with the time-series data, b time-series trend for a sample well, c anomaly Heatmap group plotted with the time-series data

If we look closer at a one-week PCP performance window, as shown in Fig. 40, the details in the major and anomaly heatmap groups become more apparent. In this case, we see the major heatmap groups clearly marking the areas of solids through the pump, where the PCP torque increases. At the same time, the anomaly heatmap groups also present markers of change (primarily high torque, high flow, low flow) either during or before the solids through pump events occur.

Fig. 40

PCP performance profile over one-week showing regions of abnormal activity

5.3 III. Cluster group consistency for anomalous events

Another finding during this study was the repeatability of major and anomaly heatmap group sequencing for events of interest. For example, in Fig. 41, we see the solids through pump events on multiple wells, where the major heatmap groups diverge from ideal to either high torque or high high torque. The major heatmap sequencing is very similar for all such events, regardless of the event's intensity or duration.

Fig. 41

Solids through pump profile for different wells

Similarly, in Fig. 42, we see gas through pump events. In this case, the major heatmap groups fluctuate between Ideal and Erratic Torque, whereas the anomaly heatmap groups consistently present with low flow and flow and torque events.

Fig. 42

Gas through pump profile for different wells

5.4 IV. Streaming analytics application for PCP performance analysis

Putting the previous steps together, we provided two natural gas operators with a streaming analytics tool, which assists them in identifying early PCP performance issues and alerts them when critical anomalous events are detected. An overview of the application is shown in Fig. 43.

Fig. 43

Streaming analytics application deployment architecture

6 Conclusion and future works

Based on the above methodology, we demonstrated that the human-in-the-loop cluster labelling method and the streaming analytics tools developed as part of this research provide a reliable and scalable approach to determining and evaluating the performance of PCP-operated wells.

We have shown that various performance patterns can be detected with this approach, and the repeatability of the heatmap patterns provides a better understanding of changing PCP behaviour. Furthermore, notification of changes in performance profile and anomaly markers can be automated, where only events that require immediate attention are reported in real time. By doing so, production and surveillance engineers can manage their wells by exception, aided by informed insights from the method proposed in this study.

Most importantly, by allowing petroleum engineers to aid with the labelling of time-series data, we could gain their trust in a machine learning-driven approach and, in turn, capture their knowledge of assessing Artificial Lift systems.

During this study, it became evident that the level of granularity to detect performance changes could be improved with smaller expansion stride lengths. In a forthcoming paper, we will present the effect of expansion stride length on cluster groups. Moreover, there is work in progress to apply this method to electric submersible pumps, which are centrifugal pumps used as an Artificial Lift method in conventional oil reservoirs.