1 Introduction

The State of Queensland is home to approximately nine thousand natural gas wells [1], where energy operators depend on positive displacement pumps to produce hydrocarbons from these geographically dispersed Coal Seam Gas (CSG) assets. As the natural gas supplied from these wells is critical to sustaining energy demand in domestic and international markets, operators need to avoid unplanned downtime caused by pump failures. To monitor pump performance, data acquisition and control systems are deployed across the entire fleet of CSG wells, where they gather and transmit time-series data from pumps and well sensors. Depending on the natural gas operating company, a petroleum engineer may be assigned to manage anywhere from fifty to a hundred wells. They monitor streaming time-series data to evaluate pump performance and anticipate any failure that may disrupt gas production. However, monitoring, analysing, and mitigating issues on a well-by-well basis is a tedious task, and most often, critical pump events are either missed or ignored [2]. Most importantly, CSG producers are looking to add several hundred wells in the coming years to sustain global energy demand, which will only exacerbate the real-time pump performance analysis issue. This is where machine learning-assisted pump performance analysis can improve pump life.

1.1 Drawbacks of time-series analysis methods used for artificial lift systems

Generally, time-series analysis of Artificial Lift systems is based on either fuzzy logic [3, 4], physics-based models [5], or machine learning-based pattern recognition methods [6,7,8,9]. However, such methods present a drawback when assessing an Artificial Lift system's performance, as they identify events without context, which may or may not impact pump performance. Moreover, these methods rely on labelled or known events, and any new or outlier events are not detected. Furthermore, it is rare to find labelled datasets for Artificial Lift applications. In most cases, the assistance of subject matter experts (SMEs) is required to label datasets for improved failure prediction results [10]. However, labelling patterns in raw time-series data is challenging for SMEs, as each pump presents a different performance profile, where the same anomaly or event may have very different characteristics, such as the amplitude and length of an event.

1.2 Limitations of time-series clustering methods

In a recently published paper benchmarking eight (8) well-known time-series clustering methods [11], the authors noted the following limitations of their evaluation:

  1. Uniform length time-series: The benchmarked methods mentioned in the paper above were tested on time-series data of uniform length for a pre-defined time-window length. However, time-series data from industrial sensors mostly have non-uniform lengths.

  2. Known number of clusters: The datasets tested to benchmark the clustering methods had a known number of clusters (or k values). As our previous publications have demonstrated [12,13,14], it is impossible to pre-define a set number of clusters for industrial time-series data, especially when dealing with data gathered from Artificial Lift Systems.

Another notable study on deep time-series clustering [15] mentions similar or related drawbacks. These will be discussed later in the Related Work section.

1.3 A practical approach for streaming time-series analysis of artificial lift systems

To address the drawbacks and limitations mentioned above, we propose a human-assisted approach to labelling clustered time-series data that can be utilized for running streaming performance analytics of positive displacement pumps.

Our research has two unique parts. First, we define a streamlined process to cluster multivariate time-series data. This process is based on our previous research work, in which we convert multivariate time-series data into performance heatmap images [14]. These images are then converted into unlabelled clusters based on the methodology defined later in the paper.

Second, to assist with the cluster labelling process, we developed a cluster analysis tool for engineers, through which they can apply their petroleum domain expertise to label events of interest. Through this tool, petroleum engineers can combine their expertise with streaming analytics and automate the process of labelling events of interest, allowing them to manage Artificial Lift Systems proactively. Furthermore, petroleum engineers from two CSG operating companies currently use the cluster analysis tool developed as part of this research for their daily analysis of approximately five hundred Progressive Cavity Pump (PCP) wells.

2 Overview of Coal Seam Gas production

In eastern Australia, natural gas is predominantly produced through CSG production, where coal seams are depressurized through a dewatering process that allows gas to escape from coal cleats and flow to the surface. Positive displacement pumps are installed in CSG wells, which produce water to the surface and, in the process, depressurize the coal seams. In the oil and gas industry, such pumps are referred to as Artificial Lift Pumps, and a network of these pumps collectively forms an Artificial Lift System. In Fig. 1 (Left), we see how water is displaced from the bottom of the well through the Production Tubing, and gas is produced via the production casing.

Fig. 1

(Left) Depiction of natural gas production from coal seam gas (CSG) wells [16], (right) production stages of a coal seam gas well [17]

A salient characteristic of CSG wells is that they have a shorter production life span, usually ten (10) years, compared to conventional gas-producing wells. This lifespan is shown in Fig. 1 (right), with three (3) distinct stages, where a large quantity of water is produced initially to depressurize the coal seams, followed by a production stage with an increase in gas production. Finally, gas rates decline towards the end of the production lifecycle.

As gas production declines quickly, CSG producers in Queensland must periodically drill and add new wells to maintain natural gas supplies. Hence, many CSG wells are dotted across Queensland, and this geographical spread and density are shown in Fig. 2.

Fig. 2

a Geographical Spread of CSG Wells in Queensland. b High-Density Well Population of CSG Wells (Source: Queensland Globe)

2.1 Progressive Cavity Pumps

As in any positive displacement pump, a rotor and a stator work in tandem to push liquid through the pump and achieve vertical hydraulic lift. Figure 3 shows the various components of a PCP assembly installed in a producing well. The rotor and elastomer (stator) assembly is designed such that the cavities between them push the fluid through when the rotor is operational.

Fig. 3

(Left) Main components of a PCP system. (Right) Cut-out view of PCP rotor and stator [18]

Equations (1) and (2) show the correlation between speed, flow and torque. Time-series trends of these three parameters provide the necessary operational details of PCPs over their lifetime. Hence, the multivariate time-series analysis of our study will focus on these three parameters.

The correlation between flow and speed is shown in Eq. (1) [18].

$$q_{th} = s \omega$$
(1)

where qth = theoretical flow, s = pump displacement, ω = rotational speed.

The correlation between torque and speed is shown in Eq. (2) [18].

$$T_{\text{pr}} = \frac{P_{\text{pmo}}\, E_{\text{pt}}}{C\,\omega}$$
(2)

where Tpr = polished rod torque, Ppmo = prime mover power, Ept = power transmission efficiency, C = constant, ω = rotational speed
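To illustrate Eqs. (1) and (2), the short Python sketch below evaluates the flow and torque relations; the numeric displacement, power, efficiency and speed values are purely hypothetical and are not taken from this study.

```python
# Minimal sketch of Eqs. (1) and (2). All numeric values below are
# hypothetical and only illustrate the form of the relations.

def theoretical_flow(displacement, speed):
    """Eq. (1): q_th = s * omega."""
    return displacement * speed

def polished_rod_torque(prime_mover_power, transmission_efficiency, speed, constant=1.0):
    """Eq. (2): T_pr = (P_pmo * E_pt) / (C * omega)."""
    return (prime_mover_power * transmission_efficiency) / (constant * speed)

q_th = theoretical_flow(displacement=0.2, speed=150.0)
t_pr = polished_rod_torque(prime_mover_power=15000.0, transmission_efficiency=0.9, speed=150.0)
print(f"theoretical flow: {q_th:.1f}, polished rod torque: {t_pr:.1f}")
```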

2.2 Data gathering from CSG wells

As CSG wells are located in remote and geographically dispersed areas, operators must utilize Supervisory Control and Data Acquisition (SCADA) systems to control wells through wireless telemetry. Ultra-high frequency (UHF) or microwave radios transmit data from CSG wells to a central control room. Figure 4 shows a layout of a typical CSG well site. The Remote Telemetry Unit (RTU) installed at each well site is responsible for recording data from multiple sensors and forwarding it to a central SCADA system. The data are stored and historized in data servers and delivered onwards to a corporate Historian database. It is important to note that data transferred via SCADA systems may not always have a fixed transmit rate; hence, data reporting is in most cases asynchronous, and time windows are not of identical length. Some SCADA systems use a report-by-exception approach, where data are only transmitted when a critical data point changes by more than a pre-set percentage. The report-by-exception method also produces data of unequal time windows.
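The sketch below illustrates, under an assumed 5% deadband, why report-by-exception transmission produces unequal time windows; the sample stream and threshold are hypothetical.

```python
# Sketch of report-by-exception: a reading is only transmitted when it deviates
# from the last transmitted value by more than a pre-set percentage (deadband).
# The 5% deadband and the sample stream below are hypothetical.

def report_by_exception(samples, deadband_pct=5.0):
    """Yield only the (timestamp, value) pairs that would be transmitted."""
    last_sent = None
    for ts, value in samples:
        if last_sent is None or abs(value - last_sent) / abs(last_sent) * 100.0 > deadband_pct:
            last_sent = value
            yield ts, value

stream = [(0, 100.0), (1, 101.0), (2, 100.5), (3, 112.0), (4, 111.5), (5, 95.0)]
print(list(report_by_exception(stream)))
# [(0, 100.0), (3, 112.0), (5, 95.0)] -> transmitted points are unevenly spaced,
# so the resulting time windows are not of identical length.
```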

Fig. 4

Layout and key components of a CSG well

3 Related work

Unlike univariate time-series data, applying anomaly detection and clustering methods to multivariate time-series data is a complex task which requires additional interpretation and insights [19]. In this section, we will further shed light on research gaps in multivariate time-series-based anomaly detection and clustering methods. Furthermore, our previous work on Symbolic Aggregation Approximation (SAX)-based performance heatmap conversion [14] will be discussed to demonstrate why this novel approach provides a better basis for a human-in-the-loop approach when clustering multivariate time-series data. Finally, we will discuss why a human-in-the-loop approach adds value to the time-series analysis process proposed in this paper.

3.1 Neural net-based anomaly detection

Neural nets have become a popular choice for solving anomaly detection problems in time-series data. One approach proposes using a fully convolutional network, U-Net, to identify anomalies in multivariate time-series data [20]. This method treats a fixed-length multivariate time-series snapshot as a multi-channel image. A U-Net segmentation technique is applied to obtain a final convolution layer corresponding to an augmentation map. The last layer includes the anomaly identification classes for the time-series snapshot, and each anomaly class is considered a mutually exclusive event. However, there are two drawbacks to this anomaly detection approach. Firstly, as the U-Net architecture accepts a fixed number of samples as input in a time window, the time-series data must be resized using up-sampling or down-sampling techniques. Secondly, as each anomaly is a mutually exclusive event, it is difficult to segregate anomalies of interest from a routine change in process behaviour.

Another neural net-based anomaly detection approach proposes a Multi-Scale Convolutional Recurrent Encoder–Decoder (MSCRED) method [21]. This method converts multivariate time-series data into signature matrices based on the pairwise inner-product of time-series data streams. The matrices are encoded using a fully connected convolutional encoder. A Convolutional Long Short-Term Memory (ConvLSTM) network is used to extract the hidden layer of each encoder stage, which is added to a convolutional decoder to produce a reconstructed signature matrix. The difference between the original signature and the reconstructed matrix is labelled as the residual signature matrix. This matrix defines a loss function that helps the model detect anomalies in multivariate time-series data. The residual signature matrix also helps determine the duration of anomaly events in time-series data based on small, medium, and large time-window duration.

Although the MSCRED methodology is novel in its approach, there are two limitations to using it for multivariate time-series analysis. Firstly, identifying anomaly events depends on the time-window duration. Therefore, the duration of the small, medium and large time windows has to be tuned based on the properties of the time-series data and the application where it is applied. Secondly, this approach does not consider the state of the process from time zero (t0), when the process was initiated for the first time. This restriction therefore fails to observe any changes in pump mechanical degradation, which can provide additional insights into time-series-based performance analysis.

3.2 Neural net-based time-series clustering

Multiple research papers have recently been published on the use of neural net-based time-series clustering methods [15, 22,23,24], for both univariate and multivariate datasets. These methods extract feature matrices which are fed to a neural net architecture to extract low-dimensional embeddings. The embeddings are then used to cluster the time-series data with a known clustering method, primarily the k-means method, which means the number of clusters must be known beforehand.

Although our approach is similar, we do not need to know the number of clusters beforehand. Most importantly, our low-dimensional embeddings are based on the novel approach of SAX-derived time-series performance heatmap images.

3.3 Converting time-series data into performance heatmap images

This section provides an overview of how the SAX-based performance heatmaps are created for improved understanding of Artificial Lift Performance analysis and, more importantly, how these images provide contextual clustering of multivariate time-series data.

3.3.1 Expanding window technique

To understand how PCPs operate in CSG operations, it is essential to look at their performance from the day they are initiated into operation for dewatering wells. For this purpose, we use the expanding window technique shown in Fig. 5, which evaluates the multivariate data in the expansion stride based on the elapsed pump performance. By doing so, the exploratory data analysis methods utilized for performance analysis can capture the varying mechanical dynamics in the PCP through the pump's life.
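A minimal sketch of the expanding window idea is shown below: each window starts at t0 and grows by a fixed expansion stride, so every window contains the pump's full elapsed history. The stride length and the random stand-in data are hypothetical.

```python
import numpy as np

def expanding_windows(data, stride):
    """Yield windows [0:stride], [0:2*stride], ..., always anchored at t0."""
    for end in range(stride, len(data) + 1, stride):
        yield data[:end]

rng = np.random.default_rng(0)
series = rng.normal(size=(1000, 3))      # stand-in columns: flow, torque, speed
for window in expanding_windows(series, stride=250):
    print(window.shape)                  # (250, 3), (500, 3), (750, 3), (1000, 3)
```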

Fig. 5

Depiction of the Expanding Window technique, which captures PCP performance variation from time t0

3.3.2 Symbolic aggregation approximation (SAX)-based performance heatmaps for PCPs

Performance heatmaps help capture the temporal variation and time-window-based impact of multiple sensor readings in a single image [12]. By converting time-series data into performance heatmaps, it is possible to visualize the sequential variation in sensor readings while understanding the influence of change in sensor values between time windows. Furthermore, the performance heatmap approach is exempt from some of the shortcomings of other time-series-based image conversion techniques.

While other time-series-to-image conversion methods require a fixed sampling rate for each analysed time window to produce consistent images, the performance heatmap technique overcomes this deficiency by converting sensor values into Symbolic Aggregation Approximation (SAX) symbols [27]. The SAX symbols obtained through the conversion of time-series data are transformed into a symbol matrix and then converted to a performance heatmap, an example of SAX-based time-series image conversion [12]. Figure 6 shows a 1-h time-series trend of flow, speed and torque converted to a performance heatmap.
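For readers unfamiliar with SAX, the sketch below shows a generic conversion of a sensor trace into SAX symbols (z-normalisation, piecewise aggregate approximation, Gaussian breakpoints). It only illustrates the symbolisation step, not the exact heatmap pipeline of [12]; the segment count and alphabet size are assumptions.

```python
import numpy as np
from scipy.stats import norm

def sax_symbols(series, n_segments=8, alphabet="abcd"):
    """Convert a univariate series into a SAX word of length n_segments."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-9)                          # z-normalise
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    breakpoints = norm.ppf(np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[int(np.searchsorted(breakpoints, v))] for v in paa)

speed = np.concatenate([np.full(30, 150.0), np.full(30, 80.0)])    # step change in speed
print(sax_symbols(speed))   # symbols move from 'd' towards 'a' after the step change
```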

Fig. 6

An example of SAX based time-series image conversion [12]

Moreover, most image conversion techniques [28] are developed for univariate time-series data. Although some techniques convert multivariate data into images [29], they mostly rely on converting each univariate series into an image and then stacking the images horizontally or vertically to create a single 2D image.

3.3.3 Majority and anomaly heatmap images

Once the performance heatmaps are created, they can be split into majority and anomaly event images. Table 1 shows the time-based colour code used to label major, variation and anomaly events in a performance heatmap. In this study, we will only focus on majority and anomaly events, as the variation events are events in transition that are not significant in deducing any abnormal behaviour of the PCPs.

Table 1 Color code for performance heatmaps based on the counts of SAX symbols in a 1-day window

Figure 7 shows how a performance heatmap is split into majority and anomaly event images. We will use these images to create our unsupervised image cluster library for time-series data. Each image has a dimension of 48 × 48 × 3 (6912 pixels).

Fig. 7

Splitting a performance heatmap to majority and anomaly event images [12]

Figure 8 provides a breakdown of the majority and anomaly heatmaps, where the three (3) parameters (flow, torque and speed) are represented in their respective columns. The position of the coloured boxes provides the state of each parameter, which can be low, medium or high. These states can be used in the cluster labelling process to group various time-series clusters into similar performance and anomalous event categories.

Fig. 8

a Majority heatmap depicting medium flow, high torque and low speed. b Anomaly heatmap depicting medium flow, high torque and no speed event

3.4 Advantages of a human-in-the-loop approach for data labelling

Multivariate time-series data collected from industrial processes are seldom labelled, and hence, extracting any meaningful information from such data requires input from domain experts [19, 25]. This holds true for CSG operations, where the geophysical dynamics between coal seams and the pump require inference from domain experts for correct interpretation of normal and abnormal behaviour. By adding human inference to unlabelled data sets, it becomes easier for domain experts to accept the results generated by machine learning solutions [26].

In our methodology, we rely on cluster labelling from petroleum engineers to correctly identify various performance states of PCPs used in Artificial Lift Systems. The SAX performance heatmaps provide petroleum engineers with a visual context as to why different clusters are identified based on the majority and anomaly heatmaps. In the next section, we will cover in detail the methodology to cluster SAX-derived performance heatmaps and how petroleum engineers label these clusters via a cluster analysis tool.

4 Methodology

In this section, we will discuss the methodology shown in Fig. 9, which is used to cluster unlabelled time-series data:

  1. Develop a Performance Heatmap-specific Auto-Encoder: This step will be used as the first dimensionality reduction method to reduce the size of the images, which will lower memory usage and improve calculation times on the computer used for conducting the experiments.

  2. Embedding-based Dimensionality Reduction: In this step, we will experiment with various dimensionality reduction methods (DRMs) and reduce the dimension of the auto-encoded images to a 2-dimensional plane. Doing so allows the performance heatmap groupings to be visualized in an XY plot.

  3. Hierarchical Density-Based Spatial Clustering (HDBSCAN): We will use HDBSCAN to identify clusters within the 2-dimensional plane produced by each DRM in step 2.

  4. Cluster Labelling: Once the images have been clustered, we will assign a label to known historical events from a selected number of wells using a cluster analysis tool. Once these events are labelled, we will use an automated cluster labelling pipeline to identify events on real-time data.

Fig. 9

An overview of the methodology steps

Assumptions

Our methodology works under the following assumptions:

  1. Availability of Key Data Variables: To analyse PCP performance, the time-series data should include flow, torque and speed variables. These three variables are needed to produce the SAX performance heatmaps required for the clustering process. For other multivariate time-series applications, key variables should be defined based on the process being analysed.

  2. Domain Expertise: Petroleum engineers using the cluster analysis tool should have relevant experience in their field to properly label SAX performance heatmap clusters.

  3. Data Completeness: The multivariate dataset used for the clustering process must cover the entire operation cycle of a PCP in CSG operation, i.e. the time-series data from the beginning to the end-of-life of PCPs. This will help capture various performance heatmaps over the life cycle of CSG wells.

Experiment tracking setup

We used Weights & Biases [30] for experiment tracking and visualizations to develop insights for this paper. The Weights & Biases application allows automated tracking of machine learning experiments through Code Sweeps. Through this, multiple combinations of model training, hyperparameter tuning and clustering results can be captured and visualized to obtain the best results for machine learning projects. Figure 10 provides an overview of how multiple sweep experiments can be recorded and visualized to provide actionable insights into the effect of different layer properties for a deep auto-encoder (DAE). All coding for these experiments was done using Python 3.7 and necessary statistics, computer vision and machine learning libraries suited for this Python version.
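As an illustration of the sweep setup, the hedged sketch below defines a small grid sweep over DAE channel sizes with the wandb package; the project name, the train() stub and the logged value are placeholders rather than the exact configuration used in this study.

```python
import wandb

# Hypothetical sweep over two DAE layer widths; val_loss is the tracked metric.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "layer1": {"values": [16]},
        "layer2": {"values": [16, 8, 4]},
    },
}

def train():
    run = wandb.init()
    cfg = wandb.config
    # ... build and fit an auto-encoder using cfg.layer1 / cfg.layer2 ...
    wandb.log({"val_loss": 0.05})   # placeholder value for this sketch
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="pcp-heatmap-clustering")  # hypothetical project name
wandb.agent(sweep_id, function=train)
```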

Fig. 10

An example of a Weights and Biases (WANDB) Sweep to analyse the effect of various deep auto-encoder (DAE) layers on validation loss

4.1 I. Auto-encoder-based dimensionality reduction

This section will look at selecting the optimum auto-encoder (AE) to reduce the performance heatmaps to a compact latent representation. The SAX-based performance heatmaps have a dimension of 48 × 48 × 3 pixels (6912 pixels in total). To evaluate clustering results through an X–Y scatter plot, the data must eventually be represented in a two-dimensional space. We will utilize an AE approach to minimize pixel dimensionality. Furthermore, reducing dimensionality allows us to examine many images due to the reduced processing memory requirement, which improves the overall clustering analysis of the Performance Heatmaps.

4.1.1 i. Deep auto-encoder (DAE)

We will start the experiment by developing a DAE that reduces the performance heatmap to fewer dimensions. Figure 11 shows a fully connected neural network with an input, hidden and output layer. These layers form a DAE, where the hidden layer is the reduced latent representation of the input layer. The output layer is the input layer reconstruction based on the hidden layer's interpretation. To gauge the performance of the DAE, we track the validation loss (val_loss), where the lowest value determines the best performing DAE architecture.
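A minimal Keras sketch of such a fully connected DAE is given below for the flattened 48 × 48 × 3 heatmaps; the 16/16 channel widths correspond to the two-layer sweep discussed next, while the activation, optimiser and loss choices are assumptions.

```python
from tensorflow.keras import layers, models

INPUT_DIM = 48 * 48 * 3   # 6912 pixels per flattened performance heatmap

def build_dae(layer1=16, layer2=16):
    inputs = layers.Input(shape=(INPUT_DIM,))
    encoded = layers.Dense(layer1, activation="relu")(inputs)
    encoded = layers.Dense(layer2, activation="relu")(encoded)        # reduced latent representation
    decoded = layers.Dense(layer1, activation="relu")(encoded)
    decoded = layers.Dense(INPUT_DIM, activation="sigmoid")(decoded)  # reconstruction of the input
    model = models.Model(inputs, decoded)
    model.compile(optimizer="adam", loss="mse")
    return model

dae = build_dae()
dae.summary()
# dae.fit(train_imgs, train_imgs, validation_data=(val_imgs, val_imgs))  # tracks val_loss
```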

Fig. 11

An overview of a fully connected neural network with an input, hidden and output layer

Step 1: 2-layer DAE sweep run

In Step 1, we begin the experiment by evaluating a two-layer DAE to gauge the behaviour of val_loss over different channel sizes. The parameters for the first sweep run are shown in Table 2.

The settings shown in Table 2 run six (6) sweep experiments and measure the val_loss for different layer combinations. Figure 12 shows that for a two-layer deep neural network, a 16-channel layer2 produces the minimum val_loss compared to an eight or four-channel layer2. As shown in Fig. 13, the decoded image for a 16 × 8-channel DAE configuration does not accurately represent the original image. However, as shown in Fig. 14, the decoded image for a 16 × 16-channel DAE configuration is a more accurate copy of the original image. Table 3 confirms that sweep-4, where layer1 and layer2 are sixteen-channel each, produces the best val_loss for a two-layer DAE.

Table 2 Parameter settings for a 2-layer DAE sweep experiment
Fig. 12

Results showing the val_loss from a 2 Layer DAE Sweep Run

Fig. 13

Result of 16 × 8 DAE showing the Decoded Image versus the Input Image

Fig. 14

Result of 16 × 16 DAE showing the decoded image versus the input image

Table 3 val_loss results from the 2 Layer DAE sweep run

Step 2: 3-layer DAE sweep run

In this step, we will add a third layer to the DAE and attempt further dimensionality reduction. As per Table 4, which shows the setup for the sweep experiment used in this step, we will try dimensions of 8, 4 and 2 for the third layer of the DAE.

Table 4 Parameter settings for a 3-layer DAE sweep experiment

Figure 15 provides an overview of the three-layer DAE sweep experiment. A 16 × 16 × 8 DAE produces the minimum val_loss, and the results of this layer configuration are shown in Fig. 16. Results from all three (3) sweep runs are summarized in Table 5.

Fig. 15

Results showing the val_loss from a 3-layer DAE sweep run

Fig. 16

Result of 16 × 16 × 8 DAE showing the decoded image versus the input image

Table 5 val_loss results from the 3-layer DAE sweep run

Step 3: 4-layer DAE sweep run

In this step, we will experiment with dimensions of 8, 4 and 2 in the fourth layer of the DAE. Table 6 shows the setup for this sweep experiment.

Table 6 Parameter settings for a 4-layer DAE sweep experiment

Figure 17 shows that further dimensionality reduction to 2 or 4 channels increases the val_loss; hence, further reduction from 8 channels is not feasible. However, an eight (8) channel fourth layer does improve the overall val_loss of the DAE from 0.02292 (Table 5) to 0.022079 (Table 7). Results from the four-layer DAE are shown in Fig. 18, which validates that reducing dimensionality below 8 channels is not feasible with a DAE. Hence, our final DAE configuration is 16 × 16 × 8 × 8 for reducing the time-series heatmaps from 6912 pixels (48 × 48 × 3) to 8 dimensions.

Fig. 17

Results showing the val_loss from a 4-layer DAE sweep run

Table 7 val_loss results from the 4-layer DAE sweep run
Fig. 18

Result of 16 × 16 × 8 × 8 DAE showing the decoded image versus the input image

4.1.2 ii. Convolutional auto-encoder

To see if the val_loss and dimensions can be reduced further, we will use a four-layer convolutional auto-encoder (CAE) architecture. Table 8 shows the Sweep experiment parameters that are investigated using a four-layer architecture to see if the CAE can reduce the image to 8 or fewer dimensions while improving val_loss.

Table 8 Parameter settings for a 4-layer CAE sweep experiment

As shown in Fig. 19, the CAE val_loss for fewer than 8 dimensions (i.e. 4 or 2) in the fourth layer is relatively high compared with 8 dimensions. However, the 16 × 16 × 8 × 8 CAE configuration further reduces the val_loss compared to the DAE. Table 9 shows the comparison between the DAE and CAE val_loss. Based on this result, we will use a 16 × 16 × 8 × 8 CAE to encode the 48 × 48 × 3 major and anomaly event images to 8 dimensions. The final CAE architecture to encode the images is shown in Fig. 20.

Fig. 19

Sweep run confirming that the 16 × 16 × 8 × 8 CAE architecture produces the best val_loss for a 4-layer CAE

Table 9 Comparison of val_loss between the 4-layer DAE and CAE
Fig. 20

16 × 16 × 8 × 8 Convolutional auto-encoder
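The sketch below outlines one possible Keras realisation of the 16 × 16 × 8 × 8 CAE; the kernel sizes, strides and the dense bottleneck that flattens the feature maps to 8 latent dimensions are assumptions, since only the channel progression is specified above.

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(48, 48, 3))
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inputs)  # 24x24x16
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(x)       # 12x12x16
x = layers.Conv2D(8, 3, strides=2, padding="same", activation="relu")(x)        # 6x6x8
x = layers.Conv2D(8, 3, strides=2, padding="same", activation="relu")(x)        # 3x3x8
latent = layers.Dense(8, activation="relu")(layers.Flatten()(x))                # 8-dimensional code

x = layers.Dense(3 * 3 * 8, activation="relu")(latent)
x = layers.Reshape((3, 3, 8))(x)
x = layers.Conv2DTranspose(8, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(8, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
outputs = layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid")(x)

cae = models.Model(inputs, outputs)
cae.compile(optimizer="adam", loss="mse")
encoder = models.Model(inputs, latent)   # used later to produce 8-d codes for clustering
```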

4.2 II. High-density dimensionality reduction

We have demonstrated that the time-series-based images can be reduced to a latent size of eight (8) dimensions with a convolutional auto-encoder. However, to provide a visual distribution context to time-series image clustering, we need to reduce the number of dimensions to two (2), and this can be achieved by utilizing high-density dimensionality reduction techniques. For this paper, we will experiment with three (3) methods: t-distributed stochastic neighbour embedding (t-SNE) [31], uniform manifold approximation and projection (UMAP) [32], and the minimum-distortion embedding (MDE) [33] method. These methods take high-density multi-dimensional points and assign them to a two-dimensional map.
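A hedged sketch of how the 8-dimensional codes could be reduced to 2-D with each method is shown below, assuming the scikit-learn, umap-learn and pymde packages are available; hyperparameters are left at their defaults rather than the values tuned in our experiments, and the input array is a random stand-in for the CAE codes.

```python
import numpy as np
from sklearn.manifold import TSNE
import umap
import pymde

codes = np.random.rand(5000, 8).astype(np.float32)   # stand-in for the 8-d CAE codes

tsne_2d = TSNE(n_components=2).fit_transform(codes)
umap_2d = umap.UMAP(n_components=2).fit_transform(codes)
mde_2d = pymde.preserve_neighbors(codes, embedding_dim=2).embed().numpy()

print(tsne_2d.shape, umap_2d.shape, mde_2d.shape)    # each (5000, 2)
```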

4.2.1 i. t-Distributed stochastic neighbour embedding (t-SNE)

t-SNE determines the conditional probability of high-dimensional data points by computing the Euclidean distance between the points. The probability represents similarities between two points and determines if these points could be picked as neighbours [31]. The probability \({p}_{j\mid i}\) is represented by Eq. (3) [31], where \({x}_{i}\) and \({x}_{j}\) are the data points being compared for similarity. Figure 21 depicts the t-SNE distribution for various numbers of major heatmap images. This distribution provides abstract localization with no recognizable high-density areas for the images.

$$p_{j\mid i} = \frac{\exp \left( - \parallel x_{i} - x_{j} \parallel^{2} / 2\sigma_{i}^{2} \right)}{\sum_{k \ne i} \exp \left( - \parallel x_{i} - x_{k} \parallel^{2} / 2\sigma_{i}^{2} \right)}$$
(3)

where \(p_{j\mid i}\) = conditional probability.

Fig. 21

a t-SNE distribution for 172,193 major heatmap images, b t-SNE distribution for 554,436 major heatmap images, c t-SNE distribution for 817,475 major heatmap images

4.2.2 ii. Uniform manifold approximation and projection (UMAP)

Although t-SNE and UMAP share some clustering similarities [32], UMAP differentiates itself by computing high- and low-dimensional similarities for the distances between two points. Equations (4) and (5) [32] provide an overview of how these similarities are calculated.

$$v_{j\mid i} = \exp \left[ \frac{ - \left( d\left( x_{i}, x_{j} \right) - \rho_{i} \right)}{\sigma_{i}} \right]$$
(4)

where \({v}_{j\mid i}\)= high dimensional similarities, \({\sigma }_{i}\) = normalizing factor, \({\rho }_{i}\) = distance to the nearest neighbour

$$w_{ij} = \left( {1 + a\parallel y_{i} - y_{j} \parallel_{2}^{2b} } \right)^{ - 1}$$
(5)

where \(w_{ij}\) = low-dimensional similarities.

Figure 22 shows the UMAP distribution for various numbers of heatmap images. As highlighted in Fig. 23, the high-density groupings are visible within the overall distribution.

Fig. 22

a UMAP distribution for 172,193 major heatmap images, b UMAP distribution for 554,436 major heatmap images, c UMAP distribution for 817,475 major heatmap images

Fig. 23

a UMAP distribution of major heatmaps (554,436 Images), b observations of high-density areas in the UMAP distributions, which are separated from low-density areas

4.2.3 iii. Minimum-distortion embedding (MDE)

As the name suggests, the MDE dimensionality reduction method (DRM) embeds items as vectors whose pairwise distances are evaluated by distortion functions. Equation (6) [33] shows how the average distortion of an embedding is calculated, which MDE aims to minimize. Like UMAP, similar items will have vectors near each other, and dissimilar items will have vectors far apart.

$$E(X) = \frac{1}{\left| \mathcal{E} \right|} \sum_{(i,j) \in \mathcal{E}} f_{ij}\left( d_{ij} \right)$$
(6)

where X = embedding, \(d_{ij} = \parallel x_{i} - x_{j} \parallel_{2}\) = embedding distance between items i and j, \(f_{ij}\) = distortion functions, \(\mathcal{E}\) = set of item pairs.

Figure 24 depicts the distribution of the embeddings for various numbers of major heatmap images. Again, a concentrated mass in the centre represents similar vectors, and dissimilar vectors are spread around the concentrated group.

Fig. 24

a MDE distribution for 172,193 major heatmap images, b MDE distribution for 554,436 major heatmap images, c MDE distribution for 817,475 major heatmap images

Figure 25b shows a zoomed-in view of the concentrated mass of the similar vectors, and Fig. 25c shows how this mass further consists of neighbourhoods of high-density areas.

Fig. 25

a MDE distribution for 554,436 major heatmap images, b zoomed view of the high-density area, c zoomed view of the neighbourhoods within the high-density MDE area

4.3 III. Hierarchical density-based spatial clustering (HDBSCAN)

In this step, we will use HDBSCAN to conduct unsupervised clustering of the major heatmap images. The HDBSCAN algorithm is a density-based clustering method, where a simplified cluster tree is produced from which significant clusters are extracted [34].

Based on the experiment run, we see that the t-SNE 2-D embedding produced an increasing number of clusters as the number of images increased. However, UMAP and MDE maintain very similar cluster counts as the number of images increases. These results are shown in Fig. 26.

Fig. 26

Experiment results highlighting the progression of the number of clusters when running HDBSCAN clustering on different DRM methods

The experiment parameters are set as per Table 10. The cluster size determines the minimum number of samples a cluster must contain to be considered unique, and the sample size determines the critical mass within the cluster neighbourhood [35]. As per Table 10, our cluster size is 5 and our sample size is 200, which means that a cluster must contain at least 5 points, while 200 neighbouring points are required for a point within a cluster to be treated as a core point.
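The parameters in Table 10 appear to correspond to the hdbscan package's min_cluster_size and min_samples arguments; a minimal sketch under that assumption is shown below, with a random stand-in for the 2-D embedding.

```python
import numpy as np
import hdbscan

embedding_2d = np.random.rand(10000, 2)    # stand-in for the 2-D UMAP/MDE embedding

clusterer = hdbscan.HDBSCAN(min_cluster_size=5,       # "cluster size" in Table 10
                            min_samples=200,          # "sample size" in Table 10
                            prediction_data=True)     # retained for later streaming prediction
labels = clusterer.fit_predict(embedding_2d)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_outliers = int(np.sum(labels == -1))
print(f"{n_clusters} clusters, {n_outliers} outliers")
```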

Table 10 Parameter settings for the HDBSCAN unsupervised clustering experiment

Figure 27 shows the minimum and maximum number of clusters produced by each DRM. Based on this experiment run, we will discard t-SNE from any further assessment. Also, the UMAP clustering with HDBSCAN provides the narrowest distribution of clusters across the number of images used in this experiment. Based on this information, we will take an in-depth look at the UMAP and MDE clusters to understand how the images are sorted in the 2D plane and confirm which DRM provides us with a usable cluster distribution.

Fig. 27

Minimum and maximum number of clusters per DRM. The minimum cluster number corresponds to images from 10 wells, and the maximum cluster number corresponds to images from 70 wells

4.3.1 i. Clustering analysis

To understand the cluster formation in the MDE and UMAP reduction methods, we need to look at how the number of clusters and outliers varies with different combinations of cluster size and sample size parameters in HDBSCAN. To do this, we will run a clustering experiment with the parameters shown in Table 11.

Table 11 Parameter Settings for the HDBSCAN unsupervised clustering experiment

Figure 28 shows how cluster size and sample size impact the cluster and outlier counts for the MDE and UMAP reduction methods. Clustering the UMAP distribution provides consistent cluster counts, with zero outliers in most cases. Moreover, running an independent UMAP clustering experiment as per Table 12 shows that a sample size of less than 30 produces the most consistent results, where the outliers are minimised and the number of clusters is below 1000. The details of the independent UMAP clustering experiment are shown in Table 13 and Fig. 29. In Fig. 30a, we see that the MDE method produces a large spread of outliers beyond the core cluster area when the sample size is 5 and the cluster size is 2. However, in Fig. 30b we observe that the UMAP method produces 996 clusters with 0 outliers under the same sample and cluster size settings. Hence, it is clear that the UMAP DRM produces the most consistent number of HDBSCAN-derived clusters when the sample size is set to 5.

Fig. 28

Experiment run showing the effect of Cluster Size and Sample Size on UMAP and MDE cluster count

Fig. 29

Experiment run showing the effect of cluster size and sample size on UMAP cluster and outlier distribution

Fig. 30

a MDE distribution depicting 1055 clusters and 10,082 outliers, b UMAP distribution representing 996 clusters and 0 outliers

Table 12 Parameter settings for the HDBSCAN unsupervised clustering experiment
Table 13 HDBSCAN clustering sweep results showing the effect of sample size and cluster size on UMAP clusters

The 996 clusters and 0 outliers from the UMAP clusters will be used to identify time-series events. Although we have 996 clusters identified in the UMAP distribution, we will use the time-series labelling methodology to generalize the cluster grouping.

4.3.2 ii. Analysing the UMAP and HDBSCAN clusters for Performance Heatmap grouping

To understand the cluster formation in the UMAP reduction method, we will investigate two cluster areas, as shown in Fig. 31. The performance image grouping for major heatmaps, shown in Fig. 32a and b, depicts that similar images are grouped in their respective high-density areas. We will use the assigned cluster numbers to label PCP performance events and identify any cluster repetition patterns.

Fig. 31

Cluster Area 1 and Cluster Area 2 are investigated to understand the performance heatmap grouping

Fig. 32

a Major heatmap grouping in Cluster Area 1, b major heatmap grouping in Cluster Area 2

Using the experiment steps explained in the previous sections, we get a UMAP and HDBSCAN cluster layout for anomaly heatmaps, as shown in Fig. 33. For the anomaly heatmaps, we get 98 clusters and 0 outliers. Investigating Cluster Area 1, we see the groupings created in the identified dense area. As with the major heatmaps, we will use these cluster numbers to identify abnormal and anomalous PCP performance events.

Fig. 33

a Cluster grouping for anomaly heatmaps, b anomaly heatmap grouping in Cluster Area 1

4.4 IV Cluster labelling

After numbering the major and anomaly heatmap clusters in the previous step, we will now use a cluster labelling tool to add context to the cluster numbers. The cluster labelling tool, developed using PowerBI, provides an intuitive approach to identifying clusters. As part of the cluster labelling process, depicted in Fig. 34, we will use pre-identified event dates marked by production and Artificial Lift engineers to label events of interest. Furthermore, we will also discuss how cluster labelling can help identify the progression of the majority heatmaps over the lifetime of a well and depict the degradation of a PCP.

Fig. 34

Cluster labelling process with human-in-the-loop

4.4.1 i. Cluster labelling tool

The cluster labelling tool has three (3) areas, shown in Fig. 35. The major heatmap cluster and anomaly heatmap cluster areas, shown in Fig. 35a and b, respectively, show the UMAP distribution of clusters for a particular well being analysed for labelling the time-series data. Figure 35c shows the time-series trend with a days filter to browse various periods where abnormal activity or activity of interest may have occurred during PCP operations. Such periods can then be used to place clusters in categories that identify abnormal or anomalous PCP behaviour.

In Fig. 36, we look at a flow disturbance event on Day 84 of PCP operation. Two major heatmap clusters (44, 240) and two anomaly heatmap clusters (87, 90) were observed on Day 83. Upon selecting the area of flow disturbance on the time-series trend, we observed that both anomaly heatmap clusters, 87 and 90, are prevalent and relate to the flow column, as shown in Fig. 8. Furthermore, when the petroleum engineer selects the abnormal behaviour period (red dotted area in Fig. 37), only major heatmap cluster 240 is visible, which depicts that the pump is in a high flow and high torque state. Hence, by looking at this grouping, we can state that on Day 84, the PCP saw flow anomaly events while in a high flow, high torque state. Using this methodology, we can group the major and anomaly heatmap clusters into various states of PCP operations.

Fig. 35

a Major heatmap cluster area, b Anomaly heatmap cluster area, c Time-series trend area

Fig. 36

Day 84 of PCP operations with the respective major and anomaly heatmap clusters

Fig. 37

Day 84 of PCP operations with the selected abnormal behaviour and the respective major and anomaly heatmap clusters

5 Results

This study aimed to demonstrate a streamlined and reproducible method of labelling time-series data gathered from CSG wells, so that it may aid production engineers in identifying PCP performance profiles and abnormal production events. Our end-to-end approach is shown in Fig. 38, where saved cluster weights and labels support the streaming analytics process and allow operators to manage PCP wells by exception.
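A hedged sketch of that streaming step is shown below: incoming heatmaps are encoded with the saved CAE encoder, projected with the fitted UMAP reducer and assigned to existing clusters via HDBSCAN's approximate prediction, after which the engineer-defined group labels are looked up. The object names and the label dictionary are hypothetical; only the umap-learn transform() and hdbscan.approximate_predict() calls are standard package functionality.

```python
import hdbscan

def label_new_heatmaps(new_images, encoder, reducer, clusterer, cluster_to_group):
    """Assign engineer-defined group labels to newly arrived performance heatmaps.

    encoder          -- saved Keras CAE encoder (produces 8-d codes)
    reducer          -- fitted umap.UMAP instance
    clusterer        -- fitted hdbscan.HDBSCAN instance (built with prediction_data=True)
    cluster_to_group -- dict mapping cluster id -> engineer-assigned group label
    """
    codes = encoder.predict(new_images)            # 8-dimensional latent codes
    points_2d = reducer.transform(codes)           # project into the fitted UMAP space
    cluster_ids, _ = hdbscan.approximate_predict(clusterer, points_2d)
    return [cluster_to_group.get(int(c), "unlabelled") for c in cluster_ids]
```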

Fig. 38

Final method for labelling time-series data with the human-in-the-loop approach

In this section, we will highlight how our methodology produces meaningful labelled cluster groups that can be visualized as a coloured sequential bar chart against time-series data. Moreover, the anomalous events detected by our method were consistent between the two operators, specifically the solids through pump and gas through pump events, which are detrimental to PCP operational life. The streaming analytics approach was able to capture not only the amplitude of an abnormal event but also its duration.

5.1 I. Grouping cluster labels

Based on the observations made with the cluster labelling tool on 20 wells, the 996 major heatmap clusters and 98 anomaly heatmap clusters were segregated into groups, as shown in Table 14. The group labels were defined based on the experience of production and well surveillance engineers.

Table 14 Heatmap groups based on the observations made in the cluster labelling tool

In Table 15, we see the groups with a sample set of the images they represent. For example, the major heatmap group labelled erratic torque shows that images within this group did not have a stable torque profile, as no green box was recorded in the centre column. This indicates that the torque fluctuated significantly in this period; hence, no SAX character persisted long enough to record a symbol count as a major event as per Table 1. Similarly, the image grouping for anomaly heatmaps in Table 16 describes a state that the PCP is in momentarily, which may be considered an abrupt change.

Table 15 Sample images of major heatmap groups
Table 16 Sample images of anomaly heatmap groups

It is important to note that the characteristics of the major and anomaly heatmap groups can provide a performance profile for PCPs, either independently or in combination.

5.2 II. Cluster sequencing and visual analytics

To understand how heatmap groups define the performance of a PCP, we will use the colour code in Table 17 to represent each image group. These colour codes are then plotted along with the time-series data to understand the cluster sequencing and identify patterns in PCP performance.

Table 17 Color codes for major and anomaly heatmap groups

In Fig. 39, we see the heatmap groups plotted with the time-series trend over the lifespan of a PCP well. Figure 39a represents the major heatmap groups, and Fig. 39c represents the anomaly heatmap groups. The progression of the major heatmap groups matches the state of PCP performance during the dewatering, stable-flow, and high-torque pumping regimes.

Fig. 39

a Major heatmap group plotted with the time-series data, b time-series trend for a sample well, c anomaly Heatmap group plotted with the time-series data

If we look closer at a one-week PCP performance window, as shown in Fig. 40, the details in the major and anomaly heatmap groups become more apparent. In this case, we see the major heatmap groups clearly marking the areas of solids through the pump, where the PCP torque increases. At the same time, the anomaly heatmap groups also present markers of change (primarily high torque, high flow, low flow) either during or before the solids through pump events occur.

Fig. 40

PCP performance profile over one-week showing regions of abnormal activity

5.3 III. Cluster group consistency for anomalous events

Another finding during this study was the repeatability of major and anomaly heatmap group sequencing for events of interest. For example, in Fig. 41, we see the solids through pump events on multiple wells, where the major heatmap groups diverge from ideal to either high torque or high high torque. The major heatmap sequencing is very similar for all such events, regardless of the event's intensity or duration.

Fig. 41

Solids through pump profile for different wells

Similarly, in Fig. 42, we see gas through pump events. In this case, the major heatmap groups fluctuate between Ideal and Erratic Torque, whereas the anomaly heatmap groups consistently present with low flow and flow and torque events.

Fig. 42

Gas through pump profile for different wells

5.4 IV. Streaming analytics application for PCP performance analysis

Putting the previous steps together, we provided two natural gas operators with a streaming analytics tool, which assists them in identifying early PCP performance issues and alerts them when critical anomalous events are detected. An overview of the application is shown in Fig. 43.

Fig. 43

Streaming analytics application deployment architecture

6 Conclusion and future works

Based on the above methodology, we demonstrated that the human-in-the-loop cluster labelling method and the streaming analytics tools developed as part of this research provide a reliable and scalable approach to determining and evaluating the performance of PCP-operated wells.

We have shown that various performance patterns can be detected with this approach, and the repeatability of the heatmap patterns provides a better understanding of changing PCP behaviour. Furthermore, notification of changes in performance profile and anomaly markers can be automated, where only events that require immediate attention are reported in real time. By doing so, production and surveillance engineers can manage their wells by exception, aided by informed insights from the method proposed in this study.

Most importantly, by allowing petroleum engineers to aid with the labelling of time-series data, we could gain their trust in a machine learning-driven approach and, in turn, capture their knowledge of assessing Artificial Lift systems.

During this study, it became evident that the level of granularity to detect performance changes could be improved with smaller expansion stride lengths. In a forthcoming paper, we will present the effect of expansion stride length on cluster groups. Moreover, there is work in progress to apply this method to electric submersible pumps, which are centrifugal pumps used as an Artificial Lift method in conventional oil reservoirs.