1 Introduction

With the ongoing adjustment of the energy structure, natural gas, as a clean energy source, accounts for a rapidly growing share of energy consumption. During the implementation of the coal-to-gas policy, more and more boiler room users supply heating for residents by consuming natural gas in winter [22]. However, some users steal gas by illegally refitting equipment to reduce the metered gas volume and thus pay lower gas fees [13]. Gas theft causes huge economic losses and may lead to gas leakage and even explosions, resulting in a series of safety hazards and social problems. To catch gas thieves, gas companies carry out indiscriminate manual on-site inspections. Such a random approach is inefficient and slow because there are no specific suspects. Hence, it is crucial to discover gas-theft suspects automatically, which narrows the suspicious range, improves verification efficiency, and enables timely intervention to avoid accidents.

Fortunately, we can remotely collect gas consumption data through smart gas meters. With these sensed data, we can analyze gas usage patterns and build a data-driven approach to discover gas-theft suspects. However, there are two challenges. (1) Diversity of user behavior: the means of stealing gas are diverse, and gas usage patterns differ across users; as a result, the behavior fluctuations of legitimate users may also appear abnormal. (2) Weak labeling: labeled data cover only a small part of the limited known gas thefts, while the large body of unlabeled data contains a very large number of normal users together with the remaining gas thefts. We cannot synthesize additional gas-theft labels from the limited labels, because gas theft can only be confirmed through on-site inspection. Besides, it is difficult to set criteria for gas theft in terms of degree and anomaly pattern.

Typically, this problem can be regarded as a time series anomaly detection task focusing on pattern-wise anomalies [3]. However, when the abnormal proportion is very small, directly applying off-the-shelf classifiers may yield biased results. Furthermore, the absence of confirmed normal samples leads to considerable uncertainty when dealing with new anomalies [6]. Recently, weakly supervised anomaly detection, especially under the positive-unlabeled learning framework, has received extensive attention, as it exploits limited labeled anomalies together with large amounts of unlabeled data [15]. The process has two parts [8]: an initial model selects normal-sample candidate sets from the unlabeled samples [17], and a modified model is then trained on the newly labeled data to identify the remaining anomalies [28].

For gas scenarios, two studies have put forward solutions to detect gas-theft suspects. Yi et al. obtained abnormal users through the negative correlation between temperature and gas consumption, and then used a One-Class Support Vector Machine (OCSVM) to divide abnormal users into theft suspects and irregular users [27]. Yang et al. assessed the stability of gas usage patterns to obtain normal users, and then combined normal users with gas-theft labels into positive-negative sample pairs to identify theft suspects through RankNet [26]. Nevertheless, the two studies have some limitations. Firstly, they rely on simple, limited statistical indicators, which amount to one-size-fits-all thresholds. Besides, features are extracted manually, which may fail to capture complex gas usage patterns. Furthermore, they make poor use of the small amount of labeled abnormal data and the large amount of unlabeled data. These operations make the whole process cumbersome and error-prone.

In response to these challenges and drawbacks, we propose a neural clustering and ranking approach to detect gas-theft suspects among boiler room users. Our approach contains two modules: (1) normal user identification, which exploits the regular behavior of the majority of normal users to separate normal users from unstable users whose behavior appears abnormal, by integrating representation learning and clustering; (2) suspicious user detection, which deeply mines the correlations between different users and discovers gas-theft suspects among unstable users through the anomaly scores of triplet ranking. The two modules are seamlessly connected by combining clustering and ranking neural networks, which learn gas consumption patterns in depth and overcome the problem of label scarcity. Our contributions are fourfold:

  • Under the positive-unlabeled learning framework, we propose a neural clustering and ranking approach for gas-theft suspect detection, which narrows the suspicious range to increase the efficiency of inspection workforces.

  • Considering the regular behavior of users, we propose a joint clustering module that obtains reliable normal users by jointly optimizing representation and clustering, which learns pseudo-normal labels and narrows the scope of potentially abnormal users.

  • Considering the behavior correlations among users, we propose a triplet ranking module that detects suspects by learning the closeness and deviation relations within constructed triplets, which improves data utilization.

  • Extensive experiments on three real-world datasets show that our approach has obvious advantages over baselines in reducing the false-positive rate.

2 Overview

2.1 Task Definition

Given the daily gas consumption records \(X = \left\{ x_1, x_2,..., x_n\right\}\) of n boiler room users, we aim to detect which users among the user set U exhibit gas theft behaviors. Here, \(K \subset U\) is a very small set of labeled abnormal users, with \(|K| = k\) and \(k \ll n\).

Fig. 1 Framework of the proposed neural clustering and ranking approach, consisting of two modules: joint clustering for normal user identification and triplet ranking for suspicious user detection. The input of the model is gas consumption records together with gas-theft labels, and the output is gas-theft suspects

Figure 1 illustrates the framework of the proposed neural clustering and ranking approach, consisting of two modules: joint clustering for normal user identification and triplet ranking for suspicious user detection. Firstly, we use a variational autoencoder to learn hidden representations of the gas consumption records. Then, seeking representations with small intra-class and large inter-class distances, we cluster the representations into groups to distinguish normal from unstable users. Here, joint clustering synchronously optimizes the cluster label allocation and fine-tunes the encoder network. Based on the learned representations, we take the identified normal samples and the labeled abnormal samples as candidate sets, and then construct triplets. After that, we train the anomaly-scoring network on the triplets to generate an anomaly score for each given user. If the anomaly score is higher than a threshold, the user is regarded as a suspect. In this way, the two modules are connected to overcome the label scarcity problem and achieve better detection accuracy.

3 Normal User Identification

Most of the collected gas users are unlabeled, while only a small part are labeled abnormal users. Using unlabeled users as normal samples indiscriminately may have negative effects, because some of them behave abnormally. Besides, due to the diversity of abnormal behavior, it is difficult to detect new types of abnormal behavior from the limited labeled abnormal data. Therefore, our goal is to find reliable normal users and set apart unstable users. Thus, we not only reduce the scope of suspects, lowering the overall complexity, but also provide negative samples (normal users) for the subsequent detection module.

Considering that normal users account for the vast majority of all users, and that most normal behaviors exhibit certain regularity, we learn representations with small intra-class and large inter-class distances to cluster users into groups. Figure 2 shows the framework of the joint clustering network. Specifically, we use a variational autoencoder for pre-training, and feed the extracted gas feature representations to K-means for cluster initialization. After that, joint clustering synchronously optimizes the cluster label allocation and fine-tunes the network. Thus, all users can be classified into normal users and unstable users.

Fig. 2 Joint clustering for normal user identification. The model uses a variational autoencoder for pre-training and then uses clustering for fine-tuning

3.1 Variational Autoencoder

The variational autoencoder (VAE) is widely used for dimensionality reduction to learn robust hidden representations. Generally, it compresses input data into a low-dimensional latent space and then reconstructs the data, making the input and output as close as possible. The VAE consists of two parts: an encoder \(f(z|x_i)\) and a decoder \(h(x_i|z)\), where \(x_i\) denotes the gas consumption time series of user i and z denotes the hidden representation. The loss function of the VAE is the negative log-likelihood with a regularization term, summed over all data points:

$$\begin{aligned} L_v=\sum _{i=1}^{n}\left( -E_{f(z|x_i)}\left[ \log h(x_i|z)\right] +KL\left( f(z|x_i)\,\Vert \, h(z)\right) \right) \end{aligned}$$
(1)

where n is the number of gas users. The first term is the reconstruction loss. The second term measures how closely the posterior distribution \(f(z|x_i)\) approximates the prior distribution h(z); it encourages \(f(z|x_i)\) to approach h(z) and ensures that the latent space is regularized.
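To make Eq. (1) concrete, the following minimal PyTorch sketch implements a VAE with the layer sizes reported in Sect. 5.4 (sequence length 121, hidden size 64, latent dimension 16). It is an illustrative reading of the model rather than the authors' code, and the mean-squared reconstruction term is an assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    # Encoder f(z|x) and decoder h(x|z); sizes follow Sect. 5.4.
    def __init__(self, in_dim=121, hid_dim=64, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.mu = nn.Linear(hid_dim, z_dim)        # mean of f(z|x)
        self.logvar = nn.Linear(hid_dim, z_dim)    # log-variance of f(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, hid_dim), nn.ReLU(),
                                 nn.Linear(hid_dim, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(x, x_rec, mu, logvar):
    # Eq. (1): reconstruction term plus KL(f(z|x) || N(0, I)), summed over users.
    rec = F.mse_loss(x_rec, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl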

3.2 Clustering

In such an almost unsupervised setting, we use clustering to divide user groups based on the learned representations. Here, we use K-means to obtain the initial cluster centers, where the number of clusters is selected according to the experimental results. Then, we use a Kullback-Leibler (KL) divergence loss [7] to gradually match the model to a suitably shaped distribution, slowly updating the cluster centers and data representations.

We first calculate soft labels between the embedded points and the cluster centroids. Then, the cluster centroids are refined using an auxiliary target distribution that learns from high-confidence assignments. We repeat this process until the convergence criteria are met. By minimizing the KL-divergence alignment loss, samples close to a cluster center are pulled closer, making the data easier to separate in the representation space. Thus, the cluster loss \(L_c\) is computed as the KL divergence between the model distribution Q and the target distribution P, yielding a more cluster-friendly latent representation:

$$\begin{aligned} L_c=KL(P||Q)=\sum _i \sum _j p_{ij}log \frac{p_{ij}}{ q_{ij}} \end{aligned}$$
(2)

where \(q_{ij}\) measures the similarity between embedded sample \(z_i\) and cluster j [11], and \(p_{ij}\) is the target distribution derived from \(q_{ij}\), which emphasizes high-confidence assignments of samples to cluster j.
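The paper does not spell out the exact forms of \(q_{ij}\) and \(p_{ij}\); a standard DEC-style choice consistent with [7, 11] is sketched below (a Student's t kernel for Q and a sharpened, frequency-normalized target for P), which should be read as one plausible instantiation.

import torch

def soft_assign(z, centers, alpha=1.0):
    # q_ij: Student's t similarity between embeddings z (n, d) and centers (k, d).
    dist2 = torch.cdist(z, centers) ** 2
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    # p_ij: square q and normalize by cluster frequency to favor confident assignments.
    w = q ** 2 / q.sum(dim=0)
    return w / w.sum(dim=1, keepdim=True)

def cluster_loss(p, q, eps=1e-8):
    # Eq. (2): KL(P || Q).
    return torch.sum(p * torch.log((p + eps) / (q + eps)))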

Existing deep joint clustering methods combine a neural network with clustering [12, 25]. However, during clustering, minimizing the clustering loss alone may produce a degenerate solution: the neural network maps all samples x to the same point, so the loss is 0 but all samples fall into one class. Therefore, we add an additional constraint to eliminate the degenerate solution. Here, we explicitly constrain the samples to be distributed across the two categories:

$$\begin{aligned} L_m=max \left\{ 0,\frac{m_{y=1}}{m_y} -b \right\} \end{aligned}$$
(3)

where \(m_y\) is the number of samples being classified, \(m_{y=1}\) is the number of samples predicted as the abnormal category, and b is a margin parameter that controls the classification proportion.

To make the feature representation carry cluster information, we reuse the encoder of the VAE as the clustering network; that is, the loss function of the encoder includes both the reconstruction loss and the clustering loss. Combining the above objectives, the joint clustering loss \(L_{norm}\) is the sum of the reconstruction loss, the KL-divergence alignment loss, and the constraint loss, with hyperparameters \(\gamma , \beta\) controlling the degree to which the embedded space is distorted:

$$\begin{aligned} L_{norm} = L_v + \gamma L_c + \beta L_m \end{aligned}$$
(4)

The robust feature representation learned by the VAE helps to improve the clustering performance, and the clustering results in turn guide the network to learn better representations. Joint clustering optimizes the VAE and the clustering synchronously through the shared encoder, which helps the network learn both the clustering and a strong representation constrained by the decoder.
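Putting Eqs. (1)-(4) together, a hedged sketch of the joint objective might look as follows, with \(\gamma = 1\), \(\beta = 0.5\), and \(b = 0.5\) as in Sect. 5.4. Reading \(m_{y=1}/m_y\) as the soft proportion of samples assigned to the abnormal cluster is our assumption, chosen to keep the constraint differentiable.

import torch

def constraint_loss(q, abnormal_cluster=1, b=0.5):
    # Eq. (3): penalize assigning more than a fraction b of samples to one category.
    # The soft assignment proportion stands in for m_{y=1}/m_y (assumption).
    frac_abnormal = q[:, abnormal_cluster].mean()
    return torch.clamp(frac_abnormal - b, min=0.0)

def joint_loss(l_v, l_c, l_m, gamma=1.0, beta=0.5):
    # Eq. (4): L_norm = L_v + gamma * L_c + beta * L_m.
    return l_v + gamma * l_c + beta * l_m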

4 Suspicious User Detection

After joint clustering for normal user identification, the remaining unstable users need further analysis. In this section, we aim to detect gas-theft suspects among unstable users. As the number of collected abnormal labels is limited, it is hard to train a model on them directly. Hence, we generate more training samples via a triplet-wise ranking method to improve the utilization of the labeled gas thefts. Considering that similar users are close to each other and dissimilar users are mutually exclusive in the representation space, we mine the behavior correlations between different users to judge whether a user is suspicious.

Fig. 3 Triplet ranking for suspicious user detection. The network consists of the triplet input, the anomaly score generator, the anomaly scores, and the deviation loss

Figure 3 shows the framework of the triplet ranking network. Firstly, we take the normal sample set as the normal group, randomly select a single normal sample from the normal sample set, randomly select a single abnormal sample from the abnormal sample set, and construct a triplet based on the representations of these samples. Then, we design an anomaly-scoring network with multiple fully connected layers, which takes a representation as input and generates an anomaly score. To capture closeness deviation in the representation space, we define the deviation among the triplet members based on the Z-score and use it to design the loss. After that, we feed the identified unstable users into the model to obtain their scores. Based on the scores of historical behavior, we set an anomaly-score threshold to determine whether a user is suspicious.

Existing methods realize end-to-end learning of anomaly scores through neural deviation learning, using a few labeled anomalies and a prior probability to enforce a statistically significant deviation between the anomaly scores of anomalies and those of upper-tail normal data objects [16]. For our problem, we formulate detection as a triplet relation learning task to generate more training samples and perform anomaly score learning. Here, we take the identified normal samples as the normal candidate set N and the labeled abnormal samples as the abnormal candidate set A, and construct triplet instance pairs for data augmentation. Specifically, we sample a normal example \(z^+\) and an abnormal example \(z^-\) from N and A, respectively. We use uniform sampling instead of importance sampling to generate more triplet combinations and improve data utilization. Let \(T = \left\{ \left\{ N, z^+, z^- \right\} | z^+\in N, z^- \in A\right\}\) be a meta triplet instance, which contains critical information for discriminating anomalies from normal users. The normal candidate set comes from the first module, which ensures the quality of the triplet sampling.
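A minimal sketch of the triplet construction with uniform sampling is given below; the number of triplets and the use of a fixed-size reference subset in place of the full set N are illustrative assumptions.

import random

def build_triplets(normal_set, abnormal_set, num_triplets=1000, group_size=32):
    # normal_set / abnormal_set: lists of representation vectors. Each meta triplet
    # pairs a reference group of normal users with one normal example z_plus and
    # one labeled gas theft z_minus, all sampled uniformly.
    triplets = []
    for _ in range(num_triplets):
        n_ref = random.sample(normal_set, min(group_size, len(normal_set)))
        z_plus = random.choice(normal_set)
        z_minus = random.choice(abnormal_set)
        triplets.append((n_ref, z_plus, z_minus))
    return triplets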

Based on the representation of each user, we design an anomaly-scoring learner \(\phi (\cdot )\) with multiple fully connected layers, which generates an anomaly score for each given input user. To achieve a statistically significant deviation between the anomaly scores of anomalies and those of normal users, we use the closeness deviation between a user and the normal user group as one of the measurement standards. Specifically, the deviation is defined as a Z-score:

$$\begin{aligned} d(z)=\frac{\phi (z)-\mu _R}{\sigma _R} \end{aligned}$$
(5)

where the reference score \(\mu _R\) is defined as the average of the anomaly scores of the normal candidate set N, and \(\sigma _R\) is the standard deviation of the anomaly scores of N; together they determine the closeness deviation of the triplet.

Then, we define a hinge loss \(L_{sus}\) on the triplet to optimize the score generator. It encourages a small deviation between normal samples and the normal group, a large deviation between abnormal samples and the normal group, and a large gap between the scores of abnormal and normal samples:

$$\begin{aligned} L_{sus}= max\left\{ 0,|d(z^+)|-d(z^-)-\phi (z^-)+\phi (z^+)+c\right\} \end{aligned}$$
(6)

where c is a margin. The loss pushes the anomaly scores of normal users \(\phi (z^+)\) as close as possible to the reference score \(\mu _R\) of the normal group, and pushes the anomaly scores of abnormal users \(\phi (z^-)\) as far as possible from \(\mu _R\). Note that if an anomaly \(z^-\) has negative \(\phi (z^-)\) and \(d(z^-)\), the loss is particularly large, which encourages large positive deviations for all anomalies. Therefore, the deviation function over the normal group, normal samples, and abnormal samples enables the network to learn easy-to-explain anomaly scores. During the inference phase, we feed the representations of the unstable users into the network; users with higher scores are more suspicious.
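The scoring network and the loss in Eqs. (5) and (6) can be sketched as follows, using the 8- and 4-unit hidden layers from Sect. 5.5 together with \(\sigma _R = 1\) and c = 3; again, this is an illustrative reconstruction rather than the authors' implementation.

import torch
import torch.nn as nn

class AnomalyScorer(nn.Module):
    # phi(.): fully connected scoring network with 8 and 4 hidden units (Sect. 5.5).
    def __init__(self, z_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 8), nn.ReLU(),
                                 nn.Linear(8, 4), nn.ReLU(),
                                 nn.Linear(4, 1))

    def forward(self, z):
        return self.net(z).squeeze(-1)

def triplet_deviation_loss(scorer, n_ref, z_plus, z_minus, sigma_r=1.0, c=3.0):
    # Eq. (5): Z-score deviations against the normal reference group.
    mu_r = scorer(n_ref).mean()
    d_plus = (scorer(z_plus) - mu_r) / sigma_r
    d_minus = (scorer(z_minus) - mu_r) / sigma_r
    # Eq. (6): hinge loss over the triplet.
    loss = torch.clamp(d_plus.abs() - d_minus
                       - scorer(z_minus) + scorer(z_plus) + c, min=0.0)
    return loss.mean()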

We use the closeness deviation between triplet members to design the deviation loss, deeply mining the behavior correlations between different users, and we use triplet-wise training to generate more training samples and improve data utilization. Besides, we use the representations extracted during normal user identification as the input format, take the identified normal samples as the training candidate set, and take the identified unstable samples as the detection objects. Thus, normal user identification and suspicious user detection are seamlessly integrated, which overcomes the problem of label scarcity.

4.1 Algorithm Pseudo-Code

Algorithm 1 outlines the proposed approach. For normal user identification based on joint clustering, we first initialize the VAE and the clustering centers (Line 1), and then train the joint clustering network (Lines 2-4). After that, we detect normal and unstable users and obtain the corresponding representations (Lines 5-6). For suspicious user detection based on triplet ranking, we construct the triplets (Line 8) and train the triplet ranking model to obtain the corresponding anomaly scores (Lines 9-10). Finally, we predict suspects among the identified unstable users (Line 11).

Algorithm 1 The proposed neural clustering and ranking approach (pseudo-code figure)
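Since the pseudo-code figure is not reproduced here, the outlined steps can be summarized in the following Python-style sketch. The helper names tie back to the sketches above where possible; the remaining ones (pretrain_vae, encode, kmeans_init, train_joint_clustering, split_users, train_triplet_ranking) are hypothetical stand-ins for the corresponding training procedures, not the authors' code.

def ncra(X, theft_labels, threshold):
    # Normal user identification (joint clustering).
    vae = pretrain_vae(X)                                   # Line 1: pre-train VAE
    centers = kmeans_init(encode(vae, X), k=2)              # Line 1: init cluster centers
    vae, centers = train_joint_clustering(vae, centers, X)  # Lines 2-4: minimize L_norm
    normal, unstable = split_users(vae, centers, X)         # Line 5: divide user groups
    Z = encode(vae, X)                                      # Line 6: final representations
    # Suspicious user detection (triplet ranking).
    triplets = build_triplets(Z[normal], Z[theft_labels])   # Line 8: construct triplets
    scorer = train_triplet_ranking(triplets)                # Lines 9-10: minimize L_sus
    scores = scorer(Z[unstable])                            # score the unstable users
    return [u for u, s in zip(unstable, scores) if s > threshold]  # Line 11: suspects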

5 Experiments

5.1 Datasets

We conduct experiments on three real-world datasets [27] collected by three branches of a gas group, denoted companies A, B, and C for short. In total, there are 3,035 users, of which only 11 are labeled as gas thefts. Specifically, companies A, B, and C have 584, 781, and 1,670 users with 4, 2, and 5 labeled thefts, respectively. Each boiler room user has a daily gas consumption record, and the time span is from November 15, 2018 to March 15, 2019.

5.2 Parameter Setting

5.3 Data Preparation

For occasional missing data, we use forward filling. For each dataset, we normalize the data to the range [-1, 1].
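As a concrete illustration of this preparation step, assuming each column of a pandas DataFrame holds one user's daily series, the following sketch applies forward filling and per-user min-max scaling to [-1, 1]:

import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Fill occasional gaps with the most recent observation, then scale each
    # user's series (one column per user) to the range [-1, 1].
    df = df.ffill()
    lo, hi = df.min(), df.max()
    return 2 * (df - lo) / (hi - lo) - 1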

5.4 Normal User Identification

The cluster number of K-means is set to 2. The length of the gas sequence is 121, and the dimension of the latent space is 16. The size of the symmetric hidden layers is 64, with ReLU as the activation function. The loss weights \(\gamma\) and \(\beta\) and the margin b are set to 1, 0.5, and 0.5, respectively. \(h(z)= Normal(0,1)\) is the standard normal distribution. For pre-training, the feature extractor is trained for 100 epochs with Adam, and is then trained for a further 70 epochs while updating the joint clustering. The batch size and learning rate for each subset are set by grid search. For datasets A and B, the batch size is 15 and the learning rate is 0.0001. For dataset C, the batch size is 25 and the learning rate is 0.0005.

5.5 Suspicious User Detection

The ranking network consists of two hidden layers with 8 and 4 hidden units, respectively, to learn more intricate data interactions. The network is trained for 100 epochs with Adam, and each unit uses ReLU as the activation function. The margin c is set to 3 and \(\sigma _R\) is set to 1.

5.6 Evaluation Methods

We adopt cross-validation on the three subsets for evaluation, where two serve as the training set and the remaining one is used for evaluation. We use precision (P) and recall (R) as metrics. Due to the scarcity of labels, the detected anomalies should cover as many labels as possible while avoiding false alarms. Therefore, we aim for precision that is as high as possible while recall stays close to 1.
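For reference, precision and recall of the thresholded anomaly scores can be computed as in the short sketch below (using scikit-learn; the threshold value itself is dataset-specific):

from sklearn.metrics import precision_score, recall_score

def evaluate(y_true, scores, threshold):
    # y_true: 1 for labeled gas thefts, 0 otherwise; scores: anomaly scores.
    y_pred = [int(s > threshold) for s in scores]
    return precision_score(y_true, y_pred), recall_score(y_true, y_pred)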

5.7 Baseline Methods

  • Deep SVDD [20]: an unsupervised anomaly detection method based on one-class classification inspired by kernel methods, which minimizes the hypersphere volume in the AE-based sample feature space. Eps is set to \(10^{-6}\).

  • DBSCAN [5]: a density-based clustering algorithm, which defines the cluster as the largest set of points connected by density. Here, MinPts is set to 4 and eps is set to 0.75.

  • DAGMM [31]: an unsupervised anomaly detection method combining an autoencoder and a Gaussian mixture model. Here, the number of training epochs is set to 200 and the size of mini-batches is 256.

  • SRCNN [18]: a time series anomaly detection method combining Spectral Residual (SR) and CNN. It adopts the spectral residual technique from computer vision to strengthen anomalies. Parameters are set as [18] suggests.

  • SVOC [27]: which uses the temperature deformation method to find normal users, and trains OCSVM with them. Unstable users are ranked by the probability the trained OCSVM predicts. Parameters are set as [27] suggests.

  • msRank [26]: which uses gas consumption mode clustering to find normal users and discovers gas-theft suspects among unstable users by RankNet-based suspicion scoring. Parameters are set as [26] suggests.

  • Deep SAD [19]: an extension of Deep SVDD built on the assumption that the entropy of the latent distribution of normal data is lower than that of the anomalous distribution. Here, \(\eta\) is set to 1 and eps is set to \(10^{-6}\).

  • SSD\(_k\) [21]: which takes the Mahalanobis distance to the nearest cluster center as the measure of anomaly degree. Here, the number of training epochs is set to 50 and the size of mini-batches is 15.

5.8 Performance Comparison

Table 1 Performance comparison with baselines

5.9 Comparison With Baselines

As Table 1 illustrates, we compare the Neural Clustering and Ranking Approach (NCRA) with various baselines. Though Deep SVDD and DAGMM can be used in unsupervised settings, neither shows good performance when no normal labels are available. DBSCAN does not perform well on high-dimensional time series, which relates to abnormal users gathering in small clusters. SRCNN, Deep SAD, and SSD\(_k\) leverage the gas-theft labels and all unlabeled data, which makes their hit rates higher. However, under the label scarcity of realistic conditions, the unlabeled data mix normal users with unlabeled abnormal users, so using unlabeled data indiscriminately degrades the performance of these detection methods. SVOC and msRank manually extract simple statistical features and learn with shallow models, which may fail to capture complex gas usage patterns. Different from them, NCRA achieves the best recall with higher precision, since it deeply mines the regular patterns of gas consumption behavior and tightly connects the normal user identification and suspicious user detection modules to address label scarcity.

5.10 Comparison With Joint Clustering Variants

As shown in Table 2, we compare joint clustering (JC) with variants that use different representation learning models and different loss combinations. Here, we replace the VAE with other representation learning models, such as an autoencoder (AE) and a Gated Recurrent Unit (GRU), while keeping the other parts of pre-training and joint clustering unchanged. JC w/o cons indicates that the constraint loss is not considered, i.e., \(L_m\) is set to 0 in Eq. 4. The results show that the anomalies detected by JC cover the labels as fully as possible while avoiding false positives, achieving high accuracy. Unlike GRU and AE, the VAE is more robust to noise and better learns the representation of gas usage behavior. As for the constraint loss, it avoids mapping all samples into the same user cluster, eliminating the degenerate solution. Note that this is only the first module of the whole method, so it is necessary to keep recall high; precision can be improved in the second module.

Table 2 Comparison with different joint clustering

5.11 Comparison With Triplet Ranking Variants

As for the two-step methods presented in Table 3, we compare triplet ranking (TR) with typical classification and ranking substitutes. OCSVM does not perform well since it only models normal users; moreover, its utilization of the labeled gas thefts is low, as they are only used to set a statistical threshold. RankNet only considers the difference between a single normal sample and an abnormal sample. Compared with these ranking models, TR achieves the best performance in reducing the false-positive rate. This is because TR mines the behavior correlations to obtain easy-to-explain anomaly scores, and uses triplet-wise training to generate more training samples and improve data utilization.

Table 3 Comparison with substitutes for triplet ranking

5.12 Visualization of Triplet Ranking Variants

We visualize the triplet ranking variants by using t-SNE to reduce the dimension of the latent representation Z from 16 to 2, and plot the partition of unstable users in Fig. 4. Here, different colors represent different user groups: blue dots represent irregular users, and purple dots represent abnormal users. We find that irregular users account for the majority and gather in the middle, while abnormal users account for a small part and are scattered around the periphery. The two user groups of JC & RankNet are not clearly separated. The abnormal user group of JC & OCSVM is clearly distributed around the periphery, but the number of abnormal users is too high. JC & TR performs best: the detected abnormal users are few and distributed around the periphery, reflecting the power of triplet ranking.

Fig. 4 Gas-theft suspects and irregular users. The unstable user partitions of JC & RankNet, JC & OCSVM, and JC & TR are visualized in 2D. Blue dots indicate irregular users, and purple dots indicate abnormal users

5.13 Detected Suspects

Taking the dataset of company A as an example, we rank the unstable users by their anomaly scores to obtain the gas-theft suspects. Figure 5 shows the original gas consumption curves of the three detected suspects. The red circles indicate excessive fluctuations in gas consumption, i.e., consumption that frequently and irregularly increases or decreases sharply or even restarts. Our approach mainly detects frequent, large fluctuations, while tolerating boiler rooms that occasionally need to be shut down or adjusted within a small range under special circumstances.

Fig. 5 Abnormal examples with the highest anomaly scores. The original gas consumption curves of the three detected suspects; the horizontal axis is the date and the vertical axis is the gas consumption. Red circles mark excessive fluctuations, i.e., consumption that frequently and irregularly increases or decreases sharply or even restarts

5.14 Parameter Analysis

5.15 Cluster Number of Joint Clustering

Setting the number of JC clusters equal to the number of classes serves as prior knowledge. To demonstrate the representation ability of JC as an unsupervised clustering model, we vary the number of clusters over \(\left\{ 2, 3, 4, 5, 10, 20\right\}\). As shown in Fig. 6, precision is highest when the cluster number is 2. The purpose of JC is to divide users into normal and unstable users. When the number of clusters is greater than 2, normal or unstable users may be split into multiple clusters due to the diversity of user behavior. In this case, we cannot tell which cluster represents the normal users, resulting in a worse clustering effect.

Fig. 6 Precision of joint clustering under different cluster numbers. Joint clustering precision of the three datasets for 2, 3, 4, 5, 10, and 20 clusters. Blue is dataset A, yellow is dataset B, and green is dataset C

5.16 Representation Dimension of Joint Clustering

As the input to clustering, we vary the dimension of the hidden layer Z over \(\left\{ 4, 8, 16, 32, 64\right\}\). As shown in Fig. 7a, a dimension of 16 yields both high precision and high recall. When the dimension is 4, the higher precision comes at the cost of lower recall, which deviates from our intention. When the representation dimension is too high, clustering becomes harder due to sparse data distribution and many irrelevant attributes in high-dimensional data. Conversely, if the representation dimension is too low, the data lose some important attributes. Hence, choosing an appropriate representation dimension helps explain the dataset well.

Fig. 7 Precision and recall of joint clustering under different representation dimensions. Joint clustering precision and recall of the three datasets for hidden-layer dimensions 4, 8, 16, 32, and 64. Blue is dataset A, yellow is dataset B, and green is dataset C

5.17 Sampling Method of Triplet Ranking

As shown in Table 4, we compare different choices of the reference score \(\mu _R\) and the deviation \(\sigma _R\) in Eq. 5 of triplet ranking (TR). There are two main ways to generate \(\mu _R\) and \(\sigma _R\): data-driven methods compute them from the feature data, while prior-driven methods rely on a Gaussian prior. For the deviation \(\sigma _R\), we can either set \(\sigma _R=1\) following the prior-driven method of [16], or use the standard deviation (SD) of the anomaly scores of the normal candidate set N in the data-driven manner. For the reference score \(\mu _R\), we can select the average, median, maximum, or minimum of the anomaly scores of N. When \(\mu _R\) is the average anomaly score and \(\sigma _R\) is 1, the result is best. Compared with the maximum, median, and minimum, the average anomaly score better represents the normal sample group.
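The compared choices can be summarized in a small helper, sketched below under the same naming as Eq. (5); the dictionary of statistics is merely a compact way to express the four candidates for \(\mu _R\).

import torch

def reference_stats(scores_normal, mu_mode='mean', prior_sigma=True):
    # Data-driven vs prior-driven choices for mu_R and sigma_R (Table 4).
    mu = {'mean': scores_normal.mean(), 'median': scores_normal.median(),
          'max': scores_normal.max(), 'min': scores_normal.min()}[mu_mode]
    sigma = torch.tensor(1.0) if prior_sigma else scores_normal.std()
    return mu, sigma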

Table 4 Comparison with different deviation loss

5.18 Number of Labeled Anomalies

To further explore the effect of the number of labeled anomalies, the precision of JC and TR with different numbers of labeled gas thefts is shown in Fig. 8a and b. We selected abnormal users detected with high confidence and doubled the number of labeled abnormal users for data augmentation, but the experimental results did not change. This is because there are many ways to steal gas in practice, with ever-changing effects at the data level, so data augmentation cannot cover these variations. In addition, our NCRA learns the underlying rules by characterizing the differences between labeled gas thefts and regular gas usage, rather than merely memorizing the labels.

Fig. 8 Precision under different numbers of labeled anomalies. Precision of joint clustering and triplet ranking for different numbers of labeled anomalies. Blue represents the original number of labeled data, and yellow represents twice that number

6 Related Work

6.1 Gas Theft Detection

Natural gas theft detection mainly relies on on-site inspection by employees of the natural gas company, which consumes manpower and material resources [24]. Smart meters regularly report massive gas consumption records, which creates an opportunity for data-driven detection methods. The two previous gas-theft detection studies first divided users into normal and unstable users by business characteristics, and then detected suspects among the unstable users with off-the-shelf classifiers [26, 27]. These data-driven methods set statistical indicators and manually extract simple statistical features, which may fail to capture complex gas usage patterns. Different from them, we use deep learning to detect gas theft, which is more capable of discovering potential suspects.

6.2 PU Learning

Learning from positive and unlabeled data, or PU learning, is the task where a learner only has access to positive examples and unlabeled data [10]. The assumption is that the unlabeled data contain both positive and negative examples [4]. This task has attracted increasing interest in the machine learning literature, as this type of data naturally arises in many applications [1]. Most methods fall into three categories: two-step methods [14], biased learning [9], and class prior incorporation [23]. Due to the scarcity of labeled data, we adopt a two-step method that tightly combines our two modules. Common two-step methods that directly use off-the-shelf classifiers suffer from low data utilization; instead, we generate more training samples via triplet-wise training to improve data utilization.

6.3 Urban Anomaly Detection

Urban anomalies are unusual events occurring in urban environments that may endanger public safety [30]. Recently, data-driven urban anomaly analysis frameworks have emerged that use urban big data and machine learning to detect urban anomalies automatically [2]. Existing works on urban anomaly detection fall into three groups [29]: spatiotemporal-feature-based, urban-dynamic-pattern-based, and video anomaly detection methods. Treating our task as an individual anomaly detection task, we follow the feature-based line and focus on better feature extraction. In the gas scenario, we use joint clustering to learn the behavior representation of the original sequence and synchronously separate normal users, and then detect suspects based on triplet ranking.

7 Conclusion

In this paper, we propose a neural clustering and ranking approach to detect gas-theft suspects among boiler room users. Considering the consistent behavior rules of most normal users, we first distinguish normal users from unstable users and obtain their representations by joint clustering. Based on these results, the triplet ranking network detects gas-theft suspects among unstable users by ranking anomaly scores. Experimental results on three real-world datasets demonstrate the superiority of our approach over various baselines in reducing the false-positive rate.

In addition, NCRA demonstrates an ability to detect anomalies in time series: it can identify normal users and then single out suspicious users without using supervision information during training. In the future, we will focus on generalizing our approach to more types of gas users and to other urban anomaly detection tasks. We also plan to design a real-time system: the gas consumption records collected by smart gas meters would be reported to Hive, the data would be migrated to a MySQL database using Sqoop every week, and NCRA would then produce a weekly list of suspects based on the data of the past 30 days.