1 Introduction

Complex system models fall into two main types: first-principle models and data-driven models. First-principle models require strong professional background knowledge and a deep understanding of the modeled object, and many approximations and assumptions are needed in the modeling process. Therefore, when the object structure is complex, a first-principle model is difficult or impossible to obtain. In contrast, data-driven models are more robust in determining the relationship between input and output data; they require only a small amount of background knowledge and have strong data processing capability.

The establishment of data-driven models can be divided into two categories according to whether the model structure must be determined in advance. The first category determines the model structure first and then identifies the model parameters, yielding a parametric model. The second category determines the model structure and parameters simultaneously during the modeling process, yielding a non-parametric model. The latter methods are more computationally complex, but their model accuracy is higher. With the development of deep learning, many efficient algorithms of this kind have emerged, such as stacked autoencoders, deep belief networks (DBN), and long short-term memory (LSTM) networks.

According to the number of models, a data-driven model can be a single model or a multi-model. Commonly, a single model is used to describe objects with obvious linear characteristics, whereas a multi-model is applied to complex system modeling and complex industrial data monitoring [1,2,3]. The basic idea of the multi-model approach is to establish sub-models corresponding to operating conditions or modes with different characteristics, forming a model database; during online monitoring, the current data status is identified and the appropriate sub-model is selected for data monitoring.

Since the multi-model method must establish different monitoring models for modes with obvious differences, an important problem is how to divide the offline historical data into different modes when prior knowledge of the process is insufficient. This problem manifests as working-point identification in multi-model modeling and as mode identification in industrial monitoring.

The traditional methods for multi-condition identification include artificial judgment based on prior knowledge and machine assistance based on recursive methods [4]. With the development of machine learning, many practical methods for condition division or mode identification have been developed, such as Sliding Windows (SW), Top-Down, Bottom-Up, Sliding Window and Bottom-Up (SWAB) [5], and Feasible Space Window (FSW) [6]. Ge et al. [7] divided historical data into several independent groups using fuzzy C-means clustering, established the sub-models corresponding to the data subsets using feature extraction, and finally integrated the monitoring results of the sub-models with Bayesian reasoning; the proposed method can handle non-Gaussian information in each operation mode. Zhu et al. [8] proposed a two-layer clustering method based on a global moving-window strategy and a global clustering solution strategy and successfully divided the data into different modes; the method allows the different ICA–PCA models to overlap. Lu [9] and Zhao et al. [10] applied the K-means clustering algorithm to the period division of batch processes, although the performance of these methods in continuous processes remains to be discussed.

In [11], a fuzzy segmentation of time series based on kernel principal component analysis (KPCA) and Gath-Geva (G-G) clustering is proposed, in which window division of the multivariate time series is used; its advantage is that time-dimension attributes are introduced as extra variables. Zhang Yue [12] introduced principal component analysis into the design of sub-window time-span division, where the demarcation points of the sub-window time spans are determined by piecewise analysis, rolling merging, and cyclic validation; the final result is obtained by multi-step cyclic iteration, so the computation is heavy. Song et al. [13] used the recursive local outlier factor algorithm to divide a multimode chemical process into stable modes and conversion modes and established the corresponding models; in this algorithm, the number of modes does not need to be determined in advance, and details of mode switching can be acquired. Lv [14] proposed a feature extraction method based on the weighted kernel Fisher criterion to improve clustering accuracy, in which feature mapping is adopted to bring edge classes and outliers closer to the other normal subclasses. Li Wei [15] used a fuzzy C-means clustering algorithm based on a conditionally positive definite kernel to cluster the dataset, after which least-squares support vector regression is conducted for each cluster; the method realizes the clustering of irregular data. Zhang Shu Mei [16] proposed an automatic offline mode recognition method for multi-modal processes based on an improved K-means clustering algorithm, which avoids the influence of manual identification on the results. Among the existing methods, clustering based on unsupervised learning has been the most widely used. Its main advantage is that it does not rely on prior knowledge, but it also has shortcomings, especially in handling window or mode boundary data.

This paper considers the time-sequence relationship of the object under study from the perspective of semi-supervised clustering and proposes a hybrid constraint that combines pairwise constraints and time constraints to improve the identification accuracy of working conditions or modes in boundary areas. The simulation results show that semi-supervised clustering based on mixed constraints achieves higher accuracy in condition identification, especially for boundary data.

This paper is organized as follows. First, the background knowledge is introduced, including the cost of condition division and the traditional methods. Second, the semi-supervised clustering with mixed constraints is described in detail. Third, an online recognizer based on RBFNN is designed. Fourth, the proposed method is compared with the traditional methods by simulation. The final section presents the key conclusions and limitations of this work and offers directions for future research.

2 Background knowledge

2.1 Cost of condition division

Process data condition partition is equivalent to the problem of multivariate time series segmentation. In essence, for a given k-dimensional time series \(X = \left\{ x_{1} ,x_{2} , \ldots ,x_{T} \right\}\), \(x_{t} = \left( {x_{1t} ,x_{2t} , \ldots ,x_{kt} } \right)^{T}\), the time domain is divided according to the change law of the data and the correlation between successive samples.

Assuming that the time series is divided into \(N\) segments, the boundary time labels of the segmentation result are defined as \(t = \left\{ t_{1} ,t_{2} , \ldots ,t_{N} \right\}\); the segmentation result \(t\) then satisfies \(0 < t_{1} < t_{2} < \cdots < t_{N} = T\).

In the problem of time series segmentation, \(t_{1} ,t_{2} , \ldots ,t_{N}\) are called segmentation boundaries or mutation points, \(\left[ {t_{0} + 1,t_{1} } \right],\left[ {t_{1} + 1,t_{2} } \right], \ldots ,\left[ {t_{N - 1} + 1,t_{N} } \right]\) (with \(t_{0} = 0\)) are called segments, and the number of segments \(N\) is called the segmentation order [17].

Thermal data condition partition, i.e., time series segmentation, can be described as an optimization problem. The overall cost of segmentation is \(J\left( t \right)\),

$$ J\left( t \right) = \mathop \sum \limits_{i = 0}^{N - 1} d_{{t_{i} + 1,t_{i + 1} }} $$
(1)

where \(d_{s,t} \;(0 \le s < t \le T)\) is the segmentation error of segment \(\left[ {s,t} \right]\). It is a local error, determined by the data in the time series segment \(\left\{ x_{s} ,x_{s + 1} , \ldots ,x_{t} \right\}\),

$$ d_{s,t} = \mathop \sum \limits_{\tau = s}^{t} \left( {x_{\tau } - \hat{x}_{\tau } } \right)^{T} \left( {x_{\tau } - \hat{x}_{\tau } } \right) $$
(2)

where \(\hat{x}_{\tau }\) is the estimated value of \(x_{\tau }\).

The condition partition or segmentation process is not a simple single-objective optimization problem. While ensuring the overall segmentation cost, it is necessary to keep the local segmentation costs as close to each other as possible and at a low level. Thus, the multivariate time series segmentation problem is transformed into a constrained optimization problem or a multi-step optimization problem.
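As a concrete illustration, the following sketch evaluates the segmentation cost of Eqs. (1)-(2) for a given set of boundary labels. For simplicity it assumes the per-segment estimate \(\hat{x}_{\tau }\) is the segment mean; any local model (e.g., a linear fit) could be substituted.

```python
import numpy as np

def segment_error(X, s, t):
    """Local error d_{s,t} of segment [s, t] (Eq. 2), with the
    estimate x_hat taken as the segment mean (an assumption;
    any local model could be substituted)."""
    seg = X[s:t + 1]                      # rows are samples x_tau
    resid = seg - seg.mean(axis=0)        # x_tau - x_hat_tau
    return float((resid * resid).sum())   # sum of squared residuals

def total_cost(X, boundaries):
    """Overall segmentation cost J(t) (Eq. 1) for boundary labels
    t_1 < t_2 < ... < t_N = T (0-based, inclusive end indices)."""
    cost, start = 0.0, 0
    for b in boundaries:
        cost += segment_error(X, start, b)
        start = b + 1
    return cost

# Example: a 1-D series with an obvious mean shift at index 49
X = np.r_[np.zeros(50), np.ones(50)].reshape(-1, 1)
print(total_cost(X, [49, 99]))   # ~0: boundary matches the shift
print(total_cost(X, [24, 99]))   # larger: boundary misplaced
```

A boundary set that matches the true change point drives the cost toward zero, which is what the optimization over \(t\) exploits.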

2.2 The traditional methods for multi-condition identification

The traditional methods for multi-condition identification include artificial judgment based on prior knowledge and machine assistance based on the recursive method. Among them, the more famous methods include Sliding Windows (SW), Top-Down, Bottom-Up, Sliding Window and Bottom-Up (SWAB), Feasible Space Window (FSW).

Sliding Windows (SW) [5]: the algorithm determines the width of a potential segment recursively. The left endpoint is anchored at the first data point, and the algorithm attempts to approximate the data to the right with increasingly longer segments. When, at some point \(i\), the error exceeds the user-specified threshold, the subsequence from the anchor to \(i - 1\) is transformed into a segment. The anchor is then moved to location \(i\), and the process repeats until the entire time series has been transformed into a piecewise linear approximation.
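A minimal sketch of the SW loop is given below; error_fn is any segment-error function with the signature of segment_error above (the helper names are illustrative, not from [5]).

```python
def sliding_window(X, max_error, error_fn):
    """Sliding Window segmentation sketch: grow a segment from an
    anchor until its approximation error exceeds max_error, then cut.
    error_fn(X, s, t) returns the segment error, e.g. segment_error
    above (an assumption)."""
    segments, anchor, n = [], 0, len(X)
    i = anchor + 1
    while i < n:
        if error_fn(X, anchor, i) > max_error:
            segments.append((anchor, i - 1))  # cut before point i
            anchor = i
        i += 1
    segments.append((anchor, n - 1))          # final segment
    return segments

# e.g. sliding_window(X, 1.0, segment_error)
```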

Top-Down [5]: the algorithm considers every possible split of the time series and splits it at the best location. Both subsections are then tested to see whether their approximation errors are below a user-specified threshold. If not, the algorithm recursively continues to split the subsequences until all segments have approximation errors below the threshold.

Bottom-Up [5]: the algorithm first creates the finest possible approximation of the time series, so that \(n/2\) segments are used to approximate an n-length time series. Next, the cost of merging each pair of adjacent segments is calculated, and the algorithm iteratively merges the lowest-cost pair until a stopping criterion is met.
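The following sketch mirrors this merge loop; as before, error_fn is an assumed segment-error function, and the merge cost is approximated by the error of the merged segment.

```python
import numpy as np

def bottom_up(X, max_error, error_fn):
    """Bottom-Up segmentation sketch: start from the finest segments
    and repeatedly merge the cheapest adjacent pair until the
    cheapest merge would exceed max_error (the stopping criterion)."""
    # finest approximation: n/2 two-point segments
    segs = [(i, min(i + 1, len(X) - 1)) for i in range(0, len(X), 2)]
    while len(segs) > 1:
        # cost of merging each adjacent pair of segments
        costs = [error_fn(X, segs[k][0], segs[k + 1][1])
                 for k in range(len(segs) - 1)]
        k = int(np.argmin(costs))
        if costs[k] > max_error:
            break
        segs[k:k + 2] = [(segs[k][0], segs[k + 1][1])]  # merge pair
    return segs
```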

Sliding Window and Bottom-Up (SWAB) [5]: the algorithm keeps a small buffer. Bottom-Up is applied to the data in the buffer, and the leftmost segment is reported. The data corresponding to the reported segment are removed from the buffer, and more data points are read in and incorporated into the buffer. This process of applying Bottom-Up to the buffer and reporting the leftmost segment repeats until the entire series has been processed.

Feasible Space Window (FSW) [6]: the algorithm introduces the notion of a Candidate Segmenting Point (CSP), which may be chosen as the next segmenting point; the distances of all points lying between the last segmenting point and the newly chosen one must be within the maximum error tolerance. The key idea of FSW is to search for the farthest CSP so as to make the current segment as long as possible under the given maximum error tolerance.

The comparison of the above algorithms is shown in Table 1.

Table 1 Comparison results of the above algorithms

2.3 Semi-supervised clustering method

Unlike traditional unsupervised clustering algorithms, such as the K-means and expectation–maximization (EM) algorithms, semi-supervised clustering combines clustering and semi-supervised learning to improve clustering performance using the small amount of labeled data and prior knowledge available in massive datasets. Semi-supervised clustering algorithms can be divided into constraint-based, distance-based, and combined constraint- and distance-based algorithms.

2.3.1 Semi-supervised clustering based on constraints

The idea of such algorithms is to add constraint information to the traditional clustering objective to improve clustering performance. The most typical algorithms are Seeded-K-means [18] and Cop-K-means [19], which have played a crucial role in the development of semi-supervised clustering from different perspectives of supervisory information.

The Seeded-K-means algorithm uses a small number of labeled samples as a seed set and obtains the initial cluster centres from it, improving clustering accuracy. Basu et al. [18] define the seed set as follows: if \(S \subseteq L\) and, for every \(x_{i} \in S\) (i = 1, 2, …, |S|, where |S| is the size of S), there exists \(y_{i} \in Y\) such that \(\left( {x_{i} ,y_{i} } \right) \in L\), then S is called a seed set. In particular, when the number of categories to which the samples in S belong equals k, S can be expressed as \(S = \bigcup\nolimits_{i = 1}^{k} {S_{i} }\), where \(S_{i}\) is the non-empty sample set of class i.
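A minimal sketch of this seeding step, assuming the seed set is supplied as one index array per class and using scikit-learn's KMeans for the subsequent iterations:

```python
import numpy as np
from sklearn.cluster import KMeans

def seeded_kmeans(X, seed_sets, n_iter=100):
    """Seeded-K-means sketch: initial centres are the means of the
    k non-empty seed subsets S_1..S_k, then standard K-means runs.
    seed_sets is a list of index arrays, one per class (assumed)."""
    centres = np.array([X[idx].mean(axis=0) for idx in seed_sets])
    km = KMeans(n_clusters=len(seed_sets), init=centres, n_init=1,
                max_iter=n_iter)
    return km.fit_predict(X)
```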

Although semi-supervised clustering based on seed sets can effectively improve clustering performance, it largely depends on the scale and quality of the seed set. Guo Maozu and Deng Chao [20] introduced a semi-supervised clustering algorithm based on tri-training and data editing, combining it with the depuration data-editing technique to correct and purify mislabeled samples in the seed set while expanding its size to improve its quality.

The Cop-K-means algorithm introduces the idea of pairwise constraints into the traditional K-means algorithm. During data assignment, data objects must satisfy the Must-link (ML) and Cannot-link (CL) constraints: under ML, two selected points must belong to the same class, and under CL, two selected points must not belong to the same class. The constraints have symmetry and transitivity characteristics, expressed as follows.

Symmetry:

$$ \left( {x_{i} ,x_{j} } \right) \in {\text{ML}} \Rightarrow \left( {x_{j} ,x_{i} } \right) \in {\text{ML}} $$
$$ \left( {x_{i} ,x_{j} } \right) \in {\text{CL}} \Rightarrow \left( {x_{j} ,x_{i} } \right) \in {\text{CL}} $$

Transitivity:

$$ \left( {x_{i} ,x_{j} } \right) \in {\text{ML}}\;\& \;\left( {x_{j} ,x_{k} } \right) \in {\text{ML}} \Rightarrow \left( {x_{i} ,x_{k} } \right) \in {\text{ML}} $$
$$ \left( {x_{i} ,x_{j} } \right) \in {\text{ML}}\;\& \;\left( {x_{j} ,x_{k} } \right) \in {\text{CL}} \Rightarrow \left( {x_{i} ,x_{k} } \right) \in {\text{CL}} $$

Symmetry and transitivity are crucial for pairwise constraints: when samples are assigned under the constraint relationships, a constraint violation can occur only for a CL constraint; in all other cases no sample assignment failure arises.
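The following sketch shows the Cop-K-means-style feasibility check implied by these properties; the data structures (index-pair sets for ML and CL, a label list with None for unassigned samples) are assumptions for illustration.

```python
def violates_constraints(i, c, labels, ml, cl):
    """Cop-K-means style check: True if putting sample i into
    cluster c breaks a Must-link or Cannot-link constraint.
    labels[j] is the current cluster of j (None if unassigned);
    ml and cl are sets of index pairs (assumed symmetric)."""
    for (a, b) in ml:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] is not None and labels[j] != c:
            return True        # ML partner already placed elsewhere
    for (a, b) in cl:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] == c:
            return True        # CL partner already in this cluster
    return False
```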

The quality of constraint information directly affects clustering results, so clustering performance can be improved only by obtaining better constraint information. Zhu Yu et al. [21] proposed an improved linked Cop-K-means (LCOP-K-means) algorithm based on breadth-first search and the Cop-K-means algorithm, which improves data stability and clustering accuracy. Li Chao Ming et al. [22] proposed a cross-entropy semi-supervised clustering algorithm based on pairwise constraints; this method uses the cross-entropy of samples to express the pairwise constraint information, providing higher clustering accuracy and better results with a smaller amount of pairwise constraint information.

2.3.2 Semi-supervised clustering combining multiple methods

Chen [23] and Chang [24] have suggested that a semi-supervised clustering algorithm can simultaneously use two types of supervision information, such as class labels and pairwise constraints; in particular, when active learning is added to actively label samples, higher-quality supervision information and better clustering results can be obtained.

Wei et al. [25] proposed a semi-supervised clustering method based on pairwise constraints and measures. For data marked by pairwise constraints, the constraint- and measure-based semi-supervised clustering methods are used to generate different basic clustering partitions, and the target clustering is then obtained by integration.

Compared with a single clustering method, the combination of multiple clustering methods can make the best use of the given supervision information, improving algorithm performance. For the identification of multi-model working conditions, this article uses the initial seed set to provide the number of working conditions and reference centres for the datasets under the same working condition, and focuses on solving the problem of fuzzy boundaries between different working conditions. By introducing mixed constraints, the accuracy of the boundary division of working conditions is improved.

3 Multi-model condition identification of thermal process data based on mixed-constraint semi-supervised clustering

3.1 Characteristics of thermal process data

Different from batch processes and random processes, thermal process data has its own characteristics, such as strong coupling, nonlinearity, and slow time variation. Reflected in the data, this means that many factors affect the data changes, the data are difficult to predict, the correlation between different variables is strong, and the data have obvious time series characteristics.

A thermal process is usually considered a transition from one steady state to a new steady state. Therefore, thermal process data can be divided into steady-state condition data and transition condition data. Steady-state data fluctuate within a small range around the steady-state value. Transition condition data appear disordered over very short periods, while the overall trend is to increase or decrease in one direction. Neither steady-state nor transition data exhibit instantaneous amplitude jumps.

According to the above characteristics of thermal process data, the key to dividing steady-state and transition conditions is to solve three problems: first, locating the central point of each working condition; second, screening the data with an obviously strong connection to the working condition central point; and third, determining the boundary data of adjacent working conditions.

3.2 The flowchart of the condition identification

In this paper, the semi-supervised clustering with mixed constraints is used to realize condition identification, as shown in Fig. 1.

Fig. 1 The flowchart and pseudocode of the condition identification using the semi-supervised clustering with mixed constraints

3.3 Initial seed set establishment

After data preprocessing, the dataset is whitened following the idea of the PCA algorithm to reduce linear correlation while keeping the data as faithful as possible. The goal is to minimize the reconstruction error between the original data and the preprocessed data,

$$ J = \frac{1}{N}\mathop \sum \limits_{n = 1}^{N} \left\| {x_{n} - \tilde{x}_{n} } \right\|^{2} $$
(3)
$$ \tilde{x}_{n} = \mathop \sum \limits_{i = 1}^{M} a_{ni} u_{i} + \mathop \sum \limits_{i = M + 1}^{D} b_{i} u_{i} $$
(4)

where \(x_{n}\) is the original data, \(\tilde{x}_{n}\) is the preprocessed data, \(N\) is the number of data points, \(M\) is the principal component dimension, and \(\left\{ {u_{i} } \right\}\) is a D-dimensional orthonormal basis.

The correlation coefficients between different dimensions are then kept as small as possible. The correlation coefficient \(\rho_{ij}\) is defined as follows:

$$ \rho_{ij} = \frac{1}{N}\mathop \sum \limits_{n = 1}^{N} \frac{{\left( {x_{ni} - \bar{x}_{i} } \right)}}{{\sigma_{i} }}\frac{{\left( {x_{nj} - \bar{x}_{j} } \right)}}{{\sigma_{j} }} $$
(5)

where \(\bar{x}_{i}\), \(\bar{x}_{j}\) and \(\sigma_{i}\), \(\sigma_{j}\) are the means and standard deviations of dimensions i and j, respectively.
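A minimal sketch of this whitening step, using scikit-learn's PCA with whiten=True on placeholder data (the 95% retained-variance threshold is an assumption):

```python
import numpy as np
from sklearn.decomposition import PCA

# Whitening sketch: project onto the leading principal components
# and rescale to unit variance, reducing linear correlation.
X = np.random.randn(3000, 4)               # placeholder process data
pca = PCA(n_components=0.95, whiten=True)  # keep 95% variance (assumed)
X_white = pca.fit_transform(X)

# Eq. (5): pairwise correlation coefficients after whitening are ~0
rho = np.corrcoef(X_white, rowvar=False)
print(np.abs(rho - np.eye(rho.shape[0])).max())
```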

Next, the data are processed by the density-based clustering method [26], and the initial seed sets are established according to the distance from the data centre.

3.4 Clustering data screening based on distance \(D_{ij}\)

According to the characteristics of the thermal process data described in Sect. 3.1, three problems must be solved in dividing the steady-state and transition conditions: (1) locating the central point of each working condition, which is solved by the preliminary seed sets above; (2) screening the data with an obviously strong connection to the working condition central point, which is solved by the distance-based clustering data filtering below; and (3) determining the boundary data between adjacent working conditions, which is resolved by the boundary clarification based on mixed constraints in Sect. 3.5.

According to the thermal data characteristics, the distance \(D_{ij}\) is defined as a composite index comprising the sampling-time distance from the working condition centre point \(t_{dis}\), the absolute amplitude deviation from the centre point \(e_{{\text{u}}}\), and the absolute deviation between the data change velocity at the sampling time and that at the centre point \(e_{{\text{v}}}\); it is expressed as follows:

$$ \begin{aligned} D_{ij} & = \alpha_{1} t_{dis} + \alpha_{2} e_{{\text{u}}} + \alpha_{3} e_{{\text{v}}} \\ D_{ij} & = \alpha_{1} \left| {t_{ij} - t_{{c_{i} }} } \right| + \alpha_{2} \left| {x\left( {t_{ij} } \right) - x(t_{{c_{i} }} )} \right| + \alpha_{3} \left| {\partial \left( {t_{ij} } \right) - \partial (t_{{c_{i} }} )} \right| \\ \end{aligned} $$
(6)

where \(\alpha_{i}\) denotes the coefficient (intensity) of each distance component, \(t_{{c_{i} }}\) denotes the time label of the centre point of the ith condition, \(t_{ij}\) denotes the time label of the jth point of the ith condition, \(x(t_{{c_{i} }} )\) and \(x\left( {t_{ij} } \right)\) denote the amplitudes of the centre point and the jth point of the ith condition, and \(\partial (t_{{c_{i} }} )\) and \(\partial \left( {t_{ij} } \right)\) denote the change rates of the centre point and the jth point of the ith condition. Consider the time series \(X = \left( {x\left( {t_{1} } \right),x\left( {t_{2} } \right), \ldots ,x\left( {t_{n} } \right)} \right)\). Then \(\Delta \left( {t_{k} } \right) = x\left( {t_{k} } \right) - x\left( {t_{k - 1} } \right)\), \(k = 2,3, \ldots ,n\), denotes the data increment of X from \(t_{k - 1}\) to \(t_{k}\); \({\text{E}} = \frac{1}{n - 1}\sum\nolimits_{k = 2}^{n} {\left| {\Delta \left( {t_{k} } \right)} \right|}\) denotes the average absolute increment; and \(\partial \left( {t_{k} } \right) = \frac{1}{E}\left| {\Delta \left( {t_{k} } \right)} \right|\) is the normalized increment of X from \(t_{k - 1}\) to \(t_{k}\), which represents the dimensionless rate of change at time \(t_{k}\).
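The composite distance can be coded directly from Eq. (6) and the definitions above; the sketch below handles a one-dimensional series, with equal weights \(\alpha_{i}\) as a placeholder (function names are illustrative).

```python
import numpy as np

def change_rate(x):
    """Dimensionless rate of change: |increment| / mean |increment|
    (the first sample is padded with the first available rate)."""
    inc = np.abs(np.diff(x))
    rate = inc / inc.mean()
    return np.r_[rate[0], rate]           # pad so length matches x

def composite_distance(x, t, c, alpha=(1.0, 1.0, 1.0)):
    """Composite distance D_ij of Eq. (6) from the sample at index t
    to the condition centre at index c, for a 1-D series x.
    alpha are the component weights (equal here, an assumption)."""
    v = change_rate(x)
    return (alpha[0] * abs(t - c)          # time distance
            + alpha[1] * abs(x[t] - x[c])  # amplitude deviation
            + alpha[2] * abs(v[t] - v[c])) # change-rate deviation
```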

Therefore, screening the data with a strong relationship to the working condition centre point can be converted into clustering based on the distance \(D_{ij}\). The objective function of the clustering problem is given by:

$$ {\text{J}} = \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{M} D_{ij}^{2} $$
(7)

The clustering based on distance \(D_{ij}\) can be used to divide the data with strong affiliation into datasets corresponding to different working conditions.

3.5 Clarification of class boundaries based on mixed constraints

The boundary data between adjacent working conditions are the data with a weak relationship to the centre point of a working condition. From the modeling perspective, such data may belong to two adjacent working conditions at the same time; specifically, the closer the data are to the boundary point, the more obvious their multi-condition attribute. However, jitter and jumping of the condition attribute must be avoided, and the standard K-means clustering algorithm cannot overcome this problem. For illustration, a set of typical thermal data, namely the bed temperature data of a fluidized bed boiler, is used. It includes 3000 sampling points with a sampling interval of 5 s. Standard K-means clustering is used to classify the normalized data by value alone. The results are shown in Fig. 2.

Fig. 2 The K-means clustering results

In Fig. 2, different colors represent different categories, and the red line represents the category assignment at each sampling time. Standard K-means clustering considers only the value, so the category line is irregular and jitters frequently; such a classification is of little significance.

In view of this, mixed constraints are designed to achieve clearer working condition boundaries and avoid the jitter problem. Exploiting the category continuity of adjacent points, the pairwise constraints are superimposed on the distance-based objective, giving:

$$ {\text{J}} = \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{M} D_{ij}^{2} + \mathop \sum \limits_{{\begin{array}{*{20}c} {x_{i} ,x_{j} \in M} \\ {s.t.\; l_{i} \ne l_{j} } \\ \end{array} }} {\text{w}}_{ij} + \mathop \sum \limits_{{\begin{array}{*{20}c} {x_{i} ,x_{j} \in C} \\ {s.t.\; l_{i} = l_{j} } \\ \end{array} }} \overline{w}_{ij} $$
(8)

where \(M\) and \(C\) are the given Must-link and Cannot-link sets, respectively, and \({\text{w}}_{ij}\) and \(\overline{w}_{ij}\) are the penalty weights for violating the Must-link and Cannot-link constraint rules, respectively. In semi-supervised clustering with pairwise constraints, the constraint set requiring \(l_{i} = l_{j}\) is called the Must-link set, and the constraint set requiring \(l_{i} \ne l_{j}\) is called the Cannot-link set.

In Eq. 8, \(D_{ij}\) is the composite distance introduced above, composed of the data change characteristics and the time span from the category centre point; it makes the category attribution of the data clearer. The pairwise constraints reduce frequent jumps in the categories of boundary data.

Moreover, the boundary-area data may have multi-category attributes, which can be judged by the change rate of the time series data. For example, when the change rate of the sampling points within a time span is less than a fixed value, the data within that span can be considered approximately steady. If the time span lies exactly within the boundary interval, the data can carry multiple category labels and belong to both adjacent categories.
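A minimal sketch of evaluating the mixed-constraint cost of Eq. (8) for a candidate labeling is given below; here the time constraints are assumed to enter as Must-link pairs between adjacent samples, and scalar penalty weights are used for simplicity.

```python
def mixed_objective(D, labels, ml, cl, w, w_bar):
    """Mixed-constraint clustering cost of Eq. (8): composite-distance
    term plus penalties for violated Must-link / Cannot-link pairs.
    D[i, k] is the composite distance of sample i to centre k;
    w and w_bar are scalar penalty weights (an assumption)."""
    cost = sum(D[i, labels[i]] ** 2 for i in range(len(labels)))
    cost += sum(w for (i, j) in ml if labels[i] != labels[j])
    cost += sum(w_bar for (i, j) in cl if labels[i] == labels[j])
    return cost

# Time constraints as ML pairs between adjacent samples (assumed):
# ml = {(i, i + 1) for i in range(n_samples - 1)}
```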

3.6 Hyperparameter optimization

In the working condition identification process, there are three hyperparameters: the time span (time distance) of data with a strong relationship to the clustering centre point; the time span of multi-category labels in the boundary area; and the time span \(t_{s}\) used to calculate the change in the data increment of the time series. There are three common ways to determine hyperparameters: the manual method based on experience, the machine-assisted method, and the algorithm-based method. In this work, the machine-assisted method is selected; therefore, two elements must be specified: an optimization objective function and an optimization method.

For all partitioned working condition data, sub-models are built by polynomial fitting under the same conditions, and the accumulated error of the multi-model is taken as the objective function of the hyperparameter optimization,

$$ {\text{E}} = \mathop \sum \limits_{i = 1}^{m} \left[ {\frac{1}{2}\mathop \sum \limits_{j = 1}^{n} \left( {\hat{y}_{ij} \left( {x_{ij} ,\omega } \right) - y_{ij} } \right)^{2} } \right] $$
(9)

The conventional grid-search method is selected as the optimization method.
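A minimal grid-search sketch over the three time-span hyperparameters follows; fit_error is an assumed callback that segments the data with the given spans and returns the accumulated error E of Eq. (9), and the grid values are illustrative.

```python
import itertools

def grid_search(fit_error, grids):
    """Plain grid search: evaluate fit_error on every combination of
    hyperparameter values and keep the best one."""
    best, best_err = None, float("inf")
    for params in itertools.product(*grids):
        err = fit_error(params)
        if err < best_err:
            best, best_err = params, err
    return best, best_err

# Hypothetical spans (in samples) for the three hyperparameters
grids = [range(10, 60, 10), range(5, 30, 5), range(2, 12, 2)]
```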

4 Online condition recognizer based on RBFNN

RBFNN is considered one of the most promising neural network algorithms. Compared with other artificial neural networks, RBFNN is popular because of its simple structure, fast learning process, and good approximation ability [27, 28]. Many factors affect the performance of an RBFNN, such as the network structure, the hidden layer activation function, the connections between nodes, and the training method, and many researchers have made useful suggestions in these areas. Mosavi et al. analyzed the hidden layer structure of the RBF neural network and proposed an efficient training method that uses the Stochastic Fractal Search Algorithm (SFSA) to train the RBFNN [29]. By choosing a more reasonable radial basis function, the problem of slow RBFNN classification has been addressed [28, 30, 31].

The connections between nodes determine the behavior of a neural network. There are many connection schemes; the most common is full connection, in which all nodes in a layer are connected to all nodes in the next layer. Other schemes include sparse connection networks [32] and direct connections between input and output nodes [33]. If the radial basis function is chosen reasonably, the hidden layer nodes in full connection mode can filter the input data well.

Improving efficiency while ensuring quality is the main direction of training-method improvement. The growing RBFNN improves training efficiency by increasing the number of RBF units at each learning step [32]. K-fold cross validation over training and validation samples is an effective way to ensure training quality [33].

In summary, the design of an RBFNN needs to determine the number of nodes, the type of hidden layer activation function, the connection mode between nodes, and the weight adjustment method.

4.1 Structure design of RBF neural network

An RBFNN has three layers: the input layer, the hidden layer, and the output layer. The numbers of input and output nodes are easy to determine from the characteristic variables of the model. In this paper, the number of input nodes \(N_{input}\) is determined by the composite distance \(D_{ij}\), including the data value and change rate at the current sampling time and at the adjacent sampling times,

$$ N_{input} = N_{range} \times N_{sample} $$
(10)

where \(N_{range}\) is the range of adjacent data and \(N_{sample}\) is the number of reference information items at each sampling time.

The number of hidden layer nodes \(N_{hidden}\) is determined by the dimension of the input data \(N_{input}\) and the number of conditions \(N_{cond}\); the number of working conditions can be determined by a machine learning method [12],

$$ N_{hidden} = N_{input} \times N_{cond} $$
(11)

Considering the nonlinearity of the thermal process data, the radial basis function can describe the condition information (the parameter c of a radial basis function represents the condition centre, and the closer the input is to the centre, the greater the output value). Therefore, the standard RBF (SRBF), Cauchy RBF (CRBF), inverse multiquadric RBF, and generalized inverse multiquadric functions can all meet the basic requirements of the algorithm.
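The four candidate activations can be written compactly as functions of the distance \(r = \left\| {x - c} \right\|\) to the condition centre; the width parameter \(\sigma\) and exponent \(\beta\) below are illustrative.

```python
import numpy as np

# Candidate hidden-layer activations; r = ||x - c|| is the distance
# to the condition centre c. All peak at the centre and decay with
# distance, as the text requires.
def srbf(r, sigma):   # standard (Gaussian) RBF
    return np.exp(-r**2 / (2 * sigma**2))

def crbf(r, sigma):   # Cauchy RBF
    return 1.0 / (1.0 + r**2 / sigma**2)

def imq(r, sigma):    # inverse multiquadric RBF
    return 1.0 / np.sqrt(r**2 + sigma**2)

def gimq(r, sigma, beta=1.0):  # generalized inverse multiquadric
    return (r**2 + sigma**2) ** (-beta)
```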

4.2 Weight calculation of RBFNN

As explained above, the RBFNN structure comprises three layers, as shown in Fig. 3. The input of the ith hidden neuron is \({\text{s}}_{i}\),

$$ s_{i} = \left[ {x_{1} \omega_{1,i}^{h} ,x_{2} \omega_{2,i}^{h} ,x_{3} \omega_{3,i}^{h} , \ldots x_{n} \omega_{n,i}^{h} , \ldots x_{N} \omega_{N,i}^{h} } \right] $$
(12)

where \(n\) is the index of input; \(i\) is the index of hidden unit; \(x_{n}\) is the nth input and \(\omega_{n,i}^{h}\) is the input weight between nth input and ith hidden unit.

Fig. 3 Structure of RBFNN

The output \({\text{o}}_{j}\) of the jth output neuron is calculated as follows:

$$ {\text{o}}_{j} = \sum\limits_{p = 1}^{P} {{\varphi }_{p} \left( {s_{p} } \right)\omega_{p,j}^{o} + \omega_{0,j}^{o} } $$
(13)

Here \(j\) is the index of output, \(\omega_{p,j}^{o}\) is the output weight between the pth hidden neuron and output neuron \(j\), and \(\omega_{0,j}^{o}\) is the bias weight of the jth output neuron.

The weight adjustment values are obtained by the gradient descent method [34]:

$$ \Delta {\upomega }_{ij} = \eta_{1} \left( {y_{i}^{\left( k \right)} - f_{i} \left( {x^{\left( k \right)} } \right)} \right)\varphi_{j} \left( {x^{\left( k \right)} } \right) $$
(14)
$$ \Delta \mu_{j} = \eta_{2} \varphi_{j} \left( {x^{\left( k \right)} } \right)\frac{{x - \mu_{j} }}{{\sigma_{j}^{2} }}\mathop \sum \limits_{i = 1}^{m} \omega_{ij} \left( {y_{i}^{\left( k \right)} - f_{i} \left( {x^{\left( k \right)} } \right)} \right) $$
(15)
$$ \Delta \sigma_{j} = \eta_{3} \varphi_{j} \left( {x^{\left( k \right)} } \right)\frac{{\left\| {x - \mu_{j} } \right\|^{2} }}{{\sigma_{j}^{3} }}\mathop \sum \limits_{i = 1}^{m} \omega_{ij} \left( {y_{i}^{\left( k \right)} - f_{i} \left( {x^{\left( k \right)} } \right)} \right) $$
(16)

The online condition recognizer realizes the mapping from continuous data to discrete data. Growing the number of hidden layer nodes improves the nonlinear approximation ability of the network but also increases the amount of computation. Increasing the number of adjacent sampling times and the amount of data information per sampling time can improve the recognition rate of the network.
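A minimal sketch tying Eqs. (12)-(16) together, assuming Gaussian hidden units and plain stochastic gradient steps (centres and widths must be supplied as float arrays; all names are illustrative):

```python
import numpy as np

class RBFNet:
    """Minimal Gaussian-RBF network sketch for Eqs. (12)-(16)."""
    def __init__(self, centres, sigmas, n_out, lr=0.01):
        self.mu, self.sigma = centres, sigmas      # (P, N), (P,)
        self.W = np.zeros((n_out, len(centres)))   # output weights
        self.b = np.zeros(n_out)                   # output bias
        self.lr = lr

    def hidden(self, x):
        d2 = ((x - self.mu) ** 2).sum(axis=1)      # squared distances
        return np.exp(-d2 / (2 * self.sigma ** 2)) # phi_j(x)

    def forward(self, x):
        return self.W @ self.hidden(x) + self.b    # Eq. (13)

    def update(self, x, y):
        phi = self.hidden(x)
        e = y - self.forward(x)                    # output error
        self.W += self.lr * np.outer(e, phi)       # Eq. (14)
        self.b += self.lr * e
        # Eqs. (15)-(16): centre and width updates for each unit
        back = self.W.T @ e                        # sum_i w_ij * e_i
        d = x - self.mu
        self.mu += self.lr * (phi * back / self.sigma**2)[:, None] * d
        self.sigma += self.lr * phi * back * (d**2).sum(axis=1) / self.sigma**3
```

In use, a continuous sample is mapped to a discrete condition label by taking the argmax over the output neurons after each forward pass.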

5 Simulation analysis

5.1 Condition division

In this section, the typical thermal dataset used above is employed for simulation. The normalized data are shown as the scatter points in Fig. 2.

5.1.1 Comparison of the single time constraint with mixed constraints

The changing trend of the data indicates several working conditions. Clustering analysis is used to divide the working conditions, and the results of standard K-means clustering are shown in Fig. 2. The category labels of the data clearly show that, in the boundary areas between adjacent working conditions, the data category jumps repeatedly, which contradicts prior knowledge of the condition division. After adding a single time-tag constraint, the classification results shown in Fig. 4 are obtained.

Fig. 4 Semi-supervised clustering results with a single time constraint

In Fig. 4, semi-supervised clustering with a single time constraint shows superior performance compared to the standard K-means clustering in Fig. 2. However, when the time tag is around 400 and 2350, the category-jump phenomenon still appears, as seen in the change of category labels in the enlarged area.

The proposed method is then used to conduct semi-supervised clustering with mixed constraints, and the clustering results are shown in Fig. 5. Compared with the enlarged area in Fig. 4, the obtained category labels show that the repeated jumping of the data category has been weakened. In Fig. 5, near the time tags 400 and 2350, the data classifications of the two categories overlap, which verifies the previous analysis of the thermal process data characteristics.

Fig. 5 Semi-supervised clustering with mixed constraints

5.1.2 Comparison of different number of initial seed sets

In addition to the hyperparameters mentioned in Sect. 3.6, the information of the initial seed sets, namely their location and number, also affects the segmentation results. If the number of initial seed sets is fixed, their locations can be determined by the density-based clustering method. The influence of the number of initial seed sets on the segmentation results is shown in Figs. 6 and 7.

Fig. 6 Semi-supervised clustering with mixed constraints and four initial seed sets

Fig. 7 Semi-supervised clustering with mixed constraints and six initial seed sets

Comparing Figs. 5, 6 and 7, setting a smaller number of initial seed sets makes the segmentation of the original data clearer, but the segmentation effect is evaluated by polynomial model fitting on the segmented data. On this basis, the data of each working condition are taken as the output, four groups of data (bed pressure, primary air flow, secondary air flow, and fuel flow) are selected as the inputs, and polynomial model fitting is performed. The fitting results on the segmented data are shown in Table 2; the segmentation with five initial seed sets is better than that with four or six.

Table 2 Comparison results of different number of initial seeds

5.1.3 Comparison of semi-supervised clustering with mixed constraints against Sliding Windows, Bottom-Up, and Top-Down

The data of each working condition are taken as the output, four groups of data (bed pressure, primary air flow, secondary air flow, and fuel flow) are selected as the inputs, and polynomial model fitting is performed. The comparison between the clustering results obtained using mixed constraints, Sliding Windows, Bottom-Up, and Top-Down is shown in Fig. 8, and the specific comparative data are given in Table 3. The fitting score is computed as

$$ R^{2} = \left( {1 - \frac{u}{v}} \right), $$
$$ u = \mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} , v = \mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} y_{i} } \right)^{2} $$
(17)
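For reference, Eq. (17) is the standard coefficient of determination; the snippet below computes it directly and, as a cross-check, via scikit-learn's r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])   # placeholder targets
y_pred = np.array([1.1, 1.9, 3.2, 3.9])   # placeholder fit
# Eq. (17) computed directly and via scikit-learn; both agree
u = ((y_true - y_pred) ** 2).sum()
v = ((y_true - y_true.mean()) ** 2).sum()
print(1 - u / v, r2_score(y_true, y_pred))
```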
Fig. 8 Polynomial fitting on different segmentation methods

Table 3 Comparison results of different segmentation methods

In Table 3, the fitting score is calculated by formula 17. The different methods give different segmentation results: Sliding Windows, Bottom-Up, and Top-Down achieve good results in the linear fitting of local sub-models, but their results deviate widely across segments, and the overall effect is not as good as that of the method in this paper. The polynomial fitting results in Table 3 show that semi-supervised clustering with mixed constraints can achieve condition identification and that, compared with Sliding Windows, Bottom-Up, and Top-Down, the mixed constraints divide the working conditions better.

5.2 Online condition recognizer

The online condition recognizer realizes the mapping from continuous data to discrete data. By increasing the number of hidden layer nodes, the network can perform perfectly on the training data, but it then has poor generalization ability, as shown in Fig. 9.

Fig. 9 RBFNN performance before optimizing the weight coefficients \(\alpha_{i}\)

Here, the weight coefficients \(\alpha_{i}\) of the composite distance \(D_{ij}\) are equal. Increasing the amount of input data information, such as \(N_{range}\) and \(N_{sample}\), can improve the generalization ability, but the amount of computation increases significantly. Alternatively, using PSO (particle swarm optimization) to optimize the weight coefficients \(\alpha_{i}\) of the composite distance \(D_{ij}\) can also improve the network generalization ability without increasing the amount of calculation, as shown in Fig. 10. The optimization process of the weight coefficients \(\alpha_{i}\) is shown in Fig. 11. Other parameters, such as \(N_{range}\), \(N_{sample}\), and the characteristic parameters of the hidden layer radial basis functions, can be further optimized by PSO.

Fig. 10 RBFNN performance after optimizing the weight coefficients \(\alpha_{i}\)

Fig. 11 Error rate curve

On the test dataset, the error rate is lowest at \(\alpha_{1} = 0.96,\alpha_{2} = 3.60,\alpha_{3} = 1\). This ratio of the weight coefficients shows that increasing the proportion of the numerical component of the input data while weakening the proportion of the sequence information improves the generalization ability of the network; conversely, enhancing the proportion of sequence information improves the ability of condition identification.

6 Conclusion

The multi-model approach has proven very effective in describing complex processes. The overall model precision is affected by the sub-model form and the division of the multiple model sub-windows. Once the time span of a sub-window is established, the modeling process within the sub-window is the same as that of a single model. Therefore, the division of sub-windows greatly affects the overall precision of the multi-model method.

In this paper, using machine learning and the characteristics of thermal process data, semi-supervised clustering with mixed constraints is used to realize condition segmentation and to sharpen the division of the sub-window time spans. An online condition recognizer is designed, and its generalization ability is improved by optimizing the weight coefficients of the input information. The simulation results show that the proposed method is feasible for dividing the sub-windows and that the overall error of the established sub-models is reduced.

One limitation of this study is that the presented method is based on historical data and cannot perform segmentation online; although an online recognizer is designed, the method is still offline in essence, which limits its application. Improving it into an online method is the subject of future research.