Continuous detection of concept drift in industrial cyber-physical systems using closed loop incremental machine learning

The embedded, computational and cloud elements of industrial cyber-physical systems (CPS) generate large volumes of data at high velocity to support the operations and functions of corresponding time-critical and mission-critical physical entities. Given the non-deterministic nature of these entities, the generated data streams are susceptible to dynamic and abrupt changes. Such changes, which are formally defined as concept drifts, lead to a decline in the accuracy and robustness of predicted CPS behaviors. Most existing work in concept drift detection is classifier dependent and requires labeled data. However, CPS data streams are unlabeled, unstructured and change over time. In this paper, we propose an unsupervised machine learning algorithm for continuous concept drift detection in industrial CPS. This algorithm demonstrates three types of unsupervised learning: online, incremental and decremental. Furthermore, it distinguishes between abrupt and reoccurring drifts. We conducted experiments on SEA, a widely cited synthetic dataset for concept drift detection, and on two industrial applications of CPS: task tracking in factory settings and smart energy consumption. The results of these experiments validate the key features of the proposed algorithm and its utility for detecting change in non-deterministic CPS environments.


Introduction
Recent advances in cyber-physical systems (CPS) have necessitated machine learning algorithms in embedded applications to operate in nonstationary, time-variant environments [1]. In CPS, learning in nonstationary environments, commonly known as concept drift learning, focuses on event-driven changes in the environment. The underlying models generated by learning algorithms are influenced by changes in feature information (x) and target variables (y) due to such evolving concepts [2]. Concept drift occurs when this feature information (x) and these target variables (y) change over time. Concept drift can be formalized as a change in the joint probability P(x, y), which is defined as P(x, y) = P(y∕x) × P(x). In a smart factory setting, a large number of Industrial Internet of Things (IIoT) devices and sensors will be collecting data on machine status and factory operations [3]. These data are transmitted to the CPS, which will then use a variety of methods to predict when a machine is malfunctioning or a process is suboptimal. Such anomalous behaviors are detected as concept drifts [3]. Detection of concept drifts in CPS decreases the negative impact of a compounding error and enables cost-effective predictive maintenance. However, data streams in industrial CPS are composed of unlabeled target variables that do not fit into predefined classes [4,5]. It is infeasible and impractical for ensemble learning algorithms that integrate multiple supervised algorithms to detect concept drifts in this environment. To address these challenges, as well as to manage the complex data patterns and distributional assumption violations embedded in CPS data streams from industrial applications, a novel unsupervised machine learning technique is needed.
Current research literature defines two distinct types of concept drift, real and virtual [6]. In real concept drift, the conditional distribution P(y∕x) of the target variable y given the input features x changes while the input distribution P(x) remains unchanged. In virtual concept drift, the input distribution P(x) changes without affecting the conditional distribution P(y∕x). In both types the joint distribution P(x, y) changes [6]. A large body of existing work assumes the immediate availability of labels and thereby focuses on supervised machine learning algorithms for concept drift detection and adaption [6]. This assumption is not valid for CPS data streams that generate virtual concept drifts, where the target label is only available following an unknown or undefined delay. A closed loop framework that has been proposed for real concept drift detection [6] operates even in cases where the target variables are delayed. However, the framework does not support concept drift detection from unlabeled data in evolving data streams.
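The distinction between real and virtual drift can be illustrated with a small discrete example. The following Python sketch is illustrative only: the function names, the two-valued feature ("low"/"high") and the probabilities are hypothetical and are not taken from the paper, and the "both" outcome is an added convenience for the case the two definitions above do not cover.

```python
def joint(p_x, p_y_given_x):
    """Return the joint P(x, y) = P(y|x) * P(x) as a dict keyed by (x, y)."""
    return {(x, y): p_y_given_x[x][y] * p_x[x]
            for x in p_x for y in p_y_given_x[x]}


def _differs(a, b, tol=1e-9):
    """True if two distributions over the same keys differ beyond a tolerance."""
    return any(abs(a[k] - b[k]) > tol for k in a)


def drift_type(old, new):
    """Classify the change between two (P(x), P(y|x)) pairs.

    'real'    : P(y|x) changes, P(x) unchanged
    'virtual' : P(x) changes, P(y|x) unchanged
    'both'    : both change (assumption: not named in the text above)
    'none'    : neither changes
    """
    (old_px, old_cond), (new_px, new_cond) = old, new
    input_changed = _differs(old_px, new_px)
    cond_changed = any(_differs(old_cond[x], new_cond[x]) for x in old_cond)
    if cond_changed and not input_changed:
        return "real"
    if input_changed and not cond_changed:
        return "virtual"
    if input_changed and cond_changed:
        return "both"
    return "none"


# Hypothetical machine-load feature x and machine-state target y.
base = ({"low": 0.5, "high": 0.5},
        {"low": {"ok": 0.9, "fault": 0.1}, "high": {"ok": 0.6, "fault": 0.4}})
# Virtual drift: the machine runs at high load more often, same fault behavior.
virtual = ({"low": 0.2, "high": 0.8}, base[1])
# Real drift: same load profile, but high load now causes more faults.
real = (base[0],
        {"low": {"ok": 0.9, "fault": 0.1}, "high": {"ok": 0.3, "fault": 0.7}})
```

In both drifted cases `joint(...)` differs from the joint of `base`, which matches the statement that the joint distribution changes under either type of drift.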
The key challenges of continuous concept drift detection from CPS data streams are to: (1) learn from large volumes of unlabeled data arriving in a short time span, as data storage is impractical and infeasible; (2) incorporate detected concept drift information into new data; (3) unlearn (or forget) data that correspond to concepts which are no longer relevant; and (4) integrate with the proposed closed loop framework for updating predictive models based on drift detection. The proposed unsupervised machine learning algorithm overcomes these challenges with the following research contributions: (1) a novel unsupervised learning algorithm for continuous detection of and adaption to concept drifts that is also able to distinguish between reoccurring and abrupt concept drifts; (2) an extension of an existing closed loop framework for concept drift detection to include unlabeled data from evolving data streams; and (3) a demonstration of the proposed algorithm and extended framework on the SEA dataset and on two CPS data streams, physical activity monitoring and energy consumption.
The rest of the paper is organized as follows. Section 2 reports related work in CPS and concept drift. Section 3 delineates the algorithm development: an extension to the generic framework for concept drift, followed by an explication of the adaptive learning paradigms (incremental, decremental and online) used in the proposed algorithm. Section 4 presents the proposed unsupervised, adaptive learning algorithm and demonstrates its features using the SEA dataset. Section 5 presents experiments conducted on two industrial datasets demonstrating distinctive features of the algorithm, and Sect. 6 concludes the paper.

Related work
A CPS has been defined as a system that integrates its hardware function with a cyber-representation acting as a virtual representation of the physical part. It interlinks embedded systems, which are real-time and deterministic, with cloud platforms, which are probabilistic and less constrained [7]. Within this definition itself, the importance of unsupervised learning from unlabeled data is established as a key driver of the development of CPS through the integration and fusion of both cloud platforms and embedded systems. The introduction and integration of intelligent technologies have been discussed and advocated to address the challenges of flexibility, robustness, adaptation, and reconfigurability in CPS [7][8][9]. Furthermore, the key technological and operational characteristics required for the active use of cyber-physical systems in future smart factories are reported in [10]. Among these, the criticality of concept drift detection and the use of unsupervised machine learning have also been highlighted.
In concept drift literature, two distinct types of concept drift are defined, real and virtual [6]. A majority of this literature assumes the immediate availability of labels and thereby focuses on supervised machine learning algorithms for concept drift detection and adaption. This assumption is not valid for real-world data streams that generate virtual concept drifts where the target label is only available following an unknown/undefined delay. Adaptive machine learning algorithms have been proposed for such unlabeled data streams, and these can be categorized into active drift detection techniques, ensemble techniques and hybrid techniques. Active drift detection learns from a partially labeled set of sample data [11], such as 'Just-in-time' classifier and 'Intersection of Confidence' which use Cumulative Sum based active drift detection [12,13]. Sliding window mechanisms such as 'Concept Adapting Very Fast Decision Tree' [14] and 'Incremental Online Information Network' [15] algorithms have also been proposed for active drift detection. The 'Early Drift Detection Method' [16] identifies gradual drifts by monitoring the distance between errors of a classifier and comparing the mean to a threshold. On the other hand, ensemble techniques such as multi-classifiers attempt passive drift detection using techniques such as 'Streaming Ensemble Algorithm (SEA)' [17] and 'Adaptive Hoeffding Tree Bagging' [18] where the oldest concept is replaced with the newest concept. The 'Dynamic Integration' [19] and 'Dynamic Weighted Majority (DWM)' [20] replace the least contributing member. Combining sliding windows from active detection and classifier ensembles, hybrid approaches such as 'Random Forests with Entropy' [21] and 'ADWIN' have been proposed. 'Massive Online Analysis' (MOA) implements ADWIN as a hybrid approach [22].
A semi-supervised learning method proposed for virtual concept drift detection is based on adaptable clustering, which analyzes the distribution of clusters and updates cluster centroids according to concept drifts in data streams. More recently, unsupervised learning methods have also been proposed for concept drift detection, such as the Plover algorithm that uses varied measure functions [23], online sequential extreme learning machines [24], and a discriminative classifier with a sliding window [25]. In industrial settings, concept drift detection approaches have been proposed for predictive maintenance [26], sensor networks [27], and smart city applications [28].
In terms of machine learning capabilities in CPS, clustering data streams from high-throughput machining cycle conditions [29], real-time reliability evaluation of CPS [30], an IoT-based wearable system for fetal movement monitoring [31], detecting time synchronization attacks on CPS [32], and behaviour-based attack detection and classification [33] are some of the leading instances of direct value generation from machine learning. In contrast, the number of studies focusing on concept drift detection in CPS is limited. The primary work is on the detection of and adaption to imbalanced industrial data streams using an ensemble of offline classifiers [3]. That work highlights the limitations of condition-based maintenance in addressing or even detecting concept drift, and proposes an ensemble approach to offline classification to address the three stages of condition-based maintenance with concept drifts and imbalanced data. It is also pertinent to note that a primary recommendation for future work in concept drift is the validation of change detection and adaptation under absent, delayed and on-demand labeling of CPS data streams. This context of the technological and operational characteristics required of industrial CPS, together with the limited application of machine learning in the development of such features, motivates the contribution of this paper: we propose a novel machine learning algorithm for continuous detection of and adaption to concept drifts from CPS data streams, and integrate this capability into an established closed loop framework for concept drift detection.

Algorithm development
This section begins by extending the aforementioned closed loop framework [6] to include the proposed unsupervised machine learning algorithm, followed by a subsection on the novel learning features of the proposed algorithm, incremental, decremental and online learning.

Extension to the closed loop framework
The closed loop framework proposed for real concept drift detection updates a predictive model based on drift detection [6]. The framework is composed of four modules: memory, machine learning, loss estimation, and change detection. The data stream is initially received by the memory module and then presented to the machine learning module. The loss estimation module tracks the performance of the machine learning algorithm and sends information to the change detection module to update the model and machine learning algorithm.
In the proposed extension (Fig. 1), the memory module defines what data are presented and how the data flow is managed. The unsupervised learning module (the proposed algorithm) determines how online, incremental and decremental learning are used for detection of and adaption to concept drift. The concept drift detection module defines the measure that is used to detect the various types of concept change that occur in the data stream, and generates alerts for decision-making. The supervised learning module is notified as concept drifts are detected and triggers the loss estimation module to verify the accuracy of the predictive module using late feedback.

Adaptive learning properties
The proposed algorithm is based on three adaptive learning features that are required for concept drift detection from unlabeled data streams: (1) incremental, (2) decremental and (3) online learning. Incremental learning: Incremental learning is necessary for learning from data streams as it effectively addresses both time and memory constraints [34][35][36]. Since incremental learning algorithms learn from continuously incoming data streams, they do not need an initially labeled dataset for training. They assume that the concepts learned before are similar to the concept of new incoming data [37].
Decremental learning: Decremental learning is used to unlearn (to forget) representations of the data stream which are no longer relevant. Learning from data streams should be continuous while preserving the previously known useful knowledge. Natural cognitive systems gradually forget previously learned information [36]. Decremental learning is used to forget old concepts and adapt to new concepts, since a concept learned at one time may not be relevant at another, and retaining it dilutes the new concept with the old one.
Online learning: Data streams generate data at high speed and in large volumes. Online learning is introduced to address this limitation of high frequency and high-velocity data streams that influence the iterative nature of a machine learning algorithm.
The incremental learning features of the proposed algorithm are based on the Incremental Knowledge Acquisition and Self Learning (IKASL) algorithm [38]. The IKASL algorithm is an unsupervised, incremental learning algorithm that continues to learn new data based on generalized layers of past learning outcomes. It has been successfully demonstrated on social media text mining [39] and smart electricity meter data for pattern classification and demand forecasting [40][41][42]. Incremental learning in IKASL is initiated by aggregation of unsupervised machine learning outcomes with the formation of generalization layers. Each generalized node expands into its own feature map to generate a topological representation of subsequent input vectors. The proposed algorithm addresses the main limitation of existing concept drift detection in CPS through the above-mentioned unsupervised adaptive learning features. These features allow the proposed algorithm to detect concept drifts with increased accuracy and efficiency compared to the algorithms currently found in literature.

The proposed algorithm
Based on the IKASL learning approach, this algorithm advances into decremental learning and online learning for continuous detection of and adaption to concept drift from an unlabeled data stream. A variation of this technique was applied to explore the importance of context awareness in estimating road traffic [43], to investigate the impact of driver behavior change on the coordination between self-driven and human-driven vehicles [44], and as the core machine learning function of an expansive intelligent traffic data integration and analysis platform [45]. The proposed algorithm consists of several functions (Fig. 2), each of which is discussed below. Online learning: Online k-means clustering is used for efficient one-pass processing of a stream of data, rather than storing and processing the data in batches [46]. In the first iteration, k and t_a are user-defined for online k-means and the generated cluster feature vectors (CFV_OC) are input to the offline IKASL function. In subsequent iterations, k is the number of cluster feature vectors (#GFV_IKASL), t_x (e.g. t_b - t_a) is the time taken by IKASL for the learning process, and the cluster feature vectors for online k-means are the generalized nodes received from the IKASL function. These automatically determined values of k and t_x implement the nonparametric nature of the algorithm.
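The one-pass behavior of online k-means described above can be sketched as follows. This is a minimal illustration of sequential (MacQueen-style) k-means, not the authors' implementation: seeding the centroids with the first k points, the 1/count learning rate and the function names are assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def online_kmeans(stream, k, initial_centroids=None):
    """Cluster a stream in a single pass without storing the data.

    stream            : iterable of numeric vectors
    k                 : number of clusters
    initial_centroids : optional starting centroids, e.g. generalized nodes
                        handed back from a previous learning phase
                        (assumption: how seeding is done is not specified)
    Returns (centroids, counts).
    """
    centroids = [list(c) for c in (initial_centroids or [])]
    counts = [1] * len(centroids)
    for x in stream:
        if len(centroids) < k:
            # Assumption: seed missing centroids with the first points seen.
            centroids.append(list(x))
            counts.append(1)
            continue
        # Assign the point to its nearest centroid ...
        j = min(range(k), key=lambda i: sq_dist(centroids[i], x))
        # ... and move that centroid towards the point (running mean).
        counts[j] += 1
        eta = 1.0 / counts[j]
        centroids[j] = [c + eta * (xi - c) for c, xi in zip(centroids[j], x)]
    return centroids, counts
```

Each point is read once and then discarded, so memory use depends on k and the feature dimension rather than on the length of the stream. Passing the previous phase's nodes as `initial_centroids` mirrors the feedback described in the text, where k and the starting vectors come from the IKASL function in later iterations.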
Incremental and decremental learning: IKASL learning occurs as per the original algorithm for incremental learning. Inputs are batches of CFV_OC received periodically from the online learning function (Fig. 2). We extended the IKASL function to facilitate decremental learning by forgetting any generalized node that is not the winner for any of the inputs in the data set of the subsequent learning phase. Forgetting a generalized node in this way indicates that the concept has changed or evolved. Associations between nodes in the generalization layers persist, leading to the creation of a memory-like structure based on the aggregated outcomes of the learning stages. Adaptation to a new concept is formalized through the combination of incremental and decremental learning.
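The forgetting rule described above can be sketched in a few lines. The sketch assumes that the "winner" for an input is the generalized node nearest to it in Euclidean distance; the function name and data layout are hypothetical and the IKASL generalization step itself is not reproduced here.

```python
def forget_unused_nodes(generalized_nodes, next_batch):
    """Keep only generalized nodes that win at least one input of the next batch.

    generalized_nodes : list of numeric vectors from the previous learning phase
    next_batch        : list of input vectors of the subsequent learning phase
    Assumption: the winner of an input is its nearest node (Euclidean).
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    winners = set()
    for x in next_batch:
        winners.add(min(range(len(generalized_nodes)),
                        key=lambda i: sq_dist(generalized_nodes[i], x)))
    # Nodes that were never a winner are forgotten (decremental learning).
    return [node for i, node in enumerate(generalized_nodes) if i in winners]
```

For example, with nodes at (0, 0), (5, 5) and (10, 10) and a next batch containing only points near (0, 0) and (10, 10), the node at (5, 5) is dropped.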
Concept drift detection: Concept drift detection is carried out by calculating the distance between the generalized nodes (CFV_IKASL) of consecutive iterations. The algorithm is sufficiently generic for any distance measure to be used, such as Euclidean distance, heterogeneous Euclidean overlap distance, Mahalanobis distance or Hellinger distance [47]. When a concept drift occurs, there is a significant change in distance, followed by a reduced change in distance in the following iteration. Each detected concept drift is further classified by the algorithm as either an abrupt or a reoccurring concept drift.
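A minimal sketch of this detection step is given below. Several details are assumptions rather than the paper's method: the distance between two sets of generalized nodes is taken as the mean nearest-node Euclidean distance, a drift is flagged when that distance exceeds a user-chosen `threshold`, and a drift is labeled reoccurring when the new nodes lie within the threshold of a concept stored earlier.

```python
import math


def set_distance(prev_nodes, curr_nodes):
    """Mean Euclidean distance from each current node to its nearest previous node.

    Assumption: the text above does not specify how node sets are compared.
    """
    return sum(min(math.dist(c, p) for p in prev_nodes)
               for c in curr_nodes) / len(curr_nodes)


def detect_drifts(iterations, threshold):
    """Return a list of (iteration_index, kind) for detected drifts.

    iterations : list of node sets, one per learning iteration
    threshold  : hypothetical distance above which a drift is flagged
    kind       : 'abrupt' for a concept not seen before, 'reoccurring'
                 when the new nodes match a previously stored concept
    """
    events = []
    seen_concepts = [iterations[0]]
    for t in range(1, len(iterations)):
        if set_distance(iterations[t - 1], iterations[t]) <= threshold:
            continue  # small change: same concept as the previous iteration
        reoccurring = any(set_distance(old, iterations[t]) <= threshold
                          for old in seen_concepts)
        events.append((t, "reoccurring" if reoccurring else "abrupt"))
        if not reoccurring:
            seen_concepts.append(iterations[t])
    return events
```

With `iterations = [[(0, 0)], [(0.1, 0)], [(5, 5)], [(5, 5.1)], [(0, 0.1)]]` and `threshold = 1.0`, iteration 2 is flagged as abrupt and iteration 4 as reoccurring, since the nodes return to the region of the first concept. Any of the other distance measures named in the text could replace `math.dist`.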

Demonstration
The SEA dataset [48], a synthetic dataset widely used in supervised concept drift detection, was used to demonstrate features of the proposed algorithm. The SEA concept generator models real, abrupt concept drifts over three independent real-valued attributes in [0, 10]. The data set consists of 60,000 examples in four concepts, with 15,000 examples for each concept and a different threshold value for the concept function in each. Figure 3 illustrates concept drifts detected from the SEA dataset. The x-axis denotes timestamps of incremental learning, and distance measure calculations (in this case Euclidean distance, ED_n) from step 5 of the algorithm are denoted on the y-axis. Abrupt concept drifts were detected at timestamps 2, 9, 16 and 26 with ED_n values of 0.43, 0.39, 0.31 and 0.39 respectively. Results were validated against concept drifts detected in the same dataset by [48,49].
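For readers who want to reproduce a stream of this shape, the following sketch generates SEA-style data. The sizes (three attributes in [0, 10], four concepts of 15,000 examples) follow the description above; the concept function (label 1 when the first two attributes sum to at most a threshold) and the threshold values 8, 9, 7 and 9.5 are assumptions based on how the SEA generator is commonly described and are not stated in this paper. Class noise is omitted.

```python
import random


def sea_stream(n_per_concept=15000, thresholds=(8.0, 9.0, 7.0, 9.5), seed=0):
    """Yield (attributes, label) pairs with an abrupt drift between concepts.

    Assumptions: the label is 1 when x[0] + x[1] <= theta (the third
    attribute is irrelevant), and the default thresholds are the values
    commonly cited for the original SEA generator.
    """
    rng = random.Random(seed)
    for theta in thresholds:
        for _ in range(n_per_concept):
            x = [rng.uniform(0.0, 10.0) for _ in range(3)]
            yield x, int(x[0] + x[1] <= theta)
```

Repeating the tuple of thresholds, for example `thresholds=(8.0, 9.0, 7.0, 9.5) * 3`, gives a stream in which the same four concepts reoccur.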

Research | Discover Artificial Intelligence (2021) 1:7 | https://doi.org/10.1007/s44163-021-00007-z

To demonstrate the importance of real-time concept drift detection, the accuracies of a supervised predictive algorithm with and without concept drift detection were compared (Fig. 4). Without concept drift detection, the algorithm was trained with the first 1000 records, and the trained model was used to test the data in each subsequent batch of 1000 records. In this case, the accuracy of the algorithm reduces as the concepts evolve over time (Fig. 4). With concept drift detection, the algorithm was trained and then retrained at each detected concept drift with the most recent 1000 records. In this case, the accuracy of the algorithm improves because the algorithm is retrained on the evolved concepts (Fig. 4).

Demonstration on modified SEA dataset
An advantage of the SEA dataset generator is that it can be configured to generate data with repetitions of the same four concepts, in order to evaluate the identification of reoccurring concepts. For this demonstration, we generated a SEA dataset with four concepts repeated three times. The proposed unsupervised algorithm was analyzed against the corresponding concept drifts shown by MOA (Fig. 5). A total of twelve (four concepts repeated three times) concept drifts were identified by the proposed algorithm (Fig. 6). As shown in Table 1, the time taken to detect a reoccurring concept drift reduces over time, demonstrating the incremental nature of the learning.

Experiments
This section presents experiments conducted on two industrial applications of CPS data streams, activity monitoring and energy consumption. Both experiments are based on real-world settings, where cyber-physical systems have to address the technical challenges of volume of data, frequency of data generation as well as the variety of data, in terms of recurring patterns, outliers and noise.

Wearable sensors in industry CPS
Activity monitoring aims at providing accurate information on human activities by leveraging wearable devices available in today's sensor-rich industrial data environment. Numerous applications in industrial settings propose the use of activity monitoring. Activity recognition is proposed in proactive instruction systems, where instructions for the next activity are displayed at the end of a tracked activity [50]. Further, task tracking by activity monitoring is used in training car assembly line workers [51]. Another major use case is quality control, which verifies task performance and completion.
In industrial health and safety monitoring systems, activity is monitored for unusual movements such as vibration or acceleration to generate alerts [52]. The PAMAP2 dataset [53] comprises sensor data from three inertial measurement units and a heart-rate monitor. The data are recorded while nine subjects complete different physical activities such as lying, standing, walking, running, cycling and rope jumping. This multivariate, time-series dataset includes 52 attributes and more than 3.8 million data records. With the use of this labeled dataset, we aim to evaluate the detection of concept drifts. Activity data from one subject, processed as a single data stream, are used for the demonstration. Figure 7 illustrates the concept drifts detected from the activity dataset. Each concept drift detected without supervision was mapped to the labeled activity, as shown in Table 2. CD6 and CD8 were identified as reoccurring concept drifts, which was confirmed by the labels 'Ascending stairs → Descending stairs'. CD5, which corresponds to vacuum cleaning, is a gradual concept drift [54], where the drift happens over a period of time. Further experiments on the data showed that the subject's heart rate gradually increased during this period due to the activity. The algorithm proposed in this paper cannot detect gradual concept drifts. This is also noted in Sect. 6 as future work.
Further, the multi-dimensional generalization nodes (explained in Sect. 3) are visualized using Sammon's mapping [55], a nonlinear projection technique that preserves correlations among nodes, to understand the concept drift detection (Fig. 8). Each activity is learnt over several execution iterations and is denoted by several generalization nodes. Generalization nodes mapped to an activity are clustered together, and low-intensity activities and high-intensity activities are separated in the feature space. Hence, the Sammon's mapping results support the accuracy of the concept drift detection and adaptation. In this labeled dataset, the performance of concept drift detection is evaluated with three indicators: precision, recall and F_score. These indicators provide an overview of the abrupt and reoccurring concept drift detection, where precision is the probability that a detected concept drift is a true positive; recall is the probability that a true concept drift is detected; and F_score is a comprehensive indicator, the harmonic mean of precision and recall. The accuracy of abrupt and reoccurring concept drift detection for all nine subjects is shown in Table 3. Accuracy has significantly improved above the baseline performance of 90% stated in CPS literature [3].
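The three indicators described above follow the standard definitions, which can be written as a short helper. The function name and the choice to return 0.0 when a denominator is zero are assumptions for this sketch; the counts in the usage note are made up for illustration and are not results from the paper.

```python
def drift_detection_scores(tp, fp, fn):
    """Return (precision, recall, f_score) for concept drift detection.

    tp : detected drifts that match a labeled drift (true positives)
    fp : detected drifts with no matching labeled drift (false positives)
    fn : labeled drifts that were not detected (false negatives)
    """
    # precision = TP / (TP + FP): share of detections that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # recall = TP / (TP + FN): share of true drifts that were detected
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F_score = harmonic mean of precision and recall
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For instance, a hypothetical run with 9 correct detections, 1 false alarm and 3 missed drifts gives a precision of 0.9 and a recall of 0.75.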

Industrial energy consumption
Smart meters are widely used for recording energy consumption in industrial settings and are frequently linked to CPS data streams for overall monitoring of a smart factory. The dataset used in this experiment contains measurements of electricity consumption at a one-minute sampling rate for four years, between December 2006 and November 2010 [56]. The extended framework was tested with this dataset to identify daily and monthly patterns (Fig. 9). Figure 10 demonstrates the daily patterns recognized through concept drift detection. Figure 10a denotes the concept drifts (reoccurring and abrupt) detected over one week. The section highlighted in Fig. 10a illustrates the concept drifts detected on Sunday, 17th December 2006. The reasons for these concept drifts are demonstrated in Fig. 10b-d and outlined in Table 4. Usage of sub-meter-1 (kitchen appliances) at approximately 10.30 a.m. and 2.30 p.m. has been detected as CD3 and CD4 respectively (Fig. 10b). Usage of sub-meter-2 (laundry room appliances) at approximately 1 a.m. and 10.30 a.m. has been detected as CD1 and CD3 respectively (Fig. 10c). Usage of sub-meter-3 (water heater and air-conditioner) at approximately 5 a.m., and between 10.30 a.m. and 10 p.m., has been detected as CD2 and CD3 respectively (Fig. 10d).

Conclusion
CPS data streams of industrial applications generate large volumes of data at high velocity for real-time monitoring of the corresponding physical entities. The detection of dynamic and abrupt changes (formally defined as concept drifts) in these time-critical and mission-critical systems is a complex challenge. In this paper, we proposed a new unsupervised, incremental machine learning algorithm to detect and adapt to concept drifts and to distinguish between abrupt and reoccurring drifts. We further extended a closed loop concept drift detection framework to incorporate drift detection from unlabeled data streams, such as those of industrial CPS. The proposed algorithm exhibits three learning features: online, incremental and decremental. Experiments were conducted on a benchmark concept drift dataset, the SEA dataset, and on CPS data streams from two practical industrial applications, activity monitoring and energy consumption. Results from all three experiments demonstrate key features of the proposed algorithm in detection of and adaption to concept drift, and in identification of abrupt and reoccurring concept drift. The extension to the concept drift detection framework was also demonstrated using the energy consumption dataset to provide classifier-independent, near real-time analysis of drifts in energy usage. As future work, we intend to improve the algorithm to detect concept drifts of other types, such as gradual and incremental. Furthermore, we intend to develop a methodology based on sequence analysis to determine the causality of concept drift.

Table 4
CD1 [41] Increase in sub_meter_2 around midnight
CD2 [43] Increase in sub_meter_3 around 5 a.m.
CD3 [48] Increase in sub_meter_1, sub_meter_2 and sub_meter_3 around 9.30 a.m.
CD4 [55] Increase in sub_meter_1 around 2.30 p.m.