Nonparametric Bayesian Methods for Robot Anomaly Diagnosis

In this chapter, we introduce two novel anomaly diagnosis methods based on Bayesian nonparametric hidden Markov models, applied once an anomaly is triggered: i) a multi-class classifier built on nonparametric models, and ii) a sparse representation via statistical feature extraction for anomaly diagnosis. Additionally, we describe the detailed procedures for anomaly sample definition, supervised learning dataset collection, and data augmentation of insufficient samples. We evaluate the proposed methods on a multi-step human-robot collaborative kitting task with a Baxter robot, and present the performance and results of each method.


Introduction
Anomaly diagnosis refers to the process of classifying anomalies using supervised learning, on the premise that the robot has already detected their occurrence. Given the uncertainty, diversity, and unpredictability of anomalies, reliable anomaly diagnosis can effectively improve the safety of a robot system and provide a foundation for subsequent anomaly recovery behaviors. Online anomaly monitoring and diagnosis are therefore both central and difficult for the long-term autonomous operation of robots.

Related Work
Robot introspection not only needs to implement movement identification and anomaly monitoring, but also to diagnose the types of anomalies so as to provide sufficient support for subsequent recovery. The samples used for anomaly diagnosis in this chapter carry the same modal information as in anomaly monitoring and are multidimensional time series. The classification of multidimensional time series is a supervised learning problem [15][16][17], which aims to determine the labels of multidimensional time series of the same structure and length. Orsenigo et al. [18] proposed a discrete support vector machine (DSVM) classification method, which benefits from the concepts of warping distance and softened variable margin to classify multidimensional time series. Lee et al. [19] proposed a time series classification method based on k-nearest neighbors (KNN) [20], which was successfully used for traffic prediction in the mobile telecommunications industry. Seto et al. [21] combined dynamic time warping (DTW) [8,22] and KNN to derive a multivariate time series classification method based on DTW template selection, and applied it to human activity recognition. In addition to these distance-based methods, approaches that build feature vectors through dimensionality reduction have also received attention [23][24][25]. Nanopoulos et al. [26] proposed building feature vectors by statistical feature extraction and then training a multi-layer perceptron (MLP) on the feature vectors and target categories to classify time series. Utomo et al. [27] proposed classifying multidimensional intensive-care-unit data with a multidimensional compression and description method (MultiCoRe), which extracts both time-domain and frequency-domain features for classification. Jaakkola et al. [28] introduced a method combining HMMs and SVMs for classifying protein domains, which can also be applied to other fields of biological sequence analysis. Raman et al. [29] proposed a multi-level HDP-HMM modeling method for classifying human movements: an HDP-HMM is trained from all motion samples, a multi-target classifier is built from this model, and classification is achieved by comparing the log-likelihoods of test samples under each class. Di Lello et al. [30] used an sHDP-HMM to classify multimodal robot anomalies; however, that model assumes the observations are mutually independent, which weakens the correlation between anomalous data to some extent. Although training neural networks with small samples easily causes overfitting and makes online monitoring difficult, deep learning offers strong modeling capability with little data preprocessing, so deep-learning-based multivariate time series classification is currently being studied extensively [31][32][33][34].
As noted above, multivariate time series classification has received significant interest in areas such as healthcare, object recognition, and human action recognition [35][36][37]. Most solutions to this classification problem rely on supervised learning, that is, a multi-target classifier is learned from samples labeled manually in advance. Accordingly, the anomaly diagnosis problem considered in this chapter is mainly the classification of conventional anomaly samples with multimodal observations of robotic systems. In robotics, complex semi-autonomous systems that provide extensive assistance to humans experience anomalies occasionally [39]. For this reason, accurate, robust, and fast anomaly monitoring and diagnosis have the potential to enable more effective, safer, and more autonomous systems [38,49]. Execution monitoring, especially with a focus on detecting and classifying anomalous executions, has been well studied in robotics [40]. Park et al. [39] introduced a multimodal execution monitoring system to detect and classify anomalous executions for robot-assisted feeding. Specifically, their classifier labeled not only the anomaly type but also the cause of the anomaly using an artificial neural network, which successfully integrated anomaly monitoring and diagnosis for assisting a person with severe quadriplegia. However, due to limitations of the HMM anomaly detector, the classification accuracies for anomaly types and causes were 90.0% and 54.0%, respectively. In [41], Bjäreland et al. introduced a monitoring system that can also detect, classify, and correct anomalous behaviors using a predictive model. These two integrated systems give us considerable inspiration and confidence for improving robot introspection with anomaly monitoring and diagnosis.
Anomaly diagnosis has been used to determine the source of anomalies while running manipulators or mobile robots [42]. Several common time series classification algorithms use distance-based metrics; for example, the k-nearest neighbors (KNN) approach has proven successful in classifying multivariate time series [43]. Plenty of research indicates dynamic time warping (DTW) is the best distance measure to use with KNN [44]. Besides distance-based metrics, feature-based algorithms have been used over the years [45]; these rely heavily on features extracted from the time series data or on modeling the time series with parametric methods. The Hidden State Conditional Random Field (HCRF) and the Hidden Unit Logistic Model (HULM) are two successful feature-based algorithms that have led to state-of-the-art results on various benchmark datasets [46]. However, HCRF is computationally demanding: it detects latent structure in the input time series using a chain of k-nomial latent variables, and its number of parameters grows linearly with the number of latent states required. To overcome this, HULM uses H binary stochastic hidden units to model 2^H latent structures of the data with only O(H) parameters. Another approach to multivariate time series classification applies dimensionality reduction techniques or concatenates all dimensions of a multivariate time series into a univariate time series [47].
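The DTW-plus-KNN combination discussed above can be sketched in a few lines. This is a minimal illustrative implementation (classic quadratic-time DTW on 1-D sequences with a majority-vote nearest-neighbor rule), not the specific template-selection variant of [21]:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance
    between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_classify(query, train_seqs, train_labels, k=1):
    """Label a query sequence by majority vote of its k DTW-nearest
    training sequences."""
    dists = [dtw_distance(query, s) for s in train_seqs]
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

Because DTW warps the time axis, two sequences with the same shape but shifted peaks still receive a small distance, which is exactly why it pairs well with KNN on variable-timing sensor data.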

Problem Statement
Another main task of robot introspection is to accurately evaluate the latent patterns of real-time multimodal sensing data. This is not limited to movement identification and anomaly monitoring, but also includes the ability to classify conventional anomalies. Anomaly diagnosis mainly refers to identifying learned anomaly types through supervised learning after the robot detects that an anomaly has occurred. An intelligent robot system includes, but is not limited to, the following anomaly types: system anomalies, internal sensor anomalies, motion instruction failures, noise or damage in sensors and actuators, robot-human-environment interactions, and external disturbances in the environment. Park et al. [50] diagnosed the types of robot anomalies and their causes, and integrated this function into an interactive robot system for feeding disabled people. The anomaly types considered in this section mainly arise from system anomalies (kinematic anomalies, sensor failures, etc.), improper human interference, or the robot's end-effector in a human-robot collaboration environment. To improve the compactness and portability of the proposed sensing system, this section again adopts nonparametric Bayesian hidden Markov models to implement multi-class anomaly diagnosis.

Collection and Augmentation of Anomaly Samples
The collection of anomalous samples in this book takes into account both a precursor period and a duration after the anomaly occurs, determined by a given anomaly window size win_len (for ease of adjustment, the precursor period and the subsequent duration are usually equal, each being half of win_len). As shown in Fig. 5.1, a sample with anomaly type "tool collision" is extracted (light red background area) around the trigger time (black vertical dashed line) reported by the anomaly monitor.
In addition, due to the great randomness and uncertainty of anomaly occurrence, the number of samples is extremely unbalanced across anomaly types, and for some anomalies (e.g., collisions between the robot and the environment) deliberately collecting samples may even damage the robot body, so a sufficient sample count cannot be guaranteed. We therefore augment sparse anomaly classes with a K-means-based method. In the algorithm, K represents the number of categories and "means" represents the cluster mean. As the name suggests, K-means clusters data points (one-dimensional or multidimensional, including the time series considered here) using a preset value of K and an initial centroid for each category: points are grouped with their nearest centroid under a chosen distance function (Euclidean distance, Manhattan distance, or the DTW time-series distance), and the iteration seeks the assignment with the smallest objective function value:

J = Σ_{k=1}^{K} Σ_{n∈C_k} ||x_n − c_k||

where J is the objective function of the K-means algorithm; K is the number of clusters; N is the total number of data points; || · || is the distance function between a data point and a centroid; and c_k is the centroid of category C_k. Since K-means groups data points close to a centroid, and points in the same category have similar statistical characteristics, the goal of data augmentation — generating new data with characteristics similar to the sparse dataset — can be met. To this end, the proposed algorithm, after initializing K random centroids, loops through the following two steps: (1) Divide: assign each data point to the nearest centroid according to the DTW distance measure; (2) Update: move each centroid to the center of its assigned data points according to the selected mean measure.
Each updated centroid is then used to generate a synthetic sample; the algorithm's pseudocode is shown in Fig. 5.3. An example of one-dimensional data augmentation for an anomaly type is shown in Fig. 5.4, from which we can see that the proposed algorithm effectively captures the statistical characteristics (such as peak and mean) of the original data.
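The divide/update loop and centroid-based synthesis can be sketched as follows. This is a simplified sketch, not the chapter's exact algorithm: Euclidean distance stands in for DTW (valid when the sequences are equal-length), and the Gaussian perturbation of centroids is an assumed noise model for generating synthetic samples:

```python
import numpy as np

def kmeans_augment(samples, k=2, n_iters=20, noise_scale=0.05, seed=0):
    """Augment a sparse anomaly class: cluster equal-length sequences
    with K-means, then synthesize one new sequence per centroid by
    perturbing it with small Gaussian noise."""
    rng = np.random.default_rng(seed)
    X = np.asarray(samples, dtype=float)          # shape (n_samples, length)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # (1) Divide: assign each sequence to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # (2) Update: move each centroid to the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Synthesize: centroid plus noise scaled by the data's spread
    synth = centroids + noise_scale * X.std() * rng.standard_normal(centroids.shape)
    return synth
```

Because each synthetic sequence is built around a cluster mean, it inherits the peak and mean statistics of the real samples in that cluster, which is the property Fig. 5.4 illustrates.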

Anomaly Diagnosis Based on Nonparametric Bayesian Models
Anomaly diagnosis is triggered once an anomaly is detected. A system may face a wide variety of anomaly types, including low-level hardware anomalies (sensor and actuator noise or breakage), mid-level software contingencies (logic errors or run-time exceptions), and high-level misrepresentations (poor modeling of the robot, the world, their interactions, or external disturbances in the environment). In this work, we deal with anomalies caused by external disturbances generated either by intrusive human behavior or by poor modeling or anticipatory ability on the robot's end.

Multiclass Classifier Modeling
We consider the problem of multiclass classification based on multivariate time series, that is, finding a function f(y^n) = c^n, where y^n = (y^n_1, ..., y^n_T) is an observation sequence and c^n ∈ {1, ..., C} its corresponding anomaly class. An observation y^n_t ∈ R^d consists of the same multimodal features as described in Sect. 4.6.1 at time-step t. Our objective is diagnosis: given a new test observation sequence ŷ, we must predict its corresponding anomaly class ĉ. Here, ŷ is recorded by considering a window_size around the moment the anomaly occurred, and ĉ is labeled manually during the training procedure. We represent each anomaly class by a separate sHDP-VAR-HMM as outlined in Sect. 4.6.1, where Θ_c are the parameters for class c. It is then straightforward to estimate the posterior density of the parameters p(Θ_c | Y, C). That is, each sHDP-VAR-HMM is trained separately, and a conditional density p(y|c) is learned for each class, as shown in Fig. 5.5.
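The per-class generative scheme reduces to a simple decision rule: score the test sequence under every class model and return the arg-max. The sketch below illustrates that rule only; a plain first-order vector autoregression fit by least squares stands in for the chapter's sHDP-VAR-HMM, whose training is considerably more involved:

```python
import numpy as np

def fit_var1(seqs):
    """Fit y_t = W^T [y_{t-1}; 1] + eps by least squares over all
    training sequences of one anomaly class (a simplified stand-in
    for the per-class sHDP-VAR-HMM)."""
    X = np.vstack([s[:-1] for s in seqs])
    Y = np.vstack([s[1:] for s in seqs])
    X1 = np.hstack([X, np.ones((len(X), 1))])       # append bias term
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    var = (Y - X1 @ W).var(axis=0) + 1e-6           # residual variance
    return W, var

def loglik(seq, model):
    """Gaussian log-likelihood of a test sequence under one class model."""
    W, var = model
    X1 = np.hstack([seq[:-1], np.ones((len(seq) - 1, 1))])
    resid = seq[1:] - X1 @ W
    return -0.5 * np.sum(resid ** 2 / var + np.log(2 * np.pi * var))

def diagnose(seq, class_models):
    """Pick the anomaly class whose model assigns the highest likelihood."""
    return max(class_models, key=lambda c: loglik(seq, class_models[c]))
```

The same arg-max rule is what the chapter applies with the trained sHDP-VAR-HMM densities p(y|c).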

• Anomaly Dataset in Kitting Experiment
The dataset captures sensory-motor signatures of the kitting task under the anomalous scenarios outlined in this chapter. A total of 136 anomalous events were recorded in 142 experimental executions across all skills. The proportions for each anomaly class were as follows: TC 15.7%, HC 16.7%, HCO 16.7%, NO 13.0%, OS 16.7%, WC 15.7%, and OTHER 5.6%. To intuitively explain and analyze the anomalies and to propose corresponding diagnosis methods, all anomalies identified by our proposed method are taken into further consideration. That is, we extract all anomalous data points from each abnormal movement and concatenate them with labels, where each anomalous data point is restricted to the same features as considered in anomaly identification. The identified anomalies are visualized via t-distributed Stochastic Neighbor Embedding (t-SNE) [51] and labeled manually, as shown in Fig. 5.6.

• Anomaly Diagnosis Window Considerations and Online Recording
Note that when an anomaly is flagged, we consider a window of duration ±window_size for the subsequent anomaly diagnosis. In cases where an anomaly is detected towards the beginning of a skill execution and the data available before the detection spans less than window_size, we do not extract data for diagnosis, since we deem a minimal presence of signals both before and after the detection crucial for diagnosis. The sensory-motor data collected at this stage allows us to build the basic anomaly models described in Chap. 4. Our system records sensory data online: upon detection of an anomaly, the time-step at which the flag occurred is recorded, and we then record the multimodal signatures (F/T, Cartesian velocity, and tactile signals) before and after the anomaly (also referred to as the sample) according to window_size. Signals are resampled at 10 Hz to achieve temporal synchronization across all modalities and then further processed as described in the preprocessing step.
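The windowing rule above can be sketched in a few lines. This is a minimal sketch under the assumption that the resampled signal is already available as a (T, d) array; symmetrically rejecting flags too close to the end of the recording is our addition for robustness, while the chapter only states the rule for insufficient pre-flag data:

```python
import numpy as np

def extract_anomaly_window(signal, flag_idx, half_window):
    """Cut a symmetric window of 'half_window' steps before and after
    the anomaly flag from a (T, d) signal resampled at 10 Hz.
    Returns None when not enough surrounding data exists, mirroring
    the rule of skipping detections too early in a skill."""
    if flag_idx < half_window or flag_idx + half_window > len(signal):
        return None  # insufficient context around the flag
    return signal[flag_idx - half_window: flag_idx + half_window]
```

At 10 Hz, half_window = 20 corresponds to ±2 s of multimodal context around the flag.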

• Parameter Settings and Results
To avoid overfitting, we performed 3-fold cross-validation for each anomaly type separately. An 18-dimensional feature vector was used, as presented in Eq. 4.53. We compare against five baselines consisting of parametric HMMs (with a fixed number of hidden states), various observation models, and various variational inference methods. The training and testing datasets were the same for all diagnosis methods in this comparison.
(1) Parametric HMM settings: We train three types of HMMs for each anomaly class, each with four different numbers of hidden states K ∈ {3, 5, 7, 10}. We need to estimate the transition matrix, mean, and covariance parameters; to this end, K-Means++ [52] clusters the data and yields initial estimates. During testing, we evaluate a test sample against all classes and select the class with the largest log-likelihood. The parametric HMM results are summarized in Table 5.1.
• HMM-Gauss-EM: A classifier based on a classical HMM with Gaussian observations was trained independently for each anomaly class. The standard Baum-Welch Expectation Maximization algorithm [53] was used to learn the HMM model.
• HMM-Gauss-VB: A classifier based on classical HMM with Gaussian observation was trained independently for each anomaly class. The standard Variational Bayesian (VB) inference algorithm [54] was used to learn the HMM model.
• HMM-AR-VB: A classifier based on a classical HMM with first-order autoregressive Gaussian observations was trained independently for each anomaly class. The standard VB inference algorithm [54] was used to learn the HMM model.
In conclusion, we find that for parametric HMMs: (i) the best diagnosis accuracy was 95.7%, obtained with 5-7 fixed states; (ii) variational inference algorithms outperformed the standard EM algorithm; (iii) the autoregressive observation model classified better than the Gaussian model due to its linear conditional nature; and (iv) parametric HMMs are, in general, less effective at jointly modeling the dynamics of the robotic task. Therefore, we consider a Bayesian nonparametric version of HMMs with a hierarchical prior, which shares statistical strength across the training samples.
(2) Nonparametric HMM settings: We also train four kinds of classifiers based on nonparametric HMMs, independently for each anomaly class. We specify the truncation number of states as K = 10, as explained in Sect. 4.6.1. In contrast to the parametric HMMs, the actual number of hidden states of each anomaly class is learned automatically from the data in a nonparametric fashion. Testing follows the same procedure as described for the parametric HMMs. These classifiers benefit from the automatic state inference of the HDP-HMM, the autoregressive correlation of the anomaly data, and effective variational inference techniques. The diagnosis results of the nonparametric HMMs are summarized in Table 5.1, where numbers in blue denote the optimal diagnosis accuracy for each anomaly type across all methods.
• HDPHMM-Gauss-VB: A classifier based on HDPHMM with Gaussian observation was trained independently for each anomaly class. The standard VB inference algorithm [55] was used to learn the HDPHMM model. Similar to the method proposed in [57], we learn the model by variational Bayesian inference instead of the blocked Gibbs sampling algorithm.
• HDPHMM-Gauss-moVB: A classifier based on HDPHMM with Gaussian observation was trained independently for each anomaly class. A memoized online variational inference algorithm (moVB) [56] based on scalable adaptation of state complexity is used to learn this HDPHMM model.
• HDPHMM-AR-VB: A classifier based on HDPHMM with first-order autoregressive Gaussian observation was trained independently for each anomaly class. The standard VB inference algorithm [55] was used to learn the HDPHMM model.
• HDPHMM-AR-moVB: Finally, our results were evaluated on the HDPHMM with first-order autoregressive Gaussian observation for each anomaly class. A memoized online variational inference algorithm (moVB) [56] based on scalable adaptation of state complexity was used to learn this HDPHMM model.
Given that the nonparametric sHDP-VAR-HMM learns the complexity of the model from the data, it produces more accurate models, as reflected by the higher diagnosis accuracies shown in Table 5.2. The learned number of states for the different anomaly types is shown in Table 5.3. Note that for equivalent parametric HMMs, tedious model selection needs to be performed for each class to optimize the state cardinality between types. In the confusion matrix, the diagonal elements represent the number of points for which the predicted label equals the true label, while off-diagonal elements are those mislabeled by the classifier. It is evident from the confusion matrix that the proposed diagnosis method outperforms the baseline methods.

Discussion
The multiclass classifier, which is invoked when an anomaly is detected, was also tested through the sHDP-VAR-HMM on seven anomalies against six baseline methods. Our evaluations showed that we could not only detect anomalies reliably (overall accuracy of 91.0%, as presented in Chap. 4) but also classify them precisely (overall accuracy of 97.1%). With regard to anomaly diagnosis, anomalies usually arise from various sources such as sensing errors, control failures, or environmental changes; if a robot identifies the type, it can better prevent, or at least recover from, the anomalous situation. In our proposed diagnosis method, we trained an sHDP-VAR-HMM model for each anomaly type separately. Raman [29] instead proposed a single discriminative HDP-HMM for all classes, albeit with an extra class-specific level. Thus, an interesting direction for future work is to investigate a multi-level sHDP-VAR-HMM shared across all classes for multiclass classification, and to extend the observation model using higher-order autoregressive Gaussian models.
The development of robot introspection is expected to have a direct impact on a large variety of practical applications. For example, it can be used to estimate the likelihood of failure and to prevent failures during robot manipulation tasks. It can also improve safety in human-robot collaborative environments by assessing the quality of the learned internal model of each skill, which can speed up anomaly recovery and/or repair by providing detailed skill identification and anomaly monitoring.
There are a number of limitations in our work. Currently, we do not explicitly reduce the influence of outliers occasionally found in the derived log-likelihood values for specific hidden states; such outliers have a significant impact on the threshold calculation, and our approach needs to address them specifically to avoid their impact. Additionally, our kitting experiment was not conducted under real factory conditions or in a real household task, so the applicability of the work in real-world settings remains unclear and further testing under real conditions is necessary. The kitting experiment provides a proof of concept, and the authors would like to extend this work to actual scenarios through corporate partners.

Anomaly Classifier Based on Feature Extraction
The procedure of the sparse representation system for multimodal time-series is shown in Fig. 5.7.

Anomaly Sample Collection
• Sensory Preprocessing
The original multimodal sensory data includes 6-DoF force and torque signals from an F/T sensor, a 6-DoF Cartesian velocity signal from the manipulator's end-effector, and 56-DoF tactile signals from left and right tactile sensor panels. To produce more consistent data, we do not directly concatenate the individual data sources; instead, temporal synchronization is performed across modalities at 10 Hz, and different preprocessing techniques are applied to specific modalities.
Wrench modality: we take the force and torque time-series vectors and, for each element, the magnitude of each dimension (f_x, f_y, f_z, t_x, t_y, t_z). Empirically, we wish the anomalies HC and TC to effectively flag external perturbations from different directions, so we also consider the norms of force, n_f, and torque, n_t:

n_f = sqrt(f_x^2 + f_y^2 + f_z^2),  n_t = sqrt(t_x^2 + t_y^2 + t_z^2).

Velocity modality: we measure the linear (l_x, l_y, l_z) and angular (a_x, a_y, a_z) Cartesian velocities (the endpoint state of the Baxter right hand), reported with respect to the base frame of the robot. As with the wrench modality, we also consider the norms of the linear velocity, n_l, and angular velocity, n_a.
Taxel modality: due to the high dimensionality of the tactile sensors, we do not feed all the original signals to the model. After empirical testing, the standard deviations s_l, s_r of each tactile panel were selected as the preferred features, defined as:

s_l = sqrt((1/28) Σ_{i=1}^{28} (l_i − μ_l)^2),  s_r = sqrt((1/28) Σ_{i=1}^{28} (r_i − μ_r)^2),

where μ_l = (1/28) Σ_{i=1}^{28} l_i and μ_r = (1/28) Σ_{i=1}^{28} r_i are the means of the left and right tactile panels, respectively.
The above three preprocessing operations capture unexpected changes in the original signals, since the signals deviate only slightly from the robot's normal execution variability. We then concatenate all the features to represent robot executions in both nominal and anomalous cases. Note that raw concatenated features can easily result in high false positive rates (FPR) during execution, due to significant variability across runs of the same task in our experiments; for instance, the F/T signals vary across objects of different weights, and human collisions occur with different directions and magnitudes.
A standardization method is therefore used to scale an original signal ξ_o by its mean μ_{ξ_o} and standard deviation σ_{ξ_o}:

ξ = (ξ_o − μ_{ξ_o}) / σ_{ξ_o}.

Finally, our eighteen-dimensional multimodal signal (wrench, velocity, and tactile modalities) is represented as:

[ξ(f_x), ξ(f_y), ξ(f_z), ξ(t_x), ξ(t_y), ξ(t_z), ξ(n_f), ξ(n_t), ξ(l_x), ξ(l_y), ξ(l_z), ξ(a_x), ξ(a_y), ξ(a_z), ξ(n_l), ξ(n_a), ξ(s_l), ξ(s_r)]. (5.7)
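The per-time-step assembly of this 18-D vector can be sketched as follows. This is an illustrative sketch: the function names are ours, and the standardization statistics are passed in as placeholders (in the chapter they would be estimated from nominal executions):

```python
import numpy as np

def preprocess(wrench, velocity, left_taxels, right_taxels):
    """Build the 18-D raw feature vector of Eq. 5.7 for one time step:
    6 wrench channels + force/torque norms, 6 Cartesian velocity
    channels + linear/angular norms, and per-panel taxel standard
    deviations."""
    f, t = wrench[:3], wrench[3:]
    l, a = velocity[:3], velocity[3:]
    return np.concatenate([
        wrench,                                        # f_x..f_z, t_x..t_z
        [np.linalg.norm(f), np.linalg.norm(t)],        # n_f, n_t
        velocity,                                      # l_x..l_z, a_x..a_z
        [np.linalg.norm(l), np.linalg.norm(a)],        # n_l, n_a
        [np.std(left_taxels), np.std(right_taxels)],   # s_l, s_r
    ])

def standardize(x, mu, sigma):
    """xi = (xi_o - mu) / sigma, applied element-wise."""
    return (x - mu) / sigma
```

Standardizing each channel by nominal-execution statistics is what keeps heavy objects or strong collisions from dominating the concatenated vector.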

Anomaly Features Extraction
In order to keep temporal consistency, anomaly signals are considered within a given window_size fixed around events flagged as anomalous. For instance, if a tool collision is detected as shown in Fig. 5.8, then window_size determines how reactive the diagnosis can be in our system. Generally, window_size is set to a power of two (the preferred size when applying the Fast Fourier Transform (FFT) between the time and frequency domains). As described above, the sparse representation is applied to each extracted sample window. Following our previous work on anomaly diagnosis [20], features are extracted in both the time domain and the frequency domain. For anomaly diagnosis, we empirically consider independent features and correlative features along each modality signal in Eq. 5.7. Here, the original multimodal signals with 18 DoFs and 12 feature types are considered in both the time and frequency domains, yielding a final feature vector of length 558.

A: Independent features
Here, we calculate features along a specific modality signal ξ_{*} = (x_1, x_2, ..., x_n) with n data points, including:
3. mean_diff: the mean over the differences between subsequent values;
4. mean_abs_diff: the mean over the absolute differences between subsequent values;
5. partial_autocorrelation: the value of the partial autocorrelation function at a given lag k ∈ {1, 2, 3, 4, 5}, denoted α(k), which is the autocorrelation between x_t and x_{t+k} with the linear dependence of x_t on x_{t+1} through x_{t+k−1} removed. The underlying autoregressive view writes x_t = φ_0 + Σ_i φ_i x_{t−i} + ε_t, where ε_t is drawn from Gaussian white noise with zero mean and unit variance.
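A few of these per-dimension features can be computed directly. The sketch below implements mean_diff and mean_abs_diff as described above, and adds a simple FFT-based spectral energy as an illustrative stand-in for the chapter's frequency-domain feature types (spectral_energy is our assumption, not one of the listed features):

```python
import numpy as np

def mean_diff(x):
    """Mean of differences between subsequent values."""
    return float(np.mean(np.diff(x)))

def mean_abs_diff(x):
    """Mean of absolute differences between subsequent values."""
    return float(np.mean(np.abs(np.diff(x))))

def spectral_energy(x):
    """Illustrative frequency-domain feature: total energy of the
    real-FFT magnitudes of the window."""
    return float(np.sum(np.abs(np.fft.rfft(x)) ** 2))

def extract_features(window):
    """Apply each feature to every column of a (T, d) window and
    concatenate, mirroring the per-dimension extraction above."""
    feats = []
    for col in np.asarray(window, dtype=float).T:
        feats += [mean_diff(col), mean_abs_diff(col), spectral_energy(col)]
    return np.array(feats)
```

With 18 signal dimensions and 12 feature types per dimension (plus correlative features), this per-column pattern is how the chapter's length-558 feature vector is assembled.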

• Experimental Setup
A kitting experiment consists of 5 basic nodes: Home, Pre-pick, Pick, Pre-place, and Place. The experiment traverses these nodes through 6 skills implemented with ROS-SMACH (see Fig. 4.11). The primary goal of the kitting task is to transport 6 different objects to a fixed container. The right arm of the Baxter humanoid robot, used to pick the objects, is equipped with a 6-DoF Robotiq FT sensor and two Baxter-standard electric pinching fingers. Each finger is further equipped with a multimodal tactile sensor with a four-by-seven taxel matrix that yields absolute pressure values. The left-hand camera is placed flexibly in a region that can capture the objects at a resolution of 1280 × 800 at 1 fps (chosen to optimize pose accuracy while lowering computational complexity); its use facilitated calibration and object tracking accuracy. All code was run under ROS Indigo on Linux Ubuntu 14.04 on a mobile workstation with an 8-core Intel Xeon processor and 16 GB RAM.
When a robot works collaboratively with a human in a shared workspace, many external disturbances are likely to occur. The anomalies in Fig. 4.12 are considered in the kitting experiment, comprising the following 7 types. Tool Collision (TC) may derive from visual error, or from the user accidentally colliding with the object while the robot moves to grasp it (see Fig. 4.12b). Human Collision (HC) usually happens when a user unintentionally collides with the robot arm in the human-robot collaboration environment (see Fig. 4.12a). We treat human collision differently depending on whether the robot is carrying an object; thus, Human Collision with Object (HCO) refers to a human collision while the robot carries an object from node Pre-pick to Pre-place (see Fig. 4.12d). An object knocked down by the robot during grasping may induce No Object (NO), i.e., a missed grasp (see Fig. 4.12f). Another common anomaly is Object Slip (OS), in which the picked object slips from the robot's gripper if the grasping pose is not optimal or the robot moves at high speed. Finally, False Positive (FP) is labeled when unexpected disturbances are flagged by the anomaly detector for a variety of other reasons, for instance a system error, or an object placed in an unreachable zone without a feasible inverse kinematics solution. In the rest of this chapter, we aim to achieve a sparse representation of these anomaly types while preserving sufficient diagnosis accuracy.

• Results and Analysis
The dataset contains a total of 108 samples from 137 experimental recordings of the kitting task, with proportions per anomaly of TC 15.7%, HC 16.7%, HCO 16.7%, NO 13.0%, OS 16.7%, WC 15.7%, and FP 5.6%. To evaluate the performance of the proposed sparse representation of multivariate time series, we considered the following 9 representative classifiers, with parameter settings described in Table 5.4. As described above, feature selection is a process in which the features most significant for predicting the outcome are selected automatically; irrelevant features decrease the model's accuracy and increase computational cost. We therefore calculate the p_value of each extracted feature using the FRESH hypothesis testing method. That is, we perform a singular statistical test checking, for each extracted feature x_i ∈ {f_1, f_2, ..., f_n}, the hypotheses

H_0^i = {x_i is irrelevant for predicting y},  H_1^i = {x_i is relevant for predicting y}. (5.20)

The result of the hypothesis test in Eq. 5.20 is a p_value, which assesses the probability that feature x_i is not relevant for predicting class y. We then define the score of a feature as the negative logarithm of its p_value; large scores −log(p_value) indicate features that are relevant for predicting the target. The performance of the different classifiers is shown in Fig. 5.9: reading left to right, accuracy increases as features are added, up to a point below which there are too few features for the classifier to draw reliable conclusions. Specifically, features are taken from the feature vector sorted in descending order of score.
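The score-and-select step can be sketched as follows. This is a simplified stand-in for FRESH-style selection: a two-sample z-test with a normal approximation replaces the exact per-feature-type tests, and it assumes binary labels:

```python
import math
import numpy as np

def feature_scores(X, y):
    """Score each column of X for separating binary labels y with a
    two-sample z-test, returning -log(p) per feature."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    scores = []
    for col in X.T:
        a, b = col[y == 0], col[y == 1]
        se = math.sqrt(a.var() / len(a) + b.var() / len(b)) + 1e-12
        z = abs(a.mean() - b.mean()) / se
        p = math.erfc(z / math.sqrt(2.0))       # two-sided normal p-value
        scores.append(-math.log(max(p, 1e-300)))  # guard against p == 0
    return np.array(scores)

def select_top_k(X, y, k):
    """Keep the indices of the k features with the largest -log(p) scores."""
    order = np.argsort(feature_scores(X, y))[::-1]
    return order[:k]
```

Sweeping k over the sorted feature list and re-training each classifier is exactly the accuracy-vs-feature-count curve reported in Fig. 5.9.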

Discussion
This work sparsely represents the recorded multimodal time series with as few relevant features as possible while preserving diagnosis accuracy. The multivariate features are extracted in both the time domain and the frequency domain, capturing not only static statistical characteristics but also the correlation and interaction of each dimension of the sensory signal. Results indicate that the dataset can be reduced by up to 72.2% ∼ 86.1% of the raw data (with 100 and 200 features, respectively) while keeping the average diagnosis accuracy at about 85% with little data preparation. Future work should therefore analyze the trade-off between window_size and diagnosis accuracy, so as to represent the recorded multimodal time series with as few relevant features as possible while preserving diagnosis accuracy.

Summary
In this chapter, anomaly diagnosis methods based on Bayesian nonparametric hidden Markov models and on sparse representation by statistical feature extraction, applied when an anomaly is triggered, were introduced. Specifically, the detailed procedures for anomaly sample definition, supervised learning dataset collection, and data augmentation of insufficient samples were also presented. The proposed methods were verified on a multi-step human-robot collaborative kitting task with a Baxter robot, and the performance and results of each method were presented. First, a multi-target classifier based on the nonparametric Bayesian model was proposed, which uses the sHDP-VAR-HMM to model the anomaly sample data of the various anomaly types; for the Baxter kitting experiment, this resulted in a diagnosis accuracy of 97.1% across a total of 7 anomaly event types. Additionally, for the sparse representation for anomaly diagnosis, we extracted a total of 31 time-domain and frequency-domain feature types from the anomaly samples, analyzed the importance of the extracted features using the hypothesis testing method, and then verified the filtered features on 9 common out-of-the-box supervised learning methods for multi-class diagnosis in sklearn.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.