Endowing Robots with Longer-term Autonomy by Recovering from External Disturbances in Manipulation through Grounded Anomaly Classification and Recovery Policies

Robot manipulators are increasingly poised to interact with humans in shared workspaces. Despite increasingly robust manipulation and control algorithms, failure modes continue to exist whenever models do not capture the dynamics of the unstructured environment. To achieve longer-term horizons in robot automation, robots must develop introspection and recovery abilities. We contribute a set of recovery policies to deal with anomalies produced by external disturbances, as well as anomaly classification through non-parametric statistics with memoized variational inference with scalable adaptation. A recovery critic sits atop a tightly integrated, graph-based online motion-generation and introspection system that resolves a wide range of anomalous situations. Policies, skills, and introspection models are learned incrementally and contextually in a task. Two task-level recovery policies, re-enactment and adaptation, resolve accidental and persistent anomalies respectively. The introspection system uses non-parametric priors along with Markov jump linear systems and memoized variational inference with scalable adaptation to learn a model from the data. In extensive real-robot experiments, various strenuous anomalous conditions are induced and resolved at different phases of a task and in different combinations. The system executes around-the-clock introspection and recovery and even self-recovers when misclassifications occur.


Introduction
As robots enter increasingly unstructured environments and shared workspaces with humans, unexpected anomalies are to be anticipated. Even as manipulation and control algorithms become increasingly robust, failure modes continue to exist. Numerous sources of error and possible execution anomalies arise from the complex dynamics found in robots, from their interactions with the world and human collaborators, and from robots' limited ability to model the world. Internal errors (resulting from improper visual, kinematic, or dynamic models) and limited hardware accuracy can potentially lead to anomalies. It is the external anomalies, however, that are the hardest to account for in unstructured environments. External anomalies may arise from the inability to model sudden accidental collisions (human-robot, robot-world, or robot-object-world), object slips due to inertial dynamics, misgrasps, or even a chain reaction where one anomaly generates other anomalies. Fig. 1 illustrates two anomaly examples in a kitting experiment. Furthermore, anomalous conditions are hard to model, as similar anomalies can occur with wide variability, making them challenging for robots to recognize. In this work, we refer to self-monitoring as (physical) introspection. It includes the ability of a robot to recognize both the nominal and anomalous state conditions it may be in. Therefore, it is imperative to have a recovery framework that leverages introspection results and incrementally learns to resolve new anomalous situations in unstructured environments.
This work implements a recovery framework that allows the explicit encoding of contextual recovery policies in online collaborative manipulation tasks. Few papers have studied the development of explicit policies for recovery from anomalous conditions, especially those characterized by random or unstructured qualities that are hard to model or anticipate. Recovery action design in robots shares similarities with motion generation and skill sequencing notions in robot manipulation (Calinon, D'halluin, Sauser, Caldwell and Billard 2010; Ijspeert, Nakanishi, Hoffmann, Pastor and Schaal 2013; Paraschos, Daniel, Peters and Neumann 2013; Levine, Finn, Darrell and Abbeel 2016a; Chernova and Veloso 2009; Grollman and Jenkins 2010; Konidaris, Kuindersma, Grupen and Barto 2012; Niekum, Osentoski, Konidaris, Chitta, Marthi and Barto 2015; Gutierrez, Chu, Thomaz and Niekum 2018; Levine, Pastor, Krizhevsky and Quillen 2016b). However, in anomaly recovery, a robot may at times need to re-attempt a specific skill, whilst at other times it may need to apply a new skill to adapt to new world conditions. It is important to learn recovery behaviors incrementally in response to specific anomaly events that may occur unpredictably in a task. For recovery to be useful, a recovery critic must resolve the best policy to enact for a given anomaly at a given time.
Anomalies (the deviation of sensor-related signatures from those experienced in nominal executions) have been studied particularly in structured and unimodal formats (Hovland and McCarragher 1998; Pettersson 2005). More recently, anomaly identification (Stolt, Linderoth, Robertsson and Johansson 2011; Rojas, Harada, Onda, Yamanobe, Yoshida, Nagata and Kawai 2013; Rojas, Harada, Onda, Yamanobe, Yoshida and Nagata 2014) and classification (Di Lello, Klotzbucher, De Laet and Bruyninckx 2013; Wu, Lin, Guan, Harada and Rojas 2017a; Park, Erickson, Bhattacharjee and Kemp 2016; Park, Kim, Hoshi, Erickson, Kapusta and Kemp 2017; Park, Kim and Kemp 2018) in unstructured environments have been the subject of growing interest, as deploying robots in unstructured environments becomes both more necessary and more feasible. In (Di Lello, Klotzbucher, De Laet and Bruyninckx 2013), a non-parametric Bayesian prior was used with Hidden Markov Models (HMMs), a Gaussian observation model, and Gibbs sampling to classify four static anomaly situations. In (Luo, Wu, Lin, Duan, Guan and Rojas 2018), Markov Jump Linear Systems were used to model latent states through a linear dynamical process (the vector autoregressive, or VAR, process) for anomaly identification in a pick-and-place task as well as a drawer opening task. A novel threshold that varies according to the execution of the process was designed using the gradient of the HMM belief state, but no anomaly classification was conducted. In (Park, Erickson, Bhattacharjee and Kemp 2016), anomaly identification was conducted using a traditional HMM but with a detection threshold that varied according to clusters of execution progress. This work was improved upon in (Park, Kim and Kemp 2018), which instead introduced two approaches to compute likelihoods as a function of progress.
First, they used Gaussian radial basis functions to produce the clusters and associated likelihoods that gauge execution progress and vary the identification threshold. Second, they sought a method to eliminate discontinuities between clusters and opted to use a Gaussian-process regressor to compute the mean and standard deviation of the log-likelihood segments. While this work focused on detecting anomalies caused by a wide variety of sources, it did not implement a multi-class anomaly classifier. Finally, in (Park, Kim, Hoshi, Erickson, Kapusta and Kemp 2017), an artificial neural network is used to identify and classify anomalies in the context of robot-assisted feeding, producing competitive results. The network uses input features extracted from an HMM, raw sensory signals, and a convolutional neural network. In our work, we look to improve the identification and classification of anomalies by using sticky Hierarchical Dirichlet Process (sHDP) HMMs with memoized variational inference and scalable adaptation to learn more compact and interpretable models of the VAR process (Sec. 4).
Few works design recovery policies that explicitly handle the occurrence of various anomalies at different times in a task. For example, in (Rodriguez, Mason, Srinivasa, Bernstein and Zirbel 2011; Nakamura, Nagata, Harada, Yamanobe, Tsuji, Foissotte and Kawai 2013; Wu, Lin, Luo, Duan, Guan and Rojas 2017b), the entire task is simply re-attempted upon failure. In (Chang and Kulic 2013), Chang et al. devised an error recovery system based on Petri Nets learned from demonstration. Error conditions, however, were defined based on object location: if objects were not located in expected states, an error was triggered. This forced the system to maintain a growing list of expected object locations. The work did not consider other anomaly sources.
In (Kappler, Pastor, Kalakrishnan, Wuthrich and Schaal 2015), failure classification was performed for only one perturbation and it was pre-taught. No failure identification was presented, nor was there an explicit recovery policy. Instead, a recovery behavior was inserted manually in a specific place in the task, and no explicit experimental results quantified recovery versatility and robustness. Likewise, the ability to grow recovery behaviors incrementally over time was absent. In (Niekum, Osentoski, Konidaris, Chitta, Marthi and Barto 2015), a system that allows for the incremental addition of skills is presented, but there is no mention of how anomalies could be explicitly classified. Adaptive behavior was taught for two anomalies that occurred predictably and were characterized by a consistent structure. No explicit recovery policy was presented in this work to handle anomalies.
The research question that we studied in this work is: "Given the ability to classify anomalies, to what extent can we extend long-term autonomy when using a simple set of task-level recovery policies that grow over time?" Our work contributes the design of an explicit manipulation anomaly recovery system that is characterized by six attributes:
1. The use of Bayesian non-parametric Hidden Markov Models with memoized variational inference with scalable adaptation for robust online anomaly classification trained with little data.
2. The learning of contextual recovery policies that provide unique recovery policies for anomalies at specific locations in the task graph.
3. The establishment of two types of recovery policies that differentiate between anomalies whose causes are accidental (one-off) occurrences and anomalies that are persistent.
4. The ability to introspect and perform recoveries reliably whilst the robot already executes a recovery behavior triggered by a previous anomalous condition.
5. The integration of a system that combines scalable and encapsulated motion generation, introspection, and recovery.
6. The inclusion of a (costly-to-generate) anomaly dataset from a co-bot kitting experiment that includes relevant multimodal sensory-motor information and RGB video for a wide range of anomalous conditions and recoveries at different parts of the task (details in Extension 2).
Our contribution builds on top of our previously developed introspection system, which could introspect into nominal skills and identify anomalies, but not classify them or recover contextually from them. In this paper, we significantly expand introspection to deal with the online classification of a challenging set of anomalies and further implement contextual recovery policies to resolve them. Extensive experimentation showed not only that we can perform anomaly classification globally and recoveries contextually in an effective manner, but also that the system self-recovers from erroneous classifications and successfully recovers from existing anomalies. Manipulation tasks are represented through a hierarchical graph-based representation. Nodes consist of modules that encode skill generation, skill introspection, and skill goal setting. Modules run in parallel, allowing for a constant sense-plan-act-introspect (SPAI) paradigm. Skill modules are flexible and can execute a user's preferred skill generation technique (e.g. HMM motion generation (Calinon, D'halluin, Sauser, Caldwell and Billard 2010), Dynamic Movement Primitives (DMPs) (Ijspeert, Nakanishi, Hoffmann, Pastor and Schaal 2013), Probabilistic Movement Primitives (ProMPs) (Paraschos, Daniel, Peters and Neumann 2013), Interaction Probabilistic Movement Primitives (IProMPs) (Chen, Wu, Duan, Guan and Rojas 2017), or deep reinforcement learning (RL) techniques (Levine, Finn, Darrell and Abbeel 2016a)). Introspection modules run anomaly identification and classification through Bayesian non-parametric Markov Jump Linear Systems (MJLS) with improved inference techniques for better model representation (Wu, Lin, Guan, Harada and Rojas 2017a; Hughes, Stephenson and Sudderth 2015). Anomaly identification is a fault detection service that, if flagged, triggers anomaly classification services that cluster signals with broadly similar structure.
Once an anomaly is classified, a recovery policy is executed. Recovery policies include re-enacting or adaptive policies. Re-enacting policies resolve accidental (one-off) anomalies, while adaptive policies resolve persistent anomalies. Re-enactments re-attempt either a current or previous manipulation skill but with new goal parameterizations. Re-enactments are learned from human users by modeling human recovery choices through a multinomial distribution of task nodes. Once learned, new node transitions are introduced in the graph for specific accidental anomalies at specific nodes. For adaptive policies, the robot requires user intervention to provide skill training to overcome a persistent anomaly at a given point in the task graph. Once an adaptive recovery is trained (including both skill generation and introspection models), it is introduced into the graph while retaining previously learned policies from the parent node. The approach fashions a system that incrementally learns anomalies globally and recoveries contextually (Sec. 5).
A co-bot experiment performing kitting tasks is used as a proof-of-concept. A human collaborator places objects in a collection bin that the robot has to package. We hold that tedium and monotony on the human collaborator's part result in the introduction of a variety of external disturbances or anomalies to the robot system. We demonstrated that we could not only identify anomalies reliably (overall accuracy of 93.09%) but also classify them in an online fashion (overall accuracy of 96.15%), and that, given simple task-level recovery policies, we could also recover consistently and reliably most of the time. The tight integration achieved in this work enabled robots to continue functioning more than 82% of the time across all our anomaly scenarios and 95% of the time in more typical scenarios.
The framework showed interesting functionality including: (i) the ability to introspect and recover from anomalies that occurred during recovery activities themselves and (ii) the ability to self-correct. Even in situations where the initial classification and recovery policy were wrong, the system at times quickly self-corrected and completed the task successfully. The current framework has broad applicability to all manipulation domains that suffer from uncertainties in unstructured environments, making industrial and service robots prime candidates for this technology. Extensions 1-5 include supplemental video, dataset, results and analysis, and robot-agnostic source code for the co-bot kitting experiment with anomalies and recovery information. The supplemental information is also accessible at (Rojas 2018b).
Figure 2. Manipulation tasks are controlled through a graph-based scheme consisting of nodes and edges. Each node contains three types of modules: motion, visualization, and introspection, all of which run in parallel. Motion modules use pose goals provided by the visualization module as well as node-specific skill parameters to generate desirable skills. Introspection modules use node-specific models, parameters, and hyper-parameter settings to continually look for anomalies. If anomalies are identified, the introspection modules further classify them. A recovery critic then issues a policy for re-enactment or adaptation.

Overview
In this section we introduce a high-level overview of the system along with relevant notation. A summary of all notation can also be found in Appendix C. Directed graphs are a useful tool to manage complexity in manipulation tasks (Kroemer, Daniel, Neumann, van Hoof and Peters 2015; Niekum, Osentoski, Konidaris, Chitta, Marthi and Barto 2015; Kappler, Pastor, Kalakrishnan, Wuthrich and Schaal 2015). Motion comprises structure, not unlike that of grammar, that can be captured as a set of motion primitives and associated sensory-motor perceptions (Rojas, Luo, Zhu, Du, Lin, Huang, Kuang and Harada 2017; Lin, Shafran, Yuh and Hager 2006; Rosen, Brown, Chang, Sinanan and Hannaford 2006).
Manipulation tasks are represented as a graph G that consists of a sequence of behaviors. Behaviors B are in turn composed of either simple or compound actions, where actions are represented by nodes N. Actions are connected by transitions T and, as such, behaviors too are connected by transitions. A node transition from a node N_s to another node N_t is denoted as T_{s,t} = {s, t ∈ N}. The manipulation graph is thus the set of nodes and transitions: G : {N, T}. We also introduce a pair of additional definitions for behaviors: (i) behaviors are also referred to as phases in a manipulation task. Phases imply temporal progression, hence giving behaviors a temporal context in the accomplishment of a task. (ii) The behaviors with which any task is bootstrapped are also referred to as milestones B = (B_1, ..., B_i), indicating that these behaviors define key points in the task and play a significant part in accomplishing it.
As we introduce recovery policies, more concretely adaptive policies, we will generate simple adaptive behaviors that are composed essentially of (adaptive) nodes, denoted N_ij. Adaptive nodes will be pushed in between milestones (Sec. 5.2). The node insertion generates a new graph branch that connects the current behavior to the subsequent milestone (see the rec_mv_z_anom_k node in Fig. 2). It is also possible to introduce further adaptive nodes N_ijk in existing branches (see the rec_rec_mv_z_anom_m node in Fig. 2) if a new adaptation takes place as a result of an anomaly F during a recovery skill. In this way, the set of nodes in a task, comprising those within milestone behaviors and those in branched nodes, N = {N_i N_ij ... N_ij...q}, can incrementally grow over time as new capabilities are introduced.
We now turn our attention to a node's internal functionality. A node does more than simply generate motion. Nodes are composed of a set of modules which run as parallel processes. Generally speaking, modules can encapsulate a wide range of functions such as skill generation, introspection, visual goal setting (visualization for short), natural language processing, and navigation, to name a few. For this work, we restrict node modules to skill generation S, visualization V, and introspection M. In a given task, skill modules S_m = {S_1, ..., S_M} perform the necessary motor skills to achieve a task (Sec. 3). Visualization modules V_m = {V_1, ..., V_M} process goal targets for specific motor skills (Sec. 3.1). Introspection models M_m = {M_1, ..., M_M} help a robot understand the types of skills or anomalies that are experienced within a task. In our work, we generate and maintain skill, visual, and anomaly libraries on a per-task basis.
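To make the graph notation concrete, the bookkeeping described above (nodes carrying three modules, transitions between nodes, and an adaptive node pushed in as a new branch towards the next milestone) might be sketched as follows. This is only an illustrative sketch and not the implementation used in the system: the class names, the node names, and the string placeholders standing in for the S, V, and M modules are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One action N in the task graph with its three module slots."""
    name: str
    skill: str = "skill-module"                  # placeholder for S (hypothetical)
    visualization: str = "visualization-module"  # placeholder for V (hypothetical)
    introspection: str = "introspection-module"  # placeholder for M (hypothetical)


@dataclass
class TaskGraph:
    """Graph G = {N, T}: named nodes plus directed transitions."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    transitions: List[Tuple[str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_transition(self, source: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must already be nodes of the graph")
        if (source, target) not in self.transitions:
            self.transitions.append((source, target))

    def insert_adaptive_node(self, node: Node, current: str, next_milestone: str) -> None:
        """Add the branch current -> node -> next_milestone.

        Existing transitions are kept, so previously learned behavior
        remains available alongside the new branch.
        """
        self.add_node(node)
        self.add_transition(current, node.name)
        self.add_transition(node.name, next_milestone)


def build_kitting_graph() -> TaskGraph:
    """Bootstrap a chain of five actions; the names are illustrative only."""
    names = ["move_to_prepick", "prepick_to_pick", "pick_to_prepick",
             "move_to_preplace", "place"]
    graph = TaskGraph()
    for name in names:
        graph.add_node(Node(name))
    for source, target in zip(names, names[1:]):
        graph.add_transition(source, target)
    return graph
```

Keeping the original edge when a branch is inserted mirrors the idea that adaptations extend the graph rather than replace what was already learned.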
The introspection module is in charge of triggering anomaly flags when the system experiences sensory-motor signatures that deviate from those expected in the currently running node. Once an anomaly is triggered, the introspection system assigns a classification F_x to the anomaly. Classifying anomalies is by nature more challenging than classifying nominal skills, as the variability under which anomalies occur is much larger (see Sec. 4). Similarly, acquiring data for failure activities also brings challenges: discovering a set of anomalies in a task is not a straightforward process, and deciding on how to discriminate between them is also not trivial. The policy under which anomalies are re-generated can be controversial: should they be induced or expected to occur accidentally? Sec. 2.2 further comments on these issues.
Once an anomaly has been classified, recovery actions R are necessary. A recovery agent or critic issues one of two types of recovery policies: re-enactment policies R_R or adaptive policies R_A. Re-enactment policies are applied to anomalies that are distinctly accidental (one-off events), while adaptive policies are applied to anomalies that are persistent (i.e. anomalies that occur repeatedly). Re-enactment policies re-attempt a previously enacted skill that is selected as a function of the anomaly that occurred. That is, a re-enacting policy issues a transition from the current node N_i to a designated goal node N_g such that R_R : T_{N_i,N_g} (see Sec. 5.1). For adaptive policies, the robot requires user intervention to train a motor skill to overcome the persistent fault F_x. Once an adaptive recovery is trained, it is added into the graph together with its modules (S_m, M_m, V_m). In this way, adaptive recoveries are incrementally introduced to the system as persistent anomalies appear (see Sec. 5.2).
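The decision made by the recovery critic can be illustrated with a small sketch. It assumes that re-enactment targets are chosen from an empirical multinomial distribution over goal nodes built from recorded human choices, and that whether an anomaly is persistent is supplied by the caller; how persistence is decided is not specified here. The class, its method names, and the returned labels are hypothetical and are not part of the system described in this work.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Tuple


class RecoveryCritic:
    """Toy critic: re-enact for accidental anomalies, adapt for persistent ones."""

    def __init__(self) -> None:
        # (node, anomaly) -> counts of the goal nodes that human users chose
        self._choices = defaultdict(Counter)
        # (node, anomaly) -> name of an already trained adaptive node
        self._adaptive: Dict[Tuple[str, str], str] = {}

    def record_human_choice(self, node: str, anomaly: str, goal_node: str) -> None:
        self._choices[(node, anomaly)][goal_node] += 1

    def reenactment_distribution(self, node: str, anomaly: str) -> Dict[str, float]:
        """Empirical multinomial over goal nodes for this (node, anomaly) context."""
        counts = self._choices.get((node, anomaly), Counter())
        total = sum(counts.values())
        if total == 0:
            return {}
        return {goal: count / total for goal, count in counts.items()}

    def register_adaptation(self, node: str, anomaly: str, adaptive_node: str) -> None:
        self._adaptive[(node, anomaly)] = adaptive_node

    def select(self, node: str, anomaly: str, persistent: bool) -> Tuple[str, Optional[str]]:
        """Return (policy label, target node) for a classified anomaly."""
        if persistent:
            adaptive_node = self._adaptive.get((node, anomaly))
            if adaptive_node is None:
                # No adaptive skill trained yet: user intervention is needed.
                return ("request_user_training", None)
            return ("adapt", adaptive_node)
        distribution = self.reenactment_distribution(node, anomaly)
        if not distribution:
            return ("request_user_demonstration", None)
        # Most likely goal node under the learned distribution.
        goal = max(distribution, key=distribution.get)
        return ("reenact", goal)
```

Keying both tables on the pair (node, anomaly) reflects the contextual nature of the policies: the same anomaly class may call for a different recovery depending on where in the task graph it occurs.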

Experimental Setup
In this section we introduce a co-bot-based Kitting experiment selected to test our anomaly classification and recovery policies. We also present the experimental testbed and manipulated objects. Details regarding external disturbances and data collection techniques are also described.

Kitting Experiment
The collaborative kitting experiment consists of a robot and a human co-worker that closely collaborate to place a set of goods in a packaging box. The human co-worker is tasked to place a set of 6 objects in the robot's "collection bin" (located in front of the robot) one at a time, as shown in Fig. 4(a). The objects may accumulate in a queue in front of the robot. As soon as the first object is on the table, the robot identifies the object and begins placing it in the packaging box located to the right of the robot. Thus, the robot picks an object (Fig. 4(b)) and transports it towards the box (Fig. 4(c)), after which the robot places it in the box (Fig. 4(d)).
The kitting task is originally bootstrapped with 4 behaviors B and 5 actions N as shown in Fig. 3. All behaviors except pick consist of single actions or nodes. The compound pick behavior consists of two nodes: pre-pick to pick and pick to pre-pick. The task requires that we train 5 actions and, as such, 5 skills, visualization goals, and introspection models. However, in the rest of the paper, we will describe the task only in terms of the 4 high-level behaviors for simplicity. Recall that the original graph will grow as adaptive nodes are learned when adaptations are necessary.

External Disturbances
In this section we motivate the kinds of external disturbances that may be typical of a human-robot collaboration setup in a warehouse-like job such as the one described in Sec. 2.1.1. Despite collaboration, we think that collaborative tasks, kitting in this case, might still place low cognitive demands on the human user. The low cognitive load might lead to monotony, which would then cause boredom and loss of attention. In such cases, a human co-worker may be more likely to accidentally collide with the robot or alter the environment in unexpected ways. For example, the user may accidentally collide with or unintentionally move a packaging object in ways the robot cannot model or anticipate as it tries to grip the objects. Object shifting (objects to be grasped or even the packaging box) may lead to tool collisions, failed grasps, or even air grasps (where the object was completely removed). There also exists the possibility that picked objects may at times slip from the robot's tool if the grasp is not optimal, or if, upon motion, inertial forces acting on the object cause dynamics that break the grasp. The system may even experience a chain of anomalies: human collisions that lead to object slips that move objects in such a way that leads to air grasps. As part of the discovered anomalies from Sec. 2.2, we introduce the basic anomaly types and their acronyms in the interest of brevity: human collisions (HC), tool collisions (TC), object slips (OS), and no-object (NO). Sec. 4 will introduce the introspection methodology used to model robot skills, including a description of our Anomaly Identification algorithms (Sec. 4.2) and Anomaly Classification algorithms (Sec. 4.3). Later, in Sec. 5, we introduce our recovery critic policies, including Re-enactments (Sec. 5.1) and Adaptations (Sec. 5.2).
Figure: Objects that need to be packaged are placed by a human collaborator before the robot in a collection bin. The shared workspace affords possibilities for accidental contact and unexpected alteration of the environment. The robot is tasked to pick-and-place each one of the objects in a packaging box to its right. The visualization module uses the ALVAR tags to provide a consistent global pose with respect to the base of the robot, and the introspection system continually monitors for anomalies and their types. If an anomaly is classified, the recovery critic selects from amongst two policies to try to restore the task flow and reach the next milestone in the task. The ultimate goal is to recover successfully and, in doing so, help the robot achieve longer-term autonomy.

The Robot
A Baxter humanoid robot's right arm is used to pick commonplace objects set before it. The equipment used with the robot is: a 6 DoF Robotiq FT sensor, the standard Baxter electric pinching fingers, and Baxter's left hand camera. Each finger is further equipped with a multimodal tactile sensor composed of: (i) a four-by-seven taxel matrix that yields absolute pressure values, (ii) a dynamic sensor which provides a single capacitive reading in millivolts (mV) useful to detect tactile events, and (iii) an IMU and gyroscope (Maslyczyk, Roberge, Duchaine et al. 2017). Baxter's left hand camera is placed flexibly in a region that can capture objects in the collection bin with a resolution of 1280x800 at 1 fps (a setting chosen to favor pose accuracy and lower computational complexity in the system), as seen in Fig. 4(a). The use of the left hand camera facilitated calibration and object tracking accuracy. ROS Indigo on Ubuntu 14.04 and a number of workstations are used to control all aspects of the experimentation. Code is available on our supplementary page (Rojas 2018b).

Objects
A set of 6 common household objects consisting of box-like shapes and bottles was used in our work, as shown in Fig. 4(a). The objects ranged in weight from 0.0308 kg to 1.055 kg and in volume from 3.2 × 10^-4 m^3 to 1 × 10^-3 m^3. The objects' surfaces also varied slightly: some heavier objects had sleeker surfaces that incited object slips (not an unreasonable choice, we believe, as warehouses contain a wide variety of objects), whilst other objects had rougher surfaces. Across trials, object locations and order were varied to promote generalization.
ALVAR tags, with 0.06 m sides, were placed around the circumference of the objects for robust visual recognition regardless of orientation (Fig. 4). ALVAR handles changes in lighting conditions, supports optical flow-based tracking, and performs well in multi-tag scenarios.

Cataloging Experiments
In this section we provide brief overviews of the data collection process for the skill S and introspection M modules. Detailed presentations can be found in Secs. 3 and 4 respectively.

Motion Skill Training
In this work, motion skills are encoded through DMPs. DMP training uses one-shot kinesthetic demonstrations to teach each of the five skills needed to bootstrap the four behaviors of the kitting task.

Deducing Anomalies
As for the process of discovering which anomalies might exist in a given task, we must acknowledge that robot researchers undeniably hold a bias towards which anomalies will exist and be discovered. In our work, we therefore aim to discover the anomalies by emulating a collaborative kitting task in which the human collaborator experiences tedium and monotony, leading to unintentional changes in the environment or disturbances to the robot.
To this end, we tasked 5 robot researchers to act as collaborative co-workers in a kitting task with the Baxter humanoid robot under the monotonous conditions already mentioned (Sec. 2.1.2). Each user was trained to place the set of six household objects, one at a time, in the collection area. From this exercise we extract two pieces of information: (a) anomaly classification labels (as judged by a human expert) that emerge from the task (those mentioned in Sec. 2.1.2, namely HC, TC, OS, NO) and (b) the recording of sensory-motor data surrounding the anomalous event. We do this by considering a window of ±2 s around the event and recording through an online database system. The sensory-motor data collected at this stage allows us to build basic models of the anomalies, further described in Sec. 4. One thing to note is that in this work we attempted to classify anomalies broadly. Consider, for example, a human collision: regardless of user, high or low collision, right or left, or even temporal occurrence in the task, all of these are to be classified as the same single event of human collision. The same principle applies across the rest of the anomalies. Our approach to classification is much broader than similar works (Park, Kim and Kemp 2018), which, coupled with the fact that only a limited number of trials is available for training, renders the modeling task much more challenging.
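The windowing step can be illustrated with a short sketch that cuts a ±2 s slice of a recorded signal around a labeled anomaly time. It assumes timestamps are sorted in ascending order and expressed in seconds; the function name and its arguments are hypothetical, and the database system used for recording is not modeled.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def extract_window(
    timestamps: Sequence[float],
    samples: Sequence,
    event_time: float,
    half_width: float = 2.0,
) -> Tuple[List[float], List]:
    """Return the samples recorded within event_time +/- half_width seconds.

    timestamps must be sorted in ascending order and aligned with samples.
    Both window boundaries are inclusive.
    """
    if len(timestamps) != len(samples):
        raise ValueError("timestamps and samples must have the same length")
    start = bisect_left(timestamps, event_time - half_width)
    stop = bisect_right(timestamps, event_time + half_width)
    return list(timestamps[start:stop]), list(samples[start:stop])
```

For a signal sampled at 2 Hz with an anomaly flagged at t = 5 s, the call `extract_window(ts, xs, 5.0)` keeps the nine samples between 3 s and 7 s.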

Training and Inducing of Anomalies
Beyond the original data collection performed in Sec. 2.2.2, a second data collection round is conducted to improve training (parameter and hand-designed feature tuning). This round is performed iteratively, seeking to maximize performance while protecting against overfitting. The final numbers of training and testing trials used for anomaly identification and classification are described in Exp. 1 and Exp. 2 respectively.

Learning Recoveries
Upon the occurrence of accidental one-off anomalies, re-enactment recovery policies are learned from human users. Exp. 3 is used to learn probability models from human users given specific anomalies (see Sec. 5.1.1 for details). Similarly, for persistent anomalies, adaptive recoveries are incrementally trained through kinesthetic teaching. In Exp. 4, 5, and 6 a variety of adaptive skills are learned to address specific and emerging anomalies (see Sec. 5.2 for details).

Motor Skills
In manipulation, motor skills are compact action representations that are extracted from continuous high degree-of-freedom (DoF) robot motions (Ijspeert, Nakanishi, Hoffmann, Pastor and Schaal 2013; Paraschos, Daniel, Peters and Neumann 2013; Meier and Schaal 2016; Chen, Wu, Duan, Guan and Rojas 2017; Calinon, D'halluin, Sauser, Caldwell and Billard 2010). Attractive qualities in motor skill representations include stable dynamics when attractor points (start and goal locations) or temporal scales are changed, along with flexible re-use such as blending or parallelizing primitives. Techniques from dynamical systems like Dynamic Movement Primitives (DMPs), or from probability like Probabilistic Movement Primitives (ProMPs), are widely used to encode manipulation task information. In this work, we encode motions using DMPs, though we can handle any manipulation approach by extracting key parameters into the framework's motion module library.
The DMP framework encodes dynamical systems through a set of nonlinear differential equations: a point attractor system is modulated by a nonlinear forcing function, which in turn depends on a canonical system for temporal scaling. For a one-DoF system, the point attractor is defined following (Pastor, Hoffmann, Asfour and Schaal 2009). Eqtn. 1 is an extended PD control signal with spring and damping constants K and D respectively, position and velocity x and v, goal g, scaling s, and temporal scaling factor τ.
The scaling term originates from an additional system, called the canonical dynamical system, which controls the system's phase execution, where α can be an arbitrary constant.
The forcing term f(s) is used to alter attractor point dynamics and achieve an arbitrary trajectory (often learned from demonstration (Pastor, Hoffmann, Asfour and Schaal 2009)). The forcing term is defined as a phase-dependent linear combination of basis functions ψ_i(s), with Gaussian distributions of mean c_i and variance h_i used as basis functions. The forcing function is thus the linear combination of basis functions with variable weights w_i, normalized by Σ_i ψ_i(s). Phase s monotonically decreases from 1 to 0 and controls phase progress by activating the Gaussian distributions centered at c_i. The diminishing phase value guarantees that the forcing term vanishes, leaving the simpler point attractor dynamics to converge to the target. Spatial scaling is possible through the (g − x) term in Eqtn. 1, which enables the system to adjust to varying goals. Finally, system speed-up (or slowdown) is possible through the τ variable in Eqtn. 3 as well.
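As a rough illustration of the dynamics described above, the following minimal Python sketch integrates a one-DoF DMP. It assumes one common form of the transformation system, τ·dv/dt = K(g − x) − D·v + (g − x0)·f(s) with τ·dx/dt = v, together with the canonical system τ·ds/dt = −α·s and bases ψ_i(s) = exp(−h_i (s − c_i)²), treating h_i as a width parameter. The exact form of Eqtn. 1 in (Pastor, Hoffmann, Asfour and Schaal 2009) may differ, and the gains, centers, widths, and function names are illustrative choices, not values used in this work.

```python
import math


def canonical_phase(t, tau, alpha):
    """Closed-form solution of tau * ds/dt = -alpha * s with s(0) = 1."""
    return math.exp(-alpha * t / tau)


def forcing(s, weights, centers, widths):
    """Normalized weighted sum of Gaussian bases, scaled by the phase s."""
    psi = [math.exp(-h * (s - c) ** 2) for c, h in zip(centers, widths)]
    total = sum(psi)
    if total < 1e-12:  # all bases inactive: no forcing
        return 0.0
    return sum(w * p for w, p in zip(weights, psi)) * s / total


def rollout(x0, g, weights, centers, widths,
            K=100.0, D=20.0, tau=1.0, alpha=4.0, dt=0.001, duration=3.0):
    """Euler integration of the assumed one-DoF attractor with forcing term."""
    x, v, t = x0, 0.0, 0.0
    path = [x]
    for _ in range(int(round(duration / dt))):
        s = canonical_phase(t, tau, alpha)
        f = forcing(s, weights, centers, widths)
        dv = (K * (g - x) - D * v + (g - x0) * f) / tau
        dx = v / tau
        v += dv * dt
        x += dx * dt
        t += dt
        path.append(x)
    return path
```

With all weights set to zero the rollout reduces to the plain point attractor; non-zero weights reshape the transient, while the decaying phase still lets the trajectory settle at g.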

Learning from Demonstration
Forcing term weights are learned from demonstration. Given a demonstrated trajectory x(t) of duration T, the goal is set to g = x(T) and τ is selected such that the DMP reaches 95% convergence at t = T; standard linear regression is then used to compute the weights w_i. This procedure yields a baseline controller that can be improved by reinforcement learning (Schaal, Peters, Nakanishi and Ijspeert 2005), though this is not done in this work. Motor skills are trained as individual skills for each phase of the task (more robust methodologies (Grollman and Jenkins 2010; Konidaris, Kuindersma, Grupen and Barto 2012; Niekum, Osentoski, Konidaris, Chitta, Marthi and Barto 2015) were not used here). Cartesian position and XYZ Euler representations are used to encode the attractor dynamics.
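The weight-fitting step can be sketched as follows. This is a hedged example rather than the implementation used here: it assumes the attractor form τ·dv/dt = K(g − x) − D·v + (g − x0)·f(s) with τ·dx/dt = v, reads "95% convergence" as the phase having decayed to 0.05 at t = T, and swaps the standard linear regression mentioned in the text for per-basis locally weighted regression, a common closed-form choice for DMPs. All function names are hypothetical.

```python
import math


def derivative(values, dt):
    """Finite differences: central in the interior, one-sided at the ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out


def tau_for_convergence(duration, alpha, remaining=0.05):
    """Assumed reading of '95% convergence': s(T) = exp(-alpha*T/tau) = 0.05."""
    return alpha * duration / math.log(1.0 / remaining)


def forcing_targets(xs, dt, K, D, tau):
    """Forcing values implied by a demonstration under the assumed attractor."""
    x0, g = xs[0], xs[-1]  # goal is the final demonstrated position, g = x(T)
    vel = derivative(xs, dt)
    acc = derivative(vel, dt)
    return [(tau * tau * a + D * tau * v - K * (g - x)) / (g - x0)
            for x, v, a in zip(xs, vel, acc)]


def fit_weights(phases, targets, centers, widths):
    """Locally weighted regression: one closed-form weight per Gaussian basis."""
    weights = []
    for c, h in zip(centers, widths):
        psi = [math.exp(-h * (s - c) ** 2) for s in phases]
        num = sum(p * s * f for p, s, f in zip(psi, phases, targets))
        den = sum(p * s * s for p, s in zip(psi, phases))
        weights.append(num / den)
    return weights
```

A useful sanity check of this design: if the target forcing is exactly c·s for some constant c, every fitted weight equals c, and the normalized forcing term reproduces the target.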
With respect to introspection models, we leverage sensory-motor signatures to learn the structure of sensory responses to motion data (Rojas and Peters II 2005;Kappler, Pastor, Kalakrishnan, Wuthrich and Schaal 2015). Our observations consist of a 6 DoF end-effector twist and wrench respectively, a 7 DoF pose (using quaternions as orientation), and 56 tactile values (each finger has 4-by-7 taxels). All observations were hand-processed into features as detailed in Sec. 6.3. All object poses are acquired using AR codes through the ROS ALVAR framework 3 .
As previously mentioned in Sec. 2.1.1, we use kinesthetic teaching to train five simple skills: move-to-pick, pre-pick-to-pick, pick-to-pre-pick, move-to-box, and place. We ensure that skills are executed in such a way that no occlusion occurs. Skills are executed at least 7 times to obtain sensor information of nominal skills, which is used by the introspection models to first implement anomaly identification (as described in Sec. 4 and also seen in (Wu, Lin, Guan, Harada and Rojas 2017a)). Once DMP and introspection models are trained, they are stored in their corresponding libraries. Then, a behavior graph is constructed where nodes contain appropriate ID types that are handled by the system to enact the necessary models during task execution. As for transitions, nominal nodes currently transition to only one other node, so no explicit transition classification is enacted. For anomalies, however, transitions to different nodes depend on the anomaly classification (Sec. 4.3) and the re-enactment policy of our critic (Sec. 5).

Goal Setting
For task execution, the Visualization module is responsible for selecting appropriate goal targets for enacted skills. While the aim is for the Visualization module to use task affordances to select appropriate target goals in a skill, goal targets are currently pre-specified according to the nature of the skill. Pre-pick nodes use the ALVAR code pose of objects in queue order from right to left. Pick nodes are set to the pose of actively tracked objects. The move-to-box skill uses the centroid location of the flat plane of the box. Place skills use packaging box locations set according to the number of objects already picked in the task. Additionally, we highlight that though the skill set used in this work is simple, the space of possible anomalies is significant and is this work's main focus. To this end, in our experimentation, we test strenuous anomalous conditions that could emerge in unstructured environments (Sec. 6).

Robot Introspection
Robot introspection is a precursor to policy recovery. A non-parametric Bayesian Markov jump linear system (MJLS) is used for anomaly identification and classification. This section first introduces the Bayesian non-parametric model and then presents the specific techniques used for anomaly identification and classification.

Bayesian non-parametric Hidden Markov Modeling
Robot introspection uses Bayesian non-parametric Markov Jump Linear Systems (MJLS) and memoized variational inference with scalable adaptation as the modeling mechanism. A non-parametric Bayesian HMM, namely the sticky Hierarchical Dirichlet Process HMM, can be used to learn a vector autoregressive (VAR) process (sHDP-VAR-HMM).
Such an approach enables us to learn the model complexity (that is, the number of modes or latent states) directly from the data. The VAR switching process allows mode-specific observations to be modeled through linear dynamics (Fox, Sudderth, Jordan and Willsky 2010; Wu, Lin, Guan, Harada and Rojas 2017a). Recent advances in variational inference allow large datasets to be processed incrementally and optimize the creation and removal of states, yielding models that are simpler, more compact, more interpretable, and better aligned to ground truth state segmentations (Hughes, Stephenson and Sudderth 2015). In this section we first describe the standard Hidden Markov Model, followed by the sHDP-VAR-HMM, followed by variational inference concepts.

Hidden Markov Models
HMMs are doubly stochastic, generative processes used to make inferences on temporal data. The underlying stochastic process contains a finite and fixed number of latent states or modes z_t, which generate observations X = {x_t}_{t=1}^N through mode-specific emission distributions b(z_t). These modes are not directly observable and represent sub-skills in a given task node. Transition distributions, encoded in the transition matrix π_ji, control the probability of transitioning across modes over time. Given the initial mode distribution π_0 and a set of observations, the Baum-Welch algorithm is used to infer the model parameters Π = (π, b). HMMs assume a fixed number of latent states as well as observations that are conditionally independent given the mode. Such assumptions limit the expressive power of HMMs, as they are unable to derive natural groupings and model complex dynamical phenomena.

The sHDP-VAR-HMM
Bayesian non-parametric priors extend HMM models to learn latent complexity from data as well as the transition distribution of an HMM (Fox, Sudderth, Jordan and Willsky 2010; Fox, Hughes, Sudderth, Jordan et al. 2014; Hughes, Stephenson and Sudderth 2015; Wu, Lin, Guan, Harada and Rojas 2017a). This section introduces key concepts of the sHDP-VAR-HMM (for an extended presentation see (Fox 2009)). To allow for a flexible number of latent states, priors on probability measures G_j that have an unbounded number of support points θ_k can be used. Dirichlet Processes (DPs) are known for their clustering properties (i.e. the Chinese restaurant process) across countably infinite modes θ_k and provide a distribution over the support points according to Eqtn 6.
Here, H is a base distribution, and β_k are weights sampled via a stick-breaking process generally represented as GEM(γ). The DP allows observations to be sampled without explicitly constructing an infinite probability measure G_0 ∼ DP(γ, H). Instead, it is possible to use the DP as a prior for the set of HMM transition probability measures G_j. However, this construction, as it stands, would consistently generate independent HMM modes between transition steps. The goal is to define the probability measures G_j on a common base of support points and let G_j produce a variation on the global discrete measure G_0.
So, through a Bayesian hierarchical specification G_j ∼ DP(α, G_0), where G_0 is itself drawn from DP(γ, H), it can be shown that the probability measures G_j share the support points θ_k of G_0, i.e. G_j = Σ_k π_jk δ_θk with π_j ∼ DP(α, β). The HDP-HMM, in this form, does not yet differentiate self-transitions from moves between distinct latent states; it allows for fast switching dynamics between them, causing significant posterior uncertainty. For this reason, a "sticky" self-transition bias parameter is introduced that favors self-transitions (Fox, Sudderth, Jordan and Willsky 2010). As for observation models, the sHDP-HMM can be used to learn VAR processes, which are useful to model complex phenomena. The transition distribution is defined as in the sHDP-HMM case; however, instead of independent observations, each mode now has conditionally linear dynamics, where the observations are a linear combination of the past r mode-dependent observations with additive white noise. In our case, we consider first-order (r = 1) autoregressive Gaussian likelihoods; that is, the observation x_t is a noisy linear function of the previous observation, x_t = A_k x_{t−1} + e_t for z_t = k, with additive white noise e_t ∼ N(0, Σ_k). Each state k is thus described by a time-invariant matrix of regression coefficients A_k and a covariance matrix Σ_k. The generative process for the resulting HDP-AR-HMM is then found in Eqtn 7. Since both A and Σ are uncertain for specific latent states, they need to be learned. The parameters θ = {A, Σ} are approximated for each state by defining a conjugate prior distribution on them. In particular, a Matrix Normal Inverse Wishart (MNIW) is used as the conjugate prior when both A and Σ are uncertain. If only the covariance is uncertain, the conjugate prior is a d-dimensional Inverse Wishart (IW) distribution with a symmetric positive definite scale matrix ∆ and ν degrees of freedom, as in Eqtn. 8.
The full definition of this joint prior is found in (Hughes, Stephenson and Sudderth 2015) and is denoted NIW(κ, ϑ, ν, ∆). For the IW, the first moment of the distribution is E[Σ] = ∆ / (ν − d − 1), where ν is the degrees of freedom. The expectation of the covariance is defined over the N exemplars of data X_n for a given skill, each a sequence of length T_n. Then, to determine the matrix A of regression coefficients, we use the Matrix Normal Inverse Wishart (MNIW) distribution, which places a conditionally matrix-normal prior on A (for a given latent state) such that A | Σ ∼ MN(M, Σ, K), with mean matrix M. The matrix normal is computed once Σ is available: Σ represents the covariance across the rows, while K represents the covariance across the columns. By using the model over a set of multi-modal exemplar data X_n, the sHDP-AR-HMM can discover and model shared behaviors in the anomaly data across exemplars, even from a few examples. This model does assume, however, that all exemplars share the same (latent) modes and that modes switch amongst themselves in the same way. It is also possible to use a beta-process prior (Fox, Sudderth, Jordan and Willsky 2010) to avoid this limitation, but this has not yet been implemented for online performance. Pseudo-code for the generation of skill models using the sHDP-VAR-HMM is outlined in Algorithm 1.
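To make the generative story above more concrete, the sketch below samples from a truncated sticky HDP with first-order autoregressive emissions. It is an illustration under simplifying assumptions, not the model code used in this work (which relies on bnpy): observations are scalar rather than vector-valued, the mode parameters (regression coefficient and noise level) are supplied by hand instead of being drawn from an MNIW prior, and each transition row is drawn from a finite Dirichlet with parameters α·β_k + κ on the diagonal as a weak-limit approximation. All function names are hypothetical.

```python
import random


def stick_breaking(gamma, truncation, rng):
    """Truncated draw of beta ~ GEM(gamma) by repeatedly breaking a unit stick."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, gamma)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass so that the weights sum to one
    return weights


def sticky_transition_rows(beta, alpha, kappa, rng):
    """Row j ~ Dirichlet(alpha * beta + kappa * e_j), sampled via Gamma draws."""
    rows = []
    for j in range(len(beta)):
        shapes = [max(alpha * b, 1e-12) + (kappa if k == j else 0.0)
                  for k, b in enumerate(beta)]
        draws = [rng.gammavariate(shape, 1.0) for shape in shapes]
        total = sum(draws)
        rows.append([d / total for d in draws])
    return rows


def sample_ar_hmm(rows, coeffs, noise_std, length, rng, x_init=0.0):
    """Sample modes z_t and observations x_t = a_z * x_{t-1} + e_t."""
    z, x = 0, x_init
    modes, obs = [], []
    for _ in range(length):
        z = rng.choices(range(len(rows)), weights=rows[z])[0]
        x = coeffs[z] * x + rng.gauss(0.0, noise_std[z])
        modes.append(z)
        obs.append(x)
    return modes, obs
```

Raising κ relative to α concentrates each row on its diagonal entry, which is the self-transition bias that suppresses fast mode switching.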

Memoized Variational Inference with Scalable Adaptation
Prior to the work in (Hughes, Stephenson and Sudderth 2015), inference algorithms for HMMs and HDP-HMMs neither learned efficiently from large datasets nor effectively explored data segmentations with varying numbers of states. Inference algorithms can be trapped at local optima near their initialization points. Stochastic optimization methods, which are unable to update the number of modes after execution, are particularly vulnerable to poor exploration of data segmentations and to local optima (Johnson and Willsky 2014; Foti, Xu, Laird and Fox 2014). These methods may yield states that become irrelevant and should be removed. Recently, algorithms that add and remove states via split and merge moves have been designed for non-parametric priors like the HDP and the beta process (BP) (Fox, Hughes, Sudderth, Jordan et al. 2014; Chang and Fisher III 2014). However, these Monte Carlo proposals scale poorly, as they must use the entire dataset and also require that all sequences fit in memory.
Algorithm 1: sHDP-VAR-HMM Models for Classification
Input: N_c: number of sequences for class c ∈ C; {X_n}_{n=1}^{N_c}: dataset with N_c sequences, each of length T_c; N_i: maximum number of learning iterations; N_r: number of runs of the whole learning procedure; random_state: the random number generator; k_splits: number of folds; a, b, d, e: hyper-priors for the concentration parameters; ν, ∆, V, M, s_F: hyper-priors for the MNIW distribution; κ: the self-transition bias; K: the truncation level of active states.
Result: HDP-HMM models for each class.
return Θ_π with the maximum L_test_mean
The memoized variational inference algorithm with scalable adaptation of Hughes et al. uses birth proposals to create new states, and merge and delete moves to remove poorly predicting states; adaptations are validated through a global variational bound (Hughes, Stephenson and Sudderth 2015). The algorithm caches sufficient statistics and parallelizes local inference steps to efficiently process sequence subsets at each time step, allowing rapid adaptation of the state space cardinality. The inference algorithm yields better models overall (more compact and more interpretable) when inferring the sHDP-HMM's posterior distribution, leading to better classification results. Please refer to (Hughes, Stephenson and Sudderth 2015) for complete details of the algorithm and to (bnpy 2017) for the open-source code.

Anomaly Identification
Anomaly identification continuously monitors robot behavior to identify unexpected behaviors during skill execution and even during recovery phases. Recovery phases are challenging as they usually begin in anomalous states and are more likely to trigger false positives (Wu, Lin, Luo, Duan, Guan and Rojas 2017b). Different metrics for anomaly identification have been suggested in (Park, Erickson, Bhattacharjee and Kemp 2016; Wu, Lin, Guan, Harada and Rojas 2017a; Park, Kim, Hoshi, Erickson, Kapusta and Kemp 2017; Wu, Lin, Luo, Duan, Guan and Rojas 2017b). Most of these techniques use the maximum cumulative log-likelihood value of the observations given a model. In (Luo, Wu, Lin, Duan, Guan and Rojas 2018), it was shown that the performance of such metrics is limited during recovery stages. For instance, Fig. 6 contrasts nominal (expected) log-likelihood signals with anomalous ones. In (Luo, Wu, Lin, Duan, Guan and Rojas 2018), we presented a metric ∇L based on the natural logarithm of the HMM filtered belief state (hereafter referred to as the "forward gradient" measure). Given an HMM model Π and an incoming time series x_{1:t}, the natural logarithm of the filtered belief state (see Sec. 17.4.1 of (Murphy 2012b)) associated with the forward model for latent state i can be represented according to Eqtn 12.
The forward term can be computed iteratively from the previous time-step result; from Eqtn. 12 we obtain Eqtn. 13. From Theorem 1 in (Luo, Wu, Lin, Duan, Guan and Rojas 2018), we established that for an incremental time series Y, a good HMM model outputs an incremental Viterbi path that stably expands on the previous one. The stable expansion of the Viterbi path is as follows: given a Viterbi path "11223" for an input x[1:t], the path at x[1:t + 1] becomes "11223*", where * is the newly appended hidden state. From this theorem we derived a corollary establishing that the forward gradient L-curve depends on the latest emission probability of the HMM model, which in turn depends on the latest observation. The key point is the generation of stable and robust large positive-valued gradients when observations are generated by their true latent state.
Given this fact, anomaly detection using the forward gradient is derived as follows. Consider an HMM model Π_s (Sec. 4.1) representing a certain skill s ∈ S, and let there be n trials of time series exemplar data X_i, i ∈ {1, ..., n}, collected from nominal executions of that skill. The anomaly detection criterion for a new time series is derived from these nominal trials, where T_i is the time length of trial X_i and ∇L_t^{Π_s}(X_i) is the forward gradient output by model Π_s at time t computed using time series X_i. A new time series Y then triggers an anomaly when it fails the corresponding test. The metric was shown to yield accurate, robust (in terms of precision and recall), and fast anomaly identification, even in recovery stages. Fig. 7 illustrates the identification performance of the forward gradient approach. Information regarding parameter values, models, and training and testing is presented in Sec. 6, whilst anomaly identification results are found in Exp. 1.
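The following sketch illustrates the general idea of forward-gradient monitoring on a toy HMM. It is written under several assumptions that are not taken from this work: the forward gradient is computed as the step-to-step change of the forward log-likelihood log p(x_{1:t}); emissions are scalar Gaussians rather than VAR processes; and the detection threshold is the smallest gradient seen in nominal trials minus a hand-picked margin, which stands in for the test defined in (Luo, Wu, Lin, Duan, Guan and Rojas 2018). Function names and numeric values are hypothetical.

```python
import math


def log_gauss(x, mean, std):
    """Log density of a scalar Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def forward_gradients(obs, log_pi0, log_trans, means, stds):
    """Return L_t - L_{t-1}, with L_t = log p(x_1:t) from the forward recursion."""
    k = len(means)
    log_alpha = [log_pi0[i] + log_gauss(obs[0], means[i], stds[i]) for i in range(k)]
    prev = logsumexp(log_alpha)
    grads = []
    for x in obs[1:]:
        log_alpha = [
            log_gauss(x, means[j], stds[j])
            + logsumexp([log_alpha[i] + log_trans[i][j] for i in range(k)])
            for j in range(k)
        ]
        cur = logsumexp(log_alpha)
        grads.append(cur - prev)
        prev = cur
    return grads


def nominal_threshold(nominal_trials, model, margin):
    """Assumed rule: lowest nominal gradient minus a safety margin."""
    lowest = min(min(forward_gradients(trial, *model)) for trial in nominal_trials)
    return lowest - margin


def first_anomaly(obs, model, threshold):
    """Index of the first observation whose gradient drops below the threshold."""
    for t, grad in enumerate(forward_gradients(obs, *model), start=1):
        if grad < threshold:
            return t
    return None
```

Because each gradient depends mainly on the latest emission probability, a single out-of-model observation produces an immediate, large negative value, which is what makes the measure responsive.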

Anomaly Classification
The anomaly classification service is triggered once an anomaly is identified. A system can potentially address a wide variety of anomaly types, including low-level hardware anomalies (sensor and actuator noise or breakage); mid-level software contingencies (logic errors or run-time exceptions); and high-level misrepresentations (poor modeling of the robot, the world, their interactions, or external disturbances in the environment) (Pettersson 2005).
Figure 7. The log-likelihood gradient ∇L for 5 motor skills s (colored backgrounds) in a task. Top plot: a nominal task whose ∇L is steadily positive (ranging from 10 to 45 units). Bottom plot ("Anomalous Trial"): a trial that experienced one anomaly per skill execution (caused by human collisions with the robot arm). Anomalies occurred shortly after the red vertical lines seen in each skill (marked "anomaly before occurrence"). When an anomaly occurs, the gradient becomes negative (on the order of −100s to −1000s), providing data that is distinct from nominal cases.
In ( [...] ability on the robot's end. As introduced in Sec. 2.1.2, four anomaly classes emerged in the cataloging experiments of the kitting task: (accidental) human collisions (HC) in a shared workplace; tool collisions (TC) with adjacent objects in the collection bin or the environment; object slips (OS) caused by inertia or external disturbances; and the unexpected movement of objects that led to missed grasps, otherwise described as "No Object" (NO). Compared to anomaly identification, classification is a more challenging problem, as one must not only trigger a binary flag but also build a multi-class classifier that is affected by the unique dynamics of anomalous events: (i) the conditions under which individual anomalies occur can exhibit diverse dynamics: collisions can happen at different locations, in different directions, and with different forces; (ii) anomalies may trigger subsequent anomalous events, for example, an HC may trigger an OS, so the system must handle the onset of two temporally near anomalies that are challenging to discern; and (iii) classification becomes increasingly complex as more adaptation nodes occur downstream, since the amount of variation in the experienced sensory-motor signatures, poses, and physical interactions increases (the implications for recovery are further discussed in Sec. 5).
Just as with anomaly identification, the sHDP-VAR-HMM was used. Given M trained models for M robot skills, 3-fold cross validation is used along with the standard forward-backward algorithm to compute the expected cumulative log-likelihood of a sequence of observations within the analysis window (our standard is ±2 secs.), E[log P(X_i | Π_m)], for each trained model m ∈ M. Given a test trial x, the cumulative log-likelihood of the test trial observations is computed conditioned on the parameters of every available trained model, log P(x_{m_1:m_t} | Π_m) for m = 1, ..., M, at a rate of 200Hz. The process is repeated when a new skill is started. Given the phase in the manipulation graph m_c, we can index the correct log-likelihood I(Π_m = m_c) and check whether the probability density of the test trial given the correct model is greater than the rest for the last observation point. Further information regarding parameter values, models, and training and testing is presented in Sec. 6. Anomaly classification results are detailed in Exp. 2.
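The selection step can be illustrated with a small sketch that scores an analysis window under several candidate models and returns the label with the highest cumulative log-likelihood. As a stated simplification, each class model here is a single-mode scalar AR(1) process standing in for a trained sHDP-VAR-HMM, and the parameter values attached to the HC, TC, OS, and NO labels are invented for the example.

```python
import math


def ar1_log_likelihood(window, coeff, noise_std):
    """Cumulative log-likelihood of a window under x_t = coeff * x_{t-1} + noise."""
    total = 0.0
    for prev, cur in zip(window, window[1:]):
        resid = cur - coeff * prev
        total += (-0.5 * math.log(2 * math.pi * noise_std ** 2)
                  - resid ** 2 / (2 * noise_std ** 2))
    return total


def classify(window, models):
    """Return the best-scoring label and the per-label log-likelihoods."""
    scores = {label: ar1_log_likelihood(window, coeff, std)
              for label, (coeff, std) in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical stand-in parameters (coefficient, noise std) per anomaly class.
MODELS = {"HC": (0.9, 0.1), "TC": (-0.5, 0.1), "OS": (0.2, 0.1), "NO": (1.5, 0.1)}
```

For example, `classify([0.9 ** k for k in range(20)], MODELS)` selects "HC", since that window follows the HC stand-in dynamics exactly and leaves zero residual under it.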

Anomaly Recovery
After classification, the recovery critic implements recovery through re-enacting or adaptive policies, as shown in Fig. 2. Re-enacting policies re-execute a skill (possibly the current skill or a previous skill) as designated by the policy (Sec. 5.1). Adaptive policies resolve persistent errors by training adaptive skills that leverage human understanding of the complex set of world-object-robot relations (see Sec. 5.2). The recovery critic runs not only during all normal phases of the task but also, significantly, during recoveries from anomalous events. To illustrate, refer to Fig. 2, where for node move_z a persistent anomaly anomaly_k led to the creation of an adaptive skill found in node rec_mv_z_anom_k. Then, during the execution of this adaptive skill, a new persistent anomaly anomaly_m entered the system. Our framework identifies it and assigns a new adaptation, encoded in node rec_rec_mv_z_anom_k, that enables the system to reach the next milestone. Implemented recoveries, whether re-enacting or adaptive, are strictly coupled to the specific anomalies (or anomaly labels) that caused them. Recoveries themselves are globally unique and thus emerge contextually in the task, whereas anomalies are not. To illustrate, consider that the same anomaly may show up at different points in a task, e.g. a tool collision may happen as we try to pick an object, as we move to the packaging box, or as we place the object in the box. However, the recoveries associated with these anomalies are unique. That is, the recovery skill needed when experiencing a collision during the pick phase may be different from the one used when hitting the box (it may be possible that the same recovery skills repeat, but we have not explicitly studied how to leverage repeated recoveries in this work). An overview of the recovery framework is summarized in Fig. 8.

Re-Enacting Policies
Re-enacting policies resolve accidental one-off anomalies. All anomalies are considered accidental by default, and only when they cannot be resolved through re-enactment are they considered persistent. The premise is that accidental events are resolved through the re-enactment of re-parameterized skills. The key question is which skill needs to be re-enacted. A few works have used a policy where either the entire task is repeated from the beginning or fixed points in the task are selected a priori (Nakamura, Nagata, Harada, Yamanobe, Tsuji, Foissotte and Kawai 2013; Wu, Lin, Luo, Duan, Guan and Rojas 2017b; Rodriguez, Mason, Srinivasa, Bernstein and Zirbel 2011). In this work, we learn more efficient skill selection mechanisms.
Given a current milestone N_j, for each new accidental anomaly F_y, a new re-enactment (transition) R_R is inserted into the graph, where * denotes the target node, which is selected according to the policy introduced in Sec. 5.1.1.
In the kitting experiment, consider an object slip anomaly during node 3, when the robot is moving towards the box. Instead of returning home, the robot can re-enact a re-parameterized version of the pick skill. Fig. 2 illustrates the concept for node move_y: when anomaly_j occurs, the recovery critic assigns re-enactment rec_mv_y_anom_j, which transitions to the previous pick node. Or, back in the kitting experiment, consider an accidental human collision that bumps the robot arm whilst it executes the move_to_box skill. Given built-in safety procedures, once the temporary accidental contact concludes, the robot could re-enact the current skill. Note that nodes contain skills that are inherently reactive. The starting and goal poses of a skill can be set without altering the skill's properties. A re-enactment of the current skill with a re-parameterized starting pose would be enough to complete that task phase and reach the next milestone. Fig. 2 also illustrates the concept for move_y and anomaly anomaly_l. The critic here assigns re-enactment rec_mv_y_anom_l, which is a self-transition. In effect, re-enactment goal nodes are chosen in relation to the nature of the anomaly type.

The Re-Enactment Policy
Re-enactment goal nodes are assigned through multinomial distributions that model human-user goal node selections given a current node and a specific anomaly. Five human users studying for a robotics master's degree were trained to understand the graph topology of the task, possible transitions, skill execution, goal parameterization, anomaly types, and legal node selections/transitions for re-enactment. Each user examined 5 trials of induced anomalies on a per-node, per-anomaly basis, yielding independent multinomial distributions to determine re-enactment policies. For instance, if three anomaly types occur at node 2, then there will be three multinomial distributions modeling the policy. For each multinomial, let N = (N_1, ..., N_K) be a random vector where N_j is the number of times node j is selected as a re-enactment target node. Then N has the pmf P(N_1 = n_1, ..., N_K = n_K) = n! / (n_1! ··· n_K!) · θ_1^{n_1} ··· θ_K^{n_K}, with n = n_1 + ... + n_K, where θ_j is the probability that node j is selected. The results are shown in Table 1. The multinomial provides an indirect way to represent human intuition about the complex set of relations that exist between the robot (and its limbs), the relevant objects of the task at hand, and the interactions that the robot and the objects have with the world. Additionally, the multinomial also encodes a person's internal belief about the utility of a choice, his/her own learning ability (within a trial and across trials), and the person's risk propensity or aversion in decision making.⁴
Figure 8. After classification, the recovery critic triggers a re-enacting or an adaptive policy according to the nature of the anomaly: persistent or one-off (accidental). Re-enacting policies model human decision making probabilistically (Sec. 5.1). Adaptive policies train a new skill and transform the goal to reach the next phase in the task. The new skill is stored and the task graph updated (Sec. 5.2).
For instance, OSs that occurred during the picking skill (node 2), were assigned two different types of re-enactment target nodes: to re-execute the same pick skill with 80% probability and to execute the previous move-to-pick (node 1) with 20% probability. The choice of returning to node 1 represents a more conservative belief or risk averse selection on the user's part.
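A minimal sketch of how such a policy could be estimated and used is given below. It assumes that the multinomial parameters are set to the relative selection frequencies (the maximum-likelihood estimate) and that a target node is then drawn from that distribution at run time; whether the deployed system samples or picks the most probable node is not stated here, so the sampling step is an assumption, as are the function names. The example counts mirror the 80%/20% split described above for object slips at node 2, with 25 selections coming from five users and five trials each.

```python
import math
import random


def estimate_policy(selections):
    """Relative frequency of each selected target node (multinomial MLE)."""
    counts = {}
    for node in selections:
        counts[node] = counts.get(node, 0) + 1
    total = len(selections)
    return {node: c / total for node, c in counts.items()}


def multinomial_pmf(counts, thetas):
    """P(N_1 = n_1, ..., N_K = n_K) for selection probabilities thetas."""
    coeff = math.factorial(sum(counts))
    for c in counts:
        coeff //= math.factorial(c)
    prob = float(coeff)
    for c, theta in zip(counts, thetas):
        prob *= theta ** c
    return prob


def choose_target(policy, rng):
    """Draw a re-enactment target node according to the learned policy."""
    nodes = list(policy)
    return rng.choices(nodes, weights=[policy[n] for n in nodes])[0]


# Object slip at node 2: 20 selections of node 2, 5 selections of node 1.
policy_os_node2 = estimate_policy([2] * 20 + [1] * 5)
```

Here `policy_os_node2` maps node 2 to 0.8 and node 1 to 0.2, so `choose_target(policy_os_node2, random.Random())` returns node 2 in roughly four out of five draws.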

Re-Enactment Target Nodes
Goals for re-enacted target nodes are set by the visualization module. The starting pose is simply the current pose at the time of anomaly, while the goal pose is set as originally described in Sec. 3.2.

Adaptive Policies
Adaptive policies are used to resolve persistent anomalies. An anomaly is classified as persistent when a re-enactment policy fails to resolve it twice consecutively. This indicates that re-enactment is unable to resolve the condition and that explicit adjustments are required to finish the task successfully. In this work, we rely on human intuition and expertise to provide the necessary adaptation skill to solve the persistent task anomaly.

Kinesthetic Teaching
Our system is designed to pause automatically when two consecutive re-enactment policies occur for the same node-anomaly pair in the graph.
The system then waits for the user to initiate kinesthetic teaching (through the push of a system button) and encode the adaptive skill. At this time, the system also records all the sensory-motor data (until the end of kinesthetic teaching) needed to train a new introspection model for the current nominal (adaptive) skill.
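The escalation logic from re-enactment to adaptation can be sketched as a small counter keyed on the node-anomaly pair. This is one possible reading, stated as an assumption: the first two occurrences of the same pair each trigger a re-enactment, and a further occurrence (meaning both re-enactments failed) pauses the task for kinesthetic teaching. Reaching a milestone, or seeing a different pair, resets the count. The class and method names are hypothetical.

```python
class RecoveryCritic:
    """Decides between re-enactment and adaptation for each detected anomaly."""

    def __init__(self, max_reenactments=2):
        self.max_reenactments = max_reenactments
        self.current_pair = None
        self.reenactments = 0

    def on_anomaly(self, node, anomaly):
        pair = (node, anomaly)
        if pair != self.current_pair:
            # A different node-anomaly pair starts a fresh count.
            self.current_pair, self.reenactments = pair, 0
        if self.reenactments >= self.max_reenactments:
            # Re-enactment failed repeatedly: treat the anomaly as persistent.
            self.current_pair, self.reenactments = None, 0
            return "adapt"
        self.reenactments += 1
        return "re-enact"

    def on_milestone_reached(self):
        """A successful skill completion clears the failure streak."""
        self.current_pair, self.reenactments = None, 0
```

Keeping the counter per pair rather than per node means that an unrelated anomaly in the same node does not count towards persistence.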

Graph Integration
Given a current milestone N_i, for each new persistent anomaly F_x, a new adaptive recovery node R_A : N_ij is inserted into the graph as a new branch in-between milestones, where the target node transition * is inherited from the parent node in the graph in accordance with Eqtn. 19. Fig. 2 illustrates the concept: in node move_z, persistent anomaly anomaly_k is resolved using adaptive skill rec_mv_z_anom_k as a new branch between milestones move_z and place.
For cases in which an anomaly F_xx occurs during an adaptive node N_ij, a new adaptive node is created in a new branching layer. Branches always transition to the ensuing milestone, no matter the branching level. In this work, we have assumed that a single adaptive skill is sufficient to restore the nominal functioning of the task. It is plausible to sequence skills to achieve more complex manipulations.

Setting Adaptive Node Goals
As described in Sec. 3.2, skill goals are set by the Visualization module of a node. However, for adaptations, when human users introduce additional manipulation, they are also introducing a transformation on the goal pose of the parent skill with respect to the base frame. Adaptive skills then compute the transformation of the last time step in kinesthetic teaching with respect to the goal of the parent node. During online testing, the Visualization module computes the real-time goal of the parent node, whilst the adaptive skill transforms that goal to achieve task generalization during adaptation.
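The goal transformation can be illustrated with rigid-body transforms. The sketch below is a planar simplification, using poses (x, y, yaw) instead of full 6-DoF poses to stay short; it stores the pose reached at the end of kinesthetic teaching relative to the parent goal observed during teaching, and re-applies that offset to the parent goal computed online. The function names and the numeric poses in the usage note are hypothetical.

```python
import math


def compose(a, b):
    """Pose b expressed in frame a, mapped back to the base frame."""
    ax, ay, ath = a
    bx, by, bth = b
    return (ax + math.cos(ath) * bx - math.sin(ath) * by,
            ay + math.sin(ath) * bx + math.cos(ath) * by,
            ath + bth)


def inverse(a):
    """Inverse of a planar rigid transform."""
    ax, ay, ath = a
    return (-math.cos(ath) * ax - math.sin(ath) * ay,
            math.sin(ath) * ax - math.cos(ath) * ay,
            -ath)


def learn_offset(parent_goal_train, demo_end_pose):
    """Final taught pose expressed relative to the parent skill's goal."""
    return compose(inverse(parent_goal_train), demo_end_pose)


def adaptive_goal(parent_goal_now, offset):
    """Apply the stored offset to the parent goal computed at run time."""
    return compose(parent_goal_now, offset)
```

For instance, with a taught offset learned from a parent goal of (0.5, 0.2, 0.3) and a final taught pose of (0.6, 0.25, 0.5), moving the parent goal by one unit along x at run time moves the adaptive goal by the same amount, which is the generalization behavior described above.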

Inheriting Re-Enactment Policies
Whenever we push a new adaptive node into the graph, that adaptive node is set to inherit the same re-enactment policies available to its predecessor. This is important so as to avoid the need to re-train re-enactments in new adaptation nodes.
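The graph update described in this section can be sketched with a plain dictionary. This is an illustrative data structure rather than the ROS-SMACH-based representation used in the experiments: each node stores its next milestone and its re-enactment policies, and a new adaptive node copies both from its parent so that branches at any depth lead to the same ensuing milestone. The node naming scheme and the example policy entry are assumptions modeled loosely on the labels in Fig. 2.

```python
def make_graph():
    """Toy task graph with one milestone that already has a re-enactment policy."""
    return {
        "move_z": {"next": "place", "policies": {"anomaly_l": "move_z"}},
        "place": {"next": None, "policies": {}},
    }


def insert_adaptive_node(graph, parent, anomaly):
    """Add an adaptive branch for a persistent anomaly seen in `parent`."""
    name = "rec_" + parent + "_" + anomaly
    graph[name] = {
        "next": graph[parent]["next"],                # branch rejoins the next milestone
        "policies": dict(graph[parent]["policies"]),  # inherited re-enactment policies
        "parent": parent,
    }
    graph[parent].setdefault("branches", {})[anomaly] = name
    return name
```

Calling `insert_adaptive_node` on an adaptive node itself creates the nested branching layer: the new node still points to the same milestone as its ancestors and carries a copy of their policies.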

Training
Cataloging experiments were used to capture sufficient data to create robust nominal skill introspection models for adaptive anomalies. These models are then used by our Anomaly identification routine in Sec. 4.2, to identify anomalies that may occur during such adaptations. Anomaly Identification performance is presented in Exp. 1, whilst the success rates for adaptive policies presented in this section are reported in Exp.'s 4-6 under a variety of different conditions that elucidate system performance.

Experiments and Results
Seven experiments are set up to test the accuracy, robustness, and reactivity of anomaly identification and anomaly classification, as well as the efficacy and versatility of our recovery policies under different situations. Exp. 1 and 2 present accuracy and robustness results for Anomaly Identification and Anomaly Classification respectively. Exp. 3-6 examine the efficacy and versatility of the recovery policies. Exp. 3 measures the robustness of re-enacting recovery policies. Exp. 4 tests the robustness of adaptation policies. Exp. 5 analyzes the robustness when both recovery policies coexist in the same task. Exp. 6 tests the system's ability to introspect anomalies and recover from them whilst the system is executing a recovery action due to a previous anomaly. Finally, Exp. 7 analyzes the reactivity of our anomaly classification algorithm.

Kitting Experiment Setup
As stated in Sec. 2.1, the Baxter robot is set up to perform a kitting experiment in conjunction with a human co-worker. The human is responsible for placing objects in the collection bin and the robot is responsible for the packaging. The workspace is shared between the robot and the human, making it possible for the human to provoke both accidental and persistent anomalies in the system. Three computers are used to run the experiment. The first is Baxter's internal computer, which runs Gentoo Base System 2.2 on an Intel(R) Core i7-3770 CPU@3.40GHz, 4GB-RAM, x64-based processor. The internal computer is used to run a ROS joint trajectory server as well as the camera on the left arm. The other two computers run Ubuntu Linux 14.04 with ROS Indigo. One has an Intel(R) Core i5-3470 CPU@3.20GHz, 6GB-RAM, x64-based processor and runs ALVAR recognition, the MoveIt service, and time-series pre-processing for all sensory-motor data. The second workstation has an Intel Xeon i7-6820HQ CPU@2.70GHz (3.60GHz Turbo), 8MB-RAM, x64-based processor and is in charge of running anomaly identification and anomaly classification online, which are implemented with BNPY (bnpy 2017) and a ROS wrapper.
Our graph implementation uses a hybrid approach. Base nodes for the kitting experiment are currently implemented through ROS-SMACH. The non-adaptive nodes however are designed through an internal procedural representation which is detailed in Appendix A. Diagrammatic representations and code are accessible through our supplementary materials page (Rojas 2018b).

Human Subject Training
In Exp.'s 3-6, five different human subjects, under consent, took part in the experiment as human collaborators. They were trained to place consumer goods, one at a time, in the collection bin of the robot. We asked human subjects to assume they were multi-tasking and experiencing loss of attention. The loss of attention can lead (as discovered by the cataloging experiments in Sec. 2.2) to a number of anomalous events including: (i) HCs, (ii) TCs, (iii) OSs, and (iv) NOs. Wall collisions (WC) are introduced in Exp. 4, but these result not from human induction but from different object shape properties. HCs may occur when the robot picks up objects from the collection bin while the human collaborator places new ones. TCs may occur when humans inadvertently place objects near each other such that when the robot attempts to pick an object, one of its fingers collides with the adjacent object (see Fig. 16(b)). OSs may occur after human collisions that rattle the gripper and cause heavier or smoother objects to fall. NO anomalies may occur when a human accidentally collides with or removes an object that the robot intended to pick up.

Signal Processing
Regarding the signals used in these experiments, we originally considered a 6 DoF end-effector twist and wrench respectively, a 7 DoF pose (using quaternions as orientation), and 56 taxel values (each finger has a 4-by-7 grid). A variety of human-engineered pre-processing techniques were tested for these signals. The final selection of pre-processing features for these signals was decided during the validation stage of experimentation and will be reported individually for Anomaly Identification and Classification in Exp.'s 1 and 2 respectively.
All signals were scaled, resampled, and aligned. With respect to scaling, signals were modified to lie in the range −1 ≤ y_i ≤ 1 by normalizing with the absolute value of the signal maxima computed during training. With respect to resampling, given that different observation signals have different publishing rates (wrench: 1000Hz, tactile: 1000Hz, pose and twist: 100Hz), a common re-sampling rate is used to acquire a single time-point at which to model the observations. Our code relies primarily on Python and ROS. Rospy nodes inherently use Python's multi-threading class to handle multiple publishers and subscribers. The class, however, lacks real-time performance support and we have only achieved re-sampling rates of up to 50Hz. Alignment takes place by syncing the timestamps from the varying ROS topics.
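The scaling and resampling steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names are hypothetical, the per-channel peak normalization is one reading of the scaling rule, and both the clipping of test-time values and the use of linear interpolation for resampling are assumptions not stated in the text.

```python
import numpy as np

def fit_scale(train):
    """Per-channel maximum absolute value over training data of shape [T, D]."""
    peak = np.max(np.abs(train), axis=0)
    # Guard against all-zero channels so that division is always defined.
    return np.where(peak > 0, peak, 1.0)

def scale(signal, peak):
    """Map a signal into [-1, 1] using the training peaks.

    Clipping is an assumption: test data may exceed the training maxima.
    """
    return np.clip(signal / peak, -1.0, 1.0)

def resample(t_src, x_src, rate_hz, t_start, t_end):
    """Linearly interpolate every channel of x_src onto a common clock."""
    t_common = np.arange(t_start, t_end, 1.0 / rate_hz)
    cols = [np.interp(t_common, t_src, x_src[:, d]) for d in range(x_src.shape[1])]
    return t_common, np.stack(cols, axis=1)

# Toy usage: two channels scaled by their training peaks.
train = np.array([[2.0, -4.0], [-1.0, 1.0]])
peak = fit_scale(train)
scaled = scale(train, peak)
```

Resampling every topic onto the same clock (for example at 50Hz) yields one observation vector per time-point regardless of the native publishing rates.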

sHDP-AR-HMM Parameters & Hyperparameters
Given that both anomaly identification and classification are based on the same model, we present a base model to introduce parameter settings that are broadly shared across the methodologies. Whenever particular differences exist from the base model, they will be explained within specific experiments. For the observation model, we use a first-order vector autoregressive process with regression coefficient matrix A and covariance matrix Σ for specific latent states. Since both of these dynamic parameters are uncertain, they need to be learned. The MNIW is an appropriate prior distribution when both the mean and the covariance are uncertain (Hughes, Stephenson and Sudderth 2015).
We begin by determining the covariance Σ through the use of the IW distribution. For this computation, we must define the first moment of the distribution according to Eqtn. 9. Here, ν, the degrees of freedom, is set to the number of dimensions plus 2: ν = d + 2. This setting ensures the conjugate MNIW prior has a valid mean (see Sec. 4.5.1 in (Murphy 2012a)). As for the computation of the expectation of the covariance in Eqtn. 10, the scalar s_F is set to 1.0 and multiplied by the scatter matrix (also the empirical covariance). This setting is motivated by the fact that the covariance is computed from pooling all of the data and it tends to overestimate latent-state-specific covariances. A value of the constant slightly less than or equal to 1 mitigates the overestimation.
Then, to determine the matrix A of regression coefficients, the matrix normal of the MNIW uses a mean matrix M set to the zero matrix M = 0_d, of size d × d. We do so to let the new observation be primarily determined by the signal noise.
For the covariance K across the columns, an identity matrix is used such that K = 1.0 · I_d, with the same dimension as Σ.
For the concentration parameter α of the HDP prior, a Gamma(a, b) distribution with values a = 0.5, b = 5 is used. For the self-transition parameter µ a weakly informative Beta(c, d) prior distribution is used with values c = 1, d = 10.
For the sticky HMM transition distribution, the hyperparameter κ (the degree of self-transition bias) is set to 50. The maximum number of iterations for the Split-Merge Monte Carlo method is set to 1000. Finally, the truncation (maximum) number of latent states is empirically set to K = 10 for both anomaly identification and classification.
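The prior settings above can be collected in a short sketch. It relies on the standard fact that an inverse-Wishart distribution with scale Ψ and ν degrees of freedom has mean Ψ / (ν − d − 1), so with ν = d + 2 the prior mean of Σ equals Ψ. The function name, dictionary keys, and the use of `np.cov` for the pooled empirical covariance are illustrative assumptions and do not reproduce the BNPY interface.

```python
import numpy as np

# Scalar hyperparameters quoted in the text (key names are hypothetical).
HYPERPARAMS = {
    "alpha_gamma": (0.5, 5.0),   # Gamma(a, b) prior on the HDP concentration
    "kappa": 50.0,               # sticky self-transition bias
    "max_states": 10,            # truncation level K for latent states
}

def mniw_prior(data, s_f=1.0):
    """Build MNIW prior parameters from pooled observations of shape [T, d]."""
    d = data.shape[1]
    nu = d + 2                                   # degrees of freedom
    emp_cov = np.cov(data, rowvar=False).reshape(d, d)
    # E[Sigma] = Psi / (nu - d - 1); solve for Psi so that E[Sigma] = s_f * emp_cov.
    psi = s_f * emp_cov * (nu - d - 1)
    M = np.zeros((d, d))                         # prior mean of regression matrix A
    K = 1.0 * np.eye(d)                          # column covariance of A
    return {"nu": nu, "Psi": psi, "M": M, "K": K}

rng = np.random.default_rng(0)
prior = mniw_prior(rng.normal(size=(200, 3)))
```

Choosing `s_f` slightly below 1 shrinks the expected covariance, which is the mitigation for pooled-data overestimation described above.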

Classification Modalities
As part of Exp.'s 3-6, we present success rate metrics as a function of two distinct classification system modalities:
(i) perfect anomaly classification (independent system)
(ii) imperfect classification (combined system)
The perfect anomaly classification modality implies that recoveries are only attempted when true-positive classifications are produced by the system. In doing so, we can treat the entire system as three independent sub-systems: anomaly identification (AD), anomaly classification (AC), and the recovery (REC) system. By separating the sub-systems we can study their effectiveness independently from the other systems. The imperfect classification modality, on the other hand, studies the success rates of recoveries in the presence of misclassifications. This leads us to treat the entire system as a function of two sub-systems: AD and AC/REC. Such separation lets us study some interesting phenomena that emerged from the REC system and are detailed in each of the experiments.
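One plausible way to combine sub-system rates into an overall success rate under each modality is to multiply them, treating the sub-systems as independent stages. This is an assumption made for illustration and is not a formula given in the text; the function names and the numbers in the usage lines are hypothetical.

```python
def overall_independent(ad, ac, rec):
    """Modality (i): product of AD, AC, and REC success rates (assumed independent)."""
    return ad * ac * rec

def overall_combined(ad, ac_rec):
    """Modality (ii): product of the AD rate and the joint AC/REC success rate."""
    return ad * ac_rec

# Illustrative rates only, not experimental results.
rate_i = overall_independent(0.95, 0.90, 0.90)
rate_ii = overall_combined(0.95, 0.85)
```

Under this reading, modality (ii) can exceed modality (i) whenever the joint AC/REC rate is higher than the product of the separate AC and REC rates, for instance when a misclassification is later corrected by a second round of introspection.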

Experiment 1: Anomaly Identification
In Exp. 1, we evaluate the performance of the anomaly identification system across the entire set of experiments. Specific context analysis will be presented within Exp.'s 3-6. We have expanded our previous work on anomaly identification by learning to flag anomalies caused by a larger number of classes. A larger class set (including new skills that are learned through adaptive policies) makes it more challenging to achieve high accuracy, precision, and recall. Furthermore, since the anomaly identification system is the first to be triggered, it is critical that identification is done accurately; otherwise the system will suffer increasingly from downstream errors. In this section we present the identification accuracy of the system as well as its robustness through accuracy, precision, and recall metrics. The anomaly identification system used the sHDP-VAR-HMM technique (Sec. 4.1) to create class models for the original nominal skills introduced in Sec. 2.1 (we will call these non-adaptive nodes) and, very importantly, for new adaptation skills that are learned when persistent anomalies take place (we will call these adaptive nodes); in particular, the adaptive skills of Exp. 4a,b,c and 6a,b.
Figure 9. Summary of accuracy, precision, and recall metrics for anomaly identification across all experiments on a per-node basis, including recovery over recovery runs in Exp. 6, and a total summary of performance.
Once the nominal models are trained, the forward gradient measure (Sec. 4.2) is used for anomaly identification. Upon the collection of offline data for training from the inducing experiments described in Sec. 2.2, a scoring heuristic was implemented over 5-fold cross-validation that allowed us to select from a variety of hand-engineered multi-modal signal features and parameter values. Different combinations of features were tested for specific sets of parameter values. Scoring in the form of accuracy, precision, and recall metrics was computed for each combination. The highest scoring model was selected. The highest score resulted in the following combination:
• End-effector force F, torque τ, linear velocity ν, and angular velocity ω such that F, τ, ν, ω ∈ R^3.
• The maximum standard deviation σ computed for each of the 28 taxels in a tactile sensor for the left and right fingers; namely, max_σ [σ_l, σ_r] ∈ R^1.
To build anomaly identification models for both nonadaptive and adaptive skills, a fixed number of 7 trials was used. Non-adaptive skills consisted of the move-to-pick, pick, move-to-box, and place skills and adaptive skills are those captured in Exp.'s 4a,b,c and 6a,b respectively.
Macro accuracy, precision, and recall metrics are extracted by testing whether we can identify anomalies (HCs, TCs, OSs, NOs, or WCs) given some domain (nodes or subexperiments).
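As a reference for how these metrics are commonly computed, the sketch below derives accuracy, precision, and recall from true/false positive and negative counts for one domain and then macro-averages over domains. The function names and the example counts are hypothetical; the counts are not taken from the experiments.

```python
def identification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) for one domain (node or sub-experiment)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

def macro_average(per_domain):
    """Unweighted mean of each metric across domains."""
    n = len(per_domain)
    return tuple(sum(m[i] for m in per_domain) / n for i in range(3))

# Hypothetical counts for two domains.
node_a = identification_metrics(tp=9, fp=1, tn=0, fn=0)
node_b = identification_metrics(tp=10, fp=0, tn=0, fn=0)
summary = macro_average([node_a, node_b])
```

Macro-averaging weights every domain equally, so a node with few trials influences the summary as much as a node with many.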
In this section, we present a summary of the results for anomaly identification for Exp.'s 3-6 (Exp. 2 presents anomaly classification results). Fig. 9 charts the summary across nodes 1-4 as well as new adaptive nodes that are particularly generated when anomalies occur during recoveries as seen in Exp. 6. In Exp. 6, we analyzed two scenarios: Adaptations over Adaptations (AOA) and Reenactments over Adaptations (ROA) which are discussed in detail there. All results and their analysis can be found in Extension 3.

Results
Our anomaly identification accuracy over all 766 experimental trials was 93.09% (see Extension 3 for details). The precision was 94.09% and the recall 97.98%. These results show very strong accuracy and performance, which is critical to avoid the aforementioned downstream errors. In terms of performance across nodes, the experiments revealed very similar performance throughout the task with an average accuracy of 93.34%. This implies that anomaly identification performance did not improve or decline as the manipulation graph traversed the nodes, rendering the identification consistent and reliable. The system also showed perfect accuracy and robustness for occasions in which persistent anomalies occurred during recoveries (AOA, Exp. 6). For times where accidental anomalies occurred during recoveries (ROA), the accuracy and precision were both 90% with no false negatives.

Experiment 2: Anomaly Classification
After anomaly identification, it is important to understand the performance and robustness of the anomaly classifier. The anomaly classification also uses the sHDP-VAR-HMM with memoized variational inference (Sec. 4.1) along with the same features and training style used in anomaly identification. The model is trained to classify anomalies caused by human collisions (HCs), tool collisions (TCs), object slips (OSs), no objects (NOs), and wall collisions (WCs), the latter introduced in Exp. 4. For training, we used the following number of trials for the aforementioned classes: HC-18, TC-17, OS-18, NO-15, and WC-17. We have not yet implemented an unsupervised learning method that automatically generates new anomaly labels based on previously unseen data (determined through a confidence metric), but we have contemplated this work (see Sec. 7). Anomaly classification is only triggered if anomaly identification experiences a true positive. Once the classification procedure is called, no true negatives or false negatives exist in the system, only true or false positives. For this reason, classification will be measured in terms of accuracy across nodes or confusion matrices for a particular experiment. In Exp. 2, we present a summary of the corresponding information for Exp.'s 3-6. Anomaly classification accuracy across nodes (including the AOA and ROA nodes introduced in Exp. 1) is presented in Fig. 10. A confusion matrix for classification across all experiments is shown in Fig. 11. Furthermore, we used the F1-score metric to compare the performance of variational inference algorithms across allocation and observation models. The comparison covers variational inference with scalable adaptation and variational coordinate ascent under different allocation and observation models. Stochastic variational inference was contemplated but not used, as the algorithm did not converge after 1000 iterations. Gibbs sampling was also not used, as it was not available as part of online BNPY (bnpy 2017). The comparisons are also conducted as a function of the number of total training trials. The same number of total training trials was used as mentioned at the beginning of this experiment. Fig. 12 shows the comparative performance of the inference methods.
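For reference, per-class accuracy and F1-score can be read off a confusion matrix as sketched below. The sketch assumes rows are true classes and columns are predicted classes, and that the per-class accuracy reported for a confusion matrix is the row-normalized diagonal (per-class recall); both are assumptions, as are the function name and the toy matrix.

```python
import numpy as np

def per_class_scores(conf):
    """Return (recall, precision, f1) arrays from a square confusion matrix.

    Rows are assumed to be true classes and columns predicted classes.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    support = conf.sum(axis=1)      # number of trials per true class
    predicted = conf.sum(axis=0)    # number of trials per predicted class
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return recall, precision, f1

# Toy two-class matrix, not experimental data.
recall, precision, f1 = per_class_scores([[8, 2], [1, 9]])
```

The F1-score is the harmonic mean of precision and recall, so it penalizes a model that does well on only one of the two.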

Results
Our anomaly classification accuracy over all experimental data (719 trials) was 96.15% (see Extension 3 for details). Interestingly, our anomaly classifier was overall more accurate than our anomaly identification routine. General trends are reported here, whilst specific experimental details are presented within each experimental section. For the non-adaptive nodes, node 1 had perfect classification accuracy. Nodes 2-4 ranged from 94.20% to 96.27%. This indicates very similar performance over task-time and that the classifier was robust in detecting a varying range of challenges (see each experiment for specific details). Classification accuracy during already executing recovery actions was 100%. Although the number of trials for this section was only 19, the data suggests strong classification performance even as the robot is adapting to anomalies. In terms of the confusion matrix in Fig. 11, accuracy ranged from 88.7% for NO to 98.4% for HC. The second poorest classification was that of WC. WCs were more challenging as the collision sometimes occurred against the gripper but on other occasions against the held object. OS came next with 93.1%; OS classification was challenged primarily by the tactile sensor noise explained later in Exp. 3. With regards to variational inference performance, Fig. 12 shows how the sHDP-VAR-HMM with Memoized Variational Inference with Scalable Adaptation generally outperformed the rest of the combinations except for a couple of instances. In fact, for around 88.3% of the training-trial fractions tested, our algorithm outperformed all others. The exceptions occurred roughly for the fraction 0.3-0.33 of the total training trials, where the sHDP-HMM-Gauss-MemoVB initially outperformed our algorithm 0.787 to 0.731. Similarly, for the fraction 0.87-0.90 of the total training trials, the sHDP-HMM-Gauss-VB outperformed our algorithm by 0.9%. Note that results will vary slightly across experimental runs, as trial data is selected randomly and the probabilistic framework we use is unable to fix the random seed value across runs.

Experiment 3: Testing Re-enactment
Experiment 3 analyzes the accuracy and robustness of the anomaly identification, anomaly classification, and recovery critic for accidental anomalies. We study the recovery critic's ability to re-enact reliably at different phases of the task. To this end, accidental anomalies were induced at specific graph phases as listed below:
Node 1: HC
Node 2: HC, TC, OS, NO
Node 3: HC, OS
Node 4: HC
The results for anomaly identification and anomaly classification for Exp. 3 are shown in Figs. 13(a) & 13(b). A confusion matrix for classification accuracy is shown in Fig. 14.
For the re-enactment recovery system, 60 recoveries were attempted (10 trials per object for 6 objects and induced by 5 trained users) on a per-node basis (4 total) under our two classification modalities: (i) perfect classification and (ii) imperfect classification.
The result of the re-enactment policy for modality (i) is shown in Table 2 and for modality (ii) in Table 3.

Results
For anomaly identification, a total of 574 trials were used for testing.
Figure 13. Accuracy, precision, and recall metrics for the anomaly identification and accuracy metrics for the anomaly classification system on a per-node basis for accidental anomalies (left and right respectively).
For anomaly classification, a total of 516 trials were used for testing (93, 242, 122, and 59 for nodes 1 to 4) with an average accuracy of 96.87%. Nodes 1 and 2 were classified perfectly, followed by node 4; the classifier struggled the most with node 3, at an accuracy of 90.77%. The confusion matrix for anomaly classes shows perfect or near-perfect classification for TC and HC respectively, while the classifier struggled more with OS and NO. OS detection suffered primarily from noise in our tactile sensor. We believe a large portion of the noise came from false contacts in the electronics of the tactile sensor. Whilst we attempted to rigidly fix the sensor's electronics, there was still wiggle during anomalous events. With regards to NOs, we were surprised by the lower classification rate. We believe that the tactile sensor's noise was also the culprit. We wanted to use the infra-red sensor on the robot's wrist as an additional observation source; however, the force-torque sensor set-up blocked the IR signal and prevented its use.

With regards to re-enactment recovery, we present success rates for both classification modalities. Under perfect classification, we re-enacted and completed the task successfully on average 98.75% of the time across all nodes (see Table 2). Some failures occurred in Node 3 when an OS occurred. After the OS, the object reached a location outside the field of view of the camera, which prevented the system from computing the object pose. We should note that there were 11 other trials where system failures occurred (these were not marked as recovery failures). There were two main causes for the system failures: (a) challenging pick poses resulted in tactile sensor cables constraining the gripper and (b) an electricity overload in the system that brought parts of the robot to a halt.
Under imperfect classification, we expected a lower performance, and obtained an average recovery completion of 92.81% across all nodes (see Table 3). The highest rates were obtained in nodes 1 and 4 under HC anomalies with 95% recovery, TC anomalies in node 2 with 98.3% recovery, and OS anomalies in node 3 with 95% recovery. The picking skill was the most problematic to resolve in the presence of HCs and NOs.
With regards to overall system trends we observe very competitive anomaly detection at an average of 91.16% and very high anomaly classification (one of our contributions) at 96.87%. For re-enactment under independent systems, we see that re-enactments can resolve almost all accidental anomalies at 98.75%. One last but very interesting development was evident when we computed the performance of the entire system under the two classification modalities, as seen in Fig. 15. Note, interestingly, that for one out of the four nodes (node 3) the overall success rate of the combined system was higher than that of the independent system. This implies that the system completed the task successfully more times under imperfect classification than with perfect classification. The specific reason for this phenomenon is that soon after a misclassification takes place, the introspection system detects that the robot is still in an anomalous state, triggers a new anomaly flag, and issues a new round of classification. This time the correct policy is issued and resolves the anomalous situation. One example is when an OS was misclassified as an HC. The HC triggers a re-enactment, but the robot is not grasping the object. At a later time step, the introspection system flags another anomaly and classifies it as an NO. This time a pick re-enactment is issued and enables the robot to successfully complete the task.
Figure 15. Overall system success rate as a function of modality. Modality (i) considers the contributions of three independent systems: identification (AD), classification (AC), and recovery (REC). Modality (ii) considers the contributions of an independent AD system with a combined AC/REC system. The figure shows an interesting phenomenon during node 3: the 2nd modality performed better than the 1st, as wrong classifications were corrected downstream and coupled with correct recovery policies.

Experiment 4: Testing Adaptation
Experiment 4 analyzes the robustness of the anomaly identification, classification, and adaptive recovery policy in the face of persistent anomalies. We analyze adaptation robustness by testing three scenarios with an increasing number of persistent anomalies (and thus adaptations). The three sub-experiments test robustness under the following conditions:
4a: one adaptation at a single phase (two examples).
4b: a 2nd adaptation introduced at a new phase.
4c: a 3rd adaptation introduced at a new phase.
For this experiment we ran a total of 20 trials per persistent anomaly (4 objects with 5 trial runs per anomaly). A new anomaly class, Wall Collision, was discovered in these experiments and labeled (WC). We analyze whether adaptive policies work robustly independent of the number of adaptations that occur previously in the system and also whether or not the policies generalize across objects. Object locations and order are varied and randomized across trials. Sub-experimental details are given in three distinct sections below. Results are jointly presented and analyzed at the end of this section for succinctness.
Figure 16. Persistent Pick Anomaly. On the left: the proximity of an adjacent object consistently precludes the proper gripping of a target object, leading to a persistent tool collision. On the right: the execution of the learned adaptive skill, which rotates the wrist and clears the fingers for the pick skill.

Experiment 4a: Adaptation at Distinct Single Nodes
In Experiment 4a, we analyze the robustness of the framework to properly identify, classify, and recover from persistent anomalies in single instances using adaptive recoveries. As described in Sec. 5.2, when the same anomaly occurs twice consecutively in the same node, the anomaly is considered persistent and an adaptive skill is learned from a user demonstration to recover and transition to the succeeding milestone in the task.
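The persistence rule described above (the same anomaly occurring twice consecutively in the same node) can be sketched as a small counter. The class name, method names, and returned policy labels are hypothetical, and the sketch does not model what happens after an adaptive skill has been learned.

```python
class PersistenceTracker:
    """Decide between re-enactment and adaptation from consecutive anomalies."""

    def __init__(self, threshold=2):
        self.threshold = threshold   # consecutive repeats that count as persistent
        self._last = None            # (node, anomaly) of the previous event
        self._count = 0

    def select_policy(self, node, anomaly):
        key = (node, anomaly)
        if key == self._last:
            self._count += 1
        else:
            # A different node or anomaly class restarts the count.
            self._last, self._count = key, 1
        return "adaptation" if self._count >= self.threshold else "re-enactment"

    def reset(self):
        """Clear history, e.g. after the node completes without an anomaly."""
        self._last, self._count = None, 0

tracker = PersistenceTracker()
first = tracker.select_policy(2, "TC")    # accidental so far
second = tracker.select_policy(2, "TC")   # same anomaly again in node 2
```

In this sketch the first tool collision in node 2 leads to a re-enactment, and only its immediate repetition requests an adaptation, for which a user demonstration would then be elicited.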
Tool Collisions (TC): these occurred when two objects were placed by a human operator too close to each other. In such conditions, when the pick skill in node 2 is executed, one of the robot's fingers collides with the neighboring object and prevents a proper pick, as illustrated in Fig. 16. Re-enactments do not resolve the situation, so help from a user is elicited to overcome the persistent condition. The taught adaptive skill rotates the robot wrist about the approach axis and clears the fingers from the obstruction.
Wall Collisions (WC): in this (second) example, no tool collision occurs at node 2; however, a persistent collision occurs at node 3 as the robot moves the picked object to the packaging box. The wall collision is a variant of a tool collision. Tool collisions were narrowly defined as collisions that occur on vertical downward motions. In this case, the collision occurs with a lateral motion and the contact can be either tool-wall (of the packaging box) or object-wall. The reason for such an anomaly is that the original move-to-box skill was trained on an object of a given height; later, a taller object was picked and it did not clear the wall using the original skill (see Sec. 7 for a discussion on motion adaptation based on shape properties). Re-enactment does not resolve the anomaly, so an adaptive skill which executes a clearing motion is taught. The execution is shown in Fig. 17.

Experiment 4b: Incremental Growth for Two Adaptations.
In Experiment 4b, we analyze system robustness when two adaptive skills are learned incrementally for different phases of the task. It is important to ensure that the performance of the system is not compromised as more adaptations are introduced into the task graph. In this experiment, we integrate the adaptive recoveries learned in Exp. 4a and induce both persistent anomalies in the same experiment in an incremental fashion at different phases of the task: 4b: @2TC, @3WC.
In this way, the robot first responds by rotating its wrist to clear the persistent obstruction during the pick; and later upon collision with the wall, the robot responds by lifting its arm and clearing the box wall before placing the good in the package.

Experiment 4c: Incremental Growth for Three Adaptations.
Finally in Experiment 4c, we analyze system robustness when we integrate the third adaptation. The next persistent anomaly occurs in node 4 as the robot places an object in the packaging box. The last anomaly results when, upon executing the placing skill, an object already in the box obstructs the placement of our currently held object. So the final sequence of anomalies at varying phase locations is: 4c : @2TC, @3WC, @4TC.
The visualization module is in charge of allotting unique placement goals for all objects in a box, such that they all have a unique space within the package. However, it is possible that upon placement of an object, the latter falls and shifts to a different location in the box, causing a tool collision. The adaptive skill teaches a simple displacement motion whose goal is parameterized by the visualization module to a clear location. Fig. 18 shows this process.

Results
We now summarize the results for Experiment 4a,b,c. For anomaly identification, a total of 124 trials were used for testing (20, 21, 38, and 45 for experiments a.1, a.2, b, and c). We had an average accuracy of 97.04%, an average precision of 97.02%, and an average recall of 99.42% across the three sub-experiments. Very strong performance was achieved all around, as charted in Fig. 19(a).
For anomaly classification, a total of 121 trials were used for testing (20, 20, 37, and 44 for experiments a.1, a.2, b, and c) with an average accuracy of 94.09%. Experiment 4a.2 had the worst performance at 85.0%, followed by Experiment 4b at 94.59%, and perfect classification in Experiment 4c.
A confusion matrix for classification accuracy is shown in Fig. 20. TC and WC are the core classes, whilst HC appears as a result of misclassification. Across all sub-experiments we were able to identify TCs in Exp.'s 4a and 4c with 100% accuracy. Wall collisions were classified slightly less accurately, at 89.80%. Wall collisions were harder to classify given that those collisions occurred under two different scenarios: at times the gripper collided with the box and at other times the held object made the collision. Hence, the multi-modal signals contained variations that degraded the classification performance.
With respect to adaptive recoveries, Fig. 21 presents success rates under our two classification modalities. As expected, the success rates under perfect classification were generally higher than those with imperfect classification, with an average across sub-experiments of 85.0% and 77.5% respectively. The exception was Experiment 4a.1, where the imperfect classification modality achieved a 95.0% success rate vs. 90.0% for modality (i). The failures under modality (i) were due to manipulation system errors. In one trial, during the move-to-box node, the object's collision with the packaging box moved the latter and the place action failed. Our system is limited by not actively tracking objects of interest and rationalizing relationships between them (see Sec. 7 for more comments on this). The results also reveal that one object-set of trials in Experiment 4c had difficulties. Under perfect classification, an adaptive behavior rotated the gripped object and caused a collision with other objects, leading to an irrecoverable situation. For imperfect classification, there was a set of trials that led to 0 completions. Failure occurred during the adaptation to the persistent wall collision in node 3 as the system moved to the box. The culprit was the inability of the system to adapt its motion when an object with different shape attributes (height) was used compared to the one used during user demonstrations. This result points to a weakness in the system's ability to generalize adaptations when object shapes vary drastically from training, as no spatial reasoning is yet embedded in the system. If those two trial-sets were not considered, the average success rate would be 90.83% and 82.50% for the perfect and imperfect classification modalities respectively. With respect to overall system performance, we again compare the performance between modalities. We achieved an average success rate of 78.02% and 75.36% for the two modalities respectively. Figure 22 charts the results over sub-experiments and modalities.
Figure 18. As part of the last phase of the task, the robot attempts to place an object in the package only to find an existing object at the target location. A re-enactment does not solve the anomaly, so an adaptive move is taught and a new goal provided by the visualization module.
Figure 19. Accuracy, precision and recall metrics for the anomaly identification system and accuracy metrics for the classification system on a per-(sub)experiment basis for persistent anomalies (left and right respectively).
As with Exp. 3, we again see the interesting phenomenon that for Experiment 4a.1, modality (ii) achieved higher success rates than modality (i). It supports the premise that even when there are misclassifications in the system, the task can be completed as the system some time later correctly detects, classifies, and recovers from existing anomalies.
Figure 22. Overall system success rate as a function of modality for adaptive recoveries. Modality 1 considers perfect classification and modality 2 considers imperfect classification. It is surprising that some experiments with imperfect classification outperformed those with perfect classification in success rate. Wrong classifications were corrected downstream and coupled with correct recovery policies.
Figure 23. Accuracy, precision and recall metrics for the anomaly identification system and accuracy metrics for the classification system on a per-(sub)experiment basis for merged accidental and persistent anomalies (left and right respectively).

Experiment 5: Test Re-enactment and Adaptation
Experiment 5 analyzes the robustness of the system when re-enactments and adaptations are both integrated and present in the system. It is important to verify that re-enactment policies are not detrimental to adaptive policies and vice versa. For this experiment, we integrate the accidental and persistent anomalies of Experiments 3 & 4, and similarly use the re-enactments and adaptations already learned. Anomaly identification and classification metrics are presented as before under both classification modalities. The sequence of anomalies and recovery policies present in the system is delineated in Table 4, where we refer to re-enactments as "RE" and adaptations as "AD".
Table 4. Sequence of induced accidental and persistent anomalies into the system along with triggered re-enactment (RE) and adaptive (AD) policies during the Kitting experiment.
Node | Anomaly Type | Recovery
2 | TC | RE
3 | HC | RE
3 | WC | AD
4 | TC | AD
For this experiment, 2 objects were selected at random and 10 test trials were conducted for each object. A total of 20 trials were run for each modality. Anomaly identification results across nodes can be seen in Fig. 23(a), while anomaly classification accuracy can be seen in Fig. 23(b). The anomaly confusion matrix is shown in Fig. 24. Corresponding success rates for modalities (i) and (ii) are summarized in Table 5. Notation in Table 5 is as follows: for a TC that occurs at node 2 followed by a re-enactment, the notation we use is @2TC-RE; hence @ indicates the node phase, followed by the two-letter anomaly code, followed by a dash and the type of recovery. Tables 6 and 7 follow the same notation.
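To make the @-notation concrete, the sketch below parses tokens such as @2TC-RE into their node, anomaly, and recovery parts. The parser, its names, and the regular expression are illustrative assumptions; the optional recovery suffix is included so that sequences written without it, such as @2TC, @3WC, @4TC, can also be read.

```python
import re
from typing import NamedTuple

class Event(NamedTuple):
    node: int        # node (phase) in which the anomaly occurs
    anomaly: str     # two-letter anomaly code, e.g. "TC"
    recovery: str    # "RE", "AD", or "" when no recovery is given

# '@' + node number + two capital letters + optional '-RE' / '-AD'.
_PATTERN = re.compile(r"^@(\d+)([A-Z]{2})(?:-(RE|AD))?$")

def parse_event(token):
    match = _PATTERN.match(token.strip())
    if match is None:
        raise ValueError(f"not a valid anomaly token: {token!r}")
    node, anomaly, recovery = match.groups()
    return Event(int(node), anomaly, recovery or "")

def parse_sequence(text):
    """Parse a comma-separated sequence such as '@2TC, @3WC, @4TC'."""
    return [parse_event(tok) for tok in text.split(",") if tok.strip()]

events = parse_sequence("@2TC-RE, @3HC-RE, @3WC-AD, @4TC-AD")
```

The usage line encodes the sequence of Table 4, one event per row.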

Results
We now summarize the results for Experiment 5. For anomaly identification, a total of 72 trials were tested. We had an average accuracy and recall of 97.9% and perfect precision. For nodes 2 and 3, anomaly identification was perfect for all three metrics. Node 4 was more challenging, with an accuracy and recall of 93.8% and perfect precision. For anomaly classification, 71 trials were tested with an average accuracy of 92.8%. As with anomaly identification, node 4 was the most challenging to classify, followed by node 3, with accuracies of 86.7% and 91.7% respectively. Note that by the time the robot reaches node 4 it has undergone 3 different anomalies and is undergoing one more, and the system has also experienced two re-enactments and an adaptation. As discussed earlier, a high degree of variability in the sensory-motor signals (compared to training) begins to enter the system as more recoveries take place and change gripping poses, dynamics and inertia, and the interaction with the objects. With regards to success rate, under classification modality (i) the success rate was 90.0% and under modality (ii) the rate was 80%. Fatalities occurred during the wall collision, where the collision caused an object slip that displaced the object beyond the camera's field of view, impeding any further attempts to re-pick. Under imperfect classification, we experienced a misclassification of HC as OS. The robot attempted to re-enact a pick. However, the object's pose was too high and no IK solutions existed. On another occasion a WC was repeatedly misclassified as TC, and we aborted after 3 attempts. Specific experimental outcomes can be found as comments for this experiment in Extension 3, under the "Exp 5" tab in Excel.

Table 5. Success rate for combined Re-Enactment and Adaptive recoveries across 2 objects under different classification modalities. Anomaly and recovery are presented under the following notation: node location for anomaly occurrence denoted with @; followed by anomaly type, and recovery policy indicated after (-). Additionally, the contribution of manipulation system errors, as a percentage of total failures, is enclosed in parentheses.
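The averaged identification accuracy is consistent with an unweighted mean over the three nodes at which anomalies were induced. The sketch below reproduces that arithmetic under the assumption (our reading; the text does not state the averaging scheme explicitly) that each node contributes equally. The function name `mean_over_nodes` is illustrative only.

```python
from typing import Dict


def mean_over_nodes(per_node: Dict[int, float]) -> float:
    """Unweighted mean of a per-node metric given in percent."""
    if not per_node:
        raise ValueError("no nodes supplied")
    return sum(per_node.values()) / len(per_node)


# Per-node anomaly identification accuracy (%) as reported in the text:
# nodes 2 and 3 were perfect, node 4 reached 93.8%.
identification_accuracy = {2: 100.0, 3: 100.0, 4: 93.8}
average_accuracy = mean_over_nodes(identification_accuracy)

if __name__ == "__main__":
    print(f"{average_accuracy:.1f}%")  # prints 97.9%
```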

The wall collisions experienced in this experiment revealed a new phenomenon: the generation of one anomaly can trigger a subsequent anomaly. In Table 4, note that an HC is induced in node 3. This same HC can trigger an OS in the task. For this reason, we further studied the system's ability to recover from a subsequent OS anomaly. As before, 10 trials were tested for the same 2 objects under both classification modalities, with results shown in Table 6. Under perfect classification, a 90.0% success rate was again achieved. The fatality occurred when the wall collision displaced the packaging box in a way that precluded further placing of objects in the box. For imperfect classification, a 70.0% success rate was achieved. In this experiment, during node 3, when an OS occurred, the system misclassified it as an HC and triggered a re-enactment of the same node. Later the system triggered an NO object flag; however, because we had not previously trained a re-enactment at node 3 (only for node 2), the system halted. Experimental details can also be found as comments under Extension 3.

Table 6. Success rate for combined Re-Enactment and Adaptive recoveries in the presence of a subsequently generated anomaly across 2 objects under different classification modalities. The generated anomaly is denoted with (→). Manipulation system errors are enclosed in parentheses as a percentage of failure contribution.

Experiment 6: Recovering from Anomalies that Happen during Recovery
The final experiment analyzes the robustness of the system in identifying and recovering from anomalies (accidental and persistent) that occur during an already executing recovery skill. It is imperative that the system performs reliably even during recovery actions. In this experiment, we test two situations:
i. a persistent anomaly induced during an adaptation.
ii. an accidental anomaly induced during an adaptation.
These two conditions will be referred to as "Adaptation over Adaptation" (AOA) and "Re-enactment over Adaptation" (ROA) respectively. Experiments are run under our two aforementioned classification modalities. Each experiment is executed for one object chosen at random and repeated 10 times. Details are shown in Table 7.
For (i) we use the same persistent anomaly and adaptation of Exp. 4a.1. Namely, during pick, one finger collides with an adjacent object. The original adaptation rotates the robot wrist about the approach axis by π/2 rad (see Fig. 16(b)). In this experiment, we consider the placement of an additional object at the position where the already adapted grip fingers would descend. This, in turn, causes a new persistent tool collision. In this scenario, a new adaptation is needed. The human demonstrator decides to teach a sliding approach, whose direction of motion is parallel to the table plane, until the fingers are centered on the object, at which point a pick behavior ensues. The adaptation is illustrated in Fig. 25 and can also be seen in the video Extension 1. For (ii) we combine the wall collision adaptation of Exp. 4a.2 with the phenomenon experienced in Exp. 5, where an HC during move-to-place causes a subsequent OS that the system recognizes and that is resolved via a pick re-enactment. In this case, we induce a human collision that results in a subsequent slip whilst the system is resolving a wall collision through a lifting adaptation.

Table 7. Conditions under which anomalies are induced during an adaptation recovery. Anomaly and recovery are presented under the following notation: node location for anomaly occurrence denoted with @; followed by anomaly type, and recovery policy indicated after (-). Also, (→) indicates a subsequently caused anomaly. For AOA: AD1 and AD2 describe 1st and 2nd adaptations. For ROA: RE refers to re-enactment.

Events                   Situation
@2TC-AD1, @2TC-AD2       AOA
@3WC-AD1, @HC→OS-RE      ROA

Results
For anomaly identification, a total of 20 trials were used for testing (10 each for experiments AOA and ROA), with an average accuracy of 100% and 90.0% for AOA and ROA respectively. Precision matched accuracy and recall was perfect. For anomaly classification, a total of 19 trials were used for testing (10 and 9 for experiments AOA and ROA), with an average accuracy of 100% and 77.78% for AOA and ROA respectively. A confusion matrix was also computed and is shown in Fig. 26. TC and WC were the target classes and the resulting HC statistics were due to misclassification.
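For reference, the three identification metrics follow from the standard confusion counts. The sketch below uses hypothetical counts: 9 true positives and 1 false positive over 10 trials is one assignment that yields 90% accuracy, 90% precision, and perfect recall, but it is an illustration and not the recorded breakdown, which is contained in the paper's extensions. The names `Counts` and `identification_metrics` are introduced here for illustration.

```python
from typing import Dict, NamedTuple


class Counts(NamedTuple):
    tp: int  # anomalies correctly flagged
    fp: int  # flags raised without an anomaly
    fn: int  # anomalies that were missed
    tn: int  # nominal executions correctly left unflagged


def identification_metrics(c: Counts) -> Dict[str, float]:
    """Accuracy, precision, and recall from confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    if total == 0:
        raise ValueError("no trials")
    flagged = c.tp + c.fp
    actual = c.tp + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": c.tp / flagged if flagged else float("nan"),
        "recall": c.tp / actual if actual else float("nan"),
    }


# Hypothetical counts chosen only to illustrate the formulas.
example = Counts(tp=9, fp=1, fn=0, tn=0)
metrics = identification_metrics(example)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {100 * value:.1f}%")
```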
As for success rates, each of the two situations under both classification modalities is shown in Table 8. For Adaptations-over-Adaptations, the system success rates were 80.0% and 90.0% under modalities (i) and (ii) respectively. For AOA with perfect and imperfect classification, a total of three failures occurred as follows: during the 2nd adaptation attempt to grasp the block, the approach pose was inaccurate. Normally, our fingers open when a pre-pick motion has terminated. The approach trajectory had some imprecision and led to the fingers making contact with the block and tipping it (instead of sliding along the block to reach an optimal pick pose). After the tip, the block was displaced beyond the field of view of the camera. At this point the system correctly triggered an NO flag; however, on re-enactment the pose of the object was unavailable, thus holding up the execution of the re-enactment. This could be prevented by a better implementation of the manipulation skills taught to pick the object. In retrospect, we never envisioned that training the pick in this way would be problematic. It is not clear whether end-to-end training would avoid similar problems from the outset. Clearly, the adaptations could be re-trained or improved to address the issue under any manipulation scheme. The question remains which approach would be more robust to previously unseen situations.
For Re-enactments-over-Adaptations, the system success rates were 100% and 70.0% under modalities (i) and (ii) respectively. The latter was caused by 1 false-negative in anomaly identification, 1 false-positive in node 3, and the same system limitation previously mentioned for AOA, which also occurred once here. If we look at the combined contribution of both situations for a given modality, we have 90.0% for perfect classification and 80.0% for modality 2.

Table 8. Success rate for anomalies and adaptations that occur during the original execution of a recovery policy. One object and two classification modalities are used to report performance metrics. Manipulation system errors are enclosed in parentheses as a percentage of failure contribution. Columns: Situation, Perfect, Imperfect.

Figure 25. The scene shows three objects: two smaller boxes towards the robot and one wider box away from the robot. Originally, a pick skill looks to grab the object delineated in red. However, a persistent anomaly occurs in Fig. 25(a) as one of the robot fingers collides with one of the adjacent smaller boxes. An adaptation is taught as described in Exp. 4a.1. That recovery behavior now faces a new persistent anomaly as seen in Fig. 25(b). A new wider box was also placed nearby and now causes a new collision with the fingers. Figs. 25(c) and 25(d) show the implementation of a newly taught adaptation. As part of the whole process, the system is able to learn a new model of the adaptation as a nominal skill, and deviations from its norm can be flagged as anomalous. The framework enables endless extensions to the graph. Given that this is a persistent anomaly, a new node is introduced to the graph.

Experiment 7: Anomaly Classification Reactivity
In this experiment, we analyze whether anomaly classification accuracy varies as a function of the time window we use to capture multi-modal signal observations before and after the anomaly identification flag has been issued. We wish to learn the upper limits of the algorithm's reactivity; that is, how quickly we can classify without sacrificing important levels of accuracy. As originally stated in Sec. 2.2, we use a standard window of ± 2 seconds to capture multi-modal signal observations before and after an anomaly has been identified. Fig. 27 shows a contour map of anomaly classification accuracy as a function of pre- and post-anomaly identification time duration. The figure contains accuracy regions in groupings of 5 percentage points, where the lower left corner indicates the shortest time windows, whilst the top right corner indicates the longest time windows. The anomaly classification data in this experiment was set up in the same way as in Exp. 2. The final anomaly classification accuracy is computed as the average of the true-positive confusion matrix rates. Finally, note that reactivity measurements for anomaly identification were originally presented in (Luo, Wu, Lin, Duan, Guan and Rojas 2018), which concluded that we could identify anomalies while consuming on average 1.84% of the duration of skills.
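The two operations described above, cutting a window around the identification flag and averaging the true-positive rates of a confusion matrix, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the experiments: signals are assumed to be a uniformly sampled sequence of feature vectors, the confusion matrix is assumed to have true classes as rows and predicted classes as columns, and the names `extract_window` and `mean_true_positive_rate` as well as the 100 Hz rate are hypothetical.

```python
from typing import List, Sequence


def extract_window(samples: Sequence[Sequence[float]], flag_index: int,
                   rate_hz: float, pre_s: float = 2.0,
                   post_s: float = 2.0) -> List[Sequence[float]]:
    """Return samples from pre_s seconds before to post_s seconds after the
    anomaly identification flag, clipped to the recording bounds."""
    start = max(0, flag_index - int(round(pre_s * rate_hz)))
    stop = min(len(samples), flag_index + int(round(post_s * rate_hz)) + 1)
    return list(samples[start:stop])


def mean_true_positive_rate(confusion: Sequence[Sequence[int]]) -> float:
    """Average of per-class true-positive rates (rows: true classes,
    columns: predicted classes)."""
    rates = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total:  # skip classes that have no test samples
            rates.append(row[i] / total)
    if not rates:
        raise ValueError("confusion matrix has no samples")
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Toy recording: 10 s of a 2-feature signal at an assumed 100 Hz.
    recording = [(0.0, 0.0)] * 1000
    window = extract_window(recording, flag_index=500, rate_hz=100.0)
    print(len(window))  # 401 samples: 2 s before, the flag sample, 2 s after

    toy_confusion = [[9, 1], [2, 8]]  # hypothetical two-class counts
    print(f"{mean_true_positive_rate(toy_confusion):.2f}")  # 0.85
```

Shrinking `pre_s` and `post_s` independently corresponds to moving along the two axes of the contour map.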

Results
According to Fig. 27, classification accuracy seems to be the highest (95% and above) in an approximate golden central radius, with another outer ring in gray holding the next accuracy grouping (90-95%). For the smallest window combination, the lower left corner, the classification accuracy falls in the 80-85% grouping. Recall from Exp. 2 that our overall anomaly classification accuracy for the standard ± 2 second window was 96.15%. The contour patterns seen in our experiment indicate that, in general, performance tends to be quite similar in most of the studied regions. Only the region from 0.5-1.0 seconds seems to register a symmetrical drop in performance across both axes from the 90-95% range to the 80-90% range. Such information indicates that the main structural signatures of anomalies require slightly more than one second, given our classification algorithm in this kind of task, to provide accuracies above 90%. Note that the Extension 1 video uses the standard time window capture of ± 2 seconds.

Summary
In this last section we summarize and analyze the performance of the recovery policies. Fig 28 shows  across 480 trials (across nodes, objects, and users). If we consider the average performance, we still obtain a very strong 92.02%. This reflects a result that we have commented on already; namely, that as a manipulation task experiences a larger number of recoveries, more variability enters the system, rendering further introspection and classification more challenging (we recovered 85% of the time in Exp. 4). Nonetheless, we still recovered nine out of ten times across users, objects, anomaly types, and nodes in the graph, hence showing very strong performance overall. When we consider classification modality (ii), we are considering the entire system and the effects of not only the recovery critic, but also those of anomaly identification and anomaly classification. These results speak to the effectiveness of a highly integrated introspection and recovery system (along with the manipulation and visualization aspects of the framework). When we consider all counts across experiments, we recovered 88.33% of the time, and when we consider the averaged result, 82.38% of the time. Hence, the integration of the complete system diminishes the performance of the recovery system by slightly less than 10 percentage points. Again, within comments we emphasized that the loss in performance was mainly experienced in Exp. 4 and 5, where a large number of anomalies were induced. This will often not happen in practice. Exp. 4a might be a more likely event, where 95% recovery was achieved under imperfect conditions in our work. Exp. 5 contained our worst performance, with successful recoveries 75% of the time. This may not be a bad result after all. Recovering more than seven times out of 10 in unexpected scenarios is, in our estimation, not bad for current robotic performance in unstructured environments. Furthermore, in Sec. 7, we comment in detail on specific directions in which we can significantly improve and expect better results. All experimental data is contained in Extension 2, results analysis can be found in Extensions 3 and 4, and code in Extension 5. We expect the community to use the current work and results as future baselines and improve performance further.
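The two summary figures differ because pooling all trial counts weights each experiment by its number of trials, whereas averaging per-experiment rates weights every experiment equally. The sketch below illustrates the distinction with made-up counts; the experiment labels and numbers are hypothetical and are not the data behind Fig. 28.

```python
from typing import Dict, Tuple

# experiment label -> (successful recoveries, trials); hypothetical values
Results = Dict[str, Tuple[int, int]]


def pooled_rate(results: Results) -> float:
    """Success rate over all trials counted together."""
    successes = sum(s for s, _ in results.values())
    trials = sum(n for _, n in results.values())
    if trials == 0:
        raise ValueError("no trials")
    return successes / trials


def averaged_rate(results: Results) -> float:
    """Unweighted mean of the per-experiment success rates."""
    if not results:
        raise ValueError("no experiments")
    return sum(s / n for s, n in results.values()) / len(results)


toy: Results = {"exp_a": (95, 100), "exp_b": (15, 20)}

if __name__ == "__main__":
    print(f"pooled:   {100 * pooled_rate(toy):.2f}%")    # 91.67%
    print(f"averaged: {100 * averaged_rate(toy):.2f}%")  # 85.00%
```

An experiment with few trials and a low success rate pulls the averaged figure down more than the pooled one, which matches the direction of the gap reported above.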

Discussion
Our comprehensive experimental results showed that our tightly-integrated, graph-based online motion-generation, introspection, and incremental recovery system worked accurately and robustly for a wide range of anomalous situations in an unstructured co-bot scenario where a human and a robot collaborated to complete kitting tasks. To the best of the authors' knowledge, this is the first study where the recovery ability of a robot is examined in the presence of anomalies in manipulation in unstructured environments. In our study, we demonstrated that we could not only identify anomalies reliably (overall accuracy of 93.09%) but also classify them in an online fashion (overall accuracy of 96.15%), and that, given simple task-level recovery policies, we could also recover consistently and reliably most of the time. The tight integration achieved in this work enabled robots to continue functioning more than 82% of the time across all our anomaly scenarios, and 95% of the time in more typical scenarios like Exp. 4a. Even when anomalies occurred during recoveries themselves, we recovered 80% of the time. Hence, the combination of anomaly identification with global classification and simple but contextual task-level policies reliably showed broad robustness in being able to recover at all stages of the task, across all anomaly conditions, and across different users and objects, thus extending the autonomy of the system in significant ways. While the system has a number of weaknesses that we address shortly, this system, with its simple observation capabilities of the world, may serve robotic systems where sensors are limited but more robustness in unstructured environments is desired.
A couple of unexpected but welcome results are also discussed. First, the robustness results of the anomaly classification system and the recovery critic were somewhat unexpected. The sHDP-VAR-HMM model displayed a strong ability to generate good models that worked across different phases of the task and identified anomaly categories that contain important variations within. The limits of the model seemed to have shown up in Exp. 5 at node 3, when the most strenuous conditions were presented. Even there the classification system had an 86.7% accuracy. In our hand-engineered features, we attempted to abstract structure from the data instead of only keeping raw observations, such that if similar signal patterns occurred at dissimilar temporal positions during the observation window, they would still possess similar representations. Structure was abstracted by integrating the norm of each of the modalities in our feature set.
The second unexpected emergent result occurred when we presented results for classification modality (ii) and saw that the combined (AD/AC/REC) system at times had better performance than under modality (i) where we had perfect classification (see Exp. 3, node 3, in Fig. 15 and Exp. 4a.2). There we learned that many anomaly misclassifications did not result in unsuccessful task completions. We learned in fact that the system could self-heal. Even when a misclassification was originally present and an inappropriate recovery policy enacted, the system self-corrected at a later time step by correctly understanding its anomalous state and later triggering the correct recovery policy.
We believe this work has broad applicability. Its graph-based structure, with internal modules for motion generation and introspection and a supervisory recovery critic, allows the system to leverage any class of motion generation algorithms including attractor-based, probabilistic, and deep end-to-end approaches (better introspection techniques can be leveraged as well). The bottom line is that even as motion generation techniques become increasingly robust to disturbances (Levine, Finn, Darrell and Abbeel 2016a; Levine, Pastor, Krizhevsky and Quillen 2016b; Haarnoja, Pong, Zhou, Dalal, Abbeel and Levine 2018), failure is still a frequent occurrence when uncertainty in the environment surpasses the modeling ability of the system. Thus, our framework can enhance the long-term autonomy and robustness of systems that use various motion-generation approaches.
Additionally, the deep system integration presented in the paper allowed for a comprehensive study of the dynamics between an introspection system and an accompanying recovery-critic. We believe this is the first study of its kind, where an explicit and detailed study of the anomaly-recovery relationship is presented. We have open-sourced the code, dataset, and result analysis (see Extensions 5, 2, and 3/4 respectively) to promote and facilitate further examination of the topic. We hope others can build on our work and use the current results to further improve performance. There is still much improvement ahead and we attempt to discuss some of the main issues next.

Limitations, Comparisons, and Future Work
An important limitation in our work is the fact that the kitting experiment was not conducted under real factory conditions. Thus the verifiability of the work in real-world applications is unclear and further testing in real-factory conditions is necessary. The kitting experiment provides a proof-of-concept and the authors would like to extend their work to actual scenarios through corporate partners. With regards to motion generation, we see the need for the adaptation of motion generation skills when objects are varied. While adaptations often transferred to other objects, Exp. 4c taught us that when the shape properties of an object differ significantly from the object shape that was used to train motion skills, the system is susceptible to anomalies such as collisions due to the lack of adaptation (end-to-end trained motion generation might resolve this as it uses visual input to drive its behavior). Such adaptation is natural in humans to achieve safety (Babič, Oztop and Kawato 2016); in robotics, attractor dynamics have also been used to avoid collisions (Haddadin, Urbanek, Parusel, Burschka, Rossmann, Albu-Schaffer and H 2010), although such dynamics have not explicitly considered object morphology in their computation. This is left as possible future work.
Another aspect related to motion generation would be using latent state data from the anomalies to produce low-level feedback signals that could provide more immediate reactivity. Transferring high-level knowledge to useful low-level feedback remains an open challenge. More interesting still would be the ability to recognize not just an anomaly but the onset of an anomaly, and to trigger feedback that prevents the anomaly rather than recovering from it.
With regards to anomaly identification, the work of Park et al. (Park, Kim and Kemp 2018; Park, Kim, Hoshi, Erickson, Kapusta and Kemp 2017; Park, Erickson, Bhattacharjee and Kemp 2016) is the most closely related to our work. For Park et al., there are a couple of comparison points to be made. The first point relates to the way anomaly data is compartmentalized. Their system applied HMMs to identify anomalies for ensembles of either a specific robot skill with a specific object, or a specific robot skill with a specific person. Such specificity makes it easier to identify anomalies but it also increases the number of classes to be trained. Evidently, models that can accurately discriminate across broader datasets (such as those trained with a multiplicity of objects or users) are desirable. In our work, our anomaly identification (and classification) was trained to identify anomalies across different task nodes, different objects, and different users (where relevant). Thus, a broader training domain was considered in our work. It is difficult to perform a direct comparison with Park et al.'s work given that the task, robot system, and environment are different. Only a broad comparison is possible. In their work, they obtained an average anomaly identification accuracy across 5 tasks of 86.87%. In our work the anomaly identification accuracy across nodes (also for 5 tasks) was 93.09% (see Fig. 9 in Exp. 1).
With regards to anomaly classification, our system seems to outperform the state of the art. The work of Park et al. (Park, Kim, Hoshi, Erickson, Kapusta and Kemp 2017) and the work of Di Lello et al. (Di Lello, Klotzbucher, De Laet and Bruyninckx 2013) most closely resemble our work. In Park et al.'s work, their multi-perceptron classifier classified 12 common anomalies with 90% accuracy. Furthermore, the paper also includes experiments where the robot feeds a real person with quadriplegia. In this work, they conducted anomaly identification and classification (they also classified the cause of the anomaly) and had 86% and 90% accuracy respectively, resulting in a combined 88% effectiveness for the system. So with regards to anomaly classification, we still outperformed their accuracy; nonetheless, the number of cases they considered was larger (12 instead of 5). With regards to the combined system, our (AD/AC) overall performance was 94.62%, about 6 percentage points higher than theirs, but again for a smaller number of anomaly cases. In Di Lello et al.'s work, they use a simple non-parametric Bayesian model, namely the sHDP-HMM with Gaussian observations and Gibbs sampling, to classify anomalies. In their work, they achieved an average classification accuracy of 87.5% over four anomaly classes in an alignment skill with 4 obstructing objects. Our performance was approximately 7 to 9 percentage points higher: 96.15% across nodes (Fig. 10 in Exp. 2) and 94.4% for the confusion matrix average in Exp. 2 (Fig. 11). Again, comparisons are difficult. Their experimentation consisted of single anomaly scenarios that did not change over time. Our scenarios included a wide range of anomalies, from one to multiple, occurring at different phases of the task with different objects and users. Our anomaly experimentation was therefore considerably more complex.
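The combined figures quoted in this comparison are consistent with a simple mean of the identification and classification accuracies. The short sketch below reproduces the arithmetic under that assumption (the averaging rule is inferred from the reported numbers, not stated as a formula in the text); `combined` is an illustrative helper name.

```python
def combined(identification: float, classification: float) -> float:
    """Mean of identification and classification accuracy, in percent."""
    return (identification + classification) / 2.0


ours = combined(93.09, 96.15)   # our identification and classification
park = combined(86.0, 90.0)     # figures reported for Park et al.
gap = ours - park               # difference in percentage points

# Classification-only margins over the 87.5% reported for Di Lello et al.
margin_across_nodes = 96.15 - 87.5
margin_confusion_avg = 94.4 - 87.5

if __name__ == "__main__":
    print(f"ours {ours:.2f}%, Park et al. {park:.2f}%, gap {gap:.2f} points")
    print(f"margins {margin_confusion_avg:.2f} and "
          f"{margin_across_nodes:.2f} points")
```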
With regards to reactivity, Di Lello et al. (Di Lello, Klotzbucher, De Laet and Bruyninckx 2013) only present a simple statement declaring that their system would have degraded anomaly classification performance if the decision had to be made before 0.65 seconds. Park et al. studied how fast and how well they could classify one of four anomalous signals when they changed the signal amplitude. More specifically, they measured detection delay in seconds along with the true-positive rate as a function of detection magnitude. They found that signals with small amplitudes (less than 10%) could take as long as three seconds to identify, with low true-positive rates of less than 20%. Signals which maintained the original amplitudes were identified in around 1 second with about 80% accuracy. In our case, Fig. 27 revealed that for a window of ± 2 seconds, our anomaly classification accuracy (for five classes) was 96.15%. If the window after an anomaly is triggered is brought to 1 second, our classification accuracy is slightly above 90%.
The comparison with Park et al.'s work in (Park, Kim and Kemp 2018) may indicate that for anomaly experiments where simulated data is not yet reliable, and where real-robot (or cobot) experiments are conducted and produce a limited number of trials, non-parametric Bayesian models with specialized variational inference algorithms are very competitive in performing anomaly identification and classification, with very good reaction rates.
We would like to note the time and human cost that it took to gather the anomaly classification data in unstructured environments for this task. The process was arduous as manual induction was required to test anomalies. Labeling the anomalies was also problematic as the anomalies took place in a laboratory setting and may not be reflective of a true factory-floor or warehouse scenario. Automating the anomaly label collection process through simulation or a farm of robots (as in (Levine, Pastor, Krizhevsky and Quillen 2016b)) is possible, though the algorithm by which anomaly induction takes place should be examined to understand whether it approximates real-life conditions. Another interesting possibility is the use of synthetically generated anomaly data. Synthetically generated data is becoming more commonplace (Radovanov and Marcikić 2014; Forestier, Petitjean, Dau, Webb and Keogh 2017; Vinod, López-de Lacalle et al. 2009; Le Guennec, Malinowski and Tavenard 2016); examples include synthetic voices, images, or depth representations. However, when it comes to anomaly data, the use of synthetic data seems more challenging as the structure of anomalous data can have important variations, as discussed in this paper. It would be interesting to investigate the minimal amount of nominal data needed from which synthetic data could be generated with sufficient accuracy to properly introspect anomalies. If feasible, it would enable the learning of anomalies in an incremental fashion, similar to the way biological systems can learn from one mistake and apply the knowledge to a new scenario. Incremental learning helps classification especially when we can control neither the frequency nor the type of occurrence. It would also be desirable to continually update our models with new experiences. Moreover, one could consider leveraging learning across similar robots that might independently face unique situations in different environments. Transfer learning of this sort has been an area of growing interest recently (Devin, Gupta, Darrell, Abbeel and Levine 2017). Incremental learning would also open questions about how to discriminate the right level of granularity for anomaly classification labels in such systems (are all collisions with a human the same type of collision? Should such collisions be subdivided? How to determine that?) and how such discrimination would compare to that of a human. Anomaly clustering has a direct impact on recovery policies as different anomaly classifications pair up with unique recoveries. We also foresee that as the ability to generate more faithful synthetic data becomes available, deep networks may play more significant roles in anomaly identification and classification. One more future line of research in anomaly classification that stems from this work is the ability to simultaneously identify multiple anomalies. Oftentimes in our experimentation human collisions resulted in object slips; this raised the possibility of having two co-existing anomalies. Based on our current anomaly discrimination approach, we select the class whose likelihood is maximal. We lack an underlying structure that understands that two anomalies are either happening simultaneously or are chained to each other back-to-back. We wish to explore this as a future line of work.
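The maximum-likelihood selection rule mentioned above can be sketched as follows. This is an illustration under stated assumptions: the per-class log-likelihood values are made up, the function name `classify_by_max_likelihood` is hypothetical, and the returned margin over the runner-up is an extra diagnostic added here (it is not part of the method described in this work) that hints at when two anomaly classes explain the same window almost equally well.

```python
from typing import Dict, Tuple


def classify_by_max_likelihood(
        log_likelihoods: Dict[str, float]) -> Tuple[str, float]:
    """Return the class with the highest log-likelihood and its margin
    over the second-best class (infinity if there is only one class)."""
    if not log_likelihoods:
        raise ValueError("no class scores supplied")
    ranked = sorted(log_likelihoods.items(),
                    key=lambda item: item[1], reverse=True)
    best_label, best_score = ranked[0]
    margin = best_score - ranked[1][1] if len(ranked) > 1 else float("inf")
    return best_label, margin


if __name__ == "__main__":
    # Hypothetical log-likelihoods of one observation window under
    # per-class anomaly models (HC, OS, TC).
    scores = {"HC": -120.5, "OS": -123.0, "TC": -310.2}
    label, margin = classify_by_max_likelihood(scores)
    print(label, margin)  # HC 2.5
```

A small margin between HC and OS, as in the toy scores, is the kind of situation in which a single argmax cannot express that a collision and a slip may be chained.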
With regards to re-enactment policies on a task-planning level, the multinomial distribution is admittedly simplistic. It is an indirect way of capturing decision policies. Furthermore, while we try to reduce re-teaching by having adaptation nodes inherit re-enactment policies from their parent node, there are times when anomalies will occur for the first time in later nodes for which no policy exists. This requires user intervention to train the system, as happened in Exp. 5 for imperfect classification, where the system halted because no re-enactment policy existed for the NO class in a particular node. We are interested in looking for automated policy learning solutions that evolve over time.
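The lookup behavior described above (inherit from the parent node, halt when nothing is found) can be sketched as follows. This is a simplified illustration and not the released implementation: it assumes the multinomial is estimated from counts of which node a demonstrator chose to re-enact for a given (node, anomaly) pair, and the names `find_policy`, `choose_reenactment`, and the node labels are hypothetical.

```python
import random
from typing import Dict, Optional, Tuple

# (node, anomaly class) -> counts of the node chosen for re-enactment
PolicyTable = Dict[Tuple[str, str], Dict[str, int]]


def find_policy(table: PolicyTable, parents: Dict[str, str],
                node: str, anomaly: str) -> Optional[Dict[str, int]]:
    """Look for a policy at this node, then walk up the parent links."""
    current: Optional[str] = node
    seen = set()
    while current is not None and current not in seen:
        seen.add(current)  # guards against accidental cycles
        counts = table.get((current, anomaly))
        if counts:
            return counts
        current = parents.get(current)
    return None


def choose_reenactment(table: PolicyTable, parents: Dict[str, str],
                       node: str, anomaly: str,
                       rng: random.Random) -> Optional[str]:
    """Sample a target node from the multinomial implied by the counts.
    Returns None when no policy exists, i.e. the system must halt and
    request a demonstration."""
    counts = find_policy(table, parents, node, anomaly)
    if counts is None:
        return None
    targets = list(counts)
    weights = [counts[t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


if __name__ == "__main__":
    table: PolicyTable = {("node2", "NO"): {"node1": 3}}
    parents = {"adapt2": "node2"}  # adaptation node inherits from node2
    rng = random.Random(0)
    print(choose_reenactment(table, parents, "adapt2", "NO", rng))  # node1
    print(choose_reenactment(table, parents, "node3", "NO", rng))   # None
```

The `None` branch corresponds to the halt observed in Exp. 5, where no re-enactment policy had been trained for the NO class at the node in question.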
With regards to adaption policies, we do not yet model the spatial relations amongst the actors of interest; namely, the robot (end-effector), active objects (like objects to be gripped and the packaging box), and the world (support surfaces like tables and floor). These relationships provide important context for decision making and are recently attracting more attention (Philipp Jund and Burgard 2018; Adjali and Ramdane-Cherif 2018; Aly and Taniguchi 2018;Gong and Zhang 2018). Without spatial relation understanding, the solutions learned in Exp. 6 will not extend to situations where the spatial relations are different from those in training. Not all experiments would fail without spatial relations context however. The HC, OS, and NO anomalies do not seem to explicitly depend on spatial context and may likely be resolved as-is in new situations. In effect, despite the lack of explicit spatial relationship modeling, our recovery policies often overcame external disturbances that might have otherwise terminated the task and endowed the system with longer operational horizons. By learning context relations, adaptations would do more than replay a learned behavior, they would in fact restore the complete and original state of the system before the anomalous condition. The larger overall challenge remains in learning how to integrate real-time reasoning and apply it to a learned skill, how to explicitly consider the spatial and functional relations between objects, the robot, and the world. It is possible that by theoretically grouping anomaly-recovery pairs into groups that do need functional-spatial reasoning and groups that do not (Koppula, Gupta and Saxena 2013;Koppula and Saxena 2016). In (Paulius, Huang, Milton, Buchanan, Sam and Sun 2016;Jelodar, Salekin and Sun 2018), for example FOON graphs and object affordances are introduced and might be particularly useful to resolve spatial and reasoning problems. 
Resolving this issue is left for future work. Nonetheless, the work as-is, with its limitations, might be useful in extending the autonomy of robots with limited sensor and/or computational capabilities.
A final comment concerns the application of our work to multi-task scenarios and human-robot interaction (HRI). To further extend long-term autonomy horizons, this work should be tested not just in isolated single tasks but also in longer-term multi-task scenarios that can further test the effectiveness of the proposed approaches. Additionally, it would be interesting to consider more complex graph topologies in HRI, such as a dual-graph framework that synchronizes both human and robot activity and enables mutual introspection and recovery under explicit collaboration. We plan to extend our work to include handover tasks from humans to the robot instead of placing objects directly in the collection bin.

Conclusion
This work presented a tightly-integrated, graph-based online motion-generation, introspection, and incremental recovery system for manipulation tasks in loosely structured cobot scenarios. Failures are, and will continue to be, a reality in robotics despite increasingly powerful motion-generation algorithms. Dealing with them explicitly has been the focus of this work. Recovery and introspection performed robustly. Importantly, however, we learned that recovery becomes more difficult as the number of adaptations grows, because variations in sensory-motor signals increase as more recoveries are attempted. The system also showed signs of self-repair: on occasion, after an anomaly misclassification and the enactment of an improper recovery policy, the system would correct its introspection and emit a successful recovery policy that completed the task. Ultimately, the system presented in this work significantly extended the autonomy and resilience of the robot and has broad applicability to manipulation domains that suffer from uncertainties in unstructured environments, making industrial and service robots prime candidates for this technology.

B.2 Recording methodology
All sensory-motor signals exist as ROS topics in our system and, as such, are recorded offline as ROS bags. When an anomaly is identified, we signal the event by sending a timestamped ROS message to a pre-defined topic that is also recorded in the rosbag. Anomaly classification labels are recorded in a text file on a line-by-line basis.
Mapping from data modalities to ROS topics is as follows: •

B.3 Data Organization
The dataset is composed of folders named with the format "experiment_at_[time]". Each folder represents one test trial of the kitting experiment. Within a given folder, there is a rosbag, "record.bag", and a text file, "anomaly_labels.txt". These contain, respectively, the rosbag topics mentioned in Sec. B.2 and the recorded labels for the given experiment.
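As an illustration of the layout described above, the following minimal Python sketch collects the trial folders that contain both expected files. The function name `list_trials` and the choice to skip incomplete folders are our assumptions for illustration and are not part of the released dataset tooling.

```python
from pathlib import Path
from typing import List


def list_trials(dataset_root: str) -> List[Path]:
    """Return trial folders that hold both expected files, sorted by name.

    Hypothetical helper: only the folder prefix "experiment_at_" and the two
    file names come from the dataset description.
    """
    trials = []
    for folder in sorted(Path(dataset_root).glob("experiment_at_*")):
        has_bag = (folder / "record.bag").is_file()
        has_labels = (folder / "anomaly_labels.txt").is_file()
        # Skip anything that is not a complete trial folder.
        if folder.is_dir() and has_bag and has_labels:
            trials.append(folder)
    return trials
```

Each returned path can then be used to open "record.bag" and "anomaly_labels.txt" for that trial.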

B.4 Anomaly Data Extraction
To extract anomaly data, one should first look at the topic "/anomaly_detection_signal", whose messages are effectively timestamps indicating when anomalies were identified. Note that a burst of anomaly timestamps may have been published to this topic for a single anomaly, so timestamps that closely follow another timestamp should be ignored. We recommend ignoring a timestamp if it falls less than 1 second after its predecessor. After the anomaly timestamps are extracted, the labels in the accompanying "anomaly_labels.txt" can be paired accordingly.
We have tried to clear the dataset of any corrupted trials. However, if the number of anomaly timestamps does not equal the number of labels, that experiment should be discarded.
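The extraction procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: timestamps are given as seconds in a plain list (reading them from the rosbag is left out), the "predecessor" is taken to be the immediately preceding raw timestamp, and labels are paired with the de-duplicated timestamps in file order. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def debounce_timestamps(stamps: Sequence[float], min_gap: float = 1.0) -> List[float]:
    """Keep a timestamp only if it is at least min_gap seconds after its predecessor."""
    kept: List[float] = []
    previous: Optional[float] = None
    for t in sorted(stamps):
        if previous is None or t - previous >= min_gap:
            kept.append(t)
        # Assumption: the predecessor is the preceding raw timestamp, kept or not,
        # so a whole burst collapses to its first element.
        previous = t
    return kept


def read_labels(path: str) -> List[str]:
    """Read one label per non-empty line of anomaly_labels.txt."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def pair_anomalies(
    stamps: Sequence[float], labels: Sequence[str], min_gap: float = 1.0
) -> Optional[List[Tuple[float, str]]]:
    """Pair de-duplicated timestamps with labels; return None if the trial should be discarded."""
    events = debounce_timestamps(stamps, min_gap)
    if len(events) != len(labels):
        return None  # count mismatch: discard this experiment
    return list(zip(events, labels))
```

In a ROS environment, the timestamps could be obtained by iterating over `rosbag.Bag("record.bag").read_messages(topics=["/anomaly_detection_signal"])` and converting each message time to seconds; we have not verified the exact message layout of this topic, so that step is left to the reader.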

VAR
e_t: Additive white noise at time t and mode z_t
Σ: White noise covariance matrix for mode z_t
A: Time-invariant regression matrix at z_t
∆, ν: Covariance ∆ and degrees of freedom ν in IW
IW(ν, ∆): Inverse Wishart
K: Covariance across matrix columns

Anomaly Identification
∇L: Natural log of HMM filtered belief state