1 Introduction

Service robots are increasingly being deployed experimentally to carry out everyday tasks in coordination with humans in settings such as healthcare and domestic assistance. Unlike industrial settings, these contexts place few constraints on human actions. As a result, human behavior is substantially unconstrained and constitutes a critical source of uncertainty. Existing software engineering techniques privilege efficiency-related factors (e.g., the time it takes the robot to complete a task) and, according to practitioners, are not mature enough to handle this degree of variability [9]. On the other hand, decisions made at design time account for a large share of software lifecycle costs [8], and their validity is called into question if unaccounted-for contingencies emerge at runtime.

Fig. 1
Model-driven framework’s workflow, comprising four macro-phases: design-time analysis, reconfiguration, deployment, and model adjustment. Macro-phases are shown as colored areas. Blocks represent artifacts, and dashed arrows represent manual tasks

This chapter addresses this methodological gap by proposing a model-driven framework for developing service robotic applications where human-robot interaction is a crucial element [14]. Figure 1 gives an overview of the proposed methodology. The framework is devised for practitioners who do not necessarily have prior training in software development; therefore, it supports them throughout all phases, from design to testing and maintenance, while keeping the required manual effort to a minimum. The proposed methodology relies, at its core, on formal modeling and verification techniques to develop robotic scenarios with guarantees of robustness to the mentioned sources of uncertainty. In doing so, the framework also calls into question the claim that human behavior modeling falls beyond the limits of formal modeling techniques [17].

Existing works explore the possibility of formalizing human-robot interaction and exploiting the guarantees of formal analysis within the software development process. Previous attempts mostly focus on ensuring that collaborative applications meet safety standards [22] or comply with social norms [19]. The literature also features formalizations of human behavior, for example, by modeling the system as a network of Timed Game Automata [4] or adopting a probabilistic approach, as in the hereby presented work, while focusing on smaller setups [23].

In this work, the robotic scenario is first analyzed offline (macro-phase 1 in Fig. 1) through a custom textual Domain-Specific Language (DSL) [15]. The DSL file is then automatically converted into a formal model capturing the agents involved in the scenario, i.e., the robots and the humans they interact with. As for the latter, in line with the goal of tackling uncertainty, the formal model captures aspects rarely addressed by traditional approaches: human physiology and the human decision-making process. The significance of physiological aspects primarily relates to the healthcare setting, since subjects may be in pain or discomfort, impacting their ability to carry out tasks in coordination with the robots. Incorporating a formalization of the human decision-making process, specifically as a stochastic process, into the model is necessary for the formal analysis to account for its impact. These modeling requirements motivate the choice of Stochastic Hybrid Automata (SHA) as the selected formalism [7]. Given the stochastic nature of the model, the framework then applies Statistical Model Checking (SMC) [1] to estimate quality metrics, expressed as Metric Interval Temporal Logic (MITL) properties, about the robotic application to be examined by the practitioner.

If the SMC results are not satisfactory, the design must be revised by applying reconfiguration measures (macro-phase 2 in Fig. 1). Otherwise, if such metrics satisfy the practitioner’s expectations, the so-obtained design is either deployed in the field or simulated for further investigation (macro-phase 3 in Fig. 1). To this end, the framework introduces a deployment approach with a model-to-code mapping principle to guarantee correspondence between the formal model and the software components deployed at runtime [13]. The collected data (i.e., either actual sensor logs or simulation logs) are then exploited to learn an updated model of human behavior (macro-phase 4 in Fig. 1) [14]. Specifically, a novel active automata learning algorithm targeting SHA, called \(\textsf{L}^*_\textrm{SHA}\), is fed with traces (i.e., event sequences) mined from field data. The learned SHA is plugged back into the formal model either to revise the design of the same scenario or in preparation for the development of future ones.

Previous works present learning algorithms for Hybrid (HA) or Probabilistic Automata. Medhat et al. present a framework for HA mining based on clustering [18]. Works focusing on probabilistic systems adopt a frequentist approach [10] or a state merging method [5]. Tappler et al. also propose an extension of \(\textsf{L}^*\) to learn Markov Decision Processes based on collected samples [21]. The \(\textsf{L}^*_\textrm{SHA}\) algorithm contributes to the area as it targets both hybrid and stochastic features.

All the phases of the framework have been experimentally validated on scenarios inspired by the healthcare setting. Experiments aimed at assessing the accuracy (thus, the reliability of the results) of the different artifacts, the flexibility of the framework with respect to realistic service robotic applications, and its capability to mitigate the sources of uncertainties at play. The key results of experimental validation are reported in this chapter, whereas we refer the interested reader to dedicated publications for a detailed report.

The rest of the chapter is structured as follows. Section 2 outlines the main theoretical concepts underlying the work. Each macro-phase is then illustrated in detail, specifically: Sect. 3 describes the design-time analysis phase; Sect. 4 describes the deployment framework; and Sect. 5 describes the model adjustment phase. Finally, Sect. 6 presents future research directions.

Fig. 2
Example SHA network, consisting of the SHA modeling human behavior (a) and the orchestrator SHA (b): invariants and flow conditions are in purple, channels in red, probability weights in orange, guards in green, and updates in blue

2 Preliminaries

As per Sect. 1, the chosen formalism is SHA, an extension of Timed Automata with hybrid and stochastic features. SHA locations, which belong to set L, capture the different operational states of the system under analysis. Figure 2a shows an example of SHA modeling human behavior with locations \(L=\{ h _\textrm{idle}, h_\textrm{busy}\}\), representing the human standing still and walking, respectively.

In Hybrid Automata (HA), set W contains real-valued variables whose time dynamics are constrained through sets of generic ODEs, called flow conditions [2], modeling complex physical behaviors. Given location \({l\in L}\), function \(\mathcal {F}(l)\) assigns flow conditions to l constraining the behavior of real-valued variables in W while in such location. As an example, real-valued variable \({F\in W}\) in Fig. 2a captures the human’s physical fatigue, whose time derivative \(\dot{F}\) is constrained by functions \(\mathrm {f_{rec}}(t, k)\) and \(\mathrm {f_{ftg}}(t, k)\). In Stochastic HA, given flow condition \({f(t, k)\in \mathcal {F}(l)}\), where t represents time and k is an independent, randomly distributed parameter, \(f(t, k)\) acts as a stochastic process and its domain is \({\mathbb {R}_{+} \times \mathbb {R}}\). For each location \({l\in L}\), function \(\mathcal {D}(l)\) assigns a probability distribution to l governing random parameter k (e.g., \(\mathcal {D}(h_\textrm{idle})\) and \(\mathcal {D}(h_\textrm{busy})\) in Fig. 2a).
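To make flow conditions and their random parameter concrete, the following Python sketch simulates fatigue dynamics of the kind described above. The exponential forms of \(\mathrm {f_{ftg}}\) and \(\mathrm {f_{rec}}\), the Gaussian distribution for k, and all rates are illustrative assumptions, not the chapter’s actual equations.

```python
import math
import random

def f_ftg(t, k, F0=0.0):
    """Fatigue accumulation while in h_busy: F grows toward 1 at rate k.
    The exponential law is an illustrative assumption."""
    return 1.0 - (1.0 - F0) * math.exp(-k * t)

def f_rec(t, k, F0=1.0):
    """Fatigue recovery while in h_idle: F decays toward 0 at rate k."""
    return F0 * math.exp(-k * t)

def sample_rate(mu, sigma):
    """Stand-in for D(l): draw the random rate parameter k for a location."""
    return max(1e-6, random.gauss(mu, sigma))

k_busy = sample_rate(0.05, 0.01)    # drawn upon entering h_busy
F_after_walk = f_ftg(10.0, k_busy)  # fatigue after 10 s of walking
```

Each entry into a location redraws k, so repeated runs of the same trajectory yield different fatigue profiles, which is precisely why statistical (rather than exhaustive) verification is applied later.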

SHA edges capture transitions between two locations and are labeled with the event triggering the transition and, possibly, a guard condition and an update, expressed in terms of variables in W. Guard conditions (e.g.,  \(x\ge \mathsf {T_1}\) in Fig. 2b) enable the firing of the edge when, given the current value of variables in W, they are verified. Updates are sets of assignments to variables in W that are executed when the edge fires. In SHA, assignments may entail the extraction of a sample from a probability distribution. For example, when the SHA in Fig. 2 switches from \(h_\textrm{busy}\) to \(h_\textrm{idle}\), update \(\xi _\textrm{idle}\) assigns a sample from \(\mathcal {D}(h_\textrm{idle})\) to k.

Given channel \(\texttt{c}\), an edge can be labeled either with \(\texttt{c}!\) if the SHA actively triggers an event through \(\texttt{c}\) or with \(\texttt{c}?\) if the SHA listens for events on \(\texttt{c}\). Multiple SHA in a network (e.g., the human in Fig. 2a and the orchestrator in Fig. 2b) synchronize through channels when complementary edges fire simultaneously. For example, the SHA in Fig. 2a (i.e., the human) may switch from \(h_\textrm{idle}\) to \(h_\textrm{busy}\) when the orchestrator fires an event through channel \(\texttt{start}\). In SHA, edges may also be associated with probability weights (i.e., dashed arrows in Fig. 2a) determining the bias of the network toward a certain transition: for example, when event \(\texttt{start}!\) fires, the SHA in Fig. 2a switches to \(h_\textrm{busy}\) with probability \({p=w_1/(w_1+w_2)}\) and stays in \(h_\textrm{idle}\) with probability \({1-p}\).
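The weight-based choice among probabilistic edges can be sketched as follows; the weight values are hypothetical and serve only to illustrate the \(w_i/\sum _j w_j\) rule.

```python
import random

def fire_probabilistic_edge(weights):
    """Pick a target location among weighted edges: edge i fires
    with probability w_i / sum of all weights."""
    total = sum(weights.values())
    r = random.uniform(0.0, total)
    acc = 0.0
    for target, w in weights.items():
        acc += w
        if r <= acc:
            return target
    return target  # guard against floating-point edge cases

# On `start!`, the human moves to h_busy with p = w1 / (w1 + w2)
edges = {"h_busy": 3.0, "h_idle": 1.0}   # illustrative weights w1, w2
next_loc = fire_probabilistic_edge(edges)
```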

A location l of an SHA can be endowed with an invariant, i.e., a condition over variables in W that must hold as long as the SHA is in l. In Fig. 2b, the combination of invariants and guards on outgoing edges ensures that edges fire exactly when \({x=\mathsf {T_{ i }}, i\in \{1, 2\}}\) holds.

SHA are eligible for SMC, which can be performed, for example, through the Uppaal tool [12]. SMC generates multiple runs of an SHA network M through the Monte-Carlo simulation technique, each simulating the evolution of the system for a given time \(\tau \in \mathbb {N}\). These runs are then individually examined to check whether a given MITL property \(\psi \) holds, thus constituting a set of Bernoulli trials. The value of expression \(\mathbb {P}_M(\psi )\) then corresponds to the confidence interval for the probability of property \(\psi \) holding for network M within time \(\tau \). By simulating the SHA network, it is also possible to calculate the expected maximum/minimum value of real-valued variables, such as the humans’ physical fatigue or the robot’s residual charge.
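The SMC procedure just described, i.e., treating each Monte-Carlo run as a Bernoulli trial and building a confidence interval, can be sketched in a few lines of Python. The exponential completion-time model, the time bound, and the normal-approximation interval are toy assumptions standing in for a real SHA network and for Uppaal’s statistical machinery.

```python
import math
import random

def smc_estimate(run_model, holds, n_runs, z=1.96):
    """SMC sketch: generate n_runs Monte-Carlo runs, check property psi on
    each (a Bernoulli trial), and return a normal-approximation confidence
    interval for the probability that psi holds."""
    successes = sum(1 for _ in range(n_runs) if holds(run_model()))
    p = successes / n_runs
    half = z * math.sqrt(p * (1 - p) / n_runs)
    return max(0.0, p - half), min(1.0, p + half)

# Toy stand-in for the network: a run "completes the mission" iff a random
# completion time falls within the bound tau.
random.seed(42)
tau = 100.0
run = lambda: random.expovariate(1 / 80.0)  # hypothetical completion time
psi = lambda t: t <= tau                    # time-bounded reachability check
lo, hi = smc_estimate(run, psi, 2000)
```

The pair (lo, hi) plays the role of \(\mathbb {P}_M(\psi )\): a confidence interval rather than an exact probability, which is the price paid for avoiding exhaustive state-space exploration.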

Fig. 3
Layout for the illustrative example (a T-shaped corridor with doors to four rooms and cupboards with medical kits KIT1 and KIT2) representing the agents (in their initial positions) and the POIs. HUM3’s initial position is randomized

3 Design-Time Analysis and Reconfiguration

The entry point to the model-driven framework is the analysis of the robotic scenario at design time (thus, offline). The goal of this phase is to specify the characteristics of the scenario and subsequently compute quality metrics through formal analysis.

The set of characteristics that can be expressed through the custom DSL constitutes the conceptual model underlying the framework, which is summarized in the following and exemplified through an illustrative use case from the healthcare setting (see Fig. 3). Firstly, the geometrical layout where agents will operate has to be defined (e.g.,  the T-shaped corridor in Fig. 3 with doors to four rooms). The layout can include points of interest (POIs) agents can interact with (e.g.,  cupboards with medical equipment KIT1 and KIT2 in Fig. 3). Agents are either robots or humans. Mobile robots can be of different commercial models, which determines their technical specifications. Humans have different physiological features (including their age group and health status) and behavioral traits (for example, their level of attentiveness). The example in Fig. 3 features four agents: one robot (ROB) and three humans (HUM1, HUM2, and HUM3).

Having defined the characteristics of the agents, it is necessary to configure the scenario. In this work, a scenario is a composition of robotic missions, where each mission is an ordered sequence of services. A service represents a task requiring coordination between the human and the robot with a target in space (i.e., a POI) and must conform to a pattern. Patterns group recurrent human-robot interaction contingencies, such as the human following the robot to a destination or the human and the robot competing for the same resource [15]. The mission for the example in Fig. 3 begins with HUM1 following the robot to the waiting room until the examination room is appropriately set up. While HUM1 is waiting, the robot and HUM3 compete for the same medical kit (KIT1 in Fig. 3): the mission then follows alternative plans depending on the outcome of the competition. If the robot retrieves the resource first, it delivers it to HUM2 in the office. Otherwise, HUM2 leads the robot to retrieve another medical kit (KIT2 in Fig. 3). Once HUM2 is ready for the visit, the robot leads HUM1 to the office and assists HUM2 in administering the medication.
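The notion of a mission as an ordered sequence of pattern-conforming services can be sketched as a plain data structure. The pattern names, field names, and POI identifiers below are hypothetical illustrations, not the DSL’s concrete syntax.

```python
from dataclasses import dataclass

@dataclass
class Service:
    """One mission step: a human-robot interaction pattern with a spatial
    target (a POI). Pattern names here are illustrative assumptions."""
    pattern: str       # e.g., "follower", "competitor", "leader", "delivery"
    human: str
    target_poi: str

# Sketch of the Fig. 3 mission's main branch
mission = [
    Service("follower",   "HUM1", "waiting_room"),
    Service("competitor", "HUM3", "KIT1"),
    # alternative plans depending on the competition's outcome:
    #   robot wins -> Service("delivery", "HUM2", "office")
    #   human wins -> Service("leader",   "HUM2", "KIT2")
    Service("follower",   "HUM1", "office"),
]
```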

Fig. 4
Design-time phase workflow: the DSL file is converted into an intermediate JSON file, from which the Uppaal model of the SHA network and the Uppaal query file are generated and then processed to produce the verification results. In the SHA network, blue arrows represent sensor readings shared by the agents with the orchestrator, while red arrows represent decisions made by the orchestrator and communicated to the agents

The so-obtained DSL file is then automatically converted into an intermediate JSON notation that decouples DSL parsing from the specific verification tool (see Fig. 4). At the current stage of development, the available component that generates the formal model targets SHA as the chosen formalism and Uppaal as the verification tool. The generated SHA network is schematically represented in Fig. 4. The network consists of \(\mathsf {N_h}\) SHA modeling human behavior, one for each subject (\(\mathcal {A}_{h_i}\) with \({i\in \{1, \ldots , \mathsf {N_h}\}}\)), and \(\mathsf {N_r}\) sets of SHA, one for each robotic system, made up of the SHA for the robotic platform, the battery, and the orchestrator (\(\mathcal {A}_{r_i}\), \(\mathcal {A}_{b_i}\), and \(\mathcal {A}_{o_i}\) with \({i\in \{1, \ldots , \mathsf {N_r}\}}\), respectively). The latter acts as the robot controller; specifically, agents periodically share their position within the layout and further data about their current status (i.e., the fatigue level for humans and the residual battery charge for robots). The orchestrator examines the latest batch of sensor readings and checks it against its policies to send commands to the agents (e.g., start or stop walking) aiming for the completion of all the services in the mission.
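The orchestrator’s sense-and-command cycle can be illustrated with a minimal Python sketch. The policy thresholds, command names, and reading fields are assumptions chosen for illustration, not the orchestrator’s actual policies.

```python
def orchestrate(readings, fatigue_max=0.8, battery_min=10.0):
    """Illustrative orchestrator policy: inspect the latest batch of sensor
    readings and issue a command per agent. Thresholds are assumptions."""
    commands = {}
    for agent, data in readings.items():
        if data.get("fatigue", 0.0) >= fatigue_max:
            commands[agent] = "stop"          # let the human rest
        elif data.get("battery", 100.0) <= battery_min:
            commands[agent] = "recharge"
        elif data["position"] != data["target"]:
            commands[agent] = "start"         # keep moving toward the POI
        else:
            commands[agent] = "idle"
    return commands

batch = {
    "HUM1": {"fatigue": 0.9, "position": (0, 0), "target": (5, 0)},
    "ROB":  {"battery": 42.0, "position": (1, 0), "target": (5, 0)},
}
cmds = orchestrate(batch)   # -> {'HUM1': 'stop', 'ROB': 'start'}
```

Tuning parameters such as `fatigue_max` is exactly the kind of policy re-tuning mentioned among the reconfiguration measures below: a less conservative threshold trades human effort for mission speed.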

Quality metrics to be computed for the scenario are referred to as queries. Possible queries include the probability of completing the mission within a specific time, the probability of critical events (e.g.,  the human getting fully fatigued or the robot fully discharged) occurring within a specific time, and the estimated value of relevant physical variables captured by the model.

SMC experiments are automatically launched, and their results are provided to the scenario designer. If such results satisfy the designer’s expectations, the application can move forward to the deployment macro-phase. Otherwise, as per Fig. 1, the design of the scenario can be revised to improve the desired indicators. Possible reconfiguration measures include selecting different robots from the fleet, changing the order in which services are provided (unless logical dependencies exist), and re-tuning the orchestrator’s policies (e.g.,  a less conservative orchestrator may lead to faster mission completion while requiring more effort on the humans’ side).

Realistic service robot scenarios from the literature or industrial use cases have been collected to assess the coverage of the design-time analysis phase (i.e.,  whether they would be analyzable through the framework or fall out of its scope) [15]. Results show that 24 out of 27 scenarios are analyzable through the framework, leading to a coverage rate of \(88\%\). The accuracy of the formal model (specifically, the SHA modeling the robotic system) has been assessed with respect to field-collected data on 6 scenarios, resulting in estimation errors up to \(6.7\%\) for the probability of success, \(0.61\%\) for the robot’s charge, and \(8.6\%\) for human fatigue.

4 Application Deployment

Once the design of the robotic scenario satisfies the set of requirements, the application can be deployed in the field or simulated in a virtual environment. At this stage, it is paramount to guarantee, to the extent possible, correspondence between the formal model and the behavior of the system at runtime. To this end, the deployment phase entails the mapping of SHA features to deployment units constituting the deployment framework [13]. Units consist either of simulator scripts governing the behavior of agents in the virtual scene or of low-level components controlling the robotic device and the sensors worn by human subjects. The application of the mapping principle to recurring modeling patterns results in recurring code patterns capturing, for example, the periodic refresh of sensor readings and their subsequent sharing with the orchestrator deployment unit.

The resulting deployment infrastructure features a deployment unit for each agent and a standalone unit for each orchestrator (one for each robot in the fleet). Orchestrators communicate with agents over a middleware layer based on ROS publisher/subscriber nodes [20]. Each sensor associated with an agent corresponds to a ROS publisher node that periodically shares the latest reading over dedicated topics, to which the orchestrator subscribes. Correspondingly, the orchestrator’s commands are transmitted to the agents over dedicated ROS topics. The ROS-based middleware layer decouples the orchestrators from the specific technology exploited for the agents’ deployment units, constituting a standard communication interface. As a result, the deployment framework flexibly supports physical agents, simulated agents, and hybrid settings (e.g., physical robots synchronizing with human subjects in the simulation scene).
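The publisher/subscriber topic pattern can be illustrated with a minimal in-process stand-in; a real deployment would use ROS nodes (e.g., via rclpy) rather than this toy broker, and the topic names below are hypothetical.

```python
from collections import defaultdict

class Broker:
    """Minimal in-process stand-in for the ROS publish/subscribe pattern;
    real deployments use ROS nodes and topics instead of this class."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, msg):
        for cb in self.subscribers[topic]:
            cb(msg)

broker = Broker()
latest = {}

# The orchestrator subscribes to each agent's sensor topic...
broker.subscribe("/hum1/fatigue", lambda m: latest.update(fatigue=m))
# ...and agents subscribe to the orchestrator's command topic.
broker.subscribe("/rob/cmd", lambda m: latest.update(cmd=m))

broker.publish("/hum1/fatigue", 0.42)   # periodic sensor reading
broker.publish("/rob/cmd", "start")     # orchestrator's decision
```

Because both physical and simulated agents speak only through topics, either side can be swapped without touching the orchestrator, which is what enables the hybrid setting mentioned above.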

The deployment framework has also been tested in terms of accuracy, specifically regarding the correspondence between formal analysis results and the behavior observed at runtime (thus, the accuracy of the model-to-code mapping principle) [13]. Results show a deviation between the physical variable values obtained through SMC and those observed in simulation of up to \(5.35\%\): given that, at the time of writing, no standardization exists for acceptable thresholds in service robot applications, whether such values meet the facility’s requirements is up to the stakeholder.

5 Model Adjustment

For the first design-time analysis iteration, it can be assumed that the SHA modeling human behavior within the formal model is an underapproximation of real human behavior. Therefore, upon deploying the mission with real human subjects, who perform a broader range of actions, it is plausible that results obtained at design time turn out to be no longer accurate. To address this issue, sensor data can be exploited to learn, in a data-driven fashion, a model of human behavior that is up to date with the knowledge accumulated through deployment.

Fig. 5
High-level workflow of \(\textsf{L}^*_\textrm{SHA}\), split into the teacher’s and the learner’s lanes: the learner submits mi, ht, and cex queries, and the teacher answers with flow conditions, probability distributions, or a counterexample to the hypothesis. Dashed arrows represent the submission of a query and the retrieval of the teacher’s answer

To this end, an active automata learning algorithm targeting SHA has been developed (see Fig. 5 for the algorithm’s workflow) [14]. The algorithm extends \(\textsf{L}^*\) [3], the well-known learning algorithm for Deterministic Finite-state Automata (DFA), hence the name \(\textsf{L}^*_\textrm{SHA}\). Like \(\textsf{L}^*\), \(\textsf{L}^*_\textrm{SHA}\) relies on the interaction between a learner and a teacher (or oracle). The learner is in charge of maintaining the hypothesis automaton \(\mathcal {A}_\textrm{hyp}\), while the teacher stores the available knowledge about the System Under Learning (SUL). The learner submits queries to the teacher and refines the hypothesis based on the teacher’s answers.

Knowledge stored by the teacher is in the form of signals collected by sensors. While \(\textsf{L}^*_\textrm{SHA}\) is domain-agnostic, in its application to human behavior learning such signals consist of the agents’ positions, human physical fatigue, and data concerning the environment (e.g.,  humidity and temperature). As per the example in Fig. 2, human behavioral states differ based on how fatigue evolves (e.g.,  it increases while walking and decreases while resting). Therefore, to identify SHA locations correctly, in this use case, the fatigue signal is split into segments based on events that occurred during the mission (e.g.,  human velocity switching from 0 to a value greater than 0 indicates that the human started walking). A sequence of events constitutes a trace. In \(\textsf{L}^*_\textrm{SHA}\), the teacher stores all collected traces and the associated signals.
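The event-based segmentation of the fatigue signal can be sketched as follows; the velocity threshold, the sample values, and the (time, fatigue, velocity) encoding are illustrative assumptions.

```python
def segment_by_velocity(samples, eps=0.05):
    """Split a (time, fatigue, velocity) signal into segments at the events
    where the human starts or stops walking. Threshold eps is an assumption."""
    segments, current, walking = [], [], samples[0][2] > eps
    events = []
    for t, fatigue, v in samples:
        now_walking = v > eps
        if now_walking != walking:          # start/stop event detected
            events.append(("start" if now_walking else "stop", t))
            segments.append(current)
            current, walking = [], now_walking
        current.append((t, fatigue))
    segments.append(current)
    return events, segments

signal = [(0, 0.10, 0.0), (1, 0.09, 0.0), (2, 0.12, 0.6),
          (3, 0.18, 0.7), (4, 0.19, 0.0)]
events, segs = segment_by_velocity(signal)
# events -> [('start', 2), ('stop', 4)]; the event sequence is the trace
```

Each segment then carries the fatigue samples from which flow conditions and random-parameter populations are later estimated.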

As per Fig. 5, learning occurs in rounds. At the beginning of each round, since the algorithm targets SHA, each location \({l\in L}\) of \(\mathcal {A}_\textrm{hyp}\) that has already been identified needs to be labeled with flow conditions and probability distributions (i.e., functions \(\mathcal {F}(l)\) and \(\mathcal {D}(l)\), respectively). For flow conditions, the learner submits mi queries, which exploit the Derivative Dynamic Time Warping (DDTW) technique [11] to identify the function, out of a set of candidates, that best fits a specific signal segment. Concerning probability distributions, the learner submits ht queries, for which the teacher determines, through a Kolmogorov-Smirnov two-sample test, whether the samples of a random parameter observed in the aftermath of a specific trace constitute a new population or whether they are not statistically different from previously identified populations [16].
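The core of an ht query, comparing two sample populations, can be illustrated with a hand-rolled two-sample Kolmogorov-Smirnov statistic; a full implementation would also derive a p-value (e.g., via `scipy.stats.ks_2samp`), and the sample values below are made up for illustration.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance
    between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    def ecdf(s, x):
        # fraction of sample s that is <= x
        return sum(1 for v in s if v <= x) / len(s)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

same = [0.10, 0.11, 0.12, 0.13, 0.14]
shifted = [0.50, 0.51, 0.52, 0.53, 0.54]
# Disjoint populations give the maximal distance; identical ones give 0.0
d_new = ks_statistic(same, shifted)   # -> 1.0 (likely a new population)
d_old = ks_statistic(same, same)      # -> 0.0 (statistically the same)
```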

After assigning all flow conditions and probability distributions, the learner checks whether \(\mathcal {A}_\textrm{hyp}\) is well-defined, that is, whether it is closed and consistent. The hypothesis is closed if all edges reach existing locations (in other words, if a location is defined for each identified operational state). The hypothesis is consistent if no location has more than one outgoing edge with the same event label (in other words, if the SHA is deterministic with respect to edge outputs). If either of the two conditions is not verified, the learner modifies \(\mathcal {A}_\textrm{hyp}\) to make it closed and consistent.
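The two well-definedness checks can be sketched over a hypothesis reduced to a location set and a list of labeled edges; this encoding is an illustrative simplification of \(\mathcal {A}_\textrm{hyp}\).

```python
def is_closed(locations, edges):
    """Closed: every edge reaches an already-identified location."""
    return all(dst in locations for _, _, dst in edges)

def is_consistent(edges):
    """Consistent: no location has two outgoing edges with the same event
    label (the hypothesis is deterministic w.r.t. events)."""
    seen = set()
    for src, event, _ in edges:
        if (src, event) in seen:
            return False
        seen.add((src, event))
    return True

locations = {"h_idle", "h_busy"}
edges = [("h_idle", "start", "h_busy"), ("h_busy", "stop", "h_idle")]
assert is_closed(locations, edges) and is_consistent(edges)

# A trace reaching an unseen state breaks closedness...
assert not is_closed(locations, edges + [("h_busy", "sit", "h_sitting")])
# ...and a second 'start' edge out of h_idle breaks consistency.
assert not is_consistent(edges + [("h_idle", "start", "h_idle")])
```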

Once \(\mathcal {A}_\textrm{hyp}\) is well-defined, the current learning round ends with the learner submitting a cex query to the teacher, that is, asking whether, given the teacher’s knowledge, a counterexample to \(\mathcal {A}_\textrm{hyp}\) exists. A counterexample is a trace that is known to the teacher but is not captured by \(\mathcal {A}_\textrm{hyp}\) or is not compatible with it (i.e., is a source of non-closedness or non-consistency). If a counterexample exists, a new round of learning is necessary; otherwise, \(\textsf{L}^*_\textrm{SHA}\) terminates, returning \(\mathcal {A}_\textrm{hyp}\).

Within the model-driven framework, the learned SHA constitutes a refinement of the model of human behavior, which is plugged back into the SHA network to iterate the design-time analysis for the same or different scenarios. Experiments have been carried out by simulating a broader range of human actions (e.g., running, sitting, and walking while carrying a load) through the simulated deployment environment to assess the gain in accuracy with the refined model [14]. The latter amounts to an average \(18.1\%\) accuracy gain for the estimation of the probability of success and \(7.7\%\) for the estimation of fatigue. Naturally, better accuracy comes at the cost of the time necessary to complete the learning (approximately 35 minutes for the largest model) and the increased complexity of the resulting SHA network (thus, longer verification times).

6 Future Research Outlook

The framework is open to several future extensions. Ongoing work involves a refinement of the SHA network with cognitive and psychological models of the human decision-making process to be accounted for by the formal analysis and the deployed orchestrator.

As for the model adjustment phase, \(\textsf{L}^*_\textrm{SHA}\) works under a set of simplifying assumptions, which may not hold for real cyber-physical systems (CPSs). Therefore, further work is necessary to extend the applicability domain of \(\textsf{L}^*_\textrm{SHA}\). In more general terms, the degree to which active automata learning techniques (which usually rely on the availability of an omniscient oracle to drive the learning) are applicable to real systems is an open research question.

Finally, the reconfiguration phase of the framework is currently performed entirely manually, which is not aligned with the initial goal of keeping the automation level as high as possible. An automated strategy synthesis procedure (for example, exploiting the Uppaal Stratego tool [6]) could be developed to compute alternative mission plans that optimize the quality metrics, thus at least partially automating the mission re-design task.