A Bayesian Approach to Risk-Based Autonomy, with Applications to Contact-Based Drone Inspections

Enabling higher levels of autonomy while ensuring safety requires an increased ability to identify and handle internal faults and unforeseen changes in the environment. This article presents an approach to improve this ability for a robotic system executing a series of independent tasks by using a dynamic decision network (DDN). A simulation case study of an industrial inspection drone performing contact-based inspection is used to demonstrate the capabilities of the resulting system. The case study demonstrates that the system is able to infer the presence of internal faults and the state of the environment by fusing information over time. This information is used to make risk-informed decisions enabling the system to proactively avoid failure and to minimize the consequence of faults. Lastly, the case study demonstrates that evaluating past states with new information enables the system to identify and counteract previous sub-optimal actions.


Introduction
Highly automatic or autonomous mission executions have advantages for reducing costs [1], improving performance [2], increasing safety [3], and enabling new types of operations [3,4]. Examples of such systems include autonomous underwater vehicles, dynamic positioning systems for ships, and autopilots. Today's systems often rely on human operators to monitor them and to manually intervene if necessary [2,5,6]. Developing autonomous robotic systems that can operate without direct human supervision can enable a wider range of missions. One example is missions where
Previous literature exists on making decisions based on uncertainty or risk. In [12], a Bayesian belief network (BBN) is used to evaluate the collision risk during an under-ice operation with an autonomous underwater vehicle. Safety-critical parameters, such as distance to the ice sheet, are automatically changed by considering how they affect the risk evaluated with the BBN. In [13], an emergency landing location for an unmanned aerial vehicle is chosen by evaluating the risk of the different landing locations with a BBN. In [14], a BBN is used to evaluate the effect of different recovery and security strategies during a cyberattack against an industrial control system. Even though these works make risk-based decisions, they do not consider improving the system's ability to interpret its state and the state of the environment.
An ability to infer the health state of the system based on indirect observations is demonstrated in [15-17] by using a dynamic Bayesian network (DBN). This previous research only considers estimating the state of the system and does not consider using the results for automatic decision-making. Furthermore, it does not consider how the choice of actions affects how the system develops over time.
Considering the actions made by the system and using the inferred state of hidden variables for automatic decision-making has been done in educational systems [18-20] and dialog systems [21,22]. These systems use a dynamic decision network (DDN) to infer the state of the user based on their observed response to different actions made by the system. Even though these systems show some of the capabilities needed, they are made for a distinctly different type of problem, making them not directly applicable to automatic decision-making for robotic systems.
In [23], a system is presented that infers the state of the environment and the health state of a robot based on indirect measurements in a DBN and uses this information for automatic emergency fault handling. In contrast to [23], the present article considers operational decision-making, which makes it necessary to consider the risk and reward of executing different actions, to consider how the choice of action affects how the system develops over time, and to re-evaluate past actions when new information has become available, none of which is considered in [23].
This article combines the capabilities presented in the earlier literature to make a risk-based decision system for an autonomous robot executing a sequence of independent tasks. The capability of making risk-based decisions presented in [12-14] is combined with the capability of identifying hidden states based on indirect observations presented in [15-17] and with the capability of considering the actions taken by the system itself as presented in [18-22]. Furthermore, this work introduces a new capability of evaluating past states with new information to identify past mistakes or sub-optimal choices. This article develops a DDN that combines measurements available before a task is attempted with the action that was chosen and the outcomes of that action. As the model is dynamic, the results of multiple task execution attempts are considered in light of each other to reveal faults with the robotic system and adverse environmental conditions that cannot be measured directly. A heuristic is proposed that uses the DDN to evaluate whether executing a task should be attempted or skipped, which execution action should be used if several are available, and whether maintenance of the robot is needed. Additionally, the system updates its belief regarding past states when new information becomes available, thereby identifying previously attempted tasks that were wrongly skipped and should therefore be reattempted.
To demonstrate the proposed method, a case study of an industrial inspection multi-rotor drone is considered. The drone is tasked with mapping the thickness of metal surfaces in an industrial facility to identify damage to the structure. The measurements are conducted by contacting the surface with an ultrasound probe [24-28]. The large number of measurements needed to get sufficient coverage makes the operation costly for a human operator to directly and continuously monitor, thereby warranting the need for autonomous execution.
The rest of the article is structured as follows: Section 2 states the problem formulation. Section 3 gives some background on Bayesian models. Section 4 presents the proposed method for developing and using the DDN. This method is applied to the case study in Section 5. Simulation results from the case study are presented in Section 6. Section 7 discusses the proposed method in light of the results from the case study. A conclusion is given in Section 8.

Problem Statement
This article considers a robotic system that executes a sequence of independent tasks. Tasks are considered independent when "no task provides a necessary precondition for the fulfillment of another task" [29]. This article does not consider in which order the tasks should be executed. It is assumed that the tasks are given as an ordered list at the start of the operation. The tasks are assumed to be time-independent with no deadlines that have to be considered when planning.
This article considers a part of the autonomy layer that should decide if and how a task should be executed and whether maintenance is needed. Task executions can fail due to problems related to the task that make it harder for the robotic system to solve that task in particular, due to problems with the robotic system that make it harder for the robot to solve tasks in general, or due to random failure. If the task execution fails, then the robotic system needs to decide whether it should attempt the task again, skip the current task as it seems impossible to complete, or request maintenance of the robot, which can repair faults that are hindering the robot from successfully completing tasks.
Attempting to execute a task can cause a hazardous event. A hazardous event is one that can, in the worst case, lead to a loss. Crashing is an example of a hazardous event, while damage to the robotic system is an example of a loss. There can be different ways of executing the task, each with its own direct cost and its own effect on the probability of achieving the goal of the task and of causing hazardous events.
There can be different ways of maintaining the system that repair or mitigate different types of faults with the robotic system. Maintenance actions are associated with a direct cost that the system must weigh against the advantage of maintaining the system.
When changing tasks, the system can choose between going to the next task in the sequence or returning to a previous task. Leaving a task without fulfilling its goal is associated with a cost.
To make decisions, the system has access to measurements of relevant features of the current task and to information on how past task execution attempts went. Based on this information, the robotic system must infer its own state and the state of the environment to have a foundation for decision-making.

Background Theory
BBNs are directed acyclic graphs (DAGs) used for probabilistic inference. The arcs in a BBN point from a parent node to a child node and represent dependencies. Nodes are often modeled as being able to be in a discrete set of states. Conditional probability tables (CPTs) can then be used to define the probability that a child node is in a particular state for each possible combination of parent node states.
A BBN can be made dynamic, giving a DBN, by repeating the network for each time step and connecting the nodes based on how they depend on each other across time. Decisions can be included in the network, making it a DDN, by letting some of the nodes represent decision variables and including the decision in the list of evidence. An example of a DDN is shown in Fig. 1.
BBNs are typically used to evaluate the probability that a particular node is in a particular state, given some evidence. Each piece of evidence specifies which state a particular node is in. The probabilities of interest are evaluated using Bayesian probability laws while considering the dependencies and CPTs defined by the BBN [30,31]. Multiple general solvers exist for evaluating Bayesian models [32]. As these solvers do all of the necessary computations, the rest of the article will focus on the development of the model and how the results evaluated with the model can be used for decision-making.
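As a minimal illustration of the inference described above, the following sketch applies Bayes' rule to a two-node network in which a hidden fault node is the parent of an observed outcome node. The node names and all probabilities are hypothetical and are not taken from the case study; a general solver performs the same kind of computation on larger networks.

```python
# Hypothetical two-node BBN: Fault -> Outcome. All numbers are illustrative.
prior = {"fault": 0.1, "ok": 0.9}

# CPT of the child node: P(outcome | parent state).
cpt = {
    "fault": {"fail": 0.9, "success": 0.1},
    "ok": {"fail": 0.2, "success": 0.8},
}


def posterior(prior, cpt, evidence):
    """Return P(parent state | child observed in state `evidence`)."""
    joint = {state: prior[state] * cpt[state][evidence] for state in prior}
    total = sum(joint.values())
    return {state: p / total for state, p in joint.items()}


# Evidence: the outcome node is observed in the state "fail".
belief = posterior(prior, cpt, "fail")
```

With these example numbers, a single observed failure raises the belief in a fault from 0.1 to one third, since failures are also fairly likely without a fault.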

Method
This section presents the proposed method for developing the DDN that will be used to infer the state of the robotic system and the environment, together with a strategy for using the DDN to choose what action the robotic system should take.
Figure 1 shows a simplified version of the network developed with the proposed method focusing on the time dynamics of the DDN. Figure 2 gives another example where more focus is placed on the nodes making up the network at a particular time step.
The basic procedure for using the DDN is as follows:
1. If available, insert evidence based on measurements available before a task is attempted.
2. Evaluate the risk and gain of executing different actions.
3. Execute the optimal action.
4. If available, insert evidence based on the observed outcome of the action.
5. Make a new time step in the DDN.
Each time step represents a decision that is made.

Developing the DDN
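This article proposes developing the DDN through a top-down approach. This approach ensures that only states that can be distinguished from each other are included. The following steps are used to develop the DDN:
1. Describe the operation and system.
2. Model relevant objectives.
3. Model failure causes.
4. Model the condition of the failure causes.
5. Model dynamics.
6. Model measurements.
7. Quantification.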

Step 1 - Describe the Operation and System
The operational description defines the tasks the system should execute and which actions the robotic system can choose between.
In the description of the robotic system, the available sensors, and information from different subsystems, such as a navigation system, are given.

Step 2 - Model Relevant Objectives
As risk is the "effect of uncertainty on objectives" [10], the relevant objectives must be identified to make risk-based decisions. Two types of objectives are considered: achieving the task goal and avoiding hazardous events. Not achieving the objectives is considered a failure, while the underlying cause of the failure is called the failure cause. Relevant hazards can be identified through different risk analysis methods, such as preliminary hazard analysis (PHA) [33] or system theoretic process analysis (STPA) [34]. A node is introduced in the DDN for every goal and hazard, as shown in Fig. 2.
These nodes take on a binary state indicating whether the objective will be met or not on this execution attempt.

Step 3 - Model Failure Causes
Different failure causes, such as faults in the robotic system and adverse environmental states, can prevent the objectives from being fulfilled. Not achieving an objective is considered a failure. The failure causes can be identified with a risk analysis; see [33,34]. Nodes are introduced that represent groups of failure causes that cannot be distinguished from each other, shown as light blue in Fig. 2. All failure causes that affect different measurements or that are affected differently by the choice of action can potentially be distinguished from each other. The failure cause nodes take on a binary state indicating whether any failure cause in this group will cause a failure on this execution attempt.

Step 4 - Model the Condition of the Failure Causes
The failure cause nodes introduced in the last step consider the expected outcome of a single execution attempt. New nodes, called condition nodes, are introduced to model the general condition of the failure causes. These nodes could, for example, be defined as the amount of wear or the failure rate of a component. One condition node is introduced for each failure cause node, as shown in dark blue in Fig. 2. These nodes can have multiple states to model varying ability to achieve the goal.

Step 5 - Model Dynamics
A new time step is introduced in the DDN for each decision that is made. The condition nodes introduced in step 4 are connected to themselves between time steps, as shown with the dotted arrows in Fig. 2. This enables the DDN to combine information over time.
Some conditions can be independent for each task. These conditions can be modeled by having an instance of the node for each task in the operation. The nodes representing the current task are connected to the current time step. An example of this is given in Fig. 3.
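The way a condition node connected to itself across time steps combines information can be sketched as a recursive update followed by a prediction. The two-state condition, the action names, and every probability below are assumptions made for illustration; they are not the values used in the case study.

```python
# Hypothetical condition node with two states and an action-dependent transition CPT.
STATES = ["good", "worn"]

# P(next condition | current condition, action); illustrative numbers only.
TRANSITION = {
    "inspect": {"good": {"good": 0.95, "worn": 0.05}, "worn": {"good": 0.0, "worn": 1.0}},
    "maintain": {"good": {"good": 1.0, "worn": 0.0}, "worn": {"good": 1.0, "worn": 0.0}},
}

# P(execution attempt fails | condition); plays the role of the failure cause CPT.
P_FAIL = {"good": 0.1, "worn": 0.6}


def update(belief, failed):
    """Condition the belief on the observed outcome of the current time step."""
    weights = {s: belief[s] * (P_FAIL[s] if failed else 1.0 - P_FAIL[s]) for s in STATES}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def predict(belief, action):
    """Propagate the belief to the next time step given the chosen action."""
    return {
        s_next: sum(belief[s] * TRANSITION[action][s][s_next] for s in STATES)
        for s_next in STATES
    }


belief = {"good": 0.9, "worn": 0.1}
history = []  # belief in "worn" after each observed failure
for failed in [True, True, True]:
    belief = update(belief, failed)
    history.append(belief["worn"])
    belief = predict(belief, "inspect")

after_maintenance = predict(belief, "maintain")
```

Each additional failed attempt increases the belief that the condition is "worn", which a single observation alone could not establish; the hypothetical maintenance action resets the condition.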

Step 6 - Model Measurements
Separate measurement nodes are introduced to enable the modeling of measurement uncertainty. Measurements available before a task execution depend on the condition nodes, while measurements of how the execution went depend on the objective or failure cause nodes, as shown in Fig. 2.
Fig. 3 Example of how different task-specific nodes can be connected to the rest of the network at different time steps. Task 0 is connected to the rest of the network at time steps 0, 1, and 3, while task 1 is connected at time step 2

Step 7 - Quantification
Bayesian models can be quantified based on expert judgment and operational data. This enables the models to be used on novel systems where operational data is missing. Quantification of CPTs based on expert judgment is not a trivial task, and many different methods exist to simplify the process [30,35]. This article simplifies the process by using Boolean operators to define which combinations of failure causes affect the different objective nodes. The CPT of the failure cause nodes that are children of condition nodes translates the condition into a probability of failure on this execution attempt. The CPTs of the condition nodes specify how the state can degrade or improve based on the choice of action. The CPTs of the measurement nodes quantify the measurement uncertainty.
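A Boolean operator removes the need to elicit every CPT entry by hand. The sketch below builds the CPT of an objective node that fails when any parent failure cause is active (a logical OR, chosen here as an example operator) and evaluates the resulting failure probability. The helper names are hypothetical, and treating the parents as independent is a simplifying assumption of this example; in the DDN, the joint distribution of the parents follows from the rest of the network.

```python
from itertools import product


def or_cpt(n_parents):
    """Deterministic CPT: P(child fails | parent states) for a logical OR."""
    return {
        combo: (1.0 if any(combo) else 0.0)
        for combo in product([False, True], repeat=n_parents)
    }


def p_child_fails(cpt, parent_probs):
    """Marginal failure probability, assuming independent parents (illustrative)."""
    total = 0.0
    for combo, p_fail in cpt.items():
        weight = 1.0
        for active, p_active in zip(combo, parent_probs):
            weight *= p_active if active else 1.0 - p_active
        total += weight * p_fail
    return total


cpt = or_cpt(2)
# Two failure causes that are active with probability 0.1 and 0.2, respectively.
p_fail = p_child_fails(cpt, [0.1, 0.2])
```

For two parents, the OR operator fills all four CPT rows from a single rule, and the resulting failure probability equals one minus the probability that neither cause is active.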

Decision Policy
Finding the optimal decision policy requires solving a partially observable Markov decision process (POMDP), which in the general case is intractable except for small problems [36]. To circumvent this problem, a heuristic policy is proposed. The policy considers the following three strategies consisting of one or multiple actions: 1) move on to another task, 2) attempt to execute the task once and then move on to another task, or 3) execute a maintenance action, attempt to execute the task, and then move on to another task. The expected cost of each strategy is evaluated, and the first action of the cheapest strategy is executed. After executing the first action of the strategy, the optimal strategy is re-evaluated. If strategy 2 is chosen multiple times in a row, then the system executes the current task multiple times without moving to another task. This ensures that the resulting closed-loop behavior can be closer to optimal behavior than any of the proposed strategies.
Strategy 1 incurs a cost, C_1, only if the goal of the current task is not achieved. This cost, C_G, is based on the consequence of not achieving the goal. This is shown in Eq. 1. More cases can be added if there can be a partial fulfillment of the goal.
The cost of strategy 2, C_2(e), depends on the choice of execution action, e. There is a direct cost for executing action e, C_E(e), and an indirect cost if a hazardous event occurs.
There can be multiple different hazards, each associated with its own cost, which are given as elements in the vector C_H(e). This cost can depend on the choice of execution action. If the execution does not achieve the goal of this task, then there will be the additional cost of moving to another task, C_1. The probability of achieving the task's goal, P_G, and the probability of different hazardous events occurring, P_H, when executing an action are evaluated using the DDN. These values are found by evaluating the probability that the objective nodes are in a failure state at the current time step. The resulting cost function is shown in Eq. 2. This cost is evaluated for all possible execution actions, e, applicable to the current task.
The cost of strategy 3, C_3(m, e), depends on the choice of maintenance action, m, and execution action, e. The maintenance action can increase the probability of achieving the goal and reduce the probability of hazardous events occurring. The effect of the maintenance action is evaluated by inserting it as evidence in the action node at the current time step of the DDN and then simulating one step forward in time by temporarily adding a new time step to the DDN. The cost of execution (strategy 2) can then be evaluated at this time step, C_2,m(e). The cost of the maintenance action must be included as well. This cost is often quite high, but maintenance can improve the success rate of multiple future task execution attempts. The maintenance cost, C_M(m), is therefore divided by the expected number of executions until maintenance is needed again, N(m). The resulting cost is shown in Eq. 3 and should be evaluated for all combinations of maintenance actions, m, and execution actions, e.
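The comparison of the three strategies can be sketched as follows. The sketch assumes that the costs combine additively in the way the text describes: the direct cost, plus the probability-weighted hazard costs, plus the goal cost weighted by the probability of not achieving the goal. The function names and all numbers are hypothetical; in the proposed method, the probabilities are evaluated with the DDN.

```python
def cost_skip(c_goal, goal_achieved=False):
    """Strategy 1: leaving the task costs C_G unless its goal is already achieved."""
    return 0.0 if goal_achieved else c_goal


def cost_execute(c_exec, p_goal, p_hazards, c_hazards, c_goal):
    """Strategy 2: direct cost + expected hazard cost + expected cost of an unmet goal."""
    expected_hazard_cost = sum(p * c for p, c in zip(p_hazards, c_hazards))
    return c_exec + expected_hazard_cost + (1.0 - p_goal) * cost_skip(c_goal)


def cost_maintain_then_execute(c_maint, n_executions, c_execute_after_maint):
    """Strategy 3: amortized maintenance cost + execution cost after maintenance."""
    return c_maint / n_executions + c_execute_after_maint


# Hypothetical numbers for one execution action and one maintenance action.
C_G = 10.0
c1 = cost_skip(C_G)
c2 = cost_execute(c_exec=1.0, p_goal=0.5, p_hazards=[0.1], c_hazards=[20.0], c_goal=C_G)
# Probabilities one simulated time step after the maintenance action.
c2_after = cost_execute(c_exec=1.0, p_goal=0.9, p_hazards=[0.02], c_hazards=[20.0], c_goal=C_G)
c3 = cost_maintain_then_execute(c_maint=30.0, n_executions=10, c_execute_after_maint=c2_after)

strategies = {"skip": c1, "execute": c2, "maintain": c3}
best = min(strategies, key=strategies.get)
```

With these example values, maintenance is the cheapest strategy even though its full cost exceeds the cost of skipping, because the maintenance cost is spread over the ten executions it is expected to benefit.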
When moving to another task (strategy 1), the system can choose to revisit a previously attempted task. The expected cost of executing a previously attempted task is evaluated by simulating that the system moves to this task. The system returns to a previously attempted task if the expected cost of executing the task, C_2(e), plus the cost of returning to the previous task, C_Ret, is lower than the cost of omitting the task, C_1, as shown in Eq. 4. A task is reattempted if the visit is warranted for any of the available execution actions. If none of the previously attempted tasks are worth another attempt, then the system will move to the next task in the sequence that has not been attempted.
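The revisit rule can be sketched as a simple comparison per previously attempted task. The function names and costs are hypothetical, and checking candidate tasks in the order in which they are stored is an assumption of this sketch, since the text does not state which task is preferred when several are worth revisiting.

```python
def worth_revisiting(exec_costs, c_return, c_skip):
    """True if some execution action e satisfies C_2(e) + C_Ret < C_1."""
    return any(c2 + c_return < c_skip for c2 in exec_costs.values())


def next_task(attempted, unattempted, c_return, c_skip):
    """Pick a previously attempted task worth revisiting, else the next unattempted one.

    `attempted` maps a task id to {action name: expected execution cost C_2(e)}
    for tasks that were left without fulfilling their goal.
    """
    for task_id, exec_costs in attempted.items():
        if worth_revisiting(exec_costs, c_return, c_skip):
            return task_id
    return unattempted[0] if unattempted else None


# Hypothetical expected execution costs after the beliefs about past states were updated.
attempted = {
    0: {"normal": 9.5, "safe": 9.8},
    1: {"normal": 8.5, "safe": 7.0},
}
choice = next_task(attempted, unattempted=[2, 3], c_return=2.0, c_skip=10.0)
fallback = next_task({0: {"normal": 9.5}}, unattempted=[2, 3], c_return=2.0, c_skip=10.0)
```

In this example, only task 1 is revisited, and only because its safe action is cheap enough to justify the return cost.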
Attempting a task before and after maintaining the system enables the system to identify if a maintenance action helped. This behavior is encouraged by always choosing an execution action if the current task has not been attempted and if strategy 2 is cheaper than strategy 1. If this is not the case, then the normal policy is followed.

Case Study
In this section, the proposed method is applied to a multirotor drone tasked with industrial inspection. The case study setup is developed in cooperation with the drone inspection technology company ScoutDI. Figure 4 shows the ScoutDI drone performing an ultrasound thickness measurement. The case study is based on simulation.

Step 1 - Describe the Operation and System
The operation consists of measuring metal surface thickness with an ultrasound sensor mounted on a multirotor drone. A large number of points are typically inspected. Every inspection point is considered a task in the proposed method. The system can choose between two different ways of inspecting the surface of the inspection point: a normal inspection and a slower but safer inspection. A small amount of gel is dispensed from a tank mounted on the drone for each inspection. One maintenance action available to the drone is to refill this tank. Another is to request a full maintenance check by an operator. The drone can skip inspection points deemed too costly to inspect autonomously.
The drone is equipped with a lidar used to detect obstacles and navigate.

Fig. 4 A ScoutDI prototype drone during an ultrasound inspection of a storage tank. Courtesy ScoutDI

Step 2 - Model Relevant Objectives
The goal of each task is to measure the surface thickness of the inspection points. The drone is assumed to operate in controlled industrial facilities consisting of metal surfaces without any humans present. This makes damage to the drone the most relevant loss. A hazard that can cause this loss is uncontrolled contact with a surface or other object. Nodes representing the two objectives are shown on line L1 in Fig. 5.

Step 3 - Model Failure Causes
Through discussions with ScoutDI, different failure causes were identified. Some of the failure causes, such as an empty gel tank, rust or dirt stuck on the ultrasound sensor, or inspection surfaces covered with rust or dirt, can prevent data from being gathered. Other failure causes, such as a worn motor, poor navigation quality, or obstacles, can lead to uncontrolled contact in addition to preventing data from being gathered. To simplify modeling, two intermediate nodes are introduced: one for failure causes preventing data from being gathered, the other for failure causes preventing both controlled contact and data from being gathered. These are shown on line L2 in Fig. 5.
The drone and the surface of the inspection point are affected differently by the choice of action. Executing an inspection may damage the drone, while the surface will not be affected. Similarly, maintaining the drone does not affect the surface. Moving to a new inspection point will change the surface but not affect the drone. A distinction between drone-related and surface-related nodes is therefore needed. Furthermore, the refill-gel action only affects the gel level. These nodes are shown on line L3 in Fig. 5.
Before an inspection is executed, a lidar scan of the inspection surface can reveal protruding obstacles that will prevent controlled contact and data gathering. The limited resolution of the lidar can cause it to systematically miss thin obstacles, such as welding joints or minor surface irregularities. A distinction between failure causes that are measurable and those that are not can therefore be made, as shown on line L4 in Fig. 5.

Step 4 - Model the Condition of the Failure Causes
A slightly dirty or uneven surface, or a minor fault in the drone, can reduce the likelihood of an inspection succeeding without hindering it completely. For all nodes except the "gel level" node, the states of the condition nodes reflect the average frequency at which the respective conditions will cause a failure. These frequencies are discretized into different states, as shown on line L5 in Fig. 5.
The state of the gel level indicates the amount of gel left. When the gel level approaches zero, an insufficient amount of gel might be deployed. This will prevent data from being gathered.

Fig. 5 The conditional probability tables and dependencies in the DDN for the inspection drone case study. Some tables are intentionally left blank and are instead given in Tables 1, 2, and 3. Nodes containing a * refer to the table marked with a * shown on the bottom left of the figure

Step 5 - Model Dynamics
Drone-related conditions have a probability of degrading with each inspection attempt. The probability and severity of the degradation depend on whether an uncontrolled contact occurred and whether a normal or safe inspection was performed. The gel level is gradually depleted with each inspection attempt. There can be some variation in the amount of gel dispensed, making the number of inspection attempts before a refill is needed uncertain.
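The uncertain depletion of the gel can be sketched as a transition distribution over discretized gel levels. The number of levels and the probabilities of the amount dispensed per attempt are assumptions made for this example and are not the values in Tables 2 and 3; the refill action setting the level to full follows the case study description.

```python
# Hypothetical discretization: index 4 = full tank, index 0 = empty tank.
N_LEVELS = 5

# Assumed probability of consuming 0, 1, or 2 levels during one inspection attempt.
P_USE = {0: 0.2, 1: 0.7, 2: 0.1}


def deplete(belief):
    """Propagate the gel-level distribution through one inspection attempt."""
    new_belief = [0.0] * N_LEVELS
    for level, p_level in enumerate(belief):
        for used, p_used in P_USE.items():
            new_belief[max(level - used, 0)] += p_level * p_used
    return new_belief


def refill():
    """The refill action sets the gel level to full with certainty."""
    return [0.0] * (N_LEVELS - 1) + [1.0]


belief = refill()
p_empty = []  # probability that the tank is empty after each attempt
for _ in range(6):
    belief = deplete(belief)
    p_empty.append(belief[0])
```

The probability of an empty tank grows gradually rather than jumping at a fixed attempt count, which is why the number of attempts before a refill is needed is uncertain.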
The surface-related conditions are assumed constant over time and independent at each inspection point. These are handled as discussed in Section 4.1 step 5 and illustrated in Fig. 3.

Step 6 - Model Measurements
The "surface suitability measurement" is introduced as shown at the bottom of Fig. 5. This measurement is, as discussed in step 3, based on how flat the area around the inspection point seems based on the lidar scan.
After an inspection is executed, a measurement of how the execution went is needed. Whether data is successfully gathered is readily available from the ultrasound thickness sensor. Whether an uncontrolled contact occurred cannot be directly measured. Instead, this can be inferred based on the trajectory conformity measurement. This measurement is made by comparing the observed trajectory of the drone with the intended trajectory and identifying any deviations in position, velocity, and heading.

Step 7 - Quantification
Contact-based inspection drones are in the early stage of development. There is, therefore, little or no operational data and experience to use as a basis for the quantification. The choice of hardware and software design will significantly affect the quantification process. The quantification will be sensitive to factors such as how robust the ultrasound sensor is, how robust the drone is to impact, and how well the drone manages to navigate. To demonstrate the proposed algorithm, some example values are chosen in collaboration with ScoutDI. The following assumptions were considered during the quantification process:
• Some inspections require the operators to clean the inspection surface first [37]. As this is not possible for the drone, there is a chance that there will be surfaces where the drone cannot gather data.
• The drone must be in stable contact to get a measurement. Touching the wall correctly with the sensor is difficult, making it likely that the drone will fail at some inspection attempts.
• The sensor can become defective due to dirt or rust sticking to it. This can happen even without an uncontrolled contact occurring.
• An uncontrolled contact can displace the sensor or damage the drone's integrity, making it unable to continue. The likelihood of damaging the drone is low as it is built to be robust to impacts.
The result of the quantification process is shown in Fig. 5 and Tables 1, 2, and 3. The initial probability distributions can be found in Table 1. Tables 2 and 3 show the probabilities of transitioning to a worse state for the different drone condition nodes when an inspection is attempted. The refill gel action will set the gel level to 100%. The full maintenance action sets all drone-related nodes, including the gel level, to their initial distribution.

Decision Policy
The decision policy presented in Section 4.2 is used with the parameters given in Table 4. These costs are based on the expected time use of the different actions. The expected time use of an uncontrolled impact is based on the expected time needed to repair the different degrees of damage that can occur, multiplied by the likelihood of each occurring as a result of an uncontrolled impact. It is assumed that an uncontrolled contact will seldom damage the drone, making the cost relatively low. Even if the drone is not damaged, it might require human assistance if it falls to the ground. The cost of not achieving the goal is based on the additional time used for a manual inspection. Evaluating an exact value for these costs can be difficult in practice. Some tuning of the values might therefore be necessary if the observed behavior is inadequate.
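The time-based cost of an uncontrolled impact is a probability-weighted sum over the possible outcomes. The outcome categories, probabilities, and times below are purely hypothetical placeholders chosen to show the arithmetic; they are not the values in Table 4.

```python
# Hypothetical outcomes of one uncontrolled impact: (description, probability, minutes lost).
outcomes = [
    ("no damage, drone keeps flying", 0.70, 0.0),
    ("no damage, drone needs human assistance", 0.20, 5.0),
    ("minor damage", 0.08, 30.0),
    ("major damage", 0.02, 240.0),
]

# The outcomes are assumed to be mutually exclusive and exhaustive.
total_probability = sum(p for _, p, _ in outcomes)

# Expected time use of an uncontrolled impact, usable as a hazard cost.
expected_minutes = sum(p * minutes for _, p, minutes in outcomes)
```

Because damage is assumed to be rare, the expected cost stays small compared with the repair time of the worst outcome.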
Having values that can be interpreted still gives an advantage as it gives an intuition on what the values should be.

Results
This section presents four scenarios for the inspection drone case study. The scenarios represent different types of failures and events that are deemed likely to occur during the drone's mission. Scenario 1 considers a case where the ultrasound sensor is not working. In scenario 2, the drone is unable to have a controlled contact. Scenario 3 demonstrates the effect measurements have on the system's behavior. Lastly, scenario 4 considers a case where the gel is depleted.
The simulations are done by modeling the drone's state and the state of the different inspection surfaces as state machines. Their respective states define which measurements are made and what the outcome of performing an action will be. The drone's state can either be working, with a defective ultrasound sensor, or in a state making it unable to make controlled contact. Each inspection surface can either be ideal, unable to be measured, with a measurable blocking obstacle, or with an immeasurable blocking obstacle. The DDN used to make decisions is evaluated using the SMILE [32] library for Python.

Scenario 1
In this scenario, the ultrasound sensor is not working. All inspections end with no data being gathered but perfect trajectory conformity. Figure 6 shows how the belief of the system develops over time when new inspections are attempted. Only the belief that drone-related and surface-related failure causes will prevent data gathering is shown. The rest of the failure causes have a belief close to 0 throughout this scenario.
Table 4 The different costs used in the decision policy for the case study

The first four time steps of Fig. 6 show the behavior and beliefs of the drone when it is at the first inspection point. As seen in the table in Fig. 6, the drone attempts to execute an inspection, which results in no data but perfect trajectory conformity. After the first inspection fails, the belief that surface-related failure causes prevent data from being gathered increases, as shown by the solid blue line. The belief that drone-related failure causes prevent data gathering also increases, but much less. This is due to it being more probable that a single failed inspection is caused by the surface than by the drone. This trend continues for the subsequent inspection attempts. At time step 3, the belief that surface-related failure causes will prevent data gathering is high enough that the drone skips the current inspection point and moves on to inspection point 1.
The dashed blue line in Fig. 6 shows the system's belief about past states evaluated at time step 3. As the state of the surface cannot change, the belief about the past states is equal to the newest belief. The state of the drone can, on the other hand, degrade, making the updated belief regarding the state of the drone at time step 0 slightly lower than at time step 3.
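Updating the belief about past states corresponds to smoothing in a dynamic model. The sketch below uses a generic forward-backward pass on a two-state hidden variable to reproduce the two effects described above: for a state that cannot change, the smoothed past belief equals the newest belief, while for a state that can degrade, the smoothed belief at the first time step is lower than the newest one. The model and all numbers are assumptions for illustration and are far simpler than the DDN of the case study.

```python
def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def forward_backward(prior, transition, likelihoods):
    """Return (filtered, smoothed) beliefs for a discrete hidden state.

    prior[i]          : P(state_0 = i)
    transition[i][j]  : P(state_{t+1} = j | state_t = i)
    likelihoods[t][j] : P(evidence_t | state_t = j)
    """
    n = len(prior)
    filtered = []
    belief = list(prior)
    for t, lik in enumerate(likelihoods):
        if t > 0:  # propagate the belief one time step forward
            belief = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
        belief = normalize([belief[j] * lik[j] for j in range(n)])
        filtered.append(belief)

    smoothed = [None] * len(likelihoods)
    beta = [1.0] * n  # backward message: likelihood of later evidence given the state
    for t in range(len(likelihoods) - 1, -1, -1):
        smoothed[t] = normalize([filtered[t][j] * beta[j] for j in range(n)])
        lik = likelihoods[t]
        beta = normalize(
            [sum(transition[i][j] * lik[j] * beta[j] for j in range(n)) for i in range(n)]
        )
    return filtered, smoothed


# State 0: no failure cause present, state 1: failure cause present (hypothetical numbers).
prior = [0.8, 0.2]
failed = [0.2, 0.9]  # P(failed inspection | state)
evidence = [failed, failed, failed]

static = [[1.0, 0.0], [0.0, 1.0]]     # surface-like: the state cannot change
degrading = [[0.9, 0.1], [0.0, 1.0]]  # drone-like: the state can degrade

filtered_static, smoothed_static = forward_backward(prior, static, evidence)
filtered_degrading, smoothed_degrading = forward_backward(prior, degrading, evidence)
```

In the degrading case, the smoothed belief at the first time step is also higher than the belief held at that time, since the later failures make an early fault more plausible in hindsight.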
At inspection point 1, the same behavior is observed as at inspection point 0. After three failed attempts, the system skips this inspection point and moves on to inspection point 2. When evaluating the past states at time step 7, shown with the orange dashed line in Fig. 6, the probability that the drone-related failure causes are preventing data gathering has increased. Since the belief that drone-related failure causes prevented data gathering in time steps 0-3 has increased, the belief that surface-related failures caused the failed inspection at inspection point 0 decreases. This can be seen by the dashed orange line being lower than the dashed blue line at time steps 0-3 for the surface-related failure causes.
After failing an inspection at inspection point 2 as well, the belief that the drone-related failure causes prevent data gathering is high enough to make a full maintenance worth the cost. After maintenance, the following inspection at time step 10 is successful. As the inspection failed before the maintenance but succeeded after, it becomes more probable that there was a fault with the drone that was solved by the maintenance. Reasoning backward in time decreases the probability that surface-related failures caused the previously failed inspections, as shown with the dashed green line in Fig. 6.
When considering where to go next, the system evaluates whether a previously visited inspection point is worth another inspection attempt. Since the belief that the surfaces on these inspection points caused the failures has decreased, the system concludes that they are worth another attempt. The system first visits inspection point 1 again, where data is gathered successfully. This further strengthens the belief that the drone caused the previously failed inspections. The system then returns to inspection point 0 and has a successful inspection before moving on to a new inspection point.

Scenario 2
The drone is in a condition such that it cannot establish good contact with the surface. All inspections result in data not being gathered and medium path conformity. The beliefs that drone-related and surface-related failure causes prevent data gathering, and that drone-related and unmeasurable surface-related failure causes prevent controlled contact and data gathering, are shown in Fig. 7. The rest of the failure causes have a belief close to zero throughout this scenario.
In this scenario, the drone does not attempt to inspect the inspection point again after the failed inspection attempt at time step 0, as shown in Fig. 7. This is due to the large cost associated with the possibility of having uncontrolled contact if the inspection is reattempted. At inspection point 1, a safe inspection action is performed since there is a considerable probability that the failure at time step 0 was caused by the drone. The system attempts one last inspection at inspection point 2 before requesting full maintenance. After maintenance, a safe inspection is executed since the failure might have been caused by the surface, which was unaffected by the maintenance action. As the inspection was successful, the belief that surface-related failure causes prevented controlled contact and data gathering at inspection point 2 decreased. When reasoning backward in time at time step 7, as shown by the dashed green line, the belief that "drone-related failure causes prevent controlled contact and data gathering" at time steps 0 and 2 is significantly increased. This decreases the belief that the failed inspection was caused by surface-related failure causes, making another attempt worth its cost. A safe inspection is performed at inspection point 1, as there could still be surface-related failure causes at this inspection point.
Fig. 7 Scenario 2. The table shows the measurements available before inspection, the choice of action, and the resulting measurements. The graph shows the state of failure causes relevant in this scenario. The solid line shows the belief of the drone that a failure cause is present at each time step. The dashed line shows the updated belief of past states evaluated every time the drone moves to a new inspection point (IP), which is marked with a vertical line. The color of the dashed line indicates when the updated belief was evaluated
Fig. 8 Scenario 3. The table shows the measurements available before inspection, the choice of action, and the resulting measurements. No graphs are shown as they give little additional information in this scenario

Scenario 3
This scenario demonstrates how the surface suitability measurement affects the choice of actions. Figure 8 shows how the system decides not to attempt an inspection if the surface suitability measurement is poor. With a medium surface suitability measurement, a safe inspection is attempted, but the system only attempts one inspection. When the surface suitability measurement is good but not perfect, two inspection attempts are made before moving on.

Scenario 4
This scenario demonstrates the effects of the gel-level node. Figure 9 shows the expected value of the gel-level node in addition to the belief that gel-level-related failure causes prevent data gathering to better show how the gel depletes over time. The figure starts after 12 successful inspections. With each inspection, the expected gel level decreases. The belief that "gel-level-related failure causes prevent data gathering" does not increase until the expected gel level is close to depleted. When no data is gathered in the inspection attempt at time step 37, the drone assumes that a low gel level caused it and therefore executes a refill.

Discussion
Scenario 1 shows that the system is able to distinguish between faults with the drone and adverse inspection surfaces by combining information over time. This enables the system to execute maintenance actions when needed. Furthermore, this scenario demonstrates that reasoning backward in time enables the system to realize that the previously visited inspection points were not the cause of the failure, as it previously assumed. This enables the system to return to previously failed tasks and reattempt the inspection.
Fig. 9 Scenario 4. The table shows the measurements available before inspection, the choice of action, and the resulting measurements. All inspections prior to time step 33 resulted in data being gathered and perfect trajectory conformity. The graph shows the state of failure causes relevant in this scenario. The solid line shows the belief of the drone that a failure cause is present at each time step. The dashed line shows the updated belief of past states evaluated every time the drone moves to a new inspection point (IP), which is marked with a vertical line. The color of the dashed line indicates when the updated belief was evaluated
Scenario 2 shows a case similar to scenario 1, with the difference that the drone experienced worse trajectory conformity. This could be explained by an unmeasured obstacle at the current location, by damage to the drone, or by a random failure. The possibility that there was an unmeasurable obstacle made the system not reattempt the failed task, as it did in scenario 1, but rather go directly to the next task. The possibility that there was a failure with the drone made the drone execute a safe inspection at the second location. This demonstrates how the system reasons with risk, and how it considers the underlying causes while doing so.
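The trade-off described for Scenario 2 can be sketched as a generic expected-utility comparison between candidate actions. This is not the article's heuristic decision policy: the class `ActionOutcome`, the functions, the reward and cost values, and all probabilities below are hypothetical and only illustrate how a higher believed risk of uncontrolled contact can shift the preferred action from a normal inspection, to a safe inspection, to moving on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionOutcome:
    p_data: float          # P(data is gathered | action, current belief)
    p_uncontrolled: float  # P(uncontrolled contact | action, current belief)
    action_cost: float     # fixed cost of executing the action


def expected_utility(outcome, data_reward=10.0, contact_cost=100.0):
    """Expected reward for data minus expected loss and execution cost."""
    return (outcome.p_data * data_reward
            - outcome.p_uncontrolled * contact_cost
            - outcome.action_cost)


def best_action(options):
    """Return the name of the option with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(options[name]))


# Hypothetical beliefs for three situations (all numbers are assumptions).
low_risk = {
    "normal inspection": ActionOutcome(0.80, 0.010, 1.0),
    "safe inspection": ActionOutcome(0.70, 0.002, 2.0),
    "move on": ActionOutcome(0.0, 0.0, 0.0),
}
medium_risk = {
    "normal inspection": ActionOutcome(0.60, 0.08, 1.0),
    "safe inspection": ActionOutcome(0.55, 0.01, 2.0),
    "move on": ActionOutcome(0.0, 0.0, 0.0),
}
high_risk = {
    "normal inspection": ActionOutcome(0.30, 0.20, 1.0),
    "safe inspection": ActionOutcome(0.25, 0.05, 2.0),
    "move on": ActionOutcome(0.0, 0.0, 0.0),
}

for label, options in [("low", low_risk), ("medium", medium_risk), ("high", high_risk)]:
    print(label, "->", best_action(options))
```

In this sketch the probabilities are given directly; in a network-based system they would instead be inferred from the current belief over the failure causes.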
Scenario 3 demonstrates that the system considers the measurements available before the task execution to proactively manage risk. Scenario 4 demonstrates how the gel-level node affects the behavior. In scenario 1, the system does not believe that the gel level caused the failed inspections, as the failure occurs immediately after take-off. In scenario 4, many inspections were successfully performed before the execution failed, making it probable that the gel was depleted. This shows that the system manages to distinguish between different types of internal faults when they affect the system differently.
The proposed method for building the DDN ensures that the condition nodes, which are the possible explanations for the observed behavior, are quite general. Having general nodes ensures that the nodes actually represent features the system is able to distinguish based on the observations. When there is a high belief that "drone-related failure causes prevent data gathering", the system does not know what the failure cause is. It could be anything that prevents the drone from gathering data at multiple inspection points without affecting its motion. The sensor could be displaced, there might be dirt on the sensor, or the sensor might be wrongly calibrated or unsuitable for the current mission. Which of these scenarios is true is irrelevant, as they all prevent data from being gathered and have the same solution: requesting maintenance. Constructing general condition nodes for all possible ways the system can be affected by actions and measurements ensures that the system has a possible explanation for all observations.
The DDN produced by the proposed method does not model the severity of the losses that can be caused by the occurrence of a hazard, such as having an uncontrolled contact. The different losses that can occur may have different severities and probabilities associated with them. Modeling the losses could enable the system to distinguish between different levels of severity and to change its behavior accordingly.
The results show no obvious sub-optimal behavior from the proposed heuristic decision policy. One drawback of the heuristic is the discount factor, N(m). This factor could, in theory, be based on the probability of degrading the drone with each inspection attempt. Even so, it has a straightforward interpretation and effect on the resulting behavior, making the discount factor an acceptable trade-off between simplicity and quality of the heuristic.
The resulting decision policy needs to evaluate the network multiple times for each time step to simulate the effect of different maintenance actions and to evaluate whether the system should return to a previous point. This computational cost could potentially be reduced by further simplifications, such as specifying thresholds for the different condition nodes. A predefined maintenance action can then be executed when the belief surpasses the threshold. The drawback of this approach is that it would lose information on the interaction between components. This is especially important for more complicated systems with more causal factors.
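A minimal sketch of the threshold-based simplification mentioned above is given here. It is not part of the proposed method: the node identifiers, threshold values, and the tie-breaking rule (largest margin above the threshold) are assumptions made for illustration. The maintenance actions are named after those appearing in the case study.

```python
# Hypothetical thresholds: condition node -> (belief threshold, maintenance action).
THRESHOLDS = {
    "drone_prevents_data_gathering": (0.8, "full maintenance"),
    "gel_level_prevents_data_gathering": (0.6, "refill"),
}


def select_maintenance(beliefs, thresholds=THRESHOLDS):
    """Return the predefined action for the node that most exceeds its
    threshold, or None when no belief surpasses its threshold.

    Each node is checked in isolation, so interactions between condition
    nodes are ignored; this is the drawback noted in the text.
    """
    exceeded = []
    for node, (limit, action) in thresholds.items():
        belief = beliefs.get(node, 0.0)
        if belief > limit:
            exceeded.append((belief - limit, action))
    if not exceeded:
        return None
    return max(exceeded)[1]


print(select_maintenance({"drone_prevents_data_gathering": 0.85,
                          "gel_level_prevents_data_gathering": 0.20}))
print(select_maintenance({"drone_prevents_data_gathering": 0.40,
                          "gel_level_prevents_data_gathering": 0.30}))
```

Such a rule needs only one look-up per condition node instead of repeated network evaluations, at the price of ignoring how the condition nodes jointly explain the observations.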
The point of the case study was to demonstrate capabilities that can be achieved with the proposed system, not to solve the case study in the optimal or simplest manner. Behavior similar to the presented results could be achieved by, for example, defining an exhaustive set of conditional rules. These types of methods may work for very simple problems but do not scale well to more complex problems. Having a systematic approach, such as the one presented, that achieves the capabilities needed for an autonomous system to operate without human supervision can therefore be of great value.
Evaluating DDNs becomes computationally expensive when the number of time steps increases. The number of time steps can be limited by using a sliding window approach where only the n newest time steps are included in the DDN. The initial condition of the DDN must reflect the information that is no longer inside the sliding window. This can be achieved by setting the priors at the first time step inside the window equal to the posterior evaluated at the last step outside the window. A drawback with a sliding window approach is that only time steps inside the window will be considered when evaluating past states with new information. This constitutes a challenge for the proposed method, as the number of time steps that are computationally feasible to consider might be too low to cover all previously attempted tasks that are interesting to reconsider. One possible way to alleviate this problem is to find a more compact way to represent information on previous tasks. Currently, previous tasks are represented by multiple time steps, one for each execution attempt and a time step for every move and repair action.

Conclusion
This article presents an approach for structuring a DDN and using it for operational decision-making. The article's goal is to contribute toward enabling autonomous systems to safely operate without direct human supervision. Through a case study of an industrial inspection drone, it is demonstrated how the resulting system is able to increase the drone's situation awareness about its own state and the state of the environment, and how the drone can use this information to make risk-based decisions. Additionally, it is demonstrated how evaluating past states with new information can reveal tasks the drone wrongly skipped that it should return to for another inspection attempt.
Future work can consider how the severity of a hazard occurring can be modeled, how evaluating past states could be simplified such that a longer time horizon can be considered, and how the approach can be validated experimentally.

Fig. 1 View of a very simple DDN developed with the proposed method where the focus is placed on the time dynamics. Objective nodes are shown in orange, failure cause nodes in light blue, condition nodes in dark blue, measurement nodes in green, and action nodes in gray. The current time step (t) is shown together with one earlier time step (t − 1) and one future time step (t + 1)

Fig. 2 Example network structure made using the proposed method. Compared to Fig. 1, this figure considers a more complicated model. In this figure, the focus is placed on the nodes present at each time step in the network and on the step presented in Section 4.1 in which each node is introduced. The circular arrows represent connections across time steps

Fig. 6 Scenario 1. The table shows the measurements available before inspection, the choice of action, and the resulting measurements. The graph shows the state of failure causes relevant in this scenario. The solid line shows the belief of the drone that a failure cause is present

Table 2
Probability of transitioning to worse states given the choice of action and whether an uncontrolled contact is avoided. The table gives the probability of degrading the state by different amounts. Transitions that lead to negative probabilities are omitted before the resulting distribution is normalized. The probability of transitioning to a better state is 0