Decision-making under uncertainty: be aware of your priorities

Self-adaptive systems (SASs) increasingly leverage autonomy in their decision-making to manage uncertainty in their operating environments. A key problem with SASs is ensuring that their requirements remain satisfied as they adapt. Trade-off analysis of the non-functional requirements (NFRs) is key to establishing balance among them. Further, when performing the trade-offs, it is necessary to know the importance of each NFR in order to resolve conflicts among them. Such trade-off analyses are often built upon optimisation methods, including decision analysis and utility theory. A problem with these techniques is that they use a single-scalar utility value to represent the overall combined priority of all the NFRs. However, this combined scalar priority value may hide information about the impacts of the environmental contexts on the individual NFRs' priorities, which may change over time. Hence, there is a need to support runtime, autonomous reasoning about the separate priority values of each NFR, using the knowledge acquired from the evidence collected. In this paper, we propose Pri-AwaRE, a self-adaptive architecture that makes use of the Multi-Reward Partially Observable Markov Decision Process (MR-POMDP) to perform decision-making for SASs while offering awareness of NFRs' priorities. The MR-POMDP is used as a priority-aware runtime specification model to support runtime reasoning and autonomous tuning of the distinct priority values of NFRs using a vector-valued reward function. We also evaluate the usefulness of our Pri-AwaRE approach by applying it to two substantial example applications from the networking and IoT domains.


Introduction
Self-adaptive systems (SASs) are systems that take dynamic adaptive decisions under uncertain environmental conditions to achieve their functional and non-functional requirements (NFRs) [26,30,51]. An example of a SAS would be an Internet of Things (IoT) network [2,61] that serves as a basis for the implementation of a cyber-physical system such as a smart home. These systems are continuously exposed to different environmental situations, such as communication interference and dynamic traffic loads, which may have different effects on the satisfaction levels of the NFRs. Such effects might include high energy consumption or poor packet delivery performance [27,35,52]. Hence, a SAS needs to adapt dynamically at runtime. These adaptations may involve trade-offs between the SAS's NFRs as the encountered environmental conditions change, sometimes in ways that deviate from those anticipated at design time.
Specification models exist that describe the decision-making process based on these trade-offs, and they include the alternative adaptation actions and the priorities of NFRs [20,29]. If these specification models, developed at design time, can also be used and updated at runtime, a SAS can be made requirements-aware [51] by monitoring its requirements' compliance during runtime. The specification model (S) is derived by analysts from the requirements and the domain knowledge (K). According to Zave and Jackson [63] (see Eq. 1), monitoring compliance with the requirements (R) can be done by monitoring compliance with the specification model (S).

S, K ⊢ R    (1)

It is safe to believe that S will remain a valid implementation of R if and only if K does not change from the moment that S was built until the moment requirements compliance is assessed. However, in a SAS there is uncertainty about K [10], which cannot be assumed to remain unchanged. On the upside, a SAS may provide opportunities to learn about its environment and so reduce K's uncertainty.
Several runtime optimisation techniques have been developed to support the decision-making process specified in S of SASs [1,6,9,16,18,33,59]. These techniques are based on optimisation methods, including decision analysis and utility theory [46,47], which select from a set of alternatives the adaptation that yields the highest utility value.
A problem with these techniques as used in SASs is that they typically use single-objective optimisation [44,55], i.e. they use a single-scalar cumulative utility value to represent a combined cardinal priority for all NFRs. However, the adaptive decisions taken by SASs can have different effects, either positive or negative, on the satisfaction levels of individual NFRs [30,55,59]. For example, in an IoT network, the decision to increase transmission power on the links in a situation of high interference will have a positive impact on the packet delivery performance, but it will harm energy consumption [61]. Single-objective techniques using a combined cardinal priority do not give any information about the different impacts of adaptive decisions on the satisfaction of individual NFRs. Further, these impacts may change as the SAS evolves over time, leading the SAS to evolve its adaptation strategies [13]. Hence, the priorities assigned at design time may no longer be valid at runtime due to unforeseen or emergent contexts, which in turn may lead to the violation of an NFR. In a nutshell, the limitations of current optimisation techniques are (i) that they treat NFRs' priorities as a single combined value, and (ii) that the assigned NFRs' priorities are considered fixed and unchanging.

Principal ideas
We argue that adaptation decisions need to be informed by the NFRs' priorities, something that cannot be achieved if the priorities are aggregated into a single combined value. However, it may turn out that the priorities assigned at design time are not appropriate, or even achievable, under certain conditions. The priorities may then need to be re-evaluated and changed, informed by the knowledge gained by the SAS in encountering such conditions.
Let us define priority-awareness.
Definition 1 Priority-awareness is the capability to autonomously change the priorities of NFRs in order to achieve their required satisfaction levels.
The compliance with the requirements (R), according to Zave and Jackson [63], can be achieved using a runtime specification model (S) that is equipped with the newly found knowledge (K') that has an effect on changing individual NFR priorities.

S, K′ ⊢ R    (2)
The key challenge here is to have a runtime specification model with the capability of: (1) modelling and reasoning with the cardinal priorities of individual NFRs; and (2) supporting the tuning of NFRs' priorities to better match newly discovered situations and acquired knowledge, while respecting their relative priorities.
Based on [63], next we present our main contributions towards addressing the specified research challenges.

Contributions
In this paper, we propose Pri-AwaRE, a self-adaptive architecture that uses an extension of the Multi-Reward Partially Observable Markov Decision Process (MR-POMDP) [54,55], called MR-POMDP++, as a runtime specification model (S) embedded within the MAPE-K loop. MR-POMDP++ is a multi-objective sequential decision-making technique that uses the concept of rewards to:

(a) support priority-aware decision-making by providing runtime modelling and reasoning about the priorities of individual NFRs using a vector-valued reward function. Hence, the decision-making takes into account the knowledge (K′) that allows MR-POMDP++ to re-evaluate the priorities;

(b) provide SASs with a principled way to maintain compliance with the requirements (R) by autonomously tuning the NFRs' priorities at runtime under uncertain environmental contexts.
We also provide a proof of concept by applying our proposed approach to two different example applications from the networking and IoT domains and comparing it to existing state-of-the-art techniques. Based on the experiments, we show that the priority-aware decisions offered by our Pri-AwaRE architecture support the satisfaction of NFRs through more informed choices of NFRs' priorities.

Organization of the paper
The paper is organized as follows: Sect. 2 explains baseline concepts related to decision-making in SASs. Section 3 presents the Pri-AwaRE architecture to support priority-aware decision-making in SASs. In Sect. 4, the experiments and evaluations are presented, followed by threats to validity in Sect. 5. Section 6 presents related work. Finally, the conclusions and future work are presented in Sect. 7.

Underlying concepts
In this section, we introduce the key concepts used in the paper. First, we define the decision-making process in SASs driven by NFRs [3,12,20]. We then describe the techniques and architectural concepts that we use in our work to ensure that NFRs are satisfied to an appropriate level that respects their respective priorities. These are: the Partially Observable Markov Decision Process (POMDP), the Multi-Reward Partially Observable Markov Decision Process (MR-POMDP), the Optimistic Linear Support (OLS) algorithm and the MAPE-K architecture.

Decision-making in SASs
SASs are continuously exposed to different environmental contexts that affect the satisfaction of NFRs. During the decision-making process of a SAS, the system performs different tasks that have different impacts on the satisfaction levels of NFRs. The decision-making process of a SAS involves the following key concepts [4]:

NFRs
The main objective of decision-making in a SAS is to satisfy its non-functional quality requirements, i.e. the NFRs, while fulfilling the functional goals [6]. The NFRs are associated with two important characteristics at runtime [20]:

- Satisfaction level: The satisfaction level of an NFR refers to the extent to which that NFR has been satisfied as a consequence of an action performed by the SAS and its level of satisfaction at the previous time step of execution. The satisfaction level can be represented by a conditional probability distribution P(NFRi is satisfied | action a).
- Priority: The priority value of an NFR is a scalar cardinal value that represents the importance of satisfying it at runtime. The priority of an NFR may change due to changes in environmental conditions at runtime.

Monitorables
As the environment in which the SAS operates is continuously changing, the SAS monitors these changes over time using monitorables, which represent information about the state of the environment. For example, the interference and traffic load on the network links can be monitored.

Actions
Actions are defined as the adaptation strategies comprising a discrete set of software configurations, solutions or service components from which a SAS selects during decision-making [5,49]. To achieve the target functional goals while satisfying the NFRs, the SAS selects an action [3,38] based on the monitorable values and on the satisfaction levels and priorities of the NFRs at a given point of time. The adaptation actions performed by SASs have an impact (positive or negative) on the satisfaction levels of NFRs.

POMDPs
The significance of POMDPs to SASs is that they are used to solve sequential decision-making problems under uncertain and dynamic environmental contexts [45,56]. They offer an agent-based decision-making approach that considers agents working in a partially observable environment. This means that the agent cannot directly observe the underlying state. Instead, to choose the optimal action, it must maintain a probability distribution over the set of possible states, known as a belief, based on a set of observations and observation probabilities.
An exact solution to a POMDP yields the optimal action for each possible belief over the possible states. The optimal action maximizes the expected reward of the agent over a possibly infinite horizon. The sequence of optimal actions is known as the optimal policy of the agent for interacting with its environment. The basic elements of a POMDP are shown in Fig. 1.
A POMDP is specified as a tuple ⟨S, A, Z, T, O, R, γ⟩ where: S is the set of states describing the state of the environment; A is the set of actions that the agent can select to perform at a particular time; Z is the set of observations, specifying the information received by the agent from the environment via sensors and relating to the set of states S; T is the transition function T(s, a, s′) = P(s′|s, a), specifying the probability of moving to the next state s′ given an action a and current state s; O is the observation function O(s′, a, z) = P(z|s′, a), specifying the probability of observing z given an action a and resulting state s′; R is the reward function R(s, a), specifying a scalar real value generated by the environment as feedback on the action a performed by the agent in state s; and γ is the discount factor, expressing the preference for immediate reward values over future rewards. The value of the discount factor lies between 0 and 1. The reward values are normally assigned by domain experts at design time, based on the information provided to them at design time or on previous experience [23,34,36].
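To illustrate how a belief over partially observable states is maintained, the following sketch performs the standard Bayes-filter belief update for a hypothetical two-state POMDP. All transition, observation, and belief values are made up for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical two-state POMDP with a single action a=0.
T = np.array([[[0.7, 0.3],    # T[a][s][s'] : P(s' | s, a)
               [0.2, 0.8]]])
O = np.array([[[0.9, 0.1],    # O[a][s'][z] : P(z | a, s')
               [0.2, 0.8]]])
b = np.array([0.5, 0.5])      # initial belief over the two states

def belief_update(b, a, z, T, O):
    """Bayes-filter update: b'(s') ∝ O(s',a,z) * sum_s T(s,a,s') b(s)."""
    b_pred = b @ T[a]             # predict: marginalize over current states
    b_new = O[a][:, z] * b_pred   # correct with the observation likelihood
    return b_new / b_new.sum()    # normalize back to a probability distribution

b1 = belief_update(b, a=0, z=0, T=T, O=O)
```

After observing z = 0, the belief shifts towards the state that makes that observation likely, which is the essence of quantifying uncertainty about the (partially) observed environment.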
The decision-making agent tries to find the policy π, a mapping from the state of the environment to an action, that maximizes the value function, i.e. the expected utility of the sum of discounted rewards:

V^π = E[ Σ_{t=0}^{∞} γ^t R(s_t, a_t) ]    (3)

Hence, based on the current state s_t of the system, the value function V^π is used to compute how much cumulative reward, discounted by γ, we can expect to accumulate in the future if a particular action is performed in a particular state at time t.
Thus, the reward value R(s,a) is used to evaluate the effect of performing an action a during the state s with the help of a value function.Therefore, a cardinal scale is assigned to each decision made during a specific state of the system to indicate its priority.
As the states in a POMDP are not fully observable, a belief b over the states of the system is maintained. A POMDP offers the capability to quantify uncertainty in terms of the (partially) observed state of the environment. Point-based planning methods [53,57] for solving POMDPs focus on computing a policy based on a sampling of points from the belief space; there, the value function over the belief, V_b, is represented by a set of α-vectors A. Each α-vector is associated with an action a and has length |S|, so as to provide a value for each state s. The α-vector is represented as follows:

α = ⟨V(s_1), V(s_2), ..., V(s_n)⟩    (4)

Here, V(s_i) represents the value of the value function for state s_i, given a total number of n states.
Thus, given A, the value over the belief is computed as:

V_b = max_{α ∈ A} Σ_{s ∈ S} b(s) α(s)    (5)

Therefore, for each belief b, the set of α-vectors A provides a policy π_A selecting the action that maximizes this value. The application of the selected actions by the decision-making agent leads to a change in the state of the system.
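The action selection over α-vectors can be illustrated with a small sketch. The α-vectors, action names, and belief below are illustrative values, not taken from the paper.

```python
import numpy as np

# Two illustrative α-vectors for a two-state POMDP, keyed by their action.
alphas = {"stay":  np.array([10.0, 0.0]),
          "adapt": np.array([2.0, 6.0])}

def best_action(b, alphas):
    """Policy over the belief: pick the action whose α-vector maximizes b · α."""
    return max(alphas, key=lambda a: b @ alphas[a])

b = np.array([0.3, 0.7])       # belief leaning towards the second state
a_star = best_action(b, alphas)
# b·α_stay = 3.0 and b·α_adapt = 4.8, so "adapt" is selected at this belief.
```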

MR-POMDPs
An MR-POMDP [44,54,55], similar to a POMDP, is a sequential decision-making technique, used to solve multi-objective decision problems. An MR-POMDP is in fact a POMDP that has more than one reward value, represented in the form of a vector-valued reward function R, as shown in Fig. 2. In MR-POMDPs, each objective (an NFR in our case) is associated with its own separate reward value. Hence, the size of the reward vector is equal to the number of objectives. As a consequence, the value function, given an initial belief, V_b0, for the policy π of the MR-POMDP, is also a vector. Thus, each element in the value function vector represents the expected utility value associated with a separate objective. As a result, it evaluates the effect of performing an action on the satisfaction of that objective given a particular state. The values of the elements in the value vector, associated with each objective, create a relative ranking of the objectives. Hence, these expected utility values represent the priority of the objectives for their satisfaction while making the decision. We use this built-in capability of MR-POMDPs to compute an expected utility value for each individual objective as the basis for the autonomous tuning of the priorities at runtime.
As R is a vector, each element in the α-vector is also a vector, as a result creating an α-matrix A. Each row in the α-matrix represents the values for the objectives in a particular state. The multi-objective value of taking an action a, associated with an α-matrix A, under a belief b is computed as follows:

V_b = Σ_{s ∈ S} b(s) A(s)    (6)

where A(s) denotes the row of A for state s. As the value function is represented as a vector in an MR-POMDP, there may be multiple policies. The value functions of these multiple policies can each be considered optimal on the basis of the different priorities associated with the objectives. In order to select the best policy among them, a scalarization function f(V_b, W) is used to scalarize the value vectors V_b with respect to the weights W corresponding to the objectives [43], computed by the agent as follows:

f(V_b, W) = Σ_{i=1}^{n} w_i V_b^i    (7)

where w_i and V_b^i refer to the weight and value for the i-th objective, given n objectives in total. The size of the weight vector W is equal to the number of objectives. In this paper, the weight vector values are computed at runtime using the Optimistic Linear Support (OLS) algorithm [44].
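The linear scalarization of Eq. 7 can be sketched as follows; the value vectors and weights are illustrative, not taken from the paper.

```python
import numpy as np

def scalarize(V_b, W):
    """Linear scalarization f(V_b, W) = sum_i w_i * V_b[i] (Eq. 7)."""
    return float(np.dot(W, V_b))

# Two candidate multi-objective value vectors, e.g. [RPL utility, MEC utility].
V_a = np.array([8.0, 1.0])
V_c = np.array([6.0, 5.0])
W = np.array([0.4, 0.6])   # weights over the two objectives

# Select the value vector (and hence policy) with the highest scalarized value.
best = max([V_a, V_c], key=lambda v: scalarize(v, W))
```

With these weights, the more balanced vector V_c scalarizes higher (5.4 versus 3.8), so its associated policy would be chosen.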
Hence, for a given belief b, an α-matrix for each action and a weight vector w, we can compute the policy π_A that takes the maximal scalarized value using Eqs. 6 and 7 as:

π_A = argmax_A f(V_b, w)    (8)

Hence, in an MR-POMDP the reward vector is used to represent the priorities of the objectives (NFRs in our case) by indicating their desirability in terms of satisfaction given a particular state of the environment. The reward vector has to be initialized with estimated values. These values are assigned by domain experts and should reflect the experts' knowledge and the information available to them. The design-time assignment of priorities is normal requirements practice. However, it is difficult to get right, particularly if, as is often the case for SASs, the priorities assigned to a set of requirements are not appropriate for all contexts the system encounters at runtime. Nor is the deployment of expert knowledge always easy, as highlighted in [58], where consensus between experts proved hard to achieve. To mitigate these difficulties, Pri-AwaRE permits requirements' priorities to be revised dynamically at runtime in a principled way. Nevertheless, the NFRs' satisfaction may be compromised if the initial, estimated reward values are poorly chosen. In our work, we have used simulations [21,50] to derive good initial values for the rewards.

Optimistic Linear Support
The Optimistic Linear Support (OLS) algorithm is based on Cheng's Linear Support approach [11] for solving POMDPs. OLS, presented as Algorithm 1, follows an outer-loop approach that wraps an outer shell around an MR-POMDP solver (Perseus [57]) to create a solution set known as the Convex Coverage Set (CCS) X [43], representing the collection of value vectors V^π (specifying the multi-objective values) and their associated policies π, such that after performing scalarization a maximizing policy is in the set. In order to select the policy with the maximizing value V*_X(w), linear scalarization of the value vector is performed using parameters in the form of a weight vector. OLS helps find these weights intelligently at runtime. The operation of the OLS algorithm on a two-objective problem is illustrated in Fig. 3.
The OLS algorithm starts with an empty set X denoting the CCS of value vectors, as shown in line 1 of Algorithm 1. The algorithm repeatedly executes steps 2 to 9 until no improved value vectors are found, as evaluated by the Maximal Possible Improvement Δ [42,44]. In the first two iterations of the while loop, the algorithm selects the first two corner points at the extrema of the weight simplex, i.e. w_a = 0.0 and w_b = 1.0, represented by the red vertical lines in Fig. 3. Next, the value vectors for these corner points (w_a and w_b), for example V_a = [8, 1] and V_b = [2, 7], are computed using an MR-POMDP solver and are represented by the blue lines between these corner points. On the basis of these value vectors, a new corner point (w_c) is identified at their intersection. Then, the Maximal Possible Improvement Δ [44] is computed for w_c. If Δ is an improvement, then for this new corner weight w_c a new value vector V_c = [6, 5] is calculated by calling the MR-POMDP solver, as shown in Fig. 3. More corner points, such as w_d and w_e, are generated from the intersection points of the existing value vectors. Again, the Maximal Possible Improvement Δ is calculated for each of w_d and w_e. The corner weight (out of w_d and w_e) with the higher Δ is selected, and the MR-POMDP solver is called again to compute the value vector for the selected weight. The process is repeated until none of the remaining corner weights yields an improvement in the form of Δ.
The process returns the set X, known as the CCS, of the value vectors and their associated policies. As more than one policy is returned, we select the best policy on the basis of the scalarization function, using the weight values generated as part of the OLS algorithm. The policy with the maximum scalarized value V*_X(w) is then selected. A flow chart representing the step-by-step execution of the OLS algorithm is shown in Fig. 4. More details on OLS and the MR-POMDP solver can be found in [48].
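The corner-weight computation at the heart of OLS can be illustrated for two objectives. The helper below finds the weight at which the scalarized values of two value vectors intersect, using the V_a = [8, 1] and V_b = [2, 7] example above; the function itself is our own sketch, not the paper's implementation.

```python
def corner_weight(Va, Vb):
    """Weight w on objective 0 where the two scalarized values intersect:
    w*Va[0] + (1-w)*Va[1] = w*Vb[0] + (1-w)*Vb[1]."""
    num = Vb[1] - Va[1]
    den = (Va[0] - Va[1]) - (Vb[0] - Vb[1])
    return num / den

# With the paper's example vectors, the new corner point sits at w_c = 0.5.
w_c = corner_weight([8, 1], [2, 7])
```

At w_c = 0.5 both vectors scalarize to 4.5; if the solver then returns V_c = [6, 5], its scalarized value of 5.5 yields a positive improvement Δ, so the loop continues, exactly as in the Fig. 3 walkthrough.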
The OLSAR algorithm is an extension of the OLS algorithm. It follows the same steps as OLS but reuses the alpha matrices from previous iterations to compute an approximate Convex Coverage Set (CCS) of value vectors.

MAPE-K architecture
The MAPE-K control loop is an architectural blueprint for autonomic computing systems and was first introduced by IBM in its vision of Autonomic Computing [25]. As SASs represent a specialized form of autonomic system, they also employ the MAPE-K architecture loop. This architecture consists of a managing system and a managed system. The managed system corresponds to the application logic, while the managing system corresponds to the decision-making agent, which makes use of a feedback loop with the phases Monitor-Analyse-Plan-Execute running over a knowledge base, known as the MAPE-K loop for short. During the Monitor phase, the managing system collects data from the managed system via sensors connected to the managed system. In the Analyse phase, the managing system analyses the monitored data to check whether adaptation actions are required. If an adaptation action is required, during the Plan phase the decision-making agent plans the adaptive actions that will be performed by the managing system, with the goal of achieving the operational goals exhibited by the managed system along with satisfaction of the NFRs. Lastly, during the Execute phase, the planned actions are carried out through actuators on the managed system. The knowledge K of the MAPE-K loop represents the data required by the managing system to execute all the phases of the loop. The runtime model based on the MR-POMDP is embedded in the MAPE-K architecture loop to underpin decision-making for SASs.
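As a rough illustration, the loop above can be sketched as a minimal Python skeleton. The class and method names, the probe/effector interface, and the analysis rule are illustrative assumptions, not Pri-AwaRE's actual API.

```python
class ManagedSystem:
    """Stub managed system exposing illustrative probe/effector interfaces."""
    def __init__(self):
        self.state = {"interference": "high"}
        self.applied = []

    def probe(self):
        return dict(self.state)       # sensor data for the Monitor phase

    def effect(self, action):
        self.applied.append(action)   # actuator called in the Execute phase

class ManagingSystem:
    """Minimal MAPE-K skeleton; the knowledge base K is just a dict here."""
    def __init__(self, knowledge):
        self.K = knowledge

    def monitor(self, managed):
        return managed.probe()                       # Monitor: collect data

    def analyse(self, data):
        return data["interference"] == "high"        # Analyse: adaptation needed?

    def plan(self, needs_adaptation):
        return ["ITP"] if needs_adaptation else []   # Plan: choose adaptive actions

    def execute(self, managed, actions):
        for a in actions:                            # Execute: actuate
            managed.effect(a)

    def run_once(self, managed):
        self.execute(managed, self.plan(self.analyse(self.monitor(managed))))

m = ManagedSystem()
ManagingSystem(knowledge={}).run_once(m)
```

In Pri-AwaRE, the Plan step is where the MR-POMDP-based runtime model would sit instead of the hard-coded rule shown here.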

Pri-AwaRE: self-adaptive architecture
This section presents the Pri-AwaRE architecture to support runtime decision-making for SASs.The proposed architecture makes use of MR-POMDP, as a priority-aware runtime model, as part of the phases of the MAPE-K loop.
Next, we present the Priority-Aware MR-POMDP++ and its use as part of the proposed self-adaptive architecture to support decision-making in SASs using an illustrative example of the IoT network.

Illustrative example
In order to illustrate our proposed approach, we consider the example of a self-adaptive Internet of Things (IoT) network [2,61]. As a concrete application, we consider the simulation environment of DELTA-IoT, an exemplar SAS representing an IoT network for a smart campus [21]. The simulator represents a multi-hop IoT network consisting of 15 motes, including RFID sensors, passive infrared sensors and temperature sensors, distributed across the various buildings of the KU Leuven campus. These motes, based on LoRa (Long-Range) radio communication, are deployed in each building to provide access to the laboratories, monitor the occupancy status and sense the temperature. The motes communicate with each other to fulfil the functional goal of relaying information to the central gateway deployed at the central monitoring facility of the campus. IoT networks are required to survive for a long period of time on a single battery while maintaining communication reliability. Hence, the main goal for an IoT network is to increase the lifetime of the network by satisfying the NFRs of Minimization of Energy Consumption (MEC) and Reduction of Packet Loss (RPL).

MR-POMDP++
In this section, we present MR-POMDP++, a priority-aware runtime model, to represent the priorities and satisfaction levels of NFRs and to perform decision-making in a SAS, as shown in Fig. 5. Next, we present the rules for representing NFRs in the form of MR-POMDPs to support runtime decision-making in SASs.

(1) NFR satisfaction and MR-POMDP states
In order to achieve satisfaction of NFRs, a SAS is required to take adaptation actions that can have different effects (good or bad) on their satisfaction. Therefore, NFRs cannot be labelled as fully satisfied or fully violated. Due to this lack of crispness in the satisfaction of NFRs, their satisfaction levels cannot be represented as an absolute value of True or False [19]. Instead, the satisfaction levels can be modelled in the form of probability distributions such as P(NFR = True), and an NFR is considered satisfied if it meets an acceptability threshold constraint defined by the design experts [58]. For example, in the IoT case, the satisfaction level of RPL can be specified as P(RPL = True) = 0.9 or P(RPL = True) = 0.4. RPL can be considered highly satisfied if P(RPL = True) ≥ 0.8, where 0.8 is the required threshold constraint.
Such a specification of the satisfaction levels of NFRs can be represented using the states of an MR-POMDP. In MR-POMDP++, we consider each state to represent a combination of the satisfaction levels of the NFRs. As the states in an MR-POMDP are not directly observable, a belief (i.e. a probability) over each state is maintained. Hence, the satisfaction levels of NFRs can be expressed in the form of marginalized probability distributions P(NFR_i = True), where NFR_i belongs to the set of NFRs [38].
On the basis of this description, we derive a mapping rule as follows:

Rule 1: A state s ∈ S in MR-POMDP++ represents a combination of the satisfaction levels of the non-functional requirements (NFR_1 ... NFR_n). As the states in MR-POMDP++ are partially observable, the satisfaction levels of the NFRs are represented in the form of probability distributions P(NFR_i = True).
These probabilities can be used to conclude whether the satisfaction levels meet the acceptability thresholds. Using Rule 1, the total number of states in terms of the satisfaction levels of NFRs in MR-POMDP++ can be computed as |S| = 2^|NFR|, where |S| is the size of the set S, |NFR| is the number of NFRs, and 2 corresponds to the values True and False. For example, in an IoT network, if we consider the 2 NFRs of Minimization of Energy Consumption (MEC) and Reduction of Packet Loss (RPL), the number of states for MR-POMDP++ becomes |S| = 2^2 = 4, as shown in Table 1.
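The state enumeration of Rule 1, together with the marginalization of a belief into per-NFR satisfaction probabilities, can be sketched as follows; the NFR names follow the paper's IoT example, while the belief values are illustrative.

```python
from itertools import product

# Enumerate MR-POMDP++ states as combinations of NFR satisfaction levels
# (Rule 1): |S| = 2^|NFR| = 4 states for the two NFRs MEC and RPL.
nfrs = ["MEC", "RPL"]
states = list(product([True, False], repeat=len(nfrs)))

# A belief assigns a probability to each state; these numbers are made up.
belief = dict(zip(states, [0.5, 0.3, 0.1, 0.1]))

# Marginalize the belief into per-NFR satisfaction probabilities P(NFR_i = True).
p_sat = {nfr: sum(p for s, p in belief.items() if s[i])
         for i, nfr in enumerate(nfrs)}
# e.g. p_sat["MEC"] = 0.5 + 0.3 = 0.8 and p_sat["RPL"] = 0.5 + 0.1 = 0.6
```

These marginal probabilities are what would be compared against the acceptability thresholds discussed above.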

(2) NFR priorities and reward vectors
During the decision-making process in SASs, the adaptation decisions should take into account the satisfaction priorities of individual NFRs.Priorities of NFRs indicate their level of importance for satisfaction.The higher the priority, the more important it is to satisfy that NFR at a particular point of time.
MR-POMDPs facilitate the modelling of these individual NFR priorities using a reward vector. In MR-POMDPs, a vector-valued reward function is used to associate a separate reward value with each objective (an NFR in our case). The reward values are generated as a feedback signal according to the decisions (adaptation actions) taken by the MR-POMDP. The reward value associated with a particular objective indicates the effect, either positive or negative, of performing an action on the satisfaction of that objective. Hence, the reward vector values specify a relative ranking of the objectives (NFRs) in terms of the cardinal effect that an action will have on the satisfaction of each objective (NFR) under an uncertain environmental condition. Consequently, a higher reward value for an objective indicates its higher priority (importance level), which is taken into account when an adaptation decision is taken by the MR-POMDP.
For example, in an IoT network, if the communication interference at a particular point of time is high, the decision-making agent might select the adaptation action of increasing transmission power (ITP) in order to support the NFR of RPL. However, increasing the transmission power might have a negative effect on the energy consumption. Therefore, given the current environmental context of high link interference, and on the basis of the selected action of ITP, the system will generate an immediate reward for RPL (e.g. 75) that is higher than the reward for MEC (which could be, e.g., −50).
The reward value for RPL is higher than the reward value of MEC because it is more important to satisfy RPL at this point of time according to the MR-POMDP, given the current conditions.
Hence, the reward values, represented in the form of a reward vector, show a relative ranking of the NFRs in terms of the effect that the action will have on their satisfaction. The reward vector values are initially assigned by the requirements engineers or domain experts [58]. In MR-POMDP++, we consider these reward values as the initial expert-defined priorities for the NFRs.
Using these concepts, the priority representation for NFRs in MR-POMDP++ is captured by the following rule:

Rule 2: The vector-valued reward function in MR-POMDP++ is

R(s, a) = ⟨R_NFR1, R_NFR2, ..., R_NFRn⟩

where R_NFR1 represents the reward for NFR_1, corresponding to the priority value of NFR_1, and so on.
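A minimal sketch of such a vector-valued reward function follows, using the high-interference ITP example above. The dictionary state encoding and the zero rewards for the remaining cases are our own assumptions for illustration; the ITP numbers (75 for RPL, −50 for MEC) follow the paper's example.

```python
# Vector-valued reward R(s, a) returning [R_RPL, R_MEC].
def reward(state, action):
    if action == "ITP" and state.get("interference") == "high":
        return [75.0, -50.0]   # RPL is rewarded, MEC is penalized
    return [0.0, 0.0]          # placeholder for the other (state, action) pairs

r = reward({"interference": "high"}, "ITP")
```

The ordering of the two components is what gives the decision-maker the relative ranking of RPL over MEC in this context.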

(3) Expected utility values and autonomous tuning of priorities
In SASs, the priorities of NFRs may vary according to changes in the environmental context. The ability to autonomously adapt to the context is what MR-POMDP++ offers. For example, if energy conservation is the most highly prioritized NFR, but at some point in time the batteries become fully charged, then rewarding that NFR's satisfaction may no longer be the optimal behaviour for the system. In order to deal with such situations, MR-POMDP++, with the help of reward vectors, offers an opportunity for autonomously tuning the individual NFRs' priorities. This tuning of priorities is done by computing a separate expected utility value for each NFR during the operation of the MR-POMDP (using Eq. 3), as follows:

V = ⟨V_NFR1, ..., V_NFRn⟩, with V_NFRi = E[ Σ_{t=0}^{∞} γ^t R_i(s_t, a_t) ]    (9)

where V_NFRi and R_i represent the expected utility value and reward value for NFR_i. As presented in Eq. 9, the rewards, representing the initial expert-defined priorities, are used for the computation of a distinct expected utility value for each NFR at runtime.
Hence, these expected utility values represent the newly tuned priority values of the different NFRs. The expected utility values consider the individual effect of performing an action on the satisfaction of an individual NFR given an uncertain environmental context, and are considered while making the decisions at runtime. More details about the autonomously tuned priorities are provided in Sect. 4.
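In the same spirit, the per-NFR expected utilities of Eq. 9 can be sketched as discounted returns over a trajectory of vector rewards; the trajectory values and discount factor are illustrative.

```python
# Per-NFR expected utilities as discounted returns: V_NFRi = sum_t γ^t * R_i,t.
def per_nfr_utilities(reward_traj, gamma=0.9):
    n = len(reward_traj[0])                      # number of NFRs
    return [sum(gamma ** t * r[i] for t, r in enumerate(reward_traj))
            for i in range(n)]

# Three steps of [R_RPL, R_MEC] rewards: high interference, then recovery.
traj = [[75, -50], [75, -50], [10, 40]]
V_RPL, V_MEC = per_nfr_utilities(traj)
# The higher V_RPL reflects a higher tuned priority for RPL at this point.
```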
Hence, MR-POMDP++ as a runtime specification model (S) takes into account the new knowledge (K') about the priorities of the individual NFRs to perform runtime decision-making for SASs for the purpose of conformance to the Requirements (R).
Next, we present the proposed self-adaptive architecture, Pri-AwaRE, that makes use of Priority-Aware MR-POMDP++ to perform decision-making for SASs.

Architecture for decision-making in SASs
The proposed Pri-AwaRE architecture for SASs is inspired by the feedback architecture of the reinforcement learning (RL) process, based on the interactions between the decision-making agent and the environment in which it operates [31]. On the basis of the observations monitored about the current state of the environment, the decision-making agent performs an action in order to achieve the desired goal. This process repeats continuously. Considering the above, we define an architecture for SASs consisting of two components corresponding to the managed system (i.e. the environment in the RL process) and the managing system (i.e. the decision-making agent in the RL process), respectively, as shown in Fig. 6.
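The monitor-decide-execute interaction described above can be sketched as a simple loop. The two stub classes below are hypothetical stand-ins (not the paper's implementation) for the managed system and the MR-POMDP++ decision-making agent:

```python
class ManagedSystem:
    """Hypothetical stand-in for the managed system (e.g. an IoT network)."""
    def __init__(self):
        self.state = 0.0
        self.log = []

    def probe(self):            # monitoring interface: expose current state
        return self.state

    def effect(self, action):   # effector interface: apply the chosen action
        self.state += action
        self.log.append(action)

class Agent:
    """Hypothetical stand-in for the decision-making agent."""
    def decide(self, observation):  # analysis and planning step
        return -1.0 if observation > 0 else 1.0

def feedback_loop(managed, agent, steps):
    """Monitor -> analyse/plan -> execute, repeated continuously."""
    for _ in range(steps):
        observation = managed.probe()
        action = agent.decide(observation)
        managed.effect(action)
```

The real architecture replaces `Agent.decide` with the MR-POMDP++ analysis and planning steps, but the feedback structure is the same.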
The Pri-AwaRE architecture structures the managing system (based on a feedback loop) interacting with the managed system using probe and effector interfaces [50], as shown in Fig. 6. Our Pri-AwaRE approach focuses mainly on the aspects of the managing system.
In the next subsection, we present the components of the Pri-AwaRE architecture, namely the managed system and the managing system (the decision-making agent). We also present the use of MR-POMDP++ to support priority-awareness, by providing support to model and represent the satisfaction levels and priorities of NFRs.
(1) Managed system: The managed system represents the actual environment for which we want to implement self-adaptive capabilities. It is instrumented with probe and effector components that are used to send information to and from the managing system.
For example, an IoT network operating according to its predefined settings can be considered a managed system. To add self-adaptive capabilities, we have to attach an external managing system that can interact with the network using probing and effector interfaces [21,50]. More details on the interaction of the managing and managed systems are provided in [50]. For our study, we have selected the remote data mirroring (RDM) and IoT network environments represented by RDMSim [50] and DELTA-IoT [21], respectively, to act as the managed system.
(2) Managing system: The managing system consists of the following components: (a) Monitoring component: The monitoring component of the managing system uses sensors of the managed system to obtain data regarding the monitorable values (e.g. communication interference and traffic load on the links in an IoT network). The monitored values are sent as an input to the MR-POMDP++ model in the form of observations.

(b) Analysis and planning components
The analysis and planning components of the managing system make use of the steps of the MR-POMDP++ process, as shown in Fig. 6. The MR-POMDP++ model takes as input the observations from the monitoring component and the current belief over the states (maintained by MR-POMDP++) from its runtime knowledge. The MR-POMDP++ model analyses these values to plan the adaptation actions for the fulfilment of the target operational goals and the satisfaction of the NFRs, such as minimization of energy consumption and improvement in packet delivery performance in the case of the IoT network.

(c) Execution component
The execution component takes the action prompted by MR-POMDP++ as input and performs that action on the managed system using the effectors or actuators of the managed system, thereby meeting the operational functional goals and satisfying the NFRs to comply with the requirements (R).

(d) Knowledge component
The knowledge component is based on the runtime knowledge maintained by MR-POMDP++, which is taken into account during the decision-making process.
The process is performed continuously during the SAS's execution.
Next, we present the application of Pri-AwaRE, our Priority-Aware self-adaptive architecture, to the example application of DELTA-IoT network.

Pri-AwaRE architecture for decision-making in the IoT network
The Pri-AwaRE architecture for the case of IoT networks consists of the following two components: (1) The managed system represents an IoT network. As a concrete example, we consider an IoT network for a smart campus represented by the DELTA-IoT simulator [21].
(2) The managing system based on the constructs of MAPE-K loop and MR-POMDP++ consists of the following components:

(b) Analysis and planning components
To support priority-awareness, the steps of the MR-POMDP++ process are used to support analysis and planning. The monitored SNR value is sent as an input, in the form of an observation, to MR-POMDP++. Then, MR-POMDP++ analyses the observed SNR value and the current belief over the states maintained by the runtime model of MR-POMDP++ (i.e. the Knowledge K), and plans the selection of the next suitable adaptation strategy in the form of an action to be performed by the system. The knowledge component of the MAPE-K loop comprises the runtime knowledge maintained by MR-POMDP++.
The components of the MR-POMDP++ for the considered DELTA-IoT network are explained next.
States: According to Rule 1, for the two NFRs (MEC and RPL), four states are identified and are shown in Table 1.
Actions: These represent the adaptation strategies used to maintain the satisfaction of the NFRs RPL and MEC: Increase Transmission Power (ITP) and Decrease Transmission Power (DTP). ITP supports RPL by increasing the communication range [27,28] of the motes, along with adjusting the distribution factor on the links. As the communication range is directly proportional to the transmission power, increasing the communication range leads to improved packet delivery performance at the cost of high energy consumption [21]. The action DTP supports MEC by decreasing the communication range, along with adjusting the distribution factor of the links. A decrease in the communication range leads to a decrease in transmission power and, as a result, to a lower level of energy consumption.
Rewards: A vector-valued reward function represents the priorities of the NFRs from the perspective of the experts. Therefore, the reward vector has a size of 2, represented as R(s,a) = [R_MEC, R_RPL], where R_MEC represents the priority value for MEC and R_RPL represents the priority value for RPL at runtime. The reward vector values (provided by the experts) for the NFRs in the DELTA-IoT network are shown in Table 2.
Transition Function: According to Rule 1, the states are represented as a combination of NFRs in MR-POMDP++. The transition probabilities T(s,a,s') are factored into marginal conditional probabilities of the NFRs, P(MEC' | MEC, a) and P(RPL' | RPL, a), using the property of conditional independence and Bayes' rule [38]:

T(s,a,s') = P(MEC' | MEC, a) * P(RPL' | RPL, a)

The transition probabilities for going from one state to another as a result of an action for the IoT case are shown in Table 3. These transition probabilities are provided by the experts.
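The factored transition function can be sketched as follows. The marginal probabilities below are hypothetical placeholders (the actual values are in Table 3); the point is that the joint transition probability is the product of the per-NFR marginals:

```python
# Hypothetical marginals P(NFR' = True | NFR, action); values illustrative only.
P_MEC = {("ITP", True): 0.3, ("ITP", False): 0.2,
         ("DTP", True): 0.9, ("DTP", False): 0.8}
P_RPL = {("ITP", True): 0.9, ("ITP", False): 0.8,
         ("DTP", True): 0.4, ("DTP", False): 0.3}

def transition(mec, rpl, action, mec_next, rpl_next):
    """T(s, a, s') factored as P(MEC' | MEC, a) * P(RPL' | RPL, a),
    where a state s is the pair (MEC satisfied?, RPL satisfied?)."""
    p_mec = P_MEC[(action, mec)]
    p_rpl = P_RPL[(action, rpl)]
    p_mec = p_mec if mec_next else 1.0 - p_mec
    p_rpl = p_rpl if rpl_next else 1.0 - p_rpl
    return p_mec * p_rpl
```

A quick sanity check on such a factorization is that, for any state and action, the probabilities over all four successor states sum to one.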
Observations: As the states of the NFRs are not directly observable, we use monitorables to obtain the observations required for monitoring the NFRs' satisfaction levels based on information obtained from the environment. In the case of the IoT network, we consider the monitorable SNR to observe link interference [21]. Depending on the possible set of values of SNR, i.e. less than, greater than or equal to zero, we have three types of observations. The higher the value of SNR, the stronger the signal strength of the link; therefore, it indicates less interference, and vice versa.
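The three-way SNR discretization described above can be sketched as a simple mapping (the observation labels are hypothetical names, not identifiers from the paper):

```python
def observe(snr: float) -> str:
    """Map a monitored SNR value to one of three observation categories.
    Higher SNR means stronger signal and therefore less link interference."""
    if snr > 0:
        return "SNR_POSITIVE"   # strong signal, low interference
    if snr == 0:
        return "SNR_ZERO"
    return "SNR_NEGATIVE"       # weak signal, high interference
```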
Considering the given set-up, the MR-POMDP++ serves as a priority-aware model to analyse the given observations and plan for the next suitable action.

(c) Execution component
The execution component executes the selected action on the managed system by using the effector component of the DELTA-IoT simulator.

Experimental evaluations
We have conducted experiments using two different example applications from the networking domain as evidence of the generality of our Pri-AwaRE approach. The example applications were selected on the basis of the type of decisions, i.e. global or local, considering a network environment composed of nodes. Global decisions affect the entire network configuration, whereas local decisions affect the link configurations associated with a particular node.
The first example application is based on decision-making in a self-adaptive IoT network. In the IoT network, local decisions are taken for each sensor based on local contextual information retrieved from the links associated with it. Our second example application is related to a self-adaptive remote data mirroring (RDM) network. In the RDM system, global decisions for topology change are taken based on the analysis of global contextual information (such as the total number of active network links) to satisfy the NFRs. We have also compared our Pri-AwaRE approach with the existing single-objective approaches RE-STORM [38] (implemented using the DESPOT and Perseus solvers) and RE-STORM-ARROW [39]. The experiments were performed on a Lenovo ThinkPad with an Intel Core i7 (8th Gen) processor and 16 GB of RAM. A complete account of the results associated with the RDM case study is reported in [48].

MR-POMDP solver
As solving an MR-POMDP is computationally intractable, we have used the Optimistic Linear Support with Alpha Re-use (OLSAR) algorithm, based on a point-based MR-POMDP solver, to generate approximate solutions by performing approximate backups and computing α-vectors only for a set of sampled belief values. It has proven to scale well in the experiments performed. The reuse of the alpha matrices also makes the algorithm efficient.

Experimental hypotheses
In this section, we define our hypotheses for the experiments. Let us revisit the concept of priority-awareness as defined in Definition 1.
Based on the above concept of priority-awareness, the null (H0) and alternative (Ha) hypotheses are described as follows:
H0: Pri-AwaRE does not improve decision-making under uncertainty in comparison with single-objective optimization techniques, which do not offer priority-awareness.
Ha: Pri-AwaRE improves decision-making under uncertainty in comparison with single-objective optimization techniques, which do not offer priority-awareness.
Next, we provide a description of the example applications and experimental evaluations to test the hypotheses using these example applications.

Example application 1: Local decision-making for IoT network
As our first example application, we have used the simulation environment of DELTA-IoT, an exemplar self-adaptive system representing an IoT network for a smart campus [21].
Next, we present the initial set-up for the experiments using the DELTA-IoT exemplar.

Initial set-up
In the DELTA-IoT network, each timestep corresponds to 15 minutes of network activity [21]. For the current set of experiments, at each timestep we execute the MR-POMDP++ model for each mote (mote ids 2 to 15) individually to make the local decision of performing the action ITP or DTP according to its monitored link interference values.
For the experiments with the DELTA-IoT network, we focus on the NFRs MEC and RPL, which represent the NFRs concerned with the quality and performance [19] of the network. For the initialization of the MR-POMDP++ model, the states, reward vector values and transition function probabilities are presented in Tables 1, 2 and 3, respectively. Further, to evaluate the approach, we have compared the results in terms of the real values from the DELTA-IoT simulator for the satisfaction of the two NFRs, MEC and RPL.

Requirements specification
Defined by the experts [21], the following set of requirements (R) regarding the satisfaction levels of the NFRs for the DELTA-IoT network is considered:
R1: Total energy consumption in the network should be less than or equal to 20 coulombs, i.e. SAT_MEC <= 20.
R2: Total packet loss in the network should be less than or equal to 20%, i.e. SAT_RPL <= 0.20.
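The two requirements can be expressed as a simple compliance check (a sketch; the function name and interface are ours, not from the paper):

```python
def requirements_satisfied(sat_mec: float, sat_rpl: float) -> bool:
    """R1: total energy consumption <= 20 coulombs (SAT_MEC <= 20).
    R2: total packet loss <= 20% (SAT_RPL <= 0.20)."""
    return sat_mec <= 20.0 and sat_rpl <= 0.20
```

Such a predicate is useful when post-processing simulation traces to count at how many timesteps the network complied with R.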

Experiments
The experiments have been performed to demonstrate the following two cases for evaluation of the hypotheses:

Case 1: Priority-aware decisions and autonomous tuning of NFRs' priorities
We have executed the MR-POMDP++ model for each mote in the network individually (mote ids 2 to 15) at each timestep. The model monitors the link interference on the outgoing links of a particular mote and decides to increase or decrease the transmission power on the links by increasing or decreasing the communication range. Thus the model takes a local decision for each mote at a particular timestep in order to configure its corresponding links, as shown in Table 4. For example, at timestep t1, all the links for mote 2 are first configured as a result of the action DTP. As a consequence, the satisfaction levels for MEC and RPL become 33.961559 and 0.041667, respectively. The model is then executed for mote 3, mote 4 and so on.
The initial NFRs' priorities, represented in the form of rewards, are taken into account while taking the adaptive decisions of action selection. Let us study how these priorities have an impact on the decision of action selection for the purpose of link configuration for a particular mote in the DELTA-IoT network, as shown in Table 4. Considering the case of timestep t1, the expected utility values for the NFRs are taken into account during the decision-making process at runtime. For example, for mote 2 the expected utility value for MEC (764.171898) is higher than that for RPL (605.188297). As the expected utility favours MEC, the action selected for mote 2 is DTP, which supports MEC. In contrast, for the configuration of the links of mote 7, the action ITP is selected on the basis of the higher expected utility value for RPL (788.205554) compared to that for MEC (650.420788). This decision increases the satisfaction level of RPL from 0.083333 to 0.0, representing no packet loss, as shown in Table 4. Hence, the expected utility values, which represent the autonomously tuned NFRs' priorities, have an impact on the decision of action selection, leading to better-informed, priority-aware decision-making. This provides evidence that Pri-AwaRE supports priority-aware decision-making.
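One way to read the selection rule described above is: choose the action that supports the NFR with the highest autonomously tuned expected utility. A sketch of that rule (the mapping from NFR to supporting action follows the text; the function itself is our simplification of the solver's behaviour, not the paper's algorithm):

```python
# Each NFR is supported by one adaptation action (per the DELTA-IoT set-up).
ACTION_FOR_NFR = {"MEC": "DTP", "RPL": "ITP"}

def select_action(expected_utility):
    """Pick the action supporting the NFR whose tuned expected
    utility value is highest at the current timestep."""
    best_nfr = max(expected_utility, key=expected_utility.get)
    return ACTION_FOR_NFR[best_nfr]
```

Applied to the figures quoted above, mote 2 at t1 (MEC utility 764.171898 vs. RPL utility 605.188297) yields DTP, and mote 7 (RPL utility 788.205554 vs. MEC utility 650.420788) yields ITP.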
As a result of all the link configurations performed for all the motes, at the end of timestep t1 the satisfaction levels of MEC and RPL become 33.083955 and 0.009091, respectively. The same procedure is repeated at each timestep, leading to higher levels of satisfaction for both MEC and RPL, as shown in Fig. 7. Hence, our approach shows promising results in terms of satisfying both MEC and RPL.

Case 2: Impact of priority-aware decisions on satisfaction levels of NFRs
We have also studied the impact that the Pri-AwaRE approach, performing priority-aware decisions, has on the satisfaction levels of the NFRs of the DELTA-IoT network compared to the network operating without any adaptive approach. Furthermore, we also compare our results with the RE-STORM approach [38] implemented using Perseus [57], a single-objective POMDP solver.
During the execution of the simulator without adaptation, the DELTA-IoT network focuses on the satisfaction of RPL, at the cost of a high value of energy consumption, by keeping the overall packet loss below the specified threshold of 20%, as shown in Fig. 7. In this case, the satisfaction level for MEC ranges between 25.0 and 44.0 coulombs, which is considerably higher than the satisfaction threshold. On the other hand, by applying the adaptation mechanism offered by our Pri-AwaRE approach, the DELTA-IoT network showed an improvement in terms of compliance with the requirements SAT_MEC <= 20.0 and SAT_RPL <= 0.20.
Moreover, our approach also shows better satisfaction levels for the NFRs MEC and RPL in comparison with the network working under the adaptive decision-making offered by RE-STORM. As Fig. 7 shows, Pri-AwaRE gives promising results in terms of the satisfaction of RPL compared to RE-STORM: RE-STORM shows higher levels of packet loss than Pri-AwaRE, exceeding the satisfaction threshold of 0.20 more frequently. On the other hand, Pri-AwaRE shows comparable results to RE-STORM in terms of the satisfaction of MEC. In the case of Pri-AwaRE, the satisfaction level of MEC remains below, or close to, the satisfaction threshold at almost all timesteps. The satisfaction of MEC starts with a rather high value of 33.083955 coulombs at the first timestep, but then improves, achieving a satisfaction level of 17.226979 coulombs at the second timestep and thereby complying with the requirements specification SAT_MEC <= 20, as shown in Fig. 7.

Summary of findings in experiments
In summary, our Pri-AwaRE approach shows compliance with the requirements specifications, as evident from the average satisfaction levels of MEC and RPL presented in Fig. 8. We have performed a confidence interval test to test our hypotheses. The reported results show, with a confidence level of 95%, that the average satisfaction level of MEC lies within the confidence interval of 17.6989 to 18.0229 with a standard error of 0.0826. Similarly, the 95% confidence interval for the average satisfaction level of RPL lies between 0.1381 and 0.1457 with a standard error of 0.0019, as shown in Table 5. As a consequence, this verifies conformance to the requirements specification of SAT_MEC <= 20 and SAT_RPL <= 0.20. We can conclude that our priority-aware approach offers statistically sound results in terms of meeting the satisfaction levels of the NFRs. Hence, this evidence rejects our hypothesis H0 and supports our hypothesis Ha.
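For reference, a 95% confidence interval of the kind reported above can be computed from the sample mean, standard deviation and sample size (a generic sketch with standard normal-approximation formulas; the inputs below are illustrative, not the paper's data):

```python
import math

def confidence_interval(mean, std, n, z=1.96):
    """Normal-approximation CI for a sample mean:
    mean +/- z * (std / sqrt(n)), with z = 1.96 for 95% confidence.
    Returns (lower bound, upper bound, standard error)."""
    se = std / math.sqrt(n)
    return mean - z * se, mean + z * se, se
```

Checking that the whole interval lies on the compliant side of a threshold (e.g. entirely below SAT_MEC = 20) is what justifies the "statistically sound" compliance claim.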

Discussion
From the results, it can be deduced that Pri-AwaRE shows a significant improvement in the satisfaction of the NFRs compared to both the network operating without an adaptive mechanism and the network with the decision-making support of the RE-STORM approach. The usage of MR-POMDP++ helps in maintaining compliance with the requirements (R) for the DELTA-IoT network. The average satisfaction levels for the NFRs MEC and RPL, generated by Pri-AwaRE, are 17.860959 and 0.141865, respectively, as shown in Fig. 8. Hence, the DELTA-IoT network conforms to the requirements specification of SAT_MEC <= 20 and SAT_RPL <= 0.20. Our approach thus shows comparable, and sometimes even better, satisfaction levels for the NFRs than RE-STORM, which is representative of single-objective approaches.
To further evaluate our results for the DELTA-IoT network, we have computed the extent of satisfaction (ExS) of the NFRs using the quantification tool DeSiRE [14], as shown in Fig. 9. In DeSiRE, the value of zero is the satisfaction boundary between positive and negative degrees of satisfaction represented by ExS values. In the case of Pri-AwaRE, the ExS value for MEC remains positive at almost all timesteps, representing positive degrees of satisfaction for MEC; at a few exceptional timesteps the ExS drops below zero, but the drop is very minor and stays close to the satisfaction boundary. Pri-AwaRE also shows promising results in terms of the satisfaction of RPL, with ExS values above zero indicating positive degrees of satisfaction. At some timesteps, however, the ExS value for RPL goes below zero, indicating a negative degree of satisfaction; this deviation reaches at most a value between -1.5 and -2.0, which is not far from the satisfaction boundary. The analysis of ExS values creates an opportunity for the RELAXation of a particular NFR [60] to temporarily benefit the satisfaction of others. For example, if MEC has a high ExS value at a particular timestep while RPL is slightly violated, with a negative ExS value close to zero, then MEC can be RELAXed at that timestep to achieve a better satisfaction level for RPL. This autonomous RELAXation of priorities by the system is part of our future work.
Nevertheless, with no autonomous RELAXation at hand, and considering the analysis above of the effects of autonomous priority tuning on the decisions, the Pri-AwaRE approach (based on MR-POMDP++) creates opportunities for the experts to refine the initially defined priorities. The refinement would take into account the new autonomously tuned priorities provided by Pri-AwaRE to improve the current specification of the requirements, incorporating the newly found environmental contexts with their corresponding sets of priorities as provided by the MR-POMDP++ model. This can lead to a better assignment of the initial expert-defined priorities and thereby a significant improvement in the decision-making process of SASs in terms of the satisfaction of NFRs. We are currently working on such specific cases.

Example application 2: global decision-making for RDM network
The RDM system [22] is a disaster recovery system that tolerates failures by maintaining multiple replicas (i.e. copies) of data at remotely located mirrors (i.e. servers). Access to the data can continue even if one of the copies is lost. Each link in the network has an associated operational cost and a measurable throughput, latency and loss rate, used to determine the reliability, performance and cost of the RDM system. The goal here is to achieve the satisfaction of the NFRs Minimization of Costs (MC), Maximization of Performance (MP) and Maximization of Reliability (MR) under the uncertain environmental contexts of link failures and varying ranges of bandwidth consumption [22]. Hence, the network is required to continuously take global adaptive decisions of switching between the topologies Minimum Spanning Tree (MST) and Redundant Topology (RT) to maintain better levels of satisfaction of the NFRs. The two topological configurations have different impacts on the satisfaction levels of the NFRs. RT provides a higher level of reliability than MST, but it has a negative impact on the satisfaction of MC and MP, as the cost of maintaining a non-stop RT topology is high and performance can be reduced because of data redundancy. On the other hand, MST provides better levels of satisfaction for MC and MP by maintaining a minimum spanning tree for the network.

Initial set-up
For experimental purposes, we consider the RDM network [50] based on the operational model presented in [22,24]. The RDM network under consideration consists of 25 RDM mirrors with a total of 300 physical links used to transfer data between the mirrors [50]. In this set-up, the maximum number of concurrently active network links is 120, which does not affect the assigned budget for the network [17]. For the current set of experiments, we focus on the NFRs related to the quality and performance attributes [19] of the RDM network, namely MC, MR and MP. Next, we discuss the initial set-up of the components of the MR-POMDP++ model for the considered RDM network.

Components of MR-POMDP++ for RDM network
The components of the MR-POMDP++ model for the considered RDM network are explained as follows:
States: We represent states as combinations of the satisfaction levels of the NFRs. Therefore, for the three NFRs (MC, MR and MP), eight states are identified, as shown in Table 6.
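The 2^n state construction (four states for two NFRs, eight for three) can be sketched generically (the dictionary encoding of a state is our choice for illustration):

```python
from itertools import product

def enumerate_states(nfrs):
    """Each state is one True/False combination of the NFRs' satisfaction
    levels, giving 2^n states for n NFRs."""
    return [dict(zip(nfrs, combo))
            for combo in product([True, False], repeat=len(nfrs))]
```

For the RDM case, `enumerate_states(["MC", "MR", "MP"])` yields the eight states of Table 6; for the DELTA-IoT case, two NFRs yield the four states of Table 1.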
Actions: These represent the adaptation strategies to support the satisfaction of the NFRs MC, MR and MP. For the case of the RDM network, we consider two adaptive actions in the form of the two topological configurations Minimum Spanning Tree (MST) and Redundant Topology (RT).
Rewards: As we are dealing with the three NFRs MC, MR and MP for the RDM network, the reward vector has a size of 3, represented as R(s,a) = [R_MC, R_MR, R_MP]. The reward vector values for the NFRs, provided by the experts, are shown in Table 7.
Detrimental contexts: To evaluate the approach under uncertainty, deviation levels are introduced to simulate failures in the network links during execution of the self-adaptive RDM network. Such network link failures may be due to problems in devices such as switches or routers. For this purpose, deviations from the initially defined transition probabilities (i.e. P(NFR_{t+1} = True | NFR_t, A_t)) for the topologies (MST and RT) are introduced randomly at runtime. We consider the following detrimental contexts for the evaluation of our results:
Detrimental Context 1 (DC1): Deviation levels are introduced to simulate unanticipated packet loss during the execution of the RDM network. This increase in packet loss during the topological configuration of RT leads to an unusual rate of data forwarding, resulting in an increase in bandwidth consumption and a decrease in performance. As a consequence, a decrease in the satisfaction levels of MC, i.e. P(MC = True), and MP, i.e. P(MP = True), would be expected.

Detrimental Context 2 (DC2):
Deviation levels are introduced in the RDM network during the execution of the MST topology to simulate unexpected data packet loss, resulting in a decrease in the reliability of the network. Data packet loss may represent network link failures in the RDM system, which may be caused by problems with the equipment. As a result, the satisfaction level of MR, i.e. P(MR = True), would be expected to be reduced.
In order to simulate small, realistic changes, we introduce a maximum deviation of 12% from the current transition probabilities for a randomly selected duration of between 5 and 15 timesteps per deviation level. Considering the above detrimental contexts, and as for the IoT example application, we evaluate the hypotheses using the following experimental cases:
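The deviation mechanism described above can be sketched as follows (a simplified illustration of the stated parameters; function names and the uniform sampling choice are ours):

```python
import random

def perturb_probability(p, max_dev=0.12, rng=random):
    """Deviate a transition probability by at most 12% (the stated maximum),
    clamping the result to the valid range [0, 1]."""
    return min(1.0, max(0.0, p + rng.uniform(-max_dev, max_dev)))

def episode_duration(rng=random):
    """Each deviation level lasts a randomly selected 5-15 timesteps."""
    return rng.randint(5, 15)
```

During a detrimental-context episode, the relevant P(NFR' = True | NFR, a) entries would be replaced by their perturbed values for the sampled duration, then restored.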

Case 1: Priority-aware decisions and autonomous tuning of NFRs' priorities
Here, we demonstrate the priority-aware decision-making offered by our Pri-AwaRE approach and how it supports compliance with the requirements specification. To perform priority-aware decision-making, our proposed approach uses MR-POMDP++ to represent the distinct priorities of NFRs in the form of rewards. We study how the priorities of the NFRs have an impact on the action selection for the satisfaction of the NFRs, as shown in Table 12. For example, using the initial set-up, at timestep 45 MR-POMDP++ provides the best possible trade-off by selecting MST as the preferred topology over RT. The reason behind this decision is that the expected utility values for MP and MC are 395.32709 and 392.13139, respectively, which are higher than the expected utility value for MR, 388.21367, as shown in Table 12. This shows that applying the MST topology to the network has a more positive impact on the satisfaction of MP and MC than on MR. In comparison, at timestep 45 the expected utility values for the RT topology were 386.71867 for MC, 392.08584 for MR and 386.20288 for MP. Due to the higher impacts offered by MST, the system selects MST as the preferred topology, supporting the reduction of inter-site network link costs and improving the performance of the network. According to this decision, the MST topology is set for the network. Hence, to offer priority-awareness, the decisions made by MR-POMDP++ make the system aware of the explicit impacts of the decisions on the satisfaction of the NFRs, as presented in Table 12.
On the other hand, at timestep 49 the system decides to switch the topology from MST to RT. The reason is that the expected utility value of MR, 392.13918, is higher than the expected utility values of MC and MP, i.e. 386.76743 and 386.25769, respectively, as shown in Table 12. As a consequence, the adaptation to the RT topology shows an improvement in the satisfaction of MR from P(MR = True) = 0.83722 to P(MR = True) = 0.92089, meeting compliance with the required satisfaction threshold, i.e. P(MR = True) >= 0.85. Hence, the decision-making offered by MR-POMDP++ takes into account these expected utility values, which represent the new values of the tuned priorities of the NFRs. Offering SASs a priority-aware decision-making process, unlike other approaches, is one of the contributions of our Pri-AwaRE approach.
During the set-up of the experiments, the initial priorities for the NFRs of the RDM network were defined by the experts considering the different anticipated runtime contexts. According to the rules defined in Sect. 4, these initial priorities were defined in the form of rewards for the MR-POMDP++ model (as shown in Table 7). During the decision-making process, these pre-defined priorities were tuned autonomously by MR-POMDP++, according to the runtime situations, through the computation of the expected utility value for each NFR individually (using Eq. 9). This tuning of individual priorities by MR-POMDP++ helps the SAS to comply with the requirements (R) by achieving higher levels of satisfaction for the NFRs. The newly tuned priorities correspond to the expected utility values presented in Table 12. The goal of this autonomous tuning with the help of expected utility values is to meet the requirements specification for the NFRs.

Case 2: Impacts of priority-aware decisions on satisfaction levels of NFRs
We have also studied the impact of the priority-aware decisions made by Pri-AwaRE on the NFRs' satisfaction levels under the different detrimental contexts (DC1 and DC2), and have compared the results with the existing techniques RE-STORM [38], implemented using DESPOT (a single-objective POMDP solver), and RE-STORM-ARROW [39]. The results reported in Table 12 show the implications of setting the selected topology for the network on the satisfaction levels of the NFRs at a particular timestep. The decision of topology selection takes into account the expected utility values of the individual NFRs.
Let us observe Figs. 10, 11 and 12, which show the results of Pri-AwaRE and RE-STORM under (i) the initial set of pre-defined rewards, transition and observation probabilities (the stable conditions scenario) and (ii) the detrimental contexts DC1 and DC2, where the deviation levels are introduced. Under the stable scenario and DC1, Pri-AwaRE and RE-STORM show comparable results by maintaining the NFRs MC, MR and MP in the suitable zone of satisfaction, i.e. above the threshold values of P(MC = True) >= 0.70, P(MR = True) >= 0.85 and P(MP = True) >= 0.75. Both techniques show a preference for the MST topology, with an increase in the use of MST by Pri-AwaRE under DC1 to support the satisfaction of both MC and MP, as shown in Figs. 13 and 14. Under stable conditions, the percentage usage of the MST topology by Pri-AwaRE and RE-STORM is 95.2% and 92.9%, respectively, over the simulation duration of the experiments. However, under DC1, the percentage usage of MST by Pri-AwaRE increases to 99.8%, as shown in Fig. 14. MST offers lower operational cost and improved performance while supporting the minimal required level of reliability; therefore, applying MST to the network is the most suitable decision both under stable conditions and under DC1 [37]. In contrast, under DC2, where deviations are introduced to affect the system's reliability, Pri-AwaRE shows a better level of satisfaction for MR than RE-STORM, as shown in Figs. 13 and 15. This is the expected behaviour of the self-adaptive RDM network as defined by the experts [17,37]. Furthermore, we have also compared our results with the technique RE-STORM-ARROW (based on P-CNP) [39], which supports updating the initially defined rewards for RE-STORM under both DC1 and DC2, as shown in Figs. 11 and 12, respectively. However, even though RE-STORM-ARROW supports the satisfaction of MR, it does so because the P-CNP approach updates the initially defined rewards of the RE-STORM approach. This is not the case in our approach, where the autonomous tuning of priorities uses the rewards (see Eq. 9) without updating the initially defined reward values. Moreover, even with the update of reward values, RE-STORM-ARROW shows poor levels of satisfaction for MR and MP at several timesteps, as shown in Fig. 12. In contrast, our approach offers higher levels of NFR satisfaction. A further shortcoming of RE-STORM-ARROW is that the update of the POMDP is not executed autonomously, as it is in MR-POMDP++; instead, RE-STORM-ARROW needs external support from the P-CNP approach, which creates efficiency problems. Hence, from the results, we can deduce that the priority-aware decision-making process of Pri-AwaRE offers higher levels of satisfaction of NFRs, even under the detrimental contexts, when compared to the existing single-objective techniques.

Summary of findings in experiments
In summary, our experimental results show that the levels of satisfaction of all NFRs (MC, MR and MP) are compliant with the requirements specification, as is evident from the average satisfaction levels presented in Fig. 16. The reported results have a confidence level of 95%. The confidence intervals and standard errors for the average satisfaction levels of the NFRs under the different experimental scenarios are presented in Table 13. Consider first the average satisfaction levels of the NFRs under DC1. The average satisfaction level of MC lies within the confidence interval of 0.8649 to 0.8863, with a standard error of 0.0054. Similarly, the confidence interval for the average satisfaction level of MR starts at 0.8948; the remaining values are reported in Table 13. Hence, from the results we can conclude that Pri-AwaRE offers statistically sound results in terms of fulfilment of the requirements: the results reject the null hypothesis H0 and support the alternative hypothesis Ha.
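The reported intervals can be reproduced with a standard normal-approximation computation. The sketch below is illustrative only: the sample values are hypothetical, not the actual satisfaction data from the experiments, and the helper `confidence_interval` is ours.

```python
import math

def confidence_interval(samples, z=1.96):
    """Return (mean, standard error, lower, upper) for a 95% confidence
    interval using the normal approximation (z = 1.96)."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance with Bessel's correction.
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    se = math.sqrt(var / n)
    return mean, se, mean - z * se, mean + z * se

# Hypothetical per-timestep satisfaction-level samples for one NFR.
samples = [0.87, 0.88, 0.86, 0.89, 0.88, 0.87, 0.88, 0.86]
mean, se, lower, upper = confidence_interval(samples)
```

For the sample sizes produced by the simulation runs, the z-based interval is a reasonable approximation; for small samples a t-distribution would be the safer choice.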

Discussion
From the results, it is evident that our Pri-AwaRE approach, based on MR-POMDP++, complies with the requirements (R) for the RDM network under both the stable and detrimental environmental conditions. The average satisfaction levels for the NFRs MC, MR and MP generated by Pri-AwaRE under initial stable conditions are 0.8732, 0.9069 and 0.8747, respectively. For the detrimental context scenarios, the average satisfaction levels are P(MC = True) = 0.8756, P(MR = True) = 0.9042 and P(MP = True) = 0.8811 under DC1, and P(MC = True) = 0.8431, P(MR = True) = 0.8906 and P(MP = True) = 0.8315 under DC2. Hence, the system conforms to the requirements specification of P(MC = True) >= 0.70, P(MR = True) >= 0.85 and P(MP = True) >= 0.75, as shown in Fig. 16. Under all scenarios, our approach shows comparable, and sometimes better, satisfaction levels for the NFRs than the single-objective techniques RE-STORM and RE-STORM-ARROW. Hence, based on the results, we can deduce that our approach, using the reward vector, provides statistically better results than single-objective approaches. Moreover, our Pri-AwaRE approach also brings more awareness to the decisions by using the individual NFRs' priorities during the decision-making process. To further evaluate our approach, we have also executed experiments using the OLS algorithm for the RDM case; the results are reported in [48].
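The compliance check against the requirements specification can be illustrated with a small sketch that compares the reported DC2 averages against the thresholds. The function name `complies` is ours, not part of Pri-AwaRE; the numbers are the DC2 averages and thresholds quoted above.

```python
# Required thresholds from the requirements specification (R).
THRESHOLDS = {"MC": 0.70, "MR": 0.85, "MP": 0.75}

def complies(avg_satisfaction, thresholds=THRESHOLDS):
    """Check each NFR's average satisfaction level against its threshold."""
    return {nfr: avg_satisfaction[nfr] >= t for nfr, t in thresholds.items()}

# Average satisfaction levels reported for the DC2 scenario.
dc2 = {"MC": 0.8431, "MR": 0.8906, "MP": 0.8315}
result = complies(dc2)
```

All three NFRs clear their thresholds, matching the conformance claim for DC2.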
Furthermore, we have also compared the overall performance of the approaches. We performed the experiments on a Lenovo ThinkPad with an Intel Core i7 8th-generation processor and 16 GB of RAM. Using this hardware set-up, Pri-AwaRE, based on MR-POMDP++, takes 1500 ms to reach a decision, whereas RE-STORM takes 1000 ms. The difference arises because using OLSAR with the multi-reward Perseus solver adds additional overhead, as described in Section 2.4. Considering the case studies we selected to test our approach, this performance can be considered adequate. The adaptation process typically depends on the frequency with which the stakeholders want to monitor the managed system. For example, in the case of the Internet of Things network, one simulation timestep represents a network activity of 15 min [21], so adaptation decisions are taken at that interval. In the case of the Remote Data Mirroring network, one timestep represents an adaptation decision taken after a network activity of 1 h. Hence, the focus of our Pri-AwaRE approach is on runtime decision-making in SASs, which is not hard real-time [7].

Threats to validity
The threats to validity are based on the classification provided in [15]. External validity relates to the generalization of the outcomes outside the scope of our study. Internal validity concerns whether the treatment used in the experiments is what actually causes the observed outcome. Finally, construct validity is based on the relation between the theory behind the experiments and the observations.

External validity
A key threat to the validity of our proposed approach lies in the computational cost of MR-POMDPs. Solving an MR-POMDP is a computationally intractable problem in the worst case, even using the OLSAR algorithm [44] that our implementation uses, which overcomes scalability issues related to the "curse of history". In the Pri-AwaRE architecture, the states are defined in terms of combinations of satisfaction levels of NFRs: with 2 NFRs the MR-POMDP has 4 states, with 3 NFRs it has 8, and so on. Therefore, in practice, the approach works with a small number of NFRs. This implies that the design experts of SASs, when using our approach, must limit their reasoning to the critical NFRs that drive self-adaptation. Hence, our approach belongs to the multi-objective sequential decision-making techniques that focus, both theoretically and practically, on a few objectives, which according to [43] still covers a considerable number of applications.
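The exponential growth of the state space can be made concrete with a short enumeration sketch. The helper `nfr_states` is illustrative, not part of the Pri-AwaRE implementation; the NFR names follow the RDM case.

```python
from itertools import product

def nfr_states(nfrs):
    """Enumerate MR-POMDP states as all combinations of satisfied /
    unsatisfied values for the given NFRs: 2**len(nfrs) states."""
    return [dict(zip(nfrs, values))
            for values in product([True, False], repeat=len(nfrs))]

# For the 3 NFRs of the RDM case (MC, MR, MP) we get 2**3 = 8 states.
states = nfr_states(["MC", "MR", "MP"])
```

This doubling per additional NFR is what makes restricting the model to a few critical NFRs necessary in practice.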
We have executed experiments using example applications [21,50] that focus on a centralized setting. The approach has not been tested in a decentralized set-up yet. More experiments would be required to test the applicability of our Pri-AwaRE approach in both centralized and decentralized domains.

Internal validity
The internal threat to validity concerns the extent to which our approach performs in an actual environmental set-up. In this paper, we have employed a case study approach based on a simulator. Our experimental results are therefore based on the environmental factors presented by a simulated environment, not an actual physical network. However, the example applications [21,50] that we selected are well known and provide simulations that are close to real settings. Both RDM [17,50] and IoT [21] are well-accepted applications in the research community and are already in use by other teams.

Construct validity
Construct validity concerns the mirroring relationship between MR-POMDP++ and the managed system. We have established in [50] how MR-POMDP++ reflects the current state of the managed system, but more work is required. As part of our future work, we plan to investigate this mirroring aspect further.

Related work
In this section, we discuss existing techniques that deal with the prioritization of NFRs in SASs. There are three criteria that any such technique must satisfy: it must endow the SAS with self-awareness [25], i.e. the ability to reason about how well it is satisfying the required NFRs in terms of their priorities; it must quantify uncertainty about the degree of satisfaction of NFRs that is acceptable in any given runtime context; and it must be sufficiently efficient to enable adaptation decisions to be computed at runtime. We classify these techniques into two categories, as follows:

Design time techniques
Techniques based on Multi-Criteria Decision-Making (MCDM) approaches, such as the Analytic Hierarchy Process [30] and the Primitive Cognitive Network Process [62], have been presented to support ranking and explicit modelling of the NFRs, but they are essentially design-time techniques. The approach of ARROW [39], based on P-CNP, provides support to RE-STORM, which is based on a POMDP model [18], to deal with the prioritization of NFRs. With the help of P-CNP, ARROW supports automatic updates of the initially defined priorities of NFRs at runtime, but the update is not autonomous and does not work from within the POMDP model. Our approach tackles this limitation by improving the runtime representation of the priorities of individual NFRs, underpinned by the multi-reward support offered by the MR-POMDP model. Hence, differently from [18,39], and supported by the results of the experiments performed, our MR-POMDP-based approach goes further by supporting the explicit modelling of the individual NFRs' priorities, allowing improved runtime reasoning, tuning and awareness during the decision-making process of a SAS.
To support the optimization of NFRs, there are also approaches based on search-based techniques such as [8,41]. These techniques optimize priorities at design time, and the initially optimized priorities are then used at runtime in an off-line fashion. Therefore, although these approaches provide an explicit representation of the priorities of NFRs, they are design-time techniques. In contrast, our approach supports the runtime modelling and autonomous tuning of the distinct priorities of NFRs to offer priority-aware decision-making, and the use of the Optimistic Linear Support algorithm [44] to solve the MR-POMDP makes it more efficient.

Runtime techniques
To support decision-making in SASs, a number of runtime modelling techniques have been used to support priority-awareness. Such techniques include probabilistic models like Dynamic Decision Networks (DDNs) [4]. The DDNs are used to represent a goal model as a runtime model to support decision-making under uncertainty and, with the help of the Bayesian Theory of Surprise [6], provide quantification of uncertainty. However, this approach represents the priorities of NFRs as a single scalar utility value, i.e. a combined priority value for all NFRs. This scalar utility value is then used to determine the effect (positive or negative) of an action-selection decision on the satisfaction of all of the NFRs as a whole at runtime. Moreover, the DDN-based technique suffers from a scalability problem over time (i.e. the curse of history: the graph representing the history of observations and actions for DDN planning grows exponentially with the planning horizon). On the other hand, the techniques presented in [1,9,16,33,38,59] make use of Markov-based approaches, such as Markov Decision Processes (MDPs), Partially Observable Markov Decision Processes (POMDPs) and Discrete Time Markov Chains (DTMCs), along with probabilistic model checking, to support runtime assurance of NFRs during decision-making in SASs. As these techniques are Markov based (similar to our approach), they support the quantification of uncertainty by maintaining probabilities over the state of the environment. An important limitation of these techniques is that they lack explicit modelling of the distinct priorities of NFRs at runtime. Furthermore, the approaches specifically based on MDPs and POMDPs (with no multiple-reward support) model the ranking of the NFRs as a scalar reward value that indicates a cumulative priority of all the NFRs, hindering priority awareness. Our MR-POMDP-based approach instead tackles these limitations using a vector-valued reward function to support modelling and reasoning about the individual priorities of NFRs. Furthermore, our approach, based on the reward vector, also offers and exploits the built-in capability of autonomous tuning of the priorities of NFRs. Hence, our Pri-AwaRE approach provides the SAS with a higher degree of awareness of priorities during decision-making.
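The contrast between a scalar reward and a reward vector can be sketched as follows. This is a simplified illustration, not the actual MR-POMDP solver: the action names, reward vectors and weights are hypothetical, and an OLS-style solver operates on value functions over beliefs rather than on immediate rewards.

```python
def scalarise(reward_vector, weights):
    """Linear scalarisation: a weight vector encodes the current
    priorities of the individual NFRs."""
    return sum(r * w for r, w in zip(reward_vector, weights))

def best_action(actions, weights):
    """Pick the action maximising the scalarised value, while the
    per-NFR reward vector stays available for priority awareness."""
    return max(actions, key=lambda a: scalarise(actions[a], weights))

# Hypothetical per-action reward vectors, one entry per NFR (MC, MR, MP).
actions = {"MST": [0.9, 0.7, 0.8], "Redundant": [0.5, 0.95, 0.6]}
choice = best_action(actions, weights=[0.4, 0.3, 0.3])  # MC weighted highest
```

The point is that even though a weight vector is used to select an action, the per-NFR components remain visible and tunable, which is what enables priority awareness; a single-objective approach collapses them into one scalar up front and discards the individual contributions.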
Furthermore, control theory-based approaches such as [32,40] have also been used to support explicit runtime configuration and tuning of NFRs. However, the technique in [32] lacks autonomous prioritization of NFRs, while the approach in [40] cannot deal with NFRs that share the same priority rank, and therefore fails to perform trade-offs between such NFRs. Moreover, the technique in [40] does not consider uncertainty as a quantifiable measure.

Conclusion and future work
In this paper, we have presented Pri-AwaRE, a self-adaptive architecture that uses MR-POMDP++ as a runtime specification model (S) within the MAPE-K loop. Based on Zave and Jackson's principle [63], the use of MR-POMDP++ supports priority-aware decision-making in SASs by (i) providing runtime modelling and reasoning about the priorities of individual NFRs, taking into account the knowledge (K) about their distinct priorities, and (ii) maintaining compliance with the requirements (R) by autonomously tuning the NFRs' priorities at runtime under uncertain environmental contexts. Existing techniques typically represent the priorities of NFRs as a single scalar utility value that expresses a combined priority for all the NFRs. By contrast, we have shown how Pri-AwaRE, using a reward vector, accounts for the non-scalar nature of priorities, therefore offering better-informed trade-offs of NFRs at runtime. The use of MR-POMDP++ supports the autonomous tuning of the priorities of NFRs through the computation of a separate expected utility value for each NFR under changing contexts at runtime.
For evaluation purposes, we have applied Pri-AwaRE to two different example applications: a remote data mirroring network, in which global adaptation decisions are taken for the whole system, and an IoT network, where local adaptation decisions are taken for each individual sensor. These applications reveal that our decision-making approach complies with the requirements (R) by achieving high levels of satisfaction of NFRs even when applied to two different domains.
We have also compared Pri-AwaRE with existing single-objective techniques for both example applications. Pri-AwaRE offered informed adaptation decisions, taking into account the individual priorities of NFRs, which led to higher levels of satisfaction of the NFRs.
For future work, we plan to use this technique as a tool for the a priori elicitation of priorities for NFRs. The idea is to perform simulations to learn about the environment and thereby uncover contexts that would otherwise not be anticipated. The newly discovered knowledge K' would be made explicit in a new specification S' to be used in the implementation of new releases of the SAS. Further, as the approach helps enrich the decision-making mechanism in SASs with newly discovered knowledge K', we are exploring how it can also be used to provide explanations for unclear adaptation decisions [37,51].
Finally, more research efforts are needed to explore the evolution of the model, including the priorities and their impact on the NFRs' satisfaction levels, as well as new alternative actions. This is an important new research line in the area of decision-making under uncertainty.

Fig. 6
Fig. 6 Pri-AwaRE architecture. At a given timestep, data from the motes is gathered by the monitoring component using the probe component (i.e. a sensor) of the DELTA-IoT simulator. The probe collects information about the link interference in the form of the Signal-to-Noise Ratio (SNR). The monitored SNR value is forwarded to the Analysis and Planning components.

Fig. 9
Fig. 9 Extent of satisfaction of NFRs over time by applying Pri-AwaRE (using OLSAR)

Fig. 10
Fig. 10 Satisfaction of NFRs over time under stable conditions by applying Pri-AwaRE (using OLSAR) and RE-STORM (using DESPOT)

Fig. 13
Fig. 13 Topologies selection under stable environmental context a Pri-AwaRE and b RE-STORM

Table 1
States of the IoT network in terms of NFR

Table 2
Reward values for the NFRs in the DELTA-IoT network

Table 4
Experiment results for timestep 1. Val_NFR represents the expected utility value of NFR_i, i.e. Val_MEC and Val_RPL; Sat_NFR represents the satisfaction level of NFR_i, i.e. Sat_MEC and Sat_RPL

Table 5
Confidence intervals for average satisfaction levels of NFRs. Sat_AVG represents the average satisfaction level of an NFR

Table 11
Observation probabilities for RDM network

Table 12
Experiment results for timesteps 45-51. Val_NFR represents the expected utility value of NFR_i, i.e. Val_MC, Val_MR and Val_MP; Sat_NFR represents the satisfaction level of NFR_i, i.e. Sat_MC, Sat_MR and Sat_MP

Table 13
Confidence intervals for average satisfaction levels of NFRs. Sat_AVG represents the average satisfaction level of an NFR