Feature-Model-Guided Online Learning for Self-Adaptive Systems

A self-adaptive system can modify its own structure and behavior at runtime based on its perception of the environment, of itself and of its requirements. To develop a self-adaptive system, software developers codify knowledge about the system and its environment, as well as how adaptation actions impact on the system. However, the codified knowledge may be insufficient due to design time uncertainty, and thus a self-adaptive system may execute adaptation actions that do not have the desired effect. Online learning is an emerging approach to address design time uncertainty by employing machine learning at runtime. Online learning accumulates knowledge at runtime by, for instance, exploring not-yet executed adaptation actions. We address two specific problems with respect to online learning for self-adaptive systems. First, the number of possible adaptation actions can be very large. Existing online learning techniques randomly explore the possible adaptation actions, but this can lead to slow convergence of the learning process. Second, the possible adaptation actions can change as a result of system evolution. Existing online learning techniques are unaware of these changes and thus do not explore new adaptation actions, but explore adaptation actions that are no longer valid. We propose using feature models to give structure to the set of adaptation actions and thereby guide the exploration process during online learning. Experimental results involving four real-world systems suggest that considering the hierarchical structure of feature models may speed up convergence by 7.2% on average. Considering the differences between feature models before and after an evolution step may speed up convergence by 64.6% on average. [...]


INTRODUCTION
A self-adaptive system is capable of modifying its own structure and behavior at runtime based on its perception of the environment, of itself and of its requirements [1], [2], [3]. As an example, take a self-adaptive web application. Faced with a sudden increase in workload, the web application may deactivate its resource-intensive recommender engine in order to maintain its performance requirements [4].
As depicted in Figure 1, a self-adaptive system can conceptually be structured into two main elements [2], [5]: the system logic (aka the managed element) and the self-adaptation logic (aka the autonomic manager). The self-adaptation logic can be further structured into four main conceptual activities that leverage a common knowledge base [6]. The four activities monitor the system and its environment, analyze monitored data to determine adaptation needs, plan adaptation actions, and execute these adaptation actions at runtime.
Fig. 1: Self-adaptive system reference model (based on [5])
To populate the self-adaptation logic's knowledge base, software developers codify knowledge about the system and its environment, as well as how adaptation actions impact on the system [7], [8], [9], [10]. However, codifying this knowledge at design time may not be fully possible due to design time uncertainty [11]. For example, when following a model-based adaptation approach, software developers define analytical models about the system and its environment from which adaptation actions are generated at runtime [12], [13], [14]. However, such analytical models may not be accurate due to simplifying assumptions made at design time [9], [10], [15]. As another example, when following a rule-based adaptation approach, software developers have to specify adaptation rules prescribing which adaptation action is executed in a given environment situation [16], [17], [18]. This requires anticipating at design time the potential environment situations the system may encounter at runtime. However, for many application domains and in particular for open-world systems [3], anticipating all potential environment situations at design time is often infeasible [19].
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Due to design time uncertainty, insufficient knowledge about the system and its environment may be codified at design time. As a result, a self-adaptive system may execute adaptation actions that do not have the desired effect, i.e., are ineffective. An ineffective adaptation action may have no effect at all, may only have a partial or sub-optimal effect, or may even have a negative effect on the system [20].
Online learning is an emerging approach to address design time uncertainty by employing machine learning at runtime. As depicted in Figure 2, online learning observes the live system and its environment in order to accumulate knowledge at runtime to update the self-adaptation logic's knowledge base.

Fig. 2: Online learning for self-adaptive systems
Online learning was employed for model-based adaptation where the knowledge base includes analytical models of the system and its environment [10], [21], as well as for rule-based adaptation where the knowledge base contains adaptation rules [22], [23], [24]. To concisely describe the problem addressed and our contributions, we focus on rule-based adaptation in the remainder of this paper.

Problem Statement
The performance of machine learning depends, to a large degree, on the amount of data available for learning [25], [26]. When machine learning is used for online learning, this data is collected at runtime in order to be representative of the running system and its environment. Existing findings indicate that when online learning is used to learn adaptation rules, it typically takes many observe-and-update iterations until the learning process converges to a set of effective adaptation rules [27], [28]. As an example, when using supervised learning, the system has to collect a sufficient amount of training data that is representative of the system's environment in order to determine possible environment situations. As another example, when using reinforcement learning, the system has to perform many interactions with its environment in a trial-and-error fashion to explore which adaptation action should be executed in which environment situation.
Until online learning has converged, the system most likely executes ineffective adaptation rules, because not enough observations have been made yet. In addition to adaptation rules having no effect or being sub-optimal, some of them may lead to negative effects. As an example, an adaptation action may activate all optional system features, thereby leading to a surge in resource consumption and a degradation of system performance. Executing these adaptation actions has real consequences as they happen in the live system [23]. If online learning requires a high number of iterations to converge, the impact and costs of online learning can become prohibitive [27]. How fast online learning converges is thus a very important factor [29].
Different strategies have been used to speed up the convergence of online learning for self-adaptive systems. These strategies include choosing the best-performing variant of a learning algorithm [30], controlling the rate at which not-yet-executed adaptation actions are explored [31], employing transfer learning [32], and using an initial offline learning phase [33]. However, these strategies do not explicitly consider the following two specific properties of a self-adaptive system's adaptation space (aka the set of all possible adaptation actions [34], [35], [36]):
Large adaptation space. To update the knowledge base, existing online learning approaches for rule-based adaptation randomly explore the adaptation space by selecting not-yet-executed adaptation actions. The speed of convergence depends on the size of the adaptation space, because each not-yet-executed adaptation action has an equal chance of being selected. If the adaptation space is small, the speed of convergence of online learning using such random adaptation action selection can be acceptable. However, the adaptation space of a self-adaptive system can be large [15], [44]. In such a case, random adaptation action selection can lead to slow convergence [23], [40], [45].
There exist machine learning techniques that can cope with a large space of actions. However, these techniques require the space of actions to be continuous. A continuous space of actions is represented by continuous variables, such as real-valued variables. Setting a specific angle for a robot arm or changing the set-point of a thermostat are examples of a continuous space of actions [29]. Many kinds of self-adaptive systems have a non-continuous space of adaptation actions. Examples include architecture-based self-adaptive systems, where adaptation actions are changes of component compositions [18], and dynamic software product lines, where adaptation actions are the activation and deactivation of system features [46]. As an example, take a system that offers ten optional features that may be dynamically activated and deactivated in any combination and that allows changing from any active feature combination to any other possible feature combination. Its adaptation space thus contains $2^{10} = 1024$ adaptation actions. These 1024 adaptation actions cannot be represented as a continuous variable.
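As a rough illustration of this arithmetic, the following sketch enumerates every subset of ten optional features. The feature names F1 to F10 are placeholders chosen for this example and do not refer to any real system.

```python
from itertools import combinations

# Ten hypothetical optional features (placeholder names).
features = [f"F{i}" for i in range(1, 11)]

# Every subset of the optional features is one feature combination,
# i.e., one adaptation action in the sense used above.
adaptation_space = [
    frozenset(subset)
    for size in range(len(features) + 1)
    for subset in combinations(features, size)
]

print(len(adaptation_space))  # 1024, i.e., 2**10
```

Each additional optional feature doubles the size of this list, which is why exhaustive or purely random exploration quickly becomes costly.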
Change of adaptation space due to system evolution. Existing online learning approaches for rule-based adaptation are unaware of system evolution [47], [48]. They do not consider the fact that a self-adaptive system, like any software system, may undergo evolution [49], [50]. Self-adaptation refers to the automatic modification of the system by itself. Evolution refers to the modification of the system by humans [51], [52]. During evolution, software developers may modify the system to correct bugs, remove rarely used features, or introduce new features [53]. System evolution means that the adaptation space may change. In the example from above, one of the ten features may be removed in an evolution step, thereby reducing the system's adaptation space. As another example, a new feature may be introduced in an evolution step, thereby adding new possible feature combinations and thus adaptation actions to the adaptation space.
There exist machine learning techniques that can cope with environments that change over time (so-called non-stationary environments [54]). However, they do not consider that the space of actions may change over time. Being unaware of the changes to the adaptation space means that these techniques may explore adaptation actions that are no longer valid and thus may even have negative effects on the system. Also, they are unaware of the new adaptation actions and thus will not select these new adaptation actions even though they may lead to effective adaptation rules. A simple solution would be to restart the online learning process from scratch after each evolution step. However, this means that knowledge already gained is lost, and thus cannot be used to speed up the convergence of online learning after an evolution step.

Contributions
We introduce online learning strategies that address potentially large adaptation spaces and that can cope with a change of the adaptation space due to system evolution. Our online learning strategies use feature models [10], [55], [56], [57] to give structure to the system's adaptation space. A feature model is a tree or a directed acyclic graph of features, organized hierarchically. Feature models thereby provide additional information to guide the online learning process. Concretely, we make the following two contributions.
Using the feature model structure to explore the adaptation space. Our main idea is to leverage the hierarchical structure of the feature model to take more informed decisions during the exploration of the adaptation space than random exploration does. The strategies we propose systematically traverse the feature model to select the next adaptation action to be executed and observed. To illustrate, our strategies may first explore the different sub-features of a parent feature, before exploring features which are not directly related to the parent feature. We argue that via such systematic exploration, our learning strategies can speed up convergence.
Using feature model deltas to capture changes in the adaptation space due to system evolution. Our main idea is to make online learning aware of changes in the adaptation space by using feature model deltas. The strategies we propose analyze the delta between a feature model before and after an evolution step. Thereby, our strategies can identify added and removed adaptation actions. Removed adaptation actions are no longer considered, while added adaptation actions are targeted first, as they may offer new opportunities for finding effective adaptation rules. In addition, our strategies systematically reuse past knowledge about whether the presence of a certain feature contributed to an effective adaptation rule. We argue that by considering new adaptation actions and by reusing past knowledge, our learning strategies can speed up convergence.
We experimentally assess our online learning strategies using four real-world systems. We compare our experimental results with the results for random exploration of the adaptation space as a baseline.
The remainder of this paper is organized as follows. Section 2 explains the key concepts of feature models and rule-based adaptation, and also introduces a running example. Section 3 describes and illustrates the random exploration of the adaptation space to serve as baseline for our contributions and experiments. Section 4 explains how we use the feature model structure to systematically explore the adaptation space. Section 5 explains how we use feature model deltas to capture adaptation space changes due to system evolution. Section 6 presents the design and results of our experiments. Section 7 analyzes related work. Section 8 concludes with a discussion on limitations and directions for future work.

FUNDAMENTALS AND RUNNING EXAMPLE
This section explains the key concepts of feature models and rule-based adaptation, which are illustrated by a running example of a web application.

Feature Models for Self-adaptive Systems
As introduced above, our online learning strategies exploit additional knowledge about the self-adaptive system's adaptation space encoded in the form of feature models.
A feature model is a tree or a directed acyclic graph of features [56], [58], organized hierarchically. A feature can be decomposed into mandatory, optional or alternative sub-features. A mandatory sub-feature has to be activated if its parent feature is activated. An optional sub-feature may or may not be activated if its parent feature is activated. At least one of the alternative sub-features has to be activated if their parent feature is activated. Additional constraints between two features, such as "excludes" or "requires" constraints, express inter-feature dependencies. Thereby, a feature model describes the possible and allowed feature combinations of a system.
While feature models are traditionally used in software product line engineering to define the set of systems of the product line at design time [56], dynamic software product lines extend the use of feature models to runtime [10], [46], [57]. In dynamic software product lines, the feature model describes the set of possible adaptation actions in the form of feature combinations, i.e., sets of system features to be activated and deactivated. In a similar way, feature models can be used for architecture-based self-adaptive systems to define the possible runtime compositions of system components [59], [60]. A feature model thereby can be used to define a self-adaptive system's adaptation space, where each adaptation action is expressed in terms of the feature combination to be active after adaptation.
Figure 3 shows the feature model of a self-adaptive web application we use as a running example. The DataLogging feature is mandatory (which means it is always active), while the ContentDiscovery feature is optional. The DataLogging feature has three alternative sub-features, which express that at least one of the three levels of data logging must be active: Min, Medium or Max. The ContentDiscovery feature has two optional sub-features Search and Recommendation. The constraint Recommendation ⇒ Max ∨ Medium specifies that a sufficient level of data logging is required to collect enough information about the web application's users and transactions to make good recommendations.
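To make the running example concrete, the sketch below enumerates the feature combinations permitted by the feature model just described. It is an illustration only: the names `valid_combinations`, `LOGGING_LEVELS` and `DISCOVERY_OPTIONS` are hypothetical, and the code assumes that exactly one logging level is active at a time, which is a simplifying assumption and stricter than the "at least one" wording used above.

```python
LOGGING_LEVELS = ("Min", "Medium", "Max")

# Allowed selections below the optional ContentDiscovery feature.
DISCOVERY_OPTIONS = (
    (),  # ContentDiscovery inactive
    ("ContentDiscovery",),
    ("ContentDiscovery", "Search"),
    ("ContentDiscovery", "Recommendation"),
    ("ContentDiscovery", "Search", "Recommendation"),
)


def valid_combinations():
    """Return all feature combinations allowed by the example feature model."""
    combos = []
    for level in LOGGING_LEVELS:
        for discovery in DISCOVERY_OPTIONS:
            # Cross-tree constraint: Recommendation => Max or Medium.
            if "Recommendation" in discovery and level == "Min":
                continue
            combos.append(frozenset(("DataLogging", level) + discovery))
    return combos


adaptation_space = valid_combinations()
print(len(adaptation_space))  # 13 under the assumptions stated above
```

A realistic implementation would more likely delegate this enumeration to a feature model analysis tool instead of hard-coding the options.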

Rule-based Adaptation
In rule-based adaptation, adaptation rules specify which adaptation action is executed in response to a given environment situation [16], [17], [18]. To illustrate, let us consider that the web application introduced above has to adapt to changes in its environment in order to maintain its performance requirements. More concretely, the web application should adapt to changing workloads (i.e., number of simultaneous users) in order to keep its response time below 500 ms. A software developer may express an adaptation rule for the web application such that it turns off some of the features in the presence of a higher workload, thereby reducing the resource needs of the application. Figure 4 shows a concrete example. Let us assume a software developer has specified an adaptation rule that states that if the system faces an environment situation of more than 1000 concurrent users, then the Search feature should be deactivated. Here, the software developer estimates that deactivating the Search feature will lead to a sufficient reduction in resource needs. As shown in the figure, if the system runs in feature combination {DataLogging, Max, ContentDiscovery, Search, Recommendation}, this adaptation rule results in adapting the system to feature combination {DataLogging, Max, ContentDiscovery, Recommendation}.
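The example rule can be written down as a small function. This is a minimal sketch; the names `apply_search_rule` and `USER_THRESHOLD` are hypothetical and only the threshold of 1000 users and the feature names come from the example.

```python
USER_THRESHOLD = 1000  # concurrent users, taken from the example rule


def apply_search_rule(active_features, concurrent_users):
    """Deactivate Search when the workload exceeds the threshold."""
    if concurrent_users > USER_THRESHOLD:
        return frozenset(active_features) - {"Search"}
    return frozenset(active_features)


current = {"DataLogging", "Max", "ContentDiscovery", "Search", "Recommendation"}
adapted = apply_search_rule(current, concurrent_users=1500)
print(sorted(adapted))
# ['ContentDiscovery', 'DataLogging', 'Max', 'Recommendation']
```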
We formally define the effectiveness of an adaptation rule using Zave and Jackson's framework applied to self-adaptive systems as presented in [61]. Let $E$ be the environment situation that triggers the rule, $S$ the self-adaptive system after the execution of the adaptation action specified in the rule, and $R$ the system's requirements. We consider an adaptation rule effective if $S, E \models R$. This means the rule is effective if the system after adaptation meets its requirements.
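Operationally, checking this condition amounts to observing the adapted system in the triggering situation and comparing the observation against the requirement. The sketch below assumes the response time requirement of the running example; `is_effective` and `stub_monitor` are hypothetical names, and the response times returned by the stub are made-up placeholder values, not measurements.

```python
RESPONSE_TIME_LIMIT_MS = 500  # requirement R from the running example


def is_effective(combination, concurrent_users, observe_response_time_ms):
    """Check whether the adapted system S meets R in environment situation E."""
    observed = observe_response_time_ms(combination, concurrent_users)
    return observed < RESPONSE_TIME_LIMIT_MS


def stub_monitor(combination, concurrent_users):
    """Placeholder for a real measurement; the numbers are invented."""
    return 650.0 if "Recommendation" in combination else 420.0


adapted = frozenset({"DataLogging", "Max", "ContentDiscovery", "Recommendation"})
print(is_effective(adapted, 1500, stub_monitor))  # False with this stub
```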

RANDOM ADAPTATION SPACE EXPLORATION AS BASELINE
As explained in Section 1, existing online learning strategies for rule-based adaptation randomly explore the adaptation space in order to select not-yet executed adaptation actions.
In this section we thus introduce the random online strategy as a baseline against which we compare our online learning strategies.

Illustration
Online learning in our running example from Section 2 observes whether the adaptation rule to deactivate the Search feature in the presence of more than 1000 users is effective. This means online learning observes whether the system after executing this adaptation rule is able to meet the response time requirements given the increased number of users. This adaptation rule may turn out to be ineffective, because only turning off the Search feature may not be sufficient to meet the response time requirements.
If the adaptation rule is ineffective, a random online learning strategy explores different alternative adaptation actions at random, until an adaptation action is found that is effective in the given environment situation. Table 1 illustrates such a random exploration of the adaptation space. We assume that only the feature combination {DataLogging, Min, ContentDiscovery, Search} is able to meet the response time requirements. In the example, it takes six iterations until this effective feature combination is found; it is then used as the adaptation action in the adaptation rule.

Realization
Below we provide a potential realization of a random learning strategy, which we call Rand, to serve as baseline realization for our experiments in Section 6. The Rand strategy chooses a new feature combination randomly from the adaptation space of the self-adaptive system. Given a feature model M that specifies the adaptation space, with F being the non-empty set of features of M, the Rand strategy is realized by the iterative Algorithm 1.
Algorithm 1 Random Strategy (Rand) [listing only partially preserved; surviving fragments:] f ← randomSelect(F); 9: comb ← randomSelect($C_f$); until F = ∅; 20: end function
Note that the realization of the Rand strategy only uses the feature model to address the problem that enumerating all feature combinations of a large adaptation space may not scale due to its exponential complexity [62]. It does not use the structure of the feature model to systematically explore the adaptation space.
The algorithm first selects a feature f randomly (line 8), e.g., the feature Recommendation in our running example from Section 2. Then the algorithm determines all possible feature combinations $C_f$ containing f (line 9) that were not previously selected (line 9 and line 15). For instance, if Recommendation is selected, then Max or Medium must be selected and Search can possibly be selected as well (but not Min because of the constraint expressed in the feature model; see Figure 3). To realize the getFeatureCombinationWith() operation, we rely on the observation that computing all possible feature combinations beginning with a partial feature combination may scale better than computing all possible feature combinations from scratch [63].
If several possible feature combinations exist (i.e., if $|C_f| > 1$), one feature combination is selected randomly (line 11). If no effective feature combination among $C_f$ is found, the strategy starts over by selecting another feature and continues until no new feature is available.
At the end of each iteration the selected feature f is removed from the set of all features (line 17). Together with not visiting again an already visited feature combination (line 9 and line 15), this effectively implements a random selection without replacement. Such a random selection without replacement is important to ensure a fair baseline for the comparisons in Section 6, as our strategies also select without replacement.
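The behavior described above can be sketched in a few lines of Python. This is not a transcription of Algorithm 1: it is a simplified illustration in which the adaptation space of the running example is enumerated up front (assuming one active logging level at a time) and then filtered, whereas the paper computes the combinations containing a feature with a feature model analysis operation. The names `valid_combinations` and `rand_strategy` are hypothetical.

```python
import random


def valid_combinations():
    """Adaptation space of the running example (one logging level at a time)."""
    options = ((), ("ContentDiscovery",), ("ContentDiscovery", "Search"),
               ("ContentDiscovery", "Recommendation"),
               ("ContentDiscovery", "Search", "Recommendation"))
    return [frozenset(("DataLogging", level) + extra)
            for level in ("Min", "Medium", "Max") for extra in options
            if not ("Recommendation" in extra and level == "Min")]


def rand_strategy(features, combos, is_effective, rng):
    """Random exploration without replacement, loosely following Rand."""
    remaining = set(features)
    visited = set()
    while remaining:
        f = rng.choice(sorted(remaining))        # pick a feature at random
        candidates = [c for c in combos if f in c and c not in visited]
        rng.shuffle(candidates)                  # random order within C_f
        for comb in candidates:
            visited.add(comb)
            if is_effective(comb):
                return comb, len(visited)
        remaining.discard(f)                     # all combinations with f tried
    return None, len(visited)


combos = valid_combinations()
features = set().union(*combos)
target = frozenset({"DataLogging", "Min", "ContentDiscovery", "Search"})
found, iterations = rand_strategy(features, combos, lambda c: c == target,
                                  random.Random(0))
print(found == target, iterations)
```

The number of iterations depends on the random seed; only the fact that the effective combination is eventually found is guaranteed here.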

USING THE FEATURE MODEL STRUCTURE TO EXPLORE THE ADAPTATION SPACE
In this section we explain and illustrate our online learning strategies that use the feature model structure to systematically explore the adaptation space.

Solution Idea and Illustration
As introduced above, the main solution idea for our online learning strategies is to systematically explore the adaptation space by hierarchically traversing the feature model. Below, we introduce two variants of online learning strategies that differ in the way the feature model is traversed.
Incremental Strategy (Inc). This strategy takes advantage of the semantics typically encoded in the structure of feature models. Non-leaf features in a feature model are usually abstract features used to better structure variability [64]. These abstract features often do not have an impact at implementation level, but delegate their realization to their sub-features. Sub-features thus may offer different realizations of their abstract parent feature. The sub-features of a common parent feature, i.e., sibling features, can thus be considered semantically connected. In our running web application example (see Figure 3), the ContentDiscovery feature has two sub-features Search and Recommendation offering different concrete ways in which a user may discover online content. The idea behind the Inc strategy is to exploit the information about these potentially semantically connected sibling features and systematically explore them first before exploring other features.
To illustrate the Inc strategy, let us start with feature combination {DataLogging, Max, ContentDiscovery, Recommendation} of the ineffective adaptation rule from Section 3. The Inc strategy first explores sibling features starting from this feature combination. In our example, let us say the Inc strategy starts exploration of the two sibling features Recommendation and Search. The Inc strategy systematically explores all feature combinations involving the Recommendation feature, and then moves to systematically exploring all feature combinations involving the Search feature. Table 2 shows a typical exploration sequence of adaptation actions of the Inc strategy (with the step-wise exploration of sibling features highlighted in gray). In this case, it takes 5 iterations until an effective adaptation action is found (one less than in the random search example from Section 3).

Table 2: Example of adaptation space exploration via the incremental strategy (Inc).

Feature Degree Strategy (Deg).
Even though the Inc strategy makes use of the structure and hierarchy of the feature model, it still contains several random elements. In particular, it randomly determines the order in which sibling features are explored. To take a more informed decision about which of the sibling features to explore, we define the Deg strategy which makes use of the concept of feature degree. We define the feature degree for a given feature f as the number of feature combinations that contain f. The intuition here is that there may be a higher probability of finding an effective feature combination when considering features with high feature degrees, as they are present in more feature combinations.
In our example, the feature degree of the Search feature is 5, while that of the Recommendation feature is only 4 (due to the constraint requiring at least the Medium logging level). The Deg strategy thus first explores all feature combinations involving the Search feature before exploring other feature combinations. Table 3 shows a typical exploration sequence of the Deg strategy (with the exploration of the sibling feature with the highest feature degree highlighted in gray). In this case, it takes 4 iterations until the effective adaptation action is found (one less than for the Inc strategy). Note that the above examples were purposefully chosen to show the potential improvements of our strategies. As we will experimentally analyze and discuss in Section 6, there may be some situations in which our strategies may not speed up convergence.
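The feature degrees quoted above can be checked with a short sketch. It assumes that exactly one logging level is active at a time; under that assumption the enumeration reproduces the degrees 5 and 4. The function names are hypothetical, and a real implementation would more likely query a feature model analysis tool than enumerate the adaptation space.

```python
def valid_combinations():
    """Adaptation space of the running example (one logging level at a time)."""
    options = ((), ("ContentDiscovery",), ("ContentDiscovery", "Search"),
               ("ContentDiscovery", "Recommendation"),
               ("ContentDiscovery", "Search", "Recommendation"))
    return [frozenset(("DataLogging", level) + extra)
            for level in ("Min", "Medium", "Max") for extra in options
            if not ("Recommendation" in extra and level == "Min")]


def feature_degree(feature, combos):
    """Number of feature combinations that contain the given feature."""
    return sum(1 for combo in combos if feature in combo)


combos = valid_combinations()
print(feature_degree("Search", combos))          # 5
print(feature_degree("Recommendation", combos))  # 4
```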

Realization
In what follows we explain how we realize the above learning strategies.
Incremental Strategy (Inc). The Inc strategy is realized by the recursive Algorithm 2. The algorithm is initialized by randomly selecting an arbitrary leaf feature f (i.e., a feature with no sub-features) among all leaf features that are part of the current feature combination (lines 5-6). Then, the set of feature combinations $C_f$ containing feature f is computed (line 7), while the sibling features of feature f are gathered into a dedicated siblings set (line 8).
While $C_f$ is non-empty, the strategy explores one randomly selected feature combination from $C_f$ and removes the selected feature combination from $C_f$ (lines 14-21). If $C_f$ is empty, then a new set of feature combinations containing a sibling feature of f is randomly explored, provided such a sibling feature exists (lines 23-27). If no feature combination containing f or a sibling feature of f is found, then the strategy moves on to the parent feature of f. Moving to the respective parent feature is repeated until the root feature is reached (lines 29-37).
Feature Degree Strategy (Deg). The Deg strategy is realized by modifying Algorithm 2 to make use of the feature degree as shown in Algorithm 3. On the one hand, the feature degree is used to determine which leaf feature to start the learning from. Instead of randomly selecting a leaf feature, as done in the Inc algorithm (lines 5-6), the Deg strategy selects a leaf feature with the highest feature degree. On the other hand, instead of randomly choosing sibling features as done in the Inc algorithm (line 24), the Deg strategy uses the feature degree to define an order in which sibling features are chosen (starting with the sibling feature with the highest feature degree).
To realize the confDeg operation, off-the-shelf feature model analysis tools (e.g., see [63]) can be used to compute the number of possible feature combinations containing f.
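The traversal order described in this subsection can be approximated by the following sketch. It is a simplified, iterative illustration and not a transcription of Algorithm 2 or Algorithm 3: the start leaf is passed in rather than selected, the adaptation space is enumerated up front (assuming one active logging level at a time), and the root name "WebApp" as well as the names `PARENT` and `exploration_order` are assumptions made for this example.

```python
import random

PARENT = {  # child -> parent; "WebApp" is a hypothetical root name
    "DataLogging": "WebApp", "ContentDiscovery": "WebApp",
    "Min": "DataLogging", "Medium": "DataLogging", "Max": "DataLogging",
    "Search": "ContentDiscovery", "Recommendation": "ContentDiscovery",
}


def valid_combinations():
    """Adaptation space of the running example (one logging level at a time)."""
    options = ((), ("ContentDiscovery",), ("ContentDiscovery", "Search"),
               ("ContentDiscovery", "Recommendation"),
               ("ContentDiscovery", "Search", "Recommendation"))
    return [frozenset(("DataLogging", level) + extra)
            for level in ("Min", "Medium", "Max") for extra in options
            if not ("Recommendation" in extra and level == "Min")]


def exploration_order(start, combos, rng, by_degree=False):
    """Order in which combinations are tried: feature, siblings, then parent."""
    def degree(feature):
        return sum(1 for c in combos if feature in c)

    visited, order = set(), []
    f = start
    while f is not None:
        siblings = [s for s, p in PARENT.items()
                    if p == PARENT.get(f) and s != f]
        if by_degree:
            siblings.sort(key=degree, reverse=True)  # Deg: highest degree first
        else:
            rng.shuffle(siblings)                    # Inc: random sibling order
        for g in [f] + siblings:
            batch = [c for c in combos if g in c and c not in visited]
            rng.shuffle(batch)                       # random order within C_g
            visited.update(batch)
            order.extend(batch)
        f = PARENT.get(f)                            # move up to the parent
    return order


combos = valid_combinations()
target = frozenset({"DataLogging", "Min", "ContentDiscovery", "Search"})
inc_order = exploration_order("Recommendation", combos, random.Random(1))
deg_order = exploration_order("Search", combos, random.Random(1), by_degree=True)
print(inc_order.index(target) + 1, deg_order.index(target) + 1)
```

Started from Recommendation, the four combinations containing Recommendation are tried first, so the effective combination of the example is reached between the fifth and seventh iteration; started from Search, it is reached within the first five.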

USING FEATURE MODEL DELTAS TO CAPTURE ADAPTATION SPACE CHANGES DUE TO EVOLUTION
In this section we explain and illustrate our online learning strategies that use the feature model deltas to capture adaptation space changes due to system evolution.

Solution Idea and Illustration
As introduced above, our main solution idea to capture a change in the adaptation space due to system evolution is to use feature model deltas. We do so by analyzing the delta between a feature model before and after an evolution step. Thereby we can identify new possible adaptation actions that were added to the adaptation space, as well as adaptation actions that were removed from the adaptation space. Removed adaptation actions are no longer explored, while added adaptation actions are targeted first, as they may offer new opportunities for finding effective adaptation rules.
Let us assume the adaptation space changes from an adaptation space $A$ before an evolution step to an adaptation space $A'$ after an evolution step. Given a feature model $M$ that specifies $A$ (i.e., the set of all possible feature combinations) and a feature model $M'$ that specifies $A'$, two main types of changes of the adaptation space can be detected as deltas between feature models $M$ and $M'$. (A modification of a feature's implementation is not visible in a feature model. We discuss this further in Section 8.1.)
Added feature combinations. New features may be added in $M'$, or existing constraints of $M$ (such as "requires" or "excludes" constraints) may be removed or relaxed. This means that new feature combinations are added to the adaptation space $A'$. As an example in our web application, a new sub-feature Optimized might be added to the DataLogging feature, providing a more resource-efficient logging implementation. Thereby, new feature combinations are added to the adaptation space, such as {DataLogging, Optimized, ContentDiscovery, Search}. As another example, the Recommendation implementation may have been improved so that it can now work with the Min logging feature. This relaxes the initial constraint as shown in Figure 3, and adds new feature combinations such as {DataLogging, Min, ContentDiscovery, Recommendation}.
Removed feature combinations. Symmetrical to the above, features from $M$ may be removed or constraints may be added or tightened in $M'$. This means that feature combinations are removed from the adaptation space.
The idea of using feature model deltas to capture these adaptation space changes is two-fold. On the one hand, our strategies first explore the feature combinations that were added to the adaptation space by an evolution step, and then explore the remaining feature combinations if needed, i.e., we first explore feature combinations from $A' \setminus A$. The rationale is that added feature combinations might offer new opportunities to find effective adaptation actions and thus should be explored first.
On the other hand, our strategies accumulate knowledge across the evolution steps about whether the presence of a certain feature may help maintain the system's requirements or not.The strategies first explore feature combinations from A \ A that include as many as possible features that were part of effective feature combinations before an evolution step and as little as possible features that were part of ineffective feature combinations before an evolution step.
Table 4 shows a typical exploration sequence of such an evolution-aware strategy. Online learning targets the new, more resource-efficient Optimized feature and, in addition, chooses a feature combination that does not include the Recommendation feature, as this was not in any feature combination able to meet the response time requirements so far. In this case, it would thus take only one iteration until the effective adaptation action is found.

Realization
By extending the above three learning strategies Rand, Inc and Deg, we realize three evolution-aware learning strategies: EvoRand, EvoInc and EvoDeg.
To exploit knowledge across evolution steps about whether the presence of a certain feature helps maintain the system's requirements, we encode this knowledge in two evolving sets of features:
• F(+) includes features of M that were part of at least one effective feature combination, as well as features that were not yet activated in any feature combination.
• F(−) includes features of M that were only part of ineffective feature combinations.
Based on the actual observations of online learning, these sets of features are updated accordingly. Features thus may move from one set to the other over the course of learning and evolution. In addition, features may be added to or removed from the sets due to the deltas between M and M′.
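One way to read the definitions of the two feature sets is sketched below. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: observations are assumed to be pairs of a feature combination and a Boolean effectiveness flag, the sets are recomputed from the full observation history instead of being updated incrementally, and the function name is hypothetical.

```python
def classify_features(current_features, observations):
    """Derive the sets F(+) and F(-) for the current feature model.

    current_features: features of the (possibly evolved) feature model.
    observations: iterable of (feature_combination, was_effective) pairs.
    """
    in_effective = set()
    in_ineffective = set()
    for combination, was_effective in observations:
        if was_effective:
            in_effective.update(combination)
        else:
            in_ineffective.update(combination)

    current = set(current_features)
    # F(-): features seen only in ineffective combinations (and still in the model).
    f_minus = (in_ineffective - in_effective) & current
    # F(+): features seen in an effective combination, plus features never activated.
    f_plus = current - f_minus
    return f_plus, f_minus


features_after_evolution = {
    "DataLogging", "Min", "Optimized", "ContentDiscovery", "Search", "Recommendation",
}
history = [
    ({"DataLogging", "Min", "ContentDiscovery", "Recommendation"}, False),
    ({"DataLogging", "Min", "ContentDiscovery", "Search"}, True),
]
f_plus, f_minus = classify_features(features_after_evolution, history)
```

With this toy history, Recommendation ends up in F(−), whereas the newly added and not yet activated Optimized feature is placed in F(+). Intersecting with the current feature set drops features that an evolution step removed from the feature model.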
Our strategies prioritize feature combinations that include features from F(+) and do not include any feature from F(−). To this end, we extend the algorithms from Section 4.2 along two directions. First, we select features from F(+) to determine the feature combinations to be explored. Second, we first explore those new feature combinations that do not contain features from F(−).
To select features from F(+), the algorithms are extended as follows.
Evolution-aware random strategy (EvoRand). Unlike in Algorithm 1 (line 8), feature f is first randomly selected among the features in F(+), rather than among all possible features in F.
Evolution-aware incremental strategy (EvoInc). Unlike in Algorithm 2 (lines 6 and 24), feature f is first randomly selected among those leaf or sibling features that are in F(+), rather than among all leaf or sibling features in F.

Evolution-aware feature degree strategy (EvoDeg). Unlike in Algorithm 3 (lines 6 and 24), feature f is first randomly selected among those leaf or sibling features with the highest feature degree that are also in F(+), rather than among all leaf or sibling features in F with the highest feature degree.
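The common pattern behind the three extensions is a preference for candidate features that are in F(+). A minimal sketch of this selection step follows. It assumes that the strategy-specific candidate set (all features for EvoRand, leaf or sibling features for EvoInc, those with the highest feature degree for EvoDeg) has already been computed, and that the selection falls back to the full candidate set when no candidate is in F(+); this fallback and the function name are assumptions for illustration.

```python
import random


def select_feature(candidates, f_plus, rng=random):
    """Randomly pick a feature, preferring candidates that are in F(+)."""
    # Sort for a reproducible order before the random choice.
    preferred = sorted(f for f in candidates if f in f_plus)
    pool = preferred if preferred else sorted(candidates)  # assumed fallback
    return rng.choice(pool)


rng = random.Random(42)  # seeded only to make the example reproducible
picked = select_feature({"Search", "Recommendation"}, {"Search", "Optimized"}, rng)
fallback = select_feature({"Recommendation"}, {"Search", "Optimized"}, rng)
```

Here `picked` is Search, because it is the only candidate in F(+), while `fallback` is Recommendation, because no candidate is in F(+).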
To first explore new feature combinations that do not contain features from F(−), the algorithms are extended as follows. Whenever the set of feature combinations C_f to be explored is computed, this is performed in the following increments.
First, the added feature combinations (A′ \ A) that do not contain any features from F(−) are explored.⁷ Then, the remaining feature combinations (A′ ∩ A) that do not contain any features from F(−) are explored.
Only once all of these feature combinations have been explored are the remaining ones explored.
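The three increments can be sketched as a stable partition of the candidate feature combinations. The sketch below assumes that candidates are frozensets of feature names, that the adaptation space before the evolution step is available as a set, and that the order within each increment is the one produced by the underlying strategy; the function name and the toy data are hypothetical.

```python
def exploration_order(candidates, space_before, f_minus):
    """Order candidate feature combinations into the three increments."""
    def has_f_minus(combination):
        return bool(set(combination) & set(f_minus))

    # 1) added combinations (A' \ A) without any feature from F(-)
    added_clean = [c for c in candidates
                   if c not in space_before and not has_f_minus(c)]
    # 2) remaining combinations (A' ∩ A) without any feature from F(-)
    kept_clean = [c for c in candidates
                  if c in space_before and not has_f_minus(c)]
    # 3) everything else, i.e., combinations containing a feature from F(-)
    rest = [c for c in candidates if has_f_minus(c)]
    return added_clean + kept_clean + rest


new_clean = frozenset({"DataLogging", "Optimized", "ContentDiscovery", "Search"})
old_clean = frozenset({"DataLogging", "Min", "ContentDiscovery", "Search"})
new_risky = frozenset({"DataLogging", "Optimized", "ContentDiscovery", "Recommendation"})

order = exploration_order(
    candidates=[new_risky, old_clean, new_clean],
    space_before={old_clean},
    f_minus={"Recommendation"},
)
```

In this toy case, the combination with the new Optimized feature and without Recommendation is explored first, followed by the previously known combination, and only then the combination that contains Recommendation.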

EXPERIMENTS
This section presents the design and the results of a set of experiments to assess and compare our online learning strategies with the random learning strategy. The random learning strategy serves as our baseline as it represents existing online learning strategies for rule-based adaptation.

Research Questions
We aim to answer the following research questions:
RQ1 (Convergence of feature-model-guided online learning). How does the speed of convergence using feature models to explore the adaptation space compare to the speed of convergence using random exploration? We aim at determining whether using knowledge about the structure of the adaptation space speeds up convergence when compared with a random learning strategy, thereby helping address the potentially large size of the adaptation space.
RQ2 (Convergence of evolution-aware online learning). How does the speed of convergence of the evolution-aware learning strategies compare to the convergence of evolution-unaware learning strategies? We aim at assessing whether taking into account knowledge about system evolution speeds up convergence of online learning, thereby helping to cope with system evolution.
RQ3 (Impact of evolution-aware online learning strategies on system quality). What impact on the quality characteristics of a self-adaptive system can be observed when using evolution-aware online learning strategies? With this question we aim at understanding how our evolution-aware learning strategies perform when evolving and adapting a system during actual execution. In particular, we are interested in the effect that evolution-aware learning strategies have on system quality when compared with evolution-unaware learning strategies.
7. The notation M\S means that all features in S together with their sub-trees (all children features) are removed from the feature model.

Design
Our experiments build on four real-world systems and datasets that are listed in Table 5. We purposefully chose the four systems to exhibit different characteristics of the system's adaptation space. The systems differ with respect to the size of the adaptation space (i.e., the number of feature combinations), the number of features and the depth of the feature models. The feature models of all four systems are provided as supplemental material to this paper.
The CloudRM dataset stems from a parametrized cloud resource management system and was created as part of previous work on cloud computing [65]. CloudRM controls the allocation of computational tasks to virtual machines and the allocation of virtual machines to physical machines in a cloud data center. Moreover, it continuously re-optimizes the placement of virtual machines on physical machines using live migrations to respond to changes in the workload, with the overall aim of minimizing the total energy consumption of the data center while keeping the number of migrations low. CloudRM supports multiple algorithms for the selection of a virtual machine for a new task, and the algorithms can be parameterized using different sets of parameters. Adaptation actions for CloudRM are thus the selection of different algorithms and the parametrization of these algorithms. We consider energy consumption and number of virtual machine migrations as the specific quality characteristics of interest for the CloudRM system.
The BerkeleyJ, LLVM, and BerkeleyC datasets were collected by Siegmund et al. [66] and were used for experimentation with reconfigurable systems in order to predict their performance. They describe the reconfigurable open source database systems BerkeleyJ and BerkeleyC, as well as the reconfigurable open source LLVM compiler. We chose these systems because the datasets include performance measurements for all system configurations, which were measured using standard benchmarks. As adaptation actions we consider changing at runtime the configurations offered by these systems. We consider response time as the specific quality characteristic of interest, because this was available across all three datasets.⁸
All learning strategies were implemented in Java. Feature model management and analysis were performed using the FAMA library.⁹ More specifically, we used the FAMA library to identify possible feature combinations from a feature model and reason on partial feature combinations in order to compute the feature degree.
8. We did not consider the other datasets in Siegmund et al., because in these datasets many of the configurations are associated with the same response time. This means the chance of finding an effective feature combination is very high, making the learning process converge too fast to observe any differences between the strategies.

Execution
To answer RQ1, we use the four datasets as follows. We (i) determine a target quality requirement value by randomly selecting one value from the dataset, and (ii) run the learning strategies until they find an effective feature combination, i.e., a feature combination that achieves this target value.
For each strategy, we measure the speed of convergence by counting the number of iterations required to find an effective feature combination. We also measure the relative speed of convergence by computing the ratio of visited feature combinations over all feature combinations (i.e., the size of the adaptation space). To avoid chance effects, we run the experiment n times, where n is the size of the adaptation space of the respective system, and average the results.
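The measurement procedure can be sketched for the random baseline as follows. This is a simplified illustration under stated assumptions: the dataset is represented as a mapping from feature combinations to a measured quality value, a feature combination counts as effective if its value equals the randomly chosen target value, and random exploration never revisits a feature combination. All names and the toy response times are hypothetical.

```python
import random


def iterations_until_effective(order, is_effective):
    """Count explored feature combinations until the first effective one."""
    for iteration, combination in enumerate(order, start=1):
        if is_effective(combination):
            return iteration
    return None  # no effective combination in the explored order


def average_convergence(quality, runs, rng):
    """Average absolute and relative convergence of random exploration."""
    space = list(quality)
    total = 0
    for _ in range(runs):
        target = quality[rng.choice(space)]  # (i) randomly chosen target value
        order = space[:]
        rng.shuffle(order)                   # (ii) random exploration order
        total += iterations_until_effective(
            order, lambda c: quality[c] == target)
    average = total / runs
    return average, average / len(space)


# Toy dataset with made-up response times; n runs for a space of size n.
toy_quality = {
    frozenset({"A"}): 120,
    frozenset({"A", "B"}): 95,
    frozenset({"A", "C"}): 140,
    frozenset({"A", "B", "C"}): 80,
}
avg_iterations, relative = average_convergence(
    toy_quality, len(toy_quality), random.Random(0))
```

The first returned value corresponds to the average number of iterations and the second to the relative speed of convergence, i.e., the share of the adaptation space that was visited. Other strategies would differ only in how `order` is produced.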
To answer RQ2, we compare the evolution-aware strategies against the evolution-unaware ones using an evolution scenario for each of the systems. For each step of the evolution scenario, the experiment measures the speed of convergence. The measurement procedure follows the one described for RQ1.
For CloudRM, we use the following 4-step evolution scenario. The scenario starts from an initial system version with a single feature called Simple placement, which creates a dedicated virtual machine for each task to be deployed in the cloud. Each evolution step adds features for different placement algorithms:
1) Multiple placement is added, allowing a given number of tasks to be deployed on a virtual machine.
2) Maxsize placement is added, creating virtual machines whose size is at most 0.25 times the capacity of the available physical servers. When there are multiple virtual machines that can accommodate a new task, a virtual machine is selected using the First-Fit (FF) heuristic, selecting the first virtual machine that fits the resource needs of the task.
3) Maxsize placement is made parameterizable by allowing various maximum virtual machine sizes; two new virtual machine selection heuristics, Best-Fit (BF) and Worst-Fit (WF), are added.
4) Consolidation Friendly placement is added, selecting a physical machine that can accommodate the given task, and then selecting a virtual machine hosted on the physical machine.
For BerkeleyJ, BerkeleyC, and LLVM, we simulate system evolution by first changing all optional features to mandatory ones, thereby reducing the size of the adaptation space. Then, we start from this reduced adaptation space and, one by one, randomly change the mandatory features back into the original optional features, thereby incrementally increasing the size of the adaptation space. This results in an m-step evolution scenario, with m being the number of optional features. We defined evolution scenarios for the BerkeleyJ, BerkeleyC, and LLVM systems with m = 7, 7, and 10 respectively.
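The construction of such a simulated evolution scenario can be sketched as follows. The sketch only tracks which features are optional at each step; deriving the corresponding feature model and adaptation space is left out. The function name and the feature names f1 to f7 are placeholders, not the actual features of the systems.

```python
import random


def evolution_scenario(optional_features, rng):
    """Return, per evolution step, the set of features that are optional.

    Step 0 treats every originally optional feature as mandatory; each later
    step turns one randomly chosen feature back into an optional one.
    """
    order = list(optional_features)
    rng.shuffle(order)
    optional_now = set()
    steps = [frozenset(optional_now)]  # step 0: reduced adaptation space
    for feature in order:
        optional_now.add(feature)
        steps.append(frozenset(optional_now))
    return steps


placeholder_features = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]
steps = evolution_scenario(placeholder_features, random.Random(1))
```

For m = 7 optional features, the list holds the initial version plus seven evolution steps, and the last step restores the original feature model.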
To answer RQ3, we employ the CloudRM system and measure the quality characteristics "energy consumption" and "number of virtual machine migrations". We use the same evolution scenario as for RQ2, but now we actually execute the adaptation actions in the running system. This means that for each of the feature combinations explored by the learning strategies, the system is reconfigured accordingly at runtime. We measure the impact of these adaptations on system quality for the evolution-aware and evolution-unaware learning strategies.
This experiment is based on a real-world workload trace with 10,000 tasks, in total spanning a time frame of roughly one month [67]. To ensure consistency among the results, the same workload was replayed after each step in the evolution scenario. CloudRM decides on the placement of new tasks whenever they are entered into the system (as driven by the workload trace). Additionally, CloudRM re-executes the placement algorithms every five minutes to re-optimize the placement of virtual machines. To allow sufficient time in the experiment to observe the impact of the execution of an adaptation action selected by online learning, CloudRM is allowed to run for one hour after each adaptation before the next adaptation action is selected.

Results
For RQ1, Table 6 presents the measurements of the convergence speed of the different online learning strategies. It gives, for each system and for each strategy, the average and relative number of feature combinations explored until an effective one is found, as well as the relative reduction of the number of explored feature combinations compared to the baseline random strategy.¹⁰ The results suggest that the learning strategies that consider the structure of the feature model (Inc and Deg) perform better than the baseline (Rand) for feature models with greater depth. Whenever the feature model is flat, i.e., when the feature model has only a few levels, all strategies are very similar in terms of convergence, with around 50% of the feature combinations explored before finding the target one.¹¹ This is the case for LLVM, which has a depth of 1, and also the case for BerkeleyC, which has a depth of 2. The reason is that flat models do not provide enough structure for our online learning strategies, and thus they behave like a random learning strategy. For feature models with greater depth, faster convergence is achieved when considering the structure of the feature model. This is the case, for instance, for BerkeleyJ with a depth of 5. Across all systems, the best results are achieved by the strategy with the lowest amount of randomness (Deg). For Deg, a speed-up of convergence of up to 18.8% for BerkeleyJ was measured. This means that the Deg strategy had to explore 18.8% fewer adaptation actions than the baseline strategy.
10. Note that the CloudRM results for energy and migrations are the same, as the strategies explore the feature model in the same order.
For RQ2, Figure 5 plots the relative speed of convergence for each step of a system's evolution scenario. In addition, Table 7 shows the cumulative number of feature combinations explored across all steps of the evolution scenario, as well as their relative reduction. The evolution-unaware strategies that performed worst and best in RQ1 were Rand and Deg, respectively. To measure the improvement of the evolution-aware strategies, we thus use their evolution-aware versions, i.e., EvoRand and EvoDeg. As for RQ1, we use the evolution-unaware random strategy (Rand) as the baseline for comparison.
11. Results may vary for flat models that contain constraints, since constraints change the feature degree of involved features.
As seen in Table 7, evolution-aware learning in general shows a strong reduction in the number of feature combinations to be explored before finding an effective one. These reductions range from 39.4% (BerkeleyC) to 92.5% (CloudRM, energy). There is only a small difference of 0.4% on average between the EvoDeg and the EvoRand strategies. This small difference indicates that first exploring feature combinations added by evolution and exploiting knowledge about whether a feature was part of an effective feature combination in the past has a stronger impact on convergence than how the adaptation space is traversed.
As visible in Figure 5, in a few cases the evolution-aware strategies may make wrong decisions. In these cases, the measured relative speed of convergence (i.e., the share of feature combinations explored) may be the same as or slightly higher than for the evolution-unaware strategies. One reason is that even though a feature f1 may have a negative impact on the system's quality in isolation, this feature in combination with a newly introduced feature f2 in the evolved system can have a positive impact. However, our evolution-aware strategies currently do not consider such feature interactions, and thus feature f1 will only be explored after all feature combinations that do not include this feature have been explored. Technically, feature f1 is moved to the set F(−) (see Section 5.2). This is what happened in steps 4 and 5 with BerkeleyJ, step 7 with BerkeleyC and step 5 with LLVM. Another reason is that F(+) and F(−) do not contain many features, because the feature models before the evolution step are very small. Therefore, there is not much knowledge to be reused across the evolution step. This is what happened in step 1 for CloudRM (energy). Here, the feature model at step 0 only includes a single feature, the Simple feature.
In Figure 5, we can also observe the general trend that the more evolution steps a system undergoes, the higher the reduction in the number of feature combinations to be explored becomes. This suggests that the speed of convergence increases with the number of evolution steps a system undergoes. This can be explained by the fact that the strategies accumulate knowledge from each of the previous evolutions. Technically, this means that from each evolution, the strategies learn more precise sets F(+) and F(−).
For RQ3, Table 8 shows the percentage of savings in terms of energy consumption and number of virtual machine migrations for each of the four steps of the CloudRM evolution scenario. We use the worst performing evolution-aware strategy from RQ2 (EvoRand) and compare it with the random baseline (Rand). As visible in Table 8, evolution-aware learning leads to considerable savings in terms of the two considered quality characteristics in all but one case. The negative energy consumption saving for evolution step 1 can be explained by evolution-aware learning exploring more feature combinations than evolution-unaware learning, as explained before in relation to RQ2.
When comparing the two quality characteristics, higher savings can be achieved in terms of the number of migrations than in terms of energy consumption.This is due to the different placement algorithms in CloudRM having a larger variance in the number of migrations than in energy consumption.
Even though savings in energy consumption appear not to be very high, it should be noted that, in a large cloud data center, even a modest percentage of energy savings has a considerable impact. For example, in a data center with 10,000 physical machines consuming on average 300 Watts, assuming a typical power usage effectiveness of 1.7 [67] and an average electricity price of 0.125 € per kWh [67], electricity would cost ca. 15,000 € per day. In the case of CloudRM, taking evolution into consideration may save 12% of energy on average and thus may lead to savings of ca. 1,800 € per day.
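The cost estimate above can be retraced with a few lines of arithmetic. The following sketch uses only the figures stated in the text; the variable names are chosen for illustration.

```python
machines = 10_000            # physical machines in the data center
avg_power_watts = 300        # average power draw per machine
pue = 1.7                    # power usage effectiveness
price_eur_per_kwh = 0.125    # average electricity price
saving_fraction = 0.12       # average energy saving reported for CloudRM

# Total facility power in kW, including the overhead captured by the PUE.
facility_power_kw = machines * avg_power_watts * pue / 1000
daily_energy_kwh = facility_power_kw * 24
daily_cost_eur = daily_energy_kwh * price_eur_per_kwh
daily_saving_eur = daily_cost_eur * saving_fraction
```

This yields a facility power of 5,100 kW, i.e., 122,400 kWh per day, which corresponds to about 15,300 € per day and a saving of about 1,836 € per day, consistent with the rounded figures of ca. 15,000 € and ca. 1,800 €.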
To provide a better understanding of how learning behaves over time, Figure 6 plots the cumulative amount of energy consumed and number of migrations performed for evolution step 4. In our experiments it took 180 iterations for the learning process to converge after an evolution step. As we used one-hour cycles for adaptation (as explained in Section 6.3), this means it takes 7.5 days to execute these 180 iterations. Figure 6 therefore reports the results computed based on the first eight days of the workload.
Figure 6 shows that the energy consumption and number of migrations are very low in the first couple of days, followed by increased activity starting from the third day. This pattern is a characteristic of the workload (shown at the top of Figure 6), and not related to CloudRM or our learning strategies. During the first couple of days, the workload is very low, rising after day 2. Thus, starting with day 3, evolution-aware learning makes an increasing difference: whilst evolution-unaware learning explores a number of ineffective feature combinations, evolution-aware learning more quickly finds effective feature combinations and thereby converges faster. As a result, both energy consumption and the number of migrations grow much more slowly when using evolution-aware learning.

Threats to Validity
Internal validity. The random baseline strategy as well as our online learning strategies (even if to a much lesser degree) exhibit random behavior during adaptation space exploration. In order to minimize chance effects due to this random behavior, we therefore repeated the experiments n times (n being the size of the adaptation space) and used the average results for comparison of the strategies.
We purposefully focused on evolution steps that increase the size of the adaptation space in order to assess to what extent our strategies are able to capture adaptation spaces of increasingly larger size. Our experiments may be complemented by analyzing to what extent the strategies differ when the size of the adaptation space is reduced. Even though in an adaptation space of reduced size fewer feature combinations have to be explored, leading to faster convergence, there still may be differences in the way these fewer feature combinations are explored.
To measure the speed of convergence, we counted the number of iterations until a feature combination is found that achieves a specific target quality requirement value. While being an objective metric, this definition of convergence is rather narrow. Providing a broader definition of convergence, say by giving lower or upper bounds around a target value, may deliver different results.
External validity. We used four real-world systems from different application domains to measure the speed of convergence of the different strategies. These four systems also differ in key aspects. They differ in the shape of their feature models, including the number of features and the depth of the feature model. Also, they differ in the size of the adaptation space. Overall, this contributes to the generalizability of our findings with respect to RQ1 and RQ2.
We used a cloud resource management system to measure the effect of online learning during actual system operation. Even though we have used a real-world workload trace, results are only for a single system. This limits generalizability of our findings with respect to RQ3.

RELATED WORK
This section discusses existing online learning techniques for self-adaptive systems and specifically analyzes them with respect to how they address convergence and system evolution.The discussion is structured along the two main machine learning paradigms used, reinforcement learning and supervised learning, as well as their combination.

Reinforcement Learning
In reinforcement learning, the system learns the effectiveness of its actions through interactions with its environment [45]. Reinforcement learning can be used to solve sequential decision tasks [68], where the system aims to maximize its long-term rewards for taking a series of actions in an unknown environment. The system observes the current environment state and then selects and executes an action, which in turn may cause a change of the environment state. The system receives a reward value as feedback for executing an action. Reinforcement learning aims to find an action-selection policy that optimizes long-term rewards.
Amoui et al. propose using Q-Learning and SARSA (two concrete reinforcement learning algorithms) for self-adaptive systems [30]. They propose speeding up convergence using offline learning and using simulations of the environment to generate a sufficient number of observations. They observe that different reinforcement learning algorithms may exhibit different speeds of convergence depending on the concrete application context. However, they do not take into account additional knowledge about the software system to speed up convergence. Also, they do not address system evolution.
Kim and Park propose Q-Learning as a concrete algorithm to learn adaptation rules at runtime [37]. They propose using goal and scenario models to support a more systematic definition of the reinforcement learning problem (in terms of environment state variables and actions). They show that online learning may gradually optimize the set of adaptation rules, but provide no further convergence analysis. They do not address system evolution.
Dutreilh et al. employ Q-Learning for autonomic cloud resource management [38]. They experiment with speeding up convergence by providing a good initial estimate for the Q-function (which represents the learned knowledge), as well as by using statistical estimates about the environment behavior. They indicate that system evolution may imply a change of system performance and sketch an idea on how to detect such drifts in system performance. Yet, they do not consider that evolution may also introduce or remove adaptation actions.
Barrett et al. propose using Q-Learning for autonomic cloud resource allocation [39]. To facilitate convergence, they propose parallel learning. However, this requires that several systems concerned with the same resource allocation tasks exist and thus can share the information they learn in parallel. System evolution is not addressed, and in principle could become difficult if the involved systems underwent different forms of evolution in parallel.
Bu et al. employ Q-Learning for the self-configuration of cloud virtual machines and applications [40]. They reduce the action space to a much smaller sub-set using two complementary strategies. On the one hand, they split the action space into coarse-grained sub-sets and for each of these sub-sets find a representative action using the simplex method. On the other hand, they encode domain knowledge into the learning process by setting experience-based thresholds for the adaptation actions. Their experimental results indicate that their approach indeed can speed up convergence. Still, they do not address system evolution.
Jamshidi et al. and Arabnejad et al. apply fuzzy Q-Learning and fuzzy SARSA to learn fuzzy adaptation rules [31], [69]. They observe that the rate of exploration (randomly choosing an action) versus exploitation (using learned knowledge to choose an action) affects convergence. As an extension of their initial work, they demonstrate that transfer learning may speed up learning [32]. However, transfer learning is beneficial only if observations from the source environment are much cheaper to collect than samples from the target environment. Their approach does not address system evolution, as it assumes that the set of adaptation actions to be explored is fixed.
Caporuscio et al. propose using two-layer hierarchical reinforcement learning for multi-agent service assembly [41]. Two layers of monitoring information serve as input to the learning process: local monitoring information and monitoring information collected by other agents. They observe that by sharing monitoring information, the learning process converges faster than when learning in isolation. As for Barrett et al., this requires that several systems exist that can share monitoring information. They do not address system evolution.
Filho and Porter [23] use an approach inspired by reinforcement learning to determine which composition of software components best suits the current environment situation. Their approach starts with the exhaustive exploration of every possible adaptation action in the adaptation space. They indicate that this is a clear limitation of their approach with respect to scalability to large action spaces. Their learning strategy is unaware of changes in the adaptation space due to system evolution.
Wang et al. combine multi-agent reinforcement learning with game theory for adaptive service compositions [42]. Their results indicate that convergence depends on the learning rate (i.e., to what degree newly observed rewards override past rewards), the number of agents collaborating, as well as the size of the adaptation space. As for Barrett et al., their approach requires that several systems exist that have the same learning goal. They do not address how a change in the service composition model due to system evolution may impact learning.
In our own previous work [48], we sketch the principal dependencies between online learning and system evolution. On the one hand, we indicate how feedback from learning (e.g., if no effective feature combination could be found) may trigger system evolution. On the other hand, we analyze how the adaptation space may change during system evolution and how such a change may affect online learning. However, we neither provided concrete algorithms nor experimental results for considering system evolution during online learning. Also, we did not address the issue of convergence in the presence of large adaptation spaces.

Supervised Learning
In supervised learning, the system learns from a set of labeled training data (i.e., input data together with output data). Supervised learning can be used for one-shot decision tasks [68], where the input data provides the information available for decision making and the output data describes the correct decisions.
Esfahani et al. propose an online learning framework that uses feature models to represent the adaptation space [10]. They learn an analytical model that captures how a feature combination impacts the system's quality requirements, and use this model for planning adaptation actions. To realize their framework, they use the M5 model tree learning algorithm. They measure the time required to train the model for a given number of observations. However, they do not measure how many observations are needed to converge to an accurate model. Also, they do not consider how an evolution of the feature model may impact the learning process.
Sykes et al. use probabilistic rule learning to update environment models at runtime [21]. The environment model, encoded as rules in a logic program, describes what effect adaptation actions have on the environment. New rules are learned using execution traces of the running system. While they evaluate the time required for learning new rules from a set of execution traces, they do not analyze how many execution traces may be required to achieve convergence to a sufficiently accurate environment model. The impact of system evolution is not considered.
Qian et al. employ case-based reasoning for storing and retrieving adaptation rules [24]. When facing a new situation, similar cases are retrieved from the case base to find an adaptation rule whose effectiveness has been shown earlier. If no similar case can be found in the case base, their approach resorts to using goal models to derive new adaptation rules. They provide no analysis of the convergence of their approach and whether using goal models to derive new adaptation rules may speed up convergence. They do not discuss how an update of the goal model due to system evolution would impact the case base.
Quin et al. explicitly address the problem of large adaptation spaces for model-based self-adaptation [44]. If the adaptation space is large, the resources and time needed for model-based analysis can become prohibitive. The proposed solution is to employ classification and regression machine learning models to determine a representative and much smaller subset of the adaptation space and only analyze this subset. To speed up convergence, they use an offline learning phase to train sufficiently accurate machine learning models, which are then updated at runtime. They do not address system evolution and thus a change of the adaptation space.

Hybrid Approaches
Hybrid approaches use reinforcement learning in combination with supervised learning.
Tesauro et al. use Q-Learning in combination with an artificial neural network for autonomic resource allocation in data centers [33]. To capture large spaces of environment states, they use an artificial neural network in the form of a multi-layer perceptron to approximate the value function. In Q-Learning, the value function gives the expected cumulative reward when starting from a given environment state [45]. To facilitate convergence, they perform offline learning using queuing models. They do not address the problem of large action spaces and whether their chosen function approximation may be applicable. Also, they do not address system evolution.
Xu et al. employ Q-Learning in combination with artificial neural networks for the automatic configuration of cloud virtual machines and applications [43]. Like Tesauro et al., they use a multi-layer perceptron to approximate the value function. In addition, they perform an offline learning phase to find a good initial estimate of the action selection policy, thereby facilitating convergence at runtime. Experimental results indicate that such an offline policy initialization can indeed speed up convergence. System evolution is not addressed by their approach.
Moustafa and Zhang propose multi-agent Q-Learning in combination with function approximation via linear regression for adaptive service compositions [28]. To speed up convergence, they propose using collaborative learning, where multiple systems simultaneously explore the set of concrete services to be composed. They observe that collaborative learning may significantly speed up exploration. However, as for Barrett et al. (see above), this requires that several systems with the same learning goal exist. They do not address system evolution.
Zhao et al. propose using reinforcement learning in combination with case-based reasoning to generate and update adaptation rules [22]. To populate the case base, they use offline reinforcement learning to learn adaptation rules for different system goals. At runtime, case-based reasoning is used to select the best fitting rule and reinforcement learning is used to fine-tune this rule. Their approach may take as long to converge on an optimal rule set as online learning from scratch, but it may start with a higher effectiveness of the adaptation rules. Even though the approach can handle changes in the priorities of system goals, it does not consider an evolution of the system itself.

Summary of Related Work
While several approaches in the literature consider how to speed up the convergence of the online learning process, only one approach explicitly addresses the problem of large adaptation spaces [44]. This approach focuses on reducing the size of the analysis models for model-based adaptation. In contrast, we explicitly address the problem of large adaptation spaces for rule-based adaptation by introducing online learning strategies that exploit additional information about the software system in the form of feature models.
System evolution and the impact it may have on the online learning process is not addressed in the literature, except in conceptual form in our own previous work [48]. We address this gap by introducing evolution-aware online learning strategies for rule-based adaptation.

CONCLUSION
We introduced online learning strategies that address potentially large adaptation spaces and that can cope with a change of the adaptation space due to system evolution. Our online learning strategies use feature models to give structure to the system's adaptation space and thereby guide and speed up the online learning process.
By leveraging the hierarchical structure of feature models, our strategies reduce the amount of randomness when exploring the adaptation space. The strategies systematically traverse the feature model to select the next adaptation action to be executed and observed. We thereby address the problem that in the presence of large adaptation spaces, random exploration can lead to slow convergence.
By analyzing the delta between a feature model before and after an evolution step, we make online learning aware of changes in the adaptation space. Thereby, our strategies can identify added, removed and retained adaptation actions. We use this information to reuse knowledge across evolution steps rather than restarting online learning from scratch, which would discard the knowledge already gained and forgo its use for speeding up convergence after an evolution step.
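The delta analysis described above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: for simplicity it assumes that the valid feature combinations before and after an evolution step have already been enumerated from the feature models, and the names `AdaptationSpaceDelta`, `diff_adaptation_spaces` and `carry_over_knowledge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptationSpaceDelta:
    """Difference between two adaptation spaces (hypothetical helper type)."""
    added: frozenset     # combinations only valid after the evolution step
    removed: frozenset   # combinations only valid before the evolution step
    retained: frozenset  # combinations valid before and after


def diff_adaptation_spaces(before, after):
    """Classify feature combinations as added, removed or retained.

    Assumption: each combination is a frozenset of active feature names,
    enumerated beforehand from the respective feature model.
    """
    before_set, after_set = set(before), set(after)
    return AdaptationSpaceDelta(
        added=frozenset(after_set - before_set),
        removed=frozenset(before_set - after_set),
        retained=frozenset(before_set & after_set),
    )


def carry_over_knowledge(knowledge, delta):
    """Keep learned values of retained combinations only.

    Values of removed combinations are dropped; added combinations have no
    entry yet and therefore still need to be explored.
    """
    return {combo: value for combo, value in knowledge.items()
            if combo in delta.retained}
```

In this sketch, knowledge about retained feature combinations survives the evolution step, so that exploration after the step can concentrate on the added combinations.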
Experimental results involving four real-world systems suggest that using feature models to structure the adaptation space can speed up convergence of online learning. Results indicate that considering the structure of the adaptation space speeds up convergence by up to 18.8% (with 7.2% on average). Additionally considering deltas in the adaptation space due to system evolution speeds up convergence by up to 92.5% (with 64.6% on average). Experimental results for a cloud management system indicate that this faster convergence may lead to energy savings of up to 35.5% (with 12.0% on average) and a reduction of virtual machine migrations of up to 89.4% (with 74.3% on average).
To conclude, we discuss limitations of our online learning strategies and provide pointers to future work.

Limitations
Feature interactions. We currently do not consider the impact of feature interactions when determining which features to explore in the evolved system. This means that features that were not part of effective feature combinations may be explored only very late for the evolved system, even though they would lead to an effective feature combination together with the new features. Considering such feature interactions may allow improving our evolution-aware online learning strategies. Existing solutions for feature interaction analysis in software product lines [70], [71] may be used to determine such feature interactions.
Feature modifications. To address system evolution, our strategies analyze the differences in the feature models before and after an evolution step. Thereby, our strategies can determine feature combinations that were added to or removed from the system's adaptation space. A further possible change introduced during system evolution is the modification of a feature's implementation. However, such a change is not visible in a feature model. Encoding this kind of knowledge in the feature models could thus further improve our online learning strategies.
Adaptation constraints. Our learning strategies are based on the assumption that switching from an active feature combination to any other possible feature combination is always possible, i.e., we assume that there are no technical or logical constraints on adaptation. On the one hand, this means we are not concerned with the technicalities of how to switch between the feature combinations in the running system. This is the scope of other work, such as [72], [73], which thus may serve to address this concern. On the other hand, we do not take into account stateful or sequential constraints on adaptation itself. This means that we assume we can directly switch between any of the feature combinations in the adaptation space without the need to go through intermediate feature combinations. However, in reality, only certain paths may be permissible to reach another feature combination from the current one. To consider adaptation paths, our strategies could be enhanced by building on work such as [74], [75].
Risks of online learning. Despite many successful applications of online learning for self-adaptive systems, online learning may not be applicable for all kinds of self-adaptive systems. Systems may operate in an environment where the trial-and-error nature of online learning may not be tolerable, because adaptation actions may harm their environment [45]. Safety-critical systems are a typical example. For such systems, the risk of online learning may be too high for it to be practically applicable.

Future Work
As part of our future work, we aim to address the current limitations of our online learning strategies. In particular, this includes relaxing the assumption that switching from one feature combination to any other feature combination is always possible, as well as considering feature interactions and feature modifications during evolution. In addition, we envision two main extensions:
Extension to model-based adaptation. We focused on online learning strategies for rule-based adaptation. Yet, the main ideas underlying our strategies may also be applicable for model-based adaptation. In model-based adaptation, the aim of online learning is to learn analytical models that facilitate generating effective adaptation actions. To this end, representative observations of the system and environment need to be collected. As an important difference to rule-based adaptation, convergence has to be measured differently in model-based adaptation. Here, the accuracy of the model is a prerequisite for generating effective adaptation actions. Therefore, it is of interest how fast online learning converges on an accurate model.
Extension of reinforcement learning. Reinforcement learning is widely used for online learning for self-adaptive systems. Our online learning strategies may be used to extend existing reinforcement learning algorithms. As an example, they may be used to augment the exploration phase of reinforcement learning algorithms, such as SARSA or Q-Learning [45]. Instead of randomly selecting the next action during exploration, our strategies may be used to better guide action selection. However, existing reinforcement learning algorithms assume that the set of actions remains constant. Further work is thus required to understand how evolution-aware online learning strategies may be integrated into reinforcement learning algorithms.
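As an illustration of how guided exploration might replace random exploration, the following sketch shows an epsilon-greedy action selection in which the exploration step takes the next action from a precomputed order instead of sampling uniformly. It is only a sketch under assumptions: the class `GuidedEpsilonGreedy` is hypothetical, the exploration order is assumed to be supplied by a feature-model-guided strategy, and the set of actions is assumed to be fixed, so the open question of evolving action sets is not addressed.

```python
import random
from collections import deque


class GuidedEpsilonGreedy:
    """Epsilon-greedy selection with a guided exploration step (hypothetical)."""

    def __init__(self, exploration_order, epsilon=0.1, rng=None):
        # Assumption: exploration_order lists not-yet-explored actions in the
        # order proposed by some feature-model-guided strategy.
        self.queue = deque(exploration_order)
        self.epsilon = epsilon
        self.rng = rng or random.Random()

    def select(self, q_values):
        """Return the next action, given a non-empty dict action -> value."""
        if self.queue and self.rng.random() < self.epsilon:
            # Explore: take the next action proposed by the guided strategy.
            return self.queue.popleft()
        # Exploit: pick the action with the highest estimated value. This is
        # also the fallback once the exploration order is exhausted.
        return max(q_values, key=q_values.get)
```

With `epsilon=1.0` the policy follows the guided order until it is exhausted and then behaves greedily; with `epsilon=0.0` it never explores.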

SUPPLEMENTAL MATERIAL
Feature Models of the Exemplar Systems
The experiments presented in the paper build on four real-world exemplar systems and datasets.
As depicted in Table 9, the systems differ with respect to the size of the adaptation space (i.e., the number of feature combinations), the number of features and the depth of the feature model. The feature models of all four systems are provided on the following page.

Fig. 5: Convergence for different steps of the evolution scenarios (x-axis: evolution step; y-axis: relative number of feature combinations explored until convergence)

Andreas Metzger is senior academic councillor at the University of Duisburg-Essen and head of adaptive systems and big data applications at paluno, the Ruhr Institute for Software Technology. He is steering committee vice-chair of the European Technology Platform on Software, Services, Cloud and Data (NESSI) and deputy secretary general of the EU Big Data Value Association (BDVA). His background and research interests are software engineering for data-intensive and self-adaptive systems.
Clément Quinton is associate professor in the Spirals group at University of Lille. His research activities focus on highly-configurable and distributed software systems operating in uncertain and evolving environments with the aim to devise formal models, methods and tools to cope with run-time evolution and adaptation of such systems.
Zoltán Ádám Mann is senior researcher at paluno, the Ruhr Institute for Software Technology of the University of Duisburg-Essen. He received his PhD in Computer Science from Budapest University of Technology and Economics, Hungary. His research interests include software engineering for self-adaptive systems and adaptive resource management in cloud computing.
Luciano Baresi is full professor at the Politecnico di Milano. Luciano was visiting professor at the University of Oregon (USA) and visiting researcher at the University of Paderborn (Germany). His research interests are in the broad area of software engineering and include formal approaches for modeling and specification languages, distributed systems, service-based applications and mobile, self-adaptive, and pervasive software systems.
Klaus Pohl is full professor for software systems engineering at the University of Duisburg-Essen and director of paluno, the Ruhr Institute for Software Technology. From 2005 to 2007 he was a founding director of Lero, the Irish Software Engineering Research Centre. His research interests include requirements engineering, software product line engineering and engineering of self-adaptive systems.
Figure 7 provides a key to the symbols used in the feature models.

Fig. 7: Key to symbols for feature models

TABLE 3: Example of adaptation space exploration via the feature degree strategy (Deg).

TABLE 4: Example of adaptation space exploration via an evolution-aware strategy.

TABLE 5: Systems and datasets used for the experiments (ordered by size of adaptation space).

TABLE 6: Average (and relative) number of feature combinations explored until convergence and reduction compared to baseline.

TABLE 7: Cumulative number of feature combinations explored during system evolution and reduction (in %) compared with baseline.

TABLE 8: Savings in energy consumption and number of migrations, achieved through evolution-aware learning for CloudRM.

TABLE 9: Systems and datasets used for the experiments.