Mind the Gap! Automated Anomaly Detection for Potentially Unbounded Cardinality-Based Feature Models
Abstract
Feature models are frequently used for specifying variability of user-configurable software systems, e.g., software product lines. Numerous approaches have been developed for automating feature model validation concerning constraint consistency and absence of anomalies. As a crucial extension to feature models, cardinality annotations and respective constraints allow for multiple, and even potentially unbounded, occurrences of feature instances within configurations. This is of particular relevance for user-adjustable application resources as prevalent, e.g., in cloud computing. However, a precise semantic characterization and tool support for automated and scalable validation of cardinality-based feature models are still open issues. In this paper, we present a comprehensive formalization of cardinality-based feature models with potentially unbounded feature multiplicities. We apply a combination of ILP and SMT solvers to automate consistency checking and anomaly detection, including novel anomalies, e.g., interval gaps. We present evaluation results gained from our tool implementation showing applicability and scalability to larger-scale models.
Keywords
Software product lines, Cloud-based systems, Cardinality-based feature models, Integer Linear Programming (ILP)
1 Introduction
Feature models have become increasingly established for specifying variability of highly-configurable software, e.g., software product lines [11]. Feature models are used during domain engineering to tailor configuration spaces of product lines in terms of available configuration parameters (features) and respective constraints, restricting their combinations within valid configurations. Each feature constitutes a user-visible (Boolean) configuration option from the problem domain, being mapped onto variable implementation artifacts within the solution space. This way, customer-tailored products are derivable from a common code base during application engineering. The FODA feature diagram notation is a frequently used graphical representation for feature models [6, 22]. FODA feature diagrams organize features as nodes in a tree-like layout to denote a parent-child hierarchy. This feature tree is enriched with constructs to describe logical dependencies among features. Semantically, a feature model specifies a set of valid product configurations, i.e., those feature combinations satisfying all constraints. Recent approaches to formalizing feature model semantics either use algebraic representations [19, 34], or transformations into equivalent constraint problems, e.g., propositional formulas (SAT) [5, 25], and CSP [7]. The latter approach allows for applying off-the-shelf constraint solvers for automatically validating desirable semantic properties of feature models such as constraint consistency and absence of anomalies, e.g., dead features [6].
However, the FODA feature diagram notation is, in many cases, not expressive enough for capturing all user-configurable properties of real-world applications. In particular, two major extensions to feature models have been proposed, usually summarized under the term extended feature models (EFM), namely (1) non-Boolean feature attributes and respective constraints to denote extra-functional properties of features, and (2) UML-like feature multiplicities [32] in terms of cardinality annotations and respective constraints to allow selections of multiple feature instances (also referred to as copies), including (recursive) clones of their corresponding subtrees [14]. Semantically, both concepts impose extensions to the notion of product configurations by means of (1) feature types beyond Boolean, and (2) multisets of selected feature instances. Both extensions complicate feature model semantics; thus, automated consistency checking and anomaly detection become even more important for their applicability in practice. Concerning (1), various promising approaches have been proposed for analyzing non-Boolean configuration constraints [7, 9, 20, 23]. In contrast, concerning (2), only preliminary attempts exist so far [12, 14, 26, 29, 30], although cardinality-based variability modeling is emerging in today's applications and, therefore, has recently found its way into novel modeling approaches like CVL [16] and Clafer [3]. As a prominent example, for cloud-based systems, not only the type, but also the amount of available resources is explicitly configurable by the user [28], especially including (virtually) unrestricted resources [35]. The resulting compound cardinality intervals lead to novel kinds of anomalies by means of dead cardinality, cardinality interval gaps and false unbounded cardinality.
In this paper, we present a comprehensive formalization and automated validation technique for cardinality-based feature models (CFM). We support cardinality annotations including compound cardinality intervals and unbounded cardinality for singleton features, feature groups, as well as cross-tree constraints. Our approach is motivated by a real-world cloud-based application [31]. We further introduce a normal form for cardinality constraints and enhance established notions of feature model consistency and anomaly to explicitly take feature cardinality constraints into account. Our tool implementation, presented in full detail in an accompanying tool paper [33], combines ILP solvers for interval-bound analysis and SMT solvers for interval-gap analysis to automate validation of cardinality-based feature models. We provide evaluation results from experiments investigating applicability and scalability of our validation approach for input models of varying sizes and complexity.
2 Cardinality-Based Feature Models
2.1 Background
All components of an AR game are highly configurable, including dynamic reconfigurations for runtime adaptation. Configuration decisions not only comprise presence or absence of functionality, but also the available amount of particular resources. Thus, CFM provide a suitable formalism to capture all relevant configuration choices and respective constraints of AR games. Figure 1 shows the CFM for configuring the Dissemination Strategy, the communication Interface and Channel properties of a (potentially unbounded) number of Nodes forming a FanOut Group. Similar to FODA notation [22], configuration parameters (features) reside in a tree-like diagram denoting a feature decomposition hierarchy. As a crucial extension, CFM differentiate between selectable/deselectable feature types as usual and, additionally, for each selected feature type, the multiplicity of occurrences of feature instances together with copies of their corresponding subtrees within configurations [14]. Restrictions on selections of both feature types and instances are specified by cardinality intervals (l, u), where l denotes the lower bound and u denotes the upper bound for the number of feature types or instances [32]. In particular, the CFM language considered in this paper provides the following constructs.

Feature instance cardinality, annotated as \(\langle l,u\rangle \) on the leftmost position on top of each feature rectangle, restricts the minimum and maximum number of feature instances selectable from the subtree clone of respective parent feature instances. In our example, \(\langle 1,1\rangle \) denotes that exactly one Dissemination Strategy is selectable, whereas \(\langle 1,\texttt {*}\rangle \) denotes that arbitrarily many, but at least one, Node must be part of a FanOut Group.

Feature group type cardinality, annotated as [l, u], restricts the minimum and maximum number of types of feature instances selectable from the set of all immediate subfeatures of a selected feature instance. In our example, [1, 1] denotes that either instances of WiFi, or of BT must be selected for the Interface, whereas [2, 3] denotes that at least two types of Channels from the given three options must be instantiated in a FanOut Group.

Feature group instance cardinality, annotated as \(\langle l,u\rangle \) at the right-hand side of each group arc, restricts the minimum and maximum number of feature instances of any type selectable from the set of all immediate subfeature types. In our example, \(\langle 3,\texttt {*}\rangle \) denotes that arbitrarily many, but at least three, Channel instances are required for each Node.

Cross-tree edges, i.e., require- and exclude-edges annotated with \(\langle l,u\rangle \) constraints at both the source and target feature rectangles [30], define constraints on the number of instances of arbitrary pairs of features. In our example, if at least one instance of Reliable is selected in a subtree clone, then no instance of Probabilistic Broadcast is allowed in the FanOut Group and vice versa. In addition, if between 1 and 5 Nodes are selected in a FanOut Group, then BT is used for all Nodes, and WiFi otherwise.
Combining different cardinality annotations in one CFM may lead to complicated dependencies among feature types and their possible number of instances. In order to provide a precise characterization of CFM configuration semantics, we provide a CFM formalization in the following. We first define the abstract syntax of CFM. To this end, we introduce an interval language to express cardinality intervals (l, u) as pairs of lower and upper cardinality bounds, both given by natural numbers, or, in case of upper bounds, also by the special symbol \(\texttt {*}\) denoting unbounded cardinality. By convention, \(k<\texttt {*}\) holds for any \(k\in \mathbb {N}_{0}\). Compound cardinality intervals are defined as the union of multiple (non-overlapping) intervals \((l_1, u_1), (l_2, u_2),\ldots ,(l_n, u_n)\).
Definition 1
(Cardinality Interval). The set of cardinality intervals is defined as \(\mathcal {I}\subset \mathbb {N}_{0}\times (\mathbb {N}_{0}\cup \{\texttt {*}\})\), where \((l,u)\in \mathcal {I}\) iff \(l\le u\) holds. The set \(\mathcal {L}\subset _{\textit{fin}} 2^{\mathcal {I}}\) of compound cardinality intervals contains all finite subsets \(L\in \mathcal {L}\) of \(\mathcal {I}\) such that for all pairs \((l_i,u_i)\in L\), \((l_j,u_j)\in L\), \(i\ne j\), either \(l_i > u_j\), or \(u_i < l_j\) holds.
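Definition 1 can be read as two simple checks, sketched below in Python. This is a minimal illustration rather than part of the formalization: the function names are hypothetical, and `None` is assumed as the encoding of the unbounded symbol \(\texttt {*}\).

```python
from typing import List, Optional, Tuple

# An interval is (lower, upper); an upper bound of None stands for '*' (assumed encoding).
Interval = Tuple[int, Optional[int]]

def is_interval(iv: Interval) -> bool:
    """(l, u) is a cardinality interval iff l is a natural number and l <= u (k < * for all k)."""
    l, u = iv
    return l >= 0 and (u is None or l <= u)

def is_compound(intervals: List[Interval]) -> bool:
    """A finite collection of intervals is compound iff its members are valid and pairwise disjoint."""
    if not all(is_interval(iv) for iv in intervals):
        return False
    for i, (li, ui) in enumerate(intervals):
        for lj, uj in intervals[i + 1:]:
            # Disjoint iff l_i > u_j or u_i < l_j; nothing lies above an unbounded upper bound.
            i_after_j = uj is not None and li > uj
            i_before_j = ui is not None and ui < lj
            if not (i_after_j or i_before_j):
                return False
    return True
```

For instance, `[(1, 2), (4, None)]` passes both checks, whereas `[(0, None), (5, 6)]` fails because the unbounded interval overlaps the second one.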
We further require compound intervals \(L\in \mathcal {L}\) to be defined as concisely as possible, e.g., \(\{(1,4)\}\) instead of \(\{(1,2),(3,4)\}\). Intervals \(L\in \mathcal {L}\) are used for all kinds of cardinality annotations in a CFM as described above. A CFM consists of a finite set F of features together with a hierarchy relation \(\prec _F\) defining the tree hierarchy on F such that \(f\prec _{F}f'\) denotes f to be the parent feature of \(f'\). In addition, a feature instance cardinality interval \(\lambda _I^F(f)\in \mathcal {L}\) is assigned to every feature \(f\in F\) by a function \(\lambda _I^F\), as well as a group type cardinality interval \(\lambda _T^G(f)\in \mathcal {L}\) by a function \(\lambda _T^G\), and a group instance cardinality interval \(\lambda _I^G(f)\in \mathcal {L}\) by a function \(\lambda _I^G\). Both \(\lambda _T^G(f)\) and \(\lambda _I^G(f)\) define cardinality intervals on the set of direct subfeatures of feature f with respect to \(\prec _F\), hence we do not allow multiple direct subgroups below one feature node. Furthermore, we require for every non-leaf feature \(f\in F\) that \(\lambda _I^F(f)\), as well as \(\lambda _T^G(f)\) and \(\lambda _I^G(f)\), are properly defined, even if f only contains a singleton subfeature \(f'\), e.g., by assuming default group cardinality constraints \(\lambda _T^G(f)=(0,1)\) and \(\lambda _I^G(f)=(0,\texttt {*})\). Cross-tree edges consist of four components, i.e., the source feature and the target feature and corresponding cardinality annotations restricting the number of feature instances. Due to the binary nature of cross-tree edges, cardinality intervals referring to feature types are meaningless and, therefore, not supported.
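Definition 2
(Cardinality-Based Feature Model). A CFM is a tuple \((F, \prec _F, \lambda _I^F, \lambda _T^G, \lambda _I^G, \varPhi _R, \varPhi _X)\), where F is a finite set of features, and
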
\(\prec _F \subseteq F \times F\) is a feature decomposition relation,

\(\lambda _I^F: F \rightarrow \mathcal {L}\) is a feature instance cardinality function,

\(\lambda _T^G: F \rightarrow \mathcal {L}\) is a feature group type cardinality function,

\(\lambda _I^G: F \rightarrow \mathcal {L}\) is a feature group instance cardinality function,

\(\varPhi _R \subseteq F \times \mathcal {L} \times \mathcal {L} \times F\) is a feature instance require-edge cardinality relation,

\(\varPhi _X \subseteq F \times \mathcal {L} \times \mathcal {L} \times F\) is a feature instance exclude-edge cardinality relation.
For a CFM to be syntactically well-formed, it must satisfy further properties.

\(\prec _F\) forms a finite rooted tree on F, i.e., \(\prec _F^+\) is a strict partial order on F with root feature \(f_r \in F\) as unique minimal element, and for each \(f \in F\), \(f\ne f_r\), there is exactly one direct predecessor node \(f' \in F\) with \(f' \prec _F f\).

Root feature \(f_r\) is a mandatory singleinstance feature, i.e., \(\lambda _I^F(f_r) = (1,1)\).

Leaf nodes have empty group cardinality intervals, i.e., for each \(f\in F\) with \(\nexists f'\in F: f\prec _F f'\), \(\lambda _I^G(f)=\lambda _T^G(f)=(0,0)\) holds.
Further well-formedness criteria may be imposed, e.g., forbidding \(\texttt {*}\) as upper bound for feature group type cardinality. However, these and far more complicated cases are comprehensively treated by the normal form in Definition 6.
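The abstract syntax and the three well-formedness properties listed above can be sketched as a small Python data structure. This is an illustrative sketch under stated assumptions: the class and field names are hypothetical, the tree is stored as a child-to-parent map, `None` encodes \(\texttt {*}\), and the two-feature model in the usage note is a made-up reduction, not the model of Fig. 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, Optional[int]]      # upper bound None stands for '*'
Compound = List[Interval]
Edge = Tuple[str, Compound, Compound, str]

@dataclass
class CFM:
    features: List[str]
    parent: Dict[str, str]                # child -> parent, encodes the decomposition relation
    inst: Dict[str, Compound]             # feature instance cardinality
    group_type: Dict[str, Compound]       # feature group type cardinality
    group_inst: Dict[str, Compound]       # feature group instance cardinality
    requires: List[Edge] = field(default_factory=list)
    excludes: List[Edge] = field(default_factory=list)

def is_well_formed(m: CFM) -> bool:
    # Rooted tree: exactly one feature without a parent, all parents are known features.
    roots = [f for f in m.features if f not in m.parent]
    if len(roots) != 1 or any(p not in m.features for p in m.parent.values()):
        return False
    root = roots[0]
    for f in m.features:                  # every feature must reach the root without a cycle
        seen, cur = set(), f
        while cur != root:
            if cur in seen or cur not in m.parent:
                return False
            seen.add(cur)
            cur = m.parent[cur]
    if m.inst[root] != [(1, 1)]:          # the root is a mandatory single-instance feature
        return False
    inner = set(m.parent.values())        # features that have at least one child
    return all(m.group_type[f] == [(0, 0)] and m.group_inst[f] == [(0, 0)]
               for f in m.features if f not in inner)
```

As a usage example, a model with root `Group` and a single child `Node` annotated with `[(1, None)]` is accepted as long as the leaf `Node` carries `[(0, 0)]` for both group cardinalities.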
Here, we apply the global interpretation, constituting, in our opinion, the more intuitive and graspable CFM semantics. CFM configuration semantics characterizes those valid feature subtree copies with corresponding parent-child feature instance dependencies satisfying all cardinality constraints. Our CFM semantics is based on multisets M over set F to denote the number of feature instances selected in a configuration. A multiset \(M:F\rightarrow \mathbb {N}_{0}\) over set F defines a mapping from each element \(f\in F\) onto a natural number \(k=M(f)\), defining the multiplicity of f, where \(k=0\) denotes absence of f in M. We write \(f_i^k\in M\), \(1\le k\le M(f_i)\), for short to refer to the k-th instance of feature \(f_i \in F\) within multiset M with \(M(f_i)>0\). Furthermore, given a compound interval \(L = \{(l_1,u_1),(l_2,u_2),\ldots ,(l_n,u_n)\}\in \mathcal {L}\) and \(k\in \mathbb {N}_{0}\), we write \(k\sqsubseteq L\) if there exists \((l_i,u_i)\in L\) such that \(l_i\le k\le u_i\) holds. We further denote a relation \(\prec ^{M}_{F}\subseteq M \times M\) on multiset M, relating child feature instances to parent feature instances.
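The multiset notation and the membership relation \(k\sqsubseteq L\) translate almost directly into code. The following Python sketch is only an illustration: the name `member`, the use of `collections.Counter` for M, and `None` as the encoding of \(\texttt {*}\) are assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple

Interval = Tuple[int, Optional[int]]   # (l, u); None encodes the unbounded symbol '*'

def member(k: int, compound: List[Interval]) -> bool:
    """k ⊑ L: some (l, u) in L satisfies l <= k <= u, where k < * for every k."""
    return any(l <= k and (u is None or k <= u) for l, u in compound)

# A multiset M over features maps each feature to its multiplicity;
# features that do not occur are reported with multiplicity 0.
M = Counter({"Node": 3, "WiFi": 3})
```

With this encoding, `member(M["Node"], [(1, None)])` holds, while `member(M["BT"], [(1, None)])` does not, since `BT` has multiplicity 0 in `M`.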
Definition 3

\(M(f_r) = 1\),

if \(f^{k}_i \prec ^{M}_{F} f^{l}_{j}\) then \(f_i \prec _{F} f_{j}\) and \({(\prec ^{M}_{F})}^{+}\) forms a rooted tree on M,

if \(f_i^k \in M\), then for each \(f_j \in F\) with \(f_{i}\prec _F f_{j}\) it holds that \(|\{f_j^l \in M \mid f_i^k \prec ^{M}_F f_j^l\}|\sqsubseteq \lambda _I^F(f_j)\),

if \(f_i^k \in M\), then it holds that \(|\{f_j^l \in M \mid f_i^k \prec ^{M}_F f_j^l\}|\sqsubseteq \lambda _I^G(f_i)\),

if \(f_i^k \in M\), then it holds that \(|\{f_j \in F \mid \exists f_j^l\in M :f_i^k \prec ^{M}_F f_j^l\}|\sqsubseteq \lambda _T^G(f_i)\),

if \((f_i,L_i,L_j,f_j)\in \varPhi _R\) and \(M(f_i)\sqsubseteq L_{i}\) then \(M(f_j)\sqsubseteq L_{j}\), and

if \((f_i,L_i,L_j,f_j)\in \varPhi _X\) and \(M(f_i)\sqsubseteq L_{i}\) then \(M(f_j)\not \sqsubseteq L_{j}\) and vice versa.
By \(\llbracket \textit{CFM}\,\rrbracket \), we refer to the set of all valid configurations of \(\textit{CFM}\).
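The seven conditions of Definition 3 can be checked mechanically for a given configuration. The Python sketch below is one possible reading under stated assumptions: a configuration is represented as a tree of `Inst` objects (so the tree shape required by the second condition holds by construction), the model is a plain dictionary with hypothetical keys, `None` encodes \(\texttt {*}\), and the cross-tree conditions are evaluated on the global multiplicities \(M(f)\).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

@dataclass
class Inst:
    """One feature instance together with its child instances."""
    feature: str
    children: List["Inst"] = field(default_factory=list)

def member(k, compound):
    """k ⊑ L; an upper bound of None encodes '*'."""
    return any(l <= k and (u is None or k <= u) for l, u in compound)

def multiset(root):
    """Collect M: the number of instances per feature in the whole configuration."""
    counts, stack = Counter(), [root]
    while stack:
        node = stack.pop()
        counts[node.feature] += 1
        stack.extend(node.children)
    return counts

def is_valid(root, cfm):
    m = multiset(root)
    if root.feature != cfm["root"] or m[cfm["root"]] != 1:            # condition 1
        return False
    stack = [root]
    while stack:
        node = stack.pop()
        subs = cfm["children"].get(node.feature, [])
        per_type = Counter(c.feature for c in node.children)
        if any(f not in subs for f in per_type):                       # condition 2
            return False
        if any(not member(per_type[f], cfm["inst"][f]) for f in subs):  # condition 3
            return False
        if not member(len(node.children), cfm["group_inst"][node.feature]):  # condition 4
            return False
        if not member(len(per_type), cfm["group_type"][node.feature]):       # condition 5
            return False
        stack.extend(node.children)
    for fi, li, lj, fj in cfm["requires"]:                             # condition 6
        if member(m[fi], li) and not member(m[fj], lj):
            return False
    for fi, li, lj, fj in cfm["excludes"]:                             # condition 7 (symmetric)
        if member(m[fi], li) and member(m[fj], lj):
            return False
    return True

# Hypothetical two-feature model: a root "Group" holding at least one "Node".
toy = {
    "root": "Group",
    "children": {"Group": ["Node"]},
    "inst": {"Group": [(1, 1)], "Node": [(1, None)]},
    "group_type": {"Group": [(1, 1)], "Node": [(0, 0)]},
    "group_inst": {"Group": [(1, None)], "Node": [(0, 0)]},
    "requires": [],
    "excludes": [],
}
```

For the toy model, `is_valid(Inst("Group", [Inst("Node"), Inst("Node")]), toy)` holds, whereas a `Group` without any `Node` violates the instance cardinality of `Node`.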
2.2 Analysis of Cardinality-Based Feature Models
We are now able to characterize fundamental validity properties of CFM. In particular, we define consistency of CFM in terms of the absence of inconsistent cardinality constraints. By including \(\texttt {*}\) as cardinality bound, CFM allow selecting an a priori unbounded number of feature instances and, therefore, a potentially infinite number of configurations.
Definition 4
(Consistent and Bounded CFM). A CFM is consistent iff it holds that \(\llbracket \textit{CFM}\,\rrbracket \ne \emptyset \). A CFM is bounded iff \(\texttt {*}\) does not occur in a cardinality annotation. A CFM is false unbounded iff \(\texttt {*}\) occurs in at least one cardinality annotation and \(|\llbracket \textit{CFM}\,\rrbracket |<\infty \) holds, and CFM is unbounded otherwise.
False unboundedness is one example of an undesirable CFM property going beyond syntactic well-formedness criteria. To generalize, we recall the notion of anomaly to summarize undesirable semantic CFM properties. For FODA feature models, several types of anomalies and accompanying validation techniques have been proposed, e.g., dead features and false optional features [6]. First proposals exist to lift the anomaly notion also to CFM, e.g., dead cardinality anomaly [30].
Definition 5
(Dead Feature Instance Cardinality). \(k\sqsubseteq \lambda _I^F(f_{i})\) is a dead feature instance cardinality of \(f_{i}\in F\), if no \((M,\prec _{F}^{M})\in \llbracket \textit{CFM}\,\rrbracket \) with \(f_j^{m}\in M\) and \(f_j \prec _{F} f_i\) exists such that \(|\{f_i^{l}\in M \mid f_j^{m}\prec _{F}^{M}f_{i}^{l}\}|=k\) holds.
For other kinds of cardinality intervals of a CFM, the notion of dead cardinality can be defined accordingly. Hence, for a feature f to be dead in a CFM, every cardinality \(k\sqsubseteq \lambda _I^F(f)\) must be dead, thus the actual feature instance cardinality interval of f is (0, 0), and a CFM is inconsistent if all features are dead.
The example in Fig. 2 exhibits several subtle cases of CFM anomalies. For example, the group instance cardinality \(\langle 1,\texttt {*}\rangle \) of \(f_0\) is false unbounded as the maximum number of possible child-feature instances is 11. The same holds for the interval \(\langle 1,\texttt {*}\rangle \) on the right-hand side of the exclude-edge between \(f_1\) and \(f_2\) whose upper bound is actually limited to 2. In contrast, feature \(f_5\) is truly unbounded, thus making the entire CFM unbounded. Besides (false) unbounded intervals, this CFM contains further anomalies concerning bounded cardinality intervals. The lower bound 1 of the group instance cardinality interval \(\langle 1,\texttt {*}\rangle \) of \(f_0\) is a dead cardinality, as at least one instance of both \(f_1\) and \(f_2\) must be selected. Thus, lower bound 1 of group type cardinality [1, 3] of \(f_0\) is also dead. In addition, the lower bound of the target feature node cardinality interval \(\langle 2,6\rangle \) of the require-edge from \(f_4\) to \(f_1\) is actually 6 instead of 2. Besides CFM anomalies affecting upper and/or lower bounds of cardinality intervals, a dead cardinality might also be located within intervals, thus imposing interval gaps. For example, the group instance cardinality of \(f_0\) contains a gap at (6, 6) as no valid combination of feature instances of \(f_1\), \(f_2\), and \(f_3\) with an overall number of 6 is possible. As an even more subtle case, feature instance cardinality interval \(\langle 1,7\rangle \) of \(f_1\) contains the interval gap (2, 5).
Definition 6

\(\llbracket \overline{\textit{CFM}}\,\rrbracket = \llbracket \textit{CFM}\,\rrbracket \),

\(\overline{F} = F\), \(\overline{\prec }_{F} = \prec _{F}\), \({\overline{\varPhi }}_{R}\subseteq \varPhi _{R}\), \({\overline{\varPhi }}_{X}\subseteq \varPhi _{X}\), and

for each \(f_{i},f_{j}\in \overline{F}\), \(\overline{\lambda }_I^F(f_{i})\), \(\overline{\lambda }_T^G(f_{i})\), \(\overline{\lambda }_I^G(f_{i})\), as well as \(L_{i}\) and \(L_{j}\) in each \((f_{i},L_{i},L_{j},f_{j})\in \overline{\varPhi }_{R}\) and \((f_{i},L_{i},L_{j},f_{j})\in \overline{\varPhi }_{X}\) are minimal with respect to \(\precsim \).
Applied to the CFM in Fig. 2, the resulting normal form is shown in Fig. 3(a). The following property is a direct consequence of Definitions 5 and 6.
Theorem 1
For any \(\textit{CFM}\) according to Definition 2, a normal form \(\overline{\textit{CFM}}\) exists and \(\overline{\textit{CFM}}\) contains no dead cardinality.
In contrast, a normal form is, in general, not unique as removals of (mutually depending) redundant cross-tree edges may yield ambiguous results. A procedure for computing normal forms would allow for automatically consolidating and validating CFM, e.g., during domain analysis. However, constraint solvers for SAT and CSP, usually used for validating FODA feature models, are not applicable for CFM validation due to the potentially unbounded search space.
3 Automated Anomaly Detection for CFM
We observe two potential causes for anomalies in CFM during normal form computation due to faulty declarations of cardinality intervals: (1) unsatisfiable lower/upper bounds (including false unbounded), and (2) unsatisfiable subranges (gaps). For (1), we encode CFM semantics in an ILP representation and use a respective ILP-solver for bound analysis, whereas for (2), we apply an SMT-solver to find interval gaps. To keep the presentation concise, we focus our considerations on input models \(\textit{CFM}\) with non-compound cardinality intervals \(L\in \mathcal {I}\).
Detection of Interval Gaps. The ILP-based approach for interval-bound analysis is not directly applicable for interval-gap analysis as gaps are, by definition, not located at minima/maxima of the search space. For example, for detecting the group instance cardinality interval gap at (6, 6) of \(f_0\) in Fig. 2, we have to check whether (6, 6) is a feasible value for the corresponding feature multiplicity variables. Hence, detecting interval gaps does not constitute an optimization problem, but rather a constraint satisfaction problem incorporating integer inequalities. To this end, an SMT-solver is applicable, being capable of interpreting first-order logic equipped with the theory of linear integer arithmetic, according to our ILP encoding of CFM semantics (cf. Fig. 3). For gap analysis, every subrange of all cardinality intervals of a CFM has to be investigated, where in case of unbounded intervals, analysis has to be performed up to \(\textit{M}\).
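The idea of gap detection as a sequence of feasibility queries can be illustrated on a tiny example. The Python sketch below deliberately replaces the SMT-solver by exhaustive enumeration up to a cap, so it only conveys the shape of the queries, not the actual encoding or its performance. The two-child model, its intervals, the exclude-edge, and all names are hypothetical.

```python
from itertools import product

def member(k, interval):
    """l <= k <= u for a single interval; an upper bound of None encodes '*'."""
    l, u = interval
    return l <= k and (u is None or k <= u)

# Hypothetical toy model: two child features with multiplicities x and y below one parent.
X_DECL, Y_DECL = (0, 5), (0, 2)          # declared feature instance cardinalities
EXCLUDE = ((1, 3), (0, 2))               # if 1 <= x <= 3, then y must not lie in [0, 2]

def satisfies(x, y):
    """All constraints of the toy model for one candidate assignment."""
    if not (member(x, X_DECL) and member(y, Y_DECL)):
        return False
    return not (member(x, EXCLUDE[0]) and member(y, EXCLUDE[1]))

def feasible_total(total, cap):
    """Stand-in for one solver query: is x + y == total satisfiable with x, y <= cap?"""
    return any(x + y == total and satisfies(x, y)
               for x, y in product(range(cap + 1), repeat=2))

def gaps(lower, upper, cap):
    """Infeasible values lying strictly between the smallest and largest feasible value."""
    ok = [k for k in range(lower, upper + 1) if feasible_total(k, cap)]
    if not ok:
        return []
    return [k for k in range(ok[0], ok[-1] + 1) if k not in ok]
```

In this toy model the exclude-edge rules out \(1\le x\le 3\) entirely, so a total of 3 child instances cannot be reached, and `gaps(1, 10, cap=10)` reports a gap at that value. A solver-based implementation would pose the same question as a satisfiability query instead of enumerating assignments.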
Normal Form Computation. We can now combine interval-bound analysis and interval-gap analysis to compute CFM normal forms. By \(\texttt {ILP(CFM,interval)}\) we denote ILP-solver calls to investigate a particular cardinality \(\texttt {interval}\) of \(\texttt {CFM}\). The call returns the actual lower and upper bound of that interval to potentially replace the declared intervals within the normal form. For lower bounds of cardinality intervals defined by \(\lambda _I^F\), \(\lambda _I^G\) and \(\lambda _T^G\), the result is either greater than, or equal to the declared lower bound. For upper bounds, the result is either lower than, or equal to the declared upper bound. In case of unbounded intervals, the call either returns a concrete value in case of false unboundedness, or reports unboundedness. In case of infeasible intervals, the call returns (0, 0). For interval-gap analysis, we write \(\texttt {SMT(CFM,interval,range)}\) for respective SMT-solver calls, where \(\mathtt {range}\) is a subrange of \(\mathtt {interval}\) to be investigated. For reducing the search space for gap detection, parameter \(\mathtt {range}\) can be obtained from ILP-based bound analysis. The SMT call reports invalid subranges within \(\mathtt {range}\) leading to compound intervals within the normal form. Finally, for cardinality intervals \(L_i\), \(L_j\) of cross-tree edges \((f_i, L_i, L_j, f_j) \in \varPhi _Y\), \(Y\in \{R,X\}\), bound and gap analysis is, in general, performed as described above. In contrast, infeasibility of source and/or target feature node intervals imposes incremental removals of the corresponding edges from \(\varPhi _Y\) during normal form computation.
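How bound analysis and gap analysis combine into one normalized interval can be sketched as follows. This Python illustration makes several assumptions that are not taken from the text: both solver calls are replaced by a single scan with a caller-supplied feasibility predicate, `None` encodes \(\texttt {*}\), `big_m` is a hypothetical cap for unbounded intervals, and feasibility at the cap is simply interpreted as true unboundedness.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, Optional[int]]     # upper bound None stands for '*'

def normalize_interval(declared: Interval,
                       is_feasible: Callable[[int], bool],
                       big_m: int) -> List[Interval]:
    """Tighten the bounds of one declared interval and split it at gaps."""
    lower, upper = declared
    hi = big_m if upper is None else upper
    feasible = [k for k in range(lower, hi + 1) if is_feasible(k)]
    if not feasible:
        return [(0, 0)]                  # infeasible interval, reported as (0, 0)
    runs: List[Interval] = []
    start = prev = feasible[0]
    for k in feasible[1:]:
        if k != prev + 1:                # a gap ends the current run of feasible values
            runs.append((start, prev))
            start = k
        prev = k
    if upper is None and prev == big_m:
        runs.append((start, None))       # assumed: still feasible at the cap means unbounded
    else:
        runs.append((start, prev))       # concrete upper bound, e.g. false unboundedness
    return runs
```

As a usage example, a declared interval `(1, None)` whose feasible values are 1, 2 and 4 to 7 normalizes to `[(1, 2), (4, 7)]`: the unbounded symbol is replaced by a concrete bound and the gap yields a compound interval. A declared `(2, 6)` of which only 6 is feasible tightens to `[(6, 6)]`.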
4 Experimental Evaluation
We implemented CFM bound analysis and gap detection in a tool providing textual syntax for specifying input CFM models [33]. Here, we present evaluation results gained from several experiments performed with our tool. We address the following research questions.

(RQ1) Is CFM normal form computation applicable to real-world input models?

(RQ2) How do the size and complexity of CFM affect scalability of CFM analysis?

(RQ3) How does the ILP-based feasibility check perform on FODA feature models compared to a SAT-based satisfiability check?
For (RQ1), we computed the normal form for the AR game CFM, which includes bound analysis for 27 intervals, thus requiring 54 ILP-solver calls. The CPLEX ILP-solver took about 10 ms per call. Gap analysis included 27 intervals, which took about 15.71 s per call. The resulting normal form exposed a false unbounded group instance interval anomaly for the Channels group, thus the unbounded interval symbol \(\texttt {*}\) is replaced by 11.
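Taking the reported figures at face value, and assuming (this is not stated explicitly) that solver calls run sequentially and that gap analysis issues one SMT call per interval, the totals for the AR game model work out roughly as follows:

\[ 27 \text{ intervals} \times 2 \text{ bounds} = 54 \text{ ILP calls}, \qquad 54 \times 10\,\text{ms} \approx 0.54\,\text{s}, \qquad 27 \times 15.71\,\text{s} \approx 424\,\text{s} \approx 7.1\,\text{min}. \]

Under these assumptions, gap analysis exceeds bound analysis by a factor of several hundred.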
Concerning (RQ2), we performed regression analysis to estimate influences of model characteristics on CFM analysis performance metrics. To identify significant coefficients, we applied multiple linear regression analysis on input data sets by randomly varying all generation criteria. We applied t-tests to check significance of regression coefficients. With significance level \(p<0.05\), we identified (a) number of features, (b) cross-tree constraint ratio (CTCR), (c) ratio of unbounded cardinality intervals, as well as (d) CFM feasibility as coefficients with potentially high influences on runtime of ILP-based bound analysis. In contrast, the influence of average number of feature instances is not significant. Figure 4 contains the results of one bound analysis run for individual variation of coefficients (a)–(d). The plots show that runtime of ILP-based bound analysis is dominated by (a) and (b), as the size of the feature tree and the number of cross-tree edges directly affect the number of decision variables and constraints. The results show that ILP-based analysis of one particular bound for CFM with 5,000 features takes about 50 ms and thus about 21 min for complete bound analysis. This can be considered industrial strength. In contrast, for SMT-based gap analysis, we were only able to obtain runtime analysis results for small-sized (and mostly bounded) CFM up to at most 200 features. As expected, runtime of SMT-based gap analysis tends to show exponential growth with increasing average size of cardinality intervals. For (RQ3), we conducted multiple linear regression to estimate influences of FODA feature model characteristics, i.e., with CFM restricted to cardinality intervals between 0 and 1, for comparing runtime of satisfiability checks using SAT- and ILP-solvers. We identified coefficients number of features, CTCR and CFM feasibility as highly significant (\(p<0.01\)). For CPLEX, the maximum branching factor has no significant influence. As shown in Fig. 4, the SAT-solver exhibits lower runtime metrics with increasing model size compared to ILP. Nevertheless, ILP-solvers perform remarkably well, with differences in runtime metrics by means of a constant factor only up to models with 5,000 features (Fig. 5).
Threats to Validity. Threats to validity may arise from our experimental input data selection. Concerning (RQ1), the cloud-based AR game is part of a major research project and has already been used for experimental evaluation [31]. Similarly, our design choices for CFM syntax and semantics are derived from requirements of cloud-domain experts. Concerning synthetic data for (RQ2) and (RQ3), we employed the well-established BeTTy tool for generating FODA-like feature trees, additionally augmented with cardinality intervals. The cardinality interval test data is dimensioned according to characteristics of our case study in order to obtain realistic models. To the best of our knowledge, there exists neither a fully-fledged CFM generator, nor related approaches for comprehensive CFM analysis as in our approach. Hence, neither a qualitative nor a quantitative comparison to other existing approaches has been possible so far.
5 Related Work
Formalization of Cardinality-Based Feature Models. Riebisch et al. first propose to extend FODA notation with UML-like multiplicities by means of feature group cardinality [32]. Czarnecki et al. extend feature models with group and feature cardinality, but forbid combinations of both [13]. Thereupon, Czarnecki et al. define CFM semantics based on subtree clones and propose their translation into a context-free grammar [14]. They also permit unbounded cardinality but do not investigate its semantic impact. Quinton et al. introduce source and target cardinality for require-edges [30]. However, their approach considers neither exclude-edges, nor combinations of feature instance and group cardinality. Quinton et al. also mention unbounded cardinality, but neither address it in CFM semantics, nor as part of CFM analysis. Michel et al. investigate semantic ambiguities due to combinations of feature and group cardinality and distinguish local clone-based from global feature-based interpretation of group type cardinality intervals, being similar to our notion of group instance and group type cardinality intervals [26]. However, they only consider global feature-based interpretation being similar to our notion of group type cardinality intervals. Cordy et al. allow combinations of feature and group cardinality, but for the latter only consider group type cardinality intervals [12]. Again, neither Michel et al., nor Cordy et al. handle unboundedness semantically and during CFM analysis.
Automated Analysis of Cardinality-Based Feature Models. Quinton et al. define inconsistent CFM similar to our notion of dead cardinality anomaly and perform inconsistency detection using CSP [28, 29, 30]. Cordy et al. in [12] and Zhang et al. in [38] present BDD-based CFM consistency analysis. However, none of these approaches is able to handle unbounded configuration spaces and/or interval gaps, nor do they provide a normal form for CFM.
Analyzing Models with Unbounded Cardinality. Other modeling languages also employ the concept of cardinality to restrict instance multiplicities of model entities. CVL [16] provides iterators to mimic cardinality in feature diagrams including unbounded intervals, and the specification language Clafer combines concepts from UML and feature modeling including group and feature instance cardinality [2]. However, no systematic analysis of unbounded cardinality is provided yet. In addition, several approaches have been proposed for analyzing multiplicities in UML class diagrams using Alloy [1], CSP [10], and ILP [15], but none of them explicitly handles unboundedness. Balaban et al. present a graph-based algorithm for tightening multiplicities in UML class diagrams [4]. However, the approach essentially differs from CFM normal form computation as no (recursively) cloned subtree hierarchy, cross-tree edges and multiple cardinality constraints per entity occur in class diagrams. Amongst others, Boufares et al. consider inconsistency in cardinality constraints of database schema definitions including unbounded cardinality, but do not take interval gaps into account [8].
6 Conclusion
We presented a comprehensive formalization of CFM configuration semantics including unbounded cardinality intervals. We further presented evaluation results gained from experiments conducted with our tool implementation for computing normal forms of CFM. The results show the general applicability and scalability of ILP-based bound analysis. For scalable gap analysis, we aim at replacing the SMT-solver also by an ILP-solver in our future work. We also plan to conduct further experiments including real-world case studies and alternative CFM semantics [26]. For integrating CFM into a fully-fledged engineering process with accompanying tool support, we plan to develop a methodology for mapping feature instances to solution space artifacts as, e.g., propagated by CVL [16].
Notes
Acknowledgment
This work was partially supported by the DFG (German Research Foundation) as part of projects B01 and C02 within CRC 1053 – MAKI and under SPP 1593: Design For Future – Managed Software Evolution.
References
 1. Anastasakis, K., Bordbar, B., Georg, G., Ray, I.: On challenges of model transformation from UML to Alloy. Softw. Syst. Model. 9(1), 69–86 (2010)
 2. Bąk, K., Czarnecki, K., Wąsowski, A.: Feature and meta-models in Clafer: mixed, specialized, and coupled. In: Malloy, B., Staab, S., Brand, M. (eds.) SLE 2010. LNCS, vol. 6563, pp. 102–122. Springer, Heidelberg (2011)
 3. Bak, K., Diskin, Z., Antkiewicz, M., Czarnecki, K., Wasowski, A.: Clafer: unifying class and feature modeling. Softw. Syst. Model. 1–35 (2014)
 4. Balaban, M., Maraee, A.: Simplification and correctness of UML class diagrams – focusing on multiplicity and aggregation/composition constraints. In: Moreira, A., Schätz, B., Gray, J., Vallecillo, A., Clarke, P. (eds.) MODELS 2013. LNCS, vol. 8107, pp. 454–470. Springer, Heidelberg (2013)
 5. Batory, D.: Feature models, grammars, and propositional formulas. In: Obbink, H., Pohl, K. (eds.) SPLC 2005. LNCS, vol. 3714, pp. 7–20. Springer, Heidelberg (2005)
 6. Benavides, D., Segura, S., Ruiz-Cortés, A.: Automated analysis of feature models 20 years later: a literature review. Inf. Syst. 35(6), 615–636 (2010)
 7. Benavides, D., Trinidad, P., Ruiz-Cortés, A.: Automated reasoning on feature models. In: Pastor, Ó., Falcão e Cunha, J. (eds.) CAiSE 2005. LNCS, vol. 3520, pp. 491–503. Springer, Heidelberg (2005)
 8. Boufares, F., Bennaceur, H.: Consistency problems in ER-schemas for database systems. Inf. Technol. 163(4), 263–274 (2004)
 9. Bürdek, J., Lity, S., Lochau, M., Berens, M., Goltz, U., Schürr, A.: Staged configuration of dynamic software product lines with complex binding time constraints. In: VaMoS 2014, pp. 16:1–16:8 (2014)
 10. Cadoli, M., Calvanese, D., De Giacomo, G., Mancini, T.: Finite model reasoning on UML class diagrams via constraint programming. In: Basili, R., Pazienza, M.T. (eds.) AI*IA 2007. LNCS (LNAI), vol. 4733, pp. 36–47. Springer, Heidelberg (2007)
 11. Clements, P., Northrop, L.: Software Product Lines: Practices and Patterns. Addison-Wesley Longman Publishing Co., Inc., Boston (2001)
 12. Cordy, M., Schobbens, P.-Y., Heymans, P., Legay, A.: Beyond boolean product-line model checking: dealing with feature attributes and multi-features. In: ICSE 2013, pp. 472–481 (2013)
 13. Czarnecki, K., Helsen, S.: Staged configuration using feature models. In: Nord, R.L. (ed.) SPLC 2004. LNCS, vol. 3154, pp. 266–283. Springer, Heidelberg (2004)
 14. Czarnecki, K., Helsen, S., Eisenecker, U.W.: Formalizing cardinality-based feature models and their specialization. Softw. Process Improv. Pract. 10(1), 7–29 (2005)
 15. Falkner, A., Feinerer, I., Salzer, G., Schenner, G.: Computing product configurations via UML and integer linear programming. Int. J. Mass Customisation 3(4), 351–367 (2010)
 16. Fleurey, F., Haugen, Ø., Møller-Pedersen, B., Svendsen, A., Zhang, X.: Standardizing variability – challenges and solutions. In: Ober, I., Ober, I. (eds.) SDL 2011. LNCS, vol. 7083, pp. 233–246. Springer, Heidelberg (2011)
 17. GNU Linear Programming Kit, Version 4.55. http://www.gnu.org/software/glpk/glpk.html
 18. Gurobi Optimization, Inc.: Gurobi Optimizer Reference Manual (2015). http://www.gurobi.com
 19. Heymans, P., Schobbens, P.-Y., Trigaux, J.-C., Bontemps, Y., Matulevicius, R., Classen, A.: Evaluating formal properties of feature diagram languages. IET Softw. 2(3), 281–302 (2008)
 20. Hubaux, A., Heymans, P., Schobbens, P.-Y., Deridder, D.: Towards multi-view feature-based configuration. In: Wieringa, R., Persson, A. (eds.) REFSQ 2010. LNCS, vol. 6182, pp. 106–112. Springer, Heidelberg (2010)
 21. IBM ILOG CPLEX V12.6 User’s Manual for CPLEX. IBM Corp. (2015). http://www01.ibm.com/software/commerce/optimization/cplexoptimizer/
 22. Kang, K.C., Cohen, S.G., Hess, J.A., Novak, W.E., Peterson, S.A.: Feature oriented domain analysis (FODA). Technical report, CMU (1990)
 23. Karataş, A.S., Oğuztüzün, H., Doğru, A.: Mapping extended feature models to constraint logic programming over finite domains. In: Bosch, J., Lee, J. (eds.) SPLC 2010. LNCS, vol. 6287, pp. 286–299. Springer, Heidelberg (2010)
 24. Le Berre, D., Parrain, A.: The Sat4j Library, Release 2.2. J. Satisfiability Boolean Model. Comput. 7, 59–64 (2010)
 25. Mendonça, M., Wasowski, A., Czarnecki, K.: SAT-based analysis of feature models is easy. In: 13th SPLC, pp. 231–240 (2009)
 26. Michel, R., Classen, A., Hubaux, A., Boucher, Q.: A formal semantics for feature cardinalities in feature diagrams. In: VaMoS 2011, pp. 82–89 (2011)
 27. de Moura, L., Bjørner, N.S.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008)
 28. Quinton, C., Romero, D., Duchien, L.: Automated selection and configuration of cloud environments using software product lines principles. In: IEEE Cloud 2014, pp. 144–151 (2014)
 29. Quinton, C., Pleuss, A., Berre, D.L., Duchien, L., Botterweck, G.: Consistency checking for the evolution of cardinality-based feature models. In: SPLC 2014, pp. 122–131 (2014)
 30. Quinton, C., Romero, D., Duchien, L.: Cardinality-based feature models with constraints: a pragmatic approach. In: SPLC 2013, pp. 162–166 (2013)
 31. Richerzhagen, B., Stingl, D., Hans, R., Groß, C., Steinmetz, R.: Bypassing the cloud: peer-assisted event dissemination for augmented reality games. In: P2P 2014, pp. 1–10 (2014)
 32. Riebisch, M., Böllert, K., Streitferdt, D., Philippow, I.: Extending feature diagrams with UML multiplicities. In: 6th World Conference on Integrated Design & Process Technology (IDPT) (2002)
 33. Schnabel, T., Weckesser, M., Kluge, R., Lochau, M., Schürr, A.: CardyGAn: tool support for cardinality-based feature models. In: VaMoS 2016 (2016) (to appear)
 34. Schobbens, P.-Y., Heymans, P., Trigaux, J.-C.: Feature diagrams: a survey and a formal semantics. In: Proceedings of RE 2006, pp. 139–148 (2006)
 35. Schroeter, J., Mucha, P., Muth, M., Jugel, K., Lochau, M.: Dynamic configuration management of cloud-based applications. In: SPLC 2012, pp. 171–178 (2012)
 36. Segura, S., Galindo, J., Benavides, D., Parejo, J., Ruiz-Cortés, A.: BeTTy: benchmarking and testing on the automated analysis of feature models. In: VaMoS 2012, pp. 63–71 (2012)
 37. Williams, H.P.: Model Building in Mathematical Programming. John Wiley & Sons, Hoboken (2013)
 38. Zhang, W., Yan, H., Zhao, H., Jin, Z.: A BDD-based approach to verifying clone-enabled feature models’ constraints and customization. In: Mei, H. (ed.) ICSR 2008. LNCS, vol. 5030, pp. 186–199. Springer, Heidelberg (2008)