Abstract
Feature Models (FMs) are a popular formalism for modeling and reasoning about the configurations of a software product line. As the manual construction of an FM is time-consuming and error-prone, management operations have been developed for reverse engineering, merging, slicing, or refactoring FMs from a set of configurations/dependencies. Yet the synthesis of meaningless ontological relations in the FM – as defined by its feature hierarchy and feature groups – may arise and cause severe difficulties when reading, maintaining or exploiting it. Numerous synthesis techniques and tools have been proposed, but only a few consider both configuration and ontological semantics of an FM. There are also few empirical studies investigating ontological aspects when synthesizing FMs. In this article, we define a generic, ontologic-aware synthesis procedure that computes the likely siblings or parent candidates for a given feature. We develop six heuristics for clustering and weighting the logical, syntactical and semantical relationships between feature names. We then perform an empirical evaluation on hundreds of FMs, coming from the SPLOT repository and Wikipedia. We provide evidence that a fully automated synthesis (i.e., without any user intervention) is likely to produce FMs far from the ground truths. As the role of the user is crucial, we empirically analyze the strengths and weaknesses of heuristics for computing ranking lists and different kinds of clusters. We show that a hybrid approach mixing logical and ontological techniques outperforms state-of-the-art solutions. We believe our approach, environment, and empirical results support researchers and practitioners working on reverse engineering and management of FMs.
Notes
As feature modeling is a notational subset of ontologies, feature models can be translated into description logics (Fan and Zhang 2006) and ontology languages like OWL (Wang et al. 2007). The purpose of the translation is to reuse existing description logic solvers and automatically reason about feature models (e.g., for checking consistency) (Benavides et al. 2010). It should be noted that we do not rely on these techniques (as our goal differs).
The empirical study in the next section is designed precisely to observe and quantify the limits in practical settings.
The extraction process is time-consuming (about 2 days on a single machine) and the extracted database contains approximately 40 GB of data. WikipediaMiner also provides an API to search and compare articles in the database.
The heuristics based on WikipediaMiner do not guarantee that the most relevant article is retrieved. For instance, searching for Storage in Wikipedia leads to a disambiguation page linking to different types of storage. Our current strategy is to arbitrarily choose one of the definitions. Asking the user to choose the most appropriate article would lift this limitation and is an interesting perspective for further improving the effectiveness of our heuristic.
Essentially, we removed FMs with nonsensical feature names such as F1 or FT22, or with feature names written in Spanish. We did not discard FMs containing feature names not recognized by our ontologies.
The threshold values were manually determined. For each heuristic, we changed the threshold value by steps of 0.1 (all our heuristics return a value between 0 and 1) to maximize the average number of features in a correct cluster.
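The threshold search described in this note can be illustrated with a minimal sketch. It assumes a hypothetical callback `score_at(threshold)` that returns the average number of features in a correct cluster for a given threshold; the function name, the callback, and the toy score curve are illustrative assumptions, not part of the original tooling.

```python
def sweep_thresholds(score_at, step=0.1):
    """Try thresholds 0.0, 0.1, ..., 1.0 and return (best_threshold, best_score).

    `score_at` is a hypothetical callback: it is assumed to run the clustering
    with the given threshold and return the average number of features in a
    correct cluster. Ties are resolved in favor of the lowest threshold.
    """
    n_steps = round(1 / step)
    # Round each candidate to avoid floating-point drift such as 0.30000000000000004.
    candidates = [round(i * step, 10) for i in range(n_steps + 1)]
    scored = [(t, score_at(t)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


def toy_score(threshold):
    """Made-up score curve peaking at 0.6, for illustration only."""
    return -(threshold - 0.6) ** 2


best_threshold, best_score = sweep_thresholds(toy_score)
print(best_threshold)
```

In practice the sweep would be repeated once per heuristic, since each heuristic has its own threshold.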
References
Abbasi EK, Acher M, Heymans P, Cleve A (2014) Reverse engineering web configurators. In: CSMR/WCRE’14
Acher M, Cleve A, Collet P, Merle P, Duchien L, Lahire P (2011) Reverse engineering architectural feature models. In: ECSA’11, LNCS, vol 6903, pp 220–235
Acher M, Cleve A, Collet P, Merle P, Duchien L, Lahire P (2014) Extraction and evolution of architectural variability models in plugin-based systems. Software and Systems Modeling (SoSyM)
Acher M, Cleve A, Perrouin G, Heymans P, Vanbeneden C, Collet P, Lahire P (2012) On extracting feature models from product descriptions. In: VaMoS’12, pp 45–54. ACM
Acher M, Collet P, Lahire P, France R (2013) Familiar: A domain-specific language for large scale management of feature models. Sci Comput Program 78 (6):657–681
Acher M, Combemale B, Collet P, Barais O, Lahire P, France RB (2013) Composing your compositions of variability models. In: MoDELS’13, pp 352–369
Acher M, Heymans P, Cleve A, Hainaut JL, Baudry B (2013) Support for reverse engineering and maintaining feature models. In: VaMoS’13. ACM
Ahnassay A, Bagheri E, Gasevic D (2013) Empirical evaluation in software product line engineering. Tech. Rep. TR-LS3-130084R4T, Laboratory for Systems, Software and Semantics. Ryerson University
Aho AV, Garey MR, Ullman JD (1972) The transitive reduction of a directed graph. SIAM J Comput 1(2):131–137
Algorithm of Haslinger et al. (2013): http://www.jku.at/sea/content/e139529/e126342/e188736/
Alves V, Schwanninger C, Barbosa L, Rashid A, Sawyer P, Rayson P, Pohl C, Rummler A (2008) An exploratory study of information retrieval techniques in domain analysis. In: SPLC’08, pp 67–76. IEEE
Andersen N, Czarnecki K, She S, Wasowski A (2012) Efficient synthesis of feature models. In: Proceedings of SPLC’12, pp 97–106. ACM
Apel S, Batory D, Kästner C, Saake G (2013) Feature-Oriented Software Product Lines: Concepts and Implementation. Springer
Apel S, Kästner C (2009) An overview of feature-oriented software development. Journal of Object Technology (JOT) 8(5):49–84
Apel S, Kästner C, Lengauer C (2013) Language-independent and automated software composition: The featurehouse experience. IEEE Trans Softw Eng 39:63–79
Apel S, von Rhein A, Wendler P, Größlinger A, Beyer D (2013) Strategies for product-line verification: Case studies and experiments. In: ICSE’13. IEEE
Arcuri A, Briand L (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: Proceedings of the 33rd International Conference on Software Engineering, ICSE ’11. ACM, New York, pp 1–10
Baader F, Nutt W (2003) The description logic handbook. chap. Basic Description Logics. Cambridge University Press, New York, NY, USA, pp 43–95
Bagheri E, Ensan F, Gasevic D (2012) Decision support for the software product line domain engineering lifecycle. Autom Softw Eng 19(3):335–377
Bagheri E, Gasevic D (2011) Assessing the maintainability of software product line feature models using structural metrics. Softw Qual J 19(3):579–612
Bécan G, Acher M, Baudry B, Ben Nasr S (2013) Breathing ontological knowledge into feature model management. Rapport Technique RT-0441, INRIA. http://hal.inria.fr/hal-00874867
Bécan G, Nasr SB, Acher M, Baudry B (2014) WebFML: Synthesizing Feature Models Everywhere. In: SPLC’14
Bécan G, Sannier N, Acher M, Barais O, Blouin A, Baudry B (2014) Automating the formalization of product comparison matrices. In: Proceedings of the 29th ACM/IEEE international conference on Automated software engineering, pp 433–444. ACM
Benavides D, Segura S, Ruiz-Cortes A (2010) Automated analysis of feature models 20 years later: a literature review. Information Systems 35(6):615–636
Berger T, She S, Lotufo R, Wasowski A, Czarnecki K (2013) A study of variability models and languages in the systems software domain. IEEE Trans Softw Eng 39(12):1611–1640
Berger T, Rublack R, Nair D, Atlee JM, Becker M, Czarnecki K, Wasowski A (2013) A survey of variability modeling in industrial practice. In: VaMoS’13. ACM
Boucher Q, Abbasi E, Hubaux A, Perrouin G, Acher M, Heymans P (2012) Towards more reliable configurators: A re-engineering perspective. In: PLEASE’12 Int’l workshop at ICSE’12
Budanitsky A, Hirst G (2006) Evaluating wordnet-based measures of lexical semantic relatedness. Comput Linguis 32(1):13–47
Camerini P, Fratta L, Maffioli F (1979) A note on finding optimum branchings. Networks 9(4):309–312
Chen K, Zhang W, Zhao H, Mei H (2005) An approach to constructing feature models based on requirements clustering. In: RE’05, pp 31–40
Classen A, Heymans P, Schobbens PY, Legay A (2011) Symbolic model checking of software product lines. In: ICSE’11, pp 321–330. ACM
Classen A, Heymans P, Schobbens PY, Legay A, Raskin JF (2010) Model checking lots of systems: efficient verification of temporal properties in software product lines. In: ICSE’10, pp 335–344. ACM
Cordy M, Schobbens PY, Heymans P, Legay A (2013) Beyond boolean product-line model checking: dealing with feature attributes and multi-features. In: ICSE’13, pp 472–481
Czarnecki K, Eisenecker U (2000) Generative Programming: Methods, Tools and Applications. Addison-Wesley, Reading
Czarnecki K, Kim CHP, Kalleberg KT (2006) Feature models are views on ontologies. In: SPLC ’06, pp 41–51. IEEE
Czarnecki K, Pietroszek K (2006) Verifying feature-based model templates against well-formedness ocl constraints. In: GPCE’06, pp 211–220. ACM
Czarnecki K, She S, Wasowski A (2008) Sample spaces and feature models: There and back again. In: SPLC’08, pp 22–31
Czarnecki K, Wasowski A (2007) Feature diagrams and logics: There and back again. In: SPLC’07, pp 23–34. IEEE
Davril JM, Delfosse E, Hariri N, Acher M, Cleland-Huang J, Heymans P (2013) Feature model extraction from large collections of informal product descriptions. In: ESEC/FSE’13
Dietrich C, Tartler R, Schröder-Preikschat W, Lohmann D (2012) A robust approach for variability extraction from the linux build system. In: SPLC’12, pp 21–30
Fan S, Zhang N (2006) Feature model based on description logics. In: Gabrys B, Howlett R, Jain L (eds) Knowledge-Based Intelligent Information and Engineering Systems, Lecture Notes in Computer Science, vol 4252. Springer, Berlin Heidelberg, pp 1144–1151
Ferrari A, Spagnolo GO, dell’Orletta F (2013) Mining commonalities and variabilities from natural language documents. In: Kishi T, Jarzabek S, Gnesi S (eds) SPLC, pp 116–120. ACM
Gruber TR (1993) A translation approach to portable ontology specifications. Knowl Acquis 5(2):199–220
Hariri N, Castro-Herrera C, Mirakhorli M, Cleland-Huang J, Mobasher B (2013) Supporting domain analysis through mining and recommending features from online product listings. IEEE Trans Softw Eng
Haslinger EN, Lopez-Herrejon RE, Egyed A (2011) Reverse engineering feature models from programs’ feature sets. In: WCRE’11, pp 308–312. IEEE
Haslinger EN, Lopez-Herrejon RE, Egyed A (2013) On extracting feature models from sets of valid feature combinations. In: FASE’13, LNCS, vol 7793, pp 53–67
Heidenreich F, Sanchez P, Santos J, Zschaler S, Alferez M, Araujo J, Fuentes L, Kulesza U, Moreira A, Rashid A (2010) Relating feature models to other models of a software product line: A comparative study of featuremapper and vml*. Transactions on Aspect-Oriented Software Development VII. Special Issue on A Common Case Study for Aspect-Oriented Modeling 6210:69–114
Heule MJH, Järvisalo M, Biere A (2011) Efficient cnf simplification based on binary implication graphs. In: Proceedings of the 14th International Conference on Theory and Application of Satisfiability Testing, SAT’11. Springer-Verlag, Berlin, Heidelberg, pp 201–215
Hubaux A, Acher M, Tun TT, Heymans P, Collet P, Lahire P (2013) Domain Engineering: Product Lines, Conceptual Models, and Languages, chap. Separating Concerns in Feature Models: Retrospective and Multi-View Support. Springer 45(4):51
Hubaux A, Tun TT, Heymans P (2013) Separation of concerns in feature diagram languages: A systematic survey. ACM Comput Surv
Janota M, Kuzina V, Wasowski A (2008) Model construction with external constraints: An interactive journey from semantics to syntax. In: MODELS’08, LNCS, vol 5301, pp 431–445
Kang K, Lee J, Donohoe P (2002) Feature-oriented product line engineering. Software, IEEE 19(4):58–65
Kästner C, Dreiling A, Ostermann K (2013) Variability mining: Consistent semiautomatic detection of product-line features. IEEE Trans Softw Eng 40(1):67–82
Krueger CW (2007) BigLever Software Gears and the 3-tiered SPL methodology. In: OOPSLA’07, pp 844–845. ACM
Linden FJvd, Schmid K, Rommes E (2007) Software Product Lines in Action: The Best Industrial Practice in Product Line Engineering. Springer-Verlag, New York, Inc., Secaucus, NJ, USA
Lopez-Herrejon RE, Galindo JA, Benavides D, Segura S, Egyed A (2012) Reverse engineering feature models with evolutionary algorithms: An exploratory study. In: SSBSE’12, LNCS, vol 7515, pp 168–182. Springer
Lopez-Herrejon RE, Linsbauer L, Galindo JA, Parejo JA, Benavides D, Segura S, Egyed A (2014) An assessment of search-based techniques for reverse engineering feature models. J Syst Softw. doi:10.1016/j.jss.2014.10.037
Medelyan O, Milne DN, Legg C, Witten IH (2009) Mining meaning from wikipedia. Int J Hum-Comput Stud 67(9):716–754
Mendonca M, Branco M, Cowan D (2009) S.p.l.o.t.: software product lines online tools. In: OOPSLA’09 (companion). ACM
Mendonca M, Wasowski A, Czarnecki K (2009) SAT-based analysis of feature models is easy. In: SPLC’09, pp 231–240. IEEE
Metzger A, Pohl K, Heymans P, Schobbens PY, Saval G (2007) Disambiguating the documentation of variability in software product lines: A separation of concerns, formalization and automated analysis. In: RE’07, pp 243–253
Miller GA (1995) Wordnet: a lexical database for english. Commun ACM 38(11):39–41
Milne D (2007) Computing semantic relatedness using wikipedia link structure. In: The New Zealand Computer Science Research Student Conference. Citeseer
Milne DN, Witten IH (2013) An open-source toolkit for mining wikipedia. Artif Intell 194:222–239
Mussbacher G, Araújo J, Moreira A, Amyot D (2012) AoURN-based modeling and analysis of software product lines. Softw Qual J 20(3–4):645–687
Nadi S, Berger T, Kästner C, Czarnecki K (2014) Mining configuration constraints: Static analyses and empirical results. In: Proceedings of the 36th International Conference on Software Engineering (ICSE)
Niu N, Easterbrook SM (2009) Concept analysis for product line requirements. In: Sullivan KJ, Moreira A, Schwanninger C, Gray J (eds) AOSD’09, pp 137–148. ACM
Pleuss A, Botterweck G (2012) Visualization of variability and configuration options. Int J Softw Tools Technol Transfer 14(5):497–510
Pohl K, Böckle G, van der Linden FJ (2005) Software Product Line Engineering: Foundations, Principles and Techniques. Springer-Verlag
Pohl R, Lauenroth K, Pohl K (2011) A performance comparison of contemporary algorithmic approaches for automated analysis operations on feature models. In: ASE’11, pp 313–322
Pohl R, Stricker V, Pohl K (2013) Measuring the structural complexity of feature models. In: ASE’13
pure::variants: http://www.pure-systems.com/pure_variants.49.0.html
Rabkin A, Katz R (2011) Static extraction of program configuration options. In: ICSE’11, pp 131–140. ACM
Rubin J, Chechik M (2012) Locating distinguishing features using diff sets. In: ASE’12, pp 242–245. ACM
Rubin J, Chechik M (2013) Domain Engineering: Product Lines, Conceptual Models, and Languages, chap. A Survey of Feature Location Techniques. Springer
Ryssel U, Ploennigs J, Kabitzsch K (2011) Extraction of feature models from formal contexts. In: FOSD’11, pp 1–8
Sannier N, Acher M, Baudry B (2013) From Comparison Matrix to Variability Model: The Wikipedia Case Study. In: ASE’13. IEEE
Sayyad AS, Menzies T, Ammar H (2013) On the value of user preferences in search-based software engineering: a case study in software product lines. In: ICSE’13, pp 492–501
Schobbens PY, Heymans P, Trigaux JC, Bontemps Y (2007) Generic semantics of feature diagrams. Comput Netw 51(2):456–479
She S (2013) Feature Model Synthesis. University of Waterloo, Ph.D. thesis
She S, Lotufo R, Berger T, Wasowski A, Czarnecki K (2011) Reverse engineering feature models. In: ICSE’11, pp 461–470. ACM
Smith T, Waterman M (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197
Tarjan RE (1977) Finding optimum branchings. Networks 7(1):25–35
Thaker S, Batory D, Kitchin D, Cook W (2007) Safe composition of product lines. In: GPCE ’07. ACM, New York, NY, USA, pp 95–104
Thüm T, Batory D, Kästner C (2009) Reasoning about edits to feature models. In: ICSE’09, pp 254–264. ACM
Thüm T, Kästner C, Benduhn F, Meinicke J, Saake G, Leich T (2012) Featureide: An extensible framework for feature-oriented software development. Sci Comput Program 79:70–85
Vacchi E, Combemale B, Cazzola W, Acher M (2014) Automating Variability Model Inference for Component-Based Language Implementations. In: 18th International Software Product Line Conference (SPLC’14)
Valente MT, Borges V, Passos L (2012) A semi-automatic approach for extracting software product lines. IEEE Trans Softw Eng 38(4):737–754
Wagner RA, Fischer MJ (1974) The string-to-string correction problem. J ACM 21(1):168–173
Wang HH, Li YF, Sun J, Zhang H, Pan J (2007) Verifying feature models using owl. Web Semant 5(2):117–129
Weston N, Chitchyan R, Rashid A (2009) A framework for constructing semantically composable feature models from natural language requirements. In: SPLC’09, pp 211–220. ACM
Wu Z, Palmer M (1994) Verbs semantics and lexical selection. In: the 32nd annual meeting on Association for Computational Linguistics, pp 133–138. Association for Computational Linguistics
Wulf-Hadash O, Reinhartz-Berger I (2013) Cross product line analysis. In: VaMoS’13. ACM
Yi L, Zhang W, Zhao H, Jin Z, Mei H (2012) Mining binary constraints in the construction of feature models. In: RE’12, pp 141–150. IEEE
Additional information
Communicated by: Ebrahim Bagheri, David Benavides, Per Runeson and Klaus Schmid
Appendix A: Detailed Results of Statistical Tests
Numerous statistical tests were performed to evaluate our FM synthesis algorithm and heuristics (see Section 6 for further details). In this appendix, we present the comprehensive results of the tests. In particular, we report all the p-values and effect sizes that were computed.
There are two kinds of tables in this appendix. The first kind compares ontological techniques (displayed at the top of the table) with purely logical techniques (displayed on the left of the table). Each p-value corresponds to the comparison of an ontological technique with a logical one. The effect size is the difference between the mean score of the ontological technique and the mean score of the logical one. A positive effect size therefore means that the ontological technique outperforms the logical one, whereas a negative effect size means the opposite.
The second kind of table compares each heuristic running on the complete BIG with the same heuristic on the reduced BIG. The p-value corresponds to this comparison, and the effect size is the difference between the mean score of the heuristic on the reduced BIG and its mean score on the complete BIG. A positive effect size therefore means that the heuristic performs better on the reduced BIG than on the complete BIG, whereas a negative effect size means that reducing the BIG has a negative impact on the heuristic.
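The sign convention used for the effect sizes in both kinds of tables can be illustrated with a minimal sketch. The function names and all scores below are made-up assumptions for illustration; they are not results from the study, and the sketch does not compute the p-values.

```python
from statistics import mean


def effect_size(compared_scores, reference_scores):
    """Difference of mean scores: mean(compared) - mean(reference).

    First kind of table:  compared = ontological technique, reference = logical technique.
    Second kind of table: compared = heuristic on the reduced BIG, reference = same
    heuristic on the complete BIG.
    """
    return mean(compared_scores) - mean(reference_scores)


def interpret(effect):
    """Translate the sign of an effect size into the reading used in this appendix."""
    if effect > 0:
        return "compared technique performs better"
    if effect < 0:
        return "reference technique performs better"
    return "no difference in mean score"


# Hypothetical per-FM scores, for illustration only.
ontological = [0.5, 0.75, 1.0]
logical = [0.25, 0.5, 0.75]

effect = effect_size(ontological, logical)
print(effect, interpret(effect))
```

Swapping the two arguments flips the sign, so the order in which the techniques are passed has to follow the table's convention.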
Each table relates to a hypothesis in Section 6 as follows:
- H1: Table 7a compares FMONTO with FMRAND BIG for automatically synthesizing an FM while Table 7b compares FMONTO LOGIC with FMRAND RBIG.
- H2: Table 8 compares FMRAND BIG and FMONTO with respectively FMRAND RBIG and FMONTO LOGIC on the fully automated synthesis.
- H3: Table 9a compares FMONTO with FMRAND BIG for computing ranking lists while Table 9b compares FMONTO LOGIC with FMRAND RBIG.
- H4: Table 10 compares FMRAND BIG and FMONTO with respectively FMRAND RBIG and FMONTO LOGIC on the computation of ranking lists.
- H5: Table 11a (resp. Table 11c) compares FMONTO with FMRAND BIG on the percentage of correct clusters (resp. percentage of features in a correct cluster) generated by the heuristics. Tables 11b and 11d present the same results but for FMONTO LOGIC and FMRAND RBIG.
- H6: Table 12a compares FMRAND BIG and FMONTO with respectively FMRAND RBIG and FMONTO LOGIC on the percentage of generated correct clusters. Table 12b presents the same comparison but for the percentage of features in a correct cluster.
- H7: Table 13a compares the percentage of correct feature groups generated from the BIG with feature groups generated from the reduced BIG. Table 13b presents the same comparison but for the percentage of features in a correct group.
Cite this article
Bécan, G., Acher, M., Baudry, B. et al. Breathing ontological knowledge into feature model synthesis: an empirical study. Empir Software Eng 21, 1794–1841 (2016). https://doi.org/10.1007/s10664-014-9357-1
DOI: https://doi.org/10.1007/s10664-014-9357-1