Breathing ontological knowledge into feature model synthesis: an empirical study

Empirical Software Engineering

Abstract

Feature Models (FMs) are a popular formalism for modeling and reasoning about the configurations of a software product line. As the manual construction of an FM is time-consuming and error-prone, management operations have been developed for reverse engineering, merging, slicing, or refactoring FMs from a set of configurations/dependencies. Yet the synthesis of meaningless ontological relations in the FM – as defined by its feature hierarchy and feature groups – may arise and cause severe difficulties when reading, maintaining or exploiting it. Numerous synthesis techniques and tools have been proposed, but only a few consider both configuration and ontological semantics of an FM. There are also few empirical studies investigating ontological aspects when synthesizing FMs. In this article, we define a generic, ontologic-aware synthesis procedure that computes the likely siblings or parent candidates for a given feature. We develop six heuristics for clustering and weighting the logical, syntactical and semantical relationships between feature names. We then perform an empirical evaluation on hundreds of FMs, coming from the SPLOT repository and Wikipedia. We provide evidence that a fully automated synthesis (i.e., without any user intervention) is likely to produce FMs far from the ground truths. As the role of the user is crucial, we empirically analyze the strengths and weaknesses of heuristics for computing ranking lists and different kinds of clusters. We show that a hybrid approach mixing logical and ontological techniques outperforms state-of-the-art solutions. We believe our approach, environment, and empirical results support researchers and practitioners working on reverse engineering and management of FMs.

Notes

  1. As feature modeling is a notational subset of ontologies, feature models can be translated into description logics (Fan and Zhang 2006) and ontology languages like OWL (Wang et al. 2007). The purpose of the translation is to reuse existing description logic solvers and automatically reason about feature models (e.g., for checking consistency) (Benavides et al. 2010). It should be noted that we do not rely on these techniques (as our goal differs).

  2. Definition 4 is exactly the definition of feature implication graph employed in She et al. (2011). In another context (simplification of conjunctive normal form formula), it should be noted that Heule et al. (2011) use a different definition of binary implication graph than ours.

  3. The empirical study of the next section is precisely here to observe and quantify the limits in practical settings.

  4. The extraction process is time-consuming (about 2 days on a single machine) and the extracted database contains approximately 40 GB of data. WikipediaMiner also provides an API to search and compare articles in the database.

  5. The heuristics based on WikipediaMiner do not guarantee that the most relevant article is retrieved. For instance, searching for Storage in Wikipedia leads to a disambiguation page linking to different types of storage. Our current strategy is to choose one of the definitions arbitrarily. Asking the user to choose the most appropriate article would lift this limitation and is an interesting perspective for further improving the effectiveness of our heuristics.

  6. http://sourceforge.net/projects/simmetrics

  7. http://extjwnl.sourceforge.net

  8. https://github.com/cpettitt/dagre

  9. http://d3js.org/

  10. Essentially, we removed FMs with nonsense feature names like F1 or FT22, or with feature names written in Spanish. We did not discard FMs containing feature names not recognized by our ontologies.

  11. The threshold values were manually determined. For each heuristic, we changed the threshold value by steps of 0.1 (all our heuristics return a value between 0 and 1) to maximize the average number of features in a correct cluster.
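The threshold search described in note 11 was done by hand; a minimal sketch of the same sweep, assuming the quality measure is available as a function, could look as follows. The names `sweep_threshold` and `toy_score` are hypothetical and do not come from the article's tooling; `toy_score` merely stands in for "average number of features in a correct cluster".

```python
from typing import Callable, Tuple


def sweep_threshold(score: Callable[[float], float], step: float = 0.1) -> Tuple[float, float]:
    """Evaluate thresholds 0.0, step, ..., 1.0 and return (best_threshold, best_score).

    `score` is a hypothetical callback that runs a heuristic with the given
    threshold and returns the quality measure to maximize.
    """
    n_steps = round(1.0 / step)
    # Rounding avoids floating-point drift such as 0.30000000000000004.
    candidates = [round(i * step, 10) for i in range(n_steps + 1)]
    # max() returns the first maximal element, so ties go to the lowest threshold.
    best = max(candidates, key=score)
    return best, score(best)


# Toy stand-in for a real evaluation; it peaks at 0.6 by construction.
def toy_score(threshold: float) -> float:
    return -(threshold - 0.6) ** 2


best_threshold, best_score = sweep_threshold(toy_score)
```

In practice the sweep would be run once per heuristic, since each heuristic has its own threshold.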

References

  • Abbasi EK, Acher M, Heymans P, Cleve A (2014) Reverse engineering web configurators. In: CSMR-WCRE’14

  • Acher M, Cleve A, Collet P, Merle P, Duchien L, Lahire P (2011) Reverse engineering architectural feature models. In: ECSA’11, LNCS, vol 6903, pp 220–235

  • Acher M, Cleve A, Collet P, Merle P, Duchien L, Lahire P (2014) Extraction and evolution of architectural variability models in plugin-based systems. Software and Systems Modeling (SoSyM)

  • Acher M, Cleve A, Perrouin G, Heymans P, Vanbeneden C, Collet P, Lahire P (2012) On extracting feature models from product descriptions. In: VaMoS’12, pp 45–54. ACM

  • Acher M, Collet P, Lahire P, France R (2013) Familiar: A domain-specific language for large scale management of feature models. Sci Comput Program 78 (6):657–681

  • Acher M, Combemale B, Collet P, Barais O, Lahire P, France RB (2013) Composing your compositions of variability models. In: MoDELS’13, pp 352–369

  • Acher M, Heymans P, Cleve A, Hainaut JL, Baudry B (2013) Support for reverse engineering and maintaining feature models. In: VaMoS’13. ACM

  • Ahnassay A, Bagheri E, Gasevic D (2013) Empirical evaluation in software product line engineering. Tech. Rep. TR-LS3-130084R4T, Laboratory for Systems, Software and Semantics. Ryerson University

  • Aho AV, Garey MR, Ullman JD (1972) The transitive reduction of a directed graph. SIAM J Comput 1(2):131–137

  • Algorithm of Haslinger et al. (2013): http://www.jku.at/sea/content/e139529/e126342/e188736/

  • Alves V, Schwanninger C, Barbosa L, Rashid A, Sawyer P, Rayson P, Pohl C, Rummler A (2008) An exploratory study of information retrieval techniques in domain analysis. In: SPLC’08, pp 67–76. IEEE

  • Andersen N, Czarnecki K, She S, Wasowski A (2012) Efficient synthesis of feature models. In: Proceedings of SPLC’12, pp 97–106. ACM

  • Apel S, Batory D, Kästner C, Saake G (2013) Feature-Oriented Software Product Lines: Concepts and Implementation. Springer

  • Apel S, Kästner C (2009) An overview of feature-oriented software development. Journal of Object Technology (JOT) 8(5):49–84

  • Apel S, Kästner C, Lengauer C (2013) Language-independent and automated software composition: The featurehouse experience. IEEE Trans Softw Eng 39:63–79

  • Apel S, von Rhein A, Wendler P, Größlinger A, Beyer D (2013) Strategies for product-line verification: Case studies and experiments. In: ICSE’13. IEEE

  • Arcuri A, Briand L (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: Proceedings of the 33rd International Conference on Software Engineering, ICSE ’11. ACM, New York, pp 1–10

  • Baader F, Nutt W (2003) The description logic handbook. chap. Basic Description Logics. Cambridge University Press, New York, NY, USA, pp 43–95

  • Bagheri E, Ensan F, Gasevic D (2012) Decision support for the software product line domain engineering lifecycle. Autom Softw Eng 19(3):335–377

  • Bagheri E, Gasevic D (2011) Assessing the maintainability of software product line feature models using structural metrics. Softw Qual J 19(3):579–612

  • Bécan G, Acher M, Baudry B, Ben Nasr S (2013) Breathing ontological knowledge into feature model management. Rapport Technique RT-0441, INRIA. http://hal.inria.fr/hal-00874867

  • Bécan G, Nasr SB, Acher M, Baudry B (2014) WebFML: Synthesizing Feature Models Everywhere. In: SPLC’14

  • Bécan G, Sannier N, Acher M, Barais O, Blouin A, Baudry B (2014) Automating the formalization of product comparison matrices. In: Proceedings of the 29th ACM/IEEE international conference on Automated software engineering, pp 433–444. ACM

  • Benavides D, Segura S, Ruiz-Cortes A (2010) Automated analysis of feature models 20 years later: a literature review. Information Systems 35(6):615–636

  • Berger T, She S, Lotufo R, Wasowski A, Czarnecki K (2013) A study of variability models and languages in the systems software domain. IEEE Trans Softw Eng 39(12):1611–1640

  • Berger T, Rublack R, Nair D, Atlee JM, Becker M, Czarnecki K, Wasowski A (2013) A survey of variability modeling in industrial practice. In: VaMoS’13. ACM

  • Boucher Q, Abbasi E, Hubaux A, Perrouin G, Acher M, Heymans P (2012) Towards more reliable configurators: A re-engineering perspective. In: PLEASE’12 Int’l workshop at ICSE’12

  • Budanitsky A, Hirst G (2006) Evaluating wordnet-based measures of lexical semantic relatedness. Comput Linguis 32(1):13–47

  • Camerini P, Fratta L, Maffioli F (1979) A note on finding optimum branchings. Networks 9(4):309–312

  • Chen K, Zhang W, Zhao H, Mei H (2005) An approach to constructing feature models based on requirements clustering. In: RE’05, pp 31–40

  • Classen A, Heymans P, Schobbens PY, Legay A (2011) Symbolic model checking of software product lines. In: ICSE’11, pp 321–330. ACM

  • Classen A, Heymans P, Schobbens PY, Legay A, Raskin JF (2010) Model checking lots of systems: efficient verification of temporal properties in software product lines. In: ICSE’10, pp 335–344. ACM

  • Cordy M, Schobbens PY, Heymans P, Legay A (2013) Beyond boolean product-line model checking: dealing with feature attributes and multi-features. In: ICSE’13, pp 472–481

  • Czarnecki K, Eisenecker U (2000) Generative Programming: Methods, Tools and Applications. Addison-Wesley, Reading

  • Czarnecki K, Kim CHP, Kalleberg KT (2006) Feature models are views on ontologies. In: SPLC ’06, pp 41–51. IEEE

  • Czarnecki K, Pietroszek K (2006) Verifying feature-based model templates against well-formedness ocl constraints. In: GPCE’06, pp 211–220. ACM

  • Czarnecki K, She S, Wasowski A (2008) Sample spaces and feature models: There and back again. In: SPLC’08, pp 22–31

  • Czarnecki K, Wasowski A (2007) Feature diagrams and logics: There and back again. In: SPLC’07, pp 23–34. IEEE

  • Davril JM, Delfosse E, Hariri N, Acher M, Cleland-Huang J, Heymans P (2013) Feature model extraction from large collections of informal product descriptions. In: ESEC/FSE’13

  • Dietrich C, Tartler R, Schröder-Preikschat W, Lohmann D (2012) A robust approach for variability extraction from the linux build system. In: SPLC’12, pp 21–30

  • Fan S, Zhang N (2006) Feature model based on description logics. In: Gabrys B, Howlett R, Jain L (eds) Knowledge-Based Intelligent Information and Engineering Systems, Lecture Notes in Computer Science, vol 4252. Springer, Berlin Heidelberg, pp 1144–1151

  • Ferrari A, Spagnolo GO, dell’Orletta F (2013) Mining commonalities and variabilities from natural language documents. In: Kishi T, Jarzabek S, Gnesi S (eds) SPLC, pp 116–120. ACM

  • Gruber TR (1993) A translation approach to portable ontology specifications. Knowl Acquis 5(2):199–220

  • Hariri N, Castro-Herrera C, Mirakhorli M, Cleland-Huang J, Mobasher B (2013) Supporting domain analysis through mining and recommending features from online product listings. IEEE Trans Softw Eng

  • Haslinger EN, Lopez-Herrejon RE, Egyed A (2011) Reverse engineering feature models from programs’ feature sets. In: WCRE’11, pp 308–312. IEEE

  • Haslinger EN, Lopez-Herrejon RE, Egyed A (2013) On extracting feature models from sets of valid feature combinations. In: FASE’13, LNCS, vol 7793, pp 53–67

  • Heidenreich F, Sanchez P, Santos J, Zschaler S, Alferez M, Araujo J, Fuentes L, Kulesza U, Moreira A, Rashid A (2010) Relating feature models to other models of a software product line: A comparative study of featuremapper and vml*. Transactions on Aspect-Oriented Software Development VII. Special Issue on A Common Case Study for Aspect-Oriented Modeling 6210:69–114

  • Heule MJH, Järvisalo M, Biere A (2011) Efficient cnf simplification based on binary implication graphs. In: Proceedings of the 14th International Conference on Theory and Application of Satisfiability Testing, SAT’11. Springer-Verlag, Berlin, Heidelberg, pp 201–215

  • Hubaux A, Acher M, Tun TT, Heymans P, Collet P, Lahire P (2013) Domain Engineering: Product Lines, Conceptual Models, and Languages, chap. Separating Concerns in Feature Models: Retrospective and Multi-View Support. Springer 45(4):51

  • Hubaux A, Tun TT, Heymans P (2013) Separation of concerns in feature diagram languages: A systematic survey. ACM Comput Surv

  • Janota M, Kuzina V, Wasowski A (2008) Model construction with external constraints: An interactive journey from semantics to syntax. In: MODELS’08, LNCS, vol 5301, pp 431–445

  • Kang K, Lee J, Donohoe P (2002) Feature-oriented product line engineering. Software, IEEE 19(4):58–65

  • Kästner C, Dreiling A, Ostermann K (2013) Variability mining: Consistent semiautomatic detection of product-line features. IEEE Trans Softw Eng 40(1):67–82

  • Krueger CW (2007) Biglever software Gears and the 3-tiered spl methodology. In: OOPSLA’07, pp 844–845. ACM

  • Linden FJvd, Schmid K, Rommes E (2007) Software Product Lines in Action: The Best Industrial Practice in Product Line Engineering. Springer-Verlag, New York, Inc., Secaucus, NJ, USA

  • Lopez-Herrejon RE, Galindo JA, Benavides D, Segura S, Egyed A (2012) Reverse engineering feature models with evolutionary algorithms: An exploratory study. In: SSBSE’12, LNCS, vol 7515, pp 168–182. Springer

  • Lopez-Herrejon RE, Linsbauer L, Galindo JA, Parejo JA, Benavides D, Segura S, Egyed A (2014) An assessment of search-based techniques for reverse engineering feature models. J Syst Softw. doi:10.1016/j.jss.2014.10.037

  • Medelyan O, Milne DN, Legg C, Witten IH (2009) Mining meaning from wikipedia. Int J Hum-Comput Stud 67(9):716–754

  • Mendonca M, Branco M, Cowan D (2009) S.p.l.o.t.: software product lines online tools. In: OOPSLA’09 (companion). ACM

  • Mendonca M, Wasowski A, Czarnecki K (2009) SAT-based analysis of feature models is easy. In: SPLC’09, pp 231–240. IEEE

  • Metzger A, Pohl K, Heymans P, Schobbens PY, Saval G (2007) Disambiguating the documentation of variability in software product lines: A separation of concerns, formalization and automated analysis. In: RE’07, pp 243–253

  • Miller GA (1995) Wordnet: a lexical database for english. Commun ACM 38(11):39–41

  • Milne D (2007) Computing semantic relatedness using wikipedia link structure. In: The New Zealand Computer Science Research Student Conference. Citeseer

  • Milne DN, Witten IH (2013) An open-source toolkit for mining wikipedia. Artif Intell 194:222–239

  • Mussbacher G, Araújo J, Moreira A, Amyot D (2012) AoURN-based modeling and analysis of software product lines. Softw Qual J 20(3–4):645–687

  • Nadi S, Berger T, Kästner C, Czarnecki K (2014) Mining configuration constraints: Static analyses and empirical results. In: Proceedings of the 36th International Conference on Software Engineering (ICSE)

  • Niu N, Easterbrook SM (2009) Concept analysis for product line requirements. In: Sullivan KJ, Moreira A, Schwanninger C, Gray J (eds) AOSD’09, pp 137–148. ACM

  • Pleuss A, Botterweck G (2012) Visualization of variability and configuration options. Int J Softw Tools Technol Transfer 14(5):497–510

  • Pohl K, Böckle G, van der Linden FJ (2005) Software Product Line Engineering: Foundations, Principles and Techniques. Springer-Verlag

  • Pohl R, Lauenroth K, Pohl K (2011) A performance comparison of contemporary algorithmic approaches for automated analysis operations on feature models. In: ASE’11, pp 313–322

  • Pohl R, Stricker V, Pohl K (2013) Measuring the structural complexity of feature models. In: ASE’13

  • pure::variants: http://www.pure-systems.com/pure_variants.49.0.html

  • Rabkin A, Katz R (2011) Static extraction of program configuration options. In: ICSE’11, pp 131–140. ACM

  • Rubin J, Chechik M (2012) Locating distinguishing features using diff sets. In: ASE’12, pp 242–245. ACM

  • Rubin J, Chechik M (2013) Domain Engineering: Product Lines, Conceptual Models, and Languages, chap. A Survey of Feature Location Techniques. Springer

  • Ryssel U, Ploennigs J, Kabitzsch K (2011) Extraction of feature models from formal contexts. In: FOSD’11, pp 1–8

  • Sannier N, Acher M, Baudry B (2013) From Comparison Matrix to Variability Model: The Wikipedia Case Study. In: ASE’13. IEEE

  • Sayyad AS, Menzies T, Ammar H (2013) On the value of user preferences in search-based software engineering: a case study in software product lines. In: ICSE’13, pp 492–501

  • Schobbens PY, Heymans P, Trigaux JC, Bontemps Y (2007) Generic semantics of feature diagrams. Comput Netw 51(2):456–479

  • She S (2013) Feature Model Synthesis. University of Waterloo, Ph.D. thesis

  • She S, Lotufo R, Berger T, Wasowski A, Czarnecki K (2011) Reverse engineering feature models. In: ICSE’11, pp 461–470. ACM

  • Smith T, Waterman M (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197

  • Tarjan RE (1977) Finding optimum branchings. Networks 7(1):25–35

  • Thaker S, Batory D, Kitchin D, Cook W (2007) Safe composition of product lines. In: GPCE ’07. ACM, New York, NY, USA, pp 95–104

  • Thüm T, Batory D, Kästner C (2009) Reasoning about edits to feature models. In: ICSE’09, pp 254–264. ACM

  • Thüm T, Kästner C, Benduhn F, Meinicke J, Saake G, Leich T (2012) Featureide: An extensible framework for feature-oriented software development. Sci Comput Program 79:70–85

  • Vacchi E, Combemale B, Cazzola W, Acher M (2014) Automating Variability Model Inference for Component-Based Language Implementations. In: 18th International Software Product Line Conference (SPLC’14)

  • Valente MT, Borges V, Passos L (2012) A semi-automatic approach for extracting software product lines. IEEE Trans Softw Eng 38(4):737–754

  • Wagner RA, Fischer MJ (1974) The string-to-string correction problem. J ACM 21(1):168–173

  • Wang HH, Li YF, Sun J, Zhang H, Pan J (2007) Verifying feature models using owl. Web Semant 5(2):117–129

  • Weston N, Chitchyan R, Rashid A (2009) A framework for constructing semantically composable feature models from natural language requirements. In: SPLC’09, pp 211–220. ACM

  • Wu Z, Palmer M (1994) Verbs semantics and lexical selection. In: the 32nd annual meeting on Association for Computational Linguistics, pp 133–138. Association for Computational Linguistics

  • Wulf-Hadash O, Reinhartz-Berger I (2013) Cross product line analysis. In: VaMoS’13. ACM

  • Yi L, Zhang W, Zhao H, Jin Z, Mei H (2012) Mining binary constraints in the construction of feature models. In: RE’12, pp 141–150. IEEE

Author information

Corresponding author

Correspondence to Guillaume Bécan.

Additional information

Communicated by: Ebrahim Bagheri, David Benavides, Per Runeson and Klaus Schmid

Appendix A: Detailed Results of Statistical Tests

Numerous statistical tests were performed to evaluate our FM synthesis algorithm and heuristics (see Section 6 for further details). In this appendix, we present the comprehensive results of the tests. In particular, we report all the p-values and effect sizes that were computed.

There are two kinds of tables in this appendix. The first kind compares ontological techniques (shown across the top of the table) with purely logical techniques (shown down the left of the table). Each p-value corresponds to the comparison of one ontological technique with one logical technique. The effect size is the difference between the mean score of the ontological technique and the mean score of the logical one. A positive effect size therefore means that the ontological technique outperforms the logical one, and a negative effect size means the opposite.

The second kind of table compares each heuristic applied to the complete BIG with the same heuristic applied to the reduced BIG. The p-value corresponds to this comparison, and the effect size is the difference between the mean score of the heuristic on the reduced BIG and its mean score on the complete BIG. A positive effect size therefore means that the heuristic performs better on the reduced BIG than on the complete BIG; a negative effect size means that reducing the BIG degrades the heuristic.
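As a small illustration of the effect size defined in this appendix (a difference of mean scores), the sketch below computes it for two lists of per-FM scores. The function name `effect_size` and the score values are made up for the example and are not results from the study. The p-value is left out because this appendix does not name the statistical test that was used.

```python
from statistics import mean
from typing import Sequence


def effect_size(treatment_scores: Sequence[float], baseline_scores: Sequence[float]) -> float:
    """Difference of means: mean(treatment) - mean(baseline).

    For the first kind of table, `treatment_scores` would hold the scores of an
    ontological technique and `baseline_scores` those of a logical technique.
    For the second kind, they would hold the scores of one heuristic on the
    reduced BIG and on the complete BIG, respectively.
    """
    return mean(treatment_scores) - mean(baseline_scores)


# Illustrative, made-up scores (one value per FM), not data from the study.
ontological = [0.5, 0.7]
logical = [0.4, 0.4]
delta = effect_size(ontological, logical)  # positive: the ontological side scores higher
```

Swapping the two arguments flips the sign, so the argument order has to follow the convention stated for each kind of table.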

Each table relates to a hypothesis in Section 6 as follows:

  • H1 Table 7a compares FMONTO with FMRAND BIG for automatically synthesizing an FM while Table 7b compares FMONTO LOGIC with FMRAND RBIG.

  • H2 Table 8 compares FMRAND BIG and FMONTO with respectively FMRAND RBIG and FMONTO LOGIC on the fully automated synthesis.

  • H3 Table 9a compares FMONTO with FMRAND BIG for computing ranking lists while Table 9b compares FMONTO LOGIC with FMRAND RBIG.

  • H4 Table 10 compares FMRAND BIG and FMONTO with respectively FMRAND RBIG and FMONTO LOGIC on the computation of ranking lists.

  • H5 Table 11a (resp. Table 11c) compares FMONTO with FMRAND BIG on the percentage of correct clusters (resp. percentage of features in a correct cluster) generated by the heuristics. Tables 11b and 11d present the same results but for FMONTO LOGIC and FMRAND RBIG.

  • H6 Table 12a compares FMRAND BIG and FMONTO with respectively FMRAND RBIG and FMONTO LOGIC on the percentage of generated correct clusters. Table 12b presents the same comparison but for the percentage of features in a correct cluster.

  • H7 Table 13a compares the percentage of correct feature groups generated from the BIG with feature groups generated from the reduced BIG. Table 13b presents the same comparison but for the percentage of features in a correct group.

    Table 7 H1 - Full synthesis
    Table 8 H2 - Full synthesis (BIG vs reduced BIG)
    Table 9 H3 - Top 2
    Table 10 H4 - Top 2 (BIG vs reduced BIG)
    Table 11 H5 - Clusters generated by heuristics
    Table 12 H6 - Clusters generated by heuristics (BIG vs reduced BIG)
    Table 13 H7 - Feature groups (BIG vs reduced BIG)

About this article

Cite this article

Bécan, G., Acher, M., Baudry, B. et al. Breathing ontological knowledge into feature model synthesis: an empirical study. Empir Software Eng 21, 1794–1841 (2016). https://doi.org/10.1007/s10664-014-9357-1

  • DOI: https://doi.org/10.1007/s10664-014-9357-1
