Introducing Collaboration for Locating Features in Models: Approach and Industrial Evaluation

  • Francisca Pérez
  • Ana C. Marcén
  • Raúl Lapeña
  • Carlos Cetina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10573)


Feature Location (FL) is one of the most important tasks in software maintenance and evolution. However, current work on FL has neglected collaboration among different domain experts. Such collaboration is especially important in long-lived industrial domains. In these domains, a single domain expert may lack the knowledge required to fully locate a feature, and collaboration among several experts can make up for that gap. In this work, we support collaboration among domain experts by automatically reformulating their feature descriptions. With our approach, we extend existing FL approaches that locate features in models using Information Retrieval and linguistic rules. We evaluate our approach in a real-world case study from our industrial partner, a worldwide leader in train manufacturing, and analyze its impact in terms of recall, precision, and F-measure. Moreover, we perform a statistical analysis showing that this impact is statistically significant. Our results show that our collaboration approach improves the quality of FL results.
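To make the pipeline in the abstract concrete, here is a minimal sketch. Several experts' descriptions of one feature are merged into a single reformulated query. Model fragments are then ranked against that query, and the ranking is scored with precision, recall, and F-measure. The abstract does not give the authors' reformulation rules or IR technique, so this sketch makes several assumptions of its own. It uses a plain TF-IDF vector space model with cosine similarity, and it represents each model fragment by its text (for example, element names). The merging rule is hypothetical: a term's weight is the number of experts who used it. The stop-word list, the function names (`reformulate`, `rank_fragments`, `evaluate`), and the train-domain toy data are all hypothetical, and the authors' linguistic rules are not modelled.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would add POS tagging and stemming.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "when",
              "it", "that", "for", "on", "at", "from"}

def tokenize(text):
    """Lower-case, keep alphabetic runs, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def reformulate(descriptions):
    """Merge several experts' descriptions of one feature into one query.

    Hypothetical rule: a term's weight is the number of experts who used it,
    so vocabulary shared across experts dominates the reformulated query.
    """
    query = Counter()
    for description in descriptions:
        query.update(set(tokenize(description)))
    return query

def tfidf_vectors(fragments):
    """TF-IDF vectors for {fragment_id: text}; also returns the idf table."""
    counts = {fid: Counter(tokenize(text)) for fid, text in fragments.items()}
    n = len(fragments)
    df = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}
    vectors = {fid: {t: tf * idf[t] for t, tf in c.items()} for fid, c in counts.items()}
    return vectors, idf

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def rank_fragments(descriptions, fragments):
    """Rank model fragments by similarity to the reformulated query."""
    vectors, idf = tfidf_vectors(fragments)
    query = reformulate(descriptions)
    # Query terms that occur in no fragment carry no idf and are ignored.
    q_vec = {t: w * idf[t] for t, w in query.items() if t in idf}
    scores = {fid: cosine(q_vec, vec) for fid, vec in vectors.items()}
    ranking = sorted(scores, key=lambda fid: (-scores[fid], fid))
    return ranking, scores

def evaluate(retrieved, oracle):
    """Precision, recall and F-measure of retrieved fragments vs. an oracle set."""
    tp = len(set(retrieved) & set(oracle))
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(oracle) if oracle else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Hypothetical toy data: three model fragments and two experts' descriptions.
fragments = {
    "door_ctrl": "door control opens door when train stops at platform",
    "pantograph": "pantograph raises to collect current from catenary",
    "brake": "emergency brake stops train",
}
experts = [
    "the door opens at the platform",
    "passenger door control when the train stops",
]
ranking, scores = rank_fragments(experts, fragments)
print(ranking)  # ['door_ctrl', 'brake', 'pantograph']
print(evaluate(ranking[:2], {"door_ctrl"}))
```

Neither expert's description alone covers every term of the `door_ctrl` fragment. Their merged query does, and that is the intuition behind collaboration. The first expert contributes "opens" and "platform", and the second contributes "control", "train", and "stops".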


Keywords: Collaborative information retrieval · Feature location · Query expansion · Model-driven engineering



This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO) through the Spanish National R+D+i Plan and ERDF funds under the project Model-Driven Variability Extraction for Software Product Line Adoption (TIN2015-64397-R).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Francisca Pérez (1)
  • Ana C. Marcén (1)
  • Raúl Lapeña (1)
  • Carlos Cetina (1)
  1. SVIT Research Group, Universidad San Jorge, Zaragoza, Spain