An approach for bug localization in models using two levels: model and metamodel

  • Lorena Arcega
  • Jaime Font
  • Øystein Haugen
  • Carlos Cetina
Regular Paper

Abstract

Bug localization is a common task in software engineering, especially when maintaining and evolving software products. This paper introduces an approach for bug localization in models (BLiM2) that, in contrast to existing source code approaches, takes advantage of domain information found in the model and the metamodel. BLiM2 applies ideas from source code bug localization (textual similarity to the bug description and the Defect Localization Principle) to this domain information. We evaluated our approach in BSH, a real-world industrial case study in the induction hob domain, measuring the results in terms of recall, precision, the F-measure (which combines both), and the Matthews correlation coefficient. Our study shows that the BLiM2 variant that combines information from the model and the metamodel for the textual similarity, and that differentiates between the timespan of the model and that of the metamodel, provides the best results in this work. We also performed a statistical analysis to provide evidence of the significance of the results. The values obtained show that there are significant differences between the performance of the best BLiM2 approach and that of the approach used by our industrial partner. Finally, the effect size statistics reveal that the best BLiM2 approach obtains better results 78% of the time in the worst case.
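
As an illustration of the four measures named above, the following minimal sketch computes them with their standard definitions from a set of predicted buggy model elements and a set of actually buggy ones. The function names, the element identifiers, and the toy data are assumptions made for this example; they are not taken from the BSH case study or from the BLiM2 implementation.

```python
import math


def confusion_counts(predicted, actual, universe):
    """Return (TP, FP, FN, TN) for a predicted set of buggy elements.

    All three arguments are sets of hypothetical model-element identifiers;
    `universe` is every element that could have been reported.
    """
    tp = len(predicted & actual)             # reported and really buggy
    fp = len(predicted - actual)             # reported but not buggy
    fn = len(actual - predicted)             # buggy but missed
    tn = len(universe - predicted - actual)  # correctly left out
    return tp, fp, fn, tn


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if p + r else 0.0


def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (assumed): ten model elements, three of them buggy.
universe = {f"e{i}" for i in range(10)}
actual = {"e1", "e2", "e3"}
predicted = {"e2", "e3", "e4"}

tp, fp, fn, tn = confusion_counts(predicted, actual, universe)
p, r = precision(tp, fp), recall(tp, fn)
print(tp, fp, fn, tn)                 # 2 1 1 6
print(round(f_measure(p, r), 3))      # 0.667
print(round(mcc(tp, fp, fn, tn), 3))  # 0.524
```

Unlike precision, recall, and the F-measure, the Matthews correlation coefficient also takes true negatives into account, which is one reason it is often reported alongside them when the buggy elements are a small fraction of the whole model.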

Keywords

  • Bug localization
  • Model-driven engineering
  • Reverse engineering

Notes

Acknowledgements

This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO) through the Spanish National R+D+i Plan and ERDF funds under the project Model-Driven Variability Extraction for Software Product Line Adoption (TIN2015-64397-R). We also thank ITEA3 15010 REVaMP2 Project.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Lorena Arcega (1, 2), corresponding author
  • Jaime Font (1, 2)
  • Øystein Haugen (3)
  • Carlos Cetina (1)

  1. Escuela de Arquitectura y Tecnologia, Universidad San Jorge, Zaragoza, Spain
  2. Department of Informatics, University of Oslo, Oslo, Norway
  3. Faculty of Computer Science, Østfold University College, Halden, Norway
