Abstract
In this keynote I introduce the use of Predictive Analytics for Software Engineering (SE) and then focus on the use of search-based heuristics to tackle long-standing SE prediction problems including (but not limited to) software development effort estimation and software defect prediction. I review recent research in Search-Based Predictive Modelling for SE in order to assess the maturity of the field and point out promising research directions. I conclude my keynote by discussing best practices for a rigorous and realistic empirical evaluation of search-based predictive models, a condicio sine qua non to facilitate the adoption of prediction models in software industry practices.
This paper provides an outline of the keynote talk given by Dr. Federica Sarro at SSBSE 2019, with pointers to the literature for details of the results covered.
References
Arcuri, A., Briand, L.C.: A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. STVR 24(3), 219–250 (2014)
Canfora, G., De Lucia, A., Di Penta, M., Oliveto, R., Panichella, A., Panichella, S.: Multi-objective cross-project defect prediction. In: Proceedings of the IEEE 6th International Conference on Software Testing, Verification and Validation, ICST 2013, pp. 252–261 (2013). https://doi.org/10.1109/ICST.2013.38
Corazza, A., Di Martino, S., Ferrucci, F., Gravino, C., Sarro, F., Mendes, E.: How effective is Tabu search to configure support vector regression for effort estimation? In: Proceedings of the International Conference on Predictive Models in Software Engineering, PROMISE 2010, pp. 4:1–4:10 (2010). https://doi.org/10.1145/1868328.1868335
Corazza, A., Di Martino, S., Ferrucci, F., Gravino, C., Sarro, F., Mendes, E.: Using tabu search to configure support vector regression for effort estimation. Empir. Softw. Eng. 18(3), 506–546 (2013). https://doi.org/10.1007/s10664-011-9187-3
Di Martino, S., Ferrucci, F., Gravino, C., Sarro, F.: A genetic algorithm to configure support vector machines for predicting fault-prone components. In: Caivano, D., Oivo, M., Baldassarre, M.T., Visaggio, G. (eds.) PROFES 2011. LNCS, vol. 6759, pp. 247–261. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21843-9_20
Ferrucci, F., Gravino, C., Oliveto, R., Sarro, F.: Using Tabu search to estimate software development effort. In: Abran, A., Braungarten, R., Dumke, R.R., Cuadrado-Gallego, J.J., Brunekreef, J. (eds.) IWSM 2009. LNCS, vol. 5891, pp. 307–320. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-05415-0_22
Ferrucci, F., Gravino, C., Oliveto, R., Sarro, F.: Genetic programming for effort estimation: an analysis of the impact of different fitness functions. In: Proceedings of the 2nd International Symposium on Search Based Software Engineering, SSBSE 2010, pp. 89–98 (2010). https://doi.org/10.1109/SSBSE.2010.20
Ferrucci, F., Harman, M., Sarro, F.: Search-based software project management. In: Ruhe, G., Wohlin, C. (eds.) Software Project Management in a Changing World, pp. 373–399. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-55035-5_15
Ferrucci, F., Salza, P., Sarro, F.: Using Hadoop MapReduce for parallel genetic algorithms: a comparison of the global, grid and island models. Evol. Comput. 26, 1–33 (2017). https://doi.org/10.1162/evco_a_00213
Ferrucci, F., Gravino, C., Oliveto, R., Sarro, F., Mendes, E.: Investigating Tabu search for web effort estimation. In: Proceedings of EUROMICRO Conference on Software Engineering and Advanced Applications, SEAA 2010, pp. 350–357 (2010)
Ferrucci, F., Mendes, E., Sarro, F.: Web effort estimation: the value of cross-company data set compared to single-company data set. In: Proceedings of the 8th International Conference on Predictive Models in Software Engineering, pp. 29–38. ACM (2012)
Hall, T., Beecham, S., Bowes, D., Gray, D., Counsell, S.: A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38(6), 1276–1304 (2012). https://doi.org/10.1109/TSE.2011.103
Harman, M., Islam, S., Jia, Y., Minku, L.L., Sarro, F., Srivisut, K.: Less is more: temporal fault predictive performance over multiple Hadoop releases. In: Le Goues, C., Yoo, S. (eds.) SSBSE 2014. LNCS, vol. 8636, pp. 240–246. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-09940-8_19
Harman, M.: The relationship between search based software engineering and predictive modeling. In: Proceedings of the 6th International Conference on Predictive Models in Software Engineering, PROMISE 2010, pp. 1:1–1:13 (2010). https://doi.org/10.1145/1868328.1868330
Jimenez, M., Rwemalika, R., Papadakis, M., Sarro, F., Le Traon, Y., Harman, M.: The importance of accounting for real-world labelling when predicting software vulnerabilities. In: Proceedings of the 27th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, ESEC/FSE 2019 (2019)
Langdon, W.B., Dolado, J.J., Sarro, F., Harman, M.: Exact mean absolute error of baseline predictor, MARP0. Inf. Softw. Technol. 73, 16–18 (2016). https://doi.org/10.1016/j.infsof.2016.01.003
Lanza, M., Mocci, A., Ponzanelli, L.: The tragedy of defect prediction, prince of empirical software engineering research. IEEE Softw. 33(6), 102–105 (2016). https://doi.org/10.1109/MS.2016.156
Menzies, T., Zimmermann, T.: Software analytics: so what? IEEE Softw. 30(4), 31–37 (2013). https://doi.org/10.1109/MS.2013.86
Najafi, A., Rigby, P., Shang, W.: Bisecting commits and modeling commit risk during testing. In: Proceedings of the 27th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, ESEC/FSE 2019 (2019)
Braga, P.L., Oliveira, A.L.I., Meira, S.R.L.: A GA-based feature selection and parameters optimization for support vector regression applied to software effort estimation. In: Proceedings of the ACM Symposium on Applied Computing, SAC 2008, pp. 1788–1792 (2008)
Malhotra, R., Khanna, M., Raje, R.R.: On the application of search-based techniques for software engineering predictive modeling: a systematic review and future directions. Swarm Evol. Comput. 32, 85–109 (2017)
Russo, B.: A proposed method to evaluate and compare fault predictions across studies. In: Proceedings of the 10th International Conference on Predictive Models in Software Engineering, PROMISE 2014, pp. 2–11. ACM (2014). https://doi.org/10.1145/2639490.2639504
Salza, P., Ferrucci, F., Sarro, F.: Elephant56: design and implementation of a parallel genetic algorithms framework on Hadoop MapReduce. In: Proceedings of the 2016 on Genetic and Evolutionary Computation Conference, GECCO 2016, pp. 1315–1322 (2016). https://doi.org/10.1145/2908961.2931722
Sarro, F., Di Martino, S., Ferrucci, F., Gravino, C.: A further analysis on the use of genetic algorithm to configure support vector machines for inter-release fault prediction. In: Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC 2012, pp. 1215–1220 (2012). https://doi.org/10.1145/2245276.2231967
Sarro, F., Petrozziello, A., Harman, M.: Multi-objective software effort estimation. In: Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, pp. 619–630 (2016). https://doi.org/10.1145/2884781.2884830
Sarro, F.: Search-based approaches for software development effort estimation. In: Proceedings of the 12th International Conference on Product Focused Software Development and Process Improvement, PROFES 2011, pp. 38–43 (2011). https://doi.org/10.1145/2181101.2181111
Sarro, F.: Predictive analytics for software testing: keynote paper. In: Proceedings of the 11th International Workshop on Search-Based Software Testing, SBST 2018, p. 1 (2018). https://doi.org/10.1145/3194718.3194730
Sarro, F., Ferrucci, F., Gravino, C.: Single and multi objective genetic programming for software development effort estimation. In: Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC 2012, pp. 1221–1226 (2012). https://doi.org/10.1145/2245276.2231968
Sarro, F., Harman, M., Jia, Y., Zhang, Y.: Customer rating reactions can be predicted purely using app features. In: Proceedings of 26th IEEE International Requirements Engineering Conference, RE 2018, pp. 76–87 (2018). https://doi.org/10.1109/RE.2018.00018
Sarro, F., Petrozziello, A.: Linear programming as a baseline for software effort estimation. ACM Trans. Softw. Eng. Methodol. 27(3), 12:1–12:28 (2018). https://doi.org/10.1145/3234940
Shepperd, M.J., MacDonell, S.G.: Evaluating prediction systems in software project estimation. Inf. Softw. Technol. 54(8), 820–827 (2012). https://doi.org/10.1016/j.infsof.2011.12.008
Sigweni, B., Shepperd, M., Turchi, T.: Realistic assessment of software effort estimation models. In: Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, EASE 2016, pp. 41:1–41:6. ACM (2016). https://doi.org/10.1145/2915970.2916005
Xia, X., Shihab, E., Kamei, Y., Lo, D., Wang, X.: Predicting crashing releases of mobile applications. In: Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016, pp. 29:1–29:10 (2016). https://doi.org/10.1145/2961111.2962606
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sarro, F. (2019). Search-Based Predictive Modelling for Software Engineering: How Far Have We Gone? In: Nejati, S., Gay, G. (eds) Search-Based Software Engineering. SSBSE 2019. Lecture Notes in Computer Science, vol 11664. Springer, Cham. https://doi.org/10.1007/978-3-030-27455-9_1
DOI: https://doi.org/10.1007/978-3-030-27455-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27454-2
Online ISBN: 978-3-030-27455-9
eBook Packages: Computer Science, Computer Science (R0)