Machine learning techniques for software testing effort prediction


Software testing (ST) is considered one of the most important and critical activities of the software development life cycle (SDLC) because it directly influences quality. When a software project is planned, it is common practice to predict the corresponding ST effort (STEP) as a percentage of the predicted SDLC effort. However, the reported effort range for ST lies between 10 and 60% of the predicted SDLC effort. This wide range causes uncertainty for software managers, because STEP is used to allocate resources to teams dedicated exclusively to testing activities, and to budget and bid for projects. Despite this concern, hundreds of studies on SDLC effort prediction models have been published since 1981, whereas only thirty-one STEP studies published in the last two decades were identified (just two of them based their conclusions on statistical significance). The contribution of the present study is to investigate the application to STEP of five machine learning (ML) models reported as the most accurate when applied to SDLC effort prediction. The models were trained and tested with data sets of projects selected from an international public repository of software projects. Projects were selected based on their data quality rating, type of development, development platform, programming language generation, sizing method, and resource level. The results, based on statistical significance, suggest applying specific ML models to software projects according to their type of development, development platform, and programming language generation.
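To make the width of the reported 10 to 60% range concrete, the following is a minimal sketch of the percentage-based rule of thumb described above. It is not one of the five ML models studied. The function name `step_range`, the person-hour unit, and the 2000-hour project are illustrative assumptions rather than values from the study.

```python
def step_range(sdlc_effort, low_pct=10, high_pct=60):
    """Return the (lowest, highest) testing effort implied by a percentage range.

    sdlc_effort: predicted SDLC effort (person-hours is an assumed unit).
    low_pct, high_pct: bounds of the ST effort share; the defaults are the
    10% and 60% bounds reported in the abstract.
    """
    if sdlc_effort < 0:
        raise ValueError("sdlc_effort must be non-negative")
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("percentages must satisfy 0 <= low_pct <= high_pct <= 100")
    return sdlc_effort * low_pct / 100, sdlc_effort * high_pct / 100


# Hypothetical project with a predicted SDLC effort of 2000 person-hours.
lowest, highest = step_range(2000)
spread = highest - lowest
print(f"Testing effort between {lowest:.0f} and {highest:.0f} person-hours "
      f"(spread: {spread:.0f})")
```

For this hypothetical project the highest estimate is six times the lowest, which illustrates why a fixed percentage gives managers little guidance for staffing and bidding.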






I would like to thank CUCEA of the Universidad de Guadalajara, Jalisco, México; the Programa para el Desarrollo Profesional Docente (PRODEP); and the Consejo Nacional de Ciencia y Tecnología (Conacyt). I also thank the Ph.D. student Felipe Belmont Polanco for his collaboration in the GP algorithm executions.

Author information



Corresponding author

Correspondence to Cuauhtémoc López-Martín.




Number of projects for effort analysis (TD, type of development; DP, development platform; LT, language type; CA, count approach; RL, resource level; NSP, number of software projects)

TD           DP     LT   CA         RL  NSP
New          MF     2GL  FISMA      1   26
             MR     3GL  FISMA      1   10
             Multi  3GL  FISMA      1   37
             PC     3GL  FISMA      1   15
             MF     4GL  FISMA      1   4
             MR     4GL  FISMA      1   4
             Multi  4GL  FISMA      1   10
             PC     4GL  FISMA      1   2
             MR     ApG  FISMA      1   4
             Multi  ApG  FISMA      1   2
             MF     3GL  FISMA      4   1
             PC     3GL  FISMA      4   2
             MF     2GL  IFPUG V4+  1   2
             MF     3GL  IFPUG V4+  1   51
             MR     3GL  IFPUG V4+  1   9
             Multi  3GL  IFPUG V4+  1   1
             PC     3GL  IFPUG V4+  1   21
             MF     4GL  IFPUG V4+  1   12
             MR     4GL  IFPUG V4+  1   8
             Multi  4GL  IFPUG V4+  1   3
             PC     4GL  IFPUG V4+  1   26
             MF     ApG  IFPUG V4+  1   2
             MF     3GL  IFPUG V4+  2   4
             MR     3GL  IFPUG V4+  2   3
             MF     4GL  IFPUG V4+  2   2
             MR     4GL  IFPUG V4+  2   1
             PC     4GL  IFPUG V4+  2   2
             MF     ApG  IFPUG V4+  2   2
             MR     3GL  IFPUG V4+  3   1
             PC     4GL  IFPUG V4+  3   1
             MF     3GL  IFPUG V4+  4   3
             MR     3GL  IFPUG V4+  4   1
             Multi  3GL  IFPUG V4+  4   2
             PC     3GL  IFPUG V4+  4   9
             MF     4GL  IFPUG V4+  4   1
             MR     4GL  IFPUG V4+  4   2
             Multi  4GL  IFPUG V4+  4   5
             PC     4GL  IFPUG V4+  4   9
             MF     ApG  IFPUG V4+  4   1

TD           DP     LT   CA         RL  NSP
Enhancement  MF     2GL  IFPUG V4+  1   2
                    3GL  FISMA      1   47
                    3GL  IFPUG V4+  1   76
                    3GL  IFPUG V4+  2   3
                    3GL  FISMA      4   3
                    3GL  IFPUG V4+  4   7
                    4GL  FISMA      1   6
                    4GL  IFPUG V4+  1   8
                    4GL  IFPUG V4+  4   2
                    ApG  FISMA      1   4
                    ApG  IFPUG V4+  1   53
                    ApG  IFPUG V4+  2   1
             MR     3GL  FISMA      1   4
                    3GL  IFPUG V4+  1   29
                    3GL  NESMA      2   1
                    4GL  FISMA      1   1
                    4GL  IFPUG V4+  1   39
                    ApG  FISMA      1   1
             Multi  3GL  FISMA      1   13
                    3GL  IFPUG V4+  1   20
                    3GL  NESMA      4   1
                    4GL  FISMA      1   4
                    4GL  IFPUG V4+  4   1
                    ApG  FISMA      1   9
             PC     3GL  IFPUG V4+  1   54
                    3GL  IFPUG V4+  4   4
                    4GL  IFPUG V4+  1   34
                    4GL  IFPUG V4+  4   1


Cite this article

López-Martín, C. Machine learning techniques for software testing effort prediction. Software Qual J (2021).



  • Testing effort prediction
  • Machine learning models
  • Statistical regression