Neural Computing and Applications

Volume 27, Issue 8, pp 2229–2240

Metaheuristic optimization of multivariate adaptive regression splines for predicting the schedule of software projects

  • Angel Ferreira-Santiago
  • Cuauhtémoc López-Martín
  • Cornelio Yáñez-Márquez
Predictive Analytics Using Machine Learning


A common qualitative perception of the software industry is that it finishes its projects late and over budget, whereas from a quantitative point of view, only 39% of software projects are finished on time relative to the schedule set when the project started. This low percentage has been attributed to factors such as unrealistic time frames and poor planning rooted in inaccurate prediction. Techniques for predicting project schedule have mainly been based on expert judgment and mathematical models. In this study, a new model derived from the multivariate adaptive regression splines (MARS) model is proposed. This new model, optimized MARS (OMARS), uses simulated annealing to find a transformation of the input data space prior to applying MARS, in order to improve accuracy when predicting the schedule of software projects. The prediction accuracy of OMARS is compared to that of stand-alone MARS and of a multiple linear regression (MLR) model with a logarithmic transformation. The two independent variables used for training and testing the models are functional size, which is a composite of 19 independent variables, and the maximum size of the team of developers. The data set of projects was obtained from the International Software Benchmarking Standards Group (ISBSG) Release 11. Results based on absolute residuals and on paired t and Wilcoxon statistical tests showed that prediction accuracy with OMARS is statistically better than that of the MARS and MLR models.
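The core idea of OMARS, as described in the abstract, is to use simulated annealing to search for a transformation of the input space before fitting the regression model, with accuracy measured on absolute residuals. A minimal sketch of that idea follows; it is not the authors' implementation. As stated assumptions: simple linear regression stands in for MARS, the transformation family is a single power transform x → x^p, and all function names, step sizes, and cooling parameters are illustrative.

```python
import math
import random

def fit_linear(xs, ys):
    # Ordinary least squares for a single predictor: y = a + b*x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def mean_abs_residual(p, xs, ys):
    # Accuracy criterion based on absolute residuals, as in the paper,
    # evaluated after transforming the input by x -> x**p.
    tx = [x ** p for x in xs]
    a, b = fit_linear(tx, ys)
    return sum(abs(y - (a + b * x)) for x, y in zip(tx, ys)) / len(ys)

def anneal_transform(xs, ys, steps=2000, t0=1.0, cooling=0.995, seed=0):
    # Simulated annealing over the transform parameter p: perturb p,
    # always accept improvements, and accept worse candidates with
    # Boltzmann probability exp(-delta/t) under a geometric cooling schedule.
    rng = random.Random(seed)
    p = 1.0
    cost = mean_abs_residual(p, xs, ys)
    best_p, best_cost = p, cost
    t = t0
    for _ in range(steps):
        cand = p + rng.gauss(0.0, 0.1)
        if cand <= 0:  # keep the transform monotone on positive sizes
            continue
        c = mean_abs_residual(cand, xs, ys)
        if c < cost or rng.random() < math.exp((cost - c) / t):
            p, cost = cand, c
            if c < best_cost:
                best_p, best_cost = cand, c
        t *= cooling
    return best_p, best_cost
```

On data where the response is linear in the square root of the input, the search should settle near p = 0.5 and yield a lower mean absolute residual than the untransformed fit. The actual OMARS model applies this search to a richer transformation space and fits MARS, not a straight line, on the transformed inputs.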


Keywords: Software project schedule prediction · Multivariate adaptive regression splines · Statistical regression · Simulated annealing · ISBSG



The authors would like to thank the CUCEA of Universidad de Guadalajara, Jalisco, México, the Instituto Politécnico Nacional, and the Consejo Nacional de Ciencia y Tecnología (CONACyT) for their support during the development of this work.



Copyright information

© The Natural Computing Applications Forum 2015

Authors and Affiliations

  • Angel Ferreira-Santiago (1)
  • Cuauhtémoc López-Martín (2)
  • Cornelio Yáñez-Márquez (1)
  1. Laboratorio de Redes Neuronales y Cómputo no Convencional, Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City, Mexico
  2. Department of Information Systems, Universidad de Guadalajara, Guadalajara, Mexico
