Missing Data Imputation Techniques for Software Effort Estimation: A Study of Recent Issues and Challenges

  • Ayman Jalal Hassan Almutlaq
  • Dayang N. A. Jawawi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1073)


Software effort estimation is one of the critical aspects of software engineering. It revolves around predicting the effort required to complete a software task. However, any estimation technique or model relies on input data from which it derives and predicts future values. Missing values are a common occurrence in such data in the software development industry, and they lead to inaccurate predictions or misleading results. Handling Missing Data is therefore an important aspect of effort estimation models that needs to be addressed. However, the Missing Data field is not without its own gaps and issues. This review elaborates on the recent issues and gaps in the field of missing data and software effort estimation. This may give future researchers a better understanding of the inner workings of Missing Data and of the methods through which these challenges can be addressed.


Keywords: Missing Data; Software Effort Estimation; Imputation methods; Literature review



The authors fully acknowledge Universiti Teknologi Malaysia for UTM-TDR Grant Vot No. 06G23, and Ministry of Higher Education (MOHE) for FRGS Grant Vot No. 5F117, which have made this research endeavor possible.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia (UTM), Skudai, Malaysia
