An ensemble-based model for predicting agile software development effort

Empirical Software Engineering

Abstract

To support agile software development projects, an array of tools and systems is available to plan, design, track, and manage the development process. In this paper, we explore a critical aspect of agile development, namely effort prediction, that cuts across these tools and agile project teams. Accurate effort prediction can improve sprint planning by enabling optimal assignments of both stories and developers. We develop a model for story-effort prediction using variables that are readily available when a story is created, and we use seven predictive algorithms to predict a story’s effort. Interestingly, no single predictive algorithm consistently outperforms the others in predicting story effort across our test data of 423 stories. We therefore develop an ensemble-based method, built on our model, for predicting story effort. We conduct computational experiments to show that our ensemble-based approach performs better than other ensemble-based benchmark approaches. We then demonstrate the practical application of our predictive model and our ensemble-based approach by optimizing sprint planning for two projects from our dataset using an optimization model.

Notes

  1. We thank the anonymous reviewer for pointing this out.

  2. Dejaeger et al. (2012) identify 13 data mining techniques used for software effort estimation in a traditional setting. In this paper, we identify representative techniques (regression, decision trees, support vector machines, neural networks, Bayesian networks) that perform better than their variants and augment them with newer data mining techniques (k-nearest neighbor and ensemble approaches). For example, a radial kernel outperformed other kernels in support vector machines.

  3. Prior effort estimation studies have considered different variants of the algorithms considered in this study. Readers are referred to studies by Jørgensen and Shepperd (2007), Wen et al. (2012), and Idri et al. (2016) for a review of candidate algorithms in traditional software development projects.

  4. Log transformation of the dependent variable led to worse performance.

  5. We thank the anonymous reviewer for pointing this out.

  6. The measure has four categories: (a) negligible effect (|d| < 0.147), (b) small effect (|d| < 0.33), (c) medium effect (|d| < 0.474), and (d) large effect (|d| > 0.474).

  7. To facilitate practical interpretation, we also provide the Vargha and Delaney (2000) statistic (\(\hat {A_{12}}\)) for each pair of predictive algorithms.

  8. Increasing β leads to higher computation costs. We chose 8.0 as the maximum value based on our experimental results; values greater than 8.0 did not significantly improve the results.

References

  • Abrahamsson P, Salo O, Ronkainen J, Warsta J (2002) Agile software development methods: Review and analysis. Report, VTT

  • Abrahamsson P, Moser R, Pedrycz W, Sillitti A, Succi G (2007) Effort prediction in iterative software development processes - incremental versus global prediction models. In: International Symposium on Empirical Software Engineering and Measurement, pp 344–353

  • Abrahamsson P, Fronza I, Moser R, Vlasenko J, Pedrycz W (2011) Predicting development effort from user stories. In: International Symposium on Empirical Software Engineering and Measurement, pp 400–403

  • Aggarwal C (2015) Data Mining: The Textbook. Springer, New York

  • Azhar D, Riddle P, Mendes E, Mittas N, Angelis L (2013) Using ensembles for web effort estimation. In: ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

  • Azzeh M, Nassif AB, Minku L (2015) An empirical evaluation of ensemble adjustment methods for analogy-based effort estimation. The Journal of Systems and Software 103:36–52

  • Bayley S, Falessi D (2018) Optimizing prediction intervals by tuning random forest via meta-validation. arXiv:1801.07194

  • Beck K, Andres C (2004) Extreme Programming Explained: Embrace Change. Addison-Wesley, Reading

  • Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological) 57 (1):289–300

  • Bergmeir C, Benitez JM (2011) Forecaster performance evaluation with cross-validation and variants. In: 11th International Conference on Intelligent Systems Design and Applications (ISDA). IEEE, pp 849–854

  • Chari K, Agrawal M (2018) Impact of incorrect and new requirements on waterfall software project outcomes. Empir Softw Eng 23(1):165–185

  • Chowdhury S, Di Nardo S, Hindle A, Jiang ZMJ (2018) An exploratory study on assessing the energy impact of logging on android applications. Empir Softw Eng 23(3):1422–1456

  • Cinnéide MÓ, Moghadam IH, Harman M, Counsell S, Tratt L (2017) An experimental search-based approach to cohesion metric evaluation. Empir Softw Eng 22(1):292–329

  • Conboy K (2009) Agility from first principles: Reconstructing the concept of agility in information systems development. Inf Syst Res 20(3):329–354

  • Dejaeger K, Verbeke W, Martens D, Baesens B (2012) Data mining techniques for software effort estimation: A comparative study. IEEE Trans Softw Eng 38(2):375–97

  • Grenning J (2002) Planning poker or how to avoid analysis paralysis while release planning. Report, Hawthorn Woods: Renaissance Software Consulting

  • Hastie T, Tibshirani R, Friedman J (2008) The Elements of Statistical Learning. Springer, New York

  • Haugen NC (2006) An empirical study of using planning poker for user story estimation. In: Agile Conference, 2006. IEEE, 9 pp

  • Hearty P, Fenton N, Marquez D, Neil M (2009) Predicting project velocity in XP using a learning dynamic bayesian network model. IEEE Trans Softw Eng 35 (1):124–137

  • Hussain I, Kosseim L, Ormandjieva O (2013) Approximation of cosmic functional size to support early effort estimation in agile. Data and Knowledge Engineering 85:2–14

  • Idri A, Hosni M, Abran A (2016) Systematic literature review of ensemble effort estimation. J Syst Softw 118:151–175

  • Jahedpari F (2016) Artificial prediction markets for online prediction of continuous variables. PhD thesis, University of Bath, Bath

  • James G, Witten D, Hastie T, Tibshirani R (2015) An Introduction to Statistical Learning with Applications in R. Springer Texts in Statistics. Springer, New York

  • Jonsson L, Borg M, Broman D, Sandahl K, Eldh S, Runeson P (2016) Automated bug assignment: Ensemble-based machine learning in large scale industrial contexts. Empir Softw Eng 21(4):1533–1578

  • Jørgensen M, Shepperd M (2007) A systematic review of software development cost estimation studies. IEEE Trans Softw Eng 33(1):33–53

  • Karner G (1993) Resource estimation for objectory projects. Objective Systems SF AB, p 17

  • Kocaguneli E, Menzies T, Keung JW (2012) On the value of ensemble effort estimation. IEEE Trans Softw Eng 38(6):1403–1416

  • Kultur Y, Turhan B, Bener AB (2008) ENNA: software effort estimation using ensemble of neural networks with associative memory. In: 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering

  • Lee D (2016) Alternatives to p value: confidence interval and effect size. Korean Journal of Anesthesiology 69(6):555–562

  • Li Y, Yue T, Ali S, Zhang L (2017) Zen-ReqOptimizer: A search-based approach for requirements assignment optimization. Empir Softw Eng 22(1):175–234

  • Logue K, McDaid K, Greer D (2007) Allowing for task uncertainties and dependencies in agile release planning. In: 4th Proceedings of the Software Measurement European Forum, pp 275–284

  • Lokan C, Mendes E (2014) Investigating the use of duration-based moving windows to improve software effort prediction: A replicated study. Inf Softw Technol 56(9):1063–1075

  • MacDonell SG, Shepperd M (2010) Data accumulation and software effort prediction. In: ACM-IEEE International Symposium on Empirical Software Engineering and Measurement

  • Magazinius A, Börjesson S, Feldt R (2012) Investigating intentional distortions in software cost estimation–an exploratory study. J Syst Softw 85(8):1770–1781

  • Mahnic V, Hovelja T (2012) On using planning poker for estimating user stories. The Journal of Systems and Software 85:2086–2095

  • Minku L, Yao X (2013) Ensembles and locality: Insight on improving software effort estimation. Inf Softw Technol 55(8):1512–1528

  • Miyazaki Y, Takanou A, Nozaki H, Nakagawa N, Okada K (1991) Method to estimate parameter values in software prediction models. Inf Softw Technol 33:239–243

  • Neill J (2008) Why use effect sizes instead of significance testing in program evaluation? http://www.wilderdom.com/research/effectsizes.html, accessed: 2018-07

  • Nunes N, Constantine L, Kazman R (2011) iUCP: Estimating interactive-software project size with enhanced use-case points. IEEE Software 28(04):64–73

  • Palmer S, Felsing J (2002) A Practical Guide to Feature-driven Development. Prentice Hall, Upper Saddle River

  • Papatheocharous E, Papadopoulos H, Andreou A (2010) Feature subset selection for software cost modelling and estimation. Eng Intell Syst 18:233–246

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830

  • Pendharkar P, Subramanian G, Rodger J (2005) A probabilistic model for predicting software development effort. IEEE Trans Softw Eng 31(7):615–624

  • Perols J, Chari K, Agrawal M (2009) Information market-based decision fusion. Manag Sci 55(5):827–842

  • Pikkarainen M, Haikara J, Salo O, Abrahamsson P, Still J (2008) The impact of agile practices on communication in software development. Empir Softw Eng 13(3):303–337

  • Santana C, Leoneo F, Vasconcelos A, Gusmão C (2011) Using function points in agile projects. In: International Conference on Agile Software Development. Springer, pp 176–191

  • Schwaber K, Sutherland J (2016) The Scrum Guide (2013). Available at: http://www.scrumguides.org/docs/scrumguide/v1/scrum-guide-us.pdf (accessed 28 April 2016)

  • Shmueli G, Bruce P, Patel N (2016) Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner. Wiley, Hoboken

  • Stapleton J (1997) Dynamic systems development method. Addison-Wesley, Boston

  • Usman M, Mendes E, Weidt F, Britto R (2014) Effort estimation in agile software development: a systematic literature review. In: 10th International Conference on Predictive Models in Software Engineering, pp 82–91

  • Usman M, Mendes E, Börstler J (2015) Effort estimation in agile software development: a survey on the state of the practice. In: 19th International Conference on Evaluation and Assessment in Software Engineering

  • Vargha A, Delaney HD (2000) A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J Educ Behav Stat 25(2):101–132

  • VersionOne (2016) 10th annual state of agile report. Technical report

  • Vidgen R, Wang X (2009) Coevolving systems and the organization of agile software development. Inf Syst Res 20(3):355–376

  • Wen J, Li S, Lin Z, Hu Y, Huang C (2012) Systematic literature review of machine learning based software development effort estimation models. Inf Softw Technol 54(1):41–59

  • Wolpert DH (1992) Stacked generalization. Neural networks 5(2):241–259

Acknowledgements

This paper has benefited from the feedback received at the Workshop on Information Technology and Systems (WITS) 2014 (Auckland) and WITS 2016 (Dublin), where preliminary versions of the paper were presented. Many individuals helped us with this research project. First, we would like to thank the individuals at our data site who helped us gain access to the dataset. We would also like to thank Dr. Terry Sincich for helping us with the design and choice of statistical tests, the developers and project managers who shared their insights on the implications of this research for practice, and Dr. Patricia Nickinson for proofreading and editing our draft. Finally, we would like to express our sincere gratitude to the three anonymous reviewers and the editors for providing constructive feedback on our earlier submission.

Author information

Corresponding author

Correspondence to Onkar Malgonde.

Additional information

Communicated by: Martin Shepperd

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

The K-Nearest Neighbor (KNN) algorithm uses the k closest data points in the training data (determined by a distance metric, Euclidean in our case), where k is provided by the user, to determine its prediction. Each neighbor of a target point can influence the prediction equally (uniform weighting) or in proportion to the inverse of its distance from the target point (distance-based weighting). For our experiments, the parameter optimized was the number of neighbors (k). An exhaustive search from 2 neighbors to the maximum possible in the training dataset was used; the number of neighbors that minimized prediction error was 14. Further, we used distance-based weighting so that closer points had a higher influence than those farther away.
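
The following is a minimal sketch (not the authors' code) of such a distance-weighted KNN regressor using scikit-learn; X_train, y_train, and X_test are placeholders for story features and efforts, and the grid bounds and cross-validation folds shown are illustrative assumptions.

```python
# Hedged sketch of a distance-weighted KNN regressor with k tuned by grid search.
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

knn = KNeighborsRegressor(weights="distance", metric="euclidean")
param_grid = {"n_neighbors": range(2, 50)}   # the paper searches up to the training-set size
search = GridSearchCV(knn, param_grid, scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)                 # reported optimum: k = 14
effort_pred = search.predict(X_test)
```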

Decision trees (DT) represent a set of hierarchical decisions on the feature variables of the training data. A decision at a particular node, called a split criterion, is conditional on one or more feature variables. Different criteria can be used to measure the quality of a split; in our case, mean absolute error was used. As the depth of a tree increases, the tree overfits the training data, i.e., every instance can end up being classified by a dedicated leaf node of its own. However, such trees perform poorly on test datasets. To limit overfitting, the maximum depth of the tree was optimized and found to be 2. We considered depth values from 1 to 199.
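
A hedged scikit-learn sketch of this setup follows; the data names and cross-validation choice are assumptions rather than the authors' implementation.

```python
# Sketch of a regression tree with an MAE split criterion and a tuned maximum depth.
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

tree = DecisionTreeRegressor(criterion="absolute_error")  # "mae" in older scikit-learn
search = GridSearchCV(tree, {"max_depth": range(1, 200)},
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)   # the paper reports an optimal depth of 2
```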

Ridge regression (RR) minimizes the residual sum of squares (RSS) plus a shrinkage penalty whose weight is controlled by a tuning parameter. As the tuning parameter approaches zero, ridge regression reduces to a least squares model. As the tuning parameter increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias. The optimal value of the tuning parameter was found to be 3.51. The tuning parameter was varied from 0.0001 to 50 in increments of 0.001.
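
A minimal sketch of this tuning, assuming the shrinkage parameter corresponds to scikit-learn's alpha; X_train and y_train are placeholders.

```python
# Sketch of ridge regression with the shrinkage parameter tuned over a dense grid.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

alphas = np.arange(0.0001, 50, 0.001)        # grid reported in the paper
search = GridSearchCV(Ridge(), {"alpha": alphas},
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)                 # reported optimum: alpha = 3.51
```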

A Bayesian network (BN) learns the joint probability distribution of the target variable using a causal model that provides the prior and posterior probabilities of the variables. New information can be incorporated into the model using belief-updating procedures. Bayesian networks are popular in the machine learning and software effort prediction literature due to their accuracy (Hearty et al. 2009). We implemented the procedure in Pendharkar et al. (2005), which provides point predictions for a target story.

A Support Vector Machine (SVM) is a generalization of the maximal margin classifier, which identifies a separating hyperplane (a (p-1)-dimensional flat subspace in a p-dimensional space) that separates the observations into classes. However, this approach is limited to cases where a perfectly separating hyperplane exists. Support vector classifiers address this problem by identifying soft hyperplanes that almost separate the classes. Data points that lie on or within the margin are known as support vectors; these points determine the support vector classifier's performance. Tolerance for points on the wrong side of the hyperplane can be tuned by a parameter C; interpreted as a budget for margin violations, a greater value of C increases the tolerance for data points on the wrong side. The decision boundary can be linear or nonlinear. Support Vector Regression (SVR), a nonparametric method, uses kernel functions to identify the decision boundary. Kernel functions can be linear, polynomial, radial, or sigmoid, among others. In our experiments, the parameter C was identified as 1.9 using a Radial Basis Function (RBF) kernel whose width was identified as 0.3. We used a multi-level grid search to tune the parameters. Possible values for C were varied from 0.01 to 10, whereas values for the kernel width were varied from 0.1 to 10.
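
A hedged sketch of the corresponding SVR grid search follows; mapping the reported kernel width onto scikit-learn's gamma parameter, the grid resolution, and the data names are assumptions.

```python
# Sketch of RBF-kernel support vector regression with C and the kernel width tuned.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

param_grid = {"C": np.linspace(0.01, 10, 40),      # illustrative grid resolution
              "gamma": np.linspace(0.1, 10, 40)}   # gamma stands in for the kernel width
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)   # paper reports C = 1.9 and kernel width 0.3
```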

In an Artificial Neural Network (ANN), derived features are created from non-linear combinations of the input variables (input nodes). The target variable to be predicted, in turn, is modeled as a linear function of the derived features. The units computing the derived features are known as hidden nodes, since they are not directly visible to the user. Each hidden node transforms its inputs using a weighted linear summation followed by a non-linear activation. A regularization parameter is used to limit overfitting of the neural network to the training data. In our experiments, the regularization parameter was determined to be 0.000081. The regularization parameter was varied from 1.00E-06 to 1 in increments of 1.00E-05.
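
As an illustration only, the sketch below tunes the L2 regularization term (alpha) of scikit-learn's MLPRegressor; the hidden-layer architecture is left at library defaults, which the paper does not specify here, and a coarser logarithmic grid replaces the paper's fine linear scan to keep the sketch cheap to run.

```python
# Sketch of tuning the neural network's regularization parameter.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

alphas = np.logspace(-6, 0, 25)                   # paper scans 1e-6 to 1 in 1e-5 steps
search = GridSearchCV(MLPRegressor(max_iter=2000), {"alpha": alphas},
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)                      # reported optimum: alpha = 0.000081
```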

In a Random Forest (RF), a number of decision trees are fitted to subsamples of the training dataset. Each time a split is made within these trees, a different random subset of predictors is chosen as split candidates. This produces decorrelated trees, which reduces the overall variance. For our experiments, the random forest was optimized to 5 trees with 4 features considered when determining the best split. Possible values for the number of trees and the number of features were each varied from 1 to 20, with all possible combinations considered to identify the optimal parameter values.
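
A minimal scikit-learn sketch of this search, with placeholder data names; the grid is capped at the number of available story features so the sketch remains valid for small feature sets.

```python
# Sketch of a random forest with the number of trees and split features tuned jointly.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

n_feats = X_train.shape[1]                       # number of story features
param_grid = {"n_estimators": range(1, 21),
              "max_features": range(1, min(21, n_feats + 1))}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)   # paper reports 5 trees and 4 split features
```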

Extra Trees (ET), similar to RF, fits multiple randomized decision trees to the dataset. However, each tree is trained on the entire dataset rather than a bootstrap sample, and split thresholds are chosen at random. The number of trees was optimized to 5, with 4 features considered when determining the best split. Possible values for the number of trees and the number of features were each varied from 1 to 20, with all possible combinations considered to identify the optimal parameter values.
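
Relative to the random-forest sketch above, only the estimator class changes; the grid and data names remain illustrative assumptions.

```python
# Sketch of extremely randomized trees: no bootstrapping, random split thresholds.
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": range(1, 21),
              "max_features": range(1, min(21, X_train.shape[1] + 1))}
search = GridSearchCV(ExtraTreesRegressor(bootstrap=False, random_state=0), param_grid,
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)   # paper reports 5 trees and 4 split features
```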

Adaptive Boosting (AB) works by incrementally focusing on cases that are difficult to predict. A base estimator (in our case, a decision tree) is first fitted to the original dataset. Adjusted estimators are then fitted incrementally such that the weights of incorrectly predicted data points are increased. In our experiments, boosting was terminated after two estimators were fitted to the data. Possible values considered ranged from 1 to 30 estimators.
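
A hedged sketch of this setup; scikit-learn's AdaBoostRegressor uses a shallow decision tree as its default base estimator, and the cross-validation choice and data names are assumptions.

```python
# Sketch of AdaBoost regression with the number of boosting rounds tuned.
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(AdaBoostRegressor(random_state=0),
                      {"n_estimators": range(1, 31)},
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)   # paper reports boosting stopping after 2 estimators
```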

Gradient Boosting (GB) optimizes a least squares loss function by sequentially adding weak learners (decision trees) to an additive model. As new trees are added at each boosting stage, the existing trees are not changed. The optimal number of boosting stages was identified as 28. The range of stages considered was 1 to 60.
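
A minimal sketch of tuning the number of boosting stages in scikit-learn, with placeholder data names; the default least-squares loss is assumed to match the description above.

```python
# Sketch of gradient boosting with the number of boosting stages tuned over 1-60.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      {"n_estimators": range(1, 61)},
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)   # paper reports 28 boosting stages
```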

Stacking (ST) is an ensemble approach that aggregates individual first-level algorithms with a second-level algorithm (often known as the meta-regressor). The individual first-level algorithms provide predictions, which form the input to the meta-regressor. The five algorithms selected in our experiments formed the first-level algorithms, with their respective hyperparameters identified in the experiments. We used a Support Vector Machine with an RBF kernel as the meta-regressor.
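
The sketch below illustrates such a stacked ensemble with scikit-learn's StackingRegressor; the particular choice of five first-level algorithms is an assumption (the paper does not list them in this appendix), their hyperparameters are taken from the values reported above, and the RBF-kernel SVR serves as the meta-regressor.

```python
# Sketch of stacking: first-level predictions feed an RBF-kernel SVR meta-regressor.
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

first_level = [                                   # illustrative selection of algorithms
    ("knn",   KNeighborsRegressor(n_neighbors=14, weights="distance")),
    ("tree",  DecisionTreeRegressor(max_depth=2)),
    ("ridge", Ridge(alpha=3.51)),
    ("svr",   SVR(kernel="rbf", C=1.9, gamma=0.3)),
    ("rf",    RandomForestRegressor(n_estimators=5, max_features=4)),
]
stack = StackingRegressor(estimators=first_level,
                          final_estimator=SVR(kernel="rbf"))
stack.fit(X_train, y_train)
effort_pred = stack.predict(X_test)
```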

For an extensive discussion of each of these algorithms, readers are referred to Hastie et al. (2008), James et al. (2015), Aggarwal (2015), and Shmueli et al. (2016).

Appendix B

Tables in this appendix (Tables 24, 25, and 26) provide effect sizes with Cliff’s delta, 95% confidence intervals, and the Vargha and Delaney (2000) (\(\hat {A_{12}}\)) statistic (VDA), which compares the probability of yielding a lower absolute error for each pair of individual predictive algorithms. For each effect size computation, we consider the error values across the entire test dataset (N = 423). Reversing the order of the algorithms yields the same Cliff’s delta multiplied by -1; similarly, the confidence interval bounds are reversed with opposite signs. The VDA statistic for the reverse order of the algorithms is VDAreverse = |1 − VDAoriginal|. For example, the effect size of RR and SVM is 0.10752, the confidence interval is [0.02946, 0.18394], and the VDA statistic is 0.553759. Statistically significant pairs are highlighted. A sketch of how these statistics can be computed follows the tables below.

Table 24 Effect Sizes for Individual Predictive Algorithms (MAE)
Table 25 Effect Sizes for Individual Predictive Algorithms (MBE)
Table 26 Effect Sizes for Individual Predictive Algorithms (RMSE)
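
The sketch below (not the authors' code) shows one way to compute Cliff's delta and the \(\hat {A_{12}}\) statistic from the per-story absolute errors of two algorithms; err_a and err_b are placeholders, the confidence intervals reported in the tables are not reproduced here, and the direction convention may differ from the tables'. Note the reversal properties mentioned above follow directly: swapping the inputs negates delta and maps the VDA value to 1 minus itself.

```python
# Sketch of Cliff's delta and the Vargha-Delaney A12 from two arrays of absolute errors.
import numpy as np

def cliffs_delta_and_vda(err_a, err_b):
    a = np.asarray(err_a)[:, None]           # errors of algorithm A (column vector)
    b = np.asarray(err_b)[None, :]           # errors of algorithm B (row vector)
    greater = np.sum(a > b)                  # pairs where A's error exceeds B's
    less = np.sum(a < b)
    ties = np.sum(a == b)
    n_pairs = a.size * b.size
    delta = (greater - less) / n_pairs       # Cliff's delta, in [-1, 1]
    vda = (greater + 0.5 * ties) / n_pairs   # A12: P(A's error > B's error), ties split
    return delta, vda
```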

Appendix C

Tables in this appendix (Tables 27, 28, and 29) provide effect sizes with Cliff’s delta, 95% confidence intervals, and the Vargha and Delaney (2000) (\(\hat {A_{12}}\)) statistic (VDA) for the ensemble algorithms. For each effect size computation, we consider the error values across the entire test dataset (N = 423). Statistically significant pairs are highlighted.

Table 27 Effect Sizes for Ensemble Algorithms (MAE)
Table 28 Effect Sizes for Ensemble Algorithms (MBE)
Table 29 Effect Sizes for Ensemble Algorithms (RMSE)

Cite this article

Malgonde, O., Chari, K. An ensemble-based model for predicting agile software development effort. Empir Software Eng 24, 1017–1055 (2019). https://doi.org/10.1007/s10664-018-9647-0
