
Cost Prediction and Software Project Management

Chapter in: Software Project Management in a Changing World

Abstract

This chapter reviews the background and extent of the software project cost prediction problem. Given the importance of the topic, there has been a great deal of research activity over the past 40 years, most of which has focused on developing formal cost prediction systems. The problem is that there is presently little evidence that formal methods outperform experts; therefore, detailed consideration is given to the available empirical evidence concerning expert performance. This evidence shows that software professionals tend to be biased (towards optimism) and over-confident, and that a number of deep cognitive biases help us understand why this is so. Finally, the chapter describes how these problems might best be tackled through a range of simple, practical and evidence-based methods.


Notes

  1. There is something of a proliferation of terminology. Whilst the majority of writers refer to cost modelling or prediction, strictly speaking the usual focus is upon labour or effort, which forms the dominant part of costs and is usually the hardest to predict. Such costs may or may not be reflected in the price charged to the client or user. This chapter will use the term in this particular sense. Likewise, estimation and prediction are used interchangeably, since we are only concerned with future events.

  2. The regression model is constructed one independent variable at a time, iteratively, until no new variable significantly contributes to the model fit.

  3. A lazy learner only makes an inductive generalisation when actually presented with the new problem to solve. This can be advantageous when trying to learn in the face of noisy training cases and much uncertainty.

  4. Essentially, the point is that when conducting a significance test for a hypothesis, there are two dangers: one can wrongly reject the null hypothesis or wrongly fail to reject it. It is customary to set the chance of wrongly rejecting the null hypothesis (denoted by α) at 0.05. However, if many tests are performed, the probability of committing at least one such error grows with the number of tests. For this reason, the α threshold needs to be reduced to take this danger into account (see the short sketch after these notes).

  5. The bibliographic database can be found at www.simula.no/BESTweb
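
As a concrete illustration of Note 4 (a minimal sketch, not part of the chapter; the choice of k values is arbitrary), the following Python fragment shows how the family-wise error rate \( 1-(1-\alpha)^k \) grows with the number of independent tests k, and the Bonferroni-corrected per-test threshold \( \alpha/k \) that compensates:

    # Family-wise error rate (FWER): the probability of at least one
    # false rejection across k independent tests, each at level alpha.
    alpha = 0.05
    for k in (1, 5, 10, 20):
        fwer = 1 - (1 - alpha) ** k   # grows quickly with k
        per_test = alpha / k          # Bonferroni-corrected threshold
        print(f"k={k:2d}  FWER={fwer:.3f}  corrected alpha={per_test:.4f}")
    # At k=10 the chance of at least one spurious 'significant' result
    # is already about 0.40 unless the per-test threshold is reduced.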

References

  • Abran A, Bourque P (2004) SWEBOK: guide to the software engineering body of knowledge. IEEE Computer Society

  • Albrecht AJ, Gaffney JR (1983) Software function, source lines of code, and development effort prediction: a software science validation. IEEE Trans Softw Eng 9:639–648

  • Argyris C, Schön D (1996) Organizational learning II: theory, method and practice. Addison-Wesley, Reading, MA

  • Armstrong S (2007) Significance tests harm progress in forecasting. Int J Forecast 23:321–327

  • Benington HD (1956) Production of large computer programs. In: Symposium on advanced computer programs for digital computers, ACR-15

  • Boehm BW (1981) Software engineering economics. Prentice-Hall, Englewood Cliffs, NJ

  • Boehm BW (1984) Software engineering economics. IEEE Trans Softw Eng 10:4–21

  • Boehm B, Abts C, Brown W, Chulani S, Clark BK, Horowitz E, Madachy R, Reifer D, Steece B (2000) Software cost estimation with COCOMO II. Pearson/Prentice Hall, Englewood Cliffs, NJ

  • Borkowski JG, Carr M, Pressley M (1987) Spontaneous strategy use: perspectives from metacognitive theory. Intelligence 11:61–75

  • Buehler R, Griffin D (2003) Planning, personality, and prediction: the role of future focus in optimistic time predictions. Organ Behav Hum Decis Process 92:80–90

  • Buehler R, Griffin D, Ross M (1994) Exploring the “Planning Fallacy”: why people underestimate their task completion times. J Pers Soc Psychol 67:366–381

  • Clark J, Dolado JJ, Harman M, Hierons RM, Jones B, Lumkin M, Mitchell B, Mancoridis S, Rees K, Roper M, Shepperd M (2003) Reformulating software engineering as a search problem. IEE Proc Softw 150:161–175

  • Coutinho SA (2007) The relationship between goals, metacognition, and academic success. Educate 7:39–47

  • Cuelenaere A, van Genuchten M, Heemstra F (1987) Calibrating a software cost estimation model: why and how. Inf Softw Technol 29:558–567

  • Dawson TL (2008) Metacognition and learning in adulthood. Developmental Testing Service LLC, Northampton, MA

  • DeMarco T (1982) Controlling software projects: management, measurement and estimation. Yourdon Press, New York

  • El Emam K, Koru G (2008) A replicated survey of IT software project failures. IEEE Softw 25:84–90

  • Ellis P (2010) The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press, Cambridge

  • Fishman G (1996) Monte Carlo: concepts, algorithms, and applications. Springer, New York

  • Flavell JH (1979) Metacognition and cognitive monitoring: a new area of cognitive-developmental inquiry. Am Psychol 34:906–911

  • Flyvbjerg B (2008) Curbing optimism bias and strategic misrepresentation in planning: reference class forecasting in practice. Eur Plan Stud 16:3–32

  • Flyvbjerg B, Bruzelius N, Rothengatter W (2003) Megaprojects and risk: an anatomy of ambition. Cambridge University Press, Cambridge

  • Foss T, Stensrud E, Kitchenham B, Myrtveit I (2003) A simulation study of the model evaluation criterion MMRE. IEEE Trans Softw Eng 29:985–995

  • Griffin D, Buehler R (1999) Frequency, probability, and prediction: easy solutions to cognitive illusions? Cogn Psychol 38:48–78

  • Grimstad S, Jørgensen M, Moløkken-Østvold K (2006) Software effort estimation terminology: the tower of Babel. Inf Softw Technol 48:302–310

  • Gulezian R (1991) Reformulating and calibrating COCOMO. J Syst Softw 16:235–242

  • Heemstra FJ (1992) Software cost estimation. Inf Softw Technol 34:627–639

  • Hughes RT (1996) Expert judgement as an estimating method. Inf Softw Technol 38:67–75

  • Hughes RT, Cotterell M (2009) Software project management. McGraw-Hill, London

  • Humphrey W (2000) Introducing the personal software process. Ann Softw Eng 1:311–325

  • Jeffery DR, Low GC (1990) Calibrating estimation tools for software development. Softw Eng J 5:215–221

  • Jørgensen M (2004) A review of studies on expert estimation of software development effort. J Syst Softw 70:37–60

  • Jørgensen M (2010) Identification of more risks can lead to increased over-optimism of and over-confidence in software development effort estimates. Inf Softw Technol 52:506–516

  • Jørgensen M, Grimstad S (2012) Software development estimation biases: the role of interdependence. IEEE Trans Softw Eng 38:677–693

  • Jørgensen M, Gruschke T (2005) Industrial use of formal software cost estimation models: expert estimation in disguise? In: Proceedings of EASE, Keele, UK

  • Jørgensen M, Gruschke T (2009) The impact of lessons-learned sessions on effort estimation and uncertainty assessments. IEEE Trans Softw Eng 35:368–383

  • Jørgensen M, Moløkken-Østvold K (2006) How large are software cost overruns? A review of the 1994 CHAOS report. Inf Softw Technol 48:297–301

  • Jørgensen M, Shepperd M (2007) A systematic review of software development cost estimation studies. IEEE Trans Softw Eng 33:33–53

  • Jørgensen M, Sjøberg DIK (2003) An effort prediction interval approach based on the empirical distribution of previous estimation accuracy. Inf Softw Technol 45:123–136

  • Jørgensen M, Teigen KH, Moløkken K (2004) Better sure than safe? Overconfidence in judgment based software development effort prediction intervals. J Syst Softw 70:79–93

  • Kahneman D, Tversky A (1979) Intuitive prediction: biases and corrective procedures. TIMS Stud Manag Sci 12:313–327

  • Kahneman D, Fredrickson B, Schreiber C, Redelmeier D (1993) When more pain is preferred to less: adding a better end. Psychol Sci 4:401–405

  • Kemerer CF (1987) An empirical validation of software cost estimation models. Commun ACM 30:416–429

  • Keung J, Kitchenham B, Jeffery R (2008) Analogy-X: providing statistical inference to analogy-based software cost estimation. IEEE Trans Softw Eng 34:471–484

  • Kitchenham BA (2002) The question of scale economies in software: why cannot researchers agree? Inf Softw Technol 44:13–24

  • Kitchenham BA, Kansala K (1993) Inter-item correlations among function points. In: 1st international symposium on software metrics. IEEE Computer Society Press, Baltimore, MD

  • Kitchenham BA, Linkman SG (1997) Estimates, uncertainty and risk. IEEE Softw 14:69–74

  • Kitchenham BA, MacDonell SG, Pickard L, Shepperd MJ (2001) What accuracy statistics really measure. IEE Proc Softw 148:81–85

  • Kitchenham BA, Pfleeger SL, McColl B, Eagan S (2002) An empirical study of maintenance and development estimation accuracy. J Syst Softw 64:57–77

  • Kitchenham B, Mendes E, Travassos G (2007) Cross versus within-company cost estimation studies: a systematic review. IEEE Trans Softw Eng 33:316–329

  • Kocaguneli E, Menzies T, Hihn J, Kang H (2012a) Size doesn’t matter? On the value of software size features for effort estimation. In: Proceedings of the 8th international conference on predictive models in software engineering, New York

  • Kocaguneli E, Menzies T, Keung J (2012b) On the value of ensemble effort estimation. IEEE Trans Softw Eng 38:1403–1416

  • Kolodner JL (1993) Case-based reasoning. Morgan-Kaufmann, San Mateo, CA

  • Lederer A, Mendelow A (1999) The impact of the environment on the management of information systems. Inf Syst Res 1:205–222

  • Liu Q, Mintram R (2005) Preliminary data analysis methods in software estimation. Softw Qual J 13:91–115

  • MacDonell S, Shepperd M (2003a) Using prior-phase effort records for re-estimation during software projects. In: 9th IEEE international metrics symposium

  • MacDonell S, Shepperd M (2003b) Combining techniques to optimize effort predictions in software project management. J Syst Softw 66:91–98

  • MacDonell S, Shepperd MJ (2007) Comparing local and global software effort estimation models: reflections on a systematic review. In: 1st international symposium on empirical software engineering and measurement, Madrid

  • Mair C, Shepperd M (2005) The consistency of empirical comparisons of regression and analogy-based software project cost prediction. In: 4th international symposium on empirical software engineering (ISESE), Noosa Heads, Australia

  • Mair C, Kadoda G, Lefley M, Keith P, Schofield C, Shepperd M, Webster S (2000) An investigation of machine learning based prediction systems. J Syst Softw 53:23–29

  • Mair C, Martincova M, Shepperd M (2009) A literature review of expert problem solving using analogy. In: 13th international conference on evaluation and assessment in software engineering (EASE), British Computer Society, Swinton, UK

  • Menzies T, Jalili M, Hihn J, Baker D, Lum K (2010) Stable rankings for different effort models. Autom Softw Eng 17:409–437

  • Menzies T, Butcher A, Cok D, Marcus A, Layman L, Shull F, Turhan B, Zimmermann T (2013) Local versus global lessons for defect prediction and effort estimation. IEEE Trans Softw Eng 39:822–834

  • Minku L, Yao X (2013) Ensembles and locality: insight on improving software effort estimation. Inf Softw Technol 55:1512–1528

  • Mittas N, Angelis L (2013) Ranking and clustering software cost estimation models through a multiple comparisons algorithm. IEEE Trans Softw Eng 39:537–551

  • Moløkken K, Jørgensen M (2004) Group processes in software effort estimation. Empir Softw Eng 9:315–334

  • Moon J (1999) Reflection in learning and professional development: theory and practice. Kogan Page, London

  • Myrtveit I, Stensrud E (1999) A controlled experiment to assess the benefits of estimating with analogy and regression models. IEEE Trans Softw Eng 25:510–525

  • Passing U, Shepperd M (2003) An experiment on software project size and effort estimation. In: ACM-IEEE international symposium on empirical software engineering (ISESE 2003)

  • Riaz M, Mendes E, Tempero E (2009) A systematic review of software maintainability prediction and metrics. In: 3rd international symposium on empirical software engineering and measurement. ACM Computer Press, pp 367–377

  • Ridley D, Schutz P, Glanz R, Weinstein C (1992) Self-regulated learning: the interactive influence of metacognitive awareness and goal-setting. J Exp Educ 60:293–306

  • Saltelli A, Tarantola S, Campolongo F (2000) Sensitivity analysis as an ingredient of modeling. Stat Sci 15:377–395

  • Schön DA (1983) The reflective practitioner. Basic Books, New York

  • Shepperd M (2003) Case-based reasoning and software engineering. In: Aurum A, Jeffery R, Wohlin C, Handzic M (eds) Managing software engineering knowledge. Springer, Berlin

  • Shepperd MJ, Kadoda G (2001) Comparing software prediction techniques using simulation. IEEE Trans Softw Eng 27:987–998

  • Shepperd M, MacDonell S (2012) Evaluating prediction systems in software project estimation. Inf Softw Technol 54:820–827

  • Shepperd MJ, Schofield C (1997) Estimating software project effort using analogies. IEEE Trans Softw Eng 23:736–743

  • Sommerville I (2010) Software engineering. Pearson, Hemel Hempstead, UK

  • Song Q, Shepperd M (2007) Missing data imputation techniques. Int J Bus Intell Data Mining 2:261–291

  • Song Q, Shepperd M (2011) Predicting software project effort: a grey relational analysis based method. Expert Syst Appl 38:7302–7316

  • Strike K, El Emam K, Madhavji N (2001) Software cost estimation with incomplete data. IEEE Trans Softw Eng 27:890–908

  • Symons CR (1988) Function point analysis: difficulties and improvements. IEEE Trans Softw Eng 14:2–11

  • Taff LM, Borchering JW, Hudgins WR (1991) Estimeetings: development estimates and a front-end process for a large project. IEEE Trans Softw Eng 17:839–849

  • Tversky A, Kahneman D (1974) Judgment under uncertainty: heuristics and biases. Science 185:1124–1131

  • Wagner S (2007) An approach to global sensitivity analysis: FAST on COCOMO. In: 1st international symposium on empirical software engineering and measurement (ESEM 2007). IEEE Computer Society, pp 440–442

  • Whitfield D (2007) Cost overruns, delays and terminations: 105 outsourced public sector ICT contracts. The European Services Strategy Unit

  • Willis R (1985) Invited review: critical path analysis and resource constrained project scheduling: theory and practice. Eur J Oper Res 21:149–155

  • Witten I, Frank E, Hall M (2011) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Burlington, MA

  • Yang Y, He Z, Mao K, Li Q, Nguyen V, Boehm B, Valerdi R (2013) Analyzing and handling local bias for calibrating parametric cost estimation models. Inf Softw Technol 55:1496–1511

Author information

Correspondence to Martin Shepperd.

Glossary

Absolute residuals

a simple and robust means of assessing the predictive accuracy of a prediction system. It is defined simply as: \( \left|{y}_i-{\widehat{y}}_i\right| \) where \( y_i \) is the true value for the ith project and \( {\widehat{y}}_i \) the estimated value. This gives the error, irrespective of direction, i.e., an under- or over-estimate. The mean residual (keeping the direction of error) gives a measure of the degree of bias (see the sketch after the MMRE entry below).

Cognitive bias

these are patterns of thinking about problem solving or decision-making that distort judgement and lead people to ‘sub-optimal’ choices. Because of the ubiquity of many such biases, they are classified and named, e.g., the anchoring bias. See the pioneering work of Tversky and Kahneman (1974).

Double loop learning

this differs from ordinary or single-loop learning in that one not only observes the effects of the process, but also understands the external factors that influence the effects. This was initially promoted by Argyris and Schön as a way of promoting effective organisational behaviour (Argyris and Schön 1996).

Estimation by Analogy (EBA)

uses some form of case-based reasoning where a new or target case which is to be solved is plotted in feature space (one dimension per feature) and some distance metric is used to determine past proximal cases from which a solution can be derived. For a general account of CBR see the pioneering work by Kolodner (1993) and for its application to software engineering see Shepperd (2003).
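
A minimal sketch of the idea in Python (illustrative only; the feature values, the normalisation scheme and the choice of k = 2 are assumptions, not the chapter's method). The target project's effort is estimated as the mean effort of its k nearest past projects in normalised feature space:

    import math

    # Historical projects: (features, actual effort in person-hours).
    # Feature values here are purely illustrative.
    history = [
        ((120.0, 4.0), 2300.0),   # (size in function points, team size)
        ((300.0, 9.0), 7100.0),
        ((150.0, 5.0), 2900.0),
        ((80.0,  3.0), 1400.0),
    ]

    def normalise(value, values):
        lo, hi = min(values), max(values)
        return (value - lo) / (hi - lo) if hi > lo else 0.0

    def estimate_by_analogy(target, history, k=2):
        # Normalise each feature dimension so no single feature dominates
        # the Euclidean distance calculation.
        dims = list(zip(*[f for f, _ in history]))
        def distance(case):
            return math.sqrt(sum(
                (normalise(t, dims[i]) - normalise(c, dims[i])) ** 2
                for i, (t, c) in enumerate(zip(target, case))))
        nearest = sorted(history, key=lambda h: distance(h[0]))[:k]
        # The solution is derived from the proximal cases: here, mean effort.
        return sum(e for _, e in nearest) / k

    print(estimate_by_analogy((140.0, 5.0), history))  # ~2600 person-hours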

Expert Judgement

this is something of a catch-all description for a range of informal approaches to estimation. Jørgensen describes it as ranging from ‘unaided intuition (“gut feeling”) to expert judgment supported by historical data, process guidelines and checklists (“structured estimation”)’ (Jørgensen 2004). Despite being a widespread estimation approach, it can be criticised because its reasoning is not open to scrutiny: the reasoning process is ‘non-recoverable’ (Jørgensen 2004), not repeatable and not easily transferable from existing experts to others.

Formal prediction system

or formal model for cost prediction, is characterised by repeatability, so that different individuals applying the same inputs should generate the same outputs. (The exception is prediction systems based on stochastic search [also see Chap. 15 on search-based project management], where this will tend to be true over time (Clark et al. 2003), but not necessarily for a single utilisation.) Examples of formal systems range from simple algorithmic models, such as COCOMO, to complex ensembles of learners.
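
To make repeatability concrete, here is a minimal sketch of the basic COCOMO 81 effort equation in its organic mode (coefficients 2.4 and 1.05 as published by Boehm 1981); the calculation is deterministic, so any user supplying the same input obtains the same output:

    # Basic COCOMO 81, organic mode (Boehm 1981):
    # effort (person-months) = 2.4 * KLOC^1.05
    def cocomo_organic_effort(kloc: float) -> float:
        return 2.4 * kloc ** 1.05

    print(round(cocomo_organic_effort(32), 1))  # 32 KLOC -> ~91.3 person-months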

Machine Learning

this is a branch of applied artificial intelligence based on inducing prediction systems from historical data, i.e., reasoning from the particular to the general. There are a wide range of approaches including neural networks, case-based reasoning, rule induction, Bayesian methods, support vector machines and population search methods such as genetic programming. Standard textbooks that provide overviews of these techniques include Witten et al. (2011).
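
As a minimal sketch of induction from historical data (the data values are invented, and scikit-learn is merely one convenient library, not one prescribed by the chapter):

    from sklearn.tree import DecisionTreeRegressor

    # Historical projects: features (size in function points, team size)
    # and the actual effort (person-hours). Values are illustrative only.
    X = [[120, 4], [300, 9], [150, 5], [80, 3]]
    y = [2300, 7100, 2900, 1400]

    model = DecisionTreeRegressor(max_depth=2).fit(X, y)  # induce from history
    print(model.predict([[140, 5]]))  # generalise to a new project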

Mean magnitude of relative error (MMRE)

this is a widely used, although now heavily criticized (Kitchenham et al. 2001; Foss et al. 2003; Shepperd and MacDonell 2012), measure of predictive accuracy defined as: \( \mathrm{MMRE}=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{x_i-{\widehat{x}}_i}{x_i}\right| \) where \( x_i \) is the true cost for the ith project, \( {\widehat{x}}_i \) is the estimated cost and n the total number of projects.
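
A minimal Python sketch (with invented actual and estimated values) computing the absolute residuals, the mean residual as a measure of bias, and MMRE:

    actual    = [2300.0, 7100.0, 2900.0, 1400.0]   # true effort, person-hours
    estimated = [2000.0, 5500.0, 2600.0, 1500.0]   # predicted effort

    # Absolute residuals: |y_i - y_hat_i|, error irrespective of direction.
    abs_residuals = [abs(y - yh) for y, yh in zip(actual, estimated)]

    # Mean residual (direction kept) indicates bias; positive values here
    # mean under-estimation, i.e., over-optimism.
    bias = sum(y - yh for y, yh in zip(actual, estimated)) / len(actual)

    # MMRE: mean of |x_i - x_hat_i| / x_i over the n projects.
    mmre = sum(abs(y - yh) / y for y, yh in zip(actual, estimated)) / len(actual)

    print(abs_residuals, bias, round(mmre, 3))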

Metacognition

this refers to ‘thinking about thinking’ (Flavell 1979) and is an awareness and monitoring of one’s thoughts and performance. It encompasses the ability to consciously control the cognitive processes involved in learning such as planning, strategy selection, monitoring and evaluating progress towards a particular goal and adapting strategies as, and when, necessary to reach that goal (Ridley et al. 1992).

Over-confidence

refers to the tendency of an estimator to value precision over accuracy. Typically, one might express confidence in an estimate as the likelihood that the true value falls within a specified interval. For example, stating that one is 80 % confident that the actual effort will fall within the range 1,000–1,200 person-hours implies that this will occur 8 out of 10 times. If the true value falls into the range less frequently this implies over-confidence. Jørgensen et al. (2004) reported that over-confidence was a widespread phenomenon and that at least one contributor was the fact that managers often interpret wide intervals as conveying a lack of knowledge and prefer narrow but less accurate estimates.
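
A minimal sketch (with invented interval data) of how calibration can be checked by comparing the stated confidence level with the observed hit rate:

    # Each tuple: (stated 80% interval lo, hi, actual effort in person-hours).
    # Illustrative data only.
    intervals = [
        (1000, 1200, 1350),
        (400,  500,  480),
        (2000, 2600, 3100),
        (700,  900,  1050),
        (1500, 1800, 1700),
    ]

    hits = sum(lo <= actual <= hi for lo, hi, actual in intervals)
    hit_rate = hits / len(intervals)

    # For well-calibrated 80% intervals the hit rate should approach 0.8;
    # here it is 0.4, i.e., over-confidence (intervals too narrow).
    print(hit_rate)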

Over-optimism

refers to the situation where the estimation error is biased towards an under-estimate. Many studies indicate that this is the norm in the software industry with a figure of 30 % being seen as typical (Jørgensen 2004).

Prediction

whilst ‘prediction’ and ‘estimation’ are often used interchangeably, we use ‘prediction’ to mean a forecast or projection, and ‘estimate’ to connote a guess or rough and ready calculation.

Single-loop learning

Argyris and Schön (1996) characterise this as focusing on restrictive feedback, so that the individual or organisation only endeavours to improve a single metric without external reflection upon the process (which would constitute double-loop learning).


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Shepperd, M. (2014). Cost Prediction and Software Project Management. In: Ruhe, G., Wohlin, C. (eds) Software Project Management in a Changing World. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55035-5_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-55035-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55034-8

  • Online ISBN: 978-3-642-55035-5

  • eBook Packages: Computer Science, Computer Science (R0)
