3.1 Precisely Wrong or Roughly Right?

A project manager states that a project will require 432 hours. Your friend sends you a text message saying that he will be at your place in 12 minutes. The precision of these time predictions is most likely misleading when interpreted according to the rules of significant digits, where the position of the last significant digit indicates the intended accuracy. By those rules, predictions of 432 hours and 12 minutes imply prediction errors of only ±1 hour and ±1 minute, which is not very likely for most types of project or arrival time predictions.

Although time predictions given with high precision are wrong most of the time, they are often the type of time predictions we like to give and receive. We may react negatively if the car service responds with ‘repairing your car will take between one and 10 hours’ or ‘it is 70% likely that the repair will take less than six hours’. Even if we know that the prediction of one to 10 hours reflects the actual uncertainty of the work, we may easily think that the car service is incompetent or that they are not interested in doing the work. Why is this so? One reason is that we use precision as an indicator of competence and that we perceive time predictions with many trailing zeros or wide intervals as less informative and those who present them as less competent [1]. With this in mind, it is not surprising, although unfortunate, that many prefer to be precisely wrong rather than roughly right in their time predictions.

A similar observation is the basis of the preciseness paradox [2]. This paradox refers to the observation that we sometimes have more confidence in precise time predictions, such as ‘it will take four hours’, than in time predictions that take uncertainty into account, such as ‘it will take between two and 20 hours’. The latter statement is more likely to be correct than the former. Still, the former may seem more believable.

Higher precision is not always rewarded, as illustrated in a recent experiment with software professionals [3]. The software professionals evaluated the relative trustworthiness of four different hypothetical time predictions, along with evaluations of the relative competence of the persons who made them. The time predictions and the evaluations are presented in Table 3.1, where the percentages are the proportions of the software professionals’ responses per response category.

Table 3.1 Percentage of participants ranking software developers as the most competent, least competent, most trustworthy, and least trustworthy

Developer A’s time prediction (1020 work hours) is the least likely to be correct. Developer D’s time prediction (between 500 and 1500 work hours) is the most likely to be correct. It is reasonable to assume that developer B’s time prediction has been rounded to become 1000 work hours and that developer B, for example, believes the time usage to be between 900 and 1100 work hours, that is, about the same accuracy as the interval predicted by developer C.

In this situation, the very precise time prediction of developer A was not rewarded. The software professionals did not find a time prediction of 1020 work hours believable, and 49% of them ranked it as the least trustworthy time prediction. Developer A’s competence was evaluated to be the lowest by 31%. The respondents were, on the other hand, not impressed by the wide time prediction interval (500–1500 work hours) of developer D either. Developer D’s time prediction was ranked as the least trustworthy by 36% of the respondents, and that developer’s competence was ranked lowest by 55%. To be roughly right with a wide interval consequently seems to be a poor strategy if the goal is to make people believe in your time predictions and your competence. Instead, it would be better to act as developer C and give a narrow time prediction interval. Developer C was ranked as having the most trustworthy time prediction by 74% of the respondents and as being the most competent developer by 70%. Seemingly, the question of whether one should be precisely wrong or roughly right, in terms of being interpreted as competent, is more complex than we initially thought. One can be too precise as well as too imprecise. Interestingly, developer B (1000 work hours), who may have held accuracy beliefs similar to those of developer C (who indicated an interval of 900–1100 work hours), was not ranked favourably.

Strictly speaking, none of the time predictions in Table 3.1 are particularly informative regarding the uncertainty of the work. The point-based time predictions (1020 and 1000 work hours) have no explicit information about uncertainty and the interval-based time predictions (900–1100 and 500–1500 work hours) do not specify the probability of the actual time usage being within these intervals. Is it 99% likely, 90%, 80%, or only 50% likely that the intervals will include the actual time usage? In the next sections, we will discuss how to make, interpret, and communicate time predictions, including their uncertainty, in more meaningful ways.

  • Take home message 1: When evaluating time predictions and the people producing them, greater precision is often used as an indication of greater trustworthiness and higher competence, especially in the form of narrow time prediction intervals. This happens even though such time predictions are less likely to be correct.

  • Take home message 2: Although people prefer precise predictions, overly precise predictions can be negatively evaluated, leading to assessments of low trustworthiness and competence, at least when assessed by people with competence in the domain.

3.2 Communication of Time Predictions

Assume that the car service tells you that the repair will be finished in four and a half hours. What does this information mean? Is the prediction meant to be a best-case prediction, assuming that the work is done by the most skilled service professional and with no unexpected problems? Is it a promise based on a worst-case time prediction? And if it is a worst-case prediction, how sure can you be that the car is actually finished in four and a half hours?

If you really need your car back in four and a half hours, the meaning of the time prediction will matter a lot. Whether the prediction was based on a best-case scenario or on experience documenting that 99% of car repairs in similar situations required less than four and a half hours will make an important difference. If the estimate was a best-case prediction, you should have a backup plan, while a 99% likelihood prediction based on past repairs should make you sufficiently confident that the car will be at your disposal when you need it.

Most people do not, as far as we have observed in various domains, explain what they mean by their time predictions. We cannot even assume that people in the same context and with similar backgrounds mean the same thing with their time predictions. We once asked software developers to give their time predictions for completing a programming task [4]. Immediately afterward, we asked them how we should interpret their time predictions. When summarizing their responses, we used the category ideal effort if a time prediction was based on the assumption of no unexpected problems, most likely effort if a time prediction was what they thought was the most likely outcome, median effort if a time prediction was what they thought was about 50% likely not to be exceeded, and risk-averse effort if a time prediction was considered very likely to be sufficient to complete the work.

In spite of the same time prediction instructions and the same prediction task, the meanings of the predictions differed greatly (see Table 3.2). In addition, a large proportion of the software developers, all of whom regularly produced and communicated time predictions, openly admitted that they did not really know what they meant by their time predictions.

Table 3.2 What do software developers mean when communicating a time prediction?

We have conducted several studies of this type in various contexts and they all show great variety in what is meant by a time prediction. This was the case even within a homogeneous context, such as within a single company. The studies also confirm that the meaning is usually not communicated by those producing the time prediction and that those receiving the predictions rarely request such information. Requesting a time prediction without stating precisely what is wanted could lead to time predictions representing anything from best-case to risk-averse thinking. Much of what seem to be time prediction errors and unrealistic plans may simply be the consequences of poor communication of the meaning of the predictions.

Sometimes people try to explain what their time prediction means by including verbal probabilities or qualifiers, such as ‘very likely to take less than four hours’, ‘possible to be completed in two days’, ‘will take about three hours’, and ‘can take 10 hours’. Such phrases are not only vague with a strongly context-dependent meaning [5], but also frequently misunderstood. The time prediction ‘it can be finished in five days’ is, for example, likely to be understood differently by the person communicating it and the person receiving it. The person communicating it tends to think that five days is an extreme outcome, in this case, perhaps the best-case outcome. The person who receives the time prediction will, on the other hand, tend to interpret it as a likely outcome [6]. While using the word can ensures that you are never wrong (you never claimed that it was certain or even likely that the job would be finished in five days), it is certainly not a precise way of communicating time predictions. Similar interpretation challenges accompany the use of more than and less than. A task predicted to take more than 10 hours may, for example, be interpreted as larger than one predicted to take less than 20 hours [7]. The use of verbal probabilities and qualifiers, in spite of their frequent use in professional and daily life, turns out not to be very helpful when communicating time predictions.

In some cases, we can make a good guess, perhaps based on the context and previous experience, of what is meant by a time prediction. If someone says, ‘I’ll be there in 5 minutes’, our previous experience with that person may tell us that this is a best-case time prediction. If nothing goes wrong, the person will be there in five to 15 minutes; otherwise, it may take much more time. In cases in which we have little experience, the lack of explanation of what is meant by a time prediction may be quite unfortunate and lead to frustration and poor decisions. One way to give time predictions more meaning and to communicate that meaning is through the use of probabilities and distributions. This is the topic of the next section.

  • Take home message 1: It is often not clear what people mean when they give a time prediction. The meaning varies greatly and is sometimes not even clear to those who made the time prediction.

  • Take home message 2: Not explaining what is meant by a time prediction and not asking for an explanation of its intended meaning may lead to misunderstandings and unrealistic plans.

3.3 Probability-Based Time Predictions

The time usage to complete a task may be predicted and given meaning through the use of a frequency distribution of the actual time usage of similar tasks on previous occasions. Assume, for example, that driving your car to work from home usually takes about 30 minutes. It may take a bit less, a bit more, or much more time if there is a great deal of traffic or an accident blocking the road. Let us say that the frequency distribution of driving times, when starting from home between 8 a.m. and 9 a.m., based on 1000 observations, is as shown in Fig. 3.1.

Fig. 3.1
figure 1

Frequency distribution of driving times between home and work

The distribution in Fig. 3.1 tells us that approximately 30 minutes is the most likely driving time, with 80 observations. This value is called the mode in statistics.Footnote 1 In many cases, the most likely time usage is good enough as a time prediction. If, on the other hand, we need to be fairly sure of being on time, the most likely time usage may not be very helpful. The distribution in Fig. 3.1 tells us that we will use 30 minutes or less only about 30% of the time, that is, only 30% of the observations of previous driving times are on the left side of 30 minutes in our distribution. If the past is a reliable indicator of future time usage, this means that it is only 30% likely that 30 minutes will be enough. To be quite sure to be on time, say, 90% sure, we should draw a vertical line in Fig. 3.1 so that 90% of the observations are on the left side. For the distribution in Fig. 3.1, this corresponds to a value of 55 minutes. We would then, for example, have to leave home at 7:05 a.m. to be 90% sure of arriving before 8:00 a.m. To be 99% sure of being on time, we would need an even higher value from the distribution of past time usage. In Fig. 3.1, 99% certainty corresponds to about 70 minutes. In other words, we would have to leave home at 6:50 a.m. to be 99% sure of arriving before 8:00 a.m. The distribution in Fig. 3.1 also illustrates our previous point about how meaningless it can be to talk about a time prediction without stating what it means.

To communicate the meaning of time predictions, we can use probabilities and distributions in several ways:

  • We may present the full distribution of possible time outcomes, that is, the full distribution of Fig. 3.1. The receiver of the time prediction may then use the value that best reflects his or her time prediction needs.

  • We may present a two-sided time prediction interval. A two-sided time prediction interval is a minimum–maximum interval together with the probability that the actual time usage will be inside the interval. We could, for example, give the 90% prediction interval of 23–60 minutes, because 90% of the observations for past time usage in Fig. 3.1 are more than 23 minutes and less than 60 minutes.

  • We may present a one-sided prediction interval. Using the observations in the distribution in Fig. 3.1, we may predict that it is 50% likely that we will spend less than 37 minutes or that we are 90% confident of spending less than 55 minutes.

One-sided prediction intervals are sometimes called pX predictions. A pX prediction of Y hours means that we think that using Y hours or less is X% likely. Time usage pX predictions are used for project evaluation and management in several domains. When used for project management purposes, the p50 prediction may be used for planning and the p85 prediction for budgeting purposes, meaning that 50% of projects are not expected to exceed the planned use of time and the budget is expected to be sufficiently large 85% of the time. A published evaluation report suggests that the implementation of these two types of pX predictions and associated uncertainty assessment methods has a positive impact on the realism of project time and cost predictions.Footnote 2
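Both interval types can be read directly off a sample of past time usages as percentiles. The sketch below uses a hypothetical lognormal sample standing in for the Fig. 3.1 data (the actual observations are not reproduced here), so the exact numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical right-skewed driving times in minutes, standing in for
# the observations behind Fig. 3.1 (median tuned to roughly 37 minutes).
times = rng.lognormal(mean=np.log(37), sigma=0.35, size=1000)

# Two-sided 90% prediction interval: the 5th and 95th percentiles.
low, high = np.percentile(times, [5, 95])
print(f"90% two-sided interval: {low:.0f}-{high:.0f} minutes")

# One-sided pX predictions: p50 and p90.
p50, p90 = np.percentile(times, [50, 90])
print(f"p50: {p50:.0f} minutes (50% chance of needing less)")
print(f"p90: {p90:.0f} minutes (90% chance of needing less)")
```

With real data, `times` would simply hold the recorded durations; the percentile calls are unchanged.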

A time prediction can be any value of the outcome distribution as long as we explain what is meant. Three values of the probability distribution are, however, of special interest for time predictions: the mode (the most likely value), the median (the middle observations, or p50 prediction), and the mean (the expected value).

The most likely use of time is usually easy to identify from the distribution, since it is the point or interval with the highest frequency of occurrence. The most likely use of time is the value we would choose if we tried to maximize the likelihood of very accurate time predictions. Using the data in Fig. 3.1, we would find that a time prediction based on the most likely value (30 minutes) would be within ±5 minutes of the actual time in 36% of the cases. The corresponding proportion of time predictions within ±5 minutes would be 30% when using the median (37 minutes) as our time prediction and 27% when using the mean (40 minutes). The drawback is that, by maximizing the likelihood of very accurate predictions, we may harm our other time prediction goals. The median and mean values have properties that often make them more suitable as time prediction values.

The median use of time in Fig. 3.1 is 37 minutes. The median is the value we would choose if we were trying to minimize the mean deviation between the predicted and the actual time usage. Using the data in Fig. 3.1, we would have a mean time prediction error of nine minutes when using the median as our prediction, 11 minutes when using the most likely time as our prediction, and 10 minutes when using the mean value as our prediction. Although the most likely time usage is more often very accurate, it is sometimes far off, which makes the median more accurate, on average. Another useful property of the median value is that it is frequently more robust than the mean value; that is, it is less affected by extreme values. This is especially useful if we have few observations of past time usage.
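These error properties can be checked by simulation. The sketch below uses a hypothetical right-skewed sample (the actual Fig. 3.1 observations are not reproduced here), with the mode approximated from a histogram:

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical right-skewed time usages in minutes.
times = rng.lognormal(mean=np.log(37), sigma=0.4, size=100_000)

median, mean = np.median(times), times.mean()
# Approximate the mode as the midpoint of the fullest 5-minute bin.
counts, edges = np.histogram(times, bins=np.arange(0, 200, 5))
mode = edges[counts.argmax()] + 2.5

def hit_rate(prediction):
    # Share of 'very accurate' predictions: within +/- 5 minutes.
    return (np.abs(times - prediction) <= 5).mean()

def mean_abs_error(prediction):
    # Average absolute deviation from the actual time usage.
    return np.abs(times - prediction).mean()

# The mode maximizes the chance of being very accurate ...
assert hit_rate(mode) >= hit_rate(median) >= hit_rate(mean)
# ... while the median minimizes the mean absolute prediction error.
assert mean_abs_error(median) <= min(mean_abs_error(mode),
                                     mean_abs_error(mean))
```

The assertions mirror the pattern in the text: predicting the mode is most often very accurate, while predicting the median gives the smallest error on average.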

The mean use of time is the sum of individual time usages divided by the number of observations. The mean value in Fig. 3.1 is about 40 minutes. This value is hard to observe directly from the distribution and frequently hard to judge based on experience, partly because it may be strongly affected by extreme values. The mean is the point in the distribution where the sum of the time prediction errors for all overruns equals the sum for all underruns.Footnote 3 This is difficult to imagine, so it may, instead, be useful to think of the mean value as the balance point of the distribution. Assume that the distribution in Fig. 3.1 is placed on an old-fashioned scale. The mean value would be the point of the scale where the scale would be in balance, whereas the point of the median or the most likely value would result in imbalance (see Fig. 3.2, where the left panel shows the use of the mean and the right panel the median as the points where we try to balance the scale).

Fig. 3.2
figure 2

Mode, median, and mean values of a distribution

The mean value takes into account how far away extreme observations are, since a value that is 14 hours more than the balance point has the same weight as 14 values that are one hour less than the balance point. In contrast, to find the median value, one just has to count the observations and ensure that as many observations are above as below it. Research suggests that we tend to underestimate the mean value when presented with a right-skewed distribution, such as the distribution in Fig. 3.1 [9], and when establishing the distribution from memory.
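Both properties, the mean as the balance point and the median as the halfway count, can be verified on any sample. A minimal sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(3)
# Any sample works; a hypothetical right-skewed one mirrors Fig. 3.1.
times = rng.lognormal(mean=np.log(37), sigma=0.4, size=1000)

mean, median = times.mean(), np.median(times)

# Balance point: the total overrun above the mean equals the total
# underrun below it.
overrun = (times[times > mean] - mean).sum()
underrun = (mean - times[times <= mean]).sum()
assert abs(overrun - underrun) < 1e-6 * times.sum()

# Halfway count: as many observations lie above the median as below it.
assert (times > median).sum() == (times < median).sum()
```

Note that the balance check weighs observations by their distance from the mean, while the median check only counts them, which is exactly the distinction drawn above.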

If the mode and the median are easier to understand and to calculate and are more robust (less affected by outliers) than the mean, why do we want to determine the mean of an outcome distribution? The main reason for using the mean when predicting time is that the mean value minimizes the deviation between the sum of time predictions and the sum of the actual time usages. When we want to know the total time usage of a set of tasks or projects, that is, when we want to add time predictions, this property is crucial. It may be required, for example, when breaking down a project and summarizing time predictions of subtasks or when considering the overall potential for cost overrun of a set of projects.

If, for example, we use the prediction of the most likely time usage to predict driving to work from home 10 times, the sum of these predictions is likely to be too low to reflect the actual total time. If, on the other hand, we use the mean prediction, we would typically be more correct about the total time of driving 10 times. Similarly, if you are the chief executive officer of a large company and have four large projects running with most likely costs of $500 million each, you should not expect the total costs to be $2 billion but, most likely, substantially higher, given the right-skewness of most project cost distributions. We explain in more detail about why this is so and how to properly add time predictions later in this book. Table 3.3 summarizes the suitability of the most likely, median, and mean time prediction values, given different time prediction goals.

Table 3.3 The suitability of the prediction type depends on the prediction goal

You may have started wondering how we can know the time usage distributions, which is a prerequisite for even thinking about using the mode, median, and mean time usage as our time prediction. In some cases, we have highly relevant data about past time usage that enable such knowledge, perhaps for travel times or production times, as in Fig. 3.1. More often, this is not the case. Since we hardly ever know the exact probabilities of future outcomes, we may have to try to derive or guess the outcome distribution from memory and other knowledge, that is, by expert judgement. Although sometimes hard, this approach is required to provide a good model for thinking and talking about time predictions, to enable us to connect prediction goals with types of time predictions, and to be precise about the uncertainty of our time predictions. You can read more about how to derive time usage distributions in Chap. 6.

  • Take home message 1: Good ways of presenting and communicating your time predictions include those using two-sided prediction intervals, such as ‘it is X% likely that the work will take between Y and Z hours’, and those using one-sided prediction intervals (pX predictions), such as ‘I am X% confident that it will take less than Y hours’.

  • Take home message 2: Different values of the outcome distribution, such as the most likely, the median, and the mean, optimize different time prediction accuracy functions and meet different time prediction goals.

3.4 Right-Skewed Time Distributions

As part of his work as a graphic designer, Tom is asked to take photos of trees, flowers, and people for use in digital illustrations. The last time he went to take photos for a similar project, he spent about five hours. He is aware that the time usage may vary from occasion to occasion, so he tells his manager that he expects to be back in five hours plus or minus two hours. What is wrong with Tom’s time prediction?

There may be several questionable elements in Tom’s time prediction, such as not communicating how likely he believes it is that the actual time will be within the stated interval, but the problem we are concerned with in this section is his idea of a symmetric interval, or symmetric distribution, of time usage. Although we do not really know what is going on in Tom’s mind, it seems as if he assumes that two hours more than the predicted time is as likely as two hours less. If Tom were aware of the typical asymmetric distribution of time usage, he should have given an asymmetric interval. An asymmetric and probably more realistic time prediction could be, for example, that the shoot will most likely take about five hours, most certainly between four and eight hours.

Maybe the only data of past time usage Tom could think of concerned the last time he did a similar project, so he is excused for not considering the entire distribution of potential time usage. However, let us say that there are a few hundred graphic designers such as Tom, all performing the same or highly similar tasks, and that we plot all their actual time usages for this task in a graph. What would this look like? In Fig. 3.3, we present three different alternatives.

Fig. 3.3
figure 3

Symmetric, right-skewed, and left-skewed distributions

The distribution in panel A is a symmetric, so-called normal, or Gaussian, distribution, where spending one hour more than the most likely time usage is just as likely as spending one hour less. Distribution A corresponds to Tom’s naïve belief about time usage when giving a time usage interval symmetric around the most likely value. The distribution in panel B is right-skewed, that is, a distribution with a long right tail. Distribution B would be the result if a task is as likely to take a bit less as it is to take a lot longer than usual. Distribution C is left-skewed, that is, a distribution with a long left tail. It would be the result if a task is as likely to take substantially less as it is to take slightly more than usual. Which figure is likely to correspond to the distribution of time usage for a few hundred graphic designers doing the same task?

Most time usage distributions seem to be most similar to distribution B. Time usage distributions tend to have a long and sometimes thick right tail. The thickness and length of the tail may vary, but we have yet to see a time usage distribution that is strongly left-skewed, such as distribution C in Fig. 3.3. We therefore feel quite confident about claiming that nearly all time usage distributions are right-skewed. Why is this so and why is it relevant?

Right-skewed distributions are found everywhere in nature. They are probably more common than symmetric ones and could be the result of a range of different processes.Footnote 4 In the realm of time usage, right-skewed distributions may be related to the fact that no activity can take less than zero time to complete, while there is hardly any absolute upper time limit for any activity. Even the slightest activity, given terrible luck or extreme perfectionism, can take a very long time. The poet Franz Wright spent six years on a five-line poem [11]—but no poem has ever been written in zero seconds or less.

Related to the lack of an upper boundary of time usage, you may have experienced that more things can go very wrong than very right. The history of the Sagrada Familia cathedral is a good example of how bad things can go. Construction began in 1883, was only 15% completed in 1926, and was predicted in 2013 to be finished in 2026. Among other events, delays were caused by inconsistent funding, a civil war, two world wars, highly complex construction work, and a change of architects. There is no shortage of projects or endeavours similar to this one, where the list of possible negative events leading to greater time usage is nearly endless [12].

What about projects that are extremely successful? Of course, there are projects completed ahead of their time predictions and with lower costs than predicted. For instance, the T-REX infrastructure project in Denver finished 22 months ahead of schedule, which is rather substantial for a project predicted to last about seven years. Doing things in new and clever ways and the use of new technology may decrease duration and costs somewhat, but we have yet to see a project predicted to last five years taking two weeks or a project predicted to cost $2 billion ending up with costs of about $1000. Thus, there is a limit on how fast one can do something and how inexpensive projects turn out to be, but hardly any limit to the other end of the scale. Furthermore, we rarely plan our projects to be inefficient, so our predictions are often based on imagining success [13], which also limits the potential for completing projects ahead of schedule and with less time usage than predicted.

There is also another, frequently forgotten reason why distributions of time usage are right-skewed. Consider a run on a 400-metre track under windy conditions. In one of the straight 100-metre sections, there is a headwind, decreasing the speed, and in the other a tailwind, increasing the speed. One could easily think that the negative effect of headwind is compensated for by the positive effect of tailwind. The hindrance (the headwind), however, lasts longer (in spite of the same distance) than the advantage (the tailwind), leading to a slower total time compared to a situation without wind. While this may seem like a strange example, it illustrates an effect that may contribute to right-skewed distributions of task completion time: An increase in time usage in a project due to negative events enables more negative events to happen due to more time spent, which generates further increases in time usage and greater right-skewness (which enables even more negative events to happen and so on). The Sagrada Familia construction is such an example. The initial delays enabled further delays due to the civil war and two world wars. In contrast, a project’s high productivity is not likely to be a factor contributing to a higher likelihood of even further increases in productivity. If anything, high productivity reduces opportunities for further decreases in time usage because there is less time left to be shortened.

  • Take home message 1: Based on empirical data and on analytical reasoning we can assume that typical distributions of time usages are right-skewed, that is, have a longer tail towards higher values.

  • Take home message 2: The most extreme deviations from typical time usage are nearly always found on the right side of the distribution. That is, one can use much longer times than what is common or expected, but one rarely experiences extreme cases of using less. In any case, the lower limit of time usage is zero.

  • Take home message 3: The hours saved from times you are more productive than usual (e.g. tailwind) will seldom compensate for the hours lost from times you are less productive than usual (e.g. headwind). In other words, your efficiency is rarely as extreme as your inefficiency.

3.5 Relearning to Add: 2 + 2 Is Usually More Than 4

Most people think they know how to add. It may therefore come as a surprise that, in the world of time predictions, 2 hours + 2 hours is often not 4 hours. It is usually more than 4 hours, perhaps as much as 5 or 6 hours. The nonintuitive addition of time predictions is a consequence of the probabilistic (stochastic) nature of time usage and its right-skewed distribution.

To illustrate what happens if we add time predictions the way we learned in school, we reuse the example of driving to work from home. If you do not recall it, have another look at Fig. 3.1 (Sect. 3.3). The figure shows the distribution of time usage of a drive to work from home. What do you think is the most likely total usage of time for one full year of driving, assuming that we drive 200 times per year?

If we add all the most likely values for each of the 200 trips (=30 minutes × 200), the most likely total time usage for one year would be predicted to be 6000 minutes (=100 hours). This prediction of the total time usage would be far too low to reflect the likely total time usage.

Assume that we sample 200 outcomes from the distribution in Fig. 3.1 and sum them. This simulates the total driving time for one year. If we repeat this sampling process 1000 times, we obtain a distribution of the total time for one year of driving based on 1000 values. One such distribution is displayed in Fig. 3.4. As can be seen, a prediction of 6000 minutes based on summing the most likely time usages is not even close to the lowest observed sum of the time usage. The most likely sum of time usage is, instead, somewhere around 8000 minutes. What is going on? Why is the sum of the most likely time usages not the most likely sum of time usage? (If you find this question difficult to grasp, you are not alone).

Fig. 3.4
figure 4

Frequency distribution of total time usage from driving 200 times

To correctly add time predictions, we need to take the long tail of the distribution of driving time usage into account. In that respect, the most likely time predictions do a poor job. So does the median value. In the case of the 200 trips from home to work, adding the median predictions would yield a total prediction of 7400 minutes (the median of 37 minutes multiplied by 200 trips = 7400 minutes). As can be seen in Fig. 3.4, this prediction is also far too low to be realistic. The only type of prediction that can be used here is predictions of the mean driving time. In contrast to the median and the most likely values, the mean value incorporates the extreme values of the long tail of the distribution.

Adding mean values also leads to some statistical magic. Even if the individual time usage distributions are heavily skewed (as in Fig. 3.1), the distribution of the total time (as in Fig. 3.4) will approach symmetry and consequently have similar values for the most likely, median, and mean total time usage. This magic is described by the central limit theorem from statistical theory, which holds that sums of distributions, even heavily skewed ones, will be close to a normal, symmetric distribution if certain conditions are met, such as independence of the added elements.

To be honest, this magic rarely represents the time usage outcome distribution in real-life projects. There are usually numerous dependencies between tasks, as well as tasks that are forgotten in the time predictions, leading to a right-skewed outcome distribution of the sum. A common example of dependency between tasks is when the time spent on one activity is a proportion of the time spent on another, that is, when there is a multiplicative dependency between tasks. Take, for example, the common situation in projects in which the time spent on administration is a proportion of the time spent on construction work. If the time spent on construction increases, so does the time spent on administration. To find the time spent on administration, we may either multiply the distribution of time spent on non-administrative work by the distribution of the proportions of time spent on administration, or we can model the relation by including a correlation between the two activities.Footnote 5

Back in our example of driving, where we assume independence of the individual driving times, the total time usage of 200 trips from home to work is likely to be an approximately normally and symmetrically distributed variable with a central value of 200 times the mean value (200 × 40 minutes = 8000 minutes).Footnote 6 The expected total time usage for driving 200 times is consequently 8000 minutes.
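A small simulation can illustrate both effects: with independent tasks, the central limit theorem pulls the total towards symmetry, whereas a shared multiplicative factor keeps the total right-skewed. The distributions and parameters below are illustrative assumptions, and the shared "overhead" factor is a hypothetical stand-in for, say, administration growing in proportion to the other work.

```python
import random
import statistics
from math import log, sqrt

# Assumed task-time distribution: lognormal, median 37, mean 40 minutes.
MU, SIGMA = log(37), sqrt(2 * log(40 / 37))
random.seed(1)

def skewness(xs):
    # Pearson's moment coefficient of skewness (near 0 for symmetric data).
    m, s = statistics.mean(xs), statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

# Independent tasks: the sum of 200 skewed draws is roughly symmetric.
indep = [sum(random.lognormvariate(MU, SIGMA) for _ in range(200))
         for _ in range(1000)]

# Multiplicative dependency: all 200 tasks in a "project" share one
# skewed overhead factor, so the total inherits its right skew.
dep = []
for _ in range(1000):
    factor = random.lognormvariate(0, 0.3)
    dep.append(factor * sum(random.lognormvariate(MU, SIGMA)
                            for _ in range(200)))

print(round(skewness(indep), 2))  # near zero: symmetry from the CLT
print(round(skewness(dep), 2))    # clearly positive: the skew survives
```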

The differences between using the mean, the most likely, and the median values when adding time predictions have fascinating practical implications. First, you might experience a time overrun of your project even when most of the time predictions of the subtasks are pessimistic. A real-world example of this situation is an information technology project in which the predictions and actual time usages of 443 project tasks were recorded. The predictions of 196 of the tasks were too optimistic (time overruns), with actual times of up to four or five times the predicted time. The predictions of 215 tasks were pessimistic (time underruns), in some cases with actual time usages less than 1/10 of the predicted time. In spite of there being more tasks with pessimistic predictions than with optimistic ones, there was a time overrun for the project at large. The predicted number of work hours for the 443 tasks was 2723, whereas the actual number of work hours was 3130 (a 15% overrun).Footnote 7 Overpessimism at the task level combined with overoptimism at the aggregated project level is perfectly understandable when taking the long, fat tail of time distributions into account. If you predict that a task will take 30 hours and it takes 10 times as long, you have a 270-hour overrun. If you manage to spend only 1/10 of the predicted time, you have a time underrun of only 27 hours. Cases of underrun rarely compensate for cases of overrun. Inefficiency trumps efficiency.
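The asymmetry behind 'inefficiency trumps efficiency' is plain arithmetic: equal multiplicative errors in the two directions produce very unequal absolute errors.

```python
predicted = 30  # predicted task duration in hours

# The same factor-of-ten error in each direction:
overrun = predicted * 10 - predicted        # actual is 10x the prediction
underrun = predicted - predicted * (1 / 10) # actual is 1/10 of the prediction

print(overrun, underrun)  # 270 27.0

# It would take ten such underruns to cancel out a single overrun.
print(overrun / underrun)  # 10.0
```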

We frequently have an interest in the total time usage. We may, for example, have many smaller tasks to complete at home and wonder whether we can manage them all. Companies may have several projects running simultaneously and be interested in the total cost compared to the total budget, and most projects include numerous subtasks. If we want to take control of our schedules and investments, we have to take the challenge of adding time predictions seriously. It is likely that many time overruns are caused not by poor time prediction abilities but, rather, by poor time prediction addition abilities.

  • Take home message 1: The sum of the most likely time usages of individual activities is not the same as the most likely total time usage of those activities. If you add most likely time predictions, your prediction of the total time usage will be too low.

  • Take home message 2: For the proper addition of time predictions, you should add the predicted mean value of each subtask.

3.6 How to Predict the Mean Time Usage

The previous section demonstrated that predictions of the most likely time usage of tasks cannot be added to obtain the expected total time usage. For this purpose, we need predictions of the mean time usage of all the subtasks. Unfortunately, it is not likely that you will receive the predicted mean time usage even when you explicitly request it. Determining the mean value is much more complex than finding the middle (median) or most frequent (most likely, mode) value of a distribution. Even when you observe the full distribution of past outcomes, it may be difficult to judge what the mean value is.

One common approach to obtain predictions of the mean value is to derive a distribution based on so-called three-point estimation. The Program Evaluation and Review Technique (PERT) project planning approach, for example, requires the input of the most likely value, the minimum (best-case) value, and the maximum (worst-case) value [15] and calculates the mean by use of the formula

$$ \mathit{Mean} = \frac{\mathit{minimum} + 4 \cdot \mathit{most\ likely} + \mathit{maximum}}{6} $$

The PERT method and similar approaches may be helpful in solving the problem of finding the mean value, but they introduce new problems. One problem is that people are typically very poor at making best- and worst-case time usage predictions. For instance, in one study, students first predicted the time usage they were 99% sure not to exceed (the p99 prediction) for software development tasks, along with the best case that would occur with only a 1% chance (the p1 prediction). This means that, given realistic values, the actual time usage should fall between the stated minimum and maximum in 98% of the cases (= p99 − p1). This did not happen. After the tasks were completed, it turned out that the actual time usage was inside these 98% confidence intervals in only 57% of the cases [16]. This result is typical of studies in which people are asked to give minimum and maximum values (see Chap. 6 for more on this issue). To make things even worse, the original PERT approach assumes that the best- and worst-case predictions correspond to p0 and p100, respectively,Footnote 8 which, for most real-world tasks, are meaningless values that are impossible to derive from experience or historical data.
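The PERT formula itself is straightforward to compute. The sketch below applies it to hypothetical minute values of our own choosing, purely for illustration.

```python
def pert_mean(minimum: float, most_likely: float, maximum: float) -> float:
    """Classic PERT three-point estimate of the mean."""
    return (minimum + 4 * most_likely + maximum) / 6

# Hypothetical task: best case 25, most likely 30, worst case 80 minutes.
# The long worst-case tail pulls the mean above the most likely value.
print(pert_mean(25, 30, 80))  # (25 + 120 + 80) / 6 = 37.5
```

Note how the mean (37.5) exceeds the most likely value (30) whenever the worst case is further from the most likely value than the best case is, which is exactly the long-tail effect discussed in the previous section.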

As an alternative to current approaches, such as the PERT model, we developed a new three-point prediction tool (a spreadsheet model) that provides predictions of mean outcomes based on user-determined confidence levels.Footnote 9 We believe that the tool has several important features that make it different from, and perhaps better than, other approaches. First, it forces the person making the prediction to look back and use historical information. Neglecting historical information may be one of the main reasons for poor time usage predictions in many domains [18]. Second, the tool does not require the prediction of extreme outcomes, such as p1 or p95 predictions. Third, it does not require a particular meaning of the time prediction used as the reference (median, most likely, p85, etc.), as long as the meaning is the same as that used for previous time predictions. The steps to calculate the mean time prediction are as follows:

  1. Predict the time usage. The prediction, the reference, may be a prediction of the most likely use of time or any other type of time prediction.

  2. Assess the accuracy of similar past predictions. Select two prediction accuracy points for which you have historical information or can make a qualified judgement. Each accuracy point should include (a) the prediction error (the deviation of actual outcomes from the prediction) and (b) the frequency of occurrence. For example, you may know that, for about seven out of 10 (70% occurrence) previously completed tasks similar to the one being predicted, you spent less than 130% of the predicted time. This means that your p70 prediction is 130%. You need one more such accuracy point. The second assessment could, for example, be that you spent less than 90% of the predicted time in three out of 10 cases (30% occurrence), meaning that your p30 prediction is 90%.

  3. Input the accuracy points into the spreadsheet, which calculates the uncertainty distribution, the pX values, and the mean value.

Example: Assume that you have predicted the most likely time usage to be 30 minutes. You know from similar situations that, in about 90% of the cases, the actual time usage was less than twice (200%) your predicted time usage and that, in about 50% of the cases, the actual time usage was less than 130% of your predicted time usage. This means that you have a p90 prediction of 200% of the original prediction and a p50 prediction of 130% of the original prediction. Using the spreadsheet, assuming a lognormal distribution of time usage,Footnote 10 yields a mean time prediction of 41 minutes. The time usage distribution is displayed in Fig. 3.5, showing, for example, that the most likely value is around 33 minutes. The pX distribution is displayed in Fig. 3.6, showing, for instance, that the p95 prediction is a bit less than 70 minutes. For more details on this method for making realistic pX predictions, including more examples, see Chap. 6.
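A minimal sketch of the underlying calculation, assuming (as in the example) a lognormal distribution of time usage: fit the two lognormal parameters through the two stated pX points and read off the mean. This quantile-matching approach is our reading of the calculation, not the actual spreadsheet tool.

```python
from math import exp, log
from statistics import NormalDist

def lognormal_mean_from_quantiles(x_lo, p_lo, x_hi, p_hi):
    """Fit a lognormal through two (value, probability) points and
    return its mean. Assumes 0 < p_lo < p_hi < 1 and x_lo < x_hi."""
    z_lo = NormalDist().inv_cdf(p_lo)
    z_hi = NormalDist().inv_cdf(p_hi)
    sigma = (log(x_hi) - log(x_lo)) / (z_hi - z_lo)
    mu = log(x_lo) - sigma * z_lo
    return exp(mu + sigma ** 2 / 2)  # mean of a lognormal distribution

# Chapter example: prediction of 30 minutes, p50 = 130% and p90 = 200%
# of the prediction, i.e. p50 = 39 minutes and p90 = 60 minutes.
mean = lognormal_mean_from_quantiles(30 * 1.30, 0.50, 30 * 2.00, 0.90)
print(round(mean))  # 41 minutes, matching the text
```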

Fig. 3.5 Time usage distribution (density)

Fig. 3.6 A pX distribution (cumulative probability distribution) of time usage

  • Take home message 1: The calculation of the predicted mean value of an outcome distribution is typically based on giving two or three points of the distribution as input, such as a low value, the most likely value, and a high value.

  • Take home message 2: Predictions of a low value (e.g. a p10 prediction) and a high value (e.g. a p90 prediction), when unaided by historical data and proper methods, tend to be very inaccurate and result in underestimation of the time usage uncertainty. Methods that do not compensate for this human bias, such as typical use of the PERT method, will tend to underestimate the mean values and, consequently, the total time usage.

  • Take home message 3: This book offers a method and a tool for predicting the mean value from two user-determined points of the historical time prediction error distribution.

3.7 How Time Predictions Affect Performance

Predicting the weather does not have any effect on the actual weather. Predicting time usage is different, since the prediction can have an impact on the actual time usage [19]. A famous book by Parkinson, the fellow behind the ‘law’ stating that work expands to fill the time available for its completion, includes a story that nicely illustrates this difference [20]. The story is about an elderly lady who spends a whole day sending a postcard to her niece. First, she has to go buy the postcard, then she must walk home and find her glasses, decide on what to write, write it, eat lunch, decide on whether to take an umbrella or not, buy stamps, drink another cup of tea, and so forth. The lady’s prediction of the time it takes to send a postcard would be a full day, because she has a full day available. A busy lady would perhaps predict spending five minutes on the same task, because that is all the time she has available. An extension of Parkinson’s law, relevant in many contexts, would be that many types of work expand to fill the time available for their completion, plus a little more. Even with plenty of time to complete a task, we may end up with a time overrun due to low productivity in the initial stages or poor planning of the time required for the last part of the job. The old lady may, for example, receive a visitor just before she is going to post the postcard in the afternoon, miss the hours the post office is open, and have to postpone the rest of the task to the next day.

If too high time predictions can lead to lower productivity, what about too low time predictions? Do they lead to increased productivity? A study on software development teams found an inverted U-shaped relation between the degree of perceived time pressure and productivity [21]. Here, time pressure was measured as how much the team’s initial time predictions were reduced based on pressure (negotiation) from the client.Footnote 11 The study found that, if the software development teams were allowed to use their original time predictions as the planned time, they had lower productivity than when the time predictions were reduced by up to 20% due to client negotiations. However, when the reduction in planned time continued beyond 20%, the time pressure became too high and productivity tended to decline. Figure 3.7 illustrates a possible relation between time pressure and productivity.

Fig. 3.7 Possible effect of time pressure on productivity

Other contexts may show different results as to when pressure is positive or negative, but it is reasonable to believe that, in many contexts, both a great deal of time pressure and very little time pressure have negative consequences for productivity.

Not only can work productivity be affected by pressure from low time predictions; so can the quality of the work. In an experiment, we found an increase in the number of errors made by software programmers when the time prediction was intentionally set 40% below what we would expect from the participants’ previous work [22]. Based on that study and other experience, the relation between time prediction-induced work pressure and quality could be as depicted in Fig. 3.8.

Fig. 3.8 Possible effect of time pressure on quality

Since time predictions can affect quality and productivity, the introduction of incentives, such as evaluations or financial rewards, to achieve time prediction accuracy can be problematic, as the following real-world example illustrates.

A company introduced a financial bonus for those project leaders who made accurate project time usage predictions. The following year, time prediction accuracy improved greatly, but the company also experienced a decrease in productivity. What happened was a natural reaction to the new practice of rewarding accurate time predictions. Project managers, smart as they are, raised their time usage predictions to predictions they were almost certain not to exceed. The extra buffer in the planned use of effort was used for improvements in quality, training, testing, and documentation (exploiting Parkinson’s law). For the company, this decrease in productivity was not beneficial and it soon stopped rewarding accurate time predictions.

The strategies of producing accurate time predictions by lowering productivity in situations with too high time predictions or cutting corners in situations with too low time predictions require the work process or product to be flexible. This is the case for much of what we do in life. If I make a bet on how long it will take to drive to work, I may be able to win the bet by adjusting my behaviour, especially if the prediction is high. If my prediction is as long as 60 minutes, I can drive slowly, stop at the gas station to fill up the tank, and spend time on searching for the perfect parking spot. If my prediction is a bit on the low side, for example, 25 minutes, I can drive fast, violate traffic rules, and park illegally. We could call the first prediction accuracy strategy work stretching (stretching work to fit a high prediction) and the second work shrinking (shrinking work to fit a low prediction). These strategies are common. Without them, our time predictions would look worse—sometimes much worse—than they currently do. Stretching and shrinking may sometimes be acceptable, but these strategies may violate other important goals, such as productivity and quality.

  • Take home message 1: Time predictions may affect the work, especially when it is highly flexible. Time predictions that are slightly too low may increase productivity, while those that are much too low or too high may decrease productivity. Time predictions that are too low may also lead to reductions in the quality of the work.

  • Take home message 2: Rewarding accurate time predictions is usually not a good idea, particularly if people behave strategically and stretch the work (lower work productivity) or shrink it (lower work quality) to fit the prediction.Footnote 12