
Practical challenges of requirements prioritization based on risk estimation

Empirical Software Engineering

Abstract

Requirements prioritization and risk estimation are known to be difficult. However, so far, risk-based requirements prioritization has not been investigated empirically and quantitatively. In two quantitative experiments, we explored practical challenges and needs of risk estimations in general and of our method MOQARE specifically. In the first experiment, ten students made individual estimations. In the second one, twenty-four students estimated risks in seven moderated groups. The students prioritized the same requirements with different methods (risk estimation and ranking). During the first experiment, we identified factors which influence the quality of the prioritization. In the second experiment, the results of the risk estimation could be improved by discussing risk estimations in a group of experts, gathering risk statistics, and defining requirements, risks and prioritization criteria more tangibly. This first quantitative study on risk-based requirements prioritization helps to understand the practical challenges of this task and thus can serve as a basis for further research on this topic.



Author information

Correspondence to Andrea Herrmann.

Annex A: Data and Data Analysis

This annex describes, for Experiments 1 and 2, the quantitative analysis of the variables defined in Section 3. For each variable, the results from both experiments and all methods are presented together. Their interpretation, in particular how we believe these variables were affected by the influencing factors, is discussed in Section 6.

1.1 A.1 Time Consumption

Method 1 requires determining only two values (group and priority) for each countermeasure. Method 2 requires estimating two probabilities and two damages per countermeasure.

In Experiment 1, the time needed for Method 2 (risk estimation) was significantly higher than for Method 1 (ranking). The average time needed for Q1 was 6.6 min for the seven participants who noted it; Q2 took an average of 31.8 min. In Experiment 2, the time needed, averaged over those groups that performed the respective method first, was 17.5 min for Method 1 and 37.3 min for Method 2. (We count only these groups because of the learning effect observed.)

From these numbers, we calculated the time needed per countermeasure (Table 5) and the time needed per estimation (Table 6), as the number of estimations per countermeasure in Method 2 depends on the number of misuse cases.
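To make the derivation of Tables 5 and 6 concrete, the following minimal Python sketch divides the measured totals by the number of countermeasures and by the number of individual estimations. It is not the authors' tooling; the estimation count for Method 2 is left as an explicit parameter, since it depends on the number of misuse cases, and the concrete figures below are illustrative.

```python
# Minimal sketch of the per-countermeasure and per-estimation time figures
# reported in Tables 5 and 6 (illustrative, not the authors' tooling).

def per_countermeasure(total_minutes: float, n_countermeasures: int) -> float:
    """Average time spent per countermeasure."""
    return total_minutes / n_countermeasures

def per_estimation(total_minutes: float, n_estimations: int) -> float:
    """Average time spent per individual estimated value."""
    return total_minutes / n_estimations

n_cm = 7  # countermeasures prioritized in Experiment 2

# Experiment 2, groups that performed the respective method first (see A.1):
print(per_countermeasure(17.5, n_cm))  # Method 1 (ranking): 2.5 min per countermeasure
print(per_countermeasure(37.3, n_cm))  # Method 2 (risk estimation): ~5.3 min per countermeasure

# Method 1 asks for two values (group and priority) per countermeasure:
print(per_estimation(17.5, 2 * n_cm))  # 1.25 min per estimated value
```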

1.2 A.2 Priorities

The resulting priorities of the countermeasures varied widely among the participants and groups in both experiments and for both methods, as can be seen from Tables 15, 16, 17 and 18. This means that the participants disagreed greatly about the importance of the countermeasures. The same countermeasure could have the highest priority (1) for one participant or group and the lowest for another. This was the case even in Method 1, where the results were transparent to and directly manipulable by the estimators, whereas in Method 2 the lack of transparency and only indirect manipulability of the priorities could lead to results the estimators did not expect.

Table 15 Priorities resulting from Experiment 1 with Method 1 (“1” indicating the most important one): a “1” in row “R1” and column “1” means that, according to the results of participant 1, Countermeasure R1 is the most important one
Table 16 Priorities resulting from Experiment 1 with Method 2 (“1” indicating the most important one)
Table 17 Priorities resulting from Experiment 2 with Method 1 (“1” indicating the most important one)
Table 18 Priorities resulting from Experiment 2 with Method 2 (“1” indicating the most important one)

In Method 1, the averages of the priorities lie between 3.0 and 6.4 for the individual countermeasures; in Method 2, they lie between 3.1 and 7.2. In Experiment 2, all groups agreed that R3 is one of the least important countermeasures: in Method 1, all seven groups gave R3 the lowest priority 7, while in Method 2 this was the case for five groups; once it received priority 6 and once priority 5. The priority averages in Method 1 (not counting R3) vary from 2.57 to 5.00, and in Method 2 from 2.29 to 4.21.

We are sure that these wide ranges were not caused by a misunderstanding of whether “1” stands for the highest or the lowest priority. In Method 1, the priority 1 countermeasure of every participant was found in the “high benefit” group. In Method 2, the priorities were determined by us, based on the calculated benefits.
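Since the annex does not restate the benefit formula of Method 2, the following sketch shows one plausible reading, assuming that for each countermeasure and misuse case the estimators provide probability and damage with and without the countermeasure, that the benefit is the resulting expected damage reduction, and that countermeasures are ranked by total benefit (1 = most important). The names and numbers are illustrative only.

```python
# Hedged sketch of how Method 2 could turn risk estimations into priorities.
# Assumption (not spelled out in this annex): benefit = expected damage reduction.

from dataclasses import dataclass

@dataclass
class RiskEstimate:
    p_without: float  # probability of the misuse case without the countermeasure
    d_without: float  # damage if it occurs, without the countermeasure
    p_with: float     # probability with the countermeasure in place
    d_with: float     # damage with the countermeasure in place

def benefit(estimates: list[RiskEstimate]) -> float:
    """Expected damage reduction, summed over all misuse cases."""
    return sum(e.p_without * e.d_without - e.p_with * e.d_with for e in estimates)

def prioritize(estimates_by_cm: dict[str, list[RiskEstimate]]) -> dict[str, int]:
    """Map each countermeasure to its priority (1 = most important)."""
    ranked = sorted(estimates_by_cm, key=lambda cm: benefit(estimates_by_cm[cm]), reverse=True)
    return {cm: rank for rank, cm in enumerate(ranked, start=1)}

# Illustrative numbers only:
example = {
    "R1": [RiskEstimate(0.10, 5000, 0.02, 5000)],
    "R3": [RiskEstimate(0.01, 200, 0.005, 200)],
}
print(prioritize(example))  # {'R1': 1, 'R3': 2}
```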

1.3 A.a Standard Deviation of Priorities

The standard deviations s found among the ten estimations of the nine priorities in Experiment 1 are shown in Table 7, and those found among the seven group estimations of the seven priorities in Experiment 2 in Table 8. The differences in standard deviation between the methods within the same experiment are very small and not statistically significant. The standard deviations in Experiment 2 are lower than in Experiment 1 because fewer countermeasures were prioritized, but even when divided by the average priority (which is (n + 1)/2), the standard deviation in Experiment 2 is lower (see Table 9).
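Written out, the normalization used in Table 9 is simply the standard deviation relative to the average priority of a complete ranking of n countermeasures:

```latex
% Normalized standard deviation used in Table 9 (n = number of countermeasures)
s_{\mathrm{norm}} = \frac{s}{\bar{p}}, \qquad \bar{p} = \frac{n+1}{2}
```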

1.4 A.b Ease of Use

The ease of use of each method, as rated by the participants, is shown in Table 10. We attributed points to the answers: very easy = 2 points, easy = 1, undecided = 0, difficult = −1, very difficult = −2.
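A minimal sketch of this scoring, assuming the answers are available as plain strings (the answer labels are those of the scale just described, and the sample answers are illustrative):

```python
# Sketch of the ease-of-use scoring behind Table 10.
LIKERT_POINTS = {
    "very easy": 2,
    "easy": 1,
    "undecided": 0,
    "difficult": -1,
    "very difficult": -2,
}

def average_score(answers: list[str]) -> float:
    """Average points over all participants' answers."""
    return sum(LIKERT_POINTS[a] for a in answers) / len(answers)

print(average_score(["easy", "very easy", "undecided"]))  # illustrative answers only
```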

1.5 A.c Participants Expect Their Estimations to Be Realistic

In Experiment 1, immediately after the estimations but before they knew the resulting priorities, we asked the participants whether they expected to have made reasonable, realistic and useful estimations (question Q4b, see Table 11). In Experiment 2, they were asked whether they believed that their results were realistic; at this point in time, they already knew the priorities. Points were attributed to the answers: very = 2 points, rather = 1, undecided = 0, rather not = −1, not at all = −2 points.

It is interesting to note that in Experiment 2 the average score for the probability and damage estimations was 0.13, i.e. lower than for the priorities (although not statistically significantly so, owing to high variation). This means that some participants trusted the method to deliver priorities that are more realistic than the risk estimations on which they are based.

1.6 A.d Accuracy of the Results

Accuracy here means that, after the experiment, the participants considered the countermeasure priorities resulting from their risk estimations to be plausible and to reflect their views. In Experiment 1, this question was asked in the post-test session one week after the experiment; in Experiment 2, it was asked during the experiment session, directly after the estimations.

In Experiment 1, nine participants answered this question (with free-text answers). Four were in favour of Method 1. Two wrote that their first impression was that Method 1 delivered the most plausible results, because these corresponded to their intuitive priorities, but that since Method 2 was a systematic method, it should probably provide the best results. Another participant wrote that all methods delivered plausible as well as less plausible results for the different countermeasures. The last participant wrote that all results were plausible: Method 1 reflected his own perception, neglecting risk and cost, while Method 2 was plausible as well, taking risk and cost into account.

In Experiment 2, Method 1 received an average of 1.29 points in answer to this question, while Method 2 received 0.13 points, on a scale from −2 to +2 (very well = 2 points, rather yes = 1, undecided = 0, rather not = −1, very badly = −2 points). This difference is statistically significant.

1.7 A.e Frequency of Naming a Misuse Case or Countermeasure as Especially Uncertain

The participants were asked to name the countermeasures or misuse cases for which they considered their estimated values especially uncertain. We counted how often a certain countermeasure (in Method 1) or a certain misuse case (in Method 2) was named (Variable e). To compare the methods and countermeasures, we calculated the frequency with which a countermeasure or misuse case was named, averaged over all countermeasures/misuse cases. We did so because the number of countermeasures and misuse cases differed between the two experiments and the methods applied. The results are summarized in Table 12.

In Experiment 1, regarding Method 1, R8 was never named and R3 only once; R5 was mentioned twice, R1, R2, R4 and R9 three times each, R7 four times and R6 five times by the ten participants. Concerning Method 2, eight out of ten participants named specific misuse cases, while six additionally or instead said that they were uncertain about practically all of them. If we take the latter group literally, each misuse case was named with an average frequency of 0.7. If we do not count the participants stating they were uncertain about all misuse cases, the figures are about 0.2 for the probability estimations as well as for the damage estimations.

In Experiment 2, regarding Method 1, R1 was named seven times (by the 18 participants who answered this question), R6 and R7 four times, R2, R4 and R5 three times, and R3 only once. Except for R1, which seems to be especially difficult to judge, and R3, which caused almost no uncertainty, most countermeasures seem to be equally difficult. Few correlations are seen among the answers of members of the same group: only once did all three group members agree that R1 was difficult, and three times two out of three or four group members agreed on the same countermeasure.

In Experiment 2, concerning the probability estimation in Method 2, Misuse Cases 1–3 were named four times (by the 18 participants who answered this question), Misuse Case 4 only once, Misuse Case 5 nine times and Misuse Case 6 eight times. Only once did all three group members agree on the same misuse case: they all found the probability estimation for Misuse Case 5 difficult. Five times, two group members agreed on the same misuse case.

In Experiment 2, concerning the damage estimation in Method 2, the misuse cases were named with the following frequencies (by the 18 participants who answered this question): Misuse Case 6 nine times, Misuse Cases 1 and 4 seven times each, Misuse Case 3 five times, Misuse Case 5 four times, and Misuse Case 2 only once. Only once did all three group members agree, namely concerning Misuse Case 3. Seven times, two group members considered the same misuse case’s damage estimation to be difficult. This corresponds to an average frequency per misuse case of 0.3, approximately the same as for the probability estimation.
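The average frequency of 0.3 quoted above can be reproduced as follows; the sketch simply divides each naming count by the number of respondents and averages over the misuse cases (counts as listed in the preceding paragraph).

```python
# Sketch of the average-frequency calculation summarized in Table 12.

def average_naming_frequency(name_counts: dict[str, int], n_respondents: int) -> float:
    """Average, over all items, of how often each item was named per respondent."""
    per_item = [count / n_respondents for count in name_counts.values()]
    return sum(per_item) / len(per_item)

# Experiment 2, Method 2, damage estimation (18 respondents):
damage_counts = {"MC1": 7, "MC2": 1, "MC3": 5, "MC4": 7, "MC5": 4, "MC6": 9}
print(round(average_naming_frequency(damage_counts, 18), 2))  # 0.31, i.e. roughly 0.3
```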

The differences observed between the methods (within the same experiment) and between the experiments (for the same method) are statistically significant.

1.8 A.3 Influence of Statistics

In Experiment 1, we tested the influence of statistics provided to the estimators. In Q5d, eight out of nine participants now clearly attributed different probabilities (reference risk) to the two security misuse cases, usually much higher ones (see Tables 13 and 14). One participant wrote that he estimated the same probabilities as before (0.5% and 0.1%), but we doubt whether these were really derived from the statistics, as they differ too much from the estimations of the other participants.

When questioned whether the statistics facilitated the probability estimation, we received free-text answers. Some of them were: “They definitely were helpful. One feels more certain, thanks to this information.” Others were more sceptical: “I would say that the statistics have strongly influenced my estimations. However, I wonder how similar the systems of these companies are to the reference system in the case study. Only when this is known can one say whether the high estimated value is justified. I think that I still do not have enough information to deliver a good estimation.” Altogether, four out of nine participants wrote that the statistics were helpful, while one wrote that they did not influence the estimation, and four wrote that they influenced the estimations but that they remained sceptical whether the estimated values were accurate.

We expected that, when the participants were provided with several statistics, the standard deviation (relative to the average) of their estimations would decrease (Variable a). As one can see in Tables 13 and 14, the estimated probabilities still differed among the participants. This was to be expected, because four statistics were given which did not apply to exactly the same environment as the case study, so interpretation and adaptation were necessary. Nevertheless, the coefficient of variation dropped to less than half when the statistics were used.
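For clarity, the coefficient of variation referred to here is the standard deviation expressed relative to the mean of the estimates:

```latex
% Coefficient of variation of the probability estimates
c_v = \frac{s}{\bar{x}}
```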

Cite this article

Herrmann, A., Paech, B. Practical challenges of requirements prioritization based on risk estimation. Empir Software Eng 14, 644–684 (2009). https://doi.org/10.1007/s10664-009-9105-0
