Improving Reliability of Judgmental Forecasts

  • Thomas R. Stewart
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 30)


All judgmental forecasts will be affected by the inherent unreliability, or inconsistency, of the judgment process. Psychologists have studied this problem extensively, but forecasters rarely address it. Researchers and theorists describe two types of unreliability that can reduce the accuracy of judgmental forecasts: (1) unreliability of information acquisition, and (2) unreliability of information processing. Studies indicate that judgments are less reliable when the task is more complex; when the environment is more uncertain; when the acquisition of information relies on perception, pattern recognition, or memory; and when people use intuition instead of analysis. Five principles can improve reliability in judgmental forecasting:
  1. Organize and present information in a form that clearly emphasizes relevant information.

  2. Limit the amount of information used in judgmental forecasting. Use a small number of really important cues.

  3. Use mechanical methods to process information.

  4. Combine several forecasts.

  5. Require justification of forecasts.
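Principle 4 (combine several forecasts) rests on a simple statistical fact: when individual forecasts contain independent judgmental noise, averaging them cancels part of that noise. The sketch below illustrates this with a small simulation. It is an illustration only, not from the chapter: the function name `combine_forecasts`, the assumption of independent, equal-variance Gaussian errors, and all parameter values are hypothetical choices made for the example.

```python
import random
import statistics

def combine_forecasts(forecasts):
    """Equal-weight average of several forecasts.

    Equal weighting is a common, robust default when little is known
    about the relative accuracy of the individual forecasters.
    """
    return sum(forecasts) / len(forecasts)

# Simulation: several unreliable judges forecast the same quantity.
# Each forecast = true value + independent judgmental noise, so the
# average forecast is more reliable than any single judge's forecast.
random.seed(42)
true_value = 100.0   # the quantity being forecast (illustrative)
judges = 5           # number of forecasts combined
trials = 2000

single_errors = []
combined_errors = []
for _ in range(trials):
    forecasts = [true_value + random.gauss(0, 10) for _ in range(judges)]
    single_errors.append((forecasts[0] - true_value) ** 2)
    combined_errors.append((combine_forecasts(forecasts) - true_value) ** 2)

mse_single = statistics.mean(single_errors)
mse_combined = statistics.mean(combined_errors)
```

Under these assumptions, combining k independent forecasts reduces the error variance roughly k-fold; in practice the gain is smaller because forecasters' errors are correlated, which is why combining a modest number of diverse forecasts captures most of the benefit.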



Keywords: accuracy; combining forecasts; error; information acquisition; information processing; psychometrics; reliability





Copyright information

© Springer Science+Business Media New York 2001

Authors and Affiliations

  • Thomas R. Stewart
  1. Center for Policy Research, Nelson A. Rockefeller College of Public Affairs and Policy, University at Albany, State University of New York, USA
