Evaluating Forecasting Methods

  • J. Scott Armstrong
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 30)


Ideally, forecasting methods should be evaluated in the situations for which they will be used. Underlying the evaluation procedure is the need to test methods against reasonable alternatives. Evaluation consists of four steps: testing assumptions, testing data and methods, replicating outputs, and assessing outputs. Most principles for testing forecasting methods rest on commonly accepted methodological procedures, such as prespecifying criteria or obtaining a large sample of forecast errors. However, forecasters often violate such principles, even in academic studies. Some principles may be surprising: do not use R-square, do not use Mean Square Error, and do not use the within-sample fit of the model to select the most accurate time-series model. A checklist of 32 principles is provided to help in systematically evaluating forecasting methods.
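The abstract's call to "obtain a large sample of forecast errors" is typically met with successive updating (rolling-origin evaluation), one of the chapter's keywords: the forecast origin is moved forward one period at a time, producing one out-of-sample error per origin. A minimal sketch, assuming a single series and two illustrative methods; all names and data below are invented for illustration, not taken from the chapter:

```python
def naive_forecast(history):
    # Random-walk (no-change) forecast: repeat the last observed value.
    return history[-1]

def ses_forecast(history, alpha=0.3):
    # Simple exponential smoothing, refit on the history available so far.
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def rolling_origin_errors(series, forecast_fn, min_train=12):
    # Successive updating: advance the forecast origin one period at a
    # time, collecting one out-of-sample one-step-ahead absolute error
    # per origin. A single holdout period thus yields many errors.
    errors = []
    for t in range(min_train, len(series)):
        forecast = forecast_fn(series[:t])
        errors.append(abs(series[t] - forecast))
    return errors

# 24 monthly observations (made-up numbers for illustration).
series = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
          115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]

for name, fn in [("naive", naive_forecast), ("ses", ses_forecast)]:
    errs = rolling_origin_errors(series, fn)
    print(name, round(sum(errs) / len(errs), 1))  # mean absolute error
```

Each method is compared on the same set of out-of-sample errors, which is what makes the comparison against a reasonable alternative (here, the naive no-change forecast) meaningful.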


Keywords: backcasting, benchmarks, competing hypotheses, concurrent validity, construct validity, disconfirming evidence, domain knowledge, error measures, face validity, fit, jackknife validation, M-Competitions, outliers, predictive validity, replication, statistical significance, successive updating
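Two of these keywords, error measures and M-Competitions, connect to the chapter's warning against Mean Square Error: squared errors are scale-dependent, so pooling them across series lets the largest-scale series dominate any comparison of methods, whereas percentage (or relative) errors are unit-free. A toy sketch; all numbers are invented for illustration:

```python
def mse(forecasts, actuals):
    # Mean Square Error: expressed in the square of the series' units,
    # so not comparable across series measured on different scales.
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)

def mape(forecasts, actuals):
    # Mean Absolute Percentage Error: unit-free, comparable across series.
    return 100 * sum(abs(f - a) / abs(a)
                     for f, a in zip(forecasts, actuals)) / len(actuals)

# Series A and B have identical *relative* accuracy; B is 1000x the scale.
actuals_a, forecasts_a = [10.0, 12.0, 11.0], [11.0, 11.0, 12.0]
actuals_b, forecasts_b = [10000.0, 12000.0, 11000.0], [11000.0, 11000.0, 12000.0]

print(mse(forecasts_a, actuals_a), mse(forecasts_b, actuals_b))    # differ by 10**6
print(mape(forecasts_a, actuals_a), mape(forecasts_b, actuals_b))  # identical
```

MAPE has known problems of its own (it is undefined at zero actuals and asymmetric), which is why relative-error measures are often preferred for generalizing across series; the sketch only illustrates the scale-dependence objection to MSE.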





Copyright information

© Springer Science+Business Media New York 2001

Authors and Affiliations

  • J. Scott Armstrong
  1. The Wharton School, University of Pennsylvania, USA
