Performance Measures and Uncertainty
Many artificial intelligence algorithms or models are ultimately designed for prediction. A prediction algorithm, wherever it may reside—in a computer, or in a forecaster's head—is subject to a set of tests aimed at assessing its goodness. The specific choice of tests is contingent on many factors, including the nature of the problem and the specific facet of goodness under consideration. This chapter will discuss some of these tests. For a more in-depth exposition, the reader is directed to the references, particularly two books: Wilks (1995) and Jolliffe and Stephenson (2003). The body of knowledge aimed at assessing the goodness of predictions is referred to as performance assessment in most fields; in atmospheric circles, though, it is generally called verification. In this chapter, I consider only a few of the numerous performance measures considered in the literature; my emphasis is on ways of assessing their uncertainty (i.e., statistical significance).
Here, prediction (or forecast) does not necessarily refer to the prediction of the future state of some variable. It refers to the estimation of the state of some variable from information on another variable. The two variables may be contemporaneous, or not. What is required, however, is that the data on which the performance of the algorithm is assessed be as independent as possible from the data on which the algorithm is developed or fine-tuned; otherwise, the performance will be optimistically biased—and that is not a good thing; see Section 2.6 in Chapter 2.
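The chapter's emphasis on uncertainty can be illustrated with a resampling approach of the kind described in Efron and Tibshirani (1993): given predictions and matched observations from an independent test set, bootstrap resampling of cases yields a confidence interval for a performance measure such as the mean square error. The following is a minimal sketch; the data, sample size, noise level, and number of resamples are all hypothetical, not the chapter's own procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical held-out data: observations and an algorithm's predictions.
# In practice these must come from data independent of model development.
obs = rng.normal(size=200)
pred = obs + rng.normal(scale=0.5, size=200)

def mse(o, p):
    """Mean square error of predictions p against observations o."""
    return np.mean((o - p) ** 2)

# Approximate the sampling distribution of the MSE by resampling
# (observation, prediction) pairs with replacement, then read off
# a 95% percentile confidence interval.
n = len(obs)
boot = np.array([
    mse(obs[idx], pred[idx])
    for idx in (rng.integers(0, n, size=n) for _ in range(2000))
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"MSE = {mse(obs, pred):.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```

Resampling pairs (rather than residuals) makes no distributional assumption about the errors, which is why the bootstrap is attractive for performance measures whose sampling distributions are awkward to derive analytically.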
Keywords: Mean Square Error, False Alarm Rate, Sampling Distribution, Ensemble Prediction System, False Alarm Ratio
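Two of the keywords, false alarm rate and false alarm ratio, are easily confused but are distinct measures computed from the same 2×2 contingency table of Yes/No forecasts (the setting of Seaman et al. 1996). A minimal sketch with hypothetical counts:

```python
# Hypothetical 2x2 contingency table counts for Yes/No forecasts.
hits, false_alarms, misses, correct_negatives = 42, 18, 10, 130

# False alarm RATIO: fraction of "yes" forecasts that did not verify.
far_ratio = false_alarms / (hits + false_alarms)

# False alarm RATE (probability of false detection): fraction of
# observed non-events that were incorrectly forecast as events.
far_rate = false_alarms / (false_alarms + correct_negatives)

print(f"false alarm ratio = {far_ratio:.3f}")  # 18/60  -> 0.300
print(f"false alarm rate  = {far_rate:.3f}")   # 18/148 -> 0.122
```

The two measures condition on different quantities—the forecasts for the ratio, the observations for the rate—so they can differ substantially for rare events.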
- Baldwin, M. E., Lakshmivarahan, S., & Kain, J. S. (2002). Development of an "events-oriented" approach to forecast verification. 15th Conference on Numerical Weather Prediction, San Antonio, TX, August 12–16, 2002. Available at http://www.nssl.noaa.gov/mag/pubs/nwp15verf.pdf
- Brown, B. G., Bullock, R., Davis, C. A., Gotway, J. H., Chapman, M., Takacs, A., Gilleland, E., Mahoney, J. L., & Manning, K. (2004). New verification approaches for convective weather forecasts. Preprints, 11th Conference on Aviation, Range, and Aerospace, Hyannis, MA, October 3–8.
- Devore, J., & Farnum, N. (2005). Applied statistics for engineers and scientists. Belmont, CA: Thomson Learning.
- Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. London: Chapman & Hall.
- Good, P. I. (2005a). Permutation, parametric and bootstrap tests of hypotheses (3rd ed.). New York: Springer. ISBN 0-387-98898-X.
- Good, P. I. (2005b). Introduction to statistics through resampling methods and R/S-PLUS. New Jersey: Wiley. ISBN 0-471-71575-1.
- Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Springer Series in Statistics. New York: Springer.
- Jolliffe, I. T., & Stephenson, D. B. (2003). Forecast verification: A practitioner's guide in atmospheric science. Chichester: Wiley.
- Livezey, R. E. (2003). Categorical events. Chapter 4 in Jolliffe, I. T., & Stephenson, D. B. (Eds.), Forecast verification: A practitioner's guide in atmospheric science. West Sussex, England: Wiley.
- Macskassy, S. A., & Provost, F. (2004). Confidence bands for ROC curves: Methods and an empirical study. First Workshop on ROC Analysis in AI, ECAI-2004, Spain.
- Seaman, R., Mason, I., & Woodcock, F. (1996). Confidence intervals for some performance measures of Yes-No forecasts. Australian Meteorological Magazine, 45, 49–53.
- Stephenson, D. B., Casati, B., & Wilson, C. (2004). Verification of rare extreme events. WMO Verification Workshop, Montreal, September 13–17.
- Venugopal, V., Basu, S., & Foufoula-Georgiou, E. (2005). A new metric for comparing precipitation patterns with an application to ensemble forecasts. Journal of Geophysical Research, 110(D8), D08111. DOI: 10.1029/2004JD005395.
- Wilks, D. S. (1995). Statistical methods in the atmospheric sciences (467 pp.). San Diego, CA: Academic Press.