This paper reviews methods to evaluate the performance of air quality models, which are tools that predict the fate of gases and aerosols upon their release into the atmosphere. Because of the large economic, public health, and environmental impacts often associated with the use of air quality model results, it is important that these models be properly evaluated.
A comprehensive model evaluation methodology makes use of scientific assessments of the model technical algorithms, statistical evaluations using field or laboratory data, and operational assessments by users in real-world applications. The focus of the current paper is on the statistical evaluation component. A statistical model evaluation exercise should start with clear definitions of the evaluation objectives and a specification of the hypotheses to be tested. A review is given of a set of model evaluation methodologies, including the BOOT and the ASTM evaluation software, Taylor’s nomogram, the figure of merit in space, and the CDF approach. Because there is no single best performance measure or best evaluation methodology, it is recommended that a suite of different performance measures be applied. Suggestions are given concerning the magnitudes of the performance measures expected of “good” models. For example, a good model should have a relative mean bias less than about 30% and a relative scatter less than about a factor of two.
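As a rough illustration of what a suite of performance measures can look like, the following Python sketch computes three measures commonly used in dispersion model evaluation: the fractional bias (FB), the normalized mean square error (NMSE), and the fraction of predictions within a factor of two of the observations (FAC2). The formulas, the sign convention for FB, the function names, and the sample concentrations are assumptions made for illustration; the text above does not specify them. The 0.3 threshold in the last line is one possible reading of the "relative mean bias less than about 30%" criterion.

```python
from statistics import fmean


def fractional_bias(obs, pred):
    """FB = 2 * (mean(obs) - mean(pred)) / (mean(obs) + mean(pred)).

    Assumed sign convention: positive FB means the model underpredicts.
    """
    mo, mp = fmean(obs), fmean(pred)
    return 2.0 * (mo - mp) / (mo + mp)


def nmse(obs, pred):
    """NMSE = mean((obs - pred)^2) / (mean(obs) * mean(pred))."""
    mo, mp = fmean(obs), fmean(pred)
    sq_err = [(o - p) ** 2 for o, p in zip(obs, pred)]
    return fmean(sq_err) / (mo * mp)


def fac2(obs, pred):
    """Fraction of pairs with 0.5 <= pred/obs <= 2.

    Pairs with a non-positive observation are counted as misses here;
    that choice is an assumption of this sketch.
    """
    hits = sum(1 for o, p in zip(obs, pred) if o > 0 and 0.5 <= p / o <= 2.0)
    return hits / len(obs)


if __name__ == "__main__":
    # Hypothetical paired concentrations (arbitrary units), not field data.
    observed = [1.0, 2.0, 4.0, 8.0]
    predicted = [1.5, 1.5, 5.0, 6.0]
    fb = fractional_bias(observed, predicted)
    print("FB   =", round(fb, 3))
    print("NMSE =", round(nmse(observed, predicted), 3))
    print("FAC2 =", fac2(observed, predicted))
    print("|FB| below 0.3:", abs(fb) < 0.3)
```

Reporting several measures together guards against a model that looks good on mean bias only because large over- and underpredictions cancel.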
To demonstrate some of the air quality model evaluation methodologies, two simple baseline urban dispersion models are evaluated using the Salt Lake City Urban 2000 field data. The importance of assumptions concerning details such as the minimum concentration and the pairing of data is shown. Typical plots and tables are presented, including determinations of whether the difference in the relative mean bias between the two models is statistically significant at the 95% confidence level.
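One way to test whether two models differ in relative mean bias at the 95% confidence level is a paired bootstrap, sketched below in Python. This is a minimal sketch under stated assumptions, not the procedure implemented in the BOOT or ASTM software, which may differ. The fractional-bias formula, the percentile confidence interval, the number of resamples, and all function names are illustrative choices that do not come from the text.

```python
import random
from statistics import fmean


def fractional_bias(obs, pred):
    """FB = 2 * (mean(obs) - mean(pred)) / (mean(obs) + mean(pred))."""
    mo, mp = fmean(obs), fmean(pred)
    return 2.0 * (mo - mp) / (mo + mp)


def bootstrap_fb_difference(obs, pred_a, pred_b, n_boot=2000, seed=0):
    """Percentile 95% interval for FB(model A) - FB(model B).

    The same resampled indices are applied to the observations and to both
    sets of predictions, so the pairing of the data is preserved.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(obs)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        o = [obs[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        diffs.append(fractional_bias(o, a) - fractional_bias(o, b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper


def significantly_different(interval):
    """True when the interval excludes zero."""
    lower, upper = interval
    return not (lower <= 0.0 <= upper)


if __name__ == "__main__":
    # Hypothetical concentrations (arbitrary units), not Urban 2000 data.
    observed = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0]
    model_a = [1.2, 1.8, 3.5, 4.0, 9.0, 11.0]
    model_b = [2.0, 3.5, 5.0, 9.0, 15.0, 20.0]
    ci = bootstrap_fb_difference(observed, model_a, model_b)
    print("95% interval for FB difference:", ci)
    print("significant:", significantly_different(ci))
```

If the interval excludes zero, the difference in bias between the two models is judged significant at roughly the 95% level under these assumptions.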