Closely related to the problem of reliability discussed in the previous chapter is the problem of determining whether differences between multiple performance evaluation measurements are statistically significant. When the interest lies in the statistical significance of differences between the performance evaluation scores of machine learning models, the discussion in the previous chapter showed that two major sources of randomness need to be taken into account. The first is the randomness of the test data sample on which the models are compared. The second is the inherent randomness of the machine learning procedure itself, exemplified by variations in meta-parameter settings.
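To make the two sources of randomness concrete, here is a minimal sketch of one simple way to let both enter a comparison: a two-level percentile bootstrap that resamples test items and training runs at the same time. It is an illustration under stated assumptions, not the method developed in this chapter. The sketch assumes per-item scores are available for several training runs of each model (for example, one run per meta-parameter setting), and that both models are scored on the same test items in the same order. The function name `two_level_bootstrap_ci` and its parameters are hypothetical.

```python
import numpy as np


def two_level_bootstrap_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(A) - mean(B) over runs and test items.

    Hypothetical helper for illustration. scores_a has shape (runs_a, n_items)
    and scores_b has shape (runs_b, n_items); rows are training runs
    (e.g. meta-parameter settings), columns are the shared test items.
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    if scores_a.shape[1] != scores_b.shape[1]:
        raise ValueError("both models must be scored on the same test items")

    rng = np.random.default_rng(seed)
    runs_a, n_items = scores_a.shape
    runs_b = scores_b.shape[0]
    deltas = np.empty(n_boot)
    for k in range(n_boot):
        # Source 1: randomness of the test sample. The same resampled item
        # indices are used for both models, since they share one test set.
        items = rng.integers(0, n_items, size=n_items)
        # Source 2: randomness of the learning procedure. Runs are resampled
        # independently per model, because training runs are not paired.
        ra = rng.integers(0, runs_a, size=runs_a)
        rb = rng.integers(0, runs_b, size=runs_b)
        deltas[k] = (scores_a[np.ix_(ra, items)].mean()
                     - scores_b[np.ix_(rb, items)].mean())

    lo, hi = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    observed = scores_a.mean() - scores_b.mean()
    return observed, lo, hi


if __name__ == "__main__":
    # Synthetic scores for illustration only: 5 training runs x 200 test items.
    rng = np.random.default_rng(1)
    item_effect = rng.normal(0.0, 0.1, size=200)
    a = (0.75 + item_effect + rng.normal(0.0, 0.02, size=(5, 1))
         + rng.normal(0.0, 0.05, size=(5, 200)))
    b = (0.72 + item_effect + rng.normal(0.0, 0.02, size=(5, 1))
         + rng.normal(0.0, 0.05, size=(5, 200)))
    delta, lo, hi = two_level_bootstrap_ci(a, b)
    print(f"delta = {delta:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

Under this sketch's assumptions, an interval that excludes 0 would suggest a significant difference at level `alpha`. Resampling only the items (or only the runs) would ignore one of the two sources of variation and would typically yield an interval that is too narrow.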