To check the congruence of two distributions, they can be plotted in a common diagram. However, a purely visual assessment of congruence is often insufficient. The following presents a selection of indicators that quantify the similarity or conformity of two distributions.
Comparison of distribution parameters
In general, distributions, i.e. their position, shape and properties, can be described by distribution parameters. Comparing these parameters makes it possible to determine how similar two distributions are, so they should be recorded for each distribution. These parameters are in particular:
- sample size \( N \) (equivalent to the total number of trips),
- weighted mean \( \bar{m} \) (the mean of the indicator, e.g. the mean trip distance),
- standard deviation \( s_{m} \),
- coefficient of variation \( V_{m} \),
- skewness of the distribution \( \gamma_m \) and
- percentiles of the distribution, e.g. \( Q_{0.05}, Q_{0.15}, Q_{0.25}, Q_{0.5}, Q_{0.75}, Q_{0.85}, Q_{0.95} \).
The calculation specifications of the parameters are shown in the appendix.
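As an illustration only, the parameters listed above can be computed from individual survey trips with expansion weights. The following is a minimal sketch that does not reproduce the appendix definitions: it assumes weighted population moments, and the helper `weighted_percentile` (a hypothetical name) returns the smallest value whose cumulative weight share reaches q, which is just one of several percentile conventions. The trip distances are invented.

```python
import math


def weighted_parameters(values, weights):
    """Distribution parameters of an indicator (e.g. trip distance in km).

    Assumption: weighted population moments (division by the sum of weights).
    """
    w_sum = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / w_sum
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / w_sum
    sd = math.sqrt(var)
    skew = sum(w * (x - mean) ** 3 for x, w in zip(values, weights)) / w_sum / sd ** 3
    return {
        "N": len(values),   # sample size = number of trips
        "mean": mean,       # weighted mean
        "sd": sd,           # standard deviation s_m
        "cv": sd / mean,    # coefficient of variation V_m
        "skew": skew,       # skewness gamma_m (third standardized moment)
    }


def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    target = q * sum(weights)
    cumulative = 0.0
    for x, w in pairs:
        cumulative += w
        if cumulative >= target:
            return x
    return pairs[-1][0]


# Hypothetical trip distances (km) with equal expansion weights.
distances = [1.0, 2.0, 3.0, 4.0, 10.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]

params = weighted_parameters(distances, weights)
percentiles = {q: weighted_percentile(distances, weights, q)
               for q in (0.05, 0.15, 0.25, 0.5, 0.75, 0.85, 0.95)}
print(params)
print(percentiles)
```

The single long trip of 10 km produces a positive skewness, i.e. a distribution skewed to the right, as is typical for trip length distributions.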
Statistical tests
Various statistical tests can be used to check for differences between distributions or between their parameters. For example, the t-test checks for significant differences in the means, and the F-test checks for significant differences in the variances (Backhaus et al. 2011).
In addition to these parametric tests, which examine the distribution parameters, there are non-parametric or distribution-free tests, which check whether two samples originate from a common population. One example is the Kolmogorov-Smirnov test, which is suitable for interval-scaled data (Herz et al. 1992).
The Kolmogorov-Smirnov test first determines the largest absolute deviation between the two cumulative relative frequency distributions. This maximum deviation is then compared with a critical value that depends on the sample sizes and the accepted significance level. It should be noted that the test assesses only relative frequencies.
In contrast to a household travel survey, a travel demand model represents a full census and therefore has a very large sample size. Because the critical value decreases as the sample sizes grow, the test criterion becomes very strict, which often results in a negative test result (i.e. the conclusion that the distributions do not originate from a common population).
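The procedure can be sketched for classified data as follows. This is a minimal sketch under stated assumptions: both distributions share the same classes, the asymptotic two-sample critical value \( c(\alpha)\sqrt{(n+m)/(nm)} \) with \( c(\alpha) = \sqrt{-\ln(\alpha/2)/2} \) is used, and, since the test strictly assumes continuous data, applying it to class frequencies is an approximation. The helper names and all numbers are hypothetical.

```python
import math


def ks_statistic(p, q):
    """Largest absolute difference between the cumulative relative frequencies
    of two classified distributions that share the same classes."""
    cum_p = cum_q = 0.0
    d_max = 0.0
    for p_i, q_i in zip(p, q):
        cum_p += p_i
        cum_q += q_i
        d_max = max(d_max, abs(cum_p - cum_q))
    return d_max


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic two-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha = 0.05
    return c_alpha * math.sqrt((n + m) / (n * m))


survey = [0.30, 0.40, 0.20, 0.10]  # hypothetical relative frequencies per distance class
model = [0.28, 0.41, 0.21, 0.10]

d = ks_statistic(survey, model)
for n_survey in (2_000, 50_000):
    d_crit = ks_critical_value(n_survey, 10_000_000)  # model with 10 million trips
    verdict = "rejected" if d > d_crit else "not rejected"
    print(f"N_survey={n_survey}: D={d:.4f}, D_crit={d_crit:.4f}, H0 {verdict}")
```

With the same maximum deviation D = 0.02, the hypothesis of a common population is kept for a survey of 2,000 trips but rejected for 50,000 trips, which illustrates how large samples make the test very strict.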
Quality indicators
Quality indicators measure the similarity of two distributions x and y. This section gives a brief overview of the available quality indicators; the calculation specifications for each indicator are given in the appendix.
Correlation Coefficient and Coefficient of Determination
The Correlation Coefficient R and the Coefficient of Determination \( R^{2} \) are quality indicators that measure the dependency between two (typically unclassified) datasets. R is dimensionless, ranges from −1.0 to 1.0 and reflects the strength of the linear relationship between the two datasets. \( R^{2} \) expresses the proportion of the variance explained by the independent variables of the model: the higher the Coefficient of Determination, the larger the proportion of the variance explained by the regression function.
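A minimal sketch of the Pearson Correlation Coefficient between modelled and observed values is shown below. The data are invented, and squaring R to obtain \( R^{2} \) is valid only for a simple linear regression with one predictor and an intercept; with several predictors, \( R^{2} \) has to be taken from the regression fit itself.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation coefficient R of two equally long data series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


observed = [120, 340, 260, 180, 100]   # hypothetical observed values
modelled = [130, 320, 270, 170, 110]   # hypothetical modelled values

r = correlation_coefficient(modelled, observed)
r_squared = r ** 2   # equals R^2 of a simple linear regression with intercept
print(f"R = {r:.4f}, R^2 = {r_squared:.4f}")
```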
The Coefficient of Determination should be interpreted critically: on the one hand, it does not indicate any causality of the observed correlation; on the other hand, it inevitably increases as the number of examined factors increases (Cambridge Systematics Inc. 2014; Backhaus et al. 2011). Therefore, \( R^{2} \) is suitable as a quality indicator only to a limited extent. The thresholds for acceptable \( R^{2} \) values given in the literature range from 0.88 and 0.95 (Cambridge Systematics Inc. 2014) to 0.98 (Department for Transport 2014).
Mean Absolute Error, Euclidean Distance and Root Mean Squared Error
A common quality indicator is the Mean Absolute Error (MAE). As the name implies, it is the average of the absolute deviations between the modelled and observed values. One way to turn the MAE into a dimensionless quality indicator is to divide it by the mean of the observed values (equivalently, to divide the sum of the absolute deviations by the sum of the observed values). The resulting quality indicator is called the Relative Mean Absolute Error (%MAE). The %MAE should not be confused with the Mean Absolute Percentage Error (MAPE), which averages the percentage deviations of the individual values (Vandeput 2019).
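The difference between %MAE and MAPE can be seen in a short sketch. It assumes that %MAE is the MAE divided by the mean observed value (equivalently, total absolute deviation divided by total observed value), following Vandeput's convention; the appendix of this work remains the authoritative definition. The class frequencies are invented.

```python
def mae(modelled, observed):
    """Mean Absolute Error: average absolute deviation per class."""
    return sum(abs(m - o) for m, o in zip(modelled, observed)) / len(observed)


def relative_mae(modelled, observed):
    """%MAE: MAE divided by the mean observed value (assumed scaling)."""
    mean_observed = sum(observed) / len(observed)
    return mae(modelled, observed) / mean_observed


def mape(modelled, observed):
    """MAPE: mean of the per-class relative deviations (observed values must be non-zero)."""
    return sum(abs(m - o) / o for m, o in zip(modelled, observed)) / len(observed)


observed = [0.40, 0.30, 0.20, 0.10]   # hypothetical relative frequencies per class
modelled = [0.35, 0.30, 0.25, 0.10]

print(f"MAE  = {mae(modelled, observed):.4f}")
print(f"%MAE = {relative_mae(modelled, observed):.4f}")
print(f"MAPE = {mape(modelled, observed):.4f}")
```

Both deviations have the same size (0.05), but MAPE weights the one in the smaller class more heavily, so the two relative measures differ (0.1 versus 0.09375).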
The Euclidean Distance d is the square root of the sum of the squared deviations across all distribution classes. Since, as explained later, the use of relative frequencies is advisable, a standardization to a common unit, as recommended e.g. by Backhaus et al. (2011), is not necessary.
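A minimal sketch of the Euclidean Distance on relative frequencies follows. Converting both distributions to shares puts survey and model on a common scale even though their trip totals differ by two orders of magnitude, so no further standardization is applied. The trip counts and helper names are hypothetical.

```python
import math


def to_relative(counts):
    """Convert absolute frequencies per class into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def euclidean_distance(p, q):
    """Square root of the sum of squared deviations across all classes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


survey_counts = [400, 300, 200, 100]              # hypothetical survey trips per class
model_counts = [35_000, 30_000, 25_000, 10_000]   # hypothetical model trips per class

d = euclidean_distance(to_relative(model_counts), to_relative(survey_counts))
print(f"d = {d:.4f}")
```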
Another common quality indicator is the Root Mean Squared Error (RMSE) and its relative, dimensionless form (%RMSE). Because the deviations are squared, the RMSE weights large errors more strongly than the MAE does, so they have a stronger influence on the evaluation result (Cambridge Systematics Inc. 2014; Vandeput 2019).
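The stronger weighting of large errors can be shown with two invented model distributions that have the same MAE: one concentrates the error in two classes, the other spreads it over four. The sketch assumes %RMSE is the RMSE divided by the mean observed value; the function names are illustrative.

```python
import math


def mae(modelled, observed):
    """Mean Absolute Error across all classes."""
    return sum(abs(m - o) for m, o in zip(modelled, observed)) / len(observed)


def rmse(modelled, observed):
    """Root Mean Squared Error across all classes."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(modelled, observed)) / len(observed))


def relative_rmse(modelled, observed):
    """%RMSE: RMSE divided by the mean observed value (assumed scaling)."""
    return rmse(modelled, observed) / (sum(observed) / len(observed))


observed = [0.400, 0.300, 0.200, 0.100]
concentrated = [0.350, 0.300, 0.250, 0.100]   # two classes off by 0.05
spread = [0.375, 0.325, 0.175, 0.125]         # four classes off by 0.025

for name, modelled in (("concentrated", concentrated), ("spread", spread)):
    print(f"{name}: MAE={mae(modelled, observed):.4f}, "
          f"RMSE={rmse(modelled, observed):.4f}, "
          f"%RMSE={relative_rmse(modelled, observed):.4f}")
```

Both variants have an MAE of 0.025, but the RMSE rises from 0.025 to about 0.035 when the same total error is concentrated in fewer classes.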
Theil’s Forecast Accuracy Coefficient
Theil’s Forecast Accuracy Coefficient examines the conformity of two distributions. Confusingly, two different indicators have been developed under the same name U. To distinguish them, they are referred to as \( U_{1} \) and \( U_{2} \).
- \( U_{1} \) ranges from 0 to 1, where 0 indicates a perfect match and 1 the worst match. The worst match occurs either when the considered distributions are negatively proportional or when one of them is constantly equal to zero. Since every distribution is rated better than a naïve forecast, \( U_{1} \) cannot be interpreted unambiguously and should therefore not be used (Bliemel 1973; Andres and Spiwoks 2000).
- \( U_{2} \) ranges from 0 to ∞, where 0 indicates a perfect match. At \( U_{2} = 1 \), the comparative distribution has the same quality as a naïve forecast. Consequently, \( U_{2} > 1 \) indicates that the comparative distribution performs worse than even a naïve forecast. \( U_{2} \) is therefore always preferable to \( U_{1} \) (Bliemel 1973; Andres and Spiwoks 2000).
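The range properties described above can be checked with a small sketch. It assumes the common forms \( U_{1} = \sqrt{\sum (m_i - o_i)^2} / (\sqrt{\sum m_i^2} + \sqrt{\sum o_i^2}) \) and \( U_{2} = \sqrt{\sum (m_i - o_i)^2} / \sqrt{\sum o_i^2} \), applied directly to class frequencies; a distribution of zeros stands in for the naïve forecast, analogous to the zero-change forecast in Theil's change-based formulation. How this work defines the naïve reference for classified data is given in its appendix and may differ.

```python
import math


def theil_u1(modelled, observed):
    """Assumed form: sqrt(sum (m - o)^2) / (sqrt(sum m^2) + sqrt(sum o^2)), in [0, 1]."""
    num = math.sqrt(sum((m - o) ** 2 for m, o in zip(modelled, observed)))
    den = math.sqrt(sum(m ** 2 for m in modelled)) + math.sqrt(sum(o ** 2 for o in observed))
    return num / den


def theil_u2(modelled, observed):
    """Assumed form: sqrt(sum (m - o)^2) / sqrt(sum o^2); an all-zero forecast gives 1."""
    num = math.sqrt(sum((m - o) ** 2 for m, o in zip(modelled, observed)))
    return num / math.sqrt(sum(o ** 2 for o in observed))


observed = [0.40, 0.30, 0.20, 0.10]   # hypothetical relative frequencies per class
modelled = [0.35, 0.30, 0.25, 0.10]
naive = [0.0] * len(observed)         # stand-in for the naive forecast (assumption)

print(f"U1 = {theil_u1(modelled, observed):.4f}, U2 = {theil_u2(modelled, observed):.4f}")
print(f"naive: U1 = {theil_u1(naive, observed):.1f}, U2 = {theil_u2(naive, observed):.1f}")
```

Under these assumed forms, the naïve forecast and a negatively proportional distribution both obtain \( U_{1} = 1 \), whereas \( U_{2} = 1 \) marks the naïve forecast as a clear reference point.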
Furthermore, the deviations between two distributions can be analysed in more detail by decomposing the error into three components \( U^{M} \), \( U^{S} \) and \( U^{C} \) (FGSV 2006; Bliemel 1973; Andres and Spiwoks 2000).
- \( U^{M} \) indicates the proportion of the mean squared error that results from unequal mean values, i.e. from a systematic over- or underestimation.
- \( U^{S} \) indicates the proportion that results from systematic differences in the variances.
- \( U^{C} \) indicates the proportion that results from the lack of a perfect linear correlation between the two distributions. It captures unsystematic, random errors.
According to Andres and Spiwoks (2000) and FGSV (2006), good conformity can be assumed if \( U^{M} \) and \( U^{S} \) are close to 0 (\( U^{M}, U^{S} \to 0 \), or at least \( U^{M}, U^{S} < 0.2 \)) and the total error is mostly due to unsystematic errors (\( U^{C} \to 1 \)).
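A minimal sketch of the decomposition, assuming the standard identity \( \mathrm{MSE} = (\bar{m} - \bar{o})^2 + (s_m - s_o)^2 + 2(1 - R)\,s_m s_o \) with population standard deviations, so that the three proportions sum to 1. The last term is computed via the covariance to avoid dividing by a zero standard deviation. The absolute trip counts are invented, and the function name is illustrative.

```python
import math


def theil_decomposition(modelled, observed):
    """Split the mean squared error into bias (U^M), variance (U^S) and
    covariance (U^C) proportions; the three proportions sum to 1."""
    n = len(observed)
    mean_m = sum(modelled) / n
    mean_o = sum(observed) / n
    mse = sum((m - o) ** 2 for m, o in zip(modelled, observed)) / n
    if mse == 0:
        raise ValueError("perfect match: the decomposition is undefined")
    s_m = math.sqrt(sum((m - mean_m) ** 2 for m in modelled) / n)
    s_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modelled, observed)) / n
    u_m = (mean_m - mean_o) ** 2 / mse
    u_s = (s_m - s_o) ** 2 / mse
    u_c = 2 * (s_m * s_o - cov) / mse   # equals 2 (1 - R) s_m s_o / MSE
    return u_m, u_s, u_c


observed = [400, 300, 200, 100]   # hypothetical absolute trips per class
modelled = [380, 330, 230, 110]

u_m, u_s, u_c = theil_decomposition(modelled, observed)
good = u_m < 0.2 and u_s < 0.2    # threshold cited in the text above
print(f"U^M={u_m:.3f}, U^S={u_s:.3f}, U^C={u_c:.3f}, good conformity: {good}")
```

In this example the model overestimates the total, so \( U^{M} \) exceeds 0.2; a pure shift of every class by the same amount would give \( U^{M} = 1 \).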
Vortisch’s Indicator of Similarity
A distance indicator presented by Vortisch (2006) takes into account the similarity in form and position of two distributions (see Fig. 4).
The Correlation Coefficient R describes the similarity of the shapes. In addition, the positional similarity θ and the overlap of the domains σ are determined. These components are then combined into an overall distance indicator ∆. If the distributions match perfectly, ∆ = 0; with increasing difference, ∆ approaches 1.
Coincidence Ratio
The Coincidence Ratio (CR) determines the degree to which two distributions overlap (see Fig. 5). The inputs for the calculation are the relative frequencies of the classified distributions. The CR ranges from 0 to 1, where 1 indicates a perfect match and 0 indicates no overlap. According to the Travel Model Validation and Reasonableness Checking Manual (Cambridge Systematics Inc. 2014), CR > 0.7 indicates a high degree of congruence.
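A minimal sketch of the Coincidence Ratio, assuming the common definition as the sum of the class-wise minima divided by the sum of the class-wise maxima of the two relative frequency distributions (our reading of the Travel Model Validation and Reasonableness Checking Manual). The class shares are invented.

```python
def coincidence_ratio(p, q):
    """Sum of the class-wise minima divided by the sum of the class-wise maxima."""
    overlap = sum(min(a, b) for a, b in zip(p, q))
    union = sum(max(a, b) for a, b in zip(p, q))
    return overlap / union


survey = [0.40, 0.30, 0.20, 0.10]   # hypothetical relative frequencies per class
model = [0.35, 0.30, 0.25, 0.10]

cr = coincidence_ratio(model, survey)
print(f"CR = {cr:.3f}, high congruence: {cr > 0.7}")
```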
Absolute or relative frequencies
As shown in Table 2, a comparison of two distributions can use either relative or absolute frequencies. This choice influences the values of the quality indicators. Hence, before two distributions are evaluated in terms of their similarity, it must be clarified whether the quality indicator should operate on absolute or relative frequencies. Since distance and time distributions derived from surveys are usually provided only as relative values, relative frequencies should be used. In that case, the absolute number of trips in a travel demand model should be checked separately, directly after trip generation. If this trip generation validation is omitted, systematic differences, such as consistently too few trips in every class, are not detected.
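The need for a separate check of absolute trip numbers can be illustrated with a short sketch: scaling every class by the same factor leaves the relative frequencies, and hence any indicator computed from them, unchanged, so a systematic shortfall only shows up when the total is compared directly. All numbers and names are hypothetical.

```python
def to_relative(counts):
    """Relative frequency of each class."""
    total = sum(counts)
    return [c / total for c in counts]


model_trips = [35_000, 30_000, 25_000, 10_000]   # hypothetical trips per distance class
reference_total = 120_000                         # hypothetical expected total after trip generation

# 20 % fewer trips in every class: the relative distribution is unchanged ...
fewer_trips = [0.8 * c for c in model_trips]
same_shape = all(abs(a - b) < 1e-12
                 for a, b in zip(to_relative(model_trips), to_relative(fewer_trips)))

# ... so the absolute total has to be checked separately after trip generation.
total_deviation = (sum(model_trips) - reference_total) / reference_total
print(f"same relative distribution: {same_shape}, total deviation: {total_deviation:+.1%}")
```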
Summary on quality indicators
Table 2 compares the presented quality indicators by the following criteria (☒ meaning “yes” and ☐ meaning “no”):
- What is the possible range of values of the indicators?
- Does the calculation specification require absolute or relative frequencies?
- Is there a difference in the resulting values of the indicator when using relative and absolute frequencies?
- Is there a possibility for in-depth analysis using the same indicator?
- Are there applicable thresholds for the indicator?
Table 2 Comparison of quality indicators
In a benchmark comparison, more than 30 equiquantile distributions with 10 classes each were compared. They were partly derived from real travel demand models and partly generated systematically in order to examine specific characteristics of the indicators (see Fig. 6, top). The box plots in Fig. 6 (bottom) show the result of this comparison. The following conclusions can be drawn:
- Correlation Coefficient R and Coefficient of Determination \( R^{2} \) fail for ideal equiquantile distributions: every class of the reference distribution contains the same share of demand, which therefore equals the mean, so the variance of the reference is zero and the expression cannot be evaluated (division by zero).
- The results for Euclidean Distance d, Mean Absolute Error MAE and Root Mean Squared Error RMSE span only a very narrow range in the present case of relative frequencies. Therefore, their relative forms, Relative Mean Absolute Error %MAE and Relative Root Mean Squared Error %RMSE, are preferable. A general statement as to which of these quality indicators is more suitable is not possible, since this depends, among other things, on the accuracy requirements of the respective model.
- For relative frequencies of classified data, as in this case, the expressions for Relative Root Mean Squared Error %RMSE and Theil’s Forecast Accuracy Coefficient \( U_{2} \) are mathematically identical.
- Owing to their options for in-depth analysis, Theil’s Forecast Accuracy Coefficient \( U_{2} \) and Vortisch’s Indicator of Similarity ∆ are suitable. However, ∆ is less sensitive: even in the case of large deviations, it does not make full use of its value range.
- The Coincidence Ratio CR is sensitive to changes and can take values across its entire value range, and its range from 0 to 1 makes it straightforward to interpret. In addition, the specific thresholds from the Travel Model Validation and Reasonableness Checking Manual (Cambridge Systematics Inc. 2014) provide good orientation for evaluating quality.
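The findings on R and on %RMSE versus \( U_{2} \) listed above can be reproduced with an ideal equiquantile reference of K = 10 classes, each holding 1/K of the demand. The sketch assumes %RMSE = RMSE divided by the mean observed value and \( U_{2} = \sqrt{\sum e_i^2} / \sqrt{\sum o_i^2} \); the appendix of this work is authoritative. With \( o_i = 1/K \), \( \sqrt{\sum o_i^2} = 1/\sqrt{K} \) and the mean is \( 1/K \), so \( \%RMSE = K\sqrt{\sum e_i^2 / K} = \sqrt{K}\sqrt{\sum e_i^2} = U_{2} \). The model distribution is invented.

```python
import math

K = 10
reference = [1.0 / K] * K   # ideal equiquantile reference: each class holds 10 %
model = [0.08, 0.12, 0.10, 0.09, 0.11, 0.10, 0.12, 0.08, 0.10, 0.10]  # hypothetical


def correlation_coefficient(x, y):
    """Pearson R, or None if one series is constant (zero variance)."""
    if max(x) == min(x) or max(y) == min(y):
        return None
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


squared_errors = [(m - o) ** 2 for m, o in zip(model, reference)]
pct_rmse = math.sqrt(sum(squared_errors) / K) / (sum(reference) / K)
theil_u2 = math.sqrt(sum(squared_errors)) / math.sqrt(sum(o ** 2 for o in reference))

print("R:", correlation_coefficient(model, reference))   # None: reference has zero variance
print(f"%RMSE = {pct_rmse:.6f}, U2 = {theil_u2:.6f}")
```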
In summary, the Coincidence Ratio CR, the Relative Root Mean Squared Error %RMSE (or Theil’s Forecast Accuracy Coefficient \( U_{2} \)) and the Relative Mean Absolute Error %MAE appear suitable for evaluating equiquantile classified relative frequency distributions. In addition, the three components of Theil’s Forecast Accuracy Coefficient, \( U^{M} \), \( U^{S} \) and \( U^{C} \), are useful for in-depth analyses.