Abstract
In travel demand modelling, trip distance distributions or trip time distributions are used to evaluate how well a model fits with observed sample data. Therefore, the comparison of distributions is an essential part of the model validation process. Despite its importance, the common modelling guidelines from the UK, the USA or Austria provide little information about the correct structure and handling of such distributions. Likewise, common statistical methods are not practicable for the validation of transport models. This lack of rules leads to individual solutions, which complicate model validation and the comparison of models. For example, when comparing two distributions the quality indicator strongly depends on the number of classes. Therefore, guidelines for model validation need to suggest an appropriate way to determine the number of classes. The paper suggests a method for evaluating trip distance distributions and trip time distributions within the model validation process of a travel demand model. It proposes (a) indicators for a classification which consider mode-specific trip distances and trip times, (b) a generic classification method based on an equiquantile class width and quality indicators for comparing two distributions, and (c) the use of relative frequencies instead of absolute frequencies for the calculation of the quality indicators.
Motivation
Within the process of building a travel demand model, calibration and validation are two iterative and repetitive steps. Calibration describes the process of modifying the model: refinement of specifications, correction of (previously undiscovered) erroneous input data or adjustment of (non-empirically verified) parameters. The validation, on the other hand, describes the process of testing whether the model has reached a certain goodness-of-fit. This goodness-of-fit does not refer to a particular key performance indicator. Ideally, before the model building process starts, the tendering institution determines how accurately the model should reproduce e.g. number of trips, traffic volumes, mean trip distances and mean trip times for each mode and trip purpose, keeping in mind the availability of reference data and its quality. If the validation process concludes that the model quality is not sufficient, adjustments are made to the model in the step of calibration.
Frequency distributions, which are another type of key performance indicators, describe travel behaviour using specific indicators in discrete classes. There are two common cases: frequency distributions of (1) the time travelled, which are also known as trip time distributions or trip time frequency distributions, and (2) the distance travelled commonly known as trip distance distributions or trip length frequency distributions.
Among others, those distributions are an important tool to evaluate how well a model fits with observed sample data. Therefore, the comparison of distributions is an essential part in the model validation process. Despite its importance the common modelling guidelines from the UK (WebTAG (Department for Transport 2014)), the USA (Travel Model Validation and Reasonableness Checking Manual (Cambridge Systematics Inc. 2014)) or Austria (draft of Qualivermo (Sammer et al. 2012)) provide little information about the correct structure and handling of such distributions. Likewise, common statistical methods that check whether two samples derive from a common population are not practicable for application during the validation of transport models, as will be explained later. This lack of rules leads to individual solutions, which complicate the model validation process, the comparison of models and the definition of thresholds for quality indicators in guidelines.
Fig. 1 is intended to illustrate this issue. The same data set (number of trips with their corresponding trip length from a household travel survey and from a travel demand model) is classified into discrete classes in three different ways: the left figure is classified into 20 equidistant classes (1.5 km each), the middle figure into 10 equidistant classes (3 km each) and the right figure into 5 equidistant classes (6 km each). The overlap of the observed and modelled distribution, which is measured by the Coincidence Ratio (CR, which will be explained in more detail later), shows that the lower the number of classes, the greater the overlap. In other words, in an extreme case with only one or two classes, there would be a very high overlap. In this example, the classes would only have to be large enough to reach the threshold of CR ≥ 0.7 proposed in the Travel Model Validation and Reasonableness Checking Manual (Cambridge Systematics Inc. 2014). Consequently, the modelled distribution in the left figure would probably be rejected, while the others would be acceptable, although the underlying data set is the same in all three figures.
In general, this means that the resulting distribution evaluation is meaningless if there are no rules for creating the distribution, because even a poor model can be twisted so that the results look fine. Thus the overall goodness-of-fit is at risk. Therefore, guidelines for model validation need to suggest an appropriate way to build the distributions.
This is one of the reasons for many misunderstandings between the model users (tendering institutions) and the model developers (contractors) in the context of model development. When validating a model with distributions of time and distance the following questions inevitably appear and they should be answered beforehand by the tendering institutions:

What indicator should be used for the classification?

Which area is covered by the distribution?

What is the reference for a comparison of distributions?

How many classes should be distinguished? What is the appropriate size of each class? Should the class size increase with the distance travelled? How to proceed with empty classes?

What quality indicator should be used when comparing two distributions? Should absolute or relative frequencies or both be evaluated?
The questions above are also related to the issue that the classification often depends on the dimensions of the study area. This is another reason for individual solutions that cannot be compared. In order to provide general guidance on how to handle frequency distributions within the model validation process, this paper proposes a standardized classification method that can be applied independently of the indicator used and of the dimensions of the study area. For the data set in the example in Fig. 1, this would lead to a single generic classification. It is therefore a necessary complement to the existing guidelines in order to make model validation results easier to evaluate and compare.
Selection of indicators for a classification
In a trip distance or a trip time distribution, the trips are assigned to discrete classes. This requires the number of trips as well as distance or time values on the level of origin–destination pairs (OD pairs).
Figure 2 shows a simple example with two OD pairs (from A to B and from B to C) and two modes (car and public transport, PuT). The modes require different amounts of time to cover the distance of an OD pair. This can lead to cases where the trips of one OD pair fall in different classes for different modes. Such cases occur frequently for trip time, but they can also occur for trip distance, as public transport trips are often longer than car trips. In the example of Fig. 2 the lower left chart shows that the travel demand for each OD pair is assigned to different trip time classes, because a mode-specific indicator was used for classification.
This effect is difficult to understand in more complex multimodal trip distributions. In order to avoid this effect entirely and for comparisons on OD pair level (e.g. a distance-dependent modal share), the travel demand of all modes on one OD pair should fall into the same class.
To achieve this, a mode-independent indicator should be used for the classification. In case of trip distance distributions, the direct distance between the origin and destination of an OD pair can serve as a “natural” mode-independent indicator. For trip time distributions, such a “natural” mode-independent indicator does not exist. An indicator weighted with the number of trips provides a solution for this.
In the example of Fig. 2, the lower right chart displays a trip time distribution, which uses a mode-independent indicator for classification. As a result, both car and PuT travel demand fall in the same trip time class for each OD pair.
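One possible reading of a trip time indicator "weighted with the number of trips" is the demand-weighted mean of the mode-specific trip times per OD pair. The following sketch illustrates this under that assumption; the function name `mode_independent_time` and the numerical values are hypothetical and are not taken from Fig. 2.

```python
import numpy as np

def mode_independent_time(demand_by_mode, time_by_mode):
    """Demand-weighted mean trip time per OD pair across all modes.

    Both arguments are sequences of zone-by-zone matrices, one per mode
    (assumed layout, not prescribed by the paper).
    """
    d = np.stack([np.asarray(m, dtype=float) for m in demand_by_mode])
    t = np.stack([np.asarray(m, dtype=float) for m in time_by_mode])
    total = d.sum(axis=0)              # trips of all modes per OD pair
    weighted = (d * t).sum(axis=0)     # trip-minutes of all modes per OD pair
    # OD pairs without any demand have no defined mean and are set to NaN
    return np.divide(weighted, total,
                     out=np.full(total.shape, np.nan), where=total > 0)

# Illustrative values: 30 car trips at 20 min and 10 PuT trips at 40 min
car_demand = [[0, 30], [0, 0]]
put_demand = [[0, 10], [0, 0]]
car_time = [[0, 20], [0, 0]]
put_time = [[0, 40], [0, 0]]
t_od = mode_independent_time([car_demand, put_demand], [car_time, put_time])
print(t_od[0, 1])
```

With such an indicator, car and PuT demand of one OD pair are classified by the same value and therefore fall into the same class.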
Table 1 summarizes the options for selecting a frequency distribution indicator.
A standardized classification requires rules for handling intrazonal trips. In the case of a macroscopic travel demand matrix the main diagonal represents the intrazonal trips. Since this part of the demand is not assigned to the network model, indicators like trip time or trip distance cannot be calculated, but must be estimated in such a way that they represent an average movement within a zone. Those mean indicators will never fit to an observed intrazonal trip. Therefore, when using distribution comparisons for model validation it is advisable to exclude the intrazonal trips.
Nevertheless, as Bhatta and Larsen (2011) point out, omitting intrazonal trips leads to different results in model estimation. Therefore, the model developers are obligated to check the intrazonal trips in a separate analysis. Such an analysis should investigate the number and the modal split of intrazonal trips. A separate examination for different zone sizes may also be reasonable.
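For a macroscopic demand matrix, separating the intrazonal trips amounts to splitting off the main diagonal. A minimal sketch, assuming a square zone-by-zone matrix; the function name `split_intrazonal` and the demand values are illustrative only:

```python
import numpy as np

def split_intrazonal(demand):
    """Return (interzonal matrix, intrazonal trips per zone)."""
    d = np.asarray(demand, dtype=float)
    intra = np.diag(d).copy()      # main diagonal = intrazonal trips
    inter = d.copy()
    np.fill_diagonal(inter, 0.0)   # excluded from the distribution comparison
    return inter, intra

# Hypothetical 3-zone demand matrix
demand = np.array([[50.0, 20.0, 10.0],
                   [15.0, 80.0, 5.0],
                   [10.0, 5.0, 40.0]])
inter, intra = split_intrazonal(demand)
intra_share = intra.sum() / demand.sum()
print(f"intrazonal share of total demand: {intra_share:.1%}")
```

The interzonal matrix then enters the distribution comparison, while the intrazonal vector (per mode, if available) feeds the separate analysis of number and modal split.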
Selection of the study area and a reference distribution
A travel demand model computes demand for a defined study area. The model is calibrated with observed data from a household travel survey. This survey ideally covers a sample of the population from the entire study area. If the study area of the survey and the study area of the travel demand model do not match completely, the comparison of observed and modelled values should only include trips starting and ending in the common study area. This also means that observed trips leaving the common study area have to be excluded from the survey when comparing the observed and the modelled distributions. A comparison of two distributions is only meaningful if observed and modelled trips relate to the same area. A separate analysis is recommended for modelled trips where no comparison data is available.
An additional point to consider is that only home-based trips can be used for validation. For example, modelled trips from home to work can be compared directly with survey data, while modelled trips from shopping to leisure are not necessarily made by residents of the study area and are therefore not comparable to a survey of the study area.
Calibrating the destination choice of a travel demand model requires observed distance or time distributions from a household travel survey as a reference distribution. Since interviewed persons tend to round estimates of distance and time for their reported trips, reported distance and time values should not be used directly from the survey. Instead, the distance and time values should be computed with the model using the geocoded origins and destinations from the survey (FGSV 2012).
As Sammer et al. (2018) point out, it should be noted that a household travel survey always contains a systematic error, which results from underreporting of travel behaviour. Hence, even a complete fit of observed and modelled distribution does not mean that a realistic behaviour is modelled. Sammer et al. (2018) give advice on how to increase the quality of a household travel survey.
The purpose of a model is to analyse impacts of scenarios compared to the impacts of a base case. For this, planners look at key performance indicators. Key performance indicators can be single values (e.g. volumes at certain locations, total distance travelled and total time spent) and distributions of travel time and distance. For comparing distributions of a base case and scenarios, distributions derived from the base case of the travel demand model replace observed distributions from the household travel survey as reference distribution.
Selection of a classification method
Class width versus class size
Usually distribution classes are determined in such a way that the class width and a number of classes are predefined. Based on the classification indicator the demand is then assigned to the resulting classes. If the class width is the same across all distribution classes (and the number of classes is infinite in an extreme case), the term “equidistant” distribution is used. However, in many cases insufficiently populated classes are aggregated, resulting in classes of varying width.
A different approach defines the share of demand in each class and uses it to determine the class width. The resulting classes vary in width but they are equally populated. This method of classifying distributions is called “equiquantile” (based on e.g. Paluš (1995)).
In the case of an equidistant classification, three questions need to be addressed:

What is the size of the classes?

How many classes should there be in total?

Should classes be grouped together and, if so, how?
On the other hand, the method of equiquantile distribution only raises the question about the number of classes. For this reason, the equiquantile classification method is presented first. Based on this, a procedure is explained to answer the question about the class width of equidistant distributions.
Analysis with equiquantile classes
The class boundaries of this classification method are calculated using weighted quantiles, with the elements of the indicator matrix representing the classification variable and the demand matrix elements representing the weight. The calculation used for the classification and an example calculation are shown in the appendix.
Although all classes should have the same quantity, the actual demand per class may differ slightly from the desired quantity. This may be due to the following reasons:

A discrete demand cannot always be distributed completely and evenly across all classes. This is the case, when \( \left( {demand} \right)\,\bmod \,\left( {number\;of\;classes} \right) \ne 0. \) As a result, at least one class has a greater demand than other classes. The smaller the population, the greater the deviation from theoretically equal classes. However, this effect is reduced as the sample size increases.

A single OD pair with a large demand can also lead to a distortion, for example when \( demand_{od} > demand_{total} /number\;of\;classes. \)

A rounding of class boundaries prior to classification can lead to shifts in demand between neighbouring classes.
As mentioned above, the number of classes must be defined for the equiquantile classification. Dividing the demand into 10% steps, i.e. into ten classes, is one possible pragmatic assumption. Figure 3 shows a sample of a trip distance distribution. It is shown that each class has approximately the same number of trips and the resulting graph approximates a straight line.
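The weighted-quantile idea can be sketched as follows. This is only one possible implementation, not the calculation from the appendix: it assumes that each inner class boundary is the indicator value of the OD pair at which the cumulative demand share first reaches k/n, and that classes include their upper boundary. The function names are hypothetical.

```python
import numpy as np

def equiquantile_boundaries(indicator, demand, n_classes=10):
    """Class boundaries from demand-weighted quantiles of the indicator."""
    x = np.asarray(indicator, dtype=float).ravel()
    w = np.asarray(demand, dtype=float).ravel()
    keep = w > 0                               # OD pairs without demand are ignored
    x, w = x[keep], w[keep]
    order = np.argsort(x, kind="stable")
    x, w = x[order], w[order]
    cum_share = np.cumsum(w) / w.sum()         # cumulative demand share
    targets = np.arange(1, n_classes) / n_classes
    idx = np.searchsorted(cum_share, targets, side="left")
    return np.concatenate(([x[0]], x[idx], [x[-1]]))

def demand_per_class(indicator, demand, boundaries):
    """Sum the demand in each class (upper boundary inclusive)."""
    x = np.asarray(indicator, dtype=float).ravel()
    w = np.asarray(demand, dtype=float).ravel()
    cls = np.searchsorted(boundaries[1:-1], x, side="left")
    return np.bincount(cls, weights=w, minlength=len(boundaries) - 1)

# Illustrative data: six OD pairs, one of them with a large demand
distance = [1, 2, 3, 4, 5, 6]
trips = [10, 30, 20, 20, 10, 10]
bounds = equiquantile_boundaries(distance, trips, n_classes=4)
print(bounds, demand_per_class(distance, trips, bounds))
```

In this toy example the OD pair with 30 trips exceeds the target of 25 trips per class, so the first class ends up with 40 trips and the others with 20 each, which illustrates the distortion caused by a single OD pair with large demand.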
Equiquantile classes can be applied for a particular demand segment, for the total demand and for the person distance travelled. Those applications are described in detail in the appendix.
Analysis with equidistant classes
As previously mentioned, the number of classes and the class width must be predefined for the equidistant classification method. A common procedure to determine those parameters is Sturges’ rule for constructing histograms, which is discussed e.g. by Hyndman (1995).
Another way of determining the class width is to define the smallest class of the equiquantile distributed total demand as the equidistant class width. With ten equiquantile classes, this class width corresponds to the narrowest class containing 10% of the total demand. In a strictly equidistant distribution, the number of classes is not important, i.e. classes of equal width are created until the largest classification indicator is reached. However, in this case empty classes may occur.
In a variation of the equidistant classification, the statistically unreliable, i.e. sparsely populated, classes are aggregated. However, in order to use this particular classification method, the input parameters number of classes, size of the grouped classes and minimum occupancy of a class need to be defined in advance. Qualivermo (Sammer et al. 2012) suggests class widths for these cases:
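Deriving the equidistant class width from existing equiquantile boundaries could look like the sketch below. It assumes that the classes start at zero and that zero-width equiquantile classes (which can arise from a single dominant OD pair) are ignored; both choices, the function name and the boundary values are illustrative assumptions.

```python
import numpy as np

def equidistant_edges(equiquantile_bounds, max_indicator):
    """Class width = narrowest equiquantile class; edges from 0 to the maximum."""
    widths = np.diff(np.asarray(equiquantile_bounds, dtype=float))
    widths = widths[widths > 0]                  # skip degenerate classes
    width = widths.min()
    n_classes = int(np.ceil(max_indicator / width))
    return width, width * np.arange(n_classes + 1)

# Hypothetical equiquantile boundaries in km
bounds = [0.5, 2.0, 4.0, 7.0, 12.0, 30.0]
width, edges = equidistant_edges(bounds, max_indicator=30.0)
print(width, len(edges) - 1)
```

The resulting edges can be passed to `np.histogram(indicator, bins=edges, weights=demand)`; classes at the far end of the range may then be empty, as noted above.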

for metropolitan traffic in short distance areas: 2 km or 5 min,

for regional and long-distance traffic: 5 km or 10 min.
Application of distributions for validation and presentation purposes
The equidistant classification method requires three parameters as input (number, size, aggregation rules of classes). This leads to several reasonable combinations of parameters. In contrast, the equiquantile method requires only one parameter (number of classes). As the equiquantile method is the more generic representation of a demand distribution, it should be used for model validation purposes.
Results from an equidistant classification display typical patterns of travel demand, e.g. decreasing demand with increasing distance. This may be helpful for interpreting results. Thus, an equidistant classification should be used for presentation purposes, but not for validation purposes.
Selection of a quality indicator for distribution comparisons
To check the congruence of two distributions they can be plotted in a common diagram. However, a mere visual examination of congruence is often not adequate. In the following, a selection of different indicators to quantify the similarity or conformity of two distributions is presented.
Comparison of distribution parameters
In general, distributions, i.e. their position, appearance and properties, can be described by distribution parameters. By a comparison of those distribution parameters, it is possible to determine the similarity between two distributions. Therefore, it is essential to record them for each distribution. These parameters are in particular:

sample size N (this is equivalent to the total number of trips),

weighted mean \( \bar{m} \) (this represents the mean indicator, e.g. mean trip distance),

standard deviation \( s_{m} \),

coefficient of variation \( V_{m} \),

skew of the distribution \( \gamma_m \) and

percentiles of the distribution, e.g. \( Q_{0.05} ,\;Q_{0.15} ,\;Q_{0.25} ,\;Q_{0.5} ,\;Q_{0.75} ,\;Q_{0.85} ,\;Q_{0.95} . \)
The calculation specifications of the parameters are shown in the appendix.
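As a rough illustration of these parameters, the following sketch computes them from an indicator vector and the corresponding demand as weights. The formulas used here are the usual demand-weighted population moments and a step-function weighted percentile; they are assumptions and may differ in detail from the calculation specifications in the appendix.

```python
import numpy as np

def distribution_parameters(indicator, demand,
                            probs=(0.05, 0.15, 0.25, 0.5, 0.75, 0.85, 0.95)):
    """Demand-weighted parameters of a trip distance or trip time distribution."""
    x = np.asarray(indicator, dtype=float).ravel()
    w = np.asarray(demand, dtype=float).ravel()
    n = w.sum()                                    # total number of trips N
    mean = (w * x).sum() / n                       # weighted mean
    std = np.sqrt((w * (x - mean) ** 2).sum() / n) # weighted standard deviation
    skew = (w * (x - mean) ** 3).sum() / n / std ** 3
    order = np.argsort(x)
    xs, cum = x[order], np.cumsum(w[order]) / n
    # percentile = first indicator value whose cumulative share reaches p
    quantiles = {p: xs[np.searchsorted(cum, p, side="left")] for p in probs}
    return {"N": n, "mean": mean, "std": std, "cv": std / mean,
            "skew": skew, "quantiles": quantiles}

# Illustrative data: distances in km and number of trips
params = distribution_parameters([1.0, 2.0, 3.0], [1, 2, 1])
print(params["mean"], params["std"], params["quantiles"][0.5])
```

Reporting these values side by side for the reference and the modelled distribution gives a first impression of systematic shifts before any quality indicator is computed.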
Statistical tests
Various statistical tests can be used to check for differences between distributions and their parameters, respectively. For example, the t-test checks for significant deviations of the mean and the F-test checks for significant variance deviations (Backhaus et al. 2011).
In addition to these parametric tests, which check the distribution parameters, there are non-parametric or distribution-free tests, which check whether two samples derive from a common population. An example of this is the Kolmogorov–Smirnov test, suitable for interval scaled data (Herz et al. 1992).
The Kolmogorov–Smirnov test initially determines the largest absolute deviation between the cumulative relative frequencies of two distributions. A comparative measure is then obtained depending on the sample sizes and the accepted level of error. It should be noted that in this test only the position of relative frequencies is assessed.
In contrast to a household travel survey, a travel demand model represents a full census, which means that it covers a large sample size. Consequently, the test criterion is very strict, which often results in a negative test result (i.e. the distributions do not originate from a common population).
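The effect of the sample size can be made tangible with the common large-sample approximation of the two-sample Kolmogorov–Smirnov critical value, \( D_{crit} = c(\alpha)\sqrt{(n+m)/(n\,m)} \) with \( c(\alpha) = \sqrt{-0.5\,\ln(\alpha/2)} \). The sketch below uses this approximation; the sample sizes are hypothetical, and treating the number of modelled trips as a sample size is an assumption made here for illustration.

```python
import math

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value of the two-sample Kolmogorov-Smirnov test."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))

# Hypothetical sizes: 5,000 surveyed trips vs. 2,000,000 modelled trips
d_crit = ks_critical_value(5_000, 2_000_000)
print(f"maximum tolerated deviation: {d_crit:.4f}")
```

Under these assumptions, a difference of roughly two percentage points between the cumulative relative frequencies is already enough to reject the hypothesis of a common population, and the tolerated deviation shrinks further as the survey sample grows.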
Quality indicators
Quality indicators consider the similarity of two distributions x and y. This section will give an executive summary about available quality indicators. The corresponding calculation specifications for each indicator are shown in the appendix.
Correlation Coefficient and Coefficient of Determination
The Correlation Coefficient R and the Coefficient of Determination R^{2} are quality indicators that check the dependency between two (typically unclassified) data sets. R is a non-dimensional variable ranging between −1.0 and 1.0 that reflects the extent of the linear dependency between two data sets. R^{2} expresses the proportion of variation explained by the independent variables of the model. The higher the Coefficient of Determination, the higher the proportion of variation explained by the regression function.
A critical point is that the Coefficient of Determination does not indicate any causality of the observed correlation and, moreover, inevitably increases with an increasing number of examined factors (Cambridge Systematics Inc. 2014; Backhaus et al. 2011). Therefore, R^{2} is suitable as a quality indicator only to a limited extent. The thresholds for acceptable R^{2} values mentioned in the literature range from 0.88 and 0.95 (Cambridge Systematics Inc. 2014) to 0.98 (Department for Transport 2014).
Mean Absolute Error, Euclidean Distance and Root Mean Squared Error
A common quality indicator is the Mean Absolute Error (MAE). It describes, as the name implies, the average of all absolute deviations between the modelled and observed values. One way to scale the MAE to a unitless quality indicator is to divide the MAE by the mean of the observed values. The resulting quality indicator is called Relative Mean Absolute Error (%MAE). This %MAE should not be confused with the Mean Absolute Percentage Error (MAPE) (Vandeput 2019).
The Euclidean Distance d is based on the sum of the squared deviations across all distribution classes. Since, as will be explained later, the use of relative frequencies is advisable, a standardization to a common unit, as recommended e.g. by Backhaus et al. (2011), is not necessary.
Another common quality indicator is the Root Mean Squared Error (RMSE) and its relative and unitless form (%RMSE). Compared to the MAE, the RMSE weights large errors more strongly, so that they have a stronger influence on the evaluation result (Cambridge Systematics Inc. 2014; Vandeput 2019).
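The following sketch computes these indicators for two classified distributions. The relative forms are normalised here by the mean of the observed values, and the Euclidean Distance is taken as the square root of the summed squared deviations; both are assumptions about the appendix formulas, and the frequency values are invented for illustration.

```python
import numpy as np

def error_measures(observed, modelled):
    """MAE, RMSE, their relative forms and the Euclidean Distance."""
    x = np.asarray(observed, dtype=float)
    y = np.asarray(modelled, dtype=float)
    diff = y - x
    mae = np.abs(diff).mean()
    rmse = np.sqrt((diff ** 2).mean())
    return {
        "MAE": mae,
        "%MAE": mae / x.mean(),        # unitless, relative to the mean observation
        "RMSE": rmse,
        "%RMSE": rmse / x.mean(),
        "d": np.sqrt((diff ** 2).sum()),
    }

# Illustrative relative frequencies in four equiquantile classes
observed = [0.25, 0.25, 0.25, 0.25]
modelled = [0.35, 0.15, 0.25, 0.25]
print(error_measures(observed, modelled))
```

For relative frequencies the absolute measures are necessarily small numbers, whereas the relative forms scale with the mean class share and are therefore easier to compare across classifications.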
Theil’s Forecast Accuracy Coefficient
Theil’s Forecast Accuracy Coefficient examines the conformity of two distributions. Unfortunately, two indicators with the same name U have been developed. In order to distinguish them they are called U_{1} and U_{2}.

U_{1} ranges between 0 and 1, where 0 indicates a perfect match and 1 indicates the worst match. The worst match occurs either with a negative proportionality between the considered distributions or when both distributions are continuously equal to zero. Since all distributions are evaluated better than a naïve forecast, it is not possible to interpret them unambiguously and therefore U_{1} should not be used (Bliemel 1973; Andres and Spiwoks 2000).

U_{2} ranges from 0 to ∞, where 0 indicates a perfect match. At U_{2} = 1 the comparative distribution has the same quality as a naïve forecast. Consequently, U_{2} > 1 indicates that the comparative distribution performs worse than even a naïve forecast. U_{2} is always preferable to U_{1} (Bliemel 1973; Andres and Spiwoks 2000).
Furthermore, the deviations between two distributions can be analysed more precisely with three components of error U^{M}, U^{S} and U^{C} (FGSV 2006; Bliemel 1973; Andres and Spiwoks 2000).

U^{M} indicates the proportion of the mean squared error resulting from an inequality of the mean values, which in turn results from a systematic over- or underestimation.

U^{S} indicates systematic differences in the variances.

U^{C} indicates the absence of a linear correlation between the two distributions. It is an indicator for unsystematic, random errors.
According to Andres and Spiwoks (2000) and FGSV (2006) a good conformity is assumed if U^{M} and U^{S} are close to 0 (\( U^{M} ,U^{S} \to 0 \) or at least \( U^{M} ,U^{S} < 0.2 \)) and when the total error is mostly due to unsystematic errors \( \left( {U^{C} \to 1} \right). \)
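The three components can be sketched with the standard decomposition of the mean squared error, \( MSE = (\bar{y} - \bar{x})^2 + (s_y - s_x)^2 + 2\,(s_x s_y - s_{xy}) \), where \( s_{xy} \) is the covariance. This is the textbook form and is assumed, not confirmed, to match the appendix; the function name and the data are illustrative.

```python
import numpy as np

def theil_components(observed, modelled):
    """Bias (U^M), variance (U^S) and covariance (U^C) shares of the MSE."""
    x = np.asarray(observed, dtype=float)
    y = np.asarray(modelled, dtype=float)
    mse = ((y - x) ** 2).mean()      # must be > 0, i.e. no perfect match
    sx, sy = x.std(), y.std()        # population standard deviations
    cov = ((x - x.mean()) * (y - y.mean())).mean()
    u_m = (y.mean() - x.mean()) ** 2 / mse
    u_s = (sy - sx) ** 2 / mse
    u_c = 2.0 * (sx * sy - cov) / mse
    return u_m, u_s, u_c

# Illustrative absolute frequencies (trips per class)
u_m, u_s, u_c = theil_components([10, 20, 30, 40], [12, 18, 33, 45])
print(round(u_m, 3), round(u_s, 3), round(u_c, 3))
```

By construction the three shares add up to 1. Note that for two relative frequency distributions over the same classes both means equal 1/(number of classes), so U^M is zero and only U^S and U^C carry information in that case.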
Vortisch’s Indicator of Similarity
A distance indicator presented by Vortisch (2006) takes into account the similarity in form and position of two distributions (see Fig. 4).
The Correlation Coefficient R describes the similarity of the shapes. Furthermore, the positional similarity θ and the overlapping of the domains σ are determined. The individual components are then combined into a general distance indicator ∆. If the distributions match perfectly, ∆ = 0 will result, and with increasing difference ∆ approaches 1.
Coincidence Ratio
The Coincidence Ratio (CR) determines the degree to which two distributions overlap (see Fig. 5). The input values for the calculation are relative frequencies of classified distributions. The CR ranges from 0 to 1, where 1 indicates a perfect match and 0 indicates no match. According to the Travel Model Validation and Reasonableness Checking Manual (Cambridge Systematics Inc. 2014), a high degree of congruence applies if CR > 0.7.
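The Coincidence Ratio is commonly computed as the sum of the class-wise minima divided by the sum of the class-wise maxima of the two relative frequency distributions. The sketch below follows this definition; the helper `merge_classes` and the frequency values are hypothetical and serve only to show how the class count affects the result.

```python
import numpy as np

def coincidence_ratio(observed, modelled):
    """CR = sum(min(p_i, q_i)) / sum(max(p_i, q_i)) on relative frequencies."""
    p = np.asarray(observed, dtype=float)
    q = np.asarray(modelled, dtype=float)
    p, q = p / p.sum(), q / q.sum()    # convert to relative frequencies
    return np.minimum(p, q).sum() / np.maximum(p, q).sum()

def merge_classes(freq, factor):
    """Aggregate `factor` neighbouring classes into one (length must divide)."""
    return np.asarray(freq, dtype=float).reshape(-1, factor).sum(axis=1)

# Synthetic example with six classes
observed = [0.30, 0.10, 0.25, 0.05, 0.20, 0.10]
modelled = [0.10, 0.30, 0.05, 0.25, 0.10, 0.20]
print(coincidence_ratio(observed, modelled))
print(coincidence_ratio(merge_classes(observed, 2), merge_classes(modelled, 2)))
```

In this deliberately extreme example the CR is 1/3 with six classes but reaches 1 after neighbouring classes are merged in pairs. Merging classes can never lower the CR, because the minimum of two sums is at least the sum of the minima and the maximum of two sums is at most the sum of the maxima, which is why a fixed classification rule is needed before a threshold can be applied.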
Absolute or relative frequencies
As shown in Table 2, a comparison of two distributions can either use relative or absolute frequencies. The choice of relative or absolute frequencies influences the value of the quality indicators. Hence, if two distributions are to be evaluated in terms of their similarity, it must be clarified beforehand whether the quality indicator used should operate with absolute or relative frequencies. As distance and time distributions derived from surveys are usually provided only as relative values, relative frequencies should be used. Furthermore, in a travel demand model, the absolute number of generated trips should be checked directly after trip generation. If this trip generation validation is omitted, systematic differences, such as permanently too few trips per class, are not recognized.
Summary on quality indicators
Table 2 compares the presented quality indicators by the following criteria (☒ meaning “yes” and ☐ meaning “no”):

What is the possible range of values of the indicators?

Does the calculation specification require absolute or relative frequencies?

Is there a difference in the resulting values of the indicator when using relative and absolute frequencies?

Is there a possibility for in-depth analysis using the same indicator?

Are there applicable thresholds for the indicator?
In a benchmark comparison, over 30 equiquantile distributions with 10 classes were compared. Some were derived from real travel demand models, while others were generated systematically in order to examine certain characteristics of the indicators (see Fig. 6, top). The box plots in Fig. 6, bottom, show the result of this comparison. The following conclusions can be derived:

Correlation Coefficient R and Coefficient of Determination R^{2} fail for ideal equiquantile distributions, because the mean of the reference distribution matches the demand per class, resulting in an unsolvable expression.

The results for Euclidean Distance d, Mean Absolute Error MAE and Root Mean Squared Error RMSE show a very small bandwidth in the present case of relative frequencies. Therefore, their relative forms Relative Mean Absolute Error %MAE or relative Root Mean Squared Error %RMSE are to be preferred. A general statement as to which of these quality indicators is more suitable is not possible, since this depends, among other things, on the respective accuracy requirements of the model.

The expressions for Relative Root Mean Squared Error %RMSE and Theil’s Forecast Accuracy Coefficient U_{2} are mathematically identical in this case of relative deviations and classified data.

Due to the properties described above, Theil’s Forecast Accuracy Coefficient U_{2} and Vortisch’s Indicator of Similarity ∆ are suitable because of their sophisticated analysis options. However, the latter is less sensitive, which is reflected in the fact that it does not make full use of its value range even in the case of large deviations.

The Coincidence Ratio CR is sensitive to changes and can take values in its entire value range, and its value range from 0 to 1 makes it straightforward to interpret. In addition, specific thresholds from the Travel Model Validation and Reasonableness Checking Manual (Cambridge Systematics Inc. 2014) provide good orientation for the evaluation of quality.
In summary, the Coincidence Ratio CR, the Relative Root Mean Squared Error %RMSE (or Theil’s Forecast Accuracy Coefficient U_{2}) and the Relative Mean Absolute Error %MAE seem suitable for the evaluation of equiquantile classified relative frequency distributions. In addition, the three components of Theil’s Forecast Accuracy Coefficient U^{M}, U^{S} and U^{C} are useful for in-depth analyses.
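Two of the conclusions above can be retraced with a short calculation. Assume, for illustration, that x denotes the reference distribution with n ideal equiquantile classes, so that \( x_i = \bar{x} = 1/n \) for all classes, and that y denotes the comparative distribution. With the usual definition of the Correlation Coefficient, both numerator and denominator vanish:

```latex
\[
R = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \,\sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{0}{0} \quad \text{(undefined)} .
\]
```

For the second point, assume the form \( U_2 = \sqrt{\sum_i (y_i - x_i)^2} \big/ \sqrt{\sum_i x_i^2} \), which may differ from the specification in the appendix, and a %RMSE normalised by the mean of the reference values:

```latex
\[
U_2 = \frac{\sqrt{\sum_{i} (y_i - x_i)^2}}{\sqrt{n \cdot (1/n)^2}}
    = \sqrt{n}\,\sqrt{\sum_{i} (y_i - x_i)^2},
\qquad
\%RMSE = \frac{\sqrt{\tfrac{1}{n}\sum_{i} (y_i - x_i)^2}}{1/n}
       = \sqrt{n}\,\sqrt{\sum_{i} (y_i - x_i)^2} .
\]
```

Under these assumptions both expressions coincide whenever the reference classes are equally populated.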
Conclusion and recommendations
Appropriate transport planning requires appropriate travel demand models. Currently, many discussions are taking place about what “appropriate” actually means. This paper is a contribution to those discussions. Frequency distributions are an important quality feature of travel demand models. In quality assessment it is particularly important that the classification method is known, because the current techniques that are used in everyday practice lead to individual classifications and thus to different evaluation results. This makes it impossible to compare validation results of different models and to define a quality indicator threshold in modelling guidelines.
In the previous chapters, various classification and quality determination methods have been presented. In order to ensure a uniform classification method in quality assurance, the following procedure is proposed for the standardized creation and assessment of distributions. This proposal is intended to serve as a basic concept. For specific travel demand model applications, the specifications have to be adjusted if necessary. Modelling guidelines should define a generally obligatory procedure in the future.

1. Selection of mode-independent classification indicators for each distribution:

for trip distance distributions: direct distance,

for trip time distributions: mean weighted trip time in the reference situation.

The intrazonal trips are not taken into account in this evaluation. They have to be evaluated separately.


2. Specification of the study area:

only areas covered by the reference and the model are considered,

only trips with origin and destination inside the study area are considered,

other trips have to be evaluated separately.


3. Specification of the reference distribution for each application:

for calibrating and validating a travel demand model: a distribution from a household travel survey (keeping in mind that such surveys contain a systematic error),

for comparisons of modelled scenarios: a distribution from the base case of a travel demand model.


4. Calculation of ten equiquantile classes for all relevant demand segments and for the total demand and

visualization of the distributions with relative frequencies,

presentation of the distribution parameters,

use of the Coincidence Ratio as quality indicator (calculation with relative frequencies). According to the Travel Model Validation and Reasonableness Checking Manual (Cambridge Systematics Inc. 2014), the Coincidence Ratio should be CR ≥ 0.7 for a high level of congruence.


5. Optional: Visualization of equidistant distributions. To determine the class width, it is possible to use the smallest class of the equiquantile classification of the total demand. Equidistant distributions should not be used for quality tests. They are created for display purposes only.
References
Andres, P., Spiwoks, M.: Prognosegütemaße. State of the Art der statistischen Ex-post-Beurteilung von Prognosen. Sofia-Studien zur Institutionenanalyse, Nr. 00-1. Sofia, Darmstadt (2000)
Backhaus, K., Erichson, B., Plinke, W., Weiber, R.: Multivariate Analysemethoden. Eine anwendungsorientierte Einführung, 13th revised edn. Springer, Berlin, Dordrecht, London, New York (2011)
Bhatta, B.P., Larsen, O.I.: Are intrazonal trips ignorable? Transp. Policy 18(1), 13–22 (2011). https://doi.org/10.1016/j.tranpol.2010.04.004
Bliemel, F.: Theil’s forecast accuracy coefficient. A clarification. J. Mark. Res. X, 444–446 (1973)
Cambridge Systematics Inc.: Travel Model Validation and Reasonableness Checking Manual, 2nd edn. Cambridge Univ Press, Cambridge (2014)
Department for Transport (ed): TAG UNIT M3.1. Highway Assignment Modelling. Transport analysis guidance: WebTAG, M3.1 (2014)
FGSV – Forschungsgesellschaft für Straßen- und Verkehrswesen (German Road and Transportation Research Association) (ed): Hinweise zur mikroskopischen Verkehrsflusssimulation. Grundlagen und Anwendungen, vol. 388. FGSV Verlag, Köln (2006)
FGSV – Forschungsgesellschaft für Straßen- und Verkehrswesen (German Road and Transportation Research Association) (ed): Empfehlungen für Verkehrserhebungen. EVE, vol. 125. FGSV Verlag, Köln (2012)
Friedrich, M., Schiller, C., Pestel, E., Simon, R., Schimpf, M.: Influencing Factors on the Quality of Macroscopic Travel Demand Models. Einflussgrößen auf die Qualität von makroskopischen Nachfragemodellen im Personenverkehr. DFG research project (FR 2666/31) (2015–2019)
Herz, R., Schlichter, H.G., Siegner, W.: Angewandte Statistik für Verkehrs- und Regionalplaner. In: 2nd revised and extended edition, Werner-Ingenieur-Texte, vol 42. Werner, Düsseldorf (1992)
Hyndman, R.J.: The Problem with Sturges’ Rule for Constructing Histograms (1995). https://robjhyndman.com/papers/sturges.pdf
Paluš, M.: Testing for nonlinearity using redundancies: quantitative and qualitative aspects. Physica D 80, 186–205 (1995)
Rieser, N., Tasnády, B., Friedrich, M., Pestel, E., Vries, N. de, Rothenfluh, M., Fischer, R.: Quality assurance for transport models and their applications. Qualitätssicherung von Verkehrsmodellberechnungen; SVI research project (2015/001). (2018)
Sammer, G., Gruber, C., Roeschel, G., Tomschy, R., Herry, M.: The dilemma of systematic underreporting of travel behavior when conducting travel diary surveys – a meta-analysis and methodological considerations to solve the problem. Transp. Res. Procedia 32, 649–658 (2018). https://doi.org/10.1016/j.trpro.2018.10.006
Sammer, G., Röschel, G., Gruber, C.: Qualitätssicherung für die Anwendung von Verkehrsnachfragemodellen und Verkehrsprognosen. Quality Management and Model Validation for Application of Transport Demand Modelling and Forecast. Straßenforschung, Heft 604. Österreichische Forschungsgesellschaft Straße, Schiene, Verkehr, Wien (2012)
Vandeput, N.: Forecast KPI: RMSE, MAE, MAPE and Bias (2019). https://medium.com/analytics-vidhya/forecast-kpi-rmse-mae-mape-bias-cdc5703d242d. Accessed 06 Nov 2019
Vortisch, P.: Modellunterstützte Messwertpropagierung zur Verkehrslageschätzung in Stadtstraßennetzen. Dissertation. Scientific series, University of Karlsruhe (TH), vol. 64, Karlsruhe (2006)
Acknowledgements
Open Access funding provided by Projekt DEAL. This paper was developed as part of the research project “Influencing factors on the quality of macroscopic travel demand models” (Einflussgrößen auf die Qualität von makroskopischen Nachfragemodellen im Personenverkehr) (Friedrich et al. 2015–2019) commissioned by the German Research Foundation (DFG). The classification methodology described was tested on three different macroscopic travel demand models in the research project “Quality assurance for transport model calculations” (Rieser et al. 2018) commissioned by the Swiss Association of Transportation Engineers and Experts (SVI). Special thanks to PD Dr. Christian Schiller and Robert Simon from the Technical University of Dresden, Dr. Nadine Rieser and Bence Tasnády from EBP Schweiz AG, Dr. Juliane Pillat from PTV AG as well as Prof. Dr. Markus Friedrich from the University of Stuttgart. The in-depth discussions within the scope of the above-mentioned research projects have greatly enriched this paper. Last but not least, I would like to thank the reviewers for their comments and efforts towards improving the paper.
Author information
Contributions
The author Eric Pestel devised the conceptual idea as well as the formal definitions of the presented methodology and he wrote the manuscript including the presented examples.
Ethics declarations
Conflict of interest
The corresponding author states that there is no conflict of interest.
Appendix
Calculation specification and example of the equiquantile classification method
Initially, the indicator values v_{n} are sorted in ascending order. The demand values w_{n} are sorted according to the indicator sorting. Formula (1) is used to calculate the unweighted quantiles q_{n}:
\( q_{n} = \frac{n - 0.5}{N} \)  (1)
Formula (1) is then extended by the demand values (weights) w_{n} to formula (2):
\( q_{w,n} = \frac{w_{cum,n} - 0.5 \cdot w_{n}}{\sum\nolimits_{i = 1}^{N} w_{i}} \) with \( w_{cum,n} = \sum\nolimits_{i = 1}^{n} w_{i} \)  (2)
The desired (upper) class boundaries of the indicators v_{q} are the result of a linear interpolation between the two weighted discrete quantiles q_{w,n} that enclose the desired quantile q. The demand is then allocated to the resulting classes. This leads to the classified equiquantile demand w_{q}.
Annotations:

n: nth element of the distribution

N: population size (number of individual values)

q_{n}: nth quantile

q_{w,n}: nth weighted quantile

v_{n}: nth indicator

v_{q}: upper indicator class boundary of the q-th quantile

w_{n}: nth demand element (weight)

w_{cum,n}: cumulated weight of the nth element

w_{q}: equiquantile demand within the q-th quantile
The following example with 20 OD pairs illustrates the calculation procedure. Please note that with such a small sample size it is not possible to distribute the demand completely evenly across all classes. This effect diminishes as the sample size increases.

Input variables:

Calculation:
| row index n | col. 1: sorted indicators v_{n} | col. 2: sorted demand (weight) w_{n} | col. 3: 0.5 · col. 2 | col. 4: col. 2 cumulated | col. 5: col. 4 − col. 3 | col. 6: col. 5 / sum(col. 2) = weighted quantile q_{w,n} |
|---|---|---|---|---|---|---|
| 1 | 1.0 | 562.5 | 281.3 | 562.5 | 281.3 | 0.033 |
| 2 | 3.0 | 196.6 | 98.3 | 759.1 | 660.8 | 0.078 |
| 3 | 7.0 | 90.3 | 45.2 | 849.4 | 804.3 | 0.095 |
| 4 | 15.0 | 846.6 | 423.3 | 1696.0 | 1272.7 | 0.151 |
| 5 | 17.0 | 841.8 | 420.9 | 2537.8 | 2116.9 | 0.251 |
| 6 | 20.0 | 223.6 | 111.8 | 2761.4 | 2649.6 | 0.314 |
| 7 | 21.0 | 403.7 | 201.9 | 3165.1 | 2963.3 | 0.351 |
| 8 | 30.0 | 220.5 | 110.3 | 3385.6 | 3275.4 | 0.388 |
| 9 | 34.0 | 43.5 | 21.8 | 3429.1 | 3407.4 | 0.404 |
| 10 | 35.0 | 268.5 | 134.3 | 3697.6 | 3563.4 | 0.422 |
| 11 | 37.0 | 506.5 | 253.3 | 4204.1 | 3950.9 | 0.468 |
| 12 | 43.0 | 814.0 | 407.0 | 5018.1 | 4611.1 | 0.546 |
| 13 | 53.0 | 34.1 | 17.1 | 5052.2 | 5035.2 | 0.597 |
| 14 | 54.0 | 550.4 | 275.2 | 5602.6 | 5327.4 | 0.631 |
| 15 | 62.0 | 301.6 | 150.8 | 5904.2 | 5753.4 | 0.682 |
| 16 | 83.0 | 846.6 | 423.3 | 6750.8 | 6327.5 | 0.750 |
| 17 | 86.0 | 592.2 | 296.1 | 7343.0 | 7046.9 | 0.835 |
| 18 | 90.0 | 255.2 | 127.6 | 7598.2 | 7470.6 | 0.885 |
| 19 | 92.0 | 627.0 | 313.5 | 8225.2 | 7911.7 | 0.938 |
| 20 | 94.0 | 213.7 | 106.9 | 8438.9 | 8332.1 | 0.987 |
| ∑ | | 8438.9 | | | | |

Result:
| desired quantile q | upper indicator class boundary v_{q} | absolute demand per class w_{q} [−] | relative demand per class w_{q} [%] |
|---|---|---|---|
| 0.1 | 7.7 | 849.4 | 10.1 |
| 0.2 | 16.0 | 846.6 | 10.0 |
| 0.3 | 19.3 | 841.8 | 10.0 |
| 0.4 | 33.0 | 847.8 | 10.0 |
| 0.5 | 39.4 | 818.5 | 9.7 |
| 0.6 | 53.1 | 848.1 | 10.0 |
| 0.7 | 67.6 | 852.0 | 10.1 |
| 0.8 | 84.8 | 846.6 | 10.0 |
| 0.9 | 90.6 | 847.4 | 10.0 |
| 1.0 | 94.0 | 840.7 | 10.0 |
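The calculation above can be retraced with a short script. The following is a minimal Python sketch, not the implementation used in the research projects. The function names are hypothetical, the indicators are assumed to be sorted in ascending order with strictly positive demand, and desired quantiles outside the range of the weighted quantiles are assumed to be clamped to the smallest or largest indicator (which matches the boundary of 94.0 for q = 1.0 in the example).

```python
from bisect import bisect_left


def weighted_quantiles(weights):
    """Weighted quantile per element: (cumulated weight - half own weight) / total."""
    total = sum(weights)
    cumulated = 0.0
    quantiles = []
    for w in weights:
        cumulated += w
        quantiles.append((cumulated - 0.5 * w) / total)
    return quantiles


def class_boundaries(indicators, weights, n_classes=10):
    """Upper class boundaries by linear interpolation between weighted quantiles.

    Assumes ascending indicators and positive weights. Quantiles outside the
    observed range are clamped to the first or last indicator (assumption).
    """
    qs = weighted_quantiles(weights)
    boundaries = []
    for k in range(1, n_classes + 1):
        q = k / n_classes
        if q <= qs[0]:
            boundaries.append(indicators[0])
        elif q >= qs[-1]:
            boundaries.append(indicators[-1])
        else:
            i = bisect_left(qs, q)  # qs[i - 1] < q <= qs[i]
            frac = (q - qs[i - 1]) / (qs[i] - qs[i - 1])
            boundaries.append(indicators[i - 1] + frac * (indicators[i] - indicators[i - 1]))
    return boundaries


def demand_per_class(indicators, weights, boundaries):
    """Allocate each element to the first class whose upper boundary is >= its indicator."""
    demand = [0.0] * len(boundaries)
    for v, w in zip(indicators, weights):
        k = min(bisect_left(boundaries, v), len(boundaries) - 1)
        demand[k] += w
    return demand


# The 20 OD pairs of the example (columns 1 and 2 of the calculation table).
v = [1, 3, 7, 15, 17, 20, 21, 30, 34, 35, 37, 43, 53, 54, 62, 83, 86, 90, 92, 94]
w = [562.5, 196.6, 90.3, 846.6, 841.8, 223.6, 403.7, 220.5, 43.5, 268.5,
     506.5, 814.0, 34.1, 550.4, 301.6, 846.6, 592.2, 255.2, 627.0, 213.7]

bounds = class_boundaries(v, w)
demand = demand_per_class(v, w, bounds)
for q, b, d in zip(range(1, 11), bounds, demand):
    print(f"q={q / 10:.1f}  v_q={b:5.1f}  w_q={d:6.1f}  share={100 * d / sum(w):4.1f}%")
```

Rounded to one decimal place, the printed boundaries and class demands agree with the result table of the example.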
Applications of the equiquantile classification method
Equiquantile classes for a particular demand segment
Demand segments are subsets of the total demand. Usually demand is segmented by mode, by trip purpose or by the combination of mode and purpose. As each demand segment has a specific distribution, this leads to specific class widths for each segment. A demand segment “pedestrians” will have more short-distance classes than a demand segment “car”. For comparing distributions of a particular demand segment in different scenarios, it is therefore advisable to determine the class widths with the base case demand of that particular demand segment. The classification should be determined once for the reference distribution of the base case and then preserved for all scenarios.
Figure 7 shows an example of an equiquantile trip distance distribution of the mode car for a realism test, in which the number of inhabitants is changed by ± 10% and ± 20%. The base case serves as reference distribution. In the example, it can be seen that in the case of fewer inhabitants (80% or 90%) the number of trips decreases across all classes. However, the decrease is not equal in all classes. Short-distance classes show a lower share of the total demand, longer-distance classes a higher share. This can be explained by the relationship between demand and supply, where lower demand leads to travel time reductions, which then increase travel distance. The opposite applies to the case with more inhabitants (110% or 120%).
Equiquantile classes for the total demand
Similar to the evaluation for particular demand segments, it is possible to perform an evaluation of the total demand across all modes to analyse the changes of the total demand in the different distribution classes.
In addition, individual demand segments can also be presented with the classification of the total demand. For example, a distance-dependent modal split (see Fig. 8) or the development of individual modes in comparison to the total demand for different scenarios (see Fig. 9) can be displayed.
Equiquantile classes from person distance travelled
As a modification of the equiquantile classification, it is possible to use the combination of demand and trip distance (person distance travelled) as classification indicator. For vehicle distance travelled or person time travelled, an analogous approach has to be used.
This is important in the calibration process, where it is more important to match the person distance travelled than the actual number of trips. An equal number of trips in all distance classes leads to an increasing person distance travelled with increasing distance class, as is shown in Fig. 10 (left). In contrast, Fig. 10 (right) shows a distribution where each class contains the same amount of person distance travelled, while the number of trips decreases monotonically with increasing distance class. In comparison to a “normal” equiquantile distribution, shown in Fig. 10 (left), it becomes obvious that the class width in the short-range distances increases while the long-distance classes are displayed in greater detail.
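The effect of the modified weighting can be illustrated with a small sketch. It assumes that the only change compared to the standard procedure is the weight, which becomes demand multiplied by trip distance. The helper `weighted_boundary` and the five distance/trip values are made up for illustration and are not taken from the paper.

```python
def weighted_boundary(values, weights, q):
    """Indicator value at weighted quantile q.

    Assumes ascending values and positive weights. Uses the weighted
    quantile (cumulated weight - half own weight) / total and linear
    interpolation between neighbouring elements.
    """
    total = sum(weights)
    cumulated = 0.0
    prev_q = prev_v = None
    for v, w in zip(values, weights):
        cumulated += w
        cur_q = (cumulated - 0.5 * w) / total
        if q <= cur_q:
            if prev_q is None:
                return v  # below the first weighted quantile: clamp (assumption)
            frac = (q - prev_q) / (cur_q - prev_q)
            return prev_v + frac * (v - prev_v)
        prev_q, prev_v = cur_q, v
    return values[-1]  # above the last weighted quantile: clamp (assumption)


# Hypothetical data: trip distances in km and number of trips per OD pair.
distances = [1, 2, 5, 10, 20]
trips = [40, 30, 15, 10, 5]

# Person distance travelled as weight: trips multiplied by distance.
pdt = [t * d for t, d in zip(trips, distances)]

median_by_trips = weighted_boundary(distances, trips, 0.5)
median_by_pdt = weighted_boundary(distances, pdt, 0.5)
print(round(median_by_trips, 2), round(median_by_pdt, 2))
```

In this made-up example the 50% boundary moves from about 1.9 km to about 7.9 km when person distance travelled is used as weight, so the lower classes become wider and more class boundaries fall into the long-distance range.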
Calculation specification and explanation of the distribution parameters
The calculation specifications presented refer to classified data. It is therefore irrelevant whether relative or absolute frequencies are used.
Annotations:

\( \bar{m} \): weighted mean across all classes (mean indicator)

\( s_{m} \): weighted standard deviation (standard deviation of the mean indicator)

\( V_{m} \): coefficient of variation

\( \gamma_{m} \): skewness of the distribution

\( x_{k} \): frequency in class k

\( m_{k} \): mean of the class k

\( K \): number of classes

\( N \): population size (number of individual values).
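As an illustration of these parameters, the following Python sketch computes them from classified data. The exact formulas are an assumption: frequency-weighted moments normalised by the sum of the class frequencies are used, which makes the results independent of whether absolute or relative frequencies are passed in. A specification with a sample correction (for example N − 1) would give slightly different values. The function name and the example numbers are hypothetical.

```python
import math


def distribution_parameters(frequencies, class_means):
    """Weighted mean, standard deviation, coefficient of variation and skewness.

    frequencies: x_k, absolute or relative frequency per class
    class_means: m_k, mean indicator value per class
    Assumes a positive mean and a non-degenerate distribution (std > 0).
    """
    total = sum(frequencies)
    mean = sum(x * m for x, m in zip(frequencies, class_means)) / total
    variance = sum(x * (m - mean) ** 2 for x, m in zip(frequencies, class_means)) / total
    std = math.sqrt(variance)
    cv = std / mean
    third_moment = sum(x * (m - mean) ** 3 for x, m in zip(frequencies, class_means)) / total
    skewness = third_moment / std ** 3
    return mean, std, cv, skewness


# Hypothetical classified distribution with three classes.
x_abs = [5, 3, 2]
m = [2.0, 6.0, 12.0]
x_rel = [x / sum(x_abs) for x in x_abs]

params_abs = distribution_parameters(x_abs, m)
params_rel = distribution_parameters(x_rel, m)
print(params_abs)
```

With this normalisation, `params_abs` and `params_rel` coincide up to floating-point rounding.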
Calculation specification and explanation of the quality indicators
Correlation Coefficient and Coefficient of Determination (adapted for classified datasets)
Annotations:

\( R \): Correlation Coefficient

\( x_{k} \) or \( y_{k} \): frequency of distribution x or y in class k

\( K \): number of classes

\( \bar{x} \) or \( \bar{y} \): mean of dataset x or y.
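A possible implementation for classified data is sketched below, assuming the usual Pearson correlation computed over the K class frequencies. The handling of constant distributions (R = 0 if one distribution is constant, R = 1 if both are) is borrowed from the special cases listed for Vortisch's Indicator of Similarity. The function name and the tolerance `eps` are assumptions.

```python
import math


def correlation_coefficient(x, y, eps=1e-12):
    """Pearson correlation of two classified distributions x and y.

    x, y: frequencies per class (same classification, same length K).
    eps:  tolerance below which a distribution is treated as constant
          (an assumption, may need scaling for absolute frequencies).
    """
    k = len(x)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx <= eps and syy <= eps:
        return 1.0  # both distributions constant
    if sxx <= eps or syy <= eps:
        return 0.0  # exactly one distribution constant
    return sxy / math.sqrt(sxx * syy)


reference = [10, 20, 30, 40]
model = [12, 18, 33, 37]
r = correlation_coefficient(reference, model)
r_squared = r ** 2  # Coefficient of Determination
print(round(r, 3), round(r_squared, 3))
```

Note that an equiquantile reference distribution is nearly constant by construction, so the special cases matter in practice when this indicator is applied to equiquantile classes.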
Mean Absolute Error and Relative Mean Absolute Error (adapted for classified datasets)
Annotations:

\( MAE \): Mean Absolute Error

\( \% MAE \): Relative Mean Absolute Error

\( x_{k} \) or \( y_{k} \): frequency of distribution x or y in class k

\( K \): number of classes
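A minimal sketch of both indicators follows. It assumes that the Mean Absolute Error is the average absolute class difference and that the relative form divides it by the mean class frequency of the reference distribution x. Other normalisations are in use, so the relative variant in particular should be read as an assumption. The function names are hypothetical.

```python
def mean_absolute_error(x, y):
    """Average absolute difference between the class frequencies of x and y."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def relative_mean_absolute_error(x, y):
    """MAE divided by the mean class frequency of the reference x (assumption)."""
    return mean_absolute_error(x, y) / (sum(x) / len(x))


reference = [10, 20, 30, 40]
model = [12, 18, 33, 37]
mae = mean_absolute_error(reference, model)
rel_mae = relative_mean_absolute_error(reference, model)
print(mae, rel_mae)
```

With relative frequencies the mean class frequency is always 1/K, so the relative indicator then depends only on the sum of absolute differences and on the number of classes.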
Euclidean Distance (adapted for classified datasets)
Annotations:

\( d \): Euclidean Distance

\( K \): number of classes

\( x_{k} \) or \( y_{k} \): frequency of distribution x or y in class k
Root Mean Squared Error (adapted for classified datasets)
Annotations:

\( RMSE \): Root Mean Squared Error

\( \% RMSE \): Relative Root Mean Squared Error

\( x_{k} \) or \( y_{k} \): frequency of distribution x or y in class k

\( K \): number of classes
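The Euclidean Distance and the Root Mean Squared Error are closely related, which the following sketch makes explicit. It assumes a division by K inside the root (some guidelines use K − 1 instead) and a relative form that divides by the mean class frequency of the reference distribution x. The function names are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared class differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def root_mean_squared_error(x, y):
    """Square root of the mean squared class difference (division by K assumed)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def relative_rmse(x, y):
    """RMSE divided by the mean class frequency of the reference x (assumption)."""
    return root_mean_squared_error(x, y) / (sum(x) / len(x))


reference = [10, 20, 30, 40]
model = [12, 18, 33, 37]
d = euclidean_distance(reference, model)
rmse = root_mean_squared_error(reference, model)
rel_rmse = relative_rmse(reference, model)
print(round(d, 3), round(rmse, 3), round(rel_rmse, 4))
```

Under these assumptions d equals the RMSE multiplied by the square root of K, so both indicators rank a set of distributions identically as long as the number of classes is fixed.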
Theil’s Forecast Accuracy Coefficient (adapted for classified datasets)
Annotations:

\( U_{1} \): Theil’s Forecast Accuracy Coefficient (old form; not recommended according to Bliemel (1973) and Andres and Spiwoks (2000))

\( U_{2} \): Theil’s Forecast Accuracy Coefficient (new form)

\( U^{M} \): proportion of error from systematic differences in the means

\( U^{S} \): proportion of error from systematic differences in the variances

\( U^{C} \): proportion of error from unsystematic errors

\( K \): number of classes

\( x_{k} \) or \( y_{k} \): frequency of distribution x or y in class k, for \( U_{2} \) explicitly relative frequencies (Bliemel 1973)

\( \bar{x} \) or \( \bar{y} \): mean of distribution x or y across all classes

\( s_{x} \) or \( s_{y} \): standard deviation of distribution x or y

\( R \): Correlation Coefficient
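The following sketch shows how these quantities could be computed for classified data. It is based on the commonly cited forms of Theil's coefficients and of the error decomposition, and the exact notation of the paper may differ. Here x is taken as the reference distribution, standard deviations are population standard deviations, and the unsystematic share is computed from the covariance, which is equivalent to 2(1 − R)·s_x·s_y. The function name is hypothetical.

```python
import math


def theil_coefficients(x, y):
    """Return (U1, U2, U_M, U_S, U_C) for reference x and comparison y."""
    k = len(x)
    sse = sum((b - a) ** 2 for a, b in zip(x, y))
    mse = sse / k

    # Old form U1, bounded between 0 and 1.
    u1 = math.sqrt(mse) / (math.sqrt(sum(b * b for b in y) / k)
                           + math.sqrt(sum(a * a for a in x) / k))
    # New form U2, normalised by the reference only.
    u2 = math.sqrt(sse) / math.sqrt(sum(a * a for a in x))

    mean_x, mean_y = sum(x) / k, sum(y) / k
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / k)
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / k)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / k

    if mse == 0:
        return u1, u2, 0.0, 0.0, 0.0  # identical distributions, nothing to decompose
    u_m = (mean_y - mean_x) ** 2 / mse    # share from differing means
    u_s = (s_y - s_x) ** 2 / mse          # share from differing variances
    u_c = 2 * (s_x * s_y - cov) / mse     # unsystematic share
    return u1, u2, u_m, u_s, u_c


# Relative frequencies of a hypothetical reference and model distribution.
p = [0.10, 0.20, 0.30, 0.40]
q = [0.12, 0.18, 0.33, 0.37]
u1, u2, u_m, u_s, u_c = theil_coefficients(p, q)
print(round(u1, 4), round(u2, 4), round(u_m, 4), round(u_s, 4), round(u_c, 4))
```

The three shares add up to one. With relative frequencies both distributions have the mean 1/K, so the share from differing means vanishes and the error is split between the variance share and the unsystematic share.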
Vortisch’s Indicator of Similarity
The parameters \( \alpha \) and \( \gamma \) can be used to control the influence of shape, position and domain. Vortisch (2006) recommends \( \alpha = \gamma = 0.5 \) by default.
Please note the following special cases:

one of the two distributions is constant: \( R = 0 \)

both distributions are constant: \( R = 1 \)

if \( x_{k} = y_{k} = 0 \), then \( \theta = 1 \) applies

if \( \left| {D\left( {x,y} \right)} \right| = 0 \), then \( \theta = 0 \) applies
Annotations:

∆: Vortisch’s Indicator of Similarity

θ: similarity of the shape

σ: similarity of the position

R: Correlation Coefficient

K: number of classes

\( x_{k} \) or \( y_{k} \): absolute frequency of distribution x or y in class k

D (x, y): common domain of distribution x and y

D (x) or D (y): domain of distribution x or y

α: influence of shape and position \( \in \left[ {0;1} \right] \)

γ: influence of domain \( \in \left[ {0;1} \right] \)
Coincidence Ratio
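\( CR = \frac{\sum\nolimits_{k = 1}^{K} \min \left( p_{k}, q_{k} \right)}{\sum\nolimits_{k = 1}^{K} \max \left( p_{k}, q_{k} \right)} \)
where
\( p_{k} = \frac{x_{k}}{\sum\nolimits_{j = 1}^{K} x_{j}} \) and \( q_{k} = \frac{y_{k}}{\sum\nolimits_{j = 1}^{K} y_{j}} \)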
Annotations:

CR: Coincidence Ratio

K: number of classes

\( p_{k} \) or \( q_{k} \): relative frequency of distribution x or y in class k

\( x_{k} \) or \( y_{k} \): absolute frequency of distribution x or y in class k.
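The Coincidence Ratio can be computed in a few lines. The sketch below assumes the usual definition as the sum of the class-wise minima of the relative frequencies divided by the sum of the class-wise maxima. The function name and the example distributions are hypothetical.

```python
def coincidence_ratio(x, y):
    """Coincidence Ratio of two classified distributions.

    x, y: absolute (or relative) frequencies per class, same classification.
    Both inputs are converted to relative frequencies first, so the result
    does not depend on the total number of trips in either distribution.
    """
    total_x, total_y = sum(x), sum(y)
    p = [a / total_x for a in x]
    q = [b / total_y for b in y]
    overlap = sum(min(a, b) for a, b in zip(p, q))
    envelope = sum(max(a, b) for a, b in zip(p, q))
    return overlap / envelope


reference = [10, 20, 30, 40]   # e.g. survey trips per class
model = [60, 90, 165, 185]     # e.g. modelled trips per class, different total
cr = coincidence_ratio(reference, model)
print(round(cr, 3), cr >= 0.7)
```

A value of 1 means identical relative distributions, a value of 0 means that no class is occupied in both distributions. In this made-up example the ratio is about 0.905 and therefore above the threshold of 0.7 quoted from the manual.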
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Pestel, E. Considerations about the quality assessment of travel time and travel distance distributions in transport modelling: a proposal for a standardized methodology. Transportation 48, 1285–1309 (2021). https://doi.org/10.1007/s11116-020-10095-y