Abstract
Estimating the number of passengers who make an identical journey is a common problem for public transport authorities. This problem, known as the origin–destination (OD) estimation problem, has been widely studied for the past 30 years. However, theory is lacking for the case where observations are not limited to passenger counts but also include station surveys. Our aim is to provide a solid framework for estimating an OD matrix when only a portion of the journey counts is observable. Our method is a statistical estimation technique for the OD matrix when row-sum counts and survey-based observations are available. It differs from previous studies in that it does not need a prior OD matrix, which can be hard to obtain. Instead, we model passenger behavior through the survey data and use the diagonalization of the partial OD matrix to reduce the parameter space and derive a consistent global OD matrix estimator. We demonstrate the robustness of our estimator and apply it to several examples showcasing the proposed models and approach. We highlight how other sources of data, such as explanatory variables (e.g. rainfall or indicator variables for major events), can be incorporated in the model, and how inference can be made in a principled, non-heuristic way.





Notes
This is an abuse of notation: it means that every element of \({\mathcal {X}}_{C}\) is drawn from a Poisson distribution whose parameter is the corresponding element of the matrix \(C\).
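As an illustration of this notation, the following minimal sketch draws a matrix of counts entry by entry from a matrix of Poisson rates. The rate matrix `C` and its values are hypothetical and chosen only for the example; NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical 2 x 2 matrix of Poisson rates (not taken from the paper).
C = np.array([[2.0, 5.0],
              [0.5, 10.0]])

rng = np.random.default_rng(0)

# One draw of X_C: entry (i, j) is sampled from Poisson(C[i, j]),
# independently of the other entries.
X = rng.poisson(C)

# Averaging many independent draws should approach C entry by entry.
X_mean = rng.poisson(C, size=(50_000, 2, 2)).mean(axis=0)
```

The sample `X` has the same shape as `C`, and `X_mean` should be close to `C` because the mean of a Poisson variable equals its rate.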
Appendices
Appendix 1: Calculation in case of Poisson distribution
To make the calculation clear, we work through a simple example in which the Poisson distribution is used instead of the Negative Binomial. The pdf can then be expressed as
\[ f(y; \mu ) = \frac{\mu ^{y} e^{-\mu }}{y!}, \]
where \(\mu \) is the parameter we are interested in. If we assume that the numbers of passengers at the different stations are independent, each \(\tilde{y}_{D}^{i}\) follows a Poisson distribution with parameter \(\sum _{j} r_{ij}\), which can be rewritten according to Eq. 17 as
\[ \sum _{j} r_{ij} = \sum _{j} \sum _{k} \lambda _{k} p_{ik} p_{jk}, \]
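where the \(\lambda _{i}\) are the eigenvalues and the \(p_{ij}\) are the elements of the matrix \(P\). If we denote \(s_{k} = \sum _{j} p_{jk}\), the transformed density can then be written
\[ f(\tilde{y}_{D}^{i}) = \frac{\left( \sum _{k} \lambda _{k} p_{ik} s_{k} \right) ^{\tilde{y}_{D}^{i}} \exp \left( -\sum _{k} \lambda _{k} p_{ik} s_{k} \right) }{\tilde{y}_{D}^{i}!}. \]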
Therefore, the log-likelihood can be expressed as follows,
Maximum likelihood estimation is then equivalent to solving the following system,
still under the constraints C1 and C2. C3 is excluded because this set of parameters does not exist in the Poisson model. If \(n = 1\), \({\mathbf{C1}} \Leftrightarrow \lambda \ge 0\) and C2 does not apply. The estimated value then corresponds to the classical one-dimensional unbiased Poisson mean estimator \(\hat{\lambda } = \bar{y}\). The system of equations 40 seems quite complicated at first. Nevertheless, it can be simplified to become,
where \(\mu _{j} = \sum _{k} \lambda _{k} p_{jk} s_{k}\) contains the unknown parameters. If we denote \(U = (1/\mu _{j})_{j}\), then,
where \(\bar{Y}_{d} = diag(\bar{y}_{i})_{i}\) and \(S = (s_{i})_{i}\). We can then simplify the expression further,
Finally, the same reasoning leads to the following estimator,
where \(S_{d} = diag(s_{i})_{i}\). This estimator is probably not the best available, given that it relies on the inversion of \(\hat{U}\), but it has the advantage of being asymptotically unbiased, with a variance that decreases to zero.
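The behavior described above can be checked numerically. The sketch below is not the authors' code: it assumes that \(P\) is orthogonal and that every \(s_{k}\) is non-zero, and it uses the closed form \(\hat{\lambda } = S_{d}^{-1}\, {}^{t}P\, \bar{y}\), which is inferred here from the relation \(\bar{Y} \rightarrow P S_{d} \Lambda \) used in Appendix 2 rather than quoted from the source. The matrix `P`, the eigenvalues `lam` and the helper `estimate_eigenvalues` are hypothetical.

```python
import numpy as np

# Hypothetical orthogonal eigenvector matrix (a 2 x 2 rotation) and eigenvalues.
theta = np.pi / 6
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
lam = np.array([60.0, 20.0])

# s_k = sum_j p_jk (column sums) and mu_j = sum_k lam_k p_jk s_k.
s = P.sum(axis=0)
mu = P @ (s * lam)

# The same rates seen as row sums of R = P diag(lam) P^t.
R = P @ np.diag(lam) @ P.T


def estimate_eigenvalues(y_bar, P, s):
    """Assumed closed form: lam_hat = S_d^{-1} P^t y_bar (P orthogonal, s_k != 0)."""
    return (P.T @ y_bar) / s


# Simulate independent daily Poisson counts per station and estimate lam.
rng = np.random.default_rng(1)
Y = rng.poisson(mu, size=(20_000, 2))
lam_hat = estimate_eigenvalues(Y.mean(axis=0), P, s)
```

With noiseless input (`y_bar = mu`) the function returns `lam` exactly, and with simulated counts `lam_hat` approaches `lam` as the number of days grows. An eigenvalue whose \(s_{k}\) is close to zero would be poorly identified by this form, since the estimate divides by \(s_{k}\).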
1.1 Properties of the calculated estimators
Let \(f\) be a probability density function. If \(f_{\Lambda }\) denotes the pdf of \(\Lambda \), and \(f_{P}\) the pdf of \(\tilde{P}\), we can write,
Let us consider the estimator presented in Eq. 23, and assume that we are in the large-value case, in which the Poisson distribution is well approximated by a normal distribution, \({\mathcal {P}} (.) \sim {\mathcal {N}}(.,.)\). Then,
where,
and \(\tilde{P}\) is an estimate of \(P\) obtained from the first observations.
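The large-value assumption can be illustrated with a short check that compares an exact Poisson probability with its normal counterpart. This is a minimal sketch: the rate `mu = 400`, the threshold `k = 420`, the continuity correction of 0.5 and the helper names are choices made for the example, not values from the paper.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summing the pmf computed in log space."""
    return sum(math.exp(j * math.log(mu) - mu - math.lgamma(j + 1))
               for j in range(k + 1))


def normal_cdf(x, mean, var):
    """P(Z <= x) for Z ~ N(mean, var)."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


mu = 400.0   # hypothetical large rate
k = 420      # one standard deviation above the mean

exact = poisson_cdf(k, mu)
# Normal approximation with matching mean and variance (both equal to mu)
# and a continuity correction of 0.5.
approx = normal_cdf(k + 0.5, mu, mu)
```

For rates of this size the two probabilities agree closely. The agreement degrades for small rates, where the skewness of the Poisson distribution is no longer negligible.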
Appendix 2
To prove the convergence in probability, we need to demonstrate that,
where \(n = \min (N_{1},N_{2})\). Starting with the left-hand side, we have,
We know that \(\bar{Y} \xrightarrow {a.s.} P S_{d} \Lambda \). Then,
and we have,
where \(f_{n}()\) stands for the probability density function of \(\vert (\tilde{S}_{d}^{-1}\, {}^{t} \tilde{P} - S_{d}^{-1}\, {}^{t} P)\bar{Y} \vert \).
The first integral decreases towards \(0\) as \(n\) grows to infinity according to Eq. 50. The argument for the second integral is the following. By the assumption of strong convergence of \(\tilde{P}\), \(f_{n}()\) converges towards the Dirac function \(\delta _0()\) as \(N_{1}\) goes to infinity. Since \(\epsilon \) is strictly positive, this ends the proof. □
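The structure of this argument can also be sketched with a triangle inequality. The block below is a paraphrase under assumptions, not the authors' displayed equations: it takes \(\Lambda \) to be the vector of eigenvalues, \(P\) to be orthogonal (so that \({}^{t}P P = I\)), and \(\tilde{S}_{d}\) to be the plug-in counterpart of \(S_{d}\) built from \(\tilde{P}\).

```latex
\begin{aligned}
\left| \tilde{S}_{d}^{-1}\,{}^{t}\tilde{P}\,\bar{Y} - \Lambda \right|
&\le \left| \left( \tilde{S}_{d}^{-1}\,{}^{t}\tilde{P} - S_{d}^{-1}\,{}^{t}P \right) \bar{Y} \right|
   + \left| S_{d}^{-1}\,{}^{t}P\,\bar{Y} - \Lambda \right| , \\
\Pr\left( \left| \tilde{S}_{d}^{-1}\,{}^{t}\tilde{P}\,\bar{Y} - \Lambda \right| > \epsilon \right)
&\le \Pr\left( \left| \left( \tilde{S}_{d}^{-1}\,{}^{t}\tilde{P} - S_{d}^{-1}\,{}^{t}P \right) \bar{Y} \right| > \tfrac{\epsilon}{2} \right)
   + \Pr\left( \left| S_{d}^{-1}\,{}^{t}P\,\bar{Y} - \Lambda \right| > \tfrac{\epsilon}{2} \right).
\end{aligned}
```

Under these assumptions the last term vanishes because \(S_{d}^{-1}\,{}^{t}P\,P S_{d} \Lambda = \Lambda \) and \(\bar{Y}\) converges almost surely to \(P S_{d} \Lambda \), while the other term vanishes through the convergence of \(\tilde{P}\) towards \(P\).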
Appendix 3: Calculation in case of Poisson regression (and log link function)
The reasoning begins as in the Poisson case of Appendix 1. If we assume that exogenous variables affect the number of passengers, we can write,
where the \(\beta \) are symmetric matrices: the intercept \(\beta _{0}\) reflects baseline commuter flows, and each \(\beta _{m}\) reflects the change in commuter flows due to a known daily influence. Moreover, we assume that the same diagonalization (that is, with the same eigenvectors) can be applied, which leads us to,
Therefore, \(\tilde{y}_{i}\) will be distributed according to a Poisson distribution with the following parameter,
where the parameters to be estimated are \((d_{k}^{0})_{k},(d_{k}^{m})_{k,m}\), which means we have to estimate \(n \times (r+1)\) parameters.
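One way to make this parameter count concrete is to write the rate explicitly. The block below is a sketch under an assumed reading of the log link, namely that the eigenvalues (rather than the individual matrix entries) are exponentiated. The covariate symbols \(x_{m}\), \(m = 1, \dots , r\), and the station index \(i\) are notational assumptions, and the authors' displayed formula may differ.

```latex
\mu_{i}(x) = \sum_{k=1}^{n} \exp\left( d_{k}^{0} + \sum_{m=1}^{r} d_{k}^{m} x_{m} \right) p_{ik}\, s_{k},
\qquad s_{k} = \sum_{j} p_{jk}.
```

In this form each of the \(n\) eigen-directions carries one intercept \(d_{k}^{0}\) and \(r\) slopes \(d_{k}^{m}\), which gives the \(n \times (r+1)\) parameters mentioned above.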
The probability of one observation can then be written,
which gives the following log-likelihood,
Therefore, to obtain the final system of equations, we need to calculate the derivatives of the log-likelihood with respect to each parameter \(d_{k}^{m}\).
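These derivatives can be sketched in code. The example below is an illustration under assumptions, not the authors' implementation: it reads the log link as \(\mu _{ti} = \sum _{k} \exp (\eta _{tk})\, p_{ik} s_{k}\) with \(\eta _{tk} = d_{k}^{0} + \sum _{m} d_{k}^{m} x_{tm}\), stores the parameters in a hypothetical array `D` of shape `(r + 1, n)`, and uses a made-up rotation matrix `P` and a single binary covariate. The function names are hypothetical as well.

```python
import numpy as np
from math import lgamma


def rates(D, X, A):
    """mu[t, i] = sum_k exp((X @ D)[t, k]) * A[i, k], with A[i, k] = p_ik * s_k."""
    return np.exp(X @ D) @ A.T


def loglik(D, X, A, Y):
    """Poisson log-likelihood summed over days t and stations i."""
    mu = rates(D, X, A)
    log_fact = np.vectorize(lgamma)(Y + 1.0)   # log(y!)
    return float(np.sum(Y * np.log(mu) - mu - log_fact))


def grad(D, X, A, Y):
    """d loglik / d D[m, k] = sum_t x_tm exp(eta_tk) sum_i (y_ti / mu_ti - 1) A[i, k]."""
    lam = np.exp(X @ D)                 # (T, n) eigenvalues per day
    mu = lam @ A.T                      # (T, I) station rates per day
    W = (Y / mu - 1.0) @ A              # (T, n)
    return X.T @ (W * lam)              # (r + 1, n)


# Hypothetical setup: 2 stations, 1 binary covariate (e.g. a rain indicator).
theta = np.pi / 6
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = P * P.sum(axis=0)                   # A[i, k] = p_ik * s_k

rng = np.random.default_rng(2)
T = 200
X = np.column_stack([np.ones(T), rng.integers(0, 2, size=T)])  # intercept + covariate
D_true = np.array([[4.0, 3.0],
                   [0.2, -0.1]])
Y = rng.poisson(rates(D_true, X, A))

D0 = D_true + 0.1                       # evaluation point away from the truth
g = grad(D0, X, A, Y)
```

Setting `grad` to zero gives one equation per parameter \(d_{k}^{m}\). Because the system is non-linear in `D`, a numerical solver would be needed in practice, and the rates must remain positive along the way. A finite-difference comparison against `loglik` is a simple way to check the analytic gradient.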
Cite this article
Ickowicz, A., Sparks, R. Estimation of an origin/destination matrix: application to a ferry transport data. Public Transp 7, 235–258 (2015). https://doi.org/10.1007/s12469-015-0102-y
DOI: https://doi.org/10.1007/s12469-015-0102-y