
Estimation of an origin/destination matrix: application to a ferry transport data

  • Case Study and Application
  • Published in: Public Transport

Abstract

The estimation of the number of passengers making an identical journey is a common problem for public transport authorities. This problem is also known as the origin–destination (OD) estimation problem, and it has been widely studied for the past 30 years. However, theory is lacking when the observations are not limited to passenger counts but also include station surveys. Our aim is to provide a solid framework for the estimation of an OD matrix when only a portion of the journey counts is observable. Our method is a statistical estimation technique for the OD matrix when the row-sum counts and survey-based observations are available. It differs from previous studies in that it does not require a prior OD matrix, which can be hard to obtain. Instead, we model passenger behaviour through the survey data, and use the diagonalization of the partial OD matrix to reduce the parameter space and derive a consistent global OD matrix estimator. We demonstrate the robustness of our estimator and apply it to several examples showcasing the proposed models and approach. We also highlight how other sources of data, such as explanatory variables (e.g. rainfall) and indicator variables for major events, can be incorporated in the model, with inference made in a principled, non-heuristic way.


Notes

  1. This is a slight abuse of notation, indicating that every element of \({\mathcal {X}}_{C}\) is drawn from a Poisson distribution whose parameter is the corresponding element of the matrix \(C\).


Author information


Corresponding author

Correspondence to Adrien Ickowicz.

Appendices

Appendix 1: Calculation in case of Poisson distribution

To make this concrete, we explore a simple density example in which the Poisson distribution is used instead of the Negative Binomial. The pdf can then be expressed as,

$$\begin{aligned} f_{R_s}({\mathcal {X}}_R) = \frac{\mu ^{{\mathcal {X}}_R}}{{\mathcal {X}}_R!} e^{-\mu } \end{aligned}$$
(36)

where \(\mu \) is the parameter of interest. Then, if we assume that the numbers of passengers at the different stations are independent, each \(\tilde{y}_{D}^{i}\) is distributed according to a Poisson distribution with parameter \(\sum _{j} r_{ij}\), which can be re-written according to Eq. 17,

$$\begin{aligned} \tilde{y}_D^i \sim {\mathcal {P}}\left(\sum _k \lambda _k p_{ik} \sum _j p_{jk}\right) \end{aligned}$$
(37)

where the \(\lambda _{k}\) are the eigenvalues and the \(p_{ij}\) are the elements of the matrix \(P\). If we denote \(s_{k} = \sum _{j} p_{jk}\), the transformed density can then be written,

$$\begin{aligned} g_{R_z}(\tilde{y}_D) =\frac{(\sum _k \lambda _k p_{ik} s_k)^{\tilde{y}_D}}{\tilde{y}_D!} e^{-\sum _k \lambda _k p_{ik} s_k} \end{aligned}$$
(38)

Therefore, the log-likelihood can be expressed as follows,

$$\begin{aligned} \log {\mathcal {L}} = -N \sum _{k=1}^n \lambda _k s_k^2 + \sum _{i=1}^n \ln \left( \sum _{k=1}^n \lambda _k p_{ik} s_k \right) \sum _{l=1}^N \tilde{y}_i^l \end{aligned}$$
(39)
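
As a quick numerical sanity check, the compressed log-likelihood of Eq. 39 can be compared with the direct sum of Poisson log-densities; the two agree up to the additive \(\log \tilde{y}!\) term, which does not depend on the parameters. The Python sketch below uses an invented 3-station matrix \(R\) and sample size; it illustrates the algebra rather than reproducing the authors' code.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented 3-station example: a symmetric OD matrix R with positive
# entries, diagonalized as R = P diag(lam) P^T (eigenvectors in columns).
R = np.array([[4.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 5.0]])
lam, P = np.linalg.eigh(R)

s = P.sum(axis=0)            # s_k = sum_j p_jk
mu = P @ (lam * s)           # Poisson means: mu_i = sum_k lam_k p_ik s_k

# mu_i is exactly the i-th row sum of R (Eq. 37)
assert np.allclose(mu, R.sum(axis=1))

# Simulate N days of departure counts and evaluate Eq. 39
N = 200
y = rng.poisson(mu, size=(N, 3))

loglik_eq39 = -N * np.sum(lam * s**2) + np.sum(np.log(mu) * y.sum(axis=0))

# Direct Poisson log-likelihood, dropping the data-only term log(y!)
loglik_direct = np.sum(y * np.log(mu) - mu)

assert np.allclose(loglik_eq39, loglik_direct)
```

The identity holds because \(\sum _i \mu _i = \sum _k \lambda _k s_k^2\), which is where the constant term of Eq. 39 comes from.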

Maximizing the likelihood is then equivalent to solving the following system,

$$\begin{aligned} \left\{ \begin{array}{lll} -N s_1^2 + \sum _{i=1}^n \frac{p_{i1} s_1}{\sum _{k=1}^n \lambda _k p_{ik} s_k} \sum _{l=1}^N \tilde{y}_i^l &{}=&{} 0\\ \vdots &{}\vdots &{} \\ -N s_n^2 + \sum _{i=1}^n \frac{p_{in} s_n}{\sum _{k=1}^n \lambda _k p_{ik} s_k} \sum _{l=1}^N \tilde{y}_i^l &{}=&{} 0 \end{array} \right. \end{aligned}$$
(40)

still under the constraints C1 and C2. C3 is excluded because this set of parameters does not exist in the Poisson modelling. If \(n = 1\), \({\mathbf{C1}} \Leftrightarrow \lambda \ge 0\) and C2 does not apply. The estimated value then corresponds to the classical one-dimensional unbiased Poisson mean estimator \(\hat{\lambda } = \bar{y}\). The system of equations 40 seems quite complicated at first. Nevertheless, it can be simplified so as to become,

$$\begin{aligned} \forall i, \quad -s_i + \sum _{j=1}^n \frac{p_{ji} \bar{y}_j}{\mu _j} = 0 \end{aligned}$$
(41)

where \(\mu _{j} = \sum _{k} \lambda _{k} p_{jk} s_{k}\) contains the unknown parameters. If we denote \(U = (1/\mu _{j})_{j}\), then,

$$\begin{aligned} \hat{U} = \left( \bar{Y}_d P {}^t P \bar{Y}_d \right) ^{-1} \bar{Y}_d P S \end{aligned}$$
(42)

where \(\bar{Y}_{d} = diag(\bar{y}_{i})_{i}\) and \(S = (s_{i})_{i}\). We can then simplify the expression further,

$$\begin{aligned} \hat{U}&= \bar{Y}_d^{-1} P S \end{aligned}$$
(43)

Finally, the same reasoning leads to the following estimator,

$$\begin{aligned} \hat{\Lambda } = \left( S_d {}^t P P S_d \right) ^{-1} S_d {}^t P \left( \bar{Y}_d^{-1} P S \right) ^{-1} \end{aligned}$$
(44)

where \(S_{d} = diag(s_{i})_{i}\). This estimator is probably not the best one available, given that it relies on the inversion of \(\hat{U}\), but it has the advantage of being asymptotically unbiased, with a variance decreasing to zero.
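
To illustrate the estimator of Eq. 44 on a small invented example: because \(P\) is orthogonal, \(P S = P\,{}^tP\,\mathbf{1} = \mathbf{1}\), so \(\hat{U} = (1/\bar{y}_j)_j\) and the estimator reduces to \(S_d^{-1}\,{}^tP\,\bar{Y}\); plugging in the exact means \(\mu = P S_d \Lambda \) recovers \(\Lambda \) exactly, which is the asymptotic unbiasedness claimed above. A sketch under invented data, not the authors' code:

```python
import numpy as np

# Invented 3-station OD matrix, diagonalized as R = P diag(lam) P^T
R = np.array([[4.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 5.0]])
lam, P = np.linalg.eigh(R)      # orthonormal eigenvectors in the columns of P

s = P.sum(axis=0)               # s_k = sum_j p_jk
S_d = np.diag(s)
mu = P @ (lam * s)              # exact Poisson means mu_j = sum_k lam_k p_jk s_k

def lambda_hat(y_bar):
    """Estimator of Eq. 44, using the elementwise inverse of U_hat."""
    U_hat = (P @ s) / y_bar                  # Eq. 43: U_hat = Y_d^{-1} P S
    rhs = S_d @ P.T @ (1.0 / U_hat)
    return np.linalg.solve(S_d @ P.T @ P @ S_d, rhs)

# P @ s = P P^T 1 = 1 by orthogonality, so U_hat is simply (1 / y_bar_j)_j
assert np.allclose(P @ s, np.ones(3))

# With the exact means, the estimator recovers the eigenvalues exactly
assert np.allclose(lambda_hat(mu), lam)

# With a finite-sample mean it is only approximately unbiased
rng = np.random.default_rng(7)
y_bar = rng.poisson(mu, size=(5000, 3)).mean(axis=0)
est = lambda_hat(y_bar)
```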

1.1 Properties of the calculated estimators

Let \(f\) be a probability density function. If \(f_{\Lambda }\) denotes the pdf of \(\Lambda \), and \(f_{P}\) the pdf of \(\tilde{P}\), we can write,

$$\begin{aligned} f_{\Lambda } = \int f_{\Lambda \vert \tilde{P} }(\lambda ) f_{\tilde{P}}(\tilde{p}) d\tilde{p} \end{aligned}$$
(45)

Let us consider the estimator presented in Eq. 23, and assume that we are in a large-value regime, meaning \({\mathcal {P}} (.) \sim {\mathcal {N}}(.,.)\). Then,

$$\begin{aligned} \Lambda \vert \tilde{P} \sim {\mathcal {N}} \left( m, N^{-1} \Sigma \right) \end{aligned}$$
(46)

where,

$$\begin{aligned} \left\{ \begin{array}{l} m = \tilde{S}_d^{-1} {}^t\tilde{P} P S_d \Lambda \\ \Sigma = \tilde{S}_d^{-1} \left[ D + {}^t \tilde{P} diag(PS_d \Lambda ) \tilde{P} \right] \tilde{S}_d^{-1} \end{array} \right. \end{aligned}$$
(47)

and \(\tilde{P}\) is an estimate of \(P\) based on the first observations.
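
A quick check of Eq. 47: when the survey-based estimate \(\tilde{P}\) coincides with the true \(P\) (so \(\tilde{S}_d = S_d\)), the conditional mean reduces to \(m = S_d^{-1}\,{}^t P P S_d \Lambda = \Lambda \), i.e. the estimator is conditionally unbiased. The sketch below uses an invented matrix and omits \(\Sigma \), since the matrix \(D\) is defined earlier in the paper:

```python
import numpy as np

# Invented setup reused from Appendix 1: R = P diag(lam) P^T, P orthogonal
R = np.array([[4.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 5.0]])
lam, P = np.linalg.eigh(R)
s = P.sum(axis=0)
S_d = np.diag(s)

def cond_mean(P_tilde):
    """Conditional mean m of Eq. 47 for a survey-based estimate P_tilde."""
    s_tilde = P_tilde.sum(axis=0)
    return np.linalg.solve(np.diag(s_tilde), P_tilde.T @ P @ S_d @ lam)

# With P_tilde = P, the conditional mean equals Lambda (no bias)
assert np.allclose(cond_mean(P), lam)

# A slightly perturbed (re-orthonormalized) P_tilde introduces bias
Q, _ = np.linalg.qr(P + 0.01 * np.eye(3))
m_biased = cond_mean(Q)
```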

Appendix 2

To prove the convergence in probability, we need to demonstrate that,

$$\begin{aligned} \lim _n {\mathbb {P}}(\vert \hat{\Lambda } - \Lambda \vert \ge \epsilon ) = 0 \end{aligned}$$
(48)

where \(n = \min (N_{1},N_{2})\). Starting with the left hand side, we have,

$$\begin{aligned} {\mathbb {P}}(\vert \hat{\Lambda } - \Lambda \vert \ge \epsilon )&= {\mathbb {P}}(\vert \tilde{S_d}^{-1} {}^t \tilde{P} \bar{Y} - \Lambda \vert \ge \epsilon ) \nonumber \\&\le {\mathbb {P}}(\vert \tilde{S_d}^{-1} {}^t \tilde{P} \bar{Y} - S_d^{-1} {}^t P \bar{Y} \vert + \vert S_d^{-1} {}^t P \bar{Y} - \Lambda \vert \ge \epsilon ) \nonumber \\&= \int {\mathbb {P}} \left( \vert S_d^{-1} {}^t P \bar{Y} - \Lambda \vert \ge \epsilon - c \right) {\mathbb {P}}(\vert (\tilde{S_d}^{-1} {}^t \tilde{P} - S_d^{-1} {}^t P)\bar{Y} \vert = c) dc \end{aligned}$$
(49)

We know that \(\bar{Y} \mathop{\rightarrow}\limits_{}^{a.s.} P S_{d} \Lambda \). Then,

$$\begin{aligned} \forall \epsilon > 0, \quad \lim _n {\mathbb {P}}\left( \vert S_d^{-1} {}^t P \bar{Y} - \Lambda \vert \ge \epsilon \right) = 0 \end{aligned}$$
(50)

and we have,

$$\begin{aligned} {\mathbb {P}}(\vert \hat{\Lambda } - \Lambda \vert \ge \epsilon ) \le \int _0^\epsilon {\mathbb {P}} \left( \vert S_d^{-1} {}^t P \bar{Y} - \Lambda \vert \ge \epsilon - c \right) f_n (c)dc + \int _\epsilon ^{+\infty } f_n (c)dc \end{aligned}$$
(51)

where \(f_{n}()\) stands for the probability density function of \(\vert (\tilde{S_{d}}^{-1} {}^{t} \tilde{P} - S_{d}^{-1} {}^{t} P)\bar{Y} \vert \).

The first integral decreases towards \(0\) as \(n\) grows to infinity, according to Eq. 50. The argument for the second integral is as follows: by the assumption of strong convergence of \(\tilde{P}\), \(f_{n}()\) converges towards the Dirac delta \(\delta _0()\) as \(N_{1}\) goes to infinity. Since \(\epsilon \) is strictly positive, this ends the proof. □

Appendix 3: Calculation in case of Poisson regression (and log link function)

The beginning of the reasoning is similar to the previous one. If we now assume that exogenous variables affect the number of passengers, we can write,

$$\begin{aligned} R = {\varvec{\beta }}_0 + \sum _m {\varvec{\beta }}_m x_m \end{aligned}$$
(52)

where the \({\varvec{\beta }}\) are symmetric matrices: the intercept \({\varvec{\beta }}_{0}\) reflects baseline commuter flows, and the \({\varvec{\beta }}_{m}\) reflect changes in commuter flows driven by known daily influences. Moreover, we assume that the same diagonalization (i.e. with the same eigenvectors) can be applied, which leads us to,

$$\begin{aligned} r_{ij} = \underbrace{\sum _k d_k^0 p_{jk} p_{ik}}_{{{\mathrm{fixed}}\,{\mathrm{part}}}} + \underbrace{\sum _k \sum _m d_k^m p_{jk} p_{ik} x_m}_{{{\mathrm{multivariate}}\hbox{-}{\mathrm{time}}\,{\mathrm{varying}}\,{\mathrm{part}}}} \end{aligned}$$
(53)

Therefore, \(\tilde{y}_{i}\) will be distributed according to a Poisson distribution with the following parameter,

$$\begin{aligned} \sum _j r_{ij} = \sum _k d_k^0 p_{ik} s_k + \sum _k \sum _m d_k^m p_{ik} s_k x_m \end{aligned}$$
(54)

where the parameters to be estimated are \((d_{k}^{0})_{k},(d_{k}^{m})_{k,m}\), which means we have to estimate \(n \times (r+1)\) parameters.

The probability of one observation can then be written,

$$\begin{aligned} p(\tilde{y}_i^l \vert x_m^l)&= \frac{ ( \sum _k d_k^0 p_{ik} s_k + \sum _k \sum _m d_k^m p_{ik} s_k x_m^l)^{\tilde{y}_i^l}}{\tilde{y}_i^l ! } \times \nonumber \\&e^{-(\sum _k d_k^0 p_{ik} s_k + \sum _k \sum _m d_k^m p_{ik} s_k x_m^l)} \end{aligned}$$
(55)

which gives the following log-likelihood,

$$\begin{aligned} \log {\mathcal {L}}&\propto \sum _i \left( \sum _l \tilde{y}_i^l \log \left( \sum _k d_k^0 p_{ik} s_k + \sum _k \sum _m d_k^m p_{ik} s_k x_m^l \right) \right. \nonumber \\&\left. - \sum _l \left( \sum _k d_k^0 p_{ik} s_k + \sum _k \sum _m d_k^m p_{ik} s_k x_m^l \right) \right) \end{aligned}$$
(56)

Therefore, to obtain the final system of equations, we need to calculate the derivatives of the log-likelihood with respect to each parameter \(d_{k}^{m}\).
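
The derivative computation can be checked numerically. The sketch below evaluates the log-likelihood of Eq. 56 and its analytic gradient with respect to \((d_k^0)_k\) and \((d_k^m)_{k,m}\), and verifies the gradient against central finite differences; the station matrix, covariates and coefficient values are all invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented 3-station setup, as in Appendix 1: R = P diag(lam) P^T
R = np.array([[4.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 5.0]])
lam, P = np.linalg.eigh(R)
s = P.sum(axis=0)
B = P * s                          # B[i, k] = p_ik s_k

# r = 2 exogenous variables over N days; the coefficients d^m are chosen
# proportional to d^0 = lam so that every Poisson parameter stays positive
N, r = 50, 2
x = rng.uniform(0.0, 1.0, size=(N, r))
d = np.column_stack([lam, 0.2 * lam, 0.1 * lam])   # columns d^0, d^1, d^2

def theta(d):
    """theta[l, i] = sum_k (d_k^0 + sum_m d_k^m x_m^l) p_ik s_k  (Eq. 54)."""
    coef = d[:, 0] + x @ d[:, 1:].T    # (N, n) effective eigenvalues per day
    return coef @ B.T

y = rng.poisson(theta(d))

def loglik(d):
    t = theta(d)
    return np.sum(y * np.log(t) - t)   # Eq. 56, dropping the log(y!) term

def grad(d):
    t = theta(d)
    w = y / t - 1.0                    # (N, n)
    wB = w @ B                         # wB[l, k] = sum_i w_li p_ik s_k
    return np.column_stack([wB.sum(axis=0), (x.T @ wB).T])

# Central finite differences confirm the analytic gradient
g, g_fd, h = grad(d), np.zeros_like(d), 1e-6
for idx in np.ndindex(*d.shape):
    dp, dm = d.copy(), d.copy()
    dp[idx] += h
    dm[idx] -= h
    g_fd[idx] = (loglik(dp) - loglik(dm)) / (2 * h)

assert np.allclose(g, g_fd, rtol=1e-4, atol=1e-4)
```

Setting the gradient to zero recovers the system obtained by differentiating Eq. 56 by hand, one equation per parameter \(d_k^m\).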

About this article

Cite this article

Ickowicz, A., Sparks, R. Estimation of an origin/destination matrix: application to a ferry transport data. Public Transp 7, 235–258 (2015). https://doi.org/10.1007/s12469-015-0102-y
