Precise estimation of connections of metro passengers from Smart Card data


The aim of this study is to estimate both the physical and the schedule-based connections of metro passengers from their entry and exit stations and gate times, data that are available from Smart Card transactions in most train networks. By examining the Smart Card data, we observe a set of transit behaviors of metro passengers, manifested as the time intervals that identify the boarding, transfer, or alighting train at a station. The authenticity of these time intervals is ensured by separating out a set of passengers whose trip has a unique connection that is predominantly better in all respects than any alternative connection. The connections of such passengers, known as reference passengers, can be readily determined, and hence their gate times and stations can be used to derive reliable time intervals. To detect the unknown path of a passenger, the proposed method checks, for each alternative connection, whether it admits a sequence of boarding, middle, and alighting trains whose time intervals are all consistent with the gate times and stations of the passenger; this is a necessary condition for a true connection. Tested on 32 million trips made over one week, the proposed method detected a unique connection satisfying the necessary condition, and therefore most likely the true connection, in 92.6 % of the cases for physical connections and 83.4 % for schedule-based connections.
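The first step of the interval-based consistency test can be sketched in Python: deriving time intervals from reference passengers and looking up which trains are consistent with a passenger's gate time. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `boarding_intervals` and `consistent_trains` are hypothetical, gate times are assumed to be seconds after midnight, and a train's time interval at a station is assumed to be simply the earliest-to-latest span of the gate times of the reference passengers known to have used that train there.

```python
from collections import defaultdict


def boarding_intervals(reference_records):
    """Map each train to the (earliest, latest) gate time of its reference passengers.

    reference_records: iterable of (train_id, gate_time) pairs observed at one
    station for reference passengers whose train is known (hypothetical layout).
    Assumption: a train's time interval is the min/max span of these gate times.
    """
    times_by_train = defaultdict(list)
    for train_id, gate_time in reference_records:
        times_by_train[train_id].append(gate_time)
    return {train: (min(ts), max(ts)) for train, ts in times_by_train.items()}


def consistent_trains(intervals, gate_time):
    """Return, sorted, the trains whose interval contains the passenger's gate time."""
    return sorted(
        train for train, (start, end) in intervals.items() if start <= gate_time <= end
    )


# Hypothetical reference passengers at an origin station (seconds after midnight).
records = [("X1", 100), ("X1", 160), ("X2", 150), ("X2", 230), ("X3", 260), ("X3", 320)]
intervals = boarding_intervals(records)
print(consistent_trains(intervals, 155))  # ['X1', 'X2']: inside two overlapping intervals
print(consistent_trains(intervals, 240))  # []: inside no interval
```

The same two functions apply unchanged to exit times at the destination, giving alighting intervals; an empty result at either end means the connection fails the necessary condition.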






This research was supported in part by the Basic Science Research Program (2014R1A2A1A11049663) through the National Research Foundation of Korea (NRF), and by the BK21 Plus Program (Center for Sustainable and Innovative Industrial Systems) funded by the Ministry of Education, Korea.

Author information



Corresponding author

Correspondence to Kyung Min Kim.



Probability estimation of schedule-based connections

Suppose the current physical connection requires a single transfer, say at Station \(A\). The schedule-based connections along such a physical connection can be represented by a time-expanded network, as in Fig. 11.

Fig. 11 Two schedule-based connections can be consistent

The consistency check starts by finding the consistent trains at both \(O\) and \(D\). By assumption, there can be at most two trains at \(O\), say \(X_1\) and \(X_2\), whose time intervals contain the entry time, while at most one train at \(D\), say \(Y\), can be consistent with the exit time. If there is no such train at \(O\) or at \(D\), we conclude that the passenger did not use the physical connection.

If neither \(X_1\) nor \(X_2\) can be connected to \(Y\), in the sense that there is no relevant transfer reference passenger, we conclude that the passenger did not use the physical connection.

If only one such train, say \(X_1\), has a connection to \(Y\) that can be verified by transfer reference passengers, then the schedule-based connection \(X_1-Y\) is confirmed as the unique connection of the passenger.

Finally, if both \(X_1\) and \(X_2\) have transfer reference passengers to \(Y\), as in Fig. 11, we need to return both \(X_1-Y\) and \(X_2-Y\). This is the worst case, in that the maximum number of schedule-based connections is confirmed as consistent.
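The case analysis for a single-transfer physical connection can be sketched as follows. It is an illustrative sketch only: `consistent_connections` and `classify` are hypothetical names, and `verified_transfers` is an assumed input holding the (arriving train, departing train) pairs at Station \(A\) for which at least one transfer reference passenger exists.

```python
def consistent_connections(boarding_candidates, alighting_candidates, verified_transfers):
    """List the schedule-based connections X-Y consistent with one passenger.

    boarding_candidates: trains at O whose interval contains the entry time
        (at most two, e.g. ["X1", "X2"]).
    alighting_candidates: trains at D whose interval contains the exit time
        (at most one, e.g. ["Y"]).
    verified_transfers: set of (from_train, to_train) pairs at the transfer
        station that are witnessed by transfer reference passengers.
    """
    return [
        (x, y)
        for x in boarding_candidates
        for y in alighting_candidates
        if (x, y) in verified_transfers
    ]


def classify(connections):
    """Map the number of consistent connections to the outcomes described above."""
    if not connections:
        return "reject physical connection"
    if len(connections) == 1:
        return "unique connection"
    return "ambiguous: refine with probabilities"


both = consistent_connections(["X1", "X2"], ["Y"], {("X1", "Y"), ("X2", "Y")})
print(both, classify(both))
# [('X1', 'Y'), ('X2', 'Y')] ambiguous: refine with probabilities
```

An empty candidate list at \(O\) or \(D\) makes the comprehension return no connections, so the "no such train" case and the "no transfer reference passenger" case both end in rejection.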

The estimate, however, can be refined by a probability distribution over the two connections. Referring to Fig. 11, we introduce the following notation:

  • \(p\): the fraction of the boarding reference passengers in the overlap of the two boarding time intervals who boarded train \(X_1\)

  • \(1-p\): the fraction of the boarding reference passengers in that overlap who boarded train \(X_2\) rather than \(X_1\)

  • \(1-q_1\): the fraction of the transfer reference passengers from \(X_1\) to \(Y\)

  • \(q_2\): the fraction of the transfer reference passengers from \(X_2\) to \(Y\)
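One way to obtain these quantities from reference-passenger counts is sketched below. The counting rules are assumptions for illustration: \(p\) is taken as the share of \(X_1\) among boarding reference passengers whose entry times fall in the overlap of the two intervals, and each transfer fraction is taken as the share of the transfer reference passengers arriving on a given train who continued on \(Y\). The function names and data layouts are hypothetical.

```python
def interval_overlap(a, b):
    """Overlap of two closed intervals (start, end), or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def estimate_p(boarding_records, interval_x1, interval_x2, x1="X1", x2="X2"):
    """Share of boarding reference passengers in the overlap who boarded x1.

    boarding_records: iterable of (train_id, entry_time) pairs at the origin.
    """
    overlap = interval_overlap(interval_x1, interval_x2)
    if overlap is None:
        raise ValueError("the two boarding intervals do not overlap")
    in_overlap = [train for train, t in boarding_records if overlap[0] <= t <= overlap[1]]
    n1, n2 = in_overlap.count(x1), in_overlap.count(x2)
    if n1 + n2 == 0:
        raise ValueError("no boarding reference passengers in the overlap")
    return n1 / (n1 + n2)


def transfer_fraction(transfer_records, from_train, to_train):
    """Share of transfer reference passengers arriving on from_train who took to_train.

    transfer_records: iterable of (arriving_train, departing_train) pairs at the
    transfer station. Assumed denominator: all transfers out of from_train.
    """
    departures = [dep for arr, dep in transfer_records if arr == from_train]
    if not departures:
        raise ValueError(f"no transfer reference passengers from {from_train}")
    return departures.count(to_train) / len(departures)


# Hypothetical data: intervals X1 = (100, 158), X2 = (150, 230), overlap (150, 158).
boarding = [("X1", 100), ("X1", 155), ("X1", 158), ("X2", 150), ("X2", 230)]
p = estimate_p(boarding, (100, 158), (150, 230))  # 2 of 3 in the overlap boarded X1
transfers = [("X1", "Y0"), ("X1", "Y0"), ("X1", "Y"), ("X2", "Y"), ("X2", "Y"), ("X2", "Y3")]
one_minus_q1 = transfer_fraction(transfers, "X1", "Y")
q2 = transfer_fraction(transfers, "X2", "Y")
```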

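Interpreting \(1-q_1\) and \(q_2\) as the probabilities of transferring to \(Y\) from \(X_1\) and from \(X_2\), respectively, and conditioning on the passenger's having alighted from \(Y\), it is then not difficult to show by Bayes' rule that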

$$\begin{aligned} \Pr\left\{\text{Passenger chose } X_1-Y\right\} &= \frac{p(1-q_1)}{p(1-q_1) + (1-p)q_2},\\ \Pr\left\{\text{Passenger chose } X_2-Y\right\} &= \frac{(1-p)q_2}{p(1-q_1) + (1-p)q_2}. \end{aligned}$$
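The two probabilities can be evaluated directly, as in the sketch below. The function name is hypothetical; exact fractions are used only so that the worked example can be checked by hand, and floats work equally well.

```python
from fractions import Fraction


def connection_probabilities(p, q1, q2):
    """Return (Pr{X1-Y}, Pr{X2-Y}) using the notation above.

    Weight of X1-Y is p * (1 - q1); weight of X2-Y is (1 - p) * q2.
    """
    for name, value in (("p", p), ("q1", q1), ("q2", q2)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must lie in [0, 1]")
    w1 = p * (1 - q1)
    w2 = (1 - p) * q2
    total = w1 + w2
    if total == 0:
        raise ValueError("neither connection has positive weight")
    return w1 / total, w2 / total


# Worked example: p = 3/5, q1 = 1/4, q2 = 1/2.
# Weights: 3/5 * 3/4 = 9/20 and 2/5 * 1/2 = 4/20, so the probabilities are 9/13 and 4/13.
print(connection_probabilities(Fraction(3, 5), Fraction(1, 4), Fraction(1, 2)))
# (Fraction(9, 13), Fraction(4, 13))
```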
Table 6 Numbers and lists of consistent schedule-based connection(s), the corresponding conditions, and the probability distributions for a single-transfer physical connection

Table 6 summarizes the numbers and lists of consistent schedule-based connection(s), the corresponding conditions, and the probability distributions. If none of the conditions in Table 6 is satisfied, no schedule-based connection can be consistent with the passenger's quadruple of entry and exit stations and gate times, and hence the physical connection is rejected.

For a physical connection that requires two transfers, up to three schedule-based connections may be consistent with a quadruple if the trip is not abnormally delayed. The preceding arguments extend readily to that case.
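The two-transfer extension is not spelled out here. One natural generalization, stated as an assumption rather than as the authors' formulation, weights each consistent schedule-based connection by the product of the reference-passenger fractions along it (one boarding fraction, then one fraction per transfer) and normalizes over all consistent connections. For a single transfer this reduces to the formula above. The function name and the example fractions are hypothetical.

```python
from fractions import Fraction
from math import prod


def chain_probabilities(chain_factors):
    """Normalize product weights over consistent schedule-based connections.

    chain_factors: dict mapping a connection (tuple of trains, boarding train
    first) to the list of fractions along it: the boarding fraction, then one
    transfer fraction per transfer. Assumption: a connection's weight is the
    product of these fractions.
    """
    weights = {chain: prod(factors) for chain, factors in chain_factors.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no connection has positive weight")
    return {chain: weight / total for chain, weight in weights.items()}


# Single transfer: reproduces 9/13 and 4/13 for p = 3/5, q1 = 1/4, q2 = 1/2.
p, q1, q2 = Fraction(3, 5), Fraction(1, 4), Fraction(1, 2)
single = chain_probabilities({("X1", "Y"): [p, 1 - q1], ("X2", "Y"): [1 - p, q2]})

# Two transfers with three consistent connections (hypothetical fractions).
double = chain_probabilities({
    ("X1", "M1", "Y"): [Fraction(1, 2), Fraction(1, 2), Fraction(1, 1)],
    ("X2", "M1", "Y"): [Fraction(1, 2), Fraction(1, 4), Fraction(1, 1)],
    ("X2", "M2", "Y"): [Fraction(1, 2), Fraction(3, 4), Fraction(1, 2)],
})
```

In the two-transfer example the weights are 1/4, 1/8, and 3/16, which normalize to 4/9, 2/9, and 1/3.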


Cite this article

Hong, SP., Min, YH., Park, MJ. et al. Precise estimation of connections of metro passengers from Smart Card data. Transportation 43, 749–769 (2016).



  • Physical and schedule-based connection estimation
  • Smart Card data
  • Metro network
  • Passenger’s behaviors