Abstract
In this paper, we propose a method for estimating the timings at which trains arrive at and depart from stations using passenger farecard data and knowledge of the network topology. This problem is essential for understanding commuter movement patterns across metro systems at a fine level of detail in settings where train logs (records of train arrival and departure timings) are unavailable or unreliable. Our technique takes as input the timings at which passengers arrive at and depart from stations, which are easily retrieved from farecard data, and provides as output an estimate of the number of trains running as well as the timings at which each train arrives at and departs from each station. Our method relies on two key observations: (1) passengers tend to exit metro stations as soon as they alight, and (2) we can reliably conclude that groups of passengers who board at the same stop but alight at different stops were on the same train if their boarding timings have similar distributions. In contrast with prior works, our methodology is stand-alone: it does not rely on external sources of information such as train schedules, and it requires minimal parameter tuning. In addition, because a by-product of our method is an inference of the trains that passengers board, our techniques can be employed as a pre-processing step for downstream tasks such as inferring passenger route choices. We apply our method to recover train logs from synthetically generated data as well as from actual ticketing data of passengers in the Singapore metro network. Experiments on synthetic data show that our method reliably recovers train logs even with moderate levels of overcrowding on train platforms.
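The two key observations above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm. It assumes that exit taps from one train form a burst separated from the next burst by more than a hypothetical `max_gap` seconds, and it subtracts a hypothetical fixed `walk_time` from the first exit in each burst to approximate the train arrival. For the second observation, it measures the similarity of two groups' boarding (entry-tap) timing distributions with a two-sample Kolmogorov-Smirnov statistic, one plausible choice of distance, compared against a hypothetical `threshold`. All function names are illustrative.

```python
from bisect import bisect_right


def cluster_exit_times(exit_times, max_gap=30.0):
    """Group exit-tap timestamps (in seconds) at one station into bursts.

    Assumes passengers exit soon after alighting, so taps from one train
    arrive close together; a gap longer than max_gap starts a new burst.
    """
    bursts = []
    for t in sorted(exit_times):
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)  # still within the current burst
        else:
            bursts.append([t])  # gap too large: treat as a new train
    return bursts


def estimate_arrivals(exit_times, max_gap=30.0, walk_time=0.0):
    """One estimated arrival per burst: earliest exit minus an assumed
    platform-to-gate walking time. len(result) estimates the train count."""
    return [b[0] - walk_time for b in cluster_exit_times(exit_times, max_gap)]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_x |F_a(x) - F_b(x)|.

    The empirical CDFs only change at sample points, so checking every
    pooled sample value is enough to find the supremum.
    """
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def likely_same_train(entries_a, entries_b, threshold=0.3):
    """Heuristic: two groups with the same origin but different destinations
    probably shared a train if their entry-time distributions are close."""
    return ks_statistic(entries_a, entries_b) <= threshold


if __name__ == "__main__":
    exits = [100, 105, 112, 400, 410, 415, 800]
    print(estimate_arrivals(exits, walk_time=20))  # [80, 380, 780]
    print(likely_same_train([10, 20, 30, 40], [12, 22, 32]))  # True (D = 0.25)
    print(likely_same_train([10, 20, 30, 40], [200, 210, 220]))  # False (D = 1.0)
```

A fixed gap threshold is the simplest way to split exit bursts. A kernel density estimate of exit timings with a data-driven bandwidth would depend less on a hand-picked parameter. Real farecard data would also require handling passengers who linger after alighting and platforms so crowded that some passengers miss the first train.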

Acknowledgements
This research is supported by the National Research Foundation, Singapore, and the Land Transport Authority under its Urban Mobility Grand Challenge Programme (Award No UMGC-L005). The views expressed herein are those of the authors and are not necessarily those of the funding agencies.
Author information
Contributions
Conceptualisation, methodology, HET, DWS, YSS and MAR; Coding, formal analysis and investigation, HET, DWS and YSS; Writing—original draft preparation, HET and DWS; Writing—review and editing, HET, DWS, YSS and MAR; Supervision, project administration and funding acquisition, MAR. All authors have read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Tan, H.E., Soh, D.W., Soh, Y.S. et al. Derivation of train arrival timings through correlations from individual passenger farecard data. Transportation 48, 3181–3205 (2021). https://doi.org/10.1007/s11116-021-10164-w
DOI: https://doi.org/10.1007/s11116-021-10164-w


