Journal of Zhejiang University SCIENCE C, Volume 13, Issue 10, pp 750–760

Transit smart card data mining for passenger origin information extraction

  • Xiao-lei Ma
  • Yin-hai Wang
  • Feng Chen
  • Jian-feng Liu


The automated fare collection (AFC) system, also known as the transit smart card (SC) system, has become increasingly popular among transit agencies worldwide. Compared with conventional manual fare collection, an AFC system has inherent advantages: low labor cost and high efficiency in both fare collection and transaction data archiving. SC transactions can yield highly valuable data. However, extracting such data requires substantial effort and dedicated methodologies, because most AFC systems were not originally designed for data collection. This is especially true of the Beijing AFC system, where a passenger’s boarding stop (origin) on a flat-rate bus is not recorded at the check-in scan. To extract passengers’ origin data from recorded SC transaction information, this study develops a Markov chain based Bayesian decision tree algorithm. By exploiting the time invariance property of the Markov chain, the algorithm is further optimized and simplified to linear computational complexity. The algorithm is verified against data from transit vehicles equipped with global positioning system (GPS) data loggers. The verification results demonstrate that the proposed algorithm extracts transit passengers’ origin information from SC transactions with relatively high accuracy. Such transit origin data are highly valuable for transit system planning and route optimization.
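The abstract does not spell out the algorithm. As a rough illustration of how a Markov chain over stops and Bayes' rule can combine to assign boarding stops to check-in times, the sketch below decodes the most probable stop sequence for a single bus trip using a Viterbi-style dynamic program. This is not the authors' Markov chain based Bayesian decision tree. Several parts of it are assumptions made for illustration only: the hypothetical function `infer_boarding_stops`, the prior grouping of SC transactions into time "clusters" (one burst of check-ins per stop), the Normal, independent, time-invariant link travel times, and the prior over the first stop. The sketch also runs in O(K·N²) time for K clusters and N stops. It does not reproduce the linear-complexity optimization that the paper obtains from the time invariance property.

```python
import math
from typing import List, Sequence


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of a Normal(mean, var) distribution evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def infer_boarding_stops(
    cluster_times: Sequence[float],
    link_mean: Sequence[float],
    link_var: Sequence[float],
    first_stop_prior: Sequence[float],
) -> List[int]:
    """Hypothetical sketch: assign each check-in cluster on one trip to a stop.

    cluster_times: times (s) of boarding clusters on one bus trip, ascending.
    link_mean, link_var: ASSUMED Normal travel time (s, s^2) from stop i to i+1,
        independent across links and time-invariant (dwell time folded in).
    first_stop_prior: prior probability that the first cluster is at each stop.
    Returns the most probable strictly increasing sequence of stop indices.
    """
    n_stops = len(first_stop_prior)
    if len(link_mean) != n_stops - 1 or len(link_var) != n_stops - 1:
        raise ValueError("need one travel-time mean/variance per link")
    if any(v <= 0 for v in link_var):
        raise ValueError("link variances must be positive")
    if not 1 <= len(cluster_times) <= n_stops:
        raise ValueError("need between 1 and n_stops clusters")

    # Cumulative sums give the travel-time mean/variance between any two stops in O(1).
    cum_mean, cum_var = [0.0], [0.0]
    for m, v in zip(link_mean, link_var):
        cum_mean.append(cum_mean[-1] + m)
        cum_var.append(cum_var[-1] + v)

    neg_inf = float("-inf")
    # score[j]: best log joint probability of an assignment whose latest cluster is at stop j.
    score = [math.log(p) if p > 0 else neg_inf for p in first_stop_prior]
    back: List[List[int]] = []
    for k in range(1, len(cluster_times)):
        dt = cluster_times[k] - cluster_times[k - 1]
        new_score = [neg_inf] * n_stops
        new_back = [-1] * n_stops
        for j in range(n_stops):
            for i in range(j):  # the bus only moves forward, so stop i < stop j
                if score[i] == neg_inf:
                    continue
                # Likelihood of the observed gap dt if the bus went from stop i to stop j.
                s = score[i] + log_normal_pdf(
                    dt, cum_mean[j] - cum_mean[i], cum_var[j] - cum_var[i]
                )
                if s > new_score[j]:
                    new_score[j], new_back[j] = s, i
        score, _ = new_score, back.append(new_back)

    # Backtrack from the most probable stop for the last cluster.
    j = max(range(n_stops), key=score.__getitem__)
    if score[j] == neg_inf:
        raise ValueError("no feasible stop assignment under the given prior")
    path = [j]
    for bp in reversed(back):
        j = bp[j]
        path.append(j)
    return path[::-1]


if __name__ == "__main__":
    # Toy trip: 4 stops, made-up link times of 60/90/120 s (std 10 s each).
    print(infer_boarding_stops([0, 150, 270], [60, 90, 120], [100, 100, 100],
                               [0.7, 0.1, 0.1, 0.1]))
```

In the toy trip, the 150 s gap after the first cluster matches a skip from stop 0 to stop 2 (60 + 90 s), and the next 120 s gap matches the link from stop 2 to stop 3, so the decoded stops are `[0, 2, 3]`. Stops at which nobody boards simply produce no cluster, which is why the transition is allowed to jump ahead by more than one stop.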

Key words

Transit smart card; Automated fare collection (AFC); Bayesian decision tree; Markov chain; Origin inference

CLC number

U121; TP391





Copyright information

© Journal of Zhejiang University Science Editorial Office and Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Xiao-lei Ma (1)
  • Yin-hai Wang (1)
  • Feng Chen (2)
  • Jian-feng Liu (2)
  1. Department of Civil and Environmental Engineering, University of Washington, Seattle, USA
  2. Beijing Transportation Research Center, Beijing, China
