Automated Setting of Bus Schedule Coverage Using Unsupervised Machine Learning

  • Jihed Khiari
  • Luis Moreira-Matias
  • Vitor Cerqueira
  • Oded Cats
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9651)


The efficiency of Public Transportation (PT) networks is a major goal of any urban area authority. Advances in both location and communication devices have drastically increased the availability of the data generated by PT operations. Adequate Machine Learning methods can thus be applied to identify patterns useful for improving the Schedule Plan. In this paper, we propose a fully automated learning framework to determine the best Schedule Coverage to be assigned to a given PT network based on Automatic Vehicle Location (AVL) and Automatic Passenger Counting (APC) data. We formulate this problem as a clustering one, where the best number of clusters is selected through an ad-hoc metric. This metric takes into account multiple domain constraints, computed using Sequence Mining and Probabilistic Reasoning. A case study from a large operator in Sweden was selected to validate our methodology. Experimental results suggest that changes to the Schedule Coverage are necessary. Moreover, an impact study was conducted through a large-scale simulation over the affected time period. Its results uncovered potential improvements in schedule reliability on a large scale.
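
The idea of clustering days and then choosing the number of clusters with a scoring metric can be illustrated with a minimal sketch. It is not the framework proposed in the paper: the ad-hoc metric and its domain constraints are not specified in this abstract, so the sketch swaps in one-dimensional k-means and the mean silhouette width as generic stand-ins. The function names and the daily round-trip times (in minutes) are hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars with deterministic quantile initialisation."""
    data = sorted(values)
    n = len(data)
    # Initial centres: evenly spaced quantiles of the sorted data.
    centres = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assign every value to its nearest centre (ties go to the lower index).
        labels = [min(range(k), key=lambda c: (x - centres[c]) ** 2) for x in values]
        new_centres = []
        for c in range(k):
            members = [x for x, l in zip(values, labels) if l == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels


def mean_silhouette(values, labels):
    """Mean silhouette width; higher means tighter, better separated clusters."""
    clusters = {}
    for x, l in zip(values, labels):
        clusters.setdefault(l, []).append(x)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for x, l in zip(values, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster.
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(abs(x - y) for y in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(values)


def choose_k(values, candidates):
    """Score each candidate number of clusters and return the best one."""
    scores = {k: mean_silhouette(values, kmeans_1d(values, k)[1]) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical mean round-trip times (minutes) for twelve days.
daily_minutes = [48.0, 48.5, 49.0, 49.5, 55.0, 55.5, 56.0, 56.5,
                 62.0, 62.5, 63.0, 63.5]
best_k, scores = choose_k(daily_minutes, range(2, 6))
print(best_k, scores)
```

In this toy setting each cluster would correspond to one schedule, and the days assigned to it to that schedule's coverage. A domain-aware metric such as the one described in the abstract would replace `mean_silhouette`.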


Unsupervised learning · Public transportation · Big data · Schedule plan · Schedule coverage · Sequence mining · Probabilistic reasoning



This work was also supported by the European Commission under TEAM, a large-scale integrated project that is part of the Seventh Framework Programme for research, technological development and demonstration [Grant Agreement No. 318621]. The authors would like to thank all partners within TEAM for their cooperation and valuable contribution.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Jihed Khiari (1)
  • Luis Moreira-Matias (1)
  • Vitor Cerqueira (1)
  • Oded Cats (2)
  1. NEC Laboratories Europe, Heidelberg, Germany
  2. Department of Transport and Planning, TU Delft, Delft, Netherlands
