Delay Prediction System for Large-Scale Railway Networks Based on Big Data Analytics

  • Luca Oneto
  • Emanuele Fumeo
  • Giorgio Clerico
  • Renzo Canepa
  • Federico Papa
  • Carlo Dambra
  • Nadia Mazzino
  • Davide Anguita
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 529)


State-of-the-art train delay prediction systems do not exploit the historical train movement data collected by railway information systems; instead, they rely on static rules built by experts of the railway infrastructure using classical univariate statistics. The purpose of this paper is to build a data-driven train delay prediction system for large-scale railway networks that exploits the most recent Big Data technologies and learning algorithms. In particular, we propose a fast learning algorithm for predicting train delays, based on the Extreme Learning Machine, that fully exploits recent in-memory large-scale data processing technologies. Our system is able to rapidly extract nontrivial information from the large amount of data available in order to make accurate predictions about different future states of the railway network. Results on real-world data from the Italian railway network show that our proposal improves on current state-of-the-art train delay prediction systems.
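To make the Extreme Learning Machine idea concrete, the following is a minimal single-machine sketch and not the system described in the paper, which targets in-memory distributed processing. It assumes the common ELM formulation: hidden-layer weights are drawn at random and left untrained, and only the output weights are fitted, here by ridge-regularized least squares. The class name `TinyELM`, the helper `solve_linear`, the parameters `n_hidden` and `ridge`, and the synthetic delay data are all hypothetical and chosen for illustration only.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


class TinyELM:
    """Hypothetical single-output ELM regressor (illustrative only)."""

    def __init__(self, n_inputs, n_hidden=20, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Random hidden weights and biases: drawn once, never trained.
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def _hidden(self, x):
        # Sigmoid activation of each random hidden neuron.
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]
        k = len(self.b)
        # Normal equations of ridge regression: (H^T H + ridge * I) beta = H^T y
        gram = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                 for j in range(k)] for i in range(k)]
        rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = solve_linear(gram, rhs)
        return self

    def predict(self, x):
        return sum(bj * hj for bj, hj in zip(self.beta, self._hidden(x)))


# Synthetic toy data (not from the paper): delay at the next station as a
# noisy linear function of the current delay, both in minutes.
rng = random.Random(1)
delays = [rng.uniform(0.0, 10.0) for _ in range(200)]
xs = [[d / 10.0] for d in delays]  # scale the input to [0, 1]
ys = [0.9 * d + 1.0 + rng.gauss(0.0, 0.2) for d in delays]

model = TinyELM(n_inputs=1).fit(xs, ys)
train_mse = sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(f"training MSE: {train_mse:.4f}")
```

Because training reduces to a single linear solve over sums of per-sample terms, the Gram matrix and right-hand side could in principle be accumulated partition by partition, which is one plausible reason such a model suits in-memory cluster computing; this is an observation about the sketch, not a description of the authors' implementation.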


Intelligent transportation systems, Railway, Delay prediction, Big data, Extreme learning machine, Apache Spark



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Luca Oneto (1), corresponding author
  • Emanuele Fumeo (1)
  • Giorgio Clerico (1)
  • Renzo Canepa (2)
  • Federico Papa (3)
  • Carlo Dambra (3)
  • Nadia Mazzino (3)
  • Davide Anguita (1)

  1. DIBRIS - University of Genoa, Genova, Italy
  2. Rete Ferroviaria Italiana S.p.A., Genoa, Italy
  3. Ansaldo STS S.p.A., Genoa, Italy
