A dynamic, interpretable, and robust hybrid data analytics system for train movements in large-scale railway networks

  • Luca Oneto
  • Irene Buselli
  • Alessandro Lulli
  • Renzo Canepa
  • Simone Petralli
  • Davide Anguita
Regular Paper


We investigate the problem of analysing train movements in large-scale railway networks in order to understand and predict their behaviour. We focus on several important aspects: the Running Time of a train between two stations, the Dwell Time of a train in a station, the Train Delay, the Penalty Costs associated with a delay, and the Train Overtaking between two trains that are in the wrong relative position on the railway network. Two main approaches to these problems exist in the literature. One is based on knowledge of the network and the experience of the operators. The other is based on the analysis of historical data about the network with advanced data analytics methods. In this paper, we propose a hybrid approach that addresses the limitations of the current solutions. Experience-based models are interpretable and robust, but they cannot take into account all the factors that influence train movements, which results in low accuracy. Data-driven models, on the other hand, are usually neither easy to interpret nor robust to infrequent events, and they require a representative amount of data, which is not always available if the phenomenon under examination changes too fast. Results on real-world data from the Italian railway network show that the proposed solution outperforms both state-of-the-art experience-based and data-driven systems in terms of interpretability, robustness, ability to handle nonrecurring events and changes in the behaviour of the network, and ability to consider complex and exogenous information.


Keywords: Railway network, Train movements, Running time, Dwell time, Train delays, Penalty costs, Train overtaking, Experience-based models, Data-driven models, Hybrid models



This research has been supported by the European Union through the projects IN2DREAMS (European Union’s Horizon 2020 research and innovation programme under grant agreement 777596) and In2Rail (European Union’s Horizon 2020 research and innovation programme under grant agreement 635900).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. DIBRIS, University of Genova, Genoa, Italy
  2. Rete Ferroviaria Italiana S.p.A., Genoa, Italy
