Soft Computing, Volume 22, Issue 17, pp 5707–5718

Using machine learning and big data approaches to predict travel time based on historical and real-time data from Taiwan electronic toll collection

  • Shu-Kai S. Fan (corresponding author)
  • Chuan-Jun Su
  • Han-Tang Nien
  • Pei-Fang Tsai
  • Chen-Yang Cheng


As automation and computing technology advances, traffic data can be easily collected from multiple sources, such as sensors and surveillance cameras. Extracting value from the huge volumes of available data requires the capability to process large datasets and find patterns in them. In this paper, a machine learning method embedded within a big data analytics platform is constructed using the random forests method and Apache Hadoop to predict highway travel time from data collected by the highway electronic toll collection (ETC) system in Taiwan. Various prediction models for highway travel time are then developed from historical and real-time data to provide drivers with estimated and adjusted travel time information.
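As a rough illustration of the idea summarized above, the following minimal sketch trains a small bagged ensemble of one-split regression trees (stumps) on synthetic data. It is not the authors' implementation: it runs on a single machine without Apache Hadoop, it uses stumps instead of full trees for brevity, and the three features (hour of day, historical mean travel time, most recent observed travel time) and all numbers are hypothetical stand-ins for ETC-derived inputs.

```python
import random
from statistics import mean

def fit_stump(rows, targets, feature_ids):
    """Find the single (feature, threshold) split that minimises squared error."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:
        # No valid split in this sample: always predict the sample mean.
        m = mean(targets)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean

def fit_forest(rows, targets, n_trees=25, n_features=2, seed=0):
    """Bagging plus a random feature subset per tree, the two core random-forest ideas."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)   # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_stump(s, row) for s in forest)

# Hypothetical rows: [hour_of_day, historical_mean_min, latest_observed_min]
rows = [
    [2, 20.0, 19.0], [3, 21.0, 20.5], [4, 20.5, 21.0], [5, 22.0, 22.0],
    [8, 33.0, 36.0], [9, 31.0, 34.0], [17, 34.0, 37.0], [18, 32.0, 35.0],
]
targets = [19.5, 20.0, 20.5, 21.0, 36.0, 34.0, 37.0, 35.0]  # travel time, minutes

forest = fit_forest(rows, targets)
free_flow_row = [2, 20.0, 19.0]
peak_row = [8, 33.0, 36.0]
print("free-flow estimate:", predict_forest(forest, free_flow_row))
print("peak-hour estimate:", predict_forest(forest, peak_row))
```

Combining a historical feature with a recent observation in one row is one simple way to let a model adjust a historical estimate with real-time information. The paper may combine the two sources differently.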


Big data; Random forests; Electronic toll collection (ETC); Travel time prediction; Apache Hadoop



This study was partially funded by the Ministry of Science and Technology (Taiwan) under Grant MOST 105-2221-E-027-052-MY3.

Compliance with ethical standards

Conflict of interest

All the authors of this paper declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  • Shu-Kai S. Fan (1, corresponding author)
  • Chuan-Jun Su (2)
  • Han-Tang Nien (1)
  • Pei-Fang Tsai (1)
  • Chen-Yang Cheng (1)

  1. Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei City, Taiwan, ROC
  2. Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan City, Taiwan, ROC
