Abstract
This study compares the accuracy and complexity of eleven machine learning classifiers for incident duration (ID) prediction. The proposed framework integrates feature selection and modeling techniques to evaluate the effect of multiple influencing factors and to choose the best model for predicting incident durations. Models were developed and tested on a dataset of more than 110,000 records collected from the Houston TranStar incidents archive. Features were selected by integrating the results of information gain, correlation-based, and relief-based evaluators. The developed and fine-tuned classifiers were compared on multiple accuracy measures (precision, recall, F1-score, and AUC) and complexity measures (memory storage, training time, and testing time). Overall, the support vector machine (SVM), k-nearest neighbors (K-NN), and Gaussian process classifiers outperformed the other classifiers, with a prediction accuracy of 97%. The decision tree classifier recorded the lowest performance, with a prediction accuracy of 82%. Considering the trade-off between accuracy and complexity, K-NN combined high accuracy with the lowest training time: 97% accuracy, 0.024 s of training time, 0.042 s of testing time, and 0.04 megabytes of memory storage. The SVM achieved the same 97% accuracy while consuming much less memory (0.004 megabytes) and a shorter testing time (0.01 s). Although K-NN recorded the lowest training time, the SVM can be considered the best model for the ID-prediction classification problem.
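As a rough illustration of the evaluation described above, the following minimal sketch measures precision, recall, F1-score, training time, testing time, and a storage estimate for a single classifier. It is not the authors' implementation: the toy 1-nearest-neighbor class, the synthetic two-class data, the function names, and the use of serialized size as a stand-in for memory storage are all assumptions made for the example, and AUC is omitted for brevity.

```python
import pickle
import random
import time


class NearestNeighbor:
    """Toy 1-nearest-neighbor classifier (hypothetical, not the paper's model)."""

    def fit(self, X, y):
        # k-NN "training" only stores the data, which is why it is fast.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        def closest_label(p):
            dists = [sum((a - b) ** 2 for a, b in zip(p, q)) for q in self.X]
            return self.y[dists.index(min(dists))]
        return [closest_label(p) for p in X]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def evaluate(model, X_train, y_train, X_test, y_test):
    """Collect accuracy and complexity measures for one model."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_s = time.perf_counter() - start

    start = time.perf_counter()
    y_pred = model.predict(X_test)
    test_s = time.perf_counter() - start

    precision, recall, f1 = precision_recall_f1(y_test, y_pred)
    return {
        "precision": precision, "recall": recall, "f1": f1,
        "train_s": train_s, "test_s": test_s,
        # Assumption: size of the pickled fitted state approximates storage.
        "size_mb": len(pickle.dumps(vars(model))) / 1e6,
    }


# Synthetic data (assumption): class 0 near (0, 0), class 1 near (5, 5).
rng = random.Random(0)

def sample(label, n):
    return [([rng.gauss(5 * label, 1.0), rng.gauss(5 * label, 1.0)], label)
            for _ in range(n)]

X_train, y_train = zip(*(sample(0, 100) + sample(1, 100)))
X_test, y_test = zip(*(sample(0, 30) + sample(1, 30)))

report = evaluate(NearestNeighbor(), X_train, y_train, X_test, y_test)
print(report)
```

Running `evaluate` once per candidate model and tabulating the resulting dictionaries would give the kind of accuracy-versus-complexity comparison the abstract reports, although the absolute timings and sizes depend on the hardware and implementation used.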
Ethics declarations
Conflict of interest
The authors have no competing interests to declare relevant to this article’s content. No direct funding was received to assist with the preparation of this manuscript.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
For this type of study, formal consent is not required.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hamad, K., Obaid, L., Nassif, A.B. et al. Comprehensive evaluation of multiple machine learning classifiers for predicting freeway incident duration. Innov. Infrastruct. Solut. 8, 177 (2023). https://doi.org/10.1007/s41062-023-01138-1
DOI: https://doi.org/10.1007/s41062-023-01138-1