Abstract
Predicting traffic conflicts is pivotal for vehicle-based active safety systems to prevent crashes. Yet conflict prediction is a challenging task, as correct prediction depends on the nature of the data and the techniques employed. Moreover, traffic conflict data are naturally imbalanced, with traffic conflicts being the minority class. Working with an imbalanced dataset can result in biased and inaccurate predictions. Therefore, this study systematically appraises machine learning and deep learning techniques to identify the technique that most reliably predicts real-time traffic conflicts when combined with cost-sensitive learning. Five techniques were optimised and applied: logistic regression (LR), support vector machines (SVM), deep neural networks (DNN), long short-term memory (LSTM) and LSTM convolutional neural network (LSTM-CNN). Their predictive performance was appraised using a large, imbalanced, and disaggregated traffic dataset. Unlike existing studies, a wide range of interconnected factors is employed for real-time traffic conflict prediction to provide a more reliable prediction outcome. A large heterogeneous dataset was gathered from the M1 motorway in the UK to evaluate these techniques. Results suggested that the DNN outperformed the other techniques, predicting conflicts with a sensitivity of 0.72 at a 5% false alarm rate. Such promising results indicate that DNNs can be further applied to deepen our understanding of traffic conflict prediction and to design more reliable primary safety systems for intelligent vehicles. Moreover, exploring state-of-the-art classification techniques with class imbalance on big data is significant to the future of big data analytics.
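To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary labels (1 = conflict, 0 = no conflict) and a classifier that outputs a score per observation. The helper names `balanced_class_weights` and `sensitivity_at_far` and the toy data are hypothetical. The weighting shown is the common inverse-frequency heuristic (the same formula scikit-learn uses for `class_weight="balanced"`); the abstract does not state which cost-sensitive scheme the study used.

```python
def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of class).

    Assumption: this is one common cost-sensitive heuristic, not
    necessarily the scheme used in the study.
    """
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


def sensitivity_at_far(labels, scores, max_far=0.05):
    """Highest sensitivity (true positive rate) over all thresholds whose
    false alarm rate (false positive rate) does not exceed max_far."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = 0.0
    for thr in sorted(set(scores)):
        # Predict "conflict" whenever the score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
        if fp / n_neg <= max_far:
            best = max(best, tp / n_pos)
    return best


# Hypothetical toy data: 30 non-conflicts and 4 conflicts (imbalanced).
neg_scores = [i / 100 for i in range(30)]   # scores 0.00 .. 0.29
pos_scores = [0.95, 0.80, 0.285, 0.10]
labels = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores

weights = balanced_class_weights(labels)
sens = sensitivity_at_far(labels, scores, max_far=0.05)
print(weights)
print(sens)
```

In this toy example the minority class receives the larger weight, so misclassifying a conflict costs more during training. The reported metric then answers a practical question: of all true conflicts, what share is detected when at most 5% of non-conflicts are allowed to trigger a false alarm.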
Ethics declarations
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Formosa, N., Quddus, M., Man, C.K. et al. Appraising Machine and Deep Learning Techniques for Traffic Conflict Prediction with Class Imbalance. Data Sci. Transp. 5, 4 (2023). https://doi.org/10.1007/s42421-023-00067-w
DOI: https://doi.org/10.1007/s42421-023-00067-w