Abstract
Road accidents are a major cause of untimely death and of economic loss to public and private property. Road safety refers to planning and implementing strategies to prevent road and traffic accidents. Analysis of road accident data is an important means of identifying the factors associated with road accidents and can help reduce the accident rate. The heterogeneity of road accident data is a major challenge in road safety analysis. In this study, we apply latent class clustering (LCC) and the k-modes clustering technique to a new road accident data set from Haridwar, Uttarakhand, India. The main purpose of using both techniques is to determine which one performs better. First, we applied LCC and k-modes clustering to the road accident data to form different clusters. Then, we applied the Frequent Pattern (FP) growth technique to the resulting clusters and to the entire data set (EDS). The rules generated for each cluster do not show either cluster analysis technique to be superior to the other. However, both techniques are well suited to removing the heterogeneity of road accident data. The rules generated for each cluster and for the EDS show that heterogeneity exists in the entire data set and that clustering prior to analysis reduces this heterogeneity and provides better solutions. The rules for Haridwar district reveal important information that can be used to develop policies to prevent accidents and reduce the accident rate.
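The k-modes step mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy records and their attributes, the deterministic initialization (the first k distinct records become the initial modes), and the tie-breaking rules are all assumptions made for illustration. Each record is assigned to the mode with the smallest simple-matching (Hamming) dissimilarity, each mode is then replaced by the most frequent category of every attribute among its members, and the two steps repeat until no record changes cluster.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]

# Hypothetical toy records (time of day, road type, severity); NOT the Haridwar data.
TOY_RECORDS: List[Record] = [
    ("night", "highway", "fatal"),
    ("night", "highway", "fatal"),
    ("night", "highway", "injury"),
    ("day", "urban", "minor"),
    ("day", "urban", "minor"),
    ("day", "rural", "minor"),
]


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple-matching dissimilarity: count of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b))


def column_mode(members: List[Record]) -> Record:
    """Most frequent category per attribute; ties go to the first value seen."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


def k_modes(records: List[Record], k: int, max_iter: int = 20) -> Tuple[List[int], List[Record]]:
    # Assumed initialization: the first k distinct records become the initial modes.
    modes: List[Record] = []
    for r in records:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    if len(modes) < k:
        raise ValueError("need at least k distinct records")

    labels = [-1] * len(records)
    for _ in range(max_iter):
        # Assignment step: each record joins the mode it mismatches least (lowest index wins ties).
        new_labels = [min(range(k), key=lambda j: hamming(r, modes[j])) for r in records]
        if new_labels == labels:  # converged: no record changed cluster
            break
        labels = new_labels
        # Update step: recompute each non-empty cluster's mode, attribute by attribute.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_mode(members)
    return labels, modes


if __name__ == "__main__":
    labels, modes = k_modes(TOY_RECORDS, k=2)
    print(labels)
    print(modes)
```

On these toy records the sketch places the three night/highway records in one cluster and the three daytime records in the other. In practice k-modes is usually restarted from several initializations, since the result depends on the starting modes; the fixed initialization here only keeps the example reproducible.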
Acknowledgments
We thank GVK-Emergency Management Research Institute, Dehradun, Uttarakhand, for providing the data used in our research.
Ethics declarations
Conflict of interest
None of the authors has any competing interests.
About this article
Cite this article
Kumar, S., Toshniwal, D. & Parida, M. A comparative analysis of heterogeneity in road accident data using data mining techniques. Evolving Systems 8, 147–155 (2017). https://doi.org/10.1007/s12530-016-9165-5
DOI: https://doi.org/10.1007/s12530-016-9165-5