A comparative analysis of heterogeneity in road accident data using data mining techniques

  • Original Paper
Evolving Systems

Abstract

Road accidents are among the leading causes of untimely death and of economic loss to public and private property. Road safety refers to planning and implementing strategies to prevent road and traffic accidents. Analysing road accident data is an important means of identifying the factors associated with road accidents, and it can help reduce the accident rate. However, the heterogeneity of road accident data is a major challenge in road safety analysis. In this study, we apply latent class clustering (LCC) and the k-modes clustering technique to a new road accident data set from Haridwar, Uttarakhand, India. The main purpose of using both techniques is to determine which one performs better. First, we applied LCC and k-modes clustering to the road accident data to form clusters. Then, the Frequent Pattern (FP) growth technique was applied to each cluster and to the entire data set (EDS). The rules generated for the clusters do not show either clustering technique to be superior to the other. However, both techniques are clearly well suited to removing the heterogeneity of road accident data. The rules generated for each cluster and for the EDS show that heterogeneity exists in the entire data set, and that clustering prior to analysis reduces this heterogeneity and yields better solutions. The rules for Haridwar district reveal important information that can be used to develop policies to prevent accidents and reduce the accident rate.
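The k-modes step can be illustrated with a minimal Python sketch. It assumes each accident record is a tuple of categorical attribute values (the example values used with it are hypothetical, not taken from the Haridwar data) and uses random initialisation with the simple matching dissimilarity common to k-modes variants; the paper's own initialisation, choice of k, and software are not described here, so this is an illustration of the technique rather than the authors' implementation.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, max_iter=100, seed=0):
    """Partition categorical records into k clusters with a basic k-modes loop.

    Returns (labels, modes): one cluster index per record and one mode per cluster.
    """
    # Initialise the modes with k distinct record values drawn at random.
    unique = list(dict.fromkeys(tuple(r) for r in records))
    if k > len(unique):
        raise ValueError("k exceeds the number of distinct records")
    rng = random.Random(seed)
    modes = [list(r) for r in rng.sample(unique, k)]

    labels = [None] * len(records)
    for _ in range(max_iter):
        # Assignment step: each record joins the mode it mismatches least
        # (ties go to the lowest cluster index).
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # converged: no record changed cluster
        labels = new_labels
        # Update step: each mode becomes the most frequent value per attribute.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return labels, modes
```

For example, `k_modes([("night", "highway", "fatal"), ("day", "urban", "minor"), ...], k=2)` returns one cluster label per record. Because the result depends on the initial modes, running several seeds and keeping the partition with the lowest total dissimilarity is a common precaution.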
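The frequent-pattern step can likewise be sketched. The following minimal Python FP-growth, paired with a simple confidence-based rule generator, assumes each transaction is a set of `attribute=value` strings (a hypothetical encoding). It illustrates the general technique; the thresholds, encoding, and rule measures used in the paper are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


class _Node:
    """FP-tree node: an item, its count along this path, and parent/child links."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def _build_tree(weighted, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to its tree nodes,
    frequent maps each frequent item to its support.
    """
    support = defaultdict(int)
    for items, count in weighted:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = _Node(None, None)
    header = defaultdict(list)
    for items, count in weighted:
        # Insert only frequent items, ordered by descending support (ties by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = _Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, frequent


def fp_growth(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    patterns = {}

    def mine(weighted, suffix):
        header, frequent = _build_tree(weighted, min_support)
        for item, sup in frequent.items():
            itemset = suffix | {item}
            patterns[itemset] = sup
            # Conditional pattern base: the prefix path above every node of `item`.
            base = []
            for node in header[item]:
                path, parent = [], node.parent
                while parent.item is not None:
                    path.append(parent.item)
                    parent = parent.parent
                if path:
                    base.append((path, node.count))
            if base:
                mine(base, itemset)

    mine([(t, 1) for t in transactions], frozenset())
    return patterns


def association_rules(patterns, min_confidence):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, sup in patterns.items():
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                confidence = sup / patterns[antecedent]  # subsets are always frequent
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

To compare clusters with the EDS in the way the abstract describes, the same functions would be run once on all transactions and once on each cluster's transactions, using identical thresholds so that the resulting rule sets are comparable.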

Fig. 1
Fig. 2

Acknowledgments

We thank GVK-Emergency Management Research Institute, Dehradun, Uttarakhand, for providing the data used in this research.

Author information

Corresponding author

Correspondence to Sachin Kumar.

Ethics declarations

Conflict of interest

None of the authors has any competing interests.

About this article

Cite this article

Kumar, S., Toshniwal, D. & Parida, M. A comparative analysis of heterogeneity in road accident data using data mining techniques. Evolving Systems 8, 147–155 (2017). https://doi.org/10.1007/s12530-016-9165-5

  • DOI: https://doi.org/10.1007/s12530-016-9165-5