An international review of challenges and opportunities in development and use of crash prediction models
Over the past 10 years, building on road infrastructure data, crash prediction models (CPMs) have become fundamental scientific tools for road safety management. However, there is a gap between state-of-the-art and state-of-the-practice, with the practical application lagging behind scientific progress. This motivated a review of international experience with CPMs from perspectives of application by practitioners and development by researchers. The objective of the paper is to improve practitioner understanding of modelling road safety performance using CPMs for crash frequency estimation, leading to their greater uptake in improving road safety. In short, why and how should road safety practitioners consider CPMs?
Both scientific and practice-oriented literature was retrieved from academic sources and from reports of road agencies and institutes. The selection was limited to English-language publications.
From the review it is clear that developing CPMs is not a straightforward task: there are many available choices and decisions to be made during the process without definite guidance. This explains the diversity of approaches, techniques, and model types. The paper explains how some fundamental modelling decisions affect practical aspects of modelling safety performance.
There is a need to identify CPM solutions that will be scientifically sound and feasible in practitioners’ context. Together with increased communication between researchers and practitioners, these solutions will help overcome the identified challenges and increase use of CPMs.
Keywords: Road safety; Crash prediction model; Risk model; State-of-the-art; State-of-the-practice; Review
N_i … crash frequency on road i in a specific time period
β_0 … intercept
EXPO_i … exposure on road i in a specific time period
β_j … regression coefficients
x_j … explanatory variables
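The symbols listed above are consistent with a log-linear (multiplicative) model. As a sketch only, and not necessarily the exact form of Eq. (1), such a model can be written as below. The exposure exponent \beta_E is a hypothetical symbol introduced here for illustration; fixing it at 1 corresponds to treating exposure as an offset.

```latex
N_i = \exp(\beta_0) \cdot EXPO_i^{\beta_E} \cdot \exp\Big( \sum_{j} \beta_j x_j \Big)
```

Taking logarithms gives \ln N_i = \beta_0 + \beta_E \ln EXPO_i + \sum_j \beta_j x_j, which is the linear predictor of a GLM with a log link.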
To correctly account for the discrete, non-negative character of crash frequencies, generalized linear modelling (GLM) methods are typically used. The first models used Poisson regression as a starting point; however, it was found that Poisson models cannot handle overdispersion (the variance exceeding the mean), which is typical for crash data. This motivated the use of negative binomial (or Poisson-gamma) models, which assume that the Poisson parameter follows a gamma probability distribution. According to an extensive review by Lord and Mannering, negative binomial (NB) models are the most frequently used in crash-frequency modelling. The following text therefore focuses on NB models; for more information on other model types, such as zero-inflated, generalized estimating equations (GEE), generalized additive models (GAM), random-effects, random-parameters, hierarchical/multilevel or neural networks, see e.g. [5, 47, 50].
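The Poisson-gamma mechanism described above can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library; the mean `mu`, the overdispersion parameter `k`, the number of sites and the seed are hypothetical values chosen for illustration. Under this parameterisation the theoretical variance is mu + k·mu², so it exceeds the mean whenever k > 0.

```python
import math
import random
from statistics import mean, pvariance


def poisson_sample(rng, lam):
    """Draw one Poisson variate using Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def simulate_counts(n_sites, mu, k, seed=1):
    """Simulate crash counts from a Poisson-gamma mixture.

    Each site gets a gamma-distributed multiplier with mean 1 and variance k,
    so the counts have mean mu and variance mu + k * mu**2.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sites):
        multiplier = rng.gammavariate(1.0 / k, k)  # shape = 1/k, scale = k
        counts.append(poisson_sample(rng, mu * multiplier))
    return counts


# Hypothetical values: 5000 sites, 2 crashes per site on average, k = 0.8
counts = simulate_counts(n_sites=5000, mu=2.0, k=0.8)
print("sample mean:", mean(counts))
print("sample variance:", pvariance(counts))
```

A pure Poisson model would force the variance to equal the mean, which is why it fits such data poorly.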
CPMs analyse and highlight potential safety issues, help to identify potential for safety improvements and estimate their benefits . Over the past decades, building on road infrastructure data, CPMs have become the fundamental scientific tools in quantitative road safety management, forming the foundation of the AASHTO Highway Safety Manual (HSM) or the Australian National Risk Assessment Model (ANRAM). First edition of HSM (2010) became a recognized source of information and methods for science-based decision making, allowing safety to be quantitatively evaluated alongside other transportation performance measures such as traffic operations, environmental impacts, pavement durability or construction costs. The methods in HSM, based on CPMs, provide an opportunity to: (1) improve the reliability of common activities, such as screening a network for sites at which to reduce crashes, and (2) expand analysis to include assessments of new or alternative geometric and operational characteristics .
CPMs may be used for various key functions, including network safety screening, development of crash modification factors (CMFs), road safety impact assessments and economic analysis. However, there are gaps between state-of-the-art (what is published by researchers) and state-of-the-practice (what is needed/used by practitioners), which limit the application of CPMs.
This paper will assist road safety practitioners in understanding why and how they might use CPMs to improve road safety. The paper presents a review of how CPMs are developed and applied. In particular, it explores the challenges of optimising scientific validity and practical applicability. These challenges are discussed in the context of opportunities and potential solutions that might assist practitioners in incorporating CPMs into road safety management.
academic: Web of Science and Scopus, including selected references (snowballing)
practical: reports of agencies (e.g. Federal Highway Administration, Austroads, NZ Transport Agency)
both: ARRB Knowledge Base, TRID database, reports of European institutes, EU project deliverables
Keywords: accident prediction model, crash prediction model, safety performance function
Time frame restriction: none
Macro/planning-level applications (analysis based on jurisdiction, GDP, or land-use zones in assignment models)
Specific CPMs for vulnerable road users, such as pedestrians or bicyclists
CPMs for specific road elements (e.g. railway level crossings, bridges, tunnels, etc.)
Logistic binary modelling of crash characteristics (e.g. victim gender and age, vehicle age, etc.)
Use of CPMs for evaluation of safety effectiveness of safety treatments or programmes (before/after studies)
These CPM applications are important in broader road safety context and may be explored using findings presented in this paper as a starting point.
Data collection, sample size and time period
Road network segmentation
Selection of explanatory variables
Model function and variable forms
Model validation
Using CPMs in network screening
Using CPMs in developing crash modification factors (CMFs)
Using CPM tools, e.g. for road safety impact assessment
Previous reviews related to CPMs [5, 47, 54, 86] usually considered some of these steps only, mainly 3 and 4. The presented review fills the gap by compiling information on all eight steps, followed by summarised challenges and opportunities, with available solutions.
3.1 CPMs and their uses
Exploring and comparing combinations of individual risk factors that make some road locations unsafe
Network safety screening, i.e. safety ranking road locations, or identification of hazardous locations
Impact assessments, i.e. assessing safety of contemplated (re)constructions or safety treatments
Economic analysis of project costs vs. safety benefits
Note that Task 1 is rather research-oriented, whereas Tasks 2, 3 and 4 represent typical practical tasks undertaken by many road agencies. According to a review of North American practices, network screening is the most common application of CPMs. In the European project PRACT, cost-benefit analysis was identified as a common application of CPMs [85, 86].
As noted, CPMs may be developed for road segments of a particular road type (e.g. rural undivided highway), for all intersections, for individual intersection types, or any combination of these. CPMs can be developed for all recorded crashes, casualty crashes, or severe crashes only; the approach depends on the purpose of the model. Very broad CPMs may be useful in high-level network screening or highlighting strategic issues. More specific safety management or research objectives will require more specific models. Given the range of potential applications, CPMs have been acknowledged worldwide as recommended tools on which rational road safety management should be based. At the same time, however, it has been known that prediction modelling is not a simple task [15, 18, 77] and involves various analytical choices, which are often made without explicit justification. This may explain why there are gaps between state-of-the-art and state-of-the-practice, which may in turn limit the practical use of CPMs. For example, a survey among European road agencies found that 70% of them rarely or never systematically use CPMs in their decision-making.
Regarding the selection of research for inclusion in the review, another distinction needs to be made. The HSM introduces a set of CPMs (referred to as safety performance functions, SPFs) and crash modification factors (CMFs). Crash prediction in the HSM has two main steps: (1) prediction of baseline crash rates using SPFs/CPMs for nominal route and intersection conditions, and (2) multiplying the ‘baseline’ models by CMFs to capture changes in geometric design and operational characteristics (deviations from nominal conditions). This approach has gained popularity, being incorporated into the Interactive Highway Safety Design Model (IHSDM) and recently adopted in the European CPM, as well as the Australian ANRAM and the New Zealand Crash Estimation Compendium.
The CPMs/SPFs in the HSM and IHSDM, developed from data in several US states, are not directly transferable to other jurisdictions (inside or outside the US). Some studies confirmed good transferability, mainly between US states [7, 74, 84], but others were less successful when applied abroad, for example in Canada, Italy or Korea [42, 63, 64, 69, 88]. Therefore, it is recommended that each country and jurisdiction (e.g. State) develop its own specific CPMs. The present review, written by non-US authors, adopts this perspective.
3.2 Data collection
In theory, to obtain sufficiently representative models, one should randomly sample data from the population of similar road types or intersections. In this regard, given the variance of crash frequencies, several authors recommended minimum sample sizes, such as at least 50 sites, 200 crashes or 300 crashes. The HSM advises using a sample of 30–50 locations with a total of at least 100 crashes per year. However, others were critical of the one-size-fits-all approach. For example, Lord provided guidance on the necessary sample size based on the sample mean: 200 segments for an average of 5 crashes per segment, or 1000 segments for an average of 1 crash per segment. (Note that these considerations do not apply to network screening, whose goal is to screen the complete network.)
In addition, unlike the large US and Canadian samples, smaller countries are limited in their samples of network and crash data. For example, Turner et al. mentioned that the size of the New Zealand road network limits the development of models for some segment and site types, e.g. interchanges. This factor also reduces opportunities for disaggregating CPMs into all crash types and severity levels.
Data on crashes, traffic volumes and other relevant road attributes need to be assigned to all the sample sites. Crash data are known for various biases, such as underreporting, location errors, severity misclassification or inaccurate identification of contributory factors. Traffic volume data may also be prone to errors: the typical measure of traffic volume, AADT, is an average aggregated over various vehicle types; in addition, location errors exist, as traffic volumes measured at one location are typically assumed to apply to the entire section, and often to multiple sections. Thus the actual variation in traffic flow is difficult to reflect in the data.
The choice of time period for crash and AADT data requires another decision. A 1- to 5-year period is usually recommended for safety ranking, with a 3-year period being the most frequent. Using longer time periods (beyond 5 years) may cause problems due to changes in conditions over the period, such as substantial increases in traffic volumes or layout changes. Probably due to these issues, there are no specific guidelines for time period choice. An exception was the simulation study of Cheng and Washington, which concluded that there is little gain in network screening accuracy when using a period longer than 6 years. Also, using several consistency tests, 4 years were found sufficient for developing a CPM in a study by Ambros et al. Usually a compromise is accepted between the need for early analysis of new treatments and the need to accumulate sufficient crashes to permit robust analysis.
Differences between rural and urban settings are also worth mentioning. Traditionally most focus has been given to rural roads (as also evident from CPM reviews [66, 85, 86]). In contrast, modelling urban safety is more challenging, due to higher presence of vulnerable road users and complex environments, including facilities for different road users, mixed land use, or higher density of various intersection types. Detailed crash data is likely to be needed if crash type-specific models are to be developed later on. More road attributes also need to be collected for urban roads, then tested for correlation, autocorrelation, and only then considered in models .
Ideal data sources are road agency asset inventories. Unfortunately, these may not be complete or up to date, and a modeller thus needs to combine various data sources. Additional surveys can be also conducted, either in the field (pedestrian counts, signal timing, speeds, etc.), drive-through digital video collection, or via online maps. Recent emergence of big data and open government policies (e.g. open data initiatives such as data.vic.gov.au) have aided these efforts substantially. It is feasible to pull together substantial amounts of road data from publicly available and road agencies’ own sources. Cross-checking of data for the same attributes between different sets also adds to reducing errors and better data quality management.
3.3 Road network segmentation
CPMs are typically developed either for road intersections or segments. In the latter case, segmentation has to be conducted, in order to divide the network into homogeneous segments, i.e. with constant values of explanatory variables. However, in case of multiple variables, this practice can naturally lead to short segments. This may complicate accurate assigning of crashes to individual segments. In addition, crash concentration is heterogeneous and random; many short segments may also have zero crash counts during the selected time period.
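The idea of homogeneous segmentation can be sketched as merging consecutive road pieces that share the same attribute values. This is only an illustration: the field names (`start_km`, `end_km`, `lanes`, `speed_limit`) and the sample data are hypothetical, and real road databanks will need additional handling of gaps and intersections.

```python
def homogeneous_segments(pieces, keys):
    """Merge consecutive, contiguous road pieces whose attributes in `keys` are identical.

    `pieces` must be sorted by start position; each piece is a dict with
    'start_km', 'end_km' and the attribute values named in `keys`.
    """
    segments = []
    for piece in pieces:
        signature = tuple(piece[key] for key in keys)
        previous = segments[-1] if segments else None
        if (previous is not None
                and previous["signature"] == signature
                and previous["end_km"] == piece["start_km"]):
            # Same attribute values and no gap: extend the current segment
            previous["end_km"] = piece["end_km"]
        else:
            segments.append({
                "start_km": piece["start_km"],
                "end_km": piece["end_km"],
                "signature": signature,
            })
    return segments


# Hypothetical inventory records for one road
pieces = [
    {"start_km": 0.0, "end_km": 0.5, "lanes": 2, "speed_limit": 90},
    {"start_km": 0.5, "end_km": 1.0, "lanes": 2, "speed_limit": 90},
    {"start_km": 1.0, "end_km": 1.2, "lanes": 2, "speed_limit": 70},
    {"start_km": 1.2, "end_km": 2.0, "lanes": 3, "speed_limit": 70},
]

segments = homogeneous_segments(pieces, keys=("lanes", "speed_limit"))
for segment in segments:
    print(segment["start_km"], segment["end_km"], segment["signature"])
```

With two attributes the example already produces a 0.2 km segment, which illustrates how each additional variable tends to shorten the resulting segments.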
For segmentation, some authors set fixed lengths of several hundred meters [12, 14, 26], or used patterns based on tangents and curves [10, 44, 79]. Long segments can lead to forced homogenisation of variables by aggregating continuous variables into categories (e.g. pavement width bands), and this can lead to loss of applicability. In short, segmentation should consider the overall purpose of the modelling exercise. Longer segments (1–5 km) are often used for network screening [27, 57, 65]. Shorter segments are used to develop more meaningful CMFs, or to estimate localised benefits of safety treatments. Variable segment length can be included in the model. The HSM assumes crash frequency to be directly proportional to segment length; however, many published models that include segment length as a variable suggest otherwise.
In practice, the division of a road network into segments is likely to be dictated by the structure of national road databanks. For example, in the Czech Republic the national traffic census (the main source of AADT data) does not cover all minor roads; thus a process of aggregating segments into longer segments including minor intersections was found feasible. As the segments may be subject to further investigations, their length should be feasible for on-site visits or crash analyses.
3.4 Explanatory variables
Selection of explanatory variables should be guided by previously documented crash and injury risk factor evidence available from research literature. However, in practice it is often dictated simply by data availability. Explanatory variables generally include exposure, transport function, cross section, traffic control; less often variables describing alignment, vehicle types or road user behaviour are used . When actual variables are not available, proxy variables may be used, e.g. abutting land use as a proxy for pedestrian movement counts.
The first step in variable selection involves identifying variables which are correlated with each other. For each such pair the researcher should remove one variable which is less useful to the purpose of the model (e.g. if sealed shoulder provision is strongly correlated with line marking presence, then remove the latter). In order to further identify the statistically significant variables, a stepwise regression approach is typically used. It may be applied either in a forward selection or a backward elimination manner; in both cases selected goodness-of-fit (GOF) measures are used to assess the statistical significance. Common GOF measures include information criteria such as AIC or BIC, while others use for example scaled deviance [22, 77] or proportion of explained systematic variance [2, 45].
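The two screening ideas in this paragraph, pairwise correlation checks and comparison by an information criterion, can be sketched as follows. The data, the correlation threshold of 0.7 and the log-likelihood values are hypothetical and chosen only for illustration; in practice the log-likelihoods would come from fitted models.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(data, threshold=0.7):
    """Return variable pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for name_a, name_b in combinations(sorted(data), 2):
        r = pearson(data[name_a], data[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical site attributes (1 = present, 0 = absent; lane width in metres)
data = {
    "sealed_shoulder": [0, 0, 1, 1, 1, 0, 1, 0],
    "line_marking": [0, 0, 1, 1, 1, 0, 1, 1],
    "lane_width": [3.0, 3.5, 3.25, 3.0, 3.5, 3.25, 3.0, 3.5],
}
flagged = correlated_pairs(data)
print(flagged)

# Hypothetical log-likelihoods of a 3-parameter and a 5-parameter model
aic_simple = aic(log_likelihood=-250.0, n_params=3)
aic_extended = aic(log_likelihood=-248.5, n_params=5)
print(aic_simple, aic_extended)
```

In this made-up case only the shoulder and line-marking pair is flagged, and the two extra parameters of the extended model do not improve the log-likelihood enough to lower the AIC.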
Depending on the number of explanatory variables (model complexity), CPMs may be simple (exposure-only) or multivariate (fully-specified). Sawalha and Sayed warned against the temptation to build overfitted models, i.e. models containing too many insignificant variables. In fact, a number of studies found that additional predictors are not as beneficial as expected [59, 70, 82]. One should strive for parsimonious models, i.e. those containing as few explanatory variables as possible. Such models enable simple interpretation and understanding, as well as easy updating.
A practice-driven approach was adopted in developing New Zealand rural road CPMs . When it was found that the statistically significant variables did not include the parameters that were of most interest to practitioners, two distinct model types were developed. Statistical models are the best-performing models according to goodness-of-fit measures at 95% confidence levels. Practitioner models contain additional variables of interest to safety professionals, at confidence levels of 70% or more.
On the other hand, when an influential explanatory variable is left out due to unavailable data, so-called “omitted variable bias” occurs. It results in biased parameter estimates that can produce erroneous inferences and crash frequency predictions [47, 50, 51].
Another bias may be caused by spatial correlation, which arises because adjacent road segments may share unobserved effects. This bias can be handled by using random-effects models, where the common unobserved effects are assumed to be distributed over the road segments according to some distribution, and the shared unobserved effects are assumed to be uncorrelated with the explanatory variables.
3.5 Model function and variable forms
Before carrying out the modelling task, exploratory data analysis should be conducted, in order to detect potential outliers, check the extreme values, potential mistakes, etc.
As previously mentioned, crash data are typically overdispersed. The degree of overdispersion in a negative binomial model is represented by an overdispersion parameter, which is estimated during modelling along with the regression coefficients. The overdispersion parameter is used to determine the value of a weight factor for use in the empirical Bayes (EB) method. This method combines predicted (modelled) and recorded (observed) crash frequencies in order to improve the reliability of the safety level estimate for a specific site. Applications of EB methods are described in later sections of the review.
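The role of the overdispersion parameter in the EB method can be shown with a short sketch. It uses a commonly cited form of the weight, w = 1 / (1 + k · predicted), where `k` is the overdispersion parameter and `predicted` is the CPM prediction for the same period as the observed count. The numbers are hypothetical, and some software reports the inverse of `k`, so the parameterisation should be checked before reuse.

```python
def eb_estimate(predicted, observed, k):
    """Empirical Bayes estimate of crash frequency at one site.

    predicted: crash frequency from the CPM for the study period
    observed:  recorded crash count for the same period
    k:         overdispersion parameter (variance = mean + k * mean**2)
    Returns the EB estimate and the weight given to the prediction.
    """
    weight = 1.0 / (1.0 + k * predicted)
    estimate = weight * predicted + (1.0 - weight) * observed
    return estimate, weight


# Hypothetical site: the CPM predicts 2 crashes, 6 were recorded, k = 0.5
estimate, weight = eb_estimate(predicted=2.0, observed=6, k=0.5)
print(estimate, weight)
```

A larger `k` (more overdispersion) lowers the weight on the model prediction, so the estimate moves closer to the observed count; a well-fitting model with small `k` pulls the estimate towards the prediction.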
Crash frequency (i.e. response variable) ideally should not involve mixed levels of crash severity and crash types, as it may produce uninterpretable results. It is thus recommended to develop disaggregated CPMs. Alternatively, one may use the observed proportion of a given crash type or severity and apply it to the CPM that has been estimated for total crashes. However, this has been found to be a questionable practice, leading to estimation errors. The current recommendation is to estimate separate CPMs by crash type. New Zealand practice is to develop models for key (or common) crash types and, if necessary, to scale their predictions to represent total crash frequency, allowing for less common crash types. Some studies [24, 27] used sub-samples (for example stratification based on AADT under/over specific limits) in order to improve model quality. In any case, developing disaggregated CPMs requires larger sample sizes. In terms of severity, models are developed by injury severity level (usually with fatal and serious injury crashes combined), as with the ANRAM models. Alternatively, severity factors (proportions) are applied to models developed for all injury crashes or all crashes (including non-injury).
Regarding functional forms of explanatory variables, there is no universal guidance and various forms are used in the literature. To select the most suitable mathematical forms of explanatory variables, one may use graphical relationships between crash frequency and a road variable (i.e. univariate analysis), or use more complex techniques, such as empirical integral functions and cumulative residuals (CURE). According to Hauer, the model equation may have both multiplicative components (to represent the influence of continuous factors, such as lane width or shoulder type), and additive components (to account for the influence of point hazards, such as driveways or narrow bridges). Despite these recommendations, the typical modelling approach is often simple. The general model form of Eq. (1) is widely adopted.
Exposure is usually modelled in terms of traffic volume, i.e. a single AADT value for road segments, or the product of major and minor AADTs for road intersections. The function is typically a power form, but some authors considered it jointly with an exponential form (the so-called Ricker model). Traffic volumes (flows) should be adapted to the specific segment and intersection types. For example, New Zealand CPMs apply either the product of flows or conflicting flows, based on the type of intersection, urban/rural setting and speed limit. As discussed, a segment length variable is often used where road segments are not of equal length. For intersections, a standard approach length is typically used, e.g. 50–100 m, and not modelled as a variable.
Another example is segment length, usually applied as an offset, i.e. with regression coefficient = 1, but often also in a power form [30, 67, 68]. According to Hauer , segment length should also be considered when estimating the over-dispersion parameter for the frequency models to be used in the empirical Bayes approach. However, the exact form of the relationship is not definite ; in fact, not only length but also other variables may play a role .
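The difference between using segment length as an offset and in a power form can be sketched with a simple prediction function. The functional forms below (power form for AADT, separate exponents for major and minor flows at intersections) and all coefficient values are assumptions for illustration, not estimates from any published model.

```python
import math


def segment_cpm(aadt, length_km, b0, b_aadt, b_length=1.0,
                covariates=None, coefficients=None):
    """Predicted crash frequency for a road segment (illustrative form only).

    b_length = 1.0 treats length as an offset; any other value is a power form.
    """
    covariates = covariates or {}
    coefficients = coefficients or {}
    linear_part = sum(coefficients[name] * value for name, value in covariates.items())
    return math.exp(b0) * aadt ** b_aadt * length_km ** b_length * math.exp(linear_part)


def intersection_cpm(aadt_major, aadt_minor, b0, b_major, b_minor):
    """Predicted crash frequency for an intersection from major and minor flows."""
    return math.exp(b0) * aadt_major ** b_major * aadt_minor ** b_minor


# Hypothetical coefficients
base = segment_cpm(aadt=5000, length_km=1.0, b0=-7.0, b_aadt=0.8)
doubled_offset = segment_cpm(aadt=5000, length_km=2.0, b0=-7.0, b_aadt=0.8)
doubled_power = segment_cpm(aadt=5000, length_km=2.0, b0=-7.0, b_aadt=0.8, b_length=0.8)

print(doubled_offset / base)  # offset: prediction scales in proportion to length
print(doubled_power / base)   # power form with exponent below 1: less than proportional
```

With length as an offset, doubling the segment doubles the prediction; with an exponent below 1 the increase is smaller, which is the kind of departure from proportionality reported for some published models.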
Creation of a model is undertaken by running relevant statistical regression processes on the sample data. The most common tools for this are statistical software packages such as R, SPSS, SAS or Matlab. Microsoft Excel is not considered appropriate for this task as it lacks many of the necessary statistical features.
In practice, the modelling process is highly iterative. Variables are added, and then removed if shown to add little or nothing to explanation of the response variable. Often data for a given variable is re-categorised to improve its significance if it is borderline. Often borderline or non-significant variables are retained if they add to better understanding of crash problem. Optimisation of the model fit vs. number of variables vs. applicability is gradually achieved. This iterative process can be stopped when little further improvement in the model is achieved with each iteration [10, 25].
3.6 Model validation
The goal of validation is to establish whether the developed model is acceptable from both scientific and practical perspectives. It is thus surprising that most modelling guidelines seem to overlook this step [1, 23, 35, 36, 48, 71, 72, 83].
Internal validity means that CPM findings should be consistent with established knowledge on the subject; the CPM should also possess the features of the underlying phenomenon; and finally the CPM should agree with fundamental information and knowledge, such as the physical mechanics and dynamics involved in crashes. Newly developed CPMs may be compared to previous literature in terms of signs and magnitudes of regression coefficients, or for example their marginal effects.
External validity (goodness-of-fit) may be evaluated by comparing either models from two independent samples, or a model from a complete sample applied on selected sub-samples that have not been used in the model building (e.g. randomly-chosen 20%). Various goodness-of-fit indicators may be applied; often proportion of systematic variation in the original accident dataset explained by the model (also known as Elvik index) is used [22, 45].
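The hold-out approach mentioned above can be sketched as a random split followed by a comparison of observed and predicted counts on the sites not used for model building. The sketch uses the mean absolute deviation as a simple stand-in measure; it is not the Elvik index, and the site identifiers, counts and seed are hypothetical.

```python
import random


def split_holdout(sites, holdout_share=0.2, seed=42):
    """Randomly split sites into a model-building set and a hold-out set."""
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_share)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def mean_absolute_deviation(observed, predicted):
    """Average absolute difference between observed and predicted crash counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical site identifiers
site_ids = list(range(50))
building_set, holdout_set = split_holdout(site_ids)
print(len(building_set), len(holdout_set))

# Hypothetical observed counts and model predictions for three hold-out sites
mad = mean_absolute_deviation(observed=[2, 0, 1], predicted=[1.5, 0.5, 1.0])
print(mad)
```

The model would be estimated on the building set only and then applied to the hold-out set; a value of the deviation measure that is much worse on the hold-out set than on the building set suggests overfitting.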
3.7 Using CPMs in network screening
Previous reviews [16, 52] indicated that the current state-of-the-practice is generally behind the state-of-the-art. According to the EB methodology, the predicted crash frequency from CPMs should be combined with the observed historical crash frequency to obtain the so-called “expected average crash frequency with empirical Bayes adjustment” (in short, the EB estimate). These EB estimates benefit the practitioner by removing much of the random statistical variation associated with historical crash data, especially at low frequencies [1, 41]. Apart from EB estimates, other safety indicators can be developed for network screening purposes, for example potential for safety improvement (PSI), level of service of safety (LOSS) or scaled difference.
In Australia and New Zealand, where low-volume rural roads generate very low numbers of crashes per kilometre per 5 years (or zero), CPMs provide a continuous proxy measure of safety. In Australia, the ANRAM model uses EB estimates of severe casualty crashes to remove the random variation in observed crash data at the 1–3 km segment level: sites are prioritised simply on the EB estimate. Differences of more than two standard errors between the EB estimate and observed crashes are noted as a possible indicator of non-infrastructure-based influences on safety (e.g. localised speeding or drink-driving).
Given the variety of available methods, HSM  notes that “using multiple performance measures to evaluate each site may improve the level of confidence in the results.” Hence sites may be ranked for treatment based on several different methods [49, 52, 89]. Those that rank consistently high using several methods are the sites where treatment should be focused.
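Ranking by several measures and keeping the sites that rank high on all of them can be sketched as follows. The site values are hypothetical, and PSI is taken here as the EB estimate minus the CPM prediction, which is one common definition and an assumption of this sketch.

```python
def top_sites(sites, measure, n):
    """Return the identifiers of the n sites with the highest value of a measure."""
    ranked = sorted(sites, key=lambda site: site[measure], reverse=True)
    return {site["id"] for site in ranked[:n]}


# Hypothetical sites with EB estimates and CPM predictions (crashes per period)
sites = [
    {"id": "A", "eb": 4.0, "predicted": 2.0},
    {"id": "B", "eb": 3.5, "predicted": 3.4},
    {"id": "C", "eb": 1.2, "predicted": 0.4},
    {"id": "D", "eb": 2.0, "predicted": 2.5},
]

# Assumed definition: potential for safety improvement = EB estimate - prediction
for site in sites:
    site["psi"] = site["eb"] - site["predicted"]

top_by_eb = top_sites(sites, "eb", n=2)
top_by_psi = top_sites(sites, "psi", n=2)
consistent = top_by_eb & top_by_psi
print(sorted(top_by_eb), sorted(top_by_psi), sorted(consistent))
```

In this example, site B ranks high on the EB estimate only because it is expected to have many crashes, whereas site A ranks high on both measures and would be the first candidate for treatment.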
3.8 Using CPMs in developing crash modification factors
A crash modification factor (CMF) is a multiplicative factor used to compute the expected number of crashes after implementing a given countermeasure or a design change at a location. CMFs may be derived from before-after or cross-sectional studies; however, each method has its own challenges, and available CMFs can often be highly inconsistent between literature sources. Before and after studies are generally the preferred source of CMFs, particularly for the HSM. However, they typically only look at features in isolation, so when the combined effect of features on crash occurrence is not the sum of the effects of each individual feature, they may provide misleading results. Several solutions to developing multiple-treatment CMFs have been proposed, without reaching definite conclusions [17, 29, 58].
Cross-sectional studies (i.e. the ones based on CPMs) have been criticised for being more prone to non-causal safety effects, due to bias-by-selection [11, 19, 36]. Bias-by-selection can occur when a treatment (e.g. a crash barrier) is applied more often to sites that already have a crash problem than to those that do not. Cross-sectional studies do, however, provide a much better crash prediction for the combination of road features. In some cases, CMFs are developed from CPMs where limited before and after studies are available.
Although the practice of deriving CMFs from cross-sectional CPMs has been criticised, it is relatively common. Again, there are various approaches: for example, Park et al. tested six different methods of combining CMFs and concluded that one should not rely on only one of them. An interim solution is to apply rules of thumb, such as using the product of no more than three separate independent countermeasures or reducing the product through multiplying by a ratio of 2/3.
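The two rules of thumb can be sketched in a few lines. Two assumptions are made here and should be checked against the cited sources: the limit of three countermeasures is enforced by rejecting longer lists, and the 2/3 ratio is applied to the combined crash reduction (1 minus the product of CMFs) rather than to the product itself. The CMF values are hypothetical.

```python
from math import prod


def combine_cmfs_product(cmfs, max_treatments=3):
    """Combine independent CMFs by multiplication, for at most `max_treatments` treatments."""
    if len(cmfs) > max_treatments:
        raise ValueError(f"product rule limited to {max_treatments} countermeasures")
    return prod(cmfs)


def combine_cmfs_two_thirds(cmfs):
    """Combine CMFs, then scale the combined crash reduction by 2/3 (assumed interpretation)."""
    combined_reduction = 1.0 - prod(cmfs)
    return 1.0 - (2.0 / 3.0) * combined_reduction


# Hypothetical CMFs of two countermeasures applied at the same site
cmfs = [0.8, 0.9]
cmf_product = combine_cmfs_product(cmfs)
cmf_two_thirds = combine_cmfs_two_thirds(cmfs)
print(cmf_product, cmf_two_thirds)

# Expected crashes after treatment = expected crashes before * combined CMF
expected_before = 5.0
print(expected_before * cmf_product, expected_before * cmf_two_thirds)
```

The scaled variant gives a combined CMF closer to 1, i.e. a more conservative estimate of the joint benefit than simple multiplication.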
3.9 Using CPM tools
The above-mentioned analytical steps (data preparation, exploratory analysis, modelling, calculations) are typically conducted in statistical software or spreadsheets. Nevertheless, for an end user it is beneficial to be able to visualize the results. These may take form of tables or map outputs, for example the identified hotspots or the lists of ranked segments. A number of practitioner tools are worthy of mention, especially as they apply to network screening and analysis of safety impacts of potential treatments.
IHSDM Crash Prediction Module  estimates the frequency and severity of crashes on a highway using geometric design and traffic characteristics. This helps users evaluate an existing highway, compare the relative safety performance of design alternatives, and assess the safety cost-effectiveness of design decisions.
SafetyAnalyst (commercial software) Network Screening Tool  identifies sites with potential for safety improvement. In addition, it is able to identify sites with high crash severities and with high proportions of specific crash types.
Note that there are close links between IHSDM, SafetyAnalyst and the Highway Safety Manual. According to Harwood et al., SafetyAnalyst Module 1 (network screening) is to be applied first, followed by Module 2 (diagnosis and countermeasure selection), Module 3 (economic appraisal and priority ranking) and IHSDM to perform safety analyses as part of the design process.
The Finnish evaluation tool TARVA  also deserves mentioning. Its purpose is to provide a common method and database for (1) predicting the expected number of crashes, and (2) estimating the safety effects of road safety improvements. Based on simple CPMs and pre-determined CMFs, it currently exists in Finnish and Lithuanian versions, with planned applications in other countries.
Capabilities of network screening and road safety impact assessment are built into the commercial software PTV Visum Safety. There are also applications in the form of Excel spreadsheets, for example British COBALT, Swedish TS-EVA or Norwegian CPMs for national and county roads [37, 38]. In the US, spreadsheets were developed for safety analysis of freeway segments and interchanges (ISAT and ISATe).
The Australian National Risk Assessment Model (ANRAM) tool, available to road agencies, is a network screening and prioritisation tool, which uses CPMs for different road stereotypes, together with CMFs and observed crash data to estimate severe injury crashes across segmented road network . ANRAM allows users to develop and estimate benefits of road network and corridor treatment programs. This tool has gained wide use among state road agencies in Australia, particularly for the rural road networks where actual severe crashes are randomly distributed. ANRAM is available in a spreadsheet form, with planned online adaptations.
New Zealand also has a history of various safety prediction tools. Turner et al. stressed the practical need for such tools and, after a review of overseas applications, considered IHSDM worth transferring to New Zealand conditions for assessing new road designs. A later work reviewed New Zealand spreadsheet applications, as well as experience with using and calibrating the ISAT tool from the USA.
Increasingly, online business analytics software has been used to display CPM results in map format, often with dynamic filtering and computational functions. Examples include open-source or free resources such as ArcGIS Online, QGIS, Tableau, or Microsoft Power BI. These solutions make it easier for practitioners to access and understand the value of CPMs.
4 Challenges and opportunities
Table: Overview of identified challenges, opportunities and potential solutions
Columns: Challenges for practitioners; Opportunities and potential solutions

Challenge: Lack of knowledge of CPMs. Many decision makers simply do not know much about CPMs and thus rely on established but less effective methods. CPMs may be seen as the domain of researchers.
Opportunities and solutions: Road agencies, researchers and educators may develop online factsheets and educational materials outlining different applications of CPMs, including case studies. Investment in practitioner tools which use CPMs is also encouraged.

Challenge: Understanding applicability. Are already published CPMs useful in practitioners’ jurisdictions? Do they apply to their specific safety management problems?
Opportunities and solutions: Researchers should clearly state data sources and modelling purpose in CPM publications, and in such sources as the CMF Clearinghouse, the Highway Safety Manual or PRACT. (See the note on calibration below.)

Challenge: Confusing choice of model types. There are many different types of CPMs focussing on different aspects of road safety performance management, e.g. crash type-specific, severity-specific, intersections, road segments, road type-specific, network screening vs. CMF development.
Opportunities and solutions: Researchers should state these basic intents clearly when publishing CPMs, pointing out the limitations of applying their models in unintended ways. This will assist in the interpretation of CPM findings.

Challenge: Education gap. A high level of statistical expertise is required to understand and interpret CPM outputs. Practitioners often lack it.
Opportunities and solutions: Researchers should aim to present results in forms that practitioners will understand, e.g. equations, tables, sets of graphs, or dynamic visualisations. Articles on the application of CPMs should be published on online communication platforms popular with practitioners. CPMs should be included in Master's-level engineering education.

Challenge: Calibration. When models are published, there is little practical advice on how widely they can be used, or how to calibrate them to local conditions.
Opportunities and solutions: Factsheets and ‘how-to’ guides can be provided as part of systems such as PRACT, to help practitioners make these decisions.

Challenge: Application in engineering practice. Even well-communicated and well-understood CPMs may be too difficult to access if they are not included in practitioner guidance and tools.
Opportunities and solutions: Road agencies and researchers should invest in practitioner tools using CPMs. Big data and online mapping platforms make this task easier and cheaper than in the past. CPMs ‘approved’ for use by experts should be included in the guidelines.

Challenge: Data availability. Traditionally, the lack of adequately large road and crash data samples has hampered CPM development.
Opportunities and solutions: Big data platforms, connected vehicle technologies and surrogate safety measures are developing fast. With adequate research and development investment, these sources will provide far larger data samples than are available from traditional sources.

Challenge: Modelling task. Data preparation and modelling tasks remain the domain of statistics experts and require dedicated software.
Opportunities and solutions: The rise of free programming and software environments for statistical computing has opened this area of research to many. Many online big data platforms allow the creation of specialist open source tools for complex mathematical tasks. It is possible that such tools will be developed to guide and simplify data preparation and modelling tasks.

Challenge: Implementing CPMs in road safety management.
Opportunities and solutions: More practitioners could benefit from CPMs with improved education (e.g. factsheets, online resources) and easier access via practitioner tools, such as the PRACT web repository.
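The calibration challenge listed in the table can be made concrete with a simple ratio-based calibration factor, in the spirit of the Highway Safety Manual procedure: total observed crashes at a local sample of sites divided by the total predicted by the borrowed model. The sketch below is only illustrative. The site data and function name are hypothetical, and practical calibration also involves sample-size and site-selection requirements that are not shown here.

```python
def calibration_factor(observed, predicted):
    """Ratio of total observed to total predicted crashes for a local sample."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    total_predicted = sum(predicted)
    if total_predicted <= 0:
        raise ValueError("total predicted crashes must be positive")
    return sum(observed) / total_predicted


# Hypothetical local sample of five sites over the same time period.
observed_crashes = [3, 0, 5, 2, 1]
uncalibrated = [2.0, 1.0, 3.5, 2.5, 1.0]  # predictions from a borrowed CPM

c = calibration_factor(observed_crashes, uncalibrated)
calibrated = [c * n for n in uncalibrated]  # locally calibrated predictions
print(f"calibration factor: {c:.2f}")
```

A factor above 1 indicates that the borrowed model underpredicts crashes for the local sample, and a factor below 1 that it overpredicts. A single factor rescales all predictions equally and does not correct differences in how individual variables act locally.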
5 Summary and conclusions
Greater uptake of state-of-the-art analytical techniques is necessary for continuing improvement in road safety. This paper aimed to improve practitioner understanding of modelling road safety performance using CPMs, so that this useful analytical technique could become more accessible.
A number of steps have been reviewed: from data collection and road network segmentation to choosing variables and functional forms, validating models and using them in practice, including a description of available tools. The review highlighted that developing CPMs is not a straightforward task: there are many alternative choices and decisions to be made during the process (without definite guidance), which explains the diversity of approaches and techniques. While this may be interesting from a research perspective, the current diverse state-of-the-art limits understanding and application by practitioners, and complicates international comparability and transferability. There is a need to identify opportunities and solutions that are scientifically sound while also meeting the needs of practitioners.
The main consideration for researchers should be the application of their models by the intended practitioners. This applies equally in the context of basic research, such as seeking understanding of a new challenge, and in the context of applied research, such as development of algorithms for inclusion in practitioner software. Either way, the end users of CPMs are the practitioners, i.e. road agency engineers, policy makers, or data analysts.
CPMs are valuable tools, which help link crashes with risk factors. This is especially valuable in current conditions of scattered crash occurrence (fewer crash black-spots), where traditional crash-based approaches do not work well.
Developing and using CPMs has its challenges. However, these may be overcome by improved communication of the CPM benefits and application, so that practitioners have a basic understanding of CPMs and can make basic application decisions (e.g. use or calibrate available models).
Applying network-wide CPMs enables effective road safety impact assessment and network screening.
Ongoing investment in developing CPM-based practitioner tools, big data management and visualisation platforms offers potential for improved accessibility and uptake of CPMs in road safety management.
The paper was produced with the financial support of the Czech Ministry of Education, Youth and Sports under the National Sustainability Programme I project of Transport R&D Centre (LO1610), using the research infrastructure from the Operational Programme Research and Development for Innovations (CZ.1.05/2.1.00/03.0064).
All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 1. AASHTO (2010) Highway safety manual, 1st edn. American Association of State Highway and Transportation Officials, Washington
- 3. Ambros J, Sedoník J, Křivánková Z (2018) How to simplify road network safety screening? Adv Transp Stud 44:151–158
- 4. Arndt O, Troutbeck R (2006) Techniques for analysing the effect of road geometry on accident rates using multifactor studies. Paper presented at the 22nd ARRB conference, Canberra
- 6. Bonneson JA, Geedipally S, Pratt MP, Lord D (2012) Safety prediction methodology and analysis tool for freeways and interchanges. NCHRP project 17–45 final report. Transportation Research Board, Washington
- 7. Bornheimer C, Schrock S, Wang M, Lubliner H (2012) Developing a regional safety performance function for rural two-lane highways. Paper presented at the 91st Transportation Research Board Annual Meeting, Washington
- 10. Cafiso S, D’Agostino C (2013) Investigating the influence of segmentation in estimating safety performance functions for roadway sections. Paper presented at the 92nd Transportation Research Board Annual Meeting, Washington
- 11. Carter D, Srinivasan R, Gross F, Council F (2012) Recommended protocols for developing crash modification factors. NCHRP project 20-07, task 314 report. Transportation Research Board, Washington
- 12. Cenek PD, Davies RB, McLarin MW, Griffith-Jones G, Locke NJ (1997) Road environment and traffic crashes. Research report 79. Transfund, Wellington
- 14. da Costa JO, Jacques MAP, Pereira PAA, Freitas EF, Soares FEC (2015) Portuguese two-lane highways: modelling crash frequencies for different temporal and spatial aggregation of crash data. Transp 30:1–12
- 15. Eenink R, Reurings M, Elvik R, Cardoso J, Wichert S, Stefan C (2008) Accident prediction models and road safety impact assessment: recommendations for using these tools. RIPCORD-ISEREST project deliverable 2
- 18. Elvik R (2010) Assessment and applicability of road safety management evaluation tools: Current practice and state-of-the-art in Europe. Report 1113/2010. Institute of Transport Economics, Oslo
- 20. FHWA (2003) Interactive highway safety design model (IHSDM) – crash prediction module (CPM) User's manual. Federal Highway Administration, McLean
- 21. FHWA (2010) SafetyAnalyst: software tools for safety management of specific highway sites. White paper for module 1 – Network screening. Federal Highway Administration, McLean
- 23. Fridstrøm L (2015) Disaggregate accident frequency and risk modelling: A rough guide. Report 1403/2015. Institute of Transport Economics, Oslo
- 26. Geyer J, Lankina E, Chan C-Y, Ragland D, Pham T, Sharafsaleh A (2008) Methods for identifying high collision concentration locations for potential safety improvements. Report UCB-ITS-PRR-2008-35. University of California, Berkeley
- 27. Gitelman V, Doveh E (2016) Safety management of non-urban roads in Israel: an application of empirical Bayes evaluation. J Traffic Transp Eng 4:259–269
- 28. Gross F, Persaud B, Lyon C (2010) A guide to developing quality crash modification factors. Report FHWA-SA-10-032. Federal Highway Administration, Washington
- 29. Gross F, Hamidi A (2011) Investigation of existing and alternative methods for combining multiple CMFs. T-06-013 HSIP Technical Support, Task A.9
- 30. Hadi MA, Aruldhas J, Chow L-F, Wattleworth JA (1995) Estimating safety effects of cross-section design for various highway types using negative binomial regression. Transp Res Rec 1500:169–177
- 31. Harwood DW, Torbic DJ, Richard KR, Meyer MM (2010) SafetyAnalyst: Software tools for safety management of specific highway sites. Report FHWA-HRT-10-063. Federal Highway Administration, McLean
- 32. Hauer E (1997) Observational before-after studies in road safety: estimating the effect of highway and traffic engineering measures on road safety. Pergamon, Oxford
- 33. Hauer E, Bamfo J (1997) Two tools for finding what function links the dependent variable to the explanatory variables. Paper presented at ICTCT 97 Conference, Lund
- 37. Høye A (2014) Development of crash prediction models for national and county roads in Norway. Report 1323/2014. Institute of Transport Economics, Oslo
- 38. Høye A (2016) Development of crash prediction models for national and county roads in Norway (2010-2015). Report 1522/2016. Institute of Transport Economics, Oslo
- 39. Jonsson T (2005) Predictive models for accidents on urban links: A focus on vulnerable road users. Bulletin 226. Lund University, Lund
- 41. Jurewicz C, Steinmetz L, Turner B (2014) Australian National Risk Assessment Model. Publication AP-R451–14. Austroads, Sydney
- 42. Kim E, Lee D, Choi B-G, Choi S-E, Choi E (2010) Applicability of a Korea highway safety evaluation model compared to the crash prediction module of IHSDM. Paper presented at the 12th World Conference on Transport Research, Lisbon
- 45. Kulmala R (1995) Safety at rural three- and four-arm junctions: Development and application of accident prediction models. Publication 233. VTT Technical Research Centre of Finland, Espoo
- 47. Lord D, Mannering F (2010) The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp Res A 44:291–305
- 53. NZTA (2016) Crash estimation compendium (New Zealand crash risk factors guideline). NZ Transport Agency, Wellington
- 54. OECD (1997) Road safety principles and models: review of descriptive, predictive, risk and accident consequence models. OECD, Paris
- 55. OECD (2012) Sharing road safety: developing an international framework for crash modification functions. OECD, Paris
- 59. Peltola H, Kulmala R, Kallberg V-P (1994) Why use a complicated accident prediction model when a simple one is just as good? Paper presented at the 22nd PTRC Summer Annual Meeting, Warwick
- 62. Persaud BN (2001) Statistical methods in highway safety analysis: A synthesis of highway practice. NCHRP synthesis 295. Transportation Research Board, Washington
- 64. Persaud B, Saleem T, Faisal S, Lyon C, Chen Y, Sabbaghi A (2012) Adoption of Highway Safety Manual predictive methodologies for Canadian highways. Paper presented at 2012 TAC Conference, Fredericton
- 65. Ragnøy A, Christensen P, Elvik R (2002) Injury severity density: A new approach to identifying hazardous road sections. Report 618/2002. Institute of Transport Economics, Oslo
- 66. Reurings M, Janssen T, Eenink R, Elvik R, Cardoso J, Stefan C (2005) Accident prediction models and road safety impact assessment: a state-of-the-art. RIPCORD-ISEREST project deliverable 2.1
- 67. Reurings M, Janssen T (2007) Accident prediction models for urban and rural carriageways. Report R-2006-14. SWOV Institute for Road Safety Research, Leidschendam
- 72. Srinivasan R, Bauer K (2013) Safety performance function development guide: Developing jurisdiction-specific SPFs. Report FHWA-SA-14-005. Federal Highway Administration, Washington
- 73. Srinivasan R, Carter D, Bauer K (2013) Safety performance function decision guide: SPF calibration vs SPF development. Report FHWA-SA-14-004. Federal Highway Administration, Washington
- 74. Sun X, Li Y, Magri D, Shirazi HH (2006) Application of highway safety manual draft chapter: Louisiana experience. Transp Res Rec 1950:55–64
- 75. Torbic DJ, Harwood DW, Gilmore DK, Richard KR (2007) Interchange Safety Analysis Tool (ISAT): User manual. Report FHWA-HRT-07-045. Federal Highway Administration, McLean
- 76. Turner B (2011) Estimating the safety benefits when using multiple road engineering treatments. Road Safety Risk Reporter 11
- 77. Turner S, Durdin P, Bone I, Jackett M (2003) New Zealand accident prediction models and their applications. Paper presented at the 21st ARRB Conference, Cairns
- 78. Turner S, Tate F, Koorey G (2007) A SIDRA for road safety. Paper presented at 2007 IPENZ Transportation Group Conference, Tauranga
- 79. Turner S, Singh R, Nates G (2012) The next generation of rural road crash prediction models: final report. Research Report 509. NZ Transport Agency, Wellington
- 80. Turner S, Brown M (2013) Pushing the boundaries of road safety risk analysis. Paper presented at 2013 IPENZ Transportation Group Conference, Dunedin
- 83. Wood GR, Turner S (2007) Towards a start-to-finish approach to the fitting of traffic accident models. In: De Smet A (ed) Transportation accident analysis and prevention. Nova Science, New York, pp 239–250
- 85. Yannis G, Dragomanovits A, Laiou A, Richter T, Ruhl S, La Torre F et al (2014) Overview of existing accident prediction models and data sources. PRACT project deliverable D1
- 86. Yannis G, Dragomanovits A, Laiou A, Richter T, Ruhl S, Calabretta F et al (2015) Inventory and critical review of existing APMs and CMFs and related data sources. PRACT project deliverable D4
- 87. Yannis G, Dragomanovits A, Laiou A, Richter T, Ruhl S, La Torre F et al (2016) Use of accident prediction models in road safety management – an international inquiry. Transp Res Proc 14:4257–4266
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.