1 Background

Today, traffic safety work is based on registered crashes, with the goal of addressing the problems identified in crash data analysis, for example by reducing the number and severity of injuries. However, inaccurate traffic crash records may bias the results of traffic safety analyses and consequently lead to misguided crash-prevention strategies. A major obstacle to obtaining accurate information about traffic crashes is the incompleteness of crash records in the official statistics [1]. Crashes involving vulnerable road users (VRUs; e.g., pedestrians, cyclists and motorcyclists) are underrepresented in the official national statistics, especially those involving slight injuries (see, e.g., [2,3,4,5,6]). Because VRUs account for a large share of road traffic deaths worldwide (approximately 50% in high-income countries, 60% in middle-income countries and 70% in low-income countries [7]), complementary data on the injury situation of these groups are of great value, extending the scope of the data available for research and safety improvement measures.

Traffic crashes are documented by the police, and injured or killed persons are recorded in hospital registers. Both registers suffer from underreporting (see, e.g., [8,9,10]). Insurance companies also collect crash data, but these data are usually not accessible to actors outside the company. Various factors may affect whether a crash is reported, such as injury severity, the day and time of the crash, demographic characteristics (e.g., age group and gender) and the cost of damages (discussed in [11]). The incompleteness of traffic crash records is a worldwide issue in both developed countries (see, e.g., [3, 9]) and developing countries (see, e.g., [11, 12]), which has led researchers to call for complementary sources of information on road traffic crashes.

Apart from crashes documented by the police, hospitals, insurance companies or other bodies (e.g., in a company record), some unrecorded traffic crashes can be traced via self-reports. Self-reporting is a common way to address underreporting, as a complement to official records (see, e.g., [13,14,15]). Self-reports are widely used in research areas such as transportation research, social science and medicine, where they serve as a complementary approach to obtaining more individual information. Because self-reports of traffic crashes provide useful information that complements official reports, they are of great value and are increasingly used in low- and middle-income countries [16]. Self-reporting is an individual reporting system in which participants disclose information that is not recorded in any official document, without interference from the researcher; participants are normally asked to report their personal information without any external influence. Lajunen and Özkan [17] argued that self-report surveys are a cost-effective and easy way to gather large samples of data. However, Violanti and Marshall [18] stated that self-reported crashes are usually more numerous than those found in official records, since most drivers report more crashes than the official files contain [19].

Self-reporting can serve many different aims depending on the research question being investigated, but several issues are important across studies: the study design (e.g., the type of questionnaire or interview used for data collection, the recruitment of respondents, the sample size, the recall period, the type of crashes and the type of road users), the reliability and validity of the data, and reporting bias (i.e., recall errors leading to possible overreporting or underreporting, and social desirability). The ways of obtaining information from people vary: they may be asked to fill out written questionnaires (online or paper-based), participate in interviews (face-to-face or by telephone) or report their crash involvement via an app on their mobile device. Response rates may also vary with the data collection method. Some study designs require follow-up sessions to obtain additional information. The target group may be a specific group of road users (e.g., car users, bicyclists or pedestrians), a certain age group or people with a certain illness, and participants may be selected on a voluntary basis or at random. The sample size may vary with the purpose of the study or simply be limited by economic considerations. The information that people are asked to give also varies. In some studies, only the number of crashes in which the respondent was involved is of interest; in others, respondents are asked about possible crash contributory factors or about their recollection of the crash details. The anonymity of the respondents is a sensitive issue. If self-reports are to be compared with other sources of traffic crash records, e.g., hospital or police records, an individual identifier is necessary to match crash events in both data sources, and in this case the respondent's consent is needed. Finally, the added value of a self-reporting study in its context is also relevant.
A good understanding of these issues and of how they influence the outcome of a self-reporting study is valuable for those working with road safety analysis based on crash data.

2 Aim

The present review article aims to map the current practice in the collection of road traffic crash data by surveying studies where traffic crashes were reported by the involved road users. The analysis is focused on the publications that emphasise the methodological aspects, such as selection and type of respondents, sample size, data collection method and so forth. Advantages and drawbacks of the various ways to carry out a self-reported study are discussed and recommendations for further studies are given.

To the best of our knowledge, no literature review on the processes surrounding the self-reporting of traffic crashes has been published in the transportation literature.

3 Method

A systematic literature review was carried out to map the current practice of data collection for self-reported traffic crashes. Three databases were searched for publications: ScienceDirect, Scopus and Transport Research International Documentation (TRID). ScienceDirect contains research articles from 3800 journals and more than 37,000 book titles. Scopus is the largest abstract and citation database (i.e., journals, books and conference proceedings), containing more than 22,800 serial titles and more than 150,000 books from more than 5000 publishers. Both ScienceDirect and Scopus are owned by Elsevier. TRID focuses on transportation research, contains more than 1.1 million records worldwide (i.e., books, technical reports, conference proceedings and journal articles) and is maintained by the Transportation Research Board of the U.S. National Academies. A search of these three databases is expected to cover the large majority of relevant publications in the transport research area.

3.1 Search strategy

Combinations of three groups of keywords, wildcards (*) and the Boolean operators AND/OR were used in the search strategy to retrieve the relevant publications (see Table 1). The AND operator connected the keyword groups, while the OR operator connected the keywords within each group. All keywords were searched in the title, abstract and keyword fields of each database.

Table 1 Search terms and keywords used in the literature search

The systematic literature review aimed to locate publications related to the self-reporting of traffic crashes. Because the words ‘accident’ and ‘crash’ are used as synonyms in academic publications, both were included in the first keyword group. The second keyword group specified the method of data collection, here self-reporting by the road users. The third keyword group was crucial for retrieving the most relevant publications in the transport research field, because the word ‘accident’ (first keyword group) covers broad areas of research and, on its own, returns a very high number of hits, most of which concern other kinds of accidents (e.g., industrial accidents) rather than traffic accidents. Likewise, ‘self-reporting’ (the second keyword group) is used in various fields of research, including medicine and social science.
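The AND/OR logic described above can be sketched as a small query builder. This is a minimal illustration assuming three hypothetical keyword groups; the actual search terms are those listed in Table 1, and each database wraps such a string in its own field syntax (Scopus, for example, uses `TITLE-ABS-KEY(...)` for title, abstract and keyword searches).

```python
# Hypothetical keyword groups for illustration only; the real terms are listed in Table 1.
KEYWORD_GROUPS = [
    ["accident*", "crash*"],  # group 1: the event ('accident' and 'crash' as synonyms)
    ["self-report*"],         # group 2: the data collection method
    ["traffic", "road"],      # group 3: restricts hits to the transport field
]


def build_query(groups: list[list[str]]) -> str:
    """Join terms with OR inside each group, then join the groups with AND."""
    clauses = ["(" + " OR ".join(terms) + ")" for terms in groups]
    return " AND ".join(clauses)


if __name__ == "__main__":
    # prints: (accident* OR crash*) AND (self-report*) AND (traffic OR road)
    print(build_query(KEYWORD_GROUPS))
```

Keeping the groups as data rather than a hand-written string makes it easy to add a synonym to one group without disturbing the AND structure between groups.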

Only publications written in English were included. Because ScienceDirect offers no language filter, non-English publications retrieved from that database were excluded manually. The search was limited to the last 11 years (2006–October 2017). The titles and abstracts were screened according to the following inclusion criteria:

  i. The paper deals with traffic crashes/accidents.

  ii. The crash data are self-reported, i.e., people provide information on at least the number of crashes and possibly further details, via face-to-face interviews, telephone interviews, questionnaires (paper or online) or other means.

A codebook was established to review the publications that met the above criteria thoroughly. A codebook helps the reviewer extract the important themes and findings of the studies and expedites the analysis stage. The codebook recorded several aspects of each publication: publication ID, full reference, link to the publication, year of publication, language, the reason for non-inclusion (where a paper was not included), the focus of the study (methodological, practical/applied or both), how the data were collected, sample size, recruitment of the respondents, age group of the respondents, road user type, the recall period and the interval over which the respondent was asked to self-report, follow-up frequency, response rate, whether the self-reported data were compared with crash data registered by other means, the country in which the study was conducted and what the self-reported crash data were used for. Figure 1 shows a conceptual model of the study design issues that shape the outcomes of a self-reporting study. For example, a data collection and recruitment method that ensures the anonymity of the respondents will influence the response rate, but it could also increase or decrease social desirability bias.
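The codebook can be thought of as one structured record per study. The sketch below is a minimal, assumed representation in Python; the field names, the allowed focus values and the encoding of each item (for instance, the recall period in months) are hypothetical choices, not the authors' actual codebook format.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class CodebookEntry:
    """One coded study. Field names are hypothetical labels for the codebook items."""
    publication_id: str
    reference: str
    year: int
    link: Optional[str] = None
    language: str = "English"
    non_inclusion_reason: Optional[str] = None  # set only when the paper was not included
    focus: str = "applied"                      # "methodological", "applied" or "both"
    data_collection: list[str] = field(default_factory=list)
    sample_size: Optional[int] = None
    recruitment: Optional[str] = None           # e.g. "voluntary" or "random"
    age_group: Optional[str] = None
    road_user_types: list[str] = field(default_factory=list)
    recall_period_months: Optional[float] = None
    follow_up: Optional[str] = None             # e.g. "monthly"; None if no follow-up
    response_rate: Optional[float] = None       # None when not reported or not computable
    compared_with: list[str] = field(default_factory=list)  # e.g. ["police", "hospital"]
    country: Optional[str] = None
    use_of_data: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject focus values outside the three categories used in the review.
        if self.focus not in {"methodological", "applied", "both"}:
            raise ValueError(f"unknown focus: {self.focus!r}")

    @property
    def included(self) -> bool:
        return self.non_inclusion_reason is None


# A made-up example entry, not taken from any reviewed study.
example = CodebookEntry(
    publication_id="P001",
    reference="Hypothetical et al. (2015)",
    year=2015,
    data_collection=["online questionnaire"],
    road_user_types=["cyclist"],
    recall_period_months=12,
)
print(example.included, asdict(example)["recall_period_months"])
```

Using `None` rather than zero for unknown values (such as an unreported response rate) keeps "not reported" distinct from "reported as zero" when the entries are later tallied.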

Fig. 1 The conceptual model used in the analysis of self-reporting studies

There were 1533 hits in the selected databases (ScienceDirect = 148; Scopus = 542; TRID = 843). All retrieved publications were exported to EndNote X7.7.1 for screening. After 255 duplicates were removed, 1278 publications remained to be screened thoroughly, and three non-English publications were then removed. Screening was performed in two stages to remove irrelevant publications: the first based on title and abstract, the second on the full text. At the end of the screening process, 127 publications were retained and included in the review. Three of these publications described more than one self-reporting study of traffic crashes, giving a total of 134 studies to be coded and discussed (see the table in the Appendix).
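The screening counts reported above can be checked with simple arithmetic. The sketch below reproduces the flow from the numbers in the text; the figure of 1275 publications remaining after the language exclusion and the 10 studies contributed by the three multi-study publications are derived here, not stated in the original.

```python
def screening_flow(hits, duplicates, non_english, included, multi_study_pubs, studies):
    """Derive intermediate counts of the literature screening from the reported figures."""
    total = sum(hits.values())
    after_duplicates = total - duplicates
    after_language = after_duplicates - non_english
    # Single-study publications contribute one study each; the rest of the
    # study count must come from the publications that report several studies.
    studies_from_multi = studies - (included - multi_study_pubs)
    return {
        "total_hits": total,
        "after_duplicates": after_duplicates,
        "after_language": after_language,
        "studies_from_multi_study_pubs": studies_from_multi,
    }


flow = screening_flow(
    hits={"ScienceDirect": 148, "Scopus": 542, "TRID": 843},
    duplicates=255,
    non_english=3,
    included=127,
    multi_study_pubs=3,
    studies=134,
)
for step, count in flow.items():
    print(f"{step}: {count}")
```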

4 Findings

The number of publications on self-reported traffic crashes has increased over the last 10 years, indicating that this area is relevant and useful in the transportation safety engineering field when it comes to assessing safety problems or crash causation factors.

4.1 Focus of the studies

The reviewed studies focused on methodological aspects, applied/practical aspects or both. Of the 134 studies reviewed, 88 (about two-thirds) had a mainly applied/practical focus, collecting crash data without emphasising the method used to obtain them. Forty-one studies addressed both the practical/applied issues and the methodological aspects of self-reporting traffic crashes. Five studies had a strong methodological focus, explaining the method in detail [20,21,22,23,24].

Various motivations were found to drive the studies on self-reporting of traffic crashes, such as safety evaluation [21, 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68], investigation of crash causation factors [22, 69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103], determination of the number of crashes for a specific group (e.g., novice drivers, elderly) [104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131], estimation of underreporting [15, 20, 132,133,134,135,136,137,138,139], calculation of crash costs [140] and other purposes (e.g., investigating the memory effect) [141,142,143,144,145]. Nevertheless, all studies shared the basic aim of understanding and assessing the traffic safety situation.

4.2 Studies by world regions

Self-reporting studies were mainly conducted in European, Australasian and North American countries (see Fig. 2); fewer were conducted in Asian, African, South American and Middle Eastern countries. Three publications compared self-reporting studies across multiple regions (seven countries) [40, 67, 104]. Eighty-two percent of the studies collected data from a limited area (town or region), while 16% covered a whole nation.

Fig. 2 Distribution of the studies by world regions (three of the studies were made in multiple regions)

4.3 Data collection

The reviewed studies used various data collection approaches (see Fig. 3). The choice of approach may be based on various criteria, for example, efficiency in recruiting potential respondents, the estimated response rate, time efficiency or the cost of conducting the study. Questionnaires, either online or paper-based, were the most frequently used method of data collection, while interviews seemed to be less popular. Twenty-nine of the reviewed studies conducted follow-up sessions (e.g., weekly, fortnightly, monthly, every 3 or 6 months, or annually) to obtain information on additional crashes that might have occurred since the preceding session [20, 23, 24, 38, 40, 41, 45, 47, 52, 55,56,57, 63, 67, 77, 83, 87, 111, 116, 121, 129, 133,134,135,136, 138, 144, 145]. One study was unique in using smartphone sensors to assist with data collection [108]. Not all reviewed studies stated the response rate (see Appendix).

Fig. 3 Method of data collection

Most of the studies recruited respondents either on a voluntary basis or randomly, in some cases from a specific group of road users, such as people with a certain illness (see, e.g., [115, 131]), young adults (see, e.g., [27]) or the elderly (see, e.g., [134]). Figure 4 shows the basis for recruiting respondents in the reviewed studies.

Fig. 4 Basis for selection of respondents (details in Appendix)

The type of road users targeted varied considerably across the reviewed studies (see Fig. 5). Almost half of the studies targeted car users. Only a few studies focused on VRUs, even though these road user groups have the highest number of casualties and are the most underreported [3, 6, 12, 146]. Bicycle safety studies seem to have become popular recently and are mostly found in Australasia, Europe and Canada. Some studies covered all types of road users or more than one type (see, e.g., [75, 88]).

Fig. 5 Type of road users involved in the traffic crashes

Sample sizes in the reviewed studies ranged from fewer than 100 to more than 10,000 respondents, depending on the objective of the study (see Fig. 6).

Fig. 6 Sample sizes of the reviewed studies

The reviewed studies also varied in the age group of the recruited respondents. Sixty-three percent of the studies focused on adults, i.e., those older than the legal age for obtaining a driver’s license, with no upper age limit, while the rest included respondents ranging from children to the elderly. Three studies involved respondents aged 16 years and older [54, 69, 107]. Only 16% of the reviewed studies focused on young adults aged 15–30 years (details in Appendix).

The recall periods used by the reviewed studies varied from less than a month to more than 5 years or the entire period since licensure. Approximately 60% of the recall periods were between 1 and 3 years, most commonly 1 year (50 studies). Only about 11% of the reviewed studies used a recall period of more than 5 years or the respondent’s lifetime.

One quarter of the studies compared self-reports with other sources of traffic crash records, such as hospital, police, insurance or company records, multiple records or other data sources. Such comparisons were possible only if the respondents had given their consent and the responsible authority (e.g., the police) had granted permission to access individual data in the crash database. Ethical approval was required when the research involved confidential data, especially from medical records.

5 Discussion

This review focused on the self-reporting of traffic crashes in a traffic safety engineering context. A well-constructed search strategy was essential to find all relevant publications.

In general, the studies reviewed in the current paper mainly focused on car crashes (49%) and involved adult road users (63%). Fewer studies targeted VRUs (24%), even though almost 50% of all deaths on the world’s roads occur among those with the least protection, such as pedestrians, cyclists and motorcyclists [7]. Young road users also received limited attention, although traffic crashes are the main cause of death among those aged 15–29 years [7].

The majority (82%) of the reviewed studies were conducted in Europe, North America and Australasia, and most (65%) had a practical and/or applied focus. Fewer studies (18%) were conducted in Asian, South American, African and Middle Eastern countries. Because official crash data are not always available to researchers and road authorities in developing countries (see, e.g., [11]), these countries would benefit from using self-reporting to conduct road safety studies. Self-reported data would allow the real safety situation to be assessed appropriately (determining crash causation factors, estimating underreporting and crash costs, and revealing other effects such as psychological distress after injury, as studied by Tran et al. [145]) and, consequently, traffic conditions to be improved. As WHO [7] reported, most traffic deaths (approximately 90%) occur in developing countries, where rapid economic growth and motorisation have been accompanied by traffic injuries, especially among VRUs (60–70%). Mock et al. [16] also recommended self-reporting as a suitable approach in low- and middle-income countries because crashes are underreported in the official records.

Most of the researchers appeared to be aware of the ‘social desirability’ bias that is sometimes present in self-reporting studies (as argued by [40]) and of incorrect memory recall due to the time elapsed between the crash and the moment the respondent was asked to recall it [16, 147], since most of the reviewed studies discussed these issues. Social desirability bias may occur when respondents want to show that they are good road users, which could affect the number of reported crashes. The likelihood of this bias could increase if the respondents’ personal information is requested. Social desirability bias could be reduced by using anonymous questionnaires if the reported crashes are not going to be linked to other data sources, in which case a personal identifier is not needed. However, the self-reported data then cannot be validated.

Another issue in self-reporting is the deterioration of memories, which may depend on several factors, such as the seriousness of the crash, the number of vehicles involved and how long ago the crash occurred. Memory deterioration may significantly affect the reliability of self-reported data. af Wåhlberg et al. [40] claimed that drivers do not report their involvement in crashes accurately (overreporting or underreporting), while Bajaj et al. [20] found strong agreement (90%) between the self-reported traffic crashes of cirrhosis patients and the official records. af Wåhlberg et al. [40] suggested that self-reports should be used in parallel with a lie scale to control for a possible lie effect; nevertheless, none of the 134 reviewed studies incorporated a lie scale. Incorrect recall of crash involvement can mainly be addressed with a shorter recall period or a regular reporting scheme. Long recall periods carry the risk of forgotten crash events, which may amount to approximately 30% each year [147]. Therefore, some of the reviewed studies used regular follow-up sessions, driving diaries or limited recall periods to reduce memory recall bias. Based on the findings, a recall period of at most 1 year seems optimal: it is short enough to reduce the risk of recall bias but still long enough to collect a satisfactory amount of data.
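To illustrate why long recall periods are problematic, the sketch below assumes that the roughly 30% annual loss cited above [147] applies at a constant, compounding rate and that crashes occur uniformly over the recall window. Both are simplifying assumptions made here, not a model taken from [147]. Under them, a crash that is a years old is still recalled with probability (1 − 0.3)^a, and the expected share recalled over a window of T years is the average of that curve over the window.

```python
import math


def fraction_recalled(age_years: float, annual_loss: float = 0.30) -> float:
    """Probability that a crash `age_years` old is still recalled (constant, compounding loss)."""
    if not 0.0 <= annual_loss < 1.0:
        raise ValueError("annual_loss must be in [0, 1)")
    return (1.0 - annual_loss) ** age_years


def expected_share_recalled(recall_period_years: float, annual_loss: float = 0.30) -> float:
    """Average recall over the window, assuming crashes occur uniformly within it."""
    if not 0.0 <= annual_loss < 1.0:
        raise ValueError("annual_loss must be in [0, 1)")
    if recall_period_years <= 0 or annual_loss == 0.0:
        return 1.0
    decay = -math.log(1.0 - annual_loss)  # equivalent continuous decay rate per year
    # Closed form of (1/T) * integral_0^T exp(-decay * a) da
    return (1.0 - math.exp(-decay * recall_period_years)) / (decay * recall_period_years)


for period in (0.5, 1, 3, 5):
    print(f"{period:>4} years: {expected_share_recalled(period):.2f} of crashes recalled")
```

Under these assumptions the expected share falls from about 84% for a 1-year window to below 50% for a 5-year window, which is consistent with the preference for recall periods of at most 1 year.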

Researchers evidently ‘trust’ self-reports, as indicated by the increasing number of publications using self-reporting as a data collection method, despite the reliability and validity issues and the possible reporting bias that come with self-reported data. Of the reviewed studies, 48 used self-reports to assess the safety situation, 39 to identify crash causation factors, 31 to determine the number of crashes for a specific group, 10 to estimate underreporting, five to address other traffic safety issues and one to estimate the costs of crashes. For example, a study by Finestone et al. [52] that evaluated safety among stroke survivors showed that self-reports are useful in complementing the official records, because some crashes are missing from the official records but appear in self-reports, and vice versa; a combination of both records could therefore give a more accurate picture of driving safety. Hassan and Abdel-Aty [91] used the results of self-reports to suggest crash risk-reduction measures and to promote safe driving among young drivers.

Questionnaires (paper or online) were the most frequent approach for collecting self-reported data. Some studies used both types of questionnaire to reach the targeted respondents, because not all respondents had Internet access. Interviews (telephone or face-to-face) seemed to be less popular (23%). Nevertheless, interviews could plausibly reduce the number of outlier responses, because the interviewer can ask or rephrase questions to ensure that the respondents understand them. Several studies used telephone interviews as a follow-up to obtain more information about the reported crashes [111, 134]. When young respondents (school children) completed paper-based or online questionnaires, a researcher was always present.

Smartphones seem to offer a promising approach for collecting both self-reported and sensor-recorded data. Recent smartphones are normally equipped with sensors that track movement and rotation. Isho et al. [108] used smartphone sensors to record trunk acceleration associated with fall risk among post-stroke elderly people with and without a history of falls. They found that smartphones can provide detailed movement patterns, which might serve as complementary data for better understanding the course of events in a crash. Efforts to develop such sensor-based apps are ongoing in the EU project InDeV (In-depth understanding of accident causation for vulnerable road users) [148]. However, more research is needed to investigate their stability, validity and reliability.

The quality of self-reports strongly depends on how the questions are asked in relation to the reason for asking them. The approach used for a self-reporting study is influenced by the expected number of respondents and the expected response rate. The number of recruited respondents depends on the objective(s) of the study (e.g., whether it focuses on a limited area, a whole nation or a specific group of road users). However, the response rate did not seem to be a robust indicator of best practice in data collection: not all the reviewed studies stated it, its value depended on how the questionnaires were distributed or the interviews conducted, and some studies did not account for the total number of invitations. The response rate was reported in some papers and could be calculated manually in others. Reported response rates ranged from 1% to 100%, too wide a range to support conclusions about the self-reports, and no response rate was available for 56 of the studies. Combining several methods could improve the quality of a study and produce a higher response rate (> 80%) (see, e.g., [65, 140]). Paper-based questionnaires are more costly than online questionnaires; however, online questionnaires require, among other things, a server to host the questionnaire and Internet access for the targeted respondents. Interviews seem to be a promising approach, but they are also costly (e.g., travel costs for face-to-face interviews or call costs for telephone interviews) and very time-consuming when a large number of respondents is involved.
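Where a paper reports both the number of completed responses and the number of invitations, the response rate can be calculated manually as their ratio. A minimal sketch follows, assuming the denominator is the total number of invitations; the study figures in the example are hypothetical, and the rate is deliberately left undefined (`None`) when the number of invitations is unknown.

```python
from typing import Optional


def response_rate(completed: int, invited: Optional[int]) -> Optional[float]:
    """Completed responses divided by invitations; None if invitations are unknown."""
    if invited is None or invited <= 0:
        return None
    if not 0 <= completed <= invited:
        raise ValueError("completed must be between 0 and the number invited")
    return completed / invited


def summarise(rates: list[Optional[float]]) -> dict:
    """Count studies with and without a usable response rate and report the range."""
    known = [r for r in rates if r is not None]
    return {
        "reported": len(known),
        "missing": len(rates) - len(known),
        "min": min(known) if known else None,
        "max": max(known) if known else None,
    }


# Hypothetical studies: (completed responses, invitations or None if not stated).
studies = [(450, 1500), (96, 120), (2300, None)]
rates = [response_rate(c, n) for c, n in studies]
print(summarise(rates))
```

Returning `None` instead of guessing a denominator keeps studies without a stated number of invitations visible as missing, rather than silently distorting the reported range.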

The reviewed studies often targeted respondents using specific criteria, either recruiting volunteers or selecting respondents randomly from a specific target group (e.g., school children, novice drivers or offending drivers). Respondents were invited and recruited at public service areas (e.g., train stations and gas stations) and shopping centres, or through advertising on social media, websites, e-mail and flyers, and by word of mouth. Participants of events (e.g., a bicycle event) were also targeted in studies covering a limited region (see, e.g., [50, 136]). To represent young drivers, most studies targeted high school students, university students and driving license learners (see, e.g., [28, 55, 79, 137]). Random recruitment of citizens was usually used to collect data representative of an entire nation, for example, by Goldenbeld et al. [22]. Some studies, for example, Gliklich et al. [95], divided the study area into several geographical units and limited the number of respondents in each unit by stratifying the sample. Epidemiological studies usually targeted hospital patients to obtain their crash history, which could be related to the development of their health status (see, e.g., [20, 52, 69]).
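Dividing the study area into geographical units and capping the number of respondents per unit amounts to stratified sampling. The sketch below uses proportional allocation with largest-remainder rounding and then draws a simple random sample within each stratum; the allocation rule, the area names and the population figures are assumptions for illustration and are not necessarily what Gliklich et al. [95] did.

```python
import math
import random


def proportional_allocation(populations: dict[str, int], total_sample: int) -> dict[str, int]:
    """Split total_sample across strata in proportion to population (largest-remainder rounding)."""
    total_pop = sum(populations.values())
    exact = {k: total_sample * p / total_pop for k, p in populations.items()}
    alloc = {k: math.floor(v) for k, v in exact.items()}
    shortfall = total_sample - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:shortfall]:
        alloc[k] += 1
    return alloc


def draw_stratified_sample(frames: dict[str, list], alloc: dict[str, int], seed: int = 1) -> dict[str, list]:
    """Simple random sample without replacement inside each stratum's sampling frame."""
    rng = random.Random(seed)
    return {k: rng.sample(frames[k], alloc[k]) for k in alloc}


# Hypothetical geographical units and populations.
areas = {"Area A": 50_000, "Area B": 30_000, "Area C": 20_000}
print(proportional_allocation(areas, 1000))
```

Proportional allocation keeps the sample representative of the population distribution across units; a study that wants precise estimates for small units might instead cap or equalise allocations per unit, which is an equally valid design choice.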

The current review is limited to studies retrieved from the selected databases, and self-reporting studies of traffic crashes published elsewhere may not be included. The database search was restricted to English-language literature published from 2006 until October 2017 and available online. Because of the language restriction, some research articles by researchers in Asian, South American, African and Middle Eastern countries may have been published locally and not indexed in the mainstream international databases, which would reduce the number of publications retrieved from these countries. Publications that focused on driver behaviour using the ‘Driver Behaviour Questionnaire’ were excluded, even if a question asked for the number of crashes the participant had been involved in.

6 Conclusions and recommendations

Self-reporting is a useful tool that can be used as a complementary method to obtain more information on crash events, but reliability and validity issues should always be taken into consideration. The following conclusions can be drawn from this review:

  • Studies of self-reported crashes are more common in European, North American and Australasian countries, but few have been conducted in developing countries.

  • Most of the reviewed studies were conducted on car users. Studies on VRUs (i.e., pedestrians, cyclists, motorcyclists, etc.) were relatively few.

  • A questionnaire (either paper or online) approach was more common than interviews (either face-to-face or telephone), but a combination of more than one approach could reach more potential respondents and produce a better response rate.

  • A recall period of 1 year was the most common in the reviewed studies, though it ranged from less than a month to more than 5 years.

Because official crash databases are far from complete and VRUs are overrepresented among traffic casualties, self-reporting studies of traffic crashes involving VRUs should be conducted to complement the official files.

More studies should be conducted to assess the safety of younger populations (< 30 years), because this group of road users is overrepresented in traffic crashes. Developing countries should increase their efforts in this area to assess their actual traffic safety situation efficiently.

Crashes recorded in self-reports could be linked to official databases to determine the degree of agreement and increase data validity. This requires a sufficient individual identifier to match crash events in both data sources [149], and consent must be obtained from the individual respondents before the data are linked. Researchers should also consider the possibility that respondents who know that their self-reported data will be linked to official crash records will recall only crashes that were reported to the official files, resulting in underreporting. In addition, because including an individual identifier could lead to social desirability bias, anonymity issues should be taken into account, as suggested by Lajunen and Özkan [17].
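Linking self-reports to official records on an individual identifier can be sketched as a one-to-one match on identifier and crash date. The example below is a minimal, greedy sketch assuming a hypothetical record layout of (person_id, crash_date) pairs and a tolerance of 7 days to absorb imprecise recall of dates; the agreement measure (matched events as a share of all distinct events found in either source) is one possible choice among several, not a standard taken from the reviewed studies.

```python
from datetime import date


def link_crashes(self_reported, official, tolerance_days=7):
    """Greedy one-to-one matching of crash events on person identifier and date.

    Both inputs are lists of (person_id, crash_date) tuples (a hypothetical layout).
    Returns (matched pairs, self-report-only events, official-only events).
    """
    unmatched_official = list(official)
    matched, self_only = [], []
    for person_id, crash_date in sorted(self_reported):
        candidates = [
            rec for rec in unmatched_official
            if rec[0] == person_id and abs((rec[1] - crash_date).days) <= tolerance_days
        ]
        if candidates:
            # Pair with the closest official record and remove it from the pool.
            best = min(candidates, key=lambda rec: abs((rec[1] - crash_date).days))
            unmatched_official.remove(best)
            matched.append(((person_id, crash_date), best))
        else:
            self_only.append((person_id, crash_date))
    return matched, self_only, unmatched_official


def agreement(matched, self_only, official_only):
    """Matched events as a share of all distinct events found in either source."""
    total = len(matched) + len(self_only) + len(official_only)
    return len(matched) / total if total else None


# Made-up records for illustration.
self_reports = [("R1", date(2016, 3, 2)), ("R2", date(2016, 7, 15)), ("R3", date(2017, 1, 9))]
police = [("R1", date(2016, 3, 4)), ("R3", date(2016, 5, 20)), ("R4", date(2016, 9, 1))]

m, s, o = link_crashes(self_reports, police)
print(len(m), len(s), len(o), agreement(m, s, o))
```

Keeping the self-report-only and official-only lists separate, rather than reporting a single agreement figure, shows in which direction each source is incomplete.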

It is important for researchers to be aware of the shortcomings of self-reporting (i.e., reporting biases such as social desirability and incorrect memory recall) and to take appropriate measures to mitigate them. More methodologically focused studies are needed to promote an in-depth understanding of self-reporting traffic crashes. Furthermore, research papers should explain their self-reporting method more explicitly by clearly stating how the data were collected, how the targeted respondents were approached, the total number of respondents, the response rate, the recall period and which categories of road users (i.e., age and type) were included.

Traffic safety research could benefit from the rapid growth in smartphone use and the devices’ sophisticated technology. Smartphone applications can assist data collection for in-depth crash analysis. Crash detection via smartphone apps, particularly for crashes involving VRUs, should be explored further.