Background

Violent deaths are a significant public health burden in the USA, with over 270,000 deaths attributed to fatal injury in 2020 (Centers for Disease Control and Prevention 2021a). Evidence-based violence prevention efforts have historically been hampered by a lack of high-quality, timely surveillance data on these deaths and their circumstances. Calls for a national fatal intentional injury system that tracked these deaths resulted in collaborative efforts to create such a monitoring system (Barber et al. 2013; Hemenway et al. 2009), which began as the National Violent Injury Statistics System (NVISS). The National Violent Death Reporting System (NVDRS, publicly available at https://www.cdc.gov/violenceprevention/datasources/nvdrs/dataaccess.html), implemented by the Centers for Disease Control and Prevention (CDC) in 2002, arose from this ongoing effort as a federally funded, active state-based reporting system that collects data on violent deaths, defined as “death that results from the intentional use of physical force or power, threatened or actual, against oneself, another person, or a group or community” (Centers for Disease Control and Prevention 2022b). These include suicide, homicide, legal intervention deaths, unintentional firearm deaths, and deaths with undetermined intent.

The NVDRS and state-specific Violent Death Reporting Systems (VDRS) collect and link primary investigative information from a number of existing sources, including death certificates, coroners and medical examiners (C/ME), toxicology records, and law enforcement (LE) reports, to create the most comprehensive, centralized surveillance reporting system of violent deaths. The NVDRS also incorporates secondary sources of information from crime labs, hospitals, court records, press releases, and Intimate Partner Violence (IPV) and Child Fatality Review (CFR) reports (Centers for Disease Control and Prevention 2022b). The scope and methodology of the NVDRS have been described in additional detail elsewhere (Centers for Disease Control and Prevention 2022b; Blair et al. 2016b; Steenkamp et al. 2006; Paulozzi 2004). As of 2018, the NVDRS expanded to all 50 US states, Puerto Rico, and the District of Columbia. This reporting system has substantial potential to inform policy and prevention practice, with examples of this already demonstrated in various states (Powell et al. 2006).

Beyond this publicly available data, the CDC manages a centralized Restricted Access Database of the NVDRS (RAD-NVDRS) which includes additional variables encompassing decedent and suspect demographic variables, incident circumstance variables, and toxicology variables. Notably, the RAD-NVDRS contains short text narratives (between 150 and 300 words) written by VDRS staff using C/ME and LE reports, suicide notes, and interviews with the decedents’ family/friends (Centers for Disease Control and Prevention 2022b). These narratives provide a rich source of qualitative data to supplement the NVDRS’s existing quantitative variables. In addition to validating coding decisions on coded variables, the narratives provide opportunities to identify emerging and novel risk factors salient to violent deaths beyond existing quantitative variables in the NVDRS. They can also be used to identify violent deaths that are often difficult to accurately count, such as accidental gun deaths (Barber and Hemenway 2011) and homicides by police (Barber et al. 2016). A growing number of studies have used the NVDRS to investigate epidemiologic trends, precipitating factors, and contextual factors of violent deaths as well as how these correlates vary by race/ethnicity, occupation, and physical and mental health (Mezuk et al. 2021).

Although the narratives serve as a valuable tool to inform research on violent deaths, they are subject to potential biases and challenges relating to data collection and abstraction. Many of these challenges are due to the fragmented nature of the US death investigation system, as acknowledged by the NVDRS itself. Each state implements its own medico-legal procedures (Ruiz et al. 2018; Huguet et al. 2012), which vary by the degree of centralization, credentials and training of death investigation personnel (i.e., medical examiners versus coroners), and levels of funding (Hanzlick 2003). This lack of unified investigation procedures may have important implications for documentation and classifications of violent deaths across states and jurisdictions (Rockett et al. 2018, 2014; Breiding and Wiersema 2006; Dailey et al. 2012).

Effective utility of text narratives entails a need to mitigate challenges in the collection and abstraction of the NVDRS while advocating for continuous improvements of this data source.

While many of the original source documents that inform the NVDRS were not designed for research, the NVDRS narratives have increasingly been used to study a range of violent deaths for prevention and intervention efforts within the last decade (Nazarov et al. 2019). As a foundation for future research, this review provides a comprehensive summary of peer-reviewed studies using NVDRS narratives over the past 20 years, highlights potential challenges of these narratives and how they are addressed in the current literature, and provides recommendations on utilizing and improving the information potential of the narratives, with an eye to the application of data science tools.

Methods

Search strategies

An informationist (L.N.J.) developed search strategies to identify relevant articles, conference abstracts, and government/agency reports that used NVDRS text narratives (or individual state VDRS narratives). From the time of inception of each database, PubMed, PsycInfo, Scopus, and Google Scholar (for gray literature) were searched on March 26, 2021; updated searches in each database were conducted on January 26, 2022. Each search utilized title and abstract tags for the following keywords and phrases: “National Violent Death Reporting System”, “Violent Death Reporting System”, NVDRS, VDRS, violent, violence, injury, suicide, homicide, “firearm accident”, “unintentional firearm”, “undetermined death”, accident, “intimate partner violence”, IPV, “domestic violence”, “child abuse”, “legal intervention”, “law enforcement”, narrative, “text narrative”, “mixed method”, circumstances, coding, and code. No indexing languages were used since the phrase "National Violent Death Reporting System" is not an indexed term in any of the databases. A set of sentinel articles were identified before the search process to generate search terms and test the effectiveness of the strategies in each database (Barber et al. 2016; Nazarov et al. 2019; Skopp et al. 2019; Ream 2020; Mezuk et al. 2003). The search was not limited by language, publication date, or any other restrictions. Complete search strategies are described in Additional File 1: Appendix A.

Criteria for study selection

Studies were eligible for full-text abstraction if they were peer-reviewed published articles or government/agency reports in the English language that used NVDRS text narratives or individual state VDRS narratives, with no restrictions on the types of study and types of violent death. Two articles that used the NVISS, the predecessor to the NVDRS, were also included. Theses, dissertations, conference presentations and posters, editorials, commentaries, or abstract-only publications were excluded for quality control (Taylor et al. 2014).

Study selection process

In the first stage, two authors (L.N.D., E.T.K.) independently screened the titles and abstracts of all studies generated from the database search for the following phrases: “National Violent Death Reporting System”, “Violent Death Reporting System”, “NVDRS”, and “VDRS”. Studies were included for further review when the title and abstract screening was inconclusive. Interrater agreement, assessed by comparing the two authors’ screening results for 25 randomly selected articles, was high: the authors agreed on 24 of the 25 articles (96%). Next, the same authors conducted a full-text screening of eligible articles selected from the title/abstract screening to determine whether the text narratives were used in the methods. Additional articles/reports were sought by screening the references of abstracted articles. Disagreements were resolved through discussions among all authors.
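The agreement figure reported above is simple percent agreement. As a minimal sketch, the snippet below shows how percent agreement and a chance-corrected statistic (Cohen's kappa) could be computed from two screeners' include/exclude decisions. The decision lists are hypothetical: the review reports only the overall total (24 of 25), not the per-article decisions, so the kappa value produced here is illustrative and is not a result of the review.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: product of each rater's marginal proportions, per label
    p_expected = sum(counts_a[label] * counts_b[label] for label in labels) / n**2
    if p_expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical screening decisions for 25 articles with one disagreement
rater_1 = ["include"] * 10 + ["exclude"] * 15
rater_2 = ["include"] * 10 + ["exclude"] * 14 + ["include"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa (illustrative): {kappa:.2f}")
```

Because kappa depends on how often each screener chose each label, the same 24-of-25 agreement can give different kappa values under different include/exclude splits.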

Data abstraction

The following information was extracted from each article: name of first author, year of publication, type of data (NVDRS, state-specific VDRS, or NVISS), type of death, research question(s), study population(s), study sample size, number of narratives used, type of narratives, selection criteria for narratives, statistical approaches (e.g., purpose for analyzing narratives, methods to analyze narratives, linkage with external data sources), assessment of narrative quality (e.g., efforts to address missing narratives, validation of data abstracted from the narratives), challenges and recommendations pertaining to the narratives and NVDRS as noted by the authors. A description of each extraction variable is provided in Additional File 2: Table S1. Analyses for this study were pre-registered via the Open Science Framework (OSF) in July of 2022 (Johns et al. 2022).

Summarizing

Frequencies of abstracted articles were described by type of data, type of narratives (C/ME, LE, or both), type of deaths (suicide, homicide, homicide followed by suicide, legal intervention, unintentional firearm, undetermined intent, and multiple types of death), study population (summarized by age groups, gender, professions, health conditions, and vulnerable/minority subgroups), purpose for analyzing narratives, and approaches to assess data completeness and reliability (missing narratives, linkage to external data sources, validation of information abstracted from narratives). In addition, a cumulative flow diagram of studies using the text narratives by methodological tools was generated for the period from 2004 to 2022. Finally, major challenges frequently encountered by researchers, both relating to the narratives and the NVDRS system in general, were summarized.

Assessment of study quality

The relative quality of studies in terms of sample size, study population, and methodological approaches for analyzing text narratives was evaluated as part of the article abstraction process. However, because we did not seek to derive an overall effect size of a particular exposure-outcome relationship, metrics for assessment of study quality and risk of bias (e.g., Cochrane, Newcastle–Ottawa Scale, etc.) were not relevant for this scoping review (Khalil et al. 2016; Peters et al. 2015).

Results

Search results

Figure 1 is a Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) flow diagram of the study selection process (Page et al. 2020). The initial database search yielded 1,820 studies, and an additional 410 were identified from an updated search (347 in PubMed, 191 in PsycInfo, 337 in Scopus, and 1,355 in Google Scholar). After removing duplicates, 1,482 remained for further review. The title/abstract screening identified 428 studies eligible for full-text screening, after excluding studies that were not in English (n = 22), were not peer-reviewed published articles or government/agency reports (n = 475), or did not use the NVDRS or a state VDRS as indicated in the titles and abstracts (n = 557). Of the 428 studies, the full-text screening identified 111 eligible for abstraction. No government/agency reports used text narratives, so none were included. Two Epi-Aid reports that used the NVDRS in conjunction with other publicly available data sources as part of the investigations of suicidal behaviors among youth in Utah and Santa Clara County, California, were excluded (Garcia-Williams et al. 2016; Annor et al. 2017). Finally, the reference screening did not identify any additional studies for inclusion in the full-text abstraction. In summary, a total of 111 studies were included for full-text abstraction. Additional File 3: Table S2 provides descriptions of these studies.
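The screening counts reported above can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the two derived quantities (duplicates removed and records excluded at full-text screening) are not reported directly and are computed here by subtraction, so treat them as inferred.

```python
# Counts as reported in the text
records_by_database = {
    "PubMed": 347,
    "PsycInfo": 191,
    "Scopus": 337,
    "Google Scholar": 1355,
}
initial_search = 1820
updated_search = 410
after_deduplication = 1482
excluded_at_title_abstract = {
    "not in English": 22,
    "not a peer-reviewed article or government/agency report": 475,
    "NVDRS or state VDRS not indicated in title/abstract": 557,
}
full_text_screened = 428
included = 111

# The per-database counts should add up to the two search rounds combined
total_identified = sum(records_by_database.values())
assert total_identified == initial_search + updated_search

# Title/abstract exclusions should account for the drop to full-text screening
remaining = after_deduplication - sum(excluded_at_title_abstract.values())
assert remaining == full_text_screened

# Derived (not reported in the text): inferred by subtraction
duplicates_removed = total_identified - after_deduplication
excluded_at_full_text = full_text_screened - included

print(f"Identified: {total_identified}, duplicates removed: {duplicates_removed}")
print(f"Excluded at full text: {excluded_at_full_text}, included: {included}")
```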

Fig. 1

PRISMA Flowchart on Study Identification, Screening, and Inclusion. Source: Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71. https://doi.org/10.1136/bmj.n71

Characteristics of abstracted studies

As shown in Table 1, more than three quarters of studies (n = 91, 82%) used the NVDRS as opposed to state-specific VDRS, and most studies used both C/ME and LE reports (n = 106, 95%). Of 111 studies using text narratives, almost half (n = 48) studied suicide only; one fifth (n = 25) studied homicide (including single, multiple, and mass homicide); and the remainder studied homicide followed by suicide (n = 8), legal intervention deaths (n = 6), unintentional firearm deaths (n = 4), undetermined intent deaths (n = 1), and multiple types of deaths (n = 16). Many studies were conducted within a particular subpopulation, defined by age groups (19 studies on infants/children, 2 studies on middle-aged adults, and 8 studies on adults aged 50+); sex/gender or orientation (6 studies on women, 5 studies on men, and 4 studies on LGBTQ+); professions (4 studies on active duty or veterans, 5 studies on healthcare professionals [e.g., nurses, physicians, psychologists], and 3 studies on farmers); health conditions (1 study on cancer, 5 studies on mental/brain disorders, and 1 study on chronic pain); and vulnerable groups (3 studies on pregnant/postpartum women, 1 study on Non-Hispanic Asians/Pacific Islanders, and 6 studies on currently/formerly incarcerated individuals).

Table 1 Descriptive characteristics of all included studies and their citations (Peters et al. 2015)

Assessment of data completeness and reliability

Only a few studies reported missing narrative data (n = 17), and the majority failed to specify whether missing narrative data were of significant concern to the research question(s), how and/or why a particular narrative was missing, as well as how missingness was handled. Almost half of studies (n = 48) assessed the degree to which similar information agreed between the quantitative coded variables and qualitative text narratives. One-third (n = 36) used or linked to external data sources beyond the NVDRS or state VDRS, for example, the US Census data (for mortality data), medical records (for additional health characteristics), and media reports (for additional case identification). Out of 36 studies that linked to external data sources, the majority (n = 30, 83%) did not assess the degree to which similar information agreed between the narratives and/or NVDRS variables with the external sources (Table 1).

Purpose for analyzing narratives

Narratives were used in two distinct ways. Studies analyzed the contents of narratives to characterize salient risk factors or circumstances around deaths (n = 38, 34%), to supplement existing quantitative variables for case identification (n = 49, 44%), or both (n = 23, 21%) (Table 1). For example, Adhia et al. (2020) manually reviewed text narratives to characterize murder-suicides perpetrated by adolescents. Arseniev-Koehler et al. (2021) employed a topic modeling approach to investigate racial and ethnic differences in the narrative descriptions of threat and dangerousness (e.g., physical aggression) associated with legal intervention deaths among men.

Methodological tools for analyzing narratives

A wide range of approaches was used for analyzing the narratives. As shown in Table 1, narratives were primarily analyzed through manual review (n = 81, 73%), keyword searches (n = 9, 8%), or a combination of approaches (n = 13, 12%). Only a few studies employed data science methods, including natural language processing (n = 3) and topic modeling (n = 3) (Adhia et al. 2020; Arseniev-Koehler et al. 2021).
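To make the keyword-search approach concrete, the following is a minimal sketch of a keyword screen that flags narratives for subsequent manual review. Everything in it is an assumption for illustration: the keyword stems, the incident identifiers, and the three short narratives are invented and are not drawn from the NVDRS or from any study in this review.

```python
import re

# Hypothetical keyword stems for an illustrative circumstance (housing loss)
KEYWORD_PATTERNS = [
    r"evict",
    r"foreclos",
    r"lost (?:his|her|their) (?:home|house)",
]
PATTERN = re.compile(r"\b(?:" + "|".join(KEYWORD_PATTERNS) + r")", re.IGNORECASE)


def flag_for_review(narratives):
    """Return {incident_id: [matched terms]} for narratives with any keyword hit.

    A hit only marks a narrative as a candidate; a human reviewer would still
    read the text to confirm that the circumstance applies.
    """
    flagged = {}
    for incident_id, text in narratives.items():
        hits = [match.group(0).lower() for match in PATTERN.finditer(text)]
        if hits:
            flagged[incident_id] = hits
    return flagged


# Invented narratives for illustration only (not real NVDRS records)
narratives = {
    "A001": "V was facing eviction and had recently lost his job.",
    "A002": "V had a history of depression. No note was found.",
    "A003": "The bank had begun foreclosure; V lost her home last month.",
}

flagged = flag_for_review(narratives)
for incident_id, terms in flagged.items():
    print(incident_id, terms)
```

Matching on stems rather than whole words catches variants such as "evicted" and "eviction", at the cost of more false positives, which is one reason keyword searches are typically paired with manual review.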

Figure 2 shows the cumulative flow diagram of studies using the text narratives by methodological tools in the period between 2002 and 2022, as the NVDRS began collecting data in 2002 (National Center for Injury Prevention and Control, Division of Violence Prevention 2021). Studies that used text narratives were first published four years after the creation of the NVDRS; the number of these studies increased over time, with the overwhelming majority published after 2014 (n = 94, 85%). Notably, the methodological tools used for analyzing the narratives shifted and diversified over time: in recent years, a growing number of studies employed keyword search, natural language processing, and topic modeling in addition to manual review. While manual review was the predominant, and often the only, method in studies prior to 2015, more studies have used keyword search since 2015 and data science methods (e.g., natural language processing and topic modeling) since 2019.

Fig. 2

Cumulative flow diagram of studies using text narratives by methodological tools between 2002 and 2022

Data challenges encountered by researchers

Table 2 summarizes two major challenges frequently encountered by the researchers. The first challenge relates to a lack of or limited information on contextual factors relevant to deaths or populations being investigated. For example, several studies found that demographic and circumstantial details in the narratives were insufficient for case identification and characterization of death incidents (Scheyett et al. 2013; Frazier et al. 2017; Briker et al. 2019; Fraga Rizo et al. 2021). Sensitive topics such as child maltreatment, intimate partner homicides, and legal intervention deaths, while routinely collected by the NVDRS, are limited to the information provided by the source documents and interpretations of the abstractors (Lord 2014; Brown and Seals 2019; Hunter et al. 2022). The second challenge relates to information variation within the NVDRS system, such as discrepancies between different data sources (e.g., C/ME and LE reports) and variations in reporting, coding, abstraction, completeness, and contents of text narratives and NVDRS across states.

Table 2 Major challenges encountered by researchers relating to the text narratives and NVDRS system in general

Quality of included studies

All studies included in this scoping review were peer-reviewed, which serves as a crude metric of research quality. The sample sizes of included studies (range: 46 to 233,108 incidents) were appropriate for the research questions, which were largely descriptive and representative of the decedents in the population of interest. Most studies limited their sample to cases from continuously reporting NVDRS states to ensure the reliability of narrative data. Whether they used traditional qualitative techniques or data science tools, studies employed rigorous methodological approaches for analyzing narratives. For example, many studies (e.g., Holland et al. 2017; Kohlbeck et al. 2020; Schwab-Reese et al. 2021; Mennicke et al. 2021) developed comprehensive coding guidelines for characterizing salient circumstances of violent deaths via open-coding procedures and comparative methods. Other studies (e.g., Tian et al. 2016; Petrosky et al. 2018; O’Donnell et al. 2019; Miller et al. 2021) improved case identification by employing keyword searches followed by manual review of the narratives.

Discussion

This review provides a comprehensive assessment of the research utility of the NVDRS text narratives as a valuable qualitative tool for understanding violence at the population scale. Results showed a substantial increase in the number of studies using the narrative data in recent years, particularly concerning correlates of suicide and homicide, consistent with prior reviews of the NVDRS (Nazarov et al. 2019). Leveraging text narratives in studying suicide deaths presents a unique opportunity for identifying novel risk factors and advancing the historically stagnant nature of suicide research (Franklin et al. 2017). This review also highlights that taking full advantage of NVDRS narratives will require novel methodological tools, including those captured under the umbrella of “data science”, to extract insights from these narratives in an effective and meaningful way. These tools, in turn, will be enhanced by integrating multiple data sources to understand both protective and risk factors and to go beyond the purely descriptive nature of many of the studies included here.

This review identified several data challenges that researchers have frequently encountered, many of which align with previously identified limitations of the RAD-NVDRS (Kaplan et al. 2017). First, relevant contextual factors are often lacking or insufficient in the narratives. The NVDRS, and its narrative data, depend on the completeness and accuracy of the original C/ME and LE sources, both of which depend on the nature of violent deaths, death investigation procedures, the qualifications and experience of the data abstractors, and the relationships between various local and state level stakeholders. For example, toxicological reports and sensitive information, such as circumstances around child maltreatment, intimate partner homicides, and legal intervention deaths, are often missing. Further, detailed contextual information around relationship status (Abolarin et al. 2019; Smith et al. 2014), the presence of cyber abuse and bullying (Brown and Seals 2019), and diagnosed mental health and substance use (Mezuk et al. 2015; Logan et al. 2008) was identified as lacking or insufficient.

Additionally, many studies reported the difficulties of capturing relevant circumstantial information due to ongoing investigations, deaths occurring in states different from state of residence, and deaths involving law enforcement suspects. Therefore, any efforts to draw inferences from the narratives require a careful consideration of sources of missingness, both in abstractor-coded variables and text narratives, particularly in studying legal intervention deaths given officers are both the inflictors and key witnesses. Such a dynamic can have implications for the accuracy and presence of important circumstances in the narrative data. This further illustrates how the research question may affect both the awareness and nature of the challenges associated with using narratives.

Second, the review highlighted the challenges relating to variability of the narratives in terms of length, completeness, and availability. As narratives are collected from secondary sources such as suicide notes and interviews with family/friends of the decedents, their contents vary depending on the information reported by the informants, circumstance details deemed relevant by the coroner/medical examiner and law enforcement, as well as the interpretations of the abstractors. These narrative variations may also stem from human errors during the coding and abstraction process (Dailey et al. 2012). Information bias can arise when the presence or quality of narratives varies systematically as a function of decedent characteristics (Mezuk et al. 2021), which has broad implications for the ability to draw unbiased inferences from this data source. These challenges with death certificate data have been previously documented (Data and Surveillance Task 2014).

Third, there are information inconsistencies between various data sources, including conflicting information between C/ME and LE narratives and between the abstractor-coded variables and the narrative texts. These inconsistencies arise because the NVDRS data, while designed as a research repository, are derived from source documents collected for non-research purposes. Absent or underdeveloped data-sharing arrangements between different partners (e.g., vital records, C/ME offices, law enforcement) can result in inconsistencies within the NVDRS. While the CDC provides detailed Users’ Manuals for the NVDRS (Centers for Disease Control and Prevention 2020, 2021, 2022b), there is a general lack of concrete guidance on how to reconcile incongruencies and integrate text narratives with the abstractor-coded variables. This review found that researchers who utilize the narratives as a means of case finding or case confirming often privilege the content within the qualitative data in classifying or categorizing cases and incident circumstances when coded variables are found to be insufficient (Davidson et al. 2021a; Lohman et al. 2021; Yau and Paschall 2018; Wertz et al. 2020). However, few studies reported information on missingness or incompleteness of these texts, much less how such data issues were addressed in the analysis.
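One way to make such discrepancies visible and reportable is to cross-tabulate an abstractor-coded variable against the same circumstance as derived from the narrative. The sketch below assumes each case has already been given a yes/no value from both sources; the field names (`coded`, `narrative`) and the four records are hypothetical. It does not resolve which source to privilege, which is the open question noted above; it only counts and lists the discordant cases.

```python
from collections import Counter


def cross_tabulate(records):
    """Count cases for each (coded variable, narrative-derived) combination."""
    return Counter((record["coded"], record["narrative"]) for record in records)


def discordant_ids(records):
    """Identifiers of cases where the two sources disagree."""
    return [r["id"] for r in records if r["coded"] != r["narrative"]]


# Hypothetical cases: True means the circumstance is endorsed by that source
records = [
    {"id": "A001", "coded": True, "narrative": True},
    {"id": "A002", "coded": False, "narrative": True},
    {"id": "A003", "coded": False, "narrative": False},
    {"id": "A004", "coded": True, "narrative": False},
]

table = cross_tabulate(records)
to_review = discordant_ids(records)

print("Coded yes / narrative yes:", table[(True, True)])
print("Coded no  / narrative yes:", table[(False, True)])
print("Coded yes / narrative no: ", table[(True, False)])
print("Coded no  / narrative no: ", table[(False, False)])
print("Discordant cases for review:", to_review)
```

Reporting the off-diagonal counts alongside the main results would let readers judge how much a study's findings depend on the choice of source.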

Lastly, although the NVDRS has expanded to all 50 US states, Puerto Rico, and the District of Columbia, states joined this reporting system at different points in time. Early participating states (e.g., Virginia, New Jersey) have more established death investigation infrastructures and, therefore, more consistent data in comparison with newer states (e.g., California) (National Center for Injury Prevention and Control, Division of Violence Prevention 2021). This can have an impact on the information potential of the narratives. Furthermore, not all states participate in optional modules such as the IPV and CFR modules (Centers for Disease Control and Prevention 2022b). These data barriers may result in small analytical samples, as studies often limited their analyses to states that have consistently reported data.

Informed by the findings from this review, Table 3 summarizes recommendations for improving the utility of text narratives, both for end-users (i.e., researchers) and for NVDRS administrators. Our findings suggest several opportunities for researchers to leverage existing, advanced, and flexible data science methods to explore and analyze large amounts of unstructured textual data in a meaningful and efficient manner. In contrast to traditional textual analysis methods (e.g., manual review, keyword searches), which are often time-consuming and labor-intensive, natural language processing and topic modeling can be immensely useful in combing through large amounts of textual data, detecting patterns in circumstances, and building algorithms as an alternative to manual review, as illustrated by some of the studies included in this review (Mezuk et al. 2003; Lohman et al. 2021; Arseniev-Koehler et al. 2020). However, these data science methods can be computationally intensive, require specialized and technical knowledge, and often rely on the amount of data included in narratives, which, in turn, relies on the consistent and detailed abstraction of circumstances around violent deaths.

Table 3 Recommended approaches to address challenges in utility of text narratives for research and practice

The NVDRS can be linked to external datasets using temporal (e.g., year) and geographic (e.g., state) identifiers to characterize additional circumstances or contexts (e.g., health circumstances, rurality/urbanicity, etc.), to create comparison groups for making inferences about potential risk and protective factors, and to achieve more complete case ascertainment using other sources of violent death reporting. Examples of publicly available data sources beyond the NVDRS include the Census (Petrosky et al. 2018; Yau and Paschall 2018; Graham et al. 2022), other mortality registries and vital records (Barber et al. 2016; Austin et al. 2016), media reports (DeBois et al. 2020; Robiner and Li 2022), and population-based surveys (Hemenway and Solnick 2015). However, data linkage can be difficult given the requirement of identifiers with which to link, the dynamic nature of some data including electronic medical records (EMRs), concerns over privacy, and the necessity of “comparable” groups when using non-deceased controls.
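As a minimal sketch of linkage on temporal and geographic identifiers, the snippet below attaches area-level context to death records by matching on a (state, year) key and keeps track of records that fail to link. All names and values are hypothetical: the states are placeholders ("S1", "S2"), `pct_rural` is an invented contextual variable, and the records are not drawn from the NVDRS or the Census.

```python
def link_by_state_year(deaths, context):
    """Attach area-level context to each death record via a (state, year) key.

    Returns the linked records and the identifiers of records with no match,
    so that linkage failures can be reported rather than silently dropped.
    """
    linked, unmatched = [], []
    for death in deaths:
        key = (death["state"], death["year"])
        if key in context:
            linked.append({**death, **context[key]})
        else:
            unmatched.append(death["incident_id"])
    return linked, unmatched


# Hypothetical death records (placeholder states, invented identifiers)
deaths = [
    {"incident_id": "A001", "state": "S1", "year": 2015},
    {"incident_id": "A002", "state": "S1", "year": 2016},
    {"incident_id": "A003", "state": "S2", "year": 2015},
]

# Hypothetical external dataset keyed by (state, year); values are invented
context = {
    ("S1", 2015): {"pct_rural": 20.0},
    ("S1", 2016): {"pct_rural": 19.5},
}

linked, unmatched = link_by_state_year(deaths, context)
print(f"Linked {len(linked)} of {len(deaths)} records; unmatched: {unmatched}")
```

Linking at the state-year level avoids the need for personal identifiers, which eases the privacy concerns noted above, but it can only add contextual (area-level) information rather than individual-level detail.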

Lastly, given that a large share of studies utilized the text narratives as a means of supplementing information provided in the coded variables, incongruencies between the narrative and coded variables (or potentially between C/ME versus LE narratives themselves) are an important challenge faced by researchers. To our knowledge, there is no existing guidance on how to integrate these two data sources, how to reconcile discrepancies, or when to privilege one over the other. As such, greater transparency and clearer documentation from NVDRS administrators to the research community are needed. A few studies have focused on recommendations for the improvement of the NVDRS, including the standardization of the investigation system and data collection procedures (Kaplan et al. 2017; Friday 2006), although such standardization efforts are challenging due to systemic barriers in infrastructure, limited resources, and funding.

Strengths and limitations

To the best of our knowledge, this review is the first comprehensive evaluation of the utility of the NVDRS narratives as a valuable qualitative source in studying violent deaths, with a focus on the analytical tools and data challenges in analyzing narrative texts. The restriction to peer-reviewed studies, the relatively large size and representative nature of the sample of eligible studies, well-defined study populations, and various rigorous methodological approaches of the studies reviewed indicate that studies using these narratives are of sufficient quality to draw reliable inferences. A broad range of study populations, exposure-outcome relationships, and research questions were examined, which collectively can inform future research using this data system. This review additionally recommended actionable approaches to enhance the research usefulness of the narratives and NVDRS data. Despite the comprehensive nature of this review, there are several limitations. First, a defined set of major databases was used to capture the scholarly and academic literature, to the exclusion of others (Web of Science, OVID, Embase). Second, studies included in this review were limited to peer-reviewed sources and do not include dissertations, posters, abstracts, letters to the editor, and conference proceedings. As a result, findings are subject to publication bias, which can have implications for the resulting conclusions.

Conclusion

By producing actionable insights and recommendations, this review endeavors to improve and maximize the use of text narratives and NVDRS data in research. Increasing use of advanced data science methods, leveraging linkages to external datasets, and increasing awareness of and addressing issues of narrative completeness and quality are important considerations. By providing guidance on the use of narrative texts, this review furthers the goal of the NVDRS to assess and understand the scope of violent deaths to inform prevention efforts more completely.