Background

Self-harm, defined as completed suicide, attempted suicide, suicidal ideation, or non-suicidal self-injury (NSSI), is a significant health problem in children and adolescents worldwide [1, 2]. The third leading cause of death in 10–19 year-olds in England is suicide [3] and suicide rates in 15–19-year-old British and Welsh girls increased by 13.2% from 2010 to 2017 and in boys by 5.9% [4]. Completed suicide was the second leading cause of death in the US in 2019 for 10–19 year-olds (https://webappa.cdc.gov/sasweb/ncipc/leadcause.html). Studies from several countries also show that completed suicides, suicide attempts, or suicidal ideation in children younger than 12 are no longer unusual [5,6,7,8,9].

Completed suicide is associated with previous suicide attempts, suicidal ideation [10, 11], and NSSI [12, 13]. NSSI is particularly important when discussing children and adolescents, as it has been reported to occur in up to 35% of the adolescent population worldwide [14, 15] and in nearly 8% of the general population of 7–8 year-olds [16]. Moreover, in a large clinical sample of 3–6 year-olds with major depressive disorder (MDD), NSSI, suicidal ideation, and suicide attempts were present in 21.3%, 19.1%, and 3.5% of the children, respectively [17]. Recent data demonstrate that the younger the age of onset, specifically if younger than 13 years of age, the more severe and protracted the course of NSSI and suicidality [18].

Beyond the risk of completed suicide, children and adolescents with self-harm have poorer adult outcomes, with lower educational and occupational attainment and more mental and physical health problems [19, 20]. Ideally, successful treatment would not only reduce the risk of completed suicide or other forms of self-harm, but also facilitate healthy development.

Unfortunately, we do not yet have an evidence-based, validated set of clinical treatment guidelines for treating pediatric self-harm [21]. Despite a growing number of encouraging treatment studies in recent years, systematic reviews and meta-analyses conclude that replication of positive effects from dialectical behaviour therapy (DBT) is needed and that more clinical trial research is still required for mentalization and family-based therapies to determine efficacy [22,23,24]. Moreover, gaps in this corpus of work include an absence of studies about treatments for self-harm in children younger than 12 years and a paucity of trials for particularly vulnerable sub-groups such as those in care [25] and non-cis-gender children or adolescents [24]. No randomized controlled trials (RCTs) of pharmacological interventions for pediatric self-harm have been published and the results from non-RCT studies using adaptations of adult medication treatments have been disappointing [26,27,28].

Developing better treatments likely requires more detailed characterization of this complex population. Clinical biomarkers have the potential to advance such characterization. The first stage in biomarker development is the identification of valid and reliable peripheral, neural, or genetic correlates of clinical symptoms, treatment response, or long-term outcomes [29].

Three recent reviews have summarized the literature on genetic correlates of self-harm, including in children or adolescents [30,31,32]. Therefore, we focused our study on research covering peripheral and neural correlates. Eight reviews summarized research that included children and adolescents from birth to 19 years of age, six of which were done before 2016 [33,34,35,36,37,38,39,40]. A total of 31 studies (15 of peripheral correlates and 16 of neural correlates) were presented, but all these reviews except for one [34], combined findings on children and adolescents with the data from adults. An additional review published after we had submitted our manuscript also combined pediatric and adult data [41]. Of the nine publications, only one was a systematic review [34], although most of them took a systematic approach to finding studies.

Not differentiating child and adolescent data from those obtained from young adults in their twenties may be misleading in developing biomarkers for these younger age groups. Neuroimaging research shows that brain development continues into the late twenties, but the risk factors and clinical profiles of self-harming children and adolescents differ from those reported in young adults in their twenties [18, 42, 43]. For example, compared to adults who attempted suicide, adolescents had a significantly higher number of previous attempts, were more likely to be responding to interpersonal problems, and were more likely to use medication for self-poisoning [44]. Several reviews indicate that imaging findings may be different in self-harming adolescents than those reported for adults, e.g., in decision-making and impulsivity [38]. Moreover, adolescent hopelessness, loneliness, or impulsivity are less strongly related to self-harm than they are in adults [35]. Adolescents’ social risk factors are also different: parent-child conflict, school stressors, vicissitudes of early romantic relationships, victimization from bullying, and internet addiction and stress [37, 45]. These factors may affect associations between self-harm and biological correlates differently in children and adolescents than in young adults.

The discovery of clinical peripheral and neural biomarkers requires evidence of reliable and valid biological correlates studied in samples from the targeted population, which in this case is children and adolescents younger than 19 years of age. Identifying potential diagnostic, treatment, or prognostic biomarkers often begins with a quantitative synthesis of such a body of research. To our knowledge, such a synthesis does not exist for peripheral and neural correlates of all types of self-harm in children and adolescents. Thus, in preparation for a systematic review and meta-analysis, we conducted a scoping review of this corpus of work.

Methods

Scoping reviews are designed to 1) identify and characterize studies in a body of research; 2) summarize how the research is conducted; 3) identify factors that can affect findings; 4) delineate research gaps; and 5) present implications for researchers (instead of for clinicians, as with systematic reviews) [46]. We used the Joanna Briggs Institute (JBI) structure for scoping reviews [47] and our review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses-Scoping Reviews (PRISMA-ScR) guidelines, as summarized in the PRISMA-ScR checklist [48] (Supplement 1).

Search strategy and information sources

Our search strategy used terms mapping onto constructs of age, e.g., ‘child’, ‘youth’, ‘adolescent’; self-harm, e.g., ‘suicide’, ‘self-injury’, ‘suicidality’, ‘non-suicidal self-injury’; and the broad search term of ‘biological correlates’, in addition to specific correlates drawn from the categories of correlates used in previous reviews, e.g., ‘nutrition’ and ‘neurotransmitters’. Medical Subject Headings (MeSH terms) and keywords were incorporated into the search strategy. The searches were conducted in the PubMed and EMBASE databases from 1980 onwards. Searching was initiated on June 17, 2018, updated on September 13, 2019, and again on May 6, 2020. No studies published after May 6, 2020, were reviewed. Reference lists from studies obtained or reviews were also examined for missed studies. Any studies discovered this way were processed in the same manner as those found with the searches. Gray literature was not searched. Details of the search strategy are in Supplement 2.
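The way the three constructs combine can be sketched as a Boolean query in which terms within a construct are joined with OR and the constructs are joined with AND. This is only an illustrative sketch using the example terms quoted above; the helper names are hypothetical, and the actual search strings, MeSH terms, and database-specific syntax are those given in Supplement 2.

```python
def or_group(terms):
    """Join the terms of one construct with OR, quoting multi-word or hyphenated terms."""
    quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*constructs):
    """Join constructs with AND so a record must match at least one term from each."""
    return " AND ".join(or_group(terms) for terms in constructs)


# Example terms taken from the text; the full term lists are NOT reproduced here.
age_terms = ["child", "youth", "adolescent"]
self_harm_terms = ["suicide", "self-injury", "suicidality", "non-suicidal self-injury"]
correlate_terms = ["biological correlates", "nutrition", "neurotransmitters"]

query = build_query(age_terms, self_harm_terms, correlate_terms)
print(query)
```

Grouping by construct keeps the query broad within each concept while requiring that every retrieved record touches age, self-harm, and a biological correlate.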

Eligibility criteria

Eligibility criteria were established a priori. A study was included if the following were investigated: 1) suicidality, defined as completed suicide, attempted suicide, suicide plans, or suicidal ideation; 2) NSSI, defined as self-harm of any type, e.g., cutting or burning, without intention to kill oneself; 3) any type of self-harm, i.e., not designated as strictly suicidality or NSSI; 4) participants aged from birth to 19 years; and 5) peripheral or neural biological correlates, i.e., objective, biological peripheral or neural data collected with non-invasive methods. Studies with participants older than 19 years were included if data were reported separately on those within our age range. Only peer-reviewed studies written in English were included. There were no restrictions on study location, nor were there restrictions on how study subjects’ self-harm was defined. Every study was included if it focused on self-harm, whether participants were classified by diagnostic criteria from an edition of the Diagnostic and Statistical Manual (DSM), a version of the International Classification of Diseases (ICD), questionnaires, scales, interviews, or clinical records.

A study was excluded if it: 1) examined genetic correlates; 2) only investigated self-harm in patients as a function of severe intellectual or developmental disability syndromes, e.g., Lesch-Nyhan syndrome or tuberous sclerosis complex; or 3) was a conference abstract, review, or case report.

Screening process and data extraction

Covidence was used to assist in the processing of records [49]. Duplicates were removed from the abstracts, followed by a three-step process of blinded assessments by pairs of co-authors. Titles and abstracts were screened against inclusion and exclusion criteria, and those still eligible were then subjected to full-text screening. The remaining eligible studies were then subjected to full-text information extraction, using a spreadsheet adapted from the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) criteria [50]. Disagreements between raters were resolved through consensus.

Assessment of risk of bias

Although not required in a scoping review, to better characterize the research and formally assess factors that could affect findings, we estimated the risk of bias in each study. We used one of three tools, depending on study design: the Quality Assessment Tool for Case-Control Studies, the Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies, or the Quality Assessment Tool for Pre-Post Intervention Studies from the U.S. National Heart, Lung, and Blood Institute (https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools). These rating tools evaluate study methods and implementation as sources of bias, including sample characteristics (subject selection, participation, attrition), confounding, study power, and estimation of causality between exposures or interventions and outcomes. This system does not generate a score per study, but the items inform a rater’s qualitative assignment of a rank: Poor (high risk of bias), Fair (medium risk of bias), or Good (low risk of bias). Ratings were done by four raters working in pairs, blind to each other’s work. Disagreements were resolved by the senior author, who reviewed each study in question, blind to ratings from the others.

Results

Figure 1 displays the PRISMA diagram of article processing. We located 4025 abstracts in PubMed and 1953 in Embase, resulting in a total of 5978 records. Back-searching reference lists identified 30 more records meeting eligibility criteria, for a total of 6008 articles processed. Removal of 471 duplicates left 5537 records for title/abstract screening, which found that 5322 articles were ineligible for inclusion. The full-text screening was conducted with the remaining 215 studies, and 136 were excluded for the reasons listed in Fig. 1. The most common reasons for exclusion were that studies were out of the age range or did not study a biological correlate.
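The record flow described above can be recomputed as a quick consistency check. This is a minimal sketch that uses only the counts reported in the text; the variable names are illustrative and are not part of the original analysis.

```python
# Counts as reported in the Results text (see Fig. 1).
pubmed_records = 4025
embase_records = 1953
back_search_records = 30
duplicates_removed = 471
excluded_at_title_abstract = 5322
excluded_at_full_text = 136

# Each stage of the flow is derived from the previous one.
database_total = pubmed_records + embase_records
records_processed = database_total + back_search_records
records_screened = records_processed - duplicates_removed
full_text_assessed = records_screened - excluded_at_title_abstract
studies_included = full_text_assessed - excluded_at_full_text

flow = {
    "Database records": database_total,
    "Records processed": records_processed,
    "Title/abstract screened": records_screened,
    "Full-text assessed": full_text_assessed,
    "Studies included": studies_included,
}
for stage, count in flow.items():
    print(f"{stage}: {count}")
```

Deriving each stage from the one before it makes any mismatch between the text and the PRISMA diagram easy to spot.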

Fig. 1 PRISMA diagram

The final number of studies retained was 79, reported in 76 papers, three of which collected data on two different specific correlates (see Table 1) [51,52,53]. Therefore, for any results concerning the total number of studies, we use 79, as this refers to the number of studies of specific correlates. However, since these three papers conducted their research on the same sample for both specific correlates, we only used each paper once when summarizing designs, sample characteristics, and risk of bias.

Table 1 Studies reviewed, type of self-harm, designs, specific correlates studied

Self-harm focus in the studies was defined in one of three ways, corresponding to three types of samples: suicidality (subjects only with suicide attempts, plans, or ideation), NSSI (subjects only with self-harm without intention to die), or what we labeled ‘any self-harm’ (subjects who manifested suicidality, NSSI, or both, in samples that were mixed with regard to types of self-harm). Most studies, 65% (51/79), focused on subjects with suicidality, while 19% (15/79) studied NSSI, and 16% (13/79) investigated participants recruited with any self-harm.

Publication dates ranged from January 1985 to May 2020. There was a relationship between time of publication and type of self-harm studied, as shown in Fig. 2. The counts of papers on suicidality have more than doubled every 5 years since 2009. Studies of any self-harm first appeared in 2005, whereas NSSI papers did not appear until 2012. Numbers of studies about any type of self-harm have increased in the past 5 years, as have NSSI studies in the past 10 years. However, work on suicidality has continued to show the largest growth.

Fig. 2 Publication dates by type of self-harm studied

We found 48 studies of peripheral correlates and 31 of neural correlates. We categorized these into seven sub-types of peripheral correlates and three sub-types of neural correlates (see Fig. 3). The most frequently-studied sub-category in the peripheral correlates was the stress response system and for the neural correlates, it was brain function with imaging. There were no replication studies.

Fig. 3 Tree map of peripheral and neural correlate sub-categories and number of studies

Most of the studies were conducted in the US. Germany was the location for eight studies. Three or fewer were done in Israel, Canada, Turkey, Scotland, Spain, Sweden, and Hungary.

Study designs, samples, identifying self-harm, specific correlates studied, bias/quality ratings

Table 1 displays the studies included organized by whether they were in the peripheral or neural category, the type of self-harm that was the focus of the study, the design, and the specific correlate studied [51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126]. Details about samples, methods, and findings, as well as risk of bias rating for each study can be found in Supplement 3 (peripheral correlates) or 4 (neural correlates).

Designs

Proportions of study designs were similar across correlate categories. Case-control designs were most frequently used, in 74% (34/46) of the peripheral correlates studies and 70% (21/30) of the neural correlate studies. Cross-sectional or cohort designs were employed in similar proportions across correlate categories (24% or 11/46, peripheral and 23% or 7/30 of the neural correlate studies). Likewise, each category had one non-controlled pre-post intervention study.

Samples

Table 2 and Supplements 3 and 4 show that sample characteristics varied widely across studies. Samples in the peripheral correlates more frequently comprised subjects with suicidality: 70% (32/46), compared to 60% of the neural correlate papers (18/30). The neural correlate data more often came from samples of participants with NSSI 30% (9/30) compared to only 13% (6/46) of the studies of peripheral correlates. The proportions of papers in each category studying participants with any type of self-harm were more similar, 17% (8/46) in the peripheral correlates and 13% (4/30) in the neural correlates research.

Table 2 Sample characteristics

Sample sizes for the entire body of work ranged from 9 to 1268. The range for the peripheral correlates’ samples was 9–1258 subjects (median = 62.5), but the neural correlates’ range was 10–152 participants (median = 44). Adolescents were studied most often, with only two studies in each category exclusively examining children younger than 12. Nearly the same percentage of studies in each correlate category used combined samples of children and adolescents.

Many studies, 63% (29/46) of the peripheral correlate studies and 50% (15/30) of the neural correlate studies, recruited convenience samples of clinical cases from inpatient units or outpatient clinics. A number of these studies also recruited from the community, but this process was always to obtain healthy controls, not self-harming children or adolescents who were not patients. Self-harming participants in cross-sectional and cohort studies were also often recruited as convenience samples, but several studies used samples of the general population recruited with population sampling methods.

Half of the studies in both correlate categories recruited subjects based on a clinical diagnosis, almost always major depressive disorder (MDD). Other studies targeted subjects with borderline personality disorder, anxiety disorders, or psychosis mixed with all types of mood disorders. Several studies also recruited subjects with “MH concerns,” but no diagnosis was associated with this.

Of the studies that used control groups (34 peripheral and 23 neural studies), healthy control groups were the most common: 62% (21/34) for peripheral correlates and 57% (13/23) for neural correlates. Few studies used only psychiatric controls, but 39% (9/23) of neural correlate projects recruited healthy and psychiatric controls, and 18% (6/34) of the peripheral correlates used both.

Girls were studied more often than boys. The percentage of girls, averaged across all studies, was 72% and the range was 11 to 100%. Nearly a third of the studies investigated chiefly or entirely female samples, which we defined as > 85% girls: 30% (14/46) and 27% (8/30) in the peripheral and neural categories, respectively.

Methods to identify self-harm

Data classifying the type of self-harm were collected with six approaches: 1) self- or parent-report, 2) diagnostic interview, 3) clinician-rated scale/non-diagnostic interview, 4) combinations of the previous three approaches, 5) clinical records, or 6) non-standardized instruments created for the specific study. Data collection methods were similar in the peripheral and neural correlates research, but there was little inter-study consistency, as shown in Supplements 3 and 4.

The lack of consistency is illustrated by the finding that numerous different instruments were used within each of the first four approaches. Self-report data about self-harm were collected from one or more of 16 instruments, diagnostic interview data from one of five instruments, and information from clinician-rated scales/non-diagnostic interviews could have come from one or more of 14 instruments. Many of the instruments used were designed for adults and lacked psychometric data for use in the pediatric age group. Moreover, several of the instruments in all categories were not designed to assess self-harm and classified subjects based on answers to just a few questions (sometimes only one), e.g., the Youth Self-Report (YSR) [127] or older versions of the Kiddie-Schedule for Affective Disorders and Schizophrenia (K-SADS) [128].

The most frequently used approach was clinician rating scales/non-diagnostic interviews. Studies of peripheral correlates most often used the Self-Injurious Thoughts and Behaviors Interview (SITBI) [129], while the neural correlates research most frequently used the Columbia Suicide Severity Rating Scale (C-SSRS) [130]. The next most common data collection method was diagnostic instruments, usually with a version of the K-SADS. Self-report studies were the third most common, with the neural correlate studies using this strategy nearly twice as often as peripheral correlates.

Among self-report instruments, neural correlate research used the Suicidal Ideation Questionnaire (SIQ) [131] most frequently, in contrast to the YSR in peripheral correlate studies. Over a third of the neural correlate studies used a combination of methods, compared to only 15% (7/48) of the peripheral. Clinical records were used to categorize subjects on self-harm in 10% (5/48) of the peripheral correlate work, but in only 3% (1/31) of the neural correlate studies. Only one study used a non-standardized instrument. An unusual study used response latency to timed judgments of pairs of death-related and self-related words to categorize participants. Shorter latency to respond to death/me words than to life/me words was categorized as implicit self-harm, differentiated from explicit self-harm defined by standard self-report measures. The authors suggested that this strategy may yield more reliable classification than self-report, especially in younger children who may have trouble articulating their feelings [121].

Specific correlates

Table 3 presents the 28 specific correlates measured in these studies and the number of methods and outcomes for each (see Supplements 3 and 4 for further details). Over a quarter, 29% (8/28), of the specific correlates were investigated in only one study. The remainder were examined in two to eleven studies, but even in clusters of studies about one specific correlate, there was heterogeneity in the methods used to measure the correlate. Research on event-related potentials (ERPs) in reward processing had the most consistent methods. In contrast, five methods were used to investigate the reactive function of the autonomic nervous system (ANS). Outcomes of interest also varied considerably, with studies of some specific correlates all focusing on the same outcome, e.g., reactivity of the hypothalamic-pituitary-adrenal (HPA) axis measured as changes in cortisol levels, while groups of other studies examined diverse outcomes, e.g., neural functional connectivity studies investigated six different outcomes.

Table 3 Specific correlates: number of studies, measurement methods, and outcomes

Risk of bias ratings

Table 4 summarizes the risk of bias assessment rating results, organized by correlate sub-categories. An estimate of Good was given to 37% (29/79) of the studies, Fair to 57% (45/79), and Poor to 6% (5/79). Inter-rater agreement was moderate (k = 0.23) [132]. Neuromodulator and lipid metabolism had the highest percentage of Good studies with 100% in each. However, there were only four and two studies in each, respectively. In addition, the two studies of S100B levels appeared to use overlapping samples, although this was not explicitly stated. Slightly over half of the twenty-three studies of the stress response system were rated as Good, and likewise half of the sleep studies. The remaining studies were judged to be Fair. The high percentage (50%) of studies rated as Poor in the pituitary hormones sub-category reflects that there were only two papers for this correlate, one of which was rated as Poor.
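For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed proportion of agreement between two raters, p_o, with the agreement expected by chance from each rater's marginal frequencies, p_e: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch assuming two raters and the three categories used here; the ratings are HYPOTHETICAL and are not the ratings from this review, and the function name is illustrative.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one categorical label per study."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of studies.")
    n = len(rater_a)
    # Observed agreement: share of studies given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL ratings for ten studies, for illustration only.
rater_1 = ["Good", "Good", "Fair", "Fair", "Fair", "Poor", "Good", "Fair", "Fair", "Good"]
rater_2 = ["Good", "Fair", "Fair", "Fair", "Good", "Poor", "Fair", "Fair", "Poor", "Good"]

kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 3))  # observed 0.60, chance 0.39, so kappa is about 0.344
```

Because kappa discounts chance agreement, it can be much lower than raw percent agreement when one category (here, Fair) dominates the ratings.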

Table 4 Summary of risk of bias ratings per correlate sub-category

The most common reasons that studies were not scored as Good were: no sample size justification or power analysis, measurements of specific correlates were not blind to self-harm status (of particular importance in cohort studies), self-harm participants and control group members were not chosen at random from potential subjects (of particular importance in case-control studies), and analyses of the association between self-harm and specific correlates did not account for confounding variables, e.g. medication usage or gender. We did not find a relationship between ratings and publication date. As this was a scoping review, we did not exclude studies rated as Poor. Neither did we weight individual study findings by bias scores in the next section.

Summary of study findings by type of self-harm

Table 5 presents a high-level summary of individual study findings, organized by the type of self-harm investigated. We could not calculate the number of unique participants studied for each correlate category because some studies appeared to have used the same sample in different studies, although that was rarely stated. At least one significant association with some type of self-harm was reported for all specific correlates, except three: neurotrophins [89, 91], neuroimaging and response inhibition [98] and neuroimaging and decision-making [99] (see Supplements 3 and 4). In the last two studies, subjects with MDD plus suicidality and healthy controls showed no differences in neuroimaging findings, but subjects with MDD did have aberrant responses.

Table 5 Overview of findings by type of self-harm

Table 5 shows that, first, there are substantially more data from children and adolescents with suicidality than from subjects with NSSI or those from studies of any type of self-harm. Second, the findings for the specific correlates are inconsistent, even for the correlates with larger numbers of studies and even within a single type of self-harm. For example, studies of the association between suicidality and HPA axis reactivity reported hyperreactivity [61], hyporeactivity [62], and an aberrant secretion pattern [58], while in subjects with NSSI, three studies reported hyporeactivity [64, 67, 74] and one study reported hyperreactivity [75]. In an example from the neural correlates, brain activity in response to social interaction was associated with suicidality, but one study reported decreased activity in the insula [101] and the other reported increased activity [102]. Heterogeneity in the methods used to identify self-harm, sample characteristics, and measurement of the specific correlates or outcomes is so prevalent that it is difficult to interpret these discrepant findings.

A third feature of these findings evident in Table 5 is that there are stronger signals from some of the studies of specific correlates. These appear to be from groups of studies more methodologically similar and with lower risks of bias. For example, two studies with larger sample sizes, similar age ranges, outpatients and healthy controls, and similar proportions of girls investigated the association between neurotrophins and suicidality or any type of self-harm (for which data were analyzed separately). Correlate measurement and outcome methods were similar. Both were rated Good with regards to risk of bias. No association was found with any neurotrophin in either study. Similarly, two studies of lipid metabolism and suicidality both used samples of inpatients, clinical records to classify self-harm, used the same methods for measuring correlates, and compared the outcomes with normed levels in children and adolescents. Risk of bias ratings were Good for both studies. Both found that suicidality was associated with lower cholesterol.

Fourth, there are no clear patterns of findings, e.g., differences or similarities, for any of the specific correlates by self-harm sub-types. For most of the correlates, the issue is moot because there are not enough studies to compare findings by self-harm sub-type. But even for specific correlates with studies about each type of self-harm, e.g., the stress response system or neuroimaging responses to social interaction, it is difficult to determine whether self-harm sub-type makes any difference in the results because of such wide inter-study methodologic variability.

Fifth, most studies examined correlates with designs and protocols that could contribute to development of diagnostic biomarkers. However, several are notable for correlate investigations that could inform advancement of prognostic [61, 62, 67, 69, 72, 73, 80, 81, 121] or treatment response biomarkers [113, 114, 120].

Discussion

This scoping review contributes to research on peripheral and neural correlates of self-harm by summarizing data on children and adolescents ages 3 to 19 years, a demographic with social, developmental, and psychological characteristics of self-harm that can differ from those found in young adults [18, 35, 37, 38, 42,43,44,45]. Our work also advances knowledge on this topic by reviewing 79 studies in 76 publications, notably more studies than in earlier reviews, and by covering 35 years from 1985 to 2020.

Twenty-eight specific correlates were investigated in this body of literature, although more than a quarter of them were only studied once. The widespread use of the case-control design makes all the study findings vulnerable to selection and information biases, as well as confounding [133,134,135], problems that can be mitigated by adequate sample sizes, strategies to minimize classification error, and recruitment of subjects representative of the pediatric self-harm population. Unfortunately, many of these studies fall short on one or more of these features. Conversely, studies which did have similar methods and were rated as Good did report similar findings, e.g., [89, 91, 94, 95].

Resolution of inter-study divergence in findings is challenging because of methodological heterogeneity on multiple levels: classification of self-harm; classification of subjects with respect to psychiatric patient status; an assortment of different types of controls; and a surprising lack of uniformity in measuring the actual correlates. The use of many different instruments to classify subjects also undermines our ability to use these studies for biomarker development. Moreover, recent reviews of child and adolescent self-harm instruments have questioned their psychometric properties [136,137,138] and pointed out possible threats to validity when an instrument is used for purposes other than originally designed. Other data from adults demonstrate that 40% of those responding yes to a question about attempting suicide later denied the report [139]. This suggests that single questions may be misleading, but numerous studies did classify subjects with one or two questions.

Similar issues arose in the measurement of correlates or outcomes. Confounding was rarely handled by standard methods such as stratification or propensity scores [140] and researchers sometimes measured unique outcomes of specific correlates, making inter-study comparisons or interpretation of different findings challenging.
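To illustrate why stratification matters, the sketch below compares a crude odds ratio with a Mantel-Haenszel odds ratio pooled across strata of a confounder. The Mantel-Haenszel estimator is used here only as one common stratification technique, not as a method taken from the reviewed studies; the counts are HYPOTHETICAL, with strata imagined as medicated and unmedicated participants, and the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = cases with the correlate, b = controls with the correlate,
    c = cases without it,        d = controls without it.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata, each given as an (a, b, c, d) tuple."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# HYPOTHETICAL counts: within each stratum the odds ratio is exactly 1,
# but the confounder is related to both case status and the correlate.
strata = [
    (18, 6, 6, 2),   # e.g., medicated participants
    (2, 6, 6, 18),   # e.g., unmedicated participants
]

# Collapsing the strata gives the crude (unadjusted) table.
crude_table = tuple(sum(column) for column in zip(*strata))
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)

print(f"Crude OR: {crude_or:.2f}")
print(f"Stratified (Mantel-Haenszel) OR: {adjusted_or:.2f}")
```

In this constructed example the crude analysis suggests an association that disappears once the confounder is held fixed, which is the kind of distortion an unadjusted case-control comparison cannot rule out.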

Although the patient samples from earlier studies may have been representative of self-harming children and adolescents in the 1980s and early 1990s, current information suggests that this is no longer the case. Up to 60% of adolescents with NSSI in the general population do not seek care [141] and half of the adolescents with suicidality or NSSI in a population study do not present for help [142]. Moreover, the ability to access care can be compromised by low socioeconomic status [143], rural geographic location [144], or minority race/ethnicity [145], thus reducing generalizability of findings for those who experience healthcare disparities. Similarly, recruiting participants based on a psychiatric disorder limits applicability of results to the sub-population of self-harming children or adolescents with that disorder, despite evidence that self-harm can be transdiagnostic or exist independent of psychiatric disorders [146,147,148].

We identified four research gaps: 1) the absence of replication studies; 2) a dearth of studies on children younger than 12 years old; 3) relatively few studies on non-patient children or adolescents; and 4) disproportionate representation of girls. A possible gap is the lack of data on non-white children and adolescents, but we could not confirm this.

If left unfilled, these gaps will significantly impede progress in this field. Replication studies can help verify that an association between self-harm and a specific correlate is not a spurious finding and they are a critical step in the development of all types of biomarkers [149]. Thus, they should be included in future research about pediatric peripheral and neural correlates of self-harm. These can be guided by some innovative research to determine which studies in a body of work should undergo replication [150,151,152].

More studies on correlates of self-harm in younger children are needed, as self-harm is increasing in younger age groups [153, 154]. For example, presentations to US emergency departments for suicidality increased substantially from 2007 to 2015 (the most recent data available) and 43% of those visits were from children 5–10 years of age. Moreover, suicide was the third leading cause of death in the US for younger children (10–12 years of age) (https://webappa.cdc.gov/sasweb/ncipc/leadcause.html) and the characteristics of younger children with self-harm differ from those of adolescents [5].

Balanced gender proportions are essential in the research landscape. Girls are more likely to engage in suicidality and NSSI, so samples composed mostly or entirely of females can be appropriate, but results from such studies cannot be generalized to boys. Furthermore, given gender differences in help-seeking, it is unlikely that many boys with self-harm will be found in clinical settings.

New studies must increase the number of non-patient children and adolescents under investigation. It is also essential that samples are more diverse not only with regard to gender, but also with regard to race and ethnicity, as recent data show that from 1991 to 2017 suicide attempts among black adolescents in the US rose 73%, compared to a decrease of 7.5% in white adolescents [155]. The current body of work is ill-suited to help us understand self-harm in black children and adolescents.

There were several strengths in this body of research, including a larger number of studies and a longer list of specific correlates than we expected to find based on previous reviews. In addition, the cohort and pre-post treatment studies provide good foundations for the development of prognostic and treatment biomarkers.

Our original goal was to prepare for a systematic review and meta-analysis in service of identifying correlates with potential for biomarker development. Clinical biomarker development requires that a representative and valid sample of the target population is studied with a feasible and standardized process for biomarker data collection and processing, and that replicability of results is shown in appropriate sub-populations [156, 157]. After these criteria have been satisfied, characteristics of the marker such as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) [158] must be established. Progression to biomarker development is not possible for the peripheral or neural correlates identified in our review, due to the small numbers of studies, concerns about self-harm classification, variability of findings, and methodologic weaknesses in measuring some of the specific correlates.
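
The marker characteristics named above can be illustrated with a minimal sketch. The counts below are hypothetical and are not drawn from any study in this review; the class and function names (`ConfusionCounts`, `diagnostic_metrics`, `ppv_at_prevalence`) are introduced here only for illustration. The sketch also shows why a representative sample matters: for fixed sensitivity and specificity, PPV falls as the prevalence of self-harm in the tested population falls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cross-tabulation of a binary marker against a reference classification."""
    tp: int  # marker positive, self-harm present
    fp: int  # marker positive, self-harm absent
    fn: int  # marker negative, self-harm present
    tn: int  # marker negative, self-harm absent


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV implied by Bayes' theorem for a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts (an assumption for illustration): 200 participants,
# 50 of whom are classified as self-harming by the reference standard.
counts = ConfusionCounts(tp=40, fp=30, fn=10, tn=120)
metrics = diagnostic_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# The same marker applied where self-harm is less common (assumed 5% prevalence).
print(f"ppv at 5% prevalence: {ppv_at_prevalence(0.8, 0.8, 0.05):.3f}")
```

Because sensitivity and specificity are estimated within the affected and unaffected groups separately, they are less sensitive to how a sample was recruited than PPV and NPV, which depend directly on prevalence. This is one reason that estimates from clinical samples may not transfer to non-patient populations.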

But this body of work could serve as an excellent platform for biomarker discovery if four improvements are made in future research. The first and most important pertains to the classification of self-harm. In the early to mid-2000s, there was widespread discussion of whether suicidality and NSSI lay on a continuum, i.e., with a predictable pattern of progression from NSSI to suicidal behavior, or on a spectrum, i.e., as co-occurring disorders that partially overlap in characteristics and etiology but comprise distinct clinical syndromes. The concept of a spectrum gained momentum, culminating in the US with the designation of NSSI and suicidal behavior disorder as separate disorders in need of further study in psychiatry’s Diagnostic and Statistical Manual (DSM-5) [159].

However, this approach may be difficult to use, based as it is on self-report about intention to die when engaging in self-harm. Some researchers assert this two-category conceptualization of self-harm has been inadequately validated [160], with concerns that investigations based on this schema will lead to invalid phenotyping [161,162,163,164,165].

To continue to acquire knowledge about correlates of self-harm in children and adolescents despite disagreements about the phenomenology, we recommend improving participant classification methods. Studies should collect and publish information about all types of self-harm, even if the study aim is to focus on one type. In addition to optimizing the chance that homogeneous samples will be created, if that is a goal, publishing the results of this classification strategy will deepen our understanding of the complex symptoms and behaviours comprising child and adolescent self-harm.

To increase the validity of classification, instruments with good psychometric properties in children and adolescents should be used. Approaches using one or two questions from instruments measuring other constructs are not recommended [166]. Furthermore, as the type of instrument, e.g., self-report checklists vs. clinician-rated instruments, can produce different prevalences of self-harm [167], we recommend classifying self-harm in subjects based on a transparent integration of data from several types of instruments [168]. We also recommend more research on the use of cognitive tasks (instead of, or in addition to, self-report) to classify self-harm in children and adolescents, especially in younger children [169].

The second set of improvements involves minimizing bias in future correlate studies. All the methodologic issues in design, sample construction, correlate and outcome measurement discussed in this review are well-described in the epidemiologic literature. However, to ensure that bias and measurement errors are maximally mitigated, we suggest that researchers use one of the risk of bias instruments as a planning guide in the study development phase [170].

The third improvement is standardization: advancement in this field will be stalled unless measurement of specific correlates and outcomes is standardized. Researchers working in each specific correlate area could substantially improve the capacity to detect associations between self-harm and correlates if there were agreement on the measurement of peripheral and neural correlates and their outcomes. Such a practice would also minimize the chance that information bias or measurement error produces low inter-study agreement in results.

Most of the 28 specific correlates investigated in our dataset were derived from research on adults. Our fourth recommendation is to encourage future researchers to use innovative strategies to search for new potential correlates in children and adolescents with self-harm. One possible source of new correlates is post-mortem studies of completed suicide in the pediatric age range. No post-mortem studies met inclusion criteria for this review, but studies modeled after the pioneering work of Pandey and colleagues [171] need to be conducted in 3–19-year-olds who have completed suicide. Another source of possible correlates is genome-wide association studies of persons who have completed suicide [172] and of those with other types of self-harm [173] or NSSI [174]. The burgeoning field of “omics” research beyond genomics is likely to be useful in generating possible correlates for investigation [175], whether studies are conducted on small samples of individuals [176] or use “ome-wide” data from population samples [177]. The work to date has primarily been done in adults, but we encourage researchers to apply the same strategies in 3–19-year-olds [178].

A final approach to identify new potential correlates or biomarkers is the use of machine learning, either with electronic health record (EHR) data [179] or in analyzing neural signatures in response to cognitive tasks [180]. There are many reservations about the use of these new approaches [181, 182], but as the machine learning field matures, strategies such as these may provide promising leads.

Limitations

While having several strengths, our review also has limitations. First, we only obtained papers written in English, so we may have missed important studies on the topic written in other languages. Second, our search used only two databases, PubMed and Embase. However, these two cover medical and biomedical research from 1947 to the present, including Medline, conference abstracts, ebooks, and citations in non-medical journals. PubMed has 25 million records, while Embase has 29 million. Therefore, we do not think this search strategy missed studies, but it is possible. Third, we did not search the gray literature, nor did we write to prominent authors looking for unpublished studies, especially those with negative findings. Publication bias thus might explain why nearly every study in our dataset reported some association between self-harm and the specific correlate under investigation. Fourth, our categorization of self-harm studies was based on how investigators described their populations of interest or samples. Our classification system was too high-level for us to report on the more nuanced features of suicidality, e.g., suicidal plans, ideation, or attempts, or on specific NSSI behaviors, e.g., cutting or burning. Future researchers who want more detail on specific behavioral manifestations can find it in Supplements 3 and 4. Fifth, our assessments of risk of bias showed only moderate inter-rater agreement. The methodologic problems that we summarized in the studies are easy to list from questions asked in the rating process, but we are less confident about the qualitative ratings. Any future systematic reviews should ensure better agreement from the beginning of the process, either with better training or by using more quantitative rating systems.

Conclusions

Our scoping review demonstrates that this corpus of research is not sufficiently mature for a meta-analysis to identify potential biomarkers. Many conflicting results are reported for the 28 specific correlates. Interpretation of the divergent results is hampered by methods that may have produced biased findings and samples mainly generalizable to clinical populations and girls. Most of the work was done in adolescents, not children younger than 11 years. Although the current research is not robust enough to identify potential biomarkers, it provides a platform for the next level of work. Our suggestions to improve future research should significantly advance the field and help promote biomarker development for the diagnosis, prognosis, and treatment of the growing problem of child and adolescent self-harm.