Introduction

Adult attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental condition that emerges during childhood or young adulthood and is characterized by symptoms of inattention and/or hyperactivity-impulsivity (Adler et al., 2017; Asherson et al., 2016). The global prevalence of adult ADHD has been estimated at 2.58% for persistent ADHD and 6.76% for symptomatic ADHD (Song et al., 2021). In clinical practice, adult ADHD is assessed using questionnaires, interviews with relatives, and inspection of school reports. Although neurocognitive deficits are inherent to ADHD, there is still no established test or test battery that is commonly used in the assessment of this disorder (Fried et al., 2021; Nikolas et al., 2019). Nevertheless, neuropsychological tests should be used in patients with presumed cognitive deficits to objectify these deficits during the diagnostic process. Accordingly, there is growing interest in establishing neurocognitive paradigms as complementary tools in the assessment of adult ADHD.

Early research on ADHD suggested that deficits in inhibitory control are a primary phenotype of this disorder (Barkley, 1997). Support for the inhibition deficit model has come from studies showing altered executive functions in adult ADHD (Hadas et al., 2021; Linhartová et al., 2021; Nigg et al., 2002; Silverstein et al., 2020). Inhibitory control, an important aspect of executive functions, has often been investigated with the stop-signal task (SST; Verbruggen & Logan, 2008), but also with other paradigms such as the go/no-go task (Fisher et al., 2011). However, the two tasks differ substantially. For instance, the go/no-go task does not provide a measure of individual response inhibition speed, which is explicitly obtained in the SST. Moreover, different neural mechanisms have been shown to underlie stimulus processing in the two tasks (Raud et al., 2020). Hence, the go/no-go task and the SST likely capture different facets of response inhibition and should therefore be examined separately. In the current review and meta-analysis, we focused on the SST in adult ADHD (for a meta-analysis of the go/no-go task combining studies in children and adults, see Wright et al., 2014).

In the SST, participants are instructed to make a forced-choice response to a ‘go-signal’, e.g., to respond with a left or right button press to an arrow pointing to the left or right, respectively. Crucially, on a small proportion of trials, an auditory or visual ‘stop-signal’ is presented after the go-signal, and participants are required to withhold their response. Most studies use an adaptive procedure that adjusts the delay between the go-signal and the stop-signal (the stop-signal delay, SSD) so that each participant successfully inhibits the response on about 50% of stop trials. From this delay and the response times on go trials, the stop-signal reaction time (SSRT) is calculated, which has become an established measure of response inhibition (Logan et al., 2014). A previous meta-analysis of the SSRT that included children and adults diagnosed with ADHD revealed deficits of moderate effect size across age groups (\(g = 0.62\); Lipszyc & Schachar, 2010). However, this meta-analysis included 68 studies with children but only 10 studies with adults with ADHD. Its conclusions regarding adult ADHD were therefore limited, and the degree of SSRT deficits in adult ADHD remains to be established.
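To make the SSRT computation concrete, the following Python sketch shows one step of a simple adaptive staircase and two common SSRT estimators: the mean method and the integration method with replacement of go omissions (Verbruggen et al., 2019). It illustrates the general logic under stated assumptions and is not the procedure of any included study. The 50-ms step size, the SSD bounds, the rounding used to pick the nth RT, and all function names are hypothetical choices.

```python
import math
import statistics


def update_ssd(ssd_ms, stopped, step_ms=50, min_ms=0, max_ms=1000):
    """One step of a hypothetical 1-up/1-down staircase: a successful stop
    lengthens the next SSD (harder), a failed stop shortens it (easier)."""
    ssd_ms = ssd_ms + step_ms if stopped else ssd_ms - step_ms
    return min(max(ssd_ms, min_ms), max_ms)


def ssrt_mean(go_rts_ms, ssds_ms):
    """Mean method: mean go RT minus mean SSD."""
    return statistics.mean(go_rts_ms) - statistics.mean(ssds_ms)


def ssrt_integration(go_rts_ms, n_go_omissions, ssds_ms, p_respond):
    """Integration method with replacement of go omissions."""
    # Go omissions are replaced by the slowest observed go RT.
    rts = sorted(list(go_rts_ms) + [max(go_rts_ms)] * n_go_omissions)
    # nth RT, where n = p(respond | stop-signal) * number of go trials;
    # rounding up is an assumption, conventions differ between studies.
    n = max(1, math.ceil(p_respond * len(rts)))
    return rts[n - 1] - statistics.mean(ssds_ms)


go_rts = [420, 450, 480, 510, 540, 600]  # hypothetical go RTs (ms)
ssds = [200, 250, 300]                   # hypothetical SSDs (ms)
print(ssrt_mean(go_rts, ssds), ssrt_integration(go_rts, 0, ssds, 0.5))
```

With these hypothetical values the mean method gives 250 ms and the integration method 230 ms. The consensus guide favors the integration method with replacement of go omissions, because the mean method is more sensitive to the shape of the go RT distribution.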

Here, we performed a review and meta-analysis that conformed to current PRISMA guidelines focusing on response inhibition deficits, as measured by the SSRT, in adult ADHD. Our analysis included 26 publications with 27 studies, which allowed for a reliable estimation of response inhibition deficits in patients. We performed a quality assessment of the SST following a recent consensus paper (Verbruggen et al., 2019) and estimated the risk of bias (RoB) for each study. We thoroughly examined whether the study quality as well as participant-related and clinical factors influence response inhibition deficits in adult ADHD.

Methods

The protocol of this systematic review and meta-analysis has been pre-registered in the PROSPERO database (PROSPERO ID: CRD42021266709).

Eligibility Criteria

The following study selection criteria were applied: 1. Patient population: Studies contained at least one group of adult participants (aged 18 years or older) with a current diagnosis of ADHD according to DSM criteria (DSM-5 or earlier) or of Hyperkinetic Disorder according to ICD criteria (ICD-10 or earlier). Studies investigating populations with only subclinical ADHD symptoms were not included. 2. Control group: Studies had to include at least one healthy control group. 3. Experimental task: Response inhibition had to be assessed using the SST or the Stop-Change Task, a modified version of the SST in which participants switch to a secondary response after they have inhibited an ongoing response (Verbruggen & Logan, 2009). Studies using atypical SST paradigms, such as dual tasks or the selective SST, were excluded. Studies in which participants received feedback on stop-signal performance were also excluded, because feedback and reward influence response inhibition (Lipszyc & Schachar, 2010; Slusarek et al., 2001). 4. Outcome measure: Sufficient test statistics for the stop-signal reaction time (SSRT) had to be provided to calculate standardized mean differences (Hedges’ g). 5. Other criteria: Empirical articles written in English or German and published or accepted for publication in peer-reviewed journals between 2000 and 2022.

Search Strategy

To identify relevant articles, an electronic search was conducted up to April 14, 2022 in two major publication databases: Medline and PsycInfo (accessed from EBSCOhost). The following syntax, adapted from Lipszyc and Schachar (2010), was used: [(attention deficit hyperactivity disorder OR ADHD) AND Adult* AND (stop task OR stop signal OR response inhibition OR executive function)]. Limiters were set so that only articles published in peer-reviewed journals in English or German since the 1st of January 2000 were included. Furthermore, reference lists of the identified empirical articles, previous meta-analyses and systematic reviews were scanned to ensure that all relevant articles were captured.

Study Selection

The study selection process was conducted by two authors (TZ and DS) in two stages: 1. Titles and abstracts were screened using the inclusion and exclusion criteria described above. 2. Full texts of the remaining studies were obtained and reviewed in detail for eligibility. Screening was performed in EndNote. When the two authors disagreed on the eligibility of a study in either stage, the study was screened by other team members and the disagreement was discussed until consensus was reached.

Data Extraction and Outcomes

Data were extracted independently by two authors (TZ and MS). When statistical values were insufficiently reported for the meta-analysis, the authors of the articles were contacted and asked to provide the missing information. The following measures of the SST were extracted from the articles, separately for patients and controls: SSRT as primary outcome; stop commission errors (responding on a stop trial); go discrimination errors (e.g., responding with the left arrow key, even though a rightwards pointing arrow was presented); go omission errors (not responding on a go trial); and go accuracy (percentage of correct go trials) as secondary outcomes. In addition, important variables that might influence behavioral performance in the planned analysis were extracted and tabulated for each study: Age; IQ; percentage of males; ADHD subtype; years of education; comorbidities; medication status; patient recruitment setting.

Assessments of the SST Validity and Risk of Bias

The validity of the SST and the risk of bias (RoB) in each study are important factors that could influence group differences between patients and controls. Therefore, they were explicitly examined in the current analysis. Both SST validity and RoB were assessed by two independent raters (TZ and MS). In case of disagreement between the authors, a consensus was reached by discussion with other team members. To increase inter-rater reliability, a calibration session was conducted in which the assessments were applied to two articles not included in this review. This allowed the two raters to identify possible sources of disagreement and to decide on rules for the assessment of ambiguous cases (Supplementary Text 1).

Across studies selected for this meta-analysis, there is considerable variability in the administration of the SST, and the analytic procedures used to derive outcome measures. To determine the validity of the SST, we used the recent consensus guide developed by Verbruggen et al. (2019), which provides 12 ‘best practice’ recommendations for the design, implementation and analysis of the SST. We selected four main criteria from this consensus guide and rephrased them into four dichotomous items, i.e., item fulfilled or not, for the critical appraisal (Supplementary Text 2). In case of missing information, the criterion was rated as not fulfilled.

The overall validity of the SST was then rated as follows: < 3 criteria fulfilled = low validity; 3 criteria fulfilled = moderate validity; 4 criteria fulfilled = high validity. Cohen’s unweighted kappa for nominal data was calculated for each item.
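The validity rule and the agreement statistics for the dichotomous SST items can be sketched in Python as follows. The sketch includes Cohen’s unweighted kappa and, for the bias or prevalence problems addressed next, Byrt’s prevalence- and bias-adjusted kappa (PABAK), which for two categories reduces to \(2p_o - 1\), where \(p_o\) is the observed proportion of agreement. The function names are hypothetical, and returning NaN when both raters use a single category is our own convention for the case in which kappa is undefined.

```python
from collections import Counter


def sst_validity(n_criteria_fulfilled):
    """Map the number of fulfilled SST criteria (0-4) to a validity rating."""
    if not 0 <= n_criteria_fulfilled <= 4:
        raise ValueError("expected 0-4 fulfilled criteria")
    if n_criteria_fulfilled < 3:
        return "low"
    return "moderate" if n_criteria_fulfilled == 3 else "high"


def cohen_kappa(rater_a, rater_b):
    """Cohen's unweighted kappa for two raters and nominal categories."""
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the two raters' marginal distributions.
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if p_exp == 1:
        return float("nan")  # both raters used one category only: undefined
    return (p_obs - p_exp) / (1 - p_exp)


def pabak(rater_a, rater_b, n_categories=2):
    """Prevalence- and bias-adjusted kappa (Byrt et al., 1993)."""
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    return (n_categories * p_obs - 1) / (n_categories - 1)


# Hypothetical ratings of one item across five articles (1 = fulfilled).
a, b = [1, 1, 1, 1, 0], [1, 1, 1, 0, 0]
print(sst_validity(3), cohen_kappa(a, b), pabak(a, b))
```

Here the raters agree on 80% of articles, but because "fulfilled" is the more frequent category, chance agreement is high (0.56). Kappa therefore drops to about 0.55, whereas PABAK is 0.6.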

In case of a bias or a prevalence problem, Byrt’s bias- and prevalence-adjusted kappa was additionally reported (Byrt et al., 1993; Hallgren, 2012). For the RoB assessment, we applied the adapted Hombrados and Waddington criteria that have recently been used for studies with ADHD patients (Hulsbosch et al., 2021): (1) Equivalent group sizes; (2) Use of a diagnostic interview or questionnaires to determine ADHD diagnosis; (3) Sufficient sample sizes; (4) All statistical outcomes are reported; (5) Transparent reporting of the data analysis; (6) Reporting of missing/excluded data. Each item was rated as “good/low RoB”, “satisfactory/moderate RoB”, or “poor/high RoB”. In accordance with the rating system described in the Cochrane Handbook, an overall RoB rating of low, moderate or high was assigned to each study based on its worst-rated domain: if at least one domain was rated as having a moderate RoB, the overall RoB could be rated no better than moderate, even if all other domains were rated as having a low RoB; likewise, a single domain with a high RoB resulted in a high overall RoB. Cohen’s weighted kappa was calculated for each individual domain (Cohen, 1968; Hallgren, 2012). After all studies were rated for SST validity and RoB, an overall study quality variable was created by combining the RoB ratings and SST validity. Studies with high RoB and low SST validity were rated as having low overall quality, and studies with low or moderate RoB and moderate or high SST validity were rated as having moderate to high overall quality. Studies with any of the remaining combinations of RoB and SST validity (low RoB and low SST validity; moderate RoB and low SST validity; high RoB and moderate SST validity; high RoB and high SST validity) were assigned to the category moderate to low overall quality. This categorization was used for the subgroup analyses (see below).
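The worst-domain rule for overall RoB and the combination rule for overall study quality described above can be written as two small lookup functions. This Python sketch assumes that the domain ratings "good", "satisfactory" and "poor" have already been mapped to "low", "moderate" and "high" RoB. The function names are hypothetical.

```python
RANK = {"low": 0, "moderate": 1, "high": 2}


def overall_rob(domain_ratings):
    """Overall RoB is the worst (highest) rating across all domains."""
    return max(domain_ratings, key=RANK.__getitem__)


def overall_quality(rob, validity):
    """Combine overall RoB and SST validity into three quality categories."""
    if rob == "high" and validity == "low":
        return "low"
    if rob in ("low", "moderate") and validity in ("moderate", "high"):
        return "moderate to high"
    # Remaining combinations: low/moderate RoB with low validity,
    # or high RoB with moderate/high validity.
    return "moderate to low"


study = ["low", "low", "moderate", "low", "low", "low"]  # hypothetical ratings
print(overall_rob(study), overall_quality(overall_rob(study), "high"))
```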

Meta-Analysis

The meta-analysis was carried out in R (version 4.0.3; R Core Team, 2020) using the metafor package (version 3.0.2; Viechtbauer, 2010). Hedges’ \(g\) was calculated for each study and each outcome (primary outcome SSRT and secondary outcomes) to quantify the standardized group difference. Because various sources could account for differences in findings between studies, e.g., different patient samples or different SST paradigms, a random-effects model was fitted to the data. Instead of the usual large-sample approximation, the sampling variances were computed by substituting the sample-size weighted average of the Hedges’ \(g\) values into the variance equation, as this approach has been shown to be less biased (Lin & Aloe, 2021). Confidence intervals were computed using the method introduced by Knapp and Hartung (2003). To assess heterogeneity, we report (1) \({\tau}^{2}\), estimated using the restricted maximum-likelihood (REML) estimator (Viechtbauer, 2005), (2) the Q-test for heterogeneity, and (3) the \({I}^{2}\) statistic (Higgins & Thompson, 2002). If heterogeneity between studies was present, i.e., \({\widehat{\tau}}^{2}>0\), regardless of whether the Q-test reached significance, a prediction interval for the true outcomes was provided (Riley et al., 2011). Results were visualized using forest plots. Furthermore, the model was examined for (1) potential outliers, i.e., studies with studentized residuals larger in absolute value than the \(100\times(1-0.05/(2k))\)th percentile of a standard normal distribution, which corresponds to a Bonferroni correction with \(\alpha=0.05\) (two-sided) for \(k\) included studies, and (2) potentially overinfluential studies, i.e., studies with a Cook’s distance larger than the median plus six times the interquartile range of the Cook’s distances (Viechtbauer & Cheung, 2010). If outliers were detected, leave-one-out diagnostics were conducted as a sensitivity analysis.
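The main steps of this model can be illustrated with a stdlib-only Python sketch. It is not the metafor implementation, and the following are assumptions. Hedges’ \(g\) uses the common approximate correction \(J = 1 - 3/(4\,df - 1)\) rather than the exact gamma-function form. The Lin and Aloe variance is written as \(1/n_1 + 1/n_2 + \bar{g}^2/(2(n_1+n_2))\), with \(\bar{g}\) the sample-size weighted mean. \(\tau^2\) is obtained by the REML fixed-point iteration given by Viechtbauer (2005). \(I^2\) is computed relative to a "typical" within-study variance. Because Python’s standard library has no t quantile function, the critical value for \(k-1\) degrees of freedom must be supplied (e.g., 2.056 for 26 df). All function names and the example data are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """SMD (group 1 minus group 2) with the approximate small-sample correction J."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    j = 1 - 3 / (4 * df - 1)
    return j * (m1 - m2) / sd_pooled


def averaged_variances(gs, n1s, n2s):
    """Sampling variances with g_i replaced by the sample-size weighted mean g."""
    totals = [a + b for a, b in zip(n1s, n2s)]
    g_bar = sum(n * g for n, g in zip(totals, gs)) / sum(totals)
    return [1 / a + 1 / b + g_bar**2 / (2 * (a + b)) for a, b in zip(n1s, n2s)]


def reml_tau2(y, v, tol=1e-10, max_iter=500):
    """REML estimate of tau^2 by fixed-point iteration, truncated at zero."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        num = sum(wi**2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new = max(0.0, num / sum(wi**2 for wi in w) + 1 / sum(w))
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2


def random_effects(y, v, t_crit):
    """Random-effects summary with a Knapp-Hartung SE; t_crit has k - 1 df."""
    k = len(y)
    tau2 = reml_tau2(y, v)
    w = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Knapp-Hartung: weighted residual variance around mu, scaled by k - 1.
    se = math.sqrt(sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / ((k - 1) * sum(w)))
    # Cochran's Q uses fixed-effect weights 1 / v_i.
    w_fe = [1 / vi for vi in v]
    mu_fe = sum(wi * yi for wi, yi in zip(w_fe, y)) / sum(w_fe)
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w_fe, y))
    # I^2: tau^2 relative to tau^2 plus a "typical" within-study variance.
    s2 = (k - 1) * sum(w_fe) / (sum(w_fe) ** 2 - sum(wi**2 for wi in w_fe))
    i2 = 100 * tau2 / (tau2 + s2)
    half_pi = t_crit * math.sqrt(se**2 + tau2)  # prediction-interval half-width
    return {"mu": mu, "se": se, "ci": (mu - t_crit * se, mu + t_crit * se),
            "tau2": tau2, "Q": q, "I2": i2, "pi": (mu - half_pi, mu + half_pi)}


# Hypothetical effect sizes and variances (k = 5, so t_crit uses 4 df).
fit = random_effects([0.2, 0.5, 0.8, 0.4, 0.6], [0.04, 0.05, 0.03, 0.06, 0.04], t_crit=2.776)
print(fit)
```

The prediction interval adds \(\tau^2\) to the squared standard error of the pooled estimate. It is therefore always at least as wide as the confidence interval, and the two coincide when \(\widehat{\tau}^2 = 0\).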

Assessment of Publication Bias

Evidence of publication bias was assessed using a combination of visual and statistical approaches. First, the funnel plot (Copas & Chi, 2000) of the standardized mean difference (SMD) against the inverse square root of the sample size was visually inspected for asymmetry (Zwetsloot et al., 2017). In the absence of bias, the funnel plot should be symmetrical and narrow toward the top, where studies with larger sample sizes and more precise effect estimates are located. However, detecting publication bias by visual inspection of funnel plots is often subjective and prone to judgment errors (Wang & Bushman, 1998). Therefore, a normal quantile–quantile (Q-Q) plot was additionally computed to aid the assessment of publication bias. Finally, Egger’s regression test, with the inverse of the square root of the sample size as predictor, was computed to test statistically for funnel plot asymmetry (Zwetsloot et al., 2017).
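The regression test with \(1/\sqrt{N}\) as predictor can be sketched as a weighted least-squares regression of the effect sizes on \(1/\sqrt{N}\), with inverse-variance weights and a multiplicative dispersion term, testing the slope against a t distribution with \(k-2\) degrees of freedom. This is the classical weighted-regression variant of Egger’s test and is an assumption. The exact model specification used here (e.g., a mixed-effects meta-regression with this predictor) is not stated, so the numbers may differ. The function name and data are hypothetical.

```python
import math


def egger_sqrtninv(y, v, n_total):
    """Weighted regression of effect sizes on 1/sqrt(N) (classical variant).
    Returns the t statistic of the slope and its degrees of freedom (k - 2)."""
    k = len(y)
    x = [1 / math.sqrt(n) for n in n_total]
    w = [1 / vi for vi in v]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Multiplicative dispersion: residual mean square from the weighted fit.
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_slope = math.sqrt(rss / (k - 2) / sxx)
    return slope / se_slope, k - 2


# Hypothetical data in which smaller studies report larger effects.
t, df = egger_sqrtninv([0.9, 0.7, 0.55, 0.5, 0.45],
                       [0.2, 0.1, 0.05, 0.025, 0.0125],
                       [20, 40, 80, 160, 320])
print(t, df)
```

A positive slope means that studies with smaller samples (larger \(1/\sqrt{N}\)) tend to report larger effects, which is the asymmetry pattern associated with publication bias.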

Meta-Regression and Subgroup Analysis

To assess whether the pre-specified demographic and clinical variables and study quality influenced the meta-analytic outcome, and to explore potential sources of heterogeneity, we performed a meta-regression analysis for continuous covariates (age, percentage of males, IQ) and a subgroup analysis for categorical covariates (RoB, SST validity, overall study quality, comorbidities, patient setting and medication status).

Mixed-effects models were fitted to the data for the meta-regression analysis. If sufficient data were available across studies, the extracted variables were included in a multivariate regression model; otherwise, univariate models were fitted. The parameter \({\tau}^{2}\), which indicates the residual heterogeneity not explained by the included moderators (Viechtbauer, 2010), was estimated using the REML estimator (Viechtbauer, 2005). Tests and confidence intervals were calculated using the Knapp and Hartung (2003) method. For inclusion in the regression models, the mean age, IQ, and percentage of males of each study sample were computed by averaging the reported means of the patients and the controls. When the sample sizes of ADHD participants and healthy controls differed substantially, a sample-size weighted mean was calculated instead. When a study reported verbal and non-verbal IQ separately, the mean of these scores was used. IQ was centered before being entered into a univariate regression model, and age and percentage of males were standardized before being entered into a multivariate regression model.
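The covariate preparation described above can be sketched in Python as follows. The criterion for "differed substantially" is not specified in the text, so here the caller decides whether to weight. Standardization uses the sample standard deviation, which matches R’s `scale()`. All function names and example values are hypothetical.

```python
import statistics


def pooled_sample_mean(mean_adhd, n_adhd, mean_ctrl, n_ctrl, weighted=False):
    """Study-level covariate: average of patient and control means,
    sample-size weighted when group sizes differ substantially."""
    if weighted:
        return (mean_adhd * n_adhd + mean_ctrl * n_ctrl) / (n_adhd + n_ctrl)
    return (mean_adhd + mean_ctrl) / 2


def iq_from_subscales(verbal_iq, nonverbal_iq):
    """Single IQ value when verbal and non-verbal IQ are reported separately."""
    return (verbal_iq + nonverbal_iq) / 2


def center(values):
    """Subtract the across-study mean (used for IQ)."""
    m = statistics.mean(values)
    return [x - m for x in values]


def standardize(values):
    """z-scores with the sample SD (used for age and percentage of males)."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(x - m) / s for x in values]


# Hypothetical study with unequal groups: 10 patients (30 y), 30 controls (40 y).
print(pooled_sample_mean(30, 10, 40, 30, weighted=True))  # weighted mean age
```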

The following variables were considered for subgroup analysis: (1) RoB with the levels low RoB vs. moderate RoB vs. high RoB; (2) SST validity with the levels low validity vs. moderate validity vs. high validity; (3) overall study quality with the levels low overall quality vs. low to moderate overall quality vs. moderate to high overall quality; (4) psychiatric comorbidities in patients with the levels comorbidities allowed vs. comorbidities not allowed; (5) psychiatric comorbidities in control participants with the levels comorbidities allowed vs. comorbidities not allowed; (6) patient setting with the levels recruited from a clinical setting vs. recruited from a non-clinical setting vs. recruited from both (mixed); (7) medication status with the levels medicated vs. unmedicated. For each of these variables, separate random-effects models were fitted within each subgroup. A fixed-effects meta-regression with the subgroup estimates as outcomes and subgroup as moderator was then calculated to test whether the variable significantly moderated the SSRT.

Finally, for both meta-regressions and subgroup analyses an omnibus test of moderators was conducted, testing all coefficients excluding the intercept against 0. If the omnibus test reaches significance, it may indicate that some of the heterogeneity could be explained by the predictors included in the model (Viechtbauer, 2010).
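For a categorical moderator, the omnibus test in a fixed-effects meta-regression on the subgroup estimates is equivalent to a between-subgroups Q statistic: \(Q_M = \sum_j w_j(\hat{\mu}_j - \bar{\mu})^2\), with \(w_j = 1/SE_j^2\) and \(m-1\) degrees of freedom for \(m\) subgroups. The Python sketch below assumes a Wald-type chi-square test. The test variant actually used is not stated. The p-value helper covers only \(df = 1\) and \(df = 2\), where closed forms exist. For other degrees of freedom a library routine such as `scipy.stats.chi2.sf` would be needed. Names and numbers are hypothetical.

```python
import math


def subgroup_q_test(estimates, standard_errors):
    """Between-subgroups Q_M from subgroup estimates (one random-effects
    estimate per level) and its degrees of freedom (m - 1)."""
    w = [1 / se**2 for se in standard_errors]
    pooled = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q_m = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, estimates))
    return q_m, len(estimates) - 1


def chi2_sf_small_df(x, df):
    """Chi-square upper-tail probability, closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise NotImplementedError("use e.g. scipy.stats.chi2.sf for other df")


# Hypothetical two-level moderator (e.g., comorbidities allowed vs. not).
q, df = subgroup_q_test([0.2, 0.6], [0.1, 0.1])
print(q, df, chi2_sf_small_df(q, df))
```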

Secondary Outcome Measures of the SST

In addition to the SSRT, the SST yields other outcome measures whose reporting is recommended (Verbruggen et al., 2019). For the current meta-analysis, we examined stop commission errors, go discrimination errors, go omission errors and go accuracy. All studies that reported these parameters were included in the respective meta-analyses. When errors were reported as absolute numbers (i.e., means and standard deviations of error counts), they were converted to percentages. The analytical procedures were the same as for the SSRT, except that no meta-regression or subgroup analyses were conducted because of the smaller number of available studies.
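Converting error counts to percentages is a rescaling by \(100/n_{\text{trials}}\), where \(n_{\text{trials}}\) is the number of relevant trials (e.g., stop trials for stop commissions). The Python sketch below uses hypothetical names and numbers. It also shows that, when both groups completed the same number of trials, the rescaling leaves the standardized mean difference unchanged. The conversion mainly matters for comparing raw error rates across studies with different trial numbers.

```python
import math


def count_to_percent(mean_count, sd_count, n_trials):
    """Rescale the mean and SD of an error count to percent of trials."""
    factor = 100 / n_trials
    return mean_count * factor, sd_count * factor


def smd(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference with pooled SD (no small-sample correction)."""
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled


# Hypothetical: 50 stop trials; patients 12 +/- 4 errors, controls 9 +/- 3 (n = 20 each).
pat, ctl = count_to_percent(12, 4, 50), count_to_percent(9, 3, 50)
d_counts = smd(12, 4, 20, 9, 3, 20)
d_percent = smd(pat[0], pat[1], 20, ctl[0], ctl[1], 20)
print(pat, ctl, d_counts, d_percent)
```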

Results

Study Selection

The electronic search resulted in 1186 articles in MEDLINE and 1353 in PsycInfo (Fig. 1). The limiters described in the Methods section excluded 215 of these articles. EBSCOhost automatically removed 662 duplicates, and the remaining 1662 records were exported to EndNote. After removing the remaining duplicates using EndNote’s automatic deduplication tool (n = 177) and manual inspection (n = 36), 1449 articles remained. Screening of titles and abstracts for eligibility resulted in the exclusion of 1288 studies. Full texts of the remaining 161 studies were obtained and thoroughly checked for eligibility. Of these, 116 were excluded because they did not include a healthy control group (n = 1), assessed only subclinical symptomatology (n = 2), included ADHD only as a comorbid disorder (n = 1), or did not use an SST paradigm (n = 112). Another 4 studies were excluded because they had substantially modified the SST paradigm, and another 8 because their samples included both children and adults. Six studies reported insufficient statistical values for the meta-analysis, and their authors were contacted. We received data for four of these studies, which were then included in the final sample. Note that Bekker et al. (2005a, b) and van Dongen-Boomsma et al. (2010) reported identical SSRTs, i.e., from the same experimental session and the same sample of participants. The same applied to Nigg et al. (2005), Stavro et al. (2007) and Martel et al. (2017), as well as to Linhartová et al. (2020) and Linhartová et al. (2021). For each of these groups of articles, the reported SSRT value was extracted and counted as a single sample in the meta-analysis. Finally, Szekely et al. (2017) conducted two SST experiments, one implemented for fMRI and one for MEG. Although the samples of these two experiments partially overlapped (63 participants completed the SST during both MEG and fMRI, 85 during fMRI only, and 33 during MEG only), they were treated as separate observations in the analysis. Screening of reference lists did not reveal any additional articles. Thus, in total, 26 publications with 27 studies were included in the meta-analysis (1799 participants; ADHD = 883; controls = 916). Sample characteristics for all included studies are shown in Table 1.
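The counts in the study-selection paragraph can be checked step by step. The Python tally below uses only numbers reported above, with two assumptions stated in the comments. First, the two studies whose authors did not provide data were excluded. Second, each group of articles reporting the same SSRT sample counts as one publication. Under these assumptions, the totals of 26 publications and 27 studies follow.

```python
retrieved = 1186 + 1353                        # MEDLINE + PsycInfo
after_limiters = retrieved - 215               # search limiters
after_ebsco_dedup = after_limiters - 662       # EBSCOhost duplicates
after_endnote_dedup = after_ebsco_dedup - 177 - 36  # automatic + manual
full_texts = after_endnote_dedup - 1288        # title/abstract screening
eligible = full_texts - 116 - 4 - 8            # full-text exclusions
with_data = eligible - (6 - 4)                 # assumption: 2 studies without data dropped
# Assumption: merged article groups count as one publication each:
# 3 -> 1 (Bekker a, b; van Dongen-Boomsma), 3 -> 1 (Nigg; Stavro; Martel),
# 2 -> 1 (Linhartova 2020; 2021).
publications = with_data - (3 - 1) - (3 - 1) - (2 - 1)
studies = publications + 1                     # Szekely et al.: fMRI and MEG experiments
participants = 883 + 916                       # ADHD + controls
print(after_ebsco_dedup, after_endnote_dedup, full_texts, publications, studies, participants)
```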

Fig. 1

PRISMA flow diagram of study selection (in accordance with Page et al., 2021)

Table 1 Studies investigating the stop-signal task in adult ADHD

Twenty-four out of 27 studies prohibited stimulant medication on the day of testing, two studies did not report this information and one study allowed medication (Linhartová et al., 2021). One study tested the effect of stimulant medication on task performance (Chamberlain et al., 2007) and another study allowed medication during testing (Congdon et al., 2014). To maintain similarity between studies, data from Chamberlain et al. (2007) were extracted for the placebo patient group only, and data from Congdon et al. (2014) were extracted for the unmedicated patient group only. Marx et al. (2013) used an SST paradigm comparing performance with and without reward. For this study only data for the non-reward group were extracted. In some articles, information on the presence of comorbidities or the patient setting was reported ambiguously. For example, Aron et al. (2003) report that healthy controls had “no previous contact with psychiatric services” but it is unclear whether potential comorbidities of healthy controls were screened within the study. Therefore, the coding for these two variables may be biased. On request, Meachon et al. (2021) provided unpublished information on the age and sex distribution in the two groups. Bialystok et al. (2017) provided the mean age and proportion of males for the subset who completed the SST. Demographic variables, information on the IQ and other relevant information on the study population were not available for all studies. A summary is provided in Table 2. A detailed overview of psychiatric comorbidities in patients and in controls is given in Table 3.

Table 2 Sample characterization
Table 3 Detailed comorbidity information for patients and healthy controls

SST Validity and Risk of Bias

The validity of the SST was evaluated for all 26 articles using 4 items, resulting in 104 individual ratings. Table 4 provides an overview of the ratings. Nineteen studies (73%) were rated as having low validity, 5 studies (19%) as having moderate validity, and 2 studies (8%) as having high validity. Marginal distributions showed some degree of prevalence bias for all items, which was strongest for items 2 and 4. All items showed substantial to perfect interrater agreement (Hallgren, 2012), with no systematic differences between raters (Supplementary Table 1). Overall, most of the studies included in the meta-analysis had low or moderate SST validity.

Table 4 Stop-signal task validity ratings

RoB was evaluated for all 26 articles in 6 domains, resulting in 156 individual ratings. Table 5 provides an overview of the RoB ratings. Overall, most studies had a moderate or high RoB. One article (4%) received a low rating, twelve articles (46%) a moderate rating and 13 articles (50%) a high rating. There was substantial to perfect interrater agreement for all domains (Supplementary Table 2). Two major sources of interrater disagreement were in the reporting of missing data (category 6) and selective outcome reporting (category 4). Some studies did not specifically address whether the entire sample was included in the final SST analysis. However, this could be inferred from the degrees of freedom in the analysis. Therefore, it was decided to use the degrees of freedom to rate this category. In addition, four of the included studies did not report the mean and standard deviation of SST outcomes. Upon request, the authors provided us with these values and the selective data reporting for these four studies was then rated as low RoB.

Table 5 Risk of bias ratings

Meta-Analysis of Stop-Signal Reaction Time

Figure 2 presents the forest plot of the observed group differences in the SSRT for the 27 observations. Across studies, Hedges’ \(g\) values ranged from -0.341 to 1.230. The random-effects meta-analysis revealed a statistically significant moderate mean effect size of 0.509 (t(26) = 7.829, p < 0.0001, 95% CI: 0.376–0.644): adults with ADHD showed moderately longer SSRTs than healthy controls. The \({I}^{2}\) statistic indicated moderate heterogeneity across studies (Q(26) = 39.546, p = 0.043, \({\widehat{\tau}}^{2}=0.030\), \({I}^{2}=31.224\%\)), which is reflected in a 95% prediction interval ranging from 0.129 to 0.891.
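The reported intervals can be related to one another with a short worked calculation. Assuming the Knapp and Hartung standard error is implied by \(SE = g/t\), and that the prediction interval takes the form \(g \pm t_{0.975,\,k-1}\sqrt{SE^2 + \widehat{\tau}^2}\), the reported values reproduce both intervals to within about 0.002, the remaining differences being due to rounding of the published inputs. The critical value 2.056 for 26 degrees of freedom is a standard table value.

```python
import math

# Values reported above for the SSRT random-effects model (k = 27).
g, t_stat, tau2 = 0.509, 7.829, 0.030
t_crit = 2.056  # 97.5th percentile of t with k - 1 = 26 df

se = g / t_stat                                   # implied Knapp-Hartung SE
ci = (g - t_crit * se, g + t_crit * se)           # confidence interval
half_width = t_crit * math.sqrt(se**2 + tau2)     # adds between-study variance
pred_int = (g - half_width, g + half_width)       # prediction interval
print(f"SE = {se:.3f}; CI {ci[0]:.3f} to {ci[1]:.3f}; PI {pred_int[0]:.3f} to {pred_int[1]:.3f}")
```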

Fig. 2

Forest plot showing the observed standardized mean differences (Hedges’ g) for SSRT, the random-effects model estimate on the right, and the results of the test for heterogeneity on the left. The dashed line on the overall effect estimate (diamond) represents the prediction interval, which is shown because heterogeneity was present. ¹Data were extracted from Bekker et al. (2005a, b) and van Dongen-Boomsma et al. (2010); ²Data were extracted from Linhartová et al. (2020) and Linhartová et al. (2021); ³Data were extracted from Nigg et al. (2005), Stavro et al. (2007) and Martel et al. (2017)

According to the Cook’s distances, none of the studies was overly influential. However, the study by Szekely et al. (2017) implementing the SST for fMRI had an absolute studentized residual larger than 3.113 and is therefore an outlier in the context of this model. Omitting this observation reduced \({\widehat{\tau}}^{2}\) to 0.000 and \({I}^{2}\) to 0.004% and increased \(g\) to 0.524 (95% CI 0.416 to 0.631). Linhartová et al. (2021) was the only study that allowed stable stimulant medication during testing, and in Chamberlain et al. (2007) patients received a placebo. To explore whether this might have influenced the results, we conducted a sensitivity analysis excluding these two studies. In this analysis, \(g\) decreased slightly to 0.498 (95% CI 0.354 to 0.641), while heterogeneity remained comparable to the original results (\({\widehat{\tau}}^{2}=0.034\), \({I}^{2}=34.44\%\)). This suggests that medication status in these two studies did not substantially affect the SSRT deficits observed in adult ADHD. Taken together, the random-effects meta-analysis showed moderate effect sizes (\(g = 0.509\) to \(0.524\)), with longer SSRTs in patients than in controls.
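The two diagnostic cutoffs defined in the Methods can be computed with Python’s standard library. For \(k = 27\), the Bonferroni-corrected normal percentile \(100\times(1-0.05/54)\) gives the cutoff of about 3.113 reported above. For the Cook’s distance rule, the sketch uses the "inclusive" quantile method, which corresponds to R’s default quantile type. Whether this matches the convention used in the analysis is an assumption. The function names and the example distances are hypothetical.

```python
from statistics import NormalDist, median, quantiles


def studentized_residual_cutoff(k, alpha=0.05):
    """Bonferroni-corrected two-sided cutoff: the 100 * (1 - alpha / (2k))th
    percentile of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / (2 * k))


def cooks_distance_cutoff(cooks_d):
    """Median plus six times the interquartile range of the Cook's distances."""
    q1, _, q3 = quantiles(cooks_d, n=4, method="inclusive")
    return median(cooks_d) + 6 * (q3 - q1)


print(round(studentized_residual_cutoff(27), 3))
print(cooks_distance_cutoff([0.01, 0.02, 0.03, 0.04, 0.05]))  # hypothetical values
```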

Publication Bias

Figure 3A depicts a funnel plot of the studies’ SMDs plotted against the inverse of the square root of the sample sizes. Egger’s regression test for funnel plot asymmetry was not significant (\(t\)(25) = 1.941, \(p\) = 0.064). The funnel plot appears to converge close to the mean estimate as the sample size increases. A normal quantile–quantile plot is shown in Fig. 3B. Most of the points in this plot fall inside the 95% confidence bands. However, the middle of the line is slightly skewed to the left, with several points outside the bands, which may indicate subtle publication bias.

Fig. 3

Plots for assessment of publication bias. A Funnel plot for SSRT plotting SMDs against the inverse of the square root of the sample size. B Normal quantile–quantile plot, plotting the quantiles of a standard normal distribution against the quantiles of the observed distribution. The points should fall on a straight line and inside the 95% confidence bands. ¹¹Cubillo et al. (2010), ²⁰Murphy et al. (2002), ²⁶Szekely et al. (2017)

Meta-Regression and Subgroup Analysis

Meta-regression analyses were conducted for continuous covariates (age, sex, IQ) and subgroup analyses for categorical covariates (RoB, SST validity, overall study quality, comorbidities, patient setting, medication status). Because the data from the fMRI study by Szekely et al. (2017) had been identified as an outlier, they were excluded from these analyses. In four of the studies (Bialystok et al., 2017; Cherkasova et al., 2014; Lampe et al., 2007; Meachon et al., 2021), only a subset of participants completed the SST. To assess whether this affected the robustness of the meta-regression results, the analysis was repeated excluding these four studies, which did not substantially change the outcome (Supplementary Tables 3 and 4). Table 6 provides an overview of the meta-regression analyses. Bialystok et al. (2017) reported demographic and outcome variables separately for monolinguals (ML) and bilinguals (BL) in each group; values for ML and BL were averaged to obtain a single value per group for the meta-regression analysis. The analysis revealed no significant effects of age, sex or IQ. Table 7 provides an overview of the subgroup analyses. Studies reporting that only some psychiatric comorbidities led to exclusion were also coded as “comorbidities allowed”. Data on years of education were sparse and heterogeneous and were therefore not included in the meta-regression. As only one study reported that participants were medicated during testing, medication status was also excluded from the analysis. Only 5 articles did not allow comorbidities in ADHD patients; therefore, the estimated mean SMD for these studies reported in Table 7 may not be robust. The same applies to the estimated SMD for the level “mixed” of the setting variable, as only 2 studies reported recruiting ADHD patients from both clinical and non-clinical settings.
Interestingly, the differences between the setting subgroups approached significance (p = 0.066, Table 7). Therefore, we conducted a follow-up analysis to explore whether there are significant differences between studies with clinical and non-clinical settings only, which was not the case (p = 0.171). The analysis of study quality revealed that both RoB assessment and SST validity ratings did not significantly moderate the SSRT. For RoB, the estimated effect was largest for studies with low RoB \(\left(g=0.651\right)\) and smallest for studies with high RoB \(\left(g=0.531\right)\). However, only one study was classified as having a low RoB, so the result for this category should be interpreted with caution. The group of studies with low SST validity showed the largest average effect size \(\left(g=0.556\right)\), whereas the group of studies with high SST validity showed the smallest average effect size \(\left(g=0.415\right)\). There were only two studies with high SST validity, which limits the reliability of the result for this category. Similar to RoB, the study quality did not significantly moderate SSRT, with an effect size \(g=0.49\) for studies with moderate to high overall quality ratings. Forest plots with subgroups are shown in Supplementary Figs. 1, 2, and 3. In summary, our analysis did not reveal variables that significantly moderated the SSRT deficits in adult ADHD.

Table 6 Meta-regression analyses for SSRT
Table 7 Subgroup analysis for SSRT

Secondary Outcome Measures of the SST

Fifteen studies reported the percentage of stop commissions (Supplementary Fig. 4); 7 studies reported the percentage of choice errors (Supplementary Fig. 5); 9 studies reported omission errors (Supplementary Fig. 6); and 8 studies reported go accuracy (Supplementary Fig. 7). Analysis of the secondary SST outcome measures revealed no significant differences between patients and controls with respect to stop commissions (\(g=0.142\), \(p=0.064\)) and choice errors (\(g=0.242\), \(p=0.078\)). However, ADHD patients made significantly more omission errors (\(g=0.418\), \(p=0.01\)) and had a significantly lower go accuracy (\(g=-0.385\), \(p<0.008\)). A more detailed description of the results is provided in Supplementary Text 3.

Discussion

In this systematic review and meta-analysis, we integrated data from 27 studies that examined the stop-signal task in adult ADHD. The analysis revealed inhibitory control deficits, expressed as prolonged SSRTs, with a moderate effect size of \(g\) = 0.51. These deficits were not significantly moderated by study quality, sample characteristics, or clinical parameters. In addition, the analyses of secondary outcome measures revealed more omission errors and reduced go accuracy in patients, although only a few studies (n < 10) were available for these measures.

Behavioral Inhibition Deficits in Stop-Signal Response Times

The main finding of our meta-analysis is that patients with adult ADHD reliably show moderate SSRT deficits. The magnitude of these deficits is consistent with the results of a previous meta-analysis that included a much smaller number of studies in adult ADHD (Lipszyc & Schachar, 2010). Based on 27 studies, our meta-analysis establishes the SSRT as a reliable measure for assessing inhibitory control deficits in adult ADHD. Extending previous work, we evaluated the quality of the SST using the recommendations of a recent consensus paper (Verbruggen et al., 2019) and estimated the risk of bias for each study. The large number of observations allowed us to examine whether study quality (RoB and SST validity), demographics (age and gender), IQ, or clinical parameters (comorbidities and setting) influenced SSRT deficits in patients. To this end, we computed meta-regression and subgroup analyses including all studies that reported the respective variables. Surprisingly, none of these variables significantly influenced the magnitude of SSRT deficits. This suggests that prolonged SSRTs in patients can be observed experimentally even when study quality and other parameters are not optimal. The absence of significant moderators further suggests that inhibitory control deficits may be a phenotype in adult ADHD.
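Meta-regression relates study-level effect sizes to a continuous moderator such as mean age or IQ. The following is a simplified sketch assuming a single moderator and a residual between-study variance (tau²) that has already been estimated and is treated as known; published analyses typically rely on dedicated software and may apply further adjustments (e.g., the Knapp-Hartung correction), so this is not the exact procedure used here. The function name is hypothetical.

```python
import math


def meta_regression_slope(effects, variances, moderator, tau2):
    """Weighted least-squares slope of effect size on one study-level moderator.

    Weights are 1 / (v_i + tau2); tau2 is assumed to be pre-estimated.
    Returns (slope, standard error, two-sided p-value from a z test).
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    # Weighted means of the moderator and of the effect sizes.
    x_bar = sum(wi * x for wi, x in zip(w, moderator)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, moderator))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, moderator, effects))
    slope = sxy / sxx
    se = math.sqrt(1.0 / sxx)  # valid when the weights are known inverse variances
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return slope, se, p
```

Because the moderator here is a study-level average, a non-significant slope does not rule out associations at the participant level, which is the ecological-bias caveat raised in the limitations.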

Another important question is how SSRT deficits relate to clinical symptoms in adult ADHD. In a large-scale study, Kamradt et al. (2014) examined correlations between patients' SSRTs and ratings of current inattentive and hyperactive-impulsive symptoms and of executive functions. They found significant moderate relationships between SSRTs and all symptom domains (r = 0.23 to 0.30). Using a hierarchical linear regression model that included other neuropsychological paradigms and demographic covariates, the authors found that only the SSRT and the continuous performance test predicted total symptom scores for inattention and hyperactivity-impulsivity. Similarly, Stavro, Nigg and colleagues (Nigg et al., 2005; Stavro et al., 2007) found moderate relationships (r = 0.29) between SSRT deficits and executive functions, as expressed in inattentive-disorganized and hyperactive-impulsive symptoms. Although inhibitory control deficits are frequently reported in empirical studies and are related to clinical symptoms, they are not well reflected in the diagnostic criteria for adult ADHD. As a result, these deficits may often be overlooked during the diagnostic process and consequently remain untreated, for example by neurocognitive training.

It is important to note that SSRT deficits are found not only in ADHD but also in other psychiatric disorders, such as obsessive–compulsive disorder (OCD), addiction, and schizophrenia (Lipszyc & Schachar, 2010; Smith et al., 2014). For example, Lipszyc and Schachar (2010) observed SSRT deficits with effect sizes of \(g\) = 0.77 in OCD and \(g\) = 0.69 in schizophrenia. However, these analyses included only a few studies (N = 4 per group), and an updated evaluation of the SSRT in these groups would be desirable. Given this overlap in inhibitory deficits across disorders, the SST is unlikely to help differentiate between them diagnostically. We therefore suggest that the SST could be used to quantify inhibitory control deficits in adult ADHD after excluding other psychiatric disorders in which such deficits have been reported. Besides the SST, other paradigms, such as the go/no-go task, may be useful in assessing response inhibition in adult ADHD. For instance, a meta-analysis of the go/no-go task that combined child, adolescent, and adult studies found deficits with a moderate effect size of \(g\) = 0.49 (Wright et al., 2014), comparable to the SSRT deficits in the current analysis. In conclusion, the finding of reliable moderate SSRT deficits suggests that the SST may become a valuable tool for the neuropsychological assessment of inhibitory control deficits in adult ADHD. To this end, it would be desirable to collect SST data from large samples of participants to obtain normative SSRT distributions that take age, gender, and education into account. An individual’s SST performance could then be compared with such a normative sample.
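Once normative data exist, comparing an individual's SSRT with a normative sample could be as simple as computing a z-score. The sketch below assumes approximately normal, age-stratified norms; the age bands and mean/SD values are hypothetical placeholders, not published norms, and empirical percentiles would be preferable if SSRT distributions turn out to be skewed.

```python
from statistics import NormalDist

# Hypothetical normative SSRT values (mean, SD in ms) by age band.
# Placeholders only: real norms would come from large samples stratified
# by age, gender and education.
HYPOTHETICAL_NORMS = {
    (18, 39): (210.0, 35.0),
    (40, 64): (225.0, 40.0),
}


def ssrt_z_score(ssrt_ms, age, norms=HYPOTHETICAL_NORMS):
    """Return (z, percentile) of an SSRT relative to an age-matched norm.

    Longer SSRT = slower stopping, so a high percentile means poorer
    inhibitory control than most of the normative sample.
    """
    for (low, high), (mean, sd) in norms.items():
        if low <= age <= high:
            z = (ssrt_ms - mean) / sd
            # Assumes normally distributed SSRTs in the normative sample.
            percentile = 100.0 * NormalDist().cdf(z)
            return z, percentile
    raise ValueError(f"no normative band covers age {age}")
```

With these placeholder norms, a 30-year-old with an SSRT of 280 ms would lie two standard deviations above the normative mean, i.e., slower than roughly 98% of the reference group.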

Behavioral Inhibition Deficits in Secondary Measures of the SST

In addition to the SSRT, we computed meta-analyses for stop commission errors, go discrimination (choice) errors, go omission errors, and go accuracy. These analyses revealed small-to-moderate increases in omission errors (\(g=0.418\)) and reductions in go accuracy (\(g=-0.385\)) in patients. However, only a few studies reported omission errors (n = 9) or go accuracy (n = 8), so these findings should be interpreted as preliminary evidence.

For omission errors, the largest effect size (\(g=0.73\)) was reported by Roberts et al. (2011), in which 30 adult patients with ADHD and 28 control subjects performed a classical SST paradigm (Logan et al., 1984). In contrast, Bialystok et al. (2017) reported a negative, albeit nonsignificant, effect (\(g=-0.18\)). In their study, monolingual and bilingual patients (n = 28 monolingual, n = 28 bilingual) and controls (n = 36 monolingual, n = 37 bilingual) performed a slightly modified version of the SST. Hence, although effect sizes varied across the included studies, on average there were significant small-to-moderate group differences in omission errors.

Similar variability was observed for go accuracy: the largest group difference (\(g=-0.64\)) was reported by Epstein et al. (2001), whereas Szekely et al. (2017; fMRI study) observed an effect in the opposite direction (\(g=0.24\)). The latter study can be considered an outlier; excluding it from the meta-analysis increased the magnitude of the pooled effect to g = −0.488. In summary, there is some evidence that, in addition to the SSRT, omission errors and go accuracy in the SST also reflect neurocognitive deficits in adult ADHD. Because these deficits are restricted to go trials, they suggest an inability to maintain an ongoing response, which is indicative of attentional difficulties. This is in line with previous reports of sustained and focused attention deficits in adult ADHD (Marchetta et al., 2008). Future studies should analyze and report the secondary measures of the SST, so that they can be submitted to an updated meta-analysis with a larger number of observations.
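Re-estimating the pooled effect after removing a suspected outlier corresponds to a leave-one-out sensitivity analysis. The sketch below assumes a DerSimonian-Laird random-effects model, which is one common choice but not necessarily the estimator used in this meta-analysis; the function names are hypothetical, and any study labels or values passed to it are illustrative.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate with the DerSimonian-Laird tau^2.

    Returns (pooled effect, its standard error, tau^2).
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


def leave_one_out(effects, variances, labels):
    """Re-pool the effect once per study, each time omitting that study."""
    results = {}
    for i, label in enumerate(labels):
        rest_y = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results[label] = dersimonian_laird(rest_y, rest_v)[0]
    return results
```

A study whose omission shifts the pooled g noticeably, as described above for Szekely et al. (2017), would stand out in the returned dictionary; whether to exclude it remains a substantive judgment rather than a purely statistical one.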

Limitations

This review has some limitations. First, although we used an adapted version of the search syntax proposed by Lipszyc and Schachar (2010) to ensure compatibility with previous reviews, the search strategy may have missed relevant studies that used other terms. To identify all studies that met our selection criteria, we thoroughly scanned the reference lists of the preselected empirical articles, previous meta-analyses, and systematic reviews. Second, the literature search was restricted to peer-reviewed articles written in English or German, which excluded unpublished articles and those published in non-commercial form. Therefore, publication bias cannot be ruled out. Third, meta-regression analyses based on study-level averages, such as the mean age of the total study sample, carry the risk of ecological bias. For example, age may be correlated with the outcome within studies (e.g., Congdon et al., 2014) but not across studies, or vice versa (Higgins & Thompson, 2002). For this reason, we cannot completely rule out that demographic or clinical variables influence SST performance within individual studies. Fourth, the validity assessment of the SST revealed that most studies did not use cut-offs to identify invalid task behavior. Adults with ADHD have been shown to often fail performance validity measures, suggesting that some participants in the included studies might have intentionally performed the task poorly to feign cognitive deficits (Marshall et al., 2010, 2016). Therefore, the results of the SSRT meta-analysis may be biased to some extent. Finally, the quality of most of the studies included in our meta-analysis was not optimal. We therefore suggest that future studies follow recently published best-practice recommendations for the design, implementation, analysis, and reporting of the SST (Verbruggen et al., 2019) and apply the adapted Hombrados and Waddington criteria to ensure that a representative clinical sample is assessed (Hulsbosch et al., 2021).

Conclusion

This systematic review and meta-analysis revealed reliable moderate deficits in inhibitory control, as reflected in the SST, in adult ADHD. Our meta-regression and subgroup analyses further showed no significant contribution of demographic or study quality variables to the observed group differences in SSRTs. This suggests that inhibitory control deficits can be considered a phenotype in adult ADHD. Our findings suggest that the SST, in conjunction with other neurocognitive tests and clinical questionnaires, could become an important tool for assessing inhibitory control deficits in adult ADHD.