Introduction

Tourette syndrome (TS) and chronic (motor or vocal) tic disorder (CTD) are neurodevelopmental disorders characterised by the presence of motor and/or vocal tics lasting for at least 1 year [1]. TS/CTD are associated with functional impairment in multiple domains, including everyday life and activities, social adjustment, and academic performance [2, 3]. Additionally, TS/CTD are also associated with high rates of psychiatric comorbidity, which further add to the patients’ burden [4].

There are several treatment options available for patients with TS/CTD. Treatment guidelines from both Europe and North America recommend behaviour therapy (BT) as the first-line intervention, based on a combination of clinical trial evidence and expert consensus [5,6,7]. Out of several available BT modalities, comprehensive behavioural intervention for tics (CBIT) has most evidence to date [8]. CBIT, which includes habit reversal training (HRT) as its primary component, has been successfully evaluated in several randomised controlled trials (RCTs) in both paediatric and adult samples [9,10,11]. There is also some evidence for the use of exposure with response prevention (ERP) [12]. European and Canadian treatment guidelines recommend both CBIT/HRT and ERP as first-line interventions for patients with TS/CTD, whilst American treatment guidelines primarily recommend CBIT/HRT [5,6,7].

Efficacy trials aim to evaluate treatment effects under extensively controlled conditions, which entail several methodological advantages and generate high internal validity. However, the efforts for achieving high internal validity might result in substantial deviations from regular clinical practice, affecting the external validity and the generalisability of results. Examples of factors contributing to low external validity in efficacy trials include using specific participant eligibility criteria (e.g., excluding participants with certain psychiatric or somatic comorbidities), delivering an intervention in a highly standardised way (e.g., using specific treatment protocols with a pre-determined number of sessions and highly experienced/trained therapists), restricting concurrent interventions (e.g., excluding individuals on psychotropic medications) or using extensive resources to maximize patient compliance with the intervention (e.g., reminding patients about appointments) [13]. For example, the two largest RCTs of BT for TS/CTD excluded participants with comorbid autism and used specific treatment protocols and highly trained therapists, although allowed for the inclusion of participants with most other psychiatric comorbidities and on concurrent tic medication (if stable) [9, 11].

Effectiveness trials (sometimes referred to as pragmatic trials), on the other hand, aim to evaluate treatment effects when delivered in regular clinical practice. To ensure generalisability of the results to the full spectrum of the population to which the treatment might be applied, the treatments are delivered more flexibly, the patients’ compliance may vary, and there are generally no strict inclusion and exclusion criteria [14]. Perhaps unsurprisingly, meta-analyses including studies from a variety of fields indicate that treatment effects are generally larger in efficacy trials, compared to effectiveness trials [15, 16].

To our knowledge, only one study has evaluated the effectiveness of BT for young people with TS/CTD [17]. In a retrospective chart review conducted at a specialised paediatric clinic, CBIT was found to be potentially effective in reducing tic severity in a small sample of patients (N = 10). However, larger studies are needed to more thoroughly evaluate how the results from previous efficacy trials [9, 11, 12] generalise to naturalistic clinical settings.

Another common issue in many studies of BT for TS/CTD is that follow-ups have been relatively short (up to 6 months after the end of treatment) [9, 11, 12] or restricted to initial treatment responders only [9, 11]. The few available long-term follow-up studies have generally been small (N= 23–32) [10, 18]. Long-term follow-ups are especially important in this patient group given that tics are highly influenced by contextual factors and have a natural tendency to wax and wane over time [1].

The aim of this study was two-folded. First, we evaluated the effectiveness of BT in a consecutive sample of youth with TS/CTD referred to a specialist clinic. Second, we investigated whether the treatment gains were maintained at long-term follow-up (12 months after the end of the treatment).

Methods

Study Design and Setting

This was an open longitudinal study in a naturalistic clinical setting. The study was conducted at a single site, a specialist outpatient clinic for obsessive–compulsive and related disorders, including TS/CTD, within the child and adolescent mental health services (CAMHS) in Stockholm, Sweden. The Swedish health care system is mainly government-funded and universal for all citizens. Our specialist clinic primarily receives referrals from local CAMHS services (Stockholm region), but in some cases also from other parts of Sweden. All consecutive patients referred to the clinic with a diagnosis of TS or CTD (together with their parents/legal guardians, henceforth referred to as parents) were invited to participate in the study, and those agreeing provided written informed consent. The study was approved by the Swedish Ethical Review Authority (reference number: 2015/1977-31/4 [2019-02132]).

Procedure

Before their initial assessment at the clinic, all patients and their parents completed multiple standardised self- and parent-reported questionnaires online (see section Time Points and Outcome Measures). The initial assessment consisted of a 3-h evaluation at the clinic, where experienced clinical psychologists conducted a full psychopathological screening using the Mini-International Neuropsychiatric Interview for Children and Adolescents (MINI-KID) [19] and supplementary modules for obsessive–compulsive and related disorders. Each patient was then discussed by a multidisciplinary team of clinical child psychologists, child psychiatrists, and nurses, using all collected information from the interviews and questionnaires to diagnose patients according to the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10) [20] and the 5th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [1]. After the initial assessment, patients were either offered treatment at our specialist clinic, referred to more suitable services (e.g., according to other clinical priorities) or discharged (e.g., if symptoms were too mild or in remission). Treatment started as soon as a therapist was available, usually within a few weeks.

Time Points and Outcome Measures

Data were collected at intake for all families. For patients who received treatment at the specialist clinic, data were also collected directly after treatment (post-treatment) and at 3-, 6-, and 12-month follow-up post-treatment. Because this was a naturalistic study, some patients were still in follow-up when the final data extraction was performed on 30 April 2020, which accounted for some of the data missing at later time points.

Data collection included a range of clinician-, self-, and parent-reported measures (see below). All measures were repeatedly collected at all time points. Clinician-rated measures were administered by the clinicians responsible for the initial assessment (baseline) as well as by the therapists providing treatment (all subsequent time points). All self- and parent-reported questionnaires were administered digitally via an online service.

Clinician-Rated Measures

The Yale Global Tic Severity Scale (YGTSS) [21] is a clinician-rated semi-structured interview which assesses tic severity and tic-related impairment. It is the most commonly used outcome measure in TS/CTD clinical trials [22]. The interview consists of three major parts: a symptom checklist, a tic severity rating, and a tic-related impairment rating. The total score of 0–100 points can be divided into several subscores, namely the Motor Tic Severity Score (0–25 points), the Vocal Tic Severity Score (0–25 points), the Total Tic Severity Score (YGTSS-TTSS; the two former scores combined, 0–50 points), and the Impairment Score (0–50 points). The YGTSS has demonstrated good psychometric properties [21, 23, 24].

The Clinical Global Impression-Improvement scale (CGI-I) [25] is a single-item clinician rating of symptom improvement. In this study, the CGI-I provided a rating of tic disorder symptom improvement compared to the baseline time point, with a range from 1 (‘very much improved’) to 7 (‘very much worse’). In line with previous TS/CTD trials [9, 11], ratings of 1 (‘very much improved’) and 2 (‘much improved’) were used to define treatment response.

The Children’s Global Assessment Scale (CGAS) [26] was also administered to provide a clinician rating of general functioning (not only restricted to tic disorder symptoms). The CGAS consists of a single item ranging from 1 to 100, where 100 indicates the best possible functioning.

Self-Reported Measures

The Premonitory Urge for Tics Scale (PUTS) is a 9-item (9-36 points) measure of premonitory urges. The questionnaire has shown good psychometric properties [27, 28]. The Gilles de la Tourette Syndrome-Quality of Life scale (GTS-QOL) [29] is a 27-item (0–108 points) disease-specific measure of health-related quality of life. Since the child and adolescent versions of the questionnaire had not yet been published when the data collection started, the adult version was used in this study. The Obsessive Compulsive Inventory-Child Version (OCI-CV) [30] is a 21-item (0–42 points) multi-dimensional measure of obsessive–compulsive symptoms. The questionnaire has demonstrated good psychometric properties [30]. Both the Children’s Depression Inventory-Short version (CDI-S; 10 items, 0–20 points) [31] and the Short Mood and Feelings Questionnaire-Child Version (SMFQ-CV; 13 items, 0–26 points) [32] were used in the study as self-reported measures of depressive symptoms, with the SMFQ-CV gradually replacing the CDI-S over time. For analyses, both depression measures were combined using z-scores. Lastly, the Work and Social Adjustment Scale-Youth version (WSAS-Y) [33], a psychometrically sound, 5-item (0–40 points) measure of functional impairment in several areas, was included.

Parent-Reported Measures

The Parent Tic Questionnaire (PTQ) [34] is a parent-rated measure of tic severity consisting of two separate lists of common motor and vocal tics. For each present tic, parents rate its frequency and intensity, generating a total score ranging from 0 to 224 points. The questionnaire has established psychometric properties [34]. Parents also rated their children’s depressive symptoms with the SMFQ-Parent Version (SMFQ-PV; 13 items, 0–26 points) [32] and global functional impairment with the WSAS-Parent version (WSAS-P; 5 items, 0–40 points) [33].

Interventions

BT was delivered according to published treatment manuals [35, 36] further supplemented by locally developed patient worksheets. The choice of BT modality (HRT or ERP) was primarily according to therapist preference, even though the patient and their family were also invited to express their preference. In HRT, the patients practiced to become more aware of their tics and to perform competing responses (i.e., voluntary behaviours physically incompatible with the tics) aimed at interrupting tic occurrence. In ERP, patients practiced to suppress tics (response prevention) for increased periods of time. In addition, patients intentionally provoked their premonitory urges (i.e., unpleasant sensations typically preceding tic occurrence) to make tic suppression more difficult (exposure), with the aim of increasing their tic suppression abilities. Along with psychoeducation and the core elements of HRT or ERP, additional therapeutic strategies could be added by the therapist, such as relaxation training and interventions based on functional assessment (as included in CBIT) [35]. BT treatments were by default 10 sessions long, but could be tailored (shortened or extended) to each patient’s needs. Each therapy session generally lasted 1 h. Therapists were clinical psychologists (5 years of master level studies and a minimum of 1 year of clinical training) trained in BT with extensive experience in the treatment of obsessive–compulsive and related disorders, including TS/CTD, or trainee psychologists or clinical psychology master students under supervision of a senior clinical psychologist. Throughout the study period, more senior clinical psychologists with extensive experience in delivering BT for TS/CTD were readily available for consultation.

Additionally, some patients were on medication for their tics. This included prescriptions prior to being referred to the clinic, as well as new prescriptions by a child and adolescent psychiatrist at the clinic, according to clinical judgement and treatment guidelines.

Statistical Analyses

Mixed-effects regression models for repeated measures using maximum likelihood estimation (MLE) of parameters were implemented on all continuous outcome measures. The models included fixed effects of time and random intercepts for the participant effects. The primary model included the baseline and post-treatment time points. Furthermore, to investigate the durability of the treatment effects, a second model was fitted which included the post-treatment, 3-, 6-, and 12-month follow-up time points. For graphical representation purposes only, a third model was also fitted including all time points from baseline to the 12-month follow-up. Significance thresholds (two-tailed) were set to p < 0.05. Bootstrapped within-group effect-sizes (d), derived from mixed-effects regression models, were calculated with the m_effectsize command in Stata developed by the Karolinska Institutet Biostatistics Core and available at www.imm.ki.se/biostatistics/stata. All analyses were performed using Stata 14.2 (StataCorp LLC). Sample sizes may vary for some of the analyses as a result of missing data. Missing data points are specified for each measure in all tables.

Results

Study Participants and Treatment Completion

Figure 1 shows the study participants’ flow. A total of 110 patients with TS/CTD provided informed consent for the study between 1 January 2015 and 27 January 2020. Seventy-four participants received BT for TS/CTD at the specialist clinic, of which 46 received psychoeducation plus ERP, 14 received psychoeducation plus HRT, and 14 received other BT: psychoeducation plus a combination of ERP and HRT (n = 7) or psychoeducation plus ERP/HRT treatment rationales only (n = 7; when judged sufficient for the participant’s clinical needs). The 74 participants completed an average of 7.07 (SD = 3.01, range = 2–16) BT sessions between baseline and the post-treatment time point. Figure 1 further shows the availability of TS/CTD data for each time point. For the 74 participants, follow-up data continued to be collected until 30 April 2020.

Fig. 1
figure 1

Study participants’ flow. A Participants were only excluded if all of the following data were missing at the specified time point: Yale Global Tic Severity Scale, Parent Tic Questionnaire, Premonitory Urge for Tics Scale, and Gilles de la Tourette Syndrome-Quality of Life scale; B During the study period, two participants received two (non-overlapping) rounds of treatment and 12-month long-term follow-up. In these cases, only the first round was included in the study; C Data are listed as available if any of the following data were available at the specified time point: Yale Global Tic Severity Scale, Parent Tic Questionnaire, Premonitory Urge for Tics Scale, and Gilles de la Tourette Syndrome-Quality of Life scale; D Data are listed as missing if all of the following data were missing at the specified time point: Yale Global Tic Severity Scale, Parent Tic Questionnaire, Premonitory Urge for Tics Scale, and Gilles de la Tourette Syndrome-Quality of Life scale; E Participants were still in follow-up at the time of the data freeze, hence they never reached this time point. BT behaviour therapy, CTD chronic tic disorder, ERP exposure with response prevention, FU 3-, 6-, and 12-month follow-up time points, HRT habit reversal training, OCD obsessive–compulsive disorder, post post-treatment time point, psychoed psychoeducation, TS Tourette syndrome

Table 1 shows baseline demographic and clinical characteristics for the total BT sample and by BT modality. Supplementary Table S1 further shows the proportion of specific motor and vocal tics at baseline, according to the YGTSS symptom checklist.

Table 1 Baseline demographic and clinical characteristics of the total BT sample and by BT modality

Clinician-Rated Measures and Treatment Response

Mixed-effects regression analyses at post-treatment showed a significant reduction on the YGTSS-TTSS for the total BT sample (coefficient [95% CI] − 7.74 [− 9.33 to − 6.16], p < 0.001), as well as for each BT modality (ERP: − 7.90 [− 9.81 to − 5.99], p < 0.001; HRT: − 6.09 [− 9.87 to − 2.31], p = 0.002; other BT: − 8.81 [− 12.72 to − 4.91], p < 0.001). Bootstrapped effect sizes (d) were the following: 1.03 (0.78 to 1.29) for the total sample, 1.09 (0.75 to 1.43) for the ERP group, 0.82 (0.19 to 1.44) for HRT, and 1.13 (0.59 to 1.66) for patients receiving other BT. Table 2 shows detailed statistics for the YGTSS and other TS/CTD-specific measures at post-treatment for the total BT sample. Supplementary Tables S2–S4 show the equivalent information separately for each BT modality.

Table 2 Detailed statistics for TS/CTD-specific measures at post-treatment for the total BT sample

At post-treatment, 38 out of 74 participants (57%, available data for n = 67) were classified as treatment responders according to the CGI-I. Numbers were too small to meaningfully report on the specific BT modalities separately.

Between baseline and the post-treatment time point, 8 out of 74 participants (11%) changed their medication (α2-agonists or antipsychotics): 4 participants (5%) either increased their dosage or started medication, 2 participants (3%) switched from one compound to another, and 2 participants (3%) either decreased their dosage or stopped taking medication for their tics. A mixed-effects regression analysis excluding these 8 participants from the total BT sample showed a similarly significant reduction on the YGTSS-TTSS as in the main model (coefficient [95% CI] − 7.60 [− 9.34 to − 5.85], p < 0.001, bootstrapped d = 1.01 [0.76 to 1.26]).

Mixed-effects regression analyses for the YGTSS Impairment Score and the CGAS both showed significant improvements at post-treatment for the total BT sample and each BT modality separately (Table 2 and Supplementary Tables S5–S8).

Self- and Parent-Reported Measures

Mixed-effects regression analyses for the PTQ, the GTS-QOL, the OCI-CV, the CDI-S and the SMFQ-CV combined to z-scores, the SMFQ-PV, the WSAS-Y, and the WSAS-P, showed significant improvements at post-treatment for the total BT sample (Table 2 and Supplementary Table S5). The results for the specific BT modalities are presented in Supplementary Tables S2–S4 and S6–S8, but should be interpreted with caution given the small sample sizes.

Long-Term Follow-Up

Mixed-effects regression analyses showed a significant improvement between post-treatment and the 12-month follow-up on the YGTSS-TTSS for the total BT sample (coefficient [95% CI] − 1.90 [− 3.60 to − 0.20], p = 0.029). Similarly, there was a continued improvement on the YGTSS Impairment Score and the CGAS for the total BT sample (Table 3 and Supplementary Table S9). A third model including all five time points was also implemented in order to obtain a graphical representation of the YGTSS-TTSS for the total BT sample from baseline to the 12-month follow-up (coefficient [95% CI] − 9.60 [− 11.35 to − 7.84], p < 0.001; Fig. 2).

Table 3 Detailed statistics for TS/CTD-specific measures for the long-term follow-up period for the total BT sample
Fig. 2
figure 2

Estimated means on the YGTSS-TTSS for the total BT sample, from a mixed-effects regression model including all five time points. Error bars indicate 95% confidence intervals. BT behaviour therapy, YGTSS Yale Global Tic Severity Scale, YGTSS-TTSS Yale Global Tic Severity Scale-Total Tic Severity Score

At the time of the data freeze, 11 participants were still active at the clinic and had not yet reached the 12-month follow-up. For the remaining 63 participants, 39 (75%, available data for n = 52) were classified as treatment responders at the 12-month follow-up. Numbers were too small to report on the specific BT modalities. Between the post-treatment and the 12-month follow-up time points, the 63 participants received on average 3.02 additional follow-up sessions (SD = 1.64, range = 0–10, complete data). The follow-up sessions primarily consisted of a YGTSS administration and brief clinical advice to promote the maintenance of the treatment effects. For participants who received more than 3 follow-up sessions, the additional sessions were considered as pure booster sessions focused on improving TS/CTD symptoms.

Additionally, comparing data from the post-treatment and the 12-month follow-up time points, 4 out of 74 participants (5%; available data for n = 53) changed their medication (α2-agonists or antipsychotics): 2 participants (3%) either increased their dosage or started medication, 1 participant (1%) switched from one compound to another, and 1 participant (1%) decreased his dosage. A mixed-effects regression analysis excluding these 4 participants from the total BT sample showed a similar continued improvement between the post-treatment and the 12-month follow-up time points on the YGTSS-TTSS as in the main model (coefficient [95% CI]  − 1.86 [− 3.64 to − 0.07], p = 0.041).

Regarding the self- and parent-reported measures, mixed-effects regression analyses showed that the gains were maintained (i.e., unchanged) between post-treatment and the 12-month follow-up on the PTQ, the PUTS, the OCI-CV, the SMFQ-PV, the WSAS-Y, and the WSAS-P. For the GTS-QOL and the CDI-S and the SMFQ-CV combined to z-scores, mixed-effects regression analyses showed that symptoms significantly deteriorated between the same time points. Full details are presented in Table 3 and Supplementary Table S9. Note that long-term follow-up analyses only were performed for the total BT sample due to an increased amount of missing data in the later time points.

Discussion

This study evaluated the effectiveness of BT for young people with TS/CTD in a naturalistic setting, a TS/CTD specialist outpatient clinic in Stockholm, Sweden. Additionally, the study examined the long-term maintenance (up to 1 year after the end of treatment) of the treatment effects, regardless of whether the participants had initially responded to treatment.

Baseline participant characteristics were in many respects consistent with existing literature [9, 11, 12, 37], including the age of tic onset, mean age, sex ratio, type of tic disorder ratio, and the mean baseline YGTSS-TTSS score. A majority of the participants received ERP (62%) compared to HRT (19%) and other BT (19%). The therapists’ preference for ERP may have been influenced by various factors, including their participation in an ERP training workshop during the study period, their familiarity with ERP from treating other related disorders (primarily obsessive–compulsive disorder), and that in Europe ERP is generally more frequently used than HRT/CBIT.

Tic severity and tic-related impairment (as measured by the YGTSS) improved significantly for both the total BT sample and each separate BT modality, with within-group effect sizes consistently in the large range. The estimated reduction of 7.7 raw points on the YGTSS-TTSS for the total BT sample at post-treatment was in line with several major RCTs of BT for TS/CTD (range = 3.4-8.5 raw points) [9, 11, 12]. Further, the 7.7 raw points equals an average tic severity reduction of 33%, which is well above the proposed 25% reduction corresponding to a clinically meaningful change [38]. The treatment response rate of 57% at post-treatment was comparable to the 53% reported for CBIT/HRT in the largest RCT to date [9]. Thus, we did not observe worse outcomes in a naturalistic setting, compared to controlled settings, as reported for other common psychiatric disorders [15, 16].

Results from self- and parent-reported measures were also generally positive, with significant improvements on measures of tic severity, TS/CTD-specific health-related quality of life, obsessive–compulsive symptoms, depression, and global functional impairment. More specifically, the estimated reduction of 14.6 raw points on the PTQ was above the proposed 10-point reduction indicative for positive clinical change [39]. Percentage wise, however, the 39% reduction on the PTQ was below the proposed 45% to 55% range for positive treatment response [39]. Similarly to previous trials of BT for TS/CTD [18, 40, 41], we did not observe significant improvements on the PUTS, suggesting that premonitory urges are not amenable to modification with BT.

Regarding long-term durability of the treatment effects, tic severity and tic-related impairment (as measured by the YGTSS) improved further between post-treatment and the 12-month follow-up. The treatment response rate for the total BT sample increased from 57% at post-treatment to 75% at the 12-month follow-up. Taken together, these results indicate that the effects of BT delivered in a naturalistic specialist clinical setting are maintained for at least 1 year after the end of treatment. Thus, this study provides important new information, as published RCTs have typically not followed patients for longer than 6 months after the end of treatment and have only followed treatment responders [9, 11].

This study had several strengths, primarily related to its naturalistic design and the inclusion of a long-term follow-up. External validity was arguably higher than in the typical RCT given that treatment was provided in a naturalistic setting, no specific eligibility criteria were used (e.g., 11% of the sample had comorbid autism spectrum disorder), and concurrent interventions (such as medication) were allowed. However, this study also had limitations. By definition, the naturalistic design did not control for the natural passage of time, meaning that participants could potentially have improved due to reasons unrelated to the interventions provided, such as ‘regression to the mean’ or the natural waxing and waning of tics. The study is further limited by missing data, especially for self- and parent-reported measures and later follow-up time points. Clinician-rated measures (including the YGTSS) were administered by the therapists providing treatment, rather than independent assessors. A small proportion of participants were on TS/CTD medication or changed their medication during the study period, but sensitivity analyses excluding these participants did not modify the results. We recruited a consecutive sample of patients referred to our clinic but the proportion of individuals who did not consent to participation is unknown because we did not have ethical approval to save their data. However, in our experience, the vast majority of families agreed to be included in the study. Lastly, it is important to note that, even if the study design was naturalistic, the patients were still seen in a specialist setting and treated by trained therapists using evidence-based treatment manuals. The results might therefore not be generalisable to general CAMHS or other healthcare settings.

Summary

To our knowledge, the current study is the largest to date evaluating BT for young people with TS/CTD in a naturalistic setting. The results confirmed that BT is an effective and durable treatment for young people with TS/CTD in a specialist clinical context, with effects comparable to those reported in RCTs.