Participants in this study were part of the Mind My Mind trial (Trial ID: NCT03535805). The details of the trial are described elsewhere [17, 18]. Briefly, the methods and study population are as follows.
The trial was designed to evaluate the effectiveness and cost-effectiveness of a new transdiagnostic modular cognitive and behavioral treatment versus treatment-as-usual for school-aged children with emotional and/or behavioral disturbances. The program comprises 9–13 weekly individual sessions targeting anxiety, depression and/or behavioral problems. Treatment-as-usual varied, as the children could receive anonymous counseling, pedagogical advice, network meetings, educational support, or psychological treatment of various kinds, either publicly or privately funded, or no further treatment.
The trial was advertised to professionals and parents in the community through pamphlets and the intranet/internet, and recruitment was based on the parents' initiative to seek help in collaboration with professionals such as the schoolteacher, nurse or psychologist. The inclusion procedure comprised minimum scores derived from the parent-reported SDQ and a clinical interview with a psychologist. The minimum SDQ scores follow a screening algorithm designed to identify children with mental health problems in need of an intervention, which is described in more detail elsewhere. To be included in the trial, the child had to have a primary problem within the domains of anxiety, depressive symptoms or behavioral problems, according to the classification by the psychologist conducting the interview. Children with a prior diagnosis of a mental disorder and children with an indication of severe mental disorders (e.g., signs of a full syndrome of ADHD and autism spectrum disorder) were excluded. Children whose parents did not understand and speak Danish sufficiently to participate in the trial were also excluded. Based on a sample size calculation and a pilot trial, a total of 396 children aged 6–15 years from four Danish municipalities across the country (Helsingør, Holstebro, Næstved and Vordingborg) were included and individually randomized to the intervention or treatment-as-usual in the four community mental health care settings.
Data were collected via an online platform at baseline and at end-of-treatment 18 weeks later. Children, and parents as proxies for the children, completed the SDQ, KIDSCREEN-27 (KIDSCREEN) and CHU9D at both time points. This study focuses solely on parental responses, since the SDQ was not completed by children younger than 11 years (who account for 48% of the study population). All three questionnaires were administered in Danish using validated translations [7, 20, 21].
The strengths and difficulties questionnaire
The SDQ is a widely used and well-validated questionnaire for assessing children's mental health problems in both clinical samples and the general population [22,23,24,25]. The SDQ contains 25 items covering five subscales relating to the child's emotional problems, peer problems, behavioural problems, hyperactivity and pro-social behaviour. Responses to the subscales on emotional problems, peer problems, behavioural problems, and hyperactivity can be used to calculate a total difficulties score (SDQ-TD). Each subscale score ranges from 0 to 10, implying that the total difficulties score ranges from 0 to 40. An extended version of the SDQ includes an impact assessment to evaluate how much the identified mental difficulties interfere with the child's everyday life. An impact score (SDQ-I) is calculated from five items: whether the difficulties upset or distress the child, and how much the difficulties interfere with home life, friendships, classroom learning, and leisure activities. Each item is scored on a scale from 0 to 2. To score 1 or 2, the interference from the difficulties in that domain must be rated as “quite a lot” or “a great deal”, respectively. The impact score is the primary outcome of the Mind My Mind trial. When completing the SDQ, parents were asked to respond as a proxy for their child based upon the preceding six months at baseline and the preceding month at end-of-treatment.
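The score ranges described above can be illustrated with a minimal sketch. It assumes that subscale scores and impact item scores have already been derived from the raw questionnaire responses; the function names are hypothetical, and the item-to-subscale mapping and any reverse-scored items are not modelled here.

```python
def sdq_total_difficulties(emotional, peer, behavioural, hyperactivity):
    """Sum the four problem subscales (each 0-10); pro-social is excluded."""
    subscales = (emotional, peer, behavioural, hyperactivity)
    if any(not 0 <= s <= 10 for s in subscales):
        raise ValueError("each subscale score must lie between 0 and 10")
    return sum(subscales)  # SDQ-TD, ranging from 0 to 40


def sdq_impact(item_scores):
    """Sum five impact items, each already coded 0, 1, or 2 (range 0-10)."""
    if len(item_scores) != 5 or any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("expected five item scores coded 0, 1, or 2")
    return sum(item_scores)  # SDQ-I


print(sdq_total_difficulties(4, 3, 5, 6))  # 18
print(sdq_impact([1, 0, 2, 1, 0]))         # 4
```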
KIDSCREEN-27
KIDSCREEN is a 27-item generic measure of HRQOL and well-being. A total of 13 European countries were included in the cross-cultural harmonization and development of the measure. Several studies have found it to be valid and reliable in children with and without chronic health conditions, demonstrating adequate psychometric properties [20,21,22,23,24,25,26,27]. KIDSCREEN measures HRQOL and well-being across five domains: Physical Well-being, Psychological Well-being, Autonomy & Parents, Peers & Social Support and School Environment. Item responses are given on a five-point Likert scale, and T scores for each domain are computed with a mean of 50 and a standard deviation of 10, whereby higher scores indicate better HRQOL. KIDSCREEN domain scores do not allow for the calculation of a global HRQOL score. When completing the KIDSCREEN, the parents were asked to respond as a proxy for their child based upon the last week.
Child health utility 9D
CHU9D is a generic preference-based HRQOL measure designed specifically for use in economic evaluations of health care interventions in children and adolescents. CHU9D has nine items, each with five levels of severity, representing nine dimensions of HRQOL: Worried, Sad, Pain, Tired, Annoyed, Schoolwork/homework, Sleep, Daily routine, and Activities. The responses to the nine items can be converted to utilities on the 0–1 dead–full health QALY scale using preference-based scoring algorithms. In this study, two separate scoring algorithms were applied. The original algorithm is based on the standard gamble method of health state valuation and the preferences of an adult general population in the United Kingdom (N = 300). This algorithm generates utility scores ranging from 0.3261 (pit-state) to 1 (perfect health). A newer algorithm is based on best–worst scaling methods and the preferences of adolescent Australians aged 11–17 from the general population (N = 1982), with a smaller time trade-off experiment with young adults (N = 152) used to anchor the tariffs. This algorithm generates utility scores ranging from −0.1059 (pit-state) to 1 (perfect health). Danish-specific preference weights are not yet available. When completing CHU9D, the parents were asked to respond as a proxy for their child based upon the child's HRQOL on the present day.
Table 1 provides a simplified overview of the conceptual overlaps between the three measures. We categorized the items and subscales of the three instruments into seven dimensions of quality of life based on direct comparisons of the content, even though the content and concepts are unlikely to be independent of each other. Table 1 also indicates which measures we hypothesize CHU9D to be most closely related to. CHU9D has the largest conceptual overlap with SDQ-I, followed by SDQ-TD and KIDSCREEN's Physical Well-being and Psychological Well-being. In contrast, there is no clear conceptual overlap between CHU9D and the KIDSCREEN Peers & Social Support measure.
To assess construct validity, the baseline data were used, and the discriminant validity and convergent validity were examined.
Discriminant validity was assessed by testing whether CHU9D can discriminate between groups defined by the SDQ-TD, the SDQ-I and the KIDSCREEN Psychological Well-being score. The entire sample in this study exhibited some degree of mental health problems distributed on a continuum, and it was therefore not possible to define clearly distinguishable categories. Instead, we assessed whether CHU9D could distinguish between groups of children with different levels of problems, using percentiles as cut-off values on the SDQ-TD, SDQ-I, and KIDSCREEN Psychological Well-being score. We focused on these scores because mental health problems are expected to have the largest impact on HRQOL. The study sample was divided into three groups: the children with the 25% lowest scores (low), the 25% highest scores (high) and the 50% in between (medium). Statistical differences were tested using the Kruskal–Wallis test due to non-normality of the utility distributions (tested using the Shapiro–Francia test), and the magnitude of the mean difference was assessed against a minimally important difference (MID) of 0.03, as no formal MID is available for CHU9D.
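The grouping and testing steps can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the data are invented, the function names are hypothetical, quartile cut-offs use the inclusive method from Python's `statistics` module (with tied scores the groups need not contain exactly 25%, 50% and 25% of the sample), and the p-value relies on the chi-square approximation, which for three groups (two degrees of freedom) has the closed form exp(−H/2).

```python
import math
from statistics import mean, quantiles

MID = 0.03  # minimally important difference applied in the text


def split_by_quartiles(scores):
    """Label each score low (<= Q1), high (>= Q3) or medium (in between)."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return ["low" if s <= q1 else "high" if s >= q3 else "medium" for s in scores]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Tie-corrected Kruskal-Wallis H and approximate p for exactly 3 groups."""
    if len(groups) != 3:
        raise ValueError("closed-form p-value below is valid for 3 groups only")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    h /= 1 - ties / (n ** 3 - n)  # tie correction (equals 1 when no ties)
    return h, math.exp(-h / 2)    # chi-square survival function, df = 2


# Invented example: higher SDQ-TD means more problems, lower expected utility.
sdq_td = [2, 4, 6, 8, 10, 12, 14, 16]
chu9d = [0.95, 0.90, 0.85, 0.80, 0.82, 0.78, 0.70, 0.60]
labels = split_by_quartiles(sdq_td)
by_group = {g: [u for u, lab in zip(chu9d, labels) if lab == g]
            for g in ("low", "medium", "high")}
h, p = kruskal_wallis_3([by_group["low"], by_group["medium"], by_group["high"]])
gap = mean(by_group["low"]) - mean(by_group["high"])
print(f"H = {h:.3f}, p = {p:.4f}, low-high gap exceeds MID: {gap > MID}")
```

For a scale such as KIDSCREEN Psychological Well-being, where higher scores indicate better HRQOL, the expected direction of the utility gap between the low and high groups is reversed.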
Convergent validity was assessed using Spearman rank correlation coefficients between CHU9D and the SDQ-TD, the SDQ-I and the KIDSCREEN scores. Based on Table 1, we hypothesized moderate correlations between CHU9D and the SDQ-TD, SDQ-I, KIDSCREEN Psychological Well-being and KIDSCREEN Physical Well-being scores. For the other KIDSCREEN scores, we hypothesized a low but positive correlation, as higher scores in these conceptually less overlapping dimensions of HRQOL would to some degree still be expected to correlate with higher CHU9D utility scores. A complete correlation matrix at the dimension/item level for CHU9D and each of the SDQ and KIDSCREEN scores and items is available in the appendix. Following established guidelines, the following categories for Spearman rank correlations were used: ≥ 0.5, strong; ≥ 0.3 to < 0.5, moderate; and < 0.3, weak.
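A minimal sketch of the correlation step is given below. It computes the Spearman coefficient as the Pearson correlation of average ranks and applies the categories listed above. The function names and data are hypothetical, and applying the categories to the absolute value of the coefficient is an assumption made here so that negative correlations (for example between CHU9D utilities and SDQ scores, where higher SDQ scores indicate more problems) can be classified as well.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def strength(rho):
    """Categories from the text, applied to |rho| (an assumption)."""
    size = abs(rho)
    if size >= 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"


# Invented example: CHU9D utilities against SDQ-TD scores.
chu9d = [0.95, 0.90, 0.85, 0.80, 0.82, 0.78, 0.70, 0.60]
sdq_td = [4, 2, 8, 6, 12, 10, 14, 16]
rho = spearman(chu9d, sdq_td)
print(round(rho, 3), strength(rho))
```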
To assess responsiveness, the floor and ceiling effects were first examined; next, the magnitude of change over time and the ability to differentiate between improvement and no improvement were investigated.
Floor or ceiling effects (i.e., more than 15% of respondents obtaining the lowest or highest possible score) affect the ability of the measure to detect deterioration or improvements in health, respectively. For CHU9D, we hypothesized a low percentage at the floor and ceiling at baseline, but we expected a higher percentage at the ceiling at follow-up, given that an effective intervention should improve the mental health of the respondents randomized to the intervention. The floor and ceiling effects on the SDQ and KIDSCREEN scores were used as reference values for examining CHU9D.
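The 15% criterion can be checked with a short sketch such as the one below. The function name and data are hypothetical; the lowest and highest possible scores are passed in explicitly because they depend on the instrument and scoring algorithm (for example 0.3261 and 1 for the original CHU9D algorithm described above).

```python
def floor_ceiling(scores, lowest, highest, threshold_pct=15.0):
    """Percentage of respondents at the scale minimum and maximum."""
    n = len(scores)
    floor_pct = 100 * sum(s == lowest for s in scores) / n
    ceiling_pct = 100 * sum(s == highest for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold_pct,      # more than 15% at the floor
        "ceiling_effect": ceiling_pct > threshold_pct,  # more than 15% at the ceiling
    }


# Invented utilities scored with the original algorithm's range.
utilities = [1.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3261, 0.9]
print(floor_ceiling(utilities, lowest=0.3261, highest=1.0))
```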
The magnitude of change in scores from baseline to end-of-treatment was assessed using the standardized response mean (SRM) statistic. The following categories for SRM were used: < 0.2, small; 0.5, moderate; and > 0.8, large. We first report the SRM for the whole sample. To study responsiveness, we then identified subgroups of children whose mental health condition had improved according to the standardized measures SDQ and KIDSCREEN. Children who had improved by at least 1 point on the SDQ-I were examined, as this is considered a minimum clinically important difference. For these groups of children with improved mental health, we hypothesized that CHU9D would demonstrate a change in the same direction as the SDQ and the KIDSCREEN Psychological Well-being scale. Given that the latter two scales are more specific to the intervention, larger effects were expected for them than for CHU9D. The SRMs from the SDQ and KIDSCREEN scales are presented as reference values.
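As an illustration, the SRM is the mean of the individual change scores divided by the standard deviation of those change scores. The sketch below uses invented data and a hypothetical function name, and it assumes the sample standard deviation (n − 1 denominator).

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM = mean(change) / SD(change), with change = follow-up - baseline."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(change)  # stdev uses the n - 1 denominator


# Invented CHU9D utilities for four children.
baseline = [0.70, 0.80, 0.60, 0.75]
follow_up = [0.80, 0.85, 0.75, 0.80]
print(round(standardized_response_mean(baseline, follow_up), 2))
```

For scales where lower scores indicate improvement, such as the SDQ, the sign of the SRM is reversed relative to CHU9D, so magnitudes are best compared in absolute terms.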
The mean change in CHU9D score for the children with improved mental health was estimated and compared with the mean change for the children whose condition did not improve or worsened. Due to non-normality of the utility distributions, statistical differences were tested using the Mann–Whitney test. The interpretation of the magnitude of the mean difference was again based on a MID of 0.03.
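The comparison can be sketched as follows, under stated assumptions: the data are invented, the function names are hypothetical, improvement is defined as a decrease of at least 1 point on the SDQ-I, and only the Mann–Whitney U statistic is computed (by counting pairs, with ties counted as one half). Obtaining a p-value would require a statistical package or reference tables and is left out here.

```python
from statistics import mean

MID = 0.03  # minimally important difference applied in the text


def mann_whitney_u(x, y):
    """U for sample x: pairs with x > y, counting ties as one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def compare_change(chu9d_change, sdq_i_change):
    """Split CHU9D change scores by SDQ-I improvement and compare the groups."""
    improved = [c for c, d in zip(chu9d_change, sdq_i_change) if d <= -1]
    not_improved = [c for c, d in zip(chu9d_change, sdq_i_change) if d > -1]
    diff = mean(improved) - mean(not_improved)
    return {
        "mean_improved": mean(improved),
        "mean_not_improved": mean(not_improved),
        "difference": diff,
        "exceeds_mid": abs(diff) > MID,
        "u_statistic": mann_whitney_u(improved, not_improved),
    }


# Invented change scores (follow-up minus baseline) for six children.
chu9d_change = [0.10, 0.05, 0.15, 0.00, -0.05, 0.02]
sdq_i_change = [-2, -1, -3, 0, 1, 0]
print(compare_change(chu9d_change, sdq_i_change))
```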