Background

Children and adolescents comprise a third of the world’s population, and 10–20% of them are considered to suffer from mental health problems [1, 2]. According to the Global Burden of Disease study, mental disorders and substance use disorders account for 15–30% of years lived with disability among young people [3]. Despite the growing burden of disease and its long-lasting consequences beyond childhood and adolescence, there are significant gaps between need and resource availability, particularly in low- and middle-income countries (LMICs), where 90% of the world’s children and adolescents live [1]. Studies of mental health services suggest that only a small proportion of those with mental health needs receive care [4,5,6].

Valid and simple screening instruments can contribute to the promotion of child and adolescent mental health services in LMICs in multiple ways. For example, they enable the identification of people with service needs at health care facilities as well as epidemiological surveillance at the population level.

The Strengths and Difficulties Questionnaire (SDQ) is a mental health screening instrument for children and adolescents aged 4–17 years that is used in over 100 countries, including LMICs [7,8,9,10,11,12,13,14,15]. The SDQ is extensively used in both research and clinical settings because it is quick and easy to complete and score and has good psychometric properties [16]. For the screening of psychiatric disorders, the SDQ has been shown to have 39–72% sensitivity and 76–94% specificity at the borderline/abnormal cut-off score in other languages [16,17,18]. Moreover, it is very broadly accepted by non-health professionals, children and their parents [19]. However, the normative scoring and psychometric properties of the SDQ have been assessed predominantly in samples from high-income countries, and several cross-cultural issues have been demonstrated. In different cultural contexts, the same answer may be perceived in different ways, mental health terminology may not translate well, the original cut-off scores may not work appropriately, and the original five-factor structure may not be replicated [14, 18, 20,21,22,23].

Mongolia is a lower middle-income country. The epidemiological transition from communicable to non-communicable diseases is under way, and mental health needs are assumed to be high [24]. The Mongolian version of the SDQ was developed through translation and back-translation and made available on the official website [25]. The SDQ is so far the only internationally used mental health screening instrument available in Mongolia. However, the Mongolian version of the SDQ has not yet been validated, and the international cut-off scores, which were originally derived from UK children, have been used. A previous study in Mongolia found that 43% of adolescents were classified as abnormal by the international cut-off score [26]. There is therefore a strong need to validate the SDQ and its cut-off scores in Mongolia.

Therefore, the present study aimed to analyze the discriminative validity of the parent version of the Mongolian SDQ and to define appropriate cut-off scores for its categories in Mongolia by banding normative data. Previous studies have validated the SDQ by various methods, such as comparing SDQ results with other gold standard questionnaires, comparing them with psychiatric diagnoses given by established structured diagnostic interviews, and comparing them between low- and high-risk populations. The present study validated the SDQ by comparing low- and high-risk populations, despite the risk that each population is contaminated with children from the other category (children with and without mental health problems), because no other gold standard screening tools or structured diagnostic interviews are available in Mongolia. Appropriate cut-off scores and proven validity are necessary for the effective use of the SDQ in Mongolia for purposes such as epidemiological surveillance and clinical needs assessment.

Methods

Study settings

This study compared two samples to validate the SDQ: (1) a community sample; and (2) a clinical sample. The community sample consisted of children recruited from public schools in one district of the capital city. The clinical sample consisted of children recruited at the psychiatric outpatient service of the National Mental Health Center, which is the only specialized service for child and adolescent psychiatry in Mongolia. There was no gold standard questionnaire in Mongolia with which to compare the SDQ. In addition, there were no known groups consisting only of children with mental disorders or only of children without mental disorders. Thus, the present study validated the SDQ by examining its ability to discriminate between a community sample and a clinical sample, although the community sample might include children with mental health problems and the clinical sample might include children without mental health problems. This method of SDQ validation has been applied in previous studies [27, 28].

Community sample

The community sample consisted of participants in a study that evaluated the effect of physical activity on academic achievement and cognitive function among children in elementary schools. The details of this study are described elsewhere [29]. Participants were children in their 4th year at 10 public primary schools in Sukhbaatar district, which is one of nine districts in the capital city, Ulaanbaatar. Inclusion criteria for the original study were: (1) attendance at a public school in the Sukhbaatar district; (2) written consent from parents or guardians; and (3) child’s age-appropriate literacy in Mongolian. Exclusion criteria were: (1) comorbidities or contraindications prohibiting participation in an exercise program; and (2) enrollment in a special needs program. The population of the district is roughly 10% of the population of Ulaanbaatar and 5% of the population of Mongolia [30]. The district stretches from the urban city center to non-urban areas where the infrastructure is not sufficiently developed, so the socioeconomic backgrounds of the participants are diverse. There is no apparent difference in population structure or residents’ economic level between this district and other districts in Ulaanbaatar [30].

Clinical sample

The clinical sample was recruited at the child and adolescent mental health outpatient service of the National Mental Health Center. The National Mental Health Center is a tertiary-level hospital and the only specialized hospital for mental health in Mongolia. Almost all children with severe mental disorders or intellectual disability in Ulaanbaatar are considered to visit this center.

The inclusion criteria were: (1) younger than 20 years old; (2) visiting the Child and Adolescent Mental Health Outpatient Service at the National Mental Health Center between 1 December 2018 and 31 March 2019; (3) written consent from parents or guardians; and (4) parents’ or guardians’ literacy in Mongolian. There were no exclusion criteria. This sample was originally recruited both to validate the SDQ and to understand the overall characteristics of the service users, so all users in the service’s target age range (younger than 20 years old) were recruited. In this analysis, however, only children aged 4 to 17 years were included, in line with the target age range of the SDQ.

Measures

Socio-demographic characteristics and the SDQ were obtained from a parent or guardian in both community and clinical samples. Socio-demographic characteristics included age, sex, region, maternal education, family structure, and household income. Clinical diagnosis was obtained in the clinical sample.

The strengths and difficulties questionnaire

The SDQ is a 25-item questionnaire on child and adolescent mental health problems, designed for children and adolescents aged 4–17 years. It is used for clinical assessment, epidemiological studies and screening for psychiatric disorders. The 25 items are answered on a 3-point scale (“certainly true”, “somewhat true” and “not true”) and scored from 0 to 2 points. The items yield 5 subscale scores, each ranging from 0 to 10: (1) emotional symptoms; (2) conduct symptoms; (3) hyperactivity/inattention; (4) peer relationship problems; and (5) prosocial behavior. Summing the emotional, conduct, hyperactivity/inattention and peer relationship subscale scores yields a total difficulties score that ranges from 0 to 40. The SDQ uses cut-off scores defined by banding normative data into three categories: normal (80th percentile and below), borderline (80th–90th percentile) and abnormal (90th percentile and above). The Mongolian version was obtained from the official website [25].
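The total difficulties score described above can be sketched as a simple sum. This is a minimal illustration assuming the five subscale scores have already been computed; the function name and dictionary keys are hypothetical, and the item-to-subscale mapping is deliberately left out because it is not given in this text.

```python
# Hypothetical helper: the four subscales that feed the total difficulties score.
DIFFICULTY_SUBSCALES = ("emotional", "conduct", "hyperactivity", "peer")


def total_difficulties(subscales):
    """Sum the four difficulty subscales (each 0-10); prosocial is excluded."""
    for name in DIFFICULTY_SUBSCALES + ("prosocial",):
        score = subscales[name]
        if not 0 <= score <= 10:
            raise ValueError(f"{name} score {score} is outside 0-10")
    return sum(subscales[name] for name in DIFFICULTY_SUBSCALES)


# Made-up subscale scores for one child.
example = {"emotional": 3, "conduct": 2, "hyperactivity": 5, "peer": 4, "prosocial": 8}
print(total_difficulties(example))
```

The prosocial score is validated but not summed, which mirrors the definition of the 0–40 total difficulties range.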

In the clinical sample, clinical diagnosis was made by certified psychiatrists at the hospital according to the 10th revision of International Statistical Classification of Diseases and Related Health Problems (ICD-10). The diagnosis was obtained by attending consultation or reviewing medical records. The ICD-10 is used conventionally in Mongolia and was used in the present study.

Statistical analysis

Descriptive analysis was performed for socio-demographic characteristics, and the two samples were compared on their socio-demographic background. Age was compared with a t-test, and the categorical variables were compared with chi-square tests.

Factor structure and internal consistency analysis

Exploratory factor analysis (EFA) was performed in the community sample. To decide the number of factors, the Minimum Average Partial (MAP) criterion, the Bayesian Information Criterion (BIC), and the number of components suggested by parallel analysis were used. When these criteria suggested different numbers of factors, the number was determined by assessing the factor loadings of each candidate model.

Using the factor number determined by the exploratory factor analysis, McDonald’s omega coefficient was calculated for the entire SDQ and each subscale. Omega values of 0.7–0.8 are considered sufficient and above 0.8 are considered good [31, 32].
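As a rough illustration of what an omega coefficient summarizes, the sketch below computes omega total for a single-factor model from standardized loadings and labels the result with the thresholds quoted above. It is the textbook formula under the assumption of uncorrelated residuals, not a reproduction of the routine used in the study; `omega_total` and `omega_label` are hypothetical names and the loadings are made up.

```python
def omega_total(loadings):
    """Omega for one factor with standardized loadings and uncorrelated residuals."""
    common = sum(loadings) ** 2                         # variance explained by the factor
    unique = sum(1.0 - lam * lam for lam in loadings)   # residual (unique) variance
    return common / (common + unique)


def omega_label(value):
    """Label an omega value using the thresholds quoted in the text."""
    if value > 0.8:
        return "good"
    if value >= 0.7:
        return "sufficient"
    return "below sufficient"


# Made-up loadings for a five-item subscale.
w = omega_total([0.5, 0.5, 0.5, 0.5, 0.5])
print(round(w, 3), omega_label(w))
```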

The SDQ consists of five subscales. Confirmatory factor analysis (CFA) was performed in the community sample to examine the original five-factor structure. To evaluate model fit, the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI) and the Root Mean Square Error of Approximation (RMSEA) were calculated. For TLI and CFI, values lower than 0.90 are considered to indicate lack of fit, 0.90–0.95 reasonable fit, and 0.95–1.00 good fit [33]. For RMSEA, values smaller than 0.05 indicate good fit and values of 0.05–0.08 indicate reasonable fit [33].
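The fit thresholds above can be written down as a small lookup, which makes the interpretation rules explicit. This is only a sketch: the function names are hypothetical, and because the text gives overlapping range endpoints, the choice to treat exactly 0.95 as "good fit" is an assumption.

```python
def incremental_fit_label(value):
    """Interpret a CFI or TLI value using the ranges quoted in the text."""
    if value < 0.90:
        return "lack of fit"
    if value < 0.95:  # assumption: exactly 0.95 is counted as good fit
        return "reasonable fit"
    return "good fit"


def rmsea_label(value):
    """Interpret an RMSEA value using the ranges quoted in the text."""
    if value < 0.05:
        return "good fit"
    if value <= 0.08:
        return "reasonable fit"
    return "outside the ranges given in the text"


# Illustrative values only.
print(incremental_fit_label(0.92), "/", rmsea_label(0.06))
```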

The analysis was done with the lavaan and psych packages in R version 3.4.4 [34, 35].

Receiver operating characteristic analysis

To validate the SDQ, Receiver Operating Characteristic (ROC) analysis was performed. A ROC curve is drawn by plotting sensitivity against 1 − specificity for all possible thresholds. The ability of the total difficulties score to discriminate between the community and clinical samples was assessed by the area under the curve (AUC). The AUC of the total difficulties score was calculated for the entire sample and separately by sex. An AUC of 1.0 means perfect discriminating ability and an AUC of 0.5 means no discriminating ability at all. Conventionally, AUC values of 0.5–0.7 are considered low accuracy, 0.7–0.9 moderate accuracy and 0.9–1.0 high accuracy [36]. The analysis was done with the pROC package in R version 3.4.4 [37].
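The AUC has a direct interpretation that a short sketch can make concrete: it equals the probability that a randomly chosen child from the clinical sample scores higher than a randomly chosen child from the community sample, with ties counted as one half. The code below uses that pairwise formulation rather than the R routine used in the study; the function names and the example scores are hypothetical.

```python
def auc(clinical_scores, community_scores):
    """Pairwise AUC: share of (clinical, community) pairs ranked correctly, ties count 0.5."""
    wins = 0.0
    for c in clinical_scores:
        for n in community_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(clinical_scores) * len(community_scores))


def accuracy_label(value):
    """Label an AUC value using the conventional ranges quoted in the text."""
    if value >= 0.9:
        return "high"
    if value >= 0.7:
        return "moderate"
    if value >= 0.5:
        return "low"
    return "below chance"


# Made-up total difficulties scores.
value = auc([20, 25, 15], [10, 15, 12])
print(round(value, 3), accuracy_label(value))
```

The double loop is quadratic in the sample sizes, which is acceptable for illustration; rank-based implementations give the same value more efficiently.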

This analysis assumed that the prevalence of psychiatric disorders was substantially higher in the clinical sample than in the community sample. It did not assume that none of the participants in the community sample had a psychiatric disorder or that all of the participants in the clinical sample had one.

To assess the discriminating ability of subscale scores, AUC of each subscale score was calculated.

Although the clinical sample consisted of patients at a child and adolescent psychiatric outpatient service, some participants in the clinical sample might not have a psychiatric disorder. If many in the clinical sample did not have a psychiatric disorder, it might be difficult to examine the discriminating ability of the SDQ. To solve this problem, a sensitivity analysis was conducted using the community sample and a subsample of the clinical sample participants which only included those with definite psychiatric diagnoses.

Cut-off score by normative banding

Normative data for the SDQ total difficulties score of the entire community sample were described. As the etiology of child and adolescent mental health problems has sex differences, normative data by sex were also described (Supplementary Table 1) [38]. Normative data of the 5 subscale scores were described (Supplementary Table 2).

The original UK SDQ cut-off scores were determined by banding the normative data of the total difficulties score at fixed percentiles to separate the borderline and abnormal categories from the normal category [28]. In the present study, the same banding method was applied to the normative data of the community sample to determine the cut-off scores of the Mongolian version.
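Normative banding can be sketched as reading off the scores at the 80th and 90th percentiles of the community distribution. The code below is one plausible reading under stated assumptions: the upper bound of a band is taken as the smallest score that at least the given percentage of children do not exceed, and the handling of tied scores is a choice made here, not one specified in the text. All names and the example data are hypothetical.

```python
def normative_cutoffs(scores, normal_pct=80, borderline_pct=90):
    """Return (highest 'normal' score, highest 'borderline' score) from community scores."""
    ordered = sorted(scores)
    n = len(ordered)

    def upper_bound(pct):
        # Smallest score with at least pct% of the sample at or below it
        # (integer arithmetic avoids floating-point rounding of pct * n / 100).
        k = (pct * n + 99) // 100
        return ordered[k - 1]

    return upper_bound(normal_pct), upper_bound(borderline_pct)


def classify(score, normal_max, borderline_max):
    """Assign a score to one of the three SDQ bands."""
    if score <= normal_max:
        return "normal"
    if score <= borderline_max:
        return "borderline"
    return "abnormal"


# Made-up community scores 1..10.
normal_max, borderline_max = normative_cutoffs(list(range(1, 11)))
print(normal_max, borderline_max, classify(9, normal_max, borderline_max))
```

In the "a/b" notation used for cut-off scores, a pair such as `normal_max`/`normal_max + 1` marks the boundary between two adjacent bands.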

Comparison with the cut-off score candidates by ROC analysis

The cut-off score obtained by normative banding was compared with cut-off score candidates obtained by ROC analysis, which balance sensitivity and specificity. In the ROC analysis, the best cut-off score was determined by two methods: (1) the point closest to the top-left corner of the plot, which represents perfect discriminating ability (100% sensitivity and 100% specificity); and (2) Youden’s J statistic, which selects the point that maximizes the distance from the line of no discriminating ability (the diagonal connecting the point of 100% sensitivity and 0% specificity with the point of 0% sensitivity and 100% specificity) [36, 39, 40]. The candidates determined by ROC analysis were compared with the cut-off score determined by normative banding.
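The two ROC-based selection rules can be sketched as follows. This is a minimal illustration, assuming that a child screens positive when the score is at or above the threshold, so a cut-off written as "a/b" corresponds to a threshold of b. Function names and the example scores are hypothetical, and ties between candidate thresholds are resolved in favor of the lowest threshold.

```python
def roc_points(clinical, community):
    """List (threshold, sensitivity, specificity) for 'score >= threshold' as positive."""
    points = []
    for t in sorted(set(clinical) | set(community)):
        sensitivity = sum(s >= t for s in clinical) / len(clinical)
        specificity = sum(s < t for s in community) / len(community)
        points.append((t, sensitivity, specificity))
    return points


def closest_to_top_left(points):
    """Threshold minimizing the distance to (sensitivity, specificity) = (1, 1)."""
    return min(points, key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)[0]


def youden_threshold(points):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)[0]


# Made-up total difficulties scores.
pts = roc_points([18, 20, 22, 17], [10, 12, 14, 19])
print(closest_to_top_left(pts), youden_threshold(pts))
```

The two rules often agree but need not: the first penalizes large shortfalls in either sensitivity or specificity more heavily, while Youden's J treats a unit of each equally.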

Sensitivity and specificity

Although we did not assume that the community sample included no participants with psychiatric disorders or that the clinical sample included no participants without psychiatric disorders, sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio for discriminating the participant’s group (clinical sample or community sample) were calculated for each cut-off score. Sensitivity was defined as the proportion of participants in the clinical sample who scored above the threshold. Specificity was defined as the proportion of children in the community sample who scored below the threshold. The proportion of children above each cut-off score was also calculated.
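The indicators listed above follow from two proportions, as the sketch below shows. It assumes, as an illustration only, that a score at or above the threshold counts as screen-positive; the likelihood ratios use their standard definitions (sensitivity divided by 1 − specificity, and 1 − sensitivity divided by specificity). The function name, dictionary keys and example scores are hypothetical.

```python
def cutoff_indicators(clinical, community, threshold):
    """Sensitivity, specificity and likelihood ratios for group membership at a threshold."""
    sensitivity = sum(s >= threshold for s in clinical) / len(clinical)
    specificity = sum(s < threshold for s in community) / len(community)
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_negative = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        # Share of the community sample that would be flagged as high risk.
        "community_positive_rate": 1 - specificity,
    }


# Made-up total difficulties scores.
result = cutoff_indicators([18, 20, 22, 15], [10, 12, 14, 19], threshold=17)
print(result)
```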

Results

Community sample

A total of 2309 children met the inclusion criteria, and none of them met the exclusion criteria. Of these 2309 children, 2301 (99.6%) participated in the study on physical activity, academic achievement and cognitive function. Eight children did not participate because their parents or guardians did not provide informed consent. Data from the 2301 participating children were analyzed in this analysis.

Clinical sample

During the study period, 666 children visited the National Mental Health Center, and 498 participated in the study (74.8%). Of those, 429 participants were between 4 and 17 years old and were included in this analysis. The participant flow is presented in Fig. 1.

Fig. 1 Participants in the clinical sample

Age, sex and socioeconomic factors

The mean age of the community sample was 9.7 years (SD 0.4, range 8.3–12.0). Of all participants, 51.3% were male. All community sample participants were living in Ulaanbaatar, 80.3% were living in a two-parent family, 94.0% had a mother whose educational level was upper secondary school or more, and 65.1% had household income above 700,000 MNT. Mean age of the clinical sample was 10.4 years (SD 3.8, range 4.0–17.8) and 60.1% were male. Of all clinical sample participants, 84.5% were living in Ulaanbaatar, 71.0% were living in a two-parent family, 88.2% had a mother whose educational level was upper secondary school or more, and 64.9% had household income above 700,000 MNT.

Mean age was higher in the clinical sample (p < 0.001), and the distributions of sex, region, family structure and maternal education differed between the two samples (p = 0.001, p < 0.001, p < 0.001 and p < 0.001, respectively). The distribution of household income did not differ between the two samples (p = 1.00). Participants’ socio-demographic characteristics are presented in Table 1.

Table 1 Socio-demographic characteristics of participants

Clinical diagnosis

In the clinical sample, 123 children (28.7%) were diagnosed with F7: Mental retardation, 74 (21.0%) with F8: Disorders of psychological development, and 66 (18.7%) with F9: Behavioral and emotional disorders with onset usually occurring in childhood and adolescence. For 76 children (17.7%), no diagnosis was recorded; this group included children with no psychiatric diagnosis, children whose diagnosis had not yet been determined, and children with missing values.

Discrimination between the community and clinical sample

Among the community sample, 2046 participants (88.9%) answered the SDQ. The mean of the total difficulties score of the community sample was 12.9 (SD 4.8). Among the clinical sample, 424 participants (98.8%) answered the SDQ. The mean of the total difficulties score of the clinical sample was 20.4 (SD 6.2).

Factor structure and internal consistency

The number of factors was two by MAP, five by BIC, and four by parallel analysis. When the factor loadings of each model were examined, the four-factor model was most in line with the original structure, and it explained 33% of the variance. In the four-factor model, the emotional and peer problem subscales were not distinguished. By the omega coefficient, internal consistency was sufficient for the entire SDQ but poor for each subscale (entire SDQ 0.75, conduct subscale 0.48, hyperactivity subscale 0.65, emotional subscale 0.53, peer relationship subscale 0.37, and prosocial subscale 0.41).

CFA was conducted in the community sample. CFA demonstrated a lack of fit when evaluated by CFI and TLI (CFI = 0.73, TLI = 0.69), but reasonable fit by RMSEA (RMSEA = 0.054, 95% CI 0.051–0.056).

Discrimination between the community and clinical sample

In the ROC analysis, the area under the curve (AUC) was 0.82 (95% confidence interval (CI) 0.80–0.85) (Fig. 2). The 95% CI was estimated both by the DeLong method and by 2000 stratified bootstrap replicates, and the results were the same. The distribution of the total difficulties score is presented as a bar graph in Fig. 3.

Fig. 2 ROC curve of total difficulties score

Fig. 3 Distribution of total difficulties score

Among males, the mean total difficulties score was 13.3 (SD 4.9) in the community sample and 20.4 (SD 6.1) in the clinical sample (Supplementary Fig. 1), and the AUC was 0.81 (95% CI 0.78–0.84). Among females, the mean total difficulties score was 12.4 (SD 4.6) in the community sample and 20.4 (SD 6.5) in the clinical sample (Supplementary Fig. 1), and the AUC was 0.84 (95% CI 0.80–0.87). This meant that the higher mean total difficulties score in the clinical sample was not due to the higher proportion of males in the clinical sample.

For each subscale score, AUC was calculated: (1) emotional subscale 0.67 (95% CI 0.64–0.70); (2) conduct subscale 0.76 (95% CI 0.73–0.79); (3) hyperactivity/inattention subscale 0.74 (95% CI 0.71–0.77); (4) peer relationship subscale 0.78 (95% CI 0.76–0.81); and (5) prosocial subscale 0.67 (95% CI 0.64–0.70).

A total of 350 participants in the clinical sample had a definite diagnosis of a psychiatric disorder. When the clinical sample was restricted to these participants, the AUC of the total difficulties score was 0.82 (95% CI 0.80–0.85), which was consistent with the original AUC value. For each subscale score, AUC was calculated: (1) emotional subscale 0.68 (95% CI 0.65–0.71); (2) conduct subscale 0.74 (95% CI 0.71–0.77); (3) hyperactivity/inattention subscale 0.72 (95% CI 0.69–0.75); (4) peer relationship subscale 0.78 (95% CI 0.75–0.80); and (5) prosocial subscale 0.67 (95% CI 0.63–0.70). These were similar to the results of the original analysis.

Cut-off score by normative banding

In the entire community sample, the cut-off scores between normal and borderline and between borderline and abnormal were 16/17 and 19/20, respectively. The cut-off scores obtained by normative banding were compared with those of the UK [12]. Cut-off scores of the subscales are presented in Table 2. The cut-off score for the total difficulties score was 3 points higher than that of the UK. The cut-off scores of the emotional, conduct, hyperactivity/inattention and peer relationship subscales were 0–2 points higher than those of the UK. The cut-off score for the prosocial subscale was 2 points lower than that of the UK.

Table 2 Normative banding

Normative data for the SDQ total difficulties score are presented in Supplementary Table 1, and normative data for the five subscale scores are presented in Supplementary Table 2.

Comparison with cut-off score candidates by ROC analysis

With the first method, the best cut-off score, determined as the point closest to the top-left corner, was 16/17, with a sensitivity of 0.72 and a specificity of 0.78. With the second method, the Youden method, the cut-off score was 17/18, with a sensitivity of 0.67 and a specificity of 0.84. Comparing the cut-off scores obtained by normative banding with these ROC-based candidates, the cut-off score of 16/17 was considered to balance sensitivity and specificity better than the cut-off score of 19/20, which places more weight on specificity. Thus, the cut-off score of 16/17 is considered to be a good cut-off score for screening for mental health problems among community children in Mongolia.

Sensitivity and specificity

For each cut-off score, sensitivity, specificity, positive likelihood ratio, negative likelihood ratio and proportion of high risk among the community sample were calculated and displayed in Table 3.

Table 3 Indicators of cut-off score candidates

Discussion

Summary of results

The SDQ scores of 2301 children representing the community and 429 children representing mental health service users were compared. The AUC of the total difficulties score was 0.82, which indicates moderate discriminating ability. As for cut-off scores, normative banding suggested 16/17 as the cut-off between normal and borderline and 19/20 as the cut-off between borderline and abnormal. Both cut-off scores were three points higher than the international cut-off scores. The cut-off score of 16/17 had a good balance between sensitivity and specificity by ROC analysis. We recommend a cut-off score of 16/17 for the screening of mental health problems among community children. This analysis demonstrated that the use of international cut-off scores in Mongolia leads to an overestimation of the number of high-risk children.

Comparison with previous studies

Previous validation studies of the parental version of the SDQ have demonstrated AUCs ranging from 0.66 to 0.87, which mostly indicates moderate discriminating ability [13, 17, 18, 28, 41]. The AUC of the parental version of the Mongolian SDQ was 0.82, which is consistent with previous studies. This suggests that the parental version of the SDQ can be used in Mongolia.

EFA suggested a four-factor structure. A four-factor structure was also suggested by a study in the Netherlands [42]. Internal consistency was sufficient for the entire SDQ but low for the subscales. A systematic review reported that some studies demonstrated a factor structure other than the original five-factor structure or low internal consistency for some subscales [23]. CFA did not show a good fit for the five-factor structure by CFI and TLI. However, some previous studies also failed to replicate the five-factor structure [23]. Hence, the subscales need to be used with caution.

Cut-off score

Internationally, normative banding has been used to determine the cut-off scores of the SDQ. However, ROC analysis suggested that the cut-off score between borderline and abnormal weighs disproportionately on specificity rather than sensitivity, so false negatives might be a problem. Although this study suggests a cut-off score of 16/17 for normal/borderline and 19/20 for borderline/abnormal following the methods of previous studies, other cut-off scores can also be considered according to the purpose and the nature of the target population. For example, if the human resources available to assess screened children are scarce, minimizing false positives is an important strategy. In that case, a higher cut-off score should be considered.

Difference from the results of previous survey in Mongolia

One previous study has used the SDQ parental version among Mongolian adolescents [26]. In the study, children aged 11 to 18 from both Ulaanbaatar and outside Ulaanbaatar were included. The mean total difficulties score was 16.6 (SD 4.4) by parental report. This mean score was higher than the mean total difficulties score among the current community sample. The difference in age range and residential area of the study sample might explain the difference.

Limitations

The community sample consisted of children attending the same year of primary school and did not include younger children or adolescents. These children were around 9–10 years old, so mental health problems that emerge in adolescence were not captured. In addition, the community sample consisted of children living in Ulaanbaatar, whereas children’s lifestyles differ widely between urban and rural areas. As the suggested cut-off scores are based on the normative banding of the community sample, they have not been confirmed for other age ranges or for rural areas. Confirming them will be a focus of future research. Finally, as the data collection period did not cover the whole year, there might have been seasonal effects.

In our study, the community sample might have included children with mental health problems and the clinical sample might have included children without mental health problems. If the community sample had consisted exclusively of children without mental disorders, the discriminating ability would have been higher; this contamination would therefore have led to underestimation, not overestimation, of the discriminating ability. Similarly, the clinical sample did not consist exclusively of children with mental disorders. However, the sensitivity analysis restricted to children with a definite psychiatric diagnosis yielded the same level of discriminating ability. Thus, the present study is unlikely to have overestimated the discriminating ability because of the sampling methods.

Conclusions

The parental version of the SDQ demonstrated moderate discriminating ability among Mongolian school-age children. The cut-off score between normal and borderline was 16/17 and that between borderline and abnormal was 19/20. For the screening of mental health problems among community children, a cut-off score of 16/17 is recommended. The suggested cut-off scores were considerably different from the cut-off scores used internationally. If the internationally used cut-off scores were applied in Mongolia, specificity would be very low and false positives would be more likely. Confirmation of the cut-off scores for early childhood, adolescence, and rural populations will be a focus of future research.