Section 1: background

Many adult mental health issues originate in childhood and adolescence. Despite this importance in the formulation of adult mental health and the impact of youth mental health on families and communities, youth mental health has historically been an understudied and underserved area of psychiatry [1]. This missing link in the literature has begun to be rectified internationally, but still, the majority of psychiatric epidemiological studies of child and adolescent mental health have been conducted in Western countries.

This gap in the psychiatric epidemiological research literature on Arab youth is narrowing as countries in the Arab region increasingly prioritize youth mental health. In the last few years, Qatar has made significant strides in improving child and adolescent mental health services [2, 3].

Studies presenting diagnostic criteria in community samples of Arab youth in Oman [4, 5], Saudi Arabia [6], Lebanon [7] and Yemen [8] have provided data mainly within the normal variation of international estimates. The authors of studies of Arab youth mental health in representative and non-representative samples have been split between concluding that their data showed cross-cultural invariance in mental health [9, 10], mixed conclusions [4, 6, 7], or cross-cultural variation [5, 11]. With most of these data coming from non-representative samples, it is difficult to know if any differences are caused by the sampling method or represent real variation in underlying mental health diagnoses. No prevalence data from representative samples exist for Qatari youth.

In the field of psychiatric epidemiology, there are several main strategies for bulk assessment of youth mental health: full-length clinical interviews based on diagnostic classifications (e.g., World Mental Health Composite International Diagnostic Interview [12]); scales or short-form assessments based on diagnostic classifications (e.g., Strengths and Difficulties Questionnaire [11]; Development and Well Being Assessment [13]; Symptoms Checklist [6]); and scales or short-form assessments based on quantitative taxonomies (e.g., Child Behavior Checklist [14, 15]).

When choosing a scale to validate for the current study, we applied the following five constraints. The scale must:

  1. Be based on diagnostic classifications, for international comparison and clinical relevance
  2. Have Arabic and English versions
  3. Have a wide age range (i.e., from early childhood through adolescence) for the eventual expansion of the project
  4. Have some demonstrated validity, both generally and in Arab youth
  5. Be reasonable in cost, length, and complexity, so that it is usable in community/epidemiological samples

The following scales were eliminated for failing one or more of these constraints: the Child Behavior Checklist [1], the Arabic Youth Mental Health scale [2, 4], the World Mental Health Composite International Diagnostic Interview [3, 5], and the Symptoms Checklist [3].

Two scales fit four of the five constraints: the Development and Well Being Assessment (DAWBA) and the Strengths & Difficulties Questionnaire (SDQ). The SDQ is the modern version of Rutter’s scales. The current version of the DAWBA includes the SDQ as part of its measurement, giving both an SDQ score and a DAWBA diagnostic label within the same measure. In addition to extensive validation of other language versions of the DAWBA [16], the DAWBA-Arabic has been validated in a small sample of Lebanese youth [17]. In a larger sample of Yemeni youth, the DAWBA-Arabic showed substantial agreement with clinical diagnoses [8], and in the same study, the SDQ was shown to distinguish between clinical and community samples. The SDQ is shorter and can be scored by a non-clinician. In addition, because the DAWBA has added a payment per test administered, the SDQ was seen as a more widely available, cost-conscious mechanism for Arabic-speaking communities to screen their youths’ mental health.

The SDQ is one of the most established screening tools in youth mental health and has been translated into and validated in many languages, including Arabic. Until recently, most of the validation work was conducted in Levantine Arabic countries (see above). In the Gulf Cooperation Council (GCC) countries, a few recent studies have added to the validation data of the SDQ, showing that the scale effectively predicted mental health outcomes as estimated by teachers and parents in Oman [18], showed invariance in testing modality between parent and teacher reports in Kuwait [19], and showed structural integrity between cultures in Saudi Arabia and Oman [20]. In Qatar, there is some evidence of convergent validity for the conduct problems subscale only [21], but no assessment of the full scale’s validity. In all of the above cases, some level of convergent validity has been established. However, to date, the SDQ has not been validated against clinicians’ diagnoses in the GCC.

Further validation of the Arabic version of the SDQ helps develop scholarly tools for the Arab youth mental health research community. Once its usefulness in Qatar is established, the SDQ can be used in larger-scale epidemiological research and broader youth mental health screening initiatives in the GCC without concern about wasted time or resources.

Purpose of the present study

In the present study, we examined the utility of the SDQ-Arabic as a mental health screening tool in GCC countries. We posed three main questions related to the predictive validity of the SDQ-Arabic: First, within a clinically referred sample, does the SDQ differentiate between categories of diagnoses? Second, does the SDQ differentiate between clinically referred and community samples? Third, does the SDQ predict negative and positive parenting dimensions as it does in other regions (external validity)? The primary data set comprised SDQ-Arabic data and parenting information (APQ-15; Arabic) collected from Arabic-speaking 13-to-17-year-old youth and their parents who were referred to a CAMHS unit in a pediatric tertiary care hospital in Qatar. For the comparison between clinically referred and community samples, these data were compared with SDQ data from a regionally stratified representative community sample of 13-to-17-year-old Arabic-speaking youth in Qatar. The study was approved by the Institutional Review Board (IRB) of Sidra Medicine.

Section 2: method

Section 2.1: sample

The clinical sample (n = 62; 45% female) was obtained from Sidra Medicine’s CAMHS outpatient clinic in Doha, Qatar, where children and adolescents received services after a referral from another healthcare professional or a school counselor. Eligible participants were Arabic-speaking youths aged 13 to 17 years and their parents. Parents and youth meeting these criteria were approached in the waiting room for consent and assent, and they completed the questionnaires in the waiting room before or after their sessions. All measures were pen-and-paper measures given to participants by a research assistant while they waited for their clinical sessions. Participants were excluded only if they were outside the age range, did not speak Arabic, or did not have appropriate parental consent or assent to participate. Data were collected from 2019 through 2021, and collection was interrupted by the COVID-19 pandemic.

Comparisons are made between SDQ scores for this clinical sample and a community-based sample. This community-based sample is a regionally stratified, national sample of 13-to-17-year-old Arabic-speaking youth living in Qatar and was collected as part of a separate study (Gilstrap, in preparation). Details about this sample are presented in the relevant results sections below.

Section 2.2: instruments

The strengths and difficulties questionnaire (SDQ; Arabic)

The main portion of the SDQ comprises 25 items, with five items for each of five dimensions: emotional symptoms, hyperactivity/inattention, conduct problems, peer relationships, and prosocial behavior [22]. These items are rated on a 3-point scale from “not true” to “certainly true.” Each item is scored from 0 to 2, giving a possible range of 0 to 10 for each dimension. In addition, as a standard part of the official SDQ scoring, the first four dimensions (i.e., all except prosocial behavior) are summed to give a total difficulties score with a possible range of 0 to 40.

The optional impact supplement, an 8-item addition assessing how chronic, distressing, impairing, and burdensome any reported problems are to the youth and their friends and family, was also included in this study. These items are condensed into a measure of overall impact. The standard scoring rules can be downloaded from the SDQ website (https://www.sdqinfo.org).
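The subscale and total difficulties arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `score_sdq` and the dimension keys are hypothetical, item responses are assumed to be already coded 0 to 2 in the scored direction, and the reverse-keying and impact supplement rules of the official scoring instructions are not reproduced here.

```python
from typing import Dict, List

# Hypothetical dimension keys; the four difficulty dimensions feed the total score.
DIFFICULTY_DIMENSIONS = ("emotional", "hyperactivity", "conduct", "peer")
ALL_DIMENSIONS = DIFFICULTY_DIMENSIONS + ("prosocial",)


def score_sdq(items: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum five 0-2 item scores per dimension and add a total difficulties score.

    Assumes each item is already coded in the scored direction (0, 1, or 2).
    """
    scores: Dict[str, int] = {}
    for dimension in ALL_DIMENSIONS:
        values = items[dimension]
        if len(values) != 5 or any(v not in (0, 1, 2) for v in values):
            raise ValueError(f"{dimension}: expected five item scores coded 0, 1, or 2")
        scores[dimension] = sum(values)  # possible range 0 to 10
    # Prosocial behavior is excluded from the total (possible range 0 to 40).
    scores["total_difficulties"] = sum(scores[d] for d in DIFFICULTY_DIMENSIONS)
    return scores


# Made-up responses for one respondent (not study data).
example = {
    "emotional": [2, 1, 0, 1, 2],
    "hyperactivity": [1, 1, 1, 1, 1],
    "conduct": [0, 0, 1, 0, 0],
    "peer": [0, 1, 0, 0, 1],
    "prosocial": [2, 2, 2, 1, 2],
}
print(score_sdq(example)["total_difficulties"])  # 6 + 5 + 1 + 2 = 14
```

Any real use should follow the official scoring rules from the SDQ website rather than this simplified sketch.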

Alabama parenting questionnaire (APQ-15; Arabic)

The APQ-15 is a shortened version of the original APQ that includes the same dimensions: positive parenting (PP), parental involvement (PI), poor monitoring and supervision (PMS), inconsistent discipline (ID), and corporal punishment (CP) [21]. Of the 15 items, three are used to assess each dimension. Each item is rated from 1 (never) to 5 (always), generating a possible range of 3 to 15 for each of the five dimensions.

Section 3: results

Section 3.1: demographics of sample and response rates

Of the clinically referred youths meeting the criteria who were approached by a researcher, approximately 50% (and their parents) participated. Non-participants either did not have a parent present to consent, did not have time to complete the questionnaire before or after their session, or did not assent or consent. Completion rates for the parental scales were over 97% for the SDQ and 87% for the APQ. Diagnoses for all participating youth were obtained from clinical records at a later time.

Youths ranged from 13 to 17 years old (x̄ = 15.58, SD = 1.23), and 45% were female. Nationality data were collected from 74% of the participants. Of those reporting their nationality (n = 46), 52% reported being Qatari, 22% Egyptian, and the remaining 26% reported being Lebanese, Palestinian, Syrian, or Tunisian. Parents were not asked to report their gender, but during data collection, we noted that just over half were mothers. Boys were commonly accompanied to the clinic by their fathers.

Section 3.2: diagnoses

To match the categories used in the SDQ, each participant’s diagnoses, which were obtained from patient records and had originally been made using the DSM-5, were coded as including or not including each of three overarching categories of disorders: emotional, hyperactive, and conduct. In addition to these three categories, a category of other diagnoses was added for disorders not falling into the SDQ categories. Note that participants with multiple diagnoses could be classified as having more than one of these overarching types. Clinicians diagnosed the youth as having an emotional disorder (53%), a hyperactive disorder (31%), a conduct disorder (7%), and other disorders (27%). The most common types of other disorders noted were Autism Spectrum Disorder (ASD) and cognitive impairment.

Section 3.3: psychometric properties of scales in this sample

Internal consistency data for the SDQ dimensions were as follows: emotional symptoms (α = 0.69; interitem from r = 0.17 to r = 0.49), hyperactivity/inattention (α = 0.75; interitem from r = 0.18 to r = 0.61), conduct problems (α = 0.59; interitem from r = 0.11 to r = 0.36), peer relationships (α = 0.15; interitem from r = -0.21 to r = 0.25), and prosocial behavior (α = 0.65; interitem from r = 0.18 to r = 0.45). All dimensions had acceptable reliability ratings except the peer relationships dimension. Analyses of the individual items indicated that none of the peer relationship items had strong correlations with each other and that the items ranged from weakly negatively correlated to weakly positively correlated. We present the dimensional correlations in Table 1.

Table 1 Pearson correlations between dimensions on the SDQ

As with the SDQ, we present the interdimensional correlations for the APQ, which has both positive and negative parenting dimensions, in Table 2 and the internal consistency data for the five dimensions in the text below. In general, positive parenting dimensions were positively correlated with each other (e.g., positive parenting and parental involvement, r = 0.58, p < 0.01), and positive and negative parenting dimensions were negatively correlated (e.g., positive parenting and poor monitoring and supervision, r = -0.57, p < 0.01). Cronbach’s alphas for the dimensions were: positive parenting (α = 0.78; interitem from r = 0.49 to r = 0.63), parental involvement (α = 0.34; interitem from r = -0.11 to r = 0.32), poor monitoring and supervision (α = 0.17; interitem from r = -0.04 to r = 0.36), inconsistent discipline (α = 0.52; interitem from r = 0.14 to r = 0.51), and corporal punishment (α = 0.55; interitem from r = 0.10 to r = 0.46). The low internal consistency of the poor monitoring and supervision subscale matches the findings of Badahdah and Le (2016), who also found that this subscale had the lowest internal consistency. Of the three items on the poor monitoring and supervision subscale, the item about staying outside the house later than the youth was supposed to and the item about going out with friends their parents did not know were positively correlated (r = 0.36, p < 0.05), but the item about not telling their parents where they were going (face-to-face, by phone call, or by message) was not correlated with either of the other two items (rs = -0.04, -0.05).

Table 2 Pearson correlations between dimensions on the APQ
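For readers less familiar with the internal consistency coefficients reported in this section, Cronbach’s α for a k-item subscale is k/(k - 1) × (1 - sum of item variances / variance of the summed score). The sketch below computes it from a respondents-by-items table using only the Python standard library. The function name and the toy data are illustrative assumptions; they are not the study’s data or the software used for the reported analyses.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items of one subscale."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    columns = list(zip(*item_scores))  # one tuple of responses per item
    item_variances = [variance(column) for column in columns]  # sample variances
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: three items that move together perfectly give alpha = 1.
consistent = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
print(cronbach_alpha(consistent))

# Toy data: two unrelated items give alpha = 0.
unrelated = [[0, 0], [0, 2], [2, 0], [2, 2]]
print(cronbach_alpha(unrelated))
```

With only three items per APQ-15 dimension, α is sensitive to a single weakly related item, which is consistent with the item-level pattern described for the poor monitoring and supervision subscale.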

Section 3.4: does the SDQ-Arabic predict the diagnosis category in a clinically referred sample of Arabic-speaking youth in Qatar?

Using area under the curve (AUC) analyses (see [8] for a similar example), the SDQ-Arabic differentiated between broad categories of diagnoses in the clinically referred sample. Unlike a correlation, the AUC is compared to 0.5 (chance) rather than zero. The further from 0.5, the larger the effect size, with numbers above 0.5 indicating a positive predictive relationship and numbers below 0.5 indicating a negative predictive relationship. This effect size and its associated variability are then tested for statistical significance at the p < 0.05 level, with significance indicating that the variable predicts the binary outcome variable (in this case, a relevant clinical diagnosis or not) better than chance.
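As a minimal sketch of the point estimate described above, the AUC can be read as the probability that a randomly chosen youth with the diagnosis has a higher subscale score than a randomly chosen youth without it, with ties counted as one half. The function name and the scores below are made up for illustration, and the significance test is not implemented; this is not the software used for the reported analyses.

```python
from typing import Sequence


def auc(scores: Sequence[float], has_diagnosis: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: subscale scores; has_diagnosis: 1 if the diagnosis is present, else 0.
    """
    positives = [s for s, y in zip(scores, has_diagnosis) if y == 1]
    negatives = [s for s, y in zip(scores, has_diagnosis) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome groups must be non-empty")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # diagnosed youth scored higher
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical subscale scores and diagnosis flags (not study data).
scores = [2, 5, 3, 7, 6, 1]
flags = [0, 1, 0, 1, 0, 0]
print(auc(scores, flags))  # 7 of 8 pairs favor the diagnosed group: 0.875

# Reversing the relationship gives a value below 0.5 (a negative prediction).
print(auc(scores, [1 - f for f in flags]))  # 0.125
```

A value such as 0.23 therefore means that lower scores go with the diagnosis about as strongly as 0.77 would mean that higher scores do.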

The three main diagnostic dimensions of the SDQ (emotional symptoms, hyperactivity, and conduct) each predicted the corresponding diagnosis by a trained clinician. Reported emotional symptoms on the SDQ predicted a clinician’s diagnosis of an emotionally based disorder (AUC = 0.71, p < 0.01), reported hyperactive symptoms predicted a diagnosis of a hyperactivity-based disorder (AUC = 0.79, p < 0.001), and reported conduct problems predicted a diagnosis of a conduct-based disorder (AUC = 0.86, p < 0.001). In addition to these direct predictions, there were some predictions between other SDQ dimensions and the clinicians’ diagnoses. Clinician-diagnosed conduct-based disorders were predicted by lower reported prosocial behaviors on the SDQ (AUC = 0.23, p < 0.01) and by increased reports of peer problems on the SDQ (AUC = 0.86, p < 0.001). Finally, a clinician’s diagnosis of an emotionally based disorder was predicted by lower reported hyperactive symptoms on the SDQ (AUC = 0.29, p < 0.01). The SDQ total score and the SDQ impact score did not predict diagnostic category within the clinically referred sample.

Section 3.5: does the SDQ-Arabic differentiate between a clinically referred sample and a representative non-clinically referred sample?

Using area under the curve analyses, the SDQ-Arabic differentiated between a clinically referred sample and a non-clinically referred community sample of Arabic-speaking youth. In a separate sample, gathered at the same time for a separate study, the SDQ was also administered to a non-clinically referred, geographically stratified, nationally representative sample of 13-to-17-year-old (x̄ = 14.91, SD = 1.16) Arabic-speaking youth in Qatar (n = 265, 45% female). This sample was not specifically matched to the clinically referred sample, and so the following analyses are presented as exploratory and as additional supporting evidence for the SDQ’s predictive validity in a GCC country.

In contrast to the analyses within the clinically referred sample, by far the most sensitive and specific predictor of whether a youth was clinically referred was the SDQ impact score (see Table 1), with higher SDQ impact scores predicting being clinically referred (AUC = 0.74, p < 0.001). The only other SDQ dimension that differentiated between the clinically referred and community samples was SDQ-reported emotional symptoms (AUC = 0.59, p < 0.05). SDQ-reported conduct problems, hyperactivity, peer problems, prosocial behavior, and total reported problems did not differentiate between the community and clinically referred samples.

However, these predictions between the clinically referred and non-clinically referred community samples differed by youth gender. The main predictive dimension (i.e., impact score) remained predictive; that is, a higher impact score was still predictive of clinically referred status for both genders (girls AUC = 0.69, p < 0.01; boys AUC = 0.78, p < 0.001). However, increased SDQ total difficulties scores predicted clinically referred status for boys (AUC = 0.67, p < 0.01) but not girls (AUC = 0.44), and increased SDQ-reported conduct problems predicted clinically referred status for boys (AUC = 0.61, p < 0.05) but not girls (AUC = 0.45). In addition, SDQ-reported hyperactivity predicted clinically referred status for both boys and girls, but in opposite directions. Higher levels of hyperactivity predicted being in the clinically referred group for boys (AUC = 0.64, p < 0.05), while lower levels of hyperactivity predicted being in the clinically referred group for girls (AUC = 0.32, p < 0.01).

Section 3.6: relationship between the SDQ and APQ

Partial correlations (two-tailed) were conducted between the SDQ measures (the dimensions of emotional symptoms, conduct problems, hyperactivity/inattention, peer problems, and prosocial behavior; the total difficulties score; and the impact score) and the dimensions of the APQ-15 (positive parenting, poor monitoring and supervision, parental involvement, inconsistent discipline), controlling for age and gender. For the most part, the SDQ dimensions were not correlated with the APQ dimensions. Because of the relatively small sample size for partial correlations, power might be a concern. However, apart from the statistically significant correlations, the majority of correlations (i.e., effect sizes) were near zero (< 0.10), and so a larger sample size would not be expected to change the global pattern of results. Here we report the significant and marginally significant correlations for future researchers who might follow up on these exploratory data. The APQ dimension of poor monitoring and supervision was related (p < 0.05) to decreased reported SDQ emotional symptoms (r = -0.29) and total symptoms (r = -0.31) and tended (p < 0.10) to be related to decreased reported hyperactivity (r = -0.26). Inconsistent discipline tended to be related to decreased hyperactivity (r = -0.21), and corporal punishment tended to be related to lower reported peer problems (r = -0.24).
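To make the idea of controlling for age and gender concrete, the sketch below computes a partial correlation with the standard recursive formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)), applied once per control variable. The function names and the toy vectors are assumptions for illustration only; gender would need a numeric coding such as 0/1, the significance test is omitted, and the source does not state which software produced the reported values.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x: Sequence[float], y: Sequence[float],
              controls: List[Sequence[float]]) -> float:
    """Correlation of x and y after removing the linear effect of each control."""
    if not controls:
        return pearson(x, y)
    *rest, z = controls  # peel off one control and recurse on the others
    r_xy = partial_r(x, y, rest)
    r_xz = partial_r(x, z, rest)
    r_yz = partial_r(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy example: x and y are related only through the shared control z.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]   # z plus noise unrelated to y
y = [2, 0, -2, 0]   # z plus noise unrelated to x
print(pearson(x, y))          # raw correlation is 0.5
print(partial_r(x, y, [z]))   # approximately 0 once z is controlled
```

In the study’s setting, `x` would be an SDQ score, `y` an APQ dimension score, and `controls` the age and gender columns.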

Section 3.7: gender and age

Multivariate analyses were conducted predicting the SDQ measures (the dimensions of emotional symptoms, conduct problems, hyperactivity/inattention, peer problems, and prosocial behavior; the total difficulties score; and the impact score) and the dimensions of the APQ-15 (positive parenting, poor monitoring and supervision, parental involvement, inconsistent discipline) from age and gender. The overall multivariate model (F = 5.18, p < 0.01) and gender (F = 2.62, p < 0.01) were significant predictors, but age was not (F = 1.93).

Tests of between-subjects effects revealed that clinically referred Arabic-speaking girls had higher average ratings of positive parenting (F = 4.32, girls x̄ = 4.78, SE = 0.32; boys x̄ = 3.78, SE = 0.34) and emotional problems (F = 10.50, girls x̄ = 5.13, SE = 0.45; boys x̄ = 2.99, SE = 0.48) and lower ratings of hyperactivity (F = 4.37, girls x̄ = 3.35, SE = 0.55; boys x̄ = 4.92, SE = 0.51) than clinically referred Arabic-speaking boys.

Older clinically referred Arabic-speaking youth had higher-rated positive parenting (F = 6.83, r = 0.39) and prosocial behavior (F = 6.10, r = 0.37) and tended (p < 0.10) to have lower poor monitoring and supervision scores (i.e., more monitoring and supervision) (F = 2.89, r = -0.26) and lower overall impact scores (F = 2.88, r = -0.21) than younger clinically referred Arabic-speaking youth. These findings are consistent with previous literature and are compared with that literature in the discussion.

Section 4: discussion

In the current study, we found additional validation evidence for another mental health screening tool to be used in Arabic in the GCC. The current study extends the validity of the SDQ-Arabic. The SDQ-Arabic, which had previously been validated in Arabic-speaking Levantine countries in the region, continues to demonstrate strong predictive value in a GCC sample. The SDQ-Arabic differentiated between diagnoses in a clinically referred sample and differentiated between the clinically referred sample and a representative community sample. This short, free, and easy-to-administer scale can be used in larger youth mental health screening initiatives in support of local community mental health initiatives (e.g., [23, 24]) as well as to support mental health initiatives for youth across the GCC.

While this is what we expected, it is not to be taken for granted. Previous experience has shown that scales do not always show the same factors, and variables do not always predict the same outcomes, when applied to cultures in which they were neither developed nor normed. Scales used internationally to assess youth mental health have produced different underlying factor structures in Arab youth than in their Western counterparts [11]. Variables such as family cohesion have been shown to be a major factor in understanding and predicting youth mental health internationally [25, 26, 27] and to interact with family characteristics to influence mental health outcomes [28]. Yet family cohesion cannot be used to explain variance in youth mental health outcomes in Qatar because the vast majority of Qatari families are intact [21, 29]. These are just two examples of many that illustrate why it is important to validate scales and test relationships within the culture in which they will be applied.

The 15-item version of the Alabama Parenting Questionnaire in Arabic was related to a number of SDQ variables. Although a direct validation of the APQ-15 was not the goal of this study, our results provide some substantiation of its usefulness in this population. One variable is particularly worth noting: poor monitoring and supervision. Higher scores on poor monitoring and supervision (i.e., less monitoring and supervision) were related to lower emotional symptoms and lower total symptoms on the SDQ. Stated inversely, more monitoring and supervision was related to higher emotional symptoms and higher total symptoms. Although these data are correlational, they are suggestive of an evocative relationship between the child and the parent in which the parents alter their monitoring and supervision based on the child’s symptomology.

Conclusion and limitations

The current study extends the validity of the SDQ-Arabic. The SDQ-Arabic, which had previously been validated in Arabic-speaking Levantine countries in the region, continues to demonstrate strong predictive value in a GCC sample. The SDQ-Arabic differentiated between clinically referred and non-clinically referred samples. However, while this is the first known study to do this in the GCC, there are several important limitations to consider. First, GCC countries are not monolithic and vary within themselves on important cultural dimensions that may impede generalization. While the fact that the SDQ-Arabic predicted clinical referral in one GCC country is good news for its use in other GCC countries, it cannot be taken for granted that the results would generalize, and this should be examined in future studies. Further, this study is correlational, and clinical referral is itself a confound: the non-clinically referred sample will contain some percentage of youth who should be clinically referred, and clinical referral is at least partially based on a number of other factors, including acting out and family involvement. Nevertheless, the fact that the SDQ-Arabic did differentiate between a clinically referred and a non-clinically referred sample is still strong evidence of its validity as a screening tool.