Introduction

Anxiety and depression are psychiatric disorders that often coexist and share some features. Both conditions are highly prevalent, with anxiety affecting approximately one in 15 persons annually and depression affecting one in 20 persons annually [1]. The prevalence of anxiety and depression varies across populations and settings. A meta-analysis encompassing 174 surveys from 63 countries indicated that approximately 1 in 5 respondents (17.6%) experienced a common mental disorder, including mood and anxiety disorders, during the previous 12 months. Regional variations were notable, with lower prevalence rates in North and South East Asia and higher rates in English-speaking countries [1]. Comorbid anxiety and depression may present as the full clinical picture of both syndromes or as limited symptoms of each. Anxiety disorders typically present first [2]. The theoretical framework behind the comorbidity of anxiety and depression integrates several psychological, biological, and environmental factors. This comorbidity can be understood through a network theory perspective, which posits that overlapping symptoms, or "bridge symptoms," such as worrying and irritability, facilitate the interaction between anxiety and depression, leading to mutual reinforcement of these disorders [3]. Additionally, early-life stressors and trauma contribute to the development of both disorders, indicating that environmental factors play a critical role in their comorbidity [4]. Anxiety and depression also share other risk factors, including female gender, family history, and perinatal factors [5]. Neurobiological research has shown consistent abnormalities across both anxiety and depression, particularly amygdala hyperactivity [6].
The high comorbidity rate also reflects the interconnected nature of mental health symptoms, where the presence of one disorder significantly increases the likelihood of developing the other, creating a complex interplay of psychological vulnerabilities and stressors that perpetuate both anxiety and depression [7].

Anxiety and depression can lead to functional disability [8], including impaired physical functioning, role limitations due to emotional health problems, and decreased social functioning. Research has shown that both anxiety and depressive symptoms are associated with a reduced quality of life [9]. Individuals with moderate to severe anxiety were less likely than those without anxiety to meet ideal levels of physical activity, and the same held for individuals with moderate to severe depression compared with those without depression [10]. Furthermore, one study showed that, compared with people with low depression and low anxiety, those with low depression and high anxiety, high depression and low anxiety, and high depression and high anxiety had 2.46, 26.32 and 54.77 times the risk of suicide, respectively [11]. Treatment options, including antidepressants and cognitive behavioral therapy, are similar for anxiety and depression [12]. Results from 51,547 respondents in a World Health Organization (WHO) study revealed that only 9.8% of individuals with anxiety disorder received adequate treatment, and only 27.6% received any form of treatment at all [13].

Several instruments have been developed for measuring depression, including the Beck Depression Inventory [14], Hamilton Depression Rating Scale [15], Montgomery-Åsberg Depression Rating Scale [16], and Lebanese Depression Scale [17]. A commonly used brief depression severity scale is the Patient Health Questionnaire-9 (PHQ-9) [18], which has been shortened into the two-item PHQ-2 [19]. Similarly, other scales were developed to measure clinical anxiety, such as the Beck Anxiety Inventory [20], Lebanese Anxiety Scale [21], and Generalized Anxiety Disorder Scale-7 (GAD-7) [22]. The 7-item GAD-7 scale was shortened into the 2-item GAD-2 [23]. However, each of these scales measures either depression or anxiety, but not both. Considering the high comorbidity of these two disorders, there was a need for a validated and reliable scale that measures both depression and anxiety. From this perspective, the four-item composite Patient Health Questionnaire-4 (PHQ-4) was developed by combining the PHQ-2 and GAD-2 scales [24]; it contains two core anxiety items and two core depression items. The PHQ-4 leverages the cognitive-behavioral model to assess both anxiety and depression, rooted in the interconnection between thoughts, feelings, and behaviors. According to this model, mental health conditions are maintained by negative thought patterns and maladaptive behaviors. Depression is characterized by pervasive negative views about oneself, the world, and the future, while anxiety involves overestimating the likelihood of negative events and underestimating one’s ability to cope, contributing significantly to emotional distress [25]. Additionally, behavioral avoidance plays a critical role in both conditions; individuals with anxiety avoid situations that trigger their anxiety, and those with depression withdraw from activities that previously brought them joy.
By identifying these cognitive and behavioral patterns, the PHQ-4 could effectively screen for anxiety and depression, aiding in early intervention and treatment [25]. In the original study [24], the PHQ-4 demonstrated construct validity, factorial validity, and internal reliability, proving to be an efficient ultra-brief tool for screening for anxiety, depression, or both. Higher scores on the PHQ-4 scale indicate a need for further assessment of anxiety, depression, or both.

Ultra-brief assessment tools such as the PHQ-4 are essential for busy clinicians, patient follow-up after treatment, and time-restricted research. Numerous studies have shown that ultra-short two- or three-question tests perform comparably to their longer counterparts [26,27,28]. In fact, the PHQ-4 has shown comparable performance to longer depression and anxiety measures in terms of correlating well with measures of quality of life, disability days, and healthcare utilization [24]. The PHQ-4 has been validated and shown to be reliable in various populations, including the general population [25], college students [29], pregnant women [30] and patients with attention-deficit/hyperactivity disorder (ADHD) [31]. It has been translated and validated in multiple languages, including Spanish [32], Portuguese [33], Greek [34], and Korean [35], as well as in Colombia [36]. Several studies support the cross-cultural validity of the PHQ-4 [37, 38]. The PHQ-4 has been translated into Arabic and validated in a group of Syrian refugees in Germany [39]. While the authors’ efforts are valuable in expanding the cross-cultural adaptability of the PHQ-4, we believe that validating the Arabic PHQ-4 scale in Lebanon, despite its prior validation in the German study, is essential for several reasons. Cultural differences significantly influence how mental health symptoms are expressed and perceived, necessitating a culturally relevant validation within the local context and population to ensure accuracy. Additionally, the unique stressors and challenges faced by Syrian refugees in Germany, such as different levels of access to resources and support, may affect mental health differently than in the Lebanese population. Thus, the PHQ-4 still needs to be validated in Arabic-speaking adults from the general population more broadly. Psychometric properties such as reliability and validity need to be tested within the specific population to confirm the scale's effectiveness.
Finally, for seamless integration into the Lebanese healthcare system, local validation ensures that healthcare providers can trust and confidently use the scale for screening and diagnosing mental health conditions among Lebanese individuals. Thus, validating the PHQ-4 in Lebanon ensures it is culturally, linguistically, and contextually appropriate, providing accurate and reliable measurements for diagnosing psychological distress.

In Lebanon, research has shown that 17% of individuals met the criteria for a 12-month mental disorder, of which 27% were classified as serious [40], and that half of the respondents had a history of exposure to war-related traumatic events. Lebanon has recently experienced a series of profound tragedies, including the COVID-19 pandemic and the devastating explosion at the Beirut port on August 4, 2020, described as one of the world's most powerful non-nuclear explosions [41]. Additionally, Lebanon is grappling with its worst economic crisis in modern history, characterized by the rapid devaluation of the national currency, one of the highest inflation rates globally, and severe shortages of essential resources such as electricity and fuel. All these factors are significantly affecting well-being and contributing to an increase in psychological disorders among the Lebanese population. Consequently, interventions aimed at early detection and treatment may play a crucial role in reducing the persistence or severity of primary anxiety and depressive disorders and preventing the onset of secondary disorders.

The availability of a psychometrically validated ultra-brief scale such as the PHQ-4 in Arabic will assist clinicians in screening for anxiety and depression, provide a valuable follow-up tool to assess intervention efficacy, augment mental health research in the Arab world, and contribute to a broader cross-cultural validation of the PHQ-4. Therefore, the aim of this study was to translate the PHQ-4 into Arabic and determine the psychometric properties of this translation, including internal reliability, sex invariance, composite reliability, and its correlation with measures of psychological distress and well-being. We hypothesize that our translation of the PHQ-4 will demonstrate a fit for a two-factor solution similar to the original scale [24], good internal consistency reliability, adequate convergent and concurrent validity, as well as cross-sex invariance.

Methods

Participants and procedures

All data were collected via a Google Form link between February and March 2023. After being trained by the research team, five university students were asked to collect data via the snowball sampling technique. Students were instructed to forward the link to acquaintances, who were asked to forward the link to other family members and friends. Inclusion criteria were being an adult resident and citizen of Lebanon. Individuals who refused to fill out the questionnaire were excluded. Internet protocol (IP) addresses were monitored to prevent duplicate survey responses. Participants provided digital informed consent before completing the survey instruments, which were presented in a pre-randomized order to control for order effects. The survey was anonymous, and participation was voluntary and without remuneration. A total of 587 participants completed the survey (mean age of 34.48 ± 15.06 years, 69.4% females, 42.7% married and 74.0% with a university level of education).

Translation procedure

Following Beaton’s guidelines [42], the forward–backward translation approach was employed for the scale. Initially, the English version was translated into Arabic by two Lebanese translators who were unaffiliated with the study. Subsequently, two Lebanese psychologists who were proficient in English back-translated the Arabic version into English. To ensure the accuracy of the translation, the original English version and the back-translated one were compared, and any inconsistencies were identified and corrected by a committee of experts comprising the research team and the translators [43]. Furthermore, the measure was adapted to the Arab context to identify any potential misunderstanding of the item wordings and to check the ease of item interpretation, ensuring conceptual equivalence between the original and Arabic scales [44]. Following translation and adaptation of the scale, a pilot study was conducted with 20 participants to confirm comprehension of all questions; no alterations were made after the pilot study.

Measures

Patient Health Questionnaire (PHQ-4)

The PHQ-4 is a concise 4-item questionnaire designed to assess anxiety and depressive symptoms experienced over the past two weeks [24]. It comprises two subscales: anxiety (e.g. “Feeling nervous, anxious or on edge”) and depression (e.g. “Little interest or pleasure in doing things”), each consisting of two items. Each item is rated on a 4-point Likert scale ranging from 0 (not at all) to 3 (nearly every day). The total PHQ-4 score is the sum of the four item scores and ranges from 0 to 12. The cut-off for each subscale is a score of 3 or higher. For the total score, 0 to 2 indicates the absence of psychological distress, 3 to 5 suggests mild psychological distress, 6 to 8 suggests moderate psychological distress, and 9 to 12 suggests severe psychological distress.
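The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring script: the function name `score_phq4`, the `PHQ4Score` container, and the assumed item order (two anxiety items followed by two depression items) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PHQ4Score:
    anxiety: int            # sum of the two anxiety items (0-6)
    depression: int         # sum of the two depression items (0-6)
    total: int              # sum of all four items (0-12)
    severity: str           # category of the total score
    anxiety_flag: bool      # subscale score >= 3
    depression_flag: bool   # subscale score >= 3


def score_phq4(items):
    """Score one PHQ-4 response.

    Assumption: `items` lists the two anxiety items first,
    then the two depression items, each rated 0-3.
    """
    if len(items) != 4 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("PHQ-4 needs four item scores, each between 0 and 3")
    anxiety = items[0] + items[1]
    depression = items[2] + items[3]
    total = anxiety + depression
    if total <= 2:
        severity = "none"
    elif total <= 5:
        severity = "mild"
    elif total <= 8:
        severity = "moderate"
    else:
        severity = "severe"
    return PHQ4Score(anxiety, depression, total, severity,
                     anxiety >= 3, depression >= 3)


# Hypothetical response: anxiety items 1 and 2, depression items 0 and 1.
result = score_phq4([1, 2, 0, 1])
print(result.total, result.severity, result.anxiety_flag)
```

A flagged subscale or an elevated total only indicates that further assessment is needed; it is not a diagnosis.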

The Depression Anxiety and Stress Scale-8 items (DASS-8)

This instrument was developed and validated in Arabic by Ali et al. [45]. It is composed of eight items and three dimensions: (1) stress (two items; e.g. “I felt that I was using a lot of nervous energy”), (2) anxiety (three items; e.g. “I felt scared without any good reason”), and (3) depression (three items; e.g. “I was unable to become enthusiastic about anything”). Higher scores reflect a higher level of symptom endorsement (McDonald’s ω = 0.88; Cronbach’s α = 0.88).

WHO-5 wellbeing index

The WHO-5, validated in Lebanon [46], consists of 5 items scored on a 6-point Likert scale with anchors ranging from “at no time” to “all the time” (e.g. “In the last two weeks, I have felt cheerful and in good spirits”). Item scores are summed to give a total from 0 to 25, with higher scores reflecting higher wellbeing [47] (McDonald’s ω = 0.93; Cronbach’s α = 0.93).

Analytic strategy

Data treatment

There were no missing responses in the dataset. To examine the factor structure of the PHQ-4, we conducted a confirmatory factor analysis (CFA) on the total sample using SPSS and AMOS (version 29) software. A minimum sample size of 80 participants was deemed necessary to conduct the CFA, based on 20 participants per scale item [48]. Parameter estimates were obtained using the maximum likelihood method. Calculated fit indices were the normed model chi-square (χ2/df), the Steiger-Lind root mean square error of approximation (RMSEA), the standardized root mean square residual (SRMR), the Tucker–Lewis index (TLI) and the comparative fit index (CFI). Values ≤ 5 for χ2/df, ≤ 0.08 for RMSEA, ≤ 0.05 for SRMR and ≥ 0.95 for CFI and TLI indicate good fit of the model to the data [49]. Multivariate normality was verified (Bollen-Stine p = 0.752). Convergent validity was considered adequate if the average variance extracted (AVE) exceeded 0.5.
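As a minimal sketch of how these decision rules could be applied once a CFA has been fitted, the helpers below check fit indices against the cut-offs quoted above, compute the AVE as the mean squared standardized loading, and reproduce the participants-per-item sample size rule. The function names and all numeric inputs are hypothetical illustrations; they are not the study's AMOS output.

```python
def check_fit(chi2, df, rmsea, srmr, cfi, tli):
    """Compare fit indices with the cut-offs quoted in the text."""
    return {
        "chi2/df": chi2 / df <= 5,
        "RMSEA": rmsea <= 0.08,
        "SRMR": srmr <= 0.05,
        "CFI": cfi >= 0.95,
        "TLI": tli >= 0.95,
    }


def average_variance_extracted(loadings):
    """AVE of one factor: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def minimum_sample_size(n_items, per_item=20):
    """Participants-per-item rule of thumb (20 per item in the text)."""
    return n_items * per_item


# Hypothetical values, for illustration only.
fit = check_fit(chi2=3.2, df=1, rmsea=0.06, srmr=0.02, cfi=0.99, tli=0.96)
print(all(fit.values()))                                    # every index within its cut-off
print(average_variance_extracted([0.8, 0.8, 0.8, 0.8]) > 0.5)  # AVE above 0.5
print(minimum_sample_size(4))                               # 4 items x 20 = 80
```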

Sex invariance

To examine sex invariance of the PHQ-4 scores, we conducted multi-group CFA using the total sample [50]. Measurement invariance was assessed at the configural, metric and scalar levels [51]. We accepted ΔCFI ≤ 0.010 and ΔRMSEA ≤ 0.015 or ΔSRMR ≤ 0.010 as evidence of invariance [50]. Males and females were compared using the Student t-test only if scalar or partial scalar invariance held.
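The invariance criterion can be expressed as a small decision rule comparing two nested models. The sketch below assumes one reading of the criterion, namely ΔCFI ≤ 0.010 together with either ΔRMSEA ≤ 0.015 or ΔSRMR ≤ 0.010; the function name, dictionary keys, and fit values are hypothetical and do not come from Table 2.

```python
def invariance_supported(less_constrained, more_constrained):
    """Decide whether the more constrained model is acceptable.

    Assumed reading of the criterion in the text:
    delta-CFI <= 0.010 AND (delta-RMSEA <= 0.015 OR delta-SRMR <= 0.010).
    Each argument is a dict with keys "cfi", "rmsea" and "srmr".
    """
    d_cfi = abs(less_constrained["cfi"] - more_constrained["cfi"])
    d_rmsea = abs(less_constrained["rmsea"] - more_constrained["rmsea"])
    d_srmr = abs(less_constrained["srmr"] - more_constrained["srmr"])
    return d_cfi <= 0.010 and (d_rmsea <= 0.015 or d_srmr <= 0.010)


# Hypothetical fit indices for three nested models.
configural = {"cfi": 0.995, "rmsea": 0.030, "srmr": 0.015}
metric = {"cfi": 0.992, "rmsea": 0.028, "srmr": 0.020}
scalar = {"cfi": 0.990, "rmsea": 0.027, "srmr": 0.021}

metric_ok = invariance_supported(configural, metric)
scalar_ok = invariance_supported(metric, scalar)
# Group means are compared only when scalar invariance holds.
print(metric_ok, scalar_ok)
```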

Further analyses

Composite reliability was assessed using McDonald’s omega (ω) and Cronbach’s alpha (α), with values greater than 0.70 reflecting adequate composite reliability [52]. The normality of the PHQ-4 score was verified, since the skewness and kurtosis values for each item of the scale varied between −1 and +1 [53]. To assess concurrent validity, Pearson’s correlation coefficient was used to correlate the PHQ-4 scores with the DASS-8 and WHO-5 scores. Correlation coefficient values ≤ 0.10 were considered weak, ~ 0.30 moderate, and ~ 0.50 strong [54].
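For readers who want to reproduce these statistics outside SPSS, the following sketch computes Cronbach's alpha and Pearson's r using only the Python standard library. McDonald's omega is omitted because it requires a fitted factor model. The data are invented, and the labelling function is an assumption: the text gives approximate anchors (0.10, 0.30, 0.50), which are treated here as lower bounds on the absolute value of r.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def correlation_label(r):
    """Assumed cut-points: |r| >= 0.50 strong, >= 0.30 moderate, otherwise weak."""
    size = abs(r)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


# Invented responses from five people to four 0-3 items.
responses = [[0, 1, 0, 1], [1, 1, 2, 1], [2, 2, 2, 3], [3, 2, 3, 3], [1, 0, 1, 1]]
wellbeing = [22, 18, 11, 6, 19]  # invented wellbeing totals for the same people
totals = [sum(row) for row in responses]

print(round(cronbach_alpha(responses), 2))
print(correlation_label(pearson_r(totals, wellbeing)))
```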

Results

Five hundred eighty-seven participants completed the survey, with a mean age of 34.48 ± 15.06 years and 69.4% females. Other characteristics of the participants are summarized in Table 1.

Table 1 Sociodemographic and other characteristics of the participants (n = 587)

Confirmatory factor analysis

CFA indicated that the fit of the two-factor model of the PHQ-4 scores was excellent: χ2/df = 0.13/1 = 0.13, RMSEA = 0.001 (90% CI < 0.001, 0.078), SRMR = 0.002, CFI = 1.000, TLI = 1.005. The standardized estimates of factor loadings were all adequate (Fig. 1). Internal reliability was excellent (McDonald’s ω = 0.86; Cronbach’s α = 0.86). The AVE value was satisfactory at 0.65.

Fig. 1 Standardized estimates of factor loadings from the confirmatory factor analysis in the total sample

Sex invariance

The indices suggested that configural, metric, and scalar invariance were supported across sex (Table 2). No significant difference was found between males and females in terms of the PHQ-4 total scores (4.75 ± 3.16 vs 4.67 ± 3.00, t(590) = 0.30, p = 0.762), PHQ-4 anxiety scores (2.31 ± 1.66 vs 2.27 ± 1.71, t(590) = 0.24, p = 0.807), and PHQ-4 depression scores (2.44 ± 1.68 vs 2.40 ± 1.57, t(590) = 0.32, p = 0.751).

Table 2 Measurement invariance across sex in the total sample

Convergent and concurrent validity

The PHQ-4 total score and the PHQ-4 depression and anxiety scores were significantly and moderately-to-strongly associated with lower wellbeing and higher DASS total and subscales scores (Table 3).

Table 3 Pearson correlation matrix

Discussion

The findings from this study suggest that an ultra-brief 4-item measure can reliably and validly measure depression and anxiety in the general population. Overall, the results support the reliability and validity of the instrument, as well as its suitability for use in Arabic-speaking adults from the general population. It is important to note that the PHQ-4 serves only as a screening tool, and individuals with elevated PHQ-4 scores should undergo further assessment to determine whether they meet the full diagnostic criteria for either disorder or whether intervention is warranted.

Because depression and anxiety often coexist [55, 56], assessment for both conditions seems necessary. Consistent with prior research conducted in various countries, such as Germany [25], Colombia [36], the United States [24, 32], Spain [57], and Iran [58], our study reaffirmed the two-factor structure of the PHQ-4, indicating distinct subscales for anxiety and depression. Hence, our findings support the differentiation between the two scales, PHQ-2 and GAD-2, rather than relying solely on one of them or the total PHQ-4 score. However, this finding contrasts with a study conducted among a sample of Quechua speakers, which supported a one-dimensional model in which anxiety and depression items combined to form a single latent variable of emotional problems [59].

Given the brevity of the PHQ-4, its reliability was notably high. In our study, internal reliability was excellent (McDonald's ω = 0.86; Cronbach's α = 0.86), slightly surpassing the values reported in the German validation study for the PHQ-2 (α = 0.75) and GAD-2 (α = 0.82) [25], and broadly similar to the values from previous studies conducted across various other populations [24, 25, 29, 30, 32, 35, 36]. This suggests that the PHQ-4 measures symptoms of depression and anxiety in the Arabic-speaking population about as reliably as it does in other demographic groups.

Another finding of our research is that the measurement properties of the Arabic PHQ-4 were invariant across sex at the three levels (configural, metric, and scalar). Put differently, men and women comprehend and interpret the PHQ-4 items similarly, so their scores can be meaningfully compared. In our sample, this comparison showed no statistically significant differences between sexes in the PHQ-4 total score or in the two subscores. Consistent with our findings, studies conducted in Colombia [36], Germany [25], and Greece [34] have also provided evidence supporting the consistency of measurement across sexes.

Our findings also revealed that the PHQ-4 depression and anxiety scores were significantly and moderately-to-strongly associated with depression, anxiety and stress scores as measured using another brief scale, the DASS-8, thus supporting the convergent validity of the Arabic PHQ-4. Furthermore, the PHQ-4 total score and the PHQ-4 depression and anxiety scores were linked to lower levels of well-being, in line with a previous study [60]. Indeed, the presence of both anxiety and depressive disorders has been shown to have negative impacts on various aspects of an individual's life, including lower perceived well-being and relationship satisfaction [61,62,63], poorer occupational outcomes [64], and greater loneliness [65]. These negative impacts may be attributed to the common characteristics observed in individuals experiencing depressive and anxiety symptoms, such as fatigue, loss of energy, feeling slowed down or agitated, poor attention and concentration, slow thinking, distractibility, impaired memory, and indecisiveness [66], all of which contribute to diminished well-being.

Our study showed that, according to the PHQ-4 cut-offs, 24.0% of the participants had no psychological distress, while 37.2%, 30.2% and 8.6% of the participants exhibited mild, moderate and severe psychological distress, respectively. These findings are consistent with a prior study indicating a depression prevalence of 34.44% in the Southeast Asian context [67], as well as recent meta-analytic studies by Bueno-Notivol et al. [68] and Salari et al. [69]. Nevertheless, our results are higher than those reported in a German study, which suggested that 6.5% and 7.0% of participants had probable anxiety and depression, respectively [70].

Clinical implications

Providing a reliable and valid Arabic version of the PHQ-4 could help gather precise epidemiological information regarding anxiety and depression symptoms in Arab nations. This initiative could enhance awareness surrounding mental health (anxiety and depression) screening and diagnosis in Arab contexts, and guide the creation of culturally appropriate interventions grounded in evidence.

Limitations

Firstly, the snowball sampling method was used, which may have introduced sampling bias and restricted the representation of the general population. Hence, future studies should aim to employ more diverse and representative sampling techniques to enhance the external validity of the findings. Secondly, the use of cross-sectional data precludes assessment of the predictive validity and test–retest reliability of the PHQ-4. Additionally, despite the use of a substantial community sample of Lebanese participants in this study, access to the survey was restricted to individuals with internet connectivity, potentially leading to an incomplete representation of the entire adult general population. Finally, this study was conducted exclusively in Lebanon, which limits the generalizability of our results to Arabic-speaking individuals in other Arab and non-Arab countries.

Conclusion

The PHQ-4 proves to be a reliable, valid, and cost-effective tool for assessing symptoms related to depression and anxiety. Using reliable mental health screening instruments lessens the load on participants in extensive data gathering, facilitates swift estimation by researchers of the prevalence and intensity of mental health symptoms, assists in timely interventions and psychological support, and provides a sustainable method for monitoring and assessing mental health symptoms amidst economic crises and other humanitarian disasters. To evaluate the practical effectiveness of the Arabic PHQ-4 and to further enhance the data on its construct validity, future studies should assess the measure in diverse contexts and among specific populations.