Common mental disorders (CMD), characterised by significant levels of depressive, anxiety and/or somatic symptoms, appear to have a unidimensional underlying construct in community samples [11, 14]. “Caseness” for CMD may, therefore, best be defined as a level of symptom burden crossing a threshold of clinical significance, regardless of the precise constellation of symptoms present [11]. Conceptualising CMD caseness in this way has particular value cross-culturally as it (1) avoids presupposing the presence of Western-based diagnostic syndromes of mental disorder such as those outlined in the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) or International Classification of Diseases, tenth edition (ICD-10) [2, 29], (2) recognises the relevance of non-Western manifestations of mental distress and (3) allows significance thresholds to be developed that are salient in the cultural setting [7, 21].

Reference measurement of CMD for research purposes most often employs a standardised, semi-structured clinician interview, for example the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) [30]. By relying on standardised questions and diagnostic algorithms, this approach fails to capture the less differentiated but clinically important symptom combinations characterising community-level CMD. In addition, the length of such interview schedules limits their feasibility where clinician researchers are in short supply and can be burdensome for non-literate informants. The extensive training required is also not easily accessed from low-income settings.

In our quest for an alternative measure of CMD that would be reliable, valid and feasible for use in Ethiopia, we trained local psychiatrists in use of the Comprehensive Psychopathological Rating Scale (CPRS) [4]. The CPRS is an observer-rated scale which was designed to identify the presence and change over time of a broad range of symptoms and signs of mental disorder, as well as to provide a global rating of the presence or absence of mental disorder. Although the CPRS has mainly been used in clinical settings, the scale has face validity for use in the community and does not constrain clinicians to use international diagnostic criteria in deciding on a “case” of CMD. Using the CPRS means that the evaluation of caseness can benefit from the clinician’s psychiatric interviewing skills, clinical expertise and local knowledge while ensuring comprehensive assessment and standardised rating.

The primary aim of this study was to evaluate the reliability and feasibility of CPRS rating of CMD in Ethiopia. We also undertook exploratory factor analysis of CPRS as a means of evaluating the construct validity of CMD in this setting.


This study evaluated the measurement of CMD caseness using the CPRS as follows: (1) test-retest reliability, (2) inter-rater reliability and (3) construct validation.

Setting and sampling

The study was conducted among women aged 15–45 years recruited from governmental health facilities in Addis Ababa, the capital of Ethiopia.

(1) Test-retest reliability study: Attendees at the Beletshachew Primary Health Care (PHC) clinic over 1 month in 2004 formed the sample population. Women were systematically sampled from a sampling frame composed daily from a list of all women registering with the clinic.

(2) Inter-rater reliability study: A convenience sample of 99 women of reproductive age attending psychiatric (49.5%), medical (8.1%) and antenatal (36.4%) out-patient clinics held at Amanuel psychiatric hospital and St Paul’s general hospital over 1 month in 2005.

(3) Construct validation: Consecutive women attending the Addis Ketama and Selam Primary Health Care clinics for postnatal checks over 1 month in 2006.

Exclusion criteria

In all three studies, women were excluded from participation if they were acutely unwell (that is, requiring immediate medical attention or too mentally unwell to engage in an interview) or not fluent in Amharic, the national language of Ethiopia.

Measurement of CMD using the CPRS

The CPRS has 66 items; 40 symptoms based on the subjective report of the interviewee, 25 signs rated on the basis of observation during the interview and a global rating indicating presence of significant mental disorder. The presence of clearly defined symptoms or signs of mental disorder is rated on a 4-point scale (0–3). The definitions of each scale point were standardised as follows: 0 = not present; 1 = doubtful whether present, and not interfering with life; 2 = definitely present and of moderate severity; 3 = severe or incapacitating. Clinicians were asked to conduct a full psychiatric interview and then complete the CPRS ratings using all available information. The interviewers were free to (1) phrase questions to be understandable to respondents, (2) ask about symptoms not included within the CPRS, (3) screen out culturally acceptable beliefs and behaviours, and (4) probe for indications of culturally relevant clinical significance.

The judgment as to caseness of CMD was not based on a simple tally of CPRS items but on the following significance criteria:

(1) subjective report of significant distress; (2) interference with functioning (occupational, social or interpersonal); (3) objectively significant disorder even if not considered so by the participant, for example due to lack of insight; or (4) a response considered disproportionate to any adverse circumstances reported, or representing a change for that individual even in the presence of ongoing adversity.

Training in CPRS

Initially four Ethiopian trainee psychiatrists, with a minimum of 6 months’ psychiatric experience, were trained and participated in the test-retest study. Two of these psychiatrists also participated in the inter-rater study and were supplemented by two new psychiatrists who received their own training in CPRS. The CPRS items were not translated into the national language of Ethiopia (Amharic) because they are not intended to be read out verbatim. All medical education in Ethiopia, including psychiatric training, is conducted in English, and therefore training focused on ensuring full understanding of the concepts behind the items in English. Each item of the CPRS was discussed with three senior Ethiopian psychiatrists, and practice interviews and ratings were carried out over 3 days. Discrepancies in rating were discussed and consensus reached.


(1) Test-retest reliability study: Assessment of test-retest reliability involves comparing ratings from separate interviews conducted on the same patient by different interviewers. Thus test-retest reliability includes the process of eliciting a feature of mental disorder as well as the rating of the elicited symptom or sign. In this study, each participant was interviewed consecutively by two different Ethiopian psychiatrists. The results of this test-retest study, together with the need to evaluate the reliability of new interviewers joining the project, led us to conduct an inter-rater reliability study.

(2) Inter-rater reliability study: When assessing inter-rater reliability, the participant’s responses are independently and simultaneously rated by two observers, one of whom conducts the interview. This method only measures agreement in the way that symptoms and signs are rated.

Although a less stringent test than test-retest, the inter-rater reliability study was feasible and allowed the new psychiatrists direct experience of the interviewing style of the original interviewers. In this study, one psychiatrist conducted a full psychiatric examination of the participant and then both psychiatrists (interviewer and observer) independently rated the CPRS items.

In both reliability studies, allocation of the four psychiatrists was randomised using an incomplete blocks design to ensure equal numbers of possible psychiatrist pairs. The order in which the psychiatrists interviewed patients (test-retest study) and whether they were interviewing or observing (inter-rater study) was also randomly assigned. See Table 1 for the allocation of interviewers. Each psychiatrist was masked to the other ratings. CPRS item scores and a global CMD rating were recorded.
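The allocation described above can be illustrated with a minimal sketch. It assumes a simple balanced scheme in which every possible pair of psychiatrists is used (nearly) equally often and the order within each pair is randomised; the exact randomisation procedure used in the study is not specified here, and the function name `allocate_pairs`, the seed and the participant count are hypothetical.

```python
import random
from itertools import combinations


def allocate_pairs(raters, n_participants, seed=None):
    """Assign each participant a rater pair, balanced across all pairs.

    Illustrative scheme only: every possible pair is used (near-)equally
    often, the pair sequence is shuffled, and the order within each pair
    (first/second interviewer, or interviewer/observer) is randomised.
    """
    rng = random.Random(seed)
    pairs = list(combinations(raters, 2))
    # Repeat the full set of pairs enough times, then trim to n_participants.
    repeats = -(-n_participants // len(pairs))  # ceiling division
    schedule = (pairs * repeats)[:n_participants]
    rng.shuffle(schedule)
    # rng.sample returns the two raters of a pair in random order.
    return [tuple(rng.sample(pair, 2)) for pair in schedule]


# Hypothetical example: four raters, twelve participants.
allocation = allocate_pairs(["A", "B", "C", "D"], n_participants=12, seed=1)
```

With four raters there are six possible pairs, so twelve participants give each pair exactly two participants in this sketch.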

Table 1 Randomly allocated interviewer pairs for the test-retest and inter-rater reliability studies (individual interviewer identified A to E)

In order to understand how the CPRS cases of CMD related to international diagnostic criteria, the psychiatrists were also asked to document the presence of any axis I diagnoses according to DSM-IV, regardless of whether or not participants were categorised as CPRS cases of CMD. All of the psychiatrists were experienced in application of the DSM-IV but were additionally supplied with the DSM-IV criteria and asked to record the following diagnoses where present: depressive disorders (296.2/296.3/296.9/300.4), anxiety disorders (300.x), acute stress reaction (308.x), adjustment disorder and post-traumatic stress disorder (309.x).

Construct validity

The construct validity of CMD in Ethiopia was assessed by examining the factor structure of the CPRS scale. We were interested in the presentation of CMD in primary health care/antenatal care. Therefore, reliability study participants who were recruited from psychiatric or medical outpatients were excluded, leaving a sample of n = 138. In addition, CPRS data were available on 100 consecutive postnatal women attending PHC services for vaccination of their new infant. The resulting combined sample used for construct validation numbered 238.

Statistical methods

Sample size calculation

Assuming the prevalence of CMD in the Primary Healthcare setting to be 15%, in order to estimate κ with a 95% confidence interval of width 0.4 assuming a true value of 0.7, a sample size of 100 women was needed [8].
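As a rough check on this figure, the calculation can be sketched under one common large-sample approximation for the variance of κ with two raters, a binary outcome and a shared prevalence. This approximation is an assumption made for illustration; the method in reference [8] may differ, and the function name `kappa_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def kappa_sample_size(kappa, prevalence, ci_width, confidence=0.95):
    """Approximate n for a two-rater, binary-outcome kappa study.

    Assumes the large-sample variance
        Var(k) = (1 - k)/n * [(1 - k)(1 - 2k) + k(2 - k) / (2p(1 - p))]
    with a common prevalence p for both raters. This is an illustrative
    approximation, not necessarily the method of reference [8].
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Variance multiplied by n (the part that does not depend on n).
    unit_var = (1 - kappa) * (
        (1 - kappa) * (1 - 2 * kappa)
        + kappa * (2 - kappa) / (2 * prevalence * (1 - prevalence))
    )
    half_width = ci_width / 2
    return math.ceil(unit_var * (z / half_width) ** 2)


required_n = kappa_sample_size(kappa=0.7, prevalence=0.15, ci_width=0.4)
print(required_n)
```

Under these assumptions the sketch returns 100 for the values quoted above (prevalence 15%, true κ 0.7, interval width 0.4).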

Data analyses

Data were analysed using Stata version 8.0 [26]. Cohen’s kappa (κ) [9] was calculated to show the degree of agreement in categorisation of CMD caseness (CPRS global rating of 2 or 3) over and above that expected by chance alone. Agreement on rating individual CPRS items was estimated using weighted κ coefficients with weights $w_{ij} = 1 - |i - j|/(k - 1)$, where i and j index the rows and columns of the ratings by the two raters and k is the maximum number of possible ratings. This accounted for the distance between ratings in calculating the level of agreement. Agreement on total CPRS score was evaluated as recommended by Bland and Altman [6]. Maximum likelihood factor analysis with varimax rotation was carried out and factors extracted on the basis of the scree plot, the amount of variance explained by the factor and interpretability of the resulting factors [23].
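The agreement statistics described above can be sketched in a few lines. This is an illustrative re-implementation, not the Stata routine used in the study: agreement weights fall linearly with the distance between the two ratings (1 for exact agreement, 0 for the largest possible disagreement), and unweighted Cohen's κ is obtained by counting exact agreement only. The function name and the example ratings are hypothetical.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, categories, linear=True):
    """Cohen's kappa for two raters; linear weights if linear=True.

    With linear=True the agreement weight for cell (i, j) is
    1 - |i - j| / (k - 1); with linear=False only exact agreement counts
    (unweighted Cohen's kappa). Undefined if expected agreement is 1.
    """
    k = len(categories)
    n = len(ratings_a)
    index = {c: pos for pos, c in enumerate(categories)}
    joint = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    row = Counter(index[a] for a in ratings_a)
    col = Counter(index[b] for b in ratings_b)

    def weight(i, j):
        if linear:
            return 1 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    observed = sum(weight(i, j) * c / n for (i, j), c in joint.items())
    expected = sum(
        weight(i, j) * (row[i] / n) * (col[j] / n)
        for i in range(k) for j in range(k)
    )
    return (observed - expected) / (1 - expected)


# Hypothetical item ratings (0-3) from two raters; not study data.
rater_1 = [0, 1, 2, 3, 0, 2, 1, 0]
rater_2 = [0, 1, 2, 2, 0, 3, 0, 0]
item_kappa = weighted_kappa(rater_1, rater_2, categories=[0, 1, 2, 3])

# Caseness: a rating of 2 or 3 counts as a case; exact agreement only.
case_1 = [r >= 2 for r in rater_1]
case_2 = [r >= 2 for r in rater_2]
case_kappa = weighted_kappa(case_1, case_2, [False, True], linear=False)
```

In this toy example the raters disagree on three item ratings by one scale point each, which lowers the weighted κ, but they agree on every case/non-case decision.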

Ethical considerations

Ethical approval was granted by Research Ethics Committees in the relevant academic institutions in Ethiopia and the UK. Participants gave informed and voluntary consent. Women with significant mental health problems were referred for free treatment.


Sample characteristics

The mean age of participants in the test-retest and inter-rater reliability studies (n = 99 in each sub-sample) was 25.5 (standard deviation (SD) 6.25) and 27.0 (6.6) years, respectively, and in the final factor analysis sample (n = 238) was 25.7 years (5.5). In the test-retest study the majority of women (n = 56; 65.9%) underwent both interviews on the same day, the remainder less than 10 days apart. Participants for the inter-rater study came predominantly from psychiatric (n = 43; 45.3%) and antenatal clinics (n = 39; 41.0%).

Caseness for CMD

The estimated prevalence of CPRS cases of CMD was 33.3% in the test-retest study (first interview) and 23.2% in the inter-rater study (interviewer rating). In both studies the prevalence of “any DSM-IV diagnosis” was significantly higher than that of CPRS cases of CMD (test-retest study χ² = 27.80, P < 0.001; inter-rater study χ² = 46.25, P < 0.001; Table 2).

Table 2 Prevalence of global CPRS caseness and DSM-IV diagnoses and estimated kappa for diagnostic agreement

The distribution of DSM-IV diagnoses in CPRS cases of CMD is shown in Table 3.

Table 3 Frequency distribution of primary DSM-IV diagnoses in cases of common mental disorder according to the CPRS

The median total CPRS score in cases of CMD vs. non-cases was 21 (IQR 17) vs. 9 (IQR 12) for the test-retest study and 23 (IQR 10) vs. 1 (IQR 3) for the inter-rater study.

Reliability of assessment of CMD caseness

Test-retest κ for global CPRS caseness was fair (κ 0.29) but inter-rater reliability was excellent (κ 0.82). See Table 2. The κ for test-retest reliability was no different if the interviews were carried out on the same day (κ 0.27) compared to being up to 1 week apart (κ 0.26). Post-hoc inspection of κ for interviewer pairs indicated that agreement for presence or absence of CMD in pairs including interviewer A (A–B κ 0.16; A–C κ 0.33; A–D κ 0.03) was substantially lower than the other pair combinations (B–C κ 0.51; C–D κ 0.44; B–D κ 0.40).

Agreement for total CPRS score

The mean difference in CPRS total score in the test-retest study was 1.1 (95%CI −0.9 to 3.0), although the limits of agreement (±2SD from the mean) were −18.3 (95%CI −21.7 to −14.9) and 20.4 (95%CI 17.1 to 23.8), indicating substantial variation in the differences in CPRS score between interviewers (see Fig. 1). For the inter-rater study, the mean difference in CPRS score was 0.1 (95%CI −0.6 to 0.7). The limits of agreement were −6.0 (95%CI −7.1 to −5.0) and 6.1 (95%CI 5.0 to 7.2), indicating much less variation in agreement in total CPRS score between raters.
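For readers unfamiliar with the Bland and Altman approach, the mean difference and limits of agreement can be computed as in the following minimal sketch. The example scores are hypothetical, the function name is illustrative, and the confidence intervals around the limits reported above are not reproduced.

```python
from statistics import mean, stdev


def limits_of_agreement(scores_1, scores_2, multiplier=2.0):
    """Mean difference and limits of agreement (mean +/- multiplier x SD)."""
    diffs = [a - b for a, b in zip(scores_1, scores_2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - multiplier * sd, bias + multiplier * sd


# Hypothetical total CPRS scores from two assessors; not study data.
first = [10, 22, 5, 31, 14, 8]
second = [12, 18, 6, 25, 15, 9]
bias, lower, upper = limits_of_agreement(first, second)
```

A mean difference near zero with wide limits, as in the test-retest study, indicates little systematic bias between assessors but large disagreement for individual participants.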

Fig. 1 Difference in CPRS scores between assessing psychiatrists for (i) test-retest and (ii) inter-rater assessments

CPRS item frequencies

The prevalence of endorsement of individual CPRS items (dichotomised to indicate clinical significance: 0/1 vs. 2/3) is shown in Table 4.

Table 4 Weighted kappa for CPRS items present with a prevalence of greater than one percent

In CMD cases, six of the top ten most prevalent CPRS items were the same in both studies (prevalence in test-retest vs. inter-rater studies): fatiguability (42.4% vs. 60.9%), pessimistic thoughts (42.4% vs. 39.1%), inability to feel (39.4% vs. 43.5%), reduced sleep (39.4% vs. 43.5%), reduced sexual interest (36.4% vs. 39.1%) and concentration difficulties (36.4% vs. 60.9%). Suicidal thoughts were present in 27.3% and 17.4% of cases in the test-retest and inter-rater studies, respectively. The only observational items rated as clinically significant were apparent sadness and reduced speech, both found exclusively in cases of CMD.

Agreement in rating individual CPRS items

In the test-retest reliability study the weighted κ for nearly two-thirds of CPRS items was 0.3 or higher, indicating moderate agreement (Table 4). For clinically significant CPRS items present at a prevalence of ≥5%, the lowest weighted κ estimates were for aches and pains (0.12) and phobia (0.16), whereas the best agreement was found for suicidal thoughts (0.59), reduced appetite (0.50) and reduced sleep (0.49).

In the inter-rater reliability study, the weighted κ for almost all CPRS items was above 0.70, indicating excellent agreement. The two exceptions were both observational items: apparent sadness (0.53) and reduced speech (0.26).

Factor analysis of CPRS

The frequency of CPRS item endorsement was inspected and items where 2.5% or fewer participants scored one or above were excluded. Likewise items that did not have a correlation of ≥0.30 with any other CPRS item were excluded. Factor analysis using Maximum Likelihood methods with varimax rotation gave a one factor solution explaining 24.7% of the variation. Item-factor correlations are shown in Table 5.
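The two item-screening rules applied before factor analysis can be sketched as follows. The sketch assumes that correlations are Pearson correlations computed among the items that survive the frequency rule, and that the signed (not absolute) correlation is compared with 0.30; neither detail is stated above. The factor analysis itself is not shown, and all names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_items(scores, min_endorsed=0.025, min_corr=0.30):
    """Return item names retained for factor analysis.

    scores maps item name -> list of 0-3 ratings (one per participant).
    Step 1 drops items where min_endorsed or fewer participants score >= 1.
    Step 2 drops items with no correlation >= min_corr with any other
    item remaining after step 1.
    """
    frequent = {
        name: vals for name, vals in scores.items()
        if sum(v >= 1 for v in vals) / len(vals) > min_endorsed
    }
    return [
        name for name, vals in frequent.items()
        if any(
            pearson(vals, other) >= min_corr
            for other_name, other in frequent.items() if other_name != name
        )
    ]


# Hypothetical ratings for five participants; not study data.
example = {
    "sadness": [0, 1, 2, 3, 0],
    "reduced_sleep": [0, 1, 2, 2, 0],
    "phobia": [1, 0, 1, 0, 1],
    "rare_item": [0, 0, 0, 0, 0],
}
retained = screen_items(example)
```

In this toy data set `rare_item` fails the frequency rule and `phobia` fails the correlation rule, leaving the two correlated items.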

Table 5 Factor analysis of comprehensive psychopathological rating scale items in the combined sample (n = 238) using maximum likelihood with varimax rotation


Overall, measurement of CMD caseness in Ethiopia with local psychiatrists using the CPRS was shown to be reliable and feasible. Exploratory factor analysis supported the construct validity of CMD measured by CPRS in this setting. Strengths of our study include the sample size and efforts to define caseness in a socioculturally relevant way.

The majority of reports evaluating the reliability of CMD measurement estimate inter-rater rather than test-retest reliability and are often based on sample sizes that are too small [1, 3, 13, 19, 20, 28]. Agreement over global CPRS rating of caseness was only reported in one study from Japan, where κ for inter-rater agreement was lower than in the present study, ranging from 0.58 to 0.74 depending on the interviewer pair [13]. Other inter-rater studies examining the rating of individual CPRS items tend to show good to excellent agreement between different mental health professionals and across cultural settings within Europe [1, 19, 20], with poorer agreement for rating of observed items than self-reported items [10, 18]. Kappa has also been noted to be lower for “neurotic” than for “psychotic” CPRS items [10].

In this study, inter-rater reliability of CMD caseness was excellent, indicating that reliable application of CPRS as a gold standard measure of CMD in Ethiopia is possible with thorough training and clearly defined criteria for determining clinical significance. The results compare favourably with the reliability of more standardised clinical interviews; for example, the Structured Clinical Interview for DSM-III-R (SCID) [25] had κ of 0.37 for “any current diagnosis” in a non-patient sample [28]. Reliability of diagnostic agreement for SCAN diagnosis of depression has been variable across studies, ranging from κ of 0.78 (test-retest) in field trials [27] to κ of 0.37 (inter-rater) in an Australian out-patient clinic sample [3], but has been more consistent for “any DSM diagnosis” (test-retest κ of 0.62 [24] and inter-rater κ of 0.67 [27]).

In the present study, the difference between estimated values of κ for detection of CMD in the test-retest and inter-rater studies was sizeable. There are two potential additional sources of variability when comparing test-retest to inter-rater studies: first, differences in the way clinicians elicit symptoms and signs, and second, variation in symptoms or the way the interviewee discloses symptoms between interviews. In the test-retest study, participants had presented to the primary care centre because of perceived ill health. The first CPRS interview sometimes took place prior to assessment by the primary care staff, whereas the second interview usually took place afterwards, and it is possible that participants were systematically less distressed once their physical health problems had been attended to. It was also noted by the study psychiatrists that most participants appreciated talking at length about their difficulties and appeared to derive therapeutic benefit from the research interview. This might also lead to a diminution of symptoms by the time of the second research interview. On the other hand, many of the participants were physically unwell and waiting around for the second interview could have accentuated their symptoms. Order effects for reporting of symptoms of mental disorder are well-established in the literature, and it is postulated that participants may learn to articulate their symptoms of mental distress more clearly with practice, reflect upon their experiences and recall more symptoms or indeed modify their responses to decrease the interview duration [12].

Discussion of discordant cases in the test-retest reliability study revealed differing thresholds for deciding whether a participant’s expressions of mental distress were understandable in view of the level of poverty and social adversity which they were experiencing. Further practice of joint rating of cases prior to the inter-rater reliability study, to check application of the criteria for caseness, is likely to have contributed to the observed difference in κ estimate in the second study.

The prevalence of CPRS cases of CMD differed markedly between the two reliability studies: 33.3% in the primary care sample vs. 23.2% in the sample from out-patient psychiatry, general medical and antenatal clinics. Random sampling was not employed in either study and so selection bias could have influenced the prevalence estimates in unpredictable ways. Although the sample including psychiatric out-patients might be expected to have a higher morbidity of mental disorder, most patients were receiving treatment and attending for routine follow-up and would therefore be expected to be relatively well. In contrast, at present there is no mental health care available to patients attending primary healthcare clinics and thus the burden of untreated disorder may well be higher. As the purpose of the study was to establish the reliability of measuring CMD caseness rather than to determine prevalence of CMD, the aforementioned differences should not have affected the veracity of our findings.

Determination of the presence of a DSM-IV diagnosis or caseness for CMD was carried out by the same assessor and thus caution is required in interpreting the relation between the categorisations. Nonetheless it is interesting that many women fulfilling criteria for DSM-IV diagnoses were not considered to be cases of CMD. It is possible that DSM-IV diagnoses unduly pathologise expressions of mental distress in this setting which are understandable with local expertise.

Exploratory factor analysis indicated that the construct of CMD in Ethiopia is unidimensional and includes depressive, anxiety and somatic symptoms. Most previous factor analytical studies of the CPRS have been conducted in clinical populations, included persons with psychotic disorders and were based on smaller sample sizes than the present study. These variations in design may explain the wide variation in proposed factor structures for CPRS [5, 15–17, 22]. Our findings most closely matched the Norwegian study conducted in persons with depression or anxiety disorders [16]. This lends support to the construct validity of CPRS in Ethiopia. However, the identified factor only accounted for a relatively small amount of the overall variability (24.7%). The reason for this could have been our use of a primary healthcare sample where the number and range of symptoms of CMD would be expected to be lower than in psychiatric settings. In addition, CPRS items were originally selected as time-varying aspects of syndromes of mental disorder and thus do not include the full range of CMD characteristics recognised in Western settings, for example guilt. Likewise, culture-specific symptoms, which might be pertinent to the construct of CMD in Ethiopia, are not contained within the CPRS. Further examination of the construct of CMD in community samples from low-income settings is warranted.


Detection of socioculturally meaningful cases of CMD in Ethiopia can be reliably achieved with local psychiatrist assessment using CPRS, although thorough training is essential.