Journal of Psychopathology and Behavioral Assessment, Volume 32, Issue 2, pp 246–254

Comparison of Eleven Short Versions of the Symptom Checklist 90-Revised (SCL-90-R) for Use in the Assessment of General Psychopathology

Authors

  • J. M. Müller
  • Christian Postert
  • Thomas Beyer
  • Tilman Furniss
  • Sandra Achtergarde

All authors: Department of Child and Adolescent Psychiatry, University Hospital Münster
Article

DOI: 10.1007/s10862-009-9141-5

Cite this article as:
Müller, J.M., Postert, C., Beyer, T. et al. J Psychopathol Behav Assess (2010) 32: 246. doi:10.1007/s10862-009-9141-5

Abstract

Eleven short versions of the Symptom Checklist (SCL-90-R) assessing general psychopathology, containing 5 to 53 items, were compared on the basis of data from a sample of one hundred mothers of 0-to-6-year-old children referred for treatment at a Child Psychiatric Family Day Hospital in Münster, Germany. The SCL short versions were compared with regard to internal consistency, sensitivity and specificity, ability to distinguish between subjects by a new test index (PDTS), and association with indicators of validity (SCL-90-R Global Severity Index, BDI scores). All short versions showed almost equally high internal consistency, sensitivity and specificity, and high correlations with validity indices. The PDTS test index describes a ‘good’ ability of the original SCL-90-R to differentiate between subjects, a ‘moderate’ performance for the BSI, the HSCL-25 and the SCL-27, and a ‘poor’ performance of the very short forms—according to the standards of interpreting PDTS scores. The SCL-10S is recommended for screening purposes because this scale represented the best compromise between economy and accuracy. However, for other research and clinical purposes, the use of one of the longer short versions (BSI, HSCL-25, or SCL-27) is recommended because of their superior discriminative ability.

Keywords

SCL-90-R; Symptom checklist; Short version; Internal consistency; Validity; PDTS; Screening

The Symptom-Checklist 90-Revised (SCL-90-R; Derogatis 1992; German version: Franke 2002) is a widely used measure designed for screening and assessment of psychopathology, symptom burden and treatment effectiveness (Hardt and Brähler 2007). Internal consistency, retest reliability and validity of the questionnaire have been proven in many studies (Franke 2002). However, the postulated nine-factor-structure of the instrument could not be replicated (Hessel et al. 2001). Instead, one global factor appeared which indicates general symptom stress. This factor may best be represented by the global scores, especially the Global Severity Index (GSI), which is calculated as the mean of all items (Hardt and Brähler 2007). Information about global symptom distress could be obtained with a much more economical instrument than the SCL-90-R (Hessel et al. 2001). In order to assess symptom burden more economically, several short versions of the SCL-90-R have been developed during the last decades. These short versions comprise between 5 (SCL-5; Strand et al. 2003) and 53 items (Brief Symptom Inventory; Derogatis 1993).

Short Versions of the SCL: Characteristics and Development

In the following, we will describe the characteristic features and the development of eleven selected short versions in more detail: the HSCL-25 (Derogatis et al. 1974), the BSI (Derogatis 1993), the SCL-10N (Nguyen et al. 1983; see footnote 1), the BSI-18 (Derogatis 2000), the SCL-10R and the SCL-6 (Rosen et al. 2000), the SCL-K-9 (Klaghofer and Brähler 2001), the SCL-27 (Hardt and Gerbershagen 2001), the SCL-5 and the SCL-10S (Strand et al. 2003), and the SCL-11 (Lutz et al. 2006).

Note that some SCL-90-R short versions were derived from already existing short versions or precursor versions of the SCL-90-R. In fact, the development of the SCL-90-R began as early as the 1950s (Schauenburg and Strack 1999). Specifically, the SCL-10N, the SCL-10R and the SCL-6 were developed on the basis of the SCL-90 (Derogatis et al. 1973), which is the direct precursor of the SCL-90-R with almost identical items. Originally, the SCL-90 was derived from the Hopkins Symptom Checklist-58 (Derogatis et al. 1974), which is also the origin of the Hopkins Symptom Checklist-25 (HSCL-25, Derogatis et al. 1974; Hesbacher et al. 1980). Therefore, it is justifiable to regard all of these scales—the HSCL-25, the SCL-10N, the SCL-10R and the SCL-6—as short versions of the SCL-90-R (Hardt and Brähler 2007).

The short versions to be investigated in this study will be introduced in order of their year of publication. We use the term “scales” when we refer to the dimensions of the questionnaires which were originally intended by the test authors. We use the term “factor” when we refer to empirically derived dimensions, which rely on a factor analysis. The factors are not necessarily identical to the scales, and therefore may be labelled somewhat differently.

HSCL-25

The Hopkins Symptom Checklist-25 (HSCL-25; Derogatis et al. 1974; Hesbacher et al. 1980) covers symptoms from the Anxiety, Depression and Somatic dimension (Hardt and Brähler 2007). Cut-off-scores for screening purposes were devised by several authors (see e.g. Winokur et al. 1984). The scale shows satisfactory to good internal consistency, interrater reliability, test-retest reliability, and validity (Lee et al. 2008).

SCL-10N

The SCL-10N was created as a short form of the SCL-90 (Nguyen et al. 1983). Ten items were selected for their factor loadings on the three most important factors identifiable in a psychiatric population: Depression, Somatization and Phobic Anxiety (Hoffmann and Overall 1978).

BSI

The Brief Symptom Inventory (BSI; Derogatis 1993) was designed as a short, but multi-dimensional questionnaire with the nine original dimensions of the SCL-90-R (Somatization, Obsessive-Compulsive Behaviour, Interpersonal Sensitivity, Depression, Anxiety, Hostility, Phobic Anxiety, Paranoid Ideation, and Psychoticism). Containing 53 items, the BSI is more comprehensive than most other short versions. Internal consistency and test-retest-reliability coefficients are satisfactory to good and several studies favor a one-factor-solution (Benishek et al. 1998; Loutsiou-Ladd et al. 2008).

BSI-18

The Brief Symptom Inventory-18 (BSI-18; Derogatis 2000) was developed on the basis of the BSI (Derogatis 1993) and the SCL-90-R (Derogatis 1992), in order to create a leaner self-report measure of psychological distress. In contrast to the BSI, which includes 53 items on nine dimensions, the BSI-18 includes 18 items on three dimensions: Depression, Anxiety, and Somatization (Recklitis et al. 2006). However, similar to other short versions, there is empirical evidence for unidimensionality of the BSI-18 (Asner-Self et al. 2006; Prelow et al. 2005). Reliability and validity of the questionnaire were demonstrated in several studies (Prelow et al. 2005).

SCL-10R

In their review of studies on the factorial structure of the SCL-90, Rosen et al. (2000) came to the conclusion that there was always a predominant first factor related to general distress, and sometimes a secondary factor which was related to the Somatization, Social insecurity, Hostility and Paranoid thinking scales. On the basis of these findings, Rosen and colleagues developed the SCL-10R as an alternative to the SCL-10N. In contrast to the SCL-10N, the SCL-10R includes items from all nine of the original SCL-90 subscales, in order to more broadly represent both the primary and the secondary factor. Six items representing the primary factor were selected (a) on the basis of factor loadings which were obtained in previous studies, and (b) with the aim of including as many of the original SCL-90 subscales as possible. Additionally, four items representing the secondary factor were selected on the basis of factor loadings which were obtained in previous studies. These items belonged to the original SCL-90 Somatization, Avoidance, Hostility and Paranoia subscales.

SCL-6

In addition to the SCL-10R, Rosen et al. (2000) created the SCL-6 as a short index which was aimed at representing only the primary distress factor of the SCL. For this purpose, the authors selected two items each from the SCL-90 Depression, Anxiety, and Psychoticism subscales on the basis of (a) the number of studies in which they loaded on the primary factor and (b) their average factor loading on the primary factor. The resulting 6-item-scale is considered unidimensional (Hardt and Brähler 2007).

SCL-K-9

The SCL-K-9 by Klaghofer and Brähler (2001) includes items from all of the nine original subscales of the SCL-90-R. The authors chose one item out of each subscale which correlated most highly with the GSI, in order to achieve concordance with the SCL-90-R. In this way, they created a broad, but unidimensional index including different symptom categories with high internal consistency (α = .87) and a strong association to the original questionnaire (r = .93).

SCL-27

The SCL-27 by Hardt and Gerbershagen (2001) was designed as a short screening instrument for psychopathology in chronic pain patients. The authors selected 27 items out of the SCL-90-R which represent six dimensions specified on the basis of an exploratory factor analysis. Items were selected according to high convergent and low discriminant correlations. Additionally, the authors excluded all items which are contained in the SCL-K-9 (Klaghofer and Brähler 2001), in order to retain the possibility of assessing patients independently with both measures (Hardt and Gerbershagen 2001). The six dimensions cover depressive, dysthymic, vegetative, agoraphobic, mistrust and social phobia symptoms. The authors report high reliability of the subscales as well as a high correlation with the GSI of the 90-item-version. Cut-off points for screening purposes are provided (Hardt and Gerbershagen 2001).

SCL-5 and SCL-10S

The SCL-5 and the SCL-10S (Strand et al. 2003) were developed for use in population surveys in Norway (Tambs and Moum 1993). The scales were not directly derived from the SCL-90-R, but from the HSCL-25. The items of the SCL-5 and SCL-10S were selected on the basis of high correlation with the HSCL-25 global score.

SCL-11

The SCL-11 was developed by Lutz et al. (2006) with the intention of creating a short, but multidimensional and change-sensitive outcome measure for the evaluation of therapeutic progress in psychotherapy and psychiatry. The authors chose a stepwise item selection procedure on the basis of the BSI (Derogatis 1993). In several steps, 11 items were selected from the BSI according to convergent validity (correlation with BDI and SCL-90-R scales), content validity (depression and anxiety according to DSM-IV and ICD-10 criteria), retest reliability, and change sensitivity.

Research on the SCL-90-R short versions has generally shown high correlations between the original version and the abbreviated versions. This is particularly relevant as some of the short versions include items from the SCL-90, the precursor of the SCL-90-R. Items from the SCL-90 may differ slightly in wording from SCL-90-R items. However, the high correlations suggest that these differences in wording are of minor importance and do not limit the comparability of the shortened versions.

All items included in the aforementioned shortened versions belong to certain scales from the original SCL-90-R (Table 1), but not all scales from the original SCL-90-R were included in all short forms. Importantly, items from the original Depression and the Anxiety scale were considered in all shortened forms. The Somatization scale was considered in all abbreviated versions except the SCL-5, the SCL-6 and the SCL-11. Items from the Phobic fear scale were included in the SCL-10N, the BSI, the SCL-10R, the SCL-K-9 and the SCL-27. Items of the Obsessive-compulsive scale were included in the BSI, the SCL-10R, the SCL-K-9 and the SCL-27. The Psychoticism scale was considered in the SCL-10N, the BSI, the SCL-10R, the SCL-6 and the SCL-K-9. Social insecurity and Paranoid thinking were represented in the BSI, the SCL-10R, the SCL-K-9 and the SCL-27. Hostility items were included only in the BSI, the SCL-10R and the SCL-K-9. Consequently, although the short versions differ in their specific content, they all include some items from the original Depression and Anxiety scales of the SCL-90-R. This characteristic justifies the use of a depression measure, for example the BDI, as a validation criterion for the short versions.
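Table 1

Scales of the short forms of the SCL-90-R

| SCL-90-R scales | HSCL-25 | SCL-10N | BSI | BSI-18 | SCL-10R | SCL-6 | SCL-K-9 | SCL-27 | SCL-5 | SCL-10S | SCL-11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Depression | X | X | X | X | X | X | X | X | X | X | X |
| Anxiety | X | X | X | X | X | X | X | X | X | X | X |
| Somatization | X | X | X | X | X |  | X | X |  | X |  |
| Phobic fear |  | X | X |  | X |  | X | X |  |  |  |
| Obsessive-compulsive |  |  | X |  | X |  | X | X |  |  |  |
| Psychoticism |  | X | X |  | X | X | X |  |  |  |  |
| Social insecurity |  |  | X |  | X |  | X | X |  |  |  |
| Paranoid thinking |  |  | X |  | X |  | X | X |  |  |  |
| Hostility |  |  | X |  | X |  | X |  |  |  |  |
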

Objectives of the Present Paper

In the previous section we have shown that the abbreviated versions of the Symptom Checklist were developed with different purposes in mind and on the basis of different procedures. Therefore we do not know which of the scales performs best in a clinical population. Although there is a large body of literature on the psychometric quality of certain short versions, to our knowledge there has to date been no study which has compared all available short versions of the SCL-90-R simultaneously.

Moreover, in this study the scales will be compared by using a newly developed test index, the PDTS (Probability of Distinct Test Scores; Müller 2006a; see footnote 2), which is especially useful in describing differences between short questionnaires. The descriptive test index PDTS is defined as the ratio of the number of statistically different test scores to the total number of all test score comparisons; it thus reports the probability of obtaining statistically different test scores. The index is sensitive to skewed score distribution (more frequently observed for short tests) and to item number, both of which lower the chances of distinguishing between test scores (Müller 2006a). The index reflects the practical limitations which result from the shortening of questionnaires better than traditional test indices do. The PDTS also works well with small sample sizes, as has been demonstrated in the simulation study of Müller (2006a). Sample sizes of n > 50 show satisfactory psychometric properties (standard error is below 1% of a PDTS score). A further advantage is that the index provides an easy-to-understand summary for a test user who may not be familiar with psychometric theory. The traditional psychometric criteria (internal consistency, validity) are also considered, but they cannot appropriately reflect the diagnostic limitations of very short tests.
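The verbal definition of the index can be illustrated with a minimal sketch. It is not the SAS program of Müller (2006b), and the decision rule is an assumption made here for illustration only: two sum scores are treated as statistically different when their absolute difference exceeds the classical-test-theory critical difference z · SD · √(2(1 − reliability)), with z = 1.96 and the reliability supplied by the user (for example Cronbach's alpha). The function name `pdts_sketch` and the example data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import stdev


def pdts_sketch(scores, reliability, z=1.96):
    """Percentage of subject pairs whose scores differ 'significantly'.

    Assumed rule (not taken from the paper): a pair counts as distinct when
    |x1 - x2| exceeds z * SD * sqrt(2 * (1 - reliability)).
    """
    if len(scores) < 2:
        raise ValueError("at least two scores are required")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    critical_difference = z * stdev(scores) * sqrt(2.0 * (1.0 - reliability))
    pairs = list(combinations(scores, 2))
    distinct = sum(1 for a, b in pairs if abs(a - b) > critical_difference)
    return 100.0 * distinct / len(pairs)


# Hypothetical sum scores of four subjects; tied scores can never be distinct.
example_scores = [0, 0, 10, 10]
print(round(pdts_sketch(example_scores, reliability=0.9), 2))
```

Under this assumed rule, tied scores and a low reliability both reduce the share of distinct pairs, which is the qualitative behaviour the text attributes to short tests with few possible scores.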

Finally, it would appear opportune to define what a ‘good’ PDTS score is and thus broaden the technical standards proposed in Müller (2006a). We propose six categories to label a PDTS score, from ‘very poor’ to ‘excellent’: a PDTS below 30% is ‘very poor’, 30% up to 45% is ‘poor’, 45% up to 60% is ‘moderate’, 60% up to 75% is ‘good’, 75% up to 90% is ‘very good’, and values above 90% are ‘excellent’. Nevertheless, it should be noted that an acceptable lower boundary depends on the diagnostic question posed.
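The proposed categories can be written as a small lookup, sketched below. The function name `pdts_label` is hypothetical, and the treatment of values lying exactly on a boundary (for example 45% counted as ‘moderate’, 90% as ‘very good’) is an assumption, since the text does not specify it.

```python
def pdts_label(pdts_percent):
    """Map a PDTS score in percent to the verbal category proposed in the text.

    Boundary handling is an assumption: each lower bound is inclusive, and
    exactly 90% is still labelled 'very good'.
    """
    if not 0.0 <= pdts_percent <= 100.0:
        raise ValueError("PDTS must be a percentage between 0 and 100")
    if pdts_percent < 30.0:
        return "very poor"
    if pdts_percent < 45.0:
        return "poor"
    if pdts_percent < 60.0:
        return "moderate"
    if pdts_percent < 75.0:
        return "good"
    if pdts_percent <= 90.0:
        return "very good"
    return "excellent"


# Scores reported in Table 2 for the SCL-90-R, the SCL-27 and the SCL-6.
for score in (69.70, 48.51, 34.89):
    print(score, pdts_label(score))
```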

In this study, our aim is to evaluate the eleven short versions of the SCL-90-R with regard to their psychometric qualities for the assessment of general psychopathology/emotional distress. The short versions will be compared on the basis of data from mothers of young children who were referred for treatment at an infant and preschool child psychiatric family day hospital. We have chosen this specific population because mothers of young children with mental health symptoms tend to exhibit various psychological and psychosomatic health symptoms (von Hofacker and Papousek 1998) which are indicative of general pathology. Our evaluative questions are as follows:
  • Do the short versions show acceptable internal consistency when applied in our sample (homogeneity)?

  • Do the short versions perform in a comparable way related to their ability to discriminate between subjects, who show clinical versus non-clinical levels of distress, by their test scores (new test criterion PDTS)?

  • Do the short versions correlate in a convergent way with the original SCL-90-R score (internal validity)?

  • Do the short versions correlate with the BDI, a standardized questionnaire assessing the degree of depression (concurrent validity)?

  • Do the short versions classify subjects as “distressed” or “not distressed” as well as the SCL-90-R (sensitivity, specificity)?

On the basis of these questions, we will try to evaluate whether the short versions of the SCL-90-R reach a psychometric quality comparable to the original version. In summarizing, we will suggest which of the scales appear particularly suitable for application in screening and research contexts.

Method

Procedure

This study is part of a broader study of children and their families who attended the Preschool Child Psychiatric Family Day Hospital Münster between 2002 and 2007. Every child was accompanied by a parent, in most cases (98.5%) the mother. The accompanying parents underwent a comprehensive psychiatric assessment on admission, as parental psychopathology may be an important contributing or sustaining factor of child mental health symptoms (Ramchandani et al. 2005). The self-rating questionnaires SCL-90-R and BDI were completed by accompanying parents as part of the standard assessment procedure. Additional clinical diagnostic assessment of parents was conducted by a consultant psychiatrist.

During the period of data collection, 2002 through 2007, a total of 123 parents were admitted to the Family Day Hospital and took part in the standard assessment procedure. Twenty mothers and one father did not complete the SCL-90-R. In order to increase sample homogeneity, we did not include two completed questionnaires from fathers. There were no additional exclusion criteria. Finally, n = 100 mothers were included in the analysis. Mothers who participated in the study completed the full length version of the SCL-90-R. The short versions which were selected for comparison were subsequently compiled from the total of 90 items.

Sample

Demographic Information

The mean age of the mothers was M = 33.12 years (SD = 5.73) with a range between 19.0 and 46.7 years. Information on education was available for n = 70 mothers. Of those, 2.9% had no certificate of secondary level educational attainment, 30.0% had a secondary general school certificate (9 years of formal education), 38.6% had a secondary school level I certificate (10 years of formal education), 4.3% had an advanced technical college entrance qualification (12 years of formal education) and 24.3% had a general university entrance qualification (13 years of education). Nationality was known for n = 92 mothers. Of those, 98.9% were from Germany and 1.1% from other European countries.

Measures

SCL-90-R

The well-established self-report inventory by Derogatis (1992) consists of 90 items which measure psychological and psychosomatic symptoms with a time frame of one week. Subjects rate each item on a 5-point Likert scale (0 = “no problem” to 4 = “very serious”). For this study only the global score GSI was computed.
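The scoring can be sketched as follows: the GSI is the mean of all 90 item ratings, and a short-version sum score is obtained by summing only the items that belong to that version. The function names are hypothetical, and the item numbers in `PLACEHOLDER_ITEMS` are placeholders chosen for illustration; they are not the published item key of any short version.

```python
def global_severity_index(ratings):
    """Mean of the 90 item ratings, each rated from 0 to 4."""
    if len(ratings) != 90:
        raise ValueError("the SCL-90-R has 90 items")
    if any(r not in range(5) for r in ratings):
        raise ValueError("ratings must be integers from 0 to 4")
    return sum(ratings) / len(ratings)


def short_form_sum(ratings, item_numbers):
    """Sum score of a short version given its 1-based item numbers."""
    return sum(ratings[n - 1] for n in item_numbers)


# Placeholder item numbers, NOT a real short-version item key.
PLACEHOLDER_ITEMS = [3, 14, 29, 31, 54]

# Hypothetical respondent: items 1-45 rated 0, items 46-90 rated 2.
ratings = [0] * 45 + [2] * 45
print(global_severity_index(ratings))              # mean rating
print(short_form_sum(ratings, PLACEHOLDER_ITEMS))  # short-version sum score
```

Compiling the short versions from the completed full questionnaire, as done in this study, corresponds to calling `short_form_sum` with a different item list for each version.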

BDI

The Beck Depression Inventory (Beck et al. 1988; German version: Hautzinger et al. 1995) assesses the severity of depression in adult subjects and consists of 21 groups of items. Items are selected following DSM diagnostic criteria for major depression. Subjects are asked to choose the one item out of a group which describes best how they have felt during the last seven days. Reliability and validity of the instrument have been proven in numerous studies (review of the German version see Richter et al. 1998). In this study the BDI was chosen as a criterion for comparison of the SCL-90-R short versions because all short versions include items from the original Depression scale of the SCL-90-R.

Statistical Analyses

Cronbach’s alpha was estimated to evaluate the internal consistency of each scale. A second aspect of test equivalence refers to change in dimensionality assessed by the correlation between the sum score of short versions and the GSI of the complete questionnaire. In order to create an internal validity criterion, mothers were classified as “distressed” or “not distressed” on the basis of the published SCL-90-R cut-off score (GSI ≥ .57; Schauenburg and Strack 1998). This classification was used as the standard against which the classifications of the short versions were compared with regard to their sensitivity, specificity, and positive and negative predictive value. Sensitivity (SE) refers to a test’s ability to produce a positive screening result for subjects who actually have the tested condition, while specificity (SP) refers to a test’s ability to produce negative results for subjects who do not have the condition. Therefore, SE and SP are quantitative measures of screening accuracy. The positive predictive value (PPV) informs about the proportion of correct results among all positive screening results, while the negative predictive value (NPV) quantifies the proportion of correct results among all negative screening results. Therefore, PPV and NPV reflect the efficiency of screening.
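The four screening indices follow directly from the 2 × 2 table of reference classification against screening result. The sketch below uses the caseness criterion GSI ≥ .57 named above as the reference standard; the function name `screening_metrics` and all example values are hypothetical.

```python
def screening_metrics(reference_case, screen_positive):
    """Return SE, SP, PPV and NPV as proportions for paired boolean lists.

    reference_case[i]  -- True if subject i is a case by the reference standard
    screen_positive[i] -- True if the short version flags subject i
    """
    if len(reference_case) != len(screen_positive):
        raise ValueError("both lists must have the same length")
    tp = fp = tn = fn = 0
    for is_case, flagged in zip(reference_case, screen_positive):
        if is_case and flagged:
            tp += 1          # case, correctly flagged
        elif is_case:
            fn += 1          # case, missed by the screen
        elif flagged:
            fp += 1          # non-case, wrongly flagged
        else:
            tn += 1          # non-case, correctly not flagged
    if 0 in (tp + fn, tn + fp, tp + fp, tn + fn):
        raise ValueError("each margin of the 2x2 table must be non-empty")
    return {
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# Hypothetical GSI values and hypothetical short-version screening results.
gsi_values = [0.20, 0.60, 1.10, 0.40, 0.90]
short_flags = [False, True, True, True, False]
reference = [g >= 0.57 for g in gsi_values]   # caseness criterion from the text
print(screening_metrics(reference, short_flags))
```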

In order to check for deviation from the known association of GSI to BDI, all short versions were analyzed to establish their correspondence with the BDI. All statistical analyses were conducted with the Statistical Package for the Social Sciences (SPSS) 15.0. The PDTSCTT was calculated by a SAS program (Müller 2006b). Data were complete in all analyses which dealt exclusively with the SCL-90-R and any of the short forms (n = 100). BDI data were available in n = 94 cases (94%).

Results

Descriptive Results

In Table 2, the number of items per scale, mothers’ mean scores and standard deviation on the SCL-90-R and on the short scales are reported. Results show that mothers scored highly on the SCL-90-R (M = 71.12, SD = 43.47). This corresponds to a GSI of .76, which exceeds the published clinical cut-off score (GSI = .57). Accordingly, when applying the GSI score as criterion, a substantial percentage of mothers (63%) were identified as severely distressed.
Table 2

Basic descriptive indices, internal consistencies, and PDTS scores of the short versions, and correlations with the original SCL-90-R and with the BDI (descriptive indices: n = 100; correlations: n = 94)

| Versions (a) | Items | Mean | SD | Cronbach’s alpha (α) | PDTSCTT (b) in % | SCL-90-R GSI (r_tc) | BDI (r_tc) |
|---|---|---|---|---|---|---|---|
| SCL-90-R GSI | 90 | 71.12 | 43.47 | .96 | 69.70 |  | .75** |
| BSI | 53 | 40.79 | 27.43 | .94 | 62.69 | .98** | .72** |
| SCL-27 | 27 | 18.67 | 13.49 | .88 | 48.51 | .95** | .71** |
| HSCL-25 | 25 | 23.98 | 15.03 | .90 | 53.10 | .95** | .70** |
| BSI-18 | 18 | 12.62 | 10.13 | .86 | 43.06 | .93** | .66** |
| SCL-11 | 11 | 9.88 | 7.63 | .82 | 37.91 | .92** | .65** |
| SCL-10N | 10 | 7.84 | 6.36 | .77 | 35.60 | .91** | .68** |
| SCL-10R | 10 | 9.38 | 6.65 | .78 | 39.49 | .91** | .66** |
| SCL-10S | 10 | 9.96 | 6.97 | .80 | 41.04 | .94** | .69** |
| SCL-K-9 | 9 | 10.70 | 6.90 | .80 | 39.88 | .90** | .66** |
| SCL-6 | 6 | 5.40 | 2.84 | .77 | 34.89 | .87** | .61** |
| SCL-5 | 5 | 5.85 | 4.35 | .79 | 37.04 | .83** | .58** |

(a) For reference of short versions see text

(b) Probability of obtaining two statistically different test scores (Müller 2006a). The subscript CTT indicates the underlying test theoretical model, which is here the classical test theory

Internal Consistency

The internal consistency of the complete SCL-90-R was high (Cronbach’s alpha = .96). Cronbach’s alpha scores of the short versions were, as expected, in a satisfactory range between .77 and .94, with higher internal consistency in the longer scales. Consequently, according to Cronbach’s alpha, there are no major concerns about the psychometric properties for the short forms. The BSI with 53 items showed the highest internal consistency among the short scales (Cronbach’s alpha = .94).
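For readers who wish to reproduce such coefficients outside SPSS, Cronbach's alpha can be computed from an item-by-respondent matrix with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The sketch below is a generic implementation of that formula, not the procedure used in this study; the function name and the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    if len(item_scores) < 2:
        raise ValueError("at least two respondents are required")
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every respondent needs a rating for every item")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the sum score across respondents.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("sum scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings of three respondents on a two-item scale.
data = [[0, 1], [1, 0], [2, 2]]
print(round(cronbach_alpha(data), 3))
```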

Probability of Obtaining Two Statistically Different Test Scores

In contrast to internal consistency, scales differed remarkably with regard to their probability of achieving two statistically different test scores (PDTS; Table 2). The PDTSCTT scores of the short scales ranged between 34.89% (SCL-6) and 62.69% (BSI). With a PDTSCTT of 69.70%, the complete SCL-90-R shows a ‘good’ ability to distinguish test results. This means that in approximately 70% of comparisons of two arbitrarily selected subjects, the SCL-90-R detects a significant difference. This quality is diminished when the SCL-90-R is shortened.

Association with Validity Criteria

Shortening a long scale also affects its internal validity, here represented by the correlation with the GSI, and its concurrent validity, here represented by the correlation with the BDI. All short versions correlated strongly and significantly with the GSI of the SCL-90-R (see Table 2). Correlations ranged between .83 (SCL-5) and .98 (BSI) and do not suggest a loss of validity. Likewise, all short versions correlated strongly and significantly with the BDI sum score, although not as strongly as with the GSI. Correlations ranged from r = .58 (SCL-5) up to r = .72 (BSI).

Sensitivity, Specificity, Positive and Negative Predictive Value

In order to evaluate the screening accuracy and effectiveness of the SCL short forms, we considered test sensitivity (SE), specificity (SP), and positive (PPV) and negative predictive value (NPV; see Table 3). The published cut-off score of the SCL-90-R (GSI ≥ .57) served as the criterion for clinical distress. Given the high base rate of severely distressed persons in our sample (63%; n = 37 mothers obtained a GSI score below .57), we chose the 37th percentile as the theoretically optimal cut-off score for all short scales. As can be seen in Table 3, the selected cut-off score resulted in good sensitivity, specificity, positive and negative predictive values for all short scales. Most scores exceeded 90% (SE, PPV) or 80% (SP, NPV), respectively, with the exception of the SCL-5, which showed slightly lower accuracy and effectiveness values. Again, the best values were obtained by the two longest scales, SCL-27 and BSI, with a sensitivity above 98%, a specificity above 90%, and comparably good effectiveness scores (PPV > 98%; NPV > 94%). Yet the SCL-11, the SCL-10N and the SCL-10S also obtained good values despite their brevity (see Table 3).
Table 3

Sensitivity (SE), specificity (SP), positive (PPV) and negative predictive value (NPV) of the short versions of the Symptom Checklist (SCL-90-R) (a)

| Short versions (b) (n = 100) | Cut-off | SE (%) | SP (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|
| BSI | 23.00 | 98.41 | 91.89 | 98.41 | 97.30 |
| SCL-27 | 10.00 | 98.41 | 94.59 | 98.39 | 94.74 |
| HSCL-25 | 14.00 | 95.24 | 86.49 | 93.55 | 86.84 |
| BSI-18 | 6.33 | 96.83 | 83.78 | 96.55 | 83.33 |
| SCL-11 | 5.00 | 98.41 | 89.19 | 93.55 | 86.84 |
| SCL-10N | 4.00 | 93.65 | 94.59 | 96.72 | 89.78 |
| SCL-10R | 5.00 | 92.06 | 81.08 | 91.38 | 76.19 |
| SCL-10S | 5.33 | 96.83 | 83.78 | 96.67 | 87.50 |
| SCL-K-9 | 7.00 | 90.48 | 83.78 | 90.48 | 83.78 |
| SCL-6 | 2.00 | 90.48 | 89.19 | 93.44 | 84.62 |
| SCL-5 | 3.00 | 85.71 | 72.97 | 91.84 | 64.71 |

(a) Sensitivity (SE), specificity (SP), positive (PPV) and negative predictive value (NPV) were calculated for the theoretically optimal cut-off score (c = 37th percentile); caseness criterion: SCL-90-GSI ≥ .57

(b) For references of the short versions see text
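The percentile-based choice of cut-off scores can be sketched as follows. Two details are assumptions, because the text does not state them: the percentile is obtained with the interpolating default of Python's `statistics.quantiles`, and a subject is flagged as distressed when the sum score lies strictly above the cut-off. The function names and the scores are hypothetical and do not reproduce the values in Table 3.

```python
from statistics import quantiles


def percentile_cutoff(scores, percentile=37):
    """Cut-off at the given percentile of the observed short-scale sum scores."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must lie between 1 and 99")
    # quantiles(..., n=100) returns the 99 percentile cut points in order.
    return quantiles(scores, n=100)[percentile - 1]


def flag_distressed(scores, cutoff):
    """Assumed rule: scores strictly above the cut-off are screened positive."""
    return [score > cutoff for score in scores]


# Hypothetical sum scores of 100 subjects (1, 2, ..., 100).
scores = list(range(1, 101))
cutoff = percentile_cutoff(scores)
flags = flag_distressed(scores, cutoff)
print(cutoff, sum(flags))
```

With a base rate of 63% cases, a cut-off at the 37th percentile flags roughly the same share of subjects on each short scale as the reference criterion does, which makes the scales comparable despite their different score ranges.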

Discussion

In the present paper, eleven short versions of the Symptom Checklist-90-Revised (SCL-90-R) were compared with regard to psychometric properties on the basis of a clinical sample, in order to help test users to select the most efficient and informative short form of the SCL-90-R. Among the evaluation criteria were internal consistency, a recently proposed test criterion PDTS, validity indices, and screening accuracy and efficiency.

As expected, the internal consistencies of all short versions appear satisfactory (α between .77 and .94, compared with .96 for the full SCL-90-R), and thus for most test users there would be no limitations in test use. However, limitations arising from the number of possible test results (which is small in very short tests) and from the test score distribution are not reflected in internal consistencies. Therefore the internal consistencies cannot indicate clearly which short forms should be preferred. For such comparison, the PDTS score was suggested (Müller 2006a). This index describes the ability of a test to separate subjects by their test results. The PDTS is based in this study on the classical test theory and is therefore indicated by PDTSCTT. The PDTSCTT scores show a much broader range from 34.89% up to 69.70%. With this score, differences between the short forms are also numerically displayed. Differences greater than 2% are considered to be significant according to the standard error of the PDTSCTT, which is below 1% for even smaller samples (Müller 2006a). Moreover, we have defined ranges on the PDTS scale from ‘very poor’ to ‘excellent’ in order to help interpretation of PDTS scores.

As expected, the original SCL-90-R performed best in separating subjects by their test score (PDTSCTT = 69.70%), which is, expressed in standardized terms, a ‘good’ performance. Practically, this means that any two persons who were assessed with the SCL-90-R would have a 70% chance of differing significantly in their test scores. The BSI (PDTSCTT = 62.69%) lies at the lower end of the ‘good’ range, while the HSCL-25 and the SCL-27 (53.10% and 48.51%, respectively) are in the ‘moderate’ range; all three performed numerically clearly below the original SCL-90-R. The very short forms show a PDTSCTT in the ‘poor’ range, with scores between 34.89% and 43.06%.

A different aspect of test quality concerns validity. We have shown that all short versions correlated strongly and significantly with the GSI of the SCL-90-R. The highest correlation with the GSI was achieved by the BSI (r = .98), while the “lowest” correlations with the GSI were found in the very short scales SCL-5 (r = .83) and SCL-6 (r = .87). Obviously, even these correlations are remarkably high, especially in view of the brevity of the two scales. These findings strongly imply that all short versions do indeed embody the main contents of the full-length SCL-90-R. Similar results were found for the correlation of the short versions with the BDI. We found moderate to strong correlations between the BDI and all short scales. Again, the BSI showed the highest correlation and the shortest versions showed the lowest, which were still about r = .60. These results underline the good validity of the SCL short scales as measures of psychological distress, with a focus on depressive symptoms which are the shared content basis of all SCL-90-R short versions.

Finally, the accuracy and effectiveness of the short scales as screening instruments were examined, considering sensitivity, specificity, and positive and negative predictive value. In order to conduct an adequate comparison, we used a cut-off score at the 37th percentile, which is justified by a base rate of 63% cases. Results showed good screening accuracy and effectiveness for all scales, except the very short SCL-5. In accordance with our previous results, the best sensitivity, specificity, and positive and negative predictive values were obtained by the longest of the short versions (BSI, SCL-27). Note that of all versions the full SCL-90-R shows the highest psychometric quality and should be applied whenever there is sufficient time.

Conclusion

We see two possible areas of application for the more economic short forms: as screening instruments in order to build the groups “distressed” and “non-distressed”, and as more dimensional instruments for assessing distress. The latter scenario may occur more often in research studies, where the best possible compromise between economic assessment and tapping the full potential of information is sought. For both scenarios, we have different suggestions.

In the context of screening (screening instruments here are defined as questionnaires no longer than 15 items) the performance of the SCL-10S was slightly superior over all indices, and the questionnaire showed the best compromise between economy and accuracy. However, it depends on the diagnostic question posed whether the type I or the type II error should be minimized, and the decision for a specific questionnaire should therefore be based on the sensitivity or the specificity, respectively (see Table 3). Note that all questionnaires with fewer than 15 items performed ‘poorly’ according to the PDTSCTT criterion, which indicates that short questionnaires should only be applied for dichotomous decisions and not as a continuous measure.

For research purposes, the PDTSCTT criterion uncovers the more subtle quality differences between the short scales. This is important when the objective of assessment is to depict differences or slight changes in symptom severity (e.g. in the evaluation of treatment outcome). Considering the PDTSCTT scores obtained in this study, the SCL-90-R should not be shortened more than was done in the BSI with 53 items (‘good’ PDTSCTT), the HSCL-25 (‘moderate’ PDTSCTT) or the SCL-27 (‘moderate’ PDTSCTT).

In this study, we used the BDI as validity criterion, as depressive symptoms are very common and a central aspect of general psychopathology. In order to broaden the generalizability of the results to other aspects of general psychopathology, the external validity of the short versions recommended here should be examined in further research. This research should rely on additional validity criteria, such as the general psychopathology index of the clinical scales of the MMPI (Butcher et al. 1989), the Basic Personality Inventory (Jackson 1989), the Depression Anxiety Stress Scales (Lovibond and Lovibond 1995), or the Mental Health Questionnaire (Ware and Sherbourne 1992).

Finally, note that the conclusions drawn from this study are limited to the specific and relatively small population of mothers who were examined here. As all participants were female, predominantly German, and exclusively European, the results may not be generalized to males and to people from other nations, ethnic or racial groupings. The applicability and psychometric properties of the short versions in other clinical and non-clinical samples still have to be examined in further studies with larger and more heterogeneous samples.

Footnotes

1. In this paper, the letters “N” and “S” were added to the name “SCL-10” to distinguish the scales by Nguyen et al. (1983) and Strand et al. (2003). The letter “R” in SCL-10R (Rosen et al. 2000) means “revised”.

2. The index is fully described by a subscript for the underlying test model, which is here the classical test theory and abbreviated with PDTSCTT.

Copyright information

© Springer Science+Business Media, LLC 2009