Introduction

In parallel with the progress made in the fields of psychometrics, neurobiology and psychopathology, psychometric instruments for assessing the negative syndrome of schizophrenia have evolved over the past 30 years to better meet the requirements imposed by our current understanding of negative symptoms and signs (Fig. 1). One of the most important milestones in this evolution has been the NIMH initiative of sponsoring a consensus development conference on various aspects of the negative syndrome, with a structure similar to that used with the dimension of cognition (MATRICS). One of the objectives of the conference, named the NIMH–MATRICS Consensus Statement on Negative Symptoms, was “to develop or identify widely acceptable, evidence-based measures and methodologies needed to establish the efficacy of treatments that target negative symptoms” [1].

Fig. 1 Evolution of psychometric instruments for assessing the negative syndrome of schizophrenia: first- versus second-generation instruments

The recommendations of the NIMH–MATRICS Consensus Statement on Negative Symptoms included developing a new instrument to assess the five agreed-upon domains of the negative syndrome and forming a workgroup to achieve this goal [1]. Following this recommendation, the Collaboration to Advance Negative Symptom Assessment of Schizophrenia (CANSAS) was created and funded by the NIMH with the mission of developing the Clinical Assessment Interview for Negative Symptoms (CAINS) [2]. Kirkpatrick et al. [3], also following the recommendations of the NIMH–MATRICS Consensus Statement on Negative Symptoms, developed another new instrument, the Brief Negative Symptom Scale (BNSS).

In the light of these advances, instruments are categorized into two classes: older and newer instruments [4], or first and second generation [5]. The latter classification is based on the instruments’ conceptual formulation, content validity and assessment approach [5]. First-generation instruments have more content validity problems than second-generation instruments as they do not accurately reflect the currently accepted negative syndrome (Table 1). On the one hand, they do not include all negative symptoms (asociality, avolition and anhedonia) and signs (blunted affect and alogia) that constitute the negative syndrome of schizophrenia, and on the other, they include some symptoms from other dimensions, mainly the cognitive dimension. They also rely more heavily on behavioural referents than on internal experiences of deficits when assessing symptoms, which may lead to measuring functioning instead of negative symptoms.

Table 1 Comparison of the negative symptoms and signs assessed by available instruments

Methods

This is a selective review of data on the state of psychometric evaluation of the negative syndrome of schizophrenia. Relevant literature on the instruments addressed in this paper was analysed. Articles were considered relevant if they described (1) the conceptual development of the instrument or (2) its psychometric properties.

First- versus second-generation instruments

Based on our classification of negative instruments [5], the Brief Psychiatric Rating Scale [6], the Scale for the Assessment of Negative Symptoms [7], the Subjective Experience of Negative Symptoms [8] and the Positive and Negative Syndrome Scale [9] belong to the first generation, while the Brief Negative Symptom Scale [3], the Clinical Assessment Interview for Negative Symptoms [2] and the Motivation and Pleasure Scale—Self-Report [10] belong to the second generation. The Negative Symptom Assessment [11] can be considered a transitional instrument between the two, as it better covers the negative syndrome and relies less on behaviours and performance when scoring than do the first-generation instruments.

The Brief Psychiatric Rating Scale (BPRS)

The BPRS is one of the most widely used scales in psychiatry. It was developed in 1962 by Overall and Gorham [6] to measure psychiatric symptom changes (including positive, negative and affective psychopathology) in clinical drug studies with patients with schizophrenia or other psychotic disorders. Originally, the scale had 16 items, but since then two further versions have been developed: one in 1976 with 18 items [12], and another, the BPRS-Extended version (BPRS-E), in 1986 with 24 items [13]. Each item is rated using a 7-point Likert scale (1–7).

Different factor models of the BPRS and BPRS-E are available in the literature, most of them yielding four- or five-factor solutions. Generally speaking, these factors refer to positive (thought disturbance, hostility–suspiciousness), negative (apathy, anergia, withdrawal), agitation (animation, disorganization, activation) and mood symptoms (depression–anxiety, mood disturbances, affect) [14–17]. The items most frequently included in the BPRS/BPRS-E “negative” factor were blunted affect, emotional withdrawal and motor retardation. Some analyses also included disorientation, self-neglect, mannerisms and posturing, and uncooperativeness. Dingenmans et al. found moderate internal consistency for their four BPRS-E component scales, with Cronbach’s alpha ranging from 0.64 (mania scale) to 0.76 (depression scale), while Burlingame et al. [18] demonstrated its sensitivity to change.

The Scale for the Assessment of Negative Symptoms (SANS)

Developed by Andreasen in 1983 [7], the SANS is one of the most widely used scales for assessing negative symptoms [1]. It consists of 25 items that evaluate five symptom factors (affective flattening or blunting, alogia, avolition/apathy, anhedonia/asociality and attention) using a 6-point Likert scale (from 0 to 5) where higher scores mean greater symptom severity. The scale provides a global rating on each symptom factor and a total score (the sum of the item scores). These five factors have also been found by Peralta and Cuesta [19]. These authors found an inter-rater reliability of 0.72 and an internal consistency (Cronbach’s alpha) of 0.89, although the corrected alpha was 0.29. Pearson’s correlation coefficient between the negative subscale of the PANSS and the SANS was 0.80 [20].

An interesting and useful initiative has established the concordance between the clinical global impression (CGI) and SANS total score using the equipercentile linking method [21]. It found that each increment of CGI severity corresponds to an average 16-point increment in the SANS total score. It also found that each increment of CGI change corresponds to a mean SANS percentage change of 19 %.
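As a rough illustration of equipercentile linking, the sketch below maps a score on one scale to the score on a second scale that has the same percentile rank in the same sample. It is a minimal, unsmoothed version; the function names and the example data are hypothetical and are not taken from [21].

```python
from bisect import bisect_left, bisect_right


def percentile_rank(scores, x):
    """Mid-percentile rank (0-100) of score x within a sample of scores."""
    ordered = sorted(scores)
    below = bisect_left(ordered, x)          # number of scores strictly below x
    at = bisect_right(ordered, x) - below    # number of scores equal to x
    return 100.0 * (below + 0.5 * at) / len(ordered)


def score_at_percentile(scores, p):
    """Score at percentile p (0-100), using linear interpolation."""
    ordered = sorted(scores)
    pos = (p / 100.0) * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def equipercentile_link(x, scores_a, scores_b):
    """Map score x on scale A to the scale B score with the same percentile rank."""
    return score_at_percentile(scores_b, percentile_rank(scores_a, x))


# Invented ratings for nine patients (illustration only, not real data).
cgi_severity = [2, 3, 3, 4, 4, 4, 5, 5, 6]
sans_total = [10, 25, 30, 40, 45, 50, 60, 65, 80]

for cgi in (3, 4, 5):
    linked = equipercentile_link(cgi, cgi_severity, sans_total)
    print(f"CGI severity {cgi} -> linked SANS total {linked:.1f}")
```

Published linking studies typically use much larger samples and smooth the score distributions first; this sketch omits both steps.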

Although the NIMH Consensus Statement on Negative Symptoms [1] accepted this scale for evaluating negative symptoms, it has a number of drawbacks that call its validity into question today. Firstly, the established five-factor structure has not been replicated [22]. Secondly, its content validity is questionable: the symptom factor of attention refers to the patient’s overall concentration capacity, which belongs to the cognitive dimension rather than the negative dimension of schizophrenia. Furthermore, two of the core negative symptoms are grouped together to constitute one single factor, anhedonia/asociality, with one single rating. Finally, some of its ratings rely on behavioural or performance deficits instead of experiential deficits.

A shortened version was recently developed using item response theory (IRT) [22]. The short SANS contains 11 items that evaluate three symptom factors (affective flattening, asociality and alogia/inattentiveness) using a shortened 3-point Likert scale (moderate, marked and severe) for ratings. The authors themselves acknowledge that this shortened version has some limitations, including the lack of assessment of some symptoms of the negative dimension such as anhedonia and avolition. However, they should also have mentioned the over-inclusion of symptoms that do not belong to the negative dimension, such as inattentiveness.

In 1993, Selten et al. [8] developed a semi-structured self-rating interview from the SANS, the Subjective Experience of Negative Symptoms (SENS). It consists of 21 items taken from the SANS. For each item, the interviewer asks about the patient’s awareness of the symptom. The patient makes his/her rating using a 5-point Likert scale (from 1 to 5) where only values of 1 and 2 reflect patient awareness of the symptom. For each item with a rating of 1 or 2, the interviewer asks the patient about his/her causal attribution and the distress level caused by the symptom. The authors found that, compared to clinicians, patients underestimated their negative symptoms both in terms of frequency and severity [23].

The Positive and Negative Syndrome Scale, Negative Subscale (PANSS-NS)

The PANSS is a clinician-rated semistructured interview developed by Kay et al. [9] based on the classification of schizophrenia into types I and II by Crow in 1980 [24], thus providing a balanced representation of positive, negative and general psychopathology symptoms. It is the most established scale for assessing patients with schizophrenia [25].

Its negative subscale, the PANSS-NS, consists of seven items: blunted affect, emotional withdrawal, poor rapport, passive social withdrawal, difficulty in abstract thinking, lack of spontaneity and flow of conversation, and stereotyped thinking. Each item is rated using a 7-point Likert scale where higher scores represent greater severity and is accompanied by a complete definition and detailed anchoring criteria for the rating points. It provides a global negative score that is calculated by adding the scores on the seven items.

As in the case of the SANS, the PANSS-NS has limitations related to its content validity. On the one hand, it includes clearly cognitive symptoms such as difficulty in abstract thinking, and on the other, it does not incorporate highly relevant symptoms commonly identified as part of the domain of negative symptoms, such as avolition and anhedonia. Due to these limitations, the PANSS negative factor (PANSS-NF) obtained by factor analysis is preferred [26]. This negative factor [24] includes five items from the negative scale (blunted affect, emotional withdrawal, poor rapport, passive social withdrawal, and lack of spontaneity and flow of conversation) and two items from the general psychopathology scale (motor retardation and active social avoidance).
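The difference between the PANSS-NS and the PANSS-NF comes down to which items are summed. A minimal sketch, assuming the conventional PANSS item codes (N1 to N7 for the negative scale, G7 for motor retardation and G16 for active social avoidance), which are not given in the text above; the function name and the example ratings are hypothetical:

```python
# Negative subscale: the seven items of the PANSS negative scale.
PANSS_NS_ITEMS = ["N1", "N2", "N3", "N4", "N5", "N6", "N7"]

# Negative factor: drops N5 (difficulty in abstract thinking) and
# N7 (stereotyped thinking), adds G7 and G16 from the general scale.
PANSS_NF_ITEMS = ["N1", "N2", "N3", "N4", "N6", "G7", "G16"]


def sum_items(ratings, items):
    """Sum the ratings (1-7) of the listed items; raise if any is out of range."""
    for item in items:
        if not 1 <= ratings[item] <= 7:
            raise ValueError(f"{item} must be rated from 1 to 7")
    return sum(ratings[item] for item in items)


# Invented ratings for one patient (illustration only).
example_ratings = {
    "N1": 4, "N2": 3, "N3": 3, "N4": 4, "N5": 5, "N6": 3, "N7": 2,
    "G7": 2, "G16": 3,
}

print("PANSS-NS:", sum_items(example_ratings, PANSS_NS_ITEMS))
print("PANSS-NF:", sum_items(example_ratings, PANSS_NF_ITEMS))
```

Both scores sum seven items, so they share the same range, but a patient with marked cognitive items (N5, N7) will score higher on the PANSS-NS than on the PANSS-NF.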

With respect to its psychometric properties [9], its reliability in terms of both internal consistency and test–retest reliability can be considered moderate. Cronbach’s alpha was 0.73, 0.83 and 0.79, and Pearson’s correlation coefficient was 0.80, 0.68 and 0.60, for the positive, negative and general psychopathology scales, respectively. Its inter-rater reliability was tested among raters of different nationalities participating in two international trials [27]. The kappa value reported for ratings of the negative subscale was 0.84, similar to that reported for the Marder negative factor (0.82).

Using IRT, Khan et al. [28] demonstrated that the first four negative symptoms of the PANSS-NS discriminated well and reflected dimensional individual differences. Furthermore, they proposed a six-item Negative Mini-PANSS including all items from the PANSS-NS except item 5 (difficulty in abstract thinking). However, in 2013, the same authors [29] proposed a new integrated negative symptom factor by reviewing all published principal component analyses of the PANSS, extracting the items most frequently included in the negative domain and examining their quality using IRT. This new negative factor contains the following nine items: emotional withdrawal, blunted affect, passive/apathetic social withdrawal, poor rapport, lack of spontaneity/conversation flow, active social avoidance, disturbance of volition, stereotyped thinking and difficulty in abstract thinking.

The PANSS-NS also has problems due to reliance primarily on performance and behaviour instead of experiences when scoring. Obermeier et al. [25] noted severe mathematical problems as well and recommended rescaling the PANSS options. They highlighted the necessity of transforming the interval scale level into a ratio scale (subtracting 30 from the total score) when using PANSS percentage changes as an outcome measure, thus avoiding underestimation of the true response rates.
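The effect of this correction can be illustrated with a short sketch. It assumes that 30 is the lowest possible PANSS total (30 items, each rated from 1), which is why 30 is subtracted; the function names and the example scores are hypothetical.

```python
PANSS_MIN_TOTAL = 30  # assumed floor of the PANSS total: 30 items rated 1-7


def percent_change_uncorrected(baseline, endpoint):
    """Percentage reduction computed directly on raw PANSS totals."""
    return 100.0 * (baseline - endpoint) / baseline


def percent_change_corrected(baseline, endpoint, floor=PANSS_MIN_TOTAL):
    """Percentage reduction after subtracting the scale floor from the baseline."""
    if baseline <= floor:
        raise ValueError("baseline must exceed the scale floor")
    return 100.0 * (baseline - endpoint) / (baseline - floor)


# Invented example: a total score falling from 90 to 60.
baseline_total, endpoint_total = 90, 60
print(f"uncorrected: {percent_change_uncorrected(baseline_total, endpoint_total):.1f} %")
print(f"corrected:   {percent_change_corrected(baseline_total, endpoint_total):.1f} %")
```

Because the uncorrected denominator includes 30 points that can never be lost, the uncorrected figure is always smaller than the corrected one for any real improvement, which is the underestimation described above.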

Recently, van Erp et al. [30] have developed conversion equations between scores on the SANS and PANSS-NS and PANSS-NF. Although they found high correlation coefficients (0.71–0.84), they concluded that further studies are needed in order to validate these equations [30].

The Negative Symptom Assessment (NSA)

The NSA was initially developed by Alphs et al. in 1989 [11]; the most widely used version is the shortened 16-item version (NSA-16) developed by Axelrod et al. in 1993 [31], which has a latent structure similar to that of the original instrument. The NSA-16 is a semi-structured interview containing 16 items that comprehensively assess the negative syndrome of schizophrenia, including the following factors: communication, emotion/affect, social involvement, motivation and retardation [31]. Items are rated using a 6-point Likert scale (from 1 to 6) where higher scores reflect greater impairment. The scale provides detailed anchoring criteria for the rating points, a total score (the sum of the scores on the 16 items) and a global negative symptom rating based on the global clinical impression of the patient’s negative symptoms.

The main limitation of the NSA is its high reliance on functioning or behaviours, even for experiential symptoms such as reduced social drive, whose severity is measured by the type and frequency of social interactions. With respect to content validity, item 3 (impoverished speech content), which refers to the amount of information given by the patient, would fit better in the cognitive than in the negative dimension of schizophrenia.

A confirmatory factor analysis demonstrated that the five-factor model described above better defined its dimensional structure (Comparative Fit Index 0.92) [31]. In addition, the authors showed that the NSA-16 had high internal consistency (Cronbach’s alpha = 0.92). Its inter-rater reliability was tested among raters of different nationalities participating in two international trials [27]. The kappa value reported for ratings of the NSA-16 was 0.89.
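For readers less familiar with the internal consistency figures quoted throughout this review, the following sketch shows how Cronbach’s alpha is computed from item-level ratings, using the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name and the ratings are hypothetical and are not data from [31].

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented ratings: five patients rated on four items (1-6 scale).
example_ratings = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [5, 5, 6, 5],
    [3, 3, 3, 4],
]
print(f"Cronbach's alpha: {cronbach_alpha(example_ratings):.2f}")
```

Alpha approaches 1 when items rise and fall together across patients, so a high value indicates that the items measure a common construct, not that the construct is the intended one.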

In 2010, Alphs et al. [32] developed a simplified version selecting four items from the NSA-16 (restricted speech quantity, reduced range of emotion, reduced social drive and reduced interests) based on their psychometric properties. This four-item version also includes a global negative symptom rating. The shortened version has demonstrated overall accuracy and predictive validity similar to the NSA-16 [32]. The NSA-4 is an interesting instrument for use in daily clinical practice in view of its brevity and psychometric quality.

The Brief Negative Symptom Scale (BNSS)

Based on the guidelines of the NIMH–MATRICS Consensus Statement on Negative Symptoms [1], in 2011, Kirkpatrick et al. [3] developed a concise new instrument appropriate for assessing negative symptoms in clinical trials. The BNSS consists of 13 items organized into six subscales: anhedonia (three items), distress (one item), asociality (two items), avolition (two items), blunted affect (three items) and alogia (two items). It provides a total score (sum of the scores on the 13 items) and six subscale scores (sum of the scores on the subscale items). The authors recommend using the total score as the primary outcome measure in clinical trials and the six subscale scores as secondary measures [33].
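The scoring scheme just described can be sketched as follows. The subscale sizes come from the text; the assumption that items are stored in the order in which the subscales are listed above, as well as the function name and example ratings, are illustrative only.

```python
# Number of items per subscale, in the order listed in the text (13 items in total).
BNSS_SUBSCALES = {
    "anhedonia": 3,
    "distress": 1,
    "asociality": 2,
    "avolition": 2,
    "blunted_affect": 3,
    "alogia": 2,
}


def score_bnss(item_scores):
    """Return the six subscale sums and the total for 13 item ratings."""
    expected = sum(BNSS_SUBSCALES.values())
    if len(item_scores) != expected:
        raise ValueError(f"expected {expected} item scores, got {len(item_scores)}")
    scores = {}
    start = 0
    for name, n_items in BNSS_SUBSCALES.items():
        scores[name] = sum(item_scores[start:start + n_items])
        start += n_items
    scores["total"] = sum(item_scores)  # primary outcome in clinical trials
    return scores


# Invented ratings for one patient (illustration only).
example_items = [3, 2, 4, 1, 3, 3, 2, 2, 4, 3, 3, 1, 2]
print(score_bnss(example_items))
```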

This new instrument not only evaluates the five negative symptoms identified in the consensus statement but also distinguishes between anticipatory anhedonia, which is more characteristic of schizophrenia and related to motivation and goal-directed behaviour, and consummatory anhedonia, which is more typical of depression [34]. In addition, for two negative symptoms (avolition and asociality) the BNSS differentiates between internal experiences and behaviours [35]. Although this distinction may help to clarify and assess the true nature of the negative symptoms, i.e. internal experiences, evaluating behaviours carries the risk of assessing functioning instead of negative symptoms. The authors also included an affective item that assesses lack of normal distress, as this has been shown to be a core predictor of primary and enduring negative symptoms [36]. Lastly, two negative signs (blunted affect and alogia) are assessed by observing facial, vocal and gestural expression, quantity of speech and spontaneous elaboration.

The internal consistency of the BNSS was excellent, with a Cronbach’s alpha coefficient of 0.93 [3]. Likewise, the inter-rater reliability was excellent, with intraclass correlation coefficients of 0.96 (total), 0.95 (anhedonia), 0.89 (distress), 0.92 (asociality), 0.91 (avolition), 0.92 (blunted affect) and 0.93 (alogia) [3]. The BNSS showed high temporal stability, with a 1-week correlation coefficient of 0.81 [3]. Its convergent validity was good, as demonstrated by the correlation coefficients between the BNSS total score and the SANS total, BPRS negative and PANSS negative subscale scores (0.84, 0.68 and 0.80, respectively) [3, 35]. With regard to discriminant validity, as expected, there was no statistically significant correlation with the BPRS positive score (r = −0.06), the PANSS positive subscale score (r = 0.09) or cognitive function as measured by the WASI (r = −0.13), while the scale showed a moderate correlation with the PANSS general psychopathology score (r = 0.4) [3, 35].

A principal axis factor analysis demonstrated that the BNSS has a two-factor structure that resembles the dimensions thought to underlie the negative syndrome, i.e. anhedonia/avolition/asociality (accounting for 57 % and 57.9 % of the variance in the two studies) and emotional expressivity (accounting for 14 % and 10.8 %, respectively) [3, 35]. However, a recent paper [37] has shown that the best solution was a three-component structure, i.e. external world (anhedonia and asociality; 54.8 % of the variance), inner world (avolition and blunted affect; 14.0 % of the variance) and alogia (8.6 % of the variance). Moreover, given the low correlation between alogia and the other two components (with external world: −0.117, with inner world: 0.179), the authors questioned whether alogia belongs in the negative syndrome rather than in the cognitive dimension.

The Clinical Assessment Interview for Negative Symptoms (CAINS)

Like the BNSS, the CAINS was also developed based on the recommendations of the NIMH–MATRICS Consensus Statement on Negative Symptoms [1] to overcome the limitations of the first-generation instruments. For the development of the CAINS, the Collaboration to Advance Negative Symptom Assessment of Schizophrenia (CANSAS) was funded by the NIMH.

Based on its preliminary versions, the CAINS-beta [38] and the CAINS-beta2 [39], and taking a multistep data-analytic approach, the authors developed and validated the final version of the CAINS [2]. In this process, the scale underwent several changes: the number of items was reduced from 23 to 13 as was the number of negative symptoms assessed (asociality was not included in the final CAINS as a separate dimension), the original 7-point Likert rating scale was reduced to a 5-point scale, and its conceptual approach was changed from a negative (amotivation, anhedonia, blunted affect and alogia) to a positive one (motivation, pleasure, emotion expression and speech).

Like the BNSS, the CAINS evaluates motivation and pleasure in a comprehensive way, assessing behavioural engagement in social, vocational and recreational activities and internal experiences of motivation, interest and emotion based on patient reports [40]. It thus has the same limitation, i.e. that functioning may be mistaken for negative symptoms. The CAINS evaluates the negative signs of emotion expression and speech based on observations during the interview. It also differentiates between anticipatory and consummatory anhedonia by asking about expected upcoming-week pleasure and past-week pleasure [2].

The CAINS consists of 13 items covering motivation, pleasure, emotion expression and speech. To ensure its inter-rater reliability, it was built as a clinician-rated standardized structured interview with comprehensive and descriptive anchor points, which facilitates the rating procedure. All items are rated using a 5-point Likert scale where higher scores mean greater deficits. The scale provides two subscale scores, although a single composite score of the two subscales can also be obtained [2].

With respect to the psychometric properties, structural analyses yielded two relatively independent factors: motivation and pleasure in social, recreational and vocational areas (nine items) and expression (vocal prosody, gestures, facial expression and speech; four items) [2], thus replicating the two-dimensional structure found in the CAINS-beta [39]. The internal consistency also supports this two-dimensional structure, with Cronbach’s alpha of 0.74 for the motivation/pleasure subscale and 0.88 for the expression subscale [2]. The CAINS has demonstrated good rater agreement, with intraclass correlation coefficients (ICCs) of 0.93 and 0.77 for the motivation/pleasure and expression subscales, respectively, while its 2-week test–retest reliability was moderate (correlation coefficient = 0.69 for both subscales) [2]. The convergent validity with the BPRS negative symptoms subscore was low for the motivation/pleasure subscale (r = 0.28) and moderate for the expression subscale (r = 0.52), while its convergent validity with the SANS total score was moderate for both subscales (0.48 and 0.55, respectively). However, its convergent validity with self-report measures of pleasure and desire for close relationships, although statistically significant in the case of the motivation/pleasure subscale, was very low in magnitude (from 0.16 to 0.36). The convergent validity with self-report measures of sensitivity of approach and avoidance motivation systems was statistically significant only for the expression subscale, and again the correlations were very low in magnitude (0.15 and 0.29) [2]. As expected, the CAINS showed no significant correlations with depression scores or with extrapyramidal or cognitive symptom scores. However, the motivation/pleasure subscale showed statistically significant correlations with both BPRS positive and agitation subscores, although they were low in magnitude (0.31 and 0.18, respectively) [2].

Barch [41] emphasizes the significant progress that the CAINS constitutes for the assessment of negative symptoms. She highlights its development based on recent advances in affective neuroscience, its distinction between anticipatory and consummatory anhedonia, its adequate coverage of negative symptoms, its development procedure using large samples and its excellent psychometric properties. However, it is necessary to bear in mind that other scales reviewed in this paper also show these characteristics. She also points out some potential limitations of the CAINS, such as its length and the consequent time requirements, and the two-factor structure, which potentially undermines its statistical power in clinical trials. Furthermore, the lack of a specific dimension for the negative symptom “asociality” merits further investigation.

The Motivation and Pleasure Scale: Self-Report (MAP-SR)

The MAP-SR is the self-reported version of the motivation/pleasure subscale of the CAINS. Its preliminary version, the Clinical Assessment Interview for Negative Symptoms—Self-Report (CAINS–SR) [42], contained 30 items and assessed the five domains/negative symptoms assessed by the CAINS-beta2 [39]. Since the expression subscale of the CAINS–SR did not show good convergent validity with the CAINS-beta2 expression subscale [42], it was removed and a new instrument, the MAP-SR, was derived.

The MAP-SR consists of 15 items that measure deficits in motivation and pleasure. The items are grouped into four subscales (social pleasure, recreational or work pleasure, feelings and motivations about close, caring relationships, and motivation and effort to engage in activities) [10]. As with the CAINS and the BNSS, both types of pleasure (consummatory and anticipatory) are assessed. All items are rated using a 5-point Likert scale where higher scores reflect less impairment.

Llerena et al. [10] demonstrated excellent internal consistency (Cronbach’s alpha = 0.90) and moderate convergent validity between patient (MAP-SR) and clinician (CAINS) ratings of motivation and pleasure (r = 0.65). This is contrary to the results reported by Selten et al. [14] on the discrepancy between patient and clinician ratings using the SENS and the SANS, and by Kring et al. [2] on the lack of agreement between the CAINS and the self-report measures of pleasure and motivation (see the CAINS section). Unfortunately, the authors did not give information on the convergent validity with the BPRS negative symptom subscale score. With respect to discriminant validity, the authors did not find statistically significant correlations with the BPRS positive and depression/anxiety subscale scores. However, they unexpectedly found a moderate correlation between the MAP-SR and the Agitation/Mania subscale of the BPRS; thus, they concluded that further research is needed, as a previous study did not find this correlation [42].

Challenges in assessment of the negative syndrome

From our point of view, there are two major challenges in the assessment of negative symptoms. The first is to determine whether the available instruments actually evaluate primary negative symptoms or conversely do not differentiate between primary and secondary symptoms. If this is the case, new instruments should be developed specifically for evaluating primary negative symptoms due to their relevant therapeutic implications.

Despite all efforts to develop new tools or transform the old ones using psychometric techniques that better fit our current knowledge of the negative syndrome, the referents used for its assessment continue to be focused, at least partially, on behaviours. This makes it difficult to distinguish between functioning and negative symptoms and may account for part of the high correlation found between these two dimensions. Therefore, the second challenge is to develop forms of assessment that better capture the inner experiences involved in negative symptoms rather than external behaviours. In this sense, evidence from the NIMH Research Domain Criteria (RDoC), primarily from the Positive Valence and Social Processes systems, may be of great value.

In addition to these two major challenges, future editions of the instruments described here, as well as new ones under development, would benefit from recent psychometric advances, which can improve both test development and data analysis. For example, authors could incorporate techniques such as differential item functioning (DIF) analysis or computerized adaptive testing (CAT).

In conclusion, although new instruments that more accurately represent our current understanding of negative symptoms and signs have been developed, further efforts are needed to better capture the essence of primary negative symptoms.