Assessing DSM-IV nicotine withdrawal symptoms: a comparison and evaluation of five different scales
- Cite this article as:
- West, R., Ussher, M., Evans, M. et al. Psychopharmacology (2006) 184: 619. doi:10.1007/s00213-005-0216-z
This study evaluated four of the major scales used to measure nicotine withdrawal symptoms plus one new scale.
Eighty-three smokers were randomly assigned to continue smoking (n=37) or abstain completely for 24 h (n=46), by which time the symptoms should become manifest. Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) withdrawal symptoms (irritability, depression, restlessness, insomnia, anxiety, hunger and poor concentration) plus craving were measured at baseline and after 24 h. The scales tested were the Minnesota Nicotine Withdrawal Scale (MNWS), the Mood and Physical Symptoms Scale (MPSS), the Shiffman Scale (SS), the Wisconsin Smoking Withdrawal Scale (WSWS) and the newly developed Cigarette Withdrawal Scale (CWS).
Measurement of withdrawal symptoms was robust for all scales with respect to total withdrawal score, irritability, restlessness, poor concentration and craving. The MNWS and CWS were less sensitive to depression; the WSWS and MNWS were less sensitive to insomnia; the MPSS was less sensitive to anxiety and hunger; the CWS and WSWS did not include restlessness as a distinct symptom; the SS did not include insomnia, and its scores tended to decline over time during ad lib smoking. Longer scales, using multiple items to measure each symptom, did not yield more reliable or accurate measurement than briefer scales.
To measure total withdrawal discomfort or craving, all of the scales examined can be recommended, and there is little to choose between them apart from length. When it comes to assessing individual symptoms, different scales have different strengths and weaknesses. There would be merits in developing a new questionnaire that combined the best features of the scales tested.
Cigarette smoking is declining slowly in the US and some other Western countries, but it is increasing in other countries. It leads to the premature death of some 400,000 people in the US and 4.9 million people worldwide each year (World Health Organization 2002). It is accepted that most cigarette smokers have difficulty stopping because they are addicted to nicotine (US Department of Health and Human Services 1988; Royal College of Physicians 2000). Part of this addiction involves the emergence of withdrawal symptoms when smokers abstain. These symptoms can be unpleasant and, in some cases, may cause smokers trying to stop to relapse (e.g., West et al. 1989a,b). The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) lists as symptoms irritability, restlessness, insomnia, anxiety, depression, increased appetite and poor concentration (American Psychiatric Association 1994). Since DSM-IV was published, other symptoms have been proposed (constipation, mouth ulcers and upper respiratory tract infections), but the DSM list comprises the core items that are routinely assessed. It has also been proposed that craving or urges to smoke should be included, and these are routinely measured in clinical and experimental studies (see Hughes et al. 1994).
Accurate and efficient measurement of withdrawal symptoms is obviously important in helping to understand nicotine dependence and in developing improved methods of treating it. Over the past 25 years, a number of questionnaires have been developed for this purpose (Shiffman et al. 2004). While there is considerable overlap between the questionnaires, they vary in their coverage of symptoms, the number and wording of labels used to assess the symptoms, the response options and the systems for aggregating responses to derive quantitative indices of withdrawal. Some questionnaires have been in use for two decades, while others are more recent (see Shiffman et al. 2004).
With increasing attention being focused on the development of psychological and pharmacological aids to smoking cessation and the use of pharmacological aids to assist with temporary abstinence, there is a need to consider how different withdrawal scales compare. The scales may differ in a number of respects: sensitivity to abstinence, stability during continuing smoking (tendency for the mean score to change upon retesting), reliability (consistency of scores of individual smokers while still smoking), construct validity (association between scores and variables that would be expected to be associated with them), predictive validity (association with later relapse), length (affecting time to complete), coverage of the symptoms and sensitivity to interventions that affect symptom severity. A recent review of withdrawal symptom assessment examined what withdrawal symptoms should be measured (Shiffman et al. 2004), but in the absence of direct comparisons between scales, the review could not recommend which scales should be preferred, and for what purposes.
The present study compared four of the most widely used measures of DSM-IV tobacco withdrawal, plus one more recently developed scale, through examining their sensitivity to abstinence, stability during continued smoking, reliability and construct validity. Predictive validity was not assessed. For four of the scales these parameters were also examined with regard to craving, which was omitted from DSM-IV but which evidence indicates is an integral part of the abstinence syndrome (see Hughes et al. 1994; West and Hajek 2004). A specific craving scale (the Questionnaire on Smoking Urges; QSU) has been developed, derived from a particular theoretical perspective (Tiffany and Drobes 1991). However, because the present study focused on broader measures of withdrawal symptoms, and there was a need to keep the questionnaire completion time within reasonable limits, the QSU was not included.
The withdrawal symptom scales assessed in the current study differ in the assumptions underlying their development. The Mood and Physical Symptoms Scale (MPSS) (West and Hajek 2004) and the Minnesota Nicotine Withdrawal Scale (MNWS) (Hughes and Hatsukami 1986) were designed to be as concise as possible, using only one item for each withdrawal symptom, except that the MPSS uses two items for urges to smoke. The Shiffman Scale (SS) (Shiffman et al. 2000; not to be confused with the Shiffman–Jarvik scale discussed later), the Wisconsin Smoking Withdrawal Scale (WSWS) (Welsch et al. 1999) and the Cigarette Withdrawal Scale (CWS) (Etter 2005) typically use more than one item to measure each withdrawal symptom. In psychometric theory this would be expected to yield greater precision of measurement, because fluctuations associated with individual items tend to balance out.
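The psychometric expectation that multi-item scales measure more precisely is usually quantified by the Spearman–Brown prophecy formula, which predicts the reliability of a k-item scale from the reliability r of a single item as k·r / (1 + (k - 1)·r). The following minimal sketch assumes parallel items (equal loadings and equal error variances), an idealisation that real questionnaire items only approximate; the function name and the reliability value used are hypothetical and not taken from any of the five scales.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Predicted reliability of a scale made of n_items parallel items.

    Assumes strictly parallel items (equal true-score loadings and equal
    error variances); real questionnaire items only approximate this.
    """
    r = single_item_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return n_items * r / (1 + (n_items - 1) * r)


if __name__ == "__main__":
    # Hypothetical single-item reliability of 0.5 (illustrative only).
    for k in (1, 2, 4):
        print(k, round(spearman_brown(0.5, k), 2))
```

With a hypothetical single-item reliability of 0.5, the formula predicts about 0.67 for two items and 0.80 for four. Real items are rarely strictly parallel, so the realised gain can be smaller than predicted.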
Another issue is that the multi-item scales were developed using factor analysis. With this method, individual items may be grouped together to measure a particular symptom because they correlate with each other. This can be an effective way of discovering common dimensions underlying a set of ratings. However, it can be misleading if responses to conceptually distinct items correlate because they share a common etiology rather than because they reflect a common underlying state. For example, anxiety and depression are highly correlated yet distinct entities. Combining items to produce a single symptom score may therefore blur symptom boundaries, and some symptoms may even be subsumed in others. In particular, the CWS and WSWS do not have a distinct ‘restlessness’ symptom because items relating to that construct are grouped with other items. Yet restlessness appears to follow a different time course from other symptoms, increasing briefly and then, within about 4 weeks, falling below the level seen while still smoking (e.g., West et al. 1987).
The scales also differ in the number of points on the rating scales (see Appendix). The SS uses 10 points, while the other scales use just 5 points. A 10-point scale offers the potential for finer discrimination, but if subjects cannot genuinely make such discriminations, their choice of response may be influenced by extraneous factors, creating noise or bias. The choice of labels for the items and for responses also varies. The application of labels to mental states is subject to considerable uncertainty, and minor differences in wording may affect the responses made. For example, the MPSS uses just the word ‘anxious’ for the symptom bearing that label, while the MNWS uses ‘anxious/nervous’. It has been reported that a simple rating of anxiety correlated well with the multi-item State–Trait Anxiety Inventory (STAI) (West and Hajek 1997) and that both responded very similarly to cigarette withdrawal; surprisingly, in that study, both went down rather than up after 1 week of abstinence. However, labels other than ‘anxious’ may evoke different meanings for subjects and so yield different results. The labels attached to response options can also affect responses. The MPSS and MNWS use labels such as ‘very’ that directly describe the severity of the feeling being canvassed, while the WSWS uses a more indirect ‘agree–disagree’ type scale, and the SS labels just the extremes of the response scales.
In practice, these considerations may make little or no difference to the assessment of withdrawal symptoms, and the present study provided an opportunity to address them empirically. For example, we wanted to know whether multi-item measurement would produce more sensitive and reliable scales, whether the 10-point response format of the SS would lead to greater sensitivity, and whether an anxiety measure that included related terms would show the expected increase following abstinence, which had not previously been found with the single label.
Surprisingly, only one study in the literature has compared withdrawal scales (Etter in press). The lack of studies may have arisen partly because some of the scales (the MPSS, SS and MNWS) were developed independently at around the same time, all drawing conceptually on the Shiffman–Jarvik questionnaire (Shiffman 1979) that predated them. In that study, the CWS was compared with the MNWS and WSWS using an innovative web-based data collection system (Etter in press). It concluded that there was little to choose between the scales in terms of predictive or construct validity, but its methodology had disadvantages: the abstinence period varied, abstinence could not be checked biochemically and symptoms were reported retrospectively. A prospective study comparing different withdrawal scales is needed.
Several scales were not included in the present study. It was considered that five questionnaires were probably the most that could be administered at one time. Important scales that were excluded were the Shiffman–Jarvik questionnaire (Shiffman 1979) and the Smoker Complaints Scale (Schneider and Jarvik 1984).
Apart from comparing the scales, this study examined different ways of assessing the effects of abstinence. Two main methods are used: calculating the change from baseline to post-abstinence and using the post-abstinence score with the baseline score as a covariate. The former provides a more readily interpretable figure, while the latter might be expected to maximise the chances of detecting abstinence effects. The theory behind this is well explained in Tabachnick and Fidell (2001). We were interested to examine how much gain in sensitivity would be achieved by using the covariate approach so that a judgement could be made as to whether this was sufficient to counterbalance the slight increase in complexity of interpretation.
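The two methods can be written as linear models. In this sketch (our notation, not the authors'), for smoker i, X is the baseline score, Y the 24-h score, G an indicator (1 = abstinent, 0 = continued smoking), τ the abstinence effect and ε an error term:

```latex
\begin{aligned}
\text{Change score:}\quad & Y_i - X_i = \alpha + \tau\, G_i + \varepsilon_i \\
\text{Baseline as covariate:}\quad & Y_i = \alpha + \tau\, G_i + \beta\, X_i + \varepsilon_i
\end{aligned}
```

The change-score model is the covariance model with β fixed at 1. With random allocation, both estimate the same abstinence effect τ in expectation, but when the slope relating 24-h to baseline scores is below 1 (for instance, because of measurement error), estimating β leaves less residual variance and so tends to give greater sensitivity. This is the general rationale only, not a result from the present data.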
Smokers were recruited through posters in stores and workplaces. It was made clear that the study sought smokers who were not trying to stop for good. Potential participants were invited to make contact by telephone, at which point the study was explained to them and they were screened for eligibility. The eligibility criteria were age 18–65 years, not receiving psychiatric treatment, not pregnant and smoking ten or more cigarettes a day for at least 3 years. Participants who completed all assessments were given £25 to compensate them for their time. Eligible smokers who expressed an interest in participating were invited to attend the laboratory. A total of 83 smokers took part and were randomly assigned to either abstain for 24 h (n=46) or continue smoking (n=37). Two further smokers were allocated to the abstinence condition but did not manage to abstain. Approval was obtained from the local ethics committee, and all participants provided written consent.
Design and procedure
A mixed (repeated-measures and between-groups) design was employed. The repeated-measures factor was time (baseline vs 24 h later), and the between-groups factor was smoking (continued smoking vs abstinence for 24 h). The follow-up time was based on two considerations. First, previous research has shown that all the withdrawal symptoms, including craving, are manifest within 24 h, so this interval would provide ample opportunity to quantify the severity of withdrawal and compare the measures on this. Second, keeping the follow-up period short minimised the chance that subjects would smoke when they were supposed to be abstinent.
Table 1 Characteristics of the sample: abstinent group (n=46) vs continuing smoking group (n=37). Variables compared: number of quit attempts in last 5 years; longest duration of abstinence (weeks); age of first puff on cigarette (years); age when daily smoking began (years); cigarettes per day; cigarettes smoked on the day at baseline; time since last cigarette (hours) at baseline; baseline expired-air CO (ppm); 24-h expired-air CO (ppm); married or living with partner.
The second visit took place 24 h after the first, at the same time of day. Abstinence was recorded and verified by expired-air carbon monoxide (CO <10 ppm), and all participants completed the five withdrawal symptom measures again, with the order counterbalanced as before.
Each of the measures included in this study assesses some or all of the DSM-IV cigarette withdrawal symptoms (irritability, anxiety, restlessness, depression, insomnia, poor concentration and hunger) (American Psychiatric Association 1994) plus craving/urges to smoke (Hughes et al. 1994). The Appendix shows the items and their relationship to the symptoms specified by the authors of the scales. For withdrawal symptoms measured by more than one item, scores for each symptom were produced by taking the means of the individual ratings (reversed if appropriate). A total withdrawal symptom score was also calculated for all non-craving items by adding together the scores for individual symptoms. Except for the SS, a total score including the craving items was also calculated.
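The scoring rules above (item means per symptom, reversing where appropriate, then summing symptom scores into totals) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a reversed item is scored as low + high - rating on its response scale, and the item groupings, response range and function names are hypothetical; the actual item-to-symptom mappings are given in the Appendix.

```python
from statistics import mean


def reverse_score(rating, low, high):
    """Reverse a rating on a low..high scale (e.g. 2 becomes 4 on a 1-5 scale)."""
    return low + high - rating


def symptom_score(ratings, reverse_flags, low, high):
    """Mean of one symptom's item ratings, reversing flagged items first."""
    adjusted = [reverse_score(r, low, high) if flip else r
                for r, flip in zip(ratings, reverse_flags)]
    return mean(adjusted)


def total_scores(symptom_scores, craving_keys=("craving",)):
    """Return (total excluding craving, total including craving)."""
    excluding = sum(v for k, v in symptom_scores.items() if k not in craving_keys)
    including = sum(symptom_scores.values())
    return excluding, including


if __name__ == "__main__":
    # Hypothetical 1-5 ratings; the item groupings are illustrative only.
    scores = {
        "irritability": symptom_score([4, 2], [False, True], 1, 5),  # (4 + 4) / 2
        "restlessness": symptom_score([3], [False], 1, 5),
        "craving": symptom_score([5, 4], [False, False], 1, 5),
    }
    # Expected totals: 7 excluding craving, 11.5 including craving.
    print(total_scores(scores))
```

For a scale such as the SS, where no total including craving was computed, only the first element of the returned pair would be used.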
The most important feature of the withdrawal symptom scales is the ability to detect and quantify the effect of abstinence. Therefore, for each withdrawal symptom of each scale, we compared the change from baseline between the two groups. We report the effect size for the between-group comparison in change score expressed as eta-squared (which varies from 0 to 1 and, when multiplied by 100, gives the percentage of variance accounted for). An alternative to using pre- to post-test change scores is to compare the abstinent and smoking post-test scores using the pre-test scores as a covariate. This has the advantage of maximising the extent to which the pre-test score explains post-test variance and therefore, potentially, the ability of the score to discriminate abstinence from smoking. However, it has the disadvantage of moving away from a clearly interpretable quantity, namely ‘change from baseline attributable to abstinence’. We also present eta-squared values for this method of analysis.
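Both effect-size calculations can be sketched in plain Python. The paper does not state which eta-squared variant was used for the covariance method, so this sketch assumes ordinary one-way eta-squared for the change scores and partial eta-squared for the group term of an analysis of covariance with a common baseline slope, as commonly reported by statistics packages. Function names and data are hypothetical.

```python
from statistics import fmean


def _cross_sum(xs, ys):
    """Sum of cross-products of deviations from the means (S_xy)."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def eta_squared_change(base_abst, post_abst, base_smok, post_smok):
    """Eta-squared for the group difference in change scores (post - baseline)."""
    change_a = [p - b for b, p in zip(base_abst, post_abst)]
    change_s = [p - b for b, p in zip(base_smok, post_smok)]
    pooled = change_a + change_s
    grand = fmean(pooled)
    ss_total = sum((c - grand) ** 2 for c in pooled)
    ss_between = (len(change_a) * (fmean(change_a) - grand) ** 2
                  + len(change_s) * (fmean(change_s) - grand) ** 2)
    return ss_between / ss_total


def partial_eta_squared_ancova(base_abst, post_abst, base_smok, post_smok):
    """Partial eta-squared for group in: post ~ baseline + group (common slope)."""
    # Reduced model: post regressed on baseline only, ignoring group.
    xs = list(base_abst) + list(base_smok)
    ys = list(post_abst) + list(post_smok)
    ss_res_reduced = _cross_sum(ys, ys) - _cross_sum(xs, ys) ** 2 / _cross_sum(xs, xs)
    # Full model: one intercept per group plus a pooled within-group slope.
    sxx_w = _cross_sum(base_abst, base_abst) + _cross_sum(base_smok, base_smok)
    sxy_w = _cross_sum(base_abst, post_abst) + _cross_sum(base_smok, post_smok)
    syy_w = _cross_sum(post_abst, post_abst) + _cross_sum(post_smok, post_smok)
    ss_res_full = syy_w - sxy_w ** 2 / sxx_w
    # SS for group = drop in residual SS; partial eta^2 = SS_group / (SS_group + SS_error).
    return (ss_res_reduced - ss_res_full) / ss_res_reduced
```

On a tiny hypothetical data set (abstinent baseline 1, 2, 3 and 24-h scores 3, 5, 5; smoking baseline 1, 2, 3 and 24-h scores 2, 2, 3), the change method gives 9/11 (about 0.82) and the covariance method 72/85 (about 0.85), illustrating how the covariate can absorb variance that change scores leave in the error term.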
In addition, we were interested in how well withdrawal and craving scores from each measure would correlate with dependence as measured by the FTND. This could be construed as a measure of ‘construct validity’. It may be noted, however, that ‘dependence’ and withdrawal symptoms are not the same thing, so there are many reasons why this correlation would be expected to be moderate at best. The interest here is in the size of the correlation for each withdrawal measure relative to the other measures. For this analysis, we selected just the abstinent group and examined the correlation between FTND scores and (a) change in total withdrawal and craving scores from baseline to abstinence and (b) post-abstinence withdrawal and craving scores with baseline scores used as a covariate. On the basis of earlier work (West and Hajek 2004), we expected the covariance approach to yield higher correlations.
In many studies of withdrawal symptoms there is no control group that continues smoking. It is important to be confident that a particular scale is stable under conditions of continuing smoking. We assessed this by examining whether mean scores changed over 24 h in the group that continued smoking. This was assessed statistically by a paired t test of the baseline vs 24-h scores in those continuing smoking. We also examined the correlation between baseline and 24-h values in that group; this can be construed as ‘retest reliability’. As with construct validity, it is the comparison across the different measures that is important because there are extraneous sources of variation in the actual items measured that cannot be controlled.
We calculated that the size of the study sample would provide more than 80% power to detect changes in individual symptom scores as measured by any of the scales on the basis of previous studies with those scales. It was not possible to predict whether, or by what degree, one scale would be more sensitive to abstinence as compared with others nor how much they would differ on other parameters being assessed.
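A power calculation of the kind described can be sketched with a normal approximation for a two-sided comparison of mean change between two independent groups, with the effect expressed as a standardized difference d. The authors do not report the effect sizes they assumed, so this does not reproduce their calculation; the function names and the approximation itself are assumptions.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, SD 1


def approx_power(d, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided two-group test for effect size d."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = abs(d) * sqrt(n1 * n2 / (n1 + n2))
    # The far tail (rejecting in the wrong direction) is ignored as negligible.
    return _STD_NORMAL.cdf(noncentrality - z_crit)


def min_detectable_d(n1, n2, power=0.8, alpha=0.05):
    """Smallest d reaching the requested power under the same approximation."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (z_crit + _STD_NORMAL.inv_cdf(power)) / sqrt(n1 * n2 / (n1 + n2))


if __name__ == "__main__":
    # Group sizes from this study: 46 abstinent, 37 continuing to smoke.
    print(round(min_detectable_d(46, 37), 2))
```

For these group sizes the approximation gives 80% power for a standardized difference in change scores of roughly d = 0.62; an exact calculation based on the t distribution gives a slightly larger value.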
Table 1 shows the baseline characteristics of the two groups. There were no significant differences between the groups. Nor were there any significant differences at baseline between the groups for any of the withdrawal measures.
Mean (SD) withdrawal symptom scores at baseline and after 24 h, with eta-squared for the effect of abstinence by the change method and by the covariance method; rows include total withdrawal score (excluding craving) and total withdrawal score (including craving).
With regard to stability, scores on several scales tended to decline significantly over 24 h of continued smoking (Table 2); most notably, the SS showed a decline in four symptoms. One scale (the MPSS) showed no decline on any symptom, and two scales (the SS and WSWS) showed declines in total withdrawal score excluding craving.
Retest reliability of scales: correlations between baseline and 24-h scores in the continuing smoking group; rows include total score (excluding craving).
Correlations of withdrawal scores and craving scores with dependence (FTND) in the abstinent group, each computed as change from baseline and with baseline as covariate; rows include total score (excluding craving) and total score (including craving).
When assessing total withdrawal discomfort and craving, all the scales showed a high degree of sensitivity to abstinence and good reliability. Construct validity was good only when post-abstinence scores were analysed with baseline scores as a covariate, not when scores were calculated as changes from baseline. When assessing individual symptoms, all the scales that measured them showed high sensitivity to changes in irritability, restlessness, poor concentration and craving. Other symptoms were assessed less robustly by some scales. The MNWS and CWS were less sensitive to depression, the WSWS was less sensitive to insomnia, and the MPSS was less sensitive to hunger and anxiety. Reliability was generally moderate or good for individual symptoms, although less so for the MNWS and the MPSS than for the other scales. Scores on the SS, and to a lesser extent on other scales, tended to decline after 24 h of continued smoking. Sensitivity was consistently greater with the covariance method than with change scores.
It seems that, for total withdrawal discomfort and craving, all of the scales can be recommended. For individual symptoms, the choice depends on which symptoms are most important for the study in question. One guiding principle could be to choose the scale that most efficiently assesses the effect of abstinence on any given symptom. Under that criterion, irritability could readily be measured with a single item using the MPSS; depression would best be measured using the four items of the WSWS; hunger could be measured by either the SS or the CWS (although the latter combines hunger with weight gain, which would only be relevant after a more extended period of abstinence); poor concentration and restlessness could be measured by single items using the MNWS or MPSS; insomnia would best be measured using the MPSS or the CWS; anxiety could be measured by the CWS; and craving by the MPSS. Note that there is a craving measure associated with the SS that we did not test in this study.
The fact that anxiety reliably increased after 24 h of abstinence supports the view that it is a genuine withdrawal symptom, and conflicts with our earlier finding (West and Hajek 2004). One possible explanation is that smokers who abstain only briefly (e.g., for 24 h) for the purposes of a study, and who are not trying to stop permanently, do not become more anxious just before abstinence. The decrease observed in other studies may then have been a normalisation in smokers who had been anxious about their quit attempt and were discovering that it was succeeding. This issue merits further study.
Major strengths of the present study were minimal dropout, the inclusion of a continued-smoking group, and the fact that the abstinent group received no supportive intervention, such as nicotine replacement or psychological support, that might bias the results. Participants were not smokers wanting to stop, so the results should not have been affected by issues such as worries about future relapse or feelings of achievement. However, this study was limited in several respects. First, it covered only 24 h of abstinence. While most symptoms peak around this time, hunger probably does not (West et al. 1989a,b). Nevertheless, most of the scales were able to detect increases in hunger over this time scale. Second, it did not include symptoms outside DSM-IV. Symptoms such as constipation, cough/cold symptoms and mouth ulcers have recently been identified and will probably need to be included in any DSM revision. These symptoms probably take longer than 24 h to develop, so a future study should include a longer time interval. Third, our comparisons between scales were based purely on differences in eta-squared values, not on statistical comparisons between them, so the replicability of these differences remains to be seen.
Using baseline scores as a covariate, rather than calculating change scores, increased sensitivity. This suggests that where maximum sensitivity is needed (for example, when comparing the effects of interventions on withdrawal symptoms, as opposed to describing the phenomenon in a particular sample), the covariance method is preferable.
The development of a comprehensive, sensitive and valid questionnaire to assess withdrawal symptoms merits further consideration. A questionnaire that combined the best elements of all the current scales would be a useful step forward in what is an increasingly important area of research. We would also suggest that questionnaires include the items not covered in DSM-IV such as constipation so that these can be characterised more fully.