Introduction

Blepharospasm, literally “spasm of the eyelid”, is a focal dystonia characterized by involuntary eyelid closure (Hallett 2002). Although blepharospasm-like symptoms can occur secondarily to certain neurologic or ophthalmic disorders or lesions, the etiology is often unknown; in that case it is referred to as idiopathic, essential, or primary blepharospasm (Hallett 2002). Blepharospasm can range in severity from mild cases that do not appreciably interfere with daily activities to severe cases that render individuals functionally blind, preventing them from driving, working, reading, and walking.

Botulinum neurotoxin (BoNT) injections are a primary symptomatic treatment for blepharospasm. BoNTs act by inhibiting the release of acetylcholine at the neuromuscular junction, which reduces excessive muscular contractions and helps normalize muscle activity. The effects of BoNTs are reversible and temporary, lasting approximately 3 months in the treatment of blepharospasm (Jankovic and Orman 1987; Nussgens and Roggenkämper 1995).

A number of different measurement tools and scales have been used to evaluate the effects of BoNT on various aspects of blepharospasm, including force of eyelid closure, severity of muscle spasms, and patient functional status (Cohen et al. 1986; Elston and Russell 1985; Scott et al. 1985). Today, the rating instruments have coalesced into several main clinical scales, including the Jankovic Rating Scale (JRS) and Blepharospasm Disability Index (BSDI). Historically, duration of the beneficial effect has been used as a stand-alone measure of BoNT efficacy (Nussgens and Roggenkämper 1997), but today it is typically considered complementary to existing efficacy scales.

In any disorder, precise and meaningful measures of improvement or deterioration in signs and/or symptoms are important in order to evaluate the patient’s disease state and the effects of the therapy. This review examines the different methods of blepharospasm assessment, beginning with a historical overview of early measures and proceeding to a discussion of current scales and their use in BoNT clinical studies. The paper concludes with a discussion and suggestions of how best to assess the efficacy of blepharospasm therapy.

Historical overview of blepharospasm measurement

Force of eyelid closure

Early studies of BoNT injections examined force of eyelid closure in grams using an external device that prevents downward movement of the upper eyelid [e.g., (Scott et al. 1985)]. However, force of eyelid closure is not widely used as an outcome measure today, as treatments for blepharospasm would ideally resolve muscle spasms while leaving eyelid force normal. With the use of BoNT, paralysis of the orbicularis muscle is not necessary for optimal benefit. In this sense, we do not agree with the scale developed by Rahman et al. (2003) in which orbicularis function is assessed as the ability of observers to manually open the eyelids of patients who are attempting to close their eyelids forcefully.

Muscle spasm

Several different muscle spasm scales were reported in the 1980s. Tsoy et al. (1985) used a scale that ranged from 0 = absent to 4 = extreme to rate the intensity of orbicularis oculi muscle spasm, as well as brow spasm. A scale used by Cohen et al. (1986) combined spasm intensity and functional ratings that ranged from 0 = none, 1 = increased blinking caused by external stimuli, 2 = mild fluttering of the lids, not incapacitating, 3 = moderate spasm with mild incapacitation, and 4 = severe spasm resulting in incapacitation (unable to drive, read, etc.).

Fahn scale

The Fahn rating scale for blepharospasm, developed in the 1980s, consists of a movement subscale and a disability subscale (Fahn 1985). The movement subscale incorporates the location of involuntary movements (one point for each muscle area), as well as one point for each external factor that influences the dystonia (sunlight, television, etc.) (Fahn 1985). The frequency of involuntary movements is rated according to the percentage of waking time that movements are present (five different frequency categories), and severity of eyelid closure is rated from 0 = absent to 4 = severe, forceful eyelid contractions. The disability subscale consists of seven everyday activities (driving, reading, television, movies, shopping, walking, housework/job) for which the degree of impairment is rated according to descriptors that increase in severity from 1 to 3 or 1 to 5 (Fahn 1985). An eighth item documents the need to wear sunglasses outdoors (1 point) and indoors (1 point).

Elston functional status scales

In the 1980s and early 1990s, Elston and colleagues developed a series of scales that focused on the functional status of blepharospasm patients following BoNT injections. The first of these scales ranged from 1 = functionally blind (eyes shut for >80% of day) to 3 = inconvenienced (eyes shut for 10–30% of day) (Elston and Russell 1985). In a subsequent study, Elston and colleagues asked patients to evaluate the percentage of the day spent functionally blind before and after BoNT injections (Grandas et al. 1988). A third scale by this group ranged from 1 = blind, 2 = dependent outside home, 3 = independent, poor function, 4 = independent, moderate function, 5 = inconvenienced, to 6 = normal (Elston 1992).

Current blepharospasm scales and their use in BoNT trials

Current blepharospasm rating scales are largely modified versions of previous scales that have been altered for ease of use or to remedy one or more inadequacies of previous scales. Current scales fall into three general categories: (i) clinical scales, (ii) activities of daily living/functional ability status scales, and (iii) global rating scales. Each of these scale types has been used in recent studies comparing the effects of different BoNTs in blepharospasm. In this section, we briefly review the three general types of scales and, for each type, consider the benefits and drawbacks based on its use in recent BoNT trials.

Overview of clinical scales

Clinical scales focus on clinical signs and/or symptoms and are rated by observers as opposed to patients. These scales often take the form of ordinal, numeric ratings anchored by descriptors. Blepharospasm clinical rating scales in use today are modifications of the frequency and/or severity scales developed by Fahn (1985).

The Jankovic Rating Scale (JRS) is probably the most widely used current clinical scale (Fig. 1) (Jankovic and Orman 1987). The two subscales that make up the JRS—severity and frequency—are 5-point scales ranging from 0 to 4, where 0 indicates no symptoms and 4 indicates the most severe or frequent symptoms. As can be seen from the descriptions in Fig. 1, the JRS primarily focuses on the objective signs of blepharospasm but does incorporate some subjective symptoms such as whether the increased blinking and spasms are incapacitating, as judged by the observer.

Fig. 1 Jankovic Rating Scale (Jankovic and Orman 1987)

Advantages of the JRS include its relative simplicity and broad applicability for both patients and physicians. The evaluations required for the scale are readily performed in an office setting without the need for complex equipment or scoring procedures. Disadvantages of the JRS may include a lack of sensitivity to small changes in blepharospasm severity or frequency, particularly at the mild end of the spectrum where patients must change from increased blinking in response to external stimuli (a score of “1”) to “none” in order for an improvement in their condition to be documented on the severity scale. Additionally, the scale does not take into account how the patient’s blepharospasm affects his or her daily activities. It may be noted that a JRS sum score of 1 is not possible, as even mild symptoms (severity score = 1) occurring at the lowest frequency (frequency = 1) give a sum score of 2.
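The arithmetic behind the last point can be checked with a short sketch. It assumes, as an illustrative reading of the scale, that a symptom-free patient scores 0 on both subscales and that any symptomatic patient scores at least 1 on both; the function name is hypothetical and not part of the JRS itself.

```python
from itertools import product


def reachable_jrs_sum_scores():
    """Enumerate JRS sum scores (severity + frequency, each rated 0-4).

    Assumption for illustration only: symptoms are either absent on both
    subscales (0 and 0) or present on both (each subscale >= 1).
    """
    sums = {0}  # no symptoms: severity 0 and frequency 0
    for severity, frequency in product(range(1, 5), repeat=2):
        sums.add(severity + frequency)
    return sorted(sums)


print(reachable_jrs_sum_scores())  # -> [0, 2, 3, 4, 5, 6, 7, 8]
```

Under this assumption the sum score jumps directly from 0 to 2, which is one way of seeing why improvement at the mild end of the scale is hard to document.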

Table 1 lists published clinical studies that were randomized or enrolled at least 50 patients and were designed to evaluate the effects of BoNTs for blepharospasm; as can be seen in this table, a number of these studies used the JRS. It should be noted that various studies have used different versions of the same scale. For example, Mezaki et al. (1999) used a scale modified from the JRS blepharospasm frequency scale.

Table 1 Studies of BoNT treatment of essential blepharospasm (limited to controlled studies of any size and open-label studies with ≥50 patients)

Jankovic Rating Scale in 2006 BoNTA non-inferiority trial (Roggenkämper et al. 2006)

The JRS was used as the primary outcome measure in a randomized, controlled, non-inferiority trial comparing Xeomin® (Merz) to BOTOX® (Allergan) published in 2006 (Roggenkämper et al. 2006). This 16-week study included 300 subjects with blepharospasm (n = 148/152 per group) who had achieved a stable response to two previous injections of BOTOX®. Subjects received a single treatment with one or the other medication based on the doses and injection sites of their previous BoNTA treatments (mean doses per eye: 19.8 units Xeomin®; 20.4 units BOTOX®).

In this study, both BoNTA products significantly reduced total JRS scores versus baseline at 3 weeks and ~3.6 months. Mean baseline scores in the two groups were comparable (5.3 Xeomin®, 5.4 BOTOX®), as were adjusted mean improvements at 3 weeks (−2.83 Xeomin®, −2.65 BOTOX®) and ~3.6 months (−0.84 Xeomin®, −0.66 BOTOX®). In this study, the JRS was sensitive enough to detect post-baseline improvements with both BoNT products; this fits with the initial use of the scale, which was to evaluate the effects of BoNTA versus placebo (Jankovic and Orman 1987).

However, the JRS may not be sensitive enough to detect differences between two BoNT products, both of which appear to be effective for the treatment of blepharospasm, and indeed it was not designed to do so. In the non-inferiority study, a clinically irrelevant difference in the JRS sum score was defined as 0.8 points (Roggenkämper et al. 2006). Because the JRS is rated in whole numbers, a substantial portion of subjects would have needed to rate one or the other BoNTA as approximately 25% lower on either the severity or frequency subscale to achieve a mean difference of at least 0.8 (e.g., a 1-point change on a 5-point subscale is approximately a 25% improvement, recognizing that the psychological distance may not be equivalent between ordinal numbers on the scale). This may be somewhat unlikely with two products that are both effective for the treatment of blepharospasm.

Given these considerations with the use of the JRS (and, as discussed later, other current scales) in studies that compare different BoNTs, there is likely a bias toward finding no meaningful differences between groups. Additionally, as previously noted, it may also be more difficult to detect improvements on the JRS in mild cases of blepharospasm for which increased blinking would need to be completely absent to change a rating from 1 to 0. Finally, as a clinical scale evaluating two dimensions of blepharospasm (severity and frequency), nuances in the effects of two different BoNTAs, for instance, in quality of life measures or other more subjective criteria would not be detected.

Overview of activities of daily living/functional ability status scales

In contrast to clinical scales, instruments that assess activities of daily living or patient functional status are rated by the patients themselves. These scales recognize the importance of improvement in everyday activities as an outcome of therapy (Lindeboom et al. 1995). Here we consider functional ability scales specific to blepharospasm as opposed to general scales such as the Medical Outcomes Study 36-item Short Form (SF-36), which assesses multiple health-related domains (physical, social, role limitations, bodily pain, mental health, vitality, general health) that are not specific to blepharospasm and may or may not be expected to improve with blepharospasm treatment.

In the 1980s and 1990s, the Blepharospasm Disability Scale (BDS) emerged as a useful functional ability rating scale. The BDS is an eight-item subsection of the Blepharospasm Rating Scale developed by Fahn (1985). Despite the documented reliability and validity of this scale (Lindeboom et al. 1995), it has certain drawbacks, including the lack of a “non-applicable” option for any of the individual items.

This led to the development of the Blepharospasm Disability Index (BSDI), which has been used in several recent BoNT studies (Table 1). The BSDI consists of six daily activities, each rated on a scale from 0 = no impairment to 4 = not possible due to disease, and also includes a “not applicable” option (Fig. 2) (Goertelmeyer et al. 2002). Advantages of the BSDI include its focus on daily activities and ease of use. The scoring system is also relatively simple: mean item scores on the BSDI are calculated by dividing the sum score by the number of applicable items.
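The scoring rule just described can be sketched in a few lines. This is a minimal illustration, not an official scoring implementation: the function name, the use of `None` to encode a “not applicable” answer, and the example ratings are all assumptions made here.

```python
def bsdi_mean_item_score(ratings):
    """Mean BSDI item score: sum of ratings / number of applicable items.

    `ratings` holds one entry per BSDI item: an integer 0-4, or None
    when the patient marked the item "not applicable" (encoding assumed).
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("at least one item must be applicable")
    if any(not 0 <= r <= 4 for r in applicable):
        raise ValueError("item ratings must lie between 0 and 4")
    return sum(applicable) / len(applicable)


# Hypothetical patient: six items, one of them not applicable.
print(bsdi_mean_item_score([2, 1, None, 3, 0, 2]))  # -> 1.6
```

Dividing by the number of applicable items rather than by six keeps a “not applicable” answer from being counted as “no impairment”.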

Fig. 2 Blepharospasm Disability Index (Roggenkämper et al. 2006); scale originally described in Goertelmeyer et al. 2002

The Craniocervical Dystonia Questionnaire (CDQ) is a dystonia-specific quality of life instrument consisting of 24 questions that make up five subscales: stigma, emotional wellbeing, pain, activities of daily living, and social/family life (Müller et al. 2004). Each question is answered as “never, occasionally, sometimes, often, or always.” An advantage of this scale is that it includes questions on social and emotional aspects of dystonia as opposed to focusing solely on limitations to activities. However, the sensitivity of this scale in comparing treatments may be limited by the five-choice options.

BDS in 2008 Dysport® (Ipsen) trial

A modified version of the BDS was used as the primary outcome measure in a randomized, controlled study of Dysport® versus placebo for the treatment of blepharospasm (Truong et al. 2008). This 16-week study included 120 subjects with blepharospasm who had a minimum score of 8 out of 26 possible points on the BDS. Subjects were assigned to receive a single treatment with placebo or one of three Dysport® doses: 40, 80, or 120 units/eye.

In this study, the BDS was modified to exclude questions that were not relevant for individual patients and include an additional rating of 0 (zero) if an item was not affected by blepharospasm. BDS scores were calculated as the percentage of normal activity, defined as “total points scored divided by the maximum possible individual score, multiplied by 90 and subtracted from 90% (i.e., final score = 90% − 90 [score/maximum possible])” (Truong et al. 2008) based on the scoring system specified by Fahn (1985). BDS outcomes were presented as differences in median percentage of normal activity between the BoNTA and placebo-treated groups.
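As a worked illustration of the quoted formula, the sketch below computes the percentage of normal activity from a raw BDS score. The function name and the example values are hypothetical; the maximum is passed in explicitly because, in the modified scale, it depends on which items are relevant for the individual patient.

```python
def bds_percent_normal(score, max_possible):
    """Percentage of normal activity: 90 - 90 * (score / max_possible).

    Follows the formula as quoted from Truong et al. (2008); under it,
    the result ranges from 0 (maximum disability) to 90 (score of 0).
    """
    if max_possible <= 0:
        raise ValueError("max_possible must be positive")
    if not 0 <= score <= max_possible:
        raise ValueError("score must lie between 0 and max_possible")
    return 90.0 - 90.0 * (score / max_possible)


# Hypothetical patient at the inclusion threshold: 8 of 26 possible points.
print(round(bds_percent_normal(8, 26), 1))  # -> 62.3
```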

Neither mean nor median baseline BDS scores were presented in the article, but all subjects must have had a minimum score of 8 to be included in the study. All of the BoNTA doses tested significantly increased the percentage of normal activity on the BDS over placebo at weeks 4, 8, and 12, with the two highest doses also showing significant improvements over placebo at week 16. Thus, as modified in this study, the BDS was able to detect differences between BoNTA and placebo.

BSDI in 2006 BoNTA non-inferiority comparison trial

Although the JRS was the primary outcome measure in the randomized, non-inferiority trial described in a previous section, this study also included the BSDI as a secondary outcome measure (Roggenkämper et al. 2006). Mean item scores on the BSDI were calculated at 21 and 109–112 days post-treatment. As mentioned above, mean item scores on the BSDI are calculated by dividing the sum score by the number of applicable items.

In this study, both BoNTA products significantly reduced mean item scores on the BSDI versus baseline at the two follow-up time points. Mean item scores on the BSDI at baseline were comparable in the two groups (1.67 BOTOX®, 1.60 Xeomin®), and mean improvements at 21 days (−0.83 BOTOX®, −0.82 Xeomin®) and 109–112 days (−0.22 BOTOX®, −0.36 Xeomin®) were not significantly different at the P < 0.05 level.

BSDI in 2010 BoNTA preliminary comparison trial

Total BSDI scores were used as the primary outcome measure in a randomized, controlled trial comparing Xeomin® to BOTOX® (Wabbels et al. 2010). Total BSDI scores are calculated as the sum of scores on all of the BSDI items. This 14-week study included 65 subjects with blepharospasm (n = 32/33 per group) who had received at least one treatment with BOTOX® ≥20 units/eye and required another treatment with the same dose. Subjects were randomized to receive a single treatment with one or the other medication based on the doses and injection sites of their previous BoNTA treatment (mean doses 29 units BOTOX®, 27 units Xeomin®).

In this study, both BoNTA products significantly reduced total BSDI scores versus baseline at 4 and 8 weeks. Mean total baseline scores in the two groups were comparable (7.9 BOTOX®, 8.3 Xeomin®), and mean improvements in total BSDI scores at 4 weeks (−2.8 BOTOX®, −1.3 Xeomin®) and 8 weeks (−1.3 BOTOX®, −0.8 Xeomin®) were not significantly different at the P < 0.05 level. Mean item scores on the BSDI were also comparable at baseline (1.39 BOTOX®, 1.44 Xeomin®), and improvements in mean item scores on the BSDI at 4 weeks (−0.42 BOTOX®, −0.21 Xeomin®) were not significantly different at the P < 0.05 level.

Again, it seems possible that this scale may not be sensitive enough to detect differences between two BoNT products. The BSDI shares the same sensitivity issues as the JRS in that the ratings for each category range from 0 = no impairment to 4 = not possible due to disease. As with the JRS, the BSDI ratings consist of whole numbers, and subjects would need to rate one or the other BoNTA as approximately 25% lower on one of the items in order to detect any difference between them. Again, this assumes that a 1-point improvement on a 5-point scale constitutes an approximately 25% improvement, recognizing that the psychological distance may not be equivalent between ordinal numbers on the scale. Like the JRS, the BSDI is a modified version of a scale that was originally developed to determine whether BoNT was more effective than placebo in the treatment of focal dystonia (Lindeboom et al. 1995).

Application of the “clinically meaningful” improvement criteria

Jankovic et al. (2009) recently analyzed the metric properties of the JRS and BSDI in blepharospasm patients by evaluating the relationship between various clinical outcome measures. These authors concluded that a 0.7-point improvement in BSDI mean item score and a 2-point improvement in the JRS sum score constituted clinically relevant improvements (Jankovic et al. 2009).

It is important to point out that these criteria are only relevant for patients whose baseline scores are >2 on the JRS and >0.7 on the BSDI (calculated as the mean of all BSDI item scores that are relevant for a given patient). A case in point is the BoNTA comparison study described in the preceding section (Wabbels et al. 2010). In this study, only 19 of the 31 subjects (61%) in the BOTOX® group and 24 of 33 subjects (73%) in the Xeomin® group had high enough baseline BSDI scores to be included in a responder analysis based on the Jankovic criterion (Wabbels et al. 2010).

Caution must therefore be used when applying the Jankovic criteria to overall study population means; only those whose baseline scores exceed the improvement criteria can legitimately be subject to them.
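The restriction described above can be made concrete with a small sketch. It encodes the two criteria from Jankovic et al. (2009) as constants and, following the wording in this section, treats a patient as eligible for a responder analysis only when the baseline score exceeds the criterion. The function names and the example scores are hypothetical.

```python
JRS_MIN_IMPROVEMENT = 2.0   # points on the JRS sum score
BSDI_MIN_IMPROVEMENT = 0.7  # points on the BSDI mean item score


def eligible_for_responder_analysis(baseline, min_improvement):
    """A baseline at or below the criterion is excluded, as stated in the text."""
    return baseline > min_improvement


def is_responder(baseline, follow_up, min_improvement):
    """Return None when the criterion cannot be applied to this patient."""
    if not eligible_for_responder_analysis(baseline, min_improvement):
        return None
    # Lower scores are better, so improvement is baseline minus follow-up.
    return (baseline - follow_up) >= min_improvement


# Hypothetical BSDI mean item scores (baseline, follow-up).
print(is_responder(1.5, 0.5, BSDI_MIN_IMPROVEMENT))  # -> True
print(is_responder(1.5, 1.0, BSDI_MIN_IMPROVEMENT))  # -> False
print(is_responder(0.5, 0.0, BSDI_MIN_IMPROVEMENT))  # -> None
```

Returning `None` instead of `False` for ineligible patients keeps them out of the responder denominator rather than counting them as non-responders.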

Global rating scales

Global rating scales are usually general rather than disease specific, designed to capture the overall subjective effects of treatment. These scales can be physician or patient rated, and are usually used as secondary instead of primary outcome measures.

One global rating scale that has often been used in BoNT trials is a modification of the one developed by Brin et al. (1995). On this scale, improvement or worsening from baseline is rated from −4 (marked worsening in symptoms and function) to 0 (no effect) to +4 (marked improvement in symptoms and function). This scale has been used in a number of BoNTA trials in several different medical conditions (Naumann and Lowe 2001; Roggenkämper et al. 2006; Simpson et al. 1996).

A percentage of normal function scale developed by Brin et al. (1994; 1995) has been used in other BoNT studies (Wissel et al. 2000). On this scale, patients are asked to rate the function of the body part treated from 0% (fully disabled with no functional activity) to 100% (normal function).

An advantage of patient-rated global scales is that they permit assessment and quantification of improvements in symptoms or functions that are important to patients. Because these scales are non-specific, they may consider more than one aspect of a treatment’s effects and theoretically represent a global judgment of how well the treatment works. Additionally, global assessment scales that are anchored by only a few rating descriptions such as mild, moderate, and marked essentially have built-in clinical relevance in that improvement from marked to moderate or moderate to mild appears to be clinically meaningful. However, the inclusion of so few ratings renders the scale less sensitive than larger scales. Another drawback to global scales is their subjectivity, with ratings possibly influenced by psychological variables such as mental state/mood and expectations. Patients are also typically asked to compare their condition to baseline, which may be difficult to remember.

Global assessments in BoNT comparison trials

A 9-point global assessment scale (−4 to 0 to +4) described in the preceding section was used in both the non-inferiority (Roggenkämper et al. 2006) and preliminary comparison (Wabbels et al. 2010) trials evaluating the effects of BOTOX® versus those of Xeomin®. In both studies, mean improvements with the BoNTA products were generally between +2 and +3, supporting a global effect on blepharospasm. However, statistically significant differences between the BoNTA products on this measure were not detected in either trial. As with the previously described JRS and BSDI measures, it may not be possible to detect subtle differences between two BoNTAs due to the insensitivity of the scales (only four possible ratings for improvement).

Considerations/conclusions

There are several important considerations with blepharospasm scales beyond the need for reliable and valid measurement of the condition. First, it must be recognized that blepharospasm can be challenging to assess because symptoms may vary depending on time of day, patient stress level, lack of sleep, situation (e.g., home vs. examiner’s office), and environmental stimuli. Thus, blepharospasm measurement is hindered by the variable nature of the complaints. To address this, patients should be evaluated in the same location and at the same time of day, recognizing that it will not be possible to control all of the non-medication related factors that may influence blepharospasm severity during rating.

A second consideration is the lack of precision and objectivity of current measures. For conditions such as high blood pressure and diabetes, precise and sensitive measurements can be performed using manual or electronic devices. Historically, blepharospasm has been evaluated using an instrument to measure eyelid force; however, as noted previously, this is not an optimal measure for the effects of BoNTs because the therapeutic goal is elimination of spasms while maintaining normal force of eyelid closure. The development of alternate devices for the measurement of blepharospasm could alleviate the subjectivity issue. Of note, several studies have evaluated the use of video nystagmography as a measurement tool for blinking in blepharospasm before and after botulinum toxin treatment (Casse et al. 2008, 2009). The objectivity of this method is advantageous, although further studies are needed to assess its validity.

A third consideration is the lack of sensitivity of current rating scales. Clearly, more sensitive scales are needed if we are to accurately compare the effects of different BoNTs in the treatment of blepharospasm. Current scales, which were initially used to determine whether BoNTs were superior to placebo, are probably biased against finding differences due to their lack of sensitivity. To address this, scales should incorporate a broader range of ratings. One global assessment scale that does include a broad range is the percentage of normal function (Brin et al. 1995). Perhaps the utility of this scale could be improved by including descriptive anchors such as 50% = half of normal function, 90% = almost normal function, etc. In this vein, it is interesting to note that some of our patients have developed similar scales for themselves (e.g., diaries listing percentage of relief). The sensitivity limitations with existing scales may be in part overcome by the use of visual analogue scales or percentage function scales in which the patient may be asked, for example, to rate the extent to which their disease limits specific activities (e.g., reading, watching television, working, participating in social activities). However, diaries would be particularly important in these analyses because it may be hard for patients to accurately remember the extent of their limitations prior to treatment.

A related consideration is whether the scales are sensitive enough for patients with mild disability or whether the scales need to be adapted for this population, who mainly have discomfort or psychosocial problems due to the visibility of the blepharospasm but are not functionally restricted. For example, in the study that used the BSDI as the primary outcome measure, the mean scores on individual BSDI items before injection were only 1.39 and 1.44 for the two groups, indicating only mild impairment (Wabbels et al. 2010). In such cases, the BSDI would not detect improvement unless subjects obtained a rating of 0, indicating no impairment. It may be possible to address this issue by integrating several items that pertain to the psychosocial or aesthetic aspects of blepharospasm into existing scales.

Another related consideration is whether current scales measure what is important to patients. The BSDI attempts to do this by including activities of daily living; however, it is unclear whether these represent the aspects of blepharospasm that are most important to patients. We have observed that some patients return for their next BoNTA injection when their scores on existing scales are “0” (zero). Although this could be a function of patients seeking re-treatment before the effects of BoNT wear off, it may represent the inadequacy of current scales in measuring what is important to these patients. For instance, more mildly affected patients often note that it is the reaction of others to their eye spasms that is the most bothersome aspect of their condition. This is not captured on current scales, with the exception of the dystonia-specific quality of life scale, CDQ-24, which includes a stigma subscale (Müller et al. 2004). Of course, no scale can capture every aspect of the condition that is important to all patients. However, it may be possible to develop a scale that permits patients to select the top 2 or 3 most bothersome aspects of their condition and rate “percentage of normal” on that variable following treatment. This would represent an adaptation of the Disability Assessment Scale that has been used to evaluate the effects of BoNTs in focal spasticity (Brashear et al. 2002).

It may also be noted that there is more than one type of blepharospasm. There is a “typical” blepharospasm, which may be orbital, palpebral or mixed and may present as either tonic or phasic. There is also a levator inhibition subtype in which the major problem is opening the lid. There may also be patients who present with a hybrid type of blepharospasm that includes components of typical blepharospasm and levator inhibition (Aramideh et al. 1994). True apraxia of the eyelid (not to be confused with levator inhibition) is probably rare; by definition this is not a dystonia and therefore it is not treated with BoNTA. Existing scales do not address differences between subtypes of blepharospasm.

Another consideration in the measurement of BoNT effects is duration. Although duration has frequently been used as a measure of BoNT efficacy, it is imprecise and definitions have varied or, in some cases, have not been specified. For instance, total amelioration of symptoms for 4 weeks is not the same as a reduction in symptoms for 4 weeks (Nussgens and Roggenkämper 1997). Another challenge with measuring duration is that patients often return for re-injection of BoNTA before the effects of their previous treatment have completely worn off. For instance, patients may know that symptoms recur after 13 weeks, so they get injections every 12 weeks. Nevertheless, information about BoNT action over time is important in assessing the real effect of the medication over time and is important to include in clinical studies.

Also critical in comparing different therapies for blepharospasm, or any condition, is the clinical trial design. The study must be adequately powered and the variability controlled to permit detection of expected differences (i.e., avoidance of a type II or beta error). This may involve ensuring that enough subjects are enrolled in the study and designing the inclusion/exclusion criteria to reduce heterogeneity in the study population. Crossover designs may be useful, provided that adequate washout periods are allowed between injections to preclude carryover effects. Administration of questionnaires prior to and following each treatment may also be useful for minimizing differences among subjects.

A final important point has less to do with the assessment scales than the doses of BoNTs. In studies that compare different BoNTs, the products are typically compared at set doses—usually corresponding to each patient’s maintenance dose. Although physicians vary in the doses administered, it is likely that these tend to be at or near the top of the dose–response curve for each patient. Thus, the products are compared at what may be the asymptotic portions of the dose–response curves. If no differences are found, some may interpret this as indicating that the different BoNT products produce the same effects at the same doses. However, this is not necessarily the correct interpretation. A hypothetical example may be one BoNT that produces a maximal effect at 1.25 U/site and another that produces a maximal effect at 2.0 U/site. If the products are compared at 2.5 U/site, the study may find no difference. However, if the BoNTs are compared across a range of doses, differences may become evident and such differences may be particularly evident at the lowest effective doses.
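The hypothetical example above can be illustrated numerically. The sketch below assumes, purely for illustration, a piecewise-linear dose-response curve that rises in proportion to dose and then plateaus at a maximal effect of 1.0; real dose-response curves for BoNTs are not established here, and the function name and curve shape are assumptions.

```python
def relative_effect(dose, saturating_dose):
    """Hypothetical saturating curve: linear up to saturating_dose, then flat.

    Returns the effect as a fraction of the maximal effect (0.0 to 1.0).
    """
    if dose < 0 or saturating_dose <= 0:
        raise ValueError("dose must be >= 0 and saturating_dose > 0")
    return min(dose / saturating_dose, 1.0)


PRODUCT_A_MAX_DOSE = 1.25  # U/site at which product A reaches maximal effect
PRODUCT_B_MAX_DOSE = 2.0   # U/site at which product B reaches maximal effect

for dose in (2.5, 1.0):
    a = relative_effect(dose, PRODUCT_A_MAX_DOSE)
    b = relative_effect(dose, PRODUCT_B_MAX_DOSE)
    print(dose, a, b)
# At 2.5 U/site both products sit on the plateau (1.0 and 1.0);
# at 1.0 U/site the assumed curves separate (0.8 versus 0.5).
```

Under these assumptions a comparison at 2.5 U/site shows no difference, whereas a comparison at a lower dose does, which is the point of the hypothetical example.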

Based on the aforementioned considerations, it seems unlikely that any single measure is adequate to evaluate the effects of treatment on blepharospasm. Objective measures may not be sensitive to features of the condition that are most important to patients, and subjective, patient-rated scales may depend too much on the patient’s memory, expectations, and psychological state at the time of rating. In the absence of a mechanical or electronic device that provides a valid measure of response to treatment, it seems logical to rely on a combination of objective and subjective rating scales. However, as described previously, current measures lack sensitivity and thus we should seek to amend these scales or develop new ones. In this pursuit, it will be important to distinguish clinically meaningful differences in improvement on the various measures, as Jankovic et al. (2009) have attempted to do for the JRS and BSDI. However, smaller differences on more objective or sensitive measures may also be important in the study of BoNTs, as these could indicate pharmacological or potency differences.

Our overall conclusion is that no single scale is broad enough to capture the entire essential blepharospasm experience; multiple scales are likely necessary to get a full picture of this disorder. Additionally, existing scales are not sensitive enough to differentiate the effects of different BoNTs in clinical trials and may not be sensitive enough for patients with mild blepharospasm. We recommend (i) development of more sensitive scales, including the pursuit of a valid biomechanical or electronic measurement device and (ii) development of a scale that incorporates symptoms that are known to be relevant to patients (e.g., selection of several symptoms for ratings).