Content overlap of 91 dystonia symptoms among the seven most commonly used cervical dystonia scales

Introduction Dystonia is a movement disorder characterized by sustained or intermittent muscle contractions. Cervical dystonia (CD) is the most common focal dystonia. There are several instruments assessing the symptoms of CD. However, different scales assess different features, which may lead to poor patient evaluation. Aim The aim of the study was to evaluate the degree of overlap of the most often used CD rating scales identified by the literature review. Methods A thorough search of the Medline database was conducted in September 2021. Then the frequency of each scale was calculated, and the 7 most common scales were included in the content overlap analysis using the Jaccard index (0 – no overlap, 1 – full overlap). Results Toronto Western Spasmodic Torticollis Rating Scale (TWSTRS), Tsui score, Burke-Fahn-Marsden Dystonia Rating Scale (BFMDRS), Cervical Dystonia Impact Profile 58 (CDIP-58), Craniocervical Dystonia Questionnaire 24 (CDQ-24), Cervical Dystonia Severity Scale (CDSS), Dystonia Discomfort Scale (DDS) and the Dystonia Non-Motor Symptoms Questionnaire (DNMSQuest) were the most common scales. 91 CD symptoms were distinguished from 134 items used in the scales. The mean overlap among all scales was 0.17. 52 (62%) symptoms were examined by more than one scale. The CDIP-58 captured the highest number of symptoms (63.0%), while the CDSS captured the lowest number (8.0%). None of the symptoms were examined by seven instruments. Conclusions There was a very weak overlap among scales. High inconsistency between the scales may lead to highly different dystonia severity assessment in clinical practice. Thus, the instruments should be combined.


Introduction
Dystonia is the third most common movement disorder [1,2]. It is characterized by sustained or intermittent muscle contractions causing abnormal movements or postures [3].
However, different scales focus on different features, which might result in a poor assessment of disease severity. This problem mainly applies to non-motor symptoms (NMS) of dystonia. For instance, not all of the scales evaluate NMS, as some of them were developed before the subject emerged [14]. In recent decades NMS have been the subject of extensive research; the most frequently studied symptoms are sleep disorders, anxiety and tiredness [15,16]. To date, there is no universal tool to assess them. Relying on one specific questionnaire therefore carries a risk of overlooking symptoms.
Eiko Fried (2017) proposed a methodology for evaluating different assessment tools in order to determine the level of overlap of symptoms across the questionnaires [17]. Such an analysis might give valuable information about the reproducibility and generalizability of findings made with one tool. When the overlap is poor, it might indicate that the symptoms assessed by one tool are idiosyncratic and cannot be found by another tool, putting in doubt the comparability of different studies. Moreover, this methodology has already been used in several different studies [18][19][20][21]. The aim of our study was to evaluate the degree of overlap of the most often used CD rating scales, which were identified by the literature review. Furthermore, we gave recommendations on which scales should be combined to give more specific information about dystonia severity in research.

Literature review
A thorough search of the Medline database was conducted by one researcher in September 2022. No language or date filters were applied. The query was as follows:

The inclusion criteria:
• assessing the symptoms of CD with the usage of a specific scale;
• patients ≥ 18 years old.
The exclusion criteria:
• type of article: meta-analyses, systematic review, literature review, case study;
• usage of scales which are not specific for dystonia assessment;
• non-cervical types of dystonia;
• studies developing a scale.
Overall, 196 studies that fulfilled the above-mentioned criteria were included. Figure 1 presents a flow diagram of the procedure of finding, excluding and selecting publications for further analyses. The frequency of each scale in the articles was calculated and the seven most common ones were included in the analysis. The decision to choose the seven most commonly used scales was based on the methodology of our previous study [18].

CD scales
The chosen scales were: 20-item TWSTRS [5]; 7-item Tsui scale [6]; 8-item BFMDRS (8 items because only the part considering neck and disability was analyzed) [7]; 58-item CDIP-58 [8]; 24-item CDQ-24 [9]; 3-item CDSS [10]; 14-item DNMSQuest [12]. Table 1 presents the internal consistency, interrater and test-retest reliability of the tools. Although the frequency of DDS [11] was higher than that of DNMSQuest (1.5% vs. 1.0%), we decided to include DNMSQuest instead of DDS. Firstly, DDS is a scale that assesses only the patient's experience related to the disease, rated from 0 (no complaints) to 100 (maximum subjective severity of the untreated condition). That is why it differs substantially from the other evaluated scales and thus cannot be compared with them. Secondly, DNMSQuest is a relatively new scale assessing NMS and it is not as well studied as other scales.

Content analysis
Based on the Fried (2017) methodology, we adjusted the number of items in each scale [17]. Items were combined if they analyzed the same symptom, in order to avoid biasing further analyses. The 8-item BFMDRS was reduced to 6 items. "Dressing", "hygiene" and "feeding" were combined as one item assessing "limited daily activities".
The 58-item CDIP-58 was reduced to 48 items. "Tension in the neck" and "tightness in the neck" were combined as one item assessing an unpleasant feeling of tension in the neck. "Aching in shoulders" and "pain in shoulders" were combined as one item assessing pain in shoulders. Similarly to BFMDRS, items such as "carrying light objects", "chores", "cooking", "getting tired doing light activities", "cleaning the house", "limits in type of work", "heavy chores", "carrying heavy objects" and "getting tired doing demanding activities" were combined as "limited daily activities". In our opinion, analysing the specific activities mentioned in each scale separately would bias the analysis, as those items evaluate a similar issue.
The 24-item CDQ-24 was adjusted to 23 items. "Feeling down or depressed" and "feeling sad" were combined as "feeling down".
On the other hand, in TWSTRS we added one item, "head movements", to make the comparison more reliable. The original TWSTRS evaluates head movements in detail but lacks a general category describing them. Therefore, the adjusted TWSTRS had 21 items.
After adjusting the scales, a maximum of 91 symptoms was identified, and these were used in the content analysis.
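The item-count adjustments described above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary names are hypothetical, and the net changes are derived from the merges listed in the text (merging k items into one removes k - 1 items).

```python
# Item counts before adjustment, as listed in the "CD scales" section.
original_items = {
    "TWSTRS": 20, "Tsui": 7, "BFMDRS": 8, "CDIP-58": 58,
    "CDQ-24": 24, "CDSS": 3, "DNMSQuest": 14,
}

# Net change per scale (hypothetical helper table derived from the text).
net_change = {
    "TWSTRS": +1,                               # general "head movements" item added
    "BFMDRS": -(3 - 1),                         # dressing, hygiene, feeding -> one item
    "CDIP-58": -((2 - 1) + (2 - 1) + (9 - 1)),  # two merged pairs and nine activity items
    "CDQ-24": -(2 - 1),                         # "feeling down or depressed" + "feeling sad"
}

adjusted_items = {
    name: count + net_change.get(name, 0)
    for name, count in original_items.items()
}

print(adjusted_items)
print(sum(original_items.values()), sum(adjusted_items.values()))
```

The adjusted total is larger than 91 because the same symptom can be covered by items in several scales.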
The categorization of items was also based on Fried's (2017) methodology [17]. Items were treated as disparate if they distinctly differed from each other, and they were considered equivalent if they evaluated the same symptom. For instance, "do you have any speech problem?" from DNMSQuest and "speech" from BFMDRS were considered equivalent, as both of them evaluated the same symptom (speech problems). On the other hand, "straining in the neck" and "stiffness in the neck" were not considered equivalent, as those are different sensations. A symptom was defined as idiosyncratic if it appeared in only one instrument.
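A minimal sketch of how idiosyncratic symptoms could be flagged once items have been mapped to symptom labels. The data are a toy subset, not the study's symptom list: the "speech problems" and "limited daily activities" entries follow examples given in the text, while "toy symptom" and the variable names are hypothetical.

```python
from collections import Counter

# Toy presence data: scale -> set of symptom labels it covers (illustrative only).
presence = {
    "BFMDRS": {"speech problems", "limited daily activities"},
    "DNMSQuest": {"speech problems"},
    "CDIP-58": {"limited daily activities", "toy symptom"},
}

# Count in how many scales each symptom label occurs.
counts = Counter(
    symptom for symptoms in presence.values() for symptom in symptoms
)

# A symptom is idiosyncratic if exactly one instrument covers it.
idiosyncratic = sorted(s for s, n in counts.items() if n == 1)
shared = sorted(s for s, n in counts.items() if n > 1)

print("idiosyncratic:", idiosyncratic)
print("shared:", shared)
```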

Statistical analysis
Firstly, each symptom was evaluated to decide whether it appears specifically in the scale, is featured generally (indirectly), or is not featured at all. Secondly, the overlap of CD symptoms between the scales was calculated with the Jaccard index, a similarity coefficient for binary data (for this analysis, symptoms were categorized, separately from the previous analysis, as present (1) or absent (0) in the scale). The coefficient ranges from 0 (no overlap) to 1 (complete overlap) [17]. It was calculated as $J = \frac{s}{u_1 + u_2 + s}$, where $s$ is the number of items two scales share and $u_1$ and $u_2$ stand for the number of items that are unique to each of the scales. The criteria for the strength of the Jaccard index were as follows: very weak 0.00-0.19, weak 0.20-0.39, moderate 0.40-0.59, strong 0.60-0.79, very strong 0.80-1.00 [17,24]. Moreover, the symptoms were divided into seven categories: motor symptoms, sensory symptoms, disability, sleep disturbances, gait disturbances, emotions, and social interactions. Then, the rate of symptoms in each category in each scale was calculated. Analyses were conducted in R [25] with the code supplied in [17].
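The Jaccard calculation and the strength bands can be sketched as follows. This is an illustrative Python version, not the R code from [17] used in the study; the function names and the two example vectors are hypothetical, and treating the band limits as cut-offs at 0.20, 0.40, 0.60 and 0.80 is an assumption about how values between the listed ranges are handled.

```python
def jaccard(a, b):
    """Jaccard index s / (u1 + u2 + s) for two binary presence vectors."""
    if len(a) != len(b):
        raise ValueError("presence vectors must cover the same symptom list")
    s = sum(1 for x, y in zip(a, b) if x and y)        # shared symptoms
    u1 = sum(1 for x, y in zip(a, b) if x and not y)   # unique to the first scale
    u2 = sum(1 for x, y in zip(a, b) if y and not x)   # unique to the second scale
    total = s + u1 + u2
    return s / total if total else 0.0


def strength(j):
    """Map a Jaccard value to the verbal bands listed in the text (assumed cut-offs)."""
    if j < 0.20:
        return "very weak"
    if j < 0.40:
        return "weak"
    if j < 0.60:
        return "moderate"
    if j < 0.80:
        return "strong"
    return "very strong"


# Hypothetical presence vectors over five symptoms (1 = present, 0 = absent).
scale_a = [1, 1, 1, 0, 0]
scale_b = [0, 1, 1, 1, 0]

j = jaccard(scale_a, scale_b)  # s = 2, u1 = 1, u2 = 1
print(j, strength(j))
```

A symptom absent from both scales does not enter the index, which is why the fifth position in the example has no effect on the result.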

Results
Among the included articles, the most commonly used scales were TWSTRS (in 162 articles), Tsui scale (in 40 articles), BFMDRS (in 14 articles), CDIP-58 (in 12 articles), CDQ-24 (in 9 articles), CDSS (in 3 articles), DDS (in 3 articles) and DNMSQuest (in 2 articles). Detailed information about the number of articles in which each scale was used can be found in Fig. 2.
The analysis extracted 91 distinct CD symptoms (Fig. 3). 52 (62%) symptoms are examined by more than one scale, whereas none of the symptoms appears in six or seven instruments simultaneously (Table 2). Furthermore, each symptom is present in two instruments on average.
Table 3 presents the analysis of specific, compound and idiosyncratic symptoms. CDIP-58 contains the highest number of idiosyncratic symptoms (12, which is 21% of items in this scale). CDSS is the only instrument that has no idiosyncratic symptoms. CDSS also analyses the lowest percentage of symptoms: 8% of the 91 distinct symptoms. On the other hand, CDIP-58 examines the highest percentage of symptoms: 63%.
TWSTRS is the instrument that analyses the highest number of symptoms in the motor symptoms category, 12 (67%), and in the disability category, 10 (91%). CDIP-58 captures the highest number of sensory symptoms, sleep disturbances and gait disturbances (six (43%), four (100%) and nine (100%), respectively). CDQ-24 contains the highest number of symptoms related to emotions and social interactions (17 (94%) and 13 (76%), respectively). Detailed information about the symptom categories can be found in Table 4.
The correlation between the mean Jaccard coefficient of each scale (the mean overlap of a scale with all others) and the length of the scale is 0.61 for the number of specific symptoms captured and 0.54 for the adjusted scale length.
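The usage shares quoted elsewhere in the paper (for example 82.7% for TWSTRS, and 1.5% vs. 1.0% for DDS and DNMSQuest) follow from these counts and the 196 included studies. A short sketch, with hypothetical variable names:

```python
TOTAL_ARTICLES = 196  # number of included studies reported in the literature review

# Number of articles using each scale, as reported above.
usage = {
    "TWSTRS": 162, "Tsui": 40, "BFMDRS": 14, "CDIP-58": 12,
    "CDQ-24": 9, "CDSS": 3, "DDS": 3, "DNMSQuest": 2,
}

# Share of included articles using each scale, in percent.
share = {
    name: round(100 * n / TOTAL_ARTICLES, 1) for name, n in usage.items()
}

for name, pct in share.items():
    print(f"{name}: {pct}%")
```

The counts sum to more than 196, which is consistent with some studies using more than one scale.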
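The relationship between scale length and mean overlap can be sketched as below. The three scales and their symptom sets are toy data, not the study's; the text does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, as are all function and variable names.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Jaccard index of two symptom sets: shared / (shared + unique to either)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Toy scales: name -> set of symptom identifiers (hypothetical).
scales = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 6}}

# Overlap for every pair of scales.
pairwise = {
    frozenset(pair): jaccard(scales[pair[0]], scales[pair[1]])
    for pair in combinations(scales, 2)
}

# Mean overlap of each scale with all other scales.
mean_overlap = {
    name: sum(v for pair, v in pairwise.items() if name in pair) / (len(scales) - 1)
    for name in scales
}

lengths = [len(scales[name]) for name in scales]
r = pearson(lengths, [mean_overlap[name] for name in scales])
print(mean_overlap, round(r, 2))
```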

Discussion
In our study, we conducted a literature search and gathered 196 studies, from which we chose the seven most commonly used instruments examining CD symptoms. According to the literature, TWSTRS is the most popular rating tool used in CD research (82.7%). The second is the Tsui scale (20.4%). We identified 91 disparate symptoms. The mean overlap among all tools was very weak (0.17). As presented in Table 2, 38% of symptoms were present in only one instrument, while none of the symptoms appeared in all of them. What is more, our analysis of the categories of symptoms revealed that some scales do not cover specific categories. The most popular scale (TWSTRS) mainly focuses on motor symptoms and disability, while it has no items evaluating psychological status. The Tsui scale and CDSS also analyse only motor disturbances. On the other hand, BFMDRS focuses mainly on gait and disability. There is also a group of scales (CDIP-58, CDQ-24 and DNMSQuest) focusing on NMS and omitting the motor symptoms. Furthermore, we showed that there is a moderate correlation between the Jaccard coefficient and the length of the scale, meaning that scales with more items have a higher level of similarity to other questionnaires.
To the best of our knowledge, this is the first study to analyse the content overlap of the scales used in CD research. Previously published comparisons of scales focused on the clinical applicability, validity and descriptive analysis of the symptoms measured by the scales [28][29][30]. In some studies, CD symptoms were examined by more than one scale. For instance, Tarsy used TWSTRS and the Tsui scale to assess dystonia severity in the same patients in a study of botulinum toxin efficacy and found a significant positive correlation between the reductions in the two scale scores after treatment [31]. According to our analysis, those scales analyse mainly motor symptoms, which might explain the similar result despite the weak overlap of the questionnaires (0.31). On the other hand, Tomic et al. presented a significant correlation between disability measured by TWSTRS and subscales of CDQ-24 [32]. However, considering the very weak overlap between TWSTRS and CDQ-24 (0.15), the symptoms assessed by CDQ-24 might be overlooked when using only TWSTRS. As was shown by Jost et al., TWSTRS is the main scale used in randomized controlled trials regarding botulinum toxin efficacy, which is in line with our literature review, in which it was the most commonly used scale [30]. It indicates that the NMS assessed by other scales are not taken into consideration in such trials, regardless of their probable influence on the results.
Even though TWSTRS analyses motor symptoms extensively, it captures only 30% of symptoms overall. BFMDRS has the lowest mean overlap (0.09), having very weak or weak overlap with scales analysing mostly motor symptoms (e.g. TWSTRS) or NMS (e.g. DNMSQuest). Therefore, this scale should not be used alone but together with, e.g., the Tsui scale or CDSS, which evaluate most motor symptoms while not considering the disability assessed in BFMDRS. Instruments with very weak overlap, especially with an overlap of zero (e.g. Tsui scale and BFMDRS, Tsui scale and CDQ-24, CDQ-24 and CDSS), should not be used interchangeably, as they examine entirely different symptoms.
Taking into consideration our original analysis of symptom categories, three types of scales might be identified: those analysing motor symptoms, NMS, or both. The first type consists of TWSTRS, Tsui scale and CDSS; the second of CDQ-24 and DNMSQuest; the third of CDIP-58 and BFMDRS. TWSTRS examines the highest number of motor symptoms and disability symptoms; CDIP-58 of sensory symptoms, sleep disturbances and gait disturbances; CDQ-24 of emotions and social interactions (however, the differences between the number of symptoms analysed by CDQ-24 and CDIP-58 in those categories are small). CDIP-58 is the only scale that examines at least one symptom in each category. The suggested categorization is also supported by the specific content overlap analysis. Motor symptom scales have no or very weak overlap with non-motor scales, while the overlap score is greater within each type.
Based on our analysis, CDIP-58 examines the largest number of symptoms and has the highest mean overlap with other scales. Thus, it might be a valuable clinical tool. Furthermore, there is a moderate correlation between the Jaccard index and scale length, and CDIP-58 is the scale consisting of the greatest number of items in our study. However, it analyses a low number of specific motor symptoms. TWSTRS, on the other hand, is well validated and most commonly used to assess motor symptoms [30]. That is why, in our opinion, CDIP-58 should be combined with TWSTRS for a comprehensive assessment of CD.

Limitations
As in previous studies using content analysis, the main limitation is the number and choice of scales [17,18]. As before, we implemented a thorough literature search to minimise the risk of bias resulting from a wrong choice of scales. Another limitation of content overlap analysis is the lack of an objective method to compare the items across the instruments [17,18]. We followed Fried's (2017) methodology and, where there were any doubts about the symptoms, we erred on the side of treating symptoms as similar rather than different [17].

Conclusions
To our knowledge, this was the first study to conduct a content overlap analysis of the CD symptom scales. We showed a very weak overlap among scales and categorised the scales on the basis of the symptoms they assess. None of the scales alone examines the CD symptoms exhaustively enough, which is why we recommend combining them.
Funding No funding was received to assist with the preparation of this manuscript.

Fig. 1 Flowchart of the inclusion process

Fig. 3 Presence of 91 dystonia symptoms in the 7 most commonly used scales. Colored circles indicate that a symptom is directly assessed by the scale, while empty circles indicate that a scale only measures a symptom indirectly

Table 2
The number of symptoms found in the most commonly used scales. For example, 35 items, representing 38% of all analysed cervical dystonia-related symptoms, are present in one instrument, while none of the items is present in every analysed scale

Table 3
The percentage of the specific, compound and idiosyncratic symptoms gathered in seven analysed tools, as well as the percentage of all 91 disparate items they capture

Table 5
Overlap of the item content of 7 dystonia scales.