Background

Vestibular disorders (VD) produce a group of vestibular symptoms (VS) as well as a range of concomitant autonomic-anxiety symptoms [1]. Epidemiological data on VD in the general population are scarce, and studies have reported widely discrepant estimates (6.1 to 27%) of the one-year prevalence of VS [2]. However, VS are prevalent among individuals visiting outpatient care centers [3]. VS are vague and present in different patterns (acute, episodic, and chronic) [4]. Consequently, they are difficult for patients to describe and hard for healthcare professionals to evaluate [5]; hence, they place a burden on both patients and the community [6].

One potential way to overcome the difficulty of evaluating demanding symptoms is the use of patient-reported outcome measures (PROMs) through reliable and validated questionnaires, an approach that has gained acceptance and popularity in different fields of medicine [7]. Based on the Consensus-based Standards for the Selection of Health Status Measurement Instruments (COSMIN) checklist of measurement properties [8], the clinical utility of a group of PROMs related to VD was appraised in a systematic review; among them, the long form of the Vertigo Symptom Scale earned the second highest score [9]. The long form was developed by Yardley et al. [10] and contains 34 items. However, Mendel et al. [11] found that using the long form as a single aggregated scale may result in methodological bias; to overcome this hazard, they suggested studying the items separately by using the short form (VSS-SF).

The VSS-SF is composed of 15 items [12] extracted from the long form. This self-rated questionnaire uses a five-point scale ranging from 0 to 4, with the response options never, a few times, several times, quite often, and very often. The total score indicates the frequency of the 15 symptoms and ranges from 0, suggesting no symptoms, to 60, representing persistent symptoms. According to the types of symptoms, the 15 items are divided into two subscales: vestibular (balance) (VSS-V) and autonomic-anxiety (VSS-AA) [13].
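The scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the published scoring key: the function name and the item-to-subscale split are assumptions (the split uses the seven items that the Discussion reports as loading on the vestibular factor in this sample).

```python
from typing import Dict, Sequence

N_ITEMS = 15
# Response options in the order given in the text, scored 0..4.
OPTIONS = ("never", "a few times", "several times", "quite often", "very often")

# Assumption: illustrative item split (1-based item numbers), taken from the
# vestibular factor reported in this sample; not an official scoring key.
VESTIBULAR_ITEMS = (1, 4, 6, 8, 10, 13, 15)


def score_vss_sf(responses: Sequence[int]) -> Dict[str, int]:
    """Return the total (0-60) and the two subscale sums for one questionnaire."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in range(len(OPTIONS)) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    total = sum(responses)
    vestibular = sum(responses[i - 1] for i in VESTIBULAR_ITEMS)
    return {
        "total": total,
        "vestibular": vestibular,
        "autonomic_anxiety": total - vestibular,
    }
```

A respondent answering "very often" to every item reaches the maximum total of 60, and one answering "never" throughout scores 0.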

To use a PROM in a population whose language differs from that of the source, the instrument must undergo a process of cross-cultural adaptation, which includes both translation and cultural adaptation. However, translation of any validated PROM can weaken its psychometric properties; therefore, consistency and validity should also be confirmed and reported in accordance with international guidelines for measuring patient-reported health outcomes [14]. The psychometric properties of the VSS-SF were assessed when Norwegian and Japanese versions were cross-culturally validated; both translated versions had acceptable internal consistency, external reliability, convergent validity, and discriminating validity. Two factors were found in the Norwegian version: VSS-V and VSS-AA [15]; however, a third factor related to duration of symptoms was also extracted from the Japanese version [16].

Unfortunately, there is a critical shortage of validated tools in Kurdish that can quantify vestibular disorders. The VSS-SF is efficient, simple, and short, but it has not been adapted to Kurdish. Accordingly, in this study we translated and culturally adapted the VSS-SF to the central Kurdish dialect (VSS-SF-CK). Using a cross-sectional survey, and in accordance with the COSMIN checklist [8], we assessed the psychometric properties of the VSS-SF-CK.

Methods

Cross-cultural adaptation (CCA)

The focus group (FG)

In accordance with international regulations for qualified PROMs [8], the College of Medicine – University of Sulaimani (hereafter, “the institute”) assembled an FG consisting of seven otolaryngologists (including one of the authors), all native speakers of the target language with 15 to 25 years of experience in the field of VD. The moderator of the group was familiar with running discussion sessions according to the corresponding guidelines [17].

Preparation:

Preparation consisted of three steps.

  1. The corresponding author contacted Professor Lucy Yardley, one of the original developers, and obtained her permission.

  2. A junior otolaryngologist (who could easily contact the members of the FG and the translators) was recruited to follow the translation process.

  3. The concepts of clarity, fluency, and unambiguity in the forward translations were agreed upon and followed during CCA.

CCA:

The process was conducted according to the steps recommended by Wild and colleagues [18] and Beaton and colleagues [19]. Two forward translations of the contents were performed by an expert native otolaryngologist (T1) and a licensed native translator (T2). The FG compared T1 and T2 and resolved the differences between them; a preliminary form of the VSS-SF-CK was then created (T12). After back-translation, the identified discrepancies (see Additional file 1) were resolved (e.g., a clause was added to clarify the meaning of “dizziness”). To examine clarity, we conducted a pilot test with 18 linguistically knowledgeable patients with vestibular symptoms. Using a form designed specifically for ratings (Additional file 2), members of the FG and participants in the pilot test were asked to give feedback on understandability and to rate the contents of each translated item. The CCA process and the results of the ratings were reviewed; consequently, the face and content validity were considered excellent. Ultimately, after proofreading and cognitive debriefing, the final version was established (Additional file 3) and the details of the process were reported to the institute.

Sample size and participants

Based on a subject-to-variable ratio of at least 10 participants per item [20] and on the factors extracted in previous research on the same instrument [16], we estimated that 165 participants would be sufficient to observe the covariation among our 15 surface attributes, along with 30 healthy control participants for comparison. Two well-equipped tertiary audio-vestibular clinics, which cover a major proportion of the center and districts of Sulaimani Governorate, Iraq, enrolled participants from March 2017 to July 2018. Participants were patients with chief complaints of VS who had been objectively diagnosed as having VD.

The inclusion criteria admitted native speakers with sufficient communication and performance abilities. The exclusion criteria were: age below 17 or above 79; symptoms of less than 1 day’s duration (patients needed to have experienced symptoms [a feeling of being dizzy, disoriented, or swimmy lasting all day] for at least 1 day in order to answer item 6); musculoskeletal diseases; and symptoms primarily due to disorders of other systems, such as neurological, cardiopulmonary, and cognitive disorders.

Subgroups:

The heterogeneity of symptoms in the instrument required patients with different presentations and from different settings [10]; consequently, the inclusion and exclusion criteria were adjusted to ensure that the sample was a good representation of the target population (patients with VS of vestibular origin and with no associated illnesses that may produce VS). The sample contained all types of patients that may be encountered in primary, secondary, and tertiary clinics. Furthermore, based on the patterns of presentation, and to evaluate the discriminating validity, the sample was classified into three subgroups: (1) acute presentation (acute episode of symptoms at the time of rating), (2) chronic presentation (long-term sensations of symptoms), and (3) episodic presentation (recurrent symptoms with symptom-free intervals) [21]. For the 76 participants who were randomly selected to form the reliability subgroup, the design was converted to a short-term longitudinal study to assess external reliability.

Educational level and raters

The VSS-SF-CK is a self-rated survey tool, so the role of the rater (interviewer) is minor [22]; however, not everyone in the target population is literate, so participants’ educational levels were documented. Methodologists also recommend involving a female interviewer to ease the process, given participants’ psychological and/or societal obstacles [23]; that is, female interviewers can interview both genders, particularly women in conservative or religious families. Hence, two female raters with similar qualifications and sufficient training were recruited.

Recruitment and randomization

While patients were waiting for the results of their investigations or rehabilitation protocols, a systematic numbered sample was used on a daily basis to select patient participants who fulfilled the inclusion criteria and accepted the invitation. The first participant was selected randomly followed by fixed-interval selection.
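The selection rule (a random first pick followed by fixed-interval selection) might look like the sketch below. The function name, the interval value, and the idea of a daily numbered queue held in a list are assumptions for illustration; the text does not report the interval that was actually used.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(queue: Sequence[T], interval: int,
                      rng: Optional[random.Random] = None) -> List[T]:
    """Pick a random start within the first interval, then every interval-th entry."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    rng = rng or random.Random()
    start = rng.randrange(interval)  # random start position in 0 .. interval-1
    return list(queue[start::interval])
```

With a hypothetical queue of 20 eligible patients and an interval of 4, the function returns five patients whose queue numbers are exactly four apart.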

Comparators

To the best of our knowledge, there are no validated PROMs in Kurdish that measure the construct under investigation. Consequently, we employed two comparators that could measure a similar construct using two different approaches, subjective and objective. First, in the subjective approach, a visual analogue scale (VAS) was applied so that patients could rate their total self-perceived vestibular symptoms (VAS-T). The scale started at zero, representing no symptoms, and ended at 100, representing the subjectively rated worst-possible symptoms. Second, in the objective approach, the Tandem Romberg (TR) test was used: participants were asked to maintain balance for 60 s under the following four conditions: (1) right foot behind the left, eyes open; (2) same as the first, eyes closed; (3) left foot behind the right, eyes open; (4) same as the third, eyes closed. Only one of three trials was administered for each condition if the patient completed the 60 s successfully. The scores from all four conditions were summed to give a total (TR-T) out of 240 s [24].
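As a small illustration of how the TR-T total is formed, the sketch below sums one balance time per condition, capping each at the 60 s limit. The function name is hypothetical, and the code assumes that a single time per condition has already been chosen; the text does not state how a time is selected when more than one trial is needed.

```python
from typing import Sequence

MAX_SECONDS = 60   # time limit per condition, as stated in the text
N_CONDITIONS = 4   # two foot positions, each with eyes open and eyes closed


def tr_total(condition_times: Sequence[float]) -> float:
    """Sum the four per-condition balance times (in seconds), each capped at 60 s."""
    if len(condition_times) != N_CONDITIONS:
        raise ValueError(f"expected {N_CONDITIONS} times, got {len(condition_times)}")
    if any(t < 0 for t in condition_times):
        raise ValueError("balance times cannot be negative")
    return sum(min(t, MAX_SECONDS) for t in condition_times)
```

A participant who holds every condition for the full minute scores the maximum of 240 s.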

External reliability

Steps recommended by Kottner and his colleagues were followed during reliability assessments and reporting [25]. Patients in the reliability subgroup (n = 76) were rated on two separate occasions. The timing of the second rating was arranged according to the patients’ availability.

The following strategies were used to minimize measurement errors:

  1. Participants with unstable conditions (dramatic recovery or deterioration) were excluded from the reliability tests.

  2. The time interval between ratings was 1 to 5 days; furthermore, to reduce recall bias, the sequence of items was changed for the second rating. For the Tandem Romberg, however, the interval was 1 to 2 hours, to remove the effect of intervening rehabilitation.

  3. Similar settings were applied to all patients. Ratings were performed in a quiet room to eliminate distractions and to minimize auditory stimuli, so that patients could not use these stimuli to maintain their balance, especially in the eyes-closed conditions (to test the vestibular system alone, the contribution of other systems that help to maintain balance should be excluded).

  4. Raters were instructed not to prompt patients for specific answers.

Statistics

Data screening

Ceiling and floor effects were absent, as the percentages of patients with the highest and lowest scores in the three outcome measures were below 15% [26]; pairwise exclusion was used for missing values. For our sample size (50 < N < 300), absolute Z-scores above 3.29 were considered to reflect a non-normal distribution [27]. Univariate and multivariate (Mardia test) statistics revealed an asymmetric distribution. Ordinal variables such as Likert-type items do not satisfy the assumption of normality [16, 28] and therefore require either log-transformation or distribution-free (e.g., nonparametric) tests; we chose the latter [29].
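The floor and ceiling check can be expressed as a short routine. The sketch below is an assumption about how the 15% rule might be applied (an effect is flagged when more than 15% of respondents sit at either extreme); the function name and the sample scores in the usage note are hypothetical.

```python
from typing import Sequence, Tuple


def floor_ceiling(scores: Sequence[float], lowest: float, highest: float,
                  threshold: float = 0.15) -> Tuple[float, float, bool]:
    """Return (share at the floor, share at the ceiling, effect flagged?)."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(s == lowest for s in scores) / n
    ceiling_share = sum(s == highest for s in scores) / n
    # Assumption: an effect is flagged when either share exceeds the threshold.
    flagged = floor_share > threshold or ceiling_share > threshold
    return floor_share, ceiling_share, flagged
```

For a total score that runs from 0 to 60, a made-up sample with one respondent in ten at each extreme would not be flagged, whereas two in ten at the floor would be.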

Structural validity

Exploratory factor analysis (EFA)

To identify the latent constructs, and considering the sample size (≤ 300) and non-normality [20, 28], the authors conducted EFA. Some methodologists recommend the use of parametric tests even if the distribution is non-normal [30]. However, for ordinal and non-normal data, others advocate more robust approaches, such as polychoric correlations (PC) [31], specifically with robust diagonally weighted least squares (DWLS) [32]. In view of the study context, principal axis factoring (PAF) was considered preferable to maximum likelihood [28]. To verify that the same outcomes would be reproduced, and in light of the above circumstances, we used both PAF and DWLS in EFA. Assuming a moderate inter-factor correlation (IFC), promax oblique rotation (kappa = 4) was employed.

Number of factors to retain

To avert bias, guidelines emphasize using diverse strategies to determine the number of factors to retain [28, 33]. This number was decided based on five criteria:

  1. Kaiser criterion (eigenvalue > 1).

  2. Scree plot.

  3. Horn’s parallel analysis (HPA) [34].

  4. Minimum average partial (MAP).

  5. The a priori hypothesis that the instrument consists of two subscales: VSS-V and VSS-AA [15, 16].
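Two of these criteria reduce to simple comparisons once the eigenvalues are available, which a short sketch can make concrete. The code below covers only the decision step: it assumes that the observed eigenvalues and the reference eigenvalues from random data (for example, the 95th percentile across simulated datasets) have already been computed elsewhere. The function names and any numbers used to try it out are hypothetical.

```python
from typing import Sequence


def kaiser_count(eigenvalues: Sequence[float]) -> int:
    """Number of factors with an eigenvalue above 1 (Kaiser criterion)."""
    return sum(ev > 1.0 for ev in eigenvalues)


def parallel_analysis_count(observed: Sequence[float],
                            reference: Sequence[float]) -> int:
    """Count leading factors whose observed eigenvalue exceeds the reference
    eigenvalue from random data, stopping at the first one that does not."""
    count = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        count += 1
    return count
```

Because the random-data reference values usually sit above 1 for the first few factors, parallel analysis tends to retain fewer factors than the Kaiser criterion, which is one reason for consulting several criteria rather than one.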

Discriminant validity (internal discrimination)

To establish this feature, four criteria were utilized:

  1. Cross-loadings inspection: each item’s loading on its own construct should be higher than its cross-loadings.

  2. Fornell-Larcker criterion: the average variance extracted (AVE) by each factor should be higher than the square of the IFC (IFC²).

  3. Heterotrait-monotrait ratio of correlations (HTMT): a value < 0.85 is favorable.

  4. HTMT-inference: a value < 1 is reassuring [35].

The last two criteria were estimated using partial least squares (PLS) [36].
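To make the second and third criteria concrete, the following sketch computes the HTMT ratio for two constructs from an item correlation matrix and checks the Fornell-Larcker condition. It is a plain-Python illustration, not the SmartPLS procedure used in the study: the function names are hypothetical, the correlations are assumed to be positive, and HTMT-inference (which requires a bootstrap confidence interval) is not covered.

```python
from itertools import combinations
from math import sqrt
from statistics import mean
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def htmt(corr: Matrix, items_a: Sequence[int], items_b: Sequence[int]) -> float:
    """Heterotrait-monotrait ratio for two constructs.

    corr is the item correlation matrix; items_a and items_b hold the
    0-based row/column indices of the items belonging to each construct.
    """
    # Mean correlation between items of different constructs.
    hetero = mean([corr[i][j] for i in items_a for j in items_b])
    # Mean correlation among items of the same construct.
    mono_a = mean([corr[i][j] for i, j in combinations(items_a, 2)])
    mono_b = mean([corr[i][j] for i, j in combinations(items_b, 2)])
    return hetero / sqrt(mono_a * mono_b)


def fornell_larcker_met(ave: float, inter_factor_corr: float) -> bool:
    """True when a factor's AVE exceeds the squared inter-factor correlation."""
    return ave > inter_factor_corr ** 2
```

An HTMT value returned by `htmt` would then be compared against the 0.85 cut-off named in the list.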

Model fit

This was appraised by a comparative fit index (CFI) value of ≥0.95 and the root mean square error of approximation (RMSEA) value of ≤0.06 [37].

External reliability

The intraclass correlation coefficient (ICC) was used. Cut-off values for the strength of reliability were: < 0.5, poor; ≥ 0.5 to ≤ 0.75, moderate; ≥ 0.75 to ≤ 0.9, good; and > 0.9, excellent [22].
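The cut-offs can be written as a small lookup. Because the moderate and good bands both include 0.75 as stated, the sketch below makes an explicit assumption and assigns a value of exactly 0.75 to "good"; the function name is hypothetical.

```python
def icc_band(icc: float) -> str:
    """Map an ICC value to the reliability band used in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:   # assumption: exactly 0.75 is counted as "good"
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"
```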

Internal consistency reliability

The following seven variables were estimated and compared with the corresponding cut-off points:

  1. Cronbach’s alpha (α): > 0.7 [38, 39].

  2. Average inter-item correlation (AIC): ≥ 0.2 and ≤ 0.5 [26].

  3. Corrected item-total correlation (CI-TC): ≥ 0.4.

  4. Alpha if item deleted (AIID): the resultant α of the selected scale should not rise if any item is deleted [38].

Methodologists consider α to be a controversial estimate; accordingly, the following were also reported:

  5. The consistent reliability measure of the partial least squares (rhoA): > 0.7.

  6. Composite reliability (rhoC): > 0.7.

  7. AVE by each factor: > 0.5 [40].
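Several of these quantities follow from textbook formulas, sketched below in plain Python. The sketch assumes complete data (one row of item scores per respondent) and standardized loadings; the function names are hypothetical, and rhoA is omitted because it depends on the PLS weights.

```python
from statistics import variance
from typing import List, Sequence

Rows = Sequence[Sequence[float]]


def cronbach_alpha(rows: Rows) -> float:
    """Cronbach's alpha from respondent-by-item scores (sample variances)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows: Rows) -> List[float]:
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in rows])
        for i in range(k)
    ]


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """AVE: mean of the squared standardized loadings of one factor."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings: Sequence[float]) -> float:
    """rhoC: (sum of loadings)^2 over itself plus the summed error variances."""
    squared_sum = sum(loadings) ** 2
    error_sum = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_sum)
```

Each result would then be compared with the corresponding cut-off in the list, for example AVE > 0.5 and rhoC > 0.7.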

Hypotheses

Yardley stated that PROMs are cumulative measures, while objective tests are single-point measures [10]. Thus, we may find adequate correlations between subjective scores if they measure the same construct; however, the concept is not the same when subjective and objective scores are correlated even if they are measuring similar constructs [15, 41, 42]; accordingly, the following hypotheses were formed:

  • The positive correlation between the total score of the VSS-SF-CK (VSS-T) and the VAS-T would be adequate, because they measure similar constructs with similar approaches.

  • The correlation between TR-T and VSS-V scores would be moderate, because they measure similar constructs with different approaches; furthermore, the value would be negative, because low scores on TR-T are associated with high scores on VSS-V.

  • The negative correlation between TR-T and the VSS-AA would be weak, because they measure different constructs with different approaches.

Spearman’s rank correlation coefficient was used to estimate the correlations. Drawing on several guidelines, the study classified the values as follows: < 0.3, weak; ≥ 0.3 to < 0.5, moderate; ≥ 0.5 to < 0.7, adequate; and ≥ 0.7, high [16, 43].
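The rank correlation and the strength bands can be illustrated with a small self-contained sketch. It computes Spearman's coefficient as the Pearson correlation of average ranks (so ties are handled) and labels the absolute value using the bands above. The function names are hypothetical, and the study itself used statistical software rather than this code.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def strength(rho: float) -> str:
    """Label the absolute correlation using the bands given in the text."""
    r = abs(rho)
    if r < 0.3:
        return "weak"
    if r < 0.5:
        return "moderate"
    if r < 0.7:
        return "adequate"
    return "high"
```

The sign of the coefficient is interpreted separately from its strength, which is why `strength` works on the absolute value.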

Discriminating validity (external discrimination)

It is assumed that the instrument has the ability to discriminate between subgroups as well as between the patient and healthy groups. The Mann-Whitney U test was used to test this assumption with a significance level of 5%.
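For readers unfamiliar with the test, the sketch below computes the Mann-Whitney U statistic by pairwise comparison and a two-sided p-value from the normal approximation. It is a simplified illustration under stated assumptions: no tie or continuity correction is applied, the approximation is rough for small samples, and the function name is hypothetical. The study's own analysis was run in statistical software.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p) for two independent samples.

    U is the smaller of the two U statistics; p uses the normal
    approximation without tie or continuity correction.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # Count pairs where the value from a exceeds the value from b; ties count 0.5.
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u = min(u_a, n * m - u_a)
    z = (u - n * m / 2) / sqrt(n * m * (n + m + 1) / 12)
    p = erfc(abs(z) / sqrt(2))
    return u, p
```

A p-value below 0.05 would indicate, at the 5% level used here, that the two groups' score distributions differ.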

The flowchart (Fig. 1) illustrates the sequence of the work carried out in the study.

Fig. 1 The course of the study. Note: Each color represents a specific field of work in the study; black arrows show the sequential order and the connections between the fields. Abbreviations: VSS-SF/CK, Vertigo Symptom Scale-Short Form/Central Kurdish; VAS-T, Visual Analogue Scale-Total; TR-T, Tandem Romberg-Total; PAF, Principal Axis Factoring; DWLS, Diagonally Weighted Least Squares; HTMT, Heterotrait-monotrait ratio; CI-TC, Corrected Item-Total Correlation; AIC, Average Inter-item Correlation; AIID, Alpha If Item Deleted; rhoA, Reliability measure of the partial least squares; rhoC, Composite reliability

More details on methodology are available in Additional file 1.

Software

Three programs were used: (1) FACTOR V10.8.04 (Rovira i Virgili University, Tarragona, Spain) for PC, HPA, and goodness of fit [44]; (2) SmartPLS 3 (Boenningstedt: SmartPLS GmbH) [36] for rhoA and discriminant validity; and (3) IBM SPSS Statistics V21 (IBM, Armonk, NY, USA) for the rest of the analysis, such as PAF, α, and the syntaxes for HPA and MAP [45].

Results

Data related to participants and exclusions are presented in Fig. 1; the exclusions produced no meaningful differences in the results. Further details of participants’ attributes are shown in Table 1.

Table 1 Demographic attributes of the groups and subgroups

Factorability was achieved: the determinant was not equal to zero (0.007), the Kaiser-Meyer-Olkin test was meritorious (0.873), and Bartlett’s test of sphericity was significant (p < 0.001). Based on eigenvalues > 1, PAF revealed three factors. On this basis, a 3-factor solution was applied using DWLS. The cumulative proportions of variance (CPV) explained by the three factors were 53% and 59% in PAF and DWLS, respectively. In the case of DWLS, the three consecutive eigenvalues (with CPV) were 6.2 (41%), 1.6 (52%), and 1.1 (59%). Nonetheless, the elbow of the scree plot was distinctly flexed at the point where the second factor was located (Fig. 2). Furthermore, HPA and MAP (see Additional file 4: Tables S1 and S2) and the a priori hypothesis also supported the scree plot display, that is, a 2-factor solution.

Fig. 2 Scree plot of the initial exploratory factor analysis, based on eigenvalues > 1. Note: The flexion of the elbow is maximal at the second factor, indicating that two factors should be retained

Consequently, a 2-factor solution was conducted with both PAF and DWLS. Two factors were extracted: vestibular (VSS-V) and autonomic-anxiety (VSS-AA). In the case of DWLS, the two consecutive eigenvalues (with CPV) were 6.1 (41%) and 1.6 (52%). Each factor adequately loaded seven items with weak cross-loadings. The remaining item, item 12 (feeling faint, about to black out), loaded adequately on the VSS-AA; however, it showed noticeable cross-loading on the VSS-V.

With neither method did the AVE reach the acceptable level, as it was < 0.5 for both factors. Additional file 5 shows how to estimate AVE and rhoC. To assess the negative effects of low AVE on discriminant validity, AVE and IFC² were compared (Fornell-Larcker criterion). In PAF, the AVE of both factors was lower than IFC² (validity not established), while in DWLS, AVE was higher than IFC² only for the VSS-V (validity of one factor established). However, discriminant validity was confirmed by an HTMT value of 0.71 (< 0.85) and an HTMT-inference value of 0.81 (< 1). To examine the situation, we deleted item 12 (the cross-loading item); consequently, in DWLS, the AVE of the VSS-AA increased slightly and exceeded the slightly decreased IFC²; hence, the Fornell-Larcker criterion was also met for the VSS-AA (Table 2). Additional file 6 shows the details of the 2-factor extraction by DWLS and the results of model fit: CFI = 0.985 (≥ 0.95) and RMSEA = 0.049 (≤ 0.06).

Table 2 Item loadings in exploratory factor analysis with the 2-factor solution and the internal consistency variables

Moreover, Table 2 presents the outcomes for the internal consistency variables, which were satisfactory for all methods and scales; regarding AIID, the resultant α did not increase when any item was deleted. In both methods, the values of rhoA and rhoC reached acceptable levels.

The instrument and the comparators exhibited good to excellent reliabilities in all types (Table 3).

Table 3 External reliability of the instruments

Table 4 shows the Spearman’s correlations between the VSS-SF-CK and its subscales, the VAS-T, and the TR-T (Pearson’s correlations revealed similar results). The Mann-Whitney U test compared the medians of the scores and revealed that the distributions were similar in all scales across subgroups (ps > .05). However, they were not similar when the mean ranks of the control group were compared with those of the subgroups and of the total patient group (ps < .05). For Pearson’s correlations and the medians/interquartile ranges, see Additional file 4: Tables S3 and S4. Further, the shapes of the score distributions are shown in Fig. 3.

Table 4 Spearman’s correlation of the scales with the comparators
Fig. 3 Shape and distribution of the scores in the subgroups and the healthy group. Note: Subgroups were classified based on the pattern of presentation of the vestibular symptoms at the time of rating. Abbreviation: VSS, Vertigo Symptom Scale

Discussion

The study used a regulated process of cross-cultural adaptation and produced the VSS-SF-CK. The steps described in the methodology were mostly applied in accordance with the related guidelines.

The nature of both the population and the sample obliged the authors to involve raters (interviewers) and to transform the instrument, as necessary, from self-administered to interviewer-administered (e.g., in cases of non-motivated and illiterate participants). The reliabilities of the VSS-SF-CK and the comparators were enhanced by these measures, a finding consistent with the test-retest results of the Norwegian and Japanese versions.

The results of DWLS and PAF were nearly identical during EFA: seven items (1, 4, 6, 8, 10, 13, and 15), which are directly related to VD, loaded firmly onto the vestibular factor with weak cross-loadings on the autonomic-anxiety factor; this was a preliminary sign of the discriminant ability of the VSS-V.

Previous studies as well as the present survey have used various types of analyses and samples; however, across these samples, two items (items 3 and 12) were associated with loading issues.

In five previous samples (Mexican, U.K. hospital, U.K. primary care, Norwegian [Table 3], and Japanese), item 3 (nausea, vomiting) loaded interchangeably on both factors, with noticeable cross-loadings on every occasion [15, 16, 46]. The mean loading (calculated by the authors) in these samples showed that the reflective effect of the anxiety factor on item 3 (loading 0.41) was higher than that of the vestibular factor (loading 0.35).

The story of item 3 began when the original developer, for several reasons, intentionally decided to retain the item along with the other items in the VSS-V [46], knowing that, from a physiological viewpoint, this item originally belonged to the VSS-AA [47]. However, the present sample has strongly placed the item in the VSS-AA (Table 2), which can be attributed to the heterogeneous nature of the symptoms in this sample, that is, their various presentations and durations.

The cross-loading of item 12 (feeling faint, about to black out) is perhaps a structural matter. Of six samples, including the present survey, four placed item 12 correctly in the VSS-AA [15, 16, 46]; ordered from the weakest cross-loading, these were U.K. primary care, Japanese, U.K. hospital, and then the present sample. In the remaining two samples, the item unexpectedly settled on the VSS-V; ordered from the strongest loading, these were Norwegian and then Mexican. An item is not expected to oscillate or cross-load between constructs unless it is flawed. Accordingly, we believe this item represents two different types of symptoms. The words are clear and are assumed to belong to the autonomic-anxiety symptoms; however, we noticed that some patients used many words or clauses to describe strange feelings of dizziness (spatial disorientation), words similar to those used to describe fainting and/or being about to black out. Nevertheless, in the present study, item 12 loaded adequately on the VSS-AA (0.45), although it had the lowest loading and the highest cross-loading of all items. The situation was investigated by deleting item 12, which resulted (in both methods) in a decrease in IFC and a slight increase in the AVE of the VSS-AA (Table 2). Consequently, the Fornell-Larcker criterion was also met for the VSS-AA, leading to the establishment of discriminant validity.

Regarding the structural consistency of the 15 items, the item-loading results of both methods were nearly identical, but the robustness of polychoric correlation via DWLS was evident in its higher AVE and item loadings. The two-factor model of the VSS-SF-CK was suitable according to the recommended fit indices. Along with structure, the construct was also validated through internal consistency parameters such as α, rhoA, and rhoC, all of which reached desirable levels. Despite the low AVE, discriminant validity was also established by both HTMT and HTMT-inference, while the Fornell-Larcker criterion was met for only one factor, the VSS-V.

The hypotheses regarding convergent validity were supported. An adequate positive correlation was found between VSS-T and VAS-T, as well as a moderate negative correlation between the VSS-V and stability; the latter replicated a similar correlation (between VSS-V and path length) in a previous analysis [15]. Although the types of scores in VSS-AA and TR-T are different (subjective and objective), the resultant weak negative correlation between them (−0.14) indicates the divergent ability of the VSS-AA, because they measure two different constructs (anxiety and stability).

The instrument significantly discriminated the healthy group from the patient group and its subgroups; however, it was not efficient in discriminating between presentation subgroups, most probably because patients reported the sum of their symptoms from onset, regardless of the presence or absence of symptoms at the time of rating; as Yardley stated, the score is a cumulative measure [10]. Interpretability and responsiveness were beyond the scope of this study.

Strengths and limitations

We believe that the study’s strength is that its sample is representative of the target population. However, the study has limitations. First, convergent validity was limited by the absence of validated comparator PROMs in Kurdish that measure the same construct; for that reason, we used the VAS and emphasized discriminant validity. Second, close observation was required to sustain patients’ motivation for self-rating. Finally, because of accommodation issues, we were obliged to shorten the minimum interval between rating events to 1 day.

Conclusion

The VSS-SF was cross-culturally adapted to Kurdish. The adapted version showed high external reliability. The 2-factor model was associated with high internal consistency and composite reliability, and it discriminated between two latent variables (vestibular and autonomic-anxiety). This structure was supported by goodness-of-fit indices. The instrument has adequate correlations with the comparators, demonstrating convergent validity. The VSS-SF-CK is, therefore, a consistent and validated PROM that can be used by Kurdish researchers and clinicians to quantify vestibular symptoms before and/or after treatment protocols.