Introduction

Migraine is a common and disabling neurobiological disorder [1] which is under-recognized and under-treated [2, 3]. It imposes a substantial health burden, with nearly all migraine sufferers experiencing impairment of social activities and of work capacity [4, 5]. The World Health Report 2002 [3] ranks migraine as number 12 in women and number 19 in both genders amongst all causes of disability in the world. In spite of this, it is estimated that only about 50% of migraine patients are diagnosed and, therefore, treated adequately [6–10]. There are a few validated questionnaires, such as the ID migraine to diagnose migraine and the migraine disability assessment score (MIDAS) to assess disability in the last 3 months, but there is no comprehensive questionnaire to assess migraine-associated burden.

The physical and emotional impact of migraine on individual sufferers, their caretakers, family and colleagues is poorly acknowledged, and the same is true of the social and economic burden of migraine on society in comparison with those of other, less prevalent neurological disorders [9, 11–13].

We aimed to develop and validate a questionnaire to assess the burden of migraine, after having translated it into the main languages, so that it can be used in subsequent studies in different linguistic populations.

Methods

Questionnaire

We designed a questionnaire combining elements from established questionnaires and added further questions concerned with disease management and social consequences of headache. Priority areas for the questionnaire were defined with joint support from NGOs (Swiss Migraine Trust Foundation, Migraine Action Association UK, Switzerland and Luxembourg), several international headache experts (see “Acknowledgements”) and the Luxembourg Ministry of Health. Ethics committee approval for the study was obtained from the National Ethic and Research Board of Luxembourg.

The resulting questionnaire contains 77 items, 17% of which are open questions. In the first part, the respondents are asked for biographical details such as age, gender, their most spoken language and their employment status. For the purpose of migraine diagnosis, the questions from “ID migraine” [14] are included. Specific information on headache, such as age of onset, the average number of headache days per month for the last 3 months, and symptoms before and after the headaches, is gathered, as well as information on general health and on previous and current disorders, using items from the World Health Organization Disability Assessment Schedule II (WHODAS II) [15], the Migraine Disability Assessment Scale (MIDAS) [16] and the Patient Health Questionnaire-9 (PHQ-9) [17]. Participants are asked about the influence of headaches on their job and family life, whether they have ever consulted medical doctors, which diagnosis was made and which medication was prescribed. Psychosocial circumstances that worsened the headaches, limitations in social activities, conceptions of headache and the need for support from health professionals to improve the headaches are also assessed.

Evaluation of the questionnaire

The testing of the questionnaire included face, content, and language validity; the stability of the questionnaire over 1 month, a period of time during which little or no change is expected (Test–retest reliability); the extent to which the questionnaire is able to discriminate between respondents with more or less severe disease status (construct validity) and the extent to which individual items in a questionnaire correlate with other items relating to the particular area of investigation (internal consistency). The respective methodology is detailed below.

Study population

Patients with headache were recruited from primary care centres, pain clinics and lay organisations. The idea behind this recruitment was to test the questionnaire in different settings. Selection for the primary care setting was done by doctors in general practice from the personal acquaintance of the project team. For the pain clinic setting the patients were selected from the pain clinic of the Centre Hospitalier de Luxembourg (Central Hospital in Luxembourg). When consulting because of headaches, both of these patient groups were asked by their physician to complete the questionnaire. For a third group of headache patients, headache sufferers with different employment settings were consecutively recruited by the national occupational health service and by a patient organisation.

The sample size needed to investigate internal consistency, construct validity and test–retest reliability was estimated by using the kappa formula (see below). Assuming an absolute precision of 0.18 (given the validated parts of the questionnaire), we estimated that 73 responses to the main questions in the second test would enable a Kappa value of ≥0.5 to be detected with a power of 0.95 (two-tailed α = 0.05). Thus, allowing for a 60% response rate, 135 subjects were considered necessary.

Face, content and language validity

Initial content validity was explored through systematic review by experts, and face validity was tested by pre-piloting with 23 volunteers. All questions which had not been used before in the respective language in validated questionnaires were translated using a forward–backward method with two different native translators. Comprehensibility was piloted with native speaker volunteers.

Test–retest reliability

Questions were categorized by the amount of change expected, as described previously for the development of a comparable questionnaire [18], primarily based on the time frame of the question and blinded to the results as follows: ‘no change expected,’ ‘change unlikely,’ ‘1 unit change expected,’ ‘3 unit change expected,’ ‘change likely’.

The data from the two periods of answering the questionnaire were compared to assess test–retest reliability. For categorical data, this was estimated by using agreement measures such as percentage agreement, Kappa values, McNemar’s S test and Bowker’s S test. Percentage agreement gives an estimate of within-patient agreement. The Kappa coefficient indicates the extent to which the observed agreement exceeds chance agreement; a value above 0.6 is generally considered acceptable. McNemar’s S provides a measure of agreement when used between two measures of the same questionnaire in the same patient. The null hypothesis of Bowker’s S test is that the probabilities of cells in the square table satisfy symmetry; this test was used for r × c tables where r > 2 or c > 2. For the questions with discrete integer data, the intraclass correlation coefficient (ICC) was calculated using a 2-way random effects model for agreement.
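The agreement measures described above can be illustrated with a minimal sketch in plain Python. It is not the analysis code used in the study: the function names are hypothetical, the kappa is the unweighted Cohen form, and the ICC is assumed to be the single-measurement, absolute-agreement variant of the two-way random effects model, often written ICC(2,1). The text does not state which single or average form was used.

```python
from collections import Counter


def percent_agreement(test, retest):
    """Share of respondents giving the same answer at both time points."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two non-empty answer lists of equal length")
    same = sum(a == b for a, b in zip(test, retest))
    return same / len(test)


def cohen_kappa(test, retest):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(test)
    p_o = percent_agreement(test, retest)
    first, second = Counter(test), Counter(retest)
    # Chance agreement: product of the marginal proportions, summed over categories.
    p_e = sum(first[c] * second[c] for c in first) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both ratings are constant")
    return (p_o - p_e) / (1.0 - p_e)


def icc_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` holds one row per respondent and one column per occasion.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

For a test–retest design, each row passed to `icc_agreement` would hold a respondent's two answers to the same question. Significance tests such as McNemar's or Bowker's additionally need a chi-square distribution and are left out of this sketch.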

Construct validity

Comparisons between the three samples were made for the total scores of the WHODASII, MIDAS and PHQ9. Comparison between categorical scores of the three samples was performed by using a chi-square test. Continuous values of the scores were also used for comparison, and a one-way ANOVA was used with the score as dependent variable. Normality was assessed with a Kolmogorov–Smirnov test; if this test was significant, the data were log-transformed and analysed if the transformed data were normally distributed. Otherwise, the Kruskal–Wallis test was used.
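As an illustration of the non-parametric fallback, the following sketch computes the Kruskal–Wallis H statistic for several independent samples in plain Python. The function names are hypothetical and the code is not taken from the study. It returns only the statistic and its degrees of freedom; the P value would be read from a chi-square distribution, for instance via a statistics library such as SciPy (`scipy.stats.kruskal`).

```python
from collections import Counter


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) with the usual correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1
```

Here each element of `groups` would be the list of total scores from one recruitment setting, so three samples give two degrees of freedom.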

Internal consistency/content

Where appropriate, cross-tabulations were used to check for internal consistency. Blocks of questions corresponding to ID migraine, WHODAS II, MIDAS and PHQ-9 were compared in terms of correlations.

This was done in order to verify whether they measure the same construct in a multilingual context and in the newly designed questionnaire. Cronbach’s alpha coefficient was used to explore the overall consistency of the ID migraine, WHODAS II, MIDAS and PHQ-9 questionnaires. The larger the overall alpha coefficient, the more likely that items contribute to a reliable scale. A value of 0.70 suggests an acceptable reliability coefficient; smaller reliability coefficients are seen as inadequate. A coefficient alpha after deleting each variable independently from the scale was calculated to determine how each item affects the reliability of the scale. When the coefficient increases after an item is deleted from the scale, one can assume that the item is not correlated highly with other items in the scale. Conversely, if the coefficient decreases, it can be assumed that the item is highly correlated with other items in the scale.
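The alpha and alpha-if-item-deleted calculations can be sketched as follows. This is a minimal illustration with hypothetical function names, not the study's analysis code, and it computes the raw (variance-based) coefficient; a standardized variant based on inter-item correlations also exists and may give different values.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Raw Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[x for j, x in enumerate(r) if j != drop] for r in rows])
        for drop in range(k)
    ]


# Made-up scores: items 1 and 2 move together, item 3 does not.
example = [[1, 1, 3], [2, 2, 1], [3, 3, 2], [4, 4, 3]]
overall = cronbach_alpha(example)
per_item = alpha_if_item_deleted(example)
```

In the made-up example, dropping the third item raises alpha above the overall value, which is the pattern the text describes for an item that correlates poorly with the rest of the scale.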

Results

Population and frequency of headache in the samples

A total of 130 questionnaires were completed, leading to a response rate of 65% (Fig. 1). Out of this sample, 15.4% (n = 20) were from the pain clinic, 13.1% (n = 17) from the primary care centre and 71.5% (n = 93) from the lay organisation (Table 1). Fifty-two persons (40%) responded in German, 1 (0.8%) in English, 72 (55.4%) in French and 5 (3.8%) in Portuguese. Eighty-four percent were women and the mean age was 41.9 ± 11.5 years; the gender distribution was significantly different (P = 0.03) between centres. There was no statistically significant difference in age, age at onset of headaches, work status and diagnosis of migraine between the three groups. Headache frequencies were unequal between centres (P = 0.02), with higher headache frequencies in subjects at the pain clinic. In the primary care setting, most individuals were in the 4–9 days/month category. In the pain clinic, most of the patients had headache on ≥15 days/month. The general public population had a similar profile to the primary care setting.

Fig. 1 Tests and samples used in the different steps of the BURMIG questionnaire validation

Table 1 Socio-demographic and headache characteristics of the validation sample

Of the 130 subjects in the whole population, 28 did not answer all the MIDAS questions, so that the total score could not be calculated. Thus, only 102 subjects had a total score. When the comparison was re-run without the unhealthy subjects, only 10 of the 28 subjects had no total score. Forty-nine subjects had a total score.

Completion rates

Completion rates for the items of the questionnaire varied between 5.83 and 100.00%. As the questionnaire included some questions with more than one possible choice or sub-questions, only the principal item was kept to evaluate questions with good completion rates. Thus, 63% of questions were found to have completion rates of 90% or more. Questions where there were several choices tended to have completion rates around 10%. There was no difference in completion rates between genders and language groups. Completion rates of the second questionnaire varied from 5.41 to 100% and were very similar to those of the first questionnaire (63% of questions with completion ≥90%).

Test–retest reliability

Out of the 130 subjects recruited for the validation process, 91 completed the questionnaire a second time when it was sent 1 month later. Seventy-nine single items (including sub-questions) were used to assess reliability, excluding open questions; 67.1% of the items (n = 53) showed over 80% agreement, whereas 13.9% (n = 11) ranged between 60 and 80% and 19% (n = 15) were below 60% agreement.

The Kappa coefficient ranged from 0.23 to 0.99. Questions categorized as ‘no change expected’ (0.86–0.99) and ‘change unlikely’ (0.68–0.99) showed good agreement (Table 2). For the items categorized as ‘1 unit change expected’ or ‘3 unit change expected’, kappa values ranged from 0.45 to 0.92, indicating poor agreement for some questions; unsurprisingly, for the items categorized as ‘change likely’, kappa values were lower, ranging from 0.23 to 0.77. The questions showing the lowest agreement were the items from the WHODAS II, the PHQ9 and questions 5 and 6 from the MIDAS.

Table 2 Test–retest reliability with percentage agreement and kappa values

McNemar’s S test showed no significant differences. Only one item was significant (P = 0.03) with Bowker’s S test: no agreement was observed between the two measures for ‘Feeling tired or having little energy’ from question 25 (PHQ9). The intra-class correlation coefficient for quantitative answers is detailed in Table 2. Values were significant for questions 15 (from WHODAS II) and 18 (from MIDAS) (Table 3).

Table 3 Test–retest reliability with McNemar’s coefficient for 2 × 2 tables, Bowker’s coefficient for more than 2 classes variables and intraclass correlation for continuous variables

Construct validity

The mean frequency of headache days was significantly different between the three samples (Table 5). While few subjects had high headache frequency in the primary care and general population samples, a large proportion (45% of subjects) in the pain clinic sample had ≥15 headache days per month. However, there was no difference between the three samples in terms of average disability attributed to headaches (MIDAS total score) or of depression (as measured by PHQ9). The mean scores of WHODASII, MIDAS and PHQ9 were not different between the three samples (Table 6) except for a significant pair-wise difference between the pain clinic and the general population sample with the MIDAS total score (P < 0.05) (Table 4).

Table 4 Internal consistency

A subanalysis was carried out after omitting patients (n = 71) with headache from the general population sample in order to better discriminate WHODASII, MIDAS and PHQ9 values between levels of headaches. The remaining general population sample (n = 22) was assumed to be completely healthy, while the pain clinic sample was assumed to be the most affected group. Results showed a clear trend for the mean number of days with headaches (P = 0.06) and for the presence of depressive disorder (P = 0.09). A highly significant difference was observed between the general population sample, the pain clinic and the primary care sample for MIDAS scores (P = 0.0005) but not for the PHQ9 depressive disorder estimate (Table 5).

Table 5 Construct validity for frequency of headaches, MIDAS, and PHQ9 categorical scores

The mean WHODASII score did not show any significant difference in this subanalysis while for MIDAS and PHQ9, total scores were significantly different (Table 6).

Table 6 Pairwise differences of WHODASII, MIDAS and PHQ9 scores between groups

When further analysing pair-wise relationships between the three samples, differences (P < 0.05) were observed between the MIDAS score of the pain clinic sample and the general population sample when including all subjects, and also between the PHQ9 scores of the primary care and the general population sample (Table 6) when excluding incompletely healthy subjects from the general population sample.

Internal consistency/content

The standardized values of Cronbach’s alpha to test the consistency of ID migraine, WHODASII, MIDAS and PHQ9 as tested in the new questionnaire were 0.26, 0.91, 0.74 and 0.84, respectively. Questions categorized by the amount of change expected and compared between the test and the retest time to assess the internal consistency showed an 80–100% agreement, except for open questions, where more than 70% change was observed (Electronic supplementary material).

Discussion

We described the development and methodological testing of a self-reporting questionnaire to evaluate the burden of migraine in the general population.

Completion rates for each question were generally good, with the vast majority between 60 and 90%. A small number of questions showed low completion rates, which can be explained by the fact that they were part of multiple-choice questions. Some other questions did not have to be answered by all participants since they applied only to subgroups. Questions from WHODAS II and PHQ9 showed both good completion rates and good reliability. For methodological purposes, we had defined the amount of change expected for each question before administering the questionnaire. Questions where a change had been expected actually showed higher amounts of change and lower reliability. This means that these items were used in an appropriate way and that they can be used as part of a questionnaire on the impact and burden of migraine and headaches. The question “Feeling tired or having little energy” from PHQ9 was found to have little retest reliability at a 1-month interval, which can be explained by the transient character of this item.

Internal consistency was evaluated independently for each scale tested within the questionnaire. It was found to be excellent for the MIDAS and somewhat smaller for questions from WHODASII and PHQ9.

Construct validity was found to be acceptable when samples were adequately chosen to discriminate between levels of headache. However, questions from WHODAS showed a poor discrimination between headache patients and the general population. This can be explained by the fact that this tool is not specifically designed for headache sufferers. The headache specific MIDAS, as expected, showed good discriminative power.

Disease management

Regarding questions on disease management, agreement ranged from 77 to 98% (except for multiple-choice questions). Kappa coefficient ranged from 0.68 (0.62 with multiple-choice questions) to 1.00 which indicates good agreement between the two steps.

The majority of the questions about private and social influence were of the multiple-choice type and scored poorly in terms of percentage agreement (10–30%), but had a good retest reliability (kappa coefficients ranging from 0.52 to 0.97). These questions were therefore stable with time and could be used in a large study with a period of recruitment lasting a few months.

Changes brought to the final questionnaire

In the disease management part, two questions on medical doctor consultations were merged into one question allowing a better completion.

In one question on the temporal relation between headache and other problems, a third item, “during”, was added to “before” and “after”.

Conclusions

A new questionnaire, BURMIG, was developed with the aim of estimating the burden of migraine. It uses established and previously validated items for diagnosis and to measure disability and depression. Questions related to disease management and the influence on daily living were added. The resulting questionnaire was tested in a sample in the Grand Duchy of Luxembourg. Reliability and consistency of BURMIG were found to be comparable to previously published questionnaires. Therefore, this tool is suitable to study larger populations of headache patients.