
How to measure fluctuating impairments in people with MS: development of an ambulatory assessment version of the EQ-5D-5L in an exploratory study



Health fluctuations even within a single day are typical in multiple sclerosis (MS), but are not captured by widely used questionnaires like the EQ-5D-5L. This exploratory study aimed to develop an ambulatory assessment (AA) version of the EQ-5D-5L (EQ-5D-AA) where patients rate their health on mobile phones multiple times per day over several days, and to assess its feasibility and face validity.


An initial EQ-5D-AA version was based on two patient focus groups. It was then tested and further developed in an iterative process: patients completed it over several days, followed by debriefing interviews. Findings were used to refine the EQ-5D-AA, and each resulting version was tested by the next wave of patients until participants saw no further need for changes. Before and after the AA period, participants completed the standard paper-based EQ-5D-5L asking about ‘today’.


Focus group participants reported that their impairments often fluctuated between and within days. They regarded an AA with three assessments per day over seven days as most appropriate; each assessment should refer back to the time since the previous assessment, but not all items should be assessed at each time point. Four waves of AA testing were conducted. Thirteen of the 17 participants preferred the AA over the standard assessment as they regarded it as more informative, but not too burdensome.


The newly developed one-week AA of the EQ-5D-5L captures within-day and day-to-day health fluctuations in people with MS. From the patients’ perspective, it is a feasible and face valid way to provide important information beyond what is captured by the standard EQ-5D-5L.

Plain English summary

People with the neurological disease multiple sclerosis (MS) have different symptoms and impairments that can reduce their quality of life. These impairments are often not constantly present but change within a day or from one day to another. Measuring these changes might help clinicians treat people with MS better, and it might also be useful in studies, for example those investigating the effectiveness of MS medications. Therefore, we developed a way to measure the fluctuations in these patients’ everyday lives, using mobile phones. First, we discussed with a group of patients how the instrument should look. Second, we developed a first version of the instrument, which was tested by patients and then refined. In the new instrument, patients answer questions about their health three times a day over nine days on their mobile phones. The questions were taken from the EQ-5D-5L questionnaire, which is a well-established instrument measuring health-related quality of life. The questions covered mobility, self-care, usual activities, pain/discomfort and anxiety/depression, as well as a 0–100 scale where patients rate their subjective health. Our study participants found the new instrument feasible and useful.


Multiple sclerosis (MS) is a chronic, currently incurable, inflammatory disease of the central nervous system characterised by clinically significant fluctuations in symptoms and functioning. MS frequently affects vision, mobility, cognition, bladder control and other functions [1]. The most frequent MS phenotype is relapsing–remitting, followed by secondary and primary progressive disease course [2]. In relapsing–remitting MS, symptoms worsen during the clinical episodes (relapses) and last for a period of weeks to months [3]. However, symptoms also fluctuate at shorter intervals within a single day [4, 5] and from one day to the next [6]. For example, in fatigue, a frequent MS symptom, 35.5% of variability could be attributed to moment-to-moment fluctuations, 8.2% to day-to-day changes and 56.6% to individual differences [7].

The vast majority of patient-reported outcome measures (PROMs) do not assess fluctuation but ask for the extent of impairment within a specific period like “the last seven days” or “today”. To choose a response option, respondents must summarise their experience within the reference period to form some kind of average or typical value. For example, a person may rate pain that is mild in the morning but gets more severe over the day as “moderate” as this represents the average intensity; another person in the same situation may choose “severe” as this represents the maximum.

However, information on short-term fluctuation is crucial for the understanding of impairments in MS. In addition to considerable diurnal variability within persons, temporal patterns differ between persons. Furthermore, fluctuation data within a single day can help uncover the interrelation between different impairments, as associations between symptoms were found predominantly within a day with little carry-over effect from one day to the next [8]. In clinical practice, information on these fluctuations is highly relevant for rehabilitation, medication adjustment and life planning. For example, spasticity fluctuates substantially depending on time of day, activity level and temperature, but also on psychological factors. A sensitive assessment of impairments related to this symptom can help to adjust the dosing of antispastic agents, which also have substantial side effects.

Fluctuations can be captured by a method called ambulatory assessment (AA) where respondents provide information on mobile devices multiple times per day over several days [9, 10]. In addition to capturing within-person dynamics, AA reduces the need for respondents to average their health problems over longer periods of time, reduces recall bias, and can be administered in everyday life, thereby providing high external validity [11]. As a drawback, AA increases response burden. Moreover, when repeatedly answering the same questions and thereby gaining experience with the surveyed construct, respondents may change how they use the rating scale. Their answers will thus no longer be fully comparable, a phenomenon known as recalibration response shift [12]. AA is increasingly being used in PROMs [13, 14] where it has been found to be feasible and valid [13, 15].

One of the most widely used PROMs is the EQ-5D-5L, a generic instrument of health status [16, 17]. Its first part, the EQ-5D descriptive system, includes five items (one per dimension) assessing mobility, self-care, usual activities, pain/discomfort and anxiety/depression, each with five response options representing different levels of severity [18]. The second part, EQ VAS, measures self-rated health with a vertical visual analogue scale (VAS), the anchors labelled “The best health you can imagine” (100) and “The worst health you can imagine” (0). Both parts refer to health “today” without differentiating by time of day. The EQ-5D-5L has replaced the previous version EQ-5D-3L, which had only three response options, with the aim of decreasing the considerable ceiling effects. These are still found for the 5L version in the general population, but less so in people with increased morbidity [19, 20].

Psychometric properties of the EQ-5D-5L have been investigated in people with MS (PwMS), finding good test–retest reliability and convergent validity, but limited content validity and discriminative ability [21]. In other chronic diseases, it has also been found that the EQ-5D misses some relevant aspects of health-related quality of life; for example, fatigue [22]. We nonetheless decided to use the EQ-5D-5L in this study because of its combination of widespread use and brevity, the latter being crucial for feasibility in an AA.

The EQ-5D-5L also captures dimensions of health that are highly relevant in MS: Persons with relapsing-remitting MS reported “some” or “extreme” problems in mobility (68.9%), self-care (38.2%), usual activities (77.9%), pain/discomfort (63.9%) and anxiety/depression (58.5%) (using the former, three-level EQ-5D version) [23]. During a relapse, the proportion of PwMS who experience problems was found to be even higher, at 55 to 94% depending on the dimension [24].

To our knowledge only two other studies have measured within-day fluctuations with adapted versions of the EQ-5D-5L, both in non-MS patient groups. In Kerr et al. 2016, persons with Parkinson’s disease completed the EQ-5D-5L both for “on-time” (where medication is working well) and “off-time” (where it does not), reporting also the duration of both states [25]. Considerable within-day fluctuations were found. With MS, however, it is not as clear cut as good/bad, calling for a different approach to capturing health dynamics. In the second study, a momentary version of the EQ VAS with 10 assessments per day has successfully been tested in three patient groups and healthy people [26]. They found that average AA ratings correlated with, but also significantly differed from, the standard EQ VAS as assessed after the AA period and may therefore provide important additional information. The EQ-5D descriptive system was not included in that study.

To enable the measurement of health fluctuations in PwMS, we therefore aimed to develop an AA version of the complete EQ-5D-5L (called EQ-5D-AA, for ambulatory assessment of the EQ-5D) for use in this patient group. As an AA implies higher response burden than a one-time questionnaire, we also aimed to assess the EQ-5D-AA’s feasibility and its face validity from the patient perspective as compared to the standard EQ-5D-5L. This study focuses only on the EQ-5D as a measure of health in research and clinical practice, not on its role as a tool for economic evaluation for which it is frequently used.


This was a qualitative descriptive study [27] involving focus groups and one-on-one, in-person or telephone interviews with additional exploratory quantitative analyses. It included two phases: (a) the use of patient focus groups, resulting in a first version of the EQ-5D-AA and (b) completion of the EQ-5D-AA by subsequent waves of PwMS, each followed by cognitive debriefing and refinement of the instrument (Fig. 1).

Fig. 1

Flow chart of study procedures; PwMS, people with multiple sclerosis; EQ-5D-AA, ambulatory assessment of the EQ-5D-5L

Participants were recruited at the MS outpatient clinic at the University Medical Center Hamburg-Eppendorf and through MS self-help groups (newsletter and posting). Inclusion criteria were: age ≥ 18 years, confirmed MS diagnosis, fluent in German, and sufficient cognitive and physical ability. We aimed for a study sample that was heterogeneous with regard to disease severity, cognitive impairment and age, and that included both men and women. Participants received financial reimbursement.

Focus groups

In the two focus groups, we introduced participants to the EQ-5D-5L and the concept of AA. We asked them to report on the extent and pattern of fluctuation they experienced in each EQ-5D-5L dimension both within and between days. They also discussed which AA specifications would be optimal to capture these fluctuations, like number of assessments per day, time points of data collection and retrospective vs. concurrent assessment, taking ease of administration into consideration.

Participant characteristics were assessed with a self-completion questionnaire, including sociodemographic and clinical data, EQ-5D-5L and Perceived Deficits Questionnaire (PDQ-20 [28]) on cognitive impairment.

Audio recordings of focus group sessions were transcribed verbatim. The qualitative approach used here was iterative thematic analysis. For this, we extracted all text passages potentially relevant for the research questions. Each extract was translated into English (for the international research group) by two members of the German team and summarised, and extracts were grouped by theme; additionally, each theme was summarised separately. Based on these findings, the research group achieved consensus on specifications of the first version of the EQ-5D-AA; the research group included experts on EQ-5D-5L, PROMs, MS and qualitative methodology. Specifications were implemented in movisensXS (Movisens GmbH, Karlsruhe, Germany), an app specifically developed for AA studies. The EQ-5D was modified by the authors with permission from the EuroQol Research Foundation.

AA testing and cognitive debriefing

The EQ-5D-AA was tested by four successive waves of three to six PwMS, followed by individual debriefing interviews. After each wave, we refined the AA according to participant feedback, with the resulting version being tested by the next wave of PwMS. The sample size was guided by the concept of information power [29], that is, additional waves were conducted until no further need for changes to the AA emerged.

In detail, procedures were as follows. In a face-to-face meeting, participants familiarised themselves with the software using a test version. The EQ-5D-AA was installed on the participant’s own Android smartphone or a loan unit (Samsung Galaxy A3), at the participant’s option. They completed the standard paper-based EQ-5D-5L about “today” and a questionnaire on sociodemographic and clinical data. During the following seven (in later waves, nine) days, they completed the EQ-5D-AA three times a day.

After the AA period, participants again completed the standard paper-based EQ-5D-5L.

In a subsequent debriefing, we interviewed each participant on feasibility of the EQ-5D-AA. Interviews were conducted in person or by phone, if needed. We used a pilot-tested interview guideline covering the following themes: feasibility and appropriateness of number of assessments per day and time points of data collection; feasibility and appropriateness of item wording; feasibility of completing the AA for seven (or nine) days; face validity and preference for either AA or standard EQ-5D-5L; any further comments or suggestions for the EQ-5D-AA (Online Appendix 2).

To investigate face validity, participants in the in-person interviews were presented with their individual EQ-5D-AA patterns displayed graphically along with their completed baseline paper EQ-5D-5L. Participants were asked whether and why they believed the AA data provided (or did not provide) important information about their health beyond the one-time assessment.

Analytical procedures were the same as in the focus group analysis (transcription, extracting and summarising, discussion with research group). In discussing the findings, focus group findings were also considered if pertinent to the respective theme. We used the results from each wave of debriefings to refine the EQ-5D-AA, which was then tested in the subsequent wave of participants. As we aimed to adopt only those changes that are needed specifically for an AA version but did not intend to optimise the EQ-5D-5L itself, no change suggestions relating exclusively to the instrument itself were considered.

Quantitative analyses

In exploratory quantitative analyses, the distribution of EQ-5D-AA responses was evaluated, including variability (number of participants with invariant responses; standard deviation (SD) of the EQ VAS), percentage of responses indicating no impairment, and graphical depiction of EQ VAS responses over the study period.

To test for agreement between the standard and the AA version, index scores were calculated for both the first assessment of the EQ-5D-5L (at study inclusion) and the EQ-5D-AA, using the German value set [30]. For the EQ-5D-AA, scores were calculated separately per day and then averaged over days. For items collected multiple times a day, the response indicating the highest impairment within that day was used. We did not calculate a score for each time point within the AA because not all EQ-5D items were collected at each time of the day. Agreement between scores based on EQ-5D-5L and EQ-5D-AA was determined using two-way mixed, average score, absolute agreement intra-class correlation (ICC). Agreement of average responses on a single-item basis was evaluated descriptively only, as assumptions for ICC calculation were not met. For this, responses at study inclusion were averaged over participants, and EQ-5D-AA responses were first averaged over single assessments for each participant, then averaged over participants.
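The daily scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the item names, the data layout, the rule of skipping days with an incomplete profile, and in particular `toy_value_fn` are assumptions made for the example. The toy function is a placeholder and does not reproduce the German value set [30].

```python
from statistics import mean

ITEMS = ("mobility", "self_care", "usual_activities",
         "pain_discomfort", "anxiety_depression")


def daily_profile(assessments):
    """Return the worst (highest) level reported per item within one day.

    `assessments` is a list of dicts, one per completed alert, mapping item
    names to levels 1 (no problems) to 5 (extreme problems). Items not asked
    at a given alert are simply absent from that dict.
    """
    profile = {}
    for item in ITEMS:
        levels = [a[item] for a in assessments if item in a]
        if levels:
            profile[item] = max(levels)
    return profile


def weekly_index(days, value_fn):
    """Score each day from its worst-level profile, then average over days.

    Assumption: days without a complete five-item profile are skipped.
    """
    scores = []
    for assessments in days:
        profile = daily_profile(assessments)
        if len(profile) == len(ITEMS):
            scores.append(value_fn(profile))
    return mean(scores) if scores else None


def toy_value_fn(profile):
    # Placeholder only, NOT a real value set: 1.0 for "no problems" on all
    # items, minus 0.05 for every level above 1.
    return 1.0 - 0.05 * sum(level - 1 for level in profile.values())


# Hypothetical day: three items in the morning, four at midday, five in the evening.
example_day = [
    {"mobility": 2, "pain_discomfort": 1, "anxiety_depression": 1},
    {"mobility": 3, "pain_discomfort": 2, "anxiety_depression": 1,
     "usual_activities": 2},
    {"mobility": 2, "pain_discomfort": 3, "anxiety_depression": 1,
     "usual_activities": 1, "self_care": 1},
]
```

Because the daily profile takes the maximum per item, a problem present at only one alert counts as much as one present all day.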


Focus groups

The first focus group had 4 participants (1 male, 3 female), the second 5 (all male, Table 1). Both took place in August 2019. Age ranged from 29 to 55 years. All participants were employed except for one in early retirement. Six out of 9 participants had A-levels school education (i.e. 12 or 13 years of school education). MS types included relapsing-remitting (n = 6) and secondary progressive (n = 3); participants had been diagnosed with MS between 1 and 21 years before. The EQ-5D-5L index score ranged from 0.38 to 1.00 (1 representing full health). EQ VAS ranged from 45 to 97 (100 representing full health). Cognitive impairment was between 0 and 35 on the PDQ-20 scale ranging from 0 (no impairment) to 80 (highest impairment).

Table 1 Participant characteristics

Focus group analysis resulted in eleven themes: one on each EQ-5D-5L item (including EQ VAS), one on retrospective versus momentary assessment, three on alerts at different times of day (morning, midday and evening), and one on options to postpone or silence alerts.

Most participants agreed that the best way to measure the fluctuations they experienced was to assess retrospectively to the previous assessment instead of for the current moment (i.e. using a coverage strategy instead of a sampling strategy [15, 31]). They also agreed that questions should be asked for seven days at three times a day (morning, midday, evening), but not including all six items at each time point. For example, the EQ VAS should only be assessed in the evenings with regard to the time period since the previous evening as this was sufficient to describe overall health, while pain/discomfort should be assessed three times a day.

Participants differed in how much they reported their impairments to fluctuate, with some of them even reporting constant levels in some items: For example, one person who used a wheelchair was always unable to walk, and two persons never had any problems with self-care. However, for each item, most participants reported significant fluctuation within and/or between days.

Based on the focus group findings, specifications of the initial EQ-5D-AA version were derived.

AA testing and cognitive debriefing

The EQ-5D-AA testing was conducted between February and June 2019. Four waves were needed, including three, three, six and five participants (n = 17; 6 males, 11 females; age 21–63; three of them had participated in the focus groups) (Table 1). Participants reported being employed (n = 7), in early retirement (n = 5), student/trainee (n = 3) or other (n = 2). The most frequent levels of school education were A-levels (n = 8) and secondary school certificate (n = 5; other, n = 4). MS types included relapsing–remitting, primary progressive and secondary progressive. Participants had received the MS diagnosis between two and thirty years before. EQ VAS ranged from 30 to 98, EQ-5D-5L index scores from 0.35 to 1.00. Cognitive impairment ranged from 0 to 46 on the PDQ-20 scale of 0–80.

After the first, second and third wave, substantive changes were made to the EQ-5D-AA. For example, the assessment of anxiety/depression was changed from one to three times a day, and we added an option to bring the midday alert forward if the person is going to take a nap. The results from the fourth wave suggested only one minor change, which did not warrant an additional wave of testing: a “Good day” and a “Good evening” page should be added to the midday and evening alert, respectively. There were also specifications of the EQ-5D-AA for which the debriefing interviews did not suggest a need for changes, for example the frequency of assessments (i.e. three times a day).

Specifications of the final EQ-5D-AA version are listed in detail in Table 2 along with rationales, example citations from either focus groups or cognitive debriefings, and the preceding version, if applicable. Minor modifications of the AA wording are not listed, for example changing the morning instruction from “… last night” to “… since yesterday evening”. Screenshots of the final German EQ-5D-AA with translations to UK English are shown in Online Appendix 1. Briefly, the final EQ-5D-AA version assesses EQ-5D-5L items three times a day over a period of seven days, preceded by a familiarisation phase of two days. Participants are reminded to complete the items by a repeated alarm. Morning and evening times are specified individually as the earliest and the latest time the participant is usually awake; the midday time is the exact midpoint between these two time points. The morning assessment time can be defined differently for weekdays vs. Saturday/Sunday. Mobility, pain/discomfort and anxiety/depression are assessed three times a day, mainly because participants considered these to be highly fluctuating. Usual activities are assessed at midday and in the evening, self-care and EQ VAS in the evening only. Participants can bring forward both the midday and the evening alert so that the AA will not interfere with sleep.
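The timing rule (individual morning and evening times, midday alert at the midpoint) can be illustrated with a short Python sketch. The function name `alert_times` and the "HH:MM" string format are assumptions for this example; the sketch does not reflect how movisensXS is configured.

```python
from datetime import datetime


def alert_times(morning, evening):
    """Return (morning, midday, evening) alert times as "HH:MM" strings.

    `morning` and `evening` are the earliest and latest times the participant
    is usually awake; midday is placed exactly halfway between them.
    """
    fmt = "%H:%M"
    start = datetime.strptime(morning, fmt)
    end = datetime.strptime(evening, fmt)
    if end <= start:
        raise ValueError("evening time must be later than morning time")
    midday = start + (end - start) / 2
    return start.strftime(fmt), midday.strftime(fmt), end.strftime(fmt)


# Hypothetical participant with a later morning alert at the weekend.
weekday = alert_times("07:00", "22:30")
weekend = alert_times("09:00", "22:30")
```

Under these hypothetical inputs the midday alert falls at 14:45 on weekdays and 15:45 at the weekend, so a separate weekend morning time also shifts the midday alert.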

Table 2 Specifications and rationale of the final EQ-5D-AA

Feasibility of the EQ-5D-AA

Asked to elaborate on feasibility in the debriefing interviews, most participants evaluated the EQ-5D-AA as short, easy, comprehensible and fine to handle:

Female, 57 years: “For me, that was okay. I did not feel bothered in any way. (…) It could easily be integrated into the changing everyday life that I have. (…) It does not take long, (…) one minute maximum.”

Male, 51 years: “I was doing fine with it. The questions are clearly worded so that you know what is asked for.”

Female, 62 years: “I got along well. (…) I only feared it could wake up my neighbour. (…) There also have been no difficulties with the mobile (which I had feared in the beginning), because the questions were always the same.”

However, some participants found the alerts to be annoying in some situations, and some could not respond at all times and therefore missed or postponed alerts:

Female, 28 years: “It was actually quite pleasant. Though sometimes I was interrupted in my daily habits, when suddenly the mobile rings and you’re like: No! Silence, silence, silence!”

Female, 57 years: “Those two times or so that I forgot … not forgot, but too late … I think that wasn’t so dramatic.”

Occasionally, technical problems occurred (e.g. having to restart the study within the app; irrelevant warnings displayed by the app).

Missing values

While seven participants responded to each alert, ten participants missed between one and ten alerts. This corresponds to 0–45.5% missing responses per person (mean: 7.4%). No single items were missing, that is, whenever participants responded to an alert, they answered all items. In the interviews, participants gave the following reasons for non-response: being busy at work or with leisure activities, sleeping, forgetting to switch on the phone, not taking the loan device with them or accidentally choosing the ‘decline’ option.

Face validity

Asked how well the EQ-5D-AA represented their actual health during the respective week, 13 out of 17 participants thought that the AA was better in capturing health than the two assessments with the standard EQ-5D-5L before and after the AA period. Stated reasons were that the AA was more informative or precise, captured fluctuations better, evaluated more than two days (the latter being more of a snapshot), measured multiple times per day and provided the opportunity to get used to the questions.

One participant thought the two assessments were better suited to depict health, but without being able to give a reason; one participant thought both methods were equally appropriate; and two did not make a clear statement on this question.

Variation in EQ-5D-AA items over time

In all five items of the EQ-5D descriptive system, the percentage of “no problems” responses in the AA was at least 50% (averaged over participants; self-care: 66.3%; anxiety/depression: 65.0%; mobility: 51.7%; pain/discomfort: 51.4%; usual activities: 50.0%). Depending on the item, between three and seven of the 17 participants did not show any variation in the respective dimension. In all these cases, “no impairment” was reported, except for one participant who in the mobility item stated the highest possible impairment (“unable to walk”) at all time points.

For the EQ VAS, variability differed markedly between participants, with SDs ranging from 0.7 to 24.7. Figures 2 and 3 depict the individual EQ VAS courses over the assessment period (which was eight or ten days: participants used either the seven-day or the nine-day version, and the AA already started in the evening after study inclusion, which added another VAS assessment). Figure 2 shows that participants with more stable responses (SD < 9, based on median split) could be in either good or bad health as measured with the EQ VAS. Figure 3 shows that in participants with higher variability (SD > 9), no clear pattern of increase or decrease over time is discernible.
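The variability measure and the median split can be sketched as follows. This is an illustrative example with invented data. The source does not state whether a sample or population SD was used, so the use of the sample SD (`statistics.stdev`) and the assignment of ties at the cutoff to the high-variability group are assumptions.

```python
from statistics import median, stdev


def split_by_variability(vas_by_participant):
    """Compute each participant's EQ VAS SD and split the sample at the median SD."""
    sds = {pid: stdev(values) for pid, values in vas_by_participant.items()}
    cutoff = median(sds.values())
    stable = sorted(pid for pid, sd in sds.items() if sd < cutoff)
    variable = sorted(pid for pid, sd in sds.items() if sd >= cutoff)
    return sds, cutoff, stable, variable


# Invented evening EQ VAS ratings (0-100) for four hypothetical participants.
example_vas = {
    "P1": [80, 80, 81, 80],   # stable, good health
    "P2": [50, 70, 40, 65],   # fluctuating
    "P3": [30, 32, 31, 30],   # stable, poor health
    "P4": [60, 75, 55, 70],   # fluctuating
}
sds, cutoff, stable, variable = split_by_variability(example_vas)
```

In this invented data, P1 and P3 both land in the stable group despite very different health levels, mirroring the point made for Figure 2.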

Fig. 2

EQ VAS responses collected in the EQ-5D-AA, by participant and day of study (subsample: participants with a low variability (SD < 9) in the EQ VAS; each line represents one participant; n = 9). EQ VAS, visual analogue scale of the EQ-5D-5L; SD, standard deviation; EQ-5D-AA, ambulatory assessment of the EQ-5D-5L

Fig. 3

EQ VAS responses collected in the EQ-5D-AA, by participant and day of study (subsample: participants with a high variability (SD > 9) in the EQ VAS; each line represents one participant; n = 8). EQ VAS, visual analogue scale of the EQ-5D-5L; SD, standard deviation; EQ-5D-AA, ambulatory assessment of the EQ-5D-5L

Agreement between standard EQ-5D-5L and EQ-5D-AA

At a group level, agreement between index scores calculated for standard paper EQ-5D-5L at study inclusion and for EQ-5D-AA (averaged over days) was high with an ICC of 0.833. Agreement was also high for the EQ VAS with an ICC of 0.896. Looking at the single items on group level, participants reported slightly more problems in the standard version than in the AA (Table 3). On single participant level, the largest differences between the two assessments were found for mobility being rated up to 3.1 points worse on paper. Differences in the other direction, i.e. better health ratings in the AA than at study inclusion in single patients, were less pronounced with up to 0.65 points difference.
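For readers unfamiliar with the agreement statistic, the following Python sketch computes an average-measures, absolute-agreement ICC from a two-way layout (participants by assessment method), using the standard mean-square formula (MSR − MSE) / (MSR + (MSC − MSE) / n). It is a generic illustration with invented scores; which software the study used is not stated, and the function name `icc_a_k` is an assumption.

```python
def icc_a_k(ratings):
    """Average-measures, absolute-agreement ICC for a two-way layout.

    `ratings` is a list of rows, one per participant, each holding k scores
    (here k = 2: standard EQ-5D-5L index and averaged EQ-5D-AA index).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-method mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (msc - mse) / n)


# Invented index scores: identical ratings vs. a constant offset of 0.1.
identical = [[0.5, 0.5], [0.8, 0.8], [1.0, 1.0]]
shifted = [[0.5, 0.6], [0.8, 0.9], [1.0, 1.1]]
```

Because absolute agreement is required, a systematic offset between the two methods lowers the coefficient even when participants are ranked identically.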

Table 3 Descriptive comparison of standard paper EQ-5D-5L (collected at study inclusion) with EQ-5D-AA (averaged over single assessments for each participant)


In this study, we developed an AA version of the EQ-5D-5L for use in people with MS. After two focus groups and four waves of iterative testing and refining, the EQ-5D-AA was finalised. The AA was extended from seven to nine days due to participants reporting recalibration response shift within the first two days. Participants judged the AA as not too burdensome to complete for this duration and also considered it feasible. Most of them found it more informative than the standard one-time assessment of EQ-5D-5L.

There was high agreement between one-time assessment and average AA index scores in spite of intra-individual variability in AA responses. This suggests that times in better and worse health evened out over the 7–10-day AA period. Descriptively, similar values were also found on a single-item basis, but ratings were slightly more negative in the standard EQ-5D-5L than in the AA. This may indicate that the AA does not provide much added information and therefore does not warrant the additional effort. However, this finding seems to contrast with most participants clearly favouring the AA over one-time assessment because they believed it captured important information about their health. An explanation may be that they regard the variability and pattern of health fluctuations as relevant over and above the average level of impairment. Indeed, all six EQ-5D-AA items showed variation in most participants, and the patterns were also quite different: some participants had highly stable values, while others showed considerable fluctuation. Detecting these patterns may be of additional value in understanding a person’s health status, comparable to findings that emotion variability has added value next to average emotion intensity when predicting well-being [33, 34]. However, these quantitative findings were exploratory only and need confirmation in a larger sample.

The EQ-5D-AA items ask retrospectively about the time since the previous assessment, thereby covering the complete assessment period (except for the night, where two items were not applicable: usual activities and self-care). This approach is called proximal intensive assessment or complete coverage [15, 31]. In contrast, a sampling strategy would assess only a (usually random) sample of moments, which are taken to be representative of all moments within the sampling period. Such a strictly momentary approach would be applicable to the EQ-5D-5L dimensions of pain/discomfort and anxiety/depression: Both are states of mind that have some intensity at any given (waking) moment, including a possible intensity of zero. However, for some dimensions a momentary approach is not appropriate as they do not apply to most moments. This is especially true for the self-care dimension (because most of the day, no washing or dressing is needed), and to a lesser extent probably also for usual activities and mobility. However, the coverage approach used here also comes with disadvantages: recall bias is possible, and respondents still have to form an average value for the respective (albeit short) time period.

For the exploratory analyses, we calculated an index score for the seven-day AA period by first determining the score for each day, using the respective highest score of each item, and then averaging over days. It was not possible to determine a score for each time point because only three out of five items were assessed three times a day. However, with this calculation, scores will be the same regardless of whether an impairment was present during the complete day or only parts of it. As an alternative, the median or mode score of all item values of the week could be used for index score calculation. In addition, one could determine seven-day fluctuation scores, using variability or instability parameters [33, 34, 35]. Which of these scores carry most information on patient-relevant aspects of health and/or are predictive of future health outcomes needs to be evaluated with larger samples. Score calculation is further complicated by missing data, which are very common in AA due to the high number of assessments and were also found in the majority of our participants. Imputing missing values using statistical techniques such as multiple imputation is recommended [36].
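The alternative summaries mentioned above can be sketched for a single item as follows. The specific choices here are assumptions for illustration: the sample SD stands in for a variability parameter and the mean squared successive difference (MSSD) for an instability parameter. The source does not specify which parameters from [33, 34, 35] would be used.

```python
from statistics import median, multimode, stdev


def summarise_item(levels):
    """Summarise one item's responses (in time order) over the AA period."""
    diffs = [b - a for a, b in zip(levels, levels[1:])]
    mssd = sum(d * d for d in diffs) / len(diffs) if diffs else 0.0
    return {
        "median": median(levels),
        "modes": multimode(levels),  # all most frequent levels
        "sd": stdev(levels),         # variability: spread regardless of order
        "mssd": mssd,                # instability: size of step-to-step changes
    }


# Two invented response series with the same values in a different order.
one_shift = summarise_item([1, 1, 1, 3, 3, 3])    # a single change mid-week
alternating = summarise_item([1, 3, 1, 3, 1, 3])  # change at every assessment
```

Both invented series share the same median, modes and SD, but the MSSD is 0.8 for the single shift and 4.0 for the alternating series, which is why an order-sensitive parameter can add information beyond variability alone.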

We would not recommend the EQ-5D-AA for use as a utility measure in health-economic evaluations, for two reasons. First, existing value sets for the EQ-5D-5L cannot be used for an AA version; instead, specific preference elicitation studies would be needed, which require large representative samples. Second, AA comes with higher respondent burden and logistical effort than the standard EQ-5D-5L assessment. As health-economic evaluation often draws on data from clinical trials, it is probably unrealistic to expect that the additional effort of an AA will be made in these studies.

Larger, subsequent studies also need to evaluate psychometric properties of the EQ-5D-AA as compared to the EQ-5D-5L and confirm its feasibility. They should use a stand-alone AA application that is compatible with both Android and iOS so that most participants can use their own mobile phone, probably reducing missing values. Finally, it should be evaluated whether and under which circumstances (e.g., one’s job and family situation) people would also be willing to complete the AA for a longer period of time than tested here, for example for monitoring purposes in clinical practice. This is of particular importance as our sample was small and probably also subject to selection bias in that only people who were willing to complete an AA took part.

If the EQ-5D-AA proves valid and reliable, it can be used in future research, but also by individual PwMS who self-track their health; some of our participants mentioned this as an interesting option. Such data might also support patient-clinician communication on symptom dynamics and management, for example for activity planning and symptomatic treatment applications. Whether such use in clinical care is feasible and useful would need to be addressed in additional research that also investigates feasibility and usefulness from the health providers’ perspective.

A strength of this study was its iterative approach to AA development with subsequent waves of real-life testing, debriefing and adaptation. This approach may also be suited for AA development in other health conditions. Furthermore, our multidisciplinary research team included experts on PROMs and electronic PRO assessment, members of the EuroQol group, and a clinician specialised in MS care, each contributing their unique perspective on the AA development.

While our study sample was heterogeneous with regard to gender, age, disease duration, and both cognitive and subjective health impairment, it should be considered a limitation that most participants were from Hamburg, Germany, and that people with lower education were underrepresented. It will therefore be important to include PwMS from this subgroup as well as people from other regions, including rural areas, in subsequent validation studies. Owing to the small sample, which is a further limitation, the exploratory quantitative analyses can only hint at possible associations and patterns in EQ-5D-AA data that may warrant investigation in follow-up studies.


Conclusion

This study suggests that a one-week AA of the EQ-5D-5L can capture within-day and day-to-day fluctuations in subjective health and is feasible in people with MS. Patients stated that the EQ-5D-AA can provide important information on their health beyond what is captured by the EQ-5D-5L standard version.

Data availability

The data that support the findings of this study and the SPSS code are available from the corresponding author upon reasonable request.


References

  1. Reich, D. S., Lucchinetti, C. F., & Calabresi, P. A. (2018). Multiple sclerosis. New England J Med, 378(2), 169–180.

  2. Faissner, S., & Gold, R. (2019). Progressive multiple sclerosis: latest therapeutic developments and future directions. Therapeutic Adv Neurol Disorders, 25(12), 1756286419878323.

  3. Galea, I., Ward-Abel, N., & Heesen, C. (2015). Relapse in multiple sclerosis. British Med J, 350, h1765.

  4. Kratz, A. L., Murphy, S. L., & Braley, T. J. (2017). Ecological momentary assessment of pain, fatigue, depressive, and cognitive symptoms reveals significant daily variability in multiple sclerosis. Archives Physical Medicine Rehabilitation, 98(11), 2142–2150.

  5. Heine, M., van den Akker, L. E., Blikman, L., Hoekstra, T., van Munster, E., Verschuren, O., et al. (2016). Real-time assessment of fatigue in patients with multiple sclerosis: how does it relate to commonly used self-report fatigue questionnaires? Archives of Physical Medicine Rehabilitation, 97(11), 1887–1894.

  6. Kasser, S. L., Goldstein, A., Wood, P. K., & Sibold, J. (2017). Symptom variability, affect and physical activity in ambulatory persons with multiple sclerosis: Understanding patterns and time-bound relationships. Disability Health J, 10(2), 207–213.

  7. Powell, D. J. H., Liossi, C., Schlotz, W., & Moss-Morris, R. (2017). Tracking daily fatigue fluctuations in multiple sclerosis: ecological momentary assessment provides unique insights. Journal of Behavioral Medicine, 40(5), 772–783.

  8. Kratz, A. L., Murphy, S. L., & Braley, T. J. (2017). Pain, fatigue, and cognitive symptoms are temporally associated within but not across days in multiple sclerosis. Archives Physical Med Rehabilitation, 98(11), 2151–2159.

  9. Stone, A. A., & Shiffman, S. (1994). Ecological momentary assessment (EMA) in behavioral medicine. Annals Behav Med, 16(3), 199–202.

  10. Bolger, N., & Laurenceau, J.-P. (2013). Intensive longitudinal methods: An introduction to diary and experience sampling research. New York, NY: Guilford Press.

  11. Schwarz, N. (Ed.). (2007). Retrospective and concurrent self-reports: The rationale for real-time data capture. New York, NY: Oxford University Press.

  12. Schwartz, C. E., & Sprangers, M. A. (1999). Methodological approaches for assessing response shift in longitudinal health-related quality-of-life research. Social Sci Med, 48(11), 1531–1548.

  13. Schneider, S., & Stone, A. A. (2016). Ambulatory and diary methods can facilitate the measurement of patient-reported outcomes. Quality Life Res, 25(3), 497–506.

  14. Mareva, S., Thomson, D., Marenco, P., Estal Muñoz, V., Ott, C. V., Schmidt, B., et al. (2016). Study protocol on ecological momentary assessment of health-related quality of life using a smartphone application. Frontiers Psychol, 7, 1086.

  15. Carlson, E. B., Field, N. P., Ruzek, J. I., Bryant, R. A., Dalenberg, C. J., Keane, T. M., et al. (2016). Advantages and psychometric validation of proximal intensive assessments of patient-reported outcomes collected in daily life. Quality Life Res, 25(3), 507–516.

  16. Herdman, M., Gudex, C., Lloyd, A., Janssen, M., Kind, P., Parkin, D., et al. (2011). Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Quality Life Res, 20(10), 1727–1736.

  17. Devlin, N. J., & Brooks, R. (2017). EQ-5D and the EuroQol group: past, present and future. Applied Health Economics Health Policy, 15(2), 127–137.

  18. Janssen, M. F., Pickard, A. S., Golicki, D., Gudex, C., Niewada, M., Scalone, L., et al. (2013). Measurement properties of the EQ-5D-5L compared to the EQ-5D-3L across eight patient groups: a multi-country study. Quality Life Res, 22(7), 1717–1727.

  19. Buchholz, I., Janssen, M. F., Kohlmann, T., & Feng, Y. S. (2018). A systematic review of studies comparing the measurement properties of the three-level and five-level versions of the EQ-5D. PharmacoEconomics, 36(6), 645–661.

  20. Konnopka, A., & König, H. H. (2017). The “no problems”-problem: an empirical analysis of ceiling effects on the EQ-5D 5L. Quality Life Res, 26(8), 2079–2084.

  21. Kuspinar, A., & Mayo, N. E. (2014). A review of the psychometric properties of generic utility measures in multiple sclerosis. PharmacoEconomics, 32(8), 759–773.

  22. Efthymiadou, O., Mossman, J., & Kanavos, P. (2019). Health related quality of life aspects not captured by EQ-5D-5L: Results from an international survey of patients. Health Policy, 123(2), 159–165.

  23. Jones, K. H., Ford, D. V., Jones, P. A., John, A., Middleton, R. M., Lockhart-Jones, H., et al. (2013). How people with multiple sclerosis rate their quality of life: an EQ-5D survey via the UK MS register. PLoS ONE, 8(6), e65640.

  24. Hemmett, L., Holmes, J., Barnes, M., & Russell, N. (2004). What drives quality of life in multiple sclerosis? QJM: An International Journal of Medicine, 97(10), 671–676.

  25. Kerr, C., Lloyd, E. J., Kosmas, C. E., Smith, H. T., Cooper, J. A., Johnston, K., et al. (2016). Health-related quality of life in Parkinson’s: impact of ‘off’ time and stated treatment preferences. Quality Life Res, 25(6), 1505–1515.

  26. Maes, I. H. L., Delespaul, P. A. E. G., Peters, M. L., White, M. P., van Horn, Y., Schruers, K., et al. (2015). Measuring health-related quality of life by experiences: the experience sampling method. Value Health, 18(1), 44–51.

  27. Kim, H., Sefcik, J. S., & Bradway, C. (2017). Characteristics of qualitative descriptive studies: a systematic review. Res Nursing Health, 40(1), 23–42.

  28. Sullivan, J., Edgeley, K., & Dehoux, E. (1990). A survey of multiple sclerosis. part I: perceived cognitive problems and compensatory strategy used. Canadian J Rehabilitation, 4, 99–105.

  29. Malterud, K., Siersma, V. D., & Guassora, A. D. (2016). Sample size in qualitative interview studies: guided by information power. Qualitative Health Res, 26(13), 1753–1760.

  30. Ludwig, K., Graf von der Schulenburg, J.-M., & Greiner, W. (2018). German value set for the EQ-5D-5L. PharmacoEconomics, 36(6), 663–674.

  31. Shiffman, S. (2007). Designing protocols for ecological momentary assessment. In A. Stone, S. Shiffman, A. Atienza, & L. Nebeling (Eds.), The science of real-time data capture: Self-reports in health research. New York, NY: Oxford University Press.

  32. Fuller-Tyszkiewicz, M., Skouteris, H., Richardson, B., Blore, J., Holmes, M., & Mills, J. (2013). Does the burden of the experience sampling method undermine data quality in state body image research? Body Image, 10(4), 607–613.

  33. Houben, M., van den Noortgate, W., & Kuppens, P. (2015). The relation between short-term emotion dynamics and psychological well-being: A meta-analysis. Psychological Bulletin, 141(4), 901–930.

  34. Dejonckheere, E., Mestdagh, M., Houben, M., Rutten, I., Sels, L., Kuppens, P., et al. (2019). Complex affect dynamics add limited information to the prediction of psychological well-being. Nat Human Behav, 3(5), 478–491.

  35. Topp, J., Andrees, V., Heesen, C., Augustin, M., & Blome, C. (2019). Recall of health-related quality of life: how does memory affect the SF-6D in patients with psoriasis or multiple sclerosis? A prospective observational study in Germany. British Med J Open, 9(11), e032859.

  36. Black, A., Harel, O., & Matthews, G. (2012). Techniques for Analyzing Intensive Longitudinal Data with Missing Values. In M. R. Mehl & T. S. Conner (Eds.), Handbook of research methods for studying daily life. New York: The Guilford Press.



Acknowledgements

We thank all participants of this study. We gratefully acknowledge the student assistants Antonia Newi and Lena Hirsch for their help in conducting this study. The authors thank the Scientific Communication Team of the IVDP, in particular Merle Twesten and Mario Gehoff, for copy editing. This work has been presented at the 26th Annual Conference of the International Society for Quality of Life Research 2019, San Diego, CA, USA, and at the 18. Deutscher Kongress für Versorgungsforschung (German Congress for Health Care Research) 2019, Berlin, Germany.


Funding

Open Access funding enabled and organized by Projekt DEAL. This research was funded by the EuroQol Research Foundation, EQ Project 201660. The views expressed by the authors in this publication do not necessarily reflect the views of the EuroQol Group.

Author information




Contributions

All authors: conceptualisation and methodology, data analysis, writing (review and editing). Christine Blome: data collection, writing (original draft preparation). Christine Blome, Jill Carlton, John Brazier: funding acquisition.

Corresponding author

Correspondence to Christine Blome.

Ethics declarations

Conflict of interest

The University of Sheffield was paid by the EuroQol Research Foundation for the time of JC and JB. The University Medical Center Hamburg-Eppendorf was paid by the EuroQol Research Foundation for the time of CB and CH. MFJ has received research funding from the EuroQol Research Foundation for this project. AL has received research funding from the EuroQol Research Foundation for this project. MO has no conflict of interest to declare.

Ethical approval

The study was performed in accordance with the June 1964 Declaration of Helsinki. The Ethics Committee of the Hamburg Chamber of Physicians provided ethical approval (reference PV5721). All participants gave written informed consent.

Informed consent

Not applicable.

Copyright notes

EQ-5D™ is a trade mark of the EuroQol Research Foundation. EQ-5D was modified by the authors with permission by the EuroQol Research Foundation. Reproduction of this version is not allowed. For reproduction, use or modification of the EQ-5D (any version), please register your study by using the online EQ registration page:

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 196 KB)

Supplementary file2 (DOCX 22 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Blome, C., Carlton, J., Heesen, C. et al. How to measure fluctuating impairments in people with MS: development of an ambulatory assessment version of the EQ-5D-5L in an exploratory study. Qual Life Res 30, 2081–2096 (2021).


Keywords

  • EQ-5D
  • Health-related quality of life
  • Ambulatory assessment
  • Ecological momentary assessment
  • Instrument development
  • Multiple sclerosis