Abstract
Purpose
There is a need for a comprehensive measure of post-stroke fatigue with sound measurement properties. This study aimed to develop the Norwegian Fatigue Characteristics and Interference Measure (FCIM) and assess its content validity, structural validity, and internal consistency.
Method
This study consisted of three steps: (1) an expert panel developed version 1.0 of the Norwegian FCIM, (2) its content validity was assessed in cognitive interviews with stroke patients (N = 15), (3) a convenience sample of stroke patients (N = 169) completed an online questionnaire with the FCIM, Fatigue Severity Scale, and sociodemographic information; validity and reliability were assessed using Rasch analysis.
Results
FCIM version 1.0 included a 10-item characteristics subscale, a 20-item interference subscale, and two pre-stroke fatigue items. The cognitive interviews revealed content validity issues, resulting in two interference items being removed and five items being flagged but retained for Rasch analysis (version 2.0). Rasch analysis led to removal of four items from the characteristics subscale and six more from the interference subscale. The final six-item characteristics subscale and 12-item interference subscale (version 3.0) both showed adequate fit to the Rasch model with indications of unidimensionality and local independence. The interference subscale had a high person separation index. No significant differential item function (DIF) was found in relation to gender, but one item demonstrated DIF in relation to age.
Conclusion
The cognitive interviews and Rasch analysis demonstrated that the Norwegian version of the FCIM has high content validity, structural validity, and internal consistency. Future research should assess its construct validity, reliability, and responsiveness.
Introduction
Fatigue is a common and debilitating symptom for stroke survivors, potentially affecting every part of their daily activities and quality of life [1]. Stroke clinicians are currently advised to systematically assess fatigue in all patients [2, 3]. However, there is insufficient evidence to support the use of any routine treatment or prevention strategies [4, 5], and the pathophysiology of post-stroke fatigue (PSF) is largely unknown [1]. A critical barrier in PSF research is the lack of a patient-reported outcome measure (PROM) for fatigue with sound psychometric properties [6, 7].
There is growing recognition that content validity is the most important measurement property of a PROM [8]. Content validity can be defined as “the degree to which the content of a measurement instrument is an adequate reflection of the construct to be measured” [9]. It is advised that content validity should be demonstrated before evaluating other psychometric properties [10, 11]. However, establishing content validity in PROMs that assess unobservable constructs such as fatigue is challenging and requires several steps involving qualitative methods [9]. In a prior study [12], we explored the experience of fatigue in a qualitative study with stroke survivors and health professionals. These findings resulted in the development of a conceptual framework that outlined PSF as a multidimensional phenomenon. Two important dimensions were fatigue characteristics (e.g., intensity, timing) and fatigue interference (i.e., emotional, cognitive, activity, and social impacts of fatigue). A clear definition of the construct to be measured is a prerequisite for item development [8]. Despite this, recent fatigue measures lack a clear definition, measure various aspects, and lack high-quality evidence of content validity [6].
In stroke research, the Fatigue Severity Scale (FSS) is the most-used PROM for fatigue. However, the FSS lacks evidence of content validity and evidence of its psychometric properties is limited, particularly among people with stroke [6, 13, 14]. For example, the FSS does not assess important features such as mental versus physical fatigue or diurnal variations [6, 15]. Although other, more complex PROMs for fatigue exist, a recent review found no multidimensional fatigue questionnaires had been adequately validated in people with stroke [16]. With the growing recognition of PSF as a unique clinical entity and the increasing use of PROMs as endpoints in clinical studies, there is a clear need to develop a PSF-specific PROM with robust measurement properties that capture multidimensional features of PSF.
After establishing content validity, guidelines also recommend extensive field testing to obtain insight into the structural validity of the data [9, 11, 17]. Structural validity can be defined as “the degree to which the scores of a measurement instrument are an adequate reflection of the dimensionality of the construct to be measured” [18]. Rasch analysis is a restrictive model that builds a linear interval measure invariant across test-takers and is suitable for item reduction and to examine structural validity and item characteristics in detail [9, 19]. Further, Rasch analysis can obtain stable parameter estimates with smaller sample sizes than 2-parameter item response theory models [20]. The fatigue instruments currently used in stroke populations have limited evidence of their content validity. Studies have often moved straight to assessing other types of validity, reliability, and responsiveness [13, 16, 21]. This approach could introduce several potential biases in these instruments’ scores [8]. To move forward in PSF research, there is need for a fatigue PROM designed for and validated in the stroke population and that has been developed following advanced PROM guidelines [16]. The aim of this study was, therefore, to develop such a measure and evaluate its content validity, structural validity, and internal consistency.
Method
Design
This study had a mixed-methods design involving three iterative steps (Fig. 1), as described by de Vet et al. [9]. An expert panel developed the instrument’s initial items, cognitive interviews were conducted to evaluate content validity, and an online questionnaire was used to collect data for a Rasch analysis guiding item reduction and evaluating structural validity and internal consistency. Reporting of the item development and cognitive interviews follows the consolidated criteria for reporting qualitative research [22], and reporting of the Rasch analysis follows the Rasch reporting guideline for rehabilitation research [23].
Item development
We first established an expert panel, using convenience sampling to include a balanced group with diverse backgrounds. The expert panel consisted of three stroke researchers and clinicians from Lovisenberg Diaconal Hospital (LDS), two PROM researchers (one from LDS and one external expert on Rasch analysis), and two stroke patients with fatigue serving as user consultants. The first author (IJS), who is a nurse and current PhD student, led six meetings (four live at LDS and two on Zoom) each lasting 1–4 h (with breaks). Between meetings, all group members read and commented on written versions of the instrument. The expert panel developed the items and response categories for the fatigue instrument based on our conceptual framework of fatigue derived from a previously published qualitative study. In that study, we defined PSF as “an experience of mental, physical or general feeling of exhaustion and tiredness, with a discrepancy between the level of activity and the level of fatigue” [12]. The expert panel developed the initial instrument informed by PROM development guidelines [8, 24,25,26,27]. The aim was to develop an instrument that could measure characteristics and interference (in two subscales), have the potential to measure improvement (for use in intervention studies), assess the level and duration of fatigue experienced prior to stroke (i.e., pre-stroke fatigue) and be relatively short (feasible). The expert panel developed version 1.0 of the instrument, named the Fatigue Characteristics and Interference Measure (FCIM), through regular meetings from January through June 2020. We also consulted a speech therapist for advice on adapting the instrument for people with aphasia and/or reading difficulties. Finally, to ensure item clarity and test interview technique, we performed two pilot interviews with the user consultants.
Cognitive interviews—developing evidence of content validity
Next, we conducted cognitive interviews with stroke patients. The aim was to establish the content validity of FCIM version 1.0 by evaluating comprehensibility, retrieval, judgment and communication, comprehensiveness, and relevance [26, 28]. A stroke user organization in Norway invited members to participate via text messages, e-mail, and its Facebook page. Inclusion criteria were a stroke within the last 2 years, age over 18 years, and living within driving distance of Oslo. Fifteen stroke survivors were purposively sampled to ensure variability in age, gender, fatigue severity, and time since stroke diagnosis. Face-to-face individual interviews were conducted during August and September 2020. Eight interviews were conducted in participants’ homes, and seven at LDS in Oslo. Participants completed FCIM version 1.0 as part of the interview, along with a questionnaire including a 7-item version of the FSS (FSS7) [14] and information about their sociodemographic characteristics and relevant medical history.
Cognitive interviews were conducted with the Three-Step Test-Interview (TSTI) technique [29] and followed a semi-structured interview guide (Online Resource 1). TSTI was developed as an aid to identify problems in newly developed instruments. It consists of the following three steps:
(1) Observing response behavior and concurrent thinking aloud. The interviewer takes notes and observes behavior such as hesitation and correction of response category. The participants are also instructed to think aloud and verbalize their thoughts when filling out the instrument. The aim is to make the participants’ immediate thoughts about the instrument observable for the interviewer.
(2) Follow-up probing considering the behavior or expressed thoughts collected in step 1, where the aim is to clarify and complete only the primary data previously collected.
(3) Debriefing aimed at eliciting experiences and opinions, such as potential problems, possible improvements, and the instrument’s completeness.
Audio recordings and field notes were taken during the interviews. Recordings were transcribed verbatim and subject to a deductive content analysis facilitated by NVivo 12 [30]. Each item was analyzed separately, and we used a categorization matrix based on Tourangeau’s four-stage cognitive model [28], which includes comprehension, retrieval, judgment, and response. Finally, we assessed the completeness of the instrument as a whole. After 10 interviews, some changes were made to the instrument, and then again after completing all 15 interviews, resulting in FCIM version 2.0.
Rasch analysis—developing evidence of structural validity and internal consistency
We conducted a cross-sectional study with a convenience sample of stroke patients (N = 169). Participants were recruited through the website of a stroke user organization. Inclusion criteria were adults with a self-reported stroke diagnosis who could read Norwegian. Data were collected between January and March 2021. Participants responded to an online questionnaire including sociodemographic information, relevant medical history, FSS7, and FCIM version 2.0.
FCIM version 2.0 was analyzed using a Rasch model, which calculates the probability of a specified response for both persons and items along the same linear scale (representing the latent trait). This enables transformation of ordinal raw scores into an interval-level variable (expressed in logits). Winsteps (version 5.2.0.0), R (version 4.1.2), and SPSS (version 28.0.0.0) were used to conduct statistical analyses and generate graphs. Since the FCIM is a new instrument and includes items with different response categories, we applied a Partial Credit Model (PCM), which makes no assumptions about the equidistance of thresholds across items [31]. We then assessed rating scale functioning according to Linacre’s guidelines to determine whether the scale was suitable for Rasch analysis [32] (Table 1). The Rasch analysis addressed two main aims: item reduction and evaluation of the FCIM’s structural validity and internal consistency. This process involved repeated analyses (iterations) of each subscale. After each iteration, the measurement properties were evaluated and items not meeting pre-established criteria were removed. Iterations were repeated until the results were satisfactory, as outlined in Table 1.
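As a rough illustration of how the PCM assigns category probabilities (a minimal sketch, not the Winsteps implementation used in this study; the thresholds below are hypothetical):

```python
import math

def pcm_probabilities(theta, thresholds):
    """Partial Credit Model: probability of each response category for a
    person at ability `theta` (logits) on an item whose step difficulties
    (Andrich thresholds) are `thresholds`. Category 0 has an empty sum."""
    exps = [1.0]  # exp(0) for the bottom category
    cum = 0.0
    for delta in thresholds:
        cum += theta - delta
        exps.append(math.exp(cum))
    total = sum(exps)
    return [e / total for e in exps]

# A person at 0.5 logits answering a 5-category item (hypothetical thresholds)
probs = pcm_probabilities(0.5, [-2.0, -0.7, 0.8, 2.1])
```

Because each item carries its own threshold set, the model imposes no equidistance constraint across items, which is why the PCM suits subscales whose items use different response categories.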
Rasch analysis uses fit statistics to estimate how well the items’ and persons’ raw data fit the model assumptions [33]. Fit statistics are presented as infit and outfit unstandardized mean squares (MNSQ) and standardized fit statistics (z-values). The MNSQ residuals show the degree of randomness, and an MNSQ value of 1.0 indicates perfect fit. Values less than 1.0 indicate overfit to the model (i.e., the observations are too predictable), while values greater than 1.0 indicate underfit (i.e., there is more randomness in the data than expected under the Rasch model). Consistent with earlier empirical studies [14], we evaluated standardized infit statistics, as they are more sensitive to unexpected response patterns from persons targeted to the item. This is in contrast to outfit, which is more sensitive to unexpected observations on very easy or hard items, with high outfit MNSQ often resulting from a few random responses by low performers [33, 34].
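The infit and outfit mean squares can be sketched from person-level residuals as follows (a simplified illustration with made-up expected scores and variances, not the study's Winsteps output):

```python
def fit_mnsq(observed, expected, variances):
    """Mean-square fit statistics for one item across persons, from
    observed responses, model-expected scores, and model variances."""
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals;
    # inflated by lucky/careless responses on very easy or hard items.
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(observed)
    # Infit: information-weighted (residuals summed against total model
    # variance); sensitive to unexpected responses from well-targeted persons.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit

# Four persons answering one dichotomous item (toy expected scores/variances)
infit, outfit = fit_mnsq([1, 0, 1, 1], [0.8, 0.3, 0.6, 0.9],
                         [0.16, 0.21, 0.24, 0.09])
```

Here both values fall below 1.0, the overfit pattern (responses more predictable than the model expects) described above.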
Structural validity
Each subscale’s unidimensionality was assessed separately using Principal Components Analysis (PCA) of the residuals [35]. Although the conceptual framework of fatigue is multidimensional, statistical unidimensionality of each subscale/dimension needs to be ensured. Local dependency between items was evaluated with Yen’s Q3 statistics, which compute raw score residual correlations [36]. Local independence means that the variance left after removing the contribution to the latent trait is only random, normally distributed, noise [35]. If two items are locally dependent, it indicates either that they add to some other dimension, or that they duplicate some feature of each other (called redundancy-dependency) [35].
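Yen's Q3 is essentially a matrix of Pearson correlations between items' score residuals; a minimal sketch (with toy data, not the FCIM responses) is:

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def q3_matrix(observed, expected):
    """Yen's Q3: pairwise correlations of item score residuals
    (observed minus model-expected); both arguments are
    persons-by-items tables."""
    n_items = len(observed[0])
    residuals = [[obs_row[i] - exp_row[i]
                  for obs_row, exp_row in zip(observed, expected)]
                 for i in range(n_items)]
    return [[pearson(residuals[i], residuals[j]) for j in range(n_items)]
            for i in range(n_items)]

# Items 0 and 1 are duplicates, so their residuals correlate perfectly
q3 = q3_matrix(
    [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]],                        # observed
    [[0.6, 0.6, 0.5], [0.4, 0.4, 0.5], [0.7, 0.7, 0.8], [0.3, 0.3, 0.2]],  # expected
)
```

Duplicated items produce residual correlations near +1, the redundancy-dependency pattern described above.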
Internal consistency
Targeting of persons was reported with the mean measure score (theta) for persons and mean standard error. Well-targeted measures have similar mean locations for persons and items. A positive mean value for persons indicates higher levels of fatigue compared to the average of the scale (set at 0) and a negative value indicates lower levels of fatigue [37]. We also reported Wright item maps that, on the same scale, display both the individual participants’ ability measures and the individual items’ difficulty calibrations (including the step calibrations [Andrich thresholds]) [35]. Precision was evaluated using the person and item separation index with associated reliability [35]. Consistency in item correlations was assessed with Kuder-Richardson Formula 20 (KR-20).
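For reference, KR-20 (which applies to dichotomous data) and the separation-reliability relationship can be sketched as follows (an illustrative implementation with toy values, not the software used in this study):

```python
def kr20(item_scores):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item data;
    `item_scores` is a persons-by-items table."""
    n = len(item_scores)
    k = len(item_scores[0])
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    item_var = sum(pi * (1 - pi) for pi in p)        # sum of item variances
    totals = [sum(row) for row in item_scores]
    mean_t = sum(totals) / n
    total_var = sum((t - mean_t) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - item_var / total_var)

def person_separation(true_sd, rmse):
    """Separation index G = 'true' person spread / average measurement
    error; the associated reliability is G^2 / (1 + G^2)."""
    g = true_sd / rmse
    return g, g * g / (1 + g * g)
```

For example, a true person SD twice the average standard error gives a separation of 2.0 and a separation reliability of 0.8.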
Score-to-measure conversion
Rasch analysis converts the raw scores to interval data (logits). To facilitate clinical interpretation of FCIM, we provide score-to-measure tables for both subscales.
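Applying such a table is a simple lookup from each possible raw sum score to its calibrated measure. In the sketch below, only the endpoints (-5.96 and 6.37 logits for the 6-item Characteristics subscale) come from this study; every intermediate entry is a hypothetical placeholder:

```python
# Score-to-measure lookup for a subscale (raw sum score -> Rasch logits).
# Only the endpoint values are from this study; the rest are placeholders.
SCORE_TO_MEASURE = {6: -5.96, 7: -4.70, 8: -3.95, 29: 5.10, 30: 6.37}

def raw_to_logits(raw_score):
    """Convert a raw subscale sum score to its interval-level Rasch measure."""
    try:
        return SCORE_TO_MEASURE[raw_score]
    except KeyError:
        raise ValueError(f"no calibration for raw score {raw_score}")
```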
The groups of persons with misfitting MNSQ and z-values were compared to the rest of the sample concerning age, gender, and fatigue (FSS severe fatigue vs. no/mild/moderate fatigue (combined)), using Student’s t test or Fisher’s exact test, as appropriate.
Uniform differential item functioning (DIF)
DIF analysis was conducted to investigate whether previously known subgroups (gender and age [1, 38]) had significantly different responses to items despite equal levels of the underlying trait [35].
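A uniform DIF check compares subgroup responses to an item among persons at the same level of the underlying trait. A rough Mantel-Haenszel-style screen (matching on raw total score; a sketch, not the method implemented in Winsteps) is:

```python
from statistics import mean

def uniform_dif_signal(item_scores, totals, groups):
    """Rough uniform-DIF screen for one item: match persons on raw total
    score, compare focal (1) vs reference (0) group means within each
    stratum, and average the differences across strata."""
    diffs = []
    for s in sorted(set(totals)):
        focal = [x for x, t, g in zip(item_scores, totals, groups)
                 if t == s and g == 1]
        ref = [x for x, t, g in zip(item_scores, totals, groups)
               if t == s and g == 0]
        if focal and ref:  # stratum must contain both groups
            diffs.append(mean(focal) - mean(ref))
    return mean(diffs) if diffs else 0.0
```

A signal near zero suggests the item behaves the same in both groups at equal trait levels; a consistently positive (or negative) signal is the uniform DIF pattern the analysis above is designed to detect.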
Ethical considerations
The study was conducted following the Declaration of Helsinki and was approved by the Regional Medical and Health Ethics Committee of Southeastern Norway (REK) (reference 2017/1741). All participants provided written informed consent.
Results
Instrument development by expert panel
First, the expert panel generated 160 items related to our conceptual framework [12]. The panel then selected the best-worded and most relevant, comprehensive, and discriminating items [18]. The FCIM is based on a reflective model and was developed after thorough discussions and according to relevant guidelines [8, 26]. Based on the conceptual framework, the instrument was divided into two subscales: Characteristics and Interference. This initial version consisted of our definition of PSF and 32 items with a 7-day recall period. The Characteristics subscale had 10 items with a 5-point Likert scale ranging from “not at all” to “very much.” Three item pairs (fatigued items 1 and 2, mental fatigue items 3 and 4, and physical fatigue items 5 and 6) were designed to ask the same question with slightly different wording (Table 2). We included all three item pairs to investigate, in the subsequent steps, which wording stroke patients preferred. The Interference subscale had 20 items with a 5-point Likert scale ranging from “never” to “all the time.” In addition, we included two pre-stroke fatigue items to help distinguish post-stroke fatigue from pre-existing levels of fatigue. Based on the speech therapist’s input, we used bold font for essential words in each item. This process resulted in FCIM version 1.0.
Content validity testing by cognitive interviews
FCIM version 1.0 was then assessed in 15 cognitive interviews with stroke patients. Characteristics of the participants are presented in Table 3. The interviews lasted between 24 and 92 min (mean 53). Details of the cognitive interview results are displayed in an item-tracking matrix (Online Resource 2). Preliminary analysis after the first 10 interviews showed difficulties with comprehension of the PSF definition, and minor difficulties with comprehension and judgment in seven items in the Interference subscale. We edited the instrument and presented the updated version in the next five interviews, resulting in improved understanding of these items and the PSF definition. After 15 interviews, we changed the ordering of items in the Interference subscale and removed two items due to comprehension problems and lack of relevance. We also flagged five items (2, 4, 6, 27, and 28) because of similarity, comprehension, and judgment issues. Despite these potential issues, we decided to temporarily keep the flagged items and investigate their performance in the Rasch analysis. This resulted in FCIM version 2.0 with 10 Characteristics items, 18 Interference items, and two pre-stroke fatigue items (Table 2). Except for the five flagged items, we found no significant problems relating to comprehension, judgment, relevance, or completeness of the items.
Evaluating structural validity and internal consistency with Rasch analysis
FCIM version 2.0 was further evaluated in a sample of 169 patients with stroke who responded to an online questionnaire (Table 3). There were no missing data. First, we evaluated the functioning of both subscales against Linacre’s guidelines (Table 1) [32], and both subscales fulfilled the criteria of having unimodal distribution with peaks in the center, more than 10 observations in each category, outfit MNSQ < 2.0, and step calibrations (Andrich thresholds) that advanced monotonically between 1 and 5 logits (Online resource 3 and 4).
Characteristics subscale
The first iteration with all 10 items revealed misfit in item 10 (evening fatigue), so we removed this item (Table 4). The second iteration displayed overfit in item 2 (exhausted), which was also removed. In the third iteration, all eight items demonstrated acceptable fit to the Rasch model. We then assessed the dimensionality of the remaining eight items with a PCA. The variance explained by the latent trait was just above 55%; however, the eigenvalue of the 1st component was slightly elevated, justifying further investigation. Residual correlations between items 3 and 4 (mental fatigue), as well as items 5 and 6 (physical fatigue), were above the critical value. This was expected, since these item pairs were almost identical and had been flagged in the cognitive interviews. We removed items 4 and 6 and re-ran the analysis. Removing these locally dependent items improved the results, indicating unidimensionality (Table 4; Online Resource 5). The remaining six items also demonstrated evidence of local independence, with no positive residual correlations. The mean person measure was about 1 logit higher than the mean item measure. The Wright map (Fig. 2) shows that the subscale works across different levels of fatigue in this sample. In addition, the 6-item subscale demonstrated acceptable KR-20 and person separation (Table 4). Slightly exceeding our criterion, 10 persons (5.9%) demonstrated misfit to the Rasch model. However, no significant differences were found between the misfitting group and the rest of the sample in relation to age, gender, or fatigue. Uniform DIF was not detected in relation to gender, but significant DIF was found for item 8 (fatigued around noon) in relation to age, with the age group 61–83 being more likely to agree with this item than the age group 45–60 (p = 0.0064). The final 6-item Characteristics subscale demonstrated evidence of good structural validity and internal consistency.
Characteristics subscale raw scores range from six to 30 and correspond to Rasch person measures of -5.96 to 6.37 logits. A score-to-measure table is provided in Online Resource 6.
Interference subscale
Item goodness-of-fit statistics from the first iteration of the 18-item Interference subscale indicated misfit in items 15 (bath/shower) and 16 (dressed/undressed) (Table 4). After removal of these two items, the remaining 16 items demonstrated acceptable fit, with infit MNSQ values that met our specified criteria. PCA showed that 62% of the variation was explained by the latent trait; however, as the eigenvalue of the 1st component was greater than 2, and some of the residual item correlations were above 0.3 (indicating locally dependent items), further investigation was justified. Items 14 (gathering thoughts) and 18 (completing tasks) both demonstrated a higher-than-expected residual correlation with two other items (Table 4); thus, we removed items 14 and 18. We additionally removed items 27 (hobbies) and 28 (pleasant activities), since they had been flagged as redundant from a content validity perspective in the cognitive interviews. In the final subscale, the mean person measure was 0.55 logits higher than the mean item measure, and the Wright map (Fig. 3) shows that the subscale works well across all levels of fatigue in this sample. The final Interference subscale also had a high person separation index and reliability. Fourteen persons (8.28%) demonstrated misfit to the Rasch model on the Interference subscale, which exceeded our criterion (Table 4). However, no statistically significant differences were found between the misfitting persons and the rest of the sample in relation to age, gender, or fatigue. No significant uniform DIF was found in relation to gender or age groups. Interference subscale raw scores range from 12 to 60 and correspond to Rasch person measures of -7.54 to 8.09 logits. A score-to-measure table is provided in Online Resource 7. In sum, the final 12-item Interference subscale demonstrated good fit to the Rasch model and good structural validity and internal consistency (Table 4; Online Resource 8).
Pearson correlation indicated a strong positive relationship (r = 0.77, p < 0.001) between each individual’s Rasch measures on the 6-item Characteristics subscale and the 12-item Interference subscale.
Discussion
In this study, we developed and evaluated the Norwegian Fatigue Characteristics and Interference Measure (FCIM), a new 20-item PROM for PSF.
Several previous studies have concluded that fatigue is a multidimensional phenomenon [12, 39]. In this study, based on our conceptual framework and results from the cognitive interviews, the fatigue dimensions of Characteristics and Interference were separated into two subscales. In clinical settings, separating Characteristics and Interference into two subscales is an advantage, as the Characteristics subscale can be used in all post-stroke phases, whereas the Interference subscale can only be used after the initial acute phase, once the patient has experienced how fatigue interferes with their life. In addition, differentiating these subscales may support targeting of different types of interventions. For example, a specific intervention might have no effect on fatigue’s intensity (measured by items in the Characteristics dimension) but could alter fatigue’s interference with the person’s activities. However, the subscales were relatively highly correlated, and future studies with larger and more diverse samples are needed to statistically confirm this two-dimensional structure.
A high-quality PROM needs to be feasible in addition to having evidence of validity and reliability. To avoid respondents losing concentration or becoming fatigued, it is recommended that the instrument not be too extensive or time-consuming [9]. Thus, we aimed for a relatively small number of items in the final FCIM. Three items were removed due to high infit MNSQ values indicating that answers to these items were more unexpected than predicted by the Rasch model. This might indicate that these items capture an additional construct [35]. For example, a possible explanation of misfit in item 10 is that evening fatigue can be commonly experienced even by people with low levels of overall fatigue, as it might reflect normal circadian rhythms of lower energy levels later in the day [40]. Keeping this item could possibly bias the results. A clinical advantage of using a Rasch model to generate individual measures (in logits) is that these measures are based on the individual pattern of responses, not the sum score. Thus, even for respondents with some missing data (e.g., due to fatigue), a Rasch model can generate a comparable measure for use in clinical practice or research (although with a larger individual standard error).
In our previous study, we found that stroke survivors qualitatively described fatigue in a variety of ways [12], a finding also reflected in existing fatigue PROMs, which use a wide range of expressions to capture fatigue [6]. Even slight differences in item wording are known to affect responses [41]. To identify the best fatigue wording, we included three item pairs with slightly different wording in the Characteristics subscale. Based on the cognitive interviews, we flagged these items and retained them for the Rasch analysis. Not surprisingly, Rasch analysis indicated local dependence or overfit to the model for these items. It seems likely that the residual correlations between these items indicated duplication (redundancy-dependency) rather than multidimensionality [36]. In retrospect, we could have removed these items after the cognitive interviews; however, it is unlikely that keeping them changed our overall results.
Another limitation of our study was the loss of some separation ability due to the subscales’ item reduction, although the instrument’s structural validity and internal consistency increased, as did its feasibility. Person misfit was slightly higher than expected and should be monitored more closely in larger studies, as a larger sample (> 200) would offer more powerful data on person and context [24]. In addition, we used a convenience sampling method, and despite our sample’s diverse sociodemographic and clinical characteristics, a random-sampling method could yield more robust results.
An advantage of our study is that the FCIM was developed based on qualitative data. Our previous review of PROMs used in PSF research showed that existing instruments included items confounded by other post-stroke sequelae [6], such as “Do you feel weak?” Including such items could bias the results in a stroke population. Fatigue characteristics and interference are common in other diagnoses, and the FCIM has the potential to be used across different patient groups after assessment of its measurement properties.
Conclusion
In this study, we have developed the FCIM, a patient-reported outcome measure for post-stroke fatigue that includes two subscales measuring fatigue’s Characteristics and Interference, and two pre-stroke fatigue items. This study has shown that the FCIM comprehensively captures the essential experiences of fatigue, and thus, demonstrates evidence of content validity. Using Rasch analysis on the two separate subscales, we removed misfitting and locally dependent items, which resulted in both subscales demonstrating evidence of structural validity and internal consistency. The findings suggest that a scale with relatively few items (from a larger item pool) can be clinically sufficient to generate valid measures of different experiences of fatigue for a vulnerable target group. Further assessment of the FCIM is still necessary before it can be used in clinical practice as well as research. The most important next step is to investigate the FCIM’s ability to detect change over time (i.e., responsiveness) and its relationship to other instruments. Responsiveness is especially important for determining whether the FCIM can be used as an outcome measure for intervention studies. Such studies can also serve as a foundation for using the FCIM to support intervention planning, to minimize fatigue’s severity and interference in daily life for stroke patients. Future studies should also evaluate the FCIM’s measurement properties in other patient populations, as fatigue characteristics and interference are common outcomes of many diseases/disorders. While FCIM is currently only available in Norwegian, we aim to translate and cross-culturally validate the instrument in an English-speaking sample.
Data availability
The authors retain copyright for the Norwegian Fatigue Characteristics and Interference Measure (FCIM). For any inquiry regarding the availability, use, translation and scoring of FCIM, contact the first author Ingrid Johansen Skogestad; https://orcid.org/0000-0001-8252-258X.
References
Paciaroni, M., & Acciarresi, M. (2019). Poststroke fatigue. Stroke, 50(7), 1927–1933. https://doi.org/10.1161/STROKEAHA.119.023552
Lanctot, K. L., Lindsay, M. P., Smith, E. E., Sahlas, D. J., Foley, N., Gubitz, G., Austin, M., Ball, K., Bhogal, S., Blake, T., Herrmann, N., Hogan, D., Khan, A., Longman, S., King, A., Leonard, C., Shoniker, T., Taylor, T., Teed, M., et al. (2020). Canadian stroke best practice recommendations: Mood, cognition and fatigue following stroke, 6th edition update 2019. International Journal of Stroke, 15(6), 668–688.
Norwegian Directorate of Health. (2017). The Norwegian guideline for stroke treatment and rehabilitation [in Norwegian]. Retrieved from https://www.helsedirektoratet.no/retningslinjer/hjerneslag.
Wu, S., Kutlubaev, M. A., Chun, H. Y., Cowey, E., Pollock, A., Macleod, M. R., Dennis, M., Keane, E., Sharpe, M., & Mead, G. E. (2015). Interventions for post-stroke fatigue. Cochrane Database of Systematic Reviews, 7, CD007030. https://doi.org/10.1002/14651858.CD007030.pub3
Kennedy, C., & Kidd, L. (2018). Interventions for post-stroke fatigue: A Cochrane review summary. International Journal of Nursing Studies, 85, 136–137. https://doi.org/10.1016/j.ijnurstu.2017.11.006
Skogestad, I. J., Kirkevold, M., Indredavik, B., Gay, C. L., Lerdal, A., & Group N. (2019). Lack of content overlap and essential dimensions—A review of measures used for post-stroke fatigue. Journal of Psychosomatic Research, 124, 109759. https://doi.org/10.1016/j.jpsychores.2019.109759
Cumming, T. B., Yeo, A. B., Marquez, J., Churilov, L., Annoni, J.-M., Badaru, U., Ghotbi, N., Harbison, J., Kwakkel, G., Lerdal, A., Mills, R., Naess, H., Nyland, H., Schmid, A., Tang, W. K., Tseng, B., van de Port, I., Mead, G., & English, C. (2018). Investigating post-stroke fatigue: An individual participant data meta-analysis. Journal of Psychosomatic Research, 113, 107–112. https://doi.org/10.1016/j.jpsychores.2018.08.006
Terwee, C. B., Prinsen, C. A. C., Chiarotto, A., Westerman, M. J., Patrick, D. L., Alonso, J., Bouter, L. M., de Vet, H. C. W., & Mokkink, L. B. (2018). COSMIN methodology for evaluating the content validity of patient-reported outcome measures: A Delphi study. Quality of Life Research, 27(5), 1159–1170. https://doi.org/10.1007/s11136-018-1829-0
de Vet, H., Terwee, C., Mokkink, L., & Knol, D. (2011). Measurement in medicine: A practical guide (practical guides to biostatistics and epidemiology). Cambridge University Press.
U.S. Department of Health and Human Services, Food and Drug Administration. (2006). Guidance for industry: Patient-reported outcome measures: Use in medical product development to support labeling claims: Draft guidance. Health and Quality of Life Outcomes, 4, 79.
Prinsen, C. A. C., Mokkink, L. B., Bouter, L. M., Alonso, J., Patrick, D. L., de Vet, H. C. W., & Terwee, C. B. (2018). COSMIN guideline for systematic reviews of patient-reported outcome measures. Quality of Life Research, 27(5), 1147–1157. https://doi.org/10.1007/s11136-018-1798-3
Skogestad, I. J., Kirkevold, M., Larsson, P., Borge, C. R., Indredavik, B., Gay, C. L., & Lerdal, A. (2021). Post-stroke fatigue: An exploratory study with patients and health professionals to develop a patient-reported outcome measure. Journal of Patient Reported Outcomes, 5(1), 35. https://doi.org/10.1186/s41687-021-00307-z
Nadarajah, M., & Goh, H. T. (2015). Post-stroke fatigue: A review on prevalence, correlates, measurement, and management. Topics in Stroke Rehabilitation, 22(3), 208–220. https://doi.org/10.1179/1074935714Z.0000000015
Lerdal, A., & Kottorp, A. (2011). Psychometric properties of the Fatigue Severity Scale-Rasch analyses of individual responses in a Norwegian stroke cohort. International Journal of Nursing Studies, 48(10), 1258–1265. https://doi.org/10.1016/j.ijnurstu.2011.02.019
Krupp, L. B., LaRocca, N. G., Muir-Nash, J., & Steinberg, A. D. (1989). The fatigue severity scale. Application to patients with multiple sclerosis and systemic lupus erythematosus. Archives of Neurology, 46(10), 1121–1123.
Elbers, R. G., Rietberg, M. B., van Wegen, E. E., Verhoef, J., Kramer, S. F., Terwee, C. B., & Kwakkel, G. (2012). Self-report fatigue questionnaires in multiple sclerosis, Parkinson’s disease and stroke: A systematic review of measurement properties. Quality of Life Research, 21(6), 925–944. https://doi.org/10.1007/s11136-011-0009-2
Oosterveld, P., Vorst, H. C. M., & Smits, N. (2019). Methods for questionnaire design: A taxonomy linking procedures to test goals. Quality of Life Research, 28(9), 2501–2512. https://doi.org/10.1007/s11136-019-02209-6
Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., Bouter, L. M., & de Vet, H. C. (2010). The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of Clinical Epidemiology, 63(7), 737–745. https://doi.org/10.1016/j.jclinepi.2010.02.006
Andrich, D., & Marais, I. (2019). A course in Rasch measurement theory: Measuring in the educational, social and health sciences. Springer.
Edelen, M. O., & Reeve, B. B. (2007). Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Quality of Life Research, 16(1), 5–18. https://doi.org/10.1007/s11136-007-9198-0
Mead, G., Lynch, J., Greig, C., Young, A., Lewis, S., & Sharpe, M. (2007). Evaluation of fatigue scales in stroke patients. Stroke, 38(7), 2090–2095.
Tong, A., Sainsbury, P., & Craig, J. (2007). Consolidated criteria for reporting qualitative research (COREQ): A 32-item checklist for interviews and focus groups. International Journal for Quality in Health Care, 19(6), 349–357. https://doi.org/10.1093/intqhc/mzm042
Mallinson, T., Kozlowski, A. J., Johnston, M. V., Weaver, J., Terhorst, L., Grampurohit, N., Juengst, S., Ehrlich-Jones, L., Heinemann, A. W., Melvin, J., Sood, P., & Van de Winckel, A. (2022). Rasch reporting guideline for rehabilitation research (RULER): The RULER statement. Archives of Physical Medicine and Rehabilitation, 103(7), 1477–1486. https://doi.org/10.1016/j.apmr.2022.03.013
Mokkink, L. B., Prinsen, C. A., Patrick, D. L., Alonso, J., Bouter, L. M., de Vet, H. C., & Terwee, C. B. (2019). COSMIN study design checklist for patient-reported outcome measurement instruments. The Netherlands.
Patrick, D. L., Burke, L. B., Gwaltney, C. J., Leidy, N. K., Martin, M. L., Molsen, E., & Ring, L. (2011). Content validity–establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: Part 1–eliciting concepts for a new PRO instrument. Value in Health, 14(8), 967–977. https://doi.org/10.1016/j.jval.2011.06.014
Patrick, D. L., Burke, L. B., Gwaltney, C. J., Leidy, N. K., Martin, M. L., Molsen, E., & Ring, L. (2011). Content validity–establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO Good Research Practices Task Force report: Part 2–assessing respondent understanding. Value in Health, 14(8), 978–988. https://doi.org/10.1016/j.jval.2011.06.013
Gagnier, J. J., Lai, J., Mokkink, L. B., & Terwee, C. B. (2021). COSMIN reporting guideline for studies on measurement properties of patient-reported outcome measures. Quality of Life Research, 30(8), 2197–2218. https://doi.org/10.1007/s11136-021-02822-4
Tourangeau, R. (1984). Cognitive science and survey methods. In T. B. Jabine, J. M. Tanur, & R. Tourangeau (Eds.), Cognitive aspects of survey design: Building a bridge between disciplines (pp. 73–100). National Academy Press.
Hak, T., Veer, K. V. D., & Jansen, H. (2008). The three-step test-interview (TSTI): An observation-based method for pretesting self-completion questionnaires. Survey Research Methods, 2(3), 143–150.
Elo, S., & Kyngas, H. (2008). The qualitative content analysis process. Journal of Advanced Nursing, 62(1), 107–115. https://doi.org/10.1111/j.1365-2648.2007.04569.x
Pallant, J. F., & Tennant, A. (2007). An introduction to the Rasch measurement model: An example using the Hospital Anxiety and Depression Scale (HADS). British Journal of Clinical Psychology, 46(1), 1–18. https://doi.org/10.1348/014466506X96931
Linacre, J. M. (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3(1), 85–106.
Bond, T. G., & Fox, C. M. (2013). Applying the Rasch model: Fundamental measurement in the human sciences. Psychology Press.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. MESA Press.
Linacre, J. M. (2006). A user's guide to Winsteps Ministeps Rasch-model computer programs. Program Manual 5222022.
Christensen, K. B., Makransky, G., & Horton, M. (2017). Critical values for Yen’s Q3: Identification of local dependence in the Rasch model using residual correlations. Applied Psychological Measurement, 41(3), 178–194. https://doi.org/10.1177/0146621616677520
Tennant, A., & Conaghan, P. G. (2007). The Rasch measurement model in rheumatology: What is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis and Rheumatology, 57(8), 1358–1362. https://doi.org/10.1002/art.23108
Lerdal, A. (2013). Curvilinear relationship between age and post-stroke fatigue among patients in the acute phase following first-ever stroke. International Journal of Physical Medicine & Rehabilitation. https://doi.org/10.4172/2329-9096.1000141
Lerdal, A., Bakken, L. N., Kouwenhoven, S. E., Pedersen, G., Kirkevold, M., Finset, A., & Kim, H. S. (2009). Poststroke fatigue–a review. Journal of Pain and Symptom Management, 38(6), 928–949. https://doi.org/10.1016/j.jpainsymman.2009.04.028
Valdez, P. (2019). Circadian rhythms in attention. The Yale Journal of Biology and Medicine, 92(1), 81–92.
Willis, G. B. (2004). Cognitive interviewing: A tool for improving questionnaire design. Sage Publications.
Smith, A. B., Rush, R., Fallowfield, L. J., Velikova, G., & Sharpe, M. (2008). Rasch fit statistics and sample size considerations for polytomous data. BMC Medical Research Methodology, 8(1), 1–11.
Mokkink, L. B., Prinsen, C. A., Patrick, D., Alonso, J., Bouter, L. M., De Vet, H. C., & Terwee, C. B. (2018). COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs): User manual.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological), 57(1), 289–300.
Lerdal, A., Bakken, L. N., Rasmussen, E. F., Beiermann, C., Ryen, S., Pynten, S., Drefvelin, A. S., Dahl, A. M., Rognstad, G., Finset, A., Lee, K. A., & Kim, H. S. (2011). Physical impairment, depressive symptoms and pre-stroke fatigue are related to fatigue in the acute phase after stroke. Disability and Rehabilitation, 33(4), 334–342. https://doi.org/10.3109/09638288.2010.490867
Acknowledgements
The authors thank the participating patients and user consultants for their valuable contributions to this study.
Funding
Open access funding provided by University of Oslo (incl Oslo University Hospital). Ingrid Johansen Skogestad is funded by a doctoral studentship from the Norwegian non-profit National Association for Public Health. The funding source had no involvement in the conduct or reporting of this study.
Author information
Contributions
Study conception and design were performed by IJS, AK, and AL. Data collection was performed by IJS, and analyses were performed by IJS, AK, PL, TMM, CRB, and AL. The first draft and revisions of the manuscript were written by IJS, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Skogestad, I.J., Kottorp, A., Larsson, P. et al. Development and evaluation of the Norwegian Fatigue Characteristics and Interference Measure (FCIM) for stroke survivors: cognitive interviews and Rasch analysis. Qual Life Res 32, 3389–3401 (2023). https://doi.org/10.1007/s11136-023-03477-z