Key Summary Points

Why carry out this study?

To assess the psychometric properties of the ILQI and confirm the scoring cut-offs, in order to evaluate the reliability and validity of scores and demonstrate that ILQI is appropriate for measuring the impact of ITP on HRQoL in clinical practice.

What did the study ask? What was the hypothesis of the study?

The ILQI had already demonstrated good content validity; the developers wanted to assess its psychometric validity to further support its appropriateness for use in clinical practice.

What were the study outcomes/conclusions?

Psychometric evaluation demonstrates that the ILQI is fit for measuring the impact of ITP on quality of life.

What has been learned from the study?

ILQI is a valid and reliable PRO measure of quality of life in patients with ITP.

Introduction

Immune thrombocytopenia (ITP) is an autoimmune disorder characterized by immunologic destruction of platelets, leading to an increased risk of bleeding. ITP is considered a rare disease, diagnosed primarily by excluding other possible causes of disease, and symptoms can present with varying severities. Patients often experience misdiagnosis and complex treatment patterns, which may significantly affect health-related quality of life (HRQoL) in this patient population.

Bleeding is identified as the primary symptom experienced by patients with ITP; however, this can present across a broad range of severity, from mild bruising and mucosal bleeding to severe haemorrhage [1]. A wide range of bleeding manifestations exist and the primary goal of treatment is to prevent severe/life-threatening bleeding [2]. Consequently, first-line therapy is indicated for patients with bleeding complications and those who are at increased risk of bleeding; however, the decision to initiate therapy depends not only on platelet count and the risk of bleeding but also on other outcomes, including quality of life and multiple lifestyle variables [3, 4]. Improving HRQoL and patient outcomes are key considerations when evaluating the impact of disease and the burden and benefits of treatments [5].

While patients and physicians largely align on overall ITP symptom burden, some differences can arise in their views pertaining to the limitations caused by ITP and/or its treatment [6, 7]. Employment, marital status, and education may influence HRQoL in patients with ITP and should be considered in clinical practice, as well as when planning and conducting studies [8]. The International Consensus Report has mentioned that severe fatigue is the most difficult ITP symptom to treat, experienced by 39–59% of adult patients, yet it is under-recognized by healthcare practitioners. Further, physicians primarily focus on platelet count whereas fatigue and symptoms related to mental health are often the more primary concerns of patients. The consensus report recommends individualization of treatment, and that improvement in patients’ HRQoL should be a primary treatment goal [9].

There is currently no appropriate disease-specific prospective tool in routine clinical practice to quantify the HRQoL of adult patients with ITP. The ITP Life Quality Index (ILQI) is a 10-item patient-reported outcome measure (Fig. 1) developed for use in clinical practice to aid discussions between patients and physicians about patients’ disease experience, and therefore facilitate a patient-centric approach towards treatment decision-making. The ILQI was originally developed by clinical experts in the field of ITP and content validity was confirmed by conducting qualitative interviews with 15 adult patients. The ILQI was also cognitively debriefed and items refined following qualitative analysis and additional clinical input [10, 11].

Fig. 1 ITP Life Quality Index (ILQI)

The ILQI was included in the ITP World Impact Survey (I-WISh) [12, 13], a global observational survey which collected real-world data characterising the experiences of patients with ITP [6]. This large study provided an opportunity to evaluate the psychometric properties of the ILQI in order to assess the reliability and validity of scores and demonstrate that ILQI is appropriate for measuring the impact of ITP on HRQoL in clinical practice.

Methods

I-WISh Data

The ILQI was administered alongside a variety of other clinical and demographic questions in the I-WISh [12], including a series of single-item questions assessing HRQoL and items adapted from the Work Productivity and Activity Impairment Questionnaire (WPAI) [14]. The global survey was completed by 1507 adult patients with ITP and 472 healthcare providers from 13 countries (in order of number of patients recruited: USA, China, UK, France, Germany, Italy, India, Canada, Turkey, Japan, Colombia, Spain and Egypt). Patients were recruited via physicians and patient support groups; all patients self-confirmed their diagnosis of ITP, confirmed they were over 18 years old and gave their consent to participate. Healthcare providers confirmed their primary speciality was either haematology or haematology-oncology, confirmed their caseload of patients with ITP, and confirmed that they had responsibility for making ITP treatment decisions. Patients and healthcare providers were provided with an online link to complete study questions.

The ILQI

The ILQI is a 10-item patient-reported outcome (PRO) measure with a recall period of ‘the last month’ (Fig. 1). The items generally have four response options ranging from ‘never’ to ‘all of the time’ and three items (items 1, 2 and 5) have additional response options which allow the patient to specify that the item does not apply to them or that they do not wish to answer.

The ILQI was originally developed by clinical experts in the field of ITP and was considered unidimensional, resulting in a single score. Originally a total sum score ranging between 7 and 40 was proposed, where coded responses were summed (see Fig. 1 for coding). As the total sum score did not account for missing items, an alternative scoring procedure was tested in this analysis, using the mean of all items transformed to reflect the initial maximum total score of 40. This allows for the calculation of a total score when there are missing items; henceforth, this score is referred to as ILQI total (alternative) score.

The calculation for the ILQI total (alternative) score is as follows:

ILQI total (alternative) score = (total sum score of all answered items / number of items with a score ≥ 1) × 10.
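As an illustration, the calculation above can be sketched in Python. This is a minimal sketch, not the scoring program used in the study. It assumes that answered items are coded 1 to 4 and that missing, not-applicable or declined items are recorded as `None` or 0; the function name and the `min_items` parameter are hypothetical.

```python
from typing import Optional, Sequence


def ilqi_total_alternative(
    item_scores: Sequence[Optional[int]], min_items: int = 1
) -> Optional[float]:
    """Mean of the scored items, rescaled to the original 40-point maximum.

    Assumption: answered items are coded 1-4; None or 0 marks an item that is
    missing, not applicable, or that the patient preferred not to answer.
    """
    scored = [s for s in item_scores if s is not None and s >= 1]
    if len(scored) < min_items:
        return None  # too few scored items to compute a total
    return sum(scored) / len(scored) * 10
```

Under these assumptions, a patient who scores 2 on all 10 items receives a total of 20, and a patient who scores 4 on nine items and skips one still reaches the maximum of 40.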

Prior to the psychometric validation work, a score of 20 or above was defined by the clinical experts as the cut-off for ‘impaired’ HRQoL and a score of 30 or above was the cut-off for ‘significantly impaired’ HRQoL.

Statistical Analyses

The normality of the data and individual ILQI items were assessed. Item response distribution and floor and ceiling effects were examined for each item to ensure the response scale appropriately captured the range of possible severity. Floor and ceiling effects were defined as > 25.00% of patients selecting the worst or best possible response respectively (assuming a uniform distribution across the four main responses). Underused response options (those selected by < 5.00% of patients) were also flagged for further consideration.
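The item-level checks described above can be sketched as follows. This is an illustrative sketch only; it assumes the four main responses are coded 1 (best, 'never') to 4 (worst, 'all of the time') and that not-applicable responses have already been excluded. The function name and return structure are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def item_flags(
    responses: Sequence[int],
    options: Sequence[int] = (1, 2, 3, 4),
    effect_cutoff: float = 0.25,
    underuse_cutoff: float = 0.05,
) -> Dict[str, object]:
    """Flag ceiling/floor effects and underused response options for one item.

    Assumption: options are ordered from best (first) to worst (last).
    """
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    n = len(responses)
    share = {opt: counts.get(opt, 0) / n for opt in options}
    best, worst = options[0], options[-1]
    return {
        "share": share,
        # ceiling: more than 25% chose the best possible response
        "ceiling": share[best] > effect_cutoff,
        # floor: more than 25% chose the worst possible response
        "floor": share[worst] > effect_cutoff,
        # underused: options chosen by fewer than 5% of patients
        "underused": [opt for opt in options if share[opt] < underuse_cutoff],
    }
```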

Differential item functioning (DIF) analysis was conducted to assess whether patients with ITP in countries other than the USA answered each item in a similar way to the USA cohort of patients, stratified by disease severity. Uniform and non-uniform DIF were assessed using ordinal logistic regression models with item response as the dependent variable, and ILQI total (alternative) score and country indicator as covariates [15]. Note that sample sizes of at least 200 patients per group are recommended for DIF analyses [16]; therefore, this analysis was exploratory in nature due to the limited likelihood of achieving such sample sizes.

Polychoric correlation coefficients between each pair of ILQI items were reviewed to explore dimensionality. Items with particularly high inter-item correlations (> 0.90) were considered for potential redundancy. Exploratory factor analysis (EFA) was conducted to explore the latent structure (i.e. underlying structure) of the ILQI without imposing any preconceived structure. EFA was performed for half of the analysis population, stratified on the basis of current health, age and gender and was estimated using mean- and variance-adjusted weighted least squares (WLSMV) and employing delta parameterisation. Geomin rotation was specified, which allowed for the extracted factors to be correlated. The remaining half of the population was used in a confirmatory factor analysis (CFA) and the factor structure suggested through the EFA was treated as the primary hypothesized model. Model fit was assessed by calculating the comparative fit index (CFI; > 0.95 for evidence of acceptable fit) [17], root mean square error of approximation (RMSEA; < 0.10 for acceptable fit) [18], and standardised root mean square residual (SRMR; < 0.10 for acceptable fit) [18]. The chi-square test of model fit was not assessed because of its tendency to reject the null hypothesis in large samples, even when the hypothesised model shows trivial misfit [19].

Internal consistency reliability was assessed to determine how well the ILQI items measured the same underlying construct (i.e. homogeneity of the items belonging to the same score). A Cronbach’s alpha coefficient ≥ 0.70 was considered reliable [20]; alpha was also recalculated with each item removed in turn, to identify items that were not contributing to reliability. The impact of missing items on reliability was assessed by applying the Spearman–Brown prophecy formula to the Cronbach’s alpha coefficient.
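For readers unfamiliar with these statistics, the two quantities can be sketched with their standard textbook formulas. This is a minimal sketch that assumes complete item responses per patient; the study's own software and settings are not stated here, and the function names are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from complete responses (one row per patient)."""
    k = len(rows[0])
    # sample variance of each item across patients
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # sample variance of the summed score across patients
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def spearman_brown(alpha: float, k_full: int, k_remaining: int) -> float:
    """Predicted reliability when only k_remaining of k_full items are used."""
    ratio = k_remaining / k_full
    return ratio * alpha / (1 + (ratio - 1) * alpha)
```

Applying `spearman_brown` with `k_remaining` set to 9, 8, 7 and so on shows how far reliability is expected to fall as more items are missing, which is how a minimum number of completed items can be chosen against a reliability threshold.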

Convergent validity correlations were conducted to evaluate the relationship of the ILQI total (alternative) score and specific ILQI items with other measures or items. Hypothesised strong correlations (≥ 0.50) and moderate correlations (≥ 0.30 but < 0.50) are presented in Table 1.

Table 1 Hypothesised convergent validity relationships

Known-groups methods evaluated the construct validity by segregating patients into hypothetically distinct groups, according to current health (poor, moderate, good), impact on emotional well-being (low, moderate, large impact), effect on daily activity reduction (low, moderate, high impact), fatigue (yes, no) and depression (yes, no). The amount of difference in mean ILQI total (alternative) scores characterises the degree to which the score is capable of distinguishing among groups hypothesised a priori to be clinically distinct. Between-group effect sizes were calculated and interpreted as ‘small’ (0.20), ‘medium’ (0.50) and ‘large’ (0.80) [21].
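A between-group effect size of this kind can be sketched as below. The sketch assumes a standardised mean difference with a pooled standard deviation (Cohen's d); the exact formula used in the analysis is not specified in the text, and the label for values below 0.20 is an illustrative addition.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standardised mean difference (group_b minus group_a), pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)


def effect_size_label(d: float) -> str:
    """Interpret |d| against the 0.20 / 0.50 / 0.80 benchmarks."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # assumption: label not defined in the text
```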

Existing severity thresholds for ILQI were derived from clinical expert judgement at the time of original development (a score of ≥ 20 for “impaired” HRQoL and a score of ≥ 30 for “significantly impaired” HRQoL). It was therefore necessary to assess these thresholds using statistically robust methods (i.e. receiver operating characteristic [ROC] analysis). ROC analysis was used to identify the threshold on ILQI total (alternative) score which optimally discriminated between ‘low’, ‘moderate’ and ‘high’ severity groups according to the current health, impact on emotional well-being, and effect on daily activity reduction anchors. The threshold which had the best trade-off between sensitivity and specificity according to the sum of squares method was selected [22]. If the confidence interval (CI) of the area under the curve (AUC) contains 0.50, the score may be no better than chance at predicting changes as defined by the anchor; ideally, the AUC should be > 0.70 [23, 24].
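The threshold search can be illustrated with a short sketch. It assumes that the sum of squares method selects the cut-point minimising (1 − sensitivity)² + (1 − specificity)², that a score at or above the cut-point counts as a positive classification, and that the anchor has been reduced to a binary label; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def best_cutoff(
    scores: Sequence[float], is_severe: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (cut-point, sensitivity, specificity) for the best trade-off.

    Assumption: 'best' means the smallest value of
    (1 - sensitivity)^2 + (1 - specificity)^2; ties keep the lowest cut-point.
    """
    positives = sum(1 for y in is_severe if y)
    negatives = len(is_severe) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both anchor groups must be present")
    best = None
    for cut in sorted(set(scores)):
        # true positives: severe patients scoring at or above the cut-point
        tp = sum(1 for s, y in zip(scores, is_severe) if y and s >= cut)
        # true negatives: non-severe patients scoring below the cut-point
        tn = sum(1 for s, y in zip(scores, is_severe) if not y and s < cut)
        sens, spec = tp / positives, tn / negatives
        distance = (1 - sens) ** 2 + (1 - spec) ** 2
        if best is None or distance < best[0]:
            best = (distance, cut, sens, spec)
    return best[1], best[2], best[3]
```

Each candidate cut-point is one point on the ROC curve, so minimising this quantity picks the point closest to perfect sensitivity and specificity.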

For each scoring algorithm derived as part of the analyses, the reduction in accuracy and level of bias were presented when patients were missing between one and five items using simulation of missing data (bootstrapping technique) [25].
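One way to picture this kind of simulation is sketched below. It is an assumption-laden illustration, not the procedure from [25]: patients with complete data are resampled with replacement, a fixed number of items is deleted at random, and the rescaled-mean score is compared with the complete-data score. All names and defaults are hypothetical.

```python
import random
from math import sqrt
from typing import Optional, Sequence, Tuple


def mean_score(items: Sequence[Optional[int]]) -> float:
    """Mean of the answered items rescaled to a 40-point maximum."""
    answered = [s for s in items if s is not None]
    return sum(answered) / len(answered) * 10


def simulate_missing(
    complete_rows: Sequence[Sequence[int]],
    n_missing: int,
    n_draws: int = 200,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (bias, RMSE) of the score when n_missing items are deleted."""
    if not 0 < n_missing < len(complete_rows[0]):
        raise ValueError("n_missing must leave at least one item")
    rng = random.Random(seed)
    errors = []
    for _ in range(n_draws):
        row = rng.choice(complete_rows)  # resample one patient with replacement
        dropped = set(rng.sample(range(len(row)), n_missing))
        partial = [None if i in dropped else s for i, s in enumerate(row)]
        errors.append(mean_score(partial) - mean_score(row))
    bias = sum(errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse
```

Running the function for one to five missing items gives a bias and an accuracy figure for each level of missingness, which can then be tabulated for each scoring algorithm.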

The study was conducted in accordance with the Helsinki Declaration of 1964, and its later amendments. Survey materials and the study protocol were reviewed and approved by central institutional review boards (IRB) in both Europe and North America. All participants provided informed consent prior to completing the survey. No participant identifiable information is included in the manuscript.

Results

Demographic and Clinical Characteristics (I-WISh)

The mean (± SD) age of patients was 46.90 ± 16.22 years and 64.70% (n = 975) were female (Table 2). Patients were recruited from 13 different countries. Over half of the sample were working, either full-time or part-time (60.40%). A quarter of patients (25.40%) had a current platelet count of > 100 × 10⁹/L and the sample included some patients with a platelet count of < 10 × 10⁹/L at time of survey (5.10%). Mean time since diagnosis was 8.9 ± 10.8 years. Most of the sample (85.20%) self-reported their current health state to be good or excellent (> 4 on a 1–7 Likert scale). Further results from the I-WISh survey have been published previously [6, 12, 26–28].

Table 2 Participant demographic and clinical characteristics

Descriptive Analyses

The mean ILQI total (alternative) score was 21 (SD 7.00) (Table 3) and did not deviate from statistical normality (Shapiro–Wilk W = 0.97). Half of the patients (49.90%) were currently experiencing fatigue and 17.90% were currently experiencing depression. Two thirds of patients (62.30%) reported hiding the signs of bleeding to some degree and almost all patients (93.30%) reported some degree of worry about platelet counts going up and down. Patients missed a mean of 6 h (SD 15.10) of work in the past 7 days. Using a 10-point NRS, patients reported the effect of ITP on their work productivity over the past 7 days as a mean of 4 (SD 2.60), with higher scores representing worse work productivity.

Table 3 PRO descriptive analysis

ILQI Item Properties

Patients used the full range of ILQI item response options. Six items displayed ceiling effects and for three of these items, the most severe end of the scale was underused (answered by fewer than 5% of respondents) (Fig. 2). The ceiling effects range from 27.80% to 40.10% of patients selecting the best possible response for an item; however, this does not necessarily indicate an issue with the performance of the items, as other factors such as disease severity of the sample need to be considered (i.e. 63.30% of this sample reported their current health state was between 5 and 7 on a 7-point Likert scale, where 7 = excellent). Item 8, assessing ability to support people close to you, had the largest ceiling effect (40.10%) and only 4.80% of patients selected the worst possible response option (i.e. ITP impacts your ability to support people close to you all of the time), suggesting the response scale may not appropriately capture the range of possible severity levels.

Fig. 2 Item response distributions

Inter-item correlations were examined (Table 4). All inter-item correlations were below 0.90, suggesting no item redundancy. The lowest correlations (< 0.40) [29] were found between two specific items (items 2 and 5) and most other items of the ILQI. Both item 2, ‘how often have you taken time off work or education because of your ITP’ (r = 0.25–0.39), and item 5, ‘how often has ITP impacted your sex life?’ (r = 0.27–0.32), had an N/A or ‘prefer not to say’ option, which may have contributed to the poor correlations.

Table 4 Inter-item correlations

The number of patients within each country was less than the recommended sample size of 200 for DIF analyses in all countries except the USA and China (Table 2). Therefore, DIF analyses were viewed as hypothesis-generating. The extent of uniform and non-uniform DIF was generally low when comparing the USA with Canada and other Western countries; however, items related to work or education, social life and sex life exhibited DIF between the USA and countries such as China, Turkey, Colombia and India.

Domain Structure and Scoring

The EFA factor loadings suggested that the ILQI has a unidimensional structure, supporting the creation of a total score including all 10 items. However, a two-factor solution (i.e. dividing the items into two groups of concepts which relate to each other) could be used by separating out items 1 and 2, assessing ‘work and study’. The factor loadings for the EFA sample (one- and two-factor solution) are presented in Table 5.

Table 5 Factor loadings for the EFA

In light of the EFA findings, a bi-factor model was fitted in the CFA sample to compare a single general factor (i.e. all 10 items loading onto the same factor) with two distinct specific factors (e.g. a separate factor for the work items). Model fit indices demonstrated acceptable model fit (RMSEA = 0.06, CFI = 0.99 and SRMR = 0.02). The bi-factor model showed that the items loaded more highly onto the general factor than their specific factors (explained common variance [ECV] = 0.85), suggesting that the ILQI is unidimensional and a total score can be used. The bi-factor solution is presented in Fig. 3.

Fig. 3 CFA bi-factor solution
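The explained common variance (ECV) reported for the bi-factor model can be illustrated with its usual definition: the share of common variance attributable to the general factor. The sketch below assumes standardised loadings and uses made-up values; it does not reproduce the loadings in Fig. 3.

```python
from typing import Sequence


def explained_common_variance(
    general: Sequence[float], specific: Sequence[float]
) -> float:
    """ECV = general-factor variance / (general + specific-factor variance).

    general: each item's loading on the general factor.
    specific: each item's loading on its specific factor (0.0 if it has none).
    """
    general_ss = sum(loading * loading for loading in general)
    specific_ss = sum(loading * loading for loading in specific)
    return general_ss / (general_ss + specific_ss)


# Illustrative loadings only (hypothetical, not taken from the study).
example_ecv = explained_common_variance([0.8, 0.8, 0.8, 0.8], [0.4, 0.4, 0.0, 0.0])
```

Values of ECV close to 1 indicate that most of the common variance is carried by the general factor, which is the argument used for treating the instrument as essentially unidimensional.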

Reliability and Validity of Scores

When items of the ILQI were tested, Cronbach’s alpha was > 0.90, suggesting the items within the unidimensional score were interrelated and were working together to measure the same underlying construct [30] (Table 6).

Table 6 Internal consistency reliability for the ILQI score

Using a threshold of 0.90, the Spearman–Brown prophecy formula suggested that no more than three missing items are recommended for total score creation. Therefore, only calculating the ILQI total (alternative) score where a minimum of 7/10 items are completed is recommended.

Convergent validity was assessed between the ILQI total (alternative) score and individual items and other clinical and demographic single questions, completed as part of I-WISh. As hypothesised, there were large correlations (≥ 0.50) [24] between ILQI items 1 and 2 (assessing the impact of ITP on work or study) and the demographic question assessing work productivity reduction (r = 0.58–0.68). Large correlations were also identified between the ILQI total (alternative) score and the emotional well-being question (r = 0.65). The daily activity reduction question correlated strongly with the ILQI total (alternative) score (r = 0.72) and also with most items on the ILQI (items 3–4 and 6–10, r > 0.50). The question on current health was not as strongly correlated as hypothesised but was still within an acceptable (moderate) range (r = 0.41). ILQI item 5, assessing impact on sex life, continued to perform poorly (all r values < 0.30), potentially because of the number of patients who preferred not to answer this item (n = 333, 22.10%) and because no convergent measure specifically assessing sexual functioning was available for correlation.

In known-groups analysis, ILQI total (alternative) scores were compared among distinct groups that were expected to have different ILQI scores (Table 7). All assessments showed that mean ILQI total (alternative) scores increased monotonically and differentiated between individuals who would be expected to have higher or lower HRQoL. A large between-group effect size (> 0.80) was observed between most groups and all differences between consecutive groups were statistically significant (p < 0.0001).

Table 7 Known-groups analyses for the ILQI total (alternative) score

Potential thresholds were evaluated by finding an optimal cut-point using ROC curves where sensitivity and specificity were both maximised. All ROC curves had an AUC significantly greater than 0.5. Optimal thresholds to discriminate between low and moderate severity on each anchor measure were current health (18, sensitivity = 0.596, specificity = 0.687); impact on emotional well-being (16, sensitivity = 0.721, specificity = 0.738); effect on daily activity reduction (19, sensitivity = 0.728, specificity = 0.829). Optimal thresholds to discriminate between moderate and high severity on each anchor measure were current health (25, sensitivity = 0.743, specificity = 0.547); impact on emotional well-being (22, sensitivity = 0.703, specificity = 0.685); effect on daily activity reduction (25, sensitivity = 0.727, specificity = 0.576).

Discussion

Improving HRQoL and patient outcomes are key considerations when evaluating the impact of disease, particularly in ITP, where patients may still be experiencing significantly impaired HRQoL despite an adequate platelet count [6, 7, 12, 13, 31]. As such, discussions between patients and physicians are essential to ensure patients’ HRQoL is considered and that ultimately a shared approach is taken towards treatment decision-making [31]. While the importance of a shared decision-making framework is heavily promoted in the management of paediatric patients with ITP [32–34], there are fewer tools and resources to support it in the care of adult patients with ITP. Notably, although there are clinical outcome assessment tools available to assess symptoms of ITP in adults [31, 35], there is currently no disease-specific tool designed for use in clinical practice, with evidence of validity and reliability, which takes account of patient HRQoL to aid discussions between adult patients and physicians. The original ILQI, developed only by clinical experts, was refined using methods in line with best practice guidelines for the development and validation of PRO instruments [36–38]. Specifically, appropriate, in-depth qualitative work and rigorous psychometric methods were used to ensure the content validity of the ILQI.

The results presented here confirm that the ILQI is a valid and reliable PRO measure of HRQoL in patients with ITP. For six items, item response distributions were skewed towards the lower end of the scale (indicating high HRQoL); however, this is not a concern for the performance of the items, as the full range of response options was used for every item and other factors, such as disease severity, need to be considered. Despite the large ceiling effects for item 8, the concept which it assesses (support) emerged as important to patients with ITP in the qualitative research, suggesting that this item is more relevant to patients with more severe disease. Good inter-item correlations were identified between item-pairs that were expected to be highly correlated and no items were considered redundant. Considering the relevance of these concepts in the qualitative research, these items were retained for the remainder of the analyses in order to fully assess their fit. Consideration of the qualitative research and item content suggested that all the items were conceptually distinct.

EFA and CFA results supported a unidimensional solution, two-factor solution and a bi-factor solution with two distinct specific factors. However, the CFA bi-factor model showed that the items loaded more highly onto the general factor than the specific factor, suggesting that the ILQI is essentially unidimensional and the ILQI total (alternative) score can be used. Examining Cronbach’s alpha with each item removed in turn found that the ILQI internal consistency was most reliable when all 10 items were included in the score. Results for convergent validity were strong between the ILQI and items assessing work productivity reduction, emotional well-being and daily activity reduction and there was a moderate correlation with current health. The item assessing impact on sex life continued to perform poorly, potentially because of the number of patients who did not answer this item and the lack of a convergent measure specifically assessing sexual functioning. However, given the qualitative relevance of this item to patients, and the fact that it still contributed to the internal consistency reliability, it was retained. Known-groups analyses confirmed that ILQI total (alternative) scores were capable of distinguishing between clinically distinct groups.

The ROC-based analyses suggested that the existing thresholds were not representative and instead a threshold of around 17 points should be used to detect ‘impaired HRQoL’, and a threshold of around 23–25 points to detect ‘significantly impaired HRQoL’.

Exploratory DIF analyses generated hypotheses regarding possible cultural differences in the way patients responded to specific items. Further linguistic and cultural validation is currently being conducted, which will explore the hypotheses arising from this analysis in greater detail through qualitative methods as recommended [15].

Study Limitations and Further Research

Several limitations should be considered when interpreting the results of this study. Firstly, patients were recruited via physicians and patient support groups as part of a larger survey/study. It must be acknowledged that patients enrolled via support groups may not be representative of the overall patient population seen in clinic. One concern is that patients who are engaged in support groups are typically those with more severe disease manifestation, and who are in need of additional support to manage their condition [39]. However, that was not the case in this sample as 85.2% of patients self-reported their current health state to be good or excellent. This relatively healthy sample should also be taken into consideration when interpreting these results.

Another potential limitation is that I-WISh did not include an existing validated measure which could be used as an established clinically significant distress variable. Correlations were instead assessed between the ILQI items/scores and clinical and demographic questions, and no appropriate single-item assessment of sexual functioning was available.

The practicalities of administering a 10-item PRO measure in clinical practice and the time taken to discuss the findings with patients could be a challenge. As a result of time restrictions in clinic appointments, it may be more appropriate to identify a smaller set of questions to assess the impact of ITP on HRQoL. To address this, item response theory (IRT) analysis is currently being conducted to assess the discriminatory properties of the items and further assess the structure of the ILQI, with the aim of developing a more accurate scoring algorithm.

It is recommended that further validation work is conducted to assess the reliability of the ILQI longitudinally, to evaluate a change in score that would represent a minimal clinically important difference.

Conclusions

These findings provide evidence that the ILQI has acceptable psychometric properties as a PRO instrument in patients with ITP. The cut-off scores derived from the work described in this paper help to optimally discriminate between severity groups and will aid patient-centred treatment decision-making between patients and physicians. The psychometric evaluation demonstrates that the ILQI is fit for measuring the impact of ITP on HRQoL in clinical practice. The adoption of the ILQI in routine care should improve consistency of patient-centred decision-making and may lead to better outcomes for those patients whose HRQoL has been negatively affected by their ITP. The intention is that the ILQI can be completed on paper or online, either before or during a consultation, to establish a patient’s current HRQoL status and stimulate further discussion of the underlying causes of poor HRQoL during the clinical interaction. The ongoing linguistic and cultural validation and IRT analysis aim to provide further evidence of validity and refine this measure.