Background

Sickle cell disease causes profound suffering and decrements in daily functioning [1, 2]. The Adult Sickle Cell Quality of Life Measurement System (ASCQ-Me℠, pronounced “Ask me”) was developed to address the growing demand for valid and reliable measures to systematically document these effects in adults [3]. ASCQ-Me℠ is one of four measurement systems housed within the Person-Centered Assessment Resource (PCAR), funded by the National Institutes of Health to support clinical research [4, 5]. Of interest to users of these measures is evidence-based information regarding how and when to apply each. Here we report such a study, describing the validity of the Patient-Reported Outcome Measurement Information System (PROMIS®), which was designed to be applicable across chronic diseases [6, 7], as well as one of the condition-specific measures included in the PCAR -- ASCQ-Me℠ [2]. The health assessments in both PROMIS and ASCQ-Me were built and scored using Item Response Theory (IRT), specifically the Graded Response Model (GRM), and both use a web-based, electronic data collection platform [3, 8]. Given that PROMIS was designed to be universally applicable [9], the value added by a system like ASCQ-Me may be in question.
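For reference, the GRM expresses the probability of each ordered response category as a function of the latent trait. Stated generically (the specific item parameters belong to the published ASCQ-Me and PROMIS calibrations), for item i with ordered categories k = 1, ..., m:

P(X_i \ge k \mid \theta) = \frac{1}{1 + \exp\left[-a_i(\theta - b_{ik})\right]}, \qquad P(X_i = k \mid \theta) = P(X_i \ge k \mid \theta) - P(X_i \ge k+1 \mid \theta),

where \theta is the latent trait, a_i is the item discrimination, and b_{ik} are the ordered category thresholds, with P(X_i \ge 1 \mid \theta) = 1 and P(X_i \ge m+1 \mid \theta) = 0.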

Sickle cell disease (SCD) is one of the most common genetic disorders in the USA, affecting up to 100,000 individuals [10]. Adults with SCD face debilitating health problems including multi-organ failure, chronic pain, and neurocognitive deficits [11, 12]. Adult care for patients with SCD lags behind pediatric care because SCD used to be a disease of childhood, with few individuals living long enough to become adults. Now, with widely adopted infant screening practices, major advances in therapy, and increased use of preventive medicine, the vast majority of individuals with SCD age out of pediatric care [13,14,15,16]. Therefore, beginning in 2002, the National Heart, Lung, and Blood Institute (NHLBI) conducted a series of workshops that focused on ways to improve treatment for adults living with SCD [17]. Stakeholders determined the need for a systematic, reliable, and valid method for documenting adult patient-reported outcomes (PROs) for SCD, which led to the creation of ASCQ-Me. A number of treatments are currently available to improve the functioning and wellbeing of adults with SCD [18,19,20,21,22]. To inform the choice of therapy, the effects of these alternative treatments need to be systematically documented using data that are comparable across studies. Research and development of ASCQ-Me was conducted during the same time period as that for PROMIS and, like PROMIS, used IRT to evaluate and calibrate questions for inclusion in the system [23, 24]. This enabled the development of a computer adaptive testing (CAT) system for ASCQ-Me [25].

There is a long-standing debate in the PRO literature regarding the relative advantages of generic assessments of functioning and wellbeing compared to condition-specific measures of the same [26]. Some research suggests that disease-specific indicators may lack relevance because many aspects of functioning (e.g. sleep, sexual, or cognitive functioning) and wellbeing (e.g. depression, pain, fatigue) are, in fact, not specific to a particular condition [27, 28]. Other research demonstrates that the amount of evidence available to interpret the meaning of a measure is increased when it is possible, as it is with generic measures, to accumulate data across conditions and treatments [29,30,31]. Finally, a generic approach to measuring functioning and wellbeing has practical value because a new measurement system does not have to be created for every chronic condition: rather, the resources that would have gone into creating an alternative measure can be put into designing studies that go beyond measure development.

Yet, the practical value of generic measures is only relevant if their validity for assessing outcomes of specific conditions is comparable to that of condition-specific measures. The evidence in this regard is inconsistent. Compared to generic measures, condition-specific measures have sometimes been shown to be more sensitive to differences in disease severity [32, 33], and sometimes less sensitive [34, 35]. For example, the Functional Assessment of Cancer Therapy-Colorectal PRO (a condition-specific measure) was found to be more responsive to change in condition for patients with colorectal cancer than the Short Form-12 Health Survey version 2 (a generic measure) [33]. In contrast, the Short Form 36 Bodily Pain scale (a generic measure) was found to be more responsive to worsening of symptoms in a group of patients with diagnoses of herniated disc, spinal stenosis, and spondylosis than two disease-specific measures (the Oswestry Disability Index and the MODEMS) [34]. A comparison between generic and condition-specific health-related quality of life measures in children with SCD showed that the condition-specific measure provided important information not provided by the generic measure [36].

There is also disagreement about the characteristics of measures that define them as either generic or condition-specific. The condition-attribution approach is to take generic questions and modify them so that the respondent answers them only with regard to the condition [27, 37, 38]. Each item (e.g. “How severe is your pain?”) would have an attribution to the condition (e.g. “How severe is your sickle cell pain?”). Following this approach, condition-specific items can be formed by simply modifying existing questions. The condition-attribution approach is efficient -- patient interview data would not have to be collected and analyzed in order to generate the condition-specific items. Yet, FDA guidelines on the development of PROs require that patient interviews be part of the development process [39], and this favors the content-validity approach to developing condition-specific measures. The content-validity approach is to base items on aspects of functioning and wellbeing that persons with the condition have spontaneously offered in semi-structured interviews or that are known features of the clinical presentation. That is, the content is condition-specific because it has been reported by persons with the condition [40,41,42,43,44]; this is the approach that we used to develop ASCQ-Me [2].

Previous research comparing the measurement properties of selected PROMIS item banks to condition-specific measures of the same or related domains has, in general, supported the use of PROMIS as an alternative to condition-specific measures. PROMIS measures were shown to provide precise measurement over a broader range of scores on the latent trait than legacy measures [45, 46]. For example, in 17,726 patients with osteoarthritis, the PROMIS Physical Functioning (PF) CAT scores had lower standard errors over a broader range of physical function latent trait scores than arthritis-specific PROs (the Western Ontario and McMaster Universities Arthritis Index, WOMAC, and the Health Assessment Questionnaire, HAQ) [47]. The PROMIS PF CAT also was shown to be more sensitive to change in condition following knee surgery than either a condition-specific PRO (the International Knee Documentation Committee, IKDC, scale) or an electronic walking performance measure [35]. One reason PROMIS may perform well in these contexts is that the added precision of adaptive assessment makes up for any precision lost due to PROMIS’ lack of condition-specific content. Indeed, a comparison of the PROMIS Depression CAT to a variety of fixed-length forms from the same item bank showed the CAT to be more precise and to have lower ceiling and floor effects [48]. Thus, a more valid comparison of PROMIS to condition-specific measures would keep the type of measure constant. That is, comparisons would be made between fixed-format PROMIS measures and fixed-format condition-specific measures, or between PROMIS CATs and condition-specific CATs.

Here we compare the measurement properties of PROMIS and ASCQ-Me using fixed formats for each. Moreover, our earlier research [3] did not test the reliability and validity of ASCQ-Me fixed forms, so we provide this evidence as well. The objective of this research is to produce information useful to those interested in using either PROMIS or ASCQ-Me to assess outcomes for adults with SCD. Thus, we conducted a descriptive study to accomplish four tasks: (1) to present evidence regarding the reliability and validity of the ASCQ-Me fixed forms and short forms (SFs); (2) to describe the precision of the ASCQ-Me fixed forms in discriminating among levels of SCD severity; (3) to describe the validity of PROMIS Version 1.0 SFs for assessing health outcomes in adults with SCD; and (4) to determine which scores measuring similar health concepts provided the most information about SCD severity.

Methods

Participants

PROMIS and ASCQ-Me field test data were collected at seven geographically diverse sites with the assistance of site coordinators trained in a standardized study protocol. The targeted enrollment across sites was set to obtain a sufficient sample size for the psychometric analyses (500 patients), assuming a ten-percent rate of no-shows, and we attempted to achieve diversity in age and gender. Eligible participants were adults 18 years of age or older at the time of data collection with a diagnosis of sickle cell disease. People who were younger than 18, did not have a diagnosis of SCD, had a diagnosis of sickle cell trait, or could not read English were excluded from the study. We sought to be inclusive of the variability among adult patients seen in ambulatory clinics in the U.S., including those in steady state and on therapy, so we had no other exclusions. It is important to note that the same group of patients completed both ASCQ-Me and PROMIS questions, so that any differences in the ASCQ-Me and PROMIS scores would not be attributable to differences in the people who provided the data for each.

Measures

We required a method of identifying groups of patients who differed in their SCD severity in order to evaluate the ability of ASCQ-Me and PROMIS measures to reflect differences in the suffering and functioning of people who differed in the extent of their disease. This was challenging because there is no consensus method for assessing SCD severity. SCD is characterized by the type of mutations to the pair of beta-hemoglobin (Hb) genes; variants include Hb-SS, Hb-SC, and Hb-Sβ [49, 50], and individuals with Hb-SS usually, but not always, have more symptoms than those with other genotypes [51,52,53]. However, genotype is not a reliable indicator of disease severity because variation in symptomatology within genotypes is so broad [54,55,56]. Frequency of hospitalizations has been used as a marker of disease severity [57,58,59,60,61]; yet, other data indicate that a large percentage of patients who suffer from extreme pain never go to the hospital [62,63,64].

Nevertheless, adult sickle cell providers seeing a patient for the first time ask that patient a set of questions to gauge the severity of his or her disease. A medical history characterized by prescription pain medication, blood transfusions, and a number of secondary diagnoses (i.e., retinopathy, avascular necrosis, leg ulcers, kidney disease, stroke, and pulmonary hypertension) in a person presenting with SCD could indicate severe disease [65,66,67,68,69]. In the absence of a consensus method for determining severity, we reasoned that a method which mimicked the clinical interview in content would identify patients who differed in the amount of damage caused by their sickle cell disease and could, thus, serve as a surrogate marker of disease severity. Following this logic, we included a checklist of seven conditions usually secondary to SCD and two treatments indicative of severity as part of the data collection. For convenience, we refer to this indicator as the SCD Medical History Checklist (SCD-MHC).

In previous research this measure demonstrated discriminant validity with regard to a checklist of conditions not associated with SCD, convergent validity with alternative indicators of SCD severity (number and severity of vaso-occlusive incidents, frequency of emergency department visits in the past year), and resistance to common method bias [3]. The SCD-MHC was scored as the sum of the endorsed items -- the method employed in previous research with such checklists [70,71,72] -- an approach supported by research showing negligible differences between unit and alternative weighting methods for the scoring of checklists [73, 74].
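As a minimal sketch of this unit-weighted scoring, the snippet below sums a binary checklist; the item names are illustrative placeholders (the text names six of the seven conditions and the two treatments), not the exact field-test wording.

```python
# Illustrative sketch of unit-weighted SCD-MHC scoring; item names are placeholders.
SCD_MHC_ITEMS = [
    "retinopathy", "avascular_necrosis", "leg_ulcers", "kidney_disease",
    "stroke", "pulmonary_hypertension",
    "additional_scd_condition",          # seventh condition, not named in the text
    "prescription_pain_medication", "blood_transfusion",  # two treatments
]

def score_scd_mhc(responses: dict) -> int:
    """Sum of endorsed checklist items (1 = endorsed, 0 = not endorsed)."""
    return sum(int(bool(responses.get(item, 0))) for item in SCD_MHC_ITEMS)

# A respondent endorsing three items receives a score of 3; scores were later
# grouped into low/medium/high severity tertiles (see Analytic methods below).
print(score_scd_mhc({"stroke": 1, "leg_ulcers": 1, "blood_transfusion": 1}))  # 3
```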

ASCQ-Me measures included five-item SFs for Emotional Impact, Sleep Impact, Social Impact, Stiffness Impact, and Pain Impact, and a five-item pain episode fixed form scored as Pain Episode Frequency (two items) and Pain Episode Severity (three items). We use the term “fixed form” to indicate that these are not adaptive measures: all respondents are presented with the same items in the same sequence. All ASCQ-Me short forms are subsets of items from the corresponding ASCQ-Me item banks. The Pain Episodes items are not short forms because they are not drawn from the ASCQ-Me item banks, but they are fixed forms because the items are presented in a fixed sequence. PROMIS measures included version 1.0 SFs for Pain Impact, Pain Behavior, Physical Functioning, Anxiety, Depression, Fatigue, Satisfaction with Discretionary Social Activities, Satisfaction with Social Roles, Sleep Disturbance, and Sleep-Related Impairment. PROMIS SFs ranged in length from six to ten questions, with most (eight of the ten SFs) containing either seven or eight questions.

Table 1 lists the PROMIS measures that correspond to each of the ASCQ-Me measures and shows that each ASCQ-Me measure (except Stiffness Impact) had two corresponding PROMIS measures. PROMIS Fatigue does not have a corresponding ASCQ-Me SF. Table 1 also describes the differences in scoring for the ASCQ-Me and PROMIS scales.

Table 1 ASCQ-Me fixed and short formsa, corresponding PROMIS short forms, direction of scoring for each

To be consistent with widely used health status measures [75,76,77], most ASCQ-Me scores are calculated in the direction of overall health, such that higher ASCQ-Me scores indicate better health. The one exception is the Pain Episodes measure, for which higher scores mean more frequent and severe pain episodes. PROMIS scores for health concepts that describe functioning (e.g. physical and social functioning) are scored in this direction as well (higher scores indicate better health). PROMIS scores for symptom burden (e.g. depression, sleep problems, pain) are calculated such that higher scores indicate poorer health, consistent with greater symptom burden. These differences in scoring direction do not affect analyses of the variance attributable to each measure. However, in comparing ASCQ-Me and PROMIS with regard to associations between scores and criterion variables, these differences must be kept in mind. For example, the correlation between ASCQ-Me measures of symptoms and PROMIS measures of symptoms will be negative because lower scores on the PROMIS measures indicate less of the symptom while higher scores on the ASCQ-Me measures indicate less of the symptom.

Both ASCQ-Me and PROMIS are scored so that 50 is the average for the population on which their questions were calibrated, and 10 points is equivalent to one standard deviation in that population [30]. For ASCQ-Me, scores were based on the 556 adults with SCD who participated in the field test [3]. The sociodemographic characteristics of this population were consistent with those of the adult clinical population with SCD [78, 79]. For PROMIS, scores were based on a sample from the general US population that included individuals with and without chronic disease [30, 80].
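As an illustration of this metric (a standard T-score transformation; the exact scoring tables are published with each system), a latent trait estimate \hat{\theta} standardized to mean 0 and standard deviation 1 in the calibration sample is converted as

T = 50 + 10\,\hat{\theta},

so a respondent estimated at one standard deviation below the calibration-sample mean (\hat{\theta} = -1) receives a score of 40.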

Data collection procedure

Patients signed a consent form after they arrived at one of the ASCQ-Me field test sites. They were then seated at a computer, and a site coordinator helped them log onto the ASCQ-Me website. Sites confirmed that participants had SCD. The site coordinator entered the SCD type and assisted the respondent in reviewing a tutorial that demonstrated how to navigate through the survey. Respondents completed the ASCQ-Me questions first, took a 30-min break if they wanted one, and then completed the PROMIS measures. Respondents received an honorarium for their participation. We limited our analytic sample to the 490 respondents who completed both the ASCQ-Me and PROMIS assessments.

Analytic methods -- reliability and validity of ASCQ-Me short and fixed forms

Reliability and validity evidence has been published for the ASCQ-Me item banks [3] and for the PROMIS measures [80, 81], but not for the ASCQ-Me SFs. The ASCQ-Me SF scoring algorithms incorporated IRT item calibrations, but we present internal consistency reliability estimates using coefficient alpha [82] rather than test information curves to facilitate interpretation of the results and to enable audiences to compare our estimates of reliability to those available for other measures that report alpha. Construct validity of the ASCQ-Me SFs and pain episode fixed forms was assessed by examining the correlations of short-form scores with item-bank scores for the same health concepts and with PROMIS SFs for similar concepts. As explained in Table 1, in many cases there is more than one PROMIS score corresponding to a particular ASCQ-Me score. When this is the case, the range of correlations between the ASCQ-Me score and the PROMIS scores is presented. Construct validity was also evaluated by determining the ability of the SF scores to discriminate among groups of participants formed on the basis of their SCD-MHC scores -- representing low, medium, and high levels of severity. SCD-MHC scores were the sum of the number of conditions checked, and the low, medium, and high cut-offs were based on tertiles of the distribution of scores: SCD-MHC scores less than 2, equal to 2, and greater than 2, respectively.
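A minimal sketch of two of these computations -- coefficient alpha for a short form and the tertile-based severity grouping -- is shown below, assuming item responses are held in a pandas DataFrame; the variable and column names are illustrative, not the study's actual dataset.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def severity_group(scd_mhc: pd.Series) -> pd.Series:
    """Low / medium / high SCD severity using the cut-offs reported above (<2, =2, >2)."""
    return pd.cut(scd_mhc, bins=[-np.inf, 1, 2, np.inf], labels=["low", "medium", "high"])

# Example with simulated 5-item short-form responses (1-5 scale) for 10 respondents:
rng = np.random.default_rng(0)
sf = pd.DataFrame(rng.integers(1, 6, size=(10, 5)),
                  columns=[f"item{i}" for i in range(1, 6)])
print(round(cronbach_alpha(sf), 2))
print(severity_group(pd.Series([0, 1, 2, 3, 5])).tolist())  # low, low, medium, high, high
```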

Analytic methods - precision of ASCQ-Me and PROMIS to discriminate among levels of SCD severity

We calculated the average score for respondents within each tertile of SCD severity for all of the ASCQ-Me and PROMIS measures and created histograms to examine the pattern of scores. We examined these patterns to determine: (1) whether there was a monotonic relationship between levels of SCD severity and mean scores on the ASCQ-Me and PROMIS measures; (2) which SFs indicated a decrease in health corresponding to an increase in SCD severity; and (3) whether the patterns of relationships between SCD severity and health for similar health concepts were similar for ASCQ-Me and PROMIS. We used univariate analysis of variance (ANOVA) to test for differences in means among levels of SCD severity, with a Bonferroni correction to the significance level to account for the family-wise error rate (i.e. 0.05/17 = 0.0029) [83]. The relative precision of each scale was described by dividing the F-statistic associated with that scale by the largest F-statistic in the group [84, 85].
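A sketch of the one-way ANOVA and the relative-precision calculation described above follows, assuming scores and severity groups are stored in a pandas DataFrame; the column and function names are illustrative assumptions, not the study's analysis code.

```python
import pandas as pd
from scipy import stats

ALPHA_BONFERRONI = 0.05 / 17  # family-wise correction for 17 scales, as in the text

def anova_f(df: pd.DataFrame, score_col: str, group_col: str = "severity"):
    """One-way ANOVA of a scale score across low/medium/high SCD severity groups."""
    groups = [g[score_col].dropna() for _, g in df.groupby(group_col)]
    f_stat, p_value = stats.f_oneway(*groups)
    return f_stat, p_value

def relative_precision(f_stats: dict) -> dict:
    """Each scale's F-statistic divided by the largest F-statistic in the set."""
    f_max = max(f_stats.values())
    return {scale: f / f_max for scale, f in f_stats.items()}

# Usage sketch:
# f_by_scale = {scale: anova_f(data, scale)[0] for scale in scale_columns}
# rp = relative_precision(f_by_scale)
# significant = {s: anova_f(data, s)[1] < ALPHA_BONFERRONI for s in scale_columns}
```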

Analytic methods -- comparative sensitivity of PROMIS and ASCQ-Me measures of similar health concepts

These analyses provided information about the amount of unique variance in SCD severity accounted for by ASCQ-Me compared to PROMIS. We fitted multiple linear regression models to evaluate the relationship of ASCQ-Me and PROMIS scores to SCD-MHC scores, controlling for the effects of sex, age, and genotype. For this set of analyses, SCD-MHC scores were treated as continuous. We used Type III sums of squares to compute the F-statistic for the unique variance in SCD-MHC associated with ASCQ-Me and PROMIS scores. Three models each were fitted to compare the effects of the PROMIS Pain Impact and Pain Behavior SFs to the three ASCQ-Me pain measures (Pain Impact, Pain Episode Frequency, and Pain Episode Severity). We applied the Bonferroni correction to the significance level of statistics from these models to account for the family-wise error rate (i.e. 0.05/3 = 0.0166). Two models each were fitted to compare the effect of ASCQ-Me scores to the two corresponding PROMIS measures for the emotional, social, and sleep domains. The significance level for the associated statistics was set at 0.025 (i.e. 0.05/2 = 0.025).
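A minimal sketch of one such model is shown below, assuming statsmodels and illustrative column names (SCD-MHC as the continuous outcome, with age, sex, genotype, and one ASCQ-Me and one PROMIS pain score as predictors); the Type III sums of squares isolate the unique contribution of each score.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def fit_unique_variance_model(df: pd.DataFrame, ascqme_col: str, promis_col: str):
    """Regress continuous SCD-MHC on covariates plus one ASCQ-Me and one PROMIS score;
    Type III sums of squares give the unique variance attributable to each score.
    Note: Type III tests for the categorical covariates depend on the contrast coding."""
    formula = f"scd_mhc ~ age + C(sex) + C(genotype) + {ascqme_col} + {promis_col}"
    model = smf.ols(formula, data=df).fit()
    return anova_lm(model, typ=3)

# Usage sketch (one of the pain-measure comparisons, evaluated against the
# Bonferroni-adjusted alpha of 0.05/3 described above):
# table = fit_unique_variance_model(data, "ascqme_pain_impact", "promis_pain_interference")
# print(table)
```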

Results

Respondent characteristics

A total of 490 adults completed both the ASCQ-Me and PROMIS questions. Of these, just 6.5% of participants were older than 55, and roughly a third were in each of the age ranges 18–24 (30%), 25–34 (33%), and 35–54 (31%). Nearly two-thirds of respondents were female (64%). Almost two-thirds of respondents had Sickle Cell Anemia (Hemoglobin SS, 65%), and the rest had either Sickle Hemoglobin C Disease (Hemoglobin SC, 20%), Sickle Beta-Thalassemia Disease (Hemoglobin S Beta-Plus or Beta-Zero Thalassemia, 10%), or an unspecified sickle cell type (5%).

ASCQ-Me fixed form reliability and validity

Table 2 shows the psychometric properties of the five ASCQ-Me SFs. Cronbach’s alpha coefficients for all ASCQ-Me SFs were 0.90 or higher, and latent health scores obtained from the SFs had very high correlations with those obtained from the full item banks (>0.95 in every case).

Table 2 Reliability and validity of ASCQ-Me fixed and short formsa

All correlations between ASCQ-Me SFs and PROMIS SFs for similar health concepts were large (>0.50) [86], ranging in absolute magnitude from 0.54 to 0.80 with a median of 0.69. The ASCQ-Me Emotional Impact scores had a stronger correlation with the PROMIS Depression scores than with the PROMIS Anxiety scores. The ASCQ-Me Pain Impact scores were more strongly related to the PROMIS Pain Interference scores than to the PROMIS Pain Behavior scores.

Evidence for the reliability of the ASCQ-Me pain episode fixed forms was not as strong, although internal consistency reliabilities exceeded 0.70 -- a frequently used rule of thumb for evaluating whether a measure is reliable enough for use in statistical comparisons at the group level [87]. Of the two, the ASCQ-Me Pain Episode Frequency measure had higher internal consistency reliability and a stronger relationship to the ASCQ-Me Pain Impact item bank and to the PROMIS Pain Interference and the PROMIS Pain Behavior SFs. Correlations between the ASCQ-Me pain episode fixed forms and alternative assessments of pain were not as large as those between the ASCQ-Me pain short form and the PROMIS pain measures. We address this in the discussion.

Taken together, these results support the reliability and validity of the ASCQ-Me SF scores and suggest that the results found for the SFs are indicative of those that would be obtained with the full item banks. Other evidence for the validity of the ASCQ-Me SFs is presented below.

Table 3 displays the mean scores for the seven ASCQ-Me and ten PROMIS fixed-format measures at each level of SCD severity. The measures are ordered by the differences in means among levels of severity, starting with the measure whose means differ the most across levels.

Table 3 Discrimination of ASCQ-Me and PROMIS scores among levels of SCD severity

The scores which discriminated most among levels of SCD severity were the ASCQ-Me Stiffness Impact, PROMIS Physical Functioning, and ASCQ-Me and PROMIS pain SF measures. Next most discriminating were the ASCQ-Me and PROMIS social functioning, the ASCQ-Me Sleep and Emotional Impact, and the PROMIS Fatigue scores. Among the least discriminating were the PROMIS sleep and emotional measures and the ASCQ-Me pain episode scores. For all of the ASCQ-Me measures and eight of the ten PROMIS measures (all but PROMIS Anxiety and Sleep-Related Impairment), the probability associated with the differences in means was less than the 0.0029 cut-off for statistical significance (see Table 3). In Table 4 we present these results separately for PROMIS and ASCQ-Me measures to facilitate comparisons of the sensitivity of different scores within each measurement system.

Table 4 Discrimination of ASCQ-Me and PROMIS scores: within system comparison

Precision of ASCQ-Me and PROMIS to discriminate among levels of SCD severity

The histograms in Fig. 1 show, for each ASCQ-Me measure, the monotonic relationship between the three means corresponding to the low, medium, and high SCD severity groups. In every case, those with ASCQ-Me scores indicating the worst health were found in the highest tertile of SCD severity. The thick, dashed, horizontal line which intersects the vertical axis at 50 represents the average score in the ASCQ-Me field test sample, indicating that those in the top and bottom tertiles of the SCD-MHC had ASCQ-Me scores showing poorer- and better-than-average health, respectively.

Fig. 1
figure 1

ASCQ-Me scores at low, medium and high SCD severity. The ASCQ-Me measures shown on the X-axis are: Emotional Impact (Emotional); Pain Impact (Pain); Sleep Impact (Sleep); Social Impact (Social); Stiffness Impact (Stiffness); SCD Pain Episode Frequency (Pain Epi Freq); SCD Pain Episode Severity (Pain Epi Sev). The Pain Episode measures are scored such that a higher score indicates more pain whereas the other ASCQ-Me measures are scored so that a higher score means better health

There were two PROMIS SFs each for the domains of pain impact, emotional impact, sleep impact, and social impact (see Table 1), and in Fig. 2 we display the histograms for the SF in each domain that had the strongest relationship to SCD severity. Thus, we did not include histograms for PROMIS Pain Behavior, Anxiety, Sleep-Related Impairment, and Satisfaction with Discretionary Social Activities in Fig. 2, but the means for these measures at each level of SCD severity can be found in Table 3. All PROMIS measures displayed a monotonic relationship, with mean scores systematically showing better health at lower levels of SCD severity. With the general population mean of 50 as the reference line (see the thick dashed line at 50 on the graph), these graphs show that adults with SCD, even those with less severe disease, were always less healthy than the general population across all PROMIS measures. Recall that one standard deviation unit on the PROMIS and ASCQ-Me metrics is equivalent to 10 points. Those with the most severe disease scored around one standard deviation worse than the general population, and even those with the least severe disease scored nearly half a standard deviation worse than the general public on the PROMIS Physical Functioning, Fatigue, and Pain Impact SFs.

Fig. 2
figure 2

PROMIS scores at low, medium and high SCD severity. The PROMIS measures shown on the X-axis are: Depression; Fatigue; Pain Impact; Sleep Disturbance (Sleep); Satisfaction with Social Roles (Social); Physical Functioning (Phys Funct). The PROMIS Social and Physical Function measures are scored so that a higher score means more functioning and better health; whereas the other measures are scored so that a higher score means more suffering and poorer health

Comparative sensitivity of PROMIS and ASCQ-Me scores to variability in SCD severity

Table 5 describes the unique variance in SCD severity explained by ASCQ-Me and PROMIS scores after controlling for the effects of age, sex, and SCD genotype. Pain is the hallmark symptom of SCD; thus, we compared the information value of all of the ASCQ-Me and PROMIS pain measures (results for the first six models, top half of the table).

Table 5 Comparison of unique variance in SCD severity explained by ASCQ-Me and PROMIS measures

The results showed that, in comparison to the ASCQ-Me Pain Impact SF, the PROMIS Pain Interference and Pain Behavior SFs explained less unique variance in SCD severity (see results for Models I and II). By contrast, the PROMIS Pain Interference and Pain Behavior scores explained more unique variance in SCD severity than the ASCQ-Me pain episode scores (see results corresponding to Models III-VI). The results for the second set of models (I-VII, bottom half of the table) show that, compared to PROMIS scores for similar health domains, ASCQ-Me scores consistently explained more unique variance in SCD severity. In every case, the amount of unique variance explained by ASCQ-Me was statistically significant; the amount of unique variance explained by the PROMIS Physical Functioning and Satisfaction with Social Roles SFs was also statistically significant (see Models I and II, bottom part of the table).

Discussion

Interpretation of the results – reliability and validity of ASCQ-Me fixed forms

ASCQ-Me fixed and short forms were shown to be highly reliable. The SFs, based on subsets of items from the full item banks, had internal consistency reliability coefficients ranging from 0.90 to 0.94, supporting their clinical use at the individual-patient level [88, 89]. The fixed-form pain episode measures had internal consistency reliabilities of 0.80 and 0.73, demonstrating good precision for use in group-level clinical research [88,89,90].

Results also supported the construct validity of the ASCQ-Me fixed and short forms. The correlations between the ASCQ-Me SFs and the corresponding ASCQ-Me item banks were very large, ranging from 0.96 to 0.99. These correlations support the use of the ASCQ-Me SFs as substitutes for the ASCQ-Me item banks. Correlations between the ASCQ-Me Pain Episode Frequency and Severity fixed forms and the full ASCQ-Me Pain Impact item bank were lower but still large to moderate (0.54 and 0.26 in absolute magnitude). The lower correlations suggest that the ASCQ-Me pain episode fixed forms measure aspects of pain which are not covered by the ASCQ-Me Pain Impact SF or item bank. Indeed, a comparison of the content of the measures reveals important differences. There is no overlap in items between the ASCQ-Me pain episode measures and the ASCQ-Me Pain Impact item bank, and the pain episode questions refer specifically to pain episodes rather than to pain in general. Moreover, the ASCQ-Me pain episode questions differ from the ASCQ-Me and PROMIS pain short-form questions because they refer to a different time frame: the pain episode questions refer either to the past 12 months or to lifetime experience, whereas the ASCQ-Me and PROMIS short-form questions refer to the past 7 days. The lower correlation of the pain episode scores with the ASCQ-Me Pain Impact item bank also could be due, in part, to the comparatively lower reliability of the pain episode measures. All things being equal, a more reliable measure will have a higher correlation with a criterion than a less reliable measure [90].
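This last point can be made concrete with the classical attenuation relationship (a general psychometric identity, not a quantity estimated in this study):

r_{xy}^{\mathrm{obs}} = r_{xy}^{\mathrm{true}} \sqrt{r_{xx}\, r_{yy}},

where r_{xx} and r_{yy} are the reliabilities of the two measures. For example, with reliabilities of 0.73 and 0.94, even a perfect underlying correlation could not produce an observed correlation above approximately 0.83.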

ASCQ-Me scores were strongly related to SCD severity, providing additional evidence of their construct validity. There was a consistently monotonic relationship between levels of SCD severity and ASCQ-Me scores, such that ASCQ-Me scores indicated worse health at higher levels of SCD severity. In every case, ASCQ-Me scores significantly discriminated among groups of patients defined by tertiles of SCD disease severity. The ASCQ-Me short-form scores demonstrated a stronger relationship to disease severity than did the ASCQ-Me pain episode scores. The ASCQ-Me SFs most strongly related to SCD severity were the Stiffness, Pain, and Social Impact measures, in that order.

Interpretation of the results –validity of PROMIS short forms

PROMIS scores were strongly related to SCD severity, with a consistently monotonic relationship between levels of SCD severity and PROMIS scores such that PROMIS indicated worse health at higher levels of SCD severity. PROMIS scores for these SCD patients, even at the lowest level of SCD severity, indicated impairment relative to the general population. This finding is consistent with the clinical picture of SCD as causing some suffering and disability even among those with less symptomatic disease [16, 91, 92] and, thus, supports the validity of PROMIS as a measure of functional deficits and suffering in SCD. The PROMIS scores showing the most profound effect of SCD relative to the general population were Physical Functioning, Pain Interference, Pain Behavior, and Fatigue.

Interpretation of results – comparisons among ASCQ-Me and PROMIS scores

One-way ANOVA models demonstrated that, whether measured by PROMIS or ASCQ-Me, the ability to function physically, pain, and the ability to engage in social roles and activities were the domains most affected by SCD severity (see last two columns of Table 3). The PROMIS Fatigue SF was highly significantly related to SCD severity, although less so than the ASCQ-Me and PROMIS measures of physical and social function and pain. On the other hand, compared to PROMIS, the ASCQ-Me SFs demonstrated a greater effect of disease severity on emotional distress and sleep. ASCQ-Me pain episode scores were not as sensitive to SCD severity as scores yielded by other ASCQ-Me measures. Taken together, these results support the validity of many of the PROMIS and ASCQ-Me SFs measuring similar health concepts for describing differences in disease severity.

In choosing between ASCQ-Me and PROMIS assessments of similar health concepts, one would want to compare the amount of unique information about disease severity that each provides. We used multiple linear regression models of the relationship of ASCQ-Me and PROMIS SFs to SCD severity, holding constant the potentially confounding effects of age, gender, and genotype (Table 5). These results consistently demonstrated that, compared to PROMIS SFs, ASCQ-Me SF scores of similar concepts explained more unique variance in SCD severity and did so with fewer items.

Still, we are left to wonder why ASCQ-Me scores were sometimes found to be more sensitive than PROMIS scores measuring similar health concepts. It is not because the ASCQ-Me questions ask respondents to attribute symptoms or functioning to SCD, because the results were found for scores based on questions that did not refer to SCD. Some prior research has shown better sensitivity of PROMIS scores to condition severity when those scores were based on calibrations derived from patients with that specific condition [93]. So, the greater sensitivity of ASCQ-Me scores could be due to the items having been calibrated on an SCD sample. In prior research, we replicated the regression analyses described earlier using two types of scoring: IRT scoring (with weights determined by the GRM for both systems) and raw scoring (with unit weights). The pattern of differences between ASCQ-Me and PROMIS was largely the same regardless of scoring method, suggesting that the greater sensitivity of ASCQ-Me was due to its content originating in qualitative research with SCD patients rather than to the calibration sample [94].

Limitations

The implications of these results are restricted by the variables included in the data collection. The PROMIS suite of measures includes multiple short forms for each health concept -- for example, there are 10 PROMIS SFs that assess physical functioning. We do not know whether these results would generalize to other SFs; however, the versions used in this study are the ones in widest circulation. In addition, our condition severity data were self-reported. This research would be strengthened were a consensus “gold standard” method of measuring SCD severity available to define the low, medium, and high severity groups. Because they derive from the same source, the relationships between our indicator of condition severity (SCD-MHC) and the ASCQ-Me or PROMIS scores might be artifacts of the data collection method. This possibility is not supported, however, by evidence of discriminant validity for the SCD-MHC in relation to self-reports of conditions which are not related to SCD [3]. Time was another variable missing from the data collection. Data were cross-sectional, so we could not address the relationship of ASCQ-Me or PROMIS scores to change in condition.

The implications of these results also are restricted by the characteristics of the participants. Those older than 54 and those with an SCD type other than SS were in the minority, so sample size prevented us from evaluating the generalizability of these results to the elderly and to those with genotypes other than SS. Implications are also restricted by the study participant background data we were able to obtain. For example, we did not have data on all the therapies to which individuals had been exposed, so we could not evaluate the generalizability of results to subsets of patients defined by therapy. While data came from seven geographically dispersed clinics throughout the country, we do not know how representative our field test sample is because a nationally representative, descriptive study of the sociodemographic and health characteristics of adults with SCD does not yet exist. However, available data suggest that the characteristics of our sample are likely to mirror those of other clinical populations with regard to age and hemoglobin type, although males may have been under-represented [95,96,97].

Future research

Sickling hemoglobin may cause obstruction of blood flow in the brain, and so cognitive functioning is an important health domain for SCD [12]. Unfortunately, the cognitive functioning item bank that we developed for the ASCQ-Me field test did not demonstrate good psychometric properties, and so it is not included among the ASCQ-Me measures approved for use [3]. Future research could be designed to collect the data needed to evaluate the validity of the PROMIS Cognitive Functioning measures in adults with SCD.

Future research also could be conducted to evaluate alternative PROMIS short forms. PROMIS was developed to provide banks of items from which clinicians and researchers could select subsets particularly relevant to their purposes. PROMIS short forms created by selecting a subset of items from the corresponding item banks using mixed methods may result in more precise measures for use in SCD than either generic short forms or ASCQ-Me measures. Such research has been successful in developing PROMIS short forms for use in multiple sclerosis and fibromyalgia, for example [97, 98].

Research is being conducted to determine whether these fixed-form results would generalize to results based on ASCQ-Me and PROMIS CATs. We did not compare the sensitivity of the PROMIS CATs to the ASCQ-Me CATs because the field test did not administer the ASCQ-Me CAT. Evaluations of this using simulated ASCQ-Me CAT data [99] are under way.

These results have implications for the sample size required to achieve a certain level of statistical power -- a measure which yields a more precise score than another will require fewer respondents. But statistical differences are not the same as clinically meaningful differences. A clinically meaningful difference is one large enough to be perceived by patients and/or their providers, and/or one which has implications for planning care [100]. Future studies are required to determine whether the differences in precision between ASCQ-Me and PROMIS scores have any consequences for clinical care.
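As a rough illustration of this point (an approximation based on the relative-precision framework, not a calculation reported here), when two measures are compared on the same severity contrast in the same sample, the sample size needed with measure B to match the power achieved with measure A is approximately

n_B / n_A \approx F_A / F_B,

so a measure yielding half the F-statistic of another would require roughly twice as many respondents to detect the same severity-group differences.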

Other research to link ASCQ-Me scores to PROMIS scores is underway so that ASCQ-Me scores can be understood in comparison with the general population. But future research is required to compare the responsiveness of ASCQ-Me and PROMIS scores to change in condition severity.

Conclusions

Study results supported the validity of eight PROMIS SFs and all ASCQ-Me SFs and fixed forms for assessing health outcomes in adults with SCD. Compared to corresponding PROMIS scores, most ASCQ-Me scores were better predictors of disease severity. The clinical implications of these results require further investigation. Future research also should evaluate the validity of PROMIS cognitive functioning measures for use in adults with SCD and the sensitivity of both PROMIS and ASCQ-Me measures to change in SCD severity over time.