Abstract
Objectives
This study aims to assess psychometric properties of the Hungarian PROMIS-29+2 profile measure and provide general population reference values for Hungary.
Methods
An adult general population sample (n = 1700) completed PROMIS-29+2 v2.1 in an online survey. The following psychometric properties were assessed: floor and ceiling effect, convergent validity with SF-36v1 domains, internal consistency (McDonald’s omega), unidimensionality, local independence, monotonicity, graded response model (GRM) fit and differential item functioning (DIF). Age- and gender-specific reference values were established using the US item calibrations.
Results
Depending on scale orientation, high floor or ceiling effects were observed for all domains (25.2–60.7%) except for sleep disturbance. McDonald’s omega for domains ranged from 0.87 to 0.97. Unidimensionality, local independence and monotonicity were supported and the GRM fitted adequately for all but one domain. The sleep disturbance domain demonstrated item misfit, response level disordering and low discrimination ability, particularly for item Sleep116 (‘refreshing sleep’). Strong correlations were observed between PROMIS-29+2 and corresponding SF-36 domains (rs = |0.60| to |0.78|). No DIF was detected for most sociodemographic characteristics. Problems with physical function, pain interference and social roles tended to increase, whereas problems with anxiety, depression, fatigue and cognitive function declined with age (p < 0.01). In all domains except for cognitive function, more health problems occurred in females than in males (p < 0.001).
Conclusion
The Hungarian PROMIS-29+2 shows satisfactory psychometric properties; however, the sleep disturbance domain substantially underperforms and requires further attention. Population reference values were generated that facilitate the interpretation of health outcomes in various patient populations.
Introduction
In recent years, clinicians, health service providers, researchers, the pharmaceutical industry, reimbursement agencies and health policymakers have increasingly recognized the importance of measuring health-related quality of life (HRQoL) [1, 2]. Some HRQoL instruments are referred to as ‘generic measures’: they describe health in a general way, allowing the assessment of HRQoL and changes in HRQoL across a range of disease areas and populations, including members of the general public and patient groups. Such measures include the 36-Item Short Form Survey (SF-36), EQ-5D and Assessment of Quality of Life (AQoL) [3, 4]. More recently, the Patient-Reported Outcomes Measurement Information System (PROMIS) adult generic profiles (PROMIS-57, -43 and -29) [5] have been developed. They represent a new generation of such measures that rely on item response theory (IRT) calibrated item banks, thereby using a different approach than conventional measures [6].
The PROMIS initiative has so far developed item banks for over 100 key HRQoL domains, such as physical (e.g., pain, physical function, itch, sleep), mental (e.g., anxiety, depression) and social health (e.g., ability to participate in social roles and activities) [7]. Item banks enable computerized adaptive testing (CAT) tools for individual assessment of HRQoL. A major advantage of the three PROMIS generic profile measures is that they are able to produce comparable results to the complete item banks [5]. Although originating from the US, the item banks and the profile measures have been translated to several languages and have increasingly been used in European and Asian countries [8,9,10,11,12]. As standardised HRQoL measures are required to maintain their psychometric performance in different languages, the robustness of measurement properties needs to be confirmed for all language versions.
Among the three PROMIS adult profile measures, PROMIS-29 is the most widely used as a standalone, concise HRQoL measure [13]. By extending it with two items of cognitive function (PROMIS-29+2), it allows the estimation of quality-adjusted life years (QALYs) to assess benefits of treatments in economic analyses [14]. Psychometric performance of PROMIS-29, including validity, reliability and responsiveness, has already been tested in a broad range of health conditions and populations, such as cancer [15, 16], inflammatory bowel diseases [17], chronic kidney disease [18], burn [19], haemophilia [20], musculoskeletal diseases [21,22,23], systemic lupus erythematosus [24], aortic dissection [25], elderly with multiple chronic conditions [26] and general population [27,28,29,30]. Moreover, PROMIS-29 population reference values have also been established in many countries [28, 29] supporting the interpretation of scores by evaluating the relative burden of health conditions compared with reference values. The psychometric performance of the Hungarian PROMIS profile measures has not yet been tested and no reference scores are available for Hungary. This study therefore aims to (1) assess psychometric properties of the Hungarian PROMIS-29+2 profile measure and (2) provide general population reference values from a large representative sample in Hungary.
Methods
Study design and data collection
The study was approved by the Research Ethics Committee of the Corvinus University of Budapest (No. KRH/343/2020). The validation of PROMIS-29+2 formed part of a larger survey on health and well-being of the Hungarian general population [31, 32]. In November 2020, a web-based cross-sectional survey was undertaken in Hungary. We engaged a survey research company to conduct the data collection among members of an online panel. Under the contract, the company provided access to the responses of those respondents who had fully completed the questionnaire. Access to partially completed questionnaires was not included in the contract. The survey company compensated the respondents with survey points redeemable for rewards. We set ‘soft’ target quotas for age, gender, education, type of settlement and region to achieve a sample that approximates the composition of the Hungarian adult general population. Inclusion criteria were being aged ≥ 18 years and providing informed consent prior to starting the survey.
Respondents completed the official Hungarian-language version of PROMIS-29+2 v2.1 [33] as distributed by the PROMIS Health Organization. Other data collected included sociodemographic questions (age, gender, education, employment, marital status, income, household size, type of settlement, region), history of chronic health conditions and the 36-item Short Form Health Survey (SF-36v1). The order of the two instruments was fixed: respondents first completed the PROMIS-29+2, followed by the SF-36. There were no missing values in the data, as responding to all questions was mandatory in the online survey.
PROMIS-29+2
Our survey included PROMIS-29+2 v2.1 [33], which consists of PROMIS-29 and two items from Cognitive Function-Abilities v2.0 [34]. The PROMIS-29 profile comprises 29 items relating to seven HRQoL domains [physical function, anxiety, depression, fatigue, sleep disturbance, ability to participate in social roles and activities (hereafter social roles) and pain interference] and an 11-point pain intensity numeric rating scale [5]. The Cognitive Function-Abilities items measure an eighth domain, cognitive function. Each PROMIS-29 domain has four five-level items. The five-point response scale varies across difficulty (i.e., ‘without any difficulty’ to ‘unable to do’), frequency (‘never’ to ‘always’), severity (‘not at all’ to ‘very much’) and global rating (‘very poor’ to ‘very good’) format scales. The recall period is unspecified for physical function and social roles; all other domains refer to the past seven days. A total raw score ranging from 4 to 20 (2–10 for cognitive function) may be computed for each domain by adding up the responses on each item of the domain. The US item calibrations were used to derive T-scores from raw domain scores, where a mean T-score of 50 with an SD of 10 represents the US general population [7]. The only exception is the sleep disturbance domain, where a mixed general population and clinical sample with above-average sleep disturbance was used for the calibration of T-scores [35]. For scales of function (i.e., physical function, social roles and cognitive function) a higher score corresponds to better HRQoL, and for symptoms (i.e., anxiety, depression, fatigue, sleep disturbance and pain interference) a higher score corresponds to worse HRQoL [36].
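The raw-score and T-score metric described above can be illustrated with a minimal Python sketch. It is not the official scoring procedure: actual T-scores are derived from the US item calibrations, which are not reproduced here. The function names are hypothetical, responses are assumed to be already coded 1 to 5 in the scoring direction of the domain, and the second function only shows how a latent trait estimate on the reference metric maps to a T-score with mean 50 and SD 10.

```python
from typing import Sequence


def raw_domain_score(item_responses: Sequence[int]) -> int:
    """Sum item responses (assumed coded 1-5) into a raw domain score."""
    if any(r < 1 or r > 5 for r in item_responses):
        raise ValueError("each response must be coded 1..5")
    return sum(item_responses)


def theta_to_t_score(theta: float) -> float:
    """Map a latent trait estimate (reference mean 0, SD 1) to the T-score metric."""
    return 50.0 + 10.0 * theta


# Four-item domain: raw scores range from 4 to 20.
print(raw_domain_score([1, 2, 2, 3]))
# Two-item cognitive function domain: raw scores range from 2 to 10.
print(raw_domain_score([4, 5]))
# One SD above the reference mean corresponds to T = 60.
print(theta_to_t_score(1.0))
```

Whether a higher T-score means better or worse HRQoL still depends on the domain orientation (function versus symptom scales).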
36-item short form survey (SF-36)
SF-36 is one of the most extensively used and validated generic HRQoL instruments [37]. It assesses respondents’ HRQoL in 36 items covering eight domains with a four-week recall period: physical functioning (ten items), role limitations due to physical health problems (four items), bodily pain (two items), general health (five items), vitality (four items), social functioning (two items), role limitations due to emotional problems (three items) and mental health (five items). One item (2nd), which asks about health change, is not included in the scale or summary scores. Scores for items on each of the eight scales are summed up to give scale scores that are linearly transformed onto a 0–100 scale. Note that scores are not comparable across domains.
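The linear transformation onto a 0–100 scale can be sketched as follows. This is a generic min-max rescaling under the assumption that the lowest and highest possible raw scale scores are known; the function name and the example range (a ten-item scale with three response levels, raw range 10 to 30) are illustrative and not taken from the scoring manual.

```python
def transform_0_100(raw: float, lowest: float, highest: float) -> float:
    """Linearly rescale a raw scale score onto 0-100."""
    if highest <= lowest:
        raise ValueError("highest possible score must exceed lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return (raw - lowest) / (highest - lowest) * 100.0


# Illustrative ten-item scale with items scored 1-3 (raw range 10-30).
print(transform_0_100(20, 10, 30))
```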
Psychometric analyses
Data analysis was carried out with R version 4.1.1 (Vienna, Austria). We followed classical test theory and IRT methods previously used in testing psychometric properties of PROMIS item banks and profile measures [6, 20, 21, 27, 38, 39]. For the analyses, we considered PROMIS-29 as the core measure and we tested measurement properties of the additional cognitive function domain separately, wherever possible. Psychometric analyses were performed on the unweighted sample; however, for estimating population reference values, the sample was weighted for age group and gender. All the statistical tests were two-sided, and p < 0.05 was considered statistically significant.
Floor and ceiling effect
Floor (proportion of respondents at the lowest possible score) and ceiling (proportion of respondents at the highest possible score) were computed for the eight PROMIS-29+2 domains. If > 15% of respondents scored at the lowest or highest level, we considered a floor or ceiling effect to be present, respectively [40, 41].
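A minimal sketch of this check, assuming raw domain scores are available as a list of integers. The function name and the example data are hypothetical; only the 15% threshold comes from the text.

```python
from typing import Dict, Sequence, Union


def floor_ceiling(
    raw_scores: Sequence[int],
    min_score: int,
    max_score: int,
    threshold: float = 0.15,
) -> Dict[str, Union[float, bool]]:
    """Proportion of respondents at the lowest and highest possible score."""
    n = len(raw_scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor = sum(1 for s in raw_scores if s == min_score) / n
    ceiling = sum(1 for s in raw_scores if s == max_score) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_effect": floor > threshold,      # more than 15% at the minimum
        "ceiling_effect": ceiling > threshold,  # more than 15% at the maximum
    }


# Made-up scores for a four-item domain (possible range 4-20).
scores = [4] * 5 + [10] * 14 + [20]
print(floor_ceiling(scores, min_score=4, max_score=20))
```

Whether the floor reflects the best or the worst health state depends on the orientation of the domain, which is why symptom domains show floor effects and function domains show ceiling effects in a largely healthy sample.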
Reliability analyses
Internal consistency reliability was assessed by computing Cronbach’s alpha and McDonald’s omega (total) for each domain (‘psych’ package [42]). A Cronbach’s alpha > 0.70 and a McDonald’s omega total > 0.90 were considered signs of adequate internal consistency [43].
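For orientation, Cronbach’s alpha can be computed directly from the item and total-score variances. The sketch below is a plain-Python illustration with made-up data, not the ‘psych’ routine used in the study; McDonald’s omega is not shown because it requires fitting a factor model.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)


# Made-up responses of five respondents to a four-item domain.
data = [
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 3, 4, 3],
    [4, 5, 4, 4],
    [5, 5, 5, 4],
]
alpha = cronbach_alpha(data)
print(round(alpha, 3), alpha > 0.70)
```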
Item response theory assumptions
In accordance with previous PROMIS validation studies [6, 27, 30], the seven domains of PROMIS-29 were separately analysed with graded response models (GRM). Before modelling, the following three statistical assumptions were tested: unidimensionality, local independence and monotonicity. Unidimensionality was assessed using an exploratory bifactor model (‘psych’ package [42]), which allowed the extraction of explained common variance (ECV) and McDonald’s omega (hierarchical) values. The following cut-off values were used: ECV > 0.60 and omega > 0.70 [44]. The IRT-based standardized Chen and Thissen’s index (χ2) was used to detect local dependence (‘mirt’ package [45]). A χ2 of > 0.3 implied possible local dependence and > 1 definite local dependence [46]. Any violations of local independence were considered negligible if the ECV was ≥ 0.90 [46,47,48,49]. Monotonicity was tested by examining the graphs of item mean scores conditional on the total raw scale score minus the item score [6].
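The quantity plotted for the monotonicity check, the mean item score conditional on the ‘rest score’ (total raw score minus the item score), can be sketched as below. The study inspected graphs; the numeric non-decreasing test here is a simplifying assumption for illustration, and the function names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def item_means_by_rest_score(
    rows: Sequence[Sequence[int]], item_index: int
) -> Dict[int, float]:
    """Mean score of one item within each rest-score group."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for row in rows:
        rest = sum(row) - row[item_index]  # total score minus the item itself
        groups[rest].append(row[item_index])
    return {rest: sum(v) / len(v) for rest, v in sorted(groups.items())}


def is_non_decreasing(means: Dict[int, float]) -> bool:
    """True if item means never drop as the rest score increases."""
    ordered = [means[k] for k in sorted(means)]
    return all(b >= a for a, b in zip(ordered, ordered[1:]))


# Made-up responses to a four-item domain.
rows = [[1, 1, 1, 1], [2, 2, 2, 2], [2, 3, 3, 3], [4, 4, 4, 5]]
means = item_means_by_rest_score(rows, item_index=0)
print(means, is_non_decreasing(means))
```

With real data, sparsely populated rest-score groups produce noisy means, so small local reversals in the curve do not necessarily indicate a violation.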
Item response theory analyses
After confirming the IRT assumptions, we fitted a GRM (‘mirt’ package [45]). We examined each item’s discrimination (i.e., item slope, a) and item thresholds (i.e., item difficulty, b). Model fit was assessed by root mean square error of approximation (RMSEA), Standardized Root Mean Square Residual (SRMR), Comparative Fit Index (CFI) and Tucker–Lewis Index (TLI), and was considered acceptable if CFI > 0.95, TLI > 0.95, RMSEA < 0.06 and SRMR < 0.08 [50]. Item fit was assessed by computing the differences between observed and expected responses under the GRM using S-χ2 statistic, where a p-value < 0.001 was considered indicative of item misfit [51]. Item characteristic curves (ICCs) were generated using GRM.
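To make the roles of the discrimination (a) and threshold (b) parameters concrete, the following sketch computes the category response probabilities of a single item under the standard logistic form of the GRM. It is an illustration only: the parameter values are invented, the function name is hypothetical, and the models in the study were estimated with ‘mirt’, not with this code.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the graded response model.

    thresholds holds the ordered b parameters; an item with five response
    levels has four thresholds.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative probabilities of responding in category k or higher,
    # padded with 1 (lowest category or above) and 0 (beyond the highest).
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # Probability of each category is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Invented parameters for a five-level item.
probs = grm_category_probs(theta=0.5, a=2.0, thresholds=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs], round(sum(probs), 3))
```

Plotting these probabilities over a range of theta values gives the item characteristic curves; a low a flattens the curves, which is the pattern associated with poorly discriminating items.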
Differential item functioning
To assess differential item functioning (DIF), a series of ordinal logistic regressions were fitted (‘lordif’ package [52]). In the first step, we performed an ordinal logistic regression without any anchor. The χ2 criterion was assessed looking for potential items with DIF. Once DIF was detected, we moved to the second step, where items within a domain that did not show any DIF were used as already-purified anchors. In this second step, three ordinal logistic regression models were estimated to compare the overall, uniform and non-uniform DIF for each item. Uniform DIF occurs when there is a constant systematic difference in item response between subgroups of respondents across the entire continuum of the latent trait, whereas non-uniform DIF occurs when the differences between groups vary across the continuum of the latent trait. Uniform, non-uniform and overall DIF were examined by comparing model 1 vs. model 2, model 2 vs. model 3, model 1 vs. model 3, respectively. Items were flagged for DIF when the McFadden’s pseudo R2 change was > 0.02 [33]. Test characteristic curves were used to visualize the aggregate impact of DIF on domain scores (i.e., differential test functioning). DIF was evaluated for age (median split at 47 years), gender (male vs. female), education (primary, secondary, university/college), employment (employed, retired, other), place of residence (capital, other town, village), geographical region (Central Hungary, Transdanubia, Great Plain and North), marital status (married or domestic partnership vs. any other) and household net monthly income per person (under or over the median of HUF 126,924 and do not know/want to answer).
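The flagging rule based on the change in McFadden’s pseudo R2 can be sketched as follows. The sketch assumes the usual nesting of the three models (model 1: trait only; model 2: trait plus group; model 3: trait, group and their interaction) and takes the log-likelihoods as given; fitting the ordinal regressions themselves is not shown. The function names and log-likelihood values are hypothetical.

```python
from typing import Dict, Tuple


def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R2 from the model and intercept-only log-likelihoods."""
    return 1.0 - ll_model / ll_null


def dif_flags(
    ll_null: float,
    ll_m1: float,
    ll_m2: float,
    ll_m3: float,
    threshold: float = 0.02,
) -> Dict[str, Tuple[float, bool]]:
    """Pseudo R2 changes between nested models and whether each exceeds the cut-off."""
    r1 = mcfadden_r2(ll_m1, ll_null)
    r2 = mcfadden_r2(ll_m2, ll_null)
    r3 = mcfadden_r2(ll_m3, ll_null)
    changes = {
        "uniform": r2 - r1,      # model 1 vs. model 2
        "non_uniform": r3 - r2,  # model 2 vs. model 3
        "overall": r3 - r1,      # model 1 vs. model 3
    }
    return {name: (delta, delta > threshold) for name, delta in changes.items()}


# Invented log-likelihoods for one item.
print(dif_flags(ll_null=-1000.0, ll_m1=-700.0, ll_m2=-670.0, ll_m3=-668.0))
```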
Convergent validity
Convergent validity of PROMIS-29+2 was assessed against the SF-36v1 questionnaire. We used Spearman’s rank-order correlations to test the association between domains and summary scores of the two measures. Correlation coefficients were interpreted as very weak (< 0.20), weak (0.20–0.39), moderate (0.40–0.59), strong (0.60–0.79) and very strong (≥ 0.80) [53]. We hypothesized at least strong correlations between domains covering a similar construct (e.g., PROMIS physical function and SF-36 physical functioning). Weak or no correlations were assumed between the PROMIS cognitive function and SF-36 domains as this area of HRQoL is missing from the SF-36.
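As an illustration of this step, the sketch below computes Spearman’s rank-order correlation (Pearson correlation of average ranks, which handles ties) and maps the absolute coefficient onto the interpretation bands listed above. It is written in plain Python with hypothetical function names; treating the band limits as continuous cut-points (e.g., 0.395 counted as weak) is an assumption.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rs as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def interpret(rs: float) -> str:
    """Label the strength of a correlation by its absolute value."""
    r = abs(rs)
    if r < 0.20:
        return "very weak"
    if r < 0.40:
        return "weak"
    if r < 0.60:
        return "moderate"
    if r < 0.80:
        return "strong"
    return "very strong"


print(interpret(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])))
```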
Population reference values and cross-country comparisons
In estimating population reference values, the sample was weighted for age group and gender to account for small deviations from the reference population in Hungary [54]. To accommodate the effect of weighting on variances, Taylor linearization was used to calculate appropriate standard errors. Mean (SD) dimension and summary T-scores and their 95%CIs were computed by gender and age groups (18–24, 25–34, 35–44, 45–54, 55–64 and 65 + years). Bivariate ordinary least squares regressions were used to test the association between domain T-scores and pain intensity scores with age groups and gender. Weighted domain T-scores were compared to those of the general population in the US, the UK, Germany and France [28].
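A weighted mean with a Taylor-linearized standard error can be sketched as below. The sketch assumes a simple single-stage design without strata or clusters, since the exact design specification is not given in the text; the function name, weights and T-scores are invented for illustration.

```python
import math
from typing import Sequence, Tuple


def weighted_mean_se(values: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean and its linearization-based standard error."""
    n = len(values)
    if n < 2 or n != len(weights):
        raise ValueError("need at least two values and one weight per value")
    w_sum = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / w_sum
    # Linearized residuals of the ratio estimator; they sum to zero.
    z = [w * (y - mean) / w_sum for w, y in zip(weights, values)]
    var = n / (n - 1) * sum(zi ** 2 for zi in z)
    return mean, math.sqrt(var)


# Invented T-scores with post-stratification weights for age group and gender.
t_scores = [48.2, 55.1, 51.7, 60.3, 45.9]
weights = [0.9, 1.1, 1.0, 1.2, 0.8]
mean, se = weighted_mean_se(t_scores, weights)
print(round(mean, 2), round(se, 2))
# Approximate 95% confidence interval for the weighted mean.
print(round(mean - 1.96 * se, 2), round(mean + 1.96 * se, 2))
```

With equal weights the estimator reduces to the usual standard error of the mean, which is a quick sanity check for any implementation.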
Results
Characteristics of the sample
Overall, 2502 online panel members initiated the survey. Of these, 2079 consented and 379 dropped out during the questionnaire. A total of 1700 respondents finished the survey. The median completion time of PROMIS-29+2 was 2 min 59 s (Q1: 2 min 9 s, Q3: 4 min 8 s). Table 1 shows the sociodemographic and health-related characteristics of the respondents in comparison to the general population in Hungary. The sample was generally representative of the Hungarian general population for age, gender, employment and marital status, type of settlement and geographical region. Secondary educated respondents were underrepresented in the sample. Overall, 47.4% had a self-reported, physician diagnosed health condition. Descriptive statistics of PROMIS-29+2 and SF-36 domain scores are presented in Table 2.
Floor and ceiling effect
Among the eight PROMIS-29+2 domains, the highest floor effects were observed for pain interference (50.5%), followed by depression (44.1%), anxiety (35.4%) and fatigue (25.2%) (Table 2). Floors of the physical function, social roles, sleep and cognitive function domains were well below the threshold (0.3–6.2%). High ceiling effect was observed for physical function (60.7%), social roles (39.1%) and cognitive function (36.5%), while there were no apparent ceiling effects for the other domains (0.4–1.3%).
Reliability
Cronbach’s alpha and McDonald’s omega total values exceeded the thresholds of 0.70 and 0.90 for all PROMIS-29 domains with the exception of McDonald’s omega total (0.87) for the sleep disturbance domain (Table 3).
IRT assumptions
Using bifactor models, the unidimensionality assumption was confirmed for all PROMIS-29 domains. For sleep disturbance, the ECV criterion was met (0.68); however, McDonald’s omega hierarchical was exactly at the threshold (0.70) (Table 3). Chen and Thissen’s local dependence indices were below 1 for nearly all item pairs of each domain (Online Resource 1). The exceptions were Sleep109 (‘sleep quality’) vs. Sleep20 (‘problem with sleep’) and PAININ9 (‘pain interfering with day to day activities’) vs. PAININ22 (‘pain interfering with work around the home’). However, for the latter pair, the ECV from the bifactor model was very high (0.94); therefore, the local dependence detected can be deemed negligible. In the sleep disturbance domain, three item pairs showed a Chen and Thissen’s index above 0.3 and one pair was above 1. Graphs of item mean scores conditional on the total score minus the item score supported the monotonicity assumption for all domains (Online Resource 2).
IRT analysis
Across the seven PROMIS-29 domains, the three IRT assumptions were met with few exceptions. Several items misfitted the GRM as indicated by the p-values for the S–χ2 statistics (Table 4). Misfitting items included two items of the anxiety domain [EDANX01 (‘fearful’) and EDANX53 (‘uneasy’)], two items of the depression domain [EDDEP04 (‘worthless’), EDDEP41 (‘hopeless’)], all four items of the sleep disturbance domain and one item of the pain interference domain [PAININ31 (‘pain interfering with social activities’)].
For all domains but sleep disturbance, the GRM models’ fit indices met the established criteria for SRMR, CFI and TLI. However, out of the seven PROMIS-29 domains, only anxiety, depression and social roles met the RMSEA cut-off value. The sleep disturbance (0.06–0.97) and fatigue (0.81–0.99) domains had the lowest average item difficulty (b), while physical function (1.41–1.82) had the highest in absolute values. The following items produced the highest discriminative ability (a): PAININ22 (‘pain interfering with work around the home’), PAININ34 (‘pain interfering with household chores’), FATEXP40 (‘fatigue on average’) and PAININ9 (‘pain interfering with day to day activities’). Three items of the sleep disturbance domain [Sleep116 (‘refreshing sleep’), Sleep44 (‘difficulty falling asleep’), Sleep109 (‘sleep quality’)] had the lowest item discrimination.
The ICC plots shown in Online Resource 3 indicated that for most items, the five response options were monotonically ordered. The only exception was item Sleep116 (‘refreshing sleep’) (Fig. 1).
Differential item functioning
No DIF was identified for any of the domains for the following sociodemographic characteristics: gender, education, employment, place of residence, geographical region, marital status and income. However, PFA21 (‘go up and down stairs at a normal pace’) and PFA53 (‘run errands at shop’) of the physical function domain showed uniform DIF for age (McFadden’s pseudo R2 changes between model 1 and 2: 0.030 and 0.023, respectively). The test characteristic curves for these two items showed a small overall impact of DIF (Online Resource 4).
Convergent validity
Table 5 presents the results of the convergent validity analyses. In line with our hypotheses, evidence of strong convergence between corresponding PROMIS-29+2 and SF-36 domains was identified. The strongest correlations were observed between PROMIS-29+2 physical function and SF-36 physical functioning (rs = 0.78), PROMIS-29+2 fatigue and SF-36 vitality (rs = −0.76), PROMIS-29+2 pain interference and SF-36 bodily pain (rs = −0.74) and PROMIS-29+2 depression and SF-36 mental health (rs = −0.70). The PROMIS-29+2 sleep disturbance domain correlated weakly or moderately with SF-36 domains and showed the strongest association with vitality (rs = −0.57). As expected, the PROMIS-29+2 cognitive function domain correlated moderately or weakly with all SF-36 domains (rs = 0.18–0.42). The correlations between the domains within the two questionnaires are presented in Online Resources 5 and 6.
Population reference values and cross-country comparisons
Mean domain T-scores tended to worsen with age for physical function, pain interference and social roles, whereas they improved with age for depression, anxiety, fatigue and cognitive function (p < 0.01) (Table 6). No age gradient was present for sleep disturbance (p = 0.155). Self-reported HRQoL problems were generally more pronounced in females in all domains (p < 0.001), except for cognitive function (p = 0.348). Higher mean pain intensity scores were reported by older and female respondents (p < 0.001).
Compared to the US calibration sample with a mean of 50 and the three European countries with existing reference values, mean PROMIS-29+2 domain T-scores in the Hungarian general population indicated similar or better HRQoL with the largest difference being seen for social roles (> 5 points from the US calibration sample) (Fig. 2). The lowest level of anxiety and sleep disturbance was found in Hungary, while for physical function it was similar to Germany and the UK and for depression, fatigue and pain interference to France. Cognitive function in Hungary was better compared to the US calibration sample.
Discussion
This study assessed the psychometric properties of the Hungarian version of PROMIS-29+2 and provided reference values in a large representative sample of the adult general population in Hungary. Our findings provide evidence of a satisfactory measurement performance of the Hungarian PROMIS-29+2. Floor and ceiling effects were observed for nearly all domains depending on the scale orientation, which is comparable to the findings of previous studies in various patient samples [18, 20, 21, 25]. An acceptable reliability was confirmed for all domains. Favourable psychometric properties of the scale include an excellent convergent validity with SF-36 and no or minor DIF for main sociodemographic characteristics. Nevertheless, a few potential weaknesses of PROMIS-29+2 have also been identified, particularly the poor performance of the sleep disturbance domain.
While the GRM produced an acceptable fit for six PROMIS-29+2 domains, sleep disturbance failed to meet any of the fit criteria and showed item misfit for all four items of the domain as well as very low item discrimination ability. Sleep109 (‘sleep quality’) vs. Sleep20 (‘problem with sleep’) showed local dependence, suggesting redundancy between the two items. Furthermore, response categories of item Sleep116 (‘refreshing sleep’) were disordered and its discriminatory ability was also substantially lower than that of any other item. Similarly to our findings, the Norwegian and Dutch PROMIS-29 validation studies also reported problems with the performance of the sleep disturbance domain and the item characteristic curves of Sleep116 [27, 30]. The sleep disturbance domain of PROMIS-29 is unique in the sense that it includes two positively phrased, reverse-coded items (Sleep109 and Sleep116). In questionnaires, reverse-worded items are typically intended to reduce response bias (e.g., pattern answering), disrupt nonsubstantive responding or provide a better coverage of the domain studied [55]. Yet, several studies reported that such items can lead to measurement problems, including low reliability and poor model fit, and some argue that they would not prevent respondents from inattentive or acquiescent answering [56]. Further exploration of the issues with the sleep disturbance domain, as well as testing alternative combinations of items, could be the subject of future research that administers the full PROMIS sleep item bank.
HRQoL decreased with age for physical and social health domains, but not for the cognitive or mental ones. This finding corresponds to the general population reference values in neighbouring Slovenia that reported worse mental health among young adult respondents using the EQ-5D-5L [57] and to the European reference values for the European Organisation for Research and Treatment of Cancer (EORTC) CAT Core that reported an improving trend for cognitive and emotional functions with age [58]. The better HRQoL of the Hungarian population in some domains compared to Western Europe is an unexpected finding as the average health status in Hungary was found to be below the EU average [59]. Comparisons across countries using different health status measures have also reported mixed evidence. Using the EQ-5D-3L, the Hungarian general population had substantially worse HRQoL than other European countries [60]; however, the EQ-5D lacks domains for fatigue, sleep problems and social roles. By contrast, the EORTC CAT showed that in some HRQoL domains (e.g., physical functioning, social functioning, sleep problems), the Hungarian population, in fact, had a better health status than what was found in Germany or the UK [58].
In this study, we used the official US item parameters to compute T-scores. However, multiple approaches exist to score PROMIS items with each offering their own advantages and disadvantages [61]. Using the US item calibrations follows the PROMIS convention and has the advantage that it represents a common metric, which directly allows for international comparisons. On the other hand, if any item within a domain shows language-DIF, the parameter estimates may not be valid for the local population. Another option is using country-specific item calibrations that enable improved accuracy for comparisons with local patient groups and country-specific interpretation of scores. To benefit from the advantages of both methods, a hybrid approach may also be recommended that uses US item calibrations for items without language-DIF and country-specific item parameters for items with language-DIF [62].
There are a number of limitations to this study. First, the online mode of administration might be responsible for selection bias, and the quota sampling lacks known sampling probability. Second, data were collected during the second wave of the COVID-19 pandemic in Hungary, which could have affected self-reported health, particularly young adults’ mental health [62,63,64,65,66,67]. However, responses on self-perceived health status (SF-36 first question) were roughly identical to those reported in a similar large-scale general population survey in Hungary before the pandemic (2019) [68]. The third limitation is that we had no information on the total number of potential respondents contacted by the survey company or access to the data from partially completed questionnaires. Fourth, the reference values for the 65 + age group might not be fully representative of the general population as there were relatively few respondents in the 75 + age group (3.4%). Fifth, it was not possible to fit a GRM for cognitive function because the domain has only two items in PROMIS-29+2. Finally, for each PROMIS-29 domain we fitted a GRM, as this modelling approach was used to develop the PROMIS item banks and is suggested in the PROMIS analytical recommendations [6]. However, it is possible that certain traits measured by PROMIS-29+2 domains, e.g., physical functioning, pain, fatigue, anxiety and depression, do not have an a priori normal distribution in the population because many respondents report no problems [69]. A few alternative model types, such as zero-inflated mixture IRT models or Davidian Curve IRT, could be useful in future analyses, for example to alleviate the skewness in the data [70, 71].
In summary, our results provide support for the satisfactory psychometric properties of the Hungarian version of PROMIS-29+2, including internal consistency reliability, good convergent validity with SF-36 and no or only minor DIF. However, the large ceiling and floor effects may detract from the usefulness of the measure when the aim is to differentiate between HRQoL levels at the mild end of the scale. Measurement problems were found with regard to the sleep disturbance domain, which would require further refinement. Age- and gender-specific reference values were generated for the Hungarian PROMIS-29+2 that facilitate the interpretation of HRQoL outcomes in various patient populations.
Data availability
Data are available from the corresponding author upon reasonable request.
References
Weldring, T., & Smith, S. M. (2013). Article commentary: Patient-reported outcomes (PROs) and patient-reported outcome measures (PROMs). Health Services Insights, 6, HSI.S11093.
Higginson, I. J., & Carr, A. J. (2001). Measuring quality of life: Using quality of life measures in the clinical setting. BMJ, 322(7297), 1297–1300.
Brazier, J., Ara, R., Rowen, D., & Chevrou-Severac, H. (2017). A review of generic preference-based measures for use in cost-effectiveness models. PharmacoEconomics, 35(1), 21–31.
Finch, A. P., Brazier, J. E., & Mukuria, C. (2018). What is the evidence for the performance of generic preference-based measures? A systematic overview of reviews. The European Journal of Health Economics, 19(4), 557–570.
Cella, D., Choi, S. W., Condon, D. M., Schalet, B., Hays, R. D., Rothrock, N. E., et al. (2019). PROMIS® adult health profiles: Efficient short-form measures of seven health domains. Value in Health, 22(5), 537–544.
Reeve, B. B., Hays, R. D., Bjorner, J. B., Cook, K. F., Crane, P. K., Teresi, J. A., et al. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the patient-reported outcomes measurement information system (PROMIS). Medical Care, 45, S22–S31.
Cella, D., Riley, W., Stone, A., Rothrock, N., Reeve, B., Yount, S., et al. (2010). The Patient-reported outcomes measurement information system (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. Journal of Clinical Epidemiology, 63(11), 1179–1194.
Alonso, J., Bartlett, S. J., Rose, M., Aaronson, N. K., Chaplin, J. E., Efficace, F., et al. (2013). The case for an international patient-reported outcomes measurement information system (PROMIS®) initiative. Health and Quality of Life Outcomes, 11(1), 1–5.
Terwee, C., Roorda, L., De Vet, H., Dekker, J., Westhovens, R., Van Leeuwen, J., et al. (2014). Dutch-Flemish translation of 17 item banks from the patient-reported outcomes measurement information system (PROMIS). Quality of Life Research, 23(6), 1733–1741.
Devine, J., Klasen, F., Moon, J., Herdman, M., Hurtado, M., Castillo, G., et al. (2018). Translation and cross-cultural adaptation of eight pediatric PROMIS® item banks into Spanish and German. Quality of Life Research, 27(9), 2415–2430.
Cai, T., Huang, Q., Wu, F., & Yuan, C. (2021). Psychometric evaluation of the PROMIS social function short forms in Chinese patients with breast cancer. Health and Quality of Life Outcomes, 19(1), 1–11.
Cai, T., Wu, F., Huang, Q., Yu, C., Yang, Y., Ni, F., et al. (2022). Validity and reliability of the Chinese version of the patient-reported outcomes measurement information system adult profile-57 (PROMIS-57). Health and Quality of Life Outcomes, 20(1), 95.
Pan, T., Mulhern, B., Viney, R., Norman, R., Tran-Duy, A., Hanmer, J., et al. (2021). Evidence on the relationship between PROMIS-29 and EQ-5D: A literature review. Quality of Life Research, 31, 1–11.
Dewitt, B., Feeny, D., Fischhoff, B., Cella, D., Hays, R. D., Hess, R., et al. (2018). Estimation of a preference-based summary score for the patient-reported outcomes measurement information system: The PROMIS®-preference (PROPr) scoring system. Medical Decision Making, 38(6), 683–698.
Kang, D., Kim, Y., Lim, J., Yoon, J., Kim, S., Kang, E., et al. (2021). Validation of the Korean version of the patient-reported outcomes measurement information system 29 profile V2.1 among cancer survivors. Cancer Research and Treatment, 54, 10–19.
Chow, K. Y., Kurumety, S., Helenowski, I. B., Giri, S., & Sturgeon, C. (2021). Association between quality of life and patient-reported complications from surgery and radioiodine in early-stage thyroid cancer survivors: A matched-pair analysis. Surgery, 170, 462–468.
IsHak, W. W., Pan, D., Steiner, A. J., Feldman, E., Mann, A., Mirocha, J., et al. (2017). Patient-reported outcomes of quality of life, functioning, and GI/psychiatric symptom severity in patients with inflammatory bowel disease (IBD). Inflammatory Bowel Diseases, 23(5), 798–803.
Tang, E., Ekundayo, O., Peipert, J. D., Edwards, N., Bansal, A., Richardson, C., et al. (2019). Validation of the patient-reported outcomes measurement information system (PROMIS)-57 and -29 item short forms among kidney transplant recipients. Quality of Life Research, 28(3), 815–827.
McMullen, K., Bamer, A., Ryan, C. M., Schneider, J. C., Gibran, N., Stewart, B. T., et al. (2021). Validation of PROMIS-29 domain scores among adult burn survivors: A national institute on disability, independent living, and rehabilitation research burn model system study. Journal of Trauma and Acute Care Surgery, 92, 213–222.
van Balen, E. C., Haverman, L., Hassan, S., Taal, L. M., Smit, C., Driessens, M. H., et al. (2021). Validation of PROMIS Profile-29 in adults with hemophilia in the Netherlands. Journal of Thrombosis and Haemostasis, 19, 2687–2701.
Katz, P., Pedro, S., & Michaud, K. (2017). Performance of the patient-reported outcomes measurement information system 29-item profile in rheumatoid arthritis, osteoarthritis, fibromyalgia, and systemic lupus erythematosus. Arthritis Care & Research, 69(9), 1312–1321.
Khutok, K., Janwantanakul, P., Jensen, M. P., & Kanlayanaphotporn, R. (2021). Responsiveness of the PROMIS-29 scales in individuals with chronic low back pain. Spine, 46(2), 107–113.
Rawang, P., Janwantanakul, P., Correia, H., Jensen, M. P., & Kanlayanaphotporn, R. (2020). Cross-cultural adaptation, reliability, and construct validity of the Thai version of the patient-reported outcomes measurement information system-29 in individuals with chronic low back pain. Quality of Life Research, 29(3), 793–803.
Lai, J.-S., Beaumont, J. L., Jensen, S. E., Kaiser, K., Van Brunt, D. L., Kao, A. H., et al. (2017). An evaluation of health-related quality of life in patients with systemic lupus erythematosus using PROMIS and Neuro-QoL. Clinical Rheumatology, 36(3), 555–562.
Huang, W., Wu, Q., Zhang, Y., Tian, C., Huang, H., Huang, S., et al. (2022). Preliminary evaluation of the Chinese version of the patient-reported outcomes measurement information system 29-item profile in patients with aortic dissection. Health and Quality of Life Outcomes, 20(1), 94.
Rose, A. J., Bayliss, E., Huang, W., Baseman, L., Butcher, E., García, R. E., et al. (2018). Evaluating the PROMIS-29 v2.0 for use among older adults with multiple chronic conditions. Quality of Life Research, 27(11), 2935–2944.
Elsman, E. B. M., Roorda, L. D., Smidt, N., de Vet, H. C. W., & Terwee, C. B. (2022). Measurement properties of the Dutch PROMIS-29 v2.1 profile in people with and without chronic conditions. Quality of Life Research, 31(12), 3447–3458.
Fischer, F., Gibbons, C., Coste, J., Valderas, J. M., Rose, M., & Leplège, A. (2018). Measurement invariance and general population reference values of the PROMIS profile 29 in the UK, France, and Germany. Quality of Life Research, 27(4), 999–1014.
Coste, J., Rouquette, A., Valderas, J., Rose, M., & Leplège, A. (2018). The French PROMIS-29. Psychometric validation and population reference values. Revue d’Épidémiologie et de Santé Publique, 66(5), 317–324.
Rimehaug, S. A., Kaat, A. J., Nordvik, J. E., Klokkerud, M., & Robinson, H. S. (2021). Psychometric properties of the PROMIS-57 questionnaire, Norwegian version. Quality of Life Research, 31, 1–12.
Rencz, F., & Janssen, M. F. (2022). Analyzing the pain/discomfort and anxiety/depression composite domains and the meaning of discomfort in the EQ-5D: A mixed-methods study. Value in Health, 25(12), 2003–2016.
Rencz, F., Brodszky, V., & Janssen, M. F. (2023). A direct comparison of the measurement properties of EQ-5D-5L, PROMIS-29+2 and PROMIS global health instruments and EQ-5D-5L and PROPr utilities in a general population sample. Value in Health (in press).
HealthMeasures (2021). PROMIS International. Available from: https://www.healthmeasures.net/explore-measurement-systems/promis/measure-development-research/promis-international. Accessed August 25, 2022.
HealthMeasures (2021). Preference-based scores. Available from: https://www.healthmeasures.net/score-and-interpret/calculate-scores/preference-based-scores. Accessed August 25, 2022.
Buysse, D. J., Yu, L., Moul, D. E., Germain, A., Stover, A., Dodds, N. E., et al. (2010). Development and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments. Sleep, 33(6), 781–792.
Hanmer, J., Jensen, R. E., & Rothrock, N. (2020). A reporting checklist for HealthMeasures’ patient-reported outcomes: ASCQ-Me, Neuro-QoL, NIH Toolbox, and PROMIS. Journal of Patient-Reported Outcomes, 4(1), 1–7.
Ware, J. E., Snow, K. K., Kosinski, M., & Gandek, B. (1993). SF-36 health survey manual and interpretation guide. The Health Institute.
Nolte, S., Coon, C., Hudgens, S., & Verdam, M. G. (2019). Psychometric evaluation of the PROMIS® depression item bank: An illustration of classical test theory methods. Journal of Patient-Reported Outcomes, 3(1), 1–10.
Streiner, D. L., Norman, G. R., & Cairney, J. (2015). Health measurement scales: A practical guide to their development and use. Oxford University Press.
McHorney, C. A., & Tarlov, A. R. (1995). Individual-patient monitoring in clinical practice: Are available health status surveys adequate? Quality of Life Research, 4(4), 293–307.
Terwee, C. B., Bot, S. D., de Boer, M. R., van der Windt, D. A., Knol, D. L., Dekker, J., et al. (2007). Quality criteria were proposed for measurement properties of health status questionnaires. Journal of Clinical Epidemiology, 60(1), 34–42.
Revelle, W. (2017). psych: Procedures for psychological, psychometric, and personality research. Northwestern University, Evanston, Illinois. R package version 2.2.9. Available from: https://CRAN.R-project.org/package=psych Accessed December 31, 2022.
Nunnally, J. (1978). Psychometric theory (2nd ed.). McGraw-Hill.
Reise, S. P., Scheines, R., Widaman, K. F., & Haviland, M. G. (2013). Multidimensionality and structural coefficient bias in structural equation modeling: A bifactor perspective. Educational and Psychological Measurement, 73(1), 5–26.
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.
Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265–289.
Ten Berge, J. M., & Sočan, G. (2004). The greatest lower bound to the reliability of a test and the hypothesis of unidimensionality. Psychometrika, 69(4), 613–625.
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74(1), 137–143.
Reise, S. P., Moore, T. M., & Haviland, M. G. (2010). Bifactor models and rotations: Exploring the extent to which multidimensional data yield univocal scale scores. Journal of Personality Assessment, 92(6), 544–559.
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.
Kang, T., & Chen, T. T. (2008). Performance of the generalized S-X² item fit index for polytomous IRT models. Journal of Educational Measurement, 45(4), 391–406.
Choi, S. W., Gibbons, L. E., & Crane, P. K. (2011). lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software, 39(8), 1–30.
Evans, J. D. (1996). Straightforward statistics for the behavioral sciences. Thomson Brooks/Cole Publishing Co.
Hungarian Central Statistical Office (2016). Hungarian microcensuses 2016. Available from: https://www.ksh.hu/mikrocenzus2016/. Accessed August 25, 2022.
Weijters, B., & Baumgartner, H. (2012). Misresponse to reversed and negated items in surveys: A review. Journal of Marketing Research, 49(5), 737–747.
van Sonderen, E., Sanderman, R., & Coyne, J. C. (2013). Ineffectiveness of reverse wording of questionnaire items: Let’s learn from cows in the rain. PLoS ONE, 8(7), e68967.
Rupel, V. P., & Ogorevc, M. (2020). EQ-5D-5L Slovenian population norms. Health and Quality of Life Outcomes, 18(1), 1–12.
Liegl, G., Petersen, M., Groenvold, M., Aaronson, N., Costantini, A., Fayers, P., et al. (2019). Establishing the European norm for the health-related quality of life domains of the computer-adaptive test EORTC CAT Core. European Journal of Cancer, 107, 133–141.
OECD/European Observatory on Health Systems and Policies (2019). Hungary: Country Health Profile 2019, State of Health in the EU. OECD Publishing, Paris/European Observatory on Health Systems and Policies, Brussels.
Janssen, M., Szende, A., Cabases, J., Ramos-Goñi, J. M., Vilagut, G., & König, H.-H. (2019). Population norms for the EQ-5D-3L: A cross-country analysis of population surveys for 20 countries. The European Journal of Health Economics, 20(2), 205–216.
Terwee, C. B., Crins, M. H., Roorda, L. D., Cook, K. F., Cella, D., Smits, N., et al. (2021). International application of PROMIS computerized adaptive tests: US versus country-specific item parameters can be consequential for individual patient scores. Journal of Clinical Epidemiology, 134, 1–13.
Paz, S. H., Spritzer, K. L., Morales, L. S., & Hays, R. D. (2013). Evaluation of the patient-reported outcomes information system (PROMIS®) Spanish-language physical functioning items. Quality of Life Research, 22(7), 1819–1830.
Hay, J. W., Gong, C. L., Jiao, X., Zawadzki, N. K., Zawadzki, R. S., Pickard, A. S., et al. (2021). A US population health survey on the impact of COVID-19 using the EQ-5D-5L. Journal of General Internal Medicine, 36(5), 1292–1301.
Lingelbach, K., Piechnik, D., Gado, S., Janssen, D., Eichler, M., Hentschel, L., et al. (2021). Effects of the COVID-19 pandemic on psychological well-being and mental health based on a German online survey. Frontiers in Public Health, 9, 915.
Didriksen, M., Werge, T., Nissen, J., Schwinn, M., Sørensen, E., Nielsen, K. R., et al. (2021). Impact of COVID-19 pandemic on sleep quality, stress level and health-related quality of life—a large prospective cohort study on adult Danes. International Journal of Environmental Research and Public Health, 18(14), 7610.
Bonichini, S., & Tremolada, M. (2021). Quality of life and symptoms of PTSD during the COVID-19 lockdown in Italy. International Journal of Environmental Research and Public Health, 18(8), 4385.
Ferreira, L. N., Pereira, L. N., da Fé Brás, M., & Ilchuk, K. (2021). Quality of life under the COVID-19 quarantine. Quality of Life Research, 30(5), 1389–1405.
Rencz, F., Tamási, B., Brodszky, V., Ruzsa, G., Gulácsi, L., & Péntek, M. (2020). Did you get what you wanted? Patient satisfaction and congruence between preferred and perceived roles in medical decision making in a Hungarian national survey. Value in Health Regional Issues, 22, 61–67.
Reise, S. P., Du, H., Wong, E. F., Hubbard, A. S., & Haviland, M. G. (2021). Matching IRT models to patient-reported outcomes constructs: The graded response and log-logistic models for scaling depression. Psychometrika, 86(3), 800–824.
Smits, N., Öğreden, O., Garnier-Villarreal, M., Terwee, C. B., & Chalmers, R. P. (2020). A study of alternative approaches to non-normal latent trait distributions in item response theory models used for health outcome measurement. Statistical Methods in Medical Research, 29(4), 1030–1048.
Wall, M. M., Park, J. Y., & Moustaki, I. (2015). IRT modeling in the presence of zero-inflation with application to psychiatric disorder severity. Applied Psychological Measurement, 39(8), 583–597.
Acknowledgements
The authors wish to thank Istvan Mucsi for supporting this study.
Funding
Open access funding provided by Corvinus University of Budapest. Balázs Jenei and Ariel Z. Mitev were supported by the Higher Education Institutional Excellence Program 2020 of the Ministry of Innovation and Technology in Hungary in the framework of the 'Financial and Public Services' research project (TKP2020-IKA-02) at the Corvinus University of Budapest. The data collection was supported by the Hungarian Academy of Sciences (MTA-PPD 462025). Alex Bató’s work was supported by the ÚNKP-21-3 New National Excellence Program of the Ministry for Innovation and Technology from the source of the National Research, Development and Innovation Fund (ÚNKP-21-3-I-SE-78). Fanni Rencz’s work was supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences (BO/00304/21) and the New National Excellence Program of the Ministry for Innovation and Technology from the source of the National Research, Development and Innovation Fund (ÚNKP-22-5-CORVINUS-4).
Ethics declarations
Conflict of interest
The authors have not disclosed any competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jenei, B., Bató, A., Mitev, A.Z. et al. Hungarian PROMIS-29+2: psychometric properties and population reference values. Qual Life Res 32, 2179–2194 (2023). https://doi.org/10.1007/s11136-023-03364-7
DOI: https://doi.org/10.1007/s11136-023-03364-7