Introduction

Patient-centeredness has been increasingly recognized as a crucial aspect of patient care [1]. With an increased emphasis on patient-centered care, health-related quality of life (HRQL) that permits patient self-report is assuming a more prominent role as an important endpoint and a major treatment indicator in both research and clinical practice [2]. For an average of 8.5 years [3], patients with uterine fibroids (UFs) experience a high symptom burden, including moderate to severe abdominal pain [4], low back pain, urinary frequency and urgency, pain during intercourse [4], and vaginal bleeding [5], all of which negatively impact physical and social activities, work productivity, and quality of life (QoL) [6, 7]. Importantly, the lifetime risk of UFs for women over 45 is up to 60% [8]. Given the high prevalence among women, UFs impose a significant health care burden on women's individual health [9] as well as a burden on the health care and social security systems as a result of work productivity loss during treatment and disease recurrence [10].

As one of the direct clinical outcome assessments [11], patient-reported outcome (PRO)-based symptom and QoL measures serve as reliable approaches for patients with UFs throughout the full course of diagnosis [12], treatment [13], and disease management [14]. Consequently, measuring UF-related symptoms and QoL status in a valid and reliable manner could support high-quality practice and comprehensive patient management for patients of diverse cultures.

The uterine fibroid symptom and quality of life (UFS-QoL) English questionnaire was published in 2002 as the only questionnaire designed to assess the whole spectrum of fibroid-related symptoms and their impact on QoL [15]. The UFS-QoL questionnaire has been used in Brazilian, Portuguese [16, 17], Spanish [18], traditional Chinese [19], and simplified Chinese [20] as a disease-specific measure of health-related QoL. It has been shown to be sensitive to treatment-related changes [18], including in its 4-week recall version in Western culture [21]. In existing UFS-QoL validation studies in Chinese populations, self-consciousness has the lowest Cronbach's α [20], possibly due to existing adaptive barriers in China. QoL is a subjective and multidimensional concept based on the individual's perception of the position of their life in the cultural context and value systems in which they reside, in relation to their goals, expectations, standards, and concerns [16]. In the evaluation of QoL, patient-related concepts that vary from person to person are considered [22]. However, social, linguistic, and cultural differences necessitate proper cultural adaptation.

The two Chinese validation studies only demonstrated that the UFS-QoL questionnaire can identify disease and symptoms in patients by comparing women with UFs and healthy women in a cross-sectional analysis [20, 23]. However, a few studies reported that PROs play an important role as clinical tools for clinical application and disease monitoring, such as differentiating disease severity and demonstrating symptom recovery and alleviation trajectory over time, not only screening for UFs. The responsiveness of the UFS-QoL, which is one of the major characteristics of Food and Drug Administration (FDA)-reviewed PRO instruments, must be further validated to determine its efficacy. Therefore, large longitudinal population-based studies are required to further evaluate the clinical utility of UFS-QoL.

The availability of longitudinal UFS-QoL data from the largest cohort of patients with UFs to date allows us to demonstrate the measurement properties of the UFS-QoL when it is applied longitudinally in clinical settings. Using internal consistency, convergent validity, known-group validity, and concurrent validity (correlation between UFS-QoL and SF-36), the current study aimed to demonstrate the applicability and adaptability of the UFS-QoL in evaluating treatment efficacy. We also evaluated responsiveness, that is, the ability of the UFS-QoL to detect change with treatment, by comparing effect sizes before and after the major UF treatment modalities.

Methods

Data on uterine fibroids were extracted for a secondary analysis from a 20-center prospective cohort study (Uterine fibroids multicentre network information system: www.hifuctr.com), which included patients who received self-selected hysterectomy, myomectomy, or High-Intensity Focused Ultrasound (HIFU) therapy after being fully informed of the treatment options. The multicentre study was approved by a China-registered clinical trial ethics committee (ChiECRCT-2011034). Details regarding the study's design, data collection, and primary outcomes regarding the efficacy of the treatment have been published [24]. Prior to undergoing any study-related procedures at the clinical site, patients filled out the UFS-QoL questionnaire, the Medical Outcomes Study Short Form-36 (SF-36), and a brief sociodemographic questionnaire. Follow-up visits were scheduled at 6 months and 12 months post-procedure. At each visit, complications, magnetic resonance imaging evaluation, overall treatment effect evaluation, the UFS-QoL questionnaire (only for those who had undergone HIFU or myomectomy, because the instructions of the UFS-QoL questionnaire are based on the presence of uterine fibroids and menstrual periods), the SF-36, and several health care utilization items were recorded.

Questionnaires utilized in the study

Uterine fibroid symptom and quality of life questionnaire (UFS-QoL).

The UFS-QoL was developed from focus groups of women with uterine fibroids [15, 20]. The UFS-QoL questionnaire consists of 37 items, 8 of which assess the severity of symptoms (single domain) and 29 of which assess health-related quality of life (HRQL) in 6 subscales (concern, activities, energy/mood, control, self-consciousness, and sexual function). All items are answered on a five-point Likert scale. A higher score on the symptom severity subscale indicates more severe symptoms, while a lower score on the HRQL subscales indicates poorer QoL.

Medical Outcomes Study Short Form 36 (SF-36).

The SF-36 is a 36-item self-administered generic measure used to assess general health status, and its cross-cultural applicability, reliability, and validity have been established in Chinese populations [25,26,27,28]. The SF-36 consists of eight subscales (physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional, and mental health) as well as two composite scores, the physical and mental component scores. Individual subscale items are combined to form a subscale rating, which is then converted to a 0–100 scale [29, 30]. Higher scores indicate better QoL, and the recall period is four weeks [30]. Reference values derived from a healthy population and distributed by age and gender are available. We estimated Spearman's correlation between the UFS-QoL and SF-36 to assess concurrent validity.

Statistical analysis

Descriptive analyses (mean and SD) were performed on sociodemographic and clinical characteristics. Mean differences and their 95% confidence intervals (CIs) were estimated, and statistical significance (P < 0.05) was tested with independent-samples t tests.

Cronbach's α coefficient was applied to determine the internal consistency of the UFS-QoL subscales. The Cronbach's α coefficient, which ranges from 0 to 1, is used to determine the degree to which the items of each of the six HRQL subscales relate to the same concept. A greater value indicates a smaller measurement error and therefore a higher level of reliability.
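As an illustration of the internal consistency calculation described above, the following is a minimal sketch of the standard Cronbach's α formula, α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). It is not the study's SAS code; the function name `cronbach_alpha` and the example responses are hypothetical and serve only to show the arithmetic.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one subscale.

    item_scores: list of respondents, each a list with one score per item
    (hypothetical layout chosen for this sketch).
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("at least two items are needed")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the summed subscale score
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Made-up responses of five patients to a three-item subscale (1-5 Likert)
    responses = [
        [1, 2, 1],
        [3, 3, 4],
        [2, 2, 3],
        [5, 4, 5],
        [4, 5, 4],
    ]
    print(round(cronbach_alpha(responses), 3))
```

When every item ranks respondents identically, the summed-score variance dominates the item variances and α approaches 1; weakly related items pull α down.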

To examine convergent validity between items and subscales in our study, we employed principal axis factor analysis with orthogonal rotation and determined the final number of factors based on their eigenvalues, congruence, and clinical significance. The Kaiser–Meyer–Olkin (KMO) test was used to assess sampling adequacy, and a KMO value > 0.5 indicated acceptable structural validity. A confirmatory factor analysis was conducted with structural equation modeling (SEM) and orthogonal (intercorrelated) factors. Model fit was evaluated by examining the relative chi-square (chi-square/degrees of freedom), the root mean square error of approximation, the goodness of fit index, the standardized root mean square residual, the normed fit index, the Tucker Lewis index, and the comparative fit index. The expected factor loadings for each item were ≥ 0.40 [31].

Concurrent validity refers to the correlation between an instrument and another instrument that measures a related but not identical concept, and it is evaluated by testing a priori hypotheses about that correlation [32]. Both scales were designed to measure QoL, so Spearman's correlation was used to determine the strength of the UFS-QoL's correlation with the SF-36. Similar UFS-QoL and SF-36 subscales were correlated. For instance, the SF-36 "physical functioning" domain was compared with the UFS-QoL "activity" domain, and the SF-36 "role-emotional" domain was compared with the UFS-QoL "energy/mood" domain.
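The correlation step above can be sketched as follows. Spearman's coefficient is the Pearson correlation of the rank-transformed scores, with tied values receiving their average rank. This is a hedged, standard-library illustration rather than the analysis code used in the study; the function names and the example scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation between two paired score lists."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up paired subscale scores for six patients
    sf36_physical_functioning = [55, 70, 85, 90, 60, 75]
    ufs_qol_activity = [40, 65, 60, 95, 50, 80]
    print(round(spearman_rho(sf36_physical_functioning, ufs_qol_activity), 3))
```

Because only ranks enter the calculation, the coefficient is unaffected by any monotone rescaling of either questionnaire's scores.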

We examined known-group validity, that is, whether the UFS-QoL can distinguish between clinically distinct groups, by testing its ability to differentiate between patients based on health status. Patients were considered to have a poor general health status if they responded "fair" or "poor" to the SF-36–1 question "In general, would you say your health is"; otherwise, the health status was presumed good.

Using general linear models, the ability to detect change was evaluated by comparing pre-treatment scores with post-treatment scores at the 6-month and 12-month follow-up visits. Effect size (change in mean score divided by the baseline standard deviation) [33] and standardized response mean (mean change in score divided by the standard deviation of the change) were computed. A value of 0.2 was considered a "small" effect, 0.5 a "moderate" one, and ≥ 0.8 a "large" one.
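The two responsiveness indices defined above can be written out as a short sketch. It assumes paired baseline and follow-up scores from the same patients; the function names, the example scores, and the handling of values that fall between the stated thresholds (each value takes the label of the highest threshold it reaches) are assumptions made for illustration only.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Change in mean score divided by the baseline standard deviation."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean of the individual changes divided by the SD of those changes."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def magnitude_label(value):
    """Label by absolute size using the 0.2 / 0.5 / 0.8 thresholds.

    Assumption of this sketch: a value between two thresholds takes the
    label of the lower one, and values under 0.2 are 'below small'.
    """
    size = abs(value)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


if __name__ == "__main__":
    # Made-up HRQL subscale scores for five patients
    baseline = [45, 60, 50, 70, 55]
    month_12 = [55, 75, 52, 85, 66]
    es = effect_size(baseline, month_12)
    srm = standardized_response_mean(baseline, month_12)
    print(round(es, 2), magnitude_label(es))
    print(round(srm, 2), magnitude_label(srm))
```

The two indices differ only in the denominator: the effect size scales the change by between-patient spread at baseline, whereas the SRM scales it by how consistently patients changed.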

The questionnaires were scored in accordance with the developers' instructions. Version 9.1.3 of SAS was used to conduct analyses. All statistical tests were predetermined, and no missing data imputations were performed. All statistical tests were conducted with a fixed type I error probability of 0.05 and a two-tailed design [29].

Results

Psychometric characteristics

To assess internal consistency, Cronbach's alphas were calculated for each UFS-QoL questionnaire subscale. The UFS-QoL questionnaire demonstrated good internal consistency and reliability in all subscales (> 0.7) at baseline and follow-up (6 and 12 months) except for self-consciousness (0.5–0.62) (Table 1). We also calculated the internal consistency of self-consciousness stratified by age (< 45 vs. ≥ 45), highest educational level (below junior school vs. above junior school), annual family income (< 50,000 vs. ≥ 50,000), and number of pregnancies (1 vs. > 1), which all depicted a Cronbach’s α < 0.7 (Additional file 1: Table S1).

Table 1 Internal consistency reliability of UFS-QOL subscales

Factor analysis was used to determine the relative importance of UFS-QoL components. In factor analysis, a KMO value > 0.7 and a statistically significant Bartlett's test indicate adequate sampling [34, 35]. We used orthogonal rotation to isolate the potential UFS-QoL factors (Table 2). In this study, the value of KMO was 0.954, and the value of Bartlett's sphericity test was 2,536.26 (P < 0.001), indicating that factor analysis was suitable for the data. The UFS-QoL retained six factors, which together explained 63.61% of the total variance. The eigenvalues of the factors were 11.68, 1.75, 1.61, 1.20, 1.13, and 1.07, and the corresponding percentages of variance were 40.27, 6.04, 5.54, 4.15, 3.91, and 3.07. Five items (12, 14, 16, 19, and 29) had a factor loading greater than 0.40 on multiple factors, and two items (26 and 27) did not load above 0.40 on any factor. Additional file 1: Table S2 displays the distribution of items within each factor, the comparison with the subscales of the original questionnaire, and the Chinese version of the original validation. For the six-factor confirmatory factor analysis model, a comparative fit index of 0.842, a Tucker Lewis index of 0.894, and a root mean square error of approximation of 0.077 indicated adequate model fit (Additional file 1: Table S3).

Table 2 Construct validity of the HRQL: baseline factor loadings of the HRQL items (N = 2411)

Floor or ceiling effects were considered present if > 15% of respondents achieved the lowest or highest possible score, respectively [36]. As shown in Additional file 1: Table S4, ceiling effects in the six subscales ranged from 5.14% to 16.96%, with self-consciousness (15.18%) and sexual functioning (16.96%) both exceeding 15%. Floor effects varied between 0.02% and 0.5%.
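The 15% criterion above amounts to a simple proportion check, sketched below. The 0–100 score range for the subscale, the function names, and the example scores are assumptions made for this illustration and are not taken from the study's scoring procedure.

```python
def floor_ceiling_percent(scores, min_score=0, max_score=100):
    """Percentage of respondents at the lowest and highest possible score.

    min_score and max_score are assumed bounds of the transformed subscale.
    """
    n = len(scores)
    floor_pct = 100 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100 * sum(s == max_score for s in scores) / n
    return floor_pct, ceiling_pct


def effect_present(percent, threshold=15.0):
    """An effect is flagged when more than 15% of respondents hit the bound."""
    return percent > threshold


if __name__ == "__main__":
    # Made-up subscale scores for ten patients
    scores = [100, 75, 100, 50, 25, 90, 0, 60, 80, 70]
    floor_pct, ceiling_pct = floor_ceiling_percent(scores)
    print(floor_pct, effect_present(floor_pct))
    print(ceiling_pct, effect_present(ceiling_pct))
```

Applied to the percentages reported above, this rule flags 15.18% and 16.96% but not 5.14%.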

Concurrent validity was assessed by the degree of correlation between similar subscales of the UFS-QoL and SF-36 (Table 3). The "physical functioning" domain of the SF-36 correlated positively and moderately with the "activity" domain of the UFS-QoL (r: 0.33–0.4, P < 0.001). The "role-emotional" domain of the SF-36 had a moderately positive correlation with the "energy/mood" domain (r: 0.35–0.43, P < 0.001). Similarly, the "role-physical" subscale of the SF-36 had a moderate correlation with the "control" subscale of the UFS-QoL (r: 0.3–0.4, P < 0.001).

Table 3 Relationship of UFS-QoL Subscale Scores and SF-36*

UFS-QoL was sensitive enough to detect varying levels of current health status, particularly six and twelve months after surgery (Table 4). Patients with poor general health status (those who rated "In general, would you say your health" as fair or poor on the SF-36-1) had statistically significantly higher severity for all symptoms and poorer QoL than those with good general health status (those who rated "In general, would you say your health" as excellent, very good, or good) (all P < 0.0001), and all effect sizes > 0.5 indicated a moderate effect 6 and 12 months after surgery.

Table 4 Known-group validity: comparison of UFS-QOL symptom and quality of life scores and known health status

Except for self-consciousness at 6 and 12 months after surgery, UFS-QoL subscales displayed a good ability to detect change in general (Table 5). After treatment, there was a significant decrease in symptom severity scores and an improvement in HRQL subscale scores. Mean score change from baseline to 12-month follow-up for symptom severity was − 11.5 (P < 0.001), with an effect size of − 0.81 and a standardized response mean (SRM) of − 0.81 (Table 5). The mean change in score for the HRQL subscales ranged from 7.20 (sexual function) to 11.70 (concern), with effect sizes ranging from 0.38 (self-consciousness) to 0.55 (concern) and SRM ranging from 0.38 (self-consciousness) to 0.6 (concern) at 6 months after treatment. The self-consciousness subscale exhibited the lowest effect size (0.38), as well as the lowest SRM (0.38).

Table 5 Responsiveness of women with uterine fibroids after treatment

Discussion

This is the first study from the largest cohort of patients with UFs to evaluate the adaptability and clinical applicability of the UFS-QoL in comparative effectiveness research involving clinical manifestations with varying severity. It further evaluated the UFS-QoL's capacity to produce valid and consistent results. Our study revealed that the Chinese version of the UFS-QoL requires further modification for cross-cultural adaptation, and that ongoing efforts in the management of uterine fibroids in China must be expanded to reduce disparities in individual symptom burdens. This study will aid clinicians and researchers in selecting a suitable instrument for measuring quality of life and symptom burden in Chinese patients with uterine fibroids.

In this analysis, we found that some items did not load adequately on a single factor (loading ≤ 0.40 on every factor, or > 0.40 on more than one factor) in the factor analysis, which, according to the decision rules for item reduction in development studies [15], resulted in differences in the factor subscales from the original subscales. These differences may be attributable to cultural context and language system differences between China and other countries. This study and others have demonstrated that "Caused you embarrassment?" belongs to the concern subscale, not the activities subscale. Meanwhile, "Made you feel anxious about the unpredictable onset or duration of your periods?" belongs to activities unrelated to menstruation. Some Chinese items that loaded on both the energy and activity subscales, such as "Item 19: Made you feel it was difficult to carry out your usual activities," showed inadequate item discrimination. China has a complex population structure and a large gap between the rich and the poor, both of which have varied effects on cross-cultural tests [16]. Previous validations in Chinese populations [19, 20, 23] lacked cognitive debriefing or linguistic validation, as well as cross-cultural tests to eliminate cultural differences. Before the UFS-QoL is applied to Chinese populations, the following cross-cultural test criteria should be considered: adjusting the subscales to the Chinese culture, shortening or deleting items with poor discrimination to ensure the validity of the scale, and further optimizing and reevaluating the practicability of these items.

The FDA states that a PRO questionnaire should measure what it is intended to measure, and it is assumed that cross-cultural adaptation will yield an equivalent measure [37]. Self-consciousness had the lowest Cronbach's α and the lowest responsiveness, together with a ceiling effect. This subscale contains three questions: (1) Made you feel self-conscious of weight gain? (2) Made you feel conscious about the size and appearance of your stomach? (3) Affected the size of clothing you wear during your periods? Our findings indicate that these three questions may not be sensitive enough to assess the concept of interest, self-consciousness, in Chinese patients with UFs, despite their content validity in Western populations [15, 17]. When applying the UFS-QoL to Chinese populations, it is necessary to modify the item definitions and to reassess reliability and sensitivity by incorporating cognitive debriefing, focus groups and/or committees, Rasch measurement theory analysis, and traditional psychometrics.

Sexual health and health-related QoL may be phenomenologically related [38]. However, the sexual function subscale demonstrated poor responsiveness and a significant ceiling effect, which may be attributable to differences in study design and culture. Open discussion of sexuality in mainland China remains controversial, and research in this area is relatively new [39], which casts doubt on the authenticity of responses to certain items. To determine how uterine fibroids affect sexual activity in Chinese populations, it is necessary to adapt most of the sex QoL questionnaires validated in China, from frequency and level subscales to the evaluation of sexual functioning [40].

Our study's strengths include a sufficient sample size and the incorporation of multiple treatments, thereby validating the Chinese version of the UFS-QoL for myomectomy, HIFU, and hysterectomy. Another strength of our study is the finding that the self-consciousness domain requires additional research into cultural adjustment, as it had the lowest Cronbach's α (0.56), effect size (0.38), and SRM (0.38). This study had a few limitations. First, we evaluated responsiveness and comparative efficacy using effect size instead of a P value, yielding some small effect results. Second, there was no follow-up for the UFS-QoL questionnaire in the hysterectomy group, as the questionnaire's instructions are contingent on the presence of menstrual periods. Third, we did not include healthy women in the analysis because the purpose of the clinical application was not to diagnose and differentiate patients with uterine fibroids from women without myoma; rather, we focused on the clinical evaluation, including the severity of the illness.

Conclusion

In the Chinese version of the UFS-QoL, the symptom, activity, and mood interference subscales were culturally appropriate and reliable. However, the self-consciousness domain requires additional research on cultural adaptation, such as cognitive debriefing of how Chinese populations interpret the questions. This study provides clinicians and researchers with more specific psychometric evidence for selecting an appropriate instrument and demonstrates the benefit of accurately assessing the quality of life and symptom burden in Chinese patients with uterine fibroids. The results of the self-consciousness subscale of the UFS-QoL should be interpreted with caution when evaluating the quality of life of patients with uterine fibroids.