Background

Urinary incontinence (UI) is the objectively demonstrable involuntary leakage of urine [1]. Sneezing and coughing can induce leakage, which causes embarrassment, anxiety, and depression in most patients. Frequent leakage and the resulting unpleasant smell may keep patients from socialising and can even cause sleep disorders, which in turn can lead to psychological illness [2]. UI is common and affects a wide range of people, influencing the health of patients and the lives of their families. Studies in the United States, the United Kingdom, and Sweden have shown that the incidence of UI is as high as 46% in males and 68% in females [3]. In Chinese women, the reported incidence ranges from 8.7% to 69.8%, representing 43–349 million women [4]. UI can cause sexual dysfunction, and 83.45% of patients were dissatisfied with their sex lives [5]. Patients with UI also experience decreased quality of life [6]. UI not only adversely affects patients and their families but also increases the disease burden on society [7].

UI treatment includes conservative treatment (e.g. appropriate fluid intake, weight loss, smoking cessation, and rehabilitation training), physical devices, medication, and surgery [8, 9]. Compared with surgical treatments, which are associated with substantial trauma and high costs, conservative treatments are effective, safe, and acceptable and are considered the mainstay of UI treatment [9]. The International Urinary Control Association recommends pelvic floor muscle and bladder training as the first-line treatment for patients with UI, affirming the role of such training in improving UI [10]. Studies have demonstrated that pelvic floor muscle and bladder exercises are effective training methods. Most evidence has shown that pelvic floor muscle training combined with bladder training is more effective than pelvic floor muscle training alone [8, 11]. Pelvic floor muscle training (PFMT) refers to patients consciously and voluntarily contracting their pelvic floor muscles, mainly the pubococcygeus muscle group [12, 13]. Bladder training (BT) refers to patients urinating at prescribed times and gradually lengthening the intervals between urination to increase their bladder capacity and enhance their control of bladder function [14, 15]. These exercises are simple, easy to perform, and suitable for patients capable of autonomous training [16, 17], but compliance is often problematic [18]. Compliance with pelvic floor muscle and bladder training plays an important role in improving pelvic floor muscle and bladder function and has been shown to be a main predictor of the exercises’ long-term effectiveness [17]. Compliance is the extent to which a patient’s behaviour follows the doctor’s advice regarding the treatment and prevention of disease [19].

In this study, compliance is defined as the degree to which patients follow the advice of doctors, therapists, and nurses, expressed as the degree of consistency in the frequency, duration, and initiative of pelvic floor muscle training and bladder training. Training compliance is influenced by multiple factors and requires patients’ active participation [18]. Poor training compliance, for example in patients who forget to complete training, can lead to minimal perceived benefits [18, 20]. Therefore, it is crucial to develop a training compliance scale for patients with UI. Such a scale can assess training compliance in patients with UI, which can help doctors to predict the efficacy of training and patients’ recovery. If the assessment shows that a patient’s training compliance is poor, additional strategies should be used to support the patient’s recovery.

A systematic search of the published literature shows that there is a large body of research on pelvic floor muscle training [21] and bladder training [8]. However, most of these studies have focused on the mechanisms [22], treatment methods [23], and treatment effects [24] of pelvic floor muscle training for UI in women. Some studies have focused on the treatment effects of PFMT combined with BT for UI in women [11]. Few studies have reported on PFMT and BT compliance [18, 25], and we did not find any studies that developed a training compliance scale, although several studies have used a pelvic floor muscle exercise self-efficacy scale for female patients, such as Chen’s pelvic floor muscle exercise self-efficacy scale [26,27,28]. To the best of our knowledge, no study has developed a compliance scale for PFMT combined with BT for patients with UI. Therefore, this study aimed to develop a training compliance scale for patients with UI and evaluate its validity and reliability. Our findings may provide guidance on assessing the training compliance of patients with UI, which would help to increase their compliance with pelvic floor muscle and bladder training, improve their quality of life, and promote their recovery.

Methods

Study design and participants

This study developed and evaluated a training compliance scale for patients with UI in three steps: 1) establishment of the item pool and development of the scale; 2) evaluation of the validity of items, as well as the validity and reliability of the scale; and 3) exploratory factor analysis and confirmatory factor analysis.

There were 10 participants in the group discussions: one urological surgeon and one gynaecologist with senior professional titles, one urological surgeon and one gynaecologist with medium-grade professional titles, two nurses with senior professional titles, two nurses with medium-grade professional titles, and two nurses with master’s degrees in nursing science. All members met the following inclusion criteria: 1) being a doctor or nurse working in urology or gynaecology; 2) holding a bachelor’s degree with at least 5 years of work experience, or holding a master’s degree and being familiar with the field; and 3) volunteering to participate in the study.

A team of 22 experts participated in a Delphi survey. The inclusion criteria for these experts were as follows: 1) having at least a bachelor’s degree; 2) having at least 10 years’ work experience; 3) having a job level of associate senior or above; 4) being a physician, rehabilitation therapist, or nurse engaged in the diagnosis and treatment of UI or UI rehabilitation; and 5) being willing to participate in the consultation and able to complete the consultation on time. The team members were two chief nurses in charge of the daily management of nursing, eight chief doctors in charge of the management of medical practices, and 12 experts from UI-related departments (urology, obstetrics and gynaecology, and rehabilitation departments).

Convenience sampling was used to enrol patients diagnosed with UI who were referred to the outpatient departments of two tertiary 3A hospitals in Hainan, China for re-examinations between December 2020 and July 2021. The inclusion criteria for these patients were as follows: 1) a diagnosis of UI, overflow incontinence, UI following prostatectomy, or UI following in situ ileal neobladder [29,30,31], and the presence of UI as the prominent disease; 2) an age of at least 18 years; 3) being fully conscious and able to express themselves; 4) willingness to volunteer for the study; and 5) having received guidance on recovery training. Patients who had previously participated in similar studies were excluded.

Scale development

First, a literature analysis was performed to develop a training compliance scale for patients with UI. The Web of Science, PubMed, CNKI, and WANFANG databases were searched for relevant literature before October 25, 2020. The search terms were as follows: (‘urinary incontinence’) AND (‘pelvic floor muscle training (PFMT)’ OR ‘bladder training’ OR ‘bladder exercise’) AND (‘compliance’ OR ‘adherence’) AND (‘scale’ OR ‘gauge’ OR ‘questionnaire’ OR ‘questionnaire survey’ OR ‘compliance scale’ OR ‘compliance gauge’). The corresponding Chinese terms were used as the search terms in the Chinese-language databases (CNKI and WANFANG).

A priori eligibility criteria identified in the protocol were used to identify studies for inclusion. These criteria were as follows: (1) being conducted among adults aged at least 18 years with a diagnosis of UI; and (2) being related to rehabilitation training compliance in patients with UI. After a comprehensive review of the relevant literature on pelvic floor muscle training [2, 26, 32], bladder training [33,34,35,36,37], and voiding diaries [38,39,40], 25 candidate items were identified for the scale. Please see Table 1.

Table 1 Items in the item pool and their sources

Second, group discussions were held to modify the scale items. The group discussions were held three times, once a week, with a duration of an hour each. A round-table format was used, and the meeting was conducted in the urology conference room, chaired by a physician with a senior title from the discussion group. After the group discussions, 13 items were deleted or combined because of their similarity or redundancy to other items, and modifications were made to ensure that terms were used accurately and that the items were easy to understand. The scale consisted of 12 items after the group discussions. Please see Table 2.

Table 2 Changes to the scale based on the group discussions

Third, Delphi sessions were conducted using paper questionnaires and emails to consult with experts to screen and modify the items. The aim was to ensure that the scale was concise and understandable and to avoid redundant items. Each Delphi round took one week, with one week between rounds. The experts scored the importance and relevance of each item in the scale using a 5-point Likert scale (5 = very important, 4 = important, 3 = fair, 2 = unimportant, and 1 = highly unimportant). The experts could suggest the deletion or detailed modification of an item if they felt that the item was unnecessary or that the description was inaccurate. The experts could also add items or descriptions that had not been included in the scale. The resulting changes are shown in Table 3.

Table 3 Changes to the scale based on the expert consultations

Our research team members read the references and guidelines and extracted the three dimensions of the scale, namely pelvic floor muscle training, bladder training, and urination diary recording, because pelvic floor muscle training and bladder training methods are simple, easy to carry out, and economical. Items 1 to 5 of the scale were used to assess compliance with pelvic floor muscle training, items 6 to 10 were used to assess compliance with bladder training, and items 11 and 12 were used to assess urination diary recording.

The investigators in the study were 2 postgraduate nursing students and 5 nursing personnel. They were uniformly trained by an associate chief nurse in a urological surgery department. This training covered an introduction to the scale, procedures for obtaining informed consent from patients, distribution of the questionnaire, and points requiring attention when completing the scale. The questionnaire consisted of three parts: 1) basic data about the patients, including sex, age, educational level, type of UI, and frequency of urinary leakage; 2) the training compliance scale for patients with UI developed in this study; and 3) the pelvic floor muscle exercise self-efficacy scale developed by Chen et al. [26]. Electronic or paper questionnaires were distributed to the patients included in the study after obtaining their consent.

Reliability and validity

Reliability [41] refers to the consistency and stability of the results measured by a tool and reflects how dependable the tool (scale) is. The reliability of the scale was evaluated by its test–retest reliability and internal consistency reliability [42]. Test–retest reliability was used to assess the scale’s dependability [43]. To measure it, 30 patients with UI were re-assessed 2 weeks after the initial assessment, and the original questionnaires were then recovered for analysis. A test–retest coefficient above 0.7 is usually recognised as evidence that the scale is stable. Cronbach’s α coefficient was used to assess the internal consistency of the scale. In general, a Cronbach’s α of 0.8–0.9 is acceptable and > 0.9 is high [26]. For split-half reliability, the items in the scale were divided into two equal parts and the correlation between the parts was assessed, with a desired value above 0.8 [44].
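The internal consistency and split-half coefficients described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in this study: the odd/even split, the function names, and the small response matrix are all assumptions made for illustration, since the text does not state how the items were divided into halves.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of respondents, each a list of item scores."""
    k = len(scores[0])                          # number of items
    items = list(zip(*scores))                  # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(scores):
    """Odd/even split-half correlation with the Spearman-Brown correction.

    The odd/even split is an assumption; other ways of halving are possible.
    """
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)


# Hypothetical responses (5 respondents x 4 items, scored 1-5), not study data.
demo = [
    [5, 4, 5, 4],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(demo)
split_half = split_half_reliability(demo)
print(f"alpha = {alpha:.3f}, split-half = {split_half:.3f}")
```

The same `pearson_r` helper could be applied to the total scores from the first and second assessments to obtain a simple test–retest coefficient.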

Validity [41] indicates how well a tool measures what it is intended to measure. The content validity index (CVI) was calculated from the scores that the experts assigned to the items during the Delphi consultation. Each expert rated the relevance of each item to its content dimension on a 5-level scale (1 = very unessential, 2 = unessential, 3 = general, 4 = essential, and 5 = very essential). Items with a score of 4 or 5 were considered relevant to the content being measured [45]. The item-level CVI (I-CVI) was the number of experts who rated the item as relevant (i.e. score ≥ 4) divided by the total number of experts [45]. The scale-level CVI (S-CVI) was calculated as the average I-CVI across items. A CVI of ≥ 0.8 indicates excellent content validity. The content validity ratio (CVR) [46] was used to assess whether an item was essential for operationalising a construct. Using the same ratings, the CVR was calculated as CVR = (n_e − N/2) / (N/2), where n_e is the number of experts who rated the item as essential (i.e. score ≥ 4) and N is the total number of experts [45]. The CVR ranges from −1 (perfect disagreement) to 1 (perfect agreement); a CVR greater than zero means that more than half of the experts recognised the item as essential [47].
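As an illustration of these definitions, the following minimal sketch computes the I-CVI, the averaged S-CVI, and the CVR from expert ratings. The function names and the two sets of ratings are hypothetical and are not data from this study; only the panel size of 22 and the cut-off of 4 follow the text.

```python
def item_cvi(ratings, threshold=4):
    """I-CVI: share of experts who rate the item at or above the threshold."""
    return sum(r >= threshold for r in ratings) / len(ratings)


def scale_cvi(ratings_by_item, threshold=4):
    """S-CVI (averaging method): mean of the I-CVIs across all items."""
    cvis = [item_cvi(ratings, threshold) for ratings in ratings_by_item]
    return sum(cvis) / len(cvis)


def content_validity_ratio(ratings, threshold=4):
    """CVR = (n_e - N/2) / (N/2), where n_e counts 'essential' ratings."""
    n = len(ratings)
    n_essential = sum(r >= threshold for r in ratings)
    return (n_essential - n / 2) / (n / 2)


# Hypothetical ratings from a panel of 22 experts on the 1-5 scale.
item_a = [5] * 12 + [4] * 8 + [3] * 2   # 20 of 22 experts rate the item >= 4
item_b = [5] * 15 + [4] * 7             # all 22 experts rate the item >= 4

for name, ratings in (("item_a", item_a), ("item_b", item_b)):
    print(name, round(item_cvi(ratings), 2), round(content_validity_ratio(ratings), 2))
print("S-CVI", round(scale_cvi([item_a, item_b]), 2))
```

Because the CVR is a linear rescaling of the I-CVI (CVR = 2 × I-CVI − 1 when the same cut-off is used), the two indices always rank items in the same order.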

Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were used to evaluate the structural validity of the scale. EFA was used to examine whether the extracted common factors agreed with the hypothesised structure of the scale. In CFA, researchers first propose a hypothesised factor structure and then test whether the proposed model fits the data. A ratio of at least five cases (N) per estimated parameter (q), i.e. N:q = 5:1, has been described as adequate for EFA and CFA [45, 48], although this is a minimum recommendation. In this study, the sample size for EFA and CFA was set at 5–10 times the number of items, giving a minimum sample size of 12 items × 5 = 60. The total sample was therefore split into two parts: 61 samples for EFA and 62 samples for CFA. AMOS 23.0 (IBM Corp., Armonk, NY, USA) was used to examine whether the factor model constructed from EFA was a good fit for the data. The maximum-likelihood estimation method in AMOS with the covariance matrix generated in PRELIS was used to analyse this model. The chi-square test, relative chi-square (CMIN/DF), root mean square error of approximation (RMSEA), comparative fit index (CFI), normed fit index (NFI), Tucker–Lewis index (TLI, also known as the non-normed fit index), and incremental fit index (IFI) were used to examine the model fit [49].

The average variance extracted (AVE) was used to calculate the scale’s convergent validity. An AVE above 0.50 indicates suitable convergent validity [50]. Discriminant validity can be tested by comparing the square root of a factor’s AVE with the correlation of the specific factor with any of the other factors. When the square root of the AVE is greater than the correlation coefficient, it indicates acceptable discriminant validity [50].
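The two criteria above can be sketched as follows. This is an illustrative calculation only: it assumes that the AVE is taken as the mean of the squared standardised loadings of a factor, and the loadings, correlations, and function names are hypothetical rather than values from this study.

```python
from math import sqrt


def average_variance_extracted(loadings):
    """AVE as the mean of the squared standardised loadings of one factor."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def discriminant_validity_ok(ave_by_factor, correlations):
    """True if sqrt(AVE) of both factors exceeds each inter-factor correlation."""
    for (first, second), r in correlations.items():
        limit = min(sqrt(ave_by_factor[first]), sqrt(ave_by_factor[second]))
        if abs(r) >= limit:
            return False
    return True


# Hypothetical standardised loadings for the three dimensions (not study data).
loadings = {
    "PFMT": [0.85, 0.80, 0.78, 0.82, 0.88],
    "BT": [0.75, 0.81, 0.79, 0.84, 0.77],
    "diary": [0.90, 0.92],
}
ave = {name: average_variance_extracted(vals) for name, vals in loadings.items()}

# Hypothetical inter-factor correlations.
corr = {("PFMT", "BT"): 0.62, ("PFMT", "diary"): 0.48, ("BT", "diary"): 0.55}

convergent_ok = all(value > 0.50 for value in ave.values())
discriminant_ok = discriminant_validity_ok(ave, corr)
print(ave, convergent_ok, discriminant_ok)
```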

Criterion validity describes the correlation between an instrument and an established criterion [53]. This study chose Chen’s pelvic floor muscle exercise self-efficacy (PFMSE) scale as the criterion for the training compliance scale for patients with UI. The Cronbach’s α for the Chen PFMSE scale was 0.95, the test–retest reliability was 0.86, and good construct validity was reported [26]. The Chen PFMSE scale has been widely used to assess women’s pelvic floor muscle exercise adherence and confidence [10, 51, 52]. The Chen PFMSE scale and the scale developed in this study were administered to 30 patients, and the results were subjected to correlation analysis. A higher correlation coefficient indicates greater criterion-related validity.

Evaluation

The scale developed in this study was scored on a 5-point Likert scale. Every item was assigned a score of 5, 4, 3, 2, or 1 point, indicating ‘always’, ‘usually’, ‘sometimes’, ‘occasionally’, and ‘never’, respectively. The total score of the 12 items therefore ranged from 12 to 60 points. The evaluation criteria were as follows: a score of at least 80% of the maximum total score (i.e. ≥ 48 of 60 points) indicated high compliance, and a score below 80% of the maximum total score indicated poor compliance [54].
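The scoring rule can be written out as a short sketch. It assumes that the ‘total score of the scale’ in the criterion means the maximum possible score of 60, so that the cut-off is 48 points; the function and constant names are hypothetical.

```python
N_ITEMS = 12              # items in the scale
MAX_ITEM_SCORE = 5        # 'always' on the 5-point Likert scale
HIGH_COMPLIANCE_PCT = 80  # cut-off as a percentage of the maximum total score


def classify_compliance(item_scores):
    """Return (total score, 'high' or 'poor') for one completed questionnaire."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores")
    if any(score not in range(1, MAX_ITEM_SCORE + 1) for score in item_scores):
        raise ValueError("each item score must be an integer from 1 to 5")
    total = sum(item_scores)
    max_total = N_ITEMS * MAX_ITEM_SCORE
    # Integer comparison avoids floating-point rounding at the 48-point boundary.
    is_high = total * 100 >= HIGH_COMPLIANCE_PCT * max_total
    return total, "high" if is_high else "poor"


# Hypothetical respondents: one answering 'usually' throughout, one slightly lower.
print(classify_compliance([4] * 12))
print(classify_compliance([4] * 11 + [3]))
```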

Statistical analysis

SPSS 18.0 (SPSS Inc., Chicago, IL, USA) was used for statistical analysis, and two postgraduate nursing students checked the data independently. If any disagreements arose, the raw data were checked. Categorical data are described using frequencies and percentages, and continuous data are represented using means and standard deviations (SDs). Cronbach’s α was calculated for the dimensions and the scale. The Kappa coefficient and CVI were used for the items in the training compliance scale.

Kappa coefficients can be used to quantify the reliability of a scale [55]. The Kaiser–Meyer–Olkin (KMO) value was calculated, and Bartlett’s test of sphericity was conducted. The scale in this study and the pelvic floor muscle training self-efficacy scale developed by Chen were administered to 30 patients at the same time, and correlation analysis was conducted to obtain the criterion-related validity. The larger the correlation coefficient, the better the criterion-related validity. A two-sided P < 0.05 was considered statistically significant.

Results

Development of the scale

Three steps were taken to develop the scale, and the result of each step is reported below. During the first step, a literature review was used to build an item pool with 25 items for the scale. The literature review was conducted by a librarian responsible for teaching literature searching and by the researcher LLM. The details of the items and their sources are shown in Table 1.

During the second step, 10 staff members with extensive working experience in a urology department held three group discussions to modify the items in the scale. After deleting, combining, and modifying the items as needed, 12 items were retained. The details of the modifications made during this step are provided in Table 2.

During the third step, two Delphi rounds were conducted with 22 experts using paper questionnaires and email. No items were deleted during this step, but the experts gave comments to modify some of the items. The details of the modifications made during each round of this step are shown in Table 3.

Characteristics of the study participants

In total, 132 questionnaires were distributed during this study, and 123 valid questionnaires were recovered, yielding an effective recovery rate of 93.18%. The respondents were 88 males and 35 females, and their mean age was 58.57 ± 12.96 years (range 35–83 years). The demographic and clinical characteristics of the 123 participants are shown in Table 4. The patients’ scores on the three dimensions of the compliance scale are shown in Table 5, and their scores on the individual items of the rehabilitation training compliance scale are shown in Table 6.

Table 4 Demographic and clinical characteristics of the 123 participants
Table 5 The score of three dimensions of patients with urinary incontinence compliance scale
Table 6 The items’ score of patients with urinary incontinence rehabilitation training compliance scale

Reliability

Cronbach’s α was 0.95 for the overall scale, and the Cronbach’s α values for the three factors were 0.93 for pelvic floor muscle training compliance, 0.91 for compliance of bladder training, and 0.94 for urination diary recording. The test–retest reliability was 0.82–0.87 for the items and 0.86 for the full scale (P < 0.05). The split-half reliability of the scale was 0.89 (P < 0.05).

Content validity

The S-CVI was 0.93 and the I-CVI was 0.87–1.0, indicating that the scale had good content validity (Table 7). The Spearman–Brown coefficient was 0.89, indicating that the items of the scale had high homogeneity and that the scale had good internal consistency. The CVR was 0.92, indicating that the experts recognised all of the items in the scale as essential. The Kappa coefficient of the scale was 0.86, indicating good consistency among the items in the scale [56].

Table 7 Kappa coefficient, CVI and CVR of items in the rehabilitation training compliance scale for patients with UI

Construct validity

The KMO value was 0.90 and Bartlett’s test of sphericity was significant (χ2 = 851.130, df = 66, P < 0.001), indicating that factor analysis was appropriate for the data. EFA with varimax rotation yielded a three-factor solution that explained 85.99% of the variance in the data (Table 8), and the scree plot also supported three factors (Fig. 1). The common factors were generally in agreement with the structure hypothesised during the design of the scale, indicating that the structure of the scale was appropriate. No item had a factor loading below 0.40, and no items were removed; hence, the final scale comprised 12 items. The factor loadings after varimax rotation with three factors are shown in Table 9. The three factors were designated ‘pelvic floor muscle training compliance’ (5 items), ‘compliance of bladder training’ (5 items), and ‘urination diary recording’ (2 items). Following the identification of a three-factor solution by EFA, CFA was performed to test the structure of the scale. Goodness-of-fit indices were examined to determine the degree of fit between the data and the hypothesised model. The goodness-of-fit indices were as follows: χ2 = 134.964; df = 51; P < 0.001; CMIN/DF = 2.646; RMSEA = 0.116; CFI = 0.94; NFI = 0.91; TLI = 0.92; and IFI = 0.94 [49, 57].
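Two of the reported fit indices follow directly from the chi-square statistic, as the following minimal sketch shows. It uses the common point-estimate formula RMSEA = sqrt(max(χ2 − df, 0) / (df × (N − 1))); the sample size passed to the function is an assumption, because the text does not state which N the software used, and 123 (the number of valid questionnaires) is taken only for illustration. The function names are hypothetical and this is not the AMOS computation itself.

```python
from math import sqrt


def relative_chi_square(chi2, df):
    """CMIN/DF: the chi-square statistic divided by its degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


CHI2, DF = 134.964, 51   # chi-square and degrees of freedom reported in the text
N_ASSUMED = 123          # assumption: number of valid questionnaires

cmin_df = relative_chi_square(CHI2, DF)
rmsea_est = rmsea(CHI2, DF, N_ASSUMED)
print(f"CMIN/DF = {cmin_df:.3f}, RMSEA = {rmsea_est:.3f}")
```

Under these assumptions the sketch gives values that agree, to three decimal places, with the CMIN/DF and RMSEA reported above. Because N appears in the denominator, a smaller sample with the same χ2 and df would give a larger RMSEA.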

Table 8 Factor analysis: total variance explained
Fig. 1 Scree plot

Table 9 Factor loading of the scale after varimax rotation with three factors

Convergent and discriminant validity

The AVE of each factor and its square root were calculated, and the outcomes are shown in Table 10. The three factors (pelvic floor muscle training, bladder training, and urination diary recording) were significantly correlated with one another (P < 0.05), and each correlation coefficient was smaller than the square root of the corresponding AVE, indicating that the factors were related but distinct from one another. The scale therefore showed good convergent and discriminant validity.

Table 10 Convergent validity and discriminant validity

Criterion-related validity

The correlation coefficient between the scale and the Chen PFMSE scale was 0.89 (P < 0.05), indicating that the scale had high criterion-related validity.

Discussion

This study developed a new scale to assess the training compliance of patients with UI and assessed its psychometric properties. The 12-item training compliance scale comprised three dimensions: ‘pelvic floor muscle training compliance,’ ‘compliance of bladder training,’ and ‘urination diary recording.’ The three dimensions performed well in terms of reliability and content validity.

The scale was developed in three steps. The first step was a systematic literature review. To make the review comprehensive, it was completed under the guidance of a librarian who specialised in literature searching. The literature review produced the initial 25 items based on the characteristics of a rehabilitation training compliance scale for patients with UI. The second step was group discussions. To ensure the professionalism of the discussions, all participating staff worked in the research area. After discussion, 12 items were retained. The third step was the Delphi consultation. Twenty-two experts actively participated and gave constructive suggestions in the two rounds, after which 8 items were revised. These three steps were rigorous and scientific, which helped to ensure the objectivity, accuracy, effectiveness, and relevance of the scale items.

The effective response rate of 93.18% suggests that the participants engaged actively with the study, possibly because they expected the tool to be helpful for their UI rehabilitation. The 12-item training compliance scale consisted of three factors, and the three factors that the scale was designed to measure were extracted as intended. Although factors are generally retained only when their eigenvalue is greater than one [58], the clinical experts in the discussion group also suggested that all three factors should be retained in the scale. The cumulative extraction sums of squared loadings and the scree plot also indicated that three factors were reasonable. The CFA result suggested that the three-factor model fit the data. However, the RMSEA was above 0.08, which indicates a weakness in the model fit.

The training compliance scale for patients with UI had good reliability, content validity, construct validity, convergent and discriminant validity, and criterion-related validity. These results indicate that the scale is a good measurement tool for assessing the training compliance of patients with UI. The scale can facilitate the evaluation of training compliance, which can help medical staff to identify patients’ weaknesses in training compliance and then develop specific interventions that target those weaknesses.

The test–retest reliability and overall Cronbach’s α of this scale are as good as those of the Chen PFMSE scale. Compared with the Chen PFMSE scale [26], this new scale’s I-CVI, S-CVI, variance explained, and criterion validity were better. The CFA, CVR, split-half reliability, Kappa coefficient, and convergent and discriminant validity were not reported for the Chen PFMSE scale [26], whereas these indices performed well for the training compliance scale for patients with UI. Compared with the development of Chen’s scale, the development process of this scale was more scientific and rigorous. Thus, the training compliance scale for patients with UI is a useful instrument for evaluating training compliance in patients with UI.

To our knowledge, there is currently no specific tool to assess the rehabilitation training compliance of patients with UI. This study constructed such a scale, with which medical staff can better understand the rehabilitation training compliance of these patients. Guided by the scale results, medical staff can promote rehabilitation training knowledge through online media, deliver health education lectures, interact with patients, provide personalised rehabilitation training guidance, and so on, all of which contribute to improving patients’ compliance with rehabilitation training [59]. The scale may also help to predict the recovery and quality of life of people with UI. Additionally, the scale could identify the aspects of training with which people with UI do not comply. This information could then be used to develop specific interventions to promote patient training compliance.

In summary, the development of the rehabilitation training compliance scale for patients with UI was scientific and strictly followed the principles of scale design. Therefore, this scale could be a reliable tool for medical staff to evaluate the rehabilitation training compliance of patients with UI.

This study has some limitations. First, the sample size of 123 patients with UI was not large enough. Second, this study used convenience sampling; therefore, the representativeness of the sample could be insufficient. Third, this study unintentionally collected some irrelevant information about the patients (for example, income). Fourth, items 1, 2, 3, 8, and 10 had cross-loadings (loadings > 0.4 on two factors). This proportion of cross-loading items was too high and resulted in a slight weakness in the model’s fit. Last, the EFA and CFA samples were only at the minimum recommended size, which affects the precision, stability, and replicability of the results. Because of COVID-19, it was difficult to collect data at that time, and many people with UI did not seek medical treatment because UI is not an acute disease. Generally, for EFA and CFA, the stronger the data and the larger the sample, the more accurate the analysis will be. Further studies with larger sample sizes from a wider range of people are needed to validate the scale.

Conclusion

This scale is a reliable, scientific tool for evaluating the compliance of patients with UI with rehabilitation training in clinical practice. Future studies should refine the scale and use it to assess patients’ compliance. They should also explore the factors that affect compliance and formulate intervention plans based on those factors to improve compliance, promote patients’ recovery, and improve their quality of life.