Background

Multiple sclerosis (MS) is a progressive neurologic disorder that causes demyelination of affected nerves in the central nervous system [1–3]. MS can cause almost any neurological symptom, and it often progresses to cognitive and neurological disabilities in the patient, including a decline in mobility [1]. MS affects women three times as often as it affects men. It affects approximately 1 in 1000 people in the US and is the most common cause of neurological disability in individuals 20–45 years old [4, 5]. A majority of MS patients develop some form of lower urinary tract dysfunction due to disconnection between the brainstem and the lower spinal cord [6–8]. In particular, up to 75% of MS patients have neurogenic detrusor overactivity (NDO), a bladder disorder characterized by spontaneous overactivity of the detrusor muscle [9]. Symptoms of this disorder include urinary urgency, urinary frequency, and/or urinary incontinence [2, 10]. Incontinence has been identified as one of the worst aspects of the disease from the patient’s perspective [8]. High intravesical pressures may also lead to serious complications, including kidney infection and reflux, ultimately causing irreversible damage [7]. Given these issues, research groups are establishing guidelines for the evaluation of urinary disorders in MS [7, 8, 11]. Nevertheless, despite the high prevalence of urinary incontinence, urological evaluation and treatment are significantly under-accessed in this population [10]. This study outlines the development and validation of a shortened form of a new screening tool, the Actionable Bladder Symptom Screening Tool (ABSST), that healthcare providers can use to identify MS patients with urinary incontinence who may be in need of a more precise diagnosis and/or treatment for their urologic symptoms, including referral to a urologist.
The objectives of this study were twofold: 1) to adapt the previously validated 17-item ABSST to a short form for ease and brevity of application in a medical setting that is clinically meaningful; and 2) to develop a scoring algorithm that would be interpretable in terms of referring/considering diagnosis and treatment. Novel item response and classical test methods were used to identify optimally performing items for inclusion in the short form assessment.

In view of recent research suggesting that patients and clinicians are more likely to use shorter instruments [12–15], it was decided to develop a short form version of the ABSST that is clinically meaningful from the clinician’s perspective. The development of this short form version of the instrument is described here. In addition, a scoring algorithm was developed for the ABSST, with identification of a score at or above which further diagnosis or referral to a urologist would be recommended. The novel approach to developing the scoring algorithm presented here is based on pivot anchoring as a means of demonstrating clinical meaningfulness.

Methods

Multi-site observational study

Data to support the short form adaptation were collected in a US-based, non-randomized, multi-center, stand-alone observational study in male and female patients who have MS with and without NDO. MS patients were recruited through neurology practices to complete patient reported outcome (PRO) assessments (the ABSST and the Overactive Bladder Questionnaire – Short Form (OAB-q-SF)) as well as a demographic and health information form. Written informed consent was required before a subject could participate in any part of the study. All research involving patient interviews and patient completion of questions complied with the Helsinki Declaration and had full Institutional Review Board (IRB) approval from Copernicus Group (Protocol # MAP1-11-049), an independent institutional review board organized and operating in compliance with regulations governing institutional review boards set forth in 21 CFR and ICH guidelines, as well as 45 CFR when applicable. Completed ABSST questionnaires (with patient-specific identifying information removed) were shared with the referring clinician, and the clinician indicated whether or not referral to a urologist was recommended based on the specific patient’s responses. The presence of NDO was not a requirement for inclusion in the observational study. Details of the design of the multi-site observational study will be reported in a separate publication summarizing the development and validation of the ABSST long form [16].

Psychometric criteria/methods for inclusion on the ABSST-short form

The psychometric evaluation and subsequent development of the ABSST-Short Form integrated responses on the longer version of the tool collected from both patients and clinicians. A set of predefined statistical criteria was developed and tested to ensure that the resulting shortened tool included clinically meaningful and reliable items. These were as follows:

  • Classical statistical methods

    Percentage of patient responses at the floor of measurement (lowest response option) less than 50% [17].

    Item correlation with the overall scale (e.g., the extent to which the items are related to other items on the scale) greater than 80% [17].

  • Item response theory methods

    Degree to which patients respond as expected (e.g., measurement error) as measured by Rasch infit statistics [18] (acceptable values between 0.60 and 1.40).

  • Clinician perspective

    Greater than 50% of clinicians indicated that the item was clinically relevant.

For clinician response, a pivot anchoring approach was used. In this analysis, the 28 recruiting clinicians were asked to review the ABSST, circle the items that were important to them, and indicate the threshold at which they thought the items would indicate a potential bladder problem or be a cause for concern [18, 19]; 23 of them responded.

Classical statistical methods

Individual items were evaluated using two key classical statistical methods: the percentage of patients at the floor (lowest response option) of measurement and the degree to which individual items correlated with the overall scale score. Floor effect refers to a high percentage of patients scoring the lowest score on an individual item. If floor effects are too pronounced, they could interfere with the ability of the instrument to screen patients as experiencing problems. An item was considered to have floor effects if greater than 50% of patients endorsed the lowest category. The degree to which individual items correlated with the remaining items on the scale was assessed against a criterion of a correlation ≥ 0.80 [17].
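The two classical criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the data are hypothetical, the function names are invented here, and it assumes a Pearson correlation between each item and the sum of the remaining items, which the text does not specify.

```python
from math import sqrt

def floor_percentage(responses, lowest):
    """Percentage of patients who chose the lowest response option."""
    return 100.0 * sum(1 for r in responses if r == lowest) / len(responses)

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_rest_correlation(data, item):
    """Correlate one item with the sum of the remaining items.

    data: list of patient rows, each a list of item responses.
    """
    item_scores = [row[item] for row in data]
    rest_scores = [sum(row) - row[item] for row in data]
    return pearson(item_scores, rest_scores)

# Hypothetical responses (5 patients x 3 items), not study data.
data = [
    [0, 0, 1],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 4],
    [0, 1, 0],
]

floor_pct = floor_percentage([row[0] for row in data], lowest=0)
r_item0 = item_rest_correlation(data, item=0)
meets_floor_criterion = floor_pct < 50.0       # floor criterion from the text
meets_correlation_criterion = r_item0 >= 0.80  # correlation criterion from the text
```

Correlating an item with the remaining items, rather than with a total that includes the item itself, avoids inflating the correlation on short scales.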

Item response theory methods - Rasch analysis

Individual items of the ABSST were evaluated using the Partial Credit model, an extension of the 1-parameter Rasch model for use with items using multiple response formats (dichotomous, polytomous). A Rasch analysis allows for the accumulation of evidence associated with each response to an item by a group of respondents rather than relying on group level statistics (i.e., classical test theory) [20]. More specifically, this particular model allows for the joint estimation of symptom severity (item difficulty) and person’s level of severity (person ability). The underlying assumptions for this item analysis are: 1) local independence and 2) unidimensionality. Local independence is evidence that items are conditionally independent of each other (i.e., each item measures a unique symptom). Unidimensionality is evidence that items on a scale measure one underlying trait (e.g., overactive bladder (OAB) symptoms). In the case of symptom measures, we assume the underlying trait as a hierarchical symptom structure which may, in fact, be considered multidimensional. The term unidimensionality, therefore, indicates the underlying symptomology of the condition.

The ability of the ABSST items to reflect an underlying latent construct was assessed by performing item fit analysis. Infit and outfit statistics compare the actual responses on the survey with responses predicted along the range of OAB severity. Acceptable values range from 0.60 – 1.40 for questions with rating scale response options [18]. Items that do not fit the Rasch model may be measuring domains other than the domain of interest or may elicit atypical responses (e.g., endorsement of a high severity symptom by persons with few symptoms). Two item fit statistics were calculated: mean-square infit and mean-square outfit. High infit reflects the tendency of the item to elicit unexpected responses among respondents whose level on the measure approximates the difficulty of the item. High outfit reflects the tendency of the item to elicit unexpected responses among respondents whose level on the measure is above or below the difficulty of the item. Mean-square infit and outfit statistics ≥ 1.40 indicate significant item misfit.
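As a rough illustration of the two fit statistics, the sketch below uses the usual mean-square definitions: outfit is the unweighted mean of squared standardized residuals, and infit is the information-weighted version. It assumes the model-expected response and its variance for each person are already available from a fitted Rasch model. The inputs and function names are hypothetical, and the fitting itself is not shown.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean-square statistics for one item.

    observed[n]: response of person n to the item
    expected[n]: model-expected response for person n
    variance[n]: model variance of that response
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: mean of squared standardized residuals (sensitive to outliers).
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(observed)
    # Infit: squared residuals weighted by model variance (information).
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit

def is_acceptable(mean_square, low=0.60, high=1.40):
    """Apply the 0.60 to 1.40 acceptance range stated in the text."""
    return low <= mean_square <= high

# Hypothetical dichotomous example: four persons located exactly at the
# item's difficulty, so expected = 0.5 and variance = 0.5 * (1 - 0.5).
infit, outfit = fit_mean_squares(
    observed=[1, 0, 1, 0],
    expected=[0.5, 0.5, 0.5, 0.5],
    variance=[0.25, 0.25, 0.25, 0.25],
)
```

Values near 1.0 indicate that responses vary about as much as the model predicts. Under the range above, a value such as 2.68 would be flagged as misfit.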

As a further test of local independence of the ABSST domains, individual items were evaluated under the Rasch model using the person separation and reliability, analogous to the Kuder-Richardson Formula-20 (KR-20) [20–23]. The KR-20 is defined as a measure of reliability for dichotomous response choices. Values above 0.90 indicate homogeneity of responses. Under the Rasch model, KR-20 is used as each response category has a hypothesized probability of 0.50 of endorsement to the adjacent category. The item and person separation indices estimate the separation of persons and items on the underlying latent variable. Rasch measurement allows for the additional analysis of both items and persons distributed along the same linear continuum (trait). In order to evaluate whether each of the domain items covers the continuum, items must be sufficiently separated in terms of their item difficulty (which can represent the severity of the underlying trait). The threshold for separation is an index of 2.0 and an associated separation reliability of 0.80 [22]. Rasch person reliability estimates allow for evaluation of whether the items appropriately estimate a person’s symptom severity on the underlying trait. When reliability falls below the customarily accepted threshold of 0.70, it indicates that patients may be experiencing symptoms that the measure does not cover (i.e., construct deficiency) [22]. Rasch item reliability estimates the item severity range (i.e., does the severity scale associated with the items cover the distribution of severity?), which is likewise considered acceptable if ≥ 0.70 [22].
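The pairing of a separation index of 2.0 with a reliability of 0.80 follows from the standard relationship between the two quantities in Rasch measurement, R = G² / (1 + G²). The short sketch below shows the arithmetic only; the function names are illustrative.

```python
from math import sqrt

def reliability_from_separation(g):
    """Separation reliability R implied by separation index G."""
    return g * g / (1.0 + g * g)

def separation_from_reliability(r):
    """Separation index G implied by separation reliability R (0 <= R < 1)."""
    return sqrt(r / (1.0 - r))

r_at_threshold = reliability_from_separation(2.0)   # 4 / 5 = 0.80
g_at_minimum = separation_from_reliability(0.70)    # about 1.53
```

By the same relationship, the 0.70 reliability floor used for person and item reliability corresponds to a separation index of roughly 1.5.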

Pivot anchoring

In order to create a clinically useful ABSST, a pivot anchoring analysis was conducted utilizing both novel and classical test theories to identify optimal items. All 28 recruiting clinicians were asked to review the ABSST and circle which items were important to them in their clinical decision making, and at what threshold they thought the items would indicate a potential bladder problem or would be a cause for concern. Twenty-three responses were received and used in the pivot anchoring analysis.

A scoring method was developed which included both the clinician and patient responses. First, the clinicians referring patients to the quantitative study were surveyed on attributes of the patient population relative to the instrument. Each clinician was then asked to specify the lowest rating on each of the 16 items which would indicate a clinically meaningful potential bladder problem. Based on the clinicians’ ratings of the items, each patient’s response was then denoted by either a 1 (indicating a potential bladder problem) or a 0 (indicating no potential bladder problem). This scoring algorithm was then tested using classical test theory and item response theory methods (described in Pivot Anchoring and Predictive Validity sections).
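The dichotomization step described above might look like the following sketch. The thresholds and responses are hypothetical. It assumes one threshold per item and that higher ratings indicate greater severity. How the individual clinicians' thresholds were combined into a single value per item is not described here, so the thresholds are simply taken as given.

```python
def dichotomize(responses, thresholds):
    """Recode each item response as 1 (potential bladder problem) if it is
    at or above the clinician-derived threshold for that item, else 0."""
    if len(responses) != len(thresholds):
        raise ValueError("one threshold is required per item")
    return [1 if r >= t else 0 for r, t in zip(responses, thresholds)]

def total_score(responses, thresholds):
    """Raw total score: number of items flagged as a potential problem."""
    return sum(dichotomize(responses, thresholds))

# Hypothetical three-item example (not the ABSST thresholds).
thresholds = [4, 2, 3]
patient = [4, 1, 3]
flags = dichotomize(patient, thresholds)   # [1, 0, 1]
score = total_score(patient, thresholds)   # 2
```

Because each item contributes 0 or 1, the total is a simple count that can be compared directly against a referral cut-point.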

Predictive validity

Logistic regression was used to determine the predictive validity of the ABSST total score in identifying patients who would receive a recommendation to see a urologist. The outcome was the clinician’s rating of ‘Yes’ or ‘No’ on whether they would recommend that a patient see a urologist based upon the patient’s responses to the ABSST. Results from logistic regression models testing different cut-points predicting the recommendation were summarized with the odds ratio, sensitivity and specificity, positive predictive value (PPV), negative predictive value (NPV), percent who warranted referral, and area under the receiver operating characteristic (ROC) curve. The odds ratio compares the odds of referral to a urologist for MS patients scoring at or above a given cut-point with the odds for those scoring below it. Sensitivity refers to the proportion of patients warranting referral who are correctly identified (true positives; e.g., would refer to a urologist), while specificity refers to the proportion of patients not warranting referral who are correctly identified (true negatives; e.g., would NOT refer to a urologist). PPV refers to the proportion of positive test results that are true positives (e.g., the proportion of patients flagged for referral to a urologist for whom referral is warranted), while NPV refers to the proportion of negative results that are true negatives (e.g., the proportion of patients not flagged for referral for whom referral is NOT warranted). The percent who warrant referral is the percentage of patients classified as warranting referral to a urologist. The area under the ROC curve summarizes the ability of the score to discriminate between those who would and would not warrant referral to a urologist [24].
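To make these definitions concrete, the sketch below computes sensitivity, specificity, PPV, and NPV for a single cut-point from a list of total scores and the clinicians' Yes/No referral decisions. The data and the function name are hypothetical, and the logistic regression itself is not reproduced.

```python
def screening_metrics(scores, referred, cut_point):
    """Classification metrics when 'score >= cut_point' flags a patient.

    scores:   total score per patient
    referred: clinician decision per patient (True = would refer)
    Assumes every cell of the 2x2 table needed for a ratio is non-empty.
    """
    tp = fp = tn = fn = 0
    for score, would_refer in zip(scores, referred):
        flagged = score >= cut_point
        if flagged and would_refer:
            tp += 1
        elif flagged and not would_refer:
            fp += 1
        elif not flagged and would_refer:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),  # referred patients who were flagged
        "specificity": tn / (tn + fp),  # non-referred patients not flagged
        "ppv": tp / (tp + fp),          # flagged patients who were referred
        "npv": tn / (tn + fn),          # unflagged patients not referred
    }

# Hypothetical data for eight patients (not study data).
scores = [7, 6, 3, 8, 2, 5, 6, 1]
referred = [True, True, True, True, False, False, False, False]
metrics = screening_metrics(scores, referred, cut_point=6)
```

Repeating the calculation over a range of cut-points yields the sensitivity and specificity pairs that trace out a ROC curve.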

Results

Patient population

A total of 151 patients, all with MS and with and without NDO, were recruited by 28 clinicians in various US geographical locations to participate in the study. Patients had a mean age of 48.2 (SD 12.11) years, with ages ranging from 22 to 80 years. Patients had been diagnosed with MS for an average of 9.1 (SD 7.24) years. Approximately 70% of patients described the severity of their MS symptoms over the past 6 months as mild, 23.8% as moderate, 1.3% as severe, and 4.6% as none/not applicable. Severities were captured using patient self-report in a demographic form. Patients were asked how severe their MS symptoms were in the past 6 months on a 0–3 rating scale (0 = Not applicable / None, 1 = Mild, 2 = Moderate (uses aids to walk), 3 = Severe (uses wheelchair sometimes)). Approximately 41% of patients reported having a history of or currently having urinary incontinence and/or urinary urgency.

Short form development

Table 1 outlines the results of the predefined statistical criteria that were developed to ensure that clinically meaningful and reliable items were included (see the Methods subsection on psychometric criteria for inclusion on the ABSST-short form). Eight items (plus the final “Yes/No” item) were identified from the original 17-item ABSST. Boxes marked with an (X) represent those items that met the patient level criteria. The black boxes indicate those items that met the clinician level analysis or statistical criteria. The final row in the table indicates important items from the pivot anchoring analysis. For example, on Item 1 (Urinate right away), if a clinician saw that a patient responded with a “4” (All of the time) then it would be a cause for concern and possibly referral to a urologist. Item 17 is the “Yes/No” item asking patients if they would seek help for their bladder issues. Please see Additional file 1: Table S1 for the ABSST short form.

Table 1 Clinician and patient short form results

Classical statistical results

Evaluation of the classical statistical methods, floor effect and item correlation statistics, indicated that items 1, 7, 8, 9, 10, and 13 met one or both criteria for inclusion into the short form. More specifically, items 1, 7, 8, and 9 met the criteria for inclusion into the short form based on floor effect less than 50%; and items 1, 7, 10 and 13 met the criteria for inclusion into the short form based on item correlation with scale greater than 80%.

Item response theory Rasch analysis results

Based on the criteria detailed in the methods section, each of the items on the ABSST demonstrated person reliabilities ≥ 0.70. Fit statistics indicated that 9 items were within the expected criteria (Table 2). However, items 2, 4–6 and 14–16 scored outside of the acceptable range, demonstrating less than desirable infit or outfit statistics. For example, Item 16 was shown to not directly measure Impact of Symptom Intensity, as it had an infit statistic of 2.68 and an outfit statistic of 2.57, both of which are outside the acceptable range of 0.60 – 1.40. This result suggests poor fit to the Rasch model. Similarly, Items 2, 4, 5, 6, 14, and 15 fell outside of the acceptable range as previously described.

Table 2 Infit and outfit statistics by item

Predictive validity

Results from logistic regression models testing different cut-points predicting the recommendation to refer to a urologist are summarized in Table 3. The results of the odds ratio, sensitivity and specificity, PPV, NPV, and percent correctly classified suggest a raw score of 6 or more as the cut-point for further evaluation by a urologist. The overall ABSST Total Score c-statistic was equal to 0.928 and the logistic regression model was robust (Hosmer and Lemeshow Goodness-of-Fit Test, p=0.5180).
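For a single score predicting a binary outcome, the c-statistic reported above equals the area under the ROC curve. It can be read as the probability that a randomly chosen referred patient has a higher score than a randomly chosen non-referred patient, with ties counted as one half. The sketch below computes it by direct pair counting on hypothetical data; it is an illustration, not the procedure used in the study.

```python
def c_statistic(scores, referred):
    """Concordance (area under the ROC curve) by pairwise comparison."""
    pos = [s for s, r in zip(scores, referred) if r]
    neg = [s for s, r in zip(scores, referred) if not r]
    if not pos or not neg:
        raise ValueError("both outcome groups must be non-empty")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # referred patient scored higher
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))

# Hypothetical data for eight patients (not study data).
scores = [7, 6, 3, 8, 2, 5, 6, 1]
referred = [True, True, True, True, False, False, False, False]
auc = c_statistic(scores, referred)   # 13.5 concordant pairs out of 16
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.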

Table 3 Performance of the revised ABSST total score at various cut-points* predicting Clinician’s Urologist Referral

The performance of the ABSST total score in predicting the clinician’s referral to a urologist is presented in Figure 1 via the ROC curve. Figure 1 displays how different cut-points on the ABSST total score affect the sensitivity and specificity of the prediction for referral to a urologist. The prediction model was evaluated across the entire range of classification thresholds by plotting the true positive rate against the false positive rate (1-Specificity) for various cut scores. In this study, a cut-point of a raw score of 6 or greater had a sensitivity of 85.7% and a specificity of 93.1% (i.e., 85.7% of the patients whom clinicians would refer to a urologist scored at or above the cut-point, and 93.1% of the patients whom clinicians would not refer scored below it).

Figure 1 ROC Curve of the ABSST Total Score at Cut-Points Predicting Referral to a Urologist.

Discussion

The goal of this study was to validate the short form Actionable Bladder Symptom Screening Tool (ABSST) as a patient driven, clinically useful measure that is easy to administer and score. The aim was to develop a tool that is sensitive, MS-specific, easy to interpret, easy to use in a clinical setting, and multidimensional, all of which encourage patient-clinician interaction. A mixed methods approach, combining qualitative and quantitative techniques, was used to select the best items from the long form.

The original, long form ABSST was developed as a de novo measure after it was determined that existing measures did not appropriately assess the symptoms and impacts of NDO on MS patients. The long form of the ABSST is a 17-item instrument (16 items with an additional “Yes/No” item asking patients if they would like to receive help for their bladder problems) that covers three domains: Bladder Symptoms, Coping Strategies, and Impact of Bladder Symptoms. It was developed using established qualitative methods [25, 26], including a literature review and clinician input to identify potential symptoms, open-ended concept elicitation interviews with MS patients who have OAB, and face and content validity testing through cognitive interviews (face-to-face debriefing interviews on the ABSST; details to be reported in another publication) with another group of MS patients with OAB. Once the qualitative phase was completed, a US-based, non-randomized, multi-center, stand-alone observational study was carried out to evaluate the measurement properties of the newly developed instrument. Analysis of item completion, item and scale distribution, and predictive validity of the long form ABSST demonstrated strong psychometric properties (details reported in another publication) [17, 19, 27–30]. The ABSST total score also demonstrated predictive validity, identifying patients who would receive a referral. In order to demonstrate that this screening tool was valid and reliable as a short form, classical test theory and item response theory approaches were taken to ensure that each item included provided the most information, was clinically meaningful, and demonstrated predictive attributes for patient referral to a urologist. Overall, concurrent validity for each subscale as well as predictive and concurrent validity of the total instrument were shown, demonstrating strong psychometric properties.

The simplified scoring method for bladder problem assessment developed here will make it easier for clinicians to identify patients with MS who may have potential problems. The approach used to develop the scoring algorithm adheres to the FDA guidance, is psychometrically valid, and appropriately utilizes pivot anchoring [31], which has been widely used in the interpretation of clinically meaningful points along a categorical continuum. This methodology allowed for synthesis of meaningful cut-points along the continuum where patients are likely to demonstrate urinary problems, as reported by clinicians. This mixed statistical methods approach to item reduction and scoring optimizes selection of items for making sound clinical decisions based on clinicians’ assessment of the need for patient referral. These data, when subjected to predictive validity calculations, provided very strong results indicating that clinicians could use it to refer patients appropriately.

This study has several limitations. First, the ABSST is a screening tool and was not designed for diagnostic purposes. Second, although it was identified as being clinically meaningful, its relevance in overall clinical practice has not yet been tested. Lastly, because of the error range of the tool itself, it is possible that some patients’ symptoms may go untreated (false-negative results) while other patients may incur unnecessary tests and costs (false-positive results). However, the specificity and sensitivity are strong, which indicates that the ABSST Short Form is able to differentiate patients who would likely benefit from a referral from those who would not.

Other questionnaires have been developed, such as an 8-item screening tool to aid in identifying patients who may have OAB in a busy primary care setting, which has a sensitivity of 98% and a specificity of approximately 83% [32]. Moreover, the 3-Item OAB awareness tool (a short version of the 8-item screening tool) has recently been validated with a sensitivity of 82% and a specificity of 91%. In addition, the International Prostate Symptom Score (IPSS) has been used to identify the severity of bladder symptoms in an MS population (all patients had an Expanded Disability Status Scale score of <6.5, with a mean of 3.4). The 8-item IPSS was originally developed to measure symptom severity in benign prostate hyperplasia. It has also been utilized to measure the prevalence of bladder problems over 2–3 years [33]. Only the ABSST Short Form has been developed and psychometrically tested to the scientific rigor of the FDA Guidance for PRO development. Overall, the short form ABSST demonstrated good predictive performance, with a positive predictive value of approximately 86% and a negative predictive value of approximately 93%. This shows that the short form ABSST maintains the integrity of the longer form tool, which had a positive predictive value of approximately 76% and a negative predictive value of approximately 95% [16]. The sensitivity and specificity of the short form ABSST are in line with other validated tools in the field and demonstrate its strength and potential positive impact in clinic and primary care settings [34]. Of particular importance is the ability of the tool to detect true-positive cases, which increases the cost-effectiveness of screening and early detection of OAB problems, while limiting false-positive cases, which can be detrimental to the screening process.

Conclusions

The simplified scoring method of the ABSST allows for clinicians to easily identify patients with MS who may have potential urinary problems. The approach used to develop the scoring algorithm adheres to the FDA guidance, is psychometrically valid, and appropriately utilizes pivot anchoring [31], which has been widely used in the interpretation of clinically meaningful points along a categorical continuum. The methodology allowed for synthesis of meaningful cut-points where patients are likely to demonstrate urinary problems, as reported by clinicians. The reduction in items and scoring optimizes the most informational items for making sound clinical decisions. These data, when subjected to predictive validity calculations, indicate that clinicians can use the ABSST to refer patients appropriately. In conclusion, ABSST provides a new method for assessing bladder problems among MS patients, and may facilitate earlier diagnosis, treatment, and referral to a specialist.