Background

Idiopathic Normal Pressure Hydrocephalus (NPH) is a neurological condition caused by enlarged communicating ventricles, defined by an Evans index (EI) of at least 0.3 [1, 2]. The classic triad of NPH is gait dysfunction, cognitive decline, and urinary incontinence, with gait dysfunction being an essential symptom of the triad [3,4,5]. One way to assess whether a patient will benefit from a shunt is to measure their responsiveness to temporary drainage of cerebrospinal fluid (CSF), either by lumbar puncture (LP; 30–50 ml) or extended lumbar drainage (ELD; 300–400 ml) [5]. The goal of temporary CSF drainage is to compare the results of pre-drain and post-drain objective testing to determine whether the patient had a meaningful improvement in symptoms [6].

Assessing change in objective gait or cognitive measures following temporary CSF drainage requires a defined model to ensure that patients are selected for shunt surgery in a reliable and valid manner. Grading scales are the primary models used to assess change in score following temporary CSF drainage. Grading scales are efficient for defining improvement at the group level. However, they have not been validated to differentiate a significant change from chance improvement at the individual patient level [7, 8]. The present study aimed to address this gap in knowledge by establishing standardized clinical change models, using data from patients evaluated for suspected NPH, to differentiate chance improvement from a clinically significant change at the individual level. Clinically significant change is defined as a change in score on an established clinical measure that is beyond what would be predicted by regression while accounting for measurement error [9]. We present clinical change models for the 10 Meter Walk Test [10], Timed Up & Go [11,12,13], Dual Timed Up & Go [14, 15], 6-Minute Walk Test [16,17,18], Mini-Balance Evaluation Systems Test [19], Montreal Cognitive Assessment [20, 21], and Symbol Digit Modalities Test [22]. Additionally, we assess the discriminant validity of measures used in patients presenting with suspected NPH that are designed to assess the domains of gait velocity, balance, and endurance.

Methods

Participants

We conducted a retrospective chart review of 323 patients who underwent temporary drainage of CSF at the Johns Hopkins Center for CSF Disorders. All patients included in the study were over the age of 60 and were seen within the departments of Neurosurgery and Neurology between October 2013 and March 2019. Patients were included in the study if they had cerebral ventriculomegaly and the presence of gait, cognitive, or urinary dysfunction. For this analysis, we did not classify patients as idiopathic NPH, as the same criteria are used to assess change in score for both idiopathic and secondary NPH. However, patients who had a brain tumor or subarachnoid hemorrhage were excluded. The study was conducted with the approval of the Johns Hopkins Institutional Review Board (IRB). Since this was a retrospective study involving only data extraction and analysis, informed consent was waived by the IRB. Once extracted, the data were anonymized for analysis.

Outcome measures

Physical therapists completed all gait assessments, while trained research assistants completed cognitive testing. Gait assessments were administered in hallways with smooth floors. Assistive devices were used when required for safe ambulation. Patients were required to use the same assistive device for all trials. Pre-drain assessments were completed within 1 to 4 h of the beginning of the drain. Post-drain assessments began when the patient was medically cleared to walk after the drainage procedure, which was less than 60 min after LP and between 1 and 4 h after ELD. Gait measures were divided into three domains: velocity, balance, and endurance.

Gait velocity

For all gait velocity measures, patients were instructed to walk, “As quickly as you can, safely.” 10 Meter Walk Test (10 MWT): patients started standing up, walked 10 meters in a straight line, and stopped. Timed Up & Go (TUG): patients started seated in a chair with armrests, stood up, walked 10 feet, turned 180 degrees, walked back, and sat down in the chair. Dual TUG: identical to the TUG except that, while patients walked, they performed serial subtractions of three. For all gait velocity measures, the final score was the time required to complete the task, measured to hundredths of a second.

Endurance

6-Minute Walk Test (6MWT): patients walked as far as they could in 6 minutes. The final score for the 6MWT was the total distance walked, measured in feet.

Balance

Mini-Balance Evaluation Systems Test (Mini-BEST): a dynamic measure of balance consisting of 14 items, each scored from 0 to 2. Patients attempted tasks such as rising from a chair with their arms crossed and raising their heels off the ground while standing upright. The maximum score was 28 points, with a higher total score indicating better balance.

Cognition

Montreal Cognitive Assessment (MoCA): assesses global cognitive function. The MoCA is scored out of 30 points with higher scores indicating better performance.

Symbol Digit Modalities Test (SDMT): a test of visuomotor coordination relying on a combination of attention, processing speed, and working memory.

Statistical analyses

Before defining the data analysis plan, we knew that we had attempted temporary drains on patients with extreme levels of impairment that went beyond what would be consistent with suspected NPH. To address this issue, outliers were removed based on pre-drain scores using the Standard Outlier Formula to trim the data [23]. The MoCA lower bound was 6.5, with an upper bound greater than 30 (removed n = 6). The TUG lower bound was below zero seconds, with an upper bound of 51.37 s (removed n = 33). The Dual TUG lower bound was below zero seconds, with an upper bound of 77.83 s (removed n = 17). The 10 MWT lower bound was below zero seconds, with an upper bound of 33.1 s (removed n = 31). No outliers were removed for the SDMT, Mini-BEST, or 6MWT.
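The trimming step can be illustrated with a minimal Python sketch. It assumes that the Standard Outlier Formula [23] refers to the conventional 1.5 × interquartile range rule; the multiplier, the quartile method, the function names, and the example scores are illustrative assumptions and are not the settings or data used in this study.

```python
import statistics

def outlier_bounds(scores, k=1.5):
    """Return (lower, upper) fences; k = 1.5 is an assumed multiplier."""
    # The quartile convention ("inclusive") is an assumption for this sketch.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def trim_outliers(scores, k=1.5):
    """Keep only the scores that fall inside the fences."""
    lower, upper = outlier_bounds(scores, k)
    return [s for s in scores if lower <= s <= upper]

# Hypothetical pre-drain TUG times in seconds (not study data).
example_tug = [8, 9, 10, 11, 12, 13, 14, 60]
lower, upper = outlier_bounds(example_tug)
trimmed = trim_outliers(example_tug)
```

For right-skewed timed measures, the lower fence can fall below zero seconds, as reported above for the TUG, Dual TUG, and 10 MWT, in which case only the upper fence removes observations.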

In a sensitivity analysis to assess whether there was a significant difference between LPs and ELDs, we compared Pre-drain scores, Post-drain scores, and percent change of scores using t-tests. After removing outliers, there was no significant difference between LPs and ELDs for the MoCA, TUG, Dual TUG, Mini-BEST, and 10 MWT. There was a significant difference between groups at pre-drain for the 6MWT. For patients with a baseline of over 500 ft, there was no significant difference in response to temporary drainage for the 6MWT. For this study, LPs (n = 238) and ELDs (n = 72) were pooled together for analysis. If a patient had undergone multiple temporary drains, only the first drain was used for analysis.

The primary objective of data analysis was to create empirical models for discerning clinically significant change from chance improvement at the individual patient level. Clinically significant change is defined as a change in score on an established clinical measure that is beyond what would be predicted by regression while accounting for measurement error. To create clinical change models, we used the Standardized Regression-Based (SRB) model. We chose the SRB model because it best accommodates the wide range of baseline scores in our population [24]. For calculating the SRB models, we followed the procedures described by McSweeney et al. [9]. SRB models can use multiple linear regression to predict the posttest score for an individual based on the pretest score and other relevant variables. The predictor variables used in our regression equations were Pre-drain score, sex, age, BMI, height, Evans index, past medical history of conditions affecting gait, assistive device used, education level, and depression. However, as others have noted, even when significant, the predictor variables beyond the Pre-drain (baseline) score did not add predictive value to the SRB models [24,25,26,27,28,29]. The final models used for all measures were therefore simple SRB models. Once the predicted score has been calculated using the linear regression model, it can be converted into a change z-score using the following equation: z = (Yo − Yp)/SEest, where Yo is the observed posttest score, Yp is the predicted posttest score, and SEest is the standard error of the estimate from the regression equation. For change to be clinically significant, the absolute value of the z-score must exceed 1.64 (90% confidence interval) [30, 31].
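The simple SRB calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function names and the example scores are hypothetical, and the regression is an ordinary least-squares fit of Post-drain on Pre-drain scores with the standard error of the estimate computed on n − 2 degrees of freedom.

```python
import math

def fit_simple_srb(pre, post):
    """Fit post = intercept + slope * pre; return (intercept, slope, se_est)."""
    n = len(pre)
    mean_x = sum(pre) / n
    mean_y = sum(post) / n
    sxx = sum((x - mean_x) ** 2 for x in pre)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Standard error of the estimate: residual SD with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(pre, post))
    se_est = math.sqrt(sse / (n - 2))
    return intercept, slope, se_est

def change_z_score(pre_score, post_score, model):
    """z = (Yo - Yp) / SEest for one patient."""
    intercept, slope, se_est = model
    predicted = intercept + slope * pre_score
    return (post_score - predicted) / se_est

def is_clinically_significant(z, threshold=1.64):
    """Apply the two-sided 1.64 cutoff given in the text."""
    return abs(z) > threshold

# Hypothetical reference sample of Pre-drain and Post-drain times (seconds).
reference_pre = [10, 12, 14, 16, 18, 20]
reference_post = [10.0, 9.6, 12.2, 13.8, 14.4, 18.0]
model = fit_simple_srb(reference_pre, reference_post)
z = change_z_score(15, 10, model)
```

For timed measures such as the TUG, a faster Post-drain time than predicted yields a negative z-score, so the sign of z has to be interpreted according to the scoring direction of each measure.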

The secondary objective was to assess the discriminant validity of dividing the gait measures by the domains of velocity, balance, and endurance. To assess whether there was a relationship between measures, a correlation matrix using the Pre-drain scores was computed. For measures that were highly correlated (r > 0.70), a Pearson chi-square analysis using the outcome of the temporary drain for each measure was performed (outliers were not removed for the chi-square analysis). For any combination of measures for which the chi-square test was significant, a regression analysis was performed to assess the relationship between the Pre-drain scores of the measures and between their Post-drain scores. If the Pre-drain regression analysis was significant (p < 0.05), a Post-drain regression analysis was performed. For the regression analyses, models were checked for a quadratic or cubic relationship and for heteroscedasticity. If a model was found to be heteroscedastic, Cook’s distance with a cutoff of 4/n was used to identify overly influential points [32].
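The influential-point screen can be sketched in Python as follows. This is a minimal illustration that assumes a simple linear regression with one predictor, for which the leverage has a closed form; the quadratic models reported in this study would require the leverage from the full design matrix. The function names and the example data are hypothetical.

```python
def cooks_distances(x, y):
    """Cook's distance for each point of a one-predictor least-squares fit."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        # Leverage of a point in simple linear regression.
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        distances.append((e ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2)
    return distances

def influential_points(x, y):
    """Indices whose Cook's distance exceeds the 4/n cutoff."""
    cutoff = 4 / len(x)
    return [i for i, d in enumerate(cooks_distances(x, y)) if d > cutoff]

# Hypothetical data: the last point has an extreme predictor value and
# lies far from the trend of the other points.
example_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
example_y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 5]
flagged = influential_points(example_x, example_y)
```

Points flagged in this way would be removed before the model is refitted and rechecked for heteroscedasticity.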

Results

Participants

Table 1 presents baseline patient demographic data and clinical characteristics including age, race, sex, education, EI, and past medical history that could affect gait (stroke, transient ischemic attack, Parkinson’s disease, peripheral neuropathy, osteoarthritis, degenerative joint disease, and spinal disorders).

Table 1 Baseline demographics and clinical characteristics of study participants (N = 323)

Patients were predominantly Caucasian (89%) and highly educated. The mean age was 74.9 years (SD ± 6.4), and the mean body mass index (BMI) was 27.75 (SD ± 4.6). Thirty-nine percent of patients used an assistive device, and 55% had a past medical history that could affect their gait.

Outcome measures

Table 2 shows Pre-drain and Post-drain scores for the outcome measures.

Table 2 Pre-and Post-drain results of cognition, gait, balance, and endurance

The test-retest reliability of the Pre- and Post-drain measures was high, with reliability coefficients ranging from 0.83 to 0.96. The percent change of the mean value between Pre-drain and Post-drain gait measures ranged from 14.48 to 19.18%. For the MoCA, the percent change between the mean Pre-drain and Post-drain scores was only 2.98%.

Table 3 shows the SRB equations predicting Post-drain scores using the Pre-drain score for each measure.

Table 3 Regression Coefficients for Standardized Regression-Based (SRB) models of cognition, gait, balance, and endurance

Across all measures, patients with minor impairment at Pre-drain required a smaller percent change to reach the threshold of clinically significant change than patients with moderate to high levels of impairment.

For assessing discriminant validity, we found that all baseline gait measures were highly correlated (r = 0.67–0.85) (Table 4).

Table 4 Discriminant validity Pre-drain correlations of cognition, gait, balance, and endurance

For the chi-square analysis, we found that the TUG was significantly associated with all other gait measures (6MWT P < 0.001; 10 MWT P < 0.001; Mini-BEST P = 0.023). When regression models were applied using the TUG as the predictor variable, there was a quadratic relationship with all other dependent variables.

Table 5 shows that, for the Pre-drain regressions on the TUG, the models for the Mini-BEST (C = 27.278, β(TUG Pre-drain) = −0.902, β(TUG Pre-drain²) = 0.011, SE ± 1.096, R-squared = 0.553, P < 0.001) and the 6MWT (C = 2089.879, β(TUG Pre-drain) = −95.028, β(TUG Pre-drain²) = 1.246, SE ± 70.744, R-squared = 0.71, P < 0.001) were both highly significant and homoscedastic.
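As a worked illustration of how the quadratic Pre-drain equations are read, the following minimal Python sketch evaluates them for a single patient. The coefficients are the Pre-drain values reported above; the function name, the dictionary names, and the example TUG time of 10 s are hypothetical and chosen only for illustration.

```python
def predict_from_tug(tug_seconds, constant, beta_linear, beta_squared):
    """Evaluate constant + beta_linear * TUG + beta_squared * TUG**2."""
    return constant + beta_linear * tug_seconds + beta_squared * tug_seconds ** 2

# Pre-drain coefficients as reported in the text.
MINI_BEST_PRE = {"constant": 27.278, "beta_linear": -0.902, "beta_squared": 0.011}
SIX_MWT_PRE = {"constant": 2089.879, "beta_linear": -95.028, "beta_squared": 1.246}

# Hypothetical patient with a Pre-drain TUG time of 10 seconds.
tug_time = 10.0
predicted_mini_best = predict_from_tug(tug_time, **MINI_BEST_PRE)  # points
predicted_six_mwt = predict_from_tug(tug_time, **SIX_MWT_PRE)      # feet
```

For this hypothetical TUG time, the equations give a Mini-BEST score of about 19.4 points (27.278 − 9.02 + 1.1) and a 6MWT distance of about 1264 ft (2089.879 − 950.28 + 124.6).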

Table 5 Discriminant validity regression coefficients for the Timed Up & Go at pre-drain and post-drain

The 10 MWT was also highly significant at pre-drain (C = 0.974, β(TUG Pre-drain) = 0.782, β(TUG Pre-drain²) = −0.007, SE ± 0.798, R-squared = 0.769, P < 0.001) but remained heteroscedastic after influential points were removed. For the Post-drain regressions on the TUG, the models for the Mini-BEST (C = 28.679, β(TUG Post-drain) = −0.854, β(TUG Post-drain²) = 0.009, SE ± 0.799, R-squared = 0.589, P < 0.001) and the 6MWT (C = 2307.064, β(TUG Post-drain) = −116.008, β(TUG Post-drain²) = 1.651, SE ± 66.163, R-squared = 0.734, P < 0.001) were both highly significant. After influential points were removed, both analyses were homoscedastic. The 10 MWT was highly significant at Post-drain (C = 3.606, β(TUG Post-drain) = 0.441, β(TUG Post-drain²) = 0.001, SE ± 0.375, R-squared = 0.738, P < 0.001) and remained heteroscedastic after influential points were removed.

Discussion

In our study, we created SRB clinical change models for assessing clinically significant change in patients undergoing CSF drainage as part of an evaluation for suspected NPH, covering the TUG, Dual TUG, 10 MWT, Mini-BEST, 6MWT, MoCA, and SDMT. Clinically significant change is defined as change on an established clinical measure that is beyond what would be predicted by regression while accounting for measurement error. Clinical change models can differentiate clinically significant change from chance improvement at the individual patient level. The most novel finding was that the percent change required for clinically significant change is not fixed across the range of impairment. The required percent change increases as patients become more impaired. For the tests of discriminant validity, we found that the TUG is significantly related to both the pre-drain and post-drain measures of the Mini-BEST, 6MWT, and 10 MWT. Our findings show that these measures lack discriminant validity when used in the suspected NPH population.

When evaluating patients for possible NPH, the primary objective is to determine whether there was a significant change in the patient’s symptoms following temporary drainage of CSF. Many different scales have attempted to quantify change following temporary CSF drainage. The vast majority of these scales are grading scales [7, 33,34,35,36,37]. Grading scales use categorical cutoffs to separate patients based on their level of impairment. After the temporary CSF drainage, if a patient’s Post-drain scores fall within a less impaired category, the patient is considered improved. One commonly used grading scale for NPH was created using geriatric normative data. This scale assesses NPH using four separate domains (gait, neuropsychological, balance, and continence), all normalized to a 100-point scale. For a patient to be considered improved, they have to improve by more than four points overall [7]. The limitation of this grading scale, and of others created for NPH, is that it does not have an empirical method for differentiating chance improvement from a significant change at the individual patient level. The criterion for improvement in this grading scale is based on summed overall improvement across all four domains. This allows change in score that is due to chance to contribute towards perceived improvement, and thus towards the recommendation for shunt surgery.

In this study, we attempted to address this issue by creating SRB clinically significant change models for common gait and cognitive measures (TUG, Dual TUG, 10 MWT, Mini-BEST, MoCA, SDMT, and 6MWT) using data collected from patients presenting with suspected NPH who underwent temporary drainage of CSF. Incorporating clinical change models brings an empirical approach to the selection of individual patients for shunt surgery. Clinically significant change models allow physicians to make an evidence-based decision about the clinical significance of change following temporary drainage of CSF.

For temporary drains of CSF, the TUG was shown to be an efficient measure of gait for patients presenting with suspected NPH. The TUG does not require a trained physical therapist and can be administered by clinical staff in approximately 3 minutes. It is a reliable predictor of shunt outcome in NPH patients [38,39,40]. In our analysis, the TUG was significantly related to detailed measures of balance, endurance, and gait velocity.

This study has both strengths and limitations. Strengths include a large number of patients with detailed and well-established quantitative measures of gait and cognition. The high test-retest reliability of the measures allows reliable clinical change models to be computed. Limitations include the unknown sensitivity and specificity of shunt outcomes for patients selected based on the clinical change models. To address this lack of information, a study is planned that will use the clinical change models in the Center for CSF Disorders to select patients for shunt surgery and assess both the short-term and long-term shunt outcomes of these patients. We also intend to establish Minimal Clinically Important Difference models for shunted patients to assess the efficacy of shunting at both the individual and group levels.

Conclusion

Standardized Regression-Based clinically significant change models allow physicians to use an evidence-based approach to differentiate clinically significant change from chance improvement at the individual patient level. The Timed Up & Go was shown to be predictive of detailed measures of gait velocity, balance, and endurance.