As a compound measure of the overall destruction, preservation, and repair of brain tissue in MS patients, cerebral atrophy encompasses both neuroaxonal loss and the processes of demyelination, remyelination, gliosis, and edema. The present study shows that the rate of cerebral atrophy can be detected within a 6-month period and that, when it is used as the primary outcome measure in short-term clinical trials of feasible sample size, a potent drug is required to obtain sufficient power.
Compared with sample size estimates for trials using SIENA-measured atrophy over longer follow-up periods, the present sample sizes are larger. To detect a 50% decrease in atrophy rate with 90% power, trials in RRMS patients were estimated to require approximately 69 patients per arm over a 1-year follow-up period and 40 patients per arm over a 3-year follow-up period [11], whereas trials in secondary progressive multiple sclerosis with a 1-year follow-up require 56 patients per arm [16]. The smaller sample sizes in these studies are explained by the larger atrophy that accrues over longer trial durations and the correspondingly larger detectable effect sizes. In addition, atrophy measured over short intervals may be more prone to measurement error, which increases the variability of the atrophy measure and therefore requires more subjects than would be expected in a longer-interval study. Since SIENA has been shown to achieve greater statistical power than other atrophy measures owing to its greater measurement precision [11], the current sample size estimates are likely the smallest achievable for trials of short duration.
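Why a larger mean atrophy rate lowers the required sample size can be illustrated with the standard normal-approximation formula for comparing two group means. The sketch below is not necessarily the method used in this study or in [11, 16]: it assumes that PBVC is approximately normally distributed with the same SD in both arms. The function name `n_per_arm` and the SD of 0.5% are hypothetical illustration values.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(mean_pbvc, sd_pbvc, reduction, alpha=0.05, power=0.90):
    """Patients per arm needed to detect a fractional reduction in atrophy.

    Normal-approximation, two-sided, two-sample comparison of means with
    equal SD in both arms (an assumption for illustration only).
    mean_pbvc: mean PBVC in the placebo arm (%, negative for atrophy)
    sd_pbvc:   SD of PBVC, assumed equal in both arms (%)
    reduction: treatment effect as a fraction of the placebo atrophy rate
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    effect = abs(mean_pbvc) * reduction  # absolute between-arm difference (%)
    return ceil(2 * ((z_alpha + z_beta) * sd_pbvc / effect) ** 2)


# Hypothetical SD of 0.5%. Doubling the mean atrophy at a fixed SD, as a
# longer follow-up might, reduces the required sample roughly fourfold.
short = n_per_arm(-0.33, 0.5, 0.5)
longer = n_per_arm(-0.66, 0.5, 0.5)
print(short, longer)
```

Because n scales with (SD / effect)², any increase in short-interval measurement variability feeds into the sample size quadratically; dropout is ignored here.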
When interpreting the current sample sizes, several considerations should be taken into account. First, the calculations assumed that a 100% treatment effect corresponds to zero loss of brain volume, whereas healthy controls are known to experience a small amount of brain volume loss. For comparison, a previous study reported a PBVC of 0.11% (0.30) in healthy controls over a follow-up of 1 year [11]. When this physiological loss is taken into account, the maximum achievable treatment effect is smaller, and a larger sample size will be required to detect a similar treatment effect. Second, the treatment effects are assumed to be present from onset and constant over time. If this assumption is not met and a compound takes time to become maximally effective, the observed effect size will be smaller and the required sample size correspondingly larger. Wallerian degeneration initiated by axonal damage before treatment, for example, might delay the true effect because atrophic processes are already ongoing when the drug is initiated. Another important consideration is the confounding effect of other factors influencing brain volume, such as demyelination, remyelination, gliosis, and inflammation. In particular, the resolution of edema and inflammation induced by anti-inflammatory agents, a process known as “pseudoatrophy” [6], can cause loss of brain volume and obscure the measurement of true tissue loss. To measure a true neuroprotective effect, especially over a shorter period of time, a future trial might assess an experimental treatment as an add-on therapy in patients receiving immunomodulatory treatment in both arms of the trial. Such a design, however, will likely require larger sample sizes because the rates of cerebral atrophy will be lower in both groups being compared.
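The effect of acknowledging physiological volume loss can be sketched by measuring the treatment effect against the excess atrophy above a healthy-control rate rather than against zero. This is a minimal sketch under stated assumptions: atrophy is treated as linear in time, so the 1-year healthy-control value of 0.11% [11] is taken as a loss and halved for a 6-month window; the SD of PBVC (0.5%) and the helper names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_for_difference(diff, sd, alpha=0.05, power=0.90):
    """Normal-approximation sample size per arm for a difference in means."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (k * sd / diff) ** 2)


def detectable_difference(patient_loss, healthy_loss, reduction):
    """Between-arm difference when only disease-related (excess) loss is treatable.

    A 100% effect brings the treated arm down to the healthy-control rate,
    not to zero loss.
    """
    return reduction * (patient_loss - healthy_loss)


patient_6m = 0.33       # 6-month loss in untreated RRMS reported here (%)
healthy_6m = 0.11 / 2   # healthy 1-year loss [11] halved; assumes linearity
sd = 0.5                # hypothetical SD of PBVC (%)

naive = n_for_difference(detectable_difference(patient_6m, 0.0, 0.5), sd)
adjusted = n_for_difference(detectable_difference(patient_6m, healthy_6m, 0.5), sd)
print(naive, adjusted)
```

Under these assumptions the treatable signal shrinks from 0.33% to about 0.275%, so the sample size grows by roughly (0.33 / 0.275)², about 1.44-fold.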
To partly overcome these limitations, a more efficient trial design would include a run-in period of, e.g., 3 months during which the neuroprotective compound is administered, followed by the short-interval atrophy assessment. This gives the compound the opportunity to become maximally effective and allows confounding processes initiated before the trial to wane.
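The benefit of a run-in under a delayed onset of action can be sketched with a simple model in which the treatment effect rises linearly from zero to its full value. The 3-month ramp, the 50% full effect, and the function names below are hypothetical; the model ignores the waning of confounders and only shows how the average effect over the assessment window, and hence the sample size, changes.

```python
def cumulative_ramp(x, ramp_months):
    """Integral over [0, x] of the ramp fraction min(t / ramp_months, 1)."""
    if ramp_months <= 0:
        return x                      # effect is immediate
    if x <= ramp_months:
        return x * x / (2 * ramp_months)
    return ramp_months / 2 + (x - ramp_months)


def mean_effect(start, duration, full_effect, ramp_months):
    """Average fractional reduction in atrophy rate over the assessment window.

    start:    months from treatment initiation to the baseline scan
    duration: length of the atrophy assessment window in months
    """
    end = start + duration
    area = cumulative_ramp(end, ramp_months) - cumulative_ramp(start, ramp_months)
    return full_effect * area / duration


full = 0.5   # hypothetical full effect: 50% reduction in atrophy rate
ramp = 3.0   # hypothetical months needed to reach the full effect

no_run_in = mean_effect(0.0, 6.0, full, ramp)    # scanning starts with treatment
with_run_in = mean_effect(3.0, 6.0, full, ramp)  # 3-month run-in before baseline

# Under the normal-approximation formula, n per arm scales with 1 / effect**2.
inflation = (with_run_in / no_run_in) ** 2
print(f"mean effect without run-in: {no_run_in:.3f}, with run-in: {with_run_in:.3f}")
print(f"sample size without run-in is about {inflation:.2f}x larger")
```

In this toy model the run-in restores the full effect throughout the 6-month window, whereas without it the average effect is only three quarters of the full value.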
An advantage of our study is that the present calculations are applicable to multicenter trials, since the underlying data were obtained in multiple centers and therefore include the variability introduced by different scanners and analyses. In addition, we found a moderate but highly significant PBVC of −0.33% in a group of untreated RRMS patients over a period of 6 months. When annualized (approximately −0.66%/year, assuming a constant rate), this rate lies within the range of previous results (0.6–1.35%/year [6]), toward its lower end. This supports the generalizability of the sample size estimates, since the calculations have not been biased by an atrophy rate at the higher end of the range.
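The annualization is simple arithmetic, but it rests on the assumption of a constant atrophy rate. The sketch below makes that assumption explicit and, for comparison, shows a compounded alternative; both function names are hypothetical, and the reference range is the one cited above [6].

```python
def annualize_linear(pbvc, months):
    """Scale a PBVC (%) measured over `months` to 12 months, assuming a constant rate."""
    return pbvc * 12 / months


def annualize_compound(pbvc, months):
    """Alternative: compound the per-interval fractional change over one year."""
    return ((1 + pbvc / 100) ** (12 / months) - 1) * 100


observed = -0.33   # 6-month PBVC in untreated RRMS (%)
linear = annualize_linear(observed, 6)
compound = annualize_compound(observed, 6)
in_range = 0.6 <= abs(linear) <= 1.35   # previously reported range, %/year
print(f"linear: {linear:.3f}%/year, compound: {compound:.3f}%/year, in range: {in_range}")
```

At changes this small the two conventions differ only in the third decimal, so the choice does not affect the comparison with the published range.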
A possible gain in statistical power for shorter trials using cerebral atrophy as the primary outcome measure might be achieved by adding multiple scanning time points. A recent study showed that, by placing additional scans towards the start and end of the trial, total variance, and hence trial size, could be reduced by 41% in patients with Alzheimer disease, using the brain boundary shift integral method to determine the rate of cerebral atrophy [17]. Because within-subject variance contributes more to the overall variance at short intervals, acquiring multiple scans has a greater impact in shorter studies. Although smaller gains in power can be expected from adding time points for more precise measures such as SIENA, the effect on the required sample size should be explored in future multi-time-point MS atrophy data.
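Why extra scans help more in short trials can be sketched with a simplified model, not the method of [17]: each subject's atrophy rate is estimated as a least-squares slope through volumes measured at the scan times, each scan carries independent measurement error, and the true rate varies between subjects. SIENA itself reports pairwise change rather than a volume series, so this is only an illustration; both SDs and all function names are hypothetical. For a fixed effect size, the required sample size is proportional to the slope variance computed here.

```python
def sxx(times):
    """Sum of squared deviations of scan times (months) from their mean."""
    mean_t = sum(times) / len(times)
    return sum((t - mean_t) ** 2 for t in times)


def slope_variance(times, sd_measurement, sd_between):
    """Variance of a subject's least-squares atrophy slope.

    Between-subject variance of the true rate plus the within-subject
    (measurement) contribution sd_measurement**2 / Sxx.
    """
    return sd_between ** 2 + sd_measurement ** 2 / sxx(times)


def variance_reduction(base, extended, sd_measurement, sd_between):
    """Fractional drop in slope variance when switching to the extended schedule."""
    v_base = slope_variance(base, sd_measurement, sd_between)
    v_ext = slope_variance(extended, sd_measurement, sd_between)
    return 1 - v_ext / v_base


sd_meas = 0.2      # hypothetical per-scan measurement error (%)
sd_between = 0.05  # hypothetical between-subject SD of the true rate (%/month)

# Extra scans placed near the start and end of each trial.
gain_short = variance_reduction([0, 6], [0, 0.5, 5.5, 6], sd_meas, sd_between)
gain_long = variance_reduction([0, 24], [0, 2, 22, 24], sd_meas, sd_between)
print(f"6-month trial: {gain_short:.0%} smaller; 24-month trial: {gain_long:.0%} smaller")
```

Because Sxx grows with the square of the trial length, the measurement term dominates only in short trials, which is where additional time points pay off most; a more precise measure (smaller `sd_meas`) shrinks the gain.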
The on-study association between the rate of atrophy and the number of PBHs suggests that axonal loss is one of the driving mechanisms of brain volume loss in MS patients. Previous studies showed the on-study volume of black holes to be closely related to supratentorial brain volume [18] and the baseline volume of black holes to be significantly correlated with the subsequent development of atrophy [10, 19]. In contrast, the present study did not show the significant relationships between Gd enhancement activity or T2 lesion load and changes in brain volume that were reported in previous studies [8, 10, 19, 20]. These earlier findings, together with the current results, indicate that the associations between focal tissue changes and atrophy are moderate at best over relatively long follow-up periods and weaken over shorter intervals, most probably because of the noise introduced by increased measurement variability. They also indicate that focal tissue changes in MS only partly explain the development of atrophy. This emphasizes the added value of cerebral atrophy as a measure of the overall destruction of neuronal tissue, encompassing not only measurable focal destruction but also diffuse destruction that is otherwise unaccounted for.
The detectability of atrophy over the short term has been attributed to the degree of disease activity of the patients in the cohort assessed. Hardmeier et al. [8], who found a significant atrophy rate within 3 months, stated that their finding most likely reflected the natural history of a very active group of RRMS patients with well-established disease, which could explain the absence of atrophy in comparable short-term studies [9, 21]. The current study population was selected partly on MRI-based criteria at baseline and can be regarded as an active cohort of MS patients, as shown by the increase in T2 lesion load and in Gd-enhancing lesion number and volume over the study period, and by the high proportion of active patients. The influence of disease activity on the rate of atrophy is also reflected when the subgroups defined by the baseline selection criteria are compared (Table 2). Although a more active cohort may limit generalizability compared with a randomly sampled cohort, and selection criteria make it more difficult to recruit subjects, the trade-off is that a larger sample size would be required with unselected patients.
In conclusion, our findings suggest that the rate of cerebral atrophy is a detectable outcome measure in short-term clinical trials in RRMS and is feasible in terms of study power, provided that a potent drug is tested.