Introduction

Hereditary transthyretin (TTR) amyloidosis with polyneuropathy (hATTR-PN) is a rare, autosomal dominant, systemic amyloidosis that is characterized primarily by progressive ascending sensorimotor neuropathy, with or without autonomic involvement, although mixed phenotypes are common [1, 2]. The central estimate of global hATTR-PN prevalence is approximately 10,000 persons, but it may be as high as 40,000 [3]. hATTR-PN is traditionally categorized as either Val30Met [4] or non-Val30Met. The former is the most common variant globally [5]. These mutations make TTR tetramers prone to dissociating into monomers that undergo misfolding due to their physical structure; the misfolded proteins aggregate into insoluble amyloid fibrils that are deposited on peripheral nerves and in vital organs, leading to the symptoms of hATTR-PN. If untreated, the average survival is 10–15 years after symptom onset [5,6,7,8].

Tafamidis is a selective TTR stabilizer that holds TTR tetramers together to prevent formation of misfolded TTR, and is approved in over 40 countries to delay neurologic disease progression in early-stage hATTR-PN [9]. The tafamidis clinical development program demonstrated the drug’s long-term safety and effectiveness in delaying hATTR-PN disease progression for up to 5.5 years [10,11,12,13,14,15], with comparable outcomes observed in Val30Met and non-Val30Met patients compared to placebo [16].

Disease progression in hATTR-PN is typically measured according to standardized staging criteria that reflect the severity of systematic neurological involvement. One of the most frequently used staging systems is the polyneuropathy disability (PND) score [17], which ranges from stage 0 (no impairment) to stage IV (confined to a wheelchair or bedridden).

Since the initial approval of tafamidis in 2011 by the European Medicines Agency [18], various observational open-label studies have assessed its effectiveness among samples composed predominantly of stage I patients in the routine clinical (i.e., “real world”) setting [10, 19,20,21]. Several key characteristics were consistent across these studies, including the assessments used to measure neuropathy progression and the duration of assessment intervals. However, the mutant variant distributions, ages of onset, and timing of treatment initiation relative to disease stage differed among the studies. This inter-study heterogeneity, together with small sample sizes, different analytical approaches, and variable follow-up times, has made it difficult to judge how uniform the effect of tafamidis on hATTR-PN progression is. Key unresolved questions include whether progression and treatment response differ by mutation type, age of onset, and/or disease staging scheme.

We present herein a proof-of-concept study and applied example of a statistical method that can be used to pool real-world and randomized trial tafamidis study data. Methodological details are provided in an overview, and an applied example is described. The example constructs synthetic cohorts from summary statistics reported in the literature, and then contrasts the generated synthetic cohorts in a mixed model for repeated measures (MMRM) to characterize therapeutic benefit.

Methods

Integrative data analysis (IDA) [22] is a statistical pooling method to combine studies and then construct synthetic treatment and control cohorts. Aggregation of heterogeneous studies into synthetic cohorts can be thought of as a meta-analytic technique for raw data. Optimal weighting and scaling techniques are used to produce a synthetic cohort from each individual study that up-weights each study’s unique and usable information while simultaneously down-weighting its idiosyncratic noise. Available optimal pooling techniques range from the use of fixed and random study effects to inverse probability weighting (IPW) through propensity methods. This produces a synthetic cohort that is maximally representative of each study’s useful information. These techniques can be used to aggregate data from treatment studies and control/natural history studies to create synthetic treatment and control arms. These synthetic treatment and control cohorts can then be contrasted to determine the time-dependent value of therapeutic intervention. These synthetic cohorts yield greater precision by increasing sample size while shrinking error variance.

Four extant studies were selected for analysis in addition to the tafamidis registration study because each was among the largest recent studies and examined the relationship between disease progression and tafamidis treatment in a manner commensurate with the approach taken in the pivotal registration study [23]. Note that only three of the four were completely independent samples: the Coelho et al. [10] cohort considered here was the tafamidis crossover extension of the placebo arm in the registration trial. Table 1 provides a summary of these studies.

Table 1 Summary of trial-based and real-world prospective studies in patients with hATTR-PN treated with tafamidis

The five included studies were used to characterize study-specific trends in average change from baseline in Neuropathy Impairment Score-Lower Limb (NIS-LL) scores. In addition, the study-specific trends were averaged within treatment arms to construct synthetic cohorts for treatment and controls (i.e., natural history cohorts). Averages were used to construct the synthetic cohort trends because more sophisticated pooling procedures are not available with summary data. The averaging procedure was used to serve as a proof of concept for a forthcoming work in which patient-level data from some of the studies described herein (as well as others) will be pooled to create synthetic cohorts for direct analysis of treatment versus control/natural history cohorts.
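The averaging step described above can be sketched as follows. This is a minimal illustration, assuming an unweighted mean across whichever studies report a given assessment month; the function name, study labels, and numerical values are hypothetical placeholders and are not taken from any of the included studies.

```python
from statistics import mean


def synthetic_trend(study_trends):
    """Average study-specific change-from-baseline means at each month.

    study_trends maps a study label to {month: mean change from baseline}.
    Only studies that report a given month contribute to that month's average.
    """
    months = sorted({m for trend in study_trends.values() for m in trend})
    return {
        m: mean(trend[m] for trend in study_trends.values() if m in trend)
        for m in months
    }


# Hypothetical values for illustration only (not from any published study).
treated = {
    "study_A": {6: 1.0, 12: 2.0, 18: 3.0},
    "study_B": {6: 2.0, 12: 4.0},
}
trend = synthetic_trend(treated)
```

With patient-level data, this unweighted average would be replaced by the weighted pooling procedures discussed in the text.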

Summary statistics reported in each of the included studies were used to obtain or construct study-stratified change from baseline means and corresponding confidence limits. In some studies—notably the 2012 registration study by Coelho et al. [23]—change from baseline means and 95% confidence limits were not tabulated but rather were presented in figures only. In such cases, tracing software was used to recover as precisely as possible the numerical values presented in the figure. For the control group, the only published data were from the placebo arm of the 18-month tafamidis registration study [23]. A separate cohort of controls was simulated that behaved in a manner consistent with our expectation of natural history disease progression in neuropathy. Specifically, this simulated cohort bore the characteristics of the 2012 registration study’s placebo arm, except that it had worse progression (to reflect an assumed attenuation of any placebo effect) and it included a projection up to 30 months.

The outcome measure for this exercise was the average change from baseline in NIS-LL scores and corresponding 95% confidence limits. However, not all studies reported NIS-LL scores in the change from baseline metric, nor did they necessarily report 95% confidence limits when change from baseline means were available. Where average change from baseline and corresponding 95% confidence limits were reported (i.e., Coelho et al. [10, 23] and Cortese et al. [19]), the statistics were used directly. Where time-specific NIS-LL means and standard deviations were reported (i.e., Lozeron et al. [20] and Planté-Bordeneuve et al. [21]), change from baseline means and standard deviations were computed using properties of the distribution for the difference in correlated Gaussian variables. Specifically, if Y1 and Y2 are Gaussian variables with correlation \(\rho\) and distributions Y1 ~ N(µ1, \(\sigma_{1}^{2}\)) and Y2 ~ N(µ2, \(\sigma_{2}^{2}\)), then (Y2 − Y1) ~ N(\(\mu_{2} - \mu_{1} ,\;\sigma_{1}^{2} + \sigma_{2}^{2} - 2\rho \sigma_{1} \sigma_{2}\)), with corresponding standard deviation \({\text{SD}}_{\Delta } = \sqrt {\sigma_{1}^{2} + \sigma_{2}^{2} - 2\rho \sigma_{1} \sigma_{2} }\). In every case where change from baseline had to be constructed from time-dependent means and standard deviations, the correlation coefficient, \(\rho\), was not reported. In these cases, \(\rho\) was conservatively estimated as 0.4 for the purpose of approximating the difference standard deviation.
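As a worked illustration of this construction, the sketch below computes a change from baseline mean and standard deviation from two time-specific summaries. It relies on the standard identity for the variance of a difference of correlated variables, \(\sigma_{1}^{2} + \sigma_{2}^{2} - 2\rho \sigma_{1} \sigma_{2}\); the function name and the input values are hypothetical, and the default \(\rho\) = 0.4 mirrors the assumption stated in the text.

```python
import math


def change_from_baseline(mu1, sd1, mu2, sd2, rho=0.4):
    """Return (mean, SD) of Y2 - Y1 for Gaussian variables with correlation rho."""
    variance = sd1 ** 2 + sd2 ** 2 - 2.0 * rho * sd1 * sd2
    return mu2 - mu1, math.sqrt(variance)


# Hypothetical baseline and follow-up summaries, for illustration only.
mean_delta, sd_delta = change_from_baseline(mu1=10.0, sd1=3.0, mu2=14.0, sd2=4.0, rho=0.0)
```

Because the variance of the difference decreases as \(\rho\) increases, assuming a smaller correlation yields a larger \({\text{SD}}_{\Delta }\) and therefore wider confidence limits.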

In the case of the Planté-Bordeneuve et al. [21] data, time-dependent means and standard deviations were reported for the NIS but not the NIS-LL. As the NIS-LL is a subset of the NIS, NIS-LL estimates were approximated from these summary statistics by scaling them to a range consistent with that observed for the other NIS-LL data. Specifically, the means were divided by 5.4 and the variances were divided by 2.0. For these data, the change from baseline statistics were computed as a function of the distribution for the difference in Gaussian variables scaled by a constant. Once the \({\text{SD}}_{\Delta }\) was computed, 95% Wald confidence limits were computed from the corresponding standard error. This procedure yielded change from baseline means (\(\mu_{\Delta }\)) and corresponding 95% confidence limits. These estimates were averaged to construct the tabulated and plotted synthetic cohort average trend and corresponding 95% Wald confidence limits, stratified by treatment arm.
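The rescaling and confidence limit steps can be sketched as below. The divisors 5.4 and 2.0 come from the text; everything else is an assumption made for illustration, including the function names, the input values, and the conventional standard error \({\text{SD}}_{\Delta } / \sqrt{n}\) with a normal quantile of about 1.96, since the exact computation is not spelled out here.

```python
import math

MEAN_DIVISOR = 5.4      # from the text: NIS means divided by 5.4
VARIANCE_DIVISOR = 2.0  # from the text: NIS variances divided by 2.0


def rescale_nis(mean_nis, sd_nis):
    """Approximate an NIS-LL mean and SD from NIS summary statistics."""
    scaled_variance = sd_nis ** 2 / VARIANCE_DIVISOR
    return mean_nis / MEAN_DIVISOR, math.sqrt(scaled_variance)


def wald_limits(mean_delta, sd_delta, n, z=1.959964):
    """95% Wald limits, assuming standard error = SD / sqrt(n)."""
    se = sd_delta / math.sqrt(n)
    return mean_delta - z * se, mean_delta + z * se


# Hypothetical inputs for illustration only.
nisll_mean, nisll_sd = rescale_nis(mean_nis=54.0, sd_nis=2.0)
lower, upper = wald_limits(mean_delta=2.0, sd_delta=5.0, n=25, z=2.0)
```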

Multivariate normal data were simulated from the synthetic cohort time-dependent treatment-arm-stratified means and variances using the “mvrnorm” function (MASS package) in R version 3.4.3 [24]. The simulated data were constructed under a balanced design with n = 100 patients per cohort and complete data in repeated measures from month 6 through month 30 at 6-month intervals. The variances were used to construct treatment-arm-stratified unstructured covariance matrices. In addition, correlated baseline covariates with a mean of 5 and a standard deviation of 2 were simulated for both treatment arms. Simulated change from baseline data were modeled via the MMRM. This model was estimated using the MIXED procedure in SAS 9.4 software [25]. The model was parameterized using reference cell coding, treating the synthetic placebo cohort as the reference for the treatment effect and the month 6 assessment as the reference for the time effect, with a continuous baseline covariate. Least squares means (LSMs) were estimated for each treatment by assessment level. The estimated LSMs were then plotted over the observed synthetic cohort means to assess the model’s ability to recover the observed assessment- and treatment-dependent means. This last component was conducted as part of the proof of concept to demonstrate that the model proposed for analysis of the final synthetic cohort data would successfully recover the functional form and observed means with acceptable precision. Fixed-effect point and interval estimates and variance components are not reported.
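The simulation step can be illustrated with a small standard-library sketch. It is not the R code used in the study: it draws multivariate normal repeated measures through a Cholesky factor, which is one common way to obtain such draws, and it does not fit the MMRM. The means, standard deviations, and the compound-symmetric correlation of 0.4 are hypothetical values chosen only to make the example run.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular factor L with L * L^T = cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def simulate_cohort(means, cov, n_patients=100, seed=0):
    """Draw n_patients vectors of repeated measures from N(means, cov)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    rows = []
    for _ in range(n_patients):
        z = [rng.gauss(0.0, 1.0) for _ in means]
        rows.append([
            mu + sum(L[i][k] * z[k] for k in range(i + 1))
            for i, mu in enumerate(means)
        ])
    return rows


# Hypothetical month 6..30 means and SDs, for illustration only.
means = [1.0, 2.0, 3.5, 5.0, 6.0]
sds = [3.0, 3.5, 4.0, 4.5, 5.0]
rho = 0.4  # assumed common correlation between assessments
cov = [[(1.0 if i == j else rho) * sds[i] * sds[j] for j in range(5)] for i in range(5)]
cohort = simulate_cohort(means, cov, n_patients=100, seed=1)
```

Each row of `cohort` holds one simulated patient's change from baseline at the five assessments, in a layout that could then be passed to a mixed-model routine.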

Tables were generated using the REPORT procedure in SAS 9.4 software, while figures were generated using the “ggplot2” package in R version 3.4.3. This article is based on previously conducted studies and does not contain any new data collected from human participants or animals.


Results

The reported or, in some cases, computed (e.g., the data computed for Planté-Bordeneuve et al. [21]) values are tabulated for review in Table 2 (treated cohorts) and Table 3 (untreated or placebo cohorts). In addition, the estimates are plotted in several figures. Figure 1 presents the NIS-LL change from baseline trends stratified by study. Two clusters of trends are observed: the Cortese et al. [19] and Lozeron et al. [20] trends were comparable, and the Coelho et al. [23] and Planté-Bordeneuve et al. [21] trends were comparable. In all cases, the confidence bands were wide, reflecting in part the studies’ small sample sizes, with the exception of Coelho et al. [23], which had a notably larger sample (n = 125) than the other studies (mean n = 57). The only slightly outlying trend was associated with Coelho et al. [10]. However, the Coelho et al. [10] trend is distinct, since the original placebo cohort from the registration trial was switched to tafamidis treatment for the open-label continuation study; the plotted trend is the change from baseline in NIS-LL scores post-crossover.

Table 2 Study-stratified NIS-LL change from baseline means for tafamidis treatment cohorts
Table 3 Study-stratified NIS-LL change from baseline means for cohorts not receiving tafamidis
Fig. 1 Study-stratified mean (95% confidence limits) NIS-LL change from baseline trend for tafamidis treatment cohorts. BL baseline, M month, NIS-LL Neuropathy Impairment Score-Lower Limb, TX treatment

Broadly, the trends demonstrate a slowing of disease progression in NIS-LL associated with tafamidis. The average of these study-specific trends is presented in Fig. 2. The average trend, plotted in black, fits through the center of all study-specific trends, with a shape consistent with the Gompertz function suggested as appropriate for the NIS-LL data in hATTR-PN [26]. The same process was used to generate Figs. 3 and 4 for the Coelho et al. placebo arm [23] and the simulated natural history data. The treatment and placebo synthetic cohort trends were plotted together in Fig. 5. The trends overlap early, but as expected, diverge around month 12, as disease progression is uncontrolled in the untreated synthetic cohort and progression slows within the tafamidis-treated synthetic cohort.

Fig. 2 Study-stratified mean (95% confidence limits) NIS-LL change from baseline trend for tafamidis treatment cohorts, overlaying synthetic treatment cohort trend. BL baseline, M month, NIS-LL Neuropathy Impairment Score-Lower Limb, TX treatment

Fig. 3 Study-stratified mean (95% confidence limits) NIS-LL change from baseline trend for natural history cohorts. BL baseline, CTRL control, M month, NIS-LL Neuropathy Impairment Score-Lower Limb

Fig. 4 Study-stratified mean (95% confidence limits) NIS-LL change from baseline trend for non-tafamidis cohorts, overlaying synthetic control cohort trend. BL baseline, CTRL control, M month, NIS-LL Neuropathy Impairment Score-Lower Limb

Fig. 5 Synthetic cohort-stratified mean (95% confidence limits) NIS-LL change from baseline trend. BL baseline, CTRL control, M month, NIS-LL Neuropathy Impairment Score-Lower Limb, TX treatment

Within Fig. 6, the observed values and corresponding colors reported in Fig. 5 are retained (control = gray; tafamidis = black). These trends are overlaid with the model-estimated trends, which are also color-coded (control = orange; tafamidis = blue). As seen in Fig. 6, the observed synthetic cohort means (OBS) were precisely recovered by the MMRM-based values (LSMs). Notably, the discrepancy in estimates was zero between baseline and month 6 in both treatment and placebo synthetic cohorts, and zero between month 18 and month 24 for the placebo synthetic cohort. All other discrepancies were minor, and none evinced a departure from the observed functional form. Thus, the discrete-time MMRM is expected to precisely recover the observed means in the forthcoming analyses.

Fig. 6 Treatment arm-stratified observed synthetic cohort means versus MMRM estimated LSMs. BL baseline, CTRL control, LSM least squares mean, M month, MMRM mixed model for repeated measures, NIS-LL Neuropathy Impairment Score-Lower Limb, OBS observed synthetic cohort, TX treatment

Discussion

In this work, a synthetic cohort approach was applied to the analysis of real-world outcomes for tafamidis for the treatment of hATTR-PN, including comparison to natural history data. Our findings demonstrate the merits of employing synthetic cohorts. The average trend lines for the synthetic cohorts did not distort any of the study-specific trends. The error variance, as measured by the width of the 95% confidence bands, shrank relative to any individual study, but not excessively so. In addition, within the treatment synthetic cohort, the average trend and confidence bands mimicked a Gompertz function, which is a well-known function for modeling decelerating exponential effects that asymptote asymmetrically. This is of interest as the Gompertz function has been proposed elsewhere for the analysis of neurodegenerative outcome measures within hATTR-PN [26].
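For readers unfamiliar with the Gompertz function, one common parameterization is \(f(t) = a\,e^{-b\,e^{-ct}}\), which rises from \(a\,e^{-b}\) at t = 0 toward an upper asymptote a. The sketch below evaluates that form; the parameter names and values are illustrative assumptions and are not estimates fitted to the synthetic cohort data, and the parameterization used in [26] may differ.

```python
import math


def gompertz(t, a, b, c):
    """One common Gompertz parameterization: a * exp(-b * exp(-c * t))."""
    return a * math.exp(-b * math.exp(-c * t))


# Hypothetical parameters: asymptote a, displacement b, rate c (per month).
months = [0, 6, 12, 18, 24, 30]
curve = [gompertz(t, a=10.0, b=2.0, c=0.1) for t in months]
```

The curve increases monotonically and flattens as it approaches a, which is the decelerating, asymmetric shape described above.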

While the Gompertz function may be a good approximation to the average trend, one might encounter difficulty in properly specifying the model in the context of repeated measures and random effects. In contrast, a discrete-time MMRM is easily parameterized and can flexibly accommodate non-linear trends. Therefore, a second part of this proof of concept was to demonstrate that if individual-level data were simulated from synthetic cohort means and variances, a discrete-time MMRM could precisely recover the synthetic cohort-stratified mean trends. In fact, the discrete-time MMRM did succeed in recovering the observed means, pointing to the ability of this model to detect and accurately reflect synthetic cohort treatment arm differences in NIS-LL disease progression.

This proof-of-concept report has some limitations. The included studies may have had some overlap in the patient samples, which may have artificially limited the variance and caused the confidence limits to be underestimated. Summary statistics available in the literature were used, limiting the methods available for optimally weighting the pooling procedure averaging across studies.

In addition, three of the five studies considered were composed of samples that were 100% Val30Met. The remaining two studies comprised mixed samples (< 50% Val30Met). Published evidence has suggested that progression and treatment response outcomes differ substantially between Val30Met and non-Val30Met populations [27,28,29]. However, the recent analysis by Gundapaneni et al. [16] suggested that progression and treatment responses were no different between Val30Met and non-Val30Met populations treated with tafamidis, after adjusting for baseline neuropathy status. A limitation of this proof-of-concept study is its inability to address the difference in progression and treatment response between these important sub-populations. However, given the modest sample sizes in the real-world data available to date, no single study has been able to do this either. As a consequence, given the findings of Gundapaneni et al., a new question that IDA may be uniquely positioned to answer is whether mutation-dependent progression and treatment differences are important, or whether they are artifacts arising from modest sample sizes and potentially insensitive analysis techniques (i.e., responder analyses). It is our contention that a precise answer is likely achievable only by optimally pooling available data via IDA, and this issue speaks to the need to conduct this pooling research.

The next step in this line of research is to apply similar methods to the raw data corresponding to a larger set of real-world data studies. Doing so will allow for an IDA approach [22, 30] in which the patient-level data from a group of independent studies are pooled, rather than aggregate summary statistics. With patient-level data, more sophisticated and sensitive methods of pooling studies under optimal weighting paradigms can be employed. These include, but are not limited to, incorporation of fixed and random study effects, propensity-matching procedures, and the preferable hybrid of these approaches (i.e., doubly robust propensity weighting). With access to individual patient characteristics and outcomes, more sophisticated statistical techniques and models can also be applied to achieve a greater understanding of clinical outcomes by using a unified process that adjusts for baseline and changes over time. By better characterizing the natural history of untreated hATTR-PN cases and the relative benefit of tafamidis treatment, IDA would significantly facilitate clinician–patient communication regarding available treatment regimens and their respective risks and benefits.

Conclusion

Beyond the registration trial, evidence published to date on the natural history, disease progression, and tafamidis treatment outcomes associated with hATTR-PN has demonstrated some heterogeneity and has been derived from studies with modest sample sizes (due to the low prevalence of this disease). IDA with synthetic cohorts is a technique that can be used to analyze multiple studies with shared features to increase the precision of the characterization of hATTR-PN treatment outcomes. In so doing, modest samples can be aggregated to form large cohorts from which increased precision of inference may be obtained. As this is a proof of concept for the application of IDA to patient-level progression data in hATTR-PN, no definite conclusions about the effectiveness of tafamidis can be made from these results. Rather, one can conclude from the evidence presented herein only that IDA is a method that may be useful in the future for characterizing disease progression and drug effectiveness in a larger cohort using patient-level data.