We thank Aksan and colleagues for their interest in our recent article on the budgetary implications of using iron isomaltoside 1000 (IIM) relative to other intravenous (IV) iron formulations in patients with inflammatory bowel disease (IBD) and iron deficiency anemia (IDA) in the Danish setting [1]. We disagree with the key points raised and would counter that, in their letter, the correspondents have selectively quoted from their own study, made incorrect and misleading statements regarding the availability of data on the administration of high doses of IIM, and have, regrettably, demonstrated a misunderstanding of the modeling approaches employed in the analysis.

Before responding to the individual points raised, we would note two overarching aspects of the letter that appear to be internally inconsistent. Firstly, the correspondents apply markedly different standards for what constitutes an acceptably large sample size in different circumstances. On the one hand, they claim that extrapolation of data from 100 anemic patients in the Non-Interventional Monofer (NIMO) study to a Danish IBD population of 3522 patients is not “meaningful” on the grounds that the sample population is too small. On the other hand, they claim that their efficacy comparison between IIM and ferric carboxymaltose (FCM), based on a network meta-analysis (NMA), results in the “unequivocal” conclusion that FCM is more effective than IIM, despite (a) the non-significance of the findings and (b) IIM being linked into the network using data from just 113 patients on oral iron in a single study [2]. The implicit extrapolation in the latter case would be to all patients, globally, with IBD and IDA requiring IV iron. Maintaining that the NMA results hold true in the global population of patients with IBD and IDA would appear to be completely at odds with the simultaneous dismissal of our assumption that the NIMO study anemic IBD subgroup would be representative of a Danish population of patients with IBD and IDA.

Secondly, the correspondents appear to be inconsistent as to whether they would prefer the use of “real-world” evidence or alignment with the summaries of product characteristics (SPC) or treatment guidelines. This is particularly apparent in the comments on dosing and retreatment frequency, in which the correspondents lament that our SPC-based dosing assumption “does not reflect real-life iron dosing”, while choosing to ignore the observed “real-life” iron retreatment frequency reported in the Kulnigg et al. study and instead recommending that retreatment frequency be based exclusively on the median time to patients reaching serum ferritin levels of less than 30 μg/L [3].

Assumption That IIM and FCM Are Equal in Efficacy and Safety

Regarding the “serious misinterpretation” of the Aksan et al. [4] NMA, we note the highly selective quoting of the correspondents’ own study and respond in kind; in the results of the NMA, the authors note that “[c]oncerning efficacy, no statistically significant difference was found when comparing FCM, [IIM and iron sucrose]”. By definition, this does not contradict our reading of the manuscript; the null hypothesis of no difference could not be rejected, and a significant difference between the IV iron treatments was therefore not demonstrated. We further note the correspondents’ vacillation on the credibility of their findings; contrary to the assertions of an “unequivocal” conclusion in the present correspondence, in a letter to the editor in 2017, the correspondents themselves noted “[t]hese types of analyses are more exploratory-pragmatic or ‘observational’ rather than confirmatory” [5]. Which is it: an “unequivocal” conclusion or simply “observational rather than confirmatory”?

The correspondents go on to cite rank probabilities and comparisons with oral iron as evidence that runs contrary to our modeling assumption of the equivalence of IIM and FCM. With regard to rank probabilities, we heeded the exact advice from the correspondents themselves: “researchers should use rank probabilities cautiously” and “it is advised to observe the estimated effects first and use the rankings only as a supplementary measure” [4]. We were indeed extremely cautious of using rank probabilities from the NMA, primarily because of the non-significance of the estimated effects, but secondarily because the authors failed to report the priors employed in the NMA [4]. With regard to the comparisons with oral iron, the significant difference between FCM and oral iron is simply not relevant and amounts to statistical chicanery when comparing IIM with FCM. Statistical significance is non-transitive; if A is significantly more effective than B, and C is not significantly more effective than B, it is a fundamental and egregious error to conclude that A is more effective than C.
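The non-transitivity of statistical significance can be illustrated with a minimal numerical sketch. All effect estimates and standard errors below are hypothetical values chosen purely for illustration (they are not taken from the NMA or any trial), and the indirect comparison of A with C assumes that the two estimates against the common comparator B are independent.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical differences in response rate versus a common comparator B,
# with hypothetical standard errors (illustrative only).
d_ab, se_ab = 0.20, 0.08  # A versus B
d_cb, se_cb = 0.10, 0.08  # C versus B

# Indirect comparison of A versus C via B, assuming independent estimates:
# the point estimates subtract and the variances add.
d_ac = d_ab - d_cb
se_ac = math.sqrt(se_ab**2 + se_cb**2)

p_ab = two_sided_p(d_ab / se_ab)
p_cb = two_sided_p(d_cb / se_cb)
p_ac = two_sided_p(d_ac / se_ac)

print(f"A vs B: p = {p_ab:.3f}")
print(f"C vs B: p = {p_cb:.3f}")
print(f"A vs C: p = {p_ac:.3f}")
```

With these illustrative inputs, A differs significantly from B at the 5% level and C does not, yet the comparison of A with C is itself far from significant, because the uncertainty of both comparisons against B carries over into the indirect estimate.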

We would further note that, despite the above verbatim quote on the lack of statistical significance in efficacy differences between IV iron formulations, the authors have recently published an editorial citing the very same study and noting adjacent to the citation that: “this study shows for the first time significant differences in efficacy and safety of the different intravenous iron preparations in patients with IBD” [6]. Not only was no formal analysis of safety conducted in the NMA, but this recent reporting of significance is a direct contradiction of the original study. The claim in this editorial is therefore patently, indisputably false, and appears to represent an example of negationism that has no place in the medical literature.

We do acknowledge that an absence of evidence of a difference is not evidence for the absence of a difference. Indeed, it is impossible to definitively demonstrate that two treatments have the same effect [7, 8]. However, a central principle of health economic modeling is to make assumptions that are congruent with the published evidence; on the basis of the lack of a significant difference in the Aksan et al. NMA and our own recent research in a general IDA population [9], we would vehemently defend our decision to assume equivalent efficacy until any additional evidence becomes available to the contrary. Evaluating the relative safety of IV iron formulations remains challenging given the heterogeneity of the endpoint reporting. A recent attempt to develop a framework for classifying hypersensitivity reactions using standardized Medical Dictionary for Regulatory Activities (MedDRA) queries (SMQs) has shown that safety differences may exist between IV iron formulations, but additional data and evidence synthesis would likely be required to conclusively demonstrate this [10].

Calculation of Population and Cohort Baseline Characteristics

We refer back to the above opening point on the correspondents’ own standards for extrapolation from small sample sizes and would note that, in targeting a margin of error of ± 0.5 g/dL with 95% confidence and using the NIMO standard deviation of 1.4 g/dL as a guide, a sample size of 31 would be sufficient for the hemoglobin values:

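$$N = \left\lceil \frac{4 \cdot Z_{\alpha}^{2} \cdot s^{2}}{W^{2}} \right\rceil = \left\lceil \frac{4 \cdot 1.96^{2} \cdot 1.4^{2}}{\left(0.5 \cdot 2\right)^{2}} \right\rceil = 31$$

where Z is the critical value of the standard normal distribution (1.96 for 95% confidence), s is the standard deviation, and W is the full width of the confidence interval (twice the margin of error).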

Similarly, targeting a body weight margin of error of ± 5 kg with 95% confidence would require a sample size of 46 based on the standard deviation observed in the NIMO study.
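The hemoglobin sample size calculation above can be reproduced with a short sketch. The function name and arguments are introduced here for illustration only; the body weight figure is not recomputed because the NIMO body weight standard deviation is not restated in this letter, but the same function would apply with that value and a margin of 5 kg.

```python
import math


def sample_size(sd, margin, z=1.96):
    """Smallest N giving a confidence interval of half-width `margin`.

    Implements N = ceil(4 * z^2 * sd^2 / W^2), where W = 2 * margin is
    the full width of the confidence interval.
    """
    width = 2 * margin
    return math.ceil(4 * z**2 * sd**2 / width**2)


# Hemoglobin: NIMO standard deviation of 1.4 g/dL, margin of +/- 0.5 g/dL.
n_hb = sample_size(sd=1.4, margin=0.5)
print(n_hb)
```

Because the required sample size scales with the square of the ratio of the standard deviation to the margin of error, halving the target margin would roughly quadruple the number of patients needed.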

We would further note that we conducted extensive sensitivity analyses around the body weight assumptions, varying the mean from 65 to 85 kg with no change in the directionality of the findings, and a trend toward reduced cost savings with IIM relative to FCM with lower mean body weight, as might be hypothesized based on the posology of the two formulations. We also refer to our previous work using a similar modeling technique, in which a two-way sensitivity analysis (comprising 20 individual analyses) over mean hemoglobin and body weight ranges of 8–11 g/dL and 65–85 kg showed that IIM remained cost saving relative to FCM in every analysis [11]. It would be extremely improbable that the overall Danish anemic IBD population averages would fall outside of these ranges.

Further to the correspondents’ comments on whether the Danish population enrolled in NIMO was comparable to the Swedish and Norwegian population, we can confirm that an analysis of variance (ANOVA) across the Danish, Swedish, and Norwegian anemic IBD subgroups showed that there was no significant difference in the baseline hemoglobin levels between the countries (between-group p = 0.41).

Dosage Assumption

We thank the correspondents for raising this issue as it is an important aspect of the analysis, and one to which we devoted a paragraph of the discussion in our original manuscript. The correspondents cite the NIMO study as evidence of our modeling analysis not reflecting “real-life iron dosing”; however, one of the key findings of the NIMO study was that 27% of anemic patients with IBD and 37% of all anemic patients enrolled in the study were still anemic at end-of-study. In the whole NIMO population, the mean iron dose given was just 1100 mg, 25% lower than the calculated total iron need of 1481 mg. We would note the similarity between this dosing shortfall and the proportion of patients whose IDA was not resolved at end-of-study. On the basis of these findings, we would contend that modeling the doses as administered in the NIMO study would have been an implicit endorsement of failing to correct iron deficiency in patients with IBD and IDA. Given that the other key finding of the NIMO study was that hematological response rate correlates with the administered iron dose, we feel that our choice of modeling based on the calculated iron need is entirely defensible, and was in very close alignment with the iron requirement calculations in the NIMO study (1363 mg in the NIMO study IBD subgroup versus our modeled estimate of 1374 mg).

Regarding our use of the simplified table of iron need rather than the Ganzoni formula, we were surprised to learn that Prof. Stein in particular appears to be endorsing the use of the Ganzoni formula, given that the 2015 guidelines that he coauthored note that “the formula is inconvenient, prone to error, inconsistently used in clinical practice, and underestimates iron requirements” [12]. Indeed, it was these exact guidelines, published on behalf of the European Crohn’s and Colitis Organisation (ECCO), that informed our use of the simplified table in the analysis. The guidelines specifically note that: “The estimation of iron need is usually based on baseline hemoglobin and body weight, and this is more effective for the treatment of IDA in IBD patients than individualized dosing based on the traditional Ganzoni’s formula” [12].

With regard to administering high doses of IIM, the correspondents make the false assertion that “the safety and efficacy of IIM as a single infusion at this dosage must be considered entirely hypothetical, since 1400 mg infusions have never been investigated in clinical trials of IIM”. While doses of exactly 1400 mg may not have been investigated, single doses much higher than this have indeed been investigated, and specifically so in patients with IBD in Scandinavia. The PROMISE study investigated single IIM doses of 1500 mg and 2000 mg, and cumulative doses of up to 3000 mg in 21 patients with IBD [13]. Over the course of the study, there were no serious adverse drug reactions, and only one mild, non-serious hypersensitivity reaction that resolved within 2 days. Hemoglobin levels increased from 10.7 to 13.4 g/dL (p < 0.0001, n = 14) at week 8 in patients receiving 1500–2000 mg of iron, and from 8.8 to 12.4 g/dL (p < 0.0001, n = 6) at week 16 in patients receiving 2500–3000 mg of iron [13].

Retreatment Period Assumption

We thank the correspondents for raising this and agree that the retreatment interval is a key aspect in determining the absolute cost savings. We would firstly note, however, that over a long enough time horizon (e.g., the 5 years used in the base case analysis) the relative cost savings are unaffected by this model parameter and that we conducted both one-way and probabilistic sensitivity analyses around this aspect of the analysis, with no change in the directionality of the findings.
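The independence of the relative cost savings from the retreatment interval can be seen in a simplified derivation. The notation is introduced here for illustration and is not taken from the published model: C_FCM and C_IIM denote the cost of one full treatment course with each formulation, T is the time horizon, and t is the retreatment interval, assumed to be the same for both formulations, so that each patient receives approximately T/t courses. Discounting and the effect of a partial final course are ignored.

$$\text{Relative saving} = \frac{\frac{T}{t} C_{\mathrm{FCM}} - \frac{T}{t} C_{\mathrm{IIM}}}{\frac{T}{t} C_{\mathrm{FCM}}} = 1 - \frac{C_{\mathrm{IIM}}}{C_{\mathrm{FCM}}}$$

$$\text{Absolute saving} = \frac{T}{t} \left( C_{\mathrm{FCM}} - C_{\mathrm{IIM}} \right)$$

Under these simplifying assumptions the factor T/t cancels from the relative saving, whereas the absolute saving is inversely proportional to the retreatment interval.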

The correspondents’ assertion that switching to the Ganzoni formula would result in a lower calculated iron deficit is correct on average (although crucially this does not hold true for every combination of body weight and hemoglobin), but their related assertion that this would result in lower cost savings with IIM relative to FCM highlights a misunderstanding of the modeling approach adopted. The model results are based on a combination of iron deficit calculations, a bivariate distribution of body weight and hemoglobin, and posological models of the IV iron formulations. The presumption that knowledge of a change in only one of these facets of the analysis could be used to predict the modeled cost differences reflects an overly simplistic understanding of the interplay between these aspects of the model. Where the Ganzoni formula results in a calculated iron deficit of greater than 1000 mg, regardless of the exact figure, this iron deficit could not be corrected with a single infusion of FCM if the SPC posological recommendations are followed: “the maximum recommended cumulative dose of Ferinject is 1000 mg of iron (20 mL Ferinject) per week” [14]. The same cannot be said of IIM, as the maximum dose with IIM is weight-based with no upper bound specified in the administration section of the SPC. For given populations, this results in a reduced proportion of patients requiring more than one IIM infusion when using the Ganzoni formula rather than table-based iron deficit calculations.
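The interplay between the iron deficit calculation and the posology can be illustrated for a single hypothetical patient. This is a minimal sketch and not the published model, which additionally uses a bivariate distribution of body weight and hemoglobin. Several inputs are assumptions made for illustration and should be checked against the current SPCs and guidelines: the Ganzoni target hemoglobin of 15 g/dL and iron stores of 500 mg, the thresholds of the simplified table (10 g/dL and 70 kg), and a per-infusion limit of 20 mg/kg for both formulations. Only the 1000 mg weekly maximum for FCM is quoted in the text above.

```python
import math


def ganzoni_deficit_mg(weight_kg, hb_g_dl, target_hb_g_dl=15.0, stores_mg=500.0):
    """Ganzoni formula: weight x (target Hb - actual Hb) x 2.4 + iron stores.

    The default target hemoglobin and iron stores are assumed values.
    """
    return weight_kg * (target_hb_g_dl - hb_g_dl) * 2.4 + stores_mg


def table_deficit_mg(weight_kg, hb_g_dl):
    """Simplified table of iron need (thresholds assumed for illustration)."""
    if hb_g_dl >= 10.0:
        return 1000 if weight_kg < 70 else 1500
    return 1500 if weight_kg < 70 else 2000


def infusions_needed(deficit_mg, weight_kg, mg_per_kg_cap, absolute_cap_mg=None):
    """Number of infusions needed if each is given at the maximum single dose."""
    max_single = mg_per_kg_cap * weight_kg
    if absolute_cap_mg is not None:
        max_single = min(max_single, absolute_cap_mg)
    return math.ceil(deficit_mg / max_single)


# Hypothetical patient: 80 kg, hemoglobin 10.5 g/dL.
weight, hb = 80.0, 10.5
results = {}
for name, deficit in [("table", table_deficit_mg(weight, hb)),
                      ("ganzoni", ganzoni_deficit_mg(weight, hb))]:
    # FCM: assumed 20 mg/kg limit, capped at 1000 mg (cap quoted from the SPC).
    fcm = infusions_needed(deficit, weight, mg_per_kg_cap=20, absolute_cap_mg=1000)
    # IIM: assumed 20 mg/kg limit with no absolute upper bound.
    iim = infusions_needed(deficit, weight, mg_per_kg_cap=20)
    results[name] = (deficit, fcm, iim)
    print(f"{name}: deficit {deficit:.0f} mg, FCM infusions {fcm}, IIM infusions {iim}")
```

For this illustrative patient, the Ganzoni formula gives a lower deficit than the table, but the deficit still exceeds 1000 mg under both methods, so FCM requires two infusions and IIM one in either case; a lower average deficit does not by itself predict a smaller difference in the number of infusions.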

We would conclude by noting that the crux of our published analysis centers exclusively on posological considerations. Certain combinations of patient body weight and hemoglobin may result in situations where IV iron cannot be dosed to a sufficient level to correct a patient’s iron deficiency in a single infusion; our analysis simply illustrates that, in a population of patients with IBD in Denmark, these situations arise more often with FCM, and thereby incur additional costs of repeat infusions. The correspondents’ comments on the manuscript do nothing to change this fundamental reality and, as is hopefully apparent, we strongly and unreservedly reject the correspondents’ unfounded assertions that we have made flawed or tenuous assumptions with regard to the safety or efficacy of the iron formulations, the characteristics of the Danish population with IBD and IDA, the approach to dose modeling, or the iron retreatment frequency.