We thank Vorland et al. for their interest in our article and for their helpful comments on our statistical analysis and reporting. We have now reported the non-significant p values in Table 1 and agree that these will be useful for powering future studies in the area of phytate reduction and iron status. We acknowledge that we discussed within-group differences more than between-group differences, mainly because the results ran counter to our hypothesis that lowering dietary phytate should improve iron status, and thus required further explanation. This is also relevant to the between-group findings, as iron status would have been expected to improve with reduced phytate. During re-analysis, we detected a between-group difference for total body iron, which supports our overall conclusion that the low-phytate bread reduced iron status, most likely because its strong acidic flavour altered dietary habits.

Table 1 Data at baseline and after 12 weeks of intervention

Vorland et al. suggested that we use Bonferroni correction for the p values. We agree that correction for multiple testing is advisable when there are no preplanned hypotheses and a large number of tests are carried out (Streiner DL, Norman GR. Correction for multiple testing: is there a resolution? Chest. 2011 Jul;140;1:16–18). There is debate about the general application of Bonferroni adjustment of p values, with some epidemiologists suggesting that such adjustments are, at best, unnecessary and, at worst, deleterious to sound statistical inference (Nakagawa S. A farewell to Bonferroni: the problems of low statistical power and publication bias. Behavioral Ecology. 2004;15;6:1044–1045. Perneger TV. What's wrong with Bonferroni adjustments. Brit Med J. 1998;316:1236–8. Savitz DA, Olshan AF. Multiple comparisons and related issues in the interpretation of epidemiologic data. Am J Epidemiol. 1995;142:904–908. Rothman KJ. No adjustments are needed for multiple comparisons. Epidemiology. 1990;1:43–6). Bonferroni correction is appropriate for, e.g., genomic experiments where there are thousands of variables, the study is focused on generating hypotheses, and the risk of Type 1 (false-positive) errors is very high. However, in intervention studies designed to test a specific hypothesis, such as ours (bread with low phytate increases iron status compared to bread with natural levels of phytate), Bonferroni or false discovery rate corrections are not necessary. We measured eight variables (Table 1). Five of these variables measure different aspects of iron status and so are not fully independent of each other. This large number of markers is necessary because no single marker adequately accounts for iron status. Although false discovery rate correction would reduce the risk of false-positive results, it would also greatly increase the risk of Type 2 errors (false-negative results).
The other standard clinical parameters were included to indicate whether there was any unexpected effect on health or body composition that may have affected the results. The alkylresorcinols, biomarkers of wholegrain intake, were measured to help determine whether compliance may have been an issue; these measures are also independent of the primary hypothesis but important for understanding the study outcomes. Blanket use of a false discovery rate correction is not always appropriate and indeed is rarely applied in nutrition intervention studies with a clear hypothesis and many variables measured.
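The trade-off between false-positive and false-negative results described above can be illustrated with a minimal sketch. It assumes eight hypothetical p values, one per outcome variable (these are not the study's results), and compares the Bonferroni adjustment with the Benjamini-Hochberg false discovery rate procedure; the function names are chosen for illustration only.

```python
def bonferroni(p_values):
    """Family-wise error control: multiply each p value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """False discovery rate control (step-up procedure); returns adjusted p values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values for eight outcome variables (NOT the study's data).
example_p = [0.004, 0.020, 0.030, 0.045, 0.210, 0.350, 0.500, 0.570]

bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)

for raw, b, f in zip(example_p, bonf, bh):
    print(f"raw={raw:.3f}  Bonferroni={b:.3f}  BH={f:.3f}")

# Count how many tests remain below 0.05 under each approach.
print(sum(p < 0.05 for p in example_p),
      sum(p < 0.05 for p in bonf),
      sum(p < 0.05 for p in bh))
```

With these illustrative values, four unadjusted tests fall below 0.05 but only one does after either adjustment, which is the loss of power (increase in Type 2 errors) that the argument above refers to.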

When there is a high rate of dropouts, it is important to consider any potential confounding effect. We acknowledge that a common problem with the approach we used, per-protocol (PP) analysis, is that it can overestimate the treatment effect. An alternative approach would be intention-to-treat (ITT) analysis. One problem with ITT analysis is that there may be no outcome data for patients who dropped out. Another obvious disadvantage is that subjects who received the intervention are pooled with individuals who did not receive the intended intervention at all, which dilutes the effect and clearly underestimates it in subjects who were actually exposed to the intervention. Thus, in an ITT analysis, patients are not analysed according to the treatment actually received. When we performed an ITT analysis of our data using the principle of “last observation carried forward”, there was no change in any of the iron status biomarkers in the high-phytate bread group (n = 49). In the low-phytate bread group (n = 53), there was no decrease in ferritin (p < 0.053) and no change in transferrin receptor (TfR) concentration, but there was a decrease in total body iron (p < 0.039), as reported in our paper. We have also explained the issues with the dietary intervention and how these informed our use of PP instead of ITT. Comments from an independent statistician engaged by the European Journal of Nutrition support our methodology for imputing missing data, although we acknowledge that there are several methods for this. Nevertheless, we have now performed between-group ITT comparisons applying the last observation carried forward approach. There were no between-group differences for markers of iron status (TfR p = 0.571; ferritin p = 0.348; body-Fe p = 0.499; hepcidin p = 0.359; using the Mann–Whitney test).
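As a rough illustration of the last observation carried forward principle and of the dilution effect mentioned above, the following sketch imputes missing week-12 values from baseline and then computes the Mann–Whitney U statistic on the change scores. The ferritin values, group sizes and function names are hypothetical and are not taken from the study; only the U statistic is computed, not a p value.

```python
def locf(visits):
    """Carry the last observed value forward over missing (None) visits."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs where x > y, with ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def changes(group):
    """Change from baseline to the final visit after LOCF imputation."""
    result = []
    for visits in group:
        filled = locf(visits)
        result.append(filled[-1] - filled[0])
    return result


# Hypothetical ferritin values [baseline, week 12]; None marks a dropout.
high_phytate = [[30.0, 31.0], [22.0, None], [45.0, 40.0]]
low_phytate = [[28.0, 20.0], [35.0, None], [50.0, 41.0]]

high_change = changes(high_phytate)
low_change = changes(low_phytate)
u = mann_whitney_u(high_change, low_change)

print(high_change, low_change, u)
```

Under this imputation every dropout contributes a change of exactly zero, so the more dropouts a group has, the more its apparent change is pulled towards no effect.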

In all, we thank Vorland et al. for their comments, which will help power calculations for future studies. The statistical reanalysis of our results underlines our earlier finding that, counter to our hypothesis, low-phytate wholegrain rye bread had a negative impact on iron status in young women. This may be related to the flavour of the test bread and the relatively high amount fed; future studies using test products with improved flavour and different doses may find different outcomes.

To correct the original paper [2], an erratum [3] has been published.