Introduction

Nitrogen (N) pollution from agriculture damages human health and ecosystems (Vitousek et al. 1997; Galloway 1998; Galloway et al. 2003, 2008). The damage can be mitigated by (1) changing consumption patterns and (2) improving N use efficiency in agriculture (de Vries et al. 2013; Sutton et al. 2011). These two approaches can be combined efficiently only if N flows in agroecosystems are well understood. However, scientific understanding will only help if it can be summarized accessibly for decision-makers, preferably in ways that allow direct comparison of policy alternatives. Therefore, a challenge to the research community is to continually reevaluate quantitative indicators and other communication tools: Do they accurately represent current science? Are they useful for stakeholders?

Several common indicators for N pollution are constructed from farm-gate N budgets (Oenema et al. 2003). For example, one common indicator is the N surplus (N input − N output), which is often reported per unit cropland area (Nevens et al. 2006; Oenema 2006; Dalgaard et al. 2012), and sometimes per unit product (Dalgaard et al. 1998; Schröder et al. 2003; Mihailescu et al. 2014; Mu et al. 2016), or inversely, as product quantity per unit N surplus (“eco-efficiency”, Nevens et al. 2006; Beukes et al. 2012). Another indicator often constructed from farm-gate N budgets is the N use efficiency (NUE), defined as N output/N input (Schröder et al. 2003; Godinot et al. 2014; Gerber et al. 2014).

However, it has been argued that the farm-gate system boundary can be misleading if there are substantial inputs from other farms, such as purchased feed for livestock (Schröder et al. 2003; Bleken et al. 2005). To address this problem, it has been proposed to expand the system boundary from the farm level to also include upstream farms, thereby defining a chain N budget. For example, Godinot et al. (2014) argued, based on a set of 38 mixed dairy/crop farms, that indicators based on chain N budgets give more relevant information about the N pollution associated with a product. A similar comparison of farm-gate and chain N surpluses for 32 specialized Dutch and Irish dairy farms (Mu et al. 2016) showed that, although the off-farm crop cultivation did contribute substantially to the chain N surplus, the expanded system boundary most often did not change the ranking of the dairy farms’ N use efficiency. In summary, indicators based on chain N budgets convey different information than the corresponding farm-gate indicators, and they can be more relevant, depending on the question.

Another indicator with a wider system boundary than the farm gate is the N footprint, defined by Leach et al. (2012) as “the total amount of Nr released to the environment as a result of an entity’s resource consumption” (where Nr means reactive N, all forms of N except \(\text{N}_{2}\)). Thus, the N footprint has a wider system boundary than the chain N surplus, as it includes Nr losses from food waste, sewage treatment, energy use, etc. But N footprint calculations for food products (Leip et al. 2014; Galloway et al. 2014) show that most of their N losses take place on farms, within the boundary of a chain N budget. Hence, the practical difference between the N footprint and the chain N surplus is not entirely obvious. In particular, the N footprint estimates by Leip et al. (2014), which exclude the consumption phase, may seem confusingly similar to the chain N surplus.

In summary, indicators such as the chain N surplus (Godinot et al. 2014; Mu et al. 2016) or the N footprint (Leach et al. 2012) may better capture differences between products or production systems than the more common indicators based on farm-gate N budgets. However, there are conceptual differences between the N footprint and the N surplus indicators, and it is unclear which one is most appropriate.

This paper contributes to the understanding of the three indicators (1) farm-gate N surplus, (2) chain N surplus, and (3) N footprint, by analyzing a large set of Swedish conventional (\(n_{\text{conv}} = 1566\)) and organic (\(n_{\text{org}} = 283\)) dairy farms. The questions we set out to answer are: How do N flows differ between conventional and organic milk production in Sweden? How are these differences reflected in the three indicators? How is the analysis affected by uncertainties? What conclusions can be drawn about the indicators’ usefulness for different purposes?

Method and materials

This part of the paper is structured as follows. First, the data sources are described. Then, the calculations and definitions of the three indicators are described. Finally, two sections describe how we tested the results for statistical significance and sensitivity to some uncertain parameters.

Data sources

The main data source was a set of farm-level nutrient budgets from Swedish farms, collected within the national advisory project Focus on Nutrients. The project, led by the Swedish Board of Agriculture since 2001, aims to reduce N and P losses in agriculture, mainly by providing free advisory services. The nutrient budgets are established by the advisors together with farmers during farm visits. We consider the data to be of high quality since they are based on a detailed inventory by specially trained advisors.

From this database, we extracted a set of specialized dairy farms. We tried to strike a balance between including as many farms as possible on the one hand, and only including specialized dairy farms on the other, so as to minimize the number of confounding factors. Some farms had nutrient budgets for more than one year, and in these cases we selected only the latest, to obtain the most recent data. We also excluded budgets from before 2004 (222 conventional farms and 9 organic), to obtain a relatively recent dataset. Specifically, we selected according to the following criteria:

  1. Only farms with dairy cows,
  2. No farms with suckler cows,
  3. No farms with other livestock than cattle,
  4. No farms selling more crop products N than livestock N,
  5. Only data from the 11-year period 2004–2014 (the last year we had data from), and
  6. Only farms which were exclusively conventional or organic (excluding, e.g., a few farms with organic livestock and conventional crops).

The selection contained farm budgets from 1566 conventional and 283 organic farms, accounting for 75% and 80%, respectively, of the conventional and organic dairy cows in the full dataset. This is not surprising, since, according to Eurostat, most Swedish dairy cows are found on specialized dairy farms (Footnote 1). The size distribution of the sampled farms, measured as the number of dairy cows per holding, was similar to the distribution seen in national statistics (Footnote 2). Hence, at least concerning specialization and size distribution, the selected farms were typical for Swedish dairy production.

However, in general, it is hard to judge how well the sample represents Swedish dairy production. One possible source of bias is self-selection, since farmers themselves choose whether to participate in the advisory project. It seems likely that participating farmers are more interested than average in improving economic and/or environmental efficiency, and perhaps they are also more efficient. This bias cannot be controlled for without a comparable sample of non-participants. Two other possible confounders are (1) time trends over the 11-year period, and (2) effects of participating in the advisory project. We checked for such effects using multiple linear regression of farm-gate N surplus against year and number of prior advice visits, separately for the conventional and organic farms. In both cases, the adjusted \(\text {R}^2\) was below 0.001, which suggests that these two variables account for almost none of the variation in N surplus. Other possible biases were not formally tested for. In summary, although it is hard to verify statistically, we believe that, for the purposes of this paper, the extracted dataset is representative of typical Swedish dairy farms.
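The regression check on year and prior advice visits can be illustrated with a minimal sketch. It assumes ordinary least squares with an intercept; the function name `adjusted_r2` and the synthetic data are hypothetical and are not taken from the Focus on Nutrients dataset.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; return adjusted R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Penalize for the number of predictors p
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic illustration only: surplus is generated independently of the predictors
rng = np.random.default_rng(0)
n = 500
year = rng.integers(2004, 2015, size=n)    # 2004..2014
visits = rng.integers(0, 10, size=n)       # hypothetical number of prior visits
surplus = rng.normal(140.0, 40.0, size=n)  # hypothetical kg N/ha/year
print(adjusted_r2(np.column_stack([year, visits]), surplus))
```

An adjusted R² near zero, as in this synthetic case, indicates that the predictors explain almost none of the variance.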

Estimation of farm-gate N budgets

Farm-gate N budgets were calculated as shown in Table 1. Inputs and outputs of fertilizers, feed, biological N fixation (BNF), bedding, atmospheric N deposition, crop products, livestock, and milk were measured or estimated by the Focus on Nutrients project (Swedish Board of Agriculture 2015).

Table 1 Key characteristics and farm nitrogen budgets for the average organic and conventional farms, based on the dataset of 283 organic and 1566 conventional farms

The estimates of BNF in forage legumes were calculated using the model by Høgh-Jensen et al. (2004), with parameter values specific for Sweden (Frankow-Lindberg 2003). Crop-specific parameter values were used, most importantly for various mixtures of red clover, white clover, and grass. In grass/clover mixtures, the fraction of clover in the harvest is an important parameter, which was estimated on each farm by advisors and farmers together. BNF in grain legumes plays a smaller role, but was also included in the data we received, based on crop-specific parameters and farm-specific estimates of harvests (Swedish Board of Agriculture 2015; Frankow-Lindberg 2003). The estimation of BNF is an important source of uncertainty which we paid special attention to, as described in the Section “Uncertainty analysis” below.

Estimation of chain N budgets

The chain N budget is a combined budget for the dairy farm and the off-farm production of purchased feed (Mu et al. 2016). The chain-level system boundary is illustrated in Fig. 1. We calculated the chain N budget by replacing the purchased feed N term in the farm-gate budget by a sum of soil surface N budgets (Oenema et al. 2003) for the corresponding feed crops. The soil N budgets were weighted to account for (1) the amount of each feed, (2) the amount of the corresponding crop needed for that feed, and (3) if the feed has co-products, an allocation factor, such that the feed product only accounts for a part of the N budget. These parts are further described in the following paragraphs.

Fig. 1

Illustration of the system boundaries used. The first indicator, farm-gate N surplus, has the dairy farm as system boundary (darker gray box). The two other indicators, chain N surplus and N footprint, both use the wider chain system boundary (lighter gray with dashed border). The chain system boundary includes soil N budgets for off-farm feed cultivation. Some of the off-farm N surplus is allocated to feed crop co-products (see Section "Estimation of chain N budgets"). Note that some minor farm-gate flows are omitted from this figure for clarity; see Table 1 for the full budgets

As the farm dataset only stated the amount of N in purchased feed, we estimated the amounts of different purchased feeds based on sales statistics from the largest Swedish feed industry (Öhman 2016, personal communication), specifying the fractions of the most important ingredients in feeds sold to organic and conventional dairy farms. The data were current in 2016 and are therefore not fully representative of the period 2004–2014, but this was the best data that could be found. The composition of purchased feed is summarized in Table 2.

Table 2 N budgets for feed ingredients, expressed as \({\text {g N}}{/}{\text {kg\,feed}}\)

To calculate the amounts of crops used for purchased feed, three steps were taken. First, the feed sales data contained broad categories which we interpreted as follows. Cereals: one third each of winter wheat, spring barley, and oats. Rapeseed products: rapeseed meal for conventional farms, rapeseed expeller cake for organic. Soy products: soy meal for conventional, soy expeller cake for organic. Bran products: wheat bran. Second, the feed masses were calculated using the amount of purchased feed N, the feed mass fractions, and feed N concentrations taken from the Focus on Nutrients project and Feedipedia (Swedish Board of Agriculture 2015; Feedipedia 2016). Third, the crop masses were calculated from feed masses using mass allocation factors from a life cycle assessment (LCA) database for Swedish feeds (Flysjö et al. 2008). Feed ingredients that accounted for <5% of the weight in both organic and conventional feed were summed under the heading “Other”, which we assumed to have N concentration and mass-basis soil N budget as the weighted average of the other feeds.
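The second step, going from purchased feed N to ingredient masses, can be sketched as below. This is one plausible reading of the calculation, assuming that the total feed mass equals the purchased feed N divided by the mass-weighted mean N concentration. The function name and the numbers are hypothetical.

```python
def feed_masses(purchased_feed_n, mass_fractions, n_concentrations):
    """Split purchased feed N (kg N) into ingredient masses (kg feed).

    mass_fractions:   ingredient -> share of total feed mass (must sum to 1)
    n_concentrations: ingredient -> kg N per kg feed
    """
    if abs(sum(mass_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("mass fractions must sum to 1")
    # Mass-weighted mean N concentration of the feed mix
    mean_n_conc = sum(mass_fractions[k] * n_concentrations[k] for k in mass_fractions)
    total_mass = purchased_feed_n / mean_n_conc
    return {k: share * total_mass for k, share in mass_fractions.items()}

# Hypothetical two-ingredient mix, not the composition in Table 2
masses = feed_masses(
    purchased_feed_n=400.0,
    mass_fractions={"cereals": 0.5, "rapeseed_meal": 0.5},
    n_concentrations={"cereals": 0.02, "rapeseed_meal": 0.06},
)
print(masses)
```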

The soil N budgets were calculated with N application rates and crop yields according to Swedish statistics (Statistics Sweden 2014). Crop-specific statistics separating conventional and organic farming were available only for year 2011, so this data was taken to represent the whole period. N concentrations in crop products were taken from the Focus on Nutrients database (Swedish Board of Agriculture 2015). Atmospheric N deposition of 10 kg N/ha was assumed for all off-farm crop cultivation. All feed crops except soybeans were assumed to be produced in Sweden. For soybean, we assumed an average crop yield of 3000 kg/ha and no application of fertilizer N.

BNF in off-farm faba bean cultivation was estimated using the same model as for on-farm BNF (Swedish Board of Agriculture 2015). For soybeans, BNF estimates were based on the review by Salvagiotti et al. (2008).

For feed products with co-products (wheat bran: wheat flour; rapeseed and soy products: rapeseed and soy oil), we used economic allocation factors from the Swedish feed LCA database (Flysjö et al. 2008), assuming that the same allocations were applicable for both organic and conventional products.
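The weighting of soil N budgets described in this section can be sketched as follows. This is a simplified illustration that assumes the only soil inputs are fertilizer, BNF, and deposition. All names and numbers are hypothetical, and the exact combination of mass and economic allocation used in the paper may differ.

```python
def soil_surplus_per_kg_crop(fertilizer_n, bnf_n, deposition_n, yield_kg, crop_n_conc):
    """Soil surface N surplus per kg harvested crop (kg N/kg crop).

    fertilizer_n, bnf_n, deposition_n: kg N/ha; yield_kg: kg crop/ha;
    crop_n_conc: kg N per kg crop.
    """
    inputs_n = fertilizer_n + bnf_n + deposition_n
    harvested_n = yield_kg * crop_n_conc
    return (inputs_n - harvested_n) / yield_kg

def off_farm_surplus(feeds):
    """Sum the off-farm N surplus attributed to purchased feeds (kg N).

    Each feed is weighted by (1) its mass, (2) the crop mass needed per kg
    feed, and (3) the allocation factor for feeds that have co-products.
    """
    return sum(
        f["feed_kg"] * f["crop_per_feed"] * f["allocation"] * f["surplus_per_kg_crop"]
        for f in feeds
    )

# Hypothetical cereal: 140 kg fertilizer N/ha, 10 kg N/ha deposition, 6000 kg/ha yield
cereal = soil_surplus_per_kg_crop(140.0, 0.0, 10.0, 6000.0, 0.018)
feeds = [
    {"feed_kg": 1000.0, "crop_per_feed": 1.0, "allocation": 1.0,
     "surplus_per_kg_crop": cereal},
    # Hypothetical feed with a co-product: only 25% of the crop budget is allocated
    {"feed_kg": 500.0, "crop_per_feed": 1.6, "allocation": 0.25,
     "surplus_per_kg_crop": 0.02},
]
print(off_farm_surplus(feeds))
```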

Estimation of N footprints

The N footprint is defined as “the total amount of Nr released to the environment as a result of an entity’s resource consumption” (Leach et al. 2012). Thus, to calculate the N footprint of a product, we used an estimate of the net losses of Nr from the production system. The net losses are not equal to the chain N surplus. To see why, recall that the N budget typically includes atmospheric N deposition (Oenema et al. 2003). But the deposition term does not contribute to net losses, since it enters the agricultural system from the environment (the atmosphere). Deposited N may come, e.g., from another farm as \(\text{NH}_{3}\), or from high-temperature combustion as \(\text{NO}_{\text{x}}\), and in these cases it already contributes to the N footprint of products from that other farm or combustion process. Thus, to avoid double-counting, deposition should not be included in the N footprint of the receiving cropland. This distinction was not discussed by Leach et al. (2012), but we have noted that there seem to be different interpretations in the literature (see Leip et al. 2014; Pierer et al. 2014). Similarly, N accumulation on farms or in soil, typically included in a farm-gate or soil N surplus, is not a loss. In summary, the N surplus contains atmospheric N deposition and net N accumulation, but the N footprint does not.

As noted in the Introduction, the original definition of the N footprint contains all cradle-to-grave losses of Nr. However, in this paper, we present a cradle-to-farm-gate N footprint, similar to that presented by Leip et al. (2014), i.e., excluding consumption-related losses. Further, Nr losses from energy use were approximated to zero, as they are a small contribution compared to other flows (Leach et al. 2012; Pierer et al. 2014).

In summary, we calculated the chain N losses as the sum of losses on the dairy farms and in cultivation of purchased feed crops. With the simplifications just mentioned, both terms can be calculated from the corresponding N budgets as (N loss) = (N surplus) − (net N accumulation) − (N deposition). We approximated the net N accumulation to zero. The remaining terms were then taken from the chain N budgets, calculated as explained in the previous section.
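The loss relation above translates directly into code. The sketch below uses hypothetical function names and numbers, with net N accumulation defaulting to zero as in the approximation just described.

```python
def n_losses(n_surplus, n_deposition, net_n_accumulation=0.0):
    """(N loss) = (N surplus) - (net N accumulation) - (N deposition), in kg N."""
    return n_surplus - net_n_accumulation - n_deposition

def chain_n_losses(on_farm, off_farm):
    """Chain N losses: dairy farm plus purchased-feed cultivation.

    Each argument is a (surplus, deposition) pair in kg N.
    """
    return n_losses(*on_farm) + n_losses(*off_farm)

# Hypothetical values: 100 kg N on-farm surplus including 8 kg N deposition,
# 20 kg N off-farm surplus including 3 kg N deposition
print(chain_n_losses(on_farm=(100.0, 8.0), off_farm=(20.0, 3.0)))
```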

Definitions of indicators

Three different indicators were defined: (1) the farm-gate N surplus, (2) the chain N surplus, and (3) the N footprint. To put them on a common scale, we normalized them by the amount of sold milk N. Furthermore, since the farms sold a combination of milk, livestock, and crop products, we used an allocation factor \(\alpha _M\) to represent the fraction of the N surpluses and N losses attributable to the milk. An economic allocation factor could not be calculated, since the data only specified the sales of crops and livestock in terms of N flows. Instead, we allocated according to N flows, i.e., with \(\alpha _M=M{/}P\), where M is the amount of sold milk N and P is the combined N output of milk, livestock, and crop products.

To summarize, if \(S_f\) is the farm-gate N surplus, \(S_c\) the chain N surplus, and \(L_c\) the chain N losses, the three indicators were defined as:

  1. Farm-gate N surplus per unit milk N, \(\text {FS}_\text {M} = \alpha _M S_f{/}M = S_f{/}P\).
  2. Chain N surplus per unit milk N, \(\text {CS}_\text {M} = \alpha _M S_c{/}M = S_c{/}P\).
  3. N footprint, \(\text {NF}_\text {M} = \alpha _M L_c{/}M = L_c{/}P\).

All three indicators are dimensionless, i.e., expressed as kg N (surplus or losses)/kg N (sold milk).
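As a minimal sketch, the three definitions can be computed from the N flows of an average farm. The class `AverageFarm`, its field names, and the example values are hypothetical; only the formulas follow the definitions above.

```python
from dataclasses import dataclass

@dataclass
class AverageFarm:
    milk_n: float         # M, N in sold milk
    product_n: float      # P, N in sold milk + livestock + crop products
    farm_surplus: float   # S_f, farm-gate N surplus
    chain_surplus: float  # S_c, chain N surplus
    chain_losses: float   # L_c, chain N losses

def indicators(farm):
    """Return alpha_M and the three milk-allocated indicators (kg N/kg milk N)."""
    alpha_m = farm.milk_n / farm.product_n
    return {
        "alpha_M": alpha_m,
        "FS_M": alpha_m * farm.farm_surplus / farm.milk_n,
        "CS_M": alpha_m * farm.chain_surplus / farm.milk_n,
        "NF_M": alpha_m * farm.chain_losses / farm.milk_n,
    }

# Hypothetical flows in kg N per year
farm = AverageFarm(milk_n=75.0, product_n=100.0,
                   farm_surplus=200.0, chain_surplus=250.0, chain_losses=220.0)
result = indicators(farm)
print(result)
```

Because the allocation is based on N flows, the milk N term cancels and each indicator reduces to the surplus or loss divided by P.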

For comparison with studies using the NUE concept, it is useful to clarify its relationship to the surplus-based indicators. In mathematical terms, \(\text {NUE} = P{/}I\), where P and I are the N flows in products and inputs. By definition, the surplus is \(S=I-P\) and thus it is seen that \(\text {NUE} = \frac{1}{1 + S{/}P}\). Using also the definition \(\alpha _M = M{/}P\) leads to the following relation between the surplus-based indicators and NUE on farm-gate and chain level:

$$\begin{aligned} \text {NUE}_{\mathrm{farm}}&= 1{/}\left( 1 + \text {FS}_{\mathrm{M}}\right) , \text {and}\\ \text {NUE}_{\mathrm{chain}}&= 1{/}\left( 1 + \text {CS}_{\mathrm{M}}\right) . \end{aligned}$$
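The relation between NUE and the surplus-based indicators can be checked numerically. The sketch below uses hypothetical function names and flows, and compares NUE computed from inputs and products with NUE computed from surplus per unit product N.

```python
def nue_from_flows(input_n, product_n):
    """NUE = P / I."""
    return product_n / input_n

def nue_from_surplus_indicator(surplus_per_product_n):
    """NUE = 1 / (1 + S/P), where S/P is FS_M or CS_M."""
    return 1.0 / (1.0 + surplus_per_product_n)

# Hypothetical flows: I = 300 kg N, P = 100 kg N, so S = I - P = 200 kg N
input_n, product_n = 300.0, 100.0
surplus = input_n - product_n
print(nue_from_flows(input_n, product_n))
print(nue_from_surplus_indicator(surplus / product_n))
```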

Indicators were calculated for average farms

It is important to note that the chain N surplus and the N footprint could not be calculated for individual farms, since the calculations require information about the composition of purchased feed. This information was only available as averages for the conventional and organic systems.

However, it is not necessary to calculate the indicator values for individual farms before averaging. In fact, for the purposes of this paper it is more appropriate to first calculate two average farms from the 1566 conventional and 283 organic farms, and then calculate indicator values for these average farms. The reason this is more appropriate is that it estimates the expected surplus or losses of a nation-wide random unit of milk, rather than the surplus or losses of the milk from a random farm. Conceptually, this can also be seen as making a weighted average of the individual farms’ indicator values with the farm size (N output) as weights, since, e.g., \(\text {NF}_{\mathrm{M}}=\frac{\sum _{i=1}^n L_{c,i}}{\sum _{i=1}^n P_i} = \frac{ \sum _{i=1}^n P_i \left( L_{c,i} {/} P_i \right) }{\sum _{i=1}^n P_i}\).
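The equivalence between the average-farm indicator and an output-weighted average of per-farm values can be illustrated with two hypothetical farms. The function names and numbers below are illustrative only.

```python
def ratio_of_sums(losses, products):
    """Indicator for the average farm: sum(L_i) / sum(P_i)."""
    return sum(losses) / sum(products)

def weighted_mean_of_ratios(losses, products):
    """Per-farm ratios L_i/P_i averaged with N output P_i as weights."""
    ratios = [l / p for l, p in zip(losses, products)]
    return sum(p * r for p, r in zip(products, ratios)) / sum(products)

def unweighted_mean_of_ratios(losses, products):
    """Plain mean of per-farm ratios: the value for milk from a random farm."""
    return sum(l / p for l, p in zip(losses, products)) / len(products)

# Hypothetical small farm (P = 10) and large farm (P = 200), kg N per year
losses = [30.0, 400.0]
products = [10.0, 200.0]
print(ratio_of_sums(losses, products))
print(weighted_mean_of_ratios(losses, products))
print(unweighted_mean_of_ratios(losses, products))
```

The first two values coincide, whereas the unweighted mean gives the small farm the same influence as the large one.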

Confidence intervals for the indicators

To assess whether the indicators were significantly different between the organic and conventional systems, we calculated 95% confidence intervals for their differences. As explained previously, however, the statistics (the three indicators) are not calculated for each observation (each farm), but only for the sample as a whole (the average organic and conventional farms). Thus, we did not have access to multiple independent estimates of the indicator values in the two systems, and it was not possible to apply the most common types of statistical test procedures (e.g., a two-sample t test or a non-parametric rank test). However, the sampling distribution of average farms’ indicator values can still be estimated, as will now be explained. The procedure results in a test statistic equivalent to that of a two-sample t test for individual N surpluses per unit milk.

As an example, consider the chain N surplus indicator. It is equal to the chain N surplus \(S_c\) divided by the milk N sales M, weighted by the allocation factor \(\alpha _M=M{/}P\), where P is the amount of N in sold milk, livestock, and crop products. In mathematical terms,

$$\begin{aligned} {\text{CS}}_{\text{M}} = \frac{S_c}{M}\frac{M}{P} = \frac{S_c}{P}. \end{aligned}$$

On average, the chain N surplus is \(S_c = S_f + k F\), where \(S_f\) is the farm-gate surplus, F is the amount of purchased feed N, and k is a constant representing the average amount of off-farm N surplus per unit of purchased feed.

As an estimate of the chain N surplus using a group of n sampled farms, we take

$$\begin{aligned} \widehat{\text{CS}}_{\text{M}} = \frac{\widehat{S}_c}{\widehat{P}} = \frac{\sum _{i=1}^n \left( S_{f,i} + k F_i \right) }{\sum _{i=1}^n P_i}, \end{aligned}$$

where \(S_{f,i}, F_i\) and \(P_i\) are the individual farms’ measured farm-gate surpluses, feed N purchases, and product N sales. When n is sufficiently large, both \(\widehat{S}_c\) and \(\widehat{P}\) are approximately normally distributed, so that \(\widehat{\text{CS}}_{\text{M}}\) is approximately distributed as a ratio of two normal distributions. In this case, it can be shown (Hayya et al. 1975) that the indicator estimate \(\widehat{\text {CS}}_{\text{M}}\) approximately follows a normal distribution with parameters that can be estimated using the values of \(S_{f,i}+kF_i\) and \(P_i\). The same line of reasoning holds for all three indicators. In summary, it is possible to calculate approximate confidence intervals for all three indicators for the average conventional and organic farms, as well as for the differences between conventional and organic. We calculated 95% confidence intervals for the indicator values and their differences between the systems to test whether they were significantly different. For a more detailed explanation, please see Online Resource 1.
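One common way to obtain such an approximate normal distribution for a ratio of sums is a first-order (delta-method) linearization. The sketch below is an assumption about how the intervals could be computed, not necessarily the exact procedure in Online Resource 1. The function names are hypothetical, and k is treated as a known constant, so that y_i = S_{f,i} + k F_i and x_i = P_i.

```python
import math
from statistics import NormalDist

def ratio_estimate(y, x):
    """Estimate R = sum(y) / sum(x) and its approximate standard error.

    Linearization: Var(R) ~ s_e^2 / (n * x_bar^2), where s_e^2 is the sample
    variance of the residuals e_i = y_i - R * x_i (which sum to zero).
    """
    n = len(y)
    r = sum(y) / sum(x)
    x_bar = sum(x) / n
    residuals = [yi - r * xi for yi, xi in zip(y, x)]
    s2 = sum(e * e for e in residuals) / (n - 1)
    return r, math.sqrt(s2 / n) / x_bar

def normal_ci(estimate, se, level=0.95):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se

def difference_ci(est_a, est_b, level=0.95):
    """Interval for R_a - R_b, assuming the two samples are independent."""
    (ra, sea), (rb, seb) = est_a, est_b
    return normal_ci(ra - rb, math.hypot(sea, seb), level)
```

If the interval returned by `difference_ci` excludes zero, the two systems differ significantly at the chosen level under these approximations.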

Uncertainty analysis

As previously mentioned, the contributions of BNF to N budgets are rather uncertain. We have no reason to believe that the estimates are biased one way or another, but it seems plausible that future research and/or more accurate measurements on farms will change our best estimates. The uncertainty can conceptually be divided into two parts. First, the BNF model (Høgh-Jensen et al. 2004) uses the quantity of harvested shoot N. This was estimated for each field by the advisor together with the farmer. In grass/clover mixtures (the most important source of BNF here), the harvested shoot N was estimated as the product of total harvested dry matter (DM), fraction of legumes in the DM, and the N fraction of legume DM. The first two of these factors were typically not precisely measured, but roughly estimated, and thus we expect substantial errors at least in some cases. Second, the BNF model assumes, for each legume crop, a linear relationship between shoot N and below-ground fixed N. In a recent review on BNF estimation, Anglade et al. (2015) demonstrate a substantial uncertainty in this relationship, and specifically that “it must be noted that factors regulating the allocation of N to belowground parts have been poorly studied (e.g., growth conditions regulating water and N availability, genotype).” Based on these considerations, we tested the sensitivity to possible bias in the BNF estimates by calculating all indicators and their confidence intervals while varying the BNF estimates by ±30%.
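The effect of scaling the BNF estimates can be illustrated with a deliberately simplified sketch. It considers a single system boundary, sets net N accumulation to zero, and excludes deposition from the losses. The function name, the `bnf_scale` parameter, and all numbers are hypothetical and do not reproduce the paper's results.

```python
def n_footprint(other_inputs_n, bnf_n, product_n, bnf_scale=1.0):
    """N losses per unit product N with BNF multiplied by bnf_scale.

    other_inputs_n: N inputs excluding BNF and deposition (kg N)
    bnf_n:          estimated BNF (kg N)
    product_n:      N in sold products (kg N)
    """
    losses = other_inputs_n + bnf_scale * bnf_n - product_n
    return losses / product_n

# Hypothetical systems: the second relies much more on BNF
for scale in (0.7, 1.0, 1.3):
    low_bnf = n_footprint(350.0, 50.0, 100.0, bnf_scale=scale)
    high_bnf = n_footprint(150.0, 200.0, 100.0, bnf_scale=scale)
    print(scale, round(low_bnf, 2), round(high_bnf, 2))
```

In this sketch, the change in the footprint per unit change in `bnf_scale` equals the BNF input per unit product N, so the system with more BNF per unit product is the more sensitive one.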

Another possible but smaller source of uncertainty lies in the estimation of off-farm crop yields and N application rates, which we took from Swedish statistics (Statistics Sweden 2014). To analyze the combined importance of all these uncertainties, we carried out a more extensive Monte Carlo uncertainty analysis which is further described in Online Resource 1.

Results

The average conventional and organic dairy farms supply themselves with N in quite different ways (see Table 1). On the average organic farm, the two main N inflows, BNF and purchased feed N, supplied about 85% of the N, whereas on the average conventional farm, mineral fertilizer and purchased feed N supplied 80%. The average organic farm purchased about 30% less feed N per unit sold milk, had 35% lower livestock density and 50% larger area of grass/clover ley per dairy cow. In the organic system, the farm-gate N surplus was 50% lower per unit area, but only about 10% lower per unit milk. In summary, the organic system used a larger area with smaller N surplus, and it produced a larger share of its feed.

The milk allocation factors, i.e., the shares of N output in milk, were about 74% on both the average organic and the average conventional farm (95% confidence intervals: 70–77% for organic, 72–75% for conventional).

Fig. 2

Comparison of farm-gate N surplus (FS\(_{\mathrm{M}}\)), chain N surplus (CS\(_{\mathrm{M}}\)), and N footprint (NF\(_{\mathrm{M}}\)) between conventional and organic milk. The error bars show 95% confidence intervals. The differences between conventional and organic are significantly positive in all cases; in other words, the conventional indicator values are significantly higher (FS\(_{\mathrm{M}}\) by about 10%, CS\(_{\mathrm{M}}\) and NF\(_{\mathrm{M}}\) by about 20%)

Figure 2 shows the three indicator values for organic and conventional milk, as well as their differences between the systems, along with 95% confidence intervals for all these quantities. The rightmost panel shows that all three indicator values are significantly higher for conventional milk than organic (\(p=0.006\) for FS\(_{\mathrm{M}}\) and \(p\le 10^{-5}\) for CS\(_{\mathrm{M}}\) and NF\(_{\mathrm{M}}\)). The difference between conventional and organic milk is more pronounced in the chain N surplus and the N footprint than in the farm-gate N surplus. This is because the average conventional farm purchased about 45% more feed N with 25% higher N surplus and 60% higher N losses per unit feed N, which is not reflected in the farm-gate indicator. The higher N surplus per unit feed is a result of the average composition of purchased feed, summarized in Table 2. Conventional purchased feed contains a large share of rapeseed meal, which has a high N surplus, whereas organic farms purchase more soy products, where our estimate was a small negative surplus (which is quite normal; see Salvagiotti et al. 2008). Table 3 gives a breakdown of how on-farm and off-farm losses and surpluses contribute to the indicator values, as well as NUE values for the farm level and chain level.

Table 3 Estimated N losses, atmospheric N deposition and N surplus on the average organic and conventional farms, expressed per unit sold milk N and weighted by the factor \(\alpha _{\mathrm{M}}\) (see Section “Definitions of indicators”)

Although the chain N surplus is similar to the N footprint, there are two important differences worth noting. First, the N footprint is lower than the chain N surplus. Second, the difference between conventional and organic is larger for the N footprint than for the chain N surplus. Both differences arise because the N footprint does not include atmospheric deposition, and in our estimates this is the only difference between the chain N surplus and the N footprint. Since the N deposition per unit milk is higher on the average organic farm, the N deposition increases the chain N surplus more for organic milk than for conventional.

Fig. 3

Spatial distribution of N surplus for conventional and organic milk, on-farm and off-farm. The width of each rectangle is the area used for production of 1 kg milk per year (weighted by the allocation factor \(\alpha _M\)), and the height is equal to the average N surplus on that area. Therefore, the areas of the large rectangles represent the farm N surpluses allocated to the milk (FS\(_{\mathrm{M}}\)), and the areas of the small and large rectangles together represent the chain N surplus allocated to the milk (CS\(_{\mathrm{M}}\))

Figure 3 illustrates how the spatial distribution of the chain N surplus differs between the organic and conventional systems. The average conventional farm has higher N surplus per unit area, both on-farm and off-farm, but the organic farm uses a larger area. This difference is not captured by any of the three indicators.

Fig. 4

Sensitivity of the N footprint to a possible bias in the BNF estimates. The shaded bands around each line show 95% confidence intervals. If the model overestimates BNF (positive bias values), the N footprints are lower than our best estimates (Fig. 2) for both systems. Conversely, if BNF is underestimated (negative bias), the N footprints are higher than we think. The effect is stronger for organic milk since organic farms have more BNF input per unit milk, and therefore, the difference between organic and conventional N footprint is not significant at the 95% confidence level if BNF has been underestimated by 15% or more

Figure 4 illustrates the sensitivity of the results to a potential bias in the BNF estimates. The figure shows the N footprint as an example, but similar results hold for the other indicators. The crucial thing to note is that the N footprint of organic milk is more sensitive than that of conventional milk to errors in BNF estimates, since organic production has more BNF input per unit milk. As seen in the figure, the difference between the organic and conventional N footprints is no longer statistically significant at the 95% level if the model underestimates BNF by about 15% or more. If the BNF is underestimated by 20–25%, both footprints are about equal. Conversely, if the BNF is currently overestimated by 20–25%, the conventional N footprint is about 40–45% larger than the organic.

Results of a more extensive uncertainty analysis are presented in Online Resource 1. In summary, it reinforces the conclusion that the possible BNF estimation bias is the most important parameter uncertainty in the model.

Discussion

An ideal indicator is a simple number that captures a complicated problem. To do this well, it should have a clear meaning, a reliable estimation method, and be relevant to the problem at hand. The main aim of this discussion is to elaborate on these three criteria in relation to the three indicators, farm-gate N surplus, chain N surplus, and N footprint. To put our results in context, we begin with a comparison to some similar studies.

Comparison to similar studies

Without aiming for a comprehensive review, we compare our results to some of the many previous studies reporting farm-gate N surplus for dairy farms, either per unit area or per unit product (or, equivalently, as NUE or eco-efficiency).

The organic and conventional farm-gate N surpluses per unit area in the present paper (72 and 138 kg/ha/year) are not surprising given previous results. Some previous comparisons of organic versus conventional systems are 124 versus 240 kg/ha/year in Denmark (Dalgaard et al. 1998), 79 versus 145 kg/ha/year in Sweden (Cederberg and Flysjö 2004), and 104 versus 223 kg/ha/year in the Netherlands (Thomassen et al. 2008). Further examples of both organic and conventional surpluses are given, e.g., by Bleken et al. (2005), Roberts et al. (2008), Mihailescu et al. (2014). The Swedish average surplus for conventional farms reported in this paper is rather low compared to other European systems, but this is to be expected given the relatively low livestock density on Swedish conventional dairy farms. As an indicator of the production intensity, note that the average Swedish conventional dairy farm (Table 1) produced about 7000 kg milk/ha/year, which can be compared to the national averages of 13,000 kg/ha/year in the Netherlands (1998–2009) or 10,000 kg/ha/year in Flanders (2001) (Oenema et al. 2012; Nevens et al. 2006).

Compared to the wealth of studies reporting farm-gate N surplus per hectare, there are few analyses of product-based N indicators, such as surplus per unit product, NUE, or eco-efficiency. One example based on 16 organic and 14 conventional dairy farms is given by Dalgaard et al. (1998), where the farm-gate N surplus was 22 and 29 kg N/tonne milk on organic and conventional farms, respectively. Converted to dimensionless form (roughly comparable with our indicator FS\(_{\mathrm{M}}\)), this corresponds to 4.1 and 5.4 kg N surplus/kg milk N (assuming 0.54% N in milk). A number of dairy farm examples collected by Bleken et al. (2005) place dimensionless organic farm-gate surpluses around 2–4 kg N/kg product N and conventional around 2–6 kg N/kg product N. The more recent studies on chain N budgets and N footprints report dimensionless surpluses or losses for milk in the range 3.7–4.1 kg N/kg milk N in Austria and the USA (Pierer et al. 2014), about 4.5–5.5 kg N/kg N in the EU (Leip et al. 2014), and about 2.0 and 4.1 kg N/kg milk N on 13 Dutch and 19 Irish dairy farms (assuming 0.54% N in milk) (Mu et al. 2016). While these values are mostly a bit higher than our results for average Swedish farms, it must be noted that direct comparison may be inappropriate due to methodological differences, e.g., in system boundaries, allocations, and farm selection criteria.

Interpretation of the indicators

The farm-gate N surplus is the sum of net N accumulation and net N outflow not specified in the N budget. If the accumulation and unspecified inflows are known to be small, the farm-gate N surplus per farm area can be interpreted as a local pollution intensity. Thus, when the goal is to monitor local pollution intensity, e.g., within a farm over time, or for comparison with other farms or benchmarks, the farm-gate N surplus may strike the right balance between simplicity and usefulness.
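
This interpretation follows from a simple mass balance, sketched here with notation that is introduced only for illustration and is not used elsewhere in the paper: \(I_{\text{spec}}\) and \(O_{\text{spec}}\) are the inflows and outflows specified in the N budget, \(I_{\text{unspec}}\) and \(O_{\text{unspec}}\) are the flows not specified in the budget, and \(\Delta N\) is the net N accumulation on the farm. Conservation of mass gives \(I_{\text{spec}} + I_{\text{unspec}} - O_{\text{spec}} - O_{\text{unspec}} = \Delta N\), so the surplus \(S\) can be written as

\[
S = I_{\text{spec}} - O_{\text{spec}} = \Delta N + \left( O_{\text{unspec}} - I_{\text{unspec}} \right).
\]

If \(\Delta N \approx 0\) and \(I_{\text{unspec}} \approx 0\), then \(S \approx O_{\text{unspec}}\), i.e., the surplus approximates the N outflows not specified in the budget.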

However, expressing the N surplus per unit product says something entirely different. It suggests that the product’s total N surplus is what matters, irrespective of the area used for production. This resembles the idea that net Nr creation from unreactive \(\text{N}_{2}\) is what ultimately drives N-related environmental problems (Galloway et al. 2003; de Vries et al. 2013; Steffen et al. 2015). But as previously noted, the N surplus contains atmospheric deposition, a flow of Nr which was previously caused by some other process. So, if the goal is an indicator for net Nr pollution of a product, any indicator based on N surplus is inappropriate because it leads to double-counting. This motivates the existence of the N footprint, which measures net Nr releases to the environment, irrespective of time, location, and area intensity.

In summary, the definitions of the three indicators show that they must be interpreted differently. Although the product-based N balances may seem like straightforward variations of the area-based farm-gate N surplus, it is not obvious how to interpret them. In comparison, both the area-based N surplus and the N footprint say something concrete.

Uncertainties within and between organic and conventional milk

This analysis produces rather narrow confidence intervals for the indicator values and their differences between the organic and conventional systems. For all three indicators, the 95% confidence intervals for conventional milk are about ±2% of the central estimates, and for organic milk about ±7% (see Fig. 2). Therefore, the differences of 10–20% between conventional and organic indicator values could be established with high statistical certainty.
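
As a rough plausibility check on this claim, one can ask whether intervals of these widths would overlap for differences of 10% and 20%. The sketch below assumes the organic central estimate is normalized to 1 and uses only the approximate half-widths quoted above. Non-overlap of intervals is a crude, conservative criterion and is not necessarily the statistical test used in this analysis; all names are illustrative.

```python
def relative_interval(central, half_width_fraction):
    """Return (lower, upper) for central * (1 ± half_width_fraction)."""
    return (central * (1 - half_width_fraction), central * (1 + half_width_fraction))


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


ORGANIC_HALF_WIDTH = 0.07       # about ±7%, from the text
CONVENTIONAL_HALF_WIDTH = 0.02  # about ±2%, from the text

# Assumption: organic central estimate normalized to 1.
organic = relative_interval(1.0, ORGANIC_HALF_WIDTH)

overlap_by_difference = {}
for difference in (0.10, 0.20):
    conventional = relative_interval(1.0 + difference, CONVENTIONAL_HALF_WIDTH)
    overlap_by_difference[difference] = intervals_overlap(organic, conventional)
    print(difference, organic, conventional, overlap_by_difference[difference])
```

Under these assumptions the intervals do not overlap even for a 10% difference, although the margin is small at that end of the range.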

However, it must be recognized that these claims rely on the correctness of several parameter values that are in fact uncertain. The most important parameter uncertainty we identified is the potential bias in BNF estimates. For example, if the BNF estimates are unbiased, the conventional N footprint is about 20% higher than the organic. But when accounting for a possible bias, another picture emerges (see Fig. 4), and it seems conceivable, if unlikely, that the conventional footprint is 40% higher, or maybe not higher at all.
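
The mechanism behind this sensitivity can be illustrated with a toy calculation. In the sketch below, every number is hypothetical and chosen only so that the unbiased case gives a conventional value 20% above the organic one; none of it is taken from the study's data. It also assumes, for illustration only, that the indicator is the sum of a BNF-related part and a remainder, and that a bias scales the BNF-related part of both systems by the same factor.

```python
# All numbers below are HYPOTHETICAL, chosen only so that the unbiased case
# reproduces a conventional value about 20% above the organic one.
# They are not taken from the study's data.
HYPOTHETICAL = {
    "conventional": {"non_bnf": 5.0, "bnf": 1.0},
    "organic": {"non_bnf": 2.0, "bnf": 3.0},
}


def indicator(system, bnf_bias_factor):
    """Indicator value if the true BNF is bnf_bias_factor times the estimate."""
    parts = HYPOTHETICAL[system]
    return parts["non_bnf"] + bnf_bias_factor * parts["bnf"]


def relative_difference(bnf_bias_factor):
    """Conventional minus organic, as a fraction of the organic value."""
    conv = indicator("conventional", bnf_bias_factor)
    org = indicator("organic", bnf_bias_factor)
    return conv / org - 1.0


for factor in (0.7, 1.0, 1.4):
    print(f"BNF bias factor {factor:.1f}: {relative_difference(factor):+.0%}")
```

Because the hypothetical organic system depends more heavily on BNF, an underestimated BNF (factor above 1) narrows the gap between the systems, while an overestimated BNF (factor below 1) widens it.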

In summary, we have discussed two main sources of uncertainty in the results. The first is the uncertainty due to the finite sample size (quantified with confidence intervals) and the second is the uncertainty propagating from uncertain parameters (see Fig. 4 and Online Resource 1). We have demonstrated that the latter type of uncertainty is of comparable magnitude to the former. In other words, the narrow confidence intervals may give a false sense of precision. This highlights the importance of conducting sensitivity analysis on BNF estimates in N budgets.

Environmental relevance of the indicators

If an indicator is intended to help reduce environmental impacts, it is important that it points in the right direction, i.e., that the products or production systems with the lowest impact also have the lowest indicator values. This is not necessarily the case for the three indicators considered here. For example, the dynamics of nitrate \(({{\text{NO}}_{3}^-})\) water pollution and nitrous oxide \(({\text{N}}_{2}{\text{O}})\) emissions in aquatic ecosystems are influenced by climate, hydrology, and ecosystem functioning (Grizzetti et al. 2015), which vary across space and time. Thus, it is likely that two products with the same N surplus or N footprint, if produced in different locations or in different ways, have different environmental impacts. The indicator values would suggest that the products are environmentally equivalent, although in reality they are not. This type of inconsistency is to be expected of the N footprint, because like other footprint indicators, it does not address a single impact category, but rather a societal “area of concern” (Ridoutt et al. 2015). The same argument also holds for indicators based on N surplus. The fundamental problem, both for the N surplus and the N footprint, is that they aggregate N flows over time and space, and across different chemical forms. Thus, they hide many environmentally relevant complexities, which means that they can be misleading for decision-makers.

Another methodological concern is the handling of co-products, i.e., how N surpluses or N losses are attributed in multi-product systems. As established by numerous LCA studies, different methods give different results (see, e.g., Cederberg and Stadig 2003; de Vries and de Boer 2010; Dalgaard et al. 2014; Mackenzie et al. 2016), and there is no consensus on which method is most appropriate. To our knowledge, the importance of co-product handling in N indicators has not been much investigated, but such efforts would be valuable.

In summary, the three indicators may be misleading in principle, but it is an empirical question how misleading they are in practice. Interesting questions for future research are: How do environmental impacts correlate with N surpluses and N footprints in the real world? When are the indicators misleading, and when are they not?

Conclusions

This paper explores the N flows on organic and conventional dairy farms in Sweden, and their relation to the three indicators (1) farm-gate N surplus, (2) chain N surplus, and (3) N footprint. The different information given by these indicators can be traced to two differences in their definitions. First, the chain N surplus and the N footprint have a wider system boundary than the farm-gate N surplus. Second, the N footprint is based on net N losses, which is conceptually different from the N surplus.

We conclude that, compared to indicators based on N surplus, the N footprint is a more understandable indicator for the N pollution associated with a product. However, the N footprint is not a replacement for the widely used farm-gate N surplus per unit area, since the two indicators provide different information.

Despite the large dataset, there is substantial uncertainty in the indicator values, of which a large part is due to the possible bias in estimates of BNF. Hence, although the best estimate is that Swedish conventional milk has 10–20% higher indicator values than organic, it is conceivable that improved estimates of BNF will change that conclusion. These findings highlight the importance of conducting sensitivity analysis on BNF estimates in N budgets.

All three indicators aggregate N flows over time and space, and of different chemical forms. Thus, they hide many complexities with environmental relevance, which means that they can be misleading for decision-makers. This motivates further research on the relation between N surpluses and N footprints, and actual environmental damages.

Data availability statement

The farm-level data that support the findings of this study are available from the Swedish Board of Agriculture (Focus on Nutrients project), but restrictions apply to the availability of these data, which are therefore not publicly available. The authors of this article do not have permission to redistribute the data, and therefore recommend that the data owner be contacted to access the data. Other data that support the findings of this study (agricultural statistics from Eurostat and Statistics Sweden) are publicly available as referenced in the article.