Subnational nutrient budgets to monitor environmental risks in EU agriculture: calculating phosphorus budgets for 243 EU28 regions using public data

This paper presents a method to estimate soil surface phosphorus (P) budgets for 243 subnational regions in EU28. This is about the maximum spatial resolution that can be achieved mainly using international datasets that are regularly updated. Similar subnational budgets could be established for nitrogen (N) with some additions to this method. Increasing the spatial resolution from national to subnational is one way to address the well-known issue that national nutrient budgets sometimes mask considerable heterogeneity, i.e., regional surpluses and deficits that are not seen in national averages. Our results indeed show how a rich structure of different P budgets emerges when moving from national to subnational level. Another approach is to exclude the most extensively managed areas from the budgets, to better represent the surplus in intensive agriculture areas. Here, we show that both approaches are useful and sometimes important as they can affect P surplus estimates by about 10 kg P ha− 1 y− 1 or more. The choice of spatial resolution is a trade-off between accuracy and precision. National budgets are the most accurate thanks to good data coverage, but they sometimes fail to identify considerable P surpluses and deficits at subnational level. Increasing the precision (spatial resolution) gradually reveals this heterogeneity but comes at the cost of growing data gaps, which we discuss in detail. These subnational P surpluses represent a middle ground which may prove useful as one indicator among others to monitor the development of environmental risks and resource problems over time.


Introduction
Nutrient budgets are commonly used to monitor nutrient use efficiency and environmental pollution risks (Oenema et al. 2003) at spatial scales ranging from individual fields (van Leeuwen et al. 2019) via farms (Quemada et al. 2020) and farming systems (Godinot et al. 2014) up to subnational (Le Noë et al. 2017) and national and continental regions (Eurostat 2013;Lassaletta et al. 2014;Garnier et al. 2015). In particular, nitrogen (N) and phosphorus (P) budgets receive most attention because these nutrients are pivotal for agricultural productivity but also cause environmental problems (Sutton et al. 2011;Bennett and Schipanski 2013) and, in the case of P, because mineral P resources are limited and a majority of the reserves are concentrated in a handful of countries (Bennett and Schipanski 2013;Walan et al. 2014).
A common indicator of these environmental and resource problems is the nutrient surplus (or nutrient balance), defined as the difference between inputs and outputs in a nutrient budget, so that the surplus equals losses to the environment plus net soil nutrient accumulation (Oenema et al. 2003). As an environmental indicator the surplus is less sophisticated than some other alternatives (Schröder et al. 2003;Langeveld et al. 2007), but there is nevertheless a continued interest in N and P budgets and derived indicators such as the nutrient surplus, probably because the necessary data are available for a wide range of systems and because the surplus after all is judged as a good-enough indicator for some purposes. For example, OECD and the EU use national N and P surpluses to monitor environmental performance in their member countries (European Commission 2006;Eurostat 2013;OECD 2013).
However, like any quantitative indicator, the nutrient surplus comes with a few caveats, two of which we will only mention before looking closer at a third. The first is a warning against the simple but potentially deceiving interpretation that a positive surplus implies environmental pollution while a negative surplus (a deficit) implies depletion of soil nutrients. A positive surplus can be explained by soil accumulation and thus not necessarily imply environmental pollution. And a negative surplus, although it does imply soil stock depletion, does not preclude simultaneous losses to the environment. Further, it is worth to note that typical annual surpluses of maybe -10 to ? 30 kg P ha -1 y -1 are small compared to total soil P stocks, which in European natural soils average some 3000-5000 kg P ha -1 (Yang et al. 2013). In agricultural topsoils (upper 20 cm) in Europe, typically some 50-150 kg P ha -1 are extractable using the Olsen P method (Tóth et al. 2014). The conclusion is that a surplus or deficit is neither sufficient nor necessary for environmental or resource problems and therefore it is useful to complement with other information on soil nutrient status and environmental losses when such information is available (Eurostat 2013;Ö zbek and Leip 2015). Second, there is no simple relationship between nutrient losses to the environment and resulting impacts on the environment and human well-being. These impacts are multi-dimensional and depend on where, when, and in what form nutrient emissions occur, and therefore the surplus is at best an indicator of potential impacts (Azevedo et al. 2013;Henryson et al. 2017;Einarsson and Cederberg 2019).
The third caveat, which is the main concern of this paper, has to do with spatial scale and more generally with system boundaries. Since a nutrient budget merely states what crosses its system boundary, it can mask heterogeneity within that system boundary and thus fail to identify problematic surpluses and deficits. By increasing the spatial resolution, for example, it could be revealed that one half of a system has a large surplus while the other has a deficit, although the system as a whole has a surplus close to zero. The near-zero surplus of the whole system would not be representative for either part. This type of scale dependence can be seen going from field to farm scale (van Leeuwen et al. 2019), from subnational to national scale (Grizzetti et al. 2007;Le Noë et al. 2017), and from river basin or national scale to continental scale (Garnier et al. 2015). More abstractly, the same issue arises for example in the choice to draw the system boundary strictly around a livestock farm or extend it to other farms where feed is grown; the surplus will typically differ between the extended system and its parts (Godinot et al. 2014;Mu et al. 2016;Einarsson et al. 2018). Neither system boundary is ''wrong'' as such, but different alternatives provide different information. The common theme here is that spatial and conceptual system boundaries determine our chances to identify environmental problems using nutrient budgets.
In this paper, we consider the step from national to subnational nutrient budgets in the EU. Eurostat publishes an annual time series of national N and P budgets according to the Eurostat/OECD handbook on nutrient budgets (Eurostat 2013). While these budgets do contain useful information about nutrient flows in different countries over time (in many cases back to 1990), it is also known from several subnational budgets in the EU that the national averages mask a considerable heterogeneity. For example, Grizzetti et al. (2007) made N and P budgets for the EU15 countries for the year 2000 with 10 km 2 resolution, Leip et al. (2011) made N budgets for EU27 for the year 2002 with 1 km 2 resolution, Hong et al. (2012Hong et al. ( , 2017 applied the NANI/NAPI budget approach to 53 subnational regions in the Baltic Sea catchment, and Le Noë et al. (2017Noë et al. ( , 2018a have investigated differences in N and P budgets in France between national, subnational and farm scale. These subnational budgets all demonstrate that additional heterogeneity is often found when increasing the spatial resolution from national to subnational level. Here, we establish soil surface P budgets for 243 regions in the EU28 (NUTS2 level in Eurostat's nomenclature) for the year 2013. To our knowledge it is the most updated comparison of national and subnational P budgets in the EU. As far as possible, we work with public, international datasets (mainly Eurostat) that are continually updated, in order to test how well these datasets support the calculation of subnational P budgets not only for 2013 but also in the future. In addition, we take a closer look at the most important data gaps (permanent grassland yields, subnational mineral P fertilizer rates, and manure transports) and show how the choice of reference area (an example of a conceptual system boundary) may have different effects on the per-hectare surplus in different regions. In summary, we aim to provide an updated comparison of national and subnational P surpluses in the EU, as well as to identify and highlight some key points where appropriate data and methods can improve the accuracy and relevance of subnational nutrient budgets.

Method
This paper introduces a method to calculate P budgets for agricultural land at subnational resolution in EU28. We follow the soil surface approach (Oenema et al. 2003) which accounts for the main P flows crossing the soil surface layer, i.e., inputs from mineral fertilizers and manure as well as outputs from crop harvests. The surplus in a region is calculated as surplus = mineral fertilizer input ? manure input -crop output. Our methods and data sources to estimate these three terms are outlined in Table 1 and described in detail in separate sections below. A few P flows were excluded because they are small and/ or data are scarce. These are listed and further discussed in a separate section below.
Most of the data were retrieved from Eurostat, including the annual fertilizer use statistics (Eurostat 2019a, b), the annual crop statistics (Eurostat 2019c), and the triennial Farm Structure Survey (FSS) (Eurostat 2017). We primarily used data for the year 2013 since this was the most recent year with complete data from the FSS, and because the other data sources were also mostly available for 2013. The Eurostat data are reported on different geographical levels of Eurostat's classification system known as NUTS (Nomenclature of Territorial Units for Statistics), where the member states (NUTS0 level) are subdivided into NUTS1, 2 and 3 regions at increasing levels of spatial resolution. We used NUTS2 data in all EU28 countries except Germany where livestock populations from the FSS are only reported at NUTS1 level. This selection resulted in a total of 243 regions in the EU28.
Full source code for the calculations is archived in the Zenodo repository (Einarsson 2020, https://doi. org/10.5281/zenodo.3610358). The output data (the budgets and reference areas) are found in Online Resource 1. Some further details about the method are found in Online Resource 2.

Reference area
The P budget is calculated for a given reference area and ideally includes all inputs and outputs on that area. There are several possible choices for reference area. We primarily chose the Utilised Agricultural Area (UAA), i.e., the sum of cropland and permanent grassland because this is the default option in Eurostat's nutrient budgets (Eurostat 2013(Eurostat , 2019a. However, the UAA is not necessarily the most relevant reference area since it may include extensively managed areas where the annual surplus is close to zero (Eurostat 2013(Eurostat , 2019a, resulting in an average surplus for the UAA that is not representative of the more intensive areas (see Introduction above). If the aim is to reflect the typical nutrient surplus of intensive agriculture, very extensive areas and their associated P flows should ideally be excluded from the P budget (Eurostat 2019a).
For example, there are some subnational regions in Austria, Croatia, Greece, Italy, Spain, and the United Kingdom where rough grazing (Eurostat crop code J2000) and grasslands not in use (J3000) cover more than 50% of the UAA. Mineral P inputs could be assumed to be zero on these areas. However, the manure P inputs are not zero on rough grazing areas and thus would have to be subtracted from the regional manure totals. Due to lacking data on grazing management (e.g., livestock density and the grazing period's length), we were unable to exclude the P inputs from grazing animals and therefore kept the rough grazing areas in the main results. It would be straightforward to exclude the grasslands not in production (J3000), but since they cover less than 4% of the UAA in all but a few subnational regions, we opted for comparability to other studies and used the UAA as our primary choice of reference area. In addition, we calculated all the results for the smaller reference area (UAA-J2000-J3000), excluding the harvested/grazed yield on this land (see below), but without changing manure or mineral fertilizer inputs. Thus, this alternative calculation likely overestimates the effect of choosing the smaller reference area, and the comparison should be seen as an upper bound on the effect of changing reference area.

Crop P harvests
The crop P harvest in each region was calculated from crop harvests multiplied by crop P contents for 17 crop categories, listed in Table 2. The selection of categories was a trade-off between the following aims: (1) to maximize the availability of crop data, (2) to cover almost all the UAA in the EU28, and (3) to have crop categories with the same P content within each category. This last aim is optimally achieved by going to the most disaggregated level in Eurostat's extensive multi-level crop hierarchy (Eurostat 2019d), i.e., by distinguishing as many crops as possible. However, the two first aims are optimally achieved at a more aggregated level since data coverage is higher for more aggregated crops (e.g., there are typically more data on the total of ''Wheat and spelt'' than on its subcategories winter wheat, spring wheat, etc.). The crop data after some gap-filling (see below) accounted for more than 80% of the UAA in all countries except the Netherlands (79% of the UAA covered), Latvia (78%), Italy (76%), and Malta (68%). In each subnational region, we assumed that the remaining unaccounted area had an average P yield equal to the average P yield of the reported productive crops (excluding fallow and unused permanent grassland). Thus, the calculation produces an estimate of total P harvest that covers the total reference area. We scrutinized and gap-filled the crop data to ensure a high data quality and sufficient coverage as follows. Missing areas and harvests were gap-filled using the sum of its sub-regions or sub-crops if available. After this step, both harvests and areas were available for a majority of crops in most regions. In some cases subnational areas were known but harvests missing, since the areas were taken from the more detailed FSS. In these cases we estimated harvests using crop yields from the closest larger region (i.e., for a NUTS2 region we checked first for a value in the NUTS1 region, and if that was missing we used the national crop yield). We also manually corrected and removed a few outlier yield values as detailed in the source code repository.
Since Eurostat does not report yields of permanent grasslands, we used yield estimates for permanent grassland established by Smit et al. (2008) for 12 environmental zones in Europe (Metzger et al. 2005). For subnational regions overlapping multiple environmental zones, we calculated an average yield using the area fractions of the environmental zones in the region as weights (see source code for full details). These yields were assumed for all permanent grassland in use (Eurostat crop codes J1000 and J2000).

Manure P inputs
We estimated manure P inputs by multiplying livestock populations from the FSS with country-specific P excretion factors reported by EU countries for use in the Eurostat/OECD nutrient budgets (Eurostat 2014;Velthof 2014). Most countries derive excretion factors from mass balances: the P excretion equals feed P input minus retained P in products such as meat, milk, and eggs (Oenema et al. 2013). Other methods for deriving excretion factors include direct manure measurements, literature review, and expert judgements, but these methods are less common among the EU member states (Oenema et al. 2013). For countries that do not report P excretion factors, Eurostat and OECD use estimates based on N:P ratios from available countries, provided the N excretion factor is known, and otherwise P excretion coefficients are For the remaining 13 EU28 countries, we estimated the subnational distribution using data from the Farm Accountancy Data Network (European Commission 2014; FADN 2019). The EU28 member states increasingly report P fertilizer use in the FADN data starting in 2014 and reaching full coverage in 2017. We calculated the average 2014-2017 within-country distribution of P fertilizer use, rescaled to agree with 2013 national totals according to Eurostat. This calculation has a couple of intricacies worth mentioning. First, the FADN uses a different subnational regional division than Eurostat's NUTS system; the FADN regions mostly but not always coincide with NUTS regions. Therefore we calculated the area overlap between each NUTS region (Eurostat 2012) and its intersecting FADN region(s) (FADN 2015), and distributed the FADN fertilizer quantities in proportion to this overlap (see the source code for details). Second, comparing FADN's subnational distribution of mineral P fertilizers to Eurostat's for the 15 countries where subnational data are available, we noted that the two datasets disagree considerably (some 5-10 kg P ha -1 y -1 ) in a few cases, although the two datasets broadly agree on the subnational distribution. Eurostat's distribution at NUTS2 level looks more heterogeneous, which is not surprising given that the NUTS2 regions are usually smaller than the FADN regions. We used Eurostat's subnational data where available since they have higher spatial resolution and were collected specifically for the purpose of measuring regional mineral P use.
A final issue is that there are multiple and sometimes substantially different estimates of national total P fertilizer use. Eurostat publishes fertilizer use data from the member states (Eurostat 2019b) in addition to national sales data from Fertilizers Europe (Eurostat 2019e). For about half of the countries, the two datasets agree within some 10%, while in a few cases the disagreement is about 20% and in extreme cases a factor two or more. The reasons for these differences are not clear. We noted, however, that the mineral P use stated in Eurostat's national nutrient budgets (Eurostat 2019a) almost always agrees exactly with one of the two datasets, and so we chose to rescale all the subnational distributions to match the totals stated in Eurostat's national nutrient budgets (see Online Resource 2 for details).

Excluded flows
Due to limited data availability, we excluded some P flows from the budgets. Some of these are known to be relatively minor (seed inputs, crop residue removal, atmospheric P deposition) while others are hard to judge and can sometimes have a substantial effect on the results (manure trading within or between countries, manure withdrawals for processing, and application of sewage sludge). Below, we discuss the implications of excluding these flows from the soil P budget.
Manure is sometimes transported from livestock intensive regions to comply with limits on manure N and P application, such as the Nitrates Directive's limit on manure N application, and various national limits on P application (Amery and Schoumans 2014). According to Eurostat (2011), manure trading probably only has a large effect in a few regions with high livestock density such as the Netherlands, Brittany in France, Flanders in Belgium, and the Po valley in Italy. Most extreme is probably the Netherlands, where net manure P exports between 2006 and 2015 increased from about 8 to 18% of manure production, corresponding to 3-8 kg P ha -1 y -1 (van Grinsven and Bleeker 2017; Eurostat 2019a). However, the data reported by Eurostat only cover a few countries at national level (Eurostat 2019a). Therefore, we excluded manure trading in our P budgets but took extra care to interpret the results in light of the few data points available. The implications of excluding manure trade are further discussed in the ''Discussion'' section below.
Further, Eurostat's nutrient budgets also account for net removal of P with manure withdrawals for processing. Manure withdrawals do not always imply net removal of P: for example in biogas production the manure P is preserved in the digester effluent (Eurostat 2011) and likely reapplied on agricultural land. Other forms of manure withdrawals causing a net removal of P are possible but data are scarce even at national level (Eurostat 2013). Considering the limited data, we excluded manure withdrawals from the P budgets.
Likewise, data on agricultural use of sewage sludge in the EU are incomplete and Eurostat only gives information on dry matter application (Eurostat 2019f) which makes the corresponding P input uncertain. We chose to exclude sewage sludge inputs from the budgets. According to van Dijk et al. (2016, Supplementary material S15) EU27 sewage sludge P application in 2005 typically varied between 0 and 1 kg P ha -1 , with only six member states between 1 and 2 kg P ha -1 .
Finally, we excluded a few flows for which subnational data are lacking but that are fairly well known to be small at national level. First, inputs in seeds and planting materials are included in Eurostat's national budgets and typically account for about 0.2 kg P ha -1 y -1 (Eurostat 2019a). We excluded this flow because it is relatively small and subnational data are lacking. Second, crop residues, mainly straw, are harvested in large quantities for fodder, livestock bedding material, and to a lesser extent for energy purposes (Scarlat et al. 2010). However, the P removed for fodder and bedding material is mostly reapplied to agricultural soils via manure and litter, so the net removal is small. On the other hand, crop residues used for energy purposes, e.g., heat and power production, can have a greater effect on the P budget. Denmark, for example, incinerates about 2 million tonnes of straw annually, or a quarter of the straw production (Skøtt 2011). Although some of the P is returned to agricultural land in ash, it may be geographically redistributed among NUTS2 regions after incineration in central power plants (Skøtt 2011). However, assuming a straw P content of 0.1%, this removal or reallocation is less than 1 kg P ha -1 y -1 in Denmark, Europe's most intensive user of crop residues for energy, and so the exclusion of crop residue removal from agriculture is likely a small error in most countries. Third and last, atmospheric deposition of P is likely around 0.3 kg P ha -1 y -1 (Tipping et al. 2014) and was excluded, as also suggested by the Eurostat/OECD nutrient budget handbook (Eurostat 2013).

Comparison to other budgets
As a consistency check we compared our national surpluses to those by Eurostat (2019a) and by van Dijk et al. (2016). Our results are roughly comparable to these despite some differences. The most important difference to Eurostat's budgets is that theirs includes some data on manure trade as noted above. The surplus calculated by van Dijk et al. (2016) is a soil surface surplus, however including a few minor flows that our analysis lacks, such as sewage sludge, compost, and P contained in pesticides. Although the surplus estimates by van Dijk et al. concern year 2005, they should be roughly comparable to ours considering that Eurostat's budgets have not changed dramatically between 2005 and 2013 (Eurostat 2019a). If any of the three estimates deviates strongly from the others, it suggests the presence of important differences in data and/or methods which would be worth to investigate further. Figure 1 shows the three terms of the budgets (crop harvests, manure P inputs, and mineral P inputs) expressed per hectare UAA. This illustrates (1) a wide variation in overall P turnover on agricultural land, and (2) specialization into crop or livestock farming. Some regions, especially on the most fertile plains, are dominated by specialized crop production and relatively more supported by mineral P inputs, while areas with more livestock have little or no mineral P inputs. Figure 2 shows the P surplus calculated with the two different reference areas (UAA, and UAA minus the two extensive grassland categories as described in the ''Method'' section). Using the smaller reference area had a large effect on the surplus estimates in the minority of the regions that have large shares of extensive grassland, primarily parts of Austria, Greece, Croatia, Italy, Spain, and the UK. In total, the P balance of 40 NUTS2 regions increased by more than 5 kg P ha -1 y -1 with the alternative reference area (see Online Resource 2 for a map). The largest difference occurred in the Scottish Highlands and Islands, where the extensive grassland categories J2000 and J3000 cover 84% of the total UAA.

Results
As expected, subnational surpluses can be quite different from national averages. For example, the national surplus in France is only 2 kg P ha -1 y -1 , while subnational surpluses range from -10 kg P ha -1 y -1 in Picardy in the Paris basin to 25 kg P ha -1 y -1 in Brittany. Similarly, high surpluses deviating substantially from national averages are found in Catalonia, Valencia, and Murcia in Spain, the Po valley in northern Italy, Noord-Brabant in the Netherlands, and Flanders in Belgium.
Comparing the surplus (Fig. 2) to the three budget terms (Fig. 1), it is seen that the most extreme surpluses occur in livestock-rich regions such as Belgium, the Netherlands, and eastern Spain. Some of these extreme surpluses are overestimated since our model does not account for manure trade between regions; for example, Limburg and Noord-Brabant in the Netherlands in 2013 exported about 13,000 t of manure P to neighboring regions or countries (van Grinsven and Bleeker 2017), corresponding to 50% of the excreted manure or about 30 kg P ha -1 , which implies a surplus of 25-30 kg P ha -1 y -1 rather than 55-60 kg P ha -1 y -1 as our results indicate. However, substantial surpluses of 10-20 kg P ha -1 y -1 occur also in some regions with little manure input, for example in parts of Poland and in parts of the Mediterranean region. Negative surpluses (deficits) below -5 kg P ha -1 y -1 are found mainly in Bulgaria, eastern Denmark, eastern UK, and parts of Germany and France. Most of the P harvest in the deficit regions is in cereals, which covers roughly half of the UAA in these regions. Some of these deficit regions also have relatively large areas of oilseed and sugar beet cultivation (ca. 20% of UAA in the Paris basin), and sunflower seed (ca. 25% of UAA in Bulgaria). Figure 3 further illustrates the joint variation in harvests, manure and mineral inputs, and surpluses. This reveals a rich structure of different types of regions. The vast majority of the agricultural area receives a mix of mineral and manure inputs, with total inputs around 10-20 kg P ha -1 y -1 and surpluses around 0-10 kg P ha -1 y -1 ; the most extreme surplus regions account only for a very small part of the agricultural area. Figure 3 also shows that deficits occur both where harvests are moderate and where harvests are very high. Notably there is an area of about 18 Mha with P deficits despite inputs above 15 kg P ha -1 y -1 (the rightmost yellow markers in the lower panel of Fig. 3). This is almost exclusively the highly productive plains of France, Germany, and the UK.  The three terms of the P budget, using UAA as reference area broadly similar between the three studies, although there are a few large discrepancies. There are several reasons for these discrepancies, including at least the following. First, as we did not account for manure trading, our P surpluses for the Netherlands and Belgium are higher than Eurostat's; their data on net manure withdrawals, which include net exports and withdrawals for non-agricultural use and processing (Eurostat 2019a), corresponded to a surplus reduction of 10 and 5 kg P ha -1 y -1 for the Netherlands and Belgium, respectively. Second, for Italy we identified a probable reporting error in Eurostat's P budget, resulting in a substantially higher crop P output compared to our estimate and thus a P deficit (see Online Resource 2). Third, for Spain and the UK, we noted substantial differences in reference area between our study and Eurostat's, contributing to a lower surplus in Spain and a higher surplus in UK in Eurostat's indicator (Eurostat 2019a). These differences arise as Eurostat's budgets do not use the UAA from the FSS as reference area for Spain and the UK.
Eurostat's reference area is 36% larger than the FSS UAA for Spain and 29% smaller for the UK.

Discussion
This paper demonstrates a method to calculate soil surface P budgets for 243 subnational regions in EU28. The work has generated several important lessons concerning method choices and data sources. We first discuss the choices of reference area and spatial resolution, and how these influence the relevance of the surplus as an indicator. We then move on to some more specific issues concerning data and data gaps.
Reference areas, spatial resolution, and the relevance of the surplus As explained in the ''Introduction'' section, a nutrient budget for a heterogeneous mix of systems may fail to Fig. 2 Comparison of P surpluses in EU28, calculated at subnational level (top panels) and national level (bottom panels), and with the whole UAA as reference area (left hand side), and excluding extensive grasslands (right hand side). See main text for details identify large nutrient surpluses or deficits because different subsystems are averaged and variations are thus smoothed out. As an environmental indicator, the nutrient surplus is then a false negative, raising no concern although it ideally should. This paper explores two different ways to correct for this problem.
The first way is the one proposed for Eurostat's national nutrient budgets, namely to ''relate to the potential fertilised area, excluding very extensive unfertilised areas, to make comparisons [...] useful and to identify the potential risks of agricultural production to the environment'' (Eurostat 2019a). However, this aim is not yet reached: ''Extensive areas to be excluded from the balance have not been defined yet. Some countries however have identified and excluded certain extensive areas in their balance estimations'' (Eurostat 2019a). As a sensitivity analysis, we therefore tried excluding Eurostat's crop categories for rough grazing and unused grassland (Figs. 2, 4) and noted that it does affect the surplus estimates heavily in some regions dominated by extensive grassland. This simple calculation likely overestimates the effect size, since it excludes the extensive grassland's harvest but includes all manure and mineral fertilizer, and therefore the difference is to be interpreted an The horizontal axis is the same for both panels, showing total P inputs (manure ? mineral fertilizer) per hectare UAA. The two panels together demonstrate a considerable heterogeneity in P budgets. See main text for details Fig. 4 Comparison of national P surpluses on. ''Whole UAA'' and ''Excl. extensive grass'' refer to the two different reference areas in this study. Our surplus estimates using UAA as reference area generally agree best with Eurostat's, except for the UK where excluding extensive grassland gives a much closer match upper bound on the effect of excluding extensive grasslands.
The second way to correct for the averaging problem is to increase the spatial resolution. Increasing the resolution from national to subnational level decreases the amount of heterogeneity in each region and instead increases the differences between regions, as seen in Fig. 2.
Here, we demonstrate that both approaches can affect surplus estimates considerably. As seen in Figs. 2 and 4, both approaches, alone or in combination, sometimes change surplus estimates by 10 kg P ha -1 y -1 or more. As our model lacks data on manure trade, it exaggerates the surplus heterogeneity between some regions, and it should not be used when it is important to avoid false positives, i.e., warnings of surpluses or deficits that on closer inspection do not exist. However, if the risk of a few false positives can be tolerated, the subnational budgets are likely a more sensitive indicator of trends in agricultural P turnover.
In principle, the spatial resolution could be increased further to reveal even more heterogeneity. The logical endpoint would be to make separate budgets for each farm, each field, or even finer subdivisions, until eventually all heterogeneity would be fully represented. However, this is obviously not possible with the available data. The highest-resolution N and P budgets (e.g., Grizzetti et al. 2007;Leip et al. 2011;Britz and Witzke 2014) depend to a greater extent on assumptions and downscaling methods that introduce additional uncertainty, a typical case of the trade-off between accuracy and precision. Is it worth to further increase the spatial resolution below NUTS2 level? This is partly an empirical question (Le Noë et al. 2018b;van Leeuwen et al. 2019), but partly also a question of what use is intended for the nutrient budget.
Interestingly, it is also relevant to map P flows on a coarser scale than the national. International trade of agricultural products and fertilizers moves large P quantities between countries and even continents. For example, South America and South Asia mainly import mineral P fertilizers, while Western Europe imports a considerable share of its animal feed (Schipanski and Bennett 2012;Nesme et al. 2018). These global trade patterns are directly linked to P flows at the finer spatial scale; for example, some heavily feed importing countries in Western Europe are clearly oversupplied with P and therefore require systems for P redistribution and recycling to minimize environmental impacts and resource waste. Indeed, the agricultural specialization demonstrated here at subnational level is permitted, and partly caused, by international trade (Nesme et al. 2018). This means it is also important to understand driving factors at a higher level while considering measures for more sustainable P management at the local level.

Data availability and quality
There are considerable uncertainties in P budgets even at national scale due to gaps and inconsistencies in data. Important examples (see ''Method'' section) include permanent grassland yields, P excretion coefficients for livestock, national use of mineral P fertilizers, and international manure trade. Some indication of these uncertainties is found comparing our results at national level to two other state-of-the art estimates (van Dijk et al. 2016;Eurostat 2019a). While the three estimates broadly agree, there are a handful of large discrepancies, likely explained by different methods to estimate flows for which data are scarce. To make the different budget calculations more comparable, a substantial effort would be needed in harmonising definitions and input data. A good starting point for further inquiry would be to look closer at those countries where the three estimates clearly disagree. We have highlighted a few differences in the ''Results'' section above, but a detailed comparison of the three budget estimates is outside the scope of this paper.
The data coverage only gets worse moving to subnational resolution. For crop areas and harvests, the data gaps were moderate and were well addressed by our gap-filling procedure. However, subnational data coverage was much worse concerning mineral P fertilizers (where Eurostat's subnational statistics only covered 15 of 28 member states) and manure trade (where very little data was available even at national level). Our subnational estimate of mineral P fertilizer use, combining the Eurostat and FADN databases, is probably hard to improve without searching national statistical databases for more subnational data. Concerning manure trade, an improvement to our model would be possible by consulting various national data sources and experts, but this would be a substantial effort and it falls outside the scope of this paper which aims to explore how well subnational budgets can be estimated from public, international datasets that are continually updated. Thus, we exclude manure trade with the caveat that it has a large effect on subnational budgets in the most livestock-dense regions (where surpluses are overestimated) and their neighboring regions (which likely import manure and where surpluses are likely underestimated).
Finally, we highlight the well-known lack of reliable and comparable data on permanent grassland management and productivity. Given the large areas of permanent grassland throughout EU agriculture it is a continuing source of uncertainty at national and subnational levels alike.
Estimating N budgets using a similar method Most of the methods and data sources used here could also be used to establish subnational N budgets for EU agriculture, although there are a few more components in N budgets, including biological N fixation, gaseous losses of ammonia from animal houses and manure storage, and inputs of atmospheric N deposition. Soil surface N budgets would also not be directly comparable to Eurostat's national N budgets, which are socalled Gross Nitrogen Budgets (Eurostat 2013) with a different system boundary that includes the whole agricultural system (most importantly, the Gross Nitrogen Surplus includes gaseous N emissions from livestock systems). However, there are established methods to estimate all these flows and thus it seems straightforward to establish soil surface N budgets and/or Gross N Budgets at the same subnational resolution used here.

Conclusions
This paper presents a method to estimate soil surface P budgets for 243 subnational regions in EU28. This is about the maximum spatial resolution that can be achieved mainly using international datasets such as the Eurostat and FADN databases.
Increasing the spatial resolution from national to subnational is one way to address the problem that national nutrient budgets sometimes mask considerable heterogeneity, i.e., regional surpluses and deficits that are not seen in national averages. Another approach is to choose a different reference area, excluding extensively managed grasslands, to better represent the surplus in intensive agriculture areas. Here, we show that both approaches are useful and sometimes important as they can affect P surpluses by about 10 kg P ha -1 y -1 or more (Figs. 2, 4).
There are considerable data gaps and inconsistencies, even for national budgets (Fig. 4). For example, there is little data on the manure traded between regions or countries, and this probably introduces a large bias in the surplus estimates for some subnational regions and even countries. Further, permanent grassland yields are not routinely measured and we relied on the estimate by Smit et al. (2008). Finally, data on mineral P inputs are sometimes inconsistent even at national level (see ''Method'' section), and Eurostat's subnational statistics cover only 15 of the EU28 countries. We therefore devised a novel approach to combine subnational distributions according to Eurostat (15 countries) and FADN (13 countries), all rescaled to agree with Eurostat's national P budgets.
Similar subnational budgets could be established for N with some additions to this method. The same data limitations would largely apply, but given the considerable heterogeneity revealed within countries by these P budgets it seems worthwhile to establish subnational N budgets in a similar fashion.
In the end, how far to increase the spatial resolution is a trade-off between accuracy and precision. The most accurate budgets are arguably the national ones, but they sometimes fail to identify considerable P surpluses and deficits within countries. Increasing the precision (spatial resolution) gradually reveals this heterogeneity but comes at the cost of growing data gaps. These subnational budgets represent a middle ground between national budgets (Eurostat 2013;van Dijk et al. 2016) and the most disaggregated budgets (Grizzetti et al. 2007;Leip et al. 2011;Britz and Witzke 2014); and as our subnational budgets primarily use data from international databases that are regularly updated they may prove useful as one indicator among others to monitor the development of environmental risks and resource problems over time.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.