1 Introduction

1.1 A necessary transition

The need for a transition towards sustainable agriculture and food systems is widely acknowledged (Foley et al. 2011; IPES-Food 2016; Campbell et al. 2017). In this context, the role and share of livestock production in our food system are being increasingly debated (Westhoek et al. 2014; Röös et al. 2017; Willett et al. 2019). Over the past years, the pressure of livestock production on the environment has been extensively documented. The main impacts are contributions to global anthropogenic greenhouse gas emissions (Weiss and Leip 2012; Vermeulen et al. 2012; Gerber et al. 2013; Notarnicola et al. 2017); pollution of water resources through overuse of manure (Velthof et al. 2014; Notarnicola et al. 2017); and significant land use requirements, including a reliance on feed crops that compete directly with human consumption and may have high impacts in terms of habitat and biodiversity loss (Steinfeld et al. 2006; Vermeulen et al. 2012; Karlsson et al. 2021).

However, livestock systems also have the potential to provide key ecosystem services. Important benefits include contributions to soil fertility in well-balanced crop–livestock systems; in the case of ruminants, the conversion of non-human-edible biomass into nutrient-dense food; in the case of pasture-based systems, the potential to help mitigate climate change through the storage of carbon in pastures; and contributions to grassland biodiversity (Steinfeld et al. 2006; Garnett et al. 2017; Mottet et al. 2017).

Combined with the economic precarity faced by some livestock farmers in Europe (Havet et al. 2014) and growing societal concerns regarding issues such as animal welfare (Boogaard et al. 2011), these contrasting attributes of livestock systems highlight the need to identify sustainable livestock systems that are environmentally friendly, economically viable for farmers, and socially acceptable.

More sustainable livestock systems will emerge from the most relevant options among the diversity of existing and emerging systems. To identify these options for a more sustainable future, two types of tools are key: diversity assessments and indicator-based sustainability assessments.

1.2 Diversity assessments

Diversity assessments aim to acknowledge the diversity of practices and systems within an agricultural sector. Combined with sustainability assessments, they provide a better understanding of the studied agricultural sector and thereby favor a transition towards greater sustainability (de Snoo 2006; Lebacq 2015; Stylianou et al. 2020a, b). Diversity assessments rely on quantitative or qualitative typology classifications to identify and capture the diversity of production systems (Kuivanen et al. 2016; Stylianou et al. 2020a).

Performing diversity assessments is important for three main reasons. First, in order to ensure the adoption of sustainability assessments, the developed tool or assessment must be perceived as relevant by farmers (de Olde et al. 2016). Farms are the decision-making units that will decide whether to implement sustainability practices (Diazabakana et al. 2014; Latruffe et al. 2016; Kelly et al. 2018). As such, farmers need to be able to relate to the practices which are outlined and analyzed in a sustainability assessment. This will be enhanced if, rather than considering a homogeneous set of farms and their average practices, farm-level diversity is taken into account. Farmers may then refer to a group of peers sharing the same practices. Second, understanding and highlighting the diversity of farms is key for the development of adequate interventions and policies aimed at addressing the challenges faced by farmers (Kamau et al. 2018). Third, grouping farms that present similar practices into production systems situates the scale of analysis at a meso-level, above the highly diverse plot and farm levels (micro-level) and below the very uniform regional or national levels (macro-level). This is necessary to create constructive links between farmers and higher-level actors and develop a mutual understanding of respective objectives and constraints. In short, this approach accounts for farm-level diversity without being overwhelmed by it. As noted by Lynch et al. (2018), aggregating similar practices into production systems makes it possible to identify trends which can be extrapolated at a higher level (e.g., regional or national) while still accounting for the existing diversity of models. Diversity assessments tend to face three main dilemmas, related to a lack of diversity, a lack of representativeness, and a lack of multidimensionality.

  • Dilemma 1: A focus on extreme systems is not sufficient to capture diversity. Many assessments focus on the dichotomies between two opposite systems, such as organic and conventional or extensive and intensive (Escribano et al. 2015; van Wagenberg et al. 2016). While such studies are necessary to characterize these contrasting systems, it is important to go beyond this dualization and to acknowledge a greater level of diversity so that all farmers can relate to the analyses. Moreover, other studies compare different farm types (e.g., arable, livestock, or mixed) without accounting for the diversity of systems within farm types (Westbury et al. 2011; Slijper et al. 2022).

  • Dilemma 2: The representativeness of assessments is constrained by the number of sampled farms. Assessments are often focused on a small number of sample farms (fewer than 20) (Batalla et al. 2014; Reinsch et al. 2021; Resare Sahlin et al. 2022). Collecting data in loco is important to better apprehend farmers’ realities and priorities. Yet, such assessments generally lack the necessary representativeness to extrapolate and generalize results at a wider scale. In this sense, using farm accountancy databases such as the European Farm Accountancy Data Network (FADN) is highly relevant as, by construction, their aim is to be representative of different farming sectors at regional or national level. While some limitations have been identified with regard to the representativeness of FADN, such as the overrepresentation of “commercial” farms and underrepresentation of smaller farms, it is still considered the most reliable sample survey (Diazabakana et al. 2014; Mari 2020; Masi et al. 2021).

  • Dilemma 3: Diversity assessments fall short of accounting for the multidimensionality of sustainability. The combination of diversity and sustainability assessments is not always straightforward. For instance, some diversity assessments focus on one single dimension of sustainability (Reinsch et al. 2021; Froldi et al. 2022), or do not complement core accountancy data with additional and more relevant sustainability indicators (Gonzalez-Mejia et al. 2018; Stylianou et al. 2020b; Masi et al. 2021). As such, while some papers manage to combine farm typologies with multidimensional assessments (Haileslassie et al. 2016; Micha et al. 2017; Díaz de Otálora et al. 2022), diversity assessments often remain affected by the dilemmas of indicator-based sustainability assessments identified below.

1.3 Indicator-based sustainability assessments

Indicator-based assessments in the context of agricultural sustainability are relevant since their results are ready-to-use by a multiplicity of actors, including decision-makers, farmers (and advisors), and consumers (Boogaard et al. 2011; Diazabakana et al. 2014; Schader et al. 2014; de Olde et al. 2016; Kelly et al. 2018). Through a set of multidimensional indicators, they provide a better understanding of the sustainability performances of agricultural systems (Sadok et al. 2008; Binder et al. 2010; Van Passel and Meul 2012; de Olde et al. 2016). Yet, performing indicator-based sustainability assessments comes with several challenges in the context of data-driven approaches (i.e., which rely on farm accountancy databases, such as FADN). We identify four main dilemmas (see Figure S1 in the Supplementary material).

  • Dilemma 1: Current sustainability assessments mainly focus on the environment. In order to perform a satisfactory sustainability assessment, the selected set of indicators should be multidimensional, i.e., cover all three dimensions of sustainability, namely, environmental, economic, and social. In practice, however, there is an imbalance in the dimensions of sustainability which receive more attention, both in the public and scientific debate, as the main focus currently lies on the environment (Binder et al. 2010; Lebacq et al. 2013; Diazabakana et al. 2014; Schader et al. 2014; de Olde et al. 2016; Stylianou et al. 2020b). The literature on economic and social sustainability is less abundant (Diazabakana et al. 2014) and the social dimension seems to be the least studied of all three (Boogaard et al. 2011). Kelly et al. (2018) carried out a non-exhaustive review of FADN-based sustainability assessments. Of the 27 studies included in the review, the social dimension was considered in 10 studies, as part of a global sustainability assessment but never on its own. In contrast, the environmental and economic dimensions were studied in 20 and 18 studies, respectively, either individually or as part of a comprehensive sustainability assessment. Regarding the need for combined assessments, the review shows that 15 of the 27 studies focus on a single sustainability dimension, whereas the other 12 consider two or all three dimensions (Kelly et al. 2018). Although not based on a statistically representative sample, this shows that a substantial number of current assessments do not yet cover all three dimensions of sustainability.

  • Dilemma 2: The focus on the environment clashes with low availability of data. The availability of indicators in farm databases varies greatly between sustainability dimensions. At the European level, none of the available databases were explicitly developed to assess farm-level sustainability (Kelly et al. 2018). Economic indicators are generally quantitative and monetary. As such, they can be easily measured and recorded in farm accountancy databases (Lebacq et al. 2013; Kelly et al. 2018). In contrast, the assessment of social sustainability is less straightforward. Except for some indicators such as the workforce or the workload, the required data for many social indicators are insufficiently available (Jan et al. 2012; Lebacq et al. 2013). The subjective character of some of these indicators (e.g., self-evaluation of a farmer’s quality of life) complicates their measurability and hence availability (Lebacq et al. 2013). Consequently, social indicators generally require additional data collection (Kelly et al. 2018). Despite receiving much attention, the environmental dimension is poorly covered in farm accountancy databases, with a general lack of precise and comprehensive environmental data (Jan et al. 2012).

  • Dilemma 3: Environmentally relevant indicators are hard to measure and less available. With regard to environmental sustainability, indicators are often classified along what is referred to as the cause–effect chain (Diazabakana et al. 2014; Latruffe et al. 2016). On one end, means-based indicators reflect agricultural and farmers’ practices (e.g., pesticide costs). On the other end, impact-based indicators reflect the actual impact related to a specific environmental theme (e.g., pesticide concentrations in soil and water resources) (Lebacq et al. 2013; Diazabakana et al. 2014). Environmental indicators tend to present an inverse relation between their relevance (i.e., their ability to effectively reflect an environmental impact) and their availability or measurability (i.e., their ease of access or measurement). In terms of relevance, means-based indicators have a low quality of prediction of environmental impacts given that they reflect agricultural practices, whereas impact-based indicators have a high environmental relevance given their direct link with the environmental theme to be assessed (Lebacq et al. 2013). In terms of availability and measurability, means-based indicators are easier to collect given their close link to technical means and inputs used on the farm. As such, they are readily available in farm databases. In contrast, the collection of impact-based indicators is more complex, time consuming, and expensive, and their availability is therefore more limited (Diazabakana et al. 2014).

  • Dilemma 4: Attempts to address sustainability dilemmas do not account for diversity. Previous research has addressed the three previous dilemmas, mainly by complementing core data from farm databases through the mechanistic modeling of additional environmental indicators. For example, Jan et al. (2012) complemented Swiss FADN data with Life Cycle Assessments. Lynch et al. (2018) complemented British FADN data with the Farmscoper tool, developed for the United Kingdom Department for Environment, Food and Rural Affairs (Defra). Westbury et al. (2011) applied the Agri-Environmental Footprint Index (AFI) methodology to British FADN data to assess the environmental performance of three different farm types (arable, lowland livestock, and upland livestock). Slijper et al. (2022) relied on FADN data to assess which farm characteristics affect resilience among different farm types (arable, livestock, and mixed). All these studies capitalized on FADN data to produce comprehensive assessments of farm resilience or multidimensional sustainability (with the social dimensions nevertheless still remaining understudied). However, such approaches tend to overlook the diversity of practices and systems within farm types.

1.4 Research objectives

The seven dilemmas presented above show that the combination of diversity and sustainability assessments is not straightforward. Yet, both tools are complementary and mutually reinforcing: diversity assessments are essential to enhance the relevance of sustainability assessments, while multidimensional sustainability assessments constitute a precondition to ensuring the usefulness of a diversity assessment. In this paper, we contribute to the literature by proposing an ad hoc method that simultaneously assesses the diversity and sustainability of production systems within one agricultural sector, while concurrently overcoming the dilemmas of both diversity and sustainability assessments. First, we start with an identification of multiple production systems to account for the diversity of practices within one farming sector, at regional level. Second, we combine the diversity assessment with a multidimensional indicator-based sustainability assessment to identify the livestock systems with the greatest potential to contribute to the transition towards a more sustainable food system. We show that it is possible to perform a comprehensive assessment which addresses these dilemmas: it accounts for a high level of diversity (diversity dilemma 1); it is representative of the regional diversity, as it relies on a comprehensive farm accountancy dataset (diversity dilemma 2); it proposes a combined assessment of diversity and multidimensional sustainability (diversity dilemma 3 and sustainability dilemma 4), spanning both the socio-economic and environmental dimensions (sustainability dilemma 1); and it complements the set of available indicators to enhance the relevance of the assessment (sustainability dilemmas 2 and 3).

We applied the proposed method to the dairy and beef sectors in Wallonia (Southern Belgium). These two sectors constitute relevant case studies as they dominate the Walloon agricultural landscape (Figure 1). In 2018, about 50% of Walloon farms were specialized in bovine production, with a clear distinction between specialized beef farms, on the one hand, and specialized dairy farms, on the other (SPW 2020). The Walloon landscape is particularly well suited for these productions given its ample supply of grasslands, and in particular permanent grasslands, which represented 43% of the region’s utilized agricultural area (UAA) in 2018 (SPW 2020). This is much higher than Flanders (Northern Belgium), where permanent grasslands only represent 27% of the region’s UAA (Statbel 2019), or even the EU average, as permanent grasslands represented 34% of the EU’s UAA in 2016 (EU Commission 2018). This particularity is of high interest from an environmental perspective given the associated benefits fostered by permanent grasslands, such as biodiversity conservation (Peeters 2009) or carbon storage (Gourlez de la Motte et al. 2016, 2018). However, both sectors have undergone significant changes in recent decades in terms of concentration and intensification, posing challenges to their social, economic, and environmental sustainability (Peeters 2009; SPW 2020). There has been a decrease in the share of permanent grasslands (−14% over the 1990–2018 period), replaced by an increase in arable crops. In particular, forage maize, which is often associated with a quest for productivity and high input use (Lebacq et al. 2015), has gradually gained in importance in bovine systems in Belgium and all over Europe (Peeters 2009; Natagora 2020; Reinsch et al. 2021). Finally, current studies in Wallonia have primarily focused on the dairy sector (Lebacq et al. 2013, 2015; Lessire et al. 2019; De Herde et al. 2019, 2020; Dalcq et al. 2020), with a lack of studies on the sustainability of the beef sector.

Figure 1

A diversity of bovine (dairy and beef-breeding) systems coexists in Wallonia, with varying levels of environmental and socio-economic performances (picture on the left by Philippe Baret; picture on the right by Roger Job).

This paper is organized as follows: In Section 2, we outline the data sources and develop the proposed method. Section 3 provides an overview of the results, including a description of the identified systems and their sustainability performances. In Section 4, broader considerations on the proposed method are provided. The Walloon case study results are further discussed and put in the perspective of the literature. Finally, Section 5 delivers general conclusions and recommendations.

2 Data and methods

2.1 Data

We base the assessment on FADN data to ensure representativeness of the studied region. The database comprises a wide range of farm-level indicators reflecting the diversity of practices in European farms. Farm household data was provided by the DAEA (Direction de l’Analyse Economique Agricole), the regional office in charge of collecting the data at local level and providing it to the FADN. Throughout this paper, we refer to the analyzed dataset as DAEA.

The analyzed sample covers a 4-year reference period (2014–2017). It initially included 359 observations of specialized Walloon dairy farms (corresponding to 108 different farms) and 419 observations of specialized Walloon beef farms (corresponding to 128 different farms). A farm is considered specialized and classified into a specific farm type (dairy, beef, arable, etc.) by the FADN if at least two thirds of its standard gross product (SGP) come from that particular activity. One observation corresponds to an individual farm for a given year. For some farms of our sample, there are multiple observations over the 4-year period. A two-step data cleaning process was performed. First, all non-specialized farms were excluded from the sample. For the dairy sector, all observations presenting a significant number of suckler cows (more than 10% of total cows on the farm) were excluded in order to focus the assessment on specialized dairy farms. For the beef sector, we chose to focus on breeding farms and to exclude all farms performing a fattening step (see Supplementary material for more detail). Indeed, the Belgian beef sector presents a clear distinction between breeding and fattening activities, with a clear regional specialization: Wallonia tends to focus on the breeding stage while the fattening of calves and young bulls is more strongly concentrated in Flanders (Calay et al. 2020). Second, after the classification of farms into production systems (see below), farms situated below the 10th percentile in terms of farm income per family work unit were trimmed in order to exclude the majority of non-profitable farms from the sample (see Supplementary material for more detail). The final analyzed sample included 290 observations of specialized dairy farms and 216 observations of specialized beef-breeding farms.
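The two cleaning rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually applied to the DAEA data: the field names (`activity_share`, `suckler_share`, `income_per_fwu`) are hypothetical, the percentile convention (inclusive method of `statistics.quantiles`) is an assumption, and the sketch trims whatever group of observations it receives, since the exact grouping used for the 10th percentile is detailed only in the Supplementary material.

```python
from statistics import quantiles

SPECIALIZATION_SHARE = 2 / 3  # minimum share of standard gross product (from the text)
MAX_SUCKLER_SHARE = 0.10      # dairy sector: maximum share of suckler cows (from the text)


def is_specialized(activity_share):
    """True if at least two thirds of the standard gross product come from one activity."""
    return activity_share >= SPECIALIZATION_SHARE


def keep_dairy_observation(obs):
    """Step 1 (dairy): keep specialized farms with at most 10% suckler cows."""
    return (is_specialized(obs["activity_share"])
            and obs["suckler_share"] <= MAX_SUCKLER_SHARE)


def trim_low_income(observations):
    """Step 2: drop observations below the 10th percentile of income per family work unit."""
    incomes = [obs["income_per_fwu"] for obs in observations]
    # First decile cut point; the interpolation method is an assumption.
    cutoff = quantiles(incomes, n=10, method="inclusive")[0]
    return [obs for obs in observations if obs["income_per_fwu"] >= cutoff]
```

Because the trimming is applied after the classification step, one plausible use is to call `trim_low_income` once per production system rather than once per sector.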

2.2 Cost-effective method for production system classification and comprehensive sustainability assessments

The cost-effective method we propose for a multidimensional sustainability assessment accounting for the diversity of practices follows three main steps (see Figure S2 in the Supplementary material): (1) a classification step, which groups farms in typologies of production systems; (2) an indicator selection step, which consists in constructing a comprehensive set of structural, environmental, and socio-economic indicators, based on both core DAEA and calculated data; and (3) an analysis step, which consists in assessing and benchmarking the identified systems through the multidimensional set of indicators.

2.2.1 Step 1: classification

For each sector, three classification criteria were used to cluster similar farms into production systems (see Figure S3 in the Supplementary material). Two classification criteria are common to the dairy and beef-breeding sectors (share of pasture and stocking rate), while the third criterion is specific to each sector (herd size for the dairy sector and breed for the beef-breeding sector). The classification criteria were selected based on their capability to reflect important differences in farming structures and entail environmental benefits. The share of pasture can be associated with positive effects on the environmental impacts of dairy farms, such as biodiversity, global warming potential, acidification, and energy use (Guerci et al. 2013). Stocking rate, among other farming characteristics, can be negatively related to the environmental impact of dairy farms (Bava et al. 2014). Herd size was selected as a classification criterion based on the concentration of the dairy sector which has occurred over the last decades, leading to differences in strategies and practices between smaller and bigger farms (Lebacq 2015). Finally, we included a criterion related to breed given the historical importance of the highly specialized Belgian Blue breed in the Belgian beef sector (Stassart and Jamar 2008; Calay et al. 2020).

The percentage of pasture in the forage area distinguishes grass-based farms from diversified farms (in terms of forage, i.e., with a significant share of arable forage crops). We use a threshold corresponding to the sample median (92% in the case of dairy farms and 89% in the case of beef-breeding farms).

The stocking rate was used as a proxy for the intensification level of farms (the intensification level constitutes a more complex phenomenon resulting from a series of farm management practices and could also be measured by looking at the productivity level per unit of labor or per animal). A threshold of 1.8 livestock units (LSU) per hectare of on-farm forage area was used to separate extensive farms from intensive farms, in line with the Walloon Agri-Environment-Climate Measure, which aims at developing forage self-sufficiency (Natagriwal n.d.).

For the dairy sector, a distinction between small-scale and large-scale farms was made based on the herd size. The sample median (69 dairy cows) was used as a threshold.

For the beef sector, farms are classified into two possible groups of breeds: the Belgian Blue breed or French breeds (such as Limousin and Blonde d’Aquitaine). Farms for which the share of the dominant breed was less than 50% (68 observations) were excluded from the sample as they were considered as mixed (Other breeds).
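The classification rules of Step 1 can be sketched as a few threshold comparisons. The thresholds are those reported above; the function names, the string labels, and the handling of values exactly equal to a threshold are assumptions made for illustration only.

```python
STOCKING_RATE_THRESHOLD = 1.8                   # LSU per ha of on-farm forage area
PASTURE_MEDIAN = {"dairy": 0.92, "beef": 0.89}  # sample medians of the pasture share
DAIRY_HERD_MEDIAN = 69                          # dairy cows (sample median)
MIN_DOMINANT_BREED_SHARE = 0.50                 # below this, the herd counts as mixed


def forage_strategy(sector, pasture_share):
    """Grass-based vs. diversified; ties assigned to grass-based (assumption)."""
    return "grass-based" if pasture_share >= PASTURE_MEDIAN[sector] else "diversified"


def intensity(stocking_rate):
    """Intensive vs. extensive; ties assigned to extensive (assumption)."""
    return "intensive" if stocking_rate > STOCKING_RATE_THRESHOLD else "extensive"


def classify_dairy(pasture_share, stocking_rate, dairy_cows):
    """Return (scale, forage strategy, intensity) for a dairy observation."""
    scale = "large-scale" if dairy_cows > DAIRY_HERD_MEDIAN else "small-scale"
    return (scale, forage_strategy("dairy", pasture_share), intensity(stocking_rate))


def classify_beef(pasture_share, stocking_rate, breed_shares):
    """Return (breed group, forage strategy, intensity), or None for mixed herds.

    breed_shares maps the two breed groups ("Belgian Blue", "French breeds")
    to their share of the herd; the shares need not sum to 1.
    """
    breed, share = max(breed_shares.items(), key=lambda item: item[1])
    if share < MIN_DOMINANT_BREED_SHARE:
        return None  # mixed herd, excluded from the sample
    return (breed, forage_strategy("beef", pasture_share), intensity(stocking_rate))
```

With two levels per criterion, three criteria yield at most 2 × 2 × 2 = 8 combinations per sector.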

2.2.2 Step 2: indicator selection

We use a set of structural, environmental, and socio-economic indicators (summarized in Table 1) to carry out the sustainability assessment. The choice of the indicators was based on the research objectives, i.e., comparing a diversity of dairy and beef-breeding production systems in terms of their socio-economic and environmental performances. We aligned our indicators to the three criteria of indicator selection identified by Lebacq et al. (2013): parsimony (non-redundancy of indicators), consistency (necessary indicators for the interpretations), and sufficiency (the indicators are sufficient to cover the three dimensions of sustainability). The structural and socio-economic dimensions were mainly analyzed through core data readily available in the DAEA dataset. For the environmental dimension, additional calculations were necessary.

Table 1 Set of indicators used to perform a sustainability assessment of bovine production systems in Wallonia. The DAEA is the regional office in charge of collecting FADN data in Wallonia. All environmental indicators were expressed per hectare, per liter of milk (dairy systems), or per suckler cow and progeny (beef systems). Soy consumption was only expressed per cow and progeny. C(&P) cow (& progeny), DC dairy cow, SC suckler cow, cc concentrates, (F)WU (family) work unit, a.i. active ingredient, N nitrogen, DS damage score.

Structural indicators

Besides the indicators used for the classification step (percentage of grassland; stocking rate; herd size and beef breed), a series of additional structural indicators were used to analyze the dairy and beef-breeding systems: land use (on-farm and off-farm areas of crops which are dedicated to the bovine herds; see Supplementary material for included crops); share of forage maize in on-farm forage area; total (on-farm and bought) annual consumption of concentrates (expressed per cow and progeny); self-sufficiency of concentrates (share of on-farm concentrates on total concentrates); dairy yields in the case of dairy farms (annual production of milk per dairy cow); herd size in the case of beef farms (number of suckler cows per farm). All these data were readily available in the original DAEA dataset. Some additional calculations and hypotheses were nevertheless needed to calculate the land use (see Supplementary material), which as such is considered as a “calculated” indicator (Table 1).

Socio-economic indicators

Social sustainability was assessed using two indicators readily available in the dataset (Table 1): level of workforce, expressed in work units (one work unit corresponds to an annual working time of 1800 h), and level of workload, expressed in number of cows per work unit. The number of cows per work unit merely gives an indication of the workload level (e.g., farms with milking robots might have more cows per work unit without necessarily enduring a greater workload). Nevertheless, in the absence of more accurate data, this was considered a satisfactory proxy, as a greater number of cows entails more work for certain tasks (feeding, calving, etc.). Ideally, additional indicators related to aspects such as education, quality of life, multifunctionality, or animal welfare would be included in such assessments.

The analysis of the economic dimension relies on one main indicator: farm income. This indicator is based on the cost and product structure of a farm. It corresponds to the difference between total farm products (including milk products, beef products, other products, and subsidies) and total costs (operational, structural, and financial). This indicator thus also includes farm income generated by other activities than dairy or beef products. Farm income was mainly expressed in euros per family work unit (FWU), but was also analyzed per working hour, per liter of milk (dairy systems), and per suckler cow and progeny (beef systems). Two additional indicators are derived from the cost and product structure: the share of subsidies (on total products) and the economic efficiency (ratio between the gross margin and the total products without subsidies; see Supplementary material for more detail). These economic data were readily available in the original DAEA dataset (Table 1).
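The socio-economic indicators described above reduce to simple arithmetic, sketched below. The dictionary keys and function names are hypothetical; the gross margin is taken as an input because its exact definition is given only in the Supplementary material.

```python
HOURS_PER_WORK_UNIT = 1800  # annual working time of one work unit (from the text)


def work_units(annual_hours):
    """Workforce expressed in work units."""
    return annual_hours / HOURS_PER_WORK_UNIT


def workload(cows, total_work_units):
    """Workload proxy: number of cows per work unit."""
    return cows / total_work_units


def farm_income(products, costs):
    """Total products (subsidies included) minus total costs, in euros."""
    return sum(products.values()) - sum(costs.values())


def subsidy_share(products):
    """Share of subsidies in total products."""
    return products["subsidies"] / sum(products.values())


def economic_efficiency(gross_margin, products):
    """Gross margin divided by total products excluding subsidies."""
    return gross_margin / (sum(products.values()) - products["subsidies"])
```

Farm income per family work unit is then `farm_income(products, costs)` divided by the number of family work units, and the per-liter or per-cow variants follow the same pattern with a different denominator.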

Environmental indicators

Five environmental indicators were assessed (Table 1). Unlike the other dimensions, none of the environmental indicators were directly available in the dataset. They were therefore calculated on the basis of available data and emission factors provided in the literature (see Supplementary material for detailed methodologies).

Habitat degradation was assessed through the consumption of soy. Soy was considered to be bought by farms. As per ERM and UGent (2011), it was estimated that 22% of bought concentrates corresponded to soy in the case of dairy farms and 5% in the case of beef-breeding farms. The pollution of water and soil resources was assessed through two indicators: pesticide use and nutrient management. The use of pesticides associated with the production of feed ingredients was estimated based on the land use of farms and the average pesticide use of associated crops in Wallonia (Comité Régional Phyto 2015, 2017) (see Table S1 in the Supplementary material). Nutrient management was assessed through nitrogen emissions, which were estimated based on fixed nitrogen emission factors per animal category (VMM et al. 2020) (see Table S2 in the Supplementary material). The impact on biodiversity was assessed through the Damage Score indicator, which estimates the impact of different management practices (intensive — less intensive — organic) on different land uses (arable land — fertile grassland) through impact factors established by De Schryver et al. (2010) (see Table S3 in the Supplementary material). A higher Damage Score value represents a greater negative impact on biodiversity. Finally, climate change was assessed through the greenhouse gas (GHG) emissions associated with the dairy and beef productions. Unlike previous environmental indicators, which were assessed at farm level, GHG emissions could not be assessed specifically for each farm of the analyzed samples. Estimations were made based on the results of carbon footprints calculated for similar bovine typologies in Wallonia (Petel et al. 2018a, b; Riera et al. 2019).
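Two of these calculations can be sketched directly from the description above. The soy shares are those reported in the text; the unit of concentrates (kg), the animal category names, and any emission factor values passed in by the caller are illustrative assumptions, as the actual factors are listed in the Supplementary material.

```python
# Share of soy in bought concentrates, as reported in the text (ERM and UGent 2011).
SOY_SHARE_IN_BOUGHT_CONCENTRATES = {"dairy": 0.22, "beef": 0.05}


def soy_consumption_per_cow(sector, bought_concentrates_kg, cows):
    """Estimated soy consumption per cow (and progeny), in kg."""
    soy_kg = SOY_SHARE_IN_BOUGHT_CONCENTRATES[sector] * bought_concentrates_kg
    return soy_kg / cows


def nitrogen_emissions(herd, emission_factors):
    """Farm-level nitrogen emissions from fixed factors per animal category.

    herd maps an animal category to a head count; emission_factors maps the
    same categories to an emission factor per head (values supplied by the caller).
    """
    return sum(heads * emission_factors[category] for category, heads in herd.items())
```

For instance, a dairy farm buying 100,000 kg of concentrates for 50 cows would be attributed about 440 kg of soy per cow and progeny under the 22% assumption.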

As recommended in the literature (Lebacq et al. 2013), both area-based (per hectare) and output-based (per liter of milk in the case of dairy systems, and per suckler cow and progeny in the case of beef systems) functional units were used for four indicators: pesticide use, nitrogen emissions, biodiversity impact, and greenhouse gas emissions. In this way, results favor neither highly productive systems nor extensive systems. For beef-breeding farms, the number of animals per farm was used as a proxy for productivity as the available data do not provide a satisfactory indicator. For the consumption of soy, only one functional unit was considered: per animal. As a result, the environmental dimension was assessed through nine indicators in each sector.
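The effect of the functional unit can be illustrated with a short sketch. The function name is hypothetical, and the two farms used in the usage note below are invented numbers chosen only to show how the ranking can flip between units.

```python
AREA_AND_OUTPUT_INDICATORS = 4  # pesticides, nitrogen, biodiversity, GHG
SINGLE_UNIT_INDICATORS = 1      # soy consumption (per animal only)

# Four indicators in two functional units plus soy gives nine indicators.
N_ENVIRONMENTAL_INDICATORS = AREA_AND_OUTPUT_INDICATORS * 2 + SINGLE_UNIT_INDICATORS


def express_per_unit(total_impact, area_ha, output):
    """Express a farm-level impact per hectare and per unit of output.

    output is liters of milk for dairy systems, or the number of suckler
    cows (and progeny) for beef-breeding systems.
    """
    return {"per_ha": total_impact / area_ha, "per_output": total_impact / output}
```

With hypothetical values, an intensive farm emitting 6000 kg N on 40 ha for 600,000 L of milk scores 150 kg N/ha and 0.01 kg N/L, whereas an extensive farm emitting 4500 kg N on 50 ha for 300,000 L scores 90 kg N/ha and 0.015 kg N/L: the first looks worse per hectare and better per liter, which is why both units are reported.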

Finally, as suggested by Bockstaller et al. (2008), the individual indicators were complemented with an aggregated environmental indicator based on the relative performance of each system against the performances of the entire dataset. For each environmental indicator, every system received a score ranging between one and four depending on the corresponding quartile of the average score of that system. Summing the scores for all nine indicators provided an environmental impact score (ranging between 9 and 36) which allowed comparing and classifying systems based on their environmental performances. A similar approach was adopted by Bijttebier et al. (2017).
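One possible reading of this aggregation is sketched below. It assumes that the quartile cut points are computed over all observations of the dataset for a given indicator, that the mean of each system is positioned against those cut points, and that higher values always denote a greater impact; the interpolation method and the function names are likewise assumptions.

```python
from statistics import mean, quantiles


def quartile_score(system_values, all_values):
    """Score from 1 (lowest impact quartile) to 4 (highest) for one indicator."""
    cutoffs = quantiles(all_values, n=4, method="inclusive")  # Q1, median, Q3
    system_mean = mean(system_values)
    # One point, plus one for every cut point the system mean exceeds.
    return 1 + sum(system_mean > cutoff for cutoff in cutoffs)


def environmental_impact_score(system_data, dataset):
    """Sum of quartile scores over all indicators.

    system_data and dataset both map an indicator name to a list of values:
    the observations of one system, and those of the entire dataset.
    """
    return sum(quartile_score(system_data[name], dataset[name]) for name in dataset)
```

With nine indicators, a system in the lowest quartile everywhere obtains 9 and a system in the highest quartile everywhere obtains 36, which matches the range given above.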

2.2.3 Step 3: multidimensional analysis

The combination of structural, socio-economic, and environmental indicators outlined above allows for a comprehensive and multidimensional assessment of the identified dairy and beef-breeding systems. Apart from analyzing each indicator individually, the farm income (as it is one of the main economic indicators at farm level) and the environmental impact score (as it provides an overview of the environmental impact across several themes) were used as the two main indicators to perform a combined assessment and benchmarking of the global sustainability performance of the identified systems.

3 Results

3.1 Description of identified production systems

3.1.1 Dairy systems

Eight dairy systems were identified as a result of the classification step (Table 2). They are divided into large-scale systems (D1–D4) and small-scale systems (D5–D8). These two groups are in turn subdivided into either grass-based or diversified systems, which can be intensive or extensive.

Table 2 Summary statistics (mean ± standard deviation) of structural, socio-economic, and environmental indicators for eight dairy systems in Wallonia. Within rows, different superscript letters indicate significantly different means between systems at p<0.05, or p<0.1 for indicators marked with an *. DC(&P) dairy cow (& progeny), cc concentrates, (F)WU (family) work unit, a.i. active ingredient, N nitrogen, DS Damage Score.

Extensive systems present higher land use values than intensive systems. Diversified systems, and particularly the two intensive diversified systems (D2 and D6), present higher shares of forage maize, which is almost absent in grass-based systems. The diversified system D5 relies on forage crops other than maize (e.g., alfalfa) to pursue its diversification. The different strategies in terms of land use appear clearly in Figure 2a. The use of concentrates is systematically higher in intensive systems than in their extensive counterparts. Furthermore, large-scale systems (D1–D4) tend to present higher concentrate consumption levels than small-scale systems (except D8, which presents the highest concentrate use). The two small-scale extensive systems (D5 and D7) present the lowest use of concentrates. The self-sufficiency of concentrates is higher in diversified systems than in grass-based systems, where it is close to zero. The small-scale diversified extensive system (D5) presents the lowest overall use of concentrates and the highest self-sufficiency of concentrates. In terms of milk yields, large-scale systems tend to present higher production levels than small-scale systems. Furthermore, intensive systems tend to present higher yields than extensive systems. The large-scale diversified intensive system (D2) has the highest milk yield whereas the small-scale diversified extensive system (D5) has the lowest yield.

Figure 2

Average land use (ha per cow and progeny) of eight dairy systems (a) and six beef-breeding systems (b) in Wallonia. The size of the squares is proportional to the average land use of each system. DC&P: dairy cow and progeny; SC&P: suckler cow and progeny.

3.1.2 Beef-breeding systems

Six beef-breeding systems were identified as a result of the classification step (Table 3). Based on the main breed, they are divided into Belgian Blue systems (B1–B4) and French breed systems (B5 and B6). These two groups are in turn subdivided into either grass-based or diversified systems, which can be intensive or extensive. Intensive French breed systems were not analyzed as these only included six observations.

Table 3 Summary statistics (mean ± standard deviation) of structural, socio-economic, and environmental indicators for six beef-breeding systems in Wallonia. Within rows, different superscript letters indicate significantly different means between systems at p<0.05, or p<0.1 for indicators marked with an *. SC(&P) suckler cow (& progeny), cc concentrates, (F)WU (family) work unit, a.i. active ingredient, N nitrogen, DS Damage Score.

Extensive systems, and in particular those working with French breeds (B5 and B6), present higher land use values than intensive systems working with the Belgian Blue breed. Within extensive Belgian Blue systems, the grass-based system (B3) occupies more land than the diversified one (B1). The share of forage maize is highest in the Belgian Blue diversified intensive system (B2). Unlike the Belgian Blue diversified systems (B1 and B2), the French breed diversified system (B5) relies on forage crops other than maize (e.g., alfalfa) to pursue its diversification (similarly to the dairy system D5). The different strategies in terms of land use appear clearly in Figure 2b. The use of concentrates is higher in the intensive and/or diversified Belgian Blue systems (B1, B2, and B4). The extensive grass-based Belgian Blue system (B3) and the two French breed systems (B5 and B6) present lower concentrate use. The self-sufficiency of concentrates is significantly higher for the three diversified systems (B1, B2, and B5) than for the three grass-based systems (B3, B4, and B6). Compared to dairy systems, beef-breeding systems present lower concentrate consumption and higher concentrate self-sufficiency. As no specific productivity indicator was available, the output levels of the systems were estimated through their herd size. The two grass-based extensive systems (both Belgian Blue and French breeds; B3 and B6) present the smallest herd sizes whereas the Belgian Blue diversified intensive (B2) and the French breed diversified extensive (B5) systems present the largest herd sizes.

3.2 Sustainability assessment of identified production systems

3.2.1 Socio-economic sustainability

Dairy systems (Table 2)

In large-scale systems, the two intensive systems (D2 and D4) present lower workforce levels and higher workloads compared to the two extensive systems (D1 and D3). In small-scale systems, the workload is lower than in large-scale systems. Small-scale grass-based systems (both intensive and extensive; D7 and D8) present the lowest workforce levels (which are not necessarily associated with highest workloads).

The average farm income across all systems is 27,424 €/FWU, i.e., 10.2 €/family working hour or 0.11 €/L milk. In all systems, intra-system variability is very high for farm income (high standard deviations). Only the two small-scale diversified systems (D5 and D6) present statistically significantly lower farm income levels than the six other systems. These similar farm income levels hide very different product and cost structures, as illustrated in Figure 3.

Figure 3

Product and cost structure and resulting farm income (€/FWU) of eight dairy systems (a) and six beef-breeding systems (b) in Wallonia. FWU: family work unit; Ext: extensive; Int: intensive.

The share of subsidies is particularly high for small-scale extensive systems (D5 and D7), which is where the majority of organic farms from the sample are found. In terms of economic efficiency, small-scale extensive and/or grass-based systems (D5, D7, and D8), as well as the large-scale grass-based extensive system (D3), perform better than the other systems (D1, D2, D4, and D6).

Beef-breeding systems (Table 3)

The Belgian Blue grass-based extensive system (B3) presents the lowest workforce and workload levels whereas Belgian Blue diversified intensive (B2) presents the highest workforce and workload levels. The other systems present intermediate situations.

The average farm income for beef-breeding farms across all systems is 9057 €/FWU, i.e., 3.4 €/family working hour, which is significantly lower than for dairy farms. Here too, intra-group variability is very high, resulting in an absence of statistically significant differences between group means. As for the dairy sector, Figure 3 illustrates that beef-breeding systems present different product and cost structures despite similar farm income levels.

The share of subsidies is extremely high (over 40% of total products) for the two French breed systems (B5 and B6, which are composed almost exclusively of organic farms) as well as for the extensive grass-based Belgian Blue system (B3). It is lowest (around 20%) for the diversified intensive Belgian Blue system (B2). In general, the share of subsidies is significantly higher for beef-breeding farms than for dairy farms (around 10–20% of total products). Finally, in terms of economic efficiency, the same groups appear as for the share of subsidies (and for the farm income per suckler cow): the two French breed systems and the extensive grass-based Belgian Blue system (B3, B5, and B6) perform better than the remaining three Belgian Blue systems (B1, B2, and B4).

3.2.2 Environmental sustainability

Dairy systems (Table 2)

Regarding soy consumption, the intensive systems present the highest consumption levels whereas the small-scale extensive systems (D5 and D7) present the lowest values.

Regarding pesticides, results show that grass-based systems, and in particular the extensive ones (D3 and D7), use lower amounts of pesticides, both per hectare and per liter of milk. This is partly because these systems include organic farms (for which zero pesticide use is assumed), although the trend holds true when organic farms are excluded from the sample.

Regarding nitrogen emissions, small-scale extensive systems (D5 and D7) present the lowest emission levels when results are expressed per hectare. On the contrary, the more productive systems (in particular D2, D3, and D4) present the lowest emission levels when results are expressed per liter of milk.

Regarding biodiversity, small-scale extensive systems (both diversified and grass based; D5 and D7) present the lowest impact levels per hectare across all systems. They benefit from the presence of organic farms in their groups which present lower impact scores. Per unit of output, the more productive systems (in particular D2 and D1) present lower impact levels compared to less productive and, in general, more extensive systems.

Regarding GHG emissions, grass-based systems (in particular the extensive ones; D3 and D7) present lower emission levels when results are expressed per hectare whereas intensive systems (in particular the diversified ones; D2 and D6) present lower emission levels when results are expressed per unit of output.

Overall, when aggregating all nine environmental indicators into an environmental impact score, the two extensive grass-based systems (D3 and D7) present the lowest environmental impacts, followed by the small-scale diversified extensive system (D5). Diversified intensive systems (D2 and D6) present the highest overall impacts. The remaining systems (D1, D4, and D8) present intermediate situations.

Beef-breeding systems (Table 3)

Regarding soy consumption, the two French breed systems (B5 and B6) show much lower soy consumptions compared to the two intensive Belgian Blue systems (B2 and B4), which present the highest values of soy consumption. The two extensive Belgian Blue systems (B1 and B3) present intermediate situations (Table 3). In general, beef-breeding systems present much lower soy and concentrate consumptions than dairy systems.

Regarding pesticides, the two French breed systems (B5 and B6) present the lowest values of pesticide use, both per hectare and per animal. This can be explained by the fact that these systems are composed almost exclusively of organic farms, for which pesticide use was assumed to be zero. Within Belgian Blue systems, the extensive grass-based system (B3) presents the lowest value.

Regarding nitrogen emissions, when results are expressed per hectare, extensive systems, and in particular grass-based systems (B3 and B6), lead to lower emission levels. On the contrary, intensive Belgian Blue systems (B2 and B4) lead to higher emissions, almost twice as high. Analyzing the results per suckler cow is not particularly relevant given that the nitrogen emission factor per animal was considered the same across all farms and systems.

Regarding biodiversity, the two French breed systems (B5 and B6), and in particular the diversified one (B5), present the lowest impact levels, both per hectare and per animal. Within Belgian Blue systems, the two intensive systems (B2 and B4) present lower impact levels when results are expressed per animal whereas the two extensive systems (B1 and B3), and in particular the grass-based one (B3), tend to present lower impact levels when results are expressed per hectare.

Regarding GHG emissions, grass-based systems (B3, B4, and B6), and particularly the extensive ones (B3 and B6), as well as the French breed diversified system (B5) present the lowest impact level. The Belgian Blue diversified intensive system (B2) presents the highest impact level, both per hectare and per animal.

Overall, when aggregating all environmental indicators, the two French breed systems (B5 and B6) present the lowest environmental impact score, followed by the Belgian Blue grass-based extensive system (B3). On the contrary, the Belgian Blue diversified intensive system (B2) presents the highest environmental impact score. The two remaining Belgian Blue systems (diversified extensive and grass-based intensive; B1 and B4) present intermediate situations.

3.2.3 Combined results: multidimensional sustainability

A combined assessment of the socio-economic and environmental performances of the dairy and beef-breeding sectors is based on the farm income and the environmental impact score of the different farms and systems (Figure 4). Farms and systems should aim for the top-left corner of the figure as this is where lower environmental impacts meet higher farm incomes. In both sectors, there are examples of systems reaching this goal.

Figure 4

Combined economic and environmental performances of bovine systems in Wallonia: dairy sample observations (a); beef-breeding sample observations (b); dairy systems (c); beef-breeding systems (d). Greenhouse gas emissions were estimated at the production system level and were considered similar for all farms within a system. Red crosses indicate sample averages. FWU: family work unit. The top left corner indicates the best performance of economic and environmental results (high farm income and low environmental impact score).
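As a rough illustration of this benchmarking, the sketch below labels each system by its position relative to the sample averages of farm income and environmental impact score. The quadrant rule is an assumption made for illustration (the assessment in this paper relies on a visual reading of Figure 4), and the system names and values are hypothetical.

```python
from statistics import mean


def classify(systems):
    """Label each system by its position relative to the sample averages."""
    avg_income = mean([s["income"] for s in systems.values()])
    avg_impact = mean([s["impact"] for s in systems.values()])
    labels = {}
    for name, s in systems.items():
        high_income = s["income"] >= avg_income
        low_impact = s["impact"] <= avg_impact
        if high_income and low_impact:
            labels[name] = "top-left"  # high income, low impact
        elif high_income:
            labels[name] = "economic strength only"
        elif low_impact:
            labels[name] = "environmental strength only"
        else:
            labels[name] = "weak on both"
    return labels


# Hypothetical systems: farm income (EUR/FWU) and environmental impact score.
systems = {
    "A": {"income": 30000, "impact": 15},
    "B": {"income": 35000, "impact": 30},
    "C": {"income": 10000, "impact": 16},
    "D": {"income": 12000, "impact": 31},
}
labels = classify(systems)
print(labels)
```

A threshold rule like this is cruder than reading distances on the figure, but it makes the two dimensions of the comparison explicit.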

Within dairy systems, two ways toward the top-left corner can be identified: an economic way and an environmental way. The former is composed of the four intensive systems (D2, D4, D6, and D8) as well as the large-scale diversified extensive system (D1). The latter is composed of the two grass-based extensive systems (D3 and D7) as well as the small-scale diversified extensive system (D5). Five systems can be considered as close to the top-left corner: D1, D4, and D8 have followed the economic way whereas D3 and D7 have followed the environmental way. The remaining three systems are further away from the top-left corner and present either poor economic performances (D5), poor environmental performances (D2), or both (D6).

Within beef systems, three systems can be considered as close to the top-left corner: the two French breed systems (B5 and B6) as well as the extensive grass-based Belgian Blue system (B3). The remaining three Belgian Blue systems are further away, either in terms of environmental performances (B2) or both environmental and economic performances (B1 and B4).

4 Discussion

4.1 Considerations on diversity assessments and implications for the sustainability of dairy and beef-breeding sectors

The first step, and key assumption of our method, was that a prior assessment of the diversity of systems and practices is necessary to enhance the relevance of sustainability assessments. A diversity of dairy and beef-breeding production systems coexist in Wallonia, showcasing different practices and strategies to pursue production and sustainability principles, as has recently been shown for Flanders (Tessier et al. 2021). Our typologies result from a representative sample of Walloon bovine farms and a set of qualitative criteria which are embedded in the local context. For instance, grouping farms based on the share of on-farm pasture results from the relative importance of grasslands in Wallonia (SPW 2020). Similarly, accounting for the breed was considered necessary given the historical importance of the highly specialized Belgian Blue breed in the Belgian beef sector (Stassart and Jamar 2008; Calay et al. 2020). Six main production systems were identified for the beef-breeding sector while eight systems were identified for the dairy sector. Our typology of the dairy sector is in line with the one proposed by Lebacq (2015).

The usefulness of the diversity assessment becomes evident when analyzing the combined economic and environmental performances of the dairy and beef-breeding farms. Drawing conclusions on the environmental and socio-economic sustainability of both sectors is difficult when the diversity of production systems is not taken into account and only an undifferentiated set of farms is considered (top of Figure 4; sub-figures a and b). In contrast, the analysis gains in clarity and relevance when done through the lens of the identified production systems (bottom of Figure 4; sub-figures c and d), making it easier to grasp the sustainability challenges at stake within the dairy and beef-breeding sectors. It is necessary to identify the specific practices or biophysical features which make farms more or less environmentally efficient (Lynch et al. 2018). Our diversity assessment approach is a first step in this direction as it shows that extensive grass-based systems present better combined results.

In terms of economic sustainability, our results have confirmed that the Walloon dairy and beef sectors face important challenges. The situation is particularly dire for beef farms. With structurally low farm incomes and a high dependence on subsidies, the economic viability of this sector can be called into question, as noted by Calay et al. (2020) and SPW (2020), and confirmed by Duluins et al. (2022). Average farm income values did not show significant intergroup differences, but they hide different strategies to secure farm income, resulting from different product and cost structures (Figure 3). On one side, more large-scale, intensive, and generally maize-based systems (e.g., dairy systems D2 and D4 and beef-breeding systems B2 and B5) tend to aim for a product- and productivity-maximization strategy, which allows them to compensate for higher costs. On the other side, more extensive grass-based systems (e.g., dairy systems D3 and D7 and beef-breeding systems B3 and B6) tend to aim for a cost-reducing strategy, to compensate for lower output levels. Fostering the economic viability of bovine farms is thus a necessity, for example, through the implementation of fair prices and fair relationships in value chains, or adequate policy instruments rewarding systems with better environmental performances.

Regarding the environmental sustainability, more extensive and grass-based systems present the lowest environmental impacts as opposed to more intensive and diversified systems which rely more importantly on forage maize. This confirms the higher impacts attributed to forage maize in comparison to grasslands (Peeters 2009; Lebacq 2015) and the potential of grass-based systems in terms of environmental conservation (Meul et al. 2012; Reinsch et al. 2021). Yet, the freedom to engage in practices and production systems (e.g., implementing extensive, grass-based practices) is not always guaranteed as farmers might be constrained by external factors such as pedo-climatic conditions or access to land (Lebacq 2015; Lynch et al. 2018).

In terms of combined environmental and socio-economic performances, environmentally friendly systems (mainly grass-based and extensive) present a better compromise between environmental impact score and farm income (bottom of Figure 4). Although we cannot assert that environmentally friendly systems perform better economically than the more environmentally harmful systems, we can conclude that the environmentally friendly systems are not burdened by poorer economic results, as also concluded by Duluins et al. (2022). This is crucial as farmers need to perceive that taking up sustainable practices does not imply any economic disadvantage (Lynch et al. 2018). Our results show that there is not necessarily a trade-off between environmental and economic performances.

4.2 Considerations on data-driven indicator-based sustainability assessments

The second step and objective of this paper was to adopt a method allowing for comprehensive sustainability assessments based on farm accountancy data and to overcome the dilemmas of indicator-based sustainability assessments. For our case study, the available DAEA data mainly included structural and socio-economic indicators. We were able to complement this core data and to produce relevant environmental indicators. Two options can be pursued to collect additional data for a comprehensive assessment: complementary on-farm enquiries and measurements, or the modeling of additional data with the help of some assumptions and estimations, as applied in our method.

In comparison with the collection of on-farm information, the main advantage of our method resides in its cost-effectiveness for a large-scale implementation. As suggested by Lebacq et al. (2013), in the context of indicator-based sustainability assessments, data should be collected at a reasonable cost. From this perspective, the modeling approach constitutes an interesting alternative to the collection of on-farm information as the latter may be costly and time consuming, as well as representing a potential challenge in terms of ensuring that a representative number of sample farms are surveyed (e.g., it might not be possible to measure GHG emissions for all FADN sample farms) (Lynch et al. 2018). On the downside, modeled data will always remain estimations, which are by definition less accurate than real measurements. Furthermore, the complexity of mechanistic models may in some cases limit their use (Halberg et al. 2005; Bockstaller et al. 2008), and the relevance of modeling is highly dependent on the quality of the models it relies on. Yet, a more systematic use of modeling within assessment processes may nurture a virtuous loop: the need for more accurate models will reinforce the motivation of modelers to develop research on models. Increased interactions between modeling specialists and assessment actors should lead to a better understanding of respective expectations. Moreover, data collected in the assessment may help to better fit models and check their relevance in real-life situations.

Choosing the modeling approach also implies considerations on the cost of the additional calculations. In the case of farm accountancy databases, environmental indicators that are readily available (e.g., pesticide costs) present low environmental relevance (third dilemma of sustainability assessments). Modeling complex impact-based indicators (e.g., pesticide concentrations in soil) increases the accuracy of the assessment, but also its cost (Lebacq et al. 2013). Hence, for the sake of cost-effectiveness, we argue there is an optimum to find when calculating additional indicators. As illustrated in Figure 5, intermediate indicators (e.g., quantities of pesticides) might be more suitable, as they allow a gain in relevance while being relatively easy to estimate (i.e., with a limited cost). This does not question the usefulness of highly specific models, which focus on one particular environmental issue and seek to model it without having to rely on costly primary measurements. Rather, it suggests a complementary approach, which can be useful to overcome potential time and budget constraints while taking advantage of the structure of farm accountancy databases.

Figure 5

Relation between the relevance and measurability of environmental indicators. Hatched areas represent available data (i.e., core dataset data, or measurements); white areas represent unavailable data (i.e., calculated data, or approximations). General situation (a): environmental indicators which are available in farm accountancy databases have a low relevance, but it is possible to calculate additional indicators to improve the accuracy of the assessment. Impact of data availability (b): for the sake of cost-effectiveness, it might be more pertinent to rely on intermediate indicators which are easy to estimate rather than on complex impact-based indicators.

As long as FADN and other farm accountancy databases remain focused on the economic dimension, our approach provides a hybrid cost-effective way to perform comprehensive sustainability assessments. In the context of its Farm to Fork strategy, the European Commission plans to transform the Farm Accountancy Data Network (FADN) into a Farm Sustainability Data Network (FSDN). Broadening the scope of FADN by giving greater attention to social and environmental themes would help meet the evolving information needs of a variety of actors (farmers, policymakers, consumers, etc.) (Kelly et al. 2018; Lynch et al. 2018). Yet, this transformation represents a significant task that will require supplementing the existing dataset with new indicators, with an estimated increase in the data collection costs of up to 40% (Vrolijk and Poppe 2021). It will be necessary to strike a balance between providing the necessary information and minimizing the additional burden for farmers and data collectors. Our method has the advantage of being FSDN-ready, as a more comprehensive set of core sustainability indicators would only increase the quality of the assessments, without having to rely on estimated indicators.

4.3 Further considerations on the representativeness and replicability of our method

In terms of representativeness, it is important to keep in mind that results are only representative of the Walloon bovine sector (and only of the breeding step in the case of the beef sector, the fattening step being excluded). Such a regional approach is highly complementary to wider-scale analyses, comparing, for instance, trends across European countries or regions, based on country or region averages (e.g., Díaz de Otálora et al. 2022). Working upwards from the farm level, regional FADN data allows us to account for specificities at the farm level and leaves more room for regional diversity (identification of eight dairy systems and six beef-breeding systems in our case or highlighting the importance of the Belgian Blue breed). These region-specific assessments might be valuable to assist in the design and implementation of adequate local policies, such as the CAP strategic plans.

Despite being of regional relevance, our approach is highly replicable as it builds on EU-wide standardized FADN data. Our methodology can thus be applied in other Member States, or with other farm data surveys, as long as it can rely on locally relevant classification criteria.

5 Conclusions

In this paper, we present an ad hoc method for performing comprehensive and multidimensional sustainability assessments based on farm accountancy data, while acknowledging the diversity of practices and production systems. Our combined approach seeks the best compromise between specificity (i.e., accounting for diversity), relevance, and cost-effectiveness. We tested this method on the Walloon dairy and beef-breeding sectors by analyzing FADN data. Five conclusions and key messages can be drawn from our results:

  1. The results of our case study confirmed that complementing sustainability assessments with diversity assessments is key to fully grasp the challenges at stake in different farming sectors.

  2. A diversity of systems coexists (large scale or small scale; grass based or diversified; intensive or extensive; etc.). Although they showcase different practices and strategies, the results prove that it is possible to overcome trade-offs between economic and environmental performances.

  3. The results show that extensive grass-based systems present the best combination of economic and environmental results. This highlights the importance of preserving grassland resources at the regional level.

  4. The results confirm that Walloon bovine farms face challenging economic situations, stressing the need to ensure their economic viability.

  5. While our method proved effective to complement FADN data, it also suggests that the planned transformation of the Farm Accountancy Data Network into a Farm Sustainability Data Network is strongly needed.

Our results should be corroborated by further evidence able to overcome the identified methodological limitations of the study (e.g., include the beef fattening step in the analyses or improve the assessment of GHG emissions). Further research on strategic ways to implement a transition to more sustainable livestock systems in Europe is also needed (e.g., via the adoption of sustainable practices in different systems and geographical regions while dealing with year-to-year economic fluctuations, or tools to foster the economic sustainability of bovine farms).