A Composite Indicator to Assess Sustainability of Agriculture in European Union Countries

Few studies have been conducted to assess agricultural sustainability in the European Union (EU), and all of them fail to provide a holistic view of sustainability in a relevant temporal horizon that could effectively support the design of policies. In this paper, a composite indicator is constructed based on the geometric aggregation of 12 basic indicators measured yearly in the period 2004–2020 (17 years) on all EU countries plus the United Kingdom, with weights determined endogenously according to the Benefit of Doubt (BoD) approach. Our composite indicator has a two-level hierarchical structure accounting for the contributions of the economic, social and environmental dimensions of sustainability. In our results, Bulgaria, Croatia, Lithuania and Poland are the countries with the strongest growth rate of sustainability, while countries reaching the 90th percentile of the score in sustainability include Austria, Czechia, Estonia, France, Germany, Hungary, Latvia, Lithuania, Slovakia and Sweden. Overall, the social and the environmental dimensions have similar levels, while the level of the economic dimension is definitely higher. Interestingly, several countries with a high level of sustainability are characterized by a decline of the economic dimension, including Austria, Finland, Italy, Latvia and Slovakia. The reliability of our composite indicator is supported by the substantial agreement of sustainability scores with subsidies attributed by the Common Agricultural Policy (CAP). Therefore, our proposal represents a valuable resource not only to monitor the progress of EU member countries towards sustainability objectives, but also to refine the scheme for the attribution of CAP subsidies in order to stimulate specific sustainable dimensions.


Introduction
Nowadays, the agricultural sector is at the forefront of the challenge of satisfying the food demand of a rapidly increasing world population. For this reason, the sustainability of agriculture has become a widespread theme among international decision makers. Specifically, the Food and Agriculture Organization (FAO) has outlined five principles of sustainable agriculture: (i) increase of productivity, employment and value addition in food systems, (ii) protection and enhancement of natural resources, (iii) improvement of livelihoods and promotion of inclusive economic growth, (iv) enhancement of the resilience of people, communities and ecosystems, (v) adaptation of governance to new challenges (Food and Agriculture Organization 2014). Also, the concept of agricultural sustainability has been integrated into the objectives of the Common Agricultural Policy (CAP) of the European Union (EU), and has found a significant place in the EU scientific research program Horizon 2020 (European Commission 2011) and in the 2030 agenda for the Sustainable Development Goals (SDGs) of the United Nations (UN General Assembly 2015). However, despite the widely acknowledged importance of sustainable agriculture for economic systems around the world, consensus on how agricultural sustainability should be defined, pursued and measured is still far from being achieved (Zhang et al. 2021).
Several different tools have been developed to assess sustainability of agriculture in a holistic view, including RISE (Response-Inducing Sustainability Evaluation, Hani et al. 2003), SAFE (Sustainability Assessment of Farming and the Environment, Van Cauwenbergh et al. 2007), IDEA (Indicateur de Durabilité des Exploitations Agricoles, Zahm et al. 2008), SEAMLESS (System for Environmental and Agricultural Modelling Linking European Science and Society, Van Ittersuma et al. 2008), SAFA (Sustainability Assessment of Food and Agriculture, Food and Agriculture Organization 2013), PG (Public Goods, Gerrard et al. 2012), and the COSA method (Committee on Sustainability Assessment 2020). In these tools, agricultural sustainability is conceptualized into three main pillars (sustainable dimensions), which are measured through sets of indicators: (i) the economic dimension, pertaining to the efficient production of goods and services, (ii) the social dimension, concerning the improvement of conditions in rural areas, and (iii) the environmental dimension, referring to the management of natural resources.
Indicators are widely used in assessment tasks because they provide a quantitative and simplified view of specific phenomena. Therefore, in principle, even the assessment of sustainability may benefit from their employment. Unfortunately, indicators involve significant difficulties in the selection and aggregation processes, which are reflected by the wide variability in the methodology across existing tools for the assessment of agricultural sustainability (De Olde et al. 2016;Chopin et al. 2021). Several guidelines for the selection of indicators have been proposed in the literature, with emphasis on the principles of parsimony, sufficiency and availability (Latruffe et al. 2016;Talukder et al. 2020). However, existing assessment tools are still a long way from converging towards a common core set of indicators, raising doubts about the achievement of standardized tools with general validity and applicability (De Olde et al. 2016).
The selection of indicators is not the only step determining the validity of assessment tasks. In fact, once indicators are selected, the underlying information should be extracted, interpreted and communicated in an easily intelligible form to policy makers. Currently, there is no consensus among existing assessment tools on whether the indicators should be aggregated or considered individually (Chopin et al. 2021). Aggregation of indicators into composite indicators (Organisation for Economic Co-Operation and Development 2008) is an appealing approach as it provides one or a few synthetic measures of sustainability that ease comparisons across different systems. However, the construction of composite indicators is subject to several arbitrary choices that may influence the final results, especially the aggregation method and the weighting scheme (Terzi et al. 2021). In order to reconcile the two approaches, some authors have suggested employing both individual and aggregated indicators, where the former are used to analyse each system and the latter to make comparisons among systems (Bockstaller et al. 2008).
Issues of agricultural sustainability may differ across the various geographical scales, i.e., farms, regions and countries. Therefore, in order to achieve a holistic view of agricultural sustainability, the integration of different geographical scales is just as important as the integration of sustainable dimensions. Many policies, management programs and assessments targeting the conservation of ecosystems and well-being fail because they do not properly address such integration (Millennium Ecosystem Assessment 2005). Also, the temporal attribute has an important role, as it makes it possible to assess not only the level, but also the trend of sustainability.
In this paper, we focus on the assessment of agricultural sustainability in the EU. In our review of the literature, we found a total of twelve studies: five conducted at farm level and seven conducted at country level. Surprisingly, all of these studies fail to provide a holistic view of agricultural sustainability in a relevant temporal horizon that could effectively support the design of policies. On the one hand, all studies conducted at farm level cover all three sustainable dimensions but rely on cross-sectional data. On the other hand, among studies conducted at country level, some cover only a subset of the sustainable dimensions, others focus on a small set of countries, and still others rely on cross-sectional data. Studies conducted at farm level have the opportunity to directly adopt existing assessment methods, especially as regards the selection of indicators, and data can be collected through direct interviews. However, the results are difficult to generalize to higher geographical scales, thus they have limited relevance to policy makers. Instead, studies conducted at country level provide more general information which is suited to international policy making, but computing the indicators suggested by existing assessment tools may be impracticable due to scarce availability, or even unavailability, of data at the national scale (see the set of indicators proposed by Talukder et al. 2020 based on the existing literature). Clearly, whatever the geographical scale, the problem of data unavailability is further emphasized when the temporal evolution is considered, which explains the small number of studies based on longitudinal data at both farm and country level.
This paper aims at filling the gap of existing empirical studies assessing sustainability of EU agriculture by achieving a holistic view in a relevant temporal horizon. A composite indicator is constructed based on the geometric aggregation of 12 basic indicators measured yearly in the period 2004-2020 (17 years) on all EU countries plus the United Kingdom, with weights determined endogenously according to the Benefit of Doubt (BoD) approach (Cherchye et al. 2007; Zhou et al. 2010; Vidoli et al. 2015). Our composite indicator has a two-level hierarchical structure accounting for the contributions of the three sustainable dimensions. Geometric aggregation allows a small degree of compensation to reflect the fact that sustainable development is achieved only when all or most individual sustainability goals are pursued, while the BoD weighting scheme, which has never been applied in the assessment of agricultural sustainability in the EU, makes it possible to infer the relative importance of each basic indicator and sustainable dimension in the achievement of sustainability without relying on subjective opinions. This paper is structured as follows. In Sect. 2, the literature on the assessment of agricultural sustainability in the EU is reviewed. In Sect. 3, the selection of indicators and the data collection process are described. In Sect. 4, the methodology employed in the construction of the composite indicator is detailed. In Sect. 5, the results are presented and discussed, including the comparison with the attribution of CAP subsidies and the analysis of sensitivity to different aggregation methods and weighting schemes. Section 6 contains concluding remarks and proposals for future work.

Literature Review
The characteristics of existing studies assessing agricultural sustainability in the European Union (EU) are briefly reviewed in Table 1. We see that studies conducted at farm level (Gómez-Limón and Sanchez-Fernandez 2010;Majewski 2013;Ryan et al. 2016;Gaviglio et al. 2017) cover all the three sustainable dimensions, but they all rely on cross-sectional data and thus they disregard the temporal evolution of sustainability. Instead, among studies conducted at country level, some cover only a subset of the sustainable dimensions (Cristache et al. 2018;Czyzewski et al. 2020), others focus on a small set of countries (Radovanović and Lior 2017;Mili and Martínez-Vega 2019), still others rely on cross-sectional data (Nowak et al. 2019;Cataldo et al. 2020). The study in Magrini (2022) is the only exception covering all the three sustainable dimensions and considering a broad set of countries longitudinally, although the assessment is focused on the growth rate of sustainability and not on its level.
The method of assessment differs across studies in Table 1, but the construction of composite indicators, employed by nine studies out of twelve, is the most common approach. In Gómez-Limón and Sanchez-Fernandez (2010), both arithmetic and geometric aggregation are considered and combined with weights based on prior judgements and on principal component analysis. In Majewski (2013), Ryan et al. (2016), and Mili and Martínez-Vega (2019), one composite is constructed for each sustainable dimension using arithmetic aggregation and uniform weights, i.e., admitting full compensation and attributing the same importance to each indicator. In Radovanović and Lior (2017), a composite is constructed using arithmetic aggregation and different weighting methods based on several scenarios. Multi-criteria decision analysis is adopted by three studies: the Agri-environmental Footprint Index (AFI, Purvis et al. 2009) is employed in Gaviglio et al. (2017) and in Dabkiene et al. (2021), while the method of similarity to the ideal solution (TOPSIS, Hwang and Yoon 1981) is applied by Nowak et al. (2019). In Cataldo et al. (2020), an innovative weighting method based on Partial Least Squares Path Modelling (PLS-PM) with second-order formative constructs is proposed. This method has the advantage of providing one weight for each indicator and each sustainable dimension, thus making the results easier to interpret, and of not requiring correlation among basic indicators. In Magrini (2022), EU countries are clustered according to common trends of sustainable objectives through group-based multivariate trajectory modelling (Nagin et al. 2018).
Arithmetic aggregation (weighted sum) and geometric aggregation (weighted product) are distinguished by the degree of compensation. Specifically, arithmetic aggregation admits full compensation, i.e., a bad performance in a basic indicator can be cancelled by a performance of the same intensity but of opposite sign in another basic indicator. On the contrary, using geometric aggregation, the compensation of a bad performance in a basic indicator requires a good performance of higher intensity in other basic indicators.
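The difference in compensation can be illustrated with a minimal numerical sketch. The two units, their values and the equal weights below are hypothetical and serve only to show the mechanism; they are not taken from the data of this study.

```python
import math

def arithmetic(values, weights):
    """Weighted sum (arithmetic aggregation)."""
    return sum(w * v for v, w in zip(values, weights))

def geometric(values, weights):
    """Weighted product (geometric aggregation); values must be positive."""
    return math.prod(v ** w for v, w in zip(values, weights))

weights = [0.5, 0.5]        # hypothetical equal weights
balanced = [1.0, 1.0]       # even performance on both indicators
unbalanced = [0.5, 1.5]     # -0.5 on one indicator, +0.5 on the other

# Under the weighted sum the two units are indistinguishable,
# under the weighted product the unbalanced unit is penalized.
print(arithmetic(balanced, weights), arithmetic(unbalanced, weights))
print(geometric(balanced, weights), geometric(unbalanced, weights))
```

In this toy case the shortfall of 0.5 is fully offset by a surplus of 0.5 under the weighted sum, whereas the weighted product of the unbalanced unit stays below that of the balanced one.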

Although arithmetic aggregation is often preferred in the assessment of agricultural sustainability in the EU (see Table 1), we believe that the underlying assumption of full compensation is undesirable because sustainable development is achieved only when all or most individual sustainability goals are pursued. In this view, we believe that geometric aggregation is more suited to the assessment of sustainability due to its low degree of compensation, even if establishing the correct degree of compensation to assume remains challenging due to the lack of consensus on how agricultural sustainability should be defined, pursued and measured (Zhang et al. 2021).
As can be noted from Table 1, existing studies assessing agricultural sustainability in the EU adopt different weighting methods. This variability reflects the existence of several different approaches without a widely accepted methodology. Essentially, weights can be set uniformly to give the same importance to each indicator and/or dimension, defined a priori with the help of experts' and stakeholders' opinions, or computed endogenously (i.e., empirically from data). A good review of weighting methods can be found in Organisation for Economic Co-Operation and Development (2008) and in Terzi et al. (2021). Uniform and a priori weighting are the most common schemes adopted by existing studies assessing sustainability of EU agriculture. Uniform weighting is easy to understand and replicate, but it cannot provide insights into the importance of indicators and may involve the risk of double weighting. The a priori definition of weights, commonly performed through multi-criteria decision analysis (see, for example, Talukder et al. 2018), represents the most transparent way to construct composite indicators, but it is potentially affected by bias due to scientific consensus or policy priorities, and it may be difficult or even impossible to generalize across different geographical regions.
Several methods to determine the weights endogenously have been proposed to avoid sources of subjectivity. For instance, principal components and factorial analysis can be exploited to determine the weights based on empirical correlations. Although weights determined in this way can be interpreted as correlations with some underlying constructs, this approach has been criticized because the importance of indicators does not necessarily depend on their covariance structure. The Benefit of Doubt (BoD) approach (Cherchye et al. 2007; Zhou et al. 2010; Vidoli et al. 2015) is an alternative weighting method based on benchmarking arguments, i.e., weights are assigned in order to maximize the overall performance of units. Therefore, a relatively good (or bad) performance of a unit in a specific indicator is taken to indicate that the unit considers the underlying objective as more (or less) important to achieve a good overall performance. BoD weighting is superior to correlation-based schemes because it does not require indicators to be correlated and is unit invariant (Cooper et al. 2000, p. 39), thus normalization of indicators is not needed. However, it may attribute excessively low or high weights to indicators, with the consequent risk of cancelling the contribution of some weak objectives. A commonly adopted solution to attenuate this drawback is the use of proportion constraints in order to bound the relative contribution of each indicator to the composite (Cherchye et al. 2007).
Although the BoD weighting scheme has not yet been adopted to assess agricultural sustainability in the EU, it has gained considerable popularity in the last two decades, as witnessed by several notable applications in a large variety of research fields, including human development (Despotis 2005), technological achievement (Cherchye et al. 2006), quality of life (Morais and Camanho 2011), internal market (Cherchye et al. 2007), competitiveness (Bowen and Moesen 2011), student evaluation (Rogge 2011), environmental performance (Zanella et al. 2011), digital access (Gaaloul and Khalfallah 2014), and health system evaluation (Lauer et al. 2004; Vidoli et al. 2015). In this view, the use of the BoD weighting scheme to assess sustainability of EU agriculture is definitely attractive.
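The benchmarking logic can be sketched on a toy problem. The code below assumes the linear (weighted sum) form of the BoD problem commonly found in the literature, in which each unit chooses non-negative weights maximizing its own score under the constraint that no unit exceeds a score of 1 with those weights. It is not the multiplicative variant with proportion constraints adopted in this paper; the three units and their values are hypothetical, and the vertex enumeration in `bod_score` only works for two indicators.

```python
from itertools import combinations

def bod_score(target, units, tol=1e-9):
    """Best weighted sum for `target` such that no unit scores above 1.

    Two-indicator case only: the optimum of a linear program lies at a
    vertex, so all pairwise intersections of boundary lines are checked.
    """
    # Each boundary line is a1*w1 + a2*w2 = b.
    lines = [(x1, x2, 1.0) for x1, x2 in units.values()]
    lines += [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # w1 = 0 and w2 = 0
    best = (0.0, (0.0, 0.0))
    for (a1, a2, b), (c1, c2, d) in combinations(lines, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < tol:
            continue                               # parallel lines
        w1 = (b * c2 - a2 * d) / det               # Cramer's rule
        w2 = (a1 * d - b * c1) / det
        feasible = w1 >= -tol and w2 >= -tol and all(
            w1 * x1 + w2 * x2 <= 1 + tol for x1, x2 in units.values())
        if feasible:
            score = w1 * units[target][0] + w2 * units[target][1]
            best = max(best, (score, (w1, w2)))
    return best

# Hypothetical units measured on two indicators.
units = {"A": (4.0, 1.0), "B": (1.0, 4.0), "C": (2.0, 2.0)}
for name in units:
    score, (w1, w2) = bod_score(name, units)
    print(name, round(score, 3), round(w1, 3), round(w2, 3))
```

In this toy example, units A and B each reach the benchmark score of 1, while unit C, which excels in neither indicator, obtains a score of about 0.8 even under its most favourable weights.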
Partially Ordered Sets (POSets) have been recently proposed as an alternative to composite indicators (Alaimo et al. 2021a, b;Fattore 2017). In essence, POSets can provide a (partial) order on the combinations of basic indicators' values, thus aggregation is avoided. Although POSets overcome several limitations of composite indicators, they are designed for ordinal basic indicators and involve a computational complexity that is exponential in the number of indicators and of their categories. Therefore, we believe that POSets are not suited to the assessment of agricultural sustainability because, according to existing assessment tools, a large number of indicators should be considered and most of them are quantitative.

Selection of Indicators and Data Collection
The selection of indicators was based on guidelines outlined in Van Cauwenbergh et al. (2007). Although these guidelines were published more than ten years ago as part of the SAFE assessment tool, they have inspired several recent assessment tools and have often been appreciated in recent critical reviews (see, for example, Latruffe et al. 2016; De Olde et al. 2016; Talukder et al. 2020). Also, SAFE consists of a smaller set of objectives compared to most existing assessment tools, thus making it possible to select indicators that can be computed at country level based on publicly released statistics.
Our procedure for selecting indicators and collecting data was the following. Firstly, we identified all the objectives suggested in Van Cauwenbergh et al. (2007) that could be measured by at least one indicator for which data are released by international institutions and organizations. Secondly, we selected a set of indicators and a temporal window as large as possible balancing representativeness of the three sustainable dimensions (economic, social and environmental dimensions) and availability of time series data. In the data collection process, we tolerated the occurrence of at most six missing values for each time series (one third), with no more than two consecutive missing values internally to the time series and no more than one missing value at the extremes.
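The tolerance rule for missing values can be expressed as a simple check. The sketch below is illustrative only: the function name `acceptable` and its parameters are hypothetical, missing values are encoded as `None`, and the phrase "no more than one missing value at the extremes" is read here as at most one missing value at each end of the series, which is an assumption.

```python
def acceptable(series, max_total=6, max_run=2, max_edge=1):
    """Check a yearly series (None = missing) against the tolerance rule."""
    missing = [v is None for v in series]
    if sum(missing) > max_total:
        return False
    # Length of the runs of missing values at the start and at the end.
    lead = 0
    while lead < len(missing) and missing[lead]:
        lead += 1
    trail = 0
    while trail < len(missing) - lead and missing[-1 - trail]:
        trail += 1
    if lead > max_edge or trail > max_edge:
        return False
    # Longest run of consecutive missing values inside the series.
    run = 0
    for is_missing in missing[lead:len(missing) - trail]:
        run = run + 1 if is_missing else 0
        if run > max_run:
            return False
    return True

# Hypothetical 17-year series (2004-2020) with two internal gaps in a row.
series = [1.0] * 17
series[5] = series[6] = None
print(acceptable(series))
```

Under this reading, a series observed in only five of the 17 years would be rejected, since it has twelve missing values.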
The resulting dataset comprises twelve indicators: five for the economic, three for the social and four for the environmental dimension, measured yearly on all the 27 EU countries plus United Kingdom in the period 2004-2020 (17 years). Table 2 contains a brief description, objective and data source of the selected indicators, while a detailed description is provided in Sect. 3.1. Details on the imputation of missing values and on cointegration analysis are given in Sect. 3.2.

Selected Indicators
The selected indicators for the economic dimension of agricultural sustainability cover the following objectives in Van Cauwenbergh et al. (2007):
- "Agricultural activities are economically and technically efficient", measured through the Total Factor Productivity (TFP) index of agriculture with base year 2015 computed by the United States Department of Agriculture ($X_1$);
- "Land tenure arrangements are optimal", measured through the ratio of net capital stocks to gross value added ($X_2$, source: Faostat);
- "Inter-generational continuation of farming activity is ensured", measured through the ratio young/elderly for farm managers ($X_3$, source: Common Monitoring and Evaluation Framework for the CAP 2014-2020), where young managers are those with less than 25 years and elderly managers are those with more than 55 years;
- "Farm income is ensured", measured through the real income of agricultural factors per paid annual work unit ($X_4$) and the net entrepreneurial income of agriculture per unpaid annual work unit ($X_5$), both computed by Eurostat as indices with base year 2010.
Unfortunately, economic objectives outlined in Van Cauwenbergh et al. (2007) related to farmer's training, market activities and dependency on external finance were disregarded due to data unavailability.
As regards the social dimension of agricultural sustainability, we covered the objective "Equity in the farm community is maintained or increased" in Van Cauwenbergh et al. (2007) through the following three indicators: median equivalised net income in rural areas ($X_6$), at-risk-of-poverty rate in rural areas ($X_7$) and unemployment rate in rural areas ($X_8$), all sourced from Eurostat. Unfortunately, we did not find reliable data on social objectives outlined in Van Cauwenbergh et al. (2007) related to food quality, integration, labour and health conditions. However, it is worth noting that indicator $X_{10}$ (area under organic cultivation), selected for the environmental dimension as described below, partially covers the provision of food of good quality. All the shortcomings in the coverage of the social dimension of agricultural sustainability were beyond our control: the lack of data on social indicators is widely recognized (Latruffe et al. 2016) and, to our knowledge, the three indicators that we selected are the only measures for which time series data referring to EU countries are publicly available.
The selected indicators for the environmental dimension of agricultural sustainability cover the following objectives in Van Cauwenbergh et al. (2007):
- "Energy flow is adequately buffered", measured through the production of renewable energy from agriculture ($X_9$, source: Common Monitoring and Evaluation Framework for the CAP 2014-2020);
- "Soil physical and chemical quality is maintained or increased", measured through the area under organic cultivation ($X_{10}$, source: Faostat);
- "Pollution levels are reduced", measured through greenhouse gas emissions due to agriculture ($X_{11}$, source: Faostat);
- "Soil loss is minimized", measured through the gross nitrogen balance ($X_{12}$, source: Eurostat).
Unfortunately, the measurement of the nutrient balance was limited to nitrogen because the time series for the other available nutrients (phosphorus and potassium) contain a large number of missing values. For the same reason, we disregarded environmental objectives outlined in Van Cauwenbergh et al. (2007) related to natural conservation, soil mass flux, water supply and ecosystem services. Summary statistics of the selected indicators are shown in Table 3. From a first look at the data, it is apparent that the average annual changes in the period 2004-2020 across the considered EU countries are consistent with sustainability for most indicators. The ones with the highest change are the real income of agricultural factors per paid annual work unit ($X_4$, +2.28%), the median equivalised net income in rural areas ($X_6$, +3.14%) and the production of renewable energy from agriculture ($X_9$, +19.85%). The huge growth rate of this last indicator can be explained by the commitment of EU countries to obtain 20% of their energy from renewable sources by 2020. The ratio young/elderly for farm managers ($X_3$) is the only indicator with an average annual change not consistent with sustainability (−2.84%).

Imputation of Missing Values and Cointegration Analysis
In the data collection process, we tolerated the occurrence of at most six missing values for each time series (one third), with no more than two consecutive missing values internally to the time series and no more than one missing value at the extremes. The only exception was the ratio young/elderly for farm managers ($X_3$), which is systematically observed in 2005, 2007, 2010, 2013 and 2016, and missing otherwise. Given the regular pattern of observed values in the considered period (2004-2020) and the valuable and irreplaceable information provided by this indicator, we decided not to exclude it from the analysis. It is worth remarking that half of the selected indicators have a missing value in 2020, specifically the agricultural TFP index ($X_1$), the ratio young/elderly for farm managers ($X_3$), and all the environmental indicators ($X_9$, $X_{10}$, $X_{11}$ and $X_{12}$). However, this does not constitute a violation of the data collection criteria and, moreover, the consideration of year 2020 allows our study to account for the most recent available information.
In order to obtain a complete dataset, we imputed missing values based on a Vector Auto-Regressive (VAR) model with fixed intercepts for countries. The procedure was the following. Firstly, we imputed missing values internally to the time series through linear interpolation. Secondly, we performed a graphical check of stationarity and noted that all the time series were definitely non-stationary, as confirmed by the ADF (Dickey and Fuller 1981) and KPSS (Kwiatkowski et al. 1992) tests. Therefore, we specified the VAR model on logarithmic returns to avoid spurious regression (Granger and Newbold 1974). Let $x_{i,t} = (x_{i,t,1}, \ldots, x_{i,t,p})'$ be the multivariate observation of the $p$ indicators on country $i$ at time $t$. The vector of logarithmic returns for country $i$ at time $t$ is:
$$\Delta \log x_{i,t} = (\Delta \log x_{i,t,1}, \ldots, \Delta \log x_{i,t,p})' \qquad (1)$$
where $\Delta \log x_{i,t,j} = \log x_{i,t,j} - \log x_{i,t-1,j}$, which approximates the relative change in the value of the $j$-th indicator with respect to the previous time point. The adopted VAR specification for a given lag length $L \in \mathbb{N}^+$ was:
$$\Delta \log x_{i,t} = \alpha_i + \sum_{l=1}^{L} B_l \, \Delta \log x_{i,t-l} + u_{i,t} \qquad (2)$$
where $B_l$ is the $p \times p$ matrix of coefficients at lag $l$, $\alpha_i = (\alpha_{i,1}, \ldots, \alpha_{i,p})'$ is the $p$-dimensional vector of fixed intercepts for country $i$, and $u_{i,t} = (u_{i,t,1}, \ldots, u_{i,t,p})'$ is the vector of random errors for country $i$ at time $t$. Note that, since the VAR model in formula (2) is specified on logarithmic returns, the intercepts $\alpha_i$ represent the coefficients of linear deterministic trends. The Expectation-Maximization (EM) algorithm (Dempster et al. 1977) was employed to compute the expected value of missing data for $L = 1, 2, 3, 4$, and the imputation provided by the model with the minimum Bayesian information criterion was retained as the final one. The EM algorithm was implemented as follows:
0. missing values are randomly initialized to obtain a complete dataset;
1. (E-step) the VAR model in formula (2) is fitted to the complete dataset;
2. (M-step) missing values are filled by their prediction based on the fitted model to obtain a new complete dataset;
3. the procedure is iterated from step 1 until convergence of the likelihood.
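The two preprocessing operations that precede the VAR model, linear interpolation of internal gaps and computation of logarithmic returns, can be sketched as follows. This is a simplified illustration for a single univariate series with hypothetical values; it does not reproduce the VAR fitting or the EM iterations, and the function names are not those of the authors' program.

```python
import math

def interpolate_internal(series):
    """Linearly fill runs of None that have observed values on both sides."""
    filled = list(series)
    known = [k for k, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for k in range(left + 1, right):
            filled[k] = filled[left] + step * (k - left)
    return filled

def log_returns(series):
    """First differences of logarithms; None where either value is missing."""
    out = []
    for prev, curr in zip(series, series[1:]):
        if prev is None or curr is None:
            out.append(None)
        else:
            out.append(math.log(curr) - math.log(prev))
    return out

# Hypothetical series: one gap at the start, two consecutive internal gaps.
raw = [None, 100.0, None, None, 130.0, 143.0]
filled = interpolate_internal(raw)
returns = log_returns(filled)
print(filled)
print(returns)
```

The gap at the start of the series is left untouched by the interpolation, which is why a model-based imputation is still needed for values at the extremes. The last return equals log(1.1), roughly 0.095, close to the relative change of 10% between 130 and 143.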
All the computations were performed in R (R Core Team 2022) through a program developed by the authors. Among the different lag lengths under consideration ($L = 1, 2, 3, 4$), we found $L = 1$ to be the optimal one.
Before constructing the composite indicator, we tested whether the time series of the selected indicators, after imputation of missing values, were cointegrated (Engle and Granger 1987). Cointegration ensures the existence of a long-term relationship among non-stationary time series, thus it is important to justify the multivariate analysis of the selected indicators. Since our data are structured as a panel, we tested cointegration according to Pedroni (1999). We found that, for half of the selected indicators, the majority among the eleven statistics proposed by Pedroni (1999) led to the rejection of the hypothesis of no cointegration. Instead, for the other half of the indicators, few or none of the statistics confirmed cointegration. This result appears satisfactory given the small length of the time series (17 time points), because cointegration tests are notoriously characterized by low power in small samples.

Methodology
Our composite indicator for agricultural sustainability in EU countries is based on the weighted product method (geometric aggregation of basic indicators), with weights determined endogenously according to the Benefit of Doubt (BoD) approach (Cherchye et al. 2007; Zhou et al. 2010; Vidoli et al. 2015). The BoD approach consists of selecting the weights by maximizing the score of each observation. The BoD weighting scheme is unit invariant, i.e., weights are adapted to the units of measurement of basic indicators (Cooper et al. 2000, p. 39), thus normalization is not required. Nevertheless, basic indicators should have the same polarity, thus we preliminarily applied the reciprocal function to all indicators negatively correlated with sustainability, which include the at-risk-of-poverty rate in rural areas ($X_7$), the unemployment rate in rural areas ($X_8$), greenhouse gas emissions due to agriculture ($X_{11}$) and the gross nitrogen balance ($X_{12}$). Let $i = 1, \ldots, n$ denote the countries, $j = 1, \ldots, p$ the basic indicators, and $t = 1, \ldots, T$ the time points. Also, let $x_{i,j,t}$ and $w_{i,j,t}$ be, respectively, the measurement and the weight of the basic indicator $X_j$ for country $i$ at time $t$. The score in sustainability for country $i$ at time $t$ is defined as:
$$SUS_{i,t} = \prod_{j=1}^{p} x_{i,j,t}^{\,w_{i,j,t}} \qquad (5)$$
Since the basic indicators can be partitioned into the economic (ECO), social (SOC) and environmental (ENV) sustainable dimensions, the score in sustainability given by formula (5) can be decomposed into the product of the score in each sustainable dimension:
$$SUS_{i,t} = ECO_{i,t} \cdot SOC_{i,t} \cdot ENV_{i,t} \qquad (6)$$
where $ECO_{i,t} = \prod_{j \in ECO} x_{i,j,t}^{\,w_{i,j,t}}$, and analogously for $SOC_{i,t}$ and $ENV_{i,t}$. For each pair $(i, t)$, the weights $w_{i,1,t}, \ldots, w_{i,j,t}, \ldots, w_{i,p,t}$ are determined by solving the optimization problem in formula (7), which maximizes the score $SUS_{i,t}$ with respect to the weights under the BoD constraints. The last constraint of this problem, which bounds between 5% and 15% the contribution of each basic indicator to the composite, is introduced to avoid excessively low or high weights.
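The polarity alignment and the two-level decomposition of the score can be illustrated with a short sketch. All values and weights below are hypothetical: in particular, the weights are fixed by hand and are not the solution of the BoD optimization, and the ordering of the twelve indicators ($X_1$ to $X_5$ economic, $X_6$ to $X_8$ social, $X_9$ to $X_{12}$ environmental) simply follows the description given in Sect. 3.1.

```python
import math

# Zero-based positions of the indicators in each sustainable dimension.
DIMENSIONS = {"ECO": range(0, 5), "SOC": range(5, 8), "ENV": range(8, 12)}
# Zero-based positions of X7, X8, X11 and X12 (negative polarity).
NEGATIVE = {6, 7, 10, 11}

def align_polarity(values):
    """Apply the reciprocal to indicators negatively related to sustainability."""
    return [1.0 / v if j in NEGATIVE else v for j, v in enumerate(values)]

def weighted_product(values, weights, indices):
    """Geometric aggregation restricted to the given indicator positions."""
    return math.prod(values[j] ** weights[j] for j in indices)

# Hypothetical measurements for one country in one year.
raw = [1.02, 3.5, 0.12, 1.10, 1.05, 15000.0, 20.0, 8.0, 500.0, 300.0, 9000.0, 50.0]
# Hypothetical weights, NOT obtained from the BoD problem.
w = [0.10, 0.05, 0.20, 0.10, 0.10, 0.02, 0.15, 0.10, 0.03, 0.03, 0.02, 0.10]

x = align_polarity(raw)
scores = {dim: weighted_product(x, w, idx) for dim, idx in DIMENSIONS.items()}
sus = weighted_product(x, w, range(len(x)))
print(scores, sus)
```

By construction, the product of the three dimension scores coincides with the overall score, which is what allows the composite to be read at both levels of the hierarchy.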
Note that the logarithm of the composite indicator SUS in formula (5) is a linear combination of the logarithmic values of basic indicators:
$$\log SUS_{i,t} = \sum_{j=1}^{p} w_{i,j,t} \log x_{i,j,t}$$
Therefore, the optimization problem in formula (7) becomes linear after logarithmic transformation of basic indicators. Precisely, for each pair $(i, t)$, the weights $w_{i,1,t}, \ldots, w_{i,j,t}, \ldots, w_{i,p,t}$ are determined by solving the linear program obtained from formula (7) after this transformation. This optimization was performed in R (R Core Team 2022) through a program developed by the authors.
Note that, on the logarithmic scale, the contribution of each basic indicator to the composite can be expressed as a share:
$$r_{i,j,t} = \frac{w_{i,j,t} \log x_{i,j,t}}{\sum_{k=1}^{p} w_{i,k,t} \log x_{i,k,t}}$$
We refer to $r_{i,j,t}$ as the relative importance of the basic indicator $X_j$ for country $i$ at time $t$. Analogously, the relative importance of sustainable dimensions ECO, SOC and ENV for country $i$ at time $t$ can be computed as the ratio of the logarithmic score in each dimension to the logarithmic score in sustainability, e.g., $\log ECO_{i,t} / \log SUS_{i,t}$ for the economic dimension. In order to assess the change in time of the composite indicator SUS and of its components ECO, SOC and ENV, we adopt the mobility index proposed by Giambona and Vassallo (2014). Let $\Delta R_{i,t} = R_{i,t} - R_{i,t-1}$ be the first order difference in rank at year $t$ for country $i$, and $\Delta S_{i,t} = S_{i,t} - S_{i,t-1}$ be the first order difference in score at year $t$ for country $i$. The mobility index for country $i$ is defined as the mean of first order changes in rank weighted by first order changes in score. Therefore, it accounts not only for the absolute change of the country, but also for its relative change with respect to the other countries. The mobility index for a country takes a positive (or negative) value in case of increasing (or decreasing) relative performance in the considered period, while a null value indicates an overall stability of the performance.
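The computation of relative importance on the logarithmic scale can be sketched as follows. The three indicator values and the weights are hypothetical, and all values are chosen greater than 1 so that every logarithmic contribution is positive, which is an assumption of this illustration.

```python
import math

def relative_importance(values, weights):
    """Share of each indicator in the logarithm of the weighted product."""
    contributions = [w * math.log(v) for v, w in zip(values, weights)]
    total = sum(contributions)          # equals the log of the composite
    return [c / total for c in contributions]

# Hypothetical indicators (all > 1) and hypothetical weights.
values = [2.0, 4.0, 8.0]
weights = [0.5, 0.25, 0.25]
shares = relative_importance(values, weights)
print(shares)
```

Here the logarithmic contributions are proportional to 2, 2 and 3, so the shares are 2/7, 2/7 and 3/7 and sum to one. The relative importance of a dimension is the sum of the shares of its indicators, which matches the ratio between the logarithmic score of the dimension and the logarithmic score in sustainability.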

Results and Discussion
In this section, we report and discuss the results of our composite indicator. Sections 5.1 and 5.2 focus, respectively, on the trajectories of sustainability and on the relative importance of sustainable dimensions and basic indicators. Section 5.3 provides a comparison with the results of existing studies, while Sect. 5.4 compares our results with subsidies attributed by the Common Agricultural Policy (CAP). Finally, Sect. 5.5 reports the analysis of the sensitivity to different aggregation methods and weighting schemes.
Figure 1 shows the trajectories of the composite indicator SUS (in red) and of its economic (ECO, in blue), social (SOC, in orange) and environmental (ENV, in green) components in the period 2004-2020. We see that the trend of sustainability is fairly stable or has a moderate growth rate for most countries, and that no country has a clearly decreasing trend of sustainability. Countries showing a trajectory of sustainability with strong growth rate include Bulgaria, Croatia, Lithuania and Poland. Among countries with a non-decreasing trajectory, those reaching the 90th percentile of the score in sustainability are Austria, Czechia, Estonia, France, Germany, Hungary, Latvia, Lithuania, Slovakia and Sweden. Cyprus, Malta and the Netherlands show an irregular trajectory of sustainability, but with a non-decreasing trend in recent years. As regards the sustainable dimensions, the social and the environmental ones have similar levels for most countries, while the level of the economic dimension is clearly higher for all countries. It can be noted that the trend of the economic dimension is decreasing for Austria (which is the country with the highest level of sustainability), Finland, Italy, Latvia and Slovakia. Also, Cyprus and Sweden show a decreasing trend of the social dimension, while Czechia is characterized by a decreasing trend of the environmental dimension.
Mobility indices can be inspected to gain deeper insight into the trajectories of sustainability of EU countries. In fact, the mobility index accounts for the evolution of the performance of each country relative to the others. Average scores and mobility indices are shown in Table 4 and displayed in Figures 2 and 3. From Figure 2, it can be noted that the countries with average score in sustainability (SUS) above the third quartile are Austria (AT), Slovakia (SK), Sweden (SE), Hungary (HU), France (FR) and Czechia (CZ). All of them have a positive mobility index, implying an overall improvement of sustainability in the period 2004-2020, with the exception of Czechia, for which the mobility index is negative. Therefore, Czechia requires attention in the near future to prevent a degradation of its [...], interventions aimed at targeting specific weak sustainability objectives should be considered.
Figure 3 provides a comparison between average scores and mobility indices by sustainable dimension. Such a comparison may help, on the one hand, in supporting the design of policies in favour of countries with a low level of sustainability, and, on the other hand, in monitoring the importance that countries with a high level of sustainability attribute to the sustainable dimensions. For instance, we see that the most problematic dimension for Cyprus (CY) and Luxembourg (LU) is the environmental one (average score below the first quartile and negative mobility index), followed by the social dimension (average score above the third quartile but negative mobility index), while the economic dimension is characterized by a low average score but with positive mobility. Instead, the weakest dimension for Belgium (BE) and the Netherlands (NL) is the economic one, characterized by low average score and negative mobility.
The weak performance of Austria (AT) in the economic dimension despite the excellent performance in sustainability (SUS), previously [...]

Relative Importance of Sustainable Dimensions and Basic Indicators
Figure 4 displays the trend of the relative importance of sustainability dimensions by country in the period 2004-2020, while Tables 5 and 6 report, for each country, mean and average annual change of the relative importance of each sustainable dimension and basic indicator. We see that the economic dimension has the highest relative importance with an average across countries equal to 42.9%, followed by the environmental dimension (23.8%) and by the social dimension (22.4%). The ranks of the relative importance of sustainable dimensions differ within countries, but it can be noted that the economic dimension is ranked first for all countries except Belgium and the Netherlands, for which it is ranked second after the social dimension. Instead, the social dimension is ranked first only for Belgium and the Netherlands, and the environmental dimension is never ranked first.
Among the considered basic indicators, the net entrepreneurial income of agriculture ($X_5$, economic dimension) has the highest relative importance with an average across countries equal to 13.74%, followed by the TFP index of agriculture ($X_1$, economic dimension, 12.59%), the median equivalised net income in rural areas ($X_6$, social dimension, 11.35%), greenhouse gas emissions due to agriculture ($X_{11}$, environmental dimension, 8.96%), and the real income of agricultural factors ($X_4$, economic dimension, 8.88%). The other basic indicators have an average relative importance across countries between 5% and 8%, although the ranks of their relative importance differ significantly within countries and no definite patterns can be deduced.

Comparison with Existing Studies
Our findings are not properly comparable with those of existing studies, because our study is the first in the literature to perform a longitudinal assessment on all three sustainable dimensions for an exhaustive number of countries. Among existing studies, the most suitable for a comparison with our results are Nowak et al. (2019) and Cataldo et al. (2020), where all three sustainable dimensions and a non-trivial number of countries are considered, although they rely on cross-sectional data.
In Nowak et al. (2019), a composite indicator is constructed based on 2016 data using the TOPSIS method (Hwang and Yoon 1981), leading to a top ten list including seven transition economies (Slovakia, Czechia, Bulgaria, Latvia, Lithuania, Estonia and Hungary) and only three developed countries (Spain, Luxembourg and Austria). This result is apparently in contrast with our composite indicator, but it can be explained by the focus on a single year, where the increasing performance in sustainability for transition countries, also highlighted by our findings, may have been particularly favourable. However, it is reasonable to think that the discrepancies between the findings of Nowak et al. (2019) and ours are mainly due to the fact that the TOPSIS method is based on multicriteria decision analysis, and thus the weighting scheme differs substantially from the BoD one.
In Cataldo et al. (2020), Partial Least Squares Path Modelling (PLS-PM) with second-order formative constructs is exploited to construct a composite indicator based on 2017 data, leading to a higher weight for the economic dimension, followed by the social and the environmental dimensions. The disagreement between these findings and ours can be explained by a substantial difference in the considered indicators. In fact, the study of Cataldo et al. (2020) considers the system of indicators designed to monitor Sustainable Development Goals (SDGs), which includes measures mainly related to agriculture but not limited to agricultural sustainability. Unfortunately, the main objective of Cataldo et al. (2020) is to derive the weights of sustainable dimensions, thus the ranks of countries are not reported and a comparison with ours is not possible.

Comparison with CAP Subsidies
The results of our composite indicator can be exploited to explore the effectiveness of subsidies attributed by the Common Agricultural Policy (CAP). For this purpose, we accessed the Farm Accountancy Data Network (FADN, European Commission 2020b) and downloaded the data at country level on the following indicators: "Total subsidies, excluding on investments" (SE605), "Subsidies on investments" (SE406), "Environmental subsidies" (SE621), "Subsidies for less favoured areas" (SE622), and "Other rural development payments" (SE623). Total CAP subsidies were obtained by summing the indicators SE605 and SE406. Also, we distinguished CAP subsidies based on economic, social and environmental objectives: environmental subsidies are directly measured by the indicator SE621, while social subsidies were proxied by summing the indicators SE622 and SE623. Finally, subsidies targeting the economic dimension were obtained by subtraction from total CAP subsidies. All the data on CAP subsidies were divided by the utilized agricultural area (UAA) to allow comparisons among countries.
Table 7 reports the mobility index for scores and for CAP subsidies to utilized agricultural area (UAA) by country in the period 2004-2019 (data for year 2020 are not available in the FADN). The same mobility indices are compared in Figure 5, where a substantial agreement between the score in sustainability (SUS) and total CAP subsidies is apparent, with few exceptions. Countries with an evident incoherence include the Netherlands (NL), which shows an increase in CAP subsidies despite a decreased score in sustainability (second quadrant of Figure 5), and Austria (AT), Slovenia (SI) and Malta (MT), which show a decrease in CAP subsidies despite an increased score in sustainability (fourth quadrant of Figure 5). Figure 6 provides a comparison between scores and CAP subsidies by sustainable dimension. Again, a substantial agreement is apparent for all three sustainable dimensions, with few exceptions.
In particular, countries with an increase in CAP subsidies despite a decreased score in sustainability (second quadrant) include: Austria (AT), Czechia (CZ), Estonia (EE), Latvia (LV) and Netherlands (NL) for the economic dimension; Belgium (BE), Czechia (CZ), Luxembourg (LU) and Sweden (SE) for the social dimension; Bulgaria (BG), Cyprus (CY), Croatia (HR) and Portugal (PT) for the environmental dimension. Instead, countries with a decrease in CAP subsidies despite an increased score in sustainability (fourth quadrant) include: Malta (MT) for the economic dimension; Estonia (EE), Lithuania (LT), Malta (MT), Portugal (PT) and Slovenia (SI) for the social dimension; Austria (AT), Belgium (BE), Finland (FI) and Netherlands (NL) for the environmental dimension.
The substantial agreement between mobility indices of scores and mobility indices of CAP subsidies supports the reliability of our composite indicator. Therefore, it represents a valuable resource to refine the scheme for the attribution of CAP subsidies in order to stimulate specific sustainable dimensions.
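The construction of the subsidy measures described at the start of this section can be sketched in Python as follows. The function name and the input figures are hypothetical. The reading that economic subsidies equal total subsidies minus the environmental and social components is an assumption about how the subtraction was carried out.

```python
def cap_subsidies_per_uaa(se605, se406, se621, se622, se623, uaa):
    """Split CAP subsidies by objective and scale them by the UAA.

    Arguments are FADN indicator values for one country-year and the
    utilized agricultural area; the economic component is assumed to be
    the residual after removing environmental and social subsidies.
    """
    total = se605 + se406   # total CAP subsidies
    env = se621             # environmental subsidies
    soc = se622 + se623     # proxy for social subsidies
    eco = total - env - soc  # residual, attributed to the economic dimension
    parts = {"TOTAL": total, "ECO": eco, "SOC": soc, "ENV": env}
    return {name: value / uaa for name, value in parts.items()}


# Hypothetical figures, for illustration only.
example = cap_subsidies_per_uaa(
    se605=12000.0, se406=1500.0, se621=2000.0,
    se622=1000.0, se623=500.0, uaa=50.0,
)
print(example)
```

By construction, the three components add up to total subsidies per unit of UAA.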

Sensitivity Analysis
Sensitivity analysis is an important step to evaluate the robustness of the composite indicator with respect to alternative methodological choices, i.e., the selection of indicators, the normalization procedure, the aggregation method, and the weighting scheme (Terzi et al. 2021). Since we selected the basic indicators based on theory, guidelines in the literature and data availability, the robustness of our composite indicator with respect to the use of different basic indicators was not investigated. Also, since the BoD weighting scheme is unit invariant, we disregarded the effect of different normalization procedures and concentrated only on the comparison between different aggregation methods and weighting schemes. Specifically, we computed three alternative composite indicators: (i) arithmetic aggregation with BoD weights, (ii) geometric aggregation with uniform weighting, (iii) arithmetic aggregation with uniform weighting. In these alternative composite indicators, uniform weighting was obtained by setting the weights equal to the reciprocal of the variance of each basic indicator. Table 8 shows average ranks and mobility indices for our composite indicator and the three alternative composites. We see that there is a substantial difference in the results [...] based on arithmetic aggregation (BoD versus uniform weighting) are the most dissimilar from each other (Spearman correlation equal to 0.399). This analysis highlights a clear dependence of the results on the aggregation method and the weighting scheme, confirming the core importance of methodological choices that, in this research, have been clearly motivated in favour of geometric aggregation with BoD weights.
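How the aggregation method alone can reorder countries may be illustrated with a toy Python sketch. The four countries, their three polarity-aligned indicator values and the equal weights are hypothetical. Equal weights are used here purely for simplicity and do not reproduce the inverse-variance or BoD weights of the analysis. Rankings are compared with the Spearman correlation in its no-ties form.

```python
import math

# Hypothetical indicator values for four fictitious countries.
data = {
    "A": (4.0, 4.0, 4.0),    # balanced profile
    "B": (11.0, 1.0, 1.0),   # strongly unbalanced profile
    "C": (6.0, 3.0, 2.0),
    "D": (2.0, 2.0, 2.0),
}
weights = (1 / 3, 1 / 3, 1 / 3)  # simplifying assumption


def arithmetic(values):
    """Weighted arithmetic mean (full compensation among indicators)."""
    return sum(w * v for w, v in zip(weights, values))


def geometric(values):
    """Weighted product (limited compensation among indicators)."""
    return math.prod(v ** w for w, v in zip(weights, values))


def ranks(scores):
    """Rank 1 is assigned to the highest score (no ties assumed)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def spearman(r1, r2):
    """Spearman correlation between two rankings without ties."""
    n = len(r1)
    d2 = sum((r1[k] - r2[k]) ** 2 for k in r1)
    return 1 - 6 * d2 / (n * (n * n - 1))


rank_arith = ranks({k: arithmetic(v) for k, v in data.items()})
rank_geo = ranks({k: geometric(v) for k, v in data.items()})
rho = spearman(rank_arith, rank_geo)
print(rank_arith, rank_geo, rho)
```

In this toy example the unbalanced country B is ranked first under arithmetic aggregation but third under geometric aggregation, because a single high value can no longer offset two low ones.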

Concluding Remarks
In this paper, we have emphasized that few studies have been conducted to assess agricultural sustainability in the European Union (EU), and all of them fail to provide a holistic view of sustainability in a relevant temporal horizon that could effectively support the design of policies. Our proposal is innovative with respect to existing studies because we considered: (i) all EU countries rather than a subset of them, (ii) a broad set of indicators (12 in total) to cover the economic, social and environmental dimensions of sustainability, (iii) longitudinal data over a long period (17 years). Also, the decomposition into the contributions of sustainable dimensions and the adoption of the BoD weighting scheme are novel in the assessment of agricultural sustainability in the EU.
The construction of composite indicators is subject to several arbitrary choices that may influence the final results, especially the aggregation method and the weighting scheme. Therefore, we paid particular attention to motivating our methodological choices. On the one hand, geometric aggregation was preferred to the arithmetic one because it allows only a small degree of compensation, reflecting the fact that sustainable development is achieved only when all or most individual sustainability goals are pursued. On the other hand, the BoD weighting scheme was selected because it makes it possible to infer the relative importance of each basic indicator and sustainable dimension in the achievement of sustainability without relying on subjective opinions. The core importance of methodological choices, and thus of their motivation, was also confirmed by the sensitivity analysis conducted on our composite indicator.
A valuable resource employed to discuss our results is the mobility index. The mobility index accounts for the evolution of the performance of each country relative to the others, and not simply for each country separately. Therefore, it allows an in-depth insight into the trajectories of sustainability of the various countries. For this reason, we hope that our work encourages the developers of composite indicators for longitudinal data to use the mobility index in the discussion of their results.
In order to check the reliability of our composite indicator, we inspected the relationship between mobility indices of scores and mobility indices of subsidies attributed by the Common Agricultural Policy (CAP). Our findings highlighted a substantial agreement between the two, both overall and by sustainable dimension. Therefore, our composite indicator represents a valuable resource not only to monitor the progress of EU member countries towards sustainability objectives, but also to refine the scheme for the attribution of CAP subsidies in order to stimulate specific sustainable dimensions.
The main critical point of our work lies in the quality and availability of data, an issue affecting all multidimensional assessments due to the practical difficulty of collecting reliable measurements on a large number of indicators. The national scale and the longitudinal nature of our analysis entail further complications, because the only data sources are international institutions and organizations, and available time series are typically short and may present a number of missing values. In this paper, missing values have been imputed based on a vector auto-regressive model with fixed intercepts. Our imputation procedure has good properties because the Expectation-Maximization (EM) algorithm was employed to compute the expected value of missing data. However, the limited length of the time series prevented us from effectively checking the presence of cointegration, thus our methodology has room for improvement. For this purpose, we plan to integrate the EM algorithm within the BoD optimization and to recompute the composite indicator as future data become available.