Introduction

The principles of the circular economy dictate that materials use should be maximized in the provision of products, and that losses should be minimized over time (European Commission 2015). Along these lines, biomass generated in agricultural and industrial activities is increasingly valued as a resource for recycling and re-use. In the EU, agricultural residues account for 90 M tons of biomass generation, while biowastes account for 147 M tons (Camia et al. 2018). In the European Union, the legal framework for management of wastes is crucial since the identity of biomass dictates how it will be handled. For instance, from the end of 2023 onwards EU Member States must collect bio-waste separately or ensure recycling at source. The Waste Framework Directive and subsequent amendments (European Commission 2008) define bio-waste as biodegradable garden and park waste, food and kitchen waste from households, restaurants, caterers and retail premises, and comparable waste from food-processing plants. Furthermore, more than 50% of biowaste from municipal waste in Europe is still incinerated or landfilled (EEA 2020), highlighting the nature and extent of the unproductive and inefficient use of biomass. It is important to keep in mind that at this moment, composting and anaerobic digestion are the dominant forms of recycling biowaste in the EU (EEA 2020).

Based on the concept of cascading biomass use or a 'biomass value pyramid' (as a policy objective described in e.g. European Commission 2012, or agro-industrial byproducts in Berbel and Posadillo 2018), the pursuit of efficiency in the bio-based economy follows a hierarchy in which value begins with relatively low-value use for energy, increases when biomass is repurposed or transformed into a new material (e.g. compost or fertilizer), and continues to increase with use as feed or food, or in high-value products such as pharmaceuticals or specialty chemicals. For instance, phosphorus has been considered a critical element in the EU since 2020 (European Commission 2020). Recovery of phosphorus and other nutrients from nutrient-laden wastes including manure, and upgrading of biomass resources into high-quality fertilizers, has thus become a research topic (Dadrasnia et al. 2021). In the context of alternative livestock feeds, so-called indigestible, unpalatable, or undesirable biomass sources, or IUUB, have been classified to include (i) crop residues, typically inedible or indigestible, (ii) processing byproducts consisting of residues from the food/beverage processing industry, and (iii) food waste/discards from the various stages of the food chain, usually undesirable to humans (Dou et al. 2022; Dou 2021). Some biomasses also contain appreciable quantities of phytochemicals which have utility in the cosmetic, food, or pharmaceutical industries, and recovery and extraction of phenolic compounds from abundant biomass sources is now given attention as a potential valorization pathway due to the high value of compounds with antioxidant activity. When considering nutrient recovery from the biorefinery perspective, a number of biomass properties must be considered, most importantly total contents, but also factors affecting extraction such as organic composition and biomass preparations which affect leachability of the target nutrients (Constantinescu-Aruxandei and Oancea 2023).
Therefore, characterizing the chemical properties and composition of biomass is important when considering alternative valorization pathways, and this has been an objective of this work.

Biomass accounting is carried out on different administrative levels within the EU. The data collected are used for identifying opportunities for policy change, setting goals, and developing appropriate strategies for biomass management. These data can also be used for regional nutrient budgets and understanding flows of matter. The treatment of biomass depends on its physical and chemical properties, which together determine its potential value for processes such as composting or biogas production, its value as a fertilizer, or other uses. For instance, the EU fertilizer regulation 2019/1009 (European Union 2019) establishes quantitative requirements for contents of products marketed as fertilizing products. What also must be considered is how contents of elements such as heavy metals, phytochemicals, or lignocellulosic composition may limit the appropriate management, transformation or safe use of a biomass.

For this study, we obtained 625 samples of agro-industrial biomass and biomass waste throughout the Valencian Community, a large administrative district within Spain comprising 23,255 km2, with a population of about 5 million, making it the fourth most populous Spanish autonomous region. The biomass samples were highly representative of biomass sources generated in the geographical region, and were analyzed for main physicochemical and chemical properties relevant to recycling in the context of the circular economy, as well as polyphenol contents and metal contents. With this database generated for an administrative region, our aims were to understand how biomass samples generated within this geographical area may be categorized or contemplated for management or later processing with respect to their actual origin, administrative designations, or sectors.

Materials and methods

Experimental procedure

Samples from a large diversity of agricultural, agri-food industry, and municipal sources were collected over 2017–2022. Overall, 625 samples were collected from 164 municipalities within the Valencian Community. All samples were taken directly at the production source. Each representative sample was obtained by mixing five subsamples from five sites of the organic waste source, taken from the whole profile (from the top to the bottom of the accumulation; Bustamante et al. 2012). Individual samples were unique in terms of type, locality, and installation, and were also independent (not following temporal sequence). As one exception to independence, one group of wastewater treatment sludge samples from one locality was sampled at two moments (nine samples each time, total of 18 samples). Collected samples were categorized following a regional protocol established by regional experts assigning each waste to logical categories based on process origin (e.g. industrial or agricultural), type of material (e.g. animal or plant), etc. Subcategories were also established for further refinement. The established categories with sample numbers are shown in Table 1. Basic properties included: material density, humidity of fresh sample, pH, electrical conductivity (EC), nitrogen contents, total potassium (K) contents, total phosphorus (P) contents, organic carbon (COrg), organic matter (OM), C/N ratio, and total sodium contents. Also, total polyphenol content was analyzed on all samples, while a sub-set of samples was analyzed for micronutrients, metals and other elements' contents. The metal and metalloid dataset includes a total of 224 samples from 69 municipalities. As can be seen in Table 1, 17 of the total 21 categories were included in this dataset, with the exception of aerobic sludge, cereal straw, mushroom substrate, and riparian canes.
In order to refine the analyses, subcategories were also defined, and within the considered categories, the total number of subcategories was 54.

Table 1 Numbers of biomass samples included in each of the two databases (nutrients and main physicochemical properties database, and metal and metalloid database)

Analytical methods

Bulk density and moisture contents were determined in the fresh raw biomass subsamples according to the standard method CEN12580 (European Committee for Standardization 1999) and after drying at 105 °C until constant weight, respectively. Then the remaining raw sample was air-dried and ground to a particle size of 0.5 mm for the later analyses. The physicochemical and chemical parameters were determined in triplicate according to the methods described by Bustamante et al. (2008b). Briefly, water-soluble extracts (1/10, w/v) were analyzed for the physicochemical parameters (pH and electrical conductivity); an automatic elemental micro-analyzer was used for the assessment of total organic carbon (TOC) and total nitrogen (TN); and total organic matter (OM) was evaluated by loss on ignition (430 °C for 24 h). Macronutrients and micronutrients (P, K, Ca, Cu, Mg, Fe, Mn and Zn), toxic heavy metals (Cr, Ni, Cd, Hg and Pb), and other metal and metalloid elements were determined by ICP-OES in the extract obtained after microwave-assisted acid digestion (HNO3/H2O, 1:1 v/v). The water-soluble polyphenols were determined by a modification of the Folin–Ciocalteu method in a 1:20 (w/v) water extract (Beltrán et al. 1999). Throughout the manuscript, percentage (%) refers to percent of the mass fraction.

Data treatment and analysis

To avoid exclusion of observations in the database, laboratory analysis results for metals and macroelements which were below the limit of detection were automatically assigned the lower limit of detection for the method (e.g. 0.01 mg kg−1). The statistical package R version 4.1.3 was used for data analysis and visualization (R Core Team 2022). ANOVAs were constructed for each chemical parameter to understand the division of dataset variance into within-group and between-group components. For parametric statistics, data were transformed appropriately if they did not follow a normal distribution. Relationships between certain variables were analyzed by linear regression (lm function). For analyses of material chemical properties by principal component analysis (PCA), which does not permit missing values, all observations with missing values were removed from the database (n = 29). In order to understand the relationship between major nutrient components, a PCA was carried out using the prcomp function of the R stats package, and correlations between scaled data were also examined. Also, two data classification methods were employed to understand the relationships between biomass subcategories based on chemical properties. k-means clustering and hierarchical clustering analysis (HCA) are unsupervised machine learning techniques which have been applied previously to categorize biomass based on chemical properties or provide decision-making support in biomass energy applications, waste recycling and nutrient recycling (e.g. Mastro et al. 2020; Ray et al. 2020; Wu et al. 2023). The k-means technique aims to partition the observations into a particular number of clusters which minimizes the within-cluster squared Euclidean distances. The analysis was performed separately for the ‘nutrient’ and ‘metals’ datasets based on average values for each subcategory.
In the ‘nutrients’ database, polyphenol contents were not included since this information about certain high-value compounds can be ascertained based on that parameter alone. In the case of the 'metals' dataset, one subcategory (processed food water) was removed as an outlier since it was assigned its own cluster. Also, in order to reduce unnecessary complexity in cluster size, following visual inspection of the correlation matrices, those elements having overall low correlation with others were removed from the dataset (Bi, La, Sb, Si, Tl). The optimum cluster number was found using the silhouette method (fviz_nbclust function of the factoextra package). Secondly, we applied agglomerative hierarchical clustering analysis (hclust function of the stats package), in which clusters and observations are joined iteratively based on dissimilarities until only one cluster remains. Here, Ward's squared dissimilarity criterion was used for the clustering algorithm (Murtagh and Legendre 2014).
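The agglomerative procedure described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the analyses in this study were carried out in R), the input values are hypothetical, and the function names are not taken from any package. At each step, the pair of clusters whose merger gives the smallest increase in within-cluster sum of squares is joined, which is the idea behind Ward's criterion. Dendrogram heights reported by statistical software may be a rescaled version of this quantity.

```python
from itertools import combinations

def centroid(points):
    """Mean vector of a list of equal-length numeric tuples."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_linkage(data):
    """Merge clusters pairwise until one remains; return the merge history."""
    clusters = {i: [row] for i, row in enumerate(data)}
    members = {i: (i,) for i in clusters}
    history = []
    next_id = len(data)
    while len(clusters) > 1:
        # Pick the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        cost = ward_cost(clusters[i], clusters[j])
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        members[next_id] = tuple(sorted(members.pop(i) + members.pop(j)))
        history.append((members[next_id], cost))
        next_id += 1
    return history

# Hypothetical standardized (N, P) values for four biomass subcategories.
samples = [(-1.0, -0.9), (-0.8, -1.1), (1.1, 0.9), (0.9, 1.0)]
merges = ward_linkage(samples)
for joined, cost in merges:
    print(joined, round(cost, 4))
```

In this toy example the two nutrient-poor rows and the two nutrient-rich rows are each joined first, and the two resulting groups are joined last at a much larger cost, which is the pattern a dendrogram displays as a long final branch.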

Results

Variability and associations among chemical parameters

ANOVAs were used to understand the partitioning of routine chemical parameter variance (how much was owed to ‘category’ or ‘subcategory’ group membership). There were notable differences in F-values (a higher F-value indicates that category membership explains more variability). The lowest F-values, indicating low association of variability with group membership, were found for EC, Na, and polyphenols, while the largest F-values were found for biomass density and P and N contents (Table 2).
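As an illustration of how an F-value summarizes the share of variance owed to group membership, the following minimal sketch computes the one-way ANOVA F statistic as the ratio of the between-group mean square to the within-group mean square. It is written in Python for illustration only, and the nitrogen values and category names in the example are hypothetical rather than taken from the database.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a dict of group -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        # Variation of the group mean around the grand mean.
        ss_between += len(values) * (mean - grand_mean) ** 2
        # Variation of observations around their own group mean.
        ss_within += sum((v - mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical nitrogen contents (%) for three categories.
nitrogen = {
    "manure": [3.1, 2.8, 3.4],
    "prunings": [0.9, 1.1, 1.0],
    "sludge": [4.0, 4.3, 3.9],
}
f_value = one_way_anova_f(nitrogen)
print(round(f_value, 1))
```

A parameter whose values differ strongly between groups and little within them, as in this example, gives a large F-value; a parameter that varies as much within groups as between them gives an F-value near 1.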

Table 2 F-values table for nutrients and main chemical properties of wastes, by ‘category’ and ‘subcategory’ grouping

Based on correlation and PCA analyses (Fig. 1) of nutrient contents, what is generally observed is negative correlations and associations between nutrient contents and organic matter or carbon contents, and positive correlations between nutrients, in particular N and P (R2 = 0.60, F = 884, p < 0.001). Briefly, biomass with high nutrient contents included biomasses from manure, industrial wastes, sludge, and coffee and cocoa, while lignocellulosic biomasses such as woody plant prunings and discarded plant parts had high carbon contents and relatively low nutrient contents (see below for more details). Boxplots of main chemical properties by ‘category’ are shown in S1. In the PCA analysis, the first two component dimensions summed to 52.5% of dataset variability, and 75% total variability was achieved with 4 principal components. Based on PCA eigenvectors, EC had the weakest contribution to the variance in the first principal components (-0.03 and -0.17, respectively), which is reflected in the graphical representation. Also seen is the grouping and therefore association of nutrient (N, P, K) and sodium contents of the materials. Polyphenols were not strongly associated with any of the measured main chemical properties.

Fig. 1

PCA biplot of nutrient contents and main physicochemical parameters of the biomasses in the database (n = 625). Abbreviations as in manuscript text

To understand the origin of observed variability in electrical conductivity, a more detailed correlation analysis was carried out for EC against non-volatile matter (ash), Na and K contents. Following appropriate transformation of the variables to achieve distribution normality for all parameters, it was first seen that the relationship of EC with ash was weak (coefficient of determination R2 = 0.14), for Na moderate (R2 = 0.25), and highest for K (R2 = 0.41). By categories, the highest correlations between Na and EC were found for marine plants (0.84), coffee and cocoa (0.78), and pig slurry (0.76), while the lowest were found for chicken manure and industrial crop waste (0.00–0.01). For K, the highest correlations were found for distillery waste (0.92), coffee and cocoa (0.99), and riparian canes (0.95), followed by pig and cow manure (0.77–0.81) and forestry wastes, while the lowest were found for unmarketable produce, substrates, and marine plants (0.00–0.01).
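The R2 values reported above come from simple linear regressions on transformed variables. A minimal sketch of that calculation follows, written in Python for illustration only. The natural-log transformation is an assumption, since the text does not specify which transformation was applied, and the K and EC values are hypothetical.

```python
import math

def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # For one predictor, R2 equals the squared Pearson correlation.
    return sxy ** 2 / (sxx * syy)

# Hypothetical K contents (%) and EC values (dS/m) for four samples.
potassium = [0.5, 1.0, 2.0, 4.0]
ec = [1.2, 2.1, 4.5, 7.9]

# Assumed transformation: natural log of both variables.
r2 = r_squared([math.log(v) for v in potassium], [math.log(v) for v in ec])
print(round(r2, 3))
```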

The averages of main chemical properties of the biomasses by subcategory are shown in Table 3, and the standard deviations of these are shown in S2. Total nitrogen and P contents were greatest in biomasses such as sludges and manures, but also in industrial processing wastes of fresh produce, and discarded produce. C/N ratio was highest in lignocellulosic materials such as forestry waste and plant prunings, and lowest in manures and vegetable discards. Polyphenols were highest in fruit and onion waste, trimmings, and unmarketable fractions, as well as aromatics distillery waste, and olive and artichoke biomass. The distributions of nutrient concentrations of all biomasses are shown in Fig. 2; in this figure, total P and K concentrations have been converted to equivalent contents as P2O5 and K2O to match the units of relevant legislation (EU 2019/1009).
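The conversion of elemental P and K to their oxide equivalents follows from molar masses alone. The sketch below, written in Python for illustration, derives the two factors from standard atomic masses; the function name and the example concentrations are hypothetical.

```python
# Standard atomic masses (g/mol), rounded to three decimals.
MASS_P = 30.974
MASS_K = 39.098
MASS_O = 15.999

# One mole of P2O5 contains two moles of P; one mole of K2O contains two of K.
P_TO_P2O5 = (2 * MASS_P + 5 * MASS_O) / (2 * MASS_P)
K_TO_K2O = (2 * MASS_K + MASS_O) / (2 * MASS_K)

def to_oxide_equivalents(p_percent, k_percent):
    """Convert total P and K (% mass) to P2O5 and K2O equivalents (% mass)."""
    return p_percent * P_TO_P2O5, k_percent * K_TO_K2O

# Hypothetical sample with 1.0% P and 2.0% K.
p2o5, k2o = to_oxide_equivalents(1.0, 2.0)
print(round(p2o5, 2), round(k2o, 2))
```

The factors are approximately 2.29 for P to P2O5 and 1.20 for K to K2O, so a biomass needs roughly 0.87% elemental P or 1.66% elemental K to reach 2% as the oxide.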

Table 3 Average values for nutrient and main chemical properties, by ‘subcategory’ grouping

Fig. 2

Histograms displaying the distribution of nutrient concentrations of the biomass samples within the database (n = 625). In the case of phosphorus and potassium, concentrations have been converted to equivalent contents in oxidized forms per conventions for commercial fertilizers. Vertical red dashed lines represent the limit for qualifying as solid organic fertilizer

Nutrient contents and elements limiting valorization as fertilizing products

Biomasses qualifying for certification as Solid Organic Fertilizers (EU 2019/1009) are shown in Table 4. N-rich biomasses with concentrations consistently > 2.5% included aerobic sludges, laying hen manure, and dog manure. Other biomasses such as other manures and municipal wastes achieved the benchmark in some cases but were less consistent. Notable P-rich biomasses (> 2% P2O5) included aerobic sludges, pig slurry, laying hen manure, and rabbit manure, while broiler chicken manure was less consistent (17/38 samples > 2%). Finally, biomasses with notable K2O contents (> 2%) were broiler chicken manure, cow manure, grape stems, laying hen manure, sheep manure, different unmarketable fruits and vegetables, and grape and winery biomass. Less consistent was olive mill waste (41/72 samples > 2%).
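The screening logic behind Table 4 can be sketched as a simple threshold check. The example below is written in Python for illustration and uses only the benchmarks quoted in this section (> 15% organic carbon, and > 2.5% N, > 2% P2O5 or > 2% K2O). The sample values and function name are hypothetical, and the regulation itself should be consulted for the exact wording of the criteria and for the additional requirements that apply.

```python
# Benchmarks as quoted in the text for a solid organic fertilizer with one
# declared primary nutrient (% of mass).
NUTRIENT_THRESHOLDS = {"N": 2.5, "P2O5": 2.0, "K2O": 2.0}
MIN_ORGANIC_CARBON = 15.0

def qualifying_nutrients(sample):
    """List the primary nutrients for which a sample exceeds the benchmark.

    Returns an empty list when organic carbon is too low, since the
    organic carbon requirement applies regardless of nutrient content.
    """
    if sample["OC"] <= MIN_ORGANIC_CARBON:
        return []
    return [
        nutrient
        for nutrient, limit in NUTRIENT_THRESHOLDS.items()
        if sample[nutrient] > limit
    ]

# Hypothetical manure-like sample (% of mass).
example = {"OC": 35.0, "N": 3.2, "P2O5": 1.4, "K2O": 2.6}
print(qualifying_nutrients(example))
```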

Table 4 Biomasses, by subcategory, fulfilling minimum nutrient contents to be eligible as solid organic fertilizers (> 15% OC) with one primary nutrient under the EU regulation 2019/1009

Based on the metals and metalloids dataset, the concentrations of remaining macronutrients and micronutrients were assessed on 224 samples. Averages and standard deviations of all subcategories are shown in S3 and S4. Figure 3 shows average concentrations by subcategory of sulfur (S), iron (Fe), magnesium (Mg), and calcium (Ca), and Fig. 4 shows average concentrations by subcategory of manganese (Mn), molybdenum (Mo), boron (B), and nickel (Ni). Biomass S contents were highest for processed food samples and juice factory waste (0.7–1.1%), followed by manures (0.3–0.7%). Fe contents were highest for processed food wastes (5500 mg kg−1), followed by manures, juice factory sludge, persimmon prunings, and horticultural wastes (2200–4700 mg kg−1). Mg concentrations were highest in manure samples (0.4–1.5%). Ca contents were highest in manures, citrus prunings, parks and garden wastes, and processed food wastes (4.7–6.6%). Mn concentrations were largest in manure samples (160–730 mg kg−1) but also persimmon prunings (262 mg kg−1). Mo concentrations were quite high in juice factory sludge (22 mg kg−1), followed by processed food water and manures (2–6 mg kg−1). B contents were highest in persimmon prunings (64 mg kg−1) followed by winery wastes (45 mg kg−1) and orange prunings (43 mg kg−1). Ni contents were particularly high for processed food water (56 mg kg−1) followed by juice factory sludge (23 mg kg−1), whereas the range of all remaining samples was small (0.1–12 mg kg−1). Graphs for copper (Cu) and zinc (Zn) are not shown due to low evenness between categories since these elements are mainly present as contaminants in manure samples, but also processed food water (described in greater detail below). However, it is notable that olive leaves, with 20 samples, had average Cu contents of 101 mg kg−1, whereas the next highest plant-derived biomass was winery waste (grape marc; 49 mg kg−1).

Fig. 3

Barplots of average concentrations, by subcategory, of sulfur (panel A), iron (B), magnesium (C), and calcium (D). See supplementary information for greater details of variability within groups

Fig. 4

Barplots of average concentrations, by subcategory, of manganese (panel A), molybdenum (B), boron (C), and nickel (D). See supplementary information for greater details of variability within groups

Regarding potential contaminants, copper (Cu) and zinc (Zn) contents of organic fertilizers are limited to 300 mg kg−1 and 800 mg kg−1, respectively. Within the metals database, the Cu limit was exceeded in one pig slurry sample and one horse manure sample, and the Zn limit was exceeded in three broiler chicken manures, one park and garden waste sample, two olive mill waste samples, and nine pig slurry samples (Table 4). Lead (Pb) concentrations ranged from below detection limit (BDL) to 52.1 mg kg−1; processed food water, which had the highest concentrations, was still below the established limit for organic fertilizers of 120 mg kg−1. Cadmium (Cd) concentrations ranged from BDL to 0.8 mg kg−1 (pig slurry), whereas the limit is 1.5 mg kg−1. Rabbit manure, food processing sludges, and coffee and cocoa biomasses also had relatively high average Cd contents within the dataset (0.3–0.4 mg kg−1). Nickel (Ni) ranged from 0.2 to 56.3 mg kg−1; the highest value, found for processed food water, exceeded the established limit of 50 mg kg−1. Arsenic (As) concentrations ranged from BDL to 47.7 mg kg−1, with the highest value found in one olive mill waste sample, followed by one sample of horticultural plant parts (41.1 mg kg−1); both exceeded the organic fertilizers limit of 40 mg kg−1. Hexavalent chromium and mercury were not measured in the studied biomasses.
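The comparison against contaminant limits can be expressed as a small lookup. The sketch below is written in Python for illustration and uses only the limits quoted in this section (mg kg−1, as given in the text for organic fertilizers). The function name is hypothetical, and elements that were not measured in a sample are skipped rather than treated as compliant.

```python
# Limits as quoted in the text for organic fertilizers (mg/kg).
LIMITS_MG_KG = {"Cu": 300, "Zn": 800, "Pb": 120, "Cd": 1.5, "Ni": 50, "As": 40}

def exceeded_limits(sample):
    """Return the elements whose measured concentration exceeds its limit.

    `sample` maps element symbols to concentrations in mg/kg. Elements
    absent from `sample` are ignored, so an empty result means only that
    none of the measured elements exceeded its limit.
    """
    return [
        element
        for element, limit in LIMITS_MG_KG.items()
        if element in sample and sample[element] > limit
    ]

# Maximum Ni and Pb values reported in the text for processed food water.
processed_food_water = {"Ni": 56.3, "Pb": 52.1}
print(exceeded_limits(processed_food_water))
```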

Average polyphenol concentrations by subcategory are shown in Table 3. The highest average polyphenol contents were found in biomasses such as pomegranate discards (43,000 mg kg−1), pomegranate prunings (21,000 mg kg−1), unmarketable onion (27,000 mg kg−1), and also different citrus fruits and citrus prunings (11,000–19,000 mg kg−1).

Clustering analyses

For the nutrients database, the k-means clustering procedure resulted in two clusters following cluster size selection with the silhouette method (30 and 24 observations in clusters 1 and 2 respectively; Table S5). The ratio of between SS / total SS was 40.7%, and average silhouette size was 0.37. Grape marc and barley straw had silhouette coefficients which were near 0, indicating a high overlap with both clusters. Based on average values for each cluster (Table S6), it is seen that the biomasses were differentiated based on N and P contents, but also Na contents and electrical conductivity. As detected in the PCA plot of the clusters (Fig. 5) marine plants had a low silhouette value whereas this biomass had exceptionally high salinity. For the metals database, three k-means clusters resulted (13, 3, and 13 observations for clusters 1, 2 and 3), whereas the ratio of between SS / total SS was 80.5%, and average silhouette size was 0.5. Here, only the grape prunings subcategory within cluster 2 was flagged as possibly misclassified (silhouette coefficient S =  − 0.05), indicating that it had high overlap with cluster 1. The average metal and metalloid concentrations by cluster are shown in Table S7. Subcategories in cluster 1 had the lowest overall metal and metalloid contents, and cluster 2 the highest. Cluster 2 was composed of only three biomass types: goat manure, rabbit manure, and sheep manure, found to be similar in some respects to cluster 3 (Fig. 5). Clusters 2 and 3 seem to be most differentiated based on the high macroelement/macronutrient contents in biomass samples in cluster 2 (principally Al, Ca, and Fe, but also minor elements such as As, Li, and Ti).
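The two diagnostics reported above, the between SS / total SS ratio and the silhouette width, can both be computed from a set of observations and their cluster labels. The following minimal sketch is written in Python for illustration only; the points and labels are hypothetical, Euclidean distance is assumed, and at least two clusters with no duplicated points are assumed.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Centroid of a list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def between_ss_ratio(points, labels):
    """Share of total sum of squares explained by the clustering."""
    grand = mean_point(points)
    total_ss = sum(sq_dist(p, grand) for p in points)
    within_ss = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        center = mean_point(members)
        within_ss += sum(sq_dist(p, center) for p in members)
    return (total_ss - within_ss) / total_ss

def silhouette_widths(points, labels):
    """Silhouette width of each observation (0 for singleton clusters)."""
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        mean_dist = {}
        for k in set(labels):
            dists = [
                math.sqrt(sq_dist(p, q))
                for j, (q, lab) in enumerate(zip(points, labels))
                if lab == k and j != i
            ]
            if dists:
                mean_dist[k] = sum(dists) / len(dists)
        if own not in mean_dist:
            widths.append(0.0)
            continue
        a = mean_dist[own]  # mean distance to own cluster
        b = min(v for k, v in mean_dist.items() if k != own)  # nearest other
        widths.append((b - a) / max(a, b))
    return widths

# Hypothetical two-cluster example.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labs = [0, 0, 1, 1]
ratio = between_ss_ratio(pts, labs)
widths = silhouette_widths(pts, labs)
print(round(ratio, 3), [round(w, 3) for w in widths])
```

Widths near 1 indicate observations well inside their cluster, widths near 0 indicate observations lying between two clusters, and negative widths flag observations that are on average closer to another cluster than to their own.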

Fig. 5

Visualization of k-means clusters in two dimensions for (A) the ‘nutrient’ database, and (B) the ‘metals’ database. The cluster number (k) was established using the silhouette method

The results of the HCA are presented in the form of dendrograms constructed with the application of Ward's linkage method (Fig. 6). In the case of nutrients, the dendrogram split the database into two initial clusters, with later branching. The memberships of the two large HCA clusters corresponded fully (100% agreement) with the memberships of the two clusters identified in the k-means analysis. Subsequent branching of the tree resulted in the close association of biomasses which are known to have similar properties, such as tree prunings, manures, straws, and others, in addition to notable exceptions which revealed relevant differences (see discussion). In the case of the metals dataset, the initial branching of the dendrogram also resulted in two main groups. The group containing low-metal biomasses (including prunings and straws, among others) had low within-group dissimilarity (height values in the y-axis of the dendrogram), while the other grouping had higher dissimilarity (seen as more branching across greater height within the tree). In this dendrogram, the three manures assigned to their own group in the k-means analysis are within the high metals content branch, but it is also seen that they are positioned together in early branching. This is also the case for materials which had low silhouette values and high distance from group centroids, such as pig slurry and juice factory sludge (identified visually in Fig. 5). A few differences in top-level group membership were found when compared to the k-means clustering, namely for broiler chicken manure, grape prunings, nut shells, and rice straw.

Fig. 6

Dendrograms generated from hierarchical clustering analysis for (A) the ‘nutrients’ database and (B) the ‘metals’ database. On the y-axis, ‘height’ corresponds to the measure of dissimilarity computed using Ward's clustering criterion with the application of squared dissimilarities

Discussion

Variability of main chemical properties

Appropriate management of biomass wastes is a priority for achievement of circular economy goals. For this their properties must be understood for processing and recycling within the value chain.

Biomasses were grouped into logical categories based on criteria set for a regional biomass survey. Characterizing the variability of different parameters within these categories is important for accounting and scaling in regional balances or modelling efforts. Parameters associated with salinity and electrical conductivity (EC and Na contents) and polyphenols had the greatest variability within categorical groupings (Table 2). The parameters EC and Na are particularly related to the water-soluble fractions of waste, which can be related to processing or format of the biomasses, to be discussed in greater detail below. On the other hand, N and P had the lowest within-group variability, and are thus quite indicative of group membership (Table 2). This means that grouping has been particularly useful for describing nutrient contents of the categories and subcategories established. On a practical note, category-based estimates could therefore be used with some confidence for upscaling and for quantifying nutrient flows and budgets, which is valuable since N and P are crucial elements which have high value and at the same time are associated with regional nutrient imbalances and contemporary environmental problems, e.g. those impacting freshwater resources.

Electrical conductivity, a surrogate of dissolved salts content, is relevant for fertilizing product characteristics and compost quality since high soil EC values can impact crucial biologically mediated soil processes such as respiration, decomposition, and transformations of nitrogen in the nitrogen cycle (Smith and Doran 1996; Adviento-Borbe 2006). EC was the weakest predictor of variability within the PCA. Some biomass categories included relatively high EC values (> 10 dS m−1). Among these were manures, due to their high ash contents/low volatile matter contents; organic fractions of municipal wastes, which are known to have high EC due to the presence of household food scraps; and marine plants, which have high salt contents due to their origin, and whose processing or storage conditions may affect the final salt contents of the biomass. Within the 'unmarketable produce' category, tomatoes and vegetables had particularly high EC values (averages of 11.1 and 7.7 dS m−1, respectively). Na had weaker associations with EC than K. Overall, it seems that EC was not strongly associated with other chemical parameters due to important differences in the manner in which biomass products may be processed or altered, affecting the solubility of elements contributing to EC.

Utility of surveyed biomasses for fertilizing products

The nutrient content of biomasses is key since recycling of nutrients within the agricultural system contributes to the goal of circularity of food systems. Manures are among the most valued sources of nutrients due to their high concentration. However, manures face management challenges and use restrictions due to heavy metal contents (e.g. Liu et al. 2020), and the study highlights a number of other potential sources of concentrated macronutrients which may be targeted for increased exploitation. For instance, lettuce, artichoke, pepper, tomato, watermelon, and different wastes from food processing had N contents > 3%, though there were few such samples in the database. Many biomasses had sufficient macronutrient contents making them eligible for marketing as fertilizing products in the EU (Table 4). Manures, with among the highest nutrient contents, were differentiated in the analysis for their particular stoichiometries of N, P, and K. The management of some manures is complicated by their contents of the heavy metals Zn and Cu, which are specifically regulated by the legislation (2019/1009). Zn is used as a feed additive in intensive pig rearing (Carlson et al. 1999; Sáez et al. 2017), in some cases leading to exceptionally high concentrations in pig slurry (> 6000 mg kg−1 found in one sample), and in the database, 9 of 13 samples surpassed the 800 mg kg−1 limit. Other biomass types sporadically exceeded an established limit. Overall, 20 samples of 224 exceeded some heavy metal concentration limit for organic fertilizers under the EU regulation, nearly half of those pertaining to pig manure.

Concerning metals and metalloids, it is notable that industrial food wastes such as processed food water and juice factory sludge contained high concentrations of both macronutrients and microelements, in some cases surpassing safe levels (sections above). However, used in appropriate doses, these wastes might provide an efficient source of elements such as S, Fe, Mo, and Ni. Manures, containing many nutrients in concentrations larger than plant-derived biomasses, are good sources of micronutrients. However, some plant biomasses also contained appreciable quantities of micronutrients. In the case of Cu, the high concentrations in grape marc and olive leaves are related to use of fungicides (Adawi et al. 2022). Persimmon prunings had large amounts of the essential micronutrients B and Mn, which are likely applied during cultivation to improve flowering, fruit set, or post-harvest quality (Ferri et al. 2008). Lignocellulosic biomasses such as cereal straws and forestry wastes (high fixed carbon biomasses) had low nutrient contents, as is to be expected. It is also noted that wastes from the processing of crops of large importance and volume in the Mediterranean, such as grape marc and olive mill waste, but also aromatics distillery waste, had generally low contents of metals, metalloids, macronutrients and micronutrients.

Potential for valorization of high-value compounds (polyphenols)

Within the database, polyphenol content was not strongly associated with any categorical grouping or measured chemical parameters. This may be due to the fact that phenol contents are more associated with specific plant identity and the particular secondary metabolisms of those species, rather than plant part or treatment path. Polyphenols are a large family of secondary compounds synthesized by plants useful for defense against biotic aggressors (pathogens) or abiotic stressors including ultraviolet radiation or oxidative stress. While they are omnipresent in the plant kingdom, some biomass wastes—including notably olive mill solid waste and wastewater—contain sufficient amounts of phenolic compounds to impact their management, due to potential environmental contamination and their possible interference in or inhibition of bioprocesses (Sharma and Melkania 2018). Olive mill wastes, when applied to land in large quantities, can provoke phytotoxicity, and for this reason specific application methods should be employed (e.g. Doula et al. 2012). Composting and other environmental technologies such as biobeds (Kinigopoulou et al. 2022) can be effective for valorizing such wastes with high phenol contents as a soil improver (Roig et al. 2006). A newer, higher-value approach to olive mill wastes involves the extraction of polyphenolic compounds (Tapia-Quirós et al. 2022), and methods have also been developed for olive leaves (Abi-Khattar et al. 2019). In fact, in our database olive leaves had higher polyphenol contents than olive mill wastes (average contents of 8500 and 6600 mg kg−1, respectively). Winery wastes, mainly composed of grape marc in our database, have also been given attention for extraction of polyphenols (Ferri et al. 2020). In our database, grape stems had approximately two times the polyphenol contents of winery wastes (5,600 and 2,200 mg kg−1, respectively). 
Other biomasses such as pomegranate discards and citrus discards and prunings had polyphenol contents that far surpassed those of both grape- and olive-derived biomasses (ranging from approx. 18,000 to 43,000 mg kg−1). In our database the unmarketable onion subcategory (approx. 26,000 mg kg−1) comprised only one sample; however, this plant is well known for its high contents of antioxidants and bioactive compounds, and for this reason extraction of these compounds from onion waste has been a topic of research (Paesa et al. 2022).

Alternative categorization of biomasses by means of clustering analyses

While k-means and HCA showed general agreement in groupings, the two methods provided complementary information. The k-means method allows strict categorization into groups whose number is optimized statistically, while the HCA and its dendrogram follow a different organizational principle and make specific dissimilarities easier to visualize. Applying both clustering methods to biomass chemical properties has provided two main types of information which may prove useful in biomass management. First, the k-means machine learning technique facilitated the detection of the main chemical properties defining major differences between groups. For instance, we have seen that certain manure biomasses (sheep, rabbit, and goat) were differentiated from the other manures based on high macronutrient/macroelement contents. This likely reflects the way in which animal bedding waste is managed on the farms, since common bedding materials such as hay have relatively low metal contents. The k-means analyses suggest that biomasses might be managed or categorized as “high-nutrient biomasses” or “low-nutrient biomasses” in the case of the ‘nutrients’ database, and as “low metal/micronutrient”, “high metal/micronutrient”, and “high macronutrient” biomasses in the case of the metalloid dataset.
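
To illustrate how a k-means partition of this kind arises, the following is a minimal sketch in dependency-free Python, not the analysis pipeline used in this study. The sample names and the N, P, and K values are hypothetical and are not taken from the database. The number of groups is fixed at k = 2 for simplicity, whereas in the study the number of groups was optimized statistically. The helper names `standardize` and `kmeans` are likewise illustrative.

```python
import math

# Hypothetical, made-up values (NOT from the database): N, P, K in g/kg.
samples = {
    "manure_a": [25.0, 9.0, 22.0],
    "manure_b": [28.0, 11.0, 26.0],
    "vegetable_discard": [30.0, 5.0, 30.0],
    "cereal_straw": [5.0, 1.0, 10.0],
    "forestry_residue": [3.0, 0.5, 4.0],
    "fruit_discard": [8.0, 1.5, 12.0],
}

def standardize(rows):
    """Scale each column to zero mean and unit variance (z-scores).

    Assumes no column is constant (standard deviation > 0).
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

names = list(samples)
X = standardize(list(samples.values()))
labels, centroids = kmeans(X, k=2)

groups = {j: [n for n, lab in zip(names, labels) if lab == j] for j in range(2)}
for j, members in groups.items():
    # A positive mean z-score marks the relatively nutrient-rich cluster.
    tag = "high-nutrient" if sum(centroids[j]) > 0 else "low-nutrient"
    print(f"cluster {j} ({tag}): {', '.join(members)}")
```

Standardizing before clustering keeps any single element with large absolute concentrations from dominating the Euclidean distances. In routine work an established library implementation would normally be preferred over a hand-written loop.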

Second, the HCA produced a classification of biomasses which refines prior knowledge about the products (membership/identity in subcategories), since it facilitated the detection of properties or differences which may not be evident with an a priori categorization. Though an exhaustive description of these instances would be excessive here (since the database has been made available in supplementary material, we leave this to the interested reader), a few examples are offered: lemon prunings were situated in a position in the tree far from other prunings such as grape, loquat, or almond, due to their particularly high macronutrient contents, and avocado prunings were also separated from the others owing to their intermediate macronutrient contents. Unmarketable fruits (low nutrient contents) were also separated from unmarketable vegetables (high nutrient contents). The metals database likewise reflects the important differences between manure types mentioned previously.
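
The way a dendrogram isolates an atypical member of an a priori category can be sketched with a small agglomerative clustering routine. This is an illustrative example under stated assumptions, not the procedure used in this study: the N, P, and K values are hypothetical, average linkage with Euclidean distance is an assumed choice, and the raw values are used without standardization only because the made-up variables are on similar scales. The function names `average_linkage` and `agglomerate` are illustrative.

```python
import math
from itertools import combinations

# Hypothetical, made-up values (NOT from the database): N, P, K in g/kg.
prunings = {
    "grape": [6.0, 1.0, 7.0],
    "loquat": [7.0, 1.2, 8.0],
    "almond": [6.5, 0.9, 6.0],
    "avocado": [12.0, 1.8, 12.0],
    "lemon": [20.0, 2.5, 18.0],
}

def average_linkage(a, b, data):
    """Mean pairwise Euclidean distance between members of two clusters."""
    total = sum(math.dist(data[i], data[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(data):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(name,) for name in data]  # start with one cluster per sample
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], data))
        height = average_linkage(a, b, data)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        history.append((a, b, height))
    return history

history = agglomerate(prunings)
for a, b, height in history:
    print(f"{' + '.join(a)}  <->  {' + '.join(b)}  at distance {height:.2f}")
```

Each entry in `history` corresponds to one junction of the dendrogram, and the merge distance is the height of that junction. With these made-up values the sample with the highest macronutrient contents joins the tree last, which is the kind of separation described above for lemon prunings.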

Conclusions

Nutrient contents, organic matter, polyphenol contents, and contents of metals and metalloids are parameters which managers require for the recycling, upcycling, or revalorization of biomass wastes by different criteria. Full analyses of this type are costly, and the data can help orient stakeholders towards appropriate uses, identify new opportunities for waste valorization, and support more accurate balances of nutrient and matter flows within territories. Among the highly representative biomass waste types surveyed for this European region of agricultural importance, a large number of biomasses have been identified as having value as fertilizers or as sources of high-value phytochemicals, especially since very few of the surveyed materials presented limitations with regard to inorganic contaminants. The machine learning techniques applied here suggest an alternative vision for biomass categorization and management based on a comprehensive database of elemental contents. Visual inspection of clusters and dendrograms revealed differences among biomasses which may be perceived as similar, and awareness of these might be leveraged to improve valorization. On this basis, future studies with access to biomass production rates and spatial data can use these data to generate more reliable upscaling with a regional perspective, identifying potential value and opportunities regarding fertilizer replacement value, energy generation, phytochemical value, or carbon sequestration.