1 Introduction

Optimizing the food system from an environmental and human health perspective will be vital in supporting a sustainable future. A growing number of studies assess nutritional and environmental dimensions in tandem (Green et al. 2020; Springmann et al. 2016; Willett et al. 2019). Methods to accomplish this include nutritional life cycle assessment (n-LCA). One application of n-LCA measures environmental impacts against the nutritional value of food at different food levels (i.e., production systems, food items, and diets/food supply). This is opposed to standard LCA, in which impacts are measured against a mass or volumetric unit (e.g., impact per 1 kg of food). Alternatively, n-LCA can use nutrient or health metrics in the impact assessment phase, as done in the CONE-LCA framework (Stylianou et al. 2016), or one can use nutrition as the basis for allocation or system expansion. As n-LCA is a novel method, there are many open issues; here, we focus on nutrient profiling systems, which are increasingly used as functional units (FU) in LCA, although the findings are also relevant for metrics used in the impact assessment phase. Regardless, it is key to choose a nutrient metric with minimized bias based on the current scientific evidence. The terms nutrient index, nutrient metric, nutrient profiling system, and nutrient profiling algorithm are used interchangeably in the literature. To date, the most comprehensive assessment of nutrient metrics in LCA is the recent FAO paper (McLaren 2021), which established a basic foundation and high-level overview. Other papers in this realm include Bianchi et al. (2020), Green et al. (2020), and McAuliffe et al. (2019, 2023). While our focus is on nutrition as a function of food, other functions include pleasure, satiety, or beliefs; accordingly, other FUs are possible.

Currently, the use of nutrient metrics in n-LCA is highly variable, which can affect policy recommendations. A lack of transparency or standardization in metric use can allow for cherry-picking to make one food look more sustainable than another. A lack of standards for applying metrics can lead to different and conflicting outcomes across studies or offer room for greenwashing. While some of these occurrences are expected when using newer methods, metrics must be transparent to ensure comparability across studies and trust in results. In this domain, however, there is no commonly agreed-upon system to report how nutritional metrics are used in LCA. Consequently, we established the “points of differentiation” framework. Selecting various “points” will lead to different nutrient density scores. However, most studies do not explicitly consider these factors. Additionally, there is no agreed-upon framework for reporting these differences, and when studies do include these, they incorporate diverging aspects. This paper tests the use of these “points” and provides recommendations for future work to guide and streamline best practices for using nutrient metrics in n-LCA.

N-LCA can be relevant for various food system actors. Farmers can use it to measure the nutritional and environmental contribution their selected production practices confer and receive higher prices for foods with a stronger sustainability profile (Castro et al. 2018; Seo et al. 2017). For policymakers, developing better n-LCA methods will allow for more effective communication and policy setting because they can utilize simple but transparent metrics that effectively translate science. The industry currently uses nutrition metrics for product formulation; however, the combination with environmental facets will prove helpful in developing future products that are sustainable across multiple dimensions. Such actions are needed as consumers demand more transparent metrics that they can trust and understand (e.g., supermarket food labelling).

Nutrient metrics can address three major areas of concern for nutrition: (i) micronutrient deficiencies, (ii) undernutrition/hunger, and (iii) overweight/obesity. These areas are further linked to dietary non-communicable diseases (e.g., cancer, heart complications, and anemia) (WHO 2021), which dietary metrics, such as disability-adjusted life years (DALYs), can capture. Nutrient metrics can help to alleviate micronutrient deficiencies by weighting nutrients against deficiencies on either a population (public health needs) or an individual (personalized nutrition) basis, and they can also capture issues related to hunger or obesity by assessing under- or overconsumption of calories. Micronutrient deficiencies are primarily associated with populations in lower-income countries; however, subpopulations in high-income countries are at risk as well. For example, increasing iron and iodine deficiencies in women of reproductive age are of concern (Afshin et al. 2019). Moreover, some studies argue that we are likely underestimating deficiencies across all countries due to a lack of data (Hawkes et al. 2020; Hossain et al. 2020).

We classify nutrient metrics into three main categories: namely, nutrient adequacy, nutrient diversity, and nutrient quality. Nutrient adequacy metrics measure the amount of nutrients in a food item against recommended intake values, nutrient diversity metrics measure the diversity of nutrients in a food supply or diet (e.g., Rao’s quadratic entropy, Shannon’s diversity, and modified functional attribute diversity), and nutrient quality metrics measure the quality of a specific nutrient (e.g., amino acid content and digestibility for protein quality and the glycemic index for carbohydrates). Relatedly, bioavailability is also important. For example, iron quality is broadly determined by animal versus plant sources (i.e., heme versus non-heme) and zinc absorption is hindered by antinutrients such as phytate. Poor nutrient quality can lead to unknown deficiencies and a lack of nutrient diversity can lead to dietary risks or agricultural resilience challenges (Green et al. 2021). Over time, our food supply has been homogenizing; for example, 40% of our calories come from only three crops (FAO 2018). Additionally, diversity metrics may implicitly capture aspects of food consumption that we would otherwise ignore. Most sustainability studies only consider the adequacy category (Green et al. 2020) and, to a lesser but growing extent, protein quality.
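As a minimal sketch of one of the diversity metrics named above, Shannon's diversity can be computed from the shares that items contribute to a food supply. The function name, the choice of shares as input, and the numbers are assumptions made for illustration; they are not taken from the cited studies.

```python
import math

def shannon_diversity(shares):
    """Shannon index H = -sum(p * ln p) over the non-zero shares, which must sum to 1."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return -sum(p * math.log(p) for p in shares if p > 0)

# Illustrative shares only (hypothetical, not data from the paper).
even_supply = [0.25, 0.25, 0.25, 0.25]         # four items contributing equally
homogenized_supply = [0.85, 0.05, 0.05, 0.05]  # one item dominates the supply

h_even = shannon_diversity(even_supply)                # equals ln(4), the maximum for four items
h_homogenized = shannon_diversity(homogenized_supply)  # lower than h_even
```

A supply dominated by one item scores lower than an even one, which is the homogenization effect described in the paragraph above.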

When using nutrient metrics, there are two overarching issues to consider: the choice of metric and the assumptions behind it (i.e., how the metric is applied). We conceptualize these issues in the “points of differentiation” framework. The “points” include the selection of nutrients, weighting, energy standardization, context- and dietary-specific considerations, reference amount, across-the-board versus group-specific metrics, disqualifying nutrients, capping, processing quality, validation, and data quality. The choice of metric for the nutrient index in the impact assessment phase or n-FU can greatly influence results, as demonstrated in our case studies and previous reports (Green et al. 2021; McAuliffe et al. 2019; McLaren 2021). Accordingly, we develop the “points of differentiation” framework for nutrient adequacy metrics and provide commentary on diversity metrics where appropriate as these have not been tested extensively enough to offer robust recommendations. Nutrient quality metrics can vary a lot depending on which nutrient one is assessing. Thus, we do not explicitly develop a framework for this because recommendations cannot be generalized; nevertheless, we discuss how the “points” can apply to nutrient quality metrics when applicable.

The paper is structured in the following manner: The “Methods” section details the case studies and how we develop the framework. For each “point,” we provide an overview based on current understandings, offer new insights, and discuss recommendations. The “Results and discussion” section presents quantitative results from the case study detailing how and why “points” affect nutrient scores as well as how nutritionally-invested environmental impacts change under different n-FUs. We provide statistical analyses of results to further understand conclusions and their significance. This section also provides the “points of differentiation” framework with explicit recommendations for the use of these “points.” Lastly, we discuss options for moving forward with regard to this methodology.

2 Methods

The proposed framework builds upon an updated literature review related to nutrient metrics and their use in LCA. Different review papers have systematically assessed the role of nutrient metrics and their use in LCA and other sustainability assessments. The main areas these focused on include how nutrient metrics differ from one another with respect to weighting, across-the-board versus group-specific, criteria selection of nutrients, and capping (Bianchi et al. 2020; Green et al. 2020; McAuliffe et al. 2019). A previous paper applied the n-LCA framework to a global study of national food supplies and food items differentiated by regionality and explored other “points” such as energy standardization and dietary-specific metrics (Green et al. 2021). For this current paper, we have conducted an updated literature review to include additional papers of relevance. The Scopus query results were limited and were thus supplemented with Google Scholar and with the reference sections of key review papers in this space.

The aim of this current paper is to tie the findings from these three research endeavors together and to further synthesize and build upon their outcomes. Accordingly, we delve into significantly more detail regarding the nuances of using the “points” (e.g., what change can occur when using or excluding these “points” in different contexts). In this study, we also introduce the concept of context-specific metrics, expand the discussion of across-the-board vs. group-specific metrics with a quantitative example on nuts, provide a more nuanced discussion of disqualifying nutrients in light of epidemiological updates, and explore the role of fortification and other food matrix issues. Moreover, we comprehensively report on all points of differentiation and are, thus, able to discuss their use in relation to one another (e.g., what are the differences between a 100-kcal reference unit and energy standardization). Lastly, we provide guidance for the use of these “points” at different food levels. The first section of the “Methods” details how we developed this framework: for each “point,” we describe it as well as its use at different food levels. The second section details the case studies that illustrate the use of this framework.

2.1 Nutritional life cycle assessment

As described, n-LCA is the integration of nutrition and health metrics into standard environmental LCA. As shown in Fig. 1, there are two main options to integrate nutrient metrics. The first is to use the n-FU. The FU in LCA is the basis of comparison for different products and should represent the main function of the product or process being evaluated. The second option is to include nutrient metrics in the impact assessment phase (i.e., include it as a separate metric); all “points” relate to both options.

Fig. 1 Nutritional-LCA schematic. This schematic is based on the ISO 14040 standards (ISO 2006)

Additionally, when using a nutrient metric in LCA as the FU, there are two outcomes one must avoid; specifically, an FU must be neither a negative nor a fractional value. Negative values for nutrient metrics can be perceived as positive environmental impacts when combined with environmental impact data (Saarinen et al. 2017), and fractional values artificially inflate environmental impacts (Green et al. 2021). Fractional scores (i.e., values less than 1) are possible even when disqualifying nutrients are excluded.
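The arithmetic behind these two outcomes can be sketched as follows, assuming the n-FU is applied by dividing an environmental impact by the nutrient score. The helper name and all numbers are hypothetical and serve only to illustrate the effect.

```python
def impact_per_nfu(impact_per_100g: float, nutrient_score: float) -> float:
    """Environmental impact expressed per unit of nutrient score (hypothetical helper)."""
    if nutrient_score <= 0:
        # A negative divisor would flip the sign and read as an environmental benefit.
        raise ValueError("the n-FU must be positive")
    return impact_per_100g / nutrient_score

# Illustrative numbers only (not taken from the case studies).
impact = 2.0  # e.g., kg CO2-eq per 100 g of food

whole = impact_per_nfu(impact, 4.0)       # score above 1: impact shrinks to 0.5
fractional = impact_per_nfu(impact, 0.5)  # score below 1: impact inflates to 4.0
```

Dividing the same impact by a score of -0.5 would give -4.0, which is why the sketch rejects non-positive scores rather than returning a value.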

2.2 Developing the proposed “points of differentiation” framework

This section details the justification for the “points of differentiation” framework (Fig. 2). It describes each “point” and its relation to nutrient adequacy, nutrient quality, and nutrient diversity metrics for each food level described in Table 1.

Fig. 2 Stylized schematic of the points of differentiation framework

Table 1 Food levels and descriptions

2.2.1 Weighting

A nutrient index can be weighted or unweighted; if the former is considered, the basis for weighting must be determined. There is no universally agreed-upon method for weighting, and different options have been used for nutrient adequacy metrics, including weights based on micronutrient deficiencies, energy, or regression coefficients derived from relationships between nutrients and health (Green et al. 2020). Alternatives include the distance-to-target method used in environmental sciences to weight nutrient metrics by nutritional deficiencies and overconsumption (Bianchi et al. 2020; Ridoutt 2021) or assigned weighting factors based on expert opinion (Mozaffarian et al. 2021a). Most commonly, however, nutrient adequacy metrics are unweighted, meaning all nutrients are treated as equally relevant to the diet.
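To make the difference between weighted and unweighted indices concrete, the sketch below computes a simple adequacy score as a (weighted) mean of percent-DRI contributions. The function, the nutrient contents, the DRI values, and the weights are all illustrative assumptions, not the formulation of any specific published index.

```python
def adequacy_score(content, dri, weights=None):
    """Weighted mean of percent-DRI contributions; equal weights when none are given."""
    nutrients = list(content)
    if weights is None:
        weights = {n: 1.0 for n in nutrients}
    total_weight = sum(weights[n] for n in nutrients)
    weighted_sum = sum(weights[n] * 100.0 * content[n] / dri[n] for n in nutrients)
    return weighted_sum / total_weight

# Hypothetical food item and reference values (placeholders, not real DRIs).
content = {"iron_mg": 2.0, "vitamin_c_mg": 40.0}
dri = {"iron_mg": 10.0, "vitamin_c_mg": 80.0}

unweighted = adequacy_score(content, dri)  # (20 + 50) / 2 = 35.0
# Assumed deficiency-based weights: iron counts three times as much as vitamin C.
weighted = adequacy_score(content, dri, {"iron_mg": 3.0, "vitamin_c_mg": 1.0})  # 27.5
```

With these numbers the item scores lower once iron is weighted more heavily, because it supplies a smaller share of the iron reference value than of the vitamin C one.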

For nutrient quality metrics, studies have weighted the digestible indispensable amino acids score (DIAAS) value (a measure of protein quality) by the amount of protein in food items (Berardy et al. 2019), and other studies have used an unweighted approach. However, FAO cautions that when using this approach DIAAS values should be capped (i.e., values above 100 should not be used) (FAO 2013); capping is discussed in Section 2.2.7. For nutrient diversity metrics, some are weighted, whereas others are not. For example, Rao’s quadratic entropy can include weights based on the amount of different food items in the food supply (Bogard et al. 2018). In general, if data is available to weight the relative importance of nutrients to a diet it should be included; the best example of this would be to weight nutrients by micronutrient deficiencies in the diet (Avadí et al. 2014). In other cases, weighting may not be feasible, for example, when comparing phytochemicals in oils. In such a case where data quality is uncertain, an unweighted metric might be less biased.
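A minimal sketch of weighting DIAAS by protein content, with the cap at 100 applied as FAO cautions, might look like this. The helper name and the example values are hypothetical, and the exact formulation used in the cited studies may differ.

```python
def quality_adjusted_protein(protein_g, diaas, cap=True):
    """Protein amount scaled by DIAAS (in percent), with DIAAS optionally truncated at 100."""
    score = min(diaas, 100.0) if cap else diaas
    return protein_g * score / 100.0

# Illustrative item: 20 g of protein with a DIAAS of 120 (hypothetical values).
capped = quality_adjusted_protein(20.0, 120.0)               # 20.0 g
uncapped = quality_adjusted_protein(20.0, 120.0, cap=False)  # 24.0 g
```

Without the cap, the item would be credited with more quality-adjusted protein than it physically contains.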

2.2.2 Energy standardization

A common type of weighting is energy standardization (Estd), which we discuss as a separate “point” due to its relevance. With Estd, metrics are weighted by energy based on dietary need (i.e., scaled by the ratio of 2000 kcal to the kcal in the food item). The 2000 kcal value is a common reference for daily dietary needs, but in reality this number varies (Fern et al. 2015). Estd can increase the comparability of food items because foods have different nutrient and caloric densities (Fern et al. 2015), and the use of Estd can affect nutrient index scores and subsequent conclusions on nutrient densities (Green et al. 2021).

Estd is appropriate when comparing across food groups because the nutrients delivered per unit (e.g., 100 g) can vary greatly across foods such as lettuce versus chicken. It is less appropriate when the metric is already weighted by micronutrient deficiencies because energy standardization alters the actual nutrient amounts in a particular food item. Estd should not be applied at the dietary level since a food supply or diet is theoretically a complete set of nutrients (Green et al. 2021).
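The effect of Estd on a cross-group comparison can be sketched as below, assuming a score per 100 g is multiplied by 2000 kcal over the energy content of the item. The scores and energy contents are rough, illustrative values and not data from the case studies.

```python
REFERENCE_ENERGY_KCAL = 2000.0  # common reference value; actual needs vary

def energy_standardize(score_per_100g, kcal_per_100g):
    """Scale a per-100 g nutrient score by reference energy over the item's energy content."""
    if kcal_per_100g <= 0:
        raise ValueError("energy content must be positive")
    return score_per_100g * REFERENCE_ENERGY_KCAL / kcal_per_100g

# Hypothetical scores and approximate energy contents per 100 g.
lettuce = energy_standardize(score_per_100g=5.0, kcal_per_100g=15.0)
chicken = energy_standardize(score_per_100g=20.0, kcal_per_100g=165.0)

# Per 100 g, chicken scores higher (20 vs. 5); after Estd, the ranking reverses.
ranking_reverses = lettuce > chicken
```

This reversal is the kind of change in conclusions on nutrient density that the choice of Estd can produce.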

2.2.3 Context- and dietary-specific

Considering context-dependence is crucial to fully understand trade-offs in food systems. Most metrics are generic and do not include context- and dietary-specific considerations but including this “point” can confer more robust and actionable data that is usable by policymakers or other actors such as farmers or industrial food processors. Including the dietary context is useful since the “healthiness” of a food item can be greatly swayed by the overall diet. Relatedly, as demonstrated in previous studies, foods that are high in one specific nutrient will receive higher nutrient density scores. Still, such a score is only valid if the population of concern is deficient in that nutrient (Green et al. 2021). In this case, a context- and dietary-specific metric can avoid this issue because deficient nutrients in the diet will be weighted higher in the nutrient index. This “point” is predominately relevant for adequacy metrics. While it could be applied to nutrient quality metrics, data at the dietary level for this is extremely limited.

It should be noted that incorporating the dietary context does not necessarily mean the metric is context-specific. For example, the NRprot-sub, which was developed to explore trade-offs for protein-rich food items within the question of omnivore versus non-omnivore diets (Green et al. 2021), is dietary-specific but not context-specific because it was developed for a more global application (although it could be adapted for a specific country). The FSI20 metric (Green et al. 2022) is both context- and dietary-specific because it is composed of nutrients that are deficient in a specific dietary pattern for a particular population.

An acceptable way to incorporate this point of differentiation is to determine the nutrient needs of a population through dietary studies and then weight the metric accordingly. The challenge here is data quality. Dietary data can be estimated from intakes derived from supply-oriented studies or more specific dietary surveys; alternatively, nutrient status values can be adopted from serum/urine measurement studies. The latter is more robust and can better account for bioavailability. One study found that intake and serum/urine data were uncorrelated for many nutrients (Schüpbach et al. 2017). Other options include the nutrient quality index (NQI) metric, which assesses the sustainability of a food by evaluating its relevance within a diet against a benchmark food (Sonesson et al. 2019); however, this option is more time-intensive to calculate and difficult to interpret (McLaren 2021).

2.2.4 Reference amount

Nutrient adequacy metrics have been measured against various reference amounts, including 100 g, 100 kcal, serving size, and 100 g of dry matter. The advantages and disadvantages have been summarized in previous papers (Drewnowski et al. 2009, 2021). Most studies calculate metrics per 100 g (Mozaffarian et al. 2021a), but this can cause comparability issues because solids and fluids have different water contents (Drewnowski et al. 2009). It is possible that Estd can partially account for these differences since it also adjusts nutrient contents by energy density. A serving size reference amount may be more appropriate since it reflects actual consumption, although serving sizes vary by country (Drewnowski et al. 2021). For example, we use much less than 100 g of butter or oil in a serving, whereas a serving of milk is twice the 100 g reference. On the other hand, 100 g is an easy unit of comparison across food items and is used for food labeling in Europe (Drewnowski et al. 2021). Nevertheless, the use of a serving size is beneficial when comparing within food groups. Nutrient quality metrics are measured against reference dietary patterns (FAO 2013). However, these reference patterns only exist for a few age groups.
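Converting a nutrient amount between these reference amounts is simple arithmetic, sketched below. The helper names, the serving size, and the nutrient and energy values are assumptions chosen for illustration.

```python
def per_serving(amount_per_100g, serving_g):
    """Convert a per-100 g nutrient amount to a per-serving amount."""
    return amount_per_100g * serving_g / 100.0

def per_100kcal(amount_per_100g, kcal_per_100g):
    """Convert a per-100 g nutrient amount to a per-100 kcal amount."""
    return amount_per_100g * 100.0 / kcal_per_100g

# Hypothetical milk-like item: 120 mg calcium and 60 kcal per 100 g, 200 g serving.
calcium_per_serving = per_serving(120.0, serving_g=200.0)      # 240.0 mg
calcium_per_100kcal = per_100kcal(120.0, kcal_per_100g=60.0)   # 200.0 mg
```

The same item therefore delivers 120, 200, or 240 mg of the nutrient depending only on the reference amount chosen.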

2.2.5 Across-the-board versus group-specific

Metrics can be across-the-board, meaning the metric is applied to all food groups, or they can be group-specific wherein the metric is specific to a particular food group and is inclusive of nutrients relevant to that food group (Scarborough et al. 2010). Group-specific metrics elucidate which foods are more nutrient dense within a specific group and allow one to see variability within a group for specific nutrients of interest that can be selected based on consumer preference (e.g., consumers eat nuts for their protein and fatty acid contents). Such information is useful for policy makers to communicate public health claims regarding which foods people should eat; this can also inform producers when they select different varieties of nuts or use different production practices that confer or enhance more of a particular nutrient. In the past, these metrics have aided the industry in production reformulation and innovation (Drewnowski et al. 2021). On the other hand, a group-specific metric could miss other trade-offs with nutrients uncommonly associated with a specific food group. This example is more clearly illustrated in the case study.

Across-the-board metrics that are context-specific can be useful when trying to solve general issues of micronutrient deficiencies. This is because they can compare a vast range of food items to determine those that are useful in contributing to certain micronutrient intakes. Similar studies would also be important for examining the potential of traditional (e.g., indigenous and underutilized crops) or novel (e.g., future foods such as insects or algae) food items. On the other hand, group-specific metrics focus on foods within a certain food group, which means consumer acceptance might be higher for the proposed food since the foods are more similar (e.g., we can suggest substituting almonds for cashews to reduce riboflavin deficiencies, but suggesting egg as a substitution would be met with more consumer hesitance because adoption would require a greater shift in diet).

In the future, experts can identify specific nutrients to include in group-specific metrics when comparing within food groups, making studies more standardized and comparable. The concept of across-the-board and group-specific metrics predominately applies to nutrient adequacy metrics. Nutrient diversity metrics are normally composed of all nutrients relevant to a diet or system; likewise, amino acids are generally thought of as parts of a whole.

2.2.6 Disqualifying nutrients

The use of disqualifying nutrients is one of the more controversial points of differentiation. This point is only relevant to adequacy metrics: they are irrelevant to nutrient quality metrics and should be excluded from diversity scores (Green et al. 2021). The choice of including disqualifying nutrients in a nutrient metric is particularly relevant to n-LCA due to the risk of a negative FU and because some practitioners argue that the function of food is not to harm, and thus, FUs should only include qualifying nutrients (Saarinen et al. 2017). We argue that the decision to include disqualifying nutrients depends on the study.

Incorporating disqualifying nutrients reduces the likelihood of biasing results in favor of energy-dense foods (Green et al. 2021). Additionally, metric results are easier to communicate because all information is included within one visual metric. There are two options for integrating disqualifying nutrients into a nutrient index: one can solely consider their overconsumption or, alternatively, penalize the percentage of maximal reference values (MRV). MRVs indicate the maximum amounts one should consume of a particular nutrient. Considering the overconsumption of these nutrients is relevant for food supply or diets because these food levels should only be penalized for their overconsumption. This approach can also be useful for food items because some of these nutrients, like sodium, can and should be consumed in limited quantities. Additionally, it avoids penalizing foods that are otherwise nutrient dense in key nutrients; for example, foods such as dairy or nuts may receive lower scores because they are high in one disqualifying nutrient but are otherwise nutrient rich (Drewnowski et al. 2019). Nevertheless, while this approach may be warranted for more “natural” food items such as nuts or boiled eggs, for more processed products like bacon such an approach could be dangerous. On the other hand, the percent MRV approach accounts for the argument that certain nutrients should be consumed in the smallest amounts possible irrespective of the MRV; such nutrients include added sugars and trans fat. As mentioned earlier, if disqualifying nutrients are included, the threshold method (Green et al. 2021) should be applied to avoid fractional or negative scores.
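The two options can be contrasted in a short sketch. It assumes that the overconsumption option penalizes only the share above 100% of the MRV, while the percent-MRV option penalizes the full share; the function names and the MRV value are illustrative, and the threshold method of Green et al. (2021) is not reproduced here.

```python
def percent_mrv(amount, mrv):
    """Penalty under the percent-MRV option: the full share of the MRV consumed."""
    return 100.0 * amount / mrv

def overconsumption_penalty(amount, mrv):
    """Penalty under the overconsumption option: only the share above 100% of the MRV."""
    return max(0.0, percent_mrv(amount, mrv) - 100.0)

SODIUM_MRV_MG = 2000.0  # illustrative maximal reference value (assumption)

# Intake below the MRV: penalized only under the percent-MRV option.
below_full = percent_mrv(1200.0, SODIUM_MRV_MG)               # 60.0
below_over = overconsumption_penalty(1200.0, SODIUM_MRV_MG)   # 0.0

# Intake above the MRV: penalized under both options, but by different amounts.
above_full = percent_mrv(3000.0, SODIUM_MRV_MG)               # 150.0
above_over = overconsumption_penalty(3000.0, SODIUM_MRV_MG)   # 50.0
```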

The alternative option is to exclude disqualifying nutrients. If the study question is more specific, this option can be considered. Their influence should be accounted for within the impact assessment phase if excluded from the FU. This can be accomplished in a separate assessment of individual disqualifying nutrients, a holistic nutrient metric such as the LIM, or with a health metric such as DALYs. An example of the latter approach can be found in a previous study (Stylianou et al. 2016).

Regarding specific nutrients to incorporate as disqualifying nutrients, indices have evolved from including nutrients such as total fat to more specific nutrients such as saturated fat. Nevertheless, there is still discussion on which nutrients to include. For example, some metrics include total sugar because values for added sugar are infrequently reported. However, there is no daily recommended intake (DRI) for total sugars, so this penalizes foods high in natural sugar such as milk, which conflicts with certain dietary guidelines. Additionally, sodium is a nutrient that is detrimental in excess but still needed in small quantities; however, it is often included as a disqualifying nutrient. Lastly, some metrics include energy as a disqualifying nutrient to penalize foods that could contribute to obesity, but most do not. Relatedly, one index sought to incorporate nutrient overconsumption (Ridoutt 2021). Such a penalization is needed to address obesity concerns and to handle excessive fortification because consumption above tolerable limits for certain nutrients can induce health complications. Of course, the inclusion of either parameter is only appropriate for contexts in which these are key issues; for instance, in other contexts, hunger may be a more pertinent issue and thus energy would not be appropriate as a disqualifying nutrient.

Lastly, there has been a move towards including nutrient ratios because the presence of certain nutrients can mitigate the effect of disqualifying nutrients—e.g., potassium and sodium (Mozaffarian et al. 2021a). However, the inclusion of these ratios complicates how one should weigh them, as the ratio values can be orders of magnitude off from nutrient amounts. One option is to take the log of these ratios (Mozaffarian et al. 2021a), while another option is to provide a separate evaluation of these ratios (i.e., keep them as separate metrics).
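The scaling problem and the log option can be illustrated with a short sketch. The base-10 logarithm, the helper name, and the nutrient amounts are assumptions; the cited index may use a different base or transformation.

```python
import math

def log_ratio(beneficial_mg, disqualifying_mg):
    """Base-10 log of a nutrient ratio, e.g., potassium to sodium (base is an assumption)."""
    if beneficial_mg <= 0 or disqualifying_mg <= 0:
        raise ValueError("both amounts must be positive")
    return math.log10(beneficial_mg / disqualifying_mg)

# Hypothetical items whose raw potassium-to-sodium ratios differ by two orders of magnitude.
moderate = log_ratio(400.0, 40.0)  # raw ratio 10
extreme = log_ratio(400.0, 0.4)    # raw ratio 1000
```

On the log scale the two items differ by about 2 units rather than by a factor of 100, which keeps the ratio from dominating the other terms of an index.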

2.2.7 Capping

Capping refers to truncating nutrient metrics at 100% of DRI values or, in the case of nutrient quality metrics, at 100% of the reference amount. When capping, if a diet or item meets the nutritional requirements for all nutrients, then the maximum score is 100. The alternative is to leave nutrient metrics uncapped; in this case, the maximum score is undefined. Uncapped metrics mean that a food level can receive high scores due to excess concentrations of one or two nutrients, as demonstrated in a previous study with vegetables and vitamin A (Green et al. 2021). These excess nutrient amounts can also obscure the lack of overall nutrient density in other food items. However, these issues can be addressed using context-specific metrics because if a population is deficient in a nutrient that a food item is excessively high in, the higher index score is justified. For example, vitamin A deficiencies are common in lower-income nations particularly among children; however, in higher-income nations such as Switzerland, vitamin A is not a nutrient of concern. Capping is particularly relevant for fortified foods. Foods such as cereals or juices fortified with high amounts of certain nutrients can receive higher index scores despite being less nutrient dense overall. While the literature has not explored this issue in detail, one index did suggest capping fortified nutrients and leaving naturally occurring nutrients uncapped (Katz et al. 2010).

In general, we recommend capping nutrients at the food supply and diet level but leaving metrics uncapped at the food item or production system level because excessive nutrients in one food item can compensate for the lack of nutrients in another within a diet or food supply (Green et al. 2020). The decision to include capping may vary depending on the nutrient, for example, on whether we can accumulate the nutrient (i.e., fat-soluble nutrients) rather than excrete its excess. In these cases, we can absorb more of a nutrient than DRI requirements, and this would be important for nutrients that are commonly deficient. Of course, care would need to be taken with nutrients that can accumulate and become toxic, like vitamin A. The food supply and diet levels are theoretically complete sets of food; thus, consuming nutrients beyond this does not confer any additional health benefits. Accordingly, scores should be capped. For nutrient quality metrics, FAO recommends food items should be uncapped unless one is multiplying the DIAAS value by the amount of protein in the food item; when assessing diets or food supply, capped values should be used (FAO 2013). Lastly, capping is relevant for energy standardized metrics. A previous study showed how vegetables and seafood were the most nutrient dense food groups on an uncapped basis, but on a capped basis, seafood ranked more similarly to other food groups like fruits and roots and tubers (Green et al. 2021). This difference in outcomes would affect policy recommendations on foods to consume.
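A minimal sketch of capping, assuming the index is a simple mean of percent-DRI values, shows how one excessive nutrient can dominate an uncapped score. The function name and the percentages are hypothetical.

```python
def mean_percent_dri(percent_dri, cap=None):
    """Mean of percent-DRI values, each optionally truncated at `cap`."""
    values = [min(p, cap) if cap is not None else p for p in percent_dri]
    return sum(values) / len(values)

# Hypothetical item: very high in one nutrient (300% of DRI), low in two others.
percent = [300.0, 10.0, 10.0]

uncapped = mean_percent_dri(percent)           # about 106.7, driven by one nutrient
capped = mean_percent_dri(percent, cap=100.0)  # 40.0
```

Uncapped, the item appears to exceed full adequacy; capped, its low content of the other two nutrients becomes visible.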

2.2.8 Validation

Validation refers to validating metrics to determine their level of accuracy in ranking foods, but most metrics are un-validated (Mozaffarian et al. 2021a). The WHO recommends various options for validation, distinguished by their complexity. Options include comparing un-validated indices against validated ones, choosing indicator food items known to be “healthy” or “unhealthy,” and comparing rankings from the un-validated index to these indicator foods (WHO 2011). More complex methods involve comparisons against experimental study outcomes or establishing whether “healthy foods,” determined by the nutrient metric, constitute healthy diets, which would be defined by an independent and preferably validated dietary quality index (Fulgoni et al. 2009; O’Hearn et al. 2022; WHO 2011).
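One simple way to compare rankings from an un-validated index against a validated one is a rank correlation. The sketch below uses Spearman's coefficient, which is our illustrative choice and not a method prescribed by the WHO; the scores are hypothetical and the implementation assumes no tied values.

```python
def ranks(values):
    """Rank positions (1 = lowest value); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(scores_a, scores_b):
    """Spearman rank correlation for two equally long score lists without ties."""
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two lists of equal length with at least two items")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(scores_a), ranks(scores_b)))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores for the same four foods under two indices.
validated_index = [10.0, 20.0, 30.0, 40.0]
unvalidated_index = [1.0, 3.0, 2.0, 4.0]

rho = spearman_rho(validated_index, unvalidated_index)  # 0.8: two adjacent foods swap ranks
```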

Validation is not always straightforward, particularly with more novel food items such as plant-based burgers for which benchmarks and dietary data are lacking. Additionally, indicator foods are more challenging to define in more granular food groups because fewer robust epidemiological studies have been conducted for these, unlike for large food groups or nutrients for which we have more data (e.g., added sugar and processed red meat are detrimental to health). Best methods for validation in the context of fortified foods need further investigation. To date, validation has only been applied to nutrient adequacy metrics; without such precedent, validating quality and diversity metrics would prove difficult.

2.2.9 Selection of nutrients and ingredients and processing quality

The choice of specific nutrients to include is a key aspect of using nutrient metrics. Certain indices aim to have all essential nutrients for which data is available [e.g., nutrient balance concept (Fern et al. 2015)], while others only include certain nutrients based on selection criteria. Consequently, various nutrient metrics exist with different nutrients as well as a different number of nutrients.

In general, for nutrient adequacy metrics, we recommend selecting nutrients relevant to the population of interest by including nutrients for which nutrient deficiencies exist or that are relevant to dietary health concerns or national public health policies. The WHO has stated that nutrient metrics should be relevant to the needs of public health for each country (WHO 2011). For example, one document discusses the different nutrient profiling approaches of various countries: Mexico focuses on addressing obesity; Thailand has significant dental issues and, consequently, sugar is a key focus (WHO 2011); and the UK is concerned with obesity and accordingly includes energy as a disqualifying component (Drewnowski et al. 2021).

Alternatively, when studying production systems, one could also select nutrients relevant to changes in management practices (e.g., fatty acids if assessing feed formulations or vitamins if assessing LED lighting). For nutrient quality, it is common to include all essential amino acids or only the limiting amino acid as done in the DIAAS score. Including all essential and non-essential nutrients is the best option for nutrient diversity. Here, nonessential nutrients or other components, for which there is evidence of health benefits (e.g., phytochemicals and antioxidants), can be included more easily as nutrient amounts do not have to be measured against DRI values.

Lastly, ingredients such as additives should ideally be incorporated as disqualifying nutrients when shown to confer detrimental health effects. Unfortunately, data in this space is limited. However, with the adoption of more poorly processed foods (e.g., diet sodas) (Morrison 2022) or certain plant-based alternatives, foods high in additives such as thickeners, stabilizers, and colorings should be assessed. Products can be, and are, made without these (i.e., clean label products), but there are still many products with additives that can potentially harm health. Poorly processed foods result from low-quality processing, meaning that unnecessary additives are used, beneficial nutrients are destroyed, or disqualifying nutrients are added in harmful amounts (i.e., close to or above MRV). Processing beef into sausage or pork into bacon has effects on food functionality that can be harmful to health. Home cooking also impacts the quality of food; for example, frying destroys a larger amount of nutrients compared to boiling. In contrast, processing can be beneficial by destroying anti-nutrients (which inhibit the absorption of beneficial nutrients like iron or zinc) or by removing food safety risks such as harmful microbes. As evidenced, processing can negatively impact nutritional compositions (e.g., nutrient degradation), and more targeted metrics are needed to classify these actions, specifically with regard to poorly processed foods (Braesco et al. 2022), for incorporation into nutrient metrics.

2.2.10 Interpretation/data quality

The last two points of interpretation and data quality should be evaluated in tandem with the other “points.” Data quality should be considered with respect to the environmental and nutrient databases used (e.g., whether the database is specific to the region or globally averaged values are being used). Additionally, statistical and uncertainty analyses are lacking in n-LCA. As explained, the interpretation “point” is particularly relevant for the disqualifying nutrients and processing quality “points.” When utilizing the nutrient metrics as the FU, there are important interpretation aspects to consider when applying the threshold method and when using a contingent vs. non-contingent metric. This is because it is important to understand if environmental impacts change proportionally with the FU; this happens with certain nutrient metrics since the environmental impact will always be calculated with the same amount (Saarinen et al. 2017).

The threshold method is employed to avoid negative or fractional scores. However, fractional scores can occur even without the inclusion of disqualifying nutrients. With this method, interpretation becomes critical. If the nutrient score is 1 then there is no nutritional gain for the environmental impacts of foods (i.e., the score on a mass and nutrient basis is the same). This of course partially undercuts the value of foods that only score slightly higher than 1; however, for hotspot analyses such an approach is useful.

A nutrient metric can be contingent or non-contingent. A non-contingent metric has an absolute and independent maximum (Green et al. 2021) and there is a meaningful unit increase throughout the metric. A capped nutrient metric is an example of a non-contingent metric. The absolute maximum is 100% and each incremental increase (i.e., 1% increase) means we are 1% closer to reaching 100%. However, an uncapped nutrient metric or one that is Estd has an independent maximum but not an absolute one, and each incremental increase does not have a clear interpretation with respect to reaching the maximum. A diversity metric, for comparison, is a contingent metric because its multidimensional nature means the value changes relative to the composition of the group (Green et al. 2021). Interpretation with a non-contingent metric is relatively more straightforward if one is examining absolute values of nutritionally-invested environmental impacts because impacts change proportionally to changes in the FU; this does not occur with contingent metrics. This is another reason why using relative rankings as opposed to absolute values of nutritionally-invested environmental impacts is warranted. Secondly, it is important to understand a nutrient metric used as the FU can be conceptualized as a quality-corrected FU (not to be confused with a nutrient quality metric). Related to the above issue of FUs and proportionality, a quality-corrected FU captures the issue that in LCA the FU should change environmental impacts proportionally, thus the n-FU can be defined as: (food amount) * (nutrient index). This changes the absolute values in comparison to using the nutrient metric as is, but not the relative rankings since changes are proportional (i.e., the interpretation of the results is not affected when adopting this approach).
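The quality-corrected FU and the threshold method described above can be illustrated with a minimal sketch. It assumes that the threshold method raises any nutrient score below 1 to 1 (our reading of the statement that a score of 1 gives the same result on a mass and nutrient basis); the function names and all input values are hypothetical and not taken from the case study.

```python
def thresholded_index(nutrient_index: float, floor: float = 1.0) -> float:
    """Threshold method (assumed form): scores below the floor are raised to it,
    so the FU is never negative or fractional."""
    return max(nutrient_index, floor)


def impact_per_nfu(impact_per_kg: float, food_amount_kg: float, nutrient_index: float) -> float:
    """Impact per quality-corrected FU, with n-FU = (food amount) * (nutrient index)."""
    n_fu = food_amount_kg * thresholded_index(nutrient_index)
    total_impact = impact_per_kg * food_amount_kg
    return total_impact / n_fu


# Hypothetical foods: (impact per kg, nutrient index)
foods = {"food_a": (2.0, 0.4), "food_b": (2.0, 2.0), "food_c": (2.0, -0.5)}
for name, (impact, index) in foods.items():
    print(name, impact_per_nfu(impact, 1.0, index))
```

In this sketch, foods scoring below the floor keep their mass-based impact, while foods scoring above it have their impact divided by the score. Because the food amount appears in both the impact and the n-FU, changing it does not change the result, which is the proportionality property discussed above.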

2.3 Case study description

Our case study focuses on various food groups representative of a food supply. As the exemplary metrics, we use the NR (Drewnowski and Fulgoni 2008) and NBC (Fern et al. 2015). We chose the NR metrics as one of the bases because the NR9 or NRF9.3 is the most commonly used index (Green et al. 2020); moreover, the NBC is also being used more frequently. For nutrient data, we used values from the USDA (Table S1) and average DRI values from the Institute of Medicine for a 19–50-year-old female (Institute of Medicine 2019). Added sugar was estimated by considering 80% of sugars in processed foods, as similarly done in a previous FAO study (McLaren 2021), because the USDA database does not give specific values for sugar types.

For this case study, we include foods based on environmental data availability from Poore and Nemecek (2018). Their food groups capture 90% of calories consumed globally, so we selected foods in these categories and matched them to environmental impacts (Table S2). We chose multiple foods in each composite category (e.g., seafood) unless the category was already for an individual food item (e.g., apple). We selected more standard forms of cooked foods (e.g., ground beef, broiled pork chops, or roasted chicken, and not bacon or sausages, which have a multitude of additional ingredients). For vegetables, we also included cooked forms. In total, we assessed 144 food items that we classified into 34 food groups and then into 11 larger groups for visualization purposes: fruits, vegetables, tubers, grains, fortified foods, vegetarian animal-sourced foods (ASF; i.e., cheese, milk, and eggs), other, meat, seafood, pulses, and nuts. The quintile rankings for the nutrient indices run from lowest nutrient density (0) to highest nutrient density (4), and those for the environmental impacts run from most environmentally friendly (0) to least (4). We also include more processed versions of these foods to demonstrate the role of fortification and nutrient loss when foods are processed or fortified. For example, we include potatoes and French fries as well as oatmeal and sugary breakfast cereals made from oats. We use the environmental impacts from cradle to retail. For similar products, we use the same environmental impacts, as this is the best available data. Accordingly, for potatoes and French fries (which are a standard form of potatoes without significant addition of extra ingredients), we use the impacts of potatoes and for fortified foods, we use the environmental impacts of their unfortified counterpart.

We calculate absolute values for nutrient indices as well as quintile rankings for food groups. To determine the effect of a “point” we isolate its effect by holding all other “points” constant. For example, to assess the effect of Estd, we conduct a pair-wise comparison of comparable metrics that are energy standardized and for which all other “points” are the same (e.g., only qualifying nutrients, the same application of capping, unweighted, etc.). For each “point,” we present absolute nutrient densities for the large food groups as well as differences in quintile rankings (i.e., the shift in quintile ranking, which are groupings based on nutrient density, due to the effect of a “point”). Quintile rankings are important because policy, public communication, and food labelling are often made based on relative differences between groups. For example, foods are scored and then grouped for communication purposes (e.g., Nutri-Score). For nutritionally-invested environmental impacts, we are predominately interested in relative differences, so these are assessed with quintile rankings; here, the absolute values are less relevant and interpretable as explained in the previous section. For the nutrient indices, we are interested in how similar scores are to all other nutrient indices and thus calculate Spearman rank correlation coefficients. On the other hand, for the combined sustainability results, we are interested if there is a statistical difference in environmental results from before and after the use of a specific nutrient-based FU, and this is determined by the Wilcoxon signed rank test. For example, we want to determine if there is a difference in nutritionally-invested environmental impacts before applying the “point” of weighting and after. Additionally, we measure the coefficient of variation to determine the effect of nutrient metric choice when used as the FU.
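The pair-wise comparison described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the food list and scores are hypothetical, the helper names are our own, and assigning quintiles by rank position is an assumption about how the groupings were formed.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def quintile_ranks(values):
    """Assign 0 (lowest fifth) to 4 (highest fifth) by rank position (assumed rule)."""
    n = len(values)
    return [min(4, int((r - 1) * 5 / n)) for r in average_ranks(values)]


# Hypothetical scores for ten foods under a base metric and the same metric
# with one "point" changed (all other "points" held constant).
foods = ["apple", "spinach", "potato", "rice", "milk",
         "egg", "beef", "salmon", "lentils", "almonds"]
base = [12, 45, 18, 9, 20, 30, 33, 40, 38, 36]
with_point = [23, 196, 23, 7, 33, 21, 13, 19, 33, 6]

rho = spearman_rho(base, with_point)
shifts = [b - a for a, b in zip(quintile_ranks(base), quintile_ranks(with_point))]
print(f"rho = {rho:.2f}")
print(dict(zip(foods, shifts)))
```

A positive shift means the food moves to a more nutrient-dense quintile once the “point” is applied, matching the sign convention used in the figures.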

For communication purposes regarding the “points,” we refer to certain nutrient metrics by other names. For example, the NR_A, NR_B, NR_G, and NR_H are variations of the NR9 with different “points” applied and the NR_D and NR_E are variations of the NRF9.3 (i.e., NR9 with disqualifying nutrients) with different “points” applied. The NR metrics are composed of 9 nutrients: iron, protein, fiber, vitamin A, vitamin C, vitamin E, calcium, magnesium, and potassium. The NBC metrics include all essential nutrients for which we have data, which is the essence of the NBC. In our study, we include protein, fiber, calcium, iron, magnesium, phosphorus, potassium, zinc, copper, selenium, vitamin C, thiamin, riboflavin, niacin, vitamin B6, vitamin A, folate, vitamin E, vitamin D, vitamin K, choline, and vitamin B12. The NRprot-sub includes nutrients relevant to non-omnivore diets (protein, riboflavin, vitamin B12, iron, and calcium); all of these are commonly cited nutrients of concern when assessing vegan and vegetarian products for dietary substitution (Green et al. 2021). However, the metric can be adapted based on population specificities or updated scientific information regarding non-omnivore diets. The FSI metric includes all nutrients for which nutrient deficiencies exist in a specific population for specific dietary patterns (in this case, we focus on American omnivores); each nutrient is weighted so that nutrients in which people are strongly deficient receive a higher weighting. The disqualifying nutrients included are sodium, added sugar, and saturated fat. The equations of indices not listed in Table 2 can be calculated by adapting the base metric with the equations for the “points.” The base metrics and their associated “points” are listed in Table 3.

Table 2 Equations for nutrient metrics and points of differentiation
Table 3 Points of differentiation for each nutrient metric

3 Results and discussion

3.1 Case study results: nutrient metrics

First, we discuss how the choice of nutrient metric can affect quintile rankings of food groups, and then we detail how different “points” contribute to these differences. As evidenced in Table S3, the choice of nutrient metric can have a massive effect on quintile groupings of foods. For example, the group onions and leeks or pork can range from the bottom quintile ranking to the highest, depending on the metric used. From this, it is clear that applying indices with different “points” leads to different results (e.g., an index with 22 nutrients that is energy standardized will give different results than one that is not energy standardized, is capped, and has 9 nutrients).

It is also important to explore how and why these “points” have an effect. As explained, we isolate the effect of each “point” by holding all other points constant. For the nutrient indices, we calculate Spearman rank correlations (Fig. S1). In summary, the “points” that have the largest influence on nutrient density scores are reference amount, disqualifying nutrients (depending on how they are applied), Estd, and dietary specificities, as determined by weak correlations, larger shifts in quintile rankings, and changes in nutrient densities.

3.1.1 Weighting

In Fig. 3, we compare the FSI (weighted by nutrient deficiencies) and NR_I (unweighted). With the weighted metric, food groups such as seafood and nuts have much higher nutrient densities, meaning public policy communication would emphasize such food groups to address deficiencies. Tubers, vegetables, and other have positive quintile ranking shifts, meaning these are categorized into more nutrient-dense food groups with the weighted metric. However, the changes in ranks are smaller when compared to other “points” for which quintiles can shift up to four positions; this is expected because the weighted and unweighted metrics are strongly correlated (ρ = 0.84).

Fig. 3

Effect of weighting. These graphs compare a weighted (FSI) to an unweighted (NR_I) nutrient metric. A Nutrient densities of weighted and unweighted nutrient metrics. B Differences in quintile rankings of food groups with inclusion of weighting. A positive value means a food group moved from a lower quintile group, characterized by a less favorable nutrient density, to a higher quintile group, characterized by a more favorable nutrient density. A negative difference indicates the food group moves from a higher quintile to a lower one

3.1.2 Energy standardization

For Estd, we examine the NR_A versus NR_G (ρ = 0.013) and NR_I versus QI (ρ = 0.094). In both cases, there is a weak correlation, signifying that Estd strongly affects nutrient index scores. In Fig. 4, compared to the other food groups, fruits, seafood, and vegetables have much higher nutrient densities when energy standardized, reflecting their lower energy density. With regard to ranking shifts, grains and seafood change the least; seafood already ranked high with the non-Estd metrics and thus could only improve slightly. Vegetables and fruit, on average, move to much higher nutrient density quintiles. Nuts and other, on average, score much worse when energy standardized, as evidenced by their large, negative quintile shifts. To further explore the aforementioned conclusions, we regressed the Estd scores against kcal (Fig. S2). For the NR_A we find a moderate power relationship (R² = 0.559) and a weak one for the QI metric (R² = 0.352). Nevertheless, the general trend was that foods with a lower caloric density receive higher scores, which is logical since the Estd weighting factor will be higher. The key outliers to this are fortified foods, foods with a low caloric density but also a low nutrient density (i.e., in the bottom quintile), and foods with a high caloric density but also high nutrient densities (i.e., the top quintile).
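A power relationship of the kind reported above can be fitted as in the following sketch. It assumes an ordinary least-squares fit on log-transformed data, with R² computed on the log scale; the fitting procedure behind Fig. S2 is not specified here and may differ. The function name and data are hypothetical.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on (log x, log y).

    Returns (a, b, r_squared), where r_squared is computed on the log scale.
    All x and y values must be strictly positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Hypothetical energy densities (kcal per 100 g) and Estd scores
kcal = [20, 50, 100, 200, 400, 600]
estd_score = [310, 120, 90, 35, 30, 12]

a, b, r2 = fit_power_law(kcal, estd_score)
print(f"score = {a:.1f} * kcal^{b:.2f}, R2 = {r2:.3f}")
```

A negative exponent b corresponds to the trend described above, in which foods with a lower caloric density receive higher Estd scores.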

Fig. 4

Effect of energy standardization (Estd). These graphs compare the NR_A to the NR_G (Estd) and the NR_I to the QI (Estd) metrics. A Nutrient density scores when examining Estd. B Differences in quintile rankings with the inclusion of energy standardization. This graph compares the NR_A against NR_G (Estd) and the NR_I versus the QI (Estd). A positive value means a food group moves from a lower quintile group characterized by a less favorable nutrient density to a higher quintile group characterized by a more favorable nutrient density. A negative difference indicates the food group moves from a higher quintile to a lower one

3.1.3 Context- and dietary-specific

Here, we compare the dietary NRprot-sub metric, which is specific to nutrients of concern for non-omnivores, to generic nutrient metrics applicable to all populations and contexts, namely the NR_G and QI. The FSI was discussed in the weighting section. When compared against general nutrient metrics, we see a ρ of 0.35 for the NR_G and 0.66 for the QI. As expected, ASF foods like meat, vegetarian ASF (e.g., eggs and cheese), and seafood rank much higher than plant-based foods, which do relatively worse under the NRprot-sub metric (Fig. 5). Fortified foods also score as more nutrient dense with the dietary-specific metric. This has implications when recommending food items to improve health, depending on whether someone is a pescatarian, vegetarian, or vegan. Pulses also have a high nutrient density with the NRprot-sub, but due to their already high nutrient density with the generic metrics, they do not shift rankings under this “point.”

Fig. 5

Effect of dietary specificities. These graphs compare the NRprot-sub (dietary specific) metric to the QI and NR_G (generic) metrics. A Nutrient densities of dietary specific and generic nutrient metrics. B Difference in quintile rankings when including dietary specificities. A positive value means a food group moves from a lower quintile group characterized by a less favorable nutrient density to a higher quintile group characterized by a more favorable nutrient density. A negative difference indicates the food group moves from a higher quintile to a lower one

3.1.4 Disqualifying nutrients

One can include disqualifying nutrients in the nutrient index or assess them separately. Reasons for the latter are previously explained, but in summary, the moderating effect of qualifying nutrients and the debated effect of nutrients depending on the food matrix make the interpretation of disqualifying nutrients more difficult. We compare the NR_A with the NR_D, and the QI with the NBC_A and NBC_B. NBC_A and NBC_B differ in the treatment of disqualifying nutrients (i.e., whether we penalize foods based on percent of MRV (NBC_A) or on overconsumption of MRV (NBC_B)); the benefits and drawbacks of these approaches are discussed in the methods section detailing the framework development.

Based on Fig. 6, when we compare the NR_A and NR_D (ρ = 0.44), fruits, vegetables, and pulses have strong and positive quintile shifts because high LIM scores penalize the overall nutrient density of groups such as meat that had higher nutrient density scores with the NR_A. Contrastingly, the high LIM of nuts did not impact nutrient density scores as intensely as it did for the ASF foods (as evidenced by small quintile shifts). Fortified foods of cereal and nut butters also have a high LIM score; while they score lower with the NR_D, they are still ranked in the top quintile under both metrics.

Fig. 6

Effect of disqualifying nutrients. These graphs compare the NR_A to the NR_D (inclusive of disqualifying nutrients) metric and the QI metric to the NBC_A (inclusive of disqualifying nutrients, with penalization based on % MRV) and NBC_B (inclusive of disqualifying nutrients, with penalization based on overconsumption) metrics. A Nutrient densities of metrics with and without inclusion of disqualifying nutrients (not energy standardized). B Nutrient densities with and without inclusion of disqualifying nutrients (Estd indices). C Difference in quintile rankings with inclusion of disqualifying nutrients. A positive value means a food group moves from a lower quintile group characterized by a less favorable nutrient density to a higher quintile group characterized by a more favorable nutrient density. A negative difference indicates the food group moves from a higher quintile to a lower one

This picture changes partially when considering energy standardized metrics; these metrics are more strongly correlated. Nevertheless, the general trend of fruits, vegetables, and pulses having large quintile shifts holds, and ASF (except for seafood) still shift to lower nutrient density quintiles, although to a smaller extent. Finally, there is a strong correlation (ρ = 0.97) between the NBC_A and NBC_B metrics, and NBC_A nutrient density scores are lower for foods higher in disqualifying nutrients.

Lastly, when using nutrient metrics in the FU, the threshold method must be applied to avoid fractional or negative values for the FU. With this application, metrics are strongly correlated (NR_D vs. NR_E, ρ = 0.99; NBC_A vs. NBC_C, ρ = 1; NBC_B vs. NBC_D, ρ = 1).

3.1.5 Capping

We compare the NR_A and NR_B, which are non-energy standardized metrics on an uncapped and capped basis, respectively, and the NR_G vs. NR_H, which are energy standardized metrics on an uncapped and capped basis. Based on Fig. 7, for the non-energy standardized metrics, capping makes a minimal difference in absolute values (ρ = 1), because few foods provide more than 100% of DRI requirements. An exception is fortified foods, which on an absolute basis can be more nutrient dense than all other foods even when capping is applied. This poses the question of how best to account for these within nutrient metrics. For example, is the high LIM score of fortified cereals excused since they are also reasonably nutrient-dense (albeit via fortification)?
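The effect of capping on a non-energy standardized score can be sketched as below. The sketch assumes an unweighted mean of DRI fractions, with each nutrient truncated at 100% of DRI when capping is applied; the exact equations are those in Table 2 and may differ. The DRI values and both foods are illustrative only.

```python
def nutrient_score(amounts, dri, capped):
    """Mean fraction of DRI met across nutrients (assumed form of the metric).

    With capping, each nutrient contributes at most 100% of its DRI,
    so the score has an absolute maximum of 1.0 (i.e., 100%).
    """
    fractions = [amounts[n] / dri[n] for n in dri]
    if capped:
        fractions = [min(f, 1.0) for f in fractions]
    return sum(fractions) / len(fractions)


# Illustrative daily reference values (mg) and hypothetical foods (mg per 100 g)
dri = {"vitamin_c": 75.0, "iron": 18.0, "calcium": 1000.0}
plain_food = {"vitamin_c": 10.0, "iron": 1.8, "calcium": 100.0}
fortified_food = {"vitamin_c": 60.0, "iron": 36.0, "calcium": 100.0}

for name, food in [("plain", plain_food), ("fortified", fortified_food)]:
    uncapped = nutrient_score(food, dri, capped=False)
    capped = nutrient_score(food, dri, capped=True)
    print(f"{name}: uncapped = {uncapped:.3f}, capped = {capped:.3f}")
```

Under these assumptions, capping leaves the plain food unchanged because none of its nutrients exceeds 100% of DRI, and it lowers the score of the fortified food only for the nutrient supplied in excess.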

Fig. 7

Effect of capping. These graphs compare the NR_B and NR_H (capped) to the NR_A and NR_G (uncapped) metrics. A and B Nutrient densities of capped and uncapped metrics. A compares capping for non-energy standardized metrics and B compares capping for energy standardized metrics. C Differences in quintile rankings with inclusion of capping. A positive value means a food group moves from a lower quintile group characterized by a less favorable nutrient density to a higher quintile group characterized by a more favorable nutrient density. A negative difference indicates the food group moves from a higher quintile to a lower one

For energy standardized metrics, capping affects results more clearly. While the correlation between these two is strong (ρ = 0.84), moderately strong shifts in quintile rankings and nutrient density scores are observed; such fluctuations would affect policy recommendations. Capping appears to penalize nutrient dense foods. On an uncapped basis, fruits, seafood, and vegetables have the highest nutrient density scores. On a capped basis, these food groups score relatively worse, while pulses, grains, and tubers score relatively better. Additionally, absolute values of nutrient density scores are similar across food groups on a capped basis; this lack of differentiation would hinder specific recommendations. Lastly, capping appears to have differing effects on fortified foods.

3.1.6 Reference amount

The differences between 100 g and 100 kcal have been extensively explored, and here we also find that results can substantially vary depending on the reference amount (ρ = 0.013 between NR_A and NR_C and ρ = 0.094 between NR_I and NR_J). Based on Fig. 8, fruits and vegetables have higher nutrient densities on a kcal basis, while fortified foods, nuts, and grains have lower nutrient densities. For quintile rankings, most food groups have large shifts, which further supports the conclusion that the reference amount of 100 g vs. 100 kcal can strongly influence results. Additionally, we compare metrics with a 100 kcal reference unit to Estd metrics, since, in theory, they should have a similar effect on results (e.g., reducing bias due to water content). We found ρ = 1 for both the QI vs. NR_J and the NR_C vs. NR_G comparisons, indicating a perfect correlation in both cases, and the quintile rankings are the same for all foods.
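One way to see why a 100 kcal reference amount and Estd can yield identical rankings is the following sketch. It assumes an uncapped, unweighted sum of DRI fractions and a form of Estd in which the nutrient-to-energy ratio of the food is divided by the DRI-to-daily-energy ratio; the 2000 kcal daily energy value, the function names, and all food data are assumptions for illustration, and the equations in Table 2 may differ.

```python
def score_per_100g(amounts_per_100g, dri):
    """Uncapped, unweighted sum of DRI fractions per 100 g (assumed form)."""
    return sum(amounts_per_100g[n] / dri[n] for n in dri)


def score_per_100kcal(amounts_per_100g, kcal_per_100g, dri):
    """The same score rescaled to a 100 kcal reference amount."""
    return score_per_100g(amounts_per_100g, dri) * 100.0 / kcal_per_100g


def score_estd(amounts_per_100g, kcal_per_100g, dri, daily_kcal=2000.0):
    """Energy standardized score: (nutrient / food energy) / (DRI / daily energy)."""
    return sum(
        (amounts_per_100g[n] / kcal_per_100g) / (dri[n] / daily_kcal) for n in dri
    )


# Illustrative DRI values (mg) and hypothetical foods: (kcal per 100 g, mg per 100 g)
dri = {"vitamin_c": 75.0, "iron": 18.0, "calcium": 1000.0}
foods = {
    "leafy_vegetable": (23.0, {"vitamin_c": 28.0, "iron": 2.7, "calcium": 99.0}),
    "nut": (579.0, {"vitamin_c": 0.0, "iron": 3.7, "calcium": 269.0}),
    "grain": (130.0, {"vitamin_c": 0.0, "iron": 0.2, "calcium": 10.0}),
}

for name, (kcal, amounts) in foods.items():
    per_kcal = score_per_100kcal(amounts, kcal, dri)
    estd = score_estd(amounts, kcal, dri)
    print(f"{name}: per 100 kcal = {per_kcal:.3f}, Estd = {estd:.3f}, ratio = {estd / per_kcal:.1f}")
```

Under these assumptions, the two scores differ only by the constant factor daily_kcal / 100, so they rank all foods identically, which is consistent with a rank correlation of 1.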

Fig. 8

Effect of reference amount. These graphs compare the NR_A and NR_I (100 g) to the NR_C and NR_J (100 kcal) metrics. A Nutrient densities measured against different reference amounts. B Differences in quintile rankings with different reference amounts. A positive value means a food group moves from a lower quintile group characterized by a less favorable nutrient density to a higher quintile group characterized by a more favorable nutrient density. A negative difference indicates the food group moves from a higher quintile to a lower one

3.1.7 Selection of nutrients and processing quality

Here we compare the NR_A and NR_I with 9 and 22 nutrients, respectively (ρ = 0.78). The NR_G and QI also have 9 and 22 nutrients but are energy standardized (ρ = 0.83) (Fig. 9). When more nutrients are included, meat and seafood have relatively higher nutrient density scores; additionally, when energy standardized, vegetables and fruits score relatively worse on an absolute basis. Despite the positive and high correlations demonstrating that these are similar in absolute terms, with respect to ranking shifts, grains, vegetarian ASF, and seafood move to more nutrient dense quintiles, on average, when more nutrients are included.

Fig. 9

Effect of nutrient selection. These graphs compare metrics with 9 (NR_A, NR_G) versus 22 (NR_I, QI) nutrients. A compares nutrient selection for non-energy standardized metrics and B compares them for energy standardized metrics. C Differences in quintile rankings when comparing metrics differentiated by nutrient selection. A positive value means a food group moves from a lower quintile group characterized by a less favorable nutrient density to a higher quintile group characterized by a more favorable nutrient density. A negative difference indicates the food group moves from a higher quintile to a lower one

It should be noted that these comparisons are not complete. We are missing many health promoting ingredients that fruits and vegetables confer like antioxidants (Arias et al. 2022; Drewnowski and Burton-Freeman 2020). We also need to consider the effect of low-quality processing on foods like cereals and French fries; while the lower nutrient density of these foods compared to their counterparts would be reflected in their scores, processing effects on food functionality for these foods would not, which is why the interpretation “point” is important.

Protein, a single nutrient index (not pictured), was moderately correlated with the NR_A and more strongly correlated with the NR_I metric (protein vs. NR_A: ρ = 0.44; protein vs. NR_I: ρ = 0.79). However, while non-omnivores should watch their protein intake, few populations need significantly more protein than currently consumed; what is of greater need is increased intakes of micronutrients associated with ASF like zinc and iron. Still, pertinent cases for a single-nutrient analysis include examining a specific population deficient in this nutrient or comparing production systems under different management practices that are supposed to influence the amount of this nutrient in a certain food.

3.1.8 Across-the-board vs. group-specific

The previous examples have been for across-the-board metrics. Group-specific metrics allow actors to better understand the variability of nutrient densities within a food group. For example, as shown in Fig. 10, within the nut group, certain nutrients, such as protein or potassium, are comparable in levels across nut types; however, there are nutrients for which variability is much higher. For example, the nut group is, on average, high in vitamin E because it contributes 27% of DRI and is a moderate contributor to B6 (12% DRI). However, while almonds (81% DRI) and hazelnuts (47% DRI) are strong sources of vitamin E, cashews (3% DRI) and walnuts (2% DRI) are not. Likewise, for B6, pistachios are a good source (37% DRI), but Brazil nuts (2% DRI) and almonds (3% DRI) do not significantly contribute to DRIs.

Fig. 10

Percent daily recommended intake (DRI) of nutrients for the nut food group. This figure shows the contribution of various nuts to DRI. The dashed lines represent the mean values across the nuts for each nutrient. A value of 1 indicates that the nut provides 100% of daily recommended intakes. DRI values are from the Codex Alimentarius (McLaren 2021) and nutrient amounts were calculated per serving size of raw nuts from the USDA food composition database (USDA 2020)

Group-specific metrics predominately include nutrients deemed relevant to a food group; however, this specificity could lead to missed tradeoffs with nutrients excluded from such an index. For example, on average, the nut group is low in niacin (6% DRI), riboflavin (6% DRI), and folate (6% DRI) compared to other nutrients; thus, a group-specific metric would likely exclude these nutrients. However, almonds have relatively high contributions of riboflavin (27% DRI), and groundnuts have high amounts of niacin (23% DRI) and folate (17% DRI). Such findings are relevant for populations wherein deficiencies of these nutrients are prevalent; for instance, riboflavin is a common deficiency in vegan diets, and folate is a nutrient of concern for pregnant women.

3.2 Case study results: nutrient metrics as functional units

This section discusses n-LCA results, focusing on cases where rankings vary strongly depending on which nutrient metric is used as the FU, by examining how “points” can affect nutritionally-invested environmental impacts (i.e., combined sustainability scores). As with the nutrient indices, quintile rankings for food groups can shift substantially depending on the metric used (Fig. 11). However, there appears to be a relatively smaller variation in rankings for nutritionally-invested environmental impacts when compared to differences for the nutrient indices. Quintile rankings for food groups in relation to their mass-based GHG and stress-weighted water use impacts are shown in Fig. 12 and Fig. 13, because GHG emissions (kg CO₂eq) and water use (L) have a weak relationship (R² = 0.061), whereas GHG and other impacts have moderately strong predictive relationships (R² > 0.6595) (Table S5). Quintile rankings of food items for land use (m²), acidification (kg SO₂eq), and eutrophication (kg PO₄³⁻eq) can be found in Fig. S3, S4, and S5.

Fig. 11

Coefficient of variation (CV) for quintile rankings of nutritionally invested environmental impacts

Fig. 12

n-LCA results for GWP. GHG emissions under different functional units: kg and different n-FUs. Rankings run from 0 (bottom quintile of nutritionally-invested environmental impacts, i.e., environmentally friendly) to 4 (i.e., environmentally unfriendly). Threshold method is applied for all metrics to remove fractional and negative values

Fig. 13

n-LCA results for water use. Water use under different functional units: kg and different n-FUs. Rankings run from 0 (bottom quintile of nutritionally-invested environmental impacts, i.e., environmentally friendly) to 4 (i.e., environmentally unfriendly). Threshold method is applied for all metrics to remove fractional and negative values

For our impact categories, we calculated the coefficient of variation (CV) on the quintile rankings. A large CV indicates that the choice of nutrient metric (i.e., the influence of “points”) strongly affects impacts, while a smaller CV implies that nutrient metrics have less of an effect. The choice of metric has a considerable impact on certain food groups such as fruits, fortified foods, pulses, vegetables (except in the case of land use), and tubers. Vegetables have much lower land use impacts; therefore, variations in scores of nutrient indices have a lower influence on results. Likewise, “points” have little effect on high-impacting food groups like meat with respect to GHG emissions, land use, acidification, and eutrophication, because their environmental impacts are so much higher than those of other foods that the nutrient density does not affect relative quintile rankings. For example, beef has a quintile ranking of 4 regardless of the metric used, and lamb and mutton only moved to a more favorable quintile ranking with one metric. On the other hand, milk, which has moderate environmental impacts on a kg basis, moves between quintile rankings of 0 and 3 depending on the metric used. A similar pattern occurs when environmental impacts on a kg basis are very low; for water use, cassava, bananas, root vegetables, and onions and leeks have quintile rankings of 0 regardless of the metric used.
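The CV calculation on quintile rankings can be sketched as follows. The sketch assumes the population standard deviation and returns 0 when all rankings are 0 (where the CV is otherwise undefined); both choices are assumptions, and the milk rankings are hypothetical values within the 0 to 3 range reported above.

```python
from statistics import mean, pstdev


def coefficient_of_variation(quintile_rankings):
    """CV = standard deviation / mean of a food's quintile rankings across metrics.

    Uses the population standard deviation (assumption). If every ranking is 0,
    the mean is 0 and the CV is undefined; 0.0 is returned by convention here.
    """
    m = mean(quintile_rankings)
    if m == 0:
        return 0.0
    return pstdev(quintile_rankings) / m


# Quintile ranking of each food under eight different n-FUs
rankings = {
    "beef": [4, 4, 4, 4, 4, 4, 4, 4],     # ranking of 4 regardless of metric
    "milk": [0, 1, 3, 2, 1, 0, 3, 2],     # hypothetical spread between 0 and 3
    "cassava": [0, 0, 0, 0, 0, 0, 0, 0],  # ranking of 0 regardless of metric
}

for food, ranks in rankings.items():
    print(food, round(coefficient_of_variation(ranks), 2))
```

A CV of 0 indicates that the choice of nutrient metric does not change the food's quintile ranking, whereas larger values indicate greater sensitivity to the metric used as the FU.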

Wilcoxon results (Table S4) were calculated on ranks for individual food items, not on quintile rankings. When energy standardized, there is a significant difference between nutritionally-invested environmental impacts on a capped (NR_H) and uncapped (NR_G) basis for all impact categories (p < 1E-5 for all). As explained, the variation in absolute nutrient density scores across food groups is smaller on a capped basis (Fig. 7), which means that existing differences in environmental impacts will have a larger effect. Estd has a significant effect when comparing the NR_A and NR_G (p < 1E-5 for all impact categories) and the QI and NR_I (p < 1E-5 for all impact categories), which is expected as Estd has a large effect on nutrient index scores.
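To illustrate the kind of paired comparison reported here, the sketch below implements a two-sided Wilcoxon signed-rank test in plain Python using the normal approximation. It is a simplified stand-in under stated assumptions, not the procedure used for Table S4: zero differences are dropped, tied absolute differences receive average ranks, and no tie or continuity correction is applied. The function names and the two lists of food ranks are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W, p) for paired samples x and y.

    W is the smaller of the positive and negative signed-rank sums.
    p is two-sided and based on the normal approximation, which is
    crude for small samples (assumption: no tie/continuity correction).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, p


# Hypothetical ranks of the same eight foods (within a larger food list)
# under two nutrient metrics; not study data.
rank_metric_a = [12, 30, 7, 45, 22, 18, 51, 9]
rank_metric_b = [15, 28, 12, 40, 30, 25, 49, 16]

w_stat, p_value = wilcoxon_signed_rank(rank_metric_a, rank_metric_b)
print(f"W = {w_stat}, p = {p_value:.3f}")
```

For real analyses, an established routine such as scipy.stats.wilcoxon is preferable, since it also offers exact p-values for small samples.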

The use of disqualifying nutrients in the FU is debated for the reasons presented earlier. As expected, given their strong influence on nutrient density scores, there are significant differences for all environmental categories: NR_A vs NR_E (p < 1E-5 for all categories), QI vs NBC_C (p < 1E-5 for all categories), and QI vs NBC_D (p < 1E-5 for GHG emissions, water use, eutrophication, and acidification; p = 5.86E-3 for land use). When comparing metrics that include disqualifying nutrients, the difference between the NBC_C and NR_E is non-significant in the water use category (p = NS); all other categories show significant differences (p < 1E-5 for GHG emissions, land use, eutrophication, and acidification).

For the reference amount “point”, we also see a significant difference in nutritionally-invested environmental impacts between the NR_A and NR_C (p = 1.33E-2 for GHG emissions, land use, acidification, and eutrophication; p = 1.05E-2 for water use) and between the NR_J and NR_I (p = 1.37E-2 for GHG emissions, land use, acidification, and eutrophication; p = 1.08E-2 for water use). Interestingly, between the NR_J (which has 22 nutrients) and NR_A (which has 9 nutrients), there are no significant differences (p = NS). The inclusion of dietary specificities has a variable effect. When comparing a dietary-specific metric (NRprot-sub) to a generic one with 22 nutrients (QI), there is a significant difference (p = 1.58E-2 for GHG emissions, land use, acidification, and eutrophication; p = 1.56E-2 for water use). In contrast, when comparing the NRprot-sub and the generic NR_G, which has 9 nutrients, the difference is non-significant (p = NS).

Lastly, weighting results in significant differences (p < 1E-5), as does the selection of nutrients under pairwise comparisons, except for the protein index (1 nutrient), which was surprisingly not significantly different from the NR_I (22 nutrients). The protein index was significantly different from the NR_A (p = 1.2E-2 for all impact categories), and the NR_A and NR_I were also significantly different from one another (p = 8.5E-3 for GHG emissions, land use, acidification, and eutrophication; p = 5E-3 for water use). The aforementioned differences in outcomes (e.g., significant Wilcoxon p-values, CV values, and shifts in the rankings of various foods in relation to particular “points”) support the conclusion that clearer guidance is needed on using nutrient indices in combined sustainability analyses.

3.3 Framework recommendations

The following table (Table 4) offers recommendations for applying nutrient metrics in combined sustainability analyses. The recommendations are color-coded by prescriptiveness as detailed in the caption.

Table 4 Framework for nutrient adequacy metrics

4 Conclusion

We presented recommendations in the “points of differentiation” framework across different food levels. We offered recommendations with varying degrees of certainty because more research is needed in particular contexts before deciding on prescriptive recommendations. We showed that applying this framework can enhance transparency in studies and comparability across them, which is imperative as the choice of metric can greatly influence results. Changes in nutrient density scores, quintile rankings, Spearman rank coefficients, CV values, and significant Wilcoxon test p-values demonstrate that nutrient indices can be influenced by “points.” Based on these results, the “points” that had the biggest influence are disqualifying nutrients and capping (depending on how they are treated), reference amounts, and energy standardization, as determined by their weaker correlations and significant Wilcoxon p-values. Dietary specificities were highly relevant to nutrient indices but less so for n-LCA results. However, this could change depending on the dietary context one is examining. Lastly, the influence of “points” varied across different food groups. Overall, it appears that the influence of “points” was less apparent for nutritionally-invested environmental impacts than for nutrient indices alone. Assessing trade-offs across nutritional and environmental dimensions can yield results that are essential to the transition to a more sustainable food system and that benefit farmers, industry actors, policymakers, and consumers. Future work should more closely examine differences within food groups and evaluate the influence of “points” with respect to validation; for example, is the application of energy standardization in nutrient metrics important for health outcomes?

The most crucial issue is to consider these “points” carefully and as completely as possible. While it may be difficult to include all “points” in a study (e.g., context- and dietary-specific considerations due to lack of data, validation due to time constraints, or specific case studies such as fortified foods), practitioners should deliberate on as many “points” as feasible; for example, by justifying the choice of weighting when applied. The use of n-FUs is warranted in many situations; however, due to a lack of explicit guidelines regarding their use in LCA, they should be deemed complementary metrics (i.e., they should be used in tandem with a volume- or mass-based FU). A recent FAO publication echoes the same sentiment (McLaren 2021). Moreover, the interpretation of the FU is critical (e.g., is it contingent or non-contingent?). Alternatively, nutrient indices can be used as a separate indicator in the impact assessment phase to better explore the relationship between food matrices, food groups, and disqualifying nutrients or the moderating effect of qualifying nutrients. The presented framework should serve as a guide to help streamline the use of nutrient metrics and communicate results across studies in a more transparent manner.

Going forward, major areas to explore include the role of disqualifying nutrients, the selection of nutrients in certain contexts, the role of capping for particular nutrients based on population deficiencies or on fat- versus water-soluble nutrients, the role of food functionality, and finally, interpretation and data quality (e.g., uncertainty analyses). Food functionality includes aspects of interaction factors, bioavailability, and processing. More research into single nutrients versus ratios and DRI values reflective of anti-nutrients and overall bioavailability, as done with iron and zinc, can be useful. DRIs related to polyphenols and antioxidants can also be developed to better illustrate the health value of certain foods. Moreover, many metrics do not include the influence of low-quality processing (which is sometimes referred to as ultra-processing) on nutrients and the subsequent effects on health. Additionally, the role of fortification vehicles needs to be addressed (e.g., is it justified to fortify foods that are high in disqualifying nutrients? Does this result in burden shifting from one health issue to another?). Finally, selecting nutrients specific to the dietary context or population needs is imperative because large-scale solutions do not always scale down to regional or local levels. Recognizing that populations have different challenges and solutions is the next step toward addressing the sustainability crisis. The most illustrative example of the need for contextual dependence is the case of animal meat; while environmentally detrimental (particularly ruminant meat) on a global scale, the production of animal meat is needed for certain subpopulations with limited access to protein-rich alternatives. Even in high-income countries, there are still massive disparities between low-income and minority populations and their wealthier counterparts in terms of food accessibility and the diet-related diseases that arise (e.g., obesity, cardiovascular complications, and nutrient deficiencies). However, such nuances are only visible if regionally explicit or non-aggregated data are available and used. Accordingly, we need metrics reflective of these nuances. N-LCA can be helpful for many actors, but it is still in need of further development. Future work should explore these “points” for various foods and in relation to other environmental impacts.