Addressing water quality in water footprinting: current status, methods and limitations

In contrast to water consumption, water pollution has so far received less attention in water footprinting. Unlike water scarcity impact assessment, on which a consensus has recently been reached, there is no agreement on how to address water quality deterioration in water footprinting. This paper provides an overview of existing water footprint methods for calculating impacts associated with water pollution and discusses their strengths and limitations using an illustrative example. The methods are described and applied to a case study on the wastewater generated in textile processing. The results for two scenarios with different water quality parameters are evaluated against each other and against the water scarcity footprint (WSF). Finally, the methodological aspects, strengths and limitations of each method are analysed and discussed, and recommendations for applying the methods are provided. Two general impact assessment approaches exist to address water quality in water footprinting: the Water Degradation Footprint (WDF) calculates the impacts associated with the propagation of released pollutants in the environment and their uptake by humans and ecosystems, while the Water Availability Footprint (WAF) quantifies the impacts related to water deprivation, which occurs when polluted water cannot be used. Overall, seven methods to consider water quality in water footprinting were identified, which rely upon one or a combination of WDF, WAF and WSF. The methods vary significantly in their inventory requirements and in the results they provide (a single score or several impact categories). The case study demonstrated that the methods provide conflicting results concerning which scenario is less harmful with regard to water pollution. This paper reviews the water pollution assessment methods in water footprinting and analyses their modelling choices and the resulting effects on the WF.
Given the identified inconsistencies, we highlight the urgent need for guidance on applying the methods, in order to provide robust results and allow a consistent evaluation of water quality in water footprinting.


Introduction
Global risks associated with water pollution have been emphasized in several studies, e.g. the spread of waterborne diseases (Schwarzenbach et al. 2010), ecosystem deterioration (UN-Water 2011) and reduced drinking water availability (FAO and IWMI 2017). The impacts on human health resulting from the use of unsafe water rank 4th out of 19 major risk factors, with the highest effects observed in low-income countries (WHO 2009). Despite the global targets for better water quality set within Sustainable Development Goal 6 (SDG 6) (UN 2015), water pollution from agriculture, industrial production and domestic water use is predicted to increase in the coming years (FAO and IWMI 2017; OECD 2012).
Water footprint (WF) is a widely applied tool for quantifying the impacts associated with water use throughout the life cycle of products. The general procedure for WF quantification is set out in ISO 14046:2014, which requires an evaluation of both water quantity and quality. The standard defines water quality as "physical (e.g. thermal), chemical and biological characteristics of water with respect to its suitability for an intended use by humans or ecosystems" (ISO 2014).
Until now, most WF studies have focused only on the quantitative aspects of water use. Water quality, in contrast, has received less attention in water footprinting. Lovarelli et al. (2016) reviewed 96 WF studies and found that only 46% of them included a water pollution assessment, while the rest considered only water consumption. Nevertheless, several industries cause significant impacts on water resources mainly through water pollution, while having relatively low water consumption. For example, Chapagain et al. (2006) calculated a WF of around 800 l per kg of cotton fabric resulting from the wastewater generated during the textile processing stage, whereas water consumption in textile processing was negligible. Gerbens-Leenes et al. (2018) found that for steel production, the impacts associated with water pollution are about 20 to 220 times higher than those of water consumption. Therefore, several authors have recently emphasized the need to include water quality aspects in water footprinting (Berger and Finkbeiner 2013; Liu et al. 2017; Pradinaud et al. 2018; van Vliet et al. 2017).
Currently, there is a lack of harmonization in the terminology used across the publications in the field of water footprinting. Therefore, for consistency reasons, the nomenclature proposed by ISO 14046:2014 is adopted throughout this article.
Water degradation: negative change in water quality.
Water availability: extent to which humans and ecosystems have sufficient water resources for their needs. If water availability only considers water quantity, it is called water scarcity.
Water scarcity: extent to which demand for water compares to the replenishment of water in an area, e.g. a drainage basin, without taking into account the water quality (ISO 2014).
In water footprinting, two consequences of water pollution are modelled: (1) reduced water availability, captured by the Water Availability Footprint (WAF); and (2) the intake of or contact with the pollutants, captured by the Water Degradation Footprint (WDF). Both WAF and WDF can be calculated at the midpoint or endpoint level for the areas of protection (AoP) human health, natural environment and resources. The WAF quantifies the impacts originating from water deprivation, which occurs when contaminated water does not fulfil the quality requirements of a user and therefore cannot be used (ISO 2014). The impacts of water deprivation depend on the affected user; for example, malnutrition is caused by agricultural water deprivation (Motoshita et al. 2014; Pfister et al. 2009) and infectious diseases by domestic water deprivation (Boulay et al. 2011a; Motoshita et al. 2011). The WDF quantifies the impacts caused by the spreading of contaminants in the environment (i.e. pollution of water bodies) and their effect on the target (humans or ecosystems), e.g. toxicity or eutrophication potential. The Water Scarcity Footprint (WSF) quantifies the impacts associated with reduced water availability resulting from water consumption, i.e. it does not consider changes in water quality in the inventory analysis (Fig. 1). Throughout this article, the term WF is used as an umbrella term for all three: WDF, WAF and WSF.
The goal of this paper is to provide a review of existing methods to consider water quality in water footprinting. For this purpose, each method is described and applied to a case study, which allows for comparing the results provided by different methods. The paper is structured as follows: in Section 2, the methods are introduced, Section 3 describes the scope of the case study and Section 4 provides the results including an overview of the methodological aspects of all introduced methods and the case study results. The strengths and shortcomings of the methods are discussed in Section 5. In Section 6, the conclusion and outlook for future research are provided.
2 Methods to address water quality in water footprinting
As described in the introduction, existing methods to consider water quality in water footprinting rely upon either one or a combination of two overarching impact assessment approaches: WAF (impacts due to water deprivation) and WDF (impacts due to the intake of or contact with the discharged pollutants).
The WAF quantifies the amount of, or impacts related to, water deprivation and is calculated by means of either the distance-to-target (DtT) or the functionality approach. The DtT approach relates the inventory (emissions to water) to a desired value, usually a water quality threshold. Thus, a DtT value above 1 indicates that a water quality threshold is exceeded. The resulting WAF is derived from the most penalizing pollutant, i.e. the one with the greatest threshold exceedance. The functionality approach classifies water into different categories according to the users (e.g. domestic, agriculture) for which its quality makes it functional. If the discharged water belongs to a lower class than the withdrawn water, it is no longer functional for a specific user group. In this case, the water withdrawal is accounted for as being consumed (i.e. unavailable) for these users (UNESCO/WHO/UNEP 1996).
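To make the DtT logic concrete, the sketch below computes a DtT ratio (concentration divided by threshold) for each pollutant and selects the most penalizing one. It is a minimal illustration, not code from any of the reviewed methods; the function names, pollutant names, concentrations and thresholds are hypothetical and chosen only to show the mechanics.

```python
from __future__ import annotations


def dtt_ratios(concentrations: dict[str, float], thresholds: dict[str, float]) -> dict[str, float]:
    """Distance-to-target ratio per pollutant: concentration / threshold (same units).

    Pollutants without a threshold are skipped, since the DtT approach cannot rate them.
    """
    return {p: c / thresholds[p] for p, c in concentrations.items() if p in thresholds}


def most_penalizing(ratios: dict[str, float]) -> tuple[str, float]:
    """Return the pollutant with the largest DtT ratio, i.e. the greatest exceedance."""
    pollutant = max(ratios, key=ratios.get)
    return pollutant, ratios[pollutant]


if __name__ == "__main__":
    # Hypothetical effluent concentrations and thresholds in mg/l.
    effluent = {"COD": 300.0, "BOD": 120.0, "Cr": 0.5}
    limits = {"COD": 150.0, "BOD": 80.0, "Cr": 1.0}
    ratios = dtt_ratios(effluent, limits)
    print(ratios)  # {'COD': 2.0, 'BOD': 1.5, 'Cr': 0.5}
    exceeded = [p for p, r in ratios.items() if r > 1.0]
    print(exceeded)  # ['COD', 'BOD']
    print(most_penalizing(ratios))  # ('COD', 2.0)
```

Although two pollutants exceed their thresholds in this example, only COD would drive a DtT-based WAF.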
The WDF is calculated by means of impact assessment models for water quality degradation, which quantify the propagation and transformation of discharged contaminants in different environmental compartments (e.g. air, freshwater, soil) and their transport to and effect on the target organisms or ecosystems (Rosenbaum et al. 2018). The resulting impacts are attributed to contact with or intake of the pollutants and are usually calculated for four impact categories: eutrophication, acidification, aquatic ecotoxicity and human toxicity (ISO 2014).

2.1 Methods based on the WAF
2.1.1 Grey water footprint (Hoekstra et al. 2011)
The grey water footprint (GWF) is part of the water footprint assessment proposed by Hoekstra et al. (2011), alongside the blue and green WF.
The GWF relies upon the DtT approach and represents the amount of water needed to dilute a contamination to a certain quality threshold (Hoekstra et al. 2011). It is calculated by dividing the pollutant load L (mass/time) by the threshold value $c_{i,\max}$ (mass/volume) or, when the pollutant occurs naturally in the environment, by the difference between the threshold and the natural concentration $c_{i,\mathrm{nat}}$ (Eq. (1)):
$$\mathrm{GWF}_i = \frac{L_i}{c_{i,\max} - c_{i,\mathrm{nat}}} \tag{1}$$
where i is the pollutant. For non-point sources of pollution (e.g. agriculture), the pollutant load is calculated by multiplying the application rate Appl (e.g. kg/ha) by the leaching rate α (%) of the corresponding pollutant (Eq. (2)):
$$L_i = \alpha_i \cdot \mathrm{Appl}_i \tag{2}$$
When more than one water quality parameter is evaluated, the GWF is calculated for each pollutant. The total GWF is then determined by the most penalizing pollutant, i.e. the one causing the highest GWF (Hoekstra et al. 2011). This implies that the calculated amount of dilution water would be sufficient to assimilate all other discharged contaminants. The result is expressed in l/kg of product and reported separately or summed up with the blue and green WF.
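Eqs. (1) and (2) and the most-penalizing rule can be sketched as follows. This is a hedged illustration under simplifying assumptions: for point sources, the load is approximated as effluent concentration times discharged volume (ignoring the pollutant content of the abstracted water), the leaching rate α is passed as a fraction rather than a percentage, and all function names and numbers are hypothetical.

```python
from __future__ import annotations


def load_point_source(conc_mg_per_l: float, volume_l: float) -> float:
    """Simplified point-source load (mg): effluent concentration x discharged volume."""
    return conc_mg_per_l * volume_l


def load_diffuse(appl: float, alpha: float) -> float:
    """Eq. (2): non-point-source load = leaching fraction alpha x application rate Appl."""
    return alpha * appl


def grey_wf(load_mg: float, c_max: float, c_nat: float = 0.0) -> float:
    """Eq. (1): litres of dilution water needed so that the load meets c_max (mg/l)."""
    if c_max <= c_nat:
        raise ValueError("threshold must be above the natural concentration")
    return load_mg / (c_max - c_nat)


def total_grey_wf(loads: dict[str, float], c_max: dict[str, float],
                  c_nat: dict[str, float] | None = None) -> tuple[str, float]:
    """Total GWF = GWF of the most penalizing pollutant (largest per-pollutant GWF)."""
    c_nat = c_nat or {}
    per_pollutant = {p: grey_wf(loads[p], c_max[p], c_nat.get(p, 0.0)) for p in loads}
    worst = max(per_pollutant, key=per_pollutant.get)
    return worst, per_pollutant[worst]


if __name__ == "__main__":
    # Hypothetical: 100 l of effluent; concentrations and thresholds in mg/l.
    loads = {"COD": load_point_source(300.0, 100.0), "BOD": load_point_source(120.0, 100.0)}
    print(total_grey_wf(loads, {"COD": 150.0, "BOD": 80.0}))  # ('COD', 200.0)
```

Here the 150 l of dilution water needed for BOD is assumed to be covered by the 200 l attributed to COD, which is the assumption behind reporting only the most penalizing pollutant.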
2.1.2 Pollution-induced water scarcity (Zeng et al. 2013)
Zeng et al. (2013) proposed a method to integrate the GWF (m³) results into a water scarcity assessment for a river basin. For this, the index for water quantity scarcity $I_{\mathrm{blue}}$ and the index for pollution-induced water scarcity $I_{\mathrm{grey}}$ in a basin b are summed up to a water scarcity index I (dimensionless) (Eq. (3)):
$$I_b = I_{\mathrm{blue},b} + I_{\mathrm{grey},b} \tag{3}$$
The authors calculate the GWF according to the method proposed by Hoekstra et al. (2011), as described in Section 2.1.1. Then, $I_{\mathrm{grey}}$ is calculated by dividing the GWF by the total renewable freshwater resources Q (m³) available in the river basin (Eq. (4)):
$$I_{\mathrm{grey},b} = \frac{\mathrm{GWF}_b}{Q_b} \tag{4}$$
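A minimal sketch of Eqs. (3) and (4), assuming that the GWF, the withdrawal and the renewable resources $Q_b$ all refer to the same basin and the same time period (e.g. m³ per year). $I_{\mathrm{blue}}$ is computed as withdrawal divided by $Q_b$, following the definition by Zeng et al. (2013) given in the text. The function name and input values are hypothetical.

```python
from __future__ import annotations


def basin_scarcity(gwf_m3: float, withdrawal_m3: float, renewable_m3: float) -> dict[str, float | bool]:
    """Pollution-induced water scarcity for one basin b (all volumes for the same period)."""
    if renewable_m3 <= 0:
        raise ValueError("renewable freshwater resources must be positive")
    i_grey = gwf_m3 / renewable_m3         # Eq. (4)
    i_blue = withdrawal_m3 / renewable_m3  # quantity-based scarcity index
    return {
        "I_blue": i_blue,
        "I_grey": i_grey,
        "I": i_blue + i_grey,              # Eq. (3)
        "assimilable": i_grey < 1.0,       # contamination can be diluted within Q_b
    }


if __name__ == "__main__":
    print(basin_scarcity(gwf_m3=500.0, withdrawal_m3=250.0, renewable_m3=1000.0))
    # {'I_blue': 0.25, 'I_grey': 0.5, 'I': 0.75, 'assimilable': True}
```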
If $I_{\mathrm{grey}} < 1$, the total contamination can be assimilated by the available freshwater resources. $I_{\mathrm{blue}}$ is calculated by dividing the water withdrawal by the total renewable freshwater resources available in the river basin ($Q_b$).
2.1.3 Water Impact Index (Bayart et al. 2014)
The Water Impact Index (WII) is proposed as a screening assessment for water use that includes both water consumption and pollution (Bayart et al. 2014). The WII is calculated as the difference between the amounts of water withdrawn from and returned to a water body, with both values multiplied by the corresponding water scarcity and water quality indices (Eq. (5)). The results are expressed in m³ impact index equivalent (Bayart et al. 2014).
$$\mathrm{WII} = \sum_i W_i \cdot Q_{W,i} \cdot \mathrm{WSI}_i - \sum_j R_j \cdot Q_{R,j} \cdot \mathrm{WSI}_j \tag{5}$$
where W (m³) is the amount of water withdrawn from a water body i, R (m³) is the amount of water returned to a water body j, $Q_W$ and $Q_R$ (dimensionless) are the quality indices of the withdrawn and returned water, respectively, and WSI (dimensionless) is the water scarcity index provided by the model of Pfister et al. (2009). The quality index follows the (inverse) DtT approach and is calculated as the ratio of the reference concentration of the pollutant (a threshold) to the actual concentration in the withdrawn or discharged water. As in the GWF calculation, when several water quality parameters are evaluated, the result is based on the most penalizing pollutant. The water quality index lies between 0 and 1 and reaches its maximum of 1 when the reference concentration $c_{p,\mathrm{ref}}$ is equal to or higher than the actual concentration $c_p$ in the withdrawn or discharged water (Eq. (6)):
$$Q = \min\left(1, \frac{c_{p,\mathrm{ref}}}{c_p}\right) \tag{6}$$
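The sketch below applies Eq. (6) to the most penalizing parameter and then evaluates Eq. (5) for a single withdrawal and a single return flow. It is illustrative only: the WSI value, volumes and concentrations are hypothetical (the case study itself uses WSI = 0.967), and treating a zero concentration as fully compliant is an assumption of this sketch.

```python
from __future__ import annotations


def quality_index(conc: dict[str, float], ref: dict[str, float]) -> float:
    """Eq. (6) for the most penalizing parameter: min over p of min(1, c_ref / c)."""
    indices = [
        min(1.0, ref[p] / c) if c > 0 else 1.0  # zero concentration treated as compliant
        for p, c in conc.items()
        if p in ref
    ]
    return min(indices, default=1.0)


def water_impact_index(withdrawals: list[tuple[float, float, float]],
                       returns: list[tuple[float, float, float]]) -> float:
    """Eq. (5): each flow is (volume_m3, quality_index, wsi); result in m3 impact index eq."""
    withdrawn = sum(v * q * wsi for v, q, wsi in withdrawals)
    returned = sum(v * q * wsi for v, q, wsi in returns)
    return withdrawn - returned


if __name__ == "__main__":
    wsi = 0.5  # hypothetical basin WSI
    q_out = quality_index({"COD": 300.0, "Cr": 0.5}, {"COD": 150.0, "Cr": 1.0})
    print(q_out)  # 0.5
    print(water_impact_index([(100.0, 1.0, wsi)], [(80.0, q_out, wsi)]))  # 30.0
    print(water_impact_index([(100.0, 1.0, wsi)], [(80.0, 1.0, wsi)]))    # 10.0
```

With a compliant return flow ($Q_R = 1$), only the 20 m³ actually consumed count; the quality deficit of the effluent triples the result in this example.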
When the water quality index equals 1, it is assumed that the entire volume of water returned to the water body remains available for human use, irrespective of whether $c_p$ is equal to or substantially lower than the threshold (see Eq. (5)). Otherwise, the discharged water becomes partly unavailable due to insufficient quality and is therefore accounted for as being consumed. In this case, the more $c_p$ exceeds $c_{p,\mathrm{ref}}$, the larger the amount of water counted as consumed.
2.1.4 The method of Boulay et al. (2011a, b)
The method proposed by Boulay et al. (2011a, b) is based on the functionality approach, which implies that water quality degradation can render water unavailable for certain users.
Since the water quality requirements of different users vary, water polluted with certain contaminants can be unfit, for example, for domestic users, but still be suitable for irrigation. To address this, the authors introduce eleven water user types: three domestic users, two agricultural users, one industrial user, cooling, fisheries, hydropower, transport and recreation. For each user, the required water quality is specified, which results in eight water categories, ranging from excellent to unsuitable, for both surface water and groundwater. Furthermore, rain is introduced as a separate water category that is functional for all users. For each category, water quality thresholds are set based on national and international water quality guidelines. Overall, eleven general parameters (e.g. pH), 38 parameters for inorganics (salts and heavy metals) and 87 parameters for organics are considered. The parameters relevant for the WAF calculation have to be selected depending on the industry being assessed (Boulay et al. 2011a).
The method allows impacts to be quantified at the midpoint (Water Stress Indicator) and endpoint (diseases due to lack of hygiene and/or malnutrition) levels (Boulay et al. 2011b). The Water Stress Indicator is calculated as the difference between the volumes of withdrawn and discharged water of each category, each multiplied by the corresponding water stress index (Eq. (7)). The results are expressed in m³ water equivalent.
$$\text{Water Stress Indicator} = \sum_i \alpha_i \left( V_{i,\mathrm{in}} - V_{i,\mathrm{out}} \right) \tag{7}$$
where $\alpha_i$ is the water stress index (dimensionless), $V_{i,\mathrm{in}}$ and $V_{i,\mathrm{out}}$ (m³) are the volumes of withdrawn and discharged water, respectively, and i is the water category.
The impacts on human health are measured in disability-adjusted life years (DALYs) and calculated similarly, by multiplying the volumes of withdrawn and discharged water by the human health impact factors of the corresponding water category. The resulting damage to human health depends on which users are deprived. For example, malnutrition arises from water deprivation in agriculture and fisheries, while impacts associated with a lack of hygiene and sanitation result from the deprivation of domestic water use.
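The functionality logic and Eq. (7) can be sketched as below. The three quality categories, their labels and thresholds are hypothetical placeholders rather than the categories of Boulay et al. (2011a). The sketch also assumes that discharged water classed as unusable receives a stress index of zero, i.e. it restores availability for no user, so the corresponding withdrawal counts as consumed.

```python
from __future__ import annotations

# Hypothetical categories, best quality first, with maximum concentrations in mg/l.
CATEGORIES: list[tuple[str, dict[str, float]]] = [
    ("A", {"Cr": 0.05, "Cu": 1.0}),
    ("B", {"Cr": 0.1, "Cu": 2.0}),
    ("C", {"Cr": 1.0, "Cu": 5.0}),
]


def classify(sample: dict[str, float]) -> str:
    """Return the best category whose thresholds are all met, else 'unusable'."""
    for name, limits in CATEGORIES:
        if all(sample.get(p, 0.0) <= limit for p, limit in limits.items()):
            return name
    return "unusable"


def water_stress_indicator(v_in: dict[str, float], v_out: dict[str, float],
                           alpha: dict[str, float]) -> float:
    """Eq. (7): sum over categories of alpha_i * (V_in,i - V_out,i); a missing alpha counts as 0."""
    categories = set(v_in) | set(v_out)
    return sum(alpha.get(c, 0.0) * (v_in.get(c, 0.0) - v_out.get(c, 0.0)) for c in categories)


if __name__ == "__main__":
    alpha = {"A": 1.0, "B": 1.0, "C": 1.0}  # hypothetical; no entry for 'unusable'
    v_in = {"A": 100.0}                     # 100 m3 withdrawn, category A
    polluted = classify({"Cr": 2.0, "Cu": 0.5})
    print(polluted)  # unusable
    print(water_stress_indicator(v_in, {polluted: 80.0}, alpha))  # 100.0
    clean = classify({"Cr": 0.01, "Cu": 0.5})
    print(water_stress_indicator(v_in, {clean: 80.0}, alpha))  # 20.0
```

If the effluent is unusable, the whole 100 m³ withdrawal counts, whereas a return flow of the same category as the withdrawal offsets it and only the 20 m³ consumed remain.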
2.2 Methods based on the WDF or combined approaches
2.2.1 Impact assessment models for water quality degradation
Impact assessment models for water quality degradation quantify the WDF, i.e. the impacts resulting from the intake of or contact with the pollutants. The calculation is based on the ISO 14040 and ISO 14044 standards (ISO 2006a, b) for life cycle assessment (LCA) and can be carried out at the midpoint and/or endpoint level. The method includes two steps: assigning the elementary flows compiled in the inventory (emissions to water) to the relevant impact categories (classification), and multiplying them by the corresponding characterization factors (CFs) (characterization) (Eq. (8)):
$$I_c = \sum_p E_p \cdot \mathrm{CF}_{p,c} \tag{8}$$
where $I_c$ is the impact score in category c, $E_p$ is the emission of pollutant p and $\mathrm{CF}_{p,c}$ is the characterization factor of pollutant p for impact category c. The CFs themselves are calculated using environmental mechanism models that describe the cause-effect chains between the inventory and the resulting environmental impacts for all elementary flows that contribute to the selected impact category (ISO 2006b). The cause-effect chains are modelled based on the contaminants' propagation and transformation in the environment (fate), their transport to and contact with targets including ecosystems and humans (exposure), and the negative impacts on the targets (effect), e.g. a disease (Rosenbaum et al. 2018).
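Eq. (8) amounts to a weighted sum per impact category. In the sketch below, the characterization factors are hypothetical numbers rather than values from ReCiPe or any other LCIA method. The point it illustrates is that every emission with a CF contributes regardless of any threshold, while an emission without a CF (here COD) contributes nothing.

```python
from __future__ import annotations


def characterize(emissions: dict[str, float],
                 cfs: dict[str, dict[str, float]]) -> dict[str, float]:
    """Eq. (8): I_c = sum_p E_p * CF_{p,c} for every impact category c in `cfs`."""
    return {
        category: sum(e * factors[p] for p, e in emissions.items() if p in factors)
        for category, factors in cfs.items()
    }


if __name__ == "__main__":
    emissions_kg = {"Cr": 0.5, "Cu": 0.25, "N": 2.0, "COD": 5.0}  # hypothetical inventory
    cfs = {  # hypothetical CFs (category-specific reference units per kg)
        "FETP": {"Cr": 2.0, "Cu": 4.0},
        "ME": {"N": 0.5},
    }
    print(characterize(emissions_kg, cfs))  # {'FETP': 2.0, 'ME': 1.0}
```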
In contrast to LCA studies, where a comprehensive set of environmental impacts is considered, only the impact categories related to water quality, e.g. aquatic eutrophication, acidification, aquatic ecotoxicity and human toxicity, are accounted for in the WDF (Boulay et al. 2015; ISO 2014).
Aquatic eutrophication impacts are calculated based on waterborne emissions of nutrients (N and P) and organic compounds. The cause-effect chain includes the transport of the emissions, increased nutrient concentrations in water compartments and the subsequent increase in algal growth and oxygen depletion in lakes. Acidification impacts are related to emissions of acidifying compounds, e.g. sulphur oxides and ammonia. These compounds can be absorbed by water and form acids, which reach water bodies via rainfall and leaching from soils (Rosenbaum et al. 2018). Human toxicity and ecotoxicity are attributed to emissions of heavy metals and organic compounds that have adverse impacts (carcinogenic and non-carcinogenic) on human health and ecosystems. The cause-effect chains model the contaminants' spread in the environment, their contact with or intake by the organism and the resulting negative impact on the target (Rosenbaum et al. 2008).
Calculating the WDF at the endpoint level requires further modelling steps, which cover the entire cause-effect chain for the areas of protection human health, natural environment and resources. An example of a cause-effect chain for the AoP natural environment is species extinction and the subsequent damage to aquatic ecosystems due to eutrophication or acidification. Human health damage can be attributed to the intake of pollutants via food and drinking water (impacts due to diseases). Providing results at the endpoint level allows impacts from both water consumption and pollution (i.e. WSF and WDF) to be aggregated into one value, e.g. the impact on human health expressed in DALYs (disability-adjusted life years) (Huijbregts et al. 2017).

2.2.2 The method of Ridoutt and Pfister (2013)
Ridoutt and Pfister (2013) emphasize the need for a stand-alone water footprint indicator that considers both the quantitative and qualitative aspects of water use, pointing out that the GWF has not gained broad acceptance. The authors propose a method that combines the WSF and WDF in the unit "water-equivalents" (H2O-eq.) and allows the results to be summed up into a single score.
The authors refer to the WDF as degradative water use (DWU) and quantify it for each emission using the Life Cycle Impact Assessment (LCIA) model ReCiPe (Huijbregts et al. 2017). The recommended impact categories are freshwater eutrophication, freshwater ecotoxicity and human toxicity, which are quantified at the endpoint level for the AoPs human health and natural environment. Normalization and weighting are then conducted. During normalization, the results in each category are related to a reference, which in the method of Ridoutt and Pfister (2013) is the set of European normalization factors. Weighting aggregates the results of the different impact categories into a single score through multiplication by selected factors (a value choice) (ISO 2006b). This yields a single value for all considered pollutants, expressed in ReCiPe points. Finally, the result is divided by the WSF (also expressed in ReCiPe points) of 1 l of consumptive water use (CWU) (global average, consumption-weighted value) (Eq. (9)). The authors calculated 1.86E-06 ReCiPe points as the global average for 1 l CWU (based on Ridoutt and Pfister 2013).

$$\mathrm{DWU} = \frac{\text{ReCiPe points (emissions to water from the product system)}}{1.86 \times 10^{-6}\ \text{ReCiPe points (global average for 1 litre CWU)}} \tag{9}$$
Then, the DWU is summed up with the impacts associated with the CWU to obtain a single score. The CWU (H2O-eq.) is calculated by multiplying water consumption by the water scarcity index (WSI) of the corresponding river basin and dividing the result by the global average WSI.
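The single-score arithmetic can be sketched as follows. The 1.86E-06 ReCiPe points per litre of CWU is the value given above; the ReCiPe score of the emissions, the basin WSI and the global average WSI are inputs the user must supply, and the numbers in the example are hypothetical.

```python
from __future__ import annotations

POINTS_PER_LITRE_CWU = 1.86e-6  # ReCiPe points for 1 l CWU, global average (value from the text)


def degradative_water_use(recipe_points: float) -> float:
    """Eq. (9): DWU in litres H2O-eq. from the weighted ReCiPe score of emissions to water."""
    return recipe_points / POINTS_PER_LITRE_CWU


def consumptive_water_use(consumption_l: float, wsi_basin: float, wsi_global: float) -> float:
    """CWU in litres H2O-eq.: consumption scaled by the basin WSI relative to the global average."""
    return consumption_l * wsi_basin / wsi_global


def water_footprint_single_score(recipe_points: float, consumption_l: float,
                                 wsi_basin: float, wsi_global: float) -> float:
    """Single score (l H2O-eq.) = DWU + CWU."""
    return degradative_water_use(recipe_points) + consumptive_water_use(
        consumption_l, wsi_basin, wsi_global)


if __name__ == "__main__":
    # Hypothetical inputs: 1.86e-3 ReCiPe points, 100 l consumed, basin WSI 0.5, global WSI 0.25.
    dwu = degradative_water_use(1.86e-3)
    cwu = consumptive_water_use(100.0, 0.5, 0.25)
    print(round(dwu, 6), cwu)  # 1000.0 200.0
    print(round(water_footprint_single_score(1.86e-3, 100.0, 0.5, 0.25), 6))  # 1200.0
```

Rounding is used only because dividing by 1.86e-6 is not exact in floating point.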

2.2.3 Pollution Water Indicator (Lovarelli et al. 2018)
Lovarelli et al. (2018) point out that the GWF method is insufficient to reflect water contamination comprehensively, because it is based on only one (the most penalizing) pollutant. However, other contaminants may damage human health and the environment even when present in low concentrations. To address this issue, the authors introduce the Pollution Water Indicator (PWI), which combines the WDF and WAF. The PWI combines the GWF, calculated according to the method of Hoekstra et al. (2011), with three impact categories for water quality degradation: freshwater eutrophication, marine eutrophication and freshwater ecotoxicity. The results of the GWF and the impact categories are plotted on the axes of a spider diagram and connected to each other, with each axis representing the vector of one environmental impact. To allow for plotting, the units and orders of magnitude of the results in the different impact categories are disregarded (each vector varies between 0 and 1). The PWI (dimensionless) is calculated as the area of the resulting rhombus; thus, the smaller the area of the rhombus, the lower the PWI.
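The PWI geometry can be sketched as follows, assuming that each axis is rescaled to the range 0 to 1 by dividing by its largest value across the compared scenarios (the text only states that units and magnitudes are disregarded) and that the n axes are evenly spaced, so that the polygon area is ½·sin(2π/n)·Σ r_i·r_{i+1}. Function names and inputs are hypothetical.

```python
from __future__ import annotations

import math


def radar_area(radii: list[float]) -> float:
    """Area of the polygon joining evenly spaced radar-chart axes of lengths r_i."""
    n = len(radii)
    if n < 3:
        raise ValueError("need at least three axes")
    wedge = math.sin(2 * math.pi / n)  # sine of the angle between neighbouring axes
    return 0.5 * wedge * sum(radii[i] * radii[(i + 1) % n] for i in range(n))


def pwi(scenarios: dict[str, list[float]]) -> dict[str, float]:
    """PWI per scenario: axis values scaled by the per-axis maximum, then polygon area."""
    n_axes = len(next(iter(scenarios.values())))
    maxima = [max(values[k] for values in scenarios.values()) for k in range(n_axes)]
    return {
        name: radar_area([v / m if m else 0.0 for v, m in zip(values, maxima)])
        for name, values in scenarios.items()
    }


if __name__ == "__main__":
    # Hypothetical raw results per axis, in native units:
    # [GWF, freshwater eutrophication, marine eutrophication, freshwater ecotoxicity]
    results = {"S1": [500.0, 0.2, 4.0, 10.0], "S2": [250.0, 0.2, 2.0, 10.0]}
    print(pwi(results))  # {'S1': 2.0, 'S2': 1.0}
```

Because the area sums products of neighbouring axes, the resulting value also depends on the order in which the axes are arranged, a property of any radar-chart area.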

3 Case study
This section describes the scope of the case study, including the inventory data and calculation procedure, while the results are provided in Section 4.2. The case study is carried out for textile production, which is one of the most water-polluting industries worldwide (Ellen MacArthur Foundation 2017). The impact assessment is conducted for Pakistan, one of the world's major textile exporters, which simultaneously suffers from acute water shortage and pollution (Statista 2020; WWF-Pakistan 2007).
The WAF and WDF are calculated for two generic scenarios of the textile processing step. Only wastewater emissions discharged directly into the environment from textile production are considered. Overall, eight water quality parameters are included in the calculation (see Table 1). The results are compared with the WSF, which is calculated based on the water withdrawal and discharge data (Table 1). All results refer to one ton of fabric.
The inventory for the first scenario represents the average wastewater from ten textile processing plants and literature data (InoCottonGROW 2019; Manzoor et al. 2006). In the second scenario, the wastewater parameters are set equal to the foundational thresholds of the Zero Discharge of Hazardous Chemicals (ZDHC) standard, an international guideline for wastewater quality in textile production (Stichting ZDHC Foundation 2016) (see Table 1). The volumes of withdrawn and discharged water are assumed to be the same in both scenarios. It should be noted that the case study serves as a theoretical comparison of the WF methods and is not intended for drawing conclusions about the case itself.
In order to determine the WAF and WDF according to the methods described in Section 2, the following parameter settings have been chosen. The thresholds applied in the methods based on the DtT approach (GWF, pollution-induced water scarcity, WII) are derived from the National Environmental Quality Standard (NEQS) of Pakistan for industrial wastewater (PEPA 1999) (see Table 1). For the calculation of the pollution-induced water scarcity according to Zeng et al. (2013), data on the total renewable freshwater resources of Pakistan are used (FAO AQUASTAT 2019). For the WII calculation, the WSI is set to 0.967 according to Pfister et al. (2009). In the method of Boulay et al. (2011a, b), the water stress index α is set to 1, and the withdrawn groundwater is assigned to category 1 (suitable for all users) according to Boulay et al. (2011a). The method is applied to calculate the impacts at the midpoint (Water Stress Indicator) and endpoint (human health damage) levels.
For the WDF calculation, the impact categories freshwater ecotoxicity (FETP), human toxicity (HTP) and marine eutrophication (ME) are calculated by means of the ReCiPe 2016 method (Huijbregts et al. 2017). At the endpoint level, the damage on human health and ecosystems is calculated. For the PWI calculation, the GWF and results for the impact categories FETP, HTP and ME are used. The impact category HTP is used instead of freshwater eutrophication originally proposed by Lovarelli et al. (2018), since none of the emissions compiled in the inventory contributes to this impact category. The possibilities and effects of applying different impact categories are discussed in Section 5.7.
The WSF is calculated as the share of WAF related to water consumption for the methods based on the WAF approach. For the method of Ridoutt and Pfister (2013), the WSF is determined as described in Section 2.2.2. The WSF is not quantified for the impact assessment models for water quality degradation at the midpoint level, since in this case, the results of the WSF are not comparable to the WDF due to different units. At the endpoint level, the WSF is calculated for the human health impacts due to malnutrition using the characterization model of Motoshita et al. (2014). The WSF was not

4.1 Methods overview
Four of the seven methods introduced calculate the WAF: GWF, pollution-induced water scarcity and WII (all using the DtT approach), and the method of Boulay et al. (2011a, b), which is based on the functionality approach. The WDF is calculated by the impact assessment models for water quality degradation and by the method of Ridoutt and Pfister (2013) (combined WDF and WSF). Lovarelli et al. (2018) combine the WAF and WDF (see Fig. 2 and Table 2). All methods that calculate the WDF consider every pollutant compiled in the inventory, as long as it contributes to the selected impact categories. This means that for each relevant pollutant, a cause-effect chain is modelled and the resulting impact is quantified, irrespective of whether the pollutant is emitted at a concentration below or above the water quality threshold. The method of Boulay et al. (2011a, b) adopts the functionality approach and therefore considers all pollutants whose concentrations exceed the water quality thresholds specified by the method for a total of 136 parameters. Therefore, several contaminants can influence the resulting water functionality, while others are neglected if emitted at concentrations below the thresholds. The methods based on the DtT approach (GWF, pollution-induced water scarcity and WII) consider only the single most penalizing pollutant, while the impacts of other emissions are neglected, even if their concentrations exceed the thresholds. None of the methods based on the DtT approach specifies the quality thresholds to be applied in the calculation. The quality of the withdrawn water is considered in the method of Boulay et al. (2011a, b) and in the WII.
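The contrast between the three counting rules described above can be made concrete with a small sketch. For simplicity, it assumes that the DtT methods and the functionality approach use the same threshold set (in practice, the functionality approach uses its own category thresholds); the inventory, thresholds and the set of pollutants with CFs are hypothetical.

```python
from __future__ import annotations


def pollutants_considered(conc: dict[str, float], thresholds: dict[str, float],
                          with_cf: set[str]) -> dict[str, set[str]]:
    """Which pollutants drive the result under each approach (simplified)."""
    ratios = {p: c / thresholds[p] for p, c in conc.items() if p in thresholds}
    dtt = {max(ratios, key=ratios.get)} if ratios else set()  # single most penalizing
    functionality = {p for p, r in ratios.items() if r > 1.0}  # every exceedance
    wdf = {p for p in conc if p in with_cf}                    # every pollutant with a CF
    return {"DtT": dtt, "functionality": functionality, "WDF": wdf}


if __name__ == "__main__":
    inventory = {"COD": 300.0, "BOD": 120.0, "Cr": 0.5, "N": 10.0}  # mg/l, hypothetical
    limits = {"COD": 150.0, "BOD": 80.0, "Cr": 1.0}
    for approach, pollutants in pollutants_considered(inventory, limits, {"Cr", "N"}).items():
        print(approach, sorted(pollutants))
    # DtT ['COD']
    # functionality ['BOD', 'COD']
    # WDF ['Cr', 'N']
```

In this toy inventory, the pollutants that exceed their thresholds (COD, BOD) carry no CF and are therefore invisible to the WDF, mirroring the pattern discussed for the case study.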
Except for the GWF, which does not consider the local consequences of water pollution, all methods allow the user to conduct a regionalized impact assessment. This is achieved by applying country-specific water scarcity factors (in the case of the WAF) or spatially explicit cause-effect chains including fate and exposure modelling (in the case of the WDF).
Three methods provide results at the endpoint level: the method of Boulay et al. (2011a, b), impact assessment models for water quality degradation and the method of Ridoutt and Pfister (2013). While the method of Boulay et al. (2011a, b) calculates impacts only for the AoP human health, the other two methods provide results for all three AoPs human health, natural environment and resources.
Six methods provide results as a stand-alone indicator, either as a WAF or as a combination of two different approaches (WDF/WAF or WDF/WSF). The impact assessment models for water quality degradation are the exception, since they produce results for different impact categories, which cannot be summed up into a single score. The GWF, the pollution-induced water scarcity, the impact categories for water quality degradation at the endpoint level and the method of Ridoutt and Pfister (2013) allow the WDF/WAF to be compared with the WSF, since both results are provided in the same units (e.g. m³ or DALY). The WAF calculated by means of the WII and the method of Boulay et al. (2011a, b) can be compared with the WSF when the water quality index is set to one. In this case, the WAF results from water consumption only and is thus equal to the WSF. The methodological aspects of all methods are summarized in Table 2.
Fig. 2 Modelling steps of the WF methods addressing water quality. Each method is highlighted in a different colour. The results obtained by the methods are highlighted in italics and with bold frames.

4.2 Case study results
The case study demonstrates that the WDF and WAF calculated by means of different methods vary significantly (see Table 3 and Fig. 3). Overall, a general trend can be observed: the methods that rely upon the DtT approach (GWF, pollution-induced water scarcity and WII) yield a higher WAF in the first scenario. This can be explained by the fact that in the first scenario several pollutants are present in concentrations higher than the applied thresholds, whereas in the second scenario no pollutant exceeds these thresholds (see Table 1), which reduces the pollution-related component of these indicators to zero. For the WII in the second scenario, the result of 25.7 m³ impact index eq. is driven by water consumption only.
The method of Boulay et al. (2011a, b) provides the same results for both scenarios, since discharged water is unusable for all users in both cases (Fig. 3). This happens because in both scenarios, the emissions significantly exceed drinking water quality thresholds adopted by the method. Apart from COD and TDS, which are not considered in the method, all water quality parameters are accounted for by the WAF calculation.
The impact assessment models for water quality degradation and the method of Ridoutt and Pfister (2013) yield a higher WDF for the second scenario (except for the impact category ME), which can be explained by the higher chromium and copper concentrations. These metals determine the toxicity-related impact categories (HTP and FETP) and the results at the endpoint level. Marine eutrophication is calculated based on the nitrogen emissions (total N), which are lower in the second scenario and therefore lead to a lower result in the impact category ME than in the first scenario. The other contaminants included in the inventory are not reflected by the WDF, since they do not contribute to any impact category. Therefore, the high concentrations of COD, BOD, TDS and oil and grease (above the thresholds) in the first scenario and their potential impacts are not reflected by the WDF (see Table 3).
The PWI combines the impact categories' results with the GWF and therefore considers COD (used for the GWF calculation) in addition to the emissions included in the calculation of ME, FETP and HTP. The calculated PWI is about 2.4 times higher in the first scenario than in the second one.
As described in Section 4.1, all methods except the impact assessment models for water quality degradation and the PWI allow the WF results related to water pollution (WDF/WAF related to water pollution) to be compared with the WSF. For the methods based on the WAF approach, this can be achieved by setting the water quality indices so that no quality degradation is accounted for (for the WII, a quality index of one), which makes the WAF equal to the WSF (i.e. the WAF is then caused by water consumption only). Figure 4 shows the shares of water pollution (WDF/WAF related to water pollution) and water consumption (WSF/WAF related to water consumption) in the total WF. The methods based on the DtT approach show a higher contribution of the pollution-related WAF to the total WF in the first scenario: GWF (96%), pollution-induced water scarcity (80%) and WII (80%). In the second scenario, these methods do not account for water pollution (because the water quality thresholds are not exceeded); therefore, the resulting WF is attributed entirely to the WSF. According to the method of Boulay et al. (2011a, b), water pollution contributes 86% of the total WF in both scenarios. The result is the same because the discharged water belongs to the category "unusable" in both scenarios, as described above. According to the impact assessment models for water quality degradation at the endpoint level (AoP human health), the WDF accounts for only 0.01% of the total WF in the first scenario and 0.04% in the second one. The WDF calculated by means of the method of Ridoutt and Pfister (2013) accounts for 7% and 30% of the total WF in the first and second scenario, respectively (Fig. 4). Detailed results for each method are provided in ESM S.1.

Discussion and recommendations
In the previous sections, the methods to consider water quality in water footprinting were described and applied to a case study. Each method is based on one of the impact assessment approaches (DtT, functionality and pollution-based environmental mechanism) or combines two of them, and each approach addresses water pollution in a different way. This leads to diverse and, in some cases, conflicting results, as demonstrated in the case study.

GWF
The GWF calculation is based on the DtT approach and therefore requires selecting water quality thresholds, which are then set in relation to the inventory. This step is a value choice, since there is no consensus on which thresholds should be used in water footprinting. The water quality limits are usually taken from national or international water quality standards (Berger and Finkbeiner 2010). For example, the GWF calculated in the case study for the first scenario is based on the national water quality standards of Pakistan (NEQS) (PEPA 1999) and amounts to 499 m³. Applying the more ambitious aspirational thresholds (Stichting ZDHC Foundation 2016) or the thresholds for drinking water used in the method of Boulay et al. (2011a, b) (EEC 1975) leads to a two- and twelvefold increase of the GWF, respectively (Fig. 5).
Table 3 Case study results: WDF and WAF for scenarios 1 and 2 and WSF. Pollutants exceeding the thresholds are highlighted in bold (in the second scenario, none of the pollutants exceeds the thresholds). It should be noted that the WDF considers only water pollution, while the WAF considers both water pollution and consumption; the WSF considers only water consumption.
Fig. 3 Case study results. The highest result of each method is set to 100% and the lowest result is set in relation to it
Fig. 4 Case study results. Relative contribution of WDF/WAF and WSF to the total WF for scenarios 1 and 2
Overall, the GWF can be misinterpreted as endorsing the idea that the available freshwater resources can assimilate water pollution (Wichelns 2015). Furthermore, the method does not consider local water scarcity; thus, it may be unclear whether the result is problematic or not (Wichelns 2017). Nevertheless, the GWF is widely applied to address water quality in water footprinting and has been calculated for a broad range of products, from agricultural goods such as maize (Chukalla et al. 2018) and wheat (Chu et al. 2016) to wastewater treatment plants (Morera et al. 2016).
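The DtT logic of the GWF and its sensitivity to the chosen thresholds can be illustrated with the grey water footprint formula of the Water Footprint Assessment Manual, GWF = (V_eff · c_eff − V_abs · c_act) / (c_max − c_nat), evaluated per pollutant, with the largest value (the most penalizing pollutant) defining the result. This is a sketch under stated assumptions: the equation used in this paper may differ in detail, negative loads are clipped to zero, and all volumes, concentrations and thresholds below are hypothetical, not the case-study inventory or the NEQS/ZDHC values.

```python
from dataclasses import dataclass


@dataclass
class Pollutant:
    name: str
    c_effluent: float  # concentration in the discharged water, mg/L (= g/m3)
    c_intake: float    # concentration in the withdrawn water, mg/L
    c_natural: float   # natural background concentration, mg/L


def grey_water_footprint(v_effluent, v_intake, pollutants, c_max):
    """Return (GWF in m3, name of the most penalizing pollutant).

    v_effluent, v_intake: discharged and withdrawn volumes in m3.
    c_max: dict mapping pollutant name to its threshold in mg/L.
    """
    per_pollutant = {}
    for p in pollutants:
        load = v_effluent * p.c_effluent - v_intake * p.c_intake  # grams
        # Clipping negative loads to zero is an assumption of this sketch.
        per_pollutant[p.name] = max(load, 0.0) / (c_max[p.name] - p.c_natural)
    worst = max(per_pollutant, key=per_pollutant.get)
    return per_pollutant[worst], worst


# Hypothetical inventory for one process step.
inventory = [
    Pollutant("COD", c_effluent=300.0, c_intake=10.0, c_natural=0.0),
    Pollutant("Cr", c_effluent=0.5, c_intake=0.0, c_natural=0.0),
]
lenient = {"COD": 150.0, "Cr": 1.0}  # hypothetical thresholds
strict = {"COD": 50.0, "Cr": 0.05}   # hypothetical, more ambitious thresholds

for label, limits in [("lenient", lenient), ("strict", strict)]:
    gwf, worst = grey_water_footprint(100.0, 100.0, inventory, limits)
    print(f"{label}: GWF = {gwf:.0f} m3, driven by {worst}")
```

In this toy example, tightening the thresholds raises the GWF roughly fivefold and also switches the most penalizing pollutant from COD to chromium; in both cases the other pollutant does not enter the result at all.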

Pollution-induced water scarcity
The pollution-induced water scarcity indicator proposed by Zeng et al. (2013) advances the GWF method by introducing spatial differentiation into the impact assessment, which is achieved by setting the GWF in relation to the available freshwater resources in the study area. In the case study, the calculated pollution-induced water scarcity amounts to 2.0E-09 and thus lies far below the threshold of 1 set by the authors (see Section 2.2). This very low value results from setting the inventory data for one ton of fabric in relation to the total freshwater resources of Pakistan. The threshold of 1 would not be exceeded even if data for local water resources (e.g. at the province level) were used for the calculation. Thus, the method is applicable only to large inventories, e.g. the water pollution of a whole city, as demonstrated by Zeng et al. (2013) using the example of Beijing, but is not suited to calculations at the product level.
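The scale problem can be reproduced with simple arithmetic. The sketch assumes the indicator is the plain ratio of the GWF to the available freshwater resources; the GWF of 499 m³ is the case-study value quoted above, whereas the resource volume of 2.5 × 10^11 m³ is an assumed order of magnitude chosen for illustration, not a figure from the source.

```python
def pollution_induced_scarcity(gwf_m3: float, available_m3: float) -> float:
    """Ratio of GWF to available freshwater (assumed form of the indicator)."""
    return gwf_m3 / available_m3


THRESHOLD = 1.0         # scarcity threshold set by Zeng et al. (2013)
gwf_case_study = 499.0  # m3, scenario 1 GWF quoted in the text
available = 2.5e11      # m3, ASSUMED national freshwater volume (illustrative)

ratio = pollution_induced_scarcity(gwf_case_study, available)
print(f"ratio = {ratio:.1e}, exceeds threshold: {ratio > THRESHOLD}")

# Largest freshwater volume for which this GWF would still reach the threshold.
print(f"threshold reached only if available water <= {gwf_case_study / THRESHOLD:.0f} m3")
```

The threshold is only reached when the GWF is as large as the water resources it is compared with, which a single product practically never achieves; city-scale inventories are a different matter.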

WII
The WII adopts the DtT approach and provides results for both water quantity and quality as a single score. As addressed above, relating the inventory to a threshold means that only the most penalizing pollutant is considered in the calculation, while other emissions are neglected. As with the GWF, applying different thresholds influences the result. Furthermore, in the WII calculation, both water consumption and pollution (as the water quality index) are multiplied by the water scarcity factor of the production region (see Eq. 5), which therefore strongly affects the WII. For example, if the water scarcity factor for Brazil (0.0659 according to Pfister et al. (2009)) is applied in the case study, the resulting WII amounts to only 8.4 m³ impact index eq., which is more than twelve times lower than the result obtained for Pakistan. This may lead to an underestimation of the WAF or provide incentives for companies to locate polluting industries in water-rich countries instead of reducing their emissions.

Boulay et al. (2011a, b)
The method of Boulay et al. (2011a, b) is based on strict water quality thresholds that allow a direct water use, e.g. for drinking or irrigation. Therefore, in both scenarios of the case study, the water is unsuitable for all users except transport and hydropower, even though in the second scenario it complies with the ZDHC foundational thresholds. From a company perspective, these results may remove incentives for reducing water pollution, because the WAF remains the same even when strict industry-specific quality thresholds are achieved. The method specifies 136 water quality thresholds in total; however, the authors let the user decide which parameters to select for the WAF calculation depending on data availability and the industry being evaluated. This may produce misleading results, particularly if relevant substances (present in high concentrations) are not considered.
Therefore, determining a set of industry- or process-specific water quality parameters that have to be included in the inventory analysis could support practitioners in applying the method. As in the WII calculation, water scarcity factors are directly applied in the calculation and may therefore significantly influence the result (see Eq. 7).
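The functionality logic, and the risk of leaving parameters out of the inventory, can be sketched as a simple classification: water counts as usable by a user only if every measured parameter meets that user's threshold. The user categories and threshold values below are hypothetical placeholders, not the 136 thresholds of Boulay et al. (2011a, b), and the rule of skipping unmeasured parameters is an assumption that mirrors a user-selected parameter set.

```python
# Hypothetical thresholds in mg/L; a user with no entries has no quality requirement.
USER_THRESHOLDS = {
    "drinking": {"COD": 10.0, "Cr": 0.05, "Cu": 0.05},
    "irrigation": {"COD": 40.0, "Cr": 0.1, "Cu": 0.2},
    "industry": {"COD": 100.0, "Cr": 0.5, "Cu": 1.0},
    "transport/hydropower": {},
}


def usable_by(sample, thresholds=USER_THRESHOLDS):
    """Return the users whose thresholds are met by all measured parameters.

    Parameters absent from the sample are skipped, i.e. treated as compliant.
    """
    return [
        user
        for user, limits in thresholds.items()
        if all(sample[p] <= limit for p, limit in limits.items() if p in sample)
    ]


full = {"COD": 60.0, "Cr": 0.8, "Cu": 0.1}
partial = {"COD": 60.0, "Cu": 0.1}  # chromium not measured

print(usable_by(full))
print(usable_by(partial))
```

Omitting a single relevant parameter makes the water appear usable by an additional user, which would lower the calculated WAF.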

Impact assessment models for water quality degradation
Three impact categories were calculated in the case study at the midpoint level: FETP, HTP and ME. While the results for ME are higher in the first scenario, the toxicity-related impacts (FETP and HTP) are higher in the second scenario due to the increased chromium and copper concentrations. For the same reason, a higher WDF is obtained in the second scenario at the endpoint level for both the human health and natural environment AoPs. Apart from nitrogen (as total N), chromium and copper, the other water quality parameters compiled in the inventory are not considered, since none of them contributes to any impact category. As a result, neglecting some pollutants may lead to a significant underestimation of the WDF. In particular, non-biodegradable organics (reported as COD) may cause severe damage to human health, e.g. through the intake of dyestuffs, residues of auxiliary materials and breakdown products in the case of textile production (Roos et al. 2019). Since the COD concentration in the first scenario is five times higher than in the second one, a higher toxicity level can be anticipated as well. Thus, completeness of the inventory (covering all individual pollutants instead of reporting a sum parameter such as COD) is essential to conduct a comprehensive impact assessment and provide robust results; however, this might be challenging with regard to data collection.
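Why COD, BOD and similar sum parameters drop out of the WDF can be seen in the generic LCIA characterization step, in which each midpoint score is the sum of the emitted masses multiplied by substance-specific characterization factors (CF). In this sketch the CF values and emission masses are hypothetical; any substance without a CF in a category contributes nothing to it.

```python
# Hypothetical characterization factors (impact units per kg emitted).
CF = {
    "ME": {"N_total": 1.0},
    "FETP": {"Cr": 5.0, "Cu": 50.0},
    "HTP": {"Cr": 2.0, "Cu": 0.5},
}


def characterize(emissions_kg):
    """Return midpoint scores and the substances that no category covers."""
    scores = {category: 0.0 for category in CF}
    covered = set()
    for category, factors in CF.items():
        for substance, factor in factors.items():
            if substance in emissions_kg:
                scores[category] += emissions_kg[substance] * factor
                covered.add(substance)
    uncovered = sorted(set(emissions_kg) - covered)
    return scores, uncovered


emissions = {"N_total": 2.0, "Cr": 0.01, "Cu": 0.02, "COD": 30.0, "BOD": 10.0}
scores, ignored = characterize(emissions)
print(scores)
print("not characterized:", ignored)
```

However large the COD mass, it leaves every score unchanged; only resolving COD into individual substances with their own CFs would let these emissions enter the WDF.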
At the endpoint level, the WDF (human health damage due to toxicity) is four orders of magnitude lower than the WSF (damage due to malnutrition) (see Fig. 4). To validate these results, the WSF was also calculated by means of two other models: malnutrition impacts according to Pfister et al. (2009) and health damage due to the lack of water for domestic use and the resulting infectious diseases (Motoshita et al. 2011). The comparison is carried out for the second scenario of the case study. The WDF contributes only 0.1% of the total WF when applying the method of Pfister et al. (2009) but over one third of the total WF when using the method of Motoshita et al. (2011) (Fig. 6). These results suggest that the WDF calculated by means of the impact assessment models for water quality degradation at the endpoint level might be underestimated compared to the WSF. At the same time, this result may also be caused by the fact that several toxic pollutants reported as COD were not considered in the WDF calculation, as discussed above.

The method of Ridoutt and Pfister (2013)
The method of Ridoutt and Pfister (2013) calculates a "single stand-alone weighted indicator" that includes both the WDF and the WSF. The method is based on the impact assessment models for water quality degradation and includes two additional impact assessment steps: normalization and weighting. These steps enable the results to be aggregated into a single score, but at the same time they may distort the results, e.g. due to normalization with European factors and due to weighting. Because of the long calculation procedure (the DWU calculation alone includes four steps), some information may be lost, so that analysing the hotspots attributed to individual emissions becomes difficult. Furthermore, the results provided by this method cannot be used for public reporting due to the weighting step (ISO 2006a, b). The calculated DWU contributes only 7% and 30% of the total WF in the first and second scenario, respectively (see Fig. 4). As for the toxicity-related impact categories FETP and HTP, this can be explained by the fact that the organic pollutants included in the sum parameter COD are not considered individually.
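The effect of the normalization reference can be sketched with the generic aggregation single score = Σ_c w_c · I_c / N_c, where I_c is the result of category c, N_c its normalization reference and w_c its weight. This is not the four-step DWU procedure of Ridoutt and Pfister (2013); the category names, results, references and weights are hypothetical.

```python
def contributions(impacts, normalization, weights):
    """Weighted, normalized contribution of each category to the single score."""
    return {c: weights[c] * impacts[c] / normalization[c] for c in impacts}


impacts = {"toxicity": 10.0, "eutrophication": 1.0}  # hypothetical results
weights = {"toxicity": 0.5, "eutrophication": 0.5}   # hypothetical weights
reference_a = {"toxicity": 100.0, "eutrophication": 1.0}
reference_b = {"toxicity": 10.0, "eutrophication": 100.0}

for label, ref in [("reference A", reference_a), ("reference B", reference_b)]:
    parts = contributions(impacts, ref, weights)
    dominant = max(parts, key=parts.get)
    print(f"{label}: score = {sum(parts.values()):.3f}, dominated by {dominant}")
```

The inventory is identical in both runs and only the reference changes, yet the dominant category flips, and the single score on its own reveals neither.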

PWI
The PWI developed by Lovarelli et al. (2018) combines the GWF and the impact assessment models for water quality degradation. The calculation may lead to a loss of information or distort the results, since the units and orders of magnitude of the individual results obtained in the different impact categories are disregarded in the PWI calculation. Furthermore, the authors do not provide any guidance on how to plot the different results (GWF and impact categories) to calculate the surface area of the rhombus. Therefore, the resulting PWI depends on the way the chart is built. For example, in the case study, the GWF result is plotted on the same diagonal as the impact category FETP. In this case, the PWI obtained in the first scenario is about 2.4 times higher than the PWI of the second scenario (see Fig. 7a, b). If the chart is plotted differently (GWF and ME on the same diagonal), the PWI calculated for the second scenario is around 1.7 times higher than that of the first one (Fig. 7c, d). This inconsistency may lead to misinterpretation or misuse of the results since, as demonstrated, different plotting choices may lead to contradictory conclusions regarding which scenario is less detrimental with regard to water pollution.
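The dependence on the chart layout follows from geometry: on a four-axis chart with perpendicular axes, the quadrilateral spanned by values a and c on one diagonal and b and d on the other has area (a + c)(b + d)/2, so regrouping the axes changes the area. The sketch assumes this area is the PWI, as described above, and uses hypothetical plotted values for two scenarios (GWF is zero in scenario 2, where no threshold is exceeded); these are not the case-study data.

```python
def rhombus_area(values, diagonals):
    """Area spanned by four values plotted on two perpendicular diagonals.

    diagonals: ((a, c), (b, d)) naming the indicators on each diagonal.
    """
    (a, c), (b, d) = diagonals
    return 0.5 * (values[a] + values[c]) * (values[b] + values[d])


# Hypothetical plotted values (units deliberately ignored, as in the PWI).
scenario_1 = {"GWF": 1.0, "ME": 1.0, "FETP": 0.1, "HTP": 0.1}
scenario_2 = {"GWF": 0.0, "ME": 0.5, "FETP": 1.0, "HTP": 0.2}

layouts = {
    "GWF-FETP / ME-HTP": (("GWF", "FETP"), ("ME", "HTP")),
    "GWF-ME / FETP-HTP": (("GWF", "ME"), ("FETP", "HTP")),
}
for name, diagonals in layouts.items():
    s1 = rhombus_area(scenario_1, diagonals)
    s2 = rhombus_area(scenario_2, diagonals)
    higher = "scenario 1" if s1 > s2 else "scenario 2"
    print(f"{name}: PWI1 = {s1:.3f}, PWI2 = {s2:.3f}, higher: {higher}")
```

With identical inputs, the two layouts rank the scenarios in opposite order, mirroring the inconsistency observed in the case study.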
For the case study calculation, the impact category HTP was applied instead of freshwater eutrophication as proposed by the authors, since none of the emissions compiled in the inventory contributes to freshwater eutrophication. Lovarelli et al. (2018) do not specify whether the impact categories can be substituted, e.g. depending on the inventory or a focus on specific environmental impacts. Guidance on selecting the impact categories and plotting the results could support practitioners and facilitate the application of the method.

Applicability and limitations
When selecting a method to calculate the impacts associated with water pollution, practitioners first need to decide between the WAF and the WDF, as these two impact assessment approaches differ in the way they address environmental issues. The WAF implies that if pollutant concentrations exceed defined thresholds, the water will not be available for users whose water quality requirements are not met. In this case, the damage associated with water pollution results from the lack of water. Several WF models exist that calculate the health damage resulting from water deprivation, e.g. due to malnutrition (for agricultural water deprivation) (Motoshita et al. 2014; Pfister et al. 2009) and infectious diseases (for domestic water deprivation) (Boulay et al. 2011a; Motoshita et al. 2011). However, as addressed by Pradinaud et al. (2018), water is often used despite its contamination. On the one hand, the presence of a contaminant may not be noticeable to the users, e.g. if it does not affect water properties such as colour, odour or taste. This applies to many contaminants, e.g. pesticides, pharmaceuticals and some heavy metals. On the other hand, even when aware of the water contamination, some users would rather withdraw polluted water than suffer from water scarcity. This is the case in many regions of the world, where people have to use polluted water (e.g. for irrigation) due to either the absence of, or lack of access to, a proper water supply (UN-Water 2019). In this case, water pollution results in impacts associated with the contaminants taken up by the population rather than in a lack of water due to polluted water not being used.
Fig. 6 Relative contribution of WDF and WSF to the total WF: comparison of the case study results (based on Motoshita et al. (2014)) to the method of Pfister et al. (2009) and Motoshita et al. (2011)
The quality of the withdrawn water is considered in the GWF and WII calculations and in the method of Boulay et al. (2011a, b). This allows the initial pollution of the input water to be taken into account, which is particularly important for regions with generally highly polluted freshwater resources. For example, withdrawing water of a lower quality results in a lower WAF than withdrawing unpolluted water (i.e. water of a high quality class). In an extreme case, this can even lead to a negative WAF, when the discharged water has a higher quality class than the withdrawn water, which means that the evaluated process provides additional water for the users. However, inventory data for the withdrawn water are not reported in LCI databases and are difficult to obtain, because the quality of the water withdrawn for industrial production is usually not measured, apart from the quality parameters that are essential for the production process (e.g. hardness). Thus, considering the quality of the withdrawn water is currently challenging and can be done either by gathering primary data or by using highly aggregated datasets at the country or regional level.
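The sign behaviour described above can be reproduced with a simplified, functionality-style expression, WAF = WSI · (V_in · f_in − V_out · f_out), where f is the fraction of users able to use the water (1 for water usable by all, 0 for water usable by none). This form, the variable names and the numbers are assumptions chosen for illustration; they are not the exact equations of the WII or of Boulay et al. (2011a, b).

```python
def waf(v_in, f_in, v_out, f_out, wsi=1.0):
    """Simplified WAF (assumed form).

    v_in, v_out: withdrawn and discharged volumes (m3)
    f_in, f_out: fraction of users able to use the withdrawn/discharged water
    wsi: water scarcity factor of the region
    """
    return wsi * (v_in * f_in - v_out * f_out)


clean_intake = waf(100.0, 1.0, 90.0, 0.2)      # clean water in, polluted out
polluted_intake = waf(100.0, 0.5, 90.0, 0.2)   # already polluted water in
improved = waf(100.0, 0.2, 90.0, 0.9)          # discharge better than intake
consumption_only = waf(100.0, 1.0, 90.0, 1.0)  # no quality change, 10 m3 consumed

print(clean_intake, polluted_intake, improved, consumption_only)
```

With unchanged water quality the expression reduces to the scarcity factor times the consumed volume, i.e. a scarcity-only result, which is consistent with the WAF collapsing to the WSF when pollution is left out.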
All methods that rely upon the DtT approach (GWF, pollution-induced water scarcity and WII) perform poorly with regard to completeness of scope (pollutant coverage), since the results are calculated based on a single pollutant. However, the most penalizing pollutant is not necessarily the most harmful (Lovarelli et al. 2018). As demonstrated in the case study, the heavy metals discharged with the wastewater are not addressed when applying the DtT approach, even though their concentrations increase (while remaining below the thresholds) in the second scenario. Thus, neglecting some pollutants may lead to an underestimation of the impacts associated with water pollution. The method of Boulay et al. (2011a, b) and the methods that calculate the WDF have a much broader scope with regard to the pollutants considered, which, however, might be limited by data availability, i.e. by the emissions included in the inventory. Particularly for the impact assessment models for water quality degradation, considering individual substances instead of sum parameters (e.g. COD, TDS, TSS) is crucial, but might be very time-consuming and challenging due to low data availability and the high costs of wastewater analysis. Furthermore, all water-related impact categories have to be considered to ensure full coverage of the impacts related to water pollution.
Fig. 7 Different plotting of the GWF and impact categories and its effect on the PWI
Apart from the GWF, all methods allow the impact assessment to account for the regional context by means of country-specific water scarcity factors or regionalized cause-effect chains. As demonstrated in Section 5.3, applying water scarcity factors significantly influences the results, which should be taken into account when using the WII or the method of Boulay et al. (2011a, b).
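Because the WII multiplies both the consumption and the pollution terms by the regional water scarcity factor, the result scales linearly with that factor. Assuming exactly this linear form, the Brazilian figures quoted in the WII section can be back-calculated to the scarcity-independent part of the result and reused for any other factor; the second scarcity factor below is hypothetical.

```python
WSI_BRAZIL = 0.0659  # Pfister et al. (2009), as quoted in the text
WII_BRAZIL = 8.4     # m3 impact index eq., case-study result for Brazil

# Under the assumed linear form WII = WSI * unweighted_volume:
unweighted_volume = WII_BRAZIL / WSI_BRAZIL  # m3, about 127.5


def wii_for(wsi: float) -> float:
    """WII for the same inventory relocated to a region with factor wsi."""
    return unweighted_volume * wsi


# A result "more than twelve times" the Brazilian one requires, under this
# assumption, a scarcity factor above 12 * WSI_BRAZIL.
min_factor_for_12x = 12 * WSI_BRAZIL

hypothetical_wsi = 0.8  # hypothetical high-scarcity region
print(f"{wii_for(hypothetical_wsi):.1f} m3 eq. vs {wii_for(WSI_BRAZIL):.1f} m3 eq.")
print(f"12x ratio needs WSI > {min_factor_for_12x:.3f}")
```

The same inventory yields results an order of magnitude apart purely through the choice of region, which is the relocation incentive noted in the WII discussion.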
While the methods based on the WAF approach consider both water consumption and pollution and provide the result as a single score, WDF-based models require an additional calculation of the WSF to consider the impacts related to water consumption. Therefore, a WAF assessment may be more straightforward (i.e. less calculation effort) and advantageous when communicating the results to stakeholders, while the WDF allows for a more comprehensive impact assessment. When applying the WAF, practitioners should note that water consumption is already included in the result; calculating the WSF separately and adding it to the WAF would therefore double-count it. The methods' limitations are summarized in Table 4.

Recommendations
As described in the previous section, when choosing a method to quantify the impacts resulting from water pollution, the underlying impact assessment approach (WDF or WAF) needs to be selected first. This decision has to be made by the practitioners, since there is currently no guidance that specifies when one or the other approach should be applied. The choice between the two approaches therefore needs to be based on an analysis of which impact pathway (water deprivation or intake of the contaminants) is the most probable for the study area. For example, considering the increasing water scarcity in many parts of the world, the use of wastewater for irrigation and the resulting impacts on human health are more likely to occur than agricultural water deprivation due to water pollution (UN-Water 2017). We propose to distinguish between three general situations with regard to possible impact pathways and the availability of inventory data for water pollution. These archetypal situations can serve as guidance for practitioners to select the most appropriate WF method for their study:
- The most probable impact pathway is the intake of or contact with the emitted contaminants; a comprehensive inventory is available (i.e. the inventory data include the emissions of all contaminants relevant for the study); and the inventory can be classified into impact categories (e.g. acidification, eutrophication). We recommend quantifying the WDF by means of the impact assessment models for water quality degradation. This allows the full range of impacts associated with water pollution to be determined and potential trade-offs between different impact categories to be identified (e.g. eutrophication vs. human toxicity, as demonstrated in the case study).
- The most probable impact pathway is water deprivation due to water pollution; a comprehensive inventory is available. We recommend quantifying the WAF by means of the method of Boulay et al. (2011a, b), which allows all pollutants included in the inventory to be considered (in contrast to the other WAF methods, which are based on the DtT approach and therefore consider only the most penalizing pollutant).
- The most probable impact pathway is water deprivation due to water pollution; only one or a few water quality parameters are available in the inventory. We recommend quantifying the WAF by means of the WII. It should be taken into account that the WF result might be significantly underestimated, since only one pollutant is considered. We do not recommend using the GWF, since the method does not allow a regionalized impact assessment. We also do not recommend the pollution-induced water scarcity method, since, as demonstrated in the case study, it is not well suited to impact assessment at the product level, which is typical for WF studies.
None of the methods can be recommended for the situation in which the impacts originate from the intake of contaminants but only one or a few water quality parameters are available in the inventory. This emphasizes the importance of inventory data availability. As addressed above, we do not recommend applying the GWF or the pollution-induced water scarcity method. We also do not recommend the method of Ridoutt and Pfister (2013), for two reasons: (1) the normalization and weighting steps may partly distort the results, and (2) due to weighting, the results cannot be used for external communication. Finally, we do not recommend the PWI because, as demonstrated in Section 5.7, the method is currently not robust enough.
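The recommendations above, including the situation for which no method is recommended, can be condensed into a small decision helper. This is a hypothetical encoding for orientation only: the enum, function and return strings are illustrative, and "comprehensive inventory" stands for an inventory that covers all relevant contaminants and can be assigned to impact categories.

```python
from enum import Enum
from typing import Optional


class Pathway(Enum):
    CONTAMINANT_INTAKE = "intake of or contact with emitted contaminants"
    WATER_DEPRIVATION = "water deprivation due to water pollution"


def recommend_method(pathway: Pathway, comprehensive_inventory: bool) -> Optional[str]:
    """Map an archetypal situation to the method recommended in the text."""
    if pathway is Pathway.CONTAMINANT_INTAKE:
        if comprehensive_inventory:
            return "WDF: impact assessment models for water quality degradation"
        return None  # no method recommended; improving the inventory comes first
    if comprehensive_inventory:
        return "WAF: method of Boulay et al. (2011a, b)"
    return "WAF: WII (result may be underestimated; single pollutant)"


for pathway in Pathway:
    for comprehensive in (True, False):
        print(pathway.name, comprehensive, "->", recommend_method(pathway, comprehensive))
```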
The selection of a method can also depend on whether the results need to be provided at the midpoint or endpoint level. Endpoint results can be quantified by means of the impact assessment models for water quality degradation and the method of Boulay et al. (2011a, b).
As addressed above, a comprehensive inventory is essential to provide reliable results, in particular when using the methods based on the functionality approach and the impact assessment models for water quality degradation. Nevertheless, data availability remains a significant challenge for LCA practitioners. Therefore, initiatives that provide quality assessments of global water resources and process-specific wastewater quality datasets play a significant role in enhancing water quality evaluation in water footprinting. Furthermore, determining industry-specific water quality parameters that have to be considered in the inventory analysis can significantly support practitioners in applying the methods by (1) ensuring that all relevant pollutants are considered and (2) limiting the water quality parameters that need to be determined to the required ones. Providing a set of consistent water quality thresholds can increase the transparency and comparability of the results provided by the methods based on the DtT approach. Finally, further impact assessment models, e.g. for pathogen pollution, need to be developed and included in the scope of water footprinting to address all impacts associated with water quality deterioration.

Providing a comprehensive WF assessment
As stated in ISO 14046 (ISO 2014), the term water footprint can only be used if a comprehensive WF assessment is conducted; otherwise, a qualifier (e.g. WSF or "water eutrophication footprint") needs to be applied. At the same time, the standard allows the environmental impacts considered (e.g. water scarcity and/or water degradation) to be selected depending on the goal and scope of the study. As described in the introduction, currently less than 50% of WF studies consider water quality (Lovarelli et al. 2016). This small share seems inappropriate, considering that, on the one hand, hardly any industry does not contribute to water pollution and, on the other hand, 80% of globally released wastewater is untreated (UN-Water 2020). Therefore, including both water quantity and quality in the WF assessment is the only proper way to address the impacts related to water use comprehensively. To achieve this goal, the WF can be determined either as the combination of WDF and WSF or as WAF (considering both water and emission flows in the inventory analysis). Combining WDF and WAF (e.g. as done in the PWI) may overestimate the impacts due to double-counting, as described by Berger and Finkbeiner (2013). Still, there might be cases in which applying both WDF and WAF is reasonable. For example, ecosystems and agricultural water users might be affected by pollutants due to the application of untreated wastewater (WDF), while domestic water users suffer from water deprivation (WAF). In this case, the WDF and WAF should be applied according to the shares of the affected water users, and the difference in the origin of the impacts (intake of pollutants vs. lack of water) needs to be taken into account when interpreting the result.

Limitations of the case study
The case study serves to compare the WF methods and is not intended as a representative example of textile production processes. Only eight water quality parameters were included in the case study. As discussed above, considering all relevant contaminants is crucial to provide robust results, particularly when using the method of Boulay et al. (2011a, b) and the methods calculating the WDF. However, comprehensive wastewater quality datasets are usually not available in the literature and are difficult to obtain, since wastewater quality analysis is time-consuming and costly, particularly when measuring parameters beyond general ones such as COD and TSS. This limitation regarding inventory completeness applies to the WDF and WAF calculation in general, irrespective of the industry being evaluated.

Conclusions
This paper provides an overview of existing methods to calculate impacts associated with water pollution and an analysis of the methodological aspects, strengths and shortcomings of each method. Decomposing the modelled impact pathways and highlighting their methodological choices can support practitioners in choosing an appropriate way to implement a water quality assessment according to their goals and data availability when conducting a water footprint study. It is alarming that different methods provide conflicting results for the two evaluated scenarios. This can lead to misinterpretation of which scenario is more beneficial or to misuse of the results for communication purposes. Comparing the WDF and WAF with the WSF calculated in the case study also revealed large differences between the results provided by different methods, which should be investigated in future research. Therefore, clear guidance for the application of the methods to account for water pollution in water footprinting, particularly with regard to inventory requirements and applied thresholds, is essential to provide robust results and to facilitate the application of the methods for decision support in politics and industry.