Can we take the pulse of environmental governance the way we take the pulse of nature? Applying the Freshwater Health Index in Latin America

Quantitative assessments have long been used to evaluate the condition of the natural environment, providing information for standard setting, adaptive management, and monitoring. Similar approaches have been developed to measure environmental governance; however, the end result (e.g., numeric indicators) belies the subjective and normative judgments involved in evaluating governance. We demonstrate a framework that makes this information transparent, through an application of the Freshwater Health Index in three river basins in Latin America. Water governance is measured on a 0–100 scale, using data derived from perception-based surveys administered to stakeholders. Results suggest that water governance is a primary area of concern in all three places, with low overall scores (Guandu 26, Alto Mayo 38, Bogotá 43). We conclude that this approach to measuring governance at the river basin scale provides valuable information to support monitoring and decision making, and we offer suggestions on how it can be improved.

Electronic supplementary material: The online version of this article (10.1007/s13280-020-01407-8) contains supplementary material, which is available to authorized users.


INTRODUCTION
Water security rightly ranks as a top environmental concern and has spurred numerous efforts to accurately measure the quantity, quality, and ecological integrity of freshwater supplies at multiple spatial scales and for a variety of audiences (Vollmer et al. 2016). But there has also been increasing recognition that water insecurity is generally a crisis of governance, not just a problem of inadequate supply or climate variability (Rogers and Hall 2003; McDonnell 2008; Tortajada 2010; Bakker and Morinville 2013; Akhmouch 2014). Water resource management increasingly refers to managing relations among stakeholders, rather than a single institution managing a physical resource (Falkenmark 2004). Where the underlying governance system is weak, stakeholders are unable to respond efficiently and effectively to pressures like pollution, increasing water demand, and freshwater ecosystem degradation. Yet prevailing assessments of sustainability have typically focused on the technical and biophysical factors that readily lend themselves to quantification; these could be viewed as the "outcomes" of water governance (Wiek and Larson 2012; Schneider et al. 2014) but by themselves offer no insight into the impact of different aspects of governance. Quantitative indicators are now widely used to assess the sustainability or resilience of freshwater systems (Vollmer et al. 2016), but Pires et al.'s (2017) review of 170 sets of water sustainability indicators revealed that only about one-third even included governance indicators. And although comparative water governance is a growing field, it is dominated by qualitative, one-off assessments (Özerol et al. 2018).
From these existing efforts to quantitatively assess water governance, a number of insights have emerged. The first is that water governance should no longer be viewed as the domain of governments alone; it involves multiple levels of participation from public institutions, along with private sector and civil society actors (Tortajada 2010). The second is that, while "good governance" is a laudable goal, there is no universal definition of it (Özerol et al. 2018), though guidance has tended to focus on common themes such as the enabling environment (policies, laws, regulations, norms), engagement or participation, and performance (Bertule et al. 2017). The third is that, for quantitative analysis, there are few alternatives to survey data, including perception-based surveys, but these can nonetheless provide value in governance assessments (Kaufmann et al. 2010). Consequently, analysts have often resorted to articulating frameworks and general principles for evaluation, rather than normative approaches to quantitative assessment (Woodhouse and Muller 2017), which leaves some space for local adaptation and interpretation.
The OECD's Principles on Water Governance (Akhmouch et al. 2018) is one of the latest examples of this, offering a list of 36 input and process indicators along with a checklist for self-assessments. Some frameworks have instead focused on problem structuring and comparative analysis rather than prescribing specific processes, noting that the link between "good" governance processes and improved biophysical outcomes is largely unproven (Knieper et al. 2010; Pahl-Wostl et al. 2010). Others have concentrated more narrowly on specific institutional arrangements, chiefly river basin organizations (RBOs), offering suites of indicators to assess institutional performance (e.g., Hooper 2010) as a proxy for progress toward more integrated water resource management (IWRM). The advent of the Sustainable Development Goals (SDGs) has also renewed interest in measuring water governance at a national, and ultimately global, scale. Bartram (2018) assesses SDG 6, the "water goal," highlighting the challenges of linking the governance ("means of implementation") targets with the more concrete "outcomes" that can be measured in biophysical or economic terms, and questioning their universal applicability and acceptability. Bertule et al. (2018) note that the IWRM indicator (6.5.1) is typically measured via a survey administered to a national focal point, often a single person within a Ministry of Water Resources, with three-quarters of countries not undertaking stakeholder consultations prior to reporting. Moreover, national-level governance indicators, although useful for quick international comparison, do not necessarily correspond well to environmental and natural resource issues, which typically have important local (subnational) and basin-specific characteristics (Barrett et al. 2006).
Analysts of water governance must choose among three possible data sources and collection methods (Fig. 1). Using existing data carries the advantages of transparency, presumed objectivity, and reproducibility (assuming that the datasets are routinely updated). However, suitable data are rarely available, leaving analysts to choose between coarse global products, such as the Worldwide Governance Indicators (Kaufmann et al. 2010) or the IWRM data portal (UNEP-DHI 2018), and numeric proxies such as budget line items, meetings held, or negative media mentions; these are easy to count but are not necessarily relevant to the real subject of interest (Stefano 2010). Moreover, observable data provide a narrow frame for examining the complexity of governance (Olsson and Jerneck 2018). For these reasons, checklists are frequently developed and employed (Wilde et al. 2009); they are designed to facilitate specific data collection in a transparent and repeatable manner. At their simplest, they are constrained to binary choices, so that the analyst simply answers yes/no questions and tallies a quantitative score from the answers.
However, checklists are less suitable for monitoring progress (Wilde et al. 2009) and are not designed to capture the nuance or subjectivity of terms like transparency and accountability (Huitema and Meijerink 2017), or, more broadly, the gap between de jure and de facto governance in a place (Wilde et al. 2009; Kaufmann et al. 2010). Checklists also tend to be rigid, mirroring a noted shortcoming of prescriptive IWRM approaches (Ait Kadi 2019). For example, stakeholder engagement is widely promoted but still lacks an evidence base demonstrating its impact on water governance (Akhmouch and Clavreul 2016), and it may not be universally accepted as a high priority. This has led more assessments toward perception-based data, applying methods ranging from a single analyst using a numeric rating scale (Araral and Yu 2010; Davis et al. 2013) to large groups of stakeholders ranking statements or giving their own ratings (Adger et al. 2005; Carmenta et al. 2017; Cradock-Henry et al. 2017). Though subjective and potentially less reliable than directly observable phenomena (Stefano 2010), perception data are valid because agents base their actions on their perceptions (Kaufmann et al. 2010; Carmenta et al. 2017). Assessing stakeholders' perceptions can also reveal their current understanding of governance structures and dynamics (Musacchio et al. 2020).
It is a non-trivial task to "take the pulse" of water governance the way that we now commonly measure nature or the economy, by constructing quantitative indicators that can be monitored over time. Analysts must be cognizant of the tensions between local relevance and global interest, and between subjective and objective information, as well as the logistical challenges of collecting useful data. We build on recent efforts in this field and present an application of the Freshwater Health Index (FHI) framework (Vollmer et al. 2018) to assess water governance in three case studies in Latin America. The FHI includes a composite set of indicators to measure the ecological integrity, delivery of ecosystem services, and governance of river and lake basins. To calculate the governance indicators, we administered a survey to groups of stakeholders in three countries: the Guandu basin in the state of Rio de Janeiro, Brazil; the Alto Mayo basin in the Andean Amazon region of Perú; and the water supply area for metropolitan Bogotá, Colombia. These same stakeholders completed a weighting exercise to determine levels of consensus around priorities and, by extension, areas of disagreement that could forestall action on improving water governance. By comparing results from these three studies, we provide insights into how a systematic framework can be applied and adapted locally to measure water governance, the types of information it can generate, and the strengths and weaknesses of the approach.

Case Study Basins
The three selected case studies all come from South America, where water resources are generally abundant but where water governance, according to a recent country-level assessment, has underperformed relative to other parts of the globe (Bertule et al. 2018). All three basins were part of a project to apply the Freshwater Health Index (Vollmer et al. 2018; Bezerra et al. 2020) and had been selected based on existing relationships with relevant governance stakeholders, not with cross-basin comparative analysis in mind. At the time of writing, we were not aware of any qualitative assessments of water governance in these basins, which makes it difficult to validate our quantitative results against other assessment methods. However, to the extent possible, we interpret survey scores in light of the context of each place.
In Bogotá, top water management concerns include meeting water demand for a city of about 10 million people and ensuring that upstream activities such as potato cultivation, tanneries, and artisanal mining do not jeopardize water quality. Water is supplied from a system of five watersheds and storage reservoirs that deliver water to the city and surrounding municipalities; wastewater is then channeled to the Bogotá River. Day-to-day management is a collaboration among the district government, the public water and sanitation utility known as Acueducto, and the regional environmental agency (CAR). The three entities signed a legal document in 2007 (Convenio 171) that spelled out their respective responsibilities. CAR is tasked with conservation and reversing environmental degradation, executing national policies. Acueducto (which is owned by the District government) focuses on water supply, distribution, and sanitation infrastructure, including wetland protection as a form of natural infrastructure.
The Guandu basin in Brazil is a highly engineered coastal watershed that acts as the water supply to approximately 9 million people in metropolitan Rio de Janeiro, mainly through a diversion of water from the much larger Paraíba do Sul basin (González-Bravo et al. 2019). Most water users reside east of Guandu basin, in neighboring Guanabara basin. The State of Rio de Janeiro created a basin committee (Comite Guandu) in 2002, affiliated with the State Council of Hydrologic Resources, to provide advisory services, lead deliberations, and promote participatory management. Complementing the committee is AGEVAP, its executive arm, which collects user fees and applies the funds to carry out water resource management plans. Top concerns include water quality due to the industrial activities located in Guandu basin, and water allocation between users residing in Guandu and the larger urban population in the city of Rio de Janeiro.
Finally, the Alto Mayo basin in Perú is a typical Andean-Amazonian watershed, still nearly 80% forested but experiencing some of the highest rates of deforestation in the region as land is cleared for livestock, rice, and coffee cultivation. The basin's population of 248,000 people includes 14 indigenous communities with customary rights over one-fifth of the area. Compared to the other two case study basins, water governance in the Alto Mayo basin is more centralized: the national government has traditionally played a strong role focused on irrigation development, although a basin committee was recently and voluntarily established as the first of its kind in the Andean Amazon (ANA 2017). A substantial portion of the basin is protected forest, and although water supply is abundant, concerns are emerging about water quality due to excess sediment from forest degradation and pollution from coffee processing.

Fig. 1 Sources of data for governance assessments, and their relative merit
In all three basins, a much wider network of stakeholders plays important roles in water governance, including environmental monitoring, territorial planning, biodiversity conservation, advocacy, and women's empowerment. We do not detail all of those entities here, but all were involved in the assessment process; representing the voices of these diverse water governance actors is one of the aims of the methods we present.

FHI Governance and Stakeholders assessment
Between March and November 2018, a team of researchers worked with stakeholders from each of these basins to apply the FHI framework. The full framework (Fig. 2) includes indicators for ecological integrity, ecosystem services, and governance, and has been applied in China (Vollmer et al. 2018) and the Lower Mekong (Liu et al. 2019; Souter et al. 2020). This is the first application of the framework in Latin America; the full case studies and biophysical results are presented in Bezerra et al. (2020). All FHI indicators are quantified and placed on a 0–100 scale for ease of comparison. Governance indicators are not combined with biophysical indicators into a larger composite; rather, the rationale for providing quantitative indicators is that all aspects of the freshwater system can be monitored over time within a common framework, and priorities identified. The framework is not designed for inter-basin comparison, mainly because its target end-users are decision makers within each basin.
Here, for the first time, we present the detailed methods applied in measuring the "Governance & Stakeholders" pillar of the Index. This pillar comprises four major indicators, which are in turn composed of twelve sub-indicators (see Table 1). The Governance & Stakeholders indicators share common elements with the OECD's principles of water governance (Akhmouch et al. 2018) as well as the earlier UNDP guidance on assessing water governance (Jacobson et al. 2013), such as:
• focusing on the broader governance context in which specific paradigms such as IWRM operate,
• including stakeholders as a variable within governance, and
• accounting for effectiveness (in addition to process).
In addition, the FHI framework includes a major indicator and two sub-indicators that focus on the idea of adaptive governance (Bakker and Morinville 2013;Chaffin et al. 2014).
We administered a perception-based survey to groups of representative stakeholders in each basin, with the goal of creating a richer dataset that reflects the diversity of opinions in each governance system. We used purposive sampling to identify participants, by mapping the stakeholder groups in each basin and inviting a cross-section to ensure participation from the various levels and agencies of government, along with participants from civil society, academia, and the private sector.
Using a Likert-type ordinal rating scale (1–5), participants were asked to rate the quality or degree of implementation of a variety of topics related to water governance in their respective basins. The survey instrument was written first in English (Supplementary Material) and then translated into Spanish and Portuguese. It comprised 12 modules, each corresponding to a sub-indicator (Table 1), with three to six statements per module and 51 statements in total; participants were instructed to rate only those statements on which they felt knowledgeable. Surveys were administered in person at separate workshops (one per basin). Responses were kept anonymous, although respondents' sectoral affiliation (e.g., government or civil society organization) and geographic location (e.g., upstream or downstream) were recorded. In total, 22 respondents completed the survey for the Guandu basin, 29 for Alto Mayo, and 60 for Bogotá; the Bogotá sample was larger to reflect its larger geographic area and the fact that the study area comprises five different sub-basins and thus a greater diversity of local stakeholder groups.
Data were aggregated in each basin to calculate mean values for each of the 12 sub-indicators, the four major indicators, and a final summary value for the Governance & Stakeholders component of the FHI assessment. Although it is generally inadvisable to take the mean of ordinal-scale data, because the differences between points are not necessarily equal, individual items can be grouped into constructs, or "survey scales," such as our 12 modules, and taking the mean of these constructs is then appropriate (Sullivan and Artino 2013). We calculated Cronbach's alpha (Eq. 1) for each sub-indicator to evaluate whether the individual statements or components of each scale were intercorrelated and thus a reliable, or internally consistent, measure.
$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right) \qquad (1)$$

where k is the number of scale items, $\sigma^{2}_{Y_i}$ is the variance associated with item i, and $\sigma^{2}_{X}$ is the variance associated with the observed total scores. If the scale items have no covariance, α would equal 0; as covariance increases, α approaches 1. Although there is no strict guideline for interpreting α, the typical minimum recommended coefficient value is between 0.65 and 0.8 for scales with a small number of components or test items (Vaske et al. 2017), such as our survey modules.

Finally, we calculated the interquartile range (IQR) for each of the 51 statements. This provides a measure of the variance of responses, as an accompaniment to the mean value, in cases where responses might be polarized around extreme values rather than exhibiting a normal distribution. We therefore use the IQR as a measure of consensus (Novakowski and Wellar 2008), with values ≤ 1.0, 1.01–1.99, and ≥ 2.0 corresponding to "high", "moderate", and "low" consensus, respectively. But the IQR also allowed us to test what effect, if any, increasing our sample size had in reducing variance, which could in turn increase confidence in the results and lessen the impact of any one respondent registering extreme values.
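The reliability and consensus statistics described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: `cronbach_alpha` implements Eq. 1 but expects complete responses (one row per respondent, one column per statement in a module), so respondents who skipped statements would have to be dropped or handled separately; `iqr` uses the standard library's "inclusive" quartile method, which may differ from the software the authors used; and `consensus_level` assigns values falling in the small gaps between the published bands (e.g., between 1.0 and 1.01) to "moderate". All function names are hypothetical.

```python
from statistics import quantiles, variance


def cronbach_alpha(responses):
    """Cronbach's alpha (Eq. 1) for one module.

    `responses` is a list of respondent rows; each row holds that respondent's
    ratings for the k statements in the module (complete cases only).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*responses))                # one tuple per statement (Y_i)
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])  # variance of total scores X
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


def iqr(ratings):
    """Interquartile range (Q3 - Q1) of one statement's ratings."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return q3 - q1


def consensus_level(iqr_value):
    """Map an IQR onto the high / moderate / low consensus bands."""
    if iqr_value <= 1.0:
        return "high"
    if iqr_value < 2.0:
        return "moderate"
    return "low"


# Two perfectly correlated statements give alpha = 1.0.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
# Ratings split between low and high values give an IQR of 2, i.e. "low" consensus.
print(consensus_level(iqr([1, 1, 2, 3, 5])))
```

Because the variances in Eq. 1 appear as a ratio, using sample variances (as `statistics.variance` does) for both the items and the totals gives the same α as population variances would.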

Weighting and scoring indicators
The 51 individual statements were left unweighted; for example, if a module had three statements, each contributed one-third of the average rating for the corresponding sub-indicator. But because sub-indicators were aggregated into the four major indicator scores, and those were in turn aggregated into a final Governance & Stakeholders value, we asked participants to undertake a weighting exercise so that the four indicators and their twelve sub-indicators could be weighted transparently and specifically for each basin. To avoid our survey results biasing participants' views of what they consider more or less important, we administered the weighting exercise before sharing any results from the governance survey. For this, we used the Analytic Hierarchy Process (AHP) (Saaty 2005), in which participants made a series of pairwise comparisons, first among the major indicators (the top level of the hierarchy) and then, within each major indicator group, among its sub-indicators. Participants used a standard linear (1–9) scale to register their individual preferences. These quantitative preferences fill a comparison matrix, from which we calculate the normalized principal eigenvector, giving each individual's relative weights or priorities, along with a consistency ratio. Global (weighted geometric mean) priority weights are calculated for the group, along with a consistency ratio for the group and a consensus measure based on Shannon α and β entropy (Goepel 2013). Again, there is no strict standard for measuring group consensus, but in this case we interpret 65% and below as low consensus, above 65% to 75% as moderate, and greater than 75% as high.
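The eigenvector and consistency-ratio steps of the AHP can be sketched as below. This is a hypothetical, dependency-free illustration, not the AHP software the authors used (Goepel 2013): priorities are obtained by power iteration, which converges to the principal eigenvector for positive reciprocal matrices; the random-index values are Saaty's commonly cited table; group aggregation uses the element-wise weighted geometric mean of individual judgment matrices, which is one common option and is assumed here; and the Shannon-entropy consensus measure is not reproduced. All function names and example judgments are our own.

```python
import math

# Saaty's random consistency index (RI) by matrix order n.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def comparison_matrix(n, judgments):
    """Reciprocal matrix from {(i, j): a_ij} judgments on the 1-9 scale."""
    a = [[1.0] * n for _ in range(n)]
    for (i, j), value in judgments.items():
        a[i][j] = float(value)
        a[j][i] = 1.0 / value
    return a


def _times(a, w):
    """Matrix-vector product A w."""
    return [sum(a_ij * w_j for a_ij, w_j in zip(row, w)) for row in a]


def priorities(a, iterations=200):
    """Principal eigenvector by power iteration, normalized to sum to 1."""
    w = [1.0 / len(a)] * len(a)
    for _ in range(iterations):
        aw = _times(a, w)
        total = sum(aw)
        w = [x / total for x in aw]
    return w


def consistency_ratio(a):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(a)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    w = priorities(a)
    lambda_max = sum(x / y for x, y in zip(_times(a, w), w)) / n
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]


def aggregate_judgments(matrices, weights=None):
    """Element-wise weighted geometric mean of individual matrices.

    `weights` (one per participant) are assumed to sum to 1.
    """
    weights = weights or [1.0 / len(matrices)] * len(matrices)
    n = len(matrices[0])
    return [[math.prod(m[i][j] ** wt for m, wt in zip(matrices, weights))
             for j in range(n)] for i in range(n)]


# Usage: two participants compare three sub-indicators.
p1 = comparison_matrix(3, {(0, 1): 3, (0, 2): 5, (1, 2): 2})
p2 = comparison_matrix(3, {(0, 1): 1, (0, 2): 3, (1, 2): 3})
group = aggregate_judgments([p1, p2])
print(priorities(group), consistency_ratio(p1), consistency_ratio(group))
```

Aggregating judgments with a geometric mean keeps the group matrix reciprocal (a_ij × a_ji = 1), so the same eigenvector and consistency-ratio code applies to individuals and groups alike.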
To calculate final scores on a 0–100 scale, we first averaged the ratings for each of the 12 modules. Since these ratings ranged between 1 and 5, we transformed the scale by subtracting 1 from the average rating and multiplying the result by 25 to arrive at a score out of a possible 100; for reference, a rating of 3 corresponds to a numeric score of 50. To aggregate these sub-indicator scores into major indicator scores, we used the stakeholder-derived weights and calculated a weighted geometric mean, i.e., $x_1^{w_1} \times x_2^{w_2} \times \cdots \times x_n^{w_n}$ (with the weights $w_i$ summing to 1), which is more sensitive to the weights than an arithmetic mean. We performed the same process on the major indicators to arrive at an overall score.
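The scoring steps above can be sketched as follows, as a minimal illustration under stated assumptions rather than the authors' implementation: skipped statements are represented as `None` and ignored, each statement's mean counts equally toward its module (as described for the unweighted statements), and the weights passed to `weighted_geometric_mean` stand in for stakeholder-derived AHP priorities, normalized here to sum to 1 in case of rounding. The function names and example numbers are hypothetical.

```python
import math
from statistics import mean


def statement_mean(responses):
    """Mean 1-5 rating for one statement, ignoring skipped items (None)."""
    return mean(r for r in responses if r is not None)


def module_rating(statements):
    """Unweighted mean of statement means: each statement counts equally."""
    return mean(statement_mean(s) for s in statements)


def to_score(rating):
    """Rescale a 1-5 mean rating to 0-100: (rating - 1) * 25."""
    return (rating - 1) * 25


def weighted_geometric_mean(scores, weights):
    """prod(x_i ** w_i), with the weights normalized to sum to 1."""
    total = sum(weights)
    return math.prod(x ** (w / total) for x, w in zip(scores, weights))


# Usage: one module with two statements; one respondent skipped the first.
rating = module_rating([[3, 4, None], [2, 2, 2]])  # statement means 3.5 and 2
print(to_score(rating))
# Aggregate three sub-indicator scores with hypothetical AHP weights.
print(weighted_geometric_mean([40, 60, 50], [0.5, 0.3, 0.2]))
```

Because the geometric mean multiplies scores, any sub-indicator whose mean rating is exactly 1 (a score of 0) drives the aggregate to 0, one of the ways it behaves differently from an arithmetic mean.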

RESULTS AND DISCUSSION
The Freshwater Health Index was designed for use in monitoring and resource management in individual basins, not necessarily to promote comparative analyses across case studies, given the unique geographic, hydrologic, and sociopolitical contexts of basins around the world (Vollmer et al. 2018). Thus, we present the results of the governance assessments from the three cases here to illustrate the insights that our methods can produce and their applicability in a range of contexts. Among the three basins, Guandu registered the lowest overall score, 26 (out of 100), followed by Alto Mayo at 38 and Bogotá at 43. Overall, water governance is clearly not meeting most stakeholders' expectations in any of the basins, so a general conclusion we can draw is that improving water governance should be a high priority in all three places. But the exceedingly low score in Guandu was surprising, given that it appears to have the most mature formal governance structure. This may be because most of its water resources are derived from an interstate basin (Paraíba do Sul, shared between Rio and São Paulo states) and most of its beneficiaries reside in another basin (Guanabara), but stakeholders also noted a gap between the existence of formal mechanisms and their successful implementation. This is also consistent with the observation that the implementation of IWRM principles has been very slow throughout Brazil, in spite of the creation of river basin committees (Costa et al. 2017). Table 2 summarizes the full results of both the governance survey and the weighting exercise.
We found that Cronbach's alpha was satisfactory for each sub-indicator, with values ranging from 0.62 to 0.95 and averages of 0.80 for Alto Mayo and Bogotá and 0.82 for Guandu, suggesting that the statements within each module formed internally consistent scales.

With such small sample sizes (ranging from n = 22 in Guandu to n = 60 in Bogotá), we caution against reading too much into split samples (in this case, by sector). Qualitatively, however, we did observe trends in Alto Mayo and Bogotá that conformed to our hypotheses, namely that government representatives were more likely to provide scores above the mean, while NGO and community representatives provided scores below the mean, with the "experts" from academia falling in between. This trend was most pronounced in Bogotá, where we compared the responses of those from the public sector (n = 34) with all other groups (academia, industry, NGOs, and community groups, n = 26). Recalculating scores with the government subset yielded an overall score of 51, with indicator scores of 54, 53, 45, and 51. The subset of all non-government stakeholders yielded an overall score of 36, with indicator scores of 40, 35, 37, and 32, highlighting a substantial difference between the perceptions of government and non-government stakeholders. When we singled out community and NGO actors, their averages (32 and 34, respectively) were even lower than the non-government group mean. Finally, we ran one-tailed t-tests, assuming unequal variance, on responses to each of the 51 statements and found that the two groups (government and non-government) differed significantly (p < 0.05) on 35 of them. We must reiterate that these are perception data, so there is no objective true rating; but in cases like this, the differing mean values between sub-groups highlight discrepancies between stakeholder groups, and at a minimum these discrepancies are fodder for further discussion and analysis.
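A per-statement comparison of this kind can be sketched with Welch's t statistic and the Welch–Satterthwaite degrees of freedom, which is what a t-test assuming unequal variance computes. This hypothetical helper uses only the standard library and stops short of the p-value; with SciPy available, `scipy.stats.ttest_ind(gov, other, equal_var=False, alternative="greater")` returns a one-tailed p-value directly (the `alternative` argument requires SciPy 1.6 or later). The ratings in the usage line are invented for illustration.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)  # squared standard error, group a
    vb = variance(group_b) / len(group_b)  # squared standard error, group b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


# Usage with made-up ratings for one statement: government vs. non-government.
t, df = welch_t([4, 5, 4, 3, 5], [2, 3, 2, 1, 3])
print(round(t, 2), round(df, 1))
```

A positive t means the first group rated the statement higher; for a one-tailed hypothesis that government respondents rate higher, the p-value comes from the upper tail of the t distribution with `df` degrees of freedom.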

Guandu, Brazil
Within the Enabling Environment category, stakeholders in Guandu placed a comparatively low weight (relative to stakeholders in Alto Mayo and Bogotá) on the Water Resource Management sub-indicator, which registered the highest score in that category, and placed comparatively higher weights on both Rules for Resource Use and Incentives and Regulations. This perhaps reflects the fact that the laws on the roles and responsibilities of river basin committees have already been promulgated; participants also gave the highest overall rating (mean of 3.1 on the 1–5 scale) to the statement relating to infrastructure being centrally managed. The quality and clarity of rules around water allocation received a relatively high rating (2.7), while rules for groundwater abstraction were rated the lowest in that sub-indicator category (1.8), similar to the situation in neighboring São Paulo (Borges and Santos 2014). Ratings for financial incentives for environmental stewardship, as well as land use zoning policy, were relatively high (2.7), but market-based schemes rated low (1.6), reflecting the fact that there has been little consideration of, for example, tradeable water rights in the region. Stakeholders placed a high weight on the Water-related Conflict sub-indicator, which received the highest sub-indicator score in Guandu (37) and appears consistent with the higher rating for rules around allocation. The score was driven by a relatively high rating (meaning good performance) for conflicts about water rights allocation (3.0) but a low rating for conflicts regarding water access (1.8). Stakeholders explained that, in particular, residents of Rio municipality typically enjoy reliable access to water services, while residents of the municipalities actually located in the Guandu basin experience inferior service.
Stakeholders remarked that the FHI as implemented did not seem to capture this discrepancy adequately, particularly for the four municipalities physically located within the Guandu basin (Queimados, Paracambi, Japeri, and Nova Iguaçu), which have been subject to curtailed water supplies 2–3 times per week as a result of switchover operations (Britto et al. 2016). Finally, stakeholders placed their highest weight on the Vision and Adaptive Governance indicator, and a high weight on the Monitoring and Learning Mechanisms sub-indicator, which received the lowest score (19) of the entire governance assessment. Through the assessment, stakeholders became more aware of the limited quality and coverage of existing data, the lack of reliable information on climate and discharge in particular, and the fact that not all existing monitoring stations are actively collecting data. Thus, one initial outcome of the FHI assessment was that stakeholders from the Guandu Committee and AGEVAP stated that they would use the results to guide investments in additional monitoring.

Alto Mayo, Peru
Despite being a remote watershed in the Andean Amazon with a comparatively small population, the Alto Mayo has been a pilot site for numerous water governance initiatives. But this small population, and thus the lack of a revenue base of water users, may help explain why the five statements about financial capacity garnered the lowest average score (2.2 out of a possible 5, where 3 corresponds to "Satisfactory"). Among these, water supply and delivery system investments scored marginally higher (2.5 and 2.6), while wastewater management, ecosystem conservation, and monitoring and enforcement investments all scored 2.0 or lower. This gap between policy and investment is also illustrated by the fact that Incentives and Regulations received the highest sub-indicator score (49), while Financial Capacity received the lowest (30). Given the amount of forest conservation and agriculture taking place, and the income inequality between upstream residents and the more urbanized downstream communities, Alto Mayo has been testing payments for watershed services as an additional financial mechanism since residents of the city of Moyobamba agreed to an increase in fees in 2009 (Stern and Echavarria 2013). Perhaps the most interesting result from Alto Mayo was the high weight stakeholders assigned to the Stakeholder Engagement indicator (0.4). The region is home to a large population of indigenous peoples, and the corresponding score for this indicator (40) slightly outperformed the overall score, but the score for the Information Access sub-indicator was considerably higher than that of its companion sub-indicator, Engagement in Decision-making Processes (44 versus 37). Stakeholders placed their lowest weight on the Water-related Conflict sub-indicator (0.19), which, incidentally, received one of the highest sub-indicator scores (46), suggesting that conflict is at present a lower concern.

© The Author(s) 2020 www.kva.se/en
Interestingly, however, the statements in the Water-related Conflict sub-indicator exhibited the highest average IQR (1.85) of any sub-indicator we evaluated across all three basins. Statements about overlapping jurisdictions, water access, infrastructure siting, and downstream water quality conflicts all had an IQR of 2, signifying "low" consensus.
One factor underpinning this could be a perception that the most economically vulnerable populations in the Alto Mayo basin are not benefiting equally from the region's resources: stakeholders rated that statement lowest of all (1.9), with a relatively low IQR (1) and variance (0.7) in responses. This is consistent with findings in Ostovar (2019) that Peru's historically marginalized communities in the Andes hold distinctly different preferences and worldviews regarding watershed protection than the majority of downstream users.

Bogotá, Colombia
Compared to the other two basins, stakeholders in Bogotá placed greater emphasis on the Water Resource Management sub-indicator, giving it a weight (0.23) about double that observed in Guandu and Alto Mayo. Among the individual statements there, the (translated) statement "Ecosystem conservation priorities are developed and actions implemented" was rated especially low (2.3). Bogotá's best-performing sub-indicator was Incentives and Regulations, with a score of 57, but this score was pulled down by a low rating for honorary recognition programs (2.5). That statement also had the highest IQR (2.5), suggesting either strong disagreement about the quality of such programs (the majority of respondents rated it 1, meaning such programs either do not exist or are at an early stage of discussion) or confusion about what types of programs belong in this category. Another statement that stood out in the Bogotá assessment concerned gender, specifically how women and girls benefit from ecosystem services. Stakeholders rated this statement 3.2, the highest of all individual statements in the survey and nearly a full point higher than in Guandu and Alto Mayo (both 2.3). Finally, water quantity monitoring was rated relatively high (3.1), which is not surprising given the role that the municipal utility plays in managing water supply for the region. On the other hand, biological and ecological monitoring received one of the lowest ratings (2.3) of any statement in the Bogotá assessment and was thus noted as a priority area for improvement.

The utility of governance "self-assessments"
One major distinction of the approach presented here is that governance actors themselves lead the assessment (as opposed to an external analyst), and the methods can flexibly accommodate as many actors as are interested in participating. Granted, we as analysts introduced the framework and administered the surveys, not unlike qualitative researchers conducting interviews and document reviews. But because the analysis relies solely on perception data and standardized responses, any remaining biases are largely attributable to the respondents themselves, i.e., those who represent the water governance system in each basin, rather than to the external analyst. Had we surveyed only academic experts, we might have obtained similar quantitative scores, at least in the cases of Alto Mayo and Bogotá, but we would not have been able to observe the slight positive and negative biases held by government and non-government actors, respectively. This matters for the transparency of our approach but is also potentially important information for the governance actors themselves as they work toward more participatory and integrated water resource management. After all, they base decisions on their perceptions (Kaufmann et al. 2010; Carmenta et al. 2017), so it is helpful to have more insight into what those perceptions are and how they vary among stakeholders. Similarly, the weighting exercise gives the actors themselves agency to identify their preferences and, by extension, their priorities for maintaining or improving freshwater health. Indicator-based assessments do sometimes allow decision makers to weight components, but there is no universal process for doing so, particularly in group settings (Sharpe 2004; Vollmer et al. 2016). We employed the commonly used Analytic Hierarchy Process (AHP), but a range of other weighting approaches from decision science (e.g., rating, ranking, ratios) would also suffice.
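In the AHP, weights are derived from a matrix of pairwise judgments in which entry a_ij records how much more important element i is than element j (on Saaty's 1–9 scale), with a_ji = 1/a_ij. The sketch below assumes the standard principal-eigenvector method, approximated here by power iteration. The function name `ahp_weights` and the judgments are hypothetical. Because the text does not describe how individual judgments were combined across participants, group aggregation is omitted.

```python
def ahp_weights(matrix: list[list[float]], iterations: int = 200) -> list[float]:
    """Priority weights of a reciprocal pairwise comparison matrix.

    Approximates the principal (Perron) eigenvector by power iteration,
    normalising the vector to sum to 1 after each step.
    """
    n = len(matrix)
    w = [1.0 / n] * n  # start from equal weights
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


# Hypothetical judgments for three sub-indicators A, B, C:
# A is moderately more important than B (3) and strongly more than C (5);
# B is moderately more important than C (3).
judgments = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
print([round(x, 3) for x in ahp_weights(judgments)])  # prints [0.637, 0.258, 0.105]
```

For a perfectly consistent matrix (a_ij = w_i / w_j), the method returns the underlying weights exactly. Real responses are rarely that consistent, which is the concern raised next.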
Our consistency ratios and subsequent feedback from participants indicate that the pairwise comparison scheme of the AHP was challenging, particularly with four or more sub-indicators, because the number of required comparisons, and thus the chances for inconsistent responses, is much greater. AHP software typically prompts participants to adjust their responses if they fail to meet a prescribed consistency ratio, but this is not fail-safe. For this reason, we suggest exploring other methods that are slightly less cognitively demanding but still allow for maximum participation.
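Two quantities explain why larger choice sets are harder to complete. The first is the number of pairwise judgments, n(n − 1)/2. The second is Saaty's consistency ratio, CR = CI / RI, where CI = (λ_max − n)/(n − 1) and RI is a random index that depends on n. The sketch below is self-contained and uses Saaty's published RI values. It also assumes the commonly cited CR threshold of 0.10 as the "prescribed" ratio, which the text does not state. Both matrices are hypothetical.

```python
# Saaty's random consistency index (RI) for matrices of size n.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def n_comparisons(n: int) -> int:
    """Number of pairwise judgments a respondent must make for n elements."""
    return n * (n - 1) // 2


def consistency_ratio(matrix: list[list[float]], iterations: int = 200) -> float:
    """Saaty's consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # reciprocal 1x1 and 2x2 matrices are always consistent
    # Principal eigenvector by power iteration.
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


print([n_comparisons(n) for n in (3, 4, 5, 6)])  # [3, 6, 10, 15]

consistent_enough = [[1, 3, 5], [1 / 3, 1, 3], [1 / 5, 1 / 3, 1]]
circular = [[1, 3, 1 / 3], [1 / 3, 1, 3], [3, 1 / 3, 1]]  # A > B, B > C, yet C > A
print(round(consistency_ratio(consistent_enough), 3))  # about 0.033, below 0.10
print(round(consistency_ratio(circular), 3))           # about 1.149, far above 0.10
```

Going from three to four sub-indicators doubles the number of judgments, and each additional judgment is another opportunity for a circular preference like the second matrix.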
The survey instruments we developed provide a template for decision makers in each basin to monitor changes over time. This may help raise the profile of water governance to be on par with the more regular monitoring of the biophysical indicators of freshwater health. And as we found in all three basins, the governance indicators are almost universally low-performing, meaning that there is substantial room for improvement and therefore a need to monitor that improvement. Having a baseline measure to compare against can also help decision makers understand how far the governance system needs to improve and, by extension, how long this might take. Changes in the governance system take time and are typically non-linear (e.g., a new water law's passage could have substantial and far-reaching impacts), so it is too soon to determine the appropriate frequency for this kind of monitoring, or what specific factors drive changes in indicator scores. Still, with the baseline established and tools in hand, these data will eventually become available and be valuable for future research. Although the results of these surveys are influenced by the stakeholders who participate, involving a broad group of stakeholders should make them less susceptible to bias than assessments done by a single analyst or small group. The results from Bogotá showed how non-government actors provided a "counter-weight" to government actors; an assessment excluding one of these sub-groups would have presented a different picture.

Issues of scale, inclusion and consensus
Water governance in the three case studies, as in most of the world, is becoming more polycentric (Woodhouse and Muller 2017). Rather than specifying an ideal scale or key actor with the FHI, we emphasize that information needs to flow between spatial scales (Rouillard and Spray 2017), recognizing the influence of power dynamics as well as the multi-level governance processes in place (Bakker and Morinville 2013; Norman et al. 2013). In the case of Alto Mayo, we were working with a watershed committee, but this is still an informal mechanism nested within the larger Huallaga River basin, the hydrographic region designated by Peru's national water agency, ANA. ANA recently conducted its own nationwide water governance assessment using a checklist and qualitative indicators following the OECD framework (ANA 2018). The richer, finer-scale data we collected here could conceivably be nested within ANA's broader assessments, and our methods could be replicated nationwide, but doing so would require similar workshops that collectively might involve thousands of participants. The finer-scale assessment approach that we have demonstrated supports a primary goal of the FHI, which is to help more stakeholders understand how they affect and depend on ecosystem services in their particular basin.
Yet the Guandu example demonstrated that single hydrographic regions and their management committees are not necessarily the most suitable scale of assessment either. One of the main recommendations from stakeholders there (many of whom meet as part of the Guandu Basin Committee) is that the assessment needs to be extended to stakeholders from neighboring Guanabara Bay and the middle Paraiba do Sul River Basin, to accurately reflect the water supply and demand areas of metropolitan Rio. Similarly, the Bogotá case study area involves portions of five different hydrographic basins making up its municipal supply system. This is one of the reasons for the larger group of stakeholders in the Bogotá case, as we sought to have representation from all five basins in the supply system, along with representatives from the various communities and local governments.
The Bogotá case also allowed us to test a hypothesis about sample size. Our hypothesis was that by increasing the sample size, we would reduce the variance and IQR of responses, providing additional confidence that scores were not unduly influenced by a small faction. While it is true that the average IQR for Bogotá was the lowest of the three cases (1.17 compared to 1.26 for Guandu and 1.34 for Alto Mayo), there was no clear reduction in variance measured at the sub-indicator level. Moreover, the measures of consensus for indicator weights in the Bogotá case were no different from the other two basins (Table 2). Therefore, we would suggest that the number of participants in the surveys should be determined based on a judgment about stakeholder groups that should have a voice in the assessment process, recognizing that there will be an upper limit to the number of people in a given geography with sufficient knowledge about the range of water governance issues. Representativeness is likely more important than sample size in gaining insight into actual water governance dynamics.
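One statistical reason why a larger sample need not shrink the IQR or variance is that these measures describe how much respondents disagree, which is a property of the stakeholder population. A larger sample mainly sharpens the estimate of the mean (its standard error). The deterministic sketch below assumes that additional respondents hold the same mix of views as the first group, and the opinion profile is hypothetical. Population variance is used so that the comparison is exact; sample variance would differ only by a factor of n/(n − 1). This is offered as an illustration, not as the authors' explanation of the Bogotá results.

```python
from math import sqrt
from statistics import pstdev, pvariance


def spread_and_precision(responses: list[int]) -> tuple[int, float, float]:
    """Return (n, population variance of responses, standard error of the mean)."""
    n = len(responses)
    return n, pvariance(responses), pstdev(responses) / sqrt(n)


# Hypothetical opinion profile on an assumed 1-5 scale; replicating it mimics
# surveying more people who hold the same mix of views.
profile = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
for copies in (1, 5, 20):
    n, var, se = spread_and_precision(profile * copies)
    print(n, round(var, 2), round(se, 3))
```

The variance stays fixed while the standard error falls. This is consistent with the suggestion that representativeness, rather than sample size, is what matters for capturing the range of views.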
This leads to another issue: how best to reflect the differing viewpoints that arise during the assessment. The indicators in the FHI, like many quantitative approaches, require a mean or summary value, which, for the governance sub-indicators as well as the weights, may not capture the variance in responses. We measured and reported this variance; the lower the variance, the more confidence we have that our mean values adequately represent the collective perception of the participants. But less than a third of all the weighting tasks the groups completed registered at least "moderate" consensus (scores of 65 and above). In other words, for most reported weights we observed low or very low consensus; participants may disagree on the relative weights or even the ranking of the indicators and sub-indicators. Like the OECD approach, we report on the strength of consensus for each group of indicators, though the OECD's assessment of consensus is qualitative, so it is unclear how it is determined. This contrasts with the current process for measuring SDG 6.5.1, where workshops are encouraged to "foster consensus" around scores (Bertule et al. 2018) but without guidance on what constitutes consensus. For the sub-indicator scores, by contrast, our reported IQRs suggest that, with a few exceptions, consensus was generally "moderate" or "high" (Table 2).
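The share of weighting tasks reaching at least "moderate" consensus can be tallied as shown below, along with the tasks that fall short and might merit follow-up discussion. The sketch assumes that consensus scores are already available on a 0–100 scale and uses only the threshold stated in the text (65 and above). It does not reproduce the index that generates the scores. The task names and scores are hypothetical.

```python
MODERATE_THRESHOLD = 65.0  # consensus scores of 65 and above count as at least "moderate"


def share_at_least_moderate(scores: list[float]) -> float:
    """Fraction of weighting tasks whose consensus score reaches the moderate threshold."""
    if not scores:
        raise ValueError("no consensus scores supplied")
    return sum(score >= MODERATE_THRESHOLD for score in scores) / len(scores)


def below_threshold(tasks: dict[str, float]) -> list[str]:
    """Names of weighting tasks with low or very low consensus, for follow-up discussion."""
    return sorted(name for name, score in tasks.items() if score < MODERATE_THRESHOLD)


# Hypothetical consensus scores (0-100) for seven weighting tasks.
tasks = {"task_1": 48.2, "task_2": 71.5, "task_3": 55.0, "task_4": 62.9,
         "task_5": 66.0, "task_6": 39.4, "task_7": 58.1}
print(round(share_at_least_moderate(list(tasks.values())), 3))  # 0.286
print(below_threshold(tasks))
```

Listing the low-consensus tasks by name, rather than reporting only the share, points a group directly to the weights it may want to revisit.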
In this research we treated consensus as a measurable quantity, making clear that the summary scores represent mean values that in some cases mask highly diverse viewpoints. The goal of the initial FHI assessment is to develop a baseline understanding of the biophysical and water governance dynamics in a basin, not necessarily to force consensus among decision makers as to what these dynamics look like. This informal network of stakeholders is thus actively contributing to the co-production of knowledge about governance in their basin, which should help with the legitimacy of the information (Armitage et al. 2015). By first eliciting individual responses, stakeholders can see how divergent their views are and determine whether and where to compromise when it comes to governance in their basin. Individual responses and voices might become muted if more dominant actors are in the room, or participants might disengage if the process is treated as a validation of pre-determined (e.g., government- or expert-led) scoring. Without information about the level of consensus and, more importantly, about where there may be geographic or sectoral factions, decision makers may miss opportunities to resolve underlying conflicts or to head off impending ones. In a separate exercise, one might consider a Delphi-method approach, in which initial results of our assessment are revealed to participants, who are then given the opportunity to debate the issues and amend their responses if they wish. Stakeholders often benefit from having dedicated time to debate issues not easily captured by data and measurement (Bosch et al. 2012).

CONCLUSION
We designed the FHI approach to water governance assessment to be low-cost and to provide this basic, baseline information as a form of screening before deeper (more costly and complex) diagnostics. We demonstrated through three case studies how the FHI can be applied in varying contexts. The diversity of perspectives it can accommodate is a strength, but augmenting the quantitative results with more qualitative information is advisable (Wesselink et al. 2017) to provide deeper insights into issues of influence, networks, and power dynamics (McDonnell 2008) and to aid interpretation of the numbers and their nuances. Because the approach incorporates stakeholders' perceptions, there is no single objective "reality" of the governance system against which to compare our results. Our measures of consensus illuminate where stakeholders are aligned and where there may in fact be divergent factions. Our results generally conformed to stakeholders' expectations and previous qualitative studies from the basins, but our assessment also provides new and valuable insights. Like an ECG, the FHI does not prescribe treatment. However, if used to monitor changes over time, it can be a valuable tool in adaptive water governance (Pahl-Wostl 2019). And although our focus is on the basins where the tool is being applied, if applied in a standard and transparent way, our framework and results can contribute to knowledge accumulation across cases (Wilde et al. 2009; Özerol et al. 2018). Water governance is growing in complexity as well as importance, so there is a clear need to harness all the knowledge we can from cases around the world in a way that is systematic yet retains the context of individual systems.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.